Computer Vision Applications for the Modernized Business
In this article, one of our Digital Innovation consultants gives an introduction to the technology and provides some real-world applications to illustrate how computer vision can improve your organization’s business processes.
For many years, it was difficult to imagine computer vision having many applications outside of the few industries that popularized it. Though computer vision (and artificial intelligence as a whole) has been around in some form or another for several decades, it saw use primarily in robotics and manufacturing, and the occasional attempt at world domination (alright, that last part wasn’t true, but I had to go there at some point in this blog). However, due to huge leaps forward in the field of computer vision, as well as the rise of affordable artificial intelligence (AI) frameworks, tools, and cloud service offerings, computer vision is more accessible for your organization than ever before. Thanks to the increased sophistication of these tools, what used to require staggeringly large amounts of time and data can now be done in a fraction of the time.
What is Computer Vision?
In its simplest form, computer vision (CV) is a technology that gives computers the ability to understand what they are seeing in an image or video. Imagine you put a camera in your warehouse that can identify the items being picked and put away by your employees, then notify you whether they’re picking the right items by comparing the items to their barcodes. Or, imagine your security cameras can read license plates. While this may sound like something straight out of a science fiction novel, there are many real-world applications for computer vision in the modern organization.
How Computer Vision Works
In general, CV uses machine learning techniques to analyze images, video, or other visual media in order to draw conclusions or make predictions about the data being processed. The basis of the technology is a technique called pattern recognition, which uses algorithms to look for specific visual patterns in an image. For example, an algorithm might look for an edge in an image, indicating the presence of an object. Another algorithm might look for a specific color to identify an object, or it might look for a specific pattern to identify a face. There are many different CV techniques that can be used to extract information from images. Two of the most common types of CV you’ll see are object detection and image recognition/classification.
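To make the idea of "looking for an edge" a little more concrete, here is a minimal sketch in plain Python. It assumes a tiny hand-made grayscale image (a grid of brightness values) and a made-up threshold; the names IMAGE and horizontal_edges are purely illustrative. Real CV libraries use far more robust methods, but the underlying intuition is similar: an edge is a place where brightness changes sharply between neighboring pixels.

```python
# A tiny grayscale "image": 0 is dark, 9 is bright.
# The bright block in the top-right corner stands in for an object.
IMAGE = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def horizontal_edges(image, threshold=5):
    """Return (row, col) positions where brightness jumps sharply
    between a pixel and its right-hand neighbor."""
    edges = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) >= threshold:
                edges.append((r, c))
    return edges


print(horizontal_edges(IMAGE))  # [(0, 2), (1, 2)]
```

The two positions reported mark the left-hand boundary of the bright block, which is exactly where a person would say the "object" begins.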
Object detection is the foundation for many computer vision techniques. At a high level, object detection is exactly what it sounds like: the process of finding or “detecting” objects in images and videos. Though there are many different varieties of object detection, the bulk of these techniques rely on a machine learning algorithm to train a model to recognize desired objects in an image or video. A model is essentially a specialized program that is fed a data set (in our case, a very large number of images) in an effort to find patterns that can be used to make an informed decision.
For example, say you wanted to train a model to recognize a dog in images and videos. In order for the model to produce accurate results, you’d want to feed it “positive” examples (i.e., images containing dogs) and “negative” examples (images that don’t contain dogs). Over time, the model will begin to recognize patterns in the images containing the desired object (dogs) and deliver accurate results when used to detect those objects in an image or video. Granted, this is perhaps an oversimplification of the process, but that is the general idea. The accuracy of the results you get depends heavily on the size of the data set the model has ingested for training (the more, the better). As the data set grows, the model has the potential to become significantly more accurate.
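The positive/negative training idea can be sketched with a deliberately tiny toy model. Everything here is an assumption made for illustration: each "image" has already been boiled down to two made-up numeric features, and the "model" is a simple nearest-centroid classifier rather than the deep learning models typically used in practice. It is only meant to show the shape of the process: learn from labeled examples, then predict on something new.

```python
def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def train(positives, negatives):
    """'Training' here just means remembering the average positive
    example and the average negative example."""
    return {"dog": centroid(positives), "not_dog": centroid(negatives)}


def predict(model, features):
    """Pick the label whose centroid is closest to the new example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(model, key=lambda label: squared_distance(model[label], features))


# Hypothetical feature vectors standing in for real images.
dog_examples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
other_examples = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]

model = train(dog_examples, other_examples)
print(predict(model, [0.85, 0.75]))  # dog
print(predict(model, [0.15, 0.25]))  # not_dog
```

With only three examples per class, this toy model would be easy to fool. Adding more (and more varied) examples shifts the centroids toward a better picture of each class, which mirrors the point above about larger data sets.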
Image Classification/Recognition is the process of analyzing an image in order to extract meaningful information about its contents through machine learning. The goal of image classification is to determine a “category” that accurately describes the contents of an image. This can be used to find specific objects, or to figure out what the image is about.
Whereas object detection is used to detect the presence of a particular object (or objects) in an image, image classification is the process of then assigning that image to one or more categories based on its contents. Two terms often used when discussing image classification are single-label and multi-label. You can think of labels as being synonymous with categories. In single-label classification, each image is assigned exactly one category; in multi-label classification, an image can belong to several categories at once. Either way, the goal is to identify distinguishing characteristics within an image and assign it to one (or more) categories accordingly.
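The difference between single-label and multi-label classification is easy to see in a short sketch. It assumes a model has already produced a confidence score for each category; the scores, category names, and the 0.5 cutoff below are invented for illustration.

```python
def single_label(scores):
    """Single-label: pick the one category with the highest score."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """Multi-label: keep every category whose score clears the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for a photo of a dog playing in a park.
scores = {"dog": 0.92, "park": 0.71, "car": 0.05}

print(single_label(scores))  # dog
print(multi_label(scores))   # ['dog', 'park']
```

Which approach fits depends on the question being asked: "what is this a picture of?" suits single-label, while "what is in this picture?" usually calls for multi-label.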
Computer Vision Application for the Food Service Industry
Now that we’ve gone over some basic definitions and given a brief introduction to computer vision, let’s dive into a real-world application where we used computer vision to improve operational efficiency for one of our clients in the food service industry.
In 2018, Smartbridge developed an all-in-one solution aimed at drastically simplifying the cooking process for one of our clients, which owns a quick-service restaurant brand specializing in Latin-Caribbean cuisine. In a nutshell, the solution was a wall-mounted tablet application that would instruct the grill operators when to cook food and how much to cook, based on forecasted sales data for the current day and time. The application was designed to require very little input from the grill operator in order to function. However, the application collects data for reporting and traceability purposes and asks operators to confirm core actions, so it still involved some minor user interaction on occasion.
As the application became a core part of daily operations across more than 150 restaurants over the past few years, reports of sporadic hardware failures became more common. The grill operators would occasionally reach out to interact with the application in the midst of their daily duties, causing the tablets to accumulate grime over time that would require cleaning. The wear and tear of removing the tablet from its wall mount to perform this cleaning was the root cause of the hardware failures.
For the reasons mentioned above, eliminating all forms of user interaction within the application wasn’t a feasible idea. But what if there were a way to still receive input from the grill operators without them having to make physical contact with the tablet? This is where computer vision comes in.
The Smartbridge Solution
Due to the small number of workflows within the app requiring user input, we were able to determine a set of simple hand gestures the grill operator could perform to indicate that they wanted the application to perform specific actions. Using the camera on the tablet to supply image and video, Smartbridge developed a solution that would allow the application to detect frames in video where the grill operator’s hands were visible at specific points where user input would normally be required. Then, through image recognition and classification, a model was trained to identify the specific hand gestures that corresponded to the action the grill operator decided to take at that specific moment in time.
For example, the grill operator would show a thumbs up to the tablet’s camera for a certain action, and computer vision would recognize this hand gesture and perform that action. Making a peace sign would correspond to a different action, and so forth.
This way, the grill operator could interact with the application without ever having to make physical contact with the device, thereby reducing the need for cleaning and, by extension, hardware failures. Pretty cool, right?
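For the technically curious, the last step of a flow like this (turning a recognized gesture into an application action) might look something like the sketch below. This is not the actual Smartbridge implementation: the gesture labels, the action names ("confirm" and "skip"), and the confidence cutoff are all hypothetical, and the gesture model itself is assumed to exist elsewhere and to hand back a label with a confidence score.

```python
# Hypothetical mapping from recognized gestures to application actions.
GESTURE_ACTIONS = {
    "thumbs_up": "confirm",
    "peace_sign": "skip",
}


def action_for(prediction, awaiting_input, min_confidence=0.8):
    """Translate a (label, confidence) prediction into an action name.

    Returns None when the app is not waiting for input, when the model
    is not confident enough, or when the gesture is not one we map.
    """
    if not awaiting_input:
        return None
    label, confidence = prediction
    if confidence < min_confidence:
        return None
    return GESTURE_ACTIONS.get(label)


print(action_for(("thumbs_up", 0.95), awaiting_input=True))   # confirm
print(action_for(("peace_sign", 0.60), awaiting_input=True))  # None
print(action_for(("thumbs_up", 0.95), awaiting_input=False))  # None
```

Ignoring gestures outside of input prompts, and ignoring low-confidence predictions, is one plausible way to keep a stray hand movement near the grill from triggering an action by accident.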
What Type of Task Is a Good Candidate for Computer Vision?
If that last example piqued your interest, you might be thinking through which aspects of your business would “classify” as a good candidate for leveraging CV (bad pun? Bad pun). The good news is that the range of possibilities is enormous!
The fact of the matter is that even with clearly defined processes for nearly every task in your organization, preventing human error and striving for the greatest levels of efficiency will remain ongoing challenges. And while CV applications are not infallible, the ability to greatly increase the accuracy of a model over time through training on increasingly larger data sets gives it the edge. From fraud detection, to personalized ads, and more recently, autonomous driving (with the increase in popularity of self-driving electric vehicles), the applications of CV are far-reaching and diverse.
Computer vision and artificial intelligence are technologies that have existed for many years, but it’s only recently that they’ve become accessible and affordable for modern organizations. In this post, we’ve given a brief introduction to the technology, and a recent real-world example of how computer vision can improve your organization’s business processes. Hopefully, I’ve inspired you to think outside the box about how you can embrace CV to achieve higher levels of operational efficiency in your organization, and we’re here to help.