Three Mistakes Computer Vision Developers Make
“Mistakes were made. Pants were pooped.”
The last decade has seen a complete transformation in the field of computer vision. Thanks to recent machine learning advancements, what was once a niche academic specialization has evolved into a feature of daily life in the 2020s. Consequently, demand for computer vision developers has increased considerably. While some build their skill set through formal degree programs, many neophyte developers learn from online courses and tutorials. All of these are great ways to learn up-to-date best practices. Equally important, however (and often overlooked), is learning how not to do computer vision. Identifying common mistakes can sharpen your skills and help ensure your systems perform just as well in the field as they do in the lab.
Here are three mistakes commonly made by developers, both early career and experienced:
1. Skipping the fundamentals
Computer vision as a field has been around for over fifty years. Yet it’s only in the last ten that it became commercially viable, through the rise of convolutional neural networks (and more recently, transformers). Naturally, some developers are quick to use these machine learning techniques to solve every problem they encounter. This is particularly true of recent graduates who are still developing the intuition to differentiate cognition problems from computational ones.
Outside the realm of learning algorithms, there is a rich history of classical computer vision techniques refined over decades to great effect. Examples include Hough transforms for shape detection, multi-view geometry, and 2D feature matching. Besides providing excellent results, these often incur far lower computational cost than learning algorithms (and thus may be better suited to mobile applications). They’re also freely available through open source packages such as OpenCV.
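To make this concrete, here’s a minimal sketch of one such classical technique: probabilistic Hough line detection using OpenCV’s Python bindings. The input filename and the edge/vote thresholds are illustrative placeholders you’d tune for your own images.

```python
# Minimal classical-CV sketch: detect straight line segments with a Hough transform.
# The filename and threshold values below are illustrative, not prescriptive.
import cv2
import numpy as np

img = cv2.imread("scanned_page.jpg")               # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Edge detection followed by the probabilistic Hough transform.
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
lines = cv2.HoughLinesP(
    edges,
    rho=1,                 # distance resolution in pixels
    theta=np.pi / 180,     # angular resolution in radians
    threshold=100,         # minimum votes to accept a line
    minLineLength=80,
    maxLineGap=10,
)

# Draw the detected segments for visual inspection.
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("detected_lines.jpg", img)
```

No GPU, no training data, and it runs in milliseconds on modest hardware.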
2. Using an insufficiently large and/or diverse dataset
All learning algorithms require data for training and testing. Selecting and curating this data is an entire field unto itself, but the general rule is to provide enough examples to completely cover the domain of the input. In other words, the dataset used to train the network should closely match what it sees out in the real world.
Going small may be tempting, as a small dataset trains quickly and gives apparently good results. Once it reaches deployment, however, you’ll quickly learn that your network is overfit and generalizes poorly. One recent client spent months building their system around a single example (!) and was in for quite a surprise when they ran it on customer data for the first time.
As a general guideline, your dataset should contain at least an order of magnitude more examples than your network has parameters. In practice, this means smaller models trained on larger datasets usually beat complex models trained on smaller ones, a pattern borne out repeatedly in the literature. It’s even been suggested that much of the rise of neural networks is due to improvements in data collection rather than algorithms.
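One cheap sanity check is to compare your model’s parameter count against your dataset size before committing to an architecture. The sketch below uses PyTorch; the layer sizes and example count are placeholders standing in for your actual model and dataset.

```python
# Rough check of the "10x more examples than parameters" rule of thumb (PyTorch).
import torch.nn as nn

# Stand-in for your actual network; the layer sizes here are arbitrary placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

num_params = sum(p.numel() for p in model.parameters())
num_examples = 50_000  # replace with len(train_dataset)

ratio = num_examples / num_params
print(f"{num_examples:,} examples / {num_params:,} parameters = {ratio:.1f}x")
if ratio < 10:
    print("Below the rule of thumb: consider more data or a smaller model.")
```

It’s only a heuristic (augmentation, transfer learning, and regularization all shift the balance), but it’s a useful first warning sign.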
3. Taking the dataset at face value
In a typical project workflow, a subject matter expert acquires the dataset and hands it off to a machine learning developer to train a model. As most developers come from a software background, they rarely question where the data came from, or whether it’s the most effective representation of the problem. This can lead to training overly complex networks, which could have been avoided by a) reducing the domain complexity or b) eliminating the problem altogether.
As an example, a research team was interested in glare removal from photos. After curating an extensive dataset of images with and without glare, they trained a network that produced reasonably good results. The resulting network required a large GPU to operate at real-time frame rates, making their mobile application expensive and power-hungry. They later discovered that an inexpensive polarizing filter attached to the camera achieved comparable results with zero computational overhead.
The lesson here is to not be afraid to question your data. Besides simply being accurate, does it address the question being posed? Is there a simpler way to formulate the problem? A few hours of inquiry might save weeks of development time.
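Some of that inquiry can even be automated. Here’s a minimal sketch of two cheap data sanity checks, exact-duplicate detection and label balance, assuming a hypothetical dataset/images folder and a labels.csv with a label column.

```python
# Two quick "question your data" checks: exact duplicates and label balance.
# The directory layout and labels.csv format are assumptions for illustration.
import csv
import hashlib
from collections import Counter
from pathlib import Path

image_dir = Path("dataset/images")   # hypothetical dataset location

# Flag byte-identical images, which can silently leak between train and test splits.
seen = {}
duplicates = []
for path in sorted(image_dir.glob("*.jpg")):
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    if digest in seen:
        duplicates.append((path.name, seen[digest]))
    else:
        seen[digest] = path.name
print(f"{len(duplicates)} exact duplicate images found")

# Check label balance (assumes labels.csv rows of the form: filename,label).
with open("dataset/labels.csv") as f:
    labels = [row["label"] for row in csv.DictReader(f)]
print(Counter(labels).most_common())
```

A heavily skewed label count or a pile of duplicates is often the first clue that the dataset doesn’t really represent the problem you think you’re solving.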
Computer vision has experienced an unprecedented renaissance over the last decade. Although new techniques continue to be introduced every year, it’s important to take a step back and consider the bigger picture. While the mistakes here may sound obvious, it’s often these seemingly small details that determine the success of your next project.