Computer vision is one facet of artificial intelligence (AI) that is all about helping computer systems understand the visual world and distinguish objects within it.
Deep learning (DL) models are developed, which then need to be trained to recognize faces, objects, and the many different categories of digital images.
Computer vision has long suffered from severe limitations. For years, the technology struggled to recognize even simple differences, such as the difference between a cat and a dog, something a young child can do easily.
Similarly, many hurdles remain before fully self-driving vehicles become a reality. Although fitted with multiple cameras, sensors, and other technologies, such vehicles have been known to run into the back of white vans in bright sunlight, failing to recognize that an obstacle was in their path. That said, the technology is advancing rapidly.
Below are some of the top trends companies and IT teams are seeing in the computer vision market:
1. Synthetic Data

Early training of deep learning models was typically accomplished using real examples. The process can be accelerated by using synthetic examples instead.
“More and more computer vision models are trained on synthetic rather than real data,” said Steve Gu, co-founder and CEO of AiFi.
“Models that are trained 100% on synthetic and simulated scenes can now work well in real settings without ever seeing a real example.”
This is being driven by two key factors. First, advanced simulation tools and graphics processing units (GPUs) enable high-fidelity rendering in seconds or even milliseconds, making it possible to generate a large volume of quality training data with a high level of precision and detail. Second, a growing body of research is bridging the gap between the synthetic world and the real world, making it easier for models trained inside simulation to adapt and generalize to real application scenarios.
“Instead of spending a lot of money on intensive human labor to annotate data, it is time to investigate simulation tools and synthetic data for computer vision model training,” Gu said.
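To make the idea concrete, here is a minimal sketch of synthetic training data generation in NumPy. Because the generator controls the scene, every image comes with a perfectly accurate label at no annotation cost. The two-class shapes task, sizes, and function names are illustrative assumptions, not any particular company's pipeline:

```python
import numpy as np

def synth_example(shape: str, size: int = 32, rng=None):
    """Render one tiny labeled image: a filled square or circle on a blank
    canvas. A toy stand-in for a high-fidelity renderer -- the label comes
    for free because we control the scene, so no human annotation is needed."""
    if rng is None:
        rng = np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    c = rng.integers(size // 4, 3 * size // 4, size=2)  # random center
    r = int(rng.integers(3, size // 4))                 # random radius
    yy, xx = np.mgrid[:size, :size]
    if shape == "square":
        mask = (abs(yy - c[0]) <= r) & (abs(xx - c[1]) <= r)
    else:  # circle
        mask = (yy - c[0]) ** 2 + (xx - c[1]) ** 2 <= r ** 2
    img[mask] = 1.0
    return img, 0 if shape == "square" else 1

def make_dataset(n: int, seed: int = 0):
    """Generate n labeled examples -- arbitrarily many, on demand."""
    rng = np.random.default_rng(seed)
    shapes = ["square", "circle"]
    pairs = [synth_example(shapes[i % 2], rng=rng) for i in range(n)]
    images, labels = zip(*pairs)
    return np.stack(images), np.array(labels)

X, y = make_dataset(100)
```

A real pipeline would swap the toy renderer for a simulation engine, but the economics are the same: labels are a byproduct of generation rather than a manual cost.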
2. Holistic Perspective
Holger Kuehnle, executive creative director at Artefact, noted that networks of multiple cameras, combined with sophisticated computer vision algorithms, are being harnessed to create holistic perspectives of what is going on in an environment or a process, such as a factory floor.
Computer vision can be used — in combination with additional sensors, but increasingly also by itself — to create digital twins of environments or abstractions of what the space is like and what is happening in it, allowing for interesting automation scenarios.
For example, a camera could sense that something has spilled on the factory floor, and a cleaning robot could be dispatched automatically to remove the spill. Currently, these scenarios are mainly driven by industrial automation, where a great deal of legacy production hardware is being augmented with sensors and cameras so it can keep up with the level of data collection needed for sophisticated AI and machine learning (ML) capabilities.
“Computer vision will be able to sense whether doors and windows were left open, if the washer or dryer was done, or if the stove was indeed turned off,” Kuehnle said.
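The spill-and-dispatch scenario above can be sketched as a tiny digital twin: a state model of the space that is updated from vision detections and fires automation rules. Everything here (class names, zone labels, the detection strings) is hypothetical, purely to illustrate the pattern:

```python
from dataclasses import dataclass, field

@dataclass
class FactoryTwin:
    """Toy digital twin: zone states updated from (hypothetical)
    computer-vision detections, with a simple automation rule."""
    zones: dict = field(default_factory=dict)
    dispatched: list = field(default_factory=list)

    def ingest(self, zone: str, detection: str):
        """Record the latest detection for a zone and apply rules."""
        self.zones[zone] = detection
        # Automation rule: a detected spill dispatches a cleaning robot.
        if detection == "spill":
            self.dispatched.append(("cleaning-robot", zone))

twin = FactoryTwin()
twin.ingest("aisle-3", "clear")
twin.ingest("aisle-7", "spill")
```

In a real deployment the `ingest` events would come from camera-fed detection models, and the rules would grow into a full orchestration layer; the twin itself stays a simple state abstraction of the space.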
3. Self-Supervision

Computer vision technology has generally relied on a great deal of hand-holding by humans. But it is gradually emerging from that stage to the point where self-supervision is becoming more commonplace, said Bruce King, data science technologist at Seagate Technology.
“Self-supervised machine learning approaches applied to computer vision problems greatly reduce the amount of expensive, human-annotated images needed to train models,” King said.
“Self-supervised modeling technologies reduce image annotation costs and enable more sophisticated models.”
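The core trick of self-supervision is deriving labels from the data itself. One classic pretext task, used here only as an illustrative example (masking and contrastive objectives are other common choices), is rotation prediction: rotate each unlabeled image by a random multiple of 90 degrees and train a model to predict the rotation. A minimal NumPy sketch:

```python
import numpy as np

def rotation_pretext(images, seed=0):
    """Turn unlabeled images into a labeled dataset for free:
    rotate each image by a random multiple of 90 degrees and use
    the rotation index (0-3) as the classification target."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for img in images:
        k = int(rng.integers(0, 4))   # 0, 90, 180, or 270 degrees
        xs.append(np.rot90(img, k))
        ys.append(k)
    return np.stack(xs), np.array(ys)

# Eight fake "unlabeled" 4x4 images stand in for a raw image corpus.
unlabeled = np.arange(8 * 16).reshape(8, 4, 4)
X, y = rotation_pretext(unlabeled)
```

No human ever annotates anything: the supervision signal is manufactured by the transformation, and a model trained to solve it must learn useful visual features along the way.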
4. Foundational Models
An outgrowth of the self-supervision trend is the emergence of computer vision foundation models.
King noted that these foundation models are trained on large numbers of unlabeled images using self-supervised methods. They are then fine-tuned for a wide variety of computer vision tasks using small numbers of labeled training images.
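The pretrain-then-fine-tune pattern can be sketched as a linear probe: freeze the pretrained backbone and fit only a small task head on the few labeled examples. The "backbone" below is a stand-in (a fixed random projection), an assumption made purely to show the frozen-features-plus-trainable-head structure; a real backbone would be pretrained on vast unlabeled image sets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen foundation-model backbone (illustrative only).
W_backbone = rng.normal(size=(64, 16))

def backbone(x):
    """Frozen feature extractor: never updated during fine-tuning."""
    return np.tanh(x @ W_backbone)

def finetune_head(X, y, lr=0.5, steps=300):
    """Fit only a logistic-regression head on top of frozen features."""
    feats = backbone(X)          # computed once; backbone stays fixed
    w = np.zeros(feats.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-feats @ w))      # sigmoid
        w -= lr * feats.T @ (p - y) / len(y)      # gradient step
    return w

# Tiny labeled set: labels depend only on the frozen features,
# so training the head alone can solve the task.
X_small = rng.normal(size=(40, 64))
direction = rng.normal(size=16)
y_small = (backbone(X_small) @ direction > 0).astype(float)
w = finetune_head(X_small, y_small)
acc = ((backbone(X_small) @ w > 0).astype(float) == y_small).mean()
```

Only 16 head parameters are learned here; the point is that almost all of the model's capacity comes pretrained, which is why a handful of labeled images suffices.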
5. Multiple Data Types
The number of data types that could be used in computer vision has been somewhat limited, which long held the field back. Now, however, machine learning is being successfully applied to many more data types.
“Computer vision machine learning techniques that combine images with their text descriptions or captions facilitate exciting new capabilities, such as ML systems that can generate new, unique images from text or, in reverse, generate captions and descriptions from images,” King said.
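The image-text pairing King describes rests on a shared embedding space, the idea popularized by CLIP-style models: images and captions are mapped to vectors that can be compared directly. The embeddings below are made-up toy numbers, not the output of a real model; they only illustrate how cosine similarity in a joint space matches images to captions:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Hypothetical pre-computed embeddings from a joint image-text model
# (values invented for illustration).
image_embs = np.array([[0.9, 0.1, 0.0],    # photo of a cat
                       [0.0, 0.2, 0.9]])   # photo of a truck
caption_embs = np.array([[1.0, 0.0, 0.1],  # "a cat sleeping"
                         [0.1, 0.1, 1.0]]) # "a red truck"

def best_caption(i: int) -> int:
    """Retrieve the caption whose embedding lies closest to image i."""
    return int(np.argmax(cosine_sim(image_embs[i:i + 1], caption_embs)))
```

The same shared space runs in both directions: searching caption embeddings nearest an image gives captioning-style retrieval, while conditioning a generator on a text embedding underlies text-to-image synthesis.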