Computer vision applications have a history of more than half a century, but the last five years have brought a significant breakthrough. The reasons are the growth and falling cost of processing power, the spread of cloud technologies in the IT world, research on artificial intelligence, and developments in deep learning algorithms.
Many technology companies, small and large, take part in the development of computer vision. We group them into two categories: Big Players and Outstanding Start-Ups.
Amazon Rekognition, Amazon’s computer vision service, can detect objects, scenes and actions in images and videos; match faces and recognize people; identify facial attributes such as happiness, age range, open or closed eyes, glasses and beard or moustache; build a timeline of how an actor’s emotions change through a video; track the direction of movement, for example to analyze an athlete’s movements during a game; and recognize text within an image. It can also detect inappropriate content, recognize and classify celebrities or people from your own library in images and videos, and identify street names or plate numbers in an image. With video analysis services that use cameras located in public spaces, it is possible to identify missing persons by matching them against the faces in a database and to inform the authorities. Similarly, a face recognition based application can perform identity recognition, and analysis of facial expressions can reveal customers’ general reactions to different product lineups or designs in stores. Moreover, in 2018 Amazon introduced AWS DeepLens, a programmable camera, to developers. Deep learning models can be deployed directly on the camera, so AWS DeepLens seems to open new possibilities for research and development teams in this area, since it is compatible not only with Amazon SageMaker but also with Google TensorFlow and (Berkeley AI Research) Caffe models.
Computer vision cannot be separated from deep learning and artificial intelligence. Through its cloud service Cloud Vision, Google offers image analysis, text extraction, tag detection, and celebrity, face and handwriting recognition using pre-trained models. AutoML Vision, built on Convolutional Neural Networks (CNN), a type of artificial neural network, allows users to create their own custom image models. The services even include a team of human operators who check and correct the automatic tagging done by the algorithms. Cloud Video Intelligence, which uses a Recurrent Neural Network (RNN) infrastructure to process videos as well as images, provides services such as metadata search, contextual ad insertion, inappropriate content detection and video to text conversion. The ability to extract actionable information from video files seems to be a particular strength of Google’s services. With the release of Google Glass in 2013, Google entered augmented reality, perhaps a little early, and triggered many discussions about the security of personal data and the use of the glasses in public spaces or in traffic. Although the technical features and launch date of the new version remain unclear, there are rumors that it will be aimed mostly at corporate use.
In 2010, Microsoft became perhaps one of the fastest entrants into computer vision with Kinect, an image processing device integrated with the Xbox game console. Microsoft introduced the third generation of the device to the press at the beginning of 2019. This new model, called Azure Kinect DK, has improved hardware features and, thanks to its integration with the Microsoft Computer Vision API, will provide the same kind of basic services: image analysis, tagging, optical character recognition in images, handwriting detection, recognition of famous people and places, and video analysis. Another state-of-the-art computer vision product that Microsoft supports with cloud services is the HoloLens 2 glasses. This device overlays the real world with augmented reality content, interactive images and videos supported by artificial intelligence and cloud services, making quick training, prototype development and remote guidance possible in industrial design, medical science and many other industries.
With Vision SDK, Apple provides developers with object recognition and barcode, text and direction detection services for images and video. However, its real leading area is the augmented reality iOS library called ARKit 2, with which Apple has prepared the infrastructure for a wide range of developers to build new and different applications. In addition, Apple recently acquired the computer vision start-up Spektral in order to bring the ‘green screen’ effect, widely used in movies, to mobile applications.
In 2015, Snapchat acquired a start-up called Looksery, and thanks to the technology it developed, Snapchat builds a vector map of the human face. It then identifies basic areas such as the eyes, nose, lips and ears and creates a new image with the selected filter. Because the face recognition algorithms have already been “trained” on millions of faces, they can quickly identify the different areas of a new face.
On its platforms, Facebook has developed a service for visually impaired people that describes the content of photos to them. Trained with NLP (Natural Language Processing), the system not only tags photos but also forms understandable sentences about the images.
ai.facebook.com/blog/rosetta-understanding-text-in-images-and-videos-with-machine-learning/
GoPro, popular in action and extreme sports for its durable, compact, high-resolution cameras, offers its users features such as automatic creation of short stories from the videos they shoot and 360-degree shooting.
NVIDIA is one of the pioneers in developing computer vision technologies, not only with its powerful graphics processors but also with CUDA, its platform for parallel programming of graphics processors. The use of CUDA is becoming more widespread, especially since artificial neural networks and deep learning algorithms have found a good fit on multi-core graphics processors. With its integrated development kit called Jetson, which enables the use of GPUs with up to 512 cores, NVIDIA will accelerate visual and artificial intelligence applications.
With its service called Visual Discovery Lens, Pinterest has already started to offer you entries or ideas related to an object that you are interested in and have pointed your camera at.
ImageNet, which started as a research project, is a database of millions of images tagged collectively by people. Since 2010, its annual competitions have tested how accurately deep learning and computer vision algorithms recognize images, pioneering the development of the whole ecosystem.
Like most major automotive companies, Honda places great emphasis on image processing. Because its development laboratory also works on robots such as the human-like Asimo, it combines these two areas and tests autonomous vehicles that can be used on agricultural sites.
Scanning thousands of cameras and terabytes of video in seconds and finding what is searched for… IronYun’s network video recorder (NVR), based on artificial intelligence and deep learning, monitors large areas that require continuous security, such as banks, hotels, airports and factories, and can perform searches on instant recordings.
Imagine a chip that brings together all kinds of cameras, sensors, radar and LiDAR for driving assistance, and that also manages aspects of the driving experience such as lane departure, speed control, emergency stops and traffic density warnings. The main focus of Intel-supported Mobileye is to perfect the driving experience of the future.
SenseTime, the world's most highly valued artificial intelligence enterprise, can recognize not only still faces but also moving faces in just milliseconds. Its technology has many implementations in payment systems and security.
Megvii, aka Face++, SenseTime’s biggest competitor, is the machine room of the ‘Big Brother’, according to the Economist. The company does not use ID cards in its office: once guests’ faces are scanned into the system, their identity information is recorded. With the new implementation in Beijing railway stations, a passenger’s face is instantly matched with their ticket and ID.
Blue Vision, an augmented reality cloud platform, is trying to add a digital layer to the shared experiences of city life. Let’s say a new coffee shop opens in an unfamiliar neighborhood. You pick up your phone and a funny emoji guides you to the shop. Blue Vision is now preparing to add a new experience to transportation by collaborating with Lyft, the next generation transportation network.
Supermarkets spend hours restocking the products on their shelves. At best, it takes half a day for employees to notice that stock has run low, and this lag affects sales. Trax, whose tracking system runs on a miniature battery and uses cameras as thin as playing cards, creates 3D visual maps of stores from instant photographs. Every hour, it sends 8 photos to the system and deletes every photo that contains a human face.
The big data of our world is at Orbital Insight. The company acts as a solution partner to governments, large companies and non-governmental organizations, using information obtained from satellites, unmanned aerial vehicles and other geographical data sources. Orbital provides information ranging from car and customer traffic at shopping centers to tank occupancy rates at refineries.
Visuals have a big impact on the shopping experience. As digitalization makes access easier, the shopping experience itself also needs to be simplified. For example, you are in a store and take a photograph of a product. You instantly reach the company’s website and find other matching products. Isn’t this very different from the usual shopping experience?
The know-how behind fully or semi-autonomous vehicles covers not only the efficiency of the sensors following the road, but also the monitoring of the driver. Eyesight's focus is not on the road, but on the state of the people in the car. While it monitors the driver’s gaze, eyelid movements and head posture, Eyesight can also send a warning signal if a person or a pet gets locked in the car.
For years, because the human eye cannot distinguish or catch millimetric errors in sports, many faulty decisions have been made. Monitoring more than 7,200 games or events a year in 20 sports categories with its video tracking system, Hawk-Eye helps determine the most accurate results.
OrCam is the eye of the visually impaired. Its founders are also partners of Mobileye, which improves the autonomous driving experience. The product may look very simple, like a camera on glasses, but its function is revolutionary. Can a blind person be a car racer? With OrCam, he/she can.
Getting custom services in private hospitals is one thing; being able to offer services where human resources and health demands are out of proportion is another. Zebra Medical Vision uses artificial intelligence to make radiologists’ lives easier. Zebra can analyze the data obtained from millions of visual records and make diagnoses in new scans.
It is not easy to make predictions in important areas such as agriculture, transportation or energy, and this is where Descartes Labs helps. Data obtained from satellites is cleaned and stored. Petabytes of images can then be searched, with results found in less than 100 milliseconds.
We believe that cameras are more sensitive than the human eye, but Prophesee does not seem to be content with that. Using neuromorphic engineering, Prophesee is developing a unique technology with sensors and artificial intelligence that mimic the eye and the brain.
Blue River is revolutionizing new generation agricultural equipment. ‘Yearly Field Maintenance’ is an old concept now. With Blue River's See & Spray, each seed can now receive special care.
Although the autonomous vehicle business is thought to be easy, no two roads or pedestrians are alike. Plus, making sense of the car’s surroundings is quite a difficult task. Mighty AI aims to interpret this data and train computer vision models. To do so, it relies on thousands of people who connect to Mighty AI and describe the visuals in the program. Google and General Motors apply the same method for their autonomous car projects.
Revolution in retail. Enter the store, do the shopping, leave the store without any payment or cashier. Get your receipt on your phone by e-mail. Thanks to Standard Co, the autonomous experience is in our lives not only for driving but also for shopping.
Some of the spots on our skin may not be innocent. If you have suspicious spots, take photos of them and upload them to SkinVision. In 30 seconds, you get a response rating the risk level as low, medium or high.
Artificial intelligence also makes insurers’ lives easier. The system developed by Tractable precisely evaluates the damage caused by a car accident or a natural disaster. Moreover, with its estimated damage calculation, Tractable provides more realistic compensation figures.
If you are a fashion brand, you cannot afford to neglect designing a multi-channel experience. Tagging products in the digital world, offering product alternatives according to the person’s preferences, presenting combination possibilities, trying on products in a virtual world… That is why Vue exists.
68% of accidents are caused by driver distraction. Moreover, this rate has increased by 14% since 2014. Besides, one out of every four accidents is caused by cell phone use. Based on all this information, Nauto aims to improve fleet management with its in-vehicle tracking system.
Huge aisles one after another, double-sided shelves in each aisle, thousands of products… Keeping the shelves always tidy and fully stocked affects not only sales but also the reputation of the company. Bossa Nova immediately became the solution partner of Walmart with the robot it developed. While robots check the shelves, salespeople can focus more on customer satisfaction.