Detect objects in real time with artificial intelligence – YOLO

“You only see it once”, detecting common (about 9,000) objects and their respective, mostly high probability. YOLO is a cutting-edge artificial intelligence algorithm for detecting objects in images and videos. Speed, object learning differences and customizability are the basic advantages. Its speed can be attributed to optimized programming of a complex convolutional neural network (CNN). It takes care of changes in the object by training and learning about new changes. Having enough training data — lots of images — can speed up performance. Customizability can be achieved by manually annotating, preferably with ≥200 objects in different images, and transferring intelligence from prior learning in large data sets.

To see it in action, watch it detect objects live in the James Bond trailer.

Object detection is an important concept in the field of computer vision. It is the detection and classification of objects in image data. It has many applications, from autonomous vehicles to surveillance. It even helps us unlock our phones just by looking at them, because tapping your fingers is too taxing.

YOLO learns much the way you build up your own knowledge. You learn by detecting the objects in your environment and classifying them, and so does the algorithm. YOLO is very accurate, and it generally becomes more accurate the more images it is trained on.

It is good at learning general representations of objects. Once you have learned what a bowl is, you recognize other objects that look like bowls, in various sizes and colors, as bowls too. You don’t have to see every bowl to know that a new object is a bowl.

As its full name says, YOLO only “looks once”: a single neural network is applied to the full image. The network divides the image into regions (a grid of cells) and predicts bounding boxes and class probabilities for each region. These bounding boxes are weighted by the predicted class probabilities to produce the final classifications and bounding boxes.
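
To make the single look concrete, here is a minimal sketch of two pieces of the original YOLO formulation, assuming an S×S grid (S = 7 in the first YOLO paper). The grid cell that contains an object’s center is responsible for predicting it, and each box’s confidence is multiplied by the cell’s class probabilities to give class-specific scores. The function names and the 0.2 cut-off are illustrative assumptions; a real detector also runs non-maximum suppression afterwards to remove duplicate boxes.

```python
from typing import List, Optional, Tuple


def responsible_cell(cx: float, cy: float, s: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the S x S grid cell that contains a box center.

    cx and cy are the center coordinates normalized to [0, 1].
    """
    col = min(int(cx * s), s - 1)  # clamp so a center exactly on the right edge stays in the grid
    row = min(int(cy * s), s - 1)  # same clamp for the bottom edge
    return row, col


def class_specific_scores(box_confidence: float, class_probs: List[float]) -> List[float]:
    """Pr(class | object) * box confidence, one score per class."""
    return [box_confidence * p for p in class_probs]


def best_detection(box_confidence: float, class_probs: List[float],
                   threshold: float = 0.2) -> Optional[Tuple[int, float]]:
    """Return (class_index, score) for the highest-scoring class, or None if it is below threshold."""
    scores = class_specific_scores(box_confidence, class_probs)
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None  # too uncertain: no detection reported for this box
    return best, scores[best]
```

For example, a box with confidence 0.8 in a cell whose class probabilities are [0.1, 0.7, 0.2] is reported as class 1 with a score of about 0.56, while the same cell with box confidence 0.1 yields no detection.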

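In its YOLOv2 version, the algorithm reaches a mean average precision (mAP) of 78.6% at 40 frames per second (FPS) on the PASCAL VOC 2007 test set. Precision is true positives / (true positives + false positives); mAP averages precision over recall levels and over object classes. YOLO generalizes well, partly because it learns from the context around the tagged object. For each box, it also predicts the probability that an object is present, and it draws that box around the object it detects. The YOLO9000 version can even detect object classes for which it has no labeled detection training data.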
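
Precision and mean average precision are easy to confuse, so here is a minimal sketch of both. It assumes each detection has already been matched against the ground truth (for example with an IOU threshold of 0.5) and labeled as a true or false positive; the function names are hypothetical. It uses all-point interpolation, whereas the original VOC 2007 protocol used an 11-point variant, so results can differ slightly from published numbers.

```python
from typing import List, Tuple


def precision(tp: int, fp: int) -> float:
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """All-point interpolated AP for a single class.

    detections: (confidence, is_true_positive) pairs, already matched to ground truth.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth object")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)  # most confident first
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Interpolate: each precision becomes the best precision at this rank or any later one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve: precision times the recall gained at each step.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

With two ground-truth objects and detections ranked as correct, wrong, correct, precision at the end is 2/3 but AP is 5/6, because AP also rewards ranking the correct detection first.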

As the number of anchor boxes (box priors) increases, the average IOU gets higher. IOU, or intersection over union, measures how well a ground-truth box and a predicted or prior box overlap. See the figure below: the Y-axis is the average IOU, and the X-axis is the number of boxes.
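
IOU is simple to compute. The sketch below assumes boxes are given as `(x_min, y_min, x_max, y_max)`, and that the figure’s “boxes” are anchor box priors compared with ground-truth box shapes, as in YOLOv2’s dimension clustering, where only width and height matter. `average_best_iou` is a hypothetical helper that scores each ground-truth shape against its best-matching anchor, so adding an anchor to an existing set can never lower the average.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Shape = Tuple[float, float]              # (width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shape_iou(s: Shape, t: Shape) -> float:
    """IOU of two boxes that share the same center, so only width and height matter."""
    inter = min(s[0], t[0]) * min(s[1], t[1])
    union = s[0] * s[1] + t[0] * t[1] - inter
    return inter / union


def average_best_iou(shapes: Sequence[Shape], anchors: Sequence[Shape]) -> float:
    """Average, over ground-truth shapes, of the IOU with the best-matching anchor."""
    return sum(max(shape_iou(s, a) for a in anchors) for s in shapes) / len(shapes)
```

For example, boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a 1×1 square, so their IOU is 1 / (4 + 4 - 1) = 1/7.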

YOLO was trained on large, well-labeled datasets. The original YOLO network needs 8.52 billion operations for a single forward pass of an image, yet it still runs quickly. YOLO9000 also uses label hierarchies: for example, “Norfolk terrier” and “Yorkshire terrier” are both hyponyms (more specific kinds) of “terrier”, which is a type of “hunting dog”, which is a type of “dog”. Most other competitive models treat labels as mutually exclusive.
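
The hierarchy idea fits in a few lines. In YOLO9000’s WordTree, the model predicts the probability of each label given its parent, using a softmax over sibling labels rather than over all labels, and the absolute probability of a label is the product of the conditional probabilities along its path to the root. The tree fragment, the probability values, and the helper names below are made up for illustration.

```python
import math
from typing import Dict, Optional

# Hypothetical fragment of a WordNet-style hierarchy: label -> parent label.
PARENT: Dict[str, Optional[str]] = {
    "dog": None,  # treated as the root in this sketch
    "hunting dog": "dog",
    "terrier": "hunting dog",
    "Norfolk terrier": "terrier",
    "Yorkshire terrier": "terrier",
}


def sibling_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Softmax over one group of co-hyponyms (labels that share a parent)."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def absolute_probability(label: str, cond_prob: Dict[str, float]) -> float:
    """Multiply Pr(node | parent) along the path from label up to the root."""
    p = 1.0
    node: Optional[str] = label
    while node is not None:
        p *= cond_prob[node]
        node = PARENT[node]
    return p


# Made-up model outputs: Pr(label | parent) for each node in the fragment.
cond = {"dog": 1.0, "hunting dog": 0.9, "terrier": 0.8}
cond.update(sibling_softmax({"Norfolk terrier": 2.0, "Yorkshire terrier": 1.0}))
```

Because the two terrier breeds share one softmax, their absolute probabilities add up to the probability of “terrier” itself, instead of competing with every other label in the dataset.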

It is customizable and can detect custom objects with high precision. A customized YOLO model was tested on Microsoft’s HoloLens and performed well, as you can see below.

Custom object detection

YOLO trains on tagged images and augments that data by manipulating the images: cropping, changing saturation, rotating, and so on. It looks at the whole image, and therefore at the environment around the object. The algorithm generalizes well because it captures the basic features and principles of objects.
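
For detection, augmenting an image also means updating its boxes. The sketch below assumes a tiny image stored as nested lists of RGB floats in [0, 1] and boxes given as `(class_id, x_min, y_min, x_max, y_max)` in pixels; it shows cropping, saturation scaling, and a 90° rotation using only the standard library. Real training pipelines use image libraries and randomly chosen parameters, and the function names here are hypothetical.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[float, float, float]            # RGB values in [0, 1]
Image = List[List[Pixel]]                     # list of rows
Box = Tuple[int, float, float, float, float]  # (class_id, x_min, y_min, x_max, y_max) in pixels


def crop(image: Image, boxes: List[Box], x0: int, y0: int, w: int, h: int) -> Tuple[Image, List[Box]]:
    """Cut a w x h window at (x0, y0); shift and clip boxes, dropping any that fall outside."""
    cropped = [row[x0:x0 + w] for row in image[y0:y0 + h]]
    kept: List[Box] = []
    for cls, bx0, by0, bx1, by1 in boxes:
        nx0, ny0 = max(bx0 - x0, 0.0), max(by0 - y0, 0.0)
        nx1, ny1 = min(bx1 - x0, float(w)), min(by1 - y0, float(h))
        if nx1 > nx0 and ny1 > ny0:  # keep only boxes that still have area
            kept.append((cls, nx0, ny0, nx1, ny1))
    return cropped, kept


def scale_saturation(image: Image, factor: float) -> Image:
    """Multiply each pixel's HSV saturation by factor (capped at 1); boxes are unchanged."""
    out: Image = []
    for row in image:
        new_row: List[Pixel] = []
        for r, g, b in row:
            hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
            new_row.append(colorsys.hsv_to_rgb(hue, min(sat * factor, 1.0), val))
        out.append(new_row)
    return out


def rotate90_clockwise(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Rotate the image 90 degrees clockwise; a point (x, y) maps to (height - y, x)."""
    height = len(image)
    rotated = [list(col) for col in zip(*image[::-1])]
    new_boxes = [(cls, height - by1, bx0, height - by0, bx1) for cls, bx0, by0, bx1, by1 in boxes]
    return rotated, new_boxes
```

Saturation changes leave the boxes alone, while crops and rotations must move them; forgetting that step silently teaches the network wrong locations.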

I encourage you to read this paper for more details. This article provides an in-depth explanation of YOLO and a programming tutorial. Browse the code here.


Real-time, live object detection with artificial intelligence – YOLO was first published in Nerd For Tech magazine.