Takeaway: From simple image classification to 3D pose recognition, computer vision has no shortage of interesting problems and challenges. With the naked eye, we can pick out the cat and the dog in a pet photo, and we can recognize the stars and the moon in Van Gogh’s Starry Night. How to give a machine this ability to “see” through algorithms is the topic of this article.

This article first introduces the concept of target detection (also called object detection), then a simplified version of the problem, localization plus classification, and its limitations. It then works through common target detection models and methods such as Faster R-CNN and SSD. Many concepts and details are involved along the way; for the full technical explanation, please refer to the accompanying e-book.

1. Common models and methods of target detection

1.1 R-CNN

Scholars have done a great deal of research on how to propose candidate regions, and the best-known method is selective search. It is not explained in detail here; interested readers can consult the papers on it. All you need to know is that it is a way of selecting Regions of Interest (ROI) from an image. Once the ROIs are available, the final detection results can be obtained by classifying them and merging the overlapping ones. This idea leads to the R-CNN method, which consists of the following steps:

  • Select potential target candidate boxes (ROIs).
  • Train a good feature extractor.
  • Train the final classifier.
  • Train a regression model for each class to fine-tune each ROI's position and size toward the actual rectangular box.
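
The last step can be made concrete with a small sketch of the box parameterization commonly used for this kind of regression: offsets of the box center scaled by the ROI size, plus log-scale width and height ratios. The function names and the example boxes below are hypothetical and chosen only for illustration; the actual regressor (the model that predicts these offsets from CNN features) is not shown.

```python
import math

def box_to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2.0, y1 + h / 2.0, w, h

def encode_targets(roi, gt):
    """Regression targets that map a proposal (roi) onto a ground-truth box (gt)."""
    px, py, pw, ph = box_to_center(roi)
    gx, gy, gw, gh = box_to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_deltas(roi, deltas):
    """Apply predicted offsets to a proposal and return the refined (x1, y1, x2, y2) box."""
    px, py, pw, ph = box_to_center(roi)
    tx, ty, tw, th = deltas
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

roi = (50.0, 60.0, 150.0, 160.0)      # hypothetical proposal from selective search
gt = (60.0, 70.0, 180.0, 170.0)       # hypothetical ground-truth box
targets = encode_targets(roi, gt)     # what the regressor is trained to predict
refined = apply_deltas(roi, targets)  # a perfect prediction recovers the ground truth
```

Normalizing by the ROI's width and height keeps the targets on a similar scale for small and large boxes, which is the usual motivation for this encoding.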

1.2 Fast R-CNN

R-CNN has three major problems, so let’s consider whether there are better solutions. The first is speed: CNN feature extraction for 2000 ROIs takes a lot of time. Is it possible to use a better approach, such as shared convolutional layers, to process all 2000 ROIs at once? Second, the CNN features are not updated when the SVM and the regression model are trained. Third, R-CNN’s pipeline is complex. Is there a way to make training end-to-end? Next, we introduce Fast R-CNN[2], proposed by Girshick in 2015, which neatly solves these problems of R-CNN.
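
To illustrate the shared-computation idea, here is a toy sketch of RoI max pooling, the mechanism Fast R-CNN uses to cut a fixed-size feature out of one shared feature map for every ROI. It is a simplification under stated assumptions: a single-channel feature map stored as a list of rows, ROIs already given in integer feature-map coordinates, and a hypothetical function name `roi_max_pool`.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the region roi = (x1, y1, x2, y2) of a 2D feature map
    (x2 and y2 exclusive) into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Row range of the i-th output bin; every bin keeps at least one row.
        r0 = y1 + (i * h) // out_size
        r1 = max(y1 + ((i + 1) * h) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            # Column range of the j-th output bin.
            c0 = x1 + (j * w) // out_size
            c1 = max(x1 + ((j + 1) * w) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# The feature map is computed once (here: a 4x6 grid of toy values) ...
feature_map = [[r * 6 + c for c in range(6)] for r in range(4)]
# ... and reused for every ROI instead of running the CNN per ROI.
rois = [(0, 0, 4, 4), (2, 1, 6, 3)]
pooled = [roi_max_pool(feature_map, roi) for roi in rois]
```

Because every ROI yields the same output size, the pooled features can be fed to the same fully connected layers regardless of the ROI's original shape.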

1.3 Faster R-CNN

Faster R-CNN[3] is a classic target detection method that appears frequently in real-world projects and competitions. In essence, Faster R-CNN adds a small network on top of Fast R-CNN that generates region proposals directly, replacing other methods (such as selective search) for obtaining ROIs. This small network is called the Region Proposal Network (RPN). The RPN is the key part of Faster R-CNN training; the rest of the process is basically the same as in Fast R-CNN.
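
The RPN scores and refines a fixed set of reference boxes, called anchors, placed at every position of the feature map. The sketch below shows one way such anchors could be laid out. The stride of 16 and the three scales and three aspect ratios are commonly cited defaults and are assumptions here, as is the function name `generate_anchors`.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image coordinates,
    one set of len(scales) * len(ratios) boxes per feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell projected back onto the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays at scale * scale.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

# A tiny 2x3 feature map yields 2 * 3 * 9 anchors.
anchors = generate_anchors(feat_h=2, feat_w=3)
```

For each anchor, the RPN then predicts an objectness score and box offsets; the highest-scoring refined anchors become the proposals passed to the Fast R-CNN stage.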

Let’s take a look at the training process of Faster R-CNN:

  • Train an RPN network, initialized from an ImageNet pre-trained model.
  • Starting from the ImageNet pre-trained model, use the proposals generated in the first step to train the Fast R-CNN network, which predicts the actual category of each object and the fine-tuned position of its rectangular box.
  • Use the network from the second step to initialize the RPN, fix the front convolutional layers, and adjust only the parameters of the RPN layers.
  • Keep the front convolutional layers fixed, and train and adjust only the FC layers of Fast R-CNN.

1.4 YOLO

The R-CNN family of algorithms first has to generate a large number of proposals, but the proposals overlap heavily, which leads to a lot of repeated work. YOLO[5] abandons proposal-based prediction and divides the input image into S×S small grid cells. It makes predictions in each cell and finally merges the results.
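
The text does not say how the per-cell results are merged. A common choice for detectors is non-maximum suppression (NMS), sketched below as an assumption rather than as the exact procedure used here: keep the highest-scoring box and drop any lower-scoring box that overlaps it too much. The threshold of 0.5 and the example boxes are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept after greedy non-maximum suppression."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two overlapping detections of one object and one separate detection.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In practice NMS is usually applied per class, so that overlapping objects of different categories do not suppress each other.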

Next, let’s look at the key steps of YOLO. YOLO places a requirement on the size of the network’s input image: the image is first scaled to the specified size (448×448) and then divided into S×S cells. Each cell predicts whether it contains an object, the position of the box containing the object, and a score for each of the C categories.
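
A minimal sketch of the grid bookkeeping described above: which cell an object falls into, and how large the prediction tensor is. The values S=7 and C=20 and the number of boxes per cell B=2 are the settings commonly reported for the original YOLO and are assumptions here; the helper names are hypothetical.

```python
def responsible_cell(box, image_size=448, S=7):
    """Return (row, col) of the grid cell containing the center of
    box = (x1, y1, x2, y2), given in pixels of the resized image."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    cell = image_size / S  # side length of one grid cell in pixels
    # Clamp so a center exactly on the right/bottom edge stays in the grid.
    col = min(int(cx // cell), S - 1)
    row = min(int(cy // cell), S - 1)
    return row, col

def output_shape(S=7, B=2, C=20):
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class scores."""
    return (S, S, B * 5 + C)

cell = responsible_cell((100, 200, 300, 400))  # box center is (200, 300)
shape = output_shape()
```

With these settings each of the 49 cells emits 30 numbers, so the whole image is handled in a single forward pass rather than once per proposal.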

1.5 SSD

SSD[4] borrows both the grid idea of YOLO and the anchor mechanism of Faster R-CNN, so it can predict quickly while still locating targets relatively accurately. Here are some features of SSD:

  • Multi-scale feature layers are used for detection. In the RPN of Faster R-CNN, anchors are generated only on the last feature layer of the backbone network, whereas SSD also generates anchors on several additional high-level feature layers.
  • The anchors generated on all feature layers in SSD are divided into positive and negative samples, and the classification scores and bbox positions are then learned directly from them.
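
The positive/negative screening in the second point is typically done by overlap with the ground-truth boxes. The sketch below is a simplified version under stated assumptions: an anchor is positive if its best IoU with any ground-truth box reaches a threshold (0.5 here, an assumed value), and negative otherwise. Refinements often used in practice, such as forcing a match for every ground-truth box or sampling only the hardest negatives, are left out, and the function name `label_anchors` is hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_threshold=0.5):
    """For each anchor, return the index of its matched ground-truth box,
    or -1 if the anchor is a negative (background) sample."""
    labels = []
    for anchor in anchors:
        best_iou, best_gt = 0.0, -1
        for g, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, g
        labels.append(best_gt if best_iou >= pos_threshold else -1)
    return labels

# Hypothetical anchors (possibly from different feature layers) and one object.
anchors = [(0, 0, 100, 100), (50, 50, 150, 150), (300, 300, 400, 400)]
gt_boxes = [(10, 10, 110, 110)]
labels = label_anchors(anchors, gt_boxes)
```

Positive anchors contribute to both the classification and the box regression loss for their matched object, while negative anchors contribute only to classification as background.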

2. Industrial application practice of target detection

The previous sections explained target detection techniques in detail. How to combine the technology with industry so that it delivers the greatest value is what we care about most.

With economic expectations stable, domestic manufacturing enterprises are accelerating their transformation and upgrading. As a technology company with a sense of purpose and mission, Alibaba hopes to help traditional enterprises transform and upgrade through technology.

In the photovoltaic industry, quality inspection has long suffered from the high level of expertise required, difficulty in recruiting workers, and a shortage of manpower. Germany, which has a high level of industrial automation, has introduced EL (electroluminescence) quality inspection technology for modules, but it only targets typical defects and can only assist manual inspection, not replace it. In China, photovoltaic enterprises have been experimenting with AI-based recognition for nearly 10 years, but automatic quality inspection of polycrystalline cells and modules is still far from the level needed for industrial production.

This article focuses on the EL quality inspection function for monocrystalline and polycrystalline modules introduced by Alibaba, which is already running on production lines with accuracy stable above 95%. In industrial vision, AI detection has a very clear advantage in reducing costs and improving efficiency. In the future, Alibaba Cloud will cooperate with more enterprises to write a new chapter in intelligent manufacturing.

Author: Xin Xuerui XXR

This article is original content from the Yunqi Community and may not be reproduced without permission.