CVPR 2018 | Cascade R-CNN: a high-precision object detector

Original article by Panzer.

Address: arxiv.org/abs/1712.00…

Code address: github.com/zhaoweicai/…

Background

General object detection is one of the most popular problems in computer vision. Although CNN-based detection algorithms have made rapid progress in accuracy over traditional methods in recent years, detection still lags well behind image classification. Early detection methods were dominated by the Viola-Jones (VJ) framework, whose core idea is to enumerate a large number of sliding windows over the image, extract features from each window, score the windows with a cascade classifier, and keep the higher-scoring windows as the final detections. After Ross Girshick (RBG) brought R-CNN to the field, object detection was reformulated as a classification + regression problem. This wave is led by two families of algorithms: two-stage methods represented by Faster R-CNN and single-stage methods represented by SSD. Faster R-CNN is the more accurate of the two, while SSD is faster.

The author of this paper is a leading figure in object detection. As early as ECCV 2016 he published MS-CNN [1], a well-known detector that mainly addresses the multi-scale problem. This paper focuses on how the IoU threshold is chosen when training a detector. The author thinks the problem through carefully and borrows the cascade idea from traditional methods, delivering strong results in both theory and experiment.

The main content

The basic problem

As noted above, since R-CNN the detection task has been formulated as classification + regression. Detection is therefore a classification problem at heart, but it differs from ordinary classification in that every candidate box in the image has to be scored. During training, positive and negative samples are separated by an IoU threshold, so this threshold is a hyperparameter that needs careful selection. On the one hand, the higher the IoU threshold, the closer the positive samples are to the target, and the more accurately the trained detector localizes. But raising the threshold causes two problems: first, positive samples become too few and training overfits; second, training and testing then operate at different IoU levels, and this mismatch degrades evaluation performance. On the other hand, a lower IoU threshold yields more abundant positive samples, which helps training, but it inevitably produces many false positives at test time, the “close but not correct” boxes mentioned in the paper. This analysis is supported by the following set of experiments from the paper:
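To make the trade-off concrete, here is a minimal sketch of IoU-based labeling. The box format `(x1, y1, x2, y2)`, the helper names `iou` and `label_proposals`, and the example boxes are all assumptions made for illustration; they are not taken from the paper or its code.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def label_proposals(proposals, gt_box, iou_threshold):
    """Label each proposal 1 (positive) or 0 (negative) by its IoU with gt_box."""
    return [1 if iou(p, gt_box) >= iou_threshold else 0 for p in proposals]


# Hypothetical ground-truth box and proposals shifted further and further right.
gt = (0.0, 0.0, 100.0, 100.0)
proposals = [
    (0.0, 0.0, 100.0, 100.0),   # IoU = 1.0
    (10.0, 0.0, 110.0, 100.0),  # IoU = 9000 / 11000
    (20.0, 0.0, 120.0, 100.0),  # IoU = 8000 / 12000
    (30.0, 0.0, 130.0, 100.0),  # IoU = 7000 / 13000
    (50.0, 0.0, 150.0, 100.0),  # IoU = 5000 / 15000
]

for u in (0.5, 0.6, 0.7):
    n_pos = sum(label_proposals(proposals, gt, u))
    print(f"threshold {u}: {n_pos} positives")
```

With these toy boxes the number of positives drops from 4 to 3 to 2 as the threshold goes from 0.5 to 0.6 to 0.7. The positives that remain are closer to the target, but there are fewer of them.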

Figure 1(c) shows how the IoU between a candidate box and its matched ground-truth box changes after one regression step. The horizontal axis is the IoU before regression and the vertical axis is the IoU after regression. Curves in different colors correspond to detectors trained with different IoU thresholds. In general the IoU of a candidate box improves after regression, but the curves differ: for an input IoU between 0.55 and 0.6, the regressor trained with a threshold of 0.5 gives the best output (blue line); for an input IoU between 0.6 and 0.75, the one trained with 0.6 is best (green line); and for an input IoU above 0.75, the one trained with 0.7 is best (red line). These results suggest that to obtain a detector with higher localization precision (the higher the IoU, the better), a larger IoU threshold must be used in training. However, Figure 1(d) shows that the detector trained with a threshold of 0.7 (red line) actually has the worst AP. Only when the evaluation IoU threshold is above 0.85 does it do somewhat better than the blue line, and it still trails the green line. This confirms the earlier analysis: a detector trained with an IoU threshold of 0.7 has too few positive samples, their diversity is insufficient, training tends to overfit, and performance on the validation set is poor. The author then asks: is there a way to train the detector with a higher IoU threshold while keeping the positive samples sufficiently diverse? Building on this analysis, we discuss the Cascade R-CNN proposed by the author in detail below. Its core idea is “divide and conquer”.

Model structure

Figure 3 gives a visual comparison of this approach with related work. (d) is the basic framework of Cascade R-CNN proposed in this paper. (a) is the classical Faster R-CNN framework, which is also the baseline of this paper. (b) Iterative BBox is structurally similar to Cascade R-CNN; the difference is that the cascade is used only at test time to regress the box several times, so every head in the network is the same “H1”, that is, a single IoU threshold is used in training. (c) Integral Loss attaches several detectors in parallel to the RoI detection network; these detectors are independent of each other, somewhat like a “multi-expert” ensemble.

Although the Iterative BBox in Figure 3(b) uses a cascade structure to regress the box several times, training a single detector with a single IoU threshold leads to the following problem. After the candidate boxes pass through the detector trained at a threshold of 0.5, their distribution changes. As shown in the figure below, after each detector stage the samples concentrate more and more around the center of the distribution, that is, they match the real objects better and better. Continuing to train with the same IoU threshold at this point is clearly suboptimal: unless the IoU threshold is raised to remove the red outliers, they introduce a lot of noise. The IoU threshold therefore has to be raised to preserve sample quality.

On the other hand, simply raising the IoU threshold raises a question: does it reduce the number of positive samples? If the initial candidate boxes were still used, the answer would be yes. But this paper resamples from the candidate boxes output by the previous regression, so the problem does not arise, as the following figure shows: after each stage the IoU of the positive samples keeps improving, so the IoU threshold can be raised at each stage while still obtaining enough positive samples.
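The resampling argument can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: the Gaussian noise model for the proposals, the box size, and in particular `toy_regressor`, a hypothetical stand-in that simply moves each box halfway toward the ground truth instead of being a learned regressor. It is a sketch of the idea, not the paper's experiment.

```python
import numpy as np


def iou_with_gt(boxes, gt):
    """IoU between each row of boxes (N, 4) and a single ground-truth box (4,)."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area + gt_area - inter)


def toy_regressor(boxes, gt, step=0.5):
    """Hypothetical regressor: move every coordinate a fraction `step` toward gt."""
    return boxes + step * (gt - boxes)


rng = np.random.default_rng(0)
gt = np.array([100.0, 100.0, 200.0, 200.0])
# Assumed proposal distribution: ground truth plus Gaussian coordinate noise.
proposals = gt + rng.normal(scale=15.0, size=(2000, 4))

thresholds = (0.5, 0.6, 0.7)

# (a) Raise the threshold but keep labeling the ORIGINAL proposals.
fixed = [int((iou_with_gt(proposals, gt) >= u).sum()) for u in thresholds]

# (b) Cascade: each stage labels the boxes regressed by the previous stage.
cascaded = []
boxes = proposals
for u in thresholds:
    cascaded.append(int((iou_with_gt(boxes, gt) >= u).sum()))
    boxes = toy_regressor(boxes, gt)  # input to the next stage

print("positives on fixed proposals    :", fixed)
print("positives with cascaded sampling:", cascaded)
```

In this toy setting the count on the fixed proposals shrinks as the threshold rises, while the cascaded count stays high because each stage sees better-localized boxes. This matches the qualitative claim in the text.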

So far we have covered the core of Cascade R-CNN. Training still uses the usual classification + regression loss, which is not described further here.
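For reference, the per-stage objective can be sketched as below. The notation is assumed here for illustration: $x^t$ is a candidate at stage $t$, $g$ its matched ground-truth object with class $g_y$, $u^t$ the stage's IoU threshold, $h_t$ and $f_t$ the stage's classifier and regressor, $b^t$ the box fed to stage $t$, and $\lambda$ a trade-off weight. It is written in the spirit of the paper's formulation; consult the paper for the authoritative form.

```latex
% Label of a candidate x at a stage with IoU threshold u (0 denotes background)
y =
\begin{cases}
  g_y, & \mathrm{IoU}(x, g) \ge u \\
  0,   & \text{otherwise}
\end{cases}

% Loss at stage t: classification plus regression on positives only
L(x^t, g) = L_{cls}\bigl(h_t(x^t),\, y^t\bigr)
          + \lambda \,[\,y^t \ge 1\,]\; L_{loc}\bigl(f_t(x^t, b^t),\, g\bigr),
\qquad
b^t = f_{t-1}\bigl(x^{t-1}, b^{t-1}\bigr)
```

The only cascade-specific parts are the recursion $b^t = f_{t-1}(x^{t-1}, b^{t-1})$, which feeds each stage the boxes regressed by the previous one, and the stage-dependent threshold $u^t$ used to compute $y^t$.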

Experimental analysis

The experimental details

(1) The validation experiments were conducted on MS-COCO 2017. All detectors were implemented in the Caffe framework to keep the comparison fair.

(2) The author uses four stages: one RPN followed by three detection stages. The IoU thresholds of the detection stages increase progressively as 0.5/0.6/0.7; boxes whose IoU exceeds the stage threshold are taken as positive samples, and all the rest are negative samples.

The experimental results

First, let us look at the improvement Cascade brings to different detectors. The author selects three two-stage detectors: Faster R-CNN, R-FCN and FPN. As the following table shows, without any tricks Cascade yields a steady gain of 3-4 points across detectors and backbone networks, and the higher the evaluation IoU threshold, the more obvious the improvement. The results in this table are quite convincing.

In addition, the paper presents a large number of ablation experiments to verify the effectiveness of Cascade (see figure below).

Table 1 shows Cascade’s advantages over Iterative BBox and Integral Loss, especially on AP90, which indicates that raising the IoU threshold is necessary when training the cascade detector.

Table 2 shows the necessity of combining the classification scores of multiple classifiers. In terms of AP, stage 2 improves on stage 1 by 3 points, while stage 3 shows no advantage over stage 2. Combining the classification scores of the classifiers, however, raises AP to 38.9.

Table 3 shows the necessity of raising the IoU threshold and of using different regression statistics at each stage. Comparing row 2 with row 4 shows that the former matters more than the latter, which again confirms the need to train the cascade detector with increasing IoU thresholds.

Table 4 shows that performance saturates after a few stages: stage 4 brings no further improvement, stage 3 reaches the highest AP of 38.9, and stage 2 brings the largest gain. Two stages are therefore sufficient for practical applications.
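The combination of classification scores discussed for Table 2 can be sketched as follows. The text only says that the scores are combined; averaging the per-class probabilities of the stage classifiers is an assumption made here, and the function names and example logits are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over class logits of shape (num_boxes, num_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def ensemble_scores(stage_logits):
    """Average class probabilities over stages (assumed combination rule).

    stage_logits: list of arrays, one per stage classifier, all computed on
    the same final boxes and each of shape (num_boxes, num_classes).
    """
    probs = np.stack([softmax(l) for l in stage_logits], axis=0)
    return probs.mean(axis=0)


# Hypothetical logits from three stage classifiers for two boxes, three classes.
stage_logits = [
    np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]),
    np.array([[1.8, 0.7, 0.2], [0.1, 1.9, 0.4]]),
    np.array([[2.2, 0.3, 0.1], [0.3, 1.2, 0.9]]),
]

scores = ensemble_scores(stage_logits)
print(scores.shape)           # (2, 3)
print(scores.argmax(axis=1))  # predicted class index per box
```

Because each row of every softmax output sums to one, the averaged scores remain valid probability distributions over the classes.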

Conclusion and outlook

Contributions of this paper

(1) The selection of the IoU threshold in object detection is studied in depth, and its influence on detector performance is verified through extensive experimental analysis;

(2) Based on this analysis, a cascade version of Faster R-CNN, the Cascade R-CNN detection algorithm, is proposed. It shows excellent performance on the MS-COCO general object detection dataset without using any tricks.

Personal opinion

(1) This paper examines the choice of IoU threshold, a very important problem in object detection that had long been overlooked. It is a very enlightening work. The author combines the cascade idea from traditional methods with the current mainstream Faster R-CNN framework and lifts the performance of two-stage methods on existing datasets to a new level. Beyond the extensive experimental analysis in the paper, when we re-examine the two mainstream detection frameworks (Faster R-CNN and SSD), a question worth thinking about is why Faster R-CNN is more accurate than SSD. I believe one key point is that Faster R-CNN predicts the candidate box twice, once in the RPN and once in the detector that follows. This paper goes a step further: it stacks several cascaded detector modules behind the RPN and trains them with different IoU thresholds, further improving the accuracy of Faster R-CNN. The next question is where the ceiling of this improvement lies. Table 4 shows that the performance of Cascade R-CNN already saturates at stage 3, which still falls short of our expectations. How to raise the upper limit of the cascade further is a problem worth exploring.

(2) The experiments in this paper reveal two keys to the success of Cascade R-CNN: first, the detectors are cascaded rather than parallel; second, the cascade is trained with increasing IoU thresholds. However, the experiments were carried out within the Faster R-CNN family of two-stage frameworks. Since speed matters more in practical application scenarios, whether the cascade idea can be transferred to the SSD framework to improve its accuracy is also a question worth exploring.

(3) A more noteworthy contribution of this paper is that training the cascade detector with increasing IoU thresholds gives the detector higher localization accuracy. Under stricter IoU evaluation thresholds, the improvement brought by Cascade R-CNN is more pronounced. Cascade R-CNN is a solid step towards a high-precision object detector, and that is exactly what we want to see in practice.

Reference

[1] A unified multi-scale deep convolutional neural network for fast object detection. ECCV (2016)

This is an original article from the Extrememart platform. For more technical sharing and project cooperation, follow the Extrememart platform on WeChat.
