Abstract: To explore intelligent waste classification, the 2020 Haihua AI Garbage Classification Contest, organized by the Zhongguancun Haihua Information Research Institute, Tsinghua University's Cross Information Research Institute and Biendata, attracted a large number of engineers and college students.

01 Problem Introduction

With the development of China's economy and the acceleration of urbanization, municipal solid waste (MSW) poses a growing threat to the urban environment, and disposing of domestic garbage efficiently and in an environmentally friendly way has become an urgent problem. Intelligent garbage classification is therefore key to automated garbage sorting and to improving sorting efficiency. To explore this problem, the 2020 Haihua AI Garbage Classification Contest, organized by the Zhongguancun Haihua Information Research Institute, Tsinghua University's Cross Information Research Institute and Biendata, attracted a large number of engineers and college students. The computing power provided by the Huawei NAIE platform also laid the foundation for the smooth running of the competition. The contest aims to inspire broader research enthusiasm and to uncover more valuable algorithm optimizations and innovations.

02 Data Analysis

We competed on the professional track, for which two datasets are available. The first is a single-class dataset of 80,000 garbage images, each containing only one garbage category; bounding-box annotations for the single object in each image are also provided. The multi-class dataset contains 2,998 garbage images in the training set, 1,000 in the validation set and 1,000 in the test set, and each image contains up to 20 categories.

Compared with common object detection datasets such as VOC, COCO and OID, these datasets have several distinctive characteristics:

1. With 205 categories, the single-class garbage dataset is large enough. However, its garbage images differ considerably from those in the multi-class dataset, even for the same garbage category, so training the model directly on this dataset may cause feature mismatch and performance degradation (Figure 1).

Figure 1: Mirrors in the multi-class dataset (left) and in the single-class dataset (right)

2. The multi-class dataset contains only 2,998 training images. With 125 categories, the amount of data is relatively small and the categories are imbalanced, so transfer learning from large datasets and techniques that reduce overfitting are essential.

3. In most multi-class images, the garbage is densely packed in the center of the image and varies in shape and size, so objects often occlude one another, which makes detection more difficult. The background of each image, however, is relatively clean (Figure 2).

Figure 2: An example of a multi-class dataset image

4. The dataset contains many easily confused garbage categories, which leads to inconsistent category labels. Manually checking and correcting the labels may help, but it also carries a risk of data mismatch (Figure 3).

Figure 3: The left and right images, both from the multi-class dataset, show the same class of object, but the left image is labeled as a food plastic box while the right image is labeled as a food packaging box

03 Baseline

To accomplish this task, we drew on solutions from other large-scale object detection competitions such as COCO, Objects365 and OID, and in particular on Baidu's solution for OID 2019. The baseline model is a class-aware Cascade R-CNN with ResNet200-vd as the backbone, to which FPN, DCNv2 and non-local modules are added to improve overall performance. Training uses multi-scale training (480:1440:32) and common data augmentation such as horizontal flipping.
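To make the multi-scale setting concrete, the sketch below samples a random short side from 480 to 1440 in steps of 32 and resizes the image while keeping its aspect ratio. This is only our reading of the "(480:1440:32)" notation, not the team's actual pipeline, and the `max_long_side` cap is an added assumption; ground-truth boxes would be rescaled by the same factor.

```python
import random
from PIL import Image

# Candidate short sides for multi-scale training: 480, 512, ..., 1440.
SHORT_SIDES = list(range(480, 1441, 32))

def resize_for_training(img: Image.Image, max_long_side: int = 2400):
    """Resize so the short side equals a randomly sampled value, keeping the aspect ratio.

    A minimal sketch of the "480:1440:32" schedule; max_long_side is an assumed
    cap to keep very elongated images from exhausting GPU memory.
    """
    short = random.choice(SHORT_SIDES)
    w, h = img.size
    scale = short / min(w, h)
    scale = min(scale, max_long_side / max(w, h))        # cap the long side
    new_size = (round(w * scale), round(h * scale))
    return img.resize(new_size, Image.BILINEAR), scale   # boxes must be scaled by `scale` too
```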

Transfer learning can achieve good performance on small datasets, so it should in principle work well in this competition. We therefore chose a pre-trained model trained on a mixture of COCO, Objects365 and OID. The specific effects are shown in Table 1.
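As a rough illustration of how such mixed pre-trained weights can be reused, the snippet below loads a checkpoint into a detector while skipping parameters whose shapes no longer match the garbage label set (typically the classification heads). The checkpoint path and format handling are assumptions for illustration, not the competition code.

```python
import torch
from torch import nn

def load_mixed_pretrained(model: nn.Module, ckpt_path: str) -> None:
    """Load COCO/Objects365/OID pre-trained weights into the detector.

    A hedged sketch: ckpt_path is a placeholder, and parameters whose shapes
    conflict with the current model (e.g. class-specific heads) are dropped so
    that only the transferable backbone/FPN weights are reused.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("state_dict", state)   # unwrap a common checkpoint layout (assumption)
    own = model.state_dict()
    filtered = {k: v for k, v in state.items() if k in own and own[k].shape == v.shape}
    model.load_state_dict(filtered, strict=False)
```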

Table 1: AP of the baseline with different pre-trained weights

For training, because of the huge model and the multi-scale training, the batch size on a Tesla V100 could only be set to 1; here we again thank Huawei NAIE for the computing power it provided. We used SGD with momentum as the optimizer, with a base learning rate of 0.001 and a weight decay of 0.0001, together with a cosine-annealing learning-rate schedule with warmup: the learning rate starts at 0.0001 and reaches the base learning rate after 1,000 iterations. We trained the baseline for 120K iterations, which took about 40 hours.
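The schedule described above can be written down directly: SGD with momentum and a weight decay of 0.0001, a linear warmup from 0.0001 to the base rate of 0.001 over the first 1,000 iterations, then cosine annealing over the remaining iterations. The sketch below illustrates this schedule rather than reproducing the exact training code; the momentum value of 0.9 is an assumption, since it is not stated in the text.

```python
import math
import torch

def build_optimizer_and_scheduler(model, total_iters=120_000, warmup_iters=1_000,
                                  base_lr=1e-3, warmup_lr=1e-4):
    """SGD with momentum plus linear warmup followed by cosine annealing.

    Sketch of the schedule described in the text; momentum=0.9 is an assumed
    (typical) value. Call scheduler.step() once per iteration.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=1e-4)

    def lr_lambda(it: int) -> float:
        if it < warmup_iters:                       # linear warmup: 1e-4 -> 1e-3
            alpha = it / warmup_iters
            return (warmup_lr + (base_lr - warmup_lr) * alpha) / base_lr
        progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
        return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay to 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```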

04 Data Augmentation Scheme

To reduce overfitting on such a small dataset, extensive data augmentation is required. We tried many augmentation schemes and found that RandomVerticalFlip, AutoAugment and GridMask effectively improved model performance.

Unlike natural images, garbage images look equally plausible under horizontal and vertical flips, so we used RandomVerticalFlip instead of RandomHorizontalFlip, as sketched below.
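For detection, flipping the image also requires flipping the box coordinates. Below is a minimal detection-aware vertical flip; the helper name and the [x1, y1, x2, y2] box format are our own choices for illustration, not the competition code.

```python
import numpy as np

def random_vertical_flip(image: np.ndarray, boxes: np.ndarray, p: float = 0.5):
    """Vertically flip an HWC image and its [x1, y1, x2, y2] boxes with probability p."""
    if np.random.rand() < p:
        h = image.shape[0]
        image = image[::-1, :, :].copy()            # flip rows (top <-> bottom)
        boxes = boxes.copy()
        boxes[:, [1, 3]] = h - boxes[:, [3, 1]]     # y1' = h - y2, y2' = h - y1
    return image, boxes
```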

The AutoAugment method, widely used in image classification, has also been transferred to object detection, and experiments show that it is effective on a variety of detection datasets. We tried three different AutoAugment strategies on the baseline (Table 2) and found that AutoAugment v0 worked best, so we adopted it; a simplified sketch of how such a strategy is applied follows Table 2.

Table 2: AP of models under different AutoAugment strategies
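To give a feel for how an AutoAugment-style policy is applied, the sketch below randomly picks one sub-policy and applies its operations, each with its own probability. It is a deliberately simplified, color-only illustration; the real detection policy v0 also contains geometric, box-aware operations and tuned magnitudes, so this is not that policy.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Hypothetical, colour-only sub-policies (each op leaves the boxes untouched).
SUB_POLICIES = [
    [(lambda im: ImageOps.equalize(im), 0.8),
     (lambda im: ImageEnhance.Contrast(im).enhance(1.4), 0.6)],
    [(lambda im: ImageOps.solarize(im, 128), 0.4),
     (lambda im: ImageEnhance.Color(im).enhance(1.6), 0.6)],
    [(lambda im: ImageOps.autocontrast(im), 0.6),
     (lambda im: ImageEnhance.Brightness(im).enhance(1.2), 0.4)],
]

def auto_augment_like(img: Image.Image) -> Image.Image:
    """Apply one randomly chosen sub-policy, op by op, each with its own probability."""
    for op, prob in random.choice(SUB_POLICIES):
        if random.random() < prob:
            img = op(img)
    return img
```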

Furthermore, we used the GridMask augmentation, which belongs to the same family of information-dropping methods as random erasing, hide-and-seek, Dropout and DropBlock. Experiments show that GridMask effectively reduces overfitting in object detection, and model performance improves significantly when GridMask is combined with a long training schedule. We tried GridMask with different probabilities and training lengths. As shown in Table 3, models trained with GridMask need longer training than the baseline: a probability of 0.3 is sufficient to reduce overfitting, and the longer the training the better, while probabilities of 0.5 and 0.7 can even lead to underfitting. Training for more than 300K iterations with a higher GridMask probability should therefore, in theory, improve the results further. A sketch of the GridMask operation follows Table 3.

Table 3: AP of the model with different GridMask probabilities and iteration counts
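The GridMask idea itself is simple: overlay the image with a regular grid and zero out a square hole inside every grid cell, with a random offset. The sketch below captures that idea under assumed parameter ranges; it is not the exact implementation or the hyper-parameters used in the competition.

```python
import numpy as np

def grid_mask(image: np.ndarray, d_range=(96, 224), ratio=0.5, p=0.3) -> np.ndarray:
    """Apply GridMask to an HWC image with probability p.

    d_range (grid cell size) and ratio (hole side / cell side) are assumed
    illustrative values.
    """
    if np.random.rand() >= p:
        return image
    h, w = image.shape[:2]
    d = np.random.randint(*d_range)                 # grid cell size
    hole = int(d * ratio)                           # side length of each erased square
    off_y, off_x = np.random.randint(0, d, size=2)  # random grid offset
    mask = np.ones((h, w), dtype=image.dtype)
    for y in range(off_y - d, h, d):
        for x in range(off_x - d, w, d):
            y0, y1 = max(y, 0), min(y + hole, h)
            x0, x1 = max(x, 0), min(x + hole, w)
            if y0 < y1 and x0 < x1:
                mask[y0:y1, x0:x1] = 0              # zero out the hole
    return image * mask[..., None]
```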

05 Model Fusion

During the final submission phase, we could only test on an RTX 2080 for two hours, but the model size was not limited. With these constraints in mind, we trained six identical models, differing only in their random seeds and each using all of the effective techniques above, and fused their outputs. Top-k voting NMS was used to merge the detection results of the six models, with its IoU threshold set to 0.7, consistent with the IoU threshold of the third stage of Cascade R-CNN; a sketch of this voting NMS is shown below.
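The sketch below illustrates the kind of voting NMS we mean: the detections of all six models are pooled, greedy NMS clusters them at an IoU threshold of 0.7, and each kept box is refined by a score-weighted average over the top-k overlapping candidates. The value of k and the per-class handling are assumptions for illustration, not the exact competition code.

```python
import numpy as np

def topk_voting_nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.7, k: int = 5):
    """Greedy NMS with score-weighted box voting over the top-k overlapping candidates.

    boxes: (N, 4) array of [x1, y1, x2, y2] pooled from all models (one class at a time);
    scores: (N,) confidences. k=5 is an assumed value.
    """
    order = scores.argsort()[::-1]                  # indices sorted by descending score
    kept_boxes, kept_scores = [], []
    while order.size > 0:
        i = order[0]
        # IoU of the current best box against all remaining candidates (including itself)
        xx1 = np.maximum(boxes[i, 0], boxes[order, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order, 2] - boxes[order, 0]) * (boxes[order, 3] - boxes[order, 1])
        iou = inter / (area_i + areas - inter)

        cluster = order[iou >= iou_thr]             # candidates merged into this detection
        voters = cluster[:k]                        # highest-scoring k voters
        w = scores[voters]
        kept_boxes.append((boxes[voters] * w[:, None]).sum(0) / w.sum())
        kept_scores.append(scores[i])
        order = order[iou < iou_thr]                # suppress the whole cluster
    return np.array(kept_boxes), np.array(kept_scores)
```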

06 Conclusion

In the end, we won first place on the test set with a score of 0.910. We believe the main reasons we were able to win are as follows:

(1) Drawing on the top solutions from large-scale object detection competitions

(2) Using a pre-trained model built from a mixture of COCO, Objects365 and OIDv5

(3) Applying a variety of data augmentation methods

(4) Using top-k voting NMS for model fusion

Finally, we would like to thank Huawei for providing the NAIE platform, which was a great help for training during the competition. The platform is powerful, and whenever we ran into problems the support team responded and assisted us promptly. We are also honored to have this opportunity to share our experience of the competition with you. Thank you!
