Abstract: The classification and treatment of domestic garbage is currently a focus for society as a whole. Simple and efficient classification and detection of domestic garbage matters greatly for its transportation and treatment, and the application of AI technology to garbage classification has accordingly become a focus of attention.

AI has become synonymous with the intelligence of this era; it can be found in almost every field, including garbage sorting and supervision.

However, garbage is an extreme form of commodity, and its situation is special. Current technology can issue garbage-classification alerts based on what is visually observable, such as judging whether garbage has been sorted. Whether visual detection and classification can be carried out directly, and with acceptable accuracy, still needs more data and experiments to judge. To answer these questions, we can look at how participants in the Haihua Waste Sorting Challenge used technology to change the world.

The Haihua Garbage Classification Challenge data includes a single-category garbage dataset and a multi-category garbage dataset. The single-category dataset contains 80,000 images of household garbage, each containing exactly one garbage instance. The multi-category dataset includes 4,998 images, of which 2,998 are used as the training set; the A and B leaderboards each use 1,000 test images, and each multi-category image contains up to 20 garbage instances. We describe the two datasets separately below.

I. Multi-category garbage

Figure 1 Category distribution of the multi-category garbage data

As shown in Figure 1, the multi-category data covers 204 garbage categories, but the distribution over these categories is very uneven: some categories have very few samples, or none at all.

Figure 2 Visualization of the multi-category garbage data

Figure 2 shows two images from the training set. Garbage targets are mainly concentrated in the central area of the image, with a high degree of overlap. In addition, the same target often appears in multiple images at different angles.

From the observations and statistics in Figures 1 and 2, we can draw several conclusions:

(1) Since an object often appears in multiple images, overfitting to these objects is very effective, which is why AP scores above 90 are possible in this competition. It is therefore worth considering a backbone with more parameters, such as ResNeXt101.

(2) The images are shot from overhead, so both horizontal and vertical flips are effective augmentations.

(3) Although the categories are very imbalanced, a target that recurs across images can be detected reliably once it has been seen a few times during training. Category imbalance therefore mainly affects objects with little data, so only those objects need to be expanded, chiefly ink cartridges, snails, plum pits, shellfish, etc.

(4) Given the high overlap, mixup and similar methods can be used to artificially create highly overlapping targets for training.

Table 1 Data statistics

In addition to these macro, image-level statistics, we also analyzed the targets in the dataset in detail. Table 1 shows statistics of target size and aspect ratio. First, object sizes are bucketed according to the COCO convention: objects larger than 96 pixels count as large, and 75% of the targets are large objects, which means methods aimed at improving small-object detection are largely ineffective here. Second, few objects have extreme aspect ratios, which gives useful guidance for tuning the anchors.
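For reference, the following is a minimal sketch of how such size and aspect-ratio statistics can be computed from COCO-format annotations; the annotation path is a placeholder.

```python
import json
from collections import Counter

# Placeholder path to the COCO-format annotation file of the training set.
with open('annotations/train.json') as f:
    anns = json.load(f)['annotations']

sizes, ratios = Counter(), Counter()
for a in anns:
    w, h = a['bbox'][2], a['bbox'][3]
    # COCO size buckets by area: small < 32^2 <= medium < 96^2 <= large.
    area = w * h
    if area < 32 ** 2:
        sizes['small'] += 1
    elif area < 96 ** 2:
        sizes['medium'] += 1
    else:
        sizes['large'] += 1
    # Bucket aspect ratios to see how many extreme ones exist.
    r = max(w / h, h / w)
    ratios['>3:1' if r > 3 else '<=3:1'] += 1

print(sizes, ratios)
```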

II. Single-category garbage

The single-category dataset contains 80,000 images, each with one target. As the two figures on the left of Figure 3 show, the targets in single-category images are all large. There are two ways to use the single-category data: one is to expand the categories with few samples; the other is to train on the single-category dataset to obtain a better pre-trained model.

Figure 3 Data comparison

When expanding the data, we found that, compared with the multi-category garbage, targets with the same label are not actually the same: the single-category "crayfish" images contain crayfish, but the multi-category "crayfish" label is actually attached to milk cartons, and "diode" to plastic tubes. Because the data is not homogeneous, direct expansion with the single-category set does not work; we tried this scheme, but the accuracy stayed the same.

For the pre-trained model, since the targets are large, we spliced the images into 4×4 mosaics to reduce the data volume and increase the number of targets per image, which achieved a certain effect on its own. But combined with the other enhancements it brought no further gain, so we abandoned it.
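A minimal sketch of this 4×4 splicing, assuming equally sized, already-resized single-category images; the helper name is ours, not from the original code.

```python
import numpy as np

def splice_4x4(images, boxes_per_image):
    """Tile 16 equally sized images into one 4x4 mosaic.

    `images` is a list of 16 HxWx3 arrays and `boxes_per_image` the matching
    lists of [x1, y1, x2, y2] boxes; boxes are shifted into mosaic coordinates.
    """
    h, w = images[0].shape[:2]
    mosaic = np.zeros((4 * h, 4 * w, 3), dtype=images[0].dtype)
    all_boxes = []
    for i, (img, boxes) in enumerate(zip(images, boxes_per_image)):
        r, c = divmod(i, 4)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
        for x1, y1, x2, y2 in boxes:
            all_boxes.append([x1 + c * w, y1 + r * h, x2 + c * w, y2 + r * h])
    return mosaic, all_boxes
```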

III. Model scheme


1. Baseline

Figure 4 Baseline scheme

Our baseline uses the Cascade R-CNN implemented in MMDetection, with a ResNeXt101-64x4d + DCN backbone. Because the competition adopts COCO's AP@[.50:.95] metric, Cascade R-CNN, which regresses boxes at a series of increasing IoU thresholds, achieves very good results. In addition, a larger backbone tends to get better results on this dataset.
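As an illustration, the key choices could be expressed as MMDetection-style config overrides roughly like the following (names follow mmdet 2.x conventions; this is a sketch, not our exact config, and the base config name is a placeholder):

```python
# Hypothetical MMDetection-style config sketch: Cascade R-CNN with a
# ResNeXt101-64x4d backbone and deformable convolutions in stages 2-4.
_base_ = './cascade_rcnn_x101_64x4d_fpn_1x_coco.py'  # stock base config

model = dict(
    backbone=dict(
        dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True)))
```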

2. Parameter tuning

At the beginning of the competition, we took 2,500 images from the training set for training and kept 498 for local validation, then tuned parameters on this basis. Because targets overlap heavily, soft-NMS with a score threshold of 0.001, max_per_img=300, and flip testing work well, improving the score by about 0.02 over not using them. Due to GPU memory limits, a region of (0.8W, 0.8H) was randomly cropped from each image, then the short edge was randomly resized within [640, 960] with the long edge capped at 1800 for multi-scale training. At test time images were moderately enlarged, with the short edge set to 1200. This trains to 88.3% accuracy, about 88.6% with OHEM, and adding the 498 local-validation images back into training improves it by roughly another 0.5%, to 89.2%.
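In MMDetection 2.x syntax, these test-time settings and the cropped multi-scale training pipeline would look roughly like this (a sketch under those version assumptions, not the exact config):

```python
# Test-time settings: soft-NMS, low score threshold, many boxes per image.
test_cfg = dict(
    rcnn=dict(
        score_thr=0.001,
        nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.001),
        max_per_img=300))

# Training pipeline: random 0.8W x 0.8H crop, then multi-scale resize with
# the short edge in [640, 960] and the long edge capped at 1800.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomCrop', crop_size=(0.8, 0.8), crop_type='relative'),
    dict(type='Resize',
         img_scale=[(1800, 640), (1800, 960)],
         multiscale_mode='range',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
]
```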

For categories with few samples, we supplemented the multi-category training set with shellfish, snail, diode, and plum-pit data, and annotated some ambiguous targets to improve recall, labeling somewhat more than 100 targets in total; this lifts the A-leaderboard score to about 90%.

As shown in Figure 5, we adjusted the anchor ratios from [0.5, 1.0, 2.0] to [2.0/3.0, 1.0, 1.5]. In addition, to improve the detection of large objects, we adjusted the FPN level-division scale from 56 to 70, which effectively increases the number of targets assigned to each FPN level, and changed the anchor scale from 8 to 12 to match these large objects. A config sketch follows Figure 5.

Figure 5 Anchor modification
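Expressed as MMDetection-style overrides (mmdet 2.x names; we read "56 to 70" as the finest_scale parameter that controls RoI-to-FPN-level assignment, which is our assumption):

```python
model = dict(
    rpn_head=dict(
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[12],                   # anchor scale 8 -> 12
            ratios=[2.0 / 3.0, 1.0, 1.5],  # was [0.5, 1.0, 2.0]
            strides=[4, 8, 16, 32, 64])),
    roi_head=dict(
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32],
            finest_scale=70)))             # FPN level division, 56 -> 70
```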

As shown in Figure 6, after this adjustment the distribution of target counts across FPN levels is closer to a normal distribution, which we believe helps detection. Judging from the number of convolutions in each ResNet stage, the stages feeding the middle FPN levels have the most parameters and should handle more targets, while the backbone stages feeding the outer FPN levels have fewer parameters and should not be assigned too many.

Figure 6 Change in the distribution of target counts across FPN levels

For image augmentation, we added online mixup and extended training to 24 epochs, reaching 91.2%~91.3%; with only 12 epochs there was no improvement. Our mixup setup is simple: the two images are fused at a ratio of 0.5, so the losses need no reweighting. A minimal sketch follows Figure 7.

Figure 7 Mixup effect diagram
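A minimal sketch of this fixed-ratio mixup for detection; the function name and array conventions are ours.

```python
import numpy as np

def mixup_detection(img1, boxes1, labels1, img2, boxes2, labels2):
    """Online mixup with a fixed 0.5 blend, as described above.

    Images are blended pixel-wise at ratio 0.5 and the two sets of boxes are
    simply concatenated; with equal weights no loss reweighting is needed.
    """
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    canvas = np.zeros((h, w, 3), dtype=np.float32)
    canvas[:img1.shape[0], :img1.shape[1]] += 0.5 * img1
    canvas[:img2.shape[0], :img2.shape[1]] += 0.5 * img2
    boxes = np.concatenate([boxes1, boxes2], axis=0)
    labels = np.concatenate([labels1, labels2], axis=0)
    return canvas.astype(np.uint8), boxes, labels
```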

3. Model fusion

During earlier tests we assumed the 1080 Ti and the 2080 would run at similar speeds; each test on the 1080 Ti took about 40 minutes, so we only used about 3 models, which was a disadvantage. On the B leaderboard we found the 2080 to be much faster: a single model plus flip testing took only 25 minutes, so more models could have pushed the score further. We used three models: Cascade R-CNN with ResNeXt101-32x4d + GCB + DCN, Cascade R-CNN with ResNeXt101-64x4d + DCN, and Guided Anchoring Cascade R-CNN with ResNeXt101-64x4d + DCN. As for the fusion method, different choices made little difference; we used Weighted Boxes Fusion [3] with the fusion threshold set to 0.8. A usage sketch follows Figure 8.

Figure 8 WBF effect diagram
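With the ensemble-boxes package that accompanies [3] (pip install ensemble-boxes), fusing the three models' outputs takes a few lines; the per-model variables below are placeholders, and mapping the 0.8 fusion threshold to iou_thr is our assumption.

```python
from ensemble_boxes import weighted_boxes_fusion

# boxes_m* are per-model lists of [x1, y1, x2, y2] normalized to [0, 1];
# scores_m* and labels_m* are the matching confidence and class lists.
boxes, scores, labels = weighted_boxes_fusion(
    [boxes_m1, boxes_m2, boxes_m3],
    [scores_m1, scores_m2, scores_m3],
    [labels_m1, labels_m2, labels_m3],
    iou_thr=0.8,          # the fusion threshold used above
    skip_box_thr=0.001)   # drop near-zero-confidence boxes first
```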

4. Parameter effect

Table 2 Parameter Settings

Figure 9 Accuracy changes on the A leaderboard

IV. NAIE platform deployment and use

1. Platform understanding

In our understanding, the NAIE platform consists of three parts: a local debugging area, cloud storage, and a cloud training area. Once you understand the role of each part, you can get started quickly.

The local debugging area is based on VS Code and is connected to a server without GPUs. You can perform the initial deployment and debugging of the environment from the command line, just as on an ordinary Linux server.

The cloud storage area mainly holds large data files and pre-trained models; large files such as pre-trained models cannot be transferred directly from the local debugging area to the training area.

The model training area invokes GPUs to run training, and the trained model weights are copied to cloud storage; only models saved to the cloud can be downloaded.

2. Model deployment

This section uses the deployment of MMDetection as an example.

  • 1) Code upload

Upload code by right-clicking and choosing NAIE Upload. Uploads are limited to 100 MB, so it is recommended to delete the pre-trained models and other irrelevant files, keeping only the core code.

  • 2) Environment deployment

Environment deployment requires a requirements.txt file in the local code area that specifies the required Python libraries and their version numbers.
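For example, a hypothetical requirements.txt for this kind of setup might look like the following; pin versions to whatever matches your local debugging environment.

```
torch==1.4.0
torchvision==0.5.0
mmcv==0.4.3
Cython==0.29.15
ensemble-boxes==1.0.0
```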

  • 3) Model running

The platform does not support running .sh files, so you need to write a Python file, model.py, that uses os.system() to mimic the command line.

In addition, model.py calls the moxing package to save the trained model to cloud storage.
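A minimal sketch of such a model.py, assuming the standard MMDetection repo layout; the config name and bucket path are placeholders.

```python
# model.py: entry point for the cloud training area.
import os
import moxing as mox  # the platform's file-transfer package

# .sh files are not supported, so mimic the usual command line instead.
os.system('python tools/train.py configs/cascade_rcnn_x101_64x4d_fpn.py')

# Copy the trained checkpoints to cloud storage; only files saved to the
# cloud can be downloaded afterwards. The bucket path is a placeholder.
mox.file.copy_parallel('work_dirs', 's3://your-bucket/work_dirs')
```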

In the model training area, select model.py and the required GPU specification to start training.

  • 4) Additional supplements

NAIE Upload cannot upload large files directly, so you can write a program, debug.py, in the local debugging area that calls wget to download the file and then uploads it to cloud storage via the moxing package. During training, model.py can likewise use moxing to pull the file down to the training server.
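A sketch of such a debug.py; the download URL and bucket path are placeholders.

```python
# debug.py: stage a large file (e.g. a pre-trained backbone) via the cloud.
import os
import moxing as mox

# Download on the debugging server, then push to cloud storage.
os.system('wget -O pretrained.pth https://example.com/resnext101_64x4d.pth')
mox.file.copy('pretrained.pth', 's3://your-bucket/pretrained/pretrained.pth')
```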

Several participants finished the competition and received awards. Although the ranking was not outstanding, we accumulated a great deal of experience along the way. These results would not have been possible without the computing support of the Huawei NAIE training platform, which provides free V100 and P100 GPUs and was a great help to our research and competition work: modifying code and training on it is very convenient, and the problems we hit while getting familiar with the platform were resolved, or help was provided, promptly. We hope this write-up offers some useful experience and helps you avoid a few pitfalls.

References

[1] Cai Z, Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection. 2017.

[2] Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond Empirical Risk Minimization. 2017.

[3] Solovyev R, Wang W. Weighted Boxes Fusion: Ensembling Boxes for Object Detection Models. arXiv, 2019.

[4] Wang P, Sun X, Diao W, Fu K. FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery. IEEE Transactions on Geoscience and Remote Sensing, 2019.

[5] Zhang S, Chi C, Yao Y, et al. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection. 2019.

[6] Pang J, Chen K, Shi J, et al. Libra R-CNN: Towards Balanced Learning for Object Detection. 2019.

[7] Deng L, Yang M, Li T, et al. RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation. 2019.

[8] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39(6).

[9] Lin T-Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection. 2017.

[10] Dai J, Qi H, Xiong Y, et al. Deformable Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision, 2017: 764-773.

[11] Zhu X, Hu H, Lin S, Dai J. Deformable ConvNets v2: More Deformable, Better Results. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 9308-9316.

[12] Huang Z, Wang X, Huang L, et al. CCNet: Criss-Cross Attention for Semantic Segmentation. Proceedings of the IEEE International Conference on Computer Vision, 2019: 603-612.

[13] Wang J, Chen K, Yang S, et al. Region Proposal by Guided Anchoring. 2019.

