Abstract: The classification and treatment of solid waste is currently a hot topic across society. Simple and efficient classification and detection of solid waste is critical to its transportation and treatment, and the application of AI technology to garbage sorting has therefore become a focus of attention.

Nowadays, AI has become synonymous with intelligence and can be found in almost every field, so it is natural that “AI+” can also be applied to garbage sorting and supervision scenarios.

Garbage, however, is often commodities in an extreme state, which makes the situation special. On the basis of visual visibility, current technology can already raise classification alarms, for example to determine whether garbage has been sorted. Whether visual detection and classification can be applied directly and achieve useful results, however, requires more data and experiments to judge. To address these questions, we can look at the Haihua Garbage Classification Challenge and see how the contestants used technology to change the world.

The Haihua Garbage Classification Challenge data includes a single-category dataset and a multi-category dataset. The single-category garbage dataset contains 80,000 household garbage images, each containing only one garbage instance. The multi-category garbage dataset contains 4,998 images, of which 2,998 are used as the training set; test sets A and B contain 1,000 images each, and each multi-category image contains up to 20 types of garbage instances. We describe the two datasets separately below.

I. Multi-category garbage

Fig. 1 Multi-type garbage data category distribution

As shown in Figure 1, the multi-category data covers 204 garbage categories, but the data is very unevenly distributed across these categories, with some categories having very few or even no instances.

Figure 2 Visualization of multiple types of garbage data

The two images in Figure 2 are taken from the training set. The garbage targets are mainly concentrated in the central area of the image and overlap heavily. In addition, many targets tend to reappear in other images at different angles.

From the observation and statistics in Figure 1 and Figure 2, we can draw several conclusions:

(1) Since an object often appears in more than one image, overfitting to these objects is very effective, which is why the AP can be trained to more than 90 in this competition. Therefore, a backbone with a larger number of parameters can be considered, such as ResNeXt101-64x4d + DCN.

(2) The images are taken from a top-down view, so both horizontal and vertical flips are effective augmentations.

(3) Although the categories are very unbalanced, targets occur repeatedly, so once a target has been trained on it can be detected reliably when it appears again. Category imbalance therefore mainly affects objects with little data, and only these targets need to be augmented, mainly ink cartridges, river snails, plum pits, shellfish, etc.

(4) Since targets overlap heavily, methods such as mixup can be used to artificially create highly overlapping targets for training.

Table 1 Data statistics

In addition to image-level macro statistics, we also analyzed the targets in the dataset in detail. Table 1 shows statistics of target size and aspect ratio. First, object sizes are bucketed according to the COCO convention, with objects larger than 96 pixels counted as large; 75% of the targets are large objects, which means that tricks for improving small-object detection are basically ineffective here. Second, extreme aspect ratios rarely appear, which gives us useful guidance for tuning the anchors.
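To make the analysis reproducible, here is a minimal sketch of how such size and aspect-ratio statistics could be computed from a COCO-format annotation file; the file name train.json and the ratio buckets are our own assumptions, not the competition code.

import json
from collections import Counter

# load COCO-format annotations (path is a placeholder)
with open('train.json') as f:
    anns = json.load(f)['annotations']

size_buckets, ratio_buckets = Counter(), Counter()
for ann in anns:
    w, h = ann['bbox'][2], ann['bbox'][3]   # COCO bbox = [x, y, w, h]
    longer = max(w, h)
    # the text bins by object length (>96 = large); official COCO bins by sqrt(area)
    if longer > 96:
        size_buckets['large'] += 1
    elif longer > 32:
        size_buckets['medium'] += 1
    else:
        size_buckets['small'] += 1
    if min(w, h) > 0:
        ratio = max(w / h, h / w)
        ratio_buckets['<=2' if ratio <= 2 else ('<=4' if ratio <= 4 else '>4')] += 1

print(size_buckets)
print(ratio_buckets)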

II. Single-category garbage

The single-category garbage dataset contains 80,000 images, each with one target. As shown in the two images on the left of Figure 3, single-category garbage has larger targets. There are two main ideas for using the single-category data: one is to expand the categories with few samples, and the other is to train on the single-category dataset to obtain a better pre-trained model.

Figure 3 Data comparison

When we expanded the data, we found that the label mapping of the single-category dataset did not exactly match that of the multi-category dataset. For example, the "crayfish" label in the single-category data really is crayfish, while in the multi-category data the same label corresponds to milk cartons, and "diode" corresponds to plastic tubes. This means the multi-category data cannot simply be extended with the single-category data, because the labels are not consistent. We tried this scheme, but the accuracy stayed the same.

For the pre-trained model, since the targets are large, we stitched the single-category images into 4x4 mosaics, which reduced the number of images while increasing the number of targets per image, and this brought some improvement on its own. However, when combined with other augmentation methods it was largely ineffective, so we abandoned this approach as well.
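As an illustration of the idea, here is a simplified sketch of how 16 single-object images could be tiled into a 4x4 mosaic with their boxes shifted into the mosaic coordinates; the function and parameter names are ours, not the original competition code.

import numpy as np
import cv2

def make_mosaic(samples, tile=512, grid=4):
    """samples: list of 16 (image, (x1, y1, x2, y2, label)) pairs."""
    canvas = np.zeros((tile * grid, tile * grid, 3), dtype=np.uint8)
    boxes = []
    for idx, (img, box) in enumerate(samples[:grid * grid]):
        r, c = divmod(idx, grid)                      # row/column of this tile
        sx, sy = tile / img.shape[1], tile / img.shape[0]
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = cv2.resize(img, (tile, tile))
        x1, y1, x2, y2, label = box
        # scale the box to the tile size and shift it by the tile offset
        boxes.append([x1 * sx + c * tile, y1 * sy + r * tile,
                      x2 * sx + c * tile, y2 * sy + r * tile, label])
    return canvas, np.array(boxes)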

III. Model Scheme

1. Baseline

Figure 4 Baseline scheme

Our baseline uses Cascade R-CNN as implemented in MMDetection, with ResNeXt101-64x4d + DCN as the backbone. Because the competition adopts COCO's AP50:95 metric, Cascade R-CNN achieves very good results by regressing boxes at a cascade of IoU thresholds. In addition, a large backbone often achieves better results on this dataset.
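For readers unfamiliar with MMDetection, the backbone part of such a config might look roughly like the fragment below. The field names follow MMDetection v2.x conventions and can differ between versions, so treat this as an assumption-laden sketch rather than our exact configuration.

# schematic MMDetection-style config fragment for the baseline backbone
model = dict(
    type='CascadeRCNN',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=64,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch',
        # deformable convolutions in the last three stages
        dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True)),
    neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048],
              out_channels=256, num_outs=5))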

2. Parameter adjustment

In the early stage of the competition, we split the training set into 2,500 training images and 498 local validation images, and tuned the parameters on this basis. Because of the heavy target overlap, results were better with a soft-NMS threshold of 0.001, max_per_img = 300, and flip testing, giving an improvement of about 0.02 over not using these settings. Limited by GPU memory, a region of (0.8W, 0.8H) is randomly cropped from each image, then the short side is randomly sampled from [640, 960] with the long side capped at 1800 for multi-scale training. At test time the image is moderately enlarged, with the short side set to 1200. This trains to an accuracy of 88.3%; combined with OHEM it reaches about 88.6%. Adding the 498 local validation images back into training improves this by another 0.5%, to about 89.2%.
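The settings above map naturally onto MMDetection config fragments. The sketch below is only an approximation in MMDetection v2.x-style syntax; key names (e.g. iou_threshold vs iou_thr) and the availability of relative RandomCrop vary by version.

train_pipeline = [
    # random crop to roughly 0.8W x 0.8H (relative crop_type exists only in newer versions)
    dict(type='RandomCrop', crop_size=(0.8, 0.8), crop_type='relative'),
    # short side sampled from [640, 960], long side capped at 1800
    dict(type='Resize', img_scale=[(1800, 640), (1800, 960)],
         multiscale_mode='range', keep_ratio=True),
    # top-down images: both horizontal and vertical flips help
    dict(type='RandomFlip', flip_ratio=0.5, direction=['horizontal', 'vertical']),
]

test_cfg = dict(
    rcnn=dict(
        score_thr=0.001,                                              # keep low-score boxes for soft-NMS
        nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.001),
        max_per_img=300))                                             # max_per_img = 300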

For the categories with few samples, we added annotations for hard-shelled shellfish, snails, diodes and plum pits in the multi-category training set, and also labeled some ambiguous targets to improve recall. About 100 targets were annotated, which raised the score to about 90% on the A leaderboard.

As shown in Figure 5, we adjusted the anchor ratios from [0.5, 1.0, 2.0] to [2.0/3.0, 1.0, 1.5]. In addition, to improve detection of large objects, we raised the FPN level-assignment threshold from 56 to 70, which effectively enlarges the target size range assigned to each FPN level, and changed the anchor scale from 8 to 12 to cover these large objects.

Figure 5 Modification of Anchor
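Interpreting the "56 to 70" adjustment as the finest_scale parameter of MMDetection's SingleRoIExtractor, the anchor and RoI-assignment changes could be expressed roughly as follows; this is again a schematic, version-dependent sketch rather than our verbatim config.

# modified anchors (MMDetection v2.x-style keys)
rpn_head = dict(
    type='RPNHead',
    anchor_generator=dict(
        type='AnchorGenerator',
        scales=[12],                       # enlarged from the default 8
        ratios=[2.0 / 3.0, 1.0, 1.5],      # narrowed from [0.5, 1.0, 2.0]
        strides=[4, 8, 16, 32, 64]))

# RoI-to-FPN-level assignment: default finest_scale is 56, raised to 70 here
bbox_roi_extractor = dict(
    type='SingleRoIExtractor',
    roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
    out_channels=256,
    featmap_strides=[4, 8, 16, 32],
    finest_scale=70)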

As shown in Fig. 6, after adjusting these parameters the distribution of target counts across FPN levels is closer to a normal distribution, which we believe helps detection. Looking at the number of convolution blocks in the ResNet stages, the middle FPN levels correspond to backbone stages with more parameters and should therefore handle more targets, while the FPN levels at both ends correspond to stages with fewer parameters and should not be assigned too many targets.

Fig. 6 Quantity distribution changes of targets on FPN

For image augmentation, we trained with online mixup for 24 epochs and improved to 91.2%~91.3%, whereas with only 12 epochs there was no improvement. The mixup we used is quite simple: the two images are each blended with a weight of 0.5, so there is no need to reweight the loss.

Figure 7 Mixup rendering
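A minimal sketch of this 0.5/0.5 mixup is shown below; the function is illustrative only and assumes the two images have already been resized to the same shape.

import numpy as np

def mixup_05(img_a, boxes_a, labels_a, img_b, boxes_b, labels_b):
    # blend the two images with equal weights, so no loss re-weighting is needed
    mixed = (0.5 * img_a.astype(np.float32) + 0.5 * img_b.astype(np.float32)).astype(img_a.dtype)
    # keep the targets of both images as training targets of the mixed image
    boxes = np.concatenate([boxes_a, boxes_b], axis=0)
    labels = np.concatenate([labels_a, labels_b], axis=0)
    return mixed, boxes, labels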

3. Model fusion

During earlier testing we assumed the 1080Ti and 2080 would be similarly fast, and each test on the 1080Ti took about 40 minutes, so we only used about three models, which was a disadvantage. When testing on the B leaderboard we found that the 2080 was much faster than the 1080Ti: a single model plus flip testing took only 25 minutes, so more models would have improved our score further. The three models we fused were a ResNeXt101-32x4d + GCB + DCN Cascade R-CNN, a ResNeXt101-64x4d + DCN Cascade R-CNN, and a Guided Anchoring Cascade R-CNN based on ResNeXt101-64x4d + DCN. As for the fusion method, different methods give very similar results; we adopted the method from "Weighted Boxes Fusion: Ensembling Boxes for Object Detection Models" [3], with the fusion threshold set to 0.8.

Figure 8 WBF effect drawing
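For reference, the fusion step can be reproduced with the open-source ensemble-boxes package, which implements Weighted Boxes Fusion; the toy inputs below are placeholders, and boxes are expected in normalized [0, 1] coordinates.

from ensemble_boxes import weighted_boxes_fusion

# toy predictions from two models for one image (placeholders, not real outputs)
boxes_list = [
    [[0.10, 0.10, 0.50, 0.50], [0.60, 0.60, 0.90, 0.90]],   # model 1
    [[0.12, 0.11, 0.52, 0.49]],                              # model 2
]
scores_list = [[0.95, 0.80], [0.90]]
labels_list = [[1, 2], [1]]

fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1, 1],        # equal weight per model
    iou_thr=0.8,           # fusion threshold mentioned in the text
    skip_box_thr=0.001)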

4. Parameter effect

Table 2 Parameter Settings

Fig. 9 Accuracy variation on the A leaderboard

IV. Deployment and use of NAIE platform

1. Platform understanding

In our understanding, the NAIE platform is mainly composed of three parts: the local debugging area, the cloud storage area, and the cloud training area. Once you understand the role of each part, you can get started quickly.

The local debugging area is based on VSCode and is attached to a server without a GPU; it can be operated from the command line like a normal Linux server for the initial deployment and debugging of the environment.

The cloud storage area mainly stores large datasets and pre-trained models; large files such as pre-trained models cannot be transferred directly from the local debugging area to the model training area.

The model training area uses the GPU to train the model and copies the trained weights to cloud storage; only models stored in the cloud can be downloaded.

2. Model deployment

We take the deployment of MMDetection as an example.

  • 1) Code upload

Select NAIE Upload from the right-click menu. The code upload size is limited to about 100 MB, so it is recommended to delete the pre-trained model and other irrelevant files and keep only the core code.

  • 2) Environment deployment

Environment deployment requires writing a requires.txt file in the local code area listing the required Python libraries and their version numbers.

  • 3) Model running

The platform does not support running .sh files, so you need to write a Python script, such as model.py, that uses os.system() to mimic the command line (a sketch is shown after this list).

In addition, the Moxing package should be called in model.py to save the trained model to the cloud.

In the model training area, select model.py and the required GPU specifications for training.

  • 4) Additional Supplements

Large files cannot be uploaded directly through NAIE Upload, so you can write a script such as debug.py in the local debugging area that calls wget to download the files and then uploads them to cloud storage via the moxing package. The moxing package can likewise be used in model.py to transfer files to the training server during training.
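As referenced in step 3, a hypothetical model.py might look like the sketch below. The training command, config path and bucket URL are placeholders of our own; only os.system() and the moxing copy are taken from the description above.

import os
import moxing as mox  # file-transfer package available on the Huawei platform

# launch MMDetection training as if from the shell
# (config path is a placeholder; the work-dir flag spelling may differ by version)
os.system('python tools/train.py configs/my_cascade_rcnn_config.py '
          '--work-dir ./work_dirs/garbage')

# copy the trained checkpoints to cloud storage so they can be downloaded later
# (the bucket URL below is a placeholder)
mox.file.copy_parallel('./work_dirs/garbage', 's3://your-bucket/garbage_model/')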

The contestants finally finished the competition and were rewarded. Although the ranking was not particularly high, they accumulated a lot of experience through the competition. They say these results could not have been achieved without the computing power of the Huawei NAIE training platform, which provides free V100 and P100 GPUs for training and is a great help for both research and competitions; modifying code and training on it is very convenient, and problems encountered while getting familiar with the platform in the early stages were solved or assisted with promptly. We hope this sharing of experience and pitfalls is useful to readers.

References

[1]. Cai Z, Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection[J]. 2017.

[2]. Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond Empirical Risk Minimization[J]. 2017.

[3]. Solovyev R, Wang W. Weighted Boxes Fusion: Ensembling Boxes for Object Detection Models[J]. arXiv, 2019.

[4]. Wang P, Sun X, Diao W, Fu K. FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019.

[5]. Zhang S, Chi C, Yao Y, et al. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection[J]. 2019.

[6]. Pang J, Chen K, Shi J, et al. Libra R-CNN: Towards Balanced Learning for Object Detection[J]. 2019.

[7]. Deng L, Yang M, Li T, et al. RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation[J]. 2019.

[8]. Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39(6).

[9]. Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[J]. 2016.

[10]. Dai J, Qi H, Xiong Y, et al. Deformable Convolutional Networks[C]. Proceedings of the IEEE International Conference on Computer Vision, 2017: 764-773.

[11]. Zhu X, Hu H, Lin S, Dai J. Deformable ConvNets v2: More Deformable, Better Results[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 9308-9316.

[12]. Huang Z, Wang X, Huang L, et al. CCNet: Criss-Cross Attention for Semantic Segmentation[C]. Proceedings of the IEEE International Conference on Computer Vision, 2019: 603-612.

[13]. Wang J, Chen K, Yang S, et al. Region Proposal by Guided Anchoring[J]. 2019.
