Abstract: In general object detection algorithms, dilated convolution can effectively enlarge the receptive field of the network and thereby improve the performance of the algorithm. This paper proposes a variant of dilated convolution and a corresponding dilation search method to fully exploit the potential of dilated convolution and further improve the performance of network models.

This article is shared from the Huawei Cloud community post "Paper interpretation series ten: dilated convolution framework search", original author: I want to be quiet.

Preface

Dilated convolution is a variant of the standard convolution operator that can control the effective receptive field and handle large scale variance among objects without introducing extra computation. However, there is little discussion in the literature on how to design and tune dilated convolutions for different data so as to obtain a better receptive field and improve model performance. To fully exploit this potential, the paper proposes a new variant of dilated convolution, Inception (dilated) convolution, in which the dilation rates are independent across axes, channels, and layers. The paper also proposes a simple and efficient dilation search algorithm based on statistical optimization, EDO (effective dilation search), which adaptively searches for a dilation configuration suited to the training data. The search method runs at negligible cost and can be applied to large datasets extremely quickly.
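As a quick illustration of the property that motivates this work (a toy PyTorch sketch, not code from the paper), a 3×3 convolution with dilation 2 covers a 5×5 neighbourhood while keeping exactly the same parameter count and per-output computation as its dilation-1 counterpart:

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions with identical weight counts but different dilation rates.
# Dilation enlarges the effective extent of the kernel without adding parameters.
x = torch.randn(1, 16, 64, 64)

conv_d1 = nn.Conv2d(16, 16, kernel_size=3, padding=1, dilation=1)  # covers a 3x3 area
conv_d2 = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)  # covers a 5x5 area

print(conv_d1(x).shape, conv_d2(x).shape)  # both preserve the 64x64 spatial size
print(sum(p.numel() for p in conv_d1.parameters()) ==
      sum(p.numel() for p in conv_d2.parameters()))                # True: same parameters
```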

Methods

Both the input image size and the size of the target objects differ across tasks. The input size for image classification is relatively small, whereas the input for object detection is relatively large and the object scales span a wide range. Even for a fixed network on the same task, the optimal effective receptive field (ERF) of a convolution at a given layer is not necessarily that of the standard convolution. Therefore, to satisfy different ERF requirements, a general algorithm for fitting the ERF to different tasks is needed.
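To make the receptive-field arithmetic concrete, the small helper below (an illustrative sketch, not part of the paper; the theoretical receptive field it computes is only an upper bound on the ERF the paper studies) uses the standard relation that a k×k convolution with dilation d spans k + (k − 1)(d − 1) pixels:

```python
# Theoretical receptive-field bookkeeping for a stack of convolution layers.
# This is the usual textbook formula, shown only to illustrate how quickly
# dilation widens coverage; the ERF in the paper is measured, not computed.
def effective_kernel(k: int, d: int) -> int:
    """Spatial extent of a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers) -> int:
    """layers: iterable of (kernel_size, dilation, stride) tuples."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# Three stacked 3x3 convolutions, stride 1:
print(receptive_field([(3, 1, 1)] * 3))                      # 7  (all dilation 1)
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))    # 15 (dilations 1, 2, 4)
```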

This paper proposes a variant of dilated convolution, Inception convolution, which contains a variety of dilation patterns, as shown below:
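One simple way to picture such channel-wise dilation patterns in code (a minimal sketch with hypothetical names; the paper's actual implementation may differ) is a grouped layout where each group of output channels uses its own (d_h, d_w) pair:

```python
import torch
import torch.nn as nn

class InceptionConvSketch(nn.Module):
    """Sketch: split the output channels into groups, give each group its own
    (d_h, d_w) dilation, and concatenate, so different channels see different ERFs."""
    def __init__(self, in_ch, out_ch, patterns=((1, 1), (1, 2), (2, 1), (2, 2))):
        super().__init__()
        chunk = out_ch // len(patterns)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, chunk, kernel_size=3,
                      dilation=(dh, dw), padding=(dh, dw))
            for dh, dw in patterns
        )

    def forward(self, x):
        # Every branch reads the same input but with a different receptive field.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

y = InceptionConvSketch(16, 32)(torch.randn(1, 16, 56, 56))
print(y.shape)  # torch.Size([1, 32, 56, 56])
```

In EDO the per-channel patterns are not fixed in advance as in this sketch; they are selected from a pre-trained supernet, as described next.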

Inception convolution provides a dense range of possible ERFs. The paper presents an efficient dilation optimization algorithm (EDO), in which each layer of the supernet is a standard convolution operation whose kernel covers all possible dilation patterns. For each layer, the pre-trained weights are used to solve the selection problem by minimizing the expected error between the output of the original convolution layer and that of the convolution with the selected dilation pattern. The specific process is shown in the figure below:

The figure above gives an overview of EDO. Taking ResNet-50 as an example, we first train a ResNet-50 whose convolutional kernels are (2d_max + 1) × (2d_max + 1) on the training data; in this example the supernet kernel is 5×5, so d_max = 2. Then, for each output filter of the convolution operation, we compute the L1 error against the expected output and select the dilation pattern with the smallest error (E = 3 in this example). Finally, the filters are rearranged so that filters with the same dilation are grouped together, which yields our Inception convolution.
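As a rough illustration of this selection step (a minimal sketch under one reading of the description above, not the authors' code; all names are hypothetical), the snippet below samples 3×3 dilated kernels from the pre-trained 5×5 supernet kernel at each candidate (d_h, d_w) pattern, runs them on a small calibration batch, and keeps, for every output filter, the pattern whose response has the smallest mean L1 error against the full 5×5 response:

```python
import torch
import torch.nn.functional as F

D_MAX = 2                                    # 5x5 supernet kernel -> dilations in {1, 2}
PATTERNS = [(dh, dw) for dh in (1, 2) for dw in (1, 2)]

def sample_dilated_kernel(w5, dh, dw, d_max=D_MAX):
    """Take the 3x3 weights that a (dh, dw)-dilated conv would use inside the
    (2*d_max+1)^2 supernet kernel w5 of shape (C_out, C_in, 5, 5)."""
    c = d_max
    rows = torch.tensor([c - dh, c, c + dh])
    cols = torch.tensor([c - dw, c, c + dw])
    return w5[:, :, rows][:, :, :, cols]     # (C_out, C_in, 3, 3)

def select_patterns(w5, x):
    """For each output filter, pick the dilation pattern whose response best
    matches the pre-trained 5x5 response (smallest mean L1 error)."""
    expected = F.conv2d(x, w5, padding=D_MAX)                      # (N, C_out, H, W)
    errors = []
    for dh, dw in PATTERNS:
        w3 = sample_dilated_kernel(w5, dh, dw)
        out = F.conv2d(x, w3, padding=(dh, dw), dilation=(dh, dw))
        errors.append((out - expected).abs().mean(dim=(0, 2, 3)))  # (C_out,)
    return torch.stack(errors).argmin(dim=0)                       # pattern index per filter

w5 = torch.randn(64, 64, 5, 5)               # pre-trained supernet weights (toy values)
x = torch.randn(2, 64, 28, 28)               # a small calibration batch
choice = select_patterns(w5, x)
print(choice.shape, choice[:8])              # one chosen pattern per output filter
```

Filters that end up with the same (d_h, d_w) choice would then be grouped so that each group runs as an ordinary dilated convolution, matching the rearrangement step described above.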

Experimental results

Empirical results show that the proposed method achieves consistent performance improvements over a wide range of baselines. For example, simply replacing the 3×3 standard convolutions in the ResNet-50 backbone with Inception convolution improves the mAP of Faster R-CNN on MS-COCO from 36.4% to 39.2%. In addition, applying the same replacement in the ResNet-101 backbone significantly improves bottom-up human pose estimation, raising the AP on COCO val2017 from 60.2% to 68.5%.
