In this paper, the search space is shifted from the whole network to a convolutional unit (cell), and the discovered cells are stacked according to a configuration into a new network family, NASNet. Not only is the search cost reduced from 28 days to 4 days, but the searched structure is also scalable: in both small-model and large-model scenarios it surpasses human-designed models with fewer parameters and less computation, reaching SOTA

Learning Transferable Architectures for Scalable Image Recognition

  • Paper address: Arxiv.org/abs/1707.07…

Introduction


In ICLR 2017, the authors used reinforcement learning to search neural network architectures and achieved good performance. However, that search method requires enormous computing resources: searching on CIFAR-10 took 800 GPUs for 28 days, making it nearly impossible to search on large datasets directly. The paper therefore proposes searching on a proxy dataset and then transferring the network to ImageNet. The main highlights are as follows:

  • The basis of transferability lies in the design of the search space. Since common networks are stacked from repeated structures, the paper changes the search space from the entire network to a cell, and then stacks cells into a network according to a predefined configuration. This not only makes the search fast, but also yields a relatively universal, transferable unit structure
  • The best structure found is called NASNet, which reached SOTA at the time with a 2.4% error rate on CIFAR-10; transferred to ImageNet, it improves top-1 accuracy by 1.2% over the best human-designed architectures
  • By stacking different numbers of cells and varying the number of convolution kernels within the cells, NASNets adapted to different computational budgets can be obtained. The smallest NASNet reaches 74.0% ImageNet top-1 accuracy, 3.1% higher than the best mobile model
  • The image features learned by NASNets are general and transfer to other vision tasks: a Faster R-CNN using the largest NASNet directly improves the detection SOTA by 4% to 43.1% mAP

Method


The neural network search method in this paper follows the classical reinforcement-learning approach; for details, please refer to my previous paper interpretation. The process is shown in Figure 1: in short, an RNN generates the network structure, the resulting network is trained on the dataset, and the accuracy after convergence is used to adjust the weights of the RNN. The paper observes that current excellent network structures, such as ResNet and Inception, are in fact stacks of repeated modules. Therefore, the RNN can be used to predict generic convolutional modules (cells), which can then be combined and stacked into a family of models. The paper mainly uses two types of cells:

  • Normal Cell: a convolutional unit that returns a feature map of the same spatial size as its input
  • Reduction Cell: a convolutional unit that returns a feature map whose width and height are halved

Figure 2 shows the network frameworks for CIFAR-10 and ImageNet, with image inputs of 32×32 and 299×299 respectively. The Reduction Cell and Normal Cell could share the same structure, but the paper finds that separate structures work better. Whenever the size of the feature map is halved, the number of convolution kernels is manually doubled to keep the total amount of feature computation roughly constant. In addition, the number of repetitions N and the number of convolution kernels in the initial unit are set manually for different classification problems
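To make the stacking rule concrete, below is a minimal sketch in plain Python of the kind of skeleton described above. The `nasnet_skeleton` helper and its exact staging are my own illustration (the image-size-dependent stem is omitted), not the paper's code: N Normal Cells per stage, with a Reduction Cell between stages that halves the feature map and doubles the filter count.

```python
# Sketch of the cell-stacking rule described above (hypothetical helper name).
# N Normal Cells are followed by one Reduction Cell; each reduction halves the
# spatial size, so the filter count is doubled to keep the amount of
# computation roughly constant.

def nasnet_skeleton(num_repeats: int, init_filters: int, num_stages: int = 3):
    """Return the cell sequence as (cell_type, filters, spatial_stride) tuples."""
    layout = []
    filters = init_filters
    for stage in range(num_stages):
        for _ in range(num_repeats):
            layout.append(("normal", filters, 1))      # keeps feature-map size
        if stage < num_stages - 1:
            filters *= 2                               # double filters ...
            layout.append(("reduction", filters, 2))   # ... when halving H and W
    return layout

# Example: N = 4 cell repeats with 64 convolution kernels in the initial unit.
for cell in nasnet_skeleton(num_repeats=4, init_filters=64):
    print(cell)
```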

The structure of a cell is defined in the search space: the outputs of the previous two cells, h_i and h_{i-1}, are taken as inputs, and the controller RNN predicts the remaining structure of the convolutional unit block by block. The prediction of a single block is shown in Figure 3. Each cell consists of B blocks, each block involves 5 prediction steps, and a softmax classifier selects the corresponding choice at each step. A block is predicted as follows:

  • Step 1: select a hidden state from h_i, h_{i-1}, or the outputs of previous blocks in the cell as the input of the first hidden layer
  • Step 2: select the input of the second hidden layer, in the same way as Step 1
  • Step 3: select the operation to apply to the input chosen in Step 1
  • Step 4: select the operation to apply to the input chosen in Step 2
  • Step 5: select the operation used to merge the outputs of Step 3 and Step 4, producing a new hidden state that subsequent blocks can choose from

The operations available in Steps 3 and 4 cover mainstream convolutional operations (such as separable convolutions and pooling), while the merging operation in Step 5 is one of two types: (1) element-wise addition, (2) concatenation. Finally, all unused hidden state outputs are concatenated together as the cell output. The controller RNN makes 2×5B predictions in total: the first 5B define the Normal Cell and the remaining 5B define the Reduction Cell
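As a rough illustration of the five prediction steps, the sketch below samples one cell with random choices standing in for the controller RNN's softmax classifiers; the operation list is an illustrative subset, not the paper's exact set.

```python
import random

# Illustrative subset of candidate operations (the paper's list is larger).
OPS = ["identity", "3x3 sep conv", "5x5 sep conv", "3x3 avg pool", "3x3 max pool"]
COMBINE = ["add", "concat"]

def sample_cell(num_blocks: int = 5):
    """Sample one cell: each block makes the 5 choices described above.

    Hidden states 0 and 1 stand for the outputs of the two previous cells
    (h_i and h_{i-1}); every block appends one new selectable hidden state.
    """
    hidden_states = [0, 1]
    blocks = []
    for _ in range(num_blocks):
        in1 = random.choice(hidden_states)    # Step 1: first input
        in2 = random.choice(hidden_states)    # Step 2: second input
        op1 = random.choice(OPS)              # Step 3: op applied to first input
        op2 = random.choice(OPS)              # Step 4: op applied to second input
        comb = random.choice(COMBINE)         # Step 5: how to merge the two results
        blocks.append((in1, op1, in2, op2, comb))
        hidden_states.append(len(hidden_states))  # new hidden state for later blocks
    return blocks

print(sample_cell())
```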

For training the RNN, both reinforcement learning and random search can be used. The experiments find that random search is only slightly worse than reinforcement learning (a minimal sketch of random search is given after the list below), which suggests:

  • NASNet’s search space is well constructed, so even random search can perform well
  • Random search is a baseline that is hard to beat
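For reference, here is what the random-search baseline boils down to; `sample_arch` and `evaluate` are hypothetical stand-ins for the cell sampler and a proxy-task trainer.

```python
import random

def random_search(num_trials, sample_arch, evaluate):
    """Keep the best architecture found by independent uniform sampling."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_arch()
        score = evaluate(arch)          # in the paper: accuracy on the proxy task
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

# Toy usage with stand-ins (real use would plug in the cell sampler above and
# a proxy trainer on CIFAR-10).
arch, score = random_search(
    num_trials=10,
    sample_arch=lambda: [random.randint(0, 4) for _ in range(25)],  # 5 blocks x 5 choices
    evaluate=lambda a: random.random(),
)
print(arch, score)
```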

Experiments and Results


The controller RNN is trained with Proximal Policy Optimization (PPO), and the child networks are trained in a distributed manner via a global workqueue. A total of 500 P100 GPUs are used in the experiments to train the networks in the queue. The entire search took four days; compared with the previous version, which needed 800 K40 GPUs for 28 days, the search is more than seven times faster and the results are better
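The global-workqueue idea can be sketched roughly as follows; `train_child` is a hypothetical stand-in for training one child network on the proxy task, and local processes stand in for the 500-GPU cluster.

```python
from multiprocessing import Pool
import random

def train_child(arch):
    """Hypothetical stand-in for training one sampled child network on the
    proxy task and returning its validation accuracy (the RL reward)."""
    random.seed(hash(tuple(arch)) & 0xFFFF)
    return random.random()

def evaluate_batch(architectures, num_workers=4):
    """Global-workqueue pattern: the controller pushes sampled architectures,
    idle workers pull and train them, and the rewards flow back to PPO."""
    with Pool(num_workers) as pool:
        rewards = pool.map(train_child, architectures)
    return rewards

if __name__ == "__main__":
    batch = [[random.randint(0, 4) for _ in range(25)] for _ in range(8)]
    print(evaluate_batch(batch))
```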

Figure 4 shows the structures of the best-performing Normal Cell and Reduction Cell, which were searched on CIFAR-10 and then transferred to ImageNet. After the convolutional unit is obtained, several hyperparameters still need to be chosen to build the final network: first the number of cell repetitions N, and then the number of convolution kernels in the initial unit. For example, 4 @ 64 means the cell is repeated N = 4 times with 64 convolution kernels in the initial unit

For search details, please refer to Appendix A of the paper. It is worth noting that the paper proposes ScheduledDropPath, an improved version of DropPath, as a regularization method. DropPath randomly drops paths within the cell (such as the edges marked by the yellow boxes in Figure 4) with a fixed probability during training, but this did not work well in the paper's setting. The paper therefore uses ScheduledDropPath, which linearly increases the drop probability over the course of training
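A minimal sketch of the idea in PyTorch, assuming per-example dropping and a linear schedule over training steps (the exact details are in Appendix A of the paper):

```python
import torch

def scheduled_drop_path(x: torch.Tensor, drop_prob_final: float,
                        step: int, total_steps: int, training: bool = True):
    """Sketch of ScheduledDropPath: as in DropPath, each cell path is dropped
    with some probability, but here that probability grows linearly from 0
    to `drop_prob_final` over training.

    x is the output of one path inside a cell, shape (batch, C, H, W).
    """
    if not training or drop_prob_final == 0.0:
        return x
    # Linear schedule: early in training almost nothing is dropped.
    drop_prob = drop_prob_final * min(step / total_steps, 1.0)
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per example: the whole path is either kept or dropped.
    mask = torch.bernoulli(torch.full((x.size(0), 1, 1, 1), keep_prob,
                                      device=x.device, dtype=x.dtype))
    return x / keep_prob * mask
```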

Results on CIFAR-10 Image Classification

NASNet-A combined with cutout data augmentation achieves SOTA on CIFAR-10

Results on ImageNet Image Classification

The structure learned on CIFAR-10 is transferred to ImageNet, and the largest model reaches SOTA (82.7% top-1), matching the accuracy of SENet while using far fewer parameters

Figure 5 visually compares the NASNet family with human-designed networks: across the accuracy-versus-parameters and accuracy-versus-computation trade-offs, the NASNet models outperform the human-designed ones

The paper also evaluates accuracy in the mobile setting, where the number of parameters and the amount of computation must be small enough; NASNet still performs remarkably well

Improved features for object detection

The paper also studies how NASNet performs on other vision tasks by using it as the backbone of Faster R-CNN and evaluating on COCO. With the mobile-sized NASNet, the mAP reaches 29.6%, an improvement of 5.1%; with the best NASNet, the mAP reaches 43.1%, a 4.0% improvement over the previous SOTA. The results show that NASNet provides richer and more general features and therefore performs well on other vision tasks

Efficiency of architecture search methods

The paper also compares the efficiency of the search methods, mainly reinforcement learning (RL) versus random search (RS). For the best architectures found, RL is about 1% more accurate than RS overall, while for average performance (e.g. the top-5 and top-25 architectures) the two methods are close. The paper therefore concludes that although RS is a feasible search strategy, RL performs better in the NASNet search space

CONCLUSION


Building on the previous work on neural architecture search with reinforcement learning, the paper transforms the search space from the whole network to the convolutional unit (cell), and then stacks the cells into a new network, NASNet, according to a configuration. This not only reduces the complexity of the search and accelerates it from the original 28 days down to four days, but also makes the searched structure scalable: in both small-model and large-model scenarios it surpasses human-designed models with fewer parameters and less computation, reaching SOTA. In addition, thanks to the careful design of the search space and model structure, the structure learned on a small dataset can be transferred to a large dataset, giving better generality. The network also performs quite well in object detection

   

Appendix NASNet-B & NASNet-C

The paper also includes two other structures, NASNet-B and NASNet-C, whose search spaces and methods differ somewhat from NASNet-A. Interested readers can refer to the appendix of the original paper




