Building on the study of adaptive networks, the paper proposes the Resolution Adaptive Network (RANet) to trade accuracy off against computational cost. The network contains multiple subnets with different input resolutions and depths, so inference on easy and hard samples automatically uses different amounts of computation, and features are fused and reused across subnets. The experimental results show a very good balance between accuracy and speed.

Resolution Adaptive Networks for Efficient Inference

  • Paper: Arxiv.org/abs/2003.07…
  • Code: Github.com/yangle15/RA…

Introduction


Deep CNNs improve accuracy but also bring heavy computation. Many studies focus on accelerating networks; among them, adaptive networks can adjust their computation automatically according to the difficulty of each sample. Building on this line of work, the paper proposes the Resolution Adaptive Network (RANet), whose idea is shown in Figure 1. The network contains multiple subnets with different input resolutions and depths: easy samples are recognized by small, low-resolution subnets, and otherwise larger subnets are used. Subnet features are not computed independently; each higher-level subnet fuses the features of the subnets before it. The experiments show that the paper achieves a good trade-off between accuracy and computational cost.

Method


Adaptive Inference Setting

For an input image $x$, construct an adaptive model containing $K$ classifiers. The output of the $k$-th classifier is shown in Formula 1:

$$p_k = f_k(x;\, \theta_k) = \left[p_k^1, \ldots, p_k^C\right]^\top$$

where $\theta_k$ denotes the parameters of the subnet corresponding to the $k$-th classifier (some of these parameters are shared among classifiers), and $p_k^c$ is the confidence for category $c$.

The adaptive network dynamically selects an appropriate computational branch according to the difficulty of the image: if the output of the current classifier is reliable enough, inference exits early. The paper uses the largest softmax confidence of the output as the criterion, as shown in Formulas 2 and 3: inference stops at the $k$-th classifier once

$$\max_c\, p_k^c \ge \epsilon_k,$$

where $\epsilon_k$ is the exit threshold of that classifier, and the prediction is then $\hat{y} = \arg\max_c p_k^c$.
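A minimal sketch of this exit test (the function name and the scalar `threshold` are illustrative; the paper derives per-classifier thresholds from the computational budget):

```python
import torch
import torch.nn.functional as F

def should_exit(logits: torch.Tensor, threshold: float) -> bool:
    # Turn the classifier output into category confidences p_k (Formula 1)
    probs = F.softmax(logits, dim=-1)
    # Exit when the largest confidence max_c p_k^c reaches the threshold
    return bool(probs.max().item() >= threshold)
```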

Overall Architecture

The overall structure of RANet is shown in Figure 2. It consists of an initial layer and multiple subnets, and each subnet contains multiple classifiers. The flow is as follows: the initial layer first produces feature maps at different resolutions, and the subnet with the lowest resolution makes the first prediction. If that subnet does not obtain a reliable result, the next subnet with a slightly higher resolution is used, and the process repeats until a reliable result is obtained or the subnet with the highest resolution has been reached.
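A sketch of this low-to-high resolution cascade (the `subnets`, `classifiers`, and `thresholds` arguments are placeholders for the components described above):

```python
import torch.nn.functional as F

def ranet_inference(x, subnets, classifiers, thresholds):
    logits, feats = None, None
    # Try subnets from the lowest input resolution to the highest
    for subnet, classifier, eps in zip(subnets, classifiers, thresholds):
        feats = subnet(x, feats)                     # reuse earlier subnet features
        logits = classifier(feats)
        if F.softmax(logits, dim=-1).max() >= eps:   # reliable enough: exit early
            break
    return logits
```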

In this repeated prediction process, the features of the higher-resolution subnets are fused with those of the lower-resolution subnets. Although the initial layer already processes the image from fine to coarse granularity, each subnet keeps downsampling its feature maps until they reach the smallest scale (the minimum resolution generated by the initial layer), and classifiers are only attached to the last few blocks whose feature maps are at this smallest scale.

Network Details

  • Initial Layer

The initial layer is used to generate the base features at each scale; the initial layer in Figure 2 produces three features of different sizes. The first (finest) feature is generated by a regular conv layer, and each following (coarser) feature is generated by a strided conv layer.
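A minimal sketch of such an initial layer, assuming three scales as in Figure 2 (channel widths are illustrative):

```python
import torch.nn as nn

class InitialLayer(nn.Module):
    def __init__(self, in_ch=3, width=32, num_scales=3):
        super().__init__()
        # The finest base feature comes from a regular conv layer
        self.first = nn.Conv2d(in_ch, width, kernel_size=3, padding=1)
        # Each following (coarser) feature comes from a strided conv layer
        self.down = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, stride=2, padding=1)
            for _ in range(num_scales - 1)
        )

    def forward(self, x):
        feats = [self.first(x)]
        for conv in self.down:
            feats.append(conv(feats[-1]))  # halve the resolution at each scale
        return feats                       # fine-to-coarse base features
```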

  • Sub-networks with Different Scales

Sub-network 1 processes the feature maps with the lowest resolution, using the regular dense blocks of Figure 3(a); the output of each layer is also passed on to sub-network 2.
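A sketch of a DenseNet-style regular dense block, assuming BN-ReLU-Conv layers and an illustrative growth rate:

```python
import torch
import torch.nn as nn

class RegularDenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
            ))
            ch += growth

    def forward(self, x):
        for layer in self.layers:
            # Dense connectivity: concatenate each layer's output to its input;
            # these intermediate features are what the next subnet can reuse
            x = torch.cat([x, layer(x)], dim=1)
        return x
```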

A subnet with a larger input scale processes the base features of its own scale and uses the fusion blocks of Figure 3(b, c) to fuse the features of the previous subnet. There are two types: the one in Figure 3(b) preserves the feature map size, while the one in Figure 3(c) reduces it. Up-sampling of the lower-resolution features uses either an up-conv (regular conv + bilinear interpolation) or a regular conv, depending on the size of the current feature. Dense connections between earlier and later features are also kept. See Figure 3 for structural details.
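A sketch of the resolution-preserving fusion of Figure 3(b), assuming the lower-resolution feature is at half the current size so it is brought up with an up-conv (regular conv + bilinear interpolation); channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, cur_ch, low_ch, out_ch):
        super().__init__()
        # "Up-conv": a regular conv followed by bilinear interpolation
        self.up = nn.Sequential(
            nn.Conv2d(low_ch, out_ch, kernel_size=3, padding=1),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )
        self.conv = nn.Conv2d(cur_ch + out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, cur, low):
        low_up = self.up(low)                          # match the current resolution
        return self.conv(torch.cat([cur, low_up], dim=1))
```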

A subnet with a larger input scale is built as follows: suppose the subnet contains several blocks, where the first blocks are fusion blocks. Within these fusion blocks the features are gradually downsampled so that the output feature maps reach the smallest scale, and the remaining blocks are regular dense blocks.
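A sketch of such a downsampling fusion block (the resolution-reducing type of Figure 3(c)): a strided conv halves the current scale so it matches the lower-resolution feature directly; channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class FusionDownBlock(nn.Module):
    def __init__(self, cur_ch, low_ch, out_ch):
        super().__init__()
        # Strided conv halves the current feature map's resolution
        self.down = nn.Conv2d(cur_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.conv = nn.Conv2d(low_ch + out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, cur, low):
        cur_down = self.down(cur)                      # now the same size as `low`
        return self.conv(torch.cat([cur_down, low], dim=1))
```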

  • Transition layer

RANet also uses DenseNet's transition layers to compress features, implemented as a 1×1 convolution + BN + ReLU; they are omitted from Figure 2 for simplicity.
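A minimal sketch of this transition layer:

```python
import torch.nn as nn

def transition_layer(in_ch, out_ch):
    # DenseNet-style transition: 1x1 convolution + BN + ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```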

  • Classifiers and loss function

Classifiers are added to the last several blocks of each subnet. During training, samples are propagated through all subnets in sequence, and the final loss is the weighted accumulation of the cross-entropy losses of all classifiers; the specific scheme and weights follow MSDNet.
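A sketch of this loss, with placeholder equal weights in the usage line (the paper sets the weights as in MSDNet):

```python
import torch.nn.functional as F

def ranet_loss(all_logits, target, weights):
    # Weighted accumulation of each classifier's cross-entropy loss
    return sum(w * F.cross_entropy(logits, target)
               for w, logits in zip(weights, all_logits))

# e.g. loss = ranet_loss(logits_list, labels, [1.0] * len(logits_list))
```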

Resolution and Depth Adaptation

The overall structure of RANet is very similar to that of MSDNet, so the paper compares the two. MSDNet's classifiers are placed on the path with the lowest resolution, whereas RANet infers subnet by subnet from low resolution to high, which combines depth adaptation with resolution adaptation more effectively.

Experiments


Anytime Prediction

The anytime-prediction setting limits the FLOPs available for a single image and directly records the accuracy and computational cost of every classifier in the adaptive network for comparison.

Budgeted Batch Classification

The budgeted-batch setting limits the total computational budget for a batch of images; a confidence threshold is set according to this budget to control early exits during inference, and the accuracy of the adaptive network under each resource limit is recorded.
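A hedged sketch of one way to pick such thresholds on a validation set, assuming the fraction of samples that should exit at each classifier has already been chosen to fit the budget (as in MSDNet); all names are illustrative:

```python
import torch

def thresholds_from_budget(val_confidences, exit_fractions):
    # val_confidences: (num_samples, num_classifiers) max-softmax confidences
    thresholds = []
    remaining = val_confidences.clone()
    for k, frac in enumerate(exit_fractions):
        conf_k = remaining[:, k]
        # Let the top `frac` most-confident remaining samples exit here
        thr = torch.quantile(conf_k, 1.0 - frac)
        thresholds.append(thr.item())
        remaining = remaining[conf_k < thr]  # the rest continue to deeper exits
    return thresholds
```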

Visualization and Discussion

Figure 7 shows some examples of RANet's predictions. "Easy" refers to samples recognized by an early-stage classifier; "hard" refers to samples that fail in the early stages but succeed later. The main challenges are multiple objects, small objects, and objects without distinctive inter-class characteristics.

Conclusion


Based on the study of adaptive networks, the paper proposes the Resolution Adaptive Network (RANet) to trade accuracy off against computational cost. The network contains multiple subnets with different input resolutions and depths, so easy and hard samples automatically consume different amounts of computation, and subnet features are fused and reused. The experimental results show an excellent balance between accuracy and speed.

References

  • MSD: Multi-Self-Distillation Learning via Multi-Classifiers Within Deep Neural Networks – arxiv.org/abs/1911.09…




