FSAF provides an in-depth analysis of FPN layer selection during training. Embedded into the original network as an ultra-simple anchor-free branch, it has almost no impact on speed, yet allows the optimal FPN layer to be selected more accurately, bringing a solid improvement in accuracy.





Source: Xiaofei’s algorithm engineering notes official account

Feature Selective Anchor-Free Module for Single-Shot Object Detection

  • Paper Address: https://arxiv.org/abs/1903.00621
  • Paper Code: https://github.com/zccstig/mmdetection/tree/fsaf

Introduction


The primary challenge in object detection is scale variation. Many algorithms use FPN together with anchor boxes to address it. For positive-sample assignment, the FPN layer used for prediction is generally determined first by the size of the target (the larger the target, the higher the FPN layer), and then refined according to the IoU between the target and the anchor boxes. However, such a design brings two limitations: heuristic-guided feature selection and overlap-based anchor sampling.

As shown in Figure 2, the medium anchor is selected for the 60×60 target, while the smallest anchor is selected for the 50×50 and 40×40 targets. Anchor selection follows hand-crafted rules based on experience, which may not be the optimal choice in some scenarios.
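To make the heuristic concrete, the following sketch implements an FPN-style size-based level assignment (the `canonical_size` and level bounds here are illustrative defaults in the spirit of the FPN paper, not values taken from FSAF):

```python
import math

def assign_fpn_level(w, h, canonical_size=224.0, canonical_level=4,
                     min_level=3, max_level=7):
    """Heuristic FPN level assignment: larger objects are mapped to
    higher (coarser) pyramid levels based purely on their size."""
    level = canonical_level + math.log2(math.sqrt(w * h) / canonical_size)
    return int(min(max(math.floor(level), min_level), max_level))
```

A 224×224 target lands on level 4, and doubling its side length pushes it up one level; nothing about the feature content is consulted, which is exactly the "heuristic-guided feature selection" FSAF criticizes.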

To solve the above problems, this paper proposes a simple and efficient feature selection method, FSAF (Feature Selective Anchor-Free), which selects the optimal layer for optimization in each training iteration. As shown in Figure 3, FSAF adds an anchor-free branch, containing both classification and regression, to each FPN layer. During training, the most suitable FPN layer is selected according to the predictions of the anchor-free branch. The final network output fuses the results of the FSAF anchor-free branches with the predictions of the original network.

Network Architecture


The network structure of FSAF is very simple, as shown in Figure 4. On top of the original network, FSAF introduces two additional convolution layers for each FPN layer, which predict the anchor-free classification and regression results respectively. In this way, the anchor-free and anchor-based methods can make joint predictions while sharing features.
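A minimal shape sketch of what the two extra convolutions add per FPN level: a K-channel classification map and a 4-channel box-offset map, with the spatial size unchanged (3×3 convs with stride 1 and padding 1). The function below only does the shape bookkeeping; it is an illustration, not the paper's implementation:

```python
def anchor_free_head_shapes(feature_shapes, num_classes):
    """Per FPN level (c, h, w) feature map, the FSAF-style head adds:
    - a classification map of shape (num_classes, h, w), and
    - a regression map of shape (4, h, w) for the box offsets.
    Spatial dimensions are preserved by the 3x3 stride-1 convolutions."""
    out = {}
    for level, (c, h, w) in feature_shapes.items():
        out[level] = {"cls": (num_classes, h, w), "reg": (4, h, w)}
    return out
```

Because the two heads hang off features the anchor-based branch already computes, the extra cost is just these two convolutions per level, which is why the speed impact is negligible.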

Ground-truth and Loss


For a target $b=(x, y, w, h)$, during training it can be mapped to any FPN layer $P_l$, giving the projected box $b^l_p=[x^l_p, y^l_p, w^l_p, h^l_p]$, where in general $b^l_p=b/2^l$. Define the effective box $b^l_e=[x^l_e, y^l_e, w^l_e, h^l_e]$ and the ignoring box $b^l_i=[x^l_i, y^l_i, w^l_i, h^l_i]$, which partition the feature map into a positive-sample region, an ignored region, and a negative-sample region. Both the effective box and the ignoring box are obtained by scaling the projected box, with ratios $\epsilon_e=0.2$ and $\epsilon_i=0.5$ respectively.

Classification Output

The classification output has $K$ dimensions, one per class, and the target is marked in the dimension corresponding to its class. Sample points are defined by the following cases:

  • Points within the effective box are positive samples.
  • The region between the effective box and the ignoring box does not participate in training.
  • The ignoring box is also mapped to the adjacent feature pyramid layers, and the regions within those mapped boxes do not participate in training either.
  • The remaining regions are negative sample points.

Classification training uses Focal Loss with $\alpha=0.25$ and $\gamma=2.0$, and the complete classification loss is the sum of the loss values over all positive and negative points divided by the number of points inside the effective regions.
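A minimal element-wise Focal Loss sketch with the stated hyperparameters, operating on sigmoid probabilities (the normalization by effective-region point count would be applied on top of the returned per-point values):

```python
import numpy as np

def focal_loss(p, targets, alpha=0.25, gamma=2.0):
    """Element-wise focal loss on sigmoid probabilities p against
    binary targets: alpha balances positives vs negatives, and the
    (1 - pt)^gamma factor down-weights easy, well-classified points."""
    pt = np.where(targets == 1, p, 1 - p)          # prob of the true label
    at = np.where(targets == 1, alpha, 1 - alpha)  # class-balance weight
    return -at * (1 - pt) ** gamma * np.log(np.clip(pt, 1e-12, 1.0))
```

A confident correct prediction (e.g. p = 0.9 for a positive) yields a loss close to zero because of the $(1-p_t)^\gamma$ modulation, which is what lets the dense negative points dominate far less than with plain cross-entropy.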

Box Regression Output

The regression output consists of four class-agnostic offset dimensions, and only points within the effective region are regressed. For a position $(i, j)$ in the effective region, the mapped target is $d^l_{i,j}=[d^l_{t_{i,j}}, d^l_{l_{i,j}}, d^l_{b_{i,j}}, d^l_{r_{i,j}}]$, the distances from the current position to the top, left, bottom, and right boundaries of $b^l_p$. The regression target at this position is $d^l_{i,j}/S$, where $S=4.0$ is a normalization constant. Regression training uses the IoU loss, and the complete anchor-free branch regression loss is the mean of the loss values over all effective regions.
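The target construction and an IoU loss over such distance vectors can be sketched as below (`(i, j)` is taken as (row, col) against a corner-format projected box; both helpers are illustrative, not the paper's code):

```python
import numpy as np

def regression_target(i, j, proj_box, S=4.0):
    """Offsets (top, left, bottom, right) from cell (i, j) to the sides
    of the projected box b^l_p, divided by the normalization constant S."""
    x1, y1, x2, y2 = proj_box
    return np.array([i - y1, j - x1, y2 - i, x2 - j]) / S

def iou_loss(d_pred, d_gt):
    """-log(IoU) between two boxes given as (t, l, b, r) distances
    measured from the same point."""
    inter_h = min(d_pred[0], d_gt[0]) + min(d_pred[2], d_gt[2])
    inter_w = min(d_pred[1], d_gt[1]) + min(d_pred[3], d_gt[3])
    inter = inter_h * inter_w
    area_p = (d_pred[0] + d_pred[2]) * (d_pred[1] + d_pred[3])
    area_g = (d_gt[0] + d_gt[2]) * (d_gt[1] + d_gt[3])
    return -np.log(inter / (area_p + area_g - inter))
```

A perfect prediction has IoU 1 and therefore zero loss, and because the four distances jointly determine a box, the IoU loss couples the offsets instead of penalizing them independently as an L1/L2 loss would.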

Online Feature Selection


The anchor-free design allows any FPN layer $P_l$ to be used for training. To find the optimal FPN layer, the FSAF module needs to measure how well each FPN layer predicts the target. For each instance $I$, the Focal Loss and the IoU Loss are averaged over the effective region $b^l_e$ of each layer:

$$L^I_{FL}(l)=\frac{1}{N(b^l_e)}\sum_{i,j\in b^l_e}FL(l,i,j), \qquad L^I_{IoU}(l)=\frac{1}{N(b^l_e)}\sum_{i,j\in b^l_e}IoU(l,i,j)$$

After the per-layer results are obtained, the layer with the smallest total loss is selected as the FPN layer for this instance in the current training iteration:

$$l^*=\arg\min_l \, L^I_{FL}(l)+L^I_{IoU}(l)$$
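The selection step itself reduces to an argmin over per-level losses, as in this minimal sketch:

```python
def select_level(per_level_losses):
    """Online feature selection: given {level: (cls_loss, reg_loss)}
    computed over one instance's effective region on each FPN level,
    return the level with the smallest combined loss."""
    return min(per_level_losses, key=lambda l: sum(per_level_losses[l]))
```

Only the selected level receives gradients for that instance in this iteration, so the assignment adapts per instance and per training step rather than being fixed by the object's size.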

Joint Inference and Training


Inference

Since FSAF makes few changes to the original network, inference is straightforward: the results of the anchor-free and anchor-based branches are lightly filtered, merged, and then passed through NMS together.

Optimization

The complete loss function combines the anchor-based branch and the anchor-free branch: $L = L^{ab} + \lambda(L^{af}_{cls} + L^{af}_{reg})$, where $\lambda$ weights the anchor-free branch.
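The combination is a simple weighted sum; the sketch below assumes $\lambda=0.5$ as a default (the value reported in the paper's experiments):

```python
def total_loss(l_anchor_based, l_af_cls, l_af_reg, lam=0.5):
    """Joint objective: anchor-based branch loss plus the
    lambda-weighted anchor-free classification and regression losses."""
    return l_anchor_based + lam * (l_af_cls + l_af_reg)
```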

Experiments


Comparison of various structures and FPN layer selection methods.

Accuracy vs. inference speed.

Contrast with SOTA method.

Conclusion


FSAF provides an in-depth analysis of FPN layer selection during training. Embedded into the original network as an ultra-simple anchor-free branch, it has almost no impact on speed, yet allows the optimal FPN layer to be selected more accurately, bringing a solid improvement in accuracy. Note that although the previous hard selection rules are abandoned, some hand-crafted settings remain, such as the definition of the effective region, so the method is not yet perfect.




