SaccadeNet first localizes objects coarsely from center-point features, then fine-tunes each prediction box using the features at the corners and center of the preliminary box. The overall idea resembles a two-stage object detection algorithm, with the region features used for second-stage box refinement replaced by point features. SaccadeNet performs well in both accuracy and speed, and the overall design is well conceived.





Source: Xiaofei’s Algorithm Engineering Notes (WeChat official account)

SaccadeNet: A Fast and Accurate Object Detector

  • Paper: https://arxiv.org/abs/2003.12125
  • Code: https://github.com/voidrank/SaccadeNet

Introduction


According to neuroscience, humans do not take in a whole scene at once. Instead, the eyes jump between information-rich regions to help locate an object. Inspired by this mechanism, the paper proposes SaccadeNet, which efficiently attends to the information-rich key points of an object and localizes it from coarse to fine.

The structure of SaccadeNet is shown in Figure 2. It first makes a preliminary prediction of the center and corner locations of each object, then refines the box by regression on the features at the four corner locations and the center location. SaccadeNet consists of four modules:

  • Center Attentive Module (Center-Attn), which predicts the center location and category of each object.
  • Attention Transitive Module (Attn-Trans), which makes a preliminary prediction of the corner positions corresponding to each center position.
  • Aggregation Attentive Module (Aggregation-Attn), which refines the prediction box using the features at the center and corner locations.
  • Corner Attentive Module (Corner-Attn), which strengthens the object-boundary features of the backbone network.

The overall idea of SaccadeNet is sound. It resembles a two-stage object detection scheme, except that the second-stage box regression works on point features instead of region features.

Center Attentive Module


The Center-Attn module consists of two simple convolution layers that convert the feature map output by the backbone network into a center-point heat map, which is used to predict the center location and category of every object in the image. The GT for this module is built as in CornerNet: each GT position $X_k$ is splatted onto the heat map with the Gaussian kernel $e^{-\frac{||X - X_k||^2}{2\sigma^2}}$, where $\sigma$ is 1/3 of the radius. The radius is determined by the object size so that any point within the radius yields a prediction box with an IoU of at least 0.3. The module is trained with a variant of Focal Loss, in which $p_{i,j}$ is the score at position $(i,j)$ on the heat map and $y_{i,j}$ is the corresponding GT value.
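As a rough illustration of the GT construction and the loss, here is a minimal pure-Python sketch. It assumes the penalty-reduced Focal Loss form used by CornerNet; the function names, the exponents `alpha=2` and `beta=4`, and the circular cutoff at the radius are assumptions for illustration, not taken from the SaccadeNet code.

```python
import math

def gaussian_heatmap(height, width, center, radius):
    """Splat one GT center (x, y) onto an empty heat map with a Gaussian kernel."""
    cx, cy = center
    sigma = radius / 3.0  # sigma is 1/3 of the radius, as described in the text
    heat = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= radius ** 2:  # assumption: ignore points outside the radius
                heat[y][x] = math.exp(-d2 / (2.0 * sigma ** 2))
    return heat

def focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over one heat map (CornerNet-style, assumed)."""
    loss, num_pos = 0.0, 0
    for p_row, y_row in zip(pred, target):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if y == 1.0:
                # positive location: penalize low scores
                loss -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                # negative location: penalty shrinks near a GT center
                loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)  # normalize by the number of GT centers

heat = gaussian_heatmap(7, 7, center=(3, 3), radius=3)
good = [[0.999 if y == 1.0 else 0.001 for y in row] for row in heat]
bad = [[0.5 for _ in row] for row in heat]
```

With these inputs, `focal_loss(good, heat)` is much smaller than `focal_loss(bad, heat)`, because `good` scores high only at the GT center.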

Attention Transitive Module


The Attn-Trans module outputs a map of size $w_f\times h_f\times 2$ that predicts the width and height of the prediction box at each position. Given a center position $(i, j)$, the corresponding corner positions are $(i - w_{i,j}/2, j - h_{i,j}/2)$, $(i - w_{i,j}/2, j + h_{i,j}/2)$, $(i + w_{i,j}/2, j - h_{i,j}/2)$ and $(i + w_{i,j}/2, j + h_{i,j}/2)$. The module is trained with an L1 regression loss. With the Center-Attn and Attn-Trans modules alone, SaccadeNet can already produce preliminary detection results. In addition, the paper's source code predicts an extra offset for each center point in this module to compensate for the misalignment caused by downsampling. The offset is also trained with an L1 regression loss and is enabled by default.
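The corner computation above can be sketched in a few lines. The function names are hypothetical, and adding the `offset` directly to the center is an assumption about how the sub-pixel correction is used.

```python
def corners_from_center(i, j, w, h, offset=(0.0, 0.0)):
    """Return the four corners of the box centered at (i, j) with size (w, h).

    `offset` is the optional center correction; applying it additively
    is an assumption made for this sketch.
    """
    ci, cj = i + offset[0], j + offset[1]
    return [
        (ci - w / 2, cj - h / 2),
        (ci - w / 2, cj + h / 2),
        (ci + w / 2, cj - h / 2),
        (ci + w / 2, cj + h / 2),
    ]

def l1_loss(pred, target):
    """Mean absolute error between predicted and GT values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

corners = corners_from_center(10, 20, w=4, h=6)
size_loss = l1_loss([4.0, 6.0], [5.0, 4.0])  # predicted (w, h) vs GT (w, h)
```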

Aggregation Attentive Module


Aggregation-Attn is a lightweight module that fine-tunes the prediction box to output a more accurate one. It takes the corner and center points of each object from the Attn-Trans and Center-Attn modules, samples the features at those positions from the backbone's output feature map using bilinear interpolation, and finally regresses correction values for the width and height. The whole module is trained with an L1 loss.
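The sampling step can be illustrated as follows. This is a simplified sketch on a single-channel feature map. The helper names are hypothetical, and treating the regressed correction as an additive residual on width and height is an assumption.

```python
import math

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2-D map feat[row][col] at real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the map borders
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom

def gather_point_features(feat, center, w, h):
    """Sample the features at the center and the four corners of a box."""
    cx, cy = center
    points = [
        (cx, cy),
        (cx - w / 2, cy - h / 2),
        (cx - w / 2, cy + h / 2),
        (cx + w / 2, cy - h / 2),
        (cx + w / 2, cy + h / 2),
    ]
    return [bilinear_sample(feat, x, y) for x, y in points]

def refine_box(w, h, residual):
    """Apply the regressed (dw, dh) correction (assumed additive)."""
    dw, dh = residual
    return w + dw, h + dh

feat = [[0.0, 1.0], [2.0, 3.0]]
mid_value = bilinear_sample(feat, 0.5, 0.5)
point_feats = gather_point_features(feat, center=(0.5, 0.5), w=1.0, h=1.0)
```

In a real network the five sampled feature vectors would be fed to a small regression head that outputs the correction. Here `refine_box` only shows how such a correction would be applied.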

Corner Attentive Module in Training


To extract information-rich corner features, an additional Corner-Attn branch is added during training. It transforms the backbone features into a four-channel heat map, one channel for each of the four corner points of an object. Like the center branch, it is trained with Focal Loss and Gaussian heat maps, but it is class-agnostic. This module can be applied iteratively for several rounds of fine-tuning, similar to Cascade R-CNN, and the paper compares this in its experiments.
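A four-channel, class-agnostic GT could be built roughly as below. This is only a sketch: the function name, the fixed `sigma`, and the channel order are assumptions, and the actual implementation may derive the Gaussian radius from the object size as the center branch does.

```python
import math

def corner_heatmaps(height, width, boxes, sigma=1.0):
    """Build a 4-channel corner GT, one channel per corner type.

    boxes: list of (x_min, y_min, x_max, y_max) in feature-map coordinates.
    Boxes of every category share the same four channels (class-agnostic).
    """
    maps = [[[0.0] * width for _ in range(height)] for _ in range(4)]
    for x0, y0, x1, y1 in boxes:
        # assumed channel order: same as the corner list of Attn-Trans
        corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
        for ch, (cx, cy) in enumerate(corners):
            for y in range(height):
                for x in range(width):
                    d2 = (x - cx) ** 2 + (y - cy) ** 2
                    g = math.exp(-d2 / (2.0 * sigma ** 2))
                    # keep the larger value where Gaussians overlap
                    maps[ch][y][x] = max(maps[ch][y][x], g)
    return maps

maps = corner_heatmaps(8, 8, boxes=[(1, 2, 5, 6)])
```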

Relation to existing methods


Keypoint-based object detection methods in common use fall into two families: edge-keypoint-based detectors and center-keypoint-based detectors. Edge-keypoint-based detectors usually detect corners or extreme points and then group the key points to localize each object, but this kind of algorithm cannot capture global information about the object: a) a corner feature by itself carries little information about the object, so an additional center feature is needed to enhance it; b) corners usually lie on background pixels and carry less information than other key points. SaccadeNet also uses corners for prediction, but it predicts objects directly from the center key points, so it obtains global information about the object and avoids the time-consuming keypoint grouping.

Center-keypoint-based detectors predict objects from center key points: they output a center heat map and regress the box boundaries directly. However, the center is usually far from the object boundary, so accurate boundaries can be hard to predict, especially for large objects. Corner key points, by contrast, lie closest to the boundary and carry a lot of precise local information. Missing corner information can hurt the prediction, and SaccadeNet fills exactly this gap to predict boundaries more accurately.

Experiments


Comparison with SOTA object detection algorithms.

Comparison of the Attn-Trans module and the Aggregation-Attn module.

Comparison of iteration counts for the Corner-Attn module.

Conclusion


SaccadeNet first localizes objects coarsely from center-point features, then fine-tunes each prediction box using the features at the corners and center of the preliminary box. The overall idea resembles a two-stage object detection algorithm, with the region features used for second-stage box refinement replaced by point features. SaccadeNet performs well in both accuracy and speed, and the overall design is well conceived.




