




Source: Xiaofei’s algorithm engineering notes WeChat official account

CentripetalNet: Pursuing High-Quality Keypoint Pairs for Object Detection

  • Paper: https://arxiv.org/abs/2003.09119
  • Code: https://github.com/KiveeDong/CentripetalNet

Introduction


CornerNet introduced a new paradigm for object detection: locating a target by detecting its corners. For corner matching, it learns an additional embedding vector per corner and matches corners whose embeddings are close. However, this paper argues that the embedding approach is not only hard to train, but also relies solely on the object's appearance and carries no positional information about the target, so for similar-looking objects the embedding vectors are hard to tell apart. As shown in Figure 1, similar objects lead to mismatched boxes. For this reason, the paper proposes CentripetalNet. Its core is a new corner matching method: each corner additionally learns a centripetal shift, and two corners are matched when their shifted results are close enough. Compared with embedding vectors, this method is more robust and more interpretable. In addition, the paper proposes cross-star deformable convolution, which accurately samples features at key positions for the corner-prediction scenario. Finally, an instance segmentation branch is added, extending the network to the instance segmentation task.

CentripetalNet


As shown in Figure 2, CentripetalNet consists of four modules:

  • Corner Prediction Module: generates candidate corners; this part is identical to CornerNet.
  • Centripetal Shift Module: predicts the centripetal shift of each corner and groups corners whose shifted results are close enough.
  • Cross-star Deformable Convolution: a deformable convolution adapted to the corner scenario that effectively enhances features at corner positions.
  • Instance Mask Head: similar to Mask R-CNN; adding an instance segmentation branch both improves detection performance and adds instance segmentation capability.

Centripetal Shift Module


Centripetal Shift

For a bounding box $bbox^i = (tlx^i, tly^i, brx^i, bry^i)$, the geometric center is $(ctx^i, cty^i) = (\frac{tlx^i + brx^i}{2}, \frac{tly^i + bry^i}{2})$. The centripetal shifts of the top-left and bottom-right corners are defined as:

The $\log$ function is used to compress the range of the centripetal shift values, which makes training easier. During training, since computing the centripetal shift of non-GT corners would have to be combined with the corner offsets and is therefore complicated, only the GT corner positions are supervised, using a smooth L1 loss on the centripetal shift, as shown in Figure a:
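The shift targets and the GT-only smooth L1 supervision described above can be sketched as follows. This is a minimal sketch, assuming a plain (unnormalised) log of the corner-to-center distance; the helper names are illustrative, not the paper's code:

```python
import math

def centripetal_shift_targets(tlx, tly, brx, bry):
    """Log-scaled centripetal shift targets for the two GT corners.

    Assumption: the shift points from each corner toward the box center
    and is log-scaled to compress its range, per the description above.
    """
    ctx, cty = (tlx + brx) / 2.0, (tly + bry) / 2.0
    # The top-left corner shifts right/down toward the center and the
    # bottom-right corner shifts left/up; both distances are positive,
    # so the log is well defined.
    cs_tl = (math.log(ctx - tlx), math.log(cty - tly))
    cs_br = (math.log(brx - ctx), math.log(bry - cty))
    return cs_tl, cs_br

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss, applied only at GT corner positions."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
```
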

Corner Matching

Corners belonging to the same object should predict centers that are close enough, so after obtaining the centripetal shifts and corner offsets, the corresponding predicted centers can be used to decide whether two corners belong together. First, corner pairs satisfying the geometric relation $tlx^j < brx^j \wedge tly^j < bry^j$ are combined into prediction boxes, each box's confidence being the mean of its two corners' confidences. Next, as shown in Figure c, a central region is defined for each prediction box:

$R_{central} = \{(x, y) \mid ctx - \frac{\mu w}{2} \le x \le ctx + \frac{\mu w}{2},\ cty - \frac{\mu h}{2} \le y \le cty + \frac{\mu h}{2}\}$, where $w$ and $h$ are the width and height of the prediction box.

$0 < \mu \le 1$ is the ratio of the central region's side length to the corresponding side length of the prediction box. From the centripetal shifts, the center $(tl_{ctx}, tl_{cty})$ predicted by the top-left corner and the center $(br_{ctx}, br_{cty})$ predicted by the bottom-right corner are computed. For prediction boxes satisfying $(tl^j_{ctx}, tl^j_{cty}) \in R^j_{central} \wedge (br^j_{ctx}, br^j_{cty}) \in R^j_{central}$, a weight is computed:

As Equation 5 shows, the closer the two centers predicted by a box's corners are, the higher the box's weight. Prediction boxes whose predicted centers do not satisfy the central-region condition have their weight set directly to 0. Finally, each box's confidence is multiplied by this weight to give the output score.
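The matching procedure above can be sketched as follows. This is a minimal illustration, assuming the weight simply decays with the normalised distance between the two predicted centers; the exact weighting formula in the paper (Equation 5) may differ:

```python
def match_corners(tl, br, tl_center, br_center, mu=0.5):
    """Decide whether a top-left / bottom-right corner pair forms a box.

    tl, br: (x, y) corner positions; tl_center, br_center: the centers
    each corner predicts via its centripetal shift.  Returns the box
    weight (0.0 means the pair is rejected).  Illustrative sketch only.
    """
    tlx, tly = tl
    brx, bry = br
    if not (tlx < brx and tly < bry):          # geometric validity check
        return 0.0
    w, h = brx - tlx, bry - tly
    ctx, cty = (tlx + brx) / 2.0, (tly + bry) / 2.0

    # Central region: a box of side mu * (w, h) around the geometric center.
    def in_central(p):
        return (abs(p[0] - ctx) <= mu * w / 2.0 and
                abs(p[1] - cty) <= mu * h / 2.0)

    if not (in_central(tl_center) and in_central(br_center)):
        return 0.0
    # Weight grows as the two predicted centers get closer (normalised by
    # the box diagonal so the weight is scale-invariant).
    dist = ((tl_center[0] - br_center[0]) ** 2 +
            (tl_center[1] - br_center[1]) ** 2) ** 0.5
    diag = (w * w + h * h) ** 0.5
    return 1.0 - dist / diag
```
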

Cross-star Deformable Convolution


To let the corners perceive positional information of the target, corner pooling propagates target information horizontally and vertically using max and sum operations. As a result, a "cross star" pattern appears in the output feature map. As shown in Figure 4a, the borders of the cross star contain rich context information. Extracting features along these borders requires not only a larger receptive field but also a sampling pattern adapted to this particular geometry. The paper therefore proposes cross-star deformable convolution.

However, not all border features are useful. For the top-left corner, the upper and left borders of its cross star lie outside the target and contribute little. The paper therefore uses a guiding shift to supervise the learning of the offset field, as shown in Figure b. The offsets are produced by three convolutional layers: the first two transform the output of corner pooling, and supervision comes from the following loss:

where $\delta$ is the guiding shift.

The third convolutional layer maps this feature to the final offset values, which embed both the context and the geometric information of the target.
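A minimal sketch of what the guiding-shift supervision target might look like, assuming (as the text suggests) that the shift points from the corner position toward the geometric center of its ground-truth box; `guiding_shift_target` is an illustrative name, not the paper's code:

```python
def guiding_shift_target(corner, box):
    """Supervision target delta for the offset field at a GT corner.

    Assumption: the guiding shift pulls the deformable sampling grid
    from the corner toward the object's center, so the convolution
    samples the informative cross-star borders inside the target
    rather than the background.
    """
    tlx, tly, brx, bry = box
    ctx, cty = (tlx + brx) / 2.0, (tly + bry) / 2.0
    return (ctx - corner[0], cty - corner[1])
```
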

The paper visualizes different sampling strategies; the proposed cross-star deformable convolution behaves as expected, with the sampling points for the top-left corner all falling on the lower and right borders of the cross star.

Instance Mask Head


To obtain instance segmentation results, the detections before soft-NMS are taken as candidate boxes, and masks are predicted with a fully convolutional network. To ensure the detection module provides valid candidate boxes, CentripetalNet is first pre-trained for several epochs; then RoIAlign is applied to the top-$k$ candidate boxes to extract features, four convolutional layers refine them, and a deconvolutional layer upsamples the result. During training, a cross-entropy loss is applied to each candidate box:
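The per-box mask loss can be sketched as follows, assuming a plain per-pixel binary cross-entropy over the upsampled mask logits; the function name and shapes here are illustrative, not the paper's code:

```python
import math

def mask_bce_loss(pred_logits, gt_mask):
    """Mean per-pixel binary cross-entropy for one candidate box's mask.

    pred_logits: H x W list-of-lists of raw logits from the mask head
    (after deconvolutional upsampling); gt_mask: matching binary crop.
    """
    total, n = 0.0, 0
    for row_p, row_g in zip(pred_logits, gt_mask):
        for logit, g in zip(row_p, row_g):
            p = 1.0 / (1.0 + math.exp(-logit))   # sigmoid
            p = min(max(p, 1e-7), 1.0 - 1e-7)    # clamp for log stability
            total += -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
            n += 1
    return total / n
```
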

Experiment


The complete loss function is:

$L_{det}$ and $L_{off}$ are the box loss and corner offset loss as defined in CornerNet, and $\alpha$ is set to 0.005.

Object detection performance comparison.

Instance segmentation performance comparison.

CornerNet / CenterNet / CentripetalNet visual comparison.

Conclusion


The core of CentripetalNet is its new corner matching method: an additional centripetal shift is learned, and corners whose shifted results are close enough are matched. Compared with matching by embedding vectors, this method is more robust and more interpretable. In addition, the cross-star deformable convolution proposed in the paper fits the corner-based detection scenario well and enhances the features at corner positions.






