ExtremeNet detects the four extreme points of an object (topmost, bottommost, leftmost, and rightmost) and then groups them geometrically to form a detection, reaching accuracy comparable to that of other mainstream detection algorithms. The approach is distinctive, but it relies on several post-processing steps, so there is still considerable room for improvement. If you are interested, see the error analysis in the experiments section of the paper.





Source: Xiaofei’s Algorithm Engineering Notes (WeChat official account)

Bottom-up Object Detection by Grouping Extreme and Center Points

  • Paper: https://arxiv.org/abs/1901.08043
  • Code: https://github.com/xingyizhou/ExtremeNet

Introduction


In object detection, the usual approach represents an object as a rectangular box, which often includes a lot of background that hinders detection. The paper therefore proposes ExtremeNet, which localizes an object by detecting its four extreme points, as shown in Figure 1. The overall algorithm builds on the idea of CornerNet. Five heatmaps predict the four extreme points and the center region of the object. Extreme points from the different heatmaps are combined, and a combination is accepted or rejected according to the value of the center heatmap at the combination's geometric center. In addition, the extreme points detected by ExtremeNet can be passed to the DEXTR network to predict a segmentation mask for the object.

ExtremeNet for Object detection


ExtremeNet uses an Hourglass network to detect class-aware keypoints and follows CornerNet's training procedure, loss function, and offset prediction. The offset prediction is class-agnostic, and no offset is predicted for the center point. The backbone outputs $5\times C$ heatmaps and $4\times 2$ offset maps, where $C$ is the number of categories. The overall structure and outputs are shown in Figure 3. Once the extreme points have been extracted, they are grouped according to their geometric relationships.

Center Grouping

The extreme points lie on different sides of the object, which makes grouping them complicated. The paper argues that grouping with embedding vectors, as CornerNet does, lacks global information, so it proposes Center Grouping instead.

The Center Grouping procedure is shown in Algorithm 1. First, the peaks on the four extreme-point heatmaps are extracted. A peak must meet two requirements: 1) its value is greater than the threshold $\tau_p$, and 2) it is a local maximum within the 3×3 window formed with its eight neighbors. With the peaks in hand, every combination of peaks ($t$, $b$, $r$, $l$), one from each heatmap, is traversed. For each combination that satisfies the geometric relationship of the four extreme points, the geometric center $c = (\frac{l_x + r_x}{2}, \frac{t_y + b_y}{2})$ is computed. If the center heatmap value at that point satisfies $\hat{Y}^{(c)}_{c_x, c_y} \ge \tau_c$, the combination is accepted as a detection.
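The procedure above can be sketched in a few lines of NumPy. This is a minimal single-class illustration, not the authors' implementation: the function names `extract_peaks` and `center_grouping` are made up for this note, the threshold defaults are illustrative, image coordinates are assumed (y grows downward), and rounding the center with integer division and scoring a box by the mean of its five heatmap values are my own assumptions.

```python
import numpy as np


def extract_peaks(heatmap, tau_p=0.1):
    """Return (x, y, score) for pixels above tau_p that are 3x3 local maxima."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each 3x3 window, built from nine shifted views of the map.
    window_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    # Ties count as maxima here (>=), which is a simplification.
    ys, xs = np.nonzero((heatmap > tau_p) & (heatmap >= window_max))
    return [(int(x), int(y), float(heatmap[y, x])) for x, y in zip(xs, ys)]


def center_grouping(top, bottom, left, right, center, tau_p=0.1, tau_c=0.1):
    """Brute-force grouping of peaks into boxes (x1, y1, x2, y2, score)."""
    t_peaks = extract_peaks(top, tau_p)
    b_peaks = extract_peaks(bottom, tau_p)
    l_peaks = extract_peaks(left, tau_p)
    r_peaks = extract_peaks(right, tau_p)
    boxes = []
    for tx, ty, ts in t_peaks:
        for bx, by, bs in b_peaks:
            if ty > by:  # top must not lie below bottom
                continue
            for lx, ly, ls in l_peaks:
                if lx > tx or lx > bx or not (ty <= ly <= by):
                    continue
                for rx, ry, rs in r_peaks:
                    if rx < tx or rx < bx or not (ty <= ry <= by):
                        continue
                    # Geometric center of the candidate box (assumed rounding).
                    cx, cy = (lx + rx) // 2, (ty + by) // 2
                    cs = float(center[cy, cx])
                    if cs >= tau_c:
                        score = (ts + bs + ls + rs + cs) / 5.0
                        boxes.append((lx, ty, rx, by, score))
    return boxes
```

The four nested loops make the cost grow with the fourth power of the number of peaks per heatmap, so a practical version would keep only the strongest peaks before grouping.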

Ghost box suppression

When three objects of the same size are equally spaced along a line, Center Grouping may produce a high-confidence false positive. For the middle object there are two possible outcomes: the correct box, or a larger box wrongly assembled from the extreme points of the neighboring objects. The paper calls the second kind of prediction a ghost box. To handle this case, the paper adds a soft-NMS-style post-processing step: if the sum of the confidences of all prediction boxes contained in a prediction box exceeds three times that box's own confidence, its confidence is divided by two. The regular NMS operation is then carried out.
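The rescoring rule is simple enough to write down directly. The sketch below is only an illustration: the function name `suppress_ghost_boxes` and its parameters are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)`, "contained" is taken to mean fully inside, and a box is assumed not to count as being inside itself.

```python
import numpy as np


def suppress_ghost_boxes(boxes, scores, ratio=3.0, penalty=0.5):
    """Halve the score of any box whose contained boxes outweigh it."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    new_scores = scores.copy()
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # Boxes lying completely inside box i.
        inside = (
            (boxes[:, 0] >= x1) & (boxes[:, 1] >= y1)
            & (boxes[:, 2] <= x2) & (boxes[:, 3] <= y2)
        )
        inside[i] = False  # assumption: a box is not inside itself
        # Compare against the original scores, not the updated ones.
        if scores[inside].sum() > ratio * scores[i]:
            new_scores[i] = scores[i] * penalty
    return new_scores
```

With three small boxes scored 0.9 each and one large box scored 0.8 that encloses all of them, the contained sum 2.7 exceeds 3 × 0.8 = 2.4, so only the large box is down-weighted before NMS runs.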

Edge aggregation

Extreme points are not always unique. If the object has a horizontal or vertical edge, every point on that edge is an extreme point, and the network's response is spread weakly along the edge rather than forming a single strong peak, which may cause the extreme point to be missed.

Edge aggregation addresses this case. For each local maximum on the left and right heatmaps, scores are aggregated in the vertical direction; for each local maximum on the top and bottom heatmaps, scores are aggregated in the horizontal direction. Monotonically decreasing scores are aggregated along the corresponding direction until a local minimum is reached. Let $m$ be a local maximum and $N^{(m)}_i = \hat{Y}_{m_x + i, m_y}$ the scores along the horizontal direction. Let $i_0 < 0$ and $0 < i_1$ be the nearest local minima on the two sides, i.e. $N^{(m)}_{i_0 - 1} > N^{(m)}_{i_0}$ and $N^{(m)}_{i_1} < N^{(m)}_{i_1 + 1}$. The peak value is then updated to $\tilde{Y}_m = \hat{Y}_m + \lambda_{aggr} \sum^{i_1}_{i = i_0} N^{(m)}_i$, where the aggregation weight $\lambda_{aggr}$ is set to 0.1. The overall effect is shown in Figure 4.
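To make the update rule concrete, here is a small sketch for a single peak. It follows the formula as written in this note rather than the released code: the function name `aggregate_peak` is hypothetical, the heatmap is indexed as `heatmap[y, x]`, and the walk is clamped at the image border, which the formula does not cover.

```python
import numpy as np


def aggregate_peak(heatmap, mx, my, horizontal=True, lam=0.1):
    """Aggregated score of the peak at (mx, my) along one direction."""
    # 1-D slice through the peak: a row for top/bottom maps,
    # a column for left/right maps.
    line = heatmap[my, :] if horizontal else heatmap[:, mx]
    pos = mx if horizontal else my
    # Walk outward while scores do not increase; stop at the nearest
    # local minimum on each side (or at the border).
    lo = pos
    while lo > 0 and line[lo - 1] <= line[lo]:
        lo -= 1
    hi = pos
    while hi < len(line) - 1 and line[hi + 1] <= line[hi]:
        hi += 1
    # The sum runs from i_0 to i_1 and includes the peak itself.
    return float(heatmap[my, mx] + lam * line[lo:hi + 1].sum())
```

For a row `[0.0, 0.3, 0.2, 0.4, 0.5, 0.4, 0.1, 0.2]` with the peak at index 4, the walk stops at indices 2 and 6, the sum is 1.6, and the aggregated score is 0.5 + 0.1 × 1.6 = 0.66.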

Extreme Instance Segmentation

Extreme points carry more information about the object than a bounding box; after all, they provide twice as many annotated values (8 vs. 4). Based on the four extreme points and the bounding box, the paper proposes a simple way to obtain a coarse mask of the object. First, each extreme point is extended along its bounding-box edge into a segment centered on the point, with a length of 1/4 of that edge; if the segment extends beyond the bounding box, it is truncated. Then the four segments are connected end to end to form an octagon, as shown in Figure 1. Finally, DEXTR (Deep Extreme Cut), a network that converts extreme points into a segmentation mask, is used to refine the result. Here, the region of each predicted box is cropped and fed, together with the predicted extreme points, into a pre-trained DEXTR network.
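The octagon construction can be sketched as follows. This is an illustration under stated assumptions: the function name `extreme_points_to_octagon` is made up for this note, each point is an `(x, y)` pair in image coordinates (y grows downward), and the vertices are returned clockwise starting from the top edge.

```python
def extreme_points_to_octagon(top, bottom, left, right):
    """Return the 8 octagon vertices built from four extreme points."""
    (tx, _ty), (bx, _by), (_lx, ly), (_rx, ry) = top, bottom, left, right
    # Bounding box implied by the extreme points.
    x1, y1, x2, y2 = left[0], top[1], right[0], bottom[1]
    # Each segment is 1/4 of its edge, so it reaches 1/8 to either side.
    dx = (x2 - x1) / 8.0
    dy = (y2 - y1) / 8.0

    def clip_x(v):
        # Truncate at the box corners.
        return min(max(v, x1), x2)

    def clip_y(v):
        return min(max(v, y1), y2)

    return [
        (clip_x(tx - dx), y1), (clip_x(tx + dx), y1),  # top edge, left to right
        (x2, clip_y(ry - dy)), (x2, clip_y(ry + dy)),  # right edge, downward
        (clip_x(bx + dx), y2), (clip_x(bx - dx), y2),  # bottom edge, right to left
        (x1, clip_y(ly + dy)), (x1, clip_y(ly - dy)),  # left edge, upward
    ]
```

When an extreme point sits near a corner, the clipped end of its segment coincides with that corner, so the octagon degenerates gracefully into a shape with fewer distinct vertices.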

Experiments


The paper also performs an error analysis on ExtremeNet, replacing the output of each module with the ground truth (GT) in turn, which eventually reaches 86.0 AP.

Comparison with other state-of-the-art methods.

Instance segmentation results.

Conclusion


ExtremeNet detects the four extreme points of an object and then groups them geometrically to form a detection, reaching accuracy comparable to that of other mainstream detection algorithms. The approach is distinctive, but it relies on several post-processing steps, so there is still considerable room for improvement. If you are interested, see the error analysis in the experiments section of the paper.




