This paper proposes AABO, a hyperparameter optimization method built on Bayesian optimization and a sub-sampling method, which adaptively searches for the optimal anchor configuration. According to the experimental results, AABO improves the performance of SOTA object detection methods by 1.4%~2.4% simply by optimizing the anchor settings.

Source: Xiao Fei’s algorithm engineering notes public account

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling

  • Paper: arxiv.org/abs/2007.09…

Introduction


At present, mainstream object detection algorithms use anchor boxes of various shapes as initial predictions and then regress adjustments on top of them. The anchor box configuration is therefore a very important hyperparameter of a detector. In general, it is either specified by hand, such as the nine classic shapes of Faster R-CNN, or obtained by clustering the dataset with K-means, as YOLO does. To remove this manual tuning step, hyperparameter optimization (HPO) has attracted much research in recent years, the most effective approaches being Bayesian optimization (BO) and bandit-based strategies. After analyzing the advantages and disadvantages of current methods, this paper proposes AABO, an adaptive anchor box optimization method based on Bayesian optimization and a sub-sampling method, which can automatically and efficiently search for the optimal anchor setting and squeeze out the potential of mainstream object detection algorithms.

Related Methods

Here is a brief introduction to the three hyperparameter search methods frequently mentioned in the paper, to ease the understanding of what follows.

Bayesian optimization

Validating a set of hyperparameters usually requires training the model, which is very time-consuming. The core idea of Bayesian optimization is therefore to use a cheap surrogate, usually a probabilistic model, in place of the target model. Given the surrogate, an acquisition function selects a promising set of hyperparameters from the candidate space for evaluation; a good acquisition function balances exploitation and exploration. Evaluation means training and validating the target model as usual. Finally, the result is added to the observation data used to fit the surrogate, and the above steps are repeated.

The complete process of Bayesian optimization is shown above. In each iteration, a set of hyperparameters is obtained from the surrogate model and the acquisition function, then evaluated with the target model. Finally, the result is added to the observation set and the surrogate model is updated.
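The loop above can be sketched in a few lines of Python. This is a deliberately minimal toy, not the paper's implementation: the surrogate here is a nearest-neighbor predictor with a distance-based uncertainty (standing in for a real probabilistic model such as a GP or KDE), the acquisition is a lower-confidence-bound score, and `objective` is a hypothetical stand-in for an expensive train-and-validate cycle.

```python
import random

def objective(x):
    # Stand-in for an expensive train-and-validate cycle on the target model.
    return (x - 0.3) ** 2

def surrogate(x, history):
    # Toy surrogate: predicted value = value at the nearest observed point,
    # uncertainty = distance to that point (a stand-in for a GP/KDE model).
    x0, y0 = min(history, key=lambda h: abs(h[0] - x))
    return y0, abs(x0 - x)

def acquisition(x, history, kappa=1.0):
    # Lower-confidence-bound score: a low predicted value (exploitation)
    # or a high uncertainty (exploration) both make a candidate attractive.
    mean, unc = surrogate(x, history)
    return mean - kappa * unc

def bayes_opt(n_iter=30, seed=0):
    rng = random.Random(seed)
    x0 = rng.random()
    history = [(x0, objective(x0))]           # initial observation
    for _ in range(n_iter):
        candidates = [rng.random() for _ in range(100)]
        x = min(candidates, key=lambda c: acquisition(c, history))
        history.append((x, objective(x)))     # evaluate, then update the surrogate data
    return min(history, key=lambda h: h[1])
```

Even this crude surrogate quickly concentrates evaluations near the optimum at 0.3; a real implementation only differs in the quality of the surrogate and acquisition function.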

Hyperband

Bandit-based methods search for the optimal hyperparameters with efficient strategies under limited resources, where a resource can be time, number of iterations, etc. Hyperband is a classic bandit-based method. It extends the Successive Halving algorithm: each time, a batch of hyperparameter settings is selected for several rounds of iteration; in each round the budget $B$ is distributed evenly among the surviving settings for validation, and at the end of each round only the best $1/\eta$ of the settings are kept for the next round.

The complete process of Hyperband is shown above, where $R$ is the maximum resource that can be allocated to a single hyperparameter setting. The algorithm contains two loops: the outer loop controls the initial budget $r$ available to each setting and the number of settings $n$ to be evaluated, while the inner loop runs Successive Halving, increasing the budget of each surviving setting step by step and keeping the best $1/\eta$ of the settings after every round.
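The bracket schedule described above can be enumerated directly. The sketch below assumes the standard Hyperband formulas with $R$ an exact power of $\eta$; it only computes the (number of configurations, budget per configuration) pairs and does not run any actual training.

```python
import math

def hyperband_schedule(R=81, eta=3):
    """Enumerate Hyperband brackets. Each bracket starts n configurations
    with per-configuration budget r, then runs Successive Halving: every
    round multiplies the budget by eta and keeps the best 1/eta configs."""
    s_max = round(math.log(R, eta))          # assumes R == eta ** s_max
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * eta ** s)  # initial configs
        r = R / eta ** s                                 # initial budget each
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets
```

With $R=81$, $\eta=3$, the most aggressive bracket starts 81 settings at budget 1 and ends with 1 setting at budget 81, while the last bracket simply gives 5 settings the full budget.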

BOHB

In fact, the two classical methods above each have their own advantages and disadvantages: Bayesian optimization searches efficiently but easily falls into local optima, while Hyperband searches comprehensively but not efficiently enough. BOHB therefore combines Bayesian optimization and Hyperband for hyperparameter optimization.

The complete process of BOHB is shown above. It can be thought of simply as replacing Hyperband's random sampling with Bayesian-optimization-based sampling, and then adding Hyperband's evaluated hyperparameter settings and their results to the observation data used to update the surrogate model. Note that BOHB's surrogate is a multidimensional kernel density estimator (KDE), similar to TPE (Tree Parzen Estimator). As the paper points out, BOHB has a serious problem: samples that are hard to learn usually require long training cycles, but since BOHB relies on Hyperband for fast evaluation, it may fail to measure the true accuracy of a hyperparameter setting, biasing the final results.

Preliminary Analysis


Default Anchors Are Not Optimal

One hundred different anchor settings, each containing 3 scales and 3 aspect ratios, were randomly sampled and compared against the default anchor configuration of Faster R-CNN. The results are shown in the figure above, where the red line marks the performance of the default setting. As can be seen, the default setting is not optimal.

Anchors Influence More Than RPN Structure

BOHB is used to search over both the RPN head structure and the anchor settings. The search space for the RPN head is shown in the figure above.

The results are shown in the table above: the performance improvement from searching the anchor settings is noticeably higher than that from searching the RPN head.

Search Space Optimization for Anchors


By analyzing the distribution of the ground-truth bounding boxes, this paper designs a tight search space, based mainly on the following two observations.

Upper and Lower Limits of the Anchors

The paper statistically analyzes the sizes and aspect ratios of the targets in the COCO dataset and derives upper and lower bounds for the ratio:

The statistical results are shown in the figure above, where each blue dot is a target, the yellow lines are the upper and lower bounds, and the black rectangle in the middle is the search space used in the BOHB search experiment. As can be seen, part of that search space lies outside the bounds and is therefore wasted, so it is necessary to constrain the search space to lie between the upper and lower bounds. In addition, the five red boxes in the figure are the search spaces the paper assigns to each FPN level, which will be discussed below.
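The idea of deriving bounds from the data can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: it estimates scale and aspect-ratio limits by trimming the extreme tails of the ground-truth box statistics, then uses those limits to reject anchor candidates outside the valid region.

```python
def anchor_bounds(boxes, q=0.01):
    """Estimate upper/lower limits for anchor scale and aspect ratio from
    ground-truth boxes (a list of (w, h) pairs), dropping the extreme q
    quantile tails. A simplified stand-in for the paper's analysis."""
    scales = sorted((w * h) ** 0.5 for w, h in boxes)
    ratios = sorted(h / w for w, h in boxes)
    lo = int(len(boxes) * q)
    hi = len(boxes) - 1 - lo
    return (scales[lo], scales[hi]), (ratios[lo], ratios[hi])

def in_search_space(anchor, scale_lim, ratio_lim):
    # Reject anchor candidates that fall outside the data-driven bounds,
    # so the HPO algorithm never wastes evaluations on invalid regions.
    w, h = anchor
    s, r = (w * h) ** 0.5, h / w
    return scale_lim[0] <= s <= scale_lim[1] and ratio_lim[0] <= r <= ratio_lim[1]
```

For example, an anchor whose aspect ratio exceeds the observed upper bound would be filtered out before the search ever evaluates it.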

Adaptive Feature-Map-Wised Search Space

The outputs of each FPN level are statistically analyzed, with the results shown in the figure above. Different levels contain targets of different quantities and shapes: as the level index increases, the number of anchors decreases and the range of aspect ratios narrows.

Based on the above analysis, the paper designs an adaptive feature-map-wise search space for FPN: the regions between the five red boxes and the upper/lower bounds in Figure 4 are the search spaces of the corresponding FPN levels. The specific spaces are shown in the figure above. Each level has an independent search space, and the higher the level, the smaller the number of anchors and the narrower their size and aspect-ratio ranges. In fact, this adaptive search space as a whole is larger than the black-rectangle search space in Figure 4, while the smaller per-level spaces help the HPO algorithm search more intensively.
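Such a feature-map-wise search space might be encoded as a per-level dictionary like the sketch below. All the concrete numbers (base scale, anchor counts, ratio spans, level names P2–P6) are illustrative assumptions, not the values from the paper; only the shrinking-with-level structure reflects the analysis above.

```python
def fpn_search_space(levels=5, base_scale=32, base_anchors=6):
    """Hypothetical feature-map-wise search space: each FPN level gets its
    own ranges, with fewer anchors and tighter scale/ratio ranges as the
    level index grows. The numbers are illustrative, not the paper's."""
    space = {}
    for lvl in range(levels):
        n_anchors = max(base_anchors - lvl, 2)   # fewer anchors per level
        scale = base_scale * 2 ** lvl            # stride doubles per level
        ratio_span = 4.0 / (lvl + 1)             # ratio range shrinks per level
        space[f"P{lvl + 2}"] = {
            "num_anchors": range(2, n_anchors + 1),
            "scale_range": (scale * 0.5, scale * 1.5),
            "ratio_range": (1.0 / (1.0 + ratio_span), 1.0 + ratio_span),
        }
    return space
```

An HPO algorithm would then sample each level's anchor configuration independently from its own (narrower) ranges, rather than from one shared rectangle.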

Bayesian Anchor Optimization via Sub-sampling


The proposed search method, shown in Figure 7, combines BO and the sub-sampling method, which are responsible for selecting promising settings and for assigning different amounts of computation to different settings, respectively. The overall idea is similar to BOHB, with Hyperband replaced by the sub-sampling method.

Bayesian Optimization

In this paper's implementation, the BO module is similar to BOHB's, using TPE (Tree Parzen Estimator) as the kernel density model. TPE contains two probability density functions, $l(x) = p(y < \alpha \mid x, D)$ and $g(x) = p(y > \alpha \mid x, D)$, which model the probability of a good result and of a poor result respectively, where $D = \{(x_0, y_0), \cdots, (x_n, y_n)\}$ is the current observation set and $\alpha = \min\{y_0, \cdots, y_n\}$ is the best result observed so far. Sampling then selects the hyperparameter setting that maximizes $l(x)/g(x)$. Note that since Hyperband only guarantees the accuracy of its final output, the other results may be inaccurate because they were stopped early with insufficient resources; using them directly to train $g(x)$ would introduce a large error, so a better method is needed.
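The $l(x)/g(x)$ selection rule can be illustrated with a 1-D toy. This sketch uses a plain Gaussian KDE over scalar hyperparameters and a fixed good/bad split quantile `gamma`; BOHB's actual surrogate is a multivariate KDE with fitted bandwidths, so treat this only as a demonstration of the ratio criterion.

```python
import math
import random

def kde(x, points, bw=0.1):
    # Gaussian kernel density estimate over observed hyperparameter values.
    if not points:
        return 1e-12
    return sum(math.exp(-0.5 * ((x - p) / bw) ** 2) for p in points) / (len(points) * bw)

def tpe_suggest(observations, gamma=0.25, n_candidates=64, rng=None):
    """TPE-style suggestion: split (x, y) observations into good/bad by the
    gamma quantile of y, fit l(x) on the good side and g(x) on the bad side,
    and return the candidate maximizing l(x)/g(x). A 1-D toy sketch."""
    rng = rng or random.Random(0)
    obs = sorted(observations, key=lambda o: o[1])       # lower y = better
    n_good = max(1, int(gamma * len(obs)))
    good = [x for x, _ in obs[:n_good]]
    bad = [x for x, _ in obs[n_good:]]
    candidates = [rng.random() for _ in range(n_candidates)]
    return max(candidates, key=lambda c: kde(c, good) / kde(c, bad))
```

Given observations whose good results cluster around 0.3, the suggestion lands near that cluster: density under $l$ is high there while density under $g$ is low.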

Sub-Sample Method

Sub-sampling is also a bandit-based method: under limited resources, it tries to evaluate high-quality candidate hyperparameter settings as much as possible. Define $\mathcal{I} = \{1, 2, \cdots, K\}$ as the set of candidate settings; the sub-sampling method selects among them based on the observed data $Y^{(k)}_1, Y^{(k)}_2, \cdots$ ($1 \le k \le K$), where each $Y^{(k)}_j$ is the reward of the $j$-th observation of setting $k$.

The sub-sampling method is shown above. First, every hyperparameter setting is evaluated with the minimum budget $b$ to obtain a batch of observations; then, in each round, the setting that has consumed the most resources is selected as the leader. Any setting $k'$ that is judged superior to the leader $k$ by either of the following two rules is allocated an additional budget $b$.

The first rule is based on the number of observations: $c_n$ is a non-negative monotone threshold controlling the minimum number of observations of each setting, typically set to $\sqrt{\log n}$. The second rule is based on the recent rewards of the two settings: the higher the reward, the better.
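One round of this leader-versus-challenger comparison can be sketched as below. This is a simplification under stated assumptions: it compares overall mean rewards rather than the paper's recent-reward statistic, and the threshold $c_n = \sqrt{\log n}$ follows the description above.

```python
import math

def choose_arms(counts, means, n_round):
    """One round of the sub-sampling selection rule (simplified sketch).
    The leader is the setting with the most observations; a challenger k'
    receives budget if it is under-observed (fewer than c_n = sqrt(log n)
    pulls, rule 1) or its mean reward beats the leader's (rule 2)."""
    leader = max(counts, key=counts.get)
    c_n = math.sqrt(math.log(n_round))
    winners = []
    for k in counts:
        if k == leader:
            continue
        if counts[k] < c_n or means[k] >= means[leader]:
            winners.append(k)       # this setting is granted budget b
    return leader, winners
```

Repeating this round by round concentrates the budget on promising settings while still guaranteeing each setting a minimum number of observations.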

Experiment


The optimal anchor settings found on the COCO dataset.

Comparisons across different datasets.

Embedding AABO into SOTA methods for comparison.

Comparison experiments with other hyperparameter-search methods.

Conclusion


In this paper, the hyperparameter optimization method AABO is proposed. Its core is Bayesian optimization combined with a sub-sampling method, which adaptively searches for the optimal anchor settings. According to the experimental results, AABO improves the performance of SOTA object detection methods by 1.4%~2.4% just by optimizing the anchor settings.

References

  • BOHB – nni.readthedocs.io/useful/latest/T…




