AdaBoost is a more advanced "forest" of decision trees. Compared with a random forest, it has the following three characteristics:

  1. Each tree in AdaBoost has only one root node and two leaf nodes, so it is better called a stump
  2. Each stump in AdaBoost has a different weight, while every tree in a random forest has the same weight
  3. Samples misclassified by an earlier stump influence how the next stump is generated, so each stump complements the ones before it. This idea is known as boosting; besides AdaBoost, GBDT and XGBoost are also built on it (both have "Boost" in their names)

AdaBoost generation steps

Suppose we have the following training data and want to train a heart disease prediction model based on three features: "chest pain," "blocked arteries," and "body weight":

| Chest pain | Blocked arteries | Body weight | Heart disease |
| --- | --- | --- | --- |
| Yes | Yes | 205 | Yes |
| No | Yes | 180 | Yes |
| Yes | No | 210 | Yes |
| Yes | Yes | 167 | Yes |
| No | Yes | 156 | No |
| No | Yes | 125 | No |
| Yes | No | 168 | No |
| Yes | Yes | 172 | No |

First, we assign the same weight to each sample. Since there are only 8 samples, each sample's weight is 1/8:

| Chest pain | Blocked arteries | Body weight | Heart disease | Sample weight |
| --- | --- | --- | --- | --- |
| Yes | Yes | 205 | Yes | 1/8 |
| No | Yes | 180 | Yes | 1/8 |
| Yes | No | 210 | Yes | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| No | Yes | 156 | No | 1/8 |
| No | Yes | 125 | No | 1/8 |
| Yes | No | 168 | No | 1/8 |
| Yes | Yes | 172 | No | 1/8 |

Next, we use Gini impurity to find the most suitable root among the three features. After the calculation, splitting on "Body weight > 176" gives the lowest Gini impurity, so the root of the first stump is "Body weight > 176", as shown in the figure below:
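
To make the split selection concrete, here is a minimal sketch of that Gini impurity calculation in Python. The data is the 8-row table above; the helper functions are our own illustration, not from any library:

```python
# Check the "Body weight > 176" split using the 8 samples above.
# Labels: 1 = heart disease, 0 = no heart disease.
body_weight = [205, 180, 210, 167, 156, 125, 168, 172]
labels      = [1,   1,   1,   1,   0,   0,   0,   0]

def gini(group):
    """Gini impurity of a list of 0/1 labels."""
    if not group:
        return 0.0
    p = sum(group) / len(group)
    return 1 - p ** 2 - (1 - p) ** 2

def split_gini(threshold):
    """Size-weighted Gini impurity of splitting on body weight > threshold."""
    left  = [y for x, y in zip(body_weight, labels) if x > threshold]
    right = [y for x, y in zip(body_weight, labels) if x <= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(round(split_gini(176), 3))  # 0.2 -- the lowest impurity among the candidate splits
```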

After generating a stump, we take the samples it misclassified and add up their weights to get the stump's Total Error. The stump above misclassifies only one sample:

| Chest pain | Blocked arteries | Body weight | Heart disease | Sample weight |
| --- | --- | --- | --- | --- |
| Yes | Yes | 167 | Yes | 1/8 |

Then the Total Error of the stump is the weight of that misclassified sample, i.e. 0.125. From the Total Error we can calculate the Weight (sometimes called the "amount of say") of the stump:

Weight = 1/2 × ln((1 − Total Error) / Total Error)

The curve of this formula is shown in the figure below. The Total Error ranges from 0 to 1, and as the error increases, the Weight of the stump decreases. In our example the error is 0.125 and the corresponding Weight is 0.973, which is the position of the blue dot in the figure:
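
A quick numerical check of this formula, again as a small sketch of our own:

```python
import math

def stump_weight(total_error, eps=1e-10):
    """Weight (amount of say) = 0.5 * ln((1 - error) / error), clipped away from 0 and 1."""
    total_error = min(max(total_error, eps), 1 - eps)
    return 0.5 * math.log((1 - total_error) / total_error)

print(round(stump_weight(0.125), 3))  # 0.973 -- the blue dot in the figure
```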

After the first stump has been generated, the second one is produced, and each later stump depends on the errors of the stump before it. Specifically, we adjust the weight of every sample according to those errors, so that the next stump is trained on the new sample weights. For the samples the previous stump misclassified we increase the weights, and for the correctly classified samples we decrease them; the next stump is then more inclined to handle the previously misclassified samples well, complementing the previous one.

The Total Error is negatively correlated with the stump's Weight, and a higher Weight means more confidence in that stump. Therefore, compared with a stump whose Weight is lower, a high-Weight stump raises the weights of its misclassified samples more sharply and lowers the weights of its correctly classified samples more sharply. The adjustment curves for misclassified and correctly classified samples are shown in the left and right figures respectively:

For this example, the Weight of the first stump is 0.973. The misclassified sample's weight is increased according to the formula in the left figure, new weight = old weight × e^(stump Weight), while each correctly classified sample's weight is decreased according to the formula on the right, new weight = old weight × e^(−stump Weight). After normalization (dividing each new weight by the sum of all new weights), the weights of all samples are adjusted as follows:

| Sample | Old weight | New weight | After normalization |
| --- | --- | --- | --- |
| 1 | 1/8 | 0.05 | 0.07 |
| 2 | 1/8 | 0.05 | 0.07 |
| 3 | 1/8 | 0.05 | 0.07 |
| 4 | 1/8 | 0.33 | 0.49 |
| 5 | 1/8 | 0.05 | 0.07 |
| 6 | 1/8 | 0.05 | 0.07 |
| 7 | 1/8 | 0.05 | 0.07 |
| 8 | 1/8 | 0.05 | 0.07 |
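
To double-check the numbers in this table, here is a small sketch of the update and the normalization. Like the table, it rounds the intermediate weights to two decimals before normalizing (with full precision the normalized weights would come out as roughly 0.071 and 0.5):

```python
import math

say = 0.973                          # Weight (amount of say) of the first stump
old = [1 / 8] * 8                    # old sample weights
wrong = [i == 3 for i in range(8)]   # only sample 4 (index 3) was misclassified

# Increase the weight of the misclassified sample, decrease the others,
# rounding to two decimals as the table does.
new = [round(w * math.exp(say if m else -say), 2) for w, m in zip(old, wrong)]

# Normalize so the weights sum to 1 again.
total = sum(new)
normalized = [round(w / total, 2) for w in new]

print(new)         # [0.05, 0.05, 0.05, 0.33, 0.05, 0.05, 0.05, 0.05]
print(normalized)  # [0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07]
```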

Next, we need to train the next stump using the new sample weights. One approach is to resample according to the weights: before each sample is drawn, a random number between 0 and 1 is generated, and this number determines which sample is taken. For example, when the random number falls within [0, 0.07) the first sample is drawn; when it falls within [0.07, 0.14) the second sample is drawn, and so on.
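
Here is a minimal sketch of that weighted resampling; the interval boundaries are exactly the [0, 0.07), [0.07, 0.14), ... scheme described above (in practice, `random.choices(range(8), weights=weights, k=8)` does the same job). The helper name is our own:

```python
import random

# Normalized sample weights from the table above (sample 4 has the weight 0.49).
weights = [0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07]

def weighted_resample(weights, n, seed=0):
    """Draw n sample indices with replacement, each draw guided by one random number in [0, 1)."""
    rng = random.Random(seed)
    # Cumulative boundaries: 0.07, 0.14, 0.21, 0.70, ...
    bounds, total = [], 0.0
    for w in weights:
        total += w
        bounds.append(total)
    drawn = []
    for _ in range(n):
        r = rng.random() * total    # scale in case the weights don't sum exactly to 1
        drawn.append(next(i for i, b in enumerate(bounds) if r < b))
    return drawn

print(weighted_resample(weights, n=8))  # index 3 (the misclassified sample) tends to repeat
```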

After sampling, we assign equal weights to these samples again, as follows:

| Chest pain | Blocked arteries | Body weight | Heart disease | Sample weight |
| --- | --- | --- | --- | --- |
| No | Yes | 156 | No | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| No | Yes | 125 | No | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| Yes | Yes | 172 | No | 1/8 |
| Yes | Yes | 205 | Yes | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |

As you can see, the fourth sample of the original table (the one with the highest sample weight) has been drawn repeatedly. After training on this batch of data, the new stump will be more inclined to classify this sample correctly, because misclassifying a sample that appears several times incurs a larger penalty during training.

The next stump is built in exactly the same way as the first one, and the whole process repeats until all stumps have been generated.
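
Putting the steps together, here is a compact sketch of the whole training loop in Python. One simplification compared with the article: instead of resampling, this version plugs the sample weights directly into a weighted error, which is the more common way AdaBoost is implemented; all function and variable names are our own:

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """A stump predicts +1 when polarity * (x[feature] - threshold) > 0, else -1."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)

def best_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_stumps=5, eps=1e-10):
    """y must use labels +1/-1. Returns a list of (say, feature, threshold, polarity) stumps."""
    w = np.full(len(y), 1 / len(y))                      # step 1: equal sample weights
    stumps = []
    for _ in range(n_stumps):
        err, feature, threshold, polarity = best_stump(X, y, w)   # step 2: best stump
        say = 0.5 * np.log((1 - err + eps) / (err + eps))         # step 3: amount of say
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-say * y * pred)                  # step 4: raise wrong, lower correct
        w = w / w.sum()                                  # step 5: normalize, then repeat
        stumps.append((say, feature, threshold, polarity))
    return stumps
```

On the encoded table above (Yes/No as 1/0, heart disease as +1/−1), the first stump this loop picks is the body-weight split, matching the worked example: the threshold lands at 172, which separates the samples exactly like "Body weight > 176".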

AdaBoost prediction

Once AdaBoost has been built, how do we make predictions? The process is similar to a random forest in that every tree votes on the result, but the difference is that the vote here is weighted. For example, given a piece of data, suppose each tree's prediction is as follows:

| Tree | Tree Weight | Predicted result |
| --- | --- | --- |
| 1 | 0.97 | 1 |
| 2 | 0.34 | 0 |
| … | … | … |
| 100 | 0.46 | 1 |

To aggregate the votes, we add up the Weights of the trees that produced each predicted result:

| Predicted result | Sum of tree Weights |
| --- | --- |
| 1 | 43.7 |
| 0 | 20.1 |

So the prediction for this piece of data is 1.
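
As a small illustration of that aggregation step (the article only lists the weights of trees 1, 2 and 100 in its 100-tree example, so the three votes here are just those listed values):

```python
from collections import defaultdict

# (tree weight, predicted result) pairs -- the three example trees from the table above.
votes = [(0.97, 1), (0.34, 0), (0.46, 1)]

def weighted_vote(votes):
    """Sum the tree Weights per predicted result and return the result with the largest total."""
    totals = defaultdict(float)
    for weight, prediction in votes:
        totals[prediction] += weight
    return max(totals, key=totals.get)

print(weighted_vote(votes))  # 1 -- class 1 collects 1.43 of Weight versus 0.34 for class 0
```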

Conclusion

In this article, we walked through how AdaBoost is constructed. Compared with a random forest, AdaBoost has three characteristics:

  1. Each tree has only one root node and two leaf nodes (a stump)
  2. Each later tree is shaped by the errors of the tree before it
  3. Each tree has a different weight, and predictions are aggregated by weighted voting
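
In practice you rarely need to build this by hand. As a final sketch (exact defaults vary across scikit-learn versions), scikit-learn's AdaBoostClassifier reflects all three characteristics: its default base learner is a depth-1 decision tree (a stump), the stumps are trained sequentially, and the fitted model exposes per-stump weights used for the weighted vote. Here it is on the toy heart disease table from this article, with Yes/No encoded as 1/0:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Columns: chest pain, blocked arteries, body weight (Yes/No encoded as 1/0).
X = np.array([
    [1, 1, 205], [0, 1, 180], [1, 0, 210], [1, 1, 167],
    [0, 1, 156], [0, 1, 125], [1, 0, 168], [1, 1, 172],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])          # heart disease: Yes = 1, No = 0

# The default base estimator is a decision tree with max_depth=1, i.e. a stump.
model = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

# Per-stump weights ("amount of say"); note that with the older SAMME.R default
# algorithm these are all 1.0, while SAMME gives genuinely different weights.
print(model.estimator_weights_)
print(model.predict([[1, 1, 190]]))             # weighted vote of all the stumps
```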

Reference: AdaBoost, Clearly Explained

Related articles:

  • Random forests of decision trees