AdaBoost is a more advanced "forest" of decision trees. Compared with a random forest, it has the following three characteristics:

  1. Each tree in AdaBoost has only one root node and two leaf nodes, so it is better called a stump
  2. Each stump in AdaBoost has a different weight, while every tree in a random forest has the same weight
  3. Samples misclassified by an earlier stump influence how the next stump is generated, so each stump complements the ones before it. This idea is known as boosting; besides AdaBoost, GBDT and XGBoost are also built on it (both have "Boost" in their names)

AdaBoost generation steps

Suppose we have the following training data and want to train a heart disease prediction model based on three features: "chest pain," "blocked arteries," and "body weight":

| Chest pain | Blocked arteries | Body weight | Heart disease |
| --- | --- | --- | --- |
| Yes | Yes | 205 | Yes |
| No | Yes | 180 | Yes |
| Yes | No | 210 | Yes |
| Yes | Yes | 167 | Yes |
| No | Yes | 156 | No |
| No | Yes | 125 | No |
| Yes | No | 168 | No |
| Yes | Yes | 172 | No |

First, we assign the same weight to each sample. Since there are only 8 samples, each sample's weight is 1/8:

| Chest pain | Blocked arteries | Body weight | Heart disease | Sample weight |
| --- | --- | --- | --- | --- |
| Yes | Yes | 205 | Yes | 1/8 |
| No | Yes | 180 | Yes | 1/8 |
| Yes | No | 210 | Yes | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| No | Yes | 156 | No | 1/8 |
| No | Yes | 125 | No | 1/8 |
| Yes | No | 168 | No | 1/8 |
| Yes | Yes | 172 | No | 1/8 |

Next, we use Gini impurity to find the most suitable root among the three features. After the calculation, splitting on "Body weight > 176" gives the lowest Gini impurity, so the root of the first stump is "Body weight > 176", as shown in the figure below:
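
To make the split selection concrete, here is a minimal sketch of that Gini impurity calculation in Python. The data is the 8-row table above; the helper functions are our own illustration, not from any library:

```python
# Check the "Body weight > 176" split using the 8 samples above.
# Labels: 1 = heart disease, 0 = no heart disease.
body_weight = [205, 180, 210, 167, 156, 125, 168, 172]
labels      = [1,   1,   1,   1,   0,   0,   0,   0]

def gini(group):
    """Gini impurity of a list of 0/1 labels."""
    if not group:
        return 0.0
    p = sum(group) / len(group)
    return 1 - p ** 2 - (1 - p) ** 2

def split_gini(threshold):
    """Size-weighted Gini impurity of splitting on body weight > threshold."""
    left  = [y for x, y in zip(body_weight, labels) if x > threshold]
    right = [y for x, y in zip(body_weight, labels) if x <= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(round(split_gini(176), 3))  # 0.2 -- the lowest impurity among the candidate splits
```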

After generating a stump, we take the samples it misclassified and add up their weights to get the stump's Total Error. The stump above misclassifies only one sample:

| Chest pain | Blocked arteries | Body weight | Heart disease | Sample weight |
| --- | --- | --- | --- | --- |
| Yes | Yes | 167 | Yes | 1/8 |

Then the Total Error of the stump is the weight of that misclassified sample, i.e. 0.125. From the Total Error we can calculate the Weight (sometimes called the "amount of say") of the stump:

Weight = 1/2 × ln((1 − Total Error) / Total Error)

The curve of this formula is shown in the figure below. The Total Error ranges from 0 to 1, and as the error increases, the Weight of the stump decreases. In our example the error is 0.125 and the corresponding Weight is 0.973, which is the position of the blue dot in the figure:
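
A quick numerical check of this formula, again as a small sketch of our own:

```python
import math

def stump_weight(total_error, eps=1e-10):
    """Weight (amount of say) = 0.5 * ln((1 - error) / error), clipped away from 0 and 1."""
    total_error = min(max(total_error, eps), 1 - eps)
    return 0.5 * math.log((1 - total_error) / total_error)

print(round(stump_weight(0.125), 3))  # 0.973 -- the blue dot in the figure
```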

After the first stump has been generated, the second one is produced, and each later stump depends on the errors of the stump before it. Specifically, we adjust the weight of every sample according to those errors, so that the next stump is trained on the new sample weights. For the samples the previous stump misclassified we increase the weights, and for the correctly classified samples we decrease them; the next stump is then more inclined to handle the previously misclassified samples well, complementing the previous one.

The Total Error is negatively correlated with the stump's Weight, and a higher Weight means more confidence in that stump. Therefore, compared with a stump whose Weight is lower, a high-Weight stump raises the weights of its misclassified samples more sharply and lowers the weights of its correctly classified samples more sharply. The adjustment curves for misclassified and correctly classified samples are shown in the left and right figures respectively:

For this example, the Weight of the first stump is 0.973. The misclassified sample's weight is increased according to the formula in the left figure, new weight = old weight × e^(stump Weight), while each correctly classified sample's weight is decreased according to the formula on the right, new weight = old weight × e^(−stump Weight). After normalization (dividing each new weight by the sum of all new weights), the weights of all samples are adjusted as follows:

| Sample | Old weight | New weight | After normalization |
| --- | --- | --- | --- |
| 1 | 1/8 | 0.05 | 0.07 |
| 2 | 1/8 | 0.05 | 0.07 |
| 3 | 1/8 | 0.05 | 0.07 |
| 4 | 1/8 | 0.33 | 0.49 |
| 5 | 1/8 | 0.05 | 0.07 |
| 6 | 1/8 | 0.05 | 0.07 |
| 7 | 1/8 | 0.05 | 0.07 |
| 8 | 1/8 | 0.05 | 0.07 |
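
To double-check the numbers in this table, here is a small sketch of the update and the normalization. Like the table, it rounds the intermediate weights to two decimals before normalizing (with full precision the normalized weights would come out as roughly 0.071 and 0.5):

```python
import math

say = 0.973                          # Weight (amount of say) of the first stump
old = [1 / 8] * 8                    # old sample weights
wrong = [i == 3 for i in range(8)]   # only sample 4 (index 3) was misclassified

# Increase the weight of the misclassified sample, decrease the others,
# rounding to two decimals as the table does.
new = [round(w * math.exp(say if m else -say), 2) for w, m in zip(old, wrong)]

# Normalize so the weights sum to 1 again.
total = sum(new)
normalized = [round(w / total, 2) for w in new]

print(new)         # [0.05, 0.05, 0.05, 0.33, 0.05, 0.05, 0.05, 0.05]
print(normalized)  # [0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07]
```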

Next, we need to train the next stump using the new sample weights. One approach is to resample according to the weights: before each sample is drawn, a random number between 0 and 1 is generated, and this number determines which sample is taken. For example, when the random number falls within [0, 0.07) the first sample is drawn; when it falls within [0.07, 0.14) the second sample is drawn, and so on.
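
Here is a minimal sketch of that weighted resampling; the interval boundaries are exactly the [0, 0.07), [0.07, 0.14), ... scheme described above (in practice, `random.choices(range(8), weights=weights, k=8)` does the same job). The helper name is our own:

```python
import random

# Normalized sample weights from the table above (sample 4 has the weight 0.49).
weights = [0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.07]

def weighted_resample(weights, n, seed=0):
    """Draw n sample indices with replacement, each draw guided by one random number in [0, 1)."""
    rng = random.Random(seed)
    # Cumulative boundaries: 0.07, 0.14, 0.21, 0.70, ...
    bounds, total = [], 0.0
    for w in weights:
        total += w
        bounds.append(total)
    drawn = []
    for _ in range(n):
        r = rng.random() * total    # scale in case the weights don't sum exactly to 1
        drawn.append(next(i for i, b in enumerate(bounds) if r < b))
    return drawn

print(weighted_resample(weights, n=8))  # index 3 (the misclassified sample) tends to repeat
```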

After sampling, we assign equal weights to these samples again, as follows:

| Chest pain | Blocked arteries | Body weight | Heart disease | Sample weight |
| --- | --- | --- | --- | --- |
| No | Yes | 156 | No | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| No | Yes | 125 | No | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |
| Yes | Yes | 172 | No | 1/8 |
| Yes | Yes | 205 | Yes | 1/8 |
| Yes | Yes | 167 | Yes | 1/8 |

As you can see, the fourth sample of the original table (the one with the highest sample weight) has been drawn repeatedly. After training on this batch of data, the new stump will be more inclined to classify this sample correctly, because misclassifying a sample that appears several times incurs a larger penalty during training.

The next stump is built in exactly the same way as the first one, and the whole process repeats until all stumps have been generated.
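
Putting the steps together, here is a compact sketch of the whole training loop in Python. One simplification compared with the article: instead of resampling, this version plugs the sample weights directly into a weighted error, which is the more common way AdaBoost is implemented; all function and variable names are our own:

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """A stump predicts +1 when polarity * (x[feature] - threshold) > 0, else -1."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)

def best_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_stumps=5, eps=1e-10):
    """y must use labels +1/-1. Returns a list of (say, feature, threshold, polarity) stumps."""
    w = np.full(len(y), 1 / len(y))                      # step 1: equal sample weights
    stumps = []
    for _ in range(n_stumps):
        err, feature, threshold, polarity = best_stump(X, y, w)   # step 2: best stump
        say = 0.5 * np.log((1 - err + eps) / (err + eps))         # step 3: amount of say
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-say * y * pred)                  # step 4: raise wrong, lower correct
        w = w / w.sum()                                  # step 5: normalize, then repeat
        stumps.append((say, feature, threshold, polarity))
    return stumps
```

On the encoded table above (Yes/No as 1/0, heart disease as +1/−1), the first stump this loop picks is the body-weight split, matching the worked example: the threshold lands at 172, which separates the samples exactly like "Body weight > 176".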

AdaBoost prediction

Once AdaBoost has been built, how do we make predictions? The process is similar to a random forest in that every tree votes on the result, but the difference is that the vote here is weighted. For example, given a piece of data, suppose each tree's prediction is as follows:

| Tree | Tree Weight | Predicted result |
| --- | --- | --- |
| 1 | 0.97 | 1 |
| 2 | 0.34 | 0 |
| … | … | … |
| 100 | 0.46 | 1 |

To aggregate the votes, we add up the Weights of the trees that produced each predicted result:

| Predicted result | Sum of tree Weights |
| --- | --- |
| 1 | 43.7 |
| 0 | 20.1 |

So the prediction for this piece of data is 1.
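
As a small illustration of that aggregation step (the article only lists the weights of trees 1, 2 and 100 in its 100-tree example, so the three votes here are just those listed values):

```python
from collections import defaultdict

# (tree weight, predicted result) pairs -- the three example trees from the table above.
votes = [(0.97, 1), (0.34, 0), (0.46, 1)]

def weighted_vote(votes):
    """Sum the tree Weights per predicted result and return the result with the largest total."""
    totals = defaultdict(float)
    for weight, prediction in votes:
        totals[prediction] += weight
    return max(totals, key=totals.get)

print(weighted_vote(votes))  # 1 -- class 1 collects 1.43 of Weight versus 0.34 for class 0
```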

Conclusion

In this article, we walked through how AdaBoost is constructed. Compared with a random forest, AdaBoost has three characteristics:

  1. Each tree has only one root node and two leaf nodes (a stump)
  2. Each later tree is shaped by the errors of the tree before it
  3. Each tree has a different weight, and predictions are aggregated by weighted voting
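
In practice you rarely need to build this by hand. As a final sketch (exact defaults vary across scikit-learn versions), scikit-learn's AdaBoostClassifier reflects all three characteristics: its default base learner is a depth-1 decision tree (a stump), the stumps are trained sequentially, and the fitted model exposes per-stump weights used for the weighted vote. Here it is on the toy heart disease table from this article, with Yes/No encoded as 1/0:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Columns: chest pain, blocked arteries, body weight (Yes/No encoded as 1/0).
X = np.array([
    [1, 1, 205], [0, 1, 180], [1, 0, 210], [1, 1, 167],
    [0, 1, 156], [0, 1, 125], [1, 0, 168], [1, 1, 172],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])          # heart disease: Yes = 1, No = 0

# The default base estimator is a decision tree with max_depth=1, i.e. a stump.
model = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

# Per-stump weights ("amount of say"); note that with the older SAMME.R default
# algorithm these are all 1.0, while SAMME gives genuinely different weights.
print(model.estimator_weights_)
print(model.predict([[1, 1, 190]]))             # weighted vote of all the stumps
```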

Reference: AdaBoost, Clearly Explained

Related articles:

  • Random forests of decision trees