
1. Definition of ensemble learning

Ensemble learning builds a strong classifier by combining several weak classifiers according to some strategy. A weak classifier is a classifier whose accuracy is only slightly better than random guessing, while a strong classifier has a much higher accuracy. “Strong” and “weak” are relative terms here. Some books also refer to weak classifiers as “base classifiers.”

At present, there are two main families of ensemble learning algorithms:

  • Bagging
  • Boosting

2. Bagging

Bagging, also known as bootstrap aggregating, repeatedly draws samples with replacement from the dataset according to a uniform probability distribution. Each new dataset is the same size as the original dataset. Since each sample in a new dataset is drawn at random with replacement from the original dataset, the new dataset may contain duplicate samples, and some samples from the original dataset may not appear in it at all. The bagging workflow is shown in the following figure:

Random sampling with replacement (bootstrap sampling) works as follows: for an original dataset of m samples, randomly select one sample, put it into the sample set, and then return it to the original dataset before drawing the next sample; repeat until the sample set contains m samples. This builds one sample set, and we can repeat the whole process to generate n such sample sets. In the end, the samples within each sample set may be repeated, some samples from the original dataset may never be selected, and the sample distribution may differ from one sample set to another.

On the n sample sets constructed by sampling with replacement, we train n weak classifiers separately, and then combine their outputs with some combination strategy (such as majority voting) to obtain the strong classifier we finally need.

The representative algorithm of the bagging family is the random forest. To be precise, a random forest is a specialized and enhanced version of bagging: specialized because its weak learners are decision trees, and enhanced because, on top of bagging's random sampling of instances, it also randomly selects a subset of the features. The basic idea still stays within the bagging framework.
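To make the procedure concrete, here is a minimal Python sketch of bootstrap sampling plus majority-vote bagging; the toy dataset, the choice of decision trees as weak learners, and the parameter values are illustrative assumptions rather than anything prescribed by the article:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy binary classification problem (purely for demonstration)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_estimators = 25            # n sample sets -> n weak classifiers
m = X_train.shape[0]         # size of the original training set
rng = np.random.default_rng(0)
learners = []

for _ in range(n_estimators):
    # Bootstrap sampling: draw m indices with replacement
    idx = rng.integers(0, m, size=m)
    tree = DecisionTreeClassifier(max_depth=3)
    tree.fit(X_train[idx], y_train[idx])
    learners.append(tree)

# Combination strategy: majority voting over the weak classifiers
votes = np.stack([t.predict(X_test) for t in learners])   # shape (n_estimators, n_test)
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)           # labels are 0/1 here
print("bagging accuracy:", (y_pred == y_test).mean())
```

In practice, scikit-learn's BaggingClassifier and RandomForestClassifier encapsulate this procedure (the latter adding the random feature selection described above).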

3. Boosting

Boosting is an iterative process that adaptively changes the distribution of the training samples so that the weak classifiers focus on the samples that are difficult to classify. It does this by assigning a weight to each training sample and automatically adjusting the weights at the end of each training round. The boosting workflow is shown in the figure below:

Boosting methods include AdaBoost, GBDT, and XGBoost.
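For a quick sense of how these methods are used in practice, here is a minimal scikit-learn sketch (the toy dataset and parameter values are arbitrary, and XGBoost is a separate package that has to be installed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# AdaBoost: reweights the training samples after every round
ada = AdaBoostClassifier(n_estimators=100, random_state=0)

# GBDT: each new tree fits the gradient of the loss on the current ensemble
gbdt = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("AdaBoost", ada), ("GBDT", gbdt)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())

# XGBoost offers a scikit-learn-style API once installed (pip install xgboost):
# from xgboost import XGBClassifier
# xgb = XGBClassifier(n_estimators=100)
```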

4. AdaBoost (short for Adaptive Boosting)

The algorithm steps are as follows. Suppose we have a training set of m samples (x_1, y_1), …, (x_m, y_m) with labels y_i ∈ {−1, +1}, and n weak learners whose predictions are h_1(x), h_2(x), …, h_n(x), indexed by t = 1, 2, …, n.

1. Calculate the sample weight

Each sample in the training set is given a weight, and these weights form a weight vector D, which is initialized with equal values: if there are m samples, each sample starts with the same weight 1/m.

2. Calculate the error rate

Train a weak classifier on the training set and calculate the error rate of the classifier:
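In the standard AdaBoost formulation, the error rate ε of the weak classifier h is its weighted misclassification rate under the current weight vector D (with uniform weights this reduces to the number of misclassified samples divided by the total number of samples):

$$
\varepsilon = \sum_{i=1}^{m} D(i)\,\mathbb{1}\big[h(x_i) \neq y_i\big]
$$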

3. Calculate the weight of the weak classifier. The current classifier is assigned a weight value alpha, which is computed as follows:
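The standard AdaBoost formula for alpha is

$$
\alpha = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon}{\varepsilon}\right)
$$

so the lower the error rate ε, the larger alpha, and the more say that weak classifier gets in the final combination.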

4. Adjust the sample weights according to the result of the last training round: the weight of a correctly classified sample decreases, and the weight of a misclassified sample increases. If the i-th sample is correctly classified, its weight is changed to:
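In the standard notation, where D_t(i) is the weight of sample i in round t and Z is a normalization factor that keeps the weights summing to 1:

$$
D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha}}{Z}
$$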

If the i-th sample is misclassified, its weight is changed to:
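In the same notation:

$$
D_{t+1}(i) = \frac{D_t(i)\, e^{\alpha}}{Z}
$$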

Combine the above two formulas into one:
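Because both the label y_i and the prediction h_t(x_i) take values in {−1, +1}, their product is +1 for a correct classification and −1 for a misclassification, which gives the single update rule of standard AdaBoost:

$$
D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha\, y_i\, h_t(x_i)}}{Z}
$$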

After that, another weak classifier is trained on the same dataset with the updated weights, and the above process is repeated until the training error rate reaches 0 or the number of weak classifiers reaches the specified value.

5. The final strong classifier

After the loop ends, we obtain the prediction of the strong classifier:
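In standard AdaBoost this final prediction is the sign of the alpha-weighted vote of all the weak classifiers:

$$
H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{n} \alpha_t\, h_t(x)\right)
$$

Putting the five steps together, here is a minimal from-scratch sketch in Python; the use of depth-1 decision trees (stumps) as weak classifiers, the toy dataset, and the parameter values are illustrative assumptions, not something specified by the article:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary problem with labels mapped to {-1, +1}
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)

m = X.shape[0]
n_rounds = 50
D = np.full(m, 1.0 / m)               # step 1: initialize all weights to 1/m
alphas, stumps = [], []

for t in range(n_rounds):
    # step 2: train a weak classifier on the weighted samples and compute its error rate
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=D)
    pred = stump.predict(X)
    eps = np.sum(D[pred != y])
    if eps >= 0.5:                    # no better than random guessing -> stop
        break
    # step 3: classifier weight alpha (small constant avoids division by zero)
    alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
    # step 4: reweight the samples and renormalize
    D = D * np.exp(-alpha * y * pred)
    D = D / D.sum()
    alphas.append(alpha)
    stumps.append(stump)
    if eps == 0:                      # training error is already 0 -> stop
        break

# step 5: strong classifier H(x) = sign(sum_t alpha_t * h_t(x))
def strong_classify(X_new):
    agg = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(agg)

print("training accuracy:", (strong_classify(X) == y).mean())
```

scikit-learn's AdaBoostClassifier implements essentially this procedure and is what you would normally reach for in practice.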