This is the 22nd day of my participation in the August Writing Challenge. More challenges in August!

1. Introduction

Separating blue dots from yellow dots with a single straight line perpendicular to the x or y axis is unlikely to achieve 100% accuracy, no matter how the line is chosen. When the perceptron was proposed, it gave us a way of solving linear problems, but it was powerless against the XOR problem. Later, the activation function was introduced, solving XOR and breathing new life into the perceptron. So what do you do when a single line cannot divide the categories correctly? Just as the activation function rescued the perceptron, the methods in this article take another route: combining many weak classifiers into a strong one.

2. Bagging method

Bagging training process:

  1. From a data set with X training examples, randomly draw m samples (the samples are returned to the pool after the basic classifier C has been trained on them)
  2. Train on the m randomly drawn samples to form a basic classifier C1 with small error
  3. Assign a weight W1 to the basic classifier C1
  4. Return to step 1 and draw m samples again; finally, the basic classifiers are combined linearly according to their weights to form a strong classifier (a minimal sketch of this procedure follows the list)
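A minimal sketch of this procedure in Python, assuming decision trees as the basic classifier and equal weights (simple majority vote) for the combination step; names such as `n_rounds` and the toy data are illustrative, not taken from the original:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the original training set
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

n_rounds = 10   # number of basic classifiers T
m = len(X)      # bootstrap sample size, usually equal to the training-set size
classifiers = []

rng = np.random.default_rng(0)
for _ in range(n_rounds):
    # Step 1: draw m samples with replacement (bootstrap sample)
    idx = rng.integers(0, len(X), size=m)
    # Step 2: train a basic classifier on the bootstrap sample
    clf = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    classifiers.append(clf)

# Steps 3-4: combine the classifiers; here every classifier gets the same weight,
# so the combination reduces to a majority vote over 0/1 predictions
votes = np.array([clf.predict(X) for clf in classifiers])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the bagged ensemble:", (ensemble_pred == y).mean())
```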


Characteristics of Bagging:

A. For each classifier, the input data are samples drawn with replacement from the original training data. The inputs of the different classifiers follow the same distribution and are independent of each other. In Boosting, by contrast, the distributions of the training data across rounds are not independent, and neither are the input samples of the individual classifiers.

B. The classifiers can use the same algorithm with different hyperparameters, or different algorithms altogether;

C. The outputs of the classifiers carry no weights and count equally.

Its defining feature is “random sampling”. So what is random sampling?

Random sampling (bootstrap) means drawing a fixed number of samples from our training set, putting each sample back after it is drawn. In other words, a sample collected earlier may be collected again later. For the Bagging algorithm, we generally draw as many samples as there are in the training set, m. The resulting sample set then has the same size as the training set, but different contents. If we take T bootstrap samples of size m from the training set, the T sample sets will all differ from one another because of randomness.
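A quick illustration of bootstrap sampling with NumPy (the toy training set of ten integers is made up for demonstration): repeated draws of size m differ from one another, contain duplicates, and omit some of the original samples.

```python
import numpy as np

rng = np.random.default_rng(42)
train = np.arange(10)   # a toy training set with m = 10 samples

# Three bootstrap samples of size m: drawn with replacement, so duplicates
# appear and some original samples are missing from each sample set.
for t in range(3):
    sample = rng.choice(train, size=len(train), replace=True)
    print(f"bootstrap sample {t + 1}: {sorted(sample)}")
```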


In addition, Bagging can weight and combine the classifier outputs in several ways:

  1. The simplest is plurality voting, i.e. the class that receives the most votes wins
  2. A slightly more complicated scheme is absolute-majority voting: on top of plurality voting, the winning class must receive not only the most votes but also more than half of all votes, otherwise the prediction is rejected.
  3. More complicated still is weighted voting. As with a weighted average, the vote of each weak learner is multiplied by a weight; the weighted votes for each class are then summed, and the class with the largest total is the final prediction (see the sketch after this list).
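A minimal sketch of the three voting rules; the class labels, learner weights, and the use of -1 as a "reject" marker are illustrative choices, not from the original:

```python
import numpy as np

def plurality_vote(preds):
    """Plurality voting: the label with the most votes wins."""
    labels, counts = np.unique(preds, return_counts=True)
    return labels[np.argmax(counts)]

def absolute_majority_vote(preds, reject=-1):
    """Absolute-majority voting: the winner must also collect more than half of all votes."""
    labels, counts = np.unique(preds, return_counts=True)
    best = np.argmax(counts)
    return labels[best] if counts[best] > len(preds) / 2 else reject

def weighted_vote(preds, weights):
    """Weighted voting: add each learner's weight to its predicted class, pick the largest total."""
    totals = {}
    for p, w in zip(preds, weights):
        totals[p] = totals.get(p, 0.0) + w
    return max(totals, key=totals.get)

preds = np.array([1, 0, 1, 1, 0])              # predictions of five weak learners
weights = np.array([0.2, 0.5, 0.1, 0.1, 0.4])  # illustrative learner weights
print(plurality_vote(preds), absolute_majority_vote(preds), weighted_vote(preds, weights))
```

Note that the weighted vote can overturn the plurality result: here class 0 wins under weighting (total weight 0.9 versus 0.4) even though class 1 has more raw votes.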


The first two approaches simply average or vote over the results of the weak learners. They are easy to implement, but may leave a large learning error, which is why the learning-based combination method exists.


The representative learning-based method is stacking. When stacking is used, instead of applying simple logic to the results of the weak learners, we add another layer of learner: the predictions that the weak learners make on the training set are used as inputs, the training-set labels are used as outputs, and a new learner is retrained on them to produce the final result.
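A minimal stacking sketch using scikit-learn's `StackingClassifier`; the particular base learners and the logistic-regression meta-learner are illustrative choices, not prescribed by the original:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level-0 (weak) learners: their predictions become the meta-learner's input features
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=2)),
    ("nb", GaussianNB()),
]

# Level-1 learner trained on the weak learners' outputs
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
print("stacking test accuracy:", stack.score(X_te, y_te))
```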


Disadvantages of Bagging:

Because each round draws m samples with replacement, some draws will repeat data that has already been sampled, and this carries over into the classifiers and into the strong classifier formed by their linear combination. As noted in machine-learning textbooks, with this kind of random sampling roughly 36.8% of the samples are never drawn at all, so the training error caused by these incomplete training sets is hard to avoid.
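The 36.8% figure comes from a short calculation: the probability that a particular example is never chosen in m draws with replacement is $(1 - 1/m)^m$, which tends to $1/e$ as m grows.

$$
\left(1 - \frac{1}{m}\right)^{m} \xrightarrow{\ m \to \infty\ } \frac{1}{e} \approx 0.368
$$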


3. Boosting method


The boosting method is a common statistical learning method that is widely applicable and effective. In classification problems, it improves performance by changing the weights of the training samples, learning multiple classifiers, and combining these classifiers linearly. This section introduces the idea of boosting and its representative algorithm, AdaBoost, which was proposed by Freund and Schapire in 1995.

Ensemble learning is a machine learning approach that trains a collection of learners and combines their individual results according to certain rules, so as to obtain a better result than any single learner. In general, the multiple learners in an ensemble are homogeneous “weak learners”.

Homogeneous ensemble: all individual learners come from the same base learning algorithm, for example a “decision-tree ensemble” or a “neural-network ensemble” (the individual learners are then called base learners).

Heterogeneous ensemble: the individual learners are of different types, produced by different kinds of learning methods, for example decision trees together with neural networks.


In the framework of probably approximately correct (PAC) learning, a concept is said to be strongly learnable if there is a polynomial-time learning algorithm that can learn it with high accuracy.

The concept is said to be weakly learnable if there is a polynomial-time learning algorithm that can learn it only slightly more accurately than random guessing.

Interestingly, Schapire later proved that strong learnability is equivalent to weak learnability. So in learning, once a “weak learning algorithm” has been found, can it be boosted into a “strong learning algorithm”?

After all, it is usually much easier to find a weak learning algorithm than a strong one. How to carry out this boosting concretely is the problem that boosting methods set out to solve.

Three outcomes are possible when learners are combined:

  1. The ensemble works well, and classification performance improves
  2. The ensemble has no obvious effect, and classification performance does not improve
  3. The ensemble works poorly, and classification performance degrades


4. AdaBoost boosting idea

Thus, two questions need to be answered by any boosting method:

1. How to change the weights or probability distribution of the training data in each round;

2. How to combine the weak classifiers into a strong classifier.

The core idea of the AdaBoost algorithm is to gradually strengthen weak classifiers with poor classification performance into a strong classifier with good classification performance.

The strengthening process, shown in the figure below, changes the sample weights step by step. A sample's weight reflects its importance during classifier training: the classifier pays more attention to the high-weight samples and gives them “special care”.

The saying “two heads are better than one” expresses exactly this principle. In each round of classification, the area (weight) of the misclassified points grows while the area (weight) of the correctly classified points shrinks, which draws the classifier's attention to the misclassified points.


The Boosting framework leaves several concrete questions to answer (the standard AdaBoost formulas for all four are given after this list):

  1. How to compute the learning error rate e

  2. How to obtain the weight coefficient α of each weak learner

  3. How to update the sample weight distribution D

  4. Which combination strategy to use
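For reference, the standard AdaBoost answers to these four questions for a binary classifier $G_m(x) \in \{+1, -1\}$ with sample weights $w_{mi}$ are:

$$
e_m = \sum_{i=1}^{N} w_{mi}\, I\bigl(G_m(x_i) \neq y_i\bigr), \qquad
\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}
$$

$$
w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp\bigl(-\alpha_m\, y_i\, G_m(x_i)\bigr), \qquad
Z_m = \sum_{i=1}^{N} w_{mi} \exp\bigl(-\alpha_m\, y_i\, G_m(x_i)\bigr)
$$

$$
f(x) = \sum_{m=1}^{M} \alpha_m G_m(x), \qquad G(x) = \operatorname{sign}\bigl(f(x)\bigr)
$$

The combination strategy is thus a weighted linear combination of the weak classifiers followed by a sign function.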

This is the whole algorithm flow of AdaBoost. The formulas look dry on their own, so let's work through an example.


Training data set T = {(x1, y1), (x2, y2), …, (xN, yN)}, where x denotes the input sample and y ∈ {+1, −1} is the corresponding label.

Output: Final classifier G(x)

(a) On the training data with weight distribution D1, the classification error rate is lowest when the threshold v is 2.5, which determines the first basic classifier G1(x)

(b) Compute the error rate of G1(x) on the training data set

(c) Compute the coefficient of G1(x)

(d) Update the weight distribution of the training data (a reconstruction of this round is sketched below)
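The original formula images are missing here; reconstructing round 1 from the values quoted later in the text (threshold 2.5, error rate 0.3, misclassified points x = 6, 7, 8) gives roughly:

$$
G_1(x) = \begin{cases} +1, & x < 2.5 \\ -1, & x > 2.5 \end{cases}, \qquad
e_1 = P\bigl(G_1(x_i) \neq y_i\bigr) = 0.1 + 0.1 + 0.1 = 0.3
$$

$$
\alpha_1 = \frac{1}{2} \ln \frac{1 - e_1}{e_1} = \frac{1}{2} \ln \frac{0.7}{0.3} \approx 0.4236, \qquad
w_{2,i} = \frac{w_{1,i}}{Z_1} \exp\bigl(-\alpha_1\, y_i\, G_1(x_i)\bigr)
$$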

Original weights (Table 1): every sample starts with the same weight, 0.1.

Updated weights (Table 2): the weight of each correctly classified point drops to about 0.0715, while the weight of each misclassified point rises to about 0.1666.

Notice that the weights of the misclassified points go up. This follows from the weight-update formula: when the prediction agrees with the label, the exponent is negative and $e^{-\alpha} < 1$, so the weight shrinks; when the prediction disagrees with the label, the exponent is positive and $\exp(\alpha) > 1$, so the weight grows.
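With the reconstructed $\alpha_1 \approx 0.4236$ from above, a quick numerical check reproduces the values in Table 2:

$$
e^{-\alpha_1} \approx 0.655 < 1, \qquad e^{\alpha_1} \approx 1.527 > 1, \qquad
Z_1 = 2\sqrt{e_1 (1 - e_1)} \approx 0.9165
$$

$$
\frac{0.1 \times 0.655}{0.9165} \approx 0.0715 \quad \text{(correctly classified)}, \qquad
\frac{0.1 \times 1.527}{0.9165} \approx 0.1666 \quad \text{(misclassified)}
$$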

According to the weights in Table 2, we should focus on the three points x=6,7, and 8.


Updated weights after the second classifier (Table 3)

It can be seen that, because the second classifier misclassifies x = 3, 4, and 5, the corresponding weights rise. Why is the weight of basic classifier 2 higher than that of basic classifier 1? Because the prediction error rate of basic classifier 2 is smaller: the error rate of basic classifier 1 is 0.1 + 0.1 + 0.1 = 0.3, while that of basic classifier 2 is 0.0715 + 0.0715 + 0.0715 = 0.2143. As the error rate decreases, the classifier's weight increases, meaning it votes with a larger weight. Classifier 3 is constructed in the same way.
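Plugging the two error rates into the coefficient formula shows the increase explicitly (values computed from the error rates quoted above):

$$
\alpha_1 = \frac{1}{2} \ln \frac{1 - 0.3}{0.3} \approx 0.4236, \qquad
\alpha_2 = \frac{1}{2} \ln \frac{1 - 0.2143}{0.2143} \approx 0.6496
$$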


Summary: weight change table (the entries marked in red are the weight changes of the samples misclassified by each classifier)

Boosting: note that in each round of training, the boosting algorithm checks whether the base learner just generated satisfies the basic condition (for classification, that it does better than random guessing); if it does not, that base learner is discarded. The initially set number of learning rounds T may then never be reached, which can lead to poor performance of the final ensemble.

5. Conclusion:


6. Case: using AdaBoost to predict whether horses with colic (hernia) symptoms will die

As the results above show, once the number of weak classifiers reaches 50, the prediction accuracy on both the training set and the test set reaches a relatively high value. If the number of weak classifiers keeps increasing, however, the accuracy on the test set begins to decline; this is overfitting.
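A sketch of this experiment with scikit-learn's `AdaBoostClassifier` (whose default weak learner is a depth-1 decision stump); the file names and the tab-separated last-column-is-label layout are assumptions about the horse-colic data, not taken from the original:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Assumed layout: tab-separated values, last column is the survival label
train = np.loadtxt("horseColicTraining.txt", delimiter="\t")
test = np.loadtxt("horseColicTest.txt", delimiter="\t")
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

# Vary the number of weak classifiers and watch test accuracy peak, then drop (overfitting)
for n in (1, 10, 50, 100, 500, 1000):
    model = AdaBoostClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    print(f"{n:5d} weak classifiers | train acc {model.score(X_train, y_train):.3f} "
          f"| test acc {model.score(X_test, y_test):.3f}")
```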
