This article continues the previous one in this series.

05 Introduction to Ensemble Learning

Ensemble learning completes a learning task by constructing and combining multiple learners. There are two main kinds of ensemble methods:

Bagging: a parallel method in which the base learners have no strong dependencies on one another and can therefore be generated simultaneously

Boosting: a sequential method in which the base classifiers have strong dependencies on one another and must be generated one after another

Bagging (Bootstrap Aggregating)

```
let n be the number of bootstrap samples
for i = 1 to n do
    draw a bootstrap sample D_i of size m (with replacement)
    train base classifier h_i on D_i
y = mode(h_1(x), ..., h_n(x))
```
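A minimal Python sketch of this Bagging loop, using scikit-learn decision trees as the base classifiers (the function names and the integer-label assumption are illustrative, not part of the original):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=10, seed=0):
    """Train n_estimators base classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    m = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, m, size=m)              # bootstrap sample of size m, drawn with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Combine the base classifiers by majority vote (the mode of their predictions)."""
    preds = np.array([h.predict(X) for h in models])  # shape (n_estimators, n_samples)
    # majority vote over each column; assumes non-negative integer class labels
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```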

Boosting is the process of promoting a "weak learning algorithm" into a "strong learning algorithm": through repeated rounds of learning it obtains a series of weak classifiers (e.g. decision trees or logistic regression) and combines them into a strong classifier. The Boosting algorithm involves two parts: an additive model and the forward stagewise algorithm.

The additive model means that the strong classifier is a linear combination of a series of weak classifiers:

$$
f(x) = \sum_{m=1}^{M} \beta_m\, b(x; \gamma_m)
$$

where $b(x; \gamma_m)$ is a weak classifier, $\gamma_m$ is the optimal parameter learned by that weak classifier, and $\beta_m$ is the weight of that weak classifier in the strong classifier. These weak classifiers add up linearly to form a strong classifier.

The forward stagewise algorithm means that during training, the classifier generated at each iteration is trained on the basis of the previous one, namely

$$
f_m(x) = f_{m-1}(x) + \beta_m\, b(x; \gamma_m), \qquad
(\beta_m, \gamma_m) = \arg\min_{\beta,\, \gamma} \sum_{i=1}^{N} L\big(y_i,\; f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big)
$$
06 Bagging: Random forest

Random forest = Bagging + decision tree

Multiple decision trees are trained simultaneously, and their results are combined for prediction, e.g. by taking the mean of the trees' outputs (regression) or their mode (classification).

  • The decision tree's tendency to overfit is greatly reduced

  • The variance of the prediction is reduced, so the predicted value does not change drastically with small changes in the training data

Randomness enters in two places:

A subset is randomly drawn, with replacement (bootstrap sampling), from the original training set and used as the training set for one decision tree in the forest.

At each split, feature selection is limited to a randomly selected subset of features.
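In scikit-learn's RandomForestClassifier, these two sources of randomness correspond roughly to the `bootstrap` and `max_features` parameters; a minimal sketch:

```python
from sklearn.ensemble import RandomForestClassifier

# bootstrap=True      -> each tree is trained on a bootstrap sample of the training set
# max_features='sqrt' -> each split only considers a random subset of sqrt(n_features) features
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, max_features='sqrt', random_state=42)
```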

Random forest practice

Based on a company's existing employee attrition data, we build a decision tree and a random forest to predict whether an employee will leave, and identify the important features that influence attrition.

Github.com/TuringEmmy/…
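The repository holds the full version; below is only a rough sketch of what such an attrition model could look like, assuming a hypothetical CSV file `employee_attrition.csv` with numeric feature columns and a 0/1 label column named `left`:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical attrition dataset: 'left' is the 0/1 label, the other (numeric) columns are features.
df = pd.read_csv('employee_attrition.csv')
X, y = df.drop(columns=['left']), df['left']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print('test accuracy:', rf.score(X_test, y_test))

# Rank features by importance to see which ones drive attrition the most.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))
```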

07 Boosting: AdaBoost

Understanding AdaBoost

The idea of AdaBoost is to focus on the misclassified samples: in each round it reduces the weights of the samples that were classified correctly in the previous round and increases the weights of the misclassified samples.

AdaBoost combines the weak classifiers by weighted voting: weak classifiers with a small classification error get a large weight, while weak classifiers with a large classification error get a small weight.

AdaBoost algorithm flow

Assume that the input training data is

$$
T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}, \qquad x_i \in \mathcal{X} \subseteq \mathbb{R}^n, \quad y_i \in \{-1, +1\}
$$

and that the number of iterations, i.e. the number of weak classifiers, is M.

  1. Initialize the weight distribution of the training samples:

$$
D_1 = (w_{11}, \ldots, w_{1i}, \ldots, w_{1N}), \qquad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N
$$

  2. For m = 1, 2, ..., M:

a) Learn a weak classifier $G_m(x)$ from the training data using the weight distribution $D_m$.

b) Compute the classification error rate of $G_m(x)$ on the training set:

$$
e_m = \sum_{i=1}^{N} w_{mi}\, I\big(G_m(x_i) \ne y_i\big)
$$

c) Compute the weight of $G_m(x)$ in the strong classifier:

$$
\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}
$$

d) Update the weight distribution of the training set (here $Z_m$ is the normalization factor that makes the weights sum to 1):

$$
w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp\big(-\alpha_m y_i G_m(x_i)\big), \qquad
Z_m = \sum_{i=1}^{N} w_{mi} \exp\big(-\alpha_m y_i G_m(x_i)\big)
$$

  3. The final classifier is obtained as

$$
G(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right)
$$
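A compact NumPy sketch of the steps above, using depth-1 decision trees (stumps) from scikit-learn as the weak classifiers; it assumes labels in {-1, +1} and is only meant to mirror the algorithm flow, not to be a reference implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """AdaBoost with decision stumps; y must take values in {-1, +1}."""
    N = len(X)
    w = np.full(N, 1.0 / N)                  # step 1: initial weight distribution D_1
    stumps, alphas = [], []
    for _ in range(M):                       # step 2: for m = 1, ..., M
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)  # a) weak classifier G_m
        pred = stump.predict(X)
        e = np.sum(w * (pred != y))          # b) weighted classification error e_m
        e = np.clip(e, 1e-10, 1 - 1e-10)     # keep log and division well defined
        alpha = 0.5 * np.log((1 - e) / e)    # c) classifier weight alpha_m
        w = w * np.exp(-alpha * y * pred)    # d) reweight samples ...
        w = w / w.sum()                      #    ... and normalize (divide by Z_m)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Step 3: final classifier G(x) = sign(sum_m alpha_m * G_m(x))."""
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```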

Proof of AdaBoost

Assume that after m-1 iterations we have obtained $f_{m-1}(x)$. According to the forward stagewise algorithm,

$$
f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)
$$

AdaBoost's loss function is the exponential loss, so

$$
(\alpha_m, G_m(x)) = \arg\min_{\alpha,\, G} \sum_{i=1}^{N} \exp\Big(-y_i \big( f_{m-1}(x_i) + \alpha\, G(x_i) \big)\Big)
$$

Because $f_{m-1}(x_i)$ is given, the factor $\exp(-y_i f_{m-1}(x_i))$ can be moved to the front:

$$
(\alpha_m, G_m(x)) = \arg\min_{\alpha,\, G} \sum_{i=1}^{N} \bar{w}_{mi} \exp\big(-y_i\, \alpha\, G(x_i)\big)
$$

where $\bar{w}_{mi} = \exp(-y_i f_{m-1}(x_i))$ is the sample weight in each iteration. The loss simplifies as follows, splitting the sum into correctly and incorrectly classified samples:

$$
L = \sum_{y_i = G(x_i)} \bar{w}_{mi}\, e^{-\alpha} + \sum_{y_i \ne G(x_i)} \bar{w}_{mi}\, e^{\alpha}
$$

Continue simplifying the loss:

$$
L = \left(e^{\alpha} - e^{-\alpha}\right) \sum_{i=1}^{N} \bar{w}_{mi}\, I\big(y_i \ne G(x_i)\big) + e^{-\alpha} \sum_{i=1}^{N} \bar{w}_{mi}
$$

Rewrite the loss in terms of the weighted error rate $e_m = \sum_{i=1}^{N} \bar{w}_{mi}\, I\big(y_i \ne G(x_i)\big) \Big/ \sum_{i=1}^{N} \bar{w}_{mi}$:

$$
L \propto \left(e^{\alpha} - e^{-\alpha}\right) e_m + e^{-\alpha}
$$

Take the partial derivative of $L$ with respect to $\alpha$ and set it to 0, then:

$$
\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}
$$
AdaBoost in practice

AdaBoost can be regarded as a binary classification learning method whose model is an additive model, whose loss function is the exponential loss, and whose learning algorithm is the forward stagewise algorithm. Next we use the AdaBoost interface in scikit-learn:

sklearn-AdaBoostClassifier

```python
class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, *, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
```
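A minimal usage sketch on a synthetic dataset (the dataset and hyperparameters are illustrative assumptions, not from the original practice):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy binary classification problem; by default AdaBoostClassifier boosts depth-1 decision trees.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))
```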

These are the Bagging and Boosting ensemble learning methods I got to know while playing in competitions. As a deep learning engineer, this was the first time I felt the advantages of classical machine learning algorithms in major competitions.
