This is the 28th day of my participation in the Gwen Challenge.

An ensemble learning model uses a series of weak learners (also known as base learners or base models) and integrates their results to obtain better performance than any single learner. Bagging and Boosting are two common algorithms for building ensemble learning models.

Basic principles of random forest model

Random forest is a classical Bagging model whose weak learners are decision tree models.

1. Random data

The training data for each decision tree is drawn from the original data by random sampling with replacement. For example, if there are 1000 original samples, drawing 1000 times with replacement produces a new dataset of 1000 samples, which is used to train one decision tree model.
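The sampling-with-replacement step described above (a bootstrap sample) can be sketched with Python's standard library; the `data` list here is just a stand-in for the 1000 original samples:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible
data = list(range(1000))  # stand-in for 1000 original samples

# draw 1000 times WITH replacement -> the training set for one tree
bootstrap = [random.choice(data) for _ in range(len(data))]

print(len(bootstrap))                   # same size as the original data
print(len(set(bootstrap)) < len(data))  # some samples repeat, others are left out
```

Because the draws are with replacement, each tree sees a slightly different dataset, which is what makes the trees in the forest differ from one another.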

2. Random features

If the feature dimension of each sample is M, specify a constant k < M and randomly select k of the M features. When constructing a random forest model in Python (with scikit-learn), the default number of selected features k for classification is the square root of M.

Compared with a single decision tree model, the random forest integrates multiple decision trees: its predictions are more accurate, it is less prone to overfitting, and it has stronger generalization ability.
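A toy simulation (not a random forest itself) of why integrating many weak learners helps: each "weak estimate" below is an unbiased but noisy guess of a true value, and averaging 10 of them visibly shrinks the spread of the final prediction:

```python
import random
import statistics

random.seed(1)
true_value = 5.0

def weak_estimate():
    # one "weak learner": unbiased but noisy guess of the true value
    return true_value + random.gauss(0, 2)

# 1000 predictions from a single learner vs. 1000 ensemble predictions,
# each ensemble prediction averaging 10 weak estimates
single = [weak_estimate() for _ in range(1000)]
ensemble = [statistics.mean(weak_estimate() for _ in range(10))
            for _ in range(1000)]

# averaging reduces the variance of the prediction
print(statistics.stdev(single) > statistics.stdev(ensemble))
```

This variance reduction is the core intuition behind Bagging; the random data and random features described above additionally decorrelate the trees so that averaging them helps as much as possible.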

The demonstration code for the random forest classification model is as follows.

```python
from sklearn.ensemble import RandomForestClassifier
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [0, 0, 0, 1, 1]

model = RandomForestClassifier(n_estimators=10, random_state=123)
model.fit(X, y)

print(model.predict([[5, 5]]))
```

The first line of code imports RandomForestClassifier from scikit-learn's ensemble module.

The X in line 2 is the feature variable; each sample has two features.

In line 3, y is the target variable, with two classes: 0 and 1.

Line 5 creates the model and sets the number of weak learners n_estimators to 10, that is, 10 decision tree models serve as weak learners. random_state is set to 123 so that the results are consistent from run to run (if you don't set it, you can expect different results each time, because the random forest follows the basic principles of "random data" and "random features").

Line 6 trains the model with the fit() function.

Line 8 uses the predict() function to predict the result.

[0]
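Beyond predict(), the fitted classifier also exposes predict_proba(), which reports the fraction of the 10 trees voting for each class. This is a small extension of the example above, reusing the same data and parameters:

```python
from sklearn.ensemble import RandomForestClassifier

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [0, 0, 0, 1, 1]

model = RandomForestClassifier(n_estimators=10, random_state=123)
model.fit(X, y)

# one row per sample; columns are the vote shares for class 0 and class 1
proba = model.predict_proba([[5, 5]])
print(proba)
```

The two probabilities in each row sum to 1, and predict() returns the class with the larger share, consistent with the [0] output above.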

The demonstration code for the random forest regression model is as follows.

```python
from sklearn.ensemble import RandomForestRegressor
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]

model = RandomForestRegressor(n_estimators=10, random_state=123)
model.fit(X, y)

print(model.predict([[5, 5]]))
```

The first line of code imports RandomForestRegressor from scikit-learn's ensemble module.

The X in line 2 is the feature variable; each sample has two features.

The y in line 3 is the target variable, which is a continuous value.

Line 5 creates the model and sets the number of weak learners n_estimators to 10, that is, 10 decision tree models serve as weak learners. random_state is set to 123 so that the results are consistent from run to run.

Line 6 trains the model with the fit() function.

Line 8 uses the predict() function to predict the result.

[2.8]
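To see the "integrate the results" step concretely: a scikit-learn forest's regression prediction is the average of its individual trees' predictions, which can be checked through the fitted model's estimators_ attribute (same data as above):

```python
from sklearn.ensemble import RandomForestRegressor

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]

model = RandomForestRegressor(n_estimators=10, random_state=123)
model.fit(X, y)

# the forest's prediction is the mean of its 10 trees' predictions
tree_preds = [tree.predict([[5, 5]])[0] for tree in model.estimators_]
print(sum(tree_preds) / len(tree_preds))
```

The printed average matches model.predict([[5, 5]])[0]; for classification, the analogous integration step is majority voting across the trees.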