Original link: http://tecdat.cn/?p=23075

This example shows how to implement Bayesian optimization in MATLAB, using quantile error to tune the hyperparameters of a random forest of regression trees. If you are going to use the model to predict conditional quantiles rather than the conditional mean, then it is appropriate to tune the model using quantile error rather than mean squared error.
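For reference, the quantile loss that this error is built on is commonly written, for quantile level $\tau$ (here $\tau = 0.5$, the median), as

$$L_\tau(y,\hat{y}) = \max\bigl(\tau\,(y-\hat{y}),\,(\tau-1)\,(y-\hat{y})\bigr),$$

whose minimizer in expectation is the conditional $\tau$-quantile, whereas mean squared error is minimized by the conditional mean.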

Loading and preprocessing data

Load the dataset. Consider building a model that predicts the median fuel economy of a vehicle given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Treat the number of cylinders, manufacturer, and model year as categorical variables.

Cylinders = categorical(Cylinders);
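The surrounding steps (loading the data and assembling the predictor table) are not shown above. Here is a minimal, self-contained sketch, assuming the data come from MATLAB's carsmall sample dataset; the dataset name is an inference from the variable names in this example:

load carsmall                          % assumed dataset, matching the variables named above
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));       % manufacturer
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,...
    Model_Year,Weight,MPG);            % predictors plus the MPG response
rng('default')                         % for reproducibility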

Specify tuning parameters

Consider tuning:

  • The complexity (depth) of the trees in the forest. Deep trees tend to overfit, while shallow trees tend to underfit. Therefore, set the minimum number of observations per leaf to be at most 20.
  • The number of predictors to sample at each node when growing the trees. Specify sampling from 1 up to all of the predictors.

Functions that implement Bayesian optimization require you to pass these parameters as optimizableVariable objects.

maxMinLS = 20;
minLS = optimizableVariable('minLS',[1,maxMinLS],'Type','integer');
numPTS = optimizableVariable('numPTS',[1,size(X,2)-1],'Type','integer');
hyperparametersRF = [minLS; numPTS];

hyperparametersRF is a 2-by-1 array of OptimizableVariable objects.

Bayesian optimization tends to select random forests containing many trees, because ensembles with more learners are more accurate. If available computing resources are a consideration and you prefer ensembles with fewer trees, consider tuning the number of trees separately from the other parameters, or penalizing models containing many learners (see the sketch after this paragraph).
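If you choose the penalty route, one hypothetical sketch (not part of the original example) is to expose the ensemble size as a third optimizable variable and add a per-tree penalty to the objective; the name numTrees, its range, and the weight lambda below are all illustrative assumptions:

% Hypothetical: tune the ensemble size as well (range assumed for illustration).
numTrees = optimizableVariable('numTrees',[50,500],'Type','integer');

function err = penalizedOobErrRF(params,X,lambda)
%penalizedOobErrRF Hypothetical variant of the objective defined below:
%   out-of-bag quantile error plus a per-tree penalty. params additionally
%   carries numTrees, and lambda (e.g., 1e-4) is an assumed weight that
%   discourages large ensembles. Save this in its own file, penalizedOobErrRF.m.
forest = TreeBagger(params.numTrees,X,'MPG','Method','regression',...
    'OOBPrediction','on','MinLeafSize',params.minLS,...
    'NumPredictorsToSample',params.numPTS);
err = oobQuantileError(forest) + lambda*params.numTrees;
end

You would then pass [hyperparametersRF; numTrees] and @(params)penalizedOobErrRF(params,X,lambda) to the optimizer instead of the unpenalized objective below.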

Define the objective function

Define an objective function for the Bayesian optimization algorithm to optimize. The function should:

  • Accept as input the parameters to be tuned.
  • Train a random forest using TreeBagger. In the call to TreeBagger, specify the parameters to tune and request the out-of-bag indices.
  • Estimate the out-of-bag quantile error based on the median.
  • Return the out-of-bag quantile error.

function oobErr = oobErrRF(params,X)
%oobErrRF Train random forest and estimate out-of-bag quantile error
%   Trains a random forest of 300 regression trees using the predictor
%   data in X and the parameter specification in params, then returns
%   the out-of-bag quantile error based on the median. X is a table, and
%   params is an array of the optimizable variables holding the minimum
%   leaf size and the number of predictors to sample at each node.
randomForest = TreeBagger(300,X,'MPG','Method','regression',...
    'OOBPrediction','on','MinLeafSize',params.minLS,...
    'NumPredictorsToSample',params.numPTS);
oobErr = oobQuantileError(randomForest);
end

Minimize the objective using Bayesian optimization

Using Bayesian optimization, find a model that achieves the minimal, penalized, out-of-bag quantile error with respect to tree complexity and the number of predictors to sample at each node.

results = bayesopt(@(params)oobErrRF(params,X),hyperparametersRF,...
    'AcquisitionFunctionName','expected-improvement-plus','Verbose',0);

The result is a BayesianOptimization object that contains, among other things, the minimum value of the objective function and the optimized hyperparameter values.

Display the observed minimum value of the objective function and the optimized hyperparameter values.

bestOOBErr = results.MinObjective
bestHyperparameters = results.XAtMinObjective

Train the model using the optimized hyperparameters

Train a random forest with the entire data set and optimized hyperparameter values.

Mdl = TreeBagger(300,X,'MPG','Method','regression',...
    'MinLeafSize',bestHyperparameters.minLS,...
    'NumPredictorsToSample',bestHyperparameters.numPTS);

Mdl is a TreeBagger object optimized for median prediction. You can predict the median fuel economy for new predictor data by passing Mdl and the data to quantilePredict.
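For example (a hedged usage sketch; newX is a hypothetical table of new observations with the same predictor variables as X):

medianMPG = quantilePredict(Mdl,newX); % returns the 0.5 quantile (median) by default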

