Original link:tecdat.cn/?p=22482

Original source: Tuoduan Data Tribe WeChat public account (拓端数据部落公众号)

 

Introduction

This article is a short tutorial on fitting BRT (boosted regression tree) models in R. Our goal is to fit BRT models to ecological data and interpret the results.

The purpose of this tutorial is to help you learn how to develop a BRT model in R.

Sample data

Two sets of short-finned eel records are used: one for model training (building) and one for model testing (evaluation). In the example below we load the training data. Presence (1) and absence (0) are recorded in column 2, and the environmental variables are in columns 3 through 14.


> head(train)

Fitting a model

To fit a GBM model you need to decide what settings to use; this article applies the rules of thumb discussed in Elith et al. (2008). The data comprise records of short-finned eel presence at 202 of 1,000 sites, so you can assume that (1) there are enough data to model interactions of reasonable complexity, and (2) a learning rate (lr) of about 0.01 is a reasonable starting point. The following example shows how the optimal number of trees (nt) is determined.

gbm.step(data = train, gbm.x = 3:13, gbm.y = 2, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)

This carries out cross-validated optimisation of the boosted regression tree model. Using 1,000 observations and 11 predictors, it creates 10 initial models of 50 trees.

We used cross-validation above, specifying: the data; the predictor variables; the dependent variable (the column number holding the species data); the tree complexity (we first try a tree complexity of 5); and the learning rate (we try 0.01).

Running a model as described above outputs progress reports and produces a graph. The model is built with the default 10-fold cross-validation. The solid black curve is the mean change in predictive deviance, and the dotted curves show one standard error above and below it (as measured across the cross-validation folds). The red line marks the minimum of the mean, and the green line marks the number of trees at which that minimum occurs. The final model returned in the model object is built on the complete data set, using that optimal number of trees.
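The 10-fold splitting that the default cross-validation relies on can be sketched in base R. This is an illustration only, not dismo's internal code; the names `n.sites` and `n.folds` are chosen here to mirror the text:

```r
# Illustrative sketch of a 10-fold cross-validation split (not dismo's internal code)
set.seed(42)
n.sites <- 1000   # number of observations, as in the eel data
n.folds <- 10     # the default number of folds

# Assign each site to one of 10 folds at random, keeping fold sizes equal
fold <- sample(rep(1:n.folds, length.out = n.sites))

# Each fold is held out once while the model is trained on the other nine
table(fold)       # 100 sites per fold
```

The per-fold holdout deviances are what the solid black curve and its standard-error band summarise.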

length(lr$fitted)

The returned object includes fitted — the fitted values from the final tree; fitted.vars — the variance of the fitted values; residuals — the residuals of the fitted values; contributions — the relative importance of the variables; self.statistics — evaluation statistics on the training data; and cv.statistics — the cross-validated statistics, which are the most appropriate statistics for evaluation.

Each statistic is calculated within each cross-validation fold (at the optimal number of trees identified from the mean change in predictive deviance across all folds), and the mean and standard error of these cross-validated statistics are then reported. Other components include weights — the weights used in fitting the model (by default each observation is "1", i.e. equal weights); trees.fitted — the number of trees fitted at each step of the stagewise fitting process; training.loss.values — the stagewise change in deviance on the training data; and cv.values — the mean cross-validated estimate of predictive deviance at each step. You can use the summary function to see the relative importance of the variables:

> summary(lr)

 

Selecting settings

The above is a first guess at settings, using the rules of thumb discussed in Elith et al. (2008). It fitted a model of only 650 trees, so our next step is to reduce the learning rate: for example, try lr = 0.005 and aim for more than 1,000 trees.

gbm.step(data = train, gbm.x = 3:13, gbm.y = 2, family = "bernoulli", tree.complexity = 5, learning.rate = 0.005, bag.fraction = 0.5)

To explore whether other settings perform better, you can split the data into training and test sets, or use the cross-validation results, varying the tree complexity (tc), learning rate (lr) and bag fraction, and then compare the results.
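One way to organise such a comparison (a sketch, not part of the original tutorial) is to lay out the candidate settings in a grid and fit one model per row, recording the cross-validated deviance of each. The column names mirror the gbm.step arguments:

```r
# Candidate settings to compare; names mirror the gbm.step arguments
settings <- expand.grid(
  tree.complexity = c(2, 5),
  learning.rate   = c(0.01, 0.005),
  bag.fraction    = c(0.5, 0.75)
)
settings   # 8 combinations; fit one model per row and compare CV deviance
```

Looping over the rows of this grid keeps the comparison systematic and makes it easy to tabulate the resulting deviances side by side.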

Simplifying the model

Simplification builds many models, so it can be slow. Here we assess the value of simplifying the model fitted with lr = 0.005, but test the removal of at most five variables (the n.drops argument; the default is an automatic rule that keeps dropping variables until the mean change in predictive deviance exceeds its original standard error as calculated in gbm.step).

For our run, the optimal number of variables to remove is estimated to be 1, as indicated by the red vertical line. Now we build a model that excludes that predictor, using [[1]] to indicate that we want the predictor list with one variable dropped.

gbm.step(data = train, gbm.x = pred.list[[1]], gbm.y = 2, family = "bernoulli", tree.complexity = 5, learning.rate = 0.005)

 

This is now a new model; however, given that we do not particularly need a simpler model (with data sets of this size it is acceptable to retain variables that make small contributions), we will not continue to use it.

Plotting the functions and fitted values of the model

The fitted functions of the BRT model created by our call can be plotted with gbm.plot.

> gbm.plot(lr005)

Additional arguments to this function allow smoothing of the plotted curves. Depending on the distribution of observations in environmental space, we can also plot the fitted values in relation to each predictor:

gbm.plot.fits(lr005)

The value at the top of each plot is the weighted mean of the fitted values in relation to each non-factor predictor.
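That summary is an ordinary weighted average. As a toy base-R illustration (the values and weights below are invented for the example, not taken from the eel data):

```r
# Toy illustration of the weighted mean reported at the top of each plot
fitted.vals  <- c(0.2, 0.5, 0.8)  # hypothetical fitted values for one predictor
site.weights <- c(1, 1, 2)        # hypothetical observation weights
weighted.mean(fitted.vals, site.weights)
```

With equal weights (the gbm.step default), this reduces to the plain mean of the fitted values.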

Plotting interactions

This code assesses the degree of interaction between pairs of predictors in the data.

f <- gbm.interactions(lr005)

This returns a list. The first two components summarise the results: first a ranked list of the five most important pairwise interactions, then a table of all interactions.
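As a conceptual aside, the "degree of interaction" idea can be illustrated with an ordinary glm on simulated data (this is not dismo's gbm.interactions, just a sketch of the underlying notion): compare an additive model with one that includes the interaction term, and see how much the fit improves.

```r
# Toy illustration of interaction strength (not dismo's gbm.interactions):
# compare an additive logistic model with one containing the interaction term.
set.seed(1)
x1 <- runif(500)
x2 <- runif(500)
y  <- rbinom(500, 1, plogis(-1 + 3 * x1 * x2))  # response truly depends on x1 * x2

m.add <- glm(y ~ x1 + x2, family = binomial)    # no interaction
m.int <- glm(y ~ x1 * x2, family = binomial)    # with interaction

# Drop in deviance from adding the interaction; larger = stronger interaction
deviance(m.add) - deviance(m.int)
```

A pair with essentially no interaction would show almost no drop in deviance; gbm.interactions quantifies an analogous effect for the fitted BRT surface.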

f$interactions

You can plot an interaction like this.

gbm.perspec(lr005, 7, 1, z.range = c(0, 0.6))  # 7 and 1: indices of the two predictors to plot

 

Make predictions about new data

If you want to make predictions for a set of sites (rather than for a whole map), the usual procedure is to build a data frame with rows for the sites and columns for the variables in your model. The data set we use to predict to new sites is in a file called test. Factor columns need to be converted to factor variables, with levels consistent with those in the modelling data. predict is used to predict to the sites from the BRT model, with the results stored in a vector called preds.

preds <- predict(lr005, test, n.trees = lr005$gbm.call$best.trees, type = "response")
calc.deviance(obs = test[, 2], pred = preds)
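For Bernoulli (presence/absence) data, the deviance being calculated here is the mean binomial deviance, which can be sketched in base R. This is a simplified stand-in for dismo's calc.deviance, without its weighting and family options:

```r
# Simplified base-R version of mean binomial (Bernoulli) deviance,
# the metric reported for presence/absence predictions
bernoulli.deviance <- function(obs, pred) {
  eps <- 1e-15                          # guard against log(0)
  pred <- pmin(pmax(pred, eps), 1 - eps)
  -2 * mean(obs * log(pred) + (1 - obs) * log(1 - pred))
}

bernoulli.deviance(obs = c(1, 0, 1), pred = c(0.9, 0.2, 0.7))
```

Lower values indicate predictions that sit closer to the observed presences and absences; a perfect prediction gives a deviance of essentially zero.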

> d <- cbind(obs, preds)
> pres <- d[d[, 1] == 1, 2]
> abs <- d[d[, 1] == 0, 2]
> e <- evaluate(p = pres, a = abs)

A useful feature of prediction in gbm is that predictions can be made for different numbers of trees.

tree.list <- seq(100, 5000, by = 100)
pred <- predict(lr005, test, n.trees = tree.list, type = "response")

The code above produces a matrix in which each column contains the model's predictions for the number of trees specified by the corresponding element of tree.list; for example, column 5 holds the predictions for tree.list[5] = 500 trees. Now we calculate the deviance of each of these predictions and plot them.

> deviance <- rep(0, 50)
> for (i in 1:50) {
+   deviance[i] <- calc.deviance(obs, pred[, i])
+ }
> plot(tree.list, deviance)

 

Spatial prediction

Here we show how to make predictions for the entire map.

> plot(grids)

We create a data.frame with the constant values (for the variable of class "factor") and pass it to the prediction function.
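The shape of that constant-value data.frame can be sketched in base R. The variable name `Method` and its levels below are invented for illustration; use the factor variable from your own model, with levels matching those in the modelling data:

```r
# Sketch: a one-row data.frame holding the constant factor level that is
# combined with the raster values for every cell ("Method" is a made-up name)
add <- data.frame(Method = factor("electric",
                                  levels = c("electric", "net", "trap")))
str(add)   # one factor column, constant value "electric", three levels
```

Declaring all the levels, not just the one used, is what keeps the factor coding consistent with the modelling data.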

> p <- predict(grids, lr005, n.trees = lr005$gbm.call$best.trees, type = "response")
> plot(p)

