Original link: tecdat.cn/?p=9670

Original source: Tuoduan Data Tribe (tecdat) official account

 

Splines are a way to fit nonlinear models and to learn nonlinear relationships from data.

Cubic spline

Cubic splines have continuous first and second derivatives. We transform the variable with a set of basis functions and fit the model on the transformed variables. This adds nonlinearity to the model while keeping the fitted curve smooth.
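To make the basis-function idea concrete, here is a minimal sketch in R. It uses simulated data (an assumption, chosen so the snippet runs on its own; the names x, y and B are illustrative) and shows that bs() turns one variable into several basis columns, on which an ordinary linear model is then fitted.

```r
library(splines)  # ships with R; provides bs()

# Simulated stand-in data (an assumption, not the Wage data set)
set.seed(1)
x <- runif(200, min = 18, max = 80)
y <- 60 + 40 * sin((x - 18) / 25) + rnorm(200, sd = 10)

# Cubic B-spline basis with three interior knots:
# 3 (degree) + 3 (knots) = 6 transformed variables
B <- bs(x, knots = c(25, 40, 60))
dim(B)

# Fitting a linear model on the basis columns gives the same fit as
# putting bs() directly in the model formula
fit_basis   <- lm(y ~ B)
fit_formula <- lm(y ~ bs(x, knots = c(25, 40, 60)))
all.equal(unname(fitted(fit_basis)), unname(fitted(fit_formula)))
```

The six columns of B play the role of the transformed variables; the model stays linear in its coefficients even though the fitted curve is nonlinear in x.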

 

require(splines)
# The Wage data set comes from the ISLR package
require(ISLR)
attach(Wage)
# Range of age, and a grid of ages to predict on
agelims <- range(age)
age.grid <- seq(from = agelims[1], to = agelims[2])

In R, a cubic spline is fitted with the bs() function from the splines package.

# 3 knots at ages 25, 40 and 60
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
summary(fit)
##
## Call:
## lm(formula = wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -98.832 -24.537  -5.049  15.209 203.207
##
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)                       60.494      9.460   6.394 1.86e-10 ***
## bs(age, knots = c(25, 40, 60))1 to 6   ...
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39.92 on 2993 degrees of freedom
## Multiple R-squared:  0.08642, Adjusted R-squared:  0.08459
## F-statistic: 47.19 on 6 and 2993 DF,  p-value: < 2.2e-16

Draw a regression line
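A plot of this kind can be produced roughly as follows. This is a sketch rather than the article's exact code: it refits the model so the snippet runs on its own, and it falls back to simulated stand-in data (an assumption) when the ISLR package is not installed. The name dat is illustrative.

```r
library(splines)

# Wage data from the ISLR package when installed; otherwise simulated
# stand-in data (an assumption, only so the sketch runs anywhere)
if (requireNamespace("ISLR", quietly = TRUE)) {
  dat <- ISLR::Wage[, c("age", "wage")]
} else {
  set.seed(1)
  dat <- data.frame(age = sample(18:80, 500, replace = TRUE))
  dat$wage <- 60 + 40 * sin((dat$age - 18) / 25) + rnorm(500, sd = 20)
}

# Cubic spline with knots at ages 25, 40 and 60
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = dat)

# Predict on a grid of ages spanning the observed range
agelims  <- range(dat$age)
age.grid <- seq(from = agelims[1], to = agelims[2])
pred     <- predict(fit, newdata = data.frame(age = age.grid))

# Scatter plot, fitted curve, and dashed vertical lines at the knots
plot(dat$age, dat$wage, col = "grey", xlab = "Age", ylab = "Wage")
lines(age.grid, pred, col = "darkgreen", lwd = 2)
abline(v = c(25, 40, 60), lty = 2, col = "darkgreen")
```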

 

The figure above shows how smooth the cubic spline is and how it adapts locally to the data between the knots.

Smoothing spline

A smoothing spline minimizes an error function to which a roughness penalty has been added.
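Written out, the criterion usually given for a smoothing spline is the following. The notation is the standard textbook one rather than something stated in this article: g is the fitted function, (x_i, y_i) are the n observations, and λ ≥ 0 is the tuning parameter.

```latex
\hat{g} = \arg\min_{g} \; \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 \;+\; \lambda \int g''(t)^2 \, dt
```

The first term measures the fit to the data and the second penalizes curvature. As λ approaches 0 the curve is free to follow every point, and as λ grows the fit approaches the least-squares straight line.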

 

 
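A smoothing spline with a fixed number of degrees of freedom can be fitted and overlaid on the cubic spline along these lines. This is a sketch under assumptions: df = 16 is an assumed value chosen to give a visibly flexible curve, and the snippet falls back to simulated stand-in data when the ISLR package is not installed.

```r
library(splines)

# Wage data from the ISLR package when installed; otherwise simulated
# stand-in data (an assumption, only so the sketch runs anywhere)
if (requireNamespace("ISLR", quietly = TRUE)) {
  dat <- ISLR::Wage[, c("age", "wage")]
} else {
  set.seed(1)
  dat <- data.frame(age = sample(18:80, 500, replace = TRUE))
  dat$wage <- 60 + 40 * sin((dat$age - 18) / 25) + rnorm(500, sd = 20)
}

# Cubic regression spline, kept for comparison
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = dat)

# Smoothing spline; df = 16 is an assumed, deliberately high value
fit1 <- smooth.spline(dat$age, dat$wage, df = 16)

# Both curves on one plot: cubic spline in green, smoothing spline in red
age.grid <- seq(min(dat$age), max(dat$age))
plot(dat$age, dat$wage, col = "grey", xlab = "Age", ylab = "Wage")
lines(age.grid, predict(fit, newdata = data.frame(age = age.grid)),
      col = "darkgreen", lwd = 2)
lines(fit1, col = "red", lwd = 2)
legend("topright", legend = c("Cubic spline", "Smoothing spline (df = 16)"),
       col = c("darkgreen", "red"), lwd = 2)
```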

The red line (the smoothing spline) is wigglier and follows the data more closely. This is probably because of its high degrees of freedom. The best way to choose λ, and with it the effective degrees of freedom (df), is cross-validation.

Perform cross-validation to select λ and draw the smoothing spline:

fit2 <- smooth.spline(age, wage, cv = TRUE)
fit2
## Call:
## smooth.spline(x = age, y = wage, cv = TRUE)
##
## Smoothing Parameter  spar= 0.6988943  lambda= 0.02792303 (12 iterations)
## Equivalent Degrees of Freedom (Df): 6.794596
## Penalized Criterion: 75215.9
## PRESS: 1593.383

Cross-validation selected λ = 0.0279 and df = 6.794596.

The model is also very smooth and fits the data well.
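The selected values can also be read from the fitted object rather than copied from the printout. The sketch below assumes the same setup as the rest of the article and falls back to simulated stand-in data (an assumption) when the ISLR package is not installed; lambda and df are components of the object returned by smooth.spline().

```r
# Wage data from the ISLR package when installed; otherwise simulated
# stand-in data (an assumption, only so the sketch runs anywhere)
if (requireNamespace("ISLR", quietly = TRUE)) {
  dat <- ISLR::Wage[, c("age", "wage")]
} else {
  set.seed(1)
  dat <- data.frame(age = sample(18:80, 500, replace = TRUE))
  dat$wage <- 60 + 40 * sin((dat$age - 18) / 25) + rnorm(500, sd = 20)
}

# cv = TRUE selects lambda by leave-one-out cross-validation.
# Because ages repeat, R warns that cross-validation with non-unique
# x values is doubtful; the fit is still returned.
fit2 <- smooth.spline(dat$age, dat$wage, cv = TRUE)

fit2$lambda  # selected smoothing parameter
fit2$df      # equivalent degrees of freedom

# Draw the cross-validated smoothing spline over the data
plot(dat$age, dat$wage, col = "grey", xlab = "Age", ylab = "Wage")
lines(fit2, col = "blue", lwd = 2)
```

Setting cv = FALSE instead would use generalized cross-validation, which is the function's default.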

Conclusion

To learn nonlinear relationships between the inputs X_i and the output Y, we therefore need to transform the data or variables, which makes the model more flexible and more powerful.

