Original link: tecdat.cn/?p=20882

Original source: Tuoduan Data Tribe official account (拓端数据部落公众号)

 

1 Introduction

This article explores why using the generalized additive model is a good choice. To do this, we first need to look at linear regression to see why it might not be the best choice in some cases.


2 Regression models

Suppose we have some data with two attributes, Y and X. If they were linearly related, they might look something like this:
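The post's underlying dataset isn't included, so as a stand-in the snippets below assume a simulated data frame along these lines (the names my_data, X and Y come from the code that follows; the shape and values are purely illustrative assumptions):

library(ggplot2)

# Simulate 300 points (matching n = 300 in the later model output) with a
# smooth S-shaped trend plus noise; at a glance it can pass for linear.
set.seed(20882)
my_data <- data.frame(X = runif(300, 0, 100))
my_data$Y <- 15 + 60 / (1 + exp(-(my_data$X - 50) / 10)) + rnorm(300, sd = 10)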


a <- ggplot(my_data, aes(x = X, y = Y)) +
  geom_point()
a

 

To examine this relationship, we can use a regression model. Linear regression is a method that uses X to predict the variable Y. Applying it to our data predicts the set of values shown by the red line:

a + geom_smooth(col = "red", method = "lm")

 

This is the equation of a straight line. From this equation, we can describe where the line starts on the Y-axis (the “intercept”, or α) and how much Y increases for each one-unit increase in X (the “slope”, which we call the coefficient of X, or β). There is also a little natural fluctuation; without it, all the points would sit perfectly on the line. We call this the “residual” (ϵ). Mathematically:

Y = α + βX + ϵ

Or, if we substitute real numbers, we get the following:

 

 

The model is estimated by taking the difference between each data point and the line (the “residuals”) and then minimising those differences. Because we have both positive and negative errors above and below the line, we square them so they are all positive, and minimise the “sum of squares”. This is called “ordinary least squares”, or OLS.
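As a quick illustration of the OLS fit described above (using the simulated my_data stand-in; the object name my_lm is the one used in the comparison later in the post):

# Ordinary least squares fit of Y on X
my_lm <- lm(Y ~ X, data = my_data)

coef(my_lm)                  # intercept (alpha) and slope (beta)
sum(residuals(my_lm)^2)      # the minimised sum of squared residuals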


3 What about non-linear relationships?

So what do we do if our data looks like this:

 

One of the key assumptions of the model we just saw is that Y and X are linearly related. If our Y is not normally distributed, we can use a generalized linear model (Nelder & Wedderburn, 1972), in which Y is transformed through a link function, but we again assume that f(Y) and X are linearly related. If this is not the case, and the relationship varies across the range of X, a linear model may not be the most appropriate choice. We have some options here:

  • We could use a linear fit, but if we did, we would sit above or below the data in certain parts of its range.
  • We could cut X into several categories. I use three in the figure below, which is a reasonable choice. Again, we may sit below or above the data in places, and behaviour near the boundary between categories seems arbitrary: for example, if X = 49, is Y really very different from X = 50?
  • We could use polynomials. Below, I use a cubic polynomial, so the model fits Y = α + β1X + β2X² + β3X³ + ϵ. The combination of these terms lets the fitted function vary fairly smoothly. This is a reasonable choice, but polynomials can be very volatile, and their terms are correlated with one another, which can weaken the fit (see the sketch after this list).
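A minimal sketch of the polynomial option, again using the simulated stand-in data; poly() builds the cubic terms (orthogonalised, which eases but does not remove the stability issues):

# Cubic polynomial regression: Y ~ X + X^2 + X^3
my_poly <- lm(Y ~ poly(X, 3), data = my_data)

# Overlay the cubic polynomial fit on the scatter plot
a + geom_smooth(method = "lm", formula = y ~ poly(x, 3), col = "blue")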


4 Spline curves

A further refinement of polynomials is to fit “piecewise” polynomials, which are chained together across the range of the data to describe its shape. Splines are piecewise polynomials, named after the tool draftsmen once used to draw curves: a physical spline is a flexible strip that can be bent into shape and held in place with weights. When constructing a mathematical spline, we use polynomial functions whose second derivatives are continuous, joined together at “knot” points.

Below, a natural cubic spline is fitted by passing the ns() function, inside the formula of geom_smooth, to the ggplot2 object. This spline is cubic and uses 10 knots.
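A sketch of that spline fit, assuming the same plot object a; ns() comes from the splines package that ships with R:

library(splines)

# Natural cubic spline fitted inside geom_smooth; df = 10 corresponds to
# the "10 knots" described above
a + geom_smooth(method = "lm", formula = y ~ ns(x, df = 10), col = "red")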


5 Smooth functions

Spline curves can be smooth or “wiggly”, and this can be controlled by changing the number of knots (k) or by applying a smoothing penalty. If we increase the number of knots, the curve becomes more “wiggly”. It may then lie closer to the data and have less error, but we start to “overfit” the relationship and fit the noise in our data. When we add a smoothing penalty, we penalise complexity in the model, which helps to reduce overfitting.
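To see the trade-off, a small sketch with base R's smooth.spline(), where df controls the effective number of knots and spar acts as a smoothing penalty (the values here are purely illustrative):

# A deliberately wiggly spline (many effective degrees of freedom)
wiggly    <- smooth.spline(my_data$X, my_data$Y, df = 20)

# The same data with a heavy smoothing penalty applied
penalised <- smooth.spline(my_data$X, my_data$Y, spar = 1)

plot(my_data$X, my_data$Y, pch = 16, col = "grey", xlab = "X", ylab = "Y")
lines(wiggly,    col = "red")
lines(penalised, col = "blue")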


6 Generalized Additive Model (GAM)

Generalized additive models (GAMs) (Hastie, 1984) use smooth functions, such as splines, as predictors in a regression model. These models are strictly additive, which means we can’t use interaction terms as in a normal regression, although we can achieve much the same effect by reparameterising, for example as a smoother of two variables. Essentially, we move from a model like Y = α + βX + ϵ to one like:

Y = α + f(X) + ϵ

where f(X) is a smooth function of X.

A more formal definition of a GAM, from Wood (2017), is:

g(μi) = Ai θ + f1(x1i) + f2(x2i) + ⋯

where g is a link function and:

  • μi ≡ E(Yi), the expected value of Y
  • Yi ~ EF(μi, ϕ), i.e. Yi is a response variable following an exponential family distribution with mean μi and scale parameter ϕ
  • Ai is a row of the model matrix for any strictly parametric model components, and θ is the corresponding parameter vector
  • the fj are smooth functions of the covariates xk, each built from a set of basis functions

GAMs are a good choice if you are building a regression model but suspect that a smooth fit would do a better job than a straight line. They are well suited to non-linear or noisy data.

7 Fitting a GAM

 

So how do we build a GAM for the S-shaped data above? Here, I’ll use a cubic regression spline:

library(mgcv)
my_gam <- gam(Y ~ s(X, bs = "cr"), data = my_data)

The settings above mean:

  • s() specifies a smooth term. There are other options, but s() is a good default.
  • bs = "cr" tells it to use a cubic regression spline as the ‘basis’.
  • s() calculates a default number of knots to use, but you can change this yourself, e.g. k = 10 for 10 knots.
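Once fitted, the estimated smooth can be examined directly; a minimal sketch assuming the my_gam object above:

# Plot the estimated smooth of X, with a shaded confidence band
plot(my_gam, shade = TRUE)

# Or overlay the GAM's fitted values on the original scatter plot
a + geom_line(aes(y = fitted(my_gam)), col = "red")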


8 Model output

View model summary:
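Assuming the fitted model is stored in my_gam as above:

summary(my_gam)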

## 
## Family: gaussian 
## Link function: identity 
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  43.9659     0.8305   52.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value    
## s(X) 6.087  7.143 296.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.876   Deviance explained = 87.9%
## GCV = 211.94  Scale est. = 206.93    n = 300
  • The parametric coefficients section shows the intercept; any non-smooth terms in the model would also be displayed here.
  • The approximate significance of each smooth term is shown beneath it.
  • This is based on the “effective degrees of freedom” (EDF): the spline functions we are using could expand into many parameters, but because we are also penalising them, their effective contribution is reduced (see the sketch after this list).
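If you want these quantities programmatically rather than printed, the summary object exposes them; a minimal sketch using the tables provided by mgcv’s summary.gam():

summ <- summary(my_gam)
summ$p.table   # parametric (non-smooth) coefficients
summ$s.table   # smooth terms: edf, Ref.df, F, p-value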


9 Checking the model

The gam.check() function can be used to view residual plots, and it also tests the smooths to see whether there are enough knots to describe the data. If the p-value is very low, more knots may be needed.
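Again assuming the fitted object is my_gam:

gam.check(my_gam)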

 

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was ...
## The Hessian was positive definite.
## Model rank =  10 / 10
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##        k'  edf k-index p-value
## s(X) 9.00 6.09     1.1    0.97


10 Is it better than the linear model?

Let’s compare it with the ordinary linear regression model, my_lm, fitted to the same data:

anova(my_lm, my_gam)
## Analysis of Variance Table
## 
## Model 1: Y ~ X
## Model 2: Y ~ s(X, bs = "cr")
##   Res.Df   RSS     Df Sum of Sq      F    Pr(>F)    
## 1 298.00 88154                                      
## 2 292.91 60613 5.0873     27540 26.161 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The anova() function performs an F-test here, and our GAM fits significantly better than the linear regression.

11 Summary

So, we have looked at what a regression model is and how we explain one variable, Y, using another variable, X. One of the basic assumptions is that their relationship is linear, but this is not always the case. Where the relationship changes across the range of X, we can use functions that change shape. A good way to do this is to join smooth curves together at knot points, giving what we call splines.

We can use these splines in an ordinary regression, but if we use them in the context of a GAM, we estimate both the regression model and how smooth to make our smooths.

The example above shows a spline based GAM that fits much better than a linear regression model.

 


References:

  • NELDER, J. A. & WEDDERBURN, R. W. M. 1972. Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General), 135, 370-384.
  • HARRELL, F. E., JR. 2001. Regression Modeling Strategies, New York, Springer-Verlag New York.

