Original link: tecdat.cn/?p=20531

Original source: Tuo Duan Data Tribe (拓端数据部落) WeChat public account

 

 

In the standard linear model, we assume that $E(Y\mid X=x)=\beta_0+\beta_1x$. When this linear assumption is not satisfied, other methods can be considered.

  • Polynomial regression

A first extension is to assume some polynomial function, $E(Y\mid X=x)=\beta_0+\beta_1x+\beta_2x^2+\cdots+\beta_dx^d$.

As in the standard linear model (or a GLM with a conditional normal distribution), the parameters $\beta$ can be obtained by least squares, $\widehat{\beta}=(X^\top X)^{-1}X^\top y$, where $X$ is the design matrix with columns $1,x,x^2,\dots,x^d$.

Even if the true model is not a polynomial one, a polynomial may still be a good approximation. In fact, by the Stone–Weierstrass theorem, if the regression function $m(\cdot)$ is continuous on some interval, it can be uniformly approximated by a polynomial function.

For illustration only, consider the following data set
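The vectors xr and yr are not shown in the original post; a toy simulation along the following lines (purely an assumption, so that the code can be run) produces a comparable data set.

set.seed(1)
xr = seq(0, 10, by = .1)                       # regular grid of x values (assumed)
yr = sin(xr/2) + rnorm(length(xr))/2           # smooth signal plus noise (assumed)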


db = data.frame(x=xr,y=yr)
plot(db)

And the standard regression line

reg = lm(y ~ x,data=db)
abline(reg,col="red")

 

Consider a polynomial regression. If the degree of the polynomial is large enough, almost any shape can be obtained,

reg=lm(y~poly(x,5),data=db)
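For completeness, the closed-form least-squares estimate $\widehat{\beta}=(X^\top X)^{-1}X^\top y$ mentioned earlier can be reproduced by hand. This is only a sketch: the coefficients differ from those returned by poly() (which uses orthogonal polynomials), but the fitted values agree.

d = 5
X = outer(db$x, 0:d, "^")                      # design matrix with columns 1, x, ..., x^5
beta_hat = solve(t(X) %*% X, t(X) %*% db$y)    # (X'X)^{-1} X'y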

 

But with too high a degree, you get too much “fluctuation”,

reg=lm(y~poly(x,25),data=db)

 

And estimates can be unreliable: if we change a point, a (local) change may occur

yrm = yr; yrm[31] = yr[31] - 2
regm = lm(yrm ~ poly(xr, 25))                  # refit the degree-25 polynomial on the modified data
lines(xr, predict(regm), col = "red")

  • Local regression

In fact, if we are only interested in a good local approximation of the regression function, why not use a local regression?

This can easily be done with weighted regression: in the least-squares criterion we consider weights $\omega_i(x_0)$ that depend on the distance between $x_i$ and the point $x_0$ where we want to predict, $\widehat{\beta}=\operatorname{argmin}\sum_i\omega_i(x_0)\,[y_i-(\beta_0+\beta_1x_i)]^2$.

 

  • We consider a linear model here, but any polynomial model could be used. In that case, the optimization problem is $\min_{\beta}\sum_i\omega_i(x_0)\Big[y_i-\sum_{j=0}^{d}\beta_jx_i^j\Big]^2$.

It can be solved explicitly, since $\widehat{\beta}=(X^\top\Omega X)^{-1}X^\top\Omega y$, where $\Omega=\operatorname{diag}(\omega_1(x_0),\dots,\omega_n(x_0))$.

For example, if we want to predict at some point $x_0$, consider $\omega_i(x_0)=\mathbf{1}(|x_i-x_0|\le h)$: with this model, we simply discard observations that are too far from $x_0$,
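A minimal sketch of this idea, where the prediction point x0 = 2 and the window half-width h = 1 are arbitrary choices:

x0 = 2                                         # point of interest (assumed)
h  = 1                                         # half-width of the window (assumed)
idx = which(abs(db$x - x0) <= h)               # keep only observations close to x0
reg_loc = lm(y ~ x, data = db, subset = idx)
predict(reg_loc, newdata = data.frame(x = x0))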

 

A more general idea is to consider some kernel function $K(\cdot)$ to define the weights, together with some bandwidth (usually denoted $h$) giving the length of the neighborhood, $\omega_i(x_0)=K\!\left(\frac{x_i-x_0}{h}\right)$.

This is actually what is called the Nadaraya-Watson estimator of the regression function, $\widehat{m}_h(x_0)=\dfrac{\sum_iK\!\left(\frac{x_i-x_0}{h}\right)y_i}{\sum_iK\!\left(\frac{x_i-x_0}{h}\right)}$.
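As a sketch (not code from the original post), the Nadaraya-Watson estimator can be computed directly; the helper nw() below and the bandwidth h = 1 are illustrative assumptions.

nw = function(x0, K, h = 1){                   # generic Nadaraya-Watson sketch; h = 1 is arbitrary
  w = K((db$x - x0)/h)
  sum(w * db$y) / sum(w)
}
u = seq(0, 10, by = .1)
plot(db)
lines(u, sapply(u, nw, K = function(x) abs(x) <= 1), col = "purple", lwd = 2)   # uniform kernel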

In the previous case, we considered the uniform kernel, $K(x)=\tfrac{1}{2}\mathbf{1}(|x|\le1)$.

But this weight function is highly discontinuous, which is perhaps not the best choice; let us try the Gaussian kernel instead, $K(x)=\tfrac{1}{\sqrt{2\pi}}e^{-x^2/2}$.

In R, this can be done with


x0 = 2
w = dnorm(xr - x0)
reg = lm(y ~ 1, data = db, weights = w)

On our data set, we can plot

w = dnorm(xr - x0)
plot(db, cex = abs(w)*4)
lines(ul, vl0, col = "red")
axis(3); axis(2)
reg = lm(y ~ 1, data = db, weights = w)
u = seq(0, 10, by = .02)
v = predict(reg, newdata = data.frame(x = u))
lines(u, v, col = "red", lwd = 2)

Here, we compute a local regression at the point $x_0=2$. The horizontal segment below is the local regression (the size of each point is proportional to its weight). The red curve traces the evolution of the local regression.

 

Let’s use animation to visualize the curve.

But for some reason, I could not easily install the package on Linux, so we can use a loop to generate the graphics instead (with a helper function sketched below).
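The graph() function called in the loop below is not shown in the post; a plausible sketch, mirroring the single-point code above, could be:

vx0 = seq(0, 10, by = .25)                     # grid of evaluation points (assumed)
graph = function(i){
  x0 = vx0[i]
  w  = dnorm(db$x - x0)
  plot(db, cex = abs(w)*4)
  reg = lm(y ~ 1, data = db, weights = w)
  u = seq(0, 10, by = .02)
  lines(u, predict(reg, newdata = data.frame(x = u)), col = "red", lwd = 2)
}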

for(i in 1:length(vx0)){
  png(name, 600, 400); graph(i); dev.off()
}

Then, I use

 

Of course, you can think about local linear models,

 
loc_lin = function(x0){
  reg = lm(y ~ x, data = db, weights = dnorm(xr - x0))   # Gaussian weights centred at x0 (assumed)
  return(predict(reg, newdata = data.frame(x = x0)))}

 

Even quadratic (local) regression,

 
reg = lm(y ~ poly(x, degree = 2), data = db, weights = w)

 

Of course, we can change the bandwidth
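As an illustration (the two bandwidth values below are arbitrary choices), we can compare two bandwidths with the kernel estimator:

plot(db)
u = seq(0, 10, by = .1)
for(h in c(.5, 2)){                            # a small and a large bandwidth (assumed)
  fit = sapply(u, function(x0){
    w = dnorm((db$x - x0)/h)
    sum(w * db$y) / sum(w)
  })
  lines(u, fit, lwd = 2, lty = 1 + (h > 1))
}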

 

Note that we actually have to choose the weight function (the so-called kernel). However, there are (easy) ways to choose the “best” bandwidth h. The idea of cross-validation is to consider $\sum_i\big[y_i-\widehat{m}_{(-i)}^{(h)}(x_i)\big]^2$, where $\widehat{m}_{(-i)}^{(h)}(x_i)$ is the prediction at $x_i$ obtained by a local regression fitted without the i-th observation.
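A minimal leave-one-out cross-validation sketch for h, using the Nadaraya-Watson estimator above (the grid of candidate bandwidths is an arbitrary choice):

cv_score = function(h){
  err = sapply(seq_along(db$x), function(i){
    w = dnorm((db$x[-i] - db$x[i])/h)
    db$y[i] - sum(w * db$y[-i]) / sum(w)       # leave-one-out prediction error at x_i
  })
  sum(err^2)
}
hs = seq(.2, 3, by = .1)                       # candidate bandwidths (assumed)
hs[which.min(sapply(hs, cv_score))]            # bandwidth minimizing the CV criterion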

We can try some real data.

library(XML)
 
data = readHTMLTable(html)
 

After tidying up the data set,

plot(data$no, data$mu, ylim = c(6, 10))
segments(data$no, data$mu - 1.96*data$se,
         data$no, data$mu + 1.96*data$se, col = "light blue")

 

We calculate the standard error to reflect the uncertainty.

 
for(s in 1:8){
  reg = lm(mu ~ no, data = db, subset = (s-1)*12 + 1:12)
  lines((s-1)*12 + 1:12, predict(reg)[1:12], col = "red")
}

It is not a good assumption that all seasons should be considered completely independent.

NW = ksmooth(db$no, db$mu, kernel = "normal", bandwidth = 5)

We can try to look at a curve with a larger bandwidth.

db$mu[95] = 7
NW = ksmooth(db$no, db$mu, kernel = "normal", bandwidth = 5)   # re-estimated on the modified series (bandwidth assumed)
plot(data$no, data$mu)
lines(NW, col = "red")

  • Spline smoothing

Next, we discuss smoothing methods in regression. Assume that $y_i=m(x_i)+\varepsilon_i$, where $m(\cdot)$ is some unknown function, supposed to be smooth enough: for example, $m(\cdot)$ is continuous, $m'(\cdot)$ exists and is continuous, $m''(\cdot)$ exists and is continuous, and so on. If $m(\cdot)$ is smooth enough, we can use a Taylor expansion: for $x$ around some point $x_0$, $m(x)=m(x_0)+\frac{m'(x_0)}{1!}(x-x_0)+\cdots+\frac{m^{(d)}(x_0)}{d!}(x-x_0)^d+\frac{1}{d!}\int_{x_0}^{x}m^{(d+1)}(t)(x-t)^d\,dt.$

Or, written more compactly, $m(x)=\sum_{j=0}^{d}\frac{(x-x_0)^j}{j!}\,m^{(j)}(x_0)+\int_{x_0}^{x}\frac{(x-t)^d}{d!}\,m^{(d+1)}(t)\,dt.$

The first part is just a polynomial.

Using a Riemann-sum approximation of the integral, observe that $\int_{x_0}^{x}\frac{(x-t)^d}{d!}\,m^{(d+1)}(t)\,dt\approx\sum_{k}\beta_k\,(x-t_k)_+^d$ for some knots $t_1<t_2<\cdots$, where $(u)_+=\max\{u,0\}$.

 

As a result, $m(x)\approx\sum_{j=0}^{d}\alpha_jx^j+\sum_{k}\beta_k(x-t_k)_+^d.$

We are back to a linear regression model. A natural idea is therefore to regress $y_i$ on the variables $1,x_i,\dots,x_i^d,(x_i-t_1)_+^d,\dots,(x_i-t_K)_+^d$, for some given set of knots $t_1,\dots,t_K$.
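As a sketch of this idea (the knot positions 2 and 5 and degree 1 are assumptions for illustration), the truncated power basis can be built by hand:

pos = function(x, t) pmax(x - t, 0)            # truncated power function (x - t)_+
reg = lm(y ~ x + pos(x, 2) + pos(x, 5), data = db)
plot(db)
lines(db$x, predict(reg), col = "red")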

plot(db)

 

If we take one knot and splines of degree 1,

library(splines)
B = bs(xr, knots = c(3), Boundary.knots = c(0, 10), degree = 1)
reg = lm(y ~ B, data = db)
lines(xr[xr <= 3], predict(reg)[xr <= 3], col = "red")
lines(xr[xr >= 3], predict(reg)[xr >= 3], col = "blue")

The prediction obtained with this spline can be compared with separate regressions on the two subsets (dashed lines).



reg1 = lm(yr ~ xr, subset = xr <= 3)
lines(xr[xr <= 3], predict(reg1), col = "red", lty = 2)
reg2 = lm(yr ~ xr, subset = xr >= 3)
lines(xr[xr >= 3], predict(reg2), col = "blue", lty = 2)

 

This is different because here we have three parameters, while the regression on the two subsets uses four: one degree of freedom is lost when continuity is required. Observe that the model can equivalently be written

reg = lm(yr ~ bs(xr, knots = c(3), Boundary.knots = c(0, 10), degree = 1))
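To check the degrees-of-freedom argument numerically (an illustrative sketch, not part of the original post): the spline model has 3 coefficients, while the two separate regressions have 2 + 2 = 4.

length(coef(lm(yr ~ bs(xr, knots = c(3), Boundary.knots = c(0, 10), degree = 1))))          # 3
length(coef(lm(yr ~ xr, subset = xr <= 3))) + length(coef(lm(yr ~ xr, subset = xr >= 3)))   # 4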

 

The functions that appear in the regression are as follows

 

Now, if we go back to these two components, we get

matplot(xr, B, type = "l")
abline(v = c(0, 2, 5, 10), lty = 2)

If we add a knot, we get

 

Prediction is



reg = lm(y ~ bs(xr, knots = c(2, 5), Boundary.knots = c(0, 10), degree = 1), data = db)   # knot positions assumed
lines(xr, predict(reg), col = "red")

We can choose more knots



reg = lm(y ~ bs(xr, knots = 1:9, Boundary.knots = c(0, 10), degree = 1), data = db)   # e.g. knots at 1,...,9 (assumed)
lines(xr, predict(reg), col = "red")

 

We can get a confidence interval



P = predict(reg, interval = "confidence")
polygon(c(xr, rev(xr)), c(P[, 2], rev(P[, 3])), col = "light blue")
points(db)

 

If we keep the two knots chosen earlier, but consider a Taylor expansion of order 2, we get

B = bs(xr, knots = c(2, 5), Boundary.knots = c(0, 10), degree = 2)
matplot(xr, B, type = "l")
abline(v = c(0, 2, 5, 10), lty = 2)

 

If we consider the constant plus the first component of the spline basis, we get



B = cbind(1, B)
reg = lm(y ~ B - 1, data = db)   # refit without intercept, since B now contains the constant column
k = 2
lines(xr, B[, 1:k] %*% coefficients(reg)[1:k], col = k - 1, lty = k - 1)

 

If we add the constant term, the first term and the second term, we get the part on the left, before the first knot,

k = 3
lines(xr, B[, 1:k] %*% coefficients(reg)[1:k], col = k - 1, lty = k - 1)

By using three terms of the spline basis matrix (plus the constant), we can get the part between the two knots,


k = 4
lines(xr, B[, 1:k] %*% coefficients(reg)[1:k], col = k - 1, lty = k - 1)

 

And finally, when we sum all of them, we get the right-hand part, after the last knot,

k = 5
lines(xr, B[, 1:k] %*% coefficients(reg)[1:k], col = k - 1, lty = k - 1)

 

This is what we get using quadratic spline regression with two (fixed) knots. We can obtain a confidence interval just as before,



P = predict(reg, interval = "confidence")
polygon(c(xr, rev(xr)), c(P[, 2], rev(P[, 3])), col = "light blue")
points(db)
lines(xr, P[, 1], col = "red")

 

Using the bs() function, continuity at the knots is guaranteed.

 

On the real data, again using linear spline functions, we can add the continuity constraints,

reg = lm(mu ~ bs(no, knots = c(12*(1:7) + .5), Boundary.knots = c(0, 97), degree = 1), data = db)
lines(db$no, predict(reg), col = "red")

 

But we can also think about quadratic splines,

abline(v = 12*(0:8) + .5, lty = 2)
reg = lm(mu ~ bs(no, knots = c(12*(1:7) + .5), Boundary.knots = c(0, 97), degree = 2), data = db)
lines(db$no, predict(reg), col = "red")

 


Most popular insights

1. Application case of multinomial logistic regression in R

2. Implementation of a panel smooth transition regression (PSTR) analysis case

3. Partial least squares regression (PLSR) and principal component regression (PCR) in MATLAB

4. Case analysis of Poisson regression models in R

5. Hosmer-Lemeshow goodness-of-fit test in R regression

6. Implementation of LASSO, Ridge and Elastic Net regression models in R

7. Implementing logistic regression in R

8. Predicting stock prices with linear regression in Python

9. How to calculate the IDI and NRI indices for survival analysis and Cox regression in R