Original link: tecdat.cn/?p=14887

A generalized linear model (GLM) uses a link function to relate a linear combination of the independent variables to the probability distribution of the dependent variable, which can be Gaussian, binomial, multinomial, Poisson, gamma, or exponential. The link functions considered here are:

  • Square root link (for the Poisson model)

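Consider a random variable $Y$ with mean $\mu$ and variance $\sigma^2$. A first-order Taylor expansion of a function $g$ around $\mu$ gives

$$g(Y) \approx g(\mu) + (Y - \mu)\, g'(\mu),$$

so that

$$\mathbb{E}[g(Y)] \approx g(\mu), \qquad \operatorname{Var}[g(Y)] \approx g'(\mu)^2\, \sigma^2.$$

If $g(y) = \sqrt{y}$, then $g'(\mu) = 1/(2\sqrt{\mu})$ and the second equation becomes

$$\operatorname{Var}[\sqrt{Y}] \approx \frac{\sigma^2}{4\mu}.$$

For a Poisson variable, $\sigma^2 = \mu$, so the variance of $\sqrt{Y}$ is approximately $1/4$ whatever the mean. The square root transformation is therefore variance-stabilizing, which can be interpreted as a form of homoscedasticity.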
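The variance-stabilizing effect of the square root can be checked by simulation. The sketch below is only an illustration: the means, the sample size, and the seed are arbitrary assumptions, not values from the original post. For Poisson draws, the variance of the raw counts should track the mean, while the variance of their square roots should stay close to 1/4.

```r
# Variance stabilization check for Poisson counts (illustrative values only)
set.seed(1)                 # arbitrary seed, for reproducibility
mu <- c(5, 20, 100)         # assumed means, chosen for illustration
n  <- 1e5                   # number of simulated draws per mean

# Empirical variance of Y and of sqrt(Y) for each mean
var_raw  <- sapply(mu, function(m) var(rpois(n, m)))
var_sqrt <- sapply(mu, function(m) var(sqrt(rpois(n, m))))

# var_raw grows with mu, var_sqrt stays near 0.25
print(round(rbind(mu = mu, var_raw = var_raw, var_sqrt = var_sqrt), 3))
```

The approximation comes from a first-order expansion, so it is expected to be rougher for small means and to improve as the mean grows.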

  • Logarithmic link (for the Bernoulli model)

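Suppose the count variables are Poisson,

$$N \sim \mathcal{P}(\lambda_x), \qquad \lambda_x = \exp(x^\top \beta),$$

so that

$$\mathbb{P}(N = 0 \mid X = x) = e^{-\lambda_x} = \exp\left(-\exp(x^\top \beta)\right).$$

This looks like a Bernoulli regression with $H$ as the link function, $\mathbb{P}(Y = 1 \mid X = x) = H(x^\top \beta)$.

Now suppose that instead of observing $N$, we only observe $Y = \mathbf{1}(N > 0)$, so that $\mathbb{P}(Y = 1 \mid X = x) = 1 - e^{-\lambda_x}$, that is, $H(s) = 1 - \exp(-e^{s})$. In that case, running a Bernoulli regression with this logarithmic link function on the binary variable (zero versus non-zero) should be equivalent to first running a Poisson regression on the raw counts and then deriving the probability of a zero from it. Let us compare $e^{-\lambda_x}$ with $p_x$ from a standard Bernoulli regression (fitted with a probit link in the code below), using simulated data: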


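# Poisson regression on the raw counts (the count column is named Y in base)
regPois = glm(Y~.,data=base,family=poisson(link="log"))
# Bernoulli regression on the indicator "count equals zero"
regBinom = glm((Y==0)~.,data=base,family=binomial(link="probit"))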

What if $p_x$ is obtained from a Bernoulli regression that uses the link function $H$ instead?


plot(prob,1-exp(-lambda),xlim=0:1,ylim=0:1)
abline(a=0,b=1,lty=2,col="red")
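
A self-contained version of this comparison might look like the sketch below. The data-generating process, the coefficients, the sample size, and the object names `regBern` and `base$Y` are assumptions made for illustration, since the simulation code is not given in the text. The Bernoulli regression uses R's built-in complementary log-log link (`cloglog`), which corresponds to P(Y = 1) = 1 - exp(-exp(x'β)).

```r
# Poisson regression versus Bernoulli regression on 1(N > 0), simulated data
set.seed(1)                                   # arbitrary seed
n  <- 5000                                    # assumed sample size
X1 <- rnorm(n)
X2 <- rnorm(n)
# Hypothetical data-generating process: Poisson counts with a log-linear mean
N  <- rpois(n, exp(-0.5 + 0.5 * X1 - 0.3 * X2))
base <- data.frame(N = N, X1 = X1, X2 = X2)
base$Y <- as.integer(base$N > 0)              # binary indicator: non-zero count

# Poisson regression on the raw counts, and its fitted means
regPois <- glm(N ~ X1 + X2, data = base, family = poisson(link = "log"))
lambda  <- predict(regPois, type = "response")

# Bernoulli regression on the indicator, complementary log-log link
regBern <- glm(Y ~ X1 + X2, data = base, family = binomial(link = "cloglog"))
prob    <- predict(regBern, type = "response")

# If the counts really are Poisson, the points should lie near the diagonal
plot(prob, 1 - exp(-lambda), xlim = 0:1, ylim = 0:1)
abline(a = 0, b = 1, lty = 2, col = "red")
```

Because the simulated counts are Poisson by construction, the two sets of probabilities should nearly coincide. On real data, a visible departure from the diagonal would hint that the Poisson assumption is questionable.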

The fit is good. Now consider the marital infidelity data set published by Ray Fair in 1978 in the Journal of Political Economy (563 observations, nine variables), and fit the same models:


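# assumes regPois and regBinom have been refitted on the infidelity data
lambda = predict(regPois, type="response")
prob = predict(regBinom, type="response")
plot(prob,exp(-lambda),xlim=0:1,ylim=0:1)
abline(a=0,b=1,lty=2,col="red")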

In this case, the two models turn out to be very different. The same is true of the second comparison:


plot(prob,1-exp(-lambda),xlim=0:1,ylim=0:1)
abline(a=0,b=1,lty=2,col="red")

How do we explain this? Is it because the Poisson model is a poor fit? For comparison, we fit a zero-inflated Poisson model:


summary(regZIP)
 
Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.002274   0.048413   0.047    0.963    
X1          1.019814   0.026186  38.945   <2e-16 ***
X2          1.004814   0.024172  41.570   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  4.90190    2.07846   2.358   0.0184 *
X1           2.00227    0.86897   2.304   0.0212 *
X2          -0.01545    0.96121  -0.016   0.9872  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because of the zero inflation, we reject the Poisson assumption here. A Bernoulli regression with the logarithmic link can thus be used to check whether the Poisson distribution is a good model.
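
A quick complementary check is to compare the observed share of zeros with the share implied by the fitted Poisson model. The sketch below uses hypothetical zero-inflated data (30% structural zeros, one covariate, arbitrary coefficients), all of which are assumptions for illustration rather than the data used above.

```r
# Observed versus Poisson-implied share of zeros (hypothetical simulated data)
set.seed(1)                                   # arbitrary seed
n  <- 5000                                    # assumed sample size
X1 <- rnorm(n)
# Assumed mechanism: 30% structural zeros, Poisson counts otherwise
structural_zero <- rbinom(n, 1, 0.3)
N <- ifelse(structural_zero == 1, 0L, rpois(n, exp(0.5 + 0.5 * X1)))
base <- data.frame(N = N, X1 = X1)

# Plain Poisson regression, ignoring the zero inflation
regPois <- glm(N ~ X1, data = base, family = poisson(link = "log"))
lambda  <- predict(regPois, type = "response")

observed_zero <- mean(base$N == 0)            # empirical P(N = 0)
implied_zero  <- mean(exp(-lambda))           # P(N = 0) under the fitted Poisson
print(c(observed = observed_zero, implied = implied_zero))
```

When the data contain more zeros than a Poisson model can produce, the observed share sits clearly above the implied one, which is the same symptom the probability plots reveal.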

 

 

