Original link: http://tecdat.cn/?p=23159

Original source: Tuoduan Data Tribe (WeChat official account)

Mixed models have been around in statistics for a long time. For example, standard ANOVA methods can be seen as special cases of mixed models. More recently, mixed models have been applied and extended to cover a wide variety of data situations.

Terminology

For the uninitiated, the terminology surrounding mixed models, especially across disciplines, can be a bit confusing. Some terms you might come across for these types of models include:

  • Variance components
  • Random intercept and slope
  • Random effects
  • Random coefficients
  • Varying coefficients
  • Intercepts and/or slopes as outcomes
  • Hierarchical linear model
  • Multilevel model (implies multiple levels of hierarchically clustered data)
  • Growth curve model (possibly latent GCM)
  • Mixed effects model

All of these describe types of mixed models. Mixed effects models, or simply mixed models, generally refer to a mixture of fixed and random effects. For the models in general, I prefer the terms "mixed models" or "random effects models", because they are simpler terms that do not imply a specific structure, and because the latter also applies to extensions that many people would not think of when other terms are used. Regarding mixed effects, fixed effects are the typical main effects one sees in a linear regression model, i.e. the non-random part of a mixed model; in some contexts they are referred to as population average effects. Though you will hear many definitions, random effects are simply those specific to an observational unit, however defined. The approach outlined here largely pertains to the case where the observational unit is a level of some grouping factor.

Types of clustering

Data may have one or more sources of clustering, and that clustering may be hierarchical, such that clusters are nested within other clusters. An example is students tested on academic ability multiple times (repeated observations nested within students, students nested within schools, schools nested within districts). In other cases there is no nesting. An example is a reaction-time experiment in which every participant performs the same set of tasks: observations are nested within individuals, but they are also clustered by task type. The terms nested and crossed are used to distinguish these cases. In addition, clustering may be balanced or unbalanced.

In the following sections we will look at mixed effects models for all of these data situations. In general our approach will be the same, since clustering is really more a property of the data than of the model. Still, it is important to appreciate the flexibility of mixed models in handling a variety of data situations.

Random intercept model

Below we show the simplest and most common case of a mixed model, in which there is a single grouping structure for the random effect. This is often called a random intercepts model.

Example: student college GPA

Below we assess the factors that predict college grade point average (GPA). Each of the 200 students was assessed six times (each semester of the first three years), so observations are clustered within students. We also have other variables such as job status, gender, and high school GPA. Some will be in both labeled and numeric form.

Standard regression model

Now for the basic model. We can present it in several ways. First, let's start with a standard regression to get our bearings.
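In symbols, one way to write this standard regression is shown below. The coefficient subscripts (b_intercept, b_occ) are notation assumed here, and the second argument of N(·,·) is a standard deviation.

```latex
\begin{aligned}
\text{gpa} &= b_{\text{intercept}} + b_{\text{occ}} \cdot \text{occasion} + \epsilon \\
\epsilon &\sim \mathcal{N}(0, \sigma)
\end{aligned}
```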

We have coefficients (b) for the intercept and for the effect of time. The error (ϵ) is assumed to be normally distributed with a mean of 0 and a standard deviation of σ.

Another way to write the model, one that emphasizes the underlying data-generating process for GPA, is as follows.
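A sketch of that distributional form, using the same assumed coefficient names as the regression equation (σ is again a standard deviation):

```latex
\begin{aligned}
\text{gpa} &\sim \mathcal{N}(\mu, \sigma) \\
\mu &= b_{\text{intercept}} + b_{\text{occ}} \cdot \text{occasion}
\end{aligned}
```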

More strictly, gpa and μ have an implied subscript for each observation, but you can also think of this as a model for a single individual at a single point in time.

Mixed model

Initial depiction

Now we show one way to depict a mixed model that includes a unique effect for each student. Consider the following model for a single student. It shows that the student-specific effect, i.e. the deviation in GPA just because of who the student is, can be seen as an additional source of variance.
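Written out with the same assumed coefficient names as the standard regression, the student effect simply joins the error term:

```latex
\text{gpa} = b_{\text{intercept}} + b_{\text{occ}} \cdot \text{occasion} + (\text{effect}_{\text{student}} + \epsilon)
```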

 

We (usually) make the following assumption about the student effect.
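In symbols (effect_student is notation assumed here; τ is a standard deviation):

```latex
\text{effect}_{\text{student}} \sim \mathcal{N}(0, \tau)
```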

Thus the student effect is random, specifically normally distributed with a mean of zero and some estimated standard deviation (τ). In other words, conceptually the only difference between this mixed model and a standard regression is the student effect, which is zero on average but typically varies from student to student by about τ.

If we rearrange, we can instead focus on the model coefficients rather than treating the student effect as an additional source of error.
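Moving the student effect next to the intercept (same assumed notation as before) gives:

```latex
\text{gpa} = (b_{\text{intercept}} + \text{effect}_{\text{student}}) + b_{\text{occ}} \cdot \text{occasion} + \epsilon
```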

Or, more succinctly:
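Writing b_int_student for the student-specific intercept (notation assumed here), a compact form is:

```latex
\begin{aligned}
\text{gpa} &= b_{\text{int\_student}} + b_{\text{occ}} \cdot \text{occasion} + \epsilon \\
b_{\text{int\_student}} &\sim \mathcal{N}(b_{\text{intercept}}, \tau)
\end{aligned}
```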

This way we have student-specific intercepts: each person's unique effect is added to the overall intercept, giving each person a different intercept.

Now we see that the intercepts are normally distributed, with a mean equal to the overall (population) intercept and some standard deviation. Hence this is often called a random intercepts model.

Multi-level model

Another way to show mixed models is common in the multilevel modeling literature. There the model is shown more explicitly as a two-part regression: one at the observation level and one at the student level.
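A sketch of the two levels, with b_int_student the student-specific intercept (notation assumed here): the first line is the observation-level regression, the second the student-level one.

```latex
\begin{aligned}
\text{gpa} &= b_{\text{int\_student}} + b_{\text{occ}} \cdot \text{occasion} + \epsilon \\
b_{\text{int\_student}} &= b_{\text{intercept}} + \text{effect}_{\text{student}}
\end{aligned}
```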

However, after "plugging in" the second-level part into the first, the model is identical to the earlier one.

Note that we do not have a student-specific effect for occasion. In this context, occasion is said to be a fixed effect only, with no random component. That definitely does not have to be the case, though, as we will see.

Application

Visualization

Here we plot GPA against occasion (i.e. semester) to get a sense of the starting points and trends.

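library(ggplot2)
ggplot(gpa, aes(occasion, gpa)) + geom_line(aes(group = student), alpha = .1) + geom_smooth(method = 'lm', se = FALSE, color = 'red')  # one faded line per student, overall linear trend in red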

 

All student trajectories are shown as faded lines, with a sample of 10 shown in bold. The overall trend, as estimated by the regression we will run later, is shown in red. Two things stand out. First, students vary a lot at the start. Second, although the general trend in GPA is upward over time, individual students may vary in that trajectory.

Standard regression

So let's get started. First, we run a standard regression with only the time indicator as a covariate, treated as numeric.

gpa_lm = lm(gpa ~ occasion, data = gpa)
summary(gpa_lm)

 

The output tells us that at the start, i.e. when occasion is zero, the average GPA, given by the intercept, is about 2.6. In addition, as we move from semester to semester, we can expect GPA to increase by about 0.11 points. This would be fine, except that we are ignoring the clustering. One consequence is that our standard errors are biased, and so are claims of statistical significance based on them. More importantly, we cannot explore the student effect, which would be of interest in its own right.
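To make the interpretation concrete, here is the population trend implied by the rounded coefficients quoted above (2.6 and 0.11). Because these are rounded values rather than the exact fitted ones, the predictions are only illustrative.

```r
# Rounded coefficients quoted in the text (approximate, not the exact fit)
b_intercept = 2.6
b_occ = 0.11
occasion = 0:5                        # six semesters, coded 0 to 5
pred_gpa = b_intercept + b_occ * occasion
round(pred_gpa, 2)                    # 2.60 2.71 2.82 2.93 3.04 3.15
```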

Regression by group

Another approach is to run a separate regression for each student. However, this has many drawbacks: it is not easy to summarize when there are many groups, there is often very little data within each group to do this (as in this case), and the models are over-contextualized, meaning they ignore what the students have in common. We will compare such an approach to the mixed model later.

Running a mixed model

Next we run a mixed model that allows for a student-specific effect. The code looks just like the regression you would run with lm, but with an extra part specifying the grouping, i.e. the student effect. (1 | student) means we allow the intercept, represented by 1, to vary by student. With the mixed model we get the same fixed-effect results as with the regression, but there will be more to talk about.

library(lme4)
gpa_mixed = lmer(gpa ~ occasion + (1 | student), data = gpa)
summary(gpa_mixed)

 

First, we see that the coefficients for the intercept and time, which here we can call fixed effects, are the same as in the standard regression, and their interpretation is the same. The standard errors, on the other hand, are different, although in the end our conclusions would be the same statistically. Note in particular that the standard error for the intercept has increased. Conceptually, you could say that allowing a random intercept per person lets us gain information about the individual, while recognizing the uncertainty regarding the overall average.

Although we have coefficients and standard errors, you may have noticed that lme4 does not provide p-values. There are several reasons for this. With mixed models we are essentially dealing with different sample sizes: the N_c within clusters, which may vary from cluster to cluster (and even be a single observation), and the N total observations. This leaves us in a fuzzy situation regarding reference distributions, denominator degrees of freedom, and how to approximate a "best" solution. Other programs provide p-values automatically, as if there were no issue, and without telling you which of the several available approaches they use to compute them. Furthermore, those approximations can be very poor in some scenarios, or make assumptions that may not be appropriate for the situation.

However, it is straightforward to get confidence intervals, as shown below.

confint(gpa_mixed)

 

Variance components

What is new relative to standard regression output is the estimated variance/standard deviation of the student effect (τ in our earlier formula depiction). This tells us how much, on average, GPA shifts as we move from student to student. In other words, even after making a prediction based on the time point, each student has their own unique deviation, and this value (as a standard deviation) is the estimated average deviation across students.

Another way to interpret the variance output is to note the student variance as a percentage of the total, i.e. 0.064/0.122 = 52%. This is also called the intraclass correlation, because it is also an estimate of the within-cluster correlation, as we will see later.

Estimates of the random effects

After running the model, we can actually get estimates of the student effects. I show two ways for the first five students: as random effects (deviations from the overall intercept) and as random intercepts (i.e. intercept + random effect).

head(ranef(gpa_mixed)$student, 5)

head(coef(gpa_mixed)$student, 5)

 

Note that we did not allow occasion to vary by student, so its effect is constant, i.e. fixed, for all students.
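The relationship between the two outputs can be checked directly. Because the gpa data are not bundled with R, this sketch uses lme4's built-in sleepstudy data as a stand-in (an assumption): each random intercept from coef() should equal the fixed intercept plus that subject's ranef() deviation, while the slope is the same for everyone.

```r
library(lme4)

# Random intercept model on lme4's built-in sleepstudy data (stand-in for the gpa data)
sleep_mixed = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

re = ranef(sleep_mixed)$Subject   # per-subject deviations from the fixed intercept
cf = coef(sleep_mixed)$Subject    # per-subject coefficients
fe = fixef(sleep_mixed)           # fixed effects

# Random intercept = fixed intercept + random effect
all.equal(cf[, "(Intercept)"], fe[["(Intercept)"]] + re[, "(Intercept)"])

# No random slope was specified, so every subject shares the fixed Days effect
unique(cf[, "Days"])
```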

Often we are keenly interested in these effects and want some sense of the uncertainty in them. One way is through prediction intervals; another is simply to plot them. The merTools package provides both.

library(merTools)
predictInterval(gpa_mixed)   # predictions with intervals for various models, possibly on new data
REsim(gpa_mixed)             # mean, median and SD of simulated random effect estimates
plotREsim(REsim(gpa_mixed))  # plot the interval estimates

The resulting plot shows the estimated random effect for each student along with its interval estimate. Recall that the random effects are normally distributed with a mean of zero, shown by the horizontal line. Intervals that exclude zero are shown in bold.

Prediction

Let's now compare standard predictions with cluster-specific predictions. As with most R models, we can use the predict function on the fitted model.

head(predict(gpa_mixed, re.form = NA))

In the code above, re.form = NA specifies that the random effects are not used, so our predictions for the observations are essentially what we would get from the standard linear model.

head(predict(gpa_mixed, re.form = NA))
head(predict(lm(gpa ~ occasion, data = gpa)))

But each person has their own unique intercept, so let's see how the predictions differ when we incorporate that information.

head(predict(gpa_mixed))

 

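Depending on the estimated student effect, each student starts above or below the estimated intercept for all students. The following visualizes the unconditional predictions versus the conditional predictions, which include the random intercepts, for the first two students.

library(ggplot2)
gpa$pred_fixed = predict(gpa_mixed, re.form = NA)  # population-level prediction, no student effect
gpa$pred_student = predict(gpa_mixed)              # adds each student's random intercept
ggplot(subset(gpa, student %in% unique(student)[1:2]), aes(occasion, gpa, color = factor(student))) + geom_point() + geom_line(aes(y = pred_fixed), linetype = 'dashed') + geom_line(aes(y = pred_student))

We can see that the mixed model predictions are shifted because of the different intercepts. For these students, the shift reflects a relatively poor start.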
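The shift in the plot is exactly each student's random intercept. A sketch that checks this numerically, again using lme4's built-in sleepstudy data as an assumed stand-in for the gpa data:

```r
library(lme4)

sleep_mixed = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

pred_pop  = predict(sleep_mixed, re.form = NA)   # fixed effects only
pred_subj = predict(sleep_mixed)                 # adds each subject's random intercept
re_int    = ranef(sleep_mixed)$Subject[as.character(sleepstudy$Subject), "(Intercept)"]

# Conditional prediction = unconditional prediction + that subject's random intercept
all.equal(unname(pred_subj), unname(pred_pop + re_int))

# For a given subject the shift is the same on every day
head(cbind(sleepstudy[, c("Subject", "Days")], pred_pop, pred_subj))
```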

Cluster-level covariates

Recall our depiction of the mixed model as a multilevel model.

If we add student-level covariates, such as gender, to the model, we get the following.
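A sketch with gender as the student-level covariate (b_gender and b_int_student are notation assumed here). The second line is the student-level model; the last line is the result after plugging it into the first.

```latex
\begin{aligned}
\text{gpa} &= b_{\text{int\_student}} + b_{\text{occ}} \cdot \text{occasion} + \epsilon \\
b_{\text{int\_student}} &= b_{\text{intercept}} + b_{\text{gender}} \cdot \text{gender} + \text{effect}_{\text{student}} \\
\text{gpa} &= b_{\text{intercept}} + b_{\text{occ}} \cdot \text{occasion} + b_{\text{gender}} \cdot \text{gender} + (\text{effect}_{\text{student}} + \epsilon)
\end{aligned}
```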

After plugging in, we still have the same model as before, just with one additional predictor.

So adding cluster-level covariates does not have any unusual effect on how we think about the model. We simply add them to our set of predictor variables. Note also that we can create a cluster-level covariate as a mean or some other summary of an observation-level variable. This is especially common when the clusters represent geographical units and the observations are people. For example, we might use income as a person-level covariate and its median within each area to represent the overall wealth of that geographic region.
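A minimal sketch of building such a cluster-level covariate in base R with ave(), using hypothetical, made-up person-level income data for three regions:

```r
# Hypothetical person-level data: income (thousands, made-up values) in three regions
people = data.frame(
  region = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  income = c(30, 45, 60, 20, 28, 50, 70, 90, 110)
)

# Cluster-level covariate: median income of each person's region, repeated for every person
people$region_median_income = ave(people$income, people$region, FUN = median)
people
```

Both columns could then enter a model together, e.g. lmer(y ~ income + region_median_income + (1 | region)) for some outcome y (hypothetical).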

Summary of mixed model basics

Mixed models let us take into account clustering in the data. They give us a better understanding of the sources of variability in the target variable. We also obtain group-specific estimates of the model parameters, which let us understand exactly how the groups differ. This in turn allows for group-specific predictions, and thus more accurate predictions, assuming there is appreciable variance due to clustering. In short, even in the simplest setting, mixed models have a lot to offer.

Exercises

Sleep

In this exercise we will use the sleepstudy data from the lme4 package. Here is its description:

Average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject.

After loading the package, the data are available as sleepstudy, and head(sleepstudy) shows the first few observations.

  1. Run a regression with Reaction as the target variable and Days as the predictor.
  2. Run a mixed model with a random intercept for Subject.
  3. Interpret the variance components and fixed effects.
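One possible sketch of the three steps, not the only solution. Because every subject is observed on the same days (0 to 9), the fixed effects from the two models should coincide, and the difference lies in the standard errors and the added variance components.

```r
library(lme4)

data("sleepstudy")   # ships with lme4: Reaction (ms), Days, Subject
head(sleepstudy)

# 1. Standard regression, ignoring the clustering by subject
sleep_lm = lm(Reaction ~ Days, data = sleepstudy)
summary(sleep_lm)

# 2. Mixed model with a random intercept for each subject
sleep_mixed = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(sleep_mixed)

# 3. Variance components (subject and residual) to interpret
VarCorr(sleep_mixed)
```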

Adding cluster-level covariates

Rerun the mixed model with the GPA data, adding the cluster-level covariate gender, high school GPA (highGPA), or both. Interpret all aspects of the results.

What happened to the student variance after the cluster-level covariates were added to the model?

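Simulating a mixed model

The following is a simple way to simulate a random intercepts model.

set.seed(1234)  # makes your results reproducible
Ngroups = 100; NperGroup = 3                       # NperGroup = 3 is an assumed base value
N = Ngroups * NperGroup
groups = factor(rep(1:Ngroups, each = NperGroup))  # group label for each observation
u = rnorm(Ngroups, sd = .5)                        # one random effect per group (SD assumed)
e = rnorm(N, sd = .25)                             # residual noise (SD assumed)
x = rnorm(N)
y = 2 + .5 * x + u[groups] + e

Which parts of the above represent fixed effects and which random effects? Now run a mixed model on these data.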
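A sketch of fitting the mixed model to simulated random intercept data. The group size (3) and the standard deviations (.5 for the random effects, .25 for the residuals) are assumptions chosen for illustration; the true fixed effects are an intercept of 2 and a slope of .5.

```r
library(lme4)

# Simulated random intercept data (group size and SDs are assumed values)
set.seed(1234)
Ngroups = 100; NperGroup = 3
N = Ngroups * NperGroup
groups = factor(rep(1:Ngroups, each = NperGroup))
u = rnorm(Ngroups, sd = .5)      # random effects: one intercept shift per group
e = rnorm(N, sd = .25)           # residual noise
x = rnorm(N)
y = 2 + .5 * x + u[groups] + e   # fixed effects: intercept 2, slope .5

d = data.frame(x, y, groups)
sim_mixed = lmer(y ~ x + (1 | groups), data = d)
summary(sim_mixed)   # compare estimates with 2, .5 and the SDs .5 and .25
confint(sim_mixed)
```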

Are the results in line with your expectations?

In what follows, we will change various aspects of the data, rerun the model after each change, and then summarize and get confidence intervals as before. For each change, note at least one difference in the results.

First, calculate the intraclass correlation coefficient.

In addition, create a density plot of the random effects.

  1. Change the random effect variance/SD and/or the residual variance/SD, note your new ICC estimate, and plot the random effects as before.
  2. Reset the values to the originals. Change Ngroups to 50. What differences do you see in the confidence interval estimates?
  3. Reset Ngroups to 100. Now change NperGroup to 10, and again note how the confidence intervals differ from the base condition.
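One way to organize these experiments is sketched below. The helpers fit_sim, icc and ci_width are hypothetical names, and the defaults (group size 3, SDs .5 and .25) are assumed base values. Wald intervals are used for speed; profile intervals from confint() would also work.

```r
library(lme4)

# Hypothetical helper: simulate random intercept data and fit the mixed model
fit_sim = function(Ngroups = 100, NperGroup = 3, sd_u = .5, sd_e = .25, seed = 1234) {
  set.seed(seed)
  N = Ngroups * NperGroup
  groups = factor(rep(1:Ngroups, each = NperGroup))
  u = rnorm(Ngroups, sd = sd_u)   # random effects
  e = rnorm(N, sd = sd_e)         # residual noise
  x = rnorm(N)
  y = 2 + .5 * x + u[groups] + e
  lmer(y ~ x + (1 | groups), data = data.frame(x, y, groups))
}

# ICC from the estimated variance components
icc = function(model) {
  vc = as.data.frame(VarCorr(model))
  vc$vcov[vc$grp != "Residual"] / sum(vc$vcov)
}

# Width of the Wald confidence interval for each fixed effect
ci_width = function(model) {
  ci = confint(model, parm = "beta_", method = "Wald")
  ci[, 2] - ci[, 1]
}

base       = fit_sim()
bigger_u   = fit_sim(sd_u = 1)          # exercise 1: larger random effect SD
few_groups = fit_sim(Ngroups = 50)      # exercise 2
more_per   = fit_sim(NperGroup = 10)    # exercise 3

c(base = icc(base), bigger_u = icc(bigger_u))
rbind(base = ci_width(base), few_groups = ci_width(few_groups), more_per = ci_width(more_per))

# Density of the estimated random effects for the base condition
plot(density(ranef(base)$groups[, "(Intercept)"]), main = "Estimated random effects")
```

A larger random effect SD should push the ICC up, and more observations per group should mainly narrow the interval for the slope on x.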
