1. Ebook collection

A few days ago, I posted an article sharing a GWAS e-book, and it was extremely popular: more than 8,000 readers, and many of the comments asked fairly basic questions. The main feature of this e-book is comparison: the GLM model is compared between dedicated software and R, covering how to add numeric covariates, factor covariates, PCA, and so on, which can fairly be called the basics of model construction.

Today, based on my own understanding plus some material I looked up, I will introduce how covariates are used.

2. What are covariates

In fact, covariates in GWAS are not quite the same thing as covariates in a general statistical model.

General model: y = F1 + F2 + x1 + x2

  • F1 and F2 are factors: categorical variables such as different colors (red, yellow, green)
  • x1 and x2 are covariates: numeric variables such as birth weight or PCA scores

Covariates are numeric-type variables.
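To make the general model concrete, here is a minimal sketch fitting y = F1 + x1 (one factor, one numeric covariate) in R. The data are simulated and the variable names are illustrative, since no real data set is shown at this point:

```r
# General model with a factor (F1) and a numeric covariate (x1).
set.seed(1)
d <- data.frame(
  F1 = factor(rep(c("red", "yellow", "green"), each = 10)),  # factor
  x1 = rnorm(30, mean = 3, sd = 0.5)                         # numeric covariate
)
d$y <- 2 + as.numeric(d$F1) + 0.8 * d$x1 + rnorm(30, sd = 0.1)
mod <- lm(y ~ F1 + x1, data = d)
coef(mod)  # intercept, two factor-level effects, one slope for x1
```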

In the GWAS model: y = x1 + x2

  • In GWAS there are only covariates; so-called factors are also treated as covariates
  • In GWAS analysis software, factors are converted into dummy variables before entering the model
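The conversion mentioned above can be seen directly with base R's model.matrix(), which is a sketch of what GWAS software does internally (the factor and its levels here are made up for illustration):

```r
# A factor becomes 0/1 dummy variables, one column per level.
Rep <- factor(c("r1", "r2", "r3", "r1", "r2"))
model.matrix(~ Rep)           # with intercept: first level is the baseline
M <- model.matrix(~ Rep - 1)  # no intercept: one column per level
M
```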

3. A worked example

Here’s an example:

library(learnasreml)
data(fm)
head(fm)
str(fm)

Here Rep has 5 levels (5 replicates) and is of factor type. In an ANOVA, it is treated as a factor:

# mod anova
mod = aov(dj ~ Rep, data=fm)
summary(mod)
coef(mod)


Here, Rep has 4 degrees of freedom in the ANOVA, and coef() gives the effect of each level. In regression analysis, it is also treated as a factor:

mod2 = lm(dj ~ Rep, data=fm)
summary(mod2)
anova(mod2)

For regression analysis, the lm() function is used: summary() gives the effect at each level along with t-test results, and anova() prints the ANOVA table.

The example above shows that the aov() and lm() functions are equivalent.
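This equivalence can be checked numerically. Since the fm data set from learnasreml may not be installed, the sketch below simulates a Rep factor and a response dj and compares the coefficients of both fits:

```r
# aov() and lm() fit the same underlying linear model.
set.seed(42)
d <- data.frame(Rep = factor(rep(1:5, each = 4)), dj = rnorm(20))
m_aov <- aov(dj ~ Rep, data = d)
m_lm  <- lm(dj ~ Rep, data = d)
all.equal(coef(m_aov), coef(m_lm))  # TRUE: identical coefficients
```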

A factor is equivalent to a covariate

What if we convert Rep into dummy variables and then run a regression on numeric variables?

library(useful)
library(magrittr)  # for the %>% pipe
xx = build.x(~Rep-1, data=fm, contrasts = F)
dat = cbind(xx[,-1], dj = fm$dj) %>% as.data.frame()
head(dat)
str(dat)

The build.x() function converts the factor into dummy variables (numeric variables); we then run the regression:

mod3 = lm(dj ~.,data=dat)
summary(mod3)

The results show that the dummy variables (numeric variables) derived from the factor give the same fit. So they are equivalent.

This also illustrates that in GWAS analysis you may think of factors and numeric variables as two different types, but in the GWAS model they both end up as covariates.
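The factor-versus-dummy equivalence can also be verified without the useful package, using only base R's model.matrix() on simulated data (again because fm may not be available):

```r
# Factor coding and hand-built dummy variables give the same fit.
set.seed(7)
d <- data.frame(Rep = factor(rep(1:5, each = 4)), dj = rnorm(20))
m_factor <- lm(dj ~ Rep, data = d)
X <- model.matrix(~ Rep - 1, data = d)[, -1]  # drop first level (baseline)
m_dummy  <- lm(d$dj ~ X)
all.equal(fitted(m_factor), fitted(m_dummy))  # TRUE: same fitted values
```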

Note:

  • The first level of a factor in R is forced to 0 (treatment contrasts), so the first column is dropped when building the dummy variables
  • R includes an intercept (mu) by default, so when building the dummy variables the intercept is removed

Writing here, I came up with a sentence:

When you see ANOVA and regression as the same thing, you have advanced.

So, I advanced, haha.

So the analysis of variance and linear regression in statistics textbooks are both based on the general linear model (GLM), and this carries over to GWAS analysis to explain the difference between factor covariates, numeric covariates, and PCA covariates.
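As a final sketch of how PCA covariates fit in: principal components are just extra numeric covariates in the GLM. The toy genotype matrix and names below are purely illustrative, not a real GWAS pipeline:

```r
# PCA covariates enter the GLM as ordinary numeric covariates.
set.seed(3)
geno <- matrix(rbinom(200 * 50, size = 2, prob = 0.3), nrow = 200)  # 0/1/2 dosages
pcs  <- prcomp(geno)$x[, 1:3]   # first 3 principal components
y    <- rnorm(200)              # toy phenotype
snp  <- geno[, 1]               # the SNP being tested
mod  <- lm(y ~ snp + pcs)       # y = SNP + PC1 + PC2 + PC3
coef(mod)
```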

Whatever its original type, in the GWAS model it becomes a numeric covariate.

In the next post: how to build covariates in PLINK, and how to build covariates in R. Stay tuned.