Original link: tecdat.cn/?p=3234

Original source: Tuoduan Data Tribe WeChat public account

 

Probabilistic programming allows us to implement statistical models without worrying about technical details. It is particularly useful for Bayesian models based on MCMC sampling.

Video: An example of RStan Bayesian hierarchical model analysis in R


An overview of Stan

Stan is a C++ library for Bayesian inference. It is based on the No-U-Turn sampler (NUTS), which is used to draw samples from the posterior distribution of a user-specified model given data. An analysis with Stan involves the following steps:

  1. Specify the statistical model in the Stan modeling language. This is done in a dedicated .stan file.
  2. Prepare the data to be supplied to the model.
  3. Sample from the posterior distribution using the stan function.
  4. Analyze the results.
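In R, the four steps above can be sketched roughly as follows (the file name and data list here are illustrative placeholders, not the examples used below):

```r
library(rstan)

# Step 1: the model is specified in a dedicated file, e.g. "model.stan"
# Step 2: prepare the data as a named list matching the model's data block
model_data <- list(n = 3, y = c(1.2, 0.5, 2.1))

# Step 3: sample from the posterior distribution
fit <- stan(file = "model.stan", data = model_data)

# Step 4: analyze the results
print(fit)
```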

In this article, I’ll demonstrate the use of Stan with two models. I’ll use the first model to discuss the basic capabilities of Stan, and the second example to demonstrate a more advanced application.

School data set

The first data set we’re going to use is the schools data set. It measures the effect of short-term coaching programs on scores on the Scholastic Aptitude Test (SAT), a college entrance exam used in the United States. The data set is as follows:

As we can see, the short-term coaching program did improve SAT scores in most of the eight schools. For this data set, we are interested in estimating the true effect of the coaching program for each school. Let’s consider two alternatives. First, we could assume that all schools are independent of each other. However, the results would be hard to interpret, because the posterior intervals of the schools would overlap to a large extent due to the high standard errors. Second, we could pool the data from all schools, assuming that the true effect is the same everywhere. This, however, is also unreasonable, because the program may affect schools differently (for example, different teachers and students suggest different effects).

Therefore, another model is needed. The hierarchical model has the advantage of combining information from all eight schools without assuming a common true effect. We can specify a hierarchical Bayesian model in the following way:
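Written out (a reconstruction from the description that follows; the original equations are not reproduced here), the model is:

```latex
y_j \sim \mathcal{N}(\theta_j, \sigma_j^2), \qquad j = 1, \ldots, 8
\theta_j \sim \mathcal{N}(\mu, \tau^2)
```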

According to this model, the observed coaching effects follow a normal distribution whose mean is the true effect θj and whose standard deviation is σj (taken from the data). The true effect θj in turn follows a normal distribution with parameters μ and τ.

Define Stan model files

Having specified the model to use, we can now discuss how to specify this model in Stan. Before defining the Stan program for the above model, let’s look at the structure of the Stan modeling language.

Variables

In Stan, variables can be defined in the following ways:

int<lower=0> n;
int<upper=5> n;
int<lower=0,upper=5> n; // n is in the range [0,5]

Note that if the bounds of a variable are known a priori, they should be specified.

The dimensions of variables can be specified with square brackets:

vector[n] numbers;  // a vector of length n
real values[n];     // an array of n reals
matrix[n,n] mat;    // an n-by-n matrix

Program blocks

A Stan program is made up of the following blocks:

  • data: Used to specify the data that is conditioned on in Bayes’ rule
  • transformed data: Used to preprocess the data
  • parameters (required): Used to specify the model parameters
  • transformed parameters: Used to compute transformations of the parameters before the posterior is evaluated
  • model (required): Used to specify the model
  • generated quantities: Used to post-process the results

In the model block, distributions can be specified in two equivalent ways. The first uses statistical notation:

y ~ normal(mu, sigma); // y is normally distributed

The second approach uses a programmatic notation based on the log probability density function (lpdf):

target += normal_lpdf(y | mu, sigma); // increment the log density by the normal lpdf

Stan supports a large number of probability distributions. The lookup function comes in handy when specifying a model in Stan: it provides a mapping from R functions to Stan functions. Consider the following example:

library(rstan)
lookup(rnorm)
##     StanFunction             Arguments ReturnType Page
## 355   normal_rng (real mu, real sigma)       real  494

Here we see that rnorm in R corresponds to normal_rng in Stan.

Model

Now that we know the basics of the Stan modeling language, we can define the model and store it in a file called schools.stan:
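The original file is not reproduced here, but a sketch of schools.stan that is consistent with the description below (η modeled directly, θ constructed in the transformed parameters block) and with the data list used later would look like this:

```stan
// Sketch of schools.stan, reconstructed from the surrounding text
data {
  int<lower=0> n;          // number of schools
  real y[n];               // estimated coaching effects
  real<lower=0> sigma[n];  // standard errors of the effects
}
parameters {
  real mu;                 // population mean effect
  real<lower=0> tau;       // population standard deviation
  vector[n] eta;           // standardized school-level effects
}
transformed parameters {
  vector[n] theta;         // true school-level effects
  theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
```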

Note that θ never appears in the parameters block. This is because we do not model θ directly; instead, we model η (the standardized effect for each school) and construct θ from μ, τ, and η in the transformed parameters block. This parameterization makes the sampler more efficient.

 

Prepare data for modeling

Before fitting the model, we need to encode the input data as a list whose elements correspond to the entries of the data block of the Stan model. For the school data, this looks as follows:

schools.data <- list(
  n = 8,
  y = c(28,  8, -3,  7, -1,  1, 18, 12),
  sigma = c(15, 10, 16, 11,  9, 11, 10, 18)
)

Sampling from a posterior distribution

We can sample from the posterior distribution using the stan function, which performs the following three steps:

  1. It translates the model specification to C++ code.
  2. It compiles the C++ code to a shared object.
  3. It samples from the posterior distribution based on the specified model, data, and settings.

If rstan_options(auto_write = TRUE) is set, subsequent calls to the same model will be much faster than the first call, because the stan function then skips the first two steps (translating and compiling the model). In addition, we will set the number of cores to use:

options(mc.cores = parallel::detectCores())  # parallelize
rstan_options(auto_write = TRUE)             # store compiled Stan models

Now we can compile the model and sample from the posterior.
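The call itself might look as follows (the settings shown are rstan's defaults, which match the output further below; the exact arguments used originally are an assumption on my part):

```r
library(rstan)
# Sample from the posterior of the schools model; 4 chains of 2000
# iterations each (half used as warmup) are rstan's defaults
fit1 <- stan(
  file = "schools.stan",  # the model file defined above
  data = schools.data,    # the data list defined above
  chains = 4,
  iter = 2000
)
```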

The fitted model

We will begin with a basic interpretation of the model and then examine the MCMC procedure.

Basic model interpretation

To perform inferences using the fit model, we can use the print function.

print(fit1)  # optional arguments: pars, probs
## Inference for Stan model: schools.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
##           mean se_mean   sd   2.5%  25%   50%   75% 97.5% n_eff Rhat
## mu        7.67    0.15 5.14  -2.69 4.42  7.83 10.93 17.87  1185    1
## tau       6.54    0.16 5.40   0.31 2.52  5.28  9.05 20.30  1157    1
## theta[2]  7.92    0.10 6.25  -4.75 4.04  8.03 11.83 20.05  4000    1
## theta[3]  6.22    0.14 7.83 -11.41 2.03  6.64 10.80 20.97  3043    1
## theta[4]  7.58    0.10 6.54  -5.93 3.54  7.60 11.66 20.90  4000    1
## theta[5]  5.14    0.10 6.30  -8.68 1.40  5.63  9.50 16.12  4000    1
## theta[6]  6.08    0.10 6.62  -8.06 2.21  6.45 10.35 18.53  4000    1
## theta[7] 10.60    0.11 6.70  -0.94 6.15 10.01 14.48 25.75  4000    1
## theta[8]  8.19    0.14 8.18  -8.13 3.59  8.01 12.48 25.84  3361    1
##
## Samples were drawn using NUTS(diag_e) at Thu Nov 29 11:17:50 2018.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).

Here, the row names denote the estimated parameters: mu is the mean of the posterior distribution and tau its standard deviation. The entries for eta and theta represent the estimates for the vectors η and θ, respectively. The columns contain the computed values; the percentiles delimit the credible intervals. For example, the 95% credible interval for the overall effect μ of the coaching program is [-1.27, 18.26]. Since we are uncertain about the mean, the 95% credible intervals for the θj are also wide. For the first school, for example, the 95% credible interval is [−2.19, 32.33].

We can use the following plot function to visualize the uncertainty in the estimate:

The black lines indicate the 95% intervals, the red lines the 80% intervals. The circles mark the mean estimates.

We can use the following extract function to get the generated sample:

samples <- extract(fit1, permuted = TRUE)  # 1000 samples per parameter ('samples' as object name is an assumption)

MCMC diagnostics

By plotting the trajectory of the sampling process, we can determine if something went wrong during the sampling. For example, a chain that stays in one position too long or takes too many steps in one direction can cause problems. We can use the traceplot function to plot the trajectories of the four chains used in the model:

# Diagnostics: plot the sampler trajectories
traceplot(fit1)

 

To get samples from each Markov chain, we can use the extract function again:

## parameters
## chains        mu       tau     eta[1]     eta[2]
##   chain:1  1.111120  2.729124 -0.1581242 -1.9874554
##   chain:2  3.633421  2.588945  1.2058772 -1.1173221
##   chain:3 13.793056  3.144159  0.6023924 -1.1188243
##   chain:4  3.673380 13.889267 -0.0869434  1.1900236

For a more advanced analysis of the sampling process, we can use the shinystan package. With it, you can launch a Shiny application to analyze the fitted model as follows:

library(shinystan)
launch_shinystan(fit1)

Hierarchical regression

Now that we have a basic understanding of Stan, we can dive into a more advanced application: hierarchical regression. In ordinary regression, we model relationships of the following form:
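The formula is not reproduced here; a standard linear model of the intended form (my reconstruction) is:

```latex
y = \alpha + X \beta + \epsilon
```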

This representation assumes that all samples come from the same distribution. If the samples in fact belong to different groups, we have a problem, because this model ignores potential differences within and between the groups.

Another option is to create a regression model for each group. However, in this case, small sample sizes can be problematic when estimating a single model.

Hierarchical regression is a compromise between these two extremes. The model assumes that the groups are similar, but allows them to differ.

Let’s say each sample belongs to one of K groups. Hierarchical regression is then specified as follows:
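In formulas (a reconstruction consistent with the description below, with the group-level distributions sketched as normals):

```latex
y_k = \alpha_k + X_k \beta^{(k)} + \epsilon_k, \qquad k = 1, \ldots, K
\alpha_k \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2), \qquad
\beta^{(k)} \sim \mathcal{N}(\mu_\beta, \sigma_\beta^2)
```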

where Yk is the outcome for the k-th group, αk is the intercept, Xk are the features, and β(k) denotes the weights. The hierarchical model differs from fitting a separate model to each group in that the parameters αk and β(k) are assumed to come from a common distribution.

The data set

A classic example of hierarchical regression is the rats data set. The data set contains measurements of rat body weight over a 5-week period. Let’s load the data:

##   day8 day15 day22 day29 day36
## 1  151   199   246   283   320
## 2  145   199   249   293   354
## 3  147   214   263   312   328
## 4  155   200   237   272   297
## 5  135   188   230   280   323
## 6

Let’s investigate the data:


library(reshape2)
library(ggplot2)
# Reshape to long format; adding a Group identifier per rat is a
# reconstruction of the missing preprocessing step
ddf <- melt(cbind(df, Group = factor(seq_len(nrow(df)))), id.vars = "Group")
ggplot(ddf, aes(x = variable, y = value, group = Group)) + geom_line() + geom_point()

The data show a very similar linear growth trend across rats. However, we also see that the rats differ in their initial weights, requiring different intercepts, and in their growth rates, requiring different slopes. A hierarchical model therefore seems appropriate.

Specification of the hierarchical regression model

The model can be specified as follows:

The intercept of the i-th rat is denoted by αi and its slope by βi. Note that the measurement times are centered at x̄ = 22, the median of the measurement days (day 22).

Now we can specify the model and store it in a file named rats.stan:
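The original file is not reproduced here; a sketch of rats.stan, based on the classic rats example from the Stan manual and consistent with the data list prepared below, would look like this:

```stan
// Sketch of rats.stan; the priors shown are assumptions
data {
  int<lower=0> N;   // number of rats
  int<lower=0> T;   // number of measurement days
  real x[T];        // measurement days
  real y[N, T];     // weights
  real xbar;        // median measurement day (22)
}
parameters {
  real alpha[N];    // per-rat intercepts
  real beta[N];     // per-rat slopes
  real mu_alpha;    // common mean of the intercepts
  real mu_beta;     // common mean of the slopes
  real<lower=0> sigmasq_y;
  real<lower=0> sigmasq_alpha;
  real<lower=0> sigmasq_beta;
}
transformed parameters {
  real<lower=0> sigma_y;      // standard deviations derived
  real<lower=0> sigma_alpha;  // from the estimated variances
  real<lower=0> sigma_beta;
  sigma_y = sqrt(sigmasq_y);
  sigma_alpha = sqrt(sigmasq_alpha);
  sigma_beta = sqrt(sigmasq_beta);
}
model {
  mu_alpha ~ normal(0, 100);
  mu_beta ~ normal(0, 100);
  sigmasq_y ~ inv_gamma(0.001, 0.001);
  sigmasq_alpha ~ inv_gamma(0.001, 0.001);
  sigmasq_beta ~ inv_gamma(0.001, 0.001);
  alpha ~ normal(mu_alpha, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  for (n in 1:N)
    for (t in 1:T)
      y[n, t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y);
}
```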

Note that the model code estimates variances (the sigmasq variables) rather than standard deviations.

Data preparation

To prepare the data for the model, we first extract the measurement days as numeric values and then encode everything as a list structure:


days <- as.numeric(sub("day", "", colnames(df)))  # e.g. "day8" -> 8 (reconstructed step)
data <- list(N = nrow(df), T = ncol(df), x = days,
             y = df, xbar = median(days))

Fit regression model

We can now fit the Bayesian hierarchical regression model to the rat body weight data set:

# Fit the model; it contains estimates of the intercepts (alpha) and
# slopes (beta). The object name fit2 is assumed.
fit2 <- stan(file = "rats.stan", data = data)

Predictions from the hierarchical regression model

After determining αi and βi for each rat, we can now estimate the weight of an individual rat at any point in time. Here, we predict the weight of the rats from day 0 to day 100.
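Since each rat's weight is linear in the centered day, the predictions can be computed directly from the posterior estimates of αi and βi. A sketch (the helper function and the example values are mine, not the original code):

```r
# Predicted weight of a rat given its intercept alpha (weight at the
# center day xbar) and slope beta (growth per day); xbar = 22 as in the model
predict_weight <- function(alpha, beta, day, xbar = 22) {
  alpha + beta * (day - xbar)
}

# Hypothetical posterior means for one rat: 240 g at day 22, growing 6 g/day
predict_weight(240, 6, day = 22)  # 240 (the intercept, at the center)
predict_weight(240, 6, day = 36)  # 324
```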

 


ggplot(pred.df[pred.df$Rat %in% sel.rats, ],
       aes(x = Day, y = Weight, group = Rat)) +
  geom_line()

Compared to the original data, the model’s estimates are smooth, because every curve follows a linear model. Looking at the intervals shown in the last plot, we can see that the variance estimates are reasonable: we are confident about the weight of the rats during the sampling period (days 8 to 36), and the uncertainty grows as we move away from it.

