Original link:tecdat.cn/?p=19737

Original source: Tuoduan (tecdat) Data Tribe official account

 

 

Stan is a probabilistic programming language for specifying statistical models. Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo sampling.

Stan can be called from R using the rstan package, or from Python using the PyStan package. Both interfaces support sampling-based and optimization-based inference, with diagnostics and posterior analysis.

In this article, the main features of Stan are briefly presented. Two examples are also shown: the first relates to the simple Bernoulli model and the second to the Lotka-Volterra model based on ordinary differential equations.

What is Stan?

 

  • Stan is an imperative probabilistic programming language.
  • A Stan program defines a probability model.
  • It declares data and (constrained) parameter variables.
  • It defines a log posterior.
  • Stan inference: fit the model to the data and make predictions.
  • It can perform full Bayesian inference with Markov chain Monte Carlo (MCMC).
  • It can perform approximate Bayesian inference with variational inference (VB).
  • It can perform penalized maximum likelihood estimation (MLE) via optimization.

What does Stan calculate?

  • A posterior distribution is obtained.
  • MCMC sampling.
  • Draws, each of which has the posterior probability as its marginal distribution.
  • Plots via histograms, kernel density estimates, etc.

Installing rstan

To run Stan in R, you must have a working C++ compiler for rstan. On Windows, Rtools is required.

Finally, install rstan:

install.packages("rstan")
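After installation, it is common to load the package and set two options before fitting models; a short sketch follows (both settings are optional conveniences, not requirements):

library(rstan)

# Run chains in parallel on the available cores (optional)
options(mc.cores = parallel::detectCores())
# Cache compiled models so they are not recompiled on every call (optional)
rstan_options(auto_write = TRUE)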

Basic syntax in Stan

Define the model

The Stan model is defined by six blocks:

  • Data (mandatory).
  • Transformed data.
  • Parameters (mandatory).
  • Transformed parameters.
  • Model (mandatory).
  • Generated quantities.

The data block reads external information.

data {
  int N;
  int x[N];
  int offset;
}

The transformed data block allows data preprocessing.

transformed data {
  int y[N];
  for (n in 1:N)
    y[n] = x[n] - offset;
}

The parameters block defines the sampling space.

parameters {
  real<lower=0> lambda1;
  real<lower=0> lambda2;
}

The transformed parameters block defines transformations of the parameters that are applied before the posterior is computed.

transformed parameters {
  real<lower=0> lambda;
  lambda = lambda1 + lambda2;
}

In the model block, we define the posterior distribution.

model {
  y ~ poisson(lambda);
  lambda1 ~ cauchy(0, 2.5);
  lambda2 ~ cauchy(0, 2.5);
}

Finally, the generated quantities block allows post-processing.

generated quantities {
  int x_predict;
  x_predict = poisson_rng(lambda) + offset;
}
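Put together, the six blocks above form a single Stan program; the sketch below shows how it might be compiled and sampled from R. The file name poisson_offset.stan and the example counts are illustrative assumptions, not part of the original article:

library(rstan)

# Illustrative counts matching the declarations in the data block (assumed values)
x <- c(3, 5, 4, 6, 2, 5)
fit <- stan(file = "poisson_offset.stan",               # assumed file holding the blocks above
            data = list(N = length(x), x = x, offset = 1),
            chains = 4, iter = 2000)
print(fit)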

Types

Stan has two primitive data types, and both can be bounded.

  • int is an integer type.
  • real is a floating-point type.
int<lower=1> N;

real<upper=5> alpha;
real<lower=-1,upper=1> beta;

real gamma;
real<upper=gamma> zeta;

The real type extends to linear algebra types.

vector[10] a;     // matrix[10, 1] b;
row_vector[10] c; // matrix[1, 10] d;

Arrays of integers, real numbers, vectors, and matrices are available.

real a[10];

vector[10] b;

matrix[10, 10] c;

Stan also implements a variety of constraint types.

simplex[5] theta;       // sum(theta) = 1
ordered[5] o;           // o[1] < ... < o[5]
positive_ordered[5] p;
corr_matrix[5] C;
cov_matrix[5] Sigma;    // positive definite

More information about Stan

All of the usual conditional and loop statements are also available.

if/then/else

for (i in 1:I)

while (i < I)

There are two equivalent ways to increment the log posterior.

y ~ normal(0, 1);                 // sampling statement
target += normal_lpdf(y | 0, 1);  // direct increment of the log posterior

Many sampling statements are vectorized.

parameters {
  real mu[N];
  real<lower=0> sigma[N];
}
model {
  // for (n in 1:N)
  //   y[n] ~ normal(mu[n], sigma[n]);
  y ~ normal(mu, sigma);  // vectorized version
}

The Bayesian approach

Probability is epistemic. For example, John Stuart Mill said:

The probability of an event is not a quality of the event itself, but merely the degree to which we, or someone else, expect it to happen. Every event is in itself certain, not probable; if we knew everything about it, we should know positively either that it will happen or that it will not.

For us, probability is the degree to which we expect it to happen.

Probability quantifies uncertainty.

A Bayesian example with Stan: a repeated-trials model

We work through a small example in which the goal is, given a random sample drawn from a Bernoulli distribution, to estimate the posterior distribution of the unknown chance of success θ.

Step 1: Problem definition

In this example, we will consider the following structure:

  • Data:

    • N, the number of trials.
    • y_n ∈ {0, 1}, the result of trial n (known, observed data).
  • Parameter:

    • θ ∈ [0, 1], the chance of success.
  • Prior distribution: θ ~ Uniform(0, 1).

  • Likelihood: y_n ~ Bernoulli(θ).

  • Posterior distribution: p(θ | y) ∝ p(θ) p(y | θ) (a closed form is sketched below).
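Because the Uniform(0, 1) prior is a Beta(1, 1) distribution, which is conjugate to the Bernoulli likelihood, the posterior has a closed form; the following derivation is standard conjugacy algebra added here for reference (it is not part of the original article):

p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta)
                 = \prod_{n=1}^{N} \theta^{y_n} (1-\theta)^{1-y_n}
                 = \theta^{\sum_n y_n} (1-\theta)^{N - \sum_n y_n},
\qquad\text{so}\qquad
\theta \mid y \sim \mathrm{Beta}\!\Bigl(1 + \sum_n y_n,\; 1 + N - \sum_n y_n\Bigr).

With the simulated data below (N = 20, Σ y_n = 5) this gives a posterior mean of 6/22 ≈ 0.27, which matches the MCMC estimate reported in Step 4.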

Step 2: Stan

We create the Stan program that we will call from R.

data {
  int<lower=0> N;                // number of trials
  int<lower=0, upper=1> y[N];    // results of the trials
}
parameters {
  real<lower=0, upper=1> theta;  // chance of success
}
model {
  theta ~ uniform(0, 1);         // prior
  y ~ bernoulli(theta);          // likelihood
}

Step 3: Data

In this case, we will randomly simulate a sample for the example, rather than use a given data set.

N <- 20
y <- rbinom(N, 1, 0.3)
y
##  [1] 0 0 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1

Compute the MLE, which is the sample mean of the data:
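The call that produced this output is not shown; assuming the simulated y above, a one-line way to compute it is:

mean(y)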

## [1] 0.25

Step 4: Bayesian posterior estimation with rstan

The last step is to get our estimate using Stan in R.
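The sampling call itself is not reproduced in the extract; a call consistent with the output below (4 chains, 5,000 iterations each, half of them warmup) might look like the following, where bern.stan is an assumed file name containing the program from Step 2:

fit <- stan(file = "bern.stan",                 # assumed file with the Step 2 program
            data = list(N = N, y = y),          # simulated data from Step 3
            chains = 4, iter = 5000, warmup = 2500)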

## SAMPLING FOR MODEL '6dcfbccbf2f063595ccc9b93f383e221' NOW (CHAIN 1).
## Chain 1: Gradient evaluation took 7e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 1: Iteration: 2500 / 5000 [ 50%]  (Warmup)
## Chain 1: Iteration: 2501 / 5000 [ 50%]  (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
## ...
## SAMPLING FOR MODEL '6dcfbccbf2f063595ccc9b93f383e221' NOW (CHAIN 4).
## Chain 4: Gradient evaluation took 3e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
## Chain 4: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 4: Iteration: 5000 / 5000 [100%]  (Sampling)
## Inference for Stan model: 6dcfbccbf2f063595ccc9b93f383e221.
## 4 chains, each with iter=5000; warmup=2500; thin=1;
## post-warmup draws per chain=2500, total post-warmup draws=10000.
##
##         mean se_mean   sd    10%    90% n_eff Rhat
## theta   0.27    0.00 0.09   0.16   0.39  3821    1
## lp__  -13.40    0.01 0.73 -14.25 -12.90  3998    1
# Extract the posterior draws
theta_draws <- extract(fit)$theta
# Compute the posterior mean
mean(theta_draws)
## [1] 0.2715866
# Compute a posterior interval (10% and 90% quantiles)
quantile(theta_draws, probs = c(0.1, 0.9))
## 0.1569165 0.3934832Copy the code
library(ggplot2)
theta_draws_df <- data.frame(theta = theta_draws)
ggplot(theta_draws_df, aes(x = theta)) +
  geom_histogram(bins = 20, color = "gray")

 

RStan: MAP, MLE

Stan can also compute point estimates by optimization. There are two variants:

  • Maximum a posteriori (MAP) estimation.
  • Maximum likelihood estimation (MLE).
optimizing(model, data=c("N", "y"))
## $par
## theta 
##   0.4 
## 
## $value
## [1] -3.4
## 
## $return_code
## [1] 0
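How the model object passed to optimizing() was built is not shown; one plausible sketch, assuming the Bernoulli program from Step 2 is stored in a string named bern_code (an assumed variable name), is:

# Compile the Stan program once; optimizing() then maximizes its (penalized) log posterior
model <- stan_model(model_code = bern_code)
optimizing(model, data = c("N", "y"))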

Population competition: the Lotka-Volterra model

  • Lotka (1925) and Volterra (1926) developed parametric differential equations to describe competing populations of predators and prey.
  • Full Bayesian inference can be used to estimate future (or past) population sizes.
  • Stan is used to encode the statistical model and perform full Bayesian inference, solving the inverse problem of inferring parameters from noisy data.

 

In this example, we fit the model to the populations of a Canadian feline predator (the lynx) and its hare prey between 1900 and 1920, based on the number of pelts collected each year by the Hudson's Bay Company.

Mathematical model

We write u(t) and v(t) for the prey and predator populations, respectively. The differential equations relating them are

du/dt = (α − β v) u
dv/dt = (−γ + δ u) v

where:

  • α : prey growth rate.
  • β : Rate of prey reduction caused by predation.
  • γ : Natural rate of predator decline.
  • δ : The rate of increase of predators from predation.

The Lotka-Volterra model in Stan

real[] dz_dt(data real t,       // time
             real[] z,          // system state {prey, predator}
             real[] theta,      // parameters
             data real[] x_r,   // unused real data
             data int[] x_i) {  // unused integer data
  real u = z[1];                // prey population
  real v = z[2];                // predator population

  real du_dt = (theta[1] - theta[2] * v) * u;
  real dv_dt = (-theta[3] + theta[4] * u) * v;
  return { du_dt, dv_dt };
}

The observed (known) variables are:

  • y_{n,k}: the number of species k at time t_n.

Unknown variables (which must be inferred):

  • Initial state: z_k^init, the initial population of species k.
  • Subsequent states: z_k(t), the population of species k at time t.
  • Parameters: α, β, γ, δ > 0.

We assume the measurement errors are proportional (rather than additive):

y_{n,k} ~ LogNormal(log ẑ_{n,k}, σ_k)

Equivalently:

log y_{n,k} = log ẑ_{n,k} + ε_{n,k}, with ε_{n,k} ~ Normal(0, σ_k),

where ẑ_{n,k} is the population predicted by the ODE solution.

Build a model

Variables for the known constants and observed data:

data {
  int<lower = 0> N;         // number of measurement times
  real ts[N];               // measurement times
  real y0[2];               // initial measured populations
  real<lower = 0> y[N, 2];  // subsequent measured populations
}

Variables for the unknown parameters:

parameters {
  real<lower=0> theta[4];  // alpha, beta, gamma, delta
  real<lower=0> z0[2];     // initial populations
  real<lower=0> sigma[2];  // prediction error
}

Prior distributions and likelihood:

model {
  // priors
  sigma ~ lognormal(0, 0.5);
  theta[{1, 3}] ~ normal(1, 0.5);
  // likelihood (lognormal)
  for (k in 1:2) {
    y0[k] ~ lognormal(log(z0[k]), sigma[k]);
    y[ , k] ~ lognormal(log(z[ , k]), sigma[k]);
  }
}

To compute the predicted populations, we must provide (see the sketch after this list):

  • Initial populations (z0).
  • Initial time (0.0) and observation times (ts).
  • Parameters (theta).
  • Maximum number of steps (1e3).
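The corresponding transformed parameters block is not reproduced in the article; a sketch using Stan's built-in RK45 solver is given below. The two tolerance values (1e-6 and 1e-5) are illustrative assumptions; only the maximum number of steps (1e3) is stated above:

transformed parameters {
  // Solve the ODE for the populations z at the observation times ts,
  // starting from z0 at time 0.0, with parameters theta
  real z[N, 2]
    = integrate_ode_rk45(dz_dt, z0, 0.0, ts, theta,
                         rep_array(0.0, 0), rep_array(0, 0),  // no real/int data
                         1e-6, 1e-5, 1e3);                    // rel. tol., abs. tol., max steps
}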

Lotka-Volterra parameter estimation

print(fit, c("theta", "sigma"), probs = c(0.1, 0.5, 0.9))

Get results:

##           mean se_mean   sd  10%  50%  90% n_eff Rhat
## theta[1]  0.55    0.00 0.07 0.46 0.54 0.64  1168    1
## theta[2]  0.03    0.00 0.00 0.02 0.03               1
## theta[3]  0.80    0.00 0.10 0.68 0.80 0.94  1117    1
## theta[4]  0.02    0.00 0.00 0.02 0.02 0.03  1230    1
## sigma[1]  0.29    ...

Interpreting the results:

  • Rhat close to 1 indicates convergence; n_eff is the effective sample size.
  • 10% and 90% are posterior quantiles; for example, P(α < 0.64 | y) ≈ 0.9.
  • The posterior mean is a Bayesian point estimate: α = 0.55.
  • The MCMC standard error of the posterior mean estimate is 0.00 (to two decimal places).
  • The posterior standard deviation of α is 0.07.

Most popular insights

1. Matlab uses Bayesian optimization for deep learning

2. Matlab Bayesian hidden Markov HMM model implementation

3. Simple Bayesian linear regression simulation of Gibbs sampling in R language

4. Block Gibbs Sampling Bayesian multiple linear regression in R language

5. Bayesian model of MCMC sampling for Stan probabilistic programming in R language

6.Python implements Bayesian linear regression model with PyMC3

7.R language uses Bayesian hierarchical model for spatial data analysis

8.R language random search variable selection SSVS estimation Bayesian vector autoregression (BVAR) model
