Original link: tecdat.cn/?p=9368

Original source: Tuoduan Data Tribe (tecdat) WeChat official account

 


 

Since Sims (1980) published his seminal paper, vector autoregressive (VAR) models have become a key tool in macroeconomic research. This article introduces the basic concepts of VAR analysis and walks through the estimation of a simple model.

Univariate autoregression

VAR stands for vector autoregression. To understand what this means, let us first look at a simple univariate (i.e. with only one dependent or endogenous variable) autoregressive (AR) model of the form $y_t = a_1 y_{t-1} + e_t$.
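For reference, the general AR(p) case that later sections allude to can be written as below. This is the standard textbook generalisation rather than something spelled out in the text, and the coefficient names $a_2, \dots, a_p$ are assumptions that simply extend the notation of the first-order model.

```latex
y_t = a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_p y_{t-p} + e_t
```

With $p = 1$ this reduces to the first-order model shown above.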

Stationarity

Before estimating such models, always check that the time series being analysed are stationary, that is, that their mean and variance are constant over time and that they show no trending behaviour.

There are a number of statistical tests, such as the Dickey-Fuller, KPSS or Phillips-Perron test, to check whether a series is stationary. Another very common practice is to plot the series and check whether it moves around a constant mean (i.e. a horizontal line). If it does, it is probably stationary.
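As a rough illustration of such a check, the sketch below applies the Phillips-Perron test from base R (`PP.test` in the `stats` package) to two simulated series. The series, the seed, the sample size and the AR coefficient of 0.5 are all assumptions made for this example and are not part of the original analysis.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 200      # illustrative sample size

# Stationary AR(1) series with an assumed coefficient of 0.5
stationary <- arima.sim(model = list(ar = 0.5), n = n)

# Random walk: the cumulative sum of white noise, a textbook non-stationary series
random_walk <- cumsum(rnorm(n))

# Phillips-Perron test; the null hypothesis is a unit root (non-stationarity)
pp_stationary  <- PP.test(stationary)
pp_random_walk <- PP.test(random_walk)

print(pp_stationary$p.value)   # a small p-value speaks against a unit root
print(pp_random_walk$p.value)
```

A small p-value for the first series rejects the unit-root null, which is what one would expect for a stationary process. For the random walk the test will typically fail to reject.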

Autoregressive distributed lag model

As in an AR(p) model, regressing a macroeconomic variable only on its own lags can be quite restrictive. Often it is more appropriate to assume that other factors are at work as well. This idea is implemented by models that contain lagged values of the dependent variable as well as contemporaneous and lagged values of other (i.e. exogenous) variables. Again, these exogenous variables should be stationary. For an endogenous variable $y_t$ and an exogenous variable $x_t$, such an autoregressive distributed lag (ADL) model can be written as

 

$$y_t = a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1} + e_t.$$

 

The forecasting performance of such an ADL model is likely to be better than that of a simple AR model. But what if the exogenous variable also depends on lagged values of the endogenous variable? This would mean that $x_t$ is endogenous too, and that there is further room to improve our forecasts.
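An ADL model of this kind can be estimated by ordinary least squares. The following is a minimal sketch on simulated data: the coefficient values (0.5 on the lagged dependent variable, 0.8 and -0.4 on the current and lagged exogenous variable), the seed and the sample size are assumptions chosen purely for illustration.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 500      # illustrative sample size

# Stationary exogenous variable (AR(1) with an assumed coefficient of 0.3)
x <- as.numeric(arima.sim(model = list(ar = 0.3), n = n))
e <- rnorm(n, sd = 0.5)

# Simulate y from an ADL model with assumed coefficients
y <- numeric(n)
for (i in 2:n) {
  y[i] <- 0.5 * y[i - 1] + 0.8 * x[i] - 0.4 * x[i - 1] + e[i]
}

# Line up current and lagged values; the first observation is lost to the lag
adl_data <- data.frame(y = y[-1], y_lag = y[-n], x = x[-1], x_lag = x[-n])

# OLS without intercept, matching the model without a constant
fit_adl <- lm(y ~ 0 + y_lag + x + x_lag, data = adl_data)
print(round(coef(fit_adl), 2))
```

The estimates should land near the assumed values, which is a quick sanity check that the lag alignment is correct.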

Vector autoregressive model

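A VAR model treats all of its variables as endogenous: each variable is explained by its own lags and by the lags of every other variable in the system. A VAR model can therefore be written as a set of individual ADL-type equations, one per variable. In fact, a VAR model can be estimated by estimating each equation separately.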

The error covariance matrix of a standard VAR model is symmetric, i.e. the elements to the upper right of the diagonal (the "upper triangle") mirror the elements to the lower left of the diagonal (the "lower triangle"). This reflects the idea that the relationships between the endogenous variables only capture correlations and do not allow causal statements, since the effects are the same in each direction.
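The two points above, equation-by-equation estimation and the symmetric residual covariance matrix, can be sketched in base R. The example simulates its own bivariate VAR(2) so that it runs on its own; the coefficient matrices mirror the ones used for the sample in this article, while the sample size, burn-in length and helper names (`lhs`, `rhs`, `fits`) are assumptions for illustration. This is not the estimation routine used for the reported results.

```r
set.seed(123)                  # arbitrary seed, for reproducibility only
n <- 500; k <- 2; p <- 2       # illustrative sample size, variables, lag order
burn <- 50                     # assumed burn-in length

A1 <- matrix(c(-0.3, 0.6, -0.4, 0.5), k)    # lag-1 coefficients
A2 <- matrix(c(-0.1, -0.2, 0.1, 0.05), k)   # lag-2 coefficients

# Simulate the process row by row and drop the burn-in
y <- matrix(0, n + burn, k)
for (i in 3:nrow(y)) {
  y[i, ] <- A1 %*% y[i - 1, ] + A2 %*% y[i - 2, ] + rnorm(k, 0, 0.5)
}
y <- y[-(1:burn), ]

# embed() places y_t, y_{t-1} and y_{t-2} side by side
Z   <- embed(y, p + 1)
lhs <- Z[, 1:k]                # current values
rhs <- Z[, -(1:k)]             # lagged values
colnames(rhs) <- c("y1.l1", "y2.l1", "y1.l2", "y2.l2")

# One OLS regression per equation, without intercept
fits  <- lapply(1:k, function(j) lm(lhs[, j] ~ 0 + rhs))
A_hat <- t(sapply(fits, coef))                  # row j holds equation j

# Residual covariance matrix with a degrees-of-freedom adjustment
U         <- sapply(fits, resid)
Sigma_hat <- crossprod(U) / (nrow(U) - k * p)

print(round(A_hat, 2))
print(round(Sigma_hat, 3))
```

Each row of `A_hat` can be compared with the corresponding row of `cbind(A1, A2)`, and `Sigma_hat` is symmetric by construction.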

Contemporaneous causality, or more precisely the structural relationships between the variables, is analysed in the context of so-called structural VAR (SVAR) models, which impose restrictions on the covariance matrix.

In this article, I consider a VAR(2) process.
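The equation for this process did not survive in the text, so the following is a reconstruction from the simulation code and the printed true coefficient matrix in this article, not a quotation. The symbols $A_1$, $A_2$ and $\Sigma$ are assumed names, and $\Sigma$ follows from the errors being drawn independently with standard deviation 0.5.

```latex
y_t = A_1 y_{t-1} + A_2 y_{t-2} + e_t, \qquad e_t \sim N(0, \Sigma),

A_1 = \begin{pmatrix} -0.3 & -0.4 \\ 0.6 & 0.5 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} -0.1 & 0.1 \\ -0.2 & 0.05 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.25 \end{pmatrix}
```

Here $y_t$ is the vector of the two endogenous variables at time $t$.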

The artificial sample for this example is generated in R:

set.seed(123) # Reset the random number generator for reproducibility

# Generate sample
t <- 200 # Number of time series observations
k <- 2 # Number of endogenous variables
p <- 2 # Lag order

# Generate coefficient matrices
A.1 <- matrix(c(-.3, .6, -.4, .5), k) # Coefficient matrix of lag 1
A.2 <- matrix(c(-.1, -.2, .1, .05), k) # Coefficient matrix of lag 2
A <- cbind(A.1, A.2) # Combined coefficient matrix

# Generate series
series <- matrix(0, k, t + 2*p) # Raw series filled with zeros
for (i in (p + 1):(t + 2*p)){ # Errors are drawn with mean 0 and standard deviation 0.5
  series[, i] <- A.1%*%series[, i-1] + A.2%*%series[, i-2] + rnorm(k, 0, .5)
}

series <- ts(t(series[, -(1:p)])) # Convert to time series format
names <- c("V1", "V2") # Variable names (stored only; the series keep their default column names)

plot.ts(series) # Plot the series

 

Estimation

Estimating the parameters and the covariance matrix of a simple VAR model is straightforward.

To estimate a VAR model, load the vars package and specify the data (y) and the model type.

Model comparison

A central problem in VAR analysis is finding the lag order that yields the best results. Model comparison is usually based on information criteria such as the AIC, BIC or HQ. The AIC is usually preferred over the other criteria because of its favourable small-sample properties. The BIC and HQ, however, work well in large samples.
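For orientation, one common textbook form of these criteria for a VAR without deterministic terms is given below. The notation is an assumption, not taken from this article: $m$ is the candidate lag order, $K$ the number of endogenous variables, $T$ the sample size and $\tilde{\Sigma}_u(m)$ the maximum-likelihood estimate of the residual covariance matrix. Software implementations may differ in detail.

```latex
\mathrm{AIC}(m) = \ln \det \tilde{\Sigma}_u(m) + \frac{2}{T}\, m K^2

\mathrm{HQ}(m) = \ln \det \tilde{\Sigma}_u(m) + \frac{2 \ln \ln T}{T}\, m K^2

\mathrm{BIC}(m) = \ln \det \tilde{\Sigma}_u(m) + \frac{\ln T}{T}\, m K^2
```

In each case the order with the smallest criterion value is chosen. The penalty term grows fastest for the BIC, so it tends to select the most parsimonious model.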

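The standard information criteria can be computed to find the best model. In this example, we use the AIC; the call is the one echoed in the summary output:

library(vars) # Package that provides VAR()
var.aic <- VAR(series, type = "none", lag.max = 5, ic = "AIC") # Select the lag order by AIC, up to 5 lags

Looking at the summary, we can see that the AIC suggests an order of 2.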

summary(var.aic)
## 
## VAR Estimation Results:
## =========================
## Endogenous variables: Series.1, Series.2
## Deterministic variables: none
## Sample size: 200
## Log Likelihood: -266.065
## Roots of the characteristic polynomial:
## 0.6611 0.4473 0.03778
## Call:
## VAR(y = series, type = "none", lag.max = 5, ic = "AIC")
## 
## 
## Estimation results for equation Series.1:
## =========================================
## Series.1 = Series.1.l1 + Series.2.l1 + Series.1.l2 + Series.2.l2
## 
##             Estimate Std. Error t value Pr(>|t|)
## Series.1.l1 -0.19750    0.06894  -2.865  0.00463 **
## Series.2.l1 -0.32015    0.06601  -4.850 2.51e-06 ***
## Series.1.l2 -0.23210    0.07586  -3.060  0.00252 **
## Series.2.l2  0.04687    0.06478   0.724  0.47018
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.4638 on 196 degrees of freedom
## Multiple R-squared: 0.2791, Adjusted R-squared: 0.2644
## F-statistic: 18.97 on 4 and 196 DF, p-value: 3.351e-13
## 
## 
## Estimation results for equation Series.2:
## =========================================
## Series.2 = Series.1.l1 + Series.2.l1 + Series.1.l2 + Series.2.l2
## 
##             Estimate Std. Error t value Pr(>|t|)
## Series.1.l1  0.67381    0.07314   9.213  < 2e-16 ***
## Series.2.l1  0.34136    0.07004   4.874 2.25e-06 ***
## Series.1.l2 -0.18430    0.08048  -2.290   0.0231 *
## Series.2.l2  0.06903    0.06873   1.004   0.3164
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.4921 on 196 degrees of freedom
## Multiple R-squared: 0.3574, Adjusted R-squared: 0.3443
## F-statistic: 27.26 on 4 and 196 DF, p-value: < 2.2e-16
## 
## 
## 
## Covariance matrix of residuals:
## [matrix values missing in the source]
## 
## Correlation matrix of residuals:
##          Series.1 Series.2
## Series.1    1.000   -0.137
## Series.2   -0.137    1.000

Looking closely at the results, we can compare the true values with the model’s parameter estimates:

# True values
A
##      [,1] [,2] [,3] [,4]
## [1,] -0.3 -0.4 -0.1 0.10
## [2,]  0.6  0.5 -0.2 0.05

# Extract coefficients, standard errors etc. from the object
# produced by the VAR function
est_coefs <- coef(var.aic)

# Keep only the estimates of both equations and combine them into a matrix
est_coefs <- rbind(est_coefs[[1]][, 1], est_coefs[[2]][, 1])

# Print the rounded estimates
round(est_coefs, 2)
##      Series.1.l1 Series.2.l1 Series.1.l2 Series.2.l2
## [1,]       -0.20       -0.32       -0.23        0.05
## [2,]        0.67        0.34       -0.18        0.07

All estimates have correct signs and are relatively close to their true values.

Impulse response

Once we have settled on a final VAR model, its estimated parameter values have to be interpreted. Because all variables in a VAR model depend on each other, individual parameter values provide only limited information. To get a better picture of the model's dynamic behaviour, impulse responses (IR) are used. The trajectory of the response variable can be plotted, which produces the wavy curves found in many macroeconomic papers.

In the example below, we want to know how series 2 behaves after a shock to series 1. After specifying the model and the variables for which we want an impulse response, we set the time horizon n.ahead to 20. The plot shows the response of series 2.

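# Calculate the impulse response (assumes var.aic, the model summarised above)
ir.1 <- irf(var.aic, impulse = "Series.1", response = "Series.2", n.ahead = 20, ortho = FALSE)

# Plot the impulse response
plot(ir.1)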

 

Note that the ortho option is important because it says something about the contemporaneous relationships between the variables. In our example, we already know that no such relationships exist, because the true variance-covariance matrix (or simply covariance matrix) is diagonal, with zeros in the off-diagonal elements. However, because the limited time series with 200 observations restricts the precision of the parameter estimates, the off-diagonal elements of the estimated covariance matrix are non-zero (the residual correlation reported in the summary is -0.137), which implies a non-zero contemporaneous effect. To rule this out in the IR, we set ortho = FALSE. As a result, the impulse response starts at zero in period 0. You can also try the alternative and set ortho = TRUE, in which case the plot starts below zero.

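You can also calculate and plot the cumulative impulse response function to get an idea of the overall long-run effect of the shock:

# Calculate the cumulative impulse response (assumes var.aic, the model summarised above)
ir.2 <- irf(var.aic, impulse = "Series.1", response = "Series.2", n.ahead = 20, ortho = FALSE, cumulative = TRUE)

# Plot the cumulative impulse response
plot(ir.2)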

 

We see that although the response of series 2 to a shock in series 1 is negative in some periods, the overall effect is significantly positive.
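To see where such curves come from, the non-orthogonalised impulse responses of a VAR(2) can be computed by hand with the recursion $\Phi_0 = I$, $\Phi_1 = A_1$, $\Phi_h = A_1 \Phi_{h-1} + A_2 \Phi_{h-2}$. The sketch below uses the true coefficient matrices of the simulated process rather than the estimates, so its numbers will differ somewhat from the plots; the object names (`Phi`, `response_2_to_1`, `cumulative_response`) are assumptions made for this illustration.

```r
# True coefficient matrices of the simulated VAR(2)
A1 <- matrix(c(-0.3, 0.6, -0.4, 0.5), 2)
A2 <- matrix(c(-0.1, -0.2, 0.1, 0.05), 2)

n_ahead <- 20                          # same horizon as in the text
Phi <- vector("list", n_ahead + 1)     # Phi[[h + 1]] holds the responses at horizon h
Phi[[1]] <- diag(2)                    # horizon 0: the shock itself
Phi[[2]] <- A1                         # horizon 1

# Recursion for horizons 2, ..., n_ahead
for (h in 2:n_ahead) {
  Phi[[h + 1]] <- A1 %*% Phi[[h]] + A2 %*% Phi[[h - 1]]
}

# Response of series 2 (row 2) to a unit shock in series 1 (column 1)
response_2_to_1 <- sapply(Phi, function(m) m[2, 1])

# Cumulative response: running sum over the horizons
cumulative_response <- cumsum(response_2_to_1)

print(round(response_2_to_1[1:5], 3))
print(round(cumulative_response[n_ahead + 1], 3))
```

The response is zero at horizon 0 because no contemporaneous effect is imposed, which corresponds to ortho = FALSE. The cumulative response approaches the matching element of $(I - A_1 - A_2)^{-1}$, which is positive for these coefficients and therefore in line with the conclusion drawn from the cumulative plot.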

