Original link:tecdat.cn/?p=22665 

Original source: Tuoduan Data Tribe WeChat public account

 

Abstract

State space modeling is an efficient and flexible method for making statistical inferences about a large number of time series and other data. This paper introduces state space modeling with observations from the exponential family, namely the Gaussian, Poisson, binomial, negative binomial, and gamma distributions. After introducing the basic theory of Gaussian and non-Gaussian state space models, an illustrative example of Poisson time series forecasting is provided. Finally, a comparison with other methods for fitting non-Gaussian time series models is given.

Introduction

The state space model provides a unified framework for modeling several types of time series and other data. Structural time series, ARIMA models, simple regression, generalized linear mixed models, and cubic spline smoothing are just a few examples of statistical models that can be expressed as state space models. The simplest class of state space models is the linear Gaussian state space model (also known as the dynamic linear model), which is widely used across many scientific fields.

Gaussian state space model

This section introduces the key concepts of Gaussian state space model theory. The notation and the algorithms behind Kalman filtering and smoothing mainly follow Durbin and Koopman (2012) and related articles by the same authors. For a linear Gaussian state space model with continuous states and discrete time points t = 1, ..., n, we have

yt = Zt αt + εt  (observation equation),
αt+1 = Tt αt + Rt ηt  (state equation),

where εt ~ N(0, Ht), ηt ~ N(0, Qt) and α1 ~ N(a1, P1) are independent of each other. We assume that yt is a p × 1 vector, αt+1 is an m × 1 vector, and ηt is a k × 1 vector. We also write α = (α1⊤, ..., αn⊤)⊤ and, similarly, y = (y1⊤, ..., yn⊤)⊤.

The primary goal of state space modeling is to obtain knowledge of the latent states α given the observations y. This can be achieved by two recursive algorithms, the Kalman filter and the smoother. From the Kalman filter algorithm we obtain the one-step-ahead predictions and the prediction errors,

at+1 = E(αt+1 | yt, ..., y1),  vt = yt − Zt at,

and the associated covariance matrices,

Pt+1 = Var(αt+1 | yt, ..., y1),  Ft = Var(vt) = Zt Pt Zt⊤ + Ht.

Using the results of the Kalman filter, we establish the state smoothing equations, which run backwards in time and yield

α̂t = E(αt | y),  Vt = Var(αt | y).

Similar smoothed estimates can also be computed for the disturbance terms εt and ηt and for the signal θt = Zt αt.
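In KFAS, both recursions are carried out by the function KFS. A minimal sketch of its use (here model stands for any SSModel object, such as the one constructed in the next section):

R> library("KFAS")
R> out <- KFS(model, filtering = "state", smoothing = "state")
R> out$a        # one-step-ahead state predictions a_t
R> out$P        # their covariance matrices P_t
R> out$alphahat # smoothed state estimates
R> out$V        # smoothed state covariance matrices V_t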

Examples of Gaussian state space models

Now let me illustrate this with an example. Our time series consists of the yearly numbers of alcohol-related deaths per 100,000 persons in the 40-49 age group from 1969 to 2007 (Figure 1). The data come from the Bureau of Statistics. For the observations y1, ..., yn we assume that yt ~ N(µt, σ²_ε) for all t = 1, ..., n, where µt is a random walk with drift process,

µt+1 = µt + ν + ηt,

with ηt ~ N(0, σ²_η). Suppose we have no prior information about the initial state µ1 or the slope ν. This model can be written in state space form by defining

Zt = (1, 0),  Tt = [1, 1; 0, 1],  Rt = (1, 0)⊤,  αt = (µt, νt)⊤,

with a diffuse initial distribution for α1.

In KFAS, this model can be written in the following code. To illustrate, we define all the system matrices manually, rather than using the default values.


R> Zt <- matrix(c(1, 0), 1, 2)
R> Tt <- matrix(c(1, 0, 1, 1), 2, 2)
R> model_gaussian <- SSModel(deaths / population ~ -1 +
+    SSMcustom(Z = Zt, T = Tt, R = matrix(c(1, 0), 2, 1),
+      Q = matrix(NA), P1inf = diag(2)), H = matrix(NA))

The first argument is the formula that defines the observations (left of ~) and the structure of the state equation (right of ~). Here deaths / population is a univariate time series, and the state equation is defined by the system matrices above. The intercept term is omitted with -1 in order to keep the model identifiable. The variance of the observation noise is defined by the argument H, and the NA values represent the unknown variance parameters σ²_ε and σ²_η. After estimating them, the filtering and smoothing recursions are performed.

 


R> fit_gaussian <- fitSSM(model_gaussian, inits = c(0, 0), method = "BFGS")
R> out_gaussian <- KFS(fit_gaussian$model)

In this case, the maximum likelihood estimates are 9.5 for σ²_ε and 4.3 for σ²_η. From the Kalman filter algorithm we get one-step-ahead predictions of the states, at = (µt, νt)⊤. Note that even though the slope term ν was defined as time-invariant in our model (νt = ν), it is estimated recursively by the Kalman filter algorithm. Thus at each time point t, when a new observation yt becomes available, the estimate of ν is updated to take into account the new information provided by yt. At the end of the Kalman filtering, an+1 gives our final estimate of the constant slope term given all of the data. Here the slope term is estimated as 0.84 with a standard error of 0.34. For µt the Kalman filter gives one-step-ahead predictions, but since this state is time-varying, if we are interested in estimates of µt for all t = 1, ..., n, we also need to run the smoothing algorithm, which conditions on the whole data y.
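These quantities can be read directly from the KFS output. A small sketch, assuming out_gaussian is the KFS output from above and n = 39 observations (1969 to 2007); the indices are illustrative:

R> out_gaussian$a[40, 2]           # a_{n+1}: final filtered estimate of the slope
R> sqrt(out_gaussian$P[2, 2, 40])  # its standard error
R> out_gaussian$alphahat[, 1]      # smoothed estimates of mu_t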

Figure 1 shows the observations together with the one-step-ahead predictions (red) and smoothed estimates (blue) of µt from the random walk with drift process. Note the typical pattern: at time t, the Kalman filter computes the one-step-ahead prediction error vt = yt − µ̂t and uses it, together with the previous prediction, to correct the prediction for the next time point. Here this is easiest to see at the beginning of the series, where our predictions seem to lag one time step behind the observations. The smoothing algorithm, on the other hand, takes both past and future values into account at each time point, resulting in smoother estimates of the latent process.
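A plot along the lines of Figure 1 can be sketched as follows, assuming deaths and population are the ts objects used in the model formula:

R> ts.plot(deaths / population, out_gaussian$a[, 1], out_gaussian$alphahat[, 1],
+    col = c(1, 2, 4))  # observations, one-step predictions (red), smoothed (blue)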

 

Examples of non-Gaussian state space models

Alcohol-related deaths can also be modeled naturally as a Poisson process. Now our observation yt is the actual count of alcohol-related deaths in year t, and the change in the population size is taken into account via the exposure term ut. The state equation remains the same, but the observation equation now takes the form p(yt | µt) = Poisson(ut e^µt).

R> model_poisson <- SSModel(deaths ~ -1 +
+    SSMcustom(Z = Zt, T = Tt, R = matrix(c(1, 0), 2, 1),
+      Q = matrix(NA), P1inf = diag(2)),
+    distribution = "poisson", u = population)

In contrast to the previous Gaussian model, we now need the argument distribution (which defaults to "gaussian") to define the distribution of the observations. We also define the exposure term with the argument u and use the default values for a1 and P1. In this model there is only one unknown parameter, σ²_η. It is estimated as 0.0053, but the actual values of σ²_η in the Gaussian and Poisson models cannot be compared directly, because the two models interpret µt differently. The slope term of the Poisson model is estimated as 0.022, with a standard error of 1.4 × 10⁻⁴, corresponding to a yearly increase of 2.3% in deaths.
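The quoted estimates can be reproduced by fitting the model and inspecting its system matrices. A minimal sketch; since the single NA in Q is the only unknown parameter, inits has length one:

R> fit_poisson <- fitSSM(model_poisson, inits = 0, method = "BFGS")
R> fit_poisson$model["Q"]  # estimate of sigma^2_eta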

Figure 2 shows the smoothed estimates (deaths per 100,000 persons) from the Gaussian model (blue) and the Poisson model (red).

Arbitrary state space model

By combining the building blocks above, it is relatively easy to construct a large number of models. When this is not enough, an arbitrary state space model can be constructed by defining the system matrices directly. As an example, we modify the previous Poisson model by adding an additional white noise term that tries to capture possible overdispersion in the data. The Poisson intensity is now ut exp(µt + εt), i.e.,

yt ~ Poisson(ut e^(µt + εt)),  µt+1 = µt + ν + ηt,

where ηt ~ N(0, σ²_η) is as before and εt ~ N(0, σ²_ε). This model can again be written in state space form by augmenting the state vector with the noise term, αt = (µt, νt, εt)⊤.

R> model_poisson <- SSModel(deaths ~ -1 + SSMtrend(2, Q = list(NA, 0)) +
+    SSMcustom(Z = 1, T = 0, R = 1, Q = matrix(NA), P1 = matrix(NA)),
+    distribution = "poisson", u = population)

Since the model contains an unknown parameter in P1, we need to provide a specific model updating function for the estimation.

R> updatefn <- function(pars, model) {
+    model["Q", etas = "level"] <- exp(pars[1])
+    model["Q", etas = "custom"] <- model["P1", states = "custom"] <- exp(pars[2])
+    model
+ }
R> fit_poisson <- fitSSM(model_poisson, inits = c(0, 0), updatefn = updatefn,
+    method = "BFGS")

As can be seen from Figure 3, there is almost no difference between the smoothed trend estimates µt of the Gaussian structural time series model and the Poisson structural time series model with the extra white noise. This is due to the relatively high intensity of the Poisson process.
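A comparison in the spirit of Figure 3 can be sketched by smoothing both fitted models and plotting the level estimates together. Note that the Poisson level lives on the log scale, so it is back-transformed below; the exact rescaling depends on the units of the exposure u, so treat this purely as an illustration:

R> out_poisson <- KFS(fit_poisson$model, nsim = 1000)
R> ts.plot(out_gaussian$alphahat[, 1], exp(out_poisson$alphahat[, "level"]),
+    col = c(4, 2))  # Gaussian level (blue) vs. exp of the Poisson level (red)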

Example

I now illustrate the use of KFAS with a more complete example than the previous ones. The data again consist of alcohol-related deaths, but now four age groups, namely 30-39, 40-49, 50-59 and 60-69, are modeled together as a multivariate Poisson model.

The numbers of deaths and the yearly population sizes of the corresponding age groups are available for 1969 to 2012, but for illustrative purposes we only use the data up to 2007 and make predictions for 2008 to 2013. Figure 4 shows the numbers of deaths per 100,000 persons for all age groups.


R> ts.plot(window(data[, 1:4] / data[, 5:8], end = 2007))

Here I choose a multivariate extension of the Poisson model used earlier,

yi,t ~ Poisson(ui,t e^(µi,t + εi,t)),  µi,t+1 = µi,t + νi + ηi,t,  i = 1, ..., 4.  (4)

Here µi,t is a random walk with drift, νi is a constant slope, and εi,t is an additional white noise component that captures extra variation in the series. I place no restrictions on the covariance structures of the level and noise components. Model (4) can be constructed with KFAS as follows.

R> model <- SSModel(window(data[, 1:4], end = 2007) ~ -1 +
+    SSMtrend(2, Q = list(matrix(NA, 4, 4), matrix(0, 4, 4))) +
+    SSMcustom(Z = diag(4), T = matrix(0, 4, 4), Q = matrix(NA, 4, 4),
+      P1 = matrix(NA, 4, 4)),
+    distribution = "poisson", u = window(data[, 5:8], end = 2007))

The update function is

R> updatefn <- function(pars, model, ...) {
+    # build the two covariance matrices from a log-Cholesky parameterization
+    Q <- diag(exp(pars[1:4]))
+    Q[upper.tri(Q)] <- pars[5:10]
+    model["Q", etas = "level"] <- crossprod(Q)
+    Q <- diag(exp(pars[11:14]))
+    Q[upper.tri(Q)] <- pars[15:20]
+    model["Q", etas = "custom"] <- model["P1", states = "custom"] <- crossprod(Q)
+    model
+ }

We can first estimate the model parameters without simulation, and then run the estimation again with importance sampling, using the first estimates as initial values. In this case, the results from the importance sampling step are practically identical to the ones from the first step.

R> # inits: initial values built from the covariances of the observed series
R> fit <- fitSSM(model, inits = inits, updatefn = updatefn,
+    method = "BFGS", nsim = 0)
R> fit <- fitSSM(model, inits = fit$optim.out$par, updatefn = updatefn,
+    method = "BFGS", nsim = 250)

Using the extraction methods of the fitted model, we can inspect the estimated covariance and correlation matrices.

R> fit$model["Q", etas = "level"]
R> fit$model["Q", etas = "custom"]
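Because these Q blocks are covariance matrices, base R's cov2cor converts them to the corresponding correlation matrices. A small sketch; the [, , 1] drops the time dimension of the returned 3-dimensional array:

R> cov2cor(fit$model["Q", etas = "level"][, , 1])
R> cov2cor(fit$model["Q", etas = "custom"][, , 1])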

Parameter estimation in state space models is often a lot of work, because the likelihood surface can contain multiple local maxima, which makes the optimization problem highly dependent on the initial values. Typically, the unknown parameters are related to the unobserved latent states, such as the covariance matrices in this case, for which there is little prior knowledge.

Therefore, guessing good initial values can be challenging, especially in more complex settings. It is therefore recommended to use multiple sets of initial values, and possibly several different types of optimization routines, until one can be reasonably sure that a proper optimum has been found. Here we use the covariance matrices of the observed series as the initial values for the covariance structures. Another issue in the non-Gaussian case is that the likelihood computation is based on an iterative procedure that is stopped by some termination condition (such as the relative change of the log-likelihood), so the log-likelihood function actually contains some noise. This in turn affects the gradient computations in methods such as BFGS and can, in theory, give unreliable results. Derivative-free methods, such as Nelder-Mead, are therefore sometimes recommended. On the other hand, BFGS is usually much faster than Nelder-Mead, so I prefer to try BFGS first, at least in preliminary analyses; a minimal two-stage sketch is shown below.
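A sketch of such a two-stage strategy, assuming inits holds the initial values described above: a derivative-free Nelder-Mead pass first, then BFGS restarted from its solution:

R> fit0 <- fitSSM(model, inits = inits, updatefn = updatefn,
+    method = "Nelder-Mead")
R> fit <- fitSSM(model, inits = fit0$optim.out$par, updatefn = updatefn,
+    method = "BFGS")

With the parameters estimated, we can compute smoothed estimates of the states.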

R> out <- KFS(fit$model, nsim = 1000)

We see occasional lagged cross-correlations between the residuals, but overall we can be relatively satisfied with our model. We can now use the estimated model to predict the intensity e^θt of alcohol-related deaths for each age group for 2008 to 2013. Since our model is time-varying (u changes over time), we need to supply a model for the future observations via the newdata argument.

R> pred <- predict(fit$model, newdata = newdata,
+    interval = "confidence", nsim = 10000)

R> for (i in 1:4) ts.plot(data[, i] / data[, 4 + i], trend[, i], pred[[i]])

Figure 7 shows the observed deaths, the smoothed trends for 1969 to 2007, and the predictions for 2008 to 2013 with 95% prediction intervals. Comparing our predictions with the true observations, we see that in reality there was a slight increase in deaths in the oldest age group (60-69 years), whereas the other age groups show a significant decrease during the prediction period. This is partly explained by the fact that total alcohol consumption decreased almost monotonically during this period, which in turn may be due to the increases in alcohol taxation in 2008, 2009 and 2012.

 

Discussion

State space models provide a tool for solving a large class of statistical problems. Here, I have introduced an approach to linear state space modeling with both Gaussian and non-Gaussian observations.

