Original link: tecdat.cn/?p=22632

Original source: Tecdat (拓端数据部落) WeChat public account

 

This post describes an approach to modeling time series data that involve seasonal and trend components. We will look at an algorithm called STL, which stands for "seasonal and trend decomposition using LOESS (locally weighted regression)", and how it can be applied to anomaly detection.

The basic idea is that if you have a regular time series, you can run it through an STL algorithm and isolate the regular patterns. What is left over is the "irregular" part, and anomaly detection amounts to deciding whether the irregularity is large enough.

Example: Air Passengers, 1949-1960

Let's run the algorithm on a data set giving the number of airline passengers per month from 1949 to 1960. First, here is the unmodified time series.

y <- AirPassengers  # monthly international airline passengers, 1949-1960 (base R dataset)
plot(y)

[Figure: the unmodified monthly air passenger series]

There is clearly a regular pattern here, but there is no significant drop in this series that anomaly detection could pick up. So let's introduce one.

y[40] <- 150  # inject an artificial drop (April 1952)

[Figure: the series with the artificial drop injected]

It's a big enough drop that we hope anomaly detection will pick it up, but not so big that it is obvious just from looking at the graph. Now let's run the series through STL.

fit <- stl(log(y), s.window = "periodic")  # s.window = "periodic" assumes a stable seasonal pattern
plot(fit)

[Figure: STL decomposition panels — data, seasonal, trend, remainder]

First of all, note that instead of running STL on y, I am running it on log(y).

The algorithm decomposes the series into three parts: seasonal, trend, and remainder components. The seasonal part is the cyclical component, the trend is the general upward/downward movement, and the remainder is whatever is left after the seasonal and trend components are removed. Seasonality and trend together make up the "normal" part of the series, and are therefore what we remove for anomaly detection.

The remainder is essentially a de-seasonalized, de-trended version of the original series, so this is where we monitor for anomalies. The dip in the remainder series is pronounced: the artificial drop we introduced in early 1952 clearly stands out.
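The original post does not show how anomalies are actually flagged on the remainder, but a minimal sketch might look like the following (the 3-sigma rule and the variable names are my assumptions, not the post's):

```r
y <- AirPassengers
y[40] <- 150                               # the artificial drop (April 1952)
fit <- stl(log(y), s.window = "periodic")
remainder <- fit$time.series[, "remainder"]

# Flag points whose remainder lies more than 3 standard deviations from its mean
threshold <- 3 * sd(remainder)
anomalies <- which(abs(remainder - mean(remainder)) > threshold)
time(y)[anomalies]                         # timestamps of flagged observations
```

A robust variant would use the median and MAD instead of the mean and standard deviation, since the anomaly itself inflates the estimated spread.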

We can also adjust the number of observations per period, the smoothing methods responsible for separating the seasonal and trend components, the "robustness" (i.e., insensitivity to outliers) of the fitted model, and so on. Most of these parameters require some understanding of how the underlying algorithm works.
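For reference, these knobs map onto arguments of R's stl(); the specific values below are illustrative choices, not recommendations:

```r
fit <- stl(log(AirPassengers),
           s.window = 7,     # seasonal smoother span in lags (odd, >= 7), or "periodic"
           t.window = 21,    # trend smoother span in lags (odd number)
           robust   = TRUE)  # iterate with robustness weights to downweight outliers
```

Setting robust = TRUE is particularly relevant for anomaly detection: it keeps outliers from distorting the seasonal and trend estimates, pushing them into the remainder instead.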

Below is some code that plots the actual data and the thresholds.


library(ggplot2)
# df: actual observations; ba: per-timestamp lower/upper thresholds (both keyed on column x)
data <- merge(df, ba, by = 'x')
ggplot(data) +
  geom_line(aes(x = x, y = y)) +
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax), alpha = 0.3)

[Figure: actual values with the anomaly threshold band]

And, as the sharp-eyed may have noticed, the inverse transformation is applied with exp(). Let's talk about that now.

Why take logarithms and then inverse-transform?

Not all decompositions involve a logarithmic transformation, but this one does. The reason has to do with the nature of the decomposition: STL decomposition is always additive.

y = s + t + r

But for some time series, a multiplicative decomposition is a better fit.

y = s * t * r

This happens, for example, in sales data where the amplitude of the seasonal component grows as the trend increases. That is the hallmark of a multiplicative series, and the air passenger series shows this pattern too. To deal with it, we log-transform the original values, which moves us into the additive domain where STL decomposition applies. When we are done, we apply the inverse transformation to get back to the original scale.
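Concretely, the round trip between the two domains looks like this (a sketch using base R's stl()):

```r
y <- AirPassengers
fit <- stl(log(y), s.window = "periodic")
parts <- fit$time.series

# On the log scale the decomposition is additive:
#   log(y) = seasonal + trend + remainder
# so exponentiating turns the sum back into a product on the original scale:
#   y = exp(seasonal) * exp(trend) * exp(remainder)
y_hat <- exp(parts[, "seasonal"] + parts[, "trend"] + parts[, "remainder"])
all.equal(as.numeric(y_hat), as.numeric(y))  # TRUE (up to floating-point error)
```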

What about multiple seasonalities?

Some time series have more than one seasonality. For example, a hotel-booking time series has three seasonal cycles: daily, weekly, and yearly.

While there are procedures that can produce decompositions with multiple seasonal components, STL does not: the highest-frequency seasonality is taken as the seasonal component, and any lower-frequency seasonality is absorbed into the trend.
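One such procedure (my suggestion, not the original post's) is mstl() from the forecast package, which applies STL iteratively at each seasonal frequency:

```r
library(forecast)

# taylor: half-hourly electricity demand with daily (period 48) and
# weekly (period 336) seasonality, shipped with the forecast package
fit <- mstl(taylor)
colnames(fit)  # one "Seasonal<period>" column per frequency, plus Data, Trend, Remainder
```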

