• Time Series Anomaly Detection Algorithms
  • Author: Pavel Tiunov
  • Translated by: the Juejin Gold Miner translation project
  • Permalink: github.com/xitu/gold-m…
  • Translator: haiyang-tju
  • Proofreader: Nanjingboy

An easy-to-understand overview of the current state of anomaly detection technology

At Statsbot, we keep reviewing developments in anomaly detection methods and refine our models accordingly.

This article summarizes the most commonly used time series anomaly detection algorithms, along with their advantages and disadvantages.

This article is intended for readers new to the field who simply want to know the state of the art in anomaly detection techniques. We don’t want to scare anyone off with complicated mathematical models, so all the mathematical derivations are left to the recommended links.

Important anomaly types

Anomaly detection in time series is usually framed as finding outliers relative to some standard or usual signal. While there are many anomaly types, we focus on the ones most important from a business perspective: unexpected spikes, drops, trend changes, and level shifts.

Imagine you track the number of users on your website and see an unexpected growth of users in a short period of time that looks like a spike. These types of anomalies are usually called additive outliers.

Another example: your web server goes down and you see zero or a very low number of visitors for a short period of time. These types of anomalies are usually classified as temporal changes.

Your conversion rate may also drop while you are dealing with some issue in the conversion funnel. If this happens, the target metric usually doesn’t change the shape of the signal but rather its total value over a period of time. Depending on the nature of the change, these anomalies are usually called level shifts or seasonal level shifts.

Typically, an anomaly detection algorithm should either mark each point in time as anomaly/not anomaly, or forecast a signal for each point and measure whether the difference between the actual value and the forecast is large enough to consider the point an anomaly.

With the latter approach, you also get a visual confidence interval, which helps you understand why an anomaly occurred and validate it.
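To make the forecast-and-compare idea concrete, here is a minimal sketch. It assumes some hypothetical model has already produced the `predicted` values; the function name and the k-sigma rule are illustrative choices, and the band they define plays the role of the confidence interval described above.

```python
def flag_anomalies(actual, predicted, k=3.0):
    """Flag points whose residual (actual - predicted) deviates from
    the mean residual by more than k standard deviations."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    sigma = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    return [abs(r - mean) > k * sigma for r in residuals]

actual    = [10, 11, 10, 12, 50, 11, 10]   # the value 50 is a spike
predicted = [10, 10, 10, 10, 10, 10, 10]   # hypothetical model output
print(flag_anomalies(actual, predicted, k=2.0))
# → [False, False, False, False, True, False, False]
```

Only the spike at index 4 falls outside the band; everything else stays within two standard deviations of the forecast.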

Statsbot’s anomaly report shows the actual time series, the predicted time series, and a confidence interval, which help explain why an anomaly occurred.

Let’s review both types of algorithms from an application perspective and see how each finds outliers.

STL decomposition

STL stands for Seasonal and Trend decomposition using Loess. This technique decomposes a time series signal into three parts: seasonal, trend, and residual.

From top to bottom: the original time series, followed by the seasonal, trend, and residual components obtained by STL decomposition.

As the name suggests, this applies to seasonal time series, which is the more common case.

By analyzing the deviation of the residual and setting a threshold on it, you get an anomaly detection algorithm.

What is less obvious here is that we use the median absolute deviation to obtain more reliable anomaly detection results. The best implementation of this approach so far is Twitter’s AnomalyDetection library, which uses the generalized extreme Studentized deviate (generalized ESD) test to check whether a residual point is an outlier.
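A hedged sketch of the median-absolute-deviation idea mentioned above, in pure Python (the function name, the 0.6745 normal-consistency constant, and the 3.5 cutoff are conventional illustrative choices, not part of any particular library):

```python
def mad_scores(residuals):
    """Robust z-scores based on the median absolute deviation (MAD).

    Using medians instead of means keeps a single large outlier
    from inflating the spread estimate."""
    def median(xs):
        s = sorted(xs)
        n = len(s)
        mid = n // 2
        return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    # 0.6745 rescales MAD so scores are comparable to standard
    # z-scores under a normal distribution.
    return [0.6745 * (r - med) / mad for r in residuals]

scores = mad_scores([0.1, -0.2, 0.0, 0.3, 5.0, -0.1, 0.2])
print([i for i, s in enumerate(scores) if abs(s) > 3.5])
# → [4]
```

The residual 5.0 scores far above the cutoff while the small fluctuations around zero do not, which is exactly the robustness property that makes MAD preferable to the standard deviation here.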

Advantages

The advantages of this approach are its simplicity and robustness. It can handle many different situations, and all anomalies remain intuitively interpretable.

It is mainly good at detecting additive outliers. If you want to detect level shifts, you can analyze a moving average of the signal instead.
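A small sketch of why a moving average helps with level shifts: smoothing suppresses single-point spikes, while a sustained jump in the level survives smoothing and becomes easy to spot. The function below is an illustrative toy, not part of any STL library.

```python
def moving_average(signal, window=3):
    """Simple moving average over a sliding window.

    Level shifts show up as a sustained change in this smoothed
    signal, whereas isolated spikes get averaged away."""
    return [sum(signal[i:i + window]) / window
            for i in range(len(signal) - window + 1)]

# A level shift: the signal jumps from 10 to 20 and stays there.
signal = [10, 10, 10, 10, 20, 20, 20, 20]
print(moving_average(signal, window=2))
# → [10.0, 10.0, 10.0, 15.0, 20.0, 20.0, 20.0]
```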

Disadvantages

The downside of this approach is that its tuning options are rigid. All you can do is adjust the confidence interval via the significance level.

The method also fails when the signal’s characteristics change dramatically. For example, consider tracking the number of users of a website that was closed to the public and is then suddenly opened. In this case, you should track the anomalies occurring before and after launch separately.

Classification and regression trees

Classification and regression trees (CART) are among the most robust and effective machine learning techniques, and they can also be applied to anomaly detection.

  • First, you can use supervised learning to teach a classification tree to label data points as anomalous or non-anomalous. This requires labeled anomaly data points.
  • Second, you can use an unsupervised learning algorithm to train CART to predict the next data point in the series, obtaining a confidence interval or prediction error much as with the STL decomposition approach. The generalized ESD test, or the Grubbs test, is then used to check whether the data point lies inside or outside the confidence interval.
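To illustrate the unsupervised variant, the toy code below builds lag features and fits a tiny CART-style regression tree from scratch to predict the next point. It is a from-scratch sketch standing in for a real library such as XGBoost; the function names and the depth-2 limit are illustrative choices only.

```python
def make_lag_features(series, p=3):
    """Turn a series into (lag-vector, next-value) training pairs."""
    X = [series[i:i + p] for i in range(len(series) - p)]
    y = [series[i + p] for i in range(len(series) - p)]
    return X, y

def fit_tree(X, y, depth=2):
    """Tiny CART-style regression tree: at each node, pick the
    (feature, threshold) split minimizing total squared error."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2:
        return mean  # leaf: predict the mean of this node
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            err = (sum((yi - sum(left) / len(left)) ** 2 for yi in left) +
                   sum((yi - sum(right) / len(right)) ** 2 for yi in right))
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:
        return mean
    _, j, t = best
    Xl, yl = zip(*[(r, yi) for r, yi in zip(X, y) if r[j] <= t])
    Xr, yr = zip(*[(r, yi) for r, yi in zip(X, y) if r[j] > t])
    return (j, t, fit_tree(list(Xl), list(yl), depth - 1),
                  fit_tree(list(Xr), list(yr), depth - 1))

def predict(tree, row):
    """Walk the tree until reaching a leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

series = [1, 2, 3] * 5          # a perfectly seasonal toy signal
X, y = make_lag_features(series, p=3)
tree = fit_tree(X, y, depth=2)
print(predict(tree, [1, 2, 3]))  # → 1.0
```

A large gap between such a prediction and the actual next value is what the ESD or Grubbs test would then evaluate.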

Actual time series data (green), time series predicted by the CART model (blue), and the anomalies detected by the anomaly detection algorithm.

The most popular implementation of tree learning is the XGBoost library.

Advantages

The advantage of this method is that it imposes no constraints on the signal’s structure, and you can introduce many feature parameters to learn more complex models.

Disadvantages

The disadvantage is that the number of features can grow quickly, which hurts overall computational performance. In this case, you should select features deliberately.

ARIMA model

The autoregressive integrated moving average (ARIMA) model is, by design, a fairly simple method, yet powerful enough to forecast signals and find anomalies in them.

The idea of this method is to generate a prediction of the next data point from several past data points, adding some random variable (usually white noise) in the process. Predicted data points can in turn be used to generate new predictions. Unsurprisingly, this makes the subsequent signal smoother.

The most difficult part of applying this method is selecting the number of differences, the number of autoregression terms, and the number of forecast error coefficients (the model orders p, d, and q).
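As a hedged sketch of the autoregressive core of the idea, the toy code below fits just an AR(1) model by least squares. A real ARIMA(p, d, q) fit would also choose the differencing order d and the moving-average order q; the function names here are illustrative, not a library API.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = mu + phi * (x_{t-1} - mu),
    the simplest autoregressive model."""
    mu = sum(series) / len(series)
    x = [v - mu for v in series]
    num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return mu, num / den

def ar1_forecast(series, mu, phi):
    """One-step-ahead prediction for every point after the first."""
    return [mu + phi * (v - mu) for v in series[:-1]]

# A perfectly alternating signal is exactly AR(1) with phi = -1.
mu, phi = fit_ar1([1, -1, 1, -1, 1, -1])
print(mu, phi)  # → 0.0 -1.0
```

Comparing `ar1_forecast` output against the actual series yields the residuals that anomaly tests operate on.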

Each time you use a new signal, you should build a new ARIMA model.

Another obstacle to this approach is that the signal should be stationary after differencing. In plain terms, this means the signal should not depend on time, which is a significant limitation.

For anomaly detection, the identified outliers are used to build an adjusted signal model, and a t-statistic then tests whether the adjusted model fits the data better than the original one.

Two time series constructed using the original ARIMA model and the ARIMA model adjusted for outliers.

The most popular implementation of this approach is the tsoutliers package for R. In this case, you can find a suitable ARIMA model for your signal that can detect all types of anomalies.

Exponential smoothing method

Exponential smoothing is very similar to the ARIMA approach. The basic exponential smoothing model is equivalent to an ARIMA(0, 1, 1) model.

From an anomaly detection perspective, the most interesting approach is the Holt-Winters seasonal method. It requires you to define the seasonal period, such as a week, a month, or a year.

If you need to track multiple seasonal cycles, such as both weekly and yearly seasonality, you should pick only one. Usually the shortest is chosen, so here you would pick the weekly seasonality.

This is obviously a drawback of the method and can greatly affect the overall forecast horizon.

Just as with STL or CART, we can apply statistical methods to the residuals to detect anomalies.

Neural networks

As with the CART method, neural networks can be applied in two ways: supervised and unsupervised learning.

Since we are working with time series data, the most suitable type of neural network is the LSTM. If built properly, this recurrent neural network can model the most complex dependencies in a time series, including high-level seasonal ones.

This approach is also useful if there are multiple time series that are coupled to each other.

This area is still under active research. Building a time series model requires a lot of work, but when done successfully, it can achieve excellent accuracy.

💡 Remember 💡

  1. Try the simplest model and algorithm that best fits your problem.
  2. If that doesn’t work, use a more advanced technique.
  3. Starting with a universal solution that covers every situation is tempting, but it is not always the best choice.

At Statsbot, we started with STL for large-scale anomaly detection and then moved on to different combinations of CART and LSTM models.
