Small knowledge, big challenge! This article is participating in the "Essential Tips for Programmers" creation activity.

Preface

Hello! Friend!!!

Thank you very much for reading Haihong's article. If there are any mistakes, please point them out ~

 

Self-introduction ଘ(੭, ᵕ)੭

Nickname: Haihong

Tags: code monkey | C++ contestant | student

Introduction: I got to know programming through the C language and then transferred to the computer science major. I was fortunate to win some national and provincial awards… and have been recommended for postgraduate study. Currently learning C++/Linux/Python.

Learning experience: solid foundation + more notes + more code + more thinking + learn English well!

 

I am at the beginner stage of learning Python

This article serves only as my own study notes, for building a knowledge system and for review

What matters is not how many problems you can solve, but how many you understand

Know what it is, and know why it is so!

1 Passenger volume prediction for the JetRail high-speed railway: 7 time series methods

Abstract

Time series forecasting is often used in everyday analysis and is an important method for processing time series data.

Forecast of high-speed rail passenger volume

Suppose you want to solve a time series problem: using two years of data (August 2012 to August 2014), you need to forecast passenger numbers for the next seven months.

Data acquisition: hourly passenger counts from 2012 to 2014 were obtained.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('../profile/train2.csv')
df.head()

The results

Data set processing (constructing and aggregating the data set on a daily basis):

  • Construct a dataset from August 2012 to December 2013
  • Create the train and test files for modeling. The first 14 months (August 2012 to October 2013) are used as training data, and the last two months (November 2013 to December 2013) as test data.
  • Aggregate the data set on a daily basis
import pandas as pd
import matplotlib.pyplot as plt
 
df = pd.read_csv('../profile/train2.csv', nrows=11856)

train = df[0:10392]
test = df[10392:]
 
df['Timestamp'] = pd.to_datetime(df['Datetime'], format='%d-%m-%Y %H:%M')  # 4-digit years use %Y, 2-digit years use %y
df.index = df['Timestamp']
df = df.resample('D').mean()  # aggregate by day and take the mean
 
train['Timestamp'] = pd.to_datetime(train['Datetime'], format='%d-%m-%Y %H:%M')
train.index = train['Timestamp']
train = train.resample('D').mean()
 
test['Timestamp'] = pd.to_datetime(test['Datetime'], format='%d-%m-%Y %H:%M')
test.index = test['Timestamp']
test = test.resample('D').mean()
 

train.Count.plot(figsize=(15, 8), title='Daily Ridership', fontsize=14)
test.Count.plot(figsize=(15, 8), title='Daily Ridership', fontsize=14)
plt.show()

The results

1.1 Naive method

As shown in the figure below, the Y-axis represents the price of an item and the X-axis represents time (days). If the data set is stable over a period of time and we want to predict the next day's price, we can simply take the previous day's price as the prediction for the next day. This forecasting method, which assumes that the next prediction point equals the last observed point, is called the naive method. Namely:
Hence \quad \hat y_{t+1} = y_t

The Demo code

dd = np.asarray(train['Count'])
y_hat = test.copy()
y_hat['naive'] = dd[len(dd) - 1]  # forecast every test point as the last observed training value
plt.figure(figsize=(12, 8))
plt.plot(train.index, train['Count'], label='Train')
plt.plot(test.index, test['Count'], label='Test')
plt.plot(y_hat.index, y_hat['naive'], label='Naive Forecast')
plt.legend(loc='best')
plt.title("Naive Forecast")
plt.show()

The results

The naive method is not suited to highly variable data sets; it works best on very stable ones. We calculate the root mean square error (RMSE) to check the model's accuracy on the test data set:

from sklearn.metrics import mean_squared_error
from math import sqrt
 
rms = sqrt(mean_squared_error(test['Count'], y_hat['naive']))
print(rms)

# 43.91640614391676

The results

The final root mean square error (RMSE) is 43.91640614391676.

1.2 Simple average method

As shown in the figure below, the Y-axis represents the price of an item and the X-axis represents time (days). The price rises and falls at random while its average remains constant. We often come across data sets that, despite small fluctuations over time, keep a constant mean. In such a case we can predict that the next day's price will be roughly the same as the average of the previous days' prices. This forecasting method, which sets the expected value equal to the average of all previously observed points, is called the simple average method. Namely:
Hence \quad \hat y_{x+1} = \frac{1}{x} \sum^x_{i=1} y_i

y_hat_avg = test.copy()
y_hat_avg['avg_forecast'] = train['Count'].mean()  # forecast = mean of all training observations
plt.figure(figsize=(12, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['avg_forecast'], label='Average Forecast')
plt.legend(loc='best')
plt.show()

The results

The simple averaging method takes all previously known values and averages them out as the next value to be predicted. This won’t be accurate, of course, but it works best in some cases.

from sklearn.metrics import mean_squared_error 
from math import sqrt 
rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['avg_forecast'])) 
print(rms)

# 109.88526527082863

The results

The model did not improve accuracy. Therefore, it can be inferred that the effect of this method is best when the average value of each time period remains unchanged. Although the accuracy of the naive method is higher than that of the simple averaging method, this does not mean that the naive method is better than the simple averaging method on all data sets.

1.3 Moving average method

As shown in the figure below, the Y-axis represents the price of an item and the X-axis represents time (days). The price rose sharply for a while and then levelled off. We often encounter data sets like this, where prices or sales rise or fall sharply at some point. If we used the simple average method, we would average all the previous data, but that makes little sense here, because the early price values would strongly distort the predictions for later dates. So we take the average of the prices over only the last few periods. The logic, obviously, is that only recent values matter. This forecasting method, which averages over a fixed window of recent periods, is called the moving average method.

Calculating the moving average involves a window size p, sometimes referred to as the "sliding window". Using a simple moving average model, we predict the next value in a time series as the average of a fixed, finite number p of previous values. Thus, for all i > p:


Hence \quad \hat y_i = \frac{1}{p}(y_{i-1} + y_{i-2} + \dots + y_{i-p})

The moving average actually works quite well, especially when you choose the right p value for the time series.

y_hat_avg = test.copy()
y_hat_avg['moving_avg_forecast'] = train['Count'].rolling(60).mean().iloc[-1]  # mean of the last 60 training days
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['moving_avg_forecast'], label='Moving Average Forecast')
plt.legend(loc='best')
plt.show()

The results

from sklearn.metrics import mean_squared_error
from math import sqrt
rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['moving_avg_forecast']))
print(rms)

The root mean square error calculated for this method is 46.72840725106963.

We can see that, on this data set, the naive method performs better than both the simple average and the moving average methods. In addition, we can try simple exponential smoothing, an improvement over the moving average that amounts to a weighted moving average. As the moving average method above shows, we assign equal weight to the observations inside the window p. But we may encounter situations where each observation in the window affects the prediction differently. A method that gives different weights to past observations is called a weighted moving average (WMA). The WMA is still a moving average, but the values in the "sliding window" are given different weights; generally speaking, the most recent time points matter more.


Hence \quad \hat y_i = \frac{1}{m}(w_1 y_{i-1} + w_2 y_{i-2} + w_3 y_{i-3} + \dots + w_m y_{i-m})

Instead of selecting a window size, this method requires a list of weight values (which must add up to 1). For example, if we choose [0.40, 0.25, 0.20, 0.15] as weights, we assign weights of 40%, 25%, 20%, and 15% to the last four time points respectively.
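The article gives no code for this method, so here is a minimal illustrative sketch, reusing the train and test frames defined above and the hypothetical weights just mentioned:

import numpy as np

# Illustrative sketch (not from the original article): a weighted moving
# average forecast built from the last four training observations.
weights = np.array([0.40, 0.25, 0.20, 0.15])         # newest point first; sums to 1
recent = train['Count'].values[::-1][:len(weights)]  # last 4 daily values, newest first

y_hat_wma = test.copy()
# Like the earlier examples, this yields a single flat forecast for the test period.
y_hat_wma['wma_forecast'] = np.dot(weights, recent)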

1.4 Simple exponential smoothing method

We notice that the simple average method and the weighted moving average method differ greatly in which time points they use. We need something between these two approaches: one that takes all the data into account while giving the data points different weights. For example, it may assign larger weights to recent observations than to observations from earlier periods. The method that works on this principle is called simple exponential smoothing. It computes the forecast as a weighted average in which the weights decrease exponentially from the latest observation to the earliest, so the smallest weights are associated with the oldest observations:

\hat y_{t+1} = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots

where 0 ≤ α ≤ 1 is the smoothing parameter. The single-step forecast for time t+1 is thus a weighted average of all the observations in the series, and the rate at which the weights decay is controlled by α. Equivalently, the forecast can be written recursively as

\hat y_{t+1} = \alpha y_t + (1-\alpha) \hat y_t

So, essentially, we compute a weighted average using the two weights α and 1−α: the forecast for time t+1 is a weighted average between the most recent observation y_t and the most recent forecast \hat y_t. Each step back multiplies an observation's weight by another factor of 1−α, so the weights decay exponentially, which is why the method is called "exponential" smoothing.
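Before reaching for a library, the recursion itself is easy to write by hand. A minimal sketch, assuming the train frame from above (the initialization below is a simplification; statsmodels estimates the initial level differently, so the numbers will not match exactly):

import numpy as np

def simple_exp_smoothing(series, alpha):
    # Each step blends the newest observation (weight alpha) with the
    # previous forecast (weight 1 - alpha), so the weight of an observation
    # decays exponentially the further back it lies.
    forecast = series[0]              # initialize with the first observation
    for y in series[1:]:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast                   # one-step-ahead forecast after the last point

print(simple_exp_smoothing(np.asarray(train['Count']), alpha=0.6))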

from statsmodels.tsa.api import SimpleExpSmoothing

y_hat_avg = test.copy()
fit = SimpleExpSmoothing(np.asarray(train['Count'])).fit(smoothing_level=0.6, optimized=False)
y_hat_avg['SES'] = fit.forecast(len(test))
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['SES'], label='SES')
plt.legend(loc='best')
plt.show()

The results

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['SES']))
print(rms)

The root mean square error is 43.357625225228155.

1.5 Holt linear trend method

As shown in the figure below, the Y-axis represents the price of an item and the X-axis represents time (days).

If the price of an item is rising (see the figure above), the methods above do not account for this trend, that is, the overall pattern we observe in prices over time. In the example above we can see that the price of the item is rising. While all of the above methods can be applied to such data, we still need a way to capture the trend accurately without extra assumptions. The method that takes the data set's trend into account is called Holt's linear trend method.

Every time series data set can be decomposed into its component parts: trend, seasonal, and residual. Any data set that shows a trend can be forecast with Holt's linear trend method.

The Demo code

import statsmodels.api as sm

sm.tsa.seasonal_decompose(train['Count']).plot()  # decompose into trend, seasonal, residual
result = sm.tsa.stattools.adfuller(train['Count'])  # ADF stationarity test (result not used below)
plt.show()

The results

As we can see from the figure, this data set has an upward trend, so we can use Holt's linear trend method to predict future prices. The algorithm consists of three equations: a level equation, a trend equation, and an equation that combines the two to produce the forecast \hat y.

The quantity we track in this algorithm is called the level. As with simple exponential smoothing, the level equation shows the level to be a weighted average of the observation and the one-step-ahead forecast within the sample, while the trend equation shows the trend to be a weighted average of the estimated trend \ell_t - \ell_{t-1} at time t and the previous trend estimate b_{t-1}.
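For reference, these three equations in their standard additive form (writing \ell_t for the level, b_t for the trend, and h for the forecast horizon) are:

\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1})
b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}
\hat y_{t+h} = \ell_t + h b_t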

We add these two equations to get the forecast function. We could also multiply them instead of adding to get a multiplicative forecast equation. When the trend increases or decreases linearly, we use the additive equation; when the trend increases or decreases exponentially, we use the multiplicative one. Practice shows that multiplicative forecasts tend to be more stable, but additive ones are easier to understand.

The Demo code

from statsmodels.tsa.api import Holt

y_hat_avg = test.copy()

# smoothing_slope is named smoothing_trend in newer statsmodels versions
fit = Holt(np.asarray(train['Count'])).fit(smoothing_level=0.3, smoothing_slope=0.1)
y_hat_avg['Holt_linear'] = fit.forecast(len(test))

plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['Holt_linear'], label='Holt_linear')
plt.legend(loc='best')
plt.show()

The results

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['Holt_linear']))
print(rms)
# 43.056259611507286

The root mean square error is 43.056259611507286

1.6 Holt-Winters seasonal prediction model

Before we apply this algorithm, let’s introduce a new term. Let’s say you have a hotel halfway up a mountain that is very busy in the summer with lots of customers, but very few during the rest of the year. As a result, the income in summer is much higher than that in other seasons every year, and this repetition is called Seasonality. A data set is seasonal if it shows similar patterns over a fixed period of time.

The five models discussed above did not account for the seasonality of the data set in their predictions, so we need a method that does. The algorithm applied in this case is the Holt-Winters seasonal forecasting model, a triple exponential smoothing method. The idea behind it is to apply exponential smoothing to the seasonal component in addition to the level and trend. Here S is the length of the seasonal cycle, with 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, and 0 ≤ γ ≤ 1. The level equation is a weighted average between the seasonally adjusted observation and the non-seasonal forecast at time t; the trend equation has the same meaning as in Holt's linear method; the seasonal equation is a weighted average between the current seasonal index and the seasonal index of the same season one cycle earlier.
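For reference, the standard additive Holt-Winters equations (writing m for the season length that the text calls S, and s_t for the seasonal component) are:

\ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})
b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}
s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}
\hat y_{t+h} = \ell_t + h b_t + s_{t+h-m} \quad (h \le m)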

In this algorithm, too, the components can be combined additively or multiplicatively. When seasonal variations are roughly constant across the series, the additive method is preferred; when seasonal variations change in proportion to the level of the series, the multiplicative method is preferred.

from statsmodels.tsa.api import ExponentialSmoothing

y_hat_avg = test.copy()
fit1 = ExponentialSmoothing(np.asarray(train['Count']), seasonal_periods=7, trend='add', seasonal='add').fit()
y_hat_avg['Holt_Winter'] = fit1.forecast(len(test))
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['Holt_Winter'], label='Holt_Winter')
plt.legend(loc='best')
plt.show()

The results

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['Holt_Winter']))
print(rms)

# 25.26453787766697

1.7 Autoregressive integrated moving average model (ARIMA)

Another time series model suited to this scenario is the autoregressive integrated moving average (ARIMA) model. Exponential smoothing models are based on describing the trend and seasonality in the data, while ARIMA models aim to describe the autocorrelations among the data points. An optimized version of ARIMA is seasonal ARIMA (SARIMA); like the Holt-Winters seasonal model, it also takes the seasonality of the data set into account.

The Demo code

import statsmodels.api as sm

y_hat_avg = test.copy()
fit1 = sm.tsa.statespace.SARIMAX(train.Count, order=(2, 1, 4), seasonal_order=(0, 1, 1, 7)).fit()  # (p, d, q) and (P, D, Q, s); s=7 for a weekly season
y_hat_avg['SARIMA'] = fit1.predict(start="2013-11-1", end="2013-12-31", dynamic=True)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['SARIMA'], label='SARIMA')
plt.legend(loc='best')
plt.show()

The results

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(test['Count'], y_hat_avg['SARIMA']))
print(rms)

# 26.05142646431944

We can see that seasonal ARIMA performs about as well as Holt-Winters on this data set.
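Collecting the RMSE values reported above makes the comparison explicit:

  • Naive method: 43.92
  • Simple average method: 109.89
  • Moving average method (p = 60): 46.73
  • Simple exponential smoothing: 43.36
  • Holt's linear trend method: 43.06
  • Holt-Winters: 25.26
  • Seasonal ARIMA: 26.05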

Conclusion

Learning source: Bilibili (Station B) and its classroom PPT; the code is reproduced from there.

This essay is just a study note, recording a process from 0 to 1.

I hope it helps you. If there are any mistakes, friends are welcome to point them out ~