Original link: tecdat.cn/?p=8461

Original source: Tuoduan Data Tribe official account

 

Time series prediction refers to the type of problem in which we must predict results based on time-dependent inputs. A typical example of time series data is stock market data, where stock prices change over time.

Recurrent neural networks (RNNs) have proved effective at solving sequence problems. In particular, the long short-term memory (LSTM) network, a variant of the RNN, is currently used in a variety of fields to solve sequence problems.

Types of sequence problems

Sequence problems can be roughly divided into the following categories (the shape sketch after the list makes each case concrete):

  1. One-to-one: There is one input and one output. A typical example of a one-to-one sequence problem is when you have an image and want to predict a single label for that image.
  2. Many-to-one: In a many-to-one sequence problem, we take a sequence of data as input and must predict a single output. Text classification is a prime example: the input is a sequence of words and the output is a single label.
  3. One-to-many: In a one-to-many sequence problem, we have a single input and a sequence of outputs. A typical example is an image and its accompanying description.
  4. Many-to-many: Many-to-many sequence problems involve a sequence input and a sequence output. For example, taking the stock prices of the past 7 days as input and predicting the stock prices for the following 7 days as output. Chatbots are also an example of many-to-many sequence problems, where one text sequence is the input and another text sequence is the output.
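In Keras terms, these four categories differ mainly in the shapes of the input and output arrays. The sketch below is illustrative only; the shapes describe typical setups and are not taken from the original article:

# Illustrative (samples, time steps, features) shapes for each category:
# one-to-one:   X: (n, 1, k)  ->  y: (n, 1)   one time step in, one value out
# many-to-one:  X: (n, t, k)  ->  y: (n, 1)   t time steps in, one value out
# one-to-many:  X: (n, 1, k)  ->  y: (n, t)   one input, a sequence out
# many-to-many: X: (n, t, k)  ->  y: (n, t)   a sequence in, a sequence out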

In this article, we’ll learn how to use LSTM and its different variants to solve one-to-one and many-to-one sequence problems.

After reading this article, you will be able to solve problems such as stock price and weather forecasting based on historical data. Since text is also a sequence of words, the knowledge gained here can also be applied to natural language processing tasks such as text classification and language generation.

One-to-one sequence problem

As I said before, in a one-to-one sequence problem, there is only one input and one output. In this section, we will look at two types of sequence problems. First, we’ll learn how to solve one-to-one sequence problems using a single feature, and then we’ll learn how to solve one-to-one sequence problems using multiple features.

One-to-one sequence problems with a single feature

In this section, we'll see how to solve a one-to-one sequence problem in which each time step has a single feature.

First, we import the required libraries that we will use in this article:

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate
from keras.layers import Bidirectional

import pandas as pd
import numpy as np
import re

import matplotlib.pyplot as plt

Creating a data set

In the next step, we will prepare the data set to be used in this section.

X = [x+1 for x in range(20)]
Y = [y * 15 for y in X]

print(X)
print(Y)

In the script above, we create 20 inputs and 20 outputs. Each input consists of one time step, which in turn contains a single feature. Each output value is 15 times the corresponding input value. If you run the script, you should see the input and output values shown below:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300]

The input to an LSTM layer should be 3-dimensional, with shape (samples, time steps, features). Samples is the number of samples in the input data; we have 20. Time steps is the number of time steps per sample; we have one. Finally, features is the number of features per time step; each of our time steps has a single feature.

 X = array(X).reshape(20, 1, 1)

Solution via a simple LSTM

Now we can create a simple LSTM model with an LSTM layer.

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(1, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

In the script above, we created an LSTM model with one LSTM layer of 50 neurons and a ReLU activation function. You can see that the input shape is (1, 1), since our data has one time step with one feature.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_16 (LSTM)               (None, 50)                10400
_________________________________________________________________
dense_15 (Dense)             (None, 1)                 51
=================================================================
Total params: 10451
Trainable params: 10451
Non-trainable params: 0
_________________________________________________________________

Now let’s train the model:

model.fit(X, Y, epochs=2000, validation_split=0.2, batch_size=5)

We trained the model for 2000 epochs with a batch size of 5. You can choose any numbers here. After training the model, we can make predictions for new instances.

Suppose we want to predict the output for an input of 30. The actual output should be 30 × 15 = 450. First, we need to convert the test data to the 3-D shape required by LSTM. The following script predicts the output for the number 30:


test_input = array([30])
# reshape to (1 sample, 1 time step, 1 feature), as described above
test_input = test_input.reshape((1, 1, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got an output value of 437.86, which is slightly less than 450.
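Note that LSTM weights are initialized randomly, so your numbers will differ from mine on every run. If you want reproducible results, one option is to fix the random seeds before building the model; a minimal sketch, assuming the TensorFlow 1.x backend that this generation of standalone Keras used:

import random
import numpy as np
import tensorflow as tf

# Seed everything before constructing the model; exact reproducibility can
# still depend on the backend version and hardware.
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)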

Solution via stacked LSTMs

Now let’s create a stacked LSTM to see if we can get better results. The data set will remain the same and the model will be changed. Look at the script below:


model = Sequential()
# architecture reconstructed from the printed summary below
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, 1)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

In the above model, we have two LSTM layers. Notice that the first LSTM layer has the parameter return_sequences set to True. When return_sequences is True, the layer outputs the hidden state for every time step, and this full sequence is used as input to the next LSTM layer. A summary of the model is as follows:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_33 (LSTM)               (None, 1, 50)             10400
_________________________________________________________________
lstm_34 (LSTM)               (None, 50)                20200
_________________________________________________________________
dense_24 (Dense)             (None, 1)                 51
=================================================================
Total params: 30651
Trainable params: 30651
Non-trainable params: 0
_________________________________________________________________
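You can verify the effect of return_sequences directly from the layer output shapes. A minimal check, not part of the original script:

from keras.models import Sequential
from keras.layers import LSTM

m = Sequential()
m.add(LSTM(50, return_sequences=True, input_shape=(1, 1)))
print(m.output_shape)   # (None, 1, 50): one 50-dimensional state per time step

m = Sequential()
m.add(LSTM(50, input_shape=(1, 1)))
print(m.output_shape)   # (None, 50): only the last hidden state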

Next, we train the model and again predict the output for the input value 30, as shown in the following script:

# training settings assumed to match the single-layer model above
history = model.fit(X, Y, epochs=2000, validation_split=0.2, batch_size=5)

test_input = array([30])
test_input = test_input.reshape((1, 1, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got an output of 459.85, which is closer to 450 than the 437.86 we got with a single LSTM layer.

One-to-one sequence problems with multiple features

In the last section, each input sample had a single time step containing a single feature. In this section, we will see how to solve a one-to-one sequence problem in which the input time steps have multiple features.

Creating a data set

Start by creating the data set. Look at the script below:

nums = 25

X1 = list()
X2 = list()
X = list()
Y = list()

# loop body reconstructed from the printed lists below:
# X1 holds even numbers, X2 multiples of 3, Y their products
for i in range(nums):
    X1.append((i + 1) * 2)
    X2.append((i + 1) * 3)
    Y.append(X1[i] * X2[i])

print(X1)
print(X2)
print(Y)

In the script above, we create three lists: X1, X2, and Y. Each list contains 25 elements, which means the total sample size is 25. Finally, Y contains the output. The X1, X2, and Y lists are printed below:

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75]
[6, 24, 54, 96, 150, 216, 294, 384, 486, 600, 726, 864, 1014, 1176, 1350, 1536, 1734, 1944, 2166, 2400, 2646, 2904, 3174, 3456, 3750]

Each element in the output list is the product of the corresponding elements in X1 and X2. For example, the second element of the output is 24, which is the product of the second element of X1 (4) and the second element of X2 (6).

The input will consist of a combination of X1 and X2 lists, where each list will be represented as a column. The following script creates the final input:

X = np.column_stack((X1, X2))
print(X)

Here is the output:

[[ 2  3]
 [ 4  6]
 [ 6  9]
 [ 8 12]
 [10 15]
 [12 18]
 [14 21]
 [16 24]
 [18 27]
 [20 30]
 [22 33]
 [24 36]
 [26 39]
 [28 42]
 [30 45]
 [32 48]
 [34 51]
 [36 54]
 [38 57]
 [40 60]
 [42 63]
 [44 66]
 [46 69]
 [48 72]
 [50 75]]

You can see that the input contains two columns, i.e. two features per time step. As mentioned earlier, we need to reshape the input into a 3-D shape: 25 samples, each with 1 time step containing 2 features. The following script reshapes the input.

X = array(X).reshape(25, 1, 2)

Solution via a simple LSTM

We are now ready to train our LSTM model. Let’s start by developing an LSTM model as we did in the previous section:

model = Sequential()
model.add(LSTM(80, activation='relu', input_shape=(1, 2)))
# dense layers reconstructed from the printed summary below;
# the dense-layer activations are assumptions
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

 

In this case, our LSTM layer contains 80 neurons, followed by two dense layers: the first with 10 neurons and the second (also known as the output layer) with 1 neuron. A summary of the model is as follows:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_38 (LSTM)               (None, 80)                26560
_________________________________________________________________
dense_29 (Dense)             (None, 10)                810
_________________________________________________________________
dense_30 (Dense)             (None, 1)                 11
=================================================================
Total params: 27381
Trainable params: 27381
Non-trainable params: 0
_________________________________________________________________
None

The following script trains the model:

model.fit(X, Y, epochs=2000, validation_split=0.2, batch_size=5)

Let's test our trained model on a new data point. Our data point has two features, namely (55, 80); the actual output should be 55 × 80 = 4400. Let's see what our model predicts. Execute the following script:


test_input = array([55, 80])
# reshape to (1 sample, 1 time step, 2 features)
test_input = test_input.reshape((1, 1, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

My output is 3263.44, which is nowhere near the actual value of 4400.

Solution via stacked LSTMs

Now, let's create a more complex model with multiple LSTM and dense layers and see whether we can improve the result:

model = Sequential()
# layer sizes reconstructed from the printed summary below;
# the dense-layer activations are assumptions
model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(1, 2)))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(25, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

The model is summarized as follows:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_53 (LSTM)               (None, 1, 200)            162400
_________________________________________________________________
lstm_54 (LSTM)               (None, 1, 100)            120400
_________________________________________________________________
lstm_55 (LSTM)               (None, 1, 50)             30200
_________________________________________________________________
lstm_56 (LSTM)               (None, 25)                7600
_________________________________________________________________
dense_43 (Dense)             (None, 20)                520
_________________________________________________________________
dense_44 (Dense)             (None, 10)                210
_________________________________________________________________
dense_45 (Dense)             (None, 1)                 11
=================================================================
Total params: 321341
Trainable params: 321341
Non-trainable params: 0
_________________________________________________________________

The next step is to train our model and test it on the test data point (55, 80).

To improve accuracy, we will reduce the batch size, and since our model is now more complex, we can also reduce the number of epochs. The following script trains the LSTM model and makes a prediction at the test data point.


# batch size and epochs reduced per the text; the exact values were
# elided in the original, so these are assumptions
history = model.fit(X, Y, epochs=1000, validation_split=0.2, batch_size=3, verbose=1)

test_input = array([55, 80])
test_input = test_input.reshape((1, 1, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

In the output, I got a value of 3705.33, which is still below 4400 but much better than the 3263.44 obtained earlier with a single LSTM layer. You can experiment with different combinations of LSTM layers, dense layers, batch size, and number of epochs to see whether you get better results.
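A systematic way to run such experiments is a small grid loop that compares validation losses. The sketch below is only an illustration; the unit counts, epochs, and batch sizes are arbitrary choices, not values from the original article:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# Try a few configurations and keep the one with the lowest validation loss.
for units in (50, 100, 200):
    for batch_size in (3, 5):
        m = Sequential()
        m.add(LSTM(units, activation='relu', input_shape=(1, 2)))
        m.add(Dense(1))
        m.compile(optimizer='adam', loss='mse')
        h = m.fit(X, np.array(Y), epochs=500, validation_split=0.2,
                  batch_size=batch_size, verbose=0)
        print(units, batch_size, h.history['val_loss'][-1])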

Many-to-one sequence problem

In the previous section, we saw how to use LSTM to solve one-to-one sequence problems, in which each sample contains a single time step with one or more features. Data with a single time step is not really sequential data, and it turns out that densely connected neural networks actually perform better on single-time-step data.
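For example, the first one-to-one problem in this article can just as well be posed to a plain dense network on 2-D input, with no time axis at all. A sketch under that assumption, not code from the original article:

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# The one-to-one data from the first section, as a flat (samples, features) array.
X_flat = np.array([x + 1 for x in range(20)]).reshape(20, 1)
Y_flat = np.array([15 * (x + 1) for x in range(20)])

dense_model = Sequential()
dense_model.add(Dense(50, activation='relu', input_dim=1))
dense_model.add(Dense(1))
dense_model.compile(optimizer='adam', loss='mse')
dense_model.fit(X_flat, Y_flat, epochs=2000, validation_split=0.2, verbose=0)
print(dense_model.predict(np.array([[30]])))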

The actual sequence data contains multiple time steps, such as stock market prices for the past 7 days, multi-word sentences, and so on.

In this section, we'll see how to solve many-to-one sequence problems, in which each input sample has multiple time steps but the output consists of a single element. Each time step in the input can have one or more features. We'll start with many-to-one sequence problems that have one feature per time step, and then we'll see how to solve many-to-one problems where the input time steps have multiple features.

Many-to-one sequence problems with a single feature

Start by creating the data set. Our data set will contain 15 samples. Each sample will have three time steps, and each time step will contain a single feature: a number. The output for each sample will be the sum of the numbers in its three time steps. For example, if our sample contains the sequence 4, 5, 6, the output will be 4 + 5 + 6 = 15.

Creating a data set

Start by creating a list of the integers from 1 to 45. Since we want 15 samples in the data set, we will reshape this list of the first 45 integers.

X = np.array([x+1 for x in range(45)])
print(X)

In the output, you should see the first 45 integers:

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45]

We can reshape it into samples, time steps, and features using the following script:

X = X.reshape(15, 3, 1)
print(X)

The script above transforms the list X into a 3-D shape with 15 samples, 3 time steps, and 1 feature, and prints the reshaped data:

[[[ 1] [ 2] [ 3]]
 [[ 4] [ 5] [ 6]]
 [[ 7] [ 8] [ 9]]
 [[10] [11] [12]]
 [[13] [14] [15]]
 [[16] [17] [18]]
 [[19] [20] [21]]
 [[22] [23] [24]]
 [[25] [26] [27]]
 [[28] [29] [30]]
 [[31] [32] [33]]
 [[34] [35] [36]]
 [[37] [38] [39]]
 [[40] [41] [42]]
 [[43] [44] [45]]]

Now that we have converted the input data to the correct format, let’s create the output vector. As I said earlier, each element in the output will be equal to the sum of the values in the time step in the corresponding input sample. The following script creates the output vector:

Y = list()
for x in X:
    # each output is the sum of the three time-step values in the sample
    Y.append(x.sum())

Y = np.array(Y)
print(Y)

The output array Y looks like this:

[  6  15  24  33  42  51  60  69  78  87  96 105 114 123 132]

Solution via a simple LSTM

Now let’s create the model with an LSTM layer.

model = Sequential()
# 50 units assumed, matching earlier sections; the original layer
# definitions were elided
model.add(LSTM(50, activation='relu', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

The following script trains our model:

# training settings assumed to match the stacked version shown later
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

After training the model, we can use it to make predictions for test data points. Let's predict the output for the sequence 50, 51, 52. The actual output should be 50 + 51 + 52 = 153. The following script converts our test point into a 3-D shape and then predicts the output:


test_input = array([50, 51, 52])
test_input = test_input.reshape((1, 3, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

My output is 145.96, roughly 7 less than the actual value of 153.

Solution via stacked LSTMs

Now, let's create a complex LSTM model with multiple layers and see whether we can get better results. Execute the following script to create and train a model with multiple LSTM and dense layers:

model = Sequential()
# the layer stack was elided in the original; the layers below are assumptions
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(3, 1)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

Now let's test the model on the test sequence 50, 51, 52:

test_input = array([50, 51, 52])
test_input = test_input.reshape((1, 3, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

My answer here is 155.37, which is better than the 145.96 we got earlier; the difference from the actual value of 153 is only about 2.

Solution via a bidirectional LSTM

A bidirectional LSTM is an LSTM that learns from the input sequence in both the forward and backward directions; the final interpretation of the sequence combines both traversals. Let's see whether we can get better results using a bidirectional LSTM.
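One consequence of reading the sequence in both directions is that the layer's output concatenates the forward and backward hidden states, so a 50-unit bidirectional LSTM emits 100 values per sequence. A quick check, not part of the original script:

from keras.models import Sequential
from keras.layers import LSTM, Bidirectional

m = Sequential()
m.add(Bidirectional(LSTM(50), input_shape=(3, 1)))
print(m.output_shape)   # (None, 100): forward and backward states concatenated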

The following script creates a bidirectional LSTM model with a bidirectional layer and a dense layer as the output of the model.

from keras.layers import Bidirectional

model = Sequential()
# 50 units assumed, matching earlier sections
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

The following script trains the model and makes predictions based on test sequences 50, 51, and 52.


history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

test_input = array([50, 51, 52])
test_input = test_input.reshape((1, 3, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got 152.26, only a fraction below the actual value of 153. We can conclude that, for our data set, a single-layer bidirectional LSTM outperforms both the single-layer and the stacked unidirectional LSTMs.

Many-to-one sequence problems with multiple features

In this many-to-one sequence problem, each time step of the input contains multiple features. The output can be a single value or one value per feature of the input time step. We will cover both cases in this section.

Creating a data set

Our data set will contain 15 samples. Each sample will contain three time steps.

Let's create two lists. One will contain multiples of 3, from 3 up to 135, 45 elements in total. The second will contain multiples of 5, from 5 to 225, also 45 elements in total. The following script creates the two lists:

X1 = np.array([x+3 for x in range(0, 135, 3)])
# X2 reconstructed by analogy with X1; it matches the printed output below
X2 = np.array([x+5 for x in range(0, 225, 5)])

print(X1)
print(X2)

You can see the contents of the two lists in the following output:

[3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135]
[5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 205 210 215 220 225]

Each of the lists above represents one feature of the time samples. You can create the aggregated data set by joining the two lists as columns, as follows:

X = np.column_stack((X1, X2))
print(X)

The output shows the aggregated data set:

[[  3   5]
 [  6  10]
 [  9  15]
 [ 12  20]
 [ 15  25]
 [ 18  30]
 [ 21  35]
 [ 24  40]
 [ 27  45]
 [ 30  50]
 [ 33  55]
 [ 36  60]
 [ 39  65]
 [ 42  70]
 [ 45  75]
 [ 48  80]
 [ 51  85]
 [ 54  90]
 [ 57  95]
 [ 60 100]
 [ 63 105]
 [ 66 110]
 [ 69 115]
 [ 72 120]
 [ 75 125]
 [ 78 130]
 [ 81 135]
 [ 84 140]
 [ 87 145]
 [ 90 150]
 [ 93 155]
 [ 96 160]
 [ 99 165]
 [102 170]
 [105 175]
 [108 180]
 [111 185]
 [114 190]
 [117 195]
 [120 200]
 [123 205]
 [126 210]
 [129 215]
 [132 220]
 [135 225]]

We need to reshape the data into three dimensions so the LSTM can use it. Our data set has 45 rows and 2 columns; we reshape it into 15 samples, 3 time steps, and 2 features.

X = array(X).reshape(15, 3, 2)
print(X)

You can see 15 samples in the following output:

[[[  3   5] [  6  10] [  9  15]]
 [[ 12  20] [ 15  25] [ 18  30]]
 [[ 21  35] [ 24  40] [ 27  45]]
 [[ 30  50] [ 33  55] [ 36  60]]
 [[ 39  65] [ 42  70] [ 45  75]]
 [[ 48  80] [ 51  85] [ 54  90]]
 [[ 57  95] [ 60 100] [ 63 105]]
 [[ 66 110] [ 69 115] [ 72 120]]
 [[ 75 125] [ 78 130] [ 81 135]]
 [[ 84 140] [ 87 145] [ 90 150]]
 [[ 93 155] [ 96 160] [ 99 165]]
 [[102 170] [105 175] [108 180]]
 [[111 185] [114 190] [117 195]]
 [[120 200] [123 205] [126 210]]
 [[129 215] [132 220] [135 225]]]

The output will also have 15 values, one per input sample. Each value in the output will be the sum of the two feature values in the third time step of the corresponding input sample. For example, the third time step of the first sample has features 9 and 15, so the output will be 24. Similarly, the two feature values in the third time step of the second sample are 18 and 30, so the corresponding output will be 48, and so on.

The following script creates and displays the output vector:

Y = list()
for x in X:
    # sum the two feature values in the third (last) time step
    Y.append(x[2].sum())

Y = np.array(Y)
print(Y)

[ 24  48  72  96 120 144 168 192 216 240 264 288 312 336 360]

Now let’s solve the many-to-one sequence problem with simple, stacked, and bidirectional LSTM.

Solution via a simple LSTM

model = Sequential()
# the layer stack was elided in the original; a single 50-unit LSTM
# layer is assumed here
model.add(LSTM(50, activation='relu', input_shape=(3, 2)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

The model is trained. Next we create a test data point and use the model to predict its output.


# only the third time step (14, 61) is fixed by the text; the first two
# steps are assumptions following the in-sample step pattern (3 and 5)
test_input = array([[8, 51], [11, 56], [14, 61]])
test_input = test_input.reshape((1, 3, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The sum of the two features in the third time step of the input is 14 + 61 = 75. Our model with one LSTM layer predicted 73.41, which is pretty close.

Solution via stacked LSTMs

The following script trains a stacked LSTM and makes a prediction at the test point:

model = Sequential()
model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(3, 2)))
# the remaining layers were elided in the original; the layers below are assumptions
model.add(LSTM(100, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The output I received was 71.56, which is worse than the simple LSTM. It seems our stacked LSTM is overfitting.

Solution via a bidirectional LSTM

Here is the training script for a simple bidirectional LSTM, along with the code for predicting on the test data point:

from keras.layers import Bidirectional

model = Sequential()
# 50 units assumed; the original layer sizes were elided
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(3, 2)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The output is 76.82, quite close to 75. Again, the bidirectional LSTM seems to outperform the other architectures.

So far, we have predicted a single value based on feature values from multiple time steps. There is another many-to-one case, in which you predict one value for each feature of the time step. For example, the data set we use in this section has three time steps, each with two features, and we may wish to predict the next value of each feature series. The following example makes this clear. Suppose we have the input:

[[3 5] [6 10] [9 15]]

In the output, we want one time step with two features, as follows:

[12, 20]

You can see that the first value in the output is the continuation of the first series and the second value is the continuation of the second series. We can solve such problems by simply changing the number of neurons in the output dense layer to the number of feature values we want in the output. But first we need to update the output vector Y; the input vector stays the same:

Y = list()
for x in X:
    # next value of each series: the third time step plus the
    # step sizes (3 and 5); this reproduces the printed output below
    Y.append(x[2] + np.array([3, 5]))

Y = np.array(Y)
print(Y)

The above script creates an updated output vector and prints it, which looks like this:

[[ 12  20]
 [ 21  35]
 [ 30  50]
 [ 39  65]
 [ 48  80]
 [ 57  95]
 [ 66 110]
 [ 75 125]
 [ 84 140]
 [ 93 155]
 [102 170]
 [111 185]
 [120 200]
 [129 215]
 [138 230]]

Now, let's train our simple, stacked, and bidirectional LSTM networks on this data set. The following script trains a simple LSTM:

model = Sequential()
# layers elided in the original; note Dense(2) so the model emits two values
model.add(LSTM(100, activation='relu', input_shape=(3, 2)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

The next step is to test our model on a test data point. The following script creates the test data point and predicts its output:

# test point partly reconstructed from a garbled original; each feature
# series steps by 3 and 5 and ends at (26, 44)
test_input = array([[20, 34], [23, 39], [26, 44]])
test_input = test_input.reshape((1, 3, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The actual output is [29, 49], the next value of each input series. Our model predicts [29.089157, 48.469097], which is pretty close.

Now let's train a stacked LSTM and predict the output for the test data point:

model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(3, 2)))
# remaining layers elided in the original; the layers below are
# assumptions, with Dense(2) for the two-value output
model.add(LSTM(50, activation='relu'))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The output is [29.170143, 48.688267], again very close to the actual output.

Finally, we can train a bidirectional LSTM and make a prediction at the test point:

from keras.layers import Bidirectional

model = Sequential()
# layers elided in the original; 100 units assumed, Dense(2) for the
# two-value output
model.add(Bidirectional(LSTM(100, activation='relu'), input_shape=(3, 2)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The output is [29.2071, 48.737988].

Again, you can see that the bidirectional LSTM makes the most accurate prediction.

Conclusion

Simple neural networks are not well suited to sequence problems, because in sequence problems we need to keep track of previous inputs in addition to the current one. Neural networks with some kind of memory are better suited to solving such problems. LSTM is one such network.

  

