• A Guide For Time Series Prediction Using Recurrent Neural Networks (LSTMs)
  • Original author: Neelabh Pant
  • Translated from: the Gold (Juejin) Translation Project
  • Permalink to this article: github.com/xitu/gold-m…
  • Translator: haiyang-tju
  • Proofreaders: TrWestdoor, yzrds

Using Long Short-Term Memory networks (LSTMs) to predict future changes in currency exchange rates

The Statsbot team has previously published an article on using time series analysis for anomaly detection. Today, we discuss predicting time series with Long Short-Term Memory models (LSTMs). We asked data scientist Neelabh Pant to tell you about his experience using recurrent neural networks to predict currency movements.

As an Indian living in the United States, there is a constant flow of money between my family and me. If the dollar strengthens in the market, the Indian rupee (INR) falls, so buying a dollar costs more rupees; if the dollar weakens, the same dollar costs fewer rupees.

If you can predict what a dollar will be worth tomorrow, that can guide your decisions, which matters a great deal for minimizing risk and maximizing return. Seeing the strengths of neural networks, particularly recurrent neural networks, I came up with the idea of predicting the exchange rate between the US dollar and the Indian rupee.

There are many ways to predict exchange rates. For example:

  • Purchasing power parity (PPP), which takes inflation into account and predicts the exchange rate from the inflation differential between the two countries.
  • The relative economic strength approach, which predicts exchange rate movements by taking into account the economic growth of each country.
  • Econometric models are another common exchange rate forecasting technique that can be customized according to factors or attributes that the forecaster considers important. Such factors or attributes may be characteristics such as differences in interest rates between different countries, growth rates of GDP and growth rates of income.
  • Time series models predict future exchange rate prices purely based on past behavior and price patterns.

In this article, we will show you how to use machine learning to perform time series analysis to predict future exchange rate changes.

Sequence problem

Let’s start with the sequence problem. The simplest machine learning problem involving a sequence is the one-to-one problem.

One to one

In this case, the model receives a single input datum or tensor and produces a prediction from that input alone. Linear regression, classification, and image classification with convolutional networks all fall into this category. The setup can be extended so that the model makes use of past values of the input and the output.

This is a one-to-many problem. A one-to-many problem starts just like a one-to-one problem, where the model has an input and generates an output. However, the output of the model is now fed back to the model as new input. The model can now generate a new output, and we can continue this loop indefinitely. Now you can see why these are called recurrent neural networks.

One to many

Recurrent neural networks are used for sequence problems because the connections between their units form directed cycles. In other words, they can keep state from one iteration to the next by feeding their own output back in as the input to the next step. In programming terms, this is like running a fixed program with certain inputs and some internal variables. If we unroll it in time, the simplest recurrent neural network can be viewed as a fully connected neural network.

RNN unrolled in time

In this univariate case, only two weights are involved. The weight u multiplies the current input x_t, and the other weight w multiplies the previous output y_{t-1}. This formula resembles the exponentially weighted moving average (EWMA), which combines past output values with the current input value.
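Written out, the update this describes looks roughly as follows (a sketch with the activation f left generic, e.g. tanh; the exact form is in the article's figure):

```latex
y_t = f\left(u\,x_t + w\,y_{t-1}\right)
```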

A deep recurrent neural network can be built simply by stacking such recurrent cells. A simple recurrent neural network only works with short-term memory; when long-term memory is needed, it turns out to be fundamentally deficient.

Long Short-Term Memory networks

As discussed above, the fundamental problem with a simple recurrent network is that it cannot capture long-term dependencies in a sequence. This becomes a problem when RNNs are built to analyze text and answer questions, which involves keeping track of long sequences of words.

In the late 1990s, the LSTM was proposed by Sepp Hochreiter and Jürgen Schmidhuber as a solution; LSTMs are insensitive to gap length, which gives them an advantage over plain RNNs, hidden Markov models, and many other sequence learning methods in a wide range of applications.

LSTM network structure

The model is organized as cells, each of which contains several basic operations. An LSTM has an internal state variable that is passed from one cell to the next and modified by operation gates.

  1. Forget gate

A sigmoid layer takes the output at the previous time step t-1 and the current input at time step t, concatenates them into a single tensor, and applies a linear transformation. After the sigmoid activation, the output of the forget gate is a value between 0 and 1. This number is multiplied by the internal state, which is why the gate is called a forget gate: if f_t = 0, the previous internal state is completely forgotten; if f_t = 1, it is passed through unchanged.
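In the standard notation (a sketch, not copied from the article's figure), where [h_{t-1}, x_t] is the concatenation of the previous output and the current input:

```latex
f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)
```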

  2. Input gate

The input gate takes the previous output and the new input and passes them through another sigmoid layer. Like the forget gate, it returns a value between 0 and 1. The value of the input gate is then multiplied by the output of the candidate layer.

The candidate layer applies a hyperbolic tangent to the mix of the input and the previous output, returning a candidate vector to be added to the internal state.
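In the same sketch notation, the input gate and the candidate vector are:

```latex
i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)
```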

The internal state is updated with the following rule:

The previous state is multiplied by the output of the forget gate, and the new candidate values, scaled by the input gate, are then added in.
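As an equation (same sketch notation, with element-wise multiplication):

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```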

  3. Output gate

The output gate controls how much of the internal state is passed on to the output, and it works in a similar way to the other gates.
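In the same sketch notation, the output gate and the new output are:

```latex
o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad
h_t = o_t \odot \tanh(C_t)
```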

The three gates described above have independent weights and biases, so the network learns how much of the past output to keep, how much of the current input to keep, and how much of the internal state to pass on to the output.
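Putting the gates together, here is a minimal NumPy sketch of a single LSTM step (the weight layout, names, and shapes are illustrative assumptions, not the article's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the concatenated [h_prev, x_t] to the four gate pre-activations."""
    z = np.concatenate([h_prev, x_t])
    n = h_prev.shape[0]
    gates = W @ z + b                      # shape (4 * n,)
    f = sigmoid(gates[0:n])                # forget gate
    i = sigmoid(gates[n:2 * n])            # input gate
    c_tilde = np.tanh(gates[2 * n:3 * n])  # candidate values
    o = sigmoid(gates[3 * n:4 * n])        # output gate
    c = f * c_prev + i * c_tilde           # update the internal state
    h = o * np.tanh(c)                     # new output
    return h, c
```

Each gate has its own block of weights and biases inside W and b, matching the point above that the gates are learned independently.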

In a recurrent neural network, you feed in not only the current input data but also the network's state from the previous time step. For example, if I say, "Hey! Something crazy happened to me when I was driving," then part of your brain flips a switch and says, "Oh, this is a story Neelabh is telling me. The main character is Neelabh, and something happened on the road." Now you hold on to some of the information I just gave you, and as you listen to the following sentences you have to retain part of the earlier ones in order to understand the whole story.

Another example is video processing with a recurrent neural network: what happens in the current frame depends heavily on what happened in the previous frame. Over time, a recurrent neural network learns what to keep, how much to keep from the past, and how much to keep from the current state, which makes it more powerful than a simple feed-forward neural network.

Time series prediction

I was so impressed by the advantages of the recurrent neural network that I decided to use it to predict the exchange rate between the US dollar and the Indian rupee. The data set used in this project is exchange rate data from January 2, 1980 to August 10, 2017. Later, I’ll provide a link to download this data set and experiment with it.

Table 1: A sample of the data set

The data set shows the value of one dollar in rupees. From January 2, 1980 to August 10, 2017, we have 13,730 records.

USD to INR

The price of one dollar in rupees has risen over the whole period. As we can see, the US economy declined significantly in 2007 and 2008, largely because of the Great Recession of that period. From the late 2000s to the early 2010s, markets around the world generally experienced a downturn.

This has not been a good period for the world’s advanced economies, especially North America and Europe (including Russia), which have fallen into deep recessions. Many of the newer advanced economies have been much less affected, particularly China and India, which have seen huge growth over the period.

Training-test data division

Now, to train the model, we need to split the data into a test set and a training set. When working with a time series, it is important to split it into training and test sets by a specific date, because we do not want the test data to come before the training data in time.

In our experiment, we define a date, January 1, 2010, as the split date. The training data runs from January 2, 1980 to December 31, 2009, roughly 11,000 training data points.

The test data set was about 2,700 data points between January 1, 2010 and August 10, 2017.

Test–training data split

The next thing to do is normalize the data set. We fit and transform only the training data and then just transform the test data; the reason is that we pretend we do not know the scale of the test data.

Normalizing or transforming the data means that the new scaled variable lies between 0 and 1.
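A minimal sketch of this split-then-scale step with scikit-learn's MinMaxScaler (the file name and column names are hypothetical, not the article's actual code):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical CSV with a "date" and a "rate" (USD/INR) column
df = pd.read_csv("usd_inr.csv", parse_dates=["date"])

train = df[df["date"] <= "2009-12-31"]["rate"].values.reshape(-1, 1)
test = df[df["date"] >= "2010-01-01"]["rate"].values.reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # fit only on the training data
test_scaled = scaler.transform(test)        # reuse the training scale for the test data
```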

Neural network model

The fully connected model is a simple neural network, built here as a single-input, single-output regression model. It basically takes the previous day's price and predicts the next day's price.

We use mean squared error as the loss function and stochastic gradient descent as the optimizer, which converges to a reasonably good local optimum after enough training epochs. Below is a summary of the fully connected layer.

Summary of the fully connected layer
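A rough Keras sketch of such a single-input, single-output dense regression model (the hidden layer size is an assumption; the article's actual layer summary is the figure above):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(8, activation="relu", input_dim=1),  # previous day's (scaled) price as the only input
    Dense(1),                                  # next day's (scaled) price
])
model.compile(loss="mean_squared_error", optimizer="sgd")
```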

After training the model for 200 epochs, or until early stopping is triggered by the early_stopping callback (whichever comes first), the model tries to learn the pattern and behavior of the data. Since we have split the data into training and test sets, we can now predict values for the test data and compare them with the true values.
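Training might then look roughly like this, building on the scaled data and model from the sketches above (the patience value and validation split are assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Previous day's scaled price -> next day's scaled price
x_train = train_scaled[:-1]
y_train = train_scaled[1:]

early_stopping = EarlyStopping(monitor="val_loss", patience=10)
model.fit(x_train, y_train, epochs=200,
          validation_split=0.1, callbacks=[early_stopping])
```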

True value (blue) vs. Predicted Value (orange)

As you can see, the model does not perform well. It essentially repeats the previous values with a slight shift. The fully connected model is not able to predict the future from a single previous value. Now let's try a recurrent neural network and see how well it does.

Long Short-Term Memory

The recurrent model we use is a single-layer sequential model with six LSTM nodes in the layer, to which we give an input of shape (1, 1): the network takes a single input.

Summary of LSTM model

The last layer is a dense layer (that is, a fully connected layer); the loss function is mean squared error, and the optimizer is stochastic gradient descent. We train the model for 200 epochs with the early_stopping callback. A summary of the model is shown in the figure above.
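A rough Keras sketch of this model, reusing x_train and y_train from the earlier sketches (the exact arguments are assumptions; note that LSTM layers expect 3-D input of shape (samples, timesteps, features)):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping

lstm_model = Sequential([
    LSTM(6, input_shape=(1, 1)),  # six LSTM nodes; one time step, one feature
    Dense(1),                     # fully connected output layer
])
lstm_model.compile(loss="mean_squared_error", optimizer="sgd")

# Reshape the 2-D (samples, 1) inputs into 3-D (samples, timesteps=1, features=1)
x_train_lstm = x_train.reshape(-1, 1, 1)
lstm_model.fit(x_train_lstm, y_train, epochs=200, validation_split=0.1,
               callbacks=[EarlyStopping(monitor="val_loss", patience=10)])
```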

LSTM prediction

This model has learned to reproduce the yearly shape of the data, and it does not have the lag shown by the simple feed-forward network used earlier. It still underestimates some observations, though, so the model certainly has room for improvement.

Modifying the model

There are many changes that could make this model better. One can always try to change the training configuration directly, for example by modifying the optimizer. Another important change is the sliding time window method, which comes from the field of streaming data management systems.

This approach comes from the idea that only the most recent data matter. One can take a year of model data and try to predict the first day of the next year. The sliding time window approach is very useful for capturing important patterns in data sets that depend strongly on a large body of past observations.
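A minimal sketch of how such windows could be built from the scaled training series used above (the window length of 365 is an assumption):

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (past `window` values -> next value) pairs."""
    x, y = [], []
    for i in range(len(series) - window):
        x.append(series[i:i + window])  # the most recent `window` observations
        y.append(series[i + window])    # the value to predict
    return np.array(x), np.array(y)

# e.g. use roughly one year of daily rates to predict the next day
x_win, y_win = make_windows(train_scaled.ravel(), window=365)
x_win = x_win[..., np.newaxis]          # (samples, timesteps, features) for an LSTM
```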

You can try making changes to the model to your liking and see how the model reacts to those changes.

The data set

This data set is available in my Github account repository, Deep Learning in Python. Feel free to download and use it.

Useful resources

I personally follow some of my favorite data scientists, such as Kirill Eremenko, Jose Portilla, and Dan Van Boxel (better known as Dan Does Data). Most of them can be found on various blog sites, where they write about many different topics, such as RNNs, convolutional neural networks, and LSTMs, and even the latest Neural Turing Machines.

Keep up to date with the various AI conferences. By the way, if you're interested, Kirill Eremenko is coming to San Diego this November to talk with his team about machine learning, neural networks, and data science.

Conclusion

The LSTM model is powerful enough to learn the most important past behaviors and to understand whether those past behaviors are important features for future predictions. LSTMs are used in a wide range of applications, including speech recognition, music composition, and handwriting recognition, and even in my current research on population mobility and travel prediction.

In my opinion, LSTM is like a model that has its own memory and can act like an intelligent person when making decisions.

Thanks again and have fun in the process of machine learning!
