Before I wrote this article, I did some research online and found that someone had already written about using deep learning to predict the price of Bitcoin. So I decided to predict the price of Ethereum (also known as Ether) in addition to Bitcoin.

We’ll use a Long Short-Term Memory (LSTM) model, a type of deep learning model that works well with sequential data, that is, any data with a temporal, spatial, or structural order, such as movies, sentences, and so on. If you’re not familiar with this model, I recommend reading this blog post.

Because I also want to keep readers who don’t know much about machine learning interested, I’ll try not to dwell on large chunks of code. If you want to follow along, I’ve posted the data and code on GitHub, which you can download here.

Okay, without further ado, let’s get started!

Getting the data

Before we build the model, we need some data for it to learn from. There’s a dataset on Kaggle that details Bitcoin’s price per minute over the past few years, among other things (the same dataset was used in the Bitcoin prediction tutorial I mentioned earlier). But at a minute-by-minute timescale there’s a lot of noise, so we’ll use daily prices instead. This creates a problem: we get hundreds of rows rather than millions, and in deep learning no model can overcome a severe lack of data. I also don’t want to rely on static files, because they complicate the process of updating the model with new data in the future. Instead, we’ll pull data from cryptocurrency websites and APIs.

Since we’ll mix the prices of two cryptocurrencies in one model, it’s probably a good idea to pull the data from a single source. We’ll use the website http://coinmarketcap.com. For now we’re only looking at Bitcoin and Ethereum data, but it wouldn’t be hard to add other currencies this way. Before we import the data, we need to load some Python packages to make things easier.


import pandas as pd
import time
import seaborn as sns
import matplotlib.pyplot as plt
import datetime
import numpy as np

# get market info for bitcoin from April 2013 to the current day
bitcoin_market_info = pd.read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end="+time.strftime("%Y%m%d"))[0]
# convert the date string to the correct date format
bitcoin_market_info = bitcoin_market_info.assign(Date=pd.to_datetime(bitcoin_market_info['Date']))
# when Volume is equal to '-' convert it to 0
bitcoin_market_info.loc[bitcoin_market_info['Volume'] == "-", 'Volume'] = 0
# convert to int
bitcoin_market_info['Volume'] = bitcoin_market_info['Volume'].astype('int64')
# look at the first few rows
bitcoin_market_info.head()

What just happened? We loaded some Python packages and imported the price table from coinmarketcap.com. After a bit of data cleaning, we get the Bitcoin price table shown above. Swap bitcoin for ethereum in the URL to get the Ethereum price data.
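
For example, the same call with ethereum in the URL pulls the Ethereum table. A sketch mirroring the Bitcoin code above, including the same Volume cleaning (CoinMarketCap simply returns data from each coin’s launch, so the start parameter can stay the same):

# get market info for ethereum over the same period
eth_market_info = pd.read_html("https://coinmarketcap.com/currencies/ethereum/historical-data/?start=20130428&end="+time.strftime("%Y%m%d"))[0]
eth_market_info = eth_market_info.assign(Date=pd.to_datetime(eth_market_info['Date']))
# apply the same '-' to 0 fix to the Volume column
eth_market_info.loc[eth_market_info['Volume'] == "-", 'Volume'] = 0
eth_market_info['Volume'] = eth_market_info['Volume'].astype('int64')
eth_market_info.head()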

To check that the data looks right, we can plot the price and trading volume of the two currencies over time:
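
A minimal plotting sketch, assuming the two data frames from above and that the CoinMarketCap table includes a Close column:

# plot closing price and volume for both coins over time
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
ax1.plot(bitcoin_market_info['Date'], bitcoin_market_info['Close'], label='Bitcoin')
ax1.plot(eth_market_info['Date'], eth_market_info['Close'], label='Ethereum')
ax1.set_ylabel('Closing Price ($)')
ax1.legend()
ax2.bar(bitcoin_market_info['Date'], bitcoin_market_info['Volume'])
ax2.set_ylabel('Volume')
plt.show()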

Training, testing and random walks

We’ve got some data, so now we need a model. In deep learning, data is usually split into training and test sets: the model learns from the training set, and we then evaluate its performance on the test set. For time series models, we generally train on one period of time and test on another. I set the cutoff to June 1, 2017: the model is trained on data before that date and evaluated on data after it.
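
As a sketch, assuming the Bitcoin and Ethereum tables have been merged into a single data frame (market_info is an illustrative name here):

# split the merged data on the cutoff date
split_date = '2017-06-01'
training_set = market_info[market_info['Date'] < split_date]
test_set = market_info[market_info['Date'] >= split_date]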

As you can see, the training period mostly covers times when cryptocurrency prices were far lower. As a result, the training data may not be representative of the test data, which undermines the model’s ability to generalize to unseen data (you could try making the data more stationary). But why let that get in the way? Before turning to our machine learning model, it’s worth discussing a simpler one. The most basic model sets tomorrow’s price equal to today’s price (we’ll call this a lag model). We define the model mathematically as follows:
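
$$\text{PredPrice}_t = \text{ActualPrice}_{t-1}$$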

Extending this simple model slightly, stock prices (and, here, cryptocurrency prices) are often treated as a random walk, a process in which each new value is the previous value plus some random noise (the concept is closely related to Brownian motion). In other words, tomorrow’s price is today’s price nudged by a random daily return.
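
Formally, assuming daily returns are normally distributed:

$$\text{PredPrice}_t = \text{ActualPrice}_{t-1} \times (1 + \mu + \sigma\,\epsilon_t), \qquad \epsilon_t \sim N(0, 1)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the daily returns.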

We estimate μ and σ from the training set and apply the random walk model to the Bitcoin and Ethereum test sets.
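
A minimal sketch of the single-point random walk, where bt_train and bt_test are illustrative NumPy arrays of daily closing prices over the two periods:

# daily returns over the training period
returns = bt_train[1:] / bt_train[:-1] - 1
mu, sigma = returns.mean(), returns.std()

np.random.seed(202)
# single-point prediction: each step starts from the previous *actual* price
noise = np.random.normal(mu, sigma, len(bt_test) - 1)
predicted = bt_test[:-1] * (1 + noise)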

Wow! Look at these prediction lines. Apart from a few kinks, they basically track the actual price of each currency. The model even caught the spikes in mid-June and late August. However, as another blog post on Bitcoin price forecasting points out, models that predict only one point ahead are often misleading, because they don’t have to carry their errors forward. No matter how large the error, it is essentially reset at every time point, because the real price is fed back in as the model’s input. Bitcoin’s random walk is particularly deceptive because the scale of the y-axis is large, which makes the prediction line look smooth.

Unfortunately, single-point prediction is quite common when evaluating time series models. A better measure of model accuracy is multi-point prediction, in which errors in earlier forecasts are not reset but are compounded into subsequent forecasts. We define it mathematically as follows:
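
$$\text{PredPrice}_t = \text{PredPrice}_{t-1} \times (1 + \mu + \sigma\,\epsilon_t), \qquad \text{PredPrice}_0 = \text{ActualPrice}_0$$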

The model’s predictions are extremely sensitive to the random seed. For my Ethereum prediction, I chose a seed whose full-interval random walk looks almost plausible (figure below). You can play with the seed value in the accompanying Jupyter notebook to see how much worse things can get.

Note that the single-point random walk always looks quite accurate, even though there is nothing of substance behind it. So be skeptical of any blog post that claims to predict coin prices; anyone looking to buy cryptocurrency should not be fooled by market forecasts.

Long Short-Term Memory (LSTM)

As mentioned earlier, we’ll use a Long Short-Term Memory model. We don’t need to build it from scratch, as there are many open source deep learning frameworks available (TensorFlow, Keras, PyTorch, etc.). I chose Keras because I found it the most approachable for someone at my level. If you’re not familiar with Keras, check out this tutorial I wrote, or anyone else’s.

I created a new data frame called model_data. I removed some of the previous columns (opening price, daily highs and lows) and built some new ones. close_off_high represents the gap between the day’s closing price and its high, where values of -1 and 1 mean the close was equal to the daily low or the daily high, respectively. The volatility column is the difference between the high and low prices divided by the opening price. You may also notice that model_data is sorted from earliest to most recent. We don’t actually need the Date column anymore, because we won’t feed that information into the model.
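
A sketch of how those columns could be derived, assuming the two coins’ tables have been merged into market_info with illustrative 'bt_' and 'eth_' column prefixes (the scaling follows the definition above):

for coin in ['bt_', 'eth_']:
    # -1 when the close equals the daily low, +1 when it equals the daily high
    market_info[coin + 'close_off_high'] = 2 * (market_info[coin + 'Close'] - market_info[coin + 'Low']) / (market_info[coin + 'High'] - market_info[coin + 'Low']) - 1
    # daily range relative to the opening price
    market_info[coin + 'volatility'] = (market_info[coin + 'High'] - market_info[coin + 'Low']) / market_info[coin + 'Open']
model_cols = ['Date'] + [coin + m for coin in ['bt_', 'eth_'] for m in ['Close', 'Volume', 'close_off_high', 'volatility']]
# keep only the model columns, sorted from earliest to most recent
model_data = market_info[model_cols].sort_values(by='Date')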

model_data.head()

Our LSTM model will use previous data (for both Bitcoin and Ethereum) to predict the next day’s closing price of each coin. We must decide how many prior days of data the model gets to see. That’s somewhat arbitrary; I picked 10 days, because 10 is a nice round number. We build small data frames consisting of 10 consecutive days of data (called windows), so the first window consists of rows 0-9 of the training set (Python is zero-indexed), the second window consists of rows 1-10, and so on. Picking a small window size means we can feed more windows into the model; the downside is that the model may not have enough information to detect complex long-term behaviour (if such behaviour exists).
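
A sketch of the windowing step, assuming training_set is the pre-June-2017 slice of model_data with the Date column dropped:

window_len = 10
LSTM_training_inputs = []
# every 10-day slice of the training set becomes one input window
for i in range(len(training_set) - window_len):
    LSTM_training_inputs.append(training_set[i:(i + window_len)].copy())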

Deep learning models don’t cope well with wildly varying input values. Looking closely at the columns, some values range between -1 and 1, while others are on the scale of millions. We need to normalise the data so that our inputs are on a consistent scale; in general, you want values somewhere between -1 and 1. The close_off_high and volatility columns are fine as they are. For the remaining columns, like everyone else, we normalise the inputs to the first value in the window.
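
Concretely, each price and volume value becomes its relative change from the first value of its window. A sketch, continuing with the illustrative prefixed column names from above:

norm_cols = [coin + m for coin in ['bt_', 'eth_'] for m in ['Close', 'Volume']]
for window in LSTM_training_inputs:
    for col in norm_cols:
        # express each value as a change relative to the window's first value
        window.loc[:, col] = window[col] / window[col].iloc[0] - 1
# Keras expects a single 3D array: (num_windows, window_len, num_features)
LSTM_training_inputs = np.array([window.values for window in LSTM_training_inputs])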

The table above shows an example of the input to our LSTM model (we actually have hundreds of similar tables). We have normalised the columns so that their values equal 0 at the first time point, so our goal is to predict the price change relative to that point. We are now ready to build the LSTM model. Modelling with Keras is very quick, just a matter of putting a few pieces together. I have written a detailed tutorial for reference.

# import the relevant Keras modules
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.layers import LSTM
from keras.layers import Dropout

def build_model(inputs, output_size, neurons, activ_func="linear",
                dropout=0.25, loss="mae", optimizer="adam"):
    model = Sequential()
    model.add(LSTM(neurons, input_shape=(inputs.shape[1], inputs.shape[2])))
    model.add(Dropout(dropout))
    model.add(Dense(units=output_size))
    model.add(Activation(activ_func))
    model.compile(loss=loss, optimizer=optimizer)
    return model

So the build_model function constructs an empty model (model = Sequential()), then adds an LSTM layer. The LSTM layer is sized to fit our input (an n x m table, where n and m represent the number of time points/rows and columns). The function also includes some standard neural network features, such as dropout and an activation function. Now we just need to specify the number of neurons in the LSTM layer (I chose 20) and the data to train the model on.

# random seed for reproducibility
np.random.seed(202)
# initialise model architecture
eth_model = build_model(LSTM_training_inputs, output_size=1, neurons = 20)
# model output is next price normalised to 10th previous closing price
LSTM_training_outputs = (training_set['eth_Close'][window_len:].values/training_set['eth_Close'][:-window_len].values)-1
# train model on data
# note: eth_history contains information on the training error per epoch
eth_history = eth_model.fit(LSTM_training_inputs, LSTM_training_outputs, 
                            epochs=50, batch_size=1, verbose=2, shuffle=True)
#eth_preds = np.loadtxt('eth_preds.txt')
Epoch 50/50
6s - loss: 0.0625

We now have an LSTM model that can predict the next day’s closing price of Ethereum. Let’s see how it performs. We first test it on the training set (i.e., data before June 2017). The number above represents the model’s mean absolute error on the training set after the 50th and final training iteration (or epoch). Instead of looking at relative changes, we can convert the model’s output back into daily closing prices.
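
A sketch of turning the normalised output back into dollar prices (the slicing mirrors how LSTM_training_outputs was built above; the names are as defined earlier):

# the model predicts the relative change from the start of each window,
# so multiply by the price at the window's start to recover a dollar price
eth_train_preds = ((np.transpose(eth_model.predict(LSTM_training_inputs)) + 1) *
                   training_set['eth_Close'].values[:-window_len])[0]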

We shouldn’t be too surprised that the model is so accurate here: it gets to see the source of its error and correct itself at each step. In fact, it’s not hard to achieve almost zero training error; just add a few hundred neurons and train for a few thousand epochs. We are more interested in how the model performs on the test set, because that tells us how well it predicts genuinely new data.

Leaving aside the misleading nature of single-point forecasts, our LSTM model seems to perform well on the new data. Its most obvious flaw is the failure to detect the inevitable pullbacks that follow sudden price surges (e.g., mid-June and October). In fact, this failure persists throughout; it is just more pronounced at those peaks: the predicted price is basically the previous day’s price (e.g., the mid-July dip). In addition, the model appears to systematically overestimate the future price of Ethereum, since the prediction line sits consistently above the actual price line. I suspect this is because the training set covers a period when Ethereum’s price was skyrocketing, so the model expects that trend to continue. We can build an identical LSTM model to predict Bitcoin’s price on the Bitcoin test set, as shown in the figure below:

Click here to get the full code.

As I emphasized earlier, single-point forecasts can be misleading. Looking more closely, you’ll notice that the predictions regularly mirror the previous values (e.g., October). In places, our deep learning LSTM model has partially reproduced an autoregressive (AR) model of order p, in which future values are simply a weighted sum of the previous p values. We can define an AR model mathematically as follows:
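
$$\text{PredPrice}_t = \phi_0 + \sum_{i=1}^{p} \phi_i\,\text{Price}_{t-i} + \epsilon_t$$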

So the multi-point predictions are obviously less satisfying than their single-point counterparts. Still, I’m glad the model picked up on some subtle behaviours (e.g., the second line in the Ethereum chart). It doesn’t simply predict that prices will all move in the same direction, so there are some grounds for optimism.

Returning to the single-point predictions: our deep learning neural network looks good, but so did the boring random walk model. Like the random walk model, the LSTM model is very sensitive to the choice of random seed (the model’s weights are randomly initialised). So, to compare the two models fairly, we can run each one many times, say 25, to get an estimate of the model error, calculated as the absolute difference between the actual and predicted closing prices on the test set.
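
A sketch of that comparison loop for the LSTM side, where LSTM_test_inputs and LSTM_test_outputs are assumed to be built the same way as their training counterparts:

lstm_errors = []
for seed in range(25):
    np.random.seed(seed)
    # rebuild and retrain from fresh random weights each time
    temp_model = build_model(LSTM_training_inputs, output_size=1, neurons=20)
    temp_model.fit(LSTM_training_inputs, LSTM_training_outputs,
                   epochs=50, batch_size=1, verbose=0, shuffle=True)
    preds = temp_model.predict(LSTM_test_inputs)[:, 0]
    # mean absolute error on the test set's normalised price changes
    lstm_errors.append(np.mean(np.abs(preds - LSTM_test_outputs)))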

Maybe AI deserves some praise after all! These charts show the test-set errors after 25 runs of each model. The LSTM’s prediction error was 4% for Bitcoin and 5% for Ethereum, crushing the corresponding random walk models.

That said, beating the random walk model is a low bar. We could compare the LSTM model against more sophisticated time series models, such as autoregressive models, ARIMA, or weighted-average approaches. I’ll leave that task for later, or you can try it yourself. And again, I hope you’ll remain skeptical whenever deep learning is used to predict cryptocurrency prices, because the technique is far from perfect.

Warning: This article should not be taken as investment advice, and I hope you won’t rely on this method to invest in cryptocurrencies lightly. It simply shares my own thoughts on using deep learning to predict cryptocurrency prices. Markets carry risk and prices go up as well as down, so invest cautiously and only what you can afford.

Conclusion

We took some cryptocurrency data and fed it into a really cool Long Short-Term Memory model. Unfortunately, the model’s predictions tended to simply repeat the previous values. How can we improve the model?

  • Change the loss function: mean absolute error doesn’t really encourage risk-taking. With mean squared error, for instance, the LSTM model would be forced to treat detecting peaks as more important. A loss function better suited to the task could also make the model less conservative (a sketch follows this list).

  • Penalise conservative, AR-like behaviour: this would push the deep learning algorithm to explore more interesting or risky models. This step is easier said than done!

  • Get more and better data: even if past prices alone are a decent predictor of future prices, we can add other features that carry predictive power. That way the LSTM model wouldn’t have to rely so heavily on past prices and might be able to unlock more complex behaviours. This is probably the most rewarding but also the hardest part.
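
For instance, the first suggestion is a one-argument change using the build_model function from earlier (a sketch; 'mse' is Keras’s built-in mean squared error loss):

# same architecture, but penalise large errors quadratically
eth_model_mse = build_model(LSTM_training_inputs, output_size=1,
                            neurons=20, loss="mse")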

If you want to create an LSTM model yourself from scratch, you can click here to get all the Python code.

Thanks for reading!