Source: Medium

Compiled by Heart of the Machine

Contributors: Jiang Siyuan, Li Yazhou, Liu Xiaokun

The STATWORX team recently scraped S&P 500 data from the Google Finance API, covering both the index and the prices of its component stocks. With this data, they set out to forecast the S&P 500 index from the prices of the 500 component stocks using a deep learning model. The data set is interesting, but the article only uses a fully connected network with four hidden layers to make predictions; readers can also download the data and try a recurrent neural network, which may be better suited to this kind of sequential data.

This article is ideal for beginners learning how to build a basic neural network with TensorFlow. It gives a complete walkthrough of the concepts and modules involved in building a TensorFlow model. Since the data set can be downloaded directly, readers with some background can also try to model this kind of sequential data with a more powerful recurrent neural network.

Data set address: http://files.statworx.com/sp500.zip

Import and preprocess data

The STATWORX team scraped the stock data from the server and saved it as a CSV file. The data set contains n = 41,266 minute-level records of 500 stocks and the S&P 500 index, covering the period from April to August 2017.


     
# Import data
import pandas as pd

data = pd.read_csv('data_stocks.csv')

# Dimensions of dataset
n = data.shape[0]
p = data.shape[1]

The data set has already been cleaned and preprocessed: missing stock and index values were filled using LOCF (last observation carried forward), so the data set contains no missing values.
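For illustration only, if the raw data did contain gaps, a LOCF-style forward fill could be applied with pandas roughly as follows (a sketch; the downloadable data set is already complete, so this step is not needed here):

# Sketch: LOCF / forward fill with pandas (not required for the cleaned data set)
data = data.fillna(method='ffill')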

We can plot the S&P 500 time series with pyplot, for example with plt.plot(data['SP500']).

Time series plot of S&P 500 stock index
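As a minimal sketch of that plot (assuming matplotlib is installed and the index column in the CSV is named SP500):

# Plot the S&P 500 time series (illustrative sketch)
import matplotlib.pyplot as plt
plt.plot(data['SP500'])
plt.show()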

Prepare training and test data

The data set needs to be split into training and test data, with the training data containing 80% of the records. The data is not shuffled but sliced sequentially: the training data runs from April 2017 to roughly the end of July 2017, and the test data covers the remaining records up to the end of August 2017.


     
# Training and test data
import numpy as np

# Drop the date column and convert the DataFrame to a NumPy array
# so that rows can be sliced by integer index below
data = data.drop(['DATE'], axis=1)
data = data.values

train_start = 0
train_end = int(np.floor(0.8 * n))
test_start = train_end + 1
test_end = n
data_train = data[np.arange(train_start, train_end), :]
data_test = data[np.arange(test_start, test_end), :]

There are many different ways to cross-validate time series, such as rolling forecasts with or without refitting, or more elaborate schemes such as time series bootstrap resampling. The latter repeatedly resamples from a seasonal decomposition of the time series in order to simulate series that follow the same seasonal pattern as the original, rather than simply copying its values.
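As an illustration of the first idea, a rolling-origin (walk-forward) split could be sketched with scikit-learn's TimeSeriesSplit; this is not part of the original article, just a hint at what such a scheme looks like:

# Sketch: walk-forward validation splits on the training period (illustrative only)
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, valid_idx) in enumerate(tscv.split(data_train)):
    # each fold trains on an expanding window and validates on the block that follows it
    fold_train = data_train[train_idx]
    fold_valid = data_train[valid_idx]
    print('Fold', fold, '- train:', len(fold_train), 'validation:', len(fold_valid))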

Data scaling

Most neural network architectures benefit from scaled data because the activation functions of most neurons, such as tanh and sigmoid, are defined on the interval [-1, 1] or [0, 1]. Nowadays the rectified linear unit (ReLU) is the most commonly used activation function; it is bounded below but not above. In any case, we should rescale both the inputs and the targets, which also helps the gradient descent algorithm. Scaling is easy to implement with scikit-learn's MinMaxScaler.


     
# Scale data
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)

# Build X and y
X_train = data_train[:, 1:]
y_train = data_train[:, 0]
X_test = data_test[:, 1:]
y_test = data_test[:, 0]

Note that we must be careful about when to scale which part of the data. A common mistake is to scale the whole data set before splitting it into training and test sets. This matters because scaling involves computing statistics, such as a variable's minimum and maximum. In the real world we have no future observations, so the statistics must be computed on the training data alone and then applied to the test data. Otherwise we leak future information into the model, which tends to bias the evaluation metrics optimistically.

A brief introduction to TensorFlow

TensorFlow is an excellent framework and currently the most widely used library for deep learning and neural networks. It is built on a C++ backend but is usually controlled from Python. TensorFlow represents the algorithms and operations we design as a static computational graph: the user specifies operations as nodes in the graph and passes data through it as tensors, which enables efficient algorithm design. Since a neural network is essentially a computational graph of data and mathematical operations, TensorFlow is a natural fit for neural networks and deep learning.

TensorFlow is an open-source software library for numerical computation using data flow graphs. "Tensor" refers to the data, which is passed around as tensors (multidimensional arrays), and "Flow" refers to the computation carried out on the graph. A data flow graph describes mathematical operations with a directed graph of nodes and edges. Nodes usually represent mathematical operations, but they can also represent data inputs, result outputs, or reads/writes of persistent variables. Edges describe the input/output relationships between nodes and carry dynamically sized multidimensional arrays, i.e. tensors.

A simple graph that performs addition

In the figure above, two values stored in the variables a and b, each a zero-dimensional tensor (a scalar), are to be added. The two values flow through the graph and are added when they reach the addition node; the result is stored in the variable c. In fact, a, b and c can be treated as placeholders: any values fed into a and b are added and stored in c. This is the basic principle of TensorFlow: the user defines an abstract representation of the model through placeholders and variables, then fills the placeholders with actual data to trigger the actual computation. The following code implements the simple computational graph shown above:


     
# Import TensorFlow
import tensorflow as tf

# Define a and b as placeholders
a = tf.placeholder(dtype=tf.int8)
b = tf.placeholder(dtype=tf.int8)

# Define the addition
c = tf.add(a, b)

# Initialize the graph (create a session)
graph = tf.Session()

# Run the graph
graph.run(c, feed_dict={a: 5, b: 4})

As above, after importing the TensorFlow library, we use tf.placeholder() to define two placeholders that will later hold the tensors a and b, then define the addition operation and execute the graph to obtain the result.

Placeholders

As mentioned earlier, it all starts with placeholders. To fit the model we define two placeholders: X contains the network's inputs (the prices of all S&P 500 constituent stocks at time T = t) and Y contains the network's output (the S&P 500 index value at time T = t + 1).

Therefore, the shape of the input placeholder is [None, n_stocks] and the shape of the output placeholder is [None]; they correspond to a two-dimensional and a one-dimensional tensor, respectively. Understanding the dimensions of the input and output tensors is important for constructing the whole network correctly.


     
# Number of input features (the 500 constituent stocks)
n_stocks = X_train.shape[1]

# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_stocks])
Y = tf.placeholder(dtype=tf.float32, shape=[None])

The None in the code above means that we do not yet know how many observations will flow through the network in each batch, so it keeps that dimension flexible. We will later define batch_size, which controls the number of observations per training batch.

Variables

Besides placeholders, variables are the other fundamental way TensorFlow represents data and operations. While placeholders are used to feed input and output data into the computational graph, variables are flexible containers inside the graph that can be modified during execution. The weights and biases of a neural network are typically defined as variables so that they can be adjusted during training. Variables need to be initialized before training, which is explained in more detail later.

The model consists of four hidden layers. The first contains 1024 neurons, and each of the next three layers halves that number, i.e. 512, 256 and 128 neurons; each successive layer compresses the features extracted by the previous one. Of course, other architectures and neuron configurations could model the data better: convolutional networks, for example, suit image data and recurrent networks suit sequential data. But since this article aims to give a brief introduction to processing time-series data with a fully connected network, more complex architectures are not discussed here.


     
# Model architecture parameters
n_stocks = 500
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
n_target = 1

# Layer 1: Variables for hidden weights and biases
W_hidden_1 = tf.Variable(weight_initializer([n_stocks, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
# Layer 2: Variables for hidden weights and biases
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
# Layer 3: Variables for hidden weights and biases
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
# Layer 4: Variables for hidden weights and biases
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))

# Output layer: Variables for output weights and biases
W_out = tf.Variable(weight_initializer([n_neurons_4, n_target]))
bias_out = tf.Variable(bias_initializer([n_target]))

Understanding how the variable dimensions change between the input, hidden and output layers is important for understanding the whole network. As a rule of thumb for multilayer perceptrons, the first dimension of a layer's weight matrix must equal the second dimension of the previous layer's weight matrix. This may sound complicated, but it simply means that the output of each layer is passed as input to the next layer. The dimension of a bias vector equals the second dimension of that layer's weight matrix, which is also the number of neurons in the layer.
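To make this concrete, the shapes of the variables defined above chain together as follows (a quick illustrative check, not part of the original code):

# The second dimension of each weight matrix equals the first dimension of the next one:
# W_hidden_1: [500, 1024] -> W_hidden_2: [1024, 512] -> W_hidden_3: [512, 256]
# -> W_hidden_4: [256, 128] -> W_out: [128, 1]
print(W_hidden_1.shape)     # 500 x 1024
print(W_hidden_2.shape)     # 1024 x 512
print(bias_hidden_2.shape)  # 512 -- one bias per neuron in the second hidden layer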

Design the architecture of the neural network

After defining the weight matrices and bias vectors required by the network, we need to specify its topology, i.e. the network architecture. The placeholders (data) and variables (weights and biases) are combined into a system of consecutive matrix multiplications.

In addition, every neuron in the hidden layers needs an activation function to perform a nonlinear transformation. Activation functions are an important part of the architecture because they introduce nonlinearity into the system. There are many activation functions available; one of the most common is the rectified linear unit (ReLU), which is also used in this model.


     
# Hidden layers
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))

# Output layer (must be transposed)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))

The diagram below shows the network architecture constructed in this article. The model consists of three major building blocks: the input layer, the hidden layers, and the output layer. This architecture is called a feedforward network, or fully connected network. Feedforward means that each batch of data flows only from left to right; other architectures, such as recurrent neural networks, also allow data to flow backward through the network.

The core architecture of feedforward networks

Loss function

The loss function of the network measures the deviation between the network's predictions and the actually observed training targets. For regression problems the mean squared error (MSE) is the most common choice; it computes the average squared deviation between predictions and targets.
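In formula form, for n samples with prediction ŷᵢ and observed target yᵢ:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2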


     
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))

In addition, the MSE has properties that are advantageous for the general optimization problem to be solved.

The optimizer

The optimizer performs the computations needed to adapt the network's weight and bias variables during training. These computations invoke gradient calculations that indicate the direction in which the weights and biases must be changed to minimize the network's cost function. The development of stable and fast optimizers remains a major topic of research in neural networks and deep learning.


     
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)

The code above uses the Adam optimizer, one of the default choices in deep learning today. Adam stands for adaptive moment estimation and can be regarded as a combination of two other popular optimizers, AdaGrad and RMSProp.
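If needed, the learning rate can also be set explicitly. As a small sketch (0.001 is TensorFlow's default learning rate for Adam):

# Sketch: Adam with an explicit learning rate instead of the default
opt = tf.train.AdamOptimizer(learning_rate=0.001).minimize(mse)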

Initializers

Initializers are used to initialize the network's variables before training. Because neural networks are trained with numerical optimization techniques, the starting point of the optimization problem is one of the key factors in finding a good solution. TensorFlow provides different initializers, each with a different initialization strategy. In this article, I use tf.variance_scaling_initializer(), one of the default initialization strategies.


     
# Initializers
sigma = 1  # scale factor for the weight initializer; the value 1 is assumed here (it matches the initializer's default scale)
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg", distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()

Note that TensorFlow's computational graph allows you to define different initialization functions for different variables. In most cases, however, a single uniform initialization is sufficient.
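As a small sketch of that flexibility (illustrative only; the initializer and variable names below are not part of the model above):

# Sketch: assigning a different initializer to a particular variable
alt_initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
W_extra = tf.Variable(alt_initializer([n_neurons_4, n_target]))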

Fitting the neural network

Once the network's placeholders, variables, initializers, cost function and optimizer are defined, the model can be trained, usually with minibatch training. During minibatch training, n = batch_size samples are drawn at random from the training data and fed into the network, so the training set is divided into n / batch_size batches that are fed to the network in sequence. At this point the placeholders X and Y come into play: they hold the input and target data and are presented to the network as inputs and targets.

A batch of data X flows forward through the network until it reaches the output layer. There, TensorFlow compares the model's predictions for the current batch with the actually observed targets Y. TensorFlow then performs an optimization step, updating the network's parameters according to the chosen learning scheme. After the weights and biases have been updated, the next batch is sampled and the process repeats. This continues until all batches have been presented to the network, which completes one epoch.

Training stops once the maximum number of epochs is reached or another user-defined stopping criterion is met.


     
import matplotlib.pyplot as plt

# Make Session
net = tf.Session()
# Run initializer
net.run(tf.global_variables_initializer())

# Setup interactive plot
plt.ion()
fig = plt.figure()
ax1 = fig.add_subplot(111)
line1, = ax1.plot(y_test)
line2, = ax1.plot(y_test * 0.5)
plt.show()

# Number of epochs and batch size
epochs = 10
batch_size = 256

for e in range(epochs):
    # Shuffle training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]

    # Minibatch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})

        # Show progress
        if np.mod(i, 5) == 0:
            # Prediction on the test set
            pred = net.run(out, feed_dict={X: X_test})
            line2.set_ydata(pred)
            plt.title('Epoch ' + str(e) + ', Batch ' + str(i))
            file_name = 'img/epoch_' + str(e) + '_batch_' + str(i) + '.jpg'
            plt.savefig(file_name)
            plt.pause(0.01)

During training, we evaluate the network's predictions on the test set (data the network has not learned from) once every 5 batches and visualize the result. These images are also exported to disk and later combined into a video animation of the training process. The model quickly learns the position and shape of the time series in the test data and produces fairly accurate predictions after a few epochs. That's great!

As you can see, the network quickly adapts to the basic shape of the time series and continues to learn its finer patterns. This is also thanks to the Adam learning scheme, which lowers the learning rate over the course of training so as not to overshoot the minimum. After 10 epochs we have a good fit to the test data: the final test MSE is 0.00078, which is very low because the targets are scaled. The mean absolute percentage error of the test-set forecasts is 5.31%, which is a fairly good result.
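The final test MSE mentioned above can be computed with the mse operation already defined in the graph, roughly as follows (a sketch using the variable names from the listings above):

# Sketch: evaluate the trained network on the test set
mse_final = net.run(mse, feed_dict={X: X_test, Y: y_test})
print('Final test MSE:', mse_final)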

Scatter plot of forecast and actual S&P prices (scaled)

Note that there are many ways to further improve this result: a different design of layers and neurons, different initialization and activation schemes, the introduction of dropout layers, early stopping, and so on. In addition, different types of deep learning models, such as recurrent neural networks, might achieve better results on this task. However, that is beyond the scope of this article.
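For instance, a dropout layer could be added after a hidden layer roughly like this (a sketch using the TensorFlow 1.x API; it is not part of the original model, and keep_prob would typically be fed as about 0.9 during training and 1.0 during evaluation):

# Sketch: dropout after the first hidden layer
keep_prob = tf.placeholder(dtype=tf.float32)
hidden_1_drop = tf.nn.dropout(hidden_1, keep_prob=keep_prob)
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1_drop, W_hidden_2), bias_hidden_2))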

Conclusions and Prospects

The release of TensorFlow was a milestone for deep learning research. Its high flexibility and strong performance allow researchers to develop all kinds of complex neural network architectures and other machine learning algorithms. However, that flexibility comes at the cost of longer model-building time compared with high-level APIs such as Keras or MxNet. Nevertheless, I believe TensorFlow will continue to evolve into a de facto standard for research and practical applications of neural networks and deep learning. Many of our customers already use TensorFlow or are developing projects that apply TensorFlow models. Our data science consultants at STATWORX (https://www.statworx.com/de/data-science/) basically use TensorFlow for research and development of deep learning models and neural networks.

What are Google’s future plans for TensorFlow? At least in my opinion, TensorFlow lacks a clean graphical user interface for designing and developing neural network architectures on the TensorFlow back end. Maybe this is one of Google’s future goals 🙂

Original article: https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505541d877
