Abstract: This article shares how to implement regression prediction with an LSTM recurrent neural network (RNN).

Recurrent Neural Network LSTM RNN Regression Case: sin Curve Prediction, by Eastmount

I. Review of RNN and LSTM

1.RNN

(1) The principle of RNN

Recurrent Neural Networks are called RNN for short. Suppose there is a group of data data0, data1, data2 and data3, and we use the same neural network to predict each of them and get the corresponding results. What if there is a relationship between the data, such as the order of steps in cooking or the order of words in an English sentence? How can a neural network learn that relationship? That is what RNN is for.

If we have a sequence ABCD and need to predict the next element E, the prediction is made according to the preceding order ABCD, which is what we call memory. Before predicting, the network reviews its previous memory, adds the new memory from the current step, and finally produces an output. A recurrent neural network (RNN) makes use of exactly this principle.

First, let's think about how humans analyze the correlations or sequences between things. Humans usually remember past events to help decide what to do later, but can a computer do the same?

When analyzing data0, we store the analysis result into memory; then, when analyzing data1, the neural network (NN) generates a new memory, but at this point the new memory is not associated with the old one, as shown in the figure above. In an RNN, we simply call up the old memory while analyzing the new data. If we continue to analyze more data, the NN accumulates all the previous memories.

The structure of an RNN is shown in the figure below. At each time point t-1, t and t+1 there is a different input x. Each computation considers the state from the previous step together with the current input x(t), and then outputs a value y. In mathematical form, a state s(t) is produced every time the RNN runs. When the RNN analyzes x(t+1), the output y(t+1) is created from s(t) and s(t+1) together, where s(t) can be regarded as the memory of the previous step. Accumulating multiple such neural network steps turns an ordinary NN into a recurrent neural network, whose simplified form is shown on the left side of the figure below.
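
One common way to write this recurrence is s(t) = f(W_s·s(t-1) + W_x·x(t)) and y(t) = g(W_y·s(t)). Below is a minimal NumPy sketch of the idea; it is only an illustration, not the TensorFlow model built later, and the weight names W_s, W_x, W_y and the tanh activation are assumptions for the example:

import numpy as np

def rnn_step(s_prev, x_t, W_s, W_x, W_y):
    # the new state mixes the previous state (memory) with the current input
    s_t = np.tanh(W_s @ s_prev + W_x @ x_t)
    # the output at this step is read from the new state
    y_t = W_y @ s_t
    return s_t, y_t

# toy dimensions: 1-D input, 10-D state, 1-D output
rng = np.random.default_rng(0)
W_s = rng.normal(scale=0.1, size=(10, 10))
W_x = rng.normal(scale=0.1, size=(10, 1))
W_y = rng.normal(scale=0.1, size=(1, 10))

s = np.zeros(10)                                   # initial memory is empty
for x in np.sin(np.linspace(0, 3, 20)).reshape(-1, 1):
    s, y = rnn_step(s, x, W_s, W_x, W_y)           # s carries the memory forward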

In short, an RNN can be used whenever your data is sequential, such as the order of human speech, the order of digits in a phone number, the order of image pixels, the order of the letters ABC, and so on. In the earlier explanation of the CNN principle, the convolution can be seen as a filter sliding across the whole image, deepening the neural network's understanding of the image.

RNN has a similar scanning effect, but with chronological order and memory added. Through the recurrent connections in its hidden layer, an RNN can capture the dynamic information in serialized data and improve prediction results.

(2) RNN applications

RNN is commonly used in natural language processing, machine translation, speech recognition, image recognition and other fields.

  • RNN sentiment analysis: when analyzing whether someone is speaking positively or negatively, we use the RNN structure shown in the figure below, which has N inputs and 1 output; the y value at the last time point represents the final result.

  • RNN image recognition: here there is a single image input X and N corresponding outputs.

  • RNN machine translation: the input and output are two different sequences, corresponding to Chinese and English respectively, as shown in the figure below.

2.LSTM

Let’s look at a more powerful structure called LSTM.

(1) Why is LSTM introduced?

An RNN learns on ordered data. Like a person, the RNN remembers earlier data, but sometimes, like grandpa, it forgets what was just said. To overcome this shortcoming of RNN, LSTM was proposed. Its full English name is Long Short-Term Memory, and it is one of the most popular kinds of RNN.

Suppose there is a sentence, as shown in the picture below, and the RNN has to decide that the dish being described is braised spareribs in brown sauce. To learn this, it matters that "braised spareribs in brown sauce" appears at the very beginning of the sentence.

The error signal for "braised spareribs" has a long journey to travel back: it is propagated backwards step by step, multiplied by a weight parameter w at every step. If the weight is a number less than 1, say 0.9, the error keeps being multiplied by 0.9, and by the time it reaches the initial steps it has essentially disappeared; this is called gradient vanishing (or gradient dispersion).

Conversely, if the weight is a number greater than 1, such as 1.1, the error keeps growing and the RNN ends up with a huge value, which is called gradient explosion.

Gradient vanishing or gradient explosion: in an RNN, if your state is a long sequence and the error passed backwards is multiplied by the same number at every step, then 0.9 to the power n tends to 0 while 1.1 to the power n tends to infinity. This causes gradient vanishing or gradient explosion.

This is also the reason why an RNN cannot recall distant memories. LSTM was introduced to solve the gradient vanishing and gradient explosion problems encountered in RNN gradient descent.
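
A quick numeric check (a toy illustration, not part of the article's model) shows how fast repeated multiplication over, say, 100 time steps shrinks or blows up the error signal:

# repeated multiplication of the back-propagated error over 100 steps
w_small, w_large = 0.9, 1.1
print(w_small ** 100)   # about 2.66e-05 -> the gradient effectively vanishes
print(w_large ** 100)   # about 1.38e+04 -> the gradient explodes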

(2) LSTM

LSTM is an improvement on the ordinary RNN: an LSTM RNN has three extra controllers, namely the input, output and forget gates. On the left there is a main line, like the main plot of a movie, while the original RNN body becomes a branch plot, and all three controllers sit on the branch line.

  • Write gate (input gate): a gate placed at the input that decides whether to write the current input into memory. It is a trainable parameter that controls whether the current point is remembered.
  • Read gate (output gate): a gate at the output that decides whether to read the current memory.
  • Forget gate: located at the forget controller, it decides whether to forget the previous memory.

The working principle of LSTM is as follows: if the branch plot is important to the final result, the input controller writes the branch plot into the main plot in proportion to its importance, and the main plot is then analyzed. If the branch plot changes what we thought before, the forget controller forgets part of the main plot and replaces it proportionally with the new plot, so the update of the main plot depends on both the input and forget controls. The final output is produced from both the main plot and the branch plot.

Our RNN is well controlled through these three gates, and based on these control mechanisms LSTM is a good remedy for fading memory, leading to better results.
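
For reference, here is a minimal NumPy sketch of a single LSTM step using the standard textbook gate equations. It is only meant to make the three gates concrete; it is not the BasicLSTMCell implementation used later, and the weight layout and sizes are assumptions for the example:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # concatenate the previous hidden state with the current input
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])   # forget gate: what to drop from the cell state (main plot)
    i = sigmoid(W['i'] @ z + b['i'])   # input (write) gate: how much new content to add
    g = np.tanh(W['g'] @ z + b['g'])   # candidate new memory
    c_t = f * c_prev + i * g           # updated cell state: forget old + write new
    o = sigmoid(W['o'] @ z + b['o'])   # output (read) gate
    h_t = o * np.tanh(c_t)             # exposed hidden state
    return h_t, c_t

# toy sizes: input 1, hidden 10
n_in, n_h = 1, 10
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for k in 'figo'}
b = {k: np.zeros(n_h) for k in 'figo'}
h, c = np.zeros(n_h), np.zeros(n_h)
for x in np.sin(np.linspace(0, 3, 20)):
    h, c = lstm_step(np.array([x]), h, c, W, b)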

II. LSTM RNN regression case description

Previously, we explained classification problems with RNN and CNN; this article shares a regression problem. In this LSTM RNN regression case, we want to use the blue dashed line to predict the red solid line. Since the sine curve is periodic, the RNN uses one sequence to predict another sequence.

The basic structure of the code includes:

  • (1) Function get_batch()

  • (2) Subject LSTM RNN

  • (3) Three-layer neural network, consisting of input_layer, cell and output_layer, with the same structure as the previous classification RNN:

    def add_input_layer(self,): pass
    def add_cell(self): pass
    def add_output_layer(self): pass

  • (4) Error calculation function compute_cost

  • (5) Weights and biases (_weight_variable and _bias_variable)

  • (6) Main function to build the LSTM RNN model

  • (7) TensorBoard visualization of the neural network model and Matplotlib visualization of the fitted curve

Finally, let's add some background on BPTT, and then we'll start coding.

(1) Ordinary RNN

Suppose we train on a sequence containing 1,000,000 data points. If we feed the whole sequence into the RNN and train on all of it at once, gradient vanishing or explosion easily occurs. The solution is truncated Backpropagation Through Time (truncated BPTT): we cut the sequence into pieces of length num_steps for training.

The usual form of truncated backpropagation is: at the current time t, propagate backwards for num_steps steps. As shown in the figure below, for a sequence of length 6 with a truncation step of 3, the initial state and final state are passed along inside the RNN cell.

(2) The TensorFlow version of BPTT

However, TensorFlow's implementation is different. It divides the sequence of length 6 into two parts, each of length 3, and the final state computed from the first part is used as the initial state for the second part. As shown in the figure below, each batch performs truncated backpropagation separately, and each batch saves its final state as the initialization state for the next batch.
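
In sketch form (illustrative pseudocode only, mirroring the training loop written later in this article; batches_of_num_steps is a hypothetical iterator), the state hand-off between consecutive batches looks like this:

# schematic state hand-off between consecutive truncated batches
state = None
for batch_x, batch_y in batches_of_num_steps:
    feed = {model.xs: batch_x, model.ys: batch_y}
    if state is not None:
        feed[model.cell_init_state] = state        # previous final state becomes this batch's initial state
    _, state = sess.run([model.train_op, model.cell_final_state], feed_dict=feed)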

Reference: Deep Learning (07) RNN- Recurrent Neural Networks -02- Implementation in Tensorflow – Lawlite

III. Code implementation

The first step is to open Anaconda, select the "TensorFlow" environment you set up earlier, and run Spyder.

The second step is to import the required packages.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

The third step is to write the data-generating function get_batch(), which produces the sequence of sin-curve values.

def get_batch():
    global BATCH_START, TIME_STEPS
    # xs shape (50batch, 20steps)
    xs = np.arange(BATCH_START, BATCH_START+TIME_STEPS*BATCH_SIZE).reshape((BATCH_SIZE, TIME_STEPS)) / (10*np.pi)
    seq = np.sin(xs)
    res = np.cos(xs)
    BATCH_START += TIME_STEPS

    # Show the simulated curves
    plt.plot(xs[0, :], res[0, :], 'r', xs[0, :], seq[0, :], 'b--')
    plt.show()

    # Return the input sequence seq, the target res and the x positions xs
    return [seq[:, :, np.newaxis], res[:, :, np.newaxis], xs]

At this point, the output is shown in the figure below. Note that this is only the simulated target curve, not the result of our neural network's learning.

The fourth step is to write the LSTMRNN class, which defines the structure of our recurrent neural network, its initialization operations, and the required variables.

The parameters of the initializer __init__() include:

  • n_steps indicates the number of time steps in a batch; in this example it is TIME_STEPS = 20.
  • input_size indicates the length of a single input at one time step when batch data is passed in. In this example, input_size and output_size are both 1. As shown in the figure below, assume the batch covers one cycle (0-6); each input is the x value of the curve, and input_size is the number of values at each time point, and there is only one point, so it is 1.
  • output_size represents the output value, i.e. the y value corresponding to the input; its size is also 1.
  • cell_size indicates the size of the RNN cell (the number of hidden units). The value is 10.
  • batch_size specifies the number of samples fed to the neural network at a time. Set this parameter to 50. (A quick shape check follows this list.)
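
To make these sizes concrete, here is a quick check using the get_batch() function from step three (purely illustrative):

seq, res, xs = get_batch()
print(seq.shape)   # (50, 20, 1) -> (batch_size, n_steps, input_size)
print(res.shape)   # (50, 20, 1) -> (batch_size, n_steps, output_size)
print(xs.shape)    # (50, 20)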

This part of the code is as follows; notice the shapes of xs and ys. Meanwhile, since we want to visualize the structure of the RNN with TensorBoard, we call tf.name_scope() to set the namespace name of each layer and variable, as described in the fifth article of this series.

#----------------------------------Define parameters----------------------------------
BATCH_START = 0
TIME_STEPS = 20
BATCH_SIZE = 50          # batch size
INPUT_SIZE = 1           # one input value per step
OUTPUT_SIZE = 1          # one output value per step
CELL_SIZE = 10           # number of hidden units in the cell
LR = 0.006
BATCH_START_TEST = 0

#----------------------------------LSTM RNN----------------------------------
class LSTMRNN(object):
    # Initialization
    def __init__(self, n_steps, input_size, output_size, cell_size, batch_size):
        self.n_steps = n_steps
        self.input_size = input_size
        self.output_size = output_size
        self.cell_size = cell_size
        self.batch_size = batch_size

        # name_scope is used for TensorBoard visualization
        with tf.name_scope('inputs'):         # input variables
            self.xs = tf.placeholder(tf.float32, [None, n_steps, input_size], name='xs')
            self.ys = tf.placeholder(tf.float32, [None, n_steps, output_size], name='ys')
        with tf.variable_scope('in_hidden'):  # input layer
            self.add_input_layer()
        with tf.variable_scope('LSTM_cell'):  # cell layer
            self.add_cell()
        with tf.variable_scope('out_hidden'): # output layer
            self.add_output_layer()
        with tf.name_scope('cost'):           # error
            self.compute_cost()
        with tf.name_scope('train'):          # training
            self.train_op = tf.train.AdamOptimizer(LR).minimize(self.cost)

The fifth step is to write the three functions (the three-layer neural network) that form the core structure of the RNN.

    # Input layer
    def add_input_layer(self,):
        pass
    # Cell layer
    def add_cell(self):
        pass
    # Output layer
    def add_output_layer(self):
        pass

These three functions are also added to the LSTMRNN class. The core code with detailed comments is as follows:

    #--------------------------------Core three-layer structure-----------------------------
    # Input layer
    def add_input_layer(self,):
        # Reshape the 3D input xs into 2D
        # [None, n_steps, input_size] => (batch*n_step, in_size)
        l_in_x = tf.reshape(self.xs, [-1, self.input_size], name='2_2D')
        # Input weights (in_size, cell_size)
        Ws_in = self._weight_variable([self.input_size, self.cell_size])
        # Input bias (cell_size, )
        bs_in = self._bias_variable([self.cell_size,])
        # Output y, 2D shape (batch * n_steps, cell_size)
        with tf.name_scope('Wx_plus_b'):
            l_in_y = tf.matmul(l_in_x, Ws_in) + bs_in
        # Reshape the result back to 3D
        # l_in_y ==> (batch, n_steps, cell_size)
        self.l_in_y = tf.reshape(l_in_y, [-1, self.n_steps, self.cell_size], name='2_3D')

    # Cell layer
    def add_cell(self):
        # Use the BasicLSTMCell model
        # forget_bias starts at 1.0 (no forgetting at first); the LSTM learns to forget selectively as training goes on
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
        # Set initial_state to all zeros; name_scope for visualization
        with tf.name_scope('initial_state'):
            self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
        # The output of every RNN step is stored in cell_outputs; cell_final_state is the final state passed to the next batch
        # A plain RNN only has m_state; an LSTM has both c_state and m_state
        self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
            lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)

    # Output layer (similar to the input layer)
    def add_output_layer(self):
        # Convert to 2D so that W*X+B can be computed
        # shape => (batch * steps, cell_size)
        l_out_x = tf.reshape(self.cell_outputs, [-1, self.cell_size], name='2_2D')
        Ws_out = self._weight_variable([self.cell_size, self.output_size])
        bs_out = self._bias_variable([self.output_size, ])
        # Return the prediction
        # shape => (batch * steps, output_size)
        with tf.name_scope('Wx_plus_b'):
            self.pred = tf.matmul(l_out_x, Ws_out) + bs_out

Note that tf.reshape() is called here to update the shape, converting the three-dimensional data into two dimensions, because W*X + B can only be computed on a two-dimensional variable; afterwards the result is reshaped back to three dimensions.
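
As a quick illustration of the reshape round trip (toy arrays only, not the model's tensors):

import numpy as np

batch, n_steps, input_size, cell_size = 50, 20, 1, 10
x3d = np.zeros((batch, n_steps, input_size))

x2d = x3d.reshape(-1, input_size)          # (batch*n_steps, input_size), ready for W*X+B
W = np.zeros((input_size, cell_size))
y2d = x2d @ W                              # (batch*n_steps, cell_size)
y3d = y2d.reshape(-1, n_steps, cell_size)  # back to (batch, n_steps, cell_size)
print(x2d.shape, y2d.shape, y3d.shape)     # (1000, 1) (1000, 10) (50, 20, 10)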

The sixth step is to define the error (cost) calculation function.

Note that we use the seq2seq loss function here. The loss obtained is the loss of every step in the whole batch; the per-step losses are summed and then divided by the batch size, which finally gives the total cost of the batch, a single scalar number.

    def compute_cost(self):
        # Use the seq2seq sequence-to-sequence loss
        # tf.nn.seq2seq.sequence_loss_by_example()
        losses = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
            [tf.reshape(self.pred, [-1], name='reshape_pred')],
            [tf.reshape(self.ys, [-1], name='reshape_target')],
            [tf.ones([self.batch_size * self.n_steps], dtype=tf.float32)],
            average_across_timesteps=True,
            softmax_loss_function=self.msr_error,
            name='losses'
        )
        # The total cost of the batch is a single number
        with tf.name_scope('average_cost'):
            # Sum the losses and divide by the batch size
            self.cost = tf.div(
                tf.reduce_sum(losses, name='losses_sum'),
                self.batch_size,
                name='average_cost')
            tf.summary.scalar('cost', self.cost)

We will cover machine translation with the seq2seq model in detail in a later article.

The Seq2Seq model is used when the length of the output is uncertain, which typically happens in machine translation tasks. When a Chinese sentence is translated into English, the English sentence may be shorter or longer than the Chinese one, so the output length is uncertain. As shown in the figure below, the input Chinese has length 4 and the output English has length 2.

In this network structure, we input a Chinese sequence and then output its corresponding English translation. Once the output part starts predicting, following the example above, the model first outputs "machine", takes "machine" as the next input and then outputs "learning"; in this way a sequence of any length can be produced.

Machine translation, human-machine dialogue, chatbots and so on, which are all applications seen in today's society, make use of Seq2Seq to a greater or lesser extent.

The seventh step defines the msr_error error-calculation function, together with the weight and bias helper functions.

    # Error function used by the loss above
    # Equivalent to: def msr_error(self, y_pre, y_target): return tf.square(tf.sub(y_pre, y_target))
    def msr_error(self, logits, labels):
        return tf.square(tf.subtract(logits, labels))
    # Weight calculation
    def _weight_variable(self, shape, name='weights'):
        initializer = tf.random_normal_initializer(mean=0., stddev=1.,)
        return tf.get_variable(shape=shape, initializer=initializer, name=name)
    # Bias calculation
    def _bias_variable(self, shape, name='biases'):
        initializer = tf.constant_initializer(0.1)
        return tf.get_variable(name=name, shape=shape, initializer=initializer)

At this point, the entire Class is defined.

In the eighth step, the main function is defined and the training and prediction operations are carried out. Here we first try the TensorBoard visualization.

#----------------------------------Main function: training and prediction----------------------------------
if __name__ == '__main__':
    # Define and initialize the model
    model = LSTMRNN(TIME_STEPS, INPUT_SIZE, OUTPUT_SIZE, CELL_SIZE, BATCH_SIZE)
    sess = tf.Session()
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter("logs", sess.graph)
    sess.run(tf.initialize_all_variables())

IV. Complete code and visual display

The complete code for this phase is as follows. Let’s try to run the code first:

# -*- coding: utf-8 -*-
"""
Created on Thu Jan  9 20:44:56 2020
@author: xiuzhang Eastmount CSDN
"""
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

#----------------------------------Define parameters----------------------------------
BATCH_START = 0
TIME_STEPS = 20
BATCH_SIZE = 50          # batch size
INPUT_SIZE = 1           # one input value per step
OUTPUT_SIZE = 1          # one output value per step
CELL_SIZE = 10           # number of hidden units in the cell
LR = 0.006
BATCH_START_TEST = 0

# Generate batch data
def get_batch():
    global BATCH_START, TIME_STEPS
    # xs shape (50batch, 20steps)
    xs = np.arange(BATCH_START, BATCH_START+TIME_STEPS*BATCH_SIZE).reshape((BATCH_SIZE, TIME_STEPS)) / (10*np.pi)
    seq = np.sin(xs)
    res = np.cos(xs)
    BATCH_START += TIME_STEPS

    # Show the original curves
    # plt.plot(xs[0, :], res[0, :], 'r', xs[0, :], seq[0, :], 'b--')
    # plt.show()

    # Return the input sequence seq, the target res and the x positions xs
    return [seq[:, :, np.newaxis], res[:, :, np.newaxis], xs]

#----------------------------------LSTM RNN----------------------------------
class LSTMRNN(object):
    # Initialization
    def __init__(self, n_steps, input_size, output_size, cell_size, batch_size):
        self.n_steps = n_steps
        self.input_size = input_size
        self.output_size = output_size
        self.cell_size = cell_size
        self.batch_size = batch_size

        # name_scope is used for TensorBoard visualization
        with tf.name_scope('inputs'):         # input variables
            self.xs = tf.placeholder(tf.float32, [None, n_steps, input_size], name='xs')
            self.ys = tf.placeholder(tf.float32, [None, n_steps, output_size], name='ys')
        with tf.variable_scope('in_hidden'):  # input layer
            self.add_input_layer()
        with tf.variable_scope('LSTM_cell'):  # cell layer
            self.add_cell()
        with tf.variable_scope('out_hidden'): # output layer
            self.add_output_layer()
        with tf.name_scope('cost'):           # error
            self.compute_cost()
        with tf.name_scope('train'):          # training
            self.train_op = tf.train.AdamOptimizer(LR).minimize(self.cost)

    #--------------------------------Core three-layer structure-----------------------------
    # Input layer
    def add_input_layer(self,):
        # Reshape the 3D input xs into 2D
        # [None, n_steps, input_size] => (batch*n_step, in_size)
        l_in_x = tf.reshape(self.xs, [-1, self.input_size], name='2_2D')
        # Input weights (in_size, cell_size)
        Ws_in = self._weight_variable([self.input_size, self.cell_size])
        # Input bias (cell_size, )
        bs_in = self._bias_variable([self.cell_size,])
        # Output y, 2D shape (batch * n_steps, cell_size)
        with tf.name_scope('Wx_plus_b'):
            l_in_y = tf.matmul(l_in_x, Ws_in) + bs_in
        # Reshape the result back to 3D
        # l_in_y ==> (batch, n_steps, cell_size)
        self.l_in_y = tf.reshape(l_in_y, [-1, self.n_steps, self.cell_size], name='2_3D')

    # Cell layer
    def add_cell(self):
        # Use the BasicLSTMCell model
        # forget_bias starts at 1.0 (no forgetting at first); the LSTM learns to forget selectively as training goes on
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
        # Set initial_state to all zeros; name_scope for visualization
        with tf.name_scope('initial_state'):
            self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
        # The output of every RNN step is stored in cell_outputs; cell_final_state is the final state passed to the next batch
        # A plain RNN only has m_state; an LSTM has both c_state and m_state
        self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
            lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)

    # Output layer (similar to the input layer)
    def add_output_layer(self):
        # Convert to 2D so that W*X+B can be computed
        # shape => (batch * steps, cell_size)
        l_out_x = tf.reshape(self.cell_outputs, [-1, self.cell_size], name='2_2D')
        Ws_out = self._weight_variable([self.cell_size, self.output_size])
        bs_out = self._bias_variable([self.output_size, ])
        # Return the prediction
        # shape => (batch * steps, output_size)
        with tf.name_scope('Wx_plus_b'):
            self.pred = tf.matmul(l_out_x, Ws_out) + bs_out

    #--------------------------------Error calculation function-----------------------------
    def compute_cost(self):
        # Use the seq2seq sequence-to-sequence loss
        # tf.nn.seq2seq.sequence_loss_by_example()
        losses = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
            [tf.reshape(self.pred, [-1], name='reshape_pred')],
            [tf.reshape(self.ys, [-1], name='reshape_target')],
            [tf.ones([self.batch_size * self.n_steps], dtype=tf.float32)],
            average_across_timesteps=True,
            softmax_loss_function=self.msr_error,
            name='losses'
        )
        # The total cost of the batch is a single number
        with tf.name_scope('average_cost'):
            # Sum the losses and divide by the batch size
            self.cost = tf.div(
                tf.reduce_sum(losses, name='losses_sum'),
                self.batch_size,
                name='average_cost')
            tf.summary.scalar('cost', self.cost)

    # Error function used by the loss above
    # Equivalent to: def msr_error(self, y_pre, y_target): return tf.square(tf.sub(y_pre, y_target))
    def msr_error(self, logits, labels):
        return tf.square(tf.subtract(logits, labels))
    # Weight calculation
    def _weight_variable(self, shape, name='weights'):
        initializer = tf.random_normal_initializer(mean=0., stddev=1.,)
        return tf.get_variable(shape=shape, initializer=initializer, name=name)
    # Bias calculation
    def _bias_variable(self, shape, name='biases'):
        initializer = tf.constant_initializer(0.1)
        return tf.get_variable(name=name, shape=shape, initializer=initializer)

#----------------------------------Main function: training and prediction----------------------------------
if __name__ == '__main__':
    # Define and initialize the model
    model = LSTMRNN(TIME_STEPS, INPUT_SIZE, OUTPUT_SIZE, CELL_SIZE, BATCH_SIZE)
    sess = tf.Session()
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter("logs", sess.graph)
    sess.run(tf.initialize_all_variables())

A new “logs” folder and events file will be created in the Python file directory, as shown below.

Next, try opening it. Call up Anaconda Prompt and activate TensorFlow. Then go to the events directory and run the command “tensorboard –logdir=logs”, as shown below. Note that all you need to do here is point to the folder and it will automatically index your files.

activate tensorflow
cd\
cd C:\Users\xiuzhang\Desktop\TensorFlow\blog
tensorboard --logdir=logs

At this point, visit the website “http://localhost:6006/” and select “Graphs”. After running, as shown in the figure below, our neural network appears.

The neural network structure is shown in the figure below, including input layer, LSTM layer, output layer, cost error calculation, train training, etc.

The detailed structure is shown in the figure below:

Usually we put the train part aside: select "train" and right-click "Remove from Main Graph". The core structure is then as follows: in_hidden is the first layer that accepts the input, followed by LSTM_cell, and finally out_hidden is the output layer.

  • in_hidden: includes the weights Weights and biases, and the calculation Wx_plus_b. It also includes the reshape operations 2_2D and 2_3D.

  • out_hidden: includes the weights, the biases, the calculation Wx_plus_b and the two-dimensional reshape 2_2D; its output is passed on to the cost calculation.

  • cost: computes the error.

  • In the middle is LSTM_cell: it contains the recurrent network itself, with initial_state initialized and then replaced by the updated state at each batch.

Note the version issue: you can run the code with your own version of TensorFlow. The author's environment is Python 3.6, Anaconda3, Win10 and TensorFlow 1.15.0.

If you get AttributeError: module 'tensorflow._api.v1.nn' has no attribute 'seq2seq', the TensorFlow API has changed between versions and the method call is different. Solution: use the legacy_seq2seq call instead, as shown below.
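
The replacement is the same call already used in compute_cost above (TensorFlow 1.x):

# old call, no longer available:
# losses = tf.nn.seq2seq.sequence_loss_by_example(...)

# new call:
losses = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
    [tf.reshape(self.pred, [-1], name='reshape_pred')],
    [tf.reshape(self.ys, [-1], name='reshape_target')],
    [tf.ones([self.batch_size * self.n_steps], dtype=tf.float32)],
    average_across_timesteps=True,
    softmax_loss_function=self.msr_error,
    name='losses')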

If you get TypeError: msr_error() got an unexpected keyword argument 'labels', it means msr_error() was called with the keyword arguments labels and logits, so the function must be defined with those parameter names. Change:

def msr_error(self, y_pre, y_target):
    return tf.square(tf.subtract(y_pre, y_target))

To:

def msr_error(self, logits, labels):
    return tf.square(tf.subtract(logits, labels))

If you get ValueError: Variable in_hidden/weights already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?, restart the kernel and run the code again.

V. Prediction and curve fitting

Finally, we write the RNN training and prediction code in the main function.

First, let's examine the learning result through cost. In the first training step we use the model's initial cell_init_state; in every subsequent step we feed the previous final state back in as model.cell_init_state (model.cell_init_state: state), so the final state of one batch replaces the initial state of the next batch, which matches the structure we defined earlier.

#----------------------------------Main function: training and prediction----------------------------------
if __name__ == '__main__':
    # Define and initialize the model
    model = LSTMRNN(TIME_STEPS, INPUT_SIZE, OUTPUT_SIZE, CELL_SIZE, BATCH_SIZE)
    sess = tf.Session()
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter("logs", sess.graph)
    sess.run(tf.initialize_all_variables())
    # TensorBoard visualization of the neural network

    #------------------------------RNN learning-------------------------------------
    # Train the model
    for i in range(200):
        # Predict res from seq (sequence-seq result-res input-xs)
        seq, res, xs = get_batch()
        # Assign on the first step; cell_init_state is updated afterwards
        if i == 0:
            feed_dict = {
                model.xs: seq,
                model.ys: res,
                # create initial state (cell_init_state was initialized above)
            }
        else:
            feed_dict = {
                model.xs: seq,
                model.ys: res,
                model.cell_init_state: state
                # use last state as the initial state for this run
            }

        # state is the final_state
        _, cost, state, pred = sess.run(
            [model.train_op, model.cost, model.cell_final_state, model.pred],
            feed_dict=feed_dict)

        # Print the result every 20 steps
        if i % 20 == 0:
            print('cost: ', round(cost, 4))

The results are output every 20 steps, as shown below. The error ranges from 33 at the beginning to 0.335 at the end. The neural network keeps learning and the error keeps decreasing.

cost:  33.1673
cost:  9.1332
cost:  3.8899
cost:  1.3271
cost:  0.2682
cost:  0.4912
cost:  1.0692
cost:  0.3812
cost:  0.63
cost:  0.335

Next, we add the Matplotlib visualization of the dynamic sin-curve fitting. The final complete code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Thu Jan  9 20:44:56 2020
@author: xiuzhang Eastmount CSDN
"""
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

#----------------------------------Define parameters----------------------------------
BATCH_START = 0
TIME_STEPS = 20
BATCH_SIZE = 50          # batch size
INPUT_SIZE = 1           # one input value per step
OUTPUT_SIZE = 1          # one output value per step
CELL_SIZE = 10           # number of hidden units in the cell
LR = 0.006
BATCH_START_TEST = 0

# Generate batch data
def get_batch():
    global BATCH_START, TIME_STEPS
    # xs shape (50batch, 20steps)
    xs = np.arange(BATCH_START, BATCH_START+TIME_STEPS*BATCH_SIZE).reshape((BATCH_SIZE, TIME_STEPS)) / (10*np.pi)
    seq = np.sin(xs)
    res = np.cos(xs)
    BATCH_START += TIME_STEPS    
 
    # Show the original curves
    # plt.plot(xs[0, :], res[0, :], 'r', xs[0, :], seq[0, :], 'b--')
    # plt.show()
 
    # Return the input sequence seq, the target res and the x positions xs
    return [seq[:, :, np.newaxis], res[:, :, np.newaxis], xs]

#----------------------------------LSTM RNN----------------------------------
class LSTMRNN(object):
    # Initialization
    def __init__(self, n_steps, input_size, output_size, cell_size, batch_size):
        self.n_steps = n_steps
        self.input_size = input_size
        self.output_size = output_size
        self.cell_size = cell_size
        self.batch_size = batch_size
 
        # name_scope is used for TensorBoard visualization
        with tf.name_scope('inputs'):         # input variables
            self.xs = tf.placeholder(tf.float32, [None, n_steps, input_size], name='xs')
            self.ys = tf.placeholder(tf.float32, [None, n_steps, output_size], name='ys')
        with tf.variable_scope('in_hidden'):  # input layer
            self.add_input_layer()
        with tf.variable_scope('LSTM_cell'):  # cell layer
            self.add_cell()
        with tf.variable_scope('out_hidden'): # output layer
            self.add_output_layer()
        with tf.name_scope('cost'):           # error
            self.compute_cost()
        with tf.name_scope('train'):          # training
            self.train_op = tf.train.AdamOptimizer(LR).minimize(self.cost)
 
    #--------------------------------Core three-layer structure-----------------------------
    # Input layer
    def add_input_layer(self,):
        # Reshape the 3D input xs into 2D
        # [None, n_steps, input_size] => (batch*n_step, in_size)
        l_in_x = tf.reshape(self.xs, [-1, self.input_size], name='2_2D')
        # Input weights (in_size, cell_size)
        Ws_in = self._weight_variable([self.input_size, self.cell_size])
        # Input bias (cell_size, )
        bs_in = self._bias_variable([self.cell_size,])
        # Output y, 2D shape (batch * n_steps, cell_size)
        with tf.name_scope('Wx_plus_b'):
            l_in_y = tf.matmul(l_in_x, Ws_in) + bs_in
        # Reshape the result back to 3D
        # l_in_y ==> (batch, n_steps, cell_size)
        self.l_in_y = tf.reshape(l_in_y, [-1, self.n_steps, self.cell_size], name='2_3D')
 
    # Cell layer
    def add_cell(self):
        # Use the BasicLSTMCell model
        # forget_bias starts at 1.0 (no forgetting at first); the LSTM learns to forget selectively as training goes on
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
        # Set initial_state to all zeros; name_scope for visualization
        with tf.name_scope('initial_state'):
            self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
        # The output of every RNN step is stored in cell_outputs; cell_final_state is the final state passed to the next batch
        # A plain RNN only has m_state; an LSTM has both c_state and m_state
        self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
            lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)
 
    # Output layer (similar to the input layer)
    def add_output_layer(self):
        # Convert to 2D so that W*X+B can be computed
        # shape => (batch * steps, cell_size) 
        l_out_x = tf.reshape(self.cell_outputs, [-1, self.cell_size], name='2_2D')
        Ws_out = self._weight_variable([self.cell_size, self.output_size])
        bs_out = self._bias_variable([self.output_size, ])
        # Return the prediction
        # shape => (batch * steps, output_size)
        with tf.name_scope('Wx_plus_b'):
            self.pred = tf.matmul(l_out_x, Ws_out) + bs_out
 
    #--------------------------------Error calculation function-----------------------------
    def compute_cost(self):
        # Use the seq2seq sequence-to-sequence loss
        # tf.nn.seq2seq.sequence_loss_by_example()
        losses = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
            [tf.reshape(self.pred, [-1], name='reshape_pred')],
            [tf.reshape(self.ys, [-1], name='reshape_target')],
            [tf.ones([self.batch_size * self.n_steps], dtype=tf.float32)],
            average_across_timesteps=True,
            softmax_loss_function=self.msr_error,
            name='losses'
        )
        # The total cost of the batch is a single number
        with tf.name_scope('average_cost'):
            # Sum the losses and divide by the batch size
            self.cost = tf.div(
                tf.reduce_sum(losses, name='losses_sum'),
                self.batch_size,
                name='average_cost')
            tf.summary.scalar('cost', self.cost)
 
    # Error function used by the loss above
    # Equivalent to: def msr_error(self, y_pre, y_target): return tf.square(tf.sub(y_pre, y_target))
    def msr_error(self, logits, labels):
        return tf.square(tf.subtract(logits, labels))
    # Weight calculation
    def _weight_variable(self, shape, name='weights'):
        initializer = tf.random_normal_initializer(mean=0., stddev=1.,)
        return tf.get_variable(shape=shape, initializer=initializer, name=name)
    # Bias calculation
    def _bias_variable(self, shape, name='biases'):
        initializer = tf.constant_initializer(0.1)
        return tf.get_variable(name=name, shape=shape, initializer=initializer)
 
#----------------------------------Main function: training and prediction----------------------------------
if __name__ == '__main__':
    # Define and initialize the model
    model = LSTMRNN(TIME_STEPS, INPUT_SIZE, OUTPUT_SIZE, CELL_SIZE, BATCH_SIZE)
    sess = tf.Session()
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter("logs", sess.graph)
    sess.run(tf.initialize_all_variables())
    # TensorBoard visualization of the neural network
 
    #------------------------------RNN learning-------------------------------------
    # Start interactive mode
    plt.ion()
    plt.show()
 
    # Train the model
    for i in range(200):
        # Predict res from seq (sequence-seq result-res input-xs)
        seq, res, xs = get_batch()
        # Assign on the first step; cell_init_state is updated afterwards
        if i == 0:
            feed_dict = {
                model.xs: seq,
                model.ys: res,
                # create initial state (cell_init_state was initialized above)
            }
        else:
            feed_dict = {
                model.xs: seq,
                model.ys: res,
                model.cell_init_state: state    
                # use last state as the initial state for this run
            }
 
        # state is the final_state
        _, cost, state, pred = sess.run(
                [model.train_op, model.cost, model.cell_final_state, model.pred], 
                feed_dict=feed_dict)

        # plotting
        # Take the first batch xs[0, :] and the predictions for the 0-20 range: pred.flatten()[:TIME_STEPS]
        plt.plot(xs[0, :], res[0].flatten(), 'r', xs[0, :], pred.flatten()[:TIME_STEPS], 'b--')
        plt.ylim((-1.2, 1.2))
        plt.draw()
        plt.pause(0.3)
 
        # Print the result every 20 steps
        if i % 20 == 0:
            print('cost: ', round(cost, 4))
            # result = sess.run(merged, feed_dict)
            # writer.add_summary(result, i)

That brings this article to an end. It is long, but hopefully helpful. The LSTM RNN predicts one set of data from another set of data. The prediction effect is shown in the figure below: the red solid line is the line to be predicted, and the blue dashed line is the line the RNN learns. They keep approaching each other; the blue line learns the pattern of the red line and eventually fits it almost exactly.

VI. Summary

After this article, more TensorFlow deep learning articles will be shared, covering supervised learning, GAN, machine translation, text recognition, image recognition, speech recognition and more. If readers have a topic they want to learn about, they can also message me privately; I will study it and apply it to your field.

Finally, I hope this basic article is helpful to you. If there are errors or shortcomings in it, please bear with me. As a rookie in artificial intelligence, I hope to keep improving and digging deeper, apply this later to image recognition, network security, adversarial examples and other fields, and guide everyone in writing simple academic papers. Let's keep going!

Code download address (follows and likes are welcome):

  • Github.com/eastmountyx…
  • Github.com/eastmountyx…
