1 Background

In this chapter, we introduce the recurrent neural network (RNN), a popular neural network architecture. Conventional neural networks such as fully connected networks handle each input in isolation: one input is completely unrelated to the next. In some tasks, however, the inputs form a sequence, and the network needs to process that sequential information.

Scenarios like these, which require processing sequential data, a stream of interdependent items, call for an RNN.

Several typical kinds of sequence data:

  1. Text in an article
  2. Audio in speech
  3. Price movements in the stock market

2 Introduction to recurrent neural networks

2.1 Deep neural network

Traditional machine learning algorithms rely heavily on hand-crafted features, which creates bottlenecks for feature extraction in image recognition, speech recognition, and natural language processing. Fully connected neural networks, meanwhile, have too many parameters and cannot extract features based on the temporal structure of the data.

By mining temporal information, recurrent neural networks have made great breakthroughs in speech recognition, language modeling, machine translation, and other areas, giving them much stronger expressive power for sequences.

In a fully connected or convolutional neural network, data flows from the input layer through the hidden layers to the output layer. Adjacent layers are fully or partially connected, but the nodes within a single layer have no connections to each other.

Consider the problem of predicting the next word in a sentence. You generally need both the current word and the words before it, because consecutive words are logically related. For example, given the phrase "the clouds are in the," the next word is most likely "sky": there is a logical connection across the context. Fully connected and convolutional neural networks are not suited to this, while the recurrent neural network fits naturally: its distinctive structure, with connections between hidden-layer nodes across time steps, lets it model in depth the relationship between the current output of a sequence and the information that came before.

2.2 Recurrent neural network

(Figure: an RNN unrolled over time, showing the input layer X, hidden layer S, and output layer O.)

Based on the figure above, we can get a feel for the structure of an RNN. The whole network consists of an input layer, a hidden layer, and an output layer, and the hidden layers at adjacent time steps are connected to each other.

Let’s analyze the loop body in the network structure of RNN in detail below:

  • Network structure (loop body): viewed from either side of the figure, the whole network acts as a loop body, and the loop body contains two fully connected layers, producing S_t and O_t.
  • Input layer: X is a vector representing the input-layer values; it is not fed to the hidden layer all at once but is aligned with the hidden layer by time, one X_t per step.
  • Hidden layer: S is a vector representing the hidden-layer values (the number of hidden nodes equals the dimension of S).
  • Output layer: O is a vector representing the output-layer values.
  • Model parameters: U is the weight matrix from the input layer to the hidden layer, V is the weight matrix from the hidden layer to the output layer, and W is the weight matrix from the hidden layer to itself across time steps; the U, V, and W matrices are shared across all time steps.
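As a concrete illustration of these shapes (the dimensions and random values here are hypothetical, chosen only for the sketch), the shared parameters can be created once and reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim, output_dim = 1, 2, 1  # illustrative sizes

# Shared parameters: the same U, W, V are used at every time step.
U = rng.normal(size=(input_dim, hidden_dim))   # input  -> hidden
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(size=(hidden_dim, output_dim))  # hidden -> output

x_t = np.ones((1, input_dim))       # input at time t
s_prev = np.zeros((1, hidden_dim))  # hidden state from time t-1

s_t = np.tanh(x_t @ U + s_prev @ W)  # new hidden state, shape (1, hidden_dim)
o_t = s_t @ V                        # output, shape (1, output_dim)
print(s_t.shape, o_t.shape)
```

Note that U, W, and V are defined outside any loop: no matter how long the sequence, the parameter count does not grow.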

Let’s analyze the algorithm flow of RNN in detail below:

  1. Assume the network is currently at time t. We first analyze the operation of a single loop body; processing a whole sequence is simply this single step repeated.

  2. Unlike an ordinary fully connected network, the loop body takes two inputs and needs to merge them; after merging, the computation is the same as in an ordinary fully connected layer:

    1. The input-layer value at time t, X_t

    2. The hidden-layer value from the previous time step, S_{t-1}

  3. Compute the hidden-layer value at time t, which also serves as an input value at the next time step:

    S_t = f(X_t · U + S_{t-1} · W), and because of how the matrix dimensions line up, this can be abbreviated as S_t = f([X_t ⊕ S_{t-1}] · [U ⊕ W]), where ⊕ denotes concatenation.

  4. Calculate the value of the output layer at time t

O_t = g(S_t · V)

  5. At this point, the fully connected computation of one loop body is complete; the network then moves on to S_{t+1}, S_{t+2}, … and repeats the process.
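The steps above can be sketched as a short forward pass. This is a minimal sketch: the parameter values are hypothetical, the output activation g is taken to be the identity, and biases are omitted for brevity.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Run the loop body over a sequence, reusing U, W, V at every step."""
    s = s0
    outputs = []
    for x_t in xs:                    # one loop body per time step
        s = np.tanh(x_t @ U + s @ W)  # S_t = f(X_t U + S_{t-1} W)
        outputs.append(s @ V)         # O_t = g(S_t V), with g = identity here
    return outputs, s

# Tiny example with hypothetical parameters
U = np.array([[0.5, 0.6]])              # input (dim 1)  -> hidden (dim 2)
W = np.array([[0.1, 0.2], [0.3, 0.4]])  # hidden -> hidden
V = np.array([[1.0], [2.0]])            # hidden -> output (dim 1)
xs = [np.array([[1.0]]), np.array([[2.0]])]

outs, final_state = rnn_forward(xs, U, W, V, np.zeros((1, 2)))
print([float(o) for o in outs])
```

Each iteration consumes one X_t and the previous state, exactly as steps 2–4 describe; the list `outs` holds O_t for every time step.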

3 A worked example

Below, we construct an RNN by hand to further explain and analyze its network structure.

  • Assume the hidden state has dimension 2 and the input and output each have dimension 1. The parameter matrix W_s (W stacked over U) used in the loop body to compute the hidden state is assumed to be:

    [0.1  0.2]
    [0.3  0.4]
    [0.5  0.6]

  • Assume the bias of the fully connected layer used in the loop body to compute the hidden state is b_s = [0.1, −0.1].

  • Assume that the weight of the fully connected layer used to calculate the output layer in the loop body is:


    [1.0]
    [2.0]

  • Assume the bias term of the fully connected layer used to compute the output layer in the loop body is b_o = [0.1].
  • Assume the initial state is [0, 0] and the input at t_0 is 1. Splicing the state and the input gives the vector [0, 0, 1]; the fully connected layer used in the loop body to compute the hidden state then evaluates as follows, and its result also serves as the state input at the next time step.


tanh([0, 0, 1] × [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]] + [0.1, −0.1]) = tanh([0.6, 0.5]) = [0.537, 0.462]

  • Then the fully connected layer used to compute the output layer O_t gives:


[0.537, 0.462] × [[1.0], [2.0]] + [0.1] = [1.56]

  • By a similar deduction, the state at t_1 is [0.860, 0.884] and the output is 2.73. Once the forward-propagation results of the recurrent neural network are obtained, a loss function can be defined just as for other neural network structures.
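The hand computation above can be checked in a few lines of NumPy. This sketch also verifies that the concatenated form [X_t ⊕ S_{t-1}] · [U ⊕ W] from section 2.2 matches the split form X_t · U + S_{t-1} · W (the variable names here are my own):

```python
import numpy as np

W = np.array([[0.1, 0.2], [0.3, 0.4]])  # hidden -> hidden
U = np.array([0.5, 0.6])                # input  -> hidden (the input is a scalar)
b_s = np.array([0.1, -0.1])             # hidden-layer bias

state, x = np.array([0.0, 0.0]), 1.0    # initial state and input at t_0

# Split form: X_t * U + S_{t-1} * W + b_s
split = np.tanh(state @ W + x * U + b_s)

# Concatenated form: [S_{t-1}, X_t] * W_s + b_s, with W_s = W stacked over U
W_s = np.vstack([W, U])                 # the 3x2 matrix from the example
concat = np.tanh(np.concatenate([state, [x]]) @ W_s + b_s)

assert np.allclose(split, concat)       # the two forms agree
print(np.round(split, 3))               # -> [0.537 0.462]
```

The printed state matches the hand-computed [0.537, 0.462] at t_0.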

4 Source code

Through the explanation and analysis above, you should now have a good overall understanding of the RNN. The code below is attached so you can deepen that understanding.

# coding=utf-8
# Forward propagation of a simple RNN, matching the worked example above
import numpy as np

# Define the inputs and the initial state; later states are computed dynamically
X = [1, 2]
state = np.array([0.0, 0.0])

# Define the parameters separately for easy calculation
w_cell_state = np.asarray([[0.1, 0.2], [0.3, 0.4]])  # W: hidden -> hidden
w_cell_input = np.asarray([0.5, 0.6])                # U: input  -> hidden
b_cell = np.asarray([0.1, -0.1])                     # hidden-layer bias
w_output = np.asarray([[1.0], [2.0]])                # V: hidden -> output
b_output = np.asarray([0.1])                         # output-layer bias

for i in range(len(X)):
    before_activation = np.dot(state, w_cell_state) + X[i] * w_cell_input + b_cell
    state = np.tanh(before_activation)
    final_output = np.dot(state, w_output) + b_output
    print("before activation:", before_activation)
    print("state:", state)
    print("output:", final_output)

Github address: github.com/dubaokun/co…

5 Extras

Personal introduction: Du Baokun is a practitioner in the privacy-computing industry who led the team that built JD's federated learning solution 9N-FL from 0 to 1, and who leads the federated learning framework and business. At the framework level: delivered an industrial federated learning solution supporting super-large scale in e-commerce marketing, covering super-large-sample PSI privacy alignment, tree models, neural network models, and more. At the business level: got the business off to a good start, created new growth points, and generated significant economic benefits. Personally, I like to learn new things and dig into technology. Driven by full-link thinking and technology planning, my research spans many fields, from engineering architecture and big data to machine learning algorithms and algorithm frameworks. Students who like technology are welcome to contact me by email: [email protected]

A guide to my official account

I have been writing my blog for a long time, covering many technical fields: high concurrency and high performance, distributed systems, traditional machine learning algorithms and frameworks, deep learning algorithms and frameworks, cryptographic security, privacy computing, federated learning, big data, and so on. I have led a number of major projects, including federated learning for retail, and have shared with the community many times. I insist on writing original articles, many of which have been read more than ten thousand times. On the official account you can read by topic; the topics follow the order of a learning route: (1) Privacy computing, (2) Federated learning, (3) Machine learning frameworks, (4) Machine learning algorithms, (5) High-performance computing, (6) Advertising algorithms, (7) Programming life.

All promising dharma, like a dream, like dew, like electricity, should be viewed as such.