
An RNN Layer is shown below

Let’s say the shape of $x$ is [10, 3, 100]: a sequence of 10 words, a batch of 3 sentences processed at a time, and a 100-dimensional vector for each word

So for a single timestamp, the shape of $x_t$ is [3, 100]

Now look at the operation above, where hidden_len is the dimension of the memory (hidden state); say it is 20. Therefore:

$$
\begin{aligned}
h_{t+1} &= x_t @ w_{xh}^T + h_t @ w_{hh}^T \\
&= [3, 100] @ [20, 100]^T + [3, 20] @ [20, 20]^T \\
&= [3, 20]
\end{aligned}
$$
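As a quick sanity check, the shape arithmetic above can be reproduced with plain tensor operations (the weight values here are random placeholders; only the shapes matter):

```python
import torch

# batch=3, feature_len=100, hidden_len=20, as in the text
x_t = torch.randn(3, 100)    # input at one timestamp
h_t = torch.zeros(3, 20)     # previous memory (hidden state)
w_xh = torch.randn(20, 100)  # input-to-hidden weights
w_hh = torch.randn(20, 20)   # hidden-to-hidden weights

# [3, 100] @ [100, 20] + [3, 20] @ [20, 20] -> [3, 20]
h_next = torch.tanh(x_t @ w_xh.T + h_t @ w_hh.T)
print(h_next.shape)  # torch.Size([3, 20])
```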

nn.RNN

Define an RNN Layer in code and then view its parameter information

import torch
import torch.nn as nn

rnn = nn.RNN(100, 20)
print(rnn._parameters.keys())
print(rnn.weight_ih_l0.shape) # w_{xh} [20, 100]
print(rnn.weight_hh_l0.shape) # w_{hh} [20, 20]
print(rnn.bias_ih_l0.shape) # b_{xh} [20]
print(rnn.bias_hh_l0.shape) # b_{hh} [20]

Before explaining the above code, take a look at the parameters of the RNN class in PyTorch (see the RNN API on the PyTorch website).

• Required parameter `input_size`: the size of a single element in the input sequence; e.g., if each word is represented by a vector of length 1000, then `input_size=1000`
• Required parameter `hidden_size`: the size of the output feature of the hidden layer
• Optional parameter `num_layers`: the number of vertically stacked hidden layers, generally between 1 and 10; default=1

Now the above code is easy to understand: in `nn.RNN(100, 20)`, 100 means each word is a vector of length 100, and 20 is the `hidden_size`

The forward function of RNN is a little different from the way defined by CNN, as shown in the following figure

The $x$ parameter here is not $x_t$ but the entire input sequence, with shape $[\text{seq\_len}, \text{batch}, \text{feature\_len}]$

$h_0$ defaults to all zeros if you don’t pass it; if you do, its shape must be $[\text{layers}, \text{batch}, \text{hidden\_len}]$
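You can verify the default yourself (using the same layer sizes as elsewhere in this post): omitting $h_0$ gives exactly the same result as passing an all-zero tensor of shape [layers, batch, hidden_len]:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
x = torch.randn(10, 3, 100)

out1, h1 = rnn(x)                         # h_0 omitted -> defaults to zeros
out2, h2 = rnn(x, torch.zeros(1, 3, 20))  # h_0 passed explicitly

print(torch.allclose(out1, out2))  # True
print(torch.allclose(h1, h2))      # True
```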

Look at the code

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
x = torch.randn(10, 3, 100)
out, h_t = rnn(x, torch.zeros(1, 3, 20))
print(out.shape) # [10, 3, 20]
print(h_t.shape) # [1, 3, 20]

The shapes of all these tensors are related, so make sure you understand the derivation above

It’s easy to confuse $h_t$ with $out$, so let’s look at a two-layer RNN model

Before explaining $h_t$ and $out$, it is important to understand the concept of timestamps. Timestamps run left to right (through time), not up and down (through layers). The figure above is a two-layer RNN: the layers are stacked vertically, and the left-right direction is the timestamp axis. With that, the definitions of $h_t$ and $out$ are:

• $h_t$: the memory states of all layers at the last timestamp
• $out$: the memory state of the last layer at all timestamps

In other words, $h_t$ stacks one state per layer (taken at the final timestamp), while $out$ stacks one state per timestamp (always taken from the last layer)
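These two definitions overlap in exactly one place: the last layer at the last timestamp. A quick check of that relationship (using the same 4-layer setup as the code below):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)
x = torch.randn(10, 3, 100)
out, h_t = rnn(x)  # out: [10, 3, 20], h_t: [4, 3, 20]

# out[-1]: last layer's state at the final timestamp (from the time axis)
# h_t[-1]: final timestamp's state of the last layer (from the layer axis)
print(torch.allclose(out[-1], h_t[-1]))  # True
```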

Look at the code

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)
x = torch.randn(10, 3, 100)
out, h_t = rnn(x)
print(out.shape) # [10, 3, 20]
print(h_t.shape) # [4, 3, 20]

If you understood the shapes of $out$ and $h_t$ above, this output is easy to picture

The `nn.RNN` usage above feeds in the entire $x$ at once and runs the loop over timestamps automatically. Here’s another way to define an RNN that requires you to write the loop yourself

nn.RNNCell

First take a look at the official API for PyTorch

The constructor parameters are roughly the same as `nn.RNN`’s, but note that a cell processes a single timestamp: the input has shape (batch, input_size) and the hidden state has shape (batch, hidden_size), which makes the forward call different

Look at the code

import torch
import torch.nn as nn

cell1 = nn.RNNCell(100, 20)
x = torch.randn(10, 3, 100)
h1 = torch.zeros(3, 20)
for xt in x:
    h1 = cell1(xt, h1)
print(h1.shape) # [3, 20]

The above is a single-layer RNN built with RNNCell, looping over timestamps manually

Now let’s see how to build a two-layer RNN with RNNCell

import torch
import torch.nn as nn

cell1 = nn.RNNCell(100, 30) # 100 -> 30
cell2 = nn.RNNCell(30, 20)
x = torch.randn(10, 3, 100)
h1 = torch.zeros(3, 30)
h2 = torch.zeros(3, 20)
for xt in x:
    h1 = cell1(xt, h1)
    h2 = cell2(h1, h2)
print(h2.shape) # [3, 20]

The first layer turns the 100-dimensional input into a 30-dimensional memory, which is then fed into the second layer, producing a 20-dimensional memory. The most important code is the two lines inside the for loop: the first layer’s inputs are xt and memory h1, and the second layer’s inputs are memory h1 and memory h2
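To convince yourself that the manual cell loop really computes the same thing as `nn.RNN`, you can copy a single-layer `nn.RNN`’s weights into an `RNNCell` and compare the outputs (a sketch for this experiment only; the direct `.data` copying is not something you’d do in training code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
cell = nn.RNNCell(100, 20)

# Copy the layer's weights into the cell so both compute identically
cell.weight_ih.data = rnn.weight_ih_l0.data
cell.weight_hh.data = rnn.weight_hh_l0.data
cell.bias_ih.data = rnn.bias_ih_l0.data
cell.bias_hh.data = rnn.bias_hh_l0.data

x = torch.randn(10, 3, 100)
out, h_t = rnn(x)  # automatic loop over the 10 timestamps

h = torch.zeros(3, 20)
outs = []
for xt in x:       # manual loop over the same 10 timestamps
    h = cell(xt, h)
    outs.append(h)

print(torch.allclose(out, torch.stack(outs), atol=1e-6))  # True
```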