This article is a summary of the PyTorch Introduction course: Deep Learning on the Torch — Natural Language Processing (NLP) series.

The task is to predict characters (digits): the neural network has to discover the rule behind the following number sequences.

012
00112
0001112
000011112
00000111112

Given a partial sequence (say 0000001), the neural network is asked to predict what the next digit should be.
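To make the rule concrete, here is a minimal sketch (my own illustration, not code from the course) that generates strings of this family: n zeros, then n ones, then a closing 2.

# Sketch: generate strings of the form 0^n 1^n 2 (for illustration only)
for n in range(1, 6):
    print('0' * n + '1' * n + '2')
# 012
# 00112
# 0001112
# 000011112
# 00000111112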

1. Build the neural network architecture

We define a SimpleRNN class

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers = 1):
        ...
    def forward(self, input, hidden):
        ...
    def initHidden(self):
        ...

The function initHidden initializes the hidden layer vector

def initHidden(self):
    # Initialize the hidden state
    # size: [layer_size, batch_size, hidden_size]
    return Variable(torch.zeros(self.num_layers, 1, self.hidden_size))

The __init__ function

__init__ defines the structure of the network: the input dimension, the output dimension, the hidden dimension, the number of layers, and the sub-models used in the computation.

nn is PyTorch's built-in neural-network module; it provides Embedding, RNN, Linear, LogSoftmax and other models.

# Import nn (PyTorch's neural-network module)
import torch.nn as nn

def __init__(self, input_size, hidden_size, output_size, num_layers = 1):
    # Define the network structure
    super(SimpleRNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    # An embedding layer
    self.embedding = nn.Embedding(input_size, hidden_size)
    # PyTorch's RNN model; batch_first = True makes the first dimension of the input tensor the batch dimension
    self.rnn = nn.RNN(hidden_size, hidden_size, num_layers, batch_first = True)
    # Fully connected layer for the output
    self.linear = nn.Linear(hidden_size, output_size)
    # Final LogSoftmax layer
    self.softmax = nn.LogSoftmax()
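Note: in newer PyTorch versions nn.LogSoftmax expects an explicit dimension argument, here nn.LogSoftmax(dim = 1) since the class scores live in dimension 1; without it PyTorch emits a deprecation warning.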


The forward function defines the forward computation of the network.

The computation is easy to follow: the input is passed step by step through the embedding layer, the RNN layer, the Linear layer, and the Softmax layer

  • Embedding layer: embeds the input into the hidden space. Conceptually, the input index is converted to a one-hot vector and then mapped to a vector of size hidden_size (see the sketch after this list)
  • RNN layer: passes the sequence through one RNN model
  • Linear layer (fully connected layer): maps every dimension of the hidden vector to the output, which can be understood as sharing information across dimensions
  • Softmax layer: normalizes the output
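Before walking through forward, here is a tiny sketch (my own example, not course code) showing that the embedding lookup in the first bullet is equivalent to multiplying a one-hot vector by the embedding weight matrix:

import torch
import torch.nn as nn
from torch.autograd import Variable

emb = nn.Embedding(4, 5)                   # 4 input symbols, hidden_size = 5
idx = Variable(torch.LongTensor([2]))      # the symbol "2"
v1 = emb(idx)                              # direct lookup, size [1, 5]

one_hot = torch.zeros(1, 4)
one_hot[0, 2] = 1.0
v2 = one_hot.mm(emb.weight.data)           # one-hot vector times the weight matrix, size [1, 5]

print((v1.data - v2).abs().max())          # ~0: both paths give the same vector

Now the forward method itself: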
# Forward computation
def forward(self, input, hidden):
    # size of input: [batch_size, num_step, data_dim]

    # Embedding layer: compute from the input to the hidden representation
    output = self.embedding(input)
    # size of output: [batch_size, num_step, hidden_size]

    output, hidden = self.rnn(output, hidden)
    # size of output: [batch_size, num_step, hidden_size]

    # output contains the results of all time steps; keep only the last one
    output = output[:, -1, :]
    # size of output: [batch_size, hidden_size]

    # Fully connected layer
    output = self.linear(output)
    # size of output: [batch_size, output_size]

    # Softmax layer, normalization
    output = self.softmax(output)
    # size of output: [batch_size, output_size]
    return output, hidden

There is one special operation on the RNN output in the middle of forward

output = output[:, -1, :]

The RNN output has size [batch_size, num_step, hidden_size]; this line keeps only the last time step of the second dimension. Because an RNN has memory, the output at the last step already contains information about all the previous steps, so we only need to take that last step here.
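A quick sketch (my own illustration) of what this slicing does to the tensor shape:

import torch

out = torch.zeros(1, 6, 10)        # [batch_size, num_step, hidden_size]
last = out[:, -1, :]               # keep only the last time step
print(out.size(), last.size())     # torch.Size([1, 6, 10]) torch.Size([1, 10])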

Using __init__ and forward

__init__ is a special method of Python classes, and forward is the method that PyTorch's nn.Module calls for you.

  • If __init__ is defined, it runs automatically when the class is instantiated, and the instantiation arguments are passed to __init__
  • If forward is defined, it is executed automatically when the instance is called (nn.Module's __call__ dispatches to forward)
# Instantiate the SimpleRNN class; at this point the __init__ function is executed
rnn = SimpleRNN(input_size = 4, hidden_size = 1, output_size = 3, num_layers = 1)

# Call the SimpleRNN instance; this runs forward
output, hidden = rnn(input, hidden)

Calling the instance performs one forward pass of the network: input -> output

2. It’s time to start training

The first step is to construct the loss function and the optimizer.

PyTorch ships with the common loss functions and optimizers; a single line creates each of them.

criterion = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr = 0.001)

Loss function (criterion): records the training loss; all the weights are adjusted according to the loss value at each step. NLLLoss (negative log-likelihood loss) is used here: combined with the LogSoftmax output of the network, it measures how far the predicted probability distribution is from the true class.

# output = predicted value, y = true value
loss = criterion(output, y)
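As a rough illustration of what NLLLoss computes (my own sketch, not course code, assuming PyTorch 0.4+ syntax; with older versions wrap the tensors in Variable): given log-probabilities and the true class index, it returns the negative log-probability assigned to the true class.

import torch
import torch.nn as nn

scores = torch.Tensor([[2.0, 0.5, 0.1]])        # raw scores for 3 classes
log_probs = nn.LogSoftmax(dim = 1)(scores)      # log-probabilities
target = torch.LongTensor([0])                  # the true class is 0

print(nn.NLLLoss()(log_probs, target))          # equals -log_probs[0, 0]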

Optimizer: performs the parameter updates during training, including the gradient step and the gradient clearing. It is given the network's parameters (rnn.parameters()) and the learning rate lr.

# Clear the gradients
optimizer.zero_grad()
# Take one gradient-descent step, adjusting the weights
optimizer.step()

The training process

The training idea is:

  1. Prepare the training, validation, and test data (each item is a sequence of digits)
  2. Loop over the digits in each sequence: the current digit is the input, the next digit is the label (ground truth)
  3. Each step is passed through the RNN network
  4. The loss of each sequence is computed and accumulated into train_loss
  5. The optimizer updates the parameters
  6. Repeat steps 2 to 5 for a number of epochs

Preparing the training data is outside the scope of this article; the processed result looks like this:

train_set = [[3, 0, 0, 1, 1, 2],
            [3, 0, 1, 2],
            [3, 0, 0, 0, 1, 1, 1, 2],
            [3, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2]
            ...]

Start training

# Train for 50 epochs
num_epoch = 50
loss_list = []
for epoch in range(num_epoch):
    train_loss = 0
    # Shuffle train_set so that each epoch sees the data in a different order
    np.random.shuffle(train_set)
    # Loop over the sequences in train_set
    for i, seq in enumerate(train_set):
        loss = 0
        hidden = rnn.initHidden() # Initialize the hidden state for this sequence
        # Loop over all characters in the sequence
        for t in range(len(seq) - 1):
            # The current character is the input
            x = Variable(torch.LongTensor([seq[t]]).unsqueeze(0))
            # x size: batch_size = 1, time_steps = 1, data_dimension = 1
            # The next character is the label
            y = Variable(torch.LongTensor([seq[t + 1]]))
            # y size: batch_size = 1, data_dimension = 1
            output, hidden = rnn(x, hidden) # RNN forward pass
            # output size: batch_size = 1, output_size = 3
            # hidden size: layer_size = 1, batch_size = 1, hidden_size
            loss += criterion(output, y) # Accumulate the loss
        loss = 1.0 * loss / len(seq) # Average loss per character
        optimizer.zero_grad() # Clear the gradients
        loss.backward() # Backpropagation
        optimizer.step() # One step of gradient descent
        train_loss += loss # Accumulate the loss over the epoch
        # Print intermediate results
        if i > 0 and i % 500 == 0:
            print('Epoch {}, sequence {}, training loss: {:.2f}'.format(epoch, i, train_loss.data.numpy()[0] / i))
    loss_list.append(train_loss)

The loss here is the loss per training epoch. No matter how the training goes, this loss keeps decreasing, because the network fits the training data as closely as it can; the training-set loss alone therefore cannot be used to judge how well a model has trained.

In practice, after each epoch we evaluate the model on a validation set and compute its loss, which gives a more objective picture.

The loss is computed exactly as on the training set, except that train_set is replaced with valid_set and no parameter updates are performed (those already happened in the training steps). The validation set only checks how well the model has trained:

for epoch in range(num_epoch):
    # ... the training steps shown above ...

    valid_loss = 0
    for i, seq in enumerate(valid_set):
        # Loop over each sequence in valid_set
        loss = 0
        outstring = ''
        targets = ''
        hidden = rnn.initHidden() # Initialize the hidden state
        for t in range(len(seq) - 1):
            # Loop over each character
            x = Variable(torch.LongTensor([seq[t]]).unsqueeze(0))
            # x size: batch_size = 1, time_steps = 1, data_dimension = 1
            y = Variable(torch.LongTensor([seq[t + 1]]))
            # y size: batch_size = 1, data_dimension = 1
            output, hidden = rnn(x, hidden)
            # output size: batch_size = 1, output_size = 3
            # hidden size: layer_size = 1, batch_size = 1, hidden_size
            loss += criterion(output, y) # Compute the loss
        loss = 1.0 * loss / len(seq) # Average loss per character
        valid_loss += loss # Accumulate the validation loss
    # Print the result (errors, the count of mispredicted characters, is accumulated in code omitted here)
    print('Epoch %d, training loss: %f, validation loss: %f, error rate: %f' % (epoch,
          train_loss.data.numpy() / len(train_set),
          valid_loss.data.numpy() / len(valid_set),
          1.0 * errors / len(valid_set)))

From the validation loss printed after each epoch, we can plot how the loss changes over training.
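For example, a minimal plotting sketch (my own, assuming matplotlib is available and that a valid_loss_list with one averaged validation loss per epoch was collected during training):

import matplotlib.pyplot as plt

# valid_loss_list is assumed to hold one validation loss value per epoch
plt.plot(range(num_epoch), valid_loss_list, label = 'validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()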

3. Test the model's predictions

Construct some data to test whether the model can guess the next digit from the current one, and how often it succeeds. First, build the test data: sequences of n zeros followed by n ones, for n from 0 to 19

for n in range(20):
    inputs = [0] * n + [1] * n

Each sequence is then tested

for n in range(20):
    inputs = [0] * n + [1] * n

    outstring = ''
    targets = ''
    diff = 0
    hiddens = []
    hidden = rnn.initHidden()
    for t in range(len(inputs) - 1):
        x = Variable(torch.LongTensor([inputs[t]]).unsqueeze(0))
        # x size: batch_size = 1, time_steps = 1, data_dimension = 1
        y = Variable(torch.LongTensor([inputs[t + 1]]))
        # y size: batch_size = 1, data_dimension = 1
        output, hidden = rnn(x, hidden)
        # output size: batch_size = 1, output_size = 3
        # hidden size: layer_size = 1, batch_size = 1, hidden_size
        hiddens.append(hidden.data.numpy()[0][0])
        #mm = torch.multinomial(output.view(-1).exp())
        mm = torch.max(output, 1)[1][0]
        outstring += str(mm.data.numpy()[0])
        targets += str(y.data.numpy()[0])
        # Count the number of characters where the model output differs from the target
        diff += 1 - mm.eq(y)
    # Print the generated string and the target string
    print(outstring)
    print(targets)
    print('Diff:{}'.format(diff.data.numpy()[0]))
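The line mm = torch.max(output, 1)[1][0] picks the index of the largest log-probability, i.e. the model's predicted character. A quick sketch (my own illustration):

import torch

out = torch.log(torch.Tensor([[0.1, 0.7, 0.2]]))   # fake log-probabilities for 3 classes
values, indices = torch.max(out, 1)                # max over dimension 1
print(indices)                                     # 1 -> the predicted character is "1"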

The final output is

[0, 1, 2]
[0, 1, 2]
Diff: 0
[0, 0, 1, 1, 2]
[0, 0, 1, 1, 2]
Diff: 0
[0, 0, 0, 1, 1, 1, 2]
[0, 0, 0, 1, 1, 1, 2]
Diff: 0
... # The remaining results are not listed here; try it yourself

Conclusion

Neural networks can be understood as the process by which computers use various mathematical techniques to find patterns in a pile of data. We can understand the inner workings of neural networks by dissecting some simple tasks. When faced with complex tasks, just hand the data to the model and it will do its best to give you a good result.

This article is a summary of the PyTorch Introduction course: Deep Learning on the Torch — Natural Language Processing (NLP) series. The course also contains rich material on LSTMs, translation tasks, and hands-on practice, which I will summarize in future articles.