Author: Fernando Lopez | Source: Towards Data Science

“There are no rules for writing. Sometimes it comes easily and perfectly; sometimes it’s like drilling a hole in a rock and blasting it open with dynamite.” — Ernest Hemingway

The purpose of this blog is to explain how to build an end-to-end text generation model by implementing a powerful architecture based on LSTMs.

The blog is divided into the following sections:

  • Introduction

  • Text preprocessing

  • Sequence generation

  • Model architecture

  • Training phase

  • Text generation

For the full code visit: github.com/FernandoLpz…

Introduction

Over the years, many approaches have been proposed for modeling natural language. But what does “modeling natural language” mean? Essentially, we can think of it as reasoning about the semantics and syntax that make up a language, although it goes further than that.

Currently, the field of natural language processing (NLP) handles a variety of tasks that involve reasoning about, understanding, and modeling language through different methods and techniques.

The field of NLP has developed rapidly over the past decade, and many models tackle different NLP tasks from different perspectives. The common denominator among the most popular models is that they are based on deep learning.

As mentioned, NLP covers a wide range of problems. In this blog we will tackle the problem of text generation using deep learning models based on recurrent neural networks, specifically LSTM and Bi-LSTM. We will develop the model with one of today’s most popular deep learning frameworks, PyTorch, and in particular its LSTMCell class.

Problem statement

Given a text, the neural network learns the semantics and syntax of that text from sequences of characters. A random sequence of characters is then sampled and the model predicts the next character.

Text preprocessing

First, we need a text to work with. There are different resources where you can find books in plain text; I suggest you check out Project Gutenberg (www.gutenberg.org/).

In this case, I’ll use George Bird Grinnell’s “Jack Among the Indians,” which you can find here: www.gutenberg.org/cache/epub/… The book begins like this:

The train rushed down the hill, with a long shrieking whistle, and then began to go more and more slowly. Thomas had brushed Jack off and thanked him for the coin that he put in his hand, and with the bag in one hand and the stool in the other now went out onto the platform and down the steps, Jack closely following.

As you can see, the text contains uppercase and lowercase letters, line breaks, punctuation marks, and so on. It is recommended to adjust the text into a form that is easier to work with, which mainly reduces the complexity of the model we will develop.

We want to convert each character to its lowercase form. Also, it is advisable to handle the text as a list of characters, that is, we will work with a list of characters rather than with strings. The purpose of treating the text as a sequence of characters is to make it easier to build the sequences that will be fed to the model (we’ll cover this in more detail in the next section).

Code snippet 1 - Preprocessing

def read_dataset(file):
    letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',' ']

    # Open the raw file
    with open(file, 'r') as f:
        raw_text = f.readlines()

    # Convert each line to lowercase
    raw_text = [line.lower() for line in raw_text]

    # Create a single string containing the whole text
    text_string = ''
    for line in raw_text:
        text_string += line.strip()

    # Create a list of characters
    text = list()
    for char in text_string:
        text.append(char)

    # Remove all symbols and keep only letters
    text = [char for char in text if char in letters]

    return text

As we can see, at the beginning of the function we define the set of characters to keep; every other symbol will be discarded, and only the “white space” symbol is kept.

Then we read the raw file and convert each of its lines to lowercase.

In the following loops we build a single string that represents the whole book and turn it into a list of characters. Finally, we filter that list, keeping only the letters defined at the beginning of the function.

Therefore, once the text is loaded and preprocessed, starting from a text like this:

text = "The train rushed down the hill."

we will get a list of characters like this:

text = ['t','h','e',' ','t','r','a','i','n',' ','r','u','s','h','e','d',' ','d','o','w','n',
' ','t','h','e',' ','h','i','l','l']

We now have the full text as a list of characters. As we know, we cannot feed raw characters directly into a neural network; we need a numerical representation, so each character must be converted to a numeric form. To do this, we will create two dictionaries that hold the “char-to-index” and “index-to-char” mappings.

Code snippet 2 - Dictionary creation

def create_dictionary(text):

    char_to_idx = dict()
    idx_to_char = dict()

    idx = 0
    for char in text:
        if char not in char_to_idx.keys():

            # Build the dictionaries
            char_to_idx[char] = idx
            idx_to_char[idx] = char
            idx += 1

    return char_to_idx, idx_to_char

Notice how both the “char-to-index” and “index-to-char” dictionaries are filled inside the loop.

So far, we’ve shown how to load text and save it as a list of characters, and we’ve created two dictionaries to help us encode and decode each character.
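To make this concrete, here is a minimal usage sketch of the two functions defined above; the file name is just a placeholder for whatever plain-text file you downloaded:

text = read_dataset('jack_among_the_indians.txt')   # placeholder file name
char_to_idx, idx_to_char = create_dictionary(text)

# Encode a few characters into their indices
encoded = [char_to_idx[char] for char in ['t', 'h', 'e']]

# Decode the indices back into characters
decoded = [idx_to_char[idx] for idx in encoded]

print(encoded)   # the actual indices depend on the order of first appearance in the text
print(decoded)   # ['t', 'h', 'e']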

Sequence generation

The way the sequences are generated depends entirely on the type of model we are implementing. As mentioned, we will use LSTM-type recurrent neural networks, which receive data sequentially (in time steps).

For our model, we need to form sequences of a given length, which we will call a “window”. Each sequence is made of the characters contained in the window, and the character to be predicted (the target) is always the character that comes right after the window. To form a new sequence, the window slides one character to the right at a time. We can see this process clearly in the figure below.

In this case, the window has a size of 4, which means it will contain 4 characters. The target is the first character to the right of the window.

So far, we’ve seen how to generate character sequences in a simple way. Now we need to convert each character to its numeric form; for this we will use the dictionaries generated during the preprocessing phase. This process can be visualized in the following figure.

Good. Now that we know how to generate character sequences using a window that slides one character at a time, and how to convert characters to their numeric format, the code snippet below shows the whole process.

Code snippet 3 - Sequence generation

import numpy as np

def build_sequences(text, char_to_idx, window):
    x = list()
    y = list()

    for i in range(len(text)):
        try:
            # Get the window of characters from the text
            # and convert it to its idx representation
            sequence = text[i:i+window]
            sequence = [char_to_idx[char] for char in sequence]

            # Get the target (the character right after the window)
            # and convert it to its idx representation
            target = text[i+window]
            target = char_to_idx[target]

            # Save the sequence and the target
            x.append(sequence)
            y.append(target)

        except:
            pass

    x = np.array(x)
    y = np.array(y)

    return x, y
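As a quick sanity check, here is a hypothetical usage sketch of build_sequences on a tiny text with a window of 4 (the actual index values depend on the dictionary built from the text):

text = list("the train rushed down the hill")
char_to_idx, idx_to_char = create_dictionary(text)

x, y = build_sequences(text, char_to_idx, window=4)

# x[0] holds the indices of ['t', 'h', 'e', ' ']
# y[0] holds the index of 't', the character right after the first window
print(x.shape, y.shape)
print(x[0], y[0])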

Great, now we know how to preprocess raw text, how to turn it into a list of characters, and how to generate sequences in numeric format. Let’s move on to the most interesting part: the model architecture.

Model architecture

As you read in the title of this blog, we will use Bi-LSTM recurrent neural networks together with standard LSTMs. Essentially, we use this type of network because of its great potential for processing sequential data, such as text. There are also numerous articles on the use of recurrent-network-based architectures (RNN, LSTM, GRU, Bi-LSTM, etc.) for text modeling, especially text generation [1, 2].

The architecture of the proposed neural network consists of an embedding layer, a Bi-LSTM layer, and an LSTM layer. The latter LSTM is then connected to a linear layer.

Methodology

The methodology consists of passing each sequence of characters to the embedding layer, which generates a vector representation for each element of the sequence; in this way we obtain a sequence of embedded characters. Each element of this embedded sequence is then passed to the Bi-LSTM layer. At every time step, the outputs of the two LSTMs that make up the Bi-LSTM (the forward LSTM and the backward LSTM) are concatenated. Next, each forward + backward concatenated vector is passed to the LSTM layer, whose last hidden state is fed to the linear layer. This final linear layer will have a Softmax function as activation, representing the probability of each character. The following figure shows the described methodology.

So far, we have described the architecture of the text generation model. Next, we need to see how to implement all of this with the PyTorch framework, but first I would like to briefly explain how the Bi-LSTM and the LSTM work together, so that later we can see how to reflect this in the code. So let’s look at how a Bi-LSTM network works.

Bi-LSTM and LSTM

The key difference between a standard LSTM and a Bi-LSTM is that a Bi-LSTM is made up of two LSTMs, commonly called the “forward LSTM” and the “backward LSTM”. Basically, the forward LSTM receives the sequence in its original order, while the backward LSTM receives the sequence in reverse order. Then, depending on what we want to do, the hidden states of the two LSTMs at each time step can be concatenated, or we can operate only on the final states of both LSTMs. In the proposed model, we concatenate the two hidden states at each time step.
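To make the interaction concrete, here is a small, self-contained sketch (with made-up sizes, independent of the model below) showing how two LSTMCells can read a sequence in opposite directions and how their hidden states can be concatenated at each time step:

import torch
import torch.nn as nn

seq_len, batch_size, hidden_dim = 4, 2, 8
x = torch.randn(seq_len, batch_size, hidden_dim)   # a toy embedded sequence

fwd_cell = nn.LSTMCell(hidden_dim, hidden_dim)
bwd_cell = nn.LSTMCell(hidden_dim, hidden_dim)

hf, cf = torch.zeros(batch_size, hidden_dim), torch.zeros(batch_size, hidden_dim)
hb, cb = torch.zeros(batch_size, hidden_dim), torch.zeros(batch_size, hidden_dim)

forward_states, backward_states = [], []

# The forward LSTM reads the sequence in its original order
for t in range(seq_len):
    hf, cf = fwd_cell(x[t], (hf, cf))
    forward_states.append(hf)

# The backward LSTM reads the sequence in reverse order
for t in reversed(range(seq_len)):
    hb, cb = bwd_cell(x[t], (hb, cb))
    backward_states.append(hb)

# Concatenate the hidden states of both directions at each time step
combined = [torch.cat((f, b), dim=1) for f, b in zip(forward_states, backward_states)]
print(combined[0].shape)   # torch.Size([2, 16]) -> batch_size x (hidden_dim * 2)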

Good, now we understand the key difference between a Bi-LSTM and an LSTM. Returning to the example we are developing, the following figure shows how each character sequence evolves as it moves through the model.

Great. Now that the interaction between the Bi-LSTM and the LSTM is clear, let’s see how we can implement it in code using only LSTMCell from the PyTorch framework.

First, let’s look at the constructor of the TextGenerator class in the following code snippet:

Code snippet 4 - Constructor of the TextGenerator class

class TextGenerator(nn.ModuleList):

    def __init__(self, args, vocab_size):
        super(TextGenerator, self).__init__()

        self.batch_size = args.batch_size
        self.hidden_dim = args.hidden_dim
        self.input_size = vocab_size
        self.num_classes = vocab_size
        self.sequence_len = args.window

        # Dropout
        self.dropout = nn.Dropout(0.25)

        # Embedding layer
        self.embedding = nn.Embedding(self.input_size, self.hidden_dim, padding_idx=0)

        # Bi-LSTM
        # forward and backward cells
        self.lstm_cell_forward = nn.LSTMCell(self.hidden_dim, self.hidden_dim)
        self.lstm_cell_backward = nn.LSTMCell(self.hidden_dim, self.hidden_dim)

        # LSTM layer
        self.lstm_cell = nn.LSTMCell(self.hidden_dim * 2, self.hidden_dim * 2)

        # Linear layer
        self.linear = nn.Linear(self.hidden_dim * 2, self.num_classes)

As we can see, in the first lines of the constructor we define the parameters used to initialize each layer of the neural network. Note that input_size is equal to the vocabulary size (that is, the number of elements our dictionary generated during preprocessing). Likewise, the number of classes to predict is also the vocabulary size, and sequence_len is the size of the window.

We then define the two LSTMCells (forward and backward) that make up the Bi-LSTM, followed by the LSTMCell that will be fed with the output of the Bi-LSTM. It is worth mentioning that its hidden state size is twice that of the Bi-LSTM cells, because the Bi-LSTM outputs are concatenated. The linear layer is defined last; its output will later be passed through a Softmax function.
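As a quick illustration, the constructor can be exercised with a hypothetical args object that only needs the three attributes used above (batch_size, hidden_dim, window); the values below are placeholders:

from argparse import Namespace

args = Namespace(batch_size=64, hidden_dim=128, window=100)
vocab_size = 27   # e.g. 26 letters plus the white space

model = TextGenerator(args, vocab_size)
print(model)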

Once the constructor is defined, we need to create the tensors that will hold the cell state and hidden state for each LSTM. We do this as follows:

Code snippet 5 - Weight initialization

# Bi-LSTM
# hs = [batch_size x hidden_size]
# cs = [batch_size x hidden_size]
hs_forward = torch.zeros(x.size(0), self.hidden_dim)
cs_forward = torch.zeros(x.size(0), self.hidden_dim)
hs_backward = torch.zeros(x.size(0), self.hidden_dim)
cs_backward = torch.zeros(x.size(0), self.hidden_dim)

# LSTM
# hs = [batch_size x (hidden_size * 2)]
# cs = [batch_size x (hidden_size * 2)]
hs_lstm = torch.zeros(x.size(0), self.hidden_dim * 2)
cs_lstm = torch.zeros(x.size(0), self.hidden_dim * 2)

# Weight initialization
torch.nn.init.kaiming_normal_(hs_forward)
torch.nn.init.kaiming_normal_(cs_forward)
torch.nn.init.kaiming_normal_(hs_backward)
torch.nn.init.kaiming_normal_(cs_backward)
torch.nn.init.kaiming_normal_(hs_lstm)
torch.nn.init.kaiming_normal_(cs_lstm)

Once the tensors containing the hidden and cell states are defined, it is time to show how the entire architecture is assembled.

First, let’s take a look at the following code snippet:

Code snippet 6 - Bi-LSTM + LSTM + Linear layer

# From idx to embedding
out = self.embedding(x)

# Prepare the shape for the LSTM
out = out.view(self.sequence_len, x.size(0), -1)

forward = []
backward = []

# Unfolding the Bi-LSTM
# Forward pass
for i in range(self.sequence_len):
    hs_forward, cs_forward = self.lstm_cell_forward(out[i], (hs_forward, cs_forward))
    hs_forward = self.dropout(hs_forward)
    cs_forward = self.dropout(cs_forward)
    forward.append(hs_forward)

# Backward pass
for i in reversed(range(self.sequence_len)):
    hs_backward, cs_backward = self.lstm_cell_backward(out[i], (hs_backward, cs_backward))
    hs_backward = self.dropout(hs_backward)
    cs_backward = self.dropout(cs_backward)
    backward.append(hs_backward)

# LSTM
for fwd, bwd in zip(forward, backward):
    input_tensor = torch.cat((fwd, bwd), 1)
    hs_lstm, cs_lstm = self.lstm_cell(input_tensor, (hs_lstm, cs_lstm))

# The last hidden state is passed through the linear layer
out = self.linear(hs_lstm)

To understand this better, let’s trace the code with some concrete values, so we can see how each tensor is passed from one layer to the next. Suppose we have:

batch_size = 64
hidden_size = 128
sequence_len = 100
num_classes = 27

So the x input tensor will have a shape:

# torch.Size([batch_size, sequence_len])
x : torch.Size([64, 100])

Then, the x tensor is passed through the embedding layer, so the output will have the size:

# torch.Size([batch_size, sequence_len, hidden_size])
x_embedded : torch.Size([64, 100, 128])

Note that we then reshape the x_embedded tensor. This is because we need the sequence length as the first dimension, essentially because in the Bi-LSTM we will iterate over each time step of the sequence. The reshaped tensor will therefore have the shape:

# torch.Size([sequence_len, batch_size, hidden_size])
x_embedded_reshaped : torch.Size([100, 64, 128])

Next, the forward and backward lists are defined; they will store the hidden states of the Bi-LSTM.

Now it’s time to feed the data into the Bi-LSTM. First, in the forward loop, we iterate over the sequence with the forward LSTM, saving the hidden state (hs_forward) at each time step. Then, in the backward loop, we iterate over the sequence with the backward LSTM, saving the hidden state (hs_backward) at each time step; notice that this loop traverses the same sequence, just in reverse order. Each hidden state will have the following shape:

# hs_forward : torch.Size([batch_size, hidden_size])
hs_forward : torch.Size([64, 128])

# hs_backward : torch.Size([batch_size, hidden_size])
hs_backward : torch.Size([64, 128])

Good. Now let’s see how the data is fed to the last LSTM layer. For this we use the forward and backward lists: we iterate over the pairs of hidden states and concatenate each forward hidden state with its backward counterpart. Note that by concatenating the two hidden states, the size of the tensor doubles, i.e., the tensor will have the following shape:

# input_tensor : torch.Size([batch_size, hidden_size * 2])
input_tensor : torch.Size([64, 256])

Finally, the LSTM returns a hidden state of size:

# hs_lstm : torch.Size([batch_size, hidden_size * 2])
hs_lstm : torch.Size([64, 256])

This last hidden state of the LSTM is then passed through the linear layer, which maps it to the number of classes and produces an output of shape [64, 27], as shown in the last line of the snippet. The full forward function is shown in the following code snippet:

Code snippet 7 - The forward function

def forward(self, x):

    # Bi-LSTM
    # hs = [batch_size x hidden_size]
    # cs = [batch_size x hidden_size]
    hs_forward = torch.zeros(x.size(0), self.hidden_dim)
    cs_forward = torch.zeros(x.size(0), self.hidden_dim)
    hs_backward = torch.zeros(x.size(0), self.hidden_dim)
    cs_backward = torch.zeros(x.size(0), self.hidden_dim)

    # LSTM
    # hs = [batch_size x (hidden_size * 2)]
    # cs = [batch_size x (hidden_size * 2)]
    hs_lstm = torch.zeros(x.size(0), self.hidden_dim * 2)
    cs_lstm = torch.zeros(x.size(0), self.hidden_dim * 2)

    # Weight initialization
    torch.nn.init.kaiming_normal_(hs_forward)
    torch.nn.init.kaiming_normal_(cs_forward)
    torch.nn.init.kaiming_normal_(hs_backward)
    torch.nn.init.kaiming_normal_(cs_backward)
    torch.nn.init.kaiming_normal_(hs_lstm)
    torch.nn.init.kaiming_normal_(cs_lstm)

    # From idx to embedding
    out = self.embedding(x)

    # Prepare the shape for the LSTM
    out = out.view(self.sequence_len, x.size(0), -1)

    forward = []
    backward = []

    # Unfolding the Bi-LSTM
    # Forward pass
    for i in range(self.sequence_len):
        hs_forward, cs_forward = self.lstm_cell_forward(out[i], (hs_forward, cs_forward))
        hs_forward = self.dropout(hs_forward)
        cs_forward = self.dropout(cs_forward)
        forward.append(hs_forward)

    # Backward pass
    for i in reversed(range(self.sequence_len)):
        hs_backward, cs_backward = self.lstm_cell_backward(out[i], (hs_backward, cs_backward))
        hs_backward = self.dropout(hs_backward)
        cs_backward = self.dropout(cs_backward)
        backward.append(hs_backward)

    # LSTM
    for fwd, bwd in zip(forward, backward):
        input_tensor = torch.cat((fwd, bwd), 1)
        hs_lstm, cs_lstm = self.lstm_cell(input_tensor, (hs_lstm, cs_lstm))

    # The last hidden state is passed through the linear layer
    out = self.linear(hs_lstm)

    return out

So far, we have seen how to assemble the neural network using LSTMCell in PyTorch. Now it’s time to see how the training phase works, so let’s move on to the next section.

Training phase

Great, we’ve reached the training phase. To perform the training, we need to initialize the model and the optimizer, and then iterate over each epoch and each mini-batch, so let’s get started!

Code snippet 8 - Training phase

# Assumed module-level imports: torch, torch.optim as optim, torch.nn.functional as F
def train(self, args):

  # Model initialization
  model = TextGenerator(args, self.vocab_size)

  # Optimizer initialization
  optimizer = optim.RMSprop(model.parameters(), lr=self.learning_rate)

  # Defining the number of batches
  num_batches = int(len(self.sequences) / self.batch_size)

  # Set the model in training mode
  model.train()

  # Training phase
  for epoch in range(self.num_epochs):

    # Mini batches
    for i in range(num_batches):

      # Batch definition
      try:
        x_batch = self.sequences[i * self.batch_size : (i + 1) * self.batch_size]
        y_batch = self.targets[i * self.batch_size : (i + 1) * self.batch_size]
      except:
        x_batch = self.sequences[i * self.batch_size :]
        y_batch = self.targets[i * self.batch_size :]

      # Convert the numpy arrays into torch tensors
      x = torch.from_numpy(x_batch).type(torch.LongTensor)
      y = torch.from_numpy(y_batch).type(torch.LongTensor)

      # Feed the model
      y_pred = model(x)

      # Loss calculation
      loss = F.cross_entropy(y_pred, y.squeeze())

      # Clear the gradients
      optimizer.zero_grad()

      # Backpropagation
      loss.backward()

      # Update the parameters
      optimizer.step()

      print("Epoch: %d , loss: %.5f " % (epoch, loss.item()))

Once the model has been trained, we need to save the weights of the neural network so that we can use them later to generate text. For this we have two options: the first is to train for a fixed number of epochs and then save the weights, and the second is to define a stopping criterion that keeps the best version of the model. In this particular case, we will go with the first option. After training the model for a given number of epochs, we save the weights as follows:

Code snippet 9 - Saving the weights

# save weight
torch.save(model.state_dict(), 'weights/textGenerator_model.pt')
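Later, to reuse the trained model, the saved weights can be loaded back. A minimal sketch, assuming the same TextGenerator definition and the file path used above:

# Recreate the model with the same arguments used during training
model = TextGenerator(args, vocab_size)

# Load the saved weights and switch the model to evaluation mode
model.load_state_dict(torch.load('weights/textGenerator_model.pt'))
model.eval()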

So far, we’ve seen how to train the text generator and how to save its weights. Now let’s move on to the final part of this blog: text generation!

Text generation

We’ve reached the last part of the blog: text generation. For this we need to do two things: first, load the trained weights, and second, take a random sample from the sequence set as the seed pattern and start generating the next characters. Let’s take a look at the following code snippet:

Code snippet 10 - Text generator

# Assumed module-level imports: numpy as np, torch, torch.nn as nn
def generator(model, sequences, idx_to_char, n_chars):

  # Set the model in evaluation mode
  model.eval()

  # Define the softmax function
  softmax = nn.Softmax(dim=1)

  # Randomly select an index from the sequence set
  start = np.random.randint(0, len(sequences)-1)

  # The pattern is defined by the given random idx
  pattern = sequences[start]

  # Using the dictionary, print the pattern
  print("\nPattern: \n")
  print(''.join([idx_to_char[value] for value in pattern]), "\"")

  # In full_prediction we will save the complete prediction
  full_prediction = pattern.copy()

  # The prediction starts; a given number of characters will be predicted
  for i in range(n_chars):

    # The pattern is converted to a tensor and reshaped
    pattern = torch.from_numpy(pattern).type(torch.LongTensor)
    pattern = pattern.view(1, -1)

    # Make a prediction given the pattern
    prediction = model(pattern)
    # Apply the softmax function to the prediction tensor
    prediction = softmax(prediction)

    # The prediction tensor is converted to a numpy array
    prediction = prediction.squeeze().detach().numpy()
    # Take the idx with the highest probability
    arg_max = np.argmax(prediction)

    # The current pattern tensor is converted to a numpy array
    pattern = pattern.squeeze().detach().numpy()
    # The window slides 1 character to the right
    pattern = pattern[1:]
    # The new pattern is composed of the "old" pattern + the predicted character
    pattern = np.append(pattern, arg_max)

    # Save the full prediction
    full_prediction = np.append(full_prediction, arg_max)

  print("Prediction: \n")
  print(''.join([idx_to_char[value] for value in full_prediction]), "\"")
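For reference, a hypothetical call that puts the pieces together, assuming the model was loaded as shown earlier and sequences and idx_to_char come from the preprocessing steps; n_chars is the number of characters to generate:

# Generate 1000 characters starting from a random seed sequence
generator(model, sequences, idx_to_char, n_chars=1000)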

So, by training the model with the following settings:

window : 100
epochs : 50
hidden_dim : 128
batch_size : 128
learning_rate : 0.001

We can generate the following:

Seed:
one of the prairie swellswhich gave a little wider view than most of them jack saw quite close to the

Prediction:
one of the prairie swellswhich gave a little wider view than most of them jack saw quite close to the wnd banngessejang boffff we outheaedd we band r hes tller a reacarof t t alethe ngothered uhe th wengaco ack fof ace ca e s alee bin cacotee tharss th band fofoutod we we ins sange trre anca y w farer we sewigalfetwher d e we n s shed pack wngaingh tthe  we the we javes t supun f the har man bllle s ng ou y anghe ond we nd ba a she t t anthendwe wn me anom ly tceaig t i isesw arawns t d ks wao thalac tharr jad d anongive where the awe w we he is ma mie cack seat sesant sns t imes hethof riges we he d ooushe he hang out f t thu inong bll llveco we see s the he haa is s igg merin ishe d t san wack owhe o or  th we sbe se we we inange t ts wan br seyomanthe harntho thengn th me ny we ke in acor offff of wan s arghe we t angorro the wand be thing a sth t tha alelllll willllsse of s wed w brstougof bage orore he anthesww were ofawe ce qur the he sbaing tthe bytondece nd t llllifsffo acke o t in ir me hedlff scewant pi t bri pi owasem the awh thorathas th we  hed ofainginictoplid we me

As we can see, the generated text may not make any sense, but there are some words and phrases that seem to form an idea, for example:

we, band, pack, the, man, where, he, hang, out, be, thing, me, were

Congratulations, we’ve reached the end of the blog!

Conclusion

In this blog, we have shown how to build an end-to-end model for text generation using PyTorch’s LSTMCell, implementing an architecture based on recurrent neural networks (LSTM and Bi-LSTM).

It is worth noting that the proposed text generation model can be improved in different ways. Some suggested ideas are to increase the size of the text corpus used for training, increase the number of epochs, and increase the hidden size of each LSTM. Another interesting direction would be an architecture based on convolutional LSTMs.

References

[1] LSTM vs. GRU vs. Bidirectional RNN for Script Generation (arxiv.org/pdf/1908.04…)

[2] A Survey: Text Generation Models in Deep Learning (www.sciencedirect.com/science/art…)

Original article: towardsdatascience.com/text-genera…
