3: Faster sentiment analysis

In the previous article we introduced an upgraded sentiment analysis model based on an RNN. In this section, we will learn an approach that does not use an RNN: we will implement the model from the paper Bag of Tricks for Efficient Text Classification. This simple model achieves performance comparable to the sentiment analysis in the second article, but trains much faster.

3.1 Data preprocessing

The biggest difference between the FastText classification model and other text classification models is that it computes the n-grams of the input sentence and appends them to the end of the tokenized list as additional features that capture local word-order information. The basic idea of an n-gram is to slide a window of size n over the contents of the text, producing a sequence of fragments of length n; each fragment is called a gram. Specifically, we use bi-grams here.

For example, in the sentence "how are you ?", the bi-grams are: "how are", "are you" and "you ?".

The generate_bigrams function takes a tokenized sentence, computes the bi-grams and appends them to the end of the token list.

def generate_bigrams(x):
    n_grams = set(zip(*[x[i:] for i in range(2)]))
    for n_gram in n_grams:
        x.append(' '.join(n_gram))
    return x

Example:

generate_bigrams(['This', 'film', 'is', 'terrible'])
['This', 'film', 'is', 'terrible', 'film is', 'This film', 'is terrible']

TorchText's 'Field' has a preprocessing argument. The function passed here is applied to each sentence after it has been tokenized (converted from a string to a list of tokens), but before it is numericalized (converted from a list of tokens to a list of indexes). Here we pass the generate_bigrams function.

Since we are not using an RNN, we do not need packed padded sequences, so we do not need to set 'include_lengths = True'.

import torch
from torchtext.legacy import data
from torchtext.legacy import datasets

SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(tokenize = 'spacy',
                  tokenizer_language = 'en_core_web_sm',
                  preprocessing = generate_bigrams)

LABEL = data.LabelField(dtype = torch.float)

As before, load the IMDb dataset and create the train/validation split:

import random

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

train_data, valid_data = train_data.split(random_state = random.seed(SEED))

Build the vocabulary and load the pre-trained word embeddings:

MAX_VOCAB_SIZE = 25_000

TEXT.build_vocab(train_data, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)

LABEL.build_vocab(train_data)

Create iterators:

BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE, 
    device = device)

3.2 Model Building

FastText is a typical deep-learning word-vector method: it maps each word into a dense space with an 'Embedding' layer, then averages all of the word embeddings in the sentence and uses that average for classification. As a result, this model has far fewer parameters than the model in the previous chapter.

Specifically, it first computes an embedding for each word using the 'Embedding' layer (blue), then averages all of the word embeddings (pink), and finally feeds the result through the 'Linear' layer (silver).

We use the two-dimensional pooling function 'avg_pool2d' to average the words in the embedding space. We can think of the word embeddings as a two-dimensional grid, with the words along one axis and the embedding dimensions along the other. Below is an example sentence converted to 5-dimensional word embeddings, with the words along the vertical axis and the embedding dimensions along the horizontal axis. Every element of this [4×5] tensor is represented by a green block.

'avg_pool2d' uses a filter of size embedded.shape[1] (the sentence length) by 1, shown in pink below.

We compute the average of all the elements covered by the filter, then the filter slides to the right to compute the average of the next column of embedding values for each word in the sentence.

Each filter position yields one value: the average of all the elements it covers. After the filter has covered all the embedding dimensions, we obtain a [1×5] tensor, which is then passed through the linear layer to make the prediction.
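To make the pooling step concrete, here is a minimal sketch (not part of the original code) checking that 'avg_pool2d' with a filter of size (sentence length, 1) gives the same result as simply taking the mean of the word embeddings:

import torch
import torch.nn.functional as F

# one sentence of 4 tokens with 5-dimensional embeddings: [batch size, sent len, emb dim]
embedded = torch.randn(1, 4, 5)

# a (sent len x 1) filter averages over the words for each embedding dimension
pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1)

print(pooled.shape)                                  # torch.Size([1, 5])
print(torch.allclose(pooled, embedded.mean(dim=1)))  # True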

import torch.nn as nn
import torch.nn.functional as F

class FastText(nn.Module):
    def __init__(self, vocab_size, embedding_dim, output_dim, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        
        self.fc = nn.Linear(embedding_dim, output_dim)
        
    def forward(self, text):
        
        #text = [sent len, batch size]
        
        embedded = self.embedding(text)
                
        #embedded = [sent len, batch size, emb dim]
        
        embedded = embedded.permute(1, 0, 2)
        
        #embedded = [batch size, sent len, emb dim]
        
        pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1)
        
        #pooled = [batch size, embedding_dim]
                
        return self.fc(pooled)

Create an instance of the ‘FastText’ class as before:

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
OUTPUT_DIM = 1
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = FastText(INPUT_DIM, EMBEDDING_DIM, OUTPUT_DIM, PAD_IDX)

Looking at the number of parameters in the model, we see that it has roughly as many parameters as the standard RNN from the first section, and only about half as many as the previous model.

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')
The model has 2,500,301 trainable parameters
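As a quick sanity check (this arithmetic is our own, not part of the original text), the count can be reproduced by hand: TorchText adds the <unk> and <pad> tokens to the 25,000-word vocabulary, so the embedding layer holds 25,002 × 100 weights, and the linear layer adds 100 weights plus 1 bias:

vocab_size = 25_000 + 2                     # MAX_VOCAB_SIZE plus <unk> and <pad>
embedding_params = vocab_size * 100         # embedding layer: 2,500,200
linear_params = 100 * 1 + 1                 # linear layer: 100 weights + 1 bias
print(embedding_params + linear_params)     # 2500301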

Copy the pre-trained vectors to the embedding layer:

pretrained_embeddings = TEXT.vocab.vectors

model.embedding.weight.data.copy_(pretrained_embeddings)
tensor([[0.1117, 0.4966, 0.1631,  ..., 1.2647, 0.2753, 0.1325],
        [0.8555, 0.7208, 1.3755,  ..., 0.0825, 1.1314, 0.3997],
        [0.0382, 0.2449, 0.7281,  ..., 0.1459, 0.8278, 0.2706],
        ...,
        [0.1606, 0.7357, 0.5809,  ..., 0.8704, 1.5637, 1.5724],
        [1.3126, 1.6717, 0.4203,  ..., 0.2348, 0.9110, 1.0914],
        [1.5268, 1.5639, 1.0541,  ..., 1.0045, 0.6813, 0.8846]])

Zero the initial weights of the unknown and padding tokens:

UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

3.3 Training Model

Training the model is exactly the same as in the previous section.

Initialize the optimizer:

import torch.optim as optim

optimizer = optim.Adam(model.parameters())

Define the criterion and place the model and criterion on the GPU:

criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

Define the function to compute binary accuracy:

def binary_accuracy(preds, y):
    """Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8"""

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division
    acc = correct.sum() / len(correct)
    return acc

Define a function to train the model.

Note: since we won't be using dropout, we don't actually need to call model.train(), but we keep the line here as good practice.

def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        predictions = model(batch.text).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Define a function to evaluate the trained model.

Note: again, although we do not use dropout, we keep model.eval() as good practice.

def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()

    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.text).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Define a function that tells us how long an epoch takes:

import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

Finally, train our model:

N_EPOCHS = 5

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut3-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Obtain the test accuracy (with much less training time than the model in the previous section):

model.load_state_dict(torch.load('tut3-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')
Test Loss: 0.381 | Test Acc: 85.42%

3.4 Model Validation

Finally, we use the trained model to predict the sentiment of arbitrary sentences. Note that the input must go through the same preprocessing as the training data: tokenization followed by bigram generation.

import spacy
nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()
    # tokenize the sentence and append its bigrams, as during training
    tokenized = generate_bigrams([tok.text for tok in nlp.tokenizer(sentence)])
    # convert tokens to indexes
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    # add a batch dimension
    tensor = tensor.unsqueeze(1)
    # apply the sigmoid to get a probability between 0 and 1
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()

An example negative review:

predict_sentiment(model, "This film is terrible")
2.1313092350011553e-12

An example positive review:

predict_sentiment(model, "This film is great")
1.0

Summary

In the next section, we will use a convolutional neural network (CNN) for sentiment analysis.