• Part of Speech Tagging tutorial with the Keras Deep Learning Library
  • Original author: Cdiscount Data Science
  • Translated from: the Gold Miner Translation Project
  • Permalink to this article: github.com/xitu/gold-m…
  • Translator: luochen
  • Proofread by: Stormluke, Mingxing47

In this tutorial, you will see how to use a simple Keras model to train and evaluate an artificial neural network for a multi-class classification problem.

Photo by Joao Tzanno on Unsplash

Part-of-speech tagging is a well-known task in natural language processing. It refers to categorizing words into their parts of speech (also known as word classes or lexical categories). It is a supervised learning problem.

Artificial neural networks have been applied successfully to POS tagging and perform very well. We will focus on the multilayer perceptron, a very popular network architecture considered state of the art for the POS tagging problem. (Translator's note: RNNs actually work better for POS tagging problems.)

Let’s put it into practice!

In this article, you’ll get a quick tutorial on how to implement a simple multilayer perceptron in Keras, trained on the annotated corpus.

Ensuring reproducibility

To ensure that our experiment can be reproduced, we need to set a random seed:

import numpy as np

CUSTOM_SEED = 42
np.random.seed(CUSTOM_SEED)

Get the annotated corpus

Penn Treebank is a part-of-speech tagging corpus. A sample of it is available through [NLTK](https://github.com/nltk/nltk), a Python library that can be used for training and testing a number of natural language processing (NLP) models.

First, we download the annotated corpus:

import nltk

nltk.download('treebank')

Then we load the tagged sentences.

from nltk.corpus import treebank

sentences = treebank.tagged_sents(tagset='universal')

Let's pick a random tagged sentence and take a look:

import random

print(random.choice(sentences))

This is a list of tuples (term, tag).

[('Mr.', 'NOUN'), ('Otero', 'NOUN'), (',', '.'), ('who', 'PRON'), ('apparently', 'ADV'), ('has', 'VERB'), ('an', 'DET'), ('unpublished', 'ADJ'), ('number', 'NOUN'), (',', '.'), ('also', 'ADV'), ('could', 'VERB'), ("n't", 'ADV'), ('be', 'VERB'), ('reached', 'VERB'), ('.', '.')]

This is a multi-class classification problem with more than forty different classes. POS tagging on the Treebank corpus is a well-known problem, and we can expect the accuracy of the model to exceed 95%.

tags = set([
    tag for sentence in treebank.tagged_sents() 
    for _, tag in sentence
])
print('nb_tags: %s\ntags: %s' % (len(tags), tags))

This produces:

nb_tags: 46
tags: {'IN', 'VBZ', '.', 'RP', 'DT', 'VB', 'RBR', 'CC', '#', ',', 'VBP', 'WP$', 'PRP', 'JJ', 'RBS', 'LS', 'PRP$', 'WRB', 'JJS', '``', 'EX', 'POS', 'WP', 'VBN', '-LRB-', '-RRB-', 'FW', 'MD', 'VBG', 'TO', '$', 'NNS', 'NNPS', "''", 'VBD', 'JJR', ':', 'PDT', 'SYM', 'NNP', 'CD', 'RB', 'WDT', 'UH', 'NN', '-NONE-'}

Dataset preprocessing for supervised learning

We split the tagged sentences into three datasets:

  • the training set, which is the sample data used to fit the model,
  • the validation set, used to tune the hyperparameters of the classifier, for example to choose the number of neurons in the network,
  • the test set, used only to assess the performance of the classifier.

We use about 60% of the tagged sentences for training, 20% as the validation set, and 20% to evaluate our model.

train_test_cutoff = int(.80 * len(sentences))
training_sentences = sentences[:train_test_cutoff]
testing_sentences = sentences[train_test_cutoff:]

train_val_cutoff = int(.25 * len(training_sentences))
validation_sentences = training_sentences[:train_val_cutoff]
training_sentences = training_sentences[train_val_cutoff:]

Feature engineering

Our feature set is very simple. For each word, we create a feature dictionary based on the sentence the word is extracted from. These features include the words before and after the current word, as well as its prefixes and suffixes.

def add_basic_features(sentence_terms, index):
    """ Compute some very basic word features.

    :param sentence_terms: [w1, w2, ...]
    :type sentence_terms: list
    :param index: the index of the word
    :type index: int
    :return: dict containing features
    :rtype: dict
    """
    term = sentence_terms[index]
    return {
        'nb_terms': len(sentence_terms),
        'term': term,
        'is_first': index == 0,
        'is_last': index == len(sentence_terms) - 1,
        'is_capitalized': term[0].upper() == term[0],
        'is_all_caps': term.upper() == term,
        'is_all_lower': term.lower() == term,
        'prefix-1': term[0],
        'prefix-2': term[:2],
        'prefix-3': term[:3],
        'suffix-1': term[-1],
        'suffix-2': term[-2:],
        'suffix-3': term[-3:],
        'prev_word': '' if index == 0 else sentence_terms[index - 1],
        'next_word': '' if index == len(sentence_terms) - 1 else sentence_terms[index + 1]
    }

We map the list of sentences to the list of feature dictionaries.

def untag(tagged_sentence):
    """ Remove the tag from each tagged term.

    :param tagged_sentence: a POS-tagged sentence
    :type tagged_sentence: list
    :return: a list of words
    :rtype: list of strings
    """
    return [w for w, _ in tagged_sentence]

def transform_to_dataset(tagged_sentences):
    """ Split tagged sentences into X and y datasets and append basic NLP features.

    :param tagged_sentences: a list of POS-tagged sentences, each a list of tuples (term_i, tag_i)
    :return: a list of feature dicts (X) and a list of tags (y)
    """
    X, y = [], []

    for pos_tags in tagged_sentences:
        for index, (term, class_) in enumerate(pos_tags):
            # Add basic NLP features for each sentence term
            X.append(add_basic_features(untag(pos_tags), index))
            y.append(class_)
    return X, y

For the training, validation, and test sentences, we split the attributes into X (input variables) and y (output variables).

X_train, y_train = transform_to_dataset(training_sentences)
X_test, y_test = transform_to_dataset(testing_sentences)
X_val, y_val = transform_to_dataset(validation_sentences)

Feature encoding

Our neural network takes vectors as inputs, so we need to convert our dict features into vectors. scikit-learn's built-in [DictVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html) provides a very straightforward way to do this vectorization.

from sklearn.feature_extraction import DictVectorizer

# Fit the dictionary vector generator with our feature set
dict_vectorizer = DictVectorizer(sparse=False)
dict_vectorizer.fit(X_train + X_test + X_val)

# Convert dictionary features to vectors
X_train = dict_vectorizer.transform(X_train)
X_test = dict_vectorizer.transform(X_test)
X_val = dict_vectorizer.transform(X_val)
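Each word is now represented by a long, mostly-zero feature vector whose length is the number of distinct feature/value pairs seen by the vectorizer. A quick sanity check (a sketch; the exact numbers depend on the downloaded corpus):

print(X_train.shape)  # (number of training words, number of vectorized features)
print(X_val.shape, X_test.shape)  # same second dimension, which becomes input_dim below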

Our y vectors must be encoded as well. The output variable contains 49 different string values, which we encode as integers.

from sklearn.preprocessing import LabelEncoder

# Train tag encoders with category lists
label_encoder = LabelEncoder()
label_encoder.fit(y_train + y_test + y_val)

# Code category values as integers
y_train = label_encoder.transform(y_train)
y_test = label_encoder.transform(y_test)
y_val = label_encoder.transform(y_val)

We then need to convert these encoded values into dummy variables (one-hot encoding).

# Convert integers to dummy variables
from keras.utils import np_utils

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
y_val = np_utils.to_categorical(y_val)

Build the Keras model

[Keras](https://github.com/fchollet/keras/) is a high-level framework for designing and running neural networks. It can run on top of multiple backends such as [TensorFlow](https://github.com/tensorflow/tensorflow/), [Theano](https://github.com/Theano/Theano) and [CNTK](https://github.com/Microsoft/CNTK).

We want to create a very basic neural network: a multilayer perceptron. This kind of linear stack of layers can easily be built with the Sequential model. The model will contain an input layer, a hidden layer, and an output layer. To overcome overfitting, we use dropout regularization. We set the dropout rate to 20%, meaning that 20% of the input neurons are randomly dropped at each parameter update during training.

We use Rectified Linear Unit (ReLU) activations for the hidden layers because they are the simplest nonlinear activation functions available.

For a multi-class classification problem, we want to convert the neuron outputs into probabilities, which can be done with the softmax function. We decided to use the categorical cross-entropy loss function. Finally, we chose the Adam optimizer because it seems well suited to classification tasks.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

def build_model(input_dim, hidden_neurons, output_dim):
    """ Construct, compile and return a Keras model which will be used to fit/predict. """
    model = Sequential([
        Dense(hidden_neurons, input_dim=input_dim),
        Activation('relu'),
        Dropout(0.2),
        Dense(hidden_neurons),
        Activation('relu'),
        Dropout(0.2),
        Dense(output_dim, activation='softmax')
    ])

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Create a wrapper between the Keras API and Scikit-Learn

[Keras](https://github.com/fchollet/keras/) provides a wrapper called [KerasClassifier](https://keras.io/scikit-learn-api/). It implements the Scikit-Learn classifier interface.

All model parameters are defined below. We need to provide a function (build_fn) that returns the structure of the neural network. The number of hidden neurons and the batch size were chosen quite arbitrarily. We set the number of epochs to 5 because with more iterations the multilayer perceptron starts overfitting (even with dropout regularization).

from keras.wrappers.scikit_learn import KerasClassifier

model_params = {
    'build_fn': build_model,
    'input_dim': X_train.shape[1],
    'hidden_neurons': 512,
    'output_dim': y_train.shape[1],
    'epochs': 5,
    'batch_size': 256,
    'verbose': 1,
    'validation_data': (X_val, y_val),
    'shuffle': True
}

clf = KerasClassifier(**model_params)

Training the Keras model

Finally, we train the multilayer perceptron on the training set.

hist = clf.fit(X_train, y_train)

With the history callback, we can visualize the evolution of the model's log loss and accuracy over time.

import matplotlib.pyplot as plt

def plot_model_performance(train_loss, train_acc, train_val_loss, train_val_acc):
    """ Plot model loss and accuracy through epochs. """

    blue = '#34495E'
    green = '#2ECC71'
    orange = '#E23B13'

    # Draw the model loss curve
    fig, (ax1, ax2) = plt.subplots(2, figsize=(10, 8))
    ax1.plot(range(1, len(train_loss) + 1), train_loss, blue, linewidth=5, label='training')
    ax1.plot(range(1, len(train_val_loss) + 1), train_val_loss, green, linewidth=5, label='validation')
    ax1.set_xlabel('# epoch')
    ax1.set_ylabel('loss')
    ax1.tick_params('y')
    ax1.legend(loc='upper right', shadow=False)
    ax1.set_title('Model loss through #epochs', color=orange, fontweight='bold')

    # Draw model accuracy curve
    ax2.plot(range(1, len(train_acc) + 1), train_acc, blue, linewidth=5, label='training')
    ax2.plot(range(1, len(train_val_acc) + 1), train_val_acc, green, linewidth=5, label='validation')
    ax2.set_xlabel('# epoch')
    ax2.set_ylabel('accuracy')
    ax2.tick_params('y')
    ax2.legend(loc='lower right', shadow=False)
    ax2.set_title('Model accuracy through #epochs', color=orange, fontweight='bold')

Then, look at the performance of the model:

plot_model_performance(
    train_loss=hist.history.get('loss', []),
    train_acc=hist.history.get('acc', []),
    train_val_loss=hist.history.get('val_loss', []),
    train_val_acc=hist.history.get('val_acc', []))

Model performance through epochs.

After two epochs, we see that the model begins to overfit.
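
One common way to address this (not used in the original article, shown here only as a hypothetical variant) is to stop training automatically once the validation loss stops improving, via Keras' EarlyStopping callback:

from keras.callbacks import EarlyStopping

# Hypothetical variant: stop as soon as validation loss stops improving,
# instead of hard-coding 5 epochs. Extra fit kwargs are passed through to model.fit.
early_stopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1)
hist = clf.fit(X_train, y_train, callbacks=[early_stopping])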

Evaluating the multilayer perceptron

Since our model has been trained, we can evaluate it directly:

score = clf.score(X_test, y_test)
print(score)

[Out] 0.95816

Our accuracy on the test set is close to 96%, which is very impressive given the simple features we fed into the model. Keep in mind that 100% accuracy is not possible, even for human annotators: human accuracy on POS tagging is estimated at about 98%.
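
As a usage sketch (not part of the original article), the fitted dict_vectorizer and label_encoder from earlier can be reused to tag an unseen sentence. tag_sentence below is a hypothetical helper, and it assumes the KerasClassifier wrapper's predict returns the integer class indices produced by the label encoder:

def tag_sentence(sentence_terms):
    """ Hypothetical helper: predict a POS tag for each word of an untagged sentence. """
    # Reuse the same feature extraction as during training
    features = [add_basic_features(sentence_terms, index)
                for index in range(len(sentence_terms))]
    # Vectorize with the already-fitted DictVectorizer
    vectors = dict_vectorizer.transform(features)
    # Predict integer classes and map them back to tag strings
    predictions = clf.predict(vectors)
    return list(zip(sentence_terms, label_encoder.inverse_transform(predictions)))

print(tag_sentence(['Mr.', 'Otero', 'could', 'not', 'be', 'reached', '.']))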

Visualization of the model

from keras.utils import plot_model

plot_model(clf.model, to_file='model.png', show_shapes=True)

Save the Keras model

Saving a Keras model is very simple, as the Keras library provides a built-in method to serialize it to disk:

clf.model.save('/tmp/keras_mlp.h5')

The model structure, weights, and training configuration (loss function, optimizer) are saved.
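
Reloading the persisted model later is just as simple with Keras' built-in load_model (a minimal sketch):

from keras.models import load_model

# Restore the architecture, weights and training configuration from disk
restored_model = load_model('/tmp/keras_mlp.h5')
restored_model.summary()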

Resources

  • Keras, the Python deep learning library: [doc]
  • Adam: A Method for Stochastic Optimization: [paper]
  • Improving neural networks by preventing co-adaptation of feature detectors: [paper]

In this article, you learned how to define and evaluate the accuracy of a neural network for multi-class classification using the Keras library. Code is available here: [.py | .ipynb].


The Gold Miner Translation Project is a community that translates high-quality technical articles from around the Internet, sourced from English articles shared on Juejin. The content covers Android, iOS, front-end, back-end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please keep following the Gold Miner Translation Project, its official Weibo, and its Zhihu column.