Abstract: The encoder-decoder model provides a pattern for using recurrent neural networks to address challenging sequence-to-sequence prediction problems such as machine translation. Encoder-decoder models can be developed using the Keras Python deep learning library. An example of a neural machine translation system developed with this model is described on the Keras blog, and sample code is distributed with the Keras project. This example provides a basis for developing your own encoder-decoder LSTM models.

In this tutorial, you will learn how to develop a sophisticated encoder-decoder recurrent neural network for sequence prediction problems with Keras, including:

  • How to define a sophisticated encoder-decoder model for sequence prediction in Keras.
  • How to define a scalable sequence prediction problem that can be used to evaluate encoder-decoder LSTM models.
  • How to apply the encoder-decoder LSTM model in Keras to solve the scalable integer sequence prediction problem.

Tutorial overview

This tutorial is divided into three parts:

  • Encoder-decoder model in Keras
  • Scalable sequence problem
  • Encoder-decoder LSTM for sequence prediction

Python environment

  • Python with SciPy is required; the tutorial works with Python 2 or 3.
  • You must have Keras (version 2.0 or higher) installed, with either TensorFlow or Theano as the backend.
  • You must also have scikit-learn, Pandas, NumPy, and Matplotlib installed. This article is helpful for setting up an environment:
  • How to set up machine learning and deep learning Python environments with Anaconda

Encoder-decoder model in Keras

The encoder-decoder model is a way of organizing recurrent neural networks for sequence prediction. It was originally developed for machine translation problems and has proven effective on related sequence prediction problems such as text summarization and question answering.

The method involves two recurrent neural networks, one for encoding the source sequence, called the encoder, and the other for decoding the encoded source sequence to the target sequence, called the decoder.

The Keras deep learning Python library provides an example of an encoder-decoder for machine translation (lstm_seq2seq.py), described in the post "A ten-minute introduction to sequence-to-sequence learning in Keras".

Based on that example code, we can develop a general function to define an encoder-decoder recurrent neural network. Below is the define_models() function.
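The sketch below follows the functional-API pattern from the Keras lstm_seq2seq.py example; the internal variable names are illustrative and details may differ from the original article's listing.

from keras.models import Model
from keras.layers import Input, LSTM, Dense

# returns train, inference_encoder and inference_decoder models
def define_models(n_input, n_output, n_units):
    # define training encoder
    encoder_inputs = Input(shape=(None, n_input))
    encoder = LSTM(n_units, return_state=True)
    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]
    # define training decoder, initialized with the encoder's final states
    decoder_inputs = Input(shape=(None, n_output))
    decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_dense = Dense(n_output, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    # define inference encoder: maps a source sequence to the state vectors
    encoder_model = Model(encoder_inputs, encoder_states)
    # define inference decoder: predicts one step given the previous state
    decoder_state_input_h = Input(shape=(n_units,))
    decoder_state_input_c = Input(shape=(n_units,))
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
    # return all models
    return model, encoder_model, decoder_model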

This function takes three arguments:

  • n_input: The cardinality of the input sequence, such as the number of features, words, or characters per time step.
  • n_output: The cardinality of the output sequence, such as the number of features, words, or characters per time step.
  • n_units: The number of cells to create in the encoder and decoder models, such as 128 or 256.

This function creates and returns three models:

  • train: The model that is trained given source, target, and shifted target sequences.
  • inference_encoder: The encoder model used when making a prediction for a new source sequence.
  • inference_decoder: The decoder model used when making a prediction for a new source sequence.

The model is trained on source and target sequences: it takes both the source sequence and a shifted version of the target sequence as inputs and predicts the entire target sequence.

For example, if the source sequence is [1,2,3] and the target sequence is [4,5,6], the inputs and outputs of the model during training will be:

Input1: ['1', '2', '3']
Input2: ['_', '4', '5']
Output: ['4', '5', '6']

The model is intended to be called recursively when generating a target sequence for a new source sequence.

The source sequence is encoded, and the target sequence is generated one element at a time, using a start-of-sequence character such as "_" to begin the process. Therefore, in the case above, the following input-output pairs would occur during training:
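t=1: Input1=['1', '2', '3'], Input2='_', Output='4'
t=2: Input1=['1', '2', '3'], Input2='4', Output='5'
t=3: Input1=['1', '2', '3'], Input2='5', Output='6'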

Here you can see how the recursive use of the model builds up the output sequence. During prediction, the inference_encoder model is used to encode the input sequence, and the inference_decoder model is then used to generate predictions step by step.

The predict_sequence() function below generates the target sequence for a given source sequence once model training is complete.
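A sketch consistent with the inference models returned by define_models() above; the reserved all-zeros one-hot vector plays the role of the "_" start-of-sequence character.

from numpy import array

# generate target given source sequence
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
    # encode the source sequence into the initial decoder state
    state = infenc.predict(source)
    # start-of-sequence input: the reserved 0 value, one-hot encoded
    target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
    # collect predictions one time step at a time
    output = list()
    for t in range(n_steps):
        # predict the next element and the updated state
        yhat, h, c = infdec.predict([target_seq] + state)
        output.append(yhat[0, 0, :])
        # update state and feed the prediction back in as the next input
        state = [h, c]
        target_seq = yhat
    return array(output)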

This function takes five arguments:

  • infenc: The encoder model used when making a prediction for a new source sequence.
  • infdec: The decoder model used when making a prediction for a new source sequence.
  • source: The encoded source sequence.
  • n_steps: The number of time steps in the target sequence.
  • cardinality: The cardinality of the output sequence, such as the number of features, words, or characters per time step.

This function returns a list containing the target sequence.

Scalable sequence problem

In this section, we define a scalable sequence prediction problem.

The source sequence is a series of randomly generated integer values, such as [20, 36, 40, 10, 34, 28]; the target sequence is a reversed, predefined subset of the input sequence, such as the first three elements in reverse order: [40, 36, 20]. The length of the source sequence is configurable, as are the cardinality of the input and output sequences and the length of the target sequence. We will use source sequences of 6 elements, a cardinality of 50, and target sequences of 3 elements.

Here is a concrete example:
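Source: [20, 36, 40, 10, 34, 28]
Target: [40, 36, 20]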

Start by defining a function to generate a random sequence of integers. We will use the value 0 as padding and as the start-of-sequence character, so 0 is a reserved value and cannot appear in the source sequences. To achieve this, we add 1 to the configured cardinality to ensure the one-hot encoding is large enough.

For example:

n_features = 50 + 1

Random integers between 1 and n_unique-1 can be generated using the randint() function. The generate_sequence() function below generates a random sequence of integers.

from random import randint

# generate a sequence of random integers
def generate_sequence(length, n_unique):
    return [randint(1, n_unique-1) for _ in range(length)]

Next, create the output sequence that corresponds to a given source sequence. For convenience, we take the first n elements of the source sequence as the target sequence and reverse them.

# define target sequence
target = source[:n_out]
target.reverse()

We also need a version of the target sequence shifted forward by one time step, which serves as the mock target generated so far, including the start-of-sequence value in the first time step. It can be created directly from the target sequence.

# create padded input target sequence
target_in = [0] + target[:-1]

Now that all of the sequences have been defined, we can one-hot encode each of them, converting them into sequences of binary vectors. This can be done using Keras's built-in to_categorical() function.

You can put all of these operations into a function called get_dataset() that generates a specified number of sequences, as sketched below.
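A sketch of such a function, reusing generate_sequence() and Keras's to_categorical(); the internal structure here is illustrative.

from numpy import array
from keras.utils import to_categorical

# prepare one-hot encoded source, shifted-target, and target sequences
def get_dataset(n_in, n_out, cardinality, n_samples):
    X1, X2, y = list(), list(), list()
    for _ in range(n_samples):
        # generate a random source sequence
        source = generate_sequence(n_in, cardinality)
        # define target sequence: the first n_out elements, reversed
        target = source[:n_out]
        target.reverse()
        # create padded input target sequence, shifted forward by one time step
        target_in = [0] + target[:-1]
        # one-hot encode all three sequences
        X1.append(to_categorical(source, num_classes=cardinality))
        X2.append(to_categorical(target_in, num_classes=cardinality))
        y.append(to_categorical(target, num_classes=cardinality))
    return array(X1), array(X2), array(y)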

Finally, we need to be able to decode the one-hot encoded sequences to make them readable again. This is needed both for printing the generated target sequences and for easily checking whether the full predicted target sequences match the expected ones. The one_hot_decode() function decodes an encoded sequence.

from numpy import argmax

# decode a one-hot encoded sequence
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

We can put all of this together and test it. A complete code example is listed below.
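A runnable sketch of that complete example, combining the functions above with a small driver (details may differ from the original article's listing):

from random import randint
from numpy import array, argmax
from keras.utils import to_categorical

# generate a sequence of random integers
def generate_sequence(length, n_unique):
    return [randint(1, n_unique-1) for _ in range(length)]

# prepare one-hot encoded source, shifted-target, and target sequences
def get_dataset(n_in, n_out, cardinality, n_samples):
    X1, X2, y = list(), list(), list()
    for _ in range(n_samples):
        source = generate_sequence(n_in, cardinality)
        # target: first n_out elements of the source, reversed
        target = source[:n_out]
        target.reverse()
        # padded input target sequence, shifted forward by one time step
        target_in = [0] + target[:-1]
        X1.append(to_categorical(source, num_classes=cardinality))
        X2.append(to_categorical(target_in, num_classes=cardinality))
        y.append(to_categorical(target, num_classes=cardinality))
    return array(X1), array(X2), array(y)

# decode a one-hot encoded sequence
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# generate a single source and target sequence and inspect it
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
print(X1.shape, X2.shape, y.shape)
print('X1=%s, X2=%s, y=%s' % (one_hot_decode(X1[0]), one_hot_decode(X2[0]), one_hot_decode(y[0])))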

Running the example first prints the shape of the generated dataset, confirming it has the 3D shape required to train the model.

The generated sequence is then decoded and printed, showing that the source and target sequences match our intent and that the decoding works as expected.

(1, 6, 51) (1, 3, 51) (1, 3, 51)
X1=[32, 16, 12, 34, 25, 24], X2=[0, 12, 16], y=[12, 16, 32]

A model for this sequence prediction problem will be developed below.

Encoder-decoder LSTM for sequence prediction

In this section, we apply the encoder-decoder LSTM model developed in the first section to the sequence prediction problem developed in the second section.

The first step is to configure the problem.

# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3

Next, we define the model and compile it for training.
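For example (the choice of 128 hidden units is illustrative, and the metrics match the accuracy reported in the training output below):

# define model
train, infenc, infdec = define_models(n_features, n_features, 128)
train.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])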

Next, a training dataset containing 100,000 samples is generated and the model is trained.
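A sketch, assuming a single pass (one epoch) over the generated samples, consistent with the progress bar output shown later:

# generate training dataset
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 100000)
print(X1.shape, X2.shape, y.shape)
# train model
train.fit([X1, X2], y, epochs=1)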

Once the model has been trained, it can be evaluated. We do this by making predictions for 100 source sequences and counting the number of target sequences that are predicted correctly. NumPy's array_equal() function can be used on the decoded sequences to check for equality.
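A sketch of that evaluation loop, using the predict_sequence() function defined earlier:

from numpy import array_equal

# evaluate the model on 100 new sequences
total, correct = 100, 0
for _ in range(total):
    X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
    target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
    if array_equal(one_hot_decode(y[0]), one_hot_decode(target)):
        correct += 1
print('Accuracy: %.2f%%' % (float(correct) / float(total) * 100.0))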

Finally, the example generates some predictions and prints the decoded source, target, and predicted target sequences to check that the model works as expected.
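For example:

# spot check some examples
for _ in range(10):
    X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
    target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
    print('X=%s y=%s, yhat=%s' % (one_hot_decode(X1[0]), one_hot_decode(y[0]), one_hot_decode(target)))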

Putting all the above code snippets together, the complete code example is shown below.

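A runnable sketch of the complete example, assembling all of the pieces (assumes Keras 2.x; the unit count and single training epoch follow the snippets above and may differ from the original article's listing):

from random import randint
from numpy import array, argmax, array_equal
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Input, LSTM, Dense

# generate a sequence of random integers
def generate_sequence(length, n_unique):
    return [randint(1, n_unique-1) for _ in range(length)]

# prepare one-hot encoded source, shifted-target, and target sequences
def get_dataset(n_in, n_out, cardinality, n_samples):
    X1, X2, y = list(), list(), list()
    for _ in range(n_samples):
        source = generate_sequence(n_in, cardinality)
        # target: first n_out elements of the source, reversed
        target = source[:n_out]
        target.reverse()
        # padded input target sequence, shifted forward by one time step
        target_in = [0] + target[:-1]
        X1.append(to_categorical(source, num_classes=cardinality))
        X2.append(to_categorical(target_in, num_classes=cardinality))
        y.append(to_categorical(target, num_classes=cardinality))
    return array(X1), array(X2), array(y)

# decode a one-hot encoded sequence
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]

# returns train, inference_encoder and inference_decoder models
def define_models(n_input, n_output, n_units):
    # training encoder
    encoder_inputs = Input(shape=(None, n_input))
    encoder = LSTM(n_units, return_state=True)
    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]
    # training decoder, initialized with the encoder's final states
    decoder_inputs = Input(shape=(None, n_output))
    decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_dense = Dense(n_output, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    # inference encoder
    encoder_model = Model(encoder_inputs, encoder_states)
    # inference decoder
    decoder_state_input_h = Input(shape=(n_units,))
    decoder_state_input_c = Input(shape=(n_units,))
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
    return model, encoder_model, decoder_model

# generate target given source sequence
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
    state = infenc.predict(source)
    # start-of-sequence input: the reserved 0 value, one-hot encoded
    target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
    output = list()
    for t in range(n_steps):
        yhat, h, c = infdec.predict([target_seq] + state)
        output.append(yhat[0, 0, :])
        state = [h, c]
        target_seq = yhat
    return array(output)

# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# define and compile the model
train, infenc, infdec = define_models(n_features, n_features, 128)
train.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
# generate training dataset and fit for one epoch
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 100000)
print(X1.shape, X2.shape, y.shape)
train.fit([X1, X2], y, epochs=1)
# evaluate on 100 new sequences
total, correct = 100, 0
for _ in range(total):
    X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
    target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
    if array_equal(one_hot_decode(y[0]), one_hot_decode(target)):
        correct += 1
print('Accuracy: %.2f%%' % (float(correct) / float(total) * 100.0))
# spot check some examples
for _ in range(10):
    X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
    target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
    print('X=%s y=%s, yhat=%s' % (one_hot_decode(X1[0]), one_hot_decode(y[0]), one_hot_decode(target)))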
Running the example first prints the shape of the prepared dataset.

(100000, 6, 51) (100000, 3, 51) (100000, 3, 51)

Next, a progress bar is shown as the model trains; training takes less than a minute on a modern multicore CPU.

100000/100000 [==============================] - 50s - loss: 0.6344 - acc: 0.7968

Next, the model is evaluated and its accuracy printed. As you can see, the model achieves 100% accuracy on newly generated random samples.

Accuracy: 100.00%

Finally, 10 new examples are generated and their target sequences predicted. As you can see, the model correctly predicts the output sequence in each case, and the expected values match the first three elements of the source sequence in reverse order.

X=[22, 17, 23, 5, 29, 11] y=[23, 17, 22], yhat=[23, 17, 22]
X=[28, 2, 46, 12, 21, 6] y=[46, 2, 28], yhat=[46, 2, 28]
X=[12, 20, 45, 28, 18, 42] y=[45, 20, 12], yhat=[45, 20, 12]
X=[3, 43, 45, 4, 33, 27] y=[45, 43, 3], yhat=[45, 43, 3]
X=[34, 50, 21, 20, 11, 6] y=[21, 50, 34], yhat=[21, 50, 34]
X=[47, 42, 14, 2, 31, 6] y=[14, 42, 47], yhat=[14, 42, 47]
X=[20, 24, 34, 31, 37, 25] y=[34, 24, 20], yhat=[34, 24, 20]
X=[4, 35, 15, 14, 47, 33] y=[15, 35, 4], yhat=[15, 35, 4]
X=[20, 28, 21, 39, 5, 25] y=[21, 28, 20], yhat=[21, 28, 20]
X=[50, 38, 17, 25, 31, 48] y=[17, 38, 50], yhat=[17, 38, 50]

You now have a template for an encoder-decoder LSTM model that you can apply to your own sequence prediction problems.

Conclusion

In this tutorial, you learned how to develop a sophisticated encoder-decoder recurrent neural network for sequence prediction problems with Keras, specifically:

  • How to define a sophisticated encoder-decoder model for sequence prediction in Keras.
  • How to define a scalable sequence prediction problem that can be used to evaluate encoder-decoder LSTM models.
  • How to apply the encoder-decoder LSTM model in Keras to solve the scalable integer sequence prediction problem.

Original text: machinelearningmastery.com/develop-enc…