CB014: TensorFlow seq2seq Model Step by Step

Neural networks: "Make Your Own Neural Network" explains the principles of artificial neural networks in very accessible language and backs them with code, and its examples work well.

Recurrent neural networks and LSTM: Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

The seq2seq (sequence-to-sequence) model is built on recurrent neural networks. It suits sequence-to-sequence tasks such as language translation and automatic question answering. The principle of building a chatbot with seq2seq is described at http://suriyadeepan.github.io/2016-06-28-easy-seq2seq/.

The attention model addresses a weakness of seq2seq. The decoder receives only the encoder's last output, which is far from the earlier inputs, so information is lost. An answer usually depends on information at key positions in the question, and attention lets the decoder focus on those positions. See http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/.

Building a chatbot with TensorFlow seq2seq. The key TensorFlow interface is embedding_attention_seq2seq: https://www.tensorflow.org/api_docs/python/tf/contrib/legacy_seq2seq/embedding_attention_seq2seq.

embedding_attention_seq2seq(
    encoder_inputs,
    decoder_inputs,
    cell,
    num_encoder_symbols,
    num_decoder_symbols,
    embedding_size,
    num_heads=1,
    output_projection=None,
    feed_previous=False,
    dtype=None,
    scope=None,
    initial_state_attention=False
)

encoder_inputs is a list of 1D Tensors. Each Tensor has shape [batch_size] and holds integers (word ids). For example:

[array([0, 0, 0, 0], dtype=int32), 
array([0, 0, 0, 0], dtype=int32), 
array([8, 3, 5, 3], dtype=int32), 
array([7, 8, 2, 1], dtype=int32), 
array([6, 2, 10, 9], dtype=int32)]

The 5 arrays mean that a sentence is 5 words long. Each array holds 4 numbers, so the batch size is 4 (four samples). Reading down the columns, the first sample is [[0], [0], [8], [7], [6]] and the second sample is [[0], [0], [3], [8], [2]]. Each id identifies a distinct word. Ids are usually assigned from word-frequency statistics, one id per word.
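The following pure-Python sketch shows the time-major layout in code. It uses no TensorFlow, and the helper names to_time_major and to_batch_major are hypothetical. It converts between per-position arrays like the ones above and per-sample lists:

```python
def to_time_major(samples):
    """Turn batch-major samples (one list per sample) into time-major
    steps (one list per position, holding that position for every sample)."""
    return [list(step) for step in zip(*samples)]


def to_batch_major(steps):
    """Inverse of to_time_major: recover one list per sample."""
    return [list(sample) for sample in zip(*steps)]


# The encoder_inputs shown above as plain lists (sentence length 5, batch 4)
encoder_steps = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [8, 3, 5, 3],
    [7, 8, 2, 1],
    [6, 2, 10, 9],
]

samples = to_batch_major(encoder_steps)
print(samples[0])  # [0, 0, 8, 7, 6]
print(samples[1])  # [0, 0, 3, 8, 2]
```

Transposing twice gives back the original steps, so the two layouts carry exactly the same ids.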

The decoder_inputs parameter has the same structure as encoder_inputs.

The cell parameter is a neural network unit of type tf.nn.rnn_cell.RNNCell. You can use tf.contrib.rnn.BasicLSTMCell or tf.contrib.rnn.GRUCell.

num_encoder_symbols is an integer: the number of distinct integer ids (the vocabulary size) that can appear in encoder_inputs.

num_decoder_symbols is the number of distinct integer word ids that can appear in decoder_inputs.

embedding_size is the dimension of the vector that each word id is embedded into internally. It is kept the same as the RNNCell size.

num_heads is the number of attention heads that read from attention_states.

output_projection is a (W, B) tuple. W is a weight matrix of shape [output_size x num_decoder_symbols], and B is a bias vector of shape [num_decoder_symbols]. Each RNNCell output is mapped by WX + B to a num_decoder_symbols-dimensional vector. After softmax, its values give the probability of each decoder symbol.

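feed_previous controls where the decoder inputs come from: directly from the training data (decoder_inputs), or from the previous RNNCell output. If feed_previous is True, only the first decoder input (the GO symbol) is taken from decoder_inputs. Each later input is derived from the previous step's output after the WX + B mapping.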

dtype is the data type of the RNN state; the default is tf.float32.

scope is the name of the subgraph (variable scope); the default is "embedding_attention_seq2seq".

initial_state_attention controls how the attentions are initialized. The default, False, initializes all attentions to 0.

The first return value, outputs, is a list with the same length as decoder_inputs. Each element is a 2D Tensor with one row per sample; with four samples, for example, each Tensor holds four vectors. In the example above, outputs describes 5 × 4 × 8 floating-point numbers: 5 for the sentence length, 4 for the number of samples, and 8 for the vector dimension. When output_projection is None, this last dimension is num_decoder_symbols.
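The sketch below illustrates output_projection in plain Python: it applies W·x + b to one cell output and then softmax. The sizes (output_size = 3, num_decoder_symbols = 4), the weight values, and the helper names project and softmax are made up for illustration:

```python
import math


def project(x, W, b):
    """Map one cell output x (length output_size) to num_decoder_symbols
    logits: logits[j] = sum_i x[i] * W[i][j] + b[j].
    W has shape [output_size x num_decoder_symbols]; b has num_decoder_symbols."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def softmax(logits):
    """Turn logits into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: output_size = 3, num_decoder_symbols = 4
x = [1.0, 0.0, 2.0]
W = [[0.5, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5]]
b = [0.0, 0.0, 0.0, -1.0]

logits = project(x, W, b)
probs = softmax(logits)
print(logits)  # [0.5, 0.0, 2.0, 1.0]
```

The most likely decoder symbol is the one with the largest logit, here index 2. Softmax does not change that ordering.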

The second return value, state, is a large tuple made of num_layers LSTMStateTuples. num_layers is the number of layers in the cell that was passed in, for example one built with tf.contrib.rnn.MultiRNNCell. A three-layer LSTM gives a multi-layer recurrent encoder-decoder:

- encoder_inputs go into the first-layer encoder LSTM.
- Its output goes into the second-layer LSTM, and the second layer's output goes into the third layer.
- The first-layer encoder LSTM passes its state to the first-layer decoder LSTM, and likewise for the other layers.

Each LSTMStateTuple is a tuple of two Tensors. The first, c, holds four 8-dimensional vectors (one per sample). The second, h, also holds four 8-dimensional vectors.

c is the cell (memory) state carried to the next time step, and h is the hidden output.

The core of the update in TensorFlow's BasicLSTMCell:

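# One linear layer computes all four gate pre-activations from [inputs, h]
concat = _linear([inputs, h], 4 * self._num_units, True)
# i = input gate, j = candidate input, f = forget gate, o = output gate
i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)
# New memory: keep part of the old c, add part of the new candidate
new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j))
# New hidden output: squashed memory, gated by the output gate
new_h = self._activation(new_c) * sigmoid(o)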
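The four lines above can be traced with a tiny, dependency-free sketch of one LSTM step. This is an illustration, not TensorFlow's code. The weight layout (one matrix per gate over the concatenated [inputs, h]), the helper names, and the zero-weight test values are assumptions. forget_bias defaults to 1.0 to mirror BasicLSTMCell's default:

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(vec, weights, bias):
    """weights: one row per output unit, each row as long as vec."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(inputs, c, h, params, forget_bias=1.0, activation=math.tanh):
    """One LSTM step. params maps gate name ('i', 'j', 'f', 'o') to
    (weights, bias) applied to the concatenation [inputs, h]."""
    concat = list(inputs) + list(h)
    i = linear(concat, *params['i'])  # input gate
    j = linear(concat, *params['j'])  # candidate memory
    f = linear(concat, *params['f'])  # forget gate
    o = linear(concat, *params['o'])  # output gate
    new_c = [ck * sigmoid(fk + forget_bias) + sigmoid(ik) * activation(jk)
             for ck, fk, ik, jk in zip(c, f, i, j)]
    new_h = [activation(nc) * sigmoid(ok) for nc, ok in zip(new_c, o)]
    return new_c, new_h


# Hypothetical tiny cell: 2 inputs, 2 units, all weights and biases zero
num_inputs, num_units = 2, 2
zero = ([[0.0] * (num_inputs + num_units) for _ in range(num_units)],
        [0.0] * num_units)
params = {'i': zero, 'j': zero, 'f': zero, 'o': zero}

new_c, new_h = lstm_step([0.3, -0.7], c=[1.0, -2.0], h=[0.0, 0.0], params=params)
# With zero weights: new_c = c * sigmoid(1.0) and new_h = tanh(new_c) * 0.5
print(new_c, new_h)
```

With all weights at zero, the forget gate reduces to sigmoid(forget_bias), so the old memory c is mostly kept. This is why BasicLSTMCell adds a forget bias of 1.0 by default.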

The state returned by embedding_attention_seq2seq is generally not used.

Next, construct the input parameters and train a seq2seq model. The samples are built from the odd-number sequence 1, 3, 5, 7, 9, …; for example, two samples are [[1, 3, 5], [7, 9, 11]] and [[3, 5, 7], [9, 11, 13]]:

train_set = [[[1, 3, 5], [7, 9, 11]], [[3, 5, 7], [9, 11, 13]]]

The training sequence length must be longer than the sample sequences; set both input and output lengths to 5:

input_seq_len = 5
output_seq_len = 5

Samples shorter than the training sequence length are padded with 0:

PAD_ID = 0

encoder_input for the first sample (padded on the left):

encoder_input_0 = [PAD_ID] * (input_seq_len - len(train_set[0][0])) + train_set[0][0]

encoder_input for the second sample:

encoder_input_1 = [PAD_ID] * (input_seq_len - len(train_set[1][0])) + train_set[1][0]

decoder_input starts with GO_ID, followed by the sample sequence, and is padded with PAD_ID:

GO_ID = 1
decoder_input_0 = [GO_ID] + train_set[0][1] \
    + [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
decoder_input_1 = [GO_ID] + train_set[1][1] \
    + [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)
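The padding rules above can be wrapped in two small helpers. The encoder gets PAD_ID on the left; the decoder gets GO_ID first and PAD_ID on the right. The names pad_encoder and pad_decoder are hypothetical; the logic mirrors the expressions above:

```python
PAD_ID = 0
GO_ID = 1


def pad_encoder(seq, input_seq_len):
    """Left-pad an encoder sequence with PAD_ID up to input_seq_len."""
    return [PAD_ID] * (input_seq_len - len(seq)) + seq


def pad_decoder(seq, output_seq_len):
    """Prepend GO_ID, then right-pad with PAD_ID up to output_seq_len."""
    return [GO_ID] + seq + [PAD_ID] * (output_seq_len - len(seq) - 1)


train_set = [[[1, 3, 5], [7, 9, 11]], [[3, 5, 7], [9, 11, 13]]]
for question, answer in train_set:
    print(pad_encoder(question, 5), pad_decoder(answer, 5))
# [0, 0, 1, 3, 5] [1, 7, 9, 11, 0]
# [0, 0, 3, 5, 7] [1, 9, 11, 13, 0]
```

As with the original expressions, a sequence longer than the limit is not truncated, because multiplying a list by a negative count gives an empty list.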

Build the encoder_inputs and decoder_inputs arguments for embedding_attention_seq2seq. Each array holds one position across all samples:

encoder_inputs = []
decoder_inputs = []
for length_idx in xrange(input_seq_len):
    encoder_inputs.append(np.array([encoder_input_0[length_idx], 
                          encoder_input_1[length_idx]], dtype=np.int32))
for length_idx in xrange(output_seq_len):
    decoder_inputs.append(np.array([decoder_input_0[length_idx], 
                          decoder_input_1[length_idx]], dtype=np.int32))

Wrapped as a standalone function:

# coding: utf-8
import numpy as np

# Input sequence length
input_seq_len = 5
# Output sequence length
output_seq_len = 5
# Padding id
PAD_ID = 0
# Start-of-output-sequence id
GO_ID = 1


def get_samples():
    """Build the training samples.
    return:
        encoder_inputs: [array([0, 0], dtype=int32), array([0, 0], dtype=int32), array([1, 3], dtype=int32),
                         array([3, 5], dtype=int32), array([5, 7], dtype=int32)]
        decoder_inputs: [array([1, 1], dtype=int32), array([7, 9], dtype=int32), array([ 9, 11], dtype=int32),
                         array([11, 13], dtype=int32), array([0, 0], dtype=int32)]
    """
    train_set = [[[1, 3, 5], [7, 9, 11]], [[3, 5, 7], [9, 11, 13]]]
    encoder_input_0 = [PAD_ID] * (input_seq_len - len(train_set[0][0])) + train_set[0][0]
    encoder_input_1 = [PAD_ID] * (input_seq_len - len(train_set[1][0])) + train_set[1][0]
    decoder_input_0 = [GO_ID] + train_set[0][1] + [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
    decoder_input_1 = [GO_ID] + train_set[1][1] + [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)

    encoder_inputs = []
    decoder_inputs = []
    for length_idx in xrange(input_seq_len):
        encoder_inputs.append(np.array([encoder_input_0[length_idx], encoder_input_1[length_idx]], dtype=np.int32))
    for length_idx in xrange(output_seq_len):
        decoder_inputs.append(np.array([decoder_input_0[length_idx], decoder_input_1[length_idx]], dtype=np.int32))
    return encoder_inputs, decoder_inputs

Now build the model. TensorFlow first constructs a graph and then feeds data into it for computation, so building a model means building a graph. Start with placeholders for the inputs encoder_inputs and decoder_inputs:

import tensorflow as tf
encoder_inputs = []
decoder_inputs = []
for i in xrange(input_seq_len):
    encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], 
                          name="encoder{0}".format(i)))
for i in xrange(output_seq_len):
    decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], 
                          name="decoder{0}".format(i)))

Create an LSTM cell with size = 8 memory units:

size = 8
cell = tf.contrib.rnn.BasicLSTMCell(size)

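Set the number of input symbols to 10 and the number of output symbols to 16. Every id must be smaller than these counts; in the training data the largest input id is 7 and the largest output id is 13: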

num_encoder_symbols = 10
num_decoder_symbols = 16
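Every id is looked up in an embedding table, so each id must be smaller than num_encoder_symbols or num_decoder_symbols. Here is a pure-Python check over this section's padded samples (the helper name min_symbols is hypothetical):

```python
def min_symbols(sequences):
    """Smallest symbol count that covers every id: largest id + 1."""
    return max(max(seq) for seq in sequences) + 1


encoder_seqs = [[0, 0, 1, 3, 5], [0, 0, 3, 5, 7]]
decoder_seqs = [[1, 7, 9, 11, 0], [1, 9, 11, 13, 0]]

num_encoder_symbols = 10
num_decoder_symbols = 16
print(min_symbols(encoder_seqs), min_symbols(decoder_seqs))  # 8 14
print(min_symbols(encoder_seqs) <= num_encoder_symbols)      # True
print(min_symbols(decoder_seqs) <= num_decoder_symbols)      # True
```

Running the same check on new samples shows right away whether the symbol counts need to grow.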

Pass the parameters to embedding_attention_seq2seq to get the outputs:

from tensorflow.contrib.legacy_seq2seq.python.ops import seq2seq
outputs, _ = seq2seq.embedding_attention_seq2seq(
                    encoder_inputs,
                    decoder_inputs[:output_seq_len],
                    cell,
                    num_encoder_symbols=num_encoder_symbols,
                    num_decoder_symbols=num_decoder_symbols,
                    embedding_size=size,
                    output_projection=None,
                    feed_previous=False,
                    dtype=tf.float32)

Put the model-building part into a separate function:

def get_model():
    """Build the model."""
    encoder_inputs = []
    decoder_inputs = []
    for i in xrange(input_seq_len):
        encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                             name="encoder{0}".format(i)))
    for i in xrange(output_seq_len):
        decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                             name="decoder{0}".format(i)))

    cell = tf.contrib.rnn.BasicLSTMCell(size)

    # The second return value is the state, which we don't need here
    outputs, _ = seq2seq.embedding_attention_seq2seq(
                        encoder_inputs,
                        decoder_inputs,
                        cell,
                        num_encoder_symbols=num_encoder_symbols,
                        num_decoder_symbols=num_decoder_symbols,
                        embedding_size=size,
                        output_projection=None,
                        feed_previous=False,
                        dtype=tf.float32)
    return encoder_inputs, decoder_inputs, outputs

Create a session, feed in the sample data, and run the model:

with tf.Session() as sess:
    sample_encoder_inputs, sample_decoder_inputs = get_samples()
    encoder_inputs, decoder_inputs, outputs = get_model()
    input_feed = {}
    for l in xrange(input_seq_len):
        input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
    for l in xrange(output_seq_len):
        input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]

    sess.run(tf.global_variables_initializer())
    outputs = sess.run(outputs, input_feed)
    print outputs

outputs is a list of five arrays (5 is the sequence length). Each array holds two rows of 16 values: 2 samples, 16 output symbols. The outputs correspond to W, X, Y, Z, EOS in the classic seq2seq illustration, that is, to decoder_inputs[1:]. For our samples these are [7, 9, 11] and [9, 11, 13].

The decoder_inputs structure:

[array([1, 1], dtype=int32), array([7, 9], dtype=int32), array([ 9, 11], dtype=int32), array([11, 13], dtype=int32), array([0, 0], dtype=int32)]

Documentation of the loss function: https://www.tensorflow.org/api_docs/python/tf/contrib/legacy_seq2seq/sequence_loss

sequence_loss(
    logits,
    targets,
    weights,
    average_across_timesteps=True,
    average_across_batch=True,
    softmax_loss_function=None,
    name=None
)

The loss is the average negative log-probability of the target words, and training drives it down. The arguments are:

- logits: a list of output_seq_len 2D Tensors of shape [batch_size x num_decoder_symbols]. Here batch_size is 2 and num_decoder_symbols is 16.
- targets: a list with the same length as logits (output_seq_len). Each item is a 1D Tensor of shape [batch_size] with dtype tf.int32, namely decoder_inputs[1:], the W, X, Y, Z positions.
- weights: the same structure as targets, with dtype tf.float32.

The weighted cross-entropy loss needs weights, which are also created as placeholders:

target_weights = []
for i in xrange(output_seq_len):
    target_weights.append(tf.placeholder(tf.float32, shape=[None],
                          name="weight{0}".format(i)))

Calculate loss value:

targets = [decoder_inputs[i + 1] for i in xrange(len(decoder_inputs) - 1)]
loss = seq2seq.sequence_loss(outputs, targets, target_weights)
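The following plain-Python sketch shows what sequence_loss computes with its default arguments. It follows our reading of the legacy implementation: per-example weighted cross-entropy is summed over time and divided by the total weight, then averaged over the batch. The 1e-12 epsilon and the function names are assumptions, so treat this as an illustration rather than a drop-in replacement:

```python
import math


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def sequence_loss(logits, targets, weights):
    """logits[t][b] is a list of scores over symbols, targets[t][b] an int id,
    weights[t][b] a float. Returns the batch-averaged weighted loss."""
    batch_size = len(targets[0])
    steps = len(logits)
    cost = 0.0
    for b in range(batch_size):
        weighted = sum(cross_entropy(logits[t][b], targets[t][b]) * weights[t][b]
                       for t in range(steps))
        total_weight = sum(weights[t][b] for t in range(steps)) + 1e-12
        cost += weighted / total_weight
    return cost / batch_size


# 2 time steps, batch of 1, 3 symbols
logits = [[[2.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
targets = [[0], [1]]
weights = [[1.0], [0.0]]  # the second step is padding and is ignored
print(sequence_loss(logits, targets, weights))
```

Because padded positions carry weight 0, only the real target positions contribute to the loss.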

targets is one element shorter than decoder_inputs, so decoder_inputs needs one extra element at the end: output_seq_len + 1 placeholders, with the last one fed zeros. For the weighted cross-entropy, meaningful positions get a large weight and meaningless ones a small weight. A position with a real target gets 1; a padded position gets 0:

# coding:utf-8
import numpy as np
import tensorflow as tf
from tensorflow.contrib.legacy_seq2seq.python.ops import seq2seq

# Input sequence length
input_seq_len = 5
# Output sequence length
output_seq_len = 5
# Padding id
PAD_ID = 0
# Start-of-output-sequence id
GO_ID = 1
# LSTM cell size
size = 8
# Number of input symbols
num_encoder_symbols = 10
# Number of output symbols
num_decoder_symbols = 16


def get_samples():
    """Build the training samples.
    return:
        encoder_inputs: [array([0, 0], dtype=int32), array([0, 0], dtype=int32), array([1, 3], dtype=int32),
                         array([3, 5], dtype=int32), array([5, 7], dtype=int32)]
        decoder_inputs: [array([1, 1], dtype=int32), array([7, 9], dtype=int32), array([ 9, 11], dtype=int32),
                         array([11, 13], dtype=int32), array([0, 0], dtype=int32)]
    """
    train_set = [[[1, 3, 5], [7, 9, 11]], [[3, 5, 7], [9, 11, 13]]]
    encoder_input_0 = [PAD_ID] * (input_seq_len - len(train_set[0][0])) + train_set[0][0]
    encoder_input_1 = [PAD_ID] * (input_seq_len - len(train_set[1][0])) + train_set[1][0]
    decoder_input_0 = [GO_ID] + train_set[0][1] + [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
    decoder_input_1 = [GO_ID] + train_set[1][1] + [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)

    encoder_inputs = []
    decoder_inputs = []
    target_weights = []
    for length_idx in xrange(input_seq_len):
        encoder_inputs.append(np.array([encoder_input_0[length_idx], encoder_input_1[length_idx]], dtype=np.int32))
    for length_idx in xrange(output_seq_len):
        decoder_inputs.append(np.array([decoder_input_0[length_idx], decoder_input_1[length_idx]], dtype=np.int32))
        # Weight 0 for the last position and for padding, 1 otherwise
        target_weights.append(np.array([
            0.0 if length_idx == output_seq_len - 1 or decoder_input_0[length_idx] == PAD_ID else 1.0,
            0.0 if length_idx == output_seq_len - 1 or decoder_input_1[length_idx] == PAD_ID else 1.0,
        ], dtype=np.float32))
    return encoder_inputs, decoder_inputs, target_weights


def get_model():
    """Build the model."""
    encoder_inputs = []
    decoder_inputs = []
    target_weights = []
    for i in xrange(input_seq_len):
        encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i)))
    for i in xrange(output_seq_len + 1):
        decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i)))
    for i in xrange(output_seq_len):
        target_weights.append(tf.placeholder(tf.float32, shape=[None], name="weight{0}".format(i)))

    # decoder_inputs shifted left by one position serve as targets
    targets = [decoder_inputs[i + 1] for i in xrange(output_seq_len)]

    cell = tf.contrib.rnn.BasicLSTMCell(size)

    # The returned state is not needed here
    outputs, _ = seq2seq.embedding_attention_seq2seq(
                        encoder_inputs,
                        decoder_inputs[:output_seq_len],
                        cell,
                        num_encoder_symbols=num_encoder_symbols,
                        num_decoder_symbols=num_decoder_symbols,
                        embedding_size=size,
                        output_projection=None,
                        feed_previous=False,
                        dtype=tf.float32)

    # Weighted cross-entropy loss
    loss = seq2seq.sequence_loss(outputs, targets, target_weights)
    return encoder_inputs, decoder_inputs, target_weights, outputs, loss


def main():
    with tf.Session() as sess:
        sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = get_samples()
        encoder_inputs, decoder_inputs, target_weights, outputs, loss = get_model()

        input_feed = {}
        for l in xrange(input_seq_len):
            input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
        for l in xrange(output_seq_len):
            input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
            input_feed[target_weights[l].name] = sample_target_weights[l]
        # The extra decoder input only serves as the last target
        input_feed[decoder_inputs[output_seq_len].name] = np.zeros([2], dtype=np.int32)

        sess.run(tf.global_variables_initializer())
        loss = sess.run(loss, input_feed)
        print loss


if __name__ == "__main__":
    main()
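For one padded sample, the target shift and the weight rule in get_samples work out as follows. The rule gives weight 0.0 at the last position or where the decoder input is PAD_ID, and 1.0 elsewhere; this pure-Python sketch reproduces it. The helper name is hypothetical, and the appended PAD_ID stands for the extra decoder input fed as np.zeros:

```python
PAD_ID = 0


def targets_and_weights(decoder_input, output_seq_len):
    """decoder_input has output_seq_len ids (GO_ID first). The model is fed one
    extra all-PAD decoder input, so the targets are decoder_input shifted left
    by one with PAD_ID appended. Weights follow the rule used in get_samples."""
    targets = decoder_input[1:] + [PAD_ID]
    weights = [0.0 if t == output_seq_len - 1 or decoder_input[t] == PAD_ID else 1.0
               for t in range(output_seq_len)]
    return targets, weights


targets, weights = targets_and_weights([1, 7, 9, 11, 0], 5)
print(targets)  # [7, 9, 11, 0, 0]
print(weights)  # [1.0, 1.0, 1.0, 1.0, 0.0]
```

Consider a decoder input that ends in an end marker, such as [1, 11, 13, 15, 2] with 2 as EOS_ID. Its fourth weight then covers the step whose target is the end marker, so the model also learns to emit it.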

The loss should shrink over many rounds of computation, so the parameters are updated with gradient descent. TensorFlow's gradient descent class: https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer.

Constructor of the GradientDescentOptimizer class:

__init__(
    learning_rate,
    use_locking=False,
    name='GradientDescent'
)

The key parameter is the first one, learning_rate. The method that computes the gradients:

compute_gradients(
    loss,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    grad_loss=None
)

The key parameter, loss, is the error value to minimize. The return value is a list of (gradient, variable) pairs. The method that applies the updates:

apply_gradients(
    grads_and_vars,
    global_step=None,
    name=None
)

grads_and_vars is the return value of compute_gradients. To compute the gradients from loss and update the parameters:

learning_rate = 0.1
opt = tf.train.GradientDescentOptimizer(learning_rate)
update = opt.apply_gradients(opt.compute_gradients(loss))
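compute_gradients followed by apply_gradients is plain gradient descent: each variable moves against its gradient by learning_rate times the gradient. Below is a toy version on a single variable. The quadratic loss (w - 3)^2 and the helper names toy_gradient and toy_apply are assumptions, chosen because the gradient is easy to write by hand:

```python
def toy_gradient(w):
    """Gradient of the toy loss (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)


def toy_apply(w, grad, learning_rate):
    """One gradient descent update: w <- w - learning_rate * grad."""
    return w - learning_rate * grad


learning_rate = 0.1
w = 0.0
for step in range(100):
    grad = toy_gradient(w)                 # plays the role of compute_gradients(loss)
    w = toy_apply(w, grad, learning_rate)  # plays the role of apply_gradients(...)
print(round(w, 6))  # 3.0
```

Each step here shrinks the distance to the minimum by a factor of 1 - 2 × 0.1 = 0.8. For this toy loss, a learning rate above 1.0 would make it diverge, which is why the learning rate is the key parameter.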

Add a training loop to the main function; get_model now also returns update:

def main():
    with tf.Session() as sess:
        sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = get_samples()
        encoder_inputs, decoder_inputs, target_weights, outputs, loss, update = get_model()

        input_feed = {}
        for l in xrange(input_seq_len):
            input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
        for l in xrange(output_seq_len):
            input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
            input_feed[target_weights[l].name] = sample_target_weights[l]
        input_feed[decoder_inputs[output_seq_len].name] = np.zeros([2], dtype=np.int32)

        sess.run(tf.global_variables_initializer())
        while True:
            [loss_ret, _] = sess.run([loss, update], input_feed)
            print loss_ret

For prediction, only encoder_input is given and decoder_input is predicted automatically. Save the trained model, then load it again when prediction starts:

def get_model():
    ...
    saver = tf.train.Saver(tf.global_variables())
    return ..., saver

After training, run:

saver.save(sess, './model/demo')

The model is stored in the ./model directory, in files whose names start with demo. Load it with:

saver.restore(sess, './model/demo')

If embedding_attention_seq2seq's feed_previous parameter is True, the decoder takes each input from its own previous output instead of from decoder_inputs.
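With feed_previous=True, only the first decoder input (GO_ID) comes from decoder_inputs. Each later input is the highest-scoring symbol from the previous step. The loop below sketches that idea in plain Python. toy_decoder_step is a hypothetical stand-in for the real decoder cell plus projection, so only the feeding logic carries over:

```python
GO_ID = 1
EOS_ID = 2


def argmax(scores):
    return max(range(len(scores)), key=lambda k: scores[k])


def toy_decoder_step(prev_symbol, num_symbols=16):
    """Hypothetical stand-in for one decoder step: scores every symbol,
    preferring 11 right after GO_ID, then prev_symbol + 2, and EOS_ID
    once that would exceed 15."""
    nxt = 11 if prev_symbol == GO_ID else prev_symbol + 2
    if nxt > 15:
        nxt = EOS_ID
    return [1.0 if k == nxt else 0.0 for k in range(num_symbols)]


def greedy_decode(step_fn, output_seq_len):
    """feed_previous-style decoding: start from GO_ID, then feed back the
    argmax of each step's output as the next input."""
    symbol = GO_ID
    outputs = []
    for _ in range(output_seq_len):
        symbol = argmax(step_fn(symbol))
        outputs.append(symbol)
    return outputs


print(greedy_decode(toy_decoder_step, 5))  # [11, 13, 15, 2, 4]
```

The decoder keeps producing symbols after EOS_ID. That is why the prediction code cuts the output at the first end marker.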

get_model uses different feed_previous settings for training and prediction, and prediction also needs its own main routine. So the code is split into two functions, train and predict:

# coding:utf-8
import sys
import numpy as np
import tensorflow as tf
from tensorflow.contrib.legacy_seq2seq.python.ops import seq2seq

# Input sequence length
input_seq_len = 5
# Output sequence length
output_seq_len = 5
# Padding id
PAD_ID = 0
# Start-of-output-sequence id
GO_ID = 1
# End-of-sequence id
EOS_ID = 2
# LSTM cell size
size = 8
# Maximum number of input symbols
num_encoder_symbols = 10
# Maximum number of output symbols
num_decoder_symbols = 16
# Learning rate
learning_rate = 0.1


def get_samples():
    """Build the training samples.
    return:
        encoder_inputs: [array([0, 0], dtype=int32), array([0, 0], dtype=int32), array([5, 7], dtype=int32),
                         array([7, 9], dtype=int32), array([ 9, 11], dtype=int32)]
        decoder_inputs: [array([1, 1], dtype=int32), array([11, 13], dtype=int32), array([13, 15], dtype=int32),
                         array([15, 17], dtype=int32), array([2, 2], dtype=int32)]
    """
    train_set = [[[5, 7, 9], [11, 13, 15, EOS_ID]], [[7, 9, 11], [13, 15, 17, EOS_ID]]]
    raw_encoder_input = []
    raw_decoder_input = []
    for sample in train_set:
        raw_encoder_input.append([PAD_ID] * (input_seq_len - len(sample[0])) + sample[0])
        raw_decoder_input.append([GO_ID] + sample[1] + [PAD_ID] * (output_seq_len - len(sample[1]) - 1))

    encoder_inputs = []
    decoder_inputs = []
    target_weights = []

    for length_idx in xrange(input_seq_len):
        encoder_inputs.append(np.array([encoder_input[length_idx]
                                        for encoder_input in raw_encoder_input], dtype=np.int32))
    for length_idx in xrange(output_seq_len):
        decoder_inputs.append(np.array([decoder_input[length_idx]
                                        for decoder_input in raw_decoder_input], dtype=np.int32))
        target_weights.append(np.array([
            0.0 if length_idx == output_seq_len - 1 or decoder_input[length_idx] == PAD_ID else 1.0
            for decoder_input in raw_decoder_input
        ], dtype=np.float32))
    return encoder_inputs, decoder_inputs, target_weights


def get_model(feed_previous=False):
    """Build the model."""
    encoder_inputs = []
    decoder_inputs = []
    target_weights = []
    for i in xrange(input_seq_len):
        encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i)))
    for i in xrange(output_seq_len + 1):
        decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i)))
    for i in xrange(output_seq_len):
        target_weights.append(tf.placeholder(tf.float32, shape=[None], name="weight{0}".format(i)))

    # decoder_inputs shifted left by one position serve as targets
    targets = [decoder_inputs[i + 1] for i in xrange(output_seq_len)]

    cell = tf.contrib.rnn.BasicLSTMCell(size)

    # The returned state is not needed here
    outputs, _ = seq2seq.embedding_attention_seq2seq(
                        encoder_inputs,
                        decoder_inputs[:output_seq_len],
                        cell,
                        num_encoder_symbols=num_encoder_symbols,
                        num_decoder_symbols=num_decoder_symbols,
                        embedding_size=size,
                        output_projection=None,
                        feed_previous=feed_previous,
                        dtype=tf.float32)

    # Weighted cross-entropy loss
    loss = seq2seq.sequence_loss(outputs, targets, target_weights)
    # Gradient descent optimizer
    opt = tf.train.GradientDescentOptimizer(learning_rate)
    # Optimization goal: minimize the loss
    update = opt.apply_gradients(opt.compute_gradients(loss))
    # Model persistence
    saver = tf.train.Saver(tf.global_variables())

    return encoder_inputs, decoder_inputs, target_weights, outputs, loss, update, saver, targets


def train():
    """Train the model."""
    with tf.Session() as sess:
        sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = get_samples()
        encoder_inputs, decoder_inputs, target_weights, outputs, loss, update, saver, targets = get_model()

        input_feed = {}
        for l in xrange(input_seq_len):
            input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
        for l in xrange(output_seq_len):
            input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
            input_feed[target_weights[l].name] = sample_target_weights[l]
        input_feed[decoder_inputs[output_seq_len].name] = np.zeros([2], dtype=np.int32)

        # Initialize all variables
        sess.run(tf.global_variables_initializer())

        # Train for 200 iterations, printing the loss every 10 steps
        for step in xrange(200):
            [loss_ret, _] = sess.run([loss, update], input_feed)
            if step % 10 == 0:
                print 'step=', step, 'loss=', loss_ret

        # Save the model
        saver.save(sess, './model/demo')


def predict():
    """Predict with the saved model."""
    with tf.Session() as sess:
        sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = get_samples()
        encoder_inputs, decoder_inputs, target_weights, outputs, loss, update, saver, targets = get_model(feed_previous=True)
        # Restore the model from file
        saver.restore(sess, './model/demo')

        input_feed = {}
        for l in xrange(input_seq_len):
            input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
        for l in xrange(output_seq_len):
            input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
            input_feed[target_weights[l].name] = sample_target_weights[l]
        input_feed[decoder_inputs[output_seq_len].name] = np.zeros([2], dtype=np.int32)

        # Run the prediction
        outputs = sess.run(outputs, input_feed)

        # Decode each of the 2 samples
        for sample_index in xrange(2):
            # Each step holds num_decoder_symbols scores; take the id with the largest score
            outputs_seq = [int(np.argmax(logit[sample_index], axis=0)) for logit in outputs]
            # Cut the sequence at the first EOS
            if EOS_ID in outputs_seq:
                outputs_seq = outputs_seq[:outputs_seq.index(EOS_ID)]
            outputs_seq = [str(v) for v in outputs_seq]
            print " ".join(outputs_seq)


if __name__ == "__main__":
    if sys.argv[1] == 'train':
        train()
    else:
        predict()
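The decoding at the end of predict can be checked without TensorFlow. It takes the argmax over the num_decoder_symbols scores at each step, then cuts at the first EOS_ID. In this sketch, outputs is a hypothetical plain-list stand-in for what sess.run returns: one entry per step, each holding one score list per sample:

```python
EOS_ID = 2


def decode(outputs, sample_index):
    """Pick the highest-scoring id at every step for one sample and
    truncate the sequence at the first EOS_ID."""
    seq = [max(range(len(step[sample_index])), key=step[sample_index].__getitem__)
           for step in outputs]
    if EOS_ID in seq:
        seq = seq[:seq.index(EOS_ID)]
    return seq


def one_hot(k, n=16):
    return [1.0 if i == k else 0.0 for i in range(n)]


# 5 steps x 2 samples x 16 scores, shaped like the model's outputs
outputs = [[one_hot(11), one_hot(9)],
           [one_hot(13), one_hot(11)],
           [one_hot(15), one_hot(13)],
           [one_hot(EOS_ID), one_hot(EOS_ID)],
           [one_hot(0), one_hot(0)]]

for sample_index in range(2):
    print(" ".join(str(v) for v in decode(outputs, sample_index)))
# 11 13 15
# 9 11 13
```

Like np.argmax, max over range(...) returns the first index on ties.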

Save the code as demo.py. Run ./demo.py train to train, then ./demo.py predict to predict.

The prediction above still feeds in the sample encoder_inputs and decoder_inputs. Next, improve predict so that you can type in a string of numbers by hand (encoder input only).

First, implement a function that turns a space-separated string of numeric ids into the encoder, decoder and target_weight inputs used for prediction:

def seq_to_encoder(input_seq):
    """Turn a space-separated string of numeric ids into the model inputs."""
    input_seq_array = [int(v) for v in input_seq.split()]
    encoder_input = [PAD_ID] * (input_seq_len - len(input_seq_array)) + input_seq_array
    decoder_input = [GO_ID] + [PAD_ID] * (output_seq_len - 1)
    encoder_inputs = [np.array([v], dtype=np.int32) for v in encoder_input]
    decoder_inputs = [np.array([v], dtype=np.int32) for v in decoder_input]
    target_weights = [np.array([1.0], dtype=np.float32)] * output_seq_len
    return encoder_inputs, decoder_inputs, target_weights

Then we rewrite the predict function as follows:

def predict():
    """Predict from numbers typed on stdin."""
    with tf.Session() as sess:
        encoder_inputs, decoder_inputs, target_weights, outputs, loss, update, saver, targets = get_model(feed_previous=True)
        saver.restore(sess, './model/demo')
        sys.stdout.write("> ")
        sys.stdout.flush()
        input_seq = sys.stdin.readline()
        while input_seq:
            input_seq = input_seq.strip()
            sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = seq_to_encoder(input_seq)

            input_feed = {}
            for l in xrange(input_seq_len):
                input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
            for l in xrange(output_seq_len):
                input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
                input_feed[target_weights[l].name] = sample_target_weights[l]
            input_feed[decoder_inputs[output_seq_len].name] = np.zeros([2], dtype=np.int32)

            # Run the prediction
            outputs_seq = sess.run(outputs, input_feed)
            # Each step holds num_decoder_symbols scores; take the id with the largest score
            outputs_seq = [int(np.argmax(logit[0], axis=0)) for logit in outputs_seq]
            # Cut the sequence at the first EOS
            if EOS_ID in outputs_seq:
                outputs_seq = outputs_seq[:outputs_seq.index(EOS_ID)]
            outputs_seq = [str(v) for v in outputs_seq]
            print " ".join(outputs_seq)

            sys.stdout.write("> ")
            sys.stdout.flush()
            input_seq = sys.stdin.readline()

Run ./demo.py predict.

With num_encoder_symbols = 10, the id 11 cannot be represented (valid ids are 0 to 9). Enlarge the symbol counts and add a sample:

num_encoder_symbols = 32
num_decoder_symbols = 32
......
train_set = [
    [[5, 7, 9], [11, 13, 15, EOS_ID]],
    [[7, 9, 11], [13, 15, 17, EOS_ID]],
    [[15, 17, 19], [21, 23, 25, EOS_ID]]
]
...

Increase the number of training iterations to 10,000.

When the input is one of the training samples, the prediction is very good. For other inputs, the model picks the most similar output among the samples and returns it as the prediction. It does no reasoning and has no real intelligence: the model is better suited to classification than to inference.

To work with Chinese text, convert Chinese words into id numbers for training, and convert predicted ids back into Chinese words.

Word_token. py file and create a WordToken class. The load function is responsible for loading samples and generating word2id_dict and id2word_dict dictionaries. Word2id is responsible for converting words into ids and id2word is responsible for converting ids into words:

# coding:utf-8
import sys
import jieba


class WordToken(object):
    def __init__(self):
        # Smallest id given to a word; lower ids are left for special symbols (PAD_ID, GO_ID, EOS_ID)
        self.START_ID = 4
        self.word2id_dict = {}
        self.id2word_dict = {}

    def load_file_list(self, file_list):
        """
        Load the list of sample files, segment every line into words and count word frequencies,
        then number the words from most to least frequent and store the mapping in
        self.word2id_dict and self.id2word_dict
        """
        words_count = {}
        for file in file_list:
            with open(file, 'r') as file_object:
                for line in file_object.readlines():
                    line = line.strip()
                    seg_list = jieba.cut(line)
                    for str in seg_list:
                        if str in words_count:
                            words_count[str] = words_count[str] + 1
                        else:
                            words_count[str] = 1

        sorted_list = [[v[1], v[0]] for v in words_count.items()]
        sorted_list.sort(reverse=True)
        for index, item in enumerate(sorted_list):
            word = item[1]
            self.word2id_dict[word] = self.START_ID + index
            self.id2word_dict[self.START_ID + index] = word
        # Last index used; the largest id is START_ID + index
        return index

    def word2id(self, word):
        if not isinstance(word, unicode):
            print "Exception: error word not unicode"
            sys.exit(1)
        if word in self.word2id_dict:
            return self.word2id_dict[word]
        else:
            return None

    def id2word(self, id):
        id = int(id)
        if id in self.id2word_dict:
            return self.id2word_dict[id]
        else:
            return None
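You can try the numbering scheme in load_file_list without jieba or sample files: sort words by frequency, highest first, and number them from START_ID = 4, leaving ids 0 to 3 for special symbols. In this sketch, whitespace splitting stands in for jieba.cut, the lines are given as a list, and the function name build_vocab is hypothetical:

```python
START_ID = 4


def build_vocab(lines):
    """Count word frequencies, sort by (count, word) in descending order as
    load_file_list does, and number the words from START_ID."""
    words_count = {}
    for line in lines:
        for word in line.strip().split():  # stand-in for jieba.cut
            words_count[word] = words_count.get(word, 0) + 1
    sorted_list = sorted(([count, word] for word, count in words_count.items()),
                         reverse=True)
    word2id = {}
    id2word = {}
    for index, (_, word) in enumerate(sorted_list):
        word2id[word] = START_ID + index
        id2word[START_ID + index] = word
    return word2id, id2word


word2id, id2word = build_vocab(["how are you", "how old are you", "how"])
ids = [word2id[w] for w in "how are you".split()]
print(ids)                                # [4, 6, 5]
print(" ".join(id2word[i] for i in ids))  # how are you
```

Words with equal counts are ordered by the word itself, in descending order, just as list.sort(reverse=True) orders the [count, word] pairs.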

Demo. py modify get_train_set:

def get_train_set():
    global num_encoder_symbols, num_decoder_symbols
    train_set = []
    with open('./samples/question', 'r') as question_file:
        with open('./samples/answer', 'r') as answer_file:
            while True:
                question = question_file.readline()
                answer = answer_file.readline()
                if question and answer:
                    question = question.strip()
                    answer = answer.strip()

                    question_id_list = get_id_list_from(question)
                    answer_id_list = get_id_list_from(answer)
                    answer_id_list.append(EOS_ID)
                    train_set.append([question_id_list, answer_id_list])
                else:
                    break
    return train_set

Implementation of get_id_list_from:

def get_id_list_from(sentence):
    sentence_id_list = []
    seg_list = jieba.cut(sentence)
    for str in seg_list:
        id = wordToken.word2id(str)
        if id is not None:
            sentence_id_list.append(id)
    return sentence_id_list

At the top of demo.py, create the WordToken and load the vocabulary:

import word_token
import jieba
wordToken = word_token.WordToken()

# Load the samples so that num_encoder_symbols and num_decoder_symbols can be computed dynamically
max_token_id = wordToken.load_file_list(['./samples/question', './samples/answer'])
num_encoder_symbols = max_token_id + 5
num_decoder_symbols = max_token_id + 5

Training code:

# Train for many iterations, printing the loss every 10 steps; stop with Ctrl+C
for step in xrange(100000):
    [loss_ret, _] = sess.run([loss, update], input_feed)
    if step % 10 == 0:
        print 'step=', step, 'loss=', loss_ret
        # Save the model
        saver.save(sess, './model/demo')

Prediction code modification:

def predict():
    """Chat interactively with the trained model."""
    with tf.Session() as sess:
        encoder_inputs, decoder_inputs, target_weights, outputs, loss, update, saver, targets = get_model(feed_previous=True)
        saver.restore(sess, './model/demo')
        sys.stdout.write("> ")
        sys.stdout.flush()
        input_seq = sys.stdin.readline()
        while input_seq:
            input_seq = input_seq.strip()
            input_id_list = get_id_list_from(input_seq)
            if (len(input_id_list)):
                sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = seq_to_encoder(' '.join([str(v) for v in input_id_list]))

                input_feed = {}
                for l in xrange(input_seq_len):
                    input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
                for l in xrange(output_seq_len):
                    input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
                    input_feed[target_weights[l].name] = sample_target_weights[l]
                input_feed[decoder_inputs[output_seq_len].name] = np.zeros([2], dtype=np.int32)

                # Run the prediction
                outputs_seq = sess.run(outputs, input_feed)
                # Each step holds num_decoder_symbols scores; take the id with the largest score
                outputs_seq = [int(np.argmax(logit[0], axis=0)) for logit in outputs_seq]
                # Cut the sequence at the first EOS
                if EOS_ID in outputs_seq:
                    outputs_seq = outputs_seq[:outputs_seq.index(EOS_ID)]
                # Map ids back to words, skipping ids that have no word (special symbols)
                outputs_seq = [wordToken.id2word(v) for v in outputs_seq]
                print " ".join([v for v in outputs_seq if v is not None])
            else:
                print "WARN: words out of service"

            sys.stdout.write("> ")
            sys.stdout.flush()
            input_seq = sys.stdin.readline()

Train on the 1000 conversation samples stored in './samples/question' and './samples/answer' until the loss converges below some threshold (for example 1.0):

python demo.py train

Once the loss falls below 1.0, stop training manually with Ctrl+C; the model is saved every 10 steps.

With the learning rate set to 0.1, the model converges very slowly. A better strategy is to start with a larger learning rate and lower it whenever the loss rebounds, that is, rises compared with the preceding steps. Instead of using a fixed learning_rate directly, first define an initial learning rate:

init_learning_rate = 1

Create a variable in get_model and initialize it with init_learning_rate:

learning_rate = tf.Variable(float(init_learning_rate), trainable=False, dtype=tf.float32)

Then create an operation that multiplies the learning rate by 0.9 (a 10% reduction) each time it runs:

learning_rate_decay_op = learning_rate.assign(learning_rate * 0.9)

Training code adjustment:

# Press Ctrl+C to stop
previous_losses = []
for step in xrange(100000):
    [loss_ret, _] = sess.run([loss, update], input_feed)
    if step % 10 == 0:
        print 'step=', step, 'loss=', loss_ret, 'learning_rate=', learning_rate.eval()

        # Decay the learning rate when the loss exceeds the last 5 logged losses
        # (the length check avoids calling max() on an empty list)
        if len(previous_losses) > 5 and loss_ret > max(previous_losses[-5:]):
            sess.run(learning_rate_decay_op)
        previous_losses.append(loss_ret)

        # Persist the model
        saver.save(sess, './model/demo')
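To see when the decay fires, the rule can be replayed in plain Python 3 on a list of logged losses. `decay_schedule` is a hypothetical helper, not part of the original code. It mirrors the condition in the training loop, with a len(previous_losses) > 5 guard so that max() is never called on an empty list, and treats each loss as one logged step (every 10th training step).

```python
def decay_schedule(losses, init_learning_rate=1.0, decay=0.9, window=5):
    """Replay the learning-rate decay rule on recorded losses.

    The rate is multiplied by `decay` whenever a new loss exceeds the largest
    of the previous `window` logged losses, once more than `window` exist.
    Returns the learning rate in effect after each logged loss.
    """
    learning_rate = init_learning_rate
    previous_losses = []
    rates = []
    for loss in losses:
        if len(previous_losses) > window and loss > max(previous_losses[-window:]):
            learning_rate *= decay
        previous_losses.append(loss)
        rates.append(learning_rate)
    return rates


rates = decay_schedule([5.0, 4.0, 3.5, 3.0, 2.8, 2.6, 4.2, 2.5])
print(rates)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.9]
```

Comparing against the maximum of the last five losses, rather than only the previous one, means a small upward wobble does not cut the rate; only a clear rebound (4.2 above the recent peak of 4.0 here) triggers a decay.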

With this adjustment, training converges quickly.

References:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://suriyadeepan.github.io/2016-06-28-easy-seq2seq/
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
https://arxiv.org/abs/1406.1078
https://arxiv.org/abs/1409.3215
https://arxiv.org/abs/1409.0473

Loading the full sample set at once does not scale: with a large number of samples, training keeps running out of memory. The fix is to load samples in batches instead, so that memory usage stays bounded no matter how large the sample set grows.

https://github.com/warmheartli/ChatBotCourse/tree/master/chatbotv5

Memory usage grows with the sample size: once the samples reach the tens of thousands, memory usage reaches about 10 GB. There are two causes. First, every iteration loads the full sample set into memory and updates the model after training on all of it at once. Second, the vocabulary is generated from the samples without any restriction, so more samples mean a larger vocabulary and a larger model, both of which take up more memory.

Optimization: replace full loading with batch loading by changing the train() function:

# Press Ctrl+C to stop
previous_losses = []
for step in xrange(20000):
    sample_encoder_inputs, sample_decoder_inputs, sample_target_weights = get_samples(train_set, 1000)
    input_feed = {}
    for l in xrange(input_seq_len):
        input_feed[encoder_inputs[l].name] = sample_encoder_inputs[l]
    for l in xrange(output_seq_len):
        input_feed[decoder_inputs[l].name] = sample_decoder_inputs[l]
        input_feed[target_weights[l].name] = sample_target_weights[l]
    input_feed[decoder_inputs[output_seq_len].name] = np.zeros([len(sample_decoder_inputs[0])], dtype=np.int32)
    [loss_ret, _] = sess.run([loss, update], input_feed)
    if step % 10 == 0:
        print 'step=', step, 'loss=', loss_ret, 'learning_rate=', learning_rate.eval()

        if len(previous_losses) > 5 and loss_ret > max(previous_losses[-5:]):
            sess.run(learning_rate_decay_op)
        previous_losses.append(loss_ret)

        # Persist the model
        saver.save(sess, './model/demo')

get_samples(train_set, 1000) fetches the samples in batches; 1000 is the number of samples fetched per call:

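def get_samples(train_set, batch_num):
    raw_encoder_input = []
    raw_decoder_input = []
    # Take the whole set if it fits in one batch; otherwise take batch_num
    # consecutive samples starting at a random offset
    if batch_num >= len(train_set):
        batch_train_set = train_set
    else:
        random_start = random.randint(0, len(train_set)-batch_num)
        batch_train_set = train_set[random_start:random_start+batch_num]
    for sample in batch_train_set:
        # Left-pad the question with PAD_ID; prepend GO_ID to the answer and right-pad it with PAD_ID
        raw_encoder_input.append([PAD_ID] * (input_seq_len - len(sample[0])) + sample[0])
        raw_decoder_input.append([GO_ID] + sample[1] + [PAD_ID] * (output_seq_len - len(sample[1]) - 1))
    # ...... (the rest, which builds encoder_inputs, decoder_inputs and target_weights from these lists, is omitted)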

Each call draws 1000 consecutive samples starting at a random position in the full sample set (or the whole set, if it holds no more than 1000 samples).
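The omitted part of get_samples has to turn the padded rows into the time-major layout that encoder_inputs uses: a list with one entry per time step, each holding one id per sample in the batch. Below is a self-contained Python 3 sketch of the whole function under stated assumptions: `get_samples_sketch` is a hypothetical name, plain lists stand in for the np.int32 arrays the real code feeds, and the values PAD_ID = 0, GO_ID = 1, EOS_ID = 2 and sequence length 5 are assumptions. The target weights follow a common convention (weight 0 wherever the next-step target is padding), which is also an assumption rather than the repository's confirmed rule.

```python
import random

PAD_ID, GO_ID, EOS_ID = 0, 1, 2  # assumption: values of the special ids
input_seq_len = 5                # assumption: encoder sequence length
output_seq_len = 5               # assumption: decoder sequence length


def get_samples_sketch(train_set, batch_num, rng=random):
    """Draw one batch and return time-major encoder/decoder/weight lists.

    train_set: list of [question_ids, answer_ids] pairs; answers end with EOS_ID.
    Assumes len(question) <= input_seq_len and len(answer) <= output_seq_len - 1.
    """
    # Whole set if it is small, else batch_num consecutive samples at a random offset
    if batch_num >= len(train_set):
        batch = train_set
    else:
        start = rng.randint(0, len(train_set) - batch_num)
        batch = train_set[start:start + batch_num]

    raw_encoder, raw_decoder = [], []
    for question, answer in batch:
        # Left-pad the question; prepend GO_ID to the answer and right-pad it
        raw_encoder.append([PAD_ID] * (input_seq_len - len(question)) + question)
        raw_decoder.append([GO_ID] + answer + [PAD_ID] * (output_seq_len - len(answer) - 1))

    # Transpose: one list per time step, holding that step's id for every sample
    encoder_inputs = [[seq[t] for seq in raw_encoder] for t in range(input_seq_len)]
    decoder_inputs = [[seq[t] for seq in raw_decoder] for t in range(output_seq_len)]
    # Target at step t is the decoder input at t + 1 (padding after the last step)
    target_weights = [
        [0.0 if t == output_seq_len - 1 or seq[t + 1] == PAD_ID else 1.0 for seq in raw_decoder]
        for t in range(output_seq_len)
    ]
    return encoder_inputs, decoder_inputs, target_weights


train_set = [[[5, 7, 9], [11, 13, EOS_ID]], [[8, 3], [6, EOS_ID]]]
enc, dec, weights = get_samples_sketch(train_set, batch_num=4)
print(enc)      # [[0, 0], [0, 0], [5, 0], [7, 8], [9, 3]]
print(dec)      # [[1, 1], [11, 6], [13, 2], [2, 0], [0, 0]]
print(weights)  # [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
```

Because only batch_num rows are padded and transposed per step, the per-step feed stays the same size however large train_set becomes; train_set itself is still held in memory.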

When building the vocabulary from the samples, apply a minimum word-frequency threshold:

    def load_file_list(self, file_list, min_freq):
    ......
        for index, item in enumerate(sorted_list):
            word = item[1]
            if item[0] < min_freq:
                break
            self.word2id_dict[word] = self.START_ID + index
            self.id2word_dict[self.START_ID + index] = word
        return index
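A stdlib sketch of the same frequency cut-off, useful for checking how min_freq shrinks the vocabulary before training. `build_vocab` and its return shape are hypothetical; it assumes pre-segmented input (jieba does the segmentation in the real code) and START_ID = 4. Because the list is sorted by descending frequency, as the break in load_file_list implies, the loop can stop at the first word below min_freq. It is written for Python 3.

```python
from collections import Counter

START_ID = 4  # assumption: first id available for ordinary words


def build_vocab(tokenized_sentences, min_freq):
    """Map words seen at least min_freq times to ids starting at START_ID.

    Returns (word2id, num_symbols), where num_symbols counts every id from 0
    (including the reserved ones) up to the largest word id.
    """
    counts = Counter(word for sentence in tokenized_sentences for word in sentence)
    # Descending frequency; alphabetical order breaks ties deterministically
    sorted_list = sorted(((c, w) for w, c in counts.items()), key=lambda cw: (-cw[0], cw[1]))
    word2id = {}
    for index, (count, word) in enumerate(sorted_list):
        if count < min_freq:
            break  # every remaining word is at least as rare
        word2id[word] = START_ID + index
    return word2id, START_ID + len(word2id)


sentences = [["how", "are", "you"], ["how", "old", "are", "you"], ["how", "is", "it"]]
word2id, num_symbols = build_vocab(sentences, min_freq=2)
print(word2id)      # {'how': 4, 'are': 5, 'you': 6}
print(num_symbols)  # 7
```

In load_file_list above, when the break fires, index is the position of the first rejected word, so with START_ID = 4 the value max_token_id + 5 is one larger than strictly needed; that only adds one unused symbol.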


References:
“Python Natural Language Processing”
“NLTK Basic Tutorial: Building Machine Learning Applications with NLTK and Python Libraries”
http://www.shareditor.com/blogshow?blogId=136
http://www.shareditor.com/blogshow?blogId=137

Recommendations for machine learning opportunities in Shanghai are welcome. My WeChat ID is Qingxingfengzi.

