· Using a bidirectional RNN (BiRNN/LSTM) for handwritten digit recognition with 99%+ accuracy

In this blog post, we build on Hands-on Practice 5 · Using RNN(LSTM) to do handwritten number recognition and use a BiRNN (LSTM) structure to further improve the model's accuracy. After 1000 training steps the accuracy reaches 99%.

  • First, let’s get familiar with BiRNN

The tf.nn.static_bidirectional_rnn function prototype:


tf.nn.static_bidirectional_rnn(
    cell_fw,
    cell_bw,
    inputs,
    initial_state_fw=None,
    initial_state_bw=None,
    dtype=None,
    sequence_length=None,
    scope=None
)

Description of the input arguments:

cell_fw: an instance of RNNCell, used for the forward direction.

cell_bw: an instance of RNNCell, used for the backward direction.

inputs: a length-T list of inputs, each a Tensor of shape [batch_size, input_size]. In other words, the sequence must be split into a Python list of T tensors, each holding one time step for the whole batch. For example, for documents of 1000 words and a batch size of 100, inputs would be a list of 1000 tensors of shape [100, input_size].

initial_state_fw: (optional) the initial state of the forward RNN. This must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, this should be a tuple of tensors of shapes [batch_size, S] for S in cell_fw.state_size.

initial_state_bw: (optional) Same as for initial_state_fw, but using the corresponding properties of cell_bw.

dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.

sequence_length: (optional) An int32/int64 vector, size [batch_size], containing the actual lengths for each of the sequences.

scope: VariableScope for the created subgraph; defaults to "bidirectional_rnn".

The output values are as follows:

A tuple (outputs, output_state_fw, output_state_bw), where:

outputs: a length-T list of outputs (one for each input), each of which is the depth-concatenation of the forward and backward outputs.

output_state_fw: the final state of the forward RNN.

output_state_bw: the final state of the backward RNN.
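To make the input and output shapes concrete, here is a minimal sketch (TensorFlow 1.x; the toy dimensions batch_size, T, input_size and hidden_num below are made up for illustration). It shows that outputs is a length-T list whose elements are the depth-concatenation of the forward and backward outputs:

import tensorflow as tf

# Toy dimensions, for illustration only
batch_size, T, input_size, hidden_num = 4, 6, 8, 16

# A length-T list of tensors, each of shape [batch_size, input_size]
inputs = [tf.placeholder(tf.float32, [batch_size, input_size]) for _ in range(T)]

cell_fw = tf.nn.rnn_cell.BasicLSTMCell(hidden_num)
cell_bw = tf.nn.rnn_cell.BasicLSTMCell(hidden_num)

outputs, state_fw, state_bw = tf.nn.static_bidirectional_rnn(
    cell_fw, cell_bw, inputs, dtype=tf.float32)

print(len(outputs))            # T
print(outputs[0].get_shape())  # (batch_size, 2 * hidden_num): forward and backward concatenated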

  • Code section

· The code below is based on Hands-on Practice 5 · Using RNN(LSTM) to do handwritten number recognition; the modified parts are annotated with comments, and comparing the two versions is recommended.

import os
os.environ["KMP_DUPLICATE_LIB_OK"] ="TRUE"
import time
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2 

from  tensorflow.examples.tutorials.mnist import  input_data

mnist=input_data.read_data_sets("./2RNN/data",one_hot=True)

train_rate=0.002  
train_step=1001
batch_size=1000
display_step=10

frame_size=28
sequence_length=28
hidden_num=128
n_classes=10

"" where: train_rate is the learning rate, which is a hyperparameter, currently set by experience, and of course also adaptive. Batch_size: the number of samples per batch. RNN can also be trained by using stochastic gradient descent to feed data in batches instead of feeding the whole data set each time. Sequence_size: The length of each sample sequence. Since we want to input a 28x28 image as a sequence to the RNN for training, we need to serialize the image. One of the easiest ways to do this is to assume that there is some relationship between the rows and take each row of the picture as a dimension of the sequence. So sequence_size is set to 28. What is reflected in Figure 1 is the number of xi from left to right after the left loop is expanded. RNN cell number frame_size: Size of each component in a sequence. Because each component is a row of pixels, and a row of pixels has 28 pixels. So frame_size is 28. As reflected in Figure 1, each xi in the bottom-most variable input is a vector or matrix of length frame_size. Input cell number hidden_num: indicates the number of hidden layers. The empirical setting is 5, as shown in Figure 1, there are hidden_num hidden layer units from bottom to top. N_classes: set the number of classes to 10

x=tf.placeholder(dtype=tf.float32,shape=[None,sequence_length*frame_size],name="inputx")

y=tf.placeholder(dtype=tf.float32,shape=[None,n_classes],name="expected_y")

# Because the forward and backward outputs are depth-concatenated, the BiRNN output size is
# 2*hidden_num, so weights has shape [2*hidden_num, n_classes] instead of [hidden_num, n_classes].
weights=tf.Variable(tf.random_normal(shape=[2*hidden_num,n_classes]))
bias=tf.Variable(tf.fill([n_classes],0.1))
# weights is the last layer of the network; bias is the bias of the last layer.

# Define the BiRNN network
def RNN(x,weights,bias):
    x = tf.reshape(x,shape=[-1,sequence_length,frame_size])
    # static_bidirectional_rnn expects a length-T list of tensors, each of shape [batch_size, frame_size]
    # Swap the first and second dimensions of x: (batch_size, sequence_length, frame_size) -> (sequence_length, batch_size, frame_size)
    x = tf.transpose(x,[1,0,2])
    # Transform to (-1, frame_size) shape
    x = tf.reshape(x,shape=[-1,frame_size])
    # Split into a list of sequence_length tensors, each of shape [batch_size, frame_size]
    x = tf.split(x,sequence_length)

    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_num) # Forward RNN, the number of output neurons is 128
 
    lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_num) # Reverse RNN, the number of output neurons is 128
 
    output, fw_state, bw_state = tf.nn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
    print(len(output))
    # Run the BiRNN over the sequence: for each xi of the length-sequence_length sequence
    # [x1,x2,x3,...], the forward and backward cells are applied, each with hidden_num hidden-layer units.
    # output is a list of length sequence_length; each element has shape
    # [batch_size, 2*hidden_num] (forward and backward outputs concatenated),
    # so it can be multiplied by weights of shape [2*hidden_num, n_classes];
    # the logits are normalized by softmax later, inside the loss.
    # Here the middle element of the output list is fed to the final layer.
    h = tf.matmul(output[int(sequence_length/2)],weights)+bias
    return h

predy=RNN(x,weights,bias)

cost=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=predy,labels=y))

opt=tf.train.AdamOptimizer(train_rate).minimize(cost)

correct_pred=tf.equal(tf.argmax(predy,1),tf.argmax(y,1))
accuracy=tf.reduce_mean(tf.to_float(correct_pred))

testx,testy=mnist.test.next_batch(batch_size)

saver=tf.train.Saver()

with tf.Session() as sess:
    srun = sess.run
    init =  tf.global_variables_initializer()
    srun(init)
    for t in range(train_step):
        batch_x,batch_y=mnist.train.next_batch(batch_size)
        _cost_val,_ = srun([cost,opt],{x:batch_x,y:batch_y})
        if(t%display_step==0):
            accuracy_val, cost_val = srun([accuracy,cost],{x:testx,y:testy})
            print(t,cost_val,accuracy_val)

    saver.save(sess,'./2RNN/ckpt1/mnist1.ckpt',global_step=train_step)
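Since the code saves the model with tf.train.Saver, here is a minimal sketch of restoring the checkpoint for evaluation afterwards. It assumes the graph defined above is still in memory, and that saving with global_step=train_step produced a file named mnist1.ckpt-1001 under ./2RNN/ckpt1/:

# Minimal restore-and-evaluate sketch (assumes the graph above is already built and
# that saver.save(..., global_step=train_step) produced mnist1.ckpt-1001)
with tf.Session() as sess:
    saver.restore(sess, './2RNN/ckpt1/mnist1.ckpt-1001')
    test_x, test_y = mnist.test.next_batch(batch_size)
    print(sess.run(accuracy, {x: test_x, y: test_y}))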
  • The results
The printed columns are step, test loss, and test accuracy:

0 2.531533 0.168
10 0.8894601 0.699
20 0.6328424 0.796
30 0.46291852 0.856
...
970 0.022114469 0.992
980 0.03192995 0.99
990 0.021659942 0.988
1000 0.023274422 0.992
  • Results analysis

In this exercise, we upgraded the plain RNN structure to a BiRNN structure, and the accuracy improved further. The results show that, under appropriate conditions, a BiRNN performs better than an ordinary RNN.