Copyright notice: this article was originally published on Noogel's Notes. Please contact me by email for permission to reprint.

This section applies the deep neural network concepts from the previous section to build an MNIST handwritten-digit recognizer with TensorFlow. First, import the TensorFlow library and the training data:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
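
As a quick sanity check (not part of the original script), the arrays returned by read_data_sets can be inspected directly; the standard split is 55,000 training, 5,000 validation and 10,000 test images, each flattened into a 784-dimensional vector:

print(mnist.train.images.shape)       # (55000, 784)
print(mnist.validation.images.shape)  # (5000, 784)
print(mnist.test.labels.shape)        # (10000, 10), one-hot labels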

Define the global constants: INPUT_NODE is the number of pixels per image (28 * 28 = 784) and OUTPUT_NODE is the number of classes (10). LAYER1_NODE is the number of nodes in the hidden layer, and BATCH_SIZE is the number of training examples processed in each batch. LEARNING_RATE_BASE is the base learning rate, LEARNING_RATE_DECAY is the learning-rate decay rate, REGULARIZATION_RATE is the coefficient of the regularization term in the loss function, TRAINING_STEPS is the number of training steps, and MOVING_AVERAGE_DECAY is the moving-average decay rate.

INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 30000
MOVING_AVERAGE_DECAY = 0.99

An inference function is defined to compute the forward-propagation result of the neural network; the ReLU activation introduces the non-linearity. The avg_class parameter supports using the moving-average model at test time: when it is provided, the weights and biases are taken from avg_class.

def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    if avg_class is None:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        return tf.matmul(layer1, weights2) + biases2
    else:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)
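
To make the matrix shapes concrete, here is a brief walkthrough of how the dimensions flow through inference for a batch of N images (derived from the constants defined above):

# input_tensor : [N, 784]
# weights1     : [784, 500]   biases1 : [500]
# layer1       : [N, 500]     ReLU applied element-wise
# weights2     : [500, 10]    biases2 : [10]
# output       : [N, 10]      unnormalized class scores (logits), one per class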

Define the input placeholders and generate the hidden-layer and output-layer parameters:

x = tf.placeholder(tf.float32, [None, INPUT_NODE])
y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE])
weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

Compute the forward-propagation result of the neural network under the current parameters.

y = inference(x, None, weights1, biases1, weights2, biases2)

Here the moving-average class is initialized with the moving-average decay rate and the training-step counter, which speeds up the updates of the averaged variables early in training. global_step dynamically stores the number of training steps.

global_step = tf.Variable(0, trainable=False)
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
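
For reference, this is the update rule ExponentialMovingAverage applies to every variable it tracks, written as a plain-Python sketch (ema_update is only illustrative, not TensorFlow API):

def ema_update(shadow, variable, step, decay=MOVING_AVERAGE_DECAY):
    # Because global_step is passed in, TF uses a smaller decay early in
    # training, so the shadow values catch up with the real variables faster.
    effective_decay = min(decay, (1.0 + step) / (10.0 + step))
    return effective_decay * shadow + (1.0 - effective_decay) * variable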

variables_averages_op applies the moving average to all trainable parameters of the neural network; variables marked trainable=False (such as global_step) are excluded. average_y then computes the forward-propagation result using the moving-average model.

variables_averages_op = variable_averages.apply(tf.trainable_variables())
average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)

Compute the loss. Cross entropy describes the gap between the predicted value and the true value, and softmax regression turns the network output into a probability distribution. TensorFlow provides a single function that combines the two: its first argument is the forward-propagation result and its second is the correct answer for the training data. The average cross entropy over all samples is then calculated.

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cross_entropy_mean = tf.reduce_mean(cross_entropy)
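
For intuition, the per-example value this op computes is the negative log probability that softmax assigns to the correct class; a rough NumPy equivalent (the helper function is hypothetical, for illustration only):

import numpy as np

def sparse_softmax_cross_entropy(logits, label):
    # logits: array of 10 class scores; label: index of the correct class
    probs = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    probs /= np.sum(probs)
    return -np.log(probs[label])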

Here the L2 regularizer computes the regularization loss of the model; it is applied to the weights only, not to the biases. The regularization term helps avoid overfitting.

regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
regularization = regularizer(weights1) + regularizer(weights2)
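
Conceptually, the penalty added for each weight matrix is the scaled sum of its squared entries; a rough NumPy equivalent (l2_penalty is an illustrative helper, assuming the contrib regularizer is built on tf.nn.l2_loss, i.e. sum(w ** 2) / 2):

import numpy as np

def l2_penalty(w, scale=REGULARIZATION_RATE):
    # Large weights are penalized, which discourages fitting noise (overfitting)
    return scale * np.sum(np.square(w)) / 2.0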

The final total loss is equal to the sum of cross entropy loss and regularization loss.

loss = cross_entropy_mean + regularization

Set the learning rate to decay exponentially.

learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY)
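
The value produced by exponential_decay follows a simple formula; the helper below is only a plain-Python sketch of it (staircase is left at its default False, so the exponent is fractional), not TensorFlow API:

def decayed_learning_rate(step,
                          base=LEARNING_RATE_BASE,
                          decay_rate=LEARNING_RATE_DECAY,
                          decay_steps=mnist.train.num_examples / BATCH_SIZE):  # about one epoch (~550 batches)
    # The learning rate shrinks smoothly as training progresses
    return base * decay_rate ** (float(step) / decay_steps)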

Optimize the total loss with a gradient-descent optimizer.

train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

Every training step must both update the parameters and refresh their moving averages, so the two operations are bundled into a single train_op.

with tf.control_dependencies([train_step, variables_averages_op]):
    train_op = tf.no_op()
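
An equivalent way to bundle the two operations is tf.group; either form can serve as the per-step training op (shown here as a commented alternative, not a change to the script):

# train_op = tf.group(train_step, variables_averages_op)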

Verify whether the forward propagation results using the moving average model are correct.

correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))

Average accuracy was calculated.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
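
For intuition, here is a hypothetical batch of four images and what the two lines above would compute for it (values are made up for illustration):

# tf.argmax(average_y, 1)  -> [7, 2, 1, 0]              predicted classes
# tf.argmax(y_, 1)         -> [7, 2, 6, 0]              true classes
# tf.equal(...)            -> [True, True, False, True]
# tf.cast(..., tf.float32) -> [1.0, 1.0, 0.0, 1.0]
# tf.reduce_mean(...)      -> 0.75                      i.e. 75% accuracy on this batch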

Finally, start the training session and check the accuracy on the validation and test data.

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # Prepare the validation data
    validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
    # Prepare the test data
    test_feed = {x: mnist.test.images, y_: mnist.test.labels}
    # Iterative training
    for i in range(TRAINING_STEPS):
        if i % 1000 == 0:
            validate_acc = sess.run(accuracy, feed_dict=validate_feed)
            print "Training rounds:", i, ", accuracy:", validate_acc * 100, "%"
        xs, ys = mnist.train.next_batch(BATCH_SIZE)
        # Train on the current batch
        sess.run(train_op, feed_dict={x: xs, y_: ys})
    # Calculate the final accuracy on the test data
    test_acc = sess.run(accuracy, feed_dict=test_feed)
    print "Training rounds:", TRAINING_STEPS, ", accuracy:", test_acc * 100, "%"

Begin the training process by initializing all variables.

tf.global_variables_initializer().run()

The MNIST data is divided into training, validation and test sets. We first prepare the validation and test feeds; because these sets are not large, we can feed each of them in full when evaluating accuracy. We then start iterative training. The training set is large, so each step uses only a small batch of examples; this reduces the amount of computation and speeds up training without having much influence on the results.

Training in TensorFlow happens through sess.run. The first argument is the node to be evaluated, i.e. the output of the computation graph, and the second argument feeds values into the placeholders.

sess.run(train_op, feed_dict={x: xs, y_: ys})
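
If you also want to log the loss while training, sess.run can fetch several nodes in a single call; this variation is optional and not part of the original script:

# Fetch the loss and the current step together with the training op,
# avoiding extra sess.run calls just for logging.
_, loss_value, step = sess.run([train_op, loss, global_step],
                               feed_dict={x: xs, y_: ys})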

With each training step the total loss gets smaller and the model's predictions become more accurate, until the accuracy levels off.

The complete code is as follows:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# MNIST-related constants: INPUT_NODE is the 28 * 28 = 784 pixels per image,
# OUTPUT_NODE is the number of classes (10); LAYER1_NODE is the number of
# hidden-layer nodes, BATCH_SIZE is the number of training examples per batch.
# LEARNING_RATE_BASE is the base learning rate, LEARNING_RATE_DECAY its decay
# rate, REGULARIZATION_RATE the coefficient of the regularization term,
# TRAINING_STEPS the number of training steps and MOVING_AVERAGE_DECAY the
# moving-average decay rate.
INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 30000
MOVING_AVERAGE_DECAY = 0.99


# Compute the forward-propagation result of the network; ReLU introduces the
# non-linearity. The avg_class parameter supports using the moving-average
# model at test time.
def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    if avg_class is None:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        return tf.matmul(layer1, weights2) + biases2
    else:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)


# Input placeholders
x = tf.placeholder(tf.float32, [None, INPUT_NODE])
y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE])
# Hidden-layer and output-layer parameters
weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

# Forward-propagation result under the current parameters
y = inference(x, None, weights1, biases1, weights2, biases2)

# global_step dynamically stores the number of training steps
global_step = tf.Variable(0, trainable=False)
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

# Apply the moving average to all trainable parameters of the network;
# variables with trainable=False (such as global_step) are excluded.
variables_averages_op = variable_averages.apply(tf.trainable_variables())
# Forward-propagation result computed with the moving-average model
average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)

# Cross entropy describes the gap between the prediction and the true value.
# The first argument is the forward-propagation result, the second the answer
# of the training data.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
# Average cross entropy over all samples
cross_entropy_mean = tf.reduce_mean(cross_entropy)

# L2 regularization loss, applied to the weights only, not to the biases
regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
regularization = regularizer(weights1) + regularizer(weights2)
# The total loss is the sum of the cross-entropy loss and the regularization loss
loss = cross_entropy_mean + regularization

# Exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY)

# Optimize the total loss with gradient descent
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

# Each training step must update the parameters and their moving averages
with tf.control_dependencies([train_step, variables_averages_op]):
    train_op = tf.no_op()

# Check whether the prediction of the moving-average model is correct
correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
# Average accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

if __name__ == "__main__":
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        # Prepare the validation data
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
        # Prepare the test data
        test_feed = {x: mnist.test.images, y_: mnist.test.labels}
        # Iterative training
        for i in range(TRAINING_STEPS):
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print "Training rounds:", i, ", accuracy:", validate_acc * 100, "%"
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            # Train on the current batch
            sess.run(train_op, feed_dict={x: xs, y_: ys})
        # Calculate the final accuracy on the test data
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print "Training rounds:", TRAINING_STEPS, ", accuracy:", test_acc * 100, "%"
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Training rounds: 0, accuracy: 9.20000001788 %
Training rounds: 1000, accuracy: 97.619998455 %
Training rounds: 2000, accuracy: 98.0799973011 %
Training rounds: 3000, accuracy: 98.2599973679 %
Training rounds: 4000, accuracy: 98.1999993324 %
Training rounds: 5000, accuracy: 98.1800019741 %
Training rounds: 6000, accuracy: 98.2400000095 %
Training rounds: 7000, accuracy: 98.2200026512 %
Training rounds: 8000, accuracy: 98.1999993324 %
Training rounds: 9000, accuracy: 98.2599973679 %
Training rounds: 10000, accuracy: 98.2400000095 %
Training rounds: 11000, accuracy: 98.2400000095 %
Training rounds: 12000, accuracy: 98.1599986553 %
Training rounds: 13000, accuracy: 98.2599973679 %
Training rounds: 14000, accuracy: 98.299998045 %
Training rounds: 15000, accuracy: 98.4200000763 %
Training rounds: 16000, accuracy: 98.2800006866 %
Training rounds: 17000, accuracy: 98.3799993992 %
Training rounds: 18000, accuracy: 98.3600020409 %
Training rounds: 19000, accuracy: 98.3200013638 %
Training rounds: 20000, accuracy: 98.3399987221 %
Training rounds: 21000, accuracy: 98.3799993992 %
Training rounds: 22000, accuracy: 98.400002718 %
Training rounds: 23000, accuracy: 98.400002718 %
Training rounds: 24000, accuracy: 98.4200000763 %
Training rounds: 25000, accuracy: 98.3200013638 %
Training rounds: 26000, accuracy: 98.4200000763 %
Training rounds: 27000, accuracy: 98.3799993992 %
Training rounds: 28000, accuracy: 98.400002718 %
Training rounds: 29000, accuracy: 98.3200013638 %
Training rounds: 30000, accuracy: 98.3900010586 %

The next section summarizes the trends of the accuracy, average cross entropy, total loss, learning rate and average absolute gradient over the course of training.

