Goal

This article introduces basic TensorFlow knowledge through a practical example. We hope you will be familiar with the basic operations of TensorFlow after working through it.

Simple classification model code

import time

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST dataset with one-hot encoded labels
mnist = input_data.read_data_sets("MNIST", one_hot=True)

batch_size = 16
n_batches = mnist.train.num_examples // batch_size

# Placeholders for the input images, labels and dropout keep probability
x = tf.placeholder(dtype=tf.float32, shape=[None, 784])
y = tf.placeholder(dtype=tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)

# Layer 1: 784 -> 1024
W1 = tf.Variable(tf.zeros([784, 1024]))
b1 = tf.Variable(tf.zeros(1024))
a1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)   # sigmoid nonlinear activation
o1 = tf.nn.dropout(a1, keep_prob)           # dropout

# Layer 2: 1024 -> 512
W2 = tf.Variable(tf.zeros([1024, 512]))
b2 = tf.Variable(tf.zeros(512))
a2 = tf.nn.sigmoid(tf.matmul(o1, W2) + b2)
o2 = tf.nn.dropout(a2, keep_prob)

# Layer 3: 512 -> 128
W3 = tf.Variable(tf.zeros([512, 128]))
b3 = tf.Variable(tf.zeros(128))
a3 = tf.nn.sigmoid(tf.matmul(o2, W3) + b3)
o3 = tf.nn.dropout(a3, keep_prob)           # dropout

# Output layer: 128 -> 10
W4 = tf.Variable(tf.zeros([128, 10]))
b4 = tf.Variable(tf.zeros(10))
prediction = tf.nn.softmax(tf.matmul(o3, W4) + b4)

# Cross-entropy loss and Adam optimizer
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
opt = tf.train.AdamOptimizer(0.001).minimize(loss)

# Accuracy: fraction of predictions matching the labels
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    total_batch = 0
    last_batch = 0
    best = 0
    start = time.time()
    for epoch in range(100):
        # Train for one epoch with dropout enabled (keep_prob=0.5)
        for _ in range(n_batches):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run([opt], feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})
        # Evaluate on the test set with dropout disabled (keep_prob=1.0)
        loss_value, acc = sess.run([loss, accuracy],
                                   feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0})
        if acc > best:
            best = acc
            last_batch = total_batch
            print('epoch:%d, loss:%f, acc:%f, time:%f' % (epoch, loss_value, acc, time.time() - start))
            start = time.time()
        # Early stopping: quit if accuracy has not improved for more than 5 epochs
        if total_batch - last_batch > 5:
            print('when epoch-%d early stop train' % epoch)
            break
        total_batch += 1

Results output

Extracting MNIST/train-images-idx3-ubyte.gz
Extracting MNIST/train-labels-idx1-ubyte.gz
Extracting MNIST/t10k-images-idx3-ubyte.gz
Extracting MNIST/t10k-labels-idx1-ubyte.gz
epoch:0, loss:1.609388, acc:0.849500, time:11.415676
epoch:1, loss:1.525251, acc:0.935700, time:11.466139
epoch:2, loss:1.515377, acc:0.946300, time:10.386464
epoch:3, loss:1.507261, acc:0.954000, time:10.178594
epoch:4, loss:1.503498, acc:0.957800, time:11.311379
epoch:5, loss:1.501618, acc:0.959000, time:10.101135
epoch:6, loss:1.499541, acc:0.961100, time:10.134475
epoch:7, loss:1.496089, acc:0.965000, time:10.052625
epoch:8, loss:1.495209, acc:0.965500, time:10.609939
epoch:9, loss:1.494871, acc:0.966000, time:10.070237
epoch:10, loss:1.490888, acc:0.970000, time:10.127296
epoch:13, loss:1.490968, acc:0.970200, time:30.249309
epoch:16, loss:1.489859, acc:0.971600, time:30.295541
epoch:17, loss:1.489045, acc:0.971800, time:10.351570
epoch:18, loss:1.487513, acc:0.974000, time:10.136432
epoch:22, loss:1.486135, acc:0.974900, time:40.279734
epoch:24, loss:1.485551, acc:0.975600, time:20.794270
epoch:26, loss:1.485324, acc:0.975900, time:21.456657
epoch:29, loss:1.485043, acc:0.976200, time:32.043005
epoch:30, loss:1.483336, acc:0.978000, time:10.434125
when epoch-36 early stop train

Point 1

This article uses tf.zeros to initialize W1, W2, W3 and W4, and the training results are quite good. If you initialize with tf.random_normal instead, convergence is noticeably slower; try it yourself if you don't believe it. A one-line sketch of the swap is shown below.
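A minimal sketch, reusing the 784×1024 shape of the first layer above; only the initializer changes, the rest of the training code stays the same:

W1 = tf.Variable(tf.zeros([784, 1024]))            # initialization used in this article
# W1 = tf.Variable(tf.random_normal([784, 1024]))  # alternative: converges more slowly here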

Point 2

The activation function used in this article is sigmoid, but it can be replaced with other activations such as relu or tanh. Note, however, that with those activations tf.zeros cannot be used to initialize W1, W2, W3 and W4, because the weight parameters would then never be updated during back-propagation. In that case, use tf.random_normal(shape, stddev=0.1) to initialize W1, W2, W3 and W4 instead.
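An illustrative sketch (not the model actually trained above) of how the first layer might look with relu and random-normal initialization; the other layers would be changed the same way:

W1 = tf.Variable(tf.random_normal([784, 1024], stddev=0.1))  # random init needed for relu/tanh
b1 = tf.Variable(tf.zeros(1024))
a1 = tf.nn.relu(tf.matmul(x, W1) + b1)   # relu instead of sigmoid
o1 = tf.nn.dropout(a1, keep_prob)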

Point 3

The learning rate follows the conventional setting, on the order of 0.001. If the learning rate is too large, training shows no improvement and may even get worse and worse; if it is too small, training may also show no visible improvement because convergence is too slow. Try changing the learning rate to 0.1 or 0.0001 if you don't believe it.
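The learning rate is the first argument of AdamOptimizer, so the experiment only needs one number changed (a sketch; 0.001 is the setting used in this article):

opt = tf.train.AdamOptimizer(0.001).minimize(loss)     # setting used in this article
# opt = tf.train.AdamOptimizer(0.1).minimize(loss)     # too large: accuracy stalls or degrades
# opt = tf.train.AdamOptimizer(0.0001).minimize(loss)  # too small: convergence is very slow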

Point 4

It can be seen that stacking multiple nonlinear layers and using dropout greatly improve image-recognition accuracy. The simple classification model in the previous article only reached an accuracy of 0.926, while the approach in this article reaches 0.978.

Reference

Reference for this article: blog.csdn.net/qq_19672707…