This article introduces how to save and restore neural network variables, along with the early-stopping optimization strategy. Much of the text and code is repeated from the previous tutorials; if you have already read them, you can scroll straight down to the Saver section below.

Previous tutorials: 01 – Simple Linear Model | 02 – Convolutional Neural Network | 03 – PrettyTensor

By Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube (translated from the English original)

If reproduced, please attach a link to this article.


Introduction

This tutorial shows how to save and restore the variables of a neural network. During optimization, the variables are saved whenever the classification accuracy on the validation set improves. The optimization is aborted if the accuracy has not improved for 1,000 iterations. We then reload the variables that performed best on the validation set.

This strategy is called early stopping. It is used to avoid overfitting of the neural network. Overfitting occurs when the neural network is trained for too long: it then starts to learn the noise in the training set, which causes it to misclassify new images.

Overfitting is not really a problem for the neural network used in this tutorial to recognize handwritten digits from the MNIST dataset, but the tutorial demonstrates the idea of early stopping.
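To make the idea concrete before the full implementation further below, here is a minimal, hypothetical sketch of the early-stopping loop. The helpers run_training_step(), compute_validation_accuracy() and save_checkpoint() are made-up stand-ins for the real TensorFlow code shown later in this tutorial.

# Hypothetical sketch of early stopping; the real optimization step,
# validation accuracy and Saver are defined later in this tutorial.
best_accuracy = 0.0         # best validation accuracy seen so far
last_improvement = 0        # iteration of the last improvement
require_improvement = 1000  # stop if no improvement within this many iterations

for iteration in range(10000):
    run_training_step()     # one optimization step on a batch of training data

    if iteration % 100 == 0:
        accuracy = compute_validation_accuracy()

        if accuracy > best_accuracy:
            best_accuracy = accuracy
            last_improvement = iteration
            save_checkpoint()   # e.g. saver.save(...)

    if iteration - last_improvement > require_improvement:
        break               # early stopping

# Afterwards, reload the checkpoint that had the best validation accuracy.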

This is based on the previous tutorial, so you need to know the basics of TensorFlow and the add-on package Pretty Tensor. Much of the code and text is similar to previous tutorials, so if you’ve already seen it, you can quickly navigate through this article.

Flowchart

The chart below shows roughly how data flows through the convolutional neural network that is implemented below. The network has two convolutional layers and two fully-connected layers, with the last layer being used to classify the input images. See tutorial #02 for a more detailed description of the network and of convolution.

from IPython.display import Image
Image('images/02_network_flowchart.png')

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math
import os

# Use PrettyTensor to simplify Neural Network construction.
import prettytensor as pt

This was developed using Python 3.5.2 (Anaconda). The TensorFlow version is:

tf.__version__

'0.12.0-rc0'

PrettyTensor version:

pt.__version__

‘0.7.1’

Load the data

The MNIST dataset is approximately 12 MB and will be downloaded automatically if it is not found at the given path.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/MNIST/', one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz

Extracting data/MNIST/train-labels-idx1-ubyte.gz

Extracting data/MNIST/t10k-images-idx3-ubyte.gz

Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST dataset has now been loaded. It consists of 70,000 images and the corresponding labels (i.e. the class of each image). The dataset is split into three mutually exclusive subsets. In this tutorial we use the training set and the validation set during optimization, and the test set to measure the final performance.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))Copy the code

Size of:

-Training-set: 55000

-Test-set: 10000

-Validation-set: 5000

The class labels are one-hot encoded, which means each label is a vector of length 10 that is zero for all elements except one. The index of that element is the class number, i.e. the digit shown in the corresponding image. We also need the class numbers as integers for the test and validation sets, which are calculated as follows.

data.test.cls = np.argmax(data.test.labels, axis=1)
data.validation.cls = np.argmax(data.validation.labels, axis=1)
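For example (with made-up values, not taken from the dataset), the one-hot label for the digit 7 looks like this, and np.argmax recovers the integer class number:

import numpy as np

# One-hot label for the digit 7: all zeros except at index 7.
label = np.array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])

print(np.argmax(label))   # prints 7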

Data dimensions

The data dimensions are used in several places in the source code below. They are defined in one place here, so we can use these variables instead of hard-coded numbers throughout the code.

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper function for plotting images

This function is used to draw nine images in a 3×3 grid and write the real category and the predicted category under each image.

def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to see if the data is correct.

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

TensorFlow graph

The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than performing the same calculations directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computation graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.

TensorFlow also automatically calculates gradients of variables that need to be optimized for better model performance. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be derived using the chain rule.
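As a small illustration (not part of the tutorial's network, and with made-up variable names), here is a sketch of how TensorFlow derives a gradient from a graph:

import tensorflow as tf

a = tf.placeholder(tf.float32)          # input to the graph
c = tf.reduce_sum(3.0 * tf.square(a))   # c = sum(3 * a^2)

# TensorFlow derives dc/da = 6 * a automatically via the chain rule.
grad = tf.gradients(c, a)

with tf.Session() as sess:
    print(sess.run(grad, feed_dict={a: [2.0]}))   # roughly [array([ 12.], dtype=float32)]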

TensorFlow can also take advantage of multi-core CPUs as well as GPUs. Google has even built special chips just for TensorFlow, called Tensor Processing Units (TPUs), that are faster than GPUs.

A TensorFlow graph consists of the following parts, which are described in detail below:

  • Placeholder variables are used to change the input to the graph.
  • The Model variables will be optimized to make the Model perform better.
  • The model, which is essentially just a mathematical function that computes some output given the placeholder variables and the model variables.
  • A cost measure is used to guide the optimization of variables.
  • An optimization strategy updates the variables of the model.

In addition, the TensorFlow graph may also contain various debugging statements, e.g. for logging data to be displayed with TensorBoard, but these are not covered in this tutorial.

Placeholder variables

Placeholder variables serve as the input to the graph, which we may change each time we run the graph. We call this feeding the placeholder variables, and it is described further below.

First we define the placeholder variable for the input images. This allows us to change the images that are fed into the TensorFlow graph. This is a so-called tensor, which just means a multi-dimensional vector or matrix. The data type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, with each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')

The convolutional layers expect x to be encoded as a 4-dimensional tensor, so we have to reshape it so its shape is instead [num_images, img_height, img_width, num_channels]. Note that img_height == img_width == img_size, and num_images can be inferred automatically by setting the first dimension to -1. The reshape operation is as follows:

x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
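As a quick NumPy analogy (not part of the tutorial's code), setting a dimension to -1 lets that dimension be inferred from the size of the data:

import numpy as np

# Three flattened "images" of length 784, reshaped to 4 dimensions.
# The first dimension (3) is inferred automatically because of the -1.
flat = np.zeros((3, 28 * 28))
imgs = flat.reshape(-1, 28, 28, 1)

print(imgs.shape)   # prints (3, 28, 28, 1)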

Next we define the placeholder variable for the true labels associated with the images that were input in the placeholder variable x. The shape of this placeholder is [None, num_classes], which means it may hold an arbitrary number of labels, and each label is a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, shape=[None, 10], name='y_true')

We could also have a placeholder variable for the class numbers, but we will instead calculate them using argmax. Note that this is a TensorFlow operator, so nothing is calculated at this point.

y_true_cls = tf.argmax(y_true, dimension=1)

The neural network

This tutorial uses PrettyTensor to implement the convolutional neural network, which is much easier than implementing it directly in TensorFlow. See tutorial #03 for details.

The basic idea is to wrap the input tensor x_image in a PrettyTensor object, which has helper functions for adding new layers, such as convolutional layers, so the whole neural network can be created this way. PrettyTensor takes care of the variable allocation and so on.

x_pretty = pt.wrap(x_image)

Now that the input image has been wrapped in a PrettyTensor object, we can add the convolutional layers and the fully-connected layers in just a few lines of code.

Note that within the with block, pt.defaults_scope(activation_fn=tf.nn.relu) supplies activation_fn=tf.nn.relu as an argument to each of the layers constructed inside it, so these layers all use Rectified Linear Units (ReLU) as their activation functions. defaults_scope makes it easy to change arguments for all of the layers.

with pt.defaults_scope(activation_fn=tf.nn.relu):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=16, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=36, name='layer_conv2').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=128, name='layer_fc1').\
        softmax_classifier(num_classes=num_classes, labels=y_true)

Getting the weights

Now we want to plot the weights of the neural network. When the network is created with PrettyTensor, all the variables of the layers are created indirectly by PrettyTensor, so we have to retrieve the variables from TensorFlow.

We named the two convolutional layers layer_conv1 and layer_conv2. These are also their variable scopes (not to be confused with the defaults_scope described above). PrettyTensor automatically names the variables it creates for each layer, so we can retrieve the weights of a layer using its scope name and the variable name.

The implementation is somewhat awkward because we have to use the TensorFlow function get_variable(), which was really designed for a different purpose: either creating a new variable or re-using an existing one. The easiest thing to do is to make the following helper function.

def get_weights_variable(layer_name):
    # Retrieve an existing variable named 'weights' in the scope
    # with the given layer_name.
    # This is awkward because the TensorFlow function was
    # really intended for another purpose.

    with tf.variable_scope(layer_name, reuse=True):
        variable = tf.get_variable('weights')

    return variable

Using this helper function we can retrieve the variables. These are TensorFlow objects; to get the contents of a variable you must do something like contents = session.run(weights_conv1), as demonstrated further below.

weights_conv1 = get_weights_variable(layer_name='layer_conv1')
weights_conv2 = get_weights_variable(layer_name='layer_conv2')

Optimization method

PrettyTensor gave us the predicted class labels (y_pred) as well as a loss measure that must be minimized in order to improve the neural network's ability to classify the input images.

The documentation for PrettyTensor does not say whether its loss measure is cross-entropy or something else. We now use the AdamOptimizer to minimize this loss.

Note that optimization is not performed at this point; in fact, nothing is calculated at all. We just add the optimizer object to the TensorFlow graph for later execution.

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)

Performance measures

We need a few more performance measures to display the progress to the user.

First we calculate the predicted class number from the neural network output y_pred, which is a vector with 10 elements. The class number is the index of the largest element.

y_pred_cls = tf.argmax(y_pred, dimension=1)

We then create a Boolean vector that tells us whether the true category of each image is the same as the predicted category.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

This calculates the classification accuracy by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Saver

In order to save the variables of the neural network, we create a so-called Saver object, which is used to store and restore all the variables of the TensorFlow graph. Nothing is actually saved at this point; that is done further below in the optimize() function.

saver = tf.train.Saver()

The saved files are often called checkpoints because they may be written at regular intervals during optimization.

This is the folder used to save or restore data.

save_dir = 'checkpoints/'

Create a folder if it does not exist.

if not os.path.exists(save_dir):
    os.makedirs(save_dir)

This is the path to save the checkpoint file.

save_path = os.path.join(save_dir, 'best_validation')

Run TensorFlow

Create the TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph.

session = tf.Session()

Initialize variables

The variables for weights and biases must be initialized before we start optimizing them. We wrap this in a simple function that will be called again later.

def init_variables():
    session.run(tf.global_variables_initializer())

Run the function to initialize the variable.

init_variables()

Helper function to perform optimization iterations

There are 55,000 images in the training set. It takes a long time to calculate the gradient of the model using all these images, so we use stochastic gradient descent, which only uses a small batch of images in each iteration of the optimizer.

If your computer crashes or becomes very slow because it runs out of RAM, then you can lower this number, but you may then need to perform more optimization iterations.

train_batch_size = 64

The optimization function below calculates the classification accuracy on the validation set every 100 iterations. The optimization is stopped if the accuracy has not improved for 1,000 iterations. We need a few variables to keep track of this.

# Best validation accuracy seen so far.
best_validation_accuracy = 0.0

# Iteration-number for last improvement to validation accuracy.
last_improvement = 0

# Stop optimization if no improvement found in this many iterations.
require_improvement = 1000

This function performs a number of optimization iterations so as to gradually improve the variables of the network layers. In each iteration, a new batch of data is selected from the training set and TensorFlow executes the optimizer on those training samples. Progress is printed every 100 iterations, the validation accuracy is calculated, and the variables are saved to a file if the accuracy has improved.

# Counter for total number of iterations performed so far.
total_iterations = 0

def optimize(num_iterations):
    # Ensure we update the global variables rather than local copies.
    global total_iterations
    global best_validation_accuracy
    global last_improvement

    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(num_iterations):

        # Increase the total number of iterations performed.
        # It is easier to update it in each iteration because
        # we need this number several times in the following.
        total_iterations += 1

        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(train_batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every 100 iterations and after last iteration.
        if (total_iterations % 100 == 0) or (i == (num_iterations - 1)):
            # Calculate the accuracy on the training-batch.
            acc_train = session.run(accuracy, feed_dict=feed_dict_train)

            # Calculate the accuracy on the validation-set.
            # The function returns 2 values but we only need the first.
            acc_validation, _ = validation_accuracy()

            # If validation accuracy is an improvement over best-known.
            if acc_validation > best_validation_accuracy:
                # Update the best-known validation accuracy.
                best_validation_accuracy = acc_validation

                # Set the iteration for the last improvement to current.
                last_improvement = total_iterations

                # Save all variables of the TensorFlow graph to file.
                saver.save(sess=session, save_path=save_path)

                # A string to be printed below, shows improvement found.
                improved_str = '*'
            else:
                # An empty string to be printed below.
                # Shows that no improvement was found.
                improved_str = ' '

            # Status-message for printing.
            msg = "Iter: {0: > 6}, Train - Batch Accuracy: {1} > 6.1%, the Validation Acc: {2: > 6.1%} {3}"

            # Print it.
            print(msg.format(i + 1, acc_train, acc_validation, improved_str))

        # If no improvement found in the required number of iterations.
        if total_iterations - last_improvement > require_improvement:
            print("No improvement found in a while, stopping optimization.")

            # Break out from the for-loop.
            break

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))Copy the code

Helper function to plot example errors

This function plots examples of images from the test set that have been misclassified.

def plot_example_errors(cls_pred, correct):
    # This function is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # correct is a boolean array whether the predicted class
    # is equal to the true class for each image in the test-set.

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = data.test.images[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = data.test.cls[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])

Helper function to plot the confusion matrix

def plot_confusion_matrix(cls_pred):
    # This is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # Get the true classifications for the test-set.
    cls_true = data.test.cls

    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_true,
                          y_pred=cls_pred)

    # Print the confusion matrix as text.
    print(cm)

    # Plot the confusion matrix as an image.
    plt.matshow(cm)

    # Make various adjustments to the plot.
    plt.colorbar()
    tick_marks = np.arange(num_classes)
    plt.xticks(tick_marks, range(num_classes))
    plt.yticks(tick_marks, range(num_classes))
    plt.xlabel('Predicted')
    plt.ylabel('True')

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Helper function for computing classifications

This function computes the predicted categories of images and returns a Boolean array representing whether each image is correctly classified.

The calculation is done in batches because it might otherwise use too much memory. If your computer crashes, try lowering the batch size.

# Split the data-set in batches of this size to limit RAM usage.
batch_size = 256

def predict_cls(images, labels, cls_true):
    # Number of images.
    num_images = len(images)

    # Allocate an array for the predicted classes which
    # will be calculated in batches and filled into this array.
    cls_pred = np.zeros(shape=num_images, dtype=np.int)

    # Now calculate the predicted classes for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_images:
        # The ending index for the next batch is denoted j.
        j = min(i + batch_size, num_images)

        # Create a feed-dict with the images and labels
        # between index i and j.
        feed_dict = {x: images[i:j, :],
                     y_true: labels[i:j, :]}

        # Calculate the predicted class using TensorFlow.
        cls_pred[i:j] = session.run(y_pred_cls, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    return correct, cls_pred

Calculate the predicted classes for the test set.

def predict_cls_test():
    return predict_cls(images = data.test.images,
                       labels = data.test.labels,
                       cls_true = data.test.cls)

Calculate the predicted classes for the validation set.

def predict_cls_validation():
    return predict_cls(images = data.validation.images,
                       labels = data.validation.labels,
                       cls_true = data.validation.cls)

Helper function for classification accuracy

This function calculates the classification accuracy of a given Boolean array indicating whether each image is correctly classified. For example, cls_accuracy([True, True, False, False, False]) = 2/5 = 0.4.

def cls_accuracy(correct):
    # Calculate the number of correctly classified images.
    # When summing a boolean array, False means 0 and True means 1.
    correct_sum = correct.sum()

    # Classification accuracy is the number of correctly classified
    # images divided by the total number of images in the test-set.
    acc = float(correct_sum) / len(correct)

    return acc, correct_sum
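As a quick check (assuming the function above has been defined), the example from the text can be reproduced by passing a NumPy boolean array:

import numpy as np

correct = np.array([True, True, False, False, False])
acc, num_correct = cls_accuracy(correct)

print(acc, num_correct)   # prints 0.4 2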

Calculate the classification accuracy on the validation set.

def validation_accuracy():
    # Get the array of booleans whether the classifications are correct
    # for the validation-set.
    # The function returns two values but we only need the first.
    correct, _ = predict_cls_validation()

    # Calculate the classification accuracy and return it.
    return cls_accuracy(correct)

Helper function for showing performance

The print_test_accuracy() function prints the classification accuracy on the test set.

It takes a while to compute the classifications for all the images in the test set, so the results are re-used by calling the above helper functions directly from this function, so that the classifications do not have to be recalculated by each function.

def print_test_accuracy(show_example_errors=False, show_confusion_matrix=False):

    # For all the images in the test-set,
    # calculate the predicted classes and whether they are correct.
    correct, cls_pred = predict_cls_test()

    # Classification accuracy and the number of correct classifications.
    acc, num_correct = cls_accuracy(correct)

    # Number of images being classified.
    num_images = len(correct)

    # Print the accuracy.
    msg = "Accuracy on Test-Set: {0:.1%} ({1} / {2})"
    print(msg.format(acc, num_correct, num_images))

    # Plot some examples of mis-classifications, if desired.
    if show_example_errors:
        print("Example errors:")
        plot_example_errors(cls_pred=cls_pred, correct=correct)

    # Plot the confusion matrix, if desired.
    if show_confusion_matrix:
        print("Confusion Matrix:")
        plot_confusion_matrix(cls_pred=cls_pred)

Helper function for plotting convolutional weights

def plot_conv_weights(weights, input_channel=0):
    # Assume weights are TensorFlow ops for 4-dim variables
    # e.g. weights_conv1 or weights_conv2.

    # Retrieve the values of the weight-variables from TensorFlow.
    # A feed-dict is not necessary because nothing is calculated.
    w = session.run(weights)

    # Print mean and standard deviation.
    print("Mean: {0:.5f}, Stdev: {1:.5f}".format(w.mean(), w.std()))

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)

    # Number of filters used in the conv. layer.
    num_filters = w.shape[3]

    # Number of grids to plot.
    # Rounded-up, square-root of the number of filters.
    num_grids = math.ceil(math.sqrt(num_filters))

    # Create figure with a grid of sub-plots.
    fig, axes = plt.subplots(num_grids, num_grids)

    # Plot all the filter-weights.
    for i, ax in enumerate(axes.flat):
        # Only plot the valid filter-weights.
        if i<num_filters:
            # Get the weights for the i'th filter of the input channel.
            # The format of this 4-dim tensor is determined by the
            # TensorFlow API. See Tutorial #02 for more details.
            img = w[:, :, input_channel, i]

            # Plot image.
            ax.imshow(img, vmin=w_min, vmax=w_max,
                      interpolation='nearest', cmap='seismic')

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Performance before any optimization

The accuracy on the test set is very low because the model is only initialized, not optimized, so it just randomly classifies the images.

print_test_accuracy()

Accuracy on test-set: 8.5% (849/10000)

The convolutional weights are random, but it can be difficult to distinguish them from the optimized weights shown below. The mean and standard deviation are also shown, so we can check whether there really is a difference.

plot_conv_weights(weights=weights_conv1)

Mean: 0.00880, Stdev: 0.28635

Performance after 10,000 optimization iterations

We now perform up to 10,000 optimization iterations, and stop the optimization early if no improvement is found on the validation set during 1,000 consecutive iterations.

The asterisk * represents an improvement in classification accuracy on the validation set.

optimize(num_iterations=10000)

Iter:    100, Train-Batch Accuracy:  84.4%, Validation Acc:  85.2%
...
Iter:   4800, Train-Batch Accuracy: 100.0%, Validation Acc:  98.8% *
...
Iter:   5800, Train-Batch Accuracy: 100.0%, Validation Acc:  98.7%
No improvement found in a while, stopping optimization.
Time usage: 0:00:28

print_test_accuracy(show_example_errors=True,
                    show_confusion_matrix=True)

Accuracy on test-set: 98.4% (9842 / 10000)
Example errors:

Confusion Matrix:
(10×10 matrix of classification counts)

The convolutional weights have now been optimized. Compare these with the random weights shown above; they appear almost identical. In fact, at first I thought the program had a bug because the weights looked so similar before and after optimization.

But save the images and compare them side by side (you can right-click to save). You’ll notice a slight difference.

The mean and standard deviation also change a little bit, so the optimized weights must be different.

plot_conv_weights(weights=weights_conv1)

Mean: 0.02895, Stdev: 0.29949

Initialize variables again

Again, initialize all the neural network variables with random values.

init_variables()

This means that the neural network classifies the images completely randomly, with very low accuracy because it is just a random guess.

print_test_accuracy()

Accuracy on test-set: 13.4% (1341/10000)

The convolution weights should look different from the ones above.

plot_conv_weights(weights=weights_conv1)

Mean: 0.01086, Stdev: 0.28023

Restore the best variables

Reload all variables saved to the file during optimization.

saver.restore(sess=session, save_path=save_path)

Using previously saved variables, the classification accuracy was improved again.

Note that the accuracy may be slightly higher or lower than before, because the variables in the file were saved when they maximized the classification accuracy on the validation set, while another 1,000 optimization iterations were performed after that file was saved, so these are two slightly different sets of variables. Sometimes this gives better and sometimes worse performance on the test set.

print_test_accuracy(show_example_errors=True,
                    show_confusion_matrix=True)

Accuracy on test-set: 98.3% (9826 / 10000)
Example errors:

Confusion Matrix:
(10×10 matrix of classification counts)

The convolution weights are also almost identical to those shown previously, and again, they are not exactly the same due to 1000 more optimization iterations.

plot_conv_weights(weights=weights_conv1)

Mean: 0.02792, Stdev: 0.29822

Close the TensorFlow session

We are now done using TensorFlow, so we close the session to release its resources.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

Conclusion

This tutorial showed how to save and restore the variables of a neural network in TensorFlow. This has many uses. For example, if you use a neural network to recognize images, you only need to train the network once and can then deploy the trained network on other computers.

Another use of checkpoints is that if you have a very large neural network and dataset, you may want to save checkpoints at regular intervals in case the computer crashes, so that you can continue the optimization from a recent checkpoint instead of having to restart it from the beginning.
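A rough sketch of that idea, re-using the session, saver, optimizer and save_dir/save_path names defined earlier in this tutorial (this is an illustration, not part of the tutorial's code):

# Sketch only: save a checkpoint at regular intervals during training so the
# optimization can be resumed after a crash. Re-uses names defined above.
for i in range(10000):
    x_batch, y_true_batch = data.train.next_batch(train_batch_size)
    session.run(optimizer, feed_dict={x: x_batch, y_true: y_true_batch})

    if i % 100 == 0:
        # Writes a numbered checkpoint, e.g. checkpoints/best_validation-200
        saver.save(sess=session, save_path=save_path, global_step=i)

# After a restart, continue from the most recent checkpoint (if any).
latest = tf.train.latest_checkpoint(save_dir)
if latest is not None:
    saver.restore(sess=session, save_path=latest)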

This tutorial also showed how to use the validation set for so-called early stopping, where the optimization is terminated if the classification accuracy on the validation set stops improving. This is useful when the neural network starts to overfit and learn the noise of the training set; however, overfitting is not much of a problem for the network and the MNIST dataset used in this tutorial.

It is also interesting to note that the convolutional weights (or filters) changed very little during the optimization, even though the performance of the network went from random guessing to near-perfect classification. It seems strange that the random weights were already good enough. Why do you think this happens?

Exercises

Here are some suggested exercises that might help you improve your TensorFlow skills. In order to learn how to use TensorFlow more appropriately, practical experience is important.

Before you can make changes to this Notebook, you may want to make a backup.

  • The optimization ends after 1000 iterations with no performance improvement. Is that enough? Can you think of a better way to do Early Stopping? Try to make it happen.
  • If a checkpoint file already exists, load it instead of performing the optimization.
  • Save a checkpoint every 100 optimization iterations. Retrieve the latest one using saver.latest_checkpoint(). Why is this more robust?
  • Try changing the neural network, like adding other layers. What happens when you reload variables from different networks?
  • Use the plot_conv_weights() function to plot the weights of the second convolutional layer before and after optimization. Are they almost identical?
  • Why do you think the optimized convolution weights are almost the same as the randomly initialized weights?
  • Rewrite the program yourself without looking at this source code.
  • Explain to a friend how the program works.