02 – Convolutional Neural Network

By Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube. English translation.

If reproduced, please attach a link to this article.


Introduction

The previous tutorial showed that a simple linear model recognizes about 91% of the handwritten digits in the MNIST dataset.

In this tutorial, we will implement a simple convolutional neural network in TensorFlow that achieves a classification accuracy of about 99%, or more if you do some of the suggested exercises.

A convolutional network works by moving small filters across the input image. This means the same filters are reused at every position of the image to recognize patterns. This makes a convolutional network much more powerful than a fully connected network with the same number of variables, and it also makes the convolutional network faster to train.

You should be familiar with basic linear algebra, Python, and the Jupyter Notebook editor. If you are new to TensorFlow, you should study the first tutorial before this one.

Flowchart

The following chart shows roughly how the data flows through the convolutional neural network that is implemented below.

from IPython.display import Image
Image('images/02_network_flowchart.png')

The input image is processed in the first convolutional layer using the filter weights. This results in 16 new images, one for each filter in the convolutional layer. The images are also down-sampled, so the resolution is reduced from 28×28 to 14×14.

These 16 smaller images are then processed in the second convolutional layer. We need filter weights for each of these 16 channels and for each output channel of this layer. There are 36 output channels, so there are 16 x 36 = 576 filters in the second convolutional layer. The resulting images are down-sampled again to 7×7 pixels.

The output of the second convolutional layer is 36 images of 7×7 pixels each. These are flattened into a single vector of length 7 x 7 x 36 = 1764, which is used as the input to a fully connected layer with 128 neurons (or elements). This feeds into another fully connected layer with 10 neurons, one for each class, which is used to determine the class of the image, that is, which digit the image shows.
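
The sizes quoted above can be checked with a few lines of plain Python. This is only a sketch: the function name trace_shapes is made up for illustration and is not part of the tutorial code, and it assumes that each pooling step halves the resolution (rounding up).

```python
def trace_shapes(img_size=28, num_filters1=16, num_filters2=36,
                 fc_size=128, num_classes=10):
    """Return (stage name, per-image output shape) for each stage."""
    shapes = [("input", (img_size, img_size, 1))]

    # 2x2 max-pooling with stride 2 halves the resolution (rounded up).
    size = (img_size + 1) // 2
    shapes.append(("conv1 + pool", (size, size, num_filters1)))

    size = (size + 1) // 2
    shapes.append(("conv2 + pool", (size, size, num_filters2)))

    # Flattening turns each image into a single vector.
    shapes.append(("flatten", (size * size * num_filters2,)))
    shapes.append(("fc1", (fc_size,)))
    shapes.append(("fc2", (num_classes,)))
    return shapes


for name, shape in trace_shapes():
    print("{0:<14}{1}".format(name, shape))
```

The "flatten" entry is the vector of length 1764 mentioned above.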

The convolutional filters are initially chosen at random, so the classification is also random at first. The error between the predicted and the true class of the input image is measured as the cross-entropy. The optimizer then automatically propagates this error back through the convolutional network using the chain rule and updates the filter weights to improve the classification. This is repeated thousands of times until the classification error is sufficiently low.

These particular filter weights and intermediate images are the result of one optimization run and may look different when you re-run the code.

Note that TensorFlow performs these computations on a batch of images rather than on a single image, which makes the computation more efficient. This means the flowchart actually has one more data dimension when implemented in TensorFlow.

Convolution layer

The following image shows the basic idea of processing an image in the first convolutional layer. The input image depicts the digit 7, and four copies of it are shown here so that we can see more clearly how the filter is moved to different positions of the image. At each position of the filter, the dot product of the filter and the image pixels under the filter is calculated, which gives a single pixel of the output image. Moving the filter across the entire input image therefore generates a new image.

The red filter weights mean that the filter reacts positively to black pixels in the input image, while blue weights mean that the filter reacts negatively to black pixels.

In this example, the filter apparently recognizes the horizontal stroke of the digit 7, and its strong response to that stroke can be seen in the output image.

Image('images/02_convolution.png')

The step size with which the filter is moved across the input image is called the stride. There is one stride for the horizontal direction and one for the vertical direction.

In the source code below, the stride is set to 1 in both directions, which means that the filter starts in the upper left corner of the input image and is moved 1 pixel to the right in each step. When the filter reaches the right edge of the image, it is moved back to the left side and 1 pixel down. This continues until the filter has reached the lower right corner of the input image and the entire output image has been generated.

When the filter extends beyond an edge of the input image, for example at the right side or the bottom, the missing pixels are padded with zeros (white pixels), so that the output image has the same dimensions as the input image.
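
To make the sliding filter concrete, here is a minimal NumPy sketch of a single-channel convolution with stride 1 and zero padding. It is not the TensorFlow implementation; the helper name conv2d_same, the 6×6 toy image and the 3×3 filter are assumptions chosen for illustration.

```python
import numpy as np


def conv2d_same(image, kernel):
    """Slide `kernel` over `image` with stride 1 and zero padding."""
    kh, kw = kernel.shape

    # Pad with zeros so the output has the same size as the input.
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    pad_bottom, pad_right = kh - 1 - pad_top, kw - 1 - pad_left
    padded = np.pad(image, ((pad_top, pad_bottom), (pad_left, pad_right)),
                    mode="constant")

    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            # Dot product of the filter and the pixels under it.
            patch = padded[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)
    return out


# Toy image with a horizontal stroke in row 2.
image = np.zeros((6, 6))
image[2, 1:5] = 1.0

# Filter that responds positively to horizontal strokes.
kernel = np.array([[-1.0, -1.0, -1.0],
                   [2.0, 2.0, 2.0],
                   [-1.0, -1.0, -1.0]])

response = conv2d_same(image, kernel)
print(response.shape)
print(response)
```

The strongest response appears in the row that contains the stroke, and the output has the same height and width as the input because of the zero padding.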

In addition, the output of the convolution may be passed through a Rectified Linear Unit (ReLU), which sets negative values to zero so that the output is never negative. The output may also be down-sampled by max-pooling, which considers small windows of 2×2 pixels and keeps only the largest pixel value in each window. This halves the resolution of the image, for example from 28×28 to 14×14 pixels.
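
Both operations are easy to sketch in NumPy. The helper names relu and max_pool_2x2 below are illustrative only, and the pooling sketch assumes the image height and width are even.

```python
import numpy as np


def relu(x):
    """Set negative values to zero."""
    return np.maximum(x, 0)


def max_pool_2x2(x):
    """Keep the largest value of each non-overlapping 2x2 window."""
    h, w = x.shape
    # Split the image into 2x2 blocks and take the maximum of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


x = np.arange(16).reshape(4, 4) - 8   # values from -8 to 7
pooled = max_pool_2x2(relu(x))
print(x)
print(pooled)
```

The 4×4 input becomes a 2×2 output, in the same way that a 28×28 image becomes 14×14.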

The second convolutional layer is more complicated because it takes 16 input channels. We want a separate filter for each input channel, so we need 16 filters for each output channel. We want 36 output channels from the second convolutional layer, so in total we need 16 x 36 = 576 filters. It can be difficult to understand how this works.
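
One way to see how this works is to compute a single output pixel by hand. The NumPy sketch below uses random numbers in place of real activations and weights (an assumption for illustration). It shows that each output channel sums the responses of its 16 two-dimensional filters over all input channels.

```python
import numpy as np

rng = np.random.RandomState(0)
filter_size, num_in, num_out = 5, 16, 36

# Weight tensor laid out as [height, width, input channels, output channels].
weights = rng.normal(scale=0.05,
                     size=(filter_size, filter_size, num_in, num_out))

# Number of separate 2-dimensional filters.
num_2d_filters = weights.shape[2] * weights.shape[3]
print(num_2d_filters)

# One 5x5 patch of a 16-channel input image.
patch = rng.normal(size=(filter_size, filter_size, num_in))

# One output pixel for all 36 output channels at once: sum over
# the filter height, the filter width and the input channels.
out_pixel = np.tensordot(patch, weights, axes=([0, 1, 2], [0, 1, 2]))

# The same value for output channel 0, one input channel at a time.
explicit = sum(np.sum(patch[:, :, c] * weights[:, :, c, 0])
               for c in range(num_in))

print(out_pixel.shape)
print(np.isclose(out_pixel[0], explicit))
```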

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:

tf.__version__

'0.12.0-rc0'

Configuration of the neural network

The configuration of the neural network is defined here for convenience, so you can easily find and change these numbers and re-run the Notebook.

# Convolutional Layer 1.
filter_size1 = 5          # Convolution filters are 5 x 5 pixels.
num_filters1 = 16         # There are 16 of these filters.

# Convolutional Layer 2.
filter_size2 = 5          # Convolution filters are 5 x 5 pixels.
num_filters2 = 36         # There are 36 of these filters.

# Fully-connected layer.
fc_size = 128             # Number of neurons in fully-connected layer.

Load the data

The MNIST dataset is about 12 MB and will be downloaded automatically if it is not found in the given folder.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/MNIST/', one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz

Extracting data/MNIST/train-labels-idx1-ubyte.gz

Extracting data/MNIST/t10k-images-idx3-ubyte.gz

Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST dataset has now been loaded. It consists of 70,000 images with associated labels (i.e. the class of each image). The dataset is split into three mutually exclusive subsets. We will only use the training set and the test set in this tutorial.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))

Size of:

-Training-set: 55000

-Test-set: 10000

-Validation-set: 5000

The class labels are one-hot encoded, which means that each label is a vector of length 10 in which all elements are zero except for one. The index of that element is the class number, that is, the digit shown in the corresponding image. We also need the class numbers as integers for the test set, so we calculate them now.

data.test.cls = np.argmax(data.test.labels, axis=1)
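
The relation between one-hot labels and class numbers can be illustrated on a tiny example. The helper one_hot below is hypothetical and only serves to build sample labels; the tutorial itself gets its labels from the MNIST loader.

```python
import numpy as np


def one_hot(class_numbers, num_classes=10):
    """Build one-hot label vectors from integer class numbers."""
    labels = np.zeros((len(class_numbers), num_classes))
    labels[np.arange(len(class_numbers)), class_numbers] = 1.0
    return labels


labels = one_hot([7, 2, 1])

# argmax along axis 1 recovers the class number of each label.
cls = np.argmax(labels, axis=1)

print(labels)
print(cls)
```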

Data dimensions

The data dimensions are used in several places in the source code below. They are defined once here, so we can use these variables in the code instead of hard-coded numbers.

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper function for plotting images

This function plots 9 images in a 3×3 grid and writes the true and predicted classes below each image.

def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to see if the data is correct

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

TensorFlow graph

The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computation graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.

TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so that the model performs better. This is possible because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain rule for derivatives.

TensorFlow can also take advantage of multi-core CPUs as well as GPUs. Google has even built special chips just for TensorFlow, called Tensor Processing Units (TPUs), which are even faster than GPUs.

A TensorFlow graph consists of the following parts, which are described in detail below:

  • Placeholder variables used to feed input into the graph.
  • Model variables that are going to be optimized so as to make the model perform better.
  • The model, which is essentially just a mathematical function that calculates some output given the input in the placeholder variables and the model variables.
  • A cost measure that can be used to guide the optimization of the variables.
  • An optimization method that updates the variables of the model.

In addition, a TensorFlow graph may contain various debugging statements, e.g. for logging data to be displayed using TensorBoard, which is not covered in this tutorial.

Helper functions for creating new variables

These functions create new TensorFlow variables of the given shape. The weights are initialized with random values and the biases with a small constant. Note that the initialization is not actually performed at this point; it is merely defined in the TensorFlow graph.

def new_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))
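
For readers wondering what a truncated normal distribution looks like, here is a rough NumPy sketch. It assumes the usual definition in which values more than two standard deviations from the mean are discarded and drawn again; the function name and the seed argument are illustrative and this is not TensorFlow's code.

```python
import numpy as np


def truncated_normal(shape, stddev=0.05, seed=0):
    """Draw normal values and re-draw any that exceed 2 standard deviations."""
    rng = np.random.RandomState(seed)
    values = rng.normal(0.0, stddev, size=shape)

    # Re-draw the outliers until none remain.
    mask = np.abs(values) > 2 * stddev
    while mask.any():
        values[mask] = rng.normal(0.0, stddev, size=int(mask.sum()))
        mask = np.abs(values) > 2 * stddev
    return values


# Same shape as the filter weights of the first convolutional layer.
w = truncated_normal((5, 5, 1, 16))
print(w.shape)
print(np.abs(w).max() <= 0.1)
```

The effect is that no initial weight is unusually large, which keeps the first activations of the network in a moderate range.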

Helper function for creating a new convolutional layer

This function creates a new convolutional layer in the TensorFlow computational graph. Nothing is actually calculated here; we are just adding the mathematical formulas to the TensorFlow graph.

It is assumed that the input is a 4-dimensional tensor with the following dimensions:

  1. Number of images
  2. The Y-axis of each image
  3. The X-axis of each image
  4. Number of channels per image

The input channels may either be color channels, or they may be filter channels if the input is produced by a previous convolutional layer.

The output is another 4-dimensional tensor with the following dimensions:

  1. Number of images, same as the input.
  2. The Y-axis of each image. If 2×2 pooling is used, the height and width of the input images are halved.
  3. The X-axis of each image. Same as above.
  4. Number of channels produced by the convolutional filters.

def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of each filter.
                   num_filters,        # Number of filters.
                   use_pooling=True):  # Use 2x2 max-pooling.

    # Shape of the filter-weights for the convolution.
    # This format is determined by the TensorFlow API.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    biases = new_biases(length=num_filters)

    # Create the TensorFlow operation for convolution.
    # Note the strides are set to 1 in all dimensions.
    # The first and last stride must always be 1,
    # because the first is for the image-number and
    # the last is for the input-channel.
    # But e.g. strides=[1, 2, 2, 1] would mean that the filter
    # is moved 2 pixels across the x- and y-axis of the image.
    # The padding is set to 'SAME' which means the input image
    # is padded with zeroes so the size of the output is the same.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')

    # Add the biases to the results of the convolution.
    # A bias-value is added to each filter-channel.
    layer += biases

    # Use pooling to down-sample the image resolution?
    if use_pooling:
        # This is 2x2 max-pooling, which means that we
        # consider 2x2 windows and select the largest value
        # in each window. Then we move 2 pixels to the next window.
        layer = tf.nn.max_pool(value=layer,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME')

    # Rectified Linear Unit (ReLU).
    # It calculates max(x, 0) for each input pixel x.
    # This adds some non-linearity to the formula and allows us
    # to learn more complicated functions.
    layer = tf.nn.relu(layer)

    # Note that ReLU is normally executed before the pooling,
    # but since relu(max_pool(x)) == max_pool(relu(x)) we can
    # save 75% of the relu-operations by max-pooling first.

    # We return both the resulting layer and the filter-weights
    # because we will plot the weights later.
    return layer, weights
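
The claim in the comments that relu(max_pool(x)) == max_pool(relu(x)) can be checked numerically. The following NumPy sketch uses stand-in helpers (relu and max_pool_2x2 are illustrative names, not TensorFlow functions) on a random 28×28 image.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0)


def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling for even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


rng = np.random.RandomState(0)
x = rng.normal(size=(28, 28))

pool_then_relu = relu(max_pool_2x2(x))
relu_then_pool = max_pool_2x2(relu(x))

# Both orders give the same result because ReLU is monotonic.
same = np.array_equal(pool_then_relu, relu_then_pool)

# Pooling first means ReLU is applied to a quarter of the pixels.
saved = 1.0 - pool_then_relu.size / float(x.size)

print(same)
print(saved)
```

Pooling first leaves 14×14 = 196 values instead of 28×28 = 784, which is where the 75% saving comes from.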

Helper function for flattening a layer

A convolutional layer produces an output tensor with 4 dimensions. We will add fully connected layers after the convolutional layers, so we need to reduce the 4-dimensional tensor to a 2-dimensional tensor that can be used as input to the fully connected layer.

def flatten_layer(layer):
    # Get the shape of the input layer.
    layer_shape = layer.get_shape()

    # The shape of the input layer is assumed to be:
    # layer_shape == [num_images, img_height, img_width, num_channels]

    # The number of features is: img_height * img_width * num_channels
    # We can use a function from TensorFlow to calculate this.
    num_features = layer_shape[1:4].num_elements()

    # Reshape the layer to [num_images, num_features].
    # Note that we just set the size of the second dimension
    # to num_features and the size of the first dimension to -1
    # which means the size in that dimension is calculated
    # so the total size of the tensor is unchanged from the reshaping.
    layer_flat = tf.reshape(layer, [-1, num_features])

    # The shape of the flattened layer is now:
    # [num_images, img_height * img_width * num_channels]

    # Return both the flattened layer and the number of features.
    return layer_flat, num_features
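
The same reshaping can be tried in NumPy, whose reshape also accepts -1 for a dimension that should be inferred. The array below is a stand-in for the output of the second convolutional layer for a batch of 2 images; its contents are arbitrary.

```python
import numpy as np

# Stand-in for a batch of 2 images with 7x7 pixels and 36 channels.
layer = np.arange(2 * 7 * 7 * 36).reshape(2, 7, 7, 36)

# Number of features per image: height * width * channels.
num_features = int(np.prod(layer.shape[1:4]))

# -1 lets NumPy infer the number of images.
layer_flat = layer.reshape(-1, num_features)

print(num_features)
print(layer_flat.shape)
```

Each row of layer_flat holds all the values of one image, so the images are not mixed by the reshaping.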

Helper function for creating a new fully connected layer

This function creates a new fully connected layer in the TensorFlow computational graph. Nothing is actually calculated here; we are just adding the mathematical formulas to the TensorFlow graph.

It is assumed that the input is a 2-dimensional tensor of shape [num_images, num_inputs]. The output is a 2-dimensional tensor of shape [num_images, num_outputs].

def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Use Rectified Linear Unit (ReLU)?

    # Create new weights and biases.
    weights = new_weights(shape=[num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)

    # Calculate the layer as the matrix multiplication of
    # the input and weights, and then add the bias-values.
    layer = tf.matmul(input, weights) + biases

    # Use ReLU?
    if use_relu:
        layer = tf.nn.relu(layer)

    return layer
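
The arithmetic of a fully connected layer is just a matrix multiplication plus a bias. The NumPy sketch below shows how the shapes fit together; the function name fc_layer and the random inputs are assumptions for illustration, and it runs immediately instead of building a graph.

```python
import numpy as np


def fc_layer(inputs, weights, biases, use_relu=True):
    """Compute inputs * weights + biases, optionally followed by ReLU."""
    layer = np.dot(inputs, weights) + biases
    if use_relu:
        layer = np.maximum(layer, 0)
    return layer


rng = np.random.RandomState(0)
inputs = rng.normal(size=(4, 1764))             # [num_images, num_inputs]
weights = rng.normal(scale=0.05, size=(1764, 128))
biases = np.full(128, 0.05)

outputs = fc_layer(inputs, weights, biases)     # [num_images, num_outputs]
print(outputs.shape)
```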

Placeholder variables

Placeholder variables serve as the input to the TensorFlow computational graph, and we may change them each time we execute the graph. We call this feeding the placeholder variables, and it is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. It is a so-called tensor, which just means a multi-dimensional array. The data type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')

The convolutional layers expect x to be encoded as a 4-dimensional tensor, so we have to reshape it to [num_images, img_height, img_width, num_channels]. Note that img_height == img_width == img_size, and that num_images is inferred automatically when the size of the first dimension is set to -1. The reshape operation is:

x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])

Next we define the placeholder variable for the true labels associated with the images that are input in the placeholder variable x. The shape of this placeholder variable is [None, num_classes], which means it may hold an arbitrary number of labels, each of which is a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, shape=[None, 10], name='y_true')

We could also have a placeholder variable for the class number, but we will instead calculate it using argmax. Note that this is a TensorFlow operator, so nothing is calculated at this point.

y_true_cls = tf.argmax(y_true, dimension=1)

Convolution layer 1

Create the first convolutional layer. It takes x_image as input and creates num_filters1 different filters, each having width and height equal to filter_size1. Finally, we down-sample the image with 2×2 max-pooling so that it is half the size.

layer_conv1, weights_conv1 = \
    new_conv_layer(input=x_image,
                   num_input_channels=num_channels,
                   filter_size=filter_size1,
                   num_filters=num_filters1,
                   use_pooling=True)

Check the shape of the tensor that is output by the convolutional layer. It is (?, 14, 14, 16), which means that there is an arbitrary number of images (this is the ?), each image is 14 pixels wide and 14 pixels high, and there are 16 different channels, one for each of the filters.

layer_conv1

Convolution layer 2

Create a second convolution layer that takes the output of the first convolution layer as input. The number of input channels corresponds to the number of filters in the first convolution layer.

layer_conv2, weights_conv2 = \
    new_conv_layer(input=layer_conv1,
                   num_input_channels=num_filters1,
                   filter_size=filter_size2,
                   num_filters=num_filters2,
                   use_pooling=True)

Check the shape of the tensor that is output by this convolutional layer. It is (?, 7, 7, 36), where the ? again means that there is an arbitrary number of images, each image is 7 pixels wide and 7 pixels high, and there are 36 channels, one for each filter.

layer_conv2

Flatten layer

The convolutional layers output 4-dimensional tensors. We now want to use these as input to a fully connected network, which requires the tensors to be reshaped or flattened to 2-dimensional tensors.

layer_flat, num_features = flatten_layer(layer_conv2)

Check that the tensor now has shape (?, 1764), which means that there is an arbitrary number of images, each of which has been flattened to a vector of length 1764. Note that 1764 = 7 x 7 x 36.

layer_flat

num_features

1764

Fully connected layer 1

Add a fully connected layer to the network. The input is the flattened layer from the previous convolution. The number of neurons or nodes in the fully connected layer is fc_size. ReLU is used so that we can learn non-linear relations.

layer_fc1 = new_fc_layer(input=layer_flat,
                         num_inputs=num_features,
                         num_outputs=fc_size,
                         use_relu=True)

Check that the output of the fully connected layer is a tensor with shape (?, 128), where the ? means that there is an arbitrary number of images and fc_size == 128.

layer_fc1

Fully connected layer 2

Add another fully connected layer that outputs vectors of length 10, which are used to determine which of the 10 classes the input image belongs to. Note that ReLU is not used in this layer.

layer_fc2 = new_fc_layer(input=layer_fc1,
                         num_inputs=fc_size,
                         num_outputs=num_classes,
                         use_relu=False)

layer_fc2

Predicted class

The second fully connected layer estimates how likely it is that the input image belongs to each of the 10 classes. However, these estimates are rough and difficult to interpret because the numbers may be very small or very large, so we normalize them so that each element lies between zero and one and the 10 elements sum to one. This is calculated using the so-called softmax function, and the result is stored in y_pred.

y_pred = tf.nn.softmax(layer_fc2)

The category number is the index of the largest element.

y_pred_cls = tf.argmax(y_pred, dimension=1)
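
For intuition, the softmax normalization and the argmax can be written in a few lines of NumPy. This is a sketch of the standard formula with made-up logits for three classes, not TensorFlow's implementation.

```python
import numpy as np


def softmax(logits):
    """Normalize each row so its elements lie in [0, 1] and sum to 1."""
    # Subtracting the row maximum avoids overflow in exp()
    # and does not change the result.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Rough, unnormalized estimates for 2 images and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [-3.0, 0.0, 5.0]])

probs = softmax(logits)
cls = probs.argmax(axis=1)   # index of the largest element per image

print(probs)
print(cls)
```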

Cost function to be optimized

To make the model better at classifying the input images, we must somehow change the variables of all the network layers. To do this, we first need to know how well the model currently performs by comparing the predicted output of the model, y_pred, to the desired output, y_true.

The cross-entropy is a performance measure used in classification. The cross-entropy is a continuous function that is never negative, and it equals zero if the predicted output of the model exactly matches the desired output. The goal of optimization is therefore to minimize the cross-entropy, so that it gets as close to zero as possible, by changing the variables of the network layers.

TensorFlow has a built-in function for calculating the cross-entropy. Note that this function calculates the softmax internally, so we must use the output of layer_fc2 directly rather than y_pred, which has already had the softmax applied.

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_true)

We have now calculated the cross-entropy for each of the image classifications, so we have a measure of how well the model performs on each image individually. But in order to use the cross-entropy to guide the optimization of the model's variables, we need a single scalar value, so we simply take the average of the cross-entropy over all the image classifications.

cost = tf.reduce_mean(cross_entropy)
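
The following NumPy sketch shows what such a cost looks like for two made-up examples with three classes. It follows the textbook definition of softmax cross-entropy computed from logits; the function name and the numbers are illustrative assumptions, not the TensorFlow source.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Cross-entropy per image, with the softmax computed internally."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log(softmax(x)) written in a numerically stable form.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


# The true class is 2 for both images.
labels = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]])

# First prediction is confident and right, second is confident and wrong.
logits = np.array([[0.0, 0.0, 20.0],
                   [20.0, 0.0, 0.0]])

per_image = softmax_cross_entropy(logits, labels)
cost = per_image.mean()   # single scalar used to guide the optimizer

print(per_image)
print(cost)
```

The correct prediction gives a cross-entropy very close to zero, while the wrong one gives a large value, so minimizing the average pushes the model towards correct classifications.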

Optimization method

Now that we have a cost measure that must be minimized, we can create an optimizer. In this case it is the AdamOptimizer, which is an advanced form of gradient descent.

Note that the optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer object to the TensorFlow graph for later execution.

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
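
To give an idea of what Adam does in each step, here is a small NumPy sketch of the published Adam update rule applied to a simple quadratic function. The function adam_minimize, the test function and the larger learning rate of 0.1 are assumptions made for this illustration; the tutorial itself relies on TensorFlow's AdamOptimizer.

```python
import numpy as np


def adam_minimize(grad_fn, x0, learning_rate=1e-4, beta1=0.9,
                  beta2=0.999, epsilon=1e-8, num_steps=1000):
    """Minimize a function given its gradient, using Adam updates."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)   # running average of the gradient
    v = np.zeros_like(x)   # running average of the squared gradient

    for t in range(1, num_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2

        # Correct the bias caused by starting the averages at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)

        x -= learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return x


# Minimize f(x) = sum((x - 3)^2), whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=[0.0, 10.0],
                      learning_rate=0.1, num_steps=500)
print(x_min)
```

Both coordinates end up close to 3, the minimum of the test function. Each coordinate gets its own step size because the update is divided by the running average of its squared gradient.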

Performance measures

We need a few more performance measures to display the progress to the user.

This is a vector of booleans that tells whether the predicted class equals the true class of each image.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

The following calculates the classification accuracy by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
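
The same calculation on a handful of made-up class numbers, written in NumPy:

```python
import numpy as np

y_pred_cls = np.array([7, 2, 1, 0, 4])   # hypothetical predictions
y_true_cls = np.array([7, 2, 1, 0, 9])   # hypothetical true classes

# Boolean vector: is each prediction correct?
correct_prediction = (y_pred_cls == y_true_cls)

# False becomes 0.0 and True becomes 1.0, so the mean is the accuracy.
accuracy = correct_prediction.astype(np.float32).mean()

print(correct_prediction)
print(accuracy)
```

Four of the five predictions are correct, so the accuracy is 4 / 5.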

Run TensorFlow

Create a TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph.

session = tf.Session()

Initialize variables

The variables for the weights and biases must be initialized before we start optimizing them.

session.run(tf.global_variables_initializer())

Helper function to perform optimization iterations

There are 55,000 images in the training set. It takes a long time to calculate the gradient of the model using all these images. We therefore use stochastic gradient descent, which only uses a small batch of images in each iteration of the optimizer.

If your computer crashes or becomes very slow because it runs out of memory, you may try lowering this number, but you may then need to perform more optimization iterations.

train_batch_size = 64

This function performs a number of optimization iterations so as to gradually improve the variables of the network layers. In each iteration, a new batch of data is selected from the training set and then TensorFlow executes the optimizer using those training samples. The progress is printed every 100 iterations.

# Counter for total number of iterations performed so far.
total_iterations = 0

def optimize(num_iterations):
    # Ensure we update the global variable rather than a local copy.
    global total_iterations

    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(total_iterations,
                   total_iterations + num_iterations):

        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(train_batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every 100 iterations.
        if i % 100 == 0:
            # Calculate the accuracy on the training-set.
            acc = session.run(accuracy, feed_dict=feed_dict_train)

            # Message for printing.
            msg = "Optimization Iteration: {0:>6}, Training Accuracy: {1:>6.1%}"

            # Print it.
            print(msg.format(i + 1, acc))

    # Update the total number of iterations performed.
    total_iterations += num_iterations

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))
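
The idea of drawing small batches from the training set can be sketched as follows. The class BatchSampler is a hypothetical stand-in written for this illustration; it is not how data.train.next_batch() is implemented, and the tiny arrays only make the behaviour easy to inspect.

```python
import numpy as np


class BatchSampler:
    """Hypothetical stand-in for data.train.next_batch()."""

    def __init__(self, images, labels, seed=0):
        self.images = images
        self.labels = labels
        self.rng = np.random.RandomState(seed)
        self.order = self.rng.permutation(len(images))
        self.pos = 0

    def next_batch(self, batch_size):
        # Reshuffle when the remaining samples cannot fill a batch.
        if self.pos + batch_size > len(self.order):
            self.order = self.rng.permutation(len(self.images))
            self.pos = 0
        idx = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return self.images[idx], self.labels[idx]


# 10 fake "images" with 4 pixels each; image i starts with the value 4 * i.
images = np.arange(10 * 4, dtype=float).reshape(10, 4)
labels = np.arange(10)

sampler = BatchSampler(images, labels)
x_batch, y_batch = sampler.next_batch(4)
print(x_batch.shape, y_batch)
```

Each call returns a different random subset, and the images stay paired with their labels.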

Helper function to plot example errors

This function plots examples of images from the test set that have been misclassified.

def plot_example_errors(cls_pred, correct):
    # This function is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # correct is a boolean array whether the predicted class
    # is equal to the true class for each image in the test-set.

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = data.test.images[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = data.test.cls[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])

Helper function to plot the confusion matrix

def plot_confusion_matrix(cls_pred):
    # This is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # Get the true classifications for the test-set.
    cls_true = data.test.cls

    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_true,
                          y_pred=cls_pred)

    # Print the confusion matrix as text.
    print(cm)

    # Plot the confusion matrix as an image.
    plt.matshow(cm)

    # Make various adjustments to the plot.
    plt.colorbar()
    tick_marks = np.arange(num_classes)
    plt.xticks(tick_marks, range(num_classes))
    plt.yticks(tick_marks, range(num_classes))
    plt.xlabel('Predicted')
    plt.ylabel('True')

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
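
If it is unclear what the confusion matrix contains, the following NumPy sketch builds one by hand for three classes. The function confusion_matrix_np and the sample class numbers are made up for illustration; rows are the true classes and columns are the predicted classes, matching the axis labels in the plot above.

```python
import numpy as np


def confusion_matrix_np(cls_true, cls_pred, num_classes):
    """Count how often each true class is predicted as each class."""
    cls_true = np.asarray(cls_true)
    cls_pred = np.asarray(cls_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 to cell [true, predicted] for every sample.
    np.add.at(cm, (cls_true, cls_pred), 1)
    return cm


cls_true = [0, 1, 2, 2, 1]
cls_pred = [0, 2, 2, 2, 1]

cm = confusion_matrix_np(cls_true, cls_pred, num_classes=3)
print(cm)
```

The diagonal holds the correctly classified samples, so its sum divided by the number of samples is the accuracy.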

Helper function for showing the performance

This function prints the classification accuracy on the test set.

It takes a while to compute the classification for all the images in the test set, which is why the results are re-used by calling the above functions directly from this function, so the classifications do not have to be recalculated by each function.

Note that this function can use a lot of computer memory, which is why the test set is split into smaller batches. If your computer has little RAM and crashes, you can try lowering the batch size.

# Split the test-set into smaller batches of this size.
test_batch_size = 256

def print_test_accuracy(show_example_errors=False, show_confusion_matrix=False):

    # Number of images in the test-set.
    num_test = len(data.test.images)

    # Allocate an array for the predicted classes which
    # will be calculated in batches and filled into this array.
    cls_pred = np.zeros(shape=num_test, dtype=int)

    # Now calculate the predicted classes for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_test:
        # The ending index for the next batch is denoted j.
        j = min(i + test_batch_size, num_test)

        # Get the images from the test-set between index i and j.
        images = data.test.images[i:j, :]

        # Get the associated labels.
        labels = data.test.labels[i:j, :]

        # Create a feed-dict with these images and labels.
        feed_dict = {x: images,
                     y_true: labels}

        # Calculate the predicted class using TensorFlow.
        cls_pred[i:j] = session.run(y_pred_cls, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    # Convenience variable for the true class-numbers of the test-set.
    cls_true = data.test.cls

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    # Calculate the number of correctly classified images.
    # When summing a boolean array, False means 0 and True means 1.
    correct_sum = correct.sum()

    # Classification accuracy is the number of correctly classified
    # images divided by the total number of images in the test-set.
    acc = float(correct_sum) / num_test

    # Print the accuracy.
    msg = "Accuracy on Test-Set: {0:.1%} ({1} / {2})"
    print(msg.format(acc, correct_sum, num_test))

    # Plot some examples of mis-classifications, if desired.
    if show_example_errors:
        print("Example errors:")
        plot_example_errors(cls_pred=cls_pred, correct=correct)

    # Plot the confusion matrix, if desired.
    if show_confusion_matrix:
        print("Confusion Matrix:")
        plot_confusion_matrix(cls_pred=cls_pred)
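
A comment in the loop above wonders whether there is a more Pythonic way to walk through the batches. One possibility is sketched here with plain NumPy, using range() with a step instead of a while-loop. The names predict_in_batches and dummy_predict are hypothetical; dummy_predict only stands in for session.run(y_pred_cls, ...) so that the sketch runs without TensorFlow.

```python
import numpy as np


def predict_in_batches(predict_fn, images, batch_size=256):
    """Apply predict_fn to consecutive slices of images and join the results."""
    num_images = len(images)
    cls_pred = np.zeros(shape=num_images, dtype=int)

    # range() with a step yields the start index of every batch.
    for i in range(0, num_images, batch_size):
        # The last batch may be smaller than batch_size.
        j = min(i + batch_size, num_images)
        cls_pred[i:j] = predict_fn(images[i:j])

    return cls_pred


def dummy_predict(batch):
    # Hypothetical stand-in for the TensorFlow call: it "predicts"
    # the position of the brightest pixel modulo 10.
    return np.argmax(batch, axis=1) % 10


rng = np.random.default_rng(0)
fake_images = rng.random((1000, 784))  # synthetic data, not MNIST

batched = predict_in_batches(dummy_predict, fake_images, batch_size=256)
all_at_once = dummy_predict(fake_images)

# Batching only limits memory use; the predictions are identical.
print(np.array_equal(batched, all_at_once))
```

Only one batch is held in memory by the prediction call at any time, so a smaller batch_size trades speed for a lower peak memory use.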

Performance before any optimization

The accuracy on the test set is very low because the model is only initialized, not optimized, so it just randomly classifies the images.

print_test_accuracy()

Accuracy on test-set: 10.9% (1093/10000)
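
An accuracy close to 10% is what uniform random guessing among 10 classes would be expected to give. The sketch below illustrates this with synthetic labels rather than the MNIST data, so the labels, the seed and the exact figure it prints are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10
num_test = 10000

# Synthetic "true" classes and uniformly random guesses.
cls_true = rng.integers(0, num_classes, size=num_test)
cls_guess = rng.integers(0, num_classes, size=num_test)

# Averaging a boolean array gives the fraction of True values.
acc = (cls_true == cls_guess).mean()
print("Accuracy of random guessing: {0:.1%}".format(acc))
```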

Performance after one iteration

The classification accuracy does not improve much after a single optimization iteration, because the learning rate of the optimizer is set very low.

optimize(num_iterations=1)

Optimization Iteration: 1, Training Accuracy: 6.2%

Time usage: 0:00:00

print_test_accuracy()

Accuracy on test-set: 13.0% (1296/10000)

Performance after 100 optimization iterations

After 100 optimization iterations, the classification accuracy has improved significantly.

optimize(num_iterations=99) # We already performed 1 iteration above.

Time usage: 0:00:00

print_test_accuracy(show_example_errors=True)

Accuracy on test-set: 66.6% (6656/10000)

Example errors:

Performance after 1000 optimization iterations

After 1000 optimization iterations, the model is more than 90% accurate on the test set.

optimize(num_iterations=900) # We performed 100 iterations above.

Optimization Iteration: 101, Training Accuracy: 71.9%
Optimization Iteration: 201, Training Accuracy:
Optimization Iteration: 301, Training Accuracy: 71.9%
Optimization Iteration: 401, Training Accuracy:
Optimization Iteration: 501, Training Accuracy: 89.1%
Optimization Iteration: 601, Training Accuracy: 95.3%
Optimization Iteration: 701, Training Accuracy: 90.6%
Optimization Iteration: 801, Training Accuracy: 92.2%
Optimization Iteration: 901, Training Accuracy: 95.3%
Time usage: 0:00:03

print_test_accuracy(show_example_errors=True)

Accuracy on test-set: 93.1% (9308/10000)

Example errors:

Performance after 10,000 optimization iterations

After 10,000 optimization iterations, the classification accuracy on the test set is about 99%.

optimize(num_iterations=9000) # We performed 1000 iterations above.

Optimization Iteration: 1001, Training Accuracy: 98.4%
Optimization Iteration: 1101, Training Accuracy: 93.8%
Optimization Iteration: 1201, Training Accuracy: 92.2%
Optimization Iteration: 1301, Training Accuracy: 95.3%
Optimization Iteration: 1401, Training Accuracy: 93.8%
Optimization Iteration: 1501, Training Accuracy:
Optimization Iteration: 1601, Training Accuracy: 93.8%
Optimization Iteration: 1701, Training Accuracy: 92.2%
Optimization Iteration: 1801, Training Accuracy: 89.1%
Optimization Iteration: 1901, Training Accuracy: 95.3%
Optimization Iteration: 2001, Training Accuracy: 93.8%
Optimization Iteration: 2101, Training Accuracy: 98.4%
Optimization Iteration: 2201, Training Accuracy: 92.2%
Optimization Iteration: 2301, Training Accuracy: 95.3%
Optimization Iteration: 2401, Training Accuracy: 100.0%
Optimization Iteration: 2501, Training Accuracy: 96.9%
Optimization Iteration: 2601, Training Accuracy: 93.8%
Optimization Iteration: 2701, Training Accuracy:
Optimization Iteration: 2801, Training Accuracy: 95.3%
Optimization Iteration: 2901, Training Accuracy: 95.3%
Optimization Iteration: 3001, Training Accuracy: 96.9%
Optimization Iteration: 3101, Training Accuracy: 96.9%
Optimization Iteration: 3201, Training Accuracy: 95.3%
Optimization Iteration: 3301, Training Accuracy: 96.9%
Optimization Iteration: 3401, Training Accuracy: 98.4%
Optimization Iteration: 3501, Training Accuracy:
Optimization Iteration: 3601, Training Accuracy: 98.4%
Optimization Iteration: 3701, Training Accuracy: 95.3%
Optimization Iteration: 3801, Training Accuracy: 95.3%
Optimization Iteration: 3901, Training Accuracy: 95.3%
Optimization Iteration: 4001, Training Accuracy: 100.0%
Optimization Iteration: 4101, Training Accuracy: 93.8%
Optimization Iteration: 4201, Training Accuracy: 95.3%
Optimization Iteration: 4301, Training Accuracy:
Optimization Iteration: 4401, Training Accuracy: 96.9%
Optimization Iteration: 4501, Training Accuracy: 100.0%
Optimization Iteration: 4601, Training Accuracy: 100.0%
Optimization Iteration: 4701, Training Accuracy:
Optimization Iteration: 4801, Training Accuracy: 98.4%
Optimization Iteration: 4901, Training Accuracy: 98.4%
Optimization Iteration: 5001, Training Accuracy: 98.4%
Optimization Iteration: 5101, Training Accuracy:
Optimization Iteration: 5201, Training Accuracy: 95.3%
Optimization Iteration: 5301, Training Accuracy: 96.9%
Optimization Iteration: 5401, Training Accuracy: 100.0%
Optimization Iteration: 5501, Training Accuracy: 100.0%
Optimization Iteration: 5601, Training Accuracy: 100.0%
Optimization Iteration: 5701, Training Accuracy: 96.9%
Optimization Iteration: 5801, Training Accuracy: 98.4%
Optimization Iteration: 5901, Training Accuracy:
Optimization Iteration: 6001, Training Accuracy: 95.3%
Optimization Iteration: 6101, Training Accuracy: 96.9%
Optimization Iteration: 6201, Training Accuracy: 100.0%
Optimization Iteration: 6301, Training Accuracy: 96.9%
Optimization Iteration: 6401, Training Accuracy: 100.0%
Optimization Iteration: 6501, Training Accuracy: 98.4%
Optimization Iteration: 6601, Training Accuracy: 98.4%
Optimization Iteration: 6701, Training Accuracy: 95.3%
Optimization Iteration: 6801, Training Accuracy: 100.0%
Optimization Iteration: 6901, Training Accuracy: 98.4%
Optimization Iteration: 7001, Training Accuracy: 95.3%
Optimization Iteration: 7101, Training Accuracy: 100.0%
Optimization Iteration: 7201, Training Accuracy: 100.0%
Optimization Iteration: 7301, Training Accuracy: 100.0%
Optimization Iteration: 7401, Training Accuracy: 100.0%
Optimization Iteration: 7501, Training Accuracy:
Optimization Iteration: 7601, Training Accuracy: 96.9%
Optimization Iteration: 7701, Training Accuracy: 98.4%
Optimization Iteration: 7801, Training Accuracy: 95.3%
Optimization Iteration: 7901, Training Accuracy: 100.0%
Optimization Iteration: 8001, Training Accuracy: 100.0%
Optimization Iteration: 8101, Training Accuracy: 98.4%
Optimization Iteration: 8201, Training Accuracy: 98.4%
Optimization Iteration: 8301, Training Accuracy:
Optimization Iteration: 8401, Training Accuracy: 96.9%
Optimization Iteration: 8501, Training Accuracy: 98.4%
Optimization Iteration: 8601, Training Accuracy: 98.4%
Optimization Iteration: 8701, Training Accuracy: 100.0%
Optimization Iteration: 8801, Training Accuracy: 100.0%
Optimization Iteration: 8901, Training Accuracy: 98.4%
Optimization Iteration: 9001, Training Accuracy: 95.3%
Optimization Iteration: 9101, Training Accuracy: 100.0%
Optimization Iteration: 9201, Training Accuracy: 100.0%
Optimization Iteration: 9301, Training Accuracy: 96.9%
Optimization Iteration: 9401, Training Accuracy:
Optimization Iteration: 9501, Training Accuracy: 98.4%
Optimization Iteration: 9601, Training Accuracy: 100.0%
Optimization Iteration: 9701, Training Accuracy: 96.9%
Optimization Iteration: 9801, Training Accuracy: 98.4%
Optimization Iteration: 9901, Training Accuracy: 98.4%
Time usage: 0:00:26

print_test_accuracy(show_example_errors=True,
                    show_confusion_matrix=True)

Accuracy on test-set: 98.8% (9880/10000)

Example errors:

Confusion Matrix:

[[973 0 100 11 0 3 1]
 [0 1129 2 100 11 1002 1002 2 002 2 00 02 2 00 00 00]
 [10 1 1002 0 3 0 12 0]
 [0 10 974 0 10 24]
 [2 00 3 0 882 2 0 1 2]
 [4 100 1 4 948 00 0]
 [1 4 11 2 00 1004 2 4]
 [3 04 2 1 2 0 960 2]
 [3 4 100 7 5 0 2 2 985]]
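
To make the meaning of such a matrix concrete, here is a minimal NumPy sketch of how a confusion matrix can be counted. It is not the plot_confusion_matrix function used above; the function name confusion_matrix, the tiny 3-class example and the convention that rows are true classes and columns are predicted classes are all assumptions for illustration.

```python
import numpy as np


def confusion_matrix(cls_true, cls_pred, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)

    # np.add.at accumulates correctly even when the same
    # (true, predicted) pair occurs several times.
    np.add.at(cm, (cls_true, cls_pred), 1)
    return cm


# Tiny made-up example with 3 classes and 7 images.
cls_true = np.array([0, 0, 1, 1, 2, 2, 2])
cls_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(cls_true, cls_pred, num_classes=3)
print(cm)

# The diagonal holds the correctly classified images.
print(np.trace(cm), "of", cm.sum(), "correct")
```

Under this convention each row sums to the number of test images of that class, and every off-diagonal entry is a specific kind of mistake, for example a 2 that was classified as a 0.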

Visualization of weights and layers

To understand why convolutional neural networks can recognize handwritten digits, we will now visualize the weights of the convolutional filters and the output images they produce.

Helper function for plotting convolutional weights

def plot_conv_weights(weights, input_channel=0):
    # Assume weights are TensorFlow ops for 4-dim variables
    # e.g. weights_conv1 or weights_conv2.

    # Retrieve the values of the weight-variables from TensorFlow.
    # A feed-dict is not necessary because nothing is calculated.
    w = session.run(weights)

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)

    # Number of filters used in the conv. layer.
    num_filters = w.shape[3]

    # Number of grids to plot.
    # Rounded-up, square-root of the number of filters.
    num_grids = math.ceil(math.sqrt(num_filters))

    # Create figure with a grid of sub-plots.
    fig, axes = plt.subplots(num_grids, num_grids)

    # Plot all the filter-weights.
    for i, ax in enumerate(axes.flat):
        # Only plot the valid filter-weights.
        if i<num_filters:
            # Get the weights for the i'th filter of the input channel.
            # See new_conv_layer() for details on the format
            # of this 4-dim tensor.
            img = w[:, :, input_channel, i]

            # Plot image.
            ax.imshow(img, vmin=w_min, vmax=w_max,
                      interpolation='nearest', cmap='seismic')

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
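
The indexing used in plot_conv_weights can be tried out without TensorFlow. The sketch below assumes the weight layout [filter_height, filter_width, num_input_channels, num_filters], which is consistent with w.shape[3] and w[:, :, input_channel, i] above; the 5 x 5 filter size and the random values are illustrative assumptions only.

```python
import math

import numpy as np

# Random stand-in for the values returned by session.run(weights).
# Assumed layout: [filter_height, filter_width, num_input_channels, num_filters].
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 5, 16, 36))

# The last dimension counts the filters (output channels).
num_filters = w.shape[3]

# Smallest square grid that has room for all the filters.
num_grids = math.ceil(math.sqrt(num_filters))

# 2-D slice: the weights of filter 7 that act on input channel 0.
img = w[:, :, 0, 7]

print(num_filters, num_grids, img.shape)  # 36 6 (5, 5)
```

When the number of filters is not a perfect square, the grid has more cells than filters, which is why the plotting loop checks i < num_filters before drawing.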

Helper function for plotting the output of a convolutional layer

def plot_conv_layer(layer, image):
    # Assume layer is a TensorFlow op that outputs a 4-dim tensor
    # which is the output of a convolutional layer,
    # e.g. layer_conv1 or layer_conv2.

    # Create a feed-dict containing just one image.
    # Note that we don't need to feed y_true because it is
    # not used in this calculation.
    feed_dict = {x: [image]}

    # Calculate and retrieve the output values of the layer
    # when inputting that image.
    values = session.run(layer, feed_dict=feed_dict)

    # Number of filters used in the conv. layer.
    num_filters = values.shape[3]

    # Number of grids to plot.
    # Rounded-up, square-root of the number of filters.
    num_grids = math.ceil(math.sqrt(num_filters))

    # Create figure with a grid of sub-plots.
    fig, axes = plt.subplots(num_grids, num_grids)

    # Plot the output images of all the filters.
    for i, ax in enumerate(axes.flat):
        # Only plot the images for valid filters.
        if i<num_filters:
            # Get the output image of using the i'th filter.
            # See new_conv_layer() for details on the format
            # of this 4-dim tensor.
            img = values[0, :, :, i]

            # Plot image.
            ax.imshow(img, interpolation='nearest', cmap='binary')

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Input images

Helper function for plotting an image

def plot_image(image):
    plt.imshow(image.reshape(img_shape),
               interpolation='nearest',
               cmap='binary')

    plt.show()

Plot an image from the test set, which will be used as an example below.

image1 = data.test.images[0]
plot_image(image1)

Plot another image from the test set.

image2 = data.test.images[13]
plot_image(image2)

Convolution layer 1

Now plot the filter weights for the first convolutional layer.

Positive weights are shown in red and negative weights in blue.

plot_conv_weights(weights=weights_conv1)

Applying each of these convolutional filters to the first input image gives the following output images, which are then used as input to the second convolutional layer. Note that these images are down-sampled to 14 x 14 pixels, which is half the resolution of the original input image.

plot_conv_layer(layer=layer_conv1, image=image1)
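
The halving of the resolution can be reproduced in a few lines of NumPy. This is only a sketch, assuming the down-sampling is the 2 x 2 max-pooling referred to in the exercises at the end; the helper name max_pool_2x2 is hypothetical and the input is a made-up image, not an MNIST digit.

```python
import numpy as np


def max_pool_2x2(image):
    """Down-sample a 2-D image by keeping the largest value of each 2 x 2 block."""
    h, w = image.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"

    # After the reshape, blocks[a, i, b, j] == image[2*a + i, 2*b + j],
    # so axes 1 and 3 run over the pixels inside one 2 x 2 block.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Made-up 28 x 28 image whose pixel values are 0, 1, 2, ...
image = np.arange(28 * 28, dtype=float).reshape(28, 28)

pooled = max_pool_2x2(image)
print(image.shape, "->", pooled.shape)  # (28, 28) -> (14, 14)

# Pooling a second time gives the 7 x 7 resolution of the second layer.
print(max_pool_2x2(pooled).shape)       # (7, 7)
```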

Here is the result of applying the convolutional filters to the second image.

plot_conv_layer(layer=layer_conv1, image=image2)

It is hard to tell from these images what the purpose of the convolutional filters is. They appear merely to have created several variations of the input image, as if light were shining on it from different angles and casting shadows.

Convolution layer 2

Now plot the filter weights for the second convolutional layer.

The first convolutional layer has 16 output channels, so the second convolutional layer has 16 input channels. The second convolutional layer has a set of filter weights for each of its input channels. We start by plotting the filter weights for the first channel.

Again, positive weights are red and negative weights are blue.

plot_conv_weights(weights=weights_conv2, input_channel=0)

There are 16 input channels to the second convolutional layer, so we could make another 15 plots like this one. We make just one more, showing the filter weights for the second channel.

plot_conv_weights(weights=weights_conv2, input_channel=1)

Because of the high dimensionality of these filters, it can be difficult to understand and keep track of how they are applied.
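
One way to keep track of the dimensions is to write the operation out by hand. The following NumPy sketch is a naive convolution (strictly a cross-correlation, as is usual in deep learning libraries) without padding and with stride 1; the function name conv2d_valid, the 5 x 5 filter size, the lack of padding and the random data are all assumptions for illustration and not necessarily what the network above uses.

```python
import numpy as np


def conv2d_valid(inputs, weights):
    """Naive multi-channel 2-D convolution, no padding, stride 1.

    inputs:  [height, width, num_input_channels]
    weights: [filter_height, filter_width, num_input_channels, num_filters]
    returns: [out_height, out_width, num_filters]
    """
    h, w, c = inputs.shape
    fh, fw, wc, k = weights.shape
    assert c == wc, "input channels of image and filters must match"

    out = np.zeros((h - fh + 1, w - fw + 1, k))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = inputs[y:y + fh, x:x + fw, :]
            # Multiply the patch with every filter and sum over
            # filter height, filter width and input channel.
            out[y, x, :] = np.tensordot(patch, weights, axes=3)
    return out


rng = np.random.default_rng(0)
inputs = rng.normal(size=(14, 14, 16))     # 16 channels from layer 1
weights = rng.normal(size=(5, 5, 16, 36))  # 16 x 36 two-dimensional filters

out = conv2d_valid(inputs, weights)
print(out.shape)  # (10, 10, 36)

# Output channel 0 is the sum of 16 single-channel convolutions,
# one for each input channel.
per_channel = sum(
    conv2d_valid(inputs[:, :, ch:ch + 1], weights[:, :, ch:ch + 1, 0:1])[:, :, 0]
    for ch in range(16)
)
print(np.allclose(out[:, :, 0], per_channel))  # True
```

Each plot produced by plot_conv_weights therefore shows, for one input channel, the 2-D slices that contribute to each of the output channels.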

Applying these convolutional filters to the images output by the first convolutional layer gives the following images.

These images are down-sampled to 7 x 7 pixels, which is half the resolution of the images output by the first convolutional layer.

plot_conv_layer(layer=layer_conv2, image=image1)

This is the result of applying the filter weights to the second image.

plot_conv_layer(layer=layer_conv2, image=image2)

From these images, it appears that the second convolutional layer detects lines and patterns in the input images, and that its output is less sensitive to local variations in the original input images.

Close the TensorFlow session

We are now done using TensorFlow, so we close the session to release its resources.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

Conclusion

We have seen that a convolutional neural network is much better at recognizing handwritten digits than the simple linear model from Tutorial #01. The convolutional network reaches a classification accuracy of about 99%, or possibly more with some adjustments, compared with only 91% for the simple linear model.

However, the convolutional neural network is more complicated to implement, and it is not obvious from looking at the filter weights why it works and why it sometimes fails.

So we would like an easier way to implement convolutional neural networks, and a better way to visualize their inner workings.

Exercises

Here are some suggested exercises that may help you improve your TensorFlow skills. Hands-on experience is important for learning how to use TensorFlow properly.

You may want to make a backup of this Notebook before you change it.

  • If you run the Notebook multiple times without changing any parameters, do you get exactly the same results? What are the sources of randomness?

  • Run another 10,000 optimization iterations. Are the results better?

  • Change the optimizer’s learning rate.

  • Change the configuration of the layers, such as the number of convolutional filters, the size of the filters, the number of neurons in the fully connected layer, and so on.

  • Add a drop-out layer after the fully connected layer. Note that the drop-out probability should be zero when you calculate the classification accuracy, so you will need a placeholder variable for this probability.

  • Change the order of ReLU and max-pooling in the convolutional layer. Does it compute the same thing? Which order is the fastest to compute? How many calculations are saved? Does this also work for sigmoid functions and average-pooling?

  • Add one or more convolutional layers and fully connected layers. Does this help performance?

  • What’s the smallest possible configuration that gives you good results?

  • Try using ReLU in the last fully connected layer. Is there a change in performance? Why is that?

  • Try not using pooling in the convolutional layers. Does this affect the classification accuracy and the training time?

  • Try using a 2×2 stride in the convolution instead of max-pooling. What is the difference?

  • Rewrite the program yourself without looking too much at this source code.

  • Explain to a friend how the program works.
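
For the exercise on the order of ReLU and max-pooling, a numerical check such as the following can be used to test your answer once you have thought about it. It is only a sketch on random data: the helper names relu, sigmoid, max_pool_2x2 and avg_pool_2x2 are hypothetical, and agreement on one random input suggests a result but does not prove it.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _blocks(x):
    # View a 2-D array as non-overlapping 2 x 2 blocks.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2)


def max_pool_2x2(x):
    return _blocks(x).max(axis=(1, 3))


def avg_pool_2x2(x):
    return _blocks(x).mean(axis=(1, 3))


rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28))  # random stand-in for a conv-layer output

# Does the order of activation and pooling matter?
same_relu_max = np.allclose(relu(max_pool_2x2(x)), max_pool_2x2(relu(x)))
same_sigmoid_max = np.allclose(sigmoid(max_pool_2x2(x)), max_pool_2x2(sigmoid(x)))
same_relu_avg = np.allclose(relu(avg_pool_2x2(x)), avg_pool_2x2(relu(x)))

print("ReLU / max-pool order irrelevant:   ", same_relu_max)
print("sigmoid / max-pool order irrelevant:", same_sigmoid_max)
print("ReLU / avg-pool order irrelevant:   ", same_relu_avg)
```

When the two orders agree, pooling first means the activation function is evaluated on only a quarter as many values.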