Photo: Github

PrettyTensor is used to build neural networks quickly. The original tutorial was written in 2016; there are now more convenient APIs, which will be covered later. Much of the code and text carries over from the previous tutorial, so if you have already read that one you can skip ahead to the part where the network is implemented with PrettyTensor.

Simple Linear Model / Convolutional Neural Network

By Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube (translated from the English original)

If reproduced, please attach a link to this article.


Introduction

The previous tutorial demonstrated how to implement a convolutional neural network in TensorFlow, which requires some understanding of the underlying principles of how TensorFlow works. It’s a bit complicated and error-prone to implement.

This tutorial shows you how to use PrettyTensor, an add-on package for TensorFlow, which was also developed by Google. PrettyTensor provides an easier way to create neural networks in TensorFlow, allowing us to focus on the ideas we want to implement without worrying too much about the underlying implementation details. This also makes the code shorter and easier to read and modify.

Aside from using PrettyTensor to construct the graph, most of the code in this tutorial is the same as in Tutorial #02, with a few other minor changes.

This tutorial builds on Tutorial #02, which is recommended reading if you are new to TensorFlow. You should also be familiar with basic linear algebra, Python, and the Jupyter Notebook editor.

Flowchart

The chart below shows roughly how data flows in the convolutional neural network that is implemented below. See the previous tutorial for a detailed description of convolution.

from IPython.display import Image
Image('images/02_network_flowchart.png')

The input image is processed in the first convolutional layer using the filter weights. The result is 16 new images, one for each filter in the layer. The images are also down-sampled, so the resolution decreases from 28×28 to 14×14.

These 16 smaller images are then processed in the second convolutional layer. We need one filter for each of these 16 input channels, and one such set of filters for each output channel of the layer. There are 36 output channels, so there are 16 x 36 = 576 filters in total in the second convolutional layer. The resulting images are again down-sampled, to 7×7 pixels.

The output of the second convolutional layer is 36 images of 7×7 pixels each. These are flattened into a single vector of length 7 x 7 x 36 = 1764, which is used as the input to a fully connected layer with 128 neurons (or elements). This feeds into another fully connected layer with 10 neurons, one for each class, which is used to determine the class of the image, that is, the digit drawn in the image.

The convolutional filters are initialized randomly, so the classification is done randomly at first. The error between the predicted and the true class of the input image is measured by the cross-entropy. The optimizer then automatically propagates this error back through the convolutional network using the chain rule and updates the filter weights so as to improve the classification. This is repeated thousands of times until the classification error is sufficiently low.

Note that these particular filter weights and intermediate images are the result of one optimization run and may look different when you execute the code.

Also note that the TensorFlow computations are performed on a batch of images rather than a single image, which makes the computation more efficient. This means the flowchart actually has one more data dimension when implemented in TensorFlow.

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math

# We also need PrettyTensor.
import prettytensor as pt

This was developed using Python 3.5.2 (Anaconda), and the TensorFlow version is:

tf.__version__

'0.12.0-rc0'

PrettyTensor version:

pt.__version__

'0.7.1'

Load the data

The MNIST dataset is approximately 12 MB and will be downloaded automatically if it is not found in the given path.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/MNIST/', one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz

Extracting data/MNIST/train-labels-idx1-ubyte.gz

Extracting data/MNIST/t10k-images-idx3-ubyte.gz

Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST dataset has now been loaded. It consists of 70,000 images and corresponding labels (i.e. the class of each image). The dataset is split into three mutually independent subsets, but we will only use the training set and the test set in this tutorial.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))Copy the code

Size of:

- Training-set: 55000

- Test-set: 10000

- Validation-set: 5000

The class labels are One-Hot encoded, meaning each label is a vector of length 10 which is zero in all elements except one. The index of that one element is the class number, that is, the digit drawn in the corresponding image. We also need the class numbers of the test set as integers, which we compute as follows.

data.test.cls = np.argmax(data.test.labels, axis=1)
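As a small illustration of One-Hot encoding (a made-up example, not taken from the dataset), a label for the digit 7 is a vector with a single 1 at index 7, and np.argmax recovers the class number:

label = np.array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])  # One-Hot label for class 7.
print(np.argmax(label))  # Prints 7, the class number.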

Data dimensions

The data dimensions are used in several places in the source code below. They are defined once here, so we can use these variables instead of hard-coded numbers throughout the code.

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper function for plotting images

This function is used to plot 9 images in a 3×3 grid, and to write the true and predicted classes below each image.

def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to check that the data is correct

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

TensorFlow graph

The whole purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than performing the same calculations directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computation graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.

TensorFlow also automatically calculates gradients of variables that need to be optimized for better model performance. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be derived using the chain rule.
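As a tiny illustration of this idea (a standalone sketch, not part of the network built in this tutorial), operations are only added to the graph when defined, the actual numbers are computed when the graph is run in a session, and TensorFlow derives the gradient automatically:

g = tf.Graph()                 # Use a separate graph so the network built below is unaffected.
with g.as_default():
    a = tf.constant(3.0)
    b = tf.Variable(2.0)
    y = a * b                              # Nothing is computed yet; ops are only added to the graph.
    grad_b = tf.gradients(y, [b])[0]       # TensorFlow derives dy/db = a via the chain rule.
    init = tf.global_variables_initializer()

with tf.Session(graph=g) as sess:
    sess.run(init)
    print(sess.run([y, grad_b]))           # Prints [6.0, 3.0]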

TensorFlow can also take advantage of multi-core CPUs as well as GPUs, and Google has built special chips for TensorFlow called TPUs (Tensor Processing Units), which are even faster than GPUs.

A TensorFlow graph consists of the following parts, described in detail below:

  • Placeholder variables are used to feed input into the graph.
  • The Model variables will be optimized to make the model perform better.
  • The model is essentially just a bunch of mathematical functions that compute some output based on the placeholder inputs and the model variables.
  • A cost measure is used to guide the optimization of variables.
  • An optimization strategy updates the variables of the model.

In addition, the TensorFlow graph may contain various debugging statements, e.g. for logging data to be displayed with TensorBoard, which is not covered in this tutorial.

Placeholder variables

Placeholder variables serve as inputs to the graph, and we may change them each time we run the graph. This is called feeding the placeholder variables and is described further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means a multi-dimensional vector or matrix. The data type is set to float32 and the shape is set to [None, img_size_flat], where None means the tensor may hold an arbitrary number of images, each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')

The convolutional layers expect x to be encoded as a 4-dimensional tensor, so we have to reshape it so its shape is [num_images, img_height, img_width, num_channels]. Note that img_height == img_width == img_size, and num_images is inferred automatically when the first dimension is set to -1. The reshape operation is:

x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
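As a toy NumPy illustration of how the -1 dimension is inferred (made-up data, not part of the tutorial's code):

flat = np.zeros((5, img_size_flat))                            # 5 flattened images.
reshaped = flat.reshape(-1, img_size, img_size, num_channels)
print(reshaped.shape)                                          # Prints (5, 28, 28, 1): the first dim is inferred.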

Next we define the placeholder variable for the true labels associated with the images that were input via the placeholder variable x. The shape of this placeholder variable is [None, num_classes], which means it may hold an arbitrary number of labels, each label being a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, shape=[None, 10], name='y_true')

We could also have a placeholder variable for the class number, but we will instead calculate it using argmax. Note that this is only a TensorFlow operator, so nothing is calculated at this point.

y_true_cls = tf.argmax(y_true, dimension=1)

TensorFlow implementation

This section shows the original source code from Tutorial #02, which implements the convolutional neural network directly in TensorFlow. This code is not used directly in this Notebook; it is only shown so it can be compared with the PrettyTensor implementation below.

Note how much code is needed, as well as the low-level details of how TensorFlow stores the data and performs the computation. Even small neural networks are easy to get wrong.

Helper functions

When implementing the network directly in TensorFlow, we create some helper functions that are used repeatedly when constructing the graph.

These two functions create new variables in the TensorFlow diagram and initialize them with random values.

def new_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))

The following helper function creates a new convolutional layer in the computational graph. The input and output are 4-dimensional tensors (4-rank tensors). Note the low-level details of the TensorFlow API, such as the shape of the weight variable. It is easy to make mistakes, which may result in strange error messages that are hard to debug.

def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of filters.
                   num_filters,        # Number of filters.
                   use_pooling=True):  # Use 2x2 max-pooling.

    # Shape of the filter-weights for the convolution.
    # This format is determined by the TensorFlow API.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    biases = new_biases(length=num_filters)

    # Create the TensorFlow operation for convolution.
    # Note the strides are set to 1 in all dimensions.
    # The first and last stride must always be 1,
    # because the first is for the image-number and
    # the last is for the input-channel.
    # But e.g. strides=[1, 2, 2, 1] would mean that the filter
    # is moved 2 pixels across the x- and y-axis of the image.
    # The padding is set to 'SAME' which means the input image
    # is padded with zeroes so the size of the output is the same.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')

    # Add the biases to the results of the convolution.
    # A bias-value is added to each filter-channel.
    layer += biases

    # Use pooling to down-sample the image resolution?
    if use_pooling:
        # This is 2x2 max-pooling, which means that we
        # consider 2x2 windows and select the largest value
        # in each window. Then we move 2 pixels to the next window.
        layer = tf.nn.max_pool(value=layer,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME')

    # Rectified Linear Unit (ReLU).
    # It calculates max(x, 0) for each input pixel x.
    # This adds some non-linearity to the formula and allows us
    # to learn more complicated functions.
    layer = tf.nn.relu(layer)

    # Note that ReLU is normally executed before the pooling,
    # but since relu(max_pool(x)) == max_pool(relu(x)) we can
    # save 75% of the relu-operations by max-pooling first.

    # We return both the resulting layer and the filter-weights
    # because we will plot the weights later.
    return layer, weights

The following helper function reshapes a 4-dimensional tensor into a 2-dimensional one, so that we can add a fully connected layer after the convolutional layers.

def flatten_layer(layer):
    # Get the shape of the input layer.
    layer_shape = layer.get_shape()

    # The shape of the input layer is assumed to be:
    # layer_shape == [num_images, img_height, img_width, num_channels]

    # The number of features is: img_height * img_width * num_channels
    # We can use a function from TensorFlow to calculate this.
    num_features = layer_shape[1:4].num_elements()

    # Reshape the layer to [num_images, num_features].
    # Note that we just set the size of the second dimension
    # to num_features and the size of the first dimension to -1
    # which means the size in that dimension is calculated
    # so the total size of the tensor is unchanged from the reshaping.
    layer_flat = tf.reshape(layer, [-1, num_features])

    # The shape of the flattened layer is now:
    # [num_images, img_height * img_width * num_channels]

    # Return both the flattened layer and the number of features.
    return layer_flat, num_features

The next helper function creates a fully connected layer.

def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Use Rectified Linear Unit (ReLU)?

    # Create new weights and biases.
    weights = new_weights(shape=[num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)

    # Calculate the layer as the matrix multiplication of
    # the input and weights, and then add the bias-values.
    layer = tf.matmul(input, weights) + biases

    # Use ReLU?
    if use_relu:
        layer = tf.nn.relu(layer)

    return layer

Constructing a Graph

We will now use the helper functions above to create the convolutional neural network. Without these helper functions the code would be long and hard to understand.

Note that we will not run the following code. I wrote it here just for comparison with PrettyTensor’s code.

The previous tutorial defined the layer parameters as constants elsewhere, so they were easy to change. For example, instead of passing filter_size=5 as an argument to new_conv_layer(), we passed filter_size=filter_size1 and defined filter_size1=5 elsewhere. That made it easy to change all the constants at once.
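For example, Tutorial #02 defined the layer parameters roughly like this (shown here only for reference; these constants are not used in this Notebook):

# Convolutional Layer 1.
filter_size1 = 5          # Convolution filters are 5 x 5 pixels.
num_filters1 = 16         # There are 16 of these filters.

# Convolutional Layer 2.
filter_size2 = 5          # Convolution filters are 5 x 5 pixels.
num_filters2 = 36         # There are 36 of these filters.

# Fully-connected layer.
fc_size = 128             # Number of neurons in the fully-connected layer.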

if False:  # Don't execute this! Just show it for easy comparison.
    # First convolutional layer.
    layer_conv1, weights_conv1 = \
        new_conv_layer(input=x_image,
                       num_input_channels=num_channels,
                       filter_size=5,
                       num_filters=16,
                       use_pooling=True)

    # Second convolutional layer.
    layer_conv2, weights_conv2 = \
        new_conv_layer(input=layer_conv1,
                       num_input_channels=16,
                       filter_size=5,
                       num_filters=36,
                       use_pooling=True)

    # Flatten layer.
    layer_flat, num_features = flatten_layer(layer_conv2)

    # First fully-connected layer.
    layer_fc1 = new_fc_layer(input=layer_flat,
                             num_inputs=num_features,
                             num_outputs=128,
                             use_relu=True)

    # Second fully-connected layer.
    layer_fc2 = new_fc_layer(input=layer_fc1,
                             num_inputs=128,
                             num_outputs=num_classes,
                             use_relu=False)

    # Predicted class-label.
    y_pred = tf.nn.softmax(layer_fc2)

    # Cross-entropy for the classification of each image.
    cross_entropy = \
        tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                labels=y_true)

    # Loss aka. cost-measure.
    # This is the scalar value that must be minimized.
    loss = tf.reduce_mean(cross_entropy)

PrettyTensor implementation

This section shows how to implement the same convolutional neural network using PrettyTensor.

The basic idea is to wrap the input tensor x_image in a PrettyTensor object, which has helper functions for adding new layers so as to build the entire neural network. This is a bit like the helper functions we implemented above, but even simpler, because PrettyTensor keeps track of the input and output dimensions of each layer, and so on.

x_pretty = pt.wrap(x_image)

Now that we have wrapped the input image in a PrettyTensor object, we can add the convolutional layers and the fully connected layers with just a few lines of code.

Note that pt.defaults_scope(activation_fn=tf.nn.relu) in the with block makes activation_fn=tf.nn.relu an argument for each of the layers constructed inside the block, so they all use Rectified Linear Units (ReLU) as their activation function. defaults_scope makes it easy to change arguments for all the layers at once.

with pt.defaults_scope(activation_fn=tf.nn.relu):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=16, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=36, name='layer_conv2').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=128, name='layer_fc1').\
        softmax_classifier(num_classes=num_classes, labels=y_true)

That’s it! Now we have created the exact same convolutional neural network in just a few lines of code, which would have required a very complicated piece of code to implement directly with TensorFlow.

By using PrettyTensor instead of TensorFlow, we can see clearly how the network is constructed and how data flows through it. This allows us to focus on the key ideas of neural networks rather than the low-level implementation details. It’s so simple and elegant!

Getting the weights

Unfortunately, not everything is elegant when you use PrettyTensor.

Now, we want to plot the weights of the convolution layer. When implemented with TensorFlow, we created the variables ourselves, so we could access them directly. But when you use PrettyTensor to construct a network, all the variables are created indirectly by PrettyTensor. So we have to retrieve variables from TensorFlow.

We used the names layer_conv1 and layer_conv2 for the two convolutional layers. These names are also the variable scopes (not to be confused with defaults_scope described above). PrettyTensor automatically names the variables it creates for each layer, so we can retrieve the weights of a layer using its scope name and the variable name.

The implementation is a bit awkward because we have to use the TensorFlow function get_variable(), which was really designed for another purpose: either creating a new variable or re-using an existing one. The following helper function makes it simple.

def get_weights_variable(layer_name):
    # Retrieve an existing variable named 'weights' in the scope
    # with the given layer_name.
    # This is awkward because the TensorFlow function was
    # really intended for another purpose.

    with tf.variable_scope(layer_name, reuse=True):
        variable = tf.get_variable('weights')

    return variable

Using this helper function we can retrieve the variables. These are TensorFlow objects; to get the contents of a variable you need to do something like contents = session.run(weights_conv1), as demonstrated further below.

weights_conv1 = get_weights_variable(layer_name='layer_conv1')
weights_conv2 = get_weights_variable(layer_name='layer_conv2')

Optimization method

PrettyTensor gave us the predicted class labels (y_pred) as well as a loss measure that must be minimized so as to improve the neural network's ability to classify the input images.

The documentation for PrettyTensor does not say whether its loss measure is cross-entropy or something else, but we now minimize it using the AdamOptimizer.

The optimization is not performed here. In fact, nothing is calculated at all; we just add the optimizer object to the TensorFlow graph for later execution.

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)

Performance measurement

We need a few more performance measures to display the progress to the user.

First we calculate the predicted class from the neural network's output y_pred, which is a vector of 10 elements. The class number is the index of the largest element.

y_pred_cls = tf.argmax(y_pred, dimension=1)

We then create a Boolean vector that tells us whether the true category of each image is the same as the predicted category.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

The classification accuracy is calculated by first type-casting the boolean vector to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
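The same idea in plain NumPy (a toy example with made-up values):

correct_example = np.array([True, False, True, True])
print(correct_example.astype(np.float32).mean())  # Prints 0.75, i.e. 75% accuracy.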

Run TensorFlow

Create the TensorFlow session (Session)

Once the TensorFlow graph has been created, we have to create a TensorFlow session in order to run the graph.

session = tf.Session()

Initialize variables

We need to initialize the weights and biases variables before we can start optimizing them.

session.run(tf.global_variables_initializer())
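As an optional sanity check (a small sketch), we can now fetch the contents of the weight variables retrieved earlier and inspect their shapes:

# Fetch the weight values as NumPy arrays. No feed-dict is needed
# because the weights do not depend on the placeholder variables.
w1 = session.run(weights_conv1)
w2 = session.run(weights_conv2)
print(w1.shape)  # Expected (5, 5, 1, 16): 5x5 filters, 1 input channel, 16 filters.
print(w2.shape)  # Expected (5, 5, 16, 36): 5x5 filters, 16 input channels, 36 filters.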

Helper function for optimization iterations

There are 55,000 images in the training set. Computing the gradient of the model using all of these images takes a long time. We therefore use Stochastic Gradient Descent, which only uses a small batch of images in each iteration of the optimizer. If your computer crashes or becomes very slow because it runs out of RAM, you can try lowering this number, but you may then need to run more optimization iterations.

train_batch_size = 64

This function performs a number of optimization iterations so as to gradually improve the variables of the network layers. In each iteration, a new batch of data is selected from the training set and TensorFlow executes the optimizer using those training samples. Progress is printed every 100 iterations.

# Counter for total number of iterations performed so far.
total_iterations = 0

def optimize(num_iterations):
    # Ensure we update the global variable rather than a local copy.
    global total_iterations

    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(total_iterations,
                   total_iterations + num_iterations):

        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(train_batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every 100 iterations.
        if i % 100 == 0:
            # Calculate the accuracy on the training-set.
            acc = session.run(accuracy, feed_dict=feed_dict_train)

            # Message for printing.
            msg = "Optimization Iteration: {0:>6}, Training Accuracy: {1:>6.1%}"

            # Print it.
            print(msg.format(i + 1, acc))

    # Update the total number of iterations performed.
    total_iterations += num_iterations

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))Copy the code

Helper function for plotting misclassified examples

This function plots examples of images from the test set that have been misclassified.

def plot_example_errors(cls_pred, correct):
    # This function is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # correct is a boolean array whether the predicted class
    # is equal to the true class for each image in the test-set.

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = data.test.images[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = data.test.cls[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])

Helper function for plotting the confusion matrix

def plot_confusion_matrix(cls_pred):
    # This is called from print_test_accuracy() below.

    # cls_pred is an array of the predicted class-number for
    # all images in the test-set.

    # Get the true classifications for the test-set.
    cls_true = data.test.cls

    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_true,
                          y_pred=cls_pred)

    # Print the confusion matrix as text.
    print(cm)

    # Plot the confusion matrix as an image.
    plt.matshow(cm)

    # Make various adjustments to the plot.
    plt.colorbar()
    tick_marks = np.arange(num_classes)
    plt.xticks(tick_marks, range(num_classes))
    plt.yticks(tick_marks, range(num_classes))
    plt.xlabel('Predicted')
    plt.ylabel('True')

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Helper function for showing the performance

The function is used to print the classification accuracy on the test set.

It takes a while to compute the classification for all the images in the test set, which is why we call the helper functions above directly from this function, so the classifications do not have to be recalculated by each function.

Note that this function can use a lot of computer memory, which is why the test set is split into smaller batches. If your computer has little RAM and crashes, you can try lowering the batch size.

# Split the test-set into smaller batches of this size.
test_batch_size = 256

def print_test_accuracy(show_example_errors=False, show_confusion_matrix=False):

    # Number of images in the test-set.
    num_test = len(data.test.images)

    # Allocate an array for the predicted classes which
    # will be calculated in batches and filled into this array.
    cls_pred = np.zeros(shape=num_test, dtype=np.int)

    # Now calculate the predicted classes for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_test:
        # The ending index for the next batch is denoted j.
        j = min(i + test_batch_size, num_test)

        # Get the images from the test-set between index i and j.
        images = data.test.images[i:j, :]

        # Get the associated labels.
        labels = data.test.labels[i:j, :]

        # Create a feed-dict with these images and labels.
        feed_dict = {x: images,
                     y_true: labels}

        # Calculate the predicted class using TensorFlow.
        cls_pred[i:j] = session.run(y_pred_cls, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    # Convenience variable for the true class-numbers of the test-set.
    cls_true = data.test.cls

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    # Calculate the number of correctly classified images.
    # When summing a boolean array, False means 0 and True means 1.
    correct_sum = correct.sum()

    # Classification accuracy is the number of correctly classified
    # images divided by the total number of images in the test-set.
    acc = float(correct_sum) / num_test

    # Print the accuracy.
    msg = "Accuracy on Test-Set: {0:.1%} ({1} / {2})"
    print(msg.format(acc, correct_sum, num_test))

    # Plot some examples of mis-classifications, if desired.
    if show_example_errors:
        print("Example errors:")
        plot_example_errors(cls_pred=cls_pred, correct=correct)

    # Plot the confusion matrix, if desired.
    if show_confusion_matrix:
        print("Confusion Matrix:")
        plot_confusion_matrix(cls_pred=cls_pred)

Performance before any optimization

The accuracy on the test set is very low because the model is only initialized, not optimized, so it just randomly classifies the images.

print_test_accuracy()

Accuracy on test-set: 9.1% (909/10000)

Performance after one iteration

After a single optimization iteration the performance has hardly improved, because the learning rate of the optimizer is very low.

optimize(num_iterations=1)

Optimization Iteration: 1, Training Accuracy: 6.2%

Time usage: 0:00:00

print_test_accuracy()

Accuracy on test-set: 8.9% (892/10000)

Performance after 100 optimization iterations

After 100 optimization iterations, the classification accuracy of the model has improved significantly.

optimize(num_iterations=99) # We already performed 1 iteration above.

Time usage: 0:00:00

print_test_accuracy(show_example_errors=True)

Accuracy on test-set: 83.9% (8393/10000)

Example errors:

Performance after 1000 optimization iterations

After 1000 iterations of optimization, the model was more than 90% accurate on the test set.

optimize(num_iterations=900) # We performed 100 iterations above.

Optimization Iteration: 101, Training Accuracy: 93.8%

Optimization Iteration: 201, Training Accuracy: 89.1%

Optimization Iteration: 301, Training Accuracy: 85.9%

Optimization Iteration: 401, Training Accuracy:

Optimization Iteration: 501, Training Accuracy: 92.2%

Optimization Iteration: 601, Training Accuracy: 95.3%

Optimization Iteration: 701, Training Accuracy: 95.3%

Optimization Iteration: 801, Training Accuracy: 90.6%

Optimization Iteration: 901, Training Accuracy: 98.4%

Time usage: 0:00:03

print_test_accuracy(show_example_errors=True)

Accuracy on test-set: 96.3% (9634/10000)

Example errors:

Performance after 10,000 optimization iterations

After 10,000 optimization iterations, the classification accuracy on the test set is nearly 99%.

optimize(num_iterations=9000) # We performed 1000 iterations above.

Optimization Iteration: 1001, Training Accuracy: 98.4%

Optimization Iteration: 1101, Training Accuracy: 95.3%

Optimization Iteration: 1201, Training Accuracy: 98.4%

Optimization Iteration: 1301, Training Accuracy: 96.9%

Optimization Iteration: 1401, Training Accuracy: 100.0%

Optimization Iteration: 1501, Training Accuracy: 95.3%

Optimization Iteration: 1601, Training Accuracy: 96.9%

Optimization Iteration: 1701, Training Accuracy: 96.9%

Optimization Iteration: 1801, Training Accuracy: 98.4%

Optimization Iteration: 1901, Training Accuracy: 96.9%

Optimization Iteration: 2001, Training Accuracy: 98.4%

Optimization Iteration: 2101, Training Accuracy: 95.3%

Optimization Iteration: 2201, Training Accuracy: 98.4%

Optimization Iteration: 2301, Training Accuracy: 98.4%

Optimization Iteration: 2401, Training Accuracy: 98.4%

Optimization Iteration: 2501, Training Accuracy: 93.8%

Optimization Iteration: 2601, Training Accuracy: 98.4%

Optimization Iteration: 2701, Training Accuracy: 98.4%

Optimization Iteration: 2801, Training Accuracy: 95.3%

Optimization Iteration: 2901, Training Accuracy: 98.4%

Optimization Iteration: 3001, Training Accuracy: 98.4%

Optimization Iteration: 3101, Training Accuracy: 100.0%

Optimization Iteration: 3201, Training Accuracy: 96.9%

Optimization Iteration: 3301, Training Accuracy: 100.0%

Optimization Iteration: 3401, Training Accuracy: 98.4%

Optimization Iteration: 3501, Training Accuracy: 96.9%

Optimization Iteration: 3601, Training Accuracy: 98.4%

Optimization Iteration: 3701, Training Accuracy: 96.9%

Optimization Iteration: 3801, Training Accuracy: 100.0%

Optimization Iteration: 3901, Training Accuracy: 98.4%

Optimization Iteration: 4001, Training Accuracy: 96.9%

Optimization Iteration: 4101, Training Accuracy: 98.4%

Optimization Iteration: 4201, Training Accuracy: 100.0%

Optimization Iteration: 4301, Training Accuracy: 100.0%

Optimization Iteration: 4401, Training Accuracy: 100.0%

Optimization Iteration: 4501, Training Accuracy: 100.0%

Optimization Iteration: 4601, Training Accuracy: 98.4%

Optimization Iteration: 4701, Training Accuracy: 96.9%

Optimization Iteration: 4801, Training Accuracy: 95.3%

Optimization Iteration: 4901, Training Accuracy: 100.0%

Optimization Iteration: 5001, Training Accuracy: 96.9%

Optimization Iteration: 5101, Training Accuracy: 100.0%

Optimization Iteration: 5201, Training Accuracy: 98.4%

Optimization Iteration: 5301, Training Accuracy: 98.4%

Optimization Iteration: 5401, Training Accuracy: 100.0%

Optimization Iteration: 5501, Training Accuracy: 98.4%

Optimization Iteration: 5601, Training Accuracy: 96.9%

Optimization Iteration: 5701, Training Accuracy: 100.0%

Optimization Iteration: 5801, Training Accuracy: 96.9%

Optimization Iteration: 5901, Training Accuracy: 100.0%

Optimization Iteration: 6001, Training Accuracy: 98.4%

Optimization Iteration: 6101, Training Accuracy: 98.4%

Optimization Iteration: 6201, Training Accuracy: 98.4%

Optimization Iteration: 6301, Training Accuracy: 98.4%

Optimization Iteration: 6401, Training Accuracy: 100.0%

Optimization Iteration: 6501, Training Accuracy: 100.0%

Optimization Iteration: 6601, Training Accuracy: 100.0%

Optimization Iteration: 6701, Training Accuracy: 100.0%

Optimization Iteration: 6801, Training Accuracy: 96.9%

Optimization Iteration: 6901, Training Accuracy: 100.0%

Optimization Iteration: 7001, Training Accuracy: 100.0%

Optimization Iteration: 7101, Training Accuracy: 100.0%

Optimization Iteration: 7201, Training Accuracy: 100.0%

Optimization Iteration: 7301, Training Accuracy: 96.9%

Optimization Iteration: 7401, Training Accuracy: 100.0%

Optimization Iteration: 7501, Training Accuracy: 100.0%

Optimization Iteration: 7601, Training Accuracy: 96.9%

Optimization Iteration: 7701, Training Accuracy: 100.0%

Optimization Iteration: 7801, Training Accuracy: 100.0%

Optimization Iteration: 7901, Training Accuracy: 100.0%

Optimization Iteration: 8001, Training Accuracy: 98.4%

Optimization Iteration: 8101, Training Accuracy: 100.0%

Optimization Iteration: 8201, Training Accuracy: 100.0%

Optimization Iteration: 8301, Training Accuracy: 100.0%

Optimization Iteration: 8401, Training Accuracy: 100.0%

Optimization Iteration: 8501, Training Accuracy: 98.4%

Optimization Iteration: 8601, Training Accuracy: 100.0%

Optimization Iteration: 8701, Training Accuracy: 100.0%

Optimization Iteration: 8801, Training Accuracy: 100.0%

Optimization Iteration: 8901, Training Accuracy: 100.0%

Optimization Iteration: 9001, Training Accuracy: 98.4%

Optimization Iteration: 9101, Training Accuracy: 98.4%

Optimization Iteration: 9201, Training Accuracy: 100.0%

Optimization Iteration: 9301, Training Accuracy: 100.0%

Optimization Iteration: 9401, Training Accuracy: 98.4%

Optimization Iteration: 9501, Training Accuracy: 100.0%

Optimization Iteration: 9601, Training Accuracy: 100.0%

Optimization Iteration: 9701, Training Accuracy: 100.0%

Optimization Iteration: 9801, Training Accuracy: 98.4%

Optimization Iteration: 9901, Training Accuracy: 100.0%

Time usage: 0:00:27

print_test_accuracy(show_example_errors=True,
                    show_confusion_matrix=True)

Accuracy on test-set: 98.8% (9881/10000)

Example errors:

Confusion Matrix:

[10×10 confusion matrix printed as text, with the 9881 correctly classified images on the diagonal; the matrix is also plotted as an image by the code above.]

Visualization of weights and layers

When we implemented the convolutional neural network directly in TensorFlow, it was easy to plot both the convolutional weights and the images output by the different layers. With PrettyTensor we can retrieve the weights as shown above, but it is not as easy to retrieve the images output by the convolutional layers, so below we only plot the weights.
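If you really need the layer outputs, one possible workaround is to look up the output tensor in the graph by name. This is only a sketch: the exact operation names depend on PrettyTensor's internals, so you may have to list the graph's operations first to find the right one.

# List the operations that PrettyTensor created in the 'layer_conv1' scope,
# to find the name of the op that produces the layer's output.
for op in tf.get_default_graph().get_operations():
    if op.name.startswith('layer_conv1'):
        print(op.name)

# Assuming an op such as 'layer_conv1/Relu' exists (hypothetical name, adjust
# to what the listing above shows), its output could then be evaluated:
# layer1_tensor = tf.get_default_graph().get_tensor_by_name('layer_conv1/Relu:0')
# layer1_values = session.run(layer1_tensor, feed_dict={x: data.test.images[0:9]})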

Helper function for plotting convolutional weights

def plot_conv_weights(weights, input_channel=0):
    # Assume weights are TensorFlow ops for 4-dim variables
    # e.g. weights_conv1 or weights_conv2.

    # Retrieve the values of the weight-variables from TensorFlow.
    # A feed-dict is not necessary because nothing is calculated.
    w = session.run(weights)

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)

    # Number of filters used in the conv. layer.
    num_filters = w.shape[3]

    # Number of grids to plot.
    # Rounded-up, square-root of the number of filters.
    num_grids = math.ceil(math.sqrt(num_filters))

    # Create figure with a grid of sub-plots.
    fig, axes = plt.subplots(num_grids, num_grids)

    # Plot all the filter-weights.
    for i, ax in enumerate(axes.flat):
        # Only plot the valid filter-weights.
        if i<num_filters:
            # Get the weights for the i'th filter of the input channel.
            # See new_conv_layer() for details on the format
            # of this 4-dim tensor.
            img = w[:, :, input_channel, i]

            # Plot image.
            ax.imshow(img, vmin=w_min, vmax=w_max,
                      interpolation='nearest', cmap='seismic')

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Convolution layer 1

Now plot the filter weights for the first convolution layer.

Positive weights are red and negative weights are blue.

plot_conv_weights(weights=weights_conv1)

Convolution layer 2

Now plot the filter weights of the second convolutional layer.

The first convolutional layer has 16 output channels, which means there are 16 input channels to the second convolutional layer. The second convolutional layer has a set of filter weights for each of its input channels. We start by plotting the filter weights for the first channel.

Similarly, positive values are red, negative values are blue.

plot_conv_weights(weights=weights_conv2, input_channel=0)

There are 16 input channels to the second convolutional layer, so we could plot 15 more sets of filter weights like this one. Here we just plot the ones for the second channel.

plot_conv_weights(weights=weights_conv2, input_channel=1)

Close the TensorFlow session

We are now done using TensorFlow, so we close the session to release its resources.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

Conclusion

Compared with using TensorFlow directly, PrettyTensor lets us implement neural networks with much simpler code. This allows you to focus on your ideas rather than on low-level implementation details, makes the code easier to understand, and makes mistakes less likely.

However, PrettyTensor has some inconsistencies and awkward designs, and its documentation is brief and confusing, so it is not easy to learn. Hopefully this will improve in the future (this was written in July 2016).

There are some alternatives to PrettyTensor, including TFLearn and Keras.

Exercises

Here are some suggested exercises that may help improve your TensorFlow skills. Practical experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes to it.

  • Change the activation function to sigmoid for all the layers.
  • Use sigmoid in some layers and ReLU in others. Can you use defaults_scope here?
  • Use l2loss in all the layers. Then try using it only in some of the layers.
  • Use PrettyTensor's reshape function instead of TensorFlow's. Is one of them better?
  • Add a dropout layer after the fully connected layer. If you want a different keep_prob during training and testing, you will need a placeholder variable and set it in the feed-dict.
  • Replace the 2×2 max-pooling layers with stride=2 in the convolutional layers. Does this change the classification accuracy? What if you optimize it several times? The differences are random, so how would you measure whether there really is a difference? What are the pros and cons of using max-pooling versus stride in the convolutional layers?
  • Change layer parameters such as kernel, depth, size, etc. What is the difference in time and classification accuracy?
  • Add or remove some convolutional layers and fully connected layers.
  • What is the simplest network you have designed to perform well?
  • Retrieve the bias-values of the convolutional layers and print them. See the implementation of get_weights_variable() for inspiration.
  • Remake the program yourself without looking too much at this source code.
  • Explain to a friend how the program works.