This is my translated series originally published on Zhihu, updated in installments, and now moved to Nuggets (Juejin). In this series you will learn some basic concepts of deep learning and how to use TensorFlow, and you will complete projects on handwritten digit recognition, image classification, transfer learning, Deep Dream, style transfer, and reinforcement learning. The Python Notebooks on GitHub also make it easy to debug the code.

All in all, it's a great primer. Feel free to share, follow, and subscribe.

I have to say, Markdown support on Nuggets makes writing a lot easier. A salute to Aaron Swartz.

By Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube / translated from the English original

If you repost this article, please include a link to it.


Introduction

This tutorial demonstrates the basic workflow of using TensorFlow with a simple linear model. After loading the so-called MNIST dataset of handwritten digit images, we define and optimize a simple mathematical model in TensorFlow. The results are then plotted and discussed.

You should be familiar with basic linear algebra, Python, and the Jupyter Notebook editor. It also helps if you have a basic understanding of machine learning and classification.

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix

This was developed using Python 3.5.2 (Anaconda) with the following TensorFlow version:

tf.__version__

'0.12.0-rc1'

Load the data

The MNIST dataset is about 12 MB and will be downloaded automatically if the files are not found at the given path.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("data/MNIST/", one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz

Extracting data/MNIST/train-labels-idx1-ubyte.gz

Extracting data/MNIST/t10k-images-idx3-ubyte.gz

Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST dataset has now been loaded. It consists of 70,000 images and their associated labels (i.e. the class of each image). The dataset is split into three mutually exclusive subsets. We will only use the training set and the test set in this tutorial.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))

Size of:
- Training-set:   55000
- Test-set:       10000
- Validation-set: 5000

One-Hot Encoding

The dataset has been loaded with so-called one-hot encoding. This means the labels have been converted from a single number to a vector whose length equals the number of possible classes. All elements of the vector are zero except for the $i$'th element, which is one and means the class is $i$. For example, here are the one-hot encoded labels for the first five images in the test set:

data.test.labels[0:5, :]

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]])

We also need the classes as single numbers for various comparisons and performance measures, so we convert the one-hot encoded vectors to a single number by taking the index of the largest element. Note that 'class' is a keyword in Python, so we use the name 'cls' instead.

data.test.cls = np.array([label.argmax() for label in data.test.labels])

Now we can see the classes of the first five images in the test set. Compare these to the one-hot encoded vectors above. For example, the class of the first image is 7, and the corresponding one-hot encoded vector is all zeros except for the element at index 7.

data.test.cls[0:5]

array([7, 2, 1, 0, 4])
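Both directions of this conversion can be sketched in plain NumPy. This is only an illustration of the idea, not the code that input_data uses internally; the labels are the first five test labels shown above.

```python
import numpy as np

# The first five test labels from the output above.
labels = np.array([7, 2, 1, 0, 4])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing np.eye with the labels one-hot encodes them.
one_hot = np.eye(num_classes)[labels]

# Taking the index of the largest element in each row recovers the class.
recovered = one_hot.argmax(axis=1)

print(one_hot[0])          # 1.0 at index 7, zeros elsewhere
print(recovered.tolist())  # [7, 2, 1, 0, 4]
```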

Data dimensions

The data dimensions are used in several places in the source code below. In computer programming it is generally better to use variables and constants rather than hard-coding a specific number every time it is used, because then the number only has to be changed in one place. Ideally these values would be inferred from the data that has been read, but here we just write them out.

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper function for plotting images

This function plots nine images in a 3×3 grid and writes the true and predicted classes below each image.

def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

Plot a few images to check that the data is correct

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

TensorFlow Graph

The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computational graph that must be executed, while NumPy only knows about the single mathematical operation it is executing at any given moment.

TensorFlow can also automatically calculate the gradients needed to optimize the variables of the graph so that the model performs better. This is possible because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain rule for derivatives.

TensorFlow can also take advantage of multi-core CPUs as well as GPUs. Google has even built special chips for TensorFlow, called Tensor Processing Units (TPUs), which are faster than GPUs.

A TensorFlow graph consists of the following parts, which are described in detail below:

  • Placeholder variables, used to feed input into the graph.
  • Model variables, which are optimized to make the model perform better.
  • The model, which is essentially just a mathematical function that computes some output from the input in the placeholder variables and the model variables.
  • A cost measure, used to guide the optimization of the variables.
  • An optimization method, which updates the variables of the model.

In addition, a TensorFlow graph may contain various debugging statements, e.g. for logging data to be displayed with TensorBoard, which is not covered in this tutorial.

Placeholder variables

Placeholder variables serve as the input to the graph, and we may change them each time we execute the graph. We call this feeding the placeholder variables, and it is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means that it is a multi-dimensional vector or matrix. The data type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, [None, img_size_flat])

Next we define the placeholder variable for the true labels associated with the images that were input in the placeholder variable x. This placeholder variable has the shape [None, num_classes], which means it may hold an arbitrary number of labels, each label being a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, [None, num_classes])

Finally we define the placeholder variable for the true class of each image in the placeholder variable x. These are integers, and the shape of this placeholder variable is set to [None], which means it is a one-dimensional vector of arbitrary length.

y_true_cls = tf.placeholder(tf.int64, [None])

Variables to be optimized

Apart from the placeholder variables defined above, which feed input data into the model, there are also some model variables that TensorFlow must change in order to make the model perform better on the training data.

The first variable that must be optimized is called weights. It is defined here as a TensorFlow variable initialized with zeros, with the shape [img_size_flat, num_classes], so it is a two-dimensional tensor (or matrix) with img_size_flat rows and num_classes columns.

weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))

The second variable that must be optimized is called biases and is defined as a one-dimensional tensor (or vector) of length num_classes.

biases = tf.Variable(tf.zeros([num_classes]))

Model

This simple mathematical model multiplies the images in the placeholder variable x by the weights and then adds the biases.

The result is a matrix of shape [num_images, num_classes]: x has shape [num_images, img_size_flat] and weights has shape [img_size_flat, num_classes], so the product of those two matrices has shape [num_images, num_classes], and the biases vector is then added to each row of that matrix.

logits = tf.matmul(x, weights) + biases
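The shape arithmetic can be checked with a small NumPy sketch. The sizes are hypothetical (3 random images instead of a real batch), NumPy's @ operator stands in for tf.matmul, and the + biases step relies on broadcasting, which adds the length-num_classes vector to every row of the product.

```python
import numpy as np

# Hypothetical sizes: 3 images, each flattened to 28*28 pixels, 10 classes.
num_images, img_size_flat, num_classes = 3, 28 * 28, 10

rng = np.random.default_rng(0)
x = rng.random((num_images, img_size_flat))
weights = rng.random((img_size_flat, num_classes))
biases = np.arange(num_classes, dtype=np.float64)

# (3, 784) @ (784, 10) -> (3, 10); biases with shape (10,) is broadcast
# and added to each of the 3 rows.
logits = x @ weights + biases
print(logits.shape)  # (3, 10)
```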

Now logits is a matrix with num_images rows and num_classes columns, where the element in the $i$'th row and $j$'th column is an estimate of how likely the $i$'th input image is to be of the $j$'th class.

However, these estimates are rough and difficult to interpret because the numbers may be very small or very large, so we normalize them so that each row of the logits matrix sums to one and each element is limited to between zero and one. This is calculated with the so-called softmax function, and the result is stored in y_pred.

y_pred = tf.nn.softmax(logits)

The predicted class can be calculated from the y_pred matrix by taking the index of the largest element in each row.

y_pred_cls = tf.argmax(y_pred, dimension=1)
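Here is a minimal NumPy sketch of softmax followed by argmax, using a hand-made 2×3 logits matrix with hypothetical values. Subtracting the row maximum before exponentiating is a common numerical-stability trick and is our addition here. It leaves the result unchanged because softmax is invariant to adding a constant to a whole row.

```python
import numpy as np

def softmax(logits):
    # Subtracting each row's maximum avoids overflow in np.exp
    # without changing the normalized result.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

# Hypothetical logits for 2 images and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 5.0]])

y_pred = softmax(logits)
y_pred_cls = y_pred.argmax(axis=1)

print(y_pred.sum(axis=1))    # each row sums to 1
print(y_pred_cls.tolist())   # [0, 2]
```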

Cost function to be optimized

To make the model better at classifying the input images, we must somehow change the weights and biases variables. To do this, we first need to know how well the model currently performs by comparing its predicted output y_pred to the desired output y_true.

Cross-entropy is a performance measure used in classification. The cross-entropy is a continuous function that is never negative, and it equals zero if the predicted output of the model exactly matches the desired output. The goal of the optimization is therefore to minimize the cross-entropy, bringing it as close to zero as possible by changing the weights and biases of the model.
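For a single image with one-hot label vector $y$ (a row of y_true) and logits $z$ (the corresponding row of logits), the softmax output and the cross-entropy can be written as:

$$
\hat{y}_j = \frac{e^{z_j}}{\sum_{k=0}^{9} e^{z_k}}, \qquad H(y, \hat{y}) = -\sum_{j=0}^{9} y_j \log \hat{y}_j
$$

Because $y$ is one-hot with a 1 at the true class $c$, this reduces to $H = -\log \hat{y}_c$. Since $0 < \hat{y}_c \le 1$, we get $H \ge 0$, and $H$ approaches zero as $\hat{y}_c$ approaches 1, i.e. as the prediction approaches the desired output.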

TensorFlow has a built-in function for calculating the cross-entropy. Note that it takes the logits as input, because it also calculates the softmax internally.

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                        labels=y_true)

We have now calculated the cross-entropy for each of the image classifications, so we have a measure of how well the model performs on each image individually. But to use the cross-entropy to guide the optimization of the model's variables, we need a single scalar value, so we simply take the average of the cross-entropy over all the image classifications.

cost = tf.reduce_mean(cross_entropy)
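To see what these two TensorFlow lines compute, here is a NumPy sketch that is mathematically equivalent for one-hot labels. The helper name softmax_cross_entropy is hypothetical, the inputs are made up, and it uses the log-sum-exp form for numerical stability; it is not TensorFlow's actual implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Hypothetical helper: H = -sum_j labels_j * log(softmax(logits)_j),
    # computed as log softmax = z - logsumexp(z) to stay numerically stable.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

# Two hypothetical images with 2 classes, both truly of class 0.
logits = np.array([[0.0, 0.0],     # undecided: cross-entropy is log(2)
                   [100.0, 0.0]])  # very confident and correct: about 0
labels = np.array([[1.0, 0.0],
                   [1.0, 0.0]])

cross_entropy = softmax_cross_entropy(logits, labels)  # one value per image
cost = cross_entropy.mean()                            # single scalar
```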

Optimization method

Now that we have a cost measure that must be minimized, we can create an optimizer. In this case it is the basic form of gradient descent, with the step size (learning rate) set to 0.5.

Note that the optimization is not performed at this point. In fact, nothing is calculated at all yet; we just add the optimizer to the TensorFlow graph for later execution.

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
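What one step of this optimizer does can be sketched by hand in NumPy for the same linear softmax model. The sketch relies on the standard result, obtained with the chain rule, that the gradient of the mean cross-entropy with respect to the logits is (y_pred - y_true) / n; the gradients for weights and biases then follow from the matrix product. The tiny dataset and the helper names are hypothetical, and TensorFlow derives the gradients automatically from the graph rather than from this hand-written formula.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cost_fn(x, y, w, b):
    # Mean cross-entropy of the linear softmax model.
    p = softmax(x @ w + b)
    return -np.mean(np.sum(y * np.log(p), axis=1))

def gradient_descent_step(x, y, w, b, learning_rate=0.5):
    n = x.shape[0]
    p = softmax(x @ w + b)
    # Chain rule result for softmax + cross-entropy: d(cost)/d(logits).
    d_logits = (p - y) / n
    grad_w = x.T @ d_logits         # shape (features, classes)
    grad_b = d_logits.sum(axis=0)   # shape (classes,)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Hypothetical problem: 4 samples, 3 features, 2 classes.
x = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y = np.eye(2)[[0, 1, 0, 1]]
w = np.zeros((3, 2))
b = np.zeros(2)

before = cost_fn(x, y, w, b)   # log(2) with all-zero parameters
w, b = gradient_descent_step(x, y, w, b)
after = cost_fn(x, y, w, b)
print(before, after)           # the cost should decrease
```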

Performance measures

We need a few more performance measures to display the progress to the user.

This is a vector of booleans indicating whether the predicted class equals the true class of each image.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

This calculates the classification accuracy by first casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
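The same two steps in NumPy, with hypothetical predicted and true classes, show how the boolean vector turns into an accuracy.

```python
import numpy as np

# Hypothetical predicted and true classes for 6 images (one mistake).
y_pred_cls = np.array([7, 2, 1, 0, 4, 1])
y_true_cls = np.array([7, 2, 1, 0, 9, 1])

correct_prediction = np.equal(y_pred_cls, y_true_cls)    # boolean vector
accuracy = correct_prediction.astype(np.float32).mean()  # False->0, True->1
print(accuracy)  # 5 correct out of 6, about 0.833
```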

Run TensorFlow

Create TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session, which is used to execute the graph.

session = tf.Session()

Initialize variables

We need to initialize the weights and biases variables before we can start optimizing them.

session.run(tf.global_variables_initializer())

Helper function to perform optimization iterations

There are 55,000 images in the training set. Calculating the gradient of the model using all of these images takes a long time. We therefore use stochastic gradient descent, which only uses a small batch of images in each iteration of the optimizer.

batch_size = 100
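The idea of drawing one random mini-batch per iteration can be sketched in NumPy. This assumes a small hypothetical training set of 1,000 fake images instead of the real 55,000, and the helper random_batch is hypothetical: it samples each batch independently, which differs from how data.train.next_batch is implemented.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-ins for data.train.images and data.train.labels:
# 1,000 fake "images" whose pixels are just increasing numbers.
num_train, img_size_flat, num_classes = 1000, 784, 10
train_images = np.arange(num_train * img_size_flat, dtype=np.float32)
train_images = train_images.reshape(num_train, img_size_flat)
random_classes = rng.integers(0, num_classes, size=num_train)
train_labels = np.eye(num_classes, dtype=np.float32)[random_classes]

batch_size = 100

def random_batch(images, labels, batch_size):
    # Pick batch_size distinct rows at random, so each optimizer step
    # only sees a small part of the training set.
    idx = rng.choice(len(images), size=batch_size, replace=False)
    return images[idx], labels[idx]

x_batch, y_true_batch = random_batch(train_images, train_labels, batch_size)
```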

This function performs a number of optimization iterations to gradually improve the weights and biases of the model. In each iteration, a new batch of data is selected from the training set, and TensorFlow then executes the optimizer using those training samples.

def optimize(num_iterations):
    for i in range(num_iterations):
        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        # Note that the placeholder for y_true_cls is not set
        # because it is not used during training.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

Helper functions to show performance

This is a dict with the test-set data to be used as input to the TensorFlow graph. Note that we must use the correct names for the placeholder variables in the TensorFlow graph.

feed_dict_test = {x: data.test.images,
                  y_true: data.test.labels,
                  y_true_cls: data.test.cls}

Function for printing the classification accuracy on the test set.

def print_accuracy():
    # Use TensorFlow to compute the accuracy.
    acc = session.run(accuracy, feed_dict=feed_dict_test)

    # Print the accuracy.
    print("Accuracy on test-set: {0:.1%}".format(acc))

Function for printing and plotting the confusion matrix using scikit-learn.

def print_confusion_matrix():
    # Get the true classifications for the test-set.
    cls_true = data.test.cls

    # Get the predicted classifications for the test-set.
    cls_pred = session.run(y_pred_cls, feed_dict=feed_dict_test)

    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_true,
                          y_pred=cls_pred)

    # Print the confusion matrix as text.
    print(cm)

    # Plot the confusion matrix as an image.
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)

    # Make various adjustments to the plot.
    plt.tight_layout()
    plt.colorbar()
    tick_marks = np.arange(num_classes)
    plt.xticks(tick_marks, range(num_classes))
    plt.yticks(tick_marks, range(num_classes))
    plt.xlabel('Predicted')
    plt.ylabel('True')
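To make the layout of the printed matrix concrete, here is a NumPy sketch that counts the same thing for a tiny hypothetical set of predictions: entry [i, j] is the number of images whose true class is i and whose predicted class is j, which is the row/column convention of sklearn.metrics.confusion_matrix. The helper confusion_matrix_np is hypothetical; np.add.at is used so that repeated (i, j) pairs are all counted.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    # Rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Unbuffered add: every (true, pred) pair increments its cell once.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical true and predicted classes for 6 images, 3 classes.
cls_true = np.array([0, 1, 2, 2, 1, 0])
cls_pred = np.array([0, 2, 2, 2, 1, 1])

cm = confusion_matrix_np(cls_true, cls_pred, num_classes=3)
print(cm)
# [[1 1 0]
#  [0 1 1]
#  [0 0 2]]
```

The diagonal holds the correct classifications, so the trace divided by the total count is the accuracy (4 of 6 here).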

Function for plotting examples of images from the test set that have been misclassified.

def plot_example_errors():
    # Use TensorFlow to get a list of boolean values
    # whether each test-image has been correctly classified,
    # and a list for the predicted class of each image.
    correct, cls_pred = session.run([correct_prediction, y_pred_cls],
                                    feed_dict=feed_dict_test)

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = data.test.images[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = data.test.cls[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])

Helper function to plot the model weights

This function plots the weights of the model. Ten images are plotted, one for each digit that the model is trained to recognize.

def plot_weights():
    # Get the values for the weights from the TensorFlow variable.
    w = session.run(weights)

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)

    # Create figure with 3x4 sub-plots,
    # where the last 2 sub-plots are unused.
    fig, axes = plt.subplots(3, 4)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Only use the weights for the first 10 sub-plots.
        if i<10:
            # Get the weights for the i'th digit and reshape it.
            # Note that w.shape == (img_size_flat, 10)
            image = w[:, i].reshape(img_shape)

            # Set the label for the sub-plot.
            ax.set_xlabel("Weights: {0}".format(i))

            # Plot the image.
            ax.imshow(image, vmin=w_min, vmax=w_max, cmap='seismic')

        # Remove ticks from each sub-plot.
        ax.set_xticks([])
        ax.set_yticks([])

Performance before any optimization

The accuracy on the test set is 9.8%. This is because the model has only been initialized and not optimized at all, so it always predicts that the image shows a zero digit, as demonstrated in the plot below, and it so happens that 9.8% of the images in the test set are zero digits.

print_accuracy()

Accuracy on test-set: 9.8%

plot_example_errors()
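The reason an untrained model predicts zero can be reproduced in a few lines. With all weights and biases at zero, every logit is zero, so all classes tie. NumPy's argmax documents that ties resolve to the first index, i.e. class 0. TensorFlow's documentation does not promise which index wins a tie, so treat this as an explanation consistent with the result above rather than a guarantee. The labels below are hypothetical.

```python
import numpy as np

num_classes = 10

# What zero weights and zero biases produce for any 5 input images.
logits = np.zeros((5, num_classes))
y_pred_cls = logits.argmax(axis=1)   # ties resolve to the first index
print(y_pred_cls.tolist())           # [0, 0, 0, 0, 0]

# Hypothetical true classes: accuracy equals the fraction of true zeros.
cls_true = np.array([0, 3, 0, 7, 1])
accuracy = np.mean(y_pred_cls == cls_true)
```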

Performance after 1 optimization iteration

After a single optimization iteration, the model's accuracy on the test set has increased from 9.8% to 40.7%. This means it misclassifies the images about 6 out of 10 times, as demonstrated by a few examples below.

optimize(num_iterations=1)
print_accuracy()

Accuracy on test-set: 40.7%

plot_example_errors()

The weights are plotted below. Positive weights are red and negative weights are blue. These weights can be intuitively understood as image filters.

For example, the weights used to determine whether an image shows a zero digit react positively (red) to an image of a circle, and react negatively (blue) to images with content in the centre of the circle.

Similarly, the weights used to determine whether an image shows a one digit react positively (red) to a vertical line in the centre of the image, and react negatively (blue) to images with content surrounding that line.

Note that the weights mostly look like the digits they are supposed to recognize. This is because only one optimization iteration has been performed, so the weights have been trained on only 100 images. After training on several thousand images, the weights become harder to interpret because they have to recognize many different ways in which digits can be written.

plot_weights()

Performance after 10 optimization iterations

# We have already performed 1 iteration.
optimize(num_iterations=9)
print_accuracy()

Accuracy on test-set: 78.2%

plot_example_errors()

plot_weights()

Performance after 1000 iterations

After 1000 optimization iterations, the model only misclassifies about one in ten images. As demonstrated below, some of the misclassifications are understandable because the images are hard to identify with certainty even for humans, while others are quite obvious and should have been classified correctly by a good model. But this simple model cannot reach much better performance, so more complex models are needed.

# We have already performed 10 iterations.
optimize(num_iterations=990)
print_accuracy()

Accuracy on test-set: 91.7%

plot_example_errors()

The model has now been trained for 1000 optimization iterations, with each iteration using 100 images from the training set. Because of the great variety of the images, the weights have become difficult to interpret, and we may doubt whether the model truly understands how digits are composed of lines, or whether it has just memorized many different variations of pixels.

plot_weights()


print_confusion_matrix()

[[ 957    0    3    2    0    5   11    1    1    0]
 [   0 1108    2    2    1    2    4    2   14    0]
 [   4    9  914   19   15    5   13   14   35    4]
 [   1    0   16  928    0   28    2   14   13    8]
 [   1    1    3    2  939    0   10    2    6   18]
 [  10    3    3   33   10  784   17    6   19    7]
 [   8    3    3    2   11   14  915    1    1    0]
 [   3    9   21    9    7    1    0  959    2   17]
 [   8    8    8   38   11   40   14   18  825    4]
 [  11    7    1   13   75   13    1   39    4  845]]


# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

Exercises

Here are some suggested exercises that may help improve your TensorFlow skills. Hands-on experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes to it.

  • Change the optimizer’s learning rate.
  • Change the optimizer, e.g. to AdagradOptimizer or AdamOptimizer.
  • Change batch-size to 1 or 1000.
  • How do these changes affect performance?
  • Do you think these changes will have the same effect (if any) on other classification problems and mathematical models?
  • Do you get exactly the same results if you run the Notebook multiple times without changing any parameters? Why or why not?
  • Change the function plot_example_errors() so it also prints the logits and y_pred values for the misclassified examples.
  • Use sparse_softmax_cross_entropy_with_logits instead of softmax_cross_entropy_with_logits. This may require changes in several places in the code. Discuss the advantages and disadvantages of the two methods.
  • Rewrite the program yourself without looking at this source code.
  • Explain to a friend how the program works.