This article introduces ensembles of neural networks. A large portion of the text and code comes from the previous tutorials; if you have already seen them, you can scroll ahead to the ensemble section below.

01 – A Simple Linear Model | 02 – Convolutional Neural Network | 03 – PrettyTensor | 04 – Save & Restore

By Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube (translated from the English original)

If reproduced, please attach a link to this article.


Introduction

This tutorial introduces ensembles of convolutional neural networks. Instead of using a single neural network, we use several networks and average their outputs.

Finally, the ensemble is used to recognize handwritten digits from the MNIST dataset. The ensemble slightly improves classification accuracy on the test set, but the difference is small and could be random. In addition, the ensemble misclassifies some images that are correctly classified by the individual networks.

This builds on the previous tutorials, so you should be familiar with the basics of TensorFlow and the add-on package Pretty Tensor. Much of the code and text here is similar to the previous tutorials, so if you have already seen them you can skim through this article quickly.

The flow chart

The chart below shows how data flows through the convolutional neural network implemented below. The network has two convolutional layers and two fully-connected layers, with the last layer used to classify the input image. See tutorial #02 for more details on the network and convolution.

This tutorial implements an ensemble of five such neural networks, each with the same architecture but different weights and other variables.

from IPython.display import Image
Image('images/02_network_flowchart.png')

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math
import os

# Use PrettyTensor to simplify Neural Network construction.
import prettytensor as pt

This was developed using Python 3.5.2 (Anaconda). The TensorFlow version is:

tf.__version__

'0.12.0-rc0'

PrettyTensor version:

pt.__version__

‘0.7.1’

Load the data

The MNIST dataset is approximately 12 MB and is downloaded automatically if it is not found in the given path.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/MNIST/', one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz

Extracting data/MNIST/train-labels-idx1-ubyte.gz

Extracting data/MNIST/t10k-images-idx3-ubyte.gz

Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST dataset has now been loaded. It consists of 70,000 images and the corresponding labels (i.e. the class of each image). The dataset is split into three mutually independent subsets, but we will generate random training sets later.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))Copy the code

Size of:

  • Training-set: 55000
  • Test-set: 10000
  • Validation-set: 5000

Class numbers

The class labels are One-Hot encoded, meaning each label is a vector of length 10 that is zero in all elements except one. The index of that element is the class number, i.e. the digit drawn in the corresponding image. We also need the class numbers as integers for the test and validation sets, so we compute them here.

data.test.cls = np.argmax(data.test.labels, axis=1)
data.validation.cls = np.argmax(data.validation.labels, axis=1)
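
For example, the One-Hot label [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] decodes to the class number 3, because the element at index 3 is the non-zero one.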

Create a helper function for a random training set

We will train the five neural networks on randomly selected training sets. First, the original training set and validation set are combined into one large array. This is done for both the images and the labels.

combined_images = np.concatenate([data.train.images, data.validation.images], axis=0)
combined_labels = np.concatenate([data.train.labels, data.validation.labels], axis=0)

Check that the merged array is the correct size.

print(combined_images.shape)
print(combined_labels.shape)

(60000, 784)

(60000, 10)

The size of the merged data set.

combined_size = len(combined_images)
combined_size

60000

Define the size of the training set used by each neural network. You can try to change the size.

train_size = int(0.8 * combined_size)
train_size

48000

The validation set is not used for training, but its size is as follows.

validation_size = combined_size - train_size
validation_size

12000

This helper function splits the combined arrays into a random training set and a random validation set.

def random_training_set():
    # Create a randomized index into the full / combined training-set.
    idx = np.random.permutation(combined_size)

    # Split the random index into training- and validation-sets.
    idx_train = idx[0:train_size]
    idx_validation = idx[train_size:]

    # Select the images and labels for the new training-set.
    x_train = combined_images[idx_train, :]
    y_train = combined_labels[idx_train, :]

    # Select the images and labels for the new validation-set.
    x_validation = combined_images[idx_validation, :]
    y_validation = combined_labels[idx_validation, :]

    # Return the new training- and validation-sets.
    return x_train, y_train, x_validation, y_validation
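
With the sizes defined above, calling random_training_set() returns four arrays of shapes (48000, 784), (48000, 10), (12000, 784) and (12000, 10), respectively.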

Data dimensions

In the source code below, data dimensions are used in many places. They are only defined in one place, so we can use these variables in our code instead of writing numbers directly.

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper function for plotting images

This function is used to plot nine images in a 3×3 grid, writing the true and predicted classes below each image.

def plot_images(images,                  # Images to plot, 2-d array.
                cls_true,                # True class-no for images.
                ensemble_cls_pred=None,  # Ensemble predicted class-no.
                best_cls_pred=None):     # Best-net predicted class-no.

    assert len(images) == len(cls_true)

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)

    # Adjust vertical spacing if we need to print ensemble and best-net.
    if ensemble_cls_pred is None:
        hspace = 0.3
    else:
        hspace = 1.0
    fig.subplots_adjust(hspace=hspace, wspace=0.3)

    # For each of the sub-plots.
    for i, ax in enumerate(axes.flat):

        # There may not be enough images for all sub-plots.
        if i < len(images):
            # Plot image.
            ax.imshow(images[i].reshape(img_shape), cmap='binary')

            # Show true and predicted classes.
            if ensemble_cls_pred is None:
                xlabel = "True: {0}".format(cls_true[i])
            else:
                msg = "True: {0}\nEnsemble: {1}\nBest Net: {2}"
                xlabel = msg.format(cls_true[i],
                                    ensemble_cls_pred[i],
                                    best_cls_pred[i])

            # Show the classes as the label on the x-axis.
            ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to check whether the data is correct.

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

TensorFlow graph

The whole point of TensorFlow is to use a so-called computational graph, which is much more efficient than doing the same computations directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computation graph that must be executed, while NumPy only knows the result of a single mathematical operation at a time.

TensorFlow can also automatically calculate the gradients of the variables that must be optimized to make the model perform better. This is possible because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be derived using the chain rule.

TensorFlow can also take advantage of multi-core CPUs as well as GPUs. Google has even built special chips for TensorFlow called Tensor Processing Units (TPUs), which are faster than GPUs.

A TensorFlow diagram consists of the following parts, described in detail below:

  • Placeholder variables used to feed input into the graph.
  • Model variables that will be optimized to make the model perform better.
  • The model, which is essentially a set of mathematical functions that compute an output based on the placeholder variables and the model variables.
  • A cost measure used to guide the optimization of the variables.
  • An optimization method that updates the variables of the model.

In addition, the TensorFlow graph may also contain various debugging statements, e.g. for logging data to be displayed with TensorBoard, which is not covered in this tutorial.

Placeholder variables

Placeholder variables serve as the input to the graph, and we may change them each time we execute the graph. This is called feeding the placeholder variables and is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are fed into the TensorFlow graph. It is a so-called tensor, which just means a multi-dimensional vector or matrix. The data type is set to float32 and the shape is set to [None, img_size_flat], where None means the tensor may hold an arbitrary number of images, each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')

The convolutional layers expect x to be a 4-dimensional tensor, so we have to reshape it to [num_images, img_height, img_width, num_channels]. Note that img_height == img_width == img_size, and num_images is inferred automatically when the first dimension is set to -1. The reshape operation is:

x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
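
For example, a batch of 64 images would go from shape [64, 784] in x to shape [64, 28, 28, 1] in x_image.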

Next we define the placeholder variable for the true labels of the images in the input variable x. The shape of this placeholder is [None, num_classes], which means it may hold an arbitrary number of labels, each being a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')

We could also have a placeholder variable for the class numbers, but we will instead calculate them using argmax. Note that this is a TensorFlow operator, so nothing is actually calculated at this point.

y_true_cls = tf.argmax(y_true, dimension=1)

The neural network

This tutorial implements the convolutional neural network using PrettyTensor, which is much simpler than implementing it directly in TensorFlow. See tutorial #03 for details.

The basic idea is to wrap the input tensor x_image in a Pretty Tensor object, which has helper functions for adding new layers, so the whole neural network can be created this way. Pretty Tensor also handles variable allocation, etc.

x_pretty = pt.wrap(x_image)

Now that the input image is wrapped in a PrettyTensor object, we can add the convolutional and fully-connected layers in just a few lines of code.

Note that inside the with block, pt.defaults_scope(activation_fn=tf.nn.relu) supplies activation_fn=tf.nn.relu as an argument to each of the layers, so they all use Rectified Linear Units (ReLU). defaults_scope makes it easy to change arguments for all of the layers at once.

with pt.defaults_scope(activation_fn=tf.nn.relu):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=16, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=36, name='layer_conv2').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=128, name='layer_fc1').\
        softmax_classifier(num_classes=num_classes, labels=y_true)

An optimization method

PrettyTensor gave us the predicted class labels (y_pred) as well as a loss measure that must be minimized to improve the neural network's ability to classify the input images.

The documentation for PrettyTensor does not say whether its loss measure is cross-entropy or something else. We now use the AdamOptimizer to minimize the loss.

Note that optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer object to the TensorFlow graph for later execution.

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)

Performance measurement

We need a few more performance measures to display the progress to the user.

First we calculate the predicted class from the network output y_pred, which for each image is a vector of 10 elements. The class number is the index of the largest element.

y_pred_cls = tf.argmax(y_pred, dimension=1)

We then create a Boolean vector that tells us whether the true category of each image is the same as the predicted category.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

The classification accuracy is calculated by casting the Boolean vector to floats, so that False becomes 0 and True becomes 1, and then taking the mean of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Saver

To save the variables of the neural networks we create a so-called Saver object, which is used for storing and restoring all the variables of the TensorFlow graph. Nothing is actually saved at this point; that is done in the training loop further below.

Note that if there are more than 100 neural networks in the ensemble, you need to increase max_to_keep as appropriate.

saver = tf.train.Saver(max_to_keep=100)

This is the folder used to save or restore data.

save_dir = 'checkpoints/'

Create a folder if it does not exist.

if not os.path.exists(save_dir):
    os.makedirs(save_dir)

This function returns the path to the data file based on the entered network number.

def get_save_path(net_number):
    return save_dir + 'network' + str(net_number)
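
For example, get_save_path(2) returns the string 'checkpoints/network2'.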

Run TensorFlow

Create TensorFlow session

Once the TensorFlow diagram is created, we need to create a TensorFlow session to run the diagram.

session = tf.Session()

Initialize variables

The variables for weights and biases must be initialized before we start optimizing them. We wrap the initialization in a simple function so it can be called again later.

def init_variables():
    session.run(tf.initialize_all_variables())

Helper function to create a random training batch

There are many thousands of images in the training set, and it would take a long time to calculate the gradient of the model using all these images. We therefore only use a small batch of images in each iteration of the optimizer.

If your computer crashes or becomes very slow because it runs out of RAM, try lowering this number, but you may then need to run more optimization iterations.

train_batch_size = 64

The function picks a random train-batch based on the given size.

def random_batch(x_train, y_train):
    # Total number of images in the training-set.
    num_images = len(x_train)

    # Create a random index into the training-set.
    idx = np.random.choice(num_images,
                           size=train_batch_size,
                           replace=False)

    # Use the random index to select random images and labels.
    x_batch = x_train[idx, :]  # Images.
    y_batch = y_train[idx, :]  # Labels.

    # Return the batch.
    return x_batch, y_batch

Helper function to perform optimization iterations

This function performs a number of optimization iterations so as to gradually improve the variables of the network layers. In each iteration a new batch of data is selected from the training set, and TensorFlow executes the optimizer using those training samples. Progress is printed every 100 iterations.

def optimize(num_iterations, x_train, y_train):
    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(num_iterations):

        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = random_batch(x_train, y_train)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every 100 iterations.
        if i % 100 == 0:

            # Calculate the accuracy on the training-batch.
            acc = session.run(accuracy, feed_dict=feed_dict_train)

            # Status-message for printing.
            msg = "Optimization Iteration: {0:>6}, Training Batch Accuracy: {1:>6.1%}"

            # Print it.
            print(msg.format(i + 1, acc))

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))Copy the code

Create the ensemble of neural networks

The number of neural networks in the ensemble.

num_networks = 5

The number of optimization iterations per neural network.

num_iterations = 10000

Create the ensemble of neural networks. All the networks use the TensorFlow graph defined above. The TensorFlow weights and variables of each network are initialized to random values and then optimized. The variables are then saved to disk so they can be reloaded later.

You can skip this step if you just want to re-run the Notebook and analyze the results differently; set the if-statement below to False to reuse the variables that were previously saved to disk.

if True:
    # For each of the neural networks.
    for i in range(num_networks):
        print("Neural network: {0}".format(i))

        # Create a random training-set. Ignore the validation-set.
        x_train, y_train, _, _ = random_training_set()

        # Initialize the variables of the TensorFlow graph.
        session.run(tf.global_variables_initializer())

        # Optimize the variables using this training-set.
        optimize(num_iterations=num_iterations,
                 x_train=x_train,
                 y_train=y_train)

        # Save the optimized variables to disk.
        saver.save(sess=session, save_path=get_save_path(i))

        # Print newline.
        print()

Neural network: 0 Optimization Iteration: 1, Training Batch Accuracy: 6.2% Optimization Iteration: 101, Training Batch Accuracy: 87.5%… Optimization Iteration: 9901, Training Batch Accuracy: 100.0% Time usage: 0:00:40

Neural network: 1 Optimization Iteration: 1, Training Batch Accuracy: 7.8% Optimization Iteration: 101, Training Batch Accuracy: 85.9%… Optimization Iteration: 9901, Training Batch Accuracy: 98.4% Time usage: 0:00:40

Neural network: 2 Optimization Iteration: 1, Training Batch Accuracy: 3.1% Optimization Iteration: 101, Training Batch Accuracy: 84.4%… Optimization Iteration: 9901, Training Batch Accuracy: 100.0% Time usage: 0:00:39

Neural network: 3 Optimization Iteration: 1, Training Batch Accuracy: 9.4% Optimization Iteration: 101, Training Batch Accuracy: 89.1%… Optimization Iteration: 9901, Training Batch Accuracy: 100.0% Time usage: 0:00:39

Neural network: 4 Optimization Iteration: 1, Training Batch Accuracy: 9.4% Optimization Iteration: 101, Training Batch Accuracy: 82.8%… Optimization Iteration: 9901, Training Batch Accuracy: 98.4% Time usage: 0:00:39

Helper functions for calculating and predicting classifications

This function calculates the predicted labels for a set of images, i.e. for each image it computes a vector of length 10 indicating which of the 10 classes the image belongs to.

The calculation is done in batches because it might otherwise use too much RAM. If your computer crashes, try lowering the batch size.

# Split the data-set in batches of this size to limit RAM usage.
batch_size = 256

def predict_labels(images):
    # Number of images.
    num_images = len(images)

    # Allocate an array for the predicted labels which
    # will be calculated in batches and filled into this array.
    pred_labels = np.zeros(shape=(num_images, num_classes),
                           dtype=np.float)

    # Now calculate the predicted labels for the batches.
    # We will just iterate through all the batches.
    # There might be a more clever and Pythonic way of doing this.

    # The starting index for the next batch is denoted i.
    i = 0

    while i < num_images:
        # The ending index for the next batch is denoted j.
        j = min(i + batch_size, num_images)

        # Create a feed-dict with the images between index i and j.
        feed_dict = {x: images[i:j, :]}

        # Calculate the predicted labels using TensorFlow.
        pred_labels[i:j] = session.run(y_pred, feed_dict=feed_dict)

        # Set the start-index for the next batch to the
        # end-index of the current batch.
        i = j

    return pred_labels

Calculate a Boolean array indicating whether the predicted class of each image is correct.

def correct_prediction(images, labels, cls_true):
    # Calculate the predicted labels.
    pred_labels = predict_labels(images=images)

    # Calculate the predicted class-number for each image.
    cls_pred = np.argmax(pred_labels, axis=1)

    # Create a boolean array whether each image is correctly classified.
    correct = (cls_true == cls_pred)

    return correct

Calculate a Boolean array indicating whether the images in the test set are correctly classified.

def test_correct():
    return correct_prediction(images = data.test.images,
                              labels = data.test.labels,
                              cls_true = data.test.cls)

Calculate a Boolean array indicating whether the images in the validation set are correctly classified.

def validation_correct():
    return correct_prediction(images = data.validation.images,
                              labels = data.validation.labels,
                              cls_true = data.validation.cls)

Helper function to calculate the classification accuracy

This function calculates the classification accuracy given a Boolean array indicating whether each image was correctly classified. For example, classification_accuracy([True, True, False, False, False]) = 2/5 = 0.4.

def classification_accuracy(correct):
    # When averaging a boolean array, False means 0 and True means 1.
    # So we are calculating: number of True / len(correct) which is
    # the same as the classification accuracy.
    return correct.mean()

Calculate the classification accuracy of the test set.

def test_accuracy():
    # Get the array of booleans whether the classifications are correct
    # for the test-set.
    correct = test_correct()

    # Calculate the classification accuracy and return it.
    return classification_accuracy(correct)

Calculate the classification accuracy on the original validation set.

def validation_accuracy():
    # Get the array of booleans whether the classifications are correct
    # for the validation-set.
    correct = validation_correct()

    # Calculate the classification accuracy and return it.
    return classification_accuracy(correct)

Results and Analysis

This function calculates the predicted labels for all the neural networks in the ensemble. The labels are combined further below.

def ensemble_predictions():
    # Empty list of predicted labels for each of the neural networks.
    pred_labels = []

    # Classification accuracy on the test-set for each network.
    test_accuracies = []

    # Classification accuracy on the validation-set for each network.
    val_accuracies = []

    # For each neural network in the ensemble.
    for i in range(num_networks):
        # Reload the variables into the TensorFlow graph.
        saver.restore(sess=session, save_path=get_save_path(i))

        # Calculate the classification accuracy on the test-set.
        test_acc = test_accuracy()

        # Append the classification accuracy to the list.
        test_accuracies.append(test_acc)

        # Calculate the classification accuracy on the validation-set.
        val_acc = validation_accuracy()

        # Append the classification accuracy to the list.
        val_accuracies.append(val_acc)

        # Print status message.
        msg = "Network: {0}, Accuracy on Validation-Set: {1:.4f}, Test-Set: {2:.4f}"
        print(msg.format(i, val_acc, test_acc))

        # Calculate the predicted labels for the images in the test-set.
        # This is already calculated in test_accuracy() above but
        # it is re-calculated here to keep the code a bit simpler.
        pred = predict_labels(images=data.test.images)

        # Append the predicted labels to the list.
        pred_labels.append(pred)

    return np.array(pred_labels), \
           np.array(test_accuracies), \
           np.array(val_accuracies)
pred_labels, test_accuracies, val_accuracies = ensemble_predictions()

Network: 0, Accuracy on Validation-Set: 0.9948, Test-Set: 0.9893
Network: 1, Accuracy on Validation-Set: 0.9936, Test-Set: 0.9880
Network: 2, Accuracy on Validation-Set: 0.9958, Test-Set: 0.9893
Network: 3, Accuracy on Validation-Set: 0.9938, Test-Set: 0.9889
Network: 4, Accuracy on Validation-Set: 0.9938, Test-Set: 0.9892

Summary of the classification accuracies on the test set for the individual networks in the ensemble.

print("Mean test-set accuracy: {0:.4f}".format(np.mean(test_accuracies)))
print("Min test-set accuracy: {0:.4f}".format(np.min(test_accuracies)))
print("Max test-set accuracy: {0:.4f}".format(np.max(test_accuracies)))Copy the code

Mean test-set accuracy: 0.9889

Min test-set accuracy: 0.9880

Max test-set accuracy: 0.9893

The predicted labels of the ensemble are stored in a 3-dimensional array: the first dimension is the network number, the second is the image number, and the third is the classification vector.

pred_labels.shape

(5, 10000, 10)

The ensemble prediction

There are several ways to calculate the predicted labels of the ensemble. One way is to let each neural network vote for a class and pick the class with the most votes, but depending on the number of classes this method requires a large number of neural networks, as sketched below.
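
As an aside, here is a rough sketch of the majority-vote approach, which is not used in this tutorial. The names votes, vote_counts and majority_cls_pred are our own, and ties are broken by picking the lowest class number:

# Sketch of majority voting over the ensemble (not used in this tutorial).
# pred_labels has shape (num_networks, num_images, num_classes), see above.
votes = np.argmax(pred_labels, axis=2)  # Each network's predicted class per image.

# Count the votes for each class, per image, and pick the most frequent class.
vote_counts = np.apply_along_axis(
    lambda v: np.bincount(v, minlength=num_classes), 0, votes)
majority_cls_pred = np.argmax(vote_counts, axis=0)  # Shape: (num_images,)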

The method used here is instead to average the predicted labels of all the networks in the ensemble. This is simple to calculate and does not require a large number of neural networks.

ensemble_pred_labels = np.mean(pred_labels, axis=0)
ensemble_pred_labels.shape

(10000, 10)

The ensemble's predicted class number is the index of the largest element in the averaged label, calculated with argmax as usual.

ensemble_cls_pred = np.argmax(ensemble_pred_labels, axis=1)
ensemble_cls_pred.shape

(10000,)

Boolean array indicating whether each image in the test set is correctly classified by the ensemble of neural networks.

ensemble_correct = (ensemble_cls_pred == data.test.cls)

Invert the Boolean array, so we can use it to find misclassified images.

ensemble_incorrect = np.logical_not(ensemble_correct)

The best neural network

Now we find the single neural network that performs best on the test set.

First we list the classification accuracies on the test set for all the neural networks in the ensemble.

test_accuracies

array([ 0.9893,  0.988 ,  0.9893,  0.9889,  0.9892])

The index of the neural network with the highest classification accuracy.

best_net = np.argmax(test_accuracies)
best_net

0

The best neural network's classification accuracy on the test set.

test_accuracies[best_net]

0.98929999999999996

The predicted labels of the best neural network.

best_net_pred_labels = pred_labels[best_net, :, :]

The predicted class numbers.

best_net_cls_pred = np.argmax(best_net_pred_labels, axis=1)

Boolean array indicating whether each image in the test set is correctly classified by the best neural network.

best_net_correct = (best_net_cls_pred == data.test.cls)

Boolean array for whether images are misclassified.

best_net_incorrect = np.logical_not(best_net_correct)

Ensemble versus the best network

The number of images in the test set correctly classified by the ensemble.

np.sum(ensemble_correct)

9916

The number of images in the test set correctly classified by the best network.

np.sum(best_net_correct)

9893

Boolean array indicating whether each image in the test set was correctly classified by the ensemble and misclassified by the best network.

ensemble_better = np.logical_and(best_net_incorrect,
                                 ensemble_correct)

The number of test-set images where the ensemble performed better than the best network:

ensemble_better.sum()

39

Boolean array indicating whether each image in the test set was correctly classified by the best network and misclassified by the ensemble.

best_net_better = np.logical_and(best_net_correct,
                                 ensemble_incorrect)

The number of test-set images where the best network performed better than the ensemble:

best_net_better.sum()

16
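
As a consistency check, the ensemble classifies 9916 − 9893 = 23 more images correctly than the best network, which matches the difference between the two counts above: 39 − 16 = 23.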

Helper functions for plotting and printing comparisons

This function plots images from the test set along with their true and predicted classes.

def plot_images_comparison(idx):
    plot_images(images=data.test.images[idx, :],
                cls_true=data.test.cls[idx],
                ensemble_cls_pred=ensemble_cls_pred[idx],
                best_cls_pred=best_net_cls_pred[idx])

Function to print prediction labels.

def print_labels(labels, idx, num=1):
    # Select the relevant labels based on idx.
    labels = labels[idx, :]

    # Select the first num labels.
    labels = labels[0:num, :]

    # Round numbers to 2 decimal points so they are easier to read.
    labels_rounded = np.round(labels, 2)

    # Print the rounded labels.
    print(labels_rounded)

Function for printing the ensemble's predicted labels.

def print_labels_ensemble(idx, **kwargs):
    print_labels(labels=ensemble_pred_labels, idx=idx, **kwargs)

Function for printing the predicted labels of the best network.

def print_labels_best_net(idx, **kwargs):
    print_labels(labels=best_net_pred_labels, idx=idx, **kwargs)

Function for printing the predicted labels of all the neural networks in the ensemble. Only the labels for the first image are printed.

def print_labels_all_nets(idx):
    for i in range(num_networks):
        print_labels(labels=pred_labels[i, :, :], idx=idx, num=1)

Examples: The ensemble is better than the best network

Plot examples of images that were correctly classified by the ensemble and misclassified by the best network.

plot_images_comparison(idx=ensemble_better)

The ensemble's predicted labels for the first of these images (top left):

print_labels_ensemble(idx=ensemble_better, num=1)

[[ 0.    0.    0.    0.76  0.    0.    0.    0.    0.23  0.  ]]

The best network's predicted labels for the first image:

print_labels_best_net(idx=ensemble_better, num=1)

[[ 0.    0.    0.    0.21  0.    0.    0.    0.    0.79  0.  ]]

The predicted labels of all the networks in the ensemble for the first image:

print_labels_all_nets(idx=ensemble_better)

[[ 0.    0.    0.    0.21  0.    0.    0.    0.    0.79  0.  ]]
[[ 0.    0.    0.    0.96  0.    0.01  0.    0.    0.03  0.  ]]
[[ 0.    0.    0.    0.99  0.    0.    0.    0.    0.01  0.  ]]
[[ 0.    0.    0.    0.88  0.    0.    0.    0.    0.12  0.  ]]
[[ 0.    0.    0.    0.76  0.    0.01  0.    0.    0.22  0.  ]]
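
Averaging these five label vectors element-wise reproduces the ensemble's labels shown above, e.g. (0.21 + 0.96 + 0.99 + 0.88 + 0.76) / 5 = 0.76 for class 3, and (0.79 + 0.03 + 0.01 + 0.12 + 0.22) / 5 ≈ 0.23 for class 8.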

Examples: The best network is better than the ensemble

Now plot examples of images that were misclassified by the ensemble but correctly classified by the best network.

plot_images_comparison(idx=best_net_better)

The ensemble's predicted labels for the first of these images (top left):

print_labels_ensemble(idx=best_net_better, num=1)

[[ 0.5   0.    0.    0.    0.    0.05  0.45  0.    0.    0.  ]]

The best network's predicted labels for the first image:

print_labels_best_net(idx=best_net_better, num=1)

[[ 0.3   0.    0.    0.    0.    0.15  0.56  0.    0.    0.  ]]

The predicted labels of all the networks in the ensemble for the first image:

print_labels_all_nets(idx=best_net_better)

[[ 0.3   0.    0.    0.    0.    0.15  0.56  0.    0.    0.  ]]
[[ 1.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]]
[[ 0.19  0.    0.    0.    0.    0.    0.81  0.    0.    0.  ]]
[[ 0.15  0.    0.    0.    0.    0.12  0.72  0.    0.    0.  ]]
[[ 0.85  0.    0.    0.    0.    0.    0.14  0.    0.    0.  ]]
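
Note how the second network is completely confident that the image is class 0, which pulls the ensemble's average for class 0 up to 0.5, just ahead of the 0.45 for class 6, while the best network favours class 6 with 0.56. A single over-confident network can thus sway the ensemble's averaged prediction.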

Close the TensorFlow session

We are now done using TensorFlow, so we close the session to release its resources.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

Conclusion

This tutorial created an ensemble of five convolutional neural networks for recognizing handwritten digits in the MNIST dataset. The ensemble works by averaging the predicted labels of the five individual networks. This resulted in a slight improvement in classification accuracy on the test set, with the ensemble reaching 99.1% accuracy compared to 98.9% for the best individual network.

However, the ensemble did not always perform better than the individual networks, which sometimes classified images correctly while the ensemble misclassified them. This suggests that the effect of an ensemble is somewhat random and it may not provide a reliable way of improving performance over a single neural network.

The form of ensemble learning used here is called bagging (or Bootstrap Aggregating). It is mainly useful for avoiding over-fitting, which may not have been a problem for this particular neural network and dataset. Ensemble learning may still be useful in other situations.
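
As an illustration of the difference, classic bagging draws its bootstrap sample with replacement, whereas random_training_set() above samples without replacement. A minimal sketch, reusing the combined arrays from earlier (the function name bootstrap_training_set is our own):

# Sketch of a bootstrap sample drawn WITH replacement, as in classic bagging.
# Unlike random_training_set(), some images may be selected several times
# and others not at all.
def bootstrap_training_set():
    idx = np.random.choice(combined_size, size=train_size, replace=True)
    return combined_images[idx, :], combined_labels[idx, :]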

Technical specifications

This tutorial used TensorFlow's Saver() object to save and restore the variables of the neural networks in the ensemble. But this feature was really designed for other purposes, and it becomes awkward to use for ensemble learning with several different kinds of neural networks, or when you want to load several networks at the same time. A TensorFlow add-on package called sk-flow takes a simpler approach, but as of August 2016 it was still at an early stage of development.

Exercises

Here are a few suggested exercises that may help improve your TensorFlow skills. Practical experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes to it.

  • Change several different parts of the program to see how it affects performance:
    • Use more neural networks in the ensemble.
    • Change the size of the training set.
    • Change the number of optimization iterations and try to increase or decrease them.
  • Explain to a friend how the program works.
  • Do you think integrated learning deserves more research, or would you rather focus on improving the performance of individual neural networks?