
This article follows on from Deep Learning with Python: Computer Vision (Part 1).

Using pre-trained convolutional neural networks

5.3 Using a pretrained convnet

For small image datasets, a common and efficient approach is to build a deep learning model on top of a pretrained network. A pretrained network is a network that has previously been trained on a large dataset (usually on a large-scale image classification task). If the pretraining dataset is large enough and the model is sufficiently general, the spatial feature hierarchy learned by the pretrained network can effectively serve as a generic model of the visual world, and can therefore be applied to a wide variety of computer vision problems, even ones that have nothing to do with the original task.

For example, we can tackle cats-vs-dogs classification with a network pretrained on ImageNet, a dataset of 1.4 million images in 1,000 different categories, mostly animals and everyday objects. We will use the VGG16 architecture.

There are two ways to use a pretrained network: feature extraction and fine-tuning.

Feature extraction

Feature extraction means using the representations learned by a previously trained network to extract interesting features from new samples, then feeding those features into a new classifier that is trained from scratch.

In the previous example of convolutional neural networks, we know that the model we use for image classification can be divided into two parts:

  • Convolutional base: the convolutional and pooling layers at the front;
  • Classifier: the densely connected (Dense) layers at the end.

So for a convolutional neural network, feature extraction means taking the convolutional base of a previously trained network, running new data through it, and then training a new classifier on its output:

Note that only the convolutional base is reused, not the classifier. The convolutional base extracts features, and those can stay the same across problems; but because each problem has different classes, a different classifier should be trained for each. In addition, the location of features is useful in some problems, and that positional information is lost when the feature maps are flattened into Dense layers. So not every problem can be solved by simply bolting a Dense classifier on top; don't apply the classifier mindlessly.

In a pretrained network, how general the extractable feature representations are depends on depth. Earlier layers extract more generic features (e.g. colors, edges, textures), while later layers extract more abstract concepts (e.g. a cat's eye). So if the original task the pretrained network solved is too different from the problem at hand, it may be better to use only the first few layers rather than the whole convolutional base (see the sketch below).
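As a minimal sketch of that idea (my own illustration, not from the book; the cut point 'block3_pool' is an arbitrary choice), the Keras functional API lets us cut a pretrained base at an intermediate layer to keep only the more generic features:

from tensorflow.keras import models
from tensorflow.keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

# Keep everything up to 'block3_pool': the earlier, more generic
# low/mid-level features (colors, edges, textures).
truncated_base = models.Model(inputs=base.input,
                              outputs=base.get_layer('block3_pool').output)
truncated_base.summary()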

Now let's put this into practice: we'll use the VGG16 model pretrained on ImageNet to tackle the cats-vs-dogs classification problem, keeping the convolutional base unchanged and replacing the classifier.

The VGG16 model is built into Keras, so we can use it directly:

from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights='imagenet',         # specifies the weight checkpoint for model initialization
                  include_top=False,          # whether to include the final densely connected classifier
                  input_shape=(150, 150, 3))  # input shape; optional, the network can handle inputs of any shape if omitted

The model will be downloaded from: github.com/fchollet/de…

If the download is slow, you can download the file manually and install it yourself. Looking at the vgg16 source (/usr/local/lib/python3.7/site-packages/keras_applications/vgg16.py), we can see that it calls get_file to fetch the model:

weights_path = keras_utils.get_file(
    'vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
    WEIGHTS_PATH_NO_TOP,
    cache_subdir='models',
    file_hash='6d6bbae143d832006294945121d1f1fc')

You can see that it reads models from a cache subdirectory called models. This keras_utils.get_file is the get_file function defined around line 150 of /usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/utils/data_utils.py; as its docstring says, files are placed under the ~/.keras directory by default.

In short, just download the model and put it in ~/.keras/models.

Also note that the size of the downloaded model differs greatly depending on whether it includes the top (the final classifier).
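As a quick check (my own snippet, assuming the weight files have already been downloaded into the default cache), we can list the cached files and their sizes:

import os

# List the cached Keras weight files and their sizes.
models_dir = os.path.expanduser('~/.keras/models')
for name in os.listdir(models_dir):
    size_mb = os.path.getsize(os.path.join(models_dir, name)) / 1e6
    print(f'{name}: {size_mb:.0f} MB')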

In any case, we end up with a conv_base model, which is easy to understand, since it is made of nothing but things we have already learned:

conv_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
... (intermediate layers omitted for space)
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

Note that the input has been set to the shape we want, (150, 150, 3), and the final output has shape (4, 4, 512); after that we attach our own classifier. There are two ways to do this:

  1. Run our dataset through this convolutional base once, put the result into a Numpy array, save it to disk, and then use that array as the input to a densely connected network. This method is simple and cheap, since the most expensive part (the convolutional base) is computed only once per input; however, it cannot use data augmentation.
  2. Extend conv_base by adding Dense layers on top, and then run the whole network end to end on the input data. This allows data augmentation, but the computational cost is much higher.

Let's start with the first method.

Fast feature extraction without data augmentation

In outline, this method runs our data through conv_base and saves the output, then uses that output as the input to a new model.

As before, we use ImageDataGenerator to load the images and labels into Numpy arrays, then call conv_base's predict method to extract features with the pretrained model.

Extracting features with the pretrained convolutional base

import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base_dir = '/Volumes/WD/Files/dataset/dogs-vs-cats/cats_and_dogs_small'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')

datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20

def extract_features(directory, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))    # must match the output shape of conv_base's last layer (see conv_base.summary())
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(
        directory,
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')
    i = 0
    for inputs_batch, labels_batch in generator:    # feed the data batch by batch
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:    # the generator loops forever, so we must break explicitly
            break
    return features, labels

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)

Output:

Found 2000 images belonging to 2 classes.

Found 1000 images belonging to 2 classes.

Found 1000 images belonging to 2 classes.

Next we want to attach a densely connected classifier, so first we flatten the tensors:

train_features = np.reshape(train_features, (2000, 4 * 4 * 512))
validation_features = np.reshape(validation_features, (1000, 4 * 4 * 512))
test_features = np.reshape(test_features, (1000, 4 * 4 * 512))

Then we build the densely connected classifier, again with dropout regularization:

from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(train_features, train_labels,
                    epochs=30,
                    batch_size=20,
                    validation_data=(validation_features, validation_labels))
Train on 2000 samples, validate on 1000 samples
Epoch 1/30
2000/2000 [==============================] - 3s 1ms/sample - loss: 0.5905 - acc: 0.6840 - val_loss: 0.4347 - val_acc: 0.8430
......
Epoch 30/30
2000/2000 [==============================] - 1s 687us/sample - loss: 0.0889 - acc: 0.9715 - val_loss: 0.2401 - val_acc: 0.9010

Let's look at the results and plot the training curves:

import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo-', label='Training acc')
plt.plot(epochs, val_acc, 'rs-', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo-', label='Training loss')
plt.plot(epochs, val_loss, 'rs-', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

As you can see, the result is quite good, with accuracy close to 90%. But there's still a problem: without data augmentation the overfitting is severe, starting almost from the very beginning. For small image datasets like this one, results without data augmentation are generally not great.

Feature extraction with data augmentation

The second method extends conv_base by adding Dense layers on top, and then runs the entire network end to end on the input data.

The computational cost of this method is very! very! high! It's basically only feasible on a GPU; don't bother trying it without one.

In Keras, we can add a model to a Sequential model just as if we were adding a layer:

# Add a densely connected classifier on top of the convolutional base

from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

Take a look at the model:

model.summary()

Output:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense_2 (Dense)              (None, 256)               2097408
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 257
=================================================================
Total params: 16812353
Trainable params: 16812353
Non-trainable params: 0
_________________________________________________________________

Note that with this pretrained-model approach, it is very important to freeze the convolutional base, that is, to tell the network not to update the convolutional base's parameters during training! This is essential because otherwise training will wreck the pretrained representations, and you will effectively be training from scratch, defeating the point of using a pretrained model.

# Freeze the convolutional base
conv_base.trainable = False
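As a quick sanity check (my own addition, not from the book), we can count the trainable weight tensors before and after freezing. Note that in Keras, changes to trainable only take effect when the model is compiled, so freeze before compiling:

# Count trainable weight tensors before and after freezing.
conv_base.trainable = True
print(len(model.trainable_weights))   # 30: VGG16's 13 conv layers x 2 (kernel + bias) + our 2 Dense layers x 2
conv_base.trainable = False
print(len(model.trainable_weights))   # 4: only the kernels and biases of our two Dense layers remain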

With the base frozen, training will only keep updating the weights of the Dense layers:

model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense_2 (Dense)              (None, 256)               2097408
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 257
=================================================================
Total params: 16812353
Trainable params: 2097665
Non-trainable params: 14714688
_________________________________________________________________

Next we can set up data augmentation and train the model:

# Train the model end to end with a frozen convolutional base

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

test_datagen = ImageDataGenerator(rescale=1./255)    # Note: validation/test data must not be augmented

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=2e-5),
              metrics=['acc'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)

P.S. Running this on my CPU, one epoch takes about 15 minutes; I gave up on doing 30 of them. On Kaggle one epoch takes about 30 seconds 😭:

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/30
100/100 [==============================] - 30s 300ms/step - loss: 0.5984 - acc: 0.6855 - val_loss: 0.4592 - val_acc: 0.8250
...
Epoch 30/30
100/100 [==============================] - 30s 302ms/step - loss: 0.2693 - acc: 0.8900 - val_loss: 0.2401 - val_acc: 0.9070

Fine-tuning the model

(I prefer to just use the English term "fine-tuning".)

Fine-tuning complements feature extraction and further optimizes the model. What fine-tuning does is unfreeze the top few layers of the convolutional base (the ones near the output) and train them jointly with the newly added part (the fully connected classifier). This slightly adjusts the more abstract representations in the pretrained model (the ones near the top) to make them more relevant to the problem at hand.

Note that you must train the final fully connected classifier first, before fine-tuning the top conv block of the convolutional base; otherwise the large error signal propagating through the network will completely destroy the pretrained representations.

Therefore, fine-tuning needs to follow the following steps:

  1. Add our own network (e.g. the classifier) on top of an already-trained base network;
  2. Freeze the base network;
  3. Train the part we added;
  4. Unfreeze some layers of the base network;
  5. Jointly train the unfrozen layers and the part we added.

The first three steps are the same as in feature extraction, so we start from step 4. First, let's look at our VGG16 convolutional base:

conv_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 150, 150, 3)]     0
_________________________________________________________________
... (intermediate layers omitted for space)
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0
=================================================================
Total params: 14714688
Trainable params: 0
Non-trainable params: 14714688
_________________________________________________________________

We will unfreeze block5_conv1, block5_conv2, and block5_conv3 to do the fine-tuning:

# Freeze all layers up to a certain layer

conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
        
# Fine tuning model

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-5),    # a very small learning rate, so the fine-tuned layers change only slightly
              metrics=['acc'])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)

This was also run on Kaggle.

I don't know why there's such a gap between my result and the book's. I've gone over my notebook repeatedly and found nothing wrong; even running the author's own notebook gives the same result 😂. Oh well, so be it.

Finally, let’s look at the results on the test set:

test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')
test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test acc:', test_acc)

The book gets about 97% accuracy; mine comes out a bit under 95%.
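As a final optional step (my own addition, not in the original post; the file name is arbitrary), we can save the fine-tuned model so it can be reloaded later without retraining:

# Save the fine-tuned model to an HDF5 file; reload it later with
# tensorflow.keras.models.load_model('cats_and_dogs_small_finetuned.h5').
model.save('cats_and_dogs_small_finetuned.h5')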