Original link: tecdat.cn/?p=6714

Original source: Tuoduan Data Tribe (WeChat public account)

 

Having to train an image-classification model with very little data is a common situation you may encounter in practice if you work on computer vision in a professional setting. "Very little" data can mean anything from a few hundred to a few tens of thousands of images. As a practical example, we focus on classifying images as dogs or cats, using a dataset containing 4,000 cat and dog pictures (2,000 cats, 2,000 dogs). We will use 2,000 images for training, 1,000 for validation, and 1,000 for testing.

 

Relevance of deep learning to small data problems

You’ll sometimes hear that deep learning only works when lots of data is available. This is partly true: one fundamental characteristic of deep learning is that it can find interesting features in the training data on its own, without any need for manual feature engineering, and this can only be achieved when many training samples are available. This is especially true for problems where the input samples are very high-dimensional, such as images.

Let’s start with the data.

Download the data

Use the Dogs vs. Cats data set.

Here are some examples:

 

 

This dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 543 MB. After downloading and unzipping it, you will create a new dataset containing three subsets: a training set with 1,000 samples of each class, a validation set with 500 samples of each class, and a test set with 500 samples of each class.

Here is the code to do this:

original_dataset_dir <- "~/Downloads/kaggle_original_data"

base_dir <- "~/Downloads/cats_and_dogs_small"
dir.create(base_dir)

train_dir <- file.path(base_dir, "train")
dir.create(train_dir)
validation_dir <- file.path(base_dir, "validation")
dir.create(validation_dir)
test_dir <- file.path(base_dir, "test")
dir.create(test_dir)
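The directory-creation snippet above is truncated in the original post; the class subfolders inside each split are then filled by copying files. A minimal sketch, assuming the Kaggle file-naming convention cat.<n>.jpg / dog.<n>.jpg and the directories created above:

train_cats_dir <- file.path(train_dir, "cats")
dir.create(train_cats_dir)

# Copy the first 1,000 cat images into train_dir/cats
fnames <- paste0("cat.", 1:1000, ".jpg")
file.copy(file.path(original_dataset_dir, fnames), file.path(train_cats_dir))

# Repeat analogously for dogs and for the validation/test splits
# (e.g. images 1001-1500 for validation and 1501-2000 for test, per class).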

Use a pre-trained convnet

A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world, and hence its features can prove useful for many different computer-vision problems, even though these new problems may involve classes completely different from those of the original task.

There are two ways to use a pretrained network: feature extraction and fine-tuning. Let’s start with feature extraction.

Feature extraction

Feature extraction consists of using the representations previously learned by the network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.

Why reuse only the convolutional base? Could you reuse the densely connected classifier as well? In general, doing so should be avoided. The reason is that the representations learned by the convolutional base are likely to be more generic and therefore more reusable.

Note that the level of generality (and therefore reusability) of the representations extracted by a specific convolution layer depends on the depth of that layer in the model. Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), whereas layers higher up extract more abstract concepts (such as "cat ear" or "dog eye"). So if your new dataset differs a lot from the dataset on which the original model was trained, it is better to use only the first few layers of the model for feature extraction, rather than the entire convolutional base.

 

Let’s put this into practice by using the convolutional base of the VGG16 network, trained on ImageNet, to extract interesting features from cat and dog images, and then training a dogs-versus-cats classifier on top of these features.

 

Let’s instantiate the VGG16 model.

library(keras)

conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)

Pass three arguments to the function:

  • weights specifies the weight checkpoint from which to initialize the model.
  • include_top refers to including (or not) the densely connected classifier on top of the network. By default, this densely connected classifier corresponds to the 1,000 classes of ImageNet.
  • input_shape is the shape of the image tensors that you will feed to the network. This argument is optional: if you do not pass it, the network will be able to process inputs of any size.

The architecture of the VGG16 convolutional base is similar to the simple convnets you’re already familiar with:

summary(conv_base)

Layer (type)                     Output Shape          Param #
================================================================
input_1 (InputLayer)             (None, 150, 150, 3)   0
block1_conv1 (Convolution2D)     (None, 150, 150, 64)  1792
block1_conv2 (Convolution2D)     (None, 150, 150, 64)  36928
block1_pool (MaxPooling2D)       (None, 75, 75, 64)    0
block2_conv1 (Convolution2D)     (None, 75, 75, 128)   73856
block2_conv2 (Convolution2D)     (None, 75, 75, 128)   147584
block2_pool (MaxPooling2D)       (None, 37, 37, 128)   0
block3_conv1 (Convolution2D)     (None, 37, 37, 256)   295168
block3_conv2 (Convolution2D)     (None, 37, 37, 256)   590080
block3_conv3 (Convolution2D)     (None, 37, 37, 256)   590080
block3_pool (MaxPooling2D)       (None, 18, 18, 256)   0
block4_conv1 (Convolution2D)     (None, 18, 18, 512)   1180160
block4_conv2 (Convolution2D)     (None, 18, 18, 512)   2359808
block4_conv3 (Convolution2D)     (None, 18, 18, 512)   2359808
block4_pool (MaxPooling2D)       (None, 9, 9, 512)     0
block5_conv1 (Convolution2D)     (None, 9, 9, 512)     2359808
block5_conv2 (Convolution2D)     (None, 9, 9, 512)     2359808
block5_conv3 (Convolution2D)     (None, 9, 9, 512)     2359808
block5_pool (MaxPooling2D)       (None, 4, 4, 512)     0
================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0

At this point, there are two ways to continue:

  • Run the convolutional base over your dataset once, record its output, and train a standalone densely connected classifier on that output.
  • Extend the model you have (conv_base) by adding dense layers on top, and run the whole thing end to end on the input data.

In this article, we’ll cover the second technique in detail (a brief sketch of the first option is shown below for reference). Note that you should attempt it only if you have access to a GPU.
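For orientation, here is a rough sketch of what the first option looks like: run the convolutional base once over the data, record its output, and later train a small densely connected classifier on those saved features. The helper name extract_features is illustrative, not from the original post; the generator functions are the standard keras-for-R API.

extract_features <- function(directory, sample_count) {
  datagen <- image_data_generator(rescale = 1/255)
  generator <- flow_images_from_directory(
    directory, datagen,
    target_size = c(150, 150), batch_size = 20, class_mode = "binary"
  )
  # block5_pool of VGG16 outputs 4 x 4 x 512 feature maps (see the summary above)
  features <- array(0, dim = c(sample_count, 4, 4, 512))
  labels <- array(0, dim = c(sample_count))
  i <- 0
  while (i * 20 < sample_count) {
    batch <- generator_next(generator)
    index_range <- ((i * 20) + 1):((i + 1) * 20)
    features[index_range, , , ] <- conv_base %>% predict(batch[[1]])
    labels[index_range] <- batch[[2]]
    i <- i + 1
  }
  list(features = features, labels = labels)
}

train <- extract_features(train_dir, 2000)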

Feature extraction with data augmentation

Because models behave like layers, you can add models (such as conv_base) to a sequential model just as you would add layers.

model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

This is what the model looks like now:

summary(model)

Layer (type)                     Output Shape          Param #
================================================================
vgg16 (Model)                    (None, 4, 4, 512)     14714688
flatten_1 (Flatten)              (None, 8192)          0
dense_1 (Dense)                  (None, 256)           2097408
dense_2 (Dense)                  (None, 1)             257
================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0

As you can see, the convolutional base of VGG16 has 14,714,688 parameters, which is very large.

In Keras, you freeze a network using the freeze_weights() function:

 

freeze_weights(conv_base)
length(model$trainable_weights)   # after freezing, only the weights of the two dense layers on top remain trainable

Using data augmentation

Overfitting is caused by having too few samples to learn from, which makes it impossible to train a model that can generalize to new data. Data augmentation mitigates this by generating more training data from the existing samples via random transformations.

In Keras, this can be done by configuring a number of random transformations to be performed on the images read by image_data_generator(). For example:

train_datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

Take a look at this code:

  • rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
  • width_shift_range and height_shift_range are ranges (as fractions of total width or height) within which to randomly translate pictures horizontally or vertically.
  • shear_range is for randomly applying shearing transformations.
  • zoom_range is for randomly zooming inside pictures.
  • horizontal_flip is for randomly flipping half the images horizontally – relevant when there are no assumptions of horizontal asymmetry (for example, real-world pictures).
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
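The fit call below expects train_generator and validation_generator objects that connect these transformations to the directories created earlier. The original post does not show their definition; a minimal sketch (note that validation data should never be augmented, only rescaled):

test_datagen <- image_data_generator(rescale = 1/255)   # no augmentation for validation/test data

train_generator <- flow_images_from_directory(
  train_dir,                    # target directory
  train_datagen,                # the augmenting data generator defined above
  target_size = c(150, 150),    # resize all images to 150 x 150
  batch_size = 20,
  class_mode = "binary"         # binary labels, matching binary_crossentropy
)

validation_generator <- flow_images_from_directory(
  validation_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)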

Now we can use the image data generator to train our model:

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 30,                              # value assumed; the original snippet is truncated here
  validation_data = validation_generator,
  validation_steps = 50
)

Plot the results. The accuracy reaches about 90%.
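If you want to reproduce the training curves, the history object returned by fit_generator() can be plotted directly (the keras R package provides a plot() method for it):

plot(history)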

Fine-tuning

Another widely used technique for model reuse, complementary to feature extraction, is fine-tuning. The steps for fine-tuning a network are as follows:

  • Add your custom network on top of an already-trained base network.
  • Freeze the base network.
  • Train the part you added.
  • Unfreeze some layers of the base network.
  • Jointly train both these layers and the part you added.

You already completed the first three steps when doing feature extraction. Let’s proceed with step 4: you’ll unfreeze your conv_base and then freeze individual layers inside it.
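A minimal sketch of step 4, assuming you only want to fine-tune the last convolutional block of VGG16 (block5_conv1 onward, per the layer names in the summary above); the keras R package's unfreeze_weights() accepts a from argument naming the first layer to unfreeze:

# Unfreeze block5 only; everything before it stays frozen (the choice of layer is an assumption)
unfreeze_weights(conv_base, from = "block5_conv1")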

Now you can start fine-tuning the network.

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-5),
  metrics = c("accuracy")
)

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 100,
  validation_data = validation_generator,
  validation_steps = 50
)

Let’s plot the result:


You can see a 6 percent improvement in accuracy, from about 90 percent to more than 96 percent.

You can now finally evaluate this model on test data:

test_generator <- flow_images_from_directory(
  test_dir,
  test_datagen,                 # the rescaling-only generator defined earlier
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

model %>% evaluate_generator(test_generator, steps = 50)

 

$loss
[1] 0.2158171

$acc
[1] 0.965

Here, you can get 96.5% test accuracy.

 

Thank you very much for reading this article, please leave a comment below if you have any questions!