When working with small datasets, deep learning tasks on images often require image data augmentation, and Keras’s built-in ImageDataGenerator can do this for us. But what does each of ImageDataGenerator’s many parameters actually do? This article explains the effect of each parameter in detail, as a reference for deep learning researchers.

Let’s start with the official description of ImageDataGenerator:

keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    zca_epsilon=1e-6,
    rotation_range=0.,
    width_shift_range=0.,
    height_shift_range=0.,
    shear_range=0.,
    zoom_range=0.,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=None,
    preprocessing_function=None,
    data_format=K.image_data_format())


The official parameter documentation is too long to reproduce here; refer to the official Keras documentation for the original English descriptions. Below, we go through the parameters one by one to see what effect each of them has.

For our tests, we used the Kaggle Dogs vs. Cats Redux dataset and randomly selected 9 photos of dogs, all resized to 224×224, as shown in Figure 1:

Figure 1

1. featurewise

datagen = image.ImageDataGenerator(featurewise_center=True,
    featurewise_std_normalization=True)

The official description of featurewise_center is “Set input mean to 0 over the dataset, feature-wise.”, which subtracts the dataset mean from every input; the official description of featurewise_std_normalization is “Divide inputs by std of the dataset, feature-wise.”, which divides every input by the standard deviation of the dataset. Both parameters normalize each image using statistics of the dataset as a whole. Let’s see how the result looks:

Figure 2

Compared to the original image in Figure 1, the processed image is slightly “darker” visually.
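
To make the computation concrete, here is a minimal NumPy sketch of what feature-wise centering and normalization amount to. It is not Keras’s implementation: the random array `images` merely stands in for a dataset, and pooling the statistics per color channel is an assumption. Note also that in Keras these two options only take effect after the dataset statistics have been computed with datagen.fit() on the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in dataset: 9 RGB images of 224x224 with pixel values in [0, 255].
images = rng.uniform(0, 255, size=(9, 224, 224, 3))

# Statistics pooled over the whole dataset, one value per color channel.
dataset_mean = images.mean(axis=(0, 1, 2), keepdims=True)
dataset_std = images.std(axis=(0, 1, 2), keepdims=True)

# Every image is shifted and scaled by the same dataset-level statistics.
featurewise = (images - dataset_mean) / (dataset_std + 1e-6)

print(featurewise.mean(axis=(0, 1, 2)))  # close to 0 for each channel
print(featurewise.std(axis=(0, 1, 2)))   # close to 1 for each channel
```

Because the same mean and standard deviation are applied to every image, relative brightness differences between images are preserved.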

2. samplewise

datagen = image.ImageDataGenerator(samplewise_center=True,
    samplewise_std_normalization=True)

The official description of samplewise_center is “Set each sample mean to 0.”, which makes the mean of each input sample 0; the official description of samplewise_std_normalization is “Divide each input by its std.”, which divides each input sample by its own standard deviation. This differs from the featurewise processing: featurewise considers the distribution of the entire dataset, while samplewise looks only at each image on its own. The effect is shown in Figure 3.

Figure 3

Normalizing each image by its own statistics does not seem to make much sense on the Dogs vs. Cats dataset; perhaps it is more useful on grayscale images such as MNIST? Readers can try.
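
For comparison, the sample-wise variant can be sketched in NumPy as follows. This is only an illustration under assumptions: the random `images` array and the per-image `brightness` factors are made up, and pooling each image’s statistics over all of its pixels and channels is my choice, which may differ from what a given Keras version does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in dataset whose images differ in overall brightness.
brightness = np.linspace(0.2, 1.0, 9).reshape(9, 1, 1, 1)
images = rng.uniform(0, 255, size=(9, 224, 224, 3)) * brightness

# Statistics are computed separately for each image (axes 1-3 only).
sample_mean = images.mean(axis=(1, 2, 3), keepdims=True)
sample_std = images.std(axis=(1, 2, 3), keepdims=True)

samplewise = (images - sample_mean) / (sample_std + 1e-6)

print(samplewise.mean(axis=(1, 2, 3)))  # close to 0 for every image
print(samplewise.std(axis=(1, 2, 3)))   # close to 1 for every image
```

Unlike the feature-wise case, brightness differences between images are removed here, because every image ends up with the same mean and spread.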

3. zca_whitening

datagen = image.ImageDataGenerator(zca_whitening=True)

ZCA whitening applies a PCA-based whitening transform to the images, which decorrelates the pixel values, reducing redundant information while preserving the most important features. For details, see: Whitening Transformation – Wikipedia, Whitening – Stanford.

Unfortunately, I was unable to reproduce the effect of zca_whitening with the official Keras demonstration code. With images resized to 224×224, the code raised a memory error, presumably because the matrix in the SVD computation was too large. With images resized to 28×28 there was no memory error, but the code ran all night without finishing, so the effect on the dog pictures could not be reproduced. Instead, Figure 4 below shows the result that another blog reproduced on MNIST. More data augmentation results for MNIST can be found in that blog: Image Augmentation for Deep Learning With Keras.

Figure 4.
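
The following NumPy sketch shows the usual ZCA whitening recipe on tiny stand-in images, and prints how large the feature covariance matrix becomes for flattened RGB images. The function name `zca_whiten` and the random data are hypothetical, and the sketch is not the Keras code itself. For 224×224×3 images the covariance matrix has 150528 × 150528 entries, roughly 90 GB in float32, which is consistent with the memory error described above.

```python
import numpy as np

def zca_whiten(flat, eps=1e-6):
    """Whiten the rows of `flat` (n_samples, n_features) with the ZCA transform."""
    centered = flat - flat.mean(axis=0)
    sigma = centered.T @ centered / centered.shape[0]  # feature covariance
    u, s, _ = np.linalg.svd(sigma)
    whitening = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T
    return centered @ whitening

rng = np.random.default_rng(0)
tiny = rng.uniform(0, 1, size=(500, 4 * 4))  # 500 stand-in 4x4 grayscale images
whitened = zca_whiten(tiny)

# After whitening, the feature covariance is approximately the identity.
cov = whitened.T @ whitened / whitened.shape[0]
print(np.abs(cov - np.eye(16)).max())

# Size of the covariance matrix for flattened RGB images, in float32 bytes.
for side in (28, 224):
    n_features = side * side * 3
    print(side, n_features, n_features ** 2 * 4)
```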

4. rotation_range

datagen = image.ImageDataGenerator(rotation_range=30)

rotation_range specifies a range of rotation angles in degrees. You pass a single number, but the image is not rotated by that fixed angle; instead, each image is rotated by a random angle drawn from the range [-specified angle, +specified angle], so rotations in both directions occur. The effect is shown in Figure 5:

Figure 5
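
As a rough illustration, the random rotation can be sketched like this. The helper `random_rotation_matrix` is hypothetical, and the sketch assumes the angle is drawn uniformly from the symmetric interval [-rotation_range, rotation_range], which is how the Keras source samples it as far as I know.

```python
import numpy as np

def random_rotation_matrix(rotation_range, rng):
    """Draw an angle in [-rotation_range, rotation_range] degrees and build its 2-D rotation matrix."""
    theta = np.deg2rad(rng.uniform(-rotation_range, rotation_range))
    matrix = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    return theta, matrix

rng = np.random.default_rng(0)
theta, matrix = random_rotation_matrix(30, rng)
print(np.rad2deg(theta))  # a random angle, different for each call
print(matrix)
```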

5. width_shift_range & height_shift_range

datagen = image.ImageDataGenerator(width_shift_range=0.5, height_shift_range=0.5)

width_shift_range and height_shift_range shift the image horizontally and vertically, respectively. The parameters can be floats in [0, 1] or values greater than 1. The maximum shift distance is the image width or height multiplied by the parameter, and the actual shift is drawn at random within [-maximum shift distance, +maximum shift distance]. The effect is shown in Figure 6:

Figure 6.

When an image is shifted, regions outside the original image generally come into view. These regions are filled in according to the fill_mode parameter, which is described in detail below. When the shift parameters are too large, the situation shown in Figure 7 occurs, so avoid setting values that are too large.

Figure 7.
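
The arithmetic behind the shift is simple enough to sketch. The helper `random_shift_pixels` is hypothetical and assumes a fractional parameter and a shift drawn uniformly in either direction; with a value of 0.5 on a 224-pixel image, the subject can move by up to half the image, which is why settings this large can push it mostly out of view.

```python
import numpy as np

def random_shift_pixels(shift_range, size, rng):
    """Return a random shift in pixels for a fractional shift_range and an image size."""
    max_shift = shift_range * size
    return rng.uniform(-max_shift, max_shift)

rng = np.random.default_rng(0)
width = height = 224

print(0.5 * width)  # 112.0, the maximum horizontal shift in pixels
tx = random_shift_pixels(0.5, width, rng)
ty = random_shift_pixels(0.5, height, rng)
print(tx, ty)
```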

6. shear_range

datagen = image.ImageDataGenerator(shear_range=0.5)

shear_range applies a shear transformation. A shear leaves the x (or y) coordinate of every point unchanged, while the corresponding y (or x) coordinate is shifted in proportion to the point’s perpendicular distance from the x (or y) axis.

As shown in Figure 8, a black rectangle pattern transforms into a blue parallelogram pattern. The dog image transformation effect is shown in Figure 9.

Figure 8.
Figure 9.
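
The rectangle-to-parallelogram effect can be checked numerically with the textbook shear matrix. This is a sketch of the geometric definition given above, not the exact matrix Keras builds; the function `shear_points` and the rectangle corners are made up for illustration.

```python
import numpy as np

def shear_points(points, k):
    """Shear (x, y) points horizontally: x' = x + k * y, y' = y."""
    shear = np.array([[1.0, k],
                      [0.0, 1.0]])
    return points @ shear.T

# Corners of a 2x1 rectangle, listed counter-clockwise from the origin.
rectangle = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
parallelogram = shear_points(rectangle, k=0.5)
print(parallelogram)
```

The y coordinates are unchanged, while the two upper corners move right by 0.5, so the rectangle becomes a parallelogram.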

7. zoom_range

datagen = image.ImageDataGenerator(zoom_range=0.5)

zoom_range zooms the image along its width and height, which can be thought of as a resize in each direction. The parameter can be a single number or a list. According to the Keras documentation, a list is interpreted as [lower, upper], the range from which the zoom factor is drawn at random, and a single number z is shorthand for the range [1-z, 1+z]. The factors for the two directions are drawn separately, so width and height may be scaled by different amounts. A zoom factor greater than 0 and less than 1 zooms in (the content is enlarged); a zoom factor greater than 1 zooms out (the content shrinks).

When the parameter is greater than 0 and less than 1, the effect is shown in Figure 10:

Figure 10.

When the parameter is 4, the effect is shown in Figure 11:

Figure 11.
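
Why a factor below 1 magnifies and a factor above 1 shrinks can be seen from how output pixels are mapped back to source pixels. The sketch below assumes the zoom scales output coordinates about the image center to find the source position to read; the helper `source_coordinate` is hypothetical, and the exact center convention in Keras may differ slightly.

```python
def source_coordinate(out_coord, size, zoom_factor):
    """Map an output pixel coordinate to the source coordinate it is read from."""
    center = (size - 1) / 2.0
    return center + zoom_factor * (out_coord - center)

size = 224
for zoom_factor in (0.5, 1.0, 4.0):
    left = source_coordinate(0, size, zoom_factor)
    right = source_coordinate(size - 1, size, zoom_factor)
    print(zoom_factor, left, right)
```

With a factor of 0.5 the output is filled from only the central half of the source, so the content appears enlarged. With a factor of 4 the source positions extend far beyond the image borders, so the original image occupies a small part of the output and the rest has to be filled according to fill_mode.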

8. channel_shift_range

datagen = image.ImageDataGenerator(channel_shift_range=10)

channel_shift_range changes the overall color of an image by shifting the values of its color channels. The whole image takes on a certain tint, like placing a piece of colored glass in front of it; the colors of individual elements are not changed independently, so a black dog cannot turn into a white dog. With a value of 10 the effect is shown in Figure 12, and with a value of 100 in Figure 13. The larger the value, the stronger the color change.

Figure 12
Figure 13
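
A minimal sketch of the idea, assuming one random offset per channel drawn from [-channel_shift_range, channel_shift_range] and clipping to the image’s original value range. The function `channel_shift_sketch` is hypothetical, and whether the offset is drawn per channel or once for all channels may depend on the Keras version.

```python
import numpy as np

def channel_shift_sketch(img, intensity, rng):
    """Add one random offset per channel, clipped to the image's original value range."""
    lo, hi = img.min(), img.max()
    offsets = rng.uniform(-intensity, intensity, size=img.shape[-1])
    return np.clip(img + offsets, lo, hi), offsets

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224, 3))
shifted, offsets = channel_shift_sketch(img, 100, rng)
print(offsets)  # one offset per color channel
```

Every pixel in a channel receives the same offset, which is why the result looks like a uniform tint rather than a change to individual objects.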

9. horizontal_flip & vertical_flip

datagen = image.ImageDataGenerator(horizontal_flip=True)

horizontal_flip randomly flips images horizontally. Not every image is flipped: each time an image is generated, whether to flip it is decided at random. The effect is shown in Figure 14.

Figure 14
datagen = image.ImageDataGenerator(vertical_flip=True)

vertical_flip flips images upside down. Just like horizontal_flip, it randomly decides for each generated image whether to flip it, as shown in Figure 15.

Figure 15

Of course, vertical_flip is not appropriate for the Dogs vs. Cats dataset, since animals rarely appear upside down.
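
The random nature of the flips can be sketched in a few lines of NumPy. The helper `random_flip` is hypothetical and assumes each enabled flip is applied with probability 0.5 to an image stored as (height, width, channels).

```python
import numpy as np

def random_flip(img, rng, horizontal=True, vertical=False):
    """Flip an (H, W, C) image along each enabled axis with probability 0.5."""
    if horizontal and rng.random() < 0.5:
        img = img[:, ::-1, :]  # mirror left-right
    if vertical and rng.random() < 0.5:
        img = img[::-1, :, :]  # mirror top-bottom
    return img

rng = np.random.default_rng(0)
img = np.arange(2 * 3 * 1).reshape(2, 3, 1)

# Count how many of 1000 calls actually changed the image.
flipped_count = sum(
    not np.array_equal(random_flip(img, rng), img) for _ in range(1000)
)
print(flipped_count)  # roughly half of the 1000 calls
```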

10. rescale

datagen = image.ImageDataGenerator(rescale=1./255, width_shift_range=0.1)

rescale multiplies every pixel value of the image by a scaling factor, and this operation is performed before all other transformations. In some models, feeding in the raw pixel values directly may push activations into the “dead zone” of the activation function, so setting the scaling factor to 1/255 and squeezing the pixel values into the range 0 to 1 helps the model converge and avoids neuron “death”.

After rescaling, the image saved locally looks no different to the naked eye. If we print the image values directly from memory, we see the following:

Figure 16

As you can see from Figure 16, the pixel values of the images are reduced to between 0 and 1, but if you open a locally saved image, the values remain the same, as shown in Figure 17.

Figure 17

Keras maps the pixel values back to their original range when it saves an image locally, but the values held in memory stay rescaled.
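
The difference between the in-memory values and the saved file can be imitated with NumPy. This sketch assumes that saving stretches the array back to the 0 to 255 range with a min-max style mapping, which is my reading of the behavior and not a copy of the Keras code; the random `img` stands in for a real photo. The factor is written as 1.0 / 255 so that it is a float division on every Python version.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3)).astype("float32")

# What rescale=1./255 does to the array held in memory.
rescaled = img * (1.0 / 255)
print(rescaled.min(), rescaled.max())

# Mapping back to 8-bit values for saving: stretch the range to [0, 255].
shifted = rescaled - rescaled.min()
restored = np.round(shifted / shifted.max() * 255).astype("uint8")
print(restored.min(), restored.max())
```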

11. fill_mode

datagen = image.ImageDataGenerator(fill_mode='wrap', zoom_range=[4, 4])

fill_mode is the filling mode. As mentioned above, when an image is translated, zoomed, or sheared, some regions of the output have no corresponding source pixels. How should these regions be filled? That is determined by fill_mode, which can be “constant”, “nearest” (the default), “reflect”, or “wrap”. Figure 18 compares the four filling methods; from left to right and top to bottom, they are “reflect”, “wrap”, “nearest”, and “constant”.

Figure 18

When set to “constant”, there is an optional parameter, cval, which means to fill with a color of a fixed value. Figure 19 shows the effect with cval=100, which can be compared with the figure without cval in the lower right corner of Figure 18.

Figure 19
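
The four modes are easier to tell apart on a single row of pixels. The sketch below uses np.pad as a one-dimensional analogy; the mapping from fill_mode names to np.pad modes is my own approximation of the patterns in the Keras documentation, not the routine Keras actually calls.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Rough np.pad equivalents of the four fill_mode values (an analogy only).
pad_modes = {
    "constant": dict(mode="constant", constant_values=0),  # like cval=0
    "nearest": dict(mode="edge"),
    "reflect": dict(mode="symmetric"),
    "wrap": dict(mode="wrap"),
}

filled = {name: np.pad(row, 2, **kwargs) for name, kwargs in pad_modes.items()}
for name, values in filled.items():
    print(name, values)
```

Changing constant_values plays the role of cval: the padded positions take that fixed value instead of 0.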

Do it yourself?

Here is a short piece of code for experimenting with these parameters. You can run it in a Jupyter Notebook to try out different settings and display the results directly in the notebook page.

%matplotlib inline
import glob

import matplotlib.pyplot as plt
from PIL import Image
from keras.preprocessing import image

# PATH, SAVE_PATH and gen_path are placeholders that this snippet leaves undefined;
# set them to your own directories before running.

# Set the generator parameters
datagen = image.ImageDataGenerator(fill_mode='wrap', zoom_range=[4, 4])

gen_data = datagen.flow_from_directory(PATH,
                                       batch_size=1,
                                       shuffle=False,
                                       save_to_dir=SAVE_PATH,
                                       save_prefix='gen',
                                       target_size=(224, 224))

# Generate 9 images (each batch is also written to SAVE_PATH)
for i in range(9):
    next(gen_data)

# Find the generated images on disk and draw all 9 in one figure
name_list = glob.glob(gen_path+'16/*')
fig = plt.figure()
for i in range(9):
    img = Image.open(name_list[i])
    sub_img = fig.add_subplot(331 + i)
    sub_img.imshow(img)
plt.show()


Conclusion

With a small dataset, it is important to expand your data with data augmentation. But before applying it, consider whether your dataset actually needs each kind of generated image (for example, some datasets should not be flipped vertically) and whether the degree of transformation is reasonable (for example, shifting the target horizontally out of the image is not). Try a few settings before deciding which parameters to use. All of the above has been published on my GitHub, together with the Jupyter Notebook file of the experiments. Have fun!


Note: for reprints or translations, please message me directly; reprinting is allowed only with my consent.