Data input is like water: when it flows, it pours like a waterfall; when it is blocked, it trickles like the tap water in a university dormitory.

If you’ve ever taken Jeremy Howard and Rachel Thomas’s fast.ai course, you’ll remember that the first part asks you to put images into folders and call the custom “get_batch” and “get_data” functions to load the data.

Here are some ways to feed data into Keras. This is the most basic part of the workflow, yet many beginners get lost in it, and setting it up properly can speed up your training.

First of all, a model cannot convolve image files directly. Each image must be converted into a NumPy array before it can be fed to the model. Moreover, if the images in the data set are not all the same size, the details of the procedure change.

One, single-picture input

Start with the simplest case: converting one image into a NumPy array. Single-image input is only useful for model prediction.

Application:


Number of pictures: single


Application scenario: Model prediction


Picture size: uniform

import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50, preprocess_input

file_path = '/image/dogs_001.jpg'
img = image.load_img(file_path, target_size=(224, 224))  # load and resize the image
x = image.img_to_array(img)                              # shape (224, 224, 3)
x = np.expand_dims(x, axis=0)                            # shape (1, 224, 224, 3)
x = preprocess_input(x)  # the ImageNet weights expect inputs preprocessed this way

model = ResNet50(weights='imagenet')
preds = model.predict(x)

Keras uses Pillow to load the image at file_path and resize it to target_size, then img_to_array converts it into a NumPy array of shape (224, 224, 3). expand_dims turns shape (224, 224, 3) into (1, 224, 224, 3). The model itself requires an input of shape (None, 224, 224, 3). None stands for the batch dimension: the model does not know in advance how many images you will feed in, so that dimension is left as None. Apart from the batch dimension, the shape of your input must match the input shape the model expects.
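The shape bookkeeping described above can be checked without loading a real image. The following is a minimal sketch that uses a zero-filled array as a stand-in for the output of img_to_array; the variable names are illustrative and not part of the original code.

```python
import numpy as np

# Stand-in for the array returned by img_to_array on a 224x224 RGB image
single_image = np.zeros((224, 224, 3), dtype="float32")

# Add a leading batch dimension so the array matches (None, 224, 224, 3)
batch_of_one = np.expand_dims(single_image, axis=0)

print(single_image.shape)  # (224, 224, 3)
print(batch_of_one.shape)  # (1, 224, 224, 3)
```

The batch dimension is always the first axis, which is why axis=0 is used.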

Two, multiple-picture input

The following code also applies to model prediction. The idea behind multi-image input is simple: load every image in the same category with a for loop, append each one to a list, and concatenate the list into a single array.

Application:


Number of pictures: multiple


Application scenario: Model prediction


Picture size: uniform

import glob
import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50, preprocess_input

file_path = 'D:/Data/dogs/'
f_names = glob.glob(file_path + '*.jpg')  # list of all image paths

imgs = []
for i in range(len(f_names)):
    img = image.load_img(f_names[i], target_size=(224, 224))  # load and resize the image
    arr_img = image.img_to_array(img)                         # convert the image to an array
    arr_img = np.expand_dims(arr_img, axis=0)                 # add the leading batch dimension
    imgs.append(arr_img)                                      # add the image array to the list
    print("loading no.%s image." % i)

x = np.concatenate(imgs)  # join the list into one array of shape (batch, 224, 224, 3)
x = preprocess_input(x)   # the ImageNet weights expect inputs preprocessed this way

print("predicting...")
model = ResNet50(weights='imagenet')
y = model.predict(x)
print("Completed!")

What concatenate does is take every image array of shape (1, 224, 224, 3) and join them into a single tensor of shape (batch, 224, 224, 3), so that you can run batch prediction or batch training.
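To see the effect of concatenate on the shapes, here is a small sketch with zero-filled stand-in arrays instead of real images. The count of five images is an arbitrary choice for illustration.

```python
import numpy as np

# Five stand-in images, each already expanded to shape (1, 224, 224, 3)
imgs = [np.zeros((1, 224, 224, 3), dtype="float32") for _ in range(5)]

# Concatenating along axis 0 merges the individual batch dimensions
x = np.concatenate(imgs, axis=0)

print(x.shape)  # (5, 224, 224, 3)
```

Because every array already carries a batch dimension of 1, joining along axis 0 yields a batch whose size equals the number of images.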

Three, generator input

In many cases you cannot use the methods above to feed data for training or prediction, because the data set is too large to fit all the images into memory. This is where the Keras data generator comes in handy. Whenever the model needs training data, the generator prepares a batch of images on the CPU and feeds it to the GPU for the model to train on, and it keeps doing so until training is complete.
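The idea of producing batches on demand can be illustrated with a plain Python generator. This is only a conceptual sketch, not how Keras implements its generators: batch_generator is a hypothetical helper, it yields zero-filled arrays where a real generator would read files from disk, and it stops after one pass whereas Keras generators loop indefinitely.

```python
import numpy as np

def batch_generator(num_images, batch_size, image_shape=(224, 224, 3)):
    """Yield batches lazily; a real generator would read image files here."""
    for start in range(0, num_images, batch_size):
        count = min(batch_size, num_images - start)
        # Only this one batch is held in memory, never the whole data set
        yield np.zeros((count,) + image_shape, dtype="float32")

batch_shapes = [batch.shape for batch in batch_generator(num_images=70, batch_size=32)]
print(batch_shapes)  # [(32, 224, 224, 3), (32, 224, 224, 3), (6, 224, 224, 3)]
```

Each batch is created only when the consumer asks for it, so memory use depends on batch_size rather than on the size of the data set.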

Application:


Number of pictures: large


Application scenario: model prediction and training


Picture size: not uniform

from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import ResNet50

trn_path = '/image/trn/'
val_path = '/image/val/'

# Define a generator
generator = ImageDataGenerator()  # assume nothing is done to the original images

# The generator produces images from the specified paths
trn_data = generator.flow_from_directory(trn_path, batch_size=32, target_size=(224, 224))
val_data = generator.flow_from_directory(val_path, batch_size=32, target_size=(224, 224))

# Define the model (assumed here: a ResNet50 trained from scratch;
# classes must equal the number of category folders, e.g. 2 for cats and dogs)
model = ResNet50(weights=None, classes=2)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit_generator(trn_data,              # training set generator
                    steps_per_epoch=2000,  # number of batches per epoch
                    epochs=50,
                    validation_data=val_data,
                    validation_steps=800)

To use a generator, first decide what the generator should do (the arguments of ImageDataGenerator()): which operations it applies to your original images, such as rotation, zooming, shifting, color changes, and so on. Here I use the defaults, which do nothing to the images. If you are interested, see another of my articles, which covers the generator's features in detail: “Too few image data sets? See my seventy-two changes” (on Keras Image Data Augmentation).

flow_from_directory then creates a generator with those settings. The generator reads your data from a path (trn_path, val_path), 32 images at a time (batch_size), and resizes every image to (224, 224) (target_size).

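The next step, of course, is to define the model, and finally to feed the generators into a machine called fit_generator. (In recent Keras versions, fit_generator is deprecated and model.fit accepts generators directly.)

Hold on!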
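The steps_per_epoch and validation_steps arguments count batches, not images. The article does not say how many images sit behind the values 2000 and 800, so the figures below are hypothetical; steps_to_cover is an illustrative helper, not a Keras function.

```python
import math

def steps_to_cover(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

# Hypothetical data set sizes, chosen only to illustrate the arithmetic
full_batches = steps_to_cover(64000, 32)
ragged_batches = steps_to_cover(1000, 32)

print(full_batches)    # 2000
print(ragged_batches)  # 32 (the last batch holds only 8 images)
```

Rounding up matters when the data set size is not a multiple of batch_size, since the final partial batch still counts as one step.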

Keras has a special requirement for the folder path above. For example, if I have “cats” and “dogs” folders under “/image/trn/”, the Keras generator will know that I have two categories, and it will generate the labels for me automatically without my having to define them. If I instead point the path at the “/image/trn/cats/” folder with its 1000 cat pictures, Keras looks for category subfolders there, finds none, and reports 0 images in 0 classes. So point the path at the folder that contains all the category folders, and Keras will generate as many labels as there are folders in that path!
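The way labels follow from folder names can be imitated with the standard library. This is a simplified sketch of the idea, not the Keras implementation: infer_class_indices is a hypothetical helper, and the temporary folder layout is made up for the example.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Map each immediate subfolder name to an integer label, in sorted order."""
    subfolders = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(subfolders)}

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/cats/001.jpg and root/dogs/001.jpg
    for category in ("cats", "dogs"):
        os.makedirs(os.path.join(root, category))
        open(os.path.join(root, category, "001.jpg"), "w").close()

    class_indices = infer_class_indices(root)

print(class_indices)  # {'cats': 0, 'dogs': 1}
```

The generator returned by flow_from_directory exposes a comparable mapping through its class_indices attribute, which is worth printing once to confirm that the labels are what you expect.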

The applicability list above says that the picture sizes need not be uniform. That works because the generator resizes every image to target_size. If you want to keep the images at their different original sizes, they cannot be generated as one batch; see section Four.

Four, input with non-uniform picture sizes

If your model accepts variable-size input, that is, the model's input_shape is (None, None, None, 3), you still cannot put images of different sizes into one batch, because a batch is a single tensor and the model can only process one size per prediction or training step. If a batch contained several sizes, the model would not know which size to use to compute the output shape. As far as I know, TensorFlow and Keras (TF backend) cannot handle mixed sizes within a batch; I am not sure about other frameworks. So you can only feed the images in one at a time.
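The batching restriction is visible at the NumPy level already. The sketch below uses two zero-filled stand-in arrays with arbitrary, made-up sizes and shows that they cannot be joined into one batch.

```python
import numpy as np

# Two stand-in images with different heights and widths
small = np.zeros((1, 200, 300, 3), dtype="float32")
large = np.zeros((1, 480, 640, 3), dtype="float32")

try:
    np.concatenate([small, large], axis=0)
    batched = True
except ValueError:
    # The non-batch dimensions differ, so no single array can hold both
    batched = False

print(batched)  # False
```

Each array on its own is still a valid batch of one, which is why feeding the images one at a time works.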

1. The most primitive way: predict one image at a time in a for loop:

import glob
import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50

def read_image(path):
    f_names = glob.glob(path + '*.jpg')  # list of all image paths
    arr_list = []
    for i in range(len(f_names)):
        img = image.load_img(f_names[i])           # load the image at its original size
        arr_img = image.img_to_array(img)          # convert the image to an array
        arr_img = np.expand_dims(arr_img, axis=0)  # add the leading batch dimension
        arr_list.append(arr_img)
    return arr_list

def predict_image(model, img_arr_list):
    preds = []
    for i in range(len(img_arr_list)):
        pred = model.predict(img_arr_list[i], batch_size=1, verbose=0)
        preds.append(pred)
    return preds

## model definition omitted ......
## (the model must accept variable-size input, i.e. input_shape (None, None, None, 3))
model = ResNet50(include_top=False)  # assumption: without the fixed-size top, the input size is variable
path = '/image/test/'
arr_list = read_image(path)
preds = predict_image(model, arr_list)

2. Using a generator, one image at a time:

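from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import ResNet50

trn_path = '/image/trn/'
val_path = '/image/val/'

# Define a generator
generator = ImageDataGenerator()  # assume nothing is done to the original images

# The generator produces images from the specified paths, one image per batch.
# Note: with no target_size given, flow_from_directory uses its default of (256, 256).
trn_data = generator.flow_from_directory(trn_path, batch_size=1)
val_data = generator.flow_from_directory(val_path, batch_size=1)

# Define the model as in the generator example in section Three, then compile it
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit_generator(trn_data,              # training set generator
                    steps_per_epoch=2000,  # number of batches per epoch
                    epochs=50,
                    validation_data=val_data,
                    validation_steps=800)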

The generator code needs almost no changes: just set batch_size to 1. The generator approach is the one I recommend. A generator can also be set up to load data with multiple processes or multiple threads, which speeds up training and reduces GPU wait time. See FancyKeras - Data Input (Fancy) for details.


If you like this column, please follow it and share it with your friends. It gives the author more motivation to write!