This article defines multi-label image classification and explains how to build a multi-label image classification model.

Introduction

Are you working with image data? There is a lot we can do with computer vision algorithms:

  • Object detection
  • Image segmentation
  • Image translation
  • Object tracking (in real time), and more…

This got me thinking: what do we do if an image contains objects from multiple classes? Building an image classification model is a good start, but I wanted to expand my horizons and take on a more challenging task: building a multi-label image classification model!

Build your own image classification model:

www.analyticsvidhya.com/blog/2019/0…

I didn't want to use a simple toy dataset to build my model; that is too generic. Then it struck me: movie and TV series posters feature all kinds of people. Could I build my own multi-label image classification model to predict different genres just by looking at the posters?

The simple answer is yes! In this article, I explain the idea behind multi-label image classification. We will build our own model using movie posters. You will be surprised by the impressive results our model produces. If you’re a fan of The Avengers or Game of Thrones, there’s a great surprise for you in the implementation section.

Excited? Great, let’s get started!

Table of Contents

  1. What is multi-label image classification?
  2. How is multi-label image classification different from multi-class image classification?
  3. Steps to build a multi-label image classification model
  4. Understanding the multi-label image classification model architecture
  5. Case study: Solving a multi-label image classification problem in Python
  6. Next steps and experimenting on your own
  7. End notes

1. What is multi-label image classification?

Let’s use an intuitive example to understand the concept of multi-label image classification. Take a look at the pictures below:

Figure 1 clearly shows a car. Figure 2, however, contains no car, just a group of buildings. Can you see where this is going? We classify each image into one of two categories: car or no car.

When there are only two classes to choose from, this is called a binary image classification problem.

Let’s look at another picture:

How many objects can you identify in this picture? Quite a few: houses, a pond with a fountain, trees, rocks, and so on. When a single image can be assigned several classes at once, as in this picture, we have a multi-label image classification problem.

Here is the catch: many of us confuse multi-label and multi-class image classification. I was confused too when I first came across these terms. Now that I understand them better, let me clarify the difference for you.

2. How is multi-label image classification different from multi-class image classification?

Suppose we are given some pictures of animals and asked to sort them into their respective categories. For the sake of simplicity, let's assume there are four categories (cat, dog, rabbit, and parrot). There are two possible scenarios:

  • Each image contains only one object (any of the four categories above), so it can only be grouped into one of the four categories.
  • An image may contain more than one object (from the four categories above), so the image will belong to more than one category.

Let’s look at each situation through examples, starting with the first scenario:

Here, each image contains only one object. Sharp-eyed readers will notice that there are four different types of objects (animals) in this collection.

Each image here can only be classified as cat, dog, parrot or rabbit. There is no case where one image belongs to more than one category.

  • The images can be classified into more than two categories
  • No image belongs to more than one category

When both of these conditions hold, we have a multi-class image classification problem.

Now let's consider the second scenario. Look at the following image:

  • The first image (top left) contains a dog and a cat
  • The second image (top right) includes a dog, a cat and a parrot
  • The third image (bottom left) contains a rabbit and a parrot
  • The last image (bottom right) contains a dog and a parrot

These objects are the labels of each image. Every image here belongs to more than one class, so this is a multi-label image classification problem.
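To make the difference concrete, here is a minimal sketch of how the targets for the two scenarios could be written as vectors. The class order (cat, dog, rabbit, parrot) and the helper name `encode` are assumptions for illustration. In the multi-class case every row has exactly one 1 (one-hot encoding); in the multi-label case a row can contain several 1s.

```python
CLASSES = ["cat", "dog", "rabbit", "parrot"]  # assumed class order, for illustration only


def encode(labels):
    """Return a 0/1 vector with a 1 for every class present in `labels`."""
    return [1 if c in labels else 0 for c in CLASSES]


# Scenario 1 (multi-class): exactly one label per image -> one-hot rows
multi_class_targets = [encode(["cat"]), encode(["parrot"])]

# Scenario 2 (multi-label): the four images described above -> multi-hot rows
multi_label_targets = [
    encode(["dog", "cat"]),            # top left
    encode(["dog", "cat", "parrot"]),  # top right
    encode(["rabbit", "parrot"]),      # bottom left
    encode(["dog", "parrot"]),         # bottom right
]

for row in multi_label_targets:
    print(row)
```

A model for the second scenario has to make one yes/no decision per column rather than pick a single winning class.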

These two scenarios should help you understand the difference between multi-class and multi-label image classification. If you need further clarification, let me know in the comments section below this article.

Before moving on to the next section, I suggest reading the article "Build your first image classification model in 10 minutes!". It will help you understand how to solve a multi-class image classification problem.

Build your first image classification model in 10 minutes:

www.analyticsvidhya.com/blog/2019/0…

3. Steps to build a multi-label image classification model

Now that we have an intuitive understanding of multi-label image classification, let’s dive into the steps you should follow to solve this problem.

The first step is to get the data into a structured format. This applies to binary and multi-class image classification as well.

You should have a folder containing all the images you want to train the model on. To train the model, we also need the true labels of these images. So you should also have a .csv file containing the names of all the training images and their corresponding true labels.
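For a rough idea of what such a file could contain, here is a sketch that writes one row per image with its name, its labels, and one 0/1 column per label. The image names, the label set, the `|` separator in the Genre column, and the `annotations` dictionary are all hypothetical; only the general idea (image names plus their true labels) comes from the text. It writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io

# Hypothetical training images and their true labels (names are made up)
annotations = {
    "poster_001": ["Comedy", "Drama"],
    "poster_002": ["Action"],
    "poster_003": ["Action", "Thriller"],
}

# One indicator column per label, sorted so the column order is deterministic
all_labels = sorted({g for genres in annotations.values() for g in genres})

buffer = io.StringIO()  # use open("train.csv", "w", newline="") to write a real file
writer = csv.writer(buffer)
writer.writerow(["Id", "Genre"] + all_labels)
for image_id, genres in annotations.items():
    indicator = [1 if label in genres else 0 for label in all_labels]
    writer.writerow([image_id, "|".join(genres)] + indicator)

print(buffer.getvalue())
```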

We will learn how to create this .csv file later in this article. For now, just remember that the data needs to be in this format. With the data ready, we can break the remaining work into the following steps:

Load and preprocess data

First, we load all the images and preprocess them according to the needs of the project. To check how our model will perform on unseen data (test data), we create a validation set. We train the model on the training set and validate it on the validation set (standard machine learning practice).
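As a sketch of the hold-out idea, assuming the images are already stacked in a NumPy array `X` and the labels in `y`, a random validation split can be made by shuffling indices. The function name `split_train_val` and the 10% default are illustrative choices; the case study below uses scikit-learn's `train_test_split` for the same purpose.

```python
import numpy as np


def split_train_val(X, y, val_fraction=0.1, seed=42):
    """Randomly hold out `val_fraction` of the samples as a validation set."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))          # shuffled sample indices
    n_val = int(round(len(X) * val_fraction))  # size of the validation set
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


# Toy data: 20 "images" of shape 4x4x3, each with 5 binary labels
X = np.zeros((20, 4, 4, 3))
y = np.random.default_rng(0).integers(0, 2, size=(20, 5))
X_train, X_val, y_train, y_val = split_train_val(X, y)
print(X_train.shape, X_val.shape)
```

Shuffling before splitting matters: if the files are ordered (for example by genre), taking the last 10% without shuffling would give a validation set that does not represent the training data.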

Define the structure of the model

The next step is to define the structure of the model. This includes determining the number of hidden layers, the number of neurons in each layer, the activation function, and so on.

Train the model

It's time to train our model on the training set! We feed in the training images and their corresponding true labels. We also pass in the validation images to monitor how the model performs on unseen data.

Make predictions

Finally, we use the trained model to make predictions on new images.

4. Understanding the multi-label image classification model architecture

The preprocessing steps for a multi-label image classification task are similar to those for a multi-class problem. The key difference lies in how we define the model architecture.

In a multi-class image classification model, we use the softmax activation function in the output layer. For each image, we want to maximize the probability of a single class. Because the softmax outputs always sum to 1, as the probability of one class increases, the probabilities of the other classes decrease. In other words, the probability of each class depends on the other classes.

In multi-label image classification, however, a single image can have multiple labels, so we want the class probabilities to be independent of each other. The softmax activation function is therefore not appropriate. Instead, we use the sigmoid activation function, which predicts the probability of each class independently. Conceptually, this is like having n binary classifiers (where n is the total number of classes), one per class, that share the same network and each predict the probability of their own class.
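The difference between the two output activations is easy to see numerically. The sketch below applies softmax and sigmoid to the same hypothetical raw scores (logits) for four classes, using only the standard library.

```python
import math


def softmax(logits):
    """Multi-class output: probabilities compete and always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Multi-label output: each class gets an independent probability in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


logits = [2.0, 2.0, -1.0, 0.5]  # hypothetical raw scores for cat, dog, rabbit, parrot

p_soft = softmax(logits)
p_sig = sigmoid(logits)
print([round(p, 3) for p in p_soft], round(sum(p_soft), 3))
print([round(p, 3) for p in p_sig])
```

With these logits, softmax gives cat and dog about 0.44 each because they have to share the probability mass, while sigmoid gives each of them about 0.88 on its own. Raising the rabbit score leaves the sigmoid output for cat unchanged but lowers its softmax output, which is exactly the dependence described above.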

The sigmoid activation function turns the multi-label problem into n binary classification problems. So for each image we get n probabilities: one indicating whether the image belongs to the first class, one for the second class, and so on. Since we have converted the task into n binary classification problems, we use the binary cross-entropy loss (binary_crossentropy in Keras). Our goal is to minimize this loss to improve the model's performance.
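To illustrate how the n binary problems combine into one loss, here is a plain-Python sketch of binary cross-entropy for a single image: the loss is computed per label and then averaged. It is written for clarity and is not Keras' own implementation; the example labels and predictions are made up, and the clipping constant `eps` is an assumption to avoid `log(0)`.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the n labels of one image.

    Each label is treated as its own yes/no problem; the per-label losses are averaged.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 1, 0, 0]        # e.g. the image contains a cat and a dog
good = [0.9, 0.8, 0.1, 0.2]  # confident, mostly correct predictions
bad = [0.2, 0.3, 0.7, 0.9]   # mostly wrong predictions
print(round(binary_crossentropy(y_true, good), 4))
print(round(binary_crossentropy(y_true, bad), 4))
```

Each label contributes independently, so getting one genre wrong is penalized without affecting how the other labels are scored.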

This is the main change we need to make when defining the model architecture for a multi-label image classification problem. Training is similar to the multi-class case: we pass in the training images and their corresponding true labels, along with the validation set to monitor the model's performance.

Finally, we will take a new image and use the trained model to predict the label of the image. Are you with me?

5. Case study: Solving a multi-label image classification problem in Python

Congratulations on making it this far! Your reward: solving an awesome multi-label image classification problem in Python. It's time to fire up your favorite Python IDE!

Let's be clear about the problem statement. Our goal is to predict the genres of a movie from its poster image. Can you guess why this is a multi-label image classification problem? Think about it before you read on.

A movie can be in many genres, right? It doesn’t just fall into one category, like action or comedy. Movies can be a combination of two or more genres. Therefore, it is multi-label image classification.

The dataset we will use contains posters of many movies that span multiple genres. I made some changes to the dataset and converted it into a structured format: a folder containing the images and a .csv file storing the true labels. You can download the structured dataset from here. Here are some posters from our dataset:


Drive.google.com/file/d/1dNa…

If you prefer, you can download the original dataset and its ground-truth labels here.


www.cs.ccu.edu.tw/~wtchu/proj…

Let’s start programming!

First, import all the Python libraries you need:

    import keras
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D
    from keras.utils import to_categorical
    from keras.preprocessing import image
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from tqdm import tqdm
    # Jupyter notebook magic: show plots inline (remove when running as a plain script)
    %matplotlib inline

Now, read the .csv file and look at the first five rows:

    train = pd.read_csv('multi_label_train.csv')  # read the csv file
    train.head()  # print the first five rows of the file

There are 27 columns in this file. Let’s print the names of these columns:

    train.columns

The Genre column contains, for each image, the list of genres of the corresponding movie. From the head of the .csv file, we can see that the genres of the first image are Comedy and Drama.

The remaining 25 columns are one-hot encoded columns, one per genre. If a movie belongs to the Action genre, its value in the Action column is 1; otherwise it is 0. Each image can belong to any of these 25 genres.
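Taken together, the 25 indicator columns form a multi-hot encoding of the Genre list. The sketch below shows that conversion on a hypothetical five-genre vocabulary and three made-up movies; it assumes each movie's genres are already available as a Python list of names. (scikit-learn's `MultiLabelBinarizer` offers the same transformation if you prefer a library call.)

```python
import numpy as np

# Hypothetical genre vocabulary (the real file has 25 genre columns)
GENRES = ["Action", "Comedy", "Drama", "Romance", "Thriller"]


def multi_hot(genre_list, vocabulary=GENRES):
    """Return a 0/1 vector with a 1 in each column whose genre the movie has."""
    row = np.zeros(len(vocabulary), dtype=np.int64)
    for g in genre_list:
        row[vocabulary.index(g)] = 1
    return row


movies = [["Comedy", "Drama"], ["Comedy"], ["Action", "Thriller"]]  # made-up examples
y = np.stack([multi_hot(m) for m in movies])
print(y)
print(y.shape)  # one row per movie, one column per genre
```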

We will build a model that returns the genres of a given movie poster. But before that, do you remember the first step in building an image classification model?

That's right: loading and preprocessing the data. So, let's load all the training images:

    train_image = []
    for i in tqdm(range(train.shape[0])):
        # load each poster listed in the csv, resized to 400x400 pixels
        img = image.load_img('Multi_Label_dataset/Images/' + train['Id'][i] + '.jpg', target_size=(400, 400))
        img = image.img_to_array(img)  # PIL image -> float array of shape (400, 400, 3)
        img = img / 255                # scale pixel values to [0, 1]
        train_image.append(img)
    X = np.array(train_image)

Take a quick look at the array shape:

    X.shape

There are 7,254 poster images, all of which have been converted to arrays of shape (400, 400, 3). Let's visualize one of them:

    plt.imshow(X[2])

This is the poster for the movie Trading Places. Let's print the genre of this movie:

    train['Genre'][2]

This movie has only one genre: Comedy. The next thing our model needs is the true labels for all the images. Can you guess the shape of the label array for these 7,254 images?

Let's take a look. We know there are 25 possible genres. For each image, we have 25 targets, each indicating whether or not the movie belongs to that genre. Each of these 25 targets is therefore either 0 or 1.

We will remove the Id and Genre columns from the training file and convert the remaining columns into an array, which will serve as the targets for our images:

    y = np.array(train.drop(['Id', 'Genre'], axis=1))  # keep only the 25 genre columns
    y.shape

The shape of the output array is (7254, 25), as expected. Now, let's create a validation set that will help us check the performance of our model on unseen data. We randomly set aside 10% of the images as our validation set:

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.1)

The next step is to define the model architecture. The output layer will have 25 neurons (one per genre), and we will use sigmoid as its activation function.

I will use the following architecture to solve this problem. You can modify it by changing the number of hidden layers, the activation functions, and other hyperparameters.

    model = Sequential()
    # four convolution blocks: conv -> 2x2 max pooling -> dropout
    model.add(Conv2D(filters=16, kernel_size=(5, 5), activation='relu', input_shape=(400, 400, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # fully connected head
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(25, activation='sigmoid'))  # one independent probability per genre

Let’s display our model summary:

    model.summary()
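To see where these parameters come from, here is a small sketch that walks through the architecture defined above and computes each layer's output size and parameter count by hand. It assumes the Keras defaults used in the code: stride-1 convolutions without padding, and 2x2 max pooling with stride 2; Dropout layers add no parameters. The function names are hypothetical.

```python
def conv_output(size, kernel):
    """Spatial size after a 'valid' (no padding), stride-1 convolution."""
    return size - kernel + 1


def summarize(input_hw=400, channels=3, conv_filters=(16, 32, 64, 64),
              kernel=5, pool=2, dense_units=(128, 64, 25)):
    size, depth, total = input_hw, channels, 0
    for filters in conv_filters:
        params = kernel * kernel * depth * filters + filters  # weights + biases
        size = conv_output(size, kernel) // pool              # conv, then 2x2 max pooling
        depth = filters
        total += params
        print(f"Conv2D({filters}) + pool -> {size}x{size}x{depth}, {params} params")
    features = size * size * depth  # length of the Flatten() output
    print(f"Flatten -> {features} features")
    for units in dense_units:
        params = features * units + units  # weights + biases
        total += params
        print(f"Dense({units}) -> {params} params")
        features = units
    return total


total_params = summarize()
print("Total params:", total_params)
```

Under these assumptions the total comes to 3,790,457 parameters, of which about 3.6 million sit in the first Dense layer, because the flattened feature map (21x21x64 = 28,224 values) is connected to all 128 units. If your `model.summary()` shows a different number, compare layer by layer.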

There are quite a few parameters to learn! Now let's compile the model. I'll use binary_crossentropy as the loss function and Adam as the optimizer (again, you can use other optimizers as well):

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Finally, the most exciting part: training the model. We will train it for 10 epochs and pass in the validation data we created earlier to monitor the model's performance:

    model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), batch_size=64)

We can see that the training loss has dropped to 0.24 and the validation loss has decreased as well. What's next? It's time to make predictions!

To all the GoT and Avengers fans out there, this is my gift to you. Let's take the GoT and Avengers posters and feed them to our model. Download the GoT and Avengers posters before continuing.

GOT

Drive.google.com/file/d/1cfI…

Avengers

Drive.google.com/file/d/1buN…

Before making predictions, we need to preprocess these images using the same steps we saw earlier.

    img = image.load_img('GOT.jpg', target_size=(400, 400))
    img = image.img_to_array(img)
    img = img / 255

Now, we will use our trained model to predict the genres of the poster. The model returns a probability for each genre, and we will take the three genres with the highest probabilities.

    classes = np.array(train.columns[2:])                 # the 25 genre names
    proba = model.predict(img.reshape(1, 400, 400, 3))    # add a batch dimension
    top_3 = np.argsort(proba[0])[:-4:-1]                  # indices of the 3 highest probabilities
    for i in range(3):
        print("{}".format(classes[top_3[i]]) + " ({:.3})".format(proba[0][top_3[i]]))
    plt.imshow(img)
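The `np.argsort(proba[0])[:-4:-1]` expression in the prediction code is compact but easy to misread. Here is a small, self-contained check using hypothetical probabilities for six genres in place of real model output.

```python
import numpy as np

# Hypothetical probabilities for 6 genres (the real model outputs 25)
classes = np.array(["Action", "Comedy", "Drama", "Horror", "Romance", "Thriller"])
proba = np.array([[0.35, 0.05, 0.62, 0.10, 0.08, 0.41]])  # shape (1, n_classes), like model.predict

# argsort sorts ascending; [:-4:-1] walks backwards from the last element,
# giving the indices of the 3 largest probabilities in descending order
top_3 = np.argsort(proba[0])[:-4:-1]
for i in top_3:
    print(f"{classes[i]} ({proba[0][i]:.3})")
```

`np.argsort(proba[0])[::-1][:3]` is an equivalent, arguably more readable spelling of the same selection.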

That's great! Our model predicts Drama, Thriller, and Action as the genres of Game of Thrones. I think this is a very good classification. Now let's try our model on the Avengers poster. First, preprocess the image:

    img = image.load_img('avengers.jpeg', target_size=(400, 400))
    img = image.img_to_array(img)
    img = img / 255

Then make a prediction:

    classes = np.array(train.columns[2:])
    proba = model.predict(img.reshape(1, 400, 400, 3))
    top_3 = np.argsort(proba[0])[:-4:-1]
    for i in range(3):
        print("{}".format(classes[top_3[i]]) + " ({:.3})".format(proba[0][top_3[i]]))
    plt.imshow(img)

The genres predicted by our model are Drama, Action, and Horror. Again, these are very accurate results. Would the model do as well on Bollywood movies? Let's find out, using this poster for Golmaal 3.

You know what to do at this stage – load and preprocess the image:

    img = image.load_img('golmal.jpeg', target_size=(400, 400))
    img = image.img_to_array(img)
    img = img / 255

Then predict the genres for this poster:

    classes = np.array(train.columns[2:])
    proba = model.predict(img.reshape(1, 400, 400, 3))
    top_3 = np.argsort(proba[0])[:-4:-1]
    for i in range(3):
        print("{}".format(classes[top_3[i]]) + " ({:.3})".format(proba[0][top_3[i]]))
    plt.imshow(img)

Golmaal 3 is a comedy, and our model predicts Comedy as the most likely genre. The other two predicted genres are Drama and Romance, which is a reasonably accurate assessment. We can see that the model is able to predict a movie's genres from its poster alone.

6. Next steps and experimenting on your own

This is how we can solve a multi-label image classification problem. Even though we had only about 7,000 images to train on, our model performed quite well.

You can try collecting more posters for training. I recommend keeping the distribution of the dataset across genres as balanced as possible. Why?

If one genre appears in most of the training images, the model may overfit to that genre and end up predicting it for almost every new image. To avoid this, try to keep the distribution of genres balanced.
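Before collecting more data, it helps to measure how unbalanced the genres are. A minimal sketch, assuming a multi-hot label matrix like the `y` array built earlier (here replaced by a tiny made-up example): column sums give the number of posters per genre. The 0.5 cut-off for flagging an over-represented genre is an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical multi-hot label matrix: rows are posters, columns are genres
genres = np.array(["Action", "Comedy", "Drama", "Horror"])
y = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])

counts = y.sum(axis=0)   # number of posters tagged with each genre
share = counts / len(y)  # fraction of posters carrying each genre
for g, c, s in zip(genres, counts, share):
    print(f"{g:<8} {c:>3}  {s:.0%}")

dominant = genres[share > 0.5]  # genres present in most posters: candidates to rebalance
print("Over-represented:", list(dominant))
```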

These are some of the key areas where you can try to improve the model's performance. Can you think of anything else? Let me know in the comments!

7. End notes

In addition to genre prediction, multi-label image classification has many applications. For example, you can use this technique to automatically tag images. Suppose you want to predict the type and color of clothing in the image. You can build a multi-label image classification model that will help you predict both at the same time!


Article from the Alibaba Cloud developer community

Original link: developer.aliyun.com/article/715…