Welcome to follow my personal WeChat official account: Little Paper Crumbs

Image classification is a classic problem in machine learning, and it is also where deep learning made its name. It was AlexNet’s extraordinary performance in the ImageNet image classification competition in 2012 that made deep learning popular overnight and led to today’s explosion of artificial intelligence.

Since deep learning is extremely successful at image classification and the code is simple, image classification has become the standard introductory task for deep learning. Training a model usually takes only a few lines of code. But because it is so simple, we often lack an overall understanding of image classification.

This article presents the classic problem of image classification from the perspectives of problem definition, classification granularity, common datasets, evaluation criteria, and classic papers, so as to give readers a complete picture.

Problem definition

The definition of image classification is as follows:

Input: an image

Output: the image’s category

The category here refers to the category of the object contained in the image.
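This input/output contract can be sketched in a few lines. The linear “model” below is a stand-in with random weights (a hypothetical placeholder, not a trained classifier), but the mapping is the same: an image goes in, a category index comes out.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(image, weights):
    """Flatten the image, score each class, and return the most likely one."""
    logits = image.reshape(-1) @ weights  # one score per class
    return int(np.argmax(logits))         # output: a category index

image = rng.random((28, 28))              # input: a 28x28 grayscale image
weights = rng.random((28 * 28, 10))       # stand-in 10-class linear model
label = classify(image, weights)
print(label)                              # an integer in 0..9
```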

In traditional (single-label) image classification, an image contains one or more instances of objects from a single category. In multi-label image classification, an image generally contains instances of objects from multiple categories (for example, in multi-label cat-and-dog classification, many photos contain both cats and dogs).
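The distinction shows up directly in how labels are encoded. A sketch (the three class names are made up for illustration): single-label classification uses a one-hot vector with exactly one active class, while multi-label classification uses a multi-hot vector that can mark several categories at once.

```python
import numpy as np

num_classes = 3                      # e.g. 0 = cat, 1 = dog, 2 = bird (hypothetical)

# Single-label: exactly one category is active.
single_label = np.zeros(num_classes)
single_label[0] = 1                  # the image contains a cat
# -> [1. 0. 0.]

# Multi-label: several categories may be active at once.
multi_label = np.zeros(num_classes)
multi_label[[0, 1]] = 1              # the image contains both a cat and a dog
# -> [1. 1. 0.]
```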

We will focus only on traditional single-label image classification. Here is an example:

Given the image above as input, the correct output is category 9 (image from the MNIST dataset)

Granularity of image classification

Coarse-grained classification

Coarse-grained image classification refers to classification at the cross-species semantic level, such as cat vs. dog classification, the 1000-class ImageNet classification, and the 10-class CIFAR-10 classification. Because each category belongs to a different species or broad class, such tasks tend to have large inter-class variance and small intra-class variance.

The following is an illustration of the 10 categories in CIFAR-10, a typical example.

Fine-grained classification

Fine-grained image classification sits at a finer level than cross-species classification. It is typically the classification of subclasses within the same broad category, such as distinguishing different species of birds, different breeds of dogs, different car models, and so on.

The following is an example of subclass classification of birds. Fine-grained classification has small inter-class variance because the subclasses resemble each other, while intra-class variance is large due to differences in pose, lighting, and so on.

Instance-level classification

If we want to distinguish individuals, not just species or subclasses, it becomes a recognition problem, or instance-level image classification; the typical task is face recognition.

Common datasets

MNIST

MNIST is a handwritten digit dataset. Each image is one of the digits 0–9, the resolution is 28×28, and all images are grayscale, so each image is a 28×28×1 tensor. The training set contains 60,000 images and the test set 10,000. Here are 16 MNIST samples.
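The tensor layout described above can be sketched as follows, using random stand-in data instead of downloading the real dataset. Raw MNIST pixels are uint8 values in [0, 255]; adding a channel axis gives the 28×28×1 tensor, and scaling to [0, 1] is the usual preprocessing step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real MNIST training images: 60,000 grayscale 28x28 arrays.
train_images = rng.integers(0, 256, size=(60000, 28, 28), dtype=np.uint8)

# Add the channel axis and normalize to [0, 1].
x = train_images[..., np.newaxis].astype(np.float32) / 255.0
print(x.shape)  # (60000, 28, 28, 1)
```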

Fashion-MNIST

Fashion-MNIST was proposed to make it easier for researchers to test the effectiveness of algorithms. Because the features of the MNIST dataset are too simple, it is hard to tell whether an algorithm has actually improved or the dataset is just too easy. Fashion-MNIST is exactly the same as MNIST except that the categories are changed to clothing and shoes: the same 28×28 resolution, grayscale images, 60,000 training images, and 10,000 test images. Algorithms tested on MNIST can generally be trained and tested on Fashion-MNIST without any modification. Below are 30 samples of Fashion-MNIST.

CIFAR-10

The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 categories, with 6,000 images per category. 50,000 images are used as the training set and 10,000 as the test set.

The CIFAR-10 dataset is divided into 5 training batches and 1 test batch, each containing 10,000 images. The test batch contains exactly 1,000 randomly selected images from each category. The training batches contain the remaining 50,000 images in random order; an individual training batch may contain more images from one category than another, but between them the training batches contain exactly 5,000 images from each category.
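In the Python version of CIFAR-10, each batch is a pickled dict whose "data" array has shape 10000×3072: for every image, 1,024 red values, then 1,024 green, then 1,024 blue. A sketch of decoding one batch into images, with a fake in-memory batch standing in for a real data_batch_* file so the example is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake batch with the same layout as a real CIFAR-10 batch file.
fake_batch = {
    "data": rng.integers(0, 256, size=(10000, 3072), dtype=np.uint8),
    "labels": rng.integers(0, 10, size=10000).tolist(),
}

def batch_to_images(batch):
    """Reshape flat 3072-value rows into 32x32 RGB images (height, width, channel)."""
    data = batch["data"].reshape(-1, 3, 32, 32)  # (N, channel, row, col)
    return data.transpose(0, 2, 3, 1)            # (N, 32, 32, 3)

images = batch_to_images(fake_batch)
print(images.shape)  # (10000, 32, 32, 3)
```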

Here are the classes in the dataset, along with 10 random images from each class:

CIFAR-100

The CIFAR-100 dataset is just like CIFAR-10 except that it has 100 classes, each containing 600 images. There are 500 training images and 100 test images per class. The 100 classes are grouped into 20 superclasses. Each image comes with a “fine” label (the class it belongs to) and a “coarse” label (the superclass it belongs to).

ImageNet

The ImageNet dataset itself has about 15 million photos in 22,000 classes. The large-scale computer vision challenge based on ImageNet, ILSVRC, consists of five tasks: image classification, image localization, object detection, video object detection, and scene classification.

Because its image classification task is so well known, “ImageNet” usually refers to this classification sub-task, which has 1,000 categories, 1,281,167 photos in the training set, 50,000 in the validation set, and 100,000 in the test set; the images are in color and of varying sizes.

Performance criteria

The criteria for image classification are relatively simple.

Top-1 accuracy

For an image, the class with the highest predicted probability is taken as the prediction. The fraction of images predicted correctly is the top-1 accuracy.

Top-5 accuracy

A prediction for an image is considered correct if the five classes with the highest probabilities contain the correct answer. The fraction of images correct in this sense is the top-5 accuracy.
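Both metrics can be computed the same way from a model’s per-class scores. A sketch using two made-up score rows over six classes (small numbers chosen so the result is easy to check by hand; real evaluations use 1,000 ImageNet classes):

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]  # indices of the k best classes per row
    hits = [label in row for row, label in zip(topk, labels)]
    return float(np.mean(hits))

scores = np.array([
    [0.50, 0.20, 0.10, 0.08, 0.07, 0.05],  # true label 0: top-1 hit
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true label 4: in the top 5, not top 1
])
labels = [0, 4]

print(topk_accuracy(scores, labels, 1))  # 0.5
print(topk_accuracy(scores, labels, 5))  # 1.0
```

Note that top-5 accuracy is always at least as high as top-1, since the top-1 prediction is contained in the top 5.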

Classic algorithms

Image classification is a fundamental task of computer vision, and a large number of classic image classification algorithms have greatly promoted its development. A timeline of the classic algorithms is shown in the figure below, marking when each classic model appeared.

The figure above looks a bit messy; grouping models from the same series together and rearranging them gives the figure below:

Here I only list the classic algorithms; I will go through the classic papers one by one later. I also recommend that you read each of them yourself (except perhaps LeNet): every one is a classic.

Conclusion

This article has presented image classification, a classic computer vision task, from the perspectives of problem definition, classification granularity, common datasets, evaluation criteria, and classic papers. I believe a complete understanding of it will help everyone get a better start in learning deep learning.