Kaggle – So Easy! 100 lines of code for a Top 5% finish in an image classification competition


Note: This article was originally published by Teacher Peng, with authorization from Zhai Huiliang, on the public account "July Online Lab"; it has been edited by July and published on this blog. Code: github.com/pengpaiSH/K…


Preface

In my personal experience, there are five compulsory courses for learning AI well: mathematics, data structures, Python data analysis, ML, and DL. Beyond the compulsory courses, there are five electives to choose from: NLP, CV, DM, quantitative trading, and Spark. July Online offers both the required and elective courses, plus LeetCode and Kaggle practice, and finally open-source experiments.

Today, let's take a look at how to reach the Top 5% of a Kaggle image classification competition with 100 lines of code.


1. Introduction to the NCFM image classification task

To protect and monitor the marine environment and ecological balance, The Nature Conservancy invited participants from the Kaggle[1] community to develop machine learning algorithms that automatically classify and identify the species of fish, such as different species of tuna and sharks, in images captured by cameras on ocean-going fishing boats. The Nature Conservancy provided a training set of 3,777 labeled images divided into eight categories, seven of which are different species of fish and one of which contains no fish; each image belongs to exactly one of the eight categories.

Figure 1 shows several sample images from the dataset. As can be seen, in some images the fish to be recognized occupies only a small part of the whole image, which makes recognition very challenging. In addition, to measure the effectiveness of the algorithms, an additional 1,000 images were provided as a test set, and contestants were asked to design an image recognition algorithm that identifies, as accurately as possible, which of the eight categories each of the 1,000 test images belongs to. The Kaggle platform provides a Leaderboard for each contest: the more accurate a contestant's predictions, the higher their ranking on the Leaderboard.

 

Figure 1. NCFM image classification contest

 

2. Problem analysis and solution ideas

2.1 Convolutional Neural Networks (ConvNets)

From the description of the problem, we can see that the NCFM contest is a typical "single-label image classification" problem: given an image, the system needs to predict which of the pre-defined categories it belongs to. In the field of computer vision, the core technical framework for solving this kind of problem is deep learning; in particular, for image data, Convolutional Neural Networks (ConvNets) are the deep learning architecture of choice.

In general, a convolutional neural network is a special neural network structure: through convolution operations it can automatically learn image features and select the useful visual features so as to maximize image classification accuracy.


Figure 2. Convolutional neural network architecture

Figure 2 shows the structure of a simple convolutional neural network for cat-and-dog recognition. The block of nodes at the bottom (also the largest) is the Input Layer of the network, which is usually used to read an image as the network's data input. The block of nodes at the top is the Output Layer, whose function is to predict and output the category of the image that was read; here the output layer has only two neural computing units because we only need to distinguish cats from dogs. The layers between the input and output layers are called Hidden Layers, and there are three of them in the figure. As mentioned above, the feature learning for image classification is carried out by convolution operations, so the hidden layers are also called Convolutional Layers.

Therefore, the structure of the input layer, convolutional layers, and output layer, together with their corresponding parameters, constitutes a typical convolutional neural network. Of course, the convolutional neural networks we use in practice are more complex than this example. Since the ImageNet competition in 2012, new network structures have emerged almost every year; widely recognized networks include AlexNet[5], VGG-Net[6], GoogLeNet[7], Inception v2–v4[8, 9], ResNet[10], etc.
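To make this concrete, here is a minimal toy sketch in Keras (the framework used later in this article) of a network in the spirit of Figure 2. The filter counts and input size are illustrative assumptions, not taken from any real model:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# A toy ConvNet in the spirit of Figure 2: input -> 3 conv blocks -> 2-way output.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))  # two output neurons: cat vs. dog
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
```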

2.2 An effective network training technique — fine-tuning

There is no need to construct a deep network from scratch and experiment with parameters one by one, because many published papers have already done this validation for us; we just need to stand on the shoulders of our predecessors and choose a suitable network structure. Another important reason for choosing a recognized network structure is that almost all of these networks come with parameter weights pre-trained on the large-scale ImageNet[11] dataset. This is very important! We only have a few thousand training samples, while a deep network has a huge number of parameters, meaning the number of training images is far smaller than the parameter search space. Therefore, if we simply initialize the deep network randomly and train it with a few thousand images, it is very easy to produce "overfitting".

Overfitting means that because the deep network has only seen a small number of samples, it can only recognize those few images; it loses generalization ability and cannot recognize other similar images it has not seen before. To solve this problem, we usually take network parameters that have already been trained on millions or even tens of millions of images as our initialization. You can imagine that such a set of parameters has "seen" a vast number of pictures, so the generalization ability is greatly improved and the extracted features are more robust and effective.

Then we can continue training with the 3,000+ labeled pictures of marine fish. Note that to avoid skipping over the optimal solution, the training step size (actually the "learning rate") should be small, which is why this training strategy is called "fine-tuning".

There are some rules of thumb for fine-tuning a pre-trained network with our own annotated data. Taking Figure 3 as an example, assume that our network is a 7-layer structure similar to AlexNet, in which the first 5 layers are convolutional layers and the last 2 layers are fully connected layers.

  • (1) We first fine-tune only the last Softmax classifier layer. Suppose the original network was trained to classify 1,000 classes of objects (the ImageNet targets), while our data has only 10 category labels; the number of neurons in the last output layer (FC8) therefore becomes 10. We use a very small learning rate to learn the weight matrix between FC7 and FC8 and fix the weights of all layers before it (see the hedged sketch after Figure 3 below);
  • (2) Once the network begins to converge, we widen the scope of fine-tuning and update the weights of both fully connected layers, i.e. those between FC6 and FC7 as well as between FC7 and FC8, while fixing the weights of all convolutional layers before FC6;
  • (3) We extend the scope of fine-tuning to the last convolutional layer, C5;
  • (4) We extend the scope of fine-tuning to more convolutional layers. In practice, however, the features extracted by the front convolutional layers are more low-level and universal, while the later convolutional and fully connected layers are more specific to the dataset, so the first few convolutional layers are often left alone.

Figure 3. Basic steps of network fine-tuning
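To make the staged procedure concrete, below is a hedged Keras sketch of steps (1) and (2). AlexNet does not ship with Keras, so VGG16 (also a stack of convolutional layers followed by fully connected layers) stands in; the layer names fc6–fc8, the 10-class setup, and the learning rate are assumptions for illustration, not the competition code:

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.optimizers import SGD

# Load a network pre-trained on ImageNet and replace its 1000-way
# classifier with a new 10-way one, matching the example in the text.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = Flatten()(base.output)
x = Dense(4096, activation='relu', name='fc6')(x)
x = Dense(4096, activation='relu', name='fc7')(x)
out = Dense(10, activation='softmax', name='fc8')(x)   # 10 classes, as in step (1)
model = Model(base.input, out)

# Step (1): learn only the FC7 -> FC8 weights, freeze everything else.
for layer in model.layers:
    layer.trainable = (layer.name == 'fc8')
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),   # very small learning rate
              loss='categorical_crossentropy')
# ... model.fit(...) until the network starts to converge ...

# Step (2): also unfreeze FC6 and FC7; all convolutional layers stay fixed.
for layer in model.layers:
    layer.trainable = layer.name in ('fc6', 'fc7', 'fc8')
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='categorical_crossentropy')
# ... continue training; steps (3) and (4) unfreeze conv layers the same way.
```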


3. Algorithm implementation and analysis

An open-source implementation of this contest has been shared on the NCFM forum for reference (https://www.kaggle.com/c/the-nature-conservancy-fisheries-monitoring/discussion/26202).

Here we analyze the logical structure of the model training file train.py.

  • Import related modules and set the parameters — Figure 4;
  • Construct the Inception_V3 deep convolutional network, initialize it with parameters pre-trained on the large-scale ImageNet image dataset, and define a callback that saves the best model on the validation set during training — Figure 5;
  • Augment the training images with Data Augmentation, a common technique for controlling overfitting; the simple idea is that flipping an image horizontally, cropping its edges and corners, or making it darker or brighter does not change its category — Figure 6;
  • Train the Inception_V3 network model — Figure 7.

 

Figure 4. Imports and parameter settings
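Since Figure 4 is shown as an image, here is a sketch of what such an import-and-settings block could look like; all values and paths are illustrative assumptions rather than the exact settings of the forum code:

```python
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint

# Hyper-parameters (illustrative values, not necessarily those of Figure 4).
learning_rate = 1e-4
img_width, img_height = 299, 299     # InceptionV3's expected input size
nb_classes = 8                       # 7 fish species + "no fish"
nb_epoch = 25
batch_size = 32
train_data_dir = 'data/train_split'  # hypothetical directory layout:
val_data_dir = 'data/val_split'      # one sub-folder per class
```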

 

Figure 5. Build the Inception_V3 network and load the pre-trained parameters
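Continuing the sketch above, the model construction and best-model callback of Figure 5 might look roughly like this (again an assumption-laden sketch, not the verbatim forum code):

```python
# Build InceptionV3 initialized with ImageNet weights, replacing the
# original 1000-way top classifier with a new 8-way softmax.
base = InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base.output)
predictions = Dense(nb_classes, activation='softmax')(x)
model = Model(base.input, predictions)

model.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])

# Callback that keeps the weights that perform best on the validation set.
best_model = ModelCheckpoint('inception_v3_best.h5', monitor='val_loss',
                             save_best_only=True, verbose=1)
```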

 

Figure 6. Loading the training and validation sets with data augmentation
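A possible shape of the augmented data loading of Figure 6, continuing the same sketch; the specific transformation ranges are guesses:

```python
# Random shears/zooms/shifts/flips change the pixels but never the class label.
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   shear_range=0.1, zoom_range=0.1,
                                   rotation_range=10., width_shift_range=0.1,
                                   height_shift_range=0.1, horizontal_flip=True)
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(
    train_data_dir, target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='categorical')
val_generator = val_datagen.flow_from_directory(
    val_data_dir, target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='categorical')
```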

 

Figure 7. Model training
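And the training step of Figure 7, in Keras 2 style, might reduce to a single call:

```python
# Train; the checkpoint callback keeps the best validation-set weights.
model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples // batch_size,
                    epochs=nb_epoch,
                    validation_data=val_generator,
                    validation_steps=val_generator.samples // batch_size,
                    callbacks=[best_model])
```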

 

4. Tips for improving your ranking

Once we have trained the model, we use it to predict the categories of the test images. The code in predict.py from the forum predicts the fish categories and generates the submission file. Here we share two simple and effective techniques that are common in machine learning and image recognition competitions. Their ideas are based on averaging and voting, and the principle behind them can be summed up in one sentence: the eyes of the masses are sharp!
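As a hypothetical illustration of the submission step (not the actual predict.py), assembling the CSV could look like this; `probs` and `filenames` are assumed to come from the trained model and the test folder:

```python
import numpy as np
import pandas as pd

# Kaggle expects one probability column per class. `probs` is the
# (n_images x 8) prediction matrix; `filenames` are the test image names.
classes = sorted(train_generator.class_indices,
                 key=train_generator.class_indices.get)
probs = np.clip(probs, 0.02, 0.98)  # clipping extremes softens the log-loss penalty
submission = pd.DataFrame(probs, columns=classes)
submission.insert(0, 'image', filenames)
submission.to_csv('submission.csv', index=False)
```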

Tip 1: Average multiple augmented test samples with the same model

This trick means that when we use a trained model to predict a test image, we can apply data augmentation techniques similar to those used in training to generate multiple variants of that image, feed all of them into our trained network for prediction, and take the category with the most votes as the final result. predict_average_augmentation.py in the GitHub repository implements this idea, and it works quite well.
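A minimal sketch of this idea follows (the repository's predict_average_augmentation.py is the authoritative version); `model`, `x_test` (the raw test images as a numpy array), and `nb_classes` are assumed to exist, and averaging probabilities is used as a soft form of voting:

```python
import numpy as np
from keras.applications.inception_v3 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# Test-time augmentation: average predictions over several augmented views.
tta_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                 shear_range=0.1, zoom_range=0.1,
                                 horizontal_flip=True)
nb_aug, batch_size = 5, 32
steps = int(np.ceil(len(x_test) / float(batch_size)))

preds = np.zeros((len(x_test), nb_classes))
for _ in range(nb_aug):
    flow = tta_datagen.flow(x_test, batch_size=batch_size, shuffle=False)
    preds += model.predict_generator(flow, steps=steps)
preds /= nb_aug  # the averaged prediction is the "vote" of the 5 views
```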

Tip 2: Train multiple models with cross-validation

Remember how we divided the 3,000+ images into a training set and a validation set? There are many possible ways to divide them. A common partition is to shuffle the order of the images and split them evenly into K parts; we then have K combinations of <training set, validation set>: each time, one part serves as the validation set and the remaining K-1 parts as the training set. We can therefore train K models in total, feed each test image into all K models for prediction, and choose the category with the most votes as the final prediction. This approach is called K-fold cross-validation. Figure 9 shows the data partition for 5-fold cross-validation.

 

Figure 9. Five-fold cross-validation
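A hedged sketch of Tip 2, using scikit-learn's KFold for the split; `build_model()` (returning a freshly compiled network), `x_all`, `y_all`, and `x_test` are hypothetical placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold

# One model per fold; their averaged test predictions act as soft votes.
kf = KFold(n_splits=5, shuffle=True, random_state=2017)
test_preds = []
for train_idx, val_idx in kf.split(x_all):
    model = build_model()
    model.fit(x_all[train_idx], y_all[train_idx],
              validation_data=(x_all[val_idx], y_all[val_idx]),
              epochs=25, batch_size=32)
    test_preds.append(model.predict(x_test))

final_pred = np.mean(test_preds, axis=0)  # average of the K = 5 models
```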

Of course, Tips 1 and 2 can also be used together. If we perform 5-fold cross-validation and use 5 augmented copies of each test image, it is easy to see that each test image receives 5 × 5 = 25 votes. In this way, we can move up the rankings.
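Reusing the pieces above, combining both tips is just a double loop (sketch only; `fold_models` is the hypothetical list of the five cross-validated models from Tip 2, and `tta_datagen`, `steps`, `x_test`, `nb_classes` are as sketched under Tip 1):

```python
# 5 folds x 5 augmented views = 25 predictions per test image, averaged.
final_pred = np.zeros((len(x_test), nb_classes))
for fold_model in fold_models:
    for _ in range(5):
        flow = tta_datagen.flow(x_test, batch_size=32, shuffle=False)
        final_pred += fold_model.predict_generator(flow, steps=steps)
final_pred /= 25.0
```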

 

5. Afterword

We reviewed the typical structure and characteristics of deep convolutional networks and learned how to train a deep network with the gradient descent algorithm. We then showed how to use fine-tuning with the Inception_V3 network to tackle Kaggle's NCFM marine fish classification competition, and how two simple but effective tricks pushed our ranking into the Top 5%.

If readers are interested in the contest and want to move further up the rankings, one method worth trying is Object Detection. Think about it: we only need to distinguish the species of marine fish, yet because the camera sits at varying distances, the fish often occupies only a small fraction of the image's pixels, and most of the area is hull, mast, or ocean background noise. If an algorithm could first "detect" the fish in each photo, we can imagine that the accuracy of the deep network would improve further; this part of the work is left to interested readers to explore on their own.

Teacher Peng, July Online, May 10, 2017.


References

  1. [1] www.kaggle.com/
  2. [2] cs231n.github.io/neural-netw…
  3. [3] github.com/tensorflow/…
  4. [4] github.com/fchollet/ke…
  5. [5] ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
  6. [6] Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.
  7. [7] Going Deeper with Convolutions. CVPR 2015.
  8. [8] Rethinking the Inception Architecture for Computer Vision. CVPR 2016.
  9. [9] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. ICLR 2016.
  10. [10] Deep Residual Learning for Image Recognition. CVPR 2016.
  11. [11] www.image-net.org/
  12. July Online Kaggle case course
  13. Model analysis and model fusion