• Implementation of a Convolutional Neural Network using Python and Keras
  • Original article by Rubikscode
  • Translation from: The Gold Project
  • Permalink: github.com/xitu/gold-m…
  • Translator: JohnJiangLA
  • Proofreaders: Gladysgong, Starrier

Have you ever wondered how Snapchat detects faces, or how a self-driving car knows where the road is? You guessed it: they use convolutional neural networks, neural networks designed for computer vision. In the previous article, we looked at how they work. We discussed the layers of these networks and their functions. Essentially, the additional layers of a convolutional neural network process the image into a standard format that a neural network can handle. The first step is to detect certain features or attributes, which is done by the convolutional layer.

This layer uses filters to detect low-level features, such as edges and curves, as well as higher-level features, such as faces and hands. The convolutional neural network then uses additional layers to remove linearity from the image, something that could otherwise cause overfitting. Once linearity is removed, further layers downsample the image and reduce the dimensionality of the data. Finally, this information is passed to the part that, in convolutional neural networks, is called the fully connected layer. Again, the goal of this article is to show how to implement these layers; more details about the additional layers, how they work, and exactly what they are for can be found in the previous article.

Before we start solving the problem and writing code, please make sure your environment is configured correctly. As with all previous articles in this series, I use Python 3.6. I also use Anaconda and Spyder, but you can use any other IDE. Most importantly, install TensorFlow and Keras. Instructions for installing and using TensorFlow can be found here, and for Keras here.

The MNIST dataset

In this article, we will train our network to recognize digits in images. For this, we will use another well-known dataset, the MNIST dataset. Built on its predecessor, NIST, this dataset consists of a training set of 60,000 samples and a test set of 10,000 images of handwritten digits. All digits are normalized and centered, and the image size is fixed, so preprocessing of the image data has already been minimized. That is why this dataset is so popular: it is considered the “Hello World” of the convolutional neural network world.

Samples from the MNIST dataset

In addition, using convolutional neural networks, we can get results that are not far from human judgment. The record is currently held by the Parallel Computing Center (Khmelnitsky, Ukraine). They used an ensemble of only five convolutional neural networks and managed to get the error rate down to 0.21%. Pretty cool, huh?

Importing libraries and data

As in the previous articles in this series, we start by importing all the necessary libraries. Some of them are familiar, but some need further explanation.

As you can see, we will be using NumPy, the library we used for manipulating multidimensional arrays and matrices in previous examples. We will also be using some of the Keras features we have used earlier in this series, as well as some new ones. For example, Sequential and Dense are used to create the model and the standard layers, such as the fully connected layer.

In addition, we will use some new Keras classes. Conv2D is the class used to create the convolutional layer, MaxPooling2D is the class for creating the pooling layer, and Flatten is the class for reducing dimensionality. We also use to_categorical from keras.utils. This function converts a vector of integers into a binary class matrix, i.e., it is used for one-hot encoding. Finally, note that we will use Matplotlib to display the results.
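Based on the libraries and classes named above, the import section might look like the sketch below (exact module paths can vary slightly between Keras versions):

```python
# Numerical arrays and plotting
import numpy as np
import matplotlib.pyplot as plt

# Keras: the dataset, the model container, and the layer classes
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.utils import to_categorical  # one-hot encoding helper
```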

After importing the necessary libraries and classes, we need to work with the data. Fortunately, Keras provides the MNIST dataset, so we don’t need to download it. As mentioned earlier, all of these images have already been partially preprocessed: they have the same size and the digits are properly positioned. So let’s import this dataset and prepare the data for our model:
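The loading and preprocessing steps described here might look like this sketch (variable names such as `X_train` are my own, not necessarily the author's):

```python
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical

# Load MNIST into training and test matrices
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Get the image dimensions from the shape attribute
img_rows, img_cols = X_train.shape[1], X_train.shape[2]

# Reshape to (samples, height, width, channels); a single grayscale
# channel instead of the usual three (RGB) keeps things simple
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)

# Normalize pixel values to [0, 1]
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# One-hot encode the labels (digits 0-9 -> 10 classes)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```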

As you can see, we import the MNIST dataset from Keras’s datasets module and load the data into training and test matrices. From there, we get the image dimensions using the shape attribute and reshape the input data so that each input image has a single channel. Basically, we use only one channel per image instead of the usual three (RGB), which simplifies the implementation. We then normalize the data in the input matrices, and finally use to_categorical to encode the output matrices.

Creating the model

Now that the data is ready, we can begin the most interesting part: creating the model.

Of course, we use Sequential for this, and first add the convolutional layer with the Conv2D class. As you can see, this class takes quite a few arguments, so let’s go through them. The first parameter is the number of filters to use, i.e., the number of features to detect. It is common to start at 32 and work upwards, and that is exactly what we do: we detect 32 features in the first convolutional layer, 64 in the second, and 128 in the third. The filter size is defined by the next parameter, kernel_size, and we have chosen a 3×3 filter.

For the activation function, we use the rectifier (ReLU) function. This way, nonlinearity increases naturally in each convolutional layer. Another way to achieve this is to use LeakyReLU from keras.layers.advanced_activations. Unlike the standard rectifier, which clamps all values below zero to zero, it has a small negative slope. If you decide to use it, note that you must use a linear activation in Conv2D. Here is an example of this approach:
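A minimal sketch of that variant: the convolutional layer gets a linear activation and LeakyReLU follows as a separate layer. In recent Keras versions LeakyReLU is imported directly from keras.layers rather than keras.layers.advanced_activations; the default negative slope is used here.

```python
from keras.models import Sequential
from keras.layers import Conv2D, LeakyReLU

model = Sequential()
# Linear activation in Conv2D, so LeakyReLU can be applied afterwards
model.add(Conv2D(32, kernel_size=(3, 3), activation='linear',
                 input_shape=(28, 28, 1)))
model.add(LeakyReLU())  # small negative slope instead of a hard zero
```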

We got a little off topic, so let’s get back to Conv2D and its parameters. Another very important parameter is input_shape, which defines the dimensions of the input image. As mentioned earlier, we use only one channel, which is why the final dimension of our input_shape is 1. The other dimensions are the ones we extracted from the input images.

In addition, we added other layers to the model. The Dropout layer helps prevent overfitting, and after it we added the pooling layer using the MaxPooling2D class. Obviously, this layer uses the max-pooling algorithm, and the size of the pooling filter is 2×2. The pooling layers are followed by the flattening layer and finally by the fully connected part, for which we added two Dense layers. Finally, we compiled the model with the Adam optimizer.

If some of these concepts are unclear, check out the previous article, which explains how the convolutional layers work. Also, if you are confused about any of the Keras concepts, this article will help you.

Training

Good, our data is preprocessed and our model is set up. Let’s put them together and train the model. To do this, we pass in the input matrices and define batch_size and the number of epochs. We also define validation_split, a parameter that specifies what fraction of the training data should be used as validation data.

Basically, the model holds back this part of the training data and uses it to calculate the loss and other model metrics at the end of each epoch. This is different from the test data, which we use only once, after training is complete.

After our model is trained, we call the evaluate method and pass in the test set. From this we can determine the accuracy of our convolutional neural network.

Making predictions

Another thing we can do is collect the network’s predictions on the test dataset, so we can compare the predicted results with the actual ones. For this we use the predict method, which can also be used to predict a single input.

Results

Let’s use the predictions we have just collected for the final step of our implementation: displaying the predicted and the actual digits together with the images they belong to. Basically, we will build a nice visualization of our implementation. After all, we are processing images here.

Here, we used Pyplot to display ten images together with the actual result and our prediction for each. When we run the implementation, it looks like this:

We ran 20 epochs and got an accuracy of 99.39%. Not bad, but there is room for improvement.

Conclusion

Convolutional neural networks are a very interesting branch of computer vision and one of its most influential innovations. In this article we implemented a simplified version of these networks and used it to detect digits in the MNIST dataset.

Thanks for reading!
