This is the seventh day of my participation in the First Challenge 2022.

Hello, I’m Ding Xiaojie. In this article we learn how to classify handwritten digits using Python’s Keras library.

  • Objective: To classify grayscale images of handwritten digits (28 × 28 pixels) into 10 categories (0 to 9).
  • Data source: MNIST dataset, including 60,000 training images and 10,000 test images.

What is Keras

Keras is a high-level neural network API written in pure Python, and it is intended for Python development only. It runs on top of a backend such as TensorFlow or Theano (a machine learning framework developed at the University of Montreal in Canada). By wrapping these backends, Keras supports quick experimentation: we can turn ideas into results without paying too much attention to low-level details. It is also flexible and relatively easy to learn.

Install Keras

Install the Keras library from the Douban PyPI mirror.

pip install -i https://pypi.douban.com/simple Keras

Handwritten digit classification

Import data set

Load the MNIST dataset in Keras.

from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The training set

  • train_images: Training set sample
  • train_labels: Training set label

The test set

  • test_images: Test set sample
  • test_labels: Test set label

View the shape of the data set

train_images.shape
# Out: (60000, 28, 28)
train_labels.shape
# Out: (60000,)
test_images.shape
# Out: (10000, 28, 28)
test_labels.shape
# Out: (10000,)

Build the network

The layer is the core building block of a neural network. It is a data processing module that you can think of as a data filter: some data goes in, and it comes out in a more useful form. Most of deep learning consists of chaining simple layers together to perform progressive data distillation. A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters, the layers.

First import the required modules and construct a Sequential model, which is a linear stack of network layers: the output of each layer feeds directly into the next.

from keras import models
from keras import layers

network = models.Sequential()

Layers are added to the model using the add method.

# input_shape must be a tuple, so the trailing comma in (28 * 28,) is required
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))

The main parameters

  • units: the number of neurons in the layer, that is, the dimension of the output space
  • activation: the activation function; if not specified, no activation is applied (i.e. linear activation: a(x) = x)
  • input_shape: the shape of the input tensor for one sample, here a vector of 28 * 28 = 784 values

ReLU is the rectified linear unit, which returns max(x, 0) element by element.
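To make these parameters concrete, here is a minimal NumPy sketch of what a Dense layer with a ReLU activation computes, namely relu(dot(input, W) + b). The names dense_relu, W and b and the small random weights are assumptions for illustration only; this is not how Keras implements the layer internally.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def dense_relu(inputs, W, b):
    # inputs: (batch, in_features), W: (in_features, units), b: (units,)
    return relu(inputs @ W + b)

rng = np.random.default_rng(0)
in_features, units = 28 * 28, 512

# Hypothetical weights; a real layer learns these during training
W = rng.normal(scale=0.01, size=(in_features, units))
b = np.zeros(units)

# Four fake flattened images with values in [0, 1)
batch = rng.random((4, in_features))
out = dense_relu(batch, W, b)

# Number of trainable values in this layer: 784 * 512 + 512
n_params = W.size + b.size
print(out.shape, n_params)
```

With 784 inputs and 512 units, such a layer holds 401,920 trainable values, and every output is non-negative because of the ReLU.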

Add a second layer, a 10-way softmax layer. The softmax function converts the outputs of a multi-class classifier into a probability distribution: each value lies in the range [0, 1] and the values sum to 1.

network.add(layers.Dense(10, activation='softmax'))
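The softmax step can be sketched in plain Python as follows. The input scores are made-up values for three classes rather than real network outputs, and subtracting the maximum is a common numerical-stability habit, not something the text above requires.

```python
import math

def softmax(scores):
    # Subtracting the maximum score avoids overflow in exp()
    # and does not change the resulting probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for 3 classes
probs = softmax(scores)
print(probs, sum(probs))
```

The largest score receives the largest probability, and the probabilities add up to 1 (up to floating-point rounding).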

Compile

Before training the network, we also need to select three parameters for the compile step.

  • Loss Function: How the network measures performance on training data.
  • Optimizer: Mechanism for updating networks based on training data and loss functions.
  • Metrics to monitor during training and testing: This example is concerned only with accuracy, which is the percentage of correctly classified images.
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

The main parameters

  • rmsprop: The RMSProp optimizer is an improvement on the AdaGrad algorithm. It divides the gradient by the square root of a moving average of recent squared gradients.
  • categorical_crossentropy: Categorical cross-entropy between the one-hot true labels y and the predicted probabilities ŷ, computed as:

$$-\sum_{i=1}^{\text{outputsize}} y_{i} \times \log \hat{y}_{i}$$
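As a small worked example of this formula, the sketch below computes the loss for one sample in plain Python. The eps clamp and the example probabilities are assumptions chosen for illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot list, y_pred: predicted probabilities.
    # eps guards against log(0); its value is an illustrative choice.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]           # the true class is class 1
confident = [0.05, 0.90, 0.05]     # hypothetical good prediction
wrong = [0.70, 0.20, 0.10]         # hypothetical poor prediction

loss_confident = categorical_crossentropy(y_true, confident)  # equals -log(0.9)
loss_wrong = categorical_crossentropy(y_true, wrong)          # equals -log(0.2)
print(loss_confident, loss_wrong)
```

Because the label is one-hot, only the predicted probability of the true class contributes, so the loss shrinks as that probability approaches 1.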

Loss functions and optimizers will be covered in more detail in future articles.
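In the meantime, here is a rough sketch of the RMSProp idea for a single scalar weight. The function name rmsprop_step, the hyperparameter values and the toy objective f(w) = w² are assumptions for illustration; the Keras optimizer has more options than shown here.

```python
import math

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    # Update the moving average of squared gradients
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    # Divide the gradient by the root of that average before stepping
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq

# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w
w, avg_sq = 1.0, 0.0
first_w, _ = rmsprop_step(w, 2.0 * w, avg_sq, lr=0.01)

for _ in range(2000):
    w, avg_sq = rmsprop_step(w, 2.0 * w, avg_sq, lr=0.01)

print(first_w, w)
```

Because the gradient is rescaled by its own recent magnitude, the step size stays on the order of the learning rate, and w ends up close to the minimum at 0.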

Data preprocessing

Before training, we preprocess the data: we reshape it into the shape the network expects and scale it so that all values lie in the range [0, 1].

For example, the training images are stored in an array of type uint8 with shape (60000, 28, 28) and values in the range [0, 255]. We need to transform it into a float32 array of shape (60000, 28 * 28) with values in the range [0, 1].

# Flatten each 28 x 28 image into a vector of 784 values, then scale to [0, 1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
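The same transformation can be checked on a tiny stand-in array without downloading MNIST. The array fake_images below is random data invented for this sketch, not part of the dataset.

```python
import numpy as np

# Four fake 28 x 28 uint8 "images" standing in for the MNIST arrays
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((4, 28 * 28))   # shape (4, 784), still uint8
scaled = flat.astype('float32') / 255      # float32 values in [0, 1]

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
```

Writing reshape((-1, 28 * 28)) instead would let NumPy infer the number of images, which avoids hard-coding 60000 and 10000.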

One-hot encoding of the class labels

Now we need to encode the labels, converting each class label into a binary matrix representation (containing only 0 and 1).

Let's look at a simple example. We define a list of class labels, labels, and convert it to one-hot vectors with keras.utils.to_categorical.

from keras.utils import to_categorical

labels = [0, 1, 2, 3, 4, 5]
convert_to_one_hot = to_categorical(labels)
convert_to_one_hot

In the result, each value in the original class labels is converted to a row vector of the matrix. For example, the 0 in the original labels becomes [1. 0. 0. 0. 0. 0.].
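To show what the conversion does, here is a dependency-free sketch of one-hot encoding. The helper one_hot is a hypothetical stand-in written for this example; it returns nested lists, whereas to_categorical returns a NumPy array.

```python
def one_hot(labels, num_classes=None):
    # By default, infer the number of classes from the largest label
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]

labels = [0, 1, 2, 3, 4, 5]
encoded = one_hot(labels)
for row in encoded:
    print(row)
# [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Each row contains a single 1 at the index of its label, so six labels with six classes produce a 6 × 6 matrix.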

Let’s return to our example and one-hot encode the labels.

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

We can look at the current shape of train_labels.

# Before the conversion, train_labels.shape was (60000,)
train_labels.shape
# Out: (60000, 10)

Train the network

Now we train the network by calling its fit method.

network.fit(train_images, train_labels, epochs=5, batch_size=128)

The main parameters

  • train_images: Training set sample
  • train_labels: Training set label
  • epochs: the number of passes over the entire training set
  • batch_size: the number of samples used for each gradient update. Deep learning models are generally trained with mini-batch SGD, that is, batch_size samples are taken from the training set for each update step
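The arithmetic behind epochs and batch_size can be sketched as below. The helper iter_batches is a hypothetical illustration of slicing the data in order; the sketch ignores any shuffling that happens during real training.

```python
import math

num_samples, batch_size, epochs = 60000, 128, 5

# Number of gradient updates in one pass over the training set
steps_per_epoch = math.ceil(num_samples / batch_size)
# The final batch is smaller because 60000 is not a multiple of 128
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

def iter_batches(n, size):
    # Yield (start, stop) index pairs that together cover range(n)
    for start in range(0, n, size):
        yield start, min(start + size, n)

batches = list(iter_batches(num_samples, batch_size))
print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```

With these settings each epoch performs 469 updates, the last one on only 96 samples, for 2,345 updates over the 5 epochs.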

Two values are displayed during training: the loss of the network on the training data, that is, the gap between the current output and the expected output, and the accuracy of the network on the training data (acc).

The loss decreases as training progresses, and the training accuracy finally reaches 98.9%. Let’s take a look at the performance of the model on the test set.

Model test

test_loss, test_acc = network.evaluate(test_images, test_labels)
test_loss, test_acc

The test set accuracy is 97.9%, lower than the 98.9% reached on the training set. This gap between training accuracy and test accuracy is caused by overfitting: the model performs worse on new data than on the data it was trained on, which means poorer generalization.
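For reference, the accuracy metric itself can be sketched in a few lines: a prediction counts as correct when the class with the highest predicted probability matches the class marked in the one-hot label. The helper names and the four toy samples are assumptions made up for this example.

```python
def argmax(values):
    # Index of the largest value in the list
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true_one_hot, y_pred_probs):
    # Fraction of samples whose predicted class matches the true class
    correct = sum(argmax(t) == argmax(p)
                  for t, p in zip(y_true_one_hot, y_pred_probs))
    return correct / len(y_true_one_hot)

# Four hypothetical samples with three classes each
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.8, 0.1, 0.1],
          [0.2, 0.7, 0.1],
          [0.3, 0.4, 0.3],   # predicted class 1, true class 2: wrong
          [0.1, 0.6, 0.3]]

acc = accuracy(y_true, y_pred)
print(acc)  # 0.75
```

Three of the four toy predictions are correct, so the accuracy is 0.75.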

Reference: Deep Learning with Python, François Chollet, translated by Zhang Liang

