
Training a vanilla neural network

Having learned the basic concepts of neural networks and how to build a neural network model using the Keras library, in this section we will take it a step further and glimpse the power of neural networks by implementing a practical model.

Introduction to vanilla neural networks and the MNIST dataset

Networks that stack multiple fully connected layers between the input and the output are called multilayer perceptrons, and are sometimes colloquially referred to as vanilla neural networks (i.e., plain neural networks). To understand how to train a vanilla neural network, we will train a model to predict the digit labels in the MNIST dataset, a very common dataset composed of handwritten digits from 250 different people. The training set contains 60,000 images and the test set contains 10,000 images; each image has its own label, and the image size is 28 x 28.

Constructing a neural network model with Keras

  1. Import the relevant packages and the dataset, and visualize the data to understand it:
from keras.datasets import mnist
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import np_utils
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = mnist.load_data()

The preceding code imports the relevant Keras methods and loads the MNIST dataset.
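As an optional sanity check (not part of the original steps), we can print the dataset shapes to confirm the sizes described above:

print(x_train.shape)  # (60000, 28, 28): 60,000 training images of 28 x 28 pixels
print(x_test.shape)   # (10000, 28, 28): 10,000 test images
print(y_train[:5])    # the first few labels, e.g. [5 0 4 1 9]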

  2. The images in the MNIST dataset have a shape of 28 x 28; plot a few images from the dataset to understand the data better:
plt.subplot(221)
plt.imshow(x_train[0], cmap='gray')
plt.subplot(222)
plt.imshow(x_train[1], cmap='gray')
plt.subplot(223)
plt.imshow(x_test[0], cmap='gray')
plt.subplot(224)
plt.imshow(x_test[1], cmap='gray')
plt.show()

The following figure shows the output of the above code:

  3. Flatten the 28 x 28 images into one-dimensional vectors of 784 pixel values so they can be fed into a Dense layer. In addition, the labels need to be converted into one-hot encoding. This step is key in preparing the dataset:
num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')

In the code above, the shape change is performed using the reshape method on the input arrays; reshape converts an array of a given shape into a different shape. In this example, the x_train array has x_train.shape[0] data points (images), with x_train.shape[1] rows and x_train.shape[2] columns in each image; after flattening, it still has x_train.shape[0] data points, but each one now consists of x_train.shape[1] * x_train.shape[2] values, as the quick check below shows.
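As an optional sanity check (not part of the original steps), we can confirm the new shapes:

print(x_train.shape)  # (60000, 784): each 28 x 28 image is now a single row of 784 values
print(x_test.shape)   # (10000, 784)

Next, we encode the label data as one-hot vectors: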

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

Let’s take a quick look at how one-hot encoding works. Suppose we have a dataset with the possible labels {apple, orange, banana, lemon, pear}. If we convert the corresponding labels into one-hot encoding, they look as follows:

Category   Index 0   Index 1   Index 2   Index 3   Index 4
apple      1         0         0         0         0
orange     0         1         0         0         0
banana     0         0         1         0         0
lemon      0         0         0         1         0
pear       0         0         0         0         1

Each one-hot vector contains N values, where N is the number of possible labels; only the value at the index corresponding to the label is 1, and all other values are 0. As shown above, the one-hot encoding of apple can be expressed as [1, 0, 0, 0, 0]. In Keras, the to_categorical method performs the one-hot encoding of labels: it finds the number of unique labels in the dataset and then converts the labels into one-hot vectors.
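As a minimal sketch (using made-up label values purely for illustration), to_categorical can be applied to a small list of labels:

labels = [0, 2, 1, 4]                               # example labels drawn from 5 possible classes
one_hot = np_utils.to_categorical(labels, num_classes=5)
print(one_hot)
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 0. 0. 1.]]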

  4. Build a neural network with a hidden layer of 1000 nodes:
model = Sequential()
model.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model.add(Dense(num_classes,  activation='softmax'))

The input has 28 x 28 = 784 values, which are connected to 1000 units in the hidden layer, with the activation function specified as ReLU. Finally, the hidden layer is connected to an output with num_classes = 10 values (there are ten possible image labels, so the to_categorical method creates one-hot vectors with 10 columns), and the softmax activation function is applied at the output layer in order to obtain the class probabilities for an image.
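To illustrate what the softmax activation does (a NumPy sketch of the formula, not the Keras internals), it exponentiates the 10 raw output values and normalizes them so that they sum to 1:

logits = np.array([2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, 0.2, -0.5, 1.5])  # example raw outputs
probs = np.exp(logits) / np.sum(np.exp(logits))                          # softmax
print(probs)        # 10 class probabilities
print(probs.sum())  # 1.0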

  5. Print a summary of the model architecture:
model.summary()

Architecture information output is as follows:

Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 1000) 785000 _________________________________________________________________ dense_1 (Dense) (None, 10) 10010 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = Total params: 795010 Trainable params: 795010 Non - trainable params: 0 _________________________________________________________________Copy the code

In the above architecture, the number of parameters in the first layer is 785000: the 784 input units are connected to the 1000 hidden units, giving 784 * 1000 weights plus 1000 bias values in the hidden layer, for a total of 785000 parameters. Similarly, the output layer has 10 outputs, each connected to all 1000 hidden units, resulting in 1000 * 10 weights and 10 biases (a total of 10010 parameters). The output layer has 10 units because there are 10 possible labels, and it gives us the probability of each class for a given input image; for example, the first unit represents the probability that the image is a 0, the second unit the probability that it is a 1, and so on.
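We can verify these parameter counts with a quick calculation:

hidden_params = 784 * 1000 + 1000    # weights plus biases for the hidden layer
output_params = 1000 * 10 + 10       # weights plus biases for the output layer
print(hidden_params, output_params, hidden_params + output_params)  # 785000 10010 795010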

  6. Compile the model as follows:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

Because the target is a one-hot encoded vector with multiple classes, the loss function is the categorical cross-entropy loss. In addition, we use the Adam optimizer to minimize the loss function, and monitor the accuracy (which can be abbreviated as acc) metric while training the model.
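As a rough NumPy sketch of what the categorical cross-entropy loss computes for a single sample (illustrative values, not the exact Keras implementation):

y_true = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])    # one-hot target: the true class is 2
y_pred = np.array([0.05, 0.05, 0.70, 0.05, 0.02, 0.03, 0.02, 0.03, 0.02, 0.03])  # predicted probabilities
loss = -np.sum(y_true * np.log(y_pred))               # only the probability of the true class contributes
print(loss)                                           # about 0.357, i.e. -log(0.7)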

  7. Fit the model as follows:
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=64,
                    verbose=1)

In the above code, we specify the inputs (x_train) and outputs (y_train) used to fit the model, as well as the inputs and outputs of the test dataset. The model will not use the test dataset to train its weights, but it lets us see how the loss and accuracy differ between the training and test datasets.
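As a side note, if a separate test set were not available, Keras could instead hold out a fraction of the training data via the validation_split argument; a hypothetical alternative (not used in this section) would look like:

history = model.fit(x_train, y_train,
                    validation_split=0.2,   # hold out 20% of the training data for validation
                    epochs=50,
                    batch_size=64,
                    verbose=1)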

  8. Extract the training and test loss and accuracy metrics across the different epochs:
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
epochs = range(1, len(val_loss_values) + 1)

When fitting the model, the history variable stores the accuracy and loss values of the model on the training and test datasets for each epoch. We extract these values into lists in order to plot how the accuracy and loss change on the training and test datasets.
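If you are unsure which keys are recorded (newer Keras versions name them 'accuracy' and 'val_accuracy' instead of 'acc' and 'val_acc'), you can list them first:

print(history.history.keys())
# e.g. dict_keys(['loss', 'acc', 'val_loss', 'val_acc'])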

  9. Visualize the training and test loss and accuracy across the different epochs:
plt.subplot(211)
plt.plot(epochs, loss_values, marker='x', label='Training loss')
plt.plot(epochs, val_loss_values, marker='o', label='Test loss')
plt.title('Training and test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.subplot(212)
plt.plot(epochs, acc_values, marker='x', label='Training accuracy')
plt.plot(epochs, val_acc_values, marker='o', label='Test accuracy')
plt.title('Training and test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

The output of the previous code is shown in the following figure, where the first plot shows the training and test loss as the number of epochs increases, and the second plot shows the training and test accuracy as the number of epochs increases:

The final model was about 97% accurate.

  10. In addition, we can manually calculate the accuracy of the final model on the test set:
preds = model.predict(x_test)       # predicted probabilities for each test image
correct = 0
for i in range(len(x_test)):
    pred = np.argmax(preds[i])      # index of the highest predicted probability
    act = np.argmax(y_test[i])      # index of the true label in the one-hot vector
    if pred == act:
        correct += 1
accuracy = correct / len(x_test)
print('Test accuracy: {:.4f}%'.format(accuracy * 100))

In the code above, the model's predict method is used to compute the predicted output values for the given input (x_test in this case). We then loop over the predictions for the entire test set, using argmax to find the index with the highest probability, and do the same for the true label values of the test dataset. When the index of the highest value is the same in the prediction and in the true label, the prediction is correct, and the accuracy of the model is the number of correct predictions on the test dataset divided by the total number of samples in the test dataset.
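The same accuracy can also be computed more concisely with vectorized NumPy operations (an equivalent sketch of the loop above):

pred_labels = np.argmax(preds, axis=1)    # predicted class index for each test image
true_labels = np.argmax(y_test, axis=1)   # true class index from the one-hot labels
accuracy = np.mean(pred_labels == true_labels)
print('Test accuracy: {:.4f}%'.format(accuracy * 100))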

Related links

Learning Neural Network Forward Propagation from Scratch (juejin.cn)

Learning Neural Network Backpropagation from Scratch (juejin.cn)

A First Experience of Building Neural Networks with Keras (juejin.cn)