
Influence of input value distribution on model performance

Although we can already recognize handwritten digits with high accuracy, we have not yet examined the distribution of values in the MNIST dataset, which can affect training speed. In this section, we will learn how to train the weights faster by modifying the input values, thereby shortening training time. We will build exactly the same model architecture as the original neural network, but with one minor change to the input dataset:

  • Invert the background and foreground colors; essentially, paint the background white and the digits black.

First, let us analyze the influence of pixel values on model performance theoretically. Since a black pixel has a value of zero, multiplying this input by any weight yields zero; as a result, the weights connecting black pixels to the hidden layer can change without affecting the loss value at all, so they receive no meaningful updates during training. A white pixel, by contrast, contributes to the hidden node values, and its weights must be adjusted.
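To make this concrete, here is a minimal pure-Python sketch (not part of the original code) of a single linear unit with squared loss. It shows that the gradient of the loss with respect to a weight is proportional to the input pixel, so a zero-valued (black) pixel produces a zero weight update:

# Illustrative sketch, not from the original article.
# For a single linear unit y = w * x with squared loss L = (y - t)^2,
# the gradient dL/dw = 2 * (y - t) * x is proportional to the input x.
w, t = 0.5, 1.0                # arbitrary weight and target
for x in (0.0, 1.0):           # black pixel vs. white pixel
    y = w * x
    grad = 2 * (y - t) * x     # gradient of the loss w.r.t. w
    print(f'input={x}: dL/dw={grad}')
# input=0.0: dL/dw=-0.0  -> black pixels never update their weights
# input=1.0: dL/dw=-1.0  -> white pixels drive weight adjustments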

  1. Load and scale the input dataset:
from keras.datasets import mnist
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import np_utils
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 784-dimensional vector and scale to [0, 1]
num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')
x_train = x_train / 255.
x_test = x_test / 255.

# One-hot encode the labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
  2. View the distribution of input values:
x_train.flatten()

The previous code flattens all the inputs into a single one-dimensional array of shape (28 × 28 × x_train.shape[0] = 47,040,000). Draw the distribution of all input values:

plt.hist(x_train.flatten())
plt.title('Histogram of input values')
plt.xlabel('Input values')
plt.ylabel('Frequency of input values')
plt.show()

Since the background of the input images is black, most of the input values are zero (the value of a black pixel).
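We can quantify this claim with a quick check (not part of the original code):

# Illustrative check, not from the original article:
# fraction of pixels that are exactly zero (black background)
print(np.mean(x_train == 0))  # roughly 0.8, i.e. about 80% of all pixels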

  3. Use the following code to invert the colors so that the background is white and the digits are black:
# Invert pixel intensities: the background becomes 1 (white), the digits 0 (black)
x_train = 1 - x_train
x_test = 1 - x_test

Drawing images:

# Display the first two inverted training and test images
plt.subplot(221)
plt.imshow(x_train[0].reshape(28, 28), cmap='gray')
plt.subplot(222)
plt.imshow(x_train[1].reshape(28, 28), cmap='gray')
plt.subplot(223)
plt.imshow(x_test[0].reshape(28, 28), cmap='gray')
plt.subplot(224)
plt.imshow(x_test[1].reshape(28, 28), cmap='gray')
plt.show()

The output shows the four inverted digit images, each with a white background and a black digit.

The histogram of the input values after color inversion is shown below:

As you can see, most input values now have a value of 1.
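A quick check (again, not in the original code) confirms this, since 1 - 0 = 1 for every formerly black pixel:

# Illustrative check, not from the original article:
# fraction of pixels that are exactly one after inversion
print(np.mean(x_train == 1))  # roughly 0.8, mirroring the earlier zero count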

  4. Use exactly the same model architecture as before:
# Same architecture as before: a 1000-unit ReLU hidden layer and a softmax output
model = Sequential()
model.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=64,
                    verbose=1)

Plot the training and test accuracy and loss values across epochs:
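The plotting code is not included in the original; a minimal sketch based on the History object returned by model.fit could look like this:

# Illustrative sketch, not from the original article:
# plot training/test accuracy and loss per epoch from the History object
epochs = range(1, len(history.history['acc']) + 1)
plt.subplot(211)
plt.plot(epochs, history.history['acc'], label='train accuracy')
plt.plot(epochs, history.history['val_acc'], label='test accuracy')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(212)
plt.plot(epochs, history.history['loss'], label='train loss')
plt.plot(epochs, history.history['val_loss'], label='test loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()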

It can be seen that the model accuracy drops to about 97%, compared to roughly 98% for a model trained with the same number of epochs, batch size, and architecture on the non-inverted dataset (where the data values are mostly zero). With the pixels inverted (far fewer zeros in the dataset), training also proceeds much more slowly. When most pixels are zero, the model is easier to train because it only needs to make predictions based on the few pixels with values greater than zero. When most pixels are non-zero, however, many more weights need to be fine-tuned to reduce the loss value.

Related links

Keras deep learning — Training primitive neural networks

Keras deep learning — Scaling input data sets to improve neural network performance