
The effect of batch size on model accuracy

In the original neural network, we used a batch size of 64 for all of the models we built. In this section, we examine the effect of batch size on model accuracy by comparing two configurations:

  • A batch size of 4096
  • A batch size of 64

When the batch size is larger, the weights are updated fewer times per epoch than when the batch size is smaller. This is because every epoch must traverse all of the training data in the dataset: if each batch uses more samples to compute the loss value, fewer batches are needed to cover the whole dataset, while a smaller batch size means more batches, and therefore more weight updates, per epoch. As a result, the smaller the batch size, the better the model accuracy generally is after the same number of epochs of training. However, you should also ensure that the batch size is not so small that it causes long training times and overfitting.

In the previous model, we used a batch size of 64. In this section, we keep the same model architecture and modify only the batch size used for training, in order to compare the impact of different batch sizes on model performance. Preprocess the dataset and fit the model:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten the 28 x 28 images into 784-dimensional vectors and scale to [0, 1]
num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')
x_train = x_train / 255.
x_test = x_test / 255.

# One-hot encode the labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

# Same architecture as before: one hidden layer with 1,000 units
model = Sequential()
model.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# Train with a batch size of 4096 instead of 64
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=500,
                    batch_size=4096,
                    verbose=1)
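For comparison, the small-batch baseline can be reproduced by fitting the same architecture with a batch size of 64. A minimal sketch (model_small and history_small are illustrative names, not from the original code; a fresh model is built so the two runs start from independent initial weights):

# Retrain the same architecture with the original batch size of 64
model_small = Sequential()
model_small.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model_small.add(Dense(num_classes, activation='softmax'))
model_small.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

history_small = model_small.fit(x_train, y_train,
                                validation_data=(x_test, y_test),
                                epochs=500,
                                batch_size=64,
                                verbose=1)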

The only change between the two runs is the batch_size argument passed during model fitting. We then plot the training and test accuracy and loss values across epochs (the code used to draw the curves is exactly the same as the code used for the original neural network):
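A minimal sketch of that plotting code, using the history object returned by model.fit above (the same calls apply to history_small):

import matplotlib.pyplot as plt

# Per-epoch metrics recorded by model.fit
epochs_range = range(1, len(history.history['acc']) + 1)

plt.subplot(211)
plt.plot(epochs_range, history.history['loss'], label='Training loss')
plt.plot(epochs_range, history.history['val_loss'], label='Test loss')
plt.title('Loss over epochs')
plt.legend()

plt.subplot(212)
plt.plot(epochs_range, history.history['acc'], label='Training accuracy')
plt.plot(epochs_range, history.history['val_acc'], label='Test accuracy')
plt.title('Accuracy over epochs')
plt.xlabel('Epoch')
plt.legend()

plt.show()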

Comparing the resulting curves, it can be noticed that the model with the large batch size needs more epochs than the model with the small batch size to reach 98% accuracy. In this section's model, accuracy is relatively low in the initial stage of training and only reaches a high level after a considerable number of epochs. The reason is that with a larger batch size, far fewer weight updates are performed in each epoch.

The total size of the training dataset is 60,000 samples. When we train the model for 500 epochs with a batch size of 4096, the weights are updated about $500 \times (60000 \div 4096) \approx 7000$ times. When the batch size is 64, the weights are updated about $500 \times (60000 \div 64) \approx 468750$ times. Therefore, the smaller the batch size, the more often the weights are updated, and the accuracy after the same number of epochs is generally better. At the same time, the batch size should not be made too small, which may lead to long training times and overfitting.
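As a quick sanity check on these figures, the update counts can be computed directly. A sketch (Keras actually runs ceil(samples / batch_size) batches per epoch, so the exact counts differ slightly from the rounded values above):

import math

n_samples = 60000
n_epochs = 500

for batch_size in (4096, 64):
    steps_per_epoch = math.ceil(n_samples / batch_size)  # batches per epoch
    total_updates = steps_per_epoch * n_epochs           # one weight update per batch
    print(batch_size, steps_per_epoch, total_updates)
# 4096 ->  15 batches/epoch ->   7500 updates
# 64   -> 938 batches/epoch -> 469000 updates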

Related links

Keras deep learning — Training the original neural network

Keras deep learning — Scaling input data sets to improve neural network performance