
Preface

A few days ago, while reproducing ResNet-50 training on CIFAR-100, I obtained the following two plots: the accuracy curve and the loss curve (230 batches of iteration; only the convolution kernel of the first downsampling layer and the final fully connected output layer were modified):

[Figure: accuracy (acc) curve]

[Figure: loss curve]
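For reference, here is a minimal sketch of that kind of modification, assuming torchvision's ResNet-50 as the starting point. The 3x3 kernel and stride of 1 are my assumptions for CIFAR-sized inputs; the text above only states that these two layers were changed.

```python
import torch.nn as nn
from torchvision.models import resnet50

# Build ResNet-50 with a 100-way classifier; num_classes resizes the
# final fully connected layer for CIFAR-100.
model = resnet50(num_classes=100)

# Replace the first downsampling convolution (7x7, stride 2, designed
# for 224x224 ImageNet inputs) with a 3x3, stride-1 convolution so the
# 32x32 CIFAR images are not downsampled too aggressively.
# Kernel size and stride here are assumptions, not the post's exact values.
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
```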

During training, two “cliffs” appeared: one at just over 50 batches and one at around batch 120.

I looked up some explanations for this phenomenon and summarize them below.

Why it happens

In fact, right before each of the two “cliffs”, the learning rate had just been adjusted by a decay operation.

First, the conclusion: the learning-rate adjustment allows the loss to keep decreasing, so the network can continue updating, break away from a local optimum, and start converging toward the global optimum.
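For illustration, a stepwise decay of this kind might look like the following in PyTorch. The milestones at 50 and 120 are assumptions chosen to match the two cliffs above; the exact schedule used is not stated.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 100)  # stand-in for the ResNet-50 above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Step decay: divide the learning rate by 10 at milestones 50 and 120.
scheduler = MultiStepLR(optimizer, milestones=[50, 120], gamma=0.1)

for epoch in range(230):
    # ... run one epoch of training here ...
    scheduler.step()  # lr: 0.1 -> 0.01 (at 50) -> 0.001 (at 120)
```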

The convergence process is illustrated in the figure below. At first, the loss converges within the neighborhood of a local optimum. However, because this locally optimal model generalizes poorly, the accuracy feedback is not ideal, and the loss stays at a high level (marked 1 in the figure).

However, when the learning rate is adjusted, the loss penalty applied to the model changes, so the model can jump out of the local optimum (although it may simply jump into another one).

At the same time, because the model jumps into another loss “valley”, many parameters are readjusted and land in a new state that is closer to the optimal solution but still performs poorly, so the loss may temporarily rebound.
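The plateau-then-cliff shape itself is easy to reproduce in one dimension. Below is a toy sketch (not the experiment above): noisy gradient descent on f(x) = x², where a large learning rate keeps the iterate bouncing around the bottom of the valley at a roughly constant loss, and a 10x decay at step 100 lets it settle deeper, so the printed loss drops sharply.

```python
import numpy as np

rng = np.random.default_rng(0)

x, lr = 5.0, 0.4
for step in range(1, 201):
    if step == 100:
        lr = 0.04                      # 10x decay, as in a step schedule
    g = 2 * x + rng.normal(0.0, 1.0)   # noisy gradient of f(x) = x^2
    x -= lr * g
    if step % 25 == 0:
        print(f"step {step:3d}  lr={lr:.2f}  loss={x * x:.4f}")
```

In this toy quadratic, the stationary noise floor of the loss scales with the learning rate, which is one way to read the cliffs: each decay shrinks the region the optimizer keeps bouncing around in.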

As the learning rate continues to be adjusted, the model reaches the “valley” where the (roughly) optimal solution lies. The loss and accuracy curves flatten out, and the model converges slowly.
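A related, more automatic option (not what is described above, but the same idea of adjusting the rate as training stalls) is to decay whenever the validation loss plateaus, e.g. with PyTorch's ReduceLROnPlateau:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(10, 100)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Decay the rate by 10x whenever the validation loss stops improving
# for 10 consecutive epochs, instead of at fixed milestones.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10)

for epoch in range(230):
    # ... train for one epoch, then compute the validation loss ...
    val_loss = 1.0  # placeholder; use the real validation loss here
    scheduler.step(val_loss)
```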