Recently, while training a classification network, I found that the loss did not converge. Every part of the network checked out and the inputs and outputs looked normal, but the loss still would not come down: the cross entropy loss started at about 6, and after 50 epochs it was still around 5.

Further analysis showed that the values produced by the final classification layer were too small, so after SoftMax the probabilities of the different classes were almost indistinguishable. To fix this, it was enough to divide the classifier's output by a temperature coefficient before feeding it into the cross entropy loss function.
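As a quick illustration of why this helps (the logit values below are made up for the example, not taken from my network): when the classifier outputs are all small and close together, SoftMax gives a nearly uniform distribution, and dividing by a temperature smaller than 1 spreads it out.

import torch
import torch.nn.functional as F

# hypothetical small, closely spaced classifier outputs (logits)
logits = torch.tensor([0.02, 0.05, 0.01, 0.04])

print(F.softmax(logits, dim=0))        # roughly [0.247, 0.255, 0.245, 0.252] -- almost uniform
print(F.softmax(logits / 0.1, dim=0))  # roughly [0.223, 0.302, 0.202, 0.273] -- clearly sharper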

1. The following pseudo-code does not converge: the classifier's output is fed directly into the cross entropy loss function.

prob = self.classifier(x)                   # raw classifier outputs (logits)
loss = self.crossentropyloss(prob, label)   # small, nearly uniform logits -> loss stalls

2. The following code converges: after the classification network produces its output, divide it by a temperature coefficient to widen the gaps between the SoftMax outputs, which is equivalent to sharpening the SoftMax distribution.

prob = self.classifier(x)                   # raw classifier outputs (logits)
prob = prob / self.temp                     # divide by the temperature to sharpen SoftMax
loss = self.crossentropyloss(prob, label)

I set temp to 0.1; the exact value is a hyperparameter you can tune.
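For reference, here is a minimal end-to-end sketch of how the fix can be wired into a module. The class name, layer sizes, and dimensions are assumptions made for the example, not my actual network.

import torch
import torch.nn as nn

class TempClassifier(nn.Module):
    def __init__(self, in_dim, num_classes, temp=0.1):
        super().__init__()
        self.classifier = nn.Linear(in_dim, num_classes)  # classification head (example only)
        self.crossentropyloss = nn.CrossEntropyLoss()
        self.temp = temp                                  # temperature coefficient (< 1 sharpens)

    def forward(self, x, label):
        prob = self.classifier(x)                 # raw scores (logits) from the classifier
        prob = prob / self.temp                   # divide by temperature before the loss
        return self.crossentropyloss(prob, label)

# usage sketch with random data
model = TempClassifier(in_dim=128, num_classes=10)
x = torch.randn(4, 128)
label = torch.randint(0, 10, (4,))
loss = model(x, label)
loss.backward()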