Recently, I was using CrossEntropyLoss() to train a cat-vs-dog image classifier and ran into a strange problem: the loss always hovered around 0.69. I checked the loss formula, which for binary classification is:

L = -[y·ln(a) + (1-y)·ln(1-a)]

If the two-class probability vector predicted by the network is [0.5, 0.5], then both a and 1-a are 0.5. Whether y is 0 or 1, the loss is -ln(0.5) ≈ 0.69, so the network has learned nothing at all.
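To sanity-check this, here is a minimal sketch (my own, not from the original run) showing that PyTorch's CrossEntropyLoss over two classes with uninformative logits lands exactly on that value:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.zeros(4, 2)           # equal logits -> softmax gives [0.5, 0.5]
labels = torch.tensor([0, 1, 0, 1])  # the labels are irrelevant here
print(criterion(logits, labels))     # tensor(0.6931), i.e. ln(2)

So a loss pinned at 0.69 means the network is outputting an effectively uniform prediction for every sample.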

The model I used was ResNet-18. When I later tried the pretrained weights, the loss converged; without pretraining, it stayed stuck at 0.69.
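For reference, switching between the two setups looks roughly like this (a sketch; the pretrained= argument works on older torchvision versions, while newer releases use weights=models.ResNet18_Weights.DEFAULT instead):

import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)       # pretrained: loss converges
# model = models.resnet18(pretrained=False)    # from scratch: stuck at 0.69 for me
model.fc = nn.Linear(model.fc.in_features, 2)  # two output classes: cat, dog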

I looked up many solutions online, but none of them worked for my model, until I found this article:

Common Pitfalls in Neural Network Training – Zhihu (zhihu.com)

13. Are you using too much data augmentation?

Data augmentation has a regularizing effect. Too much augmentation, combined with other forms of regularization (L2 weight decay, dropout, and so on), can cause the network to underfit.

It occurred to me that I was not using the right augmentation, or that it did not suit my model, so I commented out the augmentation code:

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((128, 128)),
    # transforms.RandomVerticalFlip(),
    # transforms.RandomCrop(50),
    # transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])
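In hindsight, the commented-out transforms were quite aggressive: RandomCrop(50) keeps only a 50x50 patch of a 128x128 image, and hue=0.5 is the maximum hue shift ColorJitter allows. If some augmentation is still wanted, a milder pipeline (my suggestion, not something tested in the original run) might be:

transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),  # mild: cats and dogs are left-right symmetric
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])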

After that, the loss converged.

It seems image augmentation is not a free win: used carelessly, it can actually hurt training instead of helping.