This is the 16th day of my participation in the November Gwen Challenge. Check out the event details: The Last Gwen Challenge 2021.

The generalization ability of the model

The goal of machine learning is to discover patterns.

So we need a way to tell whether the model has actually discovered a general pattern or has simply memorized the data.

Let me start with a short story. In our junior year, we entered a competition and built a language-recognition program with machine learning. The accuracy was quite low, only about 50%. One of the judges didn’t believe it was done with machine learning and insisted it was just database lookups…

What’s the difference between generalizing and memorizing the data?

It’s like giving students A and B the same pile of math materials to study. If the final exam uses exactly the same questions, both of them can score 100 points, and you can’t tell them apart. But if you rewrite the questions and A fails while B scores 90, you can be fairly sure that A was just memorizing the original questions, while B actually understood the ideas behind them.

A is memorizing the data; B is generalizing.

Training error and generalization error

Training error refers to the error calculated by our model on the training data set.

Generalization error is the expected error of our model when it is applied to an infinite stream of new samples drawn from the same distribution as the original data.
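
Written out as formulas (standard definitions; the notation is assumed here rather than taken from the text: $f$ is the model, $\ell$ the loss function, $P$ the data distribution, and $(\mathbf{x}_i, y_i)$ the $n$ training samples):

```latex
% Training error: the average loss over the n training samples
\hat{\mathcal{E}}_{\text{train}}(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(\mathbf{x}_i),\, y_i\bigr)

% Generalization error: the expected loss over fresh samples drawn from the same distribution P
\mathcal{E}_{\text{gen}}(f) = \mathbb{E}_{(\mathbf{x},\, y)\sim P}\bigl[\ell\bigl(f(\mathbf{x}),\, y\bigr)\bigr]
```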

You can’t calculate the generalization error exactly, just as you can never know in advance what you will score on a final exam whose questions are unknown.
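
In practice the generalization error is estimated with a held-out test set that the model never trains on. A minimal sketch (the data, model, and split ratio here are all made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy synthetic data: 500 samples, 10 features (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out 30% of the data to play the role of "unseen exam questions"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

train_error = 1 - model.score(X_train, y_train)  # error on the data the model has seen
test_error = 1 - model.score(X_test, y_test)     # estimate of the generalization error
print(f"training error: {train_error:.3f}, estimated generalization error: {test_error:.3f}")
```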

Model complexity

The number of training samples can be thought of as how much study material you are given, and model size as your capacity to memorize problems. Memorizing is memorizing, not understanding. Suppose, roughly, that memorization + comprehension = 1: the more effort you spend memorizing, the less you have left for understanding.

When the number of training samples is well matched to the model size, the training error tends to be a good approximation of the generalization error.

However, when the model is too complex and the sample size is small, we expect the training error to keep decreasing while the generalization error increases (overfitting).

It’s as if you memorized the questions and did well when the exam reused them, but your understanding was poor, so without the original questions you scored very low.
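
This memorize-the-questions effect is easy to reproduce with a toy experiment: fit polynomials of increasing degree to a handful of noisy points and compare the error on the training points with the error on fresh points from the same distribution. A small sketch (the sine-plus-noise data and the degrees are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    y = np.sin(x) + rng.normal(scale=0.2, size=n)  # underlying pattern: sin(x) plus noise
    return x, y

x_train, y_train = make_data(10)   # very few training samples
x_test, y_test = make_data(200)    # fresh samples from the same distribution

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # a degree-9 fit can memorize all 10 points
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The high-degree fit drives the training error toward zero while the test error blows up, which is exactly the overfitting picture described above.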

Factors affecting model generalization

  1. The number of adjustable parameters. When a model has many adjustable parameters, it tends to overfit more easily.
  2. The values taken by the parameters. When the weights can range over large values, the model may overfit more easily (the first two factors are illustrated in the sketch after this list).
  3. The number of training samples. Even a simple model can easily overfit a data set with only one or two samples, whereas overfitting a data set with millions of samples requires an extremely flexible model.
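
The first two factors can be inspected directly on a model. A minimal PyTorch sketch (both networks are hypothetical, chosen only to contrast a small and a large capacity):

```python
from torch import nn

# Two hypothetical models of very different capacity
small_net = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 1))
large_net = nn.Sequential(
    nn.Linear(20, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

def describe(name, net):
    n_params = sum(p.numel() for p in net.parameters())           # factor 1: number of adjustable parameters
    weight_norm = sum(p.norm().item() for p in net.parameters())  # factor 2: overall size of the parameter values
    print(f"{name}: {n_params} parameters, total parameter norm {weight_norm:.2f}")

describe("small_net", small_net)
describe("large_net", large_net)
```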

Reduce the difference between training error and generalization error

The goal is to close the gap between training and test performance. In practice, between underfitting and overfitting, we should worry much more about how to prevent overfitting, so this section could just as well have been titled "How to prevent overfitting".

  • For the number of adjustable parameters:

    The more tunable parameters a model has, the more complex it is. Keeping the model simple means keeping its capacity (dimensionality) small, so when choosing a model, pick one of appropriate size.

  • Number of training samples. This goes hand in hand with the previous point: choose a model whose complexity matches the size of the training set.

  • Parameter values:

    Another kind of simplicity is limiting the range of the parameter values, which is where regularization comes in.

    See Hands-on Deep Learning 4.5 on weight decay regularization and its derivation (this note on model generalization was only written today so that those links could be cited conveniently).

  • Addendum: there is another angle, which is to maintain smoothness, i.e. a function should not be sensitive to small changes in its input. See Hands-on Deep Learning 4.6 Dropout. A short sketch of both weight decay and dropout follows this list.
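
Putting the last two bullets together, here is a minimal PyTorch sketch of both techniques (the architecture, dropout probability, and weight-decay strength are illustrative assumptions, not values from the book):

```python
import torch
from torch import nn

# A small MLP with dropout between the hidden layers (hypothetical architecture)
net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Dropout(p=0.5),            # randomly zeroes activations during training to encourage smoothness
    nn.Linear(64, 64), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)

# weight_decay adds an L2 penalty on the parameters, shrinking them toward zero (weight decay)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-3)

# One toy training step, just to show where the two pieces act
x, y = torch.randn(32, 20), torch.randn(32, 1)
loss = nn.functional.mse_loss(net(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

net.eval()   # at evaluation time, eval() switches dropout off
```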


You can read more of my Hands-on Deep Learning notes here: Hands-on Deep Learning – LolitaAnn’s Column – Nuggets (juejin.cn)

Notes are still being updated …………