Validation set

In machine learning, data is divided into three categories: the training set, the validation set, and the test set. When I first looked at TensorFlow, I never understood what the validation set was for.

If we need to choose among, say, 10 polynomials of different degrees, the training set is used to fit the parameters θ of each model, and the validation set is used to select the model with the smallest error. Because the parameters were fitted on the training set, comparing the models' errors on the training data itself would be misleading.

Typically the training set accounts for 60% of the data, with the validation set and the test set taking 20% each.
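For concreteness, here is a minimal sketch of such a split in Python (the 60/20/20 ratio is from this note; the function name, the use of NumPy, and the fixed shuffling seed are illustrative choices):

```python
import numpy as np

def split_data(X, y, seed=0):
    """Shuffle the data, then split it 60/20/20 into train/validation/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```

The validation split is used only for model selection; the test split stays untouched until the final error estimate.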

Error and polynomial degree

The figure shows how the training-set cost error and the validation-set cost error vary with the polynomial degree d.

For the training set: when d is small, the model underfits and the error is large; as d increases, the fit gets better and the error shrinks. (Eventually the model overfits, but its cost on the training set keeps getting smaller.)

For the cross-validation set: when d is small, the fit is poor and the error is large; as d increases, the error first decreases and then, once overfitting sets in, increases again.
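A minimal sketch of this model-selection loop, assuming one-dimensional data and NumPy's least-squares polynomial fit (the helper names and the squared-error cost are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def select_degree(x_train, y_train, x_val, y_val, max_degree=10):
    """Fit one polynomial per degree on the training set, then pick
    the degree whose error on the validation set is smallest."""
    errors = []
    for d in range(1, max_degree + 1):
        theta = np.polyfit(x_train, y_train, deg=d)      # parameters come from the training set
        train_err = mse(y_train, np.polyval(theta, x_train))
        val_err = mse(y_val, np.polyval(theta, x_val))   # model comparison uses the validation set
        errors.append((d, train_err, val_err))
    best_d = min(errors, key=lambda e: e[2])[0]          # smallest validation error wins
    return best_d, errors
```

Printing the `errors` list reproduces the two curves described above: `train_err` keeps shrinking as d grows, while `val_err` is U-shaped.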

  • Training-set error and validation-set error are both high and close to each other: high bias (underfitting)
  • Validation-set error is much larger than training-set error: high variance (overfitting)

Regularization and bias/variance

Regularization is often used to prevent overfitting, so the same bias/variance trade-off has to be considered when choosing the regularization parameter λ.

The larger λ is, the more strongly the parameters are penalized and the less the model overfits (though a λ that is too large causes underfitting instead).
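For reference, this assumes the standard regularized cost for linear regression, where λ scales the penalty on the parameters (by convention the bias term θ₀ is not penalized):

J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2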

λ is usually chosen from values between 0 and 10 that roughly double each time (e.g. 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10, twelve values in total).

  • When λ is small, the training-set error is small (overfitting) and the cross-validation error is large.
  • As λ increases, the training-set error grows (tending toward underfitting), while the cross-validation error first decreases and then increases (see the sketch after this list).
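A sketch of this selection loop, using the closed-form ridge-regression solution as the regularized fit (the doubling grid is from this note; the function names and feature handling are illustrative). Note that the validation error is computed without the regularization term:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form regularized least squares; the bias column is not penalized."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend a bias column
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                          # leave theta_0 unregularized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def val_mse(theta, X, y):
    """Plain squared error -- no lambda term when comparing models."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.mean((Xb @ theta - y) ** 2)

def select_lambda(X_train, y_train, X_val, y_val):
    lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10]
    results = [(lam, val_mse(ridge_fit(X_train, y_train, lam), X_val, y_val))
               for lam in lambdas]
    return min(results, key=lambda r: r[1])      # lambda with the smallest validation error
```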

The learning curve

The learning curve is a useful tool and a good sanity check: we often use it to judge whether a learning algorithm suffers from high bias or high variance.

A learning curve is a graph of the training-set error and the cross-validation-set error as functions of the number of training examples m.
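A minimal sketch of computing such a curve, again assuming one-dimensional data and a polynomial fit (the degree and loop bounds are illustrative):

```python
import numpy as np

def learning_curve(x_train, y_train, x_val, y_val, degree=1):
    """For each training-set size m, fit on the first m examples and record
    the error on those m examples and the error on the full validation set."""
    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)
    curve = []
    for m in range(degree + 2, len(x_train) + 1):        # need enough points to fit
        theta = np.polyfit(x_train[:m], y_train[:m], deg=degree)
        train_err = mse(y_train[:m], np.polyval(theta, x_train[:m]))
        val_err = mse(y_val, np.polyval(theta, x_val))
        curve.append((m, train_err, val_err))
    return curve
```

Plotting the two error columns against m gives the learning curve. With `degree=1` on curved data you get the high-bias picture described next; with a high degree and little data you get the high-variance picture.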

How do we use the learning curve to identify high bias (underfitting) and high variance (overfitting)?

As an example, suppose we try to fit the data below with a straight line: no matter how many training examples we add, the straight line barely changes and the error stays large.

As the figure shows, in the high-bias (underfitting) case, adding more data to the training set does not necessarily help.

Now suppose instead that we use a very high-order polynomial with very little regularization. When the cross-validation error is much larger than the training error, adding more data to the training set can improve the model.

In other words, in the high-variance (overfitting) case, adding more training data may improve the algorithm's performance.

Example

Let’s look at which remedy applies in which situation:

  • Get more training examples – fixes high variance
  • Try a smaller set of features – fixes high variance
  • Try getting additional features – fixes high bias
  • Try adding polynomial features – fixes high bias
  • Try decreasing the regularization parameter λ – fixes high bias
  • Try increasing the regularization parameter λ – fixes high variance