This article is a summary of Machine learning by Andrew Ng

Suggestions for applying machine learning

When we use the trained model to predict unknown data and find a large error, what can we do next? Usually we have the following options:

  1. Get more training samples
  2. Try to reduce the number of features
  3. Try to get more features
  4. Try adding polynomial features
  5. Try decreasing the regularization parameter λ
  6. Try increasing the regularization parameter λ

So how do we choose among them? That decision can be made with some machine learning diagnostics.

Avoid overfitting

In order to check whether the algorithm is overfitting, we divide the data into a training set, a test set and a cross-validation set. For example, 60% of the data can be used as the training set, 20% as the test set, and 20% as the cross-validation set. Models can then be selected as follows (a code sketch follows the list):

  1. Use the training set to train 10 candidate models
  2. Compute the cross-validation error (the value of the cost function) on the cross-validation set for each of the 10 models
  3. Select the model with the smallest cross-validation error
  4. Use the model selected in step 3 to compute the generalization error (the value of the cost function) on the test set
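
As an illustration, here is a minimal sketch of this model-selection procedure in Python with scikit-learn. The synthetic data, the use of polynomial regression as the family of 10 candidate models, and mean squared error as the cost function are all assumptions for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 1-D regression data (assumed for the example).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.randn(200)

# 60% training set, 20% cross-validation set, 20% test set.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Train 10 candidate models (polynomial degrees 1..10) on the training set
# and pick the one with the smallest cross-validation error.
best_degree, best_cv_error = None, np.inf
for degree in range(1, 11):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    cv_error = mean_squared_error(y_cv, model.predict(poly.transform(X_cv)))
    if cv_error < best_cv_error:
        best_degree, best_cv_error = degree, cv_error

# Estimate the generalization error of the chosen model on the test set.
poly = PolynomialFeatures(best_degree)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
test_error = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
print(best_degree, best_cv_error, test_error)
```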

Diagnosing bias and variance

When running a learning algorithm, if its performance is not ideal, one of two situations has most likely occurred: either the bias is relatively large or the variance is relatively large. In other words, we have either an underfitting problem or an overfitting problem. So which of these cases is related to bias, which to variance, and which to both?

For more background on bias and variance, see: www.zhihu.com/question/20…

We can usually diagnose high bias or high variance by plotting the cost function error on the training set and on the cross-validation set against the degree of the polynomial on the same chart (a code sketch follows the list below):

From the plot above:

  1. When the training set error and the cross-validation set error are both high and approximately equal: high bias/underfitting
  2. When the cross-validation set error is much larger than the training set error: high variance/overfitting
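
A minimal sketch of how such a plot could be produced; the synthetic data, the polynomial regression models, and mean squared error are assumptions for the example:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for the example).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.randn(200)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

degrees = range(1, 11)
train_errors, cv_errors = [], []
for d in degrees:
    poly = PolynomialFeatures(d)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(poly.transform(X_train))))
    cv_errors.append(mean_squared_error(y_cv, model.predict(poly.transform(X_cv))))

# High bias: both curves end up high and close together.
# High variance: a large gap between training and cross-validation error.
plt.plot(list(degrees), train_errors, label="training error")
plt.plot(list(degrees), cv_errors, label="cross-validation error")
plt.xlabel("polynomial degree")
plt.ylabel("cost function error")
plt.legend()
plt.show()
```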

Regularization and bias/variance

Generally we use the following method to choose the regularization parameter λ (see the sketch after this list):

  1. Choose a set of λ values to try, for example 12 values such as 0, 0.01, 0.02, 0.04, ..., 10 (each roughly double the previous one), and train 12 models with different degrees of regularization on the training set
  2. Compute the cross-validation error for each of the 12 models
  3. Select the model with the smallest cross-validation error
  4. Use the model selected in step 3 to compute the generalization error on the test set
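
A rough sketch of this λ-selection procedure, using ridge regression from scikit-learn as a stand-in for a regularized model; the synthetic data and the choice of model are assumptions, and scikit-learn calls the regularization strength alpha:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic linear data (assumed for the example).
rng = np.random.RandomState(1)
X = rng.randn(300, 5)
y = X @ np.array([1.5, -2.0, 0.7, 0.0, 3.0]) + rng.randn(300)

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

# 12 candidate values of lambda (alpha in scikit-learn), roughly doubling each time;
# alpha = 0 means no regularization.
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]

best_lambda, best_cv_error = None, np.inf
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    cv_error = mean_squared_error(y_cv, model.predict(X_cv))
    if cv_error < best_cv_error:
        best_lambda, best_cv_error = lam, cv_error

# Report the generalization error of the selected model on the test set.
model = Ridge(alpha=best_lambda).fit(X_train, y_train)
print(best_lambda, mean_squared_error(y_test, model.predict(X_test)))
```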

Using learning curves to diagnose bias/variance problems

We can observe whether the current model suffers from a bias problem or a variance problem by gradually increasing the size of the training set and plotting the resulting errors:

When the model is in a high bias/underfitting condition (training and cross-validation errors are both high and close together), adding more data to the training set is unlikely to help.

When the model is in a high variance/overfitting condition (there is a large gap between the training error and the cross-validation error), adding more data to the training set is likely to improve the algorithm's performance.
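
A minimal sketch of plotting a learning curve, under the same assumptions as the earlier sketches (synthetic data, linear regression, mean squared error):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for the example).
rng = np.random.RandomState(2)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + 0.3 * rng.randn(400)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=2)

# Train on progressively larger subsets of the training set and record both errors.
sizes = list(range(10, len(X_train) + 1, 20))
train_errors, cv_errors = [], []
for m in sizes:
    model = LinearRegression().fit(X_train[:m], y_train[:m])
    train_errors.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))

plt.plot(sizes, train_errors, label="training error")
plt.plot(sizes, cv_errors, label="cross-validation error")
plt.xlabel("training set size m")
plt.ylabel("error")
plt.legend()
plt.show()
```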

Neural networks

Using a smaller neural network (with fewer parameters) tends to cause high bias/underfitting, but it is computationally cheap. Using a larger neural network (with more parameters) tends to cause high variance/overfitting and is computationally more expensive, but this can be adjusted with regularization, and the network can adapt better to the data. It is usually better to choose a larger neural network with regularization than a smaller one. As for the number of hidden layers, we usually start with one layer and gradually increase it. To choose well, we can split the data into a training set, a cross-validation set and a test set, train neural networks with different numbers of hidden layers, and then select the network with the smallest cost on the cross-validation set.
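
A rough sketch of choosing the number of hidden layers by cross-validation cost, using scikit-learn's MLPClassifier; the dataset, the layer sizes, and the regularization strength alpha are assumptions for the example:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic classification data (assumed for the example).
X, y = make_classification(n_samples=600, n_features=20, random_state=3)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=3)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=3)

# Candidate architectures with 1, 2 and 3 hidden layers; alpha is the L2 regularization term.
architectures = [(25,), (25, 25), (25, 25, 25)]

best_arch, best_cv_cost = None, float("inf")
for arch in architectures:
    net = MLPClassifier(hidden_layer_sizes=arch, alpha=0.01,
                        max_iter=2000, random_state=3).fit(X_train, y_train)
    cv_cost = log_loss(y_cv, net.predict_proba(X_cv))   # cost on the cross-validation set
    if cv_cost < best_cv_cost:
        best_arch, best_cv_cost = arch, cv_cost

print(best_arch, best_cv_cost)
```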

Conclusion

  1. Get more training samples – solves high variance
  2. Try to reduce the number of features – solves high variance
  3. Try to get more features – solves high bias
  4. Try adding polynomial features – solves high bias
  5. Try decreasing the regularization parameter λ – solves high bias
  6. Try increasing the regularization parameter λ – solves high variance

Machine learning system design

The recommended method for constructing a learning algorithm is:

  1. Start with a simple algorithm that can be implemented quickly, implement it, and test it on the cross-validation set
  2. Plot learning curves and decide whether to add more data, add more features, or try other options
  3. Perform error analysis: manually examine the samples in the cross-validation set that our algorithm misclassified, to see whether there is any systematic trend in these samples

Error Analysis

Precision and Recall

We can divide the predicted results of our algorithm into four cases:

  1. True Positive (TP): the prediction is positive and the actual class is positive
  2. True Negative (TN): the prediction is negative and the actual class is negative
  3. False Positive (FP): the prediction is positive but the actual class is negative
  4. False Negative (FN): the prediction is negative but the actual class is positive

Then:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
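
A minimal sketch of these two formulas in code, using hypothetical counts:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
print(precision_recall(80, 20, 40))  # -> (0.8, 0.666...)
```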

The problem of Skewed Classes

Skewed classes means that in our training set there are very many samples of one class and only very few (or no) samples of the other class.

For example, suppose we want to use an algorithm to predict whether a tumor is malignant, and in our training set only 0.5% of cases are malignant. A non-learning "algorithm" that simply predicts benign in every case would achieve an error of just 0.5%, while our trained neural network has a 1% error. In this situation, the error rate alone cannot be used to judge the quality of the algorithm.

To avoid this problem with skewed classes, we look at precision and recall instead, and try to keep the two in reasonable balance.

For example, of all the patients we predict to have malignant tumors, the higher the percentage who actually have malignant tumors, the higher our precision; of all the patients who actually have malignant tumors, the higher the percentage we successfully identify, the higher our recall.

If we only want to predict malignancy when we have a high degree of confidence, i.e. we want higher precision, we can use a threshold larger than 0.5, such as 0.7 or 0.9. In doing so, we reduce the number of patients wrongly predicted to be malignant, but we also increase the number of malignant cases we fail to detect (lower recall).

Conversely, if we want to improve recall so that as many patients with possible malignancy as possible are referred for further examination and diagnosis, we can use a threshold lower than 0.5, such as 0.3, at the cost of lower precision.

We can plot the relationship between precision and recall under different thresholds as a curve; the shape of the curve varies with the data:

So how do we choose a threshold? One method is to compute the F1 Score:

F1 Score = 2 * (P * R) / (P + R)

where P is precision and R is recall; we then choose the threshold with the highest F1 Score.
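
A rough sketch of sweeping the threshold and keeping the one with the highest F1 Score; the synthetic skewed data and the use of logistic regression are assumptions for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, skewed classification data (assumed for the example).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=4)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=4)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_cv)[:, 1]   # predicted probability of the positive class

# Sweep thresholds and keep the one with the highest F1 Score on the cross-validation set.
best_threshold, best_f1 = None, 0.0
for threshold in np.arange(0.1, 0.95, 0.05):
    preds = (probs >= threshold).astype(int)
    p = precision_score(y_cv, preds, zero_division=0)
    r = recall_score(y_cv, preds)
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    if f1 > best_f1:
        best_threshold, best_f1 = threshold, f1

print(best_threshold, best_f1)
```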
The size of the training set

If we train a learning algorithm with many parameters (so that it has low bias) on a very large amount of data (so that it is unlikely to overfit), we can usually obtain a high-performance learning algorithm.