This article is a summary of Andrew Ng's Machine Learning course.

Linear regression

Cost Function


Its goal is to select model parameters that minimize the sum of squared modeling errors:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$
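
A minimal sketch of this cost in NumPy, assuming a design matrix X of shape (m, n+1) with a leading column of ones, targets y of shape (m,), and parameters theta of shape (n+1,):

```python
import numpy as np

def compute_cost(X, y, theta):
    """Mean squared error cost: J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    errors = X @ theta - y          # modeling errors h_theta(x) - y
    return (errors @ errors) / (2 * m)
```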

Batch Gradient Descent


The update rule is

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

where $\alpha$ is the learning rate, which determines how large a step we take downward in the direction of steepest descent of the cost function. In batch gradient descent, all parameters are updated simultaneously: on each iteration, the learning rate times the corresponding partial derivative of the cost function is subtracted from every parameter.

Values of $\alpha$ can be tried on a roughly threefold-increasing scale, for example: $\ldots, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, \ldots$

Using the gradient descent method, the key lies in computing the derivative of the cost function, namely:

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

When $j = 0$:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$$

When $j \geq 1$:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$
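
A minimal batch gradient descent sketch under the same NumPy assumptions as above (alpha and num_iters are illustrative defaults):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iters=1000):
    """Batch gradient descent: update all parameters simultaneously each step."""
    m = len(y)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m   # all partial derivatives at once
        theta = theta - alpha * gradient       # simultaneous update of every theta_j
    return theta
```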


Feature Scaling

Ensuring that all features have similar scales helps the gradient descent algorithm converge faster:


$$x_n := \frac{x_n - \mu_n}{\sigma_n}$$

where $\mu_n$ is the mean and $\sigma_n$ is the standard deviation.
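
A sketch of this mean normalization in NumPy, applied column-wise to a feature matrix:

```python
import numpy as np

def feature_scale(X):
    """Scale each feature to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)      # per-feature mean
    sigma = X.std(axis=0)    # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma
```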

The Normal Equation


$$\theta = \left( X^T X \right)^{-1} X^T y$$

For linear models, when the number of features is not too large, the normal equation is a good alternative to gradient descent for computing the parameters. Specifically, as long as the number of features is less than 10,000, I usually use the normal equation method rather than gradient descent.
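
A minimal sketch in NumPy; np.linalg.pinv is used so that the computation still works when $X^T X$ is not invertible:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y   # pinv tolerates a singular X^T X
```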

Logistic regression

Despite its name, logistic regression is a classification algorithm. It applies to discrete label values, for example 1 0 0 1.

Hypothesis Function

The hypothesis of logistic regression applies the sigmoid function $g$ to a linear combination of the inputs:

$$h_\theta(x) = g\left(\theta^T x\right), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

Its output $h_\theta(x)$ is interpreted as the estimated probability that $y = 1$ for input $x$.

Cost Function

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right]$$

Gradient descent

As with linear regression, the batch gradient descent algorithm can be used to minimize the cost function:

Repeat {

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

} (simultaneously updating all $\theta_j$)

Although this update rule looks identical to that of linear regression, the hypothesis $h_\theta(x)$ of logistic regression is completely different, so it is not the same algorithm.
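
A minimal sketch of this update in NumPy; the only change from the linear regression version is the sigmoid hypothesis:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression."""
    m = len(y)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)                      # hypothesis differs from linear case
        theta = theta - alpha * (X.T @ (h - y)) / m
    return theta
```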

Multi-class Classification: One-vs-All

Logistic regression can be used for one-vs-all classification. The specific idea is as follows:

We mark one of the classes as the positive class and mark all the other classes as negative, training a classifier denoted $h_\theta^{(1)}(x)$. Next, we similarly select another class as the positive class, mark all the others as negative, and denote this model $h_\theta^{(2)}(x)$, and so on. In the end we have a series of models:

$$h_\theta^{(i)}(x) = P\left(y = i \mid x; \theta\right), \quad i = 1, 2, \ldots, k$$

Finally, when we need to make a prediction, we run all $k$ classifiers on the input and choose the class $i$ whose classifier $h_\theta^{(i)}(x)$ outputs the highest probability.
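
A sketch of one-vs-all built on the logistic_gradient_descent and sigmoid sketches above, assuming integer labels 0 to k-1:

```python
import numpy as np

def one_vs_all(X, y, num_classes, alpha=0.1, num_iters=1000):
    """Train one binary logistic regression classifier per class."""
    all_theta = np.zeros((num_classes, X.shape[1]))
    for i in range(num_classes):
        binary_y = (y == i).astype(float)    # current class positive, rest negative
        all_theta[i] = logistic_gradient_descent(
            X, binary_y, np.zeros(X.shape[1]), alpha, num_iters)
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Choose the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```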

Regularization

Regularization is a method for addressing overfitting. It keeps all the features but reduces the magnitude of the parameters.

An alternative approach is to discard features that do not help us predict correctly. You can manually select which features to keep, or use a model selection algorithm to help (such as PCA).

Regularized linear regression

We need to introduce the regularization term into gradient descent:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right], \quad j = 1, 2, \ldots, n$$

Simplifying this formula:

$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

It can be seen that the change in the gradient descent algorithm for regularized linear regression is that, on top of the original update rule, each update additionally shrinks the value of $\theta_j$ by the factor $\left(1 - \alpha \frac{\lambda}{m}\right)$.
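
A sketch of one regularized gradient step in NumPy; note that $\theta_0$ is excluded from the penalty (lmbda stands in for $\lambda$):

```python
import numpy as np

def regularized_gradient_step(X, y, theta, alpha=0.01, lmbda=1.0):
    """One batch gradient step with L2 regularization (intercept unpenalized)."""
    m = len(y)
    gradient = X.T @ (X @ theta - y) / m
    penalty = (lmbda / m) * theta
    penalty[0] = 0.0                     # do not regularize theta_0
    return theta - alpha * (gradient + penalty)
```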

Regularized normal equation:

$$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} \right)^{-1} X^T y$$

where the matrix is $(n+1) \times (n+1)$ with a 0 in its top-left entry.
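
A sketch of this closed form in NumPy; the regularization matrix is an identity with its top-left entry zeroed:

```python
import numpy as np

def regularized_normal_equation(X, y, lmbda=1.0):
    """Closed-form solution for regularized linear regression."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0                        # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + lmbda * L, X.T @ y)
```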


Regularized logistic regression

The regularization formula for logistic regression is similar to that of linear regression, but the hypothesis is $h_\theta(x) = g\left(\theta^T x\right)$, so the two are different algorithms.
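
A sketch of one regularized gradient step for logistic regression, mirroring the linear case except for the sigmoid hypothesis:

```python
import numpy as np

def regularized_logistic_step(X, y, theta, alpha=0.1, lmbda=1.0):
    """One regularized batch gradient step for logistic regression."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid hypothesis
    gradient = X.T @ (h - y) / m
    penalty = (lmbda / m) * theta
    penalty[0] = 0.0                          # theta_0 is not regularized
    return theta - alpha * (gradient + penalty)
```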