1. What is linear regression

  • Linearity: the relationship between two variables is linear, i.e., its graph is a straight line.
  • Nonlinearity: the relationship between two variables is not a linear function, i.e., its graph is not a straight line.
  • Regression: when people measure things, the limitations of objective conditions mean they obtain measured values rather than the true value. To approach the true value, they take a great many measurements and use these measured data to compute a value that regresses toward the true one. This is the origin of the term "regression".

2. What kind of problems can be solved

A large number of observations are processed to obtain a mathematical expression that reflects the internal laws of the data. That is, linear regression finds patterns in the data so that results can be simulated, i.e., predicted: unknown results are obtained from known data. Examples: predicting housing prices, judging credit ratings, estimating a film's box office, and so on.
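
As a first taste, here is a minimal sketch of such a prediction using scikit-learn's LinearRegression; the features and all the numbers are invented for illustration.

```python
# Predicting house prices from known data (all numbers made up).
import numpy as np
from sklearn.linear_model import LinearRegression

# Known data: [area (m^2), number of rooms] -> price (in 10k units)
X = np.array([[60, 2], [80, 3], [100, 3], [120, 4]])
y = np.array([150, 210, 250, 312])

model = LinearRegression()
model.fit(X, y)                            # learn the pattern from known data
print(model.predict(np.array([[90, 3]])))  # predict the price of an unseen house
```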

3. What is a general expression

The general expression is

$$f(x) = w^T x + b$$

where w is called the coefficient on x, and b is called the offset (bias) term.
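
A minimal sketch of evaluating this expression in NumPy, with w and b chosen arbitrarily:

```python
# f(x) = w^T x + b with arbitrary parameters.
import numpy as np

w = np.array([2.0, -1.0])   # coefficients on x
b = 0.5                     # offset (bias) term
x = np.array([3.0, 4.0])

y_hat = np.dot(w, x) + b    # 2*3 - 1*4 + 0.5 = 2.5
print(y_hat)
```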

4. How to calculate

4.1 Loss Function: the MSE

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2$$

Use gradient descent to find the minimum point of this loss, i.e., the minimum error, and thereby solve for w and b.
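
A sketch of that procedure on a single feature; the data, learning rate, and iteration count are arbitrary choices for illustration:

```python
# Gradient descent on the MSE loss to solve for w and b.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])   # roughly y = 2x + 1

w, b = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    error = (w * X + b) - y
    grad_w = 2 * np.mean(error * X)  # dJ/dw for J = mean(error^2)
    grad_b = 2 * np.mean(error)      # dJ/db
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # approaches w ≈ 2, b ≈ 1
```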

5. How to solve over-fitting and under-fitting

A regularization term is used, that is, a penalty term is added to the loss function. Common regularization terms include L1 regularization, L2 regularization, and ElasticNet. The benefits of adding a regularization term (the penalized losses are written out after this list):

  • Control the magnitude of the parameters, so the model does not become "lawless".
  • Limit the parameter search space.
  • Solve the problems of underfitting and overfitting.
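
All three penalties follow the same pattern: the original loss J0 plus a penalty term scaled by a strength α. A summary, with the ElasticNet weighting written in scikit-learn's convention (mixing ratio ρ):

$$J = J_0 + \alpha \, \Omega(w), \qquad \Omega_{L1}(w) = \sum_i |w_i|, \qquad \Omega_{L2}(w) = \sum_i w_i^2, \qquad \Omega_{EN}(w) = \rho \sum_i |w_i| + \frac{1 - \rho}{2} \sum_i w_i^2$$

Each is treated in detail below.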

5.1 What is L2 Regularization (Ridge regression)

Equation:

$$J = J_0 + \alpha \sum_{w} w^2$$

Here $J_0$ represents the loss function above; the sum of squares of the w parameters, multiplied by $\alpha$, is added on top of it. Now assume:

$$L = \alpha \sum_{w} w^2$$

Recall the circle equation you learned before:

$$x^2 + y^2 = r^2$$

It has the same form as the L2 regularization term, so our task now becomes finding the minimum of $J_0$ under the constraint $L$. Solving for $J_0$ can be visualized by drawing its contour lines, and the L2 regularization function $L$ can be drawn on the same two-dimensional $w_1 w_2$ plane. (Figure: contour lines of $J_0$ and the L2 constraint circle on the $w_1 w_2$ plane.)

$L$ is represented by the black circle in the figure. As gradient descent approaches the minimum, the contour lines first intersect the circle at some point, and that intersection point rarely lies on a coordinate axis. This shows that L2 regularization does not easily produce a sparse matrix; instead, to minimize the loss function, $w_1$ and $w_2$ are driven arbitrarily close to 0 (but not exactly to 0), which prevents overfitting.

5.2 When is L2 regularization used

When the features are linearly correlated, plain LinearRegression does not perform well and regularization is needed; in that case, consider ridge regression (L2). If the input features are high-dimensional and the underlying linear relationship is sparse, ridge regression is not appropriate, so consider Lasso regression instead.
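
A minimal sketch of ridge regression with scikit-learn; the synthetic data, the correlated columns, and alpha=1.0 are arbitrary choices for illustration:

```python
# Ridge (L2) regression on data with two strongly correlated columns.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=50)   # column 2 ~ column 0
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.1, size=50)

ridge = Ridge(alpha=1.0)   # alpha is the regularization strength
ridge.fit(X, y)
print(ridge.coef_)         # shrunk coefficients; typically none exactly 0
```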

5.3 What is L1 Regularization (Lasso Regression)

The difference between L1 regularization and L2 regularization lies in the penalty term:

$$J = J_0 + \alpha \sum_{w} |w|$$

As before, solving for $J_0$ can be visualized by drawing contour lines, and the L1 regularization function can be drawn on the same two-dimensional $w_1 w_2$ plane. (Figure: contour lines of $J_0$ and the L1 constraint diamond on the $w_1 w_2$ plane.)

The penalty term is represented by the black diamond in the figure. As gradient descent approaches the minimum, the contour lines first intersect the diamond at some point, and that intersection point easily lands on a coordinate axis, where one of the weights is exactly 0. This shows that L1 regularization easily produces a sparse matrix.

5.4 When is L1 regularization used

L1 regularization (Lasso regression) can shrink the coefficients of some features and even drive coefficients with small absolute values exactly to 0, which enhances the generalization ability of the model. For high-dimensional feature data, especially when the linear relationship is sparse, L1 regularization (Lasso regression) is suitable; and when the goal is to pick out the main features from a large set of features, L1 regularization (Lasso regression) is the first choice.
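
A minimal sketch showing that sparsity: ten features of which only the first two matter, with alpha=0.1 an arbitrary choice.

```python
# Lasso (L1) drives irrelevant coefficients exactly to 0.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)   # most entries are exactly 0; features 0 and 1 survive
```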

5.5 What is ElasticNet regression

ElasticNet combines L1 regularization and L2 regularization.

5.6 Application Scenarios of ElasticNet Regression

If we find that Lasso regression regularizes too aggressively (too many features are shrunk to 0) while ridge regression does not regularize enough (the regression coefficients decay too slowly), we can consider ElasticNet regression to get a better result.
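
A minimal sketch; alpha=0.1 and l1_ratio=0.5 are arbitrary, and the data are the same synthetic setup as above:

```python
# ElasticNet mixes the L1 and L2 penalties via l1_ratio.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio -> 1 acts like Lasso,
enet.fit(X, y)                              # l1_ratio -> 0 acts like ridge
print(enet.coef_)
```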

6. Does linear regression require the dependent variable to be normally distributed?

We assume that the noise of linear regression follows a normal distribution with mean 0: $\varepsilon \sim N(0, \sigma^2)$. With the prediction function $y = A x_i + b$, when the noise follows $N(0, \sigma^2)$ the dependent variable follows $N(A x_i + b, \sigma^2)$. This conclusion can be derived from the probability density function of the normal distribution. In other words, when the noise is normally distributed, the dependent variable must also be normally distributed.

Therefore, before fitting data with a linear regression model, the data should conform, at least approximately, to a normal distribution; otherwise the fitted function will be incorrect.
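
This can be checked by simulation; a sketch with arbitrary parameters a, b, and sigma:

```python
# With noise ~ N(0, sigma^2), y = a*x + b + noise is normal around a*x + b.
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 2.0, 1.0, 0.5
x = 3.0                                    # fix a single input point
noise = rng.normal(0.0, sigma, size=100_000)
y = a * x + b + noise

print(y.mean(), y.std())   # close to a*x + b = 7.0 and sigma = 0.5
```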

7. Code implementation

GitHub:github.com/NLP-LOVE/ML…

Author: @mantchs

GitHub:github.com/NLP-LOVE/ML…

Welcome to join the discussion! Work together to improve this project! Group Number: [541954936]