Linear regression

In a nutshell, a linear model makes a prediction by computing a weighted sum of the input features, plus a constant called the bias term (or intercept term), as shown in the following formula:


$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

In this formula:

  • $\hat{y}$ is the predicted value
  • $n$ is the number of features
  • $x_i$ is the value of the $i$-th feature
  • $\theta_j$ is the $j$-th model parameter (including the bias term $\theta_0$ and the feature weights $\theta_1, \theta_2, \dots, \theta_n$)
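The same prediction can be written compactly as the dot product of the parameter vector $\theta$ and the feature vector $x$ extended with $x_0 = 1$. A minimal NumPy sketch with made-up parameter values, purely for illustration:

import numpy as np

# Illustrative (made-up) parameters for a model with three features
theta = np.array([4.0, 3.0, -2.0, 0.5])   # theta_0, theta_1, theta_2, theta_3
x = np.array([1.0, 1.2, 0.7, 3.1])        # x_0 = 1 (bias input), then x_1, x_2, x_3
y_hat = theta.dot(x)                      # theta_0 + theta_1*x_1 + theta_2*x_2 + theta_3*x_3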

Training a linear regression model means setting the model parameters so that the model best fits the training set, so we first need a way to measure how well (or poorly) the model fits the training data. The most common performance measure for regression models is the root mean square error (RMSE), so training a linear regression model amounts to finding the value of $\theta$ that minimizes the RMSE. In practice, however, it is simpler to minimize the mean square error (MSE) instead: the square root is a monotonically increasing function, so the $\theta$ that minimizes the MSE also minimizes the RMSE.

On the training set $X$, the MSE of a linear regression hypothesis $h_\theta$ is computed with the following formula:


$MSE(X, h_\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2$
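As a sketch of what this cost function computes, here is one way to evaluate it in NumPy (the helper name mse is an assumption; X_b is the training matrix with an added $x_0 = 1$ column, as in the code later in this section):

def mse(theta, X_b, y):
    # errors[i] = theta^T x^(i) - y^(i), computed for every training instance at once
    errors = X_b.dot(theta) - y
    return (errors ** 2).mean()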

To find the value of $\theta$ that minimizes the cost function, there is a closed-form solution (a mathematical equation that gives the result directly), called the Normal Equation:


$\hat{\theta} = (X^T X)^{-1} X^T y$

In this equation:

  • $\hat{\theta}$ is the value of $\theta$ that minimizes the cost function
  • $y$ is the vector of target values, from $y^{(1)}$ to $y^{(m)}$

We generate some random linear data to test this formula:

import numpy as np

X = 2 * np.random.rand(100, 1)            # 100 instances, one feature, uniform in [0, 2)
y = 4 + 3 * X + np.random.randn(100, 1)   # linear target plus Gaussian noise

This produces a noisy but roughly linear data set.
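A minimal Matplotlib sketch to visualize it (assuming the X and y arrays from the snippet above):

import matplotlib.pyplot as plt

plt.plot(X, y, "b.")       # one blue dot per training instance (x1, y)
plt.xlabel("$x_1$")
plt.ylabel("$y$")
plt.show()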

Now let's compute $\hat{\theta}$ using the Normal Equation. The inv() function in NumPy's linear algebra module (np.linalg) inverts the matrix, and the dot() method computes the matrix product:

X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

The function that actually generated the data is $y = 4 + 3x_1 + \text{Gaussian noise}$. The Normal Equation gives the following result:

>>> theta_best
array([[4.21509616],
       [2.77011339]])

This is quite close to the expected $\theta_0 = 4$ (we got 4.21509616) and $\theta_1 = 3$ (we got 2.77011339); the noise makes it impossible to recover the exact parameters of the original function. We can now plot the model's predictions:
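A minimal sketch of how these predictions can be computed and plotted (assuming theta_best, X, and y from above; two points at $x_1 = 0$ and $x_1 = 2$ are enough to draw the fitted line):

import matplotlib.pyplot as plt

X_new = np.array([[0], [2]])              # the two extreme x1 values
X_new_b = np.c_[np.ones((2, 1)), X_new]   # add x0 = 1 to each instance
y_predict = X_new_b.dot(theta_best)       # predictions of the fitted model

plt.plot(X_new, y_predict, "r-", label="Predictions")
plt.plot(X, y, "b.")                      # training data
plt.legend()
plt.show()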

Once the model is trained, predictions are fast: the computational complexity is linear in both the number of instances to predict on and the number of features, so making predictions on twice as many instances (or with twice as many features) takes roughly twice as long.
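For training, a commonly used alternative to inverting $X^T X$ explicitly is to call a least-squares solver, which is more numerically robust and also works when $X^T X$ is singular. A minimal sketch, assuming the X_b and y arrays defined above:

# np.linalg.lstsq solves the least-squares problem directly (via SVD) and
# returns the same theta as the Normal Equation when X^T X is invertible.
theta_best_svd, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)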