Linear regression is one of the most basic algorithms in machine learning, and many other algorithms build on ideas like it. This article walks through linear regression in very plain language.

Linear regression

There is simple (unary) linear regression, where there is only one x and one y, and there is multiple linear regression. Here we build a basic understanding of linear regression using the single-variable case.

Unary linear regression looks for the straight line that fits the data with the minimum error (Loss).

The error mentioned above can be written down concretely. Assume the line has the form:

y = m * x + b

Ideally, every point would fall exactly on the line. Failing that, we want every point to be as close to the line as possible. For simplicity we square the vertical distance, so the error over the whole data set can be expressed as:

Loss = sum((y_i - (m * x_i + b)) ** 2)

The subscript i above refers to the i-th data point. In practice, the average of this sum (dividing by the number of points N) is usually taken as the final Loss.

Error minimization

The goal is to find the line that best fits the data, i.e. the one that minimizes the error.

Least squares method

Only m and b are unknown in the formula above, so the Loss can be viewed as a quadratic function of m and b, and minimizing the Loss becomes a problem of finding an extremum. The details are omitted here.

Set the partial derivative with respect to each variable to 0, and solve the resulting system of equations.
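As an illustrative sketch of that closed-form route (added here for clarity; it is not part of the original walkthrough, and the function name solve_least_squares is made up), setting both partial derivatives to zero in the single-variable case leads to the familiar formulas, which can be coded directly:

import numpy as np

def solve_least_squares(data):
    # closed-form least squares for the line y = m * x + b
    x = data[:, 0]
    y = data[:, 1]
    x_mean = x.mean()
    y_mean = y.mean()
    # slope from the centered cross term, intercept from the means
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - m * x_mean
    return b, m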

Gradient descent method

There would be no deep learning without gradient descent. The least squares method solves for m and b in one step, but for most models a direct, closed-form solution is not easy to compute. Gradient descent instead approaches the optimal line gradually, through step-by-step iteration, which is why it is framed as continuous optimization. The graph of the Loss function can be pictured as a bowl, with the minimum at the bottom.
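Written out for the line y = m * x + b (standard calculus; these are exactly the expressions implemented in compute_gradient below), the partial derivatives of the averaged Loss are:

dLoss/db = -(2 / N) * sum(y_i - (m * x_i + b))
dLoss/dm = -(2 / N) * sum(x_i * (y_i - (m * x_i + b)))

Each iteration of gradient descent subtracts learning_rate times these values from b and m respectively.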

Initialization:

import numpy as np


def init_data():
    data = np.loadtxt('data.csv', delimiter=',')
    return data


def linear_regression():
    learning_rate = 0.01  # step size (learning rate)
    initial_b = 0
    initial_m = 0
    num_iter = 1000  # number of iterations

    data = init_data()
    [b, m] = optimizer(data, initial_b, initial_m, learning_rate, num_iter)
    plot_data(data, b, m)  # plotting helper (not shown in this article)
    print(b, m)
    return b, m
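For reference, init_data above expects data.csv to hold one sample per line, with x in the first column and y in the second, separated by a comma. The rows below are invented purely to show the layout (the real file is in the repository linked at the end of the article):

1.0,2.3
2.0,3.9
3.0,6.1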

The optimizer that performs gradient descent:

def optimizer(data, initial_b, initial_m, learning_rate, num_iter):
    b = initial_b
    m = initial_m

    for i in range(num_iter):
        b, m = compute_gradient(b, m, data, learning_rate)
        # after = computer_error(b, m, data)
        if i % 100 == 0:
            print(i, computer_error(b, m, data)) # Loss function, i.e. error
    return [b, m]

Compute the gradient and update the parameters in each iteration:

def compute_gradient(b_cur, m_cur, data, learning_rate):
    b_gradient = 0
    m_gradient = 0

    N = float(len(data))
    # partial derivatives of the Loss with respect to b and m, i.e. the gradient
    for i in range(0, len(data)):
        x = data[i, 0]
        y = data[i, 1]

        b_gradient += -(2 / N) * (y - ((m_cur * x) + b_cur))
        m_gradient += -(2 / N) * x * (y - ((m_cur * x) + b_cur)) # partial derivative

    new_b = b_cur - (learning_rate * b_gradient)
    new_m = m_cur - (learning_rate * m_gradient)
    return [new_b, new_m]
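As an aside that is not in the original article, the same gradient can be computed without the explicit Python loop by vectorizing over the whole data array (np is the same numpy import used above); a minimal sketch:

def compute_gradient_vectorized(b_cur, m_cur, data, learning_rate):
    x = data[:, 0]
    y = data[:, 1]
    N = float(len(data))
    residual = y - (m_cur * x + b_cur)
    # same partial derivatives as above, summed with numpy instead of a loop
    b_gradient = -(2 / N) * np.sum(residual)
    m_gradient = -(2 / N) * np.sum(x * residual)
    new_b = b_cur - (learning_rate * b_gradient)
    new_m = m_cur - (learning_rate * m_gradient)
    return [new_b, new_m]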

Calculation of Loss value:

def computer_error(b, m, data):
    totalError = 0
    x = data[:, 0]
    y = data[:, 1]
    totalError = (y - m * x - b) ** 2
    totalError = np.sum(totalError, axis=0)
    return totalError / len(data)
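As a small aside (not from the original article), the same mean squared error can be written more compactly with np.mean:

def computer_error_short(b, m, data):
    # mean of the squared residuals over all points
    return np.mean((data[:, 1] - m * data[:, 0] - b) ** 2)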

Execute the function from the program entry point:

if __name__ == '__main__':
    linear_regression()

The calculation results are as follows:

0 3.26543633854
100 1.41872132865
200 1.36529867423
300 1.34376973304
400 1.33509372632
500 1.33159735872
600 1.330188348
700 1.32962052693
800 1.32939169917
900 1.32929948325
1.23930380135 1.86724196887

As can be seen, as the number of iterations increases the Loss gets closer and closer to its minimum, and m and b get closer and closer to the optimal solution (the last line of output is the final b and m).

Note:

In the method above, the error is minimized by computing the partial derivatives of the Loss analytically; with the gradient known, every step heads in the direction that descends toward the bottom of the bowl fastest. But how do we find the optimal solution when the formula is too awkward to differentiate directly? Instead of deriving the partial derivatives by hand, we can approximate them from the definition of the derivative, as in the alternative functions below.
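Concretely, a derivative can be approximated directly from its definition with a small delta (a central difference):

f'(x) ≈ (f(x + delta) - f(x - delta)) / (2 * delta)

This is exactly what compute_gradient_two below does for the per-point error.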

def optimizer_two(data, initial_b, initial_m, learning_rate, num_iter):
    b = initial_b
    m = initial_m

    while True:
        before = computer_error(b, m, data)
        b, m = compute_gradient_two(b, m, data, learning_rate)
        after = computer_error(b, m, data)
        if abs(after - before) < 0.0000001:  # stop once the decrease in Loss falls below this accuracy
            break
    return [b, m]

def compute_gradient_two(b_cur, m_cur, data, learning_rate):
    b_gradient = 0
    m_gradient = 0

    N = float(len(data))
    delta = 0.0000001
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        # approximate the gradient using the definition of the derivative
        b_gradient += (error(x, y, b_cur + delta, m_cur) - error(x, y, b_cur - delta, m_cur)) / (2*delta)
        m_gradient += (error(x, y, b_cur, m_cur + delta) - error(x, y, b_cur, m_cur - delta)) / (2*delta)

    b_gradient = b_gradient / N
    m_gradient = m_gradient / N
    # update the parameters along the negative gradient
    new_b = b_cur - (learning_rate * b_gradient)
    new_m = m_cur - (learning_rate * m_gradient)
    return [new_b, new_m]


def error(x, y, b, m):
    return (y - (m * x) - b) ** 2
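To try this variant, the optimizer call inside linear_regression can simply be swapped out, for example (a usage sketch added for illustration; num_iter is ignored here because the loop stops on the tolerance instead):

data = init_data()
[b, m] = optimizer_two(data, 0, 0, 0.01, 1000)
print(b, m)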

With enough iterations, both versions approach the optimal solution. The solutions obtained are, respectively: 1: 1.23930380135 1.86724196887 and 2: 1.24291450769 1.86676417482.

Simple comparison

scikit-learn (sklearn) provides a corresponding LinearRegression class, which finds the optimal solution directly via least squares. A simple implementation for comparison:

from sklearn.linear_model import LinearRegression


def scikit_learn():
    data = init_data()
    y = data[:, 1]
    x = data[:, 0]
    x = x.reshape(-1, 1)  # sklearn expects a 2-D feature array
    linreg = LinearRegression()
    linreg.fit(x, y)
    print(linreg.coef_)       # slope m
    print(linreg.intercept_)  # intercept b

if __name__ == '__main__':
    # linear_regression()
    scikit_learn()

This time the solution is 1.24977978176 1.86585571. The gradient descent results above are therefore quite close, and they can be brought closer still by tuning parameters such as the learning rate, the number of iterations, or the stopping tolerance.
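For completeness, the fitted sklearn model can also be used for prediction, for example by adding a line like the following at the end of scikit_learn() (the input value 10.0 is arbitrary):

print(linreg.predict(np.array([[10.0]])))  # roughly m * 10.0 + b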

Source code and data reference: github.com/yunshuipiao…

Further reading: Gradient Descent, a summary of the gradient descent process.