This article is featured at: Aistudyclub.com

The previous article, “Taking You Hand in Hand into Machine Learning 1 — Linear Regression,” explained the concept of linear regression and how to solve it with the gradient descent algorithm. This article continues from there and implements the linear regression process in Python, following Exercise 1 of the Machine Learning course.

1. Visualize the data set

In order to keep the demonstration and calculations simple, a data set with only one feature is used here. The first column of the data set is the population of a city, which we take as the feature X, and the second column is the profit of a food truck in that city, which we take as the label y.

import csv

import numpy as np
import matplotlib.pyplot as plt

# Read the data set: column 0 is the city population, column 1 is the profit.
csv_reader = csv.reader(open('ex1data1.txt', 'r'))
X = []
y = []
for line in csv_reader:
    X.append(float(line[0]))
    y.append(float(line[1]))
X = np.array(X)
y = np.array(y)
plt.scatter(X, y)
plt.ylabel('Profit in $10,000s')
plt.xlabel('Population of City in 10,000s')
plt.show()

The above code reads the data from ex1data1.txt into X and y. We then visualize the data with a scatter plot, resulting in the figure below. From this figure we can roughly see the relationship between city population and profit: it is approximately linear.

2. Implement the loss function

Here we can use the formula mentioned in the previous article to represent this relationship, where $x_0$ is the constant 1 and is added only for programming convenience:

$h = \theta_0 x_0 + \theta_1 x_1$ (1)

In order to calculate the values of $\theta_0$ and $\theta_1$, we first need to establish the loss function, which, as mentioned earlier, is as follows:

$J = \frac{1}{2m} \sum_{i=1}^{m} (h - y^{(i)})^2$ (2)

The code implementation is as follows:

def computeCost(X, y, theta):
    # Get the number of samples.
    m = y.shape[0]
    J = 0
    # Iterate over all samples, compute h - y for each, and accumulate the squares.
    for i in range(m):
        h = np.matmul(X[i, :], theta)[0]
        J = J + (h - y[i]) ** 2
    # Finally divide by 2m according to the formula.
    J = J / 2 / m
    return J
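The loop above follows formula (2) literally. For reference, the same cost can also be computed in vectorized form; this is only a minimal sketch under the same shape assumptions (X of shape (m, n), y of shape (m,), theta of shape (n, 1)), and the name computeCostVectorized is our own, not part of the exercise:

def computeCostVectorized(X, y, theta):
    # Predictions for all samples at once; flatten (m, 1) to (m,) to match y.
    m = y.shape[0]
    h = np.matmul(X, theta).flatten()
    return np.sum((h - y) ** 2) / (2 * m)

Both versions should return the same values in the checks below.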

After implementing the loss function, you can use the following code to verify that it is correct.

theta = np.zeros([2, 1])
# Prepend a column of ones to X, corresponding to x_0 = 1.
X = np.hstack([np.ones([X.shape[0], 1]), X[:, np.newaxis]])

J = computeCost(X, y, theta)
# Calculate the loss value using zeros as the value of theta.
print('With theta = [0 ; 0]\nCost computed = %f' % (J,))
# Expected loss value if the loss function is implemented correctly.
print('Expected cost value (approx) 32.07\n')


J = computeCost(X, y, np.array([[-1], [2]]))
# Use -1 and 2 as the theta values to calculate the loss value.
print('\nWith theta = [-1 ; 2]\nCost computed = %f' % (J,))
# Expected loss value if the loss function is implemented correctly.
print('Expected cost value (approx) 54.24\n')

3. Implement gradient descent

Now we come to the core of this article: implementing the gradient descent algorithm in Python. From the previous article, we know how to update $\theta$:

repeat {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \ldots, \theta_n)$ (3)

}

In the above formula, we take the partial derivative of the loss function J with respect to $\theta_j$. Since J is given by formula (2), applying the chain rule yields $\frac{\partial}{\partial \theta_j} J = \frac{1}{m} \sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)}) x_j^{(i)}$, so the update rule can be expanded as:

repeat {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)}) x_j^{(i)}$ (4)

}

The iterative algorithm is now clear: everything in the formula above except $\theta$ can be computed directly from the data. The implementation code is as follows:

def gradientDescent(X, y, theta, alpha, num_iters):
    m = y.shape[0]
    n = theta.shape[0]
    # Repeat the update num_iters times.
    for _ in range(num_iters):
        k = np.zeros([m, 1])
        p = np.zeros([n, 1])
        # Calculate the summation term in the formula.
        for i in range(m):
            X_tmp = np.expand_dims(X[i, :], 0)
            # Prediction error for sample i.
            k[i] = np.matmul(X_tmp, theta)[0] - y[i]
            # Accumulate the gradient contribution of sample i.
            p = p + k[i] * np.transpose(X_tmp, [1, 0])
        p = p / m
        # Update the theta parameters.
        theta = theta - alpha * p
    # At the end of the iterations, return the parameters obtained by gradient descent.
    return theta
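For reference, the same update can be written in vectorized form, which removes the inner loop over samples. This is only an equivalent sketch (the name gradientDescentVectorized is our own); the loop version above is the one used in the rest of the article:

def gradientDescentVectorized(X, y, theta, alpha, num_iters):
    m = y.shape[0]
    y_col = y.reshape(-1, 1)  # column vector so the shapes match theta
    for _ in range(num_iters):
        error = np.matmul(X, theta) - y_col   # shape (m, 1)
        grad = np.matmul(X.T, error) / m      # shape (n, 1)
        theta = theta - alpha * grad
    return theta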

In the main program, we call the gradient descent function with the learning rate set to 0.01 and the number of iterations set to 1500.

alpha = 0.01
iterations = 1500
theta = gradientDescent(X, y, theta, alpha, iterations)


print('Theta found by gradient descent:')
print(theta[0], theta[1])
print('Expected theta values (approx)')
print('-3.6303\n 1.1664\n')

plt.scatter(X[:, 1], y)
plt.plot(X[:, 1], np.matmul(X, theta), 'r')
plt.ylabel('Profit in $10,000s')
plt.xlabel('Population of City in 10,000s')
plt.show()

4. Verify the linear regression results

After 1500 iterations, we get $\theta_0 = -3.6303$ and $\theta_1 = 1.1664$. The earlier formula (1) can then be written as:

$h = -3.6303 + 1.1664 x_1$ (5)

This formula represents a straight line, and we visualize this line with all the samples, as shown below.

As can be seen from the figure above, this line roughly captures the relationship between population and food truck profit, so we can use it to make rough predictions of profit based on population.
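For example, with the learned $\theta$ we can estimate the profit for cities of 35,000 and 70,000 people. The snippet below is a small usage sketch (remember that populations are stored in units of 10,000 people and profits in units of $10,000):

# A population of 35,000 corresponds to x_1 = 3.5.
predict1 = np.matmul(np.array([[1, 3.5]]), theta)
print('For population = 35,000, we predict a profit of %f' % (predict1[0, 0] * 10000))
# A population of 70,000 corresponds to x_1 = 7.
predict2 = np.matmul(np.array([[1, 7.0]]), theta)
print('For population = 70,000, we predict a profit of %f' % (predict2[0, 0] * 10000))

With the $\theta$ values above, these come out to roughly $4,500 and $45,300.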

5. Visualize the loss function

To better understand the loss function, we can visualize the relationship between $\theta_0$, $\theta_1$ and J as follows:

theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)

# Compute the loss for every combination of theta0 and theta1 on the grid.
J_vals = np.zeros([100, 100])
for i in range(100):
    for j in range(100):
        t = np.array([theta0_vals[i], theta1_vals[j]])
        t = t[:, np.newaxis]
        J_vals[i, j] = computeCost(X, y, t)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Grids of theta0 / theta1 values used as plot coordinates.
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
J_vals = np.transpose(J_vals, [1, 0])
surf = ax.plot_surface(T0, T1, J_vals)

plt.show()

Since there are two variables, this is a three-dimensional plot: the two horizontal axes represent the variables $\theta_0$ and $\theta_1$, and the vertical axis represents the value of J. The values of $\theta_0$ and $\theta_1$ at the lowest point of the surface are the ones we want, because J is smallest there. We can verify this further with a contour map.

The code for drawing the contour map is as follows:

plt.contour(T0, T1, J_vals, np.logspace(-2, 3, 20))
plt.scatter(theta[0], theta[1])
plt.show()

In the code above, we also plot the previously calculated $\theta$ value as a blue dot; as the contour plot shows, it does indeed lie at the lowest point of the surface.
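As a final numeric check (this step is our own addition, not part of the exercise), we can look up the grid point with the smallest J in the J_vals array computed above and confirm that it lies close to the $\theta$ found by gradient descent:

# After the transpose above, J_vals is indexed as [theta1_index, theta0_index].
j1, j0 = np.unravel_index(np.argmin(J_vals), J_vals.shape)
print('Grid minimum near theta0 = %f, theta1 = %f' % (theta0_vals[j0], theta1_vals[j1]))

Up to the resolution of the 100 x 100 grid, the printed values should be close to the gradient descent solution.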

The complete code is available at: github.com/txyugood/Ma…

If you find this article helpful, please like, follow, and bookmark it. You are also welcome to follow my official account: Artificial Intelligence Research Society. Your support is my motivation to keep creating. Thank you!