Do you practice deliberately?

In Deliberate Practice, the author uses a wealth of data and examples to argue that anyone who practices hard with the right methods can excel in any field. One example is a study of violin students. The students were divided into three groups (good, excellent, and most outstanding), and the most important difference between the groups turned out to be the amount of practice. Natural talent may make people learn faster at the beginning, but it was not decisive in the long run. By the age of 18, the good students had averaged 3,420 hours of practice, the excellent students 5,301 hours, and the most outstanding students 7,401 hours. The author uses this to argue that anyone can achieve excellence through hard practice.

Linear regression

So let’s assume that the conclusion of Deliberate Practice is true, and that we have a set of data on practice time and math performance:

practice time (h/w) | grade

So how do we find the relationship between practice time and performance? If we can find this relationship, we can predict a student’s score from the number of hours he or she has practiced. Let’s assume that the relationship between practice time and score is linear; a hypothesis function can then be written as

$$h_\theta(x) = \theta_0 + \theta_1 x$$

where $x$ is the practice time and $h_\theta(x)$ is the predicted score.

If the number of samples is $m$, we can compute the total squared error of this hypothesis on the data:

$$\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Its average value (halved, which simplifies the derivative later) is the cost function:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

The smaller $J(\theta)$ is, the more accurate our predictions will be. So the problem becomes finding the value of $\theta$ that minimizes $J(\theta)$: $\min_\theta J(\theta)$.

We can start from an arbitrary value of $\theta$ and then narrow it down step by step; when $\theta$ stabilizes at a value, we have found a $\theta$ that minimizes $J(\theta)$. The narrowing can be done with the following update, repeated until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

where $\alpha$ is the learning rate that controls the step size.
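For the linear hypothesis above, the partial derivatives can be written out explicitly (the standard derivation, spelled out here for reference):

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)
\qquad
\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$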

Once the value of $\theta$ has been determined this way, the $h_\theta(x)$ defined at the beginning can be used to predict a score from a practice time. That approach is linear regression in machine learning, and the process of solving for $\theta$ is the gradient descent algorithm.

After solving for $\theta$, we can substitute it into $h_\theta(x)$ to get the relationship between grades and practice time.

Python implementation of linear regression

The implementation has two parts: computing the cost function $J(\theta)$, and minimizing it with gradient descent.

Cost function

According to the definition above,

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

To define the compute_cost function, we just need to convert it into a matrix operation according to the formula.

import numpy as np


def compute_cost(X, y, theta):
    m = y.size
    error = X.dot(theta) - y            # h_theta(x) - y for every sample
    sqr = np.power(error, 2)            # squared errors
    cost = (1 / (2 * m)) * np.sum(sqr)  # J(theta)
    return cost
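As a quick sanity check, the cost can be evaluated at $\theta = (0, 0)$. The snippet below is only a sketch: the file name and column layout are assumptions, and a column of ones is prepended to X so that theta[0] plays the role of the intercept (it assumes numpy is imported as np and compute_cost is defined as above).

# Hypothetical data file: first column practice time (h/w), second column grade
data = np.loadtxt('practice_scores.txt', delimiter=',')
X = np.column_stack([np.ones(data.shape[0]), data[:, 0]])  # add the intercept column
y = data[:, 1]

print(compute_cost(X, y, np.zeros(2)))  # cost at the all-zero starting point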
We can visualize the relationship between the cost function and $\theta$:

import numpy as np
import matplotlib.pyplot as plt


def plot_J_history(X, y):
    # Evaluate J(theta) over a grid of (theta_0, theta_1) values
    theta0_vals = np.linspace(-10, 10, 100)
    theta1_vals = np.linspace(-1, 4, 100)

    J_vals = np.zeros((theta0_vals.size, theta1_vals.size))

    for i in range(theta0_vals.size):
        for j in range(theta1_vals.size):
            theta = np.array([theta0_vals[i], theta1_vals[j]])
            J_vals[i, j] = compute_cost(X, y, theta)

    theta_x, theta_y = np.meshgrid(theta0_vals, theta1_vals)

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    # meshgrid puts theta_1 on the first axis, so transpose J_vals to match
    ax.plot_surface(theta_x, theta_y, J_vals.T)

    ax.set_xlabel(r'$\theta_0$')
    ax.set_ylabel(r'$\theta_1$')
    plt.show()


plotData.plot_J_history(X, y)
The resulting image looks something like this:

(3-D surface plot of $J(\theta)$ over $\theta_0$ and $\theta_1$: a bowl-shaped surface with a single minimum.)

And the lowest point on this surface is the point that we need to find by gradient descent.
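If the 3-D surface is hard to read, a contour view of the same cost makes the minimum easier to spot. This is only a sketch: it rebuilds the same grid used in plot_J_history and assumes X, y, compute_cost, np and plt are already in scope.

theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
J_vals = np.array([[compute_cost(X, y, np.array([t0, t1]))
                    for t1 in theta1_vals] for t0 in theta0_vals])

theta_x, theta_y = np.meshgrid(theta0_vals, theta1_vals)
# Logarithmically spaced levels make the bowl shape visible near the minimum
plt.contour(theta_x, theta_y, J_vals.T, levels=np.logspace(-2, 3, 20))
plt.xlabel(r'$\theta_0$')
plt.ylabel(r'$\theta_1$')
plt.show()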

Gradient descent

The differential form of the gradient descent update above can be transformed into a vectorized numerical expression:

$$\theta := \theta - \alpha \cdot \frac{1}{m} X^{T}\left(X\theta - y\right)$$

In Python, it can be expressed as:

def gradient_descent(X, y, theta, alpha, num_iters):
    m = y.size
    J_history = np.zeros(num_iters)

    for i in range(num_iters):
        error = X.dot(theta) - y                   # h_theta(x) - y for every sample
        delta = error.dot(X)                       # X^T (X theta - y)
        theta = theta - alpha * (1 / m) * delta    # one gradient step
        J_history[i] = compute_cost(X, y, theta)   # record the cost at each step

    return theta, J_history
We can call this function to solve for $\theta$

theta = np.zeros((2,))
iterations = 1500
alpha = 0.01

theta, J_history = gradient_descent(X, y, theta, alpha, iterations)
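It is worth checking that the cost actually went down during training. A minimal sketch, assuming matplotlib is imported as plt:

# J_history should decrease monotonically and flatten out as theta converges
plt.plot(np.arange(iterations), J_history)
plt.xlabel('Iteration')
plt.ylabel(r'$J(\theta)$')
plt.show()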
Substituting the solved $\theta$ back into $h_\theta(x)$ gives the fitted relationship between practice time and grade.
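A minimal sketch of drawing that fitted line (assuming X still carries the leading column of ones added earlier):

# Scatter the raw data and overlay h_theta(x) = theta_0 + theta_1 * x
plt.scatter(X[:, 1], y, marker='x', label='training data')
plt.plot(X[:, 1], X.dot(theta), label='linear fit')
plt.xlabel('Practice time (h/w)')
plt.ylabel('Grade')
plt.legend()
plt.show()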

Finally

This may look like a lot of work, but with sklearn it only takes a few lines to do linear regression:

from sklearn.linear_model import LinearRegression

...

regressor = LinearRegression()
regressor = regressor.fit(X_train, Y_train)
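The fitted parameters play the same role as $\theta_0$ and $\theta_1$ above; for example (a sketch, where X_test is assumed to be a held-out set):

# Fitted intercept and slope, i.e. theta_0 and theta_1
print(regressor.intercept_, regressor.coef_)

# Predictions for new practice times
y_pred = regressor.predict(X_test)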
That’s it, Emmm…