Recently I came across an uploader on Bilibili who implemented machine learning in just 10 lines of code, and I was amazed. The video is linked here.

The following is a summary of my study.

Getting started

Will Xiaoqiang go to the movies?

Ruhua, Xiaoqian, Xiaoming and Xiaoqiang are close friends who often go to the movies together, but Xiaoqiang hasn't gone every time. Here is what happened on their first four outings (1 = went to the movie, 0 = didn't go):

Ruhua   Xiaoqian   Xiaoming   Xiaoqiang
  1        0          1          1
  1        1          0          1
  0        0          1          0
  0        1          0          0

For the fifth outing, Ruhua is not going, while Xiaoqian and Xiaoming are. Will Xiaoqiang go?

Ruhua   Xiaoqian   Xiaoming   Xiaoqiang
  1        0          1          1
  1        1          0          1
  0        0          1          0
  0        1          0          0
  0        1          1          ?

Analyzing the data above, a human brain easily spots that Xiaoqiang has a crush on Ruhua: when Ruhua goes, Xiaoqiang goes, and when Ruhua doesn't go, neither does he. So we conclude that Xiaoqiang won't go this time.

How can we turn this process of human thinking and analysis into something a computer can do?

The code

from numpy import array, exp, random, dot

X = array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y = array([[1, 1, 0, 0]]).T
random.seed(1)
weights = 2 * random.random((3, 1)) - 1
for _ in range(10000):
    output = 1 / (1 + exp(-dot(X, weights)))
    error = y - output
    delta = error * output * (1 - output)
    weights += dot(X.T, delta)

p = (1 / (1 + exp(-dot([[0, 1, 1]], weights))))[0][0]
print("Will Xiaoqiang go?", "Yes" if p > 0.5 else "No")

Not counting the print line, it's just 10 lines. If you rarely do scientific computing in Python, you may be a little confused, but don't worry: I'll explain each line of code below.

Importing the library

from numpy import array, exp, random, dot

NumPy is arguably the cornerstone of scientific computing in Python, and it is very easy to use. For our calculation we only need to import array, exp, random and dot:

  • array: creates a matrix (an n-dimensional array)
  • exp: the exponential function with the natural constant e as its base
  • random: generates random floats in the range [0, 1)
  • dot: matrix multiplication
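As a quick sketch of what each of these four does (the 2x2 identity matrix A and the shapes here are arbitrary examples, not from the video):

```python
from numpy import array, exp, random, dot

A = array([[1, 0], [0, 1]])   # array: build a 2x2 identity matrix
print(exp(0))                  # exp: e^0 = 1.0
random.seed(1)
r = random.random((2, 1))      # random: a 2x1 matrix of floats in [0, 1)
print(dot(A, r))               # dot: identity times r gives r back unchanged
```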

Generate the data

X = array([
[1,0,1],[1,1,0],[0,0,1],[0,1,0]
])
y = array([[1,1,0,0]]).T

Transposing (.T) turns the row vector into a column vector:

[[1]
 [1]
 [0]
 [0]]

Generating random weights

# Set the random seed so the same random numbers are generated on every run,
# which makes the code easier to debug.
random.seed(1)
# Generate a 3x1 column vector of weights in the range -1 to 1.
weights = 2 * random.random((3, 1)) - 1

Why do we need weights?

Take the first movie outing as an example: the inputs [1,0,1] correspond to the output [1], and we can connect them like this:

1*w1 + 0*w2 + 1*w3 = 1

w1, w2 and w3 are the weights.

If we can work out w1, w2 and w3, can't we just plug in the fifth outing ([0,1,1]) and find out whether Xiaoqiang goes to the movies?

0*w1 + 1*w2 + 1*w3 = Xiaoqiang?

How do you figure out the weights?

It's hard: weights fitted to the first row of data won't automatically satisfy the other three rows.

So we start with a random set of weights, plug in each row of data, measure the error, adjust the weights, measure again, and repeat until the error is as small as possible. This process is what we call machine learning.

Optimizing the weights

for _ in range(10000):
    # Squash the raw result into (0, 1) with the sigmoid function
    output = 1 / (1 + exp(-dot(X, weights)))
    # The error is the true value minus the computed value
    error = y - output
    # Calculate the increment
    delta = error * output * (1 - output)
    # Update the weights
    weights += dot(X.T, delta)

Repeating this 10,000 times makes the error smaller and smaller and finally yields near-optimal weights; plugging the fifth outing's data into those weights tells us whether Xiaoqiang goes to the movies.
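You can watch the error shrink by printing it during training. A small sketch (the 2,000-iteration print interval is an arbitrary choice of mine):

```python
from numpy import array, exp, random, dot, mean

X = array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y = array([[1, 1, 0, 0]]).T
random.seed(1)
weights = 2 * random.random((3, 1)) - 1
for i in range(10000):
    output = 1 / (1 + exp(-dot(X, weights)))
    error = y - output
    weights += dot(X.T, error * output * (1 - output))
    if i % 2000 == 0:
        print(i, mean(abs(error)))  # mean absolute error keeps shrinking
```

By the end of training the mean error is a tiny fraction of what it was at the start.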

Why the sigmoid function?

Because the raw result can range from negative infinity to positive infinity, the sigmoid function is used to squash it into the range 0 to 1, which makes classification easy: a value greater than 0.5 means going to the movies, and a value less than 0.5 means not going.
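A minimal illustration of this squashing behaviour (the inputs 10 and -10 are arbitrary large and small examples):

```python
from numpy import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

print(sigmoid(0))     # 0.5, right on the decision boundary
print(sigmoid(10))    # ~0.99995, classified as "will go"
print(sigmoid(-10))   # ~0.00005, classified as "won't go"
```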

How do you calculate the increment?

delta = error * output * (1 - output)

Breaking that one line into two makes it easier to understand:

# Calculate the slope, i.e. the derivative of the sigmoid at the current output
slope = output * (1 - output)

# Scale the error by the slope to get the weight update
delta = error * slope
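output*(1-output) really is the derivative of the sigmoid. A quick numerical check (the evaluation point 0.7 and the step h are arbitrary choices of mine):

```python
from numpy import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

x, h = 0.7, 1e-6                                       # h: a small finite-difference step
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central-difference estimate
analytic = sigmoid(x) * (1 - sigmoid(x))               # output * (1 - output)
print(abs(numeric - analytic))                         # agrees to ~8 decimal places

# The slope peaks at 0.25 (output = 0.5) and vanishes as output nears 0 or 1
print(sigmoid(0) * (1 - sigmoid(0)))   # 0.25
print(sigmoid(5) * (1 - sigmoid(5)))   # ~0.0066
```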

What’s the slope?

The sigmoid function maps the raw results onto a smooth curve between 0 and 1. As training drives the error down, the outputs are pushed ever closer to 0 or 1, and the closer an output gets to 0 or 1, the smaller the slope becomes.

Why multiply the error by the slope?

This is the idea behind gradient descent: the slope shrinks as we approach the optimum, so near the best point the update delta shrinks too, and we don't overshoot it.

Predicting the result

P = 1 / (1 + exp (- dot ([[0, 1]], the weights))) [0] [0] print (" jack Bauer go or not, ", "don't" if p > 0.5 else "to") / / = > not to goCopy the code

After 10,000 rounds of optimization, substituting the fifth outing [0,1,1] into the trained weights gives a p far below 0.5 and very close to 0, so Xiaoqiang won't go to the movies. Conversely, plugging in [1,0,0] (Ruhua going alone) gives p = 0.9999253713868242, almost 1, so in that case Xiaoqiang would go.
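Putting the pieces together, here is a sketch that trains the weights and then scores both the fifth outing ([0,1,1]) and the case where only Ruhua goes ([1,0,0]); the helper predict is my own naming, not from the video:

```python
from numpy import array, exp, random, dot

X = array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y = array([[1, 1, 0, 0]]).T
random.seed(1)
weights = 2 * random.random((3, 1)) - 1
for _ in range(10000):
    output = 1 / (1 + exp(-dot(X, weights)))
    weights += dot(X.T, (y - output) * output * (1 - output))

def predict(row):
    # Score a single outing: a value near 1 means Xiaoqiang goes, near 0 means he doesn't
    return (1 / (1 + exp(-dot([row], weights))))[0][0]

print(predict([0, 1, 1]))  # close to 0: Ruhua stays home, so Xiaoqiang stays home
print(predict([1, 0, 0]))  # close to 1: Ruhua goes, so Xiaoqiang goes
```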

Conclusion

That’s all 10 lines of code.

To keep the required background knowledge to a minimum, these 10 lines ignore issues such as local optima and convergence of the results, so the code is not rigorous, but it is enough to show the whole working mechanism of machine learning.

With these 10 lines of code, you can get a sense of how a machine mimics human learning — through trial and error, correction, and finally a correct solution.

Many thanks to the video's author for making something so obscure so easy to understand. I highly recommend watching it: "Plain Talk on Neural Networks: 10 Lines of Code Without Calling a Library, Hit Me If You Don't Get It!"