Preface


Linear regression is a fairly basic part of machine learning. So how do you do it in Python? First we do this bit, then we grab that bit, then we chop a little, and finally everything falls into place. So that's our linear regression, isn't that easy? (Serious face)

If you didn't understand the above, that's fine; it was just me typing at random. So what is this article actually about? In short: doing linear regression with matrices.


The basics and implementation of linear regression



Although I intended to explain the basic concepts of linear regression and a plain single-variable implementation here, my ability is limited; after several drafts I found I could not describe it in enough detail. If you have some machine learning background and are familiar with this part, just read on. If you are not familiar with linear regression and want to see a basic implementation, here is the mysterious address:

Linear regression understanding (with pure Python implementation)

That author explains the principles of linear regression in much more detail and also provides a basic implementation. While doing my own implementation, I borrowed ideas from many other people's work as well.

What this article does is replace the less efficient loop-based computation with matrix operations, and implement linear regression with multiple parameters.


Why use matrices



If you look at the implementation of linear regression in the link above, you will see that a for loop is used to iterate over each sample to update the parameters. On that basis it is not difficult to implement multi-parameter linear regression: simply update several parameters in each iteration. However, as the number of samples grows, the cost of doing this grows too; with 500 samples you need 500 passes of the loop, which is very inefficient. Frameworks such as TensorFlow are built around vectorized matrix operations rather than per-sample loops like this. With matrices, we can train on many samples, or even all of them, at once.
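To see the difference, here is a minimal sketch (with made-up data and variable names, not the code from the linked article) that computes the same predictions once with a per-sample loop and once with a single matrix operation:

[python]
import numpy as np

# made-up data: 500 samples with 3 features each
X = np.random.normal(0, 1, size=(500, 3))
w = np.ones((1, 3))   # one weight per feature
b = 1.0               # bias

# loop version: handle one sample per iteration
preds_loop = np.zeros((X.shape[0], 1))
for i in range(X.shape[0]):
    preds_loop[i, 0] = np.dot(w[0], X[i]) + b

# matrix version: all 500 predictions in one operation
preds_matrix = np.dot(X, w.T) + b

print(np.allclose(preds_loop, preds_matrix))  # True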


Matrix operations



So how do matrices let us evaluate multiple samples at once?

First let’s look at the matrix operations that are involved.



Matrix addition and subtraction

In short, adding or subtracting two matrices simply means adding or subtracting the elements at the same positions. Matrix addition and subtraction are relatively simple, so I won't dwell on them.
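A quick NumPy illustration (the numbers are arbitrary):

[python]
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

print(A + B)  # [[11 22] [33 44]] -- elements at the same positions are added
print(A - B)  # [[-9 -18] [-27 -36]]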

Matrix multiplication



Here's a simple example. Take A with three rows and one column and B with one row and three columns:

        [a1]
    A = [a2]      B = [b1  b2  b3]
        [a3]

            [a1*b1  a1*b2  a1*b3]
    A × B = [a2*b1  a2*b2  a2*b3]
            [a3*b1  a3*b2  a3*b3]

The product has as many rows as the first matrix and as many columns as the second, so here A × B is a matrix with three rows and three columns. Notice that if A and B were swapped, the result would be completely different: a matrix with one row and one column. In other words, matrix multiplication has no commutative law. If the positions of A and B were switched, the result would look like this:

    B × A = [b1*a1 + b2*a2 + b3*a3]


Why is that? First of all, matrix multiplication has a condition: the number of columns of the first matrix must equal the number of rows of the second matrix. In this example A is a 3×1 matrix and B is a 1×3 matrix, so for the product AB, A has 1 column and B has 1 row, and the multiplication is allowed. The result then has the same number of rows as the first matrix and the same number of columns as the second: A has 3 rows and B has 3 columns, so AB is a 3×3 matrix. When you multiply BA instead, B has 1 row and A has 1 column, so BA is a 1×1 matrix.
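A quick NumPy check of the shapes described above (the values are arbitrary):

[python]
import numpy as np

A = np.array([[1],
              [2],
              [3]])          # 3 rows, 1 column
B = np.array([[4, 5, 6]])    # 1 row, 3 columns

print(np.dot(A, B).shape)    # (3, 3)
print(np.dot(B, A))          # [[32]] -- a 1x1 matrix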

Here’s a slightly more complicated operation:




These are the basics of linear algebra, and if you have already taken a linear algebra course, they shouldn't be difficult. I know my explanation here is brief and hasty, so if you still haven't figured out how to multiply matrices, here is Mr. Ng's mysterious address:

Matrix vector multiplication

Teacher Ng's lectures are easy to understand. It's also a good idea to learn linear algebra on your own, since there are many linear algebra courses available online.



Applying matrices



Suppose our multivariate linear equation has features x0 through xn. Then it looks like this: y = w0x0 + w1x1 + … + wnxn + b, where each w is the coefficient of the corresponding feature x (these are what we ultimately want to find) and b is the constant term, which we can also call the bias. So how do we express something like this with matrices?

Suppose X = [x0, x1, …, xn] and W = [w0, w1, …, wn]. Then XWᵀ + b = [w0x0 + w1x1 + … + wnxn + b].

If we have m groups of samples, then:

XWᵀ + b =

    [w0*x(1,0) + w1*x(1,1) + … + wn*x(1,n) + b]
    [w0*x(2,0) + w1*x(2,1) + … + wn*x(2,n) + b]
    [                     …                   ]
    [w0*x(m,0) + w1*x(m,1) + … + wn*x(m,n) + b]

where X is now a matrix with m rows (one per sample) and n+1 columns (one per feature), and x(i,j) is feature j of sample i.
So we can process multiple groups of samples at once and get all of their predicted values.
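As a tiny worked example (with made-up numbers), take two samples with two features each, weights W = [2, 3] and bias b = 1:

    X = [1  2]
        [4  5]

    XWᵀ + b = [2*1 + 3*2 + 1] = [ 9]
              [2*4 + 3*5 + 1]   [24]

Each row of the result is the prediction for one sample.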

Initialization of W and B

[python]

Weight = np.ones(shape=(1, data_x.shape[1]))   # one weight per feature (per column of data_x)
baise = np.array([[1]])                        # the bias term b, initialized to 1



data_x holds the input samples and data_y holds the correct results for those samples. data_x has the same form as the matrix X above: each row is a sample and each column is a feature.

We want to multiply data_x by the transpose of W, so W must have the same number of columns as data_x. In other words, the number of columns of data_x is the number of features, and W holds the coefficients (weights) of those features, so it needs one entry per feature. data_x.shape[1] is the number of columns of data_x, and data_x.shape[0] is the number of rows.
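A small sketch of the shapes involved, using made-up data with 5 samples and 3 features:

[python]
import numpy as np

data_x = np.random.normal(0, 10, size=(5, 3))    # rows = samples, columns = features
Weight = np.ones(shape=(1, data_x.shape[1]))     # shape (1, 3): one weight per feature
baise = np.array([[1]])                          # bias term

print(Weight.shape)                              # (1, 3)
print((np.dot(data_x, Weight.T) + baise).shape)  # (5, 1): one prediction per sample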


Get the predicted value

The predicted value is w0x0 + w1x1 + … + wnxn + b

[python]

WXPlusB = np.dot(data_x, Weight.T) + baise   # predicted values for all samples at once



np.dot(A, B) computes the matrix product of A and B. Thus we get WXPlusB, an m×1 column vector containing one predicted value per sample, in the same form as XWᵀ + b above.




Calculate the cost function and update W and B

[python]

loss = np.dot((data_y - WXPlusB).T, data_y - WXPlusB) / data_y.shape[0]   # cost function (mean squared error)
w_gradient = -(2 / data_x.shape[0]) * np.dot((data_y - WXPlusB).T, data_x)   # gradient of the loss w.r.t. the weights
baise_gradient = -2 * np.dot((data_y - WXPlusB).T, np.ones(shape=[data_x.shape[0], 1])) / data_x.shape[0]   # gradient w.r.t. the bias
Weight = Weight - learningRate * w_gradient   # gradient descent update of the weights
baise = baise - learningRate * baise_gradient   # gradient descent update of the bias



The principle here is essentially the same as in ordinary linear regression, so I won't elaborate.

Although I would really like to walk through this step in detail, and some readers may not follow it, I'm hungry right now, so I'll leave it at that and go eat. Think about how it works for a moment; it isn't hard to understand.
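For anyone who does want the missing step, here is a sketch of where the gradient formulas in the code come from (m is the number of samples, and ŷ_i is the predicted value WXPlusB for sample i):

    loss       = (1/m) * Σ (y_i - ŷ_i)²,   where ŷ_i = w0*x_i0 + w1*x_i1 + … + wn*x_in + b
    ∂loss/∂w_j = -(2/m) * Σ (y_i - ŷ_i) * x_ij
    ∂loss/∂b   = -(2/m) * Σ (y_i - ŷ_i)

Written with matrices, these partial derivatives are -(2/m) * (y - ŷ)ᵀX and -(2/m) * (y - ŷ)ᵀ·1 (a column of ones), which is exactly what w_gradient and baise_gradient compute above; gradient descent then moves Weight and baise a small step against the gradient.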

One thing to note

[python]

np.dot((data_y - WXPlusB).T, data_x)



This single matrix product already sums over all the samples: for each feature j it accumulates (y_i - ŷ_i) * x_ij across every sample i.
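A minimal check that the matrix product really does accumulate over the samples (made-up shapes: 4 samples, 2 features):

[python]
import numpy as np

data_x = np.random.normal(0, 1, size=(4, 2))
data_y = np.random.normal(0, 1, size=(4, 1))
WXPlusB = np.random.normal(0, 1, size=(4, 1))

matrix_sum = np.dot((data_y - WXPlusB).T, data_x)   # shape (1, 2)

# explicit per-sample accumulation gives the same numbers
loop_sum = np.zeros((1, 2))
for i in range(4):
    loop_sum += (data_y[i, 0] - WXPlusB[i, 0]) * data_x[i]

print(np.allclose(matrix_sum, loop_sum))  # True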

The overall code



[python]

import numpy as np

# learningRate: the learning rate, Loopnum: the number of iterations
def liner_Regression(data_x, data_y, learningRate, Loopnum):
    Weight = np.ones(shape=(1, data_x.shape[1]))
    baise = np.array([[1]])
    for num in range(Loopnum):
        WXPlusB = np.dot(data_x, Weight.T) + baise
        loss = np.dot((data_y - WXPlusB).T, data_y - WXPlusB) / data_y.shape[0]
        w_gradient = -(2 / data_x.shape[0]) * np.dot((data_y - WXPlusB).T, data_x)
        baise_gradient = -2 * np.dot((data_y - WXPlusB).T, np.ones(shape=[data_x.shape[0], 1])) / data_x.shape[0]
        Weight = Weight - learningRate * w_gradient
        baise = baise - learningRate * baise_gradient
        if num % 50 == 0:
            print(loss)   # print the loss every 50 iterations
    return (Weight, baise)



At this point, multivariate linear regression has been implemented. But I also want to say something about validation.


About validation



When I first tested it with a set of data, I ran into a few problems.

[python]

data_x = np.random.normal(0, 10, [5, 3])    # 5 samples, 3 features
Weights = np.array([[3, 4, 6]])             # the true weights
data_y = np.dot(data_x, Weights.T) + 5      # the true constant term is 5
res = liner_Regression(data_x, data_y, learningRate=0.01, Loopnum=5000)
print(res[0], res[1])



Five samples of x were randomly generated, with true weights of 3, 4 and 6 and a constant of 5, so y = 3*x0 + 4*x1 + 6*x2 + 5. I therefore expect the output Weight to be [3, 4, 6] and baise to be 5. The learning rate was 0.01 with 5000 iterations, but the results were broken.




The value of the loss did not decrease and even increased. I realized that the learning rate was set too high: if it is too large, the updates oscillate and training cannot converge.

So I set learningRate to 0.001 instead.




The result is much better: the loss keeps decreasing, and the weights obtained are 3.02, 4.11 and 5.98, which is much closer, but the loss is still not small enough. So I increased the number of iterations to 30,000.




This time the loss is very small, and Weight and baise match the settings almost exactly.

However, real data is not like the training data we used here; it contains errors and interference, so the numbers used for training cannot be this perfect. So I added noise.

[python]

data_x = np.random.normal(0, 10, [5, 3])
Weights = np.array([[3, 4, 6]])
noise = np.random.normal(0, 0.05, [5, 1])    # noise added to the targets
data_y = np.dot(data_x, Weights.T) + 5 + noise
res = liner_Regression(data_x, data_y, learningRate=0.001, Loopnum=30000)



Although there is some error, the values are still very close to the settings.


I hope you will try different samples or different learning rates yourself; it's fun.

