
Logistic regression

In the first part of this exercise, we will build a logistic regression model to predict whether a student will be admitted to college. Imagine that you are the administrator of a university department and want to determine each applicant's chance of admission based on their scores on two exams. You have a set of training samples from previous applicants that you can use to train logistic regression: for each training sample, you have the applicant's two exam scores and the final admission decision. To accomplish this prediction task, we will build a classification model that estimates the probability of admission from the two exam scores.

Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Import ex2data1 into data
path="ex2data1.txt"
data=pd.read_table(path,header=None,names=["Exam 1"."Exam 2"."Admitted"],sep=', ')
data.head()
Copy the code
.dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }

Exam 1 Exam 2 Admitted
0 34.623660 78.024693 0
1 30.286711 43.894998 0
2 35.847409 72.902198 0
3 60.182599 86.308552 1
4 79.032736 75.344376 1

Create a scatter plot of the Exam 1 and Exam 2 scores, using color to show whether each sample is positive (admitted) or negative (not admitted).

Extract admitted and unadmitted data
positive=data[data["Admitted"].isin([1]]The #isin function is used to extract the corresponding row
negative=data[data["Admitted"].isin([0]]# drawing
fig,ax=plt.subplots(figsize=(9.6))# Set graphics size
ax.scatter(positive["Exam 1"],positive["Exam 2"],s=50,c="blue",marker="o",label="Admitted")     # Set point coordinates, size, color, graphic, label
ax.scatter(negative["Exam 1"],negative["Exam 2"],s=50,c="red",marker="x",label="Not Admitted")
ax.legend(loc=1)# Legend in the upper right corner
ax.set_xlabel("Exam 1 Score")
ax.set_ylabel("Exam 2 Score")
plt.show()




It can be seen from the figure above that there is a fairly clear decision boundary between the two classes. Next, we implement logistic regression and train a model to predict the outcome.


The sigmoid function

$g$ denotes the sigmoid function: $g(z)=\frac{1}{1+e^{-z}}$

The logistic regression hypothesis is: $h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}$

Sigmoid: g(z)
def sigmoid(z):
    return 1/(1+np.exp(-z))
Verify the g(z) function by plotting it
fig,ax=plt.subplots(figsize=(8,6))
test=np.arange(-10,10,step=0.5)
ax.plot(test,sigmoid(test),c='red')
plt.show()



The sigmoid function g(z) checks out, so next we write the cost function to evaluate the fit.


$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}[-y^{(i)}\log(h_{\theta}(x^{(i)}))-(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))]$

ps:

np.multiply(): element-wise multiplication of arrays or matrices; the output has the same shape as the inputs

@: Performs matrix multiplication on matrices

*: element-wise multiplication for arrays; matrix multiplication for np.matrix objects
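A tiny demonstration of the difference between the three (an illustrative addition, not part of the original exercise code):

a=np.array([[1,2],[3,4]])
b=np.array([[10,20],[30,40]])
print(np.multiply(a,b))              # element-wise: [[10 40] [90 160]]
print(a@b)                           # matrix product: [[70 100] [150 220]]
print(a*b)                           # for arrays, element-wise, same as np.multiply
print(np.matrix(a)*np.matrix(b))     # for np.matrix, matrix product, same as @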

def cost(Theta,X,Y):  # X and Y are passed as DataFrames, Theta as an array
    # Convert X,Y from DataFrame to matrix and Theta from array to matrix
    Theta=np.matrix(Theta)
    X=np.matrix(X.values)
    Y=np.matrix(Y.values)
    first=np.multiply(-Y,np.log(sigmoid(X@Theta.T)))
    second=np.multiply(1-Y,np.log(1-sigmoid(X@Theta.T)))
    return np.sum(first-second)/len(X)
# Add the x0 column (x0 is always equal to 1)
data.insert(0,"Ones",1)
Extract data X,Y,Theta
cols=data.shape[1]
X=data.iloc[:,0:cols-1]
Y=data.iloc[:,cols-1:cols]
Theta=np.zeros(3)
Calculate the cost at the initial parameters (Theta = 0)
cost(Theta, X, Y)
0.6931471805599453

Next, design a function that computes the gradient for given training data, labels, and parameters theta:


Gradient descent

  • Batch gradient descent
  • Vectorized implementation (a vectorized sketch follows the gradient check below)


$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)}$

def gradient(Theta,X,Y):  # X and Y are passed as DataFrames, Theta as an array

    # Convert X,Y from DataFrame to matrix and Theta from array to matrix
    Theta=np.matrix(Theta)
    X=np.matrix(X.values)
    Y=np.matrix(Y.values)
    # grad records the partial derivative for each element of the θ vector
    Theta_cnt=Theta.shape[1]
    grad=np.zeros(Theta.shape[1])
    # Calculate the error vector
    error=sigmoid(X*Theta.T)-Y
    for i in range(Theta_cnt):
        tmp=np.multiply(error,X[:,i])
        grad[i]=np.sum(tmp)/len(X)
    return grad

The above function does not actually perform gradient descent; it only computes one gradient evaluation. Here is the gradient at the initial parameters (all zeros):

gradient(Theta, X, Y)
array([ -0.1       , -12.00921659, -11.26284221])
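The vectorization bullet above can be made concrete: the same gradient can be computed without the Python loop. A minimal sketch (an addition, not the implementation used for the results in this article):

def gradient_vectorized(Theta,X,Y):      # X,Y are DataFrames, Theta is a 1-D array
    Theta=np.matrix(Theta)
    X=np.matrix(X.values)
    Y=np.matrix(Y.values)
    error=sigmoid(X*Theta.T)-Y           # (m,1) error vector
    return np.array((X.T*error).T/len(X)).ravel()   # (n,) gradient

gradient_vectorized(Theta, X, Y)         # should match the loop version above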

Since we are using Python, we can use SciPy's optimize module to find the optimal parameters from the cost and gradient functions.

Most commonly used parameters:

  • func: the objective function to be optimized
  • x0: the initial values
  • fprime: the gradient function of func (if the cost function returns only the cost, set fprime=gradient); otherwise func must return both the function value and the gradient, or approx_grad must be set to True
  • approx_grad: if set to True, the gradient is approximated numerically
  • args: a tuple of extra arguments passed to the objective function

Returns:

  • x: array, the solution of the optimization problem
  • nfeval: integer, the number of function evaluations (a function evaluation is counted every time the objective function is called during optimization; one iteration can involve several function evaluations, so this is not the number of iterations and is usually larger than it)
  • rc: int, the return code (see the SciPy documentation for its meaning)
SciPy's truncated Newton (TNC) implementation can find the optimal parameters:
import scipy.optimize as opt 
result=opt.fmin_tnc(func=cost, x0=Theta,fprime=gradient,args=(X,Y))
result
(array([-25.16131863,   0.20623159,   0.20147149]), 36, 0)
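As the parameter list notes, if only the cost is available, approx_grad=True lets fmin_tnc approximate the gradient numerically instead of using fprime. A quick sketch (slower, and an addition to the original code):

result_approx=opt.fmin_tnc(func=cost, x0=Theta, approx_grad=True, args=(X,Y))
result_approx    # should converge to roughly the same θ as above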

The cost function evaluated at this solution:

cost(result[0],X,Y)
0.20349770158947458

Draw the decision boundary

plot_x=np.linspace(30,100,100)
plot_y=(-result[0][0]-result[0][1]*plot_x)/result[0][2]
# drawing
fig,ax=plt.subplots(figsize=(9,6))  # set figure size
ax.plot(plot_x,plot_y,c='y',label='Prediction')
ax.scatter(positive["Exam 1"],positive["Exam 2"],s=50,c="blue",marker="o",label="Admitted")     # Set point coordinates, size, color, graphic, label
ax.scatter(negative["Exam 1"],negative["Exam 2"],s=50,c="red",marker="x",label="Not Admitted")
ax.legend(loc=1)# Legend in the upper right corner
ax.set_xlabel("Exam 1 Score")
ax.set_ylabel("Exam 2 Score")
plt.show()



The fit can also be done with opt.minimize, which lets you choose among different optimization algorithms.

Most commonly used parameters:

  • fun: the objective function to be optimized
  • x0: the initial values, a one-dimensional array of shape (n,)
  • args: tuple, optional, extra arguments passed to the objective function
  • method: the solver to use; choosing TNC gives behavior similar to fmin_tnc()
  • jac: a function that returns the gradient vector

Returns:

  • An optimization result object (OptimizeResult), with fields including:
  • x: the solution array of the optimization problem
  • success: whether the optimizer exited successfully; if not, a failure message is reported
result=opt.minimize(fun=cost, x0=Theta,args=(X,Y),method="TNC",jac=gradient)
result
     fun: 0.20349770158947458
     jac: array([8.95090947e-09, 8.17143290e-08, 4.76542717e-07])
 message: 'Local minimum reached (|pg| ~= 0)'
    nfev: 36
     nit: 17
  status: 0
 success: True
       x: array([-25.16131863,   0.20623159,   0.20147149])

The cost function evaluated at this solution:

cost(result["x"],X,Y)
0.20349770158947458

After obtaining the parameter θ, the model is used to predict whether a student will be admitted.

Next, write a function that outputs predictions for a dataset X using the learned parameters θ; we can then use it to measure the classifier's accuracy on the training set. Recall the logistic regression hypothesis: $h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}$

When $h_{\theta}(x)\geq 0.5$, predict $y=1$;

When $h_{\theta}(x)<0.5$, predict $y=0$.

Let’s start building the prediction function predict

def predict(Theta,X):
    p=sigmoid(X@Theta.T)
    return [1 if x>=0.5 else 0 for x in p]

result=opt.fmin_tnc(func=cost, x0=Theta,fprime=gradient,args=(X,Y))
Y=np.matrix(Y.values)
# Note: X and Y start out as DataFrames and need to be converted to matrices or arrays where required

# result[0] is the learned θ
Theta_min = result[0]  # if opt.minimize had been used, result[0] would not be the θ array, so fmin_tnc is executed again above to get the required result[0]
predictions = predict(Theta_min, X)

correct = [1 if a==b else 0 for (a, b) in zip(predictions,Y)]
accuracy = float((sum(correct) / len(correct))*100)
print ("accuracy = {:.2f}%".format(accuracy))
accuracy = 89.00%

The logistic regression classifier predicts whether a student is admitted with 89 percent accuracy on the training set. Keep in mind that we did not hold out a test set or use cross-validation, so this number may be higher than the true generalization accuracy.
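As an extra sanity check (an addition to the original code), we can plug a hypothetical student straight into the hypothesis; with the θ found above, exam scores of 45 and 85 should give an admission probability of roughly 0.78:

prob=sigmoid(np.array([1,45,85])@result[0])    # x0=1, Exam 1 score 45, Exam 2 score 85
print("admission probability = {:.3f}".format(prob))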

Regularized logistic regression

In the second part of the exercise, we will improve the logistic regression algorithm by adding a regularization term. Regularization is a term in the cost function that makes the algorithm prefer "simpler" models (in this case, models with smaller coefficients). This helps reduce overfitting and improves the model's ability to generalize.

Suppose you are a product manager at a factory and you have test results for some microchips in two different tests. From these two tests, you want to determine whether the microchip should be accepted or rejected. To help you decide, you have a data set of past microchip test results from which you can build a logistic regression model.

First, data extraction:

# Data extraction
path="ex2data2.txt"
data2=pd.read_table(path,sep=",",header=None,names=["Test 1","Test 2","Accepted"])
data2.head()

Test 1 Test 2 Accepted
0 0.051267 0.69956 1
1 0.092742 0.68494 1
2 0.213710 0.69225 1
3 0.375000 0.50219 1
4 0.513250 0.46564 1

Plot the data points

Extract the accepted (Accepted=1) and rejected (Accepted=0) samples
positive=data2[data2["Accepted"].isin([1])]
negative=data2[data2["Accepted"].isin([0])]
# plot Accepted and Rejected
fig,ax=plt.subplots(figsize=(8,6))
ax.scatter(positive["Test 1"],positive["Test 2"],s=20,color="blue",marker="o",label="Accepted")
ax.scatter(negative["Test 1"],negative["Test 2"],s=20,color="red",marker="x",label="Rejected")
ax.legend(loc=1)
ax.set_xlabel("Test 1 Score")
ax.set_ylabel("Test 2 Score")
plt.show()



This data seems too complex to be properly separated by a straight line.

However, a linear technique such as logistic regression can still be used if we construct new features from polynomials of the original features.

Following the exercise PDF, we map the features onto all polynomial terms of x1 and x2 up to the sixth power:


$mapFeature(x)= \begin{bmatrix} 1 \\ x_{1}\\ x_{2}\\ x_{1}^{2}\\ x_2^{2}\\ x_{1}x_{2} \\ x_1^3\\ \vdots \\x_1x_2^5\\ x_2^6 \end{bmatrix}$

# create the polynomial features

# set the highest power to 6
degree=6

# extract the vectors x1,x2
x1=data2["Test 1"]
x2=data2["Test 2"]

# insert a column of ones into data2
data2.insert(3,"Ones",1)

for i in range(1,degree+1):
    for j in range(0,i+1):
        data2['F'+str(i)+str(j)]=np.power(x1,i-j)*np.power(x2,j)

# drop the original Test 1 and Test 2 columns
data2.drop("Test 1",axis=1,inplace=True)  # axis=1 drops a column; inplace defaults to False (original unchanged), True modifies data2 in place
data2.drop("Test 2",axis=1,inplace=True)

data2.head()

Accepted Ones F10 F11 F20 F21 F22 F30 F31 F32 ... F53 F54 F55 F60 F61 F62 F63 F64 F65 F66
0 1 1 0.051267 0.69956 0.002628 0.035864 0.489384 0.000135 0.001839 0.025089 ... 0.000900 0.012278 0.167542 1.815630e-08 2.477505e-07 0.000003 0.000046 0.000629 0.008589 0.117206
1 1 1 0.092742 0.68494 0.008601 0.063523 0.469143 0.000798 0.005891 0.043509 ... 0.002764 0.020412 0.150752 6.362953e-07 4.699318e-06 0.000035 0.000256 0.001893 0.013981 0.103256
2 1 1 0.213710 0.69225 0.045672 0.147941 0.479210 0.009761 0.031616 0.102412 ... 0.015151 0.049077 0.158970 9.526844e-05 3.085938e-04 0.001000 0.003238 0.010488 0.033973 0.110047
3 1 1 0.375000 0.50219 0.140625 0.188321 0.252195 0.052734 0.070620 0.094573 ... 0.017810 0.023851 0.031940 2.780914e-03 3.724126e-03 0.004987 0.006679 0.008944 0.011978 0.016040
4 1 1 0.513250 0.46564 0.263426 0.238990 0.216821 0.135203 0.122661 0.111283 ... 0.026596 0.024128 0.021890 1.827990e-02 1.658422e-02 0.015046 0.013650 0.012384 0.011235 0.010193

5 rows × 29 columns
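For comparison (not part of the original exercise), scikit-learn can build the same kind of polynomial expansion with sklearn.preprocessing.PolynomialFeatures, although its column ordering differs from the manual loop above; a sketch:

from sklearn.preprocessing import PolynomialFeatures

poly=PolynomialFeatures(degree=6)          # includes the bias column of ones
X_poly=poly.fit_transform(np.c_[x1,x2])    # 28 columns: 1, x1, x2, x1^2, x1*x2, x2^2, ...
print(X_poly.shape)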

Next, modify the original cost function and gradient function to include regularization.


Regularized cost function

$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}[-y^{(i)}\log(h_{\theta}(x^{(i)}))-(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

def costReg(Theta,X,Y,LearningRate):  # X, Y and Theta are passed as arrays
    # Convert to matrices
    Theta=np.matrix(Theta)
    X=np.matrix(X)
    Y=np.matrix(Y)
    first=np.multiply(-Y,np.log(sigmoid(X@Theta.T)))
    second=np.multiply(1-Y,np.log(1-sigmoid(X@Theta.T)))
    # regularization term (θ0 is not penalized):
    reg=(LearningRate/(2*len(X)))*np.sum(np.power(Theta[:,1:Theta.shape[1]],2))
    return np.sum(first-second)/len(X)+reg

If gradient descent is used to minimize the cost function, then since $\theta_0$ is not regularized, the update rule splits into two cases:


  • $\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}[h_{\theta}(x^{(i)})-y^{(i)}]x_0^{(i)}$

  • $\theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}[h_{\theta}(x^{(i)})-y^{(i)}]x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]=\theta_j(1-\alpha\frac{\lambda}{m})-\alpha\frac{1}{m}\sum_{i=1}^{m}[h_{\theta}(x^{(i)})-y^{(i)}]x_j^{(i)}$   (the expression in brackets after $\alpha$ is the partial derivative that gradientReg computes)
def gradientReg(Theta,X,Y,LearningRate):  # X, Y and Theta are passed as arrays
    # Convert X,Y,Theta from arrays to matrices
    Theta=np.matrix(Theta)
    X=np.matrix(X)
    Y=np.matrix(Y)

    # grad records the partial derivative for each element of the θ vector
    Theta_cnt=Theta.shape[1]
    grad=np.zeros(Theta.shape[1])

    error=sigmoid(X*Theta.T)-Y
    for i in range(Theta_cnt):
        tmp=np.multiply(error,X[:,i])
        grad[i]=np.sum(tmp)/len(X)
    reg=(LearningRate/len(X))*Theta
    reg[0]=0  # no regularization, no penalty for the 0th term
    return grad+reg

    # I wonder why the accuracy of the version below is only 83.05% while the version above gives 84.75%
    # Calculate the error vector
    # error=sigmoid(X*Theta.T)-Y
    # for i in range(Theta_cnt):
    #     tmp=np.multiply(error,X[:,i])
    #     if i==0:
    #         grad[i]=np.sum(tmp)/len(X)
    #     else:
    #         grad[i]=np.sum(tmp)/len(X)+(LearningRate/len(X))*Theta[:,i]
    # return grad
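A note on the question in the comment above: because Theta was converted to a 1×n np.matrix, reg is also a 1×n matrix, so reg[0]=0 zeroes the entire first row (that is, every entry), and the returned gradient carries no regularization term at all. The commented-out version does penalize every θj except θ0, which is why the two give different accuracies (84.75% vs 83.05%). For reference, a vectorized sketch that exempts only θ0 (an addition, not the code used for the results below):

def gradientReg_vec(Theta,X,Y,LearningRate):     # hypothetical alternative; arrays in, 1-D array out
    Theta=np.matrix(Theta)
    X=np.matrix(X)
    Y=np.matrix(Y)
    error=sigmoid(X*Theta.T)-Y                   # (m,1) error vector
    grad=np.array((X.T*error).T/len(X)).ravel()  # unregularized part
    reg=(LearningRate/len(X))*np.array(Theta).ravel()   # (λ/m)·θ
    reg[0]=0                                     # reg is 1-D here, so only θ0's penalty is zeroed
    return grad+reg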

Initialize a variable:

cols=data2.shape[1]
# data2 column order: Accepted, Ones (x0), F10, F11, ...
Y2=data2.iloc[:,0:1]
X2=data2.iloc[:,1:cols]

# convert the DataFrames to arrays
X2=np.array(X2.values)
Y2=np.array(Y2.values)
Theta2=np.zeros(X2.shape[1])

Set LearningRate (which plays the role of the regularization parameter λ here) to a reasonable initial value

LearningRate=1

Call the new regularized functions with θ initialized to all zeros to make sure they compute correctly.

costReg(Theta2, X2, Y2, LearningRate)
0.6931471805599454
gradientReg(Theta2, X2, Y2, LearningRate)
matrix([[8.47457627e-03, 1.87880932e-02, 7.77711864e-05, 5.03446395e-02, 1.15013308e-02, 3.76648474e-02, 1.83559872e-02, 7.32393391e-03, 8.19244468e-03, 2.34764889e-02, 3.93486234e-02, 2.23923907e-03, 1.28600503e-02, 3.09593720e-03, 3.93028171e-02, 1.99707467e-02, 4.32983232e-03, 3.38643902e-03, 5.83822078e-03, 4.47629067e-03, 3.10079849e-02, 3.10312442e-02, 1.09740238e-03, 6.31570797e-03, 4.08503006e-04, 7.26504316e-03, 1.37646175e-03, 3.87936363e-02]])

Use the same optimization function as in Part 1 to calculate the optimized result:

result2=opt.fmin_tnc(func=costReg, x0=Theta2,fprime=gradientReg,args=(X2,Y2,LearningRate))
result2
(array([1.60695456, 1.1560186, 1.96230284, -3.0506508, -1.65702971, -1.91905201, 0.57020964, -0.68153388, -0.71446988, 0.04581342, -2.05403849, -0.19543701, -1.06002879, -0.50146813, -1.49394535, 0.08870346, -0.37553871, -0.1621286, -0.47670397, -0.49928213, -0.25753424, -1.25322562, 0.00804809, -0.51945916, -0.03978315, -0.54273819, 0.21843762, 0.93050987]), 86, 4)

Finally, the prediction function in Part 1 was used to check the accuracy of the scheme on the training data:

#result2[0] is the learned θ
Theta_min = result2[0]
predictions = predict(Theta_min, X2)
correct = [1 if a==b else 0 for (a, b) in zip(predictions,Y2)]
accuracy = float((sum(correct) / len(correct))*100)
print ("accuracy = {:.2f}%".format(accuracy))
accuracy = 84.75%

You can also use the advanced Python library Scikit-learn to solve this problem:

from sklearn import linear_model  # use sklearn's linear model package
model = linear_model.LogisticRegression(penalty='l2', C=1.0)  # C: regularization coefficient, float, default 1.0; it is the inverse of the regularization strength and must be a positive float - the smaller it is, the stronger the regularization
model.fit(X2, Y2.ravel())
LogisticRegression()
model.score(X2, Y2)
0.8305084745762712

The accuracy here is not ideal; perhaps the parameters need adjusting. The highest degree used when creating the polynomial features also affects the result, and I suspect a problem in my feature-mapping code (x2 appearing in terms where x1 should stand alone) contributes to the low accuracy.
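Since C is the inverse of the regularization strength, one easy adjustment is to sweep it and compare training scores; a quick sketch (an addition, scores depend on the data and the solver):

# hypothetical check: smaller C means stronger regularization, larger C means weaker
for c in [0.01, 1.0, 100.0]:
    m=linear_model.LogisticRegression(penalty='l2', C=c, max_iter=1000)
    m.fit(X2, Y2.ravel())
    print("C = {}: training score = {:.4f}".format(c, m.score(X2, Y2.ravel())))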

Draw the decision boundary

def hfun2(theta,x1,x2,degree):
    temp=theta[0][0]
    place=0
    for i in range(1,degree+1):
        for j in range(0,i+1):
            temp+=np.power(x1,i-j)*np.power(x2,j)*theta[0][place+1]
            place+=1
    return temp
def find_decision_boundary(theta,degree):
    t1 = np.linspace(-1,1.5,1000)
    t2 = np.linspace(-1,1.5,1000)
    cord=[(x,y) for x in t1 for y in t2]
    # print(cord)
    x_cord,y_cord=zip(*cord)
    h_val=pd.DataFrame({'x1':x_cord,'x2':y_cord})
    h_val['hval']=hfun2(theta, h_val['x1'], h_val['x2'], degree)
    decision=h_val[np.abs(h_val['hval'])<2*10**-3]
    return decision.x1,decision.x2
fig,ax=plt.subplots(figsize=(8,6))
x,y=find_decision_boundary(result2,6)
ax.scatter(x, y, c='y', s=10, label='Prediction')
ax.scatter(positive["Test 1"],positive["Test 2"],s=20,color="blue",marker="o",label="Accepted")
ax.scatter(negative["Test 1"],negative["Test 2"],s=20,color="red",marker="x",label="Rejected")
ax.set_xlabel("Test 1 Score")
ax.set_ylabel("Test 2 Score")
ax.legend(loc=1)

plt.show()



Change $\lambda$ and observe the decision boundary

$\lambda=0$ (overfitting)

LearningRate=0
result3=opt.fmin_tnc(func=costReg, x0=Theta2,fprime=gradientReg,args=(X2,Y2,LearningRate))
result3
(array([9.11192364e+00, 1.18840465e+01, 6.30828094e+00, -8.397064e+01, -4.48639810e+01, -3.81221435e+01, -9.42525756e+01, -8.14257602e+01, -4.22413355e+01, -3.52968361e+00, 2.95734207e+02, 2.51308760e+02, 3.64155830e+02, 1.61036970e+02, 1.70100234e+01, 1.71716716e+02, 2.72109672e+02, 3.12447535e+02, 1.41764016e+02, 3.22495698e+01, -1.75836912e-01, -3.58663811e+02, -4.82161916e+02, -7.49974915e+02, -5.03764307e+02, -4.80978435e+02, -1.85566236e+02, 3.83936243e+01]), 280, 3)
fig,ax=plt.subplots(figsize=(8,6))
x,y=find_decision_boundary(result3,6)
ax.scatter(x, y, c='y', s=10, label='Prediction')
ax.scatter(positive["Test 1"],positive["Test 2"],s=20,color="blue",marker="o",label="Accepted")
ax.scatter(negative["Test 1"],negative["Test 2"],s=20,color="red",marker="x",label="Rejected")
ax.set_xlabel("Test 1 Score")
ax.set_ylabel("Test 2 Score")
ax.legend(loc=1)

plt.show()



$\lambda=100$ (underfitting)

LearningRate=100
result4=opt.fmin_tnc(func=costReg, x0=Theta2,fprime=gradientReg,args=(X2,Y2,LearningRate))
result4
(array([0.05021733, 0.03612558, 0.06132196, -0.09533284, -0.05178218, -0.05997038, 0.01781905, -0.02129793, -0.02232718, 0.00143167, -0.0641887, -0.00610741, -0.0331259, -0.01567088, -0.04668579, 0.00277198, -0.01173558, -0.00506652, -0.014897, -0.01560257, -0.00804795, -0.0391633, 0.0002515, -0.0162331, -0.00124322, -0.01696057, 0.00682618, 0.02907843]), 93, 4)
fig,ax=plt.subplots(figsize=(8,6))
x,y=find_decision_boundary(result4,6)
ax.scatter(x, y, c='y', s=10, label='Prediction')
ax.scatter(positive["Test 1"],positive["Test 2"],s=20,color="blue",marker="o",label="Accepted")
ax.scatter(negative["Test 1"],negative["Test 2"],s=20,color="red",marker="x",label="Rejected")
ax.set_xlabel("Test 1 Score")
ax.set_ylabel("Test 2 Score")
ax.legend(loc=1)

plt.show()