1. Introduction

  • The task is to classify the data sets in ex2data1.txt and ex2data2.txt with logistic regression. ex2data1.txt contains students' historical scores on two exams and whether they were admitted, while ex2data2.txt contains two test results for chip products and whether each chip passed. The goal is to use the algorithm to classify students and chips: to judge which students will be admitted and which chips are good.

  • The pieces to implement are the sigmoid function, the cost function, the gradient descent function, regularization, feature mapping, and drawing the decision boundary.

  • Import the required packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-dark-palette')  # use plt.style.available to see which styles are available
from sklearn.metrics import classification_report  # this package provides the classification evaluation report

2. Prepare data

data = pd.read_csv('ex2data1.txt', names=['exam1', 'exam2', 'admitted'])
data.head()  # look at the first five rows
exam1 exam2 admitted
0 34.623660 78.024693 0
1 30.286711 43.894998 0
2 35.847409 72.902198 0
3 60.182599 86.308552 1
4 79.032736 75.344376 1
  • Let’s look at the data
data.describe()
exam1 exam2 admitted
count 100.000000 100.000000 100.000000
mean 65.644274 66.221998 0.600000
std 19.458222 18.582783 0.492366
min 30.058822 30.603263 0.000000
25% 50.919511 48.179205 0.000000
50% 67.032988 67.682381 1.000000
75% 80.212529 79.360605 1.000000
max 99.827858 98.869436 1.000000
sns.set(context="notebook", style="darkgrid")

sns.lmplot('exam1', 'exam2', hue='admitted', data=data,  # the hue parameter colors the points by class
           height=6, 
           fit_reg=False,
           scatter_kws={"s": 30}  # set the size of the plotted points
          )
plt.show()  # take a look at the data

  • Helper functions for reading the data
def get_X(df):  # read the features
    # use pd.concat to add an x_0 column of ones to the feature matrix X (the independent variables)
    ones = pd.DataFrame({'ones': np.ones(len(df))})  # ones is an m-row, 1-column dataframe
    data = pd.concat([ones, df], axis=1)  # merge data by column
    return data.iloc[:, :-1].values  # return the feature matrix as an ndarray


def get_y(df):  # read the label
    # the last column is the target value, i.e. the dependent variable
    return np.array(df.iloc[:, -1])  # df.iloc[:, -1] is the last column of df


def normalize_feature(df):
    # keep the features on the same scale
    return df.apply(lambda column: (column - column.mean()) / column.std())  # feature scaling
X = get_X(data)
print(X.shape)

y = get_y(data)
print(y.shape)
(100, 3)
(100,)

3. Sigmoid function

Classification problems, especially binary classification problems, are black and white: what we need to know is whether the label is 0 or 1, and with what probability. Classification problems are also often nonlinear, so plain linear regression fails in most of these scenarios. The sigmoid function is a hypothesis function that satisfies these requirements: give it a large number and it returns a value close to one; give it a very small (very negative) number and it returns a value close to zero, squashing any complex nonlinear combination of the inputs into a number between 0 and 1.

It is commonly written as $g\left( z \right)=\frac{1}{1+{{e}^{-z}}}$. Combining it with the hypothesis function of linear regression gives the hypothesis function of the logistic regression model: ${{h}_{\theta }}\left( x \right)=\frac{1}{1+{{e}^{-{{\theta }^{T}}x}}}=\frac{1}{1+{{e}^{-(\theta_0x_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n)}}}$

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(np.arange(-10, 10, step=0.01),
        sigmoid(np.arange(-10, 10, step=0.01)))
ax.set_ylim((-0.1, 1.1))
ax.set_xlabel('z', fontsize=18)
ax.set_ylabel('g(z)', fontsize=18)
ax.set_title('Sigmoid Function', fontsize=18)
plt.show()

4. Cost function

The cost function (objective function) of logistic regression no longer uses the sum-of-squares objective from linear regression. Because the hypothesis function now squashes its output to values between 0 and 1, the sum-of-squares cost becomes non-convex and full of local minima, so gradient descent may fail to converge or may never find the global optimum.

Therefore an improvement is needed. Given the purpose of the task, the function we want should behave as follows: since this is a classification problem, when the actual label is 1 the cost should keep decreasing as $h_\theta(x)$ moves toward 1, and when the actual label is 0 the cost should keep increasing as $h_\theta(x)$ moves toward 1, like the two curves below:
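
A minimal matplotlib sketch of those two curves, $-\log(h)$ for $y=1$ and $-\log(1-h)$ for $y=0$ over sigmoid outputs $h\in(0,1)$ (the figure size and sampling grid are arbitrary choices here):

h = np.linspace(0.001, 0.999, 500)   # possible values of the hypothesis h_theta(x)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(h, -np.log(h))              # cost when the actual label is y = 1: falls to 0 as h -> 1
ax1.set_title('y = 1: -log(h)')
ax2.plot(h, -np.log(1 - h))          # cost when the actual label is y = 0: blows up as h -> 1
ax2.set_title('y = 0: -log(1 - h)')
for ax in (ax1, ax2):
    ax.set_xlabel('h')
    ax.set_ylabel('cost')
plt.show()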

Such a function can be defined as:

$$\mathrm{Cost}\left(h_{\theta}(x),y\right)=\begin{cases}-\log\left(h_{\theta}(x)\right) & \text{if } y=1\\-\log\left(1-h_{\theta}(x)\right) & \text{if } y=0\end{cases}$$

To simplify this:

$$\mathrm{Cost}\left(h_{\theta}(x),y\right)=-y\log\left(h_{\theta}(x)\right)-(1-y)\log\left(1-h_{\theta}(x)\right)$$

Substituting $y=1$ and $y=0$ separately shows that this single expression matches the two cases discussed above.

From this we can get the new cost function:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]$$
theta = np.zeros(3)  # theta is a 3x1 column vector, corresponding to x0, x1 and x2

theta

array([0., 0., 0.])
def cost(theta, X, y):
    # this is the cost function J(theta) derived above
    return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))

# X @ theta is equivalent to X.dot(theta)
cost(theta, X, y)
0.6931471805599458

With θ initialised to zero, every prediction is 0.5, so the cost is −ln(0.5) = ln 2 ≈ 0.693.

5. Gradient descent

The batch gradient descent update rule is:

$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$$

Applied to the cost function we just obtained:

$$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

The following is the derivation. Consider that

$$g(z)=\frac{1}{1+e^{-z}},\qquad g'(z)=g(z)\left(1-g(z)\right)$$

so that

$$\frac{\partial}{\partial\theta_j}h_{\theta}(x)=h_{\theta}(x)\left(1-h_{\theta}(x)\right)x_j$$

Therefore:

$$\frac{\partial}{\partial\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h_{\theta}(x^{(i)})}-\frac{1-y^{(i)}}{1-h_{\theta}(x^{(i)})}\right]\frac{\partial}{\partial\theta_j}h_{\theta}(x^{(i)})=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

  • In vectorized form:

$$\nabla_{\theta}J(\theta)=\frac{1}{m}X^{T}\left(\mathrm{sigmoid}(X\theta)-y\right)$$

def gradient(theta, X, y):
    # batch gradient: computes the whole gradient vector in one vectorized step
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - y)
gradient(theta, X, y)
ones     -0.100000
exam1   -12.009217
exam2   -11.262842
dtype: float64
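
As a quick sanity check of the analytic gradient, it can be compared against a central finite-difference approximation of the cost; a minimal sketch (the helper name numerical_gradient and the step size eps are arbitrary choices here):

def numerical_gradient(theta, X, y, eps=1e-6):
    # central finite differences: (J(theta + eps*e_j) - J(theta - eps*e_j)) / (2*eps)
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

print(np.allclose(gradient(theta, X, y), numerical_gradient(theta, X, y), atol=1e-4))  # expect True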

6. Fitting parameters

  • Here I use scipy.optimize.minimize to find the parameters
import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X, y), method='Newton-CG', jac=gradient)
print(res)

     fun: 0.2034977030343232
     jac: ones    -3.908000e-09
          exam1   -6.926737e-05
          exam2   -4.659433e-05
          dtype: float64
 message: 'Optimization terminated successfully.'
    nfev: 75
    nhev: 0
     nit: 31
    njev: 287
  status: 0
 success: True
       x: ones    -25.158217
          exam1     0.206207
          exam2     0.201446
          dtype: float64

7. Prediction and validation on the training set

def predict(x, theta):
    prob = sigmoid(x @ theta)
    return (prob >= 0.5).astype(int)
final_theta = res.x
y_pred = predict(X, final_theta)

print(classification_report(y, y_pred))  # compare the true labels with the model's predictions

              precision    recall  f1-score   support

           0       0.87      0.85      0.86        40
           1       0.90      0.92      0.91        60

    accuracy                           0.89       100
   macro avg       0.89      0.88      0.88       100
weighted avg       0.89      0.89      0.89       100
  • Precision: of all samples predicted positive, the fraction that are actually positive.

  • Recall: of all samples that are actually positive, the fraction that were successfully predicted positive.

TP: True Positive, predicted positive and actually positive

TN: True Negative, predicted negative and actually negative

FP: False Positive, predicted positive but actually negative

FN: False Negative, predicted negative but actually positive

F1-score: the harmonic mean of precision and recall:

$$F_1=\frac{2\cdot\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}$$

F1 reaches its best value at 1 (perfect precision and recall) and its worst value at 0.
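
A small worked example of these definitions, with hypothetical confusion-matrix counts chosen for illustration (they happen to be close to class 1 in the report above):

# hypothetical counts for one class
tp, fp, fn = 55, 6, 5

precision = tp / (tp + fp)            # 55 / 61 ≈ 0.90
recall = tp / (tp + fn)               # 55 / 60 ≈ 0.92
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)          # roughly matches class 1 in the report above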

8. Finding the decision boundary

From the previous introduction (see also Stats.stackexchange.com/questions/9…) we know that the boundary is where $h_\theta(x)=0.5$, i.e. where

$$\theta^{T}x=\theta_0+\theta_1x_1+\theta_2x_2=0$$
print(res.x) # This is the θ parameter for the final fit
ones    -25.158217
exam1     0.206207
exam2     0.201446
dtype: float64
coef = -(res.x / res.x[2])  # Eliminate the third variable and construct a linear function
print(coef)

x = np.arange(130, step=0.1)
y = coef[0] + coef[1]*x
ones    124.887907
exam1    -1.023631
exam2    -1.000000
dtype: float64
sns.set(context="notebook", style="ticks", font_scale=1.5)

sns.lmplot('exam1', 'exam2', hue='admitted', data=data, 
           height=6, 
           fit_reg=False, 
           scatter_kws={"s": 25}
          )

plt.plot(x, y, 'grey')
plt.xlim(0, 130)
plt.ylim(0, 130)
plt.title('Decision Boundary')
plt.show()

This is the end of the first task set

9. Regularized logistic regression

The regularized logistic regression is applied to the second dataset

df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
df.head()
test1 test2 accepted
0 0.051267 0.69956 1
1 0.092742 0.68494 1
2 0.213710 0.69225 1
3 0.375000 0.50219 1
4 0.513250 0.46564 1
sns.set(context="notebook", style="ticks", font_scale=1.5)

sns.lmplot('test1', 'test2', hue='accepted', data=df, 
           height=6, 
           fit_reg=False, 
           scatter_kws={"s": 50}
          )

plt.title('Regularized Logistic Regression - Data Preview')
plt.show()

10. Feature Mapping

In order to fit the data better we need more features, but without inventing them out of thin air: new features must be generated from the existing ones. The way to do that is with polynomial terms. Mapping the two features up to the sixth power gives:

$$\mathrm{mapFeature}(x_1,x_2)=\begin{bmatrix}1\\x_1\\x_2\\x_1^2\\x_1x_2\\x_2^2\\x_1^3\\\vdots\\x_1x_2^5\\x_2^6\end{bmatrix}$$
def feature_mapping(x, y, power, as_ndarray=False):
    # return the mapped features as either an ndarray or a dataframe
    data = {"f{}{}".format(i - p, p): np.power(x, i - p) * np.power(y, p)
                for i in np.arange(power + 1)
                for p in np.arange(i + 1)}

    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)
x1 = np.array(df.test1)
x2 = np.array(df.test2)
data = feature_mapping(x1, x2, power=6)
print(data.shape)
data.head()

(118, 28)
f00 f10 f01 f20 f11 f02 f30 f21 f12 f03 . f23 f14 f05 f60 f51 f42 f33 f24 f15 f06
0 1.0 0.051267 0.69956 0.002628 0.035864 0.489384 0.000135 0.001839 0.025089 0.342354 . 0.000900 0.012278 0.167542 1.815630 e-08 2.477505 e-07 0.000003 0.000046 0.000629 0.008589 0.117206
1 1.0 0.092742 0.68494 0.008601 0.063523 0.469143 0.000798 0.005891 0.043509 0.321335 . 0.002764 0.020412 0.150752 6.362953 e-07 4.699318 e-06 0.000035 0.000256 0.001893 0.013981 0.103256
2 1.0 0.213710 0.69225 0.045672 0.147941 0.479210 0.009761 0.031616 0.102412 0.331733 . 0.015151 0.049077 0.158970 9.526844 e-05 3.085938 e-04 0.001000 0.003238 0.010488 0.033973 0.110047
3 1.0 0.375000 0.50219 0.140625 0.188321 0.252195 0.052734 0.070620 0.094573 0.126650 . 0.017810 0.023851 0.031940 2.780914 e-03 3.724126 e-03 0.004987 0.006679 0.008944 0.011978 0.016040
4 1.0 0.513250 0.46564 0.263426 0.238990 0.216821 0.135203 0.122661 0.111283 0.100960 . 0.026596 0.024128 0.021890 1.827990 e-02 1.658422 e-02 0.015046 0.013650 0.012384 0.011235 0.010193

5 rows × 28 columns

data.describe()
f00 f10 f01 f20 f11 f02 f30 f21 f12 f03 . f23 f14 f05 f60 f51 f42 f33 f24 f15 f06
count 118.0 118.000000 118.000000 118.000000 118.000000 118.000000 1.180000 e+02 118.000000 118.000000 118.000000 . 118.000000 1.180000 e+02 118.000000 1.180000 e+02 118.000000 1.180000 e+02 118.000000 1.180000 e+02 118.000000 1.180000 e+02
mean 1.0 0.054779 0.183102 0.247575 0.025472 0.301370 5.983333 e-02 0.030682 0.015483 0.142350 . 0.018278 4.089084 e-03 0.115710 7.837118 e-02 0.000703 1.893340 e-02 0.001705 2.259170 e-02 0.006302 1.257256 e-01
std 0.0 0.496654 0.519743 0.248532 0.224075 0.284536 2.746459 e-01 0.134706 0.150143 0.326134 . 0.058513 9.993907 e-02 0.299092 1.938621 e-01 0.058271 3.430092 e-02 0.037443 4.346935 e-02 0.090621 2.964416 e-01
min 1.0 0.830070 0.769740 0.000040 0.484096 0.000026 5.719317 e-01 0.358121 0.483743 0.456071 . 0.142660 4.830370 e-01 0.270222 6.472253 e-14 0.203971 2.577297 e-10 0.113448 2.418097 e-10 0.482684 1.795116 e-14
25% 1.0 0.372120 0.254385 0.043243 0.178209 0.061086 5.155632 e-02 0.023672 0.042980 0.016492 . 0.001400 7.449462 e-03 0.001072 8.086369 e-05 0.006381 1.258285 e-04 0.005749 3.528590 e-04 0.016662 2.298277 e-04
50% 1.0 0.006336 0.213455 0.165397 0.016521 0.252195 2.544062 e-07 0.006603 0.000039 0.009734 . 0.001026 8.972096 e-09 0.000444 4.527344 e-03 0.000004 3.387050 e-03 0.000005 3.921378 e-03 0.000020 1.604015 e-02
75% 1.0 0.478970 0.646562 0.389925 0.100795 0.464189 1.099616 e-01 0.086392 0.079510 0.270310 . 0.021148 2.751341 e-02 0.113020 5.932959 e-02 0.002104 2.090875 e-02 0.001024 2.103622 e-02 0.001289 1.001215 e-01
max 1.0 1.070900 1.108900 1.146827 0.568307 1.229659 1.228137 e+00 0.449251 0.505577 1.363569 . 0.287323 4.012965 e-01 1.676725 1.508320 e+00 0.250577 2.018260 e-01 0.183548 2.556084 e-01 0.436209 1.859321 e+00

8 rows × 28 columns

11. Regularized Cost function

When a linear function cannot classify the data properly, a nonlinear function is used instead. But the resulting curve can bend too much, leading to overfitting, or not bend enough, which is underfitting. To prevent overfitting, the key is to shrink the parameters of the high-order features so that those terms carry less weight in the hypothesis function; in other words, to penalize them.

For example, for a hypothesis function like this:

$$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3+\theta_4x_1^4$$

we can regularize it like this:

$$\min_{\theta}\;\frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2+1000\,\theta_3^2+10000\,\theta_4^2$$

Here $\theta_3$ and $\theta_4$ receive extra penalties to reduce their influence. But for a general problem we do not know in advance which parameters to punish, so we let the algorithm decide: we simply sum the squares of all the parameters and add a coefficient $\lambda$ to constrain them; the choice of $\lambda$ is of course also a matter of experience. This gives the general form of the regularized cost function:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
theta = np.zeros(data.shape[1])
X = feature_mapping(x1, x2, power=6, as_ndarray=True)
print(X.shape)

y = get_y(df)
print(y.shape)
(118, 28)
(118,)
def regularized_cost(theta, X, y, l=1):
    # regularized cost function; theta_0 (the bias term x0) is not regularized
    theta_j1_to_n = theta[1:]
    regularized_term = (l / (2 * len(X))) * np.power(theta_j1_to_n, 2).sum()

    return cost(theta, X, y) + regularized_term
regularized_cost(theta, X, y, l=1)
0.6931471805599454

Because θ is initialised to 0, the regularized cost function has the same value as the unregularized cost function (the penalty term is zero).

12. Regularized Gradient


The gradient of the regularized cost function is

$$\frac{\partial J(\theta)}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\qquad(j\ge 1)$$

Since $\theta_0$ is not regularized, minimizing this cost function with gradient descent splits the update into two cases:

$$\begin{aligned}\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_0^{(i)}\\\theta_j&:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\qquad j=1,2,\dots,n\end{aligned}$$

For $j=1,2,\dots,n$ the update formula can be rearranged into:

$$\theta_j:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

It can be seen that the change introduced by the regularization term is that, in each iteration, $\theta_j$ is shrunk a little before the usual gradient step.

def regularized_gradient(theta, X, y, l=1):
    theta_j1_to_n = theta[1:]
    regularized_theta = (l / len(X)) * theta_j1_to_n

    # by doing this, no offset is on theta_0
    regularized_term = np.concatenate([np.array([0]), regularized_theta])

    return gradient(theta, X, y) + regularized_term
regularized_gradient(theta, X, y)

array([8.47457627e-03, 1.87880932e-02, 7.77711864e-05, 5.03446395e-02,
       1.15013308e-02, 3.76648474e-02, 1.83559872e-02, 7.32393391e-03,
       8.19244468e-03, 2.34764889e-02, 3.93486234e-02, 2.23923907e-03,
       1.28600503e-02, 3.09593720e-03, 3.93028171e-02, 1.99707467e-02,
       4.32983232e-03, 3.38643902e-03, 5.83822078e-03, 4.47629067e-03,
       3.10079849e-02, 3.10312442e-02, 1.09740238e-03, 6.31570797e-03,
       4.08503006e-04, 7.26504316e-03, 1.37646175e-03, 3.87936363e-02])
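
To double-check the rearranged update formula from the derivation above, a small sketch (the learning rate alpha and the random test vector theta_test are arbitrary choices here) verifying that shrinking $\theta_j$ by the factor $(1-\alpha\lambda/m)$ and then taking an unregularized gradient step is the same as one step along the regularized gradient:

alpha, l = 0.1, 1                                # assumed learning rate and lambda, for illustration only
m = len(X)
theta_test = np.random.rand(X.shape[1]) * 0.1    # an arbitrary non-zero parameter vector

plain_grad = gradient(theta_test, X, y)
step_a = theta_test - alpha * regularized_gradient(theta_test, X, y, l)
step_b = theta_test.copy()
step_b[0] = theta_test[0] - alpha * plain_grad[0]                           # theta_0 is not shrunk
step_b[1:] = theta_test[1:] * (1 - alpha * l / m) - alpha * plain_grad[1:]  # shrink, then gradient step
print(np.allclose(step_a, step_b))               # True: the two forms of the update are identical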

13. Regularized fitting parameters

import scipy.optimize as opt
print('init cost = {}'.format(regularized_cost(theta, X, y)))

res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, y), method='Newton-CG', jac=regularized_gradient)
res
init cost = 0.6931471805599454

     fun: 0.5290027297127226
     jac: array([-5.59000348e-08, -1.28838122e-08, -6.36410934e-08,  1.48052446e-08,
       -2.17946454e-09, -3.76148722e-08,  9.78876709e-09, -1.82798254e-08,
        1.13128886e-08, -1.07496536e-08,  9.72446799e-09, -3.07080137e-09,
       -8.99944467e-09,  3.95483551e-09, -2.61273742e-08,  4.27780929e-10,
       -1.11055205e-08, -6.79817860e-10, -5.00207423e-09,  2.66918207e-09,
       -1.42573657e-08,  2.66682830e-09, -3.70874575e-09, -1.41882519e-10,
       -1.24101649e-09, -1.53332708e-09,  3.89033012e-10, -2.18628962e-08])
 message: 'Optimization terminated successfully.'
    nfev: 7
    nhev: 0
     nit: 6
    njev: 68
  status: 0
 success: True
       x: array([ 1.27273933,  0.62527083,  1.18108774, -2.01995945, -0.91742379,
       -1.43166442,  0.12400731, -0.36553516, -0.35723847, -0.17512854,
       -1.45815594, -0.05098912, -0.61555563, -0.27470594, -1.19281681,
       -0.24218847, -0.20600683, -0.04473089, -0.27778458, -0.29537795,
       -0.45635707, -1.04320269,  0.02777149, -0.29243126,  0.01556672,
       -0.32737949,  0.14388703,  0.92465318])

14. Regularized prediction

final_theta = res.x
y_pred = predict(X, final_theta)

print(classification_report(y, y_pred))
              precision    recall  f1-score   support

           0       0.90      0.75      0.82        60
           1       0.78      0.91      0.84        58

    accuracy                           0.83       118
   macro avg       0.84      0.83      0.83       118
weighted avg       0.84      0.83      0.83       118

15. Using different values of λ (λ is a constant)

  • Draw decision boundaries

We find all the grid points satisfying $|\theta^{T}x|<\varepsilon$ (the inner product cannot be exactly 0, so a small threshold is used) and plot them as the boundary.

def feature_mapped_logistic_regression(power, l):
    """
    power: int, generate polynomial features from x1 and x2 up to this degree
    l: int, the regularization constant lambda
    """
    df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
    x1 = np.array(df.test1)
    x2 = np.array(df.test2)
    y = get_y(df)

    X = feature_mapping(x1, x2, power, as_ndarray=True)
    theta = np.zeros(X.shape[1])

    res = opt.minimize(fun=regularized_cost,
                       x0=theta,
                       args=(X, y, l),
                       method='TNC',
                       jac=regularized_gradient)
    final_theta = res.x

    return final_theta
def find_decision_boundary(density, power, theta, threshhold):
    t1 = np.linspace(-1, 1.5, density)
    t2 = np.linspace(-1, 1.5, density)

    cordinates = [(x, y) for x in t1 for y in t2]
    x_cord, y_cord = zip(*cordinates)  # zip(*) unpacks the list of pairs
    # e.g. zipping [1, 2, 3] with [4, 5, 6] gives [(1, 4), (2, 5), (3, 6)]; zip(*) turns it back into (1, 2, 3) and (4, 5, 6)
    mapped_cord = feature_mapping(x_cord, y_cord, power)  # this is a dataframe

    inner_product = mapped_cord.values @ theta

    decision = mapped_cord[np.abs(inner_product) < threshhold]  # cannot be exactly 0, so use a small threshold

    return decision.f10, decision.f01
# find the points lying on the decision boundary
def draw_boundary(power, l):
    """
    power: degree of the mapped features
    l: the regularization constant lambda
    """
    density = 1000
    threshhold = 2 * 10**-3

    final_theta = feature_mapped_logistic_regression(power, l)
    x, y = find_decision_boundary(density, power, final_theta, threshhold)

    df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
    sns.lmplot('test1', 'test2', hue='accepted', data=df, height=6, fit_reg=False, scatter_kws={"s": 100})

    plt.scatter(x, y, c='r', s=10)
    plt.title('Decision boundary')
    plt.show()
draw_boundary(power=6, l=1) #lambda=1

draw_boundary(power=6, l=0)  # lambda = 0: no regularization, overfitting

draw_boundary(power=6, l=100)  # lambda = 100: too much regularization, underfitting
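
To put rough numbers on the overfitting/underfitting comparison, a short sketch (reusing the helpers above; sklearn's accuracy_score is used here, and the exact accuracies are not from the original post) that reports training-set accuracy for the three values of λ:

from sklearn.metrics import accuracy_score

for l in (0, 1, 100):
    final_theta = feature_mapped_logistic_regression(power=6, l=l)
    y_pred = predict(X, final_theta)   # X and y are the power-6 mapped features and labels from above
    print('lambda = {:>3}: train accuracy = {:.2f}'.format(l, accuracy_score(y, y_pred)))

The ordering of the three accuracies mirrors the boundaries above: the unregularized model fits the training set most closely, while λ = 100 underfits.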