1. Logistic Regression

Data: In this part of the exercise, you will build a logistic regression model to predict whether a student is admitted to a university. Suppose you are the administrator of a university department and want to estimate each applicant's chance of admission based on the results of two exams. You can use historical data from previous applicants as the training set for logistic regression.

1.1 Plotting the raw data

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Show the raw data
data = pd.read_csv('ex2data1.txt', names=['score1','score2','admitted'])
# Boolean indexing: data['admitted'].isin([1]) gives False, True, False, ...
positive = data[data['admitted'].isin([1])]
negative = data[data['admitted'].isin([0])]
# Subplot
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['score1'], positive['score2'], s=50, c='g', marker='o', label='Admitted')
ax.scatter(negative['score1'], negative['score2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()  # legend
ax.set_xlabel('Score1')
ax.set_ylabel('Score2')
plt.show()

Here data['admitted'].isin([1]) turns every position of the admitted column whose value is 1 into True and every other position into False, producing a boolean Series like [False, True, False, ...].
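As a quick illustration, here is a minimal sketch (with made-up toy values, not the actual dataset) of how isin produces such a boolean mask and how the mask filters rows:

import pandas as pd

toy = pd.DataFrame({'score1': [34.6, 60.2, 79.0],
                    'score2': [78.0, 86.3, 75.8],
                    'admitted': [0, 1, 1]})
mask = toy['admitted'].isin([1])   # 0    False
                                   # 1     True
                                   # 2     True
print(toy[mask])                   # keeps only the rows where the mask is True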

1.2 Data preprocessing

A column of ones (x0) is inserted so that the bias term θ0 has a matching constant feature.

# Insert a column of ones as x0
data.insert(0, 'Ones', 1)
# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:,0:cols-1]  # X is all rows, without the last column
y = data.iloc[:,cols-1:cols]  # y is all rows, only the last column
theta = np.array([0,0,0])

1.3 The sigmoid function

The sigmoid (logistic) function is $g(z) = \dfrac{1}{1 + e^{-z}}$.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
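A couple of quick sanity checks on the implementation (standard properties of the sigmoid, not part of the original exercise):

print(sigmoid(0))      # 0.5 exactly
print(sigmoid(10))     # ~0.99995, approaches 1 for large positive inputs
print(sigmoid(-10))    # ~0.00005, approaches 0 for large negative inputs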
# Plot the sigmoid curve
nums = np.arange(-10,10,step=1)
plt.figure(figsize=(20,8),dpi=100)
plt.plot(nums,sigmoid(nums))
plt.show()

1.4 Cost function and gradient descent

The cost function is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]$$

where $h_\theta(x) = g(\theta^T x)$ is the sigmoid applied to the linear combination of the features. Code implementation:

def cost(theta, X, y):
    # Convert the parameters to matrix type
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    # Shapes: X (100,3), y (100,1), theta (1,3)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    return np.sum(first - second) / (len(X))
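As a quick check (my own addition, reusing theta, X and y from above): with θ initialized to zeros every prediction is sigmoid(0) = 0.5, so the initial cost should be −ln(0.5) ≈ 0.693 regardless of the data.

print(cost(theta, X, y))   # expected: approximately 0.693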

Gradient descent:

The gradient descent update rule is

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)}$$

Here only the gradient itself is implemented, in vectorized form $\nabla J(\theta) = \frac{1}{m}X^T\big(g(X\theta) - y\big)$; the parameter updates are left to the optimizer in the next section.

# Gradient function (only the partial derivatives of J(θ), no update loop)
def gradientDescent(theta, X, y):
    X = np.matrix(X)
    y = np.matrix(y)
    theta = np.matrix(theta)
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - y)
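A quick shape check on the gradient (not part of the original exercise): the result should contain one partial derivative per parameter.

grad = gradientDescent(theta, X, y)
print(grad.shape)   # (3, 1): one entry each for theta0, theta1, theta2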

1.5 Running the optimization and computing accuracy

1.5.1 Optimization

Here we use an optimizer from the scipy library; the meaning of its parameters can be looked up with help().

# Use scipy.optimize.minimize to find the parameters
import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X, y), method='TNC', jac=gradientDescent)
res

The output is an OptimizeResult object; the fitted parameters are stored in res.x and the final cost in res.fun.
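For example, the result object can be inspected directly (the exact numbers depend on the data and the optimizer, so none are quoted here):

print(res.success)   # whether the optimizer reports convergence
print(res.x)         # fitted parameters [theta0, theta1, theta2]
print(res.fun)       # final value of the cost function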

1.5.2 Accuracy calculation

  • Get the predictions on the training set
# Predictions on the training set
def predict(theta, X):
    theta = np.mat(theta)
    y_predict = sigmoid(X @ theta.T)
    return [1 if x >= 0.5 else 0 for x in y_predict]
  • Evaluate the accuracy
# Evaluate the accuracy
y_pre = np.mat(predict(res.x, X))  # matrix of predicted labels
y_true = np.mat(y).ravel()  # matrix of true labels
# Comparing two matrices returns an element-wise boolean matrix; comparing plain lists returns a single boolean for the whole list
accuracy = np.mean(y_pre == y_true)
print('accuracy = {}%'.format(accuracy * 100))

y_predict and y_true are first converted to one-dimensional (row) matrices; comparing two matrices element-wise yields a boolean for each position, e.g. [True, False, False, ...].

np.mean() then computes the proportion of True entries among all entries, which is exactly the prediction accuracy.
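A tiny illustration of why np.mean of a boolean array equals the accuracy (True counts as 1, False as 0):

import numpy as np

matches = np.array([True, False, True, True])
print(np.mean(matches))   # 0.75 -> 3 out of 4 predictions were correct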

1.6 Plotting the decision boundary

The decision boundary is the set of points where $h_\theta(x) = 0.5$, i.e. where $\theta^T x = 0$: on one side the model predicts class 1, on the other class 0.

Clearly the decision boundary of this logistic regression is linear: the straight line $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$, which can be rewritten as $x_2 = \dfrac{-\theta_0 - \theta_1 x_1}{\theta_2}$ (this is exactly what the code below computes).

Code implementation:

# Plot the decision boundary
x1 = np.linspace(data.score1.min(), data.score1.max(), 100)  # 100 evenly spaced values over the given range
x2 = (- res.x[0] - res.x[1] * x1) / res.x[2]
# Boolean indexing: data['admitted'].isin([1]) gives False, True, False, ...
positive = data[data['admitted'].isin([1])]
negative = data[data['admitted'].isin([0])]
# Subplot
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['score1'], positive['score2'], s=50, c='g', marker='o', label='Admitted')
ax.scatter(negative['score1'], negative['score2'], s=50, c='r', marker='x', label='Not Admitted')
ax.plot(x1, x2, 'r', label='Prediction')
ax.legend()  # legend
ax.set_xlabel('Score1')
ax.set_ylabel('Score2')
plt.show()

2. Regularized logistic regression

Data: In this part of the exercise, you will implement regularized logistic regression to predict whether a microchip from a manufacturing facility passes quality assurance (QA). During QA, each microchip goes through various tests to ensure it works properly. Suppose you are a product manager at the factory and you have the results of two different tests for some microchips. From these two tests you want to decide whether each microchip should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, which you can use to build a logistic regression model.

2.1 Data Display

data2 = pd.read_csv('ex2data2.txt', names=['test1','test2','pass'])
# Boolean indexing: data2['pass'].isin([1]) gives False, True, False, ...
positive = data2[data2['pass'].isin([1])]
negative = data2[data2['pass'].isin([0])]
# Subplot
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['test1'], positive['test2'], s=50, c='g', marker='o', label='Pass')
ax.scatter(negative['test1'], negative['test2'], s=50, c='r', marker='x', label='Not Pass')
ax.legend()  # label
ax.set_xlabel('test1')
ax.set_ylabel('test2')
plt.show()

2.2 Feature mapping and data splitting

  • Polynomial expansion: map the two features test1 and test2 into all polynomial terms up to the sixth power, i.e. $1, x_1, x_2, x_1^2, x_1x_2, x_2^2, \ldots, x_2^6$.

Pseudo code:

for i in 0..power + 1:
    for p in 0..i + 1:
        output x^(i-p) * y^p

def poly_expansion(x1, x2, power, dataframe):
    """Polynomial feature expansion"""
    for i in range(power + 1):
        for j in range(i + 1):
            dataframe['F' + str(i-j) + str(j)] = np.power(x1, i-j) * np.power(x2, j)
    dataframe.drop('test1', axis=1, inplace=True)
    dataframe.drop('test2', axis=1, inplace=True)

# Apply the expansion to data2 with power 6, as assumed by the shape quoted below
poly_expansion(data2.test1, data2.test2, 6, data2)

After the mapping, data2.shape is (118, 29): the 'pass' column plus 28 polynomial features, i.e. the two original features expand to 28 features (for power = 6 there are $\sum_{i=0}^{6}(i+1) = 28$ terms).
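A one-line check of that feature count (purely illustrative):

print(sum(i + 1 for i in range(6 + 1)))   # 28 polynomial terms for power = 6
print(data2.shape)                        # (118, 29): 'pass' plus 28 features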

  • Data splitting
# Split and extract the data
# set X (training data) and y (target variable)
cols = data2.shape[1]
X = data2.iloc[:,1:cols]  # X is all rows, without the first column
y = data2.iloc[:,0:1]  # y is all rows, only the first column (note the slicing keeps it two-dimensional)
theta = np.zeros(X.shape[1])

2.3 Regularized cost function


The regularized cost function is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

The first half is identical to ordinary logistic regression; the second half is the hyperparameter $\lambda$ times the sum of the squares of the parameters $\theta_j$, which keeps each $\theta_j$ as small as possible and thus helps avoid overfitting (the bias $\theta_0$ is not penalized).

Code implementation:

def cost(theta, X, y, lambd):
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    # first half of the function
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    front = np.sum(first - second) / (len(X))
    # second half of the function: regularization term (theta0 is excluded from the penalty)
    last = lambd / (2 * len(X)) * np.sum(np.power(theta[:,1:], 2))
    return front + last

2.4 Gradient descent

  • Principle

Because $\theta_0$ is not regularized, the gradient descent update splits into two cases:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right], \quad j = 1, 2, \ldots, n$$

For $j = 1, 2, \ldots, n$ the update can be rearranged into

$$\theta_j := \theta_j\Big(1 - \alpha\frac{\lambda}{m}\Big) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)}$$

In regularized gradient descent every parameter except $\theta_0$ is penalized, because $\theta_0$ is the bias and does not weight any feature ($x_0 = 1$).

As before, only the gradient (the partial derivatives of $J(\theta)$), not the full descent loop, is implemented in code.

Code implementation:

# Construct gradient descent function (only partial derivative of J(θ) is done)
def gradient(theta, X, y, lambd):
    X = np.matrix(X)
    y = np.matrix(y)
    theta = np.matrix(theta)
    first = (1 / len(X)) * (X.T @ (sigmoid(X @ theta.T) - y))
    # theta0 does not need optimization
    second = (lambd / len(X)) * theta[:,1:].T
    second = np.concatenate((np.array([[0]]),second),axis=0)
    return first + second

The line second = np.concatenate((np.array([[0]]), second), axis=0) prepends a row containing 0 for $\theta_0$ (which is not regularized), so that the regularization part has the same shape as the first part and the two matrices can be added.
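A small shape illustration of that concatenation (toy values, not the real gradient):

import numpy as np

reg_part = np.ones((2, 1))                                     # penalties for theta1, theta2
padded = np.concatenate((np.array([[0]]), reg_part), axis=0)   # prepend a 0 row for theta0
print(padded.shape)   # (3, 1) -> now matches the shape of the unregularized gradient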

  • Run the optimization

The approach is the same as in the previous section, so it is not explained again.

# Use scipy.optimize.minimize to find the parameters
import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X, y, 100), method='TNC', jac=gradient)
res
  • Accuracy calculation
# Evaluate accuracy
y_pre = np.mat(predict(res.x, X))  # matrix of predicted labels
y_true = np.mat(y).ravel()  # matrix of true labels
# Comparing two matrices returns an element-wise boolean matrix; comparing plain lists returns a single boolean for the whole list
accuracy = np.mean(y_pre == y_true)
print('accuracy = {}%'.format(accuracy * 100))

2.5 Decision boundary

Unlike the previous section, the decision boundary here is irregular (non-linear), so the straight-line plotting method used above no longer applies. Instead, we predict the class of every point on a dense grid and draw the contour between the two predicted regions.


Code implementation:

def plot_decision_boundary(axis, axe):
    # meshgrid lays a grid of points over the plane from the two axis ranges and returns coordinate matrices
    X0, X1 = np.meshgrid(
        # Two arrays of numbers; their range and density are determined by the axis limits
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)).reshape(-1, 1),
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100)).reshape(-1, 1),
    )
    # ravel() flattens each coordinate matrix to 1-D; np.c_[] stacks the two arrays as columns into a matrix
    X_grid_matrix = np.c_[X0.ravel(), X1.ravel()]
    # Convert the feature matrix to a DataFrame for the polynomial expansion
    X_grid_matrix = pd.DataFrame(X_grid_matrix, columns=['test1','test2'])
    x1 = X_grid_matrix.test1
    x2 = X_grid_matrix.test2
    poly_expansion(x1, x2, 6, X_grid_matrix)
    # Predict the class of every grid point with the trained logistic regression model
    y_predict = np.array(predict(res.x, X_grid_matrix))
    # Reshape the predictions into a matrix with the same shape as X0 and X1
    y_predict_matrix = y_predict.reshape(X0.shape)
    
    # Set the colormap
    from matplotlib.colors import ListedColormap
    my_colormap = ListedColormap(['#0000CD','#40E0D0','#FFFF00'])
    
    # Draw the contour and fill the regions on each side of the boundary
    axe.contourf(X0, X1, y_predict_matrix, cmap=my_colormap)

Draw boundaries:

# Boolean indexing: data2['pass'].isin([1]) gives False, True, False, ...
positive = data2[data2['pass'].isin([1])]
negative = data2[data2['pass'].isin([0])]
# Subplot
fig, ax = plt.subplots(figsize=(12,8))
plot_decision_boundary(axis=[data2.F10.min(), data2.F10.max(), data2.F01.min(), data2.F01.max()], axe=ax)
ax.scatter(positive['F10'], positive['F01'], s=50, c='g', marker='o', label='Pass')
ax.scatter(negative['F10'], negative['F01'], s=50, c='r', marker='x', label='Not Pass')
ax.legend()  # label
ax.set_xlabel('test1')
ax.set_ylabel('test2')
plt.show()

The resulting plot shows the filled contour regions with the non-linear decision boundary between the 'Pass' and 'Not Pass' points.

2.6 Influence of the hyperparameter λ on the decision boundary

2.6.1 When λ is too large

For example, set λ to a very large value, re-run the optimization, and observe the decision boundary: with heavy regularization the parameters are pushed toward zero and the boundary becomes overly smooth (underfitting).

2.6.2 When λ is too small

For example, set λ to a very small value (or 0), re-run the optimization, and observe the decision boundary: with almost no regularization the model overfits the training data and the boundary becomes highly irregular.
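To make this comparison concrete, here is a minimal sketch (my own addition, not from the original exercise) that reuses the cost, gradient, predict and plot_decision_boundary functions and the data2, X, y, theta, positive and negative objects defined above; the λ values are only illustrative. Note that plot_decision_boundary reads the global res, so res is reassigned before each plot.

import scipy.optimize as opt

for lambd in [0, 1, 100]:   # small, moderate and large regularization strengths
    # Re-fit the regularized logistic regression for this lambda
    res = opt.minimize(fun=cost, x0=theta, args=(X, y, lambd), method='TNC', jac=gradient)
    fig, ax = plt.subplots(figsize=(12,8))
    plot_decision_boundary(axis=[data2.F10.min(), data2.F10.max(), data2.F01.min(), data2.F01.max()], axe=ax)
    ax.scatter(positive['F10'], positive['F01'], s=50, c='g', marker='o', label='Pass')
    ax.scatter(negative['F10'], negative['F01'], s=50, c='r', marker='x', label='Not Pass')
    ax.set_title('lambda = {}'.format(lambd))
    ax.legend()
    plt.show()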