Moment For Technology

Python Mathematical Modeling Series IX: Regression

Posted on May 28, 2023, 12:45 a.m. by Roger Patel
Category: Back-end  Tags: python, back-end



Hello, friends!

Thank you very much for reading Haihong's article. If you find any mistakes, please point them out ~


Self-introduction ଘ(੭, ᵕ)੭

Nickname: Haihong

Tags: programmer | C++ contestant | student

Introduction: I got into programming through the C language, then transferred to the computer science major, and was fortunate enough to win some national and provincial awards... Postgraduate admission has been confirmed. Currently learning C++/Linux/Python

Learning experience: solid foundation + more notes + more code + more thinking + learn English well!


Beginner stage of learning Python

This article serves only as my own study notes, for building a knowledge system and for review

The goal is not quantity: each problem studied should be one more problem understood

Know the what, and know the why!

1. Multiple regression

Note: the data set could not be found. The following code is for reference only and has not been validated.

1.1 Data Selection

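import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

def mul_lr():
    pd_data = pd.read_excel('../profile/test.xlsx')
    # Use a font with Chinese glyphs so that Chinese labels display correctly
    font = {"family": "Microsoft YaHei"}
    mpl.rc("font", **font)
    mpl.rcParams['axes.unicode_minus'] = False  # keep minus signs readable with this font
    # Column names are translated from the original; they must match the spreadsheet headers
    sns.pairplot(pd_data, x_vars=['CSI 500', 'CSI 300', 'Shanghai', 'Shanghai 180'], y_vars=['Shanghai Composite Index'], kind="reg", height=5, aspect=0.7)  # height replaces the older size argument
    plt.show()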

1.2 Build training set and test set, and build model

from sklearn.model_selection import train_test_split  # splits data into training and test sets
from sklearn.linear_model import LinearRegression  # linear regression
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

def mul_lr():   # continues the previous code
    # Select only the feature columns; this also drops the date column (skip if there is none)
    X = pd_data.loc[:, ['CSI 500', 'CSI 300', 'Shanghai', 'Shanghai 180']]
    y = pd_data.loc[:, 'Shanghai Composite Index']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)
    print('X_train.shape={}\ny_train.shape={}\nX_test.shape={}\ny_test.shape={}'.format(X_train.shape, y_train.shape, X_test.shape, y_test.shape))
    linreg = LinearRegression()
    model = linreg.fit(X_train, y_train)
    print(model)
    # Model intercept after training
    print(linreg.intercept_)
    # Model weights after training (one per feature)
    print(linreg.coef_)
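
Because the original spreadsheet is not available, the split-and-fit step can be tried on a synthetic data set instead. The following is only a minimal sketch under that assumption: the column names `f1` to `f4`, the coefficients in `true_coef` and the intercept of 10 are invented for illustration and have nothing to do with the index data above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spreadsheet: 50 rows, 4 feature columns (hypothetical names).
rng = np.random.default_rng(100)
demo_X = pd.DataFrame(rng.normal(size=(50, 4)), columns=["f1", "f2", "f3", "f4"])
true_coef = np.array([1.5, -2.0, 0.5, 3.0])
# The target is an exact linear combination of the features plus an intercept of 10 (no noise).
demo_y = demo_X.to_numpy() @ true_coef + 10.0

# Hold out 20% of the rows for testing, as in the article's code.
X_train, X_test, y_train, y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=100
)

demo_model = LinearRegression().fit(X_train, y_train)
print(X_train.shape, X_test.shape)  # (40, 4) (10, 4)
print(demo_model.intercept_)        # close to 10.0
print(demo_model.coef_)             # close to [1.5, -2.0, 0.5, 3.0]
```

Since the synthetic target contains no noise, the fitted intercept and weights should match the invented values up to floating-point error, which makes it easy to check that the pipeline is wired correctly before real data is plugged in.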

1.3 Model prediction

    # Prediction (still inside mul_lr)
    y_pred = linreg.predict(X_test)
    print(y_pred)  # predictions for the 10 test samples

1.4 Model Evaluation

    # Evaluation
    # (1) Evaluation measures
    # For classification problems the usual measure is accuracy, but it does not apply to regression. Here we use metrics for continuous values.
    # Three commonly used measures for linear regression:
    # 1) Mean Absolute Error (MAE)
    # 2) Mean Squared Error (MSE)
    # 3) Root Mean Squared Error (RMSE)
    # Here I use RMSE.
    sum_mean = 0
    for i in range(len(y_pred)):
        sum_mean += (y_pred[i] - y_test.values[i]) ** 2
    sum_erro = np.sqrt(sum_mean / len(y_pred))  # divide by the number of test samples (10 here)
    # RMSE calculated by hand
    print("RMSE by hand:", sum_erro)
    # Plot predicted against actual values (the original comment calls this an ROC curve, which applies to classifiers)
    plt.figure()
    plt.plot(range(len(y_pred)), y_pred, 'b', label="predict")
    plt.plot(range(len(y_pred)), y_test, 'r', label="test")
    plt.legend(loc="upper right")  # display the labels in the figure
    plt.xlabel("the number of sales")
    plt.ylabel('value of sales')
    plt.show()
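
The three measures named in the comments can be compared side by side. The sketch below computes MAE, MSE and RMSE by hand with NumPy and checks them against `sklearn.metrics`; the arrays `y_true` and `y_hat` are made-up numbers chosen for illustration, not results from the model above.

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_hat - y_true
mae = np.mean(np.abs(errors))  # Mean Absolute Error
mse = np.mean(errors ** 2)     # Mean Squared Error
rmse = np.sqrt(mse)            # Root Mean Squared Error

# The hand-computed values should agree with scikit-learn's implementations.
print(mae, metrics.mean_absolute_error(y_true, y_hat))   # 0.75 0.75
print(mse, metrics.mean_squared_error(y_true, y_hat))    # 0.875 0.875
print(rmse, np.sqrt(metrics.mean_squared_error(y_true, y_hat)))
```

RMSE is expressed in the same units as the target, which is one reason it is often preferred over MSE when reporting regression error.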

2. Logistic regression

2.1 Iris data set

The Iris data set covers three species of iris: Iris-setosa, Iris-versicolor and Iris-virginica.

The data set contains 4 feature variables and 1 class variable, with 150 samples in total. The four features are the length and width of the sepals and of the petals; the class variable records which of the three species each sample belongs to.
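
The description above can be checked directly against the copy of the data set bundled with scikit-learn. This is a small sketch; the variable name `iris_demo` is chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_demo = load_iris()

print(iris_demo.data.shape)            # (150, 4): 150 samples, 4 features
print(iris_demo.feature_names)         # sepal/petal length and width, in cm
print(list(iris_demo.target_names))    # ['setosa', 'versicolor', 'virginica']
print(np.bincount(iris_demo.target))   # [50 50 50]: 50 samples per species
```

The samples are stored in class order (50 setosa, then 50 versicolor, then 50 virginica), which is why slicing by row index is enough to separate the species when plotting.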

2.2 Draw scatter diagram

Demo code

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
# Take the feature matrix; only the first two columns (sepal length and width) are plotted
DD = iris.data
X = [x[0] for x in DD]
Y = [x[1] for x in DD]
# Samples are ordered by class: rows 0-49 setosa, 50-99 versicolor, 100-149 virginica
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:], Y[100:], color='green', marker='+', label='Virginica')
plt.legend(loc=2)  # upper left corner
plt.show()

The results

2.3 Logistic regression analysis

Demo code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]   # first two columns: sepal length and sepal width
Y = iris.target
lr = LogisticRegression(C=1e5)  # large C means weak regularization
lr.fit(X, Y)
# The meshgrid function generates two grid matrices covering the plot area
h = .02  # grid step
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict a class for every grid point, then reshape back to the grid for coloring
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 6))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:, 0], X[100:, 1], color='green', marker='s', label='Virginica')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.legend(loc=2)
plt.show()
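
Besides the decision-region plot, the fitted classifier can also be inspected numerically. The sketch below is an assumption-laden illustration rather than part of the original tutorial: the names `X2` and `clf`, the query point `[5.0, 3.5]` and the `max_iter=1000` setting (added to give the solver more iterations) are all chosen here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris_demo = load_iris()
X2 = iris_demo.data[:, :2]   # sepal length and sepal width only
y = iris_demo.target

# Same weak regularization as in the demo code; max_iter is raised as a precaution.
clf = LogisticRegression(C=1e5, max_iter=1000).fit(X2, y)

# Accuracy on the training data itself (an optimistic estimate, since no test set is held out).
train_acc = clf.score(X2, y)
print(train_acc)

# Class prediction and per-class probabilities for one hypothetical flower.
sample = [[5.0, 3.5]]
pred = clf.predict(sample)
proba = clf.predict_proba(sample)
print(iris_demo.target_names[pred[0]], proba.round(3))
```

With only the two sepal features, setosa separates cleanly from the other two species, while versicolor and virginica overlap, so the training accuracy stays noticeably below 1.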

The results


Learning source: Bilibili (Station B) and the accompanying classroom slides; the code is reproduced from there

This essay is just a study note, recording the process of going from 0 to 1

I hope it helps you. If there is a mistake, corrections are welcome ~
