# Python Mathematical Modeling Series IX: Regression

Posted on May 28, 2023, 12:45 a.m. by Roger Patel
Category: Back-end

# Preface

Hello, friend!

Thank you very much for reading Haihong's article. If you find any mistakes in it, please point them out ~

Self-introduction ଘ(੭, ᵕ)੭

Nickname: Haihong

Tags: programmer | C++ contestant | student

Introduction: I got to know programming through the C language, then transferred to a computer science major, and have had the honor of winning some national and provincial awards... Has been confirmed. Currently learning C++/Linux/Python.

Learning experience: a solid foundation + more notes + more code + more thinking + learning English well!

Beginner stage of learning Python

This article serves only as my own study notes, for building up a knowledge system and for review

It is not about how many problems you do: learn one problem, understand one problem

Know the what, and know the why!

# 1. Multiple regression

Note: I could not find the data set, so the following code has not been validated. Reference: blog.csdn.net/HHTNAN/arti...

## 1.1 Data Selection

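```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

# pd_data is assumed to be a pandas DataFrame loaded beforehand (the data set
# was not available). The column names below are translated placeholders and
# must match the headers of the actual data.
def mul_lr():
    # Use a font that can display Chinese characters
    font = {"family": "Microsoft YaHei"}
    mpl.rc("font", **font)
    mpl.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
    # Regression plot of the target index against each feature index
    # (seaborn >= 0.9 names the former `size` parameter `height`)
    sns.pairplot(pd_data, x_vars=['CSI 500', 'CSI 300', 'Shanghai', 'Shanghai 180'],
                 y_vars=['Shanghai Composite Index'], kind="reg", height=5, aspect=0.7)
    plt.show()
```

## 1.2 Build the training and test sets, and fit the model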

```python
from sklearn.model_selection import train_test_split  # from the cross-validation module
from sklearn.linear_model import LinearRegression  # linear regression
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

def mul_lr():  # continues the previous code
    # Drop the date column (skip this step if the data has none) and select
    # the columns below; see http://blog.csdn.net/chixujohnny/article/details/51095817
    # The column names are translated placeholders for the real headers.
    X = pd_data.loc[:, ['CSI 500', 'CSI 300', 'Shanghai', 'Shanghai 180']]
    y = pd_data.loc[:, 'Shanghai Composite Index']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)
    print('X_train.shape={}\ny_train.shape={}\nX_test.shape={}\ny_test.shape={}'.format(
        X_train.shape, y_train.shape, X_test.shape, y_test.shape))
    linreg = LinearRegression()
    model = linreg.fit(X_train, y_train)
    print(model)
    # Intercept of the trained model
    print(linreg.intercept_)
    # Weights of the trained model (one per feature)
    print(linreg.coef_)
```
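Because the original data set could not be found, the split-and-fit steps can at least be exercised on synthetic data. The sketch below is only a stand-in under stated assumptions: the DataFrame `demo_data`, the column names `f1` to `f4` and `target`, the weights in `true_coef`, and the noise level are all made up for illustration and have nothing to do with the stock-index data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pd_data (hypothetical columns, not the real data set)
rng = np.random.default_rng(0)
features = ["f1", "f2", "f3", "f4"]
demo_data = pd.DataFrame(rng.normal(size=(50, 4)), columns=features)
true_coef = np.array([1.5, -2.0, 0.5, 3.0])  # made-up weights
noise = rng.normal(scale=0.1, size=50)
demo_data["target"] = demo_data[features].to_numpy() @ true_coef + 10.0 + noise

X = demo_data[features]
y = demo_data["target"]

# Same split settings as in the article: 20% test data, fixed random seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=100)

linreg = LinearRegression().fit(X_train, y_train)
print("intercept:", linreg.intercept_)
print("coefficients:", linreg.coef_)

# Predict on the held-out rows
y_pred = linreg.predict(X_test)
print("number of test predictions:", len(y_pred))
```

With so little noise, the fitted intercept and coefficients should land close to the made-up values, which is a quick sanity check that the pipeline is wired correctly before real data is plugged in.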

## 1.3 Model prediction

```python
    # Prediction (continues inside mul_lr)
    y_pred = linreg.predict(X_test)
    print(y_pred)  # predicted results for the 10 test samples
```

## 1.4 Model Evaluation
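
The evaluation code in this section names three error measures. As a small sketch on made-up numbers (the arrays `y_true` and `y_hat` are illustrative, not taken from the article's data): MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of MSE.

```python
import numpy as np
from sklearn import metrics

# Tiny made-up example values (not from the article's data)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 6.0])

errors = y_hat - y_true            # [-1, 0, 2, -1]
mae = np.mean(np.abs(errors))      # (1 + 0 + 2 + 1) / 4
mse = np.mean(errors ** 2)         # (1 + 0 + 4 + 1) / 4
rmse = np.sqrt(mse)                # square root of the MSE

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)

# Cross-check the hand calculation against scikit-learn
assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_hat))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_hat))
```

All three divide by the number of samples being evaluated, so taking that count from the data keeps a hand calculation correct when the test-set size changes.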

```python
    # Evaluation (continues inside mul_lr)
    # For classification problems the usual evaluation measure is accuracy,
    # but it does not apply to regression, where the targets are continuous.
    # Three measures commonly used for linear regression:
    # (1) Mean Absolute Error (MAE)
    # (2) Mean Squared Error (MSE)
    # (3) Root Mean Squared Error (RMSE)
    # RMSE is used here, calculated by hand.
    sum_mean = 0
    for i in range(len(y_pred)):
        sum_mean += (y_pred[i] - y_test.values[i]) ** 2
    # Divide by the number of test samples (10 in the original example)
    sum_erro = np.sqrt(sum_mean / len(y_pred))
    print("RMSE by hand:", sum_erro)
    # Plot the predicted values against the actual test values
    plt.figure()
    plt.plot(range(len(y_pred)), y_pred, 'b', label="predict")
    plt.plot(range(len(y_pred)), y_test, 'r', label="test")
    plt.legend(loc="upper right")  # show the labels in the figure
    plt.xlabel("test sample")
    plt.ylabel("Shanghai Composite Index")
    plt.show()
```

# 2. Logistic regression

## 2.1 Iris data set

The Iris data set covers three species of iris: Iris-setosa, Iris-versicolor and Iris-virginica.

The data set contains 4 feature variables and 1 class variable, with 150 samples in total. The four features are the length and width of the sepals and of the petals, and the class variable records which of the three types of iris each sample belongs to.

## 2.2 Draw a scatter plot

Demo code

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
# Take the first two columns of the iris data (sepal length and sepal width)
DD = iris.data
X = [x[0] for x in DD]
Y = [x[1] for x in DD]
# The samples are stored 50 per class, in class order
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:], Y[100:], color='green', marker='+', label='Virginica')
plt.legend(loc=2)  # upper left corner
plt.show()
```
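
The slices `[:50]`, `[50:100]` and `[100:]` in the scatter plot code only make sense if the samples are stored in class order, 50 per class. Assuming `iris` comes from scikit-learn's `load_iris` (the article does not show where it is loaded), a short sketch can confirm that layout:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)      # number of samples and number of features
print(iris.feature_names)   # sepal/petal length and width
print(iris.target_names)    # the three class names

# Count the samples per class and check that each slice holds a single class
counts = np.bincount(iris.target)
print(counts)
first = np.unique(iris.target[:50])
second = np.unique(iris.target[50:100])
third = np.unique(iris.target[100:])
print(first, second, third)
```

If each of the three slices contains exactly one class label, colouring the points by position is equivalent to colouring them by `iris.target`.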

## 2.3 Logistic regression analysis

Demo code

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]  # first two columns: sepal length and sepal width
Y = iris.target
lr = LogisticRegression(C=1e5)
lr.fit(X, Y)
# meshgrid generates two grid matrices covering the plotting area
h = 0.02  # grid step
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict the class of every grid point, then reshape back to the grid
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 6))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:, 0], X[100:, 1], color='green', marker='s', label='Virginica')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.legend(loc=2)
plt.show()
```
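
The `meshgrid`, `ravel` and `reshape` steps in the demo are easier to follow on a tiny grid. The sketch below uses made-up ranges (x in {0, 1, 2}, y in {10, 20}) and a stand-in "prediction" of x + y in place of `lr.predict`:

```python
import numpy as np

# A coarse grid: x takes the values 0, 1, 2 and y takes the values 10, 20
xx, yy = np.meshgrid(np.arange(0, 3, 1), np.arange(10, 30, 10))
print(xx.shape, yy.shape)  # one row per y value, one column per x value

# Flatten both matrices and pair them up: one (x, y) row per grid point
points = np.c_[xx.ravel(), yy.ravel()]
print(points)

# A per-point result (here simply x + y) is reshaped back onto the grid
z = points.sum(axis=1).reshape(xx.shape)
print(z)
```

In the demo, the same pattern turns the classifier's per-point predictions into a matrix that `pcolormesh` can colour cell by cell.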

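The decision-region plot is qualitative. As a hedged numeric complement, the sketch below fits the same kind of model on the two sepal features and reports its accuracy on the training data, plus the class probabilities for one sample. Raising `max_iter` is an assumption added here to help the solver converge; it is not part of the original demo.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]  # sepal length and sepal width only
Y = iris.target

# max_iter is raised here (an assumption, not in the article's code)
lr = LogisticRegression(C=1e5, max_iter=1000)
lr.fit(X, Y)

accuracy = lr.score(X, Y)        # fraction of training samples classified correctly
proba = lr.predict_proba(X[:1])  # class probabilities for the first sample
print("training accuracy:", accuracy)
print("class probabilities for the first sample:", proba)
print("predicted class:", iris.target_names[lr.predict(X[:1])[0]])
```

Because the score is computed on the data the model was fitted on, it is an optimistic estimate; a held-out split like the one in section 1.2 would give a fairer number.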
# Conclusion

Learning sources: Bilibili (B站) and its course slides; the code is reproduced from them

This essay is just a study note, recording a process from 0 to 1

I hope it helps you. If there are any mistakes, corrections are welcome ~