This is the 9th day of my participation in Gwen Challenge

Polynomial regression

Polynomial regression builds on the simple linear regression we discussed before; the difference is that the fitted model is drawn as a curve instead of a straight line. Compared with simple linear regression, it has more parameters, but it still has only one independent variable. Its formula is:


$$y = b_0 + b_1 x_1 + b_2 x_1^2 + \dots + b_n x_1^n$$

Polynomial regression is well suited to fitting parabolas and similar curves, and is commonly used to describe nonlinear phenomena such as:

  • The rate of tissue growth
  • The development of infectious diseases
  • Population growth rate

Because it raises its single independent variable to several powers, polynomial regression can follow curved trends, such as the spread rate of a disease, that a straight line cannot.

Because the dependent variable y is a linear function of the parameters $b_0, \dots, b_n$, we still call polynomial regression a type of linear regression.
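To make the "linear in the parameters" point concrete, here is a minimal sketch that fits a quadratic with ordinary least squares on the columns 1, x and x^2. The toy data (generated from y = 1 + 2x + 3x^2) and the variable names are assumptions for illustration, not part of the salary example.

```python
import numpy as np

# Hypothetical toy data generated from y = 1 + 2x + 3x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x**2

# Design matrix with columns [1, x, x^2]. The model is quadratic in x
# but linear in the coefficients b, so plain least squares applies.
A = np.vander(x, N=3, increasing=True)

# Solve the ordinary least-squares problem A @ b ~ y.
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(b, 6))
```

The recovered coefficients should be close to 1, 2 and 3, which is all that "polynomial regression is a linear regression" means in practice.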

Python – polynomial regression implementation

Suppose we have the following salary data and want to predict the salary for a given position level.

Position            Level   Salary
Business Analyst    1       45000
Junior Consultant   2       50000
Senior Consultant   3       60000
Manager             4       80000
Country Manager     5       110000
Region Manager      6       150000
Partner             7       200000
Senior Partner      8       300000
C-level             9       500000
CEO                 10      1000000
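If the CSV file is not at hand, the same table can be built in memory. This is only a sketch: the column names are taken from the table header, and the assumption is that data.csv has these three columns in this order.

```python
import pandas as pd

# Rows copied from the table above.
dataset = pd.DataFrame({
    "Position": [
        "Business Analyst", "Junior Consultant", "Senior Consultant",
        "Manager", "Country Manager", "Region Manager", "Partner",
        "Senior Partner", "C-level", "CEO",
    ],
    "Level": list(range(1, 11)),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})

# Same slicing as used when loading from the CSV file:
# X is a 2-D column of levels, y is a 1-D array of salaries.
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
print(X.shape, y.shape)
```

Calling dataset.to_csv('data.csv', index=False) afterwards would write a file with the layout assumed here.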

Let’s compare the effects of simple linear regression and polynomial regression.

First, we load the data:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## import data
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:,1:2].values
y = dataset.iloc[:,2].values
# Create simple linear regression model
from sklearn.linear_model import LinearRegression

line_regression = LinearRegression()
line_regression.fit(X,y)

# Create the polynomial feature matrix
from sklearn.preprocessing import PolynomialFeatures

ploy_regression = PolynomialFeatures(degree = 4)
X_ploy = ploy_regression.fit_transform(X)

# Fit a linear regression model on the polynomial features
line_regression_2 = LinearRegression()
line_regression_2.fit(X_ploy,y)

As you can see, line_regression is fitted with sklearn as a simple linear regression model, while line_regression_2 is fitted on the polynomial features and is therefore a polynomial regression model.
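It may help to see what PolynomialFeatures actually produces before the second model is fitted. The sketch below expands two sample levels (2 and 3, chosen here only for illustration) with degree 4.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two illustrative levels as a 2-D column, the shape sklearn expects.
levels = np.array([[2.0], [3.0]])

# Each row becomes [1, x, x^2, x^3, x^4]; the leading 1 is the bias column.
expanded = PolynomialFeatures(degree=4).fit_transform(levels)
print(expanded)
```

Each level x turns into the row 1, x, x^2, x^3, x^4, so the second LinearRegression simply learns one coefficient per power.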

Let's see how well each of them fits the data.

# Linear regression model image
plt.scatter(X, y, color='red')
plt.plot(X, line_regression.predict(X), color='blue')
plt.title('Linear Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

The simple linear model fits this data poorly: a straight line cannot follow the curve in the salaries.

# Polynomial regression model image
plt.scatter(X, y, color='red')
plt.plot(X, line_regression_2.predict(X_ploy), color='green')
plt.title('Polynomial Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

Polynomial regression follows the curve in the data much more closely than simple linear regression.
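The visual comparison can also be expressed as a number. The sketch below computes the R² score of both models on the training data using r2_score from sklearn.metrics; the variable names are illustrative, and the data is re-entered from the salary table so the block runs on its own. Note that a training-set R² always favours the more flexible model, so it describes fit on this data only and says nothing about new data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries from the table above.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Simple linear regression on the raw level.
linear = LinearRegression().fit(X, y)

# Polynomial regression: expand to degree 4, then fit a linear model.
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
poly_model = LinearRegression().fit(X_poly, y)

# R^2 on the training data (1.0 would be a perfect fit).
r2_linear = r2_score(y, linear.predict(X))
r2_poly = r2_score(y, poly_model.predict(X_poly))
print(f"linear R^2: {r2_linear:.3f}, polynomial R^2: {r2_poly:.3f}")
```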

However, the curve is drawn only at the ten integer levels, with straight segments joining them, so it does not look smooth. Evaluating the model on a finer grid of levels gives a smoother curve.

# Levels from the minimum up to (but not including) the maximum, in steps of 0.1
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape(len(X_grid), 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, line_regression_2.predict(
    ploy_regression.transform(X_grid)), color='black')
plt.title('Polynomial Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

With the finer grid, the curve is noticeably smoother.

Now let's use both fitted models to predict the salary at level 6.5:

line_pred = line_regression.predict(np.array([[6.5]]))
# output (rounded): 330379
ploy_pred = line_regression_2.predict(
    ploy_regression.transform(np.array([[6.5]])))
# output (rounded): 158862

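The polynomial prediction is much more consistent with the data: level 6.5 lies between Region Manager (150000) and Partner (200000), and 158862 falls inside that range, whereas the simple linear prediction of 330379 is higher than even the level 8 salary of 300000.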
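The prediction step above has to expand the input into polynomial features by hand before calling predict. As an alternative sketch (not the approach used in this article), scikit-learn's make_pipeline can bundle the expansion and the regression into one estimator. The variable names are illustrative and the data is re-entered from the salary table so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries from the table above.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# The pipeline applies the degree-4 expansion, then the linear model,
# both when fitting and when predicting.
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(X, y)

# Raw levels go straight in; no manual feature expansion is needed.
pred = model.predict(np.array([[6.5]]))[0]
print(round(pred))
```

Since the pipeline performs the same two steps, it should give the same prediction as the manual version, while removing the risk of forgetting to transform a new input.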