Introduction to linear regression machine learning algorithms

Before we explain what linear regression is, let’s take an example!

Suppose we have m samples, each with n features x_1, ..., x_n and a corresponding output y. The samples can be written as:

$$(x_1^{(i)}, x_2^{(i)}, \ldots, x_n^{(i)}, y^{(i)}), \quad i = 1, 2, \ldots, m$$

Now a new data point arrives:

$$(x_1, x_2, \ldots, x_n)$$

We need to predict the corresponding output y.

The first thing to know is that if the output y is continuous, this is a regression problem; otherwise it is a classification problem. For a regression problem, we fit the m samples above and establish a linear equation of the following form:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

Now we can make predictions for the new data and work out the corresponding y value. Of course, the fitted function will not predict with 100% accuracy. The reality is more like the picture below.

OK, so that's linear regression. If you want the formal wording to impress your friends: linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative interdependence between two or more variables. Linear regression is arguably the most basic algorithm in machine learning.

First, the goal of linear regression

1. Assess how significant the predictor variable x is in explaining the variation or behavior of the response variable y.
2. Predict the value of the response variable y given the value of the predictor variable x.
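For the first goal, one simple descriptive measure is the coefficient of determination R², the fraction of the variation in y that a straight-line fit on x accounts for. The sketch below is an assumption about how one might quantify "explaining variation"; it is not a formal significance test, and the data are invented.

```python
# Hypothetical samples (invented for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m = len(xs)
x_mean = sum(xs) / m
y_mean = sum(ys) / m

# Sums of squares and cross-products around the means.
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

# For a one-feature linear fit with an intercept, R^2 equals the squared
# Pearson correlation between x and y.
r_squared = sxy ** 2 / (sxx * syy)

print("R^2:", r_squared)
```

A value close to 1 suggests x explains most of the variation in y; a value close to 0 suggests it explains little.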

Second, the application of linear regression

1. Prediction: build a model (equation) from the relationship between the response variable y and the predictor variables x, and use it to predict new y values.
2. Explanatory and exploratory analysis: understand and explain the relationship between the response variable and the predictor variables.

Third, categories of linear regression

1. Simple (univariate) linear regression: involves only one independent variable and one dependent variable, whose relationship can be approximated by a straight line.
2. Multiple linear regression: involves two or more independent variables, with a linear relationship between the dependent variable and the independent variables.
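To make the multiple-regression case concrete, here is a minimal pure-Python sketch that fits two features by solving the least-squares normal equations. The function names, the tiny Gaussian-elimination solver, and the data are all assumptions made for illustration; in practice a numerical library would normally do this step.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from all rows below the pivot row.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_multiple_linear_regression(X, y):
    """Return [theta_0, theta_1, ..., theta_n] minimizing the squared error."""
    # Prepend a constant 1 to each sample so theta_0 acts as the intercept.
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Normal equations: (X^T X) theta = X^T y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]]
y = [9.0, 5.0, 10.0, 13.0]

theta = fit_multiple_linear_regression(X, y)
print("theta:", theta)
```

With a single feature per sample, the same function reduces to simple linear regression and returns the intercept and slope of the fitted line.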

Fourth, the hypothesis function and loss function of linear regression

What are the hypothesis function and the loss function?

Hypothesis function: in supervised learning, the function used to fit the input samples, written as $h_\theta(x)$.

Loss function: also called the cost function or objective function. It measures how well the model fits the data. The smaller the loss, the better the fit, and the parameters that achieve the minimum are the optimal parameters. For linear regression, the loss function is usually the mean squared error (the square of the predicted value minus the true value, averaged over the samples). Suppose there are m samples, each with an n-dimensional feature vector and an output y. Then the hypothesis function of linear regression is

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

and the loss function is

$$J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
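As a rough sketch, the two functions can be written directly in Python. The names `hypothesis` and `mse_loss`, the convention that `theta[0]` is the intercept, the averaging over m samples, and the sample data are all assumptions made for this illustration.

```python
def hypothesis(theta, x):
    """Linear hypothesis: theta[0] + theta[1]*x[0] + ... + theta[n]*x[n-1]."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def mse_loss(theta, X, y):
    """Mean squared error between predictions and true outputs."""
    m = len(X)
    return sum((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / m


# Hypothetical samples generated exactly by y = 1 + 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [9.0, 5.0, 10.0]

perfect_loss = mse_loss([1.0, 2.0, 3.0], X, y)  # parameters that generated the data
poor_loss = mse_loss([0.0, 0.0, 0.0], X, y)     # parameters that always predict 0

print("loss with the generating parameters:", perfect_loss)
print("loss with all-zero parameters:", poor_loss)
```

The generating parameters give a smaller loss than the all-zero parameters, which matches the rule that a smaller loss means a better fit.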

Fifth, the algorithm process

We now have the hypothesis function and the loss function. The next goal is to find the model parameters that minimize the loss function. Two common approaches are gradient descent and the least squares method. Today we'll use the least squares method. To keep the explanation simple, we use samples with only one feature, so the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$ and the loss function becomes

$$J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

To minimize the loss function, take its partial derivatives with respect to the two parameters, the intercept $\theta_0$ and the slope $\theta_1$, and set both equal to 0. This gives two equations in $\theta_0$ and $\theta_1$; solving them simultaneously yields the values of the two parameters. The specific process is as follows:
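A sketch of that derivation for the single-feature case, assuming the hypothesis $\theta_0 + \theta_1 x$ and a mean-squared-error loss. The symbols and the $1/m$ scaling are assumptions, and any constant factor in front of the sum leaves the minimizer unchanged. Here $\bar{x}$ and $\bar{y}$ denote the sample means of the inputs and outputs.

```latex
\begin{aligned}
J(\theta_0, \theta_1) &= \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2 \\[4pt]
\frac{\partial J}{\partial \theta_0} &= \frac{2}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) = 0
  \;\;\Longrightarrow\;\; \theta_0 = \bar{y} - \theta_1 \bar{x} \\[4pt]
\frac{\partial J}{\partial \theta_1} &= \frac{2}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) x^{(i)} = 0 \\[4pt]
\text{Substituting } \theta_0: \quad
\theta_1 &= \frac{\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)\left( y^{(i)} - \bar{y} \right)}
                 {\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)^2},
\qquad
\theta_0 = \bar{y} - \theta_1 \bar{x}
\end{aligned}
```

The formula for $\theta_1$ requires that the inputs are not all identical, since otherwise the denominator is zero.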

And then we have the parameters of our optimal model. So that’s the linear regression algorithm.
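The closed-form least squares solution for one feature is short enough to implement by hand. In this sketch the slope is the sum of products of deviations from the means divided by the sum of squared deviations of x, and the intercept follows from the means. The function name and the sample data are assumptions for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (theta0, theta1) minimizing the squared error of theta0 + theta1*x."""
    m = len(xs)
    if m != len(ys) or m < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # Sum of squared deviations of x; zero means all x values are identical.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    # Sum of products of deviations of x and y.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    theta1 = sxy / sxx                  # slope
    theta0 = y_mean - theta1 * x_mean   # intercept
    return theta0, theta1


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit_simple_linear_regression(xs, ys)
prediction = theta0 + theta1 * 5.0  # predicted output for a new input x = 5

print("theta0:", theta0, "theta1:", theta1)
print("prediction for x = 5:", prediction)
```

Because the example data lie exactly on a line, the recovered parameters match the line that generated them; with noisy data the same function returns the best-fitting line in the least squares sense.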
