3.1 Logistic regression

3.1.1 Introduction to logistic regression and conditional probability

As mentioned above, the perceptron algorithm may fail to converge when the classes are not linearly separable, so logistic regression can be used instead to obtain a better classifier. Note that although it is called logistic regression, it is actually a classification model, not a regression model.

To introduce logistic regression, we first need the concept of the odds, mathematically defined as $\frac{p}{1-p}$: the ratio of the probability that an event happens to the probability that it does not. We then define the logit function as the logarithm of the odds:


$$\mathrm{logit}(p) = \log \frac{p}{1-p}$$
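As a quick illustration (a minimal NumPy sketch; the function name `logit` is just a local helper defined here), a few probabilities and their log-odds:

```python
import numpy as np

def logit(p):
    """Log-odds: maps a probability in (0, 1) to a real number."""
    return np.log(p / (1 - p))

for p in [0.1, 0.5, 0.9]:
    print(f"logit({p}) = {logit(p):+.3f}")
# logit(0.1) = -2.197, logit(0.5) = +0.000, logit(0.9) = +2.197
```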

The input $p$ of the logit function is a number in the interval $(0,1)$, and its output ranges over all real numbers. In this way, we can relate the probability that a sample belongs to a certain class to the feature values of that sample:


$$\mathrm{logit}(p(y=1 \mid x)) = \sum_{i=0}^{n} w_i x_i = w^T x$$

Here $p(y=1 \mid x)$ denotes the probability that the sample belongs to class 1 given its features $x$. Just as before, if we set $z = w^T x$, we can solve for the probability $p$:


$$p = \phi(z) = \frac{1}{1+e^{-z}}$$

$\phi(z)$ is then used to represent the probability that the sample belongs to a given class. The graph of the function is as follows.

The sigmoid function takes a real value as its input and maps it to the interval $(0,1)$, with an inflection point at $z = 0$, where $\phi(z) = 0.5$.
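As a minimal sketch (assuming NumPy; `sigmoid` is a local helper, not a library function), we can evaluate the function and check its inflection point directly:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
print(sigmoid(z))            # ~[0.0025 0.2689 0.5 0.7311 0.9975]
print(sigmoid(0.0) == 0.5)   # True: the inflection point
```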

In this way, given the features of a sample, we can obtain the probability that it belongs to a certain class and then convert that probability into a binary output:


$$\hat{y} = \begin{cases} 1, & \phi(z) \ge 0.5 \ (\text{i.e., } z \ge 0) \\ 0, & \text{otherwise} \end{cases}$$
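A minimal prediction sketch under these definitions (the weights and samples below are hypothetical, and the first column of `X` is the bias input $x_0 = 1$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w):
    """Return 1 where phi(w^T x) >= 0.5 (equivalently w^T x >= 0), else 0."""
    z = X @ w
    return np.where(sigmoid(z) >= 0.5, 1, 0)

w = np.array([-0.5, 1.2, -0.7])        # hypothetical weights (w_0 is the bias)
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 0.1, 1.5]])        # two samples, first column is x_0 = 1
print(predict(X, w))                   # [1 0]
```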

3.1.2 Loss function of logistic regression

Previously, we used the sum of squared errors as the loss function:


$$J(w) = \frac{1}{2} \sum_{i} \left(y^{(i)} - \phi(z^{(i)})\right)^2$$

However, in logistic regression the sum of squared errors is no longer suitable: because $\phi(z)$ is a nonlinear function of $w$, $J(w)$ becomes non-convex and gradient descent may get stuck in a local rather than the global minimum. We therefore need a new loss function.

We can instead use the maximum likelihood approach. Assuming the samples are independent, the likelihood is:


$$L(w) = P(y \mid x; w) = \prod_{i=1}^{n} P\!\left(y^{(i)} \mid x^{(i)}; w\right)$$

Because $y^{(i)}$ takes only the two values 0 and 1, each factor is a Bernoulli probability and the above equation can be written as


$$L(w) = P(y \mid x; w) = \prod_{i=1}^{n} \phi\!\left(z^{(i)}\right)^{y^{(i)}} \left(1-\phi\!\left(z^{(i)}\right)\right)^{1-y^{(i)}}$$

Taking the logarithm turns the product into a sum, which is easier to differentiate and avoids numerical underflow:


$$l(w) = \log L(w) = \sum_{i=1}^{n} \left[ y^{(i)} \log\!\left(\phi(z^{(i)})\right) + \left(1-y^{(i)}\right) \log\!\left(1-\phi(z^{(i)})\right) \right]$$
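A quick numerical check (a sketch with hypothetical values for $y^{(i)}$ and $z^{(i)}$) that the log of the likelihood product equals the sum above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y   = np.array([1, 0, 1])
phi = sigmoid(np.array([1.2, -0.8, 0.4]))          # hypothetical phi(z^(i))

likelihood     = np.prod(phi**y * (1 - phi)**(1 - y))
log_likelihood = np.sum(y * np.log(phi) + (1 - y) * np.log(1 - phi))
print(np.isclose(np.log(likelihood), log_likelihood))   # True
```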

Our goal is to maximize this function, whereas a loss function is something we minimize, so the loss function we need is simply its negative:


$$J(w) = \sum_{i=1}^{n} \left[ -y^{(i)} \log\!\left(\phi(z^{(i)})\right) - \left(1-y^{(i)}\right) \log\!\left(1-\phi(z^{(i)})\right) \right]$$
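A minimal sketch of computing this loss for a batch of hypothetical samples (the clipping constant `eps` is an assumption added here to keep the logarithms finite):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, phi, eps=1e-15):
    """Negative log-likelihood J(w), summed over all samples."""
    phi = np.clip(phi, eps, 1 - eps)   # keep log() away from 0
    return np.sum(-y * np.log(phi) - (1 - y) * np.log(1 - phi))

y = np.array([1, 0, 1, 0])
z = np.array([2.0, -1.5, 0.3, -0.2])   # hypothetical net inputs w^T x
print(log_loss(y, sigmoid(z)))
```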

To better understand this function, let’s look at the loss function for a single sample:


$$J(\phi(z), y; w) = \begin{cases} -\log(\phi(z)), & y = 1 \\ -\log(1-\phi(z)), & y = 0 \end{cases}$$

The image is as follows:

It can be seen that when the actual value is 1 and the predicted probability $\phi(z)$ approaches 1, the loss tends to 0 (the solid line); but when the actual value is 1 and the predicted probability approaches 0, the loss tends to infinity. In other words, wrong predictions are penalized increasingly heavily.
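To make this concrete, a tiny sketch of the single-sample loss $-\log(\phi(z))$ for $y = 1$ at a few predicted probabilities:

```python
import numpy as np

# Loss for a sample whose true label is y = 1: the closer phi is to 1,
# the smaller the loss; as phi approaches 0 the loss grows without bound.
for phi in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"phi = {phi:>4}: loss = {-np.log(phi):.3f}")
# 0.010, 0.105, 0.693, 2.303, 4.605
```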