
Regularization of linear regression

We have discussed two algorithms for linear regression before:

  • Gradient descent
  • Normal equations

Regularized linear regression


$$J(\theta)=\frac{1}{2 m}\left[\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}\right]$$
$$\min_{\theta} J(\theta)$$

For the cost function above, we want to find the $\theta$ that minimizes it.

Gradient descent

Recall that ordinary (unregularized) gradient descent looks like this:

Repeat {


$$\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \qquad (j=0,1,2,3,\ldots,n)$$

}

To regularize this, we take the update for $\theta_0$ out on its own and add the penalty term to the updates for the remaining parameters, because, as we said before, the penalty term starts at $\theta_1$.

Repeat {


$$\theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}$$


$$\theta_{j}:=\theta_{j}-\alpha\left[\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right] \qquad (j=1,2,3,\ldots,n)$$

}

The $\theta_j$ update can be simplified as shown below. Note that $1-\alpha \frac{\lambda}{m}$ is a number slightly less than 1, so each iteration first shrinks $\theta_j$ a little and then performs the usual gradient step:

Repeat {


$$\theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}$$


$$\theta_{j}:=\theta_{j}\left(1-\alpha \frac{\lambda}{m}\right)-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \qquad (j=1,2,3,\ldots,n)$$

}
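To make the update concrete, here is a minimal Octave sketch of one iteration of this regularized update, vectorized over all parameters. The variable names X (the m × (n+1) design matrix with a leading column of ones), y, theta, alpha, and lambda are my own choices for illustration, not from the original post:

    m = size(X, 1);
    h = X * theta;                       % predictions h_theta(x^(i)) for every example
    grad = (1 / m) * (X' * (h - y));     % unregularized gradient for all j
    reg = (lambda / m) * theta;          % penalty term lambda/m * theta_j
    reg(1) = 0;                          % theta_0 (index 1 in Octave) is not penalized
    theta = theta - alpha * (grad + reg);

Zeroing the first entry of the penalty vector reproduces the separate update for $\theta_0$ without writing a second formula.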

Normal equation

The original normal equation:


$$\theta=\left(X^{T} X\right)^{-1} X^{T} y$$

where the rows of $X$ are the training examples and $y$ stacks the targets:

$$X=\begin{bmatrix} \left(x^{(1)}\right)^{T} \\ \left(x^{(2)}\right)^{T} \\ \vdots \\ \left(x^{(m)}\right)^{T} \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}, \qquad y=\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \in \mathbb{R}^{m}$$

After adding regularization, the normal equation becomes:


$$\theta=\left(X^{T} X+\lambda\begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}\right)^{-1} X^{T} y$$

where the diagonal matrix $\begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$ has dimension $(n+1) \times (n+1)$; its first diagonal entry is 0 because $\theta_0$ is not penalized.
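In code, the regularized normal equation is nearly a one-liner. A minimal Octave sketch, again assuming X, y, and lambda as inputs:

    p = size(X, 2);                      % p = n + 1 parameters
    L = eye(p);
    L(1, 1) = 0;                         % do not penalize theta_0
    theta = (X' * X + lambda * L) \ (X' * y);

Using the backslash operator solves the linear system directly, which is numerically preferable to forming the explicit inverse.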

Regularization of logistic regression

Cost function of logistic regression:


$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$$

We likewise add a regularization term at the end, which gives:


$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$$

Even if you fit many high-order parameters, adding the regularization term keeps the parameters small, so you still get a reasonable decision boundary.

Gradient descent

We already know that the gradient descent updates for linear regression and logistic regression look the same in form, so we can carry the regularized linear regression updates over directly:

Repeat {


$$\theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}$$


$$\theta_{j}:=\theta_{j}-\alpha\left[\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right] \qquad (j=1,2,3,\ldots,n)$$

}

To regularize logistic regression, we add the same penalty term $\frac{\lambda}{m} \theta_{j}$ to the second update, and the result looks identical:

Repeat {


$$\theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}$$


$$\theta_{j}:=\theta_{j}-\alpha\left[\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right] \qquad (j=1,2,3,\ldots,n)$$

}

It looks the same as linear regression, but it is important to remember that the two hypotheses $h_\theta(x)$ are different: for logistic regression, $h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T} x}}$.
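In code, the only change from the linear regression sketch above is the hypothesis. A minimal Octave sketch of one regularized update step for logistic regression, with the same assumed variable names:

    m = size(X, 1);
    h = 1 ./ (1 + exp(-X * theta));      % sigmoid hypothesis instead of X * theta
    grad = (1 / m) * (X' * (h - y));
    reg = (lambda / m) * theta;
    reg(1) = 0;                          % theta_0 is not penalized
    theta = theta - alpha * (grad + reg);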

Advanced optimization

When we discussed logistic regression, we mentioned advanced optimization algorithms other than gradient descent, but did not go into detail. So how do we use regularization with those advanced algorithms? The general template looks like this:


    function [jVal, gradient] = costFunction(theta)
      jVal = [code to compute J(θ)];
      gradient(1) = [code to compute ∂J(θ)/∂θ_0];
      gradient(2) = [code to compute ∂J(θ)/∂θ_1];
      ...
      gradient(n+1) = [code to compute ∂J(θ)/∂θ_n];

Again, you need to write the costFunction function yourself, and in this function:

  • function [jVal, gradient] = costFunction(theta) takes as input the parameter vector $\theta=\begin{bmatrix}\theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n}\end{bmatrix}$.
  • jVal = [code to compute J(θ)] computes the regularized cost $J(\theta)$.
  • gradient(1) = [code to compute ∂J(θ)/∂θ_0] computes $\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)}$, with no penalty term, since $\theta_0$ is not regularized.
  • gradient(n+1) = [code to compute ∂J(θ)/∂θ_n] computes $\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{n}^{(i)}+\frac{\lambda}{m} \theta_{n}$.
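Putting the pieces together, here is a minimal sketch of such a costFunction for regularized logistic regression. Passing X, y, and lambda as extra arguments (and binding them with an anonymous function) is my own choice for illustration:

    function [jVal, gradient] = costFunction(theta, X, y, lambda)
      % Regularized logistic regression: cost J(theta) and its gradient.
      m = size(X, 1);
      h = 1 ./ (1 + exp(-X * theta));          % sigmoid hypothesis
      theta_reg = theta;
      theta_reg(1) = 0;                        % exclude theta_0 from the penalty
      jVal = (-1 / m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
             + (lambda / (2 * m)) * sum(theta_reg .^ 2);
      gradient = (1 / m) * (X' * (h - y)) + (lambda / m) * theta_reg;
    end

The function can then be handed to an advanced optimizer such as fminunc:

    options = optimset('GradObj', 'on', 'MaxIter', 400);
    initial_theta = zeros(size(X, 2), 1);
    [theta, jVal] = fminunc(@(t) costFunction(t, X, y, lambda), initial_theta, options);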