Vernacular machine learning – Optimization methods – Newton method


@[toc]

Introduction

The Newton method, together with quasi-Newton methods such as BFGS, is one of the most effective methods for solving nonlinear optimization problems.

Characteristics

  • Fast convergence rate;

  • The Newton method is an iterative algorithm in which each step requires computing the inverse of the Hessian matrix of the objective function, which is computationally expensive; quasi-Newton methods simplify this step by approximating the Hessian matrix or its inverse with a positive definite matrix (a rough sketch of this idea follows the list).
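To make the quasi-Newton idea above a little more concrete, here is a minimal sketch of the BFGS update of an approximate inverse Hessian; the function name and the NumPy-based implementation are my own assumptions, not part of the original post:

```python
import numpy as np

def bfgs_inverse_update(B_inv, s, y):
    """One BFGS update of the approximate inverse Hessian B_inv.

    s = x_{k+1} - x_k  (the step taken)
    y = g_{k+1} - g_k  (the change in the gradient)
    The update keeps B_inv symmetric positive definite whenever y^T s > 0,
    so the true inverse Hessian never has to be computed explicitly.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ B_inv @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
```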

Analysis

Consider the unconstrained optimization problem


$$\min_{x \in \mathbb{R}^n} f(x)$$

where $x^*$ is the minimizer of the objective function. Assume that $f(x)$ has continuous second-order partial derivatives. If the value at the $k$-th iteration is $x^{(k)}$, then $f(x)$ can be expanded in a second-order Taylor series around $x^{(k)}$:


$$f(x) = f(x^{(k)}) + g_k^T (x - x^{(k)}) + \frac{1}{2} (x - x^{(k)})^T H(x^{(k)}) (x - x^{(k)})$$

  • $g_k = g(x^{(k)}) = \nabla f(x^{(k)})$ is the gradient vector of $f(x)$ evaluated at $x^{(k)}$;
  • $H(x^{(k)})$ is the Hessian matrix of $f(x)$, $\left[ \dfrac{\partial^2 f}{\partial x_i \partial x_j} \right]_{n \times n}$, evaluated at $x^{(k)}$.

The Hessian matrix appearing in this Taylor expansion is not explained in detail here; for now, the second-order Taylor expansion of a function of two variables conveys the idea.
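To see the second-order expansion in action, the sketch below compares $f(x)$ with its second-order Taylor approximation around a point $x^{(k)}$ for a simple two-variable function, using finite-difference estimates of the gradient and the Hessian; the test function and step sizes are my own assumptions, not from the original post:

```python
import numpy as np

def f(x):
    # a simple two-variable test function, chosen only for illustration
    return x[0] ** 2 + 3 * x[0] * x[1] + 2 * x[1] ** 2 + np.sin(x[0])

def gradient(f, x, eps=1e-5):
    # central-difference estimate of the gradient vector g(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def hessian(f, x, eps=1e-4):
    # central-difference estimate of the Hessian matrix H(x)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[:, i] = (gradient(f, x + e) - gradient(f, x - e)) / (2 * eps)
    return H

xk = np.array([1.0, -0.5])                    # expansion point x^{(k)}
x = xk + np.array([0.05, 0.02])               # a nearby point x
gk, Hk = gradient(f, xk), hessian(f, xk)
taylor = f(xk) + gk @ (x - xk) + 0.5 * (x - xk) @ Hk @ (x - xk)
print(f(x), taylor)                           # the two values agree closely
```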

Moving on: a necessary condition for $f(x)$ to attain an extremum is that the first derivative vanishes at the extreme point, i.e., the gradient vector is zero. In particular, when $H(x^{(k)})$ is a positive definite matrix, the extremum of $f(x)$ is a minimum. Therefore:


$$\nabla f(x) = 0$$

Taking the gradient of the second-order Taylor expansion of $f(x)$ above gives:

$$\nabla f(x) = \nabla\!\left( f(x^{(k)}) + g_k^T (x - x^{(k)}) + \frac{1}{2} (x - x^{(k)})^T H(x^{(k)}) (x - x^{(k)}) \right) = g_k + H(x^{(k)}) (x - x^{(k)})$$

Setting this gradient to zero at $x = x^{(k+1)}$:

$$g_k + H(x^{(k)}) (x^{(k+1)} - x^{(k)}) = 0$$

$$x^{(k+1)} = x^{(k)} - H(x^{(k)})^{-1} g_k$$

or, equivalently,

$$x^{(k+1)} = x^{(k)} + p_k, \qquad H(x^{(k)}) p_k = -g_k$$

This gives the Newton update formula.
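As a quick numerical check of the update $x^{(k+1)} = x^{(k)} - H(x^{(k)})^{-1} g_k$, the sketch below applies a single Newton step to a quadratic function, where one step already lands exactly on the minimizer; the specific quadratic is my own example:

```python
import numpy as np

# f(x) = 1/2 x^T A x - b^T x with A positive definite; its minimizer solves A x = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

grad = lambda x: A @ x - b     # gradient g(x)
hess = lambda x: A             # Hessian H(x) is the constant matrix A

xk = np.array([5.0, 5.0])                    # arbitrary starting point x^{(k)}
pk = np.linalg.solve(hess(xk), -grad(xk))    # solve H(x^{(k)}) p_k = -g_k
x_next = xk + pk                             # x^{(k+1)} = x^{(k)} + p_k

print(x_next, np.linalg.solve(A, b))         # both print the same point
```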

Algorithm

Input: objective function $f(x)$, gradient $g(x) = \nabla f(x)$, Hessian matrix $H(x)$, precision requirement $\varepsilon$;
Output: the minimizer $x^*$ of $f(x)$.

  1. Take an initial point $x^{(0)}$ and set $k = 0$;
  2. Compute $g_k = g(x^{(k)})$;
  3. If $\|g_k\| < \varepsilon$, stop the computation and return the solution $x^* = x^{(k)}$;
  4. Compute $H_k = H(x^{(k)})$ and solve for $p_k$ from
  $$H(x^{(k)}) p_k = -g_k$$
  5. Update $x^{(k+1)} = x^{(k)} + p_k$, set $k = k + 1$, and go to step 2 (a runnable sketch of this loop follows the list).
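The following is a minimal Python sketch of the algorithm above, assuming NumPy and user-supplied gradient and Hessian functions; the quadratic test problem at the end is my own example, not part of the original post:

```python
import numpy as np

def newton_method(grad, hess, x0, eps=1e-6, max_iter=100):
    """Newton's method for unconstrained minimization.

    grad     : function returning the gradient g(x) = ∇f(x)
    hess     : function returning the Hessian matrix H(x)
    x0       : initial point x^{(0)}
    eps      : precision; stop when ||g_k|| < eps
    max_iter : safety cap on the number of iterations
    """
    x = np.asarray(x0, dtype=float)          # step 1: initial point, k = 0
    for k in range(max_iter):
        g = grad(x)                          # step 2: g_k = g(x^{(k)})
        if np.linalg.norm(g) < eps:          # step 3: stop if ||g_k|| < ε
            break
        p = np.linalg.solve(hess(x), -g)     # step 4: solve H_k p_k = -g_k
        x = x + p                            # step 5: x^{(k+1)} = x^{(k)} + p_k
    return x

# Example: minimize f(x1, x2) = (x1 - 1)^2 + 10 (x2 + 2)^2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_method(grad, hess, x0=[0.0, 0.0]))   # prints approximately [1, -2]
```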