

1. Introduction

Given a function $f(x)$ as shown in the figure, how can we find $f(x)_{min}$ by programming a computer algorithm?

2. Numerical method

The traditional approach is a numerical solution, as shown in the figure.

Follow these steps and iterate the loop until the optimum is reached (a minimal sketch of the procedure follows the list):

1. Set an arbitrary initial value $x_0$;

2. Randomly generate an increment $\varDelta x$ (direction and step size);

3. Compare $f\left( x_0 \right)$ with $f\left( x_0+\varDelta x \right)$: if $f\left( x_0+\varDelta x \right) < f\left( x_0 \right)$, update the position; otherwise regenerate $\varDelta x$;

4. Repeat until convergence to the optimum $f(x)_{min}$.
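A minimal Python sketch of this random-search procedure is shown below (the example function and starting point are illustrative choices, not from the original article):

import random

def numerical_minimize(f, x0, step=0.1, max_iter=10000):
    # Random search: propose a random increment and keep it only if f decreases
    x = x0
    for _ in range(max_iter):
        dx = random.uniform(-step, step)   # random direction and step size
        if f(x + dx) < f(x):               # accept the move only if it improves f
            x = x + dx
    return x

# Example: minimize f(x) = (x - 2)^2 starting from x0 = 10; the minimum is at x = 2
print(numerical_minimize(lambda x: (x - 2) ** 2, x0=10.0))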

The biggest advantage of the numerical method is its simplicity of programming, but its drawbacks are also obvious:

1. The choice of initial value strongly affects the convergence speed;

2. The increment direction is generated randomly, which is inefficient;

3. The iteration easily falls into a local optimum;

4. It cannot handle "plateau"-type functions.

Being trapped in a local optimum means that once the iteration enters a local minimum or its neighborhood, a step in either the positive or the negative direction no longer improves on the current value; because the step size is not chosen properly, the iteration can never reach the global optimum. As shown in the figure, when the iteration falls into $x = x_j$, the limited learning step makes it impossible to obtain $f\left( x_j \pm step \right) < f\left( x_j \right)$, so the update stalls.

A “plateau” function like the one shown in the figure below can also prevent iterations from being updated.

3. Gradient descent algorithm

The gradient descent algorithm can be regarded as an improvement of the numerical method and is described as follows:

Suppose that after the $k$-th iteration the independent variable has been updated to $x = x_k$, and expand the objective function $f(x)$ in a Taylor series about $x = x_k$:


$$f\left( x \right) = f\left( x_k \right) + f'\left( x_k \right)\left( x - x_k \right) + o\left( x - x_k \right)$$

Since we are seeking $f(x)_{min}$, we expect $f\left( x_{k+1} \right) < f\left( x_k \right)$, hence:


$$f\left( x_{k+1} \right) - f\left( x_k \right) = f'\left( x_k \right)\left( x_{k+1} - x_k \right) < 0$$

If $f'\left( x_k \right) > 0$, then $x_{k+1} < x_k$, i.e. the iteration moves in the negative direction; otherwise it moves in the positive direction. We may as well set $x_{k+1} - x_k = -f'\left( x_k \right)$, which guarantees $f\left( x_{k+1} \right) - f\left( x_k \right) < 0$. It must be pointed out that the Taylor formula only holds as $x \rightarrow x_k$, so $\left| f'\left( x_k \right) \right|$ must not be too large; otherwise $x_{k+1}$ would lie too far from $x_k$ and the residual term could no longer be neglected. A factor $\gamma \in \left( 0,1 \right)$ is therefore introduced to reduce this deviation, i.e. $x_{k+1} - x_k = -\gamma f'\left( x_k \right)$.

In engineering, the learning rate $\gamma$ should be chosen sensibly for the application at hand: if $\gamma$ is too large, the iteration oscillates on both sides of the minimum and the algorithm cannot converge; if $\gamma$ is too small, learning efficiency drops and the algorithm converges slowly.
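As an illustration of the scalar update rule $x_{k+1} = x_k - \gamma f'(x_k)$, here is a minimal sketch with a hand-coded derivative (the example function, tolerance and iteration limit are assumptions made for demonstration):

def gradient_descent_1d(df, x0, gamma=0.1, tol=1e-8, max_iter=10000):
    # Iterate x_{k+1} = x_k - gamma * f'(x_k) until the step becomes negligible
    x = x0
    for _ in range(max_iter):
        step = gamma * df(x)
        x -= step
        if abs(step) < tol:   # converged: the gradient step is effectively zero
            break
    return x

# Example: f(x) = (x - 2)^2 with f'(x) = 2 * (x - 2); the minimum is at x = 2
print(gradient_descent_1d(lambda x: 2 * (x - 2), x0=10.0, gamma=0.1))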

For vectors, the above iterative formula is extended to


$$\boldsymbol{x}_{\boldsymbol{k}+1}=\boldsymbol{x}_{\boldsymbol{k}}-\gamma \nabla _{\boldsymbol{x}_{\boldsymbol{k}}}$$

where $\nabla _{\boldsymbol{x}} = \left( \frac{\partial f(\boldsymbol{x})}{\partial x_1},\frac{\partial f(\boldsymbol{x})}{\partial x_2},\cdots ,\frac{\partial f(\boldsymbol{x})}{\partial x_n} \right) ^T$ is the gradient of the multivariate function, which is why this iterative scheme is called the gradient descent algorithm.
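The vector form can be sketched with NumPy as follows (the quadratic example and the convergence test are chosen here purely for illustration):

import numpy as np

def gradient_descent(grad_f, x0, gamma=0.1, tol=1e-8, max_iter=10000):
    # Vector gradient descent: x_{k+1} = x_k - gamma * grad_f(x_k)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = gamma * grad_f(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # stop once the update is negligible
            break
    return x

# Example: f(x) = ||x - [1, 2]||^2 has gradient 2 * (x - [1, 2]); the minimum is at [1, 2]
print(gradient_descent(lambda x: 2 * (x - np.array([1.0, 2.0])), x0=[10.0, -5.0]))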

The gradient descent algorithm determines the direction and step size of each iteration from the gradient of the function, which greatly improves efficiency. However, as the derivation shows, it still does not overcome the problems of the numerical method: sensitivity to the initial value, falling into local optima, and stalling on "plateau"-type functions.

4. Code in practice: logistic regression

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import matplotlib as mpl
from Logit import Logit

def loadCsvData(file, colName, mode='df'):
    '''
    * @param[in]: file    -> path of the CSV file to load
    * @param[in]: colName -> column names to load
    * @param[in]: mode    -> load mode; 'set': dict mapping column name to column data, 'df': DataFrame
    * @retval: loaded data in the format specified by mode
    '''
    assert mode in ('set', 'df')
    df = pd.read_csv(file, encoding='utf-8-sig', usecols=colName)
    if mode == 'df':
        return df
    if mode == 'set':
        res = {}
        for col in colName:
            res[col] = df[col].values
        return res

if __name__ == '__main__':
    # ====================================
    # Read the CSV data
    # ====================================
    csvPath = os.path.abspath(os.path.join(__file__, "../../data/dataset3.0alpha.csv"))
    dataX = loadCsvData(csvPath, ["Density", "Sugar content"], 'df')
    dataY = loadCsvData(csvPath, ["Good melon"], 'df')
    label = np.array([
        1 if i == "Yes" else 0
        for i in list(map(lambda s: s.strip(), list(dataY['Good melon'])))
    ])

    # ====================================
    # Draw sample points
    # ====================================
    line_x = np.array([np.min(dataX['Density']), np.max(dataX['Density'])])
    mpl.rcParams['font.sans-serif'] = [u'SimHei']
    plt.title('Logistic Regression Simulation')
    plt.xlabel('density')
    plt.ylabel('sugarRate')
    plt.scatter(dataX['Density'][label==0],
                dataX['Sugar content'][label==0],
                marker='^',
                color='k',
                s=100,
                label='bad melon')
    plt.scatter(dataX['Density'][label==1],
                dataX['Sugar content'][label==1],
                marker='^',
                color='r',
                s=100,
                label='good melon')

    # ====================================
    # Instantiate the log-odds (logistic) regression model
    # ====================================
    logit = Logit(dataX, label)

    # Adopt gradient descent method
    logit.logitRegression(logit.gradientDescent)
    line_y = -logit.w[0, 0] / logit.w[1, 0] * line_x - logit.w[2, 0] / logit.w[1, 0]
    plt.plot(line_x, line_y, 'b-', label="Gradient descent")

    # drawing
    plt.legend(loc='upper left')
    plt.show()
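The Logit class imported above is not listed in this article. Below is a minimal sketch of what such a class might look like, assuming two features plus a bias term (so that w has shape (3, 1), matching the indexing used above) and a plain batch gradient-descent update; this is a hypothetical implementation, not the author's original code:

import numpy as np

class Logit:
    # Hypothetical log-odds (logistic) regression model, sketched for illustration
    def __init__(self, dataX, label, lr=0.05, maxIter=5000):
        # Augment the samples with a constant 1 for the bias term -> shape (m, 3)
        self.X = np.hstack([np.asarray(dataX, dtype=float), np.ones((len(label), 1))])
        self.y = np.asarray(label, dtype=float).reshape(-1, 1)   # labels, shape (m, 1)
        self.w = np.zeros((self.X.shape[1], 1))                  # weights, shape (3, 1)
        self.lr = lr                                             # learning rate gamma
        self.maxIter = maxIter

    @staticmethod
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradientDescent(self):
        # One batch gradient-descent step on the cross-entropy loss:
        # grad = X^T (sigmoid(X w) - y) / m
        grad = self.X.T @ (self.sigmoid(self.X @ self.w) - self.y) / len(self.y)
        self.w -= self.lr * grad

    def logitRegression(self, updateRule):
        # Repeatedly apply the supplied update rule (here: gradientDescent)
        for _ in range(self.maxIter):
            updateRule()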