DNN optimization techniques for DL: an introduction to neural network optimization algorithms: the GD/SGD algorithm, code implementation, and detailed tuning

 

 

Contents

Introduction to the GD algorithm

GD/SGD code implementation

1. Matlab programming

2. SGD implementation in Python

Improvements to the GD algorithm

Hyperparameters in the GD algorithm


 

 

Introduction to the GD algorithm

The gradient descent (GD) algorithm is a basic method for solving nonlinear unconstrained optimization problems and the most common first-order method for minimizing a loss function. Intuitively, it is like repeatedly taking the steepest step down a mountain from your current position.

1. How to find the gradient?

The function decreases fastest along the direction opposite to the gradient, so each iteration updates x ← x − η·∇f(x), where η is the learning rate (step size). For example, with f(x) = x² and η = 0.1, starting at x = 2 one step gives x = 2 − 0.1·4 = 1.6.
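A minimal sketch of this update rule (assuming, for illustration, the same toy objective f(x) = x² and step size 0.1 as in the Matlab example later in this post):

# Gradient descent update rule: x <- x - lr * grad(x)  (illustrative sketch)
def gd_step(x, lr=0.1):
    grad = 2 * x          # analytic gradient of f(x) = x**2
    return x - lr * grad  # step in the negative gradient direction

x = 2.0
for _ in range(50):
    x = gd_step(x)
print(x)  # approaches 0, the minimizer of f(x) = x**2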

2. The error surface in two variables

Figure: error surface of a linear neuron with two input weights.

3. The GD algorithm is prone to getting stuck in local minima
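The sketch below illustrates this behavior on an assumed non-convex toy objective, f(x) = x⁴ − 2x² + 0.3x (not from the original article): depending on the starting point, plain gradient descent settles either in the global minimum near x ≈ −1.0 or in the shallower local minimum near x ≈ 1.0.

# Gradient descent trapped in a local minimum of a non-convex function (toy example)
def f(x):
    return x**4 - 2*x**2 + 0.3*x

def grad(x):
    return 4*x**3 - 4*x + 0.3  # derivative of f

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)   # plain GD update
    return x

for x0 in (-1.5, 1.5):
    x = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x = {x:+.3f}, f(x) = {f(x):+.3f}")
# Starting at -1.5 reaches the global minimum (f ≈ -1.31);
# starting at +1.5 gets stuck in the local minimum (f ≈ -0.71).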

 

 

GD/SGD code implementation

1. Matlab programming

%% Steepest descent method illustration
% Step size 0.1. f_change is the change in the function value between two
% iterations and is the only exit condition.
syms x; f = x^2;
step = 0.1; x = 2; k = 0;       % step size, initial value, iteration counter
f_change = x^2;                 % initialize the change in function value
f_current = x^2;                % current function value
ezplot(@(x,f) f - x.^2)         % plot the function f = x^2
axis([-2, 2, -0.2, 3])          % fix the axes
hold on
while f_change > 0.000000001    % exit when the change between two iterations is small enough
    x = x - step*2*x;           % -2*x is the negative gradient direction, step is the step size: steepest descent!
    f_change = f_current - x^2; % change in the function value
    f_current = x^2;            % recompute the current function value
    plot(x, f_current, 'ro', 'markersize', 7); drawnow; pause(0.2);
    k = k + 1;
end
hold off
fprintf('After %d iterations, the minimum value of the function is %e at x = %e\n', k, x^2, x)

2. SGD implementation in Python

class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr  # learning rate, kept as an instance variable

    # update() is called repeatedly during SGD training
    def update(self, params, grads):
        for key in params.keys():
            # params and grads are dictionaries holding the weight parameters
            # and their gradients, e.g. params['W1'] and grads['W1']
            params[key] -= self.lr * grads[key]


# Pseudocode: updating the parameters of a neural network
network = TwoLayerNet(...)
optimizer = SGD()
for i in range(10000):
    ...
    x_batch, t_batch = get_mini_batch(...)  # mini-batch
    grads = network.gradient(x_batch, t_batch)
    params = network.params
    optimizer.update(params, grads)

 

 

Improvements to the GD algorithm

1. SGD algorithm. (1) Mini-batch: instead of updating on the gradient of every single sample, the average gradient over a small group of samples is used as the update direction; this is the mini-batch gradient descent algorithm, as sketched below.
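A minimal sketch of mini-batch gradient descent on an assumed toy linear-regression problem (the data, batch size, and squared-error loss below are illustrative, not from the original article):

import numpy as np

# Mini-batch gradient descent: average the per-sample gradients over a small
# batch and take one update step per batch (illustrative sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # toy inputs (assumed data)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)   # toy regression targets

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))              # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        # average gradient of the squared error over the mini-batch
        grad = 2 * xb.T @ (xb @ w - yb) / len(batch)
        w -= lr * grad                         # one SGD update per mini-batch
print(w)  # close to true_w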





(2) SGD and the learning rate; learning rate versus loss (see figures)

 

 

Hyperparameters in the GD algorithm

1. Learning rate

(1) Fixed learning rate experiment (given as C code in the original; a sketch follows item (3) below)

(2) Backtracking line search (see the sketch below)



(3) Quadratic interpolation line search: applies the backtracking line-search idea with interpolation, using a quadratic fit to estimate the step size that minimizes the objective along the search direction (see the sketch below).
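The original article's C listing is not reproduced here; the sketch below shows the three step-size strategies of this section on the one-dimensional objective f(x) = x² used earlier (the objective, constants, and function names are illustrative assumptions):

def f(x):
    return x**2               # illustrative objective (same as the Matlab example)

def grad(x):
    return 2*x                # its derivative

# (1) Fixed learning rate: the step size never changes during the run.
def gd_fixed(x, lr=0.1, steps=100):
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# (2) Backtracking line search (Armijo condition): start from a large trial
# step and shrink it until the decrease in f is sufficient.
def backtracking_step(x, t0=1.0, alpha=0.3, beta=0.5):
    g = grad(x)
    d = -g                                    # descent direction
    t = t0
    while f(x + t*d) > f(x) + alpha * t * g * d:
        t = beta * t                          # shrink the trial step
    return x + t*d

# (3) Quadratic interpolation line search: same backtracking idea, but instead
# of blindly shrinking the step, fit a quadratic to phi(t) = f(x + t*d) from
# phi(0), phi'(0) and phi(t), and jump to the minimizer of that quadratic.
def quadratic_interp_step(x, t0=1.0, alpha=0.3, max_iter=20):
    g = grad(x)
    d = -g
    phi0, dphi0 = f(x), g * d                 # phi(0) and phi'(0) <= 0
    t = t0
    for _ in range(max_iter):
        if f(x + t*d) <= phi0 + alpha * t * dphi0:
            break                             # sufficient decrease reached
        # minimizer of the quadratic through (0, phi0) with slope dphi0 and (t, phi(t))
        t = -dphi0 * t**2 / (2 * (f(x + t*d) - phi0 - dphi0 * t))
    return x + t*d

x0 = 2.0
print(gd_fixed(x0))              # fixed learning rate
print(backtracking_step(x0))     # one backtracking line-search step
print(quadratic_interp_step(x0)) # one quadratic-interpolation step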