Produced by: The Cabin | Author: Peter | Editor: Peter

Machine Learning — Neural Networks Learning

This article elaborates further on the neural network material from the previous section, covering:

  • Neural network cost function
  • Back propagation method and interpretation
  • Gradient checking
  • Summary of neural networks

Neural network cost function

Parameter notation

The notation for several parameters is as follows:

  • $m$: number of training examples
  • $X$, $y$: input and output signals
  • $L$: number of layers in the neural network
  • $S_l$: number of neurons in layer $l$
  • $S_L$: number of neurons in the output layer

Classification cases

There are two main cases: binary classification and multi-class classification.

Binary classification: $S_L = 1$, $y = 0/1$; the output is a single real number.

$K$-class classification: $S_L = K$, and $y_i = 1$ indicates that the example belongs to class $i$; the output is a $K$-dimensional vector.
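
As a concrete illustration (not from the original notes), the label $y$ for $K$-class classification is usually encoded as a one-hot vector; for example, with $K = 4$:

```python
import numpy as np

# Binary classification: y is a single 0/1 value
y_binary = 1

# K-class classification (K = 4): y is a K-dimensional one-hot vector.
# An example belonging to class 3 becomes [0, 0, 1, 0].
K = 4
label = 3                 # 1-based class index
y_multi = np.zeros(K)
y_multi[label - 1] = 1.0
print(y_multi)            # [0. 0. 1. 0.]
```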

Cost function

Cost function in logistic regression (LR):

J\left(\theta\right)=-\frac{1}{m}\left[\sum\limits_{i=1}^{m}{y}^{(i)}\log{h_\theta({x}^{(i)})}+\left(1-{y}^{(i)}\right)\log\left(1-h_\theta\left({x}^{(i)}\right)\right)\right]+\frac{\lambda}{2m}\sum\limits_{j=1}^{n}{\theta_j}^{2}

In logistic regression there is only one output variable, which is a scalar.

But in a neural network there are multiple output variables: $h_\theta(x)$ is a $K$-dimensional vector.

Denote the $i$-th output as follows:

h_\Theta\left(x\right)\in \mathbb{R}^{K},\quad \left({h_\Theta}\left(x\right)\right)_{i}={i}^{th} \text{ output}

The cost function $J$ is expressed as:


J(\Theta) = -\frac{1}{m} \left[ \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{K} y_k^{(i)} \log \left( h_\Theta(x^{(i)}) \right)_k + \left( 1 - y_k^{(i)} \right) \log \left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right] + \frac{\lambda}{2m} \sum\limits_{l=1}^{L-1} \sum\limits_{i=1}^{s_l} \sum\limits_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^2

Explanation:

  1. The cost function measures the error between the algorithm's predictions and the actual values
  2. For each row of features there are $K$ predictions, computed one row at a time in a loop
  3. Among the $K$ predictions, select the one with the highest probability and compare it with the actual label $y$
  4. The regularization term is the sum over the $\Theta$ matrix of each layer, after excluding the bias terms $\theta_0$
  5. The index $j$ (ranging over the $s_{l+1}$ activation units in layer $l+1$) loops over the rows, and the index $i$ (ranging over the $s_l$ activation units in layer $l$) loops over the columns
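
The following is a minimal NumPy sketch of this cost function. It assumes the outputs `h` (an m x K matrix produced by forward propagation), one-hot labels `Y` (m x K), and a list of weight matrices `Thetas` whose first column holds the bias weights; the names are illustrative, not from the original notes.

```python
import numpy as np

def nn_cost(h, Y, Thetas, lam):
    """Regularized neural-network cost J(Theta).

    h      : (m, K) outputs h_Theta(x) for each training example
    Y      : (m, K) one-hot encoded labels
    Thetas : list of weight matrices Theta^(l) (bias weights in column 0)
    lam    : regularization parameter lambda
    """
    m = Y.shape[0]
    # Cross-entropy term, summed over all m examples and K output units
    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Regularization term: squared weights of every layer, excluding the bias column
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return cost + lam / (2 * m) * reg
```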

Backpropagation Algorithm

To compute the partial derivatives of the cost function, $\frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}}$, a neural network uses the backpropagation algorithm:

  • First calculate the error of the last layer
  • Then work backwards, computing the error of each layer in turn, until reaching the second layer

Forward propagation example

Suppose we have a sample of data:


({x^{(1)}},{y^{(1)}})

The neural network has 4 layers, where $K=S_L=L=4$.

Forward propagation computes the activations layer by layer, from the input layer to the output layer.
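
A minimal sketch of forward propagation for this 4-layer network, assuming sigmoid activations and weight matrices `Theta1`, `Theta2`, `Theta3` whose first column holds the bias weights (the names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2, Theta3):
    """Forward propagation through a 4-layer network for one example x."""
    a1 = np.concatenate(([1.0], x))            # input layer plus bias unit
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # hidden layer 1 plus bias unit
    z3 = Theta2 @ a2
    a3 = np.concatenate(([1.0], sigmoid(z3)))  # hidden layer 2 plus bias unit
    z4 = Theta3 @ a3
    a4 = sigmoid(z4)                           # output layer: h_Theta(x)
    return a1, a2, a3, a4
```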

Back propagation example

  1. Start by calculating the error of the last layer:

    Error = the difference between the prediction of the activation units, $a^{(4)}$, and the actual value $y$.

  2. $\delta$ denotes the error; error = predicted value of the model − true value:


    \delta^{(4)} = a^{(4)} - y

  3. The error of the previous layer


\delta^{(3)}=\left({\Theta^{(3)}}\right)^{T}\delta^{(4)}\ast g'\left(z^{(3)}\right)

Here $g'(z^{(3)})$ is the derivative of the S-shaped (sigmoid) function $g$; its specific expression is:


g'(z^{(3)})=a^{(3)}\ast(1-a^{(3)})

  4. The error of the layer before that:


\delta^{(2)}=(\Theta^{(2)})^{T}\delta^{(3)}\ast g'(z^{(2)})

The first layer holds the input variables and has no error term.

  5. Assuming $\lambda=0$, i.e. no regularization is applied:


\frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}}=a_j^{(l)}\delta_i^{(l+1)}

The meanings of the superscripts and subscripts in the formula above:

  • $l$: the layer in question

  • $j$: the index of the activation unit in the current layer

  • $i$: the index of the error term (in layer $l+1$)
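
Continuing the 4-layer example, a sketch of these error computations in code (sigmoid activations assumed; `a2`, `a3`, `a4` are the activation vectors from forward propagation with the bias entry in position 0, and `y` is the label vector; names are illustrative):

```python
import numpy as np

def backprop_deltas(a2, a3, a4, y, Theta2, Theta3):
    """Errors delta^(4), delta^(3), delta^(2) for the 4-layer example."""
    delta4 = a4 - y                                          # output-layer error
    # g'(z^(3)) = a^(3) * (1 - a^(3)); drop the bias entry before propagating back
    delta3 = (Theta3.T @ delta4)[1:] * a3[1:] * (1 - a3[1:])
    delta2 = (Theta2.T @ delta3)[1:] * a2[1:] * (1 - a2[1:])
    return delta2, delta3, delta4
```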

Algorithm

  • The forward propagation method is used to calculate the activation units of each layer

  • The error of the last layer is calculated by using the real result of the training set and the prediction result of the neural network

  • Finally, all the errors up to the second layer are calculated by using the back propagation method.

    The errors are accumulated across the training set:

    \Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a_j^{(l)}\delta_i^{(l+1)}

    which, after averaging, gives the partial-derivative matrices:

    D^{(l)}_{ij} = \frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}}
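
Putting the pieces together, a sketch of the algorithm that accumulates $\Delta^{(l)}$ over all $m$ training examples and divides by $m$ to obtain $D^{(l)}$ (regularization omitted; `forward` and `backprop_deltas` are the illustrative helpers sketched earlier, not functions from the original notes):

```python
import numpy as np

def backprop_gradients(X, Y, Theta1, Theta2, Theta3):
    """Accumulate Delta over all m examples; return D = Delta / m (lambda = 0)."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    Delta3 = np.zeros_like(Theta3)
    for i in range(m):
        a1, a2, a3, a4 = forward(X[i], Theta1, Theta2, Theta3)
        delta2, delta3, delta4 = backprop_deltas(a2, a3, a4, Y[i], Theta2, Theta3)
        # Delta_ij^(l) := Delta_ij^(l) + a_j^(l) * delta_i^(l+1)
        Delta1 += np.outer(delta2, a1)
        Delta2 += np.outer(delta3, a2)
        Delta3 += np.outer(delta4, a3)
    # D_ij^(l) = Delta_ij^(l) / m  (a lambda * Theta term is added when regularizing)
    return Delta1 / m, Delta2 / m, Delta3 / m
```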

Intuitive understanding of back propagation

Principle of forward propagation

  • 2 input units; 2 hidden layers (not including bias units); 1 output unit
  • The subscript indicates which unit (feature or attribute), and the superscript indicates which layer

Note: there is a small error in the figure; see the bottom-right corner of the screenshot.

According to the forward propagation method described above:


z^{(3)}_{1}=\Theta_{10}^{(2)}\times 1+\Theta_{11}^{(2)}a^{(2)}_1+\Theta_{12}^{(2)}a^{(2)}_2

Back propagation principle


Unrolling parameters

The formulas above show how backpropagation is used to compute the derivatives of the cost function. This section introduces how to unroll the parameters from matrix form into vector form.
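
A minimal NumPy sketch of unrolling three weight matrices into a single vector and reshaping them back; the shapes are purely illustrative:

```python
import numpy as np

Theta1 = np.random.rand(5, 4)     # example shapes only
Theta2 = np.random.rand(5, 6)
Theta3 = np.random.rand(4, 6)

# Unroll: matrices -> one long parameter vector (what an optimizer expects)
theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel(), Theta3.ravel()])

# Roll back: vector -> matrices, using the known shapes
t1 = theta_vec[:20].reshape(5, 4)
t2 = theta_vec[20:50].reshape(5, 6)
t3 = theta_vec[50:74].reshape(4, 6)
```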

Gradient checking

How do we approximate the derivative at a point?

How do we take the derivative of the cost function with respect to some parameter $\theta$?
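
A standard approach is the two-sided numerical approximation $\frac{J(\theta+\varepsilon)-J(\theta-\varepsilon)}{2\varepsilon}$. Below is a minimal sketch that assumes a cost function `J` taking the unrolled parameter vector; the result is compared against the gradients from backpropagation:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate dJ/dtheta_i by central differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        grad[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    return grad
```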

Summary of neural networks

The first task

When constructing a neural network, the first consideration is the network structure: how many layers, and how many neurons per layer.

  • The number of units in the first layer equals the number of features in the training set.
  • The number of units in the last layer equals the number of classes in the training set's labels.
  • If there is more than one hidden layer, make sure every hidden layer has the same number of units. In general, more hidden units is better.

Train neural network steps

  1. Randomly initialize the parameters
  2. Compute all the $h_{\theta}(x)$ values by forward propagation
  3. Write code to compute the cost function $J$
  4. Compute all the partial derivatives using backpropagation
  5. Verify these partial derivatives with numerical gradient checking
  6. Use an optimization algorithm to minimize the cost function
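
A sketch of how these steps might fit together for the 4-layer example, using random initialization and plain gradient descent; `backprop_gradients` is the illustrative helper sketched earlier, and the initialization range `0.12` is just a common small value, not a prescription from the notes:

```python
import numpy as np

def train(X, Y, layer_sizes, alpha=0.1, iters=1000):
    """Training sketch for a network with three weight matrices (4 layers)."""
    # 1. Randomly initialize the parameters in a small symmetric range
    eps_init = 0.12
    Thetas = [np.random.rand(layer_sizes[l + 1], layer_sizes[l] + 1) * 2 * eps_init - eps_init
              for l in range(len(layer_sizes) - 1)]
    for _ in range(iters):
        # 2-4. Forward propagation and backpropagation gradients
        D1, D2, D3 = backprop_gradients(X, Y, *Thetas)
        # 5. (Run numerical gradient checking once here, then turn it off.)
        # 6. Gradient-descent step to minimize the cost J
        Thetas = [T - alpha * D for T, D in zip(Thetas, (D1, D2, D3))]
    return Thetas
```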