Background

Backpropagation is a common method for training a neural network. There is no shortage of papers online describing how backpropagation works, but they rarely include an example with actual numbers. This article is my attempt to explain how it works with a concrete example, so you can compare your own calculations and make sure you understand backpropagation correctly.

Python implementation of the backpropagation algorithm

You can go to GitHub and try a Python script I wrote that implements the backpropagation algorithm.

Visualization of the backpropagation algorithm

For an interactive visualization of the neural network learning process, see my neural network visualization website.

Additional resources

If you find this tutorial useful and want to continue learning about neural networks and their applications, I highly recommend checking out Adrian Rosebrock’s excellent tutorial Getting Started with Deep Learning and Python.

An overview of the network structure

For this tutorial, we will use a neural network with 2 input neurons, 2 hidden neurons, and 2 output neurons. In addition, the hidden layer and the output layer each include a bias neuron. The basic structure is shown here:









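For reference in the calculations that follow, here is a minimal Python sketch of the inputs, initial weights, and biases assumed in this walkthrough. These specific values follow the original English tutorial linked at the end; treat them as example assumptions rather than values stated in the text above.

```python
# Assumed example parameters for the 2-2-2 network described above.
i1, i2 = 0.05, 0.10                          # inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30      # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55      # hidden -> output weights
b1, b2 = 0.35, 0.60                          # hidden-layer and output-layer biases
target_o1, target_o2 = 0.01, 0.99            # expected outputs
```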
The forward propagation

To begin, let’s see what the network currently outputs with the given weights and biases and the inputs 0.05 and 0.10. To do this, we feed these inputs forward through the network.

We calculate the total net input to each hidden layer neuron, pass that total input through an activation function (here we use the sigmoid function), and then repeat the process for the output layer neurons.

Here’s how we calculate the h1 total input:


The Sigmoid function is then used to compute the h1 output:

Similarly, h2 output:

We repeat this process with the output layer neurons, using the output of the hidden layer neurons as the input. Here is the output of o1:



Similarly, o2 output:
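Continuing the parameter sketch above, here is a minimal forward pass in Python; the commented numbers are what those assumed weights produce, and the o1 value matches the output quoted below.

```python
import math

def sigmoid(x):
    # The logistic activation used throughout this example.
    return 1.0 / (1.0 + math.exp(-x))

# Hidden layer: total net input, then the sigmoid activation.
net_h1 = w1 * i1 + w2 * i2 + b1
net_h2 = w3 * i1 + w4 * i2 + b1
out_h1 = sigmoid(net_h1)   # ~0.593269992
out_h2 = sigmoid(net_h2)   # ~0.596884378

# Output layer: the hidden outputs become the inputs.
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o1 = sigmoid(net_o1)   # ~0.75136507
out_o2 = sigmoid(net_o2)   # ~0.772928465
```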

Calculate total error

We can now calculate the error of each output neuron using the squared error function; summing them gives the total error:

For example, the expected output of o1 is 0.01, but the actual output is 0.75136507, so its error is:

Repeating this process for o2 (whose expected output is 0.99) gives its error:

Therefore, the total error of the neural network is:
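Plugging in the numbers (the actual output of o2, about 0.772928465, comes from the forward-pass sketch above and therefore rests on the assumed example weights):

E_o1 = ½ × (0.01 - 0.75136507)² ≈ 0.274811083
E_o2 = ½ × (0.99 - 0.772928465)² ≈ 0.023560026
E_total = E_o1 + E_o2 ≈ 0.298371109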

Back propagation process

The goal of backpropagation is to update the weights of the connections so that the actual output of each neuron is closer to the expected output, thereby reducing the error of each neuron and the entire network.

Output layer

Consider ω5. We want to know how much a change in ω5 affects the total error, that is, the partial derivative of E_total with respect to ω5.

Applying the chain rule:

∂E_total/∂ω5 = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂ω5

We need to work out each factor in this equation.

First, how much does the total error change with respect to the output of o1? When we take this partial derivative, the term for o2 becomes zero, because out_o1 does not affect o2’s error:

∂E_total/∂out_o1 = -(target_o1 - out_o1)
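Using the values from the forward pass (expected output 0.01, actual output 0.75136507):

∂E_total/∂out_o1 = -(0.01 - 0.75136507) = 0.74136507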

Next, how much does the change in o1’s total input affect o1’s output?
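Because out_o1 is the sigmoid of o1’s total net input, this is just the derivative of the sigmoid evaluated at the output we already computed:

∂out_o1/∂net_o1 = out_o1 × (1 - out_o1) = 0.75136507 × (1 - 0.75136507) ≈ 0.186815602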


Finally, how much does a change in ω5 affect the total net input to o1?
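Since the total net input to o1 is a weighted sum in which ω5 multiplies out_h1, this is simply the output of h1 (≈ 0.593269992 under the example weights assumed in the sketch above):

∂net_o1/∂ω5 = out_h1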


Put all three together:
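With the numbers above (the last factor comes from the assumed example weights):

∂E_total/∂ω5 ≈ 0.74136507 × 0.186815602 × 0.593269992 ≈ 0.082167041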


We can also combine this calculation into the form of the delta rule, in which the weight correction is equal to the error term times the input:

∂E_total/∂ω5 = -(target_o1 - out_o1) × out_o1(1 - out_o1) × out_h1    (1)

Let

δ_o1 = ∂E_total/∂net_o1 = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1    (2)

Because

∂net_o1/∂ω5 = out_h1

it follows that

∂E_total/∂ω5 = δ_o1 × out_h1    (3)

Combining (1), (2), and (3) gives the same value for ∂E_total/∂ω5 as the chain-rule calculation above.


To reduce the error, we subtract this value, multiplied by the learning rate, from the current weight (the learning rate is customizable; here we set it to 0.5):
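With the example numbers (the current value ω5 = 0.40 is an assumption carried over from the sketch above):

ω5_new = ω5 - 0.5 × ∂E_total/∂ω5 ≈ 0.40 - 0.5 × 0.082167041 ≈ 0.35891648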

Repeating this process gives the new values of ω6, ω7, and ω8:



We only apply the new ω5, ω6, ω7, and ω8 values after we have computed the new weights leading into the hidden layer neurons (that is, we keep using the original weights while we continue the backpropagation below).
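Putting the output-layer step into code, here is a minimal sketch that continues the Python snippets above (again using the assumed example values; it is an illustration, not the GitHub script mentioned earlier):

```python
LEARNING_RATE = 0.5

# Node deltas for the output neurons: delta = dE_total/dnet (the delta rule).
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Gradient of the total error for each hidden -> output weight:
# node delta times the input flowing through that weight.
grad_w5 = delta_o1 * out_h1   # ~0.082167041 with the assumed weights
grad_w6 = delta_o1 * out_h2
grad_w7 = delta_o2 * out_h1
grad_w8 = delta_o2 * out_h2

# New weights. They are only applied after the hidden-layer gradients
# below have been computed; the backward pass keeps using the old w5..w8.
new_w5 = w5 - LEARNING_RATE * grad_w5   # ~0.35891648
new_w6 = w6 - LEARNING_RATE * grad_w6
new_w7 = w7 - LEARNING_RATE * grad_w7
new_w8 = w8 - LEARNING_RATE * grad_w8
```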

Hidden layer

Next, we continue working backwards and compute new values for ω1, ω2, ω3, and ω4. In the big picture, we need to calculate:

∂E_total/∂ω1 = ∂E_total/∂out_h1 × ∂out_h1/∂net_h1 × ∂net_h1/∂ω1

Note that out_h1 affects both out_o1 and out_o2, so ∂E_total/∂out_h1 has to take its effect on both output neurons into account.

Starting with the first term, ∂E_o1/∂out_h1:

∂E_o1/∂out_h1 = ∂E_o1/∂net_o1 × ∂net_o1/∂out_h1

The first factor, ∂E_o1/∂net_o1, can be computed from values we already have (it is the node delta δ_o1 from the output-layer section). Then, since

net_o1 = ω5 × out_h1 + ω6 × out_h2 + b2

the second factor is simply ω5. Plugging in both factors:

∂E_o1/∂out_h1 = δ_o1 × ω5

In the same way we can compute ∂E_o2/∂out_h1. As a result,

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

Now that we have ∂E_total/∂out_h1 figured out, we next calculate how much the output of h1 changes with respect to its total net input, which is again the derivative of the sigmoid:

∂out_h1/∂net_h1 = out_h1 × (1 - out_h1)



Next we compute the partial derivative of h1’s total input with respect to ω1:


Putting it all together:


You could also write this in the delta-rule form introduced earlier:



Now we can update ω1:

Repeating this process, we can calculate the new values of ω2, ω3, and ω4:
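As before, here is a minimal Python sketch of the hidden-layer step, continuing the snippets above (assumed example values; bias updates are omitted, as in the walkthrough itself):

```python
# How much each hidden output affects the total error: sum the contributions
# it makes to both output neurons (through w5/w7 for h1, and w6/w8 for h2).
d_err_out_h1 = delta_o1 * w5 + delta_o2 * w7
d_err_out_h2 = delta_o1 * w6 + delta_o2 * w8

# Node deltas for the hidden neurons (multiply by the sigmoid derivative).
delta_h1 = d_err_out_h1 * out_h1 * (1 - out_h1)
delta_h2 = d_err_out_h2 * out_h2 * (1 - out_h2)

# Gradients and updates for the input -> hidden weights:
# node delta times the input flowing through each weight.
new_w1 = w1 - LEARNING_RATE * delta_h1 * i1   # ~0.149780716 with the assumed weights
new_w2 = w2 - LEARNING_RATE * delta_h1 * i2
new_w3 = w3 - LEARNING_RATE * delta_h2 * i1
new_w4 = w4 - LEARNING_RATE * delta_h2 * i2
```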



Finally, we have updated all the weights! With the original inputs of 0.05 and 0.1, the error on the network was 0.298371109. After this first round of backpropagation, the total error is down to 0.291027924. That may not seem like much of an improvement, but after the process is repeated 10,000 times the error drops to about 0.000035085. At that point, when we feed in 0.05 and 0.1, the two output neurons output 0.015912196 (vs. the expected 0.01) and 0.984065734 (vs. the expected 0.99), respectively.

If you have made it this far and found any mistakes, or can think of a simpler explanation, please reach out to me through my public account Jinkey-love.

Link to the original English article