• How to build your own Neural Network from Scratch in Python
  • By James Loy
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: JackEggie
  • Proofreader: LSvih, Xionglong58

A guide for beginners to understand the inner workings of deep neural networks

Motivation for writing: In order to better understand deep learning myself, I decided to build a neural network from scratch without a deep learning library like TensorFlow. I believe that understanding the inner workings of neural networks is important for any aspiring data scientist.

This article contains some of the things I’ve learned that I hope will be useful to you.

What is a neural network?

Most articles on neural networks describe them by analogy with the brain. Without delving into parallels with the brain, I find it easier to understand a neural network simply by describing it as a mathematical function that maps a given input to a desired output.

The neural network consists of the following parts:

  • An input layer, x
  • Any number of hidden layers
  • An output layer, ŷ
  • A set of weights and biases between the layers, W and b
  • A choice of activation function for each hidden layer, σ. In this tutorial, we will use the Sigmoid activation function.

The diagram below shows the architecture of a 2-layer neural network (note that the input layer is usually excluded when counting the number of layers in a neural network).

Architecture of a 2-layer neural network

Creating a neural network class in Python is simple.

import numpy as np

class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1], 4)  # weights between the input layer and the hidden layer
        self.weights2   = np.random.rand(4, 1)                    # weights between the hidden layer and the output layer
        self.y          = y
        self.output     = np.zeros(y.shape)

Training the neural network

The output ŷ of a simple 2-layer neural network is as follows:
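In symbols, writing σ for the activation function and using the weights and biases defined above, this is:

ŷ = σ(W₂ · σ(W₁ · x + b₁) + b₂)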

You may have noticed that in the equation above, the weights W and the biases b are the only variables that affect the output ŷ.

Naturally, the right values for the weights and biases determine the accuracy of the predictions. The process of fine-tuning the weights and biases from the input data is known as training the neural network.

Each iteration of the training process includes the following steps:

  • Calculating the predicted output ŷ, known as feedforward
  • Updating the weights and biases, known as backpropagation

The sequence diagram below illustrates this process.

Feedforward process

As we can see in the sequence diagram above, feedforward is just a simple computational process, and for a basic 2-layer neural network, its output is:
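ŷ = σ(W₂ · σ(W₁ · x + b₁) + b₂)

With the biases taken to be 0, as the code below assumes, this reduces to ŷ = σ(W₂ · σ(W₁ · x)).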

Let’s add a feedforward function to our Python code to do exactly this. Note that, for the sake of simplicity, we assume the biases to be 0.
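The code in this article also calls two helper functions, sigmoid and sigmoid_derivative, without showing them. A minimal sketch of these helpers, consistent with how they are used below, looks like this (note that sigmoid_derivative takes the already-activated value as its input):

import numpy as np

def sigmoid(x):
    # Sigmoid activation: squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of the sigmoid, written in terms of the sigmoid's output x = sigmoid(z)
    return x * (1.0 - x)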

class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1], 4)  # weights between the input layer and the hidden layer
        self.weights2   = np.random.rand(4, 1)                    # weights between the hidden layer and the output layer
        self.y          = y
        self.output     = np.zeros(self.y.shape)

    def feedforward(self):
        # Propagate the input through the hidden layer and then the output layer
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

However, we still need a way to evaluate the "goodness" of our predictions (that is, how far off they are). The loss function lets us do exactly that.

Loss function

There are many loss functions available, and our choice of loss function should be determined by the nature of the problem itself. In this tutorial, we will use a simple sum of squares error as our loss function.
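For a set of predictions ŷ and actual values y, the sum of squares error is:

Sum-of-Squares Error = Σ (y − ŷ)²

where the sum runs over all training examples.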

That is, the sum of squares error is simply the sum of the squared differences between each predicted value and the actual value. We square each difference so that we measure the absolute magnitude of the error.

The goal of training is to find the set of weights and biases that minimizes the loss function.

Backpropagation

Now that we have the predicted error (loss), we need to find a way to propagate the error back and update our weights and biases.

To figure out the appropriate amount to adjust the weights and biases by, we need to calculate the derivative of the loss function with respect to the weights and biases.

Recall from calculus that the derivative of a function is simply the slope of that function.

Gradient descent algorithm

Once we have the derivative, we can update the weights and biases by increasing or decreasing them accordingly (refer to the diagram above). This is known as gradient descent.
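Written out, a standard gradient descent step on a weight matrix W is:

W ← W − η · ∂Loss/∂W

where η is the learning rate. (The code later in this article folds the sign into the derivative it computes and effectively uses η = 1, which is why it updates the weights with +=.)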

However, we cannot directly calculate the derivative of the loss function with respect to the weights and biases, because the equation of the loss function does not contain them. We therefore need the chain rule to help us calculate it.

We use the chain rule to find the derivative of the loss function with respect to the weights. Note that, for the sake of simplicity, we only show the partial derivative assuming a 1-layer neural network.
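Sketching that chain of derivatives, with ŷ = σ(z) and z = W·x + b:

∂Loss(y, ŷ)/∂W = ∂Loss/∂ŷ · ∂ŷ/∂z · ∂z/∂W
               = 2(y − ŷ) · σ′(z) · x

(up to the overall sign, which is absorbed into the += update in the code below). The backprop code computes exactly this pattern for each layer.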

Phew! That was ugly, but it gives us what we need: the derivative of the loss function with respect to the weights, so that we can adjust the weights accordingly.

Now that we know how to do this, let’s add the backpropagation function to our Python code.

class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1], 4)  # weights between the input layer and the hidden layer
        self.weights2   = np.random.rand(4, 1)                    # weights between the hidden layer and the output layer
        self.y          = y
        self.output     = np.zeros(self.y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # Apply the chain rule to find the derivative of the loss function
        # with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # Update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2

If you need a deeper understanding of calculus and the chain rule’s application to back propagation, I highly recommend 3Blue1Brown’s tutorial.


Putting it all together

Now that we have the full Python code for feedforward and backpropagation, let’s apply the neural network to an example and see how it looks.
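The original article trains on a small toy dataset along these lines (each row of X is one training example with three features, and the label y is the XOR of the first two features):

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])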

Our neural network should learn the ideal set of weights to represent this function. Note that it isn’t exactly trivial for us to work out the weights just by inspection alone.

Let’s run 1,500 training iterations of the neural network and see what happens. Looking at the loss changes for each iteration in the figure below, we can clearly see that the loss monotonically decreases to a minimum. This is consistent with the gradient descent algorithm we discussed earlier.
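A minimal sketch of that training loop, using the class and toy dataset above (recording the loss on each iteration is an addition here, so that it can be plotted):

nn = NeuralNetwork(X, y)
loss_history = []

for i in range(1500):
    nn.feedforward()
    nn.backprop()
    # Sum-of-squares error for this iteration
    loss_history.append(np.sum((y - nn.output) ** 2))

print(nn.output)  # predictions after 1,500 training iterations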

Let’s take a look at the final prediction (output) of the neural network after 1500 iterations.

Prediction results after 1500 training iterations

We did it! Our feedforward and backpropagation algorithm trained the neural network successfully, and the predictions converged to the true values.

Note that there is a slight difference between the predictions and the actual values. This is desirable, because it prevents overfitting and allows the neural network to generalize better to unseen data.

Follow-up learning tasks

Fortunately, our learning journey is not over yet. There is still a lot to learn about neural networks and deep learning. For example:

  • What activation functions can we use besides the Sigmoid function?
  • Using a learning rate when training the neural network
  • Using convolutions for image classification tasks

I’ll be writing more on these topics, so be sure to follow me on Medium for updates!

Conclusion

I certainly learned a lot by writing my own neural network from scratch.

While deep learning libraries such as TensorFlow and Keras make it possible to build deep neural networks without fully understanding their inner workings, I find that a deeper understanding of neural networks is very beneficial for any aspiring data scientist.

This exercise took a lot of my time, and I hope it was helpful for you too!

If you find any mistakes in this translation or other areas that need improvement, you are welcome to revise it and open a PR in the Nuggets Translation Project, for which you can earn the corresponding reward points. The permanent link at the beginning of this article is the MarkDown link to this article on GitHub.


The Nuggets Translation Project is a community that translates high-quality technical articles from around the Internet and shares them on Juejin (Nuggets). The content covers Android, iOS, front-end, back-end, blockchain, product, design, artificial intelligence and other fields. For more high-quality translations, please follow the Nuggets Translation Project and its official Weibo and Zhihu column.