
Build forward propagation from scratch

To understand how forward propagation works, we will build a neural network with a simple example where the input of the neural network is (1, 1) and the corresponding output is 0.

The neural network we use has an input layer, a hidden layer, and an output layer. The hidden layer has more neurons than the input layer so that the input can be represented in a higher-dimensional space.

Compute the hidden layer node value

When forward propagation is carried out for the first time, weights must be assigned to all connections; these initial weights are drawn at random from a Gaussian distribution. However, the final weights after the neural network has been trained do not need to follow any specific distribution. Assume the initial network weights are as follows:
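For illustration, a minimal NumPy sketch of this kind of Gaussian initialization for a network with 2 inputs, 3 hidden nodes, and 1 output (the layout used in this example, ignoring biases) might look as follows; the values drawn here are random and are not the specific weights assumed in the worked example:

import numpy as np

np.random.seed(0)  # for reproducibility
w_hidden = np.random.randn(2, 3)  # input-to-hidden weights drawn from a Gaussian
w_out = np.random.randn(3, 1)     # hidden-to-output weights drawn from a Gaussian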

Next, we multiply the inputs by the weights to calculate the values of the hidden units in the hidden layer. The hidden layer node values are calculated as follows:


$$h_1 = 1 \times 0.8 + 1 \times 0.2 = 1 \\ h_2 = 1 \times 0.4 + 1 \times 0.9 = 1.3 \\ h_3 = 1 \times 0.3 + 1 \times 0.5 = 0.8$$
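These node values can be reproduced with a short NumPy sketch; the input-to-hidden weight matrix below is read off from the calculation above, and its row/column layout is an assumption:

import numpy as np

inputs = np.array([1, 1])
w_hidden = np.array([[0.8, 0.4, 0.3],   # weights from input node 1 to the hidden nodes
                     [0.2, 0.9, 0.5]])  # weights from input node 2 to the hidden nodes
pre_hidden = np.dot(inputs, w_hidden)
print(pre_hidden)  # [1.  1.3 0.8]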

The following figure shows the network diagram after calculating the node values of the hidden layer:

In the above steps, we calculated the values of the hidden nodes. For simplicity, we do not add bias terms to the nodes of the hidden layer. Next, we will pass the hidden layer values through an activation function to add nonlinearity to the output.

NOTE: If we do not apply a nonlinear activation function in the hidden layer, the neural network essentially becomes a linear mapping from input to output.
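To see why, note that two consecutive linear layers can always be merged into a single linear layer: the product of the two weight matrices acts as one combined weight matrix. A minimal sketch (with arbitrary random weights and no biases) illustrating the collapse:

import numpy as np

x = np.array([1.0, 1.0])
w1 = np.random.randn(2, 3)  # input-to-hidden weights, no activation applied
w2 = np.random.randn(3, 1)  # hidden-to-output weights

two_layers = np.dot(np.dot(x, w1), w2)  # "deep" network without nonlinearity
one_layer = np.dot(x, np.dot(w1, w2))   # single merged linear layer
print(np.allclose(two_layers, one_layer))  # True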

Apply the activation function

Activation functions can be applied in multiple layers of a network; they introduce nonlinearity, which is critical for modeling complex relationships between inputs and outputs. In our example, the Sigmoid activation function is used, defined as follows:


$$sigmoid(x)=\frac{1}{1+e^{-x}}$$

By applying the Sigmoid activation function to the hidden layer, we get the following results:


$$final\_h_1 = sigmoid(1.0) = 0.73 \\ final\_h_2 = sigmoid(1.3) = 0.79 \\ final\_h_3 = sigmoid(0.8) = 0.69$$
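These values can be reproduced with a small sketch that defines the Sigmoid function from the formula above and applies it to the hidden node values:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

pre_hidden = np.array([1.0, 1.3, 0.8])  # hidden node values before activation
hidden = sigmoid(pre_hidden)
print(hidden.round(2))  # [0.73 0.79 0.69]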

The following figure shows the node values of the hidden layer after applying the nonlinear activation function:

For more information about activation functions, see Common Activation Functions for Deep Learning.

Calculate the output layer value

Now that we have computed the hidden layer values, we will finally compute the output layer value. In the figure below, the hidden layer values are connected to the output layer with randomly initialized weight values. The output value is obtained by summing the products of the hidden layer values and the corresponding weight values:


$$output = 0.73 \times 0.3 + 0.79 \times 0.5 + 0.69 \times 0.9 = 1.235$$

Using the hidden layer value and weight value, we can get the output value of the network, as shown in the figure below:

Because the first forward propagation uses random weights, the value of the output neuron differs greatly from the target, with a difference of +1.235 (the target value is 0).
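This value can be reproduced with a short sketch; the hidden-to-output weights are read off from the calculation above:

import numpy as np

hidden = np.array([0.73, 0.79, 0.69])  # hidden node values after the Sigmoid
w_out = np.array([0.3, 0.5, 0.9])      # hidden-to-output weights from the calculation above
output = np.dot(hidden, w_out)
print(output)  # ≈ 1.235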

Calculate the loss value

The loss value (also known as the cost function) is the quantity that the neural network optimizes. To understand how the loss value is calculated, we analyze the following two cases:

  • Continuous variable prediction
  • Classification (discrete) variable prediction

Calculate the loss for continuous variable prediction

In general, when the value to be predicted is a continuous variable, the loss function is the squared error; that is, we try to minimize the mean squared error by changing the weight values of the neural network:


$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h(x_i)-y_i)^2$$

where $y_i$ is the actual value, $h(x_i)$ is the prediction of the network model for input $x_i$, and $m$ is the number of data points in the input data set.
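A minimal NumPy sketch of this loss, assuming pred and actual are arrays of predicted and actual values (the helper name mse is illustrative):

import numpy as np

def mse(pred, actual):
    # Mean of the squared differences between predictions and targets
    return np.mean(np.square(pred - actual))

print(mse(np.array([1.235]), np.array([0.0])))  # ≈ 1.525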

Calculate the loss for categorical (discrete) variable prediction

When the variable to be predicted is discrete (that is, it can only take a few categories), we usually use the categorical cross entropy loss function. When the variable to be predicted has two possible values, the loss function is binary cross entropy; when it has more than two possible values, the loss function is multi-class cross entropy.

  • The binary cross entropy formula is as follows:

$$-(y\log(p) + (1-y)\log(1-p))$$
  • Multi-class cross entropy is defined as follows:

$$-\frac{1}{n}\sum y\log(p)$$

where $y$ is the real value corresponding to the input, $p$ is the predicted output value, $n$ is the total amount of data, and the sum runs over all data points and output classes.
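Both formulas can be written as short NumPy sketches; the function names are illustrative, y holds the true labels (0/1 values, or one-hot vectors in the multi-class case), p holds the predicted probabilities, and the small eps term is added here only to avoid taking log(0):

import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)  # keep probabilities away from 0 and 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)  # keep probabilities away from 0
    # Sum over the classes, average over the n data points
    return -np.mean(np.sum(y * np.log(p), axis=1))

print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))      # ≈ 0.164
print(categorical_cross_entropy(np.array([[0, 1, 0]]),
                                np.array([[0.2, 0.7, 0.1]])))            # ≈ 0.357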

Calculate the network loss value

Since the value to be predicted in the above example is continuous, the loss function is the mean squared error, which is calculated as follows:


$$error = 1.235^2 = 1.52$$

Network forward propagation using Python

From the above learning, we know that error values can be obtained in forward propagation by performing the following steps on the input data:

  1. Randomly initialize weights
  2. Compute the hidden layer node value by multiplying the input value by the weight
  3. Perform activation on hidden layer values
  4. Connect the hidden layer values to the output layer
  5. Calculate the squared error loss

Calculate the squared error loss of all data points:

import numpy as np

def feed_forward(inputs, outputs, weights):
    # Hidden layer values: multiply the inputs by the weights and add the bias term
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    # Apply the Sigmoid activation function to the hidden layer
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # Output layer value: multiply the hidden values by the weights and add the bias term
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Squared error between the predicted output and the actual output
    squared_error = np.square(pred_out - outputs)
    return squared_error

In the previous function, we take the input variable values, the weights (randomly initialized if this is the first iteration), and the actual outputs from the dataset as inputs to the feed_forward function. We calculate the hidden layer values by matrix multiplication of the inputs and weights, and then add the bias values of the hidden layer:

pre_hidden = np.dot(inputs, weights[0]) + weights[1]

weights[0] holds the weight values and weights[1] holds the bias values used to connect the input layer to the hidden layer. After calculating the hidden layer values, we apply the activation function to them:

hidden = 1 / (1 + np.exp(-pre_hidden))

The network output is calculated by multiplying the hidden layer output by the weights that connect the hidden layer to the output layer, and then adding the bias term of the output layer:

pred_out = np.dot(hidden, weights[2]) + weights[3]

Once the output is calculated, we can calculate the squared error loss for each input as follows:

squared_error = np.square(pred_out - outputs)

In the previous code, pred_out is the predicted output, and outputs is the actual output corresponding to the inputs. With these simple steps, we can calculate the loss value as the network propagates forward.
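To tie the steps together, here is a usage sketch of the feed_forward function with the example input (1, 1), target output 0, and randomly initialized Gaussian weights; the 2-3-1 shapes and the use of Gaussian biases are assumptions made only for this illustration:

import numpy as np

np.random.seed(0)
inputs = np.array([[1, 1]])  # one sample with two input features
outputs = np.array([[0]])    # the corresponding target value

weights = [
    np.random.randn(2, 3),  # input-to-hidden weights
    np.random.randn(3),     # hidden layer bias
    np.random.randn(3, 1),  # hidden-to-output weights
    np.random.randn(1),     # output layer bias
]

print(feed_forward(inputs, outputs, weights))  # squared error for this sample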