An overview


I’ve covered a lot of theory, so now it’s time to get practical. Let’s build a neural network from scratch and train it, to tie everything together.

To keep things intuitive and easy to follow, we stick to two principles:

  1. No third-party libraries, so the logic stays plain;
  2. No performance optimizations, to avoid introducing extra concepts and techniques that add complexity.

The data set


First, we need a dataset. To make visualization easy, we use a function of two variables as the target function and generate the dataset by sampling it. Note: in real engineering projects the objective function is unknown, but it can still be sampled.

Fictitious objective function


$$o(x, y) = \begin{cases} 1 & x^2 + y^2 < 1 \\ 0 & \text{otherwise} \end{cases}$$

The code is as follows:

def o(x, y):
    return 1.0 if x*x + y*y < 1 else 0.0

Generating the dataset

sample_density = 10
xs = [
    [-2.0 + 4 * x/sample_density, -2.0 + 4 * y/sample_density]
    for x in range(sample_density+1)
    for y in range(sample_density+1)
]
dataset = [
    (x, y, o(x, y))
    for x, y in xs
]

The generated data looks like: (-2.0, -2.0, 0.0), (-2.0, -1.6, 0.0), …
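As a quick sanity check (not part of the original code), we can count how many samples ended up inside the unit circle:

# Sanity check: 11 x 11 = 121 grid points in total; only a minority are labeled 1.
num_positive = sum(1 for _, _, label in dataset if label == 1.0)
print(len(dataset), num_positive)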

(Figure: the sampled dataset plotted in the plane; points inside the unit circle are labeled 1, the rest 0.)

Constructing the neural network

The activation function

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))
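For reference, sigmoid here is the standard logistic function, which squashes any real input into the interval $(0, 1)$:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma(0) = 0.5$$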

Neurons

from random import seed, random

seed(0)

class Neuron:
    def __init__(self, num_inputs):
        self.weights = [random()-0.5 for _ in range(num_inputs)]
        self.bias = 0.0

    def forward(self, inputs):
        # z = wx + b
        z = sum([
            i * w
            for i, w in zip(inputs, self.weights)
        ]) + self.bias
        return sigmoid(z)

The neuron’s expression is $\text{sigmoid}(\mathbf{w}\mathbf{x} + b)$, where:

  • $\mathbf{w}$: a vector, corresponding to the weights array in the code
  • $b$: corresponds to bias in the code

Note: the neuron’s parameters are initialized randomly. To keep the experiment reproducible, the random seed is fixed with seed(0).
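To make the expression concrete, here is a small check (not part of the original code) that forward matches computing $\text{sigmoid}(\mathbf{w}\mathbf{x} + b)$ by hand; the weight and input values below are made up purely for illustration:

# Illustration only: set the parameters by hand and compare with forward().
n = Neuron(2)
n.weights = [0.5, -0.25]  # hypothetical values
n.bias = 0.1

x = [1.0, 2.0]
z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.1  # wx + b = 0.1
print(n.forward(x), sigmoid(z))  # both print sigmoid(0.1), roughly 0.525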

The neural network

class MyNet:
    def __init__(self, num_inputs, hidden_shapes):
        layer_shapes = hidden_shapes + [1]
        input_shapes = [num_inputs] + hidden_shapes
        self.layers = [
            [
                Neuron(pre_layer_size)
                for _ in range(layer_size)
            ]
            for layer_size, pre_layer_size in zip(layer_shapes, input_shapes)
        ]

    def forward(self, inputs):
        for layer in self.layers:
            inputs = [
                neuron.forward(inputs)
                for neuron in layer
            ]
        # return the output of the last neuron
        return inputs[0]


Construct a neural network as follows:

net = MyNet(2, [4])

At this point we have a neural network, net, and we can evaluate the function it represents:

print(net.forward([0.0, 0.0]))

The output is about 0.55…, which is just the value produced by the still-untrained network.
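As a small sketch (not in the original code), we can inspect the structure just built: MyNet(2, [4]) creates one hidden layer of 4 neurons, each taking the 2 inputs, followed by an output layer with a single neuron taking the 4 hidden outputs:

# Inspect the layer structure: [hidden layer, output layer].
for i, layer in enumerate(net.layers):
    print(f"layer {i}: {len(layer)} neuron(s), {len(layer[0].weights)} weight(s) each")
# layer 0: 4 neuron(s), 2 weight(s) each
# layer 1: 1 neuron(s), 4 weight(s) each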

Training the neural network

Loss function

First define a loss function:

def square_loss(predict, target):
    return (predict-target)**2
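The training step later (one_step) averages this per-sample loss over all $N$ samples, so the quantity actually minimized is the mean squared error over predictions $\hat{y}_i$ and targets $y_i$:

$$L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$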

Computing the gradient

Computing the gradient is fairly involved, especially for deep neural networks. Backpropagation is the algorithm designed specifically for computing the gradients of a neural network.

Because of that complexity, the derivation is not expanded here; if you are interested, you can work through the detailed code below. Modern deep learning frameworks also compute gradients automatically.
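Although the full derivation is skipped, the key step for a single neuron is just the chain rule. Writing $a = \text{sigmoid}(z)$ and $z = \mathbf{w}\mathbf{x} + b$, the quantities computed in the backward code further down are:

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\,\sigma'(z), \qquad \frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial z}\,x_i, \qquad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}, \qquad \frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial z}\,w_i$$

These correspond to d_loss_z, d_weights, d_bias and the returned list in Neuron.backward, respectively.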

Define the derivative functions:

def sigmoid_derivative(x):
    _output = sigmoid(x)
    return _output * (1 - _output)

def square_loss_derivative(predict, target):
    return 2 * (predict-target)
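The closed forms used above follow directly from differentiating the two functions:

$$\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)\bigl(1 - \sigma(x)\bigr), \qquad \frac{d}{d\hat{y}}(\hat{y} - y)^2 = 2\,(\hat{y} - y)$$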

Partial derivatives (the forward function now caches some intermediate values to make the derivation easier):

class Neuron:
    ...

    def forward(self, inputs):
        self.inputs_cache = inputs

        # z = wx + b
        self.z_cache = sum([
            i * w
            for i, w in zip(inputs, self.weights)
        ]) + self.bias
        return sigmoid(self.z_cache)

    def zero_grad(self):
        self.d_weights = [0.0 for w in self.weights]
        self.d_bias = 0.0

    def backward(self, d_a):
        # d_a is dL/da, the derivative of the loss w.r.t. this neuron's output
        d_loss_z = d_a * sigmoid_derivative(self.z_cache)  # dL/dz = dL/da * da/dz
        self.d_bias += d_loss_z  # dL/db = dL/dz
        for i in range(len(self.inputs_cache)):
            self.d_weights[i] += d_loss_z * self.inputs_cache[i]  # dL/dw_i = dL/dz * x_i
        return [d_loss_z * w for w in self.weights]  # dL/dx_i, passed back to the previous layer

class MyNet:
    ...

    def zero_grad(self):
        for layer in self.layers:
            for neuron in layer:
                neuron.zero_grad()

    def backward(self, d_loss):
        # start from dL/d(output) and propagate backwards through the layers
        d_as = [d_loss]
        for layer in reversed(self.layers):
            da_list = [
                neuron.backward(d_a)
                for neuron, d_a in zip(layer, d_as)
            ]
            # each input's derivative sums the contributions from all neurons in this layer
            d_as = [sum(da) for da in zip(*da_list)]
  • The partial derivatives are stored in d_weights and d_bias, respectively
  • The zero_grad function clears the gradients, i.e. resets the stored partial derivatives to zero
  • The backward function computes the partial derivatives and accumulates their values
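A common way to verify a hand-written backward pass (not part of the original code) is a finite-difference gradient check: nudge one weight slightly, re-evaluate the loss, and compare the resulting slope with the stored partial derivative. A minimal sketch, assuming net was built from the extended Neuron and MyNet classes above:

# Finite-difference check for a single weight (illustration only).
eps = 1e-6
x, y, target = dataset[0]
neuron = net.layers[0][0]  # an arbitrary neuron; we check its first weight

# Analytic gradient via backward().
net.zero_grad()
predict = net.forward([x, y])
net.backward(square_loss_derivative(predict, target))
analytic = neuron.d_weights[0]

# Numeric gradient via a central difference on the same weight.
neuron.weights[0] += eps
loss_plus = square_loss(net.forward([x, y]), target)
neuron.weights[0] -= 2 * eps
loss_minus = square_loss(net.forward([x, y]), target)
neuron.weights[0] += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely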

Updating the parameters

Update the parameters with gradient descent:

class Neuron:
    ...

    def update_params(self, learning_rate):
        self.bias -= learning_rate * self.d_bias
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * self.d_weights[i]

class MyNet:
    ...

    def update_params(self, learning_rate):
        for layer in self.layers:
            for neuron in layer:
                neuron.update_params(learning_rate)
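In equation form, update_params performs the standard gradient descent step, where $\eta$ is the learning rate:

$$w_i \leftarrow w_i - \eta\,\frac{\partial L}{\partial w_i}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$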

Implementing the training loop

def one_step(learning_rate):
    net.zero_grad()

    loss = 0.0
    num_samples = len(dataset)
    for x, y, z in dataset:
        predict = net.forward([x, y])
        loss += square_loss(predict, z)

        # accumulate the gradient of the mean loss over the whole dataset
        net.backward(square_loss_derivative(predict, z) / num_samples)

    net.update_params(learning_rate)
    return loss / num_samples

def train(epoch, learning_rate):
    for i in range(epoch):
        loss = one_step(learning_rate)
        if i == 0 or (i+1) % 100 == 0:
            print(f"{i+1} {loss:.4f}")

Training for 2000 steps:

train(2000, learning_rate=10)

Note: a relatively large learning rate is used here, which suits this small problem. In real projects the learning rate is usually much smaller.
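After training, the network should roughly reproduce the target function. A quick check (not in the original code) on one point inside and one point outside the unit circle:

# Evaluate the trained network on two representative points.
print(net.forward([0.0, 0.0]))  # inside the unit circle: should be close to 1
print(net.forward([1.5, 1.5]))  # outside the unit circle: should be close to 0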

Conclusion

The steps of this hands-on exercise were:

  1. Construct a fictitious objective function $o(x, y)$;
  2. Generate a dataset by sampling $o(x, y)$; the dataset defines the function $d(x, y)$;
  3. Build a fully connected neural network with one hidden layer, which represents the function $f(x, y)$;
  4. Train the neural network with gradient descent so that $f(x, y)$ approximates $d(x, y)$.

The most complicated part is computing the gradient, which relies on the backpropagation algorithm. In real projects, mainstream deep learning frameworks compute gradients automatically, which removes this code and lowers the barrier to entry.

In the lab’s “3D classification” experiment, the second dataset is very similar to the one built here; you can try it out hands-on.

Reference software

For more content and an interactive version, please refer to the app:

Neural networks and deep learning

Download from the App Store, Mac App Store, Google Play.