
Machine learning starts with describing a real task in mathematical language. Starting from a simple problem, this article walks you step by step from a NumPy-based solution to a PyTorch-based one. Along the way you should come away understanding what a Tensor is (a PyTorch Tensor can be thought of as an array that can also run on the GPU) and the general steps for building and training neural networks with PyTorch. The examples and some of the content are taken from PyTorch's official documentation, which offers a good introduction to and insight into the framework.

The input data x is sampled uniformly from -3.14 to 3.14 (that is, from -π to π), and y is obtained by passing x through sin(x). We then design a model, a neural network, to approximate the sin function.

A few prerequisites

To follow this post you should know the basics of deep learning, understand Python syntax, and be able to write Python code. Familiarity with how models are defined and trained in deep learning projects, and some acquaintance with the PyTorch framework, will also help.

Implementing the network with NumPy

Before introducing PyTorch, we first implement the network using NumPy. NumPy is a general framework for scientific computing: it provides representations of vectors and matrices and operations on these data structures. However, NumPy was not designed for deep learning, so it has no notion of computation graphs, gradients, or GPU acceleration. Here we fit the sine function with a third-order polynomial, implementing the forward and backward passes of the network by hand with NumPy operations.

import numpy as np
import math
import matplotlib.pyplot as plt

def dataset() :
	x = np.linspace(-math.pi, math.pi, 2000)
	y = np.sin(x)

	return x,y

class Model:
	def __init__(self) :

		# Initialize model weights
		self.a = np.random.randn()
		self.b = np.random.randn()
		self.c = np.random.randn()
		self.d = np.random.randn()

		self.epoches = 2000
		self.learning_rate = 1e-6
	def train(self,x,y) :
		for epoch in range(self.epoches):
			y_pred = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

			# Calculate the loss
			loss = np.square(y_pred - y).sum()
			if epoch % 100 == 99:
				print(epoch,loss)
			
			grad_y_pred = 2.0 * (y_pred -y)
			
			grad_a = grad_y_pred.sum()		
			grad_b = (grad_y_pred*x).sum()		
			grad_c = (grad_y_pred*x**2).sum()		
			grad_d = (grad_y_pred*x**3).sum()		

			self.a -= self.learning_rate * grad_a
			self.b -= self.learning_rate * grad_b
			self.c -= self.learning_rate * grad_c
			self.d -= self.learning_rate * grad_d
		print(f"Result: y= {self.a} + {self.b}x + {self.c}x^2 + {self.d}x^3")	

if __name__ == "__main__":
	x,y = dataset()
#	plt.plot(x,y)
#	plt.show()

	model = Model()
	model.train(x,y)


  • Prepare the dataset
  • Define the model: choose a family of functions that is neither too complex nor too simple; here we fit with a polynomial of order 3
  • Define the loss function: it measures how far our predictions are from the true values, i.e. how close our function is to the real one, and it guides the training
  • Gradient descent: step by step along the negative gradient direction, adjust the parameters so that the function's output gets closer to the true values
99 1889.940637866053
199 1279.4185079879578
299 867.9029409104106
399 590.2627387918794
499 402.7629925812673
599 276.0117451685029
699 190.23937687341535
799 132.13673749120156
899 92.73579912145074
999 65.98802589121729
1099 47.81001788773136
1199 35.4423216183395
1299 27.018280854221068
1399 21.27387063441539
1499 17.352238178693227
1599 14.671917339753634
1699 12.83788922530275
1799 11.581499958306198
1899 10.719831814469643
1999 10.128200865575229
Result: y = 0.030062902508905868 + 0.8349093658197444x + -0.005186350931643299x^2 + -0.09022505151138818x^3

Introducing the Tensor

NumPy is a classic library for matrix operations and a fine choice for prototyping neural networks, but it cannot use a GPU to accelerate its numerical computation. For deep neural networks, training without a GPU is usually unacceptable, since GPUs often provide speedups of 50x or more, so we need to replace the arrays in the code above with Tensors.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually similar to a NumPy array: it too can be thought of as an n-dimensional array, and PyTorch provides many functions for operating on Tensors. Beyond being an array representation, however, the Tensor was designed with deep learning in mind, so it adds functionality such as tracking computation graphs and gradients.

Most importantly, and unlike NumPy, PyTorch Tensors can use a GPU to accelerate their numerical computation. To run a PyTorch Tensor on a GPU, you simply specify the appropriate device.
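For example, a minimal sketch of device selection (this snippet is illustrative and not part of the article's code; it assumes CUDA may or may not be available on your machine):

import torch

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, device=device)   # create a tensor directly on that device
y = torch.randn(3).to(device)       # or move an existing CPU tensor onto it
print(x.device, y.device)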

Here we use PyTorch Tensors to fit the sine function with a third-order polynomial. As in the NumPy example above, we manually implement the forward and backward passes of the network using Tensor operations.

import torch
import math


def dataset(device) :
	dtype = torch.float
	
	x = torch.linspace(-math.pi,math.pi,2000,device=device,dtype=dtype)
	y = torch.sin(x)
	
	return x,y

class Model:
	def __init__(self) :
		self.dtype = torch.float
		self.device = torch.device("cpu")

		
		self.a = torch.randn((), device=self.device,dtype=self.dtype)
		self.b = torch.randn((), device=self.device,dtype=self.dtype)
		self.c = torch.randn((), device=self.device,dtype=self.dtype)
		self.d = torch.randn((), device=self.device,dtype=self.dtype)

		self.learning_rate = 1e-6
		self.epoches = 2000
	def train(self,x,y) :
		for epoch in range(self.epoches): 
			y_pred = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

			# Calculate the loss
			loss = (y_pred - y).pow(2).sum().item()

			if epoch % 100 == 99:
				print(epoch,loss)

			grad_y_pred = 2.0 *  (y_pred - y)	
			grad_a = grad_y_pred.sum()		
			grad_b = (grad_y_pred*x).sum()		
			grad_c = (grad_y_pred*x**2).sum()		
			grad_d = (grad_y_pred*x**3).sum()		

			self.a -= self.learning_rate * grad_a
			self.b -= self.learning_rate * grad_b
			self.c -= self.learning_rate * grad_c
			self.d -= self.learning_rate * grad_d
		print(f"Result: y= {self.a.item()} + {self.b.item()}x + {self.c.item()}x^2 + {self.d.item()}x^3")	

if __name__ == "__main__":

	device = torch.device("cpu")

	x,y = dataset(device)


	model = Model()
	model.train(x,y)



Automatic gradient computation (Autograd)

In the two examples above, we implemented both the forward and the backward pass of the network by hand. For a simple two-layer shallow network, writing the backward pass is not difficult, but for a large, complex network it is no longer practical to implement backpropagation yourself.

Fortunately, frameworks like PyTorch and TensorFlow already provide automatic differentiation to compute the gradients of the network parameters during backpropagation. In PyTorch this functionality is provided by autograd. When using autograd, the forward pass of the network defines a computation graph: the nodes of this graph are Tensors, and the edges are the functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you compute gradients easily.

As complicated as this sounds, it is quite simple to use, and it lets developers who build networks with PyTorch focus on the architecture rather than on computing gradients for backpropagation. Each Tensor represents a node in the computation graph. For example, if x is a Tensor with x.requires_grad=True, then after backpropagation x.grad is another Tensor holding the gradient of some scalar value (the loss) with respect to x.
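A tiny self-contained sketch of this idea (the values are made up purely for illustration):

import torch

x = torch.tensor(2.0, requires_grad=True)
loss = 3 * x ** 2          # a scalar value computed from x
loss.backward()            # autograd walks the graph backwards
print(x.grad)              # tensor(12.) -- the derivative of 3x^2 at x = 2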

Here we use PyTorch Tensors and autograd for the same example of fitting the sine function with a third-order polynomial; this time we no longer need to implement the backward pass of the network by hand.

import torch
import math


def dataset(device) :
	dtype = torch.float
	
	x = torch.linspace(-math.pi,math.pi,2000,device=device,dtype=dtype)
	y = torch.sin(x)
	
	return x,y

class Model:
	def __init__(self) :
		self.dtype = torch.float
		self.device = torch.device("cpu")

		
		self.a = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)
		self.b = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)
		self.c = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)
		self.d = torch.randn((), device=self.device,dtype=self.dtype,requires_grad=True)

		self.learning_rate = 1e-6
		self.epoches = 2000

	def train(self,x,y) :
		for epoch in range(self.epoches): 
			y_pred = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

			# Calculate the loss
			loss = (y_pred - y).pow(2).sum()
			if epoch % 100 == 99:
				print(epoch,loss)


			loss.backward()
			with torch.no_grad():
				self.a -= self.learning_rate * self.a.grad
				self.b -= self.learning_rate * self.b.grad
				self.c -= self.learning_rate * self.c.grad
				self.d -= self.learning_rate * self.d.grad

				self.a.grad = None
				self.b.grad = None
				self.c.grad = None
				self.d.grad = None

			

		print(f"Result: y= {self.a.item()} + {self.b.item()}x + {self.c.item()}x^2 + {self.d.item()}x^3")	

if __name__ == "__main__":

	device = torch.device("cpu")

	x,y = dataset(device)


	model = Model()
	model.train(x,y)

  • device = torch.device("cuda:0") can be specified to run the same code on the GPU instead of the CPU
  • The dataset, i.e. the input and label variables, are now Tensors. By default a Tensor is created with requires_grad=False, which means no gradient is computed for it during the backward pass
  • Autograd handles the backward pass: calling backward() computes the gradient of the loss with respect to every Tensor in the graph that has requires_grad=True. Afterwards a.grad, b.grad, c.grad and d.grad are Tensors holding the gradients of the loss with respect to a, b, c and d respectively
  • The manual gradient-descent update of the parameters is wrapped in torch.no_grad(), because we do not want autograd to track the weight updates; a minimal sketch of this pattern follows the sample output below
399 tensor(1137.0117, grad_fn=<SumBackward0>)
799 tensor(253.1588, grad_fn=<SumBackward0>)
999 tensor(123.3137, grad_fn=<SumBackward0>)
1199 tensor(62.7157, grad_fn=<SumBackward0>)
1999 tensor(11.5835, grad_fn=<SumBackward0>)
Result: y = 0.04615207761526108 + 0.8281480669975281x + -0.007962002418935299x^2 + -0.08926331996917725x^3
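For reference, here is a small, self-contained sketch of that manual update pattern (the parameter name a is illustrative, not taken from the code above); calling grad.zero_() in place is an alternative to assigning None to the .grad attribute:

import torch

a = torch.randn((), requires_grad=True)    # a single learnable scalar parameter
loss = (2 * a - 1).pow(2)                  # some scalar loss that depends on a
loss.backward()                            # populates a.grad

learning_rate = 1e-6
with torch.no_grad():                      # updates here are not tracked by autograd
    a -= learning_rate * a.grad
    a.grad.zero_()                         # in-place alternative to `a.grad = None`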

Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors: the forward function computes output Tensors from input Tensors, and the backward function receives the gradient of the output Tensors with respect to some scalar value and computes the gradient of the input Tensors with respect to that same scalar.

import torch
import math

def dataset() :
    dtype = torch.float
    device = torch.device("cpu")

    x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
    y = torch.sin(x)

    return x,y
"" You can customize a backpropagation class that inherits' torch. Autograd. Function 'and implements' forward' and 'backward' ""
class LegendrePolynomial3(torch.autograd.Function) :

    @staticmethod
    def forward(ctx, input) :
        """ In forward propagation, we take a tensor with inputs and we return a tensor with outputs. CTX is a context object that can be used to store backcomputed information. You can use the ctx.save_for_BACKWARD method to cache arbitrary objects for use in backward-passing. "" "
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output) :
        """In the backward pass we receive a Tensor containing the gradient of the loss with respect to the output, and we need to compute the gradient of the loss with respect to the input."""
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

class Model:

    def __init__(self) :
        self.dtype = torch.float
        self.device = torch.device("cpu")

        self.a = torch.full((), 0.0, device=self.device, dtype=self.dtype, requires_grad=True)
        self.b = torch.full((), -1.0, device=self.device, dtype=self.dtype, requires_grad=True)
        self.c = torch.full((), 0.0, device=self.device, dtype=self.dtype, requires_grad=True)
        self.d = torch.full((), 0.3, device=self.device, dtype=self.dtype, requires_grad=True)

        self.learning_rate = 5e-6
        self.epoches = 2000

    def train(self,x,y) :
        for t in range(self.epoches):
            P3 = LegendrePolynomial3.apply

            y_pred = self.a + self.b * P3(self.c + self.d * x)
            loss = (y_pred - y).pow(2).sum()
            if t % 100 == 99:
                print(t, loss.item())

            # Use autograd to compute the backward pass (gradients of the loss w.r.t. tensors with requires_grad=True)
            loss.backward()

            with torch.no_grad():
                # Use gradient descent to update weights
                self.a -= self.learning_rate * self.a.grad
                self.b -= self.learning_rate * self.b.grad
                self.c -= self.learning_rate * self.c.grad
                self.d -= self.learning_rate * self.d.grad

                # Manually zero the gradients after updating the parameters
                self.a.grad = None
                self.b.grad = None
                self.c.grad = None
                self.d.grad = None 

        print(f'Result: y = {self.a.item()} + {self.b.item()} * P3({self.c.item()} + {self.d.item()} x)')

if __name__ == "__main__":
    x,y = dataset()

    model = Model()
    model.train(x,y)

In PyTorch, we can easily define our own autograd operator by writing a subclass of torch.autograd.Function that implements the forward and backward static methods. We can then use the new operator by taking its apply method (as the code does with P3 = LegendrePolynomial3.apply) and calling it like a function, passing Tensors containing the input data.

Instead of y = a + bx + cx^2 + dx^3, here we define the model as y = a + b P3(c + dx), where P3(x) = (1/2)(5x^3 - 3x) is the Legendre polynomial of degree three.
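As a quick check on the backward method above, differentiating P3 term by term gives

P3'(x) = (1/2)(15x^2 - 3) = (3/2)(5x^2 - 1)

so backward returns grad_output * 1.5 * (5 * input ** 2 - 1), which is exactly the chain rule applied to the cached input.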


Building models with PyTorch's nn module

Computational graphs and Autograd are a very powerful paradigm for defining complex operators and automating derivatives, but the original Autograd can be a little too low-level for large neural networks.

When constructing neural networks, we usually think of the computation as arranged in layers, some of which hold learnable parameters that are optimized by gradient descent during training.

In TensorFlow, packages like Keras, Tensorflow-Slim, and TFLearn provide a higher level of abstraction over the raw computational graph, making it convenient to build neural networks with these advanced methods.

In PyTorch, the nn package provides this functionality. The nn package defines a set of Modules, which are roughly equivalent to the layers of a neural network. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of commonly used loss functions for training neural networks.

In this example, we use the nn package to implement our polynomial model as a network.

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

In this example, y is a linear combination of x, x^2, and x^3, so we can model the computation with a single linear layer. To do so, each input value x is turned into the vector (x, x^2, x^3), i.e. x is raised to the powers (1, 2, 3), giving a 3-dimensional tensor (x, x^2, x^3) for each sample.

If you are not yet familiar with tensor shapes (my video covers them in more detail): unsqueeze(-1) adds a dimension, turning x from shape (2000,) into (2000, 1); p has shape (3,), so raising x to the powers in p broadcasts the result into a tensor xx of shape (2000, 3).
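A minimal sketch of these shape operations on a smaller tensor, so the result is easy to inspect (the sizes here are illustrative only):

import torch

x = torch.linspace(-1.0, 1.0, 5)     # shape (5,)
p = torch.tensor([1, 2, 3])          # shape (3,)
xx = x.unsqueeze(-1).pow(p)          # (5, 1) broadcast against (3,) -> (5, 3)
print(xx.shape)                      # torch.Size([5, 3])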

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

The model is defined as an nn.Sequential container, in which layers are stacked one after another. The first layer is a linear transformation of the input: a single neuron with three weights, one for each of the input dimensions x, x^2 and x^3, plus a bias. Flatten(0, 1) then turns the (2000, 1) output into a 1-D tensor, so the network outputs a 1-dimensional tensor matching the shape of y.
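To make that concrete, a small standalone shape check (xx here is a random stand-in with the same shape as the real input built above):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
xx = torch.randn(2000, 3)
print(model(xx).shape)   # torch.Size([2000]): Linear gives (2000, 1), Flatten(0, 1) gives (2000,)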

loss_fn = torch.nn.MSELoss(reduction='sum')

The loss function is mean squared error. This time we do not implement it ourselves; instead we use the implementation provided by PyTorch, whose nn package already contains all the mainstream loss functions.
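For illustration, a few of the ready-made losses in torch.nn (only the MSE loss is used in this article):

import torch

mse  = torch.nn.MSELoss(reduction='sum')   # summed squared error, as used here
mae  = torch.nn.L1Loss()                   # mean absolute error
xent = torch.nn.CrossEntropyLoss()         # a common choice for classification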

y_pred = model(xx)

Forward pass: passing x (here the expanded xx) into the model computes the predicted y. Because nn.Module implements Python's __call__ method, a model instance can be called as if it were a function, so feeding the input to the model runs it through the two layers and returns the prediction.
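A minimal sketch of that mechanism with a hypothetical module (not part of this article's model):

import torch

class Doubler(torch.nn.Module):
    def forward(self, x):
        return 2 * x

m = Doubler()
print(m(torch.tensor([1.0, 2.0])))   # calling the instance invokes forward: tensor([2., 4.])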

loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

Take the model's predictions and the true values as inputs to the loss function, compute the loss, and print it. Because the loss is also a tensor, use loss.item() to get its scalar value.

model.zero_grad()

Before running backpropagation, the gradients need to be zeroed, because they accumulate by default.

loss.backward()

Backward pass: this computes the gradients of the model's parameters with respect to the loss function. Calling backward() computes a gradient for every Tensor with requires_grad=True.

with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

The last step is to update the weights with the computed gradients. Each parameter is a Tensor; model.parameters() returns them, and each one is updated by subtracting the product of its gradient and the learning rate.

The complete code

# -*- coding: utf-8 -*-
import torch
import math


# Create Tensors to hold input and output
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# In this example, y is a linear combination of x, x^2, x^3, so we can model the
# computation with a single linear layer: each input x is turned into the vector
# (x, x^2, x^3), i.e. x is raised to the powers (1, 2, 3), giving a 3-dimensional
# feature tensor (x, x^2, x^3) per sample
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)


# If you are not familiar with tensor shapes (see my video), unsqueeze(-1) adds a
# dimension, turning x from shape (2000,) into (2000, 1). p has shape (3,), so
# raising x to the powers in p broadcasts xx to shape (2000, 3)


# The model is an nn.Sequential container, in which layers are stacked one after
# another. The first layer is a linear transformation of the input: one neuron with
# three weights (for x, x^2, x^3) and a bias. Flatten then turns the (2000, 1) output
# into a 1-D tensor, so the network outputs a 1-dimensional tensor matching y
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# PyTorch has implemented all the mainstream loss functions in the nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute the predicted y by passing xx to the model. Because nn.Module
    # implements __call__, a model instance can be called just like a function.
    y_pred = model(xx)

    # Compute the loss; loss is a tensor, so loss.item() gives its Python scalar value
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())


    # Zero the gradients before running the backward pass
    model.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to every Tensor with requires_grad=True
    loss.backward()


    # Update the weights with the computed gradients. Each parameter is a Tensor;
    # 'model.parameters()' returns them, and each one is updated by subtracting the
    # product of its gradient and the learning rate
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# The weights of the first layer of 'model' are the parameters we want to solve for
linear_layer = model[0]


print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

The optimizer

So far we have updated the learnable parameters (the model's weights) manually, inside torch.no_grad(). For a simple optimization algorithm like stochastic gradient descent this is not hard to implement, but in practical projects we usually train networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam. The optim package in PyTorch provides implementations of the current mainstream optimization algorithms.
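For illustration, this is how a few of those optimizers are constructed (the learning rates below are placeholders rather than tuned values, and model is the one from the previous section):

opt_sgd     = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
opt_adagrad = torch.optim.Adagrad(model.parameters(), lr=1e-2)
opt_rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3)
opt_adam    = torch.optim.Adam(model.parameters(), lr=1e-3)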

# -*- coding: utf-8 -*-
import torch
import math


# Create Tensors to hold input and output
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)


p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)


model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')



# PyTorch's optim package provides ready-made optimizers (I covered these in my PyTorch video).
# An optimizer is a strategy for applying the computed gradients to the parameters during
# training, so that training converges quickly to a minimum of the loss function. Most of
# the commonly used optimizers are already implemented in PyTorch
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    
    y_pred = model(xx)

    
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())


    # Before the backward pass, use the optimizer to zero the gradients of all the
    # variables it will update (the learnable weights of the model). This is because,
    # by default, gradients accumulate in buffers (are not overwritten) whenever
    # .backward() is called
    optimizer.zero_grad()



    # Back propagation: Calculate the gradient for each parameter of the model
    loss.backward()

    # Call the optimizer's step function to update the learnable parameters of the model
    optimizer.step()


linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')


Custom model

Most of the time, the Module to be defined is much more complex than the sequence model above. In these cases, you can define a custom Module by inheriting nn.Module.

# -*- coding: utf-8 -*-
import torch
import math


class Polynomial3(torch.nn.Module) :
    def __init__(self) :
        """ Four parameters are created in the constructor and assigned to the member variables of the class. "" "
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x) :
        """ In the forward propagation function, the tensor that takes an input, the tensor that returns an output. You can use modules defined in constructors (network structures) and any operators on tensors. "" "
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self) :
        "" just like any class in Python, you can define custom methods on the PyTorch module.
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = Polynomial3()

# Define the loss function and the optimizer
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    y_pred = model(x)

    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())


    # Zero the gradients, run backward() to compute new gradients, then update the parameters with step()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')

Control flow and weight sharing

To illustrate dynamic graphs and weight sharing, the model implemented here is a bit contrived: a polynomial whose order varies between 3 and 5. On each forward pass we randomly choose whether to add nothing, the fourth-order term, or both the fourth- and fifth-order terms on top of the third-order terms, reusing the same parameter for these extra orders. This works because PyTorch builds the computation graph dynamically during the forward pass, so Python control-flow constructs such as loops and conditionals can change the graph's structure from one iteration to the next. It also shows that it is perfectly safe to reuse the same parameter more than once when defining a computational graph.

# -*- coding: utf-8 -*-
import random
import torch
import math


class DynamicNet(torch.nn.Module) :
    def __init__(self) :

        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x) :
        """ For forward propagation of the model, we randomly select 4, 5 and then add a polynomial of order 4 or 5 to the previous third-order term. This is because PyTorch builds a dynamic graph for forward propagation. Python's control-flow operator, As we have seen here, it is perfectly safe to use the same parameter more than once when defining a computational graph. It is perfectly safe to use the same parameter more than once when defining a computational graph. "" "
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self) :
 
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()}x^5 ? '


x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

model = DynamicNet()


criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')