
In PyTorch, Tensors and Autograd (automatic differentiation) can be used to compute the backward pass of a neural network automatically. With autograd, the forward pass defines a computational graph whose nodes are tensors and whose edges are the functions that produce output tensors from input tensors. Gradients can then be obtained easily by backpropagating through this graph.

Although it sounds complicated, it is simple to use. Each tensor represents a node in the computational graph. If x is a tensor with x.requires_grad=True set, then x.grad is another tensor holding the gradient of some scalar with respect to x.
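As a tiny warm-up (a minimal sketch of my own, not one of the examples below): when a scalar is computed from x and .backward() is called on it, x.grad gets filled with the gradient.

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
z = (x ** 2).sum()   # a scalar built from x
z.backward()         # compute dz/dx = 2x
print(x.grad)        # tensor([2., 4., 6.])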

1. .requires_grad=True

By default a Tensor's requires_grad attribute is False, so its gradient is not kept. Once x.requires_grad=True is set, operations on x are tracked, and during backpropagation the gradient is saved in the .grad attribute.

import torch

# default
x = torch.tensor([1.0, 2.0, 3.0])
print(x)        # tensor([1., 2., 3.])
x += 1
print(x)        # tensor([2., 3., 4.])

# set requires_grad = True
y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(y)        # tensor([1., 2., 3.], requires_grad=True)
y = y + 1.0
print(y)        # tensor([2., 3., 4.], grad_fn=<AddBackward0>)
print(y.grad)   # None, plus a UserWarning about accessing .grad of a non-leaf tensor

Note:

  • A leaf tensor with requires_grad=True cannot be updated in place (e.g. with +=); doing so raises the error "a leaf Variable that requires grad is being used in an in-place operation" (see the sketch after these notes).
  • The last line of the code triggers a UserWarning about accessing .grad on a non-leaf tensor. I am printing grad here on purpose, so don't worry about it.
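A minimal sketch of the in-place issue mentioned above (my own illustration, not part of the example):

import torch

y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# y += 1    # RuntimeError: a leaf Variable that requires grad is being used
#           # in an in-place operation.
y = y + 1.0            # out-of-place works; y is now a non-leaf tensor
print(y)               # tensor([2., 3., 4.], grad_fn=<AddBackward0>)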

2. .grad_fn

With requires_grad=True set, operations on the tensor are tracked: the first print above shows requires_grad=True, and the second shows grad_fn=<AddBackward0>. Once tracking is enabled, the grad_fn attribute records the operation that produced the tensor; since we computed y = y + 1.0 above, the recorded operation is an addition. Only non-leaf nodes carry this attribute, while leaf nodes show None.
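These grad_fn objects link together into the backward graph. A small sketch of my own (it peeks at next_functions, which is an internal attribute rather than a stable public API):

import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = (a + 1.0) * 2.0

print(a.grad_fn)                 # None: a is a leaf node
print(b.grad_fn)                 # <MulBackward0 object at ...>
print(b.grad_fn.next_functions)  # includes the AddBackward0 node that produced
                                 # the multiplication's input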

3. with torch.no_grad()

If you don't want operations to be tracked, wrap them in with torch.no_grad():. Code inside this block is not recorded for gradient computation.

import torch

# set requires_grad = True
y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(y)			# 1 tensor([1., 2., 3.], requires_grad=True)
y = y + 1.0
print(y)			# 2 tensor([2., 3., 4.], grad_fn=<AddBackward0>)
y = y*1.2
print(y)			# 3 tensor([2.4000, 3.6000, 4.8000], grad_fn=<MulBackward0>)
print(y.grad_fn)	# 4 <MulBackward0 object at 0x000001FAA3B511C0>

with torch.no_grad():
    y = y - 2
print(y)			# 5 tensor([0.4000, 1.6000, 2.8000])
print(y.grad_fn)	# 6 None
  • In the third output, grad_fn shows that a multiplication operation was traced.
  • The fourth output prints grad_fn separately; again, the multiplication is tracked.
  • The fifth output comes after with torch.no_grad(): is used; the printed tensor no longer shows a grad_fn attribute.
  • The sixth output prints grad_fn separately and gets None, i.e. the subtraction performed under with torch.no_grad(): was not traced.
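Beyond wrapping a single statement, torch.no_grad() is typically used around whole inference/evaluation code, and it also works as a decorator. A minimal sketch (the evaluate function and the Linear model are just stand-ins I made up for illustration):

import torch

@torch.no_grad()                   # nothing inside is tracked by autograd
def evaluate(model, x):
    return model(x)                # the output has no grad_fn

model = torch.nn.Linear(3, 1)      # stand-in model for the sketch
out = evaluate(model, torch.randn(4, 3))
print(out.requires_grad)           # False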

4. .grad

The y in the examples above ends up as a non-leaf node, so nothing is stored in its .grad. So let's try it on something closer to a neural network: we use a third-order polynomial y = a + bx + cx^2 + dx^3 to fit sin(x), which is just a very simple model with four scalar weights. The code is as follows:

import torch
import math

dtype = torch.float
device = torch.device("cpu")
# Uncomment the line below to run on the GPU
# device = torch.device("cuda:0")

# Create Tensors to hold input and outputs.
# By default requires_grad=False, indicating that we do not need to compute
# gradients for these tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# requires_grad=True indicates that we want the gradients of these tensors to be
# kept during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # loss is a scalar Tensor; loss.item() retrieves the Python value it holds.
    loss = (y_pred - y).pow(2).sum()

    # This call computes the gradients of loss with respect to all tensors that
    # have requires_grad=True; the values are stored in their .grad attributes.
    loss.backward()

    if t == 500:
        print(a.grad)				# tensor(-905.9598)
        print(a.grad_fn)			# None
        print(a.is_leaf)			# True
        print(y_pred.grad)			# None
        print(y_pred.grad_fn)		# <AddBackward0 object at 0x0000022B7AFD2190>
        print(y_pred.is_leaf)		# False

    # Manually update weights
    # Wrap in torch.no_grad() because the weights have requires_grad=True, but we
    # don't want these update operations to be recorded by autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually clear the gradients after updating the weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None
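As an aside, the manual weight update and the a.grad = None clearing at the end of the loop are exactly what torch.optim normally does for you. A minimal sketch of the equivalent pattern (my own, not part of the original example), fitting just a + b*x for brevity:

import torch

x = torch.linspace(-1, 1, 100)
y = torch.sin(x)
a = torch.randn((), requires_grad=True)
b = torch.randn((), requires_grad=True)
optimizer = torch.optim.SGD([a, b], lr=1e-3)

for t in range(100):
    loss = ((a + b * x) - y).pow(2).sum()
    optimizer.zero_grad()   # replaces the manual a.grad = None clearing
    loss.backward()
    optimizer.step()        # replaces the manual a -= learning_rate * a.grad update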

I printed out the values above during one pass of the loop. is_leaf verifies whether a tensor is a leaf node: a is a leaf node, so the stored a.grad can be seen, but its grad_fn is None; y_pred is not a leaf node, so printing y_pred.grad only gives a warning (and None), while its grad_fn can be seen. In other words, grad_fn is None on leaf nodes, and only on result nodes does grad_fn tell you which kind of gradient function produced them.
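If you really need the gradient of a non-leaf tensor such as y_pred, you can ask autograd to keep it with retain_grad(). A brief sketch of my own (not in the code above):

import torch

a = torch.tensor(2.0, requires_grad=True)
y = a * 3.0          # non-leaf node
y.retain_grad()      # keep y.grad in addition to a.grad
loss = y ** 2
loss.backward()
print(y.grad)        # tensor(12.) = d(loss)/dy = 2*y
print(a.grad)        # tensor(36.) = d(loss)/da = 2*y*3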


Well, it looks like I got something done again today.