
Why do we need to pull the parameters out on their own?

Because in training, the goal is to find the parameter values that minimize the loss function. After training, we need those parameters to make predictions, or to reuse them elsewhere.

So, for convenience later on, we want to be able to pull the parameters out and work with them on their own. Typical reasons include (a short sketch of each follows the list):

  • Access parameters for debugging, diagnostics, and visualization.
  • Parameter initialization.
  • Share parameters between different model components.
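
As a quick preview, here is a minimal sketch of what those three things look like in PyTorch (this is my own illustration, not code from the article; the names demo_net, shared_layer and init_normal are made up):

import torch
from torch import nn

# sharing: reuse one Linear object in two places, so both positions use the same parameters
shared_layer = nn.Linear(8, 8)
demo_net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         shared_layer, nn.ReLU(),
                         shared_layer, nn.Linear(8, 1))

# initialization: apply a custom init to every Linear layer
def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)
demo_net.apply(init_normal)

# access for debugging / inspection
print(demo_net[0].weight.shape)                                          # torch.Size([8, 4])
print(demo_net[2].weight.data_ptr() == demo_net[4].weight.data_ptr())   # True: shared storage

The rest of this post focuses on the first point, parameter access.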

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

This is a simple multilayer perceptron, with X as its input.

When a model is defined through the Sequential class, we can access any of its layers by index.

print(net)

You can see that the output is:

>>
Sequential(
  (0): Linear(in_features=4, out_features=8, bias=True)
  (1): ReLU()
  (2): Linear(in_features=8, out_features=1, bias=True)
)

We can get the desired layer by the number in front.

print(net[0])
print(net[1])

print(net[2].state_dict())
>>
Linear(in_features=4, out_features=8, bias=True)
ReLU()
OrderedDict([('weight', tensor([[-0.0264, -0.0906, 0.3497, 0.3284, 0.0173, 0.0124, 0.0136, 0.0782]])), ('bias', tensor([0.2243]))])

Not surprisingly, we see what the first two layers are.

As for the third output, we can see that this layer contains two parameters.

[('weight', tensor([[-0.0264, -0.0906, 0.3497, 0.3284, 0.0173, 0.0124, 0.0136, 0.0782]])), ('bias', tensor([0.2243]))]
print(type(net[2].bias))
print(type(net[0].weight))
>> 
<class 'torch.nn.parameter.Parameter'>
<class 'torch.nn.parameter.Parameter'>

You can see that each parameter is represented as an instance of the Parameter class.
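
As an aside (my own illustration, not from the article), a Parameter is just a Tensor subclass that a Module automatically registers as something to be trained:

p = nn.Parameter(torch.zeros(3))
print(isinstance(p, torch.Tensor))   # True: Parameter is a subclass of Tensor
print(p.requires_grad)               # True by default, so it will receive gradients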

print(net[2].bias)
print(net[0].weight)
>>
Parameter containing:
tensor([-0.1431, 0.1381, -0.2775, 0.0038, -0.0269, 0.0631, -0.1791, 0.1291],
       requires_grad=True)
Parameter containing:
tensor([[0.4736, 0.2223, 0.0059, 0.4146],
        [0.1052, 0.2813, 0.2315, 0.2931],
        [0.4990, 0.1991, 0.1453, 0.0369],
        [0.4676, 0.0669, 0.0069, 0.4932],
        [0.4223, 0.0659, 0.3783, 0.1145],
        [0.0460, 0.2386, 0.1586, 0.2148],
        [0.0085, ...],
        [0.2703, -0.2903, 0.1822, -0.3782]], requires_grad=True)

Indexing the network by layer number and then reading the attribute (.bias or .weight) extracts that layer's bias or weights.
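
Because each parameter is itself a tensor, its value and gradient can be read straight off these attributes. A minimal sketch, continuing with the net defined above:

print(net[2].bias.data)   # the raw value tensor of the bias
print(net[2].bias.grad)   # its gradient; prints None here, since backward() has not been called yet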

print(*[(name, param.shape) for name, param in net[0].named_parameters()])

print(*[(name, param.shape) for name, param in net.named_parameters()])

print(*net.named_parameters(),end="\n",sep='\n')

# here * unpacks the iterator so that each element is printed separately
>>
('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
('0.weight', Parameter containing:
tensor([[0.3700, 0.3270, 0.3741, 0.1365],
        [0.2200, 0.0786, 0.1241, 0.2834],
        [0.3143, 0.3718, 0.3278, 0.0949],
        [0.1565, 0.4639, 0.1515, 0.4962],
        [0.3102, 0.0025, 0.0099, 0.4132],
        [0.1754, 0.1320, 0.3762, 0.1371],
        [0.3860, 0.0369, 0.3743, 0.0892],
        [0.0280, 0.2877, 0.1884, 0.2915]], requires_grad=True))
('0.bias', Parameter containing:
tensor([ 0.4722, -0.4143,  0.0858, -0.2280,  0.4349,  0.3954,  0.0971, -0.1192],
       requires_grad=True))
('2.weight', Parameter containing:
tensor([[0.0984, 0.0207, 0.1292, 0.0530, 0.0693, 0.0413, 0.2231, 0.3125]],
       requires_grad=True))
('2.bias', Parameter containing:
tensor([0.1844], requires_grad=True))

For more information about unpacking with * and **, see: Python * / ** unpacking – juejin.cn.

I’ve separated the three outputs.

  • The first is the parameter names and shapes of layer 0 of net.
  • The second unpacks the parameter names and shapes for all layers of net.
  • The third unpacks the full parameter list of net.

You can also get the parameters through state_dict, as follows:

print(net.state_dict()['2.bias'].data)
print(net.state_dict()['0.weight'])
>>
tensor([0.1844])
tensor([[0.3700, 0.3270, 0.3741, 0.1365],
        [0.2200, 0.0786, 0.1241, 0.2834],
        [0.3143, 0.3718, 0.3278, 0.0949],
        [0.1565, 0.4639, 0.1515, 0.4962],
        [0.3102, 0.0025, 0.0099, 0.4132],
        [0.1754, 0.1320, 0.3762, 0.1371],
        [0.3860, 0.0369, 0.3743, 0.0892],
        [0.0280, 0.2877, 0.1884, 0.2915]])

The value of the parameter is output either way, with or without .data.
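
That is because state_dict() already hands back the plain value tensors. A quick sanity check (a sketch, not part of the original code):

# the state_dict entry and the layer's .data hold the same values
print(torch.equal(net.state_dict()['2.bias'], net[2].bias.data))   # True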

Parameter access works the same way when blocks are nested inside other blocks. Let's define such a nested network:

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):  # nest block1 here
        net.add_module(f'block {i}', block1())
    net[1] = nn.Linear(4, 4)  # replace one of the nested blocks with a plain Linear layer
    return net

X = torch.rand(size=(2, 4))
rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
rgnet(X)

This defines a nested network.

print(rgnet)

Output the network and you can see its structure as follows:

>>
Sequential(
  (0): Sequential(
    (block 0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 1): Linear(in_features=4, out_features=4, bias=True)
    (block 2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)

For example:

print(rgnet[0][2][0].bias.data)
print(rgnet.state_dict()['0.block 2.0.bias'])
>>
tensor([0.1555, 0.4410, 0.4920, 0.1434, 0.1243, 0.4114, -0.0883, 0.1387])
tensor([0.1555, 0.4410, 0.4920, 0.1434, 0.1243, 0.4114, -0.0883, 0.1387])

In this case there are two ways to get at a parameter: the first is simply to chain the indexes, just like indexing into nested lists; the second is to use the state_dict key, which first names the block and then the layer inside that block.
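
To see how these dotted names are put together for the whole nested network, one option (again a sketch, not from the original article) is to list all the parameter names:

# every parameter name in rgnet, e.g. '0.block 0.0.weight' = outer index 0 -> 'block 0' -> layer 0 -> weight
print(*[name for name, _ in rgnet.named_parameters()], sep='\n')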


  1. More from the Hands-on Deep Learning series can be found here: juejin.cn

  2. GitHub address: DeepLearningNotes/d2l (github.com)

Still being updated……