This is day 20 of my participation in the November Gwen Challenge. For the event details, see: The Last Gwen Challenge of 2021.


The previous section explained how to customize parameter initialization: PyTorch Parameter Initialization (juejin.cn).

This section looks at how to customize layers.

Think back to the layers you have already used, such as nn.Linear and nn.ReLU. Both act as a processing layer; the difference is that the former has parameters while the latter has an empty parameter list (the quick check after the imports below makes this visible). Now let's implement some layer operations of our own, both with and without parameter lists.

import torch
import torch.nn.functional as F
from torch import nn
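To see the difference concretely, we can inspect each layer's parameter list. This is just a quick check using the imports above; the layer sizes 3 and 2 are arbitrary.

fc = nn.Linear(3, 2)    # has a weight matrix and a bias vector
act = nn.ReLU()         # has no parameters at all

print(len(list(fc.parameters())))    # 2  (weight and bias)
print(len(list(act.parameters())))   # 0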

Layer with no parameters

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

We only need to define the forward pass. The purpose of this custom layer is to subtract the mean of its input from every element.

layer = CenteredLayer()
X = torch.arange(5) * 0.1
print(layer(X))
>>
tensor([-0.2000, -0.1000,  0.0000,  0.1000,  0.2000])

The test shows that the layer works exactly as intended.

What happens if we put it into a more complex model?

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())

Y = torch.rand(10, 8)
print(net(Y).mean().data)
>>
tensor(7.8231e-09)

Admittedly, this model isn't really complex: it only has two layers. The first is a linear layer, and the second is our custom layer.

We generate a random batch of test data Y, run it through the network we just built, and print the mean of the output.

In theory the mean should be exactly 0. It doesn't print as 0 here because of the limited precision of floating-point numbers, but a number this small can safely be treated as zero.

As for why the mean should be zero, that is simple arithmetic: subtracting the mean from every element makes the new mean zero, as you can verify by working through a few numbers by hand.
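To make that concrete, here is a tiny hand-check using the same values as the first test (relying on the torch import at the top of the article):

x = torch.tensor([0.0, 0.1, 0.2, 0.3, 0.4])   # mean is 0.2
centered = x - x.mean()                        # [-0.2, -0.1, 0.0, 0.1, 0.2]

# Exact arithmetic: sum(x_i - mean) = sum(x_i) - n * mean = 0, so the new mean is 0.
# In float32 each operation is rounded, so the computed mean may come out as a tiny
# number on the order of 1e-8 rather than exactly 0.
print(centered.mean())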

Layer with parameters

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(in_units, units))
        self.bias = nn.Parameter(torch.zeros(units,))

    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

This layer is a custom implementation of a fully connected (linear) layer. Its parameters are a weight matrix and a bias vector; the forward pass applies them to the input and then passes the result through the ReLU activation.
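One detail worth noting: forward uses self.weight.data and self.bias.data, i.e. the raw tensors, so autograd does not track these parameters and they would receive no gradients during training. If you want the layer to be trainable, use the parameters directly. A minimal variant is sketched below (the name MyTrainableLinear is just for illustration; it reuses the imports above):

class MyTrainableLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(in_units, units))
        self.bias = nn.Parameter(torch.zeros(units))

    def forward(self, X):
        # No .data here: gradients can flow back into weight and bias.
        return F.relu(torch.matmul(X, self.weight) + self.bias)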

linear = MyLinear(5, 3)
print(linear.weight.data)
>>
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

Printing the weight confirms that a 5 × 3 weight matrix (all ones, as initialized) was indeed created.

X = torch.rand(2, 5)
linear(X)
>>
tensor([[2.3819, 2.3819, 2.3819],
        [1.8295, 1.8295, 1.8295]])

The single-layer test results were also fine.

net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))
>>
tensor([[0.4589],
        [0.0000]])

Putting it into a network also works without problems.

Now let's compare our custom layer with PyTorch's built-in layers to show that they achieve the same functionality.

net1 = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net2 = nn.Sequential(nn.Linear(64, 8),
                     nn.ReLU(),
                     nn.Linear(8, 1),
                     nn.ReLU())

def init(m):
    # Match MyLinear's initialization: all-ones weights and zero biases.
    if type(m) == nn.Linear:
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
net2.apply(init)

Y = torch.rand(4, 64)


print(net1(Y).data)
print(net2(Y).data)

>>
tensor([[270.5055],
        [253.7892],
        [238.7834],
        [258.4998]])
tensor([[270.5055],
        [253.7892],
        [238.7834],
        [258.4998]])

As you can see, the two networks produce exactly the same output.

Compared with the built-in version, our custom layer needs no separate weight-initialization step and no extra nn.ReLU layers, since both are baked into MyLinear.

This may look convenient, but in practice it is not recommended to reimplement what PyTorch already provides: the built-in layers are better optimized and more efficient.
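If you want to check this yourself, a rough micro-benchmark like the sketch below is one way to do it. The batch size, repeat count, and use of time.perf_counter are arbitrary choices, and for layers this small the gap may well be negligible on your machine.

import time

x = torch.rand(256, 64)
custom = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
builtin = nn.Sequential(nn.Linear(64, 8), nn.ReLU(),
                        nn.Linear(8, 1), nn.ReLU())

def bench(net, data, repeats=1000):
    # Time repeated forward passes without building autograd graphs.
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(repeats):
            net(data)
    return time.perf_counter() - start

print(bench(custom, x))
print(bench(builtin, x))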


  1. More from the Hands-on Deep Learning series can be found here: juejin.cn

  2. GitHub repository: DeepLearningNotes/d2l (github.com)

Still being updated…