This is day 16 of the November Gwen Challenge. For details, see: The Last Gwen Challenge 2021.

import torch
from torch import nn
from d2l import torch as d2l
n_train, n_test, num_inputs, batch_size = 20, 100, 200, 5
true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
train_data = d2l.synthetic_data(true_w, true_b, n_train)
train_iter = d2l.load_array(train_data, batch_size)
test_data = d2l.synthetic_data(true_w, true_b, n_test)
test_iter = d2l.load_array(test_data, batch_size, is_train=False)

First, generate a synthetic dataset:


y = 0.05 + \sum_{i=1}^d 0.01 x_i + \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, 0.01^2)
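The two d2l helpers used above are small utilities; for reference, they roughly amount to the sketch below (an approximation of d2l.synthetic_data and d2l.load_array, not the package's actual source). The 0.01 noise scale matches ϵ in the formula.

import torch
from torch.utils import data

def synthetic_data(w, b, num_examples):
    """Generate X ~ N(0, 1) and y = Xw + b + noise, with noise ~ N(0, 0.01^2)."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

def load_array(data_arrays, batch_size, is_train=True):
    """Wrap tensors in a TensorDataset and return a DataLoader."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)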

For the manual implementation, see: Deep Learning 4.5 Regularization: Weight Decay, Manual Implementation – Nuggets (juejin.cn)

def init_params():
    # w ~ N(0, 1), b initialized to 0
    w = torch.normal(0, 1, size=(num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]

Randomly initialize the model parameters.

def train_concise(wd):
    net = nn.Sequential(nn.Linear(num_inputs, 1))
    for param in net.parameters():
        param.data.normal_()
    loss = nn.MSELoss()

    num_epochs, lr = 100, 0.003
    # The bias parameter does not decay; weight_decay is set only for the weight
    trainer = torch.optim.SGD([
        {"params": net[0].weight, 'weight_decay': wd},
        {"params": net[0].bias}], lr=lr)

    # Ignore this code; it is only for visualization
    animator = d2l.Animator(xlabel='epochs', ylabel='loss', yscale='log',
                            xlim=[5, num_epochs], legend=['train', 'test'])

    for epoch in range(num_epochs):
        for X, y in train_iter:
            with torch.enable_grad():
                trainer.zero_grad()
                l = loss(net(X), y)
            l.backward()
            trainer.step()

        # Ignore this code; it is only for visualization
        if (epoch + 1) % 5 == 0:
            animator.add(epoch + 1, (d2l.evaluate_loss(net, train_iter, loss),
                                     d2l.evaluate_loss(net, test_iter, loss)))
    print('L2 norm of w:', net[0].weight.norm().item())

The weight decay hyperparameter is specified directly via weight_decay when constructing the optimizer.

By default, PyTorch decays both weights and biases.

Here we set weight_decay only for the weight, so the bias parameter b does not decay.
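Conceptually, setting weight_decay=wd on a parameter group makes SGD add wd * w to that parameter's gradient before the update, so the weight step is w ← w − lr·(∂l/∂w + wd·w), while the bias gets a plain gradient step. The sketch below illustrates that update rule (simplified, no momentum; it is not the optimizer's actual source):

import torch

def sgd_step_with_decay(params_with_wd, lr):
    """One SGD step where each (param, wd) pair decays by wd * param."""
    with torch.no_grad():
        for p, wd in params_with_wd:
            if p.grad is not None:
                p -= lr * (p.grad + wd * p)   # decay term wd * p added to the gradient
                p.grad.zero_()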

The code does not look much shorter than the manually implemented weight decay, but it runs faster and is easier to write, and this advantage grows for more complex problems.
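For reference, the manual implementation linked above achieves the same effect by adding the penalty directly to the loss instead of configuring the optimizer; a minimal sketch, where lambd plays the role of wd:

def l2_penalty(w):
    """Squared L2 norm of the weights, halved; scaled by lambd in the loss."""
    return torch.sum(w.pow(2)) / 2

# Inside the manual training loop the penalty is added explicitly:
#     l = loss(net(X), y) + lambd * l2_penalty(w)
#     l.sum().backward()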

train_concise(0)
train_concise(3)

This trains once without weight decay (wd=0) and once with weight decay (wd=3).

Take a look at the results:

train_concise(0), without weight decay:

train_concise(3), with weight decay:

The overfitting is alleviated after regularization: with weight decay, the gap between the training and test losses shrinks.


You can read more of my Hands-on Deep Learning notes here: Hands-on Deep Learning – LolitaAnn's Column – Nuggets (juejin.cn)

Notes are still being updated…