directory

Abstract

Code implementation

The activation function

SE module

Define MBConv module and Fused MBConv module

The main module

The complete code


Abstract

I studied the EfficientNetV2 paper last week and translated it; if you are interested in the paper, you can refer to my article: blog.csdn.net/hhhhhhhhhhw… .

There are three drawbacks to the first version of EfficientNets:

(1) Training with very large image sizes is slow. Large inputs also force smaller batch sizes, which slows training down considerably while actually reducing accuracy.

(2) Depthwise convolutions are slow in the early (shallow) layers of the network.

(3) Equally scaling up every stage is suboptimal.

EfficientNetV2 improves on these three shortcomings:

1. For the image-size problem, the authors propose progressive learning with adaptive regularization, detailed in Section 4.2 of the paper (a minimal sketch of the idea follows this list).

2. To address the slowness of Depthwise convolutions in the early layers, the authors propose the Fused MBConv module to replace some of the MBConv modules.

3. To address the suboptimal per-stage scaling, the authors use a non-uniform scaling strategy that gradually adds more layers to the later stages.
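
To make point 1 a bit more concrete, here is a minimal sketch of the idea behind progressive learning with adaptive regularization. This is my own illustration, not code from the paper: the function name progressive_schedule and the value ranges (image size 128–300, dropout 0.1–0.3, RandAugment magnitude 5–15) are illustrative assumptions. The point is simply that image size and regularization strength grow together, stage by stage.

# Sketch only: image size and regularization strength are interpolated linearly
# from small/weak to large/strong over a number of training stages.
def progressive_schedule(stage, num_stages=4,
                         size_range=(128, 300),       # assumed range, not from the paper
                         dropout_range=(0.1, 0.3),    # assumed range
                         randaug_range=(5, 15)):      # assumed range
    """Return (image_size, dropout, randaug_magnitude) for one training stage."""
    t = stage / max(num_stages - 1, 1)   # progress: 0.0 at the first stage, 1.0 at the last
    def interp(lo, hi):
        return lo + (hi - lo) * t
    return int(interp(*size_range)), interp(*dropout_range), interp(*randaug_range)

for stage in range(4):
    print(progressive_schedule(stage))
# stage 0: (128, 0.1, 5.0)  ->  stage 3: (300, 0.3, 15.0)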

The authors summarize EfficientNetV2 in three points:

• We introduce EfficientNetV2, a new family of smaller and faster models. Found by our training-aware NAS and scaling, EfficientNetV2 outperforms previous models in both training speed and parameter efficiency.

• We propose an improved progressive learning method that adaptively adjusts regularization according to image size. We showed that it can speed up training while improving accuracy.

• We demonstrate up to 11x faster training speed and up to 6.8x better parameter efficiency than prior art on the ImageNet, CIFAR, Cars and Flowers datasets.

In short: the new model is fast, accurate and small, so use it! Here is how to implement EfficientNetV2 with PyTorch.

Code implementation

EfficientNetV2 comes in four variants: efficientnetv2_s, efficientnetv2_m, efficientnetv2_l and efficientnetv2_xl, so we will implement all four models.

The activation function

The model uses the SiLU activation function. I have written a summary of activation functions, which you can read if you are interested: CNN Basics — Activation Function _aihao — CSDN blog.

# SiLU (Swish) activation function
if hasattr(nn, 'SiLU'):
    SiLU = nn.SiLU
else:
    # For compatibility with old PyTorch versions
    class SiLU(nn.Module):
        def forward(self, x):
            return x * torch.sigmoid(x)
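As a quick sanity check (my own snippet, not part of the original code), SiLU is simply x · sigmoid(x), whichever branch above was taken:

import torch

x = torch.randn(2, 3)
act = SiLU()  # either nn.SiLU or the fallback class defined above
print(torch.allclose(act(x), x * torch.sigmoid(x)))  # True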

SE module

I have introduced the SE (Squeeze-and-Excitation) module in previous articles, so here we simply take it and use it directly.

class SELayer(nn.Module):
    def __init__(self, inp, oup, reduction=4):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
                nn.Linear(oup, _make_divisible(inp // reduction, 8)),
                SiLU(),
                nn.Linear(_make_divisible(inp // reduction, 8), oup),
                nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y
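A quick shape check (my own example, assuming SiLU and _make_divisible from the complete code below are in scope) shows that the SE module returns a tensor of the same shape it receives. Note that in this implementation oup must equal the number of channels of the feature map being recalibrated, which is why the MBConv module below calls it as SELayer(inp, hidden_dim):

import torch

se = SELayer(inp=64, oup=256)        # as used inside MBConv: SELayer(inp, hidden_dim)
x = torch.randn(1, 256, 32, 32)      # the channel dimension must match oup
print(se(x).shape)                   # torch.Size([1, 256, 32, 32])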

Define MBConv module and Fused MBConv module

These two modules are the core of the whole model implementation. Their structure is as follows:

The MBConv module first uses a 1×1 convolution to expand the channels (by the expansion ratio, e.g. 4x), then a 3×3 depthwise convolution, then the SE module, then another 1×1 convolution that projects the channels back down to the output size; finally, when the input and output shapes match, the block's input is added back as a shortcut.

The Fused MBConv module is simpler: a single 3×3 convolution both expands the channels and does the spatial convolution, followed by the SE module, a 1×1 projection convolution, and the same shortcut. The implementation of both modules is as follows:

class MBConv(nn.Module):
    """
    Defines both the MBConv module and the Fused MBConv module.
    Set fused to 1 (True) for Fused MBConv, 0 (False) for MBConv.
    :param inp: input channels
    :param oup: output channels
    :param stride: stride; 1 keeps the feature-map size, 2 halves it
    :param expand_ratio: channel expansion ratio
    :param fused: whether to build the Fused MBConv variant
    """
    def __init__(self, inp, oup, stride, expand_ratio, fused):
        super(MBConv, self).__init__()
        assert stride in [1, 2]
        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup
        if fused:
            self.conv = nn.Sequential(
                # fused: a single 3x3 conv expands the channels
                nn.Conv2d(inp, hidden_dim, 3, stride, 1, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                SELayer(inp, hidden_dim),
                # pw-linear: project back to the output channels
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw: 1x1 conv expands the channels
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                # dw: 3x3 depthwise conv
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                SELayer(inp, hidden_dim),
                # pw-linear: project back to the output channels
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)
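As a quick check (my own snippet, assuming SELayer, SiLU and _make_divisible are in scope), the residual shortcut is only used when the stride is 1 and the input and output channels match; a stride of 2 halves the spatial size and drops the shortcut:

import torch

# Fused MBConv, stride 1, inp == oup -> shortcut used, shape preserved
block = MBConv(inp=24, oup=24, stride=1, expand_ratio=4, fused=1)
print(block(torch.randn(1, 24, 56, 56)).shape)   # torch.Size([1, 24, 56, 56])

# MBConv, stride 2, inp != oup -> no shortcut, spatial size halved
block = MBConv(inp=64, oup=128, stride=2, expand_ratio=4, fused=0)
print(block(torch.randn(1, 64, 28, 28)).shape)   # torch.Size([1, 128, 14, 14])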

The main module

class EfficientNetv2(nn.Module):
    def __init__(self, cfgs, num_classes=1000, width_mult=1.):
        super(EfficientNetv2, self).__init__()
        self.cfgs = cfgs

        # building first layer
        input_channel = _make_divisible(24 * width_mult, 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]
        # building inverted residual blocks
        block = MBConv
        for t, c, n, s, fused in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 8)
            for i in range(n):
                layers.append(block(input_channel, output_channel, s if i == 0 else 1, t, fused))
                input_channel = output_channel
        self.features = nn.Sequential(*layers)
        # building last several layers
        output_channel = _make_divisible(1792 * width_mult, 8) if width_mult > 1.0 else 1792
        self.conv = conv_1x1_bn(input_channel, output_channel)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(output_channel, num_classes)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.001)
                m.bias.data.zero_()

To understand the code, we also need to understand the input parameter cfgs. Take efficientnetv2_s as an example:

def efficientnetv2_s(**kwargs):
    """ Constructs a EfficientNetV2-S model """
    cfgs = [
        # t, c, n, s, fused
        [1,  24,  2, 1, 1],
        [4,  48,  4, 2, 1],
        [4,  64,  4, 2, 1],
        [4, 128,  6, 2, 0],
        [6, 160,  9, 1, 0],
        [6, 272, 15, 2, 0],
    ]
    return EfficientNetv2(cfgs, **kwargs)

The first column, “t”, is the expansion ratio: how much the first convolution inside the MBConv / Fused MBConv module expands the input channels.

The second column, “c”, is the number of output channels of the stage.

The third column, “n”, is how many MBConv / Fused MBConv modules are stacked in the stage.

The fourth column, “s”, is the stride of the first block in the stage. With stride 1 the feature-map size is unchanged; with stride 2 the height and width are halved (the area shrinks to one quarter), which downsamples the feature map.

The fifth column, “fused”, selects between the two modules: 1 means Fused MBConv, 0 means MBConv. This matches the description above: the shallow (early) stages use Fused MBConv instead of MBConv.
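
Putting it together, here is a quick forward-pass test of my own (assuming the complete code below has been run; num_classes=10 is just an example value, not from the paper). Thanks to the adaptive average pooling, any reasonable input size works:

import torch

model = efficientnetv2_s(num_classes=10)   # example head size
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)                      # torch.Size([1, 10])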

The complete code

import torch
import torch.nn as nn
import math

__all__ = ['efficientnetv2_s', 'efficientnetv2_m', 'efficientnetv2_l', 'efficientnetv2_xl']


from torchsummary import summary

# This function ensures that a channel count is divisible by 8.
def _make_divisible(v, divisor, min_value=None):
    """
    The purpose of this function is to ensure that a channel count is divisible by 8.
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# SiLU (Swish) activation function
if hasattr(nn, 'SiLU'):
    SiLU = nn.SiLU
else:
    # For compatibility with old PyTorch versions
    class SiLU(nn.Module):
        def forward(self, x):
            return x * torch.sigmoid(x)

 
class SELayer(nn.Module):
    def __init__(self, inp, oup, reduction=4):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
                nn.Linear(oup, _make_divisible(inp // reduction, 8)),
                SiLU(),
                nn.Linear(_make_divisible(inp // reduction, 8), oup),
                nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y


def conv_3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        SiLU()
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        SiLU()
    )


class MBConv(nn.Module):
    """
    Defines both the MBConv module and the Fused MBConv module.
    Set fused to 1 (True) for Fused MBConv, 0 (False) for MBConv.
    :param inp: input channels
    :param oup: output channels
    :param stride: stride; 1 keeps the feature-map size, 2 halves it
    :param expand_ratio: channel expansion ratio
    :param fused: whether to build the Fused MBConv variant
    """
    def __init__(self, inp, oup, stride, expand_ratio, fused):
        super(MBConv, self).__init__()
        assert stride in [1, 2]
        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup
        if fused:
            self.conv = nn.Sequential(
                # fused: a single 3x3 conv expands the channels
                nn.Conv2d(inp, hidden_dim, 3, stride, 1, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                SELayer(inp, hidden_dim),
                # pw-linear: project back to the output channels
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw: 1x1 conv expands the channels
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                # dw: 3x3 depthwise conv
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                SELayer(inp, hidden_dim),
                # pw-linear: project back to the output channels
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)


class EfficientNetv2(nn.Module):
    def __init__(self, cfgs, num_classes=1000, width_mult=1.):
        super(EfficientNetv2, self).__init__()
        self.cfgs = cfgs

        # building first layer
        input_channel = _make_divisible(24 * width_mult, 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]
        # building inverted residual blocks
        block = MBConv
        for t, c, n, s, fused in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 8)
            for i in range(n):
                layers.append(block(input_channel, output_channel, s if i == 0 else 1, t, fused))
                input_channel = output_channel
        self.features = nn.Sequential(*layers)
        # building last several layers
        output_channel = _make_divisible(1792 * width_mult, 8) if width_mult > 1.0 else 1792
        self.conv = conv_1x1_bn(input_channel, output_channel)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(output_channel, num_classes)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.001)
                m.bias.data.zero_()


def efficientnetv2_s(**kwargs):
    """ Constructs a EfficientNetV2-S model """
    cfgs = [
        # t, c, n, s, fused
        [1,  24,  2, 1, 1],
        [4,  48,  4, 2, 1],
        [4,  64,  4, 2, 1],
        [4, 128,  6, 2, 0],
        [6, 160,  9, 1, 0],
        [6, 272, 15, 2, 0],
    ]
    return EfficientNetv2(cfgs, **kwargs)


def efficientnetv2_m(**kwargs):
    """ Constructs a EfficientNetV2-M model """
    cfgs = [
        # t, c, n, s, fused
        [1,  24,  3, 1, 1],
        [4,  48,  5, 2, 1],
        [4,  80,  5, 2, 1],
        [4, 160,  7, 2, 0],
        [6, 176, 14, 1, 0],
        [6, 304, 18, 2, 0],
        [6, 512,  5, 1, 0],
    ]
    return EfficientNetv2(cfgs, **kwargs)


def efficientnetv2_l(**kwargs):
    """ Constructs a EfficientNetV2-L model """
    cfgs = [
        # t, c, n, s, fused
        [1,  32,  4, 1, 1],
        [4,  64,  7, 2, 1],
        [4,  96,  7, 2, 1],
        [4, 192, 10, 2, 0],
        [6, 224, 19, 1, 0],
        [6, 384, 25, 2, 0],
        [6, 640,  7, 1, 0],
    ]
    return EfficientNetv2(cfgs, **kwargs)


def efficientnetv2_xl(**kwargs):
    """ Constructs a EfficientNetV2-XL model """
    cfgs = [
        # t, c, n, s, fused
        [1,  32,  4, 1, 1],
        [4,  64,  8, 2, 1],
        [4,  96,  8, 2, 1],
        [4, 192, 16, 2, 0],
        [6, 256, 24, 1, 0],
        [6, 512, 32, 2, 0],
        [6, 640,  8, 1, 0],
    ]
    return EfficientNetv2(cfgs, **kwargs)

if __name__ == '__main__':
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = efficientnetv2_s()
    model.to(device)
    summary(model, (3, 224, 224))
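
If you do not want the torchsummary dependency, an alternative __main__ block of my own is a plain forward pass plus a parameter count:

if __name__ == '__main__':
    model = efficientnetv2_s()
    x = torch.randn(1, 3, 224, 224)
    print(model(x).shape)                                                    # torch.Size([1, 1000])
    print(sum(p.numel() for p in model.parameters()) / 1e6, 'M parameters')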