
MobileNetV2 is an improvement on MobileNetV1, a lightweight neural network. It retains V1's depthwise separable convolution and adds linear bottlenecks and inverted residuals.

The MobileNetV2 architecture is shown in the figure below, where t is the expansion factor applied inside the bottleneck layer, c is the number of output channels, n is the number of times the bottleneck layer is repeated, and s is the stride of the first bottleneck layer in each repeated group (the remaining repeats use stride 1).
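The four parameters can also be written out as data. The sketch below lists the (t, c, n, s) rows as given in the architecture table of the MobileNetV2 paper and expands them into one entry per block. The names `BOTTLENECK_CONFIG` and `expand_blocks` are illustrative and do not come from this article, and the 32-channel input assumes the paper's initial 3×3 convolution.

```python
# (t, c, n, s) rows of the bottleneck stages, as listed in the MobileNetV2 paper.
BOTTLENECK_CONFIG = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def expand_blocks(config, in_channels=32):
    """Return one (in_channels, out_channels, stride, t) tuple per block.

    in_channels=32 is an assumption: the output width of the first conv layer.
    """
    blocks = []
    for t, c, n, s in config:
        for i in range(n):
            # Only the first block of each group uses stride s; repeats use 1.
            stride = s if i == 0 else 1
            blocks.append((in_channels, c, stride, t))
            in_channels = c
    return blocks


blocks = expand_blocks(BOTTLENECK_CONFIG)
for block in blocks:
    print(block)
```

Each printed tuple could be passed to a block constructor such as the `InvertedResidual` class shown later in this article.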

A constant expansion rate is used throughout the network except for the first layer. Experiments showed that expansion rates between 5 and 10 give almost identical performance curves, with smaller networks doing slightly better at lower expansion rates and larger networks doing slightly better at higher ones.

MobileNetV2 mainly uses an expansion factor of 6, applied to the number of channels of the input tensor. For example, for a bottleneck layer that takes a 64-channel input tensor and produces a 128-channel tensor, the intermediate expansion layer has 64 × 6 = 384 channels.
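The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the weight count ignores batch normalization and bias terms.

```python
def bottleneck_channel_plan(in_channels, out_channels, expansion_factor=6):
    """Channel counts at the input, after expansion, after depthwise, at the output."""
    mid_channels = in_channels * expansion_factor
    return [in_channels, mid_channels, mid_channels, out_channels]


def bottleneck_weight_count(in_channels, out_channels, expansion_factor=6):
    """Convolution weights only (assumption: no bias, batch norm not counted)."""
    mid_channels = in_channels * expansion_factor
    expand = in_channels * mid_channels      # 1x1 pointwise expansion
    depthwise = 3 * 3 * mid_channels         # one 3x3 filter per channel
    project = mid_channels * out_channels    # 1x1 linear projection
    return expand + depthwise + project


print(bottleneck_channel_plan(64, 128))   # [64, 384, 384, 128]
print(bottleneck_weight_count(64, 128))   # 77184
```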

Linear bottleneck

In MobileNetV1's depthwise separable convolution, the m-dimensional space compressed by the width multiplier passes through a nonlinear ReLU. By the nature of ReLU, if the input on a channel is negative, that channel is zeroed out. Because the features have already been compressed, this loses even more feature information. If the input is positive, the output of the activation layer is simply the input value, so ReLU behaves as a linear transformation. This is why MobileNetV2 leaves out the ReLU after the final low-dimensional projection and keeps that layer linear.
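A tiny illustration of the two cases, using plain Python lists in place of feature channels (the values are made up for the example):

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


negative_channel = [-1.5, -0.2, -3.0]  # hypothetical channel with negative inputs
positive_channel = [0.5, 2.0, 1.25]    # hypothetical channel with positive inputs

print(relu(negative_channel))  # [0.0, 0.0, 0.0]  -> information is lost
print(relu(positive_channel))  # [0.5, 2.0, 1.25] -> identity, i.e. linear
```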

The structure of the bottleneck layer is shown in the following table. The input is first expanded from k to tk channels by a 1×1 convolution + ReLU (ReLU6 in the code below). A 3×3 depthwise convolution + ReLU then processes the feature map, downsampling it when stride > 1, while the channel count stays at tk. Finally, a 1×1 convolution without ReLU reduces the channels from tk back to k.
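The three stages can be traced as tensor shapes. The sketch below is a hypothetical helper, not part of the original code; it assumes a 3×3 depthwise convolution with padding 1, so the spatial size after a stride-s layer is ceil(size / s), and it takes the output channel count as a separate argument `k_out` so the input and output widths may differ.

```python
def bottleneck_shapes(h, w, k, t, stride, k_out):
    """Return (height, width, channels) at the input and after each stage."""
    def out_size(size):
        # 3x3 kernel with padding 1: output size is ceil(size / stride)
        return (size + stride - 1) // stride

    expanded = (h, w, t * k)                          # 1x1 conv + ReLU
    depthwise = (out_size(h), out_size(w), t * k)     # 3x3 depthwise conv + ReLU
    projected = (depthwise[0], depthwise[1], k_out)   # 1x1 conv, no ReLU
    return [(h, w, k), expanded, depthwise, projected]


for shape in bottleneck_shapes(56, 56, 24, t=6, stride=2, k_out=32):
    print(shape)
```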

Inverted residual

ResNet showed that residual blocks help build deeper networks with better accuracy, so MobileNetV2 introduces a similar block. The classic residual block is: 1×1 (dimensionality reduction) –> 3×3 (convolution) –> 1×1 (dimensionality increase). However, a depthwise convolution can only extract features within the channels of its input. If the classic residual block were used, the 1×1 pointwise convolution would first compress the input feature map, and the depthwise convolution that follows would extract even fewer features. MobileNetV2 therefore first expands the channels of the feature map with a 1×1 pointwise convolution to enrich the features and improve accuracy. This reverses the order used in the classic residual block, which is where the name inverted residual comes from: 1×1 (dimensionality increase) –> 3×3 (depthwise conv + ReLU) –> 1×1 (dimensionality reduction, linear).
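The difference between the two orderings is easiest to see in the channel widths. In the sketch below the function names are illustrative, the reduction factor of 4 is an assumption taken from the usual ResNet bottleneck, and the expansion factor of 6 is the MobileNetV2 default mentioned earlier.

```python
def classic_residual_widths(channels, reduction=4):
    """Wide -> narrow -> narrow -> wide (reduction=4 is an assumed ResNet-style value)."""
    mid = channels // reduction
    return [channels, mid, mid, channels]


def inverted_residual_widths(channels, expansion=6):
    """Narrow -> wide -> wide -> narrow, as in the inverted residual block."""
    mid = channels * expansion
    return [channels, mid, mid, channels]


print(classic_residual_widths(256))   # [256, 64, 64, 256]
print(inverted_residual_widths(24))   # [24, 144, 144, 24]
```

In the inverted form the 3×3 depthwise convolution operates on the widest tensor in the block, while the shortcut connects the two narrow ends.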

Combining the linear bottleneck and the inverted residual described above, I drew the structure of the block, shown in the diagram below:

Block code implementation:

PyTorch version

import torch.nn as nn


def Conv3x3BNReLU(in_channels, out_channels, stride, groups):
    # 3x3 convolution (depthwise when groups == in_channels) + BN + ReLU6
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=stride, padding=1, groups=groups),
        nn.BatchNorm2d(out_channels),
        nn.ReLU6(inplace=True))


def Conv1x1BNReLU(in_channels, out_channels):
    # 1x1 pointwise convolution + BN + ReLU6 (expansion layer)
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU6(inplace=True))


def Conv1x1BN(in_channels, out_channels):
    # 1x1 pointwise convolution + BN with no activation (linear bottleneck)
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1),
        nn.BatchNorm2d(out_channels)
    )


class InvertedResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expansion_factor=6):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        mid_channels = (in_channels * expansion_factor)

        # expand (1x1) -> depthwise (3x3) -> linear projection (1x1)
        self.bottleneck = nn.Sequential(
            Conv1x1BNReLU(in_channels, mid_channels),
            Conv3x3BNReLU(mid_channels, mid_channels, stride, groups=mid_channels),
            Conv1x1BN(mid_channels, out_channels)
        )

        # This block projects the shortcut with a 1x1 conv whenever stride == 1.
        # The MobileNetV2 paper instead uses an identity shortcut, and only when
        # stride == 1 and in_channels == out_channels.
        if self.stride == 1:
            self.shortcut = Conv1x1BN(in_channels, out_channels)

    def forward(self, x):
        out = self.bottleneck(x)
        # Add the shortcut only when the spatial size is unchanged
        out = (out + self.shortcut(x)) if self.stride == 1 else out
        return out


Keras version

# Assumes TensorFlow 2.x (tf.keras); the original snippet did not show its imports.
from tensorflow.keras import backend as K
from tensorflow.keras import layers
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     DepthwiseConv2D, ZeroPadding2D)


def relu6(x):
    # ReLU capped at 6
    return K.relu(x, max_value=6)


# Ensure that the number of channels is a multiple of divisor (8 in MobileNetV2)
def make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    # Round to the nearest multiple of divisor (// is floor division)
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Do not let rounding reduce the value by more than 10%
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def pad_size(inputs, kernel_size):
    # Padding applied before a stride-2 'valid' convolution
    input_size = inputs.shape[1:3]

    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)

    if input_size[0] is None:
        adjust = (1, 1)
    else:
        adjust = (1 - input_size[0] % 2, 1 - input_size[1] % 2)

    correct = (kernel_size[0] // 2, kernel_size[1] // 2)

    return ((correct[0] - adjust[0], correct[0]),
            (correct[1] - adjust[1], correct[1]))


def conv_block(x, nb_filter, kernel=(1, 1), stride=(1, 1), name=None):
    # 1x1 expansion: Conv + BN + ReLU6 (name must be a string prefix)
    x = Conv2D(nb_filter, kernel, strides=stride, padding='same', use_bias=False, name=name+'_expand')(x)
    x = BatchNormalization(axis=3, name=name+'_expand_BN')(x)
    x = Activation(relu6, name=name+'_expand_relu')(x)

    return x


def depthwise_res_block(x, nb_filter, kernel, stride, t, alpha, resdiual=False, name=None):

    input_tensor = x
    exp_channels = x.shape[-1] * t  # Expand dimensions
    alpha_channels = int(nb_filter * alpha)  # Compress dimensions

    x = conv_block(x, exp_channels, (1, 1), (1, 1), name=name)

    # Pad explicitly when downsampling, then use 'valid' padding
    if stride[0] == 2:
        x = ZeroPadding2D(padding=pad_size(x, 3), name=name+'_pad')(x)

    x = DepthwiseConv2D(kernel, padding='same' if stride[0] == 1 else 'valid', strides=stride, depth_multiplier=1, use_bias=False, name=name+'_depthwise')(x)

    x = BatchNormalization(axis=3, name=name+'_depthwise_BN')(x)
    x = Activation(relu6, name=name+'_depthwise_relu')(x)

    # Linear projection: 1x1 Conv + BN, no activation
    x = Conv2D(alpha_channels, (1, 1), padding='same', use_bias=False, strides=(1, 1), name=name+'_project')(x)
    x = BatchNormalization(axis=3, name=name+'_project_BN')(x)

    if resdiual:
        x = layers.add([x, input_tensor], name=name+'_add')

    return x
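The `make_divisible` helper is defined in the Keras version, but `depthwise_res_block` computes `int(nb_filter*alpha)` directly. As a minimal sketch of how the helper could be applied to channel counts scaled by the width multiplier `alpha`, assuming a divisor of 8 (the function is repeated here so the example runs on its own, and the channel values are only examples):

```python
def make_divisible(v, divisor, min_value=None):
    """Round v to the nearest multiple of divisor, never dropping more than 10%."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Hypothetical base channel counts scaled by a few width multipliers.
for alpha in (0.5, 0.75, 1.0):
    scaled = [make_divisible(c * alpha, 8) for c in (16, 24, 32)]
    print(alpha, scaled)

print(make_divisible(10, 8))  # 16: rounding down to 8 would lose more than 10%
```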