
Introduction:

MobileNet_v2 points out several problems with MobileNet_v1 and proposes improvements to address them. Its main contributions are the Linear Bottleneck and the Inverted Residual.

01 Linear Bottlenecks

As shown in the figure above, MobileNet_v2 argues that ReLU destroys information in low-dimensional space, while the impact in high-dimensional space is much smaller. Therefore, a linear activation is used instead of ReLU in the low-dimensional space. As shown in the figure below, experiments show that using a linear layer in the low-dimensional space is quite useful, because it avoids the nonlinearity destroying too much information.

In addition, if the output lies in the non-zero part of the manifold, applying ReLU is equivalent to a linear transformation and provides no real spatial mapping, so MobileNet_v2 uses ReLU6 to obtain a nonlinear activation over that non-zero space.
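For reference, ReLU6 is simply the standard ReLU clipped at 6: ReLU6(x) = min(max(x, 0), 6). The upper bound keeps activations in a small fixed range, which the paper notes is robust when computing with low precision on mobile hardware.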

It may seem contradictory that ReLU is said above to destroy information, while ReLU6 is said here to provide nonlinear activation over the non-zero space. Here is my own understanding.

According to the view of manifold learning, the data we observe is actually mapped from a low-dimensional manifold into a higher-dimensional space. Because of the internal structure of the data, some dimensions of the high-dimensional representation are redundant; in fact, the data can be uniquely represented with only the lower number of dimensions.

Image data is distributed in a high-dimensional space, and the neural network uses nonlinear activation functions to map this high-dimensional space back to the low-dimensional manifold space. The paper uses ReLU6 to strengthen the network's mapping over the non-zero space; otherwise, applying ReLU on the non-zero space is equivalent to a linear transformation and cannot map the data back to the low-dimensional space. The linear activation proposed above to replace ReLU, by contrast, operates in the low-dimensional manifold space obtained after this mapping.

In short, ReLU6 is used when mapping from the higher-dimensional space toward the low-dimensional manifold space, while the linear layer is used once the features are back in the low-dimensional manifold space.

Its usage is shown in the following table.
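To make the placement of the two activations concrete, here is a minimal sketch of a linear bottleneck in PyTorch (the framework, function name, and channel numbers are my own illustrative choices, not taken from the article or the official code): ReLU6 follows the 1×1 expansion and the 3×3 depthwise convolution, which operate in the high-dimensional space, while the final 1×1 projection back to the low-dimensional space is left linear.

import torch
import torch.nn as nn

def linear_bottleneck(in_ch, out_ch, expand_ratio=6, stride=1):
    # Hidden width = input channels * expansion factor (the "spindle" middle)
    hidden = in_ch * expand_ratio
    return nn.Sequential(
        # 1x1 expansion into the high-dimensional space, followed by ReLU6
        nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),
        nn.BatchNorm2d(hidden),
        nn.ReLU6(inplace=True),
        # 3x3 depthwise convolution, still in the high-dimensional space, ReLU6 again
        nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride, padding=1,
                  groups=hidden, bias=False),
        nn.BatchNorm2d(hidden),
        nn.ReLU6(inplace=True),
        # 1x1 projection back to the low-dimensional space: linear, no ReLU here
        nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )

x = torch.randn(1, 24, 56, 56)      # illustrative input: 24 channels, 56x56
y = linear_bottleneck(24, 24)(x)
print(y.shape)                      # torch.Size([1, 24, 56, 56])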

02 Inverted Residuals

The block structure of MobileNet_v1 is shown below on the left, and that of MobileNet_v2 on the right.

MobileNet_v2 was published in 2018, by which time ResNet had been out for a few years and extensive use had made it clear that the shortcut connection and the bottleneck residual block are quite useful. Both structures were added to MobileNet_v2.

However, the difference is that the bottleneck residual in ResNet is hourglass-shaped, i.e., the 1×1 convolution reduces the dimensionality, whereas in MobileNet_v2 it is spindle-shaped: the 1×1 convolution increases the dimensionality. This is because MobileNet uses depthwise convolutions and the number of parameters is already very small; if dimensionality reduction were used as well, the capacity and generalization ability would be insufficient.

In addition, MobileNet_v2 does not use pooling for downsampling; instead, convolutions with stride 2 are used. As shown in the figure above, blocks with stride 2 do not use the shortcut connection.

Here t is the expansion factor; the paper takes t = 6.
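Putting the stride rule and the expansion factor together, below is a hedged sketch of a full inverted residual block, again in PyTorch with illustrative names rather than the official implementation: the 1×1 convolution expands the channels by a factor of t, the stride-2 depthwise convolution performs the downsampling instead of pooling, and the shortcut connection is added only when the stride is 1 and the input and output channel counts match.

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, t=6):
        super().__init__()
        hidden = in_ch * t
        # Shortcut only for stride-1 blocks whose input and output shapes match
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),            # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,         # depthwise; stride 2 downsamples
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),           # 1x1 linear projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

# A stride-1 block with matching channels keeps the shortcut; a stride-2 block drops it.
x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32, 32, stride=1)(x).shape)   # torch.Size([1, 32, 56, 56])
print(InvertedResidual(32, 64, stride=2)(x).shape)   # torch.Size([1, 64, 28, 28])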

The Inverted Residual block and the residual block in ResNet are compared in the figure below:

Figure from the Internet.

The residual block in ResNet is wide at both ends and narrow in the middle. MobileNet_v2's block is the opposite, narrow at both ends and wide in the middle, which is why it is called the Inverted Residual block.

The overall structure is shown in the figure below:

The paper mentions 19 bottleneck layers, but only 17 appear in its schematic.

Compared with MobileNet_v1, MobileNet_v2 has more parameters, mainly because of the 1×1 expansion layer added before the depthwise convolution. In addition, its inference speed on the CPU is slower than MobileNet_v1's, but its accuracy is higher.
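As a back-of-the-envelope illustration of why the 1×1 expansion before the depthwise convolution adds parameters at the block level, here is a rough count for a single block with hypothetical channel numbers (64 in, 64 out, 3×3 depthwise kernel, t = 6; these values are my own example, and BatchNorm/bias parameters are ignored):

c_in, c_out, k, t = 64, 64, 3, 6

# MobileNet_v1 depthwise separable block: 3x3 depthwise + 1x1 pointwise
v1_params = k * k * c_in + c_in * c_out                      # 576 + 4096 = 4672

# MobileNet_v2 inverted residual: 1x1 expand + 3x3 depthwise + 1x1 linear projection
hidden = c_in * t
v2_params = c_in * hidden + k * k * hidden + hidden * c_out  # 24576 + 3456 + 24576 = 52608

print(v1_params, v2_params)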

This article comes from the Model Interpretation series of the public account CV Technical Guide.


