MobileNet series: MobileNet_v1
Inception series: Inception_v1
Inception series: Batch Normalization
Inception series: Inception_v2-v3
Inception series: Inception_v4
Introduction:
MobileNet_v2 points out several problems with MobileNet_v1 and proposes improvements to address them. Its main contributions are the linear bottleneck and the inverted residual.
01 Linear Bottlenecks
As shown in the figure above, MobileNet_v2 argues that ReLU destroys information in low-dimensional space, while in high-dimensional space the damage is much smaller. Therefore, a linear activation is used instead of ReLU in the low-dimensional space. As shown in the figure below, experiments show that using a linear layer in the low-dimensional space is quite useful, because it keeps the nonlinearity from destroying too much information.
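The paper illustrates this point with a small experiment: embed 2-D points into an n-dimensional space with a random matrix, apply ReLU there, and project back; for small n much of the structure is lost, while for large n most of it survives. A rough numpy sketch of that idea (my own code, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1000))        # points on a 2-D "manifold"

for n in (3, 5, 15, 30):                  # embedding dimensionality
    T = rng.standard_normal((n, 2))       # random embedding matrix
    y = np.maximum(T @ x, 0.0)            # ReLU applied in the n-D space
    x_rec = np.linalg.pinv(T) @ y         # project back to 2-D
    print(f"n={n:2d}  reconstruction MSE = {np.mean((x - x_rec) ** 2):.4f}")
```

In general the reconstruction error shrinks as n grows, which is exactly the argument for applying ReLU only in the expanded high-dimensional space.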
In addition, if the output lies in the non-zero region of the manifold, then ReLU is equivalent to a linear transformation and provides no nonlinear mapping. MobileNet_v2 therefore uses ReLU6, i.e. min(max(x, 0), 6), whose clipping at 6 keeps the activation nonlinear even over the non-zero region.
Above it was said that ReLU destroys information, while here it is said that ReLU6 provides nonlinear activation over the non-zero region. This can be a little hard to reconcile, so here is my own understanding.
According to the view of manifold learning, the data we observe is actually mapped from a low-dimensional manifold into a higher-dimensional space. Because of the internal structure of the data, some dimensions of the high-dimensional representation are redundant; in fact, the data can be represented uniquely using only as many dimensions as the low-dimensional manifold has.
Images are distributed in a high-dimensional space, and a neural network uses nonlinear activation functions to map that high-dimensional space back onto the low-dimensional manifold. The paper proposes ReLU6 to strengthen this nonlinear mapping; otherwise, applying plain ReLU in the non-zero region is equivalent to a linear transformation and cannot map the data back to the low-dimensional space. The linear activation proposed above as a replacement for ReLU, by contrast, operates in the low-dimensional manifold space after the mapping.
In short, ReLU6 is used when mapping from the high-dimensional space down to the low-dimensional manifold, while the linear layer is used within the low-dimensional manifold space itself.
Their usage is shown in the following table.
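As a minimal sketch of this pattern (my own illustration in a PyTorch style, not the paper's code; the 24 -> 144 channel numbers are just example values with t = 6, and BatchNorm is omitted for brevity):

```python
import torch.nn as nn

bottleneck = nn.Sequential(
    nn.Conv2d(24, 144, kernel_size=1, bias=False),   # 1x1 expansion, 24 -> 144 (t = 6)
    nn.ReLU6(inplace=True),                          # ReLU6 in the high-dimensional space
    nn.Conv2d(144, 144, kernel_size=3, padding=1,
              groups=144, bias=False),               # 3x3 depthwise convolution
    nn.ReLU6(inplace=True),                          # ReLU6 again
    nn.Conv2d(144, 24, kernel_size=1, bias=False),   # 1x1 projection back to 24 channels
    # no activation after the projection: this is the linear bottleneck
)
```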
02 Inverted Residuals
The structure used in MobileNet_v1 is shown below on the left and that of MobileNet_v2 on the right.
MobileNet_v2 was published in 2018, by which time ResNet had been around for a few years, and extensive use had made it clear that the shortcut connection and the bottleneck residual block are quite useful. Both structures were added to MobileNet_v2.
The difference is that the bottleneck residual in ResNet is hourglass-shaped, i.e. the 1×1 convolution reduces the dimensionality, whereas in MobileNet_v2 it is spindle-shaped and the 1×1 convolution increases the dimensionality. This is because MobileNet uses depthwise convolutions, so the number of parameters is already very small; reducing the dimensionality further would leave the network with insufficient capacity to generalize.
In addition, MobileNet_v2 does not use pooling for downsampling; instead it uses convolutions with stride 2, and as shown in the figure above, blocks with stride 2 do not use the shortcut connection.
In the block structure, t is the expansion factor; the paper uses t = 6.
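Putting these pieces together, here is a rough PyTorch-style sketch of one inverted residual block (my own re-implementation for illustration, not the official code), with expansion factor t, a configurable stride, and a shortcut only when the block keeps the resolution and channel count:

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, t=6):
        super().__init__()
        hidden = in_ch * t
        # shortcut only for stride-1 blocks that keep the channel count
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),            # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,         # 3x3 depthwise;
                      padding=1, groups=hidden, bias=False),    # stride 2 replaces pooling
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),           # 1x1 linear projection
            nn.BatchNorm2d(out_ch),                             # no ReLU here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out
```

For example, InvertedResidual(24, 24, stride=1) keeps the shortcut, while a downsampling block such as InvertedResidual(24, 32, stride=2) does not.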
The inverted residual block and the residual block in ResNet are compared in the following figure:
(Figure from the Internet)
The residual block in ResNet is wide at both ends and narrow in the middle. The MobileNet_v2 block is the opposite, narrow at both ends and wide in the middle, which is why it is called an inverted residual block.
The overall structure is shown in the figure below:
The paper mentions 19 bottleneck layers, but only 17 appear in its schematic.
Compared with MobileNet_v1, MobileNet_v2 has more parameters, mainly because of the 1×1 expansion convolution used before the depthwise convolution. In addition, its inference speed on the CPU is slower than MobileNet_v1's, but its accuracy is higher.
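As a back-of-the-envelope illustration of where the extra per-block parameters come from (example channel counts of my own choosing; biases and BatchNorm ignored), compare one MobileNet_v1 depthwise-separable block with one MobileNet_v2 inverted residual block at the same 64 -> 64 channels:

```python
c_in, c_out, k, t = 64, 64, 3, 6           # example channels, kernel size, expansion factor

v1_block = k * k * c_in + c_in * c_out     # 3x3 depthwise + 1x1 pointwise = 4,672
hidden = c_in * t
v2_block = (c_in * hidden                  # 1x1 expansion
            + k * k * hidden               # 3x3 depthwise in the expanded space
            + hidden * c_out)              # 1x1 linear projection -> total = 52,608
print(v1_block, v2_block)
```

For the same channel counts, the 1×1 expansion dominates the block's parameter count; the overall network budget then depends on how the channel widths are chosen at each stage.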
This article comes from the model interpretation series of the public account CV Technical Guide.