Preface

Due to limited memory and computing resources, it is difficult to deploy convolutional neural networks (CNNs) on embedded devices. Redundancy in feature maps is an important characteristic of successful CNNs, but it has rarely been studied in neural architecture design.

This paper presents a novel Ghost module that can generate more feature maps from cheap operations. The proposed Ghost module can be used as a plug-and-play component to upgrade existing convolutional neural networks. Stacking Ghost modules yields the lightweight GhostNet.

On ImageNet ILSVRC-2012, GhostNet achieves higher recognition performance than MobileNetV3 (e.g., 75.7% top-1 accuracy) at a similar computational cost.

GhostNet: More Features from Cheap Operations

Code: github.com/huawei-noah…

Welcome to follow the official account CV Technical Guide, which focuses on computer vision technique summaries, the latest technology tracking, and interpretation of classic papers.

The starting point

Over the years, a series of methods have been proposed to obtain compact deep neural networks, such as network pruning, low-bit quantization, and knowledge distillation. Network pruning removes unimportant connections or uses regularization to prune filters, yielding efficient CNNs. Low-bit quantization quantizes weights and activations to 1-bit data to achieve large compression and acceleration ratios. Knowledge distillation transfers knowledge from a larger model to a smaller one.

However, the performance of these methods is usually capped by the pre-trained neural networks used as their baselines.

The rich and even redundant information in the feature maps of a trained deep neural network usually ensures a comprehensive understanding of the input data. For example, the figure above shows some feature maps of an input image generated by ResNet-50; there are many similar feature-map pairs, like ghosts of each other. Redundancy in feature maps may be an important characteristic of successful deep neural networks. Rather than avoiding redundant feature maps, we prefer to embrace them, but in a cost-efficient way.

In a well-trained, normal-sized network, there are a large number of redundant feature maps. Model pruning (or model compression) and regularization are ways to reduce redundant feature maps, whereas this paper argues that this redundant information plays an important role in correct recognition or detection.

I recommend reading “Do We Really Need Model Compression?” to better understand the above paragraph.

The main contributions

A new Ghost module is introduced that generates more features with fewer parameters. Specifically, an ordinary convolutional layer in a deep neural network is split into two parts. The first part is an ordinary convolution, but its total number of filters is strictly controlled. Given the intrinsic feature maps produced by the first part, a series of simple linear operations is then applied to generate more feature maps. Compared with ordinary convolutional neural networks, the total number of parameters and the computational complexity required by the Ghost module are reduced without changing the size of the output feature map.

Based on the Ghost module, an efficient neural architecture, GhostNet, is established. The original convolutional layers in benchmark neural architectures are first replaced to demonstrate the effectiveness of the Ghost module, and then the superiority of GhostNet is verified on several benchmark vision datasets.

Experimental results show that the proposed Ghost module can reduce the computational cost of generic convolutional layers while maintaining similar recognition performance, and that GhostNet can surpass state-of-the-art efficient deep models such as MobileNetV3 on various tasks, with fast inference on mobile devices.

Methods

Ghost module

As shown in the figure above, the Ghost module first produces a reduced number of feature maps through an ordinary convolution, and then passes them through a depthwise convolution (the cheap operation) in parallel with an identity transformation.

1. The preceding convolution can be a 1×1 convolution or an ordinary 3×3 or 5×5 convolution.

2. Φ here denotes the cheap operation, which can be a depthwise convolution or another form of convolution, such as group convolution. Its function is to generate similar feature maps, that is, to preserve the redundant information in a cheaper way.

3. The identity mapping is carried out in parallel with the linear transformations in the Ghost module to preserve the intrinsic feature maps (a minimal code sketch of the whole module follows this list).
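Below is a minimal PyTorch sketch of this structure, written as an assumption-laden illustration rather than the paper's official implementation; the class and argument names (GhostModule, ratio, dw_size) are chosen here for clarity.

```python
import math

import torch
import torch.nn as nn


class GhostModule(nn.Module):
    """Sketch of a Ghost module: ordinary conv + cheap depthwise conv + identity."""

    def __init__(self, in_channels, out_channels, kernel_size=1,
                 ratio=2, dw_size=3, stride=1, relu=True):
        super().__init__()
        self.out_channels = out_channels
        init_channels = math.ceil(out_channels / ratio)  # m = n / s intrinsic maps
        new_channels = init_channels * (ratio - 1)       # m * (s - 1) ghost maps

        # Part 1: ordinary convolution with a strictly controlled channel count.
        self.primary_conv = nn.Sequential(
            nn.Conv2d(in_channels, init_channels, kernel_size, stride,
                      kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Identity(),
        )

        # Part 2: cheap operation (depthwise convolution) applied to the
        # intrinsic feature maps to generate the "ghost" feature maps.
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1,
                      dw_size // 2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Identity(),
        )

    def forward(self, x):
        intrinsic = self.primary_conv(x)         # ordinary convolution
        ghost = self.cheap_operation(intrinsic)  # cheap linear operations
        # Identity branch: keep the intrinsic maps and concatenate the ghosts.
        out = torch.cat([intrinsic, ghost], dim=1)
        return out[:, :self.out_channels, :, :]
```

With ratio=2, roughly half of the output channels come from the ordinary convolution and the other half from the depthwise cheap operation, which is where the parameter and FLOP savings come from.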

Complexity analysis

Suppose the size of the input feature map is h × w × c, the size of the output feature map is h′ × w′ × n, and the size of the convolution kernel is k × k.

In the cheap-operation transformation, assume that the number of intrinsic feature maps is m, the number of transformations applied to each intrinsic feature map is s, and the number of feature maps finally obtained is n. Then:

n = m ∗ s

Since the final transformation in the Ghost module is an identity mapping, the number of effective (non-identity) transformations is s − 1, so the number of feature maps generated by the cheap operations is:

m ∗ (s − 1) = (n/s) ∗ (s − 1)

Therefore, the theoretical speed-up ratio of upgrading an ordinary convolution with the Ghost module is:

r_s = (n ∗ h′ ∗ w′ ∗ c ∗ k ∗ k) / ((n/s) ∗ h′ ∗ w′ ∗ c ∗ k ∗ k + (s − 1) ∗ (n/s) ∗ h′ ∗ w′ ∗ d ∗ d) ≈ (s ∗ c) / (s + c − 1) ≈ s

The theoretical compression ratio is:

r_c = (n ∗ c ∗ k ∗ k) / ((n/s) ∗ c ∗ k ∗ k + (s − 1) ∗ (n/s) ∗ d ∗ d) ≈ (s ∗ c) / (s + c − 1) ≈ s

where d ∗ d is the kernel size of each linear operation (of similar magnitude to k ∗ k) and s is much smaller than c.
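As a quick numerical check of these approximations, the short Python snippet below plugs in illustrative values (c = n = 256, k = d = 3, s = 2); these numbers are examples chosen here, not figures from the paper.

```python
# Illustrative FLOP comparison between an ordinary convolution and a Ghost
# module, using the formulas above (the h' * w' spatial factors cancel out).
def speed_up_ratio(c, n, k, d, s):
    ordinary_flops = n * c * k * k                    # per output position
    ghost_flops = (n // s) * c * k * k \
                  + (s - 1) * (n // s) * d * d        # primary conv + cheap ops
    return ordinary_flops / ghost_flops


if __name__ == "__main__":
    ratio = speed_up_ratio(c=256, n=256, k=3, d=3, s=2)
    print(f"theoretical speed-up ratio: {ratio:.2f}")  # about 1.99, i.e. roughly s
```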

GhostNet

GhostNet is built by stacking Ghost bottlenecks, each of which consists of two Ghost modules: the first acts as an expansion layer that increases the number of channels, and the second reduces the number of channels to match the shortcut path. The overall architecture largely follows MobileNetV3, with the squeeze-and-excitation (SE) module applied in some Ghost bottlenecks and a width multiplier used to trade accuracy for efficiency.
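As a rough illustration (not the paper's full architecture), a stride-1 Ghost bottleneck could be sketched as follows, reusing the hypothetical GhostModule class from the earlier sketch; stride-2 downsampling and the SE module are omitted, and the 1×1 shortcut projection is a simplification.

```python
import torch.nn as nn


class GhostBottleneck(nn.Module):
    """Simplified stride-1 Ghost bottleneck: two Ghost modules plus a shortcut."""

    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        # The first Ghost module expands the channel count,
        # the second reduces it to match the shortcut path.
        self.ghost1 = GhostModule(in_channels, hidden_channels, relu=True)
        self.ghost2 = GhostModule(hidden_channels, out_channels, relu=False)
        self.shortcut = (
            nn.Identity() if in_channels == out_channels
            else nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        )

    def forward(self, x):
        return self.ghost2(self.ghost1(x)) + self.shortcut(x)
```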

Conclusion

The Ghost module splits an ordinary convolution into a small ordinary convolution plus cheap linear operations, exploiting rather than removing feature-map redundancy. The resulting GhostNet achieves recognition performance comparable to or better than MobileNetV3 at a similar computational cost.

This article comes from the paper-sharing series of the official account CV Technical Guide.

