In recent years, deep learning has been applied ever more widely in industry. It not only improves the efficiency of automated production, but also provides data support for important business decisions; AI is gradually changing the way people live and work. However, the high computational complexity and large parameter counts of deep neural networks severely restrict the scenarios in which models can be deployed, especially on mobile and embedded devices. Model miniaturization has therefore become a hot topic in both academia and industry in recent years.

PaddleSlim is a fully featured, fully open-source deep learning model compression toolkit built on the PaddlePaddle framework. It brings together the compression methods commonly used for deep learning models, including quantization, pruning, distillation, neural architecture search, and hardware-aware search, meeting the needs of a wide range of industrial deployment scenarios and further lowering the barrier to applying deep learning in industry.

PaddleSlim was first released in Q1 2019 and has since gone through four iterations. PaddleSlim 1.0 brings significant improvements in ease of use, model adaptation, end-to-end deployment, and performance. It has already been applied in the Baidu face SDK, enabling the full pipeline of face detection, tracking, liveness detection, and recognition to run within 0.3 seconds on embedded devices, and it helped Baidu release its industry-leading Gecko face recognition and epidemic AI temperature-measurement hardware.

PaddleSlim 1.0 ships ten key features; the sections below show what sets it apart in model compression.

PaddleSlim 1.0 project address:

Github.com/PaddlePaddl…


Figure 1 Baidu Gecko face recognition terminal


01 Customized YOLO distillation scheme sets a new accuracy record on the COCO detection task


Model distillation extracts useful information from a complex network and transfers it to a smaller network, thereby saving computing resources. PaddleSlim 1.0 supports not only traditional distillation and FSP-based distillation, but also customized distillation losses for different tasks.

On the ImageNet classification task, it further improves MobileNetV2 accuracy by 2.1%. In addition, PaddleSlim 1.0, together with PaddleDetection, developed a distillation scheme for the YOLO family of models that improves accuracy on the COCO object detection dataset by more than 2%.
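As a rough illustration, the snippet below sketches how a customized distillation loss can be wired up with PaddleSlim's distillation interface; teacher_program, student_program, and the feature-map names are placeholder assumptions rather than code from the YOLO scheme itself:

import paddle.fluid as fluid
import paddleslim as slim

# Assume teacher_program and student_program were built beforehand and that
# the variable names below exist in them; both are illustrative placeholders.
place = fluid.CPUPlace()
data_name_map = {"image": "image"}  # teacher input name -> student input name

# Merge the teacher graph into the student graph so both run in one program.
slim.dist.merge(teacher_program, student_program, data_name_map, place)

with fluid.program_guard(student_program):
    # An L2 loss between one teacher feature map and one student feature map;
    # it can be replaced by fsp_loss, soft_label_loss, or a fully custom loss.
    distill_loss = slim.dist.l2_loss("teacher_conv5_out.tmp_0",
                                     "conv5_out.tmp_0", student_program)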



Table 1 Selected experimental results of the distillation strategy


02 Sensitivity-based lossless pruning of detection models: accuracy rises rather than falls after pruning


To maximize the effect of model pruning, earlier versions of PaddleSlim already implemented pruning based on the sensitivity of the network structure. PaddleSlim 1.0 adds multi-machine, multi-threaded parallel acceleration of the sensitivity computation. From the results, users can plot a sensitivity curve of the model to be pruned and pick a suitable set of pruning ratios from it, or directly call the interface provided by PaddleSlim to generate a suitable set of pruning ratios automatically.
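As a minimal sketch, the sensitivity computation and automatic ratio selection look roughly like this; val_program and eval_func are placeholders for the evaluation graph and evaluation function your project already defines:

import paddle.fluid as fluid
import paddleslim as slim

place = fluid.CUDAPlace(0)

# Measure how much accuracy drops when each parameter is pruned at several
# ratios; results are cached in the file, so runs can be split across machines.
sens = slim.prune.sensitivity(
    val_program, place,
    ["conv2_1_sep_weights", "conv2_2_sep_weights"],
    eval_func,
    sensitivities_file="sensitivities.data",
    pruned_ratios=[0.1, 0.2, 0.3, 0.4])

# Automatically pick pruning ratios that keep the accuracy loss within 0.5%.
ratios = slim.prune.get_ratios_by_loss(sens, 0.005)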


Figure 2 Sensitivity curves of the convolutional layers


The hardest problem in model pruning is to accurately find all the nodes in the network related to the convolution being pruned. This is usually done by traversing the graph from a global perspective, which does not scale well. PaddleSlim 1.0 instead traverses from the perspective of individual network nodes to find all nodes related to the pruned convolution, effectively distributing the traversal of a complex network across all node types. This improves scalability and, in theory, supports arbitrarily complex networks.

On object detection tasks, a large proportion of the model can be pruned without reducing accuracy, and on some tasks accuracy even improves after pruning.



Table 2 Selected experimental results of the sensitivity-based pruning method


03 Configurable network quantization and new offline quantization double model inference speed


Fixed-point quantization converts the floating-point operations (float32) in a neural network's forward pass into integer operations (int8) to accelerate computation. Beyond standard fixed-point quantization, PaddleSlim 1.0 also supports configurable quantization, which quantizes only the sub-networks specified by the user; sensitive layers keep computing in floating point to reduce the loss of precision.
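For example, a configurable quantization setup might look like the sketch below; the config keys follow PaddleSlim's quantization configuration, while train_program and place are assumed to come from your existing training code:

import paddleslim as slim

quant_config = {
    "weight_bits": 8,          # quantize weights to int8
    "activation_bits": 8,      # quantize activations to int8
    # Only these operator types are quantized.
    "quantize_op_types": ["conv2d", "depthwise_conv2d", "mul"],
    # Variables matching this pattern are skipped, which is how sensitive
    # layers can be kept in floating point.
    "not_quant_pattern": ["skip_quant"],
}
quant_program = slim.quant.quant_aware(train_program, place, config=quant_config)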

To eliminate the overhead of quantization training, PaddleSlim 1.0 also adds offline (post-training) quantization, which achieves high quantization accuracy on most tasks without any retraining.
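A sketch of offline quantization is shown below; model_dir is assumed to contain a previously saved float32 inference model, and sample_generator is a reader that yields a small amount of calibration data:

import paddle.fluid as fluid
import paddleslim as slim

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Calibrate activation ranges on a few batches and write out an int8 model,
# with no retraining involved.
slim.quant.quant_post(
    executor=exe,
    model_dir="./inference_model",
    quantize_model_path="./quant_model",
    sample_generator=sample_generator,
    batch_size=16,
    batch_nums=10)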

PaddleSlim 1.0 supports quantizing convolutional layers, fully connected layers, activation layers, bias, and other unweighted layers. Experiments show that fixed-point quantization shrinks a model to about a quarter of its original size and, on the Paddle Lite framework, yields a 1.7x to 2.2x speedup depending on the model.



Table 3 Experimental results of int8 fixed-point quantization training


04 New NAS architecture: faster search and more flexible structures


PaddleSlim 1.0 opens up a more flexible NAS API with richer predefined search strategies and search spaces. The search space is decoupled from the search strategy, so users can extend either one independently.

At the search strategy level, the previous version already supported the Simulated Annealing (SA) algorithm, which converges faster and needs fewer iterations than traditional RL algorithms. A distributed SA search strategy is also supported, guaranteeing near-linear speedup of the search up to 40 GPU cards.

This upgrade adds the popular hypernetwork-based one-shot NAS method. One-shot NAS completely decouples supernet training from the search, so it can be flexibly applied under different constraints. During supernet training, GPU memory consumption is low because all candidate structures share the supernet's weights. A self-supervised ranking-consistency algorithm was also developed to ensure that performance under the supernet is consistent with the final stand-alone performance of the model.

In terms of search spaces, multiple spaces of the MobileNet, ResNet, Inception, and other families have been added, different types of spaces can be stacked for a single search, and users can also define their own search spaces.
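A minimal sketch of the SA-based search loop is shown below; "MobileNetV2Space" is one of the predefined search spaces, while build_and_train and eval_acc stand in for your own candidate construction and evaluation code:

import paddleslim as slim

sanas = slim.nas.SANAS(
    configs=[("MobileNetV2Space")],  # predefined search space
    server_addr=("", 8881),          # run the controller server locally
    search_steps=300,
    is_server=True)

for step in range(300):
    archs = sanas.next_archs()       # builders for the next candidate network
    score = eval_acc(build_and_train(archs))  # train and evaluate the candidate
    sanas.reward(score)              # feed the score back to the controller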


Figure 3 One-shot network structure search principle



Table 4 One-shot search acceleration gains on ImageNet tasks


Experiments show that the single-card one-shot strategy searches more than 10 times faster than the single-card SA search strategy.


05 Innovative hardware-aware search automatically matches optimal models to different hardware


Hardware-aware search solves the problem of customizing the optimal model structure for a specific piece of hardware during hardware adaptation.

While searching for the optimal model structure, the first problem to solve is how to quickly obtain a model's actual performance on the hardware. Traditional FLOPs cannot accurately reflect performance in a real hardware environment, so PaddleSlim 1.0 supports network latency estimation based on an operator lookup table.

Users only need to build an operator latency table for the target hardware once; after a network is generated, its latency on that hardware can be quickly estimated from the operator latency table, the latency estimator, and the network structure.
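The idea can be illustrated with the toy sketch below (this shows the principle only, not PaddleSlim's actual evaluator API, and the numbers are made up): each operator configuration is benchmarked once on the target hardware, and a candidate network's latency is estimated by summing the table entries of its operators.

# (op_type, key attributes) -> measured latency in ms on the target hardware
op_latency_table = {
    ("conv2d", "3x3,s1,c32"): 1.8,
    ("conv2d", "1x1,s1,c64"): 0.6,
    ("pool2d", "2x2,s2"): 0.2,
}

def estimate_latency(ops):
    """Estimate network latency by summing per-operator table entries."""
    return sum(op_latency_table[(op_type, key)] for op_type, key in ops)

candidate = [("conv2d", "3x3,s1,c32"), ("pool2d", "2x2,s2"),
             ("conv2d", "1x1,s1,c64")]
print("estimated latency: {:.1f} ms".format(estimate_latency(candidate)))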



Figure 4 Principle and process of SANAS hardware search


Table 5 Speedups from hardware-aware search on different hardware platforms (ImageNet task, compared with MobileNetV2)



06 Added Pantheon, a massively scalable knowledge distillation framework


Pantheon supports distributed distillation, allowing the teacher and student models to run on different GPUs or even different machines. This avoids the situation where the teacher and student are too large to run together on one device. On a single image classification distillation task, distillation time can be reduced by about 50%.
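Roughly, Pantheon's online mode pairs a Teacher service with a Student that subscribes to it over the network; the sketch below follows the paddleslim.pantheon interface, but the address and port are illustrative assumptions:

# --- on the teacher machine ---
from paddleslim.pantheon import Teacher

teacher = Teacher(out_path=None, out_port=8888)  # serve knowledge over TCP
teacher.start()
# ... then start the knowledge service with the teacher's program and reader.

# --- on the student machine ---
from paddleslim.pantheon import Student

student = Student()
student.register_teacher(in_address="127.0.0.1:8888")
student.start()
# ... then consume the knowledge generator in the student's training loop.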



Figure 5 Schematic diagram of large-scale distillation


07 Supports classification, detection, and segmentation scenarios with free combination of multiple strategies


PaddleSlim 1.0 supports combining compression strategies to reach the highest compression ratio. On a classification task, model size is reduced by 70% while accuracy improves by 1%.

Table 6 Selected compression results on the ImageNet classification task



On object detection, accuracy on the COCO task improves by 0.6% while FLOPs are reduced by 43%.

Table 7 Selected compression results for object detection models



08 End-to-end workflow from “model training -> model compression -> inference deployment”


Built on PaddlePaddle's mature technology ecosystem, PaddleSlim realizes the complete “model training -> model compression -> inference deployment” workflow, and compressed models can be deployed seamlessly across a variety of hardware environments.
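The hand-off from compression to deployment can be sketched as follows for a quantized model: convert() freezes the quantized graph, which is then saved as an inference model that Paddle Lite or Paddle Inference can load. quant_program, quant_config, and the feed/fetch variables are assumed to come from the earlier steps:

import paddle.fluid as fluid
import paddleslim as slim

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Freeze the quantization-aware-trained graph into a deployable int8 graph.
inference_program = slim.quant.convert(quant_program, place, config=quant_config)

# Save it as a standard inference model for downstream deployment.
fluid.io.save_inference_model(
    dirname="./deploy_model",
    feeded_var_names=[v.name for v in feed_vars],
    target_vars=fetch_vars,
    executor=exe,
    main_program=inference_program)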


Figure 6 Compression and deployment process



Table 8 Latency of object detection models deployed on server and mobile


Test data show that MobileNetV1-YOLOv3 runs 127% to 137% faster on different mobile devices.



Table 9 Selected performance of ImageNet classification models deployed on server and mobile


09 Lightweight interface design decouples strategies and greatly reduces coding time


PaddleSlim 1.0 introduces a new interface design that decouples the code of the different compression methods by algorithm, so each method can be used on its own or mixed with others, greatly reducing coding time. The interfaces are also simpler: users only need to add a few lines of code to an existing project to compress a model.

Below, we build a MobileNetV1 image classification model, prune two of its convolutional layers, and observe the FLOPs after pruning. The code is as follows:

# Build the network
import paddle
import paddle.fluid as fluid
import paddleslim as slim
exe, train_program, val_program, inputs, outputs = \
    slim.models.image_classification("MobileNet", [1, 28, 28], 10, use_gpu=False)
print("FLOPs before pruning: {}".format(slim.analysis.flops(train_program)))

# Declare the pruner
pruner = slim.prune.Pruner()

# Prune the network
pruned_program, _, _ = pruner.prune(
    train_program,
    fluid.global_scope(),
    params=["conv2_1_sep_weights", "conv2_2_sep_weights"],
    ratios=[0.33] * 2,
    place=fluid.CPUPlace())

# Check FLOPs after pruning
print("FLOPs after pruning: {}".format(slim.analysis.flops(pruned_program)))

To see and use the complete code, open the link below for the full example of channel pruning for an image classification model:

Aistudio.baidu.com/aistudio/pr…

For more pruning examples, see the advanced pruning tutorial at the link below:

Aistudio.baidu.com/aistudio/pr…

The complete code above can be run online on AI Studio, Baidu's development and training platform. After opening the link, choose “Log in to AI Studio -> click Fork -> Run the project”.


10 Improved Chinese and English documentation provides friendlier support for developers and partners worldwide


In response to developer feedback, the Chinese documentation has been improved and English documentation has been added, providing friendlier support for PaddleSlim developers and partners around the world.

PaddleSlim has already been commercialized in Baidu's industry-leading Gecko face recognition kit, AI temperature measurement, and other products. In the 15th Baidu Star Developer Competition, PaddleSlim served as an essential tool in the model miniaturization track and was used by more than 1,800 teams from 90% of China's top universities and related research institutes.

In the future, PaddleSlim is willing to work with AI developers, enthusiasts and partners to jointly explore the leading technology of model miniaturization, and continue to contribute to the extensive application of AI in the industrial field.

(For more on the Baidu Star Developer Competition, click the link below:

Astar2019.baidu.com/index.html)



To learn more about PaddlePaddle, see the following documentation:

PaddlePaddle: an open-source deep learning platform derived from industrial practice

PaddleSlim Project Address:

PaddlePaddle/PaddleSlim