
Please follow my official account [Jizhi Vision] for more notes like this.

Hi, I'm Jizhi Vision. This article introduces how to implement the Mish operator with TensorRT.

YOLO will be familiar to anyone who has worked on object detection. YOLOv4 was proposed in early 2020, followed by YOLOv5 and other variants. YOLOv4 bundles many tricks, including the Mish activation function, which is described in detail in the paper "Mish: A Self Regularized Non-Monotonic Activation Function". Here I cover the function itself and how to implement the Mish operator in TensorRT.

1. Mathematical expression of the Mish function

The mathematical expression for Mish is as follows:

Mish(x) = x * tanh(ln(1 + e^x))

The function is plotted below, where:

  • The blue curve is Mish
  • The orange curve is ln(1 + e^x) (softplus)

Let's take a look at Mish as used in YOLOv4:

You can also think of Mish as a combination of tanh and softplus. The mathematical expression for tanh is:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Softplus, which can be seen as a smooth approximation of ReLU, is expressed as:

softplus(x) = ln(1 + e^x)

By comparing the expressions for Mish, tanh, and softplus, you can easily see that Mish can also be written as:

Mish(x) = x * tanh(softplus(x))
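As a quick sanity check of that identity, here is a minimal scalar reference implementation in C++ (a sketch of my own; the name mish_ref is not from any library). It is also handy later as a CPU baseline when verifying a TensorRT implementation's outputs:

/// mish_ref.cpp
#include <cmath>

// Reference Mish: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x)).
float mish_ref(float x)
{
    float sp = std::log1p(std::exp(x)); // softplus(x) = ln(1 + e^x)
    return x * std::tanh(sp);
}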

2. Mish vs ReLU

ReLU is arguably the most commonly used activation function because it mitigates vanishing gradients and speeds up training convergence. ReLU is a piecewise function with the following expression:

ReLU(x) = max(0, x)

Its curve looks like this:

So how does Mish stack up against ReLU? It is not a clear-cut comparison. Consider some of the experiments in the paper "Mish: A Self Regularized Non-Monotonic Activation Function".

Here is the ReLU gradient versus the Mish gradient; you can see that the Mish gradient is much smoother.

In terms of accuracy, the paper compares the network accuracy obtained with the Mish, Swish, ReLU, and Leaky ReLU activation functions on the ImageNet-1K dataset:

And here is the comparison on the MS-COCO object detection dataset:

Based on these accuracy measurements, Mish shows a clear advantage.

In terms of runtime performance, the paper benchmarks ReLU, SoftPlus, Mish, and Mish-CUDA in PyTorch on an RTX 2070 at FP32 and FP16 precision. The data shows that ReLU is faster than plain Mish at inference, while Mish-CUDA, the CUDA-optimized version, closes much of the gap.

3. Implementing the Mish operator in TensorRT

Let's take a look at the activation operators directly supported by the TensorRT API:

//!
//! \enum ActivationType
//!
//! \brief Enumerates the types of activation to perform in an activation layer.
//!
enum class ActivationType : int32_t
{
    kRELU = 0,             //!< Rectified linear activation.
    kSIGMOID = 1,          //!< Sigmoid activation.
    kTANH = 2,             //!< TanH activation.
    kLEAKY_RELU = 3,       //!< LeakyRelu activation: x>=0 ? x : alpha * x.
    kELU = 4,              //!< Elu activation: x>=0 ? x : alpha * (exp(x) - 1).
    kSELU = 5,             //!< Selu activation: x>0 ? beta * x : beta * (alpha*exp(x) - alpha)
    kSOFTSIGN = 6,         //!< Softsign activation: x / (1+|x|)
    kSOFTPLUS = 7,         //!< Parametric softplus activation: alpha*log(exp(beta*x)+1)
    kCLIP = 8,             //!< Clip activation: max(alpha, min(beta, x))
    kHARD_SIGMOID = 9,     //!< Hard sigmoid activation: max(0, min(1, alpha*x+beta))
    kSCALED_TANH = 10,     //!< Scaled tanh activation: alpha*tanh(beta*x)
    kTHRESHOLDED_RELU = 11 //!< Thresholded ReLU activation: x>alpha ? x : 0
};
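As a quick illustration of how these built-in types are used (a minimal sketch; the `network` builder and `input` tensor are assumed to exist in the surrounding code):

// Add a built-in ReLU activation: one API call, no custom kernel needed.
nvinfer1::IActivationLayer *relu = network->addActivation(*input, nvinfer1::ActivationType::kRELU);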

You can see ReLU, Sigmoid, TanH and so on; you don't have to write these yourself, just call the TensorRT API. Mish is not directly supported, so there are basically two ways to get it into TensorRT:

(1) Compose it from existing operators: Mish can be built from tanh and softplus;

(2) Implement it as a CUDA kernel and register it with TensorRT as a plugin.

Both approaches are described below.

3.1 Implementation by combining existing operators

This one is actually very easy to write. Recall the math for Mish:

Mish(x) = x * tanh(softplus(x))

So the basic idea is: add a softplus layer, feed its output into a tanh layer, and finally multiply the tanh output element-wise by the original input x to get the Mish output. The key code looks like this:

// ############ softplus ############
// Note: softplus in TRT is alpha*log(exp(beta*x)+1), so set alpha and beta to 1
auto *activationSP = network->addActivation(*Layers[inputName], nvinfer1::ActivationType::kSOFTPLUS);
activationSP->setAlpha(1);
activationSP->setBeta(1);
// ############ tanh ############
nvinfer1::ITensor *activationSP_Out = activationSP->getOutput(0);
auto *activationTanh = network->addActivation(*activationSP_Out, nvinfer1::ActivationType::kTANH);
// ############ multiply by x: mish = x * tanh(softplus(x)) ############
auto *mish = network->addElementWise(*Layers[inputName], *activationTanh->getOutput(0), nvinfer1::ElementWiseOperation::kPROD);

That's all it takes to implement Mish with a combination of TensorRT's existing operators.

3.2 CUDA + Plugin implementation

For this approach, Mish is written out in the fully expanded form:

Mish(x) = x * tanh(ln(1 + e^x))

The basic idea is to implement this expression directly in CUDA, fusing what the combination approach does with two operators (tanh and softplus) into a single operator, and then register the CUDA kernel (.cu) with TensorRT through a plugin.

First you need a header, something like this:

/// mish.h
#include <NvInfer.h>
#include <NvInferPlugin.h>

class MishLayerPlugin : public nvinfer1::IPluginExt
{
public:
    void mish_infer(...);
};
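In a real plugin the class also has to override the IPluginExt virtual methods (getNbOutputs(), getOutputDimensions(), supportsFormat(), configureWithFormat(), initialize(), terminate(), getWorkspaceSize(), enqueue(), getSerializationSize(), serialize()); they are omitted from this sketch to keep it short. The kernel launch would typically happen inside enqueue(), which is where mish_infer() gets called.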

Then there is the .cu file, which holds the GPU implementation of the operator, something like this:

/// mish.cu
#include "mish.h"

__global__ void mish(...)
{
    ...;
}

void MishLayerPlugin::mish_infer(...)
{
    mish<<<xx, xx>>>(...);
}
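The kernel body itself is element-wise and straightforward. Below is an illustrative sketch of what it could look like; the names (mish_kernel, in, out, n), the launch configuration, and the stream argument are my own assumptions, not the author's actual code:

// Illustrative element-wise Mish kernel: out[i] = in[i] * tanh(ln(1 + exp(in[i])))
__global__ void mish_kernel(const float *in, float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
    {
        float x  = in[idx];
        float sp = logf(1.0f + expf(x)); // softplus(x)
        out[idx] = x * tanhf(sp);        // x * tanh(softplus(x))
    }
}

// A matching launch from mish_infer() might look like:
//     int threads = 256;
//     int blocks  = (n + threads - 1) / threads;
//     mish_kernel<<<blocks, threads, 0, stream>>>(in, out, n);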

Finally, a .cpp file registers the plugin with TensorRT; it looks something like this:

/// tensorrt-mish.cpp

#include "mish.h"

void addmish_layer(...)
{
    nvinfer1::DataType Dtype;
    Dtype = nvinfer1::DataType::kFLOAT;
    nvinfer1::IPluginExt *mish = new MishLayerPlugin(xxx, Dtype);
    nvinfer1::IPluginLayer *mish_layer = m_network->addPluginExt(&Layers[inputName], 1, *mish);
    ...
}
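A side note: the IPluginExt / addPluginExt interface shown here matches older TensorRT releases; newer versions move to the IPluginV2 family and INetworkDefinition::addPluginV2, but the overall structure — a header, a CUDA kernel, and a registration call in the network-building code — stays the same.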


OK, that's it. I've shared how to implement the Mish operator in TensorRT, and I hope you found it useful.


[Official account link]

Model Inference: Implementing the Mish Operator with TensorRT