TensorRT is a C++ library for high-performance inference on NVIDIA graphics processing units (GPUs). Designed to complement training frameworks such as TensorFlow, Caffe, PyTorch, and MXNet, it is dedicated to running trained networks quickly and efficiently on GPUs.

Some existing training frameworks, such as TensorFlow, already integrate TensorRT so that it can be used to speed up inference within the framework. In addition, TensorRT can be used as a library within user applications; it includes parsers for importing existing models from Caffe, ONNX, or TensorFlow, as well as C++ and Python APIs for building models programmatically.

TensorRT benefits: after a neural network has been trained, TensorRT can compress, optimize, and deploy it at run time without the overhead of the training framework. TensorRT improves the latency, throughput, and efficiency of the network by combining layers, selecting optimized kernels, and normalizing and converting operations to optimized matrix math at the specified precision.

For deep learning inference, there are five key metrics used to measure software:

  • Throughput
  • Efficiency
  • Latency
  • Accuracy
  • Memory usage

TensorRT addresses these issues by combining a high-level API that abstracts away hardware details with an optimized inference implementation, achieving high throughput, low latency, and a small device memory footprint.

In general, the workflow for developing and deploying a deep learning model is divided into three phases:

The first phase is training the model. The second phase is developing a deployment solution. The third phase is deploying that solution.

Phase 1: Training. In the training stage, we usually define the problem to be solved, including the inputs and outputs of the network and its loss function, and then design the network architecture. Training, validation, and test data are then collected and augmented according to the requirements of the task. While training the model, we monitor the training process to decide whether the loss function, training hyperparameters, or data augmentation need to be adjusted. Finally, validation data is used to evaluate the performance of the trained model. Note that TensorRT is generally not used to train any model at this stage.

Phase 2: Developing a Deployment Solution. In this phase, we use the trained model to create and validate a deployment solution. This phase is divided into the following steps:

1. First, consider how the neural network will function within the overall system and design a solution that reflects the priorities in the requirements. Because systems differ widely, there are many factors to weigh when designing and implementing the deployment architecture (for example, whether a single network or multiple networks are involved, and what post-processing steps are required).

2. After designing the solution, we can use TensorRT to build an inference engine from the saved network model. Since the model may have been trained with any of several frameworks, we need to import the saved model into TensorRT using the parser appropriate to that framework's format, as sketched below.
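As an illustration, here is a minimal sketch of importing an ONNX model with the TensorRT C++ API (TensorRT 8-era interfaces); the file name model.onnx is a placeholder and error handling is reduced to simple checks.

```cpp
#include <iostream>

#include "NvInfer.h"
#include "NvOnnxParser.h"

// Minimal logger required by the TensorRT builder and parser.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    // Create the builder and an explicit-batch network definition.
    auto builder = nvinfer1::createInferBuilder(gLogger);
    const auto flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = builder->createNetworkV2(flags);

    // Parse the saved ONNX model into the TensorRT network definition.
    auto parser = nvonnxparser::createParser(*network, gLogger);
    if (!parser->parseFromFile("model.onnx",
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse model.onnx" << std::endl;
        return 1;
    }
    // The parsed network can now be handed to the build step (next step).
    return 0;
}
```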

3. Once the model is parsed successfully, we need to consider optimization options such as batch size, workspace size, mixed precision, and dynamic shape bounds. These options are selected and specified as part of the TensorRT build step, in which an optimized inference engine is built from the network.
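Continuing the sketch above, these build-time options are set on a builder configuration; the input tensor name "input", its dimensions, and the 1 GiB workspace limit are illustrative assumptions, not values from this article.

```cpp
// Continues from the parsing sketch: builder and network already exist.
auto config = builder->createBuilderConfig();

// Workspace memory TensorRT may use for temporary buffers (1 GiB here).
config->setMaxWorkspaceSize(1ULL << 30);

// Mixed precision: allow FP16 kernels where the hardware supports them.
if (builder->platformHasFastFp16()) {
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
}

// Dynamic shape bounds: one optimization profile giving the allowed range
// of input shapes (batch 1..32 for a hypothetical "input" tensor).
auto profile = builder->createOptimizationProfile();
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN,
                       nvinfer1::Dims4{1, 3, 224, 224});
profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT,
                       nvinfer1::Dims4{8, 3, 224, 224});
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX,
                       nvinfer1::Dims4{32, 3, 224, 224});
config->addOptimizationProfile(profile);

// Build the optimized inference engine from the network and configuration.
auto engine = builder->buildEngineWithConfig(*network, *config);
```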

4. After creating the inference engine with TensorRT, we need to verify that it reproduces the accuracy results measured for the original model. If FP32 or FP16 was chosen, the results should be very close to the original; if INT8 is selected, there may be a small gap relative to the original results.
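One simple way to perform this check, sketched below, is to run the same inputs through the original framework and through the TensorRT engine and compare the outputs element by element; this helper is generic and not part of the TensorRT API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute difference between the reference outputs (from the
// original framework) and the outputs produced by the TensorRT engine.
float maxAbsDifference(const std::vector<float>& reference,
                       const std::vector<float>& tensorrt) {
    float maxDiff = 0.0f;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        maxDiff = std::max(maxDiff, std::fabs(reference[i] - tensorrt[i]));
    }
    return maxDiff;
}
```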

5. Finally, the inference engine is saved in a serialized format, called a plan file.
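A minimal sketch of this serialization step, assuming the engine built above; the file name sample.plan is a placeholder.

```cpp
#include <fstream>

// Serialize the optimized engine and write it to a plan file on disk.
nvinfer1::IHostMemory* serialized = engine->serialize();
std::ofstream planFile("sample.plan", std::ios::binary);
planFile.write(static_cast<const char*>(serialized->data()),
               serialized->size());
```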

Phase 3: Deployment. The TensorRT library is linked into the deployment application and called whenever the application needs an inference result. To initialize the inference engine, the application first deserializes the engine from the plan file. TensorRT is usually used asynchronously: when input data arrives, the program calls an enqueue function with an input buffer and an output buffer into which TensorRT places the results.
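A hedged sketch of this deployment-side flow with the TensorRT 8-era C++ API; reading the plan file into memory, allocating the GPU buffers with cudaMalloc, and the binding order are assumptions made for illustration.

```cpp
#include <cstddef>

#include <cuda_runtime_api.h>

#include "NvInfer.h"

// Runs one asynchronous inference given the plan file contents and device buffers.
// planData/planSize hold the bytes read from the plan file; deviceInput and
// deviceOutput are GPU buffers already allocated and filled by the caller.
void runInference(nvinfer1::ILogger& logger,
                  const void* planData, std::size_t planSize,
                  void* deviceInput, void* deviceOutput) {
    // Deserialize the engine from the plan file and create an execution context.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(planData, planSize);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Device buffers in engine binding order: input first, then output.
    void* bindings[] = {deviceInput, deviceOutput};

    // Enqueue inference asynchronously on a CUDA stream and wait for the result.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```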

How does TensorRT work? To optimize inference for a model, TensorRT performs optimizations (including platform-specific ones) based on the network definition and generates an inference engine. This process, known as the build phase, can take considerable time, especially on embedded platforms, so a typical application builds an engine only once and then serializes it as a plan file for later use. Note: the generated plan file is not portable across platforms or TensorRT versions. In addition, a plan file is specific to the exact GPU it was built for, so the engine must be rebuilt in order to run on a different GPU.

The build phase performs the following optimizations on the layer graph:

  • Elimination of layers whose outputs are not used
  • Elimination of operations that are equivalent to no-ops
  • Fusion of convolution, bias, and ReLU operations
  • Aggregation of operations with sufficiently similar parameters and the same source tensor (for example, 1×1 convolutions)
  • Merging of concatenation layers by directing layer outputs to their eventual destination

What features does TensorRT provide? Documentation: docs.nvidia.com/deeplearnin…


With the rapid development of technology, AMu Lab will keep pace and continue to recommend the latest technology and hardware in the robotics industry to everyone. The greatest value of our training is seeing our trainees make rapid technical progress. If you work in the robotics industry, please follow our official account; we will continue to publish the most valuable information and technology in the robotics industry. AMu Lab is committed to cutting-edge IT education and intelligent equipment, making robot research and development more efficient!