
The previous post, 4_TensorRT overview, mainly covered the NVIDIA TensorRT programming API. This article walks through a simple, complete example of accelerating inference of a Caffe model (GoogleNet) with TensorRT.

System environment

The example runs in the following system environment:

  • Hardware environment: Jetson TX2

  • Software environment:

    • JetPack: 4.2
    • CUDA: CUDA Toolkit for L4T 10.0
    • cuDNN on Target: 7.3
    • TensorRT on Target: 5.0
    • Computer Vision:
      • OpenCV on Target: 3.3.1
      • VisionWorks on Target: 1.6
    • MultiMedia API: 32.1

TensorRT basic framework

The SampleGoogleNet class implements the TensorRT network building, engine building, and inference interfaces for the GoogleNet model.

class SampleGoogleNet
{
public:
    SampleGoogleNet(const samplesCommon::CaffeSampleParams& params)
        : mParams(params)
    {
    }

    //!
    //! \brief Creates the TensorRT network
    //!
    bool build();

    //!
    //! \brief Runs the TensorRT inference engine
    //!
    bool infer();

    //!
    //! \brief Cleans up states and resources created at run time
    //!
    bool teardown();

    samplesCommon::CaffeSampleParams mParams;

private:
    std::shared_ptr<nvinfer1::ICudaEngine> mEngine = nullptr; //!< The TensorRT engine used to run the network

    //!
    //! \brief Parses a Caffe model for GoogleNet and creates a TensorRT network
    //!
    void constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
        SampleUniquePtr<nvinfer1::INetworkDefinition>& network,
        SampleUniquePtr<nvcaffeparser1::ICaffeParser>& parser);
};
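For reference, the class might be driven roughly as follows. This is a minimal sketch, not the sample's actual main(); initializeSampleParams() is a hypothetical helper standing in for however the command-line arguments are parsed into the parameter struct:

int main(int argc, char** argv)
{
    // Hypothetical helper: fills samplesCommon::CaffeSampleParams from command-line arguments.
    samplesCommon::CaffeSampleParams params = initializeSampleParams(argc, argv);

    SampleGoogleNet sample(params);

    if (!sample.build())    // parse the Caffe model and build the engine
        return EXIT_FAILURE;
    if (!sample.infer())    // run inference once
        return EXIT_FAILURE;
    if (!sample.teardown()) // release parser/protobuf resources
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}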

Configuration parameters

Building a TensorRT engine requires several important parameters. They are usually passed in on the command line when the TensorRT application starts, or default values are used. Most of them are configuration parameters needed to build the TensorRT network and are listed below (a sketch of how these fields might be filled follows the list):

  • BatchSize: the number of entries in a batch
  • DlaCore: whether the Deep Learning Accelerator (DLA) is used, and which core
  • DataDirs: the location of the network model data
  • InputTensorNames: the names of the tensors used as input
  • OutputTensorNames: the names of the tensors used as output
  • The following two parameters are specific to Caffe-based networks:
    • PrototxtFileName: the network prototype (prototxt) file
    • WeightsFileName: the trained network weights file
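As a concrete illustration of these fields, the parameter struct might be filled as below. This is only a sketch: the directory, file names, and the tensor names "data" and "prob" are assumptions for a typical GoogleNet Caffe deployment, not values required by TensorRT.

samplesCommon::CaffeSampleParams params;
params.batchSize = 4;                            // entries per batch
params.dlaCore = -1;                             // -1: do not use DLA
params.dataDirs.push_back("data/googlenet/");    // where the model files live (assumed path)
params.inputTensorNames.push_back("data");       // input blob name in the prototxt (assumed)
params.outputTensorNames.push_back("prob");      // output blob name in the prototxt (assumed)
params.prototxtFileName = "googlenet.prototxt";  // network prototype file (assumed name)
params.weightsFileName = "googlenet.caffemodel"; // trained weights file (assumed name)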

Build (network, inference engine)

SampleGoogleNet::build() creates the GoogleNet network by parsing the Caffe model, and builds the engine (mEngine) that is used to run GoogleNet.

// Create a builder for inference
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger));
if (!builder)
    return false;

// Create the network definition from the builder
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetwork());
if (!network)
    return false;

// Create a parser to parse the Caffe network model
auto parser = SampleUniquePtr<nvcaffeparser1::ICaffeParser>(nvcaffeparser1::createCaffeParser());
if (!parser)
    return false;

// Define the network using the builder, network, parser, and configuration parameters
constructNetwork(builder, network, parser);

// The body of constructNetwork is as follows:
{
    const nvcaffeparser1::IBlobNameToTensor* blobNameToTensor = parser->parse(
        locateFile(mParams.prototxtFileName, mParams.dataDirs).c_str(), // load the network prototype (prototxt) file
        locateFile(mParams.weightsFileName, mParams.dataDirs).c_str(),  // load the trained network weights file
        *network,                                                       // the network definition
        nvinfer1::DataType::kFLOAT);                                    // weights and tensors use FP32 precision

    // Iterate over outputTensorNames, look up the corresponding tensor via blobNameToTensor->find,
    // and mark that tensor as a network output via markOutput.
    for (auto& s : mParams.outputTensorNames)
        network->markOutput(*blobNameToTensor->find(s.c_str()));

    // Set the maximum batch size from batchSize.
    builder->setMaxBatchSize(mParams.batchSize);
    // Set the maximum workspace size.
    builder->setMaxWorkspaceSize(16_MB);
    // Enable DLA or not, depending on dlaCore.
    samplesCommon::enableDLA(builder.get(), mParams.dlaCore);
}

// Create the CUDA inference engine from the built network definition.
mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildCudaEngine(*network),
    samplesCommon::InferDeleter());
if (!mEngine)
    return false;

Inference

SampleGoogleNet::infer() is the main execution function of the example. It allocates buffers, sets the input, and executes the engine.

// Create a RAII buffer management object. The BufferManager class handles allocation and release of
// host and device (GPU) buffers, memcpy between host and device buffers to aid inference, and debug
// dumps to verify inference. It simplifies buffer management and any interaction between the buffers
// and the engine.
samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

// Create an execution context for the inference engine
auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
if (!context)
    return false;

// Get the host buffers and set the host input buffers to all zeros
for (auto& input : mParams.inputTensorNames)
    memset(buffers.getHostBuffer(input), 0, buffers.size(input));

// Copy data from the host input buffers to the device input buffers
buffers.copyInputToDevice();

// Execute inference
bool status = context->execute(mParams.batchSize, buffers.getDeviceBindings().data());
if (!status)
    return false;

// After inference completes, copy data from the device output buffers to the host output buffers
buffers.copyOutputToHost();
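After copyOutputToHost() the results are in the host output buffer. As a sketch of how they could be read back (in the same scope as infer() above, and assuming a single output tensor named "prob" holding per-class scores, as is typical for GoogleNet):

// Read the output scores from the host buffer and find the top-1 class.
// The tensor name "prob" matches the assumed outputTensorNames entry above.
const float* prob = static_cast<const float*>(buffers.getHostBuffer("prob"));
const int numClasses = static_cast<int>(buffers.size("prob") / sizeof(float)) / mParams.batchSize;

int best = 0;
for (int i = 1; i < numClasses; ++i)
    if (prob[i] > prob[best])
        best = i;
std::cout << "Top-1 class index: " << best << " (score " << prob[best] << ")" << std::endl;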

Resource cleanup

Resource cleanup mainly involves shutting down the protobuf library used by the Caffe parser:

nvcaffeparser1::shutdownProtobufLibrary();
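A minimal sketch of how teardown() might wrap this call, consistent with the class declaration above:

bool SampleGoogleNet::teardown()
{
    // Release the protobuf resources held by the Caffe parser.
    nvcaffeparser1::shutdownProtobufLibrary();
    return true;
}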

Conclusion

This article uses a very simple example to show the coding process of deploying a network model with TensorRT. Note that the network model used here is a Caffe model and the parser used is ICaffeParser; TensorRT also provides parsers for the ONNX and UFF formats. How to import other network models (for example, TensorFlow models) through these two parsers will be summarized later. Inference involves copying data between GPU buffers and CPU buffers, which is rather tedious; this example uses BufferManager to encapsulate those steps cleanly, an idea worth borrowing when developing TensorRT applications in the future.