The deep learning inference framework OpenPPL is now open source. Using image classification as an example, this article explains how to deploy a deep learning model and build an AI inference application from 0 to 1.

End result: upload a picture of a cat (or a dog) and have the program identify the animal 🐱

Background: OpenPPL is an inference engine built on a self-developed high-performance operator library. It provides multi-backend deployment of AI models in cloud-native environments and supports efficient deployment of deep learning models such as those from OpenMMLab.

⭐️ Welcome star: github.com/openppl-pub…

The following takes deploying an image classification model on a Linux x86 platform as an example, detailing how to install and use OpenPPL and helping readers build an AI inference service from 0 to 1.

@Tian Zichen

Installation

1. Download the PPLNN source code

git clone https://github.com/openppl-public/ppl.nn.git

2. Install dependencies

PPLNN's build dependencies are as follows:

  • GCC >= 4.9 or LLVM/Clang >= 6.0
  • CMake >= 3.13
  • Git >= 2.7.0

The image classification routine described in this article also requires an additional installation of OpenCV:

  • For APT package management systems (e.g. Ubuntu/Debian):

    sudo apt install libopencv-dev

  • For YUM package management systems (e.g. CentOS):

    sudo yum install opencv opencv-devel

  • Or install OpenCV from the source code

**Note:** the build automatically detects at compile time whether OpenCV is installed; if it is not, the routine described in this article will not be generated.

3. Compile

cd ppl.nn
./build.sh -DHPCC_USE_OPENMP=ON

After compilation, the image classification routine is generated under the pplnn-build/samples/cpp/run_model/ directory. It reads an image and a model file and outputs the classification result.

See building-from-source.md for more details on building.

Image classification routine explanation

The source code of the image classification routine is in samples/cpp/run_model/classification.cpp; this section explains its main parts.

1. Image preprocessing

OpenCV reads images as BGR HWC uint8 data, while the ONNX model expects RGB NCHW fp32 input, so the image data must be converted:

int32_t ImagePreprocess(const Mat& src_img, float* in_data) {
    const int32_t height = src_img.rows;
    const int32_t width = src_img.cols;
    const int32_t channels = src_img.channels();

    // Convert the color space from BGR/GRAY to RGB
    Mat rgb_img;
    if (channels == 3) {
        cvtColor(src_img, rgb_img, COLOR_BGR2RGB);
    } else if (channels == 1) {
        cvtColor(src_img, rgb_img, COLOR_GRAY2RGB);
    } else {
        fprintf(stderr, "unsupported channel num: %d\n", channels);
        return -1;
    }

    // Split the channels: HWC -> CHW
    vector<Mat> rgb_channels(3);
    split(rgb_img, rgb_channels);

    // in_data provides the data space for the cv::Mat objects below, so any
    // change to these Mats is written directly into in_data
    Mat r_channel_fp32(height, width, CV_32FC1, in_data + 0 * height * width);
    Mat g_channel_fp32(height, width, CV_32FC1, in_data + 1 * height * width);
    Mat b_channel_fp32(height, width, CV_32FC1, in_data + 2 * height * width);
    vector<Mat> rgb_channels_fp32{r_channel_fp32, g_channel_fp32, b_channel_fp32};

    // Convert uint8 data to fp32, subtract the mean and divide by the standard
    // deviation: y = (x - mean) / std
    const float mean[3] = {0, 0, 0}; // adjust mean and std to match the data set and training parameters
    const float std[3] = {255.0f, 255.0f, 255.0f};
    for (uint32_t i = 0; i < rgb_channels.size(); ++i) {
        rgb_channels[i].convertTo(rgb_channels_fp32[i], CV_32FC1, 1.0f / std[i], -mean[i] / std[i]);
    }

    return 0;
}

2. Generate a runtime builder from the ONNX model

First create and register the engines you want to use. Each engine corresponds to one inference backend; x86 and CUDA are currently supported. This article uses only the x86 engine:

auto x86_engine = X86EngineFactory::Create();
// register the engine
vector<unique_ptr<Engine>> engines;
vector<Engine*> engine_ptrs;
engines.emplace_back(unique_ptr<Engine>(x86_engine));
engine_ptrs.emplace_back(engines[0].get());

Then call the ONNXRuntimeBuilderFactory::Create() function to read the ONNX model and create a runtime builder from the registered engines:

auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));

Note: the PPLNN framework supports mixed inference across multiple heterogeneous devices. Several different engines can be registered, and the framework automatically splits the computation graph into subgraphs and schedules them onto the different engines, as sketched below.
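As a minimal sketch of what multi-engine registration could look like (this is not part of the original routine, and it assumes the CUDA backend exposes a CudaEngineFactory::Create() analogous to the x86 factory):

// Hypothetical sketch: register both an x86 and a CUDA engine so that PPLNN
// can partition the graph across them. CudaEngineFactory::Create() is assumed
// to mirror X86EngineFactory::Create() from the snippet above.
vector<unique_ptr<Engine>> engines;
engines.emplace_back(unique_ptr<Engine>(X86EngineFactory::Create()));
engines.emplace_back(unique_ptr<Engine>(CudaEngineFactory::Create()));

vector<Engine*> engine_ptrs;
for (auto& e : engines) {
    engine_ptrs.emplace_back(e.get());
}

// The builder receives all registered engines and assigns each subgraph to one of them
auto builder = unique_ptr<ONNXRuntimeBuilder>(
    ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));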

3. Create a runtime

Use RuntimeOptions to configure runtime behavior, for example setting the mm_policy field to MM_LESS_MEMORY:

RuntimeOptions runtime_options;
runtime_options.mm_policy = MM_LESS_MEMORY; // use the memory-saving policy

Create a Runtime instance using the Runtime Builder generated in the previous step:

unique_ptr<Runtime> runtime;
    runtime.reset(builder->CreateRuntime(runtime_options));

A runtime builder can create multiple Runtime instances. These Runtime instances share constant data (weights, etc.) and the network topology, which saves memory; for example, see the sketch below.
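A sketch of creating one Runtime per worker thread from the same builder (num_threads is illustrative, not from the original routine):

// Sketch: several Runtime instances built from one runtime builder. They share
// weights and topology, so each extra instance adds little memory overhead.
const uint32_t num_threads = 4; // illustrative worker count
vector<unique_ptr<Runtime>> runtimes;
for (uint32_t i = 0; i < num_threads; ++i) {
    runtimes.emplace_back(unique_ptr<Runtime>(builder->CreateRuntime(runtime_options)));
}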

4. Set network input data

Get the runtime's input tensor via the GetInputTensor() interface:

auto input_tensor = runtime->GetInputTensor(0); // the classification network has only one input

Reshape the input tensor and reallocate the tensor's memory:

const std::vector<int64_t> input_shape{1, channels, height, width};
input_tensor->GetShape().Reshape(input_shape);
auto status = input_tensor->ReallocBuffer();
// Even if the ONNX model fixes the input size, PPLNN adjusts the input
// dynamically, so this interface must be called to reallocate memory
// whenever Reshape is called

Unlike ONNX Runtime, PPLNN can dynamically adjust the network's input size even if the size is fixed in the ONNX model (as long as the new size is reasonable); a sketch follows.
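For example (the shape values here are illustrative), feeding a 320 x 320 image to a model exported with a fixed 224 x 224 input uses exactly the same calls as above:

// Illustrative reshape to a size other than the one baked into the ONNX model
const std::vector<int64_t> other_shape{1, 3, 320, 320};
input_tensor->GetShape().Reshape(other_shape);
status = input_tensor->ReallocBuffer(); // reallocate after every reshape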

The preprocessed data in_data is fp32 in NDARRAY format (for 4-dimensional data, NDARRAY is equivalent to NCHW), so define the format description of the user-side input data accordingly:

TensorShape src_desc = input_tensor->GetShape();
src_desc.SetDataType(DATATYPE_FLOAT32);
src_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 4-dimensional data, NDARRAY is equivalent to NCHW

Finally, call the ConvertFromHost() interface to convert in_data into the format input_tensor requires and fill the tensor:

status = input_tensor->ConvertFromHost(in_data, src_desc);

5. Run inference

status = runtime->Run(); // perform network inference
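The snippets here assign the return code without always checking it; in real code each call is worth checking. A minimal sketch of such a check, assuming the RC_SUCCESS constant and GetRetCodeStr() helper from ppl.common are available:

// Sketch: check the return code of Run() (RC_SUCCESS / GetRetCodeStr are
// assumed to come from ppl.common)
if (status != ppl::common::RC_SUCCESS) {
    fprintf(stderr, "run network failed: %s\n", ppl::common::GetRetCodeStr(status));
    return -1;
}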

6. Obtain network output data

Get the runtime's output tensor through the GetOutputTensor() interface:

auto output_tensor = runtime->GetOutputTensor(0); // the classification network has only one output

Allocate data space to store network output:

uint64_t output_size = output_tensor->GetShape().GetElementsExcludingPadding();
    std::vector<float> output_data_(output_size);
    float* output_data = output_data_.data();

As with input data, you need to define the desired output format description:

TensorShape dst_desc = output_tensor->GetShape();
dst_desc.SetDataType(DATATYPE_FLOAT32);
dst_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 1-dimensional data, NDARRAY is equivalent to a plain vector

Call the ConvertToHost() interface to convert output_tensor's data into the format described by dst_desc and obtain the output:

status = output_tensor->ConvertToHost(output_data, dst_desc);

7. Parse the output

Parse the scores output by the network to obtain the classification result:

int32_t GetClassificationResult(const float* scores, const int32_t size) {
    vector<pair<float, int>> pairs(size);
    for (int32_t i = 0; i < size; i++) {
        pairs[i] = make_pair(scores[i], i);
    }

    auto cmp_func = [](const pair<float, int>& p0, const pair<float, int>& p1) -> bool {
        return p0.first > p1.first;
    };

    const int32_t top_k = 5;
    nth_element(pairs.begin(), pairs.begin() + top_k, pairs.end(), cmp_func); // get top K results & sort
    sort(pairs.begin(), pairs.begin() + top_k, cmp_func);

    printf("top %d results:\n", top_k);
    for (int32_t i = 0; i < top_k; ++i) {
        printf("%dth: %-10f %-10d %s\n", i + 1, pairs[i].first, pairs[i].second, imagenet_labels_tab[pairs[i].second]);
    }

    return 0;
}

Run

1. Prepare the ONNX model

We prepared a classification model mnasnet0_5.onnx under tests/testdata for testing.

More ONNX models can be obtained by:

  • You can export the ONNX model from OpenMMLab/PyTorch: model-convert-guide.md
  • Get the Model from ONNX Model Zoo: github.com/onnx/models

Models in the ONNX Model Zoo generally use a lower opset version; you can use convert_onnx_opset_version.py under tools/ to convert the opset to version 11:

python convert_onnx_opset_version.py --input_model input_model.onnx --output_model output_model.onnx --output_opset 11

For details about converting opset, see onnx-model-opset-convert-guide.md

2. Prepare test images

Test images in any format can be used. Two images are prepared under tests/testdata: a PNG head shot of our cat and cat1.jpg (an image from the ImageNet validation set):

Images of any size will work fine. If you want to resize the input to 224 x 224, change the following variable in the program:

const bool resize_input = false; // change to true if you want to resize

3. Test the inference service

Run:

pplnn-build/samples/cpp/run_model/classification <image_file> <onnx_model_file>
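For example, with the model and the test image shipped in the repository (paths assumed relative to the repository root):

pplnn-build/samples/cpp/run_model/classification tests/testdata/cat1.jpg tests/testdata/mnasnet0_5.onnx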

When inference completes, you will see output like the following:

image preprocess succeed!
[INFO][2021-07-23 17:29:31.341][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
successfully create runtime builder!
successfully build runtime!
successfully set input data to tensor [input]!
successfully run network!
successfully get outputs!
top 5 results:
1th: 3.416199   284        n02123597 Siamese cat, Siamese
2th: 3.049764   285        n02124075 Egyptian cat
3th: 2.989676   606        n03584829 iron, smoothing iron
4th: 2.812310   283        n02123394 Persian cat
5th: 2.796991   749        n04033901 quill, quill pen

As you can see, the program correctly determines that my cat is indeed a real cat (>^ω^<).

This completes the installation of OpenPPL and inference with an image classification model.

In addition, an executable named pplnn is generated under the pplnn-build/tools directory. It can run inference for any model, dump output data, run benchmarks, and so on; use the --help option to see its usage. You can also modify this example to become more familiar with OpenPPL.

QQ group for discussion: 627853444 (join passcode: OpenPPL)