1. Model inference, deployment, and serving

Model inference is the forward-pass computation of a trained deep neural network. After much hard work, algorithm engineers finally produce a good model, but a lot of work remains before that model can serve a real application. This is what we usually call engineering deployment. A production deployment involves many concerns, such as model file export and migration, model compression, model acceleration, and so on.

Models can be deployed in several ways:

  • Encapsulate the model into an SDK and integrate it into an application or service;
  • Convert it into an on-device model and run efficient inference within the device's limited compute budget;
  • Encapsulate it as a Web model service that exposes API interfaces for external invocation.

Here we focus on serving. Embedding models directly into an application or service is manageable when there are only a few models and the business is small, but as the number of models and the scale of the business grow, this approach couples model inference with business-logic processing, raises the cost of bringing models online, and leads to resource contention. Model serving instead encapsulates the inference process of a deep learning model as a standalone service, decoupling business-processing logic from model inference. With the rise of microservices, model serving has become an increasingly popular topic in recent years. For example, in 2016 Google released TensorFlow Serving, a framework that wraps models trained with TensorFlow into Web services: it receives network requests from clients, performs the forward inference computation, and returns the results.
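As a concrete illustration of the "receive request, run inference, return result" pattern, the sketch below builds a request body for TensorFlow Serving's RESTful predict endpoint (`/v1/models/<name>:predict`, served on port 8501 by default). The model name `my_model` and the localhost address are assumptions for illustration:

```python
import json

# Assumption: a model named "my_model" is being served locally.
# TensorFlow Serving exposes a RESTful predict endpoint of the form
# /v1/models/<name>:predict on its default REST port 8501.
MODEL_NAME = "my_model"
URL = f"http://localhost:8501/v1/models/{MODEL_NAME}:predict"

def build_predict_request(batch):
    """Build the JSON body for TF Serving's row-format ("instances") input."""
    return json.dumps({"instances": batch})

# Example: a batch of two 3-feature inputs.
body = build_predict_request([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(body)

# Sending the request to a running server would look like:
#   import requests
#   resp = requests.post(URL, data=body)
#   predictions = resp.json()["predictions"]
```

The client never sees the model file or the framework behind the endpoint, which is exactly the decoupling that serving provides.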

2. Model serving frameworks

The following table summarizes notable open-source model serving frameworks in the industry today, along with their advantages and disadvantages:

| Serving framework | Supported DL frameworks | Developer | Open source address | Released | Web API types | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TensorFlow Serving | 1. TensorFlow (native). 2. PyTorch via PyTorch → ONNX → saved_model. | Google | Github.com/tensorflow/… | 2016 | gRPC and RESTful | 1. Supports Docker deployment, on both CPU and GPU. 2. Supports multiple API communication modes and two interface types. 3. Models are easy to mount, with hot-update support; a mature framework with accumulated industry experience. | 1. PyTorch support is inconvenient and cumbersome. 2. Models accelerated with OpenVINO or TensorRT are not supported. |
| OpenVINO Model Server | Most frameworks, as long as the trained model can be converted to OpenVINO IR format. | Intel | Github.com/openvinotoo… | 2019 | gRPC and RESTful | 1. Perfectly matched to OpenVINO's CPU acceleration scheme, further improving CPU performance. 2. Flexible configuration; supports version management and model hot updates. 3. Kubernetes-friendly and low-code. | NVIDIA GPU hardware acceleration is not supported. |
| TorchServe | PyTorch (native). | AWS and Facebook | Github.com/pytorch/ser… | 2020 | gRPC and RESTful | 1. Simple and convenient to deploy. 2. High performance and lightweight. | 1. The tool is still being updated and improved frequently. 2. Only the PyTorch framework is supported. |
| Triton | A variety of frameworks, including TensorFlow, TensorRT, PyTorch, ONNX, and even custom backends. | NVIDIA | Github.com/triton-infe… | 2018 | gRPC and RESTful | 1. High inference performance; supports TensorRT-accelerated models. 2. Broad framework support and strong adaptability. 3. Supports dynamic batching and multi-instance deployment of a single model. | Deploying OpenVINO CPU-accelerated models is not supported. |
| BentoML | 1. TensorFlow. 2. PyTorch. 3. ONNX. | BentoML.ai | Github.com/bentoml/Ben… | 2019 | RESTful | 1. Broad framework support, flexible and convenient. 2. Container-deployment friendly. 3. Supports adaptive batching. | A niche framework still under development, with few references available. |
| Kubeflow | 1. TensorFlow. 2. PyTorch. | Google | Github.com/kubeflow/ku… | 2018 | gRPC and RESTful | 1. Built-in support for TF Serving and Triton. 2. A fast-growing framework built specifically for container deployment. | Little accumulated industrial experience. |
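Dynamic batching, listed above as an advantage of Triton (and, in adaptive form, BentoML), means the server queues individual requests and groups whatever arrives within a short wait window into one batch, so a single forward pass serves many callers. The following is a minimal stdlib-only sketch of that idea; `MAX_BATCH`, `WAIT_S`, and `collect_batch` are illustrative names, not any framework's API:

```python
import queue
import time

MAX_BATCH = 4   # upper bound on batch size (illustrative)
WAIT_S = 0.01   # how long to wait for stragglers to fill a batch

def collect_batch(q):
    """Pull up to MAX_BATCH items from q, waiting briefly for more requests."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + WAIT_S
    while len(batch) < MAX_BATCH:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # window elapsed with no new request
    return batch

# Simulate six requests arriving before the worker wakes up.
q = queue.Queue()
for request in [1, 2, 3, 4, 5, 6]:
    q.put(request)

print(collect_batch(q))  # → [1, 2, 3, 4]
```

In a real serving framework the equivalent knobs are configuration, not code (for example, Triton exposes them in the model configuration), and the collected batch is handed to one model-instance forward pass.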
