The previous article covered "OpenPPL, a high-performance deep learning inference platform, is officially open source".

Today we continue with the origins of OpenPPL, a story that starts with SensePPL:

What is SensePPL?

SensePPL is a multi-backend deep learning inference and deployment engine that the SenseTime HPC team has been building since 2015. Models produced by a training platform can be converted to a standard format such as ONNX and then deployed for fast inference with SensePPL.
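
For a concrete flavor of this workflow, here is a minimal sketch of exporting a trained PyTorch model to ONNX before handing it to an inference engine; the model, file name, and input shape are placeholders for illustration:

```python
# Minimal sketch: export a PyTorch model to ONNX so that an inference
# engine can consume it. The model and input shape are illustrative only.
import torch
import torchvision

model = torchvision.models.resnet18()  # example model (random weights)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # example NCHW input

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",          # output file consumed by the inference engine
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```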

SensePPL loads and transforms the model, generates a directed graph and an execution plan, performs graph-level optimizations, and at runtime calls a deeply optimized operator library to carry out the computation. The core framework and the operator library are developed entirely in-house, with almost no third-party dependencies.
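
To illustrate the "directed graph + execution plan + operator library" structure described above, here is a self-contained toy in Python; every name here is invented for illustration and bears no relation to SensePPL's actual API:

```python
# Toy illustration: a directed graph of ops, a naive execution plan
# (run nodes whose inputs are ready), and an "operator library" of kernels.
# All names are hypothetical; this is not SensePPL's real API.
import numpy as np

# operator library: op type -> kernel (trivial NumPy implementations here)
OP_LIBRARY = {
    "Add": lambda a, b: a + b,
    "Relu": lambda x: np.maximum(x, 0),
}

# directed graph: node name -> (op type, input names)
graph = {
    "sum": ("Add", ["x", "y"]),
    "out": ("Relu", ["sum"]),
}

def execute(graph, feeds):
    values = dict(feeds)
    pending = dict(graph)
    while pending:  # naive plan: execute any node whose inputs are ready
        for name, (op, inputs) in list(pending.items()):
            if all(i in values for i in inputs):
                values[name] = OP_LIBRARY[op](*(values[i] for i in inputs))
                del pending[name]
    return values

result = execute(graph, {"x": np.array([-1.0, 2.0]), "y": np.array([0.5, -3.0])})
print(result["out"])  # -> [0. 0.]
```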

The origins of OpenPPL

SensePPL has been used and refined inside SenseTime for many years, accumulating a great deal of technology and production experience in deep learning inference for computer vision. As with many closed-source systems in the industry, the result is a large amount of proprietary customization tightly coupled to the company's business.

When we decided to give back to the technical community and go open source, we wanted the inference engine to be as accessible to developers as possible. We therefore chose an industry-wide standard, the ONNX model format, for the open-source version of SensePPL, and redesigned its core framework accordingly.

The result is a new PPL, which we call "OpenPPL" to signal our commitment to open source and industry standards.

Website | https://openppl.ai

OpenPPL features

OpenPPL is a fresh start, and its first release is v0.1. It includes basic FP32 support on the x86 architecture and FP16 support on NVIDIA's Turing GPU architecture.

We will also develop inference support on both architectures for OpenMMLab's core networks. These two architectures cover a significant portion of deployment requirements in the cloud and server space, but they are still far from sufficient.

Over the next six months to a year, we will iterate on and improve OpenPPL toward a commercially usable v1.0. Version 1.0 will include, but not be limited to, the following features:

1. x86 CPU: the x86 processor remains the cornerstone of the cloud and server landscape and is the most widely deployed cloud computing architecture. OpenPPL will be further refined on x86 and, depending on market feedback, may support newer x86 instruction sets; it will also support AMD's Zen architecture as well as several domestic Chinese x86 processors.

2. NVIDIA GPU: continue substantial optimization of operator performance and framework support on the GPU; add lower-precision inference such as INT8/INT4 on the Turing architecture, and open-source the related quantization toolchain (a minimal quantization sketch follows this list); and support NVIDIA's latest Ampere architecture.

This work will greatly improve the usability of OpenPPL on the CUDA platform.

3. ARM server: SensePPL's support for and optimization on the ARM architecture has the longest history of all, but it has so far focused on mobile and IoT scenarios.

The ARM architecture offers excellent performance per watt and a powerful ecosystem. With the rapid improvement of ARM processor performance, ARM servers have finally crossed the threshold of large-scale adoption in cloud computing, and they represent the future direction of cloud data centers.

OpenPPL will carry its mobile ARM processor expertise over to cloud and server deployments, with initial support for the ARMv8/v9 architectures in the v1.0 release.
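
As a flavor of what the quantization toolchain mentioned in item 2 involves, here is a minimal sketch of symmetric per-tensor INT8 quantization; this is the generic textbook scheme shown for illustration, not OpenPPL's actual toolchain:

```python
# Minimal sketch of symmetric per-tensor INT8 quantization -- a generic
# textbook scheme for illustration, not OpenPPL's actual toolchain.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0            # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([-0.8, 0.1, 0.4, 1.6], dtype=np.float32)
q, scale = quantize_int8(x)
print(q, dequantize(q, scale))  # quantized values and their reconstruction
```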

Longer-term plans

OpenPPL will absorb requirements from the industry, maintaining and expanding the set of supported operators and model types over the long term, and optimizing the full model-inference pipeline. Beyond model inference itself, techniques such as model pre- and post-processing and model serving will also be introduced.

At present, OpenPPL still uses the traditional design of a directed-graph representation plus an operator library. This supports operator fusion, but its graph-optimization capabilities are limited. The HPC team has extensive experience in areas such as automatic code generation and will bring these techniques into OpenPPL in the future to make model optimization more complete.
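
A classic instance of the operator fusion mentioned above is folding a BatchNorm layer into the preceding convolution's weights and bias. Here is a minimal sketch of that standard fold, with made-up shapes and values:

```python
# Minimal sketch of Conv+BatchNorm fusion, a standard graph-level
# optimization. Shapes and values are made up for illustration.
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    # W: (out_channels, in_channels, kH, kW); all others: (out_channels,)
    s = gamma / np.sqrt(var + eps)        # per-channel BN scale factor
    W_fused = W * s[:, None, None, None]  # scale each output channel's weights
    b_fused = (b - mean) * s + beta       # fold the BN shift into the bias
    return W_fused, b_fused

# After fusion, Conv(x; W_fused, b_fused) == BN(Conv(x; W, b)), so the
# BN node can be removed from the graph entirely.
W = np.random.randn(8, 3, 3, 3).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
gamma, beta = np.ones(8, np.float32), np.zeros(8, np.float32)
mean, var = np.random.randn(8).astype(np.float32), np.ones(8, np.float32)
W_fused, b_fused = fuse_conv_bn(W, b, gamma, beta, mean, var)
```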

We will also keep an eye on industry developments and introduce more techniques and support, such as the recently prominent Transformer family of models. AI back-end architectures are diversifying as well, and many AI processors have already captured a considerable market share. We will expand cooperation with more AI chip and processor vendors as demand requires, and transfer our accumulated expertise on NVIDIA GPUs and CPUs to support more scenarios and chips.

At the same time, SensePPL's accumulated technology for edge devices will be released step by step after v1.0. We hope to establish deep cooperation with more upstream and downstream organizations and vendors in the industry.

  • GitHub address: ppl.nn
  • Author: Gao Yang