Running AI models on the TVM Rust Runtime and a WASM sandbox

Author: Wang Hui / Editor: Zhang Handong

This article is based on a talk given at the Meetup in Shenzhen on March 27. Slides and video: disk.solarfs.io/sd/6e7b909b…

Overview

This article introduces a scheme that combines WASM and TVM in the AI field: a model trained by an AI framework is compiled into WASM bytecode using TVM's end-to-end deep learning compilation stack, and is then loaded and executed through Wasmtime, so that the model can be seamlessly migrated and deployed in any runtime environment.

TVM and WASM technologies

TVM and its Rust runtime

As a top-level open source project of the Apache Foundation, TVM is a full-stack compiler for deep learning, designed to efficiently compile, optimize, and deploy models on any hardware platform. Through its unified intermediate representation layers (Relay and Tensor IR), TVM compiles models trained by AI frameworks into a computation-graph representation that is independent of the back-end hardware architecture, and then loads and executes the graph in different environments on top of a unified runtime.

To implement graph loading, TVM defines a set of abstract runtime interfaces and provides implementations in various programming languages (including C++, Python, Rust, Go, and JavaScript) for different runtime environments. This article mainly introduces the interface definitions of the TVM Rust runtime. The Rust runtime of TVM consists of two crates, tvm_rt and tvm_graph_rt: the former fully implements the Rust bindings for the TVM Runtime API, while the latter is a pure-Rust implementation of the TVM graph runtime. This article focuses on the tvm_graph_rt interface, summarized below (a usage sketch follows the lists).

  • Struct definitions

    • DLTensor: Plain C tensor object; does not manage memory.
    • DsoModule: A module backed by a Dynamic Shared Object (dylib).
    • Graph: A TVM computation graph.
    • GraphExecutor: An executor for a TVM computation graph.
    • SystemLibModule: A module backed by a static system library.
    • Tensor: An n-dimensional array type which can be converted to/from tvm::DLTensor and ndarray::Array. Tensor is primarily a holder of data which can be operated on via TVM (via DLTensor) or converted to ndarray::Array for non-TVM processing.
  • Enum definitions

    • ArgValue: A borrowed TVMPODValue. Can be constructed using into(), but the preferred way to obtain an ArgValue is automatically via call_packed!.
    • RetValue: An owned TVMPODValue. Can be converted from a variety of primitive and object types. Can be downcast using try_from if it contains the desired type.
    • Storage: A container which holds Tensor data.
  • Constant definitions

    • DTYPE_FLOAT32
    • DTYPE_FLOAT64
    • DTYPE_INT32
    • DTYPE_UINT32
  • Trait definitions

    • Module
    • PackedFunc
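
To make these interfaces concrete, here is the usage sketch referenced above, showing how a Relay-compiled graph might be loaded and executed with tvm_graph_rt. It is a minimal sketch only: the file paths and the input name "data" are illustrative assumptions, error handling is reduced to expect, and exact signatures may differ between tvm_graph_rt versions.

```rust
use std::{collections::HashMap, convert::TryFrom, fs};

use ndarray::Array;
use tvm_graph_rt::{Graph, GraphExecutor, SystemLibModule, Tensor};

fn main() {
    // Parse the JSON computation graph emitted by the TVM Relay compiler.
    let graph_json = fs::read_to_string("lib/graph.json").expect("missing graph.json");
    let graph = Graph::try_from(graph_json.as_str()).expect("failed to parse graph");

    // Operator kernels are provided by a statically linked system library.
    let syslib = SystemLibModule::default();
    let mut exec = GraphExecutor::new(graph, &syslib).expect("failed to build executor");

    // Load the trained parameters and bind them to the graph.
    let params_bytes = fs::read("lib/graph.params").expect("missing graph.params");
    let params: HashMap<String, Tensor> = tvm_graph_rt::load_param_dict(&params_bytes)
        .expect("failed to parse params")
        .into_iter()
        .map(|(name, tensor)| (name, tensor.to_owned()))
        .collect();
    exec.load_params(params);

    // Feed an all-zero NCHW float32 input (name and shape are assumptions).
    let input_data = Array::from_shape_vec((1, 3, 224, 224), vec![0f32; 3 * 224 * 224])
        .expect("bad input shape");
    exec.set_input("data", Tensor::from(input_data));
    exec.run();

    // The output Tensor can be converted to ndarray::Array for post-processing.
    let _output = exec.get_output(0).expect("no output");
}
```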

WASM and WASI

WebAssembly (WASM) is a binary instruction format for a stack-based virtual machine; its bytecode can be compiled into machine code for fast, near-native execution with efficient access to hardware resources. Moreover, thanks to its strong security and portability, WASM can not only be embedded in browsers to enhance Web applications, but can also be used in server, IoT, and other scenarios.

Because the browser environment inherently shields applications from the underlying hardware platform, WASM itself does not need to consider the runtime environment behind the browser. In non-Web domains, however, WASM must be adapted to different operating systems (file reading and writing, clock synchronization, interrupt triggering, etc.). For this situation, the WASM community proposed a new standard, WASI (WebAssembly System Interface). Just as WASM is an assembly language for an abstract machine, WASI is a standard interface to an abstract operating system; its purpose is to let WASM applications migrate seamlessly between different operating systems. For a detailed interpretation of the WASI standard, check out this blog post.

Scheme introduction

Preliminary research

The industry has already been exploring WASM technology in the AI field: the TensorFlow.js community compiles traditional hand-written operators to WASM to improve execution speed; the TVM community compiles models to WASM for model inference in the browser; and there are also projects that use WASM's portability to resolve incompatibilities between operator libraries and hardware devices (see XNNPACK), among others.

Scheme design

Earlier, our team shared some preliminary ideas on combining WASM with the AI domain (see here). Like the TensorFlow.js and TVM communities, we found that WASM's portability naturally addresses the problem of deploying AI models across all scenarios: with models defined by traditional deep learning frameworks, users must do extra customized development to train or run inference in different hardware environments, sometimes even building a separate inference engine from scratch.

So how can WASM's portability be used to unify these heterogeneous hardware environments? Take the MindSpore deep learning framework as an example. At the macro level, a MindSpore model is a computation graph defined by MindSpore IR; at the micro level, it is a collection of MindSpore operators. We can therefore try to combine WASM with a deep learning framework along both dimensions, the computation graph and the operator, which leads to the concepts of the WASM computation graph and the WASM operator library.

  • WASM computation graph

    A WASM computation graph, as the name implies, compiles a trained model (including its parameters) into WASM bytecode, which is then loaded by a WASM runtime to perform model inference directly. Thanks to WASM's portability, model inference can run in any environment:

    • In the Web domain, the Emscripten tool loads the WASM bytecode into the JS runtime and executes it in the browser.
    • In non-Web domains, the Wasmtime tool loads the WASM bytecode into the system environment for execution.

    In the WASM computation graph case, the trained model (and its parameters) are stored in the system environment in advance, so the WASI interface must be introduced to interact with system resources and perform offline model loading. Therefore, when selecting a WASM runtime, one needs a tool that supports the WASI (WebAssembly System Interface) standard (such as Wasmtime), or one can directly extend Emscripten with WASI support, as the TVM community does.

  • WASM operator library

    The WASM operator library is easier to understand: individual operators are compiled into WASM bytecode, and a wrapped operator invocation interface is provided to the upper-layer framework. The framework needs to load WASM operators in a way similar to dynamic linking, but since WASM itself does not yet support dynamic linking, all compiled WASM operators must be integrated in advance, and the framework layer is then given an invocation interface to the whole operator library.

After analyzing and comparing the two ideas above, and drawing on the existing work of the TVM community, we decided to start down the WASM computation graph path, so as to make full use of TVM's full-stack compilation capability and quickly build a prototype of the solution.
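Before diving into the implementation, the sketch below shows what graph loading through Wasmtime can look like on the host side. It is a minimal sketch under stated assumptions: wasm_graph.wasm and the exported run function are hypothetical names, the real interface exchanges tensors through WASM linear memory (see the graph loading section below), and the wasmtime/wasmtime-wasi APIs shown here vary across versions.

```rust
use anyhow::Result;
use wasmtime::{Engine, Linker, Module, Store};
use wasmtime_wasi::{sync::WasiCtxBuilder, WasiCtx};

fn main() -> Result<()> {
    let engine = Engine::default();

    // Register the WASI imports so the module can reach system resources
    // (file I/O, clocks, etc.), as discussed in the WASI section above.
    let mut linker: Linker<WasiCtx> = Linker::new(&engine);
    wasmtime_wasi::add_to_linker(&mut linker, |ctx: &mut WasiCtx| ctx)?;

    let wasi = WasiCtxBuilder::new().inherit_stdio().build();
    let mut store = Store::new(&engine, wasi);

    // Load and instantiate the compiled computation graph.
    let module = Module::from_file(&engine, "wasm_graph.wasm")?;
    let instance = linker.instantiate(&mut store, &module)?;

    // Hypothetical exported entry point: an (offset, length) pair pointing
    // into linear memory goes in, the output byte length comes back.
    let run = instance.get_typed_func::<(i32, i32), i32>(&mut store, "run")?;
    let out_len = run.call(&mut store, (0, 0))?;
    println!("inference produced {} bytes", out_len);
    Ok(())
}
```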

Scheme implementation

  • WASM graph compilation

    As shown above, we can compile the model directly into a graph.o object file using the Python interface of TVM Relay. Note, however, that the generated graph.o cannot be recognized by the WASM runtime directly: it must first be loaded by the WASM Graph Builder module shown in the figure via the TVM Rust runtime, and then compiled into WASM bytecode (the wasm_graph.wasm file in the figure) by the Rust compiler. Why this tedious detour? Because graph.o contains Relay and TVM IR primitives, which cannot be converted directly into WASM primitives.

  • WASM graph loading

    The graph loading phase (see the diagram above) looks very simple, but the reality is much more complicated. First, WASM runtimes define a whole set of assembler-level user interfaces for WASM IR, which is extremely unfriendly to upper-layer application developers. Second, WASM currently only supports basic numeric types (i32, i64, etc.) as function parameters, which means that tensor types from the deep learning domain cannot be passed in natively, not to mention support for threads, SIMD128, and other advanced features.

    Of course, every exploration of a new area runs into all kinds of problems, and solving those problems is precisely the job of engineers and researchers. So rather than pinning our hopes on the WASM community, we tried to solve them ourselves: since WASM offers no high-level, user-facing API, we developed a set according to our own needs; although WASM does not support passing structs or pointers, we can use its Memory mechanism to write data into WASM linear memory in advance and pass the memory address, converted to i32, as a function parameter. Some of these workarounds feel a little inelegant, but they clearly illustrate our line of thought, and that is enough.
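To illustrate the Memory workaround just described, the following continuation of the Wasmtime sketch above (function and export names remain hypothetical) writes serialized tensor bytes into the module's linear memory and passes only plain i32 values across the boundary:

```rust
use anyhow::Result;
use wasmtime::{Instance, Memory, Store};
use wasmtime_wasi::WasiCtx;

/// Write the serialized input tensor into the module's linear memory, then
/// invoke the hypothetical exported `run(offset, len) -> out_len` function.
fn run_inference(
    store: &mut Store<WasiCtx>,
    instance: &Instance,
    tensor_bytes: &[u8],
) -> Result<i32> {
    // The default linear memory exported by the module.
    let memory: Memory = instance
        .get_memory(&mut *store, "memory")
        .expect("module exports no memory");

    // Fixed offset for brevity; a real implementation would call an
    // allocator exported by the module instead of hard-coding an address.
    let offset = 0x1000_usize;
    memory.write(&mut *store, offset, tensor_bytes)?;

    // Only i32 values cross the boundary: the memory address and the byte
    // length, since WASM functions cannot take structs or pointers.
    let run = instance.get_typed_func::<(i32, i32), i32>(&mut *store, "run")?;
    let out_len = run.call(&mut *store, (offset as i32, tensor_bytes.len() as i32))?;
    Ok(out_len)
}
```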

Due to limited space, we only attach the complete implementation code of the project here; interested readers are welcome to reach out and discuss.

The overall structure of the project codebase is shown below:

```
wasm-standalone/
├── README.md
├── wasm-graph               // WASM graph generation module
│   ├── build.rs             // build script
│   ├── Cargo.toml           // project dependencies
│   ├── lib                  // directory for the computation graph compiled by the TVM Relay API
│   │   ├── graph.json
│   │   ├── graph.o
│   │   ├── graph.params
│   │   └── libgraph_wasm32.a
│   ├── src                  // source code of the WASM graph generation module
│   │   ├── lib.rs
│   │   ├── types.rs
│   │   └── utils.rs
│   └── tools                // directory for the Relay Python API build script
│       └── build_graph_lib.py
└── wasm-runtime             // WASM graph execution module
    ├── Cargo.toml
    ├── src                  // source code of the WASM graph execution module
    │   ├── graph.rs
    │   ├── lib.rs
    │   └── utils.rs
    └── tests                // test cases
        └── test_graph_resnet50
```
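To show how the pieces in the tree above fit together, here is a sketch of what the wasm-graph crate's build.rs might do; the script invocation and its flags are assumptions for illustration, not the repository's actual code. It runs the Relay build script and links the generated static library:

```rust
// build.rs: a minimal sketch; the actual script in the repository may differ.
use std::{env, process::Command};

fn main() {
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    let lib_dir = format!("{}/lib", manifest_dir);

    // Invoke the Relay Python build script, which produces graph.json,
    // graph.o, graph.params and libgraph_wasm32.a (the -o flag is assumed).
    let status = Command::new("python3")
        .args(["tools/build_graph_lib.py", "-o", lib_dir.as_str()])
        .status()
        .expect("failed to run build_graph_lib.py");
    assert!(status.success(), "build_graph_lib.py returned an error");

    // Link the generated static library into the wasm-graph crate.
    println!("cargo:rustc-link-search=native={}", lib_dir);
    println!("cargo:rustc-link-lib=static=graph_wasm32");
    println!("cargo:rerun-if-changed=tools/build_graph_lib.py");
}
```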

To give you a more concrete understanding of the scheme, we prepared a simple prototype: the TVM Relay API compiles an ONNX-generated ResNet50 model into a wasm_graph_resnet50.wasm file, and Wasmtime loads the WASM in the runtime environment to perform model inference (see here for details).

Future plans

TVM community collaboration

As mentioned earlier, this scheme is still at the experimental stage, so we will explore more possibilities together with the TVM community. The preliminary features currently planned include:

  • Support for data-parallel processing based on SIMD128;
  • Further refinement of the TVM community's Rust runtime API module, so that it natively supports interfacing with WASM Memory;
  • AutoTVM optimization for the WASM backend;
  • Support for more networks.

WASM operator library

So far we have only explored the WASM computation graph direction, but the WASM operator library direction may unlock even more potential when WASM is combined with a deep learning framework such as MindSpore. Here are a few scenarios that suit the WASM operator library better:

  • Many deep learning frameworks have already defined their own IR and compilation pipelines; only the WASM operator library can integrate seamlessly with the graph compilation layer of such frameworks.
  • The WASM computation graph can only be used for model inference, whereas the WASM operator library can be applied to model training, validation, and inference scenarios alike.
  • In terms of portability, the WASM computation graph cannot guarantee that its internal operators stay consistent across environments, whereas the WASM operator library truly makes operators portable across device, edge, and cloud scenarios.

As shown in the figure above, we plan to work out a set of end-to-end integration solutions at the WASM operator library level (first covering the scenarios above), to truly bring WASM technology to every scenario in the AI field.

Join us

To promote the ecosystem of the Rust programming language in the AI field, we have launched a non-commercial organization called Rusted AI. Any developer interested in Rust and AI technology can apply to join. The community currently provides the following communication channels:

  • Rusted AI WeChat group: add the assistant on WeChat (WeChat ID: mindspore0328, note: Rusted AI); after verification, the assistant will add you to the Rusted AI discussion group.
  • GitHub Teams: the community provides open discussions via GitHub Teams. Since GitHub Teams is only open to organization members, developers who want to participate can send their GitHub ID to [email protected] and join the community discussion after verification.
  • Ecosystem crowdsourcing project: the community recently launched the awesome-rusted-ai crowdsourcing project to track all open source projects related to the combination of Rust and AI.
