About the author

Yuan I is a member of the dedicated DingTalk front-end team, responsible for developing the dedicated DingTalk PC client, client-side applications, and client-side module plug-ins.

Questions before learning

This article is an introductory look at AssemblyScript, written so that team members can get up to speed quickly. If anything is wrong, please point it out in the comments section.

Before studying it, I had no experience with AssemblyScript at all; all I knew was that it lets you write WASM with JS-like code. So I approached AS with questions from my own work. Here are the questions I had before learning:

  • What is WebAssembly’s role in front-end technology?
  • WebAssembly is the technology closest to the front end, so why isn’t AssemblyScript, the language closest to the front end, the obvious choice for it?
  • Why does the official documentation list so many languages, such as Rust and C++? Where do they fit in?

A brief introduction to WebAssembly

Before getting to AssemblyScript, we need to know what WebAssembly is. Mozilla, Intel, Red Hat, and Fastly announced the launch of the Bytecode Alliance, a new open-source organization whose goal is to work on standards such as WebAssembly and the WebAssembly System Interface (WASI), to build a secure, efficient, and modular runtime engine and multilingual compilation toolchain, and to promote their use on as many platforms and devices as possible. WebAssembly is now supported by Microsoft Edge, Firefox, Google Chrome, and Safari.

WebAssembly has many features that bring value in a variety of environments; here are a few of the key ones:

  1. Security sandbox: a WASM module runs in its own private sandbox, and programs in the sandbox cannot access address spaces outside it.
  2. Cross-platform: WebAssembly targets a virtual machine, so the delivered artifact only needs a corresponding WASM runtime to run; it is not tied to a specific architecture, platform, or environment.
  3. Lightweight: the WASM specification was designed with the browser in mind, to be downloaded from a server and run over the network. The opcodes are very streamlined, so modules load quickly and a running instance can be created with very few resources.
  4. High performance: WebAssembly’s bytecode is designed to be just-in-time-compile friendly, allowing fast compilation and high execution speed. WAMR’s AOT mode performance is close to native.

AssemblyScript introduction

AssemblyScript is essentially a TypeScript-to-WebAssembly compiler. (TypeScript’s biggest difference from JavaScript is that it adds the concept of types.) AssemblyScript has become quite popular and is easy to get started with even if you are not familiar with TypeScript, because it uses only a limited subset of TS.

Because of its similarity to JavaScript, AssemblyScript lets Web developers integrate WebAssembly into a site easily, without having to learn a completely new language.

AssemblyScript is simple and useful

Let’s walk through a simple example of developing with AssemblyScript. The first step is to write a simple addition function in AssemblyScript; it looks just like TypeScript, the only difference being the AssemblyScript type i32, which represents a 32-bit integer.

export function add(a: i32, b: i32): i32 {
  return a + b;
}

Then run the tnpm run build command, and AssemblyScript generates the WebAssembly code in the build directory. Now we can use the WebAssembly addition function from normal JS code; the following test case verifies the addition function, asserting that 1 + 2 = 3.

const assert = require("assert");
const myModule = require("./build"); // reference the wasm build
assert.strictEqual(myModule.add(1, 2), 3); // call the addition function

Why AssemblyScript is so efficient

I think the real strength of AssemblyScript starts with its standard library. Just as C++’s STL standard library is a classic, AssemblyScript also puts a lot of effort into its standard library.

Because one of WebAssembly’s major use cases is algorithm optimization, a language that does not provide a strong base standard library to build on is bound to lag behind here. Also, the AssemblyScript compiler’s source code is written in TypeScript-style code, which makes it very approachable for front-end developers.

A powerful standard library

A quick look at AssemblyScript’s standard library catalogue turns up many of the same types as in JS, such as Array, Date, Error, Map, Math, and String. Almost every type’s documentation includes a statement along the lines of “very similar to JavaScript’s”, but the JavaScript equivalents are not quite the same thing.

Although they are very similar, a closer look reveals that these basic types, especially the numeric types, are much more fine-grained than their JS counterparts.

First of all, AS specifies the memory footprint of basic types in much more detail. Take the basic integer types as an example: there are 1-bit, 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit variants, and so on.

The amount of memory allocated behind each type is different, and this refinement of memory sizes runs through every basic type. Such detail helps us design higher-performance code than in JS, and gives us very fine-grained control over the programs we write.
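
As a minimal sketch, here is roughly how these sized types can be declared (the variable names are just for illustration):

// A sketch of AssemblyScript's sized numeric types; variable names are illustrative
const flag: bool = true;        // boolean
const tiny: i8 = -12;           // 8-bit signed integer
const short: u16 = 65535;       // 16-bit unsigned integer
const count: i32 = 2000000;     // 32-bit signed integer
const big: i64 = 9000000000;    // 64-bit signed integer
const ratio: f32 = 0.5;         // 32-bit float
const precise: f64 = Math.PI;   // 64-bit float

export function widenedSum(a: i32, b: i32): i64 {
  // unlike JS's single Number type, widening here must be explicit
  return <i64>a + <i64>b;
}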

Take this very simple hello function as an example; let’s write a simple version of it in AssemblyScript. Note that string here is not equivalent to the string in JS; it is AssemblyScript’s own string type.

export function hello(name: string): string {
  return "Hello, " + name + "!";
}

We then use the --exportRuntime compile parameter to expose some of the WebAssembly runtime methods.

npm run asbuild:optimized -- --exportRuntime

Let’s use this WebAssembly code from a normal JS file. We can’t call the function directly as hello("Tomas"); instead we use __newString to build an AssemblyScript string first and then pass it in as the argument.

Similarly, when retrieving the return value, we cannot use hello’s return value directly; we must use __getString to convert the AssemblyScript string in memory into a JS string.

var { hello, __newString, __getString } = wasm.exports;

// allocate the input string in wasm memory
var pti = __newString("Tomas");

var pto = hello(pti);

// read the returned string from wasm memory
var str = __getString(pto);

So AssemblyScript is really only syntactically close to TS; its other language capabilities are closer to those of its rivals C++ and Rust.

For example, AssemblyScript supports operator overloading: a type can reimplement >=, ==, <, and so on. C++ has a similar capability, but reimplementing operators is essentially impossible in JS.
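
As a sketch of what this looks like (the Vec2 class below is a hypothetical example, not part of this project), AssemblyScript exposes operator overloading through the @operator decorator:

// Sketch: operator overloading via the built-in @operator decorator (Vec2 is hypothetical)
class Vec2 {
  x: i32;
  y: i32;

  constructor(x: i32, y: i32) {
    this.x = x;
    this.y = y;
  }

  @operator("+")
  add(other: Vec2): Vec2 {
    return new Vec2(this.x + other.x, this.y + other.y);
  }

  @operator("==")
  equals(other: Vec2): bool {
    return this.x == other.x && this.y == other.y;
  }
}

export function demo(): bool {
  const a = new Vec2(1, 2);
  const b = new Vec2(3, 4);
  return (a + b) == new Vec2(4, 6); // both + and == resolve to the methods above
}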

Using the compiler

Just as TypeScript uses tsc to compile TS to JS, AssemblyScript uses asc to compile AS to WASM. Again, take the simplest addition function:

export function add(a: i32, b: i32): i32 {
  return a + b;
}

Run npm run asbuild:untouched to produce the unoptimized build and npm run asbuild:optimized to produce the optimized build.

  "asbuild": "npm run asbuild:untouched && npm run asbuild:optimized -- --exportRuntime"

The .map file is the source map that helps us debug, .wasm is the WASM binary, which we cannot read directly, and .wat is the text representation of that binary. Since the binary itself is unreadable, the .wat file is how we get more insight into the efficiency of the code we write:

Analyzing performance with the .wat file

Let’s briefly read through the human-readable .wat intermediate output generated for add.

module represents a tree whose root node is module. type declares the parameter and return types of add. Then, in the unoptimized build, the __data_end, __stack_pointer, and __heap_base addresses are declared as globals.

Next a block of memory is declared, along with a table used to store function references and an elem segment that initializes the table region. The add function and the memory are then exported. Finally comes the function declaration:

Although browsers compile WASM into something more efficient, WASM execution is defined in terms of a stack machine: the basic idea is that each instruction pushes values onto or pops values off the stack.

local.get reads a parameter value of the function and pushes it onto the stack. i32.add pops the two i32 values off the stack, computes their sum, and pushes the result back onto the stack.

(module
 (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
 (global $~lib/memory/__data_end i32 (i32.const 8))
 (global $~lib/memory/__stack_pointer (mut i32) (i32.const 16392))
 (global $~lib/memory/__heap_base i32 (i32.const 16392))
 (memory $0 0)
 (table $0 1 funcref)
 (elem $0 (i32.const 1))
 (export "add" (func $assembly/index/add))
 (export "memory" (memory $0))
 (func $assembly/index/add (param $0 i32) (param $1 i32) (result i32)
  local.get $0
  local.get $1
  i32.add
 )
)

There is more detailed documentation on how to read and learn the wat format; please refer to developer.mozilla.org/zh-CN/docs/…

Compiler optimization

Next, let’s see what asc’s compiler optimization options do. We add the redundant intermediate variables c, d, and e to the add implementation, then look at what the compiler does:

export function add(a: i32, b: i32): i32 {
  const c = a + b;
  const d = c - b;
  const e = d + b;
  return e;
}

As you can see, the unoptimized version is very similar to the previous one, except that, because of c, d, and e, the temporary values are pushed several extra times and several extra local.get / local.set operations are executed. Redundant, but it faithfully reproduces the original code.

(module
 (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
 (global $~lib/memory/__data_end i32 (i32.const 8))
 (global $~lib/memory/__stack_pointer (mut i32) (i32.const 16392))
 (global $~lib/memory/__heap_base i32 (i32.const 16392))
 (memory $0 0)
 (table $0 1 funcref)
 (elem $0 (i32.const 1))
 (export "add" (func $assembly/index/add))
 (export "memory" (memory $0))
 (func $assembly/index/add (param $0 i32) (param $1 i32) (result i32)
  (local $2 i32)
  (local $3 i32)
  (local $4 i32)
  local.get $0
  local.get $1
  i32.add
  local.set $2
  local.get $2
  local.get $1
  i32.sub
  local.set $3
  local.get $3
  local.get $1
  i32.add
  local.set $4
  local.get $4
 )
)

Looking at the optimized version, the number of lines drops significantly, in particular the local.set operations for c, d, and e. Instead of storing the result of each expression in a temporary variable, the value on the stack feeds directly into the next operation.

This is just one brief optimization example; the compiler performs many more optimizations, which you can explore in its source code.

(module
 (type $i32_i32_=>_i32 (func (param i32 i32) (result i32)))
 (memory $0 0)
 (export "add" (func $assembly/index/add))
 (export "memory" (memory $0))
 (func $assembly/index/add (param $0 i32) (param $1 i32) (result i32)
  local.get $1
  local.get $0
  local.get $1
  i32.add
  local.get $1
  i32.sub
  i32.add
 )
)

More granular manipulation of memory

Like other languages that target linear memory, AssemblyScript currently stores all data at specific offsets in linear memory, where other parts of the program can read and modify it, until the new GC proposal becomes available.

Static memory

Strings, arrays, and other constant values the compiler encounters while compiling the program are allocated in static memory. Unlike some other languages, AssemblyScript has no stack of its own and relies entirely on WebAssembly’s execution stack.

Static memory starts after the reserved memory addresses and ends at __heap_base, the start of the heap. Dynamic memory begins where static memory ends, at the start of the heap.
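
As a small sketch (assuming the standard library’s memory.data builtin), data can be placed in this static region at compile time:

// Sketch: reserving static memory at compile time with memory.data;
// these pointers are fixed offsets below __heap_base
const TABLE: usize = memory.data<i32>([1, 2, 3, 4]); // static, pre-initialized data
const SCRATCH: usize = memory.data(64);              // 64 zeroed static bytes

export function firstEntry(): i32 {
  return load<i32>(TABLE); // reads the constant 1 from static memory
}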

Dynamic memory

Dynamic memory (often referred to as the heap) is managed at run time by the garbage collector. When a program requests space for a new object, the runtime memory manager reserves an appropriate region and returns a pointer to that region to the program. Once an object is no longer needed and cannot be accessed, the garbage collector returns the object’s memory to the memory manager for reuse.

For example, the standard library’s heap API lets us dynamically allocate memory, much like in C, and we can also freely free the blocks we previously allocated.
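
A minimal sketch of that heap API, assuming the standard heap.alloc / heap.free builtins:

// Sketch: manual allocation with the heap API, analogous to malloc/free in C
export function scratchSum(): i32 {
  const ptr = heap.alloc(8);   // reserve 8 bytes of dynamic memory
  store<i32>(ptr, 40);         // write an i32 at the start of the block
  store<i32>(ptr + 4, 2);      // write a second i32 right after it
  const sum = load<i32>(ptr) + load<i32>(ptr + 4);
  heap.free(ptr);              // hand the block back to the allocator
  return sum;                  // 42
}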

WASM uses linear memory, stored at specific offsets and isolated from other programs.

In AssemblyScript, static data known at compile time is stored in static memory, while dynamic data is managed on the heap at run time. A program accesses a block of memory through a pointer. Dynamic memory is tracked by the runtime’s garbage collector and reused once the program no longer needs it.

The heap API above is used inside AssemblyScript; from JS we can also use methods such as __newString to allocate memory for AS strings. The same can be done at a lower level with __new, which is provided by the AssemblyScript loader, as shown below:

var { hello, memory, __new, __pin, __unpin } = wasm.exports;

var input = "Tomas";
var length = input.length;

// allocate memory for a String (id = 1), two bytes per UTF-16 code unit
var pt = __new(length << 1, 1);

// write the string's code units into wasm memory
var ibytes = new Uint16Array(memory.buffer);
for (let i = 0, p = pt >>> 1; i < length; ++i) ibytes[p + i] = input.charCodeAt(i);

// pin the input so the GC does not collect it while we call into wasm
var pti = __pin(pt);

// call the wasm function and pin its result
var pto = __pin(hello(pti));

// read the returned string's byte length from its header
var SIZE_OFFSET = -4;
var olength = new Uint32Array(memory.buffer)[pto + SIZE_OFFSET >>> 2];

// AssemblyScript strings are stored as UTF-16LE in memory
var obytes = new Uint8Array(memory.buffer, pto, olength);
var str = new TextDecoder("utf-16le").decode(obytes);

// unpin the objects so the GC can reclaim them
__unpin(pti);
__unpin(pto);

console.log(str);

Using fixed-size data structures for storage minimizes construction and access overhead as well as memory footprint.
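
One concrete example of such a fixed-size structure is StaticArray, whose elements are stored inline rather than behind a separate backing buffer. A small sketch:

// Sketch: StaticArray keeps its elements inline with no separate backing buffer,
// so construction and element access are cheaper than with a regular Array
export function sumOfIndices(size: i32): i32 {
  const samples = new StaticArray<i32>(size); // fixed length, cannot grow
  for (let i = 0; i < size; i++) {
    samples[i] = i;
  }
  let total = 0;
  for (let i = 0; i < size; i++) {
    total += samples[i];
  }
  return total;
}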

GC (Garbage Collection)

This mechanism changes from release to release and will keep changing in the future; here is only a brief introduction, and more details can be found on the official website.

AssemblyScript 0.17:

Used a reference-counting based garbage collector; this collector has since been discarded.

AssemblyScript 0.18:

The following three memory management modes can be specified at compile time with different compilation parameters (see the command sketch after the three runtimes below):

Incremental Runtime: combines the ITCMS garbage collection mechanism with the TLSF memory management mechanism.

ITCMS (incremental tri-color mark and sweep) is similar to memory-reclamation mechanisms found in Java. It determines whether memory can be reclaimed based on objects’ black, white, and gray coloring; it runs incrementally, so pause times are short.

TLSF (two-level segregated fit) allocates memory by managing a set of free lists.

Pinning memory blocks (referred to as pins) is especially important with the incremental runtime, because the GC may reclaim memory whenever WebAssembly code allocates. Accessing a memory block without pinning it is likely to result in random “use after free” errors that are difficult to debug.

var aPtr = exports.__pin(exports.__newString("hello")); // next line may collect
var bPtr = exports.__newString("world"); // allocates
var cPtr = exports.__pin(exports.stringConcat(aPtr, bPtr)); // puts args on stack
exports.__unpin(aPtr);
// ... do something with cPtr ...
exports.__unpin(cPtr);

Minimal runtime:

It is a simpler two-color mark and sweep (TCMS) garbage collector built on top of TLSF, and is usually a good compromise garbage collection solution.

The downside is long pause times.

Unlike the Incremental Runtime, the Minimal Runtime does not interleave collection with your code (so repeated pinning is not required); instead, a full garbage collection pass is triggered externally (via the __collect method). Don’t call __collect() too often, though, to avoid excessive marking. Used properly, the Minimal Runtime usually has better throughput than the Incremental Runtime.

One important usage difference between the Incremental Runtime and the Minimal Runtime is when pinning is required. The Minimal Runtime lets you control manually when the GC runs, while the Incremental Runtime may free memory whenever it is allocated in WebAssembly code. For example, the following snippet runs fine on the Minimal Runtime but may randomly fail on the Incremental Runtime:

var cPtr = exports.stringConcat(
  exports.__newString("hello"),
  exports.__newString("world")
);
// ... do something with cPtr ...
exports.__collect();
// don't use cPtr anymore

Stub runtime

Memory is allocated but not freed.

function compute() {
  exports.doSomeHeavyWorkProducingGarbage()
}
compute()
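
As mentioned above, the runtime variant is selected at compile time. A sketch of how that might look, passing asc’s --runtime option through this project’s npm scripts (the script name is an assumption):

npm run asbuild:optimized -- --exportRuntime --runtime incremental
npm run asbuild:optimized -- --exportRuntime --runtime minimal
npm run asbuild:optimized -- --exportRuntime --runtime stub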

In the future

The WebAssembly GC proposal is still in progress; AssemblyScript will switch to it fully in the future, so the schemes above are still tentative for now.

github.com/WebAssembly…

The limitations of AssemblyScript

  • No access to the DOM API: given its target scenarios, WebAssembly’s design is biased toward high-efficiency, high-performance use cases rather than completely replacing JS, so capabilities like the DOM API are blocked at the language layer. Precisely because the DOM API is cut off, WASM code cannot break out into the rendering interface, which is why WebAssembly is also considered a good fit for sandbox scenarios.
  • No dynamic features: the build artifact is compiled WebAssembly binary code, and dynamic language features are removed in exchange for high performance.
  • Redefined types and libraries: while the syntax is similar to JS, actual usage, the standard library, and the underlying implementation differ vastly from JS.

The AssemblyScript Loader

Finally, the AssemblyScript loader is available to help you load WASM files.

It is essentially a small module loader that makes using AssemblyScript modules as easy as possible without sacrificing efficiency. It is implemented on top of the WebAssembly API and provides operations such as allocating and reading memory for strings, arrays, and class objects. The __newString and __getString helpers used earlier are all implemented in @assemblyscript/loader.

const fs = require("fs");
const loader = require("@assemblyscript/loader/umd");
const imports = { 
  "assembly/index": {
    declaredImportedFunction: function() { }
  }
};
const wasmModule = loader.instantiateSync(fs.readFileSync(__dirname + "/build/optimized.wasm"), imports);
module.exports = wasmModule.exports;

Take a look at the core implementation of loader.instantiateSync for loading the WASM module, which wraps WebAssembly.Instance:

function instantiateSync(source, imports = {}) {
    const module = isModule(source) ? source : new WebAssembly.Module(source);
    const extended = preInstantiate(imports);
    const instance = new WebAssembly.Instance(module, imports);
    const exports = postInstantiate(extended, instance);
    return {
      module,
      instance,
      exports
    };
  }

WebAssembly language selection

With this basic understanding in hand, let’s revisit the choice of language for WebAssembly:

Rust:

  • Advantage:

    • Rust -> WebAssembly is much more mature than AssemblyScript
    • There are no GC pauses and abstractions have no overhead, which can lead to a qualitative improvement in memory footprint and performance.
  • Disadvantage:

    • Rust is low-level and costly to learn, and its programming paradigm is quite restrictive for developers
    • Maintainability within a team is relatively low, and Rust gets little attention in many companies

Application: better suited to pure performance-optimization scenarios such as algorithm optimization, image processing, and file processing.

AssemblyScript:

  • Advantage:

    • The AS runtime has a GC and supports most OOP idioms, which is a qualitative improvement in development efficiency compared with low-level native languages.
    • As the language closest to the front end, it is the easiest for the front-end ecosystem to accept.
  • Disadvantage:

    • AS is not yet mature as a language: no support for virtual overloads, limited closure support, and primitive exception handling.
    • Lower performance and flexibility than Rust/C++
    • The tool libraries in the AssemblyScript ecosystem are incomplete

Application: better suited to front-end business development scenarios, such as sandboxes, plug-ins, and mini-program logic threads.

WebAssembly practice cases

Figma

Figma is a simple, easy-to-use online collaborative design tool that supports multi-user collaboration. Its editor uses WebAssembly for layout algorithms and file parsing and is three times faster as a result. In practice the experience is comparable to Sketch, and smoother in some areas. WebAssembly as a core competency in Figma is a good landing scenario.

Figma also investigated whether WebAssembly could be used as a plugin sandbox; there is a very good article on it: www.figma.com/blog/how-we…

Bilibili Web submission system

Bilibili’s submission system also uses WebAssembly + AI to automatically pick cover images by reading the video locally, which would normally require uploading to the server.

This brings a significant improvement in performance and user experience: WebAssembly is responsible for reading the local video and generating the candidate images, offloading that work from the server.

WebAssembly on the blockchain

WebAssembly has emerged as a strong alternative to the current EVM virtual machine. Features such as host independence, security sandbox, and overall simplicity make it an ideal runtime for smart contracts. In addition, it allows contracts to be developed using multiple modern programming languages (Rust, C++, JavaScript, and so on). The Ethereum team has been trialling a WebAssembly-based contract engine, eWASM, and plans to officially release it sometime in 2021.

Summary of applications

  • File processing: uploading and slicing large files.
  • Image processing: e.g. cover generation and QR code generation.
  • AI
  • Blockchain
  • Sandboxes
  • Cross-platform: for example, WebAssembly can serve as part of the JS execution chain and as the logic thread of a mini program.