Huawei | WebAssembly security research

Author: Huawei Trusted Software Engineering and Open Source 2012 Lab

In April 2015, the W3C established the WebAssembly Community Group to oversee and standardize WebAssembly proposals and to encourage browser vendors to implement a consistent specification. WebAssembly was born in the browser, but as of 2021 it has moved beyond the browser and is poised to make computing ubiquitous.

WebAssembly is inherently safe, portable, efficient, and lightweight, making it ideal for secure sandbox scenarios. In addition to the browser space, WebAssembly is getting a lot of attention from the container, serverless function computing, and IoT/edge computing communities. The WebAssembly sandbox can even run safely in the same process as other code, much like Software Fault Isolation (SFI).

This article will discuss the application value of WebAssembly from four aspects: features, security, performance, and application domain.

WebAssembly features

WebAssembly (or Wasm) is a portable compilation format for the Web that provides smaller file sizes and faster load times. It is intended to be a compilation target for high-level languages. Currently, you can create Wasm modules with compilers for C, C++, Rust, Go, Java, C#, and more.

Current state of WebAssembly

As of today, 2021, WebAssembly’s phase goals have not been fully achieved. The official WebAssembly Roadmap shows the current state of affairs.

As you can see, the three major browsers, as well as the major Wasm runtimes outside the browser, all support about half of WebAssembly’s features. Features such as exception handling, module linking, tail calls, threads, and interface types are not yet fully supported.

In 2021, module linking and interface types are expected to be supported.

The module linking proposal aims to let modules declare, in a standard way, how two or more modules are linked at runtime, rather than requiring developers to manually write code that wires one module’s exports to another module’s imports, which is time-consuming and error-prone. If one of the modules is modified, the WebAssembly engine can handle wiring them together and possibly optimize them.

The interface types proposal will facilitate communication between modules (guests) and hosts. A WebAssembly module natively supports only four data types (32-bit integer, 64-bit integer, 32-bit floating point, and 64-bit floating point). For WebAssembly modules written in different languages to communicate with each other, interface types provide a mapping between each module’s own types and a shared set of more expressive, high-level types, so that values are translated correctly on both sides.

Wasmtime and Wasmer, which provide server-side runtimes for WebAssembly, have essentially reached general availability (GA).

WebAssembly memory model

WebAssembly provides only a sandboxed linear memory and does not provide managed memory (the heap) or a garbage collector.

Linear memory is a contiguous, byte-addressable range of memory that stretches from offset 0 to a variable memory size. This size is always a multiple of the WebAssembly page size, which is fixed at 64 KiB. Each WebAssembly instance has one designated default linear memory.

Linear memory has the following features:

All data in linear memory is always writable; there is no read-only memory.
Linear memory is always zero-initialized.
Every pointer within the bounds of linear memory is valid.
Linear memory is laid out deterministically, that is, the position of the stack can be predicted from the compiler and the program.
A module can ask the VM to grow its linear memory using the memory.grow instruction.
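
The properties listed above can be illustrated with a small model. This is only a sketch in Python, not how any real runtime is implemented; the class name LinearMemory and its methods are made up for illustration. It assumes that growing is counted in 64 KiB pages and that a failed grow is reported as -1, as the memory.grow instruction does.

```python
PAGE_SIZE = 64 * 1024  # WebAssembly page size: 64 KiB


class LinearMemory:
    """Toy model of a Wasm linear memory (hypothetical, not a real runtime)."""

    def __init__(self, initial_pages, max_pages):
        self.max_pages = max_pages
        # bytearray(n) is zero-filled, matching zero-initialized linear memory.
        self.data = bytearray(initial_pages * PAGE_SIZE)

    def size_in_pages(self):
        return len(self.data) // PAGE_SIZE

    def grow(self, delta_pages):
        """Mimic memory.grow: return the old size in pages, or -1 on failure."""
        old_pages = self.size_in_pages()
        if old_pages + delta_pages > self.max_pages:
            return -1
        self.data.extend(bytes(delta_pages * PAGE_SIZE))
        return old_pages

    def store8(self, address, value):
        # Every in-bounds address is writable; out-of-bounds accesses trap.
        if not 0 <= address < len(self.data):
            raise IndexError("trap: out-of-bounds memory access")
        self.data[address] = value & 0xFF

    def load8(self, address):
        if not 0 <= address < len(self.data):
            raise IndexError("trap: out-of-bounds memory access")
        return self.data[address]
```

Note that the model has no notion of read-only regions: any address below the current size can be overwritten, which is the property the security analysis later in this article relies on.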

Data management

Unmanaged data: data that resides in linear memory. It is not protected by the VM and is completely controlled by the memory instructions that the program executes.
Managed data: local variables, global variables, values on the evaluation stack, and return addresses, which reside in dedicated storage handled directly by the VM. WebAssembly code can only interact with managed data implicitly through instructions and cannot directly modify its underlying storage. Managed data has no address.

Because WebAssembly has only four basic types, managed data can hold only instances of these four basic types. Other non-scalar (compound) types such as strings, arrays, and lists must be stored in linear memory.

Because managed data has no address, any variable whose address is taken in the source program must also be stored in linear memory.

Memory layout

Because data in the source program can have function scope, global scope, or a dynamic lifetime, the compiler creates separate areas in linear memory for the call stack, the heap (dynamically allocated memory), and static data.

As a result, a Wasm program works with three kinds of stack:

Unmanaged stack: the call stack created by the compiler in linear memory.
Managed evaluation stack: holds the intermediate values of instructions and is managed by the VM.
Managed call stack: stores local variables, return addresses, and so on, and is managed by the VM.

Compared with ELF binaries, WebAssembly makes no clear distinction between .data, .rodata, and .bss, because linear memory has no read-only region and is always zero-initialized.

Different compilers produce different memory layouts. The figure above shows the memory layouts of Wasm binaries compiled with Emscripten’s fastcomp backend, Emscripten’s upstream backend, Clang, and rustc. The layouts produced by Clang and rustc are similar: static data is placed between the stack and the heap.

Import and export

The Wasm module provides symbolic information that the host can read in order to link to the module, including exports, imports, and the entry point.

Exports are the Wasm module components that the host can access, including:

Functions
Global variables
Memories
Tables

Functions exported by a Wasm module are type safe. The export information includes the function signature of the target function, so the host environment can validate the arguments before the function is called. This rules out invalid argument passing and improves overall program security.

A Wasm module can also export global variables and memory. These let the host and the Wasm module exchange information without calling a Wasm function. Exported global variables are also type safe, but they can only hold the basic Wasm types.

For larger data, a memory export is a better choice because it allows efficient transfer of raw data between the Wasm module and the host. However, memory exports are untyped and unstructured, so host code needs to be careful when operating on data exchanged through exported memory.

Similarly, imports refer to functions, memories, and tables from the host environment or from other modules. The required elements must be supplied before the host instantiates the Wasm module. The host environment ensures that the host code provides all the necessary elements, including properly allocated memory, tables, and functions. Module instantiation is similar to the linking step in compilation, where the code is linked to all required symbols before execution.
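
The link-time check described above can be sketched as follows. This is a simplified illustration in Python and does not reflect any particular runtime’s API; the function instantiate, the import name env.log, and the way signatures are represented as tuples of type names are all assumptions made for the example.

```python
def instantiate(required_imports, host_imports):
    """Check that the host provides every declared import with a matching signature.

    required_imports: {name: (param_types, result_types)} declared by the module.
    host_imports: {name: (param_types, result_types, callable)} offered by the host.
    Returns {name: callable} on success; raises LookupError or TypeError otherwise.
    """
    linked = {}
    for name, signature in required_imports.items():
        if name not in host_imports:
            # A missing import makes instantiation fail before any code runs.
            raise LookupError(f"missing import: {name}")
        params, results, func = host_imports[name]
        if (params, results) != signature:
            raise TypeError(f"signature mismatch for import: {name}")
        linked[name] = func
    return linked


# Usage: the module declares one import, and the host supplies it.
required = {"env.log": (("i32",), ())}
host = {"env.log": (("i32",), (), lambda value: None)}
linked_functions = instantiate(required, host)
```

The point of the sketch is the ordering: all imports are resolved and type-checked up front, so a module never starts executing with an unresolved symbol.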

Control flow safety

WebAssembly does not allow jumps to arbitrary instructions.

Instructions in a Wasm module are grouped into blocks, of which there are three types:

Regular block (block)
Conditional block (if)
Loop block (loop)

A block starts with the block, if, or loop keyword and ends with the end instruction. The end instruction is a pseudo-instruction that marks the end of the block. Operands on the stack outside a block cannot be accessed from inside the block; to pass a value into a block, the program has to store it in a local variable first. Wasm validation checks the state of the stack against the block’s result type.

Wasm has three types of branches:

Unconditional branch (br), which is always taken regardless of any external state.
Conditional branch (br_if), which consumes an operand from the stack to determine whether to branch.
Table branch (br_table), which takes an integer operand as an index to select the target block. Branches behave differently depending on the type of the target block.
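
How a branch target is resolved can be sketched in a few lines. The Python function below is a toy model, not an interpreter: resolve_branch and the (kind, start, end) tuples are invented for illustration. It assumes the usual Wasm rule that a branch names an enclosing block by its nesting depth (0 is the innermost), that branching to a block or if continues after its end, and that branching to a loop continues at the start of the loop.

```python
def resolve_branch(enclosing, depth):
    """Resolve `br depth` against the stack of enclosing constructs.

    enclosing: list of (kind, start, end) tuples, outermost first, where kind is
    "block", "if", or "loop" and start/end are instruction indexes.
    Returns the instruction index where execution continues.
    """
    if not 0 <= depth < len(enclosing):
        # Such a module would be rejected by validation before it ever runs.
        raise ValueError("invalid label")
    kind, start, end = enclosing[-1 - depth]
    # Branching to a loop re-enters it at the top; branching to a block or if
    # exits it and continues after its `end`.
    return start if kind == "loop" else end + 1


# Usage: an if (5..9) nested in a loop (2..15) nested in a block (0..20).
nesting = [("block", 0, 20), ("loop", 2, 15), ("if", 5, 9)]
targets = [resolve_branch(nesting, depth) for depth in range(3)]
```

Because a branch can only name an enclosing block, there is no way to express a jump into the middle of another function or to an arbitrary address.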

Wasm table

Wasm tables are used to exchange reference information.

Wasm tables store references to functions that a Wasm program can use to perform dynamic or indirect calls. Typically, the compiler populates the table as needed. The compiler decides whether a table is needed based on whether the source program uses function pointers for dynamic function calls.

Other cases, such as returning a function pointer from a function, also require a table. The compiler statically populates the table and uses the table index as the reference to the function.

Currently, Wasm programs cannot manipulate tables themselves.
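
The following Python sketch shows what an indirect call through a table involves. It is a simplified model, with Trap, call_indirect, and the tuple-based type representation chosen only for this example. It assumes the checks that the call_indirect instruction performs at runtime: the index must be within the table, the slot must be initialized, and the function’s type must match the type expected at the call site.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap."""


def call_indirect(table, index, expected_type, args):
    """Mimic call_indirect: bounds check, type check, then call.

    table: list of (func_type, callable) entries, or None for an empty slot.
    expected_type: the (param_types, result_types) fixed at the call site.
    """
    if not 0 <= index < len(table):
        raise Trap("undefined element")
    entry = table[index]
    if entry is None:
        raise Trap("uninitialized element")
    func_type, func = entry
    if func_type != expected_type:
        raise Trap("indirect call type mismatch")
    return func(*args)


# Usage: a table with two functions of different types and one empty slot.
I32_TO_I32 = (("i32",), ("i32",))
NOTHING_TO_I32 = ((), ("i32",))
table = [(I32_TO_I32, lambda x: x + 1), (NOTHING_TO_I32, lambda: 7), None]
result = call_indirect(table, 0, I32_TO_I32, [41])
```

The program only ever handles the integer index, never a code address, which is why a function pointer in the source language becomes a table index in the compiled module.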

WebAssembly security

The security of the WebAssembly ecosystem can be viewed along two dimensions:

Host security: the runtime environment can effectively protect the host system from malicious WebAssembly code.
Wasm binary security: built-in fault isolation prevents otherwise benign WebAssembly code from being exploited for malicious purposes.

Host Security Analysis

The sandbox and memory isolation mechanisms provided by the Wasm VM effectively reduce the attack surface. However, as WebAssembly moves out of the browser and into more general-purpose scenarios, it also faces more complex security challenges.

WASI provides a capability-based security model. WASI applications follow the principle of least privilege: an application can access only the exact resources it needs to do its work. Traditionally, if an application needs to open a file, it calls the open system call with a pathname string. The system call then checks whether the application has the relevant permissions to access the file, for example through the user/group-based permission model implemented by Linux. Such an implicit security model relies on correct security configuration: once a privileged user runs a malicious application, that application can access any resource in the system. A WASI application that needs to access a system resource such as a specific file must instead be explicitly handed an authorized file descriptor from the outside, and it cannot access any other, unauthorized resource. This dependency-injection approach avoids the potential risks of the traditional security model.
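
The difference between ambient authority and an explicitly passed capability can be sketched as follows. This Python example only mimics the idea and is not the WASI API; DirCapability, guest_main, and the file name greeting.txt are hypothetical. The guest function never receives a raw path into the host file system, only a handle to one directory that the host chose to grant.

```python
import os


class DirCapability:
    """A handle to one directory; the only way the guest can reach files."""

    def __init__(self, root):
        self.root = os.path.realpath(root)

    def resolve(self, relative_path):
        """Return the absolute path if it stays inside the granted directory."""
        candidate = os.path.realpath(os.path.join(self.root, relative_path))
        if os.path.commonpath([self.root, candidate]) != self.root:
            raise PermissionError(f"outside granted directory: {relative_path}")
        return candidate

    def open(self, relative_path, mode="r"):
        return open(self.resolve(relative_path), mode)


def guest_main(data_dir):
    # The guest receives a capability instead of ambient file system access.
    with data_dir.open("greeting.txt") as handle:
        return handle.read()
```

A host would construct DirCapability for exactly the directories it wants to expose and pass them to guest_main; a path such as ../secret is rejected because it resolves outside the granted directory.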

WASI’s security model is very different from traditional operating system security models and continues to evolve. For example, the Bytecode Alliance has proposed nanoprocesses to handle security coordination and trust propagation between application modules.

The WebAssembly/WASI security model still has shortcomings, such as:

Resource isolation:
For memory resources, WebAssembly implements a linear memory model. A WebAssembly application can only access the chunk of logical linear memory it has been given, using an index. The Wasm VM is responsible for determining the physical address of the memory. The Wasm application cannot obtain the real address of the memory and cannot launch attacks through out-of-bounds access. Therefore, it is theoretically possible to impose resource capacity limits on Wasm applications. However, some Wasm VMs do not currently enforce precise memory limits.
For CPU resources, some Wasm VMs can measure the CPU resources used by applications. However, most Wasm VMs cannot implement precise quotas, priorities, or preemptive scheduling.
Currently, Wasm has no isolation capability for I/O resources such as IOPS.
Network security:
WASI’s capability model is relatively easy to apply to file system access. However, this static security model cannot be applied to dynamic network application scenarios. In a microservice architecture, applications often perform service discovery through a service registry, which binds service callers to providers dynamically. These semantics cannot be described and injected with a static capability model. As a result, the network part of WASI’s API is still under discussion. Existing WASI network security models, and related discussions,

Wasm binary security analysis

The paper “Everything Old is New Again: Binary Security of WebAssembly” examines this issue specifically.

The paper builds a set of attack primitives based on the current features of WebAssembly.

As shown in the figure above, an attack can be constructed from the following three steps:

Obtain a write primitive.
Overwrite security-relevant data on the stack or heap.
Trigger malicious behavior by diverting control flow or manipulating the host environment.

The first step is to obtain a write primitive. Because WebAssembly lacks security mechanisms that are common in native programs, such as FORTIFY_SOURCE (buffer checking) and stack canaries (stack overflow protection), data on the unmanaged stack is easier to exploit.

There are three ways to obtain a write primitive:

Stack buffer overflow. Because linear memory is not managed by the VM, a buffer overflow can overwrite local variables that belong to other function calls.
Stack overflow. A stack overflow can be triggered if specific data passed to a function causes infinite recursion. A stack overflow in WebAssembly does not cause a segmentation fault; instead, it overwrites sensitive data outside the stack, such as data on the heap.
Heap metadata corruption. A defect in the memory allocator that ships with the WebAssembly binary is used to corrupt the allocator’s metadata on the heap.

These are not the only three ways to obtain a write primitive. Other traditional attacks, such as format string bugs and use-after-free (UAF), can also be exploited.
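
The stack buffer overflow case can be made concrete with a toy model. The Python sketch below simulates a piece of the unmanaged stack as a bytearray; the addresses, the is_admin-style flag, and the function names are invented for illustration and are not taken from the paper. It assumes a layout in which the callee’s 8-byte buffer sits directly below a local variable of its caller.

```python
BUF_ADDR = 16                     # 8-byte buffer in the callee's stack frame
BUF_SIZE = 8
FLAG_ADDR = BUF_ADDR + BUF_SIZE   # caller's local variable right behind it


def unchecked_copy(memory, dest, payload):
    """Copy like strcpy: no check against the destination buffer size."""
    for offset, byte in enumerate(payload):
        # Only the end of linear memory stops the write (IndexError ~ trap).
        memory[dest + offset] = byte


def run(user_input):
    memory = bytearray(64)        # toy linear memory, zero-initialized
    memory[FLAG_ADDR] = 0         # caller's "is admin" flag starts cleared
    unchecked_copy(memory, BUF_ADDR, user_input)
    return memory[FLAG_ADDR]


# Usage: eight filler bytes plus one extra byte reach the caller's flag.
flag_after_benign_input = run(b"short")
flag_after_overflow = run(b"A" * 8 + b"\x01")
```

Nothing in the model detects the overflow: the write stays inside linear memory, so the VM has no reason to trap, and there is no canary between the two frames.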

In the second step, once you have a write primitive, you can overwrite security-relevant data.

The data that can be overwritten includes:

Stack data. The unmanaged stack in linear memory holds function-scoped data, such as function pointers (that is, function table indexes) or arguments to security-critical functions. Although a buffer overflow cannot directly take over a function’s execution path, it can overwrite every active call stack frame that the overflow can reach.
Heap data. The heap typically contains data with a longer lifetime and stores complex data structures that are shared across functions. The heap area that the compiler allocates in linear memory has no protection mechanism, so buffer overflows or stack overflows may corrupt heap data.
Constant data. An arbitrary write primitive can change the value of any non-scalar constant in the program, including, for example, all string literals, thereby breaking guarantees that programming languages rely on.

Some Wasm runtime implementations even fail to separate the stack and heap properly, which adds to the security risk.

The third step is to trigger malicious behavior.

Redirecting indirect calls. An attacker can redirect an indirect call by overwriting an integer in linear memory. This integer could be a local variable on the unmanaged stack, part of a heap object, an entry in a vtable, or even a supposedly constant value. Because of how WebAssembly’s indirect call mechanism works, an attacker can only redirect the call within the equivalence class of functions that have the same type.
Injecting code into the host environment. For example, suppose a WebAssembly module calls eval with a “constant” code string stored in linear memory; an attacker can then overwrite that string with malicious code.
Overwriting application-specific data. Depending on the application, there may be other sensitive targets to overwrite. For example, a WebAssembly module that makes web requests through an imported function can be made to leak cookies by overwriting the target string so that it contacts a different host.
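
The first of these, redirecting an indirect call, can be sketched as a toy model in Python. All names here (log_message, run_command, get_time, CALLBACK_ADDR) are hypothetical, and the string-based types are a simplification. The sketch assumes a callback whose table index is stored in linear memory and that an attacker already has a write primitive for that address.

```python
def log_message(arg):
    return f"log:{arg}"


def run_command(arg):
    return f"exec:{arg}"


def get_time():
    return "12:00"


STR_TO_STR = "(str) -> str"
NOTHING_TO_STR = "() -> str"

# Function table: index -> (type, function).
TABLE = [
    (STR_TO_STR, log_message),
    (STR_TO_STR, run_command),
    (NOTHING_TO_STR, get_time),
]

CALLBACK_ADDR = 8  # address in linear memory that holds the callback's table index


def invoke_callback(memory, arg):
    index = memory[CALLBACK_ADDR]      # attacker-writable integer
    func_type, func = TABLE[index]
    if func_type != STR_TO_STR:        # the runtime type check of an indirect call
        raise RuntimeError("trap: indirect call type mismatch")
    return func(arg)


# Usage: overwriting one byte switches the callee to another function of the same type.
memory = bytearray(16)
before = invoke_callback(memory, "hi")
memory[CALLBACK_ADDR] = 1
after = invoke_callback(memory, "hi")
```

The type check stops a redirect to get_time, which has a different signature, but it does nothing to stop a redirect from log_message to run_command, since both belong to the same equivalence class.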

In general, these attacks build on inherent properties of WebAssembly’s linear memory, and experiments have shown them to be feasible.

Therefore, it is safer to write WebAssembly modules in a memory-safe programming language such as Rust. Separate Wasm module validation facilities are also required to ensure that each Wasm module is compliant.

Wasm sandbox isolation can be bypassed using CPU vulnerabilities

The paper Swivel: Hardening WebAssembly Against Spectre describes how Wasm sandbox isolation can be bypassed using Spectre attacks, which exploit a CPU vulnerability.

Spectre attacks manipulate the CPU’s branch predictors to induce mispredictions, which cause instructions to be executed speculatively along the wrong path. There are three categories of attacks:

Pattern history table (PHT) attack. The attacker poisons entries in the PHT so that a branch is mispredicted down the wrong path. The attacker can then use this wrong-path execution to bypass memory isolation protections or control flow integrity.
Branch target buffer (BTB) attack. The attacker poisons entries in the BTB, redirecting speculative control flow to arbitrary targets.
Return stack buffer (RSB) attack. The attacker overflows or underflows the RSB using a chain of call or return instructions, which in turn redirects speculative control flow.

Anyone can upload a malicious Wasm module to a FaaS platform and use the Spectre vulnerability to bypass Wasm sandbox isolation.

Against a Wasm sandbox, these attacks also fall into three types:

Sandbox breakout attack: The attacker crafts the control flow within their own module to access data outside the sandbox region. For example, they can use Spectre-PHT to bypass the conditional bounds check performed when accessing the indirect call table. Alternatively, they can use Spectre-BTB to transfer control flow into the middle of an instruction and execute unsafe code.
Sandbox poisoning attack: The attacker forces the victim to leak private data by influencing the control flow in the victim’s sandbox.
Host poisoning attack: The attacker uses a Spectre attack to influence the control flow of the host in order to access data from the host or any other sandbox.

To prevent Spectre attacks, the paper proposes a solution: building linear blocks.

Swivel-SFI compiles Wasm code into linear blocks. Linear blocks are straight-line blocks of code that contain no control flow instructions other than their terminators. This is in contrast to traditional basic blocks, which typically do not treat function calls as terminators. This simple distinction is important: it ensures that all control flow transfers (sequential and speculative) land on linear block boundaries. Then, by ensuring at compile time that each individual linear block is safe, Swivel can ensure that the entire Wasm program stays confined and does not violate Wasm’s isolation guarantees.
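
The distinction between basic blocks and linear blocks can be illustrated with a short sketch. This is not Swivel’s implementation, which works inside a compiler on machine code; the Python function split_blocks and the use of Wasm-style opcode names are assumptions made purely for illustration. The only thing it shows is the effect of also treating calls as block terminators.

```python
BRANCHES = {"br", "br_if", "br_table", "return"}
CALLS = {"call", "call_indirect"}


def split_blocks(instructions, calls_terminate):
    """Split a flat list of opcode names into straight-line blocks.

    With calls_terminate=False the result resembles traditional basic blocks;
    with calls_terminate=True calls also end a block, as in linear blocks.
    """
    terminators = (BRANCHES | CALLS) if calls_terminate else BRANCHES
    blocks, current = [], []
    for opcode in instructions:
        current.append(opcode)
        if opcode in terminators:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Usage: the same code yields two basic blocks but three linear blocks.
code = ["local.get", "call", "i32.add", "br_if", "i32.const", "return"]
basic_blocks = split_blocks(code, calls_terminate=False)
linear_blocks = split_blocks(code, calls_terminate=True)
```

With calls as terminators, the instruction after a call always starts a new block, so a return (whether architectural or speculative) lands on a block boundary where a per-block safety argument can be applied.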

Therefore, if you want to rely on WebAssembly as an SFI mechanism, you must also defend against attacks such as Spectre.

WebAssembly sandbox performance

Wasm execution performance

The paper Swivel: Hardening WebAssembly Against Spectre also reports some statistics on Wasm execution performance. With Swivel-SFI, a function call in a Wasm module takes about 5 microseconds.

The major Wasm runtimes were also benchmarked in the first quarter of 2021: Benchmark of WebAssembly Runtimes - 2021 Q1

Wasm execution is still an order of magnitude slower than native code execution.

Sandbox and host context switch performance

Much of the overhead of using WebAssembly’s sandbox mechanism comes from the context switches between the host and the sandbox.

The performance cost is even greater if the transitions between the sandbox and the host are hardened with additional security instructions.

In the case of Lucet, the cost of a context switch is higher because it has to guarantee security. Context switching costs come from enforcing:

Callee-save register integrity.
Initial register confidentiality.
Stack frame encapsulation.

The paper Isolation Without Taxation: Near Zero Cost Transitions for SFI provides a solution for WebAssembly SFI transitions and implements it on top of Lucet, improving font and image rendering in Firefox by 29.7% and 10% respectively. This matters because Firefox’s rendering component uses WebAssembly-based SFI implemented with Lucet, as described below.

More details on zero-cost transitions can be found in the paper.

WebAssembly and SFI

Application in Firefox

The use of RLBox sandbox in Firefox is described in detail in the paper Retrofitting Fine Grain Isolation in the Firefox Renderer.

RLBox uses static information flow enforcement and lightweight dynamic checks, expressed directly in the C++ type system. RLBox supports efficient sandboxing through either software-based fault isolation or multi-core process isolation. The performance overhead is modest and transient, with only a small impact on page latency. RLBox uses the WebAssembly sandbox and is integrated into production Firefox.

Through its type system and API, RLBox makes data and control flows at the interface between the renderer and the sandbox explicit, so that it can mediate these flows and enforce security checks across the trust boundary, ensuring that sandbox data is validated before any potentially unsafe use. The API is also designed to restrict transfers of control between the renderer and the sandbox. For example, the renderer must use sandbox_invoke() to call a function in the sandbox, and any callback from the sandbox into the renderer must first be registered by the renderer using the sandbox_callback(callback_fn) API.

RLBox ensures control flow and data flow security by providing a tainted data type that wraps any data from the sandbox:

Automated security checks: pointer swizzling, checks that sandbox-provided pointers point into sandbox memory, and tracking of where tainted data is located are all automated.
Tainted data validation is performed only when necessary.
Efficient sharing of data structures: static checks ensure that shared data is allocated in sandbox memory and accessed through the tainted type.

In addition, RLBox uses compile-time errors to force application developers to explicitly register callbacks with the sandbox_callback() function, since it is not safe to let the sandbox call back into arbitrary functions.
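
The idea behind the tainted type can be sketched in a few lines. RLBox itself is a C++ library that enforces these rules at compile time through the type system; the Python class below only mimics the pattern at runtime and is not RLBox’s API. The names Tainted, sandbox_decode_width, and allocate_row, as well as the width limit, are hypothetical.

```python
class Tainted:
    """Wraps a value that came from the sandbox and is not yet trusted."""

    def __init__(self, value):
        self._value = value

    def copy_and_verify(self, verifier):
        """Return the raw value only if the caller-supplied check accepts it."""
        if not verifier(self._value):
            raise ValueError("sandbox value failed verification")
        return self._value


def sandbox_decode_width():
    # Hypothetical sandboxed library call; its result is always wrapped.
    return Tainted(4096)


def allocate_row(tainted_width, max_width=8192):
    # The renderer must state what a sane width looks like before using it.
    width = tainted_width.copy_and_verify(
        lambda w: isinstance(w, int) and 0 < w <= max_width
    )
    return bytearray(width)


# Usage: the value crosses the trust boundary only through the verifier.
row = allocate_row(sandbox_decode_width())
```

Because the wrapped value is only exposed through copy_and_verify, code on the renderer side has to write down its validation logic at the exact point where sandbox data is first used.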

See the paper for more details.

Summary

Given WebAssembly’s features, its current security risks, and the practices and validation work in the community, a WebAssembly-based SFI mechanism is currently feasible, but still risky.

We cannot rely solely on WebAssembly’s sandbox to provide SFI. Further assurance requires additional measures, such as stronger validation of Wasm modules, security mechanisms in the Wasm runtime, a custom SFI implementation, and lower performance costs.

Appendix: WebAssembly Application Statistics

The paper “An Empirical Study of Real-World WebAssembly Binaries” extracts more than 50,000 Wasm binaries from the current WebAssembly ecosystem and analyzes them statistically from four perspectives: source language usage, vulnerabilities propagated from unsafe languages, whether Wasm is still used for cryptomining in browsers, and WebAssembly application domains.

Source languages used to write WebAssembly

While C++ was the first language to support WebAssembly, Rust is now the second most used source language for WebAssembly.

Vulnerabilities propagated from unsafe languages

As the figure above shows, almost two-thirds of the Wasm binaries use an unmanaged stack (in linear memory). The figure on the right shows what share of the functions in each binary access the stack pointer at least once. In 35% of all binaries, no function uses the stack pointer. About 33% of functions use the stack pointer, and in some binaries almost every function uses the unmanaged stack.

Combined with the earlier analysis of Wasm security, the bottom line is that WebAssembly does not make insecure languages secure.

Is Wasm still used for mining in the browser

According to the analysis, the use of Wasm for cryptocurrency mining in browsers has dropped by 99%. Presumably, attention has shifted to studying how to exploit Wasm vulnerabilities.

WebAssembly application domain

From the 50,000 Wasm binaries, 100 samples were randomly selected for a statistical analysis of Wasm application domains.

But I think that statistic is a little biased. The biggest potential for WebAssembly right now is in the cloud/edge/distributed/VR space, but this statistic doesn’t highlight that.

There are some great projects emerging around WebAssembly cloud/Edge/distributed/VR.

wasmCloud, a distributed microservice platform built with Wasm, which created the waPC protocol for secure and easy communication between guest and host.
Wagi, which uses WebAssembly to build a WebAssembly Gateway Interface.
Makepad, a project created by the authors of the Cloud9 IDE, which uses Wasm and WebGL to provide a VR development platform for developers.