Originally written by Lin Clark. Translated from English by the Nuggets Translation Project (translator: Badd; proofreaders: PassionPenguin, Usualminds).

Get JavaScript running fast on WebAssembly

JavaScript runs many times faster in browsers today than it did 20 years ago, thanks to browser vendors' sustained investment in performance optimization over that period.

Now we have to start optimizing JavaScript performance in completely different environments, where the rules of the game are different too. And it's WebAssembly that makes it possible for JavaScript to run in these environments.

Let's be clear here: if you're running JavaScript in the browser, just deploy plain JavaScript. The browser's JavaScript engine is carefully tuned to run the JavaScript that's shipped to it very quickly.

But what if you're running JavaScript in a serverless function? Or in an environment like iOS or a game console that doesn't allow the usual just-in-time (JIT) compilation? How do you keep performance under control there?

In those scenarios, you'll want to pay attention to this new round of JavaScript optimizations. These optimizations are also useful if you want to make other runtimes, such as Python, Ruby, or Lua, fast in the same scenarios.

But before we start exploring how to optimize in different environments, we need to understand the basics.

How does it work?

Whenever you run a JavaScript program, the JavaScript code has to be translated into machine code in order to execute. The JavaScript engine does this using a range of techniques, such as interpreters and JIT compilers. (See A crash course in just-in-time (JIT) compilers for details.)

But what if the platform you want to run your application on doesn’t have a JavaScript engine? Then you need to deploy the JavaScript engine along with the application code.

To let JavaScript run anywhere, we deploy the JavaScript engine itself as a WebAssembly module, which makes it portable across machine architectures. And with WASI, it's portable across operating systems, too.

This means the entire JavaScript runtime environment is bundled up inside a WebAssembly instance. Once you've deployed it, all you need to do is feed JavaScript code in, and the WebAssembly instance takes care of running it.
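To make this concrete, here's a minimal host-side sketch using Node.js as a stand-in host. The module name js-engine.wasm, and the convention that it takes a script path as an argument, are my assumptions for illustration, not details from the actual setup described here.

```js
// Minimal host-side sketch (Node.js). Assumes a hypothetical
// js-engine.wasm: a WASI command module that takes a script path
// as an argument, reads it, interprets it, and exits.
import { readFile } from "node:fs/promises";
import { WASI } from "node:wasi";

const wasi = new WASI({
  version: "preview1",
  args: ["js-engine", "app.js"], // argv as seen by the engine
  preopens: { "/": "." },        // let the engine read ./app.js
});

const wasm = await WebAssembly.compile(await readFile("./js-engine.wasm"));
const instance = await WebAssembly.instantiate(wasm, wasi.getImportObject());

// Runs the module's _start export: the engine parses and runs app.js
// entirely inside the Wasm instance.
wasi.start(instance);
```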

Instead of working directly with the machine's memory, the JavaScript engine puts everything, from bytecode to the garbage-collected objects that the bytecode manipulates, into the Wasm module's linear memory.

For the JavaScript engine, we chose SpiderMonkey, the engine used in Firefox. It's one of the industrial-strength JavaScript virtual machines (VMs), battle-tested in the browser. That kind of battle testing matters when you're running untrusted code, or code that handles untrusted input.

SpiderMonkey also uses a technique called precise stack scanning, which is important for some of the optimizations I'll describe below. And it has a very approachable code base, which matters because developers from three different organizations, Fastly, Mozilla, and Igalia, are collaborating on it.

So far, nothing I've described is revolutionary. People have been running JavaScript on WebAssembly like this for years.

The problem is, it's slow. WebAssembly doesn't let you dynamically generate new machine code and run it from within pure Wasm code. That means you can't use just-in-time compilation; you can only use the interpreter.

Knowing this limitation, you might ask:

So why talk about performance tuning?

Given that just-in-time compilation is how browsers made JavaScript fast, and that JIT compilation isn't possible inside a WebAssembly module, wanting to speed things up here can seem counterintuitive.

But what if, even without just-in-time compilation, we could somehow make JavaScript run faster?

Let's look at a few examples of what we'd gain if WebAssembly could run JavaScript fast.

Running JavaScript on iOS (and other JIT-restricted environments)

There are environments where you can't use just-in-time compilation, for security reasons: for example, iOS apps without special entitlements, and some smart TVs and game consoles.

On these platforms, you have to use an interpreter. But the applications people want to run there are often long-running, with lots of code: exactly the conditions under which you'd least want an interpreter, because interpreters slow execution down so much.

If we can speed up JavaScript in these environments, developers could use JavaScript on platforms that don't allow just-in-time compilation without worrying about performance.

Instant cold starts for serverless

There are other scenarios where just-in-time compilation is allowed but startup time is the bottleneck, such as serverless functions. This is the cold-start latency problem you've probably heard about.

Even if you use the most stripped-down JavaScript environment, an isolate that starts up nothing but a bare JavaScript engine, you're looking at around 5 milliseconds of startup latency at a minimum. And that doesn't count any of the time needed to initialize the application itself.

There are ways to hide the startup latency behind the incoming request. But as proposals like QUIC shrink connection times at the network layer, latency becomes harder to hide. It's harder still when you're chaining multiple serverless functions together.

Platforms that use these techniques to hide latency also often reuse instances across multiple requests. In some cases, that means global state can be observed between different requests, which is a security problem.
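Here's a minimal sketch of that problem, my own example using a generic fetch-style handler (the exact platform API doesn't matter):

```js
// Module scope is initialized once per instance, not once per request.
// If the platform reuses this instance, every request shares this state.
let lastUser = null;

export async function handler(request) {
  const previous = lastUser;                 // leftover from an earlier request
  lastUser = request.headers.get("x-user");  // observable by the next request
  return new Response(`previous caller: ${previous}`);
}
```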

Because of this cold-start problem, developers often don't follow best practices. They cram lots of functions into one serverless deployment. That creates another security problem: a larger blast radius. If one part of the deployment is breached, the attacker has access to the whole deployment.

But if we can get JavaScript startup times low enough in the scenario above, we won’t have to bother hiding startup times anymore. We can launch an instance in microseconds.

If we can do this, we can spin up a fresh instance for every request, and global state never spans multiple requests.

Moreover, because these instances are so lightweight, developers can freely break their code into finer-grained pieces, shrinking the blast radius of each piece to a minimum.

This approach has another security advantage. Beyond instances being lightweight and code being easier to isolate at a fine granularity, the Wasm engine can also provide stronger security guarantees.

JavaScript engines that have to provide isolation on their own are huge code bases, full of low-level code performing extremely complex optimizations. That makes it easy for bugs to creep in that let an attacker escape the VM and gain access to the system the VM is running on. That's why browsers like Chrome and Firefox go to great lengths to run websites in fully isolated processes.

In contrast, a Wasm engine has very little code to audit, and much of it is written in a memory-safe language like Rust. And the memory isolation of the native binary generated from a WebAssembly module can be verified.

By running JavaScript inside a Wasm engine, we get this safer outer sandbox as another line of defense.

Therefore, making JavaScript run faster on the Wasm engine is beneficial in these scenarios. So how do we do that?

To answer this question, we need to figure out where the JavaScript engine is spending its time.

The two places where JavaScript spends its time

We can roughly break down what the JavaScript engine does into two parts: initialization and runtime.

I think of the JavaScript engine as a contractor. The contractor is hired to complete a job: run the JavaScript code and produce a result.

Initialization phase

Before the contractor can actually start running the project, he needs to do a little preparatory work. This initialization phase includes all operations that only need to be run once at the beginning of execution.

Application initialization

Whatever the project is, our contractor needs to understand the client's requirements and then allocate the resources needed to complete the job.

For example, our contractor scans the project brief and other supporting documents and turns them into something he can work with, such as setting up the project management system and filing and organizing all the documents.

From the JavaScript engine's perspective, this work looks like reading through the top-level source code, parsing functions into bytecode, allocating memory for declared variables, and setting values where they're already defined.
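For a concrete example (a trivial module of my own, not from the original article), everything below is initialization-phase work:

```js
// Everything here runs during the initialization phase: the engine
// parses the file, compiles greet to bytecode, allocates slots for
// LIMIT and config, and executes the top-level statements to fill them in.
const LIMIT = 100;
const config = { retries: 3, verbose: false };

function greet(name) {
  return `hello, ${name} (limit ${LIMIT})`;
}

// Only calls made later, like greet("world"), belong to the runtime phase.
```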

Engine initialization

In some scenarios, such as serverless, there's another chunk of work that needs to happen before application initialization.

That’s engine initialization. The engine itself needs to be started first, and built-in functions need to be added to the environment.

I think of this process as setting up the office before starting work: assembling the IKEA tables and chairs, that sort of thing.

This can take a significant amount of time, and it's part of what makes cold starts such a problem for serverless use cases.

Runtime phase

Once the initialization phase is over, the JavaScript engine can start running the code.

We call the speed at which this work gets done throughput, and many factors affect it. For example:

  • Which language features the code uses
  • Whether the code behaves predictably from the JavaScript engine's point of view (illustrated in the sketch after this list)
  • Which data structures it uses
  • Whether the code runs long enough to benefit from the engine's optimizing compiler
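Here's a small sketch of the predictability point (my own example, not from the article): the engine can specialize the property access in sumX when every object has the same shape, but not when shapes vary.

```js
// Predictable: every element has the same shape, so the engine can
// specialize the property access in sumX to a fixed offset.
const points = [{ x: 1, y: 2 }, { x: 3, y: 4 }, { x: 5, y: 6 }];

// Unpredictable: three different shapes at the same access site force
// the engine to keep checking which kind of object it's looking at.
const mixed = [{ x: 1, y: 2 }, { y: 4, x: 3 }, { x: 5, y: 6, z: 7 }];

function sumX(list) {
  let total = 0;
  for (const p of list) total += p.x; // throughput depends on shape predictability
  return total;
}

console.log(sumX(points)); // 9, monomorphic: easy to optimize
console.log(sumX(mixed));  // 9, polymorphic: slower path
```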

So those are the two phases where JavaScript spends its time.

So how do we make these two phases work faster?

Drastically reducing initialization time

Let's start with Wizer, the tool we use to speed up the initialization phase. I'll explain how it works in a moment, but for the impatient, here's what it achieves when running a very small JavaScript application.

Running that small application with Wizer takes 0.36 milliseconds (360 microseconds). That's more than 13 times faster than the plain JavaScript isolate approach mentioned earlier.

This is possible because of snapshotting. Nick Fitzgerald explains it in more detail in his talk about Wizer at the WebAssembly Summit.

So how does that work? Before deploying the code, as part of the build step, we run the JavaScript code with the JavaScript engine up to the end of initialization.

By this point, the JavaScript engine has parsed all the JavaScript into bytecode and stored it in linear memory, and it has also done a lot of allocating and initializing of memory.

Because linear memory is so self-contained, once all the values have been filled in, we can simply take that memory and attach it as a data section to a Wasm module.

When the JavaScript engine module is instantiated, it has access to all the data in that data section. Whenever the engine needs some of this memory, it can copy the section (or rather, page) it needs into its own linear memory. This means the JavaScript engine doesn't have to do any setup work at startup; all the pre-initialized data is ready and waiting.
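As a sketch of what gets snapshotted (my own example): all the top-level work below runs once at build time under Wizer, and its results are frozen into the module's data section, so no deployed instance pays for it at startup.

```js
// Runs at build time under Wizer; the resulting engine state
// (parsed bytecode, this table, the compiled regex) is frozen
// into the Wasm module's data section.
const ROUTES = new Map([
  ["/", renderHome],
  ["/about", renderAbout],
]);
const SLUG = /^[a-z0-9-]+$/;

function renderHome() { return "home"; }
function renderAbout() { return "about"; }

// Only this runs per request, against the pre-initialized state.
export function handle(path) {
  return ROUTES.has(path) ? ROUTES.get(path)() : "404";
}
```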

For now, we attach this data section to the JavaScript engine module itself. But in the future, once module linking is available, we'll be able to ship the data section as a separate module, which lets a single JavaScript engine module be reused by many different JavaScript applications.

This leads to a really clean decoupling.

The JavaScript engine module contains only the code for the engine itself. This means that once compiled, this part of code can be efficiently cached and reused by multiple different instances.

The application-specific module, on the other hand, contains no Wasm code. It contains only the linear memory, which in turn holds the JavaScript bytecode along with the initialized JavaScript engine state. That makes it very easy to move this memory around and send it wherever it needs to go.

It's as if our contractor no longer needs to fit out an office at all. A complete office arrives ready-made, with all the equipment in place and tuned up, so the JavaScript engine can just walk in and start working.

And what's cool is that none of this relies on anything JavaScript-specific; it just uses existing properties of WebAssembly. So you could use the same trick with Python, Ruby, Lua, or any other runtime.

Next step: Improve throughput

So that's how we make startup time extremely short. But how do we optimize throughput?

For some use cases, throughput is actually already fine. If a JavaScript application runs only very briefly, it never makes it past the interpreter in a browser either, because just-in-time compilation wouldn't have time to kick in. In that case, throughput matches the browser's, and the run finishes before a traditional JavaScript engine has even completed initialization.

But for longer-running JavaScript, the JIT kicks in fairly quickly in a browser. Once that happens, the throughput difference becomes significant.

As I said above, just-in-time compilation isn't possible inside pure WebAssembly. But it turns out we can apply some of the ideas behind just-in-time compilation to an ahead-of-time (AOT) compilation model.

Fast AOT-compiled JavaScript code (without profiling)

One optimization technique used by JITs is inline caching. With inline caching, the JIT creates a linked list of stubs, each containing a fast machine-code path for one of the ways a given piece of JavaScript bytecode has been run so far. (See A crash course in just-in-time (JIT) compilers for more details.)

You need a linked list because JavaScript is a dynamically typed language. Every time the same line of code runs with a different type, a new stub has to be generated and added to the list. But if you've encountered that type before, you can simply reuse the stub that was already generated.
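For example (my own illustration), this single property access site accumulates one stub per object shape it sees:

```js
function getX(obj) {
  return obj.x;          // one IC site: each new shape seen here adds a stub
}

getX({ x: 1 });          // shape {x}    -> stub 1 is generated
getX({ x: 2, y: 3 });    // shape {x, y} -> stub 2 is appended to the list
getX({ x: 4 });          // shape {x} again -> stub 1 is reused (fast path)
```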

Because inline caches (ICs) are commonly associated with JITs, people assume they have to be very dynamic and specific to each particular program. But in fact, they can be used in an AOT setting, too.

Even before we've seen the JavaScript code, we already know which IC stubs we'll need to generate. That's because JavaScript has a set of patterns that come up over and over.

Accessing object properties is a good example. It's extremely common in JavaScript code, and IC stubs can speed it up. For objects with a given "shape" or "hidden class" (that is, objects whose properties are laid out at fixed locations), reading a particular property always finds it at the same offset.

Traditionally, a JIT would hard-code two values into this kind of IC stub: a pointer to the shape and the property's offset. That requires information we don't have ahead of time. But what we can do is parameterize the stub: treat the shape and the property offset as variables that are passed in.
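In pseudo-JavaScript, here's a conceptual sketch of mine of the difference. Real stubs are generated machine code operating on engine internals; shape, slots, and SHAPE_A are stand-in names, not actual engine APIs.

```js
// Conceptual sketch: stand-ins for engine internals.
const MISS = Symbol("ic-miss"); // sentinel: fall back to the slow path
const SHAPE_A = {};             // stand-in for an engine-internal shape pointer

// Traditional JIT stub: the shape check and the slot offset are baked
// into the generated code as constants.
function loadX_forShapeA(obj) {
  if (obj.shape !== SHAPE_A) return MISS;
  return obj.slots[0];          // offset 0 is hard-coded
}

// Parameterized AOT stub: one compiled stub serves every shape and
// offset, because both are passed in rather than compiled in.
function loadProperty(obj, expectedShape, offset) {
  if (obj.shape !== expectedShape) return MISS;
  return obj.slots[offset];
}
```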

This way, we can compile a single stub that loads its values from memory, and then reuse that same stub code everywhere. Even in the browser setting, this kind of IC sharing is beneficial: it lets the JavaScript engine generate less machine code, which improves startup time and makes better use of the local instruction cache.

For our use case, though, IC sharing is especially important. It means we can bundle all of the stubs for these common patterns into a single AOT-compiled module, independent of the implementation details of any particular JavaScript program.

We found that a few kilobytes of IC stubs are enough to cover the vast majority of JavaScript code. For example, 2 KB of IC stubs covers 95% of the JavaScript in the Google Octane benchmark, and preliminary tests suggest this ratio holds for ordinary web browsing as well.

So with this optimization approach, we should be able to reach throughput on par with early JITs. Once we get there, we can add more fine-grained optimizations to polish performance further, just as the browser vendors' JavaScript engine teams did with their early JITs.

Next step: maybe a little profiling?

That’s all we can do ahead of time without knowing what a program does or what kind of data it uses.

But what if we had access to profiling data, the way a just-in-time compiler does? Then we could optimize the code across the board.

But this raises a problem: developers often find it hard to profile their own code. It's not easy to come up with representative code samples. So we don't yet know whether we can get good profiling data.

If we can find the right tools for profiling, though, it's still possible that JavaScript code could run as fast as today's just-in-time compilation (and with no warm-up time!).

How can you get started today?

We're excited about this new approach and about pushing it further. We're also eager to see other dynamically typed languages come to WebAssembly in this way.

So here are a few ways to get started today. And if you have questions, you can ask them in Zulip.

For other platforms that want to support JavaScript

To run JavaScript on your own platform, you need to embed a WebAssembly engine that supports WASI. We are using Wasmtime.

Then you need a JavaScript engine. As part of this work, we added full support for compiling SpiderMonkey to WASI to Mozilla's build system. And Mozilla will add SpiderMonkey's WASI build to the same CI setup used to build and test Firefox. This makes WASI a production-quality target for SpiderMonkey, ensuring that the WASI build keeps working over time. That means you can use SpiderMonkey the same way we've described in this article.

Finally, you need users to deploy pre-initialized JavaScript code. To help with that, we've also open-sourced Wizer, which you can integrate into a build tool that produces an application-specific WebAssembly module containing the pre-initialized memory for use with the JavaScript engine module.

For other languages that want to use this approach

If you work with Python, Ruby, Lua, or another language, you can build a version of this for your language, too.

First, compile the language's runtime to WebAssembly, using WASI for its system calls, as we did with SpiderMonkey. Then, to get the fast startup times, integrate Wizer into a build tool to generate the memory snapshot, as described above.
