This article is shared under a Creative Commons Attribution 4.0 International license, by Troland.

See the GitHub address for updates to this series here.

This is Chapter 14 of the How JavaScript Works series.

Overview

We all know that running a large chunk of JavaScript code can be bad for performance. The code not only has to be transferred over the network, it also has to be parsed, compiled into bytecode, and finally executed. Previous articles covered topics such as the JS engine, the runtime and call stack, and the V8 engine used by Google Chrome and Node.js. They all play an important role in the overall running of JavaScript.

Today’s topic is also important: learn how most JavaScript engines parse text into machine-readable code, what happens after the transformation, and how developers can take advantage of this knowledge.

Principles of programming languages

So first, let's review the basics of programming languages. No matter which programming language you use, some piece of software has to process the source code so that the computer can understand it. That software is either an interpreter or a compiler. Whether you use an interpreted language (JavaScript, Python, Ruby) or a compiled one (C#, Java, Rust), they all have one thing in common: parsing the source code, which is plain text, into a data structure called an abstract syntax tree (AST). The AST not only represents the source code in a structured way, it also plays a critical role in semantic analysis, where the compiler validates that the program and its language elements are used correctly. After that, the AST is used to generate the actual bytecode or machine code.

Uses of ASTs

ASTs are not used only by language interpreters and compilers; they have several other uses in the computing world. One of the most common is static code analysis. Static analyzers do not run the code they are given, but they still need to understand its structure. For example, you might want to implement a tool that finds common code structures so you can refactor them and reduce duplication. You could probably do this with string comparison, but that approach would be crude and limited.

Of course, if you are interested in building such a tool, you do not have to write the parser yourself; there are plenty of open-source parsers that are fully compatible with the ECMAScript specification, Esprima and Acorn being two of the most popular. There are also tools that can generate code from a parser's output, such as escodegen.

ASTs are also widely used in code transformation. For example, you might want to implement a transpiler that converts Python code to JavaScript. The basic idea is to use a Python parser to generate an AST, and then use that AST to generate JavaScript code. You may find this hard to believe, but an AST is simply a different representation of a program. Before parsing, the program is text that obeys the syntactic rules of the language; after parsing, it is a tree structure containing essentially the same information as the input text. Therefore, you can also go in the opposite direction and turn the tree back into text.
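
As a quick illustration (a minimal sketch, assuming Node.js with the acorn and escodegen npm packages installed), this is roughly how parsing source text into an AST and regenerating code from it looks:

// A minimal sketch: parse source text into an AST with acorn,
// then generate source text back from that AST with escodegen.
const acorn = require('acorn');
const escodegen = require('escodegen');

const source = 'function add(a, b) { return a + b; }';

// Parse the plain-text source into an ESTree-compatible AST
const ast = acorn.parse(source, { ecmaVersion: 2020 });

console.log(ast.body[0].type); // "FunctionDeclaration"

// Turn the AST back into source text (the reverse direction)
console.log(escodegen.generate(ast));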

JavaScript parsing

Let’s take a look at the construction of AST. Take this simple JavaScript function as an example:

function foo(x) {
    if (x > 10) {
        var a = 2;
        return a * x;
    }

    return x + 10;
}

The parser produces the following AST.
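
A heavily simplified sketch of that tree, using ESTree-style node names (the exact shape depends on the parser), might look something like this:

{
    type: "FunctionDeclaration",
    id: { type: "Identifier", name: "foo" },
    params: [{ type: "Identifier", name: "x" }],
    body: {
        type: "BlockStatement",
        body: [
            {
                type: "IfStatement",
                test: {
                    type: "BinaryExpression",
                    operator: ">",
                    left: { type: "Identifier", name: "x" },
                    right: { type: "Literal", value: 10 }
                },
                consequent: {
                    type: "BlockStatement",
                    body: [
                        { type: "VariableDeclaration" /* var a = 2; */ },
                        { type: "ReturnStatement" /* return a * x; */ }
                    ]
                }
            },
            { type: "ReturnStatement" /* return x + 10; */ }
        ]
    }
}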

Note that this is a simplified version of the parser output for demonstration purposes; the actual AST is much more complex. The point, however, is that building this tree is the very first step before the source code can be run. You can visit AST Explorer to see a real AST: it is an online tool where you paste in JavaScript code and it outputs the AST for that code.

You might be asking why you need to know how a JavaScript parser works at all; after all, the browser takes care of running your JavaScript for you. And you are partly right. The chart below shows the time spent in the different stages of JavaScript execution. Look closely and you might find something interesting.

Did you spot it? Typically, browsers spend about 15 to 20 percent of their total execution time parsing JavaScript. And these are not numbers I made up; they are statistics gathered from real-world applications and websites that use JavaScript in various ways. Now, 15 percent may not sound like much, but trust me, it is. A typical single-page application loads about 0.4 MB of JavaScript, and the browser needs roughly 370 ms to parse it. Again, you might say that this is not a lot on its own. But remember, this is only the time it takes to turn the JavaScript code into an AST. It does not include execution itself or the other work that happens during page load, such as rendering CSS and HTML. And that is only on desktop. Things get worse on mobile: parsing the same code typically takes 2 to 5 times longer in a mobile browser than in a desktop one.

The chart above shows how long it takes different mobile and desktop browsers to parse 1MB of JavaScript code.

On top of that, web applications keep getting more complex as more and more business logic moves to the front end to provide a more native-like user experience. Web apps are getting heavier and heavier, and you can easily imagine the performance impact this has. Simply open your browser's developer tools and measure how much time is spent on parsing, compiling, and everything else that happens in the browser until the page is fully loaded.
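
If you prefer to read some of these numbers from code instead of the Performance panel, here is a minimal sketch using the standard Navigation Timing API (assumes a reasonably recent browser):

// Logs a few coarse page-load milestones, in milliseconds since navigation start.
// Parse/compile time is not broken out here; the Performance panel shows that in detail.
const [nav] = performance.getEntriesByType('navigation');
if (nav) {
    console.log('Response finished:', nav.responseEnd.toFixed(0), 'ms');
    console.log('DOMContentLoaded finished:', nav.domContentLoadedEventEnd.toFixed(0), 'ms');
    console.log('Load event finished:', nav.loadEventEnd.toFixed(0), 'ms');
}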

Unfortunately, mobile browsers usually do not ship developer tools for this kind of performance inspection. Don't worry, though: that is what the DeviceTiming tool is for. It helps measure the parsing and execution time of scripts in a controlled environment. It works by wrapping local scripts with instrumentation code, so that every time a page is hit from a different device, the parsing and execution times are measured locally.

The good news is that JavaScript engines already do a lot of work to avoid redundant effort and become more efficient. Here are some of the techniques used by the major browsers.

For example, V8 implements script streaming and code caching. Script streaming means that async and deferred scripts are parsed on a separate thread as soon as their download begins, so parsing finishes almost as soon as the download does. This alone can improve page load time by around 10%.

JavaScript code is normally compiled into bytecode every time a page is visited, but that bytecode is thrown away as soon as the user navigates to another page, because compiled code depends heavily on the state and context of the machine at compile time. This is where code caching, introduced in Chrome 42, comes in. It stores the compiled code locally, so when the user returns to the same page, the downloading, parsing, and compiling steps can all be skipped. This saves Chrome roughly 40% of parsing and compilation time. As a bonus, it also saves battery on mobile devices.

In Opera, the Carakan engine could reuse the compiled output of another program that had been compiled recently, and the code did not even have to come from the same page or the same domain. This caching technique is very effective and can skip the compilation step entirely. It relies on typical user behavior and browsing scenarios: whenever a user follows a certain journey through an application or website, the same JavaScript code gets loaded. However, Carakan has long since been replaced by Google's V8 engine.

The SpiderMonkey engine used by Firefox does not cache everything. It can transition into a monitoring stage in which it records how often a given script is executed; based on these counts, it determines which parts of the code are hot and worth optimizing.

And obviously, some decided not to bother at all. Maciej Stachowiak, the lead developer of Safari, has pointed out that Safari does not cache compiled bytecode. Caching is something they have considered, but they have not implemented it because code generation accounts for less than 2% of total execution time.

These optimizations do not directly reduce the time it takes to parse JavaScript source code; rather, they try to avoid the parsing work altogether whenever possible, which is even better.

There are also many ways to reduce an application's initial load time. Minimize the amount of JavaScript you ship: less code means less parsing and less execution. To do this, you can deliver only the code needed for a particular route instead of loading one big bundle; the PRPL pattern, for example, is built around this kind of code delivery. Alternatively, you can audit your dependencies and see whether useless or redundant ones are bloating the code base. These topics, however, deserve an article of their own.

The goal of this article is to look at what we, as web developers, can do to help the JavaScript parser do its job faster. Modern JavaScript parsers use heuristics to decide whether a given piece of code will run immediately or be deferred until some point in the future. Based on these heuristics, the parser performs either eager or lazy parsing. Eager parsing handles the functions that need to be compiled right away; it does three main things: build the AST, build the scope hierarchy, and find all syntax errors. Lazy parsing, on the other hand, is used for functions that do not need to be compiled yet; it does not build an AST or find all syntax errors, it only builds the scope hierarchy, which takes roughly half the time of eager parsing.

Obviously, this is not a new concept. Even older browsers like IE9 support this optimization technique, albeit in a crude way compared to how modern parsers work.

Let's look at an example. Take the following code snippet:

function foo() {
    function bar(x) {
        return x + 10;
    }

    function baz(x, y) {
        return x + y;
    }

    console.log(baz(100, 200));
}

As before, feeding this code to the parser produces an AST, which can be described as follows:

  • Declare a function bar that accepts one argument, x. It has a single return statement that returns the sum of x and 10.
  • Declare a function baz that accepts two arguments, x and y. It has a single return statement that returns the sum of x and y.
  • Call baz with the arguments 100 and 200.
  • Call console.log with the return value of the previous call as its argument.

So what did the parser do here? It noticed the declaration of bar, the declaration of baz, the call to baz, and the call to console.log. But it also did some work that was completely unnecessary: it parsed the body of bar. Why is that unnecessary? Because bar is never called (or at least not at this point). This is a trivial example and may seem unusual, but in many real-world programs a great number of declared functions are never called at all.

This is where lazy parsing helps: instead of parsing the body of bar, the parser only notes that the function is declared, and the real parsing happens later, on demand, when the function is actually about to run. Lazy parsing still has to find the end of the function body and register the declaration, but it does not build a syntax tree for the body, because the body will not be processed yet. It also does not allocate memory for that tree on the heap, which usually consumes a significant amount of resources. In short, skipping these steps gives a big performance improvement.

So in the previous example, the parser effectively does something like this instead:
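
Conceptually (this is just a rough sketch, not real engine output), the result looks more like this:

{
    type: "FunctionDeclaration",
    id: { name: "foo" },
    body: {
        type: "BlockStatement",
        body: [
            // bar: only the declaration is recorded; its body is skipped,
            // because nothing suggests it will ever be called
            { type: "FunctionDeclaration", id: { name: "bar" }, body: "<skipped>" },

            // baz: fully parsed, since it is invoked right below
            { type: "FunctionDeclaration", id: { name: "baz" } /* full body */ },

            // console.log(baz(100, 200))
            { type: "ExpressionStatement" /* the two nested calls */ }
        ]
    }
}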

Note that the declaration of bar is merely acknowledged; the parser never enters the body of bar. In this case the body is a single return statement, but in most real-world code a function body can be much larger, containing multiple return statements, conditionals, loops, variable declarations, and even nested function declarations. Parsing all of that for a function that is never called would be a complete waste of time and resources.

The concept itself is fairly simple, but putting it into practice is anything but. And it is not limited to the example above: the same approach applies to functions, loops, conditionals, objects, and so on; in other words, to everything that has to be parsed.

For example, the following is a fairly common pattern for implementing JavaScript modules.

var myModule = (function() {
    // whole module logic
    // return module object
})();

This pattern is recognized by most modern JavaScript parsers and indicates that the code inside needs to be parsed immediately.

So why don't parsers simply lazy-parse everything? If code that is about to run is parsed lazily, it ends up being parsed twice: once lazily, and then once more eagerly right before it runs. That is about 50% slower than eager parsing alone.

Now that you have a general idea of what happens under the hood, let's look at how you can help the parser out. You can write your code in a way that lets functions be parsed at the right time. There is one pattern that most parsers recognize: wrapping a function in parentheses. This tells the parser that the function will be invoked immediately; if the parser sees an opening parenthesis followed by a function declaration, it parses the function eagerly. By explicitly marking a function as one that runs immediately, you help the parser speed things up.

Let's say we have a function foo:

function foo(x) {
    return x * 10;
}

The browser will parse it lazily, because there is no obvious indication that the function needs to run immediately. However, if we are sure this is not what we want, we can take two steps.

First, store the function as a variable.

var foo = function foo(x) {
    return x * 10;
};

Notice the function name between the function keyword and the opening parenthesis of the argument list. This is not required, but it is recommended, because when an exception is thrown the stack trace will contain the actual function name instead of <anonymous>.

With only that change, however, the parser will still parse the function lazily. To fix this, make one small adjustment: wrap the function in parentheses.

var foo = (function foo(x) {
    return x * 10;
});

The parser now sees the left parenthesis before the function keyword and immediately parses it.

Doing this by hand is not very practical, because it requires knowing when the parser will be lazy and when it will parse eagerly, and it forces developers to stop and think about whether each function needs to be parsed immediately. Nobody wants to take that trouble, and it certainly makes the code harder to read and understand. This is where optimize.js can help. It is a tool dedicated to optimizing the initial load time of JavaScript source code: it performs static analysis on your code and wraps the functions that need to run immediately in parentheses, so the browser parses them eagerly and has them ready to execute.

So, you code as usual and write something like this:

(function() {
    console.log('Hello, World!');
})();

Everything looks good: the opening parenthesis comes before the function keyword, so the code will be parsed eagerly. Of course, the code needs to be minified before going into production. Here is the minifier's output:

!function(){console.log('Hello, World!')}();

Everything still looks fine, and the code works as expected. But something is missing: the minifier removed the parentheses wrapping the function and replaced them with a single exclamation mark. That removes the hint, so the parser falls back to lazy parsing; and since the function is invoked right away, it then has to be eagerly parsed anyway, immediately after the lazy parse. All of this makes the code run slower. Fortunately, optimize.js can fix this. Passing the minified code through optimize.js produces the following output:

!(function(){console.log('Hello, World!')})();

Now you get the best of both worlds: the code is minified, and the parser correctly identifies which functions to parse lazily and which to parse eagerly.
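
For reference, a minimal sketch of how this could be wired into a build step with the optimize-js npm package (its exported function takes a source string and returns the transformed string) might look like this:

// Assumes: npm install optimize-js
const optimizeJs = require('optimize-js');

const minified = "!function(){console.log('Hello, World!')}();";

// Re-wraps immediately-invoked functions in parentheses so that parsers
// eager-parse them instead of lazy-parsing and then re-parsing them.
const optimized = optimizeJs(minified);

console.log(optimized); // roughly: !(function(){console.log('Hello, World!')})();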

Precompilation

Why not do all of this on the server side? After all, it is better to do the work once and serve the result to clients than to force every client to repeat it. Well, there is an ongoing discussion about whether engines should offer a way to execute precompiled scripts to save browsers that time. In essence, the idea is to have a server-side build tool generate bytecode, so that only the bytecode needs to be transferred and executed on the client, which would make a noticeable difference in startup time. It sounds tempting, but it is not that simple. It might even backfire: bytecode is typically larger than the source code, and it would probably need to be signed and validated for security reasons before it could be executed. The V8 team, for example, is working to avoid the internal re-parsing problem instead, so precompilation may not end up being that useful.

Some suggestions for improving the speed of web applications

  • Check your dependencies and remove the unnecessary ones.
  • Split the code into smaller chunks instead of shipping one big bundle, for example with Webpack's code splitting.
  • Load JavaScript as lazily as possible: load only the code the current route needs, for instance by importing a module only when an element is clicked (see the sketch after this list).
  • Use developer tools and DeviceTiming to find performance bottlenecks.
  • Use tools like optimize.js to help the parser choose between eager and lazy parsing.
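
As a minimal sketch of the lazy-loading idea from the list above (the module path and element IDs are made up for illustration), a dynamic import defers downloading, parsing, and compiling a module until it is actually needed:

// The chart module is fetched, parsed, and compiled only when the user asks
// for it, keeping it out of the initial page load entirely.
document.querySelector('#show-chart').addEventListener('click', async () => {
    const { renderChart } = await import('./chart.js'); // hypothetical module
    renderChart(document.querySelector('#chart-container'));
});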

Extra

Sometimes, especially in mobile browsers, pages are cached so that hitting the back or forward button restores them instantly (the back/forward cache). In some cases you may not want this behavior, and you can detect it like so:

window.addEventListener('pageshow', (event) => {
    // Check whether the page was restored from the back/forward cache
    if (event.persisted || (window.performance && window.performance.navigation.type === 2)) {
        // Handle it accordingly
    }
});


See the GitHub address for updates to this series here.