Author: Shiwei Lee

As a front-end programmer, do you use webpack, rollup, Babel, esLint a lot? They are packaging tools, code compilation tools, syntax checking tools. How did they do it? This article introduces the abstract syntax tree, which is the technique they use. Should you know more about it?

There are no esoteric theories, and no large chunks of code. We start from scratch, and there are no obstacles for young people to read. Through this article, you will understand the basics of AST and how to use it.

preface

What is an abstract syntax tree?

  • AST (Abstract Syntax Tree) is a Tree representation of the Abstract Syntax structure of source code. The following diagram illustrates the representation of an abstract syntax tree for a piece of JavaScript code.

What is the use of abstract syntax trees?

  • IDE error prompts, code formatting, code highlighting, code completion, etc
  • JSLint, JSHint, ESLint checks for code errors or styles, etc
  • Webpack, rollup for code packaging, etc
  • Babel converts ES6 to ES5 syntax
  • Injected code counts unit test coverage

directory

  • 1. The AST parser
  • 2.AST in Babel
  • 3.Demo with esprima
  • 4. There is

1. The AST parser

1.1 JS Parser

How is AST generated?

  • A tool that converts JavaScript source code into abstract syntax trees (AST) is called the JS Parser.

The parsing process of JS Parser consists of two parts

  • Lexical Analysis: Partitioned the entire code string into a minimal array of syntax units
  • Syntax Analysis: Establishes and analyzes the relationships between grammatical units on the basis of word segmentation

Common AST Parser

  • In the early days there were Uglifyjs and Esprima
  • Espree, based on esprima, for ESLint
  • Acorn, which boasts better performance and is smaller than Esprima
  • Babylon, from Acorn, for Babel
  • Babel-eslint, maintained by the Babel team for use with ESLint

1.2 Lexical Analysis

A grammatical unit is the smallest unit of parsed grammar that has practical meaning and is simply understood as a word in a natural language.

Syntax units in Javascript code include the following:

  • Keywords: such as var, let, const, etc
  • Identifier: A contiguous character that is not enclosed in quotes. It may be a variable, or keywords such as if or else, or built-in constants such as true or false
  • Operators: +, -, *, /, etc
  • Numbers: hexadecimal, decimal, octal and scientific expressions
  • String: Because the contents of a string are evaluated or displayed by the computer
  • Space: successive Spaces, line feeds, indentation, etc
  • Comment: A line comment or a block comment is the smallest syntax unit that cannot be split
  • Others: braces, parentheses, semicolons, colons, etc

1.3 Syntax Analysis

Combining the results of word segmentation, determine the relationship between words, determine the final expression meaning of words, generate abstract grammar tree.

1.4 the sample

  • Use esprima to parse the assignment statement as an example:
var a = 1;
Copy the code
  • As you can see, the result of word segmentation is an array, where each element is a minimal syntactic unit:
[
    {
        "type": "Keyword",
        "value": "var"
    },
    {
        "type": "Identifier",
        "value": "a"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "Numeric",
        "value": "1"
    },
    {
        "type": "Punctuator",
        "value": ";"
    }
]
Copy the code
  • The results of grammar analysis are as follows, and the results of word segmentation are formed into a tree structure according to their mutual relations:
{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "a"
                    },
                    "init": {
                        "type": "Literal",
                        "value": 1,
                        "raw": "1"
                    }
                }
            ],
            "kind": "var"
        }
    ],
    "sourceType": "script"
}
Copy the code

1.5 Tool Website

esprima/parser

  • Classic JavaScript abstract syntax tree parser, the site provides very rich functionality
  • You can view segmentation and abstract syntax trees online
  • Syntax displays abstract Syntax trees, Tokens displays participles
  • Performance comparisons of various Parses are also provided, and it appears that Acorn performs a little better.

AST Explorer

  • A visual tool site for AST that enables AST transformations of code using various parses

AST Parsing Specification (The Estree Spec)

  • The same JavaScript code will yield the same AST from each parser because they all refer to the same AST parsing specification
  • The Estree Spec is a JavaScript AST specification for The SpiderMonkey engine output by Mozilla engineers. You can also refer to SpiderMonkey in MDN

2.AST in Babel

Having covered the AST, let’s take a look at how Babel uses the AST.

How Babel works

Babel works in three stages: Parse, Transform and generate

  • The Parse phase transforms the source code into an AST
  • Transform phase, using a variety of plug-ins for code conversion
  • In the generator stage, the AST is converted into code using code generation tools

The Parse – Parse

  • Babel parses the code using @babel/ Parser, and the input JS code string generates the AST according to the ESTree specification
  • The parser Babel uses is Babylon

The Transform – conversion

  • Receive the AST and iterate over it, adding, updating, and removing nodes during this process. This is part of the Babel plug-in.
  • Babel provides @babel/traverse methods to maintain the overall state of the AST tree. The methods take the original AST and custom conversion rules as parameters, and return the converted AST.

Generated by the Generator –

  • The code generation step converts the final AST (after a series of transformations) into string code and creates source maps.
  • Iterate through the AST and build a string that represents the transformed code.
  • Babel uses @babel/ Generator to convert the modified AST into code, which can be configured to compress and remove comments, and supports sourceMap.

3.Demo with esprima

Now that we know how Babel works, we will write a demo according to Babel’s three steps to deepen our understanding of AST.

We are going to use Esprima to simulate two code-conversion capabilities:
  • Change == to congruent ===
  • Change parseInt(a) to parseInt(a,10)

Before conversion code, before.js:

function fun1(opt) { if (opt.status == 1) { console.log('1'); } } function fun2(age) { if (parseInt(age) >= 18) { console.log('2'); }}Copy the code

Expected converted code, after.js:

The function fun1 (opt) {if (opt) status = = = 1) {/ / = = = = = become the console. The log (' 1 '); }} function fun2(age) {if (parseInt(age, 10) >= 18) {//parseInt(a,10) console.log('2'); }}Copy the code
  1. Start with the tool kit
Const esprima = require('esprima'); //JS syntax tree module const estraverse = require('estraverse'); Const escodeGen = require('escodegen'); //JS syntax tree decompress module const fs = require('fs'); // Read and write filesCopy the code
  1. Use esprima Parse to convert source code to AST. How about that? It’s easy. It’s a code.
const before = fs.readFileSync('./before.js', 'utf8');
const ast = esprima.parseScript(before);
Copy the code
  1. Walk through the AST to find the code that matches the transformation rules to transform
estraverse.traverse(ast, { enter: (node) => { toEqual(node); // change == to congruent === setParseInt(node); // change parseInt(a) to parseInt(a,10)}});Copy the code
  1. Let’s look at the implementation of toEqual and setParseInt
function toEqual(node) { if (node.operator === '==') { node.operator = '==='; }} function setParseInt(node) { If (node.type === 'CallExpression' && node.callee.name === 'parseInt' && Node.arguments. Length === 1) {if (node.type === 'CallExpression' && node.callee.name === 'parseInt' && Node.arguments. Node.arguments. Push ({// add arguments, literally "type": "Literal", "value": 10, "raw": "10"}); }}Copy the code
  1. Finally, the transformed AST generated string code is written to the file.
// Generate target code const code = escodegen. Generate (ast); / / write file fs. ExistsSync (')/after) js') && fs. UnlinkSync (')/after) js'); fs.writeFileSync('./after.js', code, 'utf8');Copy the code

Ok, open the after.js file and see if it has been converted successfully. Is it what we expect? Is there a Babel feeling? Yes, Babel does the same thing, but its conversion rule functions are quite complex and require a lot of work to consider various JavaScript syntactic situations, which is at the heart of Babel.

If we go back to the demo we wrote, we followed Babel’s three steps exactly. The first and third steps, parse and generate, are very simple, in a sentence, not much to say. The focus is on the implementation of the Transform rule function, and one might ask, how do you know that the toEqual and setParseInt conversion functions are written this way?

Ok, to answer this question, let’s look at the AST before and after the code conversion of these two rules.

  • Change == to congruent ===

AST of a== B is as follows:

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "BinaryExpression",
                "operator": "==",
                "left": {
                    "type": "Identifier",
                    "name": "a"
                },
                "right": {
                    "type": "Identifier",
                    "name": "b"
                }
            }
        }
    ],
    "sourceType": "script"
}
Copy the code

The AST of a=== B is as follows:

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "BinaryExpression",
                "operator": "===",
                "left": {
                    "type": "Identifier",
                    "name": "a"
                },
                "right": {
                    "type": "Identifier",
                    "name": "b"
                }
            }
        }
    ],
    "sourceType": "script"
}
Copy the code

Compare the two AST columns to see if only one “operator” field is different: one is == and the other is ===.

Let’s look at the toEqual function. Is that clear? Simply modify the value of node.operator to complete the conversion.

function toEqual(node) { if (node.operator === '==') { node.operator = '==='; }}Copy the code
  • Change parseInt(a) to parseInt(a,10)

The AST of parseInt(a) is as follows:

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {
                    "type": "Identifier",
                    "name": "parseInt"
                },
                "arguments": [
                    {
                        "type": "Identifier",
                        "name": "a"
                    }
                ]
            }
        }
    ],
    "sourceType": "script"
}
Copy the code

ParseInt (a, 10);

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {
                    "type": "Identifier",
                    "name": "parseInt"
                },
                "arguments": [
                    {
                        "type": "Identifier",
                        "name": "a"
                    },
                    {
                        "type": "Literal",
                        "value": 10,
                        "raw": "10"
                    }
                ]
            }
        }
    ],
    "sourceType": "script"
}
Copy the code

Compare the two AST’s, see? The arguments array is overloaded with the following element.

{
    "type": "Literal",
    "value": 10,
    "raw": "10"
}
Copy the code

So in the transformation rule function, we add this element to the transformation. Is it very simple?

Function setParseInt(node) {function setParseInt(node) { If (node.type === 'CallExpression' && node.callee.name === 'parseInt' && Node.arguments. Length === 1) {if (node.type === 'CallExpression' && node.callee.name === 'parseInt' && Node.arguments. Node.arguments. Push ({// add arguments, literally "type": "Literal", "value": 10, "raw": "10"}); }}Copy the code

Okay, so that’s it. I think the Demo makes sense.

4. There is

Now you know how the AST works and how to use it. So let’s look at a problem and see what we’ve learned.

If a is an object, var a = {b: 1}, then which is more efficient, a.b or a[‘b’]?

A.b [‘b’] a.b [‘b’] a.b [‘b’] In fact, tests have been done and the performance difference between the two is close, with A.B performing slightly better than a[‘ B ‘]. So why is A.B slightly better than a[‘b’]?

In my opinion, a.b can directly resolve b as an attribute of a, while a[‘b’] may have an additional judgment process, because the contents of [] may be a variable or a constant.

This may seem plausible, but isn’t it? Is there any evidence to support this claim?

Well, the only way to explain this is to start with the V8 engine.

Javascript code can run on CPU, mainly due to the JS engine, V8 engine is developed by Google, used in Chrome and NodeJS, is a classic JS engine. Parser (AST) ->Ignition (Bytecode) ->TurboFan (Machine Code)

  • Parser: Converts JavaScript source code to Abstract Syntax Tree (AST)
  • Ignition: Interpreter converts the AST to Bytecode, interprets and executes the Bytecode; Also gather information needed for TurboFan optimization compilation, such as the types of function arguments
  • TurboFan: Compiler, the compiler, uses the type information gathered by Ignitio to convert Bytecode into optimized assembly code

Parser – AST Parser

  • You should be familiar with the AST parser we introduced today

Ignition – interpreter

  • The AST is parsed into a kind of assembly language that looks similar to assembly language, called Bytecode. This language is CPU-independent, and the Bytecodes generated on machines with different cpus are the same.

TurboFan – the compiler

  • As you know, each CPU has a different architecture and instruction set, and the corresponding assembly language will be different. In this step, V8 parses Bytecode into assembly language suitable for different cpus. V8 supports assembly languages for more than a dozen cpus.

Now let’s compare a[‘b’] with a[‘b’] in V8

  • The test code of a.b is as follows:
function test001() {
    var a = { b: 1 };
    console.log(a.b)
}
test001();
Copy the code
  • The test code for a[‘b’] is as follows:
function test002() {
    var a = { b: 1 };
    console.log(a['b'])
}
test002();
Copy the code

Take a look at the Bytecode they generated

  • The Bytecode code of a.b is as follows:
[generated bytecode for function: test001]
Parameter count 1
Frame size 32
   16 E> 000001F6C03D7192 @    0 : a0                StackCheck 
   33 S> 000001F6C03D7193 @    1 : 79 00 00 29 fa    CreateObjectLiteral [0], [0], #41, r1
         000001F6C03D7198 @    6 : 27 fa fb          Mov r1, r0
   46 S> 000001F6C03D719B @    9 : 13 01 01          LdaGlobal [1], [1]
         000001F6C03D719E @   12 : 26 f9             Star r2
   54 E> 000001F6C03D71A0 @   14 : 28 f9 02 03       LdaNamedProperty r2, [2], [3]
         000001F6C03D71A4 @   18 : 26 fa             Star r1
   60 E> 000001F6C03D71A6 @   20 : 28 fb 03 05       LdaNamedProperty r0, [3], [5]
         000001F6C03D71AA @   24 : 26 f8             Star r3
   54 E> 000001F6C03D71AC @   26 : 57 fa f9 f8 07    CallProperty1 r1, r2, r3, [7]
         000001F6C03D71B1 @   31 : 0d                LdaUndefined 
   63 S> 000001F6C03D71B2 @   32 : a4                Return 
Constant pool (size = 4)
Handler Table (size = 0)
Copy the code
  • The Bytecode code for a[‘b’] is as follows:
[generated bytecode for function: test002]
Parameter count 1
Frame size 32
   16 E> 0000022E1C7D6DC2 @    0 : a0                StackCheck 
   33 S> 0000022E1C7D6DC3 @    1 : 79 00 00 29 fa    CreateObjectLiteral [0], [0], #41, r1
         0000022E1C7D6DC8 @    6 : 27 fa fb          Mov r1, r0
   46 S> 0000022E1C7D6DCB @    9 : 13 01 01          LdaGlobal [1], [1]
         0000022E1C7D6DCE @   12 : 26 f9             Star r2
   54 E> 0000022E1C7D6DD0 @   14 : 28 f9 02 03       LdaNamedProperty r2, [2], [3]
         0000022E1C7D6DD4 @   18 : 26 fa             Star r1
   59 E> 0000022E1C7D6DD6 @   20 : 28 fb 03 05       LdaNamedProperty r0, [3], [5]
         0000022E1C7D6DDA @   24 : 26 f8             Star r3
   54 E> 0000022E1C7D6DDC @   26 : 57 fa f9 f8 07    CallProperty1 r1, r2, r3, [7]
         0000022E1C7D6DE1 @   31 : 0d                LdaUndefined 
   66 S> 0000022E1C7D6DE2 @   32 : a4                Return 
Constant pool (size = 4)
Handler Table (size = 0)
Copy the code

Compare the bytecodes of the two and you will see that they are identical, which means that there is no performance difference between the two implementations at the Bytecode layer and below. In fact, they’re different, so you have to look up, and there’s only the Parser stage up there. So let’s see what the difference is in their AST.

  • The AST code of a.b is as follows:
{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "MemberExpression",
                "computed": false,
                "object": {
                    "type": "Identifier",
                    "name": "a"
                },
                "property": {
                    "type": "Identifier",
                    "name": "b"
                }
            }
        }
    ],
    "sourceType": "script"
}
Copy the code
  • The AST code for a[‘b’] is as follows:
{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "MemberExpression",
                "computed": true,
                "object": {
                    "type": "Identifier",
                    "name": "a"
                },
                "property": {
                    "type": "Literal",
                    "value": "b",
                    "raw": "'b'"
                }
            }
        }
    ],
    "sourceType": "script"
}
Copy the code

The only difference we found is that a.b is false and a[‘b’] is true for the “computed” attribute, which means that a[‘b’] has one more computation than A.B when it is parsed into AST. Here, we conclude, is the slight difference. All right, we got the proof, so there should be no doubt now.

finishing

Not only have you learned about AST, but you have also learned how the V8 engine parses JS code. If you find this article useful to you, please give it a thumbs up, thank you very much (90 degree bow).

For more shared articles from skFeTeam, click herehere, thank you ~