Abstract Syntax trees are the building blocks for JS translation, CSS preprocessing, code compression, ESLint, Prettier, etc

What is the AST

Is a tree representation of the abstract structure of source code. Each node in the tree represents a construct that appears in the source code

The generation of AST

Lexical analysis

In the lexical analysis stage, the input source code string is scanned to generate a series of lexical units, including numbers, punctuation marks, operators and so on, and the lexical units are independent of each other. (Split code statements, regardless of their relationships)

Code: Hello('krystal') 
--->
Tokenization 
Token:[ 
        { type:'Identifer',value:'Hello' }, 
        { type:'Punctuator',value:'{' }, 
        { type:'String',value:'krystal' }, 
        { type:'Punctuator',value:'}' } 
     ]
Copy the code

Syntax analysis

The syntax analysis is converted from the generated token list to an AST, which represents each fragment in the code statement and the connections between them. The AST is a deeply nested object that represents the code itself and can give us more information about the code.

Token: [ { type:'Identifer',value:'Hello' }, { type:'Punctuator',value:'{' }, { type:'String',value:'krystal' }, { type:'Punctuator',value:'}' } ] ---> AST:{ "program"{ "type": "Program", "body": {"type":" expression": {"type":" CallExpression", // CallExpression" callee": { "type":"Identifier", "name": "Hello" }, "arguments": { "type": "StringLiteral" "value": "krystal" } } } } }Copy the code

Code generation

The first step is to go through a transformation to generate a new AST, which simply modifies the AST generated in the previous phase. During AST transformation, you can add, remove, and replace nodes, and just create a new AST based on the existing AST

old AST ---> new AST:{ "program":{ "type":"Program", "body":{ "type": "CallExpression", "name": "Hello", "params": {"type": "StringLiteral", "value": "krystal"}}}}Copy the code

What follows is a walk through the AST tree, traversing the DFS way to each node.

  • Program – Starts at the top level of the AST
  • CallExpression – Moves to the first element of the program list
  • CallExpression(Hello) – Moves to the first element of the list of params in CallExpression

For the tree traversal here, the concept of access arises, that is, step by step operations on elements of the object structure. You need to create an accessor object that provides methods that accept different node types.

According to the examples I gave, only StringLiteral, CallExpression is involved. Each time a matching node is encountered, the accessor’s corresponding method is called.

var visitor = { StringLiteral(node,parent) {} CallExpression(node, parent) {} }

Access needs to be accompanied by the possibility of exit, so based on the example given, the final access flow looks like this

-> Program(enter) 
    ->CallExpression(enter) 
        ->StringLiteral(enter) 
        <-StringLiteral(exit) 
    <-CallExpression(exit) 
<- Program(exit)
Copy the code

The final accessor should be designed as follows

Var visitor = {StringLiteral: {enter(node,parent){} exit(node,parent){}} CallExpression: { enter(node,parent){} exit(node,parent){} } }Copy the code

The generated code

The code generator recursively calls all nested nodes on the AST object until everything is printed into a regular code string.

Basic structure of AST

General specification for JS compilers – the basic definition of AST structures in ESTree

AST usage scenarios

  • Code highlighting, formatting, error prompts, autocomplete
  • Code compression: uglifyJS
  • Code translation: Webpack, Babel, TS