Advanced front-end: the JS abstract syntax tree

Post synced to Github/Blog

Babel and Babylon

Babel is a JavaScript compiler, or more precisely a source-to-source compiler, commonly known as a transpiler. This means that you give Babel some JavaScript code, Babel modifies it, and returns the newly generated code to you.

Babel is a versatile JavaScript compiler. It also has many modules for different forms of static analysis.

Static analysis is the process of analyzing code without executing it (analyzing code while it runs is dynamic analysis). Static analysis serves many purposes: syntax checking, compilation, code highlighting, code transformation, optimization, minification, and so on.

Babylon is Babel's parser. It started as a fork of the Acorn project. Acorn is fast, easy to use, and has a plugin-based architecture for non-standard features (and for features that will become standard in the future).

Babylon has since moved into the Babel monorepo and been renamed babel-parser (published on npm as @babel/parser).

First, let’s install it.

$ npm install --save babylon

Start by parsing a code string:

import * as babylon from "babylon";

const code = `function square(n) {
  return n * n;
}`;

babylon.parse(code);
// Node {
//   type: "File",
//   start: 0,
//   end: 38,
//   loc: SourceLocation {...},
//   program: Node {...},
//   comments: [],
//   tokens: [...]
// }

We can also pass options to the parse() method as follows:

babylon.parse(code, {
  sourceType: "module", // default: "script"
  plugins: ["jsx"] // default: []
});

sourceType can be "module" or "script", which tells Babylon which mode to parse in. "module" parses in strict mode and allows module declarations; "script" does not.

Note: sourceType defaults to "script", so Babylon reports an error when it encounters import or export. Pass sourceType: "module" to avoid these errors.

Because Babylon uses a plugin-based architecture, there is a plugins option that switches the built-in plugins on and off. Note that Babylon has not yet opened this API to external plugins, although it may do so in the future.

Parse

The parse step takes the code and outputs an abstract syntax tree (AST). It is broken down into two phases: lexical analysis and syntactic analysis.

Lexical analysis

The lexical analysis phase transforms the string code into a stream of tokens.

You can think of tokens as a flat array of syntax fragments:

n * n;

[
  { type: { ... }, value: "n", start: 0, end: 1, loc: { ... } },
  { type: { ... }, value: "*", start: 2, end: 3, loc: { ... } },
  { type: { ... }, value: "n", start: 4, end: 5, loc: { ... } },
  ...
]

Each type has a set of attributes that describe the token:

{
  type: {
    label: 'name',
    keyword: undefined,
    beforeExpr: false,
    startsExpr: true,
    rightAssociative: false,
    isLoop: false,
    isAssign: false,
    prefix: false,
    postfix: false,
    binop: null,
    updateContext: null
  },
  ...
}

Like AST nodes, they also have start, end, and loc properties.
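To make the idea of a token stream concrete, here is a minimal sketch of a tokenizer for simple snippets such as n * n;. It is not Babylon's tokenizer: the tokenize function and its simplified type labels ("name", "num", "punct") are invented for illustration, and only start and end offsets are recorded.

```javascript
// Hypothetical mini-tokenizer, for illustration only (not Babylon's implementation).
function tokenize(code) {
  const tokens = [];
  // Alternatives: an identifier, an integer, or any single non-space character.
  const pattern = /([A-Za-z_$][\w$]*)|(\d+)|(\S)/g;
  let match;
  while ((match = pattern.exec(code)) !== null) {
    const type = match[1] ? "name" : match[2] ? "num" : "punct";
    tokens.push({
      type,
      value: match[0],
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return tokens;
}

const tokens = tokenize("n * n;");
console.log(tokens.map((t) => `${t.value}@${t.start}-${t.end}`).join(" "));
// n@0-1 *@2-3 n@4-5 ;@5-6
```

The result is a flat array: nothing in it yet says that * combines the two n tokens. Recovering that structure is the job of the next phase.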

Syntax analysis

The syntactic analysis phase transforms the token stream into an abstract syntax tree (AST). It uses the information in the tokens to build an AST structure that makes subsequent operations easier.

Each step in this process involves creating or manipulating an abstract syntax tree, also known as an AST.

Babel uses an AST based on, and modified from, ESTree. Its core spec can be found at https://github.com/babel/babel/blob/master/doc/ast/spec.md.

function square(n) {
  return n * n;
}

The AST Explorer tool gives you a better sense of the AST nodes: paste in the code above to inspect its tree.

This program can be represented as a JavaScript Object like this:

{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "square"
  },
  params: [{
    type: "Identifier",
    name: "n"
  }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: {
          type: "Identifier",
          name: "n"
        },
        right: {
          type: "Identifier",
          name: "n"
        }
      }
    }]
  }
}

You’ll notice that each layer of the AST has the same structure:

{
  type: "FunctionDeclaration",
  id: {...},
  params: [...],
  body: {...}
}

{
  type: "Identifier",
  name: ...
}

{
  type: "BinaryExpression",
  operator: ...,
  left: {...},
  right: {...}
}

Note: some properties have been removed for simplicity.

Each of these layers is also called a Node. An AST can consist of a single node or hundreds or thousands of nodes. Together, they describe program syntax for static analysis.

Each node has the following interface:

interface Node {
  type: string;
}

The type field is a string representing the type of the node (for example, “FunctionDeclaration”, “Identifier”, or “BinaryExpression”). Each type of node defines additional attributes that further describe the node type.

Babel also generates additional attributes for each node that describe its position in the original code.

{
  type: ...,
  start: 0,
  end: 38,
  loc: {
    start: {
      line: 1,
      column: 0
    },
    end: {
      line: 3,
      column: 1
    }
  },
  ...
}

Each node has properties like start, end, and loc.
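Because every node is a plain object with a string type, a tree can be explored with a short recursive function. The sketch below is only an illustration: the walk helper is hypothetical (it is not part of Babylon), and the ast object is the square example from above with the position fields left out.

```javascript
// AST for: function square(n) { return n * n; } (position fields omitted)
const ast = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "square" },
  params: [{ type: "Identifier", name: "n" }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "n" },
        right: { type: "Identifier", name: "n" },
      },
    }],
  },
};

// Hypothetical helper: call visit(node) for every object that has a string `type`.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

const seen = [];
walk(ast, (node) => seen.push(node.type));
console.log(seen.join(" > "));
```

The walk visits eight nodes in depth-first order, starting with the FunctionDeclaration and ending with the two Identifier operands of the multiplication.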

Variable declarations

code

let a  = 'hello'

AST

VariableDeclaration

A variable declaration. The kind property indicates which kind of declaration it is, since ES6 introduced const and let. The declarations property holds one declarator per declared variable, because a single statement can declare several: let a = 1, b = 2;.

interface VariableDeclaration <: Declaration {
    type: "VariableDeclaration";
    declarations: [ VariableDeclarator ];
    kind: "var" | "let" | "const";
}

VariableDeclarator

Description of a variable declaration, where id represents the variable name node and init represents an expression for the initial value, which can be null.

interface VariableDeclarator <: Node {
    type: "VariableDeclarator";
    id: Pattern;
    init: Expression | null;
} 

Identifier

Identifiers are the names we define when writing JS: variable names, function names, and property names are all identifiers. The corresponding interface looks like this:

interface Identifier <: Expression, Pattern {
    type: "Identifier";
    name: string;
}

An identifier may be an expression or a destructuring pattern (the destructuring syntax introduced in ES6). We will see Expression and Pattern later.

Literal

Literals here do not mean [] or {}, but literals that semantically represent a single value, such as 1, "hello", true, and regular expressions such as /\d?/ (which have an extended node to represent them). Let's look at the definition in the spec:

interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp;
}

The value property holds the literal's value. As the interface shows, a literal can be a string, boolean, null, number, or regular expression.
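Putting the three interfaces together, the declaration let a = 'hello' can be written out by hand as a plain object. This is a sketch that follows the ESTree-style interfaces shown above, with position fields omitted; Babel's own parser emits more specific literal types (such as StringLiteral) in place of Literal. The describe helper is hypothetical and assumes every id is a plain Identifier and every init is a Literal.

```javascript
// Hand-written AST for: let a = 'hello'
// (ESTree-style shape; position fields omitted).
const declaration = {
  type: "VariableDeclaration",
  kind: "let",
  declarations: [{
    type: "VariableDeclarator",
    id: { type: "Identifier", name: "a" },
    init: { type: "Literal", value: "hello" },
  }],
};

// Hypothetical helper: summarize each declarator as "kind name = value".
function describe(node) {
  return node.declarations.map((declarator) => {
    const name = declarator.id.name; // assumes id is a plain Identifier
    if (declarator.init === null) return `${node.kind} ${name}`;
    return `${node.kind} ${name} = ${JSON.stringify(declarator.init.value)}`;
  });
}

console.log(describe(declaration).join("\n"));
```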

Binary operation expression

code

let a = 3+4

AST

BinaryExpression

Binary operation expression node, left and right represent two expressions left and right of the operator, and operator represents a binary operator.

interface BinaryExpression <: Expression {
    type: "BinaryExpression";
    operator: BinaryOperator;
    left: Expression;
    right: Expression;
}

BinaryOperator

Binary operator, all values are as follows:

enum BinaryOperator {
    "==" | "!=" | "===" | "!=="
         | "<" | "<=" | ">" | ">="
         | "<<" | ">>" | ">>>"
         | "+" | "-" | "*" | "/" | "%"
         | "|" | "^" | "&" | "in"
         | "instanceof"
}
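One way to see why this shape is convenient is to evaluate it. The sketch below is a hypothetical evaluator, not anything Babel provides: it handles only numeric Literal nodes and four arithmetic operators, and the expression object is a hand-written ESTree-style tree for 3 + 4 with position fields omitted.

```javascript
// ESTree-style AST for the expression 3 + 4 (position fields omitted).
const expression = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 3 },
  right: { type: "Literal", value: 4 },
};

// Hypothetical evaluator covering only numeric literals and four operators.
function evaluate(node) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "BinaryExpression": {
      // Evaluate both operands first, then combine them.
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      switch (node.operator) {
        case "+": return left + right;
        case "-": return left - right;
        case "*": return left * right;
        case "/": return left / right;
        default: throw new Error(`Unsupported operator: ${node.operator}`);
      }
    }
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(evaluate(expression)); // 7
```

Because left and right are themselves expressions, the same function handles nested trees such as (3 + 4) * 2 without any extra code.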

If statement

code

if(a === 0){
}

AST

IfStatement

An if statement node typically has three properties. The test property represents the expression inside the parentheses of if (...).

The consequent property is the statement executed when the condition is true, usually a block statement.

The alternate property represents the else branch. It is usually a block statement, but it can also be another if statement node, as in if (a) { /* ... */ } else if (b) { /* ... */ }. alternate can of course be null.

interface IfStatement <: Statement {
    type: "IfStatement";
    test: Expression;
    consequent: Statement;
    alternate: Statement | null;
}
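The nesting of alternate is easiest to see on an else-if chain. In the sketch below, chain is a hand-written tree for if (a) {} else if (b) {} else {} with position fields omitted, and listBranches is a hypothetical helper that follows the alternate links.

```javascript
// Hand-written AST for: if (a) {} else if (b) {} else {}
const chain = {
  type: "IfStatement",
  test: { type: "Identifier", name: "a" },
  consequent: { type: "BlockStatement", body: [] },
  alternate: {
    type: "IfStatement",
    test: { type: "Identifier", name: "b" },
    consequent: { type: "BlockStatement", body: [] },
    alternate: { type: "BlockStatement", body: [] },
  },
};

// Hypothetical helper: follow `alternate` links and label each branch.
function listBranches(node) {
  const branches = [];
  let current = node;
  while (current !== null && current.type === "IfStatement") {
    branches.push(branches.length === 0 ? "if" : "else if");
    current = current.alternate;
  }
  // Whatever remains after the last IfStatement is a plain else branch.
  if (current !== null) branches.push("else");
  return branches;
}

console.log(listBranches(chain).join(", ")); // if, else if, else
```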

Common AST node types

Common AST node types are defined in Babylon as follows:

Node objects

AST nodes that conform to the spec are represented as Node objects, which implement the following interface:

interface Node {
    type: string;
    loc: SourceLocation | null;
}

The type field identifies the node type; each type and the JavaScript syntax it corresponds to are covered in more detail below. The loc field holds the source location information: it is null if no such information is available, and otherwise an object containing the start and end positions. The interface is as follows:

interface SourceLocation {
    source: string | null;
    start: Position;
    end: Position;
}

The Position object contains line and column information; lines start at 1 and columns start at 0:

interface Position {
    line: number; // >= 1
    column: number; // >= 0
}
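The relationship between a character offset (such as start or end) and a Position can be sketched with a small function. positionAt is a hypothetical helper written for this article, not a Babylon API, and it treats only "\n" as a line break.

```javascript
// Hypothetical helper: convert a character offset into a Position
// (line counted from 1, column from 0). Only "\n" line breaks are handled.
function positionAt(code, offset) {
  let line = 1;
  let column = 0;
  for (let i = 0; i < offset && i < code.length; i++) {
    if (code[i] === "\n") {
      line += 1;
      column = 0;
    } else {
      column += 1;
    }
  }
  return { line, column };
}

const code = "function square(n) {\n  return n * n;\n}";
console.log(positionAt(code, 0));           // line 1, column 0
console.log(positionAt(code, code.length)); // line 3, column 1
```

For the three-line square function, offset 0 maps to line 1, column 0 and offset 38 (the end of the code) maps to line 3, column 1, which matches the loc values shown earlier.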

Identifier

Identifiers are the names we define when writing JS, such as variable names, function names, and property names. The corresponding interface looks like this:

interface Identifier <: Expression, Pattern {
    type: "Identifier";
    name: string;
}

An identifier may be an expression or a destructuring pattern (the destructuring syntax introduced in ES6). We will see Expression and Pattern later.

PrivateName

interface PrivateName <: Expression, Pattern {
  type: "PrivateName";
  id: Identifier;
}

A Private Name Identifier.

Literal

interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp;
}

The value property holds the literal's value. As the interface shows, a literal can be a string, boolean, null, number, or regular expression.

RegExpLiteral

For regular expression literals, an extra regex field is added so that the expression's content is easier to work with. It contains the pattern itself and the flags.

interface RegExpLiteral <: Literal {
  regex: {
    pattern: string;
    flags: string;
  };
}
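As a small illustration of why pattern and flags are stored separately, a RegExp object can be rebuilt from the regex field alone. The regexNode object is a hand-written node for /\d?/g with position fields omitted, and toRegExp is a hypothetical helper.

```javascript
// Hand-written RegExpLiteral node for /\d?/g (position fields omitted).
const regexNode = {
  type: "Literal",
  regex: { pattern: "\\d?", flags: "g" },
};

// Hypothetical helper: rebuild a RegExp object from the regex field.
function toRegExp(literal) {
  return new RegExp(literal.regex.pattern, literal.regex.flags);
}

const rebuilt = toRegExp(regexNode);
console.log(rebuilt.source, rebuilt.flags);
```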

Programs

This is usually the root node, representing the entire program code tree.

interface Program <: Node {
    type: "Program";
    body: [ Statement ];
}

The body property is an array containing multiple Statement nodes.

Functions

Function declaration or function expression node.

interface Function <: Node {
    id: Identifier | null;
    params: [ Pattern ];
    body: BlockStatement;
}

The id property is the function name, and the params property is an array representing the function's parameters. body is a block statement.

Note that you will never find a node with type: "Function" in practice; you will find type: "FunctionDeclaration" and type: "FunctionExpression" instead, because functions appear either as declarations or as function expressions, and both of those node types build on the Function interface. FunctionDeclaration and FunctionExpression are covered later.

Statement

A statement node is nothing special: it is just a node, used as a category. There are many kinds of statements, which are described below.

interface Statement <: Node { }

ExpressionStatement

An expression statement node, for example a = a + 1 or a++. Its expression property refers to an expression node object (expressions are covered later).

interface ExpressionStatement <: Statement {
    type: "ExpressionStatement";
    expression: Expression;
}

BlockStatement

A block statement node, for example if (...) { /* the contents of a block */ }. A block can contain several other statements, so it has a body property: an array of the statements in the block.

interface BlockStatement <: Statement {
    type: "BlockStatement";
    body: [ Statement ];
}

ReturnStatement

A return statement node. The argument property is an expression representing the returned value; it is null for a bare return.

interface ReturnStatement <: Statement {
    type: "ReturnStatement";
    argument: Expression | null;
}

IfStatement

An if statement node typically has three properties. The test property represents the expression inside the parentheses of if (...).

The consequent property is the statement executed when the condition is true, usually a block statement.

The alternate property represents the else branch, which is a block statement, another if statement node, or null.

interface IfStatement <: Statement {
    type: "IfStatement";
    test: Expression;
    consequent: Statement;
    alternate: Statement | null;
}

SwitchStatement

A switch statement node has two properties. The discriminant property is the expression that immediately follows the switch keyword, usually a variable. The cases property is an array of SwitchCase nodes, one for each case clause.

interface SwitchStatement <: Statement {
    type: "SwitchStatement";
    discriminant: Expression;
    cases: [ SwitchCase ];
}

ForStatement

A for loop node. The init, test, and update properties represent the three expressions in the parentheses of the for statement: the initialization, the loop condition, and the update expression evaluated after each iteration (init can be either a variable declaration or an expression). All three can be null, as in for (;;) {}. The body property is the statement to execute on each iteration.

interface ForStatement <: Statement {
    type: "ForStatement";
    init: VariableDeclaration | Expression | null;
    test: Expression | null;
    update: Expression | null;
    body: Statement;
}

Declarations

Declaration nodes are also statements; Declaration is just a more specific category. The various declaration types are described below.

interface Declaration <: Statement { }

FunctionDeclaration

A function declaration node. Unlike in the Function interface above, id cannot be null.

interface FunctionDeclaration <: Function, Declaration {
    type: "FunctionDeclaration";
    id: Identifier;
}

VariableDeclaration

interface VariableDeclaration <: Declaration {
    type: "VariableDeclaration";
    declarations: [ VariableDeclarator ];
    kind: "var" | "let" | "const";
}

VariableDeclarator

Description of a variable declaration, where id represents the variable name node and init represents an expression for the initial value, which can be null.

interface VariableDeclarator <: Node {
    type: "VariableDeclarator";
    id: Pattern;
    init: Expression | null;
} 

Expressions

Expression node.

interface Expression <: Node { }

Import

interface Import <: Node {
    type: "Import";
}

ArrayExpression

An array expression node. The elements property is an array representing the elements of the array; each one is an expression node, or null for a hole such as the one in [1, , 3].

interface ArrayExpression <: Expression {
    type: "ArrayExpression";
    elements: [ Expression | null ];
}

ObjectExpression

An object expression node. The properties property is an array representing the object's key-value pairs; each element is a Property node.

interface ObjectExpression <: Expression {
    type: "ObjectExpression";
    properties: [ Property ];
}

Property

A property node in an object expression. key is the property's key and value is its value. Because ES5 syntax added getters and setters, there is also a kind property indicating whether this is a normal initializer ("init") or a get/set accessor.

interface Property <: Node {
    type: "Property";
    key: Literal | Identifier;
    value: Expression;
    kind: "init" | "get" | "set";
}
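As an illustration of how ObjectExpression and Property fit together, the sketch below turns a hand-written tree for ({ name: "square", "arity": 1 }) back into a plain object. It follows the ESTree-style interfaces shown above with position fields omitted, and toPlainObject is a hypothetical helper that supports only "init" properties whose values are literals.

```javascript
// ESTree-style AST for: ({ name: "square", "arity": 1 })
const objectExpression = {
  type: "ObjectExpression",
  properties: [
    {
      type: "Property",
      kind: "init",
      key: { type: "Identifier", name: "name" },
      value: { type: "Literal", value: "square" },
    },
    {
      type: "Property",
      kind: "init",
      key: { type: "Literal", value: "arity" },
      value: { type: "Literal", value: 1 },
    },
  ],
};

// Hypothetical helper: only "init" properties with literal values are supported.
function toPlainObject(node) {
  const result = {};
  for (const property of node.properties) {
    if (property.kind !== "init" || property.value.type !== "Literal") {
      throw new Error("Only literal-valued init properties are supported");
    }
    // A key is either an Identifier (name) or a Literal (value).
    const key = property.key.type === "Identifier"
      ? property.key.name
      : property.key.value;
    result[key] = property.value.value;
  }
  return result;
}

console.log(toPlainObject(objectExpression));
```

Note that the key can be either an Identifier or a Literal, which is why the helper has to check key.type before reading the name.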

FunctionExpression

Function expression node.

interface FunctionExpression <: Function, Expression {
    type: "FunctionExpression";
}

BinaryExpression

Binary operation expression node, left and right represent two expressions left and right of the operator, and operator represents a binary operator.

interface BinaryExpression <: Expression {
    type: "BinaryExpression";
    operator: BinaryOperator;
    left: Expression;
    right: Expression;
}

BinaryOperator

Binary operator, all values are as follows:

enum BinaryOperator {
    "==" | "!=" | "===" | "!=="
         | "<" | "<=" | ">" | ">="
         | "<<" | ">>" | ">>>"
         | "+" | "-" | "*" | "/" | "%"
         | "|" | "^" | "&" | "in"
         | "instanceof"
}

AssignmentExpression

Assignment expression node, the operator property represents an assignment operator, left and right are expressions around the assignment operator.

interface AssignmentExpression <: Expression {
    type: "AssignmentExpression";
    operator: AssignmentOperator;
    left: Pattern | Expression;
    right: Expression;
}

AssignmentOperator

The assignment operator. All possible values are as follows (only a few of them are in common use):

enum AssignmentOperator {
    "=" | "+=" | "-=" | "*=" | "/=" | "%="
        | "<<=" | ">>=" | ">>>="
        | "|=" | "^=" | "&="
}

ConditionalExpression

A conditional expression, often called the ternary expression: test ? consequent : alternate. Its properties mirror those of the if statement.

interface ConditionalExpression <: Expression {
    type: "ConditionalExpression";
    test: Expression;
    alternate: Expression;
    consequent: Expression;
}
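The expression nodes described so far are enough to sketch the reverse direction, from tree back to code. The print function below is a hypothetical mini code generator, not Babel's generator: it covers only Identifier, Literal, BinaryExpression, AssignmentExpression, and ConditionalExpression, and it never inserts parentheses, so it is only correct for trees whose nesting already agrees with operator precedence.

```javascript
// Hypothetical mini code generator for a handful of ESTree-style expression nodes.
function print(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    case "BinaryExpression":
    case "AssignmentExpression":
      // Both node types share the left / operator / right shape.
      return `${print(node.left)} ${node.operator} ${print(node.right)}`;
    case "ConditionalExpression":
      return `${print(node.test)} ? ${print(node.consequent)} : ${print(node.alternate)}`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hand-written AST for: max = a > b ? a : b
const assignment = {
  type: "AssignmentExpression",
  operator: "=",
  left: { type: "Identifier", name: "max" },
  right: {
    type: "ConditionalExpression",
    test: {
      type: "BinaryExpression",
      operator: ">",
      left: { type: "Identifier", name: "a" },
      right: { type: "Identifier", name: "b" },
    },
    consequent: { type: "Identifier", name: "a" },
    alternate: { type: "Identifier", name: "b" },
  },
};

console.log(print(assignment)); // max = a > b ? a : b
```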

Misc

Decorator

interface Decorator <: Node {
  type: "Decorator";
  expression: Expression;
}

Patterns

Patterns are mainly meaningful in ES6 destructuring assignment; in ES5 they can be thought of as something similar to identifiers.

interface Pattern <: Node { }

Classes

interface Class <: Node {
  id: Identifier | null;
  superClass: Expression | null;
  body: ClassBody;
  decorators: [ Decorator ];
}

ClassBody

interface ClassBody <: Node {
  type: "ClassBody";
  body: [ ClassMethod | ClassPrivateMethod | ClassProperty | ClassPrivateProperty ];
}

ClassMethod

interface ClassMethod <: Function {
  type: "ClassMethod";
  key: Expression;
  kind: "constructor" | "method" | "get" | "set";
  computed: boolean;
  static: boolean;
  decorators: [ Decorator ];
}

Modules

ImportDeclaration

interface ImportDeclaration <: ModuleDeclaration {
  type: "ImportDeclaration";
  specifiers: [ ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier ];
  source: Literal;
}

An import declaration, for example import foo from "mod";.
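To show what the specifiers array holds, here is a hand-written tree for import foo, { bar as baz } from "mod"; with position fields omitted. The local and imported field names come from the ESTree module spec rather than from the interfaces listed in this article, the source is written as an ESTree-style Literal, and localBindings is a hypothetical helper.

```javascript
// Hand-written AST for: import foo, { bar as baz } from "mod";
const importDeclaration = {
  type: "ImportDeclaration",
  specifiers: [
    {
      type: "ImportDefaultSpecifier",
      local: { type: "Identifier", name: "foo" },
    },
    {
      type: "ImportSpecifier",
      imported: { type: "Identifier", name: "bar" },
      local: { type: "Identifier", name: "baz" },
    },
  ],
  source: { type: "Literal", value: "mod" },
};

// Hypothetical helper: names that the import introduces into the local scope.
function localBindings(node) {
  return node.specifiers.map((specifier) => specifier.local.name);
}

console.log(`${localBindings(importDeclaration).join(", ")} from ${importDeclaration.source.value}`);
// foo, baz from mod
```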

Babylon AST node types

For a complete list of core Babylon AST node types, see Babylon spec.md.

Conclusion

I originally planned to explain Babel and its commonly used modules, but the material turned out to be too much for one article, so I focused on Babel's code and analyzed only the Babylon part. Even so, the article grew this long, which shows how broad and deep Babel is.

References

babel-handbook
babylon spec.md
estree
