What is the AST

Abstract Syntax Tree (AST) is a Tree representation of the Abstract Syntax structure of source code. Webpack, ESLint, and many other libraries are based on the concept of abstract syntax books to implement code inspection, analysis, and other operations. Today I want to share with you the concept of abstract syntax trees for interpreted languages such as JavaScript

Our common browser is through the JS code into an abstract syntax tree for the next step of analysis and other operations. Therefore, the transformation of JS into an abstract syntax tree is more conducive to the analysis of the program.

The variable declaration statement in the figure above, converted to an AST, gives the style shown in the figure on the right

Corresponding in the left figure:

  • varIs a key word
  • ASTIt’s a definer
  • =EqualThe equals sign can be called in many different ways, as we’ll see later
  • is treeIt’s a string
  • ;isSemicoion

The first abstract syntax tree that a piece of code converts to is an object that has a top-level type attribute Program. The second property is that the body is an array.

Each entry in the body array is an object that contains all the description information for the statement

typeKind: keyword for variable declaration --> var declaration: an array of declared contents, each of which is also an objecttypeId: The object that describes the variable nametype: defines name: the name of the variable init: the object that initializes the value of the variabletype: Type value: Value"is tree"Row without quotation marks:"\"is tree"\"QuotedCopy the code

Lexical analysis and grammatical analysis

JavaScript is an interpreted language, and you can begin to interpret execution by lexical analysis -> syntax analysis -> syntax tree

Lexical analysis: Also known as scanning, converts a stream of characters into a stream of tokens, which reads our code and then synthesizes the tokens according to certain rules

For example: var a = 2, this code is usually decomposed into var, a, =, 2;

[{type: 'Keyword'.value: 'var' },
  { type: 'Identifier'.value: 'a' },
  { type: 'Punctuator'.value: '=' },
  { type: 'Numeric'.value: '2'},]Copy the code

When lexically analyzing the source code, it reads the code character by character, so it is figuratively called scanning-scans. When it encounters Spaces, operators, or special symbols, it considers a sentence to be complete.

Parsing: Also known as a parser, converts the lexically-parsed array into the form of a tree and validates the syntax. Syntax If there is an error, throw a syntax error.

{..."type": "VariableDeclarator"."id": {
    "type": "Identifier"."name": "a"},... }Copy the code

Syntax analysis into AST, we can see the effect online here at esprima.org

What can an AST do

Syntax checking, code style checking, formatting code, syntax highlighting, error, autocomplete, etc. Code obfuscation compress optimize change code, change code structure, etc. For example, I have a function function a() {} and I want to make it function b() {}

For example, after the completion of the code to compile the require in webpack (‘ a ‘) — – > __webapck__require__ (” * / * * / a. s “)

Here’s a set of tools to turn code into a syntax tree and then change nodes and regenerate the code

AST parsing process

Tools:

  • Esprima: code => Transfer the AST code to the AST
  • Estraverse: Traverse Ast switch trees
  • Escodegen: AST => Code in recommend a commonly used AST online conversion site: astexplorer.net/

For example, if we have a code function getUser() {}, let’s change the name of the function to hello and see how the code flows

Look at the following code to briefly illustrate the AST traversal process

const esprima = require('esprima')
const estraverse = require('estraverse')
const code = `function getUser() {}`
/ / generated AST
const ast = esprima.parseScript(code)
// Convert the AST to iterate only over the type attribute
// The traverse method has two hook functions, enter and leave
estraverse.traverse(ast, {
  enter(node) {
    console.log('enter -> node.type', node.type)
  },
  leave(node) {
    console.log('leave -> node.type', node.type)
  },
})
Copy the code

The output is as follows:

It can be concluded that the AST traversal process is depth-first, and the traversal process is as follows:

Change function name

At this point we find that the name of the function is the name of the function when the type is Identifier, and we can change it directly to implement an AST tool that changes the name of the function

/ / conversion tree
estraverse.traverse(ast, {
  // It is ok to enter or leave modifications
  enter(node) {
    console.log('enter -> node.type', node.type)
    if (node.type === 'Identifier') {
      node.name = 'hello'
    }
  },
  leave(node) {
    console.log('leave -> node.type', node.type)
  },
})
// Generate new code
const result = escodegen.generate(ast)
console.log(result)
// function hello() {}
Copy the code

How Babel works

Babel has been around since Es6 began to be used on a large scale. It solves the problem that some browsers are not compatible with the new features of Es6. It actually converts Es6 code into Es5 code, which is compatible with all browsers. The Babel transformation code actually uses the AST, and Babel has a special relationship with the AST.

So let’s use the AST in Babel and see how Babel compiles the code (not the source code).

We need two toolkits @babel/core and @babel/preset-env

When you configure Babel, either in.babelrc or babel.config.js, there are presets and plugins (as well as other configuration items that are not covered here).

The difference between plugins and presets

// .babelrc
{
  "presets": ["@babel/preset-env"],
  "plugins": []
}
Copy the code

When we have configured @babel/preset-env in the presets, @babel/core will look for preset-env’s preset plug-in package, which is a set

The Babel core package does not convert code. The core package only provides some core apis. The actual code conversion is done by plugins or presets. When the requirements for transformations increase and it is not possible to configure each plugin one by one, we can use the presets. Presets are collections of plugins. A preset contains many plugins.

Use of the Babel plugin

Now we have an arrow function, and to turn it into a normal function, we can just write:

const babel = require('@babel/core')
const code = `const fn = (a, b) => a + b`
// Babel has a transform method that automatically traverses the code using the corresponding default or plugin
const r = babel.transform(code, {
  presets: ['@babel/preset-env'],})console.log(r.code)
// Print the result as follows
// "use strict";
// var fn = function fn() { return a + b; };
Copy the code

Now we can see that the final code is going to be converted to a normal function, but we just need the arrow function to go through the function, we don’t need this big package, we just need a package of the arrow function to go through the normal function, The plugin-transform-arrow-functions plugin can be found under node_modules. The plugin-transform-arrow-functions plugin can be used to handle arrow functions.

const r = babel.transform(code, {
  plugins: ['@babel/plugin-transform-arrow-functions'],})console.log(r.code)
// Print the result as follows
// const fn = function () { return a + b; };
Copy the code

We can see from the print that we are not converting our variable to const declaration, but converting the arrow function

Write your own plug-ins

At this point, we can write our own plug-ins to implement the transformation of the code, the process of intermediate processing code is to use the AST processing logic mentioned earlier

Const fn = function(a, b) {return a + b}

Analysis of AST structure

Const fn = function(a, b) {return a + b} const fn = function(a, b) {return a + b} const fn = function(a, b) {return a + b} const fn = function(a, b)

According to our analysis, we can get:

When it becomes a normal function it’s not called an arrow function ArrowFunctionExpression, It’s a FunctionExpression. So the first thing we’re going to do is convert the ArrowFunctionExpression into a FunctionExpression. We’re going to take the binary expression BinaryExpression is put into a block of code (BlockStatement). In fact, what we need to do is to change a tree into another tree, in fact, to spell out the structure of another tree, and then generate new code, which can complete the code transformation

Visitor pattern

In Babel, we use the visitor pattern when we develop plugins, that is, when we access a path, we match it, and then modify the node, such as when we access the ArrowFunctionExpression above, Modify ArrowFunctionExpression to become a normal function

So we can write it like this:

const babel = require('@babel/core')
const code = `const fn = (a, b) => a + b` Const fn = function(a, b) {return a + b}
const arrowFnPlugin = {
  // Visitor mode
  visitor: {
    // Matches when a path is accessed
    ArrowFunctionExpression(path) {
      // Get the node
      const node = path.node
      console.log('ArrowFunctionExpression -> node', node)
    },
  },
}

const r = babel.transform(code, {
  plugins: [arrowFnPlugin],
})

console.log(r)
Copy the code

Modifying the AST structure

So what we’re going to do is we’re going to get a node that looks like this, which is actually the AST of the ArrowFunctionExpression, and what we’re going to do is replace the structure of the ArrowFunctionExpression with the structure of the FunctionExpression, But we need to assemble a similar structure, which is cumbersome to write directly, but Babel provides us with a tool called @babel/types

@babel/types does two things:

To determine if this node is an ArrowFunctionExpression (path.node below ArrowFunctionExpression is an ArrowFunctionExpression), generate the corresponding expression and then use it, always check the documentation, Because there are so many types of nodes in it, you can’t remember how to have multiple nodes if you’re not doing compilation work, right

So what we’re going to do is we’re going to start generating a FunctionExpression, and we’re going to replace the ArrowFunctionExpression, and we’re going to look at the types document and we’re going to look at the FunctionExpression, This method takes parameters that we pass to generate a FunctionExpression

t.functionExpression(id, params, body, generator, async)
Copy the code
  • Id: Identifier (default: NULL) THE ID can be passed null
  • Params: Array (required) function argument, you can bring the previous argument
  • Body: BlockStatement (required) body of the function that accepts a BlockStatement we need to generate one
  • Boolean (default: false) Specifies whether it is a generator function. Of course it is not
  • Async: Boolean (default: false) specifies whether the function is asyncBlockStatementWe then look at the document to findBlockStatementAccepted parameter
t.blockStatement(body, directives)
Copy the code

If you look at the documentation, the blockStatement accepts a body, so we can use the same body as before, but here the body accepts an array

As we look closer at the AST structure, the body in the BlockStatement in the function expression is a ReturnStatement, so we also need to generate a ReturnStatement

And now we can rewrite the AST

const babel = require('@babel/core')
const t = require('@babel/types')
const code = `const fn = (a, b) => a + b` // const fn = function(a, b) { return a + b }
const arrowFnPlugin = {
  // Visitor mode
  visitor: {
    // Matches when a path is accessed
    ArrowFunctionExpression(path) {
      // Get the node and replace the node
      const node = path.node
      console.log('ArrowFunctionExpression -> node', node)
      // Get the parameters of the function
      const params = node.params
      const body = node.body
      const functionExpression = t.functionExpression(
        null,
        params,
        t.blockStatement([body])
      )
      // Replace the original function
      path.replaceWith(functionExpression)
    },
  },
}
const r = babel.transform(code, {
  plugins: [arrowFnPlugin],
})
console.log(r.code) // const fn = function (a, b) { return a + b; };
Copy the code

A special case

We know that we can omit the return keyword in the clipper function, but if the user writes the return keyword, we have a problem with this plug-in, so we can optimize it again

const fn = (a, b) => { retrun a + b } -> const fn = function(a, b) { return a + b }
Copy the code

Looking at the code, we can see that we don’t need to convert the body to a blockStatement, just leave it out, so we can write this

ArrowFunctionExpression(path) {
  // Get the node and replace the node
  const node = path.node
  console.log("ArrowFunctionExpression -> node", node)
  // Get the parameters of the function
  const params = node.params
  let body = node.body
  // Determine if it is a blockStatement, and if it is not, make it a blockStatement
  if(! t.isBlockStatement(body)) { body = t.blockStatement([body]) }const functionExpression = t.functionExpression(null, params, body)
  // Replace the original function
  path.replaceWith(functionExpression)
}
Copy the code

According to the need to introduce

During development, we introduced UI frameworks such as Element-UI in Vue, ANTD in Vant or React that support global and on-demand import. The default is global import. If you need to import on demand, you need to install a babel-plugin-import plug-in to change the global approach to the on-demand approach.

For example, import {Button} from ‘vant’ will become import Button from ‘vant/lib/Button’. Referring to the entire vant turns me into using only one of the files under vant, and the packaged file will be much smaller than the entire imported file

Analysis syntax tree

Import {Button, Icon} from ‘vant’ import {Button, Icon} from ‘vant/lib/Button’ import Icon from ‘vant/lib/Icon’

Take a look at the differences between the two syntax trees

According to the analysis of the two graphs, we can get some information:

We find that the only modules that destruct introduces are import declarations, and the second figure shows two import declarations. The specifiers in destruct are two importspecifiers, which are separate in the second figure. And they’re all importDefaultspecifiers and they introduce different sources so what we’re going to do is we’re going to change a single ImportDeclaration into multiple importdeclarations, Then convert the imported specifiers into multiple ImportDefaultSpecifiers and modify the corresponding source

Analysis of the type

To make it easier to pass arguments, this time we write it into a function that makes it easier to pass the converted concatenation directory

Here we need to use several types, also need to find the corresponding explanation on the official website of Types

First we need to generate multiple importDeclaration types

/** * @param {Array
      
       } specifiers (required) * @param  {StringLiteral} source (required) */
      
t.importDeclaration(specifiers, source)
Copy the code

The ImportDefaultSpecifier needs to be generated in the importDeclaration

/** * @param {Identifier} local (required) */
t.importDefaultSpecifier(local)
Copy the code

You also need to generate a StringLiteral in the importDeclaration

/** * @param {string} value (required) */
t.stringLiteral(value)
Copy the code

In the code

With that analysis in mind, let’s start with the code

const babel = require('@babel/core')
const t = require('@babel/types')
const code = `import { Button, Icon } from 'vant'`
// import Button from 'vant/lib/Button'
// import Icon from 'vant/lib/Icon'
function importPlugin(opt) {
  const { libraryDir } = opt
  return {
    visitor: {
      ImportDeclaration(path) {
        const node = path.node
        // console.log("ImportDeclaration -> node", node)
        // Get a detailed description of the node and convert it into multiple import declarations
        const specifiers = node.specifiers
        Import vant, {Button, Icon} from 'vant'
        // Also consider the length of the specifiers, if the length is not 1 and it is not the default export we need to convert
        if (
          !(
            specifiers.length === 1 && t.isImportDefaultSpecifier(specifiers[0))) {const result = specifiers.map((specifier) = > {
            const local = specifier.local
            const source = t.stringLiteral(
              `${node.source.value}/${libraryDir}/${specifier.local.name}`
            )
            // console.log("ImportDeclaration -> specifier", specifier)
            return t.importDeclaration(
              [t.importDefaultSpecifier(local)],
              source
            )
          })
          console.log('ImportDeclaration -> result', result)
          // Because the AST to be replaced is not one, but multiple, so 'path.replaceWithMultiple(result)' is needed to replace, but once the execution is found to be infinite loop
          path.replaceWithMultiple(result)
        }
      },
    },
  }
}
const r = babel.transform(code, {
  plugins: [importPlugin({ libraryDir: 'lib'})],})console.log(r.code)
Copy the code

Looking at the print and conversion results seems to be ok, the plug-in is almost there

A special case

But let’s consider a case where the user doesn’t load everything on demand and loading on demand is just an option. If the user writes import vant, {Button, Icon} from ‘vant’, then we have a problem with the plug-in

If we do this, then the source of the default import should be unchanged, so we’ll take out the original source

Therefore, it is also necessary to determine whether each specifier is an ImportDefaultSpecifier and then process different sources. The complete processing logic should be as follows

function importPlugin(opt) {
  const { libraryDir } = opt
  return {
    visitor: {
      ImportDeclaration(path) {
        const node = path.node
        // console.log("ImportDeclaration -> node", node)
        // Get a detailed description of the node and convert it into multiple import declarations
        const specifiers = node.specifiers
        Import vant, {Button, Icon} from 'vant'
        // Also consider the length of the specifiers, if the length is not 1 and it is not the default export we need to convert
        if (
          !(
            specifiers.length === 1 && t.isImportDefaultSpecifier(specifiers[0))) {const result = specifiers.map((specifier) = > {
            let local = specifier.local,
              source
            // Check whether the default export exists
            if (t.isImportDefaultSpecifier(specifier)) {
              source = t.stringLiteral(node.source.value)
            } else {
              source = t.stringLiteral(
                `${node.source.value}/${libraryDir}/${specifier.local.name}`)}return t.importDeclaration(
              [t.importDefaultSpecifier(local)],
              source
            )
          })
          path.replaceWithMultiple(result)
        }
      },
    },
  }
}
Copy the code

babylon

Babylon is a JavaScript parser used in Babel.

Babylon’s relationship with Babel

The engine Babel uses is Babylon, which is not developed by the Babel team itself, but by the Fork of acorn, which was used in the build of Interest Clans 1.0 a long time ago. It’s a great engine for code conversion. However, the Acorn engine only provides basic ast parsing capabilities, traversal also requires acorn-travesal, node replacement requires acorn-, and this development, under Babel’s plug-in architecture, Become integrated (from AlloyTeam anatomy Babel)

The use of Babylon,

Use Babylon to write a plugin for array REST to Es5 syntax

Var arr = [].concat(arr1, arr2)

If we use Babylon, we don’t need to use @babel/core, just traverse and Generator in him, Babylon, @babel/traverse, @babel/generator, @babel/types are used

Analysis syntax tree

Let’s first look at the differences between the two syntax trees

According to the above figure, we can conclude that:

Both trees are ways of declaring variables, but they declare different keywords and they initialize variables differently, one is ArrayExpression and the other is CallExpression so it’s really easy to do, Just convert an array expression to a call expression

Analysis of the type

The core of this code generates a callExpression callExpression, so we analyze the API we need to use for the type on the website

So let’s look at what’s in init, first of all callExpression

/** * @param {Expression} callee (required) * @param {Array
      
       } source (required) */
      
t.callExpression(callee, arguments)
Copy the code

The corresponding syntax tree callee is a MemberExpression, so a MemberExpression is generated

/** * @param {Expression} object (required) * @param {if computed then Expression else Identifier} property (required) *  @param {boolean} computed (default: false) * @param {boolean} optional (default: null) */
t.memberExpression(object, property, computed, optional)
Copy the code

The object in Callee is an ArrayExpression, which is an empty array

/** * @param {Array
      
       } elements (default: []) */
      
t.arrayExpression(elements)
Copy the code

And once we’ve analyzed everything in there, we’re going to generate VariableDeclarator and VariableDeclaration and eventually generate a new syntax tree

/** * @param {LVal} id (required) * @param {Expression} init (default: null) */
t.variableDeclarator(id, init)

/** * @param {"var" | "let" | "const"} kind (required) * @param {Array
      
       } declarations (required) */
      
t.variableDeclaration(kind, declarations)
Copy the code

In fact, analyzing the syntax tree backwards will make it clear how to write, so shall we start the code

In the code

const babylon = require('babylon')
// With the packages provided by Babel, traverse and Generator are exposed to the default object
const traverse = require('@babel/traverse').default
const generator = require('@babel/generator').default
const t = require('@babel/types')

const code = `const arr = [ ...arr1, ...arr2 ]` // var arr = [].concat(arr1, arr2)

const ast = babylon.parse(code, {
  sourceType: 'module',})/ / conversion tree
traverse(ast, {
  VariableDeclaration(path) {
    const node = path.node
    const declarations = node.declarations
    console.log('VariableDeclarator -> declarations', declarations)
    const kind = 'var'
    // Boundary determination
    if(node.kind ! == kind && declarations.length ===1 && t.isArrayExpression(declarations[0].init)) {
      // Get the previous elements
      const args = declarations[0].init.elements.map((item) = > item.argument)
      const callee = t.memberExpression(t.arrayExpression(), t.identifier('concat'), false)
      const init = t.callExpression(callee, args)
      const declaration = t.variableDeclarator(declarations[0].id, init)
      const variableDeclaration = t.variableDeclaration(kind, [declaration])
      path.replaceWith(variableDeclaration)
    }
  },
})
Copy the code

Specific grammar

The opposite of an abstract Syntax Tree is a Concrete Syntax Tree, or CST (often called an analysis Tree). Typically, during source code translation and compilation, the parser creates the parse tree. Once the AST is created, information is added during subsequent processing, such as the semantic analysis phase. What’s the difference between an abstract syntax tree and a concrete syntax tree

supplement

The complete set of node types is as follows:

(parameter) node: Identifier | SimpleLiteral | RegExpLiteral | Program | FunctionDeclaration |
FunctionExpression | ArrowFunctionExpression | SwitchCase | CatchClause | VariableDeclarator |
ExpressionStatement | BlockStatement | EmptyStatement | DebuggerStatement | WithStatement |
ReturnStatement | LabeledStatement | BreakStatement | ContinueStatement | IfStatement | 
SwitchStatement | ThrowStatement | TryStatement | WhileStatement | DoWhileStatement | 
ForStatement | ForInStatement | ForOfStatement | VariableDeclaration | ClassDeclaration | 
ThisExpression | ArrayExpression | ObjectExpression | YieldExpression | UnaryExpression |
UpdateExpression | BinaryExpression | AssignmentExpression | LogicalExpression | MemberExpression |
ConditionalExpression | SimpleCallExpression | NewExpression | SequenceExpression | TemplateLiteral |
TaggedTemplateExpression | ClassExpression | MetaProperty | AwaitExpression | Property |
AssignmentProperty | Super | TemplateElement | SpreadElement | ObjectPattern | ArrayPattern |
RestElement | AssignmentPattern | ClassBody | MethodDefinition | ImportDeclaration | 
ExportNamedDeclaration | ExportDefaultDeclaration | ExportAllDeclaration | ImportSpecifier | 
ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier
Copy the code

Babel has documentation for a detailed definition of the AST tree, which can be found here

Refer to the link

  • Analyze the Babel – Babel overview | AlloyTeam
  • JavaScript syntax parsing, AST, V8, JIT
  • Detail the AST abstract syntax tree
  • AST Abstract syntax tree
  • webpack/AST
  • @babel/types