What is the AST

Abstract Syntax Tree (AST) is a tree-like representation of the Abstract Syntax structure of the source code. At the core of many tool libraries, such as Webpack, ESLint, and others, is the concept of abstract grammars to examine, analyze, and more code. Today I want to share with you the concept of an abstract syntax tree for an interpreted language like JavaScript

The most common browser is to convert JS code into an abstract syntax tree for further analysis and other operations. Therefore, JS into the abstract syntax tree more conducive to the analysis of the program.

For example, the variable declaration statement in the image above is converted to the AST style shown in the image on the right

Corresponding images on the left:

  • varIs a key word
  • ASTIt’s a definer
  • =It’s Equal and there’s a lot of different ways of calling it, and we’ll see that later on
  • is treeIt’s a string
  • ;Is the Semicolon

First, the abstract syntax tree that a piece of code converts to is an object that has a top-level type attribute, Program. The second property is the body which is an array.

Each item in the body array is an object that contains all the information describing the statement

Type: an array of declarations, each of which is also an object Type: type value: value "is tree" without quotes row: "\"is tree"\" with quotes

Morphological analysis and grammatical analysis

JavaScript is an interpreted language, usually through the lexical analysis -> syntax analysis -> syntax tree, you can start to explain the execution

Lexical parsing: Also known as scanning, this is the process of converting streams of characters into streams of tokens, which read our code and compose tokens according to a set of rules

For example: var a = 2, this code is usually decomposed into var, a, =, 2

[
  { type: 'Keyword', value: 'var' },
  { type: 'Identifier', value: 'a' },
  { type: 'Punctuator', value: '=' },
  { type: 'Numeric', value: '2' },
];

When lexical analysis of source code is performed, it reads the code character by character, which is why it is graphically called scan-scan. When it encounters a space, an operator, or a special symbol, it thinks a sentence is complete.

Syntax parsing: A parser that converts lexical arrays into tree forms and validates the syntax. Throw a syntax error if there is one.

{... "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "a" }, ... }

The syntax is parsed into an AST, and we can see the results online here at http://esprima.org

What can the AST do

  • Syntax check, code style check, formatting code, syntax highlighting, error message, auto-complete, etc
  • Code obfuscation compression
  • Optimize change code, change code structure, etc

For example, if I have a function a() {} and I want to change it to function b() {}

For example, after the completion of the code to compile the require in webpack (‘ a ‘) — – > __webapck__require__ (” * / * * / a. s “)

Here’s a set of tools for converting code into a syntax tree and then changing nodes and regenerating code

AST parsing process

Prepare tools:

  • Esprima: code => ast
  • Estraverse: Traverse AST transformation tree
  • escodegen: ast => code

A common AST online conversion site is recommended: https://astexplorer.net/

For example, if we have a piece of code called function getUser() {}, we change the name of the function to hello and see how the code flows

Take a look at the code below to briefly illustrate the AST traversal process

const esprima = require('esprima'); const estraverse = require('estraverse'); const code = `function getUser() {}`; // generate AST const AST = esprima.parseScript(code); // Convert the AST, Traverse (ast, {enter(node) {console.log(' enter-> node.type'), traverse(ast, {enter(node) {console.log(' enter-> node.type'), node.type); }, leave(node) { console.log('leave -> node.type', node.type); }});

The output is as follows:

It can be concluded that the AST traversal process is depth-first, and the traversal process is as follows:

Modify function name

At this time, we find that the name of the function is the name of the function when the type is Identifier, so we can directly modify it to implement an AST tool that changes the name of the function

// Convert the tree estraverse. Traverse (ast, {// Enter (node) {console.log(' enter-> node.type', node.type); if (node.type === 'Identifier') { node.name = 'hello'; } }, leave(node) { console.log('leave -> node.type', node.type); }}); Const result = escodegen.generate(ast); console.log(result); // function hello() {}

How Babel works

Babel has been around since ES6 began to be widely used. It solves the problem that some browsers are not compatible with new ES6 features. It converts ES6 code into ES5 code and works on all browsers. Babel transforms code using an AST, and Babel has a special relationship to the AST.

So let’s use the AST in Babel and see how Babel compiles the code (not the source code).

You need to use two toolkits @babel/core and @babel/preset-env

When we configure Babel, both the.babelrc and babel.config.js files are configured with presets and plugins (there are other configuration items that will not be covered here).

The difference between plugins and presets

// .babelrc
{
  "presets": ["@babel/preset-env"],
  "plugins": []
}

When we configure presets with @babel/preset-env in it, then @babel/core will look for the preset-env preset plug-in package, which is a set

The Babel core package does not convert the code. The core package only provides some core APIs. The real conversion work is done by plugins or presets. When conversion requirements increase, it is not possible to configure the Plugin for each of them. This is when the presets can be used. Presets are a collection of plugins. A preset contains a number of plugins.

Use of the Babel plugin

Now we have an arrow function. To convert it to a normal function, we can simply write:

const babel = require('@babel/core'); const code = `const fn = (a, b) => a + b`; // Babel has a transform method that will automatically traversal the corresponding code const r = babel.transform(code, {presets: ['@babel/preset-env'], }); console.log(r.code); // Print the result as follows: // "use strict"; // var fn = function fn() { return a + b; };

And now we can see that eventually the code is going to be converted to a normal function, but we just need the arrow function to go through the function, not this big package, just one arrow function to go through the normal function package, We can actually find a plugin called plugin-transform-arrow-functions under node_modules. This plugin is designed to handle arrow functions, so we can write:

const r = babel.transform(code, { plugins: ['@babel/plugin-transform-arrow-functions'], }); console.log(r.code); // const fn = function () {return a + b; };

We can see from the printout that we are not converting our variable declaration to const, we are just converting the arrow function

Write your own plug-in

At this point, we can write some plug-ins ourselves to implement the code transformation, the process of processing the code is using the AST processing logic mentioned earlier

Const fn = function(a, b) {return a + b} const fn = function(a, b) {return a + b}

Analyzing AST structure

Const fn = (a, b) => const fn = function(a, b) {return a + b} const fn = function(a, b) {return a + b

According to our analysis, we can get:

  1. He stopped calling it the arrow function when it became a normal functionArrowFunctionExpressionIt’s a function expressionFunctionExpression
  2. So the first thing we need to do is takeArrowFunctionExpressionconvertFunctionExpression
  3. To putBinaryExpression (binaryExpression)Wrapped inReturns the ReturnStatement.And then a push toIn the code block (BlockStatement).
  4. So what we’re going to do is we’re going to turn one tree into another tree, which is basically a structure of another tree, and then we’re going to generate new code, and we’re going to convert that code

Visitor Pattern

In Babel, we develop plugins using the visitor pattern, which means that we match a path when accessing it, and then modify the node, such as when accessing ArrowFunctionExpression above, ArrowFunctionExpression is modified to become a normal function

So we could write:

const babel = require('@babel/core'); const code = `const fn = (a, b) => a + b`; // const fn = function(a, b) {return a + b} const ArrowFnPlugin = {// {// ArrowFunctionExpression(path) {// Take the node const node = path.node; console.log('ArrowFunctionExpression -> node', node); ,}}}; const r = babel.transform(code, { plugins: [arrowFnPlugin], }); console.log(r);

Modifying the AST structure

So what we get is a node that looks like this and it looks like this, which is actually the AST of ArrowFunctionExpression, All we need to do at this point is replace the structure of ArrowFunctionExpression with the structure of FunctionExpression, but we need to assemble a similar structure. This is a bit of a hassle to write directly, but Babel provides us with a tool called @babel/types

@babel/types has two functions:

  1. ArrowFunctionExpression is an ArrowFunctionExpression node.
  2. Generate the corresponding expression

And then when we use it, we need to check the documentation a lot, because there are a lot of node types, and people who are not doing compilation work can’t remember how many nodes, right

So what we’re going to do is we’re going to start generating FunctionExpression, and we’re going to replace the previous ArrowFunctionExpression, and we’re going to look at the types document, and we’re going to find FunctionExpression, This method takes the corresponding argument and we pass it in to generate a FunctionExpression

t.functionExpression(id, params, body, generator, async);
  • ID: Identifier (default: NULL) ID can pass NULL
  • Params: Array

    (required) function argument, you can take the previous argument
  • Body: BlockStatement (Required) function body, accept oneBlockStatementWe need to generate one
  • Generator: Boolean (default: false) Whether this is a generator function or not, of course not
  • Async: Boolean (default: false) async: Boolean (default: false

We also need to generate a BlockStatement, and we’ll look at the documentation to find the parameters that the BlockStatement takes

t.blockStatement(body, directives);

The blockStatement accepts a body, so we can take the body and use it directly, but in this case the body accepts an array

We are looking at the AST structure. The body of a BlockStatement in a function expression is a collection of ReturnStatements, so we need to generate a ReturnStatement as well

Now we can rewrite the AST

ArrowFunctionExpression(path) {// Take the node and replace the node const node = path.node; // take the function const params = node.params; const returnStatement = t.returnStatement(node.body); const blockStatement = t.blockStatement([returnStatement]); const functionExpression = t.functionExpression(null, params, blockStatement); // Replace path.replaceWith(FunctionExpression); }, const fn = function (a, b) {return a + b; };

Of course, we can also generate an ExpressionStatement if there is no returnStatement. We just need to change the returnStatement to an ExpressionStatement and keep the rest of the logic unchanged

ArrowFunctionExpression(path) {// Take the node and replace the node const node = path.node; // take the function const params = node.params; // Replace a returnStatement with an expressionStatement const expressionStatement = t.expressionStatement(node.body); const blockStatement = t.blockStatement([expressionStatement]); const functionExpression = t.functionExpression(null, params, blockStatement); // Replace path.replaceWith(FunctionExpression); }, const fn = function (a, b) {a + b; };

According to the need to introduce

In development, we introduce UI framework, such as Element-UI used in Vue, AnTD in Vant or React, which supports global import and on-demand import. The default is global import. If you need to import on demand, you need to install a babel-plugin-import plug-in to change global writing to import on demand.

Import {Button} from ‘vant’ import {Button} from ‘vant’ import Button from ‘vant/lib/Button’ Referencing the entire vant changes to using only one file under the vant, and the packaged file will be much smaller than the entire imported file size

Analytic syntax tree

import { Button, Icon } from 'vant'I’m going to write this as
import Button from 'vant/lib/Button'; import Icon from 'vant/lib/Icon'

Look at the difference between the two syntax trees

According to the analysis of the two graphs, we can get some information:

  1. We found that the only modules introduced by destructuring are import declarations. The second figure shows two import declarations
  2. In the detailed description of the introduction of Deconstruction (specifiers) are twoImportSpecifierIn the second picture, they are separated, and they are bothImportDefaultSpecifier
  3. What they introducedsourceAre not the same as
  4. So what we’re going to do is we’re going to take individualImportDeclarationTo become moreImportDeclaration“And then destructs a single importspecifiersPart of theImportSpecifierConvert to multipleImportDefaultSpecifierAnd modify the correspondingsourceCan be

Analysis of the type

To make it easier to pass the parameters, this time we’ll write a function to pass the directory that is stitched after the transformation

Here we need to use several types, also need to find the corresponding explanation in the official website of types

  • First we need to generate multipleimportDeclarationtype
/** * @param {Array<ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier>} specifiers (required) * @param  {StringLiteral} source (required) */ t.importDeclaration(specifiers, source);
  • inimportDeclarationNeed to generate inImportDefaultSpecifier
  /**
   * @param {Identifier} local  (required)
   */
  t.importDefaultSpecifier(local);
  • inimportDeclarationWe need to generate one more inStringLiteral
  /**
   * @param {string} value  (required)
   */
  t.stringLiteral(value);

In the code

With the above analysis, let’s start with the code

const babel = require('@babel/core'); const t = require('@babel/types'); const code = `import { Button, Icon } from 'vant'`; // import Button from 'vant/lib/Button' // import Icon from 'vant/lib/Icon' function importPlugin(opt) { const { libraryDir } = opt; return { visitor: { ImportDeclaration(path) { const node = path.node; // console.log(" importDeclaration -> node", node) // Get the node details and convert to multiple import declarations const specifiers = node.specifiers; Import vant, {Button, Icon} from 'vant'; import vant, {Button, Icon} from 'vant'; If the length is not 1 and is not exported by default we only need to convert if (! (specifiers.length === 1 && t.isImportDefaultSpecifier(specifiers[0]))) { const result = specifiers.map(specifier => { const local = specifier.local; const source = t.stringLiteral( `${node.source.value}/${libraryDir}/${specifier.local.name}` ); // console.log("ImportDeclaration -> specifier", specifier) return t.importDeclaration([t.importDefaultSpecifier(local)], source); }); console.log('ImportDeclaration -> result', result); // There is no single AST that needs to be replaced, so the 'path.replaceWithMultiple(result)' is required. ,}}}}; } const r = babel.transform(code, { plugins: [importPlugin({ libraryDir: 'lib' })], }); console.log(r.code);

Looking at the print results and conversion results seems to be no problem, the plug-in is almost implemented

A special case

But let’s consider a case where loading on demand is just an option if the user doesn’t load everything on demand. If the user says import vant, {Button, Icon} from ‘vant’, then we have a problem with the plug-in

If this is the case, then the default source should be the same, we need to take the original source out

Therefore, it is also necessary to determine whether each specifier is an importDefaultSpecifier and then process different sources. The complete processing logic should be as follows

function importPlugin(opt) { const { libraryDir } = opt; return { visitor: { ImportDeclaration(path) { const node = path.node; // console.log(" importDeclaration -> node", node) // Get the node details and convert to multiple import declarations const specifiers = node.specifiers; Import vant, {Button, Icon} from 'vant'; import vant, {Button, Icon} from 'vant'; If the length is not 1 and is not exported by default we only need to convert if (! (specifiers.length === 1 && t.isImportDefaultSpecifier(specifiers[0]))) { const result = specifiers.map(specifier => { let local = specifier.local, source; / / judge whether there is a default export situation if (t.i sImportDefaultSpecifier (specifiers)) {source = t.s tringLiteral (node. The source. The value). } else { source = t.stringLiteral( `${node.source.value}/${libraryDir}/${specifier.local.name}` ); } return t.importDeclaration([t.importDefaultSpecifier(local)], source); }); path.replaceWithMultiple(result); ,}}}}; }

babylon

There’s a quote on the Babel website
Babylon is a JavaScript parser used in Babel.

The relationship between Babylon and Babel

The engine that Babel uses is Babylon. Babylon was not developed by the Babel team itself, but by Fork’s Acorn project. Acorn project itself was used in the build of Clan of Interest 1.0 a long time ago, for some code conversion, which is a great engine. But the Acorn engine only provides basic AST parsing capabilities, traversing requires acorn-travesal, and replacing nodes requires acorn-travesal. These developments, developed under Babel’s plug-in architecture, Getting integrated (from AlloyTeam’s Anatomy of Babel)

The use of Babylon,

Use Babylon to write an array REST to ES5 syntax plug-in

Convert const arr = […arr1,…arr2] to var arr = []. Concat (arr1, arr2)

Instead of using @babel/core, we use the traverse and generator in Babylon, The packages used are Babylon, @babel/traverse, @babel/generator, and @babel/types

Analytic syntax tree

Let’s take a look at the differences between the two syntax trees

According to the above analysis, we can get:

  1. Both trees are variable declarations, the difference is that they declare different keywords
  2. They initialize variable values differently, one is an ArrayExpression and the other is a CallExpression.
  3. What we need to do is simply convert the array expression into a call expression

Analysis of the type

The core of this code generates a CallExpression callExpression, so we analyze the API we need to use for the type of official website

  • So let’s look at what’s in init first, CallExpression
  /**
   * @param {Expression} callee  (required)
   * @param {Array<Expression | SpreadElement | JSXNamespacedName>} source (required)
   */
  t.callExpression(callee, arguments);
  • Callee on the corresponding syntax tree is a MemberExpression, so a MemberExpression is generated
/** * @param {Expression} object (required) * @param {if computed then Expression else Identifier} property (required) *  @param {boolean} computed (default: false) * @param {boolean} optional (default: null) */ t.memberExpression(object, property, computed, optional);
  • The object in callee is an arrayExpression ArrayExpression, which is an empty array
  /**
   * @param {Array<null | Expression | SpreadElement>} elements  (default: [])
   */
  t.arrayExpression(elements);
  • Now that we’ve analyzed what’s in there, we’re going to generate VariableDeclarator and VariableDeclaration and eventually generate a new syntax tree
  /**
   * @param {LVal} id  (required)
   * @param {Expression} init (default: null)
   */
  t.variableDeclarator(id, init);

  /**
   * @param {"var" | "let" | "const"} kind  (required)
   * @param {Array<VariableDeclarator>} declarations (required)
   */
  t.variableDeclaration(kind, declarations);
  • In fact, analyze the syntax tree backwards, analyze how to write it will be clear, so let’s start the code

In the code

const babylon = require('babylon'); // With the package provided by Babel, the traverse and generator are both exposed to the default object const traverse = require('@babel/traverse').default; const generator = require('@babel/generator').default; const t = require('@babel/types'); const code = `const arr = [ ...arr1, ...arr2 ]`; // var arr = [].concat(arr1, arr2) const ast = babylon.parse(code, { sourceType: 'module', }); // Convert tree traverse(ast, {variableDeclaration (path) {const node = path.node; const declarations = node.declarations; console.log('VariableDeclarator -> declarations', declarations); const kind = 'var'; If (node.kind! === 1 &&t.i ArrayExpression(declarations[0]. Init) {const elements == 1 &&t.i ArrayExpression(declarations[0] declarations[0].init.elements.map(item => item.argument); const callee = t.memberExpression(t.arrayExpression(), t.identifier('concat'), false); const init = t.callExpression(callee, args); const declaration = t.variableDeclarator(declarations[0].id, init); const variableDeclaration = t.variableDeclaration(kind, [declaration]); path.replaceWith(variableDeclaration); }}});

Handle async await gracefully

Async + await async code async + await async code async All right, the only problem is that to catch a problem with the code you need to wrap “await” snippet using try/catch. For the robustness of the program, it may be necessary to write try/catch logic frequently in async so that we can use the AST to catch the corresponding code and handle await statements that are not being tried/caught

Async function func() {await asyncFn(); }
Async function func() {try {await asyncFn(); } catch (e) {} }

Analytic syntax tree

We find that all we need to do is wrap a TryStatement try statement around the AwaitExpression Aawait expression

Analysis of the type

All we need to do is create a TryStatement and look at the corresponding API

/**
 * @param {BlockStatement} block  (required)
 * @param {CatchClause} handler  (default: null)
 * @param {BlockStatement} finalizer (default: null)
 */
t.tryStatement(block, handler, finalizer);

Catchclause is not considered for the time being, Sir

/**
 * @param {Array<Statement>} body  (required)
 * @param {Array<Directive>} directives  (default: [])
 */
t.blockStatement(body, directives);

According to the AST tree structure, the body is composed of an ExpressionStatement

/**
 * @param {Expression} expression  (required)
 */
t.expressionStatement(expression);

The expression we need in the ExpressionStatement is the node we currently capture, so we can start writing the code

code

To capture code in AwaitExpression, we also need to determine that the parent Node of the code segment is not wrapped by a try/catch. We can use the findParent method of the path parameter to walk up all the parent nodes and determine whether they are wrapped by a try/catch Node

AwaitExpression(path) {// first ensure that await statement is not wrapped by try/catch if (path.findParent(path = >t.IstryStatement (path.node))) return; const expression = t.expressionStatement(path.node); const tryBlock = t.blockStatement([expression]); // create a catch --> console.log(e) const paramsE = t.identifier('e'); const memberExpression = t.MemberExpression(t.identifier('console'), t.identifier('log')); const consoleExpression = t.expressionStatement(t.callExpression(memberExpression, [paramsE])); const catchClause = t.catchClause(paramsE, t.blockStatement([consoleExpression])); const tryStatement = t.tryStatement(tryBlock, catchClause); / / an array of the path. ReplaceWithMultiple ([tryStatement]); } async function func() {// try {// await asyncFn(); // } catch (e) { // console.log(e); / / / /}}

Other situations

We also need to allow for the possibility that await expressions might be able to do other things. We could declare the assignment of the variable directly, we could assign the value directly, and then we would have the immediate expression that we just processed

// declare const r = await asyncFn(); // Assign r = await asyncFn(); // await asyncFn(); await asyncFn();

If we look at the syntax tree again, we can see that the difference is under the BlockStatement node, so we can replace this level directly and complete the catch statement by the way

The code we entered at this time is as follows:

async function func() {
  const r = await asyncFn1();
  res = await asyncFn2();
  await asyncFn3();
}

Process:

AwaitExpression(path) {// first ensure that await statement is not wrapped by try/catch if (path.findParent(path = >t.IstryStatement (path.node))) return; const parent = path.parent; let replacePath = null; If (t.i sVariableDeclarator (parent) | | t.i sAssignmentExpression (parent)) {/ / assignment and the way of structure similar to claim, The nodes on the parentPath are the parameters required for the BlockStatement. Can directly replace replacePath = path. ParentPath. ParentPath; } else {// If it's just an expression, path.parentpath.node is the BlockStatement parameter replacePath = path.parentPath; } const tryBlock = t.blockStatement([replacePath.node]); // Dentifier (' E '); // Dentifier (' E '); const throwStatement = t.throwStatement(t.newExpression(t.identifier('Error'), [paramsE])); const catchClause = t.catchClause(paramsE, t.blockStatement([throwStatement])); const tryStatement = t.tryStatement(tryBlock, catchClause); replacePath.replaceWithMultiple([tryStatement]); }, // async function func() {// const r = await asyncFn1(); // } catch (e) { // throw new Error(e); // } // try { // res = await asyncFn2(); // } catch (e) { // throw new Error(e); // } // try { // await asyncFn3(); // } catch (e) { // throw new Error(e); / / / /}}

Concrete grammar

The opposite of an abstract Syntax Tree is a Concrete Syntax Tree (CST) (often referred to as an Analysis Tree). Typically, during the translation and compilation of source code, the parser creates a parse tree. Once the AST is created, some information is added during subsequent processing, such as the semantic analysis phase. What is the difference between abstract and concrete syntax trees?

supplement

For node types, the complete set looks something like this:

(parameter) node: Identifier | SimpleLiteral | RegExpLiteral | Program | FunctionDeclaration | FunctionExpression | ArrowFunctionExpression | SwitchCase | CatchClause | VariableDeclarator | ExpressionStatement | BlockStatement | EmptyStatement | DebuggerStatement | WithStatement | ReturnStatement | LabeledStatement | BreakStatement | ContinueStatement | IfStatement | SwitchStatement | ThrowStatement | TryStatement | WhileStatement | DoWhileStatement | ForStatement | ForInStatement | ForOfStatement | VariableDeclaration | ClassDeclaration | ThisExpression | ArrayExpression | ObjectExpression | YieldExpression | UnaryExpression | UpdateExpression | BinaryExpression | AssignmentExpression | LogicalExpression | MemberExpression | ConditionalExpression | SimpleCallExpression | NewExpression | SequenceExpression | TemplateLiteral | TaggedTemplateExpression | ClassExpression | MetaProperty | AwaitExpression | Property | AssignmentProperty | Super | TemplateElement | SpreadElement | ObjectPattern | ArrayPattern  | RestElement | AssignmentPattern | ClassBody | MethodDefinition | ImportDeclaration | ExportNamedDeclaration | ExportDefaultDeclaration | ExportAllDeclaration | ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier

Babel has documentation for a detailed definition of the AST tree, which can be found here

The code address

Code to store on GitHub, address

A link to the

  1. JavaScript syntax parsing, AST, V8, JIT
  2. Detail the AST abstract syntax tree
  3. Abstract syntax tree PS: This is a class to ES5 constructor process, interested can take a look at
  4. Analyze the Babel – Babel overview | AlloyTeam
  5. Don’t try/catch the async function too much
  6. @babel/types
  7. The article has been synchronized with Nuggets
  8. Original address – shared by the AST team