Learn the Abstract Syntax Tree (AST)

Preface

We often use tools such as Babel, ESLint, and Prettier, but the principles behind them are not widely understood 😅. While trying to understand them recently, I discovered a new world: the abstract syntax tree (AST).

First look

In computer science, an abstract syntax tree (AST), or syntax tree for short, is an abstract representation of the syntactic structure of source code. It represents that structure as a tree in which each node denotes a construct occurring in the source. The syntax is “abstract” because it does not represent every detail of the real grammar. For example, grouping parentheses are implicit in the shape of the tree and do not appear as nodes, and a conditional statement such as if-condition-then can be represented by a single node with three branches. The counterpart of an abstract syntax tree is a concrete syntax tree, usually called a parse tree. Typically, when source code is translated or compiled, the parser builds a parse tree from which the AST is generated. Once the AST exists, later phases such as semantic analysis add information to it.

Abstract syntax tree

Let’s start with an all-powerful function that returns the ultimate answer to life, the universe, and everything:

function ask() {
  const answer = 42
  return answer
}

First, it is a function declaration whose name is ask. The function body declares a constant answer with the value 42 and then returns answer. Paste it into a wonderful site called AST Explorer, and the mysterious AST is finally revealed.

As you can see, the AST is a top-down tree. Each level consists of one or more nodes, and each node carries a type property, such as FunctionDeclaration, BlockStatement, or VariableDeclaration, along with other properties. The node types are defined in the ESTree repository, which covers syntax from ES5 up to the latest JavaScript. The following figure shows the structure of this function's AST more clearly.
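To make the shape of that tree concrete, here is a hand-written sketch of roughly what an ESTree-style parser produces for ask. Position fields such as start, end, and loc are left out, and field details differ between parsers (Babel, for example, names the number node NumericLiteral rather than Literal), so treat it as an approximation rather than exact parser output.

```javascript
// Hand-written, trimmed ESTree-style AST for the `ask` function.
// Position fields (start, end, loc) are omitted for brevity.
const askAst = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "ask" },
      params: [],
      body: {
        type: "BlockStatement",
        body: [
          {
            type: "VariableDeclaration",
            kind: "const",
            declarations: [
              {
                type: "VariableDeclarator",
                id: { type: "Identifier", name: "answer" },
                init: { type: "Literal", value: 42, raw: "42" },
              },
            ],
          },
          {
            type: "ReturnStatement",
            argument: { type: "Identifier", name: "answer" },
          },
        ],
      },
    },
  ],
};

// Walk down by hand to the statements inside the function body.
const fn = askAst.body[0];
const statementTypes = fn.body.body.map((statement) => statement.type);
console.log(fn.id.name, statementTypes);
```

Every node is a plain object with a type, and nesting is expressed through ordinary properties such as body, declarations, and argument.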

At this point we have a preliminary understanding of the AST.

Parsing

To get the AST of your code, you first need to parse it. The parsing phase takes source code and outputs an AST, using a parser to perform lexical analysis and syntax analysis on the source. Lexical analysis converts the source string into an array of tokens, and syntax analysis converts the tokens into an AST.

Lexical analysis

Tokens are an array of fragments of the source text. A token can be a number, a label, a punctuation mark, an operator, or just about anything else.

// Sample source code
a + b

// Tokens
[
  { type: { ... }, value: "a", start: 0, end: 1, loc: { ... } },
  { type: { ... }, value: "+", start: 2, end: 3, loc: { ... } },
  { type: { ... }, value: "b", start: 4, end: 5, loc: { ... } },
]
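As a rough illustration of how such a token array could be produced, here is a minimal tokenizer sketch. The function name tokenize and the simplified token shape (a string type instead of a type object, and no loc) are assumptions made for this example; real parsers track far more detail.

```javascript
// Minimal tokenizer sketch: identifiers, integers, and the + - * / operators.
// The token shape { type, value, start, end } is simplified for this example.
function tokenize(source) {
  const tokens = [];
  let pos = 0;

  while (pos < source.length) {
    const ch = source[pos];

    // Whitespace separates tokens but produces none.
    if (/\s/.test(ch)) {
      pos += 1;
      continue;
    }

    const start = pos;
    let type;
    if (/[A-Za-z_$]/.test(ch)) {
      while (pos < source.length && /[\w$]/.test(source[pos])) pos += 1;
      type = "name";
    } else if (/[0-9]/.test(ch)) {
      while (pos < source.length && /[0-9]/.test(source[pos])) pos += 1;
      type = "num";
    } else if ("+-*/".includes(ch)) {
      pos += 1;
      type = "operator";
    } else {
      throw new SyntaxError(`Unexpected character "${ch}" at ${pos}`);
    }

    tokens.push({ type, value: source.slice(start, pos), start, end: pos });
  }

  return tokens;
}

const tokens = tokenize("a + b");
console.log(tokens);
```

For the input a + b, the start and end offsets come out the same as in the token listing above: 0 to 1, 2 to 3, and 4 to 5.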

Syntax analysis

In the syntax analysis stage, the token array is converted into an AST for later operations. For the general approach, please refer to the code here.
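The sketch below shows one possible way to turn a token array into an AST, using precedence climbing for the four arithmetic operators. The names parseExpression and PRECEDENCE and the simplified token shape are assumptions for this example, and it is not how any particular parser is implemented.

```javascript
// Precedence-climbing parser sketch for identifiers, numbers, and + - * /.
// Tokens use a simplified { type, value } shape assumed for this example.
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

function parseExpression(tokens) {
  let index = 0;

  // A primary expression is a single identifier or number.
  function parsePrimary() {
    const token = tokens[index];
    if (!token) throw new SyntaxError("Unexpected end of input");
    index += 1;
    if (token.type === "name") return { type: "Identifier", name: token.value };
    if (token.type === "num") return { type: "Literal", value: Number(token.value) };
    throw new SyntaxError(`Unexpected token "${token.value}"`);
  }

  // Keep consuming operators whose precedence is at least minPrecedence.
  function parseBinary(minPrecedence) {
    let left = parsePrimary();
    while (index < tokens.length) {
      const token = tokens[index];
      const precedence = PRECEDENCE[token.value];
      if (token.type !== "operator" || precedence < minPrecedence) break;
      index += 1;
      // Parsing the right side one level higher makes operators left-associative.
      const right = parseBinary(precedence + 1);
      left = { type: "BinaryExpression", operator: token.value, left, right };
    }
    return left;
  }

  const ast = parseBinary(1);
  if (index < tokens.length) {
    throw new SyntaxError(`Unexpected token "${tokens[index].value}"`);
  }
  return ast;
}

// Tokens for `a + b * 2`.
const ast = parseExpression([
  { type: "name", value: "a" },
  { type: "operator", value: "+" },
  { type: "name", value: "b" },
  { type: "operator", value: "*" },
  { type: "num", value: "2" },
]);
console.log(JSON.stringify(ast, null, 2));
```

Because * binds tighter than +, the multiplication ends up as the right child of the addition. The grouping is encoded in the tree shape itself, which is why an AST needs no parenthesis nodes.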

Traversal

Once you have an AST, you can traverse it recursively from top to bottom, using a design pattern called the visitor pattern to access the nodes of the tree. You create a visitor object that contains some methods. As the AST is traversed, each node's type is matched against the method names in the visitor, and the matching method is called.
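A minimal sketch of this idea, assuming nodes are plain objects with a string type property: the hypothetical traverse function below walks every object-valued property, whereas real tools such as Babel and ESLint use per-type tables of child keys and support enter and exit hooks.

```javascript
// Depth-first traversal sketch: call the visitor method named after the
// node's type, then recurse into every child found in the node's properties.
function traverse(node, visitor) {
  if (!node || typeof node.type !== "string") return;

  const handler = visitor[node.type];
  if (handler) handler(node);

  for (const value of Object.values(node)) {
    if (Array.isArray(value)) {
      value.forEach((child) => traverse(child, visitor));
    } else if (value && typeof value === "object") {
      traverse(value, visitor);
    }
  }
}

// Hand-built AST for `a + b * 2`.
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Identifier", name: "a" },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Identifier", name: "b" },
    right: { type: "Literal", value: 2 },
  },
};

const names = [];
const operators = [];
traverse(ast, {
  Identifier(node) {
    names.push(node.name);
  },
  BinaryExpression(node) {
    operators.push(node.operator);
  },
});
console.log(names, operators);
```

The visitor only declares the node types it cares about, and nodes of any other type are passed over silently.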

Visiting AST nodes makes it possible to check source code for problems, and ESLint works on this basis.

Here are two examples of rules that check code (see this link for details on writing ESLint rules).

Limit the number of function arguments

Match nodes of type FunctionDeclaration, and report the function if it has more than 3 parameters.

export default function (context) {
  return {
    // Visit FunctionDeclaration nodes
    FunctionDeclaration: (node) => {
      // Check the number of function parameters
      if (node.params.length > 3) {
        context.report({
          node,
          message: "No more than 3 parameters.",
        });
      }
    },
  };
}


Restrict nested conditional statements

Match nodes of type IfStatement, and report when the first statement in the if body is also an IfStatement.

export default function (context) {
  return {
    IfStatement(node) {
      const { consequent } = node;
      // A consequent without braces has no body array, so guard against it
      const body = consequent.type === "BlockStatement" ? consequent.body : [];

      // Check whether the first child is also an IfStatement
      if (body[0] && body[0].type === "IfStatement") {
        context.report({
          node: body[0],
          message: "Nested conditional statements are not allowed",
        });
      }
    },
  };
}
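A rule of this shape can be exercised without ESLint by calling its visitor method directly. In the sketch below the rule is restated as a plain function named noNestedIf, with a guard for if statements written without braces, and stubContext is a hypothetical stand-in that only records reports. It is not ESLint's real context object or its RuleTester.

```javascript
// The nested-if check restated as a plain function so it runs without ESLint.
function noNestedIf(context) {
  return {
    IfStatement(node) {
      const { consequent } = node;
      // Only a BlockStatement consequent has a `body` array.
      if (consequent.type !== "BlockStatement") return;
      const [first] = consequent.body;
      if (first && first.type === "IfStatement") {
        context.report({
          node: first,
          message: "Nested conditional statements are not allowed",
        });
      }
    },
  };
}

// Hypothetical stub for the rule context: it only collects reports.
const reports = [];
const stubContext = { report: (descriptor) => reports.push(descriptor) };

// Hand-built nodes for `if (a) { if (b) {} }`.
const inner = {
  type: "IfStatement",
  test: { type: "Identifier", name: "b" },
  consequent: { type: "BlockStatement", body: [] },
  alternate: null,
};
const outer = {
  type: "IfStatement",
  test: { type: "Identifier", name: "a" },
  consequent: { type: "BlockStatement", body: [inner] },
  alternate: null,
};

const listeners = noNestedIf(stubContext);
listeners.IfStatement(outer); // reports the nested `if`
listeners.IfStatement(inner); // nothing to report
console.log(reports.map((report) => report.message));
```

Only the outer statement triggers a report, and the reported node is the nested if rather than its parent.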

Modification

While traversing the AST you can add, move, and replace nodes in the tree, or generate an entirely new AST.

Babel works this way: it changes code by modifying nodes in the code's AST.

A Babel plugin is a function that takes the babel object as a parameter and returns an object with a visitor property. Each function in the visitor object receives path and state parameters.

The following examples show how to write a Babel plugin. See this link for more information about writing Babel plugins.

// A Babel plugin
export default function (babel) {
  return {
    visitor: {
      Identifier(path, state) {},
      // ...
    },
  };
}

Convert `**` syntax to `Math.pow`

// Before
const a = 10 ** 2
// After
const a = Math.pow(10, 2)

Find where the ** syntax is used
Get the left and right operands
Create a Math.pow call to replace the original node

export default function (babel) {
  const { types: t } = babel;
  
  return {
    visitor: {
      // Access binary expressions
      BinaryExpression(path) {
        const { node } = path
        // Exit if the operator is not **
        if (node.operator !== '**') return
        const { left, right } = node
        // Create the call statement
        const newNode = t.callExpression(
          t.memberExpression(t.identifier('Math'), t.identifier('pow')),
          [left, right]
        )
        // Replace the original node
        path.replaceWith(newNode)
      },
    }
  };
}

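The same replacement can be sketched without Babel on a plain AST object, which makes it easier to see what the node-building helpers construct. The function powToMathPow is hypothetical. It returns a transformed copy instead of mutating in place, which is a simplification compared with Babel's path.replaceWith.

```javascript
// Rebuild the tree, replacing `left ** right` with Math.pow(left, right).
function powToMathPow(node) {
  if (Array.isArray(node)) return node.map(powToMathPow);
  if (!node || typeof node !== "object") return node;

  // Transform children first so nested `**` expressions are handled too.
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = powToMathPow(value);
  }

  if (copy.type === "BinaryExpression" && copy.operator === "**") {
    return {
      type: "CallExpression",
      callee: {
        type: "MemberExpression",
        object: { type: "Identifier", name: "Math" },
        property: { type: "Identifier", name: "pow" },
        computed: false,
      },
      arguments: [copy.left, copy.right],
    };
  }
  return copy;
}

// Hand-built, Babel-style AST for `10 ** 2`.
const before = {
  type: "BinaryExpression",
  operator: "**",
  left: { type: "NumericLiteral", value: 10 },
  right: { type: "NumericLiteral", value: 2 },
};

const after = powToMathPow(before);
console.log(JSON.stringify(after, null, 2));
```

The operands are reused unchanged as the call's arguments, which mirrors passing [left, right] to the call expression in the plugin.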

Rewrite utility function imports

// Before
import { get, isFunction } from 'lodash'
// After
import get from "lodash/get";
import isFunction from "lodash/isFunction";

Find the import node for lodash
Iterate over all imported specifiers and read the name of each
Insert the newly generated import nodes
Delete the original node

export default function (babel) {
  const { types: t } = babel;

  return {
    visitor: {
      // Visit import declarations
      ImportDeclaration(path) {
        const { node } = path
        if (node.source.value !== 'lodash') return
        const val = node.source.value

        // Note: only named imports are rewritten; other specifiers are dropped
        node.specifiers.forEach((spec) => {
          if (t.isImportSpecifier(spec)) {
            // `imported` is the exported name, `local` is the local binding
            const { imported, local } = spec

            // Insert a new import node
            path.insertBefore(
              t.importDeclaration(
                [t.importDefaultSpecifier(local)],
                t.stringLiteral(`${val}/${imported.name}`)
              )
            )
          }
        })
        // Delete the original node
        path.remove()
      },
    }
  };
}

Generation

Generation means outputting code from an AST. Two tools are used below to illustrate it.
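Before turning to those tools, a hand-rolled sketch may help show what generation involves at its simplest: a recursive function that prints each node type as text. The generate function below is hypothetical and supports only the handful of ESTree-style node types needed for the ask example. Real generators also handle operator precedence, comments, and source maps.

```javascript
// Tiny code generator sketch covering only the node types used by `ask`.
function generate(node, indent = "") {
  switch (node.type) {
    case "Program":
      return node.body.map((child) => generate(child, indent)).join("\n");
    case "FunctionDeclaration": {
      const params = node.params.map((param) => generate(param)).join(", ");
      return `${indent}function ${node.id.name}(${params}) ${generate(node.body, indent)}`;
    }
    case "BlockStatement": {
      const inner = node.body.map((child) => generate(child, indent + "  ")).join("\n");
      return `{\n${inner}\n${indent}}`;
    }
    case "VariableDeclaration": {
      const declarations = node.declarations
        .map((d) => (d.init ? `${d.id.name} = ${generate(d.init)}` : d.id.name))
        .join(", ");
      return `${indent}${node.kind} ${declarations};`;
    }
    case "ReturnStatement":
      return node.argument
        ? `${indent}return ${generate(node.argument)};`
        : `${indent}return;`;
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Trimmed AST for the `ask` function from the start of the article.
const askAst = {
  type: "Program",
  body: [{
    type: "FunctionDeclaration",
    id: { type: "Identifier", name: "ask" },
    params: [],
    body: {
      type: "BlockStatement",
      body: [
        {
          type: "VariableDeclaration",
          kind: "const",
          declarations: [{
            type: "VariableDeclarator",
            id: { type: "Identifier", name: "answer" },
            init: { type: "Literal", value: 42 },
          }],
        },
        { type: "ReturnStatement", argument: { type: "Identifier", name: "answer" } },
      ],
    },
  }],
};

const output = generate(askAst);
console.log(output);
```

Formatting choices such as indentation and semicolons are made by the generator, not stored in the tree, which is why generated code can look different from the original source.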

jscodeshift

jscodeshift is an open-source toolkit from Facebook for running transformations over JavaScript or TypeScript files. It is designed to make bulk changes to code easier. It transforms source code with a transformer, which is a function that takes fileInfo, api, and options arguments and returns the new source code.

module.exports = function(fileInfo, api, options) {
  // transform `fileInfo.source` here
  // ...
  // return changed source
  return source;
};

Example: convert <React.Fragment> to the <> syntax. The idea is to find the node whose name is Fragment and remove its parent node.

export default function transformer(file, api) {
  const j = api.jscodeshift
  const root = j(file.source)

  root.find(j.JSXIdentifier).forEach((p) => {
    const { node } = p
    // Skip nodes whose name is not Fragment
    if (node.name !== 'Fragment') return
    // Remove the parent node
    j(p.parent).remove()
  })

  return root.toSource()
}



See this link for more examples

GoGoCode

GoGoCode is a code transformation tool I discovered recently. It claims to be the simplest, easiest to use, and most readable tool of its kind, and it provides an API similar to jQuery.

Example: Replace variable names

#!/usr/bin/env node

const $ = require('gogocode')
const code = `
function ask() {
  const answer = 42
  return answer
}
`
const newCode = $(code)
  .replace('ask', 'question')
  .replace('answer', 'result')
  .generate()

console.log(newCode)

// Output
// function question() {
//   const result = 42
//   return result;
// }


Conclusion

This article explained what an abstract syntax tree (AST) is, how to obtain the AST of your code, and how to build code tools that traverse, modify, and generate code from the AST.

Code for this article

Links

Awesome AST

The ESTree Spec

Babel Handbook

The Super Tiny Compiler

Working with Rules

jscodeshift

GOGOCODE
