If you already know the basics of Babel, you can skip the prerequisites and head straight to the section on writing the plug-in.

Prerequisites

What is an AST

To learn Babel, you first need to understand the AST.

So what is an AST?

Here’s what Wikipedia explains:

In computer science, an abstract syntax tree (AST), or syntax tree for short, is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language as a tree, with each node in the tree representing a construct in the source code.

The key to understanding the AST is the phrase “an abstract representation of the syntactic structure of source code”. In plain language: following an agreed specification, our code is described as a tree data structure so that JS engines and transpilers can understand it.

As an analogy, just as modern frameworks use a virtual DOM to describe and manipulate the real DOM, an AST is a tool for describing code at a lower level.

ASTs are not unique to JS. Code in any language can be converted into a corresponding AST, and there are many AST specifications. Most JS tooling follows ESTree; a passing familiarity with it is enough here.

What the AST looks like

Now that you know the basic concepts of an AST, what does an AST look like?

astexplorer.net is a site that generates ASTs online. Try generating one there to get a feel for the structure.

Babel processing

Q: What are the steps for putting an elephant into a refrigerator?

Open the fridge -> put the elephant in -> close the fridge

The same is true of Babel, which uses the AST to compile code. Naturally, the code first needs to be turned into an AST, then the AST is processed, and finally the AST is converted back into code.

That is, the following process

Code converts to AST -> process AST -> AST converts to code

And then we’ll give them a more professional name

Parse -> Transform -> Generate

Parse

Convert source code into an abstract syntax tree (AST) using a parser.

In this stage the main task is to turn code into an AST, which takes two steps: lexical analysis and syntax analysis. When the Parse phase begins, the source is scanned first, and lexical analysis happens during this scan. How should we understand lexical analysis? If we think of the code we write as a sentence, lexical analysis breaks the sentence into words. Just as the phrase “I am eating” can be broken down into “I”, “am”, and “eating”, code can be broken down too. For example, const a = '1' is broken down into its finest-grained units, called tokens: ‘const’, ‘a’, ‘=’, ‘1’. That is what the lexical analysis phase does.
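To make lexical analysis concrete, here is a minimal toy lexer in plain JavaScript. It is only a sketch of the idea and is not how @babel/parser is implemented: the tokenize function, the token type names, and the tiny grammar it accepts (identifiers, digits, single-quoted strings, = and ;) are all assumptions made for illustration.

```javascript
// Toy lexer for a tiny subset of JS. Illustration only: this is NOT the
// algorithm used by @babel/parser, and the token type names are made up.
const KEYWORDS = new Set(['const', 'let', 'var']);
const TOKEN_PATTERN =
  /(?<word>[A-Za-z_$][\w$]*)|(?<number>\d+)|(?<string>'[^']*')|(?<punct>[=;])/g;

function tokenize(code) {
  const tokens = [];
  // Characters that match none of the alternatives (such as spaces) are skipped.
  for (const match of code.matchAll(TOKEN_PATTERN)) {
    const { word, number, string, punct } = match.groups;
    if (word !== undefined) {
      tokens.push({ type: KEYWORDS.has(word) ? 'Keyword' : 'Identifier', value: word });
    } else if (number !== undefined) {
      tokens.push({ type: 'Numeric', value: number });
    } else if (string !== undefined) {
      tokens.push({ type: 'String', value: string });
    } else {
      tokens.push({ type: 'Punctuator', value: punct });
    }
  }
  return tokens;
}

// Print only the token values for the statement discussed above.
console.log(tokenize("const a = '1'").map((token) => token.value));
```

A real lexer also records positions and reports unexpected characters instead of silently skipping them.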

At the end of lexical analysis, the tokens are handed to syntax analysis, whose task is to build the AST from them. It walks through the tokens and produces a tree with a specific structure; that tree is the AST.

In the following figure, you can see the structure of the statement above. Several pieces of information stand out: the outermost node is a VariableDeclaration, which means a variable declaration, and its kind is const. Its declarations field contains a VariableDeclarator, the variable declarator object, in which we find the two key values a and 1.

Besides these, the AST also carries information such as line numbers, which is not covered here. In any case, this is the AST we end up with.
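As a rough textual stand-in for that structure, the object below is a hand-written, simplified sketch of the AST for const a = 1 (the numeric form used in the classic example later). Position fields such as start, end, and loc are deliberately omitted, and the exact set of fields a real parser emits is larger than what is shown; if the value were written as the string '1', the init node would be a StringLiteral instead.

```javascript
// Hand-written, simplified sketch of the AST for `const a = 1`.
// Position fields (start, end, loc) and other metadata are left out.
const ast = {
  type: 'File',
  program: {
    type: 'Program',
    body: [
      {
        type: 'VariableDeclaration',
        kind: 'const', // the keyword used for the declaration
        declarations: [
          {
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 'a' },
            init: { type: 'NumericLiteral', value: 1 },
          },
        ],
      },
    ],
  },
};

// Walk down to the pieces mentioned in the text.
const declaration = ast.program.body[0];
const declarator = declaration.declarations[0];
console.log(declaration.kind, declarator.id.name, declarator.init.value);
```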

So how do we convert code to an AST in Babel? At this stage we use @babel/parser, which is provided by Babel and was formerly called Babylon. It was not written from scratch by the Babel team; it is a fork of the Acorn project.

It gives us a way to convert code to an AST. The basic usage appears in Step 1 of the classic example below.

More information can be found in the official @babel/parser documentation.

Transform

After the Parse phase we have the AST. Babel then walks it depth-first with @babel/traverse. Plug-ins are triggered at this stage: they visit each type of AST node through visitor functions. Using the code above as an example, we can write a VariableDeclaration function to visit VariableDeclaration nodes; it is called whenever a node of that type is encountered. Step 2 of the classic example below shows such a function.

The visitor function takes two arguments:

path

path is the current access path. It contains the node's information, its parent node's information, and many methods for operating on the node. You can use these methods to add, update, move, or delete nodes in the AST.

state

state contains information about the current plug-in, its options, and so on. You can also attach custom properties to it to pass data between nodes.
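The visitor idea can be sketched without Babel at all. The toy traverse function below walks a hand-written AST depth-first and calls a visitor method named after each node's type. It is an illustration under simplifying assumptions: the real @babel/traverse passes a much richer path object, while here path is just { node, parent } and state is a plain object that we create ourselves.

```javascript
// Toy depth-first traversal with a visitor object. A sketch of the idea only:
// the real @babel/traverse passes a far richer `path` than { node, parent }.
function traverse(node, visitor, state, parent = null) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitor, state, parent));
    return;
  }
  const isAstNode = typeof node.type === 'string';
  if (isAstNode && typeof visitor[node.type] === 'function') {
    visitor[node.type]({ node, parent }, state);
  }
  // Visit the node first, then descend into its children (depth-first).
  for (const key of Object.keys(node)) {
    traverse(node[key], visitor, state, isAstNode ? node : parent);
  }
}

const ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'a' },
          init: { type: 'NumericLiteral', value: 1 },
        },
      ],
    },
  ],
};

// `state` carries data between visitor calls.
const state = { visited: [] };
traverse(
  ast,
  {
    VariableDeclaration(path, state) {
      state.visited.push(`${path.node.kind} declaration inside ${path.parent.type}`);
    },
    Identifier(path, state) {
      state.visited.push(`identifier ${path.node.name}`);
    },
  },
  state
);
console.log(state.visited);
```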

Generate

Generate: print the transformed AST as target code and generate a source map.

This phase is relatively simple. After the AST has been processed in the Transform phase, the task of this phase is to convert the AST back to code. The AST is traversed depth-first, the corresponding code is produced from the information contained in each node, and the corresponding source map is generated.
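As a sketch of what "producing code from the information contained in each node" means, the toy generate function below prints a handful of node types recursively. It is not @babel/generator: it supports only the node types needed for a simple declaration, applies a fixed formatting of its own choosing, and produces no source map.

```javascript
// Toy code generator: prints a few node types recursively.
// Illustration only; it does not produce source maps and is not @babel/generator.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(generate).join(', ')};`;
    case 'VariableDeclarator':
      return node.init
        ? `${generate(node.id)} = ${generate(node.init)}`
        : generate(node.id);
    case 'Identifier':
      return node.name;
    case 'NumericLiteral':
      return String(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// A hand-written AST whose `kind` has already been changed to 'var'.
const ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'a' },
          init: { type: 'NumericLiteral', value: 1 },
        },
      ],
    },
  ],
};

const code = generate(ast);
console.log(code);
```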

Trying a classic example

As the saying goes, the best way to learn is by doing. Let’s try a simple classic example: converting an ES6 const to an ES5 var.

Step 1: Convert to AST

Generating an AST with @babel/parser is simple. In the code below, the ast variable holds the resulting AST.

const parser = require('@babel/parser');
const ast = parser.parse('const a = 1');

Step 2: Process the AST

Use @babel/traverse to process the AST.

At this stage we analyze the generated AST and see that const is controlled by the kind field of the VariableDeclaration node. So can we simply rewrite kind to var? Let’s try it.

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default

const ast = parser.parse('const a = 1');
traverse(ast, {
  VariableDeclaration(path, state) {
    // Use path.node to access the actual AST node
    path.node.kind = 'var'
  }
});

We have rewritten kind to var on a hunch, but we don’t yet know whether it actually works, so we need to convert the AST back to code to check.

Step 3: Generate code

Use @babel/generator to turn the AST back into code.

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default
const generate = require('@babel/generator').default

const ast = parser.parse('const a = 1');
traverse(ast, {
  VariableDeclaration(path, state) {
    path.node.kind = 'var'
  }
});
// Pass the processed AST to generate
const transformedCode = generate(ast).code
console.log(transformedCode)

Let’s take a look at the result:

It runs successfully, and the output is what we wanted.

How to develop plug-ins

The classic example gave us a general idea of how Babel is used, but how do we write an ordinary plug-in?

In fact, the basic idea of plug-in development is the same as above, except that a plug-in only needs to care about the Transform phase.

Our plug-in needs to export either an object or a function, and a function must return an object. In both cases we do the same work inside the object's visitor property. The function form receives several arguments: api exposes a series of methods provided by Babel, options holds the parameters passed to the plug-in, and dirname is the file path during processing.

The example above becomes the following:

module.exports = {
  visitor: {
    VariableDeclaration(path, state) {
      path.node.kind = 'var'
    }
  }
}

// Or, in function form
module.exports = (api, options, dirname) => {
  return {
    visitor: {
      VariableDeclaration(path, state) {
        path.node.kind = 'var'
      }
    }
  }
}
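To see why both export shapes end up equivalent, here is a small sketch of how a host could normalize them. The resolvePlugin helper is hypothetical and is not Babel's real loading code; the target option is also made up for the illustration. It only shows that the function form is called with (api, options, dirname) and that either way the caller ends up with an object carrying a visitor.

```javascript
// Hypothetical host-side helper, NOT Babel's real loader: it accepts either
// export shape and always hands back the plug-in object.
function resolvePlugin(pluginExport, api, options, dirname) {
  return typeof pluginExport === 'function'
    ? pluginExport(api, options, dirname)
    : pluginExport;
}

// Object form: the visitor is fixed.
const objectForm = {
  visitor: {
    VariableDeclaration(path) {
      path.node.kind = 'var';
    },
  },
};

// Function form: the visitor can depend on the options it receives.
// `target` is an invented option used only for this sketch.
const functionForm = (api, options, dirname) => ({
  visitor: {
    VariableDeclaration(path) {
      path.node.kind = options.target || 'var';
    },
  },
});

const nodeA = { type: 'VariableDeclaration', kind: 'const' };
const nodeB = { type: 'VariableDeclaration', kind: 'const' };
resolvePlugin(objectForm, {}, {}, '.').visitor.VariableDeclaration({ node: nodeA });
resolvePlugin(functionForm, {}, { target: 'let' }, '.').visitor.VariableDeclaration({ node: nodeB });
console.log(nodeA.kind, nodeB.kind);
```

The function form is the one used in the rest of this article because the plug-in needs its options.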

Writing the plug-in

With the prerequisites covered, let’s walk through developing a Babel plug-in step by step. First, we define the core requirements of the plug-in:

  • Automatically insert a call to a function.
  • Automatically import the dependency that the inserted function comes from.
  • Use comments to mark which function to insert and where to insert it; if no position is marked, the call is inserted on the first line of the function body by default.

The basic effect is shown as follows:

Before processing

// log is the function to be inserted and called
// @inject:log
function fn() {
  console.log(1)
  // Use @inject:code to mark the line where the call is inserted
  // @inject:code
  console.log(2)
}

After processing

// Imported from package xxx, which is configured in the plug-in parameters
import log from 'xxx'
function fn() {
  console.log(1)
  log()
  console.log(2)
}

Sorting out the approach

Now that we know the general requirements, we won’t rush into coding. Let’s first think about how to approach the work and which problems we will have to handle along the way.

  1. Find the functions marked with the @inject tag and check whether they contain the @inject:code position tag inside.
  2. Import the corresponding packages for all inserted functions.
  3. Once a tag is matched, all that remains is to insert the function call. We also need to handle the various kinds of functions, such as object methods, IIFEs, and arrow functions.

Design plug-in parameters

To make the plug-in flexible, we need to design a suitable parameter format. The plug-in takes an object as its parameter.

  • The key is the name of the function to insert.

  • kind is the import form. There are three forms: named, default, and namespaced. This design follows babel-helper-module-imports.

    • named corresponds to the form import { a } from "b"
    • default corresponds to the form import a from "b"
    • namespaced corresponds to the form import * as a from "b"
  • require is the name of the package that the function depends on.

For example, if I need to insert the log method, which is imported from the log4js package in named form, the parameters look like this:

// babel.config.js
module.exports = {
  plugins: [
    // Pass the plug-in together with its parameters
    ['./babel-plugin-myplugin.js', {
      log: {
        // Import in named form
        kind: 'named',
        require: 'log4js'
      }
    }]
  ]
}
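To make the three kind values concrete, the sketch below maps one parameter entry to the import statement it stands for. The importStatementFor helper is hypothetical and only builds a string for illustration; the plug-in itself does not work this way, since it creates import nodes in the AST.

```javascript
// Hypothetical helper for illustration: shows which import statement each
// `kind` value stands for. The real plug-in creates AST nodes, not strings.
function importStatementFor(name, option) {
  switch (option.kind) {
    case 'named':
      return `import { ${name} } from "${option.require}"`;
    case 'default':
      return `import ${name} from "${option.require}"`;
    case 'namespaced':
      return `import * as ${name} from "${option.require}"`;
    default:
      throw new Error(`Unknown import kind: ${option.kind}`);
  }
}

// The same shape as the plug-in parameters shown above.
const pluginOptions = {
  log: { kind: 'named', require: 'log4js' },
};

for (const [name, option] of Object.entries(pluginOptions)) {
  console.log(importStatementFor(name, option));
}
```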

Getting started

OK, now that we know exactly what to do and have designed the parameter format, we are ready to get started.

First, we paste the code to be processed into astexplorer.net to generate its AST, which makes the structure easier to sort out. Then we write the actual code.

Let’s start with function declarations: we will look at the AST structure and work out how to handle it. Here is the demo:

// @inject:log
function fn() {
  console.log('fn')
}

The generated AST structure is as follows, and you can see that there are two key attributes:

  • leadingComments holds the comments in front of the node. It contains one element, the @inject:log comment we wrote in the demo.
  • body is the actual content of the function body. The console.log('fn') from the demo lives there, and it is what we will operate on when inserting code later.

OK, so leadingComments tells us whether a function needs an insertion, and operating on body fulfills the insertion itself.

First we need to find the FunctionDeclaration layer, because that is the only layer that has the leadingComments property, and then walk through its comments to match the functions that need to be inserted. The matched function call is inserted into the body, but we need to pay attention to which level of body can be inserted into. The body of a FunctionDeclaration is not an array but a BlockStatement, which represents the function body. The BlockStatement has a body of its own, so the actual insertion point is inside the body of the BlockStatement.

The code is as follows:

module.exports = (api, options, dirname) => {

  return {
    visitor: {
      // Match function declaration nodes
      FunctionDeclaration(path, state) {
        // path.get('body') returns the path that wraps path.node.body
        const pathBody = path.get('body')
        if (path.node.leadingComments) {
          // Keep only the comments that contain @inject:xxx
          const leadingComments = path.node.leadingComments.filter(comment => /\@inject:(\w+)/.test(comment.value))
          leadingComments.forEach(comment => {
            const injectTypeMatchRes = comment.value.match(/\@inject:(\w+)/)
            // The match is successful
            if (injectTypeMatchRes) {
              // The first capture group is the xxx in @inject:xxx, which we extract
              const injectType = injectTypeMatchRes[1]
              // Get the keys of the plug-in parameters to see whether xxx is declared there
              const sourceModuleList = Object.keys(options)
              if (sourceModuleList.includes(injectType)) {
                // Search for an @code:xxx comment inside the body.
                // Comments cannot be visited directly, so we check the leadingComments property of each AST node in the body
                const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value)))
                // The default insertion position is the first line
                if (codeIndex === -1) {
                  // Operate on the body of the BlockStatement
                  pathBody.node.body.unshift(api.template.statement(`${injectType}()`)());
                } else {
                  pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${injectType}()`)());
                }
              }
            }
          })
        }
      }
    }
  }
}

When we run it and look at the result, log has been inserted successfully. Because we did not use @code:log, it is inserted on the first line by default.

Then we try the @code:log marker, changing the demo code to the following:

// @inject:log
function fn() {
	console.log('fn')
	// @code:log
}

Run the code again and look at the result: the call was inserted successfully at the position of @code:log.
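The position logic can be isolated from Babel and tried on plain objects. In the sketch below, findInsertIndex is a hypothetical helper, and the statements are hand-written stand-ins whose label field is invented purely so the result is easy to read; only leadingComments and its value field mirror the real AST shape.

```javascript
// Hypothetical helper: returns the index in `statements` at which the call
// should be inserted. Falls back to 0 (the first line) when no statement
// carries a leading @code:xxx comment.
function findInsertIndex(statements, injectType) {
  const marker = new RegExp(`@code:\\s?${injectType}`);
  const index = statements.findIndex((statement) =>
    (statement.leadingComments || []).some((comment) => marker.test(comment.value))
  );
  return index === -1 ? 0 : index;
}

// Hand-written stand-ins for the statements of a BlockStatement body.
// `label` is an invented field used only to make the output readable.
const blockBody = [
  { type: 'ExpressionStatement', label: 'console.log(1)' },
  {
    type: 'ExpressionStatement',
    label: 'console.log(2)',
    leadingComments: [{ type: 'CommentLine', value: ' @code:log' }],
  },
];

const call = { type: 'ExpressionStatement', label: 'log()' };
blockBody.splice(findInsertIndex(blockBody, 'log'), 0, call);
console.log(blockBody.map((statement) => statement.label));
```

Inserting with splice at index 0 has the same effect as unshift, so one code path covers both the marked and the default case.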

Having handled our first case, function declarations, someone may ask: what do you do with an arrow function that has no function body? For example:

// @inject:log
() => true

Is that a problem? No problem!

There’s no body so let’s just give it a body. How do we do that?

Looking at the AST, the outermost layer is an ExpressionStatement, and inside it is an ArrowFunctionExpression, which is very different from the structure generated by the earlier function declaration. We don’t need to be confused by the multiple layers, though; we just need to find the information that is useful to us, which means looking for whichever layer has leadingComments. Here leadingComments is on the ExpressionStatement, so that is the node we visit.

With the structure analyzed, how do we determine whether there is a function body? Recall that the body of the function declaration above was a BlockStatement, whereas the body of our arrow function is a BooleanLiteral. We can use Babel’s path.isBlockStatement() method to determine whether a function has a body.

module.exports = (api, options, dirname) => {

  return {
    visitor: {
      ExpressionStatement(path, state) {
        // Access the ArrowFunctionExpression
        const expression = path.get('expression')
        const pathBody = expression.get('body')
        if (path.node.leadingComments) {
          // Keep only the comments that contain @inject:xxx
          const leadingComments = path.node.leadingComments.filter(comment => /\@inject:(\w+)/.test(comment.value))

          leadingComments.forEach(comment => {
            const injectTypeMatchRes = comment.value.match(/\@inject:(\w+)/)
            // The match is successful
            if (injectTypeMatchRes) {
              // The first capture group is the xxx in @inject:xxx, which we extract
              const injectType = injectTypeMatchRes[1]
              // Get the keys of the plug-in parameters to see whether xxx is declared there
              const sourceModuleList = Object.keys(options)
              if (sourceModuleList.includes(injectType)) {
                // Check if there is a function body
                if (pathBody.isBlockStatement()) {
                  // Search for an @code:xxx comment inside the body.
                  // Comments cannot be visited directly, so we check the leadingComments property of each AST node in the body
                  const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value)))
                  // The default insertion position is the first line
                  if (codeIndex === -1) {
                    pathBody.node.body.unshift(api.template.statement(`${injectType}()`)());
                  } else {
                    pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${injectType}()`)());
                  }
                } else {
                  // No function body
                  // Use the @babel/template API exposed on api to generate AST from a code snippet
                  const ast = api.template.statement(`{ ${injectType}(); return BODY; }`)({ BODY: pathBody.node });
                  // Replace the original body
                  pathBody.replaceWith(ast);
                }
              }
            }
          })
        }
      }
    }
  }
}

You can see that the logic is basically the same as before. The only additions are the check for a function body, the generation of a new function body containing the inserted code, and the replacement of the original node with the new AST.

For the usage of the @babel/template API that generates the AST, see the @babel/template documentation.
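What the template call achieves can also be shown on plain objects: an expression body is wrapped in a block that returns it, so statements can then be inserted in front of the return. The ensureBlockBody helper below is hypothetical and works on hand-written, simplified nodes (for instance, it ignores the expression flag that real arrow function nodes carry); it is a sketch of the transformation, not the plug-in's code.

```javascript
// Hypothetical helper working on plain, simplified nodes: if an arrow function
// has an expression body, wrap it in a BlockStatement that returns it.
function ensureBlockBody(arrowFunction) {
  if (arrowFunction.body.type === 'BlockStatement') {
    return arrowFunction;
  }
  return {
    ...arrowFunction,
    body: {
      type: 'BlockStatement',
      body: [{ type: 'ReturnStatement', argument: arrowFunction.body }],
    },
  };
}

// Simplified node for `() => true`.
const arrow = {
  type: 'ArrowFunctionExpression',
  params: [],
  body: { type: 'BooleanLiteral', value: true },
};

const wrapped = ensureBlockBody(arrow);
// Simplified node standing in for the inserted `log()` call.
const logCall = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: 'log' },
    arguments: [],
  },
};
wrapped.body.body.unshift(logCall);
console.log(wrapped.body.body.map((statement) => statement.type));
```

The result corresponds to () => { log(); return true; }, and the original node is left untouched.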

The handling of the different kinds of functions is broadly the same, and can be summarized as:

Analyze the AST to find the node that carries leadingComments -> find the node whose body can be inserted into -> write the insertion logic

There are many more cases to handle in practice, such as object properties, IIFEs, and function expressions. The approach is the same for all of them, so I won’t repeat it here. I’ll post the full plug-in code at the bottom of the article.

Automatic import

With the first requirement complete, we move on to the second: how can the package we use be imported automatically? Taking log4js from the example above, our processed code should have the following added automatically:

import { log } from 'log4js'

At this point, there are two cases we need to think about:

  1. log has already been imported
  2. The variable name log is already in use

For case 1, we need to check whether log4js has already been imported and whether log is imported from it in named form. For case 2, we need to give log a unique alias and make sure that this alias is used in the subsequent code insertion. This means the automatic import logic has to run at the beginning of the file.
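The idea behind the unique alias in case 2 can be sketched in a few lines. The uniqueName helper below is hypothetical and is not Babel's actual algorithm; it only illustrates the principle of trying candidate names until one is free. The set of used names is supplied by hand here, whereas the plug-in later relies on the scope information that Babel tracks.

```javascript
// Hypothetical helper, not Babel's actual algorithm: try `_name`, `_name2`,
// `_name3`, ... until a candidate is not in use, then reserve it.
function uniqueName(base, usedNames) {
  let candidate = `_${base}`;
  let counter = 2;
  while (usedNames.has(candidate)) {
    candidate = `_${base}${counter}`;
    counter += 1;
  }
  usedNames.add(candidate);
  return candidate;
}

// Names assumed to be taken already in the file being processed.
const usedNames = new Set(['log', '_log']);
const firstAlias = uniqueName('log', usedNames);
const secondAlias = uniqueName('log', usedNames);
console.log(firstAlias, secondAlias);
```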

So we have the general idea, but how do we run the automatic import logic ahead of everything else? With that question in mind, let’s look at the structure of the AST. The outermost layer is the File node, which has a comments property containing all the comments in the current file. With this property we could parse out the functions that need to be inserted in the file and import them in advance. Looking further down, inside File is a Program node. Program is visited before any other type of node, so we implement the automatic import logic there.

Tip: Babel provides the path.traverse method, which can be used to synchronously visit the descendants of the current node.

As shown in figure:

The code is as follows:

const importModule = require('@babel/helper-module-imports');

// ...
{
    visitor: {
      Program(path, state) {
        // Put a deep copy of options on state; the original options must not be mutated
        state.options = JSON.parse(JSON.stringify(options))

        path.traverse({
          // First visit the existing import declarations to check whether log has already been imported
          ImportDeclaration (curPath) {
            const requirePath = curPath.get('source').node.value;
            // Traverse the options
            Object.keys(state.options).forEach(key => {
              const option = state.options[key]
              // Determine the same package
              if( option.require === requirePath ) {
                const specifiers = curPath.get('specifiers')
                specifiers.forEach(specifier => {

                  // Import in default form
                  if( option.kind === 'default' ) {
                    // Determine the import type
                    if( specifier.isImportDefaultSpecifier() ) {
                      // Found an existing default import. A default specifier has no
                      // `imported` field, so we only record its local name
                      // on identifierName for subsequent calls
                      option.identifierName = specifier.get('local').toString()
                    }
                  }

                  // Import in named form
                  if( option.kind === 'named' ) {
                    if( specifier.isImportSpecifier() ) {
                      // Found an existing named import of the same name
                      if( specifier.node.imported.name === key ) {
                        option.identifierName = specifier.get('local').toString()
                      }
                    }
                  }
                })
              }
            })
          }
        });


        // Process packages that have not been imported
        Object.keys(state.options).forEach(key => {
          const option = state.options[key]
          // require is set and no identifierName was found
          if( option.require && !option.identifierName ) {
            // default form
            if( option.kind === 'default' ) {
              // Add default import
              // Generate a random variable name, roughly _log2
              option.identifierName = importModule.addDefault(path, option.require, {
                nameHint: path.scope.generateUid(key)
              }).name;
            }

            // named form
            if( option.kind === 'named' ) {
              option.identifierName = importModule.addNamed(path, key, option.require, {
                nameHint: path.scope.generateUid(key)
              }).name
            }
          }

          // If you do not pass require, it will be considered a global method and will not be imported
          if( !option.require ) {
            option.identifierName = key
          }
        })
    }
  }
}

In the Program node, we first copy the received plug-in options and attach the copy to state. As mentioned before, state can be used to pass data between AST nodes. We then visit the ImportDeclaration nodes under Program to check whether log4js has already been imported; if it has, the local name is recorded in the identifierName field. After that, the identifierName field tells us whether each function has been imported. If not, the import is created with @babel/helper-module-imports, and a unique variable name is created with the generateUid method provided by Babel.
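The check for an existing import can be reduced to a lookup over import declarations. The findLocalName helper below is a hypothetical, Babel-free sketch that works on hand-written nodes; the field names it reads (source.value, specifiers, imported.name, local.name) follow the usual AST shape of import declarations, while the helper itself is not part of the plug-in.

```javascript
// Hypothetical helper on plain nodes: returns the local name under which
// `key` is already imported from `option.require`, or undefined if it is not.
function findLocalName(importDeclarations, key, option) {
  for (const declaration of importDeclarations) {
    if (declaration.source.value !== option.require) continue;
    for (const specifier of declaration.specifiers) {
      if (option.kind === 'default' && specifier.type === 'ImportDefaultSpecifier') {
        return specifier.local.name;
      }
      if (
        option.kind === 'named' &&
        specifier.type === 'ImportSpecifier' &&
        specifier.imported.name === key
      ) {
        return specifier.local.name;
      }
    }
  }
  return undefined;
}

// Simplified node for: import { log as logger } from 'log4js'
const importDeclarations = [
  {
    type: 'ImportDeclaration',
    source: { type: 'StringLiteral', value: 'log4js' },
    specifiers: [
      {
        type: 'ImportSpecifier',
        imported: { type: 'Identifier', name: 'log' },
        local: { type: 'Identifier', name: 'logger' },
      },
    ],
  },
];

const existing = findLocalName(importDeclarations, 'log', { kind: 'named', require: 'log4js' });
const missing = findLocalName(importDeclarations, 'log', { kind: 'named', require: 'other-package' });
console.log(existing, missing);
```

When the lookup returns a name, the inserted call should use it (logger in this case) rather than the name written in the comment.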

We also need to adjust the earlier code slightly: instead of the method name extracted from the @inject:xxx comment, it should use identifierName. The key part of the code is modified as follows:

if (sourceModuleList.includes(injectType)) {
  // Check if there is a function body
  if (pathBody.isBlockStatement()) {
    // Search for an @code:xxx comment inside the body.
    // Comments cannot be visited directly, so we check the leadingComments property of each AST node in the body
    const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value)))
    // The default insertion position is the first line
    if (codeIndex === -1) {
      // Use identifierName
      pathBody.node.body.unshift(api.template.statement(`${state.options[injectType].identifierName}()`)());
    } else {
      // Use identifierName
      pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${state.options[injectType].identifierName}()`)());
    }
  } else {
    // No function body
    // Use the @babel/template API exposed on api to generate AST from a code snippet

    // Use identifierName
    const ast = api.template.statement(`{ ${state.options[injectType].identifierName}(); return BODY; }`)({ BODY: pathBody.node });
    // Replace the original body
    pathBody.replaceWith(ast);
  }
}

The final effect is as follows:

We have implemented automatic function insertion and automatic import of the dependency package.

Closing

This article is my own summary, written after studying the booklet “How to complete the Babel plug-in”. Like most people, I started out wanting to write a Babel plug-in but with no idea where to begin, so this article mainly follows the ideas I explored while writing the plug-in. I hope it gives you an idea as well.

The full version now supports the insertion of custom snippets. The full code has been uploaded to Github and also published to NPM. Stars and issues are welcome.

Giving a star is a favor; not giving one is an accident, haha.