In front end development, Babel is often used to compile code for frameworks such as React and Vue to support more (older) browsers.

Babel is a JavaScript compiler! This is the official definition of Babel. As a front-end engineer, it was necessary to understand how compilation worked, and fortunately, The open source project “The Super Tiny Compiler” wrote a simple Compiler using JavaScript.

Although the sparrow is small, the five organs are complete, through the study of the project, deepen the understanding of the compilation process, to improve our ability to write a higher quality program!

A, earnings

By understanding the principles of compilation, you will have the following benefits:

  1. Deepen your understanding of the programming language, no matter what it is
  2. It helps to understand how translators like Babel work, tools like ESLint, Prettier, and less work, and to develop plug-ins
  3. More wheels can be built 🐶

2. Overview of compilation process

The specific implementation of the compilation process is mainly divided into three steps:

  1. Code parsing (parse)
  2. Code conversion (Transformation)
  3. Code generation (Code Generation)

With the above three steps, we can convert (compile) our source code into object code, for example, javascript code into Python.

The “The Super Tiny Compiler” project compiles LISP to C as follows:

 *                  LISP(source)             C(target)
 *
 *   2 + 2          (add 2 2)                 add(2.2)
 *   4 - 2          (subtract 4 2)            subtract(4.2)
 *   2 + (4 - 2)    (add 2 (subtract 4 2))    add(2, subtract(4.2))
Copy the code

Second, the conversion process

Based on “The Super Tiny Compiler” project, implement a LISP function into C language function writing form!

LISP CODE: (add2 (subtract 4 2C CODE: add()2, subtract(4.2))
Copy the code

2.1 Parsing (Parse)

The transition from representational to representational is difficult, but the transition from abstract to representational is often easy.

Lexical Analysis and Syntactic Analysis are two key steps in the process of parsing, and parsing is a process from concreteness to abstraction.

2.1.1 Lexical analysis

The process of lexical analysis is to generate a token list describing the semantics of the program by dividing the source code (string) into words.

The principle of word segmentation: read the characters in the source code one by one, and convert the preset key words, strings, numbers, operators and other syntax-related rules defined by LISP language into {type: ‘xx’, value: ‘xx’} with descriptive form

Subtract 4 2 for example, LISP: (add 2 (subtract 4 2)), a lexical analysis results in the following:

[{type: 'paren'.value: '('        },
    { type: 'name'.value: 'add'      },
    { type: 'number'.value: '2'        },
    { type: 'paren'.value: '('        },
    { type: 'name'.value: 'subtract' },
    { type: 'number'.value: '4'        },
    { type: 'number'.value: '2'        },
    { type: 'paren'.value: ') '        },
    { type: 'paren'.value: ') '},]Copy the code

Such a list (tentatively called the Tokens list) describes strings and programming semantics in the source code in order.

2.1.2 Grammar analysis

The list of tokens obtained after lexical analysis can describe the Syntax of LISP, but it is not Abstract. Intuitively, we cannot interpret the meaning of this program, so we need to convert it into an AST (Abstract Syntax Tree).

Why do you want to convert it to AST, AST can better describe the meaning of the source code, description, structure is more general, tokens list just describe the meaning of “symbol”, lexical analysis process can be regarded as the classification process, the syntax analysis process, is the symbol combination, make its have the execution order and enforce rules of grammar, the abstract syntax tree, Because it is more abstract, it can be converted to other forms more efficiently.

The AST obtained by LISP can be better converted to THE AST of C because their AST structures are similar. Manipulating AST is easier than Tokens.

Parse the list of tokens into a tree structure:

{
  type: 'Program'.body: [{type: 'CallExpression'.name: 'add'.params: [{type: 'NumberLiteral'.value: '2'}, {type: 'CallExpression'.name: 'subtract'.params: [{type: 'NumberLiteral'.value: '4'}, {type: 'NumberLiteral'.value: '2',}]}]}]}Copy the code

2.2 Code Transformation

The process of code conversion is to convert the incoming AST structure into the standard AST structure required by C language by adding, deleting, and modifying attributes on the AST.

To implement the conversion, we added a traverser(AST, visitor) function that takes the AST and the visitor rule conversion object from the parse process.

The visitor object can be understood as a conversion rule, and the traverser function performs the conversion according to the rules defined in the VISITOR as it traverses the AST structure, generating a new AST structure that conforms to the C language description standard.

Transform (AST) -> traverser(AST, visitor) to get a new AST structure:

{
  type: 'Program'.body: [{type: 'ExpressionStatement'.expression: {
        type: 'CallExpression'.callee: {
          type: 'Identifier'.name: 'add'
        },
        arguments: [{type: 'NumberLiteral'.value: '2'
          },
          {
            type: 'CallExpression'.callee: {
              type: 'Identifier'.name: 'subtract'
            },
            arguments: [{type: 'NumberLiteral'.value: '4'
            },
            {
              type: 'NumberLiteral'.value: '2'}}}}]}Copy the code

2.3 Code Generate

Finally, the AST that conforms to THE C language standard is parsed and converted into the C language format according to the syntax rules of C language

The codeGenerator(newAst) function is as follows to receive the newly generated AST structure:

function codeGenerator(newAst) {
  switch (newAst.type) {
    case "Program":
      return newAst.body.map(codeGenerator).join("\n");
    case "ExpressionStatement":
      return codeGenerator(newAst.expression) + ";";
    case "CallExpression":
      return (
        codeGenerator(newAst.callee) +
        "(" +
        newAst.arguments.map(codeGenerator).join(",") +
        ")"
      );
    case "Identifier":
      return newAst.name;
    case "NumberLiteral":
      return newAst.value;
    case "StringLiteral":
      return '"' + newAst.value + '"';
    default:
      throw new TypeError(newAst.type); }}Copy the code

At the same time, we also do some formatting of the code, such as adding Spaces and newlines.

Does the formatting plug-in for Prettier in the VScode editor do the same thing?

Four,

It is recommended that you read The open source project “The Super Tiny Compiler” in its entirety, and The code comments written by The author are very detailed. The process of compiling (translation) is basically similar, and there are many excellent projects, such as codeMirror, Babel, Esprima, Acorn, Recast are worth reading the source code, like small friends must check out.

Reference

  • Babel Docs – babeljs. IO/Docs /en/
  • The Super Tiny Complier – github.com/jamiebuilds…