primers

When I read The Babel document, I came into contact with The Super Tiny Compiler. The annotations in it seem to explain quite easily, so let me translate and record.

Why care about compilers

Most people don’t really have to think about compilers in their daily work, and it’s normal not to care about compilers. However, compilers are pretty common around you, and many of the tools you use are borrowed from the concept of compilers.

Compilers are scary

Compilers are scary. But it’s our own fault (those of us who write compilers) that we’ve taken away the simple and reasonable and made it so complex and scary that most people think it’s completely inaccessible and only nerds can understand it.

Where should I start?

Start by writing the simplest compiler you can. The compiler is very small, only 200 lines of code if you remove all the comments.

We are going to write a compiler that converts some LISP method calls into C method calls.

If you are not familiar with the language, I will introduce it briefly.

If we had two methods Add and subtract, they would be written like this:

example LISP C
2 + 2 (add 2 2) add(2, 2)
4-2 (subtract 4 2) subtract(4, 2)
2 plus 4 minus 2. (add 2 (subtract 4 2)) add(2, subtract(4, 2))

Easy, right? Good, that’s what we’re going to compile. While this is not complete LISP or C syntax, it is syntax enough to illustrate the main parts of most modern compilers.

Three stages of compilation

Most compilation can be divided into three main stages: Parsing, Transformation, and Code Generation.

  • Parsing: Take each line of code and turn it into more abstract code.
  • Transformation: Takes abstract code and does what the compiler intends.
  • Code generation: Takes transformed code and transforms it into new code.

Parsing

Lexical Analysis is usually divided into two stages: Lexical Analysis and Syntactic Analysis.

  • Lexical Analysis: Take the code and break it into individual tokens using a tokenizer. Tokens are an array of objects that describe each part of the syntax. They might be numbers, text, punctuation, operators, and so on.
  • Parsing: Taking those notations and translating them into another representation that describes its own grammar and relationships. This is called an intermediate representation or Abstract Syntax Tree or AST. An abstract syntax tree is a deeply nested object that represents a way for code to run, and it gives us a lot of information.

For example, the following syntax:

(add 2 (subtract 4 2))
Copy the code

The notation might look like this:

[{type: 'paren'.value: '('        },
  { type: 'name'.value: 'add'      },
  { type: 'number'.value: '2'        },
  { type: 'paren'.value: '('        },
  { type: 'name'.value: 'subtract' },
  { type: 'number'.value: '4'        },
  { type: 'number'.value: '2'        },
  { type: 'paren'.value: ') '        },
  { type: 'paren'.value: ') '},]Copy the code

An abstract syntax tree might look like this:

{
  type: 'Program'.body: [{
    type: 'CallExpression'.name: 'add'.params: [{
      type: 'NumberLiteral'.value: '2'}, {type: 'CallExpression'.name: 'subtract'.params: [{
        type: 'NumberLiteral'.value: '4'}, {type: 'NumberLiteral'.value: '2',}]}]}]}Copy the code

Transformation

The next stage of compilation is transformation. This step simply takes the AST from the previous step and changes it again. You can manipulate the AST in the same language, or you can convert the AST into a completely new language.

Let’s see how to convert an AST.

You may notice some similarities in our AST. There are several objects that have the Type attribute, and each such object is called an AST Node. These nodes define the attributes of each individual part of the tree.

We have a NumberLiteral node:

{
  type: 'NumberLiteral'.value: '2',}Copy the code

Or perhaps a CallExpression node:

{
  type: 'CallExpression',
  name: 'subtract',
  params: [...nested nodes go here...],
}
Copy the code

When transforming an AST, we can add/remove/replace the attributes of a node. We can add new nodes, remove nodes, or create a completely new AST based on an existing AST.

Because we are targeting a new language, we will create a completely new AST for the new language.

Traversal In order to be able to find all nodes we need to traverse them. This traversal reaches every node of the AST.

{
  type: 'Program'.body: [{
    type: 'CallExpression'.name: 'add'.params: [{
      type: 'NumberLiteral'.value: '2'
    }, {
      type: 'CallExpression'.name: 'subtract'.params: [{
        type: 'NumberLiteral'.value: '4'
      }, {
        type: 'NumberLiteral'.value: '2'}}}}]]]Copy the code

For the AST above, we’ll iterate like this:

  1. Program – Starts at the top level of the AST
  2. CallExpression (add) – Moves to the first element inside the body of the Program
  3. NumberLiteral (2) – Moves the first element of params to CallExpression(add)
  4. CallExpression(Subtract) – Move to the second element of the Params with CallExpression(add)
  5. NumberLiteral (4) – Move to CallExpression(subtract) the first element of params
  6. NumberLiteral (2) – Move to CallExpression(Subtract) the second element of params

If you were to work directly with the AST, various abstractions might be introduced here. But what we’re trying to do, it’s enough to access every node in the tree.

The basic idea here is to create a “visitor” object that has methods that can accept different types of nodes.

var visitor = {
  NumberLiteral() {},
  CallExpression(){}};Copy the code

As we traverse the AST, we simply “enter” to a matching type node, and we call the method of that visitor.

To make this idea work, we’ll pass in a reference to a node and its parent.

var visitor = {
  NumberLiteral(node, parent) {},
  CallExpression(node, parent){}};Copy the code

Then there is the possibility of “exit.” Imagine a tree structure like ours:

- Program
  - CallExpression
    - NumberLiteral
    - CallExpression
      - NumberLiteral
      - NumberLiteral
Copy the code

When we go through it, we end up at a dead end. So when we finish walking through each branch of the tree, we “exit.” So we walk down the tree, “enter” to the tree node, and when we return, we “exit.”

-> Program (enter)
  -> CallExpression (enter)
    -> Number Literal (enter)
    <- Number Literal (exit)
    -> Call Expression (enter)
      -> Number Literal (enter)
      <- Number Literal (exit)
      -> Number Literal (enter)
      <- Number Literal (exit)
    <- CallExpression (exit)
  <- CallExpression (exit)
<- Program (exit)
Copy the code

To support this functionality, our visitor will end up looking like this:

var visitor = {
  NumberLiteral: {
    enter(node, parent) {},
    exit(node, parent){},}};Copy the code

Code Generation

The final stage of compilation is code generation. Sometimes the compiler does something that overlaps with the conversion at this stage, but most of the code generation just means taking the AST and converting it to string code.

Some compilers reuse tokens. Some compilers create a single code representation so they can print nodes linearly. But from what I’ve seen, most of them use the AST that we just created, and that’s what we’re going to be looking at.

Our code generator will effectively know how to “print” all the different node types of the AST, and it will recursively call itself to print nested nodes until everything is printed as a long string of code.

In this way! These are all the different parts of the compiler. Not every compiler looks like what I’ve described here. Compilers are used for different purposes, and they may require more steps than I have described. But now you should have a better idea of what most compilers look like.

Now, I’ve explained so much that you should be able to write your own compiler pretty well, right? Just kidding. That’s what I’m here to help. So let’s get started!

Example compiler

See the compiler – js.

The resources

  • the-super-tiny-compiler
  • Let’s Build A Simple Interpreter
wastebasket

I haven’t read the manga for a long time. I have accumulated a lot of words.

After watching it, I found that it developed faster than expected, and should be enough to make the second season of TV version of the content.