preface

In this article, we will build our own JS interpreter with JS and write JS in JS. This may sound strange, but in doing so we will become more familiar with JS and learn how the JS engine works!

What is Interpreter?

An interpreter is a language evaluator that runs at run time and executes the source code of a program dynamically.


The interpreter parses the source code, generates the AST(Abstract Syntax Tree) from the source code, walks through the AST and evaluates them one by one.

How Interpreter works

  • Vocabulary analysis (Tokenization)
  • Parsing Syntax
  • Evaluation (Evaluating)

Vocabulary analysis (Tokenization)

The process of breaking up source code and organizing it into a meaningful set of words is called lexical analysis (TOKEN).

In English, when we encounter a statement like this:

Javascript is the best language in the world

We subconsciously break sentences into individual words:

+----------------------------------------------------------+
| Javascript | is | the | best | language | in |the |world |
+----------------------------------------------------------+

This is the first stage of analyzing and understanding the sentence.

Lexical analysis is done by lexers that scan code to extract lexical units.

var a = 1;

[
  ("var": "keyword"),
  ("a": "identifier"),
  ("=": "assignment"),
  ("1": "literal"),
  (";": "separator"),
];

After the lexer splits the code into tokens, it passes the tokens to the parser for parsing. Let’s see how the parsing phase works.

Parsing Syntax

The process of converting the tokens generated during the lexical analysis phase into an Abstract Syntax Tree is called Parsing.

In English, JavaScript is the best language broken down into the following words:

+------------------------------------------+
| Javascript | is | the | best | language  |
+------------------------------------------+

This allows us to pick out words and form grammatical structures:

"Javascript": Subject
"is the best language": Predicate
"language": Object

JavaScript is a subject noun in the grammar, the rest is a meaningless sentence called the predicate, and language is the receiver of the action, namely the object. The structure is as follows:

Subject(Noun) -> Predicate -> Object

Syntax parsing is done by the syntax parser, which converts the tokens generated in the previous step into an Abstract Syntax Tree (AST) according to the syntax rules.

{
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      declarations: [
        {
          type: "VariableDeclarator",
          id: {
            type: "Identifier",
            name: "sum"
          },
          init: {
            type: "Literal",
            value: 30,
            raw: "30"
          }
        }
      ],
      kind: "var"
    }
  ],
}

Evaluating Stage

The interpreter will traverse the AST and evaluate each node. – Evaluation phase

1 + 2
|
    |
    v
+---+  +---+
| 1 |  | 2 |
+---+  +---+
  \     /
   \   /
    \ /
   +---+
   | + |
   +---+
{
    lhs: 1,
    op: '+'.
    rhs: 2
}

The interpreter parses the AST to get the LHS node, and then collects the operator node +. The + operator indicates that an addition is required, and it must have a second node to add. Then he collected the RHS nodes. It gathered some valuable information and it did the addition and it got the answer,3.

{
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "BinaryExpression",
        left: {
          type: "Literal",
          value: 1,
          raw: "1"
        },
        operator: "+",
        right: {
          type: "Literal",
          value: 2,
          raw: "2"
        }
      }
    }
  ],
}

practice

Now that we’ve looked at how the Interpreter works, let’s do some relaxing work and implement a Mini JS Interpreter

Practice to prepare

  • Acorn.js

A tiny, fast JavaScript parser, written completely in JavaScript. A full JavaScript implementation, small and fast JavaScript parser

In this exercise we will use acorn.js, which will help us with lexing, parsing and converting to an abstract syntax tree.

Third-party libraries like Webpack/Rollup/Babel(@babel/parser) also use acorn.js as the base library for their own parser. (On the shoulders of giants!)

  • The Estree Spec

The Mozilla JS Parser API was originally a specification document for the output JavaScript AST from the SpiderMonkey engine created by Mozilla engineers in Firefox. The format described in the document was used to manipulate JavaScript The source code of the common language.

As JavaScript has evolved, more new syntax has been added to help develop the format to keep up with the JavaScript language. The Etree Spec was born as a community standard for those involved in building and using these tools.

Acorn.js parse returns AST objects whose values correspond to the ESTree spec, where we use @types/ ESTree for the type definition.

  • Jest

The so-called enjoyable JavaScript testing… We use it for unit testing.

  • Rollup

Rollup is a JavaScript module wrapper that we use to package to expose modules to the UMD specification.

Project initialization

// Visit.ts creates a Visitor class and provides a method to manipulate the ES node. import * as ESTree from "estree"; class Visitor { visitNode(node: ESTree.Node) { // ... } } export default Visitor;
// Interpreter.ts creates a Interpreter class to run the ES node tree. // Create a Visitor instance and use it to run the ESTree node import Visitor from "./ Visitor "; import * as ESTree from "estree"; class Interpreter { private visitor: Visitor; constructor(visitor: Visitor) { this.visitor = visitor; } interpret(node: ESTree.Node) { this.visitor.visitNode(node); } } export default Interpreter;
// vm. Ts exposes the Run method to the public and gives it to the Interpreter instance after using Acorn code->ast. const acorn = require("acorn"); import Visitor from "./visitor"; import Interpreter from "./interpreter"; const jsInterpreter = new Interpreter(new Visitor()); export function run(code: string) { const root = acorn.parse(code, { ecmaVersion: 8, sourceType: "script", }); return jsInterpreter.interpret(root); }

Practice the first bullet: 1+1=?

We’re going to do one plus one in this video. Let’s start by looking at the 1+1 code conversion AST structure through the AST Explorer.

We can see that there are four node types in this code. Let’s briefly describe them:

Program

The root node represents the entire abstract syntax tree, and the body attribute is an array containing multiple Statement nodes.

interface Program { type: "Program"; sourceType: "script" | "module"; body: Array<Directive | Statement | ModuleDeclaration>; comments? : Array<Comment>; }

ExpressionStatement

Expression statement node, the expression attribute points to an expression node object

interface ExpressionStatement {
  type: "ExpressionStatement";
  expression: Expression;
}

BinaryExpression

Binary operator expression nodes, where left and right represent two expressions to the left and right of the operator, and operator represents one binary operator. For the main purpose of this section, we simply take the type of operator and implement it, then evaluate the left and right values.

interface BinaryExpression {
  type: "BinaryExpression";
  operator: BinaryOperator;
  left: Expression;
  right: Expression;
}

Literal

Literals, not [] or {}, but literals that represent a value semantically, such as 1, “hello”, true, and regular expressions, such as /\d? /.

type Literal = SimpleLiteral | RegExpLiteral; interface SimpleLiteral { type: "Literal"; value: string | boolean | number | null; raw? : string; } interface RegExpLiteral { type: "Literal"; value? : RegExp | null; regex: { pattern: string; flags: string; }; raw? : string; }

Talk less nonsense, open lu!!

// standard/es5.ts import Scope from ".. /scope"; import * as ESTree from "estree"; import { AstPath } from ".. /types/index"; Const es5 = {// The root node is simply iterated through its body property and then accessed. Program(node: ESTree.Program) { node.body.forEach((bodyNode) => this.visitNode(bodyNode)); }, // The expression attribute can be accessed as well. ExpressionStatement(node: ESTree.ExpressionStatement>) { return this.visitNode(node.expression); }, // The literal node is evaluated directly. This is special for regular expression types. Other types simply return value. Literal(node: ESTree.Literal>) { if ((<ESTree.RegExpLiteral>node).regex) { const { pattern, flags } = (<ESTree.RegExpLiteral>node).regex; return new RegExp(pattern, flags); } else return node.value; }, // The Literal operator is used to evaluate two Literal nodes of the left/node, and to return the result. BinaryExpression(node: ESTree.BinaryExpression>) { const leftNode = this.visitNode(node.left); const operator = node.operator; const rightNode = this.visitNode(node.right); return { "+": (l, r) => l + r, "-": (l, r) => l - r, "*": (l, r) => l * r, "/": (l, r) => l / r, "%": (l, r) => l % r, "<": (l, r) => l < r, ">": (l, r) => l > r, "<=": (l, r) => l <= r, ">=": (l, r) => l >= r, "==": (l, r) => l == r, "===": (l, r) => l === r, "! =": (l, r) => l ! = r, "! ==": (l, r) => l ! == r, }[operator](leftNode, rightNode); }}; export default es5;
// visitor.ts import Scope from "./scope"; import * as ESTree from "estree"; import es5 from "./standard/es5"; const VISITOR = { ... es5, }; VisitNode (node: estree.node) {return {visitNode: this.visitNode, visitNode, visitNode... VISITOR, }[node.type](node); } } export default Visitor;

In this way, the ordinary binary operation is done!!

Practice # 2: How do you find variables?

We are all familiar with the concept of scope and scope chains in JavaScript, so I won’t go into detail here

Yes, we need to implement scope to access variables, implement scope chains to search for identifiers.

Before we do that, we implement the Variable class, which implements the access methods to variables.

// variable.ts export enum Kind { var = "var", let = "let", const = "const", } export type KindType = "var" | "let" | "const"; export class Variable { private _value: any; constructor(public kind: Kind, val: any) { this._value = val; } get value() { return this._value; } set value(val: any) { this._value = val; }}
import { Variable, Kind, KindType } from "./variable"; The class Scope {/ / parent Scope private parent: Scope | null; Private targetScope: {[key: string]: any}; constructor(public readonly type, parent? : Scope) { this.parent = parent || null; this.targetScope = new Map(); } private HasDefinition (rawName: String): Boolean {return Boolean(this.search(rawName));} private HasDefinition (rawName: String): Boolean {return Boolean(this.search(rawName)); } public defineVar(rawName: string, value: any) {let scope: scope = this; // If it is not a global scope, it will serve to save the variable. If it is not a function scope, it will serve to save the variable from the global scope. == "function") { scope = scope.parent; } // Stores Scope.targetScope.set (rawName, new Variable(kind.var, value)); } public defineLet(rawName: string, value: any) { this.targetScope.set(rawName, new Variable(Kind.let, value)); } // const public defineConst(rawName: string, value: any) { this.targetScope.set(rawName, new Variable(Kind.const, value)); } public search(rawName: string) public search(rawName: string) Variable | null { if (this.targetScope.get(rawName)) { return this.targetScope.get(rawName); } else if (this.parent) { return this.parent.search(rawName); } else { return null; }} / / variable declaration method, variable defined public will throw syntax error abnormal declare (kind: kind | KindType, rawName: string, the value: any) { if (this.hasDefinition(rawName)) { console.error( `Uncaught SyntaxError: Identifier '${rawName}' has already been declared` ); return true; } return { [Kind.var]: () => this.defineVar(rawName, value), [Kind.let]: () => this.defineLet(rawName, value), [Kind.const]: () => this.defineConst(rawName, value), }[kind](); } } export default Scope;

This is the basic implementation of variable objects, scopes, and scope chains, and then we can define and access variables.

Var age = 18

From the syntax tree we can see three unfamiliar node types and see what they mean:

VariableDeclaration

Variable declarations, the kind attribute indicates what type of declaration is, because ES6 introduces const/let. Declarations: let a = 1, b = 2; let a = 1, b = 2; .

interface VariableDeclaration {
  type: "VariableDeclaration";
  declarations: Array<VariableDeclarator>;
  kind: "var" | "let" | "const";
}

VariableDeclarator

A description of a variable declaration, where id represents the variable name node and init represents an expression of the initial value, which can be null.

interface VariableDeclarator { type: "VariableDeclarator"; id: Pattern; init? : Expression | null; }

Identifier

As the name implies, the identifier node, the variable name, the function name, the property name that we define when we write JS, are all classified as identifiers.

interface Identifier {
  type: "Identifier";
  name: string;
}

After understanding the meaning of the corresponding node, let’s implement:

// standard/es5.ts import Scope from ".. /scope"; import * as ESTree from "estree"; type AstPath<T> = { node: T; scope: Scope; }; const es5 = { // ... // The scope scope parameter variableDeclaration (astPath: AstPath<ESTree.VariableDeclaration>) { const { node, scope } = astPath; const { declarations, kind } = node; Let a = 1, b = 2; // Let a = 1, b = 2; , so we traverse it here: / / here traversal of each item is VariableDeclarator node declarations. The forEach ((declar) = > {const {id, init } = <ESTree.VariableDeclarator>declar; // Age const key = (< estree.identifier >id).name; // Age const key = (< estree.identifier >id).name; // Check if the variable is initialized. Find the init node value (Literal return value :18) : set as undefined; const value = init ? this.visitNode(init, scope) : undefined; Var age = 18 scope.declare(kind, key, value); var age = 18 scope.declare(kind, key, value); var age = 18 scope.declare(kind, key, value); }); }, // Identifies the node point. We simply access the value by accessing the scope. Identifier(astPath: AstPath<ESTree.Identifier>) { const { node, scope } = astPath; const name = node.name; // Walk Identifier // This example is the age variable const variable = scope.search(name); // return age = 18 if (variable) return variable.value; }}; export default es5;

Exports = 6. Module. Exports = 6

Let’s first look at module.exports = 6 and the corresponding AST.

From the syntax tree, let’s look at two unfamiliar node types and see what they mean:

AssignmentExpression

The assignment expression node. The operator attribute represents an assignment operator. Left and right are the expressions to the left and right of the assignment operator.

interface AssignmentExpression {
  type: "AssignmentExpression";
  operator: AssignmentOperator;
  left: Pattern | MemberExpression;
  right: Expression;
}

MemberExpression

A member expression node is a statement that refers to a member of an object,object is an expression node that refers to an object,property is a property name, and computed is false. The property should be an Identifier node, or [] if computed is true. That is, the property is an Expression node whose name is the result value of the Expression.

interface MemberExpression {
  type: "MemberExpression";
  object: Expression | Super;
  property: Expression;
  computed: boolean;
  optional: boolean;
}

Let’s start by defining the module.exports variable.

import Scope from "./scope"; import Visitor from "./visitor"; import * as ESTree from "estree"; class Interpreter { private scope: Scope; private visitor: Visitor; constructor(visitor: Visitor) { this.visitor = visitor; } interpret(node: ESTree.Node) { this.createScope(); this.visitor.visitNode(node, this.scope); return this.exportResult(); } createScope() {this.scope = new Scope("root"); this.scope = new Scope("root"); // exports const $exports = {}; const $module = { exports: $exports }; this.scope.defineConst("module", $module); this.scope.defineVar("exports", $exports); } // ExportResult () {// Find the module const ModuleExport = this.scope.search("module"); // exports module.exports return moduleExport? moduleExport.value.exports : null; } } export default Interpreter;

OK, now let’s implement the above node function ~

// standard/es5.ts import Scope from ".. /scope"; import * as ESTree from "estree"; type AstPath<T> = { node: T; scope: Scope; }; const es5 = { // ... MemberExpression(astPath, scope, scope, scope, scope, scope, scope, scope) AstPath<ESTree.MemberExpression>) { const { node, scope } = astPath; const { object, property, computed } = node; // If computed is false, the property should be an Identifier node. If computed is true, the property is an Expression node So what we've got here is exports, which is const prop = computed? this.visitNode(property, scope) : (<ESTree.Identifier>property).name; Const obj = this.visitNode(object, scope); // access module.exports return obj[prop]; }, // Assign Expression Node (astPath: astPath < estree.>) {const {node, scope} = astPath; const { left, operator, right } = node; let assignVar; // Lhs if (left.type === "Identifier") {// Lhs if (left.type === "Identifier") {const value = scope.search(left.name); assignVar = value; } else if (left. Type === "memberExpression ") {const {object, property;} else if (left. Type ===" memberExpression ") {const {object, property; computed } = left; const obj = this.visitNode(object, scope); const key = computed ? this.visitNode(property, scope) : (<ESTree.Identifier>property).name; assignVar = { get value() { return obj[key]; }, set value(v) { obj[key] = v; }}; } // RHS // The left node is assigned to the right node. The left node is assigned to the left node. return { "=": (v) => { assignVar.value = v; return v; }, "+=": (v) => { const value = assignVar.value; assignVar.value = v + value; return assignVar.value; }, "-=": (v) => { const value = assignVar.value; assignVar.value = value - v; return assignVar.value; }, "*=": (v) => { const value = assignVar.value; assignVar.value = v * value; return assignVar.value; }, "/=": (v) => { const value = assignVar.value; assignVar.value = value / v; return assignVar.value; }, "%=": (v) => { const value = assignVar.value; assignVar.value = value % v; return assignVar.value; }, }[operator](this.visitNode(right, scope)); }}; export default es5;

OK, implementation done, it’s time to verify a wave, go to the JEST method.

// __test__/es5.test.ts import { run } from ".. /src/vm"; describe("giao-js es5", () => { test("assign", () => { expect( run(` module.exports = 6; `) ).toBe(6); }); }

Practice number five: for loop

var result = 0;
for (var i = 0; i < 5; i++) {
  result += 2;
}
module.exports = result;

In fact, different syntax corresponds to different tree nodes. We just need to implement the corresponding node functions. Let’s take a look at the meaning of these strange nodes.

ForStatement

The init/test/update attributes represent the three expressions in parentheses of the for statement, the initialization value, the condition of the loop, and the variable update statement (init can be either a variable declaration or an expression) that is executed each time through the loop. All three properties can be null, i.e., for(;;) {}. The body attribute represents the statement to loop through.

interface ForStatement { type: "ForStatement"; init? : VariableDeclaration | Expression | null; test? : Expression | null; update? : Expression | null; body: Statement; }

UpdateExpression

The UPDATE expression node, ++/–, is similar to the unary operator, except that the operator refers to a different type of node object. In this case, the UPDATE operator.

interface UpdateExpression {
  type: "UpdateExpression";
  operator: UpdateOperator;
  argument: Expression;
  prefix: boolean;
}

BlockStatement

Block statement nodes, for example: if (…) {// This is the contents of a block statement}. A block can contain several other statements, so the body property is an array that represents the number of statements in the block.

interface BlockStatement { 0; type: "BlockStatement"; body: Array<Statement>; innerComments? : Array<Comment>; }

Talk less nonsense, dish it!!

// standard/es5.ts import Scope from ".. /scope"; import * as ESTree from "estree"; type AstPath<T> = { node: T; scope: Scope; }; const es5 = { // ... // for loop node forStatement (astPath: astPath < estree.forStatement >) {const {node, scope} = astPath; const { init, test, update, body } = node; // In the previous Scope class implementation, the var declaration will be promoted in the block Scope,const/let will not be const forScope = new Scope("block", Scope); For (// Initialize value // VariableDeclaration init? this.visitNode(init, forScope) : null; // BinaryExpression is an expression that has been implemented before. this.visitNode(test, forScope) : true; // UpdateExpression update? this.visitNode(update, forScope) : null ) { // BlockStatement this.visitNode(body, forScope); }}, // The update expression node, ++/--, is similar to the unary operator except that the operator refers to a different type of node object. In this case, the update operator. UpdateExpression(astPath: AstPath<ESTree.UpdateExpression>) { const { node, scope } = astPath; // The update operator, with a value of ++ or --, is represented by the prefix property of the update expression node. const { prefix, argument, operator } = node; let updateVar; // For (var query={count:0}; query.count < 8; If (argument.type === "Identifier") {const value = scope.search(argument.name); if (argument.type === "Identifier") { updateVar = value; } else if (argument.type === "memberExpression ") {const {object, property;} else if (argument.type ===" memberExpression ") {const {object, property; computed } = argument; const obj = this.visitNode(object, scope); const key = computed ? this.visitNode(property, scope) : (<ESTree.Identifier>property).name; updateVar = { get value() { return obj[key]; }, set value(v) { obj[key] = v; }}; } return { "++": (v) => { const result = v.value; v.value = result + 1; // preifx? ++i: i++; return prefix ? v.value : result; }, "--": (v) => { const result = v.value; v.value = result - 1; // preifx? --i: i--; return prefix ? v.value : result; }, }[operator](updateVar); }, // Block statement node // Block statement implementation is simple, simulate the creation of a block scope and then traverse the body property for access. BlockStatement(astPath: AstPath<ESTree.BlockStatement>) { const { node, scope } = astPath; const blockScope = new Scope("block", scope); const { body } = node; body.forEach((bodyNode) => { this.visitNode(bodyNode, blockScope); }); }}; export default es5;

On the JEST method to verify a ha ~

test("test for loop", () => {
  expect(
    run(`
      var result = 0;
      for (var i = 0; i < 5; i++) {
        result += 2;
      }
      module.exports = result;
    `)
  ).toBe(10);
});

Did you think that was the end of it? Can you think of anything left undone? What about the break statement of the for loop?

var result = 0;
for (var i = 0; i < 5; i++) {
  result += 2;
  break; // break,continue,return
}
module.exports = result;

Interested partners can try their own hands, or stamp the source address

conclusion

Giao-js currently implements only a few grammars, so this article is just an idea.

Those of you who are interested can look at the full code.

Feel helpful to you, click star to support the author ❤️ ~

reference

bramblex/jsjs

Use Acorn to parse JavaScript

Build a JS Interpreter in JavaScript Using Acorn as a Parser