How to compile: Write a JS interpreter from 0

Some time ago, I wrote a JS interpreter without relying on any third party libraries.

So far, most of the basic functions of JS have been written out except for objects. Here we share and review the main implementation and technical details.

Github.com/zuluoaaa/ma…

Parse a quick sorting function

0 initialization

We enter a meaningful JS string

1. Lexical analysis

Iterate through the input loop, interpreting string by string into a meaningful data structure (represented here by token)

var a = 1
Copy the code

The above line of code is parsed into the following four tokens

[{token:"var"}, {token:"indent".val:"a"}, {token:"assign"}, {token:"number".val:1}]
Copy the code

For the above input, converting the string into a token array is easy, and we can export the value by reading the input string values one by one and skipping the Spaces.

However, this is just the beginning, you may need to deal with some different input, you have to read the value of a section before you can determine what token value is

For example, distinguish between == and =, > and >=…

The solution is also very simple, directly paste implementation code

            case "<":
             next = nextChar();
            if(next === "="){
                token.type = tokenTypes.T_LE;// Determine <=
            }else {
                token.type = tokenTypes.T_LT;// determine < <
                putBack(next);
            }
Copy the code

PS: In this step, we do not care about whether the syntax and semantics are correct. We are only responsible for parsing into tokens. If it’s a number, we parse it into a numeric token. If it’s a letter, we parse it into a variable token. It is then up to subsequent programs to deal with these issues.

2. Grammar Analysis

Converting tokens into an AST (syntax tree) is the process of converting a set of tokens into the syntax structure of the entire link

This step is the focus of the entire interpreter.

Let me give you an example

var a = 1 + 3 * 2;
Copy the code

Corresponding syntax tree

       =
      /  \
     +    a
    / \
    *  1
  /  \
 3    2
Copy the code

The reason for putting variable A on the right, rather than the left, is that we will later perform the AST through pre-order traversal parsing to make it easier to evaluate.

The above is just a simple example. The most complicated part of this step is dealing with the various keywords if,else,function,while…

Processing complex expressions is also costly, with operators such as &&, >,<=,+,-,*,/,? :…

For example

let a = 8+6*32 -*5 > 12*3+ (1+5)?1 : 2;
Copy the code

Parsing the above expression into the corresponding AST, those of you (like me before you) who are not familiar with interpreters and compilers often get stuck in this step, overwhelmed and Mired in self-doubt…

Due to the details of this process is too much, here will only tell you the core architecture, the specific implementation, you can be interested in my project code to look inside

The core implementation of the main method responsible for parsing

function statement(){
    let tree = null,left = null;
    while (true) {let {token}  = gData;
        // Jump to the corresponding parsing function for different keywords
        switch (token.type) {
            case tokenTypes.T_VAR:
                left = varDeclaration();
                break;
            case tokenTypes.T_IF:
                left = ifStatement();
                break;
            case tokenTypes.T_WHILE:
                left = whileStatement();
                break;
            case tokenTypes.T_FUN:
                left = funStatement();
                break;
            case tokenTypes.T_RETURN:
                left = returnStatement();
                break;
            case tokenTypes.T_EOF://EOF means that the entire input string has been executed
                return tree;
            default:
                 left = normalStatement();
        }
        // The loop basically parses one line at a time, where multiple lines are grouped together and the entire input string is assembled into a syntax tree
        if(left ! = =null) {if(tree === null){
                tree = left;
            }else{
                tree = new ASTNode().initTwoNode(ASTNodeTypes.T_GLUE,tree,left,null); }}}}function normalStatement() {
    let tree =  parseExpression(0);// Perform expression parsing to get the syntax tree
    semicolon();// check the comma
    returntree; }...Copy the code

Above is the parsing of the syntax, below is the most core expression parsing (e.g., parsing the arithmetic expression 1+3*(6+1))

First, define a set of prefix resolution map and a set of infix resolution map, automatically according to the type of the corresponding resolution method, so that we have any new symbols to resolve, we can directly add to the inside of the function without changing the implementation


const prefixParserMap = {
    [tokenTypes.T_IDENT]:identifier,/ / variable
    [tokenTypes.T_INT]:int,
    [tokenTypes.T_STRING]:str,
    [tokenTypes.T_LPT]:group,/ / brackets
    [tokenTypes.T_LMBR]:array,/ / brackets
    [tokenTypes.T_ADD]:prefix.bind(null,tokenTypes.T_ADD),
    [tokenTypes.T_SUB]:prefix.bind(null,tokenTypes.T_SUB),
};

const infixParserMap = {
    [tokenTypes.T_LPT]:{parser:funCall,precedence:precedenceList.call},
    [tokenTypes.T_QST]:{parser:condition,precedence:precedenceList.condition},// A ternary expression

    [tokenTypes.T_ASSIGN]:{parser:assign,precedence:precedenceList.assign},/ / = assignment operator

    [tokenTypes.T_AND]:{parser:infix.bind(null,precedenceList.and),precedence:precedenceList.and},
    [tokenTypes.T_OR]:{parser:infix.bind(null,precedenceList.and),precedence:precedenceList.and},
    [tokenTypes.T_ADD]:{parser:infix.bind(null,precedenceList.sum),precedence:precedenceList.sum},
    [tokenTypes.T_SUB]:{parser:infix.bind(null,precedenceList.sum),precedence:precedenceList.sum},

    [tokenTypes.T_GT]:{parser:infix.bind(null,precedenceList.compare),precedence:precedenceList.compare},
    [tokenTypes.T_GE]:{parser:infix.bind(null,precedenceList.compare),precedence:precedenceList.compare},
    ...
};

Copy the code

Expression analysis core implementation, using Pratt analysis (is also a recursive descent analysis method

function parseExpression(precedenceValue) {
    let {token} = gData;

    // Get the prefix resolution function corresponding to the current token
    let prefixParser = prefixParserMap[token.type];

    if(! prefixParser){ errPrint(`unknown token : ${token.value}(${token.type}) `)}let left = prefixParser();// Execute the parse function
    scan();
    if(token.type === tokenTypes.T_SEMI
        || token.type === tokenTypes.T_RPT
        || token.type === tokenTypes.T_EOF
        || token.type === tokenTypes.T_COMMA
        || token.type === tokenTypes.T_COL
        || token.type === tokenTypes.T_RMBR
    ){
        return left;
    }
    let value = getPrecedence();// Gets the priority of the current operator
    while (value>precedenceValue){
// If the current operator has a higher priority than the previous one, continue parsing
// For example, 1+6*7, * is obviously higher than +, so we parse 6*7 first and then go back to the first
        let type = token.type;
        if(token.type === tokenTypes.T_SEMI
            || token.type === tokenTypes.T_RPT
            || token.type === tokenTypes.T_EOF
            || token.type === tokenTypes.T_COMMA
            || token.type === tokenTypes.T_RMBR
        ){
            return left;
        }
        let infix = infixParserMap[type]; 
        scan();
        left = infix.parser(left,type);

        if(infixParserMap[token.type]){ value = getPrecedence(); }}return left;
}
Copy the code

About pratt analytical method, especially recommend the bosses wrote about introducing journal.stuffwithstuff.com/2011/03/19/ pratt…

3 Run the AST

A simple interpreter takes the AST syntax tree from the previous step and executes it node by node to evaluate it.

function interpretAST(astNode,result=null,scope){... let leftResult,rightResult;if(astNode.left){
        leftResult = interpretAST(astNode.left,null,scope);
    }
    if(astNode.right){ rightResult = interpretAST(astNode.right,leftResult,scope); }... switch (astNode.op) {case ASTNodeTypes.T_VAR:
            scope.add(astNode.value);
            return;
        case ASTNodeTypes.T_INT:
            return astNode.value;
        case ASTNodeTypes.T_STRING:
            return astNode.value;
        case ASTNodeTypes.T_ADD:
            if(rightResult === null || typeof rightResult === "undefined") {return leftResult;
            }
            return leftResult + rightResult;
        case ASTNodeTypes.T_SUB:
            if(rightResult === null || typeof rightResult === "undefined") {return -leftResult;
            }
            return leftResult - rightResult;
        case ASTNodeTypes.T_MUL:
            return leftResult * rightResult;
        case ASTNodeTypes.T_DIV:
            return leftResult / rightResult;
        case ASTNodeTypes.T_ASSIGN:
            return rightResult;
        case ASTNodeTypes.T_IDENT:
            return findVar(astNode.value,scope);
        case ASTNodeTypes.T_GE:
            return  leftResult >= rightResult;
        case ASTNodeTypes.T_GT:
            returnleftResult > rightResult; .Copy the code

The last

End, interested students can go to my github repository to view the full implementation github.com/zuluoaaa/ma…

The writing may not be very good, please point out any mistakes.

How to compile: Write a JS interpreter from 0

0 initialization

1. Lexical analysis

2. Grammar Analysis

3 Run the AST

The last

Related Posts

Vue source code parsing (computed, Watch)

The difference between Web testing and App testing

Simple and easy to understand red black tree principle and implementation (JS)