Brief introduction to V8 script engine working principle (three. V8 core modules: parser, interpreter, optimized compiler)

We know the history of V8, and the core of V8 today is: Parser, Interpreter, Ignition, TurboFan.

1. The origin of V8 scripting engine. 2. Evolution of the V8 scripting engine. 3. V8 core module: parser, interpreter, optimized compiler.

How does V8 execute JavaScript code?

When V8 executes JavaScript source code, the parser first parses the source code into an Abstract Syntax Tree, and the interpreter translates the AST into bytecode, which is interpreted and executed again and again.

During this process, the interpreter keeps track of how many times a particular snippet of code has been run, and if the code has been run more than a certain threshold, the code is marked as hot, and the run information is fed back to the optimized compiler.

The optimization compiler optimizes and compiles the bytecode based on the feedback, and finally generates the optimized machine code. When the code is executed again, it does not need to be interpreted again, which improves efficiency.

This technique of compiling code at run time is called JIT(just-in-time compilation), which can greatly improve the performance of JavaScript code.

Briefly describe what each one does

The parserJavaScript source code is parsed into AST, and the parsing process is divided into lexical analysis and grammar analysis. V8 improves parsing efficiency through pre-parsing;
The interpreter IgnitionBytecode generation and execution based on the AST. This process collects execution feedback and feeds it to TurboFan for optimization.
TurboFanBased on the feedback collected by Ignition, bytecode is compiled to optimized machine code, which Ignition then replaces bytecode execution with optimized machine code to improve performance.

How does a Parser convert source code to an AST?

To get V8 to execute the source code we’ve written, we need to put it in a format V8 can understand. V8 begins by Parsing the source code into an abstract syntax tree (AST), which is an object that represents the tree structure of the source code, a process known as Parsing.

The performance of the parsing and compilation process is so important that V8 cannot run the code until the compilation is complete. The analysis process is shown as follows:

Divided into two parts:

Lexical analysis: willCharacters of the flowConvert to tokens,Characters of the flowThe token is the smallest unit that is syntactically undivisible (either a single character or a string). The Scanner in the figure is V8Lexical analyzer.
Parsing: According to the syntax rules, combine tokens into an AST with a nested hierarchy. In this process, if the source code does not conform to the syntax, parsing is terminated and a syntax error is thrown. Both Parser and pre-Parse are V8 parsers.

Lexical analysis

In V8, the Scanner is responsible for receiving a stream of Unicode characters and parsing them as tokens, which are made available to the parser. Var a = 1; Tokens look like this after lexical analysis:

[{"type": "Keyword"."value": "var"
    },
    {
        "type": "Identifier"."value": "a"
    },
    {
        "type": "Punctuator"."value": "="
    },
    {
        "type": "Numeric"."value": "1"
    },
    {
        "type": "Punctuator"."value": ";"}]Copy the code

Contains five tokens:

The var keyword
Identifier for a
The assignment operator =
Value of 1
The separator;

Syntax analysis

Var a = 1; var A = 1; The JSON structure of the AST generated by this line of code looks like this:

{
  "type": "Program"."start": 0."end": 10."body": [{"type": "VariableDeclaration"."start": 0."end": 10."declarations": [{"type": "VariableDeclarator"."start": 4."end": 9."id": {
            "type": "Identifier"."start": 4."end": 5."name": "a"
          },
          "init": {
            "type": "Literal"."start": 8."end": 9."value": 1."raw": "1"}}]."kind": "var"}]."sourceType": "module"
}
Copy the code

Those interested in generating AST content can check out astexplorer.net/.

However, for a piece of JavaScript source code, there is a problem if all the source code has to be parsed before it can be executed.

Longer code execution time: Parsing all code at once inevitably increases code execution time.
More memory consumption: The AST parsed and the bytecode compiled from the AST are stored in memory, which inevitably takes up more memory.
Disk space: Compiled code is cached on disk and occupies disk space.

As a result, mainstream JavaScript engines now implement Lazy Parsing.

Delay resolution

The idea of delayed parsing: During parsing, only pre-parsing (Pre Parser) is performed for functions that are not immediately executed, and full parsing is performed only when the function is called.

During pre-parsing, only the function syntax is validated, the function declaration is parsed, and the function scope is determined. Instead of generating an AST, the pre-parser implements pre-parsing.

function foo(a, b) {
    var res = a + b;
    return res;
}
var a = 1;
var c = 2;
foo(1.2);
Copy the code

Because the Scanner reads code line by line in byte stream, the V8 parser also parses code from top to bottom. When the V8 Parser encounters a function declaration foo, it finds that it is not executed immediately, so it is pre-parsed with the pre-Parser. Only the function declaration is parsed, not the function internal code, and no AST is generated for the function internal code.

The Ignition interpreter then compiles the AST into bytecode and executes it. The interpreter executes the code in top-down order, starting with var a = 1; And var c = 2; Two assignment expressions, and then a function call to foo(1, 2) is executed, at which point the Parser continues parsing the code inside the function, generating the AST, and turning it over to the Ignition interpreter.

How does the interpreter (Ignition) translate the AST into bytecode and execute it?

In the evolution of V8 architecture, the working principle of V8 scripting engine (II). V8 introduced bytecode to address memory footprint. As shown in the figure, usually a file of tens of KB may be tens of megabytes when converted into machine code, which consumes a lot of memory.

What is bytecode

V8 bytecode is an abstraction of machine code with a syntax somewhat similar to assembly. We can think of V8 bytecode as instructions that are combined to implement the functions we write and designed using the same computing model as the physical CPU. Any function of JavaScript source code can be translated into a combination of bytecode equivalently. Bytecodes generation is actually a tree traversal. V8 defines hundreds of bytecodes, all of which can be viewed in the V8 interpreter header file.

When the interpreter executes bytecode, it mainly uses the general register and accumulator register. Function parameters and local variables are stored in the general register R0, R1, and accumulator register is used to hold intermediate results.

Illustrate the bytecode execution process. We define a function f that takes three parameters. The function evaluates the parameters and returns a value.

// index.js
function f(a, b, c) {
  var d = c - 100;
  return a + d * b;
}
f(5.2.150);
Copy the code

Assuming we call a function with arguments 5, 2, 150, the interpreter compiles the function to bytecode.

You can view the bytecode generated by the JavaScript file by using node –print-bytecode index.js. (Will generate a lot, take the last important paragraph)

$ node --print-bytecode index.js
... 
[generated bytecode for function: f (0x242cd33a35b9 <SharedFunctionInfo f>)]
Parameter count 4
Register count 1
Frame size 8
   32 S> 0x242cd33a3e06@ 0:25 02Ldar a2
   34 E> 0x242cd33a3e08 @    2 : 41 64 00          SubSmi [100], [0]
         0x242cd33a3e0b@ 5:26fb             Star r0
   43 S> 0x242cd33a3e0d@7:25 03Ldar a1
   56 E> 0x242cd33a3e0f@ 9:36fb 02          Mul r052, [2]E> 0x242cd33a3e12@ 12:34 04 01Add a0[1] 60S> 0x242cd33a3e15@ 15:aa                Return
Constant pool (size = 0)
Handler Table (size = 0)
Source Position Table (size = 14)
0x242cd33a3e19 <ByteArray[14] >Copy the code

When the interpreter executes the code, it loads the parameters into registers A0, A1, and A2 (accumulator is the accumulator) and executes the bytecode line by line.

Read parameter c and perform the calculation
Ldar A2: Ldar indicates the operation of loading the value of a register into the accumulator. A2 is the loaded value. After loading, accumulator is 150.

The result of the calculation is placed in the accumulative register.
SubSmi [100], [0] : SubSmi [100] indicates that the value of the accumulator register is reduced by 100. At this time, the value of Accumulator becomes 50, which is the index of the FeedBack Vector [0], which records some key intermediate data during the execution of the function.

Putting the value of the accumulative register into R0 temporary records are also variablesbThe value of the
Star r0: Saves the accumulator value to register R0, which changes to 50.

reada + d * bStatement, executed firstd * b
Ldar A1: Indicates that the value of register A1 is loaded into the accumulator register. In this case, the value of accumulator changes to 2.

Continue to performd * bThe second action of
Mul R0,[2] : Mul r0 means to multiply the value of accumulator with the value of r0 and put the result into the accumulator register again, where [2] is also a feedback vector. After execution, the value of Accumulator becomes 100.

performa + 100The action of
Add A0, [1] : Add A0 means that the value of the accumulator register is added to the value of A0 register and the result is put into the accumulator register again. At this time, the value of accumulator becomes 105.

Return: terminates the execution of the current function and returns the value in the accumulated register. The result is 105.

This is a simple process by which the interpreter executes the bytecode (bypassing the translation of the AST), but still needs to convert the bytecode to machine code, which the CPU recognizes only.

It may seem like an extra layer of bytecode conversion feels inefficient, but the advantage of bytecode over machine code is that it is easier to optimize performance, primarily by having the optimized compiler compile hot code. The architecture performance of bytecodes-based optimization is much better than that of architecture directly converted to machine code.

As mentioned earlier, the Ignition interpreter flags repeated hot code as it executes and hands it to TurboFan to generate more efficient machine code. Here’s how TurboFan works.

How the Optimized compiler (TuboFan) works

V8 has made a number of optimizations in terms of prompt JavaScript performance, the most important of which are the inline and escape analysis algorithms.

Inlining

function add(x, y) {
  return x + y;
}
function three() {
  return add(1.2);
}
Copy the code

First we define a summation function add, which takes two arguments x and y, and then three, from which we call add.

If the code is compiled directly without optimization, the machine code for both functions is generated separately. But to further improve performance, the TurboFan optimized compiler first inlines the two functions and then compiles them.

Since the behavior inside the function three is to sum 1 and 2, the above code is equivalent to the following:

function three_add_inlined() {
  var x = 1;
  var y = 2;
  var add_return_value = x + y;
  return add_return_value;
}
Copy the code

Furthermore, since the x and y values in three_add_inlined are determined, three_add_inlined can be further optimized to return result 3:

function three_add_const_folded() {
  return 3;
}
Copy the code

Inlining can reduce complexity, eliminate redundant code, merge constants, and is often the basis for escape analysis. So what is escape analysis?

Escape Analysis

Analyze whether the life cycle of an object is limited to the current function.

class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }

  distance(that) {
    return Math.abs(this.x - that.x)
         + Math.abs(this.y - that.y); }}function manhattan(x1, y1, x2, y2) {
  const a = new Point(x1, y1);
  const b = new Point(x2, y2);
  return a.distance(b);
}

Copy the code

We define a Point class that represents the coordinates of a Point and a distance method that computes the Manhattan distance between the two points.

We then new two points in the Manhattan function, A and B, and calculate the Manhattan distance of A and B. TurboFan starts by inlining the Manhattan function into something like this:

function manhattan_inlined(x1, y1, x2, y2) {
  const a = {x:x1, y:y1};
  const b = {x:x2, y:y2};
  return Math.abs(a.x - b.x)
       + Math.abs(a.y - b.y);
}

Copy the code

This is followed by an escape analysis of objects in manhattan_Inlined. What kind of object is considered “unescaped”? There are mainly the following conditions:

Objects are defined inside functions;
An object is inside a scoped function, not returned, not passed to other functions, etc.

In manhattan_inlined, variables A and B are generic objects within functions, so they are “non-escaped” objects. We can then replace the object in the function with a scalar:

function manhattan_scalar_eplacement(x1, y1, x2, y2) {
  var a_x = x1;
  var a_y = y1;
  var b_x = x2;
  var b_y = y2;
  return Math.abs(a_x - b_x)
       + Math.abs(a_y - b_y);
}
Copy the code

So instead of having an object definition in the function, we have a_x a_y b_x b_y, which comes directly from the function parameters.

The advantage of this is that we can load variables directly into registers, eliminating the need to access object properties from memory, improving execution efficiency and reducing memory usage.

Finally, thank you for your patience. The theory of this article is rather boring, but as a front-end developer, the technology is updated and iterated with each passing day, but its essence will not change, so grasp the basics, understand the principle of JavaScript engine, no matter what, everything changes.

If this article can help or inspire you, it will be my pleasure