As a front-end programmer, the first thing I do at work is turn on my computer and open Chrome, whether to slack off for a bit or to get straight to work. That browser window stays with you through the whole day, from seven or eight in the evening to nine or ten at night, quietly keeping you company as you work. For such a loyal companion, ask yourself: have you ever taken a serious look at how it works? Have you ever been inside its mind?

If you’ve ever wondered, watch this episode of Inside Chrome to see how V8 works.

What is V8

Before you can get too deep into a thing, you have to know what it is.

V8 is Google's open-source JavaScript and WebAssembly engine, written in C++ and used by Chrome, Node.js, and other applications. It implements ECMAScript and WebAssembly and runs on Windows 7 and above, macOS 10.12+, and Linux systems using x64, IA-32, ARM, or MIPS processors. V8 can run standalone or be embedded in any C++ application.

V8 origin

So let’s take a look at how it came to be, and why it got its name.

Originally developed by Lars Bak’s team and named after the car’s V8 engine (a V-shaped engine with eight cylinders), V8 promised to be a high-performance JavaScript engine and was released as open source on September 2, 2008, along with Chrome.

Why do we need V8

The JavaScript code we write is ultimately executed by a machine, but machines cannot understand high-level languages directly. A series of steps is needed to translate the high-level language into instructions the machine can recognize, namely binary code, which is then handed to the machine to execute. That translation process is exactly what V8 does.

Next, let's look at it in detail.

V8 composition

First, let’s look at the internal composition of V8. There are a number of modules inside V8. The four most important ones are as follows:

  • Parser: the parser, which parses the source code into an AST
  • Ignition: the interpreter, which converts the AST into bytecode, executes it, and marks hot code
  • TurboFan: the compiler, which compiles hot code into machine code and executes it
  • Orinoco: the garbage collector, which is responsible for reclaiming memory

V8 workflow

The following is a detailed flowchart of how these important modules in V8 work together. Let's go through them one by one.

Parser

The Parser is responsible for converting the source code into an abstract syntax tree AST. There are two important stages in the process of transformation: Lexical Analysis and Syntax Analysis.

Lexical analysis

Also known as tokenization, this is the process of converting a string of code into a sequence of tokens. A token is the smallest unit that makes up the source code, similar to a word in English; lexical analysis can be thought of as the process of combining letters into words. Lexical analysis does not care about the relationships between words. For example, it may mark a parenthesis as a token, but it does not verify that the parentheses match.

Tokens in JavaScript mainly include the following types:

  • Keywords: var, let, const, etc.
  • Identifiers: unquoted sequences of characters, which may be variable names, keywords such as if or else, or built-in constants such as true or false
  • Operators: +, -, *, /, etc.
  • Numbers: hexadecimal, decimal, octal, scientific notation, etc.
  • Strings: string literals, such as the values assigned to variables
  • Whitespace: consecutive spaces, line breaks, indentation, etc.
  • Comments: line comments or block comments, each treated as a single, indivisible token
  • Punctuation: braces, parentheses, semicolons, colons, etc.

For example, here are the tokens generated when esprima parses const a = 'hello world':

[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "a"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "String",
        "value": "'hello world'"
    }
]
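
If you want to reproduce this token list yourself, here is a minimal sketch using the esprima package (an assumption on my part; any JavaScript parser with a tokenizer would do):

// Requires: npm install esprima
const esprima = require('esprima');

// tokenize() returns the flat token list shown above
const tokens = esprima.tokenize("const a = 'hello world'");
console.log(tokens);
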
Syntax analysis

Syntax analysis is the process of transforming the tokens produced by lexical analysis into an AST according to the rules of the grammar. It is the process of putting words together into sentences. The syntax is validated during this transformation, and a syntax error is thrown if anything is invalid.

Here is the AST generated for const a = 'hello world':

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
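
Again as a minimal sketch with esprima (assuming the same setup as above), the AST can be produced like this:

// parseScript() returns the AST shown above
const esprima = require('esprima');
const ast = esprima.parseScript("const a = 'hello world'");
console.log(JSON.stringify(ast, null, 2));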

The AST generated by the Parser is then handed to the Ignition interpreter.

Ignition interpreter

The Ignition interpreter is responsible for converting the AST into bytecode and executing it. Bytecode is an intermediate representation between the AST and machine code; it is independent of any particular machine architecture and must be translated into machine code by the interpreter before it can be executed.

Since bytecode still has to be turned into machine code to run, why not compile the AST straight to machine code in the first place? Executing machine code directly would surely be faster, so why add an intermediate step?

In fact, before version 5.9 of V8 there was no bytecode: JS code was compiled directly into machine code, and that machine code was cached in memory, which consumed a great deal of it. Early mobile phones had little memory, and such heavy usage noticeably degraded their performance. Compiling directly to machine code also meant long compilation times and slow startup. On top of that, it required emitting different instruction sets for different CPU architectures, which is highly complex.

Bytecode was introduced in version 5.9, solving the problems of high memory consumption, long startup time, and high code complexity.
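
If you are curious what bytecode looks like, V8 can print it. Here is a minimal sketch, assuming a Node.js release that passes V8's debugging flags through (flag names and output format vary between versions):

// bytecode-demo.js
// Run with: node --print-bytecode --print-bytecode-filter=add bytecode-demo.js
function add(a, b) {
  return a + b;
}
add(1, 2);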

Now let’s look at how Ignition converts the AST to bytecode.

Below is a flowchart of how the Ignition interpreter works. The AST needs to go through a bytecode generator and then through a series of optimizations to generate the bytecode.

The optimizations include:

  • Register Optimizer: avoids unnecessary register loads and stores
  • Peephole Optimizer: finds parts of the bytecode that can be reused and merges them
  • Dead-code Elimination: removes unnecessary code and reduces the size of the bytecode

Once the code is converted to bytecode, it can be executed through the interpreter. During execution, Ignition monitors the execution of your code and records execution information, such as the number of times the function is executed, the arguments passed each time the function is executed, and more.

When the same piece of code is executed more than once, it is marked as hot code. The hot code is handed to the TurboFan compiler for processing.

TurboFan compiler

When TurboFan receives the hot code marked by Ignition, it optimizes it, compiles the optimized bytecode into more efficient machine code, and caches it. The next time the same code runs, the cached machine code is executed directly, which greatly improves execution efficiency.

When a piece of code no longer qualifies as hot code, TurboFan performs de-optimization, reverting the optimized machine code back to bytecode and handing execution back to Ignition.

Now let's look at this process with an example.

Take sum += arr[i]. Since JS is a dynamically typed language, sum and arr[i] could be of a different type on every iteration, so Ignition has to check the data types of sum and arr[i] each time this statement executes. When the same code is found to have been executed many times, it is flagged as hot code and handed to TurboFan.

Checking the data types of sum and arr[i] on every execution is a waste of time. So during optimization, TurboFan assumes the data types of sum and arr[i] based on previous executions and bakes that assumption into the machine code. On subsequent executions there is no need to check the data types again.

If the data type of arr[i] changes later during execution, TurboFan throws away the optimized machine code and hands execution back to Ignition; this is the de-optimization process.
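
To make this concrete, here is a hypothetical hot loop (the names are illustrative, not from the original article). Running it in Node.js with V8 tracing flags such as --trace-opt and --trace-deopt, if your build exposes them, lets you watch optimization and de-optimization happen:

// sum-demo.js
function sumArray(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) {
    sum += arr[i]; // Ignition records the observed types of sum and arr[i]
  }
  return sum;
}

// Many calls with number-only arrays: the function becomes hot and
// TurboFan compiles machine code that assumes sum and arr[i] are numbers.
const numbers = Array.from({ length: 10000 }, (_, i) => i);
for (let i = 0; i < 100; i++) sumArray(numbers);

// An array containing a string breaks that assumption and can trigger
// de-optimization back to bytecode.
sumArray([1, '2', 3]);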

Figures: the hotspot code, before optimization, and after optimization.

Conclusion

Now let’s summarize the V8 implementation:

  1. The source code is processed by the Parser, which generates an AST through lexical analysis and syntax analysis
  2. The Ignition interpreter converts the AST into bytecode and executes it
  3. During execution, if hot code is found, it is handed to the TurboFan compiler, which generates machine code and executes it
  4. If the hot code no longer meets the optimization assumptions, it is de-optimized back to bytecode

This combination of bytecode with an interpreter and compiler is commonly known as just-in-time compilation (JIT).

This article does not cover the garbage collector, Orinoco. V8's garbage collection mechanism deserves an article of its own; see you next time.

Reference articles

  1. V8 official documentation
  2. Celebrating 10 years of V8
  3. How does V8 execute JavaScript code?
  4. Ignition: An Interpreter for V8
  5. Just-in-time compilation