Author: Tadm

A Peek into the Google V8 Engine

Before diving in, let’s pose a few questions:

  • Why is a JS engine like V8 needed?
  • What exactly is a V8 engine?
  • What can it do?
  • What can be gained from understanding it?

The rest of this article addresses these questions in detail.

Origins

As we all know, JS is an interpreted language that is dynamically typed (unlike statically typed languages such as Java, you do not declare a variable’s type when you name it), weakly typed, and prototype-based. However, JS mostly runs in the front end, where it directly affects the interface and must respond to the user quickly, so the language itself needs to be parsed and executed fast. That is exactly what a JS engine provides.

Interpreted and compiled (static) languages were just mentioned, so here is a brief introduction to both:

  • Interpreted language (e.g. JS)

    • Every run requires an interpreter, which translates the source code statement by statement into machine code the machine understands and executes it
  • Compiled language (e.g. Java)

    • The source is translated by a compiler into an executable ahead of time, which the machine then runs directly

As the descriptions above show, interpreted JS must be parsed and executed from the source file on every run, while a compiled program only needs to be compiled once and its executable can be run directly from then on. The trade-off is that the executable is tied to one platform, giving poor cross-platform portability, so each approach has its own advantages and disadvantages.

V8 is the most prominent of the many JavaScript engines (V8, JavaScriptCore, SpiderMonkey, Chakra, etc.), and it powers the most popular browser, Google Chrome, so it is well worth understanding: knowing it makes it much easier for developers to reason about how JS actually runs in the browser.

Getting to Know V8

Definition

  • Written in C++
  • Open-sourced by Google
  • Compiles JS to native machine code (supports IA-32, x86-64, ARM, and MIPS CPUs)
  • Uses techniques such as inline caching to improve performance
  • Fast, with performance comparable to binary (compiled) programs
  • Supports multiple operating systems, such as Windows, Linux, and Android
  • Supports multiple hardware architectures, such as IA-32, x64, and ARM
  • Highly portable and cross-platform

How It Runs

Here’s the official flow chart:

Preparation

Loading the JS file is not under V8’s control: files may come from network requests, the local cache, or a service worker. A loaded file is the prerequisite for V8 to run (there must be an active file to interpret and execute). There are three loading modes, which V8 optimizes:

  • Cold load: the script file is loaded for the first time and no data is cached
  • Warm load: when V8 detects that the same script file is used again, it caches the compiled code in the disk cache along with the script file
  • Hot load: when the same script file is loaded a third time, V8 can take both the script and the compiled code from the previous load out of the disk cache, avoiding parsing and compiling the script from scratch
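The three load paths above can be sketched as a simple two-level cache lookup. This is an illustrative model only; the names (scriptCache, compiledCache, compile) are hypothetical stand-ins, not V8 internals.

```javascript
// Hypothetical sketch of the cold/warm/hot load decision.
const scriptCache = new Map();   // url -> source text (disk cache stand-in)
const compiledCache = new Map(); // url -> compiled-code stand-in

function compile(source) {
  return `compiled(${source})`; // stand-in for parse + compile
}

function load(url, fetchSource) {
  if (compiledCache.has(url)) {
    // Hot load: source and compiled code both come from the cache.
    return { kind: "hot", code: compiledCache.get(url) };
  }
  if (scriptCache.has(url)) {
    // Warm load: source is cached; compile it and cache the result too.
    const code = compile(scriptCache.get(url));
    compiledCache.set(url, code);
    return { kind: "warm", code };
  }
  // Cold load: nothing cached yet; fetch and cache the source only.
  const source = fetchSource(url);
  scriptCache.set(url, source);
  return { kind: "cold", code: compile(source) };
}
```

Calling load three times with the same URL walks through cold, warm, and hot in order, mirroring the list above.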

V8 6.6 further improved the code caching strategy: in short, the cached code no longer depends on the compilation process; the two were decoupled into separate steps, and the amount of code that can be cached was increased. This reduces parse and compile time and improves performance considerably; for details, see “V8 6.6 improves cache performance”.

Parsing

This stage converts the JS code obtained in the previous stage into an AST (abstract syntax tree).

Lexical analysis

The scanner reads the source code character by character from left to right and, through analysis, produces marks called tokens, the smallest units of source code. Put simply, it splits a piece of code into the smallest units that cannot be broken down further. This process is called lexical tokenization, and its product is consumed by the syntax analysis step below.

Here are the token types commonly produced by lexical analyzers:

  • Constants (integers, decimals, characters, strings, etc.)
  • Operators (arithmetic operators, comparison operators, logical operators)
  • Delimiters (commas, semicolons, parentheses, etc.)
  • Reserved words
  • Identifiers (variable names, function names, class names, etc.)

  TOKEN-TYPE          TOKEN-VALUE
  -----------------------------------------------
  T_IF                if
  T_WHILE             while
  T_ASSIGN            =
  T_GREATTHAN         >
  T_GREATEQUAL        >=
  T_IDENTIFIER        name / numTickets / ...
  T_INTEGERCONSTANT   100 / 1 / 12 / ...
  T_STRINGCONSTANT    "This is a string" / "hello" / ...
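To make the left-to-right scan concrete, here is a toy tokenizer. It is a minimal sketch for illustration, far simpler than V8’s real scanner, handling only identifiers, a few keywords, integers, and single-character punctuators.

```javascript
// Toy tokenizer: scans left to right, emitting the smallest units (tokens).
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) { i++; continue; }           // skip whitespace
    if (/[A-Za-z_$]/.test(ch)) {                    // identifier or keyword
      let j = i;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      const value = source.slice(i, j);
      const type = ["if", "else", "return", "function"].includes(value)
        ? "Keyword" : "Identifier";
      tokens.push({ type, value });
      i = j;
    } else if (/[0-9]/.test(ch)) {                  // integer constant
      let j = i;
      while (j < source.length && /[0-9]/.test(source[j])) j++;
      tokens.push({ type: "Numeric", value: source.slice(i, j) });
      i = j;
    } else {                                        // single-char punctuator
      tokens.push({ type: "Punctuator", value: ch });
      i++;
    }
  }
  return tokens;
}
```

For example, tokenize("x = 10") yields an Identifier, a Punctuator, and a Numeric token, exactly the kind of stream the syntax analysis stage consumes.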

As mentioned, the code is scanned from left to right and analyzed piece by piece, which suggests two possible schemes: scan everything first and then analyze (no streaming), or analyze while scanning (streaming). Sketching a simple sequence diagram of each shows that streaming is much more efficient; in addition, memory used by already-analyzed pieces can be released during the process, greatly improving memory usage. See the details of this optimization.

Syntax analysis

Syntax analysis is the process of analyzing and identifying the grammatical structure of an input consisting of a sequence of words (here, the token stream produced by the lexical analysis stage) against a given formal grammar, finally producing an AST (abstract syntax tree).

V8 breaks the parsing process into two stages:

  • Pre-parser

    • Skips code that is not yet used
    • Generates no corresponding AST; produces scopes information without variable references or declarations
    • Roughly twice as fast as the full parser
    • Throws only certain error messages required by JS syntax rules
  • Full parser

    • Parses the code that is actually used
    • Generates the corresponding AST
    • Generates full scopes information, with variable references, declarations, etc.
    • Throws all JS syntax errors
Why are there two parsing passes?

If there were only one pass, it would be the full parser, but in that case a lot of unused code would consume parsing time. For example, the DevTools Coverage panel records how much of a page’s code goes unused: up to 75% of it may never execute.

But pre-parsing is not a cure-all; gains and losses coexist. The obvious bad case: code that was pre-parsed is later executed in the same file, so the pre-parse pass was unnecessary and the code must be parsed again. In practice this still costs far less than the unused-code example above, so it is a trade-off tuned for the best overall performance.
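The lazy/eager split is visible in how code is written. A sketch of the commonly cited pattern (the general behavior is documented by V8, though exact heuristics vary by version):

```javascript
// A top-level function declaration is normally only pre-parsed:
// its body is skipped until the first call.
function lazy() {
  return 1 + 1; // fully parsed and compiled only when lazy() is invoked
}

// Wrapping a function in parentheses (the IIFE pattern) hints to the
// parser that it will run immediately, so it is fully parsed up front.
const eager = (function () {
  return 2 + 2;
})();

const sum = lazy(); // this call triggers full parsing of lazy's body
```

Both functions behave identically; the difference is only when their bodies pay the parsing cost.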

Here is an example:

  function add(x, y) {
    if (typeof x === "number") {
      return x + y;
    } else {
      return x + 'tadm';
    }
  }

Paste the code above into an online tokenizer and an online AST viewer to see its tokens and AST structure.

  • tokens
  [
    { "type": "Keyword",    "value": "function" },
    { "type": "Identifier", "value": "add" },
    { "type": "Punctuator", "value": "(" },
    { "type": "Identifier", "value": "x" },
    { "type": "Punctuator", "value": "," },
    { "type": "Identifier", "value": "y" },
    { "type": "Punctuator", "value": ")" },
    { "type": "Punctuator", "value": "{" },
    { "type": "Keyword",    "value": "if" },
    { "type": "Punctuator", "value": "(" },
    { "type": "Keyword",    "value": "typeof" },
    { "type": "Identifier", "value": "x" },
    { "type": "Punctuator", "value": "===" },
    { "type": "String",     "value": "\"number\"" },
    { "type": "Punctuator", "value": ")" },
    { "type": "Punctuator", "value": "{" },
    { "type": "Keyword",    "value": "return" },
    { "type": "Identifier", "value": "x" },
    { "type": "Punctuator", "value": "+" },
    { "type": "Identifier", "value": "y" },
    { "type": "Punctuator", "value": ";" },
    { "type": "Punctuator", "value": "}" },
    { "type": "Keyword",    "value": "else" },
    { "type": "Punctuator", "value": "{" },
    { "type": "Keyword",    "value": "return" },
    { "type": "Identifier", "value": "x" },
    { "type": "Punctuator", "value": "+" },
    { "type": "String",     "value": "'tadm'" },
    { "type": "Punctuator", "value": ";" },
    { "type": "Punctuator", "value": "}" },
    { "type": "Punctuator", "value": "}" }
  ]
  • AST
  {
    "type": "Program",
    "body": [
      {
        "type": "FunctionDeclaration",
        "id": { "type": "Identifier", "name": "add" },
        "params": [
          { "type": "Identifier", "name": "x" },
          { "type": "Identifier", "name": "y" }
        ],
        "body": {
          "type": "BlockStatement",
          "body": [
            {
              "type": "IfStatement",
              "test": {
                "type": "BinaryExpression",
                "operator": "===",
                "left": {
                  "type": "UnaryExpression",
                  "operator": "typeof",
                  "argument": { "type": "Identifier", "name": "x" },
                  "prefix": true
                },
                "right": { "type": "Literal", "value": "number", "raw": "\"number\"" }
              },
              "consequent": {
                "type": "BlockStatement",
                "body": [
                  {
                    "type": "ReturnStatement",
                    "argument": {
                      "type": "BinaryExpression",
                      "operator": "+",
                      "left": { "type": "Identifier", "name": "x" },
                      "right": { "type": "Identifier", "name": "y" }
                    }
                  }
                ]
              },
              "alternate": {
                "type": "BlockStatement",
                "body": [
                  {
                    "type": "ReturnStatement",
                    "argument": {
                      "type": "BinaryExpression",
                      "operator": "+",
                      "left": { "type": "Identifier", "name": "x" },
                      "right": { "type": "Literal", "value": "tadm", "raw": "'tadm'" }
                    }
                  }
                ]
              }
            }
          ]
        },
        "generator": false,
        "expression": false,
        "async": false
      }
    ],
    "sourceType": "script"
  }
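To see how such a tree drives later stages, here is a toy evaluator that walks an AST fragment of the shape shown above. It is a sketch handling only the node types used in this example, nothing like V8’s real machinery.

```javascript
// Toy AST walker: evaluates a small expression subtree.
function evaluate(node, scope) {
  switch (node.type) {
    case "Identifier":
      return scope[node.name];
    case "Literal":
      return node.value;
    case "UnaryExpression": // only `typeof` appears in our example
      return typeof evaluate(node.argument, scope);
    case "BinaryExpression": {
      const left = evaluate(node.left, scope);
      const right = evaluate(node.right, scope);
      if (node.operator === "+") return left + right;
      if (node.operator === "===") return left === right;
      throw new Error(`unsupported operator ${node.operator}`);
    }
    default:
      throw new Error(`unsupported node ${node.type}`);
  }
}

// The `test` expression of the IfStatement above, as an AST fragment:
const testNode = {
  type: "BinaryExpression",
  operator: "===",
  left: {
    type: "UnaryExpression",
    operator: "typeof",
    argument: { type: "Identifier", name: "x" },
    prefix: true,
  },
  right: { type: "Literal", value: "number", raw: '"number"' },
};
```

Evaluating testNode with x bound to a number yields true, and with x bound to a string yields false, matching the two branches of the add function.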

Interpretation

This stage converts the AST generated above into bytecode.

The advantage of introducing bytecode as an intermediate is that the AST is not translated directly into machine code. CPU architectures differ, and translating straight to machine code would require targeting each CPU’s underlying instruction set, which is very complex to implement. There is also a memory-usage problem: machine code lives in memory while running (and on disk after the process exits), and it carries much more information than the source, making it far larger, which would cause serious memory pressure.

While executing bytecode, V8 uses general-purpose registers and an accumulator register: function parameters and local variables are stored in the general registers, while the accumulator holds intermediate results. If the CPU had to read data directly from memory while executing instructions, performance would suffer; keeping intermediate data in registers greatly improves execution speed.
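The register-plus-accumulator design can be illustrated with a toy interpreter. The instruction names echo V8’s Ignition (Ldar, Add, Return), but this is a sketch of the idea, not real V8 bytecode.

```javascript
// Toy accumulator machine: arguments/locals live in registers,
// intermediate results in the accumulator.
function run(bytecode, registers) {
  let accumulator;
  for (const [op, operand] of bytecode) {
    switch (op) {
      case "Ldar":   // load a register into the accumulator
        accumulator = registers[operand];
        break;
      case "Add":    // accumulator = register + accumulator
        accumulator = registers[operand] + accumulator;
        break;
      case "Return": // the function result is whatever the accumulator holds
        return accumulator;
    }
  }
}

// Roughly the shape of what `function add(x, y) { return x + y; }`
// compiles to in this model:
const addBytecode = [
  ["Ldar", 1],   // accumulator = y
  ["Add", 0],    // accumulator = x + accumulator
  ["Return"],
];
```

Running addBytecode with registers [3, 4] returns 7: every intermediate value flows through the accumulator rather than through memory.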

Compilation

This stage is primarily the translation of bytecode into machine code by V8’s TurboFan compiler.

This interpreter-plus-compiler design is known as JIT (just-in-time compilation); the Java virtual machine uses a similar technique. While interpreting and executing bytecode, the interpreter collects profiling information and marks hot code (a piece of code that is executed many times). TurboFan then compiles the hot code directly into machine code and caches it, so subsequent calls run the corresponding binary machine code directly, speeding up execution.

As TurboFan compiles bytecode into machine code, it also performs simplifications such as constant folding, strength reduction, and algebraic reassociation.

For example: 3 + 4 → 7, x + 1 + 2 → x + 3, and so on.
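Both of those examples can be sketched as a small constant-folding pass over AST-like nodes. This is an illustration of the technique, not TurboFan’s actual implementation (which works on its own internal graph, not a JS AST).

```javascript
// Sketch of constant folding: when both operands of `+` are literals,
// replace the node with their sum; also reassociate (x + 1) + 2 -> x + 3.
function fold(node) {
  if (node.type !== "BinaryExpression" || node.operator !== "+") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  if (left.type === "Literal" && right.type === "Literal") {
    return { type: "Literal", value: left.value + right.value }; // 3 + 4 -> 7
  }
  if (
    left.type === "BinaryExpression" && left.operator === "+" &&
    left.right.type === "Literal" && right.type === "Literal"
  ) {
    // (x + 1) + 2 -> x + 3
    return {
      type: "BinaryExpression", operator: "+",
      left: left.left,
      right: { type: "Literal", value: left.right.value + right.value },
    };
  }
  return { type: "BinaryExpression", operator: "+", left, right };
}
```

Note that real engines must be careful here: JS `+` can mean string concatenation, so such rewrites are only valid once the types are known to be numeric.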

Execution

At this point we begin to execute the machine code produced in the previous phase.

A common operation during JS execution is object property access. Because JS is a dynamic language, a seemingly simple access such as object.xxx can have complex semantics: it may read the property directly, call the object’s getter, or walk up the prototype chain. This uncertainty and dynamic dispatch wastes lookup time, so V8 caches the result of the first analysis; when the same property is accessed again, the cache takes priority. A call along the lines of GetProperty(object, "xxx", feedback_cache) consults the cache, and if a cached result exists the lookup process is skipped, which greatly improves performance.

In addition to caching the results of property reads, V8 introduces Object Shapes (hidden classes), which record basic information about an object, such as all the properties it owns and each property’s offset within the object. When we access a property, the property name and offset let V8 locate its memory address directly for reading, greatly improving access efficiency.

Since V8 uses hidden classes (two objects of the same shape reuse the same hidden class; two objects have the same shape when they have the same number of properties, with the same names in the same order), we developers can take advantage of this:

  • Try to create objects with the same shape
  • After creating an object, avoid manipulating its shape: do not add or delete properties, which would change the object’s shape
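The two guidelines above look like this in practice. The shape-sharing behavior described here is V8’s documented model; the comments state what each line does to the hidden class, which is not observable from JS itself.

```javascript
// Objects created with the same properties in the same order share a
// hidden class, so property access can reuse the cached offset.
const p1 = { x: 1, y: 2 };
const p2 = { x: 3, y: 4 }; // same shape as p1: shares its hidden class

// These break the guidelines above:
const p3 = { y: 5, x: 6 }; // different property order -> different shape
const p4 = { x: 7, y: 8 };
delete p4.x;               // deleting a property changes p4's shape
p2.z = 9;                  // adding a property after creation transitions
                           // p2 to a new hidden class, diverging from p1
```

All four objects behave the same to JS code; the difference is only in how quickly the engine can resolve their property accesses.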

Completion

At this point V8 has completed one full pass over a piece of JS code: loading, parsing, interpretation, compilation, and execution.

Conclusion

The above traces the journey of JS code through the V8 engine, from download to final execution. Clearly V8 involves many techniques and some very clever design ideas, such as streaming processing, caching of intermediates, and garbage collection; each involves many details and is well worth further study.