Preface

This term's course on compiler principles had us implement a compiler, and it raised a fundamental question in my mind: what does a JavaScript engine actually look like? V8 directly ushered in the era of Node.js, and JavaScript can do more and more things. As an excellent JavaScript engine, its model is well worth studying and learning from.

So how does the V8 engine actually work?

A tale of two compilers

V8 compiles all JavaScript directly to native code, and it does so with two compilers: one that runs fast and outputs mediocre code, and one that runs more slowly but does its best to output optimized code.

The first compiler – the full-codegen compiler

The compiler that outputs generic code is known internally as the full-codegen compiler. It takes the abstract syntax tree (AST) of a function, walks over the tree, and emits assembly code directly, producing generic native code sprinkled with inline caches that record type information as it runs. It is a simple, generic compiler that goes straight from the parsed source of the function (the AST) to code generation, with no intermediate representation.
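
As a concrete illustration (the function below and its name are just an example, not anything taken from V8 itself), this is all that happens to a tiny piece of code the first time it runs:

    // The first time this function is called, full-codegen walks its AST and
    // emits generic machine code for it directly: no intermediate
    // representation, and no assumptions yet about what x and y will be.
    function add(x, y) {
      return x + y; // compiled as a generic "+" site backed by an inline cache
    }

    add(1, 2); // the first call is what triggers compilation of `add`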

Local variables are not kept in registers; they are stored on the stack or in the heap. Any variable referenced by a nested function is stored on the heap, in the context object of the function that defines it. The compiler loads these values into registers only as needed to do its work, and for values on the stack the topmost ones are temporarily cached in registers. More complex cases are handled by runtime routines. The compiler also keeps track of the context in which an expression is being evaluated, so a condition can jump directly to the block that needs to run rather than loading a value into a register, testing whether it is zero, and then branching. Simple arithmetic expressions are optimized here as well.
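
A small example makes the stack-versus-heap distinction concrete (the code is purely illustrative of the rule above):

    function makeCounter() {
      let count = 0;     // captured by the nested function below, so it lives
                         // in a heap-allocated context object
      let scratch = 123; // not referenced by the nested function, so it can
                         // stay on the stack frame

      return function increment() {
        count += 1;      // reads and writes the context slot on the heap
        return count;
      };
    }

    const next = makeCounter();
    console.log(next()); // 1
    console.log(next()); // 2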

The compiler relies on one very important technique to produce reasonable code: inline caching. Inline caches are emitted at compile time and are used directly for assignments, unary operations, binary operations, function calls, property accesses, and comparisons. Each cache site remembers the kinds of keys and values it has seen, so repeated operations on the same kinds of objects take a fast path instead of falling back to the generic one. The inline caches are also what later supply type data to the second, optimizing compiler.
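
Here is a sketch of how one cached site behaves (illustrative only; the objects and names are made up):

    // The property load `p.x` below is a single site in the generated code. As
    // long as every object reaching it has the same shape, its inline cache
    // keeps reusing the same fast lookup, and it also records "always an
    // object with an x field" as type feedback for the optimizing compiler.
    function readX(p) {
      return p.x;
    }

    readX({ x: 1 });       // the cache records the shape { x }
    readX({ x: 2 });       // same shape: the cached fast path is reused
    readX({ x: 3, y: 4 }); // a new shape: the cache now has to handle two cases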

Type feedback

When V8 first sees a function, it does nothing beyond building its syntax tree; it does not run the full-codegen compiler until the function is called for the first time. This lazy attitude changes once the code starts running: a profiling thread watches how the code is running and which functions are hot.
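
For example (ordinary code, no special API involved), a function that is called over and over with the same kind of argument is exactly what the profiler is looking for:

    function square(n) {
      return n * n;
    }

    // After enough iterations the profiling thread will consider `square` hot,
    // and the feedback recorded at its sites ("always a small integer")
    // becomes input for the optimizing compiler.
    let sum = 0;
    for (let i = 0; i < 100000; i++) {
      sum += square(i);
    }
    console.log(sum);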

This lazy, wait-and-see approach gives V8 time to track how types change and to record that data. When V8 finds a hot function and decides the data can help, it feeds the type information back to the optimizing compiler. The type feedback recorded at runtime follows the lattice of types shown below.

         Unknown
           |   \____________
           |                \
      Primitive       Non-primitive
           |   \_______      |
           |           \     |
        Number       String  |
         /   \          |    |
    Double  Integer32   |   /
        |      |       /   /
        |     Smi     /   /
        |      |     /   /
        Uninitialized

Each time a new value is seen, its type is computed and combined with the type recorded so far; the initial type of a variable is Uninitialized. So if the first value seen is an integer that fits in the Smi (small integer) range, the feedback infers that it is an Smi. If a later value turns out to be a Double, the combined inference becomes Number: the result of each combination is the nearest common parent of the two types in the lattice. This type estimation is done internally so that the optimizing compiler can specialize the code with a clear target.
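
To make the nearest-common-parent rule concrete, here is a toy model of the lattice join in plain JavaScript. The parent table and the join/ancestors helpers are invented purely for illustration; they are not V8's data structures.

    // Parent links follow the lattice in the diagram above.
    const parent = {
      Smi: "Integer32",
      Integer32: "Number",
      Double: "Number",
      Number: "Primitive",
      String: "Primitive",
      Primitive: "Unknown",
      "Non-primitive": "Unknown",
      Unknown: null,
    };

    function ancestors(t) {
      const chain = [t];
      while (parent[t]) chain.push((t = parent[t]));
      return chain;
    }

    // Combine a new observation with the old one: nearest common parent.
    function join(oldType, newType) {
      if (oldType === "Uninitialized") return newType; // first value seen
      const oldChain = ancestors(oldType);
      return ancestors(newType).find((t) => oldChain.includes(t));
    }

    console.log(join("Uninitialized", "Smi")); // "Smi"
    console.log(join("Smi", "Double"));        // "Number"
    console.log(join("Number", "String"));     // "Primitive"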

The type feedback data is tied to the abstract syntax tree. How hot a function is is tracked by an integer counter, and the feedback recorded at the marked hot nodes is retrieved from full-codegen and handed to the optimizing compiler for further work.

At this point the process gets a little more complicated, because it has to work up and down the whole compiler stack. The optimizing compiler needs type feedback for operands and results, it needs to be able to find that data precisely, and it then has to re-associate it with the abstract syntax tree so that it can optimize the code purposefully, starting from the tree.

V8 does this by digesting the recorded data into a TypeFeedbackOracle object and associating that object with the corresponding syntax tree nodes. The optimizing compiler then consults this oracle as it walks the tree, which is what makes targeted optimization possible.
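
The deliberately simplified sketch below captures the idea: feedback is keyed by something that identifies a syntax tree node, and the optimizing compiler asks the oracle about specific nodes. The class, method, and node ids here are invented for illustration and are not V8's real interfaces.

    class FakeTypeFeedbackOracle {
      constructor(feedback) {
        this.feedback = feedback; // Map from AST node id -> observed type
      }
      typeOf(astNodeId) {
        return this.feedback.get(astNodeId) || "Unknown";
      }
    }

    // Pretend this is what the generic code recorded while it was running:
    const recorded = new Map([
      [12, "Smi"],    // the node with id 12 only ever saw small integers
      [27, "String"], // the node with id 27 only ever saw strings
    ]);

    const oracle = new FakeTypeFeedbackOracle(recorded);
    console.log(oracle.typeOf(12)); // "Smi"     -> specialize to integer code
    console.log(oracle.typeOf(99)); // "Unknown" -> no feedback, stay generic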

The second compiler – the Crankshaft compiler

Once V8 has identified a hot function and gathered its type feedback, it tries to recompile it with the optimizing compiler using that information. This optimizing compiler is marketed as Crankshaft, although nothing in the source code is actually named that. In the source, Crankshaft is really a pipeline of four stages: abstract syntax tree with type feedback -> high-level intermediate representation (called Hydrogen) -> low-level intermediate representation (Lithium) -> optimized native code.

The high-level intermediate representation is produced by the compiler's front end, while the low-level intermediate representation is what the back end works with. By optimizing at both levels, the V8 engine can handle hot functions much better.
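
If you want to watch this happen, V8 has long shipped a --trace-opt flag (also accepted as --trace_opt); with a Crankshaft-era Node.js or d8 it prints a line when it decides to optimize a function. The exact output differs between versions, and in modern V8 the compiler behind the flag is no longer Crankshaft, so treat the following only as a rough way to poke at the behaviour:

    // Run as: node --trace-opt hot.js
    function hot(a, b) {
      return a * b + 1;
    }

    let total = 0;
    for (let i = 0; i < 1e6; i++) {
      total += hot(i, 2); // repeated, same-typed calls make `hot` a candidate
    }
    console.log(total);   // with --trace-opt, V8 logs when it optimizes `hot`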

Conclusion

The V8 engine improves performance through lazy compilation, optimization of hot functions at runtime, the combination of a fast generic compiler with a slower optimizing one, and sensible type inference to cope with JavaScript's dynamic typing. This is only a conceptual analysis of earlier versions of V8, but I can already feel some of the magic behind V8's optimizations.

References:

http://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-compilers#ffc2b5d74c27fa60d75658244fee88e6fa783afb

https://github.com/v8/v8/tree/master/src