JavaScript: V8 compilation process

The ECMAScript language types are Undefined, Null, Boolean, String, Symbol, Number, and Object. We usually call the first six basic types and call Object a reference (or complex) type. I never really questioned this until I read the ECMAScript specification for myself: it is what everyone says, and it is what a lot of books say.

However, the ECMAScript specification never states that Undefined, Null, Boolean, String, Symbol, and Number are basic types and that Object is a reference type. Let’s see what the specification actually says:

4.3.2 primitive value

member of one of the types Undefined, Null, Boolean, Number, Symbol, or String as defined in clause 6. NOTE: A primitive value is a datum that is represented directly at the lowest level of the language implementation.

That is, values of the Undefined, Null, Boolean, Number, Symbol, and String types defined in clause 6 of the specification are primitive values: data represented directly at the lowest level of the language implementation.

4.3.3 object

member of the type Object. NOTE: An object is a collection of properties and has a single prototype object. The prototype may be the null value.

That is, an object is a collection of properties and has a single prototype object, which may be null.

The specification does not distinguish between basic data types and reference data types, so why do we have these two concepts in JavaScript? The answer has to do with how JavaScript engines manage memory, so let’s first look at how a JavaScript engine compiles code.
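Whatever the two groups are called, the difference in behavior is easy to observe. The following is a minimal sketch (the variable names are illustrative only): assigning a primitive value copies the value, while assigning an object makes both variables refer to the same object.

```javascript
// Primitive values: assignment copies the value itself.
let a = 1;
let b = a;
b = 2; // a is unaffected

// Objects: assignment copies a reference, so both names see the same object.
const first = { count: 1 };
const second = first;
second.count = 2; // visible through `first` as well

// typeof for one value of each primitive type named in the specification.
// Note the long-standing quirk that typeof null is "object".
const kinds = [undefined, null, true, 1, "s", Symbol("x")].map((v) => typeof v);

console.log(a, b);            // 1 2
console.log(first.count);     // 2
console.log(kinds.join(" ")); // undefined object boolean number string symbol
```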

Language types

Computers can’t directly understand any language other than machine language. The code we usually write is a high-level language that computers can’t understand directly, so we have to translate the high-level language code into machine language before computers can execute programs.

At present, programming languages are mainly divided into compiled languages and interpreted languages. A compiled language is converted into machine language by a compiler before the code runs; later runs do not need to translate it again and use the compiled result directly. An interpreted language also has to be converted into machine language, but this happens at run time. Because the source code has to be converted on every execution, interpreted languages are generally slower than compiled languages.

C and C++ are compiled languages: editing the source code, compiling it, and running it are separate steps, each handing its result to the next. After writing the source code, we compile and link it to get native binary code, which we hand to the operating system to run.

Ruby, JavaScript, and other typical interpreted languages are also scripting languages. A script does not need to be compiled ahead of time; it is interpreted directly by an interpreter, which calls on operating system resources as it runs. For JavaScript, the interpreter is the JavaScript engine. Early JavaScript engines executed JavaScript code this way, but the early V8 engine did not, as we’ll see below.

Java is sometimes classified as an interpreted language too, which is controversial; some people say Java is a compiled language. Rather than worry about which label fits, let’s look at how Java code is processed. The process is similar to the ones described above but more complicated, and it is divided into two stages, compilation and interpretation:

Compilation: First, as with C++, a compiler compiles the Java code. Unlike the C++ compiler, however, it does not generate machine code; it converts the source first into an abstract syntax tree and then into bytecode, which is intermediate code.
Interpretation: The bytecode is then run. The Java Virtual Machine (JVM) loads the bytecode, executes it with an interpreter, and converts it into machine code. Bytecode is independent of the operating system and platform, and it is the JVM that makes Java cross-platform. That is what people mean when they say Java is cross-platform.

Java has also introduced JIT (just-in-time) compilation, which converts bytecode into native code before executing it and so improves execution efficiency. JIT is mainly a performance optimization, and many JavaScript engines use it as well.

JavaScript is classified as an interpreted language, and it is also weakly and dynamically typed. In contrast, statically typed languages such as C++ or Java know the type of each variable at compile time. JavaScript cannot know the types of its variables at compile time, only at run time, which puts it under great pressure in terms of performance: computing and checking data types at run time is costly, and this is a major reason JavaScript has historically run much slower than C++ or Java. Many vendors have worked to improve this, and one of the best-known results is Chrome’s JavaScript engine, V8.
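To make the run-time typing point concrete, here is a small sketch (the function and variable names are made up for illustration). The same function body has to be ready for whatever types arrive, so the engine cannot decide ahead of time whether `+` means numeric addition or string concatenation.

```javascript
// The same function body must handle whatever types arrive at run time.
function add(x, y) {
  return x + y;
}

console.log(add(1, 2));   // 3   (numeric addition)
console.log(add("1", 2)); // 12  (the string "12", by concatenation)
console.log(add(1.5, 2)); // 3.5

// A variable is not bound to one type either.
let value = 42;       // a number now
value = "forty-two";  // a string later; legal in JavaScript
console.log(typeof value); // string
```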

The execution of JavaScript relies on a JavaScript engine, a virtual machine specialized in processing JavaScript, similar to the JVM. There are many JavaScript engines today: SpiderMonkey, JavaScriptCore, Chakra, V8, and others. Modern JavaScript engines incorporate many techniques from Java virtual machines and C++ compilers, and work very differently from earlier JavaScript engines:

In the early days, an engine turned the source code into an abstract syntax tree and then interpreted the program directly on that tree. This is how early JavaScriptCore worked, and it has since been improved. With the adoption of JIT techniques from the Java Virtual Machine, the usual practice became converting the abstract syntax tree into an intermediate representation (that is, bytecode) and then using a JIT compiler to turn that into native code. Some JIT designs instead generate native code directly from the abstract syntax tree, as the early V8 did.

JavaScript engine

Today, JavaScript engines process JavaScript in much the same way Java is processed, since they borrow Java’s compilation techniques, but there are still differences. Java processing is divided into two separate stages: a compiler parses the source code and generates bytecode ahead of time, and the JVM later converts the bytecode into machine code and runs it. A JavaScript engine combines the compilation and interpretation stages, and both happen inside the engine, which currently consists of the following parts:

Compiler: compiles source code into an abstract syntax tree; in some engines (e.g. JavaScriptCore and the current V8) it also converts the abstract syntax tree into bytecode
Interpreter: in some engines (e.g. JavaScriptCore), the interpreter mainly accepts bytecode and interprets it; the early V8 engine, however, had no interpreter
JIT tools: convert bytecode or the abstract syntax tree into native code, with optimizations
Garbage collector and analysis tools (profiler): responsible for garbage collection and for collecting information from the engine to help improve its performance and efficiency

The figure above shows the compilation process of a JavaScript engine. At present, most JavaScript engines compile JavaScript roughly according to this process: the compiler first converts the source code into an abstract syntax tree and then into bytecode, the interpreter executes the bytecode, and the JIT compiler generates native code from it.
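The AST-to-bytecode-to-interpreter flow can be sketched with a toy example. Everything here is an assumption made for illustration: the node shapes, the instruction names (`PUSH`, `ADD`, `MUL`), and the stack-machine design are invented, and real engine bytecode is considerably more elaborate (V8's Ignition, for instance, is register-based rather than stack-based).

```javascript
// A hand-written AST for the expression 1 + 2 * 3 (the parser is omitted here).
const exprAst = {
  type: "Binary",
  op: "+",
  left: { type: "Num", value: 1 },
  right: {
    type: "Binary",
    op: "*",
    left: { type: "Num", value: 2 },
    right: { type: "Num", value: 3 },
  },
};

// "Compiler": flatten the tree into a linear list of stack-machine instructions.
function compile(node, code = []) {
  if (node.type === "Num") {
    code.push(["PUSH", node.value]);
  } else {
    compile(node.left, code);
    compile(node.right, code);
    code.push([node.op === "+" ? "ADD" : "MUL"]);
  }
  return code;
}

// "Interpreter": execute the instructions one by one on an operand stack.
function interpret(code) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "PUSH") {
      stack.push(arg);
    } else {
      const right = stack.pop();
      const left = stack.pop();
      stack.push(op === "ADD" ? left + right : left * right);
    }
  }
  return stack.pop();
}

const bytecode = compile(exprAst);
console.log(bytecode.map((ins) => ins.join(" ")).join("; "));
// PUSH 1; PUSH 2; PUSH 3; MUL; ADD
console.log(interpret(bytecode)); // 7
```

The flat instruction list is more compact than the tree and can be executed in a simple loop, which is the basic appeal of a bytecode layer.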

V8 engine

V8 is an open source project and is known for its performance among JavaScript engines. Chrome, which uses V8, has a large share of the browser market, and Node is also based on V8. V8 supports many operating systems and hardware architectures. That makes V8 a representative engine to study, and its performance has been improving steadily since its 2008 release:

The following shows the entire process of V8 execution, which mainly involves the following modules:

Parser: converts JavaScript source code into an abstract syntax tree (AST)
Ignition: the interpreter; converts the AST into bytecode and interprets the bytecode, while also collecting the information TurboFan needs for optimized compilation
TurboFan: the compiler (a JIT compiler); uses the type information collected by Ignition to convert bytecode into optimized machine code
Orinoco: the garbage collector; reclaims memory that the program no longer needs

Generate abstract syntax tree

The V8 engine first uses the parser to turn the source code into an abstract syntax tree (AST). Generating the AST happens in two stages, lexical analysis and syntax analysis:

Lexical analysis: breaking the source code into its smallest indivisible lexical units (tokens). For example, the program var a = 2; is usually broken into five tokens: var, a, =, 2, and ;. Whether whitespace counts as a token depends on whether whitespace is meaningful in the language; in JavaScript, whitespace is not treated as a token.
Syntax analysis (parsing): converting the stream (array) of tokens into a tree of nested elements that represents the syntactic structure of the program. This tree is called an abstract syntax tree (AST). The AST for var a = 2; might have a top-level node called VariableDeclaration, with a child node called Identifier (whose value is a) and another child node called AssignmentExpression, which in turn has a child node called NumericLiteral (whose value is 2).

The abstract syntax tree generated for var a = 2;
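The two stages can be sketched for this one statement. This is a deliberately tiny illustration, not how V8's parser works: the `tokenize` and `parseVarDeclaration` functions and the token type names are hypothetical, and the node layout follows the simplified description above. Real parsers use somewhat different node names and shapes.

```javascript
// Lexical analysis: split the source into tokens; whitespace is skipped.
function tokenize(source) {
  const lexemes = source.match(/[A-Za-z_$][\w$]*|\d+|[=;]/g) || [];
  return lexemes.map((text) => {
    if (text === "var") return { type: "Keyword", value: text };
    if (/^\d+$/.test(text)) return { type: "Numeric", value: text };
    if (text === "=" || text === ";") return { type: "Punctuator", value: text };
    return { type: "Identifier", value: text };
  });
}

// Syntax analysis: turn the flat token list into a nested tree.
// Only the exact form `var <identifier> = <number>;` is supported.
function parseVarDeclaration(tokens) {
  const [kw, id, eq, num, semi] = tokens;
  if (
    tokens.length !== 5 ||
    kw.value !== "var" ||
    id.type !== "Identifier" ||
    eq.value !== "=" ||
    num.type !== "Numeric" ||
    semi.value !== ";"
  ) {
    throw new SyntaxError("expected: var <identifier> = <number>;");
  }
  return {
    type: "VariableDeclaration",
    children: [
      { type: "Identifier", value: id.value },
      {
        type: "AssignmentExpression",
        children: [{ type: "NumericLiteral", value: Number(num.value) }],
      },
    ],
  };
}

const tokens = tokenize("var a = 2;");
console.log(tokens.map((t) => t.value).join(" ")); // var a = 2 ;

const declAst = parseVarDeclaration(tokens);
console.log(JSON.stringify(declAst, null, 2));
```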

An AST is an abstract representation of the syntactic structure of the source code. The computer cannot execute the AST any more than it can execute the source, so building it is only one step in converting the source code into machine code that the computer can run.

Babel is a JavaScript compiler that works in three stages: parsing, transformation, and generation. For example, to compile ES6 to ES5, Babel parses the ES6 source code into an AST, transforms that AST into one that uses only ES5 syntax, and finally generates ES5 source code from it.

ESLint works in much the same way; the inspection process also converts source code into an AST, which is used to check code specifications.

The AST is a very important concept in computer science, and understanding it helps us better understand the code we write. Vue’s template compilation also uses an AST.

Generate bytecode

Now that we’ve covered converting JavaScript source code into an AST, the next step is converting the AST into bytecode. As mentioned earlier, before v5.6 the V8 engine did not convert the AST into bytecode but compiled it directly into machine code, using two compilers:

full-codegen: a simple, fast compiler that generates simple and relatively slow machine code
Crankshaft: a more complex JIT compiler that generates highly optimized machine code

The first time JavaScript code was executed, the full-codegen compiler converted the AST directly into machine code, skipping the conversion to bytecode, so the code could start executing as machine code very quickly.

The main purpose of this design was to avoid spending time on an intermediate conversion to bytecode and so speed up execution, including during page load. Executing machine code directly can greatly improve execution performance, but the disadvantages were also obvious:

Memory footprint: the entire abstract syntax tree is compiled to machine code, which takes up much more memory than bytecode
Some JavaScript usage scenarios are better served by an interpreter that runs bytecode: code that does not need machine code never has it generated, which keeps machine code from taking up too much memory
With no intermediate representation between the AST and machine code, there is one less layer at which to optimize, so there are fewer optimization opportunities

A lot of work went into performance before the v5.6 release, including extensive use of lazy parsing and lazy compilation to reduce the amount of machine code generated. For example, if a function is not called while the script initially runs, compiling it is deferred until the function is called for the first time.
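The idea of deferring compilation until the first call can be imitated in plain JavaScript. This is only an analogy under stated assumptions: `makeLazy` is a hypothetical helper, and the `Function` constructor stands in for the engine's compiler; V8 does this internally and does not expose such an API.

```javascript
// Hypothetical helper: keeps the function's source text and "compiles" it
// (here via the Function constructor) only when it is first called.
function makeLazy(paramName, bodySource) {
  let compiled = null;
  let compileCount = 0;

  function lazy(...args) {
    if (compiled === null) {
      compiled = new Function(paramName, bodySource); // deferred work happens here
      compileCount += 1;
    }
    return compiled(...args);
  }

  lazy.compileCount = () => compileCount;
  return lazy;
}

const square = makeLazy("x", "return x * x;");
const neverCalled = makeLazy("x", "return x + 1;");

console.log(square.compileCount());      // 0: nothing compiled yet
console.log(square(4));                  // 16
console.log(square(5));                  // 25
console.log(square.compileCount());      // 1: compiled once, on the first call
console.log(neverCalled.compileCount()); // 0: never called, never compiled
```

Functions that are never called cost nothing beyond holding their source, which is the saving lazy compilation aims for.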

After the full-codegen compiler generates machine code, a profiler collects information as the code runs and feeds it to the Crankshaft compiler.

The Crankshaft compiler mainly optimizes hot code. As the compilation process above shows, Crankshaft also works from the source code: it builds a Hydrogen graph and runs its optimization analyses on that graph to produce more efficient machine code. This is a step-by-step optimization process. When the assumptions behind the optimized code turn out not to hold, V8 falls back to the unoptimized code, which is called deoptimization.

Despite constant optimization, all of these optimizations operate on machine code, which is itself a huge memory hog. Crankshaft is a JIT optimizing compiler, but it has to re-parse the source code every time it optimizes a function.

With V8 v5.6, the step of converting the AST into bytecode was added back, returning V8 to bytecode: the Ignition interpreter, which converts the AST into bytecode and executes it, was introduced. Ignition and TurboFan became the default pipeline for all code in v5.9.

Introducing the Ignition interpreter and converting the AST into bytecode significantly reduced V8’s memory footprint, and the bytecode also lets the JIT compiler optimize further.

A test on ten popular mobile sites showed a significant decrease in their memory usage.

Bytecode is a form of code that sits between the AST and machine code. It is not tied to any particular machine’s instruction set, so it has to be interpreted, or compiled into machine code, before it can run; it can be thought of as an abstraction of machine code. You don’t need to understand exactly what bytecode looks like, just that it is intermediate code.

The goal of Ignition is to give V8 an interpreter that executes low-level bytecode, so that code that runs only once, or is not hot, can be stored more compactly in bytecode form. Because bytecode is smaller, compilation time is also greatly reduced. Bytecode can also be passed directly to TurboFan’s graph builder, so TurboFan does not have to re-parse the JavaScript source code when it optimizes a function. In other words, TurboFan compiles from bytecode, not from source code.

The Ignition interpreter completely replaced full-codegen, and Crankshaft, which could not optimize from bytecode, was removed entirely and replaced by the TurboFan compiler.

Generating machine code

In addition to quickly generating unoptimized bytecode, the Ignition interpreter is responsible for executing it. While interpreting the bytecode, Ignition also collects profiling data. If hot code is found (that is, a piece of code that is executed many times), the bytecode and the profiling data are passed to the TurboFan compiler, which uses the profiling data to generate highly optimized machine code. The next time this code is executed, only the compiled machine code needs to run.
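The hot-code handoff can be modeled with a call counter. The sketch below is a toy under stated assumptions: `makeTiered`, `HOT_THRESHOLD`, and the hand-supplied "optimized" function are all invented for illustration, and V8's real heuristics are far more involved than a fixed call count.

```javascript
const HOT_THRESHOLD = 3; // illustrative value, not V8's real heuristic

// Wraps a "baseline" implementation and switches to an "optimized" one
// once the function has been called often enough to count as hot.
function makeTiered(baseline, optimize) {
  let calls = 0;
  let optimized = null;
  const log = [];

  function run(...args) {
    if (optimized !== null) {
      log.push("optimized");
      return optimized(...args);
    }
    calls += 1;
    log.push("interpreted");
    const result = baseline(...args);
    if (calls >= HOT_THRESHOLD) {
      optimized = optimize(); // hand over to the "optimizing compiler"
    }
    return result;
  }

  run.log = log;
  return run;
}

const double = makeTiered(
  (x) => x * 2,       // stands in for interpreted bytecode
  () => (x) => x + x  // stands in for optimized machine code
);

const outputs = [];
for (let i = 0; i < 5; i++) outputs.push(double(i));

console.log(outputs.join(", "));    // 0, 2, 4, 6, 8
console.log(double.log.join(", "));
// interpreted, interpreted, interpreted, optimized, optimized
```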

TurboFan is an optimizing JIT compiler. Code starts out running as bytecode in the Ignition interpreter. At some point the engine decides that the code is hot and starts the TurboFan front end, the part of TurboFan that integrates the profiling data and builds a basic machine representation of the code. That representation is then sent to the TurboFan optimizer on another thread to be improved further. The V8 engine is multi-threaded: TurboFan compiles on a different thread from the one that generates and runs bytecode.

V8 continues to execute bytecode in the Ignition interpreter while TurboFan is running. At some point TurboFan finishes, and execution continues with the machine code it produced.

TurboFan uses the profiling data gathered by the Ignition interpreter to generate highly optimized machine code, primarily through a technique called speculative optimization. TurboFan looks at the types of values it has seen in the past and assumes it will see the same types in the future, which spares it from handling many cases that never occur. If the assumption fails, execution returns to interpreting bytecode, which is known as deoptimization.
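Speculation and deoptimization can be modeled in a few lines. This is a sketch under stated assumptions, not V8's mechanism: `makeSpeculativeAdd` and `genericAdd` are hypothetical names, the "fast path" is an ordinary closure standing in for number-only machine code, and it is installed after a single observation where a real engine would wait until the code is hot.

```javascript
// Generic path: handles any operand types, like interpreted bytecode.
function genericAdd(a, b) {
  return a + b;
}

function makeSpeculativeAdd() {
  const seenTypes = new Set(); // type feedback gathered on the generic path
  let fastPath = null;
  const events = [];

  function speculativeAdd(a, b) {
    if (fastPath !== null) {
      // Guard: verify the assumption the optimized code relies on.
      if (typeof a === "number" && typeof b === "number") {
        events.push("fast");
        return fastPath(a, b);
      }
      events.push("deopt");
      fastPath = null; // throw away the optimized code
    } else {
      events.push("generic");
    }
    seenTypes.add(typeof a);
    seenTypes.add(typeof b);
    // Speculate only while nothing but numbers has ever been observed.
    if (seenTypes.size === 1 && seenTypes.has("number")) {
      fastPath = (x, y) => x + y; // stands in for number-only machine code
    }
    return genericAdd(a, b);
  }

  speculativeAdd.events = events;
  return speculativeAdd;
}

const speculativeAdd = makeSpeculativeAdd();
const results = [
  speculativeAdd(1, 2),
  speculativeAdd(3, 4),
  speculativeAdd("a", "b"),
  speculativeAdd(5, 6),
];

console.log(results.join(", "));               // 3, 7, ab, 11
console.log(speculativeAdd.events.join(", ")); // generic, fast, deopt, generic
```

Once strings have been seen, the sketch stops speculating on numbers, which mirrors why code that always receives the same types tends to stay on the optimized path.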

That completes V8’s compilation of JavaScript, and the following diagram should now be clear. The V8 engine involves far too many things to sort out in a short time, and one article cannot explain them all, so I will try to write from the perspective of V8 as much as possible in later articles.

The compilation processes of modern JavaScript engines are mostly similar, and the core principle is the same. The main difference is the number of optimization tiers, that is, how many interpreters and compilers each engine has. There is a fundamental trade-off between using an interpreter to generate code quickly and using an optimizing compiler to generate highly efficient code. Adding more optimization tiers allows more fine-grained decisions, but at the cost of additional complexity and overhead. There is also a trade-off between the level of optimization and the amount of memory the generated code uses. In the end it all comes down to better engine performance. If you are interested, you can read about the compilation processes of other engines.

We have not yet answered the question of why data types are divided into basic types and reference types. This is related to the memory management of the engine. In the next article, we will talk about V8’s memory management.

Conclusion

If anything in this article is incorrect, corrections are welcome. I hope readers got something out of it; let’s grow together!
