What is V8

V8 is an open-source JavaScript engine developed by Google, currently used in the Chrome browser and Node.js. Its core function is to execute human-readable JavaScript code. It is also described as a virtual machine: it simulates a computer’s CPU, stack, registers, and so on, and has its own instruction set. To JavaScript code, V8 is the whole world, so the code does not need to care about differences in the underlying machine it runs on.

Its core process has two steps: compilation and execution. JavaScript code is first converted into lower-level intermediate code or machine code that the machine can understand, and then the converted code is executed to produce the result.

Why is high-level code compiled before it is executed

You can think of the CPU as a very small computing machine that we communicate with through binary instructions. To let it perform complex tasks, engineers defined a set of instructions that implement various functions; this is machine language. Note that the CPU only recognizes binary machine instructions. Assembly instructions are the form that people can read.

1000100101011010    machine instruction
MOV AX, BX          assembly instruction

While assembly language is tied to the details of a computer’s architecture, JavaScript, C, Python, and other high-level languages hide these differences. However, high-level code must be parsed before it can be executed. There are two main approaches:

  1. Interpreted execution

Source code -> parser -> intermediate code -> interpreter executes it -> output

  2. Compiled execution

Source code -> parser -> intermediate code -> compiler -> machine code -> machine code executes -> output

How does V8 execute a piece of JavaScript code?

V8 combines compiled execution and interpreted execution, a technique known as JIT (just-in-time) compilation. Interpreted execution starts quickly and compiled execution runs quickly, so V8 gets the strengths of both. The specific process is as follows:

  • Initialize the environment at startup: stack space, global execution context, and the event loop system
  • Parse the source code into an AST and generate the corresponding scopes, which hold the relevant variables
  • Generate intermediate code (bytecode). Bytecode is the same on every CPU, while machine code differs between CPUs
  • The interpreter executes the bytecode
  • If a piece of code is found to be executed repeatedly, it is marked as hot code and handed to the compiler, which compiles it into optimized binary code for execution
  • If the object structure that the optimized code relies on is modified, the code is deoptimized, and the next execution is handed back to the interpreter

V8 debugging tool: D8

  • Print the AST
d8 --print-ast xxx.js
  • View scopes
d8 --print-scopes xxx.js
  • View the bytecode generated by the interpreter
d8 --print-bytecode xxx.js
  • View the optimized hot code
d8 --trace-opt xxx.js
  • View the deoptimized code
d8 --trace-deopt xxx.js
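The hot-code process described earlier can be exercised with a small script. This is only a sketch: the function name add and the loop count are arbitrary choices made for illustration, and whether or when V8 actually optimizes or deoptimizes the function depends on the engine version and its heuristics. Running the script with the --trace-opt and --trace-deopt flags may show the function being optimized and later deoptimized.

```javascript
// Hypothetical hot function: called many times with numbers only.
function add(x, y) {
  return x + y
}

let total = 0
for (let i = 0; i < 100000; i++) {
  // Same argument types on every call: a candidate for optimization.
  total = add(total, 1)
}

// Calling with strings breaks the "numbers only" assumption the optimizer
// may have made, which is a typical trigger for deoptimization.
const joined = add('V', '8')

console.log(total, joined)
```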

JavaScript Design ideas

Fast properties and slow properties

Define an object as follows:

let foo = {'10': 1, 100: 'dds', 2: 'sda', 1: 'fdsf', 'ds': 'sda', '3': 'dfs'}

When this object is printed, the numeric keys are listed before the string keys, and the numeric keys are sorted by numeric value.

This happens because the ECMAScript specification requires numeric properties to be ordered by ascending index and string properties to be ordered by creation time. Numeric properties are called sorted properties and are stored in what V8 calls elements; string properties are called regular properties and are stored in what V8 calls properties.

Properties stored in a linear structure are fast to look up (fast properties), while properties stored in a dictionary structure are slow to look up (slow properties). For quick lookup, up to 10 regular properties are stored directly inside the object itself, and any beyond 10 are placed in properties.
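The ordering rule can be checked directly. The sketch below reuses the keys of the foo object from above and only inspects the enumeration order; it says nothing about how V8 lays the object out internally.

```javascript
const foo = { '10': 1, 100: 'dds', 2: 'sda', 1: 'fdsf', 'ds': 'sda', '3': 'dfs' }

// Integer-like keys come first in ascending order, then string keys
// in the order they were created.
const keys = Object.keys(foo)
console.log(keys) // [ '1', '2', '3', '10', '100', 'ds' ]
```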

In addition to elements and properties, V8 also stores a map and a __proto__ for each object. The map is the object’s hidden class, which is used to quickly locate the object’s properties in memory.

Immediately invoked function expressions

There are three main differences between a function expression and a function declaration:

  • A function expression uses function inside an expression. The most typical expression has the form a = b; since a function is also an object, var a = function(){} is called a function expression
  • In function expressions, you can create anonymous functions by omitting the function name
  • A function expression can be invoked immediately, which is known as an IIFE (immediately invoked function expression)

JavaScript has a parenthesis operator that can put an expression inside a parenthesis, such as

(a=3)

Since the parentheses must contain expressions, if you define a function inside the parentheses, V8 will treat the function as a function expression and return a function object when executed.

(function(){
    var a = 3
    return a + 3
})

Adding call parentheses directly after the expression invokes the function at once. This is called an immediately invoked function expression (IIFE).

(function(){
    var a = 3
    return a + 3
})()

Because an immediately invoked function expression is also an expression, V8 does not treat it like a function declaration: no function object is created for it during compilation. One advantage is that it does not pollute the environment, since neither the function nor its internal variables can be accessed by other parts of the code.
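A short check of the "does not pollute the environment" point. The variable name hidden is made up for this illustration.

```javascript
const result = (function () {
  var hidden = 3 // visible only inside the IIFE
  return hidden + 3
})()

console.log(result)        // 6
console.log(typeof hidden) // "undefined": the inner variable did not leak out
```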

Prototype inheritance

Every JavaScript object contains a hidden property, __proto__, which points to the object’s prototype object. Following __proto__ from one object to the next forms the prototype chain, which is used to look up a property on an object level by level.

Inheritance is the ability of one object to access properties and methods of another object. In JavaScript, inheritance is implemented with prototypes and prototype chains.

Inheritance is set up through constructors. Every function object has a public prototype property. When you create a new object using the function as a constructor, the prototype of the newly created object points to the function’s prototype property.
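The relationship between a constructor’s prototype property and the new object’s prototype can be sketched as follows. Dog, speak, and rex are hypothetical names used only for this example.

```javascript
function Dog(name) {
  this.name = name
}
// Methods placed on the prototype are shared by every instance.
Dog.prototype.speak = function () {
  return this.name + ' barks'
}

const rex = new Dog('Rex')

// The new object's prototype is the constructor's prototype property.
console.log(Object.getPrototypeOf(rex) === Dog.prototype) // true
// speak is not an own property; it is found one level up the prototype chain.
console.log(Object.prototype.hasOwnProperty.call(rex, 'speak')) // false
console.log(rex.speak()) // "Rex barks"
```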

The scope chain

Each function needs to look up variables in its scope when it executes; this is called function scope. V8 creates a scope for the function at compile time. Variables and functions declared inside the function are placed into this scope, along with a hidden variable, this.

At execution time, if a variable or function is not found in the current scope, the search continues in the scope where the function was defined, not the scope from which it was called. This is called the scope chain. Because JavaScript uses lexical scoping, the order of the chain is determined by where functions are defined in the source. The scope is fixed when the function is declared, so lexical scoping is also called static scoping. Dynamic scoping, by contrast, cares about where the function is called: its scope chain is based on the call stack rather than on where the function is defined.
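A minimal illustration of static scoping. The names scopeLabel, show, and caller are invented for the example.

```javascript
var scopeLabel = 'outer'

function show() {
  // Resolved in the scope where show was defined, not where it is called.
  return scopeLabel
}

function caller() {
  var scopeLabel = 'inner' // would only be used under dynamic scoping
  return show()
}

console.log(caller()) // "outer"
```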

Look at one question:

var a = []
for (let i=0; i < 10; i++) {
    a[i] = function(){
        console.log(i)
    }
}
a[3]()

The i declared with let lives in the block-level scope of the for loop, and each iteration of the loop creates a new block-level scope. In that scope a function is defined that references the outer variable i, which creates a closure. Since a is not destroyed, none of the i bindings in those block-level scopes are destroyed either. Because each closure holds a different i, calling a[3]() prints 3. If you change the declaration of i from let to var, however, i becomes a single variable in the global scope whose value keeps changing, so every function prints 10.
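The two cases can be compared side by side. In this sketch the helper makeLoggers is hypothetical, and the var variant is function-scoped rather than global, but the effect is the same: all closures share one binding.

```javascript
function makeLoggers(useLet) {
  const fns = []
  if (useLet) {
    // A fresh binding of i per iteration: each closure keeps its own value.
    for (let i = 0; i < 10; i++) fns[i] = () => i
  } else {
    // A single shared binding of j: every closure sees the final value.
    for (var j = 0; j < 10; j++) fns[j] = () => j
  }
  return fns
}

console.log(makeLoggers(true)[3]())  // 3
console.log(makeLoggers(false)[3]()) // 10
```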

Type conversion

We all know that JavaScript is a weakly typed language. When performing some operations, it will automatically perform type conversion. The execution flow of type conversion is as follows:

  1. First check whether the object has a valueOf method. If it does and the method returns a primitive value, use that value for the conversion
  2. If valueOf does not return a primitive value, use the return value of the toString method
  3. If neither valueOf nor toString returns a primitive value, an error is thrown

Here is the problem:

let foo = {'10': 1, 100: 'dds', 2: 'sda', 1: 'fdsf', 'ds': 'sda', '3': 'dfs'}
foo.valueOf() // Returns the object itself
foo.toString() // Returns [object Object]
foo + 2 // Result: [object Object]2
// If you change foo's valueOf method
foo.valueOf = function(){return '100'}
foo + 2 // Result: 1002
foo.valueOf = function(){return 100}
foo + 2 // Result: 102
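The third step of the conversion flow (an error when neither method returns a primitive) is not covered by the snippet above. A small sketch, using a made-up object named stubborn:

```javascript
const stubborn = {
  valueOf() { return {} },  // not a primitive
  toString() { return {} }  // not a primitive either
}

let message
try {
  stubborn + 2
} catch (err) {
  message = err instanceof TypeError ? 'TypeError' : 'other error'
}

console.log(message) // "TypeError"
```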

Compile pipeline

Runtime environment

Before executing JavaScript code, V8 prepares the runtime environment for it, including heap and stack space, the global execution context, the global scope, built-in functions, extension functions and objects provided by the host environment, and the message loop system.

What is the host environment

To run V8, you need a host environment, be it a browser, Node.js, or another custom-developed environment, that provides the basic features V8 needs to execute JavaScript. Here is the functional relationship between the host environment and V8:

Heap space and stack space

Whenever a render process is opened in Chrome, it initializes V8 along with heap space and stack space. Stack space is used to manage JavaScript function calls. When a function is called, it is pushed onto the stack. If another function call is encountered during execution, that function is pushed onto the stack as well. After a function finishes, it is popped off the stack, until all function calls are complete and the stack is empty.

The stack space is a contiguous region in which the address of each element is fixed, so lookups in the stack are very efficient. However, it is often difficult to allocate a large contiguous region of memory, so V8 limits the size of the stack space, and if function calls are nested too deeply, the stack may overflow.
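The stack limit is easy to observe. In the sketch below, the function recurse is a hypothetical example that never terminates on its own; V8 stops it with a RangeError once the stack space is exhausted.

```javascript
function recurse(depth) {
  // Each call needs a new stack frame, and the recursion never ends.
  return recurse(depth + 1) + 1
}

let overflowError = null
try {
  recurse(0)
} catch (err) {
  overflowError = err
}

console.log(overflowError instanceof RangeError) // true
```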

Heap space is non-contiguous storage used for discrete, reference-type data such as objects. Heap space can hold a lot of data, but reads from it can be slower.

function add (x, y) {
    return x + y
}
function main () {
    let num1 = 2
    let num2 = 3
    let num3 = add(num1, num2)
    let data = {
        sum: num3
    }
    return data
}
main()

Let me explain how the above code executes:

  1. Create the stack frame pointer for main
  2. Initialize num1 = 2 on the stack
  3. Initialize num2 = 3 on the stack
  4. Save the top-of-stack pointer of main
  5. Create the stack frame pointer for add
  6. Initialize x = 2 on the stack
  7. Initialize y = 3 on the stack
  8. Add x and y and store the result in a register
  9. Destroy the stack frame of add
  10. Restore the top-of-stack pointer of main
  11. Initialize num3 on the stack with the return value of add held in the register
  12. Create a new object in heap space and assign its address to data
  13. Write the return value to a register
  14. Destroy the stack frame of main

The following figure represents a stack:

The following diagram shows the relationship between registers and bytecode stacks:

  • Use an area of memory to hold bytecode
  • General-purpose registers R0, R1, … hold intermediate data
  • The PC register points to the next bytecode to execute
  • The stack-top register points to the top of the stack
  • The accumulator is a very special register that holds intermediate results. For example, the function return ends the execution of the current function and passes control back to the caller, returning the value in the accumulator.

Global execution context

The execution context consists of three main parts: the variable environment, the lexical environment, and the this keyword. In the browser, for example, the global execution context contains the window object, a this keyword pointing to window, and Web API functions such as setTimeout and XMLHttpRequest. The lexical environment holds variables declared with let and const.

Construct the event loop system

All tasks in the event loop system run on the main thread, and in the browser V8 shares the main thread and message queue with the page. If a function runs for too long, it can therefore hurt the page’s interactive performance.

Machine code: How does the CPU manipulate binary code

Take a look at this code:

int main() {
    int x = 1;
    int y = 2;
    int z = x + y;
    return z;
}


The above code is compiled into assembly code and then into binary code. As the figure below shows, the stack is stored in contiguous space, and functions are pushed onto and popped off the stack by means of pointers and registers:

Lazy parsing

Lazy parsing means that if the parser encounters a function declaration during parsing, it skips the code inside the function and generates only a function object for it, without generating the function’s AST and bytecode. The function object contains name and code properties: name is the name of the function and code is its source.

function foo(a, b) {
    var d = 100
    var f = 10
    return d + f + a + b
}
var a = 1
var c = 4
foo(1, 5)

The result is an abstract syntax tree for the top-level code only:

In the case of closures, the preparser still has to determine whether an inner function references variables of the outer function, even though the inner function is not fully parsed while the outer function executes.

The preparser looks for syntax errors and checks whether the function body references outer variables. If an outer variable is referenced, the preparser has that variable copied from the stack to the heap, and the heap reference is used the next time the function executes. This solves the problem that variables captured by a closure would otherwise be destroyed together with the outer function’s stack frame.
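From the JavaScript side, the visible effect is that a captured variable outlives the call that created it. A minimal sketch with hypothetical names (makeCounter, increment, count); where V8 actually stores count is an engine detail that this code cannot show.

```javascript
function makeCounter() {
  // count is referenced by the inner function, so it must stay alive
  // after makeCounter has returned.
  let count = 0
  return function increment() {
    count += 1
    return count
  }
}

const counter = makeCounter()
console.log(counter()) // 1
console.log(counter()) // 2
```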

Hidden classes

In V8, hidden classes are also known as maps, and each object has a map attribute whose value points to the hidden class in memory. Hidden classes describe the property layout of an object, including the property names and offsets corresponding to each property.

var point = {x: 200, y: 400}

Once an object has a map, when you access property x through point.x, V8 queries the map for the offset of x relative to the point object, then adds the offset to the starting address of point to obtain the location of the value of x in memory, saving a complicated lookup process.

Note that:

  • Objects with the same structure and different values can share the same hidden class
  • If the object structure changes, the hidden classes have to be recreated, which can affect V8’s execution efficiency

So, when writing code, keep the following points in mind:

  1. When initializing objects with literals, keep the order of properties consistent. If the keys are initialized in a different order, the resulting structure is different.
  2. Try to initialize all of an object’s properties at once with a literal rather than adding them one by one.
  3. Try to avoid deleting properties with the delete operator.

In the bad1 example below, point and point3 share a hidden class, and the others each have their own hidden class. In bad2, adding properties one by one changes the object’s structure with each assignment.

// bad1
var point = {x: 200, y: 400}
var point2 = {y: 200, x: 400}
var point3 = {x: 20, y: 40}
var point4 = {x: '200', y: '400'}
// bad2
var x = {}
x.a = 1
x.b = 2
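By contrast, one way to follow the guidelines is to create every object of a given kind through a single factory. This is a sketch, not a guarantee about V8 internals: createPoint is a hypothetical helper, and the check below only confirms that the property order matches, which is what the guidelines ask for.

```javascript
// Hypothetical factory: always creates x first, then y, in one literal,
// and always with the same value types.
function createPoint(x, y) {
  return { x: x, y: y }
}

const p1 = createPoint(200, 400)
const p2 = createPoint(20, 40)

// Same keys in the same order, so the two objects have the same structure.
console.log(Object.keys(p1).join() === Object.keys(p2).join()) // true
```

If you want to inspect hidden classes directly, V8 exposes a %HaveSameMap(a, b) native when started with --allow-natives-syntax. It is a debugging aid rather than part of the language.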

Event loop system

Try this problem first to see how well you understand the event loop. The answer is given further below so you can check yourself.

function a () {
    return Promise.resolve(99).then((data) => {
        console.log(1)
        setTimeout(() => {
            console.log(data)
            console.log(70)
        }, 0)
        return 44
    })
}
async function b () {
    console.log(3)
    let c = await a()
    console.log(56)
    console.log(c)
    console.log(4)
}
setTimeout(() => {console.log('setTimeout')}, 0)
setImmediate(() => {console.log('setImmediate0')})
requestAnimationFrame(() => {console.log('requestAnimationFrame')})
setImmediate(() => {console.log('setImmediate2')})

console.log(5)
b()
console.log(6)

Different asynchronous operations run in different places:

  • readFile: executed on a file read/write thread
  • readFileSync: executed on the main thread
  • XMLHttpRequest: executed on a network thread
  • setTimeout: a separate queue holds the timer callbacks, and a task scheduler picks the next event to execute from the various event queues according to certain rules
Macro tasks are the events in the message queue that are waiting for the main thread to execute them:

  • Code running in a <script> tag
  • Callbacks triggered by events, for example DOM events, I/O, requestAnimationFrame
  • Callbacks of setTimeout, setInterval, setImmediate

Microtasks are asynchronous functions (Promise, async/await, Generator, coroutines; coroutines execute on the main thread) that run after the main function completes but before the current macro task ends.

  • Promises: Promise.then, Promise.catch, Promise.finally
  • MutationObserver
  • queueMicrotask
  • process.nextTick: unique to Node

Creating a Promise does not generate a microtask. A microtask is generated when the Promise’s resolve or reject is called. The resulting microtasks are not executed immediately; they wait until the current macro task is about to finish.
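The relative order of synchronous code, microtasks, and a timer macro task can be recorded with a few lines. This sketch uses only setTimeout, Promise, and queueMicrotask, so it runs in both Node and browsers; the names order and done are made up for the example.

```javascript
const order = []

// done resolves once the timer callback (a macro task) has run.
const done = new Promise((resolve) => {
  setTimeout(() => {
    order.push('timeout')
    resolve(order)
  }, 0)
})

Promise.resolve().then(() => order.push('promise.then')) // microtask
queueMicrotask(() => order.push('queueMicrotask'))       // microtask
order.push('sync')                                       // current macro task

done.then((result) => console.log(result.join(' ')))
// sync promise.then queueMicrotask timeout
```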

Did you get it right?

5 3 6 1 56 44 4 requestAnimationFrame setImmediate0 setTimeout  setImmediate2 99 70

Garbage collection mechanism

The general steps of garbage collection are as follows:

  1. Mark active and inactive objects in the heap, starting from the GC Roots. V8 currently uses a reachability algorithm to decide whether an object in the heap is active. The algorithm takes the GC Roots as the initial set of live objects and traverses every object reachable from them. Objects that can be reached are active; objects that cannot be reached are inactive.

There are many types of GC Root:

  • The global window object
  • The document DOM tree, consisting of all the native DOM nodes that can be reached by traversing the document
  • Variables stored on the stack

  2. Reclaim the memory occupied by inactive objects
  3. Compact memory
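The marking step can be sketched as a graph traversal. This is a toy model, not V8’s implementation: the heap is a plain object mapping made-up ids to the ids they reference, and the root list is supplied by hand.

```javascript
// Toy heap: each entry lists the ids of the objects it references.
const heap = {
  window: ['a'],
  a: ['b'],
  b: [],
  c: ['d'], // c and d reference each other, but no root reaches them
  d: ['c']
}

// Returns the set of ids reachable from the given roots.
function mark(heapGraph, roots) {
  const reachable = new Set()
  const pending = [...roots]
  while (pending.length > 0) {
    const id = pending.pop()
    if (reachable.has(id)) continue
    reachable.add(id)
    pending.push(...heapGraph[id])
  }
  return reachable
}

const live = mark(heap, ['window'])
const garbage = Object.keys(heap).filter((id) => !live.has(id))
console.log([...live], garbage)
```

Note that c and d end up as garbage even though they reference each other, which is why reachability is used instead of simple reference counting.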

In V8, the heap is divided into two regions: the new generation and the old generation. The new generation stores objects with a short lifetime, and the old generation stores objects with a long lifetime. The new generation usually has a capacity of only 1-8 MB, while the old generation is much larger.

| Heap area | Memory capacity | Time to live | Garbage collector | Garbage collection frequency |
|---|---|---|---|---|
| New generation | 1-8 MB | Short | Secondary garbage collector | High |
| Old generation | Much larger | Long | Main garbage collector | Low |

New generation collection process

The new generation area is divided into an object area and a free area. New objects are placed in the object area. When the object area is full, garbage collection begins: the surviving objects in the object area are copied to the free area, and then the roles of the two areas are swapped. If an object survives two garbage collections in the new generation, it is moved to the old generation. This is the object promotion strategy.
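The copy-and-swap idea and the promotion rule can be modelled in a few lines. This is a toy sketch under strong assumptions: objects are plain { id, age } records, liveness is passed in as a set of ids instead of being computed, and the function name scavenge is chosen only for illustration.

```javascript
// Copies live objects out of the object area; returns the new object area.
function scavenge(objectArea, liveIds, oldGeneration) {
  const freeArea = []
  for (const obj of objectArea) {
    if (!liveIds.has(obj.id)) continue // dead objects are simply not copied
    obj.age += 1                       // survived one more collection
    if (obj.age >= 2) {
      oldGeneration.push(obj)          // survived twice: promote
    } else {
      freeArea.push(obj)               // copy to the free area
    }
  }
  // Role swap: the filled free area becomes the new object area.
  return freeArea
}

const oldGeneration = []
let objectArea = [{ id: 'a', age: 0 }, { id: 'b', age: 0 }]
objectArea = scavenge(objectArea, new Set(['a']), oldGeneration)      // b dies, a survives once
objectArea.push({ id: 'c', age: 0 })
objectArea = scavenge(objectArea, new Set(['a', 'c']), oldGeneration) // a is promoted, c survives once

console.log(objectArea.map((o) => o.id), oldGeneration.map((o) => o.id))
```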

Old generation collection process

In the old generation, objects are marked directly, then the inactive objects are cleared, and finally the memory space is compacted.

How does V8 optimize garbage collector execution

Since JavaScript runs on the main thread, once the garbage collection algorithm is executed, it is necessary to pause the JavaScript script that is being executed and resume the script execution after garbage collection is complete. We call this behavior stop-the-world.

A full garbage collection is divided into two stages, marking and cleanup. After the garbage data has been marked, V8 goes on to perform the cleanup and compaction operations. Although the main garbage collector and the secondary garbage collector work somewhat differently, both run on the main thread, and while garbage collection is in progress the other tasks on the main thread are suspended. The effect of such a full pause is shown in the figure below:

As you can see, garbage collection takes up time on the main thread. If the garbage collector occupies the main thread for too long, 200 milliseconds in the figure above, the main thread cannot do anything else during that time. For example, if the page is running a JavaScript animation while the garbage collector is working, the animation stops for 200 milliseconds, the page stutters, and the user experience suffers.

The V8 team has worked for many years to add parallel, concurrent, and incremental garbage collection techniques to the existing garbage collector in order to solve the user experience problems caused by full pauses, and has had some success. These techniques address garbage collection efficiency from two directions:

First, a complete garbage collection task is broken up into smaller tasks, which eliminates the single long garbage collection task.

Second, tasks such as marking objects and moving objects are shifted to background threads, which greatly reduces the main thread’s pause time, reduces page lag, and makes animation, scrolling, and user interaction smoother.

Let’s take a closer look at how V8 adds parallelism, concurrency, and increments to existing garbage collectors to improve garbage collection execution.

Parallel collection

Parallel collection means that while the garbage collector is running on the main thread, it starts multiple helper threads that perform the same collection work at the same time. Its working mode is shown in the following figure:

With parallel collection, the pause time is roughly the total collection work divided by the number of participating threads, plus some synchronization overhead.

V8’s secondary garbage collector uses the parallel strategy. During garbage collection it starts multiple threads to clean up garbage in the new generation, and these threads move data from the object area to the free area at the same time. Because the addresses of the data change, the pointers that reference these objects must be updated as well.

Incremental collection

While the parallel strategy improves garbage collection efficiency and works well for the secondary garbage collector, it is still a full-pause approach: the helper threads only run while the main thread is itself collecting, so efficiency problems remain.

For example, the old generation stores some large objects, such as window and the DOM, and a complete old generation garbage collection still takes a long time. These large objects are handled by the main garbage collector, so in 2011 V8 introduced incremental marking, which we call incremental garbage collection.

Incremental garbage collection means that the garbage collector breaks the marking work into smaller chunks and executes them between different tasks on the main thread. With incremental garbage collection, the garbage collector does not need to perform the entire collection in one go; each time it performs only a small part of the whole process, as shown in the following figure:

Concurrent collection

Although incremental garbage collection can be implemented well with tri-color marking and write barriers, these operations still run on the main thread, so when the main thread is busy, incremental garbage collection still reduces the main thread’s throughput.

With parallel collection, some work can be assigned to helper threads, but parallel collection still blocks the main thread. Is there a way to perform garbage collection without blocking the main thread? There is, and it is the concurrent collection mechanism. Concurrent collection means that while the main thread executes JavaScript, helper threads perform garbage collection in the background.

As you can see, the main garbage collector uses all three strategies:

  • First, the main garbage collector mainly uses concurrent marking. While JavaScript executes on the main thread, helper threads start marking, so marking is done on the helper threads.
  • After marking is complete, cleanup is performed in parallel: while the main thread cleans up, multiple helper threads clean up at the same time.
  • In addition, the main garbage collector uses incremental marking, with cleanup tasks interspersed between JavaScript tasks.

Memory leaks

The main cause of memory leaks is that no longer needed memory data is still referenced by other objects.

Example 1: memory leaks caused by global objects in non-strict mode

function foo () {
    this.memeryLeak = new Array(200000)
}
foo()

Surprisingly, this code causes a memory leak. Because foo is called as a plain function in non-strict mode, this refers to the window object. After foo returns, memeryLeak is still referenced by the window object, so the array is never reclaimed.

Solution:

  1. Add 'use strict' at the top of the JavaScript file. Strict mode avoids unexpected global variables; in the example above, this is then undefined.
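The effect of strict mode can be sketched with a function-level directive, so the example does not depend on how the file is loaded. The function name leakInStrictMode is hypothetical.

```javascript
function leakInStrictMode() {
  'use strict'
  // this is undefined here, so the assignment throws instead of
  // silently creating a property on the global object.
  this.memoryLeak = new Array(200000)
}

let strictError = null
try {
  leakInStrictMode()
} catch (err) {
  strictError = err
}

console.log(strictError instanceof TypeError) // true
```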

Example 2: Memory leaks caused by closures

function fooLizzy () {
    const leakObject = new Object()
    leakObject.a = 1
    leakObject.b = 3
    leakObject.c = new Array(200000)
    return function() {
        console.log(leakObject.a)
    }
}
const foo = fooLizzy()

Because of the closure, leakObject cannot be garbage collected: the whole object stays alive even though the returned function only uses leakObject.a. So it is better to write:

function fooLizzy () {
    const leakObject = new Object()
    leakObject.a = 1
    leakObject.b = 3
    leakObject.c = new Array(200000)
    const needUse = leakObject.a
    return function() {
        console.log(needUse)
    }
}
const foo = fooLizzy()

Now the only thing the closure keeps alive is needUse, whose value is 1; leakObject itself can be collected.

Example 3: Memory leaks caused by DOM nodes

This usually happens when JavaScript holds a reference to a DOM node and that reference is never released, but the node itself has been removed from the page. The node’s data stays in the heap because JavaScript still references it. Such a node is commonly called “detached.” Detached nodes are a common cause of DOM memory leaks, so they need special care.

Frequent garbage collection triggers

Frequent use of a large number of temporary variables causes the new generation space to fill up quickly, triggering frequent garbage collection. Frequent garbage collection operations can make your pages feel sluggish. To solve this problem, consider setting these temporary variables as global variables. In this way, variables are stored in the old area, which has a large capacity and a different garbage collection mechanism than the new area.


function strToArray(str) {
  let i = 0
  const len = str.length
  let arr = new Uint16Array(str.length)
  for (; i < len; ++i) {
    arr[i] = str.charCodeAt(i)
  }
  return arr;
}


function foo() {
  let i = 0
  let str = 'test V8 GC'
  while (i++ < 1e5) {
    strToArray(str);
  }
}


foo()
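One possible way to apply the advice above is to allocate the temporary array once and reuse it, instead of creating a new Uint16Array on every call. This is only a sketch: strToArrayInto, sharedBuffer, and fooReusing are hypothetical names, the buffer size of 64 is an arbitrary assumption, and the actual effect on garbage collection should be measured rather than assumed.

```javascript
// Hypothetical variant of strToArray that fills a caller-supplied buffer
// and returns how many code units were written.
function strToArrayInto(str, target) {
  const len = Math.min(str.length, target.length)
  for (let i = 0; i < len; ++i) {
    target[i] = str.charCodeAt(i)
  }
  return len
}

// Allocated once and reused by every call, so the loop below
// does not create a new typed array per iteration.
const sharedBuffer = new Uint16Array(64)

function fooReusing() {
  const str = 'test V8 GC'
  let written = 0
  for (let i = 0; i < 1e5; i++) {
    written = strToArrayInto(str, sharedBuffer)
  }
  return written
}

console.log(fooReusing()) // 10
```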

Scenario 1

On Node.js v4.x, the BFF-layer server had a lib module written in JavaScript that implemented LFU and LRU caches for data returned by the back end. Because memory was used as the cache, the cache module was called very frequently when online QPS was high, causing noticeable GC stop-the-world pauses; externally, Node was slow to respond to upstream HTTP requests. nginx sat in front of Node with timeout retry enabled, so when Node’s response time exceeded the nginx timeout threshold, this GC problem triggered an avalanche of nginx retries. Conclusion: the Node BFF layer is not well suited to caching in memory.

Scenario 2

Operation scenario: a K-line quotation list.
Technical solution: WebSocket pushes binary data (2 times/second) -> convert to UTF-8 -> check whether the data has changed -> render to the DOM.
Problem: the page starts to lag after running for a long time.
Analysis: converting binary data to UTF-8 frequently triggers garbage collection.
Solution: the back end switches to incremental push.