Preface

JavaScript runs in browsers such as Chrome and in Node.js thanks to the V8 engine behind the scenes. V8 is responsible for the entire process: compilation, memory allocation, execution, and garbage collection.

Before writing this article, I read many blog posts, including some original English-language material. This article is my summary of them, with my own thinking added and a set of hand-drawn flow charts.

I hope this article helps you. It will also be published on my personal website.

Why do we have garbage collection?

In C and C++, to use heap memory we have to work out how much we need, allocate it manually with malloc, and remember to release it with free when we are done. Otherwise the memory stays occupied, which is a memory leak.

When writing JavaScript, we skip all of this because it is handled for us: V8 automatically allocates memory based on the size of the object being defined.

Since we do not manage memory manually, something else has to reclaim it, and that is garbage collection. If memory were only allocated and never reclaimed, it would soon fill up and the application would crash.

The advantage of garbage collection is that we do not need to manage memory and can put more energy into building complex applications. The disadvantage comes from the same place: because we do not manage memory, we may stop paying attention to it when writing code and keep objects referenced longer than intended (through circular references, for example), which results in memory leaks.

Memory structure allocation

Since V8 was originally built to run JavaScript in the browser, where scenarios that need a lot of memory are rare, the maximum heap size it can claim is not set very high by default: around 1.4GB on 64-bit systems and around 700MB on 32-bit systems.

In Node.js, we can inspect memory usage with **process.memoryUsage()**.

process.memoryUsage() returns an object describing the memory usage of the Node process. The object contains four fields with the following meanings:

  • rss (resident set size): the amount of physical memory occupied by the process.
  • heapTotal: the total size of the heap that V8 has allocated, including heapUsed.
  • heapUsed: the amount of heap memory that V8 is actually using.
  • external: the memory used by C++ objects that are bound to JavaScript objects managed by V8.

All of these values are in bytes.
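As a minimal sketch of reading these fields in Node.js, the snippet below prints each one in megabytes. The helper `toMB` is my own name for illustration and is not part of the Node API.

```javascript
// Convert a byte count to megabytes, rounded to two decimals (hypothetical helper).
function toMB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 100) / 100;
}

// process.memoryUsage() is a Node.js API; every value it returns is in bytes.
const usage = process.memoryUsage();

for (const field of ['rss', 'heapTotal', 'heapUsed', 'external']) {
  console.log(`${field}: ${toMB(usage[field])} MB`);
}
```

The exact numbers depend on the machine, the Node.js version, and what the process has done so far.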

If a Node process needs more memory than the V8 heap allows, it can use off-heap memory such as Buffer.

Here’s the overall architecture of Node to help you understand what’s going on:

  • Node Standard Library: the modules we use every day, such as http and Buffer.
  • Node Bindings: the bridge between JS and C++. It encapsulates the details of V8 and libuv and provides basic API services to the upper layer.
  • The third layer is the key to running Node.js and is implemented in C/C++:
    • V8: the JavaScript engine that runs the code.
    • libuv: a wrapper library developed specifically for Node.js that provides cross-platform asynchronous I/O.
    • http_parser, OpenSSL, zlib, etc.: provide other capabilities, including HTTP parsing, SSL, and data compression.

Garbage collection mechanism

1. How to determine whether memory can be reclaimed

1.1 Mark-and-Sweep

When a variable enters the environment (for example, when it is declared inside a function), it is marked as “in the environment.” Logically, memory occupied by variables in the environment should never be freed, because they may be used whenever the execution flow enters that environment. When a variable leaves the environment, it is marked as “out of the environment.”

Variables can be marked in any way. For example, you can flip a particular bit to record that a variable has entered the environment, or you can keep one list of variables “in the environment” and another of variables “out of the environment” to track which variables have changed. How variables are marked does not matter; what matters is the strategy.

  • (1) When the garbage collector runs, it marks all variables stored in memory (any marking method will do).
  • (2) It then clears the marks from variables in the environment and from variables referenced by variables in the environment.
  • (3) Variables that are still marked after that are considered ready to be deleted, because they can no longer be accessed from variables in the environment.
  • (4) Finally, the garbage collector performs the memory cleanup, destroying the marked values and reclaiming the memory they occupy.

Currently, the JavaScript implementations in Internet Explorer, Firefox, Opera, Chrome, and Safari all use a mark-and-sweep garbage collection strategy (or a variant of it), differing only in how often they run garbage collection.

In short, if an object and the objects associated with it can no longer be reached from the current roots, the object will be garbage collected.
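The reachability rule can be sketched with a toy heap. This is only an illustration under my own assumptions: the names `makeObj`, `mark`, and `sweep` are hypothetical, and the sketch marks reachable objects directly instead of marking everything and then clearing marks, which leads to the same survivors.

```javascript
// Toy heap object: a name plus the list of objects it references.
function makeObj(name) {
  return { name, refs: [], marked: false };
}

// Mark phase: starting from the roots, mark everything that is reachable.
function mark(roots) {
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (obj.marked) continue; // already visited, also protects against cycles
    obj.marked = true;
    stack.push(...obj.refs);
  }
}

// Sweep phase: keep marked objects, drop the rest, and reset the marks.
function sweep(heap) {
  const survivors = heap.filter((obj) => obj.marked);
  survivors.forEach((obj) => { obj.marked = false; });
  return survivors;
}

const a = makeObj('a');
const b = makeObj('b');
const c = makeObj('c');
const d = makeObj('d');
a.refs.push(b); // a -> b
c.refs.push(d); // c -> d
d.refs.push(c); // d -> c: a cycle that no root can reach

let heap = [a, b, c, d];
mark([a]); // only `a` is a root
heap = sweep(heap);
console.log(heap.map((obj) => obj.name)); // [ 'a', 'b' ]
```

Note that c and d are collected even though they reference each other, because neither can be reached from a root.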

1.2 Reference Counting

Reference counting is a less common garbage collection strategy. The idea is to track how many times each value is referenced. When a variable is declared and a reference-type value is assigned to it, the value’s reference count is 1.

If the same value is assigned to another variable, its reference count increases by one. Conversely, if a variable holding a reference to the value is assigned a different value, the count decreases by one.

When the reference count of a value drops to 0, the value can no longer be accessed, so the memory it occupies can be reclaimed.

The next time the garbage collector runs, it frees the memory occupied by values whose reference count is zero.

Netscape Navigator 3.0 was the first browser to use a reference counting strategy, but it soon ran into a serious problem: circular references.

A circular reference occurs when object A contains a pointer to object B, and object B contains a reference to object A. Here’s an example:

function foo () {
    var objA = new Object();
    var objB = new Object();
    
    objA.otherObj = objB;
    objB.anotherObj = objA;
}

In this example, objA and objB refer to each other through their properties, so each has a reference count of 2.

In a mark-and-sweep implementation, this mutual reference is not a problem: both objects go out of scope once the function has executed.

In a reference counting implementation, however, objA and objB continue to exist after the function has executed, because their reference counts never drop to zero.

If this function is called many times, a large amount of memory will never be reclaimed. For this reason, Netscape abandoned reference counting in Navigator 4.0 and switched its garbage collection mechanism to mark-and-sweep.
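To make the counting concrete, here is a toy reference-counting model of the foo example. Everything in it (`Counted`, `retain`, `release`, `setField`) is hypothetical and written only for illustration; JavaScript engines do not expose reference counts.

```javascript
// Toy reference-counted object with an explicit counter.
class Counted {
  constructor(name) {
    this.name = name;
    this.count = 0;
    this.fields = {};
  }
}

const freed = []; // names of objects whose count reached zero

function retain(obj) {
  obj.count += 1;
}

function release(obj) {
  obj.count -= 1;
  if (obj.count === 0) {
    freed.push(obj.name);
    // Freeing an object also releases everything it points to.
    Object.values(obj.fields).forEach(release);
  }
}

function setField(owner, key, target) {
  retain(target);
  owner.fields[key] = target;
}

// Mirrors foo(): two local variables, then two property references.
const objA = new Counted('objA');
retain(objA); // var objA
const objB = new Counted('objB');
retain(objB); // var objB
setField(objA, 'otherObj', objB);   // objB.count is now 2
setField(objB, 'anotherObj', objA); // objA.count is now 2

// The function returns, so both local variables go away.
release(objA);
release(objB);

console.log(objA.count, objB.count, freed); // 1 1 []
```

Both counts stay at 1 because each object is still held by the other, so nothing is ever added to `freed`.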

Also note that most of us write circular references all the time without realizing it. Take a look at this example, which I’m sure you have written yourself:

var el = document.getElementById('el');
el.onclick = function (event) {
    console.log('element was clicked');
}

We bind an anonymous function to the click event of an element, and the function receives an event parameter that gives access to information about the element el.

Is this a circular reference? el has an onclick property that refers to a function (which is itself an object), and the function in turn can reach el through its arguments. Under reference counting, el therefore always has a count of 2 and is not garbage collected even after the page is closed.

If you do this too many times, you cause a memory leak. We can make the element collectable by clearing the event reference when the page is unloaded:

var el = document.getElementById('el');
el.onclick = function (event) {
    console.log('element was clicked');
}
// ...
// Clear the bound event when the page is unloaded
window.onbeforeunload = function () {
    el.onclick = null;
}

V8 garbage collection policy

There are many algorithms for automatic garbage collection. Because different objects have different lifetimes, relying on a single collection strategy for all of them would be inefficient.

Therefore, V8 adopts a generational collection strategy and divides memory into two generations: the new generation and the old generation.

The new generation holds objects with short lifetimes, while the old generation holds objects that live for a long time or stay resident in memory. Using a different collection algorithm for each generation improves efficiency. Objects are initially allocated in the new generation (or directly in the old generation if the new generation does not have enough space). Once an object in the new generation meets certain conditions, it is moved to the old generation, a process known as promotion, which I’ll explain in more detail later.

Generational memory

By default, on a 32-bit system the new generation is 16MB and the old generation is 700MB. On a 64-bit system, the new generation is 32MB and the old generation is 1.4GB.

The new generation is divided into two equal chunks of memory called semispaces, each 8MB (32-bit) or 16MB (64-bit) in size.
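To see how the heap is actually divided in a running process, Node's built-in `v8` module can list each heap space. This is a small sketch; the numbers it prints may differ from the defaults quoted above depending on the Node.js version and the machine.

```javascript
const v8 = require('v8');

// v8.getHeapSpaceStatistics() returns one entry per V8 heap space,
// including the new generation (new_space) and old generation (old_space).
const spaces = v8.getHeapSpaceStatistics();

for (const space of spaces) {
  const sizeMB = (space.space_size / 1024 / 1024).toFixed(2);
  const usedMB = (space.space_used_size / 1024 / 1024).toFixed(2);
  console.log(`${space.space_name}: ${usedMB} MB used of ${sizeMB} MB`);
}

// Overall ceiling for the V8 heap in this process, in bytes.
const heapLimit = v8.getHeapStatistics().heap_size_limit;
console.log('heap_size_limit:', heapLimit);
```

If the limit is too low for a workload, Node.js accepts the V8 flag `--max-old-space-size` (a value in megabytes) to raise the old generation limit.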

The new generation

1. Allocation

Objects in the new generation are short-lived, and allocating memory for them is cheap: V8 only keeps a pointer into the memory space and advances it by the size of each allocated object. When the space is nearly full, a garbage collection is performed.
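This style of allocation is often called bump-pointer allocation. A minimal sketch, assuming a hypothetical `Semispace` class with sizes in arbitrary units:

```javascript
// Toy semispace with bump-pointer allocation (hypothetical, for illustration).
class Semispace {
  constructor(capacity) {
    this.capacity = capacity;
    this.top = 0; // the allocation pointer: offset of the next free unit
  }

  // Returns the start offset of the new object, or null when the space is
  // full and a garbage collection would be needed first.
  allocate(size) {
    if (this.top + size > this.capacity) return null;
    const address = this.top;
    this.top += size; // allocating is just advancing the pointer
    return address;
  }
}

const space = new Semispace(16);
console.log(space.allocate(4)); // 0
console.log(space.allocate(8)); // 4
console.log(space.allocate(8)); // null
```

The third call fails because only 4 units remain, which is the point where a real collector would run.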

2. Algorithm

The new generation is collected with the Scavenge algorithm, whose implementation is mainly based on Cheney’s algorithm.

Cheney’s algorithm splits memory into two halves, each called a semispace. One is in use and the other is idle.

The semispace that is in use is called the From space, and the semispace that is idle is called the To space.

I have drawn a detailed set of flow charts, which I will now use to illustrate how Cheney’s algorithm works. Garbage collection is abbreviated as GC below.

Step1. Three objects A, B, and C are allocated in the From space

Step2. GC runs and determines that object B is no longer referenced and can be reclaimed. Objects A and C are still live

Step3. Copy the live objects A and C from the From space to the To space

Step4. Clear all the memory in the From space

Step5. Swap the From space and the To space

Step6. Allocate two new objects D and E in the From space

Step7. The next GC runs, finds that object D is no longer referenced, and marks it

Step8. Copy the live objects A, C, and E from the From space to the To space

Step9. Clear all the memory in the From space

Step10. Swap the From space and the To space again to start the next round

As the flow chart shows, swapping From and To keeps the live objects together in one semispace while the other semispace stays idle.

Scavenge is excellent in terms of time efficiency because it only copies surviving objects, and when objects are short-lived, survivors make up only a small proportion. Its disadvantage is that only half of the heap memory can be used, which is a consequence of the partitioning and copying mechanism.

Scavenge is a typical algorithm that trades space for time, so it cannot be applied at scale to all garbage collection. It is, however, a very good fit for the new generation, where object lifetimes are short.
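The copy, clear, and swap cycle can be sketched as follows. This is a simplified model under my own assumptions, not V8's implementation: semispaces are plain arrays, objects are records with a `refs` list, and the function name `scavenge` is hypothetical.

```javascript
// Toy Scavenge over two array-based semispaces.
function scavenge(heap, roots) {
  const toSpace = heap.to;     // assumed to be empty when the GC starts
  const forwarded = new Map(); // original object -> its copy in To space

  // Copy one object into To space (at most once) and return the copy.
  function copy(obj) {
    if (forwarded.has(obj)) return forwarded.get(obj);
    const clone = { name: obj.name, refs: obj.refs };
    forwarded.set(obj, clone);
    toSpace.push(clone);
    return clone;
  }

  const newRoots = roots.map((obj) => copy(obj));

  // Cheney-style scan: To space doubles as the work queue, so objects
  // copied during the scan are visited later in the same loop.
  for (let scan = 0; scan < toSpace.length; scan++) {
    toSpace[scan].refs = toSpace[scan].refs.map((obj) => copy(obj));
  }

  const oldFrom = heap.from;
  oldFrom.length = 0;  // clear the From space
  heap.from = toSpace; // the To space becomes the new From space
  heap.to = oldFrom;   // the emptied space becomes the new To space
  return newRoots;
}

const A = { name: 'A', refs: [] };
const B = { name: 'B', refs: [] };
const C = { name: 'C', refs: [] };
A.refs.push(C); // A -> C, nothing references B

const heap = { from: [A, B, C], to: [] };
const roots = scavenge(heap, [A]);
console.log(heap.from.map((obj) => obj.name)); // [ 'A', 'C' ]
console.log(heap.to.length);                   // 0
```

Only the live objects are touched; B is never visited and simply disappears when the From space is cleared.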

3. Promotion

An object is considered long-lived when it survives multiple copies. Such long-lived objects are then moved to the old generation and managed with a different algorithm.

The process of an object moving from the new generation to the old generation is called promotion.

There are two main conditions for an object to be promoted:

  1. When an object is about to be copied from the From space to the To space, V8 checks its memory address to determine whether the object has already survived a Scavenge. If it has, the object is moved from the From space to the old generation; if not, it is copied to the To space. In other words, an object that would be copied from the From space to the To space for the second time is moved to the old generation instead.

  2. When an object is copied from the From space to the To space, if more than 25% of the To space is already in use, the object is promoted directly to the old generation. The reason for the 25% threshold is that once the Scavenge completes, the To space becomes the From space and subsequent allocations are made there. If too much of it is already occupied, those allocations will be affected.
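The two conditions can be written as a small decision function. This is a hypothetical sketch: `shouldPromote`, the `survivedScavenge` flag, and the `used`/`capacity` fields are names I made up, and only the 25% threshold comes from the text above.

```javascript
// The 25% threshold described above.
const TO_SPACE_LIMIT = 0.25;

// Decide whether an object being evacuated should go to the old generation.
function shouldPromote(obj, toSpace) {
  // Condition 1: the object has already survived one Scavenge.
  if (obj.survivedScavenge) return true;
  // Condition 2: more than 25% of the To space is already in use.
  if (toSpace.used / toSpace.capacity > TO_SPACE_LIMIT) return true;
  return false;
}

console.log(shouldPromote({ survivedScavenge: true }, { used: 0, capacity: 16 }));  // true
console.log(shouldPromote({ survivedScavenge: false }, { used: 5, capacity: 16 })); // true
console.log(shouldPromote({ survivedScavenge: false }, { used: 4, capacity: 16 })); // false
```

Either condition alone is enough to promote the object.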

The old generation

1. Introduction

In the old generation, surviving objects make up a large proportion. Continuing to manage it with the Scavenge algorithm would cause two problems:

  1. Because there are many live objects, copying them is very inefficient.
  2. Scavenge wastes half of the memory, and since the old generation heap is much larger than the new generation, the waste would be serious.

Therefore, V8 mainly uses a combination of Mark-Sweep and Mark-Compact for garbage collection in the old generation.

2. Mark-Sweep

Mark-Sweep means “mark and sweep”. It is divided into two phases: marking and sweeping.

Unlike Scavenge, Mark-Sweep does not split memory in two, so half of the space is not wasted. In the marking phase, Mark-Sweep traverses all objects in the heap and marks the live ones; in the subsequent sweeping phase, only unmarked objects are cleared.

In other words, Scavenge copies only live objects, while Mark-Sweep clears only dead objects. Live objects are the minority in the new generation, and dead objects are the minority in the old generation, which is why each method is efficient where it is used.

Let’s look at it through the flowchart:

Step1. The old generation contains objects A, B, C, D, E, and F

Step2. GC enters the marking stage and marks A, C, and E as the live objects

Step3. GC enters the cleanup phase and reclaims the memory space occupied by the dead objects B, D, and F

As you can see, the biggest problem with Mark-Sweep is that memory is no longer contiguous after a sweep. This fragmentation can cause problems for subsequent memory allocation.

If a large object needs to be allocated and none of the remaining fragments is big enough to hold it, a garbage collection is triggered early, even though that collection would otherwise be unnecessary.
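The fragmentation problem shows up clearly in a toy model where the old generation is an array of fixed slots and null means free. The helper names `markSweep` and `largestFreeRun` are assumptions of this sketch.

```javascript
// Sweep: dead objects become free slots, live objects stay where they are.
function markSweep(slots, live) {
  return slots.map((name) => (live.has(name) ? name : null));
}

// Length of the longest run of adjacent free slots.
function largestFreeRun(slots) {
  let best = 0;
  let current = 0;
  for (const slot of slots) {
    current = slot === null ? current + 1 : 0;
    best = Math.max(best, current);
  }
  return best;
}

const swept = markSweep(['A', 'B', 'C', 'D', 'E', 'F'], new Set(['A', 'C', 'E']));
const freeSlots = swept.filter((slot) => slot === null).length;

console.log(swept);                 // [ 'A', null, 'C', null, 'E', null ]
console.log(freeSlots);             // 3
console.log(largestFreeRun(swept)); // 1
```

Three slots are free in total, but an object that needs two adjacent slots still cannot be placed.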

3. Mark-Compact

To solve Mark-Sweep’s memory fragmentation problem, Mark-Compact was proposed.

**Mark-Compact means “mark and compact”**, and it evolved from Mark-Sweep. After marking, Mark-Compact moves the live objects to one end of the memory space. When the move is complete, all memory outside the boundary is cleared. As shown in the figure below:

Step1. The old generation contains objects A, B, C, D, E, and F (same as Mark-Sweep)

Step2. GC enters the marking phase and marks A, C, and E as live objects (same as Mark-Sweep)

Step3. GC enters the compaction phase and moves all the live objects to one end of the memory space. The gray part is the space left behind after the move

Step4. GC enters the cleanup phase and reclaims all the memory on the other side of the boundary in one pass
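These four steps can be sketched on a toy old generation modeled as an array of fixed slots, where null means free. The function name `markCompact` and the slot model are assumptions made for illustration.

```javascript
// Slide live objects to one end, leaving everything past the boundary free.
function markCompact(slots, live) {
  const compacted = new Array(slots.length).fill(null);
  let boundary = 0; // index of the first free slot after compaction
  for (const name of slots) {
    if (name !== null && live.has(name)) {
      compacted[boundary] = name; // place the survivor next to the previous one
      boundary += 1;
    }
  }
  return { slots: compacted, boundary };
}

const result = markCompact(['A', 'B', 'C', 'D', 'E', 'F'], new Set(['A', 'C', 'E']));
console.log(result.slots);    // [ 'A', 'C', 'E', null, null, null ]
console.log(result.boundary); // 3
```

All free slots now sit together after the boundary, so a large object can be allocated there.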

4. Combining the two

In V8’s collection strategy, Mark-Sweep and Mark-Compact are used in combination.

Since Mark-Compact has to move objects, it cannot be very fast. As a trade-off, V8 mainly uses Mark-Sweep and falls back to Mark-Compact only when there is not enough space to allocate objects promoted from the new generation.
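The trade-off can be expressed as a tiny policy function. This is purely illustrative and not V8's real heuristic: `chooseOldGenCollector` and both of its parameters are hypothetical names.

```javascript
// Pick a collector based on whether the promoted object fits without moving anything.
function chooseOldGenCollector(largestFreeRun, promotedSize) {
  // Mark-Sweep is cheaper, so prefer it while the promoted object still fits.
  if (largestFreeRun >= promotedSize) return 'mark-sweep';
  // Otherwise pay for compaction to create a large enough contiguous region.
  return 'mark-compact';
}

console.log(chooseOldGenCollector(4, 2)); // mark-sweep
console.log(chooseOldGenCollector(1, 2)); // mark-compact
```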

Conclusion

V8’s garbage collection mechanism is divided into the new generation and the old generation.

The new generation is mainly collected with Scavenge, whose primary implementation is Cheney’s algorithm. It divides memory evenly into two halves: the one in use is called From and the idle one is called To. New objects are allocated in the From space first. When that space is nearly full, live objects are copied to the To space, the From space is cleared, and the two spaces are swapped before allocation continues. When either of the two promotion conditions is met, the object is promoted from the new generation to the old generation.

The old generation mainly uses the Mark-Sweep and Mark-Compact algorithms: the former marks and sweeps, the latter marks and compacts. The difference between them is that Mark-Sweep leaves fragmented memory behind after garbage collection, while Mark-Compact tidies up first by moving live objects to one end and then clearing the memory on the other side of the boundary. The resulting free memory is contiguous, but the cost is lower speed. In V8, the old generation is managed by Mark-Sweep and Mark-Compact together.

That’s all for this article, which draws on a number of Chinese and English articles as well as the books NodeJS in Simple Terms by Park Dada and JavaScript Advanced Programming. The specific algorithm implementations are not covered here; interested readers can continue to study them.

Finally, thank you for reading this, and if there is any ambiguity or error in this article, please leave me a comment ~~


References

medium.com/@_lrlna/gar…
alinode.aliyun.com/blog/14
www.ruanyifeng.com/blog/2017/0…
segmentfault.com/a/119000000…