Do you really know anything about garbage collection

Statement: This article for the nuggets first signing article, unauthorized reproduction is prohibited.

Writing in the front

We know that garbage collection mechanism is the engine to do, JS engines there are many kinds of every browser (different), the garbage collection mechanism is slightly different in some details and optimization, in this paper, we use some general recovery algorithms as cuts, again by a V8 engine development since the optimization of the mechanism as an example, why take V8 for example? Because it has a large market share 😄), step by step helps us understand the garbage collection mechanism, because only truly understand the garbage collection mechanism, later to understand the memory leak issues and manual prevention and optimization

JavaScript is a fascinating language. How much do you know about GC? I think most people because of the interview to see some interview questions to understand the garbage recycling, that before the formal start, give you a few small questions, you can first think about the answer, with the question and answer to read the article, finally read this article if your answer can be optimized, that is harvest

What is garbage collection?

How is garbage produced?

Why garbage collection?

How does garbage collection work?

What optimizations does V8 make for garbage collection?

Of course, it’s not just for the interview, it’s to get to the bottom of GC once and for all! If you do not understand a certain piece of content, do not worry, read the whole article to understand the content and then go back and look carefully again will be a lot clearer, dry goods full, first praise after look oh

What is the GC

Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection, Garbage Collection. The GC process is relatively insensitive, and the set of operations that the engine performs that are relatively insensitive to us is known as garbage collection

Of course, not all languages have GC, general high-level languages will come with GC, such as Java, Python, JavaScript, etc., and there are no GC languages, such as C, C++, etc., then this needs our programmers to manually manage the memory, relatively more trouble

Waste generation & Why recycling

We know that when you write code, you create a basic type, object, function… These are memory requirements, but we don’t care about them because they are allocated by the engine, so we don’t need to explicitly allocate memory manually

But have you ever wondered what happens when we no longer need something? How does the JavaScript engine find it and clean it up?

Let’s take a simple example

let test = {
  name: "isboyjc"
};
test = [1.2.3.4.5]
Copy the code

As shown above, we assume it is a complete program code

We know that reference data types in JavaScript are stored in heap memory, and then a reference to the actual object in heap memory is stored in stack memory. Therefore, any operation on a reference data type in JavaScript is a reference to the object rather than the actual object. The simple idea is that stack memory holds an address that is related to the actual value in heap memory

We declare the variable test, which refers to the object {name: ‘isboyjc’}, and then we assign the variable to an array object, which means that the variable refers to an array

If there is no reference, the object is useless. At this time, if it is left alone, one or two objects will be fine. If there is more, the memory will not be able to bear it, so it needs to be cleaned up (recycling).

In official words, the running of the program needs memory, as long as the program is required, the operating system or the runtime must provide memory, so for the continuous running service process, must be timely release of memory, otherwise, memory usage is getting higher and higher, it will affect the system performance, heavy will lead to process crash

Garbage Collection Policy

In JavaScript memory management, there is a concept called reachability, which means that values that are accessible or available in a certain way are guaranteed to be stored in memory, and otherwise unaccessible values are reclaimed

As for how to recycle, in fact, how to find these unreachable objects (garbage) and give it to clean up the problem, JavaScript garbage collection mechanism principle is to periodically find those no longer used memory (variables), and then free its memory

You might be wondering why not just find and free up the unused memory in real time? It’s simply too much overhead in real time

We can all Get to the point of this, which is how to find the so-called garbage?

This process involves a number of algorithmic strategies, in many ways, but let’s briefly introduce two of the most common ones

Marker clearing algorithm
Reference counting algorithm

Marker clearing algorithm

strategy

Mark-sweep is the most commonly used algorithm in JavaScript engines. Up to now, most browsers use mark-sweep algorithm in JavaScript engines, but major browser manufacturers have also optimized this algorithm. And the JavaScript engines of different browsers vary in how often they run garbage collection

As its name suggests, the algorithm is divided into two phases: marking, which marks all active objects, and clearing, which destroys unmarked (that is, inactive) objects

You might be wondering how do you tag variables? Actually there are many kinds of ways, such as when the variables into the execution environment, inversion of a particular (through a binary characters to represent markup), or can maintain into the environment and leave the environment variables so that the two list, free transfer variable from one list to another list, there are many other ways. Well, it doesn’t really matter to us how it’s marked, it’s the strategy, right

When the engine performs GC (using the mark-clearing algorithm), it needs to traverse all the objects in memory from the starting point to mark, and there are many starting points, we call a set of root objects, and the so-called root objects in the browser environment, including and more than the global Window object, document DOM tree, etc

The general process of the tag clearing algorithm is as follows

At run time, the garbage collector will mark all variables in memory, assuming that all objects in memory are garbage and all marks are 0
It then iterates from each of the root objects, changing the non-garbage nodes to 1
Clean up all garbage marked 0, destroy and reclaim the memory space they occupy
Finally, change all in-memory object markers to 0 and wait for the next round of garbage collection

advantages

The only advantage of the marker clearing algorithm is that it is simple to implement, and there are no more than two cases of marking, which makes it possible to mark one binary bit (0 and 1), very simple

disadvantages

Tag removal algorithm has a big weakness, is after the removal of the rest of the object memory location is the same, can also lead to free memory space is discrete, the memory fragments (pictured), and because the remaining free memory is not a block, it is made up of different sizes of memory memory list, it will get sucked out of the memory allocation problem

Suppose we need size to allocate memory for a new object. Since free memory is discontinuous and discontinuous, we need to conduct a one-way traversal of the free memory list to find blocks larger than or equal to size before allocating it (as shown in the figure below).

So how do you find the right pieces? There are three allocation strategies we can employ

First-fit, returns a block larger than or equal to size immediately
Best-fit, which traverses the entire free list, returning the smallest chunk of the list that is larger than or equal to size
Worst-fit, iterate through the entire free list, find the largest chunk, then cut into two parts, one size size, and return that part

Among the three strategies, the space utilization of worst-fit seems to be the most reasonable, but in fact, more small pieces will be created after splitting, forming memory fragments, so it is not recommended to use. For first-fit and best-fit, Given the speed and efficiency of distribution, first-fit is a more sensible choice

To sum up, there are two obvious disadvantages of the tag removal algorithm or strategy

Memory fragmentation, free memory blocks are discontinuous, prone to many free memory blocks, and may occur when allocating an object that requires too much memory to find a suitable block
Slow distributionBecause even usingFirst-fitStrategy, whose operation is still aO(n)In the worst case, we have to go all the way to the end each time, and because of fragmentation, large objects are allocated more slowly

PS: The shortcomings of the marker clearing algorithm complement

At the end of the day, the disadvantage of the marker clearing algorithm is the free memory discontinuity caused by the unchanged position of the remaining objects after the clearing, so as long as this is addressed, both disadvantages can be solved perfectly

The mark-Compact algorithm can effectively solve this problem. The marking phase of mark-compact is no different from that of mark-clean algorithm, except that after the marking is over, the mark-Compact algorithm moves the living objects (that is, the objects that do not need to be cleaned up) to one end of the memory, and finally clears the memory at the edge (as shown below).

Reference counting algorithm

strategy

Reference Counting is an earlier type of garbage collection algorithm that simply defines whether an object is no longer needed by other objects. If there is no Reference to the object (zero Reference), the object will be collected by the garbage collection mechanism. Currently, this algorithm is rarely used. Because it has a lot of problems, but we need to know about it

Its strategy is to track the number of times each variable value is used

When a variable is declared and a reference type is assigned to it, the number of references to the value is 1
If the same value is assigned to another variable, the number of references is increased by one
If the value of the variable is overwritten by another value, the number of references is reduced by one
When the number of references to this value goes to zero, it means that no variable is in use and the value cannot be accessed. Reclaim space. The garbage collector cleans up the memory occupied by the zero reference value at runtime

The following example

let a = new Object(a)// This object has a reference count of 1 (a reference)
let b = a 		// This object has a reference count of 2 (a,b reference)
a = null  		// This object has a reference count of 1 (b reference)
b = null 	 	// This object has a reference count of 0 (no reference).// GC reclaims this object
Copy the code

Isn’t that easy? Simple enough, but not long after reference counting was introduced, it ran into A serious problem: circular references, in which object A has A pointer to object B, which in turn refers to object A, as in the following example

function test(){
  let A = new Object(a)let B = new Object()
  
  A.b = B
  B.a = A
}
Copy the code

As shown above, objects, A and B through their respective attributes refer to each other, in accordance with the above reference counting strategy, their reference number is 2, but, in the function test execution is completed, the object of A and B are to be cleaned, but will not be clear, use reference counting number of reference because they will not be zero, If this function is called multiple times in the program, a large amount of memory will not be freed

We use tags to clear point of view, when the function is complete, two objects are not in the scope, A and B will be regarded as to remove the inactive objects, by contrast, the reference count is not released, also can cause A lot of useless memory footprint, this also is later abandon reference counting, use the tag to remove one of the reasons

In IE8 and earlier versions of IE, BOM and DOM objects are not native JavaScript objects. They are COM (Component Object Model) objects implemented by C++. COM objects are garbage collected using a reference counting algorithm, so even if the browser uses a tag clearing algorithm, if there’s a circular reference to a COM object, like two DOM objects that reference each other, etc., and if you want to solve circular references, You need to set the reference address to NULL to break the relationship between the variable and the previously referenced value, as follows
/ / COM object
let ele = document.getElementById("xxx")
let obj = new Object(a)// Create a circular reference
obj.ele = ele
ele.obj = obj

// Cut the reference
obj.ele = null
ele.obj = null
Copy the code
However, in IE9 and later, BOM and DOM objects are changed to JavaScript objects, which avoids the above problems

Refer here to JavaScript Advanced Programming Version 4, section 4.3.2

advantages

The advantages of the reference counting algorithm are much clearer when compared to flag clearing. First of all, the reference count is collected when the reference value is zero, which is the moment it becomes garbage, so it can collect garbage immediately

And tag removal algorithm requires a every once in a while, that in the process of the application (JS) script running thread will have to suspend to perform a period of GC, in addition, the tag removal algorithm need to traverse the activities in the heap and inactive objects to clear, and the reference count is just when the reference count is ok

disadvantages

The disadvantages of reference counting are obvious. First, it requires a counter, which takes up a lot of space, because we don’t know the upper limit on the number of references, and the most serious problem is that circular references can’t be collected

V8 optimizations for GC

As we said above, most browsers are based on the tag cleaning algorithm, V8 is also based on it, of course V8 has some optimization, so let’s focus on V8’s garbage collection optimization

Generational garbage collection

Imagine, we said above rubbish algorithm in each time the garbage collection will check all the objects in the memory, so for some big, old, survival time long object with new, small, short survival time of object a frequency of check is very bad, because the former needs long time and don’t need to clean frequently, the latter the contrary, How to optimize this?? So here’s the generation

The old new generation

V8’s garbage collection strategy is mainly based on the generational garbage collection mechanism. V8 divides the heap memory into the new generation and the old generation, and adopts different garbage collectors, which means different strategies to manage garbage collection

New generation objects are newly generated objects with a short lifetime, and usually support only 1 to 8 meters of capacity. Old generation objects are objects with long lifetime events or resident memory. In short, old generation objects are objects that survive the garbage collection of new generation, and their capacity is usually large

V8 the size of the entire heap memory is equal to the memory of the new generation plus the old generation (figure below)

For the new and old two memory areas of garbage collection, V8 uses two garbage collectors to control, we will manage the new generation of garbage collector called the new generation garbage collector, similarly, we call the management of the old generation garbage collector called the old generation garbage collector

New Generation of garbage collection

Avenge is the application of an algorithm called Scavenge. In the implementation of Scavenge, the application mainly adopts a duplicant method called The Cheney algorithm, which we describe in detail

The Cheney algorithm splits the heap into two parts: the space in use, let’s call it the use area, and the space in idle, let’s call it the free area, as shown below

Newly added objects are stored in the use area. When the use area is nearly full, a garbage cleaning operation is required

When garbage collection begins, the new generation garbage collector will mark the active objects in the use area. After the mark is completed, the active objects in the use area will be copied into the free area and sorted, and then enter the garbage cleaning phase, which is to clear up the space occupied by inactive objects. Finally, the role swap, the original use area into the free area, the original free area into the use of the area

If an object is still alive after multiple copies, it is considered to have a long life cycle and is moved to the old generation and managed using the old generation garbage collection policy

Another kind of circumstance, if copy an object to the free area, leisure area space take up more than 25%, then the object will be promoted straight to the old generation space, set to 25% the proportion of the reason for this is that when the completion of Scavenge recycling, leisure area will turn into use area, continue to undertake object memory allocation, if the proportion is too large, Subsequent memory allocation will be affected

Old generation garbage collection

Compared to the new generation, the old generation of garbage collection is easy to understand, we have said above, for most will take up the space is large, the survival time long object is assigned to the old generation, because the old generation of objects is usually large, if again like the new generation of partitioning and replication to copy and you will be very time consuming, leading to recovery of execution efficiency is not high, So the old-generation garbage collector manages its garbage collection execution, and its entire process uses the marker removal algorithm described above

The first is the marking stage, starting from a set of root elements, recursively traversing this set of root elements, traversing the elements that can be reached in the process of called active objects, not reached elements can be judged as inactive objects

In the cleanup phase, the old garbage collector will simply clean up the inactive objects, that is, the data

As mentioned earlier, the tag cleaning algorithm will generate a large number of discontinuous memory fragments after the cleanup. Too much fragmentation will cause large objects to be unable to allocate enough continuous memory. In V8, the tag cleaning algorithm described above is used to solve this problem to optimize the space

Why do we need a generation?

As the subtitle says, why do we need generational? What are the advantages of this mechanism and what problems does it solve?

In fact, it’s not a solution, it’s an optimization, right

Generational type mechanism to some new, small, short survival time of the object as a new generation, use a small piece of memory quickly clean up higher frequency, survival time and some of the big, old, long object as the old generation, seldom examined, new recycling of old generation mechanism and frequency is different, can say the emergence of the mechanism greatly improve the efficiency of garbage collection

Parallel Collection

Before we go into parallelism, we need to understand The concept of stop-the-world. We all know that JavaScript is a single-threaded language, it runs on The main thread, and that blocks The execution of The JavaScript script when it is garbage collected. Waiting for garbage collection to complete before resuming script execution is called full pausing

For example, if a GC takes 60ms, then our application logic has to be paused for 60ms. If a GC takes too long, it may cause page lag and other problems for users

Given the time it takes to perform a GC, considering the difficulty of one person building a house, two people, ten people… ? Switching to the program side, can we introduce multiple helper threads to process at the same time, which will speed up garbage collection? So the V8 team introduced parallel recycling

By parallel, I mean at the same time, it means that the garbage collector opens multiple helper threads while executing on the main thread, performing the same collection at the same time

In simple terms, with parallel recycling, if the main thread used to work by one person, it would take one person 3 seconds, but now two helper threads are called to work with the main thread, and the three people work together and one person can finish the work in 1 second, but because many people work together, Therefore, we need to add a part of the time of multi-person collaboration (synchronization overhead) let’s calculate 0.5 seconds, which means that with the parallel strategy, the work that should take 3 seconds can now be done in 1.5 seconds

Can be done, but while 1.5 seconds also greatly reduced, but the 1.5 seconds, the main thread or need to get out, it is also for thread or need to get out, the process of memory is static, don’t need to worry about memory object reference relations change, only need to consider the synergies, it is very easy to implement

New generation space using the parallel strategy, in the process of performing garbage collection, will be launched multiple threads to responsible for the new generation of junk cleaning operations, the threads at the same time moving objects in space data to the free area, this process will change as the data address, so you also need to update refer to these object pointer, this is known as parallel recovery

Incremental markup and lazy cleanup

Although the parallel strategy mentioned above can increase the efficiency of garbage collection and can be well optimized for the new generation of garbage collector, it is still a full stoppage garbage collection method. For the old generation, its internal storage is some relatively large objects. GC for these large objects can consume a lot of time even if we use a parallel strategy

So in 2011, IN order to reduce the total pause time, V8 optimized the old generation of markers, switching from full pause markers to incremental markers

What is delta

Increments are the process of dividing a GC mark into small steps, each of which allows the application logic to execute for a while, alternating many times to complete a round of GC marks (as shown below)

Imagine that a complete GC mark is executed several times. How do you pause to execute the task after each small GC mark execution and then resume? What if, after a full GC marker block pause, the marked object references in memory are modified when the task is executed?

As you can see, incremental implementation is a bit more complicated than parallel, and V8’s solutions to both problems are tricolor notation and write barriers, respectively

Three-color marking (pause and resume)

We know that the old generation is to use tag clean algorithm, and the above tags we said in the cleaning up, that is before no incremental algorithm, simple to use black and white tag data is ok, the tag process that is in the execution before a full GC tag, garbage collector will all of the data set to white, Then, the garbage collector will start from a set of objects and mark all accessible data as black. After traversal, the marked black data object is the active object, and the remaining white data object is the garbage object to be cleaned up

If we use a black and white markup strategy, after the garbage collector performs an incremental collection, pause and enable the main thread to execute a piece of JavaScript code in the application, and then when the garbage collector is started again, the memory is black and white, we don’t know where we are next

To solve this problem, the V8 team used a special method: three-color marking

Tricolor notation uses two markup bits and a markup worksheet for each object, and the two markup bits encode three colors: white, gray, and black

White refers to unmarked objects
Gray means that the object itself is marked and the member variable (the reference object to the object) is not marked
Black means that both self and member variables are marked

As shown in the above, we use the simplest way to explain the process, all objects are initially white, means that the collector didn’t mark them, starting from a set of the root object, the first group of the root object for grey and push to the marking work table, when the collector from mark visit it pops up in the worksheet object and the reference object, Turns itself from gray to black and turns its next reference object to gray

So you go down until there are no gray objects to mark, no reachable (no reference) objects, and all the remaining white objects are unreachable, waiting for collection (C and E in the figure above are waiting for collection)

After adopting the three-color marking method, we can do much better in the recovery execution. We can judge whether the whole mark is completed directly by whether there are gray nodes in the current memory. If there are no gray nodes, we can directly enter the cleaning stage; if there are gray marks, we can continue the execution from the gray nodes directly

The mark operation of the three-color method can be performed increments without scanning the entire memory space each time, and can be well combined with incremental reclamation for some operations of pause recovery, thus reducing the full pause time

Write barriers (modifying references in deltas)

After a complete GC marker block pause, the marked object references in memory are modified during the execution of the task program. Modifying references in increments may not be easy to understand. Let’s take an example (as shown in the figure).

Suppose we have three objects A, B, and C referenced in turn, marked black in the first incremental fragment (live objects), and then paused to start executing the application, i.e., the JavaScript script. In the script, we change the point of object B from object C to object D, and then resume executing the next incremental fragment

At this point the object C is no longer referred to, but it is currently black (for live object) and the whole GC will not clean up C, but we can ignore that, because if it doesn’t clean up this time it will clean up the next GC, and it doesn’t have much effect on our program

We look at the new object D is the initial white, as we mentioned above, there is no gray object, that is all finished tag next to clean up, a new modified white objects D will be recycled in the second round of the GC cleaning stage, and references are recycling, may also be used in our application object behind the D? This must be wrong

To solve this problem, V8 incremental collection uses a write-barrier mechanism, which, once a black object references a white object, forces the referenced white object to be gray to ensure that the next incremental GC markup phase is marked correctly. This mechanism is also known as strong trichromic invariance

So in our example above, by changing the point of object B from object C to object D, the white object D is forced to change to gray

Lazy sex cleaning

In fact, delta tags are only used to tag active and inactive objects. V8 uses Lazy sweeps for real cleanup to free memory.

Once the delta is marked, lazy cleanup begins. When the delta mark is done, if the current memory is available and we can execute the code quickly, we don’t need to clean up the memory immediately. We can delay the cleaning process a little bit and let the JavaScript code execute first. We don’t need to clean up all the inactive objects at once. This can be done on an as-needed basis until all inactive object memory is cleared, followed by delta marking

What are the advantages of incremental markup versus lazy cleanup?

The advent of incremental markup and lazy cleanup greatly reduces the pause time of the master thread, making the user’s interaction with the browser much smoother. However, since JavaScript code is executed between each small delta token, object Pointers in the heap may change, and write barriers are required to record these changes in reference relationships, so the delta token disadvantages are obvious:

First, it does not reduce the total pause time of the main thread, and may even increase it slightly. Second, incremental markup may reduce the throughput of the application due to the cost of the write barrier mechanism.

Concurrent Collection

As we said earlier, parallel collection still blocks the main thread. Increments also increase the total pause time and reduce the application throughput, so how can garbage collection be performed without blocking the main thread and be more efficient than increments?

This refers to concurrent collection, which means that the helper thread can perform garbage collection operations in the background while the main thread is executing JavaScript. The helper thread can perform garbage collection operations while the main thread is executing without being suspended.

While the helper thread is performing garbage collection, the main thread is also free to execute without being suspended. This is an advantage of concurrency, but it is also difficult to implement concurrent collection, because it needs to consider that the main thread is executing JavaScript, the object reference in the heap may change at any time. Some of the tokens that the helper thread has made or is making will change, so it needs to implement some additional read-write locking mechanism to control this, which we won’t go into here

Again, GC optimization in V8

V8’s garbage collection strategy is mainly based on the generational garbage collection mechanism, which we have said, about the new generation of garbage collector, we said that the use of parallel recycling can be very good to increase the efficiency of garbage collection, then the old generation of garbage collector which strategy? I said above that parallel collection, incremental marking and lazy cleaning, concurrent collection of these recycling methods to improve efficiency, optimize the experience, looking at one better than the next, then the old garbage collector in the end to use which strategy? Is it concurrent?? Inner monologue: “As if… Looks like. Maximum concurrent collection efficiency”

In fact, each of these three methods has its own advantages and disadvantages, so in the old generation of garbage collectors these strategies are used together

The old generation used concurrent markup, where the main thread starts executing JavaScript, and the helper thread also performs markup operations (all markup operations are done by the helper thread).

After the markup is complete, the parallel cleanup is performed (while the main thread is doing the cleanup, multiple helper threads are doing the cleanup simultaneously)

At the same time, the cleanup tasks are incrementally batched between JavaScript tasks

The last

That top is a V8 engine for our recycling some of the main optimization, although the engine is optimized, but is not to say that we can completely don’t need to care about recycling this, our code to take the initiative to avoid some not conducive to engine still do recycling operations, because not all useless object memory can be recycled, that when the memory no longer used, When there is no timely collection, we call it a memory leak

The issue of memory leaks is a different one, which is not included in this article

We’re done. Do you have a deeper answer to the opening question? I asked the interviewer before the interview this kind of problem, most of the students answer is limited to tag + reference counting two concepts, dig one to dig deep into all kinds of defects and optimizing the could not say, in fact, we combined with the V8 engine, the optimization of recycling better to answer the above questions, so comments area code your understanding!

In addition, which did not Get the point can comment comment, also welcome to refer to the wrong corrigendum!!