A brief introduction to JavaScript garbage collection mechanisms

Note: This article is mainly aimed at readers who are beginners in GC. The author has a relatively lax understanding of GC, and has the desire to learn it, but in the end, there is too little time. In order to achieve a general understanding of GC, I write notes to understand it. There are many misnomers in the article for the reader to correct.

preface

One of the most common disadvantages of closures that I see on blogs when learning to use them is:

Improperly used closures can cause memory leaks in IE(pre-IE9)

Uhhh, why did this happen before IE9? I began to search for answers, and found a satisfactory one:

The GARBAGE collection algorithm used by IE9’s JavaScript engine is reference counting, which causes GC to fail to reclaim “should be reclaimed” memory for circular references. This results in a meaningless footprint of memory, known as a memory leak.

Memory Leak refers to a waste of system Memory, resulting in slow down of program running and even system crash, when the heap Memory that has been dynamically allocated in a program is not released or cannot be released for some reason.

So, I have the following questions:

What is a circular reference?
What is GC? How does it work?
Why does the reference counting algorithm cause memory not to be freed?
How many garbage collection algorithms does JavaScript (or JavaScript engine) have?

So I “scientific Internet” to the non-existent Google search “JS garbage collection mechanism”, but, the first five results can be seen at a glance is mutual paste copy of other people’s blog, which all mention such a paragraph:

An entry tag is added when a variable enters the execution environment, and an exit tag is added when the variable leaves. Tag cleanup is when the GC marks all variables at run time, then removes those variables that are still in the environment or are referenced by variables in the environment, and clears all remaining variables that are still marked.

Sorry, I don’t understand what the “leave mark” is, when the “then remove” is removed, exactly how it is triggered or it is automatically run. To this, I can only:

So, have to read their own books, in the Turing community to find a good e-book “garbage recycling algorithm and implementation”, after reading part, basically have a preliminary understanding of GC (at least know how to work).

Then the following, on the gradual solution of a few doubts mentioned before, and the combination of text and text to provide you with a quick way to understand. Of course, it is best to read the ebook mentioned above.

What is the GC

Garbage Collection is the abbreviation for Garbage Collection, Uhh. When I think of Garbage Collection, I think of the following scenario:

Right! Is that, in reality we can produce a lot of garbage, there is always a group of people in the us is still in bed or go out to work quietly take garbage, so for the program also is such, in the process of work in process, will produce a lot of ‘junk’, the rubbish is program there is no memory space (may be in use before, later won’t use). So GC is responsible for garbage collection, because it works inside the JavaScript engine, so GC works “somewhat” quietly for us front-end developers (note the quotes here).

So, we know what GC does:

Find garbage in memory space.
Recycling garbage allows programmers to reuse this space.

It’s important to note that not all languages have GC in their world. Relatively speaking, high level languages have GC in their world, such as Java,JavaScript, Python. In a world without GC, the programmer has to manually manage the memory, such as IN C, which is commonly known as malloc/free, It’s short for Memory allocation. And, of course, new/delete in C++.

Another point is that GC is an old but outdated technology. It was first published in 1960, but we still need to develop better GC algorithms to make our programs “better.”

Why use GC

In a word: “easy.”

Save developers the trouble of manually managing memory (not every developer can manage it well), thereby reducing bugs and freeing up energy for the more essential programming work.

Why should I learn something about GC

In a word: “to better find bugs”.

As a front-end developer, I actually lack knowledge of computer system (take myself as an example), such as operating system, computer composition principle and computer network. But this knowledge is really the foundation for solving real development problems, so having a better foundation can lead to faster development/maintenance efficiency, and learning GC can give us some ideas about computer system knowledge. For example, “mark-clear” is similar to the operating system page replacement second chance algorithm, so the knowledge is integrated, here we touch, then need to learn more knowledge later when it will be more handy.

Essential basic concepts

HEAP, “HEAP,” is the amount of memory that is used to dynamically store objects that are reference types in JavaScript, and I talked about JavaScript types in another post earlier.
Mutator, an obscure word that stands for the application itself in GC, requires a lot of memory.
The allocator, to which the mutator submits requests that require memory, is responsible for fetching enough memory space from the heap for the mutator to use.

A picture of understanding:

Live/inactive objects: representative passmutatorObject referenced, for example:

var a = {name: 'bar'} // 'this object' is referenced by a and is an active object.


a=null; // 'this object' is not referenced by a, this object is an inactive object.
Copy the code

Several commonly used GC algorithms

Reference counting

(Figure from Garbage Collection Algorithms and Implementations)

As the name suggests, let all object implementations keep track of how many “programs” refer to them, so that each object knows its “popularity index.” Here’s a simple example:

var a = new Object(a);// Now 'this object' has a reference count of 1 (a in reference)
var b = a; // 'this object' has a reference count of 2 (a,b)
a = null; // reference_count = 1
b = null; // reference_count = 0 
// The next step is to collect 'this object'
Copy the code

This approach has advantages and disadvantages:

advantage

Garbage can be collected immediately, when the referenced value is 0, the object immediately connects itself to the free list as free space, that is. It is recycled as soon as it becomes garbage.
Since the collection is instantaneous, the ‘program’ does not pause to use GC alone for a long period of time, so the maximum pause time is very short.
You don’t have to go through all the active and inactive objects in the heap

disadvantage

Counters need to take up a lot of space because you can’t estimate the upper limit of the number of references. For example, if you have 32 bits or 2 to the power of 32 objects referencing one object, then the counter needs 32 bits.
The biggest disadvantage is that it doesn’t solve the problem of circular references not being recyclable, which is what happened before IE9

A simple example:

function f(){
  var o = {};
  var o2 = {};
  o.a = o2; // o references O2,o2 is referenced 1 times
  o2.a = o; // O2 refers to o, and the reference to O is 1

  return "azerty";
}

f();
Copy the code

Fn is supposed to recycle the memory space in the fn scope after execution, but because o has an attribute that references O2, o2 is always referenced by 1, and O2 is not used specifically as a closure, so it should be destroyed.

Since the algorithm destroys objects with zero references, neither of which is zero here, the GC will not collect them, then this is a memory leak.

This algorithm has gradually been replaced by the ‘mark-clear’ algorithm, which is most commonly used in V8 engines

Marker clearing algorithm

The GC garbage collection process is divided into two phases

Mark phase: Mark all live objects.
Cleanup phase: Unmarked (that is, inactive) objects are destroyed.

Mark phase

Root can understand our global scope, the GC from the global scope of variables, the scope step by step to traverse (yes, depth traversal), when the traverse to the heap object that the object is referenced, and play a tag, continue to the recursive traversal (because certainly exist in the heap object references another object in the heap). Until the last (deepest level of scope) node is traversed.

After the tag is done, this is what it looks like:

Sweep phase

We’re going to iterate again, this time through the heap, collecting the unmarked objects.

We won’t go into the details of how to reallocate the memory space. It’s a bit like disk management or memory management, such as best-fit, first-fit, and worst-fit. And the generation and solution of fragmentation problem.

This approach solves the circular reference problem because the two objects cannot be retrieved from the global object. Therefore, they cannot be marked, and they will be collected by the garbage collector. As the figure:

Advantage:

Simple implementation, marking is to hit or not to hit two possible, so one binary bit can be expressed
Solved the circular reference problem

disadvantages

Cause fragmentation (similar to disk fragmentation)
There are many times to reallocate, and if the appropriate memory block size is not found, the free list (a list of addresses in all the free address space in the heap) will be traversed all the way to the end

This kind of GC is a scheduled task, that is, when the program has been running for a certain period of time, the GC is unified, similar to the figure below:

Replication algorithm

The copy algorithm is very simple to understand with this diagram, which only copies live objects in one space to other Spaces.

A GC is completed by dividing a memory space into two parts, one From space and the other To space, copying the active objects in the From space To the To space, then releasing the entire From space, and then swapping the identities of the From space and To space at this point.

As shown in the figure:

There are

There are several GC algorithms, and I’m going to summarize the GC algorithms (copy, mark-clear, compress) implemented in V8 in a graphical manner in the next blog post.

conclusion

Let’s go back to our previous question

What is a circular reference?
What is GC? How does it work?
Why does the reference counting algorithm cause memory not to be freed?
How many garbage collection algorithms does JavaScript (or JavaScript engine) have?

Do you think we should all have a clearer answer?