Memory management in JavaScript

Preface

In most cases we don’t need to worry about memory in JavaScript. The language started out as a browser scripting language with relatively few features, so memory leaks and memory overflows were rare. But with the rise of single-page applications, a page often has a lot of JavaScript code running at the same time. For security reasons, the system usually allocates much less memory to the browser than to desktop software, and when that memory runs out, the result is often a crash.

With the advent of Node.js, which is built on V8, the use of JavaScript has expanded to the back end. This is a huge boost for JavaScript, but it also creates some problems, including memory management.

On the server side, where load is often high, careless use of memory can easily bring a service down.

Memory management is also a frequent interview question. So, everyone, it’s time to take memory management seriously! I recently read a lot of articles on the subject and learned a great deal. This article is both a summary of what I learned and a way to share it.

In this article, you will learn:

The concept of the memory life cycle
Allocation of JavaScript memory
Use of JavaScript memory
Common garbage collection strategies for JavaScript memory
The advantages and disadvantages of each strategy
Garbage collection optimizations in the V8 engine
Common memory problems and solutions
The difference between a memory leak and a memory overflow

JavaScript memory reclamation is “automatic”

Low-level languages like C have memory management interfaces, such as malloc() and free(), which require the user to manage memory manually. JavaScript allocates memory “automatically” when values (objects, strings, etc.) are created and frees it “automatically” when they are no longer used. This release process is called garbage collection. It is this “automatic” behavior that lets us forget that memory management exists at all.

Memory life cycle

Regardless of the programming language, the memory life cycle is basically the same:

Allocate as much memory as you need
Use allocated memory (read, write)
Release the memory and return it when it is no longer needed
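
As a rough illustration of these three stages, here is a minimal sketch. The function name buildReport and its data are made up for this example; in JavaScript only the first two stages are written by hand, and the third is left to the garbage collector.

```javascript
function buildReport() {
  // 1. Allocate: creating values makes the engine reserve memory for them.
  const scores = [90, 85, 77];
  const summary = { count: 0, total: 0 };

  // 2. Use: read from and write to the allocated memory.
  for (const score of scores) {
    summary.count += 1;
    summary.total += score;
  }

  // 3. Release: once the function returns, `scores` is no longer reachable,
  //    so the garbage collector is free to reclaim it at some later point.
  return summary;
}

const report = buildReport();
console.log(report.total / report.count); // 84
```

Note that nothing in the code frees `scores` explicitly; when the memory is actually reclaimed is up to the engine.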

Memory allocation

In JavaScript, memory is generally divided into stack memory and heap memory. Primitive types are stored in stack memory, which is managed by the operating system. Reference types are stored in heap memory, which is managed by the engine. Heap management is where V8 garbage collection comes in, which I’ll cover later in this article.
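
The difference between the two kinds of values shows up when they are copied. The sketch below only demonstrates the observable behavior (primitives are copied by value, objects are shared by reference); where an engine physically places each value is an implementation detail that this example does not prove. The variable names are illustrative.

```javascript
// Primitive: assigning copies the value, so the two variables are independent.
let count = 1;
let countCopy = count;
countCopy = 2;

// Reference type: assigning copies the reference, so both names
// point at the same object on the heap.
const first = { name: 'isboyjc' };
const second = first;
second.name = 'changed';

console.log(count, countCopy); // 1 2
console.log(first.name);       // changed
```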

Memory usage

Using memory means reading from and writing to allocated memory. That may be reading or writing the value of a variable or an object property, or even passing an argument to a function.

When a value is no longer in use, the corresponding memory is reclaimed. Most memory management problems occur at this stage. The hardest task here is to find out which allocated memory is really no longer needed. In languages with manual memory management, the developer has to determine which chunk of memory in the program is no longer needed and release it.

Like many high-level languages, JavaScript has a built-in garbage collector that takes care of this.

Collection and release of memory

The final part of memory management is collection and release. First, note that when we talk about memory management we usually mean the heap. Memory collection, also known as garbage collection, follows an algorithm (a strategy). The two most common are reference counting and mark-and-sweep.

Reference counting

The earliest garbage collection strategy is reference counting. The idea is to count the number of references to each value. When a variable is declared and a reference value is assigned to it, the value’s reference count is incremented by one. Similarly, if a variable holding a reference to a value is overwritten with another value, the count is decremented by one. When a value’s reference count reaches 0, the value is no longer accessible, so its memory can be safely reclaimed. The next time the garbage collector runs, it frees the memory of values with zero references.

Note here that memory reclamation works periodically. Because JavaScript is single-threaded, frequent garbage collection can block the execution of other tasks. We’ll talk about that later.

Reference counting has a serious problem: circular references. A circular reference is when object A holds a pointer to object B, and object B in turn refers to object A. For example:

function problem(){
  let objectA = new Object();
  let objectB = new Object();

  objectA.someOtherObject = objectB;
  objectB.anotherObject = objectA;
}

In the above example, objectA and objectB refer to each other through their properties, so each has a reference count of 2. When the function returns, the local variables disappear, but each object is still referenced by the other, so neither count ever reaches 0 and neither object is reclaimed under a reference-counting policy. This causes a memory leak.
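
To make the counting concrete, here is a toy simulation of reference counting. It is only a sketch: the RefCounted class and the retain, release, and link helpers are hypothetical names invented for this example, and no real engine exposes its counts this way.

```javascript
// Toy model: every object carries its own reference count.
class RefCounted {
  constructor(name) {
    this.name = name;
    this.count = 0;
    this.freed = false;
    this.fields = []; // outgoing references held by this object
  }
}

function retain(obj) {
  obj.count += 1;
}

function release(obj) {
  obj.count -= 1;
  if (obj.count === 0 && !obj.freed) {
    obj.freed = true;
    // Freeing an object also drops the references it holds.
    for (const child of obj.fields) release(child);
    obj.fields = [];
  }
}

function link(from, to) {
  from.fields.push(to);
  retain(to);
}

// Mirrors problem(): two locals that point at each other.
const objA = new RefCounted('A');
const objB = new RefCounted('B');
retain(objA);     // the local variable objectA
retain(objB);     // the local variable objectB
link(objA, objB); // objectA.someOtherObject = objectB
link(objB, objA); // objectB.anotherObject = objectA

// The function returns, so both local variables go away.
release(objA);
release(objB);

console.log(objA.count, objB.count); // 1 1
console.log(objA.freed, objB.freed); // false false
```

Both counts are stuck at 1 even though nothing outside the pair can reach either object, which is exactly the leak described above.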

In fact, there are more problems with reference counting strategies.

The BOM and DOM objects in early versions of Internet Explorer were implemented in C++ as COM (Component Object Model) objects, and COM objects use reference counting for garbage collection. As a result, the page experience in the early days of Internet Explorer was not very good.

It is for this reason that most browsers today adopt a mark-and-sweep strategy.

Mark-and-sweep

Mark-and-sweep, as its name suggests, is a strategy that has two phases: mark and sweep. The mark phase marks all active objects, while the sweep phase destroys the unmarked (that is, inactive) objects.

So how does mark-and-sweep solve the circular reference problem? This is where the term reachability comes in.

Reachability means exactly what it says: whether an object can be reached. The heap has a set of root objects (not objects in the traditional sense), and marking starts from the roots and proceeds recursively. When every path from the roots to an object has been cut, the object is unreachable and needs to be cleaned up.

So when a function finishes, its local scope is disconnected from the roots. This is why we generally can’t access variables defined inside a function from outside (closures being the special case). These variables are not marked because they are unreachable, so they are collected in the next GC. As you can see, the biggest difference between mark-and-sweep and reference counting is the criterion for collection. Memory with zero references is necessarily unreachable, but memory with a non-zero reference count is not necessarily reachable.

You might be wondering how variables are marked. There are many ways: for example, flipping a particular bit when a variable enters the execution environment (a single binary bit represents the mark), or maintaining two lists, one for variables that have entered the environment and one for variables that have left it, and moving variables from one list to the other. There are many other ways as well. How the marking is done doesn’t really matter to us; what matters is the strategy.

When the engine performs GC with the mark-and-sweep algorithm, it traverses the objects in memory from a starting point and marks them. That starting point is a set of root objects; in the browser environment the roots include, but are not limited to, the global window object, the document DOM tree, and so on.

The general process of the mark-and-sweep algorithm looks like this:

When the garbage collector runs, it attaches a mark to every variable in memory, assuming that all objects are garbage and marking them 0
It then traverses from each root object, changing the mark of every node that is not garbage to 1
It cleans up all garbage still marked 0, destroying it and reclaiming the memory it occupies
Finally, it resets all object marks in memory to 0 and waits for the next round of garbage collection
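
The four steps above can be sketched with a toy heap. This is an illustrative model, not how any engine is implemented: the heap is a plain array, and makeObject, mark, and collect are hypothetical helper names.

```javascript
// Toy heap object: a mark bit plus a list of outgoing references.
function makeObject(name) {
  return { name, marked: false, refs: [] };
}

function mark(obj) {
  if (obj.marked) return; // already visited; this also stops cycles
  obj.marked = true;
  for (const child of obj.refs) mark(child);
}

function collect(heap, roots) {
  // Step 1: assume everything is garbage (mark = 0).
  for (const obj of heap) obj.marked = false;
  // Step 2: mark everything reachable from the roots (mark = 1).
  for (const root of roots) mark(root);
  // Step 3: sweep, keeping only marked objects.
  const survivors = heap.filter((obj) => obj.marked);
  // Step 4: reset the marks for the next round.
  for (const obj of survivors) obj.marked = false;
  return survivors;
}

const globalObj = makeObject('global');
const kept = makeObject('kept');
const cycleA = makeObject('cycleA');
const cycleB = makeObject('cycleB');

globalObj.refs.push(kept);
cycleA.refs.push(cycleB); // a cycle that nothing else points to
cycleB.refs.push(cycleA);

const heap = [globalObj, kept, cycleA, cycleB];
const alive = collect(heap, [globalObj]);
console.log(alive.map((obj) => obj.name)); // [ 'global', 'kept' ]
```

The unreachable cycle is swept even though each of its members still has a non-zero reference count, which is the case reference counting cannot handle.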

Advantages and disadvantages of the two collection strategies

Advantages and disadvantages of mark-and-sweep

Advantages:

Simple implementation. An object is either marked or not marked, so a single binary bit (0 or 1) is enough to record the mark. (The V8 engine has optimized this, which is covered later in this article.)

Disadvantages:

Memory fragmentation. After memory is reclaimed, the space the object occupied is left empty, so free memory becomes discontinuous.

Of course, there are schemes for allocating from this fragmented memory. Three are common:

First-fit: returns the first block found that is equal to or greater than the required size.
Best-fit: searches the entire memory and returns the smallest block that is large enough for the requested size.
Worst-fit: searches the entire memory, finds the largest block, splits off the portion needed for the allocation, and returns that portion.

In terms of execution efficiency, first-fit is the usual choice. However, even this search takes time, which is another disadvantage of mark-and-sweep: slow allocation.
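
To see how the three schemes differ, here is a small sketch over a made-up free list. The block layout and the function names firstFit, bestFit, and worstFit are assumptions for illustration; a real allocator would also split the chosen block and update the list.

```javascript
// Free list: free blocks described by start offset and size (hypothetical data).
const freeBlocks = [
  { start: 0, size: 8 },
  { start: 20, size: 3 },
  { start: 40, size: 16 },
  { start: 70, size: 4 },
];

// First-fit: stop at the first block that is large enough.
function firstFit(blocks, size) {
  return blocks.find((block) => block.size >= size) ?? null;
}

// Best-fit: scan everything, keep the smallest block that is large enough.
function bestFit(blocks, size) {
  let best = null;
  for (const block of blocks) {
    if (block.size >= size && (best === null || block.size < best.size)) {
      best = block;
    }
  }
  return best;
}

// Worst-fit: scan everything, keep the largest block that is large enough.
function worstFit(blocks, size) {
  let worst = null;
  for (const block of blocks) {
    if (block.size >= size && (worst === null || block.size > worst.size)) {
      worst = block;
    }
  }
  return worst;
}

console.log(firstFit(freeBlocks, 4).start); // 0
console.log(bestFit(freeBlocks, 4).start);  // 70
console.log(worstFit(freeBlocks, 4).start); // 40
```

First-fit can stop early, while the other two always walk the whole list, which is why first-fit tends to be the cheapest of the three.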

The root of the problem with mark-and-sweep is that after sweeping, the surviving objects stay where they are, which leaves free memory discontinuous. That’s where the mark-compact algorithm comes in.

The mark-compact algorithm does one more thing after marking finishes: it moves the live objects to one end of memory and then clears everything beyond the boundary.
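
The compaction step can be sketched on a toy heap where each array slot holds one object and null stands for a hole left by sweeping. The compact function is a hypothetical helper; a real collector would also have to update every pointer to the objects it moves, which this sketch leaves out.

```javascript
// Slide live objects toward index 0 and return the boundary index.
function compact(slots) {
  let next = 0; // next position at the "live" end of memory
  for (let i = 0; i < slots.length; i++) {
    if (slots[i] !== null) {
      slots[next] = slots[i];          // move the live object down
      if (next !== i) slots[i] = null; // its old position becomes free
      next += 1;
    }
  }
  return next; // everything from this index onward is free
}

const slots = ['a', null, 'b', null, null, 'c'];
const boundary = compact(slots);
console.log(slots);    // [ 'a', 'b', 'c', null, null, null ]
console.log(boundary); // 3
```

After compaction the free space is one contiguous run, so the next allocation no longer needs to search a fragmented free list.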

Pros and cons of reference counting

Advantages:

Immediate reclamation. When a value’s reference count drops to zero, its memory is reclaimed right away.

Fewer program pauses. An application necessarily consumes memory as it runs, and the platform it runs on always has an upper limit on memory, so memory is bound to fill up at some point. Because the reference counting algorithm constantly watches for objects whose reference count is 0, in the extreme case where memory is about to fill up it can immediately find those objects and free them. This keeps memory from filling up, which is what reducing program pauses means here.

Disadvantages:

Objects in a circular reference cannot be reclaimed. As mentioned above, this is the biggest problem with reference counting. By its collection criterion, objects that reference each other in a cycle inside a function can never be reclaimed, because their reference counts never reach zero.

Counting is time-consuming. Counting requires frequent checks to see which objects have zero references, which can be time-consuming when the number of objects being maintained is large. A reference-counting GC ties up the main thread and blocks other tasks. Of course, this is not unique to reference counting; mark-and-sweep has the same problem.

All in all, automatic memory management in JavaScript is not perfect, but mark-and-sweep is the best solution so far.

Let’s take a look at how the famous V8 engine has been optimized for garbage collection.

V8 engine garbage collection mechanism

V8’s garbage collection is also based on the mark-and-sweep algorithm, with a number of optimizations on top.

Generational garbage collection

First, recall that one advantage of reference counting is that memory is reclaimed immediately, whereas garbage collection with mark-and-sweep is periodic. Wouldn’t it be more efficient if mark-and-sweep could reclaim memory promptly as well? V8’s answer is generational garbage collection.

Principle of generational garbage collection

The heap memory is divided into new generation and old generation regions, and then the two regions adopt different garbage collection strategies.

The new generation

New-generation objects are generally short-lived and relatively small. This region is given very little memory, typically between 1 and 8 MB.

The reason for this allocation has to do with its garbage collection algorithm. Let’s look at how the new generation is collected.

The new generation uses an algorithm called Scavenge, which is an implementation of Cheney’s algorithm.

Cheney’s algorithm splits the new generation’s memory into two halves: the half in use, which we call the used region, and the idle half, which we call the free region.

The flow of the algorithm looks something like this:

Newly created objects are stored in the used region. When the used region is nearly full, a garbage collection is triggered.

During marking, the new-generation garbage collector identifies the live objects in the used region and marks them. It then copies the marked objects into the free region, laying them out in order. Next comes the cleanup phase, which clears out the used region, unmarked space included. Finally, the two regions swap roles.

[Figure: the Scavenge process]

One problem we can see from the figure above is that as the number of new variables grows, the used region will keep filling up. What to do? It’s simple: promote some of them. Remember that we still have the old generation; following certain rules, some new-generation variables are promoted to the old generation.

Generally speaking, there are two promotion rules:

If the free region is already more than 25% full when a variable is being copied into it, the variable is promoted directly to the old generation.
If a variable has survived multiple swaps, it is also promoted to the old generation.
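
The copy-and-swap flow and the survival-based promotion rule can be sketched as follows. Everything here is an assumption made for illustration: the NewSpace class, the isLive callback standing in for marking, and the promotion threshold of two survived collections. The 25% rule is left out to keep the sketch short.

```javascript
// Toy semi-space collector.
class NewSpace {
  constructor() {
    this.from = [];          // used region: new objects are allocated here
    this.to = [];            // free region: empty until a collection runs
    this.oldGeneration = []; // promoted objects end up here
  }

  allocate(name) {
    const obj = { name, survived: 0 };
    this.from.push(obj);
    return obj;
  }

  // isLive stands in for the marking step.
  scavenge(isLive) {
    for (const obj of this.from) {
      if (!isLive(obj)) continue;     // dead objects are simply not copied
      obj.survived += 1;
      if (obj.survived >= 2) {
        this.oldGeneration.push(obj); // survived several swaps: promote
      } else {
        this.to.push(obj);            // copy into the free region
      }
    }
    // Swap roles: the filled free region becomes the used region,
    // and a fresh empty region takes its place.
    this.from = this.to;
    this.to = [];
  }
}

const space = new NewSpace();
space.allocate('temp');
const keep = space.allocate('keep');
const live = new Set([keep]);

space.scavenge((obj) => live.has(obj));
console.log(space.from.map((o) => o.name));          // [ 'keep' ]

space.scavenge((obj) => live.has(obj));
console.log(space.oldGeneration.map((o) => o.name)); // [ 'keep' ]
```

Only live objects are ever touched, so the cost of a collection depends on how much survives rather than on how much was allocated. That suits a region where most objects die young.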

The old generation

Compared with the new generation, collection in the old generation is easier to understand. The V8 engine uses the mark-and-sweep algorithm described above, optimized with mark-compact. Marking involves quite a few concepts, which I’ll briefly discuss.

As we all know, JavaScript is a single-threaded language that runs on the main thread, so garbage collection blocks the execution of JavaScript, and the script resumes only after garbage collection is complete. We call this behavior stop-the-world.

If a garbage collection takes a long time, the main thread is blocked for a long time, which causes problems such as the page freezing.

To address this, V8’s new garbage collection project, Orinoco, uses parallel, incremental, and concurrent garbage collection techniques to reduce the time the main thread is paused.

Parallel garbage collection

In parallel garbage collection, the main thread and helper threads do the same work at the same time, but it is still a stop-the-world collection. Its key feature is that multiple threads collect simultaneously while making sure they never operate on the same object.

Incremental marking

In incremental marking, script execution and garbage collection alternate on the main thread. The difference from a plain stop-the-world collection is that Orinoco breaks a large garbage collection task into smaller ones.

Incremental marking does not reduce the total garbage collection time; it may even increase it slightly. But it keeps garbage collection from noticeably interfering with user interaction and the like.
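
The idea of slicing one large marking task into small steps can be sketched with a generator. This is only a model of the scheduling: incrementalMark and its budget parameter are hypothetical, and a real collector also needs write barriers because the script can change the object graph between steps, which this sketch ignores.

```javascript
// Marks a graph a few objects at a time, yielding between slices.
function* incrementalMark(roots, budget) {
  const worklist = [...roots];
  let done = 0;
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (obj.marked) continue;
    obj.marked = true;
    worklist.push(...obj.refs);
    done += 1;
    if (done % budget === 0) yield done; // pause and let the script run
  }
  return done;
}

// Build a chain of five objects: 0 -> 1 -> 2 -> 3 -> 4
const nodes = Array.from({ length: 5 }, (_, i) => ({ id: i, marked: false, refs: [] }));
for (let i = 0; i < nodes.length - 1; i++) nodes[i].refs.push(nodes[i + 1]);

const marker = incrementalMark([nodes[0]], 2);
let pauses = 0;
let step = marker.next();
while (!step.done) {
  pauses += 1; // script execution would happen here
  step = marker.next();
}

console.log(pauses);                       // 2
console.log(nodes.every((n) => n.marked)); // true
```

The total amount of marking work is the same as doing it in one go; it is just spread across several short slices so that no single pause is long.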

Concurrent marking

In concurrent marking, the main thread and the garbage collection threads run at the same time. The main thread executes JavaScript while the garbage collection threads focus on garbage collection. This approach is the most troublesome, because the state of the objects in memory can change at any moment while the main thread is running, which can invalidate earlier work. Worse, there are now read/write races: the main thread and a helper thread may well change the same object at the same time. The advantage of this approach is just as obvious: the main thread is not suspended and JavaScript can keep executing, at the cost of some synchronization overhead to make sure only one thread modifies a given object at a time.

Why do we need generations

As we said at the beginning, generational collection is not a new garbage collection strategy; it is just an optimization of the mark-and-sweep algorithm.

This optimization borrows the advantage of reference counting, prompt reclamation, by dividing the space into two parts that are collected at different frequencies and use different strategies for dealing with fragmentation. This mechanism greatly improves the efficiency of garbage collection.

At this point, we should have a general idea of JavaScript memory management. In fact, V8’s garbage collection optimizations do not stop here, and there are many more details. New-generation collection, for example, began as a single-threaded Cheney semi-space copy and has since been upgraded to a parallel Mark-Evacuate, which improves copying and collection efficiency through parallelism.

There are also the tri-color marking algorithm and the strong tri-color invariant. There are a lot of concepts; for the details, see how the V8 team explains the new garbage collection mechanism.

Let’s briefly summarize the differences between the new generation and the old generation:

New generation: short life cycle, small memory footprint

Old generation: long life cycle or large memory footprint

The difference between a memory leak and a memory overflow

Memory leak: memory allocated during a program’s execution (for a temporary variable, for example) that is not reclaimed after it is no longer used.

Memory overflow: simply put, the program requests more memory at run time than the system can provide, so the request cannot be satisfied.

The relationship between the two: too many memory leaks will eventually lead to a memory overflow.

Common memory leaks

Special closures

Closures should be familiar. Leaving the definition of closures aside, let’s first make one thing clear: only a closure that references the internal variables of its enclosing function is considered a leak-causing closure.

Let’s look at the following code:

function fn1(){
  let test = new Array(1000).fill('isboyjc')
  return function(){
    console.log('hahaha')
  }
}
let fn1Child = fn1()
fn1Child()

Obviously it’s a closure, but it doesn’t cause a memory leak. The returned function does not reference any of fn1’s internal variables, so the test variable is completely collectable. So what kind of closure causes a memory leak? Let’s look at the following code:

function fn2(){
  let test = new Array(1000).fill('isboyjc')
  return function(){
    console.log(test)
    return test
  }
}
let fn2Child = fn2()
fn2Child()

The closure above causes a memory leak because the returned function references the test variable and is itself still held by fn2Child outside fn2, so test cannot be collected.

Optimization

When we are done with a closure, we should remember to set it to null.
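
A minimal sketch of that advice follows. The names createReader and bigData are made up for this example, and when the memory is actually reclaimed is still up to the garbage collector.

```javascript
function createReader() {
  const bigData = new Array(1000).fill('isboyjc');
  // The returned function references bigData, so bigData stays reachable
  // for as long as the returned function itself is reachable.
  return function read() {
    return bigData.length;
  };
}

let reader = createReader();
const itemCount = reader();
console.log(itemCount); // 1000

// Done with the closure: drop the only reference to it.
// Nothing can reach bigData any more, so it becomes collectable.
reader = null;
```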

Implicit global variables

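function fn(){
  // test1 is never declared, so it becomes an implicit global variable
  test1 = new Array(1000).fill('isboyjc1')

  // Inside a plainly called function, this refers to window,
  // so test2 also becomes an implicit global variable
  this.test2 = new Array(1000).fill('isboyjc2')
}
fn()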

test1 is implicitly declared as a global variable, and because this refers to the global object when fn is called as a plain function in non-strict mode, test2 ends up global as well. For global variables, it is hard for the garbage collector to determine when they are no longer needed, so global variables are generally not collected.

Optimization

As above, remember to set variables to null when they are no longer needed;

Declare variables using let, const, and so on, because since ES6, variables declared with let and const are not bound to the global object window.
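
Strict mode offers an extra safety net that goes beyond the two tips above: assigning to an undeclared variable throws instead of silently creating a global. The function name leaky and the variable name undeclared are invented for this sketch.

```javascript
function leaky() {
  'use strict';
  try {
    // No let/const/var: in strict mode this assignment throws.
    undeclared = new Array(1000).fill('isboyjc1');
    return 'assigned';
  } catch (err) {
    return err.name;
  }
}

console.log(leaky());           // ReferenceError
console.log(typeof undeclared); // undefined
```

ES modules and class bodies are always in strict mode, so code written that way gets this check without the directive.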

Forgotten DOM references

A memory leak can also occur if a variable holds a reference to a large DOM object and we forget to clear it.

Optimization

As above, remember to set variables to null when they are no longer needed;

Forgotten timers

We know that timers such as setTimeout and setInterval do not go away until they are cleared. If a timer callback refers to a large object, the space occupied by that object will not be freed.

As above, remember to clear timers and set variables to null when they are no longer needed;
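
One way to make a timer hard to forget is to hand back a function that stops it. This is a sketch under assumptions: startPolling, stop, and payload are names made up for this example.

```javascript
function startPolling(bigObject) {
  const timerId = setInterval(() => {
    // The callback closes over bigObject, which keeps it alive
    // for as long as the timer is running.
    void bigObject.length;
  }, 1000);

  // Return a function that clears the timer.
  return function stop() {
    clearInterval(timerId);
  };
}

let payload = new Array(1000).fill('isboyjc');
const stopPolling = startPolling(payload);

// When the data is no longer needed:
stopPolling();  // the timer no longer holds the callback
payload = null; // and we drop our own reference as well
```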

Forgotten event listeners

Event listeners work on the same principle as timers: the listener has to be removed manually.
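
The sketch below shows the add and remove pair. It uses a bare EventTarget as a stand-in for a DOM element so that it also runs outside the browser; the event name ping and the handler onPing are made up for this example.

```javascript
const target = new EventTarget(); // stands in for a DOM element
let calls = 0;

function onPing() {
  calls += 1;
}

target.addEventListener('ping', onPing);
target.dispatchEvent(new Event('ping')); // the handler runs

// Remove the listener with the same function reference that was added.
target.removeEventListener('ping', onPing);
target.dispatchEvent(new Event('ping')); // the handler no longer runs

console.log(calls); // 1
```

Removal only works with the same function reference, so a listener added as an inline anonymous function cannot be removed this way. Keep a named reference to any handler you expect to remove.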

Uncleaned console output

While writing code, some debugging output is unavoidable, and on a small team these console calls may never be cleaned up before a project goes live. But the console is a hidden danger that is easy to overlook. We can inspect logged data in the console because the browser keeps references to the objects we log, which is why leftover console output can leak memory when it logs objects.

Optimization

Remove console.log calls from the code promptly.

Map and Set

When a Map or Set is used to store objects, it holds them by strong reference. If the entries are not removed manually, the memory will not be reclaimed automatically.

Optimization

Use WeakMap and WeakSet.
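
Here is a short comparison of the two. The variable names are illustrative, and because garbage collection timing is not observable from script, the sketch shows only what each structure requires of us, not the collection itself.

```javascript
const strongCache = new Map();
const weakCache = new WeakMap();

let user = { id: 1 };
strongCache.set(user, 'metadata');
weakCache.set(user, 'metadata');

console.log(weakCache.get(user)); // metadata

// A Map keeps its keys alive, so the entry has to be deleted by hand.
strongCache.delete(user);

// A WeakMap holds its keys weakly: once `user` is unreachable elsewhere,
// the entry becomes eligible for collection without any cleanup code.
user = null;

console.log(strongCache.size); // 0
```

WeakMap keys and WeakSet members must be objects, and neither structure can be iterated, so they suit per-object metadata rather than general-purpose storage.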

That brings us to the end of this article. If anything I wrote was helpful, please give it a thumbs up. And if you have any questions or spot any mistakes, please point them out.

Related references

“Hardcore JS” Do you really understand garbage collection

This article introduces you to JavaScript garbage collection

V8’s memory management and garbage collection mechanisms

JavaScript memory mechanism

“Hardcore JS” There may be a memory leak in your application

Orinoco: Recycling for the new generation

Talk about GC: the new Orinoco garbage collector
