3. Memory management and how to handle common memory leaks

The original article is available here and is shared under a Creative Commons Attribution 4.0 International license, by Troland. Recommended index for this chapter: 5. This chapter needs to have a superficial understanding of the internal

This is chapter 3 of How JavaScript Works.

Overview

Languages like C have low-level memory-management primitives such as malloc() and free(). Developers use these primitives to explicitly allocate memory from the operating system and release it back.

JavaScript, by contrast, allocates memory when things (objects, strings, etc.) are created and “automatically” frees it when they are no longer used, in a process called garbage collection. This seemingly automatic freeing is a source of confusion: it gives developers in JavaScript (and other high-level languages) the false impression that they can ignore memory management. That is a big mistake.

Even in high-level languages, developers should understand memory management (or at least its basics). Automatic memory management sometimes has problems (such as bugs or implementation limitations in the garbage collector), and developers must understand it in order to handle memory leaks properly (or to find a workaround with minimal cost and code debt).

Memory life cycle

Regardless of which programming language you use, the memory life cycle is almost the same:

Allocate memory – Memory is allocated by the operating system, which lets your program use it. In low-level languages such as C, this is an explicit operation that the developer must handle. In high-level languages, the language runtime takes care of it for you.
Use memory – This is the phase in which the program actually uses the previously allocated memory. Reads and writes happen whenever your code assigns to a variable or accesses its value.
Release memory – This stage frees the memory that you no longer need so that it can be reused. Like allocation, this operation is explicit in low-level languages.
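
The three phases can be seen in a few lines of JavaScript. This is only a minimal sketch: the names buffer and sum are illustrative, and the last step merely makes the array eligible for collection rather than freeing it immediately.

```javascript
// 1. Allocate: the engine reserves memory for a 4-element typed array.
let buffer = new Int32Array(4);

// 2. Use: write to and read from the allocated memory.
for (let i = 0; i < buffer.length; i++) {
  buffer[i] = i * 10;
}
const sum = buffer.reduce((total, value) => total + value, 0);
console.log(sum); // 60

// 3. Release: drop the only reference so the garbage collector
//    may reclaim the array at some later, unspecified time.
buffer = null;
```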

What is memory?

Let’s start with a brief introduction to memory and how it works.

At the hardware level, computer memory can be thought of as a large number of flip-flops. Each flip-flop contains a few transistors and can store one bit. Individual flip-flops are addressable by a unique identifier, so they can be read and overwritten. Think of 100 boxes: to manage them, you give each box an index, and that index is its address. The operating system works with memory through these addresses. In C, you can print the address of a variable, usually as a hexadecimal number such as 0x0240FF5C. The operating system keeps track of physical memory in a memory-mapping table. (Modern operating systems also use an additional virtual-memory mapping table, which can, for example, make 4 gigabytes of physical memory look like 8 gigabytes.)

Memory stores a lot of things:

All variables and all other data used by the program.
Program code, including operating system code.

The compiler and the operating system work together to handle most of the memory management for you, but it is still worth looking at what happens under the hood. Memory space is mainly divided into two areas: the heap and the stack. Programming languages have primitive data types, such as integers, floating-point numbers, and Booleans, whose memory footprint is fixed and known in advance. For example, an int is typically 4 bytes. Objects and strings live on the heap.

When compiling code, the compiler examines the primitive data types and calculates in advance how much memory they will need. The required amount is then allocated to the program in the stack space. It is called stack space because, when a function is called, the memory it needs is added on top of the existing memory. When functions return, their memory is removed in LIFO (last in, first out) order. For example:

int n; // 4 bytes
int x[4]; // array of 4 elements, each 4 bytes
double m; // 8 bytes

The compiler immediately calculates how much memory the code needs: 4 + 4 x 4 + 8 = 28 bytes.
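
The same arithmetic can be checked in JavaScript with typed arrays, whose element sizes are fixed by the language. This sketch assumes, as the C example does, a 4-byte int and an 8-byte double; the constant names are illustrative.

```javascript
// Element sizes in bytes, as defined for JavaScript typed arrays.
const INT_SIZE = Int32Array.BYTES_PER_ELEMENT;      // 4
const DOUBLE_SIZE = Float64Array.BYTES_PER_ELEMENT; // 8

// int n; int x[4]; double m;
const totalBytes = INT_SIZE + 4 * INT_SIZE + DOUBLE_SIZE;
console.log(totalBytes); // 28
```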

This is how it works with the current sizes for integers and floating-point numbers. About 20 years ago, integers were typically 2 bytes and floating-point numbers 4 bytes. Your code should never depend on the byte size that the basic data types happen to have at the moment.

The compiler inserts code that interacts with the operating system to request the necessary number of bytes on the stack for your variables.

In the above example, the compiler knows the exact memory location of each variable. When you write the variable n, internally it will be converted to something like “memory address 412763”.

Note that x only has the elements x[0] through x[3], so an attempt to access x[4] may end up accessing the data of the variable m. For an array, the address of x[0] is taken first, and the address of x[4] is then calculated from the index and the element type. Here the index is 4 and the elements are ints, so the address of x[4] = the address of x[0] + 4 × 4 bytes. That location is past the end of the array, and reading it may return some of the bits of m (writing to it may corrupt m). In C, this is known as memory stomping, and it is bound to cause problems. Java and C# check array bounds and raise an exception at run time when such an access happens. C and C++ require no such check, so the code typically compiles and runs without any error being reported.
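
JavaScript does not expose raw addresses, but the offset arithmetic can be imitated with a DataView over a 28-byte ArrayBuffer. The layout below (n at byte 0, x at bytes 4 to 19, m at bytes 20 to 27) is an assumption made to mirror the C example, not a statement about how any real compiler places these variables.

```javascript
// Assumed layout: int n at byte 0, int x[4] at byte 4, double m at byte 20.
const memory = new DataView(new ArrayBuffer(28));
const X_OFFSET = 4;
const M_OFFSET = 20;
const INT_SIZE = 4;

// Address of x[i] = address of x[0] + i * size of an int.
const addressOfX = (i) => X_OFFSET + i * INT_SIZE;

memory.setFloat64(M_OFFSET, 1.5, true);
console.log(addressOfX(4) === M_OFFSET); // true: "x[4]" lands on m

// Writing "x[4]" overwrites the first four bytes of m.
memory.setInt32(addressOfX(4), 12345, true);
const corruptedM = memory.getFloat64(M_OFFSET, true);
console.log(corruptedM === 1.5); // false: m has been stomped

// A real JavaScript typed array does not read past its end.
const x = new Int32Array(4);
console.log(x[4]); // undefined
```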

When functions call other functions, each one gets its own chunk of the stack when it is called. It keeps all its local variables there, together with a program counter that remembers where execution should resume. When the function finishes, its memory block becomes available again for other purposes.

Dynamic memory allocation

Finding out how much memory a variable requires at compile time is not as easy as it might seem. Imagine doing something like this:

int n = readInput(); // read input from the user
...
// create an array with n elements

Here, the compiler does not know at compile time how much memory the array will need, because that is determined by the value the user enters.

It therefore cannot allocate room for the array on the stack. Instead, the program must explicitly ask the operating system for the right amount of space at run time. This is dynamic allocation, and the memory comes from the heap. The difference between static and dynamic memory allocation is summarized in the chart below:

The difference between static and dynamic memory allocation

To fully understand how dynamic memory allocation works, we would need to spend some time on pointers.
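
In JavaScript, every allocation of this kind happens at run time. The sketch below uses a hypothetical readInput stand-in that returns a number known only when the program runs, and sizes an array from it.

```javascript
// Hypothetical stand-in for user input; the value is not known
// until the program runs.
function readInput() {
  return 3 + Math.floor(Math.random() * 5); // an integer from 3 to 7
}

const n = readInput();

// The engine allocates room for n 32-bit integers, sized by the
// run-time value rather than by anything known ahead of time.
const values = new Int32Array(n);
console.log(values.length === n);         // true
console.log(values.byteLength === n * 4); // true
```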

Memory allocation in JavaScript

Now we will look at how the first step (allocating memory) works in JavaScript.

JavaScript relieves developers of handling memory allocation: it allocates memory automatically when values are declared.

var n = 374; // allocates memory for a number
var s = 'sessionstack'; // allocates memory for a string
var o = {
  a: 1,
  b: null
}; // allocates memory for an object and the values it contains
var a = [1, null, 'str']; // (like an object) allocates memory for the array and its values
function f(a) {
  return a + 3;
} // allocates a function (which is a callable object)

// function expressions also allocate an object
someElement.addEventListener('click', function() {
  someElement.style.backgroundColor = 'blue';
}, false);

Some function calls also allocate a new object:

var d = new Date(); // allocates a Date object
var e = document.createElement('div'); // allocates a DOM element

Methods can also allocate new values or objects:

var s1 = 'sessionstack';
var s2 = s1.substr(0, 3); // s2 is a new string
// Since strings are immutable, JavaScript may choose
// not to allocate memory and just store the [0, 3] range.
var a1 = ['str1', 'str2'];
var a2 = ['str3', 'str4'];
var a3 = a1.concat(a2);
// new array with 4 elements: the elements of a1 followed by those of a2

Memory usage in JavaScript

The use of allocated memory in JavaScript refers primarily to memory reads and writes.

This happens when you read or write the value of a variable or an object property, or pass an argument to a function.

Free memory that is no longer used

Most memory-management problems occur at this stage.

The hardest part is figuring out when allocated memory is no longer needed. It often requires the developer to determine where in the program a piece of memory is no longer in use and to free it.

High-level languages embed a piece of software called a garbage collector, whose job is to track memory allocation and usage so that memory which is no longer needed can be found and freed automatically.

Unfortunately, this process is an approximation, because the general problem of knowing whether a piece of memory is still needed is undecidable (it cannot be solved by an algorithm).

Most garbage collectors work by collecting memory that can no longer be accessed, for example when all variables pointing to it have gone out of scope. That is, however, an under-approximation of the memory that could be collected, because a memory location may still have a variable in scope pointing to it even though it will never be accessed again.

Memory garbage collection

Because finding memory that is “no longer used” is undecidable, garbage collectors implement a restricted solution to the general problem. This section explains the concepts needed to understand the main garbage collection algorithms and their limitations.

Memory references

Reference is one of the main concepts on which memory garbage collection algorithms depend.

In the context of memory management, an object A is said to reference another object B if A has access to B (either implicitly or explicitly). For example, a JavaScript object has a reference to its prototype (an implicit reference) and to its property values (explicit references).

In this context, the concept of “object” is broader than regular JavaScript objects and also includes function scopes (or the global lexical scope).

Lexical scoping defines how variable names are resolved in nested functions. An inner function retains access to the scope of its parent function, even after the parent function has returned.
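
A short sketch of this behaviour: the hypothetical makeCounter below returns an inner function that keeps using count from the parent scope after the parent has returned. That retained scope is itself memory that the garbage collector must treat as referenced.

```javascript
function makeCounter() {
  let count = 0; // lives in makeCounter's scope

  // The inner function references the parent scope.
  return function increment() {
    count += 1;
    return count;
  };
}

const counter = makeCounter(); // makeCounter has already returned here
console.log(counter()); // 1
console.log(counter()); // 2
```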

Reference-counting garbage collection

This is the simplest garbage collection algorithm. An object is considered “garbage collectable” when zero references point to it.

Take a look at this code:

var o1 = {
  o2: {
    x: 1
  }
};
// Two objects are created.
// 'o2' is referenced by the 'o1' object as one of its properties.
// Neither can be garbage collected.

var o3 = o1; // 'o3' is the second variable that references
             // the object pointed to by 'o1'.

o1 = 1; // The object originally in 'o1' now has a single
        // reference, held by the 'o3' variable.

var o4 = o3.o2; // Reference to the 'o2' property of the object.
                // This object now has two references: one as
                // a property, the other as the 'o4' variable.

o3 = '374'; // The object originally in 'o1' now has zero
            // references and can be garbage collected.
            // However, what was its 'o2' property is still
            // referenced by 'o4', so it cannot be freed.

o4 = null; // What was the 'o2' property of the object originally
           // in 'o1' now has zero references.
           // It can be garbage collected.

Circular references are a hassle

There is a limitation when it comes to cycles. In the following example, two objects are created that reference each other, which creates a cycle. They go out of scope after the function call, so they are effectively useless and their memory could be freed. However, the reference-counting algorithm considers that, since each of the two objects is referenced at least once, neither can be garbage collected.

function f() {
  var o1 = {};
  var o2 = {};
  o1.p = o2; // o1 references o2
  o2.p = o1; // o2 references o1. This creates a cycle.
}

f();
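
A toy reference counter makes the limitation concrete. This is a simplified model with explicit, hypothetical retain and release helpers, not how any JavaScript engine is implemented: an acyclic pair is freed once its last outside reference goes away, while a cyclic pair stays at a count of 1 forever.

```javascript
// Toy model: each tracked object carries an explicit reference count.
class Counted {
  constructor(name) {
    this.name = name;
    this.refCount = 0;
    this.fields = []; // other Counted objects this one references
    this.freed = false;
  }
}

function retain(obj) {
  obj.refCount += 1;
  return obj;
}

function release(obj) {
  obj.refCount -= 1;
  if (obj.refCount === 0) {
    obj.freed = true;
    // Freeing an object drops the references it held.
    obj.fields.forEach(release);
    obj.fields = [];
  }
}

function link(from, to) {
  from.fields.push(retain(to));
}

// Acyclic case: parent -> child.
const parent = retain(new Counted('parent')); // held by a variable
const child = new Counted('child');
link(parent, child);
release(parent); // the variable goes away
console.log(parent.freed, child.freed); // true true

// Cyclic case: a <-> b, as in f() above.
const a = retain(new Counted('a')); // held by variable o1
const b = retain(new Counted('b')); // held by variable o2
link(a, b);
link(b, a);
release(a); // f() returns: both variables go away
release(b);
console.log(a.refCount, b.refCount); // 1 1
console.log(a.freed, b.freed);       // false false
```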

Mark-and-sweep algorithm

To decide whether an object is still needed, this algorithm determines whether the object is reachable.

The mark-and-sweep algorithm consists of three steps:

Find the roots: In general, roots are global variables that are referenced in the code. In JavaScript, for example, the window object is a global that can act as a root. The corresponding object in Node.js is called “global”. The garbage collector builds a complete list of all roots.
The algorithm then inspects all roots and their descendants and marks them as active (meaning they are not garbage). Anything that cannot be reached from a root is considered garbage.
Finally, the garbage collector frees all memory fragments that are not marked as active and returns them to the operating system.
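
The three steps can be sketched over a toy heap. Everything here (allocate, mark, sweep, collect, and the refs arrays) is a hypothetical model written for illustration; real engines are far more elaborate. Note that the two objects referencing each other are collected, because neither is reachable from the root.

```javascript
// Toy heap: every object is registered so the sweep can visit it.
const heap = new Set();

function allocate(name) {
  const obj = { name, refs: [], marked: false };
  heap.add(obj);
  return obj;
}

// Step 2: mark everything reachable from the roots.
function mark(roots) {
  const pending = [...roots];
  while (pending.length > 0) {
    const obj = pending.pop();
    if (obj.marked) continue; // already visited; also stops on cycles
    obj.marked = true;
    pending.push(...obj.refs);
  }
}

// Step 3: free whatever was not marked, then reset the marks.
function sweep() {
  const freed = [];
  for (const obj of heap) {
    if (obj.marked) {
      obj.marked = false;
    } else {
      heap.delete(obj);
      freed.push(obj.name);
    }
  }
  return freed;
}

function collect(roots) {
  mark(roots);
  return sweep();
}

// Step 1: the roots. Here, a single stand-in for the global object.
const globalRoot = allocate('global');
const kept = allocate('kept');
globalRoot.refs.push(kept);

// Two objects that reference each other but hang off no root.
const o1 = allocate('o1');
const o2 = allocate('o2');
o1.refs.push(o2);
o2.refs.push(o1);

const freedNames = collect([globalRoot]);
console.log(freedNames); // [ 'o1', 'o2' ]
console.log(heap.size);  // 2
```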

This algorithm is better than the previous one because an object with zero references is always unreachable. The opposite is not true, as we saw with cycles.

As of 2012, all modern browsers ship a mark-and-sweep garbage collector. The improvements made to JavaScript garbage collection over the years (generational, incremental, concurrent, and parallel collection) are improvements to the implementation of this algorithm, not to the algorithm itself or to its goal of deciding whether an object is reachable.

You can check out this article for more details on tracing garbage collection, including mark-and-sweep and its optimizations.

Circular references no longer hurt

In the cycle example above, after the function call returns, the two objects are no longer referenced by anything reachable from the global object. As a result, the garbage collector finds them unreachable.

Even though the two objects reference each other, they cannot be reached from a root.

The counterintuitive behavior of the memory garbage collector

Although garbage collectors are convenient, they come with their own trade-offs. One of them is non-determinism: garbage collection is unpredictable, and you cannot tell exactly when a collection will run. This means that in some cases the program uses more memory than it actually needs. Collection can also pause execution, and the pauses may be noticeable in particularly sensitive applications. Even so, most GC implementations share the pattern of running collection passes during allocation. If no allocations are performed, most collectors stay idle. Consider the following scenario:

A sizable set of allocations is performed.
Most (or all) of these elements become unreachable (suppose we assign null to a reference to a cache we no longer need).
No further allocations are performed.

In this scenario, most collectors will not run any further collection passes. In other words, even though there are unreachable objects available for collection, the collector does not reclaim them. This is not strictly a leak, but it still results in higher-than-usual memory usage.

What is a memory leak?

As the name suggests, a memory leak is a piece of memory that the application used in the past and no longer needs, but that has not been returned to the operating system or to the pool of free memory.

Programming languages favor different approaches to memory management, but only the developer really knows whether a piece of memory can be returned to the operating system.

Some programming languages provide features that help developers with this. Others expect developers to be completely explicit about when a piece of memory is no longer in use. Wikipedia has good articles on manual and automatic memory management.

Four common JavaScript memory leaks

1: Global variables

JavaScript handles undeclared variables in an interesting way: when a value is assigned to an undeclared variable (in non-strict code), a new variable is created on the global object. In the browser, the global object is window, which means that the following code:

function foo(arg) {
  bar = "some text";
}

is equivalent to:

function foo(arg) {
  window.bar = "some text";
}

Suppose bar is meant to be referenced only inside the foo function. If you do not declare it with var, however, a redundant global variable is created. In the example above this does not do much harm, but you can certainly imagine a more damaging scenario.

You can also inadvertently create a global variable using the this keyword.

function foo() {
  this.var1 = "potential accidental global";
}

// When foo is called on its own, this points to the global object
// (window) rather than being undefined.
foo();

You can prevent the creation of unexpected global variables by adding ‘use strict’; at the top of the JavaScript file, which switches to a stricter mode of parsing JavaScript.
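
The effect of strict mode on both cases can be sketched as follows. The directive is placed inside each function here so that the example behaves the same wherever it is pasted; the function and variable names are illustrative.

```javascript
function strictAssign() {
  'use strict';
  try {
    undeclaredInStrictMode = 'some text'; // not declared anywhere
    return 'created a global';
  } catch (error) {
    return error.name;
  }
}

function strictThis() {
  'use strict';
  return this; // undefined when called as a plain function
}

console.log(strictAssign()); // ReferenceError
console.log(strictThis());   // undefined
```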

Unexpected globals are certainly a problem, but more often code is littered with explicitly defined globals, which by definition are always reachable and so are never collected by the garbage collector. Pay special attention to global variables used to temporarily store and process large amounts of information. If you must use a global variable for this, assign null to it or reassign it once you are done with it.

2: Timers and forgotten callbacks

setInterval is often used in JavaScript, so let’s take it as an example. Libraries that provide observers and other tools that accept callbacks usually make sure that all references to the callbacks become unreachable once their own instances are unreachable. Code like the following, however, is not uncommon:

var serverData = loadData();
setInterval(function() {
  var renderer = document.getElementById('renderer');
  if (renderer) {
    renderer.innerHTML = JSON.stringify(serverData);
  }
}, 5000); // This will be executed every 5 seconds or so.

The renderer object may be replaced or removed at some point, which makes the whole block encapsulated by the interval handler redundant. However, because the interval is still active, neither the handler nor its dependencies can be collected; the timer must be stopped first. As a result, serverData, which holds and processes the loaded data, is not collected either.
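
One way to avoid this is to have the handler cancel its own timer once the target is gone. The sketch below is a hypothetical helper, not part of any library: startRendering, getTarget, render, and the injectable timers object are all names introduced for illustration. The timer functions are injectable only so that the demo can run synchronously with a fake timer.

```javascript
// Hypothetical helper: renders on every tick until the target
// disappears, then cancels the interval.
function startRendering(getTarget, render, intervalMs, timers = {
  set: (callback, ms) => setInterval(callback, ms),
  clear: (id) => clearInterval(id),
}) {
  const id = timers.set(function tick() {
    const target = getTarget();
    if (!target) {
      // The node is gone: cancel the timer so this callback and
      // everything it references can be collected.
      timers.clear(id);
      return;
    }
    render(target);
  }, intervalMs);
  return id;
}

// Demo with a fake timer so it runs synchronously anywhere.
let scheduled = null;
let cleared = false;
const fakeTimers = {
  set: (callback) => { scheduled = callback; return 1; },
  clear: () => { cleared = true; },
};
let node = { innerHTML: '' };        // stand-in for a DOM node
const serverData = { status: 'ok' }; // stand-in for loaded data

startRendering(
  () => node,
  (el) => { el.innerHTML = JSON.stringify(serverData); },
  5000,
  fakeTimers
);

scheduled(); // first tick: the node exists, so it is rendered
const rendered = node.innerHTML;
console.log(rendered); // {"status":"ok"}

node = null; // the node is removed
scheduled(); // next tick: the timer cancels itself
console.log(cleared); // true
```

In a browser, getTarget would typically be something like a function that returns document.getElementById('renderer'), and the default timers would be used.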

When using observers, make sure you explicitly remove them once you are done with them (either the observer is no longer needed or the observed object is about to become unreachable). Fortunately, most modern browsers do this job for you: they automatically collect the observer handlers once the observed object becomes unreachable, even if you forgot to remove the listener.

Still, it is good practice to remove observers once the object becomes obsolete:

var element = document.getElementById('launch-button');
var counter = 0;

function onClick(event) {
  counter++;
  element.innerHTML = 'text' + counter;
}

element.addEventListener('click', onClick);

// Do stuff

element.removeEventListener('click', onClick);
element.parentNode.removeChild(element);

// Now, when element goes out of scope, both element and onClick
// will be collected, even in old browsers that do not handle
// cycles well.

You no longer need to call removeEventListener before making a DOM node unreachable, because modern browsers have garbage collectors that can detect these cycles and handle them appropriately.

If you use the jQuery APIs (other libraries and frameworks support this too), you can also have the listeners removed before a node is discarded. jQuery also ensures that there are no memory leaks even in older browsers.

3: Closures

Closures are an important feature of JavaScript: an inner function can access the variables of its outer (enclosing) function. Because of the way JavaScript runtimes implement closures, it is possible to leak memory like this:

var theThing = null;
var replaceThing = function () {
  var originalThing = theThing;
  var unused = function () {
    if (originalThing) // a reference to originalThing
      console.log("hi");
  };
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log("message");
    }
  };
};
setInterval(replaceThing, 1000);

Each time replaceThing is called, theThing gets a new object that consists of a large string and a new closure (someMethod). originalThing is referenced by the closure held in the variable unused (originalThing is the value theThing had before this call to replaceThing). Keep in mind that once a scope has been created for closures in the same parent scope, that scope is shared between them.

In this case, the scope created for the closure someMethod is shared with unused, and unused references originalThing. Even though unused is never called, someMethod can be used through theThing outside the scope of replaceThing. Because someMethod and unused share the closure scope, the reference that unused holds to originalThing forces originalThing to stay alive (the whole scope is shared between the two closures). This prevents it from being garbage collected.

All of this can amount to a considerable memory leak. If you run the snippet above repeatedly, you can expect to see memory usage climb, and it does not shrink when the garbage collector runs. A linked list of closures is created (its root is the variable theThing), and each closure scope carries an indirect reference to the large string.
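
A commonly suggested workaround is to clear originalThing at the end of replaceThing, so that the shared closure scope no longer points at the previous object. The sketch below applies that change; it calls replaceThing in a loop instead of through setInterval only so that the example terminates.

```javascript
let theThing = null;

const replaceThing = function () {
  let originalThing = theThing;
  const unused = function () {
    if (originalThing) console.log('hi');
  };
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log('message');
    },
  };
  // Break the chain: the shared closure scope no longer points
  // at the previous theThing, so that object can be collected.
  originalThing = null;
};

// Each call replaces theThing; the previous one is now unreferenced.
for (let i = 0; i < 3; i++) {
  replaceThing();
}
console.log(theThing.longStr.length); // 999999
```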

The problem was found by the Meteor team, who wrote a very good article describing it in detail.

4: Out-of-DOM references

Sometimes developers store DOM nodes in data structures.

Suppose you want to quickly update the contents of several rows in a table. If you store a reference to each row in a dictionary or an array, there will be two references to the same DOM element: one in the DOM tree and one in the dictionary. If you decide to get rid of these rows, you need to remember to make both references unreachable.

var elements = {
  button: document.getElementById('button'),
  image: document.getElementById('image')
};

function doStuff() {
  elements.image.src = 'http://example.com/image_name.png';
}

function removeImage() {
  // The image is a direct child of the body element.
  document.body.removeChild(document.getElementById('image'));
  // At this point, the global elements object still holds a
  // reference to the image. In other words, the image element is
  // still in memory and cannot be collected by the garbage collector.
}

There is an additional consideration when referencing inner or leaf nodes of a DOM tree. If your code keeps a reference to a table cell and you later remove the table from the DOM, the reference to the cell remains. You might expect the garbage collector to free everything except that cell, but that is not what happens. Because the cell is a descendant of the table and child nodes keep references to their parents, this single reference to a cell keeps the entire table in memory.
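
One way to keep the two references in step is to detach the node and drop the cached entry in the same function. The sketch below is an illustration under assumptions: track and removeTracked are hypothetical helpers, and plain objects with a remove() method stand in for DOM elements so that it runs outside a browser (real elements provide remove() as well).

```javascript
// Cache of nodes by name.
const elements = new Map();

function track(name, node) {
  elements.set(name, node);
}

// Detach the node and drop the cached reference in one step, so no
// stale entry keeps the node (and its ancestors) alive.
function removeTracked(name) {
  const node = elements.get(name);
  if (!node) return false;
  node.remove();
  elements.delete(name);
  return true;
}

// Usage with stand-in nodes.
let detached = 0;
const fakeNode = () => ({ remove() { detached += 1; } });
track('button', fakeNode());
track('image', fakeNode());

removeTracked('image');
console.log(elements.has('image'));  // false
console.log(elements.has('button')); // true
console.log(detached);               // 1
```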