preface

Arrays are one of the most common data types in daily front-end development, but do you really know anything about arrays? How is array implemented in JS? Through this article, you will know:

  • The difference between a JS array and a traditional array
  • What optimizations did the V8 engine make for “traditional” arrays
  • Fast and slow arrays
  • ArrayBuffer

What is an array?

Array (Array)

An array is a data structure consisting of a collection of elements of the same type, allocated a contiguous chunk of memory for storage.

Notice that there are two key words here: same type, continuous memory, which is a necessary and sufficient condition for being a “true array”! Okay, here’s the point:

let arr = [100, 'foo', 1.1, {a: 1}];
Copy the code

The above code represents a regular array in JS, so why can array elements coexist with multiple types? Wikipedia’s description of arrays should have some authority. Isn’t the JS array really an array?

∵ Storage space required by different data types varies

The memory addresses used to store arrays in PC/PC must not be contiguous (unless of the same type)

So we venture to guess that the array implementation in JS must not be implemented by the underlying data structure! So there are no “real” arrays in JS! This makes me curious, so how does JS “implement” the concept of arrays? Let’s find out!

Concept 1 in arrays: continuous memory

Before we talk about continuous memory, let’s understand what memory is, and we’ll bypass it in this section.

1) What is memory?

Generally speaking, in a computer, the CPU is used to calculate data, and the data comes from the hard disk. However, considering the low efficiency of the CPU reading and writing data directly from the hard disk, the memory plays the role of “porter”.

Memory is made up of DRAM chips. The internal structure of DRAM can be said to be the simplest in THE PC chip, is composed of many repeated “units” – cell, each cell is composed of a capacitor and a transistor (generally N channel MOSFET), capacitor can store 1bit data amount, charge charge after discharge (potential level) corresponding to binary data 0 and 1 respectively.

Due to its simple structure, DRAM can be achieved with a small area and a large storage capacity. Using chips to store data for a short time is much more efficient than reading and writing data on disks. So the operation of memory also determines the stable operation of the computer.

2) The memory and array story

Now that we know what memory is, let’s go back to the concept of arrays:

An array is a data structure consisting of a collection of elements of the same type, allocated a contiguous chunk of memory for storage.

Contiguous memory in an array means that a finite set of data structures of the same type can be stored by delineating a series of contiguous, fixed-length Spaces in memory. Arrays are implemented this way in C/C++, Java and other compiled languages. C++ array declaration example code is as follows:

// Array name [element number] double balance[10];Copy the code

In the code above, it is important to specify the element type and number. Savor the contents contained in it, and connect it to memory and how the computer works.

Concept 2 in arrays: fixed length

It is easy to understand why computer language designers want compiled languages such as C/C++ and Java to design arrays of fixed length. Since the number of arrays is continuous, this means that you need a whole block of memory to store the array. If the length is not fixed, then the area of memory after the array cannot be allocated further down! Memory doesn’t know whether or not to store the current array, how much space. After all, the space behind the array has to be set aside for someone else to use. You can’t just hog it all the time.

Concept 3 in arrays: Same data type

Why does the definition of an array require each element to have the same data type? If an array allows various types of data, each element is boxed and each element read is unboxed. Unifying data types eliminates the need for boxing and unboxing, which improves storage and reading efficiency.

V8 engine array implementation

Writing in the front

First, we need to understand how JS code is executed on the computer. Like Python, it is an interpreted language that requires the host environment to “convert” it, and then the machine runs machine code, or binary. Most of our discussion of JS is (by default) browser-based. Most of the major browsers are based on C++, and Node is based on Chrome V8’s javascript environment, which is also based on C++. So some developers think JavaScript is written in C++, which is not true.

As an interpreted language, JS does not have to be C++ to parse JS.

  • D: DMDScript
  • Java: Rhino, Nashorn, DynJS, Truffle/JS, etc
  • C# : Managed JScript, SPUR, etc

Then there’s Rust: Deno (also based on V8), which is still popular. So, we want to study the implementation of JS array will rely on “explain” his JS engine. For this reason, this article uses the V8 engine to explain the array in JS.

V8 source JS array

To track how JS implements arrays, let’s trace back to V8 to see how it “parses” ARRAYS. The following screenshot is from V8 source code:

  • 1.JSArrayArrays inherit fromJSObjectobject
  • 2. Arrays have fast and slow modes

So let’s talk about it.

JS arrays are objects

If the array in JS is an object, then we can explain why the array in JS can put various types. Assuming we’re right, how do we test this? For this purpose, I have spent some time recently to explore, read a lot of information on the Internet, but also find a lot of methods, and finally lock the use of bytecode by observing the JS final execution to see the final code conversion. Finally, we chose jsvu of GoogleChromeLabs, which can help us install various JS engines and convert JS to bytecode.

Test code:

const arr = [1, true.'foo'];
Copy the code

Then use the V8-Debug engine to debug and print the bytecode he translates, as shown below:

Fast and slow arrays

If you are careful, you may have noticed that there is a description in the V8 source code comment in the previous screenshot:

Such an array can be in one of two modes:
- fast, backing storage is a FixedArray and length <= elements.length();
- slow, backing storage is a HashTable with numbers as keys.
Copy the code

An array contains two modes:

  • Fast (mode) : The backup store is a FixedArray of length <= elements. Length
  • Slow (mode) : The backup store is a HashTable with numeric keys

So think about why V8 “designed” arrays this way, and what was the motivation? It’s all about performance, and when you talk about performance, you have to talk about memory, and it’s all about:

Sacrifice performance to save memory, sacrifice memory to improve performance

This is a game of time for space, space for time, and finally see which is “cost-effective” (reasonable).

Fast array

Let’s start with a fast array. A fast array is a linear storage with a variable length that allows you to dynamically adjust the storage space. There are expansion and contraction mechanisms inside, so take a look at the implementation of expansion in V8. Source code (C++) :

./src/objects/js-objects.h

New capacity = Old capacity + 50% + 16

Because JS arrays are dynamically variable, this arrangement of fixed space is bound to cause memory space depletion. The array is then copied to the new memory space after expansion:

Shrink implementation source code (C++) :

Holes

In summary: V8 uses fast arrays to achieve contiguous memory space (increase memory to improve performance), but JS is a weakly typed language, the space cannot be fixed, so the array length is used as a basis for the underlying memory reallocation.

Slow array

The underlying implementation of slow arrays uses the HashTable HashTable. Compared with fast arrays, it does not open up a large contiguous space, thus saving memory, but it is undoubtedly less efficient than fast arrays (time for space). Code (C++) :

Conversion between fast and slow arrays

JS is not fixed in length and type, so we will do the appropriate conversion where appropriate to expect it to improve performance in the way that best suits the current array. Corresponding source code:

true
3 x Capacity after expansion x 2 <= New capacity

JSObjects favors dictionary elements over fast (mode) elements if they save a lot of memory. static const uint32_t kPreferFastElementsSizeFactor = 3;Copy the code

The second red box indicates the index – the current capacity >= 1024(the value of kMaxGap) will also change from the fast array to the slow array. Where kMaxGap is a tentative constant used to control fast and slow array conversion, declared in the source code as follows:

Static const Uint32_t kMaxGap = 1024; //Copy the code

That is, if the current array is reassigned much more than it needs +1024, it will be a waste of internal slave, so it will be a slow array. Let’s verify:

Example code 1:

let arr = [1];
Copy the code

%DebugPrint(ARR)

arr[1024] = 2;
Copy the code

%DebugPrint(ARR)

Ok, the verification is successful. Now let’s see how to go from a slow array to a fast array.

As you can see from the above source comments, the condition for fast arrays to slow arrays is: the fast array saves only 50% of the space, the slow array (Dictionary). Let’s continue to verify:

let arr = [1];
arr[1025] = 1;
Copy the code

The array declared above uses a slow Dictionary, as shown in the screenshot below

for(leti=500; i<1024; i++){ arr[i]=1; }Copy the code

The results are as follows:

HOLEY_SMI_ELEMENTS
Fast Holey Elements
Reasonable use of memory

new ArrayBuffer

Talk about really how much, nothing more than in said JS because of the language “characteristics” in the array implementation have some performance problems, so in order to solve this problem V8 engine introduced the concept of continuous array, this is in the JS code translation layer optimization, so there are other ways?

Of course there is, and that’s the ArrayBuffer in ES6. An ArrayBuffer object is used to represent a generic, fixed-length buffer of raw binary data, which is an array of bytes. An ArrayBuffer can be used to obtain a contiguous binary region in operating system memory. This area is then used by JS.

// create an ArrayBuffer with a size inbytes const buffer = new ArrayBuffer(8); // Input parameter 8 is length console.log(buffer.bytelength);Copy the code

It is important to note that the ArrayBuffer itself does not have the ability to manipulate bytes. Such as:

let buffer = new ArrayBuffer(3);
let view = new DataView(buffer);
view.setInt8(0, 8)
Copy the code

More details are not covered in this article, please find out for yourself.

The end.