This article is the second in a series that delve deeper into immutable.

The Implementation of IMmutable.js

This article explores the implementation mechanism of IMmutable


In the last article, we studied the basic implementation principle of Immutable. Js persistent data structure, introduced its core data structure Vector Trie, and explored the bit partitioning mechanism. The fundamental reason for using bit-partitioning is to optimize speed, but what does Immutable. Js do for space? So let’s talk about that.

HAMT

HAMT is hash Array Mapped Trie. The basic principle is very similar to Vector Trie, but it compresses the tree to save some space. Immutable. Js is a reference to HAMT for tree height and internal node compression.

Tree high compression

Let’s say we have a 2 xVector Trie, now a value is saved, key is110(binary), it will be saved to0 1 1Under this path, the figure is as follows:



Obviously, the structure shown in the figure has been optimized to the simplest extent, since only one value is now stored, so the value of the110Extraneous nodes are removed. What else can be optimized? We found that the two nodes in the middle can also be removed, as shown in the figure below:



To get this value, we first get it from0If I look it up, I realize that this is a root node, so I just take the value that it stores. This means that we can identify the key in as few bits as possible without causing confusion. This gives us a high degree of compression, reducing both space and searching and modifying time.

If you want to add a value, its key ends as well0What should I do? It’s very simple, as shown below:



We simply add or subtract nodes as needed.

Internal compression of a node -Bitmap

As we mentioned in the previous article, Immutable. Js Trie has a node array that is 32 in length. However, in many cases, these 32 positions are mostly unused, and such a large array takes up a lot of space. useBitmap, we can compress the array.

Let’s start with an array of length 8:



We’re actually indexing the key with the index of the array, so bits 5, 6, and 7 are useless. What about bits 0, 2, and 3? Do we need to maintain an array of length 5 for an index of 4? We just need to specify the number of positions with subscripts 1 and 4 in the imaginary array. And that’s where it comes inbitmapThat is as follows:



We use a binary representation of a placeholder in an imaginary array of length 8, where 1 indicates that the corresponding subscript position in the array has a value and 0 indicates that the corresponding subscript position is empty. For example, if the fourth bit of the binary number (from right to left, counting from 0) is now 1, it means that the index 4 of the array has a value. So an array of length 8 can be compressed down to 2.

Note the or the elements of an array according to the order of “imaginary array”, so if we want to take the subscript to the I element in the array “hypothetical”, the first is to determine the position have any value, if any, the next step is to get before it has several elements, namely in the binary number how many people are there in the first before I bit is 1, assume that number for a, The subscript of this element in the current compressed array is a.

In the actual operation, we can passbitmap & (1 << i - 1)In the binary number, only the bits before the ith bit are 1, and the rest are all 0. Now we only need to count the number of 1’s in the binary number to get the subscript. The process of counting 1’s in binary numbers is calledpopcount, there are many specific algorithms, I do not know more about the expansion, after the first click is the address of wikipedia, interested can study.

Let’s take a look at this part of the source:

get(shift, keyHash, key, notSetValue) {
  if (keyHash === undefined) {
    keyHash = hash(key);
  }
  const bit = 1 << ((shift === 0 ? keyHash : keyHash >>> shift) & MASK);
  const bitmap = this.bitmap;
  return (bitmap & bit) === 0
    ? notSetValue
    : this.nodes[popCount(bitmap & (bit - 1))].get(
        shift + SHIFT,
        keyHash,
        key,
        notSetValue
      );
}Copy the code

popCount


function popCount(x) {
  x -= (x >> 1) & 0x55555555;
  x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
  x = (x + (x >> 4)) & 0x0f0f0f0f;
  x += x >> 8;
  x += x >> 16;
  return x & 0x7f;
}Copy the code

Why 32

In the previous article we mentioned that Immutable. Js Vector Trie uses 32 as the length of the array, and explained why it uses 32 as the length of the arrayA partition, the number can only be an integer power of 2, so it cannot be 31, 33, etc. But what about 8, 16, 64, and so on? This is based on actual testing, as shown below:



Here are the search and update times, respectively. Does it look like 8 or 16 would be better? 32 is selected for Immutable because searches are much more frequent than updates.

review

Now we can understand the screenshot at the beginning of the first article:





We can see that there are three main types of nodes in the map:

  • HashArrayMapNode, the number of child nodes is > 16, and the array length is 32
  • BitmapIndexedNode, the number of child nodes ≤16, the length of the array is the same as the number of child nodes, compressed by bitmap
  • ValueNode, leaf nodes that store keys and values

In addition, each node seems to have an ownerID attribute. What does that do? It involves mutable data structures in Immutable. Js.

Transient

Data structures in Immutable. Js have two forms, “Immutable” and “mutable.” While “Immutable” is a major advantage of Immutable. Js, operations in the “mutable” form are certainly more efficient. Sometimes for a set of operations, we only need to get the state after that set of operations, and it would be unnecessary to use immutable data structures for every operation in the middle. In this scenario, we can use the withMutations method to temporarily “change” the corresponding data structure, and finally return an immutable structure, which is Transient, for example:

let map = new Immutable.Map({});
map = map.withMutations((m) = > {
  / / open Transient
  m.set('a'.1); M = m.set('a', 1);
  m.set('b'.2);
  m.set('c'.3);
});
/ / end of TransientCopy the code

withMutations
map
deleteAll
ownerID
Transient







Transient
Transient
ownerID
Transient

  1. Of the node to operate onownerIDIf the value is inconsistent with that of the parent node, a new node is generated and the value of the old node is copied overownerIDUpdated to the parent nodeownerIDAnd then carry out corresponding operations;
  2. Of the node to operate onownerIDIf the value is the same as that of the parent node, the operation is performed on the parent node.

To enable Transient in Immutable. Js:

function OwnerID() {}Copy the code

function asMutable() {
  return this.__ownerID ? this : this.__ensureOwner(new OwnerID());
}Copy the code

function withMutations(fn) {
  const mutable = this.asMutable();
  fn(mutable);
  return mutable.wasAltered() ? mutable.__ensureOwner(this.__ownerID) : this;
}Copy the code

It gives the root node one
ownerIDthe
ownerIDWill be used in the following operations according to the above logic. Note that this code uses the address of the JS object as the ID, because each new object must have a different address from the previous object, so this method can be very simple and efficient to construct a set of ID system.


Let’s take a look at the source code (Map)
setThe operation will call this
updateMethod) :

update(ownerID, shift, keyHash, key, value, didChangeSize, didAlter) {
  / /... Omit the previous code
  const isEditable = ownerID && ownerID === this.ownerID;
  const newNodes = setAt(nodes, idx, newNode, isEditable);

  if (isEditable) {
    this.count = newCount;
    this.nodes = newNodes;
    return this;
  }

  return new HashArrayMapNode(ownerID, newCount, newNodes);
}Copy the code

As with the previous logic, compare the node first
ownerIDAnd then operate directly on the node or generate a new node.


Hash conflict

There’s nothing new in this section, but let’s look at how Immutable is actually handled.

As we saw in the previous article, Immutable. Js hashes a key or value into a Map, and stores the value in the corresponding position in the tree. Different keys can be hashed to the same result, even though the probability should be small.

Hash conflict is a very basic problem and can be solved in many ways. The simplest and most applicable method here is to expand the conflicting nodes into a linear structure, that is, an array, which directly stores a group of keys and values. When the array is found, it will traverse the array to find the matching key. Although the time complexity here becomes linear, the increase in time complexity is negligible given the low probability of hash collisions.

I found a hash function pair for ImmutableabcandbCcThe hash results are all96354Using these two keys in the same map will cause hash collisions.



Immutable. Js uses a method calledHashCollisionNodeTo handle the conflicting key values, which are placed onentriesIn the array.

You can also try it yourself, the code is as follows:

let map = new Immutable.Map({});

for (let i = 0; i < 10; i++) {
  map = map.set(Math.random(), i); // Insert any other data
}

map = map.set('abc'.'value1');
map = map.set('bCc'.'value2');

console.log(map)Copy the code



If you have any questions in this article, please point them out.

This article is the second in my ongoing series of in-depth explorations of immutable. To be honest, it’s been a long time coming, and the available data are rarely scattered, and mostly from other programming languages. I will continue to update the third and even the fourth article when I have time and energy, and I feel there is still something to expand on.



Reference:

Hypirion.com/musings/und… Io-meter.com/2016/11/06/… Cdn.oreillystatic.com/en/assets/1… Infoscience. Epfl. Ch/record / 1698… Lampwww. Epfl. Ch/cca shut/idea… Github.com/funfish/blo… Github.com/facebook/im…