

Before diving into the source, let's establish a general picture of what we are studying. HashMap is a data structure we use constantly in day-to-day development, and its basic behavior is familiar: it stores data as key-value pairs, each key can appear at most once, it differs from Hashtable in well-known ways, and so on. But these are application-level facts; if you really want to get the most out of HashMap and avoid its pitfalls, understanding the underlying implementation is essential.

This article walks briefly through the HashMap source code, following the outline below. The JDK version used throughout is 1.8.

  1. The structure of a HashMap
    1. API
    2. Inner classes
    3. Parameters
  2. The hash algorithm
  3. put() and get()
  4. Differences between 1.7 and 1.8

Structure

API


HashMap implements the Map interface, so let's start by looking at the APIs that the Map interface defines.


The first group are the methods found in most collection classes, size(), put(), get(), remove(), clear(), isEmpty(), and so on, as well as methods that return views of the keys or values. The rest are methods specific to a key-value storage structure like Map.

Starting with JDK 1.8, HashMap also provides some functional-style APIs that, in the right scenarios, make our business code much more concise.
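For instance, getOrDefault(), computeIfAbsent(), and merge() (all public HashMap methods since JDK 1.8) replace verbose get-check-put sequences. A small sketch (the class and method names here are illustrative, not from the JDK):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FunctionalApiDemo {
    // Count word frequencies with merge(): insert 1, or add 1 to the existing count
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words)
            counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // Group words by first letter: computeIfAbsent() creates the list lazily
    static Map<Character, List<String>> groupByInitial(String[] words) {
        Map<Character, List<String>> groups = new HashMap<>();
        for (String w : words)
            groups.computeIfAbsent(w.charAt(0), c -> new ArrayList<>()).add(w);
        return groups;
    }

    public static void main(String[] args) {
        String[] words = {"map", "hash", "map"};
        System.out.println(countWords(words).get("map"));          // 2
        System.out.println(groupByInitial(words).get('m').size()); // 2
    }
}
```

Each call does the lookup, the null check, and the update in one pass over the bucket, instead of the two lookups a manual get-then-put would need.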

Inner classes


There are several inner classes in HashMap. Its storage is built on two of them, Node and TreeNode, both of which implement the Map.Entry interface.

Node, the basic storage node, is defined as follows:

/** Basic hash bucket node */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // hash value of the key
    final K key;
    V value;
    Node<K,V> next;     // next node in the linked list
}

The underlying structure of a HashMap should look familiar: in the simple case it is an array of linked lists. The basic unit of each list is the Node structure, which stores the key, the value, and the hash of the key (caching the hash avoids recomputing it across method calls) and holds a reference to the next node to form the list. The following image shows how a HashMap holds data as an array plus linked lists.


TreeNode is the storage node used once a bucket's linked list grows beyond a certain length and is converted into a red-black tree. In addition to the four fields of Node, it holds references to its parent, left, right, and prev nodes (as well as the before and after fields inherited from LinkedHashMap.Entry). The following image shows how a HashMap holds data as an array plus linked lists plus red-black trees.


The other inner classes are simpler: collection views that expose the HashMap's keys, values, or entry nodes, plus the related iterators, which we won't go into in this article.

Parameters


Before looking at the capacity and loadFactor parameters, note that a HashMap can be instantiated through four public constructors.


The four constructors are: one taking an initial capacity and a load factor, one taking only an initial capacity, the no-argument default, and one taking another Map to copy. As the code shows, these two parameters are the only ones we can customize when instantiating a HashMap; the defaults are used when they are not specified.

Let’s look at what these two values mean.

The capacity is the number of buckets in the hash table; its default value DEFAULT_INITIAL_CAPACITY is 16 (1 << 4). The capacity must always be a power of 2 because of HashMap's addressing scheme; we'll get to that in the hash algorithm section. (The size field, by contrast, is simply the number of key-value mappings currently in the map.)

The loadFactor parameter is the load factor of the HashMap. What is a load factor? It decides how full the table may get (its load) before the HashMap expands. The default, DEFAULT_LOAD_FACTOR, is 0.75f, meaning the HashMap resizes when the number of stored elements reaches three quarters of its capacity. For example, with an initial capacity of 16 and 12 elements spread across different buckets, the next insertion is increasingly likely to cause a hash collision, so the table is expanded to keep the elements spread out.

The resize threshold computed from these two parameters is called threshold. Once the number of elements in the HashMap exceeds it, the resize mechanism is triggered. The maximum capacity is MAXIMUM_CAPACITY = 1 << 30; beyond that the table is never grown again (the highest bit of an int is the sign bit, so the capacity can reach 1 << 30 but not 1 << 31).

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // ... the put operation
    // If size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    return null;
}
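Concretely, the threshold works out to capacity × loadFactor. A small sketch of the arithmetic (the constant names mirror the JDK source, but thresholdFor() is an illustrative helper, not a JDK method):

```java
public class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Illustrative helper: the resize threshold for a given capacity and load factor
    static int thresholdFor(int capacity, float loadFactor) {
        float ft = capacity * loadFactor;
        return (capacity < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                ? (int) ft : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Default capacity 16 with load factor 0.75 triggers a resize past 12 elements
        System.out.println(thresholdFor(16, DEFAULT_LOAD_FACTOR)); // 12
        // After doubling to 32, the threshold doubles as well
        System.out.println(thresholdFor(32, DEFAULT_LOAD_FACTOR)); // 24
    }
}
```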

As of JDK 1.8, the array + linked list layout became array + linked list + red-black tree. This mechanism addresses the problem that, under frequent hash collisions, a bucket degenerates into a long linked list and lookup degrades to O(n). When a list grows past a threshold, it is converted into a red-black tree to guarantee O(log n) lookups. The threshold is TREEIFY_THRESHOLD = 8; in the code, the node already sitting in the array slot is accounted for, so the conversion check compares against TREEIFY_THRESHOLD - 1.


Naturally, there is also an untreeify threshold, UNTREEIFY_THRESHOLD = 6, at which a tree degrades back into a list. It is deliberately smaller than the treeify threshold: the gap provides hysteresis, avoiding frequent back-and-forth structural changes when the bucket size oscillates around a single value.

There is one more parameter related to treeification: MIN_TREEIFY_CAPACITY = 64. Treeification is enabled only when the table capacity is at least this value; if the HashMap's table is smaller, reaching the list-length condition triggers a resize instead of a conversion. With few elements overall, a tree's query efficiency is not necessarily better than a list's, and resizing shortens the lists anyway.

The remaining fields: entrySet caches a collection view of the key-value pairs; table is the Node array shown in the figures above; and modCount, a version-number-like field present in many collection classes, enables fail-fast iteration.
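Fail-fast in action: structurally modifying the map while iterating makes the iterator's modCount check fail on its next step. (The check is best-effort, not a guarantee, but it fires reliably in a simple single-threaded case like this sketch; the class name is illustrative.)

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Returns true if the iterator detected the structural modification
    static boolean triggersFailFast() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet())
                map.put("c", 3); // structural modification bumps modCount
        } catch (ConcurrentModificationException e) {
            return true; // next() found modCount != its expected value
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(triggersFailFast()); // true
    }
}
```

To remove entries safely while iterating, use Iterator#remove(), which updates the iterator's expected modCount.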

The Hash algorithm

This part has two important methods, hash() and tableSizeFor(), which compute the hash of a key and the size of the table, respectively.

hash()

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The hash() method in JDK 1.8 differs from the one in JDK 1.7; the differences are discussed later. With the initial table length of 16, if hashCode() were used directly for addressing, only the lower four bits would matter: hashCodes of 2^10, 2^20, and 2^30 would all land on the same index, and the table would fill unevenly. JDK 1.8 therefore XORs the high 16 bits into the low 16 bits as a perturbation, reducing such collisions. (HashMap allows a null key, whose hash is simply 0.)
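To see the effect, here is a sketch that replays the same perturbation on two hashCodes that differ only in their high bits (an Integer's hashCode is the value itself; the class name is illustrative):

```java
public class HashSpreadDemo {
    // Same perturbation as HashMap.hash(): XOR the high 16 bits into the low 16
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of 2)
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // identical (all-zero) low 16 bits
        // Without perturbation both keys collide in bucket 0 of a 16-slot table
        System.out.println(index(a, 16) + " " + index(b, 16));             // 0 0
        // With perturbation the high bits now influence the index
        System.out.println(index(hash(a), 16) + " " + index(hash(b), 16)); // 1 2
    }
}
```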

tableSizeFor()

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

Before explaining this method, let's look at how a HashMap turns a hash value into an array index. As the figure shows, the index is computed as (n - 1) & hash, where n is the array length. Why not simply hash % n? When n is a power of 2, (n - 1) & hash produces the same result as hash % n for non-negative hashes, and the bitwise AND is faster than the modulo. This is exactly why a HashMap's table length must be a power of 2, and tableSizeFor() is what guarantees it.


tableSizeFor() returns the smallest power of 2 greater than or equal to cap; for example, tableSizeFor(6) = 8 and tableSizeFor(12) = 16. With the table length pinned to a power of two, the (n - 1) & hash addressing correctly and efficiently yields the array index for any hash value.
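A sketch that replays the computation and checks the masking-equals-modulo equivalence (the class name is illustrative; the method body mirrors the JDK source above):

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing trick as HashMap.tableSizeFor(): copy the highest set
    // bit into every lower position, then add 1 to reach the next power of 2
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(6));  // 8
        System.out.println(tableSizeFor(12)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of 2; hence cap - 1)

        // For a power-of-2 length n, (n - 1) & hash equals hash % n (hash >= 0)
        int n = 16, hash = 0x12345;
        System.out.println(((n - 1) & hash) == (hash % n)); // true
    }
}
```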

put() and get()

put

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

put() delegates to the final method putVal(). Let's follow the code step by step to see how it works.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, resize to create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket at (n - 1) & hash is empty, create a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The key matches the bucket's first node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Search the list for the key, or append a new node at the tail
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key already existed: replace the value and return the old one
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Bump the "version number"
    ++modCount;
    // If size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
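Before moving on, note the onlyIfAbsent flag. It is what distinguishes put() from putIfAbsent() (also a public method since JDK 1.8): put() always replaces and returns the previous value, while putIfAbsent() only inserts when no mapping (or a null value) exists. A quick sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("k", 1));         // null (no previous mapping)
        System.out.println(map.put("k", 2));         // 1 (old value returned, replaced)
        System.out.println(map.putIfAbsent("k", 3)); // 2 (existing value kept)
        System.out.println(map.get("k"));            // 2
    }
}
```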

resize()

Next, let's look at the resize() method; the annotations here are more detailed.

final Node<K,V>[] resize() {
    // Save a reference to the current table
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Cap the table at the maximum capacity
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Try to double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // If an initial capacity was specified at construction, it was stored in
    // threshold; initialize with that value
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Otherwise use the default initial capacity and its threshold
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // Compute the new threshold for the chosen capacity
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Move the data from the old table into the new table
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Iterate over the old table's buckets
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: recompute its index as hash & (newCap - 1)
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A tree node: delegate to TreeNode#split()
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // !!! Important !!!
                // A linked-list node: redistribute the list's elements
                else { // preserve order
                    // "Low" list (stays at index j)
                    Node<K,V> loHead = null, loTail = null;
                    // "High" list (moves to index j + oldCap)
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Walk the list
                    do {
                        next = e.next;
                        // hash & oldCap == 0: the newly relevant bit is 0
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise the node belongs to the high list
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The low list keeps its original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The high list goes to index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Let's focus on the code marked "Important": this is the rehash of nodes from the old hash table into the new one.

If the node being traversed is a single node, its new index is found via the addressing scheme described above and it is placed directly into the new table. If it is a tree node, it is redistributed by the TreeNode#split() method. If it is a linked-list node, things are more involved.

As the code shows, the designers of HashMap prepared two lists for rehashing a bucket's linked list: a "low" list and a "high" list, corresponding to the lower and higher target indices. In other words, the current long list is split into at most two chains. The test (e.hash & oldCap) == 0 decides the destination: nodes for which it is true go to the low list, the rest to the high list. After the split, the low list stays at its original index j, while the high list is placed at index j + oldCap (the old table length). This code differs substantially from JDK 1.7, and JDK 1.8 fixes a serious bug the 1.7 version could cause, which we'll save for later.
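The reason (e.hash & oldCap) suffices: doubling the capacity adds exactly one bit to the index mask, and that new bit is precisely the oldCap bit of the hash. A sketch, checked against the full recomputation (class and method names are illustrative):

```java
public class ResizeIndexDemo {
    // New index after doubling: either the old index, or old index + oldCap,
    // depending only on the single new mask bit (hash & oldCap)
    static int newIndex(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{5, 21, 0x77, 0x1234}) {
            // The shortcut must agree with recomputing against the doubled mask
            System.out.println(newIndex(hash, oldCap) == (hash & (newCap - 1)));
        }
    }
}
```

This is also why the relative order inside each split list can be preserved: every node of a bucket goes to one of exactly two destinations, appended at the tail.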

TreeNode#split()

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink the tree into two lists (TreeNode still keeps its list links),
    // again split into a low list and a high list
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        // The same test as for linked-list splitting
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    // The low list stays at index
    if (loHead != null) {
        // Decide from the split list's length whether to remain a tree
        if (lc <= UNTREEIFY_THRESHOLD)
            // Convert the TreeNodes back to plain Nodes
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    // The high list goes to index + bit (i.e. oldCap)
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

Redistributing a tree works because TreeNode still carries the next and prev links of a doubly linked list: the bin is relinked exactly like an ordinary list, rehashed the same way, and each half is then either re-treeified or converted back to plain Nodes depending on its length. We won't go through the remaining logic in detail.

get()

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

get() is implemented through the getNode() method. Again, let's follow the code step by step.

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is non-empty and the bucket for this hash exists
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check the first node first
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // A tree bin: delegate to TreeNode#getTreeNode()
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the list looking for the key
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The linked-list path first: it is straightforward, an ordinary list traversal comparing keys.

If it is a tree node, it is queried using the TreeNode#find() method.

TreeNode#getTreeNode()

final TreeNode<K,V> getTreeNode(int h, Object k) {
    return ((parent != null) ? root() : this).find(h, k, null);
}
final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        // Compare hashes to decide which subtree to descend into
        if ((ph = p.hash) > h)
            p = pl;
        else if (ph < h)
            p = pr;
        // Equal key: return it directly
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        // No left child: descend right
        else if (pl == null)
            p = pr;
        // No right child: descend left
        else if (pr == null)
            p = pl;
        // Break the hash tie with compareTo if the key class is Comparable
        else if ((kc != null ||
                  (kc = comparableClassFor(k)) != null) &&
                 (dir = compareComparables(kc, k, pk)) != 0)
            p = (dir < 0) ? pl : pr;
        // Otherwise search the right subtree recursively, then fall back left
        else if ((q = pr.find(h, k, kc)) != null)
            return q;
        else
            p = pl;
    } while (p != null);
    return null;
}

Under the hood this is an ordinary binary-tree search, guided by the hash values and, where possible, by Comparable keys.
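One practical caveat with get(): because HashMap permits null values, get() returning null does not necessarily mean the key is absent. containsKey() is the authoritative check:

```java
import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", null); // HashMap allows null values (and one null key)

        System.out.println(map.get("present"));         // null
        System.out.println(map.get("missing"));         // also null: get() can't tell them apart
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("missing")); // false
    }
}
```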

Differences between 1.7 and 1.8

The HashMap implementations in JDK 1.7 and 1.8 differ in quite a few ways.

  • 1.7 uses array + linked list; 1.8 uses array + linked list + red-black tree.

  • 1.7 uses head insertion (the cause of a serious bug); 1.8 uses tail insertion.

  • In 1.7 a rehash can reverse the order of a linked list (a consequence of head insertion); in 1.8 a rehash preserves the original list order.

  • The hash algorithms differ between 1.7 and 1.8.

  • The resize() methods differ substantially between 1.7 and 1.8.

The differences between 1.7 and 1.8 will be covered in detail in a separate article later (because I haven't read through the 1.7 source yet).

Afterword

The article references several blogs for a few key points.

https://www.cnblogs.com/eycuii/p/12015283.html

https://mp.weixin.qq.com/s/_zbOHbQa2zDVosXUlYUrSQ

These are some notes and understandings from my study of the HashMap source code. If anything is wrong, feel free to point it out in the comments.