An introduction to HashMap

A HashMap is a Map implementation based on a hash table. It stores entries as key-value pairs and allows null values as well as a null key, although at most one key can be null. Internally, a HashMap uses an array of nodes as its storage.

This article is based on Java 8, which made a major change to HashMap: key-value pairs are now stored using arrays + linked lists + red-black trees. In addition, the resize method was rewritten in Java 8; before Java 8, concurrent expansion could create a loop in a bucket's linked list.

Interpreting the source code

There is a lot of code in this article and most of the explanation lives in the comments, so it may not read like plain prose; it also introduces, as far as possible, some of the methods that are new in Java 8. It is worth noticing how many things were changed and how many methods were added: this HashMap will look very different from the Java 7 HashMap you may have seen before, and it is worth thinking about why Java 8 made so many changes.

Introducing Node

Node implements Map.Entry and is used to store the key-value pairs inside HashMap. It is one of the most important inner classes of HashMap. HashMap has many other inner classes, too many to cover here; briefly, most of them back the collection views such as keySet, values, and entrySet.

The inner class

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;    // hash value, immutable
    final K key;       // key, immutable
    V value;           // value
    Node<K,V> next;    // next node in the same bucket
}

The HashMap class

Main fields of HashMap

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // default initial capacity: 16
static final int MAXIMUM_CAPACITY = 1 << 30;        // maximum capacity
static final float DEFAULT_LOAD_FACTOR = 0.75f;     // default load factor: how full the table may get before resizing
transient Node<K,V>[] table;                        // the hash table that stores the nodes
transient int size;                                 // actual number of key-value pairs
int threshold;                                      // resize threshold (capacity * load factor)
final float loadFactor;                             // load factor
transient int modCount;                             // number of structural modifications

Compared with earlier versions, the Java 8 HashMap leans more heavily on bitwise operations. >> is the arithmetic (signed) right shift, while >>> is the unsigned right shift, which fills the high bits with zeros.
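
A quick illustration of the difference between the two shift operators (a standalone example, not from the HashMap source):

public class ShiftDemo {
    public static void main(String[] args) {
        int x = -8;
        System.out.println(x >> 1);   // -4: arithmetic shift copies the sign bit
        System.out.println(x >>> 1);  // 2147483644: unsigned shift pads the high bits with 0
    }
}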

HashMap constructor

// no-arg constructor: every field keeps its default value
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

public HashMap(int initialCapacity, float loadFactor) {
    // capacity check
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // load factor check
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    // the threshold temporarily holds the initial capacity until the table is created
    this.threshold = tableSizeFor(initialCapacity);
}
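
tableSizeFor rounds the requested capacity up to the next power of two; for reference, this is roughly how it looks in the JDK 8 source:

static final int tableSizeFor(int cap) {
    int n = cap - 1;   // subtract 1 so that an exact power of two is not doubled
    n |= n >>> 1;      // smear the highest set bit into all lower positions
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}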

HashMap methods

This article picks out several commonly used and important methods for analysis, because the put method alone already covers most of the method call relationships.

The hash method XORs the hashCode with the same hashCode shifted right by 16 bits, which mixes the high 16 bits into the low 16 bits (the high 16 bits of the result stay unchanged, since they are XORed with zeros). Incidentally, the hashCode of an Integer is its own value.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
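
Because the table length n is always a power of two, the bucket index is later computed with a mask instead of a modulo. A small standalone illustration (values chosen arbitrarily):

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                   // table capacity, always a power of two
        int hash = 0b1011_0110_1100;  // an already-spread hash value
        int index = (n - 1) & hash;   // keeps only the low 4 bits
        System.out.println(index);    // 12, the same as hash % n for non-negative hashes
    }
}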

The put method is one of the most important methods in HashMap. It touches a lot of the internals and needs to be read carefully.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * onlyIfAbsent: if true, an existing value is not replaced
 * evict: used by LinkedHashMap to remove nodes that have not been used recently
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // uninitialized table: resize handles both initialization and expansion
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // the bucket is empty: place the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // look for an existing node for the key; it may be the first node tab[i]
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // the bucket is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // the bucket is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                // reached the tail: append the new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // convert the list to a tree once it is long enough
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // found an existing node with the same key
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // existing mapping for key: replace the value if allowed
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    // hook for LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}

The resize (expansion) method is described next in detail. It looks huge, but the logic is not complicated.

/**
 * resize is called in several cases:
 * 1. the table is null, i.e. it has not been initialized yet
 * 2. the number of entries exceeds the threshold and the table must grow
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // already at the maximum capacity: only raise the threshold
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // capacity still in the normal range: double capacity and threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    // the constructor stored the initial capacity in threshold
    else if (oldThr > 0)
        newCap = oldThr;
    // no-arg constructor: use the default capacity and threshold
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // user-defined initial capacity: the threshold still has to be computed
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // install the new threshold and the new table
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // move the old entries into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // a single node: recompute its index directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // a tree: split it between the low and high buckets
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // a linked list: split it while preserving order
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    /*
                     * (e.hash & oldCap) checks the bit that the doubled capacity adds:
                     * 1. if it is 0 the node keeps its index, if it is 1 it moves to j + oldCap,
                     *    so there is none of the relinking that could cause loops in 1.7
                     * 2. statistically the nodes split roughly evenly between the two lists
                     * 3. the full index never has to be recomputed for each node
                     */
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // the low list stays at the old index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // the high list moves to index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Unlike the pre-Java 8 HashMap, which had a separate initialization step, initialization and expansion are now handled together in resize, and the red-black tree adds another case to deal with. That is why the method contains so many branches, and it is the main reason the method is so large.

The way positions are recalculated after expansion is a worthwhile optimization of the old linked-list transfer; if you are interested, search for the problem of HashMap driving CPU usage to 100%. A clever & operation checks whether the newly significant high bit of the hash is 0 or 1: the low part of the list stays at its old index, and the high part moves to index + oldCap. There is no need to recompute the full index for every element, and the same decision works whether the bucket is a linked list or has to be handled as a tree. As shown below.

oldcap:    0000000000000010000
hashcode1: 1111111111111101010      (the bit at oldCap's position is 0 -> the node stays at its index)
hashcode2: 1111111111111111010      (the bit at oldCap's position is 1 -> the node moves to index + oldCap)

Whether the & result is zero depends only on that single bit, because oldCap is always a power of two with exactly one bit set; the & is also cheaper than the % modulo operation. With this in mind, the expansion code above should be clearer.
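
A minimal sketch of the same decision in plain Java (the names are invented for illustration; in the real code this happens inside resize):

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;          // old table capacity
        int j = 5;                // old bucket index of a node
        int hash = 0b1_0101;      // example hash: (hash & (oldCap - 1)) == 5

        // the bit that the doubled capacity adds decides where the node goes
        int newIndex = (hash & oldCap) == 0 ? j : j + oldCap;
        System.out.println(newIndex); // 21: the oldCap bit is 1, so the node moves to j + oldCap
    }
}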

Next comes the newNode method, which simply creates a new node. Why is newNode pulled out into a separate method? The answer is given at the end of the article.

Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

Adding a node to a red-black tree is new in Java 8; a bucket's linked list is converted to a red-black tree only when its length reaches 8 (and the table is large enough). The main purpose is to prevent the list at a single index from growing so long that lookups become slow.
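
For reference, the relevant thresholds are plain constants in HashMap; in JDK 8 they are roughly:

static final int TREEIFY_THRESHOLD = 8;     // list -> tree once a bucket reaches this length
static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list once it shrinks back to this length
static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeifying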

final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                               int h, K k, V v) {
    Class<?> kc = null;
    boolean searched = false;
    TreeNode<K,V> root = (parent != null) ? root() : this;
    for (TreeNode<K,V> p = root;;) {
        int dir, ph; K pk;
        if ((ph = p.hash) > h)
            dir = -1;
        else if (ph < h)
            dir = 1;
        // same key found: return it so the caller can replace the value
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        // hashes are equal but the keys are not comparable (or compare as equal)
        else if ((kc == null &&
                  (kc = comparableClassFor(k)) == null) ||
                 (dir = compareComparables(kc, k, pk)) == 0) {
            if (!searched) {
                TreeNode<K,V> q, ch;
                searched = true;
                // search both subtrees once for an equal key
                if (((ch = p.left) != null &&
                     (q = ch.find(h, k, kc)) != null) ||
                    ((ch = p.right) != null &&
                     (q = ch.find(h, k, kc)) != null))
                    return q;
            }
            // fall back to a tie-breaking order
            dir = tieBreakOrder(k, pk);
        }

        TreeNode<K,V> xp = p;
        // reached an empty child position: insert the new node here
        if ((p = (dir <= 0) ? p.left : p.right) == null) {
            Node<K,V> xpn = xp.next;
            TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
            if (dir <= 0)
                xp.left = x;
            else
                xp.right = x;
            xp.next = x;
            x.parent = x.prev = xp;
            if (xpn != null)
                ((TreeNode<K,V>)xpn).prev = x;
            // rebalance and make sure the root is the first node in the bucket
            moveRootToFront(tab, balanceInsertion(root, x));
            return null;
        }
    }
}

  1. Search from the root node; if an equal key is found, return that node so the caller can replace its value.
  2. Otherwise decide which subtree to continue into, using the hash, the Comparable order of the keys, or a tie-breaking order.
  3. When an empty child position is reached, insert the new node there, then rebalance the tree and move the root to the front of the bucket.

Now let's look at the tree's split method, which exists mainly for the expansion operation: to relocate nodes, the tree has to be split. As mentioned before, expansion does an & with the old capacity and moves the nodes whose new high bit is 1, then checks whether each resulting list is small enough to be converted back into a plain linked list.

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // relink the nodes into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        // same trick as for linked lists: the new bit decides lo or hi
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    // the lo list stays at the old index
    if (loHead != null) {
        // small enough: convert back to a plain linked list
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        // otherwise keep it as (or re-form it into) a tree
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    // the hi list moves to index + bit
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

// convert a list of TreeNodes back into plain Nodes
final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    for (Node<K,V> q = this; q != null; q = q.next) {
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}

The details of converting a list into a red-black tree are not covered in this article, but they are not hard to follow. The Meituan-Dianping technical team has a diagram that illustrates the flow of the put method.

The get method

The logic is simple: first locate the bucket from the hash, then continue the search differently depending on whether the bucket holds a linked list or a tree.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // the table exists and the bucket is not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // always check the first node
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // tree bucket: search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // otherwise walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The remove method

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

/**
 * 1. find the node, handling tree buckets and list buckets differently
 * 2. unlink the node from the tree or the list
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // the first node in the bucket is the one to remove
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // tree bucket: look the node up in the tree
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            // list bucket: walk the list, keeping p as the predecessor
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // remove the node (optionally only when the value also matches)
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // the node is the head of the bucket
            else if (node == p)
                tab[index] = node.next;
            // the node follows p in the list
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);   // hook for LinkedHashMap
            return node;
        }
    }
    return null;
}

This article skips the detailed tree deletion process, which is long and involved; picture the red-black tree deletion for yourself.

The replace method

@Override
public V replace(K key, V value) {
    Node<K,V> e;
    // replace only when a mapping for the key already exists
    if ((e = getNode(hash(key), key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);   // hook for LinkedHashMap
        return oldValue;
    }
    return null;
}

Looking at the basic methods, the most complicated one is clearly put, which involves many other methods. The later get, replace, and remove methods are easy to follow once put is understood.

Supplementary methods

The basic methods are covered above; what follows are a few handy methods, all added in Java 8.

The merge method inserts the value if the key is not present. If the key is present, the old and new values are combined as specified by remappingFunction; if the function returns null, the entry is removed.

public V merge(K key, V value,
               BiFunction<? super V, ? super V, ? extends V> remappingFunction) {
    if (value == null)
        throw new NullPointerException();
    if (remappingFunction == null)
        throw new NullPointerException();
    int hash = hash(key);
    Node<K,V>[] tab; Node<K,V> first; int n, i;
    int binCount = 0;
    TreeNode<K,V> t = null;
    Node<K,V> old = null;
    // initialize or expand the table if necessary
    if (size > threshold || (tab = table) == null ||
        (n = tab.length) == 0)
        n = (tab = resize()).length;
    // look for an existing node for the key
    if ((first = tab[i = (n - 1) & hash]) != null) {
        if (first instanceof TreeNode)
            old = (t = (TreeNode<K,V>)first).getTreeNode(hash, key);
        else {
            Node<K,V> e = first; K k;
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k)))) {
                    old = e;
                    break;
                }
                ++binCount;
            } while ((e = e.next) != null);
        }
    }
    // an old mapping exists: combine it with the new value
    if (old != null) {
        V v;
        if (old.value != null)
            v = remappingFunction.apply(old.value, value);
        else
            v = value;
        // afterNodeAccess is a hook overridden by the LinkedHashMap subclass
        if (v != null) {
            old.value = v;
            afterNodeAccess(old);
        }
        // the function returned null: remove the mapping
        else
            removeNode(hash, key, null, false, true);
        return v;
    }
    // no old mapping: insert the new value
    if (value != null) {
        if (t != null)
            t.putTreeVal(this, tab, hash, key, value);
        else {
            tab[i] = newNode(hash, key, value, first);
            if (binCount >= TREEIFY_THRESHOLD - 1)
                treeifyBin(tab, hash);
        }
        ++modCount;
        ++size;
        afterNodeInsertion(true);
    }
    return value;
}
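
A typical use of merge, counting occurrences (an illustrative example, not part of the HashMap source):

import java.util.HashMap;
import java.util.Map;

public class MergeDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : new String[] {"a", "b", "a"}) {
            // insert 1 for a new key, otherwise combine the old count with 1
            counts.merge(word, 1, Integer::sum);
        }
        System.out.println(counts); // {a=2, b=1}
    }
}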

The compute method

public V compute(K key,
                 BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
    // the null check, hash computation, possible resize and the lookup of the
    // old node are the same as in merge and are omitted here
    V oldValue = (old == null) ? null : old.value;
    V v = remappingFunction.apply(key, oldValue);
    // an old node exists: replace its value, or remove it if v is null
    if (old != null) {
        if (v != null) {
            old.value = v;
            afterNodeAccess(old);
        }
        else
            removeNode(hash, key, null, false, true);
    }
    // if there is no old node but the computed value is not null, a new node is
    // inserted, the same as in merge (omitted)
    return v;
}
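
compute is handy when the new value depends on both the key and the (possibly missing) old value; a small illustrative example:

import java.util.HashMap;
import java.util.Map;

public class ComputeDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        // oldValue is null on the first call, so start from 0
        counts.compute("a", (k, oldValue) -> (oldValue == null ? 0 : oldValue) + 1);
        counts.compute("a", (k, oldValue) -> (oldValue == null ? 0 : oldValue) + 1);
        System.out.println(counts); // {a=2}

        // returning null removes the entry
        counts.compute("a", (k, oldValue) -> null);
        System.out.println(counts); // {}
    }
}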

You may well be wondering why there are two methods that look almost the same.

Let’s compare the two methods in detail and find several differences:

  1. The merge method requires that value not be null, and the compute method does not.
  2. merge only applies remappingFunction when old.value is not null (otherwise it simply uses value); compute applies it even when the old value is null.

compute also has two variants, shown briefly below.

The computeIfAbsent method never replaces an existing non-null value; if one exists, it is returned unchanged. Otherwise a value is computed from the key: if the result is null, nothing is added and null is returned; if it is not null, it is stored (on the existing node if there is one, or as a new entry) and returned.

public V computeIfAbsent(K key,
                         Function<? super K, ? extends V> mappingFunction) {
    // the null check, hash computation, possible resize and the lookup of the
    // old node are omitted here, as in compute
    // a non-null value already exists: return it unchanged
    if (old != null && (oldValue = old.value) != null) {
        afterNodeAccess(old);
        return oldValue;
    }
    V v = mappingFunction.apply(key);
    if (v == null) {
        return null;
    } else if (old != null) {
        old.value = v;
        afterNodeAccess(old);
        return v;
    }
    // the insertion of a brand-new node is omitted
}
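
computeIfAbsent is commonly used for lazy initialization, for example building a map of lists (an illustrative example):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentDemo {
    public static void main(String[] args) {
        Map<String, List<String>> groups = new HashMap<>();
        // the list is created only when the key has no non-null value yet
        groups.computeIfAbsent("fruit", k -> new ArrayList<>()).add("apple");
        groups.computeIfAbsent("fruit", k -> new ArrayList<>()).add("pear");
        System.out.println(groups); // {fruit=[apple, pear]}
    }
}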

The computeIfPresent method never needs to consider expansion, because it never adds an entry: if the key is not found, it does nothing. If an entry exists and the function's result is not null, the value is replaced; if the result is null, the entry is removed.

public V computeIfPresent(K key,
                          BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
    if (remappingFunction == null)
        throw new NullPointerException();
    Node<K,V> e; V oldValue;
    int hash = hash(key);
    // only do something when a non-null value already exists
    if ((e = getNode(hash, key)) != null &&
        (oldValue = e.value) != null) {
        V v = remappingFunction.apply(key, oldValue);
        // replace the value when the function returns non-null
        if (v != null) {
            e.value = v;
            afterNodeAccess(e);
            return v;
        }
        // otherwise remove the mapping
        else
            removeNode(hash, key, null, false, true);
    }
    return null;
}
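
computeIfPresent only ever touches existing entries, and returning null from the function removes the entry (an illustrative example):

import java.util.HashMap;
import java.util.Map;

public class ComputeIfPresentDemo {
    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apple", 1);

        // decrement, removing the entry once the count would drop to 0
        stock.computeIfPresent("apple", (k, v) -> v > 1 ? v - 1 : null);
        System.out.println(stock); // {}

        // a missing key is simply ignored
        stock.computeIfPresent("pear", (k, v) -> v + 1);
        System.out.println(stock); // {}
    }
}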

Again, let’s summarize the similarities and differences between the three compute methods

  1. The compute and computeIfPresent methods pass oldValue to their function; computeIfAbsent's mapping function only receives the key.
  2. compute calls remappingFunction.apply whether or not oldValue is null; computeIfPresent calls it only when oldValue is not null; computeIfAbsent only computes a new value when old is null or oldValue is null.
  3. computeIfPresent only ever replaces or removes an existing entry based on the function's result; compute and computeIfAbsent may additionally insert a new entry when none exists.

To put it simply:

  1. If the value may not exist yet and needs to be initialized (think of a variable declared as String a = null), choose the computeIfAbsent method.
  2. If an existing value is required and you only want to modify or delete it, choose the computeIfPresent method.
  3. Otherwise choose the compute method

Fail-fast and fail-safe

Fail-fast: when an iterator is traversing a collection and the collection is structurally modified (elements added, removed, or changed) during the iteration, a ConcurrentModificationException is thrown.

How it works: the iterator accesses the collection's contents directly and relies on a modCount variable during traversal. Any structural change to the collection updates modCount. Each time the iterator advances via hasNext()/next(), it checks whether modCount still equals the expected expectedModCount; if it does, the traversal continues, otherwise the exception is thrown and the traversal is aborted.

Note: the exception is thrown only when the check modCount != expectedModCount fails. If a concurrent change happens to leave modCount equal to expectedModCount, no exception is thrown, so you cannot write programs that depend on this exception for correct concurrent behaviour; it is only meant to help detect concurrent-modification bugs.

Scenario: the collection classes under java.util are fail-fast and must not be concurrently modified (structurally) by multiple threads during iteration.

Fail-safe: collection containers that use the fail-safe mechanism are not traversed directly; the original contents are copied first and the iteration runs over the copy.

Principle: because the iteration runs over a copy, modifications made to the original collection during the iteration are invisible to the iterator, so no ConcurrentModificationException is triggered.

Drawback: copying avoids ConcurrentModificationException, but the iterator also cannot see any of the modifications; it only reflects the collection as it was when the iteration started.

Scenario: the containers under java.util.concurrent are fail-safe and can be used and modified concurrently from multiple threads.
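
A minimal illustration of fail-fast behaviour and the safe alternative (a sketch; HashMap iteration order is not guaranteed):

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        try {
            for (String key : map.keySet()) {
                map.remove("a");   // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e);
        }

        // the safe way: remove through the iterator, which keeps expectedModCount in sync
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().equals("b")) {
                it.remove();
            }
        }
        System.out.println(map);   // {}
    }
}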

The answer

To resolve the bit of suspense set up earlier: the newNode method is pulled out separately largely because LinkedHashMap needs to override it to perform extra work. For the details, read the source code yourself, or see my other (yet to be published) article, the Map collection class summary.

Conclusion

The Java 8 HashMap looks bigger and more mysterious than before: the methods are longer and a red-black tree data structure has been added, which can make the code unpleasant to read at first. It is also a reminder not to write such large methods yourself; keep methods small and the code will be more concise. Still, the changes Java 8 made to HashMap improved its performance to some extent and added a number of useful new methods.

Of course, this article only reflects my personal understanding; if anything is lacking or wrong, corrections are very welcome!

Finally

Let's keep encouraging each other!