Preface

HashMap is a hash-table-based implementation of the Map interface. It is not synchronized. This implementation provides all of the optional map operations and permits null values and the null key. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

The data structure of HashMap

In the Java programming language, there are two basic structures: the array and the simulated pointer (reference). All data structures can be built from these two, and HashMap is no exception. A HashMap is in fact a "chained hash" data structure, a combination of an array and linked lists.

A text description is clearer when accompanied by a diagram. The structure of a HashMap is shown below.

As the figure shows, the underlying structure of a HashMap is an array, and each slot of the array holds either a linked list or a red-black tree. When a new HashMap is used, this array is initialized (lazily, on first insertion, in JDK 1.8).

Let’s take a look at the core members of the HashMap first.

    public class HashMap<K,V> extends AbstractMap<K,V>
            implements Map<K,V>, Cloneable, Serializable {

        // Default initial capacity: 16 (must be a power of two)
        static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

        // Maximum capacity: 2^30
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // Default load factor
        static final float DEFAULT_LOAD_FACTOR = 0.75f;

        // A bin is converted to a tree when it holds at least this many nodes
        static final int TREEIFY_THRESHOLD = 8;

        // A treeified bin is converted back to a list when it shrinks to this size
        static final int UNTREEIFY_THRESHOLD = 6;

        /*
         * Minimum table capacity at which bins may be treeified.
         * Below this capacity (i.e. table capacity < MIN_TREEIFY_CAPACITY),
         * an overly full bin triggers a resize instead of treeification.
         * This value is at least 4 * TREEIFY_THRESHOLD.
         */
        static final int MIN_TREEIFY_CAPACITY = 64;

        static class Node<K,V> implements Map.Entry<K,V> {
            // ...
        }

        // The table, allocated on first use and resized as necessary
        transient Node<K,V>[] table;

        transient Set<Map.Entry<K,V>> entrySet;

        // The number of key-value mappings in this map
        transient int size;

        /**
         * The number of structural modifications: changes in the number of
         * mappings or internal reorganization such as rehashing. Used by the
         * fail-fast iterators, which throw ConcurrentModificationException
         * when a structural change is detected during traversal.
         */
        transient int modCount;

        // The next size value at which to resize
        int threshold;

        // Load factor; the table is resized when size exceeds capacity * loadFactor
        final float loadFactor;
    }

Initialization of a HashMap

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields take default values
    }


Note that the table array is not allocated in the constructor; it is created lazily on the first put. This avoids wasting memory when a HashMap is created but never actually used.
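The idea can be sketched with a tiny stand-alone class (LazyMap, isTableAllocated, and the collision-free put below are illustrative inventions for this article, not JDK source):

```java
import java.util.Objects;

// Minimal sketch of lazy table allocation, modeled on HashMap's approach.
class LazyMap<K, V> {
    private static final int DEFAULT_CAPACITY = 16;
    private Object[] table; // deliberately NOT allocated in the constructor

    LazyMap() {
        // like HashMap(), only defaults are recorded here; no array yet
    }

    boolean isTableAllocated() {
        return table != null;
    }

    void put(K key, V value) {
        if (table == null) {            // the first put triggers allocation
            table = new Object[DEFAULT_CAPACITY];
        }
        int i = (table.length - 1) & Objects.hashCode(key);
        table[i] = value;               // collision handling omitted for brevity
    }
}
```

A map that is constructed but never written to therefore costs only the object header and a few fields, not a 16-slot array.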

Store operations for a HashMap

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

Let's look at how HashMap determines the array index, the details of put, and the resize mechanism.

Hash calculation to determine the array index position

Locating the right bucket in the hash array is the critical first step of adding, deleting, or finding a key-value pair. As noted earlier, HashMap combines an array with linked lists, so ideally the elements are distributed across the array as uniformly as possible, with each bucket holding only a single element. Then, once the hash algorithm yields a bucket index, the element at that position is exactly the one we want, with no list to traverse, which greatly speeds up lookups. How HashMap maps hash values to array indices therefore directly determines how well the hash method disperses keys.

Look at the source code implementation:

    // JDK 1.8
    static final int hash(Object key) {
        int h;
        // XOR the high 16 bits of hashCode() into the low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The hash is computed by XORing the high 16 bits of hashCode() into the low 16 bits: (h = key.hashCode()) ^ (h >>> 16). This is a trade-off among speed, utility, and quality: even when the table length is small, it lets the high bits participate in the index calculation without adding much overhead.

The key.hashCode() call in the code above invokes the hash function of the key's own type and returns an int. In principle, that hash value could index the array directly: a 32-bit signed int ranges from -2147483648 to 2147483647, about 4 billion possible values. As long as the hash function maps keys reasonably uniformly, collisions would be rare in typical applications. But you cannot fit a 4-billion-slot array in memory; recall that the initial table size of a HashMap is only 16. So the hash value cannot be used directly: it must first be reduced modulo the array length to obtain a valid index. In JDK 1.7, this modulo step lived in the indexFor() function.

    bucketIndex = indexFor(hash, table.length);

    // JDK 1.7; in JDK 1.8 this is inlined as (n - 1) & hash
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

Incidentally, this also explains why the array length of a HashMap is always a power of two: (length - 1) is then exactly a low-bit mask. The AND operation zeroes all the high bits of the hash, leaving only the low bits as the array index. Take the initial length 16 as an example: 16 - 1 = 15, which is 1111 in binary, so the operation keeps only the lowest four bits of the hash.

      10100101 11000100 00100101
    & 00000000 00000000 00001111
    ----------------------------
      00000000 00000000 00000101 // all high bits are zeroed
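The mask computation can be checked in a few lines. IndexDemo is a name invented for this demo; indexFor mirrors JDK 1.7's method, and 0xA5C425 is the 24-bit value from the worked example above:

```java
class IndexDemo {
    // For a power-of-two length, (length - 1) is a low-bit mask, so
    // h & (length - 1) equals the non-negative remainder h % length.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {5, 21, 0xA5C425, Integer.MAX_VALUE}) {
            // masking and modulo agree for non-negative hashes
            assert indexFor(h, length) == h % length;
        }
        // 0xA5C425 ends in binary ...0010 0101; masking keeps the low 4 bits
        System.out.println(indexFor(0xA5C425, 16)); // prints 5
    }
}
```

The bitwise AND is cheaper than a hardware division, which is one more reason the table length is kept at a power of two.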

But this is where a problem appears: even with a well-distributed hash function, if we only ever take the last few bits, collisions become frequent. Worse, if the hash codes themselves follow a pattern in their low bits (an arithmetic sequence, say), those low bits repeat regularly and the clustering becomes extremely painful. This is where the perturbation function earns its keep. Look at the diagram below.

Hash calculation process

Shifting right by 16 bits is exactly half of 32 bits, and XORing the value with itself mixes the high half and the low half of the original hash code, increasing the randomness of the low bits. Because the low bits now carry features of the high bits, the high-bit information is preserved, in disguised form, in the index calculation.
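A small sketch shows the effect: two hash codes that differ only in their high 16 bits collide under a plain low-bit mask, but separate after the perturbation (HashSpread and spread are names invented for this demo; hash mirrors JDK 1.8's method):

```java
class HashSpread {
    // JDK 1.8's perturbation, as shown above
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // the same mixing step, applied to a raw int for demonstration
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // identical low 16 bits, different high 16 bits
        int h1 = 0x0001_0005, h2 = 0x0002_0005;
        assert (h1 & 15) == (h2 & 15);               // plain mask: both land in slot 5
        assert (spread(h1) & 15) != (spread(h2) & 15); // perturbed: slots 4 and 7
    }
}
```

Without the perturbation, every key whose hash code ends in the same few bits would pile into the same bucket no matter how different the high bits were.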

The putVal method

The implementation process of HashMap’s put method can be understood from the following figure. If you are interested, you can study the source code more clearly.

The source code and explanation are as follows:

    // the real put
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table is uninitialized, or initialized with size 0, resize first
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the bucket for this hash is empty, place a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // The first node matches the put key (same hash, and equals returns true)
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The bucket holds a tree; insert into the red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Walk the linked list
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // Treeify the bin once the list reaches TREEIFY_THRESHOLD
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // The key already exists; stop traversing
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Resize if the new size exceeds the threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
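The observable behavior of putVal can be exercised through the public put method; this is a plain java.util.HashMap usage example, with PutDemo being a name invented for the demo:

```java
import java.util.HashMap;

class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // First put: no previous mapping, so putVal returns null
        assert map.put("a", 1) == null;
        // Second put with the same key: the old value is returned and replaced
        assert map.put("a", 2) == 1;
        assert map.get("a") == 2;
        // A null key is permitted; it always hashes to 0 and lands in bucket 0
        map.put(null, 0);
        assert map.get(null) == 0;
        System.out.println(map.size()); // prints 2
    }
}
```

Note that replacing a value for an existing key increments neither size nor modCount's structural count of new mappings, matching the "existing mapping for key" branch in the source.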

Expansion mechanism

The resize mechanism of HashMap is clever and achieves expansion at minimal cost. After a rehash, an element either stays at its original index or moves to its original index plus the old capacity. That is, if the old capacity was 16, the new capacity is 16 + 16 = 32, and an element at index 7 ends up either at 7 again or at 7 + 16 = 23.

Let's walk through how Java 8's resize mechanism works. Here n is the length of the table. Figure (a) shows how key1 and key2 determine their index positions before expansion; figure (b) shows the same after expansion, where hash1 is the result of the hash and perturbation computed for key1.

When the element's index is recomputed, n has doubled, so the mask n - 1 covers one more bit at the high end (shown in red), and the new index changes as follows:

Therefore, when we expand a HashMap, we do not need to recalculate the hash as JDK 1.7 did; we only need to check whether the newly covered bit of the original hash value is 0 or 1. If it is 0, the index stays the same; if it is 1, the index becomes "old index + oldCap". You can see this in the resize diagram below, where 16 is expanded to 32:

To check whether that bit of the hash is 1, we simply AND the hash with the old capacity. Since the old capacity is a power of two, its binary form is a single 1 followed by zeros (for example, 10000 for 16), so the AND isolates exactly the bit we care about. The expression e.hash & oldCap in the source implements this logic.
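The stay-or-move rule can be verified directly (ResizeSplit and indexFor are demo names, not JDK code):

```java
class ResizeSplit {
    // Index in a table of the given capacity (capacity must be a power of two)
    static int indexFor(int h, int cap) {
        return h & (cap - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // all four hashes land in bucket 7 of the old 16-slot table
        for (int h : new int[] {7, 23, 39, 55}) {
            int oldIdx = indexFor(h, oldCap);
            int newIdx = indexFor(h, newCap);
            if ((h & oldCap) == 0)
                assert newIdx == oldIdx;          // new bit is 0: index unchanged
            else
                assert newIdx == oldIdx + oldCap; // new bit is 1: moves by oldCap
        }
    }
}
```

Hashes 7 and 39 have a 0 in bit 4 and stay at index 7; hashes 23 and 55 have a 1 there and move to 7 + 16 = 23.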

This design is very clever indeed: it not only saves the time of recomputing hash values, but because the newly covered bit is effectively random (0 or 1), the resize evenly spreads previously colliding nodes across the new buckets. This is one of JDK 1.8's optimizations. In JDK 1.7, when an old list was migrated to the new table, elements landing in the same bucket ended up in reversed order; as the figure above shows, JDK 1.8 preserves the order. Here is the JDK 1.8 resize source:

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // Already at the maximum capacity: just raise the threshold and return
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double the capacity and, with it, the threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // Compute the new resize threshold
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // Move each bucket into the new table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // A single node: rehash it directly
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // A tree: split it between the two new buckets
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // New bit is 0: node stays at the old index
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // New bit is 1: node moves to old index + oldCap
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // The low list keeps the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // The high list goes to original index + old capacity
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }