00 HashMap’s underlying data structure

In JDK 1.7, HashMap was composed of an array plus linked lists. Since JDK 1.8, a red-black tree structure has been added: when a bucket's linked list grows longer than 8 and the array capacity is at least 64, that list is converted to a red-black tree. So, its composition is as follows:

Each element of the array in a HashMap is also known as a hash bucket, and each bucket holds key-value nodes. Such a node is called an Entry in Java 7 and a Node in Java 8.

Initially all of the array's positions are null. When put is called, an index is calculated from the hash of the key. For example, put("dog", 666) inserts the "dog" entry into the HashMap; suppose the hash algorithm computes an index position of 3. The structure then looks like this: still just an array.

That is the case with no hash conflict. If there is a hash conflict, a linked list comes into play. Suppose we then put another entry whose key's hash also works out to index 3. The structure then looks like this: a linked list is formed.
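As a rough sketch of how a key maps to an index (not the full JDK pipeline, which also spreads the hash's high bits first, as shown later):

// Toy illustration: since the table length n is a power of two,
// (n - 1) & hash keeps just the low bits of the hash as the index
int n = 16;                      // table length, always a power of two
int hash = "dog".hashCode();     // raw hash of the key
int index = (n - 1) & hash;      // a bucket index in [0, n - 1]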

Each node contains four fields: hash, key, value, and next, where next points to the next node in the linked list.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    ...
}

JDK 1.8 added red-black trees because an overly long list seriously degrades HashMap performance, while a red-black tree supports fast insertion, deletion, and lookup, which effectively solves the problem of slow operations on long lists.

01 Important methods of a HashMap

PS: The following source analysis is all based on JDK 1.8.

1.0 Query (get method)

The source is as follows:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table must be non-empty and the bucket for this hash non-null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Always check the first node: same hash AND equal key
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the bucket has been treeified, search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>) first).getTreeNode(hash, key);
            // Otherwise walk the linked list, matching both hash and key
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The comments above spell it out: when hashes collide, we must check not only the hash value but also whether the keys are equal, in order to determine whether an element is the one we want.
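For reference, both get and put first pass the key through HashMap's own hash() method. In JDK 1.8 it XORs the high 16 bits of hashCode into the low 16 bits, so the high bits also influence the bucket index:

// From the JDK 1.8 source: spread the high bits, since the index
// computation (n - 1) & hash only ever looks at the low bits
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}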

1.1 Insertion (put method)

The source is as follows:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the target bucket is empty, place the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node already holds this key: the value is overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree: insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket is a linked list: walk it
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Treeify once the list is long enough
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The key already exists somewhere in the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Existing mapping for key: overwrite the value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Resize once the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The comments are quite detailed, but put is more complex than get, so here is a flow chart to help you understand it:

1.2 Expansion (resize method)

The source is as follows:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at maximum capacity: stop growing and accept collisions
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity, up to MAXIMUM_CAPACITY
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // The table has no data yet, but an initial capacity was stashed in threshold
    else if (oldThr > 0)
        newCap = oldThr;
    // Zero initial threshold: use the defaults
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute a new threshold if it was not set above
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                  (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({ "rawtypes", "unchecked" })
    Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
    table = newTab;
    // Copy the original data into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: rehash it directly into the new table
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A tree bin: split it between the two target buckets
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                // A linked list: split it while preserving node order
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // (e.hash & oldCap) == 0: the node keeps its old index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise it moves to old index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the "low" list back at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the "high" list at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

As the source above shows, expansion consists of two main steps:

  • Enlargement: create a new, empty bucket array twice the length of the old one.
  • Bit operation: AND (&) each element's hash with the original array length to decide whether the element keeps its index or moves.

JDK 1.8 does not recalculate the index of each element from scratch the way JDK 1.7 does. Instead, it checks one high-order bit of the hash, (e.hash & oldCap), to determine whether the element needs to move. Suppose the information for key1 is as follows:

  • key1.hash = 10; binary: 0000 1010
  • oldCap = 16; binary: 0001 0000

The result of e.hash & oldCap is 0 here, because the bit of key1's hash at oldCap's single set bit is 0. When the result is 0, the element's position does not change during expansion. Now suppose key2's information is as follows:

  • key2.hash = 17; binary: 0001 0001
  • oldCap = 16; binary: 0001 0000

Here the result is non-zero, which means the element's position changes during expansion: the new index equals the original index + the original array length, as shown in the figure below, where the dotted lines of key2 and key4 mark the moved positions.
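The check in code, using the key1/key2 numbers above:

int oldCap = 16;                      // old table length: 0001 0000
int hash1 = 10;                       // key1.hash:        0000 1010
int hash2 = 17;                       // key2.hash:        0001 0001
System.out.println(hash1 & oldCap);   // 0  -> key1 keeps its old index
System.out.println(hash2 & oldCap);   // 16 -> key2 moves to old index + oldCap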

02 What are the properties of a HashMap?

Here they are; the comment on each constant makes it clear.

// Default initial capacity: 1 << 4 = 16; must be a power of two
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity: 1 << 30 = 1073741824
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Convert a bucket's linked list to a red-black tree past this length
static final int TREEIFY_THRESHOLD = 8;
// Convert a tree back to a list when it shrinks to this length during resize
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity before any bucket may be treeified
static final int MIN_TREEIFY_CAPACITY = 64;

03 Why is the HashMap initialization length 16?

As mentioned earlier, mapping a Key to its position in the HashMap array uses a hash function, e.g. index = Hash("dog").

Notice that HashMap writes the initial length as 1 << 4 rather than 16 directly. Why? For the convenience of bitwise operations: bit operations are much more efficient than arithmetic, and 16 in particular serves the algorithm that maps a Key to an index.

So how do you implement a hash function that distributes keys as evenly as possible, to reduce HashMap collisions? Right: use the Key's hashCode value in a bitwise operation.

The formula is (where Length is the length of the HashMap array): index = hashCode(Key) & (Length - 1)

For example, the hashCode of the key "book" is 3029737 in decimal, which is 10 1110 0011 1010 1110 1001 in binary. The HashMap's default length is 16, so Length - 1 is 15 in decimal, 1111 in binary.

AND the two values: 10 1110 0011 1010 1110 1001 & 1111 = 1001. Binary 1001 is 9 in decimal, so index = 9.

In other words, the index produced by this hash algorithm depends only on the last few bits of the hashCode value.

You can try specifying a length of 10, or some other number that is not a power of two, and redo the bitwise operation: you will find that the probability of identical indexes increases greatly. With a length of 16 or any other power of two, every binary bit of Length - 1 is 1, so the index is exactly the value of the low bits of the hashCode. As long as the hashCode itself is evenly distributed, the results of the hash algorithm are uniform.

So, the default length of a HashMap is 16 to reduce the chance of hash collisions.
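The "book" arithmetic in code, using the raw hashCode (without the extra high-bit spreading that HashMap's hash() applies first):

int h = "book".hashCode();        // 3029737
int index = h & (16 - 1);         // keeps only the low 4 bits: 1001
System.out.println(index);        // 9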

04 Why is the treeify threshold 8 and the untreeify threshold 6?

The average lookup length of a red-black tree is log(n): at length 8 that is log2(8) = 3. The average lookup length of a linked list is n/2: at length 8 that is 8/2 = 4. The list's lookup is now longer than the tree's, so converting to a tree is more efficient.

At 6, the tree gives about 2.6 and the linked list gives 3, so the list is still slower than the tree, and on lookup cost alone the bucket should remain a tree. But treeifying itself takes time, and sacrificing that time for so small a gain is not a good trade.
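As a quick check of those numbers under the article's model (tree ≈ log2(n) comparisons, list ≈ n/2):

double treeAt8 = Math.log(8) / Math.log(2);   // 3.0
double listAt8 = 8 / 2.0;                     // 4.0 -> the list is now slower, so treeify
double treeAt6 = Math.log(6) / Math.log(2);   // ≈ 2.58
double listAt6 = 6 / 2.0;                     // 3.0 -> gain too small to pay for treeifying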

05 What is the load factor? Why is it 0.75?

The expansion mechanism was described earlier, but when does expansion actually happen? That depends on the capacity of the array and the load factor.

The load factor (also called the expansion factor) determines when to expand. If the load factor is 0.5 and the initial capacity of the HashMap is 16, then the HashMap expands once it contains 16 * 0.5 = 8 elements.

So why is the load factor 0.75 instead of 0.5 or 1.0? It is a balance struck between space and performance:

  • As mentioned above, the capacity must always be a power of two. With a load factor of 3/4, the product capacity * loadFactor is always an integer (for example 16 * 0.75 = 12).
  • When the load factor is set large, the expansion threshold rises: expansion happens less often and less space is occupied, but the probability of hash conflicts increases, so more complex structures are needed to store elements and operations on them get slower.
  • When the load factor is set small, the expansion threshold drops and more space is occupied, but elements are stored sparsely, hash conflicts are less likely, and operations are relatively fast.

Therefore, balancing the two, 0.75, roughly midway between 0.5 and 1.0, was chosen as the load factor.
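A minimal sketch of when expansion triggers under these defaults (mirroring the threshold check in putVal above, not the complete JDK logic):

int capacity = 16;                              // always a power of two
float loadFactor = 0.75f;
int threshold = (int) (capacity * loadFactor);  // 16 * 0.75 = 12, an exact integer
// putVal runs resize() once size exceeds threshold, i.e. on the 13th insert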

06 Is HashMap thread-safe?

No. Neither get nor put is locked, so with multiple threads there is no guarantee that the value you put at one moment is still the value you get a moment later.

There is also Hashtable, which is thread-safe, but its lock granularity is too coarse: at most one thread can access it at a time, so concurrency and performance are low. In practice we usually use ConcurrentHashMap instead, which we will cover in a later article.
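For completeness, a minimal usage sketch of the concurrent alternative (plain API usage, nothing HashMap-specific):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("dog", 666);                 // safe under concurrent access
        map.computeIfAbsent("cat", k -> 1);  // atomic check-then-insert
        System.out.println(map);
    }
}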

07 Why is it necessary to override hashCode when overriding equals?

In Java, all objects inherit from the Object class, which defines two related methods: equals compares two objects for equality, while hashCode returns an int hash code that hash-based containers such as HashMap use to place objects.

Let’s start with the equals method:

public boolean equals(Object obj) {
    return (this == obj);
}

Without overriding the equals method, it’s just ==. It has the following two characteristics:

  • For primitive types, == compares the two values
  • For reference types, == compares the addresses of the two objects

Look back at the source code for the put method: HashMap finds the bucket index through the key's hashCode, and entries with the same index form a linked list, which means two different keys may well end up in the same position.

As mentioned under the get method, we need to compare not only the hash value but also whether the keys are equal to decide whether an element is the one we want. If you override hashCode but not equals, then when a hash conflict occurs and two hashCodes are the same, there is no way to tell which node is the one you asked for. Override both, so that objects your code treats as equal always agree on both methods.
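A small runnable demo of the contract; the Dog class below is a made-up example, not something from the HashMap source:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EqualsHashCodeDemo {
    // Hypothetical key type used only for this illustration
    static final class Dog {
        final String name;
        Dog(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Dog && ((Dog) o).name.equals(name);
        }
        // Delete this override and the lookup below will usually return null,
        // because two "equal" Dogs would hash into different buckets
        @Override public int hashCode() { return Objects.hash(name); }
    }

    public static void main(String[] args) {
        Map<Dog, Integer> map = new HashMap<>();
        map.put(new Dog("dog"), 666);
        System.out.println(map.get(new Dog("dog"))); // 666
    }
}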

08 HashMap Infinite Loop Analysis

The following analysis is based on JDK 1.7, where the problem is caused by head insertion during transfer. Assume the default size of the HashMap is 2 and it initially has no elements. Three threads t1, t2, and t3 add the elements key1, key2, and key3. I set a breakpoint just before the expansion and let all three threads stop there. The source is as follows:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: each node is linked in front of the current
            // bucket head, so the list order is reversed during transfer
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

Assume the three elements hash to the same bucket and sit on one linked list: key1→key2→key3. So far nothing is wrong; everything is fine.

Now release the breakpoint and let the HashMap expand. Suppose key1 and key2 still land in the same bucket after expansion. Since transfer uses head insertion, key2 is linked in front of key1, so the list becomes key2→key1.

When all three threads resize concurrently, the links can end up as shown in the figure below, forming a cycle; a subsequent get(key1) then spins in an infinite loop.

The root cause of the endless loop, then, is that JDK 1.7's head insertion reverses the order of list nodes during expansion, and concurrent resizes can turn that reversal into a cycle. JDK 1.8 corrected this by switching to tail insertion, which preserves the original list order during expansion, so the linked-list cycle no longer forms.

09 Summary

HashMap is a core piece of Java fundamentals: it comes up constantly both at work and in interviews, so you must know it well. This article has covered essentially all of HashMap's important points, except the red-black tree itself, which is left out for length and will be discussed separately later. If you find anything wrong with this article, please kindly correct me.

So that's the HashMap source code. Thanks to all the leaders of the tech community, especially Geek Time. If I have seen further, it is because I am standing on your shoulders. I hope this article has been helpful to you, and I will see you in the next one.

10 The shoulders of giants

  • https://kaiwu.lagou.com/cours…
