Preface

It is well known that HashMap is one of the most frequently asked topics in job interviews, and a common yardstick for judging whether a candidate has a solid foundation, because it leads into a lot of other knowledge: data structures (arrays, linked lists, red-black trees), the equals and hashCode methods, and so on.

HashMap is one of the most ingeniously designed collections I studied in my early years. It contains many details and optimization techniques worth a closer look. Let's start with the related interview questions.

• What are the default capacity, load factor, and expansion factor?

• What is the underlying data structure?

• How are hash conflicts handled?

• How is the hash value of a key computed?

• Why is the array length a power of 2?

• How do the expansion and lookup processes work?

If you can answer all of the above, you don’t need to read the article.

The data structure

• In JDK 1.8, a HashMap is made up of an array + linked lists + red-black trees.

• When a value is stored in a HashMap, a hash is computed from the key, and that hash determines the position in the array. If a hash conflict occurs, the colliding entries are stored as a linked list at that position. If the list grows too long, HashMap converts it into a red-black tree.

Before looking at the source code, we need to look at some basic fields:

// Default initial capacity is 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Default load factor is 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Hash array (initialized lazily in resize())
transient Node<K,V>[] table;
// Number of key-value pairs
transient int size;
// Capacity threshold (if the number of elements exceeds this threshold, the map expands automatically)
int threshold;

The table array holds Node objects. Node is an inner class of HashMap that represents one key-value pair.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        // Objects.hashCode(o) ----> return o != null ? o.hashCode() : 0;
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            // Objects.equals(a, b) ----> return (a == b) || (a != null && a.equals(b));
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Conclusion:

• The default initial capacity is 16 and the default load factor is 0.75

• threshold = array length * loadFactor. When the number of elements exceeds threshold (the capacity threshold), the HashMap expands its capacity

• The table array holds references to the linked list

One thing to note here is that the table array is not initialized in the constructor; it is initialized in the resize method.

The table array length is always a power of two

The length of a HashMap's array is always a power of two, but have you ever wondered why?

Before answering, let's first look at the tableSizeFor method:

/** Returns a power of two size for the given target capacity. */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

tableSizeFor returns the smallest power of 2 that is greater than or equal to its argument. For example, an input of 10 returns 16.

The algorithm sets every bit below the highest 1 bit to 1; adding 1 to n then yields an exact power of 2.

The purpose of assigning cap - 1 to n first is to handle inputs that are already powers of 2. Take cap = 8, which is 1000 in binary. Without the subtraction, the bit-smearing would produce 1111, and adding 1 would give 10000, i.e. 16, which is obviously not the desired result. Subtracting 1 first gives 111, which rounds back up to 1000, i.e. the original 8. These few bit operations are far more efficient than a loop.
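As a sanity check, the bit-smearing trick is easy to reproduce outside the JDK (a standalone sketch; the class name is mine, the method body matches the source above):

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing as java.util.HashMap.tableSizeFor (JDK 8)
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;   // smear the highest 1 bit one position to the right
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;  // now every bit below the highest 1 is also 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));   // 16
        System.out.println(tableSizeFor(8));    // 8 (already a power of 2)
        System.out.println(tableSizeFor(17));   // 32
    }
}
```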

Now, where would you use tableSizeFor?

The answer is to call this method inside the constructor to set threshold, the capacity threshold.

Here you may have another question: why set threshold?

Because the table array is not created in the constructor, the computed capacity is temporarily stored in threshold; when the table is first initialized in the expansion method (covered later), this value becomes the initial array length.

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

So why make the array length a power of two?

Personally, I think this design has the following advantages:

1. When an array is a power of 2, you can use bitwise operations to calculate the subscripts of elements in the array

The index is computed as index = hash & (table.length - 1). This formula is equivalent to hash % length, i.e. taking the hash modulo the array length, but the bitwise form is much faster. The equivalence holds only when the array length is a power of 2.
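This equivalence can be verified directly. One caveat worth flagging: for negative hashes, Java's % operator returns a negative remainder, so the bitwise form actually corresponds to Math.floorMod (the class and method names below are mine):

```java
class IndexDemo {
    // The bucket index the way HashMap computes it
    static int index(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(index(37, 16));   // 5, same as 37 % 16
        System.out.println(index(-1, 16));   // 15, same as Math.floorMod(-1, 16)
    }
}
```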

2. Increase the randomness of hash values to reduce hash conflicts

If length is a power of 2, then length - 1 in binary is all 1s (1111...), so every bit of the hash participates in the index and every slot of the array is reachable. If length is not a power of 2, say length is 15, then length - 1 is 14, which is 1110 in binary. The lowest bit of the index would then always be 0, so the odd-numbered slots could never be used, wasting space and increasing collisions.

Expansion

Each time a HashMap expands, it creates a new table array with double the capacity and doubles the capacity threshold, then remaps the original elements into the new array. The process is as follows:

  1. If the table array length is greater than 0, the table has already been initialized; the array is expanded to twice its current length, and threshold also doubles

  2. If the table array has not been initialized but threshold is greater than 0, the HashMap(initialCapacity[, loadFactor]) constructor was called, so the array size is set to threshold

  3. If the table array is not initialized and threshold is 0, the no-arg HashMap() constructor was called, so the array size is set to 16 and threshold to 16 * 0.75

  4. Finally, if this was not the first initialization, the existing key-value pairs must be remapped to their new positions after expansion; if a bucket holds a red-black tree, the tree is split.
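The three sizing cases above can be sketched in isolation (a simplified model of resize()'s sizing logic; the method name and int[] return shape are my own, and the MAXIMUM_CAPACITY overflow branch is omitted):

```java
class ResizeSizingDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Returns {newCap, newThr} following cases 1-3 described above
    static int[] newSizing(int oldCap, int oldThr, float loadFactor) {
        int newCap, newThr = 0;
        if (oldCap > 0) {                      // case 1: table already initialized
            newCap = oldCap << 1;
            if (oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1;
        } else if (oldThr > 0) {               // case 2: capacity was stashed in threshold
            newCap = oldThr;
        } else {                               // case 3: no-arg constructor was used
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0)                       // threshold not set above: compute it
            newThr = (int)(newCap * loadFactor);
        return new int[]{newCap, newThr};
    }
}
```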

It is important to note that when JDK 1.8 remaps elements during expansion, it does not recompute each element's bucket index from scratch as version 1.7 did. Instead it checks hash & oldCap: if the result is 0, the index is unchanged; otherwise the new index = the old index + the old array length. Why? The reason is as follows:

Since capacities grow by powers of 2, after doubling, an element either stays at its original index or moves to a position exactly oldCap slots away. So when expanding, we do not need to recompute the hash as in JDK 1.7; we only need to look at the one newly significant bit: 0 means the index is unchanged, 1 means new index = old index + oldCap. This is another benefit of power-of-2 sizing, and one of the differences between HashMap in 1.7 and 1.8.
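The one-bit rule is easy to verify: the shortcut always agrees with the full recomputation hash & (newCap - 1) (a standalone sketch; names are mine):

```java
class SplitDemo {
    // Shortcut used during resize: one bit decides whether the
    // element stays at its old index or moves by oldCap
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        // With oldCap = 16: hash 5 stays in bucket 5, hash 21 moves to 5 + 16 = 21
        System.out.println(newIndex(5, 16));    // 5
        System.out.println(newIndex(21, 16));   // 21
    }
}
```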

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // 1. oldCap > 0 means the hash array has already been initialized
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the array length and, if possible, double the threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    // 2. Array not initialized but threshold > 0: a capacity was passed to the constructor
    else if (oldThr > 0)
        newCap = oldThr;
    // 3. Array not initialized and threshold is 0: the no-arg HashMap() constructor was called
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;   // default is 16
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new threshold if it was not set above
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Create the new hash array
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If the old hash array is not empty, iterate over it and remap every element
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: rehash it directly into the new array
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A red-black tree: split it
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // A linked list: remap it into a "low" list and a "high" list
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // hash & oldCap == 0 ----> index unchanged
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // otherwise ----> new index = old index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

In the capacity expansion method, there are also several knowledge points about red-black trees:

Treeifying the list

To convert a linked list into a red-black tree, the following two conditions must be met:

The length of the list is greater than or equal to 8

The length of the table array is greater than or equal to 64

Why is the table array size greater than or equal to 64 to tree?

Because when the table array is small, hash collisions are more likely and the linked lists grow long. In that situation it is better to expand the array first rather than treeify immediately.
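For reference, these thresholds are plain constants in java.util.HashMap (JDK 8); the JDK's own comments note that the treeify capacity should be at least four times the treeify threshold so that resizing and treeification do not conflict:

```java
class TreeifyThresholds {
    // Values copied from java.util.HashMap (JDK 8)
    static final int TREEIFY_THRESHOLD = 8;      // list length at which treeification is attempted
    static final int UNTREEIFY_THRESHOLD = 6;    // during resize, a tree this small becomes a list again
    static final int MIN_TREEIFY_CAPACITY = 64;  // below this table size, resize instead of treeify
}
```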

Red-black tree split

Splitting means that a red-black tree may be split into two linked lists when elements are remapped after expansion.

I won’t go into red-black trees here because I don’t have space.

Lookup

To find an element, we first need the hash value of its key. A HashMap does not use the key's hashCode value directly; instead, it computes the hash with an internal hash method:

static final int hash(Object key) {  
    int h;  
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);  
}

The expression (h = key.hashCode()) ^ (h >>> 16) XORs the hash code with its own high 16 bits, mixing the high bits into the low bits. Since the bucket index only uses the low bits, this spreads the influence of the high bits and increases the randomness of the hash, reducing collisions.
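The spreading step can be reproduced on its own (the class name is mine; the method body matches the JDK method shown above):

```java
class HashSpreadDemo {
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null));    // 0: null keys always map to bucket 0
        System.out.println(hash(97));      // 97: small hash codes are unchanged (high 16 bits are 0)
        System.out.println(hash(65536));   // 65537: bit 16 is folded into the low bits
    }
}
```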

Now that we know how the hash is computed, let's look at the get method.

public V get(Object key) {
    Node<K,V> e;
    // Note: hash(key) is not the same as key.hashCode()
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab;      // the hash array
    Node<K,V> first, e;   // first: head of the bucket's list; e: the next node
    int n;                // array length
    K k;
    // (n - 1) & hash ----> compute the bucket index
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Compare references with ==, then fall back to equals
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If first is a TreeNode, search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

As you can see, the bucket index in a HashMap is computed as (n - 1) & hash, i.e. the hash modulo the array length.

Insertion

Let’s first look at the steps for inserting elements:

  1. If the table array is empty, initialize it via the expansion method

  2. Compute the key's hash and from it the bucket index; if that slot is empty (no hash conflict), insert a new Node there

  3. If a hash conflict occurs, traverse the linked list to see whether the key to be inserted already exists; if it does, decide based on the conditions whether to replace the old value with the new one

  4. If not, the element is inserted into the end of the list and the list length is used to decide whether to turn the list into a red-black tree

  5. Check whether the number of key-value pairs exceeds the threshold. If yes, expand the capacity
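Before diving into the source, the observable contract of these steps can be checked directly: put returns the previous value (or null), and replacing a value does not create a new entry:

```java
import java.util.HashMap;

class PutContractDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));   // null: the key was absent (step 2)
        System.out.println(map.put("a", 2));   // 1: old value returned and replaced (step 3)
        System.out.println(map.size());        // 1: a replacement is not a new entry
    }
}
```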

Go through the steps above first; the source code below will then be much easier to follow:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab;   // the hash array
    Node<K,V> p;       // head node of the target bucket
    int n, i;          // n: array length, i: bucket index
    // The table is initialized lazily, on the first insert
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // No hash conflict: create a new node in the empty slot
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);   // new Node<>(hash, key, value, next)
    else {
        Node<K,V> e;   // will point to the existing node if the key is already present
        K k;
        // The head node already holds the key being inserted
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // p is a TreeNode: delegate to the red-black tree insert
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise walk the linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                // Reached the tail: append a new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list is now long enough, try to treeify the bucket
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // The key already exists: stop traversing
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means the key to insert already existed
        if (e != null) {
            V oldValue = e.value;
            // Decide whether to replace the old value
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);   // empty hook in HashMap, overridden by LinkedHashMap
            return oldValue;
        }
    }
    ++modCount;
    // Expand if the number of key-value pairs exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);   // empty hook in HashMap, overridden by LinkedHashMap
    return null;
}

It can also be seen from the source that the table array is initialized after the first call to put.

Deletion

Deleting from a HashMap is not complicated and takes only three steps.

  1. Locating the bucket

  2. Walk through the list to find equal nodes

  3. Delete the node

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab;
    Node<K,V> p;
    int n, index;
    // 1. Locate the bucket
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e;
        K k;
        V v;
        // The head node of the list is the one to delete
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // If the head is a TreeNode, search the red-black tree
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            // 2. Otherwise walk the list to find the node to delete
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // 3. Delete the node and repair the linked list or red-black tree
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Note: deleting a node may break the balance of the red-black tree. removeTreeNode recolors and rotates the tree to restore its balanced structure, which is fairly involved.

Traversal

Traversing a HashMap is also very common in practice. Many people like to use for-each for this, but do you know its pitfalls?

Look at the following example: if we remove an element with the map's remove method while iterating over the HashMap, a ConcurrentModificationException is thrown.

Map<String, Integer> map = new HashMap<>();  
map.put("1", 1);  
map.put("2", 2);  
map.put("3", 3);  
for (String s : map.keySet()) {  
    if (s.equals("2"))  
        map.remove("2");  
}

This behavior is known as fail-fast, and it starts with one variable:

transient int modCount;

The HashMap has a field named modCount that records how many times the collection has been structurally modified, i.e. by inserting or deleting elements. If you look back at the insert and delete source code, modCount is incremented near the end of each.

While iterating over a HashMap, modCount is checked before each next element is returned. If it differs from the value recorded when iteration started, the collection has been modified, and an exception is thrown.

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

final class KeySet extends AbstractSet<K> {
    public final Iterator<K> iterator() { return new KeyIterator(); }
}

final class KeyIterator extends HashIterator implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

abstract class HashIterator {
    Node<K,V> next;          // the next node
    Node<K,V> current;       // the current node
    int expectedModCount;    // for fail-fast
    int index;               // current bucket index

    HashIterator() {
        expectedModCount = modCount;   // record modCount at creation time
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        // Find the index of the first non-empty bucket
        if (t != null && size > 0) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();   // fail-fast
        if (e == null)
            throw new NoSuchElementException();
        // When the current list is exhausted, move on to the next non-empty bucket
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);   // call the outer removeNode
        expectedModCount = modCount;                      // resync the counts
    }
}

The relevant check is shown below: if modCount has changed, a ConcurrentModificationException is thrown.

if (modCount != expectedModCount)
    throw new ConcurrentModificationException();

So how do you delete elements in traversal?

We can look at the iterator's built-in remove method, whose last two lines are as follows:

removeNode(hash(key), key, null, false, false);   // call the outer removeNode
expectedModCount = modCount;

It calls the outer removeNode method to delete the element and then assigns modCount to expectedModCount, so the two stay equal and no exception is thrown. Therefore we should write:

Map<String, Integer> map = new HashMap<>();  
map.put("1", 1);  
map.put("2", 2);  
map.put("3", 3);  
Iterator<String> iterator = map.keySet().iterator();  
while (iterator.hasNext()){  
    if (iterator.next().equals("2"))  
        iterator.remove();  
}

Another point is that when we iterate over a HashMap, we find that the order in which we iterate is different from the order in which we insert. Why?

As you can see from the HashIterator source, iteration starts by scanning the bucket array for the first bucket that holds a list head. It then walks the list that bucket points to. Once that list is exhausted, it looks for the next non-empty bucket and continues; when no more buckets are found, traversal ends. This explains why the traversal order differs from the insertion order. If this is still unclear, see the diagram below:
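A small demonstration of the effect (HashMap's iteration order is formally unspecified, but on current JDKs small Integer keys hash to themselves and land in bucket key & 15 of the default 16-slot table, so they come back in bucket order rather than insertion order):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class IterationOrderDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(3, "c");
        map.put(1, "a");
        map.put(2, "b");
        // Buckets are scanned from index 0 upward, list by list
        System.out.println(new ArrayList<>(map.keySet()));   // [1, 2, 3]
    }
}
```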

equals and hashCode

Why do objects used as HashMap keys need to override the equals() and hashCode() methods?

Here is a quick example using a Person class:

public class Person {
    Integer id;
    String name;

    public Person(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null)
            return false;
        if (obj == this)
            return true;
        if (obj instanceof Person) {
            Person person = (Person) obj;
            if (this.id == person.id)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Person p1 = new Person(1, "aaa");
        Person p2 = new Person(1, "bbb");
        HashMap<Person, String> map = new HashMap<>();
        map.put(p1, "this is p1");
        System.out.println(map.get(p2));   // prints null
    }
}

• The native equals method uses == to compare objects

• The native hashCode value is typically derived from the object's memory address

The Person class overrides equals to judge equality by id. But because hashCode is not overridden, p1 and p2 have different hash values, so after inserting p1 we cannot retrieve the element with p2.

A HashMap places an element according to its key's hash value, which is why HashMap keys should override both the equals and hashCode methods.
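For completeness, here is a sketch of a corrected Person with hashCode overridden consistently (the class name is mine; I also use Objects.equals for the id comparison, since == on Integer only works reliably for small cached values):

```java
import java.util.HashMap;
import java.util.Objects;

class PersonFixed {
    final Integer id;
    final String name;

    PersonFixed(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) return true;
        if (!(obj instanceof PersonFixed)) return false;
        return Objects.equals(id, ((PersonFixed) obj).id);
    }

    @Override
    public int hashCode() {
        // Equal ids now produce equal hashes, so lookups find the right bucket
        return Objects.hashCode(id);
    }

    public static void main(String[] args) {
        HashMap<PersonFixed, String> map = new HashMap<>();
        map.put(new PersonFixed(1, "aaa"), "this is p1");
        System.out.println(map.get(new PersonFixed(1, "bbb")));   // this is p1
    }
}
```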

Conclusion

This article described the implementation principles of HashMap and analyzed them against the source code. When I have time I will follow up on HashMap's thread-safety issues. I hope this article helps you; discussion and corrections are welcome, and thank you for your support! I have also put together a set of systematic Java materials, including core Java knowledge points, interview topics, recent Internet-company interview questions, and e-books, which you can get by following the public account [procedures yuan xiao Wan].