Preface

HashMaps are common in daily development, and most developers know when to use one and how to store and retrieve values by key, but few study what actually happens during those save and retrieve operations. HashMap is also a must-ask interview topic. The following analysis is based on JDK 1.8.

Main content

Let’s look at the structure of the HashMap:

1. Take a look at some of the member variables defined in HashMap that you need to know

// Default initial capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Treeify threshold: list -> red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Untreeify threshold: red-black tree -> list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum array capacity before treeification is allowed
static final int MIN_TREEIFY_CAPACITY = 64;
// The internal array of the HashMap; Node is a linked-list node object
transient Node<K,V>[] table;
// Array expansion threshold
int threshold;

DEFAULT_INITIAL_CAPACITY The initial capacity of the array is 16. It can be specified in the constructor, and it must be a power of two (16 → 32 → 64 …).

DEFAULT_LOAD_FACTOR The load factor is 0.75f. When the number of entries reaches capacity * 0.75, the HashMap attempts resize(). For example, with an initial capacity of 16 the expansion threshold is 0.75 * 16 = 12; in other words, the HashMap expands before the 13th value is stored. The load factor can also be specified in the constructor. A load factor greater than 1 effectively keeps the array from expanding, sacrificing performance but saving memory.

TREEIFY_THRESHOLD The treeification threshold. When the number of nodes in a list is greater than or equal to this value, the list is converted to a red-black tree.

UNTREEIFY_THRESHOLD The untreeify threshold. When the number of nodes falls to this value or below, the red-black tree is converted back to a regular list.

MIN_TREEIFY_CAPACITY The second condition for treeification. When the array length is smaller than this value, the list is not converted to a red-black tree even if the treeify threshold is reached; instead, the array is expanded first via resize().

threshold The array expansion threshold, equal to the total capacity of the HashMap array * load factor. When the number of entries reaches this value, **resize()** runs, doubling the total capacity. For example, if the current total capacity is 16, it becomes 32 after expansion.
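To make the threshold arithmetic concrete, here is a small sketch. The class name `ThresholdDemo` and its helper method are made up for illustration; only the formula capacity * loadFactor comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdDemo {
    // Reproduces the threshold arithmetic described above: capacity * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default: 16 * 0.75 = 12, so the 13th put triggers resize().
        System.out.println(threshold(16, 0.75f)); // 12

        // A load factor > 1 raises the threshold above the capacity,
        // trading lookup speed for memory (longer chains, fewer resizes).
        Map<String, Integer> m = new HashMap<>(16, 2.0f);
        for (int i = 0; i < 20; i++) m.put("k" + i, i);
        System.out.println(m.size()); // 20
    }
}
```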

2. Inheritance

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next; // next node in the list
    // ... omitted
}

A zipper (separate-chaining) hash table resolves collisions through linked lists, so the internal array of a HashMap holds nodes. The hash field is the hashCode after processing by the hash() method, which is used to compute the bucket index of the array and makes the distribution more random.

java.util.HashMap<K, V>.Node<K, V>
    java.util.LinkedHashMap<K, V>.Entry<K, V>
        java.util.HashMap<K, V>.TreeNode<K, V>

TreeNode is a subclass of Node; the inheritance chain is shown above. Node is a singly linked list node, Entry is a doubly linked list node, and TreeNode is a red-black tree node. TreeNode contains over 400 lines of red-black tree code, which is fairly involved; explore it yourself if interested.

3. A brief summary of HashMap

A HashMap is a hash table based on the zipper method, implemented internally with an array, linked lists, and red-black trees.

  1. The initial array size is 16, and capacities grow in powers of two. This keeps the array large enough for good performance and lets a bitwise AND replace the modulo operation (reportedly a 5-8x speedup).

  2. Whether the array needs to grow is determined by the load factor: when the number of elements reaches 0.75 of the array size, the array is expanded. 0.75 is the default load factor and can be overridden via the constructor. We can also set a load factor greater than 1 so that the array does not grow, sacrificing performance to save memory.

  3. To resolve collisions, the elements in the array are singly linked lists. When a list's length reaches a threshold (7 or 8, see the next point), the list is converted into a red-black tree to improve lookup performance. When the length shrinks to another threshold (6), the red-black tree is converted back into a singly linked list. The gap between the two thresholds is a balance point.

  4. In addition to the third point, before converting a list to a red-black tree, the code checks whether the array length has reached a threshold (64). If not, the conversion is abandoned and the array is expanded first. This aborted conversion is why the effective list-length threshold is sometimes described as 7 or 8.

4. Dig deep into the source code

4.1 Construction method

// Default initial array size 16, load factor 0.75f
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // NaN: Not a Number, e.g. the square root of -1
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    // tableSizeFor rounds any integer up to a power of two; e.g. 10 -> 16.
    // threshold is normally array size * loadFactor, but here it temporarily
    // holds the capacity: the first put() uses it as the array size during
    // resize() and then recomputes the real threshold.
    this.threshold = tableSizeFor(initialCapacity);
}

When we specify the initial array size, as noted above, the size must be a power of two. So we need an algorithm that rounds a given number up to a power of two.

// Rounds any integer up to the next power of two.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The bit operations involved are not covered here; look them up if you're curious.
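For experimenting, here is a self-contained copy of the method. The wrapper class `TableSizeForDemo` is made up; the body is the JDK 1.8 source quoted above. Subtracting 1 first avoids doubling an input that is already a power of two, and the shift-OR cascade smears the highest set bit into every lower bit, so n + 1 is the next power of two.

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 1.8's HashMap.tableSizeFor: smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // avoids doubling when cap is already a power of two
        n |= n >>> 1;      // smear the highest set bit rightward...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...until all lower bits are 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```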

4.2 Index bucket of an array

Each time a HashMap is initialized, it creates an array whose size is called the **capacity**. The slots in the array that can store elements are called **buckets**, and each bucket has its own index. Stored elements can be accessed quickly through that index.

public V put(K key, V value) {
    // The incoming key is first passed through the hash() method
    return putVal(hash(key), key, value, false, true);
}

// Specially processed hashCode
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

In Java, every object has a **hashCode()** method, a hash function that returns a 32-bit integer. The range is this large to make collisions (two different objects returning the same hashCode) unlikely. But a 32-bit value cannot be used directly as an index (how big would the array need to be?), so the hashCode must be shrunk down to the size of the array.

The code above simply XORs the high 16 bits of hashCode into the low 16 bits. The hash() method thus further scrambles the hashCode, increasing its "randomness" in an attempt to reduce hash collisions when inserting into a HashMap.
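A small sketch of the effect (the class `HashSpreadDemo` is invented for illustration): two hashCodes that differ only in their high 16 bits would collide in a 16-slot table without the spread, but not with it.

```java
public class HashSpreadDemo {
    // Copy of JDK 1.8's HashMap.hash(): XOR the high 16 bits into the low 16.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // High bits differ, low bits are identical (both zero here).
        int a = 0x10000;
        int b = 0x20000;
        int n = 16; // table size

        // Without the spread, both land in bucket 0: a raw collision.
        System.out.println((a & (n - 1)) == (b & (n - 1))); // true

        // With the spread, the high bits influence the bucket index.
        System.out.println((hash(a) & (n - 1)) == (hash(b) & (n - 1))); // false
    }
}
```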

In the putVal method, there is a line like this:

if ((p = tab[i = (n - 1) & hash]) == null)

i = (n - 1) & hash, where n is the array length and hash is the value produced by the hash() method. This expression takes the hash modulo the array length; as noted earlier, when the divisor is a power of two, an AND operation can replace the modulo to improve performance.
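A quick check of that claim (the helper `IndexDemo.index` is made up for illustration): for a power-of-two n, `(n - 1) & hash` matches the non-negative remainder mod n, even for negative hash values, which plain `%` would get wrong.

```java
public class IndexDemo {
    // For a power-of-two length n, (n - 1) & hash equals the non-negative
    // remainder of hash mod n, even when hash is negative.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] { 5, 21, -7, Integer.MIN_VALUE }) {
            System.out.println(index(hash, n) == Math.floorMod(hash, n)); // true each time
        }
    }
}
```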

4.3 HashMap.put(k, v)

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

// onlyIfAbsent: when the key already exists, whether to keep the old value.
//               false means overwrite, true means keep (see putIfAbsent()).
// evict: used by the subclass LinkedHashMap.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab;   // tab: the internal array
    Node<K,V> p;       // p: the first node at the index for this hash
    int n, i;          // n: array length  i: index for this hash
    // On the first put the internal array is empty, so resize() creates it.
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Compute the array index; if the first node there is null, add a new node.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the first node's key equals the key being stored, overwrite its value.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the first node is a red-black tree node, add the pair to the tree.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bucket holds a list: a hash collision has occurred.
        // If the list contains the key, the value is overwritten; if not, the
        // pair is appended at the tail. Then, if the length is large enough,
        // "try" to convert the list to a red-black tree. Note the word "try":
        // treeifyBin also checks whether the array has reached 64 slots, and
        // otherwise abandons the conversion and expands the array first.
        else {
            for (int binCount = 0; ; ++binCount) {
                // p.next == null: at the tail of the list; append a new node,
                // and if the list is now long enough, try to treeify it.
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Check whether the list already contains the key.
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Overwrite the value of an existing mapping.
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount; // fail-fast mechanism
    // If the number of elements exceeds the threshold, expand the array.
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Take a close look at the comments section, which summarizes the following steps:

1. Check whether the array is empty; if so, run resize() to create or expand it.
2. Compute the array index from the hash value and get the first node at that index.
3. If the first node is **null** (no collision occurred), add the node directly to the bucket.
4. If the first node is not null, there are three cases:
   ① the key equals the first node's key: **overwrite the old value**; otherwise execute ② or ③
   ② the first node is a TreeNode: add the key-value pair to the red-black tree;
   ③ the first node is a linked list: add the key-value pair to the list, checking whether the list length reaches TREEIFY_THRESHOLD - 1 and, if so, "trying" to convert the list to a red-black tree.
5. Check whether the current number of elements exceeds threshold and, if so, expand the array.
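The externally visible behavior of these steps can be sketched with plain map calls (nothing here is HashMap-internal): a fresh key adds a node, a repeated key overwrites and returns the old value, and putIfAbsent is putVal with onlyIfAbsent = true.

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();

        // A fresh key: putVal adds a new node and returns null.
        System.out.println(m.put("a", 1));          // null

        // The same key: the old value is overwritten and returned (case 4-1 above).
        System.out.println(m.put("a", 2));          // 1

        // putIfAbsent is putVal with onlyIfAbsent = true: no overwrite.
        System.out.println(m.putIfAbsent("a", 3));  // 2
        System.out.println(m.get("a"));             // 2
    }
}
```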

// Convert a bucket's list to a red-black tree
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the array is still small (less than 64), abandon the conversion
    // and expand the array instead.
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // turn the list into a red-black tree ...
    }
}

Since JDK 1.8, HashMap uses red-black trees: when a bucket's list grows past 8 elements, it is automatically converted to a red-black tree; when the number of elements in the bucket drops to 6 or below, the tree structure reverts to a linked list.

With a red-black tree, the average search length is log(n); at length 8 that is log₂(8) = 3. With a linked list, the average search length is n/2; at length 8 that is 8/2 = 4, so converting to a tree is worthwhile. At length 6 or below, 6/2 = 3, a list is still fast, and the time spent converting to and building a tree would not pay off.

The reason for the gap between 6 and 8 (the value 7 in between) is to prevent frequent conversions between lists and trees. Suppose lists converted to trees above 8 and trees back to lists below 8: a HashMap that keeps inserting and deleting elements with list lengths hovering around 8 would flip between tree and list constantly, and efficiency would be very low.

To sum up: a linked list (fewer than 8 elements) has a higher query cost but a low insertion cost; a red-black tree (more than 8 elements) has a low query cost but a higher insertion cost.

4.4 resize(): array expansion

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the array is already at the maximum length, do not expand.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the array size; still a power of two.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // The array has not been created yet, but threshold was set: the map was
    // built with a parameterized constructor, so threshold holds the capacity
    // (explained in the "constructor" section).
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Otherwise the map was created with the no-arg constructor.
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Recompute the threshold if it is still 0 (e.g. newThr = oldThr << 1
    // dropped the highest bit).
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // The new array length is known at this point; create the new array.
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // The code below transfers the elements of the old array to the new one.
    // Because the array length changed, the index computed from the hash may
    // differ from before. JDK 1.7 recomputed every element's index, a rehash
    // operation (one reason HashMap iteration order changes). JDK 1.8
    // optimizes this very cleverly.
    if (oldTab != null) {
        // Iterate over the old array.
        for (int j = 0; j < oldCap; ++j) {
            // Get the first node of each bucket.
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single-node bucket: recompute its index into the new array.
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A red-black tree node: split the tree, similarly to a list.
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // The node is a linked list.
                else { // preserve order
                    // loHead/loTail: the list whose index is unchanged.
                    Node<K,V> loHead = null, loTail = null;
                    // hiHead/hiTail: the list that moves to old index + oldCap.
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Iterate over the list.
                    do {
                        next = e.next;
                        // The new bit is 0: append to the "low" list.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // The new bit is 1: append to the "high" list.
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Store the "low" list at the original index.
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Store the "high" list at old index + old array length.
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Expanding the array is not just a matter of doubling its size and copying the elements of the original array straight across.

Since an element's index is hash & (n - 1), when the array length changes from n to 2n, the recomputed index may differ from the original one.

In JDK 1.7, this was done by iterating over every node and recalculating its index into the new array, a rehash operation.

JDK 1.8 optimizes this away: no rehash is needed, because when the length of an array doubles to the next power of two, the following rule holds:

An element either stays at its old index or moves by exactly the old capacity (a power of two). So when expanding a HashMap, there is no need to recompute the hash as in JDK 1.7; it suffices to check whether the newly significant bit is 1 or 0: 0 means the same index, and 1 means "old index + oldCap".

Once the new array length and the new threshold are calculated, the contents of the old array are migrated to the new one. Compared with 1.7, no rehash operation is required, because with power-of-two growth the new index follows directly from the added bit.
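A sketch verifying this rule (the helper `ResizeIndexDemo.newIndex` is invented for illustration): the index derived from the single bit `hash & oldCap` matches a full recomputation against the doubled table.

```java
public class ResizeIndexDemo {
    // After doubling from oldCap to 2*oldCap, the new index of an element
    // depends only on the bit (hash & oldCap): 0 keeps index j, 1 moves
    // the element to j + oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                 // old index
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[] { 5, 21, 37, -3 }) {
            // The shortcut matches recomputing hash & (newCap - 1) directly.
            System.out.println(newIndex(hash, oldCap) == (hash & (2 * oldCap - 1)));
        }
    }
}
```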

4.5 HashMap.get(k)

public V get(Object key) {
    Node<K,V> e;
    // Retrieval also passes the key through the hash() method.
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    // tab: internal array  first: first node at the index
    // n: array length      k: key of the first node
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The array exists, its length is > 0, and the bucket's first node is not null.
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node's hash equals the given hash and its key matches.
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // A red-black tree: search the tree.
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // A collision occurred: compare with key.equals(k).
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

1. Check that the array is not null and that the first node of the bucket is not null.
2. If the first node's hash equals the given hash and its key matches, return it (the **first node of the bucket**).
3. If the first node is a red-black tree node, search the red-black tree.
4. If a collision occurred, walk the list comparing keys via key.equals(k).
5. Return null if nothing is found.

5. Summary (Interview related)

1. "Have you ever used HashMap?" "What is a HashMap? Why did you use it?"

Uh..

2. Can we synchronize a HashMap?

Map m = Collections.synchronizedMap(hashMap);

3. Do you know how HashMap works?

See the summary in section 3.

4. Do you know how the put() and get() methods of HashMap work?

For put(), see the summary in section 4.3. For get(), see the summary in section 4.5.

5. What happens when two objects have the same hashCode?

The two objects have the same hashCode, so they get the same bucket position and collide. A HashMap uses a linked list to store colliding entries: the list is checked for a matching key (key != null && key.equals(k)); if found, the value is overwritten, otherwise the key-value pair is appended to the end of the list. If the list length reaches the threshold (8), the list is converted to a red-black tree.

6. If two keys have the same hashCode, how do you get the value object?

The two keys have the same hashCode, so they map to the same bucket. Once the bucket is found, key.equals() is called to find the correct node in the list (key != null && key.equals(k)).
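A runnable sketch of this situation (the key class `BadKey` is deliberately contrived for illustration: every instance returns hashCode 1, forcing all entries into one bucket, so lookups must fall back on equals()):

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A deliberately bad key: every instance hashes to 1, so all entries
    // share one bucket and get() must distinguish them with equals().
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 1; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> m = new HashMap<>();
        m.put(new BadKey("a"), 1);
        m.put(new BadKey("b"), 2);
        // Same bucket, but equals() still finds the right node.
        System.out.println(m.get(new BadKey("a"))); // 1
        System.out.println(m.get(new BadKey("b"))); // 2
    }
}
```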

7. How do you reduce collisions?

Use final or immutable objects as keys. Wrapper classes such as Integer and String are a good choice: they are immutable and final, and they override equals and hashCode. (Can a custom object be a key? Sure, as long as it obeys the equals and hashCode contract and does not change after being inserted into the Map.)

8. What happens if the size of the HashMap exceeds the threshold defined by the load factor?

**resize()** is called to expand the array.

9. Do you understand the problems with resizing a HashMap?

A race condition can occur under multiple threads.

If two threads both find that the HashMap needs resizing, they both try to resize it at the same time. In JDK 1.7, elements moved to a new bucket were inserted at the head of the list rather than the tail (to avoid traversing to the end), so the list order was reversed; under a race this can produce a circular list and an infinite loop. At this point you can ask the interviewer why anyone would use a HashMap in a multithreaded environment in the first place. 🙂

10. HashMap is not thread-safe. Why? (the HashMap deadlock)

The capacity of a HashMap is limited. If the array capacity is very small, say only 2, then putting 10 keys in causes very frequent collisions. In that case the O(1) lookup degrades into a linked-list traversal with O(n) performance, which is the inherent weakness of hash tables.

To mitigate this, HashMap uses a threshold of 0.75 of its capacity, and when the number of entries exceeds this threshold, it automatically expands.

Under multiple threads, resizing introduces a race: if two threads both decide the HashMap needs resizing, they try to resize it simultaneously. In JDK 1.7, elements moved to a new bucket were inserted at the head of the list (to avoid tail traversal), so the list order was reversed; under a race this can create a circular list and an endless loop. (See the previous question. 🙂)

11. What factors affect the performance of a HashMap?

  • Load factor.

  • The hash function; ideally, hashes spread evenly across the buckets. (The String class overrides hashCode for this reason.)

12. What criteria does a HashMap key need to satisfy?

It must override the hashCode and equals methods.
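A sketch of what goes wrong when you don't (the class `RawKey`, which overrides neither method, is invented for illustration): without overrides, HashMap falls back on identity hashing, so a logically equal but distinct instance misses.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyContractDemo {
    // No hashCode/equals overrides: two instances with the same field are
    // different keys as far as HashMap is concerned.
    static class RawKey {
        final String name;
        RawKey(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Map<RawKey, Integer> m = new HashMap<>();
        m.put(new RawKey("a"), 1);
        // A logically equal but distinct instance misses: identity hash differs.
        System.out.println(m.get(new RawKey("a"))); // null

        // JDK types like String override both methods, so distinct but equal
        // instances hit the same entry.
        Map<String, Integer> s = new HashMap<>();
        s.put("a", 1);
        System.out.println(s.get(new String("a"))); // 1
    }
}
```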

13. A HashMap allows null keys and null values, but only one null key. Why?

A null key has hash 0, so it is placed in the first bucket (index 0). Since every null key is the same key, there can be only one, and a second put with a null key simply overwrites the value.
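A quick demonstration (plain map calls, nothing internal):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        // hash(null) is defined as 0, so the null key lands in bucket 0.
        m.put(null, "first");
        m.put(null, "second");   // same key: the value is overwritten, not added
        System.out.println(m.get(null)); // second
        System.out.println(m.size());    // 1
    }
}
```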

Thank you for reading to the end of this article
