Note: Thanks to the Meituan-Dianping technical team for sharing; parts of this post are excerpted from their blog. It will be removed upon request in case of infringement.

Today we’ll explore the internal implementation mechanism of HashMap.

JDK 1.8 uses an array + a linked list + a red-black tree to implement HashMap.

The underlying idea of HashMap is mainly hash tables. Let’s take a look at how Java designers designed HashMap using arrays + linked lists + red-black trees.

1 Basic attributes of a HashMap

Since a HashMap is an implementation of a hash table, its basic data structure is an array. Part of the HashMap source code is as follows:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    transient Node<K,V>[] table;    // The underlying data structure of HashMap (Node array)
    transient int size;             // The number of key-value pairs that actually exist in the HashMap
    transient int modCount;         // Records the number of times the internal structure of the HashMap has changed for the quick failure mechanism
    int threshold;                  // The limit of key-value pairs that can be accommodated (I call this "load")
    final float loadFactor;         // Load factor: 0.75 by default
}

In addition to the table array, I have also posted common fields from the source code. For the above code, we need to note the following:

  1. HashMap extends the AbstractMap abstract class and implements the Map<K,V>, Cloneable, and Serializable interfaces.
  2. The transient keyword excludes a field from default Java serialization.
  3. threshold = length (hash table length) * loadFactor.
  4. modCount counts the number of times the internal structure of the HashMap has changed. Internal structure changes refer to structural changes, such as putting a new key-value pair; overwriting the value of an existing key is not a structural change.
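The fail-fast behaviour driven by modCount can be observed directly: a structural modification during iteration makes the iterator throw ConcurrentModificationException. A minimal sketch (class and method names are mine, chosen for illustration):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Returns true if a structural modification during iteration triggers fail-fast.
    static boolean failsFast() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put("c", 3); // new key: structural change, modCount++
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator detected the modCount mismatch
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + failsFast());
    }
}
```

Note that overwriting an existing key inside the loop would not trip this check, since that is not a structural change.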

With a table array in mind, let’s use a diagram to illustrate the structure of a hash table in a HashMap:

Now that we know about member variables in HashMap, let’s look at constants defined in HashMap:

// The default initial capacity must be a power of 2.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 
  
// Maximum capacity (must be a power of 2 and at most 2^30; a larger requested capacity is replaced by this value)
static final int MAXIMUM_CAPACITY = 1 << 30;  
  
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;  
  
// JDK 1.8 feature
// When a bucket holds more than TREEIFY_THRESHOLD nodes, the linked list is dynamically
// replaced by a red-black tree, changing the search time complexity from O(n) to O(logn).
static final int TREEIFY_THRESHOLD = 8;  
  
// JDK 1.8 feature
// When the number of nodes in a bucket drops below UNTREEIFY_THRESHOLD (during resize),
// the red-black tree is converted back to a linked list.
static final int UNTREEIFY_THRESHOLD = 6;  

// JDK 1.8 feature
// Minimum table capacity before treeification: at least 4 * TREEIFY_THRESHOLD = 32;
// 64 is used to avoid conflicts between the resizing and treeification thresholds.
static final int MIN_TREEIFY_CAPACITY = 64;

2 Node elements in the HashMap

Now let's look at the implementation of the Node elements stored in the table array. The source code is as follows:

// Static inner class that implements the Map.Entry interface
static class Node<K,V> implements Map.Entry<K,V> {
    // The hash value generated by the key (unchanged)
    final int hash;
    // key (unchanged)
    final K key;
    // value
    V value;
    // Point to the next Node.
    Node<K,V> next;

    // constructor
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    // None of these methods can be overridden
    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    // Computes the hash code of this entry (key hash XOR value hash)
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    // Set the new value and return the old value
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Compare whether two objects are equal
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        // Does the other object also implement the Map.Entry interface?
        if (o instanceof Map.Entry) {
            // Compare the key and value of both entries
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Here we need to note:

  1. Node is implemented as a static inner class. (Consider: why are some Java inner classes designed to be static rather than non-static?)
  2. Immutability of hash values and keys: even though the hash and key fields are declared final inside the HashMap, it is still best to use immutable objects as keys.

First, let's take a look at how the final keyword behaves with primitive types versus reference types.

  1. When final modifies a primitive variable type, the primitive variable cannot be reassigned, and therefore cannot be changed.
  2. When final modifies a reference type variable, final only guarantees that the reference itself will not change, that is, it always refers to the same object; the object itself can still change (its non-final member variables can be reassigned).
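Both rules can be demonstrated in a few lines; the reassignment attempts are left as comments because they would not compile:

```java
import java.util.ArrayList;
import java.util.List;

public class FinalDemo {
    // A final reference still allows the referenced object to be mutated.
    static List<String> mutateFinalList() {
        final List<String> names = new ArrayList<>();
        // names = new ArrayList<>(); // does not compile: the reference is final
        names.add("Robin");           // but mutating the object is fine
        return names;
    }

    public static void main(String[] args) {
        final int x = 10;
        // x = 20;                    // does not compile: final primitive
        System.out.println(x + " " + mutateFinalList());
    }
}
```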

Let’s talk more about why immutable objects are best used as keys when using a HashMap.

What are the possible effects of choosing a mutable object as a HashMap key?

import java.util.HashMap;
import java.util.Map;
 
public class MutableDemo1 {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        
        MutableKey key = new MutableKey(10, 20);
 
        map.put(key, "Robin");
        
        System.out.println(map.get(key));

        key.setI(30);
        System.out.println(map.get(key));
    }
}

Output:

Robin
null
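The MutableKey class itself is not shown in the excerpt; a hypothetical sketch consistent with the demo (the field names i and j and the hashCode formula are my assumptions) might look like this, with hashCode depending on the mutable field i:

```java
// Hypothetical sketch of MutableKey: hashCode depends on the mutable field i,
// so changing i after insertion changes the bucket the key hashes to.
public class MutableKey {
    private int i;        // mutable: this is what breaks the HashMap lookup
    private final int j;

    public MutableKey(int i, int j) {
        this.i = i;
        this.j = j;
    }

    public void setI(int i) { this.i = i; }

    @Override
    public int hashCode() {
        return 31 * i + j;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutableKey)) return false;
        MutableKey other = (MutableKey) o;
        return i == other.i && j == other.j;
    }
}
```

After setI(30), the key's hash no longer matches the bucket it was stored in, which is why the second get returns null.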

Why it’s best not to use mutable objects as HashMap keys, conclusion:

If the key object is mutable, the hash value of the key may change after insertion, so the entry can no longer be found. Using mutable objects as keys in a HashMap therefore causes data loss.

How to solve it?

  1. In a HashMap, try to use immutable types such as String and Integer as keys.
  2. For a custom key class, base hashCode (and equals) only on fields that do not change, so that the hash value of the object stays the same even when other member variables change. (See: Can a HashMap key be a mutable object?)

3 Put method in HashMap

3.1 Calculation of Hash Value

Now that we have a complete look at the basic structure of a HashMap, let’s examine one of the most commonly used methods in a HashMap: put().

Direct source code:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Before analyzing putVal’s source code, let’s look at hash(key) :

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

For plain objects, key.hashCode() falls back to Object.hashCode(), a native method whose implementation is not given in the Java source, but that's not the point here. (The hash value itself is 32 bits.)

Why do you do that? What are the benefits of this?

The main considerations are speed, efficiency, and quality. Even when the table array is small, this keeps both the high and low bits involved in the hash calculation without much overhead: XORing the high half of the original hashCode into the low half increases the randomness of the low bits while preserving some information from the high bits, so the value returned by hash() is better distributed and produces fewer collisions.

For example, n is the length of the table (assume 16).
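As a sketch of the effect, the following reimplements the disturbance function and contrasts it with a plain mask for two hash codes that differ only in their high bits (the constants and helper names are illustrative):

```java
public class HashSpreadDemo {
    // Same disturbance function as HashMap.hash(Object): XOR the high 16 bits
    // of hashCode into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index after spreading, for a table of length n (a power of two).
    static int spreadIndex(int h, int n) {
        return (n - 1) & spread(h);
    }

    public static void main(String[] args) {
        int n = 16;
        // These two hash codes differ only in their high bits, so the plain
        // mask (n - 1) & h would put both of them into bucket 0...
        System.out.println(((n - 1) & 0x10000) + ", " + ((n - 1) & 0x20000));
        // ...while the spread version separates them into different buckets.
        System.out.println(spreadIndex(0x10000, n) + ", " + spreadIndex(0x20000, n));
    }
}
```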

3.2 Parsing the PUT method

Before we look at the source code for the put method, let's look at a diagram of how it executes:

The put method proceeds as follows:

  1. If the key-value pair array table is null or empty, call resize() to create it;
  2. Compute the index i from the hash; if table[i] == null, insert the new node directly and go to 6; otherwise go to 3;
  3. Check whether the first element of table[i] has the same key; if so, overwrite the value; otherwise go to 4;
  4. Check whether table[i] is a TreeNode, that is, whether the bucket is a red-black tree; if so, insert the key-value pair into the tree; otherwise go to 5;
  5. Traverse the linked list at table[i]; if the key is found along the way, overwrite the value directly; otherwise append the new node, and if the list length then exceeds 8, convert the list into a red-black tree;
  6. After a successful insert, check whether the actual number of key-value pairs (size) exceeds the threshold; if so, expand the capacity.

Here is the source code for putVal:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    
    // The hash table is null || its length is 0 (resize is a classic method)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
        
    // (n - 1) & hash is the bucket where the key should be stored (a bitwise substitute for the remainder: & is faster than %)
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        
        // The inserted key already exists in the HashMap (value overwrites directly later)
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        
        // The current node is a red-black tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        
        // Common node, use chain address method for processing
        else {
            // Walk through the list (after inserting a new node, determine the length of the list)
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                
                    // Convert the red-black tree when the number of conflicting chain nodes is greater than or equal to 8
                    if (binCount >= TREEIFY_THRESHOLD - 1) 
                        treeifyBin(tab, hash);
                    break;
                }
                
                // The inserted key already exists in the HashMap
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        
        // The key already exists, overwrite the old value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    
    ++modCount;                 // Records the number of times the internal structure of the HashMap has changed for the quick failure mechanism
    if (++size > threshold)
        resize();               // expand the capacity
        
    afterNodeInsertion(evict);  // Callback hook for LinkedHashMap; an empty no-op in HashMap itself
    
    return null;
}

That's pretty much the end of the put method analysis, but there are also two things worth thinking about:

  1. Hash table index positioning: (n - 1) & hash;
  2. Capacity expansion mechanism: resize().

Red black trees and fast failure mechanics are not covered in this blog post.

3.3 Index Positioning

Why does (n - 1) & hash correctly locate an element in the hash table?

In essence, it is still division with remainder, but the bit operation is much more efficient than the modulo operator.

However, there is a precondition for using this method: the length n of the hash table must be a power of 2, because then the binary representation of n - 1 is all 0s followed by all 1s, so the mask keeps only the last bits of the hash, which fall exactly within the index range of an array of length n.

For example, assume the hash value is 3 and the array length n is 16. Using modulo: 3 % 16 = 3. Using the mask: 0011 & (16 - 1) = 0011 & 1111 = 0011, which is 3 again.
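The equivalence can be checked for a few values in a quick sketch (the index() helper is only for illustration, and the equality holds for non-negative hash values):

```java
public class IndexDemo {
    // For a power-of-two table length n, (n - 1) & hash equals hash % n for
    // non-negative hash: n - 1 in binary is all 1s, so the mask keeps exactly
    // the low log2(n) bits, i.e. the remainder of division by n.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {3, 18, 31, 100}) {
            System.out.println(hash + " -> " + index(hash, n)
                    + " (same as " + (hash % n) + ")");
        }
    }
}
```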

In a HashMap, the default initial value of the hash table is 16:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
Copy the code

3.4 Capacity Expansion Mechanism

We are not covering red-black trees, but we must explore the resize mechanism used by the put method. Once you understand resize, you will marvel at its clever design!

First of all, make a brief introduction to the capacity expansion mechanism:

Resizing happens as more and more elements are added to a HashMap: when the array inside the HashMap cannot hold any more elements, the map must increase the array size to accommodate them. If the actual size of the HashMap exceeds the threshold, the table is doubled in size. After doubling, the index of each Node is recalculated to map the elements into the larger array, reducing the probability of hash collisions.

I split the capacity expansion mechanism into two parts: 1. Create a new table array; 2. Rehash elements.

Creating a new table array is relatively simple:

  1. If the original table array has already reached the maximum capacity, it cannot grow further: threshold is set to Integer.MAX_VALUE, so future puts simply collide and the table is never expanded again;
  2. Otherwise, update newCap (size of the new array) and newThr (threshold of the new array) by doubling;
  3. If the original table array is null or its length is 0, expand using the default values;
  4. If newThr is still 0 after the above (for example, because doubling passed the limits), recompute it from newCap * loadFactor, capping at Integer.MAX_VALUE.

Let’s start with the source code for the first part (creating a new array) :

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // The capacity has reached the maximum: set threshold to Integer.MAX_VALUE so no expansion ever occurs again.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        
        // Update the size of newCap and newThr.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; 
    }
    
    // oldCap == 0 but oldThr > 0: the map was constructed with an initial
    // capacity, which the constructor stored in threshold; use it as the new capacity.
    else if (oldThr > 0)
        newCap = oldThr;
    
    // oldCap == 0 || oldTab == null
    else {               
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    
    // newThr is still 0 (the initial-capacity case above, or the doubling passed the limits): compute it from newCap
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE);
    }
    
    // Update the load value
    threshold = newThr;
    
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    
    // The rehash process
    ...
    
    return newTab;
}

3.4.1 Rehash in JDK 1.7

Reading directly about the rehash process in JDK 1.8 is a bit confusing, so let’s take a look at rehash in JDK 1.7. Overall, the two versions are not very different:

void transfer(Entry[] newTable) {
    Entry[] src = table;                   // SRC references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) { // Iterate over the old Entry array
        Entry<K,V> e = src[j];             // Get each element of the old Entry array
        if (e != null) {
            src[j] = null;                 // Release the object reference of the old Entry array (after the for loop, the old Entry array no longer references any objects)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);  // Recalculates the position of each element in the array
                e.next = newTable[i];                   / / head interpolation
                newTable[i] = e;                        // Put the element on the array
                e = next;                               // Access the element in the next Entry chain
            } while (e != null);
        }
    }
}

For ease of understanding, here is an example to illustrate the capacity expansion process:

Note: the put method in JDK 1.7 used head insertion for new nodes; JDK 1.8 uses tail insertion (see the source code above). Those interested can dig further into the JDK 1.7 put method.

Suppose our hash algorithm simply takes the key mod the table size. With size = 2, putting keys 3, 7 and 5 (JDK 1.7, head insertion) makes them all collide in table[1] after mod 2. Assume loadFactor = 1, that is, the map expands when the actual number of key-value pairs exceeds the table's threshold. The next step is to resize the hash bucket array to 4 and rehash all nodes.

3.4.2 Rehash in JDK 1.8

The rehash process in JDK 1.8 is similar to JDK 1.7 in that it optimises the repositioning of elements in the hash table:

Observe that we expand by a power of two, so each element ends up either at its original index or at the original index plus a power of two. Here n is the length of the table. Figure (a) shows how key1 and key2 determine their index positions before expansion; figure (b) shows the same after expansion, where hash1 is the result of the hash and high-order operations for key1.

After the table is expanded, n is doubled, so the mask n - 1 covers 1 more bit at the high end (shown in red), and the new index changes like this:

Therefore, when we expand the HashMap, we don't need to recalculate the hash as we did in JDK 1.7. We just need to look at the new high bit of the hash: if it is 0, the index stays the same; if it is 1, the index becomes the old index + oldCap.
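This index shortcut can be sketched as follows; newIndex() is an illustrative helper, not part of the HashMap API, and the sanity check compares it against a direct mask with the new capacity:

```java
public class RehashDemo {
    // After doubling from oldCap to 2 * oldCap, a node's new index depends only
    // on the bit (hash & oldCap): 0 keeps the old index, 1 moves the node to
    // oldIndex + oldCap. No full recomputation of the hash is needed.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[] {5, 21, 9, 57}) {
            // The shortcut matches a direct mask with the doubled capacity.
            System.out.println(hash + " -> " + newIndex(hash, oldCap)
                    + " (direct: " + (hash & (newCap - 1)) + ")");
        }
    }
}
```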

Now that we’ve seen the optimizations in JDK 1.8 over JDK 1.7, let’s take a look at the rehash process in JDK 1.8:

final Node<K,V>[] resize() {
    ...
    
    // The rehash process
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // Release the object reference of the old Node array (after the for loop, the old Node array no longer references any objects)
                oldTab[j] = null;
                
                // oldTab[j] has only one element, rehash directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Original index (head pointer and tail pointer)
                    Node<K,V> loHead = null, loTail = null;
                    // old index + oldCap (head and tail pointer)
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    
                    // The process of rehashing elements
                    do {
                        next = e.next;
                        
                        // Original index (the new high bit is 0)
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        
                        // old index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    
                    // Place the list in the appropriate position in the new table array
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

3.5 Thread safety of HashMap

In multithreaded usage scenarios, avoid using the non-thread-safe HashMap and use the thread-safe ConcurrentHashMap instead. So why is HashMap not thread safe? Here is an example of how using a HashMap concurrently from multiple threads can cause an infinite loop (for ease of understanding, the example still uses JDK 1.7):
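As a sketch of the recommended alternative, the same kind of two-thread put against a ConcurrentHashMap completes safely; the class and helper names here are mine:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    // Two threads put into a ConcurrentHashMap; unlike HashMap, concurrent
    // structural modification here is safe by design.
    static Map<Integer, String> fill() {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> map.put(7, "B"));
        Thread t2 = new Thread(() -> map.put(3, "A"));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill()); // both entries present, no corruption
    }
}
```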

public class HashMapInfiniteLoop {  
    private static HashMap<Integer, String> map = new HashMap<Integer, String>(2, 0.75f);  
    public static void main(String[] args) {  
        map.put(5, "C");  

        new Thread("Thread1") {  
            public void run() {  
                map.put(7, "B");  
                System.out.println(map);  
            };  
        }.start();  
        new Thread("Thread2") {  
            public void run() {  
                map.put(3, "A");  
                System.out.println(map);  
            };  
        }.start();  
    }
}

With loadFactor = 0.75 and an initial capacity of 2, threshold = 2 * 0.75 = 1, so the map needs to resize when the second key is put.

Using breakpoints, suspend both thread 1 and thread 2 at the first line of the transfer method. Note that at this point both threads have successfully added their data. Then let thread 1 run to the line Entry<K,V> next = e.next; and suspend it there; release thread 2's breakpoint and let it complete the resize. The result is shown below.

Since newTable is a local variable, each thread has its own expanded table array (the orange and purple boxes in the picture).

Note that since thread 1 has executed Entry<K,V> next = e.next;, e points to key(3) and next points to key(7), which, after thread 2's rehash, now belong to thread 2's rehashed linked list, because thread 2's newTable has been assigned to the shared member variable table of the HashMap.

Then the next part:

Thread 1 then continues: newTable[i] = e puts key(3) at index 3 of thread 1's table (the Node objects themselves are shared with thread 2's table). Then e = next makes e point to key(7), and next = e.next makes next point back to key(3).

When next points to key(3), e is key(7). After another loop, the result is as follows:

The dotted line also indicates that there is a reference to key(7), just to distinguish the table owned by Thread1 from the member variable table.

At this point, the values of e and next are updated: e is key(3) and next is null, so the next loop iteration is the last. In that iteration, e.next = newTable[i] makes key(3).next point to key(7), while key(7).next already points to key(3), and the circular list is formed. The result is shown below:

Thus, when a thread later calls map.get(11) (a key that hashes to this bucket), the tragedy occurs: an infinite loop.

If you have any questions about this part, please feel free to ask in the comments section

4 Summary

  1. Understand that the underlying implementation of a HashMap is an array of tables (hash tables) for nodes.
  2. Note that it is best to use immutable objects as keys when using a HashMap.
  3. Note that HashMap computes the hash of a key by XORing the high 16 bits of its hashCode into the low 16 bits.
  4. Learn about index positioning in HashMap: (n - 1) & hash.
  5. Collisions are resolved using separate chaining (the chained address method), and a bucket's linked list is converted to a red-black tree when the number of nodes exceeds 8. (New in JDK 1.8)
  6. JDK 1.7 uses head insertion for put and resize; JDK 1.8 uses tail insertion.
  7. The rehash process in JDK 1.8 does not recalculate the hash value of an element because there are only two possible positions for an element: the original position and the original position plus the length of the original hash table.
  8. Be aware of one of the pitfalls of using HashMap in a multi-threaded environment – creating a circular linked list.

5 Reference Reading

Java HashMap Working Principle and Implementation (ii)