Talk about the mechanism and principle of HashMap

This is the 15th day of my participation in the August More text Challenge. For details, see: August More Text Challenge

preface

HashMap is a must-know for Java programmers, the basics of the basics, the best of the best, the best of the best; Great oaks grow from little acorns, so it’s important to fully understand how HashMap works

This article focuses on HashMap related knowledge

1. Introduction of HashMap

A HashMap is a hash table that stores memory maps of key-value pairs
HashMap implements the Map interface and stores data according to the HashCode value of the Key. It has fast access speed and allows one value to be null
Hashmaps are unordered and thread-safe
HashMap inherits from AbstractMap and implements Map, Cloneable, and Serializable interfaces

2. Underlying data structure

As shown below, HashMap uses an array + linked list/red-black tree structure

First, the body of a HashMap is an array composed of linked lists, which in turn consist of nodes, which store specific keys and values, and which store information about the next Node

There are several important concepts here

HashMap always uses powers of 2 as array sizes. For example, the default initialization size of 2 to the fourth is 16, and the maximum allowed size is 2 to the 30th
When the data in the array is greater thanLoad factor * Array size ，put()Is expanded
When the list size is larger than 8 and the array size is larger than 64, the list is converted to a red-black tree structure
After JDK1.8, the insertion method of linked list is changed from head insert to tail insert

These concepts are discussed in detail below

3. Why array + Linked list

The array + list data structure is designed to resolve Hash collisions

To put(key, value) into a HashMap, determine where the key is stored in the array

First use methodhash(key)To get the hash value of the key
Then through(n -1) & hash(n is the size of the array) gives us a lower index, let’s call it a lower indexindex, the key-value pair will be stored in the index

In this case, different keys may produce the same index, which is called Hash conflict. In this case, we need to solve Hash conflict by using array + linked list

3.1 How Do I Resolve Hash Conflicts

When a HashMap encounters a Hash conflict, it determines whether the actual values of the two keys are equal. If they are equal, the Node is overwritten. If they are not equal, the Node is inserted at the end of the list

The inner Node class is also called Entry because it is integrated from Map.entry

static class Node<K.V> implements Map.Entry<K.V> // Member variablesfinal int hash;
final K key;
V value;
Node<K,V> next;
Copy the code

3.2 his head? The tail plug?

Since it is inserted into the list, there is the difference between the plug plug tail

In JDK1.7, the method of head insertion is adopted, and the elements added later will be placed at the head of the list, because the author thinks that the elements inserted later are more likely to be used, but this method brings the problem of dead chain

So in JDK1.8, the head insert method changed to the tail insert method

4. Conversion from linked list to red-black tree

After the number of lists is greater than the default threshold of 8, the treeifyBin() method is called to check whether the HashMap is converted to a red-black tree. Only if the array length is greater than or equal to 64 is converted to a red-black tree. If the value is less than 64, the resize() method is called to expand the size

HashMap inner class TreeNode, which is the data structure of a red-black tree

static final class TreeNode<K.V> extends LinkedHashMap.Entry<K.V> // Member variablesTreeNode<K.V> parent;  // red-black tree links
TreeNode<K,V> left;
TreeNode<K,V> right;
TreeNode<K,V> prev;    // needed to unlink next upon deletion
boolean red;
Copy the code

Several parameters associated with the transformation of a linked list into a red-black tree

// The size of the linked list threshold that triggers the judgment
static final int TREEIFY_THRESHOLD = 8;

// The minimum array size for the red-black tree conversion
static final int MIN_TREEIFY_CAPACITY = 64;
Copy the code

5. Expansion mechanism of HashMap

The capacity expansion mechanism, that is, the resize() method is invoked for capacity expansion, which is accompanied by a rehash allocation and traversal of all elements in the hash table, which is very time-consuming. In your programming, try to avoid triggering the expansion mechanism

If you have the Alibaba Development Protocol plug-in installed, you will be prompted when creating a HashMap

The goal is to let you set the appropriate size so that you don’t trigger the expansion mechanism

Parameters related to the capacity expansion mechanism:

The load factorloadFactor, whose default value is 0.75
The array sizecapacity, the default value is 16 when created
The critical valuethresholdIf the value exceeds this, the array is ready for expansioncapacity*loadFactorget
Number of stored nodessize

The following things were done during the expansion:

Determine what the current capacity is, initially null, first timeput()The default value is 16, or it can be whatever we want it to be. Note that if we give a value of 3, it will replace it with a power of 2 greater than 3, which is 4, so it is better to specify a power of 2
Determine whether the current capacity is greater than the maximum limit, if so, set itthreshold 为Integer.MAX_VALUE“, and does not trigger the expansion mechanism, you can go, too big to endure
If it doesn’t exceed the maximum, it expands to double the thresholdthresholdIt also doubled
Migrate the elements from the old HashMap to the newly created HashMap

You can combine the source code to see the expansion process

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null)?0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Determine whether the current capacity is greater than the maximum limit
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If the maximum value is not exceeded, the value is doubled. The threshold value is also doubled
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Perform the migration
    if(oldTab ! =null) {
        // Move each Node into a new array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if((e = oldTab[j]) ! =null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        / / the original indexes
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // The original index +oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                    // Put the original index into the bucket
                    if(loTail ! =null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the old index +oldCap into the bucket
                    if(hiTail ! =null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Copy the code

6. The Put mechanism of HashMap

HashMap allows the Put method to add elements to the user, but the putVal() method it calls is not open to the user

First, compute the Hash value of the key(n -1) & hashYou get the index that you’re going to insert
If the array is empty, callresize()Initialize
If there is no Hash conflict, just put in the index
If a Hash conflict occurs, determine if the Key values are really equal. If they are equal, override the Value of Value. If they are not equal, append to the list or the red-black tree
During the append process, if the size of the list exceeds 8, is calledtreeifyBin()Method to determine whether to convert to a red-black tree

conclusion

The principle of HashMap is not that difficult, but it is difficult to understand every detail. I hope this article will help you