
Preface

In the previous article on Java containers, some readers gave feedback that it packed in too much content and they ran out of patience before the end. After some thought, I decided to split HashMap out into its own article, since HashMap is so important. In this article we go through the HashMap-related points on their own.

Those who want to learn more about the Map system can go to this article: juejin.cn/post/708452…

HashMap

HashMap is an important implementation class of the Map interface. It is a hash table that stores key => value mappings. HashMap is not thread-safe. It allows null keys and null values: any number of values may be null, but at most one null key can be stored.
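A quick sketch of the null behavior just described, using only the standard Map API (the class name NullDemo is just for illustration):

import java.util.HashMap;
import java.util.Map;

public class NullDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");        // a null key is allowed...
        map.put(null, "second");       // ...but there is only one slot for it
        map.put("k", null);            // null values are allowed, any number of them
        System.out.println(map.get(null)); // second
        System.out.println(map.size());    // 2
    }
}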

Prior to JDK 1.8, the underlying data structure of HashMap was purely array + linked list. Arrays are fast to index but slow to insert and delete, while linked lists are fast to insert and delete but slow to search, so HashMap combines the two; and, for performance, it does not use synchronization locks. The array is the body of a HashMap, while the linked lists were introduced to resolve hash conflicts. The specific method used to resolve hash conflicts is the zipper method (separate chaining). We continue the discussion below.

The default array size of a HashMap is 16, and each array slot stores the head of a linked list. On expansion, the capacity doubles each time. A HashMap always keeps its hash table size a power of 2, which we'll discuss later. HashMap first perturbs the key's hashCode with the hash() function, then computes the element's index as (n - 1) & hash, where n is the length of the array. Because n is a power of two, this is equivalent to hash % n, and the bit operation is cheaper than the modulus operation. The hash() perturbation function was improved in JDK 1.8, but the principle is the same.
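For reference, this is the JDK 1.8 perturbation function: it XORs the high 16 bits of hashCode() into the low 16 bits, so the high bits still influence the index even though (n - 1) masks off everything above the low bits.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Note that a null key hashes to 0, which is why the null key always lands in bucket 0.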

To understand the zipper method more intuitively, I drew a diagram of HashMap's array + linked list storage structure.

Zipper method (separate chaining): combine arrays and linked lists. Each array slot holds a linked list, with the array element (element 1, element 2, etc.) storing the head of that list. Using the algorithm above, the key's hashCode is perturbed to determine the array index, and the entry is stored at that position. Different keys can produce the same index, so the zipper method stores the colliding entries in that slot's linked list. That chain is the "zipper".
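To make the zipper idea concrete, here is a minimal separate-chaining table of my own. It is a sketch, not the JDK code (real HashMap also handles null keys, resizing, and treeification, which this toy class does not):

// A toy "zipper" (separate chaining) hash table: an array of buckets,
// where colliding entries are chained into a singly linked list.
class ChainedTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = new Entry[16]; // power of two, like HashMap

    public void put(K key, V value) {
        int i = (buckets.length - 1) & key.hashCode();    // index, as described above
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { e.value = value; return; } // replace existing
        }
        buckets[i] = new Entry<>(key, value, buckets[i]); // chain a new head node
    }

    public V get(K key) {
        int i = (buckets.length - 1) & key.hashCode();
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next)
            if (e.key.equals(key)) return e.value;
        return null;                                      // not found
    }
}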

But there’s a problem with just using arrays and linked lists. Because in extreme cases, the keys of different values we add to the HashMap may compute the same hash value, so the list will be kept on one linked list, causing the list to become too long and the time complexity may deteriorate from O(1) to O(n). Because linked list queries need to be traversed from the beginning, long lists can cause a significant performance penalty, so after JDK1.8, the underlying data structure of HashMap was added with red-black trees, which better resolve hash conflicts and increase the time complexity from O(n) to O(logn). The specific method is as follows: When the length of the linked list is larger than the threshold (8 by default), the linked list is converted into a red-black tree to improve the search efficiency and prevent the search efficiency from being low due to the long linked list. However, if the array length is less than the default value of 64, HashMap will preferentially expand the array rather than convert it to a red-black tree.

Let’s start with the HashMap source code analysis.

Source code analysis

The code below is from JDK 1.8.

transient Node<K,V>[] table;

transient Set<Map.Entry<K,V>> entrySet;

It can be seen from the source that the underlying storage of a HashMap is the array Node<K,V>[] table, whose elements chain into linked lists (entrySet is merely a cached Set view of the entries, not the storage itself). Before JDK 1.8, the element type in the array was named Entry; JDK 1.8 renamed it to Node and introduced the tree structure.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // getKey(), getValue(), setValue(), equals(), hashCode(), toString() omitted
}

Entering Node, we can see that a Node consists of a hash, the key and value of the pair, and a reference to the next node, next. The array is divided into buckets (see the zipper diagram); the hash determines where a key-value pair is addressed in the array, and key-value pairs that land in the same bucket are stored as a linked list.

/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

You can see from this that a HashMap's array has a default initial capacity of 1 << 4, i.e. 16. Writing it as a shift makes it obvious that the capacity is a power of two.

/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

In JDK 1.8 we can see a static constant TREEIFY_THRESHOLD with a value of 8. As described above, when a bucket's list grows beyond 8 nodes, the HashMap "treeifies" it, converting the list into a red-black tree. Conversely, when removals shrink a (split) bin to 6 nodes or fewer, the red-black tree is converted back into a linked list.

Next, let’s explain why HashMap guarantees that the capacity of a hash table is a power of two.

/** Returns a power of two size for the given target capacity. */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

Perusing the source, we find the tableSizeFor() function. It first subtracts one from the cap (capacity) passed in (so that an exact power of two maps to itself), then ORs n with itself shifted right by 1 bit, then 2 bits, then 4, 8, and finally 16. Each step doubles the run of 1-bits below the highest set bit, so after the last step every bit below the highest set bit is 1 (1 + 2 + 4 + 8 + 16 = 31 covers all 32 bits of an int). Finally it judges: if n is negative, return 1; otherwise, if n reaches the maximum capacity, return the maximum; otherwise return n + 1, the next power of two. To show the result more clearly, I have expressed it in the following table.

Power of two | Binary  | Decimal
2^0          | 1       | (1-1)+1
2^1          | 10      | (2-1)+1
2^2          | 100     | (4-1)+1
2^3          | 1000    | (8-1)+1
2^4          | 10000   | (16-1)+1
2^5          | 100000  | (32-1)+1

It is clear from the table that the returned value must be a power of 2. And that explains how HashMap guarantees that the capacity of its hash table is a power of two.
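As a sanity check, here is a tiny harness around the same tableSizeFor() logic (the constant is copied from the JDK source; expected outputs are in the comments):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same logic as the source above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (the -1 keeps exact powers unchanged)
        System.out.println(tableSizeFor(17)); // 32
    }
}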

Continue to look at the source code. HashMap provides four constructors, as follows:


/** Constructor that specifies the capacity and load factor */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/** Constructor that specifies the capacity */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/** Default constructor */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
 * Constructor that copies another Map.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false); 
}

In the constructors we see a field of type float named loadFactor: the load factor of the HashMap. What is a load factor? It controls how densely the array is allowed to fill before resizing. The closer the load factor is to 1, the more entries the array holds before expanding (denser), so hash collisions become more likely and the linked lists grow longer. Conversely, the closer it is to 0, the fewer entries are stored in the array (sparser).

How much loadFactor is appropriate?

/** * The load factor used when none specified in constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

If loadFactor is too large, lookups suffer because the linked lists grow long; if it is too small, the array is underused and the data is scattered, wasting space. The official default of 0.75f is a reasonable trade-off arrived at by testing, so we generally do not need to change it.

So where does loadFactor come into play?

Let me give an example. With the default capacity of 16 and load factor of 0.75, the expansion threshold is 16 x 0.75 = 12. That is, once the number of entries exceeds 12, the table is considered full and resize() is called to expand the current capacity of 16.
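In code, the trigger looks like this minimal sketch (the loop bound is only for illustration):

// Default capacity 16, load factor 0.75, so threshold = 12.
Map<Integer, Integer> m = new HashMap<>(); // table is allocated lazily on first put
for (int i = 1; i <= 13; i++) {
    m.put(i, i); // the 13th put makes size 13 > threshold 12, triggering resize()
}
// The table has now doubled from 16 to 32, and the threshold from 12 to 24.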

Next we look at the put() method.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

You can see that put() delegates to putVal(). putVal() is an internal method (it also backs putIfAbsent()) and is not exposed to users. Let's step into putVal().

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is not initialized or its length is 0, expand the table.
    // The resize() method here not only performs capacity expansion, but also performs initialization.
    // Resize () is called only when an element is inserted.
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash, as described above
    // If the bucket is empty, the new node goes straight into the array slot, i.e. it becomes the list head.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The bucket is not empty
    else {
        Node<K,V> e; K k;
        // Compare the hash and key values of the first element in the bucket
        // If the value is equal, then e = p
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If not, check whether it is a red-black tree node
        else if (p instanceof TreeNode)
            // Yes: store nodes in a red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // is the case of linked lists
        else {
            // Loop through to the end of the list to insert nodes
            for (int binCount = 0; ; ++binCount) {
                // We have traversed to the last element
                if ((e = p.next) == null) {
                    // Insert a new node at the end
                    p.next = newNode(hash, key, value, null);
                    // As shown above, if the number of nodes in the linked list is greater than the default tree threshold of 8, it becomes a red-black tree structure
                    // Again, as noted above: treeifyBin only converts when the array length is at least 64 (MIN_TREEIFY_CAPACITY); otherwise it expands the array instead.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Complete the logic and jump out of the loop
                    break;
                }
                // Check whether the node key in the list is equal to the inserted element key
                // If it is equal, it will jump out of the loop, otherwise: p = e
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // The same key exists
            // Record existing old values
            V oldValue = e.value;
            // If onlyIfAbsent is set to false or oldValue is empty, replace it with the new value.
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // call back after the node is accessed, mainly for LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Number of changes + 1
    ++modCount;
    // Expand if the new size exceeds the threshold.
    if (++size > threshold)
        resize();
    // Callback after node insertion, mainly used for LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}

Read the code carefully with comments. I summarize the put logic as the following steps:

  1. If the HashMap is not initialized, resize() initializes it.
  2. Hash the element's key, then compute its array index from the hash.
  3. If no hash collision occurs, put the element directly into the bucket.
  4. If a hash collision occurs, append it to the bucket's linked list.
  5. If the list length exceeds the default threshold of 8, convert the list to a red-black tree.
  6. If the length later drops to the threshold 6, the red-black tree reverts to a linked list.
  7. If the key already exists, replace the old value (unless onlyIfAbsent is set; see the sketch after this list).
  8. If the table is full (capacity 16 x load factor 0.75), resize() it.
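The onlyIfAbsent flag in putVal() is what separates put() from putIfAbsent(), which matters for step 7; a quick demonstration:

Map<String, Integer> m = new HashMap<>();
m.put("a", 1);
System.out.println(m.put("a", 2));         // 1 (old value returned and replaced)
System.out.println(m.putIfAbsent("a", 3)); // 2 (existing value kept)
System.out.println(m.get("a"));            // 2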

Problems caused by HashMap expansion:

  • In a multithreaded environment, concurrent resizing is a race condition; before JDK 1.8 it could even link nodes into a cycle, after which a get() on that bucket spins in an infinite loop.
  • The resize method rehashes and traverses every element of the hash table, which is time-consuming. Avoid triggering resize in real development, for example by pre-sizing the map (see the sketch below).
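A minimal sketch of the usual mitigations, using only standard JDK classes:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    public static void main(String[] args) {
        // Thread-safe alternatives to a bare HashMap:
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        Map<String, Integer> synced = Collections.synchronizedMap(new HashMap<>());

        // Pre-sizing avoids repeated resize(): for ~100 expected entries with
        // load factor 0.75, request at least 100 / 0.75 = 134; tableSizeFor
        // rounds this up to 256, giving a threshold of 192 > 100.
        Map<String, Integer> presized = new HashMap<>(134);
    }
}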

We continue to analyze the source code of resize.

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            // If the value exceeds the maximum threshold, set it to the maximum value of the integer.
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the size
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // If the old threshold is greater than 0, the new capacity is initialized to oldThr
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // Otherwise, the new capacity is the default initial capacity of 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        // The new threshold is: Load factor x Default initial capacity, that is, 0.75 x 16 = 12
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move the elements in the bucket to the new bucket
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
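The loHead/hiHead split above works because of the power-of-two property mentioned in the javadoc: when the table doubles, a node either stays at its old index j or moves to j + oldCap, decided by the single bit (e.hash & oldCap). A quick check with concrete numbers:

int oldCap = 16, newCap = 32;
int hash = 53;                         // binary 110101
int oldIndex = hash & (oldCap - 1);    // 53 & 15 = 5
int newIndex = hash & (newCap - 1);    // 53 & 31 = 21, i.e. 5 + 16
boolean moves = (hash & oldCap) != 0;  // true: the extra bit is set, so j + oldCap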

Having analyzed the put() method and the resize() expansion method, we continue with the get() method.

public V get(Object key) {
    // Define a Node
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The elements in the array are equal
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Bucket contains more than one node
        if ((e = first.next) != null) {
            // Check whether it is a TreeNode node
            if (first instanceof TreeNode)
                // Get nodes by tree method
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // Get nodes by traversing the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // Return null if not found
    return null;
}
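One practical consequence of getNode(): it returns null both when the key is absent and when the key maps to a null value, so get() alone cannot tell the two apart; use containsKey() when the distinction matters:

Map<String, String> m = new HashMap<>();
m.put("k", null);
System.out.println(m.get("k"));              // null (value is null)
System.out.println(m.get("absent"));         // null (key not present)
System.out.println(m.containsKey("k"));      // true
System.out.println(m.containsKey("absent")); // false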

Conclusion

Type    | Data structure                                                                               | Thread-safe? | Null support                       | Thread-safety mechanism
HashMap | Before JDK 1.8: array + linked list; JDK 1.8 and later: array + linked list + red-black tree | No           | Null keys and null values allowed  | None