1. Foreword

Readers have probably already seen plenty of articles about the HashMap source code, and most of them jump straight from the source code to generalized "principles". Those so-called principles are usually just someone's personal summary after reading the source; the quality of such summaries varies widely, and without reading the source code themselves, readers find them hard to internalize.

While reading the HashMap source code, I briefly commented on every internal field, every internal method, and the call logic between methods, but turning that into an article took some thought: some fields can only be explained properly together with the methods that use them. My plan is to go from shallow to deep: first make readers familiar with the overall design of HashMap, then walk through the source code thoroughly, and finally explain its design details, so that readers can study the source code alongside the text and form their own understanding instead of relying on my generalizations.

No matter how deep or shallow your own understanding is, it is your own; no matter how profound someone else's understanding is, it is still theirs. I hope every reader comes away with their own insights and experience.

Java version

$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

2. HashMap official description

The following is a brief description of HashMap's features and caveats, taken from the official documentation; I have highlighted some of the important points. HashMap is a hash-table-based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and null keys. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
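
As a quick illustration of the null handling mentioned above, the following sketch (the class name NullKeyDemo is mine) stores one null key and one null value:

import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");   // a single null key is allowed
        map.put("key", null);                  // null values are allowed too

        System.out.println(map.get(null));     // value for null key
        System.out.println(map.get("key"));    // null
        System.out.println(map.size());        // 2
    }
}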

This implementation assumes the hash function disperses the elements properly among the buckets, and provides constant-time performance for the basic operations (get and put). The time required to iterate over the collection views is proportional to the "capacity" (number of buckets) of the HashMap instance plus its size (number of key-value mappings). So, if iteration performance is important, don't set the initial capacity too high (or the load factor too low).

An instance of HashMap has two parameters that affect its performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structures are rebuilt) so that it has roughly twice the number of buckets.

In general, the default load factor (0.75) offers a good trade-off between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost (reflected in most operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting the initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur.

If many mappings are to be stored in a HashMap instance, creating it with a large enough initial capacity allows the mappings to be stored more efficiently than performing automatic rehash operations on demand to increase the capacity of the table.
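
As a hedged sketch of this sizing advice: if the expected number of entries is known up front, an initial capacity of roughly expectedSize / loadFactor keeps the map below its threshold while it is filled (the helper newHashMapExpecting is my own illustration, not a JDK method):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Illustrative helper: pick an initial capacity large enough that
    // expectedSize entries never push the map past its resize threshold.
    static <K, V> Map<K, V> newHashMapExpecting(int expectedSize) {
        int initialCapacity = (int) Math.ceil(expectedSize / 0.75);
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = newHashMapExpecting(1000);
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);   // filling to the expected size triggers no resize
        }
        System.out.println(map.size());  // 1000
    }
}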

Note that this implementation is not synchronized. If multiple threads access the map concurrently, and at least one of them modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that the instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map, as shown below:

Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It would therefore be wrong to write a program that depends on this exception for its correctness: the fail-fast behavior should be used only to detect bugs.
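
Here is a minimal sketch of the fail-fast behavior (the class name FailFastDemo is mine; exactly when the exception surfaces depends on iteration order, which is why the behavior is best-effort only):

import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        try {
            for (String key : map.keySet()) {
                map.remove("a");   // structural modification outside the iterator
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e);
        }

        // Removing through the collection view's own removal methods
        // (which go through the iterator) is safe.
        map.values().removeIf(v -> v == 2);
        System.out.println(map);   // {}
    }
}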

This class is a member of the Java Collections Framework.

3. HashMap storage structure

Before studying the source code, it is best to have the storage structure of a HashMap in mind; otherwise you may get a little confused when the source code refers to parameters such as capacity and size. As you have probably heard, a HashMap contains an array named table whose elements are of type Node, and each Node stores a key-value pair.

Node class source code:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Node contains four fields. From the next field we can see that Node is a linked-list node: each position in the table array is treated as a bucket, and each bucket holds a linked list. HashMap resolves hash collisions with separate chaining (the chained-address method), which means that all nodes in the same list map to the same bucket index (the same result when the hash is reduced modulo the number of buckets). The storage structure of HashMap is shown below:

4. Static attributes of HashMap

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The default table size, 1<<4 = 2^4 = 16, must be a power of 2.

static final int MAXIMUM_CAPACITY = 1 << 30;

The maximum table size. It must also be a power of 2, and since the type is int, the largest possible value is 0100 0000 0000 0000 0000 0000 0000 0000: a Java int has 32 bits, and excluding the sign bit only 31 numeric bits remain, so 1 << 30 is the largest power of 2 that fits.

Note that powers of two are special in the binary world: their binary representation has exactly one bit set to 1 and all other bits set to 0. This property gives bitwise operations on them some useful effects, which we will see later.

static final float DEFAULT_LOAD_FACTOR = 0.75f;

The load factor to use when not specified in the constructor.

static final int TREEIFY_THRESHOLD = 8;

Many readers will know from the comment on this field that a list longer than 8 is converted into a red-black tree. But that is really only one of the conditions; the other condition is the parameter below.

static final int MIN_TREEIFY_CAPACITY = 64;

When converting a linked list to a red-black tree, the first step is to check whether the length of the list is greater than or equal to 8, and the second step is to check whether the capacity of the table array is smaller than this value. If it is smaller, the conversion to a red-black tree is cancelled and the table array is simply expanded instead.

static final int UNTREEIFY_THRESHOLD = 6;

When there are too many key-value pairs, the table array has to be expanded and each key-value pair is redistributed into a new bucket (or kept in place). A bucket's internal structure can be either a linked list or a red-black tree. After this redistribution, if a red-black tree ends up with too few key-value pairs, the bucket reverts to a linked list. How few is too few? Exactly the value of this field.

5. HashMap member attributes

transient Node<K,V>[] table;

The internal array of Nodes of the HashMap; the field is named table.

transient int modCount;

This field counts the number of structural modifications to the HashMap. It is used to detect whether the internal structure of the map has changed (for example, through a removal) while an iterator is accessing it.

transient Set<Map.Entry<K,V>> entrySet;

The EntrySet class is a simple view class that wraps the HashMap and provides convenient traversal, removal, and other operations.

Calling the entrySet() method of a HashMap returns an EntrySet instance. To avoid creating a new EntrySet object every time the method is called, this field caches the instance.
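
For context, here is a minimal example of how the entrySet view is normally used (the class name EntrySetDemo is mine); because the instance is cached in this field, repeated calls return the same object in the JDK 8 implementation discussed here:

import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // Traversal through the entrySet view.
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }

        // The cached view: the same instance is returned on every call.
        System.out.println(map.entrySet() == map.entrySet());   // true
    }
}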

transient int size;

The number of key-value pairs in the map.

int threshold;

The resize threshold: when size exceeds this value, the table must be expanded.

final float loadFactor;

The load factor. Being final, it is initialized in the constructor; if not specified, the default value is used.

6. HashMap constructor

HashMap provides three commonly used constructors; their source code is as follows:

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

Looking at the source code, you can see that loadFactor, being final, is initialized in every constructor.

The parameterized constructor mainly validates initialCapacity and loadFactor, and uses the tableSizeFor method to compute a reasonable initial capacity. At this point the table array has not been instantiated yet, so the computed initial capacity is temporarily stored in the threshold field for safekeeping.

7. The tableSizeFor(int cap) method

The static method tableSizeFor(int cap) computes a reasonable initial capacity: the smallest power of 2 that is greater than or equal to cap. The method uses bit operations to make this calculation efficient; its source code is as follows:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The value of cap is never negative. If cap is 0, then n is -1 and the (n < 0) check makes the method return 1. If cap is greater than 0, then n is greater than or equal to 0; when n = 0, shifting it right still yields 0, and OR-ing 0 with 0 is still 0, so the return statement computes n + 1 = 1, i.e. 2 to the power 0.

When n is greater than 0, its binary representation must contain at least one 1, i.e. it has the form 001xx..xx. Shifting n right by 1 bit gives 0001xx..xx, and OR-ing the two together yields 0011xx..xx, because 1 OR-ed with either 0 or 1 is 1. Repeating this with shifts of 2, 4, 8, and 16 bits eventually turns every bit below the highest 1 into 1.

Here’s an illustration:

At the end of the algorithm, n + 1 is therefore a power of 2. The reason for computing cap - 1 at the very beginning is to keep the result greater than or equal to cap without overshooting: if cap is already a power of 2, subtracting 1 first means the result is cap itself rather than the next power of 2.
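
To check the behavior, the method can be copied into a small standalone class and called with a few sample values (the class name TableSizeForDemo is mine; the method body is copied verbatim from the JDK 8 source shown above):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the JDK 8 HashMap source for demonstration.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1));    // 1
        System.out.println(tableSizeFor(10));   // 16
        System.out.println(tableSizeFor(16));   // 16  (an exact power of 2 is kept, thanks to cap - 1)
        System.out.println(tableSizeFor(17));   // 32
    }
}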

8. The hash(Object key) method

This method is the core static method of HashMap and is used to calculate the hash value of key. The source code for this method is as follows:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

If the key is null, the hash value is 0, so the bucket index is also 0: a null key is always stored in bucket 0.

When the key is not null, key.hashCode() is called and the high 16 bits of the hash code are XOR-ed into the low 16 bits.

This is because the table array uses a power of 2 as its capacity, and bucket subscripts are computed by (capacity-1) & hash, so hashcodes that change only in bits above the current capacity will always conflict.

For example, if the initial capacity is 16 and the hash value is 32 bits:

16 - 1 = 0000 0000 0000 0000 0000 0000 0000 1111
hash1  = 0000 0000 0000 0000 0000 0000 0001 1111
hash2  = 0000 0000 0000 0000 0000 0000 1111 1111

AND-ing any bit with 0 gives 0, so regardless of the hash value, every bit of the result except the last four must be 0. If two hash values differ only in the upper 32 - 4 = 28 bits, the computed index is the same and they end up in the same bucket, i.e. they always collide.

So the designers, trading off speed, utility, and the quality of bit-spreading, applied a transformation that XORs the upper 16 bits of the hash code onto the lower 16 bits, spreading the influence of the high bits downward.

Because many common sets of hash codes are already reasonably distributed (and so would not benefit from spreading), and because red-black trees are used to handle large numbers of collisions within a bucket, the designers simply XOR some shifted bits in the cheapest possible way. This keeps the cost low while still mixing in the highest bits, which would otherwise never take part in the index calculation because of the table size limit.
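
A small sketch makes the effect of this spreading visible (the class name HashSpreadDemo and the sample hash codes are mine): two hash codes that differ only in the upper 16 bits collide under plain masking, but land in different buckets once the high bits are XOR-ed down, as hash() does.

public class HashSpreadDemo {
    // The same spreading as HashMap.hash(Object key), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;
        int h1 = 0x00000001;   // differs from h2 only in the upper 16 bits
        int h2 = 0x00010001;

        // Without spreading, both land in bucket 1.
        System.out.println(h1 & (capacity - 1));          // 1
        System.out.println(h2 & (capacity - 1));          // 1

        // With spreading, the high bits now influence the bucket index.
        System.out.println(spread(h1) & (capacity - 1));  // 1
        System.out.println(spread(h2) & (capacity - 1));  // 0
    }
}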

9. Formula for calculating the bucket index

To compute the bucket index, HashMap first uses its internal hash() method to compute the hash value and then reduces that value modulo the number of buckets:

hash % capacity

If capacity is a power of 2, this operation can be converted into an efficient bitwise operation, which is exactly what the HashMap source code does:

(capacity - 1) & hash

This is one reason why the capacity of HashMap's internal array is always a power of 2. Another reason is that it lets the resize() method recalculate bucket indexes more efficiently.
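
The equivalence between the modulus and the bitwise AND is easy to verify with a short sketch (the class name BucketIndexDemo and the sample values are mine); it holds only because the capacity is a power of 2:

public class BucketIndexDemo {
    public static void main(String[] args) {
        int capacity = 16;              // a power of 2: only one bit set (1 0000)
        int hash = 12345;

        // For a power-of-2 capacity, masking with (capacity - 1) equals the modulus.
        System.out.println(hash % capacity);         // 9
        System.out.println(hash & (capacity - 1));   // 9

        // With a non-power-of-2 size the two operations are no longer equivalent.
        int odd = 15;
        System.out.println(hash % odd);              // 0
        System.out.println(hash & (odd - 1));        // 8
    }
}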

10. The put(K key, V value) method

This method associates the specified value with the specified key in the map. If the key already exists, the old value is overwritten and returned (it may be null); if there was no mapping for the key, null is returned. The source code for this method is as follows:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

As you can see, put() is really just a wrapper around putVal(). The putVal() method takes five arguments: the hash of the key, the key itself, the value, onlyIfAbsent, and evict. onlyIfAbsent indicates whether the old value should be kept (not overwritten) when the key already exists; put() passes false, so it always overwrites. The evict flag is passed to afterNodeInsertion(evict), which does nothing in HashMap and is only meaningful for LinkedHashMap.
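
To see what onlyIfAbsent means in practice, here is a minimal sketch (the class name PutDemo is mine): put() calls putVal() with onlyIfAbsent = false and overwrites, while putIfAbsent() calls it with true and keeps the existing value.

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("k", 1));          // null  (no previous mapping)
        System.out.println(map.put("k", 2));          // 1     (old value returned, then overwritten)

        // putIfAbsent() uses putVal(..., onlyIfAbsent = true, ...): the old value is kept.
        System.out.println(map.putIfAbsent("k", 3));  // 2
        System.out.println(map.get("k"));             // 2
    }
}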

The source code for this method is as follows:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Step 1: if the table is empty, create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Step 2: compute the bucket index; if there is no collision, put the node straight into the bucket
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // Step 3: a hash collision occurred
        Node<K,V> e; K k;
        // Check whether the colliding node is the head node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bucket's internal structure is a tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // If the bucket's internal structure is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // The key does not exist: append a new node to the end of the list
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches 8, try to treeify the bucket
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Step 4: the key already exists
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // Overwrite directly and return the old value
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Step 5: if the number of key-value pairs exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

11. The resize() method

This method initializes or increases the size of the table array.

If the table array is null, it is allocated according to the initial capacity held in the threshold field. Otherwise the table is doubled, and because capacities are powers of 2, the elements in each bucket either keep the same index or move by an offset equal to the old capacity in the new table.

For example, when the capacity is expanded from 16 to 32, the specific changes are as follows:

16 - 1 = 0000 0000 0000 0000 0000 0000 0000 1111
hash1  = 0000 0000 0000 0000 0000 0000 0000 1111
hash2  = 0000 0000 0000 0000 0000 0000 0001 1111

(16 - 1) & hash1 = 0000 0000 0000 0000 0000 0000 0000 1111
(16 - 1) & hash2 = 0000 0000 0000 0000 0000 0000 0000 1111

hash1 and hash2 produce the same bucket index, so they are placed in the same bucket. When the capacity is expanded to 32, the new bucket indexes are calculated as follows:

32 - 1 = 0000 0000 0000 0000 0000 0000 0001 1111
(32 - 1) & hash1 = 0000 0000 0000 0000 0000 0000 0000 1111
(32 - 1) & hash2 = 0000 0000 0000 0000 0000 0000 0001 1111

hash1 stays in the same bucket as before, while hash2's new index has one extra high bit set, 2^4 = 16, so it is offset by exactly the old capacity. As shown below:

Therefore, when a HashMap is resized, there is no need to recompute the hash; it is enough to check whether the newly significant bit of the hash (the bit corresponding to the old capacity) is 0 or 1. If it is 0 the index stays the same; if it is 1 the index becomes "old index + oldCap".

How is that new bit checked? HashMap uses the bitwise AND hash & oldCap: oldCap is a power of 2, so its binary representation has exactly one bit set, and that bit is precisely the newly significant one. This design is quite clever: it not only avoids recomputing the hash, but also, since the new bit can be regarded as randomly 0 or 1, the nodes that previously collided in one bucket are spread evenly across the old and new buckets during resizing.
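
The split check can be verified with a small standalone sketch (the class name ResizeSplitDemo and the sample hash values are mine): for every hash in a bucket, hash & oldCap is either 0 (the index stays the same) or oldCap (the index moves by oldCap).

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = 32;

        // Two hashes that shared bucket 15 when the capacity was 16.
        int hash1 = 0b0000_1111;   // 15
        int hash2 = 0b0001_1111;   // 31

        System.out.println(hash1 & (oldCap - 1));   // 15 - old bucket index
        System.out.println(hash2 & (oldCap - 1));   // 15 - same old bucket

        // The "new bit" check used by resize(): hash & oldCap.
        System.out.println(hash1 & oldCap);         // 0  -> stays at index 15
        System.out.println(hash2 & oldCap);         // 16 -> moves to index 15 + 16 = 31

        System.out.println(hash1 & (newCap - 1));   // 15
        System.out.println(hash2 & (newCap - 1));   // 31
    }
}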

The resize() method looks like this:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Step 1: decide between expansion and initialization based on oldCap
    if (oldCap > 0) {
        // If the maximum capacity has been reached, do not expand any further
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Step 2: initialization, when the initial capacity was saved in the threshold field
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Step 3: initialization, when threshold holds no initial capacity: use the defaults
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Step 4: compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Step 5: instantiate the new table array
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Step 6: move every bucket and its nodes into the new table array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // If the bucket has no hash collision, just recompute its index
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If the bucket's internal structure is a tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // If the bucket's internal structure is a linked list, colliding nodes
                // go either into the "low" bucket (old index) or the "high" bucket (old index + oldCap)
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Loop over the colliding nodes in the bucket
                    do {
                        next = e.next;
                        // New bit is 0: the node stays in the low bucket
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // New bit is 1: the node goes to the high bucket
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the low bucket at the same index in the new table
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the high bucket at index j + oldCap in the new table
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

12. The get(Object key) method

This method returns the value to which the specified key is mapped, or null if the map contains no mapping for the key.

A null return value does not necessarily mean the map contains no mapping for the key; it may also mean the key is mapped to null. To distinguish the two cases, call the containsKey() method.
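
Here is a minimal sketch of that distinction (the class name GetDemo is mine):

import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);                        // key present, value is null

        System.out.println(map.get("a"));          // null - mapped to null
        System.out.println(map.get("b"));          // null - no mapping at all

        System.out.println(map.containsKey("a"));  // true
        System.out.println(map.containsKey("b"));  // false
    }
}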

The source code for this method is as follows:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

The get() method simply delegates to getNode(), passing in the hash of the key. The source code of getNode() is as follows:

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Step 1: locate the bucket that would hold the key-value pair
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Step 2: always check the head node of the bucket first, since most buckets have no hash collision
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Step 3: if the bucket has hash collisions, traverse the nodes in the bucket
        if ((e = first.next) != null) {
            // If the bucket's internal structure is a tree...
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // If the bucket's internal structure is a linked list...
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

13. The treeifyBin(Node<K,V>[] tab, int hash) method

In the earlier discussion of put, when the length of a list reaches TREEIFY_THRESHOLD (8), the treeifyBin() method is called to change the bucket's nodes from Node to TreeNode, i.e. to turn the linked list into a red-black tree. Strictly speaking, though, that is not quite accurate: treeifyBin() first checks whether the current table array capacity is smaller than MIN_TREEIFY_CAPACITY (64), and if so it only resizes the table instead of converting the list.

The source code for this method is as follows:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the current table array is smaller than 64, just resize the table
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // Otherwise convert the linked list into a red-black tree
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // First replace every Node in the bucket with a TreeNode, preserving the list order
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            // This step actually converts the list of TreeNodes into a red-black tree
            hd.treeify(tab);
    }
}

14. The TreeNode.treeify(Node<K,V>[] tab) method

In addition to Node, a HashMap bucket may also hold TreeNodes. The TreeNode class contains a member method, treeify(), which builds a red-black tree out of the list of TreeNodes starting at the current node. The source code for this method is as follows:

final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    // Step 1: traverse the current list of TreeNodes
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        // Step 2: if the root has not been set yet, the first node becomes the root
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        }
        // Step 3: the root has already been set
        else {
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            // Step 4: starting from the root, find the insertion point for the new node
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                // Step 5: compare the current node's hash with the new node's hash
                // The new node's hash is smaller
                if ((ph = p.hash) > h)
                    dir = -1;
                // The new node's hash is larger
                else if (ph < h)
                    dir = 1;
                // The hashes are equal: if the new node's key does not implement Comparable,
                // or it does but compareTo() returns 0, fall back to tieBreakOrder()
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                // Step 6: if the chosen child (left for dir <= 0, right otherwise) is null, insert here
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    // Step 7: rebalance the red-black tree after the insertion
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    // Step 8: make sure the root node is also the first node of the bucket
    moveRootToFront(tab, root);
}