Ten minutes for an in-depth look at the HashMap source code. Stick around one more minute and you'll get the hang of it!

Finally, we come to the complex HashMap. With its many internal fields, nested classes, and methods, it cannot be laid out flat the way ArrayList can, so we will approach it from several specific angles.

Bucket structure

Each storage slot of a HashMap is called a bucket. When a key-value pair enters the map, it is assigned to a bucket based on the hash value of its key.

Take a look at the definition: the table field is the so-called bucket structure, which is simply an array of nodes.

transient Node<K,V>[] table;
transient int size;

Node

A HashMap is a map structure that, unlike a Collection structure, stores key-value pairs rather than individual objects. Therefore, its most basic internal storage unit is the Node.

Nodes are defined as follows:

class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

A Node stores the key, the value, and the hash, as well as a next pointer, so that multiple nodes can form a singly linked list. This is one way to resolve hash collisions: if multiple nodes are assigned to the same bucket, they can be chained into a linked list.
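To see that chaining in action, here is a minimal sketch: a hypothetical key class (not from the JDK) whose hashCode is deliberately constant, so every instance lands in the same bucket, yet both entries remain individually retrievable via equals along the chain.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: all instances collide (same hashCode), but are
// distinguished by equals, so they chain inside one bucket.
final class CollidingKey {
    final String name;
    CollidingKey(String name) { this.name = name; }
    @Override public int hashCode() { return 42; } // force one bucket
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        map.put(new CollidingKey("a"), 1);
        map.put(new CollidingKey("b"), 2); // same bucket, linked via next
        System.out.println(map.get(new CollidingKey("a"))); // 1
        System.out.println(map.get(new CollidingKey("b"))); // 2
    }
}
```

Both lookups succeed because the bucket's list is walked and each node's key is compared with equals.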

Inside a HashMap there is another node type called TreeNode:

class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
 }

TreeNode extends Node (via LinkedHashMap.Entry) and can form a red-black tree. Why does this exist? As mentioned above, if many nodes hash into the same bucket, the linked list can grow very long, which dramatically reduces access efficiency. When a chain gets too long, HashMap converts it into a balanced red-black tree to recover some efficiency (if the keys implement Comparable, their natural ordering is used to order the tree). In practice, we hope this situation never occurs.

With that in mind, we can look at some of the constants associated with HashMap:

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
  • TREEIFY_THRESHOLD: when the number of nodes in a bucket reaches this threshold, the linked list may be converted into a tree;
  • UNTREEIFY_THRESHOLD: when the number of nodes in a bucket falls below this threshold, the tree is converted back into a linked list;
  • MIN_TREEIFY_CAPACITY: if the number of buckets is lower than this, the bucket array is expanded instead of converting the list to a tree;
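The interplay of these constants can be sketched as a small decision function. This is a simplification of what treeifyBin actually does, not the JDK code itself: a long chain is only treeified when the table is already reasonably large; otherwise the map prefers to resize, which splits the chain.

```java
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Simplified sketch of the choice made when a bucket's chain grows:
    // short chain -> keep the list; long chain in a small table -> resize;
    // long chain in a large table -> convert to a red-black tree.
    static String onLongChain(int chainLength, int tableCapacity) {
        if (chainLength < TREEIFY_THRESHOLD) return "keep list";
        if (tableCapacity < MIN_TREEIFY_CAPACITY) return "resize";
        return "treeify";
    }

    public static void main(String[] args) {
        System.out.println(onLongChain(8, 16));  // resize
        System.out.println(onLongChain(8, 64));  // treeify
        System.out.println(onLongChain(3, 64));  // keep list
    }
}
```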

Put method: Key&Value

Insertion interface:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The put method calls the private putVal method, but note that the hash it passes is not the key's hashCode used directly: the final hash = hashCode ^ (hashCode >>> 16).

When mapping the hash to a bucket index, only the low bits of the hash are used. Without this mixing step, a batch of keys that differ only in their high bits would all cluster into the same bucket. (This can happen when the number of buckets is small and the keys are, say, Floats holding consecutive integer values.)
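The Float example can be checked directly. The sketch below reimplements the spreading step and shows that for the bit patterns of 1.0f through 8.0f, the raw low bits pick a single bucket, while the mixed hash spreads the keys out (table size 256 here, chosen to make the effect visible):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSpread {
    // Same spreading step as HashMap.hash(): XOR the high 16 bits into the low 16.
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 256; // table size is a power of two; index = hash & (n - 1)
        Set<Integer> raw = new HashSet<>(), mixed = new HashSet<>();
        for (float f = 1f; f <= 8f; f++) {
            int h = Float.hashCode(f); // only the high bits vary for these values
            raw.add(h & (n - 1));        // bucket without spreading
            mixed.add(spread(h) & (n - 1)); // bucket with spreading
        }
        System.out.println(raw.size());   // 1: every key in the same bucket
        System.out.println(mixed.size()); // several distinct buckets
    }
}
```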

The procedure for performing an insert:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // code segment 1
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // code segment 2
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // code segment 3
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // code segment 4
            for (int binCount = 0; ; ++binCount) {
                // code segment 4.1
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // code segment 4.2
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // code segment 5
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // code segment 6
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
  • The first lines handle the case where the bucket array has not yet been allocated;
  • Code segment 1: i = (n - 1) & hash computes the bucket index for the hash. Since n is always a power of 2, this is an efficient modulo operation. If the slot is empty, a new Node is created and placed there; otherwise the node at the conflicting position is called p;
  • Code segment 2: if the key of node p equals the key passed in, the new value should be stored into an existing node, named e;
  • Code segment 3: if node p is a tree node, insert the key and value into the tree;
  • Code segment 4: p is the head of a list, possibly of just one node; either way, scan the list:
    • Code segment 4.1: if the end of the list is reached, insert a new node, converting the list into a tree if necessary;
    • Code segment 4.2: if an equal key is found in the list, proceed as in segment 2;
  • Code segment 5: store the new value into node e;
  • Code segment 6: if size exceeds the threshold, expand the bucket array. The resize strategy is described below.

The remove method

Now that you know how put works, remove is easy, so let's go straight to the private removeNode method.

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}
    
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // code segment 1
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // code segment 2
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // code segment 3
        else if ((e = p.next) != null) {
            // code segment 3.1
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // code segment 3.2
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // code segment 4
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // code segment 4.1
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // code segment 4.2
            else if (node == p)
                tab[index] = node.next;
            // code segment 4.3
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
  • Code segment 1: this condition checks whether the bucket corresponding to the hash is empty. If so, the map cannot contain the key. Otherwise, the first node in the bucket is called p;
  • Code segment 2: if the key of node p equals the parameter key, the node to remove has been found, denoted node;
  • Code segment 3: otherwise, scan the remaining nodes in the bucket:
    • Code segment 3.1: if the bucket holds a tree, perform the tree lookup logic;
    • Code segment 3.2: otherwise, perform the linked-list scan logic;
  • Code segment 4: if node was found, try to remove it:
    • Code segment 4.1: if it is a tree node, execute the tree deletion logic;
    • Code segment 4.2: node is the head of the list, so node.next becomes the new head of the bucket;
    • Code segment 4.3: otherwise, unlink node from the middle of the list.
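Segments 4.2 and 4.3 are just the two classic cases of deleting from a singly linked list. A self-contained sketch, using a stripped-down stand-in for HashMap.Node (key and next only, a hypothetical simplification):

```java
public class BucketRemove {
    // Minimal stand-in for HashMap.Node: just a key and a next pointer.
    static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Remove `key` from the chain starting at `head`; returns the new head.
    // Mirrors segment 4.2 (node is the head) and 4.3 (node is further down).
    static Node remove(Node head, int key) {
        if (head == null) return null;
        if (head.key == key) return head.next; // 4.2: next becomes the bucket head
        for (Node p = head; p.next != null; p = p.next) {
            if (p.next.key == key) {           // 4.3: bypass the node
                p.next = p.next.next;
                break;
            }
        }
        return head;
    }

    static String toStr(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) sb.append(n.key).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Node chain = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(toStr(remove(chain, 2))); // 1 3
        chain = new Node(1, new Node(2, null));
        System.out.println(toStr(remove(chain, 1))); // 2
    }
}
```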

Rehash

A rehash reallocates the bucket array and redistributes the existing nodes to their new bucket positions.

Let's start with two member variables related to the number of buckets:

final float loadFactor;
int threshold;
  • loadFactor: the upper limit on the ratio of entries to buckets. Once the map's load reaches this value, the bucket array must be expanded;
  • threshold: when the number of entries reaches this value, the bucket array is expanded. It is essentially capacity * loadFactor, cached so it need not be recomputed on every operation. For example, with the default capacity of 16 and load factor of 0.75, threshold is 12.

For the bucket sizing strategy, see the function below. If the desired capacity is cap, the capacity actually used is the smallest power of 2 that is at least cap. In other words, each expansion doubles the number of buckets.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
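To see the rounding in action, here is a standalone copy of tableSizeFor (MAXIMUM_CAPACITY is 1 << 30 in the JDK) with a few sample inputs:

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of tableSizeFor from above: the OR cascade smears the highest set
    // bit of (cap - 1) into every lower bit, so n + 1 is the next power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```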

This is the concrete expansion logic:

final Node<K,V>[] resize() {
    // ... computing newCap omitted ...
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // branch 1
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // branch 2
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // branch 3
                else { // preserve order
                    // ... list split logic omitted ...
                }
            }
        }
    }
    return newTab;
}
  • First, allocate a new bucket array;
  • Then scan the old buckets and migrate their elements over;
  • Branch 1: the bucket holds a single node; place it at its position in the new array;
  • Branch 2: the bucket holds a tree; perform the tree-splitting logic;
  • Branch 3: the bucket holds a linked list; perform the list-splitting logic.

Since the new bucket count is double the old one, each old bucket maps onto exactly two new buckets, and different old buckets never interfere with each other. So the migration logic above does not need to check whether a new bucket already contains nodes.
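The split can be verified with plain bit arithmetic. When capacity doubles from oldCap to newCap, the new index gains exactly one extra hash bit, so a node in old bucket j moves to either j or j + oldCap depending on the bit (hash & oldCap). A sketch with made-up hash values that all share old bucket 5:

```java
public class BucketSplit {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // All of these land in old bucket 5 (hash & 15), but differ in bit 16.
        int[] hashes = {5, 21, 37, 53};
        for (int h : hashes) {
            int oldIdx = h & (oldCap - 1);
            int newIdx = h & (newCap - 1);
            System.out.println(h + ": bucket " + oldIdx + " -> " + newIdx
                    + ((h & oldCap) == 0 ? " (bit 0: stays)" : " (bit 1: moves up)"));
        }
        // Output shows only two targets: 5 (stays) and 5 + 16 = 21 (moves up).
    }
}
```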

Rehash can be expensive, so it is better to set a proper capacity during initialization to avoid rehash.

Finally, although it is not shown in the code above, the number of buckets only ever increases during the lifetime of a HashMap; it never shrinks.

The iterator

At the heart of all the iterators is HashIterator:

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
}

For simplicity, only the nextNode part of the code is kept. The principle is simple: next points at the next node to return, and that node always lives in some bucket (tracked by index). If the same bucket holds further nodes, they can be reached through next, whether the bucket is a linked list or a tree; otherwise, the loop scans forward to the next non-empty bucket.
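The expectedModCount field in the code above is what makes HashMap iterators fail fast. A short demonstration through the public API: structurally modifying the map mid-iteration bumps modCount, and the next call to next() throws.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("c", 3); // structural change: modCount is incremented
        try {
            it.next();   // modCount != expectedModCount -> fail fast
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

Removing through the iterator itself (Iterator.remove) is the safe alternative, since it resynchronizes expectedModCount.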

With the node iterator above in place, all of the iterators visible to users are implemented on top of it:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

Views

Part of the KeySet code: this is not an independent Set, but a view whose methods internally access the HashMap's data.

final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
}

EntrySet and Values work the same way as KeySet and will not be described again.
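The view behavior is easy to observe through the public API: mutating the set returned by keySet() mutates the map itself, because KeySet.remove delegates to the map's removeNode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ViewDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Set<String> keys = map.keySet(); // a view, not a copy
        keys.remove("a");                // delegates to removeNode on the map
        System.out.println(map.containsKey("a")); // false
        System.out.println(map.size());           // 1
    }
}
```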

Points to summarize

1. Key and value are stored together in a node;
2. A node may be a linked-list node or a tree node;
3. Nodes are assigned to buckets according to the hash of their keys;
4. If there are multiple nodes in a bucket, they form either a linked list or a tree;
5. The load factor limits the ratio of nodes to buckets, expanding the number of buckets when necessary;
6. The number of buckets is always a power of 2; the process of redistributing nodes among buckets is called a rehash, and it is an expensive operation.

A final word