
This article records the underlying data structure of Java HashMap and the implementation principles of its methods, based on JDK 1.8.

Underlying data structure

Java HashMap is implemented as a hash table (array + linked list before JDK 1.8; array + linked list or red-black tree since JDK 1.8), combining the advantages of arrays and linked lists:

  1. Array advantage: elements can be accessed directly by subscript, which is very fast;
  2. Linked list advantage: inserting or deleting data does not require moving elements, only updating node references, which is very efficient.


A HashMap stores its data internally in an array whose elements are of type Node<K,V>:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Node contains four fields: hash, key, value, and next, where next points to the next node in the list.

A HashMap computes the key's hash code with its hash method, then derives the array index with the formula (n - 1) & hash (where n is the array length). When two keys map to the same index, the entries at that index are stored as a linked list (a hash collision). To find an element in a linked list you must start at the head and walk the nodes one by one, an O(N) operation, so as the lists grow longer the HashMap becomes less and less efficient.
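A collision can be reproduced with two strings that are well known to share a hash code (a standalone sketch, not JDK source):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of two in HashMap
        // "Aa" and "BB" are a classic pair of strings with identical hashCodes
        int h1 = "Aa".hashCode();
        int h2 = "BB".hashCode();
        System.out.println(h1 == h2);     // true
        // Same hash, so (n - 1) & hash yields the same bucket index: a collision
        System.out.println((n - 1) & h1);
        System.out.println((n - 1) & h2);
    }
}
```

Both keys land in the same bucket, so in a real HashMap they would be chained together in one slot.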

To address this, JDK 1.8 switched the HashMap implementation to array + linked list + red-black tree. When a list reaches 8 elements (TREEIFY_THRESHOLD) and the array length is at least 64 (MIN_TREEIFY_CAPACITY), the list is converted into a red-black tree, bringing query time complexity down to O(logN).

Nodes in a red-black tree are represented by TreeNode:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
    ...
}

HashMap contains several important variables:

// The default initial array capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

// The maximum array capacity: 2 to the 30th power, i.e. 1073741824
static final int MAXIMUM_CAPACITY = 1 << 30;

// The default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// The list length threshold at which a list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// The length threshold at which a red-black tree is converted back to a list
static final int UNTREEIFY_THRESHOLD = 6;

// To convert a list to a red-black tree, the array capacity must be at least 64
static final int MIN_TREEIFY_CAPACITY = 64;

// The number of key-value pairs in the HashMap
transient int size;

// The resize threshold, computed as array capacity x load factor
int threshold;

// The array that stores the HashMap's data; the element type is Node<K,V>
transient Node<K,V>[] table;

// The load factor
final float loadFactor;

// The number of times this HashMap has been structurally modified.
// Used for fail-fast: since HashMap is not thread-safe, if another thread structurally
// modifies the map (e.g. a put or remove) while it is being iterated, a
// ConcurrentModificationException is thrown immediately
transient int modCount;

These fields matter when reading the source code below; pay particular attention to what the load factor is and why its default value is 0.75.

The load factor, also called the expansion factor, determines when the HashMap array is resized. For example, with an array capacity of 16 and a load factor of 0.75, the resize threshold is 16 * 0.75 = 12: once the HashMap holds 12 or more entries, the array is expanded. As we know, an array's size is fixed at creation, so expanding means creating a new array of the required size and copying the old values into it. Resizing is therefore a time-consuming process that affects application performance, and the load factor is a balance between space and performance:

  • When the load factor is large, the resize threshold rises, so resizing happens later and space utilization is higher; but the probability of hash collisions rises and lookup efficiency drops;
  • When the load factor is small, the resize threshold falls, so resizing happens earlier and space utilization is lower; but the probability of hash collisions drops and lookup efficiency rises.

Space utilization and performance are thus a trade-off, balanced by the load factor; 0.75 is an empirical value that compromises well between the two.
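The threshold arithmetic can be checked directly (a trivial sketch of the formula, not JDK code):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        // threshold = capacity * load factor, as computed in resize()
        int threshold = (int)(capacity * loadFactor);
        System.out.println(threshold); // 12
        // After one doubling of the capacity, the threshold doubles too
        System.out.println((int)(capacity * 2 * loadFactor)); // 24
    }
}
```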

In addition, the table field that stores the data is declared transient, which excludes it from default serialization. How, then, does a HashMap restore its data after deserialization? HashMap customizes serialization with its own readObject/writeObject methods, for two main reasons:

  1. The table is usually not full, i.e. its capacity is larger than the actual number of key-value pairs, so serializing the unused slots would waste time and space.
  2. If the key type does not override hashCode, Object's hashCode is used. That is a native method whose implementation can differ between JVMs, so the same key-value pair could land in different table slots under different JVMs, and deserializing the raw table could produce errors.

That is why, in HashMap, Hashtable, HashSet, LinkedHashMap, and similar classes, the fields used to store data are all transient and the classes define their own readObject/writeObject methods. This article will not analyze those methods; readers who are interested can study them on their own.

The put source code

The put method is as follows:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

The put method calculates the hash value of the key using the hash function. The source code of the hash function is as follows:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

If key is null, 0 is returned; otherwise the hash is computed as (h = key.hashCode()) ^ (h >>> 16). XORing the high 16 bits of hashCode into the low 16 bits is done for both performance and collision avoidance: it is cheap, and it prevents collisions caused by the high bits never participating in the index calculation.
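The effect is easy to see with two hand-picked hash codes that differ only in their high 16 bits (the values are hypothetical, chosen purely for illustration):

```java
public class SpreadDemo {
    // The same computation as HashMap.hash(), applied to a raw hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;       // table length
        int h1 = 0x10000; // differs from h2 only in the high bits
        int h2 = 0x20000;
        // Without spreading, the high bits are masked off and both collide at index 0
        System.out.println((n - 1) & h1); // 0
        System.out.println((n - 1) & h2); // 0
        // With spreading, the high bits influence the index and the collision disappears
        System.out.println((n - 1) & spread(h1)); // 1
        System.out.println((n - 1) & spread(h2)); // 2
    }
}
```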

The element is then inserted via putVal(hash(key), key, value, false, true):

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the array (hash table) is null or has a length of 0, initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Compute the array index from the key's hash: (n - 1) & hash
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The slot is empty, so create a Node and insert it
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node in the slot has the same key: remember it so the value can be overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The key is not in the first node and the slot holds a red-black tree: insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise traverse the linked list and insert at the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches TREEIFY_THRESHOLD, consider treeifying;
                    // treeifyBin resizes instead if the array is smaller than MIN_TREEIFY_CAPACITY
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The key already exists in the list: stop so its value can be overwritten
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            // Replace the old value and return it
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Structural modification count increments
    ++modCount;
    // Resize when the number of key-value pairs exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Summary of the put process:

  1. Check whether the HashMap array is empty and initialize it if so;

  2. Compute the array index of the key via (n - 1) & hash;

  3. If the target index is empty, create a Node and store it directly;

  4. If the target index is not empty, there are three cases:

    4.1. If the key is the same, the old value is overwritten;

    4.2. If the node is a red-black tree node, a red-black tree insertion is performed;

    4.3. If the node is a linked-list node, the list is traversed and the element is inserted at the tail; if the same key is encountered along the way, its value is overwritten directly. If the list length reaches TREEIFY_THRESHOLD and the array capacity is at least MIN_TREEIFY_CAPACITY, the list is converted into a red-black tree;

  5. If the number of elements in the HashMap is greater than or equal to threshold, resize.
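The overwrite behavior in steps 4.1 and 4.3 is observable from the caller's side: put returns the previous value, or null if the key was new:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1)); // null: the key did not exist yet
        System.out.println(map.put("a", 2)); // 1: old value returned, new value stored
        System.out.println(map.get("a"));    // 2
        System.out.println(map.size());      // 1: overwriting does not grow the map
    }
}
```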

The get source code

Compared with put, get is much simpler. Here is the source of the get operation:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Check that the array is non-null and non-empty and the target slot holds an element
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node in the slot is the element we want, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // If the first node has a successor
        if ((e = first.next) != null) {
            // If the slot holds a red-black tree, search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the linked list looking for the target element
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
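Note that getNode returning null is ambiguous at the API level: get yields null both for a missing key and for a key mapped to null, so containsKey is needed to tell the two apart:

```java
import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("k", null); // HashMap allows null values (and one null key)
        System.out.println(map.get("k"));              // null: the value is null
        System.out.println(map.get("absent"));         // null: the key is missing
        System.out.println(map.containsKey("k"));      // true
        System.out.println(map.containsKey("absent")); // false
    }
}
```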

The resize source code

The resize method is called both to initialize the array and to expand it. Here is the resize source code:

final Node<K,V>[] resize() {
    // The array before resizing
    Node<K,V>[] oldTab = table;
    // The capacity and threshold before resizing
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // The capacity and threshold of the new array
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at the maximum capacity: stop resizing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity, as long as it stays below MAXIMUM_CAPACITY
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // The table is empty but a constructor stored an initial capacity in threshold: use it
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // Otherwise use the defaults: capacity 16, threshold 16 * 0.75 = 12
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the new threshold has not been computed yet, derive it from the new capacity
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Make the new array the live table
    table = newTab;
    // If there was old data, copy it into the new table
    if (oldTab != null) {
        // Walk the old array and copy every non-empty slot into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: just recompute its slot and assign it
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Red-black tree: split the tree across the two slots
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Linked-list copy: the JDK 1.8 resize optimization
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Place the low list at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Place the high list at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

JDK 1.8 uses the high-order check e.hash & oldCap to decide whether an element needs to move: if the result is 0 the element keeps its index, otherwise it moves to the old index plus oldCap. There are two cases:

Case 1:

Before resizing: oldCap = 16, hash = 5, (n - 1) & hash = 15 & 5 = 5, hash & oldCap = 5 & 16 = 0;

After resizing: newCap = 32, hash = 5, (n - 1) & hash = 31 & 5 = 5, hash & oldCap = 5 & 16 = 0.

In this case the element's index is unchanged after resizing, and hash & oldCap == 0.

Case 2:

Before resizing: oldCap = 16, hash = 18, (n - 1) & hash = 15 & 18 = 2, hash & oldCap = 18 & 16 = 16;

After resizing: newCap = 32, hash = 18, (n - 1) & hash = 31 & 18 = 18, hash & oldCap = 18 & 16 = 16.

In this case the element's index after resizing is 18, i.e. the old index 2 plus 16 (oldCap), and hash & oldCap != 0.
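Both cases can be verified with the numbers above:

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;

        // Case 1: hash = 5 keeps its index after doubling
        int h1 = 5;
        System.out.println((oldCap - 1) & h1); // 5
        System.out.println((newCap - 1) & h1); // 5
        System.out.println(h1 & oldCap);       // 0: no move needed

        // Case 2: hash = 18 moves to old index + oldCap
        int h2 = 18;
        System.out.println((oldCap - 1) & h2); // 2
        System.out.println((newCap - 1) & h2); // 18 = 2 + 16
        System.out.println(h2 & oldCap);       // 16: non-zero, so it moves
    }
}
```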

Traversal principle

We typically traverse a HashMap in one of two ways:

HashMap<String, Object> map = new HashMap<>();
map.put("1", "a");
map.put("4", "d");
map.put("2", "b");
map.put("9", "i");
map.put("3", "c");

Set<Map.Entry<String, Object>> entries = map.entrySet();
for (Map.Entry<String, Object> entry : entries) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}

System.out.println("-------");

Set<String> keySet = map.keySet();
for (String key : keySet) {
    System.out.println(key + ": " + map.get(key));
}

Program output:

1: a
2: b
3: c
4: d
9: i
-------
1: a
2: b
3: c
4: d
9: i

From our earlier analysis of the put source we know that HashMap is unordered: the order in which elements are output generally differs from the order in which they were inserted. But if you run the program several times, you will find the output order is the same every time. So how does traversal work internally?

Traversal via entrySet or keySet works the same way internally; take entrySet as an example.

By looking at the corresponding class file, you can see that the following code will actually be converted to iterator traversal:

Set<Map.Entry<String, Object>> entries = map.entrySet();
for (Map.Entry<String, Object> entry : entries) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}

The enhanced for loop compiles to:

Set<Entry<String, Object>> entries = map.entrySet();
Iterator var3 = entries.iterator();

while (var3.hasNext()) {
    Entry<String, Object> entry = (Entry)var3.next();
    System.out.println((String)entry.getKey() + ": " + entry.getValue());
}

The key pieces of the traversal source are entrySet, the EntrySet iterator, and HashIterator's hasNext and nextNode methods:

public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    // entrySet starts out null and is created lazily via new EntrySet()
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}

final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    // EntrySet's iterator method creates an entry iterator via new EntryIterator()
    public final Iterator<Map.Entry<K,V>> iterator() {
        return new EntryIterator();
    }
    ...
}

// EntryIterator extends HashIterator. Calling EntryIterator's hasNext actually calls the
// superclass HashIterator's hasNext; calling EntryIterator's next calls HashIterator's
// nextNode method, so the interesting source is in HashIterator
final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // expected modCount, for fail-fast
    int index;             // current slot in the table

    HashIterator() {
        // Record the current modCount as the expected modCount. If another thread
        // structurally modifies this HashMap during iteration (e.g. put or remove),
        // modCount and expectedModCount will no longer match and traversal throws
        // ConcurrentModificationException
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        // Start from the head of the HashMap array
        index = 0;
        if (t != null && size > 0) { // advance to first entry
            // Scan forward through the array; stop at the first non-empty slot and store it in next.
            // So the first non-empty node is already located when the iterator is created
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        // There are more entries as long as next is not null
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount) // fail-fast check
            throw new ConcurrentModificationException();
        if (e == null)
            // Calling nextNode when next is null throws NoSuchElementException
            throw new NoSuchElementException();
        // The logic here covers two simple cases:
        // 1. The current node has no successor (a lone node, or the end of a list):
        //    scan the array for the next non-empty slot
        // 2. The current node has a successor: there is a list at this slot, so repeated
        //    calls to next keep walking that list
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
    ...
}

In short, traversing a HashMap means scanning the array from the start for non-empty slots. If a slot holds a linked list, the list is traversed; once it is exhausted, the scan continues to the next non-empty slot, and so on until the whole table has been visited.

So what if a slot holds a red-black tree: how is it traversed? In fact, when a linked list is converted to a red-black tree, the next field of each list node is preserved, so the next node can still be reached through the next field of a tree node.
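The fail-fast behavior enforced by HashIterator's expectedModCount check can be triggered deliberately:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                // A structural modification bumps modCount, so the iterator's next
                // call sees modCount != expectedModCount and throws
                map.remove("a");
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }
    }
}
```

To remove entries safely during iteration, use Iterator.remove instead, which updates expectedModCount after the removal.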

Major differences from JDK 1.7

Array element types are different

The array element type in JDK 1.8 is Node<K,V>; in JDK 1.7 it is Entry<K,V>:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    ...
}

It’s just a change of class name, there’s nothing fundamentally different.

The hash calculation rules are different

JDK1.7 hash calculation rules are as follows:

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Hash methods in JDK1.7 perform slightly worse than hash methods in JDK1.8.

Different put operations

JDK 1.7 does not use red-black trees; hash collisions are resolved with linked lists alone. And unlike JDK 1.8's tail insertion, JDK 1.7 inserts new nodes at the head of the list:

public V put(K key, V value) {
    // A null key is stored at index 0 of the table array
    if (key == null)
        return putForNullKey(value);
    // Compute the hash and the array index
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // Walk the list; if the key already exists, replace the old value and return it
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // Insert into the list
    addEntry(hash, key, value, i);
    return null;
}

private V putForNullKey(V value) {
    // Replace the old value if a null key already exists
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // Insert at index 0 of the array
    addEntry(0, null, value, 0);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Head insertion: the new entry becomes the head, and the old head becomes its next
    Entry<K, V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
    // Count and resize
    if (size++ >= threshold)
        resize(2 * table.length);
}

Different Capacity Expansion operations

In JDK 1.8, e.hash & oldCap decides whether an element moves during resizing: a result of 0 means the element stays at its index, while a non-zero result means it moves to the old index plus oldCap. JDK 1.7 instead recomputes each element's index (and possibly its hash), traversing each old list in order and inserting the nodes at the head of the new list, which reverses the list:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

/** Transfers all entries from current table to newTable. */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Head insertion into the new bucket
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
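The head insertion in transfer reverses each bucket's list; the effect can be sketched with a bare linked list (a standalone illustration, not JDK code):

```java
public class HeadInsertDemo {
    // Minimal stand-in for JDK 1.7's Entry: just a value and a next pointer
    static class Node {
        int val;
        Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    public static void main(String[] args) {
        // Old bucket: 1 -> 2 -> 3
        Node head = new Node(1, new Node(2, new Node(3, null)));
        Node newHead = null;
        // Traverse in order and head-insert into the new bucket, as transfer() does
        for (Node e = head; e != null; ) {
            Node next = e.next;
            e.next = newHead;
            newHead = e;
            e = next;
        }
        // New bucket: 3 -> 2 -> 1 (order reversed)
        StringBuilder sb = new StringBuilder();
        for (Node e = newHead; e != null; e = e.next) sb.append(e.val);
        System.out.println(sb); // 321
    }
}
```

JDK 1.8's lo/hi tail-building in resize avoids this reversal and preserves the original list order.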