
Abstract

A hash table is a data structure that is accessed directly based on key values: it maps a key to a location in a table and stores the record there, which speeds up lookups. The mapping function is called a hash function, and the array that holds the records is called a hash table.

HashMap is the container Java programmers use most frequently for key-value data. In JDK1.7, HashMap stored data as an array + linked lists. JDK1.8 optimized the storage structure by introducing the red-black tree, greatly improving HashMap's access performance. Why were red-black trees introduced? Because even with a reasonable load factor and a well-designed hash function, HashMap cannot avoid long chains forming in a bucket's linked list; in extreme cases of severe hash collisions, access performance degrades badly. JDK1.8 therefore introduced the red-black tree and uses its properties to keep lookups in such buckets fast.

Introduction

Java defines an interface for mappings (key-value pairs): java.util.Map. Four of its implementation classes are used frequently in development: HashMap<K,V>, Hashtable<K,V>, LinkedHashMap<K,V>, and TreeMap<K,V>.

1. HashMap<K,V>
   1) Stores data according to the hashCode of the key
   2) Allows one entry with a null key and multiple entries with null values
   3) Not thread-safe
2. Hashtable<K,V>
   1) Thread-safe
   2) Low performance; use ConcurrentHashMap instead
3. LinkedHashMap<K,V>
   1) Iteration is ordered, preserving insertion order by default
   2) Can be switched to access order via a constructor parameter
4. TreeMap<K,V>
   1) An ordered Map; a comparator can be used to customize the sort order of the stored data
   2) Keys must implement the Comparable interface, or a custom Comparator must be passed to the constructor
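The characteristics listed above can be verified with a short, self-contained demo (the class name MapImplDemo is mine, for illustration only):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapImplDemo {
    public static void main(String[] args) {
        // HashMap: allows one null key and multiple null values; no ordering guarantee
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put(null, 1);
        hashMap.put("a", null);
        hashMap.put("b", null);
        System.out.println(hashMap.get(null)); // 1

        // Hashtable: thread-safe, but rejects null keys (and null values)
        Map<String, Integer> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, 1);
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }

        // LinkedHashMap: preserves insertion order by default
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("c", 3);
        linked.put("a", 1);
        linked.put("b", 2);
        System.out.println(linked.keySet()); // [c, a, b]

        // TreeMap: keys sorted by natural order (or by a supplied Comparator)
        Map<String, Integer> tree = new TreeMap<>(linked);
        System.out.println(tree.keySet()); // [a, b, c]
    }
}
```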

Storage structure

Since JDK1.8, HashMap stores data using an array + linked list / red-black tree.

Source code analysis

The trunk

// The trunk of a HashMap is a Node<K,V> array; each element holds a K-V key-value pair
transient Node<K,V>[] table;

Node element

// Node<K,V> is a static inner class of HashMap that implements Map's inner interface Map.Entry
static class Node<K,V> implements Map.Entry<K,V> {
    // Cache the hash value of the key to avoid recomputing it
    final int hash;
    // key
    final K key;
    // value
    V value;
    // Reference to the next node, forming a singly linked list
    Node<K,V> next;
    // ...
}

Other important fields

// The default initial capacity is 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// The maximum capacity: 1 shifted left by 30 bits
static final int MAXIMUM_CAPACITY = 1 << 30;
// The default load factor is 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// List length at which a bucket's linked list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Threshold at which a red-black tree is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum array length required before a list can be converted to a red-black tree
static final int MIN_TREEIFY_CAPACITY = 64;
// The actual number of K-V key-value pairs stored
transient int size;
// Counts structural modifications of the HashMap; since HashMap is not thread-safe,
// modCount supports the fail-fast mechanism of its iterators
transient int modCount;
// Resize threshold, by default 16 * 0.75 = 12; once size exceeds it, the capacity doubles to 32
int threshold;
// threshold = capacity * loadFactor; resizing consults the value of loadFactor
final float loadFactor;

The constructor

// Note: the table itself is not allocated here; memory is allocated lazily on first insertion
public HashMap(int initialCapacity, float loadFactor) {
    // Validate the initial capacity; if < 0, throw an exception
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    // Cap initialCapacity at the maximum, 1 << 30 = 2^30
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // Validate the load factor; if <= 0, or NaN (for which loadFactor != loadFactor), throw an exception
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    // Assign loadFactor
    this.loadFactor = loadFactor;
    // Set threshold to the power of two closest to (at least) initialCapacity via bit operations (very important here)
    this.threshold = tableSizeFor(initialCapacity);
}
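A quick sketch of how these constructor checks behave from the outside (the class name ConstructorDemo is mine):

```java
import java.util.HashMap;

public class ConstructorDemo {
    public static void main(String[] args) {
        // Valid: a capacity of 20 is rounded up internally to the next power of two (32)
        HashMap<String, Integer> map = new HashMap<>(20, 0.75f);
        map.put("k", 1);
        System.out.println(map.get("k")); // 1

        // An invalid capacity throws IllegalArgumentException
        try {
            new HashMap<String, Integer>(-1, 0.75f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // An invalid (NaN) load factor also throws IllegalArgumentException
        try {
            new HashMap<String, Integer>(16, Float.NaN);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```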

Implementation of the hash algorithm

The hash algorithm in HashMap works in three steps. In the second step, the high 16 bits of the hash code are XORed into the low 16 bits, so that even when the table array is relatively small, both the high and low bits take part in the index calculation; this spreads keys evenly without much performance cost. In the third step, when n is an integer power of 2, (n - 1) & hash is equivalent to taking the hash value modulo n, and the bit operation is more efficient than the modulo operation.

Step 1: Get the key's hashCode via key.hashCode();

Step 2: XOR the high 16 bits into the low 16 bits: (h = key.hashCode()) ^ (h >>> 16);

Step 3: Take the result modulo the table length via (n - 1) & hash to get the array index at which the node is inserted.
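The three steps can be reproduced outside of HashMap. The sketch below re-implements the same perturbation function (the class name HashDemo is mine) and checks the modulo equivalence of step 3:

```java
public class HashDemo {
    // Same perturbation as HashMap.hash(): XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;              // table length, a power of two
        String key = "hello";
        int h = hash(key);       // steps 1 and 2
        int index = (n - 1) & h; // step 3
        // For a power-of-two n, (n - 1) & h is the non-negative remainder of h modulo n
        System.out.println(index == Math.floorMod(h, n)); // true
    }
}
```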

Put method of HashMap

Step 1: Check whether the key-value pair array table is empty/null; if so, perform resize() to create it.

Step 2: Compute the array index i from the key's hash. If tab[i] == null, insert directly and go to step 6; if tab[i] != null, go to step 3.

Step 3: Check whether the first element of tab[i] matches the inserted key (same hashCode and equals); if so, overwrite its value. Otherwise, go to step 4.

Step 4: Check whether tab[i] is a red-black tree node (TreeNode). If so, insert the node into the red-black tree; otherwise, go to step 5.

Step 5: Traverse the list at tab[i] and check whether its length reaches 8. If so, it may be converted to a red-black tree (only when the array length is at least 64), and the node is then inserted into the tree; otherwise the node is appended to the linked list. If a key with the same hashCode that is equals to the inserted key already exists in the list, its value is replaced.

Step 6: After a successful insertion, check whether the size of the HashMap exceeds threshold; if so, resize.

put(K key, V value)

public V put(K key, V value) {
    // hash(key) computes a hash value based on the key
    return putVal(hash(key), key, value, false, true);
}

Computes the hash value of the key

static final int hash(Object key) {
    int h;
    // If key is null, return 0;
    // otherwise take the key's hashCode, shift h right by 16 bits (unsigned) to get the
    // high 16 bits, XOR the two values and return the result
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

putVal(int hash, K key, V value, boolean onlyIfAbsent,  boolean evict)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash computes the index of the inserted node in the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the first node has the same key, remember it so its value can be overwritten
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bin holds a red-black tree (RBT), insert the node the red-black tree way
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bin is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                // Reached the tail of the list: link the new node after the last one
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches 8, a red-black tree conversion may follow,
                    // depending on whether the array length is at least 64 (see treeifyBin())
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // A node with the same key already exists: stop scanning
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Otherwise advance to the next node and continue the loop
                p = e;
            }
        }
        // The key already exists; overwriting its value is not a structural change
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Record the structural change of the HashMap
    ++modCount;
    // After insertion, check whether the array needs to be resized
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
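The onlyIfAbsent parameter can be observed through the public API: put passes onlyIfAbsent = false and overwrites, while putIfAbsent passes true and leaves an existing value alone (the class name PutDemo is mine):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: no previous mapping
        System.out.println(map.put("a", 2));         // 1: old value returned, entry overwritten
        System.out.println(map.putIfAbsent("a", 3)); // 2: onlyIfAbsent = true, no overwrite
        System.out.println(map.get("a"));            // 2
    }
}
```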

Resizing

final Node<K,V>[] resize() {
    // Save the pre-resize array as oldTab
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the array length already reached the maximum, only raise the threshold
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Shift the array size left by one bit, i.e. double it
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // If the capacity is 0 but the threshold is positive (e.g. an initial capacity
    // was passed to the constructor), use oldThr as the new capacity
    else if (oldThr > 0)
        newCap = oldThr;
    // First initialization: use the default values
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the new resize threshold is still 0, compute one
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Publish the new resize threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Instantiate a new Node array with the new capacity
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Iterate over the old array and transfer its nodes to the new array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // If the bucket is not null, assign its head node to e
            if ((e = oldTab[j]) != null) {
                // Clear the old reference so GC can reclaim it
                oldTab[j] = null;
                // A single node: recompute its index as hash & (newCap - 1)
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A red-black tree bin: split it the red-black tree way
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // A linked-list bin: traverse the list and transfer it to the new array
                else {
                    // The lo chain stays at the original index j
                    Node<K,V> loHead = null, loTail = null;
                    // The hi chain moves to index j + oldCap
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // The key point: AND the node's hash with oldCap;
                        // if the result is 0, append the node to the lo chain
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise append it to the hi chain
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Index j of the new array points to the head of the lo chain
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Index j + oldCap of the new array points to the head of the hi chain
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

resize() in JDK1.8

(e.hash & oldCap) uses the hash value and the old array length to determine a node's position in the new array. When (e.hash & oldCap) == 0, the node keeps the same index in the new array as in the old one; when (e.hash & oldCap) != 0, its new index is j + oldCap, i.e. the original index plus the length of the old array.

Two questions are worth posing here, and we will analyze with them in mind.

Question 1: Why does this work?

Question 2: Nodes that used to share the same array index are now split between two new index positions. Does this break get(key) after the resize?
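For question 1, the arithmetic can be checked directly with two hash values chosen for illustration (the class name ResizeSplitDemo and the values are mine):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = oldCap << 1;
        // Two hashes that collide in the old table of size 16
        int h1 = 0b0_0101; // 5:  the bit that oldCap masks is 0
        int h2 = 0b1_0101; // 21: the bit that oldCap masks is 1
        int j = h1 & (oldCap - 1); // old index, 5 for both
        System.out.println(j == (h2 & (oldCap - 1)));          // true: same old bucket
        // The new index is decided solely by the extra bit that (newCap - 1) exposes,
        // which is exactly the bit tested by (hash & oldCap)
        System.out.println((h1 & (newCap - 1)) == j);          // true: stays at j
        System.out.println((h2 & (newCap - 1)) == j + oldCap); // true: moves to j + oldCap
    }
}
```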

Three prerequisites

1. The capacity of a HashMap is always an integer power of 2, which is guaranteed by tableSizeFor():

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
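To see what tableSizeFor() produces, the method can be copied into a standalone sketch (the class name TableSizeForDemo is mine): it returns the smallest power of two that is >= cap.

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor(): smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(1));  // 1
    }
}
```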

2. HashMap resizes via oldCap << 1, which always doubles the capacity, so the new capacity is still an integer power of 2.

3. HashMap computes an array index using (n-1) & hash

Suppose the initial array size is 16 and the load factor is 0.75. Three nodes with keys key1, key2 and key3 are inserted in turn, and after the third insertion the number of nodes in the array reaches 12.

At this point, assume the elements in the HashMap are distributed as shown in the figure below; size = 12 triggers a resize.

The resize split uses e.hash & oldCap. Suppose the calculation yields 0 for key2, so it stays at its old index in the array; key1 and key3 yield a non-zero result and must move to j + oldCap, i.e. index 1 + 16 = 17.

Node distribution after the resize

Now let’s look at the code that gets elements by key after expansion

public V get(Object key) {
    // Compute the hash value from the key, as described in detail above
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Proceed only if the array is non-empty and the bucket for this key holds a node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check whether the first node matches; if so, return it
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Not the head node; check whether a next node exists
        if ((e = first.next) != null) {
            // If the bin is a red-black tree, look the element up in the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the list; the loop ends when e has no next node
            do {
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // No match: return null
    return null;
}

After the resize, get(key1), get(key2) and get(key3) still find their elements. The reason is that n has doubled while each node's hash value is unchanged, so the index computed by (n - 1) & hash agrees exactly with where the node was moved during the resize (which can also be understood as recomputing the index in the new array). The split therefore does not break lookups, which answers question 2.

Summary

  • HashMap was greatly optimized in 1.8, both in the algorithm for computing hash values and in the change of the underlying data structure from array + linked list to array + linked list/red-black tree
  • HashMap is not thread-safe. If we need a thread-safe key-value store during production development, we can choose ConcurrentHashMap or Collections.synchronizedMap; ConcurrentHashMap is recommended
  • To preserve HashMap performance and reduce the cost of resizing, estimate the number of stored entries in advance and set a reasonable initial capacity and load factor when creating the HashMap
  • When using instances of a custom class as keys, we must override the key's hashCode and equals methods
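As an illustration of the last point, here is a minimal sketch of a custom key class (UserId is a hypothetical name) that overrides both methods; without both overrides, two "equal" keys could hash to different buckets and lookups would fail:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: equal ids must produce equal hash codes,
// so equals and hashCode are overridden together
public class UserId {
    private final String id;

    public UserId(String id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserId)) return false;
        return id.equals(((UserId) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }

    public static void main(String[] args) {
        Map<UserId, String> users = new HashMap<>();
        users.put(new UserId("42"), "Alice");
        // Works only because equals and hashCode are both overridden
        System.out.println(users.get(new UserId("42"))); // Alice
    }
}
```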