Learn more about HashMap

Introduction to HashMap

A HashMap is a data structure commonly used in everyday development. In JDK 1.8, HashMap is implemented as an array + linked list + red-black tree. This article interprets the JDK 1.8 source code.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // ...
}

The basic Node is a singly linked list node, stored in the Node<K,V>[] table array. When certain conditions are met, a bucket's list is converted to a red-black tree to avoid inefficient queries on a long linked list.

Constructors

1. By default, the loadFactor field is set to DEFAULT_LOAD_FACTOR (0.75f). loadFactor determines threshold, which triggers expansion when the table reaches that fraction of its capacity. In general, 0.75f has proven to be a suitable value.

2. The constructor that takes a capacity computes the table size from the initialCapacity passed in:

/** Returns a power of two size for the given target capacity. */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

tableSizeFor sets all the low bits of the input to 1; adding 1 then yields a power of 2. The returned capacity is assigned to threshold.
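As a quick sanity check, the bit-smearing logic can be run standalone (a simplified copy for illustration, assuming MAXIMUM_CAPACITY = 1 << 30 as in the JDK source):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Simplified copy of HashMap.tableSizeFor: round up to the next power of two
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Note the `cap - 1` up front: without it, an input that is already a power of two (such as 16) would be doubled.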

put

The put method computes the hash of the key and then calls putVal to actually store the data.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

hash XORs the key's hashCode with its high 16 bits, so that the high bits also influence the bucket index, reducing hash collisions.
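The effect of this spreading step can be seen with two hashCodes that differ only in their high 16 bits (the values below are picked for illustration):

```java
public class HashSpreadDemo {
    // Copy of HashMap.hash: XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h1 = 0x10000; // two hashCodes that differ only above bit 16
        int h2 = 0x20000;
        int n = 16;       // a small table: only the low 4 bits pick the bucket
        // Without spreading, both land in bucket 0
        System.out.println((h1 & (n - 1)) == (h2 & (n - 1)));             // true
        // With spreading, the high bits reach the index and the collision disappears
        System.out.println((hash(h1) & (n - 1)) == (hash(h2) & (n - 1))); // false
    }
}
```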

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // Initialize the table on first use
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        // No data at this index; create a new Node via a factory method, easy to override
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p; // The first node has the same key as the inserted element
        else if (p instanceof TreeNode)
            // Try to insert into a red-black tree node
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        // The list is too long; try converting to a red-black tree
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Hook reserved for LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        // The expansion threshold has been reached
        resize();
    afterNodeInsertion(evict);
    return null;
}

The putVal method first checks whether the table is initialized; if not, it calls the resize method to initialize it.

The index is computed as (n - 1) & hash. If there is no data at that position in the table, a new node is created and stored there. If the node at that position is a red-black tree node, the data is inserted into the red-black tree. If there is data and the node is an ordinary list node, the list is traversed and the data is appended at its tail. If the list length reaches TREEIFY_THRESHOLD (8), conversion to a red-black tree is attempted. Take a look at the treeifyBin method:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the table length is smaller than MIN_TREEIFY_CAPACITY (64), prefer resizing to avoid too many collisions
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

You can see that the red-black tree conversion is performed only when the list length is >= 8 and the array length is >= 64. After putVal operates on the data, it calls some hook methods, which provide the basis for the LinkedHashMap implementation.
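The existing-key branch in putVal also explains put's return value: the old value is returned, and onlyIfAbsent (which backs putIfAbsent) controls whether it is overwritten. A quick check:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: no previous mapping
        System.out.println(map.put("a", 2));         // 1: old value returned, value overwritten
        System.out.println(map.putIfAbsent("a", 3)); // 2: existing value kept (onlyIfAbsent)
        System.out.println(map.get("a"));            // 2
    }
}
```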

resize

The resize method is one of the more important methods involved in storing data. Because the source code is relatively long, it is analyzed separately here:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the old capacity is greater than 0, the table has been initialized
    if (oldCap > 0) {
        // The array size has reached the upper limit
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // If expansion is possible, double the capacity
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        // The capacity was set in the constructor
        newCap = oldThr;
    else {
        // The uninitialized array gets the default size 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // The expansion threshold is defined as array capacity * loadFactor
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // After expansion, recompute the index of each old entry and store it
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // The bucket holds a single node; just recompute its index and store it
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Split the red-black tree node
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // This node keeps the same index as in the old array
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            // This node moves to old index + oldCap
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Reading on, the resize method evaluates the size of the old array and tries to expand it if it has already been initialized. If the array capacity has reached the upper limit, it does not expand further; otherwise it tries to double the previous size. If the constructor set a custom capacity, the new capacity is set to that capacity and the threshold is recalculated. If the default constructor was used, the array is initialized to size 16.
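The sizing rules for an already-initialized table can be condensed into a small sketch (a hypothetical helper mirroring the constants above, ignoring the MAXIMUM_CAPACITY edge cases; not the JDK method itself):

```java
public class ResizeSizingDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Simplified view of resize's sizing: double both the capacity and the threshold
    static int[] grow(int oldCap, int oldThr) {
        return new int[] { oldCap << 1, oldThr << 1 };
    }

    public static void main(String[] args) {
        int cap = DEFAULT_INITIAL_CAPACITY;
        int thr = (int)(DEFAULT_LOAD_FACTOR * cap); // 12: expansion triggers at 13 entries
        int[] grown = grow(cap, thr);
        System.out.println(grown[0]); // 32
        System.out.println(grown[1]); // 24
    }
}
```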

After expansion, instead of simply copying the data from the old array into the new one, each entry's index is computed again. If the node at an old index has no successors, it is stored directly at its new index. If it has successors, the processing becomes more interesting.

Why is the size of a table raised to a power of two?

This is a clever design in JDK 1.8. Because the capacity is a power of 2, in binary only its highest bit is 1; after subtracting 1, all the low bits become 1, and the index is hash & (n - 1). Take the initial table capacity 16 as an example. Suppose the hash of one key is 11001100 and the hash of another is 11011100; compute their indices:

int index0 = 0b11001100 & 0b1111; // hash & (n - 1) = 0b1100 = 12
int index1 = 0b11011100 & 0b1111; // hash & (n - 1) = 0b1100 = 12

In the old array, the two keys have different hash values but the same index, 12. After expansion the capacity becomes 32, and the indices become:

int index0 = 0b11001100 & 0b11111; // hash & (n - 1) = 0b01100 = 12
int index1 = 0b11011100 & 0b11111; // hash & (n - 1) = 0b11100 = 28

You can see that index0 stays at 12, while index1 changes to 28.

This is where the ingenuity of the power-of-two capacity shows. During expansion, we only need to check whether the hash bit at oldCap is 0 or 1 to determine the new index: if it is 0, the index is unchanged; if it is 1, the new index is the old index plus oldCap. This redistributes colliding entries from the original array while avoiding recomputing every index from scratch, which is really neat.
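The two hashes from the example above can be run through that single-bit test directly (values are the illustration hashes, not real key hashCodes):

```java
public class SplitIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int hash0 = 0b11001100;
        int hash1 = 0b11011100;
        // Both collide at index 12 in the old 16-slot table
        System.out.println(hash0 & (oldCap - 1)); // 12
        System.out.println(hash1 & (oldCap - 1)); // 12
        // The bit at oldCap decides the new index after doubling to 32
        System.out.println((hash0 & oldCap) == 0
                ? (hash0 & (oldCap - 1))              // bit is 0: index unchanged
                : (hash0 & (oldCap - 1)) + oldCap);   // 12
        System.out.println((hash1 & oldCap) == 0
                ? (hash1 & (oldCap - 1))
                : (hash1 & (oldCap - 1)) + oldCap);   // 28: bit is 1, index + oldCap
    }
}
```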

Split of TreeNode nodes

The nodes of a red-black tree are also split during expansion, and new trees are generated based on the same hash computation. The index logic is the same as the method described above.

if (loHead != null) {
    if (lc <= UNTREEIFY_THRESHOLD)
        tab[index] = loHead.untreeify(map);
    else {
        tab[index] = loHead;
        if (hiHead != null) // (else is already treeified)
            loHead.treeify(tab);
    }
}
if (hiHead != null) {
    if (hc <= UNTREEIFY_THRESHOLD)
        tab[index + bit] = hiHead.untreeify(map);
    else {
        tab[index + bit] = hiHead;
        if (loHead != null)
            hiHead.treeify(tab);
    }
}

After splitting the red-black tree, the node count of each new tree is checked. If a count is no more than UNTREEIFY_THRESHOLD (6), that tree is converted back into a linked list.

get

The get method is relatively simple; when implementing a data structure, insertion and deletion are generally the harder operations. Here is a brief introduction:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

The get method actually calls the getNode method to find the element:

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

getNode computes the array index from the hash and then looks for the matching node; since different keys can share the same hash, equals is called to match the key. For TreeNode red-black tree nodes, the binary-search-tree property of the red-black tree enables fast lookup. For ordinary linked list nodes, matching can only be done by traversing one by one. The introduction of red-black trees is a change in JDK 1.8 from previous releases.
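The equals fallback matters whenever distinct keys share a hash. "Aa" and "BB" are a well-known String pair with identical hashCodes, so they always land in the same bucket, yet lookups still resolve correctly:

```java
import java.util.HashMap;

public class CollisionGetDemo {
    public static void main(String[] args) {
        // Same hashCode (2112), different keys
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        // getNode falls back to equals, so each key finds its own value
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```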

remove

Remove is generally one of the harder methods to implement in a data structure. In HashMap, calling the remove method actually delegates to the removeNode method:

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // First try to find the node to delete
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // Delete according to the node type
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

The remove method of HashMap doesn’t look complicated. It first looks for the node and then deletes it if found. If the node is a linked list node, the deletion logic is simple: just change the references. In fact, HashMap’s more complicated insertion and deletion logic lives in the TreeNode red-black tree implementation, which I won’t go into detail on here.

Traversal

A HashMap can be traversed via keySet, values, or entrySet. The keySet method iterates directly over the keys, values iterates over the stored values, and entrySet retrieves the full information of each stored node.

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

By looking at the source code, we can see that the implementation of the next method for the iterators of these three methods is based on the nextNode method, so we only need to care about the implementation of this method.

final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    // Check for concurrent modification
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    // If the current node has no next list node, scan the array for the next non-empty bucket
    if ((next = (current = e).next) == null && (t = table) != null) {
        do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
}

The traversal method is also uncomplicated: it first checks for concurrent modification and then whether the current node is empty. If it is not empty, it tries to get the node's next element; if there is none, it scans the array for the next non-empty bucket.

Why does calling HashMap’s own remove method while traversing, instead of the iterator’s remove method, cause a ConcurrentModificationException? When the iterator is initialized, the current modCount of the HashMap is saved as expectedModCount, which is checked each time the next element is fetched. If HashMap’s remove method is called directly, modCount changes but the iterator’s expectedModCount does not, which is when the exception occurs. Calling the iterator’s remove method updates expectedModCount after the deletion, so there is no error.
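Both behaviors are easy to reproduce (a minimal sketch; class and variable names are illustrative):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;

public class IteratorRemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Removing through the map itself during iteration fails fast:
        // modCount changes, but the iterator's expectedModCount does not
        boolean threw = false;
        try {
            for (String key : map.keySet()) {
                map.remove(key);
            }
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println(threw); // true

        // The iterator's remove keeps expectedModCount in sync, so no exception
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        System.out.println(map.isEmpty()); // true
    }
}
```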

Performance analysis

The two most commonly used basic maps in the JDK are HashMap and TreeMap.

First, HashMap is implemented as an array + linked list + red-black tree. Using the hash function to compute the index, the root node in the array can be reached quickly. Short chains are stored as linked lists, which support efficient insertion and deletion; for long chains, red-black tree storage solves the problem of slow queries when a linked list grows too long.

TreeMap is implemented on top of a red-black tree. A red-black tree is a balanced binary search tree, so the stored keys must be comparable. Stored elements can be retrieved in sorted order through an in-order traversal. However, keeping the tree balanced costs extra work on every insertion and deletion.

Therefore, when we don’t need to care about the order of the stored elements, we can basically choose HashMap, and if we need to get sorted elements, we can choose TreeMap.
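The ordering difference is directly observable (a small illustration; the keys are arbitrary):

```java
import java.util.HashMap;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> tree = new TreeMap<>();
        tree.put("banana", 2);
        tree.put("apple", 1);
        tree.put("cherry", 3);
        // TreeMap: in-order traversal of the red-black tree yields sorted keys
        System.out.println(tree.keySet()); // [apple, banana, cherry]

        HashMap<String, Integer> hash = new HashMap<>(tree);
        // HashMap: iteration order follows bucket positions, so it is unspecified
        System.out.println(hash.keySet().size()); // 3
    }
}
```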

Thread safety

HashMap is not thread-safe; there are thread-safety issues when operating on a HashMap from different threads. If multi-threaded access is required, ConcurrentHashMap should be used instead. ConcurrentHashMap synchronizes write operations such as put and remove, while the get method takes no lock, so read performance is not affected.
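A quick demonstration of safe concurrent updates (a minimal sketch; the key name and counts are arbitrary):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                // merge is atomic on ConcurrentHashMap, so no update is lost
                map.merge("counter", 1, Integer::sum);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(map.get("counter")); // 2000
    }
}
```

The same loop on a plain HashMap could lose updates or corrupt the table, since its write path has no synchronization at all.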