The history of HashMap

HashMap first appeared in JDK 1.2 and changed little through JDK 1.7.

JDK 1.8, however, brought a major overhaul. The most significant change:

Up to and including JDK 1.7, the storage structure was array + linked list. Since JDK 1.8, it is array + linked list + red-black tree.

In addition, HashMap is not thread-safe: there is no guarantee of data consistency during concurrent insertions, deletions, or updates.

The red-black tree is a self-balancing binary search tree, which means lookups in it are fast: the lookup cost drops from the linked list's O(n) to O(log n).

HashMap terminology

Name             Meaning
initialCapacity  The initial capacity of the HashMap
loadFactor       The load factor
threshold        The maximum number of key-value pairs the current HashMap can hold; beyond it, the table must be expanded

Load factor

An important parameter of HashMap is the load factor, which reflects how heavily the bucket array is allowed to fill up.

By adjusting the load factor, HashMap can behave differently in terms of time and space complexity.

As the load factor is lowered, the number of key-value pairs the HashMap can hold before resizing becomes smaller.

When the key-value pairs are redistributed into the new, larger bucket array, the number of collisions between keys decreases and the linked lists become shorter. At that point, operations such as insertion, deletion, update, and lookup all become more efficient: a typical trade of space for time.

Conversely, increasing the load factor (it can even be greater than 1) raises the number of key-value pairs a HashMap can hold, giving higher space utilization but also a higher collision rate. The lists get longer and operations get slower: a trade of time for space.

How to tune the load factor depends on the use case. In general, the default is fine.
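Both knobs are set through the public constructors. A minimal sketch (the capacities and load factors here are arbitrary illustration values, not recommendations):

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Defaults: capacity 16, load factor 0.75.
        Map<String, Integer> byDefault = new HashMap<>();

        // Trade space for time: lower load factor, fewer collisions.
        Map<String, Integer> sparse = new HashMap<>(64, 0.5f);

        // Trade time for space: higher load factor, denser table.
        Map<String, Integer> dense = new HashMap<>(16, 1.0f);

        byDefault.put("a", 1);
        sparse.put("a", 1);
        dense.put("a", 1);
        System.out.println(byDefault.get("a") + " " + sparse.get("a")
                + " " + dense.get("a")); // 1 1 1
    }
}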

Node[] table

Node is an inner class of HashMap that implements the Map.Entry interface; it is essentially a key-value pair.

Node[] table has a default initial length of 16.

loadFactor is the load factor (0.75 by default).

threshold is the maximum number of nodes (key-value pairs) the HashMap can hold before it must be expanded:

threshold = length * loadFactor

That is, once the array length is fixed, a larger load factor means more key-value pairs can be held before resizing.
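Plugging in the defaults, as a quick sanity check (plain arithmetic, not JDK code):

public class ThresholdDemo {
    public static void main(String[] args) {
        int length = 16;          // default table length
        float loadFactor = 0.75f; // default load factor
        int threshold = (int) (length * loadFactor);
        // Prints 12: the insertion that pushes size past 12 triggers resize().
        System.out.println(threshold);
    }
}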

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Source code analysis

A method to initialize the threshold

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

To sum it up in one sentence: find the smallest power of 2 greater than or equal to cap.
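A quick check of that claim, with tableSizeFor copied verbatim from above and MAXIMUM_CAPACITY set to its JDK value of 1 << 30:

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}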

put(K key, V value)

   public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
That is, put actually delegates to putVal. The putVal method takes five arguments:

  1. hash: the hash value computed from the key
  2. key: the key we passed in
  3. value: the value we passed in
  4. onlyIfAbsent: if true, an existing value for the same key is not overwritten
  5. evict: if false, the table is in creation mode; it is normally true
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Part 1: if the table has not been initialized, resize() creates the
    // array, assigns the initial size, and recomputes the threshold.
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Part 2: the computed bucket is empty, so place the new node there.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // Part 3: hash collision.
        Node<K,V> e; K k;
        // 3a: the first node in the bucket already has this key;
        // remember it in e so the old value can be replaced below.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 3b: the bucket holds a red-black tree; insert via the tree.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // 3c: the bucket holds a linked list; walk it.
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Key not found: append a new node at the tail,
                    // then treeify if the list has grown long enough.
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Key already present in the list: e points at it; stop.
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If e is non-null, the key already existed in the map.
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Part 4: a new mapping was added; resize if size crossed the threshold.
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
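The parameters are easiest to see from the observable behavior. putIfAbsent is the public entry point that calls putVal with onlyIfAbsent = true (a quick behavioral check, not JDK internals):

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        System.out.println(m.put("k", 1));         // null: no previous mapping
        System.out.println(m.put("k", 2));         // 1: old value returned and replaced
        System.out.println(m.putIfAbsent("k", 3)); // 2: onlyIfAbsent, value kept
        System.out.println(m.get("k"));            // 2
    }
}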



From the code, the put method can be divided into three cases:

  • The table has not been initialized. The data is initialized
  • The table has been initialized, and the data in the position of the subscript is found to be empty through the hash algorithm, and the data is directly stored to the specified location
  • The table has been initialized, and the data at the position where the index is found by the hash algorithm is not empty. A hash conflict (collision) occurs. After the collision, the following operations will be performed:

    • If the inserted key equals the key of the node at the current position, point e at that node
    • If the data type in the bucket is TreeNode at this point, insert it using a red-black tree
    • If it is a linked list, traverse it in a loop. If the key is already present in the list, break out of the loop (the value may then be replaced). If it is not present, append a new node to the tail of the list. If the list length then exceeds TREEIFY_THRESHOLD and the table capacity has reached MIN_TREEIFY_CAPACITY, the list is converted to a red-black tree. (The smaller the table, the more likely hash collisions are; so while the capacity is below MIN_TREEIFY_CAPACITY, expansion is preferred, and otherwise the list is treeified.)

A little background before we get to resizing

How to calculate the Hash value?

static final int hash(Object key) {
     int h;
     return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

It XORs the hashCode with itself shifted right by 16 bits (unsigned).

In data structures, common ways to handle hash collisions include open addressing, rehashing, separate chaining, and a common overflow area. HashMap handles collisions with separate chaining.

The "chain" part is easy to guess: a linked list is used because insertion and deletion are frequent operations, and a linked list handles them efficiently.

initialCapacity (rounded up to the nearest power of two not less than what you pass in), loadFactor (0.75 by default, a value backed by a Poisson-distribution analysis in the JDK source).

/* ---------------- Public operations -------------- */

/**
 * Constructs an empty HashMap with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Constructs an empty HashMap with the specified initial
 * capacity and the default load factor (0.75).
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty HashMap with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
 * Constructs a new HashMap with the same mappings as the specified Map,
 * using the default load factor (0.75) and an initial capacity sufficient
 * to hold the mappings in the specified Map.
 *
 * @param m the map whose mappings are to be placed in this map
 * @throws NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

Expansion: resize()

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Part 1: the table was already initialized.
    if (oldCap > 0) {
        // Already at the maximum capacity: set threshold to Integer.MAX_VALUE
        // and give up on growing.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity via a shift; if the old capacity was
        // at least the default initial capacity, double the threshold too.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Part 2: not initialized, but HashMap(int) / HashMap(int, float)
    // stashed the initial capacity in threshold.
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Part 3: both table and threshold uninitialized: use the defaults.
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        // threshold = initialCapacity * loadFactor
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the threshold and install the new, larger bucket array.
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Because the table length changed, each node's bucket index may change,
    // so every old entry is remapped into the new table.
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // A single node: store it directly at its new index.
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A red-black tree: split it between the two new buckets.
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // A linked list: partition into a "lo" list (same index)
                    // and a "hi" list (index + oldCap).
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

The resize is divided into the following steps:

  • First, check whether the table has been initialized. If not, initialize it here; this handles the case where threshold and initialCapacity were never set (e.g., the no-argument constructor was used). If it has been initialized, double the capacity
  • After capacity expansion, create a new table and traverse all the data

    • If a bucket holds a single node, store it directly at its newly computed index
    • If a bucket holds a linked list, split the list in two with (hash & oldCap), regrouping the nodes by their new indices (as shown in the sketch after this list)
    • If a bucket holds a red-black tree, the tree has to be split
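A minimal sketch of that split decision (the capacity and hashes are made-up values): (e.hash & oldCap) tests the single index bit that becomes significant once the table doubles, so each node either stays put or moves by exactly oldCap:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;            // hypothetical old table capacity
        int[] hashes = {5, 21, 37}; // hashes that all mapped to bucket 5
        for (int h : hashes) {
            int newIndex = (h & oldCap) == 0
                    ? (h & (oldCap - 1))           // new bit is 0: same index
                    : (h & (oldCap - 1)) + oldCap; // new bit is 1: index + oldCap
            System.out.println(h + " -> " + newIndex); // 5 -> 5, 21 -> 21, 37 -> 5
        }
    }
}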

The get method: lookup

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Locate the bucket via (n - 1) & hash; if the first node's key
        // matches, return it directly.
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the bucket holds a red-black tree, search the tree.
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise it is a linked list: traverse it to find the key.
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The hash code

 static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

This code is called the perturbation function, which is also the hash operation in a HashMap, and is divided into the following steps:

  • key.hashCode() obtains the key's hashCode, an int that, if the method is not overridden, is typically derived from the object identity (memory address)
  • That hashCode is shifted right by 16 bits (unsigned) and XORed with the original hashCode. This mixes the high bits into the low bits so that both take part in the index calculation, distributing hash values more evenly (see the worked example below)
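A worked example with an arbitrary hashCode value (0x12345678 is just an illustration):

public class HashDemo {
    public static void main(String[] args) {
        int h = 0x12345678;          // hypothetical hashCode
        int spread = h ^ (h >>> 16); // mix the high 16 bits into the low 16
        System.out.printf("%08x%n", h >>> 16); // 00001234
        System.out.printf("%08x%n", spread);   // 1234444c
    }
}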

The modulo operation: (n - 1) & hash

In the HashMap code, you’ll see similar code in a number of places:

first = tab[(n - 1) & hash]

In hash algorithms, a modulo operation is commonly used to spread elements evenly across the table.

HashMap, however, does not use hash % n as its modulo operation; it uses (n - 1) & hash instead.

The reason is that, on a computer, the bitwise & is much cheaper than %.

Note that (n - 1) & hash is equivalent to hash % n only when the capacity n is a power of two, which is exactly why any initial capacity passed to HashMap is rounded up to a power of two by tableSizeFor(int cap). A genuinely clever piece of design.
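A quick equivalence check with assumed values:

public class ModDemo {
    public static void main(String[] args) {
        int n = 16;       // power-of-two capacity
        int hash = 12345;
        System.out.println(hash % n);       // 9
        System.out.println(hash & (n - 1)); // 9: same result, cheaper operation
    }
}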

For those who are interested, take a look at the article on the relationship between the % remainder and the & operation that arises from the HashMap hash algorithm.

The remove method

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // The first node in the bucket matches the key.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // Search the red-black tree, or walk the linked list.
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // If the node was found, unlink it.
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // A tree node: remove it via the red-black tree.
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // Head of the list: the bucket now points at the next node.
            else if (node == p)
                tab[index] = node.next;
            // Not the first node in the list: bypass it.
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Treeifying the linked list

/** The bin count threshold for treeifying a bin. */
static final int TREEIFY_THRESHOLD = 8;

/** The smallest table capacity for which bins may be treeified. */
static final int MIN_TREEIFY_CAPACITY = 64;

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the bucket array is smaller than MIN_TREEIFY_CAPACITY,
    // resize instead of treeifying.
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // Replace each Node in the list with a TreeNode, preserving order.
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Convert the TreeNode list into an actual red-black tree.
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

Treeification only happens when two conditions are both met (see the sketch after this list):

  1. The length of the linked list is greater than or equal to TREEIFY_THRESHOLD
  2. The bucket array capacity is greater than or equal to MIN_TREEIFY_CAPACITY
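Both conditions can be observed with a hypothetical stress test. BadKey is a made-up class whose constant hashCode forces every entry into the same bucket; the first treeify attempts trigger resizes instead, and once the capacity reaches 64 the bucket actually becomes a tree:

import java.util.HashMap;

public class TreeifyDemo {
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> m = new HashMap<>();
        for (int i = 0; i < 20; i++)
            m.put(new BadKey(i), i);
        System.out.println(m.size()); // 20: lookups still work via the tree
    }
}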

Red-black tree: converting back to a list, and splitting

Converting back to a linked list:

final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    // Walk the TreeNode chain and replace each TreeNode with an
    // ordinary Node, preserving order.
    for (Node<K,V> q = this; q != null; q = q.next) {
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}

Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
    return new Node<>(p.hash, p.key, p.value, next);
}

Splitting:

/** The bin count threshold for untreeifying a (split) bin. */
static final int UNTREEIFY_THRESHOLD = 6;

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    /*
     * TreeNodes still keep their next references, so the red-black
     * tree can be traversed like a linked list.
     */
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }
    if (loHead != null) {
        // Short enough: convert back to an ordinary linked list.
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    // Same logic for the hi list, which lands at index + bit.
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

When converting an ordinary linked list to a red-black tree, HashMap preserves the order of the original list through two extra references, next and prev.

Thanks to this, a red-black tree can be remapped during resize the same way a linked list is. This avoids having to convert the tree back to a list first and then remap it, quietly improving efficiency.

As you can see from the source code, the logic of remapping red-black trees is basically the same as the logic of remapping linked lists. The difference is that after remapping, the red-black tree is split into two linked lists made up of TreeNodes.

If the resulting list is no longer than UNTREEIFY_THRESHOLD, it is converted back to an ordinary linked list; otherwise, the TreeNode list is re-treeified.

The table field is declared transient

What this keyword does is easy to state.

In one sentence: mark a field that should not be serialized with transient, and the field is skipped when the object is serialized.
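A minimal illustration with a made-up class (not JDK code): the transient field silently resets to its default value after a serialization round trip.

import java.io.*;

public class TransientDemo implements Serializable {
    int kept = 1;
    transient int skipped = 2; // not written during serialization

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(new TransientDemo());
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            TransientDemo copy = (TransientDemo) in.readObject();
            System.out.println(copy.kept + " " + copy.skipped); // 1 0
        }
    }
}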

    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified.
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

Instead of using the default serialization mechanism, HashMap customizes the serialized content by implementing the readObject/writeObject methods.
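A round-trip sanity check (plain java.io; nothing HashMap-specific is assumed) showing that the map survives serialization even though table itself is transient:

import java.io.*;
import java.util.HashMap;

public class SerializeDemo {
    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(map);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            HashMap<String, Integer> copy =
                    (HashMap<String, Integer>) in.readObject();
            System.out.println(copy.get("a")); // 1: rebuilt by readObject
        }
    }
}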

There are two problems with serializing the table directly:

  1. In most cases the table is not full; serializing the unused slots wastes space
  2. The same key-value pair may land in different buckets under different JVMs, so deserializing the raw table under a different JVM could produce errors

The first step of the get/put/remove method of HashMap is to find the bucket position of the key according to the hash.

However, if the key's class does not override hashCode, computing the hash ends up calling Object's hashCode method. That method is native, and different JVMs may implement it differently and produce different hash values. The same key could therefore hash differently on different platforms, and continuing to operate on the same serialized table would then go wrong.

Conclusion

  • The underlying data structure of HashMap was array + linked list up to JDK 1.7; JDK 1.8 added the red-black tree. While a list is shorter than 8 nodes, a hash collision simply lengthens it. Once a list grows past 8 nodes, the array capacity is checked first: if it is below 64, the table is expanded instead (a smaller array is more prone to collisions, so when the capacity is small, expansion is the first resort); if the capacity is 64 or more, the list is converted to a red-black tree for efficiency.
  • The capacity of a HashMap is always a power of two; whatever initial capacity is passed in at construction time is rounded up to one. The reason is that this lets the & operator stand in for the % modulo in the index calculation, which greatly improves efficiency and helps reduce hash collisions.
  • HashMap is not thread-safe. Multi-threaded use can fail (e.g., the infinite loop during resize in 1.7; 1.8 fixed that loop, but the class is still unsafe). Use Hashtable or ConcurrentHashMap instead.

There is far more to HashMap than one article can cover, and plenty more for us to explore together.

Reference links:

HashMap source code analysis (JDK 1.8, guarantee you can understand) – Zhihu (zhihu.com)

Java 8 Series – HashMap – Zhihu (zhihu.com)

Spent a lot of time on this one; it is especially thorough: HashMap Source Detail Analysis (JDK 1.8) – SegmentFault

HashMap source code analysis — Get a HashMap interview in one article (juejin.cn)