How does HashMap work in Java?

It’s based on hashing.

2 What is a hash?

A hash, in its simplest form, is a way of assigning a code to any variable/object by applying a formula/algorithm to its properties.

A true hash method must follow this principle:

A hash function must return the same hash code each time it is applied to the same or equal objects. In other words, two equal objects must consistently generate the same hash code.

All objects in Java inherit the default implementation of the hashCode() function defined in the Object class. This implementation typically generates a hash code by converting the internal address of the object to an integer, thereby producing a different hash code for each distinct object.
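
To see this contract in action, here is a minimal sketch (the Point class is hypothetical, not from the JDK): hashCode() is derived from the same fields that equals() compares, so two equal objects always produce the same hash code.

import java.util.Objects;

public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Derived from the same fields that equals() compares,
        // so equal Points always hash to the same code
        return Objects.hash(x, y);
    }
}

With this class, new Point(1, 2).hashCode() == new Point(1, 2).hashCode() always holds, which is exactly what HashMap relies on when locating keys.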

3 Node class in the HashMap

A Map is defined as an object that maps a key to a value.

Therefore, there must be some mechanism in the HashMap to store these key-value pairs, and there is: HashMap has an inner class, Node, like this:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash; // record the hash value so that it is not recalculated
    final K key;
    V value;
    Node<K,V> next;
    // ... the rest of the code
}

As expected, the Node class stores the key and the value as fields.

The key is marked final, and there are two other fields: next and hash.

In the following sections, we will understand the necessity of these attributes.

4 How are key-value pairs stored in a HashMap

The key-value pairs are stored in a HashMap as an array of Node inner classes, as shown below:

transient Node<K,V>[] table;

Once the hash code is computed, it is converted to an index into the array, and the key-value pair corresponding to that hash code is stored at that index. I won’t go into the hash collision details here.

The length of the array is always a power of two, and this is ensured by the following function:

static final int tableSizeFor(int cap) {
    // Subtract 1 first; otherwise, if the incoming cap is already a power
    // of two, the return value would be twice the expected size
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The principle is to set all the bits below the highest set bit of (cap - 1) to 1, and finally add 1, which yields the smallest power of 2 greater than or equal to cap as the array length.
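
As a worked example (with a hypothetical cap of 10), the bit-smearing proceeds as follows:

n = cap - 1 = 9     // 0000 1001
n |= n >>> 1;       // 0000 1101  (13)
n |= n >>> 2;       // 0000 1111  (15)
n |= n >>> 4;       // 0000 1111  (no further change)
return n + 1;       // 0001 0000  = 16, the smallest power of 2 >= 10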

Why would you use a power of two as the size of an array?

In HashMap, a hash function is involved in calculating the array index. The hash code computed from a key may be far larger than the size of the array, so what should we do?

The index could be obtained with a simple modulo operation, but that method is relatively inefficient. HashMap instead uses the following expression to ensure that the computed index is less than the size of the array:

(n - 1) & hash

And that’s exactly why we need a power of two for the size of the array. Since n is a power of 2, n-1 acts like a low order mask.

Through the AND operation, all the high-order bits of the hash are zeroed, keeping only the valid low-order bits, which guarantees that the resulting value is always less than n. At the same time, this makes it easy to recalculate the array index of each Node during the next resize() operation, as we’ll see later.

Take the default initial value of 16 as an example:

  01010011 00100101 01010100 00100101
& 00000000 00000000 00000000 00001111
-------------------------------------
  00000000 00000000 00000000 00000101 // = 5, always less than the array length n

However, because only the low-order bits take part in this calculation, hash collisions become correspondingly more severe. And that’s where the perturbation function comes in.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This function shifts the high 16 bits of the hash code to the right and XORs them into the low 16 bits, as illustrated below:

[Figure: XOR of the high 16 bits into the low 16 bits]

This keeps the high 16 bits the same while the low 16 bits change according to the XOR result. In the illustrated example, the computed array index changes from 5 to 0.

With the “perturbation function”, the probability of hash collisions decreases. Tests have shown that, although this perturbation function does not minimize the probability of hash collisions, the JDK uses it, hashing only once, because of its good balance between computational cost and collision probability.
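
A small, self-contained sketch (the hash code values are illustrative, not from any real class) shows the effect. Two hash codes that differ only in their high bits collide without perturbation, but land in different buckets with it:

public class PerturbDemo {
    // The same spreading as HashMap.hash(): XOR the high 16 bits into the low 16
    static int perturb(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;          // default capacity
        int h1 = 0x00010000; // low 16 bits are all zero
        int h2 = 0x00070000; // low 16 bits are all zero

        // Without perturbation, both keys land in bucket 0 (a collision)
        System.out.println((n - 1) & h1);          // 0
        System.out.println((n - 1) & h2);          // 0

        // With perturbation, the high bits now influence the index
        System.out.println((n - 1) & perturb(h1)); // 1
        System.out.println((n - 1) & perturb(h2)); // 7
    }
}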

5 Hash collision and its processing

In an ideal world, the hash function would map each key to a unique bucket; however, this is not possible. Even with a well-designed hash function, there will be hash collisions.

Many methods of resolving hash collisions have been studied; Wikipedia summarizes them into four categories.

[Figure: hash collision resolution strategies]

Java’s HashMap uses the first of these, separate chaining (often translated as the “zipper method”), combined with linked lists and red-black trees, to resolve collisions.

[Figure: HashMap structure in JDK 8]

In a HashMap, colliding entries are linked through the Node class’s member variable Node<K,V> next to form a linked list (when a bucket holds fewer than 8 nodes) or a red-black tree (a bucket with more than 8 nodes is treeified, and a tree shrinking below 6 nodes is converted back to a linked list), thus resolving collisions.

static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;
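
To make chaining visible, here is a minimal sketch (the BadKey class is hypothetical, written only to force collisions): every key hashes to the same bucket, so all entries end up in one chain, yet lookups still work because equals() distinguishes the keys.

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A deliberately terrible key: every instance has the same hash code,
    // so every entry collides into the same bucket
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override
        public int hashCode() { return 42; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 20; i++)
            map.put(new BadKey("key" + i), i);
        // All 20 entries share one bucket. Once the chain passes
        // TREEIFY_THRESHOLD (and the table has reached MIN_TREEIFY_CAPACITY = 64),
        // it is converted to a red-black tree, but lookups work either way.
        System.out.println(map.get(new BadKey("key7"))); // prints 7
    }
}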

6 HashMap initialization

public HashMap();
public HashMap(int initialCapacity);
public HashMap(Map<? extends K, ? extends V> m);
public HashMap(int initialCapacity, float loadFactor); 

There are four constructors in HashMap, most of which simply initialize the capacity and load factor. Take public HashMap(int initialCapacity, float loadFactor) as an example:

public HashMap(int initialCapacity, float loadFactor) {
    // initialCapacity must not be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    // cap initialCapacity at MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // the load factor must be positive and not NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

The capacity and load factor are initialized with this function. If other constructors are called, the corresponding load factor and capacity will use the default values (default load factor = 0.75, default capacity = 16).

At this point, the storage container table has not been initialized; initialization is delayed until first use.

7 Initialization and dynamic scaling of hash tables in a HashMap

The hash table refers to the following table variable, an array of the Node inner class:

transient Node<K,V>[] table;

As an array, it is initialized with a specified length. In actual use we may store more elements than that, so HashMap defines a threshold parameter: when the number of stored elements reaches the threshold, the table needs to be expanded.

I personally think that initialization is also a form of dynamic scaling, except that the scaling is from 0 to the value implied by the constructor (default 16), and no elements need to be re-hashed.

7.1 Conditions for capacity expansion

Initialization happens whenever the table is null or the array length is zero. Expansion is triggered when the number of elements becomes greater than the threshold.

threshold = loadFactor * capacity

For example, with the default loadFactor = 0.75 and capacity = 16 in HashMap:

threshold = loadFactor * capacity = 0.75 * 16 = 12

So when the number of elements is greater than 12, it expands. The capacity and threshold after expansion will also change accordingly.

The load factor affects the trigger threshold. When its value is small, there are fewer hash collisions in the HashMap and access performance is high, at the cost of more memory. When its value is large, there are more hash collisions and access performance is lower, with the advantage of requiring less memory. It is not recommended to change the default value; if you do, determine the new value through testing.
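
If you know roughly how many entries you will store, a common idiom (a sketch using the default load factor of 0.75; the entry count is hypothetical) is to pre-size the map so that filling it never triggers a resize:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        // Choose a capacity whose threshold (capacity * 0.75)
        // covers the expected number of entries
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++)
            map.put("key" + i, i); // no resize() along the way
    }
}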

7.2 More on power-of-two capacities and array index calculation

We said that the size of the array is always a power of two and that the array index is computed by the following code:

index = (table.length - 1) & hash

In addition to allowing the array index to be computed quickly, this method also makes it very easy to compute each node’s new index when re-hashing after an expansion. Since the new size of the array is twice the old size, the new mask has exactly one more significant bit than the old one, and that extra bit sits at the position of the old capacity’s single set bit. For example:

[Figure: index calculation before and after expansion]

Each node therefore either keeps its old index or moves to old index + oldCap, which allows the new index to be calculated very quickly.
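
A minimal sketch of this trick (the hash value is illustrative): since oldCap has a single set bit, e.hash & oldCap tests exactly the extra mask bit, so a node either stays at its old index j or moves to j + oldCap.

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;          // old capacity
        int newCap = oldCap << 1; // new capacity = 32
        int hash = 0b10101;       // a sample (already perturbed) hash = 21

        int oldIndex = (oldCap - 1) & hash; // 21 & 15 = 5
        // hash & oldCap isolates the one extra bit of the new mask
        int newIndex = ((hash & oldCap) == 0)
                ? oldIndex           // extra bit is 0: stays at 5
                : oldIndex + oldCap; // extra bit is 1: moves to 5 + 16 = 21
        System.out.println(oldIndex + " -> " + newIndex); // 5 -> 21
        // Identical to recomputing the index from scratch:
        System.out.println((newCap - 1) & hash);          // 21
    }
}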

7.3 Resize steps

  1. First, determine whether this is an initialization or an expansion; the calculation of newCap and newThr differs between the two.
  2. Calculate the capacity and threshold after expansion.
  3. Change the threshold of the HashMap to the expanded threshold.
  4. Create a new array based on the expanded capacity, and point the HashMap’s table reference to the new array.
  5. Copy the elements of the old array into the new table. Several cases are involved and must be handled separately (a single element, a general linked list, a red-black tree).

See the specific code:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity has reached the maximum, stop expanding
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity; if the old capacity was at least
        // the default, double the threshold as well
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // Old capacity and old threshold are both zero: use the default
        // capacity and threshold = DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Set the threshold of the HashMap to newThr
    threshold = newThr;
    // Create a new table of size newCap and point table at it
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If the old table is not empty, copy its elements into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // A single node: place it at index e.hash & (newCap - 1)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A red-black tree: split it between the two new buckets
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // A linked list: split it while preserving order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // The extra bit is 0: the node keeps its old index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // The extra bit is 1: the node moves to j + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

7.4 Precautions

While HashMap is well designed, you should trigger resize() as rarely as possible, since it can be a time-consuming process.

Also, HashMap does not automatically shrink. If your HashMap has grown to a large capacity but you then perform many remove operations, the capacity will not decrease. If you need to reduce the capacity, recreate the HashMap.
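
A sketch of that last point: the copy constructor sizes the new table for the current number of entries, so copying after mass removal reclaims the unused space.

import java.util.HashMap;
import java.util.Map;

public class ShrinkDemo {
    public static void main(String[] args) {
        Map<String, Integer> big = new HashMap<>(1 << 16); // grew very large
        big.put("survivor", 1);
        // ... imagine many put() and remove() calls; only a few entries remain
        // HashMap never shrinks its table, so recreate it to reclaim memory
        Map<String, Integer> compact = new HashMap<>(big);
        System.out.println(compact.size()); // 1, in a right-sized table
    }
}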

8 How does the HashMap.put() function work internally?

After using a HashMap many times, you can roughly describe how it adds an element: compute the hash value of the key, derive its position in the hash table from that hash, place the key-value pair at that position, and resolve any hash collision that occurs.

And here’s how it works as a diagram (I saved it a long time ago and forget where it came from):

[Figure: put() flowchart]

The source is as follows:

/**
 * @param hash         hash of the key
 * @param key          the key
 * @param value        the value to put
 * @param onlyIfAbsent if true, do not replace an existing value
 * @param evict        if false, the table is in creation mode
 * @return the old value if one was replaced, otherwise null
 *         (of course, the old value itself may be null)
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, call resize() to create it,
    // recording its length in the variable n
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash computes the bucket the key belongs to;
    // it is essentially hash % n
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The bucket is empty: create a new node and place it there
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node in the bucket has an equal hash and an equal key:
        // record it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree: put into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket is a linked list: walk it
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Treeify the bin once it reaches the threshold
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // An existing node already matches the key being put:
                // stop walking, no duplicate insertion
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // A node with an equal key and hash was found
        if (e != null) { // existing mapping for key
            // Record e's old value
            V oldValue = e.value;
            // Replace the old value unless onlyIfAbsent is true
            // and the old value is non-null
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e); // access callback
            return oldValue;
        }
    }
    // Record the structural modification
    ++modCount;
    // Resize if the new size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict); // insertion callback
    return null;
}

In this process, hash collision resolution is involved.

9 How does the HashMap.get() method work internally?

/**
 * Returns the value to which the specified key is mapped,
 * or null if there is no mapping.
 * get() works in three steps:
 * 1. Compute the hash of the key using the hash(Object key) method.
 * 2. Look up the node via the getNode(int hash, Object key) method.
 * 3. Return null if the node is null, otherwise return node.value.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

The end result is a call to the getNode function, whose logic is as follows:

[Figure: getNode() working logic]

The source is as follows:

/**
 * @param hash the hash of the key
 * @param key  the key
 * @return the matching node, or null if none is found
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is non-empty and the bucket for this hash has a first node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node in the bucket matches the given hash and key
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // The bucket is a red-black tree: search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise the bucket is a linked list:
            // traverse it until the key matches
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // The hash table is empty, or no node was found
    return null;
}
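
One consequence worth noting: get() returns null both when the key is absent and when the key is explicitly mapped to null, so use containsKey() to tell the two cases apart. A small sketch:

import java.util.HashMap;
import java.util.Map;

public class GetNullDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null); // a key explicitly mapped to null

        System.out.println(map.get("a"));         // null (mapped to null)
        System.out.println(map.get("b"));         // null (no mapping at all)
        System.out.println(map.containsKey("a")); // true
        System.out.println(map.containsKey("b")); // false
    }
}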

Author: A Jin’s Writing Desk

Reference: https://www.cnblogs.com/homejim/