Foreword:

Hello, everyone! Today we are going to study HashMap, a must-ask topic in job interviews.

Mind map: study roadmap diagram

1. Introduction to the HashMap collection

HashMap is an implementation of the Map interface based on a hash table. It is a key-value store, that is, it is used to store key-value pairs. HashMap is not synchronized, which means it is not thread-safe. Keys and values may both be null (with at most one null key). In addition, the mappings in a HashMap are unordered.

Prior to JDK 1.8, a HashMap consisted of an array plus linked lists. The array is the body of the HashMap, and the linked lists exist mainly to resolve hash collisions (the "chaining", or "zipper", method).

JDK 1.8 made a major change to hash conflict resolution: when a list's length exceeds the treeify threshold (the boundary value, which defaults to 8) and the current array length is greater than 64, all the data at that index is stored in a red-black tree instead.

The array is filled with key-value instances, called Entry before JDK 1.8 and Node from JDK 1.8 on.

Key-value (Node) instance

When a pair is inserted, an index is calculated from the hash of the key. The index is computed as follows:

/**
 * Finding the index from a key:
 * 1. First compute the hash value from the key.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// 2. Then index = (n - 1) & hash, where n is the array length:
int hash = hash(key);
int index = (n - 1) & hash;

The hash algorithm here essentially has three steps: take the key's hashCode value, XOR it with its own high 16 bits, and mask the result with the array length (the (n - 1) & hash step, which plays the role of a modulus).
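The three steps above can be sketched as a small runnable demo (this re-implements the JDK 8 hash() logic for illustration; the key "A" and the table length 16 are just example inputs):

```java
public class HashDemo {
    // Re-implementation of JDK 8's HashMap.hash() for illustration
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                      // default table length
        int h = hash("A");               // "A".hashCode() is 65; XOR with its high bits leaves 65
        int index = (n - 1) & h;         // 15 & 65 = 1
        System.out.println(index);       // prints 1
        System.out.println(hash(null));  // a null key always hashes to 0, hence index 0
    }
}
```

Note how masking with (n - 1) keeps the index inside the array bounds without an expensive modulus.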

For example, suppose we call put("A", value) and the index computed for key "A" happens to be 3:

Insert node with index 3

So what’s a linked list in a HashMap for?

We all know that an array's length is finite. When a hash function maps keys into that finite range, different keys can produce the same index (a hash collision); no hash function can avoid this entirely. Take the element above whose key is "A": if we insert another element whose key also hashes to index 3, the two collide. This is where the linked list comes in, and this collision-resolution strategy is called chaining (the "zipper" method).

Insert different elements in the same place
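A collision is easy to reproduce: "Aa" and "BB" are a well-known pair of distinct Java Strings with the same hashCode, so in a HashMap they land in the same bucket and get chained:

```java
import java.util.HashMap;

public class CollisionDemo {
    public static void main(String[] args) {
        // Distinct strings, identical hashCode: a guaranteed hash collision
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // HashMap still keeps both entries: same bucket, chained by the linked list
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size());      // 2
        System.out.println(map.get("Aa"));   // 1
    }
}
```

Equal hashes do not imply equal keys, which is exactly why the chain (and equals()) is needed.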

When the list length is greater than the threshold 8 and the array length is greater than 64, the list is turned into a red-black tree.

Supplement:

Before converting a list to a red-black tree, a check is made: if the list length exceeds the threshold of 8 but the array length is less than 64, the list is not converted to a red-black tree. Array expansion is chosen instead.

The whole point of doing that is: when the array is small, treeifying should be avoided, because converting to a red-black tree is relatively expensive; a red-black tree needs left rotations, right rotations, and color changes to maintain balance. At the same time, when the array length is less than 64, searching is still reasonably fast. So, to improve performance and reduce search time, a linked list is converted to a red-black tree only when the list length exceeds 8 and the array length exceeds 64. For details, see the treeifyBin method.

Of course, although adding the red-black tree as an underlying data structure makes the structure more complex, once the list length exceeds 8 and the array length exceeds 64, converting the linked list to a red-black tree makes lookups more efficient.

Features:

  1. Access is unordered

  2. Both key and value may be null, but at most one null key is allowed

  3. Keys are unique; the underlying data structure enforces key uniqueness

  4. Array + linked list + red-black tree

  5. A list is converted to a red-black tree only when the list length exceeds the threshold (boundary value) of 8 and the array length exceeds 64; the red-black tree exists to make queries efficient
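Features 1 and 2 can be verified directly (a minimal sketch; the keys and values are arbitrary examples):

```java
import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second");   // at most one null key: the value is overwritten
        map.put("k", null);        // null values are allowed for any key
        System.out.println(map.get(null)); // second
        System.out.println(map.size());    // 2, not 3
    }
}
```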

2. HashMap underlying data structure

2.1. Process of storing data in HashMap


Each Node contains the key, the value, the hash computed from the key, and a reference to the next Node (next == null if there is no next node).

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        // ...
}

A HashMap stores data through the put() method, which will be explained in detail in the next section.

public static void main(String[] args) {
        HashMap<String, Integer> hmap = new HashMap<>();
        hmap.put("Spot", 55);
        hmap.put("Mirror", 63);
        hmap.put("Zone", 25);
        hmap.put("Skunk", 9);
        hmap.put("Sasuke", 43);
        hmap.put("Spot", 88);
        System.out.println(hmap);
}

When a HashMap collection object is created, prior to JDK 1.8 the constructor created an Entry[] table of length 16 to store key-value pairs. Since JDK 1.8, the array is no longer created in the constructor; instead, a Node[] table is created the first time the put method is called, and it stores the key-value pairs.

Say we store the pair "Spot"-55 in the hash table. A hash is computed from the key ("Spot") by calling the hashCode() method overridden in the String class. Then the index of the slot in the Node array is computed from the array length via the masking operation (n - 1) & hash. If the slot at the computed index holds no data, the "Spot"-55 pair is stored there directly, just like the insertion shown above.

Now suppose we insert another element whose key hashes to the same slot: hashCode() is called on its key and, combined with the array length, yields the same index as the existing element. A hash collision has occurred.

In that case, the new key is compared with the existing node's key using equals(). If the keys are equal, the existing value is overwritten with the new value. If they are not equal, the key is compared against the keys of the other nodes in the chain; if none is equal, a new node is appended to store the data.

The key values of two nodes are compared to determine whether they are overwritten
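The overwrite behavior is visible in put()'s return value, reusing the "Spot" key from the example above (55 first, then 88):

```java
import java.util.HashMap;

public class OverwriteDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> hmap = new HashMap<>();
        System.out.println(hmap.put("Spot", 55)); // null: no previous mapping for the key
        System.out.println(hmap.put("Spot", 88)); // 55: the old value is returned, then overwritten
        System.out.println(hmap.get("Spot"));     // 88
        System.out.println(hmap.size());          // 1: still a single entry
    }
}
```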

2.2. Hashing collision related issues

What algorithm is used at the bottom of the hash table to compute the hash value? What other algorithms can compute hash values?

The underlying index is calculated from the key's hashCode() value combined with the array length, using unsigned right shift (>>>), bitwise XOR (^), and bitwise AND (&).

Alternatives include the mid-square method, the division/remainder method, and the pseudo-random-number method, but all three are less efficient. XORing the hashCode with its own high 16 bits and then masking is the most efficient approach.

What happens when the Hashcodes of two objects are equal?

A hash collision occurs. If the key contents are also equal, the old value is replaced; otherwise the new node is appended to the end of the list, and once the list length exceeds the threshold of 8 (with a sufficiently large array) the list is converted to a red-black tree.

When and what are hash collisions, and how can they be resolved?

A hash collision occurs whenever two elements' keys compute the same hash value. Before JDK 8, collisions were resolved with linked lists; since JDK 8, they are resolved with linked lists plus red-black trees.

If two keys have the same HashCode, how do you store key-value pairs?

When the hashCodes are identical, equals() is used to compare the key contents. If they are equal, the new value overwrites the old one. If they are not equal, the new key-value pair is added to the hash table.

2.3. Red-black tree structure

When there are many elements in a linked list, that is, there are many elements with the same hash value but not the same content, sequential searching by key value is inefficient. In JDK1.8, hash table storage is realized by array + linked list + red-black tree. When the length of the linked list (threshold) exceeds 8 and the length of the current array > 64, the linked list is converted to red-black tree, which greatly reduces the search time. The reason JDK8 introduces red-black trees in hash tables is simply to make lookups more efficient.

Red-black tree structure

Before JDK 1.8, the implementation of HashMap was array + linked list, even if the hash function is good, it is difficult to achieve 100 percent uniform distribution of elements. When a large number of elements in a HashMap are stored in the same bucket and there is a long linked list under the bucket, the HashMap is equivalent to a single linked list. If the single linked list has N elements, the traversal time complexity is O(n), completely losing its advantage. In response, JDK 1.8 introduced red-black trees (search time complexity O(logn)) to optimize this problem. When the list length is very small, even traversal is very fast, but when the list length keeps getting longer, it will certainly have a certain impact on query performance, so it is necessary to turn into a tree.

2.4. Storage Flow chart

put() is used to store data in a HashMap. Internally, put() calls putVal(), so analyzing put() means analyzing putVal(). The whole process is complicated; the flow chart is as follows:


put():

public V put(K key, V value) {
    // Hash the key's hashCode(), then call putVal
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    /*
     * 1. If tab is empty, create it first.
     * 2. (tab = table) == null: assign table to tab, then check whether tab is null.
     * 3. (n = tab.length) == 0 means no memory has been allocated for the table, so
     *    n = (tab = resize()).length performs the expansion and assigns the
     *    initialized array length to n. At this point every slot of tab is null.
     */
    if ((tab = table) == null || (n = tab.length) == 0)
        // Call resize() to create/expand the table
        n = (tab = resize()).length;
    /*
     * 1. i = (n - 1) & hash computes the array index i, which determines the bucket.
     * 2. p = tab[i] assigns the node at that position to p.
     * 3. If p == null, there is no hash collision in this bucket, so
     *    tab[i] = newNode(hash, key, value, null) inserts the pair directly.
     */
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The slot is null: insert directly
        tab[i] = newNode(hash, key, value, null);
    else {
        // The slot is occupied, so hashes and keys must be compared
        Node<K,V> e; K k;
        /*
         * Compare the hash and key of the first element in the bucket:
         * 1. p.hash == hash: is the stored node's hash equal to the new one?
         * 2. (k = p.key) == key: are the two key references identical?
         * 3. key != null && key.equals(k): if the references differ, compare the
         *    key contents with equals().
         */
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Same hash (a collision) and equal key: record the existing node in e
            e = p;
        // Hashes or keys differ; check whether p is a red-black tree node
        else if (p instanceof TreeNode)
            // It is: call the tree insertion method
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            /*
             * It is a linked list: traverse to the last node and insert, checking
             * for a duplicate key along the way.
             */
            for (int binCount = 0; ; ++binCount) {
                /*
                 * e = p.next gets p's successor. If it is null we have reached the
                 * end without finding a duplicate key, so HashMap does not contain
                 * this key yet: append the new node to the list.
                 */
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length exceeds the threshold, convert to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // Found an equal key: stop; its value is overwritten below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means an existing mapping for the key was found
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Record the structural modification
    ++modCount;
    // If the actual size exceeds the threshold, expand
    if (++size > threshold)
        resize();
    // Callback after insertion
    afterNodeInsertion(evict);
    return null;
}

Summary:

  1. Based on the number of elements in the hash table, decide whether to expand or to treeify

  2. When treeifying, iterate over the elements in the bucket, create the same number of tree nodes, copy the contents, and establish the links

  3. Then make the bucket's first element point to the newly created root node, replacing the bucket's list contents with the tree

3. Capacity expansion mechanism of HashMap

We know that an array's capacity is finite, so after a certain amount of data is inserted, it has to be expanded. Let's start with two questions.

When does it need to be expanded?

Array expansion occurs when the number of elements in a HashMap exceeds arrayLength × loadFactor. The default loadFactor is 0.75, which is a compromise value. By default the array size is 16, so when the number of elements exceeds 16 × 0.75 = 12 (the threshold), the array is doubled to 2 × 16 = 32, and the position of every element is recalculated, which is a very expensive operation. Therefore, if we can predict the number of elements a HashMap will hold, presetting the capacity can effectively improve its performance.
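The threshold arithmetic, and the presizing idea, look like this (a sketch; the expected element count of 100 and the `expected / loadFactor + 1` sizing rule are illustrative assumptions):

```java
import java.util.HashMap;

public class ThresholdDemo {
    public static void main(String[] args) {
        // threshold = capacity * loadFactor: with the defaults, 16 * 0.75 = 12
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12

        // If we expect about 100 entries, presizing avoids repeated resizing
        // (expected / loadFactor + 1 is a common sizing rule of thumb)
        HashMap<String, Integer> map = new HashMap<>((int) (100 / 0.75f) + 1);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 100, inserted without triggering a resize
    }
}
```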

How to expand the capacity?

The HashMap uses the resize() method to compute the new capacity of the table array and each Node's new position in the new array, then copies the values from the old array into the new one, achieving automatic expansion. Since each expansion doubles the capacity, the (n - 1) mask in (n - 1) & hash gains exactly one more significant bit, so a node either stays at its original index or moves to "original index + old capacity".

Therefore, when the HashMap is expanded, we do not need to recompute the hash; we only need to check whether the hash's newly significant bit is 1 or 0: 0 means the index stays the same, and 1 means the new index is old index + oldCap (old position + old capacity). Without going into details here, see the schematic diagram below of resize() expanding from 16 to 32:

HashMap capacity expansion diagram
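The "check the new bit" rule can be verified with a concrete hash (50 here is an arbitrary example value):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int hash = 50; // an arbitrary example hash: 110010 in binary

        int oldIndex = (oldCap - 1) & hash;              // 15 & 50 = 2
        // The "new bit" is exactly (hash & oldCap): 0 keeps the index, 1 adds oldCap
        int newIndex = (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;

        System.out.println(oldIndex);                    // 2
        System.out.println(newIndex);                    // 18, i.e. old index + old capacity
        System.out.println(newIndex == ((newCap - 1) & hash)); // true: matches a full recompute
    }
}
```

So a rehash-free split of each bucket into "low" and "high" halves is all that resize() needs.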

4. Why is the HashMap array length a power of 2

Let’s look at its member variables first:

Serialized version number

private static final long serialVersionUID = 362498820763181265L;

Initialization capacity of the collection initCapacity

// The default initial capacity is 16: 1 << 4 equals 1 * 2^4 = 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

The default initial capacity is 16. If the initial capacity is too large, traversal slows down and efficiency drops; if it is too small, more expansions are needed, which costs performance.

The load factor

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

The default value is 0.75. If it is too large, hash conflicts become more likely before a resize happens; if it is too small, the number of expansions increases.

Why must the capacity be a power of 2?

When you add an element to a HashMap, its position in the array is determined from the hash value of its key. To make access efficient, collisions must be minimized, that is, data should be distributed as evenly as possible so that every chain stays roughly the same length. This comes down to the algorithm that maps a hash to a bucket.

That algorithm is essentially a modulus, hash % length, but a modulus is not as efficient as bitwise operations on a computer. So the source code optimizes it to hash & (length - 1); in fact hash % length equals hash & (length - 1) whenever length is a power of 2.
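The equivalence is easy to check exhaustively for small non-negative hashes (note that Java's % can return negative values for negative operands, which is one more reason HashMap uses the mask):

```java
public class ModMaskDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two
        // For non-negative hashes, hash % length and hash & (length - 1) agree
        for (int hash = 0; hash < 1000; hash++) {
            if ((hash % length) != (hash & (length - 1))) {
                throw new AssertionError("mismatch at " + hash);
            }
        }
        System.out.println("hash % 16 == hash & 15 for all hashes 0..999");
    }
}
```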

What happens if the capacity is not a power of two?

If the array length is not a power of 2, the computed indexes are far more likely to coincide and hash collisions become much more frequent, leaving much of the array storing no data at all while the linked lists or red-black trees grow too long, which hurts efficiency.

Summary:

1. If n is a power of 2, data can be distributed evenly. If n is not a power of 2, some positions in the array may never receive data, wasting space and increasing hash conflicts.

2. You could also use % to compute the position, which works, but it is not as fast as &. And when n is a power of 2: hash & (n - 1) == hash % n.

3. So the reason HashMap's capacity is a power of two is to distribute data evenly and reduce hash conflicts. After all, the more collisions there are, the longer some chain in the array becomes, which degrades HashMap's performance.


Well, that's all for today. Next time we'll bring you the HashMap interview questions! For more quality, in-depth articles, feel free to follow my original-tech public account ~