Introduction

A HashMap stores data based on the hashCode value of the key, so in most cases a value can be located directly, providing fast access; the traversal order, however, is undefined.

A HashMap allows at most one record to have a null key, and any number of records to have a null value. A HashMap is not thread-safe: if more than one thread writes to the same HashMap concurrently, data inconsistency can result. If thread safety is required, use Collections.synchronizedMap to wrap the HashMap, or use ConcurrentHashMap instead.
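These properties are easy to verify from the public API; a minimal sketch (class name is my own):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 1);   // one null key is allowed
        map.put(null, 2);   // a second put on the null key overwrites the value
        map.put("a", null); // null values are allowed...
        map.put("b", null); // ...for any number of keys
        System.out.println(map.get(null)); // 2
        System.out.println(map.size());    // 3

        // For thread safety, wrap the map or use ConcurrentHashMap:
        Map<String, Integer> safe = Collections.synchronizedMap(new HashMap<>());
        safe.put("x", 42);
        System.out.println(safe.get("x")); // 42
    }
}
```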

Storage structure – Fields

Implementation of the put function

```java
public V put(K key, V value) {
    // compute the hash of the key, then delegate to putVal
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // table is empty: create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // bucket is empty: place the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // first node in the bucket matches the key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // size exceeds threshold (load factor * current capacity): resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

  1. If the table array is null or of length 0, call resize() to create it.

  2. Compute the bucket index i = (n - 1) & hash. If table[i] is null, insert a new node directly and go to step 6; otherwise go to step 3.

  3. If the first node of table[i] has the same hash and an equal key, record that node so its value can be overwritten; otherwise go to step 4.

  4. If table[i] is a TreeNode, that is, the bucket is a red-black tree, insert the key-value pair into the tree; otherwise go to step 5.

  5. Traverse the linked list in table[i]. If the list length reaches TREEIFY_THRESHOLD (8), convert the list into a red-black tree and perform the insertion there; otherwise append the new node to the list. If an equal key is found during the traversal, simply overwrite its value.

  6. After the insert succeeds, check whether the number of key-value pairs (size) exceeds the threshold (load factor * current capacity). If so, expand the table.
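The overwrite behavior in steps 3 and 5 is visible from the public API: put returns null on the first insert of a key, and the old value when it overwrites. A minimal sketch (class name is my own):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        // first insert: no existing mapping, so put returns null
        System.out.println(map.put("k", "v1")); // null
        // same key again: the node is found and overwritten, old value returned
        System.out.println(map.put("k", "v2")); // v1
        System.out.println(map.get("k"));       // v2
        System.out.println(map.size());         // 1
    }
}
```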

The implementation of get function

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // direct hit: always check the first node
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // bucket holds a red-black tree: search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // otherwise walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
  1. Check the first node in the bucket; if its hash and key match, it is a direct hit.

  2. On a collision, search for the corresponding entry via key.equals(k):

If the bucket is a red-black tree, search the tree via key.equals(k), O(log n);

If it is a linked list, walk the list via key.equals(k), O(n).
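One practical consequence of get: because null values are allowed, get returning null does not distinguish an absent key from a key mapped to null; containsKey must be used to tell them apart. A minimal sketch (class name is my own):

```java
import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("present", null);
        // get returns null both for an absent key and for a key mapped to null
        System.out.println(map.get("absent"));          // null
        System.out.println(map.get("present"));         // null
        // containsKey distinguishes the two cases
        System.out.println(map.containsKey("absent"));  // false
        System.out.println(map.containsKey("present")); // true
    }
}
```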

Implementation of the hash function

The implementation of the hash on hashCode() looks like this:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

What this function does is simple: the high 16 bits stay the same, and the low 16 bits are XORed with the high 16 bits.

When designing the hash function, since the table length n is always a power of two, the index is computed with a bitwise AND instead of the % modulo operation:

(n - 1) & hash

The designers considered this method on its own prone to collisions. Why? Consider that when n - 1 is 15 (binary 1111), only the low 4 bits of the hash actually take part in the index, which is of course collision-prone.

Therefore, the designers settled on a holistic trade-off (balancing speed, utility, and quality): XOR the high 16 bits into the low 16 bits. They also noted that since most hashCode implementations are already well distributed, and collisions are handled with O(log n) trees anyway, a single XOR is enough: it adds little overhead, yet prevents the collisions that would occur if the high bits never participated in the index calculation (when the table length is small).
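The effect of the XOR spread can be shown with two hash codes that differ only in their high 16 bits; without the spread they land in the same bucket of a 16-slot table, with it they do not (a minimal sketch; the hash helper replicates the function above on an int):

```java
public class HashSpread {
    // replica of HashMap's hash(): XOR the high 16 bits into the low 16 bits
    static int hash(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;       // table length, a power of two: index = (n - 1) & hash
        int h1 = 0x10000; // two hash codes that differ only in the high bits
        int h2 = 0x20000;

        // without spreading, the high bits are masked away and both collide at 0
        System.out.println((n - 1) & h1); // 0
        System.out.println((n - 1) & h2); // 0

        // with spreading, the high bits influence the index
        System.out.println((n - 1) & hash(h1)); // 1
        System.out.println((n - 1) & hash(h2)); // 2
    }
}
```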

The implementation of resize

```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // bit masked by oldCap decides which bucket the node goes to
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null) loHead = e;
                            else loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null) hiHead = e;
                            else hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

During put, resize occurs when the map finds that the number of occupied entries exceeds what the load factor allows. In the resize process, the bucket array is doubled, indexes are recomputed, and the nodes are placed into the new buckets.

Resize occurs when the threshold is exceeded, but because we expand by a power of two, each element either stays at its original index or moves to its original index plus a power of two.

For example, when we expand from 16 to 32, the specific changes are as follows:

Therefore, after an element's index is recalculated, because n has doubled, the mask n - 1 covers one more bit at the high end, so the new index differs from the old one only in that extra bit:

Therefore, when the HashMap is expanded, there is no need to recompute the hash. We only need to check whether the newly masked bit of the hash value is 1 or 0: if it is 0, the index is unchanged; if it is 1, the new index is the old index + oldCap. The figure below shows a resize from 16 to 32:
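This index property can be verified directly with the masks (a minimal sketch; the class name is my own, and the hash values are arbitrary examples that all land in bucket 5 when n = 16):

```java
public class ResizeIndex {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53}; // all map to bucket 5 when n = 16

        for (int h : hashes) {
            int oldIndex = (oldCap - 1) & h;
            int newIndex = (newCap - 1) & h;
            // the new index depends only on the bit that oldCap masks:
            // 0 -> same index, 1 -> old index + oldCap
            int predicted = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(oldIndex + " -> " + newIndex
                               + " (predicted " + predicted + ")");
        }
    }
}
```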

This design is indeed very clever: it not only saves the time of recalculating hash values, but also, since the newly masked bit can be considered randomly 0 or 1, the resize process evenly distributes the previously colliding nodes across the new buckets.

Thread safety

In multithreaded scenarios, avoid using the non-thread-safe HashMap and use the thread-safe ConcurrentHashMap instead.

```java
public class HashMapInfiniteLoop {
    private static HashMap<Integer,String> map = new HashMap<Integer,String>(2, 0.75f);

    public static void main(String[] args) {
        map.put(5, "C");

        new Thread("Thread1") {
            public void run() {
                map.put(7, "B");
                System.out.println(map);
            }
        }.start();

        new Thread("Thread2") {
            public void run() {
                map.put(3, "A");
                System.out.println(map);
            }
        }.start();
    }
}
```

With loadFactor = 0.75 and an initial capacity of 2, threshold = 2 * 0.75 = 1; that is, when the second key-value pair is put, the map must resize.

By setting breakpoints, suspend both thread 1 and thread 2 at the first line of the transfer method (section 3.3). Note that at this point both threads have already successfully added their data. Then release thread 1's breakpoint and let it run to the line "Entry next = e.next;"; then release thread 2's breakpoint and let thread 2 complete its resize. The result is shown below.

Note that thread 1's e points to key(3) and its next points to key(7); after thread 2's rehash, these now point into thread 2's reassembled linked list.

As soon as thread 1 is scheduled again, it executes newTable[i] = e and then e = next, which makes e point to key(7), while next = e.next makes next point to key(3).

Then e.next = newTable[i] makes key(3).next point to key(7). But key(7).next already points to key(3), so a circular linked list appears.

So, when that thread later calls map.get(11), the tragedy appears: an infinite loop.
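The mechanism at the heart of this bug can be modeled single-threaded with a toy linked list (a simplified sketch with my own Node class, not the real JDK 1.7 code): the old transfer method moved nodes by head insertion, which reverses a bucket's order, and it is this reversal that lets two interleaved threads stitch the list into a cycle.

```java
public class TransferModel {
    static class Node {
        int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // JDK 1.7-style transfer of one bucket: head insertion into the new bucket
    static Node transfer(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next; // remember the rest of the old list
            e.next = newHead;   // head insertion: reverses the order
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        // bucket list 3 -> 7 before the resize
        Node head = new Node(3, new Node(7, null));
        Node moved = transfer(head);
        // after the transfer the order is reversed: 7 -> 3
        StringBuilder sb = new StringBuilder();
        for (Node e = moved; e != null; e = e.next) sb.append(e.key).append(' ');
        System.out.println(sb.toString().trim()); // 7 3
    }
}
```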

Q&A summary

Interviewer: Have you ever used HashMap?

Interviewee: Yes

Interviewer: What is a HashMap?

Interviewee:

  1. A HashMap stores data based on the hashCode value of the key, so in most cases a value can be located directly, providing fast access; the traversal order, however, is undefined.
  2. A HashMap allows at most one record to have a null key, and any number of records to have a null value.
  3. A HashMap is not thread-safe: if more than one thread writes to the same HashMap concurrently, data inconsistency can result.
  4. If thread safety is required, use Collections.synchronizedMap to wrap the HashMap, or use ConcurrentHashMap instead.

Note: Candidates are free to expand on their understanding of HashMap, such as the differences between HashMap and Hashtable, etc.

Interviewer: Do you know how HashMap works?

Interviewee: HashMap is based on hashing: we use put(key, value) to store objects in a HashMap and get(key) to retrieve them. When we pass the key and value to the put() method, it first calls hashCode() on the key, and the returned hash is used to find the bucket location where the Entry object is stored.

Interviewer: Do you know how get() works?

Interviewee: According to the blog post (omitted…)

Interviewer: Do you know how put() works?

Interviewee: According to the blog post (omitted…)

Interviewer: Do you know how capacity expansion works?

Interviewee: According to the blog post (omitted…)

Interviewer: What happens when two objects have the same Hashcode?

Interviewee: Because their hashCodes are the same, the two objects are equal, so the HashMap will either throw an exception or refuse to store them.

Interviewer: Equals () and hashCode(). These two objects are the same as hashCode, but they may not be equal.

Interviewee: Because their hashCodes are the same, their bucket positions are the same and a "collision" occurs. Since HashMap uses a linked list to store objects, the Entry (a Map.Entry object holding the key-value pair) is stored in the linked list.

Interviewer: If two keys have the same Hashcode, how do you get the value object?

Interviewee: When we call the get() method, the HashMap uses the hashCode of the key object to find the bucket location and then retrieves the value object.

Interviewer: What if there are two value objects stored in the same bucket?

Interviewee: The linked list is traversed until the matching value object is found.

Interviewer: And if you don't have the value object to compare with, how do you make sure you find the right one?

Interviewee: Once the bucket location is found, the key.equals() method is called to find the correct node in the linked list, and finally the value object.

Interviewer: What do equals() and hashCode() do?

Interviewee: Using immutable objects declared final as keys, together with appropriate equals() and hashCode() implementations, will reduce collisions and improve efficiency. Immutability makes it possible to cache the hashCode of different keys, which speeds up retrieval; wrapper classes like String and Integer are therefore good key choices. When retrieving an object, hashCode() is used to find the correct bucket; if a collision occurred, key.equals() is used to find the corresponding node in the linked list or tree.
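The danger of mutable keys can be demonstrated with a deliberately mutable key class (MutableKey is my own, for illustration only): mutating the key after insertion changes its hashCode, so the entry is stranded in the wrong bucket and can no longer be found.

```java
import java.util.HashMap;
import java.util.Objects;

public class MutableKeyPitfall {
    // hypothetical mutable key: hashCode changes when the field changes
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        HashMap<MutableKey, String> map = new HashMap<>();
        MutableKey k = new MutableKey(1);
        map.put(k, "value");

        k.id = 2; // mutating the key after insertion changes its hashCode
        // the entry now sits in the "wrong" bucket and cannot be found again
        System.out.println(map.get(k));                 // null
        System.out.println(map.get(new MutableKey(1))); // null
        System.out.println(map.size());                 // 1
    }
}
```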

Interviewer: Do you know the implementation of Hash? Why is it implemented this way?

Interviewee: In the Java 1.8 implementation, the hash XORs the high 16 bits of hashCode() into the low 16 bits: (h = k.hashCode()) ^ (h >>> 16). This is a trade-off between speed, utility, and quality: even when the bucket count n is small, the high and low bits both participate in the hash calculation, without adding much overhead.

Interviewer: What if the size of the HashMap exceeds the capacity defined by the Load factor?

Interviewee: The default load factor is 0.75, which means that when a map's buckets are 75% full, then, as with other collection classes such as ArrayList, a bucket array twice the size of the original is created and the existing objects are placed into it. This process is called rehashing, because the hash function is applied again to find the new bucket locations.

Interviewer: Do you understand the problem with resizing a HashMap?

Interviewee: There is a real race condition when resizing a HashMap: if two threads both find that the HashMap needs resizing, they will try to resize it at the same time. During resizing, the order of the elements stored in a linked list is reversed, because when moving to the new bucket location the HashMap inserts elements at the head of the list rather than the tail (to avoid traversing to the tail each time). If a race occurs, an infinite loop can result. So why would you use a HashMap in a multithreaded environment?