
1. What is the underlying data structure of HashMap?

There are some differences between JDK1.7 and JDK1.8:

In JDK1.7, a HashMap is composed of an array plus linked lists: the array is the body of the HashMap, and the linked lists are mainly used to resolve hash conflicts.

In JDK1.8, it is composed of array + linked list + red-black tree. When a linked list grows too long, the performance of HashMap is seriously affected: searching a linked list is O(n), while searching a red-black tree is O(log n). For this reason, JDK1.8 further optimizes the data structure by introducing red-black trees. A linked list is converted to a red-black tree (and back) when certain conditions are met:

  • A linked list is converted into a red-black tree when its length exceeds 8 and the array length (total capacity) reaches 64

  • If the current array length is less than 64, the array is expanded instead of converting the list to a red-black tree, which also reduces search time.
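These two conditions correspond to constants defined in the JDK 8 HashMap source:

static final int TREEIFY_THRESHOLD = 8;     // a bin is treeified only when it holds at least this many nodes
static final int UNTREEIFY_THRESHOLD = 6;   // a tree bin is converted back to a linked list when it shrinks to this size
static final int MIN_TREEIFY_CAPACITY = 64; // below this table capacity, resize() is preferred over treeification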

2. Talk about the features of HashMap

  • HashMap iteration order is not guaranteed (access is unordered)

  • Both keys and values may be null, but only one entry can have a null key (values may be null for any number of entries)

  • Keys are unique; uniqueness is enforced by the underlying data structure

  • Array + linked list + red-black tree

  • When a linked list grows longer than 8 and the array length is greater than 64, the linked list is converted into a red-black tree. The purpose of the red-black tree is to speed up searches and make queries more efficient

3. What are the methods for resolving hash conflicts? Which one does HashMap use?

Methods for resolving hash conflicts include: open addressing, rehashing, the chained-address (zipper) method, and establishing a public overflow area. HashMap uses the chained-address method.

  • Open addressing is also called the rehashing method. The basic idea is that if p = H(key) conflicts, we hash again based on p, p1 = H(p); if p1 conflicts again, we hash based on p1, and so on, until a non-conflicting address pi is found. Open addressing therefore requires a hash table whose length is greater than or equal to the number of elements to be stored, and because of the probing, deleted nodes can only be marked as deleted rather than actually removed

  • Rehashing (double hashing, multiple hashing) provides several different hash functions: if R1 = H1(key1) conflicts, compute R2 = H2(key1), and so on until there is no conflict. This avoids clustering but increases computation time.

  • Chained-address method (zipper method): elements with the same hash value are linked into a singly linked list of synonyms, and the head pointer of that list is stored in cell i of the hash table. Searching, insertion, and deletion are done mainly within the synonym list, so the chaining method is well suited to frequent insertions and deletions (a minimal sketch follows at the end of this question).

  • Establish a public overflow area: divide the hash table into a base table and an overflow table; when a conflict (overflow) occurs, put the conflicting data into the overflow area

Note the difference between open addressing and rehashing:

  • Open addressing rehashes with the same hash function, while the rehashing method can call multiple different hash functions
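To make the chained-address method concrete, here is a minimal toy sketch, not the JDK implementation; the names SimpleChainedMap and Entry are made up for illustration, and null keys and resizing are not handled:

import java.util.Objects;

// A toy hash table that resolves conflicts by chaining entries in each bucket.
public class SimpleChainedMap<K, V> {
    private static class Entry<K, V> {
        final K key; V value; Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Entry<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    public SimpleChainedMap(int capacity) { buckets = new Entry[capacity]; }

    private int indexFor(K key) { return (key.hashCode() & 0x7fffffff) % buckets.length; }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) { e.value = value; return; } // same key: overwrite the value
        }
        buckets[i] = new Entry<>(key, value, buckets[i]); // conflict: link the new entry into the bucket's list
    }

    public V get(K key) {
        for (Entry<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }
}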

4. Why is the linked list converted into a red-black tree only after the array length is greater than 64?

When the array is relatively small, introducing a red-black tree structure would actually reduce efficiency, because red-black trees need left/right rotations, color changes, and other operations to stay balanced. When the array length is less than 64, expanding the array already makes searches fast enough, so expansion is chosen instead in order to speed up searches and improve performance.

Before JDK1.8, the implementation of HashMap was array + linked list. Even with a good hash function, it is difficult to achieve a 100% uniform distribution of elements. When a large number of elements land in the same bucket and a long linked list forms under that bucket, the HashMap effectively degenerates into a singly linked list. If that list has n elements, the lookup time complexity degenerates from O(1) to O(n), and HashMap completely loses its advantage. To address this, red-black trees (search time complexity O(log n)) were introduced in JDK1.8.

5. Why is the loading factor set to 0.75 and the initialization threshold to 12?

The threshold in a HashMap is the maximum number of key-value pairs it can hold before resizing. The formula is threshold = capacity * loadFactor. In other words, once the array length is fixed, the larger the load factor, the more key-value pairs can be accommodated before expansion.

The closer loadFactor is to 1, the more entries are stored in the array and the denser the data becomes, which means more and longer linked lists. Query efficiency drops, and the probability of a hash conflict when adding data rises.

The default loadFactor is 0.75. The smaller the loadFactor (the closer it is to 0), the fewer entries the array holds before resizing and the sparser the data.

0.75 is a balanced choice between space and time efficiency.

If the load factor is small, say 0.4, then the initial threshold is 16 * 0.4 = 6, and the map expands once it holds more than 6 entries, even though many slots may hold few or no elements. This wastes a large amount of space.

If the load factor is large, say 0.9, the buckets become very crowded before expansion, making element lookup inefficient.

loadFactor is set to 0.75 as a value verified in practice to balance these concerns, minimizing the number of rehashes and avoiding excessive performance cost.
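A small illustration of the threshold arithmetic (the class name ThresholdDemo is made up; the printed value follows directly from threshold = capacity * loadFactor):

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;             // default initial capacity
        float loadFactor = 0.75f;      // default load factor
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12 -> the map resizes (to capacity 32) once the 13th entry is added
    }
}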

6. What algorithm does the hash table use under the hood to compute hash values? What other algorithms can compute hash values?

The hashCode method is defined in Object, so every class can use it. First, the initial hash value h1 is obtained by calling the key's hashCode method; then h1 is shifted right by 16 bits (unsigned) to get h2; h1 is then XOR-ed (^) with h2 to get the final hash value h3. Finally, h3 is AND-ed (&) with (length - 1) to obtain the bucket index.
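This corresponds to the hash() perturbation function in the JDK 8 HashMap source, together with the index calculation used inside putVal:

// JDK 8 HashMap perturbation function: XOR the high 16 bits into the low 16 bits.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// Inside putVal, the bucket index is then computed as (n - 1) & hash,
// where n is the table length (always a power of two), so the AND is
// equivalent to hash % n but much cheaper:
//     i = (n - 1) & hash;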

Other algorithms that can compute hash values are

  • Mid-square method

  • Division (modulo) method

  • Pseudo-random number method

7. What happens when the hashCodes of two objects are equal?

Equal hashCodes produce a hash collision. When hashCodes are equal, equals is called to compare the contents: if the contents are equal, the existing value is overwritten; if not, the new node is appended to the end of the linked list.

8. When and what are hash collisions, and how can they be resolved?

A hash collision occurs whenever the keys of two elements produce the same hash value. Collisions can be resolved with the methods described in question 3 (open addressing, rehashing, chaining, public overflow area); HashMap resolves them with the chained-address (zipper) method.

9. Put method flow of HashMap

Taking JDK8 as an example, the process is as follows:

  1. First compute the hash value from the key and use it to find the index in the array where the element will be stored

  2. If the array is empty, call resize to initialize it;

  3. If there is no hash conflict, put the node directly at the corresponding array index

  4. If a conflict occurs and the key already exists, overwrite the value

  5. If, after the conflict, the bucket holds a linked list, check whether the list length is greater than 8. If it is greater than 8 but the array capacity is less than 64, expand the array; if the list length is greater than 8 and the array length is greater than 64, convert the list into a red-black tree. Otherwise, insert the key-value pair into the list, overwriting the value if the key already exists

  6. If the node is found to be a red-black tree after the conflict, the node is attached to the tree
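For reference, this is the putVal method from the JDK 8 java.util.HashMap source, annotated to map to the steps above. It relies on HashMap-internal fields and helpers (table, threshold, newNode, treeifyBin, resize), so it is quoted for reading rather than as standalone code; note that treeifyBin itself checks MIN_TREEIFY_CAPACITY (64) and resizes instead of treeifying when the table is still small.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;                        // step 2: lazily initialize the table
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);           // step 3: empty bucket, place the node directly
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;                                          // step 4: same key at the bucket head
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); // step 6: red-black tree bucket
        else {
            for (int binCount = 0; ; ++binCount) {          // step 5: walk the linked list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);   // append at the tail
                    if (binCount >= TREEIFY_THRESHOLD - 1)  // list long enough:
                        treeifyBin(tab, hash);              // treeify, or resize if the table is smaller than 64
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;                                  // key already present in the list
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;                            // overwrite the old value
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)                                 // grow the table when the threshold is exceeded
        resize();
    afterNodeInsertion(evict);
    return null;
}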

10. How does HashMap capacity expansion work?

A HashMap expands when the number of entries exceeds the threshold defined by the load factor. Since a Java array cannot grow in place, expansion creates a new array twice the size of the original and moves the entries into it.

Let's look at the JDK1.8 expansion (resize) source code:

final Node<K,V>[] resize() {
    // oldTab: reference to the current (old) hash table
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // newCap: table array size after expansion
    // newThr: threshold that will trigger the next expansion
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity has already reached the maximum, give up expanding
        // and set the threshold to Integer.MAX_VALUE
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise set newCap to double the old capacity (oldCap << 1); if the new
        // capacity is still below the maximum and the old capacity is at least 16,
        // double the threshold as well
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // oldCap == 0 but oldThr > 0: the table is still null, but an initial capacity
    // was placed in threshold by one of the constructors:
    //   1. new HashMap(initCap, loadFactor)
    //   2. new HashMap(initCap)
    //   3. new HashMap(map)
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // oldCap == 0 and oldThr == 0: not initialized yet, use the defaults
        // (capacity 16 and threshold 16 * 0.75 = 12)
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // New threshold = new capacity * load factor (capped at the maximum)
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Create the larger array and point table at it
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If the table was not null before expansion, move its nodes into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            // e: the current node being moved
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node in the bucket: recompute its index directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A red-black tree bucket: split the tree between the two new buckets
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // lo list: nodes that stay at index j
                    Node<K,V> loHead = null, loTail = null;
                    // hi list: nodes that move to index j + oldCap
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // e.g. oldCap = 16 = 0b10000: test the extra bit of the hash
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The lo list keeps the original index j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The hi list moves to index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

After expansion, an original node can end up in only one of two places:

  • It stays at its original index (when the extra bit is 0)

  • It moves to (original index + old capacity) (when the extra bit is 1)

The way elements are redistributed after expansion is designed very cleverly and saves the time of recalculating hash values. Let's look at how it works.

When the array length grows from 16 to 32, only one extra bit of the hash participates in the index calculation. We only need to check whether that extra bit is 0 or 1: if it is 0, the index stays the same; if it is 1, the index becomes the current index + the old capacity (16).

This approach not only saves the time of recalculating hashes, but also guarantees that the number of elements in each resulting bucket is less than or equal to the number in the original bucket. Nodes that previously collided are spread evenly across the new buckets, avoiding more severe hash conflicts.
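A small demonstration of the split rule (the class name ResizeSplitDemo and the hash values 5 and 21 are arbitrary examples chosen for illustration):

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;               // old capacity, binary 10000
        int[] hashes = {5, 21};        // 5 = 00101, 21 = 10101: both map to index 5 when capacity is 16

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);       // index in the old table
            int newIndex = ((h & oldCap) == 0)     // test the extra bit
                    ? oldIndex                     // bit 0 -> keep the original index
                    : oldIndex + oldCap;           // bit 1 -> original index + old capacity
            System.out.println("hash " + h + ": old index " + oldIndex + ", new index " + newIndex);
        }
        // Prints: hash 5: old index 5, new index 5
        //         hash 21: old index 5, new index 21
    }
}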

11. What is commonly used as a HashMap key?

Immutable classes such as Integer and String are commonly used as HashMap keys

  • Because String is immutable, its hashCode is cached when the String is created and does not need to be computed again, which makes it faster than most other objects

  • Because equals() and hashCode() are used when retrieving entries, it is important that a key class overrides both of them correctly; Integer and String do so
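A minimal sketch of what a well-behaved custom key looks like (the Point class is made up for illustration): an effectively immutable class with consistent equals()/hashCode() behaves like String or Integer as a HashMap key.

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Immutable key: final fields, equals() and hashCode() based on the same fields.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class PointDemo {
    public static void main(String[] args) {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "corner");
        System.out.println(labels.get(new Point(1, 2))); // prints "corner": an equal key finds the entry
    }
}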

12. Why does a Map bucket become a red-black tree when the number of nodes exceeds 8?

The threshold 8 is defined as a constant (TREEIFY_THRESHOLD) in HashMap, and there is no comment at its definition explaining why the value is 8.

However, there is a comment elsewhere in the HashMap source, so let's read on.

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)).

Translation:

Because tree nodes are about twice the size of regular nodes, we only use them when a bin contains enough nodes (see TREEIFY_THRESHOLD). When they become too small (due to removal or resizing), they are converted back to plain bins. With well-distributed hashCodes, tree bins are rarely used. Ideally, under random hash codes, the frequency of nodes in a bin follows a Poisson distribution, and the first values are:

  • 0: 0.60653066
  • 1: 0.30326533
  • 2: 0.07581633
  • 3: 0.01263606
  • 4: 0.00157952
  • 5: 0.00015795
  • 6: 0.00001316
  • 7: 0.00000094
  • 8: 0.00000006
  • more: less than 1 in ten million

A tree node occupies about twice the space of an ordinary node, so converting short linked lists into red-black trees would waste a lot of space for little benefit. In addition, under a random hash function, the bin sizes follow a Poisson distribution, and the probability of a linked list reaching length 8 is only about 0.00000006, which is almost negligible. So there is solid reasoning behind the number 8.
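A quick check of that figure using the formula from the JDK comment, exp(-0.5) * 0.5^k / k! with k = 8 (the class name PoissonCheck is made up):

public class PoissonCheck {
    public static void main(String[] args) {
        int k = 8;
        long factorial = 1;
        for (int i = 2; i <= k; i++) factorial *= i;        // 8! = 40320

        double p = Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
        System.out.printf("P(bin size = 8) = %.8f%n", p);   // prints 0.00000006
    }
}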

  • In terms of average search length: the average search length of a red-black tree is log(n); with length 8, log(8) = 3. The average search length of a linked list is n/2; with length 8, 8/2 = 4. So the threshold 8 is the point at which switching to a tree noticeably improves search speed

  • The tree degenerates back into a linked list at length 6 because log(6) ≈ 2.6 and 6/2 = 3 are not very different, while red-black tree nodes take more memory; at that point converting back is the most favorable choice

13. Why is HashMap thread unsafe?

  • Infinite loops under multithreading. The JDK1.7 HashMap uses head insertion when adding elements to a list; in a multithreaded environment, expansion can produce a circular linked list and thus an infinite loop. JDK1.8 switched to tail insertion, which keeps the original order of list elements during expansion, so the circular-list problem no longer occurs

  • Concurrent put can lose elements. If multiple threads put at the same time and compute the same index, one key can be overwritten by another, so an element is lost. This problem exists in both JDK1.7 and JDK1.8

  • If put and get run concurrently, get may return null. This happens when thread 1 is resizing (rehashing) because the number of elements exceeded the threshold while thread 2 executes get. This problem exists in both JDK1.7 and JDK1.8
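A small demonstration of the element-loss problem (the class name UnsafePutDemo is made up; the outcome is nondeterministic, so the printed size will usually, but not always, be below 2000):

import java.util.HashMap;
import java.util.Map;

public class UnsafePutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();

        // Two threads insert disjoint key ranges concurrently.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) map.put(i, i); });
        Thread t2 = new Thread(() -> { for (int i = 1000; i < 2000; i++) map.put(i, i); });
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Would be 2000 if HashMap were thread-safe; with races, entries are often lost.
        System.out.println("size = " + map.size());
    }
}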

14. Why are the high 16 bits XOR-ed with the low 16 bits when computing hash values?

  • We compute the index by AND-ing the hash with (length - 1). If the array is small, say 16, then only the last four bits of the hash actually take part in the index calculation. hashCode values often vary a lot in the high bits and very little in the low bits, and in that case the probability of hash conflicts is high. So, to spread elements out better, the high bits of the hash value are also brought into the calculation

For example:

If we do not XOR the high bits into the hashCode and instead AND the raw hashCode with (length - 1) directly, the following can happen: two hashCode values that differ only in their high bits produce exactly the same index, because the high bits never participate in the calculation, and a hash conflict results. The sketch below illustrates this.
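A worked illustration (the class name HighBitsDemo and the two hashCode values 0x10001 and 0x20001 are arbitrary examples that differ only in their high bits):

public class HighBitsDemo {
    public static void main(String[] args) {
        int length = 16;                 // table length, so the mask is 0b1111
        int h1 = 0x10001;                // differs from h2 only in the high bits
        int h2 = 0x20001;

        // Without the perturbation: only the low 4 bits matter, so both land on index 1.
        System.out.println((h1 & (length - 1)) + " " + (h2 & (length - 1))); // 1 1

        // With the JDK-style perturbation (h ^ (h >>> 16)): the high bits now influence the index.
        int p1 = h1 ^ (h1 >>> 16);       // 0x10000 -> index 0
        int p2 = h2 ^ (h2 >>> 16);       // 0x20003 -> index 3
        System.out.println((p1 & (length - 1)) + " " + (p2 & (length - 1))); // 0 3
    }
}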

So the purpose of the unsigned right shift by 16 bits and the XOR is to mix the randomness of the high bits into the low bits, improving the randomness of the low bits and reducing hash conflicts.

If you spot any mistakes in these interview questions, please point them out in the comments and I will correct them. If this article helped you, please give it a like. Thank you.
