Having just been through the autumn recruitment season and read a lot of interview write-ups, I found that HashMap comes up constantly. Below is a summary of the most frequently asked HashMap questions, roughly marked by frequency: the more stars after a question, the more likely it is to come up in an interview. Work through all of the questions below and you will never have to worry about an interviewer grilling you on HashMap again.

Search for the public account "zhang" on WeChat and reply "interview manual" to get the PDF version of this document and more interview materials.


The underlying implementation of HashMap, and how it differs between JDK1.7 and JDK1.8 * * *

  • Underlying data structure in JDK1.7: array + linked list

  • Underlying data structure in JDK1.8: array + linked list/red-black tree

  • Hash function for JDK1.7

    static final int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
  • JDK1.8 Hash function

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    JDK1.8 uses two perturbations (one shift plus one XOR), while JDK1.7 uses nine perturbations (four shifts and five XORs). In JDK1.8, key.hashCode() is XORed with itself shifted right by 16 bits: the high 16 bits stay the same, while the low 16 bits are mixed with the high 16 bits, so both the high and low bits participate in the hash and collisions are reduced. Without this perturbation, since the hash value has 32 bits but the bucket index is just the hash value mod the array length, only the lowest few bits of the hash would ever be used.
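
As an illustration (a hypothetical snippet, not from the JDK), the following compares bucket indices computed with and without the JDK1.8-style perturbation for two hash codes that differ only in their high bits; without the perturbation both collide, with it they usually separate:

    // Assumed table length of 16, as in a fresh HashMap.
    public class HashPerturbationDemo {
        // JDK1.8-style perturbation: mix the high 16 bits into the low 16 bits
        static int perturb(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int length = 16;
            int h1 = 0x0000_0005;  // two hash codes that differ only in the high bits
            int h2 = 0x7FFF_0005;

            // Without perturbation, only the low bits matter, so both fall into bucket 5
            System.out.println(((length - 1) & h1) + " vs " + ((length - 1) & h2));

            // With perturbation, the high bits influence the index and the collision disappears here
            System.out.println(((length - 1) & perturb(h1)) + " vs " + ((length - 1) & perturb(h2)));
        }
    }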

How does HashMap differ between JDK1.7 and JDK1.8?

| | JDK1.7 | JDK1.8 | Advantage of JDK1.8 |
| --- | --- | --- | --- |
| Underlying structure | Array + linked list | Array + linked list/red-black tree (tree when a list is longer than 8) | Improves query efficiency and avoids excessively long linked lists |
| Hash calculation | 9 perturbations = 4 shifts + 5 XORs | 2 perturbations = 1 shift + 1 XOR | Previously conflicting nodes can be split evenly into the new buckets (see the expansion question below for details) |
| Data insertion | Head insertion (the data already at the position is moved back one place and the new node is inserted at the head) | Tail insertion (appended to the end of the list/red-black tree) | Avoids the infinite-loop problem caused by multiple threads |
| Storage location after expansion | Every entry is rehashed | Original index, or original index + old capacity | Saves the time of recalculating the hash value |

Why is the length of a HashMap always a power of 2? * * *

A HashMap determines the storage location from the hash value of the key, but hash values range from -2147483648 to 2147483647, and it is impossible to build an array that large, so the hash has to be reduced modulo the array length. When the length is a power of 2, (length - 1) & hash selects the same bucket as the mathematical hash mod length, so the & operation can replace the much slower % operation and improve performance.
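
A minimal sketch (hypothetical class name, not from the article) of this equivalence: for a power-of-two length, (length - 1) & hash selects the same bucket as the mathematical modulo, computed here with Math.floorMod so that negative hash values are handled consistently:

    public class PowerOfTwoIndexDemo {
        public static void main(String[] args) {
            int length = 16;                     // power of two, like HashMap's default capacity
            int[] hashes = {1, 17, 33, 50, -7};  // sample hash values, including a negative one
            for (int h : hashes) {
                int byAnd = (length - 1) & h;          // bitwise AND, what HashMap actually does
                int byMod = Math.floorMod(h, length);  // mathematical mod, always non-negative
                System.out.printf("hash=%d -> and=%d, mod=%d%n", h, byAnd, byMod);
            }
        }
    }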

What is the flow of HashMap's put() method? * *

The main flow of put() can be seen in the flow chart in the original article; a simplified code sketch of the same steps is given below.
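
The following is a minimal, self-contained sketch (a hypothetical MiniHashMap, not the JDK source) that mirrors the JDK1.8 put() flow: lazily initialize the table, compute the bucket index, insert directly into an empty bucket, otherwise walk the chain and either overwrite an existing key or append at the tail, and finally check the threshold. Treeification and the real resize() are omitted.

    import java.util.Objects;

    public class MiniHashMap<K, V> {
        static class Node<K, V> {
            final int hash;
            final K key;
            V value;
            Node<K, V> next;

            Node(int hash, K key, V value, Node<K, V> next) {
                this.hash = hash;
                this.key = key;
                this.value = value;
                this.next = next;
            }
        }

        private Node<K, V>[] table;
        private int size;
        private int threshold;

        // Same perturbation as JDK1.8: XOR the high 16 bits into the low 16 bits
        static int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        @SuppressWarnings("unchecked")
        public V put(K key, V value) {
            int hash = hash(key);
            // 1. Lazily initialize the table on the first insert (the real HashMap does this in resize())
            if (table == null) {
                table = (Node<K, V>[]) new Node[16];
                threshold = (int) (16 * 0.75f);
            }
            int i = (table.length - 1) & hash;   // 2. Compute the bucket index
            Node<K, V> p = table[i];
            if (p == null) {                     // 3. Empty bucket: insert directly
                table[i] = new Node<>(hash, key, value, null);
            } else {
                for (;;) {                       // 4. Walk the chain in this bucket
                    if (p.hash == hash && Objects.equals(p.key, key)) {
                        V old = p.value;         // 4a. Key already present: overwrite, return the old value
                        p.value = value;
                        return old;
                    }
                    if (p.next == null) {        // 4b. Reached the tail: append (tail insertion)
                        p.next = new Node<>(hash, key, value, null);
                        break;                   //     (the real HashMap would treeify a long chain here)
                    }
                    p = p.next;
                }
            }
            if (++size > threshold) {
                // 5. The real HashMap calls resize() here to double the capacity
            }
            return null;
        }

        public static void main(String[] args) {
            MiniHashMap<String, Integer> map = new MiniHashMap<>();
            map.put("a", 1);
            System.out.println(map.put("a", 2)); // 1: the old value is returned on overwrite
        }
    }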

How is the HashMap expansion operation implemented? * * *

  • The initial capacity is 16, the load factor is 0.75, and the threshold is load factor × capacity

  • The resize() method is called to expand the HashMap when the number of key-value pairs exceeds the threshold, or when the table is first initialized.

  • Each time it expands, it doubles the capacity

  • During expansion, each node's e.hash & oldCap is checked; this tests the one extra bit that the doubled capacity brings into the index. If it is 0, the node stays at its original index; otherwise it moves to the original index + old capacity (a numeric example follows the source code below).

    The source code is as follows:

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) { // If the old capacity has already reached the maximum, set the threshold to Integer.MAX_VALUE and stop expanding
                threshold = Integer.MAX_VALUE;
                return oldTab;
            } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                       oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // Otherwise double the capacity and the threshold
        }
        else if (oldThr > 0) // The initial capacity was placed in threshold
            newCap = oldThr;
        else {               // Zero initial threshold: use the defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else {
                        Node<K,V> loHead = null, loTail = null; // lo list: nodes that stay at the original index after expansion
                        Node<K,V> hiHead = null, hiTail = null; // hi list: nodes that move to original index + old capacity
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) { // The extra bit is 0: append to the lo list
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {                        // The extra bit is 1: append to the hi list
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;           // The lo list goes to the original index
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;  // The hi list goes to original index + old capacity
                        }
                    }
                }
            }
        }
        return newTab;
    }
    
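A hypothetical numeric example (assumed capacities, not from the article) of the e.hash & oldCap test: with an old capacity of 16, two hashes that shared bucket 5 are split between index 5 and index 21 after expansion.

    // Demonstrates how (hash & oldCap) decides whether a node stays put or moves by oldCap.
    public class ResizeSplitDemo {
        public static void main(String[] args) {
            int oldCap = 16, newCap = 32;
            int[] hashes = {5, 21};  // both map to bucket 5 when the capacity is 16

            for (int h : hashes) {
                int oldIndex = h & (oldCap - 1);
                int newIndex = (h & oldCap) == 0
                        ? oldIndex            // extra bit is 0: stays at the original index
                        : oldIndex + oldCap;  // extra bit is 1: moves to original index + old capacity
                System.out.printf("hash=%d: old index=%d, new index=%d, direct recompute=%d%n",
                        h, oldIndex, newIndex, h & (newCap - 1));
            }
        }
    }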

Why is the default HashMap load factor 0.75?

This is a compromise between space utilization and lookup cost. If the load factor is too high, space utilization improves but the probability of hash collisions increases; if it is too low, frequent expansion keeps collisions rare but wastes space. As for why it is 0.75 rather than 0.74 or 0.76, that is a conclusion drawn from mathematical analysis (the Poisson distribution) and engineering convention.

Why is the threshold for converting a linked list to a red-black tree 8? Why not use a red-black tree from the start?

Many people ask: if red-black trees are so good, why not use one from the start instead of beginning with a linked list and only converting once the list is longer than 8?

  • A red-black tree node takes roughly twice the space of an ordinary linked-list node, but its lookup complexity is lower, so the tree only pays off when a bucket holds many nodes. As for why the threshold is 8: statistically, the probability of a list reaching length 8 is extremely low (see the sketch after this list), so weighing the pros and cons of linked lists and red-black trees, lists longer than 8 are converted.
  • A linked list is converted to a red-black tree only if the list length is greater than 8 and the HashMap's array length is at least 64. In other words, if the array length is less than 64, a list longer than 8 will not be turned into a red-black tree (the table is resized instead).
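
The JDK HashMap source comments state that, under the default load factor, bucket sizes roughly follow a Poisson distribution with parameter about 0.5, and the probability of a bucket reaching 8 nodes is about 0.00000006. A small sketch that reproduces that calculation:

    // Computes P(k) = exp(-0.5) * 0.5^k / k! for k = 0..8, the bucket-size probabilities
    // quoted in the JDK HashMap source comments (lambda = 0.5 under the default load factor).
    public class PoissonBucketDemo {
        public static void main(String[] args) {
            double lambda = 0.5;
            double p = Math.exp(-lambda);   // P(0)
            for (int k = 0; k <= 8; k++) {
                System.out.printf("P(bucket size = %d) = %.8f%n", k, p);
                p = p * lambda / (k + 1);   // advance to P(k + 1)
            }
        }
    }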

How does HashMap resolve hash conflicts? * * *

Hash collision: when an element is stored in a HashMap, the hash value of the key is computed first to determine the storage location. Because the hash value is reduced modulo the array length, different keys may end up with the same bucket index, which is a hash collision. The linked list/red-black tree in the underlying structure of the HashMap exists to solve this problem.

Hash collision handling in HashMap can be considered from three angles (taking JDK1.8 as the background):

  • Chaining (the "zipper" method)

    The data structure of HashMap is array + linked list/red-black tree. When different keys compute to the same index, the nodes (the conflicting keys and their values) are attached to that array slot as a linked list (a small collision demo is given after this list).

  • The hash function

    The hash of the key is perturbed: the hashCode of the key is XORed with itself shifted right by 16 bits, and the result is then reduced modulo the array length (actually with a bitwise AND, for performance, but the purpose is the same as a mod). In this way, the high bits of the hashCode also take part in the calculation, the probability of collisions is further reduced, and the data is distributed more evenly.

  • Red-black tree

    With chaining alone, if collisions are particularly severe, the list attached to an array slot becomes too long and performance degrades. Therefore, when the list length exceeds 8, the list is converted into a red-black tree to speed up lookups.
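
For example (an illustrative snippet, not from the article), the strings "Aa" and "BB" have the same hashCode() in Java, so they map to the same bucket, yet both stay retrievable because of chaining:

    import java.util.HashMap;
    import java.util.Map;

    // "Aa" and "BB" share the same String.hashCode() (2112),
    // so they land in the same bucket; chaining keeps both entries retrievable.
    public class CollisionDemo {
        public static void main(String[] args) {
            System.out.println("Aa".hashCode() + " == " + "BB".hashCode());

            Map<String, Integer> map = new HashMap<>();
            map.put("Aa", 1);
            map.put("BB", 2);

            System.out.println(map.get("Aa")); // 1
            System.out.println(map.get("BB")); // 2
        }
    }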

Why not use the hash value of hashCode() as the subscript of the table? * * *

The range of values produced by hashCode() is far too large; it is impossible to build an array of that size in memory, so the hash must first be reduced to a valid array index.

Can I use any class as a Map key? * * *

Yes, but pay attention to the following two points:

  • If the class overrides the equals() method, it should also override the hashCode() method.
  • It is best for the key class to be immutable, so that its hashCode() value can be cached and never changes, which gives better performance. That is why String is particularly well suited as a HashMap key.

Why are wrapper classes like String and Integer suitable as HashMap keys? * * *

  • These classes are final and immutable, which guarantees that the key cannot be modified, so put and get will not compute different hash values.
  • They already override the hashCode() and equals() methods internally.

What should you pay attention to when using a custom object as the key of a HashMap? * *

  • Override the hashCode() method, because the hash value is needed to determine the storage location.
  • Override the equals() method, because the uniqueness of the key must be guaranteed. A minimal example of such a key class is sketched below.
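
A minimal sketch of a well-behaved key class (hypothetical names): immutable fields plus consistent equals() and hashCode() overrides.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    // An immutable key class that overrides both equals() and hashCode().
    public final class UserKey {
        private final String name;
        private final int id;

        public UserKey(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof UserKey)) return false;
            UserKey other = (UserKey) o;
            return id == other.id && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, id);
        }

        public static void main(String[] args) {
            Map<UserKey, String> map = new HashMap<>();
            map.put(new UserKey("alice", 1), "admin");
            // Equal keys hash to the same bucket and compare equal, so the lookup succeeds
            System.out.println(map.get(new UserKey("alice", 1))); // admin
        }
    }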

Why can HashMap cause an infinite loop under multi-threading?

Because JDK1.7's HashMap uses head insertion when handling hash collisions, concurrent resizing can produce a circular linked list and hence an infinite loop; this particular problem no longer exists in JDK1.8. However, note that HashMap in JDK1.8 is still not thread-safe, and thread-safety problems can still occur under multi-threading (see the sketch below). In an interview, stating this much is usually enough; the detailed process is hard to explain verbally. If you are interested, see the article "An infinite loop of a HashMap".
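
A hypothetical demo of the JDK1.8 thread-safety problem (not the JDK1.7 infinite loop itself): two threads putting disjoint keys into a plain HashMap may lose entries, while the same code on a ConcurrentHashMap keeps them all.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Two threads each insert 10,000 distinct keys. With a plain HashMap the final size
    // may be less than 20,000 because concurrent puts can overwrite each other;
    // with ConcurrentHashMap the final size is always 20,000.
    public class UnsafeHashMapDemo {

        static int fill(Map<Integer, Integer> map) throws InterruptedException {
            Thread t1 = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) map.put(i, i);
            });
            Thread t2 = new Thread(() -> {
                for (int i = 10_000; i < 20_000; i++) map.put(i, i);
            });
            t1.start(); t2.start();
            t1.join(); t2.join();
            return map.size();
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println("HashMap size:           " + fill(new HashMap<>()));
            System.out.println("ConcurrentHashMap size: " + fill(new ConcurrentHashMap<>()));
        }
    }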

Do you know the underlying implementation of ConcurrentHashMap? * *

  • JDK1.7

    In JDK1.7, ConcurrentHashMap is implemented as a Segment array + HashEntry arrays. Segment extends ReentrantLock, so a Segment can act as a lock, and HashEntry is used to store key-value pairs. A ConcurrentHashMap contains one Segment array; each Segment contains a HashEntry array, and each HashEntry is a linked-list node.

  • JDK1.8

    In JDK1.8, the Segment + HashEntry structure is abandoned in favor of a Node array + linked list/red-black tree, with CAS + synchronized used to ensure thread safety. When a list grows longer than 8, it is converted to a red-black tree. synchronized in JDK1.8 only locks the head node of a list or red-black tree, which is a finer-grained lock than Segment; lock contention is lower, so it is more efficient.

To sum up:

  • The JDK1.7 implementation is ReentrantLock + Segment + HashEntry; the JDK1.8 implementation is synchronized + CAS + linked list/red-black tree.
  • JDK1.7 uses segmented locking, where one lock covers several HashEntry buckets; JDK1.8 locks individual Node head nodes, so as long as there is no hash collision there is no lock contention. JDK1.8 therefore has a finer lock granularity than JDK1.7, which reduces contention and improves ConcurrentHashMap's concurrency.

Do you know the underlying implementation of Hashtable? * *

The underlying data structure of Hashtable is array + linked list; the linked list mainly resolves hash collisions. Its methods are all modified with synchronized, so Hashtable is thread-safe, but the lock granularity is the whole table, lock contention is fierce, and efficiency is low.

The difference between HashMap, ConcurrentHashMap, and Hashtable * * *

| | HashMap (JDK1.8) | ConcurrentHashMap (JDK1.8) | Hashtable |
| --- | --- | --- | --- |
| Underlying implementation | Array + linked list/red-black tree | Array + linked list/red-black tree | Array + linked list |
| Thread safety | Not thread-safe | Thread-safe (synchronized locks the head Node) | Thread-safe (synchronized locks the whole table) |
| Efficiency | High | Relatively high | Low |
| Capacity | Initial capacity 16, doubles (2n) on each expansion | Initial capacity 16, doubles (2n) on each expansion | Initial capacity 11, expands to 2n + 1 |
| Null keys and values | One null key and multiple null values allowed | Not supported | Not supported |

Common Map methods * *

These common methods need to be memorized; even if they do not come up directly in an interview, they are frequently needed in written tests and coding questions. A short usage sketch follows the table.

| Method | Description |
| --- | --- |
| void clear() | Removes all key-value pairs from the Map |
| boolean containsKey(Object key) | Returns true if the Map contains the specified key |
| Set entrySet() | Returns a Set of the key-value pairs in the Map; each element is a Map.Entry object |
| Object get(Object key) | Returns the value mapped to the key, or null if the Map does not contain the key |
| boolean isEmpty() | Returns true if the Map contains no key-value pairs |
| Set keySet() | Returns a Set of all keys in the Map |
| Object put(Object key, Object value) | Adds a key-value pair; if the key already exists, the new value overwrites the old one and the old value is returned, otherwise null is returned |
| void putAll(Map m) | Copies all key-value pairs from the given Map into this Map |
| Object remove(Object key) | Removes the mapping for the specified key and returns the associated value, or null if the key does not exist |
| int size() | Returns the number of key-value pairs in the Map |
| Collection values() | Returns a Collection of the values in the Map |
| boolean containsValue(Object value) | Returns true if the Map contains the specified value |
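
A short usage sketch that exercises several of these methods:

    import java.util.HashMap;
    import java.util.Map;

    // Exercises the common Map methods listed above.
    public class MapMethodsDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();

            System.out.println(map.isEmpty());          // true
            map.put("a", 1);
            map.put("b", 2);
            System.out.println(map.put("a", 10));       // 1 (the old value is returned on overwrite)
            System.out.println(map.get("a"));           // 10
            System.out.println(map.containsKey("b"));   // true
            System.out.println(map.containsValue(2));   // true
            System.out.println(map.size());             // 2

            for (Map.Entry<String, Integer> e : map.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }

            System.out.println(map.remove("b"));        // 2
            map.clear();
            System.out.println(map.keySet() + " " + map.values()); // [] []
        }
    }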