I. Overview of Map

We all know that HashMaps are thread-unsafe, but hashMaps are among the most commonly used of all maps. Because it’s enough for most of our scenarios.

Map class inherits the diagramCopy the code

Map is an interface. The commonly used implementation classes are HashMap, LinkedHashMap, TreeMap, and HashTable. A HashMap holds a value based on the hashCode value of a key. Note that a HashMap does not guarantee that the order of traversal is the same as the order of insertion. A HashMap allows a record to have a null key, but does not require the value to be null. The HashTable class uses Synchronize for thread safety. There is only one lock globally. The efficiency of HashTable is relatively low when threads compete intensively. This is because when one thread accesses the synchronous method of the Hashtable, another thread attempts to access it again and enters a blocking or polling state. For example, when thread 1 uses PUT to add elements, thread 2 cannot use put to add elements and cannot use GET to get elements. So, the competition will be more and more fierce. ConcurrentHashMap, by contrast, uses segmented locking to improve concurrency. Data that are not in the same segment does not affect each other. Multiple threads operating on different segments do not affect each other. Use one lock per segment. Therefore, in thread-safe business scenarios, ConcurrentHashMap is recommended, while HashTable is not recommended in new code. If thread-safe is required, ConcurrentHashMap is used, otherwise HashMap is sufficient.

LinkedHashMap is a subclass of HashMap and differs from HashMap in that LinkedHashMap holds the order in which records are inserted. TreeMap implements the SortedMap interface. TreeMap has the ability to sort inserted records by key. The default order is ascending.

Implementation of HashMap

Java7 and Java8 implement HashMap differently. Of course, Java8 is more efficient. The main reason is that The HashMap of Java8 adds the data structure of red-black tree on the basis of Java7, which reduces the complexity of finding data in buckets from O(n) to O(logn). Of course, there are some other optimizations, such as resize. Since The HashMap of Java8 is relatively complex, this article will be based on the HashMap implementation of Java7 to explain that the main implementation part is the same. The implementation of Java8 is mainly optimized, the content is still unchanged, and it is still not safe for threads.

The implementation of a HashMap uses an array with a linked list inside each entry. Because a HashMap uses the hashCode of a key to find the storage location, different keys may have the same hashCode. This is when hash collisions occur, also called hash collisions. There are open address methods and chain address methods. In the implementation of HashMap, the chain address method is selected, that is, the entry with the same hash value is saved in the same array item. An array item can be regarded as a bucket, and the hashCode of the key of the entry in the bucket is the same.

Structural model of HashMap (Java8)Copy the code

The image above shows our description, and there is a very important data structure Node<K,V>, which is the data structure that actually holds our key-value pair. Here is the main content of this data structure:

        final int hash;    
        final K key;
        V value;
        Node<K,V> next;   
Copy the code

The hash field is used to locate the index position of the bucket, and the key and value are the contents of our data. Our key is final, that is, not allowed to change, which makes sense because a HashMap uses the key’s hashCode to find the index position of the bucket. Once the key is changed, the key’s hashCode will probably change. So arbitrarily changing the key would cause us to lose the record. The next field points to the next node in the list.

The initial number of buckets in a HashMap is 16, and the loadFact value is 0.75. When the number of buckets exceeds the threshold, the HashMap will expand and double its original size each time until it is no longer able to resize.

The following is a brief introduction to the implementation of HashMap. The specific implementation depends on the code. For the implementation of HashMap in Java8, you also need to be able to understand the red-black tree data structure.

1. Use the hashCode of the key to determine which bucket the record should be placed in. Whether it is inserted, searched or deleted, this is the first step. Since the length of a HashMap is always 2 to the NTH power, we can use the following method to do modular arithmetic:

                                    h&(length-1)
Copy the code

H is the hashCode value of the key. After the hashCode is calculated, the above method is used to mold the number of buckets and record the data into a bucket. Of course, modulating is what java7 does, and java8 has been optimized to do it more subtly, because our length is always 2 to the NTH power, so after a resize, the record of the current position either stays the same or just moves the length forward. So the resize of a HashMap in java8 does not require a recalculation of hashCode. We can abstract the algorithm by looking at the calculation methods in java7 and then optimize it, depending on the code for details.

2. Put method of HashMap

HashMap put method handling logic (Java8)Copy the code

The figure above shows the logic of the put method in java8, with more red-black tree parts than in java7, and optimizations in some details. The put logic is the same as in java7.

3. Resize mechanism

The capacity expansion mechanism of a HashMap is to reapply an array of buckets twice the size of the current one, remap the original records to the new bucket one by one, and null the original bucket one by one to invalidate the reference. As we will see later, HashMap is not thread safe because of the problem with resize.

3.1. Why are HashMap threads unsafe? (JDK7 version)

As mentioned above, HashMap does resize operations, which can cause thread insecurity. Here are two possible places where thread insecurity can occur.

1. Multi-thread data inconsistency caused by PUT. Well imagine this problem, such as A and B have two threads, first A want to insert A key – the value of the HashMap, firstly calculates the index records to fall to the barrel coordinates, and then get to the header node, the inside of the bucket chain thread A ran out of time right now, while the thread B be scheduled to perform, as well as thread A execution, Thread B success just insert records into the barrel inside, assuming that thread A insert record calculated index of barrels and thread B to insert record calculated index of barrel is the same, so when A thread B success after insertion, thread A was scheduled to run again, it is still holding the expired head but it don’t know anything about it, that it says it should do so, This overwrites the records inserted by thread B, and the records inserted by thread B disappear into thin air, causing inconsistent behavior.

Get (cpu100%); get (cpu100%); get (cpu100%);

The following code is the core of resize:

This is how JDK7 is implemented, not jdK8.

// The function of this method is to recalculate the old record in the new bucket location, and then migrate over. void transfer(Entry[] newTable, booleanrehash) {  
        int newCapacity = newTable.length;  
        for (Entry<K,V> e : table) {  
            while(null ! = e) { Entry<K,V> next = e.next;if (rehash) { e.hash = null == e.key ? Zero:hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code

Resize of a multithreaded HashMapCopy the code

Let’s assume that two threads need to resize at the same time. Our original number of buckets is 2, and the number of records is 3. We need to resize buckets to 4, and the original records are [3,A],[7,B],[5,C]. Suppose thread thread1 executes Entry NEXT = e.next of the Transfer method, and then the time slice runs out. At this point, e = [3,A], next = [7,B]. Thread2 is scheduled to execute and completes resize successfully. It should be noted that next of [7,B] is [3,A]. When thread1 is scheduled to run again, the reference held by thread1 is the result of having been resized by Thread2. Thread1 will migrate [3,A] to the new array and then process [7,B], which is linked to the next array of [3,A]. Next of [7,B] changes to [3,A], at which point, [3,A] and [7,B] form A circular linked list. During get, if the bucket index of the key of GET is the same as that of [3,A] and [7,B], then it will fall into an infinite loop.

If the list is fetched from the beginning (now from the tail), the order between the nodes is guaranteed, so there is no such problem. Taken together, the above two points suggest that HashMap is thread-unsafe.

3.2 Why is HashMap insecure? (JDK8 version)

If you read the source code of JDK1.8, you will find that there is no transfer function, because JDK1.8 directly completed the data migration in the resize function. By the way, JDK1.8 uses tail inserts for element inserts.

JDK1.8: data overwrite (); put: data overwrite ();

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null) // If nohashTAB [I] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash&& ((k = p.key) == key || (key ! = null && key.equals(k)))) e = p;else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash&& ((k = e.key) == key || (key ! = null && key.equals(k))))break; p = e; }}if(e ! = null) { // existing mappingfor key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e);return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

Copy the code

Which judge whether there is the sixth line hash collision, assuming that A and B are two threads to put in operation, and the hash function to calculate the insert subscripts are the same, when A thread A performed after the 6th line of code to be suspended due to run out of time slice, and thread B time slice after insert the elements in the subscript place, completed the normal insert, Then thread A obtains the time slice. Since the judgment of hash collision has been made before, thread A will not judge at this time, but directly insert. As A result, the data inserted by thread B will be overwritten by thread A, making thread unsafe.

Also, there is the 38th lines have A + + code size, we think so, or the thread of A and B, the two threads simultaneously put operation, assuming that the current HashMap zise size of 10, when the thread A execution to 38th lines of code, from main memory size has A value of 10 + 1 after preparation for operation, Thread B happily gets the CPU from main memory and writes size=11 back to main memory. Thread A then gets the CPU again and continues to execute (size is still 10 at this point). Thread A and thread B both performed A put operation, but the value of size increased by 1.

Conclusion: Thread insecurity of HashMap is mainly reflected in the following two aspects:

1. In JDK1.7, simultaneous expansion operations cause ring chains and data loss.

2. In JDK1.8, data overwrite occurs when concurrent PUT operations are performed.