This analysis is based on the JDK 1.7 HashMap.

Problems with HashMap in multithreaded situations

  1. After concurrent put operations, a subsequent get operation can fall into an infinite loop.
  2. Concurrent put operations can cause elements to be lost.

Reproducing the infinite loop

import java.util.HashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class HashMapTest extends Thread {

    private static HashMap<Integer, Integer> map = new HashMap<>(2);
    private static AtomicInteger at = new AtomicInteger();

    @Override
    public void run() {
        while (at.get() < 1000000) {
            map.put(at.get(), at.get());
            at.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        HashMapTest t0 = new HashMapTest();
        HashMapTest t1 = new HashMapTest();
        HashMapTest t2 = new HashMapTest();
        HashMapTest t3 = new HashMapTest();
        HashMapTest t4 = new HashMapTest();
        HashMapTest t5 = new HashMapTest();
        t0.start();
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t5.start();

        for (int i = 0; i < 1000000; i++) {
            Integer integer = map.get(i);
            System.out.println(integer);
        }
    }
}

Run it several times; when the program hangs like this, it has hit the infinite loop:

As can be seen above, Thread-7 is stuck in an infinite loop triggered by the HashMap resize.

HashMap analysis

Key resize source code: the transfer method

1    void transfer(Entry[] newTable, boolean rehash) {
2        int newCapacity = newTable.length;
3        for (Entry<K,V> e : table) {
4            while (null != e) {
5                Entry<K,V> next = e.next;
6                if (rehash) {
7                    e.hash = null == e.key ? 0 : hash(e.key);
8                }
9                int i = indexFor(e.hash, newCapacity);
10               e.next = newTable[i];
11               newTable[i] = e;
12               e = next;
13           }
14       }
15   }

Normal expansion process

Let’s look at the normal rehash process in the single-thread case:

  1. Suppose our hash algorithm is simply key mod table size (that is, the length of the array).
  2. The old hash table has size = 2, so keys 3, 5, and 7 all collide in table[1] after mod 2.
  3. Then the table is resized to capacity 4 and every entry is rehashed, as shown below:

In the single-threaded case everything looks fine and the resize completes smoothly. Now let's look at the same resize under concurrency.
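The single-threaded rehash can be reproduced with a minimal sketch. Note the names here are stand-ins: Node replaces HashMap.Entry, and key % length replaces indexFor(); this is illustrative code, not the JDK source.

```java
// Minimal sketch of JDK 7's head-insertion rehash (not the real JDK code).
// Keys 3, 7, 5 all land in bucket 1 of a 2-slot table (key % 2 == 1),
// then get redistributed into a 4-slot table.
class Node {
    int key;
    Node next;
    Node(int key, Node next) { this.key = key; this.next = next; }
}

public class RehashDemo {
    // Redistribute every node into newTable using head insertion,
    // following the order of operations in transfer() above.
    static void transfer(Node[] oldTable, Node[] newTable) {
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;              // line 5 in the source above
                int i = e.key % newTable.length; // simplified indexFor()
                e.next = newTable[i];            // head insertion...
                newTable[i] = e;                 // ...reverses bucket order
                e = next;
            }
        }
    }

    public static void main(String[] args) {
        // old table, size 2: bucket 1 holds 3 -> 7 -> 5
        Node[] oldTable = new Node[2];
        oldTable[1] = new Node(3, new Node(7, new Node(5, null)));

        Node[] newTable = new Node[4];
        transfer(oldTable, newTable);

        // 3 % 4 == 3 and 7 % 4 == 3, so bucket 3 now holds 7 -> 3 (reversed);
        // 5 % 4 == 1, so bucket 1 holds 5 alone.
        for (Node e = newTable[3]; e != null; e = e.next)
            System.out.print(e.key + " ");       // prints: 7 3
        System.out.println();
    }
}
```

The bucket order flips from 3 -> 7 -> 5 to 7 -> 3 precisely because each entry is inserted at the head of its new bucket. This reversal is harmless alone, but it is exactly what makes the concurrent case dangerous.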

Capacity expansion in concurrent cases

  1. There are two threads, highlighted in red and blue.

  2. Thread 1 is suspended by the CPU right after executing line 5, and thread 2 then runs the code above to completion. Let's look at the state at this point:

  3. The CPU then switches back to thread 1, which executes lines 4-12 (line 5 has already been executed), starting with Entry 3:

By now thread 2 has rehashed all the entries (so the next of 7 points to 3), but thread 1 still holds the local variables it captured before the suspension: e = 3 and next = 7. Thread 1 sets the next of 3 to newTable[i], which is null, places 3 at the head of its bucket, and the first iteration is complete: 3 has been placed.

  4. Here's what happens next:
    • e = next = 7
    • e != null, so the loop continues
    • next = e.next = 3 (thread 2 already reversed the list)
    • e.next = newTable[i], so the next of 7 points to 3
    • Entry 7 is placed, now as shown:

  5. After placing 7, the code runs on:
    • e = next = 3
    • e is not null, so the loop continues
    • next = e.next = null
    • e.next = newTable[i] = 7, so the next of 3 points to 7
    • Entry 3 is placed, and the state is as shown in the figure

At this point there is a cycle: 3 has been moved to the head of the bucket and its next points to Entry 7, while the next of 7 already pointed to Entry 3.

  6. The code continues with e = next = null, the loop condition fails, and the resize finishes. However, any subsequent operation that touches table[3] (a get, an iteration, or another resize) will spin forever on that bucket. Looking back at the demo at the beginning of the article, that is exactly where it hangs: in the transfer method during a resize.

The key cause of the problem: if two entries that are adjacent in a bucket before the resize still land in the same bucket after the resize, a concurrent resize can link them into a cycle and cause an infinite loop. In complex production environments this situation is uncommon, but it does occur.
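The whole interleaving above can be replayed deterministically, with no real threads, using the same simplified Node sketch (LoopReplay is a hypothetical class, not JDK code): thread 1 pauses right after line 5, thread 2 finishes its entire transfer, then thread 1 resumes with its stale e and next.

```java
class Node {
    int key;
    Node next;
    Node(int key, Node next) { this.key = key; this.next = next; }
}

public class LoopReplay {
    // Replays the interleaving and returns thread 1's final bucket 3.
    static Node replay() {
        // shared entries, old bucket order 3 -> 7 -> 5
        Node n3 = new Node(3, null), n7 = new Node(7, null), n5 = new Node(5, null);
        n3.next = n7;
        n7.next = n5;

        // thread 1 executes line 5 and is suspended: e = 3, next = 7
        Node e = n3, next = n3.next;

        // thread 2 runs its whole transfer into its own newTable;
        // afterwards 7.next == 3 (head insertion reversed the bucket)
        Node[] table2 = new Node[4];
        for (Node x = n3; x != null; ) {
            Node nx = x.next;
            int i = x.key % 4;
            x.next = table2[i];
            table2[i] = x;
            x = nx;
        }

        // thread 1 resumes with its stale e/next and its own newTable,
        // re-reading e.next from the links thread 2 just rewired
        Node[] table1 = new Node[4];
        while (e != null) {
            int i = e.key % 4;
            e.next = table1[i];
            table1[i] = e;
            e = next;
            next = (e == null) ? null : e.next;
        }
        return table1[3];
    }

    public static void main(String[] args) {
        Node head = replay();
        // walking the bucket never terminates; bound it to prove the cycle
        int steps = 0;
        for (Node x = head; x != null && steps < 10; x = x.next) steps++;
        System.out.println(steps);  // 10: the walk hit the bound, not the end
        System.out.println(head.key + " -> " + head.next.key + " -> "
                + head.next.next.key);  // 3 -> 7 -> 3: a cycle between 3 and 7
    }
}
```

Because the replay is sequential, it produces the cycle on every run, which is much easier to study than the probabilistic multi-threaded demo.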

Concurrent put operations causing element loss

Now for the lost elements. This time the entries are inserted in the order 3, 5, 7, so head insertion on put leaves the old bucket list as 7 -> 5 -> 3:

  1. Thread 1 is suspended by the CPU scheduler as soon as it executes line 5 (e = 7, next = 5):

  2. Thread 2 completes the whole resize:

  3. Thread 1 resumes and places Entry 7:

  4. Then it places Entry 5:

  5. Since the next of 5 is now null, thread 1's resize finishes, and Entry 3 is lost.
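The five steps above can also be replayed deterministically with the same simplified Node sketch (LossReplay is a hypothetical class, not JDK code):

```java
class Node {
    int key;
    Node next;
    Node(int key, Node next) { this.key = key; this.next = next; }
}

public class LossReplay {
    // Replays the interleaving and returns thread 1's final table.
    static Node[] replay() {
        // put order 3, 5, 7 with head insertion -> old bucket list 7 -> 5 -> 3
        Node n3 = new Node(3, null);
        Node n5 = new Node(5, n3);
        Node n7 = new Node(7, n5);

        // thread 1 executes line 5 and is suspended: e = 7, next = 5
        Node e = n7, next = n7.next;

        // thread 2 completes its whole transfer; afterwards 5.next == null
        Node[] table2 = new Node[4];
        for (Node x = n7; x != null; ) {
            Node nx = x.next;
            int i = x.key % 4;
            x.next = table2[i];
            table2[i] = x;
            x = nx;
        }

        // thread 1 resumes: places 7, then 5, then stops because 5.next is null
        Node[] table1 = new Node[4];
        while (e != null) {
            int i = e.key % 4;
            e.next = table1[i];
            table1[i] = e;
            e = next;
            next = (e == null) ? null : e.next;
        }
        return table1; // bucket 3 holds only 7; Entry 3 is unreachable
    }

    public static void main(String[] args) {
        Node[] t = replay();
        System.out.println(t[3].key + " " + (t[3].next == null)); // 7 true
        System.out.println(t[1].key);                             // 5
    }
}
```

Thread 1's table is the one finally assigned to the table field, so Entry 3, which only thread 2's table still reaches, is gone.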

Improvements in JDK 8

JDK 8 stores entries in array buckets holding either a linked list or a red-black tree: when a bucket's list grows past 8 entries (and the table is large enough), the list is converted to a red-black tree.

During resizing, JDK 8 uses head and tail pointers to keep each list in its original order (the JDK 7 rehash reversed the list), which eliminates the infinite loop. But HashMap is still not thread-safe: concurrent modification can still lose data. Therefore, ConcurrentHashMap is recommended for multithreaded use.
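For comparison, the workload from the opening demo runs cleanly on a ConcurrentHashMap (SafeMapDemo is an illustrative name; the thread count and key range are chosen arbitrarily):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    static Map<Integer, Integer> fill() throws InterruptedException {
        // Tiny initial capacity forces many concurrent resizes, as in the demo.
        Map<Integer, Integer> map = new ConcurrentHashMap<>(2);
        Thread[] threads = new Thread[6];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) map.put(i, i);
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        // Six threads hammer the same keys; resizes happen concurrently,
        // yet every key is present and no get ever loops.
        System.out.println(fill().size()); // 100000
    }
}
```

Unlike the HashMap demo, this program terminates on every run and never drops a key, because ConcurrentHashMap synchronizes bucket updates and resizing internally.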

Why HashMap is not thread-safe

There are two main problems with using HashMap concurrently:

  1. If multiple threads add elements with put at the same time and two of the keys happen to collide (hash to the same bucket), both threads head-insert at the same array slot. The later write overwrites the earlier one, so one thread's put is silently lost.

  2. If multiple threads simultaneously detect that the element count exceeds capacity * loadFactor, each of them resizes the Node array, recalculating element positions and copying the data into its own new array. Only one thread's array is finally assigned to the table field, so the other threads' arrays, along with the entries those threads put during the resize, are lost.
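Problem 1 can likewise be replayed without real threads: the race is simply two head-insertions that both read the old bucket head before either writes it back (PutRaceReplay is a hypothetical sketch reusing the simplified Node, not JDK code):

```java
class Node {
    int key;
    Node next;
    Node(int key, Node next) { this.key = key; this.next = next; }
}

public class PutRaceReplay {
    // Both "threads" read the bucket head before either writes it back.
    static Node race() {
        Node[] table = new Node[4];
        // keys 5 and 9 both hash to bucket 1 (key % 4 == 1)
        Node headSeenByT1 = table[1]; // thread 1 reads the head (null)
        Node headSeenByT2 = table[1]; // thread 2 reads the same head
        table[1] = new Node(5, headSeenByT1); // thread 1 head-inserts key 5
        table[1] = new Node(9, headSeenByT2); // thread 2 overwrites the slot
        return table[1]; // key 5 is unreachable: thread 1's put is lost
    }

    public static void main(String[] args) {
        Node head = race();
        System.out.println(head.key + " " + (head.next == null)); // 9 true
    }
}
```

Had thread 2 read the head after thread 1's write, key 9 would have linked to key 5 and both entries would survive; the lost update only needs the two reads to happen before the two writes.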