Hashtable is synchronized, while HashMap makes no provision for synchronization at all. A HashMap is therefore more efficient in a single-threaded program, while in a multi-threaded program Hashtable's synchronized operations ensure that the program executes correctly.

But every time Hashtable performs a synchronized operation, it locks the entire structure: each method holds the lock on the whole table for the duration of the call, no matter which bucket it actually touches.
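To see what this coarse-grained locking means in code, here is a minimal sketch in the spirit of java.util.Hashtable, which really does declare get, put, and remove as synchronized methods (the class below is illustrative; the real internals differ):

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of Hashtable-style thread safety: every public method is
// synchronized on the table object itself, so two threads can never be inside
// the map at the same time, even when they touch unrelated keys.
class CoarseLockedTable<K, V> {
    private final Map<K, V> table = new HashMap<>();

    public synchronized V get(Object key) {
        return table.get(key);        // whole table locked for the lookup
    }

    public synchronized V put(K key, V value) {
        return table.put(key, value); // whole table locked for the write
    }

    public synchronized V remove(Object key) {
        return table.remove(key);     // whole table locked for the removal
    }
}
```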

ConcurrentHashMap was created to solve this problem.

The locking in ConcurrentHashMap is more fine-grained. ConcurrentHashMap divides the hash table into 16 segments by default. Common operations such as get, put, and remove lock only the segment that is currently being used.
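Concretely, every public operation first hashes the key to pick a segment and then delegates to that segment, so only that segment's lock can ever be taken. A sketch of this dispatch, lightly abridged from the JDK 6 source (details vary between JDK versions):

```java
// Lightly abridged from the JDK 6 source: public operations delegate to one
// of the segments, chosen from the high bits of the hash.
public V get(Object key) {
    int hash = hash(key.hashCode()); // spread the bits of the user's hashCode
    return segmentFor(hash).get(key, hash);
}

final Segment<K, V> segmentFor(int hash) {
    // With the default of 16 segments, segmentShift is 28 and segmentMask is 15.
    return segments[(hash >>> segmentShift) & segmentMask];
}
```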

Imagine: where before only one writer thread could enter at a time, now up to 16 writer threads can enter simultaneously (writer threads must take a lock, while reader threads are almost never restricted, as discussed later), and the gain in concurrency is obvious.

Even more striking is the concurrency of reads in ConcurrentHashMap: most reads take no lock at all, so reads are almost fully concurrent, while write locking is very fine-grained and much faster than before (this becomes even more apparent as the number of segments grows). Only operations such as size need to lock the entire table.
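As an illustration of the size case, here is a simplified sketch of the strategy used by the pre-Java-8 implementation: it first tries to sum the per-segment counts without locking, using the modCount fields to detect interference, and only falls back to locking every segment when that keeps failing (the real source adds more bookkeeping; RETRIES_BEFORE_LOCK is a small constant, 2 in JDK 6):

```java
// Simplified sketch of the size() strategy in the segmented implementation.
public int size() {
    final Segment<K, V>[] segments = this.segments;
    for (int attempt = 0; attempt < RETRIES_BEFORE_LOCK; ++attempt) {
        int sum = 0;
        int mcsum = 0;
        int[] mc = new int[segments.length];
        for (int i = 0; i < segments.length; ++i) {
            sum += segments[i].count;
            mcsum += mc[i] = segments[i].modCount; // snapshot each modCount
        }
        // If no segment was modified while we were summing, the sum is usable.
        boolean clean = true;
        for (int i = 0; i < segments.length; ++i) {
            if (mc[i] != segments[i].modCount) {
                clean = false;
                break;
            }
        }
        if (clean)
            return sum;
    }
    // Fall back: lock all segments, sum under the locks, then unlock.
    int sum = 0;
    for (int i = 0; i < segments.length; ++i)
        segments[i].lock();
    try {
        for (int i = 0; i < segments.length; ++i)
            sum += segments[i].count;
    } finally {
        for (int i = 0; i < segments.length; ++i)
            segments[i].unlock();
    }
    return sum;
}
```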

When iterating, ConcurrentHashMap uses a different style of iterator, called a weakly consistent iterator, rather than the fail-fast iterator of the traditional collections. With this style of iteration, changing the collection after the iterator has been created no longer throws ConcurrentModificationException; instead, a change creates new data that does not affect the old data, so the iterating thread can keep using the old data while writer threads make changes concurrently. More importantly, this preserves continuity and scalability when many threads execute concurrently, which is a key to the performance improvement.
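A small runnable demonstration of the difference (standard library only; with a plain HashMap the same loop would typically throw ConcurrentModificationException):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demonstrates weakly consistent iteration: modifying the map while
// iterating it does NOT throw ConcurrentModificationException.
public class WeaklyConsistentIteration {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            // A structural modification during iteration: allowed. The
            // iterator may or may not reflect the new entry, but it never
            // throws ConcurrentModificationException.
            map.put("d", 4);
            System.out.println(key);
        }
    }
}
```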

The source of ConcurrentHashMap is analyzed below, focusing mainly on Segment, because the operations basically all happen inside segments. First look at the data defined inside Segment.
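The fields, lightly abridged from the JDK 6 source (the exact set varies between JDK versions); note that Segment extends ReentrantLock, so every segment is its own lock:

```java
// Lightly abridged from the JDK 6 ConcurrentHashMap source.
static final class Segment<K, V> extends ReentrantLock implements Serializable {
    transient volatile int count;               // number of entries in this segment
    transient int modCount;                     // counts structural modifications
    transient int threshold;                    // rehash when count exceeds this
    transient volatile HashEntry<K, V>[] table; // this segment's bucket array
    final float loadFactor;                     // load factor for table
    // ... the methods analyzed below ...
}
```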

 

As you can see from the fields above, the important one is the table variable: an array of HashEntry objects. A Segment stores its data in this array. Besides that, there are variables such as loadFactor and modCount.

Now look at Segment's get function.
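Lightly abridged from the JDK 6 source (readValueUnderLock is shown a little further down):

```java
// Abridged from the JDK 6 source. Reads proceed without locking; only when a
// non-null entry carries a null value (a possible sign of an in-flight write)
// does the read fall back to locking.
V get(Object key, int hash) {
    if (count != 0) {                 // read-volatile: pairs with the write in put
        HashEntry<K, V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck under the segment lock
            }
            e = e.next;
        }
    }
    return null;
}

// Returns the head of the chain for this hash.
HashEntry<K, V> getFirst(int hash) {
    HashEntry<K, V>[] tab = table;
    return tab[hash & (tab.length - 1)];
}
```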

For reference, here is the HashEntry code that get walks over:
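Abridged from the JDK 6 source; the class differs slightly in later JDKs:

```java
// Abridged from the JDK 6 source. Every field except value is final, so an
// entry's key, hash, and next pointer can never change after construction;
// value is volatile so that unlocked readers see writes to it.
static final class HashEntry<K, V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K, V> next;

    HashEntry(K key, int hash, HashEntry<K, V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}
```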

As you can see, HashEntry is a linked-list node: each entry holds a next reference to the following entry in its bucket's chain.

In Segment's get function, getFirst fetches the head of the bucket's chain, and next is followed to find the desired entry. If the entry is found and its value is not null, the value is returned directly. If the value is null, another thread may be modifying the node at that moment (for example, a weakly consistent iterator re-pointing a chain at new entries). The get operation so far has taken no lock, and by Bernstein's conditions a concurrent read and write of the same data can produce an inconsistent view, so in this case the entry must be locked and read again to guarantee the correct value. readValueUnderLock does this using lock():
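Abridged from the JDK 6 source:

```java
// Re-read the value while holding the segment lock, which guarantees we
// observe a fully constructed entry.
V readValueUnderLock(HashEntry<K, V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
```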

The put operation locks the entire segment right at the start, because modifications cannot be performed concurrently:
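Abridged from the JDK 6 source (rehash is a Segment helper, not shown):

```java
// The whole segment is locked for the duration of the write.
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold)             // ensure capacity
            rehash();
        HashEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K, V> first = tab[index];
        HashEntry<K, V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {                 // key already present: overwrite value
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        } else {                         // new key: link a new entry at the head
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K, V>(key, hash, first, value);
            count = c;                   // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
```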

The same is true for remove: like put, it locks the segment first.
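Abridged from the JDK 6 source; note the for loop in the middle, which is discussed next:

```java
// Entries after the removed node can be reused as-is; entries before it are
// cloned, because the final next field cannot be rewritten in place.
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K, V> first = tab[index];
        HashEntry<K, V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                ++modCount;
                // Clone the entries in front of e, in reverse, onto e.next.
                HashEntry<K, V> newFirst = e.next;
                for (HashEntry<K, V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K, V>(p.key, p.hash,
                                                   newFirst, p.value);
                tab[index] = newFirst;
                count = c;               // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
```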

But notice one difference from put: what does that for loop in the middle do? (See the abridged code above, or look up the full method in the JDK source.) From the code, after locating the node to delete, it clones every entry in front of that node and links the clones back onto the part of the chain that follows it. Is that really necessary? Does every deletion have to clone the preceding entries? Yes, and this is dictated by the immutability of HashEntry. A close look at the HashEntry definition shows that all fields except value are final, which means that once the next field is set in the constructor it can never be changed, so the nodes in front of the deleted entry cannot simply be relinked in place. As for why HashEntry is made immutable: immutable data can be read without synchronization, which saves time on every access.