An overview of the

ConcurrentHashMap is recommended for concurrent multithreading. What is ConcurrentHashMap? What is its design idea, how is the source code implemented?

What is ConcurrentHashMap

Concurrent translates to Concurrent and is literally used as a HashMap for handling Concurrent situations, a recap before introducing it. We learned from the previous two sections that a HashMap is unsafe for multithreaded concurrency (e.g., an infinite loop). More generally, a HashMap is unsafe for multithreaded concurrency (e.g., an infinite loop). Since the heap memory is shared by each thread, and the HashMap put method is not an atomic operation, let’s assume that Thread1 puts first. Then sleep for 2 seconds, during which time the value is changed by Thread2. When Thread1 “wakes up” and gets again, it finds that the value is no longer the original value, which can cause problems.

So how to avoid this multi-threaded “Audi into Otto” situation? The common idea is to lock the HashMap put method (synchronized) and ensure that only one thread can write to the HashMap at a time. However, if thread 1 takes a long time to operate and doesn’t get out of the pit for half a day, other threads that need to operate on the HashMap will have to queue up at the door for half a day, which can seriously affect the user experience (which is what HashTable does). In addition to saving money, many banks also support saving and saving valuables, which are kept in safe deposit boxes.

A thread is a person, as follows:

  • HashMap Bank: Our service is designed to eliminate queues and allow multiple people to change the contents of safes at the same time. You think you’re saving dollars? What they took out was yen, and the bankruptcy was instantaneous.

  • HashTable bank: Our service is to wait in line so that only one person at a time has a chance to change the contents of the safe, and the rest of us can only see and not make changes. What? What do you think happens if the guy falls asleep in there and doesn’t come out? Don’t worry. Here, sit down and play mah-jongg until he comes out.

Using HashMap in multithreading is too uncertain and has the risk of bankruptcy, so it cannot be selected. HashTable is not bankrupt, but the user experience is not very good, so how can multiple access without affecting others to save value, without queuing? Some people propose to establish a “bankers’ union”. It would be good to open more banks like HashTable with locks. As many people handle business, we can open as many banks, providing one-to-one services.

Here’s what happened next: So Galen and Yasso go to the bank to deposit their big sword, and the league of Bankers says to Galen, bank Number one is empty, you can deposit your big sword there, you don’t have to wait in line, and Galen goes to Bank Number one to deposit his big sword, and bank Number one takes Galen in, pulls the brakes, and runs, Then put Galen’s great sword in the x safe, row X, and wait until Galen has left before opening the lock. In the same way, the Union of Bankers said to Yasso, bank No. 2 is empty now, so you can go there to deposit your sword without waiting in line. Then Yasso went to Bank No. 2 to deposit his sword, and bank No. 2 took Yasso in and immediately pulled the gate. No matter how long Galen and Yasuo stay in their respective banks, they don’t have to worry about their swords being swapped. So that’s the idea of ConcurrentHashMap, just a graph

As you can see from the figure above, the corresponding individual banks are locked, not the entire “bankers union”. Analyze the characteristics of this design:

  • A “banker’s union” of banks

  • When someone comes to do business, the bankers’ union needs to determine which bank that person goes to

  • When this person goes to the designated bank to handle business, the bank is locked, other people can not perform modification operations at the same time, until this person leaves the unlock

These basic ideas can trigger some thinking, such as:

1. How many banks did you first know when you set up the “Bankers’ Union”? How to design reasonably?

The above chart does not give the conclusion of whether there is a need to queue, because it needs to be analyzed based on the actual situation. For example, there are 16 banks at the beginning and only two people to handle business, so there is no need to queue. If 16 banks have people doing business and the 17th person shows up, he still has to wait in line. As the “bankers’ Union” cannot know how many people will come to handle business in advance, so it needs to establish a “standard” at the time of its establishment, that is, the initial number of banks. In the case of a large number of people, the “bankers’ Union” should open more banks to avoid others queuing. Fewer people should be less open, to avoid wasting money (what, you say not poor money? That won’t do either)

2. When someone comes to do business, how does the bankers’ union determine which bank the person goes to?

Normally, if all the banks were unlocked, there would be no queue for anyone to go anywhere, and if some of the banks were already locked, the League of Bankers would not be able to direct customers to the locked banks, otherwise they would be hammered into muggles in minutes. Therefore, the “bankers union” needs to keep a clear head at all times, know their bank’s idle situation, and recommend the best choice to users every time.

3. How can the “bankers’ Union” guarantee that no two people will have access to the same bank at the same time?

This is done by locking/unlocking the specified bank.

2

Source code analysis

Java7 source code analysis

Through Java7 source code analysis code implementation, first look at some important members

This is going to be a little bit of a shock, but I’m going to cover it.

Next, we will start with the simplest cognition

The default constructor calls a constructor that takes three arguments

There are so many temporary variables defined and so few comments that at first glance you don’t know what the hell they mean, but we can plug in the data we already know, figure out the values of these variables, and then analyze it to see if there’s something wrong with them. Assume this is the first default creation:

  • Step 1 concurrencyLevel = 16, sshift = 4, ssize = 16, segmentShift = 28, segmentMask = 15;

  • Step 2 c = 16/16 = 1, cap = 2;

  • Segment array segments [0]; Segment array segments [0]; Segment array segments [0]; Now look at the initialization of ss. Ssize is 16, so the default array length is 16, so it feels exactly the same as the concurrencyLevel we passed, right? Look at the example below

ConcurrencyLevel is not necessarily the length of the final array.

Length = 2 ^ n (2 ^ n >= concurrencyLevel)

Create Segment array with length 16 and initialize position 0 of Segment array

Now let’s look at the put method

If the key/value value is null, a null pointer exception is reported. Step 2 first calculates the hash value based on the key value, and then calculates which Segment the key should be placed in with the previously calculated two variables (if you are interested in how to calculate the Segment, you can study it first, then take and). Assume that the key value pair should be placed in number 5, and step 3 determines that number 5 is empty. Take a look at the ensureSegment() method

Segment [0], which copies segments[0], is the same configuration as Segment [0]. Looking UNSAFE, the new Segment must have the same configuration as Segment [0].

Now that you have “stage”, please start your performance by looking at the Segment put method


The above put method is essentially the same as in the Java7 HashMap, except that the lock/unlock step is added to ensure that only one thread can modify at a time. Step by step to analyze the above process:

  • Step 1 Run the tryLock method to obtain the lock. If the lock is obtained, return null. If the lock is not obtained, run the scanAndLockForPut method.

  • Step 2 is the same as the set of ideas in the HashMap. If you do not understand, please refer to the introduction of the previous article (case 2 is introduced below).

  • Step 3 Use unLock to unLock the device

If Thread1 now comes in and no one has been there before, it can successfully acquire the lock and calculate that the key/value pair it wants to store should be placed at position 0 of HashEntry[], which is empty. Create a HashEntry and place it at position 0 using the setEntryAt() method. However, before Thread1 can release the lock, the system time slice is cut to Thread2

Segments [5], Thread2 tries to acquire the lock, and fails. Execute scanAndLockForPut() :

Thread2 does not have access to the lock, but it does not have access to the lock. It uses the lock time to calculate the location of the key/value pair in the array, so that when Thread2 takes the lock, it can locate it immediately. For example, Thread2 finds that the value it wants to change is in position 0, but in Thread1 it changes the value to position 1.

Suppose Thread2 put value is (” yasso “, “98”), corresponding to position 1, then in the scanAndLockForPut method corresponding to situation ①, drawing archive:

Segment[5] : select * from Segment[5] : select * from Segment[5] : select * from Segment[5] : select * from Segment[5];


As we learned from HashMap expansion in the previous chapter, the new location of the key-value pair after expansion is either the same as the original location or equal to the original location + the length of the old array, so draw a graph to understand the reason for the above code:

Premise: The current length of HashEntry[] is 8, and the threshold is 8*0.75 = 6, so the 7th key pair of PUT needs to be expanded, and the positions of Galen and Yasso remain unchanged before and after expansion, and the positions of Glamour and Carter need to be added to the original array length, so the above code flow is executed:






In the above code, nodes that need to be transferred before and after capacity expansion are identified, transferred first, and then the remaining nodes on the chain are transferred. The reason for this is to achieve reuse effect. It is also stated in the note that only 1/6 nodes need to be cloned under the default threshold. Note that so far, Java7 has adopted the header insertion mode for both expansion migration and new nodes. The flow chart is as follows:

In contrast, the get method does not lock/unlock operations, so the code is relatively simple and will not be analyzed.

A little bit about Java8

Java8 has some major differences from Java7, such as eliminating the Segments array and allowing concurrent expansion.

Let’s look at the initialization of ConcurrentHashMap

Unlike Java7, this is an empty method, so how does it initialize? Let’s look at the put method


The code is a bit long and may cause discomfort at first glance, mainly due to the introduction of red-black tree judgment and manipulation, as well as thread-safe manipulation. If key/value is null, a null pointer exception is reported. This is also a significant difference from HashMap.

Note 1.

Call initTable to initialize the array

The put method does not lock, so how does it ensure concurrency safety when creating a new table? SizeCtl defaults to 0. When one thread initializes the array, it changes sizeCtl to -1. This change is visible to other threads because it is volatile. The code above sees that the subsequent thread decides sizeCtl is less than 0 and gives up execution.

Note 2.

Java8 does away with segments and instead locks individual positions in arrays. When the node at the specified location is not null, the situation is similar to the Java8 HashMap operation in that new nodes are added tail-inserted.

Note 3.

Java8 ConcunrrentHashMap supports concurrent capacity expansion. Previously, it was always a thread transferring key-value pairs from an old array to a new array. Of course, the corresponding concurrent processing control logic is more complex, expansion transfer by transfer method to complete, Java8 in the method is very long, interested can look at the source…

Use a diagram to represent what the Java8 ConcurrentHashMap looks like

3

conclusion

Through the analysis of source, compared the difference of the HashMap and ConcurrentHashMap, and Java 7 and Java8 ConcurrentHashMap design is different, of course, a lot of pit without filling, such as the call a lot of UNSAFE methods of CAS, Can reduce the performance of consumption, usually rarely used, understand less; And the specific principle and implementation of red black tree, then slowly fill in…