preface

A few days ago, I had a chat with my friends, and we were very interested in talking about Map. Among them, ConcurrentHashMap is not very clear to myself, so I write this article to check the gaps.

Map is one of the Java classes we encounter most frequently in our daily work, and it’s no. 1. However, we all know that common Hashmaps have a lot of flaws in concurrent scenarios, such as thread safety issues and expansion dead loops. Although dead loops have been fixed in Java8, thread safety issues are still difficult to solve. You might think of thread-safe key-value collections like HashTable. But the efficiency is very low, and may not be suitable for the current environment of high-paying and talking all day. Is there really no thread-safe, efficient key-value collection? Yes, this is ConcurrentHashMap. A word of warning: there is no source code analysis in this article, because I talked to friends who wanted to learn how to interview in a short time, so this article will cover the whole ConcurrentHashMap in the form of an overview. If you want to see the source code analysis, the next one will be. In the process of telling the story will be interspersed with a variety of interview questions, I think the arrangement is good, if there are mistakes also hope for more guidance.

The body of the

The structure of ConcurrentHashMap

ConcurrentHashMap uses the same array + linked list + red-black tree as HashMap. Compared with Java7, Java8 uses synchronized and CAS to ensure that ConcurrentHashMap can be added or deleted in concurrent scenarios. Doug Lee, the ConcurrentHashMap wizard, has set the size and threshold of a HashMap to final and cannot change it.

ConcurrentHashMap has the same default size as HashMap (16), loading factor (0.75), and expanded hash table size (twice the original size). The storage structure of ConcurrentHashMap is still Node structure, which contains keys, values, Pointers to the next Node and hash values. Next addresses the use of linked lists where one element points to the next node when the element store conflicts.

Disgusting sizeCtl

SizeCtl, sizeControl, which runs through the entire ConcurrentHashMap. Why is it disgusting, because not only does it have various values, but the various values can be divided into various cases in different ways. So let’s briefly understand the meaning of each value, the specific embodiment will be brought out in the specific method.

  1. SizeCtl = 0, the default value, using DEFAULT_CAPCITY as the size when creating the table array.
  2. sizeCtl < 0
    • -1: The table is being initialized (some thread is creating the table array). The current thread needs to spin wait.
    • Other negative numbers: indicates that the table array is being expanded. 16 bits higher indicates that the table array is being expanded. 16 bits lower indicates the number of threads (1 + nThread) participating in concurrent expansion.
  3. sizeCtl > 0
    • If the table is not initialized, the initialization is small
    • If the table is already initialized, the threshold for the next capacity expansion (CapCity * 0.75)

So what? Gross you out? Now let’s go into methods and see what these values do under what circumstances.

Put add method

First, ConcurrentHashMap, like HashMap, is lazy-loaded. That is, when new comes out, no space is allocated, and when it is used, the size is allocated. So when we run into the PUT method and find that the current map is empty, we will go into the initTable method area and initialize it. If the sizeCtl value is smaller than 0 during initialization, the probability is -1. At this time, other threads may be creating tables, and the current thread enters spin waiting and temporarily releases the CPU occupied by it, allowing the current thread to re-compete for CPU resources. To make CPU resources available to other, hungrier threads. If it’s not less than 0 then I’m going to make the next judgment, and I’m going to try to set sizeCtl to -1 with CAS. If set successfully indicates the initialization performed by the current thread, start creating hash array size. If not, exit.

Initialization is done, but what if it has already been initialized? Then we must compute the hash value of the element to be inserted and the index of its position in the current hash table. If the value of the bucket bit is null, the current bucket bit has no value. Insert the bucket bit directly. Otherwise, the bucket bit has a value. Then you need to determine the type of the value. If the hash value of the bucket head node is MOVE (an identifier). In this case, the current node is FWD node (the migrated elements are marked as FWD nodes), which means that the map is being expanded, and the thread is pulled to help expand the map. Expansion migration is segmtioned, and a thread may take a segment element to migrate. (For example, if you go to your friend’s house and don’t think they are making dumplings, you have to help them make dumplings.)

Special cases such as empty buckets and initializing are said, so the next case is normal. If the current bucket bit has a value, it is a conflict. Conflict no more than two cases: one, the formation of linked list; Two forms a red-black tree. This is where the safety effects come in. ConcurrentHashMap locks the current bucket bit with sync to ensure that operations in the current bucket bit are synchronized. Start with the list: if the value in the list is found to be different from any other value, append it to the end of the list, and replace and return the old value if it keeps going. Red-black trees are next. If the current bucket header is of type treeBin, add red-black tree nodes. (I feel that the interview will not be too serious, if you really want to know about the source can see my next explanation).

So, how does concurrent map guarantee write data security?

From the above description, we can calculate the position of the element to be added. If the position has no value in the hash table, we can add it through CAS. If there is a value, the bucket head node is locked through sync to ensure that the bucket is operated synchronously.

capacity

In ConcurrentHashMap and HashMap, the real difficulty is adding elements and scaling, and there are many scenarios involved in both. Looking for get is nothing more than looking for elements and not finding them. So it’s time to expand. The difficulty is almost gone. But it’s rare that an interviewer will stick to it unless there’s an offer (which is possible). Because I think the interviewer almost thinks you’ve read it and there’s no need to go on about it. Wouldn’t it be nice to spend that time on scenarios and algorithms (manual dog heads)? But, you know, since we’re doing a feature, let’s do it, and then we’ll talk about the gross side of expansion.

The concurrent map expansion has one of the longest spin codes ever, which is pretty gross. We can break it down and analyze it step by step. If you want to know more, you can read my next source code analysis (I only slept 3 hours on the day I wrote the source code).

First of all, we need to calculate the stride length, which means that in data migration, if the migration is slow one by one, then it is faster and easier to migrate one by one. This interval is the step size. If the current thread is the one that triggered the expansion, it creates a new array twice the size of the original hash array. Assign the new array to the global variable nextTable so that subsequent helper threads know where to migrate data. And record the overall data migration progress ferIndex. (Concurrent map migrates from subscript large elements, so transferIndex = original array length). The migrated elements are marked as FWD nodes. FWD has a field that points to the new table and provides a find method to query the new bucket. If the current thread is not the thread that triggered the expansion, it is the helper thread. ConcurrentHashMap then assigns the helper thread a migration task of the aforementioned step size, knowing which block to migrate to and to which (nextTable). If you migrate to an empty node, you only need to set it to FWD via CAS. If the bucket has a value, then sync is used to lock the bucket. If the bucket has a linked list, then the node position in the new table is redefined by high and low bits, and elements are dispersed. If it is a red-black tree, the bidirectional linked list of the Treebin agent is iterated, recalculating the position of the elements in the new array. See the next source code analysis for details.

So we know how to expand capacity, how to safely expand capacity, how to expand multiple threads (one triggers capacity expansion, the other assists in migrating data).

Get query method

The query is very simple, which is to calculate the hash value of the key of the element to be queried, and then use this value to find the bucket. If the current bucket value is NULL, null is returned. If the value is not null, check whether the current bucket value is the same as the element to be searched. If yes, the value is returned. No, we want to see if the bucket hash is less than 0. < 0 or -1 indicates expansion. The element has been migrated. Call the find method of the FWD inner class. Or -2, in a red-black tree, call the red-black tree’s find method. Finally, there is the case of linked lists. The linked list is iterated, returns null if it is not found.

Tricky little question

How to handle write requests during capacity expansion?

If the bucket accessed by the write operation has not been migrated, the lock of the bucket can be obtained first, and then the bucket can be inserted normally. If the bucket head node accessed by the write operation is the FWD node, the capacity is being expanded. As a concurrent map, of course, the faster the better. The ConcurrentHashMap expansion method pulls non-expansion threads to work and assigns them tasks. Plan the current task interval according to the global variable transferIndex, such as elements between subscripts [20,25] you can carry. After the migration, the next batch of tasks are assigned according to the global transferIndex. When the current thread can no longer allocate tasks, the capacity expansion is almost complete.

How does the expansion worker maintain the lower 16 bits of sizeCtl during expansion?

Before each thread performs a capacity expansion task, it updates the lower 16 bits of the sizeCtl to make the lower 16 bits + 1 to record the remaining threads. If the thread cannot allocate the task before exiting, it will update the sizeCtl again, making it 16 bits lower by 1.

What happens when Node.hash is negative

  1. If the current node is FWD, the hash value is -1
  2. The current node is a proxy Treebin node of the red-black tree type, and the hash value is -2

What if the bucket is upgraded to a red-black tree and there is a reader thread on the tree?

The TreeBin has a state property, and each reader thread CAS the state value to +4 before reading the data, and CAS the state value to -4 after reading the data. The state field is checked before the writer thread writes data to the red-black tree structure. Let’s see if state is 0. If 0, the tree is not accessed by any reader thread, and the writer thread uses CAS to set state to 1. Other threads can’t manipulate the tree. If it is not 0, a thread is accessing the tree and the write cannot be performed. The writer Thread exposes its Thread reference to the TreeBin object and sets the second bit of state to 1. Indicates that a writer thread is in the waiting state. The writer thread then suspends itself using the locksupport.park () interface.

When do you wake up?

At the end of the read, the reader thread checks to see if state is 2. If state is 2, a writer thread is pending. If = 2, a writer thread is waiting, and the reader calls unpark() to wake up the writer thread.

What if the reader thread comes and finds state = 1?

At this point we need to know that treeBin is a red-black tree with a bidirectional list structure. Since I can’t access a red-black tree, I’ll access a linked list.

conclusion

ConcurrentHashMap overview is complete, I am tired, only 3 hours sleep, source code article 30% short, going to liver, must liver down tonight. The above is the interview to say, I feel already very detailed, really will not ask so. So if you are interested and feel that some points are not made too clear, check out my source code analysis article for answers.

Where there is life, there is more.