The evolutionary history of ConcurrentHashMap

1. Learning Objectives

1. Why are HashTable containers inefficient

How to implement thread-safe operation in JDK 1.7 ConcurrentHashMap

3. What are the similarities and differences between the ConcurrentHashMap of JDK 1.8 and that of JDK 1.7

Inefficient HashTable containers

The HashTable container uses synchronized to ensure thread-safety, but HashTable is very inefficient in the context of competitive threads.

Because HashTable(the same lock) uses a global lock, when one thread accesses the synchronization method of HashTable, other threads may enter a blocking or polling state while accessing the synchronization method of HashTable. For example, thread 1 uses PUT to add elements, thread 2 cannot use put to add elements, nor can it use GET to get elements. Therefore, the more fierce the competition, the lower the efficiency.

JDK 1.7 ConcurrentHashMap

1. Lock segmentation technique for ConcurrentHashMap

The reason a HashTable container is inefficient in a competitive concurrent environment is that all threads accessing a HashTable must compete for the same lock.

If there are multiple locks in the container, and each lock is used to lock one part of the data in the container, there will be no lock competition between threads when accessing different data segments in the container, thus effectively improving the efficiency of concurrent access. This is the lock segmentation technology used by ConcurrentHashMap.

The core of lock segmentation is to divide data into sections for storage, and then assign a lock to each section of data. When a thread uses the lock to access one section of data, the data of other sections can also be accessed by other threads.

2. ConcurrentHashMap structure

ConcurrentHashMap Important attributes:

SegmentMask: Segment mask, used in conjunction with segmentShift to locate segments
SegmentShift: Segment offset, used in conjunction with segmentMask to locate segments
Segments: segments [] an array consisting of segments (array + linked list)
RETRIES_BEFORE_LOCK: maximum 2 attempts before the locking method is adopted, used in ConcurrentHashMap#size() and ConcurrentHashMap#containsValue()

ConcurrentHashMap consists of an array of segments [], which consist of a HashEntry[] array

Segment: is a ReentrantLock that acts as a lock in a ConcurrentHashMap
HashEntry: stores key-value pair data

The structure of a Segment is similar to that of a HashMap. It is an array + linked list structure. The corresponding source is array (table), linked list (HashEntry). When modifying the data in the HashEntry[] array, you must first obtain its corresponding Segment lock.

static final class Segment<K.V> extends ReentrantLock implements Serializable {

	// An array of data
	transient volatile HashEntry<K,V>[] table;

	// The size of the table array
	transient int count;

	// Number of times a segment has been modified (e.g. Put or remove)
	transient int modCount;

	// The capacity expansion threshold
	transient int threshold;

	// Load factor
	final float loadFactor;

	Segment(float lf, int threshold, HashEntry<K,V>[] tab) {
		this.loadFactor = lf;
		this.threshold = threshold;
		this.table = tab; }}Copy the code

A HashEntry is a linked list, where value and next are volatile to ensure visibility

static final class HashEntry<K.V> {
	final int hash;
	final K key;
	volatile V value;
	volatile HashEntry<K,V> next;

	HashEntry(int hash, K key, V value, HashEntry<K,V> next) {
		this.hash = hash;
		this.key = key;
		this.value = value;
		this.next = next; }}Copy the code

3. Initialize ConcurrentHashMap

Constructor preview:

The ConcurrentHashMap initialization method uses the following three parameters by default:

InitialCapacity the initialCapacity is 16
The loadFactor is 0.75
ConcurrentLevel (concurrency level) is 16

Calculate the length of the segments array ssize:

From the source can be seen:

Segments array length ssize is calculated using concurrencyLevel. By default, the length is 16. The segments array must have a length of 2 to the power of N, so ssize must be greater than or equal to the lowest power of 2 of concurrentLevel.
By default:
- SegmentMask: segmentMask. The default value is 15
- SegmentShift: Segment offset, 28 by default
- Ssize: specifies the length of the segments array. The default value is 16

4. Put operation of ConcurrentHashMap

Let’s start with an example:

AtomicLong atomicLong = new AtomicLong(); Map<String, String> map = new ConcurrentHashMap<>(); // Create a thread pool of fixed size 5. The size of the fixed-length thread pool is best set based on system resources. Such as the Runtime. GetRuntime (). AvailableProcessors () the ExecutorService fixedThreadPool = Executors. NewFixedThreadPool (10); for (int i = 0; i < 8; i++) { fixedThreadPool.execute(new Runnable() { @Override public void run() { String s = RandomStringUtils.randomAlphanumeric(8); String v1 = map.put("k1", s); System. Out.println (" is the first "+ atomicLong. IncrementAndGet () +", "s =" + s + ", Thread = "+ Thread. The currentThread (). The getName () +", Print v1=" + v1); }}); }Copy the code

Running results:

The first time, thread 3 puts cpkSeFpP and returns null. The second time, thread 5 puts cpkSeFpP and returns the value that thread 3 put in the first time.

Now look at the source code

In step 2 above, the element’s hashCode is re-hashed once. The purpose of rehashing is to reduce hash conflicts, so that elements can be evenly distributed in different segments, so as to improve the access efficiency of containers. If the hash quality is extremely poor, then all the elements are in one Segment. Not only is accessing elements slow, but Segment locking is meaningless.

In step 4 above, call ConcurrentHashMap#ensureSegment() to locate the Segment and ensure that the Segment is initialized

// Retrieve Segment from segments array based on index, and return Segment if Segment already exists. Private Segment<K,V> ensureSegment(int K) {final Segment<K,V>[] ss = this.segments; long u = (k << SSHIFT) + SBASE; // raw offset Segment<K,V> seg; Segment if ((seg = (Segment<K,V>) unsafe. getObjectVolatile(ss, Segment<K,V> proto = ss[0]; // use segment 0 as prototype int cap = proto.table.length; float lf = proto.loadFactor; int threshold = (int)(cap * lf); HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap]; // unsafe. getObjectVolatile(ss,); // unsafe. getObjectVolatile(ss,); u)) == null) { // recheck Segment<K,V> s = new Segment<K,V>(lf, threshold, tab); // Spin insert, Successful retreat to the while ((seg = (Segment > < K, V) UNSAFE. GetObjectVolatile (ss, u)) = = null) {if (UNSAFE.com pareAndSwapObject (ss, u, null, seg = s)) break; } } } return seg; }Copy the code

Insert element is at real ConcurrentHashMap. Segment# put () method, because of the need to write operation of Shared variables, so for thread safety in the operation of the Shared variables must be locked. The source code is as follows:

In the above step 1, if two threads of A and B at the same time operating ConcurrentHashMap. Segment# put (), thread attempts to acquire A lock, acquiring A lock is successful, returns the node = null, then thread A assignment on this node for operation, thread B failed to get the lock, Then call back ConcurrentHashMap. Segment# scanAndLockForPut (), spin locks, without access to the current thread is blocked, when A thread A after operation the lock is released, Thread B retrieves the latest node data that thread A just put (combined with the above example).

ConcurrentHashMap. Segment# scanAndLockForPut (), of which the while (! TryLock ()) is similar to the function of spin lock, which is cyclical to determine whether the lock can be successfully obtained and will not exit the loop until the lock is obtained, preventing the thread that performs the PUT operation from frequently blocking. These optimizations improve the performance of the PUT operation.

Incidentally, expansion mechanism:

Need to expand: Before inserting an element, the PUT operation checks whether the HashEntry array in the Segment exceeds the threshold. If the threshold is exceeded, the array is expanded. Segment expansion judgment is more appropriate than HashMap, because HashMap determines whether elements have reached their capacity after they are inserted. If they have reached the capacity, HashMap will expand. However, it is likely that no new elements will be inserted after the expansion, and then HashMap will perform an invalid expansion.
How to expand capacity: To expand capacity, an array twice the size of the original is created, and the elements of the original array are hash and inserted into the new array. For efficient ConcurrentHashMap, only one segment is expanded, not the entire container.

5. Get operation of ConcurrentHashMap

The ConcurrentHashMap get operation is unlocked. The ConcurrentHashMap get operation is unlocked.

The reason: ConcurrentHashMap#get() specifies that the shared variables used by ConcurrentHashMap#get() are volatile. Volatile variables ensure that they remain visible between threads, can be read simultaneously by multiple threads, and do not read expired values. But it can only be written by a single thread (in one case, it can be written to a value that does not depend on the original value).
Volatile principle: According to the Happen Before principle of the Java memory model, writes to volatile fields precede reads, and even if both threads modify and acquire volatile variables at the same time, get gets the latest value. This is a classic case of using volatile to replace locks.

6. ConcurrentHashMap size operation

If we want to count the size of elements in the entire ConcurrentHashMap, we must count the size of elements in all segments and sum them.

The Segment count is volatile. In multithreaded scenarios, add count of all segments to obtain the ConcurrentHashMap size. Count () {count (); count () {count (); count ();

Therefore, the safest method is to lock all the put, remove and clean methods of all segments during size statistics, but this method is obviously very inefficient. In the process of counting, the probability of the count accumulated before changing is very small, so the ConcurrentHashMap is to try twice to count the size of each Segment by not locking the Segment. If in the process of counting, If the count of the container changes, lock the size of all segments.

So how does ConcurrentHashMap determine if the container has changed during the count? The modCount variable is incremented by one before the elements are operated on in the PUT, remove, and clean methods. Then the modCount variable is compared before and after the size count to see if the container size has changed.

7. Remove ConcurrentHashMap

Remove (); remove (); remove (); remove ();

JDK 1.8 ConcurrentHashMap

Cons of JDK 1.7 ConcurrentHashMap: Security of HashMap concurrency is addressed, but query traversal is still slow when conflicting lists are too long.

JDK 1.8 ConcurrentHashMap optimization: Using the same data structure as JDK 1.8 HashMap, array + linked list/red-black tree, abandoned the original Segment Segment locking implementation, using CAS + synchronized to ensure the security of concurrency.

The ConcurrentHashMap in JDK1.8 uses the volatile keyword for shared variables in the Node class, as in JDK1.7, to ensure the visibility of variables in multithreaded operations.

1. ConcurrentHashMap put operation

When performing the PUT operation, the process is as follows:

If the key and value are null, an exception is thrown. Therefore, neither KV of ConcurrentHashMap can be null.
Check whether the container array is empty and initialize the array if it is empty.
Determine whether the element f to be inserted is first inserted in the current array subscript. If so, insert through CAS.
Determine whether F. hash == -1 is established. If so, it indicates that f is the ForwardingNode node, indicating that other threads are expanding capacity, and expand capacity together.
In other cases, the new Node is inserted into the appropriate position in a linked list or red-black tree.
After the nodes are inserted, the length of the linked list is judged to be more than 8. If so, the linked list is transformed into a red-black tree structure.
Finally, after the insertion is completed, the expansion judgment is performed.

This is the general process of put operation, as you can see, it is more complex than JDK1.7.

2. InitTable of ConcurrentHashMap initializes the array operation

SizeCtl is an object attribute that uses the volatile keyword to ensure concurrency visibility and defaults to 0.

When the put operation is performed for the first time, the Unsafe.compareAndSwapInt() method, commonly known as CAS, is used to change the sizeCtl to -1. Only one thread can successfully change the sizeCtl. Then, the table initialization task is performed.

If another Thread finds sizeCtl<0, it means that another Thread performed the CAS operation successfully, and the current Thread gives up CPU time slices by executing thread.yield () to wait for the table initialization to complete.

3. Expand the helpTransfer help of ConcurrentHashMap

HelpTransfer () == -1 if f hash == -1, f is a ForwardingNode, which means that other threads are adding capacity.

private transient volatile int sizeCtl; Is an object attribute, which can be interpreted as an expansion indicator. The default value is 0. If sizeCtl is -1, ConcurrentHashMap is being initialized or expanded. Calculation rule: – (1 + Number of expanded threads)

In this process, the steps are as follows:

Step 1 verify data of table, Node, and nextTable of node.
Step 2, get an identifier based on the length of the array;
If nextTab is not concurrently modified and TAB is also not concurrently modified, and sizeCtl < 0, the system is still expanding.
SizeCtl + 1 will be used if none of the sizeCtl values are satisfied. Call Unsafe#compareAndSwapInt() to add a thread to help it expand.

4. ConcurrentHashMap addCount determines expansion

In this process, the steps are as follows:

Step 1, update the baseCount value with the CAS method
Step 2: Check whether expansion is required. By default, check = 1.
Step 3: If the capacity expansion conditions are met, determine whether the capacity is being expanded. If yes, expand the capacity together.
Step 4: If there is no expansion, change the sizeCtl value to negative and perform expansion.

5. Get operation of ConcurrentHashMap

JDK 1.8 ConcurrentHashMap#get() does the same thing as JDK 1.7 ConcurrentHashMap#get()

The reason: ConcurrentHashMap#get() specifies that the shared variables used by ConcurrentHashMap#get() are volatile. Volatile variables ensure that they remain visible between threads, can be read simultaneously by multiple threads, and do not read expired values. But it can only be written by a single thread (in one case, it can be written to a value that does not depend on the original value).
Volatile principle: According to the Happen Before principle of the Java memory model, writes to volatile fields precede reads, and even if both threads modify and acquire volatile variables at the same time, get gets the latest value. This is a classic case of using volatile to replace locks.

As can be seen from the source, the steps are as follows:

The first step is to determine whether the array is empty and determine whether the array subscript is empty by key.
Step 2: Check whether the first element of node is to be found.
Step 3, if it is a red-black tree structure, query from the red-black tree;
Step 4, if it is a linked list structure, loop through the judgment;

Five, the summary

1. Why are HashTable containers inefficient

Reason: Synchronized is used for thread safety, but HashTable(the same lock) uses a global lock. For example, thread 1 uses PUT to add elements, thread 2 cannot use put to add elements, nor can it use GET to get elements. Therefore, the more fierce the competition, the lower the efficiency.

2. ConcurrentHashMap in JDK 1.7

ConcurrentHashMap lock segmentation technique:

First, the data is divided into sections for storage, and then each section of data is equipped with a lock. When a thread uses the lock to access one section of data, the data of other sections can also be accessed by other threads.

ConcurrentHashMap structure:

Consists of a Segment[] array, which consists of a HashEntry[] array

Segment structure:

Consisting of a HashEntry[] array, it is a ReentrantLock that acts as a lock. The structure is similar to HashMap, which is an array + linked list structure, also known as hash table. The corresponding source is array (table), linked list (HashEntry). When modifying the data in the HashEntry[] array, you must first obtain its corresponding Segment lock.

Structure of HashEntry:

Is a linked list structure used to store key-value pair data

Initialization of ConcurrentHashMap:

The segments array must have a length of 2 to the power of N, so ssize must be greater than or equal to the lowest power of 2 of concurrentLevel.
By default:
- SegmentMask: segmentMask. The default value is 15
- SegmentShift: Segment offset, 28 by default
- Ssize: specifies the length of the segments array. The default value is 16. Calculated by concurrencyLevel (concurrencyLevel).

ConcurrentHashMap put operation:

First, nullates the value. ConcurrentHashMap does not allow a value to be nullated
Then, the hash value of the key is recalculated. The purpose of rehashing is to reduce hash conflicts, so that elements can be evenly distributed across different segments, thus improving the access efficiency of the container. If the hash quality is extremely poor, then all the elements are in one Segment. Not only is accessing elements slow, but Segment locking is meaningless.
Then, locate the Segment and ensure that the positioned Segment is initialized
Finally, call the ConcurrentHashMap. Segment# put (), because of the need to write operation of Shared variables, so for the thread safety, in the operation of the Shared variables must be locked.
- Suppose you have two threads of A and B at the same time operating ConcurrentHashMap. Segment# put (), thread attempts to acquire A lock, acquiring A lock is successful, returns the node = null, then thread A assignment on this node for operation, thread B failed to get the lock, Then call back ConcurrentHashMap. Segment# scanAndLockForPut (), spin locks, without access to the current thread is blocked, until after you exit the loop and when A thread A after operation the lock is released, Thread B retrieves the latest node data that thread A just put (combined with the above example).

Expansion mechanism of ConcurrentHashMap:

Need to expand: Before inserting an element, the PUT operation checks whether the HashEntry array in the Segment exceeds the threshold. If the threshold is exceeded, the array is expanded. Segment expansion judgment is more appropriate than HashMap, because HashMap determines whether elements have reached capacity after they are inserted. If they have reached capacity, HashMap will be expanded. However, it is very likely that no new elements will be inserted after expansion, and then HashMap will have an invalid expansion.
How to expand capacity: To expand capacity, an array twice the size of the original is created, and the elements of the original array are hash and inserted into the new array. For efficient ConcurrentHashMap, only one segment is expanded, not the entire container.

ConcurrentHashMap get operation:

The ConcurrentHashMap get operation is unlocked. The ConcurrentHashMap get operation is unlocked.

The reason: ConcurrentHashMap#get() specifies that the shared variables used by ConcurrentHashMap#get() are volatile. Volatile variables ensure that they remain visible between threads, can be read simultaneously by multiple threads, and do not read expired values. But it can only be written by a single thread (in one case, it can be written to a value that does not depend on the original value).
Volatile principle: According to the Happen Before principle of the Java memory model, writes to volatile fields precede reads, and even if both threads modify and acquire volatile variables at the same time, get gets the latest value. This is a classic case of using volatile to replace locks.

ConcurrentHashMap size operation:

If we want to count the size of elements in the entire ConcurrentHashMap, we must count the size of elements in all segments and sum them.

The Segment count is volatile. In multithreaded scenarios, add count of all segments to obtain the ConcurrentHashMap size. Count () {count (); count () {count (); count ();

JDK 1.8 ConcurrentHashMap

Disadvantages of JDK 1.7 ConcurrentHashMap:

The security of HashMap concurrency is addressed, but the query traversal is still slow when the conflicting lists are too long.

JDK 1.8 ConcurrentHashMap optimization:

Data structure optimization: Use the same data structure as JDK 1.8 HashMap, array + linked list/red black tree
Re-implementation of lock: abandoned the original Segment Segment lock implementation, CAS + synchronized to ensure the security of concurrency

The ConcurrentHashMap in JDK1.8 uses the volatile keyword for shared variables in the Node class, as in JDK1.7, to ensure the visibility of variables in multithreaded operations.

ConcurrentHashMap put operation:

First, nullates key and value. Neither KV of ConcurrentHashMap can be null
Then, the hash value of the key is recalculated. The purpose of rehashing is to reduce hash conflicts, so that elements can be evenly distributed across different segments, thus improving the access efficiency of the container. If the hash quality is extremely poor, then all the elements are in one Segment. Not only is accessing elements slow, but Segment locking is meaningless.
Then, check whether the container array is empty and initialize the array if it is empty.
Then, whether the subscript of the current array is inserted for the first time, if so, by CAS (JDK 1.7 is locked insert);
Determine whether F. hash == -1 is established. If so, it indicates that f is the ForwardingNode node, indicating that other threads are expanding capacity, and expand capacity together.
In other cases, the new Node is inserted into the appropriate position in a linked list or red-black tree.
After the nodes are inserted, the length of the linked list is judged to be more than 8. If so, the linked list is transformed into a red-black tree structure.
Finally, after the insertion is completed, the expansion judgment is performed.

ConcurrentHashMap’s initTable initializes the array:

SizeCtl is an object attribute that uses the volatile keyword to ensure concurrency visibility and defaults to 0.
When the put operation is performed for the first time, the Unsafe.compareAndSwapInt() method, commonly known as CAS, is used to change the sizeCtl to -1. Only one thread can successfully change the sizeCtl. Then, the table initialization task is performed.
If another Thread finds sizeCtl<0, it means that another Thread performed the CAS operation successfully, and the current Thread gives up CPU time slices by executing thread.yield () to wait for the table initialization to complete.

HelpTransfer help of ConcurrentHashMap expanded:

The point is: sizeCtl is an object property and can be understood as a capacity expansion indicator. The default value is 0. If sizeCtl is -1, ConcurrentHashMap is being initialized or expanded. Calculation rule: – (1 + Number of expanded threads).

In this process, the steps are as follows:

Step 1 verify data of table, Node, and nextTable of node.
Step 2, get an identifier based on the length of the array;
If nextTab is not concurrently modified and TAB is also not concurrently modified, and sizeCtl < 0, the system is still expanding.
SizeCtl + 1 will be used if none of the sizeCtl values are satisfied. Call Unsafe#compareAndSwapInt() to add a thread to help it expand.

ConcurrentHashMap addCount to determine expansion:

Step 1, update the baseCount value with the CAS method
Step 2: Check whether expansion is required. By default, check = 1.
Step 3: If the capacity expansion conditions are met, determine whether the capacity is being expanded. If yes, expand the capacity together.
Step 4: If there is no expansion, change the sizeCtl value to negative and perform expansion.

ConcurrentHashMap get operation:

JDK 1.8 ConcurrentHashMap#get() does the same thing as JDK 1.7 ConcurrentHashMap#get()

The reason: ConcurrentHashMap#get() specifies that the shared variables used by ConcurrentHashMap#get() are volatile. Volatile variables ensure that they remain visible between threads, can be read simultaneously by multiple threads, and do not read expired values. But it can only be written by a single thread (in one case, it can be written to a value that does not depend on the original value).
Volatile principle: According to the Happen Before principle of the Java memory model, writes to volatile fields precede reads, and even if both threads modify and acquire volatile variables at the same time, get gets the latest value. This is a classic case of using volatile to replace locks.

Six, reference

Mp.weixin.qq.com/s?__biz=MzI…

www.infoq.cn/article/Con…

Blog.csdn.net/zzti_erlie/…

The evolutionary history of ConcurrentHashMap

1. Learning Objectives

Inefficient HashTable containers

JDK 1.7 ConcurrentHashMap

JDK 1.8 ConcurrentHashMap

Five, the summary

Six, reference

Related Posts

What about the thread lifecycle and the transition process?

Oracle 10G is available, but arcGIS is not installed, and you want to use a spatial database solution

Are you still writing “naked” apis in 2021?