
Tung chi wai

I joined Qunar in July 2018 and am now responsible for hotel search and recall, flight-ticket search, and big search. I was honored to receive Qunar's "God of Code" title. On the technical side, I like to ask "why" and dig into the depths of technology.

In concurrent programming, locking is a common way to ensure thread safety. Two kinds of locks are commonly used in Java. One is the lock acquired with the synchronized keyword, known as the Java built-in lock or monitor lock. The other is the family of synchronizers introduced in the java.util.concurrent package in J2SE 1.5, including ReentrantLock, ReentrantReadWriteLock, Semaphore, CountDownLatch, and others. These synchronizers are all built on AbstractQueuedSynchronizer (hereinafter AQS), a simple framework whose core data structure is a variant of the Craig, Landin, and Hagersten lock (CLH lock). This article takes a closer look at the CLH lock as a data structure and at how AQS adapts and improves it.

The CLH lock is an improvement on the spin lock. Before introducing CLH locks, let me briefly introduce spin locks.

1 Spin Lock

1.1 What is a spin lock

A spin lock is one implementation of a mutual-exclusion lock, as the Java code below shows.

import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<Thread>();

    public void lock() {
        Thread currentThread = Thread.currentThread();
        // If the lock is not occupied, set the current thread as the owner of the lock
        while (!owner.compareAndSet(null, currentThread)) {
        }
    }

    public void unlock() {
        Thread currentThread = Thread.currentThread();
        // Only the owner of the lock can release the lock
        owner.compareAndSet(currentThread, null);
    }
}

As the code shows, to acquire the lock the thread repeatedly calls the compareAndSet method on an atomic variable in a loop until the call succeeds. Underneath, compareAndSet is the general compare-and-swap (CAS) operation: it compares the value in memory with an expected value and, only if they are equal, replaces the value in memory with the new value, and it does so atomically. This atomicity guarantees that the new value is computed from the latest data; if another thread has updated the variable in the meantime, the write fails. That is why this code implements mutual exclusion correctly.
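As a toy illustration of CAS semantics (this snippet is separate from the lock above and only shows how compareAndSet succeeds or fails):

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(0);
        // Succeeds: the value in memory (0) equals the expected value, so it is replaced with 1
        boolean first = value.compareAndSet(0, 1);
        // Fails: the value in memory is now 1, not the expected 0, so nothing is written
        boolean second = value.compareAndSet(0, 2);
        System.out.println(first + " " + second + " " + value.get()); // prints: true false 1
    }
}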

1.2 Disadvantages of spin locking

Spinlocks are simple to implement and avoid the overhead of operating system process scheduling and thread context switching, but they have two disadvantages:

  • The first is fairness (starvation). Under contention, one thread may keep being overtaken by other threads and never get to acquire the lock.
  • The second is performance. On real multiprocessor hardware, spin locks perform poorly when the lock is heavily contended.

The figure below, quoted from The Art of Multiprocessor Programming, shows how long n threads take to execute a critical section a fixed number of times.

TASLock and TTASLock are, like the code above, spin-lock implementations that poll an atomic state variable; the bottom curve shows how long a single thread takes when there is no interference.

Clearly, spin-lock performance is far from that ideal curve. The reason is that the spin lock's state is centralized in a single variable: under contention, every change to the lock state has to be synchronized across the caches of multiple CPUs, and this frequent cache synchronization drags down CPU efficiency.

Therefore, spin locks are suitable for scenarios where lock contention is light and the lock is held only briefly.

2 CLH Lock


2.1 What is a CLH lock

The CLH lock is an improvement on the spin lock that effectively addresses both shortcomings above. First, it organizes threads into a queue, guaranteeing that the thread that requests the lock first acquires it first, which avoids starvation. Second, the lock state is decentralized: each thread spins on a different state variable, so when a thread releases the lock it invalidates only the cache of its successor, which narrows the scope of the change and reduces CPU overhead.

The CLH lock's data structure is very simple, resembling a linked-list queue. All threads requesting the lock line up in this queue, and each spins on the state of its predecessor node; when a node releases the lock, only its successor can acquire it. The CLH lock itself keeps a queue Tail pointer, an atomic variable pointing to the last CLH node in the queue. Each CLH node has two attributes: the thread it represents and a state variable indicating whether that thread holds (or is waiting for) the lock. When a thread wants to acquire the lock, it performs a getAndSet atomic operation on Tail: the operation returns the node Tail currently points to and makes Tail point to this thread's CLH node, which becomes the new tail. Once enqueued, the thread polls the state variable of its predecessor node, and when the predecessor releases the lock, this thread acquires it.

The following diagram shows the whole process of CLH lock acquisition and release.

  1. When the CLH lock is initialized, Tail points to an empty node whose state is False, as shown in Figure 1.
  2. When Thread 1 (T1) requests the lock, Tail is swung to point to T1's node and the previous (empty) node is returned. T1 sees that its predecessor's state is False, so it successfully acquires the lock and can run its critical section, as shown in Figure 2.
  3. When Thread 2 (T2) requests the lock, Tail is swung to point to T2's node and T1's node is returned. T2 sees that its predecessor's state is True, so it cannot acquire the lock and starts polling its predecessor's state, as shown in Figure 3.
  4. When T1 releases the lock, it sets its state variable to False, as shown in Figure 4.
  5. When T2's polling observes that the predecessor's state has changed to False, T2 successfully acquires the lock, as shown in Figure 5.

2.2 Walking through a Java implementation of the CLH lock

The figure above vividly shows the CLH data structure and the whole process of initializing, acquiring, and releasing the lock, which should make the principle of the CLH lock easy to grasp. But even once you understand the principle, you may still not be able to write a correct, thread-safe CLH mutex: in concurrent programming, the adage that “the devil is in the details” applies in full. Let's walk through the Java source of a CLH lock and look at some of those details.

The code is shown below; there are three details worth focusing on:
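A minimal sketch of such a CLH lock, consistent with the description above (the class and field names here are illustrative, not necessarily the exact original code), with the three details marked in the comments:

import java.util.concurrent.atomic.AtomicReference;

public class CLHLock {
    // Queue node: locked == true means the owning thread holds, or is waiting for, the lock
    private static class Node {
        // volatile: the write in unlock() must happen-before the successor's read (detail 1)
        volatile boolean locked;
    }

    // Tail of the implicit queue; initialized to an empty node whose state is false
    private final AtomicReference<Node> tail = new AtomicReference<>(new Node());
    // The node owned by the current thread
    private final ThreadLocal<Node> node = ThreadLocal.withInitial(Node::new);

    public void lock() {
        Node cur = this.node.get();
        cur.locked = true;
        // Atomically enqueue: make tail point to our node and get back the previous tail
        Node pred = tail.getAndSet(cur);
        // A local variable is enough to remember the predecessor (detail 2)
        while (pred.locked) {
            // spin on the predecessor's state
        }
    }

    public void unlock() {
        Node cur = this.node.get();
        // Release the lock: the successor polling on this node can now proceed
        cur.locked = false;
        // Give this thread a fresh node so the released one is not reused (detail 3)
        this.node.set(new Node());
    }
}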

1. Why is the state variable on the node declared volatile? Could we do without volatile?

Declaring the state variable volatile is not really about exploiting volatile's memory visibility. The state variable is written only by the thread that owns it and read only by the thread of its successor node in the queue, which reads it in a polling loop, so visibility on its own does not affect the correctness of the lock. In the example above, T2 keeps polling T1's state variable; it does not matter whether T2 sees T1's change to False immediately, because the new value is eventually written back to memory and T2 eventually observes it.

But to implement a lock that executes correctly in a multithreaded program, we do need to deal with reordering. Java Concurrency in Practice describes the problem: in the absence of synchronization, the compiler, the processor, and the runtime may make surprising changes to the order in which operations execute, and in insufficiently synchronized multithreaded programs it is almost impossible to draw correct conclusions about the order of memory operations. For the built-in lock (the monitor lock) provided by the Java synchronized keyword, the Java Memory Model (JMM) specification has a happens-before rule: “An unlock on a monitor lock happens-before every subsequent lock on that same monitor lock,” and the JVM guarantees that this rule holds.

A custom mutex has to guarantee such a rule itself, so the code relies on volatile's happens-before guarantee to rule out harmful reordering. Among the JMM happens-before rules there is one for the volatile keyword: “A write to a volatile variable happens-before every subsequent read of that same variable.”

2. The CLH lock is a linked-list queue. Why doesn't a Node hold a pointer to its predecessor or successor?

The CLH lock is an implicit linked-list queue that maintains no explicit predecessor or successor pointers. Each thread waiting for the lock polls only the state of its predecessor node and never needs to traverse the whole queue, so a local variable holding the predecessor is enough; there is no need to maintain explicit predecessor or successor pointers.

3. Why does unlock() need the line this.node.set(new Node())?

Without this line of code, a Node could be reused, which may result in a deadlock, as shown in the following figure:

1. At first, T1 holds the lock and T2 spins waiting for it, as shown in Figure 1.

2. T1 releases the lock (sets its state to False), but T2 has not yet grabbed it, as shown in Figure 2.

3. If T1 now calls lock() again, reusing the same node, it sets the state back to True and spins waiting for T2 to release the lock, while T2 is still spinning waiting for its predecessor's state to become False. The two threads wait on each other forever, resulting in a deadlock, as shown in Figure 3.

That is why this line of code is needed: it gives the thread a brand-new Node and avoids the deadlock caused by node reuse.

2.3 Advantages and disadvantages of the CLH lock

As an improvement on the spin lock, the CLH lock has the following advantages:

  1. Good performance, with low overhead for acquiring and releasing the lock. The CLH lock's state is no longer a single atomic variable but is spread across the state of each node, which removes the frequent cache synchronization that a contended spin lock suffers from. Releasing the lock is also cheap, since it does not need a CAS instruction.
  2. Fair lock. The thread that joins the queue first gets the lock.
  3. The implementation is simple and easy to understand.
  4. Good scalability. Below we’ll see how AQS extends CLH locks to implement a rich variety of synchronizers in the J.U.C package.

Of course, it also has two disadvantages. First, because it spins, it burns considerable CPU when the lock is held for a long time. Second, the basic CLH lock is limited in functionality and cannot support more complex features without modification.

3 How AQS adapts the CLH queue lock

To address these shortcomings, AQS reworks the CLH queue lock to a certain extent. For the first shortcoming, AQS replaces the spin operation with blocking the thread. For the second, AQS modifies and extends the CLH lock into what its author Doug Lea calls a “CLH lock variant.” The low-level details of AQS and its improvements to the CLH lock are described below. AQS improves the CLH lock data structure in three main ways: it extends the state of each node, it explicitly maintains predecessor and successor nodes, and it assists GC, for example by explicitly setting fields of retired queue nodes to null. It is these improvements that enable AQS to support the rich variety of synchronizers in j.u.c.

3.1 Extending the state of each node

Each AQS node carries a status field, as shown in the source code:

volatile int waitStatus;

AQS also provides atomic read and write operations for this status field, and unlike the synchronizer state, the node states have clearly defined meanings in AQS, as shown in the following table:

State name   Description
SIGNAL       The node is waiting normally
PROPAGATE    The releaseShared operation should be propagated to other nodes
CONDITION    The node is in a condition queue and cannot be used as a synchronization-queue node
CANCELLED    The node has been cancelled due to timeout, interruption, or other reasons
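For reference, these states correspond to integer constants in the Node class inside AbstractQueuedSynchronizer; the values below follow the JDK 8 sources, and waitStatus == 0 simply means the initial state:

// Constants of AbstractQueuedSynchronizer.Node (as in the JDK 8 sources)
static final int CANCELLED =  1;  // the node has been cancelled due to timeout or interrupt
static final int SIGNAL    = -1;  // the successor's thread needs to be unparked on release
static final int CONDITION = -2;  // the node is waiting on a condition queue
static final int PROPAGATE = -3;  // the next acquireShared should propagate unconditionally
// waitStatus == 0 means the node is in its initial state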

3.2 Explicitly maintaining predecessor and successor nodes

As mentioned earlier, in the original version of the CLH lock the nodes are not even linked to one another. However, by explicitly maintaining a predecessor pointer in each node, a CLH lock can handle timeouts and various forms of cancellation: if a node's predecessor is cancelled, the node can slide back and use the status field of an earlier node. For a CLH lock that acquires the lock by spinning, explicitly maintaining the predecessor pointer is enough to support cancellation, as shown in the following figure:

The implementation in AQS is slightly different, though. Because AQS replaces spinning with blocking, a thread waiting for the lock is parked and cannot actively sense a change in its predecessor's state. This is why AQS explicitly maintains both predecessor and successor nodes: the node that releases the lock must explicitly wake up its successor, as shown in the following figure, where T1 actively wakes up T2 after releasing the lock so that T2 can detect that the lock has been released and acquire it.

One detail is worth noting. Since there is no atomic, lock-free instruction (along the lines of compareAndSet) for inserting a node into a doubly linked list, the successor pointer is not set as part of the atomic enqueue operation; it is simply assigned after the node has been inserted. So when the lock is released, if the current node's successor pointer is not usable, AQS traverses backwards from the queue's Tail pointer until it finds the node's true successor.
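This backwards scan can be seen in the unparkSuccessor method of AQS. The following is a simplified sketch based on the JDK 8 sources (not a verbatim copy); Node, tail, and waitStatus are the AQS fields described above:

private void unparkSuccessor(Node node) {
    Node s = node.next;
    // The next pointer may be null or point to a cancelled node (waitStatus > 0),
    // because it is not set as part of the atomic enqueue step
    if (s == null || s.waitStatus > 0) {
        s = null;
        // Fall back to walking from tail via the prev pointers, which are always reliable,
        // to find the closest non-cancelled successor of node
        for (Node t = tail; t != null && t != node; t = t.prev)
            if (t.waitStatus <= 0)
                s = t;
    }
    if (s != null)
        LockSupport.unpark(s.thread);  // wake up the successor's thread
}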

3.3 Assisting GC

The JVM's garbage collector frees developers from releasing objects manually. In AQS, however, references held by queue nodes that are no longer needed must be explicitly set to null at the right time, so that stale references do not keep objects alive and garbage collection can reclaim them.
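For example, when a waiting thread finally acquires the lock and its node becomes the new head of the queue, AQS clears the references the node no longer needs. A sketch of the setHead method, following the JDK 8 sources:

private void setHead(Node node) {
    head = node;
    // The thread now holds the lock, so the node no longer needs to reference it,
    // and dropping prev lets the old head node be garbage collected
    node.thread = null;
    node.prev = null;
}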

4 Summary

AQS is an important part of Java concurrency, but as an abstract synchronization framework its source code is dense and not easy to read or learn from. Starting from the spin lock, this article has introduced the CLH lock in detail and described how AQS adapts it, trying to illustrate the relevant data structures with plenty of diagrams rather than quoting hard-to-follow Java source. I hope it helps you understand the principles of AQS and use the various synchronizers under j.u.c correctly.