Preface:

In the context of concurrency, a non-blocking algorithm is one that allows threads to access shared state without blocking the other threads involved. To be more precise, an algorithm is said to be non-blocking if the suspension of one thread cannot cause the other threads involved to suspend.

To better understand the difference between a blocking algorithm and a non-blocking algorithm, I’ll talk about blocking algorithms first and then non-blocking algorithms.

1. Blocking concurrency algorithms

A blocking concurrency algorithm does one of two things:

  • Performs the action requested by the thread

  • Blocks the thread until the action can be performed safely

Many algorithms and concurrent data structures are blocking. For example, the different implementations of java.util.concurrent.BlockingQueue are blocking data structures. If a thread wants to insert an element into a blocking queue and there is no space left in the queue, the inserting thread blocks until there is room in the queue for the new element.
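For example, here is a minimal sketch of this blocking behavior using ArrayBlockingQueue, one of the BlockingQueue implementations (the queue size and element values are arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingPutExample {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue with room for just one element.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

        queue.put("first");   // succeeds immediately, the queue is now full

        // Take an element in another thread after one second, freeing up space.
        new Thread(() -> {
            try {
                Thread.sleep(1000);
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // This call blocks until the other thread has taken "first".
        queue.put("second");
        System.out.println("second element inserted");
    }
}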

The following diagram illustrates the behavior of a blocking algorithm guarding a shared data structure:

2. Non-blocking concurrency algorithms

A non-blocking concurrency algorithm does one of two things:

  • Performs the action requested by the thread

  • Notifies the requesting thread that the action could not be performed

Java also contains several non-blocking data structures. AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference are all examples of non-blocking data structures.
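To illustrate the non-blocking behavior, here is a small sketch (the values are arbitrary) showing that AtomicInteger's compareAndSet() simply reports failure instead of blocking the calling thread:

import java.util.concurrent.atomic.AtomicInteger;

public class NonBlockingExample {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(10);

        // Succeeds: the current value is 10, so it is swapped to 11.
        boolean first  = value.compareAndSet(10, 11);

        // Fails: the current value is now 11, not 10. The thread is simply
        // told that the operation could not be performed; it is never blocked.
        boolean second = value.compareAndSet(10, 12);

        System.out.println(first + " " + second + " " + value.get()); // true false 11
    }
}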

The following diagram illustrates the behavior of a non-blocking algorithm guarding a shared data structure:

3. Non-blocking algorithms vs. blocking algorithms

The main difference between blocking and non-blocking algorithms lies in the second step of their behavior described in the two sections above. In other words, the difference between them is what the blocking and non-blocking algorithms do when the requested operation cannot be performed.

A blocking algorithm blocks the thread until the requested operation can be performed. A non-blocking algorithm notifies the requesting thread that the operation cannot be performed and returns.

A thread using a blocking algorithm may block until the requested action can be performed. Usually it is the actions of other threads that make it possible for the first thread to carry out its requested action. If, for some reason, those other threads are blocked elsewhere in the program and therefore cannot perform the actions that would make the first thread's requested action possible, the first thread remains blocked, either indefinitely or until the other threads finally perform the necessary actions.

For example, if a thread inserts an element into a blocking queue that is already full, the thread blocks until another thread removes an element from the queue. If, for some reason, the thread that is supposed to take elements from the queue is blocked elsewhere in the program, the thread trying to add the new element stays blocked, either indefinitely or until the taking thread finally removes an element from the blocking queue.
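To make the contrast concrete, here is a small sketch (queue capacity chosen arbitrarily) of the blocking put() versus the non-blocking offer() on the same BlockingQueue:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PutVersusOffer {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("only element");          // fills the queue

        // Non-blocking: offer() returns false right away because the queue is full.
        boolean inserted = queue.offer("one more");
        System.out.println("inserted: " + inserted);   // inserted: false

        // Blocking: put() would suspend this thread here until space becomes available.
        // queue.put("one more");
    }
}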

4. Non-blocking concurrent data structures

In a multithreaded system, threads usually “communicate” with each other through some data structure. Such a data structure can be anything from a simple variable to more advanced structures such as queues, maps, and stacks. To ensure correctness when accessed by concurrent threads, these data structures must be guarded by some concurrent algorithm. The concurrent algorithm is what makes them concurrent data structures.

If the algorithm that guards a concurrent data structure blocks threads, it is said to be a blocking algorithm, and the data structure is known as a blocking concurrent data structure.

If the algorithm that guards a concurrent data structure does not block threads, it is said to be a non-blocking algorithm, and the data structure is referred to as a non-blocking concurrent data structure.

Each concurrent data structure is designed to support a specific method of communication. Which concurrent data structure to use depends on your communication needs. In the next section, I’ll introduce some non-blocking concurrent data structures and explain how they can be used. The explanation of the working principle of concurrent data structures should give some inspiration to the design and implementation of non-blocking data structures.

5. Volatile variables

Volatile variables in Java are variables whose values are always read directly from main memory. When a new value is assigned to a volatile variable, the value is always written immediately back to main memory. This guarantees that the latest value of a volatile variable is always visible to threads running on other CPUs. Other threads read the value of the variable from main memory every time, instead of, for example, from the cache of the CPU the thread is running on.

Volatile variables are non-blocking. Writing a value to a volatile variable is an atomic operation; it cannot be interrupted. However, a read-update-write sequence on a volatile variable is not atomic. Therefore, the following code can cause race conditions if executed by multiple threads:

volatile int myVar = 0;

...

int temp = myVar;
temp++;
myVar = temp;

First, the value of the volatile variable myVar is read from main memory and assigned to the temp variable. The temp variable is then incremented by 1. Finally, the value of temp is assigned back to the volatile variable myVar, which means it is written back to main memory.

If two threads execute this code, they may both read the same value of myVar, add one, and write the result back to main memory. In that case myVar is only incremented by 1 instead of 2.

You might think you would never write code like this, but in practice it is equivalent to code like this:

myVar++;

When the above code is executed, the value of myVar is read into a CPU register or a local CPU cache, incremented by 1, and then the value in the CPU register or cache is written back to main memory.
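Here is a small, self-contained demonstration of the lost updates described above (the thread count and iteration count are arbitrary; the final value varies from run to run but is usually less than 2,000,000):

public class VolatileRaceDemo {
    private static volatile int myVar = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable incrementer = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                myVar++;   // non-atomic read-update-write on a volatile variable
            }
        };

        Thread t1 = new Thread(incrementer);
        Thread t2 = new Thread(incrementer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Would be 2,000,000 if the increments never interleaved; usually prints less.
        System.out.println(myVar);
    }
}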

6. Scenario of a single writer thread

In some scenarios, you have only one thread writing to a shared variable, and multiple threads reading the variable. When only one thread is updating a variable, no race condition occurs, no matter how many threads are reading the variable. Therefore, whenever only one thread is writing a shared variable, you can declare it as volatile.

Race conditions occur when multiple threads perform a read-update-write sequence on a shared variable. If only one thread performs the read-update-write sequence and all other threads only perform reads, no race condition occurs.

Here is an example of a single writer thread that is not synchronized but is still concurrent.

public class SingleWriterCounter {

    private volatile long count = 0;

    /**
     * Only one thread may ever call this method,
     * or it will lead to race conditions.
     */
    public void inc() {
        this.count++;
    }

    /**
     * Many reading threads may call this method.
     * @return
     */
    public long count() {
        return this.count;
    }
}

Multiple threads can access the same SingleWriterCounter instance, as long as only one thread ever calls the inc() method. And I don't mean one thread at a time; I mean that only the same, single thread is ever allowed to call inc(). Multiple threads can call the count() method. No race conditions occur in such a scenario.

The following figure illustrates how threads access the volatile count variable:
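Here is a hypothetical usage sketch of the counter above, with one dedicated writer thread and a few reader threads (the thread counts are made up for illustration):

public class SingleWriterDemo {
    public static void main(String[] args) {
        SingleWriterCounter counter = new SingleWriterCounter();

        // The one and only writer thread.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                counter.inc();
            }
        });

        // Any number of reader threads may call count() concurrently.
        for (int r = 0; r < 3; r++) {
            new Thread(() -> {
                System.out.println("observed count: " + counter.count());
            }).start();
        }

        writer.start();
    }
}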

7. More advanced data structures based on volatile variables

It is possible to create data structures with multiple volatile variables, each of which is written by a single thread and read by multiple threads. Each volatile variable may be written by a different thread (but only one). With data structures like these, multiple threads can use these volatile variables to send messages to each other in a non-blocking manner.

Here’s a simple example:

public class DoubleWriterCounter {

    private volatile long countA = 0;
    private volatile long countB = 0;

    /**
     * Only one (and the same from thereon) thread may ever call this method,
     * or it will lead to race conditions.
     */
    public void incA() {
        this.countA++;
    }

    /**
     * Only one (and the same from thereon) thread may ever call this method,
     * or it will lead to race conditions.
     */
    public void incB() {
        this.countB++;
    }

    /**
     * Many reading threads may call this method.
     */
    public long countA() {
        return this.countA;
    }

    /**
     * Many reading threads may call this method.
     */
    public long countB() {
        return this.countB;
    }
}

As you can see, DoubleWriterCounter contains two volatile variables and two pairs of increment and read methods. At any given time, only a single thread may ever call incA(), and only a single thread may ever call incB(). Different threads can, however, call incA() and incB() at the same time. countA() and countB() can be called by multiple threads. None of this causes race conditions.

A DoubleWriterCounter can be used, for example, for inter-thread communication. countA and countB could store the number of produced and consumed tasks, respectively. The following figure shows two threads communicating via a data structure similar to the one above.

Smart readers will already have realized that the same effect can be achieved with two SingleWriterCounter instances instead of one DoubleWriterCounter. You can even use more threads and SingleWriterCounter instances if needed.
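For example, a producer/consumer pair could use two SingleWriterCounter instances instead of one DoubleWriterCounter. This is a hypothetical sketch, reusing the SingleWriterCounter class from section 6:

public class ProducerConsumerCounters {
    // Each counter has exactly one writer thread.
    private final SingleWriterCounter produced = new SingleWriterCounter();
    private final SingleWriterCounter consumed = new SingleWriterCounter();

    // Called only by the producer thread.
    public void taskProduced() {
        produced.inc();
    }

    // Called only by the consumer thread.
    public void taskConsumed() {
        consumed.inc();
    }

    // May be called by any thread.
    public long tasksPending() {
        return produced.count() - consumed.count();
    }
}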

8. Optimistic locking with compare-and-swap (CAS)

If you really need multiple threads to write to the same shared variable, a volatile variable is not sufficient. You need some kind of exclusive access (a pessimistic lock) to that variable. The following code demonstrates exclusive access using a synchronized block in Java:

public class SynchronizedCounter {

    long count = 0;

    public void inc() {
        synchronized(this) {
            count++;
        }
    }

    public long count() {
        synchronized(this) {
            return this.count;
        }
    }
}

Notice that both the inc() and count() methods contain a synchronized block. Synchronized blocks and wait()-notify() calls are exactly what we would like to avoid.

We can replace the two synchronized blocks with one of Java's atomic variables, in this case an AtomicLong. Below is an AtomicLong-based version of the SynchronizedCounter class.

import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounter {

    private AtomicLong count = new AtomicLong(0);

    public void inc() {
        boolean updated = false;
        while(!updated) {
            long prevCount = this.count.get();
            updated = this.count.compareAndSet(prevCount, prevCount + 1);
        }
    }

    public long count() {
        return this.count.get();
    }
}

This version is just as thread-safe as the previous version. What is interesting in this version is the implementation of the inc() method. It no longer contains a synchronized block. Instead it contains these lines:

boolean updated = false;
while(!updated) {
    long prevCount = this.count.get();
    updated = this.count.compareAndSet(prevCount, prevCount + 1);
}

The above code is not one atomic operation. That means two different threads could call the inc() method and both execute the long prevCount = this.count.get() statement, so that both obtain the same previous count for the counter. Yet, the code above does not contain any race conditions.

The secret lies in the second line inside the while loop. The compareAndSet() call is an atomic operation. It compares an expected value with the internal value of the AtomicLong, and if the two values are equal, it replaces the internal value with a new value. compareAndSet() is usually supported directly by a compare-and-swap instruction in the CPU. Therefore, no synchronization is necessary and no thread needs to be suspended.

Suppose the internal value of the AtomicLong is 20. Two threads both read that value, and both attempt to call compareAndSet(20, 20 + 1). Since compareAndSet() is an atomic operation, the two calls are executed sequentially (one at a time).

The first thread compares the expected value of 20 (the previously read value of the counter) with the internal value of the AtomicLong. Since the two values are equal, the AtomicLong updates its internal value to 21 (20 + 1). The updated variable becomes true, and the while loop ends.

Now the second thread calls compareAndSet(20, 20 + 1). Since the internal value of the AtomicLong is no longer 20, this call fails. The internal value is not changed to 21. The updated variable is false, and the thread goes around the while loop once more. This time it reads the value 21 and tries to update it to 22. If no other thread calls inc() in the meantime, this second iteration successfully updates the internal value of the AtomicLong to 22.
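The compareAndSet() behavior described above can be seen directly in a few lines (the values are arbitrary):

import java.util.concurrent.atomic.AtomicLong;

public class CompareAndSetDemo {
    public static void main(String[] args) {
        AtomicLong count = new AtomicLong(20);

        System.out.println(count.compareAndSet(20, 21)); // true  - value was 20, now 21
        System.out.println(count.compareAndSet(20, 21)); // false - value is 21, not 20
        System.out.println(count.compareAndSet(21, 22)); // true  - retry with the fresh value succeeds
        System.out.println(count.get());                 // 22
    }
}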

9. Why is it called optimistic locking

The code shown in the previous section is called optimistic locking. Optimistic locking is different from traditional locking, which is sometimes called pessimistic locking. Traditional locking blocks access to a critical region using a synchronized block or some other kind of lock. A synchronized block or lock can cause threads to be suspended.

Optimistic locking allows all threads to create a copy of the shared data without blocking. The threads may then modify their copy and attempt to write the modified version back into shared memory. A CAS operation allows a thread to write its changes back only if no other thread has modified the shared memory in the meantime. If another thread has modified the shared memory, the thread has to obtain a fresh copy, apply its changes to that copy, and try to write it back again.

The reason this is called optimistic locking is that a thread obtains a copy of the data it wants to modify and applies its changes under the optimistic assumption that no other thread modifies the shared memory in the meantime. When this optimistic assumption holds, the thread gets to update the shared memory without any locking. When the assumption does not hold, the work done by the thread is discarded, but still no lock is used.

Optimistic locking works best when contention for the shared memory is low to moderate. If contention is high, a lot of CPU cycles are wasted on copying and modifying only to have the update fail. But if contention on the shared memory is that high, you should probably redesign your code to lower the contention anyway.
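As a sketch of the general copy-modify-swap pattern (the State class and its field are hypothetical placeholders for your own shared state):

import java.util.concurrent.atomic.AtomicReference;

public class OptimisticUpdate {
    // Hypothetical immutable state object; real state would hold your own fields.
    static final class State {
        final long value;
        State(long value) { this.value = value; }
    }

    private final AtomicReference<State> state = new AtomicReference<>(new State(0));

    public void increment() {
        boolean updated = false;
        while (!updated) {
            State current  = state.get();                       // take a snapshot of the shared state
            State modified = new State(current.value + 1);      // modify a new copy, never the shared one
            updated = state.compareAndSet(current, modified);   // succeeds only if nothing changed meanwhile
        }
    }

    public long value() {
        return state.get().value;
    }
}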

10. Optimistic locking is non-blocking

The optimistic locking mechanism described here is non-blocking. If a thread obtains a copy of the shared memory and then gets blocked (for whatever reason) while attempting to modify it, no other thread is blocked from accessing the shared memory.

In a traditional lock/unlock mode, when one thread holds a lock, all other threads block until the thread that holds the lock releases the lock again. If the thread holding the lock is blocked somewhere, the lock can’t be released for a long time, or even forever.

11. Non-swappable data structures

Simple CAS-based optimistic locking works for shared data structures where the entire data structure can be swapped out for a new version with a single CAS operation. Swapping out the whole data structure for a modified copy is not always feasible, however.

Suppose the shared data structure is a queue. Every time a thread wants to insert or take an element, it would have to copy the whole queue and make the desired change on the copy. This could be done with an AtomicReference: copy the reference, copy and modify the queue, and then try to swap the reference stored in the AtomicReference so that it points to the newly created queue.
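A hedged sketch of this copy-and-swap approach, using a plain java.util.LinkedList as the queue being replaced wholesale (the element type and method names are made up for illustration):

import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicReference;

public class CopyAndSwapQueue {
    private final AtomicReference<LinkedList<String>> queueRef =
            new AtomicReference<>(new LinkedList<>());

    public void insert(String element) {
        boolean swapped = false;
        while (!swapped) {
            LinkedList<String> current = queueRef.get();
            LinkedList<String> copy    = new LinkedList<>(current); // copy the whole queue
            copy.add(element);                                      // modify the copy
            swapped = queueRef.compareAndSet(current, copy);        // swap in the new queue
        }
    }

    public String peek() {
        return queueRef.get().peek();
    }
}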

However, a large data structure may require a lot of memory and CPU cycles to copy. This makes your application use much more memory and waste a lot of time on copying. It also hurts performance, especially if contention for the data structure is high. Furthermore, the longer it takes a thread to copy and modify the data structure, the more likely it is that some other thread modifies it in the meantime. And, as you know, if another thread modifies the data structure after it has been copied, all other threads have to redo their copy-modify operations. This increases the performance and memory overhead even more.

The next section will show you a way to implement non-blocking data structures that can be modified concurrently, rather than just copied and modified.

12. Sharing intended modifications

Instead of copying and modifying the entire data structure, a thread can share its intended modification to the shared data structure. The process a thread goes through to modify the data structure becomes this:

  • Check whether another thread has already submitted an intended modification to the data structure

  • If no other intended modification has been submitted, create an intended modification and submit it to the data structure

  • Perform the modification of the shared data structure

  • Remove the reference to the intended modification to signal to other threads that the intended modification has been carried out

As you can see, the second step can block other threads from submitting an intended modification. Thus, the second step effectively works as a lock on the data structure. If one thread successfully submits an intended modification, no other thread can submit one until the first intended modification has been carried out.

If a thread submits an intended modification and then gets blocked doing some other work, the shared data structure is effectively locked. Other threads can detect that they cannot submit an intended modification and go back to doing something else, but obviously we need to fix this.

13. Completable intended modifications

To avoid a submitted intended modification locking the shared data structure, the submitted intended modification must contain enough information for another thread to complete the modification. Then, if the thread that submitted the intended modification never completes it, another thread can complete the modification on its behalf, keeping the shared data structure available to other threads.

The diagram below illustrates the blueprint for the non-blocking algorithm described above:


The modification must be carried out as one or more CAS operations. Thus, if two threads try to complete the same intended modification, only one thread gets to carry out each CAS operation. Once a CAS operation has been completed, any further attempt to carry out that same CAS operation fails.

14. The A-B-A problem

The algorithm demonstrated above can suffer from the A-B-A problem. The A-B-A problem refers to the situation where a variable is changed from A to B and then back to A, so that other threads cannot tell that anything changed at all.

If thread A checks for ongoing updates, copies the data, and is then suspended by the thread scheduler, a thread B may access the shared data structure in the meantime. If thread B performs a complete update of the data structure and removes its intended modification, it will look to thread A as if no modification has taken place since it copied the data structure. However, a modification did take place. When thread A goes on to perform its update based on its now out-of-date copy of the data structure, thread B's modification will be undone.

The following figure illustrates the A-B-A problem mentioned above:

15. Solutions to the A-B-A problem

The usual solution to the A-B-A problem is to swap not just the pointer to the object to be modified, but to combine the pointer with a counter and swap the pointer + counter with a single CAS operation. This is possible in languages with direct pointer support, such as C and C++. So even if the current-modification pointer is set back to “no ongoing modification”, the counter part of the pointer + counter will have been incremented, making the change visible to other threads.

In Java, you cannot merge a reference and a counter into a single variable. Java does, however, provide the AtomicStampedReference class, which can atomically swap a reference plus a stamp using a CAS operation.
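A small sketch of how the stamp makes the "back to A" case detectable (the values and the stamp increments are illustrative):

import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedReferenceDemo {
    public static void main(String[] args) {
        String a = "A";
        String b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);

        int stampSeenByThreadA = ref.getStamp();  // thread A remembers reference "A" and stamp 0

        // Meanwhile, another thread changes A -> B -> A, bumping the stamp each time.
        ref.compareAndSet(a, b, 0, 1);
        ref.compareAndSet(b, a, 1, 2);

        // Thread A's CAS fails even though the reference is "A" again, because the stamp changed.
        boolean success = ref.compareAndSet(a, "C", stampSeenByThreadA, stampSeenByThreadA + 1);
        System.out.println(success);  // false
    }
}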

16. A non-blocking algorithm template

The following code is intended to give some insight into how to implement non-blocking algorithms. This template is based on what is covered in this tutorial.

Note: I am not an expert on non-blocking algorithms, so the template below may well contain errors. Do not base your own non-blocking algorithm implementation on this template. It is only intended to give you an idea of what a non-blocking algorithm might look like. If you want to implement your own non-blocking algorithm, first study some real, production-quality non-blocking algorithm implementations to learn more about how non-blocking algorithms are implemented in practice.

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicStampedReference;
public class NonblockingTemplate{
    public static class IntendedModification{
        public AtomicBoolean completed = new AtomicBoolean(false);
    }
    private AtomicStampedReference<IntendedModification> ongoingMod = new AtomicStampedReference<IntendedModification>(null, 0);
    //declare the state of the data structure here.
    public void modify(){
        while(!attemptModifyASR());
    }
    public boolean attemptModifyASR(){
        boolean modified = false;
        IntendedModification currentlyOngoingMod = ongoingMod.getReference();
        int stamp = ongoingMod.getStamp();
        if(currentlyOngoingMod == null){
            //copy data structure - for use
            //in intended modification
            //prepare intended modification
            IntendedModification newMod = new IntendedModification();
            boolean modSubmitted = ongoingMod.compareAndSet(null, newMod, stamp, stamp + 1);
            if(modSubmitted){
                 //complete modification via a series of compare-and-swap operations.
                //note: other threads may assist in completing the compare-and-swap
                // operations, so some CAS may fail
                modified = true;
            }
        }else{
            //attempt to complete the ongoing modification, so the data structure is freed up
            //to allow access from this thread.
            modified = false;
        }
        return modified;
    }
}

17. Non-blocking algorithms are not easy to implement

Correctly designing and implementing non-blocking algorithms is not easy. Before trying to design your own, check whether someone has already designed a non-blocking algorithm that meets your needs.

Java already provides some non-blocking implementations (such as ConcurrentLinkedQueue), and future versions of Java are expected to bring more implementations of non-blocking algorithms.
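For example, ConcurrentLinkedQueue can be used directly without any locking (a minimal usage sketch):

import java.util.concurrent.ConcurrentLinkedQueue;

public class ConcurrentLinkedQueueExample {
    public static void main(String[] args) {
        // An unbounded, non-blocking, thread-safe queue based on CAS operations.
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

        queue.offer("task-1");        // never blocks, the queue is unbounded
        queue.offer("task-2");

        String head = queue.poll();   // returns null instead of blocking if the queue is empty
        System.out.println(head);     // task-1
    }
}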

In addition to Java's built-in non-blocking data structures, there are many open source non-blocking data structures available, for example the LMAX Disruptor and Cliff Click's non-blocking HashMap. Check out my Java Concurrency references page for more resources.

18. Benefits of using non-blocking algorithms

Non-blocking algorithms have several advantages over blocking algorithms. Let’s take a look at each:

Choice

The first benefit of a non-blocking algorithm is that threads are given a choice about what to do when the action they request cannot be performed. Instead of just being blocked, the requesting thread can choose what to do. Sometimes there is nothing it can do. In that case it can choose to block or wait itself, thus freeing up the CPU for other tasks. But at least the requesting thread is given a choice.

On a single CPU system, it may make sense to suspend a thread that cannot perform its requested action, so that other threads can get CPU time. But even on a single CPU system, blocking can lead to concurrency problems such as deadlock and thread starvation.

No deadlocks

The second benefit of non-blocking algorithms is that the suspension of one thread cannot lead to the suspension of other threads. That also means deadlocks cannot occur: two threads can never end up waiting for each other to release a lock the other holds. Because threads are never blocked when they cannot perform their requested action, they cannot block each other while waiting. Non-blocking algorithms may still produce livelock, where two threads keep attempting some action but are repeatedly told that it cannot be performed (because of the other thread's actions).

No thread suspension

Suspending and reactivating a thread is costly. Yes, operating systems and thread libraries have become more efficient over time, so the cost of suspension and reactivation has come down. However, it is still high.

Whenever a thread is blocked, it is suspended, incurring the overhead of suspension and reactivation. Since threads are not suspended by non-blocking algorithms, this overhead does not occur. That means the CPU can potentially spend more time executing actual business logic instead of context switching.

On a system with multiple CPUs, blocking can have an even bigger impact on overall performance. A thread running on CPU A may block waiting for a thread running on CPU B. This reduces the level of parallelism the application is capable of. Of course, CPU A can schedule some other thread to run, but suspending and activating threads (context switching) is expensive. The fewer threads need to be suspended, the better.

Reduce thread latency

The latency we refer to here is the time between a request being made and the thread actually carrying it out. Since threads are not suspended in non-blocking algorithms, they do not have to pay the expensive, slow cost of being reactivated. That means requests are carried out faster, reducing response latency.

Non-blocking algorithms often busy-wait until the requested action can be performed, which keeps latency low. Of course, in a system with high thread contention for the non-blocking data structure, the CPU may end up burning a lot of cycles in these busy-wait loops. That is worth keeping in mind: non-blocking algorithms may not be the best choice if your data structure suffers from high thread contention. However, there is often a way to refactor your program so that contention is lower.
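A hypothetical sketch of such a busy-wait loop; Thread.onSpinWait() (available since Java 9) merely hints to the CPU that the thread is spinning:

import java.util.concurrent.atomic.AtomicLong;

public class BusyWaitIncrement {
    private final AtomicLong counter = new AtomicLong(0);

    public void inc() {
        long prev = counter.get();
        // Busy-wait: keep retrying the CAS instead of suspending the thread.
        while (!counter.compareAndSet(prev, prev + 1)) {
            Thread.onSpinWait();     // hint that we are in a spin loop (Java 9+)
            prev = counter.get();    // re-read the latest value before retrying
        }
    }
}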

Stay tuned for more in the next update!