This is the seventh and final chapter of the JVM column on thread safety and lock optimization. The last article was about memory models and threads. If you haven’t seen it yet, you can make up for it.

Click the jump

Thread safety

BrianGoetz, author of Java concurrent programming in action, offers a more appropriate definition of “thread safety” : “When multiple threads access to an object at the same time, if don’t have to consider these threads in the runtime environment of scheduling and execution alternately, also do not need to undertake additional synchronization, or any other coordinated operation in the caller, call the object’s behavior can get the right results, it is said that object is thread safe.”

Thread safety in Java

The data shared by various Operations in Java falls into the following five categories

  • immutable

    Immutable objects must be thread-safe and do not require any thread-safe safeguards. Immutable security is the most immediate and pure. There are many ways to ensure that the behavior of an object does not affect its state. The simplest way is to declare all variables with state in the object as final, so that it is immutable after the constructor ends

  • Absolute thread safety

    Absolute thread-safety satisfies the definition of thread-safety

  • Relative thread safety

    Relative thread safety is what we generally speaking thread-safe it needs guarantee for this object to a single operation is thread-safe, we do not need additional when calling to safeguard measures, but for some particular sequence of consecutive calls, may need to end the call to use additional synchronization method to ensure the correctness of the call.

    Most of the classes that claim to be thread-safe in Java are of this type, such as vectors, HashTable, Collections wrapped by the synchronizedCollection() method of Collections, and so on.

  • The thread is compatible with

    Thread-compatible means that the object itself is not thread-safe, but can be safely used in a concurrent environment by properly using synchronization on the calling side. When we say that a class is not thread-safe, this is usually the case. Most classes in the Java class library API are thread-compatible, such as the collection classes ArrayList and HashMap, which correspond to Vector and HashTable.

  • Thread opposite

    Thread antagonism is the inability to use code concurrently in a multithreaded environment, regardless of whether or not the caller has taken synchronization measures. Because Java inherently supports multithreading, thread-opposition code that excludes multithreading is rare, often harmful, and should be avoided. An example of thread-antagonism is the suspend() and resume() methods of the Thread class. If you have two threads simultaneously holding a thread object, one trying to interrupt and the other trying to resume, the target thread is at risk of deadlock regardless of whether the call is synchronized or not — if suspend() is interrupted by the same thread that is about to resume(), Deadlocks are bound to occur. It is for this reason that both the suspend() and resume() methods have been declared obsolete. Common thread opposing and operating System. SetIn (), the Sytem. SetOut () and System. RunFinalizersOnExit (), etc.

Thread-safe implementation

The mutex synchronization

Mutually exclusive synchronization is one of the most common and important concurrency correctness guarantees.

1. Synchronization refers to ensuring that shared data is used by only one thread at a time when multiple threads concurrently access the shared data.

2. Mutex is a means to achieve synchronization. Critical sections, mutex and semaphore are common ways to achieve mutex. So in the word mutex and synchronization, mutual exclusion is the cause and synchronization is the effect. Mutual exclusion is the method, synchronization is the destination.

3. In Java, the most basic means of mutually exclusive synchronization is the synchronized keyword, which is a block-structured synchronization syntax.

Principle of synchronized:

The synchronized keyword is compiled by Javac and forms two bytecode instructions, Monitorenter and Monitorexit, respectively, before and after the synchronized block. Both bytecode instructions require a reference parameter to specify which object to lock and unlock. If synchronized explicitly specifies an object parameter in the Java source code, a reference to that object is used as reference. If not explicitly specified, the lock to be held by the thread is determined by the type of synchronized modified method that takes the object instance of the code or the corresponding Class object of the type.

When monitorenter executes, it first attempts to acquire the lock on the object. If the object is not locked or the current thread already holds the lock on that object, it increments the lock counter by one and decrement the lock counter by one when monitorexit is executed. Once the counter reaches zero, the lock is released. If the object lock fails to be acquired, the current thread should block and wait until the object requesting the lock is released by the thread holding it.

Based on the above principles, we can draw two inferences about synchronized:

  • A synchronized block is reentrant to the same thread. This means that the same thread can enter the synchronized block repeatedly without locking itself.
  • A synchronized block unconditionally blocks subsequent threads until the thread that holds the lock completes execution and releases the lock. This means that you cannot force a thread that has acquired a lock to release it, as you can with locks in some databases, nor can you force a thread that is waiting for a lock to interrupt the wait or timeout out

From the perspective of the execution cost of holding a lock is a heavyweight operation Java thread is mapped to the native operating system kernel thread, if you want to block or resume a thread, you will need to done with the help of the operating system, this inevitably in user mode to kernel mode, the state transition requires a lot of CPU time. In particular, synchronous block state transitions with very simple code can take longer than the user code itself. So synchronized is a heavyweight operation.

Since JDK5, Java provides Java. Util. Concurrent packages, including Java. Util. Concurrent. The locks. The Lock interface becomes Java means another new mutex synchronization. Lock-based interface can realize mutually exclusive synchronization by non-block structure, so as to get rid of the constraints of language features, and to achieve synchronization at the class library level.

ReentrantLock is the most common implementation of the Lock interface and, as the name implies, is reentrant like synchronized. In basic usage, ReentrantLock is similar to synchronized, but the code is written differently. However, ReentrantLock adds some advanced features over synchronized, including the following three

1, waiting can be interrupted

When the thread holding the lock does not release the lock for a long time, the waiting thread can choose to abandon the wait and process other things instead. The interruptible feature is helpful for handling synchronized blocks that take very long to execute.

2, can achieve fair lock

When multiple threads are waiting for the same lock, they must obtain the lock in sequence according to the time sequence of lock application. Non-fair locks do not guarantee this, and any thread waiting for the lock has the opportunity to acquire it when the lock is released. A lock in synchronized is unfair, and a ReentrantLock is unfair by default, but a fair lock can be demanded through a constructor with a Boolean value. However, when fair locking is used, ReentrantLock performance degrades dramatically and throughput is significantly affected.

3. Locks can bind multiple conditions

Condition: means that a ReentrantLock object can bind multiple Condition objects at the same time. In synchronized, the wait() of a lock object is combined with its notify() or notifyAll() methods to implement an implicit condition. If more than one condition is associated, an additional lock must be added. ReentrantLock doesn’t have to do this, just call the newCondition() method multiple times.

ReentrantLock is a good option if you need this functionality.

Non-mutex synchronization

The main problem with mutex synchronization is the performance cost of thread blocking and waking up, so it is also known as blocking synchronization. Mutex synchronization is a type of pessimistic lock that assumes that there must be a thread unsafe situation, so lock it no matter what. This leads to the overhead of user-to-core mindset transitions, maintaining lock counters, and checking if there are blocked threads that need to be woken up. So we still have another choice, optimistic concurrency strategy based on collision detection, popular culture is regardless of the risk, first, if there is no other threads are used together to share data, then the operation is successful, if there is a conflict, then give up the last operation, constantly try again, until there is no other threads to contention data, this is the optimistic locking, Also called non-blocking synchronization.

Using an optimistic concurrency strategy requires “hardware instruction set evolution” because we must require atomicity in the two steps of operation and collision detection. The hardware that does this ensures that semantically multiple operations can be performed with a single processor instruction. Common examples of this are:

  • Test and Set (test-and-set)
  • Fetch and Increment
  • Exchange (Swap)
  • Compare and Swap (also called CAS)
  • Load-linked/store-conditional (also called LL/SC)

CAS

Focus on common CAS. The CAS instruction requires three operands, which are the memory location (the memory address of the variable, represented by V), the old expected value (represented by A), and the new value to be set (represented by B). When the CAS instruction executes, the processor updates the value of V with B if and only if V conforms to A, otherwise it does not perform the update. However, the old value of V is returned regardless of whether the value of V is updated, and the above processing is an atomic operation that is not interrupted by other threads during execution. CAS operations were only used in Java libraries after JDK5, with several methods wrapped in the Sun.misc. Unsafe class, such as compareAndSwapInt() and compareAndSwapLong(). The HotSpot virtual machine does special processing for these methods internally, and the result of instant compilation is a platform-specific processor CAS instruction with no method calls, or can be thought of as unconditionally inlined.

ABA problem

Although CAS looks very beautiful, simple and efficient, but obviously this operation can’t cover A mutex synchronization of all usage scenarios, and CAS semantically is not really perfect, it is A logical flaw: if A variable is A value V first read, and are checked ready to assign A value to it still is A value, Does that mean its value hasn’t been changed by any other thread? This is not possible, because if its value has been changed to B and later changed back to A during this period, the CAS operation will assume that it has never been changed. This vulnerability is called the “ABA problem” of CAS operations. The J.U.C package addresses this problem by providing a tagged atom reference class AtomicStampedReference that guarantees the correctness of CAS by controlling the version of the variable value. For now, however, this class is in a rather weak position. ABA problems do not affect concurrency in most cases, and if ABA problems need to be solved, switching to traditional mutex synchronization may be more efficient than atomic classes.

Asynchronous scheme

If you can make a method that doesn’t involve sharing data in the first place, then it doesn’t need any synchronization to be correct, so some code is inherently thread-safe. The following two categories are introduced

1. Reentrant code

Also known as pure code. Code execution can be interrupted at any point in time, and another piece of code can be executed, without any error or effect on the result after control is returned. Reentrant code has some common characteristics, such as no dependence on global variables, data stored on the heap, and common system resources, uses state quantities passed in from parameters, and does not call non-reentrant methods. Reentrant code can be judged by a simple rule: if a method returns predictable results that return the same results as long as it inputs the same data, it meets the reentrant requirement, which is, of course, thread-safe.

2. Thread local storage

If the data needed in one piece of code must be shared with other code, see if the code that shares the data can be guaranteed to execute in the same thread. If we can, we can limit the visibility of shared data to the same thread, so that synchronization is not required to ensure that there is no data contention between threads.

Lock the optimization

Efficient concurrency is one of the major improvements from JDK5 to JDK6, where the HotSpot VIRTUAL machine development team has spent a lot of resources implementing various lock optimization techniques. Adaptive Spinning, Lock Elimination, Lock inflation, Lightweight Locking, Biased Locking, and so on. These techniques are designed to share data and resolve contention issues more efficiently between threads, thus improving program execution efficiency.

Spin-locking and adaptive spin

The biggest impact of mutex synchronization on performance is the implementation of blocking. The operations of suspending and resuming threads need to be carried out in the kernel mode. These operations put great pressure on the concurrent performance of The Java VIRTUAL machine. The locked state of shared data only lasts for a short period of time, which is not worth suspending and resuming threads. If the physical machine has more than one processor or processor core that allows two or more threads to execute in parallel at the same time, we can tell the next thread that requests the lock to “wait a minute” without giving up the processor’s execution time to see if the thread that holds the lock will release the lock soon. To make the thread wait, we simply have the thread execute a busy loop (spin), which is called spin.

Lock elimination

Lock elimination refers to the fact that the virtual machine just-in-time compiler, while running, requires some code to be synchronized, but it detects that there is no possibility of a shared data contention lock elimination. The main criterion of lock elimination is supported by the data of escape analysis. If it is determined that in a piece of code none of the data on the heap will escape and be accessed by another thread, it can be treated as if it were on the stack and is thread private, and synchronization locking is no longer necessary.

Lock coarsening

If a series of consecutive operations repeatedly lock and unlock the same object, even if the locking operation occurs in the body of the loop, frequent mutex synchronization can lead to unnecessary performance losses, even if there is no thread contention. If the virtual machine detects a string of fragmented operations that lock the same object, the lock synchronization will be extended (coarsed) out of the entire operation sequence so that only one lock is required.

Lightweight lock

Lightweight locking is a new locking mechanism introduced in JDK6. The word “lightweight” in its name refers to the traditional locking mechanism implemented using operating system mutex, so it is called “heavyweight” locking. However, it is important to note that lightweight locks are not intended to replace heavy locks, but are designed to reduce the performance cost of traditional heavy locks using OS mutex without multi-threaded competition.

Understanding lightweight locking requires some knowledge of the memory layout of HotSpot Virtual machine objects, especially the object header. The HotSpot VIRTUAL machine object header is divided into two parts, the first part is used to store the object’s own runtime data, such as hash code, GC generation age, and so on. The length of this data can take up 32 or 64 bits in 32-bit and 64-bit Java virtual machines respectively, and is officially called “MarkWord”. This part is the key to implement lightweight locking and biased locking. Another section is used to store Pointers to the object type data in the method area, and an additional section is used to store the array length if it is an array object. Because object header information is an additional storage cost unrelated to the data defined by the object itself, MarkWord is designed as a non-fixed dynamic data structure to store as much information as possible in a very small amount of space, considering the space efficiency of the Java virtual machine. It reuses its own storage space based on the state of the object. For example, in a 32-bit HotSpot VIRTUAL machine, when the object is not locked, 25 bits of MarkWord’s 32 bit space will be used to store the object hash code, 4 bits will store the age of the object generation, 2 bits will store the lock flag bit, and 1 bit will be fixed at 0(which indicates that the object is not in bias mode). In addition to the normal state of an object that is not locked, there are several different states such as lightweight locked, heavyweight locked, GC marked, and biased. The contents of the object header stored in these states are shown in the figure below.

After reviewing the memory layout of the object, we can now describe how lightweight locking works: When code is about to enter a synchronized block, if the synchronized object is not locked (the lock flag bit is in the “01” state), the virtual machine first creates a space called the lock record in the current thread’s stack frame. The product is used to store the current copy of the product of the product where the product of the herbivore is herbivore, and the state of the thread stack and the object header is shown in the figure below.

The virtual machine will then use the CAS operation to try to update the object’s MarkWord to a pointer to the LockRecord. If the update action succeeds, it means that the thread owns the lock on the object, and the lock flag bit (the last two bits of MarkWord) of the object MarkWord changes to 00, indicating that the object is in a lightweight locked state. The state of the thread stack and object header is shown below.

If the update fails, it means that at least one thread is competing with the current thread to acquire the lock on the object. The virtual machine first checks whether the MarkWord of the object points to the stack frame of the current thread. If it does, the current thread already owns the lock of the object. If not, the lock object has been preempted by another thread. If more than two threads are competing for the same lock, the lightweight lock is no longer valid and must be expanded to a heavyweight lock. The status of the lock flag changes to “10”, and the pointer to the heavyweight lock (mutex) is stored in MarkWord, and the thread waiting for the lock must also enter the blocking state.

If the MarkWord of the object still points to the lock record of the thread, then the CAS operation replaces the current MarkWord of the object with the DisplacedMarkWord copied from the thread. If the replacement is successful, the synchronization process is complete. If the replacement fails, another thread has attempted to acquire the lock, and the suspended thread must be awakened at the same time the lock is released.

Lightweight locks improve application synchronization performance based on the rule of thumb that “for the vast majority of locks, there is no contest for the entire synchronization cycle.” If there is no competition, lightweight locks successfully avoid the mutex overhead through CAS operations; But if lock contention does exist, the cost of the CAS operation occurs in addition to the cost of the mutex itself. Lightweight locks are therefore slower than traditional heavyweight locks in competitive situations.

Biased locking

Biased locking is also a lock optimization measure introduced in JDK6, which aims to eliminate data synchronization without contention and further improve the performance of programs. If a lightweight lock uses a CAS operation to eliminate the mutex used in synchronization without contention, a biased lock eliminates the entire synchronization without contention, even the CAS operation.

Bias lock medium “slant”, be eccentric “slant”, partial “slant”. This means that the lock is biased in favor of the first thread that acquired it, and if the lock is never acquired by another thread during subsequent execution, the thread that holds the biased lock will never need to synchronize again.

Assume that bias locking is enabled for the current VM. When a thread obtains the lock object for the first time, the VM sets the flag bit in the object header to 01 and the bias mode to 1, indicating that the vm enters bias mode. The CAS operation is also used to record the ID of the thread that acquired the lock in the object’s MarkWord. If the CAS operation succeeds, the VM does not perform any synchronization operations (such as locking, unlocking, and updating MarkWord) every time the thread that holds the biased lock enters the lock related synchronization block.

The bias mode ends as soon as another thread attempts to acquire the lock. Determine whether to revoke bias (bias mode set to “0”) according to whether the lock object is currently in the locked state, and then restore the flag bit to the unlocked state (flag bit is “01”) or lightweight lock (flag bit is “00”). Subsequent synchronization operations will be performed as described above. The state transition of biased lock, lightweight lock and the relationship between object MarkWord are shown in the figure below.

Hash code storage problem

There is a problem with biased locking. When the object enters the biased state, most of the space in MarkWord is used to store the ID of the thread holding the lock. This part of space occupies the original location of storing the hash code of the object.

In Java, if an object evaluates to a hash code, that value should always remain the same, otherwise many apis that rely on object hash codes are at risk of error. The Object::hashCode() method, which is the source of most Object hashes, returns the Object’s consistent hashCode. This value is enforceable by storing the result of the calculation in the Object header to ensure that after the first calculation, Calling this method again never changes the value of the hash code. Therefore, after a consistent hash code has been computed, an object can no longer enter the biased lock state; When an object is currently in a biased lock state and receives a request to compute its consistent hash code, its biased state is immediately revoked and the lock expands to a heavyweight lock. In the implementation of the heavyweight lock, the object header points to the position of the heavyweight lock. The ObjectMonitor class representing the heavyweight lock has a field that can record the MarkWord under the unlocked state (flag bit is “01”), which can naturally store the original hash code.

Biased locking can improve the performance of programs with synchronization but no contention, but it is not always beneficial to program execution. If most of the locks in your program are always accessed by multiple different threads, the bias pattern is redundant.