Introduction

It all starts with a series of whys:

  • Why can the synchronized keyword achieve synchronization?
  • What underlying optimizations does synchronized have?
  • What kinds of locks does Java have, and what is lock inflation?
  • What are user mode, kernel mode, and context switching?
  • What is a spin lock, and what is its relationship to CAS?
  • What is the object header? What is the Mark Word in the object header?

This article is the second in the Java concurrency series:

  • A simple introduction, from the basics to AQS | Java
  • Analyzing the underlying implementation of synchronized | Java locks

Overview

synchronized is the most common and easiest way to solve synchronization problems in development. From the moment we start learning concurrent programming, we are told that most synchronization problems can be solved simply by adding the synchronized keyword. In principle it is a heavyweight operation, and in JDK 1.5 and earlier it was inferior to the Lock implementations in JUC. However, with the optimizations JDK 1.6 made to synchronized, it is no longer so heavy, and in most cases it performs almost as well as Lock.

The purpose of this article is to explore how synchronized works under the hood, and in doing so to understand the various lock states.

synchronized provides three important guarantees:

  • Atomicity

    One or more operations either all execute without interruption by any other thread, or none of them execute at all;

  • Visibility

    When multiple threads access the same variable and one thread modifies its value, the other threads can immediately see the modified value;

  • Ordering

    The compiler and processor are prevented from reordering instructions, i.e. instruction reordering is inhibited.
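
A minimal sketch of the atomicity and visibility guarantees (the class name and counts below are mine, purely illustrative): without synchronized, the read-modify-write inside count++ can interleave across threads and lose updates; with it, the result is deterministic.

```java
public class SyncCounter {
    private int count = 0;

    // synchronized gives atomicity (one thread at a time) and visibility
    // (the write is visible to the next thread acquiring the same lock).
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> { for (int i = 0; i < 10_000; i++) c.increment(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // always 20000; without synchronized it could be less
    }
}
```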

Analysis

In the JVM, synchronized is implemented by entering and exiting a Monitor object; this is the basis of both method synchronization and code-block synchronization, although the implementation details of the two differ. The specific comparison is as follows:

Sample code:
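
The original snippet is not reproduced here, so the following is a representative stand-in (class and method names are mine) contrasting a synchronized method with a synchronized block:

```java
public class SyncDemo {
    // Method synchronization: the compiler sets the ACC_SYNCHRONIZED flag.
    public synchronized void methodSync() {
        // critical section
    }

    // Block synchronization: compiled into monitorenter/monitorexit.
    public void blockSync() {
        synchronized (this) {
            // critical section
        }
    }
}
```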

Bytecode comparison:
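
The original image is likewise not reproduced; the listing below is an abridged, illustrative sketch of what `javap -c -v SyncDemo` prints for the two methods above (exact offsets and constant-pool details vary by compiler version):

```
public synchronized void methodSync();
  flags: ACC_PUBLIC, ACC_SYNCHRONIZED   // no monitor instructions in the body
  Code:
    0: return

public void blockSync();
  flags: ACC_PUBLIC
  Code:
     0: aload_0
     1: dup
     2: astore_1
     3: monitorenter       // acquire the object's monitor
     4: aload_1
     5: monitorexit        // release on the normal path
     6: goto          14
     9: astore_2
    10: aload_1
    11: monitorexit        // release on the exception path
    12: aload_2
    13: athrow
    14: return
```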

For synchronized methods, the bytecode above shows that they are not synchronized via monitorenter and monitorexit. Instead, the synchronized modifier is applied directly to the method: the method's access flags include the ACC_SYNCHRONIZED identifier, which the JVM uses to synchronize the method. When the method is invoked, the calling instruction checks whether the method's ACC_SYNCHRONIZED access flag is set. If it is, the executing thread acquires the monitor before executing the method body and releases it when the method completes. While the method is executing, no other thread can obtain the same monitor object.

For synchronized blocks, the underlying implementation inserts the monitorenter instruction at the beginning of the block; when executed, monitorenter attempts to acquire ownership of the object's monitor, i.e. to acquire the lock on the object. The monitorexit instruction is inserted at the end of the block and at the exception exit points; the JVM guarantees that every monitorenter has a matching monitorexit.


Having covered synchronized's implementation at the bytecode level, where does the virtual machine actually store the lock flags? To answer that, we have to introduce the concept of the object header.

Object header

What is an object header, and what is it for?

If you have studied garbage collection, you probably already know the answer. The object header is like the object's business card: it contains the object's basic information, as shown below:

Note: the object header contains an array length field only if the object is an array ✋

The object header consists of two parts: the Klass Pointer and the Mark Word.

Klass Pointer

When a new object is created, the virtual machine uses the klass pointer to know which class the object belongs to: it points to the class's metadata.

Metadata

In computing, metadata is everywhere: files have metadata, web pages have meta tags. The prefix meta comes from Greek and roughly means "about". So a file's metadata is data about the file, and a class's metadata is the data that describes the class, i.e. the information about the class itself.

Mark Word

The Mark Word stores the object's runtime state information, which changes over time. By default it stores data such as the object's hashCode, and it is also where the lock information for synchronized, the topic of this article, lives. Its layout is shown below:

| Length    | Content                | Description                                                |
| --------- | ---------------------- | ---------------------------------------------------------- |
| 32/64 bit | Mark Word              | Stores the object's hashCode or lock information            |
| 32/64 bit | Class Metadata Address | A pointer to the object's class metadata                    |
| 32/32 bit | Array length           | The length of the array, if the current object is an array  |

The 32-bit Mark Word in the unlocked state:

| Lock state | 25 bit                | 4 bit               | 1 bit (biased lock?) | 2 bit (lock flag) |
| ---------- | --------------------- | ------------------- | -------------------- | ----------------- |
| Unlocked   | The object's hashCode | GC generational age | 0                    | 01                |

The information in the Mark Word changes as the object runs; for example, the GC generational age is updated when a garbage collection occurs.

The lock information for synchronized, the topic of this article, also lives in the Mark Word. As the object runs, the lock passes through four states: unlocked, biased lock, lightweight lock, and heavyweight lock. Locks can be upgraded but not downgraded; the main purpose of this design is to improve the efficiency of acquiring and releasing locks.
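
One way to actually watch the Mark Word change is the OpenJDK JOL tool. The sketch below assumes the org.openjdk.jol:jol-core dependency is on the classpath; the printed layout differs by JVM version and startup flags:

```java
import org.openjdk.jol.info.ClassLayout;

public class MarkWordDemo {
    public static void main(String[] args) {
        Object lock = new Object();

        // Freshly allocated: unlocked state, lock flag bits 01.
        System.out.println(ClassLayout.parseInstance(lock).toPrintable());

        synchronized (lock) {
            // Inside the block the header shows a biased or lightweight
            // (thin) lock, depending on JVM version and flags.
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        }
    }
}
```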

So why do several kinds of 🔐 exist at all? Still not seeing why? Then read on 👇

Context switch

synchronized was optimized in JDK 1.6: several intermediate lock states were added to avoid the time-consuming context switches caused by going straight to a heavyweight lock.

This may sound a little abstract if you don't know what context switching is, so let's start with the basics:

What is context switching?

1. Even in the single-processor era, the operating system could handle multi-threaded concurrent tasks. The processor allocates a CPU time slice to each thread, and a thread executes its task within that time slice

  • A CPU time slice is the amount of time the CPU allocates to a thread for execution, typically a few tens of milliseconds

2. The time slice determines how long a thread can continuously occupy the processor

  • When a thread's time slice runs out, or the thread is forced to pause for some reason, the operating system selects another thread to occupy the processor
  • Context switch: the process in which one thread is suspended and another thread is selected to start or resume running
  • Cut out: the thread that is suspended and deprived of the processor
  • Cut in: the thread selected to occupy the processor and start or resume running
  • During a switch, the operating system needs to save and restore the corresponding progress information; that information is the *context*

3. Context content

  • Registers: the CPU registers hold the data of the tasks that have been, are being, and will be executed
  • Program counter: the program counter stores the location of the instruction the CPU is currently executing and the location of the next instruction to execute

4. When the number of threads is far greater than the number of CPUs, the operating system allocates the CPUs to the thread tasks in turn, and context switching becomes even more frequent

  • There is also cross-CPU context switching, which is even more expensive

Excerpt from: Extreme Parsing of Thread Context Switching for Java Performance

So when we lock a resource using synchronized:

  1. While thread A holds the lock, thread B blocks when it tries to acquire it, i.e. it enters the BLOCKED state; thread B is suspended by the operating system (cut out), and the OS saves its context at this point;
  2. When thread A releases the lock and, say, thread B acquires it, thread B goes from BLOCKED to RUNNABLE; the thread is woken up (cut in) and continues executing from the context the operating system saved earlier.

In the process above, thread B goes through two context switches, each taking roughly 3~5 microseconds, while the CPU needs only about 0.6 ns to execute a single instruction. So if the code inside the lock is only a handful of ordinary instructions, such as incrementing a variable, the context switches dominate the cost. That is why, after JDK 1.6, synchronized was optimized with several lock states and state transitions, avoiding the performance penalty of a direct heavyweight lock.
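
A small runnable sketch of that blocking behavior (the sleep durations are arbitrary, just to order the events): while thread A holds the monitor, thread B reports the BLOCKED state.

```java
public class BlockedDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread a = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            }
        });
        Thread b = new Thread(() -> {
            synchronized (lock) { /* runs only after A releases the lock */ }
        });

        a.start();
        Thread.sleep(100);                 // let A grab the lock first
        b.start();
        Thread.sleep(100);                 // give B time to hit the monitor
        System.out.println(b.getState());  // BLOCKED while A holds the lock
        a.join();
        b.join();
    }
}
```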

Common locks

Spin locks

If the thread holding the lock can release it in a very short time, then the threads waiting to compete for the lock do not need to switch between kernel mode and user mode and enter the blocked state. They only need to wait for a while (spin) until the lock-holding thread releases the lock, and then acquire it immediately, avoiding the cost of switching between user threads and the kernel.

However, a spinning thread consumes CPU; it is essentially busy-waiting. A thread cannot busy-wait forever, so a maximum spin wait time must be set.

If the thread holding the lock still has not released it when the spin wait exceeds the maximum time, i.e. the contending thread still cannot acquire the lock within the maximum wait time, the contending thread stops spinning and enters the blocked state.
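
The JVM spins inside its own runtime, but the idea can be sketched at the Java level with a CAS loop. This is a conceptual model, not how HotSpot implements spinning (and a real one would add the timeout described above):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A minimal user-level spin lock built on CAS.
public final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait (spin) until the CAS flips false -> true; burns CPU while waiting.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // spin hint to the CPU (JDK 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```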

Advantages and disadvantages

Spin locks reduce thread blocking, which is a significant performance win for code blocks with low lock contention that hold the lock for a very short time, because the cost of spinning is lower than the cost of blocking a thread and then suspending and waking it.

However, if the lock is heavily contended, or the thread holding the lock needs a long time to execute the synchronized block, a spin lock is not suitable: before acquiring the lock, the spinning threads keep occupying the CPU while doing no useful work, and the cost of spinning exceeds the cost of blocking and suspending. Other threads that need the CPU cannot get it, and CPU is wasted.

Spin lock time threshold

The purpose of a spin lock is to hold CPU resources until the lock is acquired.

How should the spin duration be chosen? If threads spin for too long, a large number of threads in the spin state will occupy CPU resources and hurt overall system performance, so the spin count matters a great deal.
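
For reference, on old HotSpot versions (before adaptive spinning, around JDK 6) the default spin count was a fixed value of 10 and could reportedly be tuned with JVM flags; treat the line below as historical illustration only, since these flags were removed from later JVMs:

```
java -XX:+UseSpinning -XX:PreBlockSpin=10 MyApp
```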

Adaptive spin lock

JDK 1.6 introduced adaptive spinning. With adaptive spinning, the spin time is no longer fixed; it is determined by the previous spin time on the same lock and the state of the lock's owner. The rough heuristic is that the duration of one thread context switch is the ideal spin time.

For example, suppose thread A once spun while waiting for a lock, thread B released it, and A acquired it successfully. Later, A requests the same lock again while it is still held, so A can only spin and wait. Because spinning recently succeeded on this lock, the virtual machine assumes A's spin is likely to succeed again, and therefore extends A's spin count.

Conversely, if spinning rarely succeeds in acquiring a given lock, then when a thread later wants that lock the virtual machine may skip the spin phase entirely and upgrade directly to a heavyweight lock, so that resources are not wasted on an empty busy-wait loop.


Biased locking

Background

In real development, in most cases there is no multi-thread contention at all: a lock is always acquired by the same thread multiple times. Biased locking was introduced to reduce the cost for that thread to acquire the lock, i.e. to eliminate unnecessary CAS operations.

Summary

A biased lock, as the name implies, is biased towards the first thread that accesses the lock. If during execution only one thread ever accesses the lock and there is no multi-thread contention, the thread does not need to trigger synchronization: a biased lock is applied instead, which removes some of the CAS operations involved in locking and unlocking (such as the CAS operations on the wait queue, e.g. a CLH queue lock). If, while running, another thread preempts the lock, the thread holding the biased lock is suspended, the JVM revokes the bias, and the lock is restored to a standard lightweight lock. By eliminating synchronization primitives in the uncontended case, biased locking further improves program performance.

Biased lock acquisition and upgrade process

  1. Check whether the biased-lock flag in the Mark Word is set to 1 and the lock flag bits are 01, i.e. confirm the object is in the biasable state;
  2. If so, check whether the thread ID points to the current thread. If yes, go to step 5; otherwise go to step 3;
  3. If the thread ID does not point to the current thread, contend for the lock with a CAS operation. If the CAS succeeds, the thread ID in the Mark Word is set to the current thread's ID, then go to step 5; if it fails, go to step 4;
  4. If the CAS fails to obtain the biased lock, there is contention. When the global safepoint is reached, the thread holding the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread that was blocked at the safepoint continues executing the synchronized code (a conceptual sketch of steps 2–4 follows this list);
  5. Execute the synchronized code.
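
Steps 2–4 can be modeled conceptually in plain Java. This is emphatically not HotSpot's implementation; an AtomicLong merely stands in for the Mark Word's thread-ID field (0 meaning "not yet biased"):

```java
import java.util.concurrent.atomic.AtomicLong;

final class BiasedLockModel {
    // Stand-in for the Mark Word's thread-ID bits; 0 = not biased yet.
    private final AtomicLong biasOwner = new AtomicLong(0);

    boolean tryEnter() {
        long self = Thread.currentThread().getId();
        if (biasOwner.get() == self) {
            return true;  // step 2: already biased towards us, no CAS needed
        }
        if (biasOwner.compareAndSet(0, self)) {
            return true;  // step 3: CAS won, the lock is now biased to us
        }
        // step 4: CAS failed -> contention; the real JVM would revoke the
        // bias at a safepoint and upgrade to a lightweight lock.
        return false;
    }
}
```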

Biased lock revocation

Step 4 above mentioned revoking the biased lock. A thread never actively releases a biased lock: revocation happens only when another thread tries to compete for it. Revoking a biased lock requires waiting for the global safepoint (a point in time at which no bytecode is executing). The JVM first suspends the thread that owns the lock and checks whether the lock object is still locked; after the bias is revoked, the object returns to either the unlocked state (flag bits "01") or the lightweight-locked state (flag bits "00").

Applicable scenarios

Biased locking suits the case where only one thread ever executes the synchronized block: no other thread tries to enter the block before the owner finishes and releases the lock, so the lock is used without contention. Once contention appears, the biased lock is upgraded to a lightweight lock; the upgrade requires revoking the bias, and revoking a biased lock triggers a stop-the-world (STW) pause.

Under lock contention, biased locking performs a lot of extra work; in particular, revoking the bias forces a safepoint, the safepoint causes an STW pause, and performance degrades.

What is stop-the-world?

When a garbage collection algorithm runs, all other threads of the Java application are suspended (except the garbage collection helper threads). It is a global pause: the application appears frozen and unresponsive.


Lightweight lock

Lightweight locks are upgraded from biased locks: a biased lock works while only one thread enters the synchronized block, and when a second thread joins the lock contention, the biased lock is upgraded to a lightweight lock. The purpose of a lightweight lock is to avoid the performance cost of a heavyweight lock when there is no real competition, such as the kernel-mode/user-mode switches caused by system calls and the thread switches caused by blocking.

Lightweight lock locking process:

When the code enters the synchronized block, if the synchronization object is in the unlocked, non-biasable state (the lock flag bits are "01" and the biased-lock bit is "0"), the virtual machine first creates a space called a Lock Record in the current thread's stack frame, used to store a copy of the object's current Mark Word (officially called the Displaced Mark Word).

After the copy succeeds, the VM uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, and sets the owner pointer in the Lock Record to the object's Mark Word. The success and failure cases are described below.

If the update succeeds, the thread owns the lock on the object, and the lock flag bits of the object's Mark Word are set to "00", indicating that the object is in the lightweight-locked state.

If the update fails, the virtual machine first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already holds the lock on this object and can enter the synchronized block directly. Otherwise, multiple threads are competing for the lock. When a competing thread fails to grab the lightweight lock several times, the lightweight lock inflates into a heavyweight lock: the lock flag bits change to "10", the Mark Word stores a pointer to the heavyweight lock (a mutex), and the threads waiting for the lock enter the blocked state until the owner releases the lock and wakes them up.


Heavyweight lock

A lightweight lock inflates into a heavyweight lock as spinning escalates. A heavyweight lock relies on the object's internal monitor lock, and the monitor in turn relies on the operating system's MutexLock, which is why it is also called a mutex lock.

Why do heavyweight locks cost more?

When a lock inflates into a heavyweight lock, the threads waiting for it are blocked. A blocked thread does not consume CPU, but blocking and waking threads must be handled by the operating system, which means switching from user mode to kernel mode, and that switch is done through a system call.

CPU context switches occur during a system call: each system call causes two context switches (user mode to kernel mode and back). This process often takes longer than executing the synchronized block itself.

Comparison between different locks

| Lock             | Advantages                                                                                         | Disadvantages                                                                                                    | Applicable scenario                                              |
| ---------------- | -------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| Biased lock      | Locking and unlocking need no extra cost; only a nanosecond-level gap compared with executing a non-synchronized method | If threads contend for the lock, there is the extra cost of bias revocation                                       | Only one thread ever accesses the synchronized block              |
| Lightweight lock | Competing threads do not block, improving the program's response time                               | A thread that keeps failing to get the lock spins, consuming CPU; after enough spins it inflates into a heavyweight lock | Response time matters; the synchronized block executes very quickly |
| Heavyweight lock | Waiting threads do not spin and consume no CPU                                                      | Threads block; response time is slow                                                                              | Throughput matters; the synchronized block takes a long time to execute |

Conclusion

At this point we have seen how the synchronized keyword is implemented under the hood and how the lock states change. To be honest, these scenarios may be rare for an Android developer, but for me personally this finally explains the whys I had accumulated, along with some boundary concepts. The more you know, the more you realize you don't know.

Thank you

In-depth analysis: lock upgrade process and lock states — read this article and you will understand!

Extreme parsing of Java performance in thread context switching

Synchronized -Mark

About me

Hello, I'm Petterp, an Android engineer in training at Didu.

If you find this article valuable, please give it a 👏🏻 and follow me on GitHub.

If anything reads poorly, you are also welcome to leave me a message.