When I first started learning Java, synchronized was my first encounter with multithreading. Back then it felt magical and powerful: we called it "synchronization," and it became our cure-all for multithreading problems. Later, as we learned that synchronized is a heavyweight lock, it came to seem clumsy and inefficient compared with Lock. With the tweaks made to synchronized in Java SE 1.6, however, it no longer seems so heavy.

Synchronized ensures that only one thread at a time can enter the critical section of a method or code block at runtime, and it also ensures the memory visibility of shared variables.

1. Implementation principle

Every object in Java can be used as a lock, which is the basis for synchronized:

  • For a normal synchronized method, the lock is the current instance object.
  • For a static synchronized method, the lock is the Class object of the current class.
  • For a synchronized block, the lock is the object given in the parentheses.

When a thread enters a synchronized block, it must first acquire the lock, and it must release the lock when it exits, whether normally or via an exception. So where does the lock live, and what information does it store?
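As a quick illustration, here is a minimal sketch (class and method names are mine, not from any library) showing the three kinds of lock objects:

```java
// Illustrative sketch: the three lock objects synchronized can use.
public class LockTargets {
    private int counter = 0;

    // 1. Normal synchronized method: the lock is the current instance (this).
    public synchronized void instanceLocked() {
        counter++;
    }

    // 2. Static synchronized method: the lock is LockTargets.class.
    public static synchronized void classLocked() {
        // guarded by the Class object, shared across all instances
    }

    // 3. Synchronized block: the lock is the object in the parentheses.
    public void blockLocked(Object lock) {
        synchronized (lock) {
            counter++;
        }
    }

    public int getCounter() {
        return counter;
    }

    public static void main(String[] args) {
        LockTargets t = new LockTargets();
        t.instanceLocked();
        t.blockLocked(t); // any object can serve as the monitor
        System.out.println(t.getCounter()); // prints 2
    }
}
```

Note that the instance lock and the class lock are different monitors: a thread inside instanceLocked() does not block another thread entering classLocked().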

The JVM specification defines that the JVM implements method synchronization and code block synchronization based on entering and exiting Monitor objects:

  • Block synchronization: implemented with the monitorenter and monitorexit instructions.
  • Method synchronization: implemented differently, with no explicit monitor instructions in the bytecode; a flag on the method is used instead, though the effect is the same.
public class SynchronizedTest {
    public synchronized void test1() {}

    public void test2() {
        synchronized (this) {}
    }
}

Disassemble the compiled class with javap -verbose SynchronizedTest.class.

With some of the output omitted, the result is as follows:

{
  public com.zero.test.SynchronizedTest();
    ...

  public synchronized void test1();
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
      stack=0, locals=1, args_size=1
         0: return
      LineNumberTable:
        line 5: 0

  public void test2();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=3, args_size=1
         0: aload_0
         1: dup
         2: astore_1
         3: monitorenter
         4: aload_1
         5: monitorexit
         6: goto          14
         9: astore_2
        10: aload_1
        11: monitorexit
        12: aload_2
        13: athrow
        14: return
}

As you can see above, a synchronized block is implemented with the monitorenter and monitorexit instructions, while a synchronized method relies on the ACC_SYNCHRONIZED flag in its method modifiers. Either way, the essence is acquiring the object's monitor.

Before we go any further, there are two concepts to understand: Java object headers and Monitor.

1.1. Java Object headers

Synchronized locks are stored in Java object headers. In the HotSpot virtual machine, the object header contains two parts of data: a Mark Word and a Klass Pointer. The Klass Pointer points to the object's class metadata; the virtual machine uses it to determine which class the object is an instance of. The Mark Word stores the object's own runtime data and is the key to implementing lightweight and biased locking.

Mark Word

The Mark Word in the Java object header stores runtime data about the object itself, such as its hashCode, GC generational age, lock status flags, the lock held by a thread, the biased thread ID, the bias timestamp, and so on. A Java object header normally occupies two machine words (on a 32-bit virtual machine, one machine word is four bytes, or 32 bits), but an array object needs three: the JVM can determine an ordinary object's size from its metadata but cannot determine an array's size that way, so an extra word records the array length. The default Mark Word layout on a 32-bit JVM is as follows:

| Lock state | 25 bit                | 4 bit            | 1 bit: biased lock? | 2 bit: lock flag |
| ---------- | --------------------- | ---------------- | ------------------- | ---------------- |
| Unlocked   | the object's hashCode | generational age | 0                   | 01               |

At runtime, the data stored in the Mark Word changes as the lock flag bit changes.

Monitor

A monitor can be understood as a synchronization tool, or described as a synchronization mechanism; it is usually described as an object.

All Java objects are born as potential Monitors: in Java's design, every object comes into the world carrying an invisible lock, called the intrinsic lock or monitor lock.
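For example, any plain object can serve as a monitor for both mutual exclusion and signaling via wait()/notify(). The class below is an illustrative sketch (all names are mine):

```java
// Any Java object carries an intrinsic monitor; here a plain Object is used
// both as a lock and as a condition variable via wait()/notify().
public class MonitorDemo {
    private final Object monitor = new Object();
    private boolean ready = false;

    public void produce() {
        synchronized (monitor) {   // acquire the object's monitor
            ready = true;
            monitor.notify();      // wake a thread waiting on this monitor
        }
    }

    public boolean awaitReady() throws InterruptedException {
        synchronized (monitor) {
            while (!ready) {       // loop guards against spurious wakeups
                monitor.wait();    // releases the monitor while waiting
            }
            return ready;
        }
    }

    public static void main(String[] args) throws Exception {
        MonitorDemo d = new MonitorDemo();
        Thread producer = new Thread(d::produce);
        producer.start();
        System.out.println(d.awaitReady()); // prints true
        producer.join();
    }
}
```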

2. Synchronized performance optimization

Synchronized has long been known as a heavyweight lock, and that reputation has stuck, but JDK 1.6 made various optimizations to its implementation that make it much lighter.

2.1 lock optimization

JDK 1.6 introduced a number of optimizations to the lock implementation, such as spin locks, adaptive spin locks, lock elimination, lock coarsening, biased locks, and lightweight locks, to reduce the overhead of lock operations.

There are four lock states: unlocked, biased lock, lightweight lock, and heavyweight lock. A lock escalates gradually as contention heats up. Note that a lock can be upgraded but not downgraded; this strategy is meant to improve the efficiency of acquiring and releasing locks.

2.1.1. Spin locks

Blocking and waking a thread requires the CPU to switch from user mode to kernel mode; frequent blocking and waking is a heavy burden on the CPU and puts great pressure on the system's concurrency performance. Meanwhile, in many applications an object lock is held for only a short time, and it is not worth frequently blocking and waking threads for such brief periods.

What is spinlock?

The idea of a spin lock is to let the waiting thread busy-wait for a short time, instead of being suspended immediately, on the chance that the thread holding the lock will release it soon.
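The idea can be sketched in a few lines with an AtomicBoolean. This is an illustrative user-level spin lock, not how the JVM implements spinning internally:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal spin lock sketch: instead of parking, the acquiring thread
// busy-waits (spins) until the holder releases the lock.
public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait: keep retrying the CAS until the lock is free.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are spinning (JDK 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }

    public boolean isLocked() {
        return locked.get();
    }

    public static void main(String[] args) throws Exception {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                lock.lock();
                try { counter[0]++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter[0]); // prints 20000
    }
}
```

The trade-off is exactly the one described above: spinning avoids the user-to-kernel switch of parking a thread, but burns CPU for as long as the lock is held.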

2.1.2 Adaptive spin locks

JDK 1.6 introduced a smarter spin lock: the adaptive spin lock. "Adaptive" means the number of spins is no longer fixed; it is determined by the time of the previous spin on the same lock and the state of the lock's owner.

How does it do that?

If spinning recently succeeded on a lock, the virtual machine assumes it is likely to succeed again and allows the next spin wait to last longer. Conversely, if spinning rarely succeeds for a lock, future attempts to acquire it may spin less or skip spinning entirely, to avoid wasting processor resources. With adaptive spinning, the virtual machine's prediction of a lock's state becomes more accurate as execution continues and performance-monitoring information accumulates.
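The adaptive idea can be sketched as a toy model; all names and numbers here are illustrative, and the JVM's real heuristic is internal and far more involved:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy sketch of adaptive spinning: the spin budget grows after a successful
// spin and shrinks after a failed one. Illustrative only.
public class AdaptiveSpin {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private volatile int spinLimit = 100; // starting budget, an arbitrary choice

    /** Try to acquire by spinning; returns false where a real lock would park. */
    public boolean trySpinAcquire() {
        for (int i = 0; i < spinLimit; i++) {
            if (locked.compareAndSet(false, true)) {
                // Success: assume spinning will pay off again, spin longer next time.
                spinLimit = Math.min(spinLimit * 2, 10_000);
                return true;
            }
        }
        // Failure: spinning wasted CPU, so spin less next time.
        spinLimit = Math.max(spinLimit / 2, 1);
        return false; // a real implementation would now block the thread
    }

    public void release() {
        locked.set(false);
    }

    public int getSpinLimit() {
        return spinLimit;
    }
}
```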

2.1.3 Lock elimination

To ensure data integrity, we sometimes need to synchronize parts of an operation, but in some cases the JVM detects that no shared-data race is possible and removes those synchronization locks. Lock elimination is based on the data gathered by escape analysis.

When we use JDK built-in classes such as StringBuffer, Vector, and Hashtable, there are implicit locking operations.

For example, append() for StringBuffer and add() for Vector:

public void vectorTest() {
    Vector<String> vector = new Vector<String>();
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
    System.out.println(vector);
}

When running this code, the JVM can clearly detect that vector never escapes the vectorTest() method, so it can safely eliminate the locking inside the Vector.

2.1.4 Lock coarsening

We know that when using synchronized locks we should keep the scope of the synchronized block as small as possible and synchronize only where shared data is actually accessed. The goal is to minimize the work done while holding the lock, so that a thread waiting for the lock can acquire it as quickly as possible. In most cases this is right. However, a series of consecutive lock and unlock operations on the same object can cause unnecessary performance loss, which is why lock coarsening was introduced.

So what is lock coarsening?

It means merging a series of consecutive lock and unlock operations on the same object into a single lock covering a larger range.

If the JVM detects that a Vector is locked and unlocked consecutively in a loop, it merges those operations into one larger-scoped lock, hoisting the lock and unlock out of the for loop.
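Written out by hand, the transformation looks roughly like this. The JIT performs it internally, so both methods here are ordinary Java and produce the same result; the names are illustrative:

```java
import java.util.Vector;

// Lock coarsening, shown by hand: fineGrained() acquires the Vector's
// monitor on every add(); coarsened() is what the JIT effectively produces,
// one acquisition around the whole loop.
public class CoarseningDemo {
    // Before coarsening: Vector.add() acquires the monitor 10 times.
    static Vector<String> fineGrained() {
        Vector<String> v = new Vector<>();
        for (int i = 0; i < 10; i++) {
            v.add(i + "");        // each add() is synchronized
        }
        return v;
    }

    // After coarsening: one acquisition hoisted out of the loop.
    static Vector<String> coarsened() {
        Vector<String> v = new Vector<>();
        synchronized (v) {
            for (int i = 0; i < 10; i++) {
                v.add(i + "");    // monitor already held; reentrant acquire is cheap
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(fineGrained().equals(coarsened())); // prints true
    }
}
```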

2.1.5 Biased locks

The main purpose of introducing biased locking is to minimize the unnecessary lightweight-lock execution path when there is no multithreaded contention, since lightweight locking and unlocking depend on several CAS atomic instructions. So how does a biased lock reduce unnecessary CAS operations? The Mark Word structure explains it. Acquisition first simply checks whether the Mark Word in the object header records a bias toward the current thread. If it does, the thread has already acquired the lock. If it does not, the next check is whether the biased-lock flag is set to 1 (meaning the object is currently biasable). If the flag is not set, the lock must be contended with CAS; if it is set, CAS is used to try to point the object header's bias at the current thread.

The process is as follows:

  1. Check the Mark Word: is it biased, i.e. is the biased-lock bit 1 and the lock flag 01?
  2. If so, check whether the stored thread ID is the current thread's ID. If it is, go to step 5; otherwise go to step 3.
  3. Since the thread ID is not the current thread's, compete for the lock with a CAS operation. If the CAS succeeds, the Mark Word's thread ID is replaced with the current thread's ID and execution continues at step 5; otherwise go to step 4.
  4. A failed CAS proves there is multithreaded contention. When the global safepoint is reached, the thread holding the biased lock is suspended and the biased lock is upgraded to a lightweight lock; the thread blocked at the safepoint then continues executing the synchronized block.
  5. Execute the synchronized block.
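The decision flow above can be modeled in plain Java, with the Mark Word's bias field represented by an AtomicReference. This only mirrors the logic; the real JVM operates on the object header, and every name here is illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of biased-lock acquisition. The bias field plays the role of
// the thread ID stored in the Mark Word; null means "not yet biased".
public class BiasedLockModel {
    private final AtomicReference<Thread> biasedOwner = new AtomicReference<>(null);

    /** Returns true if the synchronized block may run under the bias. */
    public boolean tryBiasedEnter() {
        Thread me = Thread.currentThread();
        if (biasedOwner.get() == me) {
            return true;                          // step 2: already biased to us, no CAS needed
        }
        if (biasedOwner.compareAndSet(null, me)) {
            return true;                          // step 3: CAS won, bias installed
        }
        return false;                             // step 4: contention, would inflate the lock
    }
}
```

The key property this captures is that only the very first acquisition pays for a CAS; every later entry by the same thread is a plain read.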

A biased lock is released only under contention: the holding thread does not actively release it but waits for another thread to compete. Revoking a biased lock requires waiting for the global safepoint (a point in time at which no bytecode is executing).

The steps are as follows:

  1. Suspend the thread holding the biased lock and check whether the lock object is still locked;
  2. Revoke the biased lock and revert to the unlocked (01) state or to a lightweight lock.

2.1.6 Lightweight locks

The main purpose of introducing lightweight locks is to reduce the performance cost of traditional heavyweight locks, which use operating-system mutexes, when only a few threads compete for the lock.

When biased locking is disabled, or a biased lock has been upgraded because multiple threads competed for it, lightweight lock acquisition is attempted. The steps are as follows:

  1. Determine whether the object is currently unlocked (biased bit 0, lock flag 01). If so, the JVM first creates a space called a Lock Record in the current thread's stack frame, used to store a copy of the lock object's Mark Word (officially this copy carries a Displaced prefix, i.e. the Displaced Mark Word), then continues with step 2; otherwise go to step 3.
  2. The JVM uses CAS to try to update the object's Mark Word to a pointer to the Lock Record. If the CAS succeeds, the thread has won the lock, the lock flag is changed to 00 (lightweight-locked), and the synchronized code is executed. If it fails, go to step 3.
  3. Check whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already holds the lock and the synchronized block is executed directly. Otherwise the lock has been taken by another thread; the lightweight lock is inflated to a heavyweight lock, the lock flag is changed to 10, and waiting threads enter the blocked state.
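The acquisition steps above can be modeled the same way, with the Mark Word as an AtomicReference that is either null (unlocked) or points to a Lock Record. This is purely an illustrative sketch of the logic, not the JVM's implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of lightweight locking. The markWord field stands in for the
// object header: null = unlocked, otherwise a pointer to a Lock Record in
// the acquiring thread's "stack frame".
public class LightweightLockModel {
    static final class LockRecord {
        final Thread owner = Thread.currentThread(); // frame's owning thread
    }

    private final AtomicReference<LockRecord> markWord = new AtomicReference<>(null);

    /** Steps 1-3 of the flow; returns false where inflation would occur. */
    public boolean tryEnter() {
        LockRecord record = new LockRecord();          // step 1: Lock Record in our frame
        if (markWord.compareAndSet(null, record)) {
            return true;                               // step 2: CAS won, flag would become 00
        }
        LockRecord current = markWord.get();
        if (current != null && current.owner == Thread.currentThread()) {
            return true;                               // step 3: we already hold it, reentrant
        }
        return false;                                  // contended: inflate to heavyweight
    }

    public void exit() {
        markWord.set(null);                            // sketch of the CAS-based release
    }
}
```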

Release: a lightweight lock is also released through CAS operations, as follows:

  1. Retrieve the Displaced Mark Word stored in the Lock Record;
  2. Use CAS to copy the retrieved data back into the object's Mark Word. If the CAS succeeds, the lock is released; otherwise go to step 3.
  3. A failed CAS indicates that another thread is trying to acquire the lock, so the suspended threads must be woken while the lock is released.

2.1.7 Heavyweight locks

A heavyweight lock is implemented through the object's internal monitor, and the monitor ultimately relies on the underlying operating system's mutex lock. OS-level thread switching requires transitions between user mode and kernel mode, and that switching cost is very high.


3. Comparison of locks

Here’s a comparison of biased locks, lightweight locks, and heavyweight locks:

Table 3 Advantages and disadvantages of various locks and application scenarios

| Lock             | Advantages | Disadvantages | Applicable scenario |
| ---------------- | ---------- | ------------- | ------------------- |
| Biased lock      | Locking and unlocking require no additional cost; only a nanosecond-scale gap compared with an unsynchronized method | If threads contend for the lock, there is the extra cost of revoking the bias | Only one thread ever accesses the synchronized block |
| Lightweight lock | Competing threads do not block, improving the program's response time | A thread that never gets the lock keeps spinning, consuming CPU | Response time matters; the synchronized block executes very fast; only two threads compete for the lock |
| Heavyweight lock | Contending threads do not spin and do not consume CPU | Threads block, and response time is slow | Throughput matters; the synchronized block executes slowly; more than two threads compete for the lock |
Table source: The Art of Concurrent Programming in Java, p. 16


References:

  1. Understanding the Java Virtual Machine
  2. Fang Tengfei: The Art of Java Concurrent Programming
