JVM lock optimizations

In fact, JDK 1.6 introduced many optimizations to the lock implementation, such as spin locks, adaptive spin locks, lock elimination, lock coarsening, biased locks, and lightweight locks, all aimed at reducing the overhead of lock operations.

A lock can exist in four states, in order: lock-free, biased, lightweight, and heavyweight. A lock is gradually upgraded through these states as contention intensifies. Note that locks can only be upgraded, never downgraded. This strategy is intended to improve the efficiency of acquiring and releasing locks.

Heavyweight lock

The heavyweight lock is the implementation of the built-in lock prior to JDK 1.6. In simple terms, a heavyweight lock uses an OS mutex to control access to the shared resource.

Historical review: prior to JDK 1.6, synchronized was implemented only as a heavyweight built-in lock (and this is how many students still understand it). The monitorenter and monitorexit bytecodes in the JVM rely on the underlying operating system's mutex (Mutex Lock), but that is expensive: a mutex requires suspending the current thread and switching from user mode to kernel mode. In most cases, however, synchronized methods run in a single-threaded (contention-free) environment; if the OS mutex were acquired every time, performance would suffer severely.
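To make the mapping concrete, here is a minimal sketch (the `Counter` class is a hypothetical example, not from the original text) of a synchronized block; the comments mark where the compiler emits the monitorenter/monitorexit bytecodes, which you can verify with `javap -c`:

```java
// Hypothetical example: a synchronized block compiles to the
// monitorenter/monitorexit bytecodes discussed above.
public class Counter {
    private final Object lock = new Object();
    private int count = 0;

    public int increment() {
        synchronized (lock) {   // monitorenter
            return ++count;
        }                       // monitorexit (plus one on the exception path)
    }
}
```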

Spin locks

Blocking and waking a thread requires the CPU to switch from user mode to kernel mode; frequent blocking and waking is a heavy burden on the CPU and puts great pressure on the system's concurrent performance. At the same time, in many applications an object lock is held only for a very short time, and it is not worth frequently blocking and waking threads for such a short period. Hence spin locks.

What is a spin lock?

The idea of a spin lock is to let the waiting thread wait a certain amount of time, without being suspended immediately, to see whether the thread holding the lock releases it soon. How does it wait? By executing a meaningless loop (spinning).

Spin waiting is not a substitute for blocking. Leaving aside its requirement on the number of processors (it needs multiple cores, though single-core processors are rare these days), while it avoids the overhead of thread switching, it occupies processor time. If the thread holding the lock releases it quickly, spinning is very efficient; otherwise, the spinning thread wastes processor resources doing no meaningful work, like a dog in the manger, which hurts performance. Therefore the spin wait time (the number of spins) must be bounded: if the spin exceeds the limit and the lock still has not been acquired, the thread should be suspended.
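The idea can be sketched in user-level Java (this is an illustrative toy, not the JVM's internal implementation; the class name and spin budget are made up):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a bounded spin-then-yield lock.
public class BoundedSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);
    private static final int SPIN_LIMIT = 100; // arbitrary spin budget

    public void lock() {
        int spins = 0;
        // Busy-wait a bounded number of times, hoping the owner releases soon.
        while (!held.compareAndSet(false, true)) {
            if (++spins >= SPIN_LIMIT) {
                // Stop burning CPU; a real implementation would park the
                // thread here instead of merely yielding.
                Thread.yield();
                spins = 0;
            }
        }
    }

    public void unlock() {
        held.set(false);
    }

    public boolean isLocked() {
        return held.get();
    }
}
```

The key trade-off the article describes is visible in `SPIN_LIMIT`: too small and we block threads that would have acquired the lock a few iterations later; too large and we waste processor time.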

Spin locking was introduced in JDK 1.4.2 and was off by default, but could be enabled with -XX:+UseSpinning; it is on by default in JDK 1.6. The default number of spins is 10, which can be adjusted with -XX:PreBlockSpin.

Adjusting the spin count with the -XX:PreBlockSpin parameter is quite inconvenient. Suppose I set it to 10, but many threads in the system release their lock just after the spinning gives up (one or two more spins would have acquired the lock); that would be rather embarrassing. So JDK 1.6 introduced adaptive spin locks, making the virtual machine smarter and smarter.

Adaptive spin locks

JDK 1.6 introduced a cleverer spin lock: the adaptive spin lock. Adaptive means the number of spins is no longer fixed, but is determined by the previous spin time on the same lock and the state of the lock's owner. How does that work? If a thread spins successfully, it will spin more the next time, because the virtual machine reasons that since spinning succeeded last time, it is likely to succeed again, and so allows the spin wait to last longer. Conversely, if spinning rarely succeeds for a given lock, future attempts to acquire that lock will spin less or skip spinning entirely, to avoid wasting processor resources.

With adaptive spin locking, as program execution and performance monitoring information accumulate, the virtual machine becomes more and more accurate at predicting the state of a given lock.

Lock elimination

To ensure data integrity we synchronize such operations, but in some cases the JVM detects that there is no possibility of contention on shared data, and so it removes these synchronization locks. Lock elimination is supported by the data from escape analysis.

If there is no contention, why lock? Lock elimination saves the time spent on pointless lock requests. Determining whether a variable escapes requires data-flow analysis in the virtual machine, but isn't it obvious to us programmers? Would we put synchronization around a block of code we know has no data race? But sometimes programs aren't what we think they are. We may not lock explicitly, yet when we use JDK built-in APIs such as StringBuffer, Vector, and Hashtable, implicit locking takes place, for example in StringBuffer.append() and Vector.add():
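The code example appears to have been lost in formatting; a minimal reconstruction of the `vectorTest()` method that the next paragraph refers to might look like this (the class name is an assumption):

```java
import java.util.Vector;

public class VectorEscapeDemo {
    // The Vector never escapes this method: no other thread can ever see it.
    // Although Vector.add() is internally synchronized, escape analysis lets
    // the JVM prove there is no contention and eliminate the locking.
    public static int vectorTest() {
        Vector<Integer> vector = new Vector<>();
        for (int i = 0; i < 10; i++) {
            vector.add(i);   // implicit lock acquire/release, eligible for elimination
        }
        return vector.size();
    }
}
```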

When running this code, the JVM can clearly detect that the vector never escapes the vectorTest() method, so it can boldly eliminate the locking inside the Vector.

Lock coarsening

We know that when using a synchronized lock, we need to keep the scope of the synchronized block as small as possible — only synchronize in the actual scope of the shared data. The goal is to keep the number of operations that need to be synchronized as small as possible, so that if there is a lock contention, the thread waiting for the lock can acquire the lock as quickly as possible.

In most cases the above view is correct, and I have always adhered to it. However, a series of consecutive lock and unlock operations can cause unnecessary performance loss, which is why the concept of lock coarsening was introduced.

Lock coarsening is easy to understand: it merges multiple consecutive lock and unlock operations into a single lock with a wider scope. For example, if the JVM detects that an object is locked and unlocked repeatedly inside a for loop, it merges those operations into one larger-scoped lock, hoisting the lock and unlock out of the loop.
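A typical illustration (a sketch; whether coarsening actually happens is an internal JIT decision):

```java
public class CoarseningDemo {
    // Each append() acquires and releases the StringBuffer's lock. With lock
    // coarsening, the JIT can merge these into one lock held across all three
    // calls, since no other code runs between them.
    public static String concat(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);  // lock/unlock
        sb.append(s2);  // lock/unlock
        sb.append(s3);  // lock/unlock
        return sb.toString();
    }
}
```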

Biased locking

With built-in locks, any entry into the synchronized code involves acquiring and releasing the lock, and that has a cost: the heavyweight lock pays for the OS mutex, and the lightweight lock pays for the CAS comparison. In many cases, however, the code guarded by a built-in lock is only ever entered by a single thread, with no concurrent access at all; frequent locking and unlocking is then pure overhead. Hence biased locking.

With biased locking, when a thread accesses the mutually exclusive resource for the first time, its thread ID is stored in the object header and in the lock record in the stack frame (this counts as acquiring the "lock"). A biased lock is not released until another thread competes for it after it has been acquired; that is, as long as there is no contention, the biased lock keeps holding the lock. The next time the same thread enters the synchronized block, it can access the resource without any locking operation, saving the cost of repeatedly acquiring and releasing the lock.
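This is the classic pattern biased locking targets: one thread re-entering the same monitor repeatedly with no contention. A minimal sketch (the class is a made-up example; biased locking was controlled by the real flag -XX:+/-UseBiasedLocking, enabled by default in JDK 6 and deprecated in JDK 15):

```java
public class BiasedDemo {
    private static final Object lock = new Object();
    private static int counter = 0;

    // A single thread repeatedly entering the same monitor. After the first
    // acquisition biases the lock toward this thread (recording its thread ID
    // in the Mark Word), later entries need no atomic operation at all.
    public static int run() {
        for (int i = 0; i < 1000; i++) {
            synchronized (lock) {
                counter++;
            }
        }
        return counter;
    }
}
```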

Lightweight lock

Lightweight locks, as the name suggests, are much cheaper to acquire and release than heavyweight locks. The key cost of heavyweight locks is the overhead of thread context switching. Lightweight locking avoids that overhead by using CAS (compare-and-swap): when the compare fails, the thread is not suspended; when it succeeds, the thread may access the shared resource directly (as if it were "locked"). A heavyweight lock, by contrast, is implemented with an OS mutex: a thread that fails to obtain the mutex is suspended, while a thread that obtains it may access the resource directly.
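The CAS primitive itself is exposed in Java through the atomic classes; a small demonstration of its succeed-or-fail-without-blocking behavior (the helper name is made up for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    // CAS: atomically set the value to `next` only if it still equals
    // `expected`. Returns true on success, false otherwise, never blocking.
    // This is the primitive that lightweight locking is built on.
    public static boolean tryBump(AtomicInteger value, int expected, int next) {
        return value.compareAndSet(expected, next);
    }
}
```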

From the above analysis, "acquiring the lock" means different things under different implementations. The heavyweight lock is based on a mutex, so acquiring the mutex counts as acquiring the lock; the CAS-based approach decides whether the lock is acquired by whether the compare succeeds. The "lock" here is not a concrete thing but a condition: satisfying the condition means you may access the shared resource. In essence, though, either way, acquiring the lock means modifying the Mark Word to record that you hold it, and releasing the lock clears your thread ID from the Mark Word.

The important difference between lightweight and heavyweight locks is the overhead of thread scheduling and context switching when the “lock” is not available.

Lightweight lock locking:

Before a thread executes a synchronized block, the JVM creates a Lock Record in the current thread's stack frame for storing a copy of the object's Mark Word; this copy is officially called the Displaced Mark Word. The thread then tries to use CAS to replace the Mark Word in the object header with a pointer to the lock record. If this succeeds, the current thread acquires the lock. If it fails, the virtual machine first checks whether the object's Mark Word already points into the current thread's stack frame. If so, the current thread already owns the lock on this object and can proceed directly into the synchronized block. Otherwise, the lock has been taken by another thread. If two or more threads compete for the same lock, the lightweight lock is no longer effective and must inflate into a heavyweight lock; a pointer to the heavyweight lock (the mutex) is then stored in the Mark Word.

Lightweight lock unlock:

When a lightweight lock is released, an atomic CAS operation is used to replace the Displaced Mark Word back into the object header, completing the synchronization. If the replacement fails, another thread has tried to acquire the lock in the meantime, and the releasing thread must wake the suspended threads as it releases the lock.

The locking and unlocking process of lightweight locks, in brief:

  • Try to modify the Mark Word with CAS: if this succeeds directly, the cost is small and the lock is acquired immediately.
  • If that fails to acquire the lock, fall back to spinning (the strategy adopted after the CAS attempt fails).
  • If spinning also fails, the lock inflates into a heavyweight lock and the thread blocks.
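The three steps above can be sketched very loosely in user-level Java (the real logic lives inside the JVM and operates on the Mark Word in the object header; here an AtomicReference stands in, and the class name and spin budget are made up):

```java
import java.util.concurrent.atomic.AtomicReference;

// Loose model of the lightweight-lock acquisition path.
public class InflatableLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private static final int SPIN_LIMIT = 50; // illustrative spin budget

    public void lock() {
        Thread me = Thread.currentThread();
        // Step 1: a single CAS attempt (the cheap lightweight path).
        if (owner.compareAndSet(null, me)) return;
        // Step 2: CAS failed; spin a bounded number of times.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (owner.compareAndSet(null, me)) return;
        }
        // Step 3: spinning failed too; "inflate" to a blocking wait. A real
        // heavyweight lock parks the thread on an OS mutex; wait/notify
        // stands in for that here.
        synchronized (this) {
            while (!owner.compareAndSet(null, me)) {
                try {
                    wait(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    public void unlock() {
        owner.set(null);
        synchronized (this) {
            notifyAll(); // wake any thread blocked in step 3
        }
    }

    public boolean isLocked() {
        return owner.get() != null;
    }
}
```

Note that, unlike this toy, the real lightweight lock only inflates once; after inflation, all threads use the heavyweight path, since locks upgrade but never downgrade.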

Conclusion

So synchronized is not as cumbersome as you might expect. In fact, you can find synchronized in a lot of source code, including the JUC (java.util.concurrent) utility classes. It has its place in real code, so make good use of it. (You have to understand it first, of course.)