This is the fifth article in the In-depth Understanding of Multithreading series. In the previous articles, we started with the implementation principle of synchronized and went on to introduce the implementation principle of Monitor.

Recap

From previous articles, we already know that:

1. Methods are implicitly locked via the ACC_SYNCHRONIZED flag. When a thread executes a method marked ACC_SYNCHRONIZED, it must acquire the lock before it can execute the method body. See An In-depth Understanding of Multithreading (I) — The Realization Principle of Synchronized.

2. Synchronized blocks are implemented with the monitorenter and monitorexit bytecode instructions. When a thread reaches monitorenter, it must acquire the lock before it can execute the code that follows; the lock is released when the thread executes monitorexit. See Understanding Multithreading, Part 4: How Monitor Works.

3. In the HotSpot virtual machine, objects are represented using the OOP-Klass model. For each Java class, when it is loaded by the JVM, the JVM creates an instanceKlass for it, which is stored in the method area and represents the Java class at the JVM level. When we create an object with new in Java code, the JVM creates an instanceOopDesc object, which contains the object header and the instance data. See Understanding Multithreading in Depth (Part 2): Java’s Object Model.

4. The object header mainly contains the GC generational age, lock state flag, hash code, epoch and other information. An object can be in one of five states: unlocked, biased lock, lightweight lock, heavyweight lock, and GC mark. See Deep Understanding of Multithreading (Part 3) — Object Headers in Java.
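As a quick refresher on points 1 and 2 above, both mechanisms are visible in the bytecode. The following sketch (the class name SyncDemo is made up for illustration) shows roughly what javap -v prints for a synchronized method and a synchronized block; the output is abbreviated:

public class SyncDemo {
    public synchronized void syncMethod() { }   // compiled with the ACC_SYNCHRONIZED flag

    public void syncBlock() {
        synchronized (this) { }                 // compiled to monitorenter/monitorexit
    }
}

// javap -v SyncDemo (excerpt, abbreviated):
//   public synchronized void syncMethod();
//     flags: ACC_PUBLIC, ACC_SYNCHRONIZED
//
//   public void syncBlock();
//     ...
//     3: monitorenter
//     ...
//     5: monitorexit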

At the end of the last article, we noted that before JDK 1.6, synchronized was implemented through the enter and exit of ObjectMonitor. This kind of lock is called a heavyweight lock.

Efficient concurrency was a major theme of the improvements from JDK 1.5 to JDK 1.6, where the HotSpot virtual machine development team put a lot of effort into optimizing Java's locks, with techniques such as adaptive spinning, lock elimination, lock coarsening, lightweight locks and biased locks. These techniques are designed to share data between threads more efficiently and to solve contention problems.

In this article, we will introduce the techniques of spinning, lock elimination and lock coarsening.

As a quick note, the techniques described in this article, along with lightweight and biased locking, are transparent to developers. That is, as a Java developer, all you need to know is that you want to use synchronized for locking; the specific lock optimization is chosen by the virtual machine based on the contention it observes.

In other words, after JDK 1.5, the concepts we'll cover here are encapsulated inside synchronized.

Thread state

Before we can explain locks, we need to touch on an important concept: threads and thread states. Let's use a simple example to see how locks relate to threads.

Take doing business at a bank as an example. When you arrive at the bank, you first take a number, then sit in the waiting area until it's your turn. After a while, the loudspeaker calls your number and tells you which counter to go to. You then take the number in your hand to the corresponding counter and start your transaction. While you're doing business, that counter and the teller behind it can serve only you. When you finish your business and leave, the loudspeaker calls the next customer forward.

In this example, each customer is a thread. The chair in front of the counter is the lock, and the teller behind the counter is the shared resource. When you find that you can't do business right away and have to wait for your number, that is blocking. When you hear your number called and get up to do business, that is being woken up. When you sit down in the chair and business begins, you have acquired the lock. When you finish your business and leave, you release the lock.

A thread has five states: initial (New), ready (Runnable), running (Running), blocked (Blocked), and dead (Dead).
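Incidentally, this five-state model is the classic textbook description; the java.lang.Thread.State enum that the JDK actually exposes distinguishes six states (NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED) and folds "ready" and "running" into RUNNABLE. A minimal sketch to observe a blocked thread yourself:

public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();
        Thread t1 = new Thread(() -> {
            synchronized (lock) {
                // hold the lock for a while
                try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lock) { }
        });
        t1.start();
        Thread.sleep(100); // let t1 acquire the lock first
        t2.start();
        Thread.sleep(100); // let t2 reach monitorenter and block
        System.out.println(t2.getState()); // typically prints BLOCKED
    }
}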

Spin locks

In the previous article, we introduced how synchronized is implemented by locking through Monitor, a mutex we call a heavyweight lock to indicate its impact on performance.

This kind of mutual exclusion has a significant performance cost. Java threads are mapped onto the operating system's native threads, and blocking or waking a thread requires the operating system's help to switch from user mode to kernel mode, so these state transitions consume a lot of processor time.

Take the bank example again. When you arrive at the bank and there are already people at the counter, you have to take a number and then go to the waiting area until your number is called. This process wastes time, so is there any way to improve it?

A better design is for the bank to provide ATMs. When you go to the bank to withdraw money, you don't need to take a number or sit in the waiting area; you just find an ATM, queue behind the people already there, and withdraw your money.

This works because withdrawing money is relatively quick. If everyone went to the bank only to withdraw money, or every transaction were very short, there would be no need to take a number, go to a separate waiting area, listen for your number and walk to the corresponding counter.

Correspondingly, Java virtual machine developers, after analyzing large amounts of data, found that shared data is usually locked for only a very short time, too short to justify suspending and resuming threads.

If the physical machine has more than one processor, multiple threads can execute in parallel. We can ask a later-arriving thread to "wait a bit" without giving up its processor time, to see whether the thread holding the lock releases it soon. This "wait a bit" process is called spinning.

Spin locks were introduced in JDK 1.4.2 (off by default, and could be turned on with -XX:+UseSpinning) and have been enabled by default since JDK 1.6.

If you don't know what a spin lock is, you might wonder: it sounds like a spin lock is no different from a blocking lock; both are just waiting.

There's one big difference between standing in front of an ATM and going to the waiting area to wait for your number to be called:

  • If you wait in the waiting area, you don't need to do anything during that time; just mind your own business and wait to be woken up.

  • If you are waiting in front of an ATM, you have to keep an eye on whether there is still anyone in front of you, because no one will wake you up.

Obviously, for a quick transaction, it's more efficient to go straight to the ATM and wait in line.

So, the biggest difference between a spin lock and a blocking lock is whether you give up the processor's execution time. With both, you are waiting to acquire a shared resource; but a blocking lock gives up its CPU time and enters the waiting area, waiting to be woken up, while a spin lock keeps "spinning" in place, repeatedly checking whether the shared resource has become available.

Since a spinning thread just keeps executing a loop body without changing its thread state, it responds faster. However, as the number of spinning threads grows, performance degrades significantly, because every spinning thread occupies a processor and burns CPU time. Spin locks are therefore suitable when contention is not fierce and locks are held only for short periods.
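To make the idea concrete, here is a minimal user-level spin lock sketched with an AtomicBoolean. Note that this is only an illustration of the spinning idea at the Java level; the spinning HotSpot does for synchronized happens inside the VM, not in user code:

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait (spin) until the CAS succeeds. The thread never
        // blocks; it stays runnable and keeps consuming CPU time.
        while (!locked.compareAndSet(false, true)) {
            // spin in place
        }
    }

    public void unlock() {
        locked.set(false);
    }
}

A caller would wrap the critical section as spinLock.lock(); try { ... } finally { spinLock.unlock(); }. If the lock holder finishes quickly, the spinning thread acquires the lock without a single user-to-kernel mode switch; if not, the spinning is pure waste, which is exactly the trade-off described above.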

Lock elimination

In addition to spin locking, there is another lock optimization in the JDK called lock elimination. Take the example of going to the bank to withdraw money.

Do you always need to take a number and wait when you go to the bank? Actually, no. When the bank isn't busy, you may not need to take a number at all; you can just walk up to the counter and be served.

Of course, you can only do that when no one else is competing with you for the counter.

The above scenario corresponds to what lock optimization calls "lock elimination", an optimization the JIT compiler performs on intrinsic locks in specific situations.

When a synchronized block is dynamically compiled, the JIT compiler can use a technique called escape analysis to determine whether the lock object used by the block is accessible to only one thread and is never published to other threads.

If the analysis confirms that the lock object used by the synchronized block is accessible to only one thread, the JIT compiler removes the synchronization on that part of the code when it compiles the block.

Such as the following code:

public void f() {
    Object hollis = new Object();
    synchronized(hollis) {
        System.out.println(hollis);
    }
}

The code locks on the hollis object, but hollis's lifetime is confined to the f() method and it is never accessed by other threads, so the JIT compiler optimizes the lock away. After optimization:

public void f() {
    Object hollis = new Object();
    System.out.println(hollis);
}

Some readers may ask: the code is written by programmers; can't programmers judge for themselves whether a lock is needed? Like the code above, which clearly needs no lock at all, as any experienced developer can see at a glance. That's true, but it's easy to slip up. For example, we often use StringBuffer as a local variable in our code, and StringBuffer's append method is thread-safe and marked synchronized, which developers may overlook. In such cases, the JIT can step in and eliminate the lock.
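For instance, in a method like the following sketch, every append call acquires the StringBuffer's intrinsic lock, but sb never escapes the method, so after escape analysis the JIT is free to eliminate all of that synchronization:

public String concat(String s1, String s2) {
    // sb is a local variable that never escapes this method, so no
    // other thread can ever contend for its lock; the JIT can remove
    // the synchronization inside StringBuffer.append() entirely.
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb.toString();
}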

Those of you who know me know that at this point I would usually decompile the code and show the decompiled result to prove that the lock optimization exists.

However, decompilation tools worked in many previous examples because those "optimizations", such as syntactic sugar, happen at the javac compilation stage, not the JIT compilation stage. Lock optimization, on the other hand, is done by the JIT compiler, so existing decompilation tools cannot show the actual result of the optimization. (I posted a separate article on my Knowledge Planet about the relationship and differences between javac compilation and JIT compilation.)

However, if you are interested, you can still see it for yourself, though it's a bit involved. You first need to build a fastdebug version of the JDK yourself, then run your .class file with the java command and the -XX:+PrintEliminateLocks flag. The JVM also has to run in server mode.

In short, all the reader needs to know is that with synchronized, if escape analysis shows there is no thread-safety issue, the JIT performs lock elimination.

Lock coarsening

As many of you know, when locking is required in code, we advocate minimizing the granularity of locks to avoid unnecessary blocking.

This is why many people prefer synchronized blocks over synchronized methods: blocks tend to have finer granularity, which makes sense.

Back at the bank counter, the most efficient way to do business is to deal only with banking matters while you're sitting in front of the counter. It would waste everyone's time if you pulled out your phone and made a few calls to ask a friend which account the money should go to. The best approach is obviously to prepare all the relevant information in advance and handle the transaction in one go.

The same is true with locking: put irrelevant preparation work outside the lock and handle only the concurrency-sensitive part inside it. This helps improve efficiency.

So what does this have to do with lock coarsening? Reducing lock granularity is the right thing to do most of the time, but there is one special case in which the virtual machine applies the opposite optimization, called lock coarsening.

It's like going to the bank and, in order to shorten each individual visit, splitting one piece of business into five separate transactions. That is counterproductive: it only adds the extra time of taking a number, queuing and being called over and over.

If a piece of code repeatedly locks and unlocks the same object, that is relatively expensive. In this case, you can appropriately broaden the scope of the lock to reduce the performance cost.

When the JIT finds that a series of consecutive operations repeatedly locks and unlocks the same object, or even that the lock operation sits inside a loop body, it coarsens the lock, extending the scope of synchronization to cover the whole sequence of operations.

Such as the following code:

for (int i = 0; i < 100000; i++) {
    synchronized (this) {
        doSomething();
    }
}

will be coarsened to:

synchronized (this) {
    for (int i = 0; i < 100000; i++) {
        doSomething();
    }
}

This does not actually conflict with our advice to reduce lock granularity. Reducing lock granularity says: don't do your preparation work, or anything unrelated to the transaction, at the bank counter. Lock coarsening says: if the same person has several transactions to handle, it's better to complete them all at the same window in one visit than to take a number again and again.
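The StringBuffer example from the lock-elimination section is also the classic illustration here: if the lock cannot simply be eliminated (say, because the buffer escapes the method), a run of back-to-back appends can still be coarsened. A sketch of the situation:

private final StringBuffer sb = new StringBuffer(); // shared, so its lock is not eliminated

public void log(String a, String b, String c) {
    // Each append() locks and unlocks sb. Since the three calls are
    // adjacent operations on the same object, the JIT may coarsen
    // them into a single lock acquisition held across all three.
    sb.append(a);
    sb.append(b);
    sb.append(c);
}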

Conclusion

Since Java 6/Java 7, the Java virtual machine has improved the implementation of intrinsic locking. These optimizations mainly include lock elimination, lock coarsening, biased locking and adaptive locking. They only take effect in the Java virtual machine's server mode (i.e., you may need to pass the -server option on the command line when running a Java program to turn them on).

This article mainly introduced the concepts of spin locks, lock coarsening and lock elimination. During JIT compilation, the virtual machine applies these three techniques as appropriate, with the goal of reducing lock contention and improving performance.