Preface

The performance of synchronized was a concern prior to JDK 1.6, but since 1.6 the JVM team has heavily optimized it, making its performance comparable to ReentrantLock. So what optimizations did the JVM team make?

First of all, where do you optimize? As we know, a “lock” is a concrete implementation of mutual-exclusion synchronization, and the biggest performance cost of mutual exclusion is blocking: suspending and resuming threads requires switching from user mode to kernel mode, and these transitions put a lot of pressure on a system’s concurrency performance.

Therefore, since suspending and resuming threads requires switching into the operating system’s kernel mode, the direction of optimization is to reduce thread blocking.

Java 1.6 introduced the “biased lock” and the “lightweight lock” to reduce the cost of acquiring and releasing locks. In Java SE 1.6 a lock has four states, from lowest to highest: unlocked, biased, lightweight, and heavyweight. The state escalates (inflates) under contention. Note: once a lock has been upgraded it cannot be downgraded (more on why later).

  1. Biased locking
  2. Lightweight lock
  3. Heavyweight lock
  4. Lock elimination
  5. Lock coarsening
  6. Beyond the virtual machine: how programmers can optimize locks themselves

1. Biased locking

The virtual machine team found from experience that in most cases locks are not only uncontended, but are repeatedly acquired by the same thread. Biased locking was introduced to make lock acquisition cheaper for that thread.

When a thread accesses a synchronized block and acquires the lock, it stores the lock-owning thread’s ID in the object header’s Mark Word and in a lock record in its stack frame. From then on, the thread needs no CAS operation to lock or unlock when entering and leaving the synchronized block; it simply tests whether the object’s Mark Word still holds a bias pointing to the current thread.

If the test succeeds, the thread has acquired the lock. If it fails, the thread checks whether the biased-lock flag in the Mark Word is set to 1 (meaning the object is still in biased mode). If it is not, the thread uses CAS to compete for the lock; if it is, the thread tries a CAS to point the object header’s bias at itself.

If the lock is not acquired by any other thread, the thread holding the biased lock will never need to synchronize.

When another thread attempts to acquire the lock, the bias is revoked and subsequent operations are upgraded to a lightweight lock.

Note: biased locking improves the performance of code that is synchronized but uncontended. It also has a drawback: if most locks in a program are in fact accessed by multiple different threads, the bias is pure overhead. On JVMs from 1.6 on, biased locking is enabled by default and can be disabled with the JVM flag -XX:-UseBiasedLocking, in which case the program enters the lightweight lock state directly.

As you can see, the Mark Word is the key to implementing biased locking. The lightweight lock that follows is implemented through it as well.
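
To see the Mark Word for yourself, you can print an object’s header with OpenJDK’s JOL tool. A minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath (the exact header contents depend on your JVM version and flags):

  import org.openjdk.jol.info.ClassLayout;

  public class MarkWordDemo {
    public static void main(String[] args) {
      Object lock = new Object();
      // Before any locking: the header shows the unlocked (or biasable) state.
      System.out.println(ClassLayout.parseInstance(lock).toPrintable());

      synchronized (lock) {
        // Inside the synchronized block: the header now reflects a biased or
        // lightweight (thin) lock, depending on JVM version and flags.
        System.out.println(ClassLayout.parseInstance(lock).toPrintable());
      }
    }
  }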

2. Lightweight locks

What is a lightweight lock? “Lightweight” is relative to traditional locking built on operating-system mutexes, which is accordingly called “heavyweight” locking. Note that lightweight locks are not meant to replace heavyweight locks; they exist to cut the cost of the traditional OS-mutex mechanism when there is no multi-threaded contention.

Before executing a synchronized block, the JVM creates space for a lock record in the current thread’s stack frame and copies the object header’s Mark Word into it; this copy is officially called the Displaced Mark Word. The thread then attempts a CAS to replace the Mark Word in the object header with a pointer to the lock record.

If the CAS succeeds, the current thread holds the lock. If it fails, another thread is competing for the lock, and the current thread tries to acquire it by spinning. Note that the thread does not suspend itself: it retries a bounded number of times (10 by default, adjustable with -XX:PreBlockSpin) to avoid the overhead of switching into kernel mode.
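
The spirit of this CAS-plus-spin approach can be shown with a toy spin lock built on java.util.concurrent.atomic. This is a didactic sketch only: HotSpot’s lightweight lock works directly on the object header, not on a separate field like the hypothetical owner below:

  import java.util.concurrent.atomic.AtomicReference;

  // Toy spin lock: the owner field plays the role of the Mark Word pointer.
  public class ToySpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public boolean tryLock(int maxSpins) {
      Thread me = Thread.currentThread();
      // Spin a bounded number of times instead of blocking in the kernel.
      for (int i = 0; i < maxSpins; i++) {
        if (owner.compareAndSet(null, me)) {
          return true;            // CAS won: we hold the lock
        }
      }
      return false;               // caller would fall back to blocking
    }

    public void unlock() {
      owner.compareAndSet(Thread.currentThread(), null);
    }
  }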

If the spin fails, or if more threads join the contention for the same lock, the lightweight lock stops paying off and inflates into a heavyweight lock.

So why can’t a lock be downgraded once it has inflated to heavyweight? Consider what would happen if it could: if the lock dropped back to lightweight while a thread held it through a long wait, every other thread would spin-wait instead of blocking, burning CPU the whole time. So once a lock is upgraded to heavyweight it stays heavyweight, precisely to stop lightweight-lock spinning from wasting CPU.

A biased lock stores the thread ID in the object header after the first acquisition; subsequent acquisitions by that thread involve no synchronization at all, which is effectively lock-free. A lightweight lock, by contrast, still needs a CAS on the object header on every acquisition. Without contention this operation is very cheap and does not involve the operating system’s mutual-exclusion mechanism.

3. Heavyweight locks

Lightweight locks are acquired by spinning, while heavyweight locks block: the operating system switches the thread into kernel mode and suspends it, which is very expensive.
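
You can watch this blocking from Java by observing a thread’s state while it waits on a contended monitor. A small self-contained demo (the sleep timings are illustrative, not guaranteed):

  public class BlockedDemo {
    public static void main(String[] args) throws InterruptedException {
      final Object lock = new Object();

      Thread holder = new Thread(() -> {
        synchronized (lock) {
          try { Thread.sleep(2000); } catch (InterruptedException ignored) {}
        }
      });
      Thread waiter = new Thread(() -> {
        synchronized (lock) { /* enters only after holder releases */ }
      });

      holder.start();
      Thread.sleep(100);   // let holder grab the monitor first
      waiter.start();
      Thread.sleep(100);   // give waiter time to hit the contended monitor

      // Once the lock inflates, the waiting thread is parked by the OS.
      System.out.println(waiter.getState());  // typically BLOCKED
    }
  }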

Here’s a look at the pros and cons of each lock:

| Lock | Advantages | Disadvantages | Applicable scenarios |
| --- | --- | --- | --- |
| Biased lock | Locking and unlocking add no extra cost; only a nanosecond-scale gap versus an unsynchronized method | If threads contend for the lock, there is the extra cost of revoking the bias | Only one thread ever accesses the synchronized block |
| Lightweight lock | Competing threads do not block, improving response time | A thread that never wins the lock burns CPU by spinning | Response time matters and synchronized blocks execute very quickly |
| Heavyweight lock | Waiting threads do not spin, so they consume no CPU | Threads block and response time is slow | Throughput matters and synchronized blocks execute for a long time |

You can see when to use which lock.

4. Lock elimination

What is lock elimination? It happens when the JIT compiler removes synchronization from code that does not actually need it. This is the most thorough lock optimization of all: lock elimination saves the time spent on pointless lock requests entirely.

So you have to ask, who would be stupid enough to synchronize when they don’t need to?

Take a look at the following code:

  public String[] createStrings(String[] args) {
    Vector<String> v = new Vector<>();   // v is used only inside this method
    for (int i = 0; i < 100; i++) {
      v.add(Integer.toString(i));        // Vector.add is synchronized
    }
    return v.toArray(new String[]{});    // only the array escapes, not v
  }


Note: the variable v is used only inside this method. It is purely a local variable that never escapes the method (after escape analysis it may even be allocated on the stack), so no other thread can ever touch it and no synchronization is necessary, even though Vector’s add operations are synchronized. The virtual machine detects this and removes the locks.

Lock elimination relies on a technique called escape analysis: observing whether a variable can escape a given scope. Here, v does not escape the function. If the function returned v itself instead of a string array, then v would escape the current function; that is, v could be accessed by another thread, and the virtual machine could not eliminate its lock operations.
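
To make the contrast concrete, here is a hypothetical escaping variant of the method above: because the Vector itself is returned, the caller (and therefore other threads) could reach it, and the JIT can no longer prove the locks are useless:

  // v escapes: the caller can hold a reference to the Vector, so its
  // internal synchronization cannot be eliminated.
  public Vector<String> createStringsEscaping() {
    Vector<String> v = new Vector<>();
    for (int i = 0; i < 100; i++) {
      v.add(Integer.toString(i));
    }
    return v;   // the object leaves the method's scope
  }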

5. Lock coarsening

In principle, we always recommend keeping synchronized blocks as small as possible when writing code. With fewer operations inside the block, a thread waiting for the lock can acquire it sooner whenever there is contention.

For the most part this principle holds. But if a series of consecutive operations repeatedly locks and unlocks the same object, even when the locking sits inside a loop body, the frequent synchronization causes unnecessary performance loss even without thread contention.

If the virtual machine detects a string of piecemeal operations all locking the same object, it extends (coarsens) the scope of the lock to cover the whole operation sequence; that is, it enlarges the synchronized block.
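
A classic illustration uses StringBuffer, whose append method is synchronized. Conceptually, the JIT may turn the first form into something like the second; this is a sketch of the idea, not literal compiler output:

  // Before coarsening: every append locks and unlocks buffer,
  // because StringBuffer.append is itself synchronized.
  public String concat() {
    StringBuffer buffer = new StringBuffer();
    for (int i = 0; i < 100; i++) {
      buffer.append(i);
    }
    return buffer.toString();
  }

  // After coarsening (conceptually): one lock spans the whole loop.
  public String concatCoarsened() {
    StringBuffer buffer = new StringBuffer();
    synchronized (buffer) {
      for (int i = 0; i < 100; i++) {
        buffer.append(i);   // no per-call lock acquisition inside
      }
    }
    return buffer.toString();
  }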

6. Beyond the virtual machine: how programmers can optimize locks themselves

  1. Reduce lock holding time.

  2. Reduce the granularity of locks.

  3. Replace the exclusive lock with a read-write lock

  4. Lock separation

  1. Reduce lock holding time

If you hold a lock for a long time, the next thread waits that long too; if one thread waits an extra second, 10,000 waiting threads add up to 10,000 extra seconds. So synchronize only where necessary: this significantly reduces how long threads hold the lock and improves system throughput.
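
For example, instead of synchronizing a whole method, lock only the statements that actually touch shared state. A hypothetical Processor class, where parse stands in for slow work on purely local data:

  import java.util.HashMap;
  import java.util.Map;

  public class Processor {
    private final Map<String, String> store = new HashMap<>();

    // Bad: holds the lock for the whole method, including the slow parse.
    public synchronized void processBad(String payload) {
      String parsed = parse(payload);   // slow, touches no shared state
      store.put(payload, parsed);       // the only part needing the lock
    }

    // Better: lock only the shared-state update.
    public void processGood(String payload) {
      String parsed = parse(payload);   // done outside the lock
      synchronized (this) {
        store.put(payload, parsed);
      }
    }

    private String parse(String payload) {
      return payload.trim().toLowerCase();   // stand-in for expensive work
    }
  }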

  2. Reduce the granularity of locks

This is the opposite of the coarsening the virtual machine does for us, described above. But that applies only to piecemeal, repeated locking of the same object; in most cases, reducing lock granularity is an effective way to weaken contention between threads. ConcurrentHashMap, for example, locks only a single bucket of the hash table, unlike Hashtable, which locks the entire object.
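
The same idea can be applied by hand with lock striping: split one big lock into several and pick a stripe by hash. A minimal sketch (ConcurrentHashMap’s real implementation is considerably more sophisticated):

  public class StripedCounter {
    private final Object[] locks = new Object[16];
    private final long[] counts = new long[16];

    public StripedCounter() {
      for (int i = 0; i < locks.length; i++) locks[i] = new Object();
    }

    public void increment(Object key) {
      int stripe = (key.hashCode() & 0x7fffffff) % locks.length;
      synchronized (locks[stripe]) {   // contend only within one stripe
        counts[stripe]++;
      }
    }
  }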

  3. Replace the exclusive lock with a read-write lock

In scenarios with many reads and few writes, a ReadWriteLock can markedly improve a system’s concurrency: read operations do not affect data integrity or consistency, so readers need not exclude one another. ConcurrentHashMap’s get method goes even further and needs no lock at all.
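
A typical usage sketch with the JDK’s ReentrantReadWriteLock guarding a plain HashMap: many readers proceed in parallel, and only writers are exclusive:

  import java.util.HashMap;
  import java.util.Map;
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  public class ReadMostlyCache {
    private final Map<String, String> map = new HashMap<>();
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    public String get(String key) {
      rw.readLock().lock();        // shared: readers don't block readers
      try {
        return map.get(key);
      } finally {
        rw.readLock().unlock();
      }
    }

    public void put(String key, String value) {
      rw.writeLock().lock();       // exclusive: blocks readers and writers
      try {
        map.put(key, value);
      } finally {
        rw.writeLock().unlock();
      }
    }
  }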

  4. Lock separation

Push the read/write lock idea further and you arrive at lock separation: locks are split according to the different functions of the operations. The JDK’s LinkedBlockingQueue is a best practice of lock separation: the take and put operations use two different locks, because they do not compete with each other at all; the queue data structure decouples operations that would otherwise share one lock.
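
The shape of this design can be sketched as a simplified two-lock queue in the spirit of LinkedBlockingQueue; the real class adds capacity bounds and the await/signal machinery that this sketch omits:

  import java.util.concurrent.atomic.AtomicInteger;
  import java.util.concurrent.locks.ReentrantLock;

  // Producers contend only on putLock, consumers only on takeLock.
  public class TwoLockQueue<E> {
    static final class Node<E> {
      E item;
      Node<E> next;
      Node(E x) { item = x; }
    }

    private Node<E> head = new Node<>(null);   // dummy node
    private Node<E> tail = head;
    private final AtomicInteger count = new AtomicInteger();
    private final ReentrantLock putLock = new ReentrantLock();
    private final ReentrantLock takeLock = new ReentrantLock();

    public void put(E e) {
      putLock.lock();
      try {
        Node<E> n = new Node<>(e);
        tail.next = n;              // producers touch only the tail
        tail = n;
        count.incrementAndGet();    // also publishes the node to consumers
      } finally {
        putLock.unlock();
      }
    }

    public E poll() {
      takeLock.lock();
      try {
        if (count.get() == 0) return null;
        Node<E> first = head.next;  // consumers touch only the head
        head = first;
        E item = first.item;
        first.item = null;
        count.decrementAndGet();
        return item;
      } finally {
        takeLock.unlock();
      }
    }
  }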

7. To summarize

Today we summarized a number of lock optimizations: those the virtual machine performs, such as biased locks, lightweight locks, spin locks, lock coarsening, and lock elimination, and the strategies we should apply ourselves when writing code, such as reducing lock holding time, reducing lock granularity, using read-write locks where reads far outnumber writes, and separating locks through sensible design.

In short, concurrency is an art, and improving concurrent performance is the pursuit of every advanced programmer.

Good luck!!