

User mode and kernel mode

None of the JVM's underlying mechanisms can escape the operating system. Seen from the operating system's perspective, a program runs in one of two modes: kernel mode and user mode.

What are user mode and kernel mode

  • User mode: memory access is restricted and peripheral devices cannot be accessed directly; the CPU can be preempted, so CPU time can be given to other programs.
  • Kernel mode: the CPU can access all data in memory as well as peripheral devices such as hard disks and network cards, and it can switch itself from one program to another.

Why user mode and kernel mode?

Because different programs must be prevented from reading each other's memory, or from freely accessing peripheral devices and sending data onto the network, access is divided into two privilege levels: user mode and kernel mode.

Which JVM is this article about?

This article focuses on Oracle's Hotspot implementation; other VMs such as IBM J9 and TaobaoVM have similar implementations.

Synchronized lock escalation

Many articles have pointed out that in the early days of the JDK (before 1.6, not including 1.6), synchronized was implemented as a heavyweight lock.

What is a lock

From the latches and iron locks of ancient times to today's combination locks and fingerprint locks, locks have kept improving in portability and security, making the protection of private property more efficient and sound. In the computer world there was no lock in the single-machine, single-thread era. Once competition for resources appeared, we realized that some scenarios need to be locked to indicate temporary ownership. Computer locks started as pessimistic locks and have since evolved into optimistic locks, biased locks, segmented locks, and so on. Locks provide two main properties: mutual exclusion and invisibility. Because of the lock, some operations are a black box to the outside world; only the lock holder knows what changes have been made to the protected variables.

What is a heavyweight lock?

Early synchronized was called a heavyweight lock because acquiring the lock required a system call into the kernel.

Why heavyweight locks

Why does going through the kernel make it a heavyweight lock? This starts from the underlying implementation of synchronized. In the early implementation, a Mutex Lock was used directly for simplicity. Synchronized should really be called a monitor lock, and it essentially relies on the Mutex Lock of the underlying operating system. Each object corresponds to a marker, which can be called a "mutex", and this mutex guarantees that only one thread can access the object at any one time. The mutex involves signals between the CPU and the memory northbridge, or the bus, and executing it requires calls into the operating system.

Since Java threads are mapped onto native threads of the operating system, blocking or waking up a thread needs the operating system's help, which requires a transition from user mode to kernel mode and costs a lot of processor time. Synchronized is therefore a heavyweight operation in the Java language. It is heavy because it needs the operating system, the "big brother", to help with scheduling, which involves system calls and interrupts. The following assembly code briefly illustrates the system-call process.

section .text
global _start

_start:
    mov edx, len      ; message length
    mov ecx, msg      ; message to write
    mov ebx, 1        ; file descriptor 1: stdout
    mov eax, 4        ; system call number 4: sys_write
    int 0x80          ; trap into the kernel
    mov ebx, 0        ; exit code 0
    mov eax, 1        ; system call number 1: sys_exit
    int 0x80

When a program performs a system call, it first uses a soft-interrupt instruction such as int 0x80: the current context is saved, the system call is looked up, the kernel executes it, and then the context is restored. Each thread has two stacks, a kernel stack and a user stack. When the interrupt is executed there is a switch from user mode to kernel mode, and the system call switches stacks; because the kernel does not trust user mode, a series of additional checks must be done. The return path of the system call also has to check things such as whether rescheduling is needed, and the context must be saved and restored. This is why it is a heavyweight lock.

Why is it a synchronization monitor

Start with the code below, a simple synchronized block

public class SynLock {
    public void testSynBlock() {
        synchronized (this) {
            System.out.println("steven");
        }
    }
}

Decompiling the bytecode with javap -c gives readable output in which the key instructions are monitorenter and monitorexit.
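
The decompiled output of testSynBlock looks roughly like this (a sketch; exact constant-pool indices and bytecode offsets depend on the compiler):

public void testSynBlock();
  Code:
     0: aload_0
     1: dup
     2: astore_1
     3: monitorenter
     4: getstatic     #2    // Field java/lang/System.out:Ljava/io/PrintStream;
     7: ldc           #3    // String steven
     9: invokevirtual #4    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    12: aload_1
    13: monitorexit
    14: goto          22
    17: astore_2
    18: aload_1
    19: monitorexit
    20: aload_2
    21: athrow
    22: return
  Exception table:
     from    to  target type
         4    14    17   any
        17    20    17   any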

The rough meaning: every object has a monitor associated with it. The thread that executes the monitorenter instruction tries to obtain ownership of the monitor associated with the objectref. If another thread already owns that monitor, the current thread waits until the object is unlocked.

In the Java language, monitorenter and monitorexit are used for synchronized blocks; they are not used for synchronized methods.

Now there are two more questions:

  1. Why is there one monitorenter but one or more monitorexit?
  2. Why are the monitorenter and monitorexit instructions not used for synchronized methods?

Here’s the answer:

The monitorenter process is as follows:

If monitor’s number of entries is 0, the thread enters Monitor, then sets the number of entries to 1, and the thread is the owner of Monitor. If the thread already owns the monitor and simply re-enters, the number of entries into the monitor is increased by one (typical reentrant lock logic). If another thread has occupied monitor, the thread blocks until the number of monitor entries is zero, and then tries again to acquire ownership of monitor

The thread executing monitorexit must be the owner of the monitor.

When the instruction is executed, the monitor's entry count is decreased by one. If the count reaches zero, the thread exits the monitor and is no longer its owner, and other threads blocked on the monitor can then try to take ownership of it.

In other words, monitorexit releases the lock and must be executed at every possible exit of the code block, including the exception path. As the decompiled output above shows, one monitorenter corresponds to two monitorexit instructions.
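
The entry count is also what makes synchronized reentrant. A minimal sketch (hypothetical class name) that runs without deadlocking, because the nested acquisition only bumps the count:

import java.util.concurrent.TimeUnit;

public class ReentrantDemo {

    public synchronized void outer() {
        // the current thread already owns the monitor of "this";
        // calling inner() just raises the entry count from 1 to 2
        inner();
    }

    public synchronized void inner() {
        System.out.println("re-entered the same monitor");
    }

    public static void main(String[] args) {
        new ReentrantDemo().outer(); // prints and returns, no deadlock
    }
}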

Hotspot source, InterpreterRuntime::monitorenter:

IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
  if (PrintBiasedLockingStatistics) {
    Atomic::inc(BiasedLocking::slow_path_entry_count_addr());
  }
  Handle h_obj(thread, elem->obj());
  assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
         "must be NULL or an object");
  if (UseBiasedLocking) {
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
  } else {
    ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
  }
  assert(Universe::heap()->is_in_reserved_or_null(elem->obj()),
         "must be NULL or an object");
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END

Second question: first look at the code of a synchronized method.

public class SynLock {
    private synchronized void testSynMethod() {
        System.out.println("steven");
    }
}

Compiling and decompiling from the command line is tedious, so this time use IDEA with the jclasslib plug-in to analyze it:

As the bytecode view shows, there are no monitor-related instructions in the method body.
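
A javap -v style view of testSynMethod looks roughly like this (a sketch; constant-pool indices depend on the compiler); note the flags line:

private synchronized void testSynMethod();
    descriptor: ()V
    flags: ACC_PRIVATE, ACC_SYNCHRONIZED
    Code:
       0: getstatic     #2    // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3    // String steven
       5: invokevirtual #4    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return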

The JVM uses a method access flag, ACC_SYNCHRONIZED, to mark such methods. When the JVM execution engine executes a method, it reads the method's access_flags from the method area and checks whether the ACC_SYNCHRONIZED flag is set. If it is, the method is a synchronized method, and the executing thread must acquire the monitor of the current object before executing the method body.

The monitored object is this for an instance method, or the corresponding Class object for a static method. Other common method access flags include ACC_PUBLIC, ACC_PRIVATE, ACC_STATIC, ACC_FINAL, ACC_NATIVE, and so on.

Lock escalation

Synchronized's performance was widely criticized before JDK 1.6. Its developers also felt that the gap between this lock and the AQS-based implementations in JUC was too large, so version 1.6 brought a major update. This is where the now commonly mentioned concepts of lock upgrading (lock inflation) come from. The overall idea: do not bother the operating system unless you have to; whatever can be solved in user mode should not go through the kernel.

The upgrade process

No lock (the lock object has just been initialized) -> biased lock (a single thread requests the lock) -> lightweight lock (mild multi-thread competition) -> heavyweight lock (too many threads or long critical sections, where excessive spinning would waste CPU).

Object header

Before verifying this, a little background knowledge is needed: where is the lock state stored?

From the analysis above, synchronization always goes through the monitor associated with an object. The lock state itself is stored in the object's mark word, which is part of the Java object header, and the mark word is closely related to the various kinds of Java locks.

The mark word is 32 bits long on a 32-bit VM and 64 bits long on a 64-bit VM (with compressed pointers disabled; the relevant JVM flag is UseCompressedOops, for compressed ordinary object pointers). The mark word records the current state of the object, and that state determines what the mark word stores. The object header contains two words: the mark word is the first word and holds lock information, the hash code, GC information, and so on; the klass word is the second word and points to the object's class metadata. The lock states on a 64-bit VM are summarized in the following table.

State              Flag bits   Stored content
Unlocked           01          object hash code, object generational age
Lightweight lock   00          pointer to the lock record on the thread's stack
Heavyweight lock   10          pointer to the heavyweight lock (monitor)
GC mark            11          empty (no information needs to be recorded)
Biased lock        01          biased thread ID, bias timestamp (epoch), object generational age
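
For reference, a simplified sketch of the 64-bit mark word bit layout, paraphrased from the comments in Hotspot's markOop.hpp (the single biased_lock bit sits right before the final two lock bits, which is how the unlocked and biased states can share the flag 01):

unused:25 | hash:31  | unused:1 | age:4 | biased_lock:0 | lock:01   (unlocked)
thread:54 | epoch:2  | unused:1 | age:4 | biased_lock:1 | lock:01   (biased)
pointer to lock record on the stack:62                  | lock:00   (lightweight locked)
pointer to heavyweight monitor:62                       | lock:10   (heavyweight locked)
                                                        | lock:11   (GC marked)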

Why was this process chosen?

The JDK developers did a lot of statistics and concluded that although developers add synchronized for mutually exclusive access to resources, in practice the competition for those resources is rare or very short-lived, which means many locks are unnecessary; see the sketch below.
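
A classic illustration: a StringBuffer that never escapes a method can never actually be contended, yet every append() call is a synchronized method (a minimal, hypothetical helper):

public static String concat(String a, String b) {
    // sb is a local variable and never escapes this method, so no other
    // thread can ever touch it; the synchronization on append() buys nothing
    StringBuffer sb = new StringBuffer();
    sb.append(a);
    sb.append(b);
    return sb.toString();
}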

The Hotspot source for the synchronizer is synchronizer.cpp, shown below, where you can see both the biased-locking path and the CAS path.

revoke_and_rebias

void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
 if (UseBiasedLocking) {
    if (!SafepointSynchronize::is_at_safepoint()) {
      BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
      if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
        return;
      }
    } else {
      assert(!attempt_rebias, "can not rebias toward VM thread");
      BiasedLocking::revoke_at_safepoint(obj);
    }
    assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
 }

 slow_enter(obj, lock, THREAD);
}
void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
  markOop mark = obj->mark();
  assert(!mark->has_bias_pattern(), "should not see bias pattern here");

  if (mark->is_neutral()) {
    // Anticipate successful CAS -- the ST of the displaced mark must
    // be visible <= the ST performed by the CAS.
    lock->set_displaced_header(mark);
    if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
      TEVENT(slow_enter: release stacklock);
      return;
    }
    // Fall through to inflate() ...
  } else
  if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) {
    assert(lock != mark->locker(), "must not re-lock the same lock");
    assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
    lock->set_displaced_header(NULL);
    return;
  }

#if 0
  // The following optimization isn't particularly useful.
  if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
    lock->set_displaced_header(NULL);
    return;
  }
#endif

  // The object header will never be displaced to this lock,
  // so it does not matter what the value is, except that it
  // must be non-zero to avoid looking like a re-entrant lock,
  // and must not look locked either.
  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
}

How do I prove that an upgrade process exists

The lock state can be read from the object header, so the upgrade process can be verified by observing the object header. There are two ways I know of to print it: one is to attach an agent after the object is created and call something like ObjectSizeService.sizeOf; the other is to use JOL, provided by the OpenJDK project.

JOL (Java Object Layout) shows the layout of Java objects. Introduce the Maven coordinates:

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.10</version>
</dependency>

Test code:

@Test
public void test_object_layout() {
    Object o = new Object();
    System.out.println(VM.current().details());
    System.out.println(ClassLayout.parseInstance(o).toPrintable());
}
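
On a 64-bit Hotspot VM with compressed oops enabled, the output looks roughly like this (a sketch; the exact header bytes and the klass pointer vary from run to run):

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           e5 01 00 f8 (compressed klass pointer, varies)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total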

The object header is 12 B, and the remaining 4 B are alignment padding (on a 64-bit VM object sizes must be multiples of 8). Since this Object has no fields, its instance data takes 0 B.

What is in the 12 B object header

So what exactly is stored in these 12 bytes? (The object header length differs between 32-bit and 64-bit VMs; mine is a 64-bit VM.) The OpenJDK documentation explains it as follows.

According to the documentation, the mark word is the first word of the object header and contains lock information, the hash code, GC information, and so on; the klass word is the second word and mainly points to the object's class metadata. From the object header printed by JOL above, an object header is 12 bytes: 8 bytes are the mark word and the remaining 4 bytes are the (compressed) klass word. The lock-related part is the mark word, so the next step is to analyze the information in the mark word.

Verifying biased locking with code

@Test
public void test_syn_lock() {
    Object o = new Object();
    synchronized (o){
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
    System.out.println("--------------------------------------");
    System.out.println(ClassLayout.parseInstance(o).toPrintable());
}

From the mark word in the printed result there is no biased lock; the object goes straight to a lightweight lock. Why? The JDK developers did this intentionally: during JVM startup there are many object allocations, internal JVM threads, GC activity, and so on, so biased locking is not enabled immediately; by default it is delayed by 4 seconds. Add a five-second sleep to the code above and you get a different result.

@Test
public void test_syn_lock() throws InterruptedException {
    TimeUnit.SECONDS.sleep(5L);
    Object o = new Object();
    synchronized (o) {
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
    System.out.println("--------------------------------------");
    System.out.println(ClassLayout.parseInstance(o).toPrintable());
}

The JVM enables biased locking automatically after a default delay of 4 seconds (at that point the lock is anonymously biased and does not yet point to any thread). The delay can be removed with -XX:BiasedLockingStartupDelay=0, and biased locking can be disabled entirely with -XX:-UseBiasedLocking.

Verify heavyweight locks:

@Test
public void test_syn_heavy_lock() throws InterruptedException {
    Object o = new Object();
    for (int i = 0; i < 100; i++) {
        new Thread(() -> {
            synchronized (o) {
                try {
                    TimeUnit.SECONDS.sleep(1L);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
    }
    TimeUnit.SECONDS.sleep(5L);
    System.out.println(ClassLayout.parseInstance(o).toPrintable());
    TimeUnit.SECONDS.sleep(100L);
    System.out.println(ClassLayout.parseInstance(o).toPrintable());
}

The results show that a lock can only be upgraded, never downgraded: when contention becomes severe it is upgraded to a heavyweight lock. Biased locks and lightweight locks are maintained in user mode without entering the kernel, whereas heavyweight locks require switching to kernel mode (maintained by the OS). This is the essence of the synchronized performance improvements since JDK 1.6.

Lightweight lock

The lock-upgrade process includes a lightweight lock. Lightweight locks are usually spin locks based on CAS (Compare And Swap, or Compare And Exchange); from the Java developer's point of view they can also be considered lock-free, because there is no locking code at the Java level.
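
To make the idea of a spin lock built on CAS concrete, here is a minimal, hypothetical sketch (not production code: it is non-reentrant, unfair, and burns CPU while spinning):

import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // spin until the CAS from null to the current thread succeeds
        while (!owner.compareAndSet(null, current)) {
            // busy-wait: stay in user mode instead of blocking in the kernel
        }
    }

    public void unlock() {
        // only the owning thread's CAS can succeed here
        owner.compareAndSet(Thread.currentThread(), null);
    }
}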

CAS(V, A, B): variable V, expected value A, new value B. CAS can suffer from the ABA problem (solved with a version number, for example AtomicStampedReference); simple values of primitive types usually do not need a version number.
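
A minimal sketch (hypothetical class name) of how AtomicStampedReference uses a version stamp to defend against ABA:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // the stamp acts as a version number: even if the value goes A -> B -> A,
        // the stamp keeps increasing, so a stale CAS is rejected
        AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(100, 0);

        int stamp = ref.getStamp();
        Integer value = ref.getReference();

        // succeeds only if both the value and the stamp are still unchanged
        boolean ok = ref.compareAndSet(value, 101, stamp, stamp + 1);
        System.out.println("CAS succeeded: " + ok + ", new stamp: " + ref.getStamp());
    }
}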

After JDK 1.6 a large number of CAS operations were introduced, such as the AtomicXXX atomic classes. Since synchronized is implemented entirely in C++ and is hard to trace, let's take AtomicInteger as the CAS example: AtomicInteger.incrementAndGet() calls compareAndSwapInt of the Unsafe class. You can download the Hotspot source from Oracle's website and analyze it; the general idea is that it uses the lock and cmpxchg instructions of x86 assembly on Linux. These instructions run in user mode without switching to kernel mode, so they are relatively efficient. That is why it is called a lightweight lock.

Below is the core CAS code of AtomicInteger, a typical class from the JUC package. This code is a little clearer than synchronized; readers with a C++ background can dig further.

Java: AtomicInteger:

public final int incrementAndGet() {
    for (;;) { // spin
        int current = get();
        int next = current + 1;
        if (compareAndSet(current, next))
            return next;
    }
}

public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}

Java: Unsafe:

public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);

jdk8u: unsafe.cpp: cmpxchg = compare and exchange

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

jdk8u: atomic_linux_x86.inline.hpp, around line 93

Is_MP = Multi Processors

inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc", "memory");
  return exchange_value;
}

The bottom layer is implemented with the cmpxchgl instruction. If the program runs in a multi-core environment, a lock prefix is emitted before cmpxchgl; in a single-core environment the lock prefix is not needed. Why does multi-core need the lock prefix? CAS must be an atomic operation, and atomicity ultimately comes down to the hardware: on a multi-core CPU, if the operation could be interleaved across cores its atomicity would be broken, so in a multi-core environment a lock prefix is required, whether it locks the bus or a cache line; on a single core the problem does not arise. Besides CAS, the JVM also defines eight atomic operations that you can study on your own.


jdk8u: os.hpp is_MP()

  static inline bool is_MP() {  // check whether the machine is multi-core
    // During bootstrap if _processor_count is not yet initialized
    // we claim to be MP as that is safest. If any platform has a
    // stub generator that might be triggered in this phase and for
    // which being declared MP when in fact not, is a problem - then
    // the bootstrap routine for the stub generator needs to check
    // the processor count directly and leave the bootstrap routine
    // in place until called after initialization has ocurred.
    return (_processor_count != 1) || AssumeMP;
  }

jdk8u: atomic_linux_x86.inline.hpp

#define LOCK_IF_MP(mp) "cmp $0, " #mp "; je 1f; lock; 1:"

Finally: in the C++ code you can see that CAS changes the variable's value through cmpxchg, and CAS is ultimately implemented by the lock cmpxchg instruction. Both are assembly instructions, which for us Java application developers can be understood as hardware-level code.


Usage scenarios

Compared with AQS-based locks, synchronized is more concise because the lock is acquired and released without explicit code, and it now has high-performance paths such as biased locking and spin locking. Therefore, when resource competition is possible but unlikely, or the wait during contention is very short, synchronized is the better choice.
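
A quick side-by-side sketch (hypothetical class) of the two styles, just to show the difference in verbosity and flexibility:

import java.util.concurrent.locks.ReentrantLock;

public class CounterComparison {
    private int a;
    private int b;
    private final ReentrantLock lock = new ReentrantLock();

    // synchronized: acquisition and release are implicit, nothing to forget
    public synchronized void incA() {
        a++;
    }

    // AQS-based ReentrantLock: explicit lock/unlock, more flexible
    // (tryLock, timeouts, interruptible waits), but more verbose
    public void incB() {
        lock.lock();
        try {
            b++;
        } finally {
            lock.unlock();
        }
    }
}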

I leave you with a few questions to think about

Describe the lock upgrade process.

When does a spin lock become a heavyweight lock?

Why do you need a heavyweight lock when you have a spin lock?

Is bias locking necessarily more efficient than spin locking?