Moment For Technology

Explore the implementation principle of Java atomic CAS - from HotSpot source code to CPU instruction CMPXCHG

Posted on Sept. 23, 2022, 9:47 a.m. by 崔惠如
Category: The back-end Tag: java

In the Java Java. Util. Concurrent. Atomic package, class provides a number of atoms. These atomic classes are mainly dependent on the underlying CAS mechanism to implement atomic update operations of internal values.

AtomicInteger source

JDK 1.8 AtomicInteger

public class AtomicInteger extends Number implements java.io.Serializable {

    // setup to use Unsafe.compareAndSwapInt for updates
    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset; // Value memory offset

    static {
        try {
            valueOffset = unsafe.objectFieldOffset
                (AtomicInteger.class.getDeclaredField("value"));
        } catch (Exception ex) { throw newError(ex); }}private volatile int value;

    public AtomicInteger(int initialValue) {
        value = initialValue;
    }

    public AtomicInteger(a) {}/** * If the value of the current object is equal to expect, atomically set the value of the variable to update */
    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
Copy the code

The internal value of AtomicInteger is stored in the variable value. ValueOffset represents the memory offset of the value field in the object. The compareAndSet() method is used to compare atoms and update the value of the value.

For the compareAndSet() method, it simply calls the Unsafe compareAndSwapInt() method, which operates on the value field's memory offset, valueOffset.

The Unsafe source

The sun.misc.Unsafe class compareAndSwapInt() method is declared as follows:

    /** * atom will replace the object o at offset with x if the old value is the same as expected */
    public final native boolean compareAndSwapInt(Object o, long offset,
                                                  int expected,
                                                  int x);
Copy the code

Unsafe class compareAndSwapInt is a native method, concrete implementation in the hotspot of the source code/SRC/share/vm/prims/Unsafe. The CPP file.

The unsafe. CPP source

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt"); / / 1
  oop p = JNIHandles::resolve(obj); Line / / 2
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset); / / 3
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e; / / 4
UNSAFE_END
Copy the code

The implementation logic is in lines 2 through 4, where lines 2 and 3 mainly calculate the address of the field based on the offset of the field.

The key is line 4, which essentially calls the Atomic:: CMPXCHG method to implement the comparison swap operation (CAS) of field values.

CMPXCHG method of linux_x86

For the linux_x86 platform, the source code for CMPXCHG is: hotspot\ SRC \ OS_CPU \linux_x86\vm\atomic_linux_x86.inline.hpp.

inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc"."memory");
  return exchange_value;
}
Copy the code

The first line is used to determine if there are multiple processors, if so mp is 1, otherwise MP is 0.

__asm__
Copy the code

Indicates that the following parentheses are assembly instructions, consisting of four parts.

The parts in parentheses are separated by colons (:), focusing on the first three parts, which represent the assembly instruction, output parameter list, and input parameter list respectively.

In order of the output and input parameter lists, the assembly instruction references to them are numbered as follows (the comparison between the number in the assembly instruction and the input and output parameters) :

%0:"=a" (exchange_value) // The first output parameter. After the CMPXCHGL instruction finishes execution, the value of register EAX is saved to Exchange_value and returned at the end of method execution
%1:"r" (exchange_value) // the first input parameter. The compiler selects any register available to store exchange_value, for example using register ECX.
%2:"a" (compare_value) // The second input parameter. Register EAX is used to store the compare_value to be compared, and is used when executing the CMPXCHGL instruction.
%3:"r" (dest) // The third input parameter. The compiler selects any register available to store the destination address dest, for example using register EDx.
%4:"r" (mp) // the fourth input parameter. The compiler selects any register available to store multiprocessor flags
Copy the code

Where, constraint A represents register Eax. The constraint r stands for register, and the compiler selects any available register to use. For a description of each parameter, see the comments above.

The operands referenced by %1 and %3 are preceded by a constraint r, which means that the compiler can select any available register to manipulate the data. Assuming registers ECX and EDx are used respectively, the CMPXCHGL instruction is replaced as follows:

cmpxchgl %%ecx,(%%edx)
Copy the code

(See x86 inline assembly for the syntax above)

CMPXCHGL instruction description

The CMPXCHGL instruction compares the value of the memory cell corresponding to the register EAX and the target operand EDx. If they are equal, the value of register ECx is saved to the memory location corresponding to the operand EDx, replacing the old value. If not, the value of the memory cell corresponding to EDX is loaded into register EAX.

At the end of the CMPXCHG () method, Exchange_value is returned as the result of the method. In fact, after the CMPXCHGL instruction completes, Exchange_value is updated to the corresponding value of register EAX.

Therefore, the CMPXCHG () method may return either the original compare_value passed in, or the memory location corresponding to the destination address dest, depending on whether the CMPXCHGL directive successfully updated the destination address value.

(CMPXCHG instruction reference: Intel x86 comparison switch instruction CMPXCHG function and principle)

Before the CMPXCHGL directive, a macro is also used to define LOCK_IF_MP.

// Adding a lock prefix to an instruction on MP machine
#define LOCK_IF_MP(mp) "cmp $0, " #mp "; je 1f; lock; 1:"
Copy the code

The LOCK_IF_MP macro takes one parameter mp, which defines a piece of assembly code.

Format the macro code and incorporate the CMPXCHGL code to take a full look:

cmp $0." #mp "
je 1f
lock 
1: cmpxchgl %%ecx,(%%edx)
Copy the code

This code is the core logic of the entire CMPXCHG () method.

First, the CMP instruction is used to compare whether the value of the input parameter mp is 0.

The JE directive is then used to jump based on the result of the comparison.

If the CMP comparison results are equal, the CMPXCHGL instruction is executed by jumping to the position of 1.

If the CMP comparison results are not equal, simply perform later. That is, execute the LOCK instruction first, then execute the CMPXCHGL instruction.

Obviously, when mp is 1 (multiprocessor), the CMP result is unequal, and an additional LOCK prefix instruction is executed.

Lock the prefix

Through "Intel® 64 and IA-32 Architectures Software Developers Manual Combined volumes 3A, 3B, 3C, And 3D System Programming Guide chapter 8 (8.1)

The bus can be locked using the LOCK# signal.

The use of the lock prefix directive can be used to pronounce the lock # signal. That is, adding the prefix LOCK to the instruction triggers the lock # signal.

For Intel486 and Pentium processors, a LOCK # signal is always spoken on the bus during a LOCK LOCK operation, even if the memory region being locked is cached in the processor.

For P6 and later processors, the LOCK# signal will not be spoken if the region of memory that the LOCK operation is locking is already cached in the processor. Instead, it locks the cache of this memory area and writes back to memory. At the same time, a cache consistency mechanism is used to ensure atomicity of modification operations. This is called cache locking.

The cache consistency mechanism automatically prevents "two or more processors that have cached the same memory region from simultaneously modifying data in that region."

Windows_x86 CMPXCHG method

For windows_x86 platforms, the corresponding source code for CMPXCHG is: hotspot\ SRC \ OS_CPU \windows_x86\vm\atomic_windows_x86.inline. HPP.

// Adding a lock prefix to an instruction on MP machine
// VC++ doesn't like the lock prefix to be on a single line
// so we can't insert a label after the lock prefix.
// By emitting a lock prefix, we can define a label after it.
#define LOCK_IF_MP(mp) __asm cmp mp, 0  \
                       __asm je L0      \
                       __asm _emit 0xF0 \
                       __asm L0:

inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}
Copy the code

Merge the LOCK_IF_MP macro code and the method assembly code together, the overall code is as follows:

    mov edx, dest			// The destination address is saved to register EDx
    mov ecx, exchange_value	// The new value is saved to register ECx
    mov eax, compare_value	// The value to be compared is saved to register eax
    cmp mp, 0				// use the CMP command to compare mp to 0
    je L0					// If the value of mp is 0, it means the single processor, then the CMPXCHG command is executed directly at L0
    _emit 0xF0				// If the value of mp is not 0, which indicates the multiprocessor, execute the lock prefix instruction. _EMIT 0xF0 is converted to a LOCK prefix command.
    L0:
    cmpxchg dword ptr [edx], ecx  // Execute the CPU CMPXCHG command
Copy the code

As you can see, windows_x86 and Linux_x86 implement the CMPXCHG method in the same way. There are just some differences in the way the code is written due to different platforms and different assembly syntax.

summary

Atomic classes in Java rely mainly on CAS to implement atomic update operations.

The Cas-related methods in the Java library are all provided by the Sun.misc. Unsafe class.

The methods defined in the Unsafe class are, in fact, based on native implementations.

Unsafe class local method specific implementation, the hotspot of the source code/SRC/share/vm/prims/Unsafe. The CPP file.

The underlying implementation of CAS ends up calling the Atomic:: CMPXCHG method.

The HotSpot VIRTUAL machine provides different implementations for the Atomic:: CMPXCHG method on different operating system platforms, but the principles are the same.

Therefore, Java's CAS mechanism is actually implemented by calling the processor's CMPXCHG instruction.

However, when the CMPXCHG instruction is executed on multiple processors, it is preceded by a lock prefix instruction. Because the CMPXCHG instruction itself is not an atomic operation, it needs to be locked with the lock prefix instruction to make the atomic operation happen.

Through the Intel x86 comparison switching instruction CMPXCHG function and principle of this article can understand that the CMPXCHG instruction contains several operations. To ensure atomization of instructions on multiple processors, if the server has multiple processors, atomic execution of CMPXCHG instructions needs to be implemented by adding the LOCK prefix.

If you want to better understand the above assembly code, you can take a look at the following articles, and then it will be easier to look back:

X86 assembler guide to register and memory addressing modes

GCC assembly syntax differs from Intel assembly syntax in several ways

X86 inline assembly

The function and principle of Intel x86 comparison switch instruction CMPXCHG

reference

The Intel documentation: software.intel.com/content/www...

Conditionals: Goto and Branch Instructions:www.cs.uaf.edu/2015/fall/c...

Loops and Branches in Assembly:www.cs.uaf.edu/2012/fall/c...

About me

Public number: Binary road

Blog: binarylife. Icu

Tutorial: 996 geek.com

Search
About
mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.