This article was originally published on my blog, yuequan’s blog.

It is recommended that you already know the JMM, the CPU cache coherence protocol and its associated memory barriers, and out-of-order CPU execution. If this article has been helpful to you, please give the blog's GitHub repository a small star. If you think I have misunderstood something in the article, you can also submit an issue directly on GitHub.

The topics discussed in this article are:

  • Volatile in Java semantics
  • The memory barrier
  • The implementation of the JVM
  • The generated assembly instruction
  • How to ensure visibility and order
  • Why does volatile not guarantee atomicity for compound operations

Volatile in Java semantics

Let’s start with a very common case

public class Test {
    public static void main(String[] args) throws InterruptedException {
        Demo demo = new Demo();
        demo.setName("demo-thread");
        demo.start();
        Thread.sleep(1000);
        // The main thread asks the worker to stop
        demo.flag = false;
        demo.join();
        System.out.println(demo.getName() + " thread completes execution: " + demo.count);
    }

    static class Demo extends Thread {
        boolean flag = true;   // not volatile
        int count = 0;

        @Override
        public void run() {
            while (flag) {
                count++;
            }
        }
    }
}

The demo-thread reads the flag value true from main memory and places it in its working memory. The while loop then checks flag, and the thread keeps looping until it sees flag as false. However, demo-thread never perceives that the main thread has changed flag to false, so it cannot be stopped. Put simply, this is a problem of visibility between threads.

How do we solve this problem? With volatile, the subject of this article. (There are actually several ways to solve it; I use volatile here because it is what this article is about.)

public class Test {
    public static void main(String[] args) throws InterruptedException { ... }

    static class Demo extends Thread {
        volatile boolean flag = true;
        ...
    }
}

Once the variable is declared with the volatile keyword, changes to it are visible to other threads. volatile also prohibits reordering of instructions around the variable as an optimization. When a variable is volatile, memory barriers are inserted around the operations performed on it.
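For reference, here is a minimal, complete sketch of the fixed version. The class name VolatileFlagDemo, the helper stopsWithin and the timeout values are my own choices for illustration and are not part of the original example.

```java
// Sketch: the same Demo thread as above, but with a volatile flag.
class VolatileFlagDemo {

    static class Demo extends Thread {
        // volatile: the write made by the main thread becomes visible to this thread
        volatile boolean flag = true;
        long count = 0;

        @Override
        public void run() {
            while (flag) {
                count++;
            }
        }
    }

    // Starts the worker, lets it spin for sleepMillis, clears the flag,
    // and reports whether the worker stopped within timeoutMillis.
    static boolean stopsWithin(long sleepMillis, long timeoutMillis) {
        Demo demo = new Demo();
        demo.setName("demo-thread");
        demo.setDaemon(true); // do not keep the JVM alive if the worker never stops
        demo.start();
        try {
            Thread.sleep(sleepMillis);
            demo.flag = false;        // volatile write
            demo.join(timeoutMillis); // wait for the loop to observe it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !demo.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsWithin(1000, 5000));
    }
}
```

The join uses a timeout so that the program still ends if you remove volatile to reproduce the original problem.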

The memory barrier

We will look at processor memory barriers first and then at how the JVM defines its own barriers. But first, what is a memory barrier in essence? A memory barrier is a kind of synchronization barrier instruction. Once a barrier is inserted, if there are read and write operations both before and after it, the reads and writes before the barrier must complete before the reads and writes after the barrier, and the reads and writes after the barrier must follow those before it.

Processor memory barrier

  • Read memory barrier

    Ensures that read operations before the barrier complete before read operations after the barrier

  • Write memory barrier

    Ensures that write operations before the barrier complete before write operations after the barrier

  • Full memory barrier

    Ensures that read and write operations before the barrier complete before read and write operations after the barrier

Let’s start with a few semantics, because we will meet them in the JVM implementation later, so I’ll briefly explain them.

Acquire: instructions after the barrier are not reordered to before the barrier

Release: instructions before the barrier are not reordered to after the barrier

Fence: instructions before the fence are not reordered to after the fence, and instructions after the fence are not reordered to before the fence

Memory barriers for the JVM

LoadLoad (read barrier): given the instructions Load1 and Load2, inserting the barrier gives Load1; LoadLoad; Load2. The LoadLoad barrier in the middle ensures that the two reads are not reordered. That is, when Load2 is executed, the read of Load1 must already be complete.

StoreStore (write barrier): given the instructions Store1 and Store2, inserting the barrier gives Store1; StoreStore; Store2. The StoreStore barrier in the middle ensures that the two writes are not reordered. That is, when Store2 is executed, the write of Store1 must already be complete and visible to Store2.

LoadStore (read/write barrier): given the instructions Load1 and Store2, inserting the barrier gives Load1; LoadStore; Store2. The LoadStore barrier in the middle ensures that the earlier read and the later write are not reordered. That is, when Store2 is executed, Load1 must already be complete.

StoreLoad (write/read barrier): given the instructions Store1 and Load2, inserting the barrier gives Store1; StoreLoad; Load2. The StoreLoad barrier in the middle ensures that the earlier write and the later read are not reordered, and that the write is visible. That is, when Load2 is executed, Store1 must already be complete and its write must be visible to the reads after the barrier.
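As a rough illustration of where such barriers sit, the sketch below uses the static fence methods on java.lang.invoke.VarHandle (Java 9 and later). The class FenceSketch and its methods are hypothetical names of mine, and the placement follows the conservative recipe commonly described for volatile accesses; it is not a statement about what HotSpot actually emits. VarHandle offers no standalone LoadStore or StoreLoad fence, so acquireFence (LoadLoad plus LoadStore) and fullFence (which includes StoreLoad) stand in for them.

```java
import java.lang.invoke.VarHandle;

// Sketch only: plain fields plus explicit fences, to show where each barrier sits.
// In real code, declaring `ready` volatile is the simpler choice.
class FenceSketch {
    private int data;       // plain field
    private boolean ready;  // plain field, ordered by the fences below

    // Writer side: Store1; StoreStore; Store2; then a full fence.
    void publish(int value) {
        data = value;                // Store1
        VarHandle.storeStoreFence(); // data is written before ready
        ready = true;                // Store2
        VarHandle.fullFence();       // includes StoreLoad: later loads come after this store
    }

    // Reader side: Load1; acquire fence (LoadLoad + LoadStore); Load2.
    int readOrDefault(int fallback) {
        boolean seen = ready;        // Load1
        VarHandle.acquireFence();    // ready is read before data
        return seen ? data : fallback; // Load2
    }

    public static void main(String[] args) {
        FenceSketch s = new FenceSketch();
        System.out.println(s.readOrDefault(-1)); // prints -1 (nothing published yet)
        s.publish(42);
        System.out.println(s.readOrDefault(-1)); // prints 42
    }
}
```

The main method runs on a single thread, so it only shows the call pattern; the fences matter when publish and readOrDefault run on different threads.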

Whichever barrier is used, the idea is the same. With a LoadStore barrier, for example, the read before the barrier reads its value from main memory and the write after the barrier writes its value to main memory. With a StoreLoad barrier, the write before the barrier writes its value to main memory and the read after the barrier reads its value from main memory. Of course, writing a value to main memory does not by itself guarantee visibility, which is why the subsequent read also has to read its value from main memory.

The implementation of the JVM

So, is it now clear why a volatile variable is visible to other threads, and why our modified flag takes effect in a multi-threaded environment? Let's go further with a very simple example.

public class Demo {
    static volatile int i;

    public static void main(String[] args) {
        i = 1;
    }
}

View the generated bytecode (partial snippets)

 static volatile int i;
    descriptor: I
    flags: ACC_STATIC, ACC_VOLATILE
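The same flag can also be observed from Java code through reflection, since java.lang.reflect.Modifier.isVolatile tests the modifier bit that corresponds to ACC_VOLATILE. The class VolatileFlagCheck, the helper isVolatile and the extra field plain are names I made up for this small sketch.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class VolatileFlagCheck {
    static volatile int i;   // same declaration as in Demo
    static int plain;        // non-volatile field for comparison

    // Returns true if the named field of this class carries the volatile modifier.
    static boolean isVolatile(String fieldName) {
        try {
            Field f = VolatileFlagCheck.class.getDeclaredField(fieldName);
            return Modifier.isVolatile(f.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("i: " + isVolatile("i"));         // prints "i: true"
        System.out.println("plain: " + isVolatile("plain")); // prints "plain: false"
    }
}
```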

You can see that the field carries an ACC_VOLATILE flag in the bytecode file. Next, let's open up the JVM source code (I am using HotSpot) and take a look.

In the JVM source code there is an is_volatile check that determines whether a field is qualified with the volatile access flag. Next, look at part of the source code of the bytecode interpreter.

In this case release_int_field_put is called, because the field is an int. At the end, a barrier called storeload is inserted. Let's look at the definition of itos first.

As the name suggests, it represents data of type int cached at the top of the stack.

Then see release_int_field_put

You’ll notice that it calls OrderAccess::release_store.

So what does this method actually do? Notice first that the method parameter is declared with the volatile keyword. This is the C++ volatile keyword (it has the same name as the Java one, doesn’t it?). A variable declared volatile in C++ is treated as liable to change at any time: it is read from its memory address every time it is used, and the compiler does not optimize those accesses away.

So what is os::atomic_copy64? Its implementation differs between systems, but I am only looking at Linux here.

Put crudely, it generates assembly code to copy the value, right?

And then let’s see

Then look at OrderAccess::storeload.

Tell me, do these four look familiar?!? This is of course only the declaration; each system has its own implementation, and we are still looking at Linux.

Look at the implementation of this method, and at what the other three are implemented with. Didn't this article explain those semantics above? Let's move on to the implementation under Linux.

What is FULL_MEM_BARRIER?

Its definition depends on the environment, so I won't go into the specifics here.

The generated assembly instruction

(= =) When I wrote this section I was using Windows, so the source code of the Windows fence implementation still needs to be added above.

Take a look at the assembly code generated on my machine

[Disassembling for mach='amd64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo'
  # parm0:    rdx:rdx   = '[Ljava/lang/String;'
  #           [sp+0x40]  (sp of caller)
  0x00000000037e5320: mov     dword ptr [rsp+0ffffffffffffa000h],eax
  0x00000000037e5327: push    rbp
  0x00000000037e5328: sub     rsp,30h
  0x00000000037e532c: mov     rsi,17cf2af8h     ;   {metadata(method data for {method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo')}
  0x00000000037e5336: mov     edi,dword ptr [rsi+0dch]
  0x00000000037e533c: add     edi,8h
  0x00000000037e533f: mov     dword ptr [rsi+0dch],edi
  0x00000000037e5345: mov     rsi,17cf2a30h     ;   {metadata({method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo')}
  0x00000000037e534f: and     edi,0h
  0x00000000037e5352: cmp     edi,0h
  0x00000000037e5355: je      37e537eh          ;*iconst_1
                                                ; - org.yuequan.thread.test.Demo::main@0 (line 6)

  0x00000000037e535b: mov     rsi,0d5b0dad0h    ;   {oop(a 'java/lang/Class' = 'org/yuequan/thread/test/Demo')}
  0x00000000037e5365: mov     edi,1h
  0x00000000037e536a: mov     dword ptr [rsi+68h],edi
  0x00000000037e536d: lock add dword ptr [rsp],0h  ;*putstatic i
                                                ; - org.yuequan.thread.test.Demo::main@1 (line 6)

  0x00000000037e5372: add     rsp,30h
  0x00000000037e5376: pop     rbp
  0x00000000037e5377: test    dword ptr [2f20100h],eax
                                                ;   {poll_return}
  0x00000000037e537d: ret
  0x00000000037e537e: mov     qword ptr [rsp+8h],rsi
  0x00000000037e5383: mov     qword ptr [rsp],0ffffffffffffffffh
  0x00000000037e538b: call    37e20a0h          ; OopMap{rdx=Oop off=112}
                                                ;*synchronization entry
                                                ; - org.yuequan.thread.test.Demo::main@-1 (line 6)
                                                ;   {runtime_call}
  0x00000000037e5390: jmp     37e535bh
  0x00000000037e5392: nop
  0x00000000037e5393: nop
  0x00000000037e5394: mov     rax,qword ptr [r15+2a8h]
  0x00000000037e539b: mov     r10,0h
  0x00000000037e53a5: mov     qword ptr [r15+2a8h],r10
  0x00000000037e53ac: mov     r10,0h
  0x00000000037e53b6: mov     qword ptr [r15+2b0h],r10
  0x00000000037e53bd: add     rsp,30h
  0x00000000037e53c1: pop     rbp
  0x00000000037e53c2: jmp     374ece0h          ;   {runtime_call}
  0x00000000037e53c7: hlt
  0x00000000037e53c8: hlt
  0x00000000037e53c9: hlt
  0x00000000037e53ca: hlt
  0x00000000037e53cb: hlt
  0x00000000037e53cc: hlt
  0x00000000037e53cd: hlt
  0x00000000037e53ce: hlt
  0x00000000037e53cf: hlt
  0x00000000037e53d0: hlt
  0x00000000037e53d1: hlt
  0x00000000037e53d2: hlt
  0x00000000037e53d3: hlt
  0x00000000037e53d4: hlt
  0x00000000037e53d5: hlt
  0x00000000037e53d6: hlt
  0x00000000037e53d7: hlt
  0x00000000037e53d8: hlt
  0x00000000037e53d9: hlt
  0x00000000037e53da: hlt
  0x00000000037e53db: hlt
  0x00000000037e53dc: hlt
  0x00000000037e53dd: hlt
  0x00000000037e53de: hlt
  0x00000000037e53df: hlt
[Exception Handler]
[Stub Code]
  0x00000000037e53e0: call    3750aa0h          ;   {no_reloc}
  0x00000000037e53e5: mov     qword ptr [rsp+0ffffffffffffffd8h],rsp
  0x00000000037e53ea: sub     rsp,80h
  0x00000000037e53f1: mov     qword ptr [rsp+78h],rax
  0x00000000037e53f6: mov     qword ptr [rsp+70h],rcx
  0x00000000037e53fb: mov     qword ptr [rsp+68h],rdx
  0x00000000037e5400: mov     qword ptr [rsp+60h],rbx
  0x00000000037e5405: mov     qword ptr [rsp+50h],rbp
  0x00000000037e540a: mov     qword ptr [rsp+48h],rsi
  0x00000000037e540f: mov     qword ptr [rsp+40h],rdi
  0x00000000037e5414: mov     qword ptr [rsp+38h],r8
  0x00000000037e5419: mov     qword ptr [rsp+30h],r9
  0x00000000037e541e: mov     qword ptr [rsp+28h],r10
  0x00000000037e5423: mov     qword ptr [rsp+20h],r11
  0x00000000037e5428: mov     qword ptr [rsp+18h],r12
  0x00000000037e542d: mov     qword ptr [rsp+10h],r13
  0x00000000037e5432: mov     qword ptr [rsp+8h],r14
  0x00000000037e5437: mov     qword ptr [rsp],r15
  0x00000000037e543b: mov     rcx,6601c4e0h     ;   {external_word}
  0x00000000037e5445: mov     rdx,37e53e5h      ;   {internal_word}
  0x00000000037e544f: mov     r8,rsp
  0x00000000037e5452: and     rsp,0fffffffffffffff0h
  0x00000000037e5456: call    65cd4510h         ;   {runtime_call}
  0x00000000037e545b: hlt
[Deopt Handler Code]
  0x00000000037e545c: mov     r10,37e545ch      ;   {section_word}
  0x00000000037e5466: push    r10
  0x00000000037e5468: jmp     3727600h          ;   {runtime_call}
  0x00000000037e546d: hlt
  0x00000000037e546e: hlt
  0x00000000037e546f: hlt
Decoding compiled method 0x00000000037e4ed0:
Code:
Argument 0 is unknown.RIP: 0x37e5020 Code size: 0x00000110
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo'
  # parm0:    rdx:rdx   = '[Ljava/lang/String;'
  #           [sp+0x40]  (sp of caller)
  0x00000000037e5020: mov     dword ptr [rsp+0ffffffffffffa000h],eax
  0x00000000037e5027: push    rbp
  0x00000000037e5028: sub     rsp,30h           ;*iconst_1
                                                ; - org.yuequan.thread.test.Demo::main@0 (line 6)

  0x00000000037e502c: mov     rsi,0d5b0dad0h    ;   {oop(a 'java/lang/Class' = 'org/yuequan/thread/test/Demo')}
  0x00000000037e5036: mov     edi,1h
  0x00000000037e503b: mov     dword ptr [rsi+68h],edi
  0x00000000037e503e: lock add dword ptr [rsp],0h  ;*putstatic i
                                                ; - org.yuequan.thread.test.Demo::main@1 (line 6)

  0x00000000037e5043: add     rsp,30h
  0x00000000037e5047: pop     rbp
  0x00000000037e5048: test    dword ptr [2f20100h],eax
                                                ;   {poll_return}
  0x00000000037e504e: ret
  0x00000000037e504f: nop
  0x00000000037e5050: nop
  0x00000000037e5051: mov     rax,qword ptr [r15+2a8h]
  0x00000000037e5058: mov     r10,0h
  0x00000000037e5062: mov     qword ptr [r15+2a8h],r10
  0x00000000037e5069: mov     r10,0h
  0x00000000037e5073: mov     qword ptr [r15+2b0h],r10
  0x00000000037e507a: add     rsp,30h
  0x00000000037e507e: pop     rbp
  0x00000000037e507f: jmp     374ece0h          ;   {runtime_call}
  0x00000000037e5084: hlt
  0x00000000037e5085: hlt
  0x00000000037e5086: hlt
  0x00000000037e5087: hlt
  0x00000000037e5088: hlt
  0x00000000037e5089: hlt
  0x00000000037e508a: hlt
  0x00000000037e508b: hlt
  0x00000000037e508c: hlt
  0x00000000037e508d: hlt
  0x00000000037e508e: hlt
  0x00000000037e508f: hlt
  0x00000000037e5090: hlt
  0x00000000037e5091: hlt
  0x00000000037e5092: hlt
  0x00000000037e5093: hlt
  0x00000000037e5094: hlt
  0x00000000037e5095: hlt
  0x00000000037e5096: hlt
  0x00000000037e5097: hlt
  0x00000000037e5098: hlt
  0x00000000037e5099: hlt
  0x00000000037e509a: hlt
  0x00000000037e509b: hlt
  0x00000000037e509c: hlt
  0x00000000037e509d: hlt
  0x00000000037e509e: hlt
  0x00000000037e509f: hlt
[Exception Handler]
[Stub Code]
  0x00000000037e50a0: call    3750aa0h          ;   {no_reloc}
  0x00000000037e50a5: mov     qword ptr [rsp+0ffffffffffffffd8h],rsp
  0x00000000037e50aa: sub     rsp,80h
  0x00000000037e50b1: mov     qword ptr [rsp+78h],rax
  0x00000000037e50b6: mov     qword ptr [rsp+70h],rcx
  0x00000000037e50bb: mov     qword ptr [rsp+68h],rdx
  0x00000000037e50c0: mov     qword ptr [rsp+60h],rbx
  0x00000000037e50c5: mov     qword ptr [rsp+50h],rbp
  0x00000000037e50ca: mov     qword ptr [rsp+48h],rsi
  0x00000000037e50cf: mov     qword ptr [rsp+40h],rdi
  0x00000000037e50d4: mov     qword ptr [rsp+38h],r8
  0x00000000037e50d9: mov     qword ptr [rsp+30h],r9
  0x00000000037e50de: mov     qword ptr [rsp+28h],r10
  0x00000000037e50e3: mov     qword ptr [rsp+20h],r11
  0x00000000037e50e8: mov     qword ptr [rsp+18h],r12
  0x00000000037e50ed: mov     qword ptr [rsp+10h],r13
  0x00000000037e50f2: mov     qword ptr [rsp+8h],r14
  0x00000000037e50f7: mov     qword ptr [rsp],r15
  0x00000000037e50fb: mov     rcx,6601c4e0h     ;   {external_word}
  0x00000000037e5105: mov     rdx,37e50a5h      ;   {internal_word}
  0x00000000037e510f: mov     r8,rsp
  0x00000000037e5112: and     rsp,0fffffffffffffff0h
  0x00000000037e5116: call    65cd4510h         ;   {runtime_call}
  0x00000000037e511b: hlt
[Deopt Handler Code]
  0x00000000037e511c: mov     r10,37e511ch      ;   {section_word}
  0x00000000037e5126: push    r10
  0x00000000037e5128: jmp     3727600h          ;   {runtime_call}
  0x00000000037e512d: hlt
  0x00000000037e512e: hlt
  0x00000000037e512f: hlt

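That is a lot of output, so just look at the key part.

Right after the store to i, the generated code executes lock add dword ptr [rsp],0h, which uses the lock prefix. So what is lock? I will only explain it superficially here. The CPU provides a way to lock the bus while an instruction executes. Machine code assembled with the lock prefix makes the CPU assert the LOCK# signal while it executes the instruction and release it when the instruction ends, which locks the bus and so guarantees that the instruction executes atomically. (On modern processors, when the target memory is cached, the lock is usually taken on the cache line rather than on the whole bus.) A locked instruction also acts as a full memory barrier on x86, which is why an otherwise useless add of 0 to the top of the stack is placed after the volatile store.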

How to ensure visibility and order

The memory barrier tells both the compiler and the CPU not to reorder instructions across it, which prevents out-of-order execution around the volatile access. Visibility between threads is achieved by writing to and reading from main memory before and after the barrier. (O(∩_∩)O I won't explain any further.)
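A small sketch of what this ordering buys you in practice: a plain field written before a volatile write is guaranteed to be seen by a thread that reads the plain field after observing the volatile write. The class VolatilePublish and its method names are hypothetical, and Thread.onSpinWait requires Java 9 or later.

```java
class VolatilePublish {
    private int data;               // plain field
    private volatile boolean ready; // volatile guard

    void writer() {
        data = 42;     // ordinary write
        ready = true;  // volatile write: the write to data cannot move after it
    }

    int reader() {
        while (!ready) {         // volatile read on every iteration
            Thread.onSpinWait(); // hint that this is a busy-wait loop
        }
        return data;             // ordered after the volatile read, so it sees 42
    }

    // Runs writer and reader on two threads and returns what the reader saw.
    static int runOnce() {
        VolatilePublish p = new VolatilePublish();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = p.reader());
        Thread w = new Thread(p::writer);
        r.start();
        w.start();
        try {
            r.join();
            w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```

Only ready needs to be volatile here; data relies on the ordering that the volatile write and read provide.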

Why does volatile not guarantee atomicity for compound operations

For example, if multiple threads increment an instance variable i with i++, a race condition occurs. If i is incremented 5000 times in total, the result may be 5000 or it may be less than 5000, even if i is declared volatile. Remember that volatile protects the variable only through the memory barrier mechanism.

For example,

load1; load2; store1; store2; StoreLoad; load3; store3; ...

Although visibility is guaranteed, atomicity is not. Atomicity essentially means that a group of instructions either executes to completion without interruption or does not execute at all. If you think about it, i++ is a compound operation of three steps: read the value, add 1, assign. So while one thread has read the value but not yet assigned, another thread may be in the middle of the same three steps. This is awkward to explain in words, so see the following example.

i = 0

Thread A        Thread B
read 0          read 0
add 1           add 1
assign 1        assign 1

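Although visibility is guaranteed, there is no guarantee that the value a thread has read is still the latest one by the time it writes its result back. Here both threads assign 1, so one of the two increments is lost.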
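The lost update can be reproduced with a short sketch that increments a volatile int and a java.util.concurrent.atomic.AtomicInteger side by side. The class CounterComparison, the method run and the thread and iteration counts are my own choices for illustration. How many increments the volatile counter loses varies from run to run, so no fixed output is shown.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterComparison {
    static volatile int volatileCount;  // visible to all threads, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Runs `threads` threads, each incrementing both counters `perThread` times.
    // Returns {volatile result, atomic result}.
    static int[] run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    volatileCount++;               // read, add 1, assign: three steps
                    atomicCount.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] result = run(4, 5000);
        System.out.println("volatile: " + result[0] + " (at most 20000)");
        System.out.println("atomic:   " + result[1]);
    }
}
```

The atomic counter always reaches threads × perThread, while the volatile counter can only match it or fall short. Using AtomicInteger or a synchronized block is the usual fix when the compound operation itself must be atomic.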