Originality statement: all of my articles are my own original work. Anything I reference is credited, and if I have missed something, corrections are welcome. If you find this article plagiarized elsewhere online, please report it and open an issue in the GitHub repository. Thank you for your support.

This article is based on OpenJDK 11 and above

Working up from the hardware level, I have written a comprehensive analysis of the design of the Java memory model, backing each conclusion with reference papers and verification programs. Along the way I found that the Java memory model has been widely misunderstood over the years. In this article I try to clear up some of those misunderstandings by progressively improving a classic DCL (double-checked locking) example.

First, we want to implement a singleton that is initialized only on the first call and that is accessed by multiple threads. A ValueHolder class keeps the instance in a private field, value, and its getValue method is written in the classic DCL style.
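A minimal sketch of the class described above. The original listing is not reproduced here, so the shape of Value (fields a1, a2 and b, and the constructor argument 10) is an assumption pieced together from the later sections of this article.

```java
// Sketch only: the field layout of Value is assumed, not taken from the original listing.
class Value {
    int a1;
    int a2;
    int b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2; // arbitrary constant chosen for illustration
    }
}

class ValueHolder {
    // Plain field: neither final nor volatile.
    private Value value;

    Value getValue() {
        if (value == null) {              // first check, outside the lock
            synchronized (this) {
                if (value == null) {      // second check, inside the lock
                    value = new Value(10);
                }
            }
        }
        return value;
    }
}
```

Run on a single thread this always behaves as expected. The problems discussed in this article only show up when several threads call getValue concurrently.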

Under the Java memory model, this ValueHolder has two potential problems:

  1. Going strictly by the definition of the Java memory model, regardless of the actual JVM implementation, getValue may return null.
  2. A caller may read fields of Value that have not been initialized yet.

Let’s further analyze and optimize these two problems.

By the definition of the Java memory model, regardless of the actual JVM implementation, getValue may return null

The "Coherence and Opaque" section of my article "The most hardcore Java new memory model analysis and experiments" uses this example: some object has a field int x that starts at 0. One thread executes x = 1, and another thread executes r1 = x; r2 = x (r1 and r2 are local variables).

The second thread therefore reads the field twice (each read is a getfield bytecode), and under the Java memory model the possible results include:

  1. r1 = 1, r2 = 1
  2. r1 = 0, r2 = 1
  3. r1 = 1, r2 = 0
  4. r1 = 0, r2 = 0

The third result is the interesting one: read in program order, we first see x = 1 and then see x go back to 0. This comes from compiler reordering of the two reads. Coherence is exactly the property that rules out this third result. Because private Value value is a plain field, the Java memory model does not guarantee coherence for it.
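The two-thread example above can be sketched as follows. The class and method names are hypothetical. The opaque variant uses VarHandle.getOpaque and VarHandle.setOpaque (Java 9+), which is one way to obtain coherence for a single variable. It is shown only as an illustration, not as what the original experiment used.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical names; a sketch of the x / r1 / r2 example.
class CoherenceExample {
    private static final VarHandle X;

    static {
        try {
            X = MethodHandles.lookup().findVarHandle(CoherenceExample.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int x; // plain field, starts at 0

    // Writer thread, plain mode.
    void plainWriter() {
        x = 1;
    }

    // Reader thread, plain mode: when racing with plainWriter,
    // the JMM permits all four outcomes, including (r1, r2) = (1, 0).
    int[] plainReader() {
        int r1 = x;
        int r2 = x;
        return new int[] {r1, r2};
    }

    // Writer thread, opaque mode.
    void opaqueWriter() {
        X.setOpaque(this, 1);
    }

    // Reader thread, opaque mode: accesses to x are coherent,
    // so (1, 0) is no longer a permitted outcome.
    int[] opaqueReader() {
        int r1 = (int) X.getOpaque(this);
        int r2 = (int) X.getOpaque(this);
        return new int[] {r1, r2};
    }
}
```

Calling these methods from one thread only ever shows (0, 0) before the write and (1, 1) after it. Observing the other outcomes requires a concurrency test harness such as jcstress.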

Going back to our program, getValue reads the field three times (each read is a getfield bytecode): read 1 is the null check outside the lock, read 2 is the null check inside the synchronized block, and read 3 is the final return.

There is an obvious branching relationship between reads 1 and 2 (whether 2 executes depends on the result of 1), so from any compiler's point of view 1 must execute before 2. Between reads 1 and 3 there is no such dependency, and a simplistic compiler may decide they can be reordered. The Java memory model places no constraint on the relative order of reads 1 and 3. So your program might perform read 3 first, then read 1 and the rest of the logic, and finally return the result that read 3 obtained earlier.
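The three reads, and the reordering the memory model permits, can be sketched as below. getValueReordered is a hypothetical hand-written equivalent of what a compiler would be allowed to produce. It is not something HotSpot is known to generate, and the shape of Value is assumed.

```java
// Sketch only: Value's fields are assumed for illustration.
class Value {
    int a1, a2, b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2;
    }
}

class ValueHolder {
    private Value value;

    Value getValue() {
        if (value == null) {                 // read 1
            synchronized (this) {
                if (value == null) {         // read 2
                    value = new Value(10);
                }
            }
        }
        return value;                        // read 3
    }

    // Hypothetical transformation: read 3 is performed before read 1.
    Value getValueReordered() {
        Value early = value;                 // read 3, moved to the front
        if (value == null) {                 // read 1
            synchronized (this) {
                if (value == null) {         // read 2
                    value = new Value(10);
                }
            }
            return value;                    // this thread took the lock, so it re-reads
        }
        // If another thread published the value between the early read and
        // read 1, early is still null although read 1 saw a non-null value.
        return early;
    }
}
```

On a single thread both methods behave identically, which is why such a transformation would be legal under the memory model. Only a concurrent caller could observe the null.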

However, OpenJDK HotSpot's compilers avoid this. For two reads of the same field, such as reads 1 and 3, HotSpot treats the reads as dependent and does not reorder them at the compiler level. This is not because a memory barrier is inserted to prevent CPU reordering: for accesses to the same field, only compiler reordering is relevant, not CPU reordering.

That holds for ordinary code. The compiler can still be tricked into reordering two such reads with contrived code, as shown in section 7.1, "Coherence and Opaque", of "The most hardcore Java new memory model analysis and experiments". The same trick could be applied to DCL, but nobody normally writes code that way, so I won't go into it here.

Fields of Value may be read before they are initialized

Reordering does not come from the compiler alone. CPU instructions can execute out of order and CPU caches can make writes visible out of order, so a memory barrier is needed to solve the visibility problem.

Let's start with the Value constructor. The statement value = new Value(10); can be broken down into five finer-grained pseudocode steps: step 1 allocates the object, steps 2 and 3 initialize a1 and a2, step 4 initializes b, and step 5 stores the reference into the value field. None of these steps involves a memory barrier. Semantically, step 5 depends on the result of step 1, so step 1 must execute before step 5. Step 3 likewise depends on the result of step 2. But steps 2 and 3 have no dependency on steps 4 and 5, so they can be reordered relative to them. Let's test this reordering with a jcstress test.
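The five steps can be sketched in Java as below. The constructor body and the Publisher class are assumptions made for illustration. publishAsIfReordered spells out by hand what another thread may effectively observe when step 5 becomes visible before steps 2 to 4.

```java
// Sketch only: the constructor body is assumed from the description of the steps.
class Value {
    int a1;
    int a2;
    int b;

    Value(int a) {
        a1 = a;   // step 2
        a2 = a1;  // step 3, depends on step 2
        b = 2;    // step 4
    }

    // Only used by the illustration below: leaves every field at its default 0.
    Value() {
    }
}

// Hypothetical holder used only to show the publication step.
class Publisher {
    Value value;

    void publish() {
        // step 1: allocate, steps 2-4: run the constructor,
        // step 5: store the reference into the field
        value = new Value(10);
    }

    // What a concurrent reader may effectively see without a barrier:
    // the reference is visible while the fields still hold 0.
    void publishAsIfReordered() {
        Value v = new Value(); // step 1
        value = v;             // step 5, visible early
        v.a1 = 10;             // step 2
        v.a2 = v.a1;           // step 3
        v.b = 2;               // step 4
    }
}
```

Both methods leave the same final state. The difference is only in what a racing reader can observe between the individual stores.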

The reasons for writing the test this way are stated in its comments, but I want to emphasize them here:

  1. Each jcstress @Actor method is executed by a single thread. jcstress runs the test repeatedly under different JVM startup parameters, so that the code is interpreted, compiled by C1, or compiled by C2, and it also varies the compilation parameters so that JIT compilation behaves differently. This shows whether different execution modes lead to different compiler reordering effects.
  2. Each time jcstress starts a test in a new JVM, it binds each @Actor thread to a single CPU, so that the method only executes on that CPU during the test. That CPU's cache is then used exclusively by this method's code, which makes inconsistencies caused by CPU cache reordering easier to detect. For this reason the number of @Actor methods must not exceed the number of CPUs.
  3. Our test machine has only two CPUs, so there can only be two threads. If both executed the original code, they would most likely end up waiting on the synchronized block, and a synchronized block itself acts as a memory barrier (discussed later). To make it easier to hit the path that does not enter the synchronized block, the second @Actor method drops the synchronized block logic entirely and only reads. If value is null, it sets every result to -1 to distinguish that case.
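For readers without jcstress at hand, the idea of the two actors can be approximated with plain threads. This is a rough sketch under stated assumptions, not the author's test: the class names and the shape of Value are hypothetical, and a plain-thread harness is far less likely than jcstress to expose a reordering, so seeing no uninitialized values here proves nothing.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: Value's fields are assumed for illustration.
class Value {
    int a1, a2, b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2;
    }
}

class ValueHolder {
    Value value; // plain field, package-private so the reader thread can race on it

    Value getValue() {
        if (value == null) {
            synchronized (this) {
                if (value == null) {
                    value = new Value(10);
                }
            }
        }
        return value;
    }
}

// Hypothetical plain-thread approximation of the two-actor test.
class DclRaceSketch {
    // Runs writer/reader pairs and counts every observed "a1, a2, b" triple.
    static Map<String, Integer> run(int iterations) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < iterations; i++) {
            ValueHolder holder = new ValueHolder();
            int[] seen = new int[3];
            // First actor: the original DCL code.
            Thread writer = new Thread(holder::getValue);
            // Second actor: no synchronized block, just an unsynchronized read.
            Thread reader = new Thread(() -> {
                Value v = holder.value;
                if (v == null) {
                    seen[0] = seen[1] = seen[2] = -1; // value not published yet
                } else {
                    seen[0] = v.a1;
                    seen[1] = v.a2;
                    seen[2] = v.b;
                }
            });
            writer.start();
            reader.start();
            try {
                writer.join();
                reader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            counts.merge(seen[0] + ", " + seen[1] + ", " + seen[2], 1, Integer::sum);
        }
        return counts;
    }
}
```

A triple containing 0 would mean the reader saw a published Value whose fields were not yet initialized. The triple -1, -1, -1 only means the reader ran before publication.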

I ran the test on x86 (AMD64) and ARM (aarch64) CPUs.

On CPUs with stronger ordering guarantees, such as x86, we do not see uninitialized field values. On CPUs with weaker guarantees, such as ARM, we do. My other series, "The most hardcore Java new memory model analysis and experiments", refers several times to the table of CPU reorderings: x86 only allows a store to be reordered with a later load (StoreLoad), while ARM allows all four kinds of reordering.

The memory barrier we need here is StoreStore. x86 does not need an explicit StoreStore barrier: as long as the compiler does not reorder the stores, the CPU will not reorder them either. ARM needs a memory barrier to keep one store from being reordered with another. The barrier only has to ensure that steps 2 and 3 precede step 5 and that step 4 precedes step 5. How can we get it? Based on the mapping between Java constructs and memory barriers described in "The most hardcore Java new memory model analysis and experiments", there are several options, and for each one we will compare its memory barrier cost.

1. Use final

A final field gets a StoreStore barrier after its assignment, at the end of the constructor. So to place a StoreStore barrier after steps 2, 3 and 4, we only need to declare a2 and b final. In the pseudocode, the barrier then sits between step 4 and step 5.

Running the test again on ARM, there are no uninitialized values this time either.

a1 does not strictly need to be final here: because steps 2 and 3 depend on each other, they can be treated as a unit, and one memory barrier after the unit is enough. But this is not reliable! On some JDKs the constructor may be optimized so that the assignment to a2 no longer goes through a1, and then a1 and a2 no longer depend on each other. So it is best to declare all of the fields final.

However, this approach does not work when the fields cannot be declared final.
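A minimal sketch of the final-field variant, with every field of Value declared final as recommended above. The constructor body is assumed for illustration.

```java
// Sketch only: constructor body assumed; all fields are final.
class Value {
    final int a1;
    final int a2;
    final int b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2;
        // The final fields are frozen when the constructor finishes, so a
        // thread that obtains this object through a reference stored after
        // construction sees their initialized values.
    }
}

class ValueHolder {
    // The holder's own field stays a plain field in this variant.
    private Value value;

    Value getValue() {
        if (value == null) {
            synchronized (this) {
                if (value == null) {
                    value = new Value(10);
                }
            }
        }
        return value;
    }
}
```

The guarantee covers only the final fields themselves. Any non-final field added to Value later would again be exposed to the reordering described earlier.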

2. Use volatile, the common and officially recommended practice

Declare value as volatile. As covered in my other series, "The most hardcore Java new memory model analysis and experiments", a volatile write is implemented by adding LoadStore + StoreStore barriers before the write and a StoreLoad barrier after it. With value declared volatile, the write in step 5 of the earlier pseudocode is therefore preceded by LoadStore + StoreStore barriers and followed by a StoreLoad barrier.

Testing again on the ARM machine with value declared volatile, I don't see any uninitialized values.
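A minimal sketch of the volatile variant. Only the declaration of value changes compared with the classic DCL code. The shape of Value is assumed for illustration.

```java
// Sketch only: Value's fields are assumed and deliberately non-final.
class Value {
    int a1, a2, b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2;
    }
}

class ValueHolder {
    // volatile: the write that publishes the reference cannot be reordered
    // with the earlier constructor stores, and a read that sees the
    // reference also sees those stores.
    private volatile Value value;

    Value getValue() {
        if (value == null) {
            synchronized (this) {
                if (value == null) {
                    value = new Value(10);
                }
            }
        }
        return value;
    }
}
```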

3. On Java 9+, use VarHandle acquire/release

volatile is somewhat heavyweight. The same visibility can be achieved with the acquire/release access modes of VarHandle. In the pseudocode, the write in step 5 becomes a release write.

Running the test with this change, I don't see any uninitialized values either. This approach uses the fewest memory barriers and does not require the fields of the target type to be final.
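A minimal sketch of the acquire/release variant using VarHandle.getAcquire and VarHandle.setRelease (Java 9+). The structure, including the local variable v, is one reasonable way to write it and is not necessarily identical to the code that was tested. The shape of Value is assumed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: Value's fields are assumed and deliberately non-final.
class Value {
    int a1, a2, b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2;
    }
}

class ValueHolder {
    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(ValueHolder.class, "value", Value.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plain declaration; ordering comes from the access modes used below.
    private Value value;

    Value getValue() {
        // Acquire read: later reads of the object's fields cannot move before it.
        Value v = (Value) VALUE.getAcquire(this);
        if (v == null) {
            synchronized (this) {
                v = (Value) VALUE.getAcquire(this);
                if (v == null) {
                    v = new Value(10);
                    // Release write: the constructor stores cannot move after it.
                    VALUE.setRelease(this, v);
                }
            }
        }
        return v;
    }
}
```

Returning the local v rather than re-reading the field also means the method reads the field only once on the fast path.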

4. An interesting but impractical idea: if the method can be static, rely on the class-loading mechanism

If the methods and fields of ValueHolder can be static, then ValueHolder, either as a separate class or as a nested class, also guarantees the visibility of the fields of Value. This relies on the class-loading mechanism, which initializes the static fields and runs the static initializer blocks once for a given class. Class loading is protected by the synchronized keyword (see the ClassLoader source, ClassLoader.java), and class initialization itself runs under a per-class initialization lock (JLS 12.4.2).

As for the lower-level counterparts of synchronized, monitorenter and monitorexit: monitorenter has the same memory barriers as a volatile read, adding LoadLoad and LoadStore after the operation, and monitorexit has the same memory barriers as a volatile write, adding LoadStore + StoreStore before the operation and StoreLoad after it. So visibility is guaranteed. Although this looks easy to write, it is less efficient (much less, since class loading involves a lot more work) and less flexible, so treat it only as background knowledge.
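The static variant described in this section can be sketched as below. The exact listing is not reproduced in this article, so the field and method names and the shape of Value are assumptions.

```java
// Sketch only: Value's fields are assumed and deliberately non-final.
class Value {
    int a1, a2, b;

    Value(int a) {
        a1 = a;
        a2 = a1;
        b = 2;
    }
}

class ValueHolder {
    // Assigned once, while ValueHolder is being initialized. The JVM
    // serializes class initialization, so every thread that later calls
    // getValue sees the fully constructed object.
    private static Value value = new Value(10);

    static Value getValue() {
        return value;
    }
}
```

Initialization happens the first time ValueHolder is actively used, which here means the first call to getValue. That keeps the singleton lazy without any explicit lock in the code.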

Conclusion

DCL is a common programming pattern. For the lock-protected field value, it has two kinds of field visibility problems:

  1. By the definition of the Java memory model, regardless of the actual JVM implementation, getValue may return null. Modern JVM designs generally avoid this, so it can be ignored in practice.
  2. Fields of Value may be read before they are initialized. This is resolved by a StoreStore memory barrier between the end of the constructor and the assignment to the variable. Declaring the fields of Value final achieves this, but it is not flexible enough.

Of the fixes for the second problem:

  1. The simplest is to declare the value field volatile, which is what the JDK itself does and what is officially recommended.
  2. The most efficient is VarHandle release mode, which only introduces StoreStore and LoadStore memory barriers. That is much cheaper than a volatile write because there is no StoreLoad barrier. On x86 it requires no memory barrier at all, because x86 naturally preserves LoadLoad, LoadStore and StoreStore ordering and only fails to guarantee StoreLoad.
