Both synchronized and volatile play important roles in multithreaded programming. In previous articles, we covered synchronized, Java's built-in lock, which ensures both visibility and atomicity under concurrency and so prevents errors when sharing data. Volatile, by contrast, is a lightweight alternative that guarantees only the visibility of shared variables: after thread A modifies a volatile shared variable, thread B reads the up-to-date value. In the JMM article, we saw that Java faces instruction reordering and working-memory caching problems when multiple threads operate on shared variables.

The volatile modifier is simple to use, but how much work happens behind it?

First we need to understand that local memory is an abstraction: it covers caches, read/write buffers, and registers, and even compiler and CPU reordering. Following the JMM specification, the JVM gives volatile variables special treatment at the CPU level.

Underlying principle of volatile

In a computer system, the hard disk stores data but exchanges it slowly, while the CPU runs very fast, so having the CPU exchange data with the disk directly would be extremely inefficient. That is why main memory exists: data exchange happens between memory and the CPU. But memory is still not fast enough and would seriously slow down overall execution, so a cache was added inside the CPU as the CPU's temporary storage, mediating data exchange with memory.

  • On a single-core CPU, multiple threads run on the same core and share one cache, so using the same shared variables causes no data visibility problems.
  • On a multi-core CPU, threads may be scheduled onto different cores. In that case, one core may finish a computation without synchronizing the result back to main memory, so the other cores cannot use the latest value, which is a visibility problem.

Without the volatile modifier and the JVM's special treatment, the CPU fetches data from memory, stores it in the cache, and then computes on the cached copy. The computed value is not immediately flushed back to main memory, and the other CPUs are unaware of the change; they keep computing with the old value, which produces a data error. This mechanism mirrors the relationship between main memory and working memory in the JMM we discussed earlier.
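The visibility guarantee described above can be sketched with a flag shared between two threads. This is a minimal illustration, not code from the article; the class name, field name, and timings are assumptions.

```java
// Minimal sketch of the visibility guarantee; names and timings are illustrative.
public class VisibilityDemo {
    // Without 'volatile' the reader thread may spin forever on a stale
    // cached copy of this flag; 'volatile' forces a fresh read each time.
    private static volatile boolean running = true;

    static boolean demo() throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) {
                // busy-wait until the writer's change becomes visible
            }
        });
        reader.start();
        Thread.sleep(100);   // let the reader enter its loop
        running = false;     // volatile write: flushed and made visible
        reader.join(2000);   // with volatile, the reader exits promptly
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader terminated: " + demo());
    }
}
```

If `running` were a plain field, the JIT compiler would be free to hoist the read out of the loop, and the reader might never observe the change.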

As we know, the javac compiler compiles .java source into .class bytecode, and the JVM runs the bytecode through its interpreter and just-in-time (JIT) compiler, translating bytecode instructions into concrete machine-code instructions. For a shared variable declared volatile, the translation to machine code adds the special Lock instruction prefix to its assignment operation.

```java
public class Test {
    private volatile int i = 1;

    // thread A updates the value
    public void setVar() {
        i = 2;
    }

    // thread B reads the value
    public int getVar() {
        return i;
    }
}
```

When the CPU executes this instruction, the Lock prefix has two effects:

  • It causes the executing CPU's cache line to be written back to memory.
  • That write-back also invalidates the corresponding cache lines in other CPUs or other cores.

So with this instruction prefix, changes to volatile variables become visible to other CPUs.

Instruction reordering

The Lock prefix prevents reordering of instructions.

From the perspective of the JMM:

In JMM terms, when a variable is assigned with putfield, the JVM checks whether the variable is volatile. If so, it inserts memory barriers around the access, separating it from the operations before and after it and forbidding those operations from being reordered across the volatile access.

From The Art of Java Concurrent Programming: to implement the memory semantics of volatile, the compiler inserts memory barriers into the instruction sequence when generating bytecode, preventing particular types of processor reordering. It is nearly impossible for the compiler to find an optimal arrangement that minimizes the total number of inserted barriers, so the JMM takes a conservative approach. The conservative insertion strategy is:

  • Insert a StoreStore barrier before each volatile write.
  • Insert a StoreLoad barrier after each volatile write.
  • Insert a LoadLoad barrier after each volatile read.
  • Insert a LoadStore barrier after each volatile read.
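These barriers are what make the common "publish data, then set a volatile flag" pattern safe: the StoreStore barrier before the volatile write keeps the plain write from being reordered past it, and the barriers after the volatile read keep later reads from moving before it. A minimal sketch (class and field names are illustrative, not from the article):

```java
// Safe publication via a volatile flag; names are illustrative.
public class SafePublication {
    static int payload;                // plain, non-volatile field
    static volatile boolean ready;     // volatile guard

    static int demo() throws InterruptedException {
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {           // volatile read: barriers follow it
                // spin until published
            }
            seen[0] = payload;         // guaranteed to observe 42
        });
        reader.start();
        payload = 42;                  // plain write...
        ready = true;                  // ...cannot be reordered past this volatile write
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Without volatile on `ready`, the two writes could be reordered and the reader could observe `ready == true` while `payload` is still 0.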


From a CPU execution perspective:

The memory barriers above generate machine code with the Lock prefix at execution time (as mentioned throughout). Inside the CPU, computation is dispatched to different circuits, on the premise that the logical result is unaffected. Because the Lock-prefixed instruction flushes the result back to memory, every operation in the computation preceding this instruction must already be complete, so that the correct result can be written back. It therefore also forms a memory barrier: operations before and after the access to this variable cannot be reordered across it.

In summary, volatile is implemented essentially as a Lock instruction prefix.

Precautions for Use

Volatile guarantees visibility, but it does not guarantee atomicity.

For a statement such as i++, the steps at execution time are:

  1. The value is read from memory and placed in the CPU cache
  2. The CPU computes i + 1
  3. The result is stored back in the cache
  4. The result is flushed back to main memory

This is not a simple assignment: other CPU cores do not see the value change until step 4 completes. Volatile only ensures that the value is flushed back to memory immediately after step 3; it does not make steps 2, 3, and 4 atomic. If thread A computes +1 but has not yet flushed it back when thread B also does +1, the final result will be smaller than expected. Therefore, for operations like ++ across multiple threads, use synchronized or other synchronization to guarantee atomicity.
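The lost-update problem can be demonstrated by racing a volatile counter against an atomic one. This is an illustrative sketch; the class name and iteration counts are assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Volatile gives visibility, but i++ is still a non-atomic
// read-modify-write, so concurrent increments can be lost.
public class CounterDemo {
    static volatile int volatileCount = 0;                  // visible but NOT atomic under ++
    static final AtomicInteger atomicCount = new AtomicInteger();

    static int[] demo() throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    volatileCount++;                        // updates can be lost
                    atomicCount.incrementAndGet();          // CAS loop: never loses an update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = demo();
        System.out.println("volatile: " + r[0] + ", atomic: " + r[1]);
    }
}
```

The atomic counter always reaches 40,000; the volatile counter typically comes up short because two threads can read the same stale value before either writes back.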

Volatile is lighter than synchronized and ensures only visibility. For this reason, AQS in java.util.concurrent uses a volatile-modified variable to mark its state, enabling a flexible variety of locks that compensate for the shortcomings of synchronized and the other built-in locks.
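How AQS combines a volatile state field with CAS can be sketched as a minimal exclusive lock, modeled loosely on the Mutex example in the AbstractQueuedSynchronizer Javadoc. The class name and the demo harness are illustrative assumptions.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Minimal mutex sketch on top of AQS: state 0 = unlocked, 1 = locked.
// AQS's 'state' is a volatile int, read/updated via CAS.
public class SimpleMutex {
    private static class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int arg) {
            return compareAndSetState(0, 1);  // atomic CAS on the volatile state
        }
        @Override
        protected boolean tryRelease(int arg) {
            setState(0);                      // volatile write publishes the release
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }

    static int demo() throws InterruptedException {
        SimpleMutex mutex = new SimpleMutex();
        final int[] counter = new int[1];
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    mutex.lock();
                    try { counter[0]++; } finally { mutex.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Because the state is volatile and acquisition goes through CAS, the plain `counter[0]++` inside the critical section is both atomic (mutual exclusion) and visible (the volatile release write happens-before the next acquire).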