🎓 Do your best and let destiny take care of the rest. I am a postgraduate student at Southeast University and a summer intern in Java backend development at Ctrip. I love fitness and basketball, and I am happy to share what I have seen and learned about technology. Follow the public account @Flying Veal to get article updates as soon as they are published.

🎁 This article has been included in "CS-Wiki" (a Gitee official recommended project with 1.6K+ stars), which is committed to building a complete back-end knowledge system and helping you avoid detours on the road of technology. Friends are welcome to come exchange and learn.

🍉 If you do not have a good project yet, you can refer to one I wrote, the open-source community system Echo (a Gitee official recommended project with 700+ stars so far), built with SpringBoot + MyBatis + Redis + Kafka + Elasticsearch + Spring Security + … and provided with detailed development documents and supporting tutorials.

The title "Yongbu Qianli" is mainly meant to highlight how fundamental and important this article is (dog head); the knowledge of concurrent programming really does revolve mainly around the JMM and the three properties.

The outline is as follows:

1) Why learn concurrent programming?

2) Why do you need concurrent programming?

3) Introduction to the Java memory model

4) Explain the three properties of the Java memory model (atomicity, visibility, orderliness), which are also three important criteria for judging thread safety. Taking atomicity as an example, the general flow is:

  • What is atomicity
  • What’s the problem with not being atomic
  • How do you guarantee atomicity

Why learn concurrent programming

The question "Why should we learn concurrent programming?" is a bit like "Why do we study politics?" We (at least those of us still in school) rarely touch it in practice, so we recite a bunch of "correct and grand nonsense" that turns into rote boilerplate and is quickly forgotten.

It was only when I began to dig into this area instead of blindly reciting it that I realized those words were indeed correct and grand, but not nonsense.

Although the underlying principles of concurrent programming and its vast knowledge system can be intimidating, the Java language and the Java Virtual Machine provide a considerable number of concurrency tools that hide many threading details from us, letting us focus on business logic while coding and greatly lowering the barrier to concurrent programming.

But no matter how advanced languages, middleware, and frameworks become, we shouldn't rely on them to do all the work of concurrency for us. Understanding what happens under the hood and learning the ideas behind it is still essential to becoming an advanced programmer.

I think the paragraph above is a fair answer to the question "Why should we learn concurrent programming?"

Why do we need concurrent programming

I wonder if you have heard of Moore's Law, known as the first law of the computer industry. It is an empirical rule summed up from long-term observation by Intel co-founder Gordon Moore; although not a rigorously derived truth, it has held up well, at least so far. Its core idea, in plain terms, is that processor performance doubles roughly every two years. Sounds like an empty truism, doesn't it?

In fact, it is the development of multi-core CPUs that keeps Moore's Law going. Against this backdrop, concurrent programming has spread like wildfire: through concurrency, the computing power of multi-core CPUs can be fully exploited and performance improved.

For example, in today's booming field of image processing, many image-processing algorithms still need a long optimization pass after the initial coding and debugging are correct. Even if an algorithm works well, it cannot be integrated into a product for users if its computation takes too long.

For a 1000 x 800 image, the original processing approach is to start at the first pixel and iterate through to the last. Faced with such a large amount of computation, the most direct and easiest way to improve the algorithm's performance is multithreading: make full use of the computing power of a multi-core CPU.

We can divide the whole image into several blocks. For example, with an 8-core CPU, we can split the image into 8 blocks of 1000 x 100 pixels each, create 8 threads with each thread processing one block, and let each CPU core execute one thread. The processing speed will improve markedly.
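
To make this concrete, here is a minimal sketch of the idea. The class name ParallelImageSketch and the per-pixel method processOnePixel(x, y) are hypothetical stand-ins assumed for illustration:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelImageSketch {
    static final int WIDTH = 1000, HEIGHT = 800, STRIPS = 8;

    // Hypothetical per-pixel operation, assumed for illustration
    static void processOnePixel(int x, int y) { /* ... */ }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(STRIPS);
        CountDownLatch done = new CountDownLatch(STRIPS);
        int stripHeight = HEIGHT / STRIPS; // 100 rows per strip
        for (int s = 0; s < STRIPS; s++) {
            final int startRow = s * stripHeight;
            pool.submit(() -> {            // each thread handles one 1000 x 100 block
                for (int y = startRow; y < startRow + stripHeight; y++)
                    for (int x = 0; x < WIDTH; x++)
                        processOnePixel(x, y);
                done.countDown();
            });
        }
        done.await();   // wait for all 8 strips to finish
        pool.shutdown();
    }
}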

Of course, the speedup will not match the thread count, as thread creation and destruction and context switching are all costly.

Here's an excerpt from the book The Art of Concurrent Programming in Java that answers the question of why we need concurrent programming:

The arrival of the multi-core CPU era broke the single-core CPU's limits on multithreading efficiency. Multiple CPUs mean that each thread can run on its own CPU, which reduces the overhead of thread context switching. Meanwhile, as the performance and throughput demands on applications grow, there is a need to handle massive amounts of data and requests, all of which make highly concurrent programming imperative.

As for why multi-core CPUs became popular, the book Understanding the Java Virtual Machine, 3rd Edition also touches on it; I excerpt it below with slight modifications:

Multitasking has become an almost obligatory feature of modern computer operating systems. In many situations, we let the computer do several things at once not only because its processing power is strong, but, more importantly, because the gap between the CPU's speed and the speed of its storage and communication subsystems is too large. The CPU therefore has to spend a great deal of time waiting for other resources, such as disk I/O, network communication, and database access.

For this reason, we have to use some means to "squeeze" out the processor's computing power, otherwise a great deal of performance goes to waste; and the most obvious and proven effective way to "squeeze" it is to have the computer work on several tasks at once.

Besides making full use of the processor's computing power, a single server serving multiple clients at the same time is another, more concrete, concurrency scenario.

Get inspired by physical machines

In fact, concurrency problems on physical machines have much in common with those in virtual machines, and the physical machine's solutions are of considerable reference value to virtual-machine implementations. It is therefore worth learning how physical machines handle these problems.

As mentioned above, one of the main reasons concurrent programming can maximize CPU utilization is that the computer's storage devices are orders of magnitude slower than the CPU, so the CPU has to spend a lot of time waiting for other resources.

That is the software-level response. At the hardware level, modern computer systems insert one or more levels of cache, whose read/write speed is as close as possible to the CPU's, as a buffer between the CPU and main memory.

Copying the data needed for an operation into the cache lets the operation proceed quickly; when the operation completes, the result is synchronized back to main memory from the cache, so the processor never has to wait for slow main-memory reads and writes.

This inevitably introduces a new problem: Cache Coherence.

That is, when the computation tasks of multiple CPUs all involve the same region of main memory, their cached data may become inconsistent. If that happens, whose cached data should win when synchronizing back to main memory?

To solve the coherence problem, each CPU must follow certain protocols when accessing the cache, reading and writing data according to those protocols. And this brings us to the concept of a memory model.

At the physical machine level, the memory model can be understood as a process abstraction of read and write access to a specific memory or cache under a specific operating protocol.

Obviously, physical machines with different architectures can have different memory models, and the Java Virtual Machine has a memory model of its own: the Java Memory Model (JMM). It is designed to mask the differences in memory access across various hardware and operating systems, so that Java programs achieve consistent memory-access behavior on every platform.

Of course, the JMM is highly analogous to the memory model of the physical machine we describe here.

Java memory model

The JMM specifies that all variables are stored in Main Memory, and that each thread has its own Working Memory.

A thread's working memory holds copies of the main-memory variables used by that thread. All of a thread's operations on variables (reads, assignments, and so on) must be performed in working memory; a thread must not read or write variables in main memory directly.

The main memory here is analogous to the physical machine's main memory, though it is actually only a portion of the virtual machine's memory; working memory is analogous to the cache.

The Art of Concurrent Programming in Java calls "working memory" "local memory"; "working memory" is the term used in Understanding the Java Virtual Machine, 3rd Edition.

The "variables" here include instance fields, static fields, and the elements that make up array objects, but not local variables or method parameters; those two kinds are private to the thread and are never shared. We'll stop here and not dig any deeper.

Atomicity

What is atomicity

Similar to the physical machine, where a cache-coherence protocol dictates how main memory and the cache interact, is there a specific protocol in the JMM governing the interaction between main memory and working memory?

Of course! The JMM defines the following eight operations to implement the details of copying a variable from main memory into working memory and synchronizing it from working memory back to main memory. A Java virtual machine implementation must ensure that every one of these operations is atomic and indivisible.

Let's set aside what the eight operations are for a moment and first talk about what "atomic" means.

An atom means “the smallest particle that cannot be further divided”, while an atomic operation means “an operation or series of operations that cannot be interrupted”.

Take a simple classic example, a bank transfer, in which person A transfers 100 yuan to person B. The transfer actually involves two discrete steps:

  • Step 1: Subtract 100 from account A
  • Step 2: Add 100 to account B

We require the transfer to be atomic, meaning that steps 1 and 2 execute in sequence without interruption, and either both succeed or both fail.

Imagine what would happen if the transfer operation were not atomic.

For example, if step 1 succeeds but step 2 is never executed or fails, account A loses 100 while account B never gains 100.

In that case, an atomic transfer should work like this: if step 2 fails, the entire transfer fails and step 1 is rolled back, so account A is not debited 100.


OK, now that we understand the concept of atomicity, let’s look at the eight atomic operations defined by the JMM. There is no need to memorize them:

  • lock: acts on a variable in main memory; it marks the variable as exclusively owned by one thread.
  • unlock: acts on a variable in main memory; it releases a locked variable so that it can then be locked by other threads.
  • read: acts on a variable in main memory; it transfers the variable's value from main memory into the thread's working memory for the subsequent load operation.
  • load: acts on a variable in working memory; it puts the value obtained by the read operation from main memory into the working-memory copy of the variable.
  • use: acts on a variable in working memory; it passes the variable's value in working memory to the execution engine, performed whenever the virtual machine encounters a bytecode instruction that needs the variable's value.
  • assign: acts on a variable in working memory; it assigns a value received from the execution engine to the working-memory variable, performed whenever the virtual machine encounters a bytecode instruction that assigns to the variable.
  • store: acts on a variable in working memory; it transfers the variable's value from working memory to main memory for the subsequent write operation.
  • write: acts on a variable in main memory; it puts the value obtained by the store operation from working memory into the main-memory variable.

In fact, on some platforms the load, store, read, and write operations are allowed exceptions for variables of type double and long, known as the "non-atomic protocol for long and double", but this generally doesn't require our attention and won't be elaborated here.

These eight operations certainly cannot be used arbitrarily. To ensure that memory-access operations in Java programs remain thread-safe under concurrency, the JMM also specifies a set of rules that must be satisfied when performing these eight basic operations.

I won't list them all here. The reason for mentioning them is that some of these rules come up below, so it's better to flag them in advance to avoid confusion.


We used a transfer example above; so what problems can non-atomic operations cause in actual code?

If two threads increment and decrement a static variable with an initial value of 0, 5000 times each, will the result be 0?
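
A minimal sketch of the scenario (a reconstruction; the original code block did not survive formatting):

public class UnsafeCounter {
    static int i = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { for (int k = 0; k < 5000; k++) i++; });
        Thread t2 = new Thread(() -> { for (int k = 0; k < 5000; k++) i--; });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(i); // not guaranteed to be 0
    }
}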

We cannot guarantee a consistent result from this code: it could be a positive number, a negative number, or, of course, 0.

So we call this code thread-unsafe, meaning that code which runs correctly in a single-threaded environment may fail to produce the correct result in a multi-threaded environment.

It may be easier to understand thread-unsafety in reverse, from the thread-safe side, using this statement from Java Concurrency in Practice:

A piece of code that can be accessed by multiple threads and still behave correctly is thread-safe.

The reason this code is not thread-safe is that incrementing or decrementing a static variable in Java is not an atomic operation; it actually comprises three discrete steps:

  • Step 1: Read the current value of i
  • Step 2: Add 1 to (or subtract 1 from) that value
  • Step 3: Write the new value back

As you can see, this is a read-modify-write operation.

Let's take the i++ operation as an example and look at its corresponding bytecode instructions.
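
The code in question is simply an increment of the static variable i (a minimal reconstruction; the original listing did not survive formatting):

static int i = 0;

static void increment() {
    i++;
}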

The bytecode corresponding to the code above looks like this:
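
Roughly what javap -c prints for this method, abridged; the constant-pool index #2 is illustrative and will vary:

0: getstatic     #2    // Field i:I  -- read the current value of static i
3: iconst_1            //            -- push the constant 1
4: iadd                //            -- add the two
5: putstatic     #2    // Field i:I  -- write the result back to i
8: return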

A brief explanation of what these bytecode instructions mean:

  • getstatic i: pushes the value of static variable i onto the operand stack
  • iconst_1: pushes the constant 1
  • iadd: performs the addition (the decrement uses isub instead)
  • putstatic i: stores the modified value back into static variable i

If you increment 5000 times and then decrement 5000 times in a single-threaded environment, of course nothing goes wrong.

In a multi-threaded environment, however, CPU time-slice scheduling means Thread1 may be in the middle of an increment when the CPU takes the time slice away and gives it to Thread2; that is a thread context switch. As a result, what should have been an uninterrupted read-modify-write (three consecutive steps) can be broken apart.

Here's one way the result can end up negative:
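
A possible interleaving, with i starting at 0 (a worked example standing in for the original figure):

// Thread2 (i--): getstatic i                    -> reads 0
// Thread1 (i++): getstatic, iconst_1, iadd,
//                putstatic                      -> i is now 1
// Thread2 (i--): iconst_1, isub, putstatic      -> writes 0 - 1 = -1, overwriting Thread1's result
// Thread1's increment is lost and i ends up at -1; enough such races
// over the 5000 iterations can leave the final result negative.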

In summary, if multiple CPUs perform read-modify-write operations on the same shared variable at the same time, the shared variable is processed by several CPUs concurrently. Due to CPU time-slice scheduling and similar factors, one thread's read-modify-write can be interrupted by another's, so the value of the shared variable after the operations may not be what we expect.

By the way, besides increment and decrement, the common i = j operation is also non-atomic; it consists of two discrete steps:

  • Step 1: Read the value of j
  • Step 2: Assign that value to i

How do you guarantee atomicity

So how do we make operations atomic; in other words, how do we guarantee atomicity?

For this problem, both the processor and the Java language actually provide effective measures: the processor offers bus locking and cache locking, while Java offers locks and spin CAS. Here we briefly explain Java's measures for guaranteeing atomicity.

The atomic variable operations directly guaranteed by the Java memory model are read, load, assign, use, store, and write. We can generally assume that reads and writes of primitive-type variables are atomic (the exception being the non-atomic protocol for long and double mentioned earlier, which rarely matters in practice).

If a scenario requires a wider-ranging atomicity guarantee, the Java memory model also provides the lock and unlock operations to meet that need.

Although the JVM does not expose lock and unlock directly to users, it provides the higher-level bytecode instructions monitorenter and monitorexit, which use them implicitly. These two bytecode instructions surface in Java code as the synchronized keyword (synchronized blocks), which is why operations inside synchronized blocks are atomic with respect to each other.
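
Applied to the earlier counter, a synchronized block makes the whole read-modify-write atomic. A minimal sketch (the class and lock names are illustrative):

public class SyncCounter {
    static int i = 0;
    static final Object LOCK = new Object();

    static void increment() {
        synchronized (LOCK) { // the block compiles to monitorenter ... monitorexit
            i++;
        }
    }

    static void decrement() {
        synchronized (LOCK) {
            i--;
        }
    }
}

With both threads going through these methods, the 5000 increments and 5000 decrements always net out to 0.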

Beyond synchronized, which is a lock at the language level, the java.util.concurrent.locks.Lock interface in the JUC package also provides locks at the class-library level, such as ReentrantLock.

In addition, as hardware instruction sets evolved, CAS operations based on the cmpxchg instruction entered the Java class libraries in JDK 5. They are exposed by several wrapper methods in the sun.misc.Unsafe class, such as compareAndSwapInt() and compareAndSwapLong(). Before JDK 9, however, the Unsafe class was not open to user code, only to the Java class libraries; for example, the integer atomic classes in the JUC package implement methods such as compareAndSet() and getAndIncrement() using Unsafe's CAS operations.

Code that uses this CAS approach is also often referred to as lock-free programming.
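
For example, the JUC atomic classes let us fix the earlier counter without a lock; incrementAndGet() is a CAS loop under the hood (the class name is illustrative):

import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    static final AtomicInteger i = new AtomicInteger(0);

    static void increment() { i.incrementAndGet(); } // atomic, no lock involved
    static void decrement() { i.decrementAndGet(); }
}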

Visibility

What is visibility

Back to the physical machine: as mentioned earlier, introducing caches inevitably raises a new problem, cache coherence. The same problem exists in the Java virtual machine, in the form of a delay in synchronization between working memory and main memory; this is the memory-visibility problem.

What is visibility? It means that when one thread changes the value of a shared variable, other threads become aware of the change immediately.

Review the Java memory model:

From the above figure, if thread A and thread B want to communicate, they must go through the following two steps:

  • 1) Thread A flushes the updated shared variable from its working memory to main memory
  • 2) Thread B reads from main memory the shared variable that thread A updated

That is, thread A must pass through main memory in order to communicate with thread B.

Here’s a possible problem. For a simple example, look at this code:

// Code executed by thread 1
int i = 0;
i = 1;
// Code executed by thread 2
int j = i;

When thread 1 executes the statement i = 1, it first reads the initial value of i from main memory, loads it into thread 1's working memory, and then assigns it the value 1. At this point the value of i in thread 1's working memory is 1, but it has not yet been written back to main memory.

If thread 2 executes j = i just as thread 1 is about to write the new value of i back to main memory, thread 2 fetches i from main memory and loads it into its own working memory, where i is still 0, so j becomes 0 instead of 1.

This is a memory-visibility problem: thread 1 changed the value of the shared variable i, but thread 2 did not perceive the change in time.

How do you guarantee visibility

Besides volatile, the keyword you probably thought of first, synchronized and final can also guarantee visibility.
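
A classic illustration of volatile's visibility guarantee is a stop flag; a minimal sketch (the class name is illustrative):

public class StopFlag {
    // Without volatile the worker may keep reading a stale copy of stop
    // from its working memory and never exit the loop.
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) { /* busy work */ }
            System.out.println("worker saw stop == true");
        });
        worker.start();
        Thread.sleep(1000);
        stop = true; // the write is promptly visible to the worker
    }
}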

As mentioned above, to keep memory-access operations thread-safe under concurrency, the JMM specifies a set of rules that the eight basic atomic operations must satisfy. One of those rules underpins synchronized's visibility guarantee:

  • Before performing an unlock operation on a variable, the variable must be synchronized back to main memory (store, write)

In other words, after synchronized code modifies a variable in working memory, the modification is flushed to main memory before the unlock, so the shared variable's value in main memory is always up to date; this guarantees visibility.

As for the final keyword's visibility, it has to be understood together with final's memory semantics. Briefly: once a final field has been initialized in the constructor, and the constructor has not leaked a reference to this, the value of the final field is visible to other threads.

Orderliness

What is orderliness

OK, with visibility covered, let's return to the physical machine once more. Besides adding caches, to keep the CPU's execution units as fully utilized as possible, the CPU may optimize the input code by executing it out of order and then reorganize the out-of-order results after computation. This guarantees that the result is consistent with sequential execution, but it does not guarantee that each statement is computed in the order it appears in the code. So if one computation depends on the intermediate result of another, the ordering cannot be relied upon from the order of the code alone.

Similarly, the Java compiler has an optimization of the same kind: instruction reordering (Instruction Reorder).

So, since reordering can optimize performance, can it be used without restriction?

No: when reordering, both the CPU and the compiler must follow the as-if-serial semantics, which says that no matter how instructions are reordered, the result of executing the program in a single-threaded environment must not change.

To comply with the as-if-serial semantics, the CPU and compiler do not reorder operations that have data dependencies between them, because such reordering would change the execution result.

Here again, we introduce the concept of “data dependency”.

If two operations access the same variable, and one of them is a write operation, there is a data dependency between the two operations.

There are three types of data dependencies: write after read, write after write, and read after write, as shown in the figure below

In all three cases above, reordering the execution order of the two operations changes the program's result.

In fact, when thinking about data dependencies, drawing them out helps. Here's an example:

int a = 1;		 // A
int b = 2;		 // B
int sum = a + b; // C

The data dependencies of the above three operations are shown in the figure below:

As you can see, there are data dependencies between A and C and between B and C, so in the final instruction sequence C cannot be reordered before A or B. But there is no data dependency between A and B, so the CPU and compiler may reorder the execution order of A and B. The program's two possible execution sequences are shown below:

This doesn't seem to be a problem: the reordering doesn't change the result, and performance improves.

Unfortunately, however, the data dependencies discussed here apply only to the instruction sequence executed on a single CPU and the operations performed in a single thread. Data dependencies across different CPUs or different threads are not considered by the CPU or the compiler.

That's why I emphasized "single-threaded" when introducing as-if-serial.

Take a look at this code:
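
(The original listing did not survive formatting; below is a minimal reconstruction consistent with the description that follows, with the four operations numbered in comments.)

class ReorderExample {
    int a = 0;
    boolean flag = false;

    public void writer() {
        a = 1;             // operation 1
        flag = true;       // operation 2
    }

    public void reader() {
        if (flag) {            // operation 3
            int i = a * a;     // operation 4
        }
    }
}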

Suppose two threads, A and B: thread A first executes writer(), then thread B executes reader(). When thread B performs operation 4, can it see that thread A set the shared variable a to 1 in operation 1?

The answer is not necessarily.

Since operations 1 and 2 have no data dependency, the CPU and compiler may reorder them; likewise, operations 3 and 4 have no data dependency, so the compiler and processor may reorder those two as well.

What might be the effect of reordering operations 1 and 2, for example?

As shown on the right of the figure above, thread A first writes the flag variable, and thread B then reads it. Since the condition evaluates to true, thread B reads variable a. But at this point variable a has not yet been written by thread A, so thread B still reads a as 0. Here the semantics of the multithreaded program have been broken by reordering.

Thus we can conclude: the CPU and the Java compiler spontaneously reorder instruction sequences to optimize program performance, and in a multithreaded environment reordering may cause the program to produce incorrect results.


Given the concept of reordering, we can summarize the natural orderliness of Java programs as follows:

  • Observed from within a single thread, all operations are ordered (in short, execution appears serial inside the thread)
  • Observed from one thread looking at another, all operations are out of order (the "disorder" refers mainly to the instruction-reordering phenomenon and the delay in synchronization between working memory and main memory)

How do you guarantee orderliness

The Java language provides the keywords volatile and synchronized to ensure order between threads.

volatile inherently guarantees ordering, because besides guaranteeing visibility, its semantics include prohibiting instruction reordering.
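
For instance, declaring flag volatile in the ReorderExample above forbids the problematic reorderings (a sketch; the class name is illustrative):

class ReorderFixedExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {
        a = 1;          // 1: cannot be reordered after the volatile write below
        flag = true;    // 2: volatile write
    }

    public void reader() {
        if (flag) {            // 3: volatile read
            int i = a * a;     // 4: guaranteed to see a == 1 when flag is true
        }
    }
}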

The theoretical underpinning of synchronized's ordering guarantee is again one of the rules the JMM requires the eight basic atomic operations to satisfy:

  • Only one thread can lock a variable at a time

This rule determines that two synchronized blocks holding the same lock can only be entered serially.

It's not hard to understand: synchronized, in plain terms, uses an exclusive lock to ensure that the code it guards is executed by only one thread at any moment. This satisfies a key premise of the as-if-serial semantics, single-threaded execution, and so with the as-if-serial guarantee in place, ordering is assured.

Happens-before principle

Happens-before is the soul of the JMM and a very useful means of judging whether data races exist and whether threads are safe. For the completeness of the knowledge system it is only briefly mentioned here; a later article will explain it in detail.

If all the ordering in the Java memory model had to be achieved by volatile and synchronized alone, many operations would become very verbose. Yet we don't notice this when writing Java concurrent code, and that is thanks to the happens-before principle.

Relying on this principle, a few simple rules let us resolve all questions about whether two operations might conflict in a concurrent environment, without drowning in the dense definitions of the Java memory model.

References

  • The Art of Concurrent Programming in Java
  • Understanding the Java Virtual Machine, 3rd Edition
