The memory model


The memory model allows the compiler to perform many important optimizations. Compiler optimizations such as loop fusion move statements around the program, which can change the order of reads and writes of potentially shared variables. Changes in read/write order can introduce race conditions. Without a memory model, the compiler is generally not allowed to apply such optimizations to multithreaded programs, or only in special cases.

The compiler simply needs to ensure that the value of each (potentially shared) variable at a synchronization barrier is the same in optimized and non-optimized code. In particular, the compiler assumes it is safe to reorder statements within a block of code that contains no synchronization barriers.

Most research in the field of memory models revolves around:

  • Designing a memory model that gives the compiler the greatest freedom to optimize, while providing adequate guarantees for both race-free and (perhaps more importantly) racy programs.
  • Proving that program optimizations are correct with respect to this memory model.

In short, a memory model is a set of rules: within its scope the compiler has maximum freedom to optimize the code, while the model still guarantees program correctness for shared variables, whether races are present or not. These compiler optimizations include reordering reads and writes of shared variables, also known as memory reordering.

This is also described in the go101 article Memory Order Guarantees in Go:

Many compilers (at compile time) and CPU processors (at run time) often make some optimizations by adjusting the instruction orders, so that the instruction execution orders may differ from the orders presented in code. Instruction ordering is also often called memory ordering.

Memory ordering

This is a description of memory ordering, based on Wikipedia:

Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler during compile time, or to the memory ordering generated by a CPU during runtime.


In modern microprocessors, memory ordering characterizes the CPU’s ability to reorder memory operations — it is a type of out-of-order execution.


On most modern uniprocessors memory operations are not executed in the order specified by the program code. In single-threaded programs all operations appear to have been executed in the order specified, with all out-of-order execution hidden from the programmer — however, in multi-threaded environments (or when interfacing with other hardware via memory buses) this can lead to problems. To avoid problems, memory barriers can be used in these cases.


In other words, the compiler generates an order of memory accesses that may differ from the order specified in the program code, and the CPU may further reorder them at run time (out-of-order execution, OoOE). This is consistent with the memory-model discussion above; the ultimate goal is to improve performance.

There are two kinds of memory reordering, compile-time reordering and runtime reordering. First, look at compile-time memory reordering.

Memory reordering at compile time

Jeff Preshing describes this in Memory Ordering at Compile Time:

The cardinal rule of memory reordering, which is universally followed by compiler developers and CPU vendors, could be phrased as follows:

Thou shalt not modify the behavior of a single-threaded program.

That is, both compiler developers and CPU manufacturers follow one principle for memory reordering: it must not change the behavior of a single-threaded program, i.e. it must not break the logic of the original program.

Therefore, memory reordering at compile time is not a problem in a single-threaded environment.

So, in a multithreaded environment, how does the problem arise?

Take a look at this example from go101’s Memory Order Guarantees in Go:

```go
package main

import (
	"log"
	"runtime"
)

var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
	if done {
		log.Println(len(a)) // always 12 once printed
	}
}

func main() {
	go setup()
	for !done {
		runtime.Gosched()
	}
	log.Println(a) // expected to print: hello, world
}
```

The behavior of the above program depends on the compiler and CPU. Different compilers, compiler versions, or CPU architectures may produce different results. For example, the compiler may reorder the instructions in setup like this:

```go
func setup() {
	done = true
	a = "hello, world"
	if done {
		log.Println(len(a))
	}
}
```

Since setup and main run in parallel, the main goroutine can observe this reordering: between

```go
done = true
a = "hello, world"
```

the main goroutine’s log.Println(a) may already have executed. Therefore, what log.Println(a) in main prints is unpredictable.

Therefore, to keep the program correct, we must ensure that certain lines of code in one goroutine execute before certain lines in another goroutine. In this case, because the compiler reorders instructions, that reordering must be prevented if the code is to execute in order.

So, how do we solve this problem? How do we ensure that, on any CPU architecture, our operations execute in the order we want?

We can use memory barrier instructions. For example, C++ provides std::atomic and explicit fences, and Java provides the volatile keyword. Go, however, offers no instruction that lets us explicitly build a memory barrier and prevent memory reordering at compile time.

So how does Go prevent memory reordering at compile time?

Memory reordering at run time

Jeff Preshing has also written about runtime memory reordering in Memory Barriers Are Like Source Control Operations.

First, modern CPUs execute out of order. Second, the CPU executes instructions much faster than it can fetch data from memory, so most of the CPU’s time would otherwise be spent on data I/O rather than computation.

To address this, caches are placed between the CPU and main memory, such as the L1/L2/L3 cache hierarchy, and each core has its own store buffer and invalidate queue.

The architecture diagram of the CPU is shown below:

```
┌─────────────┐            ┌─────────────┐
│    CPU 0    │            │    CPU 1    │
└──────┬──────┘            └──────┬──────┘
       │   ┌────────┐            │   ┌────────┐
       ├──▶│ Store  │            ├──▶│ Store  │
       │◀──┤ Buffer │            │◀──┤ Buffer │
       │   └───┬────┘            │   └───┬────┘
       ▼       ▼                 ▼       ▼
┌─────────────────┐        ┌─────────────────┐
│      Cache      │        │      Cache      │
└────────┬────────┘        └────────┬────────┘
         │                          │
┌────────┴────────┐        ┌────────┴────────┐
│   Invalidate    │        │   Invalidate    │
│      Queue      │        │      Queue      │
└────────┬────────┘        └────────┬────────┘
         │       Interconnect       │
         └────────────┬─────────────┘
                      │
              ┌───────┴───────┐
              │    Memory     │
              └───────────────┘
```

Their purpose is to improve the efficiency of CPU execution and maximize CPU performance.

With caches and out-of-order execution (OoOE), data consistency across the caches of multiple CPUs becomes a problem. This problem cannot be solved at the software level; it needs hardware support.

For example:

```
thread 1      |     thread 2
A = 1         |     B = 1
print B       |     print A
```

Because of the store buffer, neither thread obtains the latest data when it reads from main memory, so both may print 0. Hardware addresses this with cache coherence (CC); Intel’s MESI protocol, for example, is one implementation.
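A sketch of this litmus test in Go (the function name and iteration count are illustrative, and the races on x and y are intentional — do not write real code like this): under sequential consistency the both-zero outcome is impossible, but with store buffers and compiler/CPU reordering it can occur.

```go
package main

import (
	"fmt"
	"sync"
)

// litmus runs the two-thread store/load test n times and counts
// how often both loads observe 0 — impossible under sequential
// consistency, possible with store buffers and reordering.
func litmus(n int) int {
	weird := 0
	for i := 0; i < n; i++ {
		var x, y, r1, r2 int
		var wg sync.WaitGroup
		wg.Add(2)
		go func() { x = 1; r1 = y; wg.Done() }() // thread 1
		go func() { y = 1; r2 = x; wg.Done() }() // thread 2
		wg.Wait()
		if r1 == 0 && r2 == 0 {
			weird++
		}
	}
	return weird
}

func main() {
	fmt.Println("both-zero outcomes:", litmus(100000))
}
```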

However, while the CPU can maintain cache coherence, it does not know when this optimization is allowed and when it is not. So the CPU leaves that decision to the author of the code, who expresses it through memory barriers.

A memory barrier, also known as a membar, memory fence, or fence instruction, is a barrier instruction that causes the CPU or compiler to enforce ordering constraints on memory operations issued before and after the barrier. This usually means that operations issued before the barrier are guaranteed to be performed before operations issued after it.

In conclusion, memory barriers address not only compile-time memory reordering, but also runtime memory reordering.

Memory model of Go

  • The Go memory model specifies the conditions under which reads of a variable in one goroutine can be guaranteed to observe values produced by writes to the same variable in a different goroutine.

Go also defines the happens-before relation:

  • If event e1 happens before event e2, then we say that e2 happens after e1. Also, if e1 does not happen before e2 and does not happen after e2, then we say that e1 and e2 happen concurrently.

Based on the happens-before relation, Go defines two sets of conditions that state when a write to a shared variable in one goroutine is visible in another.

The first set of conditions:

A read r of a variable v is allowed to observe a write w to v if both of the following hold:

  • r does not happen before w.
  • There is no other write w' to v that happens after w but before r.

The second set of conditions:

To guarantee that a read r of a variable v observes a particular write w to v, we must ensure that w is the only write r is allowed to observe. That is, r is guaranteed to observe w when both of the following hold:

  • w happens before r.
  • Any other write to the shared variable v happens either before w or after r.

Within a single goroutine, the two sets of conditions are equivalent. In a multi-goroutine environment, however, the second set is more stringent. Why?

Based on the definition of happens-before, there are three possible timelines:

```
e1 happens before e2
----e1-----e2----->

e1 happens after e2
----e2-----e1----->

e1 and e2 happen concurrently
------e1e2-------->
```

From these timelines it is easy to see that neither condition in the first set rules out happening concurrently. That is:

  • r does not happen before w: then r can happen after w, or r and w can happen concurrently.
  • There is no other write w' to v after w and before r: then w' can happen after w or concurrently with w; w' can also happen before r or concurrently with r.

The second set of conditions requires either happens-before or happens-after, ruling out happening concurrently.

Therefore the first set says allowed to observe: if events happen concurrently, the write w to v may or may not be observed.

The second set says guaranteed: because it excludes happening concurrently, the read is guaranteed to observe the write to v.
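As a sketch (the names write/read are illustrative), a sync.Mutex is one standard way to satisfy the second set of conditions: the unlock in the writer happens before a later lock in the reader, so w happens before r and no other write to v can fall between them. A channel close is used here only to sequence the two critical sections.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu sync.Mutex
	v  int
)

func write(x int) {
	mu.Lock()
	v = x
	mu.Unlock() // this unlock happens before any later Lock
}

func read() int {
	mu.Lock() // acquiring the lock observes everything before the last unlock
	defer mu.Unlock()
	return v
}

func main() {
	wrote := make(chan struct{})
	go func() {
		write(42)
		close(wrote) // ensures the writer's critical section comes first
	}()
	<-wrote
	fmt.Println(read()) // guaranteed to print 42
}
```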

The Go memory model also gives this advice:

Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.


To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.


We can see that Go provides no keyword like Java’s volatile, nor explicit fences as in C++, to prevent memory reordering. Instead, Go recommends channels and synchronization primitives. Why? From the go101 article Memory Order Guarantees in Go:

The design philosophy of Go is to use as fewer features as possible to support as more use cases as possible, at the same time to ensure a good enough overall code execution efficiency. So Go built-in and standard packages don’t provide direct ways to use the CPU fence instructions. In fact, CPU fence instructions are used in implementing all kinds of synchronization techniques supported in Go. So, we should use these synchronization techniques to ensure expected code execution orders.


So Go has no volatile modifier like Java’s, provides no very low-level synchronization primitives such as LoadLoad or StoreLoad barriers, and does not let us call fence instructions directly. Instead, Go encapsulates higher-level synchronization mechanisms such as channels and atomics.
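As a sketch, here is the earlier setup example rewritten with a channel, the style Go recommends: the memory model guarantees that the write to a happens before the close of the channel, which happens before the receive completes.

```go
package main

import "log"

var a string

func main() {
	done := make(chan struct{})
	go func() {
		a = "hello, world"
		close(done) // the write to a happens before the close...
	}()
	<-done         // ...and the close happens before this receive completes
	log.Println(a) // guaranteed to print: hello, world
}
```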

How are Go’s atomic and channel internals implemented?

Take a look at the source code:

```go
// runtime/chan.go
func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
	if c == nil {
		if !block {
			return false
		}
		gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
		throw("unreachable")
	}
	// ...
	lock(&c.lock) // <---------------------------- here
	if c.closed != 0 {
		unlock(&c.lock)
		panic(plainError("send on closed channel"))
	}
	if sg := c.recvq.dequeue(); sg != nil {
		// Found a waiting receiver. We pass the value we want to send
		// directly to the receiver, bypassing the channel buffer (if any).
		send(c, sg, ep, func() { unlock(&c.lock) }, 3)
		return true
	}
	// ...
}
```

```
// runtime/internal/atomic (amd64 assembly)
TEXT runtime∕internal∕atomic·Cas64(SB), NOSPLIT, $0-25
	MOVQ	ptr+0(FP), BX
	MOVQ	old+8(FP), AX
	MOVQ	new+16(FP), CX
	LOCK
	CMPXCHGQ	CX, 0(BX)
	SETEQ	ret+24(FP)
	RET

TEXT runtime∕internal∕atomic·Casuintptr(SB), NOSPLIT, $0-25
	JMP	runtime∕internal∕atomic·Cas64(SB)
```

So Go’s channels are implemented with locking; the lock() call ultimately executes a LOCK-prefixed instruction in Go’s Plan 9 assembly. The LOCK prefix implies a barrier: pending reads and writes are completed before the locked instruction executes.

Understanding some terms

Sequential consistency (SC)

Sequential consistency is one of the consistency models used in concurrent computing (such as distributed shared memory, distributed transactions, etc.).

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

That is:

  • Instructions within each thread are executed in program order (the single-thread view)
  • Threads may interleave arbitrarily, but all threads see the same overall execution order of the whole program (the whole-program view)

Cache coherence (CC)

Cache-coherence protocols synchronize the caches so that a read of an address returns the most recent value written to that address — which may be a value just written by a thread on another CPU core. A common example is Intel’s MESI cache-coherence protocol.

Cache-coherence protocols are typically implemented in hardware, with memory barriers triggering the necessary flushes.

Out-of-order execution (OoOE)


In computer engineering, out-of-order execution (or more formally, dynamic execution) is a paradigm used in most high-performance central processing units (CPUs) to exploit instruction cycles that would otherwise be wasted. In this paradigm, the processor executes instructions in an order governed by the availability of input data and execution units, rather than their original order in the program. This lets the processor avoid idling while waiting for a previous instruction to complete, working instead on the next instructions that can run immediately and independently. Preserving the program’s semantics only guarantees explicit causality between instructions, not implicit causality: sequences of operations that are semantically unrelated but logically related are not guaranteed to execute in order.

Memory barriers


A memory barrier, also known as a membar, memory fence, or fence instruction, is a barrier instruction that causes the CPU or compiler to enforce ordering constraints on memory operations issued before and after the barrier. This usually means that operations issued before the barrier are guaranteed to be performed before operations issued after it. Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) usually goes unnoticed within a single thread of execution, but can cause unpredictable behavior in concurrent programs and device drivers unless carefully controlled. The exact nature of the ordering constraints depends on the hardware and is defined by the architecture’s memory ordering model. Some architectures provide several kinds of barriers to enforce different ordering constraints. Memory barriers are commonly used when implementing low-level machine code that operates on memory shared by multiple devices: synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.

Memory barriers solve two problems:

  • They prevent reordering of instructions across the barrier.
  • They force dirty data in write buffers/caches to be written back to main memory, invalidating the corresponding entries in other caches.

The MESI protocol

MESI is a common cache-coherence protocol, named after the four states a cache line can be in: Modified, Exclusive, Shared, and Invalid.

Critical section

In concurrent programming, concurrent access to a shared resource can lead to unexpected or erroneous behavior, so the parts of a program that access the shared resource must be protected from concurrent access. This protected part is called a critical section (or critical region): it cannot be executed by more than one process or thread at a time. Typically, a critical section accesses a shared resource — such as a data structure, peripheral, or network connection — that would not behave correctly under concurrent access.

References

Preshing.com/20120612/an…

Preshing.com/20120625/me…

www.codedump.info/post/201912…

www.parallellabs.com/2010/03/06/…

www.huaweicloud.com/articles/0f…

Go101.org/article/con…

Go101.org/article/mem…

Mp.weixin.qq.com/s/viQp36FeM…

www.youtube.com/watch?v=Vmr…

timelife.me/

yunwang.github.io/volatile/

En.wikipedia.org/wiki/Memory…

En.wikipedia.org/wiki/Sequen…

En.wikipedia.org/wiki/Memory…

En.wikipedia.org/wiki/Synchr…

En.wikipedia.org/wiki/Critic…

En.wikipedia.org/wiki/Barrie…

En.wikipedia.org/wiki/Out-of…

En.wikipedia.org/wiki/Cache_…

En.wikipedia.org/wiki/Memory…

En.wikipedia.org/wiki/Linear…

En.wikipedia.org/wiki/Optimi…