Long time no see. How are you? It has been more than a month since the last article, and I have been anxious about the lack of updates myself.

The reason: I have been reading the source code of proc.go, which is essentially the source code of the scheduler. It runs to thousands of lines, unlike the code for maps or channels, and making sense of it all is a huge undertaking. Even today, I cannot claim to see it all clearly.

Digging deeper is endless, so this is where the exploration phase stops. In this series I will share what I have found so far.

In fact, today’s post is just a prelude to a series of ten articles. The outline is as follows:

This series is mainly inspired by the WeChat public account “Go language core programming technology”, which has an excellent tutorial series on the Go scheduler. I strongly recommend reading it, and I will refer to its articles often. I could not resist pasting its QR code here. It is a treasure I found while digging for material; I wanted to keep it to myself, but good things should be shared.

Let’s get started on today’s topic.

A month ago, Chai Shushan, the author of Advanced Go Programming, published an article on CSDN titled “Go has been around for ten years, and Go 2 is ready for development”, written from a very grand perspective. We have to watch the road at our feet, but sometimes we also have to look up at the sky.

The article mentions Hu Wen Go, the first novel about Go. I looked it up; it is hilarious. There is a sentence in the book:

In Go, go func is the unit of concurrency, chan is the mechanism for coordinating the units of concurrency, panic and recover are the error-handling mechanism, and defer is a stroke of genius that greatly simplifies error management.

Goroutines execute functions independently and concurrently in the same user address space, while channels are used for communication and synchronized access control between goroutines.

In the last article we talked about channels and mentioned that Goroutines and channels are the two cornerstones of concurrent programming for Go. This article will focus on Goroutines and the Go Scheduler that schedules Goroutines.

Prerequisites

os scheduler

From the operating system’s perspective, the programs we write are eventually translated into a series of machine instructions, and the machine’s job is simply to execute them in order. The entity that carries out this sequential execution is the thread: the thread is what the CPU schedules, and what actually executes instructions on the CPU.

When a program starts, the OS creates a process for it and starts an initial thread. Threads can create more threads, each executing independently; it is threads, not processes, that the CPU schedules.

The OS scheduler ensures that no CPU sits idle while there are runnable threads, and creates the illusion that all runnable threads are executing simultaneously. It also gives high-priority threads more chances to run than low-priority ones, without starving the low-priority threads entirely. Finally, the scheduler must make these decisions quickly to keep latency low.

thread

The OS scheduler schedules a thread based on its state. A thread is in one of three states: Waiting, Runnable, or Executing.

| State | Explanation |
| --- | --- |
| Waiting | The thread is waiting for something to happen: network data or disk I/O, an operating system API call, or a synchronization primitive (atomics, mutexes) to become ready |
| Runnable | Ready state: give it CPU time and it will run |
| Executing | Running state: the thread is executing instructions on the CPU, which is what we want |

There are generally two types of things that threads can do: computational and IO.

The computational type occupies CPU resources and is constantly doing computational tasks, such as prime factorization of a large number. This type of task does not cause the thread to jump into a Waiting state.

IO-type work acquires external resources, for example over the network or via system calls. Memory-synchronization primitives such as mutexes can also be considered this type. The common feature is waiting for an external resource to become ready; an IO task causes the thread to enter the Waiting state.

A thread switch is the process by which the operating system removes an Executing thread from the CPU and replaces it with a Runnable one. The newly scheduled thread enters the Executing state; the old thread becomes Runnable if it was doing computational work, or Waiting if it was blocked on an IO task.

Computationally intensive and IO-intensive tasks therefore have different “attitudes” toward thread switching. A computationally intensive task always has instructions to execute, so a thread switch forces it to stop mid-task, at a significant cost.

On the other hand, a thread dedicated to IO-intensive work loses nothing by entering the Waiting state for an operation and being removed from the CPU; the new thread can use the CPU to make progress on its own task. From the operating system’s point of view, overall “work progress” still moves forward.

Keep in mind that the most important thing for OS Scheduler is not to leave one CPU core idle, but to try to keep each CPU core occupied.

If you have a program that is focused on IO-Bound work, then context switches are going to be an advantage. Once a Thread moves into a Waiting state, another Thread in a Runnable state is there to take its place. This allows the core to always be doing work. This is one of the most important aspects of scheduling. Don’t allow a core to go idle if there is work (Threads in a Runnable state) to be done.

Function call process analysis

To understand the underlying principles of the Go scheduler, an understanding of the function call process is essential. It involves passing function parameters, CPU instruction jumps, and passing return values, and it requires some assembly knowledge, since only at the assembly level can you see low-level operations such as register assignment. Some previous articles touched on this, so let’s review it.

The space of a function’s stack frame mainly holds the function’s parameters and return values, its local variables, and the parameters and return values of the functions it calls.

Let’s take a look at the specification of function calls in Go. Here’s a picture from Cao Da’s blog:

The Go Plan9 assembly passes function arguments and return values through the stack.

When a subfunction is called, the arguments are prepared at the top of the stack before the CALL instruction is executed. The CALL instruction pushes the value of the IP register, which is the address of the next instruction to execute after the call completes.

Execution then enters the callee’s stack frame. First the caller’s BP is pushed; BP indicates the base address of the stack frame. The stack top and the stack base together delimit the function’s stack frame.

The CALL instruction is similar to the combination of PUSH IP and JMP somefunc instructions. First, the value of the current IP instruction register is pushed onto the stack, and then the address of the function to be called is written into the IP register by JMP instruction to realize the jump.

The RET instruction is the inverse of CALL, roughly equivalent to POP IP: the return address that CALL saved at SP is loaded back into the IP register, returning from the function.

To summarize the sequence: first, space for arguments and return values is prepared before the call. The CALL instruction then pushes the return address. On entry to the callee, the assembler automatically inserts the BP-related instructions, so the saved BP sits right next to the return address. Below that is the current function’s local variable space, including the argument space needed for any further calls. When the callee executes RET, the BP and SP registers are first restored from the stack, and execution then jumps to the retrieved return address.

The two paragraphs above paraphrase the assembly language chapter of the book Advanced Go Programming.

How goroutines work

What is a goroutine

A goroutine can be thought of as an abstraction layered over threads: more lightweight, and independently executable. Because of this abstraction, Gophers rarely face threads directly; we just see goroutines flying through the code. The operating system, on the other hand, knows nothing about goroutines: it is perfectly content executing threads, its basic unit of scheduling.

Differences between goroutines and threads

When it comes to Goroutine, one of the most common topics is: How is it different from Thread?

The resource [How Goroutines Work] tells us they differ in three ways: memory consumption, creation and destruction cost, and switching cost.

  • Memory footprint

Creating a goroutine initially consumes 2 KB of stack memory; if that proves insufficient, the stack grows automatically. Creating a thread consumes about 1 MB of stack memory, plus a region called a guard page that isolates it from other threads’ stacks.

For an HTTP server built with Go, creating a goroutine per incoming request is cheap and natural. In a language that uses threads as its concurrency primitive, such as Java, one thread per request wastes resources and can quickly lead to an OOM error.

  • Creation and destruction

Threads are costly to create and destroy because they involve the operating system at the kernel level; the usual mitigation is a thread pool. Goroutines are managed by the Go runtime entirely at the user level, so their creation and destruction costs are tiny.

  • Switching

When switching threads, various registers need to be saved for future restoration:

16 general purpose registers, PC (Program Counter), SP (Stack Pointer), segment registers, 16 XMM registers, FP coprocessor state, 16 AVX registers, all MSRs etc.

A goroutine switch saves only three registers: the program counter, the stack pointer, and BP.

Typically, a thread switch takes 1,000–1,500 ns, and the CPU executes an average of 12–18 instructions per nanosecond; a thread switch therefore costs the equivalent of 12,000–18,000 instructions.

A goroutine switch takes about 200 ns, equivalent to 2,400–3,600 instructions.

As a result, Goroutines have much lower switching costs than Threads.

M:N model

As we all know, the Go runtime takes care of goroutines from creation to destruction. At startup, the runtime creates M threads (the units the CPU actually schedules), and the N goroutines created afterwards execute on those M threads. This is the M:N model:

Only one goroutine can run on a thread at a time. When a goroutine blocks (for example, on a channel send, as discussed in the previous article), the runtime parks it and schedules another goroutine onto the thread. The goal is to never let a thread sit idle: drain the CPU of every last drop.

What is a scheduler

A Go program’s execution consists of two layers: the user program and the runtime. They interact through function calls to implement memory management, channel communication, goroutine creation, and so on. System calls made by the user program are intercepted by the runtime, which uses them to help with scheduling and garbage collection.

A panoramic representation of the relationship is shown below:

Why a scheduler

The Go scheduler is arguably one of the most important parts of the Go runtime. The runtime maintains all goroutines and uses the scheduler to schedule them. Goroutines and threads are separate concepts, but goroutines depend on threads to execute.

A Go program’s efficiency is inseparable from the scheduler’s work.

Scheduler underlying principles

In fact, from the operating system’s point of view, every Go program is simply a multithreaded program. Scheduling goroutines onto threads is purely a runtime-level concept, sitting above the operating system.

Goroutine scheduling is implemented with three basic structures: G, M, and P.

G represents a goroutine. Its fields include the goroutine’s stack, its current state, and the address of the instruction it is currently executing (the PC value).

M stands for a kernel thread and contains fields such as the goroutine currently running on it.

P is a virtual processor. It maintains a queue of Runnable G’s, and an M needs a P in order to run a G.

There is, of course, one overarching structure: sched, which holds the global scheduling state.

The runtime starts with a few G’s: one for garbage collection, one for scheduling, one for user code. An M is created to start running G’s; over time more G’s are created, and more M’s with them.

In early versions of Go there was no P structure: every M had to fetch a G from a single global queue, which required taking a global lock. Under high concurrency, that lock became a bottleneck. Dmitry Vyukov’s later design added the P structure: each P maintains its own queue of Runnable G’s, eliminating the global locking problem.

Objectives of Go Scheduler:

Schedule goroutines onto kernel threads.

The core idea of Go Scheduler is:

  1. Reuse threads;
  2. Limit the number of threads running simultaneously (not counting blocked ones) to N, equal to the number of CPU cores;
  3. Keep run queues thread-private: a thread can steal goroutines from other threads’ run queues, and hands its run queue off to another thread when it blocks.

Why is P needed? Couldn’t the run queues simply be stored on M?

You might wonder now, why have contexts at all? Can’t we just put the runqueues on the threads and get rid of contexts? Not really. The reason we have contexts is so that we can hand them off to other threads if the running thread needs to block for some reason.

An example of when we need to block, is when we call into a syscall. Since a thread cannot both be executing code and be blocked on a syscall, we need to hand off the context so it can keep scheduling.

When a thread blocks, the P bound to it, along with the goroutines in its run queue, is handed off to another thread.

The Go scheduler starts a background thread, sysmon, that detects goroutines that have been running too long (more than 10 ms) and moves them to the global run queue, a lower-priority queue, as a penalty.

Overview

The GPM model comes up whenever the Go scheduler is discussed, so let’s look at its pieces one by one.

The following is the hardware information of the Mac I use; it has only 2 physical cores.

But with hyperthreading, each physical core presents two logical cores, so when I run the following program on this Mac, it prints 4.

func main() {
	// NumCPU returns the number of logical cores available to the current process
	fmt.Println(runtime.NumCPU())
}

Because NumCPU returns the number of logical cores rather than physical cores, the result is 4.

After a Go program starts, each logical core is assigned a logical processor (P). Each P is in turn assigned an M (machine, i.e. a kernel thread), and those threads are still scheduled onto cores by the OS scheduler.

To summarize, when I start a Go program locally, I get four system threads to perform tasks, each with a P.

At initialization, the Go program has one G (the initial goroutine), the unit that executes instructions. G’s run on M’s: kernel threads are scheduled onto CPU cores by the OS, while G’s are scheduled onto M’s by the Go scheduler.

With G, P, and M described, two more important components remain: the global run queue (GRQ) and the local run queues (LRQ). Each LRQ stores the runnable goroutines local to a specific P, while the GRQ stores runnable goroutines that have not yet been assigned to any P.

The Go scheduler is part of the Go runtime, which is compiled into the Go program and runs with it, so it operates in user space, one layer above the kernel. Unlike the preemptive OS scheduler, the Go scheduler adopts cooperative scheduling.

Being a cooperating scheduler means the scheduler needs well-defined user space events that happen at safe points in the code to make scheduling decisions.

Cooperative scheduling typically requires the user to set scheduling points explicitly. For example, yield in Python tells the scheduler, “you may schedule me out now.”

In Go, however, goroutine scheduling is performed by the Go runtime rather than by the user, so we can still regard the Go scheduler as effectively preemptive: the user cannot predict what the scheduler will do next.

Like threads, goroutines have three states (simplified):

| State | Explanation |
| --- | --- |
| Waiting | The goroutine is waiting for something to happen: network data or disk I/O, an operating system API call, or a synchronization primitive (atomics, mutexes) to become ready |
| Runnable | Ready state: give it an M and it will run |
| Executing | Running state: the goroutine is executing instructions on an M, which is what we want |

The following diagram of overall GPM operation is worth saving: come back to it after finishing this series, and it will resonate even more:

Goroutine scheduling timing

There are four situations in which goroutine scheduling may occur. It is not guaranteed to happen in each of them; rather, these are the points where the Go scheduler gets a chance to act.

| Situation | Explanation |
| --- | --- |
| The go keyword | go creates a new goroutine, so the Go scheduler considers scheduling |
| GC | The goroutines that perform GC also need to run on M’s, so scheduling necessarily occurs; the scheduler does other work here too, such as running goroutines that do not touch the heap. GC only reclaims heap memory, not stack memory |
| System calls | When a goroutine makes a system call, its M blocks, so the goroutine is scheduled away and another goroutine is scheduled in |
| Synchronized memory access | Atomics, mutexes, channel operations and the like block the goroutine, so it is scheduled away; once the condition is met (for example, another goroutine releases the lock), it is scheduled to continue running |

work stealing

The Go scheduler’s job is to distribute runnable goroutines evenly across the M’s running on P’s.

When a P finds its LRQ has no G left, it “steals” some G’s from other P’s to run. What spirit: when your own work is done, help others for the good of the whole! This is called work stealing, and Go has implemented it since version 1.1.

The Go scheduler uses the M:N model: at any moment, M goroutines (G) are distributed over N kernel threads (M), which run on at most GOMAXPROCS logical processors (P). Each M must be attached to a P, and each P runs at most one M at a time. If the M on a P blocks, other M’s are needed to run the goroutines in that P’s LRQ.

Personally, I find the picture above more vivid than the usual diagrams with triangles for M, circles for G, and rectangles for P.

In fact, each round of scheduling boils down to one thing: find a runnable goroutine and execute it. The search order is as follows:

runtime.schedule() {
    // only 1/61 of the time, check the global runnable queue for a G.
    // if not found, check the local queue.
    // if not found,
    // try to steal from other Ps.
    // if not, check the global runnable queue.
    // if not found, poll network.
}

Once an executable goroutine is found, execution continues until it is blocked.

When a G on P2 finishes, P2 looks in its LRQ for the next G to execute. If the LRQ is empty, and the GRQ also has no G to execute, then P2 picks a random P (call it P1) and “steals” half of the G’s in P1’s LRQ.

The advantage is that more P’s are kept busy, speeding up the execution of all G’s.

Synchronous/asynchronous system calls

When a G needs to make a system call, it can be handled in two ways depending on the type of call: synchronous or asynchronous.

In the synchronous case, M blocks and is scheduled off its P, taking G with it (P does not feed idle mouths, but G stays attached to its M). A new M is then attached to P, which works through the waiting G’s in P’s LRQ. Once the system call completes, G is put back into P’s LRQ, and the freed M is “tucked away” until it is needed again.

In the asynchronous case, M does not block: G’s asynchronous request is taken over by an “agent,” the network poller, and G is parked with it. When the call completes, G returns to a P. Since M never blocked, it has meanwhile continued executing other G’s in the LRQ.

As you can see, in the asynchronous case the Go scheduler turns an I/O task into a CPU task through scheduling, or equivalently converts a kernel-level thread switch into a user-level goroutine switch, greatly improving efficiency.

The ability to turn IO/Blocking work into CPU-bound work at the OS level is where we get a big win in leveraging more CPU capacity over time.

Like a very demanding boss, the Go scheduler never leaves an M idle and always finds a way to make it do more work.

In Go, it’s possible to get more work done, over time, because the Go scheduler attempts to use less Threads and do more on each Thread, which helps to reduce load on the OS and the hardware.

The trap of the scheduler

Because Go uses cooperative scheduling, a goroutine is not like a thread, which the CPU interrupts and forces off the core when its time slice expires. The Go scheduler has a background thread that monitors goroutines: when one has run for more than 10 ms, it sets the goroutine’s “preemption flag,” which the scheduler then acts on. However, the flag is only checked in a function’s “prologue”; a goroutine that makes no function calls never reaches such a check. (Since Go 1.14, the runtime also supports asynchronous, signal-based preemption, so the trap below mainly affects earlier versions.)

Golang implements a co-operative partially preemptive scheduler.

So in some extreme cases there are traps. The following example comes from the resource [The Scheduler’s Pitfalls].

func main() {
	var x int
	threads := runtime.GOMAXPROCS(0)
	for i := 0; i < threads; i++ {
		go func() {
			for { x++ }
		}()
	}
	time.Sleep(time.Second)
	fmt.Println("x =", x)
}

The result (on Go versions before 1.14) is that the program spins forever in the loops, and the final print statement is never executed.

Why? The example starts as many goroutines as the machine has logical cores, each executing an infinite loop.

After the goroutines are created, the main function executes time.Sleep(time.Second). When the Go scheduler sees this statement, it is almost overjoyed: what a good time to schedule! So the main goroutine is scheduled away, and the goroutines created earlier fill up every M and P.

Inside those goroutines there are no channel operations, no time.Sleep, nothing that would give the scheduler a chance to act. Trouble: the infinite loops just keep spinning.

The solution is to decrease threads by 1:

func main() {
	var x int
	threads := runtime.GOMAXPROCS(0) - 1
	for i := 0; i < threads; i++ {
		go func() {
			for { x++ }
		}()
	}
	time.Sleep(time.Second)
	fmt.Println("x =", x)
}

Running results:

x = 0

The main goroutine sleeps for a second, is woken by the Go scheduler, is scheduled back onto an M, prints its statement, and exits. When the main goroutine exits, all other goroutines go down with it; as the saying goes, “when the nest is overturned, no egg stays unbroken.”

As for why x prints as 0 at the end, that was covered in the earlier article on memory reordering (“Cao Da on memory reordering”) and will not be repeated here.

Another solution is to add one line inside the goroutine, before the loop:

go func() {
	time.Sleep(time.Second)
	for { x++ }
}()

This also gives the main goroutine a chance to be scheduled and run.

conclusion

This article looked at the Go scheduler from a macro perspective, covering many aspects. In the next ten articles, I will dig deep into the source code and parse it layer by layer. Stay tuned!

The Resources section contains many well-written English blogs; once you have the basics down, reading them will feel familiar. They are well worth it!

Resources

- [Zhihu answer: how to understand the difference between blocking/non-blocking and synchronous/asynchronous?] www.zhihu.com/question/19…
- [Learning Architecture from Scratch: Reactor and Proactor] book.douban.com/subject/303…
- [A translation on SegmentFault] segmentfault.com/a/119000001…
- [Ardan Labs] www.ardanlabs.com/blog/2018/0…
- [Dissertation: Analysis of the Go Runtime Scheduler] www.cs.columbia.edu/~aho/cs6998…
- [Go scheduler blog post] morsmachine.dk/go-schedule…
- [Code farmer article repost] mp.weixin.qq.com/s/BV25ngvWg…
- [Goroutine collection] github.com/ardanlabs/g…
- [Dabin’s scheduler series] lessisbetter.site/2019/03/10/…
- [Scalable Go Scheduler Design Doc, 2012] docs.google.com/document/d/…
- [Work stealing] rakyll.org/scheduler/
- [Tony Bai: also talking about the goroutine scheduler] tonybai.com/2017/06/23/…
- [Tony Bai: a debugging example analysis] tonybai.com/2017/11/23/…
- [Tony Bai: how goroutines work] tonybai.com/2014/11/15/…
- [How Goroutines Work] blog.nindalf.com/posts/how-g…
- [Zhihu: what are blocking, non-blocking, synchronous and asynchronous?] www.zhihu.com/question/26…
- [Zhihu article: fully understanding asynchronous/synchronous and blocking/non-blocking] zhuanlan.zhihu.com/p/22707398
- [The Go Netpoller] morsmachine.dk/netpoller
- [Zhihu column: Head First Golang Scheduler] zhuanlan.zhihu.com/p/42057783
- [Five IO models] colobu.com/2019/07/26/…
- [The Go Runtime Scheduler (slides)] speakerdeck.com/retervision…
- [povilasv: Go scheduler] povilasv.me/go-schedule…
- [Tracking the scheduler] www.ardanlabs.com/blog/2015/0…
- [go tool trace] making.pusher.com/go-tool-tra…
- [Goroutine tour] medium.com/@riteeksriv…
- [Concurrency and Parallelism (video)] www.youtube.com/watch?v=cN_…
- [The scheduler trap] www.sarathlakshman.com/2016/06/15/…
- [Source reading] github.com/zboya/golan…
- [Scheduler series (WeChat)] mp.weixin.qq.com/mp/homepage…
- [asmshare] github.com/cch123/asms…
- [Introduction to the Go scheduler and easily overlooked problems] www.cnblogs.com/CodeWithTxT…
- [A recently discovered source code analysis] github.com/changkun/go…