😋 My name is Ping Ye. I maintain an open source project, "Go Home", focused on helping Gophers grow technically.

Takeaway

Many of you have heard that the Go language naturally supports high concurrency thanks to its built-in goroutines, which let a single process launch thousands of coroutines. So how does it achieve such high concurrency? To answer that, you first need to understand what a concurrency model is.
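As a quick taste of that claim, here is a minimal sketch of my own (the count and names are purely illustrative): it launches ten thousand goroutines in one process and waits for them all to finish.

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 10000 // ten thousand concurrent units in a single process
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(id int) { // each go statement creates one goroutine (a G in the runtime)
			defer wg.Done()
			_ = id * id // stand-in for real work
		}(i)
	}
	wg.Wait()
	fmt.Println("all", n, "goroutines finished")
}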

Concurrency model

Herb Sutter, a noted C++ expert, once said that "the free lunch is over." To make code run faster, faster hardware alone is no longer enough; we need multiple cores to exploit parallelism, and a concurrency model describes how different executing entities cooperate.

Of course, different concurrency models organize this cooperation in different ways. Seven common concurrency models are:

  • Threads and locks
  • Functional programming
  • The Clojure way
  • Actors
  • Communicating Sequential Processes (CSP)
  • Data parallelism
  • The Lambda Architecture

Today we will only talk about CSP, the concurrency model Go is built around. If you are interested in the others, please refer to the book "Seven Concurrency Models in Seven Weeks".

CSP

CSP, short for Communicating Sequential Processes, is one of the seven concurrency models above. Its core idea is to connect concurrent entities through channels: all messages are transmitted through a channel. The concept of CSP was first proposed by Tony Hoare in 1978, and it has recently regained popularity thanks to the rise of Go.
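To make the idea concrete, here is a minimal CSP-flavored sketch of my own (names invented for illustration): two concurrent entities never touch shared state directly, they only exchange a message through a channel.

package main

import "fmt"

func main() {
	messages := make(chan string) // the channel connecting the two concurrent entities

	// One entity: a goroutine that sends a message into the channel.
	go func() {
		messages <- "hello over a channel"
	}()

	// The other entity: the main goroutine receives the message.
	fmt.Println(<-messages)
}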

So how does CSP relate to the Go language? Next, let's look at how Go implements the CSP concurrency model: the GPM scheduling model.

GPM scheduling model

GPM represents three roles: Goroutine, Processor, and Machine.

  • Goroutine: the execution unit we create with the go keyword; it corresponds to a structure g, which holds the goroutine's stack information among other things
  • Machine: represents an operating system thread
  • Processor: a logical processor that establishes the connection between G and M

Goroutine

A Goroutine is the execution unit created with the go keyword in your code. It is also called a "lightweight thread", that is, a coroutine. Coroutines are invisible to the operating system: they are implemented entirely at the language level, so a context switch does not have to go through the kernel, and a coroutine's memory footprint is tiny, which gives them enormous potential.

go func() {}()

In Go, a goroutine is represented by a rather complex runtime structure named g (defined in runtime/runtime2.go), which has more than 40 member variables and stores the execution stack, status, the thread it currently occupies, scheduling-related data, and more. There is also the goroutine ID you would dearly like to get your hands on, but sorry: the Go team keeps it private with the language's long-term evolution in mind, and will not hand it to you.

type g struct {
	stack struct {
		lo uintptr
		hi uintptr
	} 							// Stack memory: [stack.lo, stack.hi)
	stackguard0	uintptr
	stackguard1 uintptr

	_panic       *_panic
	_defer       *_defer
	m            *m				// The current m
	sched        gobuf
	stktopsp     uintptr		// Expect sp to be at the top of the stack for backtracking
	param        unsafe.Pointer // The argument passed in when waking up
	atomicstatus uint32
	goid         int64
	preempt      bool           // Preemption signal; duplicated as stackguard0 = stackpreempt
	timer        *timer         // The cached timer for time.Sleep
}

The goroutine's scheduling-related data is stored in the sched field (a gobuf), which is used to save and restore context when the coroutine is switched.

type gobuf struct {
	sp   uintptr
	pc   uintptr
	g    guintptr
	ret  sys.Uintreg
	...
}

Machine

By default, GOMAXPROCS is set to the number of CPU cores; with four cores, the runtime will by default run goroutines on four threads, each represented by a runtime.m structure. Matching the number of running threads to the number of CPUs keeps thread context switching, and therefore system overhead, to a minimum.
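You can check these defaults on your own machine with a small sketch like the one below (the output will differ depending on your hardware):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Number of logical CPU cores visible to the process.
	fmt.Println("NumCPU:    ", runtime.NumCPU())

	// GOMAXPROCS(0) reads the current value without changing it;
	// by default it equals NumCPU.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}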

type m struct {
	g0   *g 
	curg *g
	...
}

There are two important fields in m: one is g0 and the other is curg.

  • g0: a special goroutine deeply involved in runtime scheduling work such as goroutine creation and memory allocation
  • curg: the goroutine currently executing on the thread.

As mentioned above, P is responsible for associating M with G, so m also stores data related to P.

type m struct {
	...
	p     puintptr
	nextp puintptr
	oldp  puintptr
}
  • p: the processor currently running the code
  • nextp: the processor staged for this thread to use next
  • oldp: the processor this thread held before entering a system call

Processor

The Processor is responsible for connecting the Machine and the Goroutine. It provides the context a thread needs and assigns each G to the thread that should execute it. With the Processor, every G gets scheduled fairly and no thread can slack off; it is, as the saying goes, a must-have for every household.

Likewise, the number of processors defaults to GOMAXPROCS, matching the number of threads described above.

type p struct {
	m           muintptr

	runqhead uint32
	runqtail uint32
	runq     [256]guintptr
	runnext guintptr
	...
}

The p structure stores fields related to performance tracing, garbage collection, timers, and so on. It also holds the processor's local run queue (runq), the list of goroutines waiting to be executed.

The relationship between the three

First, continuing the four-core example, four threads and four processors are started by default and bound to one another.

At this point a goroutine structure is created; after its function address, argument start address, argument length, and scheduling attributes are filled in, it is placed in a processor's queue, waiting to be dispatched.

What if another G is created? It is placed in a different P's queue, in turn, much like moving over to an empty window when you see no one queuing there.

What if a whole pile of G's is created and all the local queues fill up? Then, instead of a processor's private queue, the new G is placed in the global queue (the waiting hall).

While G's pour in on one side, the M's are frantically taking them out on the other: an M first grabs a G from its processor's private queue; if that is empty it takes one from the global queue; and if the global queue is empty too, it steals from other processors' queues. What an appetite, truly insatiable!

What if no G can be found to execute anywhere? Then M gets so discouraged that it disconnects from P and goes to sleep (becomes idle).

What if two goroutines are having a cozy moment over a channel and both get blocked? Does M wait around until they are done? Obviously not; M has no patience for the loving couple and turns around to find another G to execute, as the sketch below shows.
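Here is a small sketch of my own, with GOMAXPROCS pinned to 1 to make the point obvious: a goroutine parked forever on a channel does not hold the only thread hostage, other goroutines keep running on it.

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P, so at most one goroutine runs at a time

	blocked := make(chan int) // nobody ever receives from this channel

	// This goroutine parks forever on the channel send. The runtime takes it
	// off the thread; the M is not stuck waiting for it.
	go func() {
		blocked <- 1
		fmt.Println("never printed")
	}()

	// The same single thread happily keeps executing other goroutines.
	go func() {
		for i := 0; i < 3; i++ {
			fmt.Println("other goroutine still running:", i)
		}
	}()

	time.Sleep(100 * time.Millisecond) // crude wait so the output has time to appear
}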

System calls

If G makes a system call, M enters the syscall state along with it, which would leave P sitting idle. The neat thing is that P does not wait for the G and M's system call to complete; instead it detaches and finds another M to execute other G's.

When G finishes the system call, it still needs to keep running, so it has to find a free processor to pick it up again.

If there are no free processors, G is put back on the global queue for allocation.
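One rough way to observe this hand-off from the outside is the Unix-only sketch of mine below: one goroutine blocks inside a raw read system call, yet with GOMAXPROCS(1) the other goroutine still gets scheduled, because the P has been handed to another thread.

package main

// Unix-only sketch: it uses a raw pipe and syscall.Read so the read really
// blocks the thread inside the kernel instead of going through the netpoller.

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P makes the hand-off easier to see

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil { // nothing is ever written to the pipe
		panic(err)
	}

	// This goroutine's M blocks in the kernel on read(2); the runtime detaches
	// the P and hands it to another M so the remaining goroutines keep running.
	go func() {
		buf := make([]byte, 1)
		syscall.Read(fds[0], buf)
	}()

	go func() {
		for i := 0; i < 3; i++ {
			fmt.Println("not held up by the blocking system call:", i)
		}
	}()

	time.Sleep(100 * time.Millisecond)
}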

sysmon

sysmon is our cleaning lady. It is an M in its own right, also called the monitoring thread, and it runs without needing a P. It wakes up every 20µs to 10ms to tidy up: its main jobs are helping trigger garbage collection, retaking P's that have been stuck in system calls for too long, issuing preemption requests to G's that have been running for too long, and so on.
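You can see the preemption part in action with a little sketch of my own (it relies on the asynchronous preemption added in Go 1.14; on older versions the program would hang):

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // one P: without preemption the loop below would hog it forever

	// A CPU-bound goroutine whose loop body makes no function calls.
	// sysmon notices it has been running too long and sends it a preemption
	// signal, so the main goroutine still gets scheduled after its sleep.
	go func() {
		for {
		}
	}()

	time.Sleep(50 * time.Millisecond)
	fmt.Println("main goroutine was scheduled again, thanks to preemption")
}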

Glossary

Tony Hoare

Tony Hoare is a British computer scientist and Turing Award winner who designed the famous quicksort algorithm, Hoare logic, and the CSP model. He was awarded the John von Neumann Medal in 2011.


Thank you for reading. If you found this article helpful, you are welcome to follow the public account "Ping Ye", which focuses on the Go language and the principles behind the technology.