Suppose a program starts a goroutine that spins in an empty for loop, and then the main function sleeps for a second and prints "hello go!". How does the computer execute it?

Does it run the go func loop first, or print "hello go!"?

goroutine

Go can easily handle tens of thousands of concurrent tasks, thanks to goroutines. In the early days of computing there were only processes, and their shortcomings are obvious: they are heavyweight, and forking a process can cost on the order of 40 MB of memory plus other resources. Later, threads came along. Creating a thread takes roughly an order of magnitude less memory than creating a process, so threads are much lighter. Threads make concurrency much cheaper than processes do, but a thread switch still has to trap into the kernel. When switching is frequent the overhead is significant, because the CPU has to save the context (the running state of the current thread) and restore it later. With too many threads, the CPU wastes much of its time on context switches. For example, even if thread A is only computing a + b in user mode and makes no system call, switching from thread A to thread B still requires a transition from user mode to kernel mode.

Then came coroutines, which cooperate in user mode. Their advantage over threads is that many coroutines can be attached to one thread, which avoids creating too many threads. In addition, when one coroutine finishes or yields, it can hand control to the next coroutine. This switch does not have to enter kernel mode; it is completed entirely in user mode.

Goroutines are derived from coroutines, so they have all the benefits of coroutines and improve on ordinary coroutines. Creating a goroutine takes only about 2 KB of memory for its initial stack. A thread switch takes about 1000-1500 nanoseconds (in one nanosecond a CPU can execute roughly 12-18 instructions), whereas a goroutine switch takes about 200 nanoseconds, at least five times faster. Multiple goroutines can be bound to one OS-level thread. When a goroutine switch occurs, only the currently executing goroutine is swapped out; no thread switch is required. Since Go 1.14, goroutines are also preempted by signals: a goroutine cannot occupy the CPU indefinitely, and once it has run for longer than a certain time it is forced to give up the CPU.
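To make the "tens of thousands of goroutines" claim concrete, here is a minimal sketch. The helper name spawn and the figure 50000 are illustrative assumptions, not something from the Go runtime; the code only uses the standard sync and sync/atomic packages.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn (a hypothetical helper) starts n goroutines, each of which
// increments a shared counter once. It waits for all of them to finish
// and returns the final count.
func spawn(n int) int64 {
	var wg sync.WaitGroup
	var count int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&count)
}

func main() {
	// 50000 is an arbitrary illustrative figure, not a limit.
	fmt.Println(spawn(50000)) // prints 50000
}
```

Starting the same number of OS threads would be far more expensive, which is the point the paragraph above makes.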

GMP

  • G: a goroutine
  • P: processor, created when the Go program starts. The number of Ps is set by GOMAXPROCS, which by default follows the number of CPU cores, and it determines how many goroutines can execute in parallel
  • M: OS thread, the thread that actually executes the code

If an M holds a P and that P has a G, then the M can run the G.
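The counts behind G and P can be inspected from a running program. The sketch below uses only functions from the standard runtime package; the helper name numP is an assumption made for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// numP (a hypothetical helper) reports the current number of Ps without
// changing it: runtime.GOMAXPROCS returns the previous setting, and an
// argument below 1 leaves the setting untouched.
func numP() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("CPU cores:", runtime.NumCPU())
	fmt.Println("Ps (GOMAXPROCS):", numP())
	fmt.Println("Gs (goroutines):", runtime.NumGoroutine())
}
```

The number of Ms is not exposed as directly, since the runtime creates and parks OS threads on demand.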

Scheduling algorithm:

  • When the program starts, it creates as many Ps as there are CPU cores by default.
  • When a goroutine is created, the runtime first tries to put it on the local queue of the current P. If the local queue is full, the first half of the local queue is moved to the global queue together with the new goroutine.
  • If no P is available, the new goroutine is added to the global queue.
  • If an idle P is obtained, the runtime tries to wake up an M for it, and creates a new one if no idle M is available.
  • While an M is bound to a P and the local queue has work, the M keeps fetching goroutines from the local queue of that P.
  • When a P has no goroutine in its local queue, it tries to take a batch from the global queue and put it in the local queue. Access to the global queue requires a lock.
  • When nothing can be fetched from the global queue, it tries to steal half of the goroutines from the local queue of another P and put them in its own local queue.
  • When a G makes a blocking system call, its P detaches from the current M and tries to take an M from the idle M list to keep running the remaining goroutines.
  • When that system call returns, the M tries to acquire a P to continue running the G. If it cannot get a P, it puts the G on the global queue and parks itself on the idle M list. The M is not destroyed, which avoids the cost of creating a new M later.
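The queue rules above (local queue first, overflow to the global queue, refill from the global queue, then work stealing) can be sketched with a toy model. Everything here is an assumption made for illustration: the names Sched, P, G, put, get and localCap are hypothetical, the capacity of 4 is chosen for readability, there is no locking, and the victim for stealing is simply the first non-empty P. It is not the runtime's actual code.

```go
package main

import "fmt"

// localCap is a toy capacity; the real local run queue is larger.
const localCap = 4

type G int // a goroutine, identified by a number

type P struct{ local []G } // a processor with its local run queue

type Sched struct {
	global []G // global run queue (guarded by a lock in the real runtime)
	ps     []*P
}

// put enqueues g on p's local queue. If that queue is full, the first half
// of the local queue is moved to the global queue together with g.
func (s *Sched) put(p *P, g G) {
	if len(p.local) < localCap {
		p.local = append(p.local, g)
		return
	}
	half := len(p.local) / 2
	s.global = append(s.global, p.local[:half]...)
	s.global = append(s.global, g)
	p.local = append([]G(nil), p.local[half:]...)
}

// get picks the next G for p: from the local queue if it has work, else
// after refilling from the global queue, else after stealing half of
// another P's local queue.
func (s *Sched) get(p *P) (G, bool) {
	if len(p.local) == 0 && len(s.global) > 0 {
		n := len(s.global)/len(s.ps) + 1 // toy batch size
		if n > len(s.global) {
			n = len(s.global)
		}
		if n > localCap {
			n = localCap
		}
		p.local = append(p.local, s.global[:n]...)
		s.global = s.global[n:]
	}
	if len(p.local) == 0 {
		for _, victim := range s.ps {
			if victim == p || len(victim.local) == 0 {
				continue
			}
			n := (len(victim.local) + 1) / 2 // steal half, rounded up
			p.local = append(p.local, victim.local[:n]...)
			victim.local = victim.local[n:]
			break
		}
	}
	if len(p.local) == 0 {
		return 0, false // nothing runnable anywhere
	}
	g := p.local[0]
	p.local = p.local[1:]
	return g, true
}

func main() {
	p0, p1 := &P{}, &P{}
	s := &Sched{ps: []*P{p0, p1}}
	for g := G(1); g <= 5; g++ {
		s.put(p0, g) // the fifth put overflows p0's local queue
	}
	fmt.Println(p0.local, s.global) // prints [3 4] [1 2 5]

	g, _ := s.get(p1) // p1 is empty, so it refills from the global queue
	fmt.Println(g, p1.local, s.global) // prints 1 [2] [5]
}
```

If the global queue were also empty, the same call to get on p1 would fall through to the stealing branch and take work from p0 instead.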

The nice thing about goroutines is that no G holds an M forever. Go made a big improvement in version 1.14: signal-based preemptive scheduling. Each M, when it is initialized, registers a handler for the SIGURG signal. The signal is sent by sysmon, a monitor that occupies an M of its own. sysmon periodically checks whether a goroutine has been running for more than 10ms, or whether the GC needs to stop the world (STW). If so, sysmon marks the running G for preemption and sends SIGURG to the corresponding M. The mark is noticed at the stack-overflow check (morestack) in function prologues, and the signal handler covers the case where the G never reaches such a check. Once the G can be stopped safely, the M saves the context of the current G (so that the G can quickly resume from where it left off the next time it is scheduled), puts the G on the global queue, and goes on to run the next G.

How is it executed?

Let’s go back to the problem we started with

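package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P, so no parallelism
	go func() {
		for { // empty loop: no function calls, never yields
		}
	}()
	time.Sleep(time.Second)
	fmt.Println("hello go!")
}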

The program starts by setting the number of Ps to 1, so there is no parallelism and only one goroutine can execute at a time: either the go func loop, or the main goroutine running time.Sleep and what follows it.

  • On versions before Go 1.14 the program gets stuck in the infinite loop: even if the main goroutine gets the P first, it gives up the CPU when it sleeps, and then the go func loop starts running and never yields.
  • On Go 1.14 and later, thanks to sysmon preemption, even if the go func loop gets the P first, it is preempted after running for more than 10ms and taken off the P. The main goroutine then gets to run, prints "hello go!", and the program exits without getting stuck.
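For comparison, here is a sketch of a variant that does not depend on signal-based preemption, because the loop yields cooperatively with runtime.Gosched. The function name run, the stop channel and the 10ms sleep are assumptions added so that the example terminates cleanly; they are not part of the original program.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// run (a hypothetical helper) starts a busy goroutine on a single P and
// returns the message once the sleep is over.
func run() string {
	prev := runtime.GOMAXPROCS(1) // a single P, as in the program above
	defer runtime.GOMAXPROCS(prev)

	stop := make(chan struct{})
	defer close(stop) // lets the busy goroutine exit when run returns

	go func() {
		for {
			select {
			case <-stop:
				return
			default:
				runtime.Gosched() // explicit yield: give the P to other goroutines
			}
		}
	}()

	time.Sleep(10 * time.Millisecond)
	return "hello go!"
}

func main() {
	fmt.Println(run()) // prints hello go!
}
```

To observe the old behavior of the original program on a newer toolchain, it can be run with the environment variable GODEBUG=asyncpreemptoff=1, which disables signal-based preemption; the program should then hang as described for versions before Go 1.14.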