7.2 Garbage Collector

In the last section we introduced the design and implementation of the Go memory allocator in detail, analyzing the relationships between the components of runtime memory management and how different types of objects are allocated. However, a programming language's memory management system is responsible not only for allocating heap memory but also for reclaiming objects and memory that are no longer used, and that is the job of the garbage collector described in this section.

In virtually all modern programming languages the garbage collector is a complex system: reclaiming unused memory without noticeably affecting the user program takes a great deal of work. Java's garbage collection machinery is a good example — Java 8 ships with four collectors, the serial, concurrent, parallel mark-sweep, and G1 collectors 1, and it takes considerable effort to understand how they work and how they are implemented.

This section details the design and implementation of the garbage collector in the Go runtime. We will not only discuss common garbage collection mechanisms and trace the evolution of Go's collector since v1.0, but also dig into the source code to analyze how the garbage collector works. Let's move on to another important component of Go's memory management: garbage collection.

7.2.1 Design Principles

Today's programming languages use either manual or automatic memory management. Languages such as C, C++, and Rust use manual memory management, while languages like Python, Ruby, Java, and Go use automatic memory management systems, usually garbage collection, although Objective-C instead uses automatic reference counting 3. Reference counting is also a form of automatic memory management, but we won't cover it in detail here; this section focuses on garbage collection.

Many people's impression of the garbage collector is Stop the World (STW): as the user program requests more and more memory, the garbage in the system gradually accumulates; when the program's memory footprint reaches a certain threshold, the whole application is suspended while the garbage collector scans all allocated objects and reclaims those no longer in use; only after this process finishes can the user program continue. Many languages did implement garbage collection this way in their early days, but today's implementations are much more sophisticated.

Figure 7-21 Components of memory management

In the figure above, the user program (Mutator) allocates memory on the heap through the Allocator, while the garbage Collector is responsible for reclaiming it; the Allocator and the Collector together manage the program's heap memory. The rest of this subsection covers the key theory behind Go's garbage collection, which will help us understand the remainder of the section.

Mark-Sweep

Mark-sweep is the most common garbage collection algorithm. The mark-sweep collector is a tracing garbage collector whose execution is divided into two phases, mark and sweep:

  1. Mark phase – starting from the root objects, find and mark all live objects in the heap;
  2. Sweep phase – walk through all objects in the heap, reclaim the unmarked garbage objects, and add the reclaimed memory to the free list;

As shown in the figure below, the memory space contains multiple objects. Starting from the root objects, we traverse their children in turn and mark every object reachable from the roots as alive, namely A, C, and D; the remaining objects B, E, and F are unreachable from the roots and are therefore treated as garbage:

Figure 7-22 Mark phase of mark-sweep

After the mark phase, the collector enters the sweep phase, in which it walks through all objects in the heap in turn, frees the unmarked objects B, E, and F, and links the newly freed memory into a free list for the memory allocator to use.

Figure 7-23 Sweep phase of mark-sweep

The garbage collector starts from the garbage collection roots and recursively traverses the objects they point to, marking all reachable objects as alive. At the end of the mark phase it walks the heap and reclaims the garbage. The whole process has to record the liveness of objects, and the user program cannot run while the collection is in progress, so we need more sophisticated mechanisms to solve the STW problem.
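
The following is a minimal, single-threaded sketch of the mark-sweep process described above; the obj type, the roots slice, and the freeList are illustrative names rather than anything in the Go runtime.

// obj is a toy heap object: it only tracks the objects it references
// and whether the collector has marked it as reachable.
type obj struct {
	refs   []*obj
	marked bool
}

var (
	roots    []*obj // root set: stacks, globals, registers in a real collector
	heap     []*obj // every object ever allocated
	freeList []*obj // reclaimed objects, reusable by the allocator
)

// mark recursively flags every object reachable from o.
func mark(o *obj) {
	if o == nil || o.marked {
		return
	}
	o.marked = true
	for _, child := range o.refs {
		mark(child)
	}
}

// sweep walks the whole heap, moves unmarked objects to the free list,
// and clears the mark bits for the next cycle.
func sweep() {
	live := heap[:0]
	for _, o := range heap {
		if o.marked {
			o.marked = false
			live = append(live, o)
		} else {
			freeList = append(freeList, o)
		}
	}
	heap = live
}

// collect is one stop-the-world mark-sweep cycle.
func collect() {
	for _, r := range roots {
		mark(r)
	}
	sweep()
}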

Tri-color abstraction

To address the long STW caused by the original mark-sweep algorithm, most modern tracing garbage collectors implement a variant of the tri-color marking algorithm to shorten the STW time. The tri-color marking algorithm divides the objects in the program into three categories, white, black, and gray 4:

  • White objects – potential garbage whose memory may be reclaimed by the garbage collector;
  • Black objects – live objects, including objects that hold no references to other objects as well as objects reachable from the roots;
  • Gray objects – live objects that may still hold pointers to white objects; the garbage collector has yet to scan their children;

Figure 7-24 Three-color object

When the garbage collector starts working, there are no black objects in the program; the garbage collection roots are marked gray. The garbage collector only takes objects out of the gray set to scan, and when the gray set is empty the marking phase ends.

Figure 7-25 Execution process of the tri-color garbage collector

The working principle of the tri-color mark garbage collector is simple and can be summarized in the following steps, sketched in code after the list:

  1. Pick a gray object from the gray set and mark it black;
  2. Mark all objects the black object points to as gray, which guarantees that neither the object itself nor the objects it references will be reclaimed;
  3. Repeat the two steps above until no gray objects remain in the object graph;
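
Below is a minimal sketch of this loop using an explicit gray worklist; the color constants, the object type, and the greyQueue are illustrative and are not the runtime's real data structures.

type color int

const (
	white color = iota // potential garbage
	grey               // reachable, children not yet scanned
	black              // reachable, children already scanned
)

type object struct {
	refs []*object
	c    color
}

// markTricolor drains a gray worklist until no gray objects remain;
// everything still white afterwards can be swept as garbage.
func markTricolor(roots []*object) {
	var greyQueue []*object
	for _, r := range roots {
		if r != nil && r.c == white {
			r.c = grey
			greyQueue = append(greyQueue, r)
		}
	}
	for len(greyQueue) > 0 {
		// step 1: take a grey object and mark it black
		o := greyQueue[len(greyQueue)-1]
		greyQueue = greyQueue[:len(greyQueue)-1]
		o.c = black
		// step 2: shade the objects it points to
		for _, child := range o.refs {
			if child != nil && child.c == white {
				child.c = grey
				greyQueue = append(greyQueue, child)
			}
		}
	}
}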

When the tri-color marking phase ends, no gray objects remain in the heap; we only see black live objects and white garbage objects, and the garbage collector can reclaim the white garbage. Below is the heap after the tri-color collector finishes marking; only object D in the heap will be reclaimed:

Figure 7-26 Heap after tri-color marking

Because the user program may modify object pointers while marking is in progress, the tri-color mark-sweep algorithm by itself cannot run concurrently or incrementally; it still needs STW. In the tri-color marking process shown below, the user program establishes a reference from object A to object D, but since there are no gray objects left in the program, object D is mistakenly reclaimed by the garbage collector.

Figure 7-27 Tricolor and user program

Reclaiming an object that should not be reclaimed is a very serious error in memory management. We call such an error a dangling pointer: a pointer that no longer points to a valid object of the expected type, which compromises memory safety 5.

Barrier techniques

A memory barrier is a class of barrier instruction that makes the CPU or the compiler obey certain ordering constraints when executing memory operations. Most modern processors execute instructions out of order to maximize performance, and memory barriers guarantee ordered access to memory: operations issued before the barrier must complete before operations issued after it 6.

To ensure correctness in a concurrent or incremental marking algorithm, we need to maintain one of the following two tri-color invariants:

  • Strong tri-color invariant – black objects never point to white objects, only to gray or black objects;
  • Weak tri-color invariant – any white object pointed to by a black object must be reachable from some gray object via a path through white objects 7;

Figure 7-28 Tri-color invariants

The figure above shows heaps satisfying the strong and the weak tri-color invariant respectively. Maintaining either invariant guarantees the correctness of the garbage collection algorithm, and barrier techniques are the key to maintaining tri-color invariance during concurrent or incremental marking.

A barrier in garbage collection is more like a hook: a piece of code executed when the user program reads an object, creates a new object, or updates an object pointer. Depending on the kind of operation, barriers are divided into read barriers and write barriers. Because read barriers add code to every read, which greatly hurts the performance of the user program, programming languages usually adopt write barriers to maintain tri-color invariance.

Here we introduce the two write barrier techniques used in Go, the insert write barrier proposed by Dijkstra 8 and the delete write barrier proposed by Yuasa 9, and analyze how they maintain the tri-color invariants and the correctness of the garbage collector.

Insert write barrier

Dijkstra introduced insert write barriers in 1978, with write barriers like the following, user programs and garbage collectors can ensure correct program execution while working alternately:

writePointer(slot, ptr):
    shade(ptr)
    *slot = ptr

The pseudocode of the insert write barrier above is fairly straightforward: whenever we execute an expression like *slot = ptr, the write barrier tries to change the color of the pointed-to object via the shade function. If the object ptr points to is white, the function turns it gray; otherwise nothing changes.

Figure 7-29 Dijkstra inserting a write barrier

Assuming our application uses Dijkstra's insert write barrier, in a scenario where the garbage collector and the user program run alternately, the marking process shown in the figure above proceeds as follows:

  1. The garbage collector marks object A, pointed to by the root, black, and marks object B, pointed to by A, gray;
  2. The user program modifies A's pointer to point to object C instead, which triggers the write barrier and marks object C gray;
  3. The garbage collector walks the remaining gray objects in turn and marks them black;

Dijkstra's insert write barrier is a relatively conservative barrier technique: it marks every object that might be alive gray in order to satisfy the strong tri-color invariant. In the collection shown above, object B, which is in fact no longer alive, ends up not being reclaimed; and if we changed A's pointer from C back to B between the second and third steps, the garbage collector would still consider C alive. Such mistakenly marked garbage objects are only reclaimed in the next cycle.

While Dijkstra's insert write barrier is simple to implement and guarantees the strong tri-color invariant, it also has a notable drawback. Because stack objects are treated as root objects during garbage collection, Dijkstra either has to add write barriers for objects on the stack or rescan the stacks at the end of the marking phase to keep memory safe. Each approach has its own cost: the former greatly increases the overhead of every pointer write, while the latter requires pausing the program while the stacks are rescanned, so the designer of a garbage collection algorithm has to make a trade-off between the two.

Delete write barrier

Yuasa proposed the delete write barrier in his 1990 paper Real-time Garbage Collection on General-Purpose Machines, because once the write barrier is turned on it guarantees that all objects reachable on the heap at that moment stay reachable, which is why it is also called snapshot garbage collection (Snapshot GC):

This guarantees that no objects will become unreachable to the garbage collector traversal; all objects which are live at the beginning of garbage collection will be reached even if the pointers to them are overwritten.

The algorithm uses write barriers as shown below to ensure correctness of the program for incremental or concurrent garbage collection:

writePointer(slot, ptr):
    shade(*slot)
    *slot = ptr

The code above colors the old object gray when the reference to it is deleted or overwritten, provided it was white. In this way the delete write barrier maintains the weak tri-color invariant: the downstream objects referenced by the old object can still be reached from a gray object.

Figure 7-29 Yuasa delete write barrier

Assuming our application uses Yuasa's delete write barrier, in a scenario where the garbage collector and the user program run alternately, the marking process shown in the figure above proceeds as follows:

  1. The garbage collector marks object A, pointed to by the root, black, and marks object B, pointed to by A, gray;
  2. The user program changes A's pointer, which originally pointed to B, to point to C; this triggers the delete write barrier, but because object B is already gray, nothing changes;
  3. The user program removes B's pointer that originally pointed to C, triggering the delete write barrier, and the white object C is colored gray;
  4. The garbage collector walks the remaining gray objects in turn and marks them black;

The third step of the process above triggers the coloring of Yuasa's delete write barrier because the user program removed the pointer from B to C; without it, objects C and D would violate the strong and weak tri-color invariants respectively:

  • Strong tri-color invariant – the black object A points directly to the white object C;
  • Weak tri-color invariant – the garbage collector can no longer reach the white objects C and D from any gray object through a chain of white objects;

By coloring object C, Yuasa's delete write barrier ensures that C and its downstream object D survive this garbage collection cycle, avoiding dangling pointers and guaranteeing the correctness of the user program.

Incremental and concurrent

Traditional garbage collection algorithms pause the application for the duration of garbage collection. Once garbage collection is triggered, the garbage collector will seize the CPU and occupy a large amount of computing resources to complete the marking and cleaning work. However, many real-time applications cannot accept long STW.

Figure 7-30 Garbage collection and pause program

Computing resources were not nearly as plentiful in the old days as they are today, and today's computers usually have multi-core processors; once a stop-the-world garbage collector starts, it consumes a large amount of computing resources while the application makes no progress. To reduce both the longest single pause and the total pause time of the application, modern garbage collectors are optimized with the following strategies:

  • Incremental garbage collection – mark and sweep garbage incrementally, reducing the maximum pause time of the application;
  • Concurrent garbage collection – use multi-core computing resources to mark and sweep garbage concurrently while the user program runs;

Because both incremental and concurrent collectors run interleaved with the user program, we need barrier techniques to guarantee the correctness of garbage collection. At the same time, the application cannot wait until it runs out of memory to trigger garbage collection, because once memory is exhausted it can no longer allocate at all, which is no different from pausing the program outright; incremental and concurrent garbage collection must therefore be triggered ahead of time and complete the whole cycle before memory runs short, avoiding long pauses.

Incremental collector

Incremental garbage collection is a way to reduce the maximum pause time of the application: it splits one long pause into many smaller slices of GC work. This stretches the time from the start to the end of a garbage collection cycle, but it also shrinks the application's maximum pause time:

Figure 7-31 Incremental garbage collector

It is important to note that incremental garbage collection must be used together with tri-color marking. To guarantee correctness, the write barrier has to be turned on before garbage collection starts, so that every memory write by the user program goes through the write barrier first, keeping the object graph in heap memory consistent with the strong or weak tri-color invariant. Although incremental garbage collection reduces the maximum pause time, it also lengthens the total time of one GC cycle, and during collection the write barrier imposes extra overhead on the user program, so incremental collection is not without cost.

Concurrent collector

Concurrent garbage collection not only reduces the maximum pause time of the program, but can also shorten the overall garbage collection phase. By enabling read or write barriers and taking advantage of multiple cores to run in parallel with the user program, a concurrent garbage collector can indeed reduce the impact of garbage collection on the application:

Figure 7-31 Concurrent garbage collector

Although the concurrent collector runs alongside the user program, not every phase can: some phases still need to suspend the user program. Compared with traditional algorithms, however, concurrent garbage collection performs as much of the work concurrently as possible. Of course, because read/write barriers are introduced, concurrent garbage collectors inevitably incur extra overhead, which not only lengthens the total garbage collection time but also affects the user program; this has to be taken into account when designing a garbage collection strategy.

7.2.2 Evolution Process

Go's garbage collector has been evolving since the day the language was born. Apart from a few releases with major updates, almost every minor version has improved garbage collection performance, and along with the performance gains the collector's code has grown more complex. This section traces the evolution of the garbage collector starting from Go v1.0:

  1. v1.0 – fully serial mark and sweep, requiring the entire program to be suspended;
  2. v1.1 – the mark and sweep phases of garbage collection are executed in parallel on multi-core hosts 11;
  3. v1.3 – based on the assumption that only values of pointer types contain pointers, the runtime added support for precise scanning of stack memory, enabling truly precise garbage collection 12;
    • converting a value of type unsafe.Pointer to an integer is now considered illegal, as it may cause serious problems such as dangling pointers;
  4. v1.5 – implemented a concurrent garbage collector based on tri-color mark and sweep 13;
    • dramatically reduced garbage collection latency from hundreds of ms to below 10ms;
    • computes the appropriate time to start garbage collection and speeds up the collection through concurrency;
  5. v1.6 – implemented a decentralized garbage collection coordinator;
    • an explicit state machine lets any Goroutine trigger a garbage collection state transition;
    • uses dense bitmaps instead of free lists to represent heap memory, reducing CPU usage during the sweep phase 14;
  6. v1.7 – reduced garbage collection pauses to under 2ms by shrinking stacks in parallel 15;
  7. v1.8 – used the hybrid write barrier to reduce garbage collection pauses to under 0.5ms 16;
  8. v1.9 – completely removed the need to pause the application to rescan stacks 17;
  9. v1.10 – updated the implementation of the garbage collection pacer (Pacer), separating soft and hard heap size targets 18;
  10. v1.12 – simplified several phases of the garbage collector with a new mark termination algorithm 19;
  11. v1.13 – the new Scavenger solved the problem of returning memory to the operating system for applications whose memory usage spikes only momentarily;
  12. v1.14 – sped up memory allocation with the new page allocator 21;

Over the course of this evolution the garbage collector's implementation and algorithms have grown more and more complex. The original collector was an imprecise single-threaded STW collector, whereas the latest version supports concurrent garbage collection, decentralized coordination, and other features. Below we describe the components and features related to the latest version of the garbage collector.

Concurrent garbage collection

Go introduced the concurrent garbage collector in v1.5. The collector uses the tri-color abstraction and the write barrier techniques described above to guarantee correctness. We will not expand on how the concurrent collector is implemented here; let's first look at the workflow of concurrent garbage collection.

First, the concurrent garbage collector must trigger the garbage collection cycle at the right point in time. Assuming our Go program is running on a 4-core physical machine, the collector spends 25% of its computing resources scanning and marking objects in memory in the background after garbage collection begins:

Figure 7-32 Concurrent collection in Go

Go's concurrent garbage collector pauses the program before scanning objects to do some preparation for marking, which includes starting the background mark workers and turning on the write barrier. If the background workers are not fast enough and the application allocates memory faster than expected, the runtime makes the Goroutines requesting memory assist with the scanning and marking work. After the mark and mark termination phases, the collector enters the asynchronous sweep phase and reclaims the unused memory incrementally.

The concurrent garbage collection strategy implemented in v1.5 relied on a dedicated Goroutine responsible for synchronizing and coordinating the garbage collection state across processors. When another Goroutine found that garbage collection needed to be triggered, it had to notify this main Goroutine to perform the state change, but the notification introduces a delay whose window is hard to control, and the user program may allocate a lot of memory during that window.

V1.6 introduced a decentralized garbage collection coordination mechanism 22, which turned the garbage collector into an explicit state machine; any Goroutine can call methods to trigger a state transition. The common state transition methods include the following:

  • runtime.gcStart – transitions from _GCoff to _GCmark, entering the concurrent marking phase and enabling the write barrier;
  • runtime.gcMarkDone – if all reachable objects have been scanned, calls runtime.gcMarkTermination;
  • runtime.gcMarkTermination – transitions from _GCmark to _GCmarktermination, entering the mark termination phase, and moves to _GCoff once it completes;

The three methods above were introduced in the commit runtime: replace GC coordinator with state machine and its related issues, which removed the previously centralized state transition process.

Heap collection target

An STW garbage collector pauses the program, but it can precisely control the size of the heap. With the default configuration, the Go runtime triggers a new round of garbage collection when the heap grows to twice its size after the previous collection. This behavior can be adjusted through the environment variable GOGC, whose default value is 100, meaning GC is triggered once the heap has grown by 100%.

Figure 7-33 Garbage collection time for the STW garbage collector

Because the concurrent garbage collector runs together with the program, it cannot precisely control the size of the heap; it has to trigger garbage collection before the target is reached. To keep memory usage under control, the concurrent collector tries to ensure that the heap size at the end of garbage collection matches the user-configured GOGC as closely as possible.

Figure 7-34 Heap memory for the concurrent collector

Go v1.5 introduced the concurrent garbage collector together with the garbage collection pacing (Pacing) algorithm, which computes the best time to trigger garbage collection so that triggering neither wastes computing resources nor lets the heap exceed the expected size. In the figure above, the black part is the heap size after marking in the previous garbage collection, the green part is the memory newly allocated since then, the yellow part is the memory allocated during the current (concurrent) garbage collection, and the red part is the gap between the heap size at the end of garbage collection and the target. We want to minimize the red portion, reducing both the overhead of garbage collection and the pause time of the program.

The pacing algorithm was introduced along with v1.5 with the goal of optimizing the heap growth rate and the garbage collector's CPU utilization 23. It was refined in v1.10, which split the original heap size target into a soft target and a hard target 24. Since tuning the garbage collection frequency involves complex formulas that add little to understanding the principles, this section does not cover them; interested readers can explore them on their own.
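
As a small illustration of this knob, the GOGC percentage can also be adjusted at run time through runtime/debug.SetGCPercent, which is equivalent to setting the GOGC environment variable before the program starts; the concrete values below are only examples.

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetGCPercent returns the previous setting, which reflects GOGC
	// (100 by default, or a negative value if GC was disabled with GOGC=off).
	old := debug.SetGCPercent(200) // trigger GC only after the heap grows by 200%
	fmt.Println("previous GC percentage:", old)

	// Restore the original setting; a negative value would disable
	// collection entirely, which is rarely a good idea in long-running services.
	debug.SetGCPercent(old)
}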

Hybrid write barrier

Before v1.7, the Go runtime used Dijkstra's insert write barrier to guarantee the strong tri-color invariant, but it did not enable the insert write barrier for all garbage collection roots. Because a Go application may contain hundreds or thousands of Goroutines, and the garbage collection roots usually include global variables and stack objects, enabling write barriers for the stacks of hundreds of Goroutines at runtime would be a huge overhead. The Go team therefore chose to pause the program at the end of the marking phase, mark all stack objects gray again, and rescan them, a process that could take 10 to 100ms in applications with many active Goroutines.

In v1.8 Go combined Dijkstra's insert write barrier and Yuasa's delete write barrier into the hybrid write barrier shown below, which shades the object being overwritten and, while the current stack has not yet been scanned, also shades the newly installed pointer:

writePointer(slot, ptr):
    shade(*slot)
    if current stack is grey:
        shade(ptr)
    *slot = ptr

Besides introducing the hybrid write barrier, in order to eliminate the stack rescanning process the garbage collector also marks all newly created objects black during the marking phase, preventing newly allocated stack memory and heap memory from being reclaimed by mistake. Since all stacks end up being treated as black during marking, the stack space no longer needs to be rescanned.
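
To make the pseudocode above more concrete, here is a toy, single-goroutine sketch of the hybrid barrier's behavior; the color constants, the shade helper, and the stackScanned flag are illustrative and have nothing to do with the runtime's real write barrier buffers.

type gcColor int

const (
	gcWhite gcColor = iota
	gcGrey
	gcBlack
)

type node struct {
	color gcColor
	refs  []*node
}

var greyList []*node

// shade turns a white object grey and queues it for scanning.
func shade(n *node) {
	if n != nil && n.color == gcWhite {
		n.color = gcGrey
		greyList = append(greyList, n)
	}
}

// writePointer models `*slot = ptr` under the hybrid write barrier:
// the overwritten pointer is always shaded, and the new pointer is
// shaded as well while the current goroutine's stack is still unscanned.
func writePointer(slot **node, ptr *node, stackScanned bool) {
	shade(*slot)
	if !stackScanned {
		shade(ptr)
	}
	*slot = ptr
}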

7.2.3 Implementation Principles

Before analyzing the implementation in detail, we need a preliminary understanding of the execution cycle of the latest garbage collector, which will help us understand its overall design. Go's garbage collection can be divided into four distinct phases, sweep termination, mark, mark termination, and sweep, which accomplish different tasks 25:

Figure 7-35 Garbage collection phases

  1. Sweep termination phase;
    1. Pause the program; all processors enter a safe point;
    2. If the current garbage collection cycle was forcibly triggered, also handle memory management units that have not yet been swept;
  2. Mark phase;
    1. Switch the state to _GCmark, enable the write barrier and mutator assists, and enqueue the root objects;
    2. Resume the program; the mark workers and the assisting user program start marking objects in memory concurrently; the write barrier marks both the overwritten pointer and the new pointer gray, and all newly created objects are marked black directly;
    3. Start scanning the root objects, including all Goroutine stacks, global objects, and runtime data structures outside the heap; the current processor is paused while a Goroutine's stack is being scanned;
    4. Process the objects in the gray queue one by one, marking each object black and the objects it points to gray;
    5. Use the distributed termination algorithm to check for remaining work, and enter the mark termination phase once marking is found to be complete;
  3. Mark termination phase;
    1. Pause the program, switch the state to _GCmarktermination, and stop the mark workers and mutator assists;
    2. Flush the thread caches on the processors;
  4. Sweep phase;
    1. Switch the state to _GCoff to start the sweep phase, initialize the sweep state and disable the write barrier;
    2. Resume the user program; all newly created objects are now marked white;
    3. Sweep all memory management units concurrently in the background; sweeping is also triggered when a Goroutine requests a new memory management unit;

Inside the runtime, only the three states _GCoff, _GCmark and _GCmarktermination are used to represent all phases of garbage collection, but the implementation is much more involved. This section analyzes the implementation in detail, following the different phases of garbage collection.
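
One way to watch these cycles from outside the runtime is the gctrace output enabled through the GODEBUG environment variable. The small program below only allocates garbage to provoke a few cycles; the allocation sizes and loop counts are arbitrary.

package main

import (
	"fmt"
	"runtime"
)

// Run with:  GODEBUG=gctrace=1 go run main.go
// Each line the runtime prints summarizes one completed cycle: the time
// spent in the stop-the-world sweep termination and mark termination
// phases, the concurrent mark phase in between, the heap size before
// and after marking, and the heap goal of the cycle.
func main() {
	var sink [][]byte
	for i := 0; i < 512; i++ {
		sink = append(sink, make([]byte, 1<<20)) // 1MB blocks keep the heap growing
		if i%64 == 0 {
			sink = nil // drop all references so most of the heap becomes garbage
		}
	}
	_ = sink

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("completed GC cycles: %d, next GC at about %d MB of heap\n", ms.NumGC, ms.NextGC>>20)
}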

Global variables

Garbage collection involves a few important global variables, which we introduce one by one before analyzing the process. These variables come up again and again in every phase of garbage collection, so it is important to understand what they do. We start with the simpler ones:

  • runtime.gcphase – the current phase of the garbage collector, one of _GCoff, _GCmark or _GCmarktermination; Goroutines must read and modify it atomically;
  • runtime.gcBlackenEnabled – a flag that is set to 1 while garbage collection is in the marking phase; when it is set, the user program assists and the background mark tasks are allowed to blacken objects;
  • runtime.gcController – implements the garbage collection pacing algorithm; it decides when to trigger concurrent garbage collection and how much work needs to be done;
  • runtime.gcpercent – the percentage of heap growth that triggers garbage collection; the default is 100, meaning GC is triggered when the heap has grown by 100% since the last collection;
  • runtime.writeBarrier – a structure holding the write barrier state, whose enabled field indicates whether the write barrier is currently on;
  • runtime.worldsema – a global semaphore; the thread that acquires it has the right to suspend the application;

In addition to the above global variables, we need to take a brief look at the runtime.work variable here:

var work struct {
	full  lfstack
	empty lfstack
	pad0  cpu.CacheLinePad

	wbufSpans struct {
		lock mutex
		free mSpanList
		busy mSpanList
	}
	...
	nproc  uint32
	tstart int64
	nwait  uint32
	ndone  uint32
	...
	mode gcMode
	cycles uint32
	...
	stwprocs, maxprocs int32
	...
}

This structure contains a number of fields related to garbage collection, such as the number of garbage collection cycles completed, the current cycle time and CPU utilization, the mode of garbage collection, and so on, and we’ll see more fields in this structure in a later section.

Trigger

The runtime decides whether garbage collection needs to be triggered using the runtime.gcTrigger.test method shown below. Once the basic conditions for triggering garbage collection are met — garbage collection is enabled, the program is not panicking, and no garbage collection cycle is currently running — the method performs a different check depending on which of the three trigger kinds is used:

func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		return memstats.heap_live >= memstats.gc_trigger
	case gcTriggerTime:
		if gcpercent < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		return lastgc != 0 && t.now-lastgc > forcegcperiod
	case gcTriggerCycle:
		return int32(t.n-work.cycles) > 0
	}
	return true
}
  1. gcTriggerHeap – heap memory allocation has reached the trigger heap size computed by the controller;
  2. gcTriggerTime – if garbage collection has not run for a certain period of time, a new cycle is triggered; the period is controlled by the runtime.forcegcperiod variable and defaults to 2 minutes;
  3. gcTriggerCycle – if garbage collection is not currently running, a new cycle is triggered;

The method that starts garbage collection, runtime.gcStart, receives a predicate of type runtime.gcTrigger; starting from this structure, which governs leaving the _GCoff phase, we can find all the code paths that trigger garbage collection:

  • runtime.sysmon and runtime.forcegchelper – periodic checks and garbage collection running in the background;
  • runtime.GC – the user program triggers garbage collection manually;
  • runtime.mallocgc – memory allocation triggers garbage collection based on the heap size;

Figure 7-36 Garbage collection trigger

Apart from the system monitor running in the background and the forced-GC helper, the other two paths can trigger garbage collection from any processor. This trigger mechanism, which needs no coordination by a central component, was introduced in v1.6. Below we look at these three trigger paths in turn.

Background trigger

At startup the runtime opens a Goroutine in the background dedicated to forcing garbage collection. The job of this Goroutine is very simple: it calls runtime.gcStart to try to start a new round of garbage collection:

func init() {
	go forcegchelper()
}

func forcegchelper() {
	forcegc.g = getg()
	for {
		lock(&forcegc.lock)
		atomic.Store(&forcegc.idle, 1)
		goparkunlock(&forcegc.lock, waitReasonForceGCIdle, traceEvGoBlock, 1)
		gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
	}
}

To reduce its use of computing resources, this Goroutine calls runtime.goparkunlock in the loop and goes to sleep, waiting for another Goroutine to wake it up. runtime.forcegchelper therefore sleeps most of the time, but it is woken by the system monitor runtime.sysmon once the garbage collection conditions are met:

func sysmon() {
	...
	for {
		...
		if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
			lock(&forcegc.lock)
			forcegc.idle = 0
			var list gList
			list.push(forcegc.g)
			injectglist(&list)
			unlock(&forcegc.lock)
		}
	}
}

In each iteration the system monitor builds a runtime.gcTrigger and checks whether the garbage collection trigger conditions are met; if they are, it puts the Goroutine held in the runtime.forcegc state back onto the global run queue, where it waits for the scheduler.

Manual trigger

The user program can explicitly ask the runtime to perform garbage collection by calling the runtime.GC function. This function blocks the caller until the garbage collection cycle completes, and during the collection it may pause the entire program with STW:

func GC() {
	n := atomic.Load(&work.cycles)
	gcWaitOnMark(n)

	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
	gcWaitOnMark(n + 1)

	for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
		Gosched()
	}

	for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 {
		Gosched()
	}

	mp := acquirem()
	cycle := atomic.Load(&work.cycles)
	if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
		mProf_PostSweep()
	}
	releasem(mp)
}
  1. Before a new collection begins, runtime.GC waits via runtime.gcWaitOnMark for the sweep termination, mark, and mark termination phases of the previous cycle to complete;
  2. It calls runtime.gcStart to trigger a new round of garbage collection and waits via runtime.gcWaitOnMark for the mark termination phase of this round to finish;
  3. It keeps calling runtime.sweepone to sweep all pending memory management units and waits for the sweep work to finish, calling runtime.Gosched while waiting to yield the processor;
  4. After the collection completes, it publishes a snapshot of the heap profile via runtime.mProf_PostSweep;

Manually triggering garbage collection is not common and usually only appears in the runtime's test code, but it can be invoked directly if we think a collection is really necessary, although the author does not consider this a recommended practice.
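
For completeness, here is a minimal sketch of a manual trigger from user code; the throwaway allocations and the printed fields are purely illustrative.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Allocate some garbage and then drop every reference to it.
	garbage := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		garbage = append(garbage, make([]byte, 1<<20))
	}
	garbage = nil
	_ = garbage

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	runtime.GC() // blocks until this garbage collection cycle completes

	runtime.ReadMemStats(&after)
	fmt.Printf("GC cycles: %d -> %d, live heap: %d KB -> %d KB\n",
		before.NumGC, after.NumGC, before.HeapAlloc>>10, after.HeapAlloc>>10)
}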

Requesting memory

The last mechanism that can trigger garbage collection is runtime.mallocgc. As we saw in the previous section on the memory allocator, the runtime divides heap objects into tiny objects, small objects, and large objects by size, and creating any of the three kinds can start a new garbage collection cycle:

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	shouldhelpgc := false
	...
	if size <= maxSmallSize {
		if noscan && size < maxTinySize {
			...
			v := nextFreeFast(span)
			if v == 0 {
				v, _, shouldhelpgc = c.nextFree(tinySpanClass)
			}
			...
		} else {
			...
			v := nextFreeFast(span)
			if v == 0 {
				v, span, shouldhelpgc = c.nextFree(spc)
			}
			...
		}
	} else {
		shouldhelpgc = true
		...
	}
	...
	if shouldhelpgc {
		if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
			gcStart(t)
		}
	}
	return x
}
  1. When the current thread's memory management unit has no free space, creating a tiny or small object requires calling runtime.mcache.nextFree to fetch a new unit from the central cache or the page heap, which may trigger garbage collection;
  2. When the user program requests a large object of more than 32KB, a runtime.gcTrigger structure is always built to try to trigger garbage collection;

Triggering garbage collection by heap size compares two fields of runtime.mstats: heap_live, the number of bytes considered live by garbage collection, and gc_trigger, the heap size at which marking is triggered. A new garbage collection cycle starts once the live bytes in memory exceed the trigger heap size. Below we introduce how these two values are computed:

  1. heap_live – to reduce lock contention, the runtime only updates it when spans are allocated from or returned to the central cache and when large objects are allocated on the heap;
  2. gc_trigger – runtime.gcSetTriggerRatio is called during the mark termination phase to update the heap size that triggers the next garbage collection;

runtime.gcController computes the trigger ratio at the end of each cycle and sets gc_trigger through runtime.gcSetTriggerRatio. It thereby determines when garbage collection is triggered and how much marking work the user program and the background workers have to do; the feedback-control algorithm decides the trigger point based on heap growth and the garbage collector's CPU utilization.

The pacing algorithm for garbage collection introduced in v1.5 can be found in the runtime.gcControllerState.endCycle method 26, and the separation of the soft and hard heap targets introduced in v1.10 can be found in the runtime.gcControllerState.revise method 27.
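
As a rough approximation of the relationship between these fields (ignoring the pacer's feedback adjustments and differences between versions), the trigger can be thought of as the marked live heap scaled by GOGC; the helper below is only for intuition, not the runtime's actual formula.

// approxGCTrigger estimates gc_trigger from the live heap left after
// marking: with the default GOGC=100, a 100MB live heap means the next
// collection is triggered around 200MB of heap_live. The real pacer
// adjusts this ratio every cycle based on CPU utilization and heap growth.
func approxGCTrigger(heapMarked uint64, gogc int) uint64 {
	if gogc < 0 {
		return ^uint64(0) // GC disabled: never triggered by heap growth
	}
	return heapMarked + heapMarked/100*uint64(gogc)
}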

Garbage collection startup

runtime.gcStart is always called when garbage collection starts. Although its implementation is complicated, its main responsibilities are to change the global garbage collection state to _GCmark and to do some preparation, which proceeds roughly as follows:

  1. The runtime.gcTrigger.test method is called twice to confirm that the garbage collection conditions are met;
  2. Pause the program, start the background workers that will handle the marking tasks, make sure all memory management units have been swept, and do the other preparations required before the marking phase begins;
  3. Enter the marking phase, prepare the background marking work, the root-object marking work and the tiny objects, resume the user program, and enter the concurrent scanning and marking stage;

While verifying the garbage collection conditions, the method also calls runtime.sweepone repeatedly in a loop to sweep memory units that are still pending, finishing off the previous garbage collection cycle:

func gcStart(trigger gcTrigger) {
	for trigger.test() && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}

	semacquire(&work.startSema)
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}
	...
}

After verifying the garbage collection conditions and finishing the leftover sweeping, the method acquires the global worldsema semaphore, calls runtime.gcBgMarkStartWorkers to start the background mark workers, calls runtime.stopTheWorldWithSema on the system stack to pause the program, and calls runtime.finishsweep_m to make sure the memory units of the previous cycle have been swept correctly:

func gcStart(trigger gcTrigger) {
	...
	semacquire(&worldsema)
	gcBgMarkStartWorkers()
	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	...

	systemstack(stopTheWorldWithSema)
	systemstack(func() {
		finishsweep_m()
	})

	work.cycles++
	gcController.startCycle()
	...
}

In addition, the procedure above modifies the state held in the global runtime.work variable, including the number of worker Goroutines needed for garbage collection and the number of completed cycles.

With all the preparation done, the method enters its final stage, changing the global garbage collection state to _GCmark and performing the following steps in order:

  1. Call runtime.gcBgMarkPrepare to initialize the state needed by the background marking;
  2. Call runtime.gcMarkRootPrepare to prepare the scanning of stacks, global variables, and other root objects and enqueue them;
  3. Set the global variable runtime.gcBlackenEnabled so that the user program and the mark tasks are allowed to blacken objects;
  4. Call runtime.startTheWorldWithSema to resume the program; the background mark tasks also start marking objects in the heap;
func gcStart(trigger gcTrigger) {
	...
	setGCPhase(_GCmark)

	gcBgMarkPrepare()
	gcMarkRootPrepare()

	atomic.Store(&gcBlackenEnabled, 1)
	systemstack(func() {
		now = startTheWorldWithSema(trace.enabled)
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	})
	semrelease(&work.startSema)
}

In analyzing the garbage collection startup process, we have omitted several key processes, including suspending and resuming the application and starting background tasks, which are detailed below.

Pause and resume programs

runtime.stopTheWorldWithSema and runtime.startTheWorldWithSema are a pair of core functions that pause and resume the program. They do opposite things, but pausing the program is a bit more complicated than resuming it. Let's look at how the former works:

func stopTheWorldWithSema() {
	_g_ := getg()
	sched.stopwait = gomaxprocs
	atomic.Store(&sched.gcwaiting, 1)
	preemptall()
	_g_.m.p.ptr().status = _Pgcstop
	sched.stopwait--
	for _, p := range allp {
		s := p.status
		if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
			p.syscalltick++
			sched.stopwait--
		}
	}
	for {
		p := pidleget()
		if p == nil {
			break
		}
		p.status = _Pgcstop
		sched.stopwait--
	}
	wait := sched.stopwait > 0
	if wait {
		for {
			if notetsleep(&sched.stopnote, 100*1000) {
				noteclear(&sched.stopnote)
				break
			}
			preemptall()
		}
	}
}

Pausing the program relies mainly on runtime.preemptall, which in turn calls the runtime.preemptone described earlier. Because the maximum number of active processors in the program is gomaxprocs, runtime.stopTheWorldWithSema decrements the counter each time it finds a stopped processor, until all processors have stopped running. The function stops the current processor, waits for processors in system calls, and seizes idle processors; when it returns, the processors' state has been updated to _Pgcstop and they wait for the garbage collector to wake them up.

Resuming the program uses runtime.startTheWorldWithSema, whose implementation is relatively simple:

  1. Call runtime.netpoll to fetch pending tasks from the network poller and add them to the global queue;
  2. Call runtime.procresize to grow or shrink the number of processors;
  3. Call runtime.notewakeup or runtime.newm in turn to wake up each processor or create a new thread for it;
  4. If the number of Goroutines waiting to be processed is too high, create additional processors to help complete the tasks;
func startTheWorldWithSema(emitTraceEvent bool) int64 {
	mp := acquirem()
	if netpollinited() {
		list := netpoll(0)
		injectglist(&list)
	}

	procs := gomaxprocs
	p1 := procresize(procs)
	sched.gcwaiting = 0
	...
	for p1 != nil {
		p := p1
		p1 = p1.link.ptr()
		if p.m != 0 {
			mp := p.m.ptr()
			p.m = 0
			mp.nextp.set(p)
			notewakeup(&mp.park)
		} else {
			newm(nil, p)
		}
	}

	if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 {
		wakep()
	}
	...
}

Pausing the program preempts all processors through runtime.preemptall, while resuming the program wakes them up with runtime.notewakeup or runtime.newm.

Background marking

During garbage collection startup, the runtime calls runtime.gcBgMarkStartWorkers to create a Goroutine for each processor to run the background marking task. Each of these Goroutines runs runtime.gcBgMarkWorker, and all of them go to sleep right after starting, waiting for the scheduler to wake them up:

func gcBgMarkStartWorkers() {
	for _, p := range allp {
		if p.gcBgMarkWorker == 0 {
			go gcBgMarkWorker(p)
			notetsleepg(&work.bgMarkReady, -1)
			noteclear(&work.bgMarkReady)
		}
	}
}

When garbage collection is in the marking phase and the current processor has nothing else to do, the runtime.findrunnable function runs the Goroutine on the current processor that assists with concurrent object marking:

Figure 7-37 Processor and background tag tasks

The scheduler can also, in its scheduling loop runtime.schedule, obtain a background marking task from the garbage collection controller's runtime.gcControllerState.findRunnableGCWorker method and execute it.

The worker Goroutines used to scan objects concurrently run in one of three modes, runtime.gcMarkWorkerMode. Goroutines in the three modes use completely different strategies when marking objects, and the garbage collection controller runs the different kinds of workers as needed:

  • gcMarkWorkerDedicatedMode – the processor is dedicated to marking objects and will not be preempted by the scheduler;
  • gcMarkWorkerFractionalMode – when the background CPU usage of garbage collection has not reached the goal (25% by default), starting workers in this mode helps garbage collection reach the utilization goal; because they only occupy a fraction of a CPU, they can be scheduled out;
  • gcMarkWorkerIdleMode – when the processor has no other Goroutine to execute, it runs the garbage collection marking task in this mode until it is preempted;

runtime.gcControllerState.startCycle computes dedicatedMarkWorkersNeeded and fractionalUtilizationGoal from the number of processors in the program and the CPU utilization goal mentioned above, and these determine how many worker Goroutines of each mode are needed.

Because the CPU utilization goal of the background marking tasks is 25%, if the host has 4 or 8 cores, garbage collection needs 1 or 2 Goroutines dedicated to marking; but if the host has 3 or 6 cores, numbers that the 25% goal does not divide evenly, the runtime only needs 0 or 1 dedicated Goroutines and uses workers in gcMarkWorkerFractionalMode, which occupy only part of a CPU's time, to reach the utilization goal.

Figure 7-38 Number of host cores and garbage collection task mode
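
The following sketch mirrors the kind of calculation described above; it is a simplified version for intuition, not the exact code of runtime.gcControllerState.startCycle: round the 25% utilization target to a whole number of dedicated workers, and fall back to a fractional goal when rounding distorts the target too much.

// markWorkerPlan splits a 25% CPU utilization goal between dedicated and
// fractional mark workers. Simplified sketch; the real runtime applies
// further feedback and thresholds on top of this.
func markWorkerPlan(procs int) (dedicated int, fractionalGoal float64) {
	const utilizationGoal = 0.25
	total := float64(procs) * utilizationGoal // e.g. 4 cores -> 1.0, 6 cores -> 1.5
	if total == 0 {
		return 0, 0
	}

	dedicated = int(total + 0.5) // round to the nearest whole worker
	// If rounding distorts the goal by more than 30%, hand one worker back
	// and cover the remainder with a fractional-mode worker.
	utilErr := float64(dedicated)/total - 1
	if utilErr < -0.3 || utilErr > 0.3 {
		if float64(dedicated) > total {
			dedicated--
		}
		fractionalGoal = (total - float64(dedicated)) / float64(procs)
	}
	return dedicated, fractionalGoal
}

With 4 or 8 cores this yields 1 or 2 dedicated workers and no fractional goal; with 3 or 6 cores it yields 0 or 1 dedicated workers plus a small fractional goal, matching the behavior described above.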

The garbage collection controller sets the processor's gcMarkWorkerMode in the runtime.gcControllerState.findRunnableGCWorker method:

func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
	...
	if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
		_p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
	} else if c.fractionalUtilizationGoal == 0 {
		return nil
	} else {
		delta := nanotime() - gcController.markStartTime
		if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
			return nil
		}
		_p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
	}

	gp := _p_.gcBgMarkWorker.ptr()
	casgstatus(gp, _Gwaiting, _Grunnable)
	return gp
}

The implementation of the method above is relatively clear: the controller uses dedicatedMarkWorkersNeeded to decide the number of Goroutines dedicated to marking, and decides whether to start a Goroutine in gcMarkWorkerFractionalMode based on the fractional marking time and the total elapsed time. Beyond the workers required by these two modes, the scheduler also uses idle processors in runtime.findrunnable to run garbage collection and speed up the process:

func findrunnable() (gp *g, inheritTime bool) {
	...
stop:
	if gcBlackenEnabled != 0 && _p_.gcBgMarkWorker != 0 && gcMarkWorkAvailable(_p_) {
		_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
		gp := _p_.gcBgMarkWorker.ptr()
		casgstatus(gp, _Gwaiting, _Grunnable)
		return gp, false
	}
	...
}

The three working coroutines work together to ensure that the CPU utilization of garbage collection reaches the desired threshold and the marking task is completed before the target heap size is reached.

Concurrent scanning and mark assist

runtime.gcBgMarkWorker is the function executed by the background marking task. In its loop it scans and marks the object graph in memory, in three broad steps:

  1. Get the current processor and Goroutine, wrap them in a parkInfo structure, and voluntarily go to sleep waiting to be woken up;
  2. Choose the scanning strategy according to the processor's gcMarkWorkerMode;
  3. Once all marking tasks are complete, call runtime.gcMarkDone to finish the marking phase;

The runtime creates a parkInfo structure that stashes the current processor and Goroutine; when runtime.gopark is called to put the worker to sleep, the runtime safely binds the processor to the background mark task on the system stack:

func gcBgMarkWorker(_p_ *p) {
	gp := getg()
	type parkInfo struct {
		m      muintptr
		attach puintptr
	}
	park := new(parkInfo)

	park.m.set(acquirem())
	park.attach.set(_p_)
	notewakeup(&work.bgMarkReady)

	for {
		gopark(func(g *g, parkp unsafe.Pointer) bool {
			park := (*parkInfo)(parkp)
			releasem(park.m.ptr())
			if park.attach != 0 {
				p := park.attach.ptr()
				park.attach.set(nil)
				if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
					return false
				}
			}
			return true
		}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
		...

A Goroutine put to sleep via runtime.gopark does not enter the run queue; it waits for the garbage collector or the scheduler to wake it up directly. After being woken, the worker selects a marking strategy according to the processor's gcMarkWorkerMode; all of the strategies call runtime.gcDrain to scan the work buffer runtime.gcWork:

		if _p_.gcBgMarkWorker.ptr() != gp {
			break
		}
		park.m.set(acquirem())
		atomic.Xadd(&work.nwait, -1)
		systemstack(func() {
			casgstatus(gp, _Grunning, _Gwaiting)
			switch _p_.gcMarkWorkerMode {
			case gcMarkWorkerDedicatedMode:
				gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
				if gp.preempt {
					lock(&sched.lock)
					for {
						gp, _ := runqget(_p_)
						if gp == nil {
							break
						}
						globrunqput(gp)
					}
					unlock(&sched.lock)
				}
				gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
			case gcMarkWorkerFractionalMode:
				gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			case gcMarkWorkerIdleMode:
				gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			}
			casgstatus(gp, _Gwaiting, _Grunning)
		})
		incnwait := atomic.Xadd(&work.nwait, +1)

It is important to note that a task in gcMarkWorkerDedicatedMode cannot be preempted. To reduce overhead, the first call to runtime.gcDrain does allow preemption, but once the processor is preempted the current Goroutine moves all runnable Goroutines on the processor to the global queue so that the CPU resources stay dedicated to garbage collection. When all background mark tasks are waiting and there is no work left, we consider the marking phase of this garbage collection finished and call runtime.gcMarkDone:

		if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
			_p_.gcBgMarkWorker.set(nil)
			releasem(park.m.ptr())
			gcMarkDone()

			park.m.set(acquirem())
			park.attach.set(_p_)
		}
	}
}

runtime.gcDrain is the core method for scanning and marking objects in heap memory; we introduce it together with the work pool, write barriers, and mark assist below.

Work pool

When runtime.gcDrain is called, the runtime passes in the processor's runtime.gcWork. This structure is the garbage collector's abstraction of a work pool, and it implements a producer-consumer model:

Figure 7-39 Garbage collector working pool

Write barriers, root object scanning, and stack scanning all add additional gray objects to the work pool to be processed, and the object scanning process marks gray objects as black and may find new gray objects. When no gray objects are in the work queue, the whole scanning process ends.

To reduce lock contention, the runtime keeps an independent work pool on each processor for scanning. However, this runs into the same problem as the scheduler: work is not distributed evenly across processors, leaving some processors with nothing to do. The scheduler introduced work stealing to solve this, and the garbage collector uses the same mechanism to balance pending tasks across processors.

Figure 7-40 Global and local tasks

The runtime.gcWork.balance method puts part of a processor's local work back into the global queue for other processors to handle, ensuring load balance across processors.

runtime.gcWork provides the garbage collector with the abstraction of producing and consuming tasks. The structure holds two important work buffers, wbuf1 and wbuf2, which serve as the primary and backup buffers respectively:

type gcWork struct {
	wbuf1, wbuf2 *workbuf
	...
}

type workbufhdr struct {
	node lfnode // must be first
	nobj int
}

type workbuf struct {
	workbufhdr
	obj [(_WorkbufSize - unsafe.Sizeof(workbufhdr{})) / sys.PtrSize]uintptr
}

When we add or remove objects through this structure, it always operates on the primary buffer first. If the primary buffer runs out of space or has no objects, the primary and backup buffers are swapped, and when both buffers are out of space or empty, objects are spilled to or fetched from the global work buffer. The implementations of the structure's methods are simple enough that we do not analyze them here.
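
A simplified sketch of the double-buffer idea is shown below; it is purely illustrative — the real runtime.gcWork uses fixed-size, lock-free buffers and a global list of buffers rather than a slice-backed queue.

// workPool mimics the primary/backup buffer swap: produce and consume on
// the primary buffer, swap when it is full or empty, and fall back to a
// shared global queue once both local buffers are exhausted.
type workPool struct {
	primary, backup []uintptr
	capacity        int
	global          *globalQueue
}

type globalQueue struct {
	objs []uintptr
}

func (w *workPool) put(obj uintptr) {
	if len(w.primary) == w.capacity {
		w.primary, w.backup = w.backup, w.primary // try the backup buffer
	}
	if len(w.primary) == w.capacity {
		w.global.objs = append(w.global.objs, obj) // both full: spill to the global queue
		return
	}
	w.primary = append(w.primary, obj)
}

func (w *workPool) tryGet() (uintptr, bool) {
	if len(w.primary) == 0 {
		w.primary, w.backup = w.backup, w.primary // try the backup buffer
	}
	if len(w.primary) == 0 {
		if len(w.global.objs) == 0 {
			return 0, false // no local or global work left
		}
		obj := w.global.objs[len(w.global.objs)-1]
		w.global.objs = w.global.objs[:len(w.global.objs)-1]
		return obj, true
	}
	obj := w.primary[len(w.primary)-1]
	w.primary = w.primary[:len(w.primary)-1]
	return obj, true
}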

Scanning objects

The runtime scans the gray objects in the work buffer with the runtime.gcDrain function, which chooses different strategies depending on the gcDrainFlags passed in:

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
	gp := getg().m.curg
	preemptible := flags&gcDrainUntilPreempt != 0
	flushBgCredit := flags&gcDrainFlushBgCredit != 0
	idle := flags&gcDrainIdle != 0

	initScanWork := gcw.scanWork

	checkWork := int64(1<<63 - 1)
	var check func() bool
	if flags&(gcDrainIdle|gcDrainFractional) != 0 {
		checkWork = initScanWork + drainCheckThreshold
		if idle {
			check = pollWork
		} else if flags&gcDrainFractional != 0 {
			check = pollFractionalWorkerExit
		}
	}
	...
}
  • gcDrainUntilPreempt – return when the Goroutine's preempt field is set to true;
  • gcDrainIdle – call the runtime.pollWork function and return when the processor has other Goroutines to execute;
  • gcDrainFractional – call the runtime.pollFractionalWorkerExit function and return when the CPU usage exceeds fractionalUtilizationGoal by 20%;
  • gcDrainFlushBgCredit – call runtime.gcFlushBgCredit to account for the marking work done in the background, reducing the workload of the mutator assists during concurrent marking;

The runtime uses the local variable check to decide when to stop the marking task and yield the processor. With these preparations done, it can start scanning the root objects, which is the first task performed in the marking phase:

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
	...
	if work.markrootNext < work.markrootJobs {
		for !(preemptible && gp.preempt) {
			job := atomic.Xadd(&work.markrootNext, +1) - 1
			if job >= work.markrootJobs {
				break
			}
			markroot(gcw, job)
			if check != nil && check() {
				goto done
			}
		}
	}
	...
}

The runtime.markroot function scans caches, the data segment and the BSS segment that hold global and static variables, and the Goroutines' stack memory. Once the root objects have been scanned, the current Goroutine starts fetching pending tasks from the local and global work cache pools:

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
	...
	for !(preemptible && gp.preempt) {
		if work.full == 0 {
			gcw.balance()
		}

		b := gcw.tryGetFast()
		if b == 0 {
			b = gcw.tryGet()
			if b == 0 {
				wbBufFlush(nil, 0)
				b = gcw.tryGet()
			}
		}
		if b == 0 {
			break
		}
		scanobject(b, gcw)

		if gcw.scanWork >= gcCreditSlack {
			atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
			if flushBgCredit {
				gcFlushBgCredit(gcw.scanWork - initScanWork)
				initScanWork = 0
			}
			checkWork -= gcw.scanWork
			gcw.scanWork = 0

			if checkWork <= 0 {
				checkWork += drainCheckThreshold
				if check != nil && check() {
					break
				}
			}
		}
	}
	...
}

runtime.scanobject is used for scanning objects. The function starts scanning from the position passed in and calls runtime.greyobject to color the live objects it finds during the scan.

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
	...
done:
	if gcw.scanWork > 0 {
		atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
		if flushBgCredit {
			gcFlushBgCredit(gcw.scanWork - initScanWork)
		}
		gcw.scanWork = 0
	}
}

When this round of scanning is interrupted by changes in external conditions, runtime.gcFlushBgCredit records the number of bytes scanned so far as credit, reducing the workload of mark assists.

The process of scanning and marking objects in memory involves many low-level bit and pointer operations; the code is intricate, so we will not walk through it here. Interested readers can use runtime.gcDrain as an entry point to explore the tri-color marking process.

Write barriers

Write barriers are an essential technique for guaranteeing the safety of concurrent marking in Go; we need the hybrid write barrier to maintain the weak tri-color invariant of the object graph. Implementing write barriers requires cooperation between the compiler and the runtime: during the SSA intermediate code generation phase, the compiler uses the cmd/compile/internal/ssa.writebarrier function to add write barriers to Store, Move, and Zero operations, generating code like the following:

if writeBarrier.enabled {
  gcWriteBarrier(ptr, val)
} else {
  *ptr = val
}

When the program enters the garbage collection phase, the enabled field of the global variable runtime.writeBarrier is set, and all write operations call runtime.gcWriteBarrier:

TEXT runtime·gcWriteBarrier(SB), NOSPLIT, $28
	...
	get_tls(BX)
	MOVL	g(BX), BX
	MOVL	g_m(BX), BX
	MOVL	m_p(BX), BX
	MOVL	(p_wbBuf+wbBuf_next)(BX), CX
	LEAL	8(CX), CX
	MOVL	CX, (p_wbBuf+wbBuf_next)(BX)
	CMPL	CX, (p_wbBuf+wbBuf_end)(BX)
	MOVL	AX, -8(CX)	// record value
	MOVL	(DI), BX
	MOVL	BX, -4(CX)	// record *slot
	JEQ	flush
ret:
	MOVL	20(SP), CX
	MOVL	24(SP), BX
	MOVL	AX, (DI)	// trigger the write
	RET
flush:
	...
	CALL	runtime·wbBufFlush(SB)
	...
	JMP	ret

In the assembly function above, the DI register holds the destination address of the write and the AX register holds the value being written. The function overwrites the original value and, via runtime.wbBufFlush, notifies the garbage collector to add both the original and the new value to the current processor's work queue. Because the implementation of this write barrier is relatively complex, write barriers still have a significant impact on program performance: work that once required a single instruction now takes dozens.

We mentioned above that the hybrid write barrier, which combines the Dijkstra and Yuasa barriers, requires all newly created objects to be painted black while it is enabled; this marking is done by runtime.gcmarknewobject:

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	if gcphase != _GCoff {
		gcmarknewobject(uintptr(x), size, scanSize)
	}
	...
}

func gcmarknewobject(obj, size, scanSize uintptr) {
	markBitsForAddr(obj).setMarked()
	gcw := &getg().m.p.ptr().gcw
	gcw.bytesMarked += uint64(size)
	gcw.scanWork += int64(scanSize)
}

runtime.mallocgc calls this function once garbage collection has started; it obtains the memory unit and the mark bits runtime.markBits for the object and calls runtime.markBits.setMarked to paint the new object black directly.

Mark assist

To ensure that the user program does not allocate memory faster than the background tasks can mark it, the runtime introduces mark assist, which follows a simple principle: the more memory a goroutine allocates, the more marking work it must complete. Each Goroutine holds a gcAssistBytes field that records how many bytes of assist marking the current Goroutine has banked. During the concurrent marking phase, when a Goroutine calls runtime.mallocgc to allocate a new object, the function checks whether the allocating Goroutine has gone into debt:

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	var assistG *g
	if gcBlackenEnabled != 0 {
		assistG = getg()
		if assistG.m.curg != nil {
			assistG = assistG.m.curg
		}
		assistG.gcAssistBytes -= int64(size)

		if assistG.gcAssistBytes < 0 {
			gcAssistAlloc(assistG)
		}
	}
	...
	return x
}

runtime.gcAssistAlloc and runtime.gcFlushBgCredit handle "borrowing" and "repaying" the marking credit respectively. Together they ensure that the user program does not put too much pressure on garbage collection and that the marking phase is finished by the time the heap reaches its target size.

Figure 7-41 Dynamic balance of auxiliary markers

The gcAssistBytes field held by each Goroutine represents the number of bytes the current goroutine has marked as an assist, while the bgScanCredit held by the global garbage collection controller represents the number of bytes marked by the background workers. When a Goroutine allocates many objects, it can repay its debt using the public credit bgScanCredit. The following is runtime.gcAssistAlloc:

func gcAssistAlloc(gp *g) {
	...
retry:
	debtBytes := -gp.gcAssistBytes
	scanWork := int64(gcController.assistWorkPerByte * float64(debtBytes))
	if scanWork < gcOverAssistWork {
		scanWork = gcOverAssistWork
		debtBytes = int64(gcController.assistBytesPerWork * float64(scanWork))
	}

	bgScanCredit := atomic.Loadint64(&gcController.bgScanCredit)
	stolen := int64(0)
	if bgScanCredit > 0 {
		if bgScanCredit < scanWork {
			stolen = bgScanCredit
			gp.gcAssistBytes += 1 + int64(gcController.assistBytesPerWork*float64(stolen))
		} else {
			stolen = scanWork
			gp.gcAssistBytes += debtBytes
		}
		atomic.Xaddint64(&gcController.bgScanCredit, -stolen)
		scanWork -= stolen

		if scanWork == 0 {
			return
		}
	}
	...
}

This function first calculates the amount of marking work to complete based on the Goroutine's gcAssistBytes and the garbage collection controller's configuration. If there is credit available in the global bgScanCredit, it is deducted; because these concurrent updates are not protected by a lock, the global credit can be driven negative, but in the long run this is not a serious problem.
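As a rough, back-of-the-envelope illustration of this conversion (all numbers here are made up, and assistWorkPerByte/assistBytesPerWork are recomputed by the pacer every cycle), the snippet below shows how an allocation debt turns into an amount of scan work and how banked background credit can pay part of it off:

package main

import "fmt"

func main() {
	// Hypothetical pacer output: every byte allocated requires 0.5 bytes of
	// scan work, and vice versa. In the runtime these roughly correspond to
	// gcController.assistWorkPerByte and gcController.assistBytesPerWork.
	assistWorkPerByte := 0.5
	assistBytesPerWork := 1 / assistWorkPerByte

	gcAssistBytes := int64(-1 << 20) // the goroutine is 1 MiB in debt
	debtBytes := -gcAssistBytes
	scanWork := int64(assistWorkPerByte * float64(debtBytes))
	fmt.Printf("debt: %d bytes -> %d bytes of scan work\n", debtBytes, scanWork)

	// If background workers have banked enough credit, the debt can be
	// repaid without scanning anything locally.
	bgScanCredit := int64(800 << 10) // 800 KiB of banked scan work
	stolen := min64(bgScanCredit, scanWork)
	gcAssistBytes += int64(assistBytesPerWork * float64(stolen))
	fmt.Printf("stole %d of credit, remaining debt: %d bytes\n", stolen, -gcAssistBytes)
}

func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}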

If the global credit cannot cover the local debt, the runtime calls runtime.gcAssistAlloc1 on the system stack to perform the marking work; that function calls runtime.gcDrainN directly to complete the specified amount of marking and returns:

func gcAssistAlloc(gp *g) {
	...
	systemstack(func() {
		gcAssistAlloc1(gp, scanWork)
	})
	...
	if gp.gcAssistBytes < 0 {
		if gp.preempt {
			Gosched()
			goto retry
		}
		if !gcParkAssist() {
			goto retry
		}
	}
}

If, after the assist marking completes, the current Goroutine is still in debt and has not been preempted, the runtime executes runtime.gcParkAssist. If the global credit is still insufficient, runtime.gcParkAssist puts the current Goroutine to sleep, adds it to the global assist queue, and waits for a background marking task to wake it up.

The implementation of runtime.gcFlushBgCredit is simpler: if no Goroutine is waiting in the assist queue, the current credit is added directly to the global credit bgScanCredit:

func gcFlushBgCredit(scanWork int64) {
	if work.assistQueue.q.empty() {
		atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
		return
	}

	scanBytes := int64(float64(scanWork) * gcController.assistBytesPerWork)

	for !work.assistQueue.q.empty() && scanBytes > 0 {
		gp := work.assistQueue.q.pop()
		if scanBytes+gp.gcAssistBytes >= 0 {
			scanBytes += gp.gcAssistBytes
			gp.gcAssistBytes = 0
			ready(gp, 0, false)
		} else {
			gp.gcAssistBytes += scanBytes
			scanBytes = 0
			work.assistQueue.q.pushBack(gp)
			break
		}
	}

	if scanBytes > 0 {
		scanWork = int64(float64(scanBytes) * gcController.assistWorkPerByte)
		atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
	}
}

If the assist queue is not empty, the function above decides whether to wake up the sleeping Goroutines based on each Goroutine's debt and the work that has been done; if all Goroutines are woken up and credit is still left over, it is added to the global credit.

Figure 7-42 Global and local credit

The core purpose of user-program mark assist is to prevent the user program's allocations from delaying the time at which the garbage collector expects marking to finish. By maintaining this accounting system, it ensures that the user program does not place too heavy a burden on garbage collection: once the user program allocates a large amount of memory, it must balance the books through mark assist. The process eventually reaches a relative equilibrium, ensuring that marking is complete by the time the expected heap size is reached.

Mark termination

When all of the processors' local tasks are complete and no working Goroutines remain, the background concurrent workers or the user programs performing mark assist call runtime.gcMarkDone to notify the garbage collector. When all reachable objects have been marked, this function switches the garbage collection state to _GCmarktermination; if there are still pending tasks in the local queues, it flushes them to the global queue and waits for other Goroutines to finish processing them:

func gcMarkDone() {
	...
top:
	if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
		return
	}

	gcMarkDoneFlushed = 0
	systemstack(func() {
		gp := getg().m.curg
		casgstatus(gp, _Grunning, _Gwaiting)
		forEachP(func(_p_ *p) {
			wbBufFlush1(_p_)
			_p_.gcw.dispose()
			if _p_.gcw.flushedWork {
				atomic.Xadd(&gcMarkDoneFlushed, 1)
				_p_.gcw.flushedWork = false
			}
		})
		casgstatus(gp, _Gwaiting, _Grunning)
	})

	if gcMarkDoneFlushed != 0 {
		goto top
	}
	...
}

If there are no global tasks in the runtime and no local tasks on any processor, then all gray objects in this garbage collection cycle have been marked black, and we can trigger the transition to the next garbage collection phase:

func gcMarkDone() {
	...
	getg().m.preemptoff = "gcing"
	systemstack(stopTheWorldWithSema)

	...

	atomic.Store(&gcBlackenEnabled, 0)
	gcWakeAllAssists()
	schedEnableUser(true)
	nextTriggerRatio := gcController.endCycle()
	gcMarkTermination(nextTriggerRatio)
}

At the end, this function disables the hybrid write barrier, wakes up all user programs assisting garbage collection, resumes the scheduling of user Goroutines, and calls runtime.gcMarkTermination to enter the mark termination phase:

func gcMarkTermination(nextTriggerRatio float64) {
	atomic.Store(&gcBlackenEnabled, 0)
	setGCPhase(_GCmarktermination)

	_g_ := getg()
	gp := _g_.m.curg
	casgstatus(gp, _Grunning, _Gwaiting)

	systemstack(func() {
		gcMark(startTime)
	})
	systemstack(func() {
		setGCPhase(_GCoff)
		gcSweep(work.mode)
	})
	casgstatus(gp, _Gwaiting, _Grunning)
	gcSetTriggerRatio(nextTriggerRatio)
	wakeScavenger()

	...

	injectglist(&work.sweepWaiters.list)
	systemstack(func() { startTheWorldWithSema(true) })
	prepareFreeWorkbufs()
	systemstack(freeStackSpans)
	systemstack(func() {
		forEachP(func(_p_ *p) {
			_p_.mcache.prepareForSweep()
		})
	})
	...
}

We have omitted a large amount of statistics code from the function above, including code that records the amount of memory in use, the pause time of this garbage collection round, and CPU utilization; these figures help the controller decide the heap size that triggers the next round of garbage collection. Besides statistics, the function also calls runtime.gcSweep to reset the state related to the sweep phase and, if needed, to block and sweep all memory management units. The _GCmarktermination state does not last long; it quickly transitions to _GCoff and resumes the user program, at which point garbage collection is essentially over, and memory is swept lazily as the user program allocates.

Memory sweeping

The sweep phase of garbage collection includes an object reclaimer and a memory unit reclaimer, which use different algorithms to clean heap memory:

  • The object reclaimer finds and frees unmarked objects within a memory management unit, but if every object in a runtime.mspan is unmarked, the whole unit is reclaimed directly; this process is triggered asynchronously by runtime.mcentral.cacheSpan or runtime.sweepone;
  • The memory unit reclaimer looks for runtime.mspan units in which no object is marked; this process is triggered by runtime.mheap.reclaim.

runtime.sweepone is a function we encounter frequently during garbage collection; it looks for memory management units in the heap that need to be swept:

func sweepone() uintptr {
	...
	var s *mspan
	sg := mheap_.sweepgen
	for {
		s = mheap_.sweepSpans[1-sg/2%2].pop()
		if s == nil {
			break
		}
		if state := s.state.get(); state != mSpanInUse {
			continue
		}
		if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
			break
		}
	}

	npages := ^uintptr(0)
	if s != nil {
		npages = s.npages
		if s.sweep(false) {
			atomic.Xadduintptr(&mheap_.reclaimCredit, npages)
		} else {
			npages = 0
		}
	}
	_g_.m.locks--
	return npages
}

The state and sweepgen fields determine whether the current memory management unit needs to be processed: if the unit's sweepgen equals mheap.sweepgen - 2, the unit needs to be swept; if it equals mheap.sweepgen - 1, the unit is currently being swept.
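The relationship between a span's sweepgen and the heap's sweepgen can be summarized with a small sketch. The interpretation below follows the convention documented in the runtime's mheap.go (the heap's sweepgen increases by 2 after every garbage collection cycle); the function itself is only illustrative:

package main

import "fmt"

// spanSweepState interprets a span's sweepgen relative to the heap's
// sweepgen: two behind means the span still needs sweeping, one behind
// means it is being swept right now, and equal means it has been swept
// and is ready to use.
func spanSweepState(spanGen, heapGen uint32) string {
	switch spanGen {
	case heapGen - 2:
		return "needs sweeping"
	case heapGen - 1:
		return "being swept"
	case heapGen:
		return "swept, ready to use"
	default:
		return "unknown"
	}
}

func main() {
	heapGen := uint32(6)
	for _, g := range []uint32{4, 5, 6} {
		fmt.Printf("sweepgen=%d: %s\n", g, spanSweepState(g, heapGen))
	}
}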

All sweeping is ultimately done by runtime.mspan.sweep, which reclaims the garbage in a memory unit based on the results of the concurrent marking phase and clears the marks so they do not affect the next garbage collection cycle.
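Conceptually, sweeping a span amounts to turning the mark bitmap produced by the concurrent mark phase into the new allocation bitmap and resetting the marks. The toy example below illustrates only that idea with a single byte per span; the real runtime.mspan.sweep also handles finalizers, large spans, and returning fully free spans to the heap:

package main

import "fmt"

// toySpan models a span with 8 object slots: a set bit in allocBits means
// the slot is allocated, and a set bit in gcmarkBits means the mark phase
// found the object reachable. The field names mirror runtime.mspan, but the
// types here are purely illustrative.
type toySpan struct {
	allocBits  uint8
	gcmarkBits uint8
}

// sweep frees every unmarked object by adopting the mark bitmap as the new
// allocation bitmap and clearing the marks for the next cycle.
func (s *toySpan) sweep() {
	s.allocBits = s.gcmarkBits
	s.gcmarkBits = 0
}

func main() {
	s := &toySpan{allocBits: 0b11110110, gcmarkBits: 0b10100010}
	s.sweep()
	fmt.Printf("live slots after sweep: %08b\n", s.allocBits) // 10100010
}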

7.2.4 Summary

Go's garbage collector is extremely complex; the author believes it is one of the most complicated modules in the language, on a different level of complexity even from the scheduler. In this analysis we have omitted many implementation details, including the concrete implementation of concurrent object marking and of sweeping garbage, processes that involve a great deal of low-level bit and pointer manipulation; this section contains links to all of the relevant code for interested readers to explore on their own.

Garbage collection is a very old technique, and its speed and efficiency largely determine a program's performance. To build a high-performance concurrent garbage collector, Go uses mechanisms such as the tri-color abstraction, concurrent incremental collection, the hybrid write barrier, the pacing algorithm, and user-program assist to bring pause times below a millisecond. From the early releases to today we can see the engineering design and evolution behind it, and the author finds the implementation of garbage collection both interesting and worth analyzing.

7.2.5 Read more

  • Garbage Collection In Go : Part I – Semantics
  • Getting to Go: The Journey of Go’s Garbage Collector
  • The Garbage Collection Handbook
  • Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance
  • Go’s march to low-latency GC
  • GopherCon 2015: Rick Hudson – Go GC: Solving the Latency Problem
  • Go GC: Latency Problem Solved
  • Concurrent garbage collection
  • Design and Implementation of a Comprehensive Real-time Java Virtual Machine

  1. “Garbage Collectors – Serial vs. Parallel vs. CMS vs. G1 (and what’s new in Java 8)” blog.overops.com/garbage-col… ↩ ︎

  2. “An Overview of the Memory Management in Rust” pcwalton.github.io/2013/03/18/… ↩ ︎

  3. “Objective – C Automatic Reference Counting (ARC)” clang.llvm.org/docs/Automa… ↩ ︎

  4. “Tri – color marking” en.wikipedia.org/wiki/Tracin… ↩ ︎

  5. “Dangling pointer en.wikipedia.org/wiki/Dangli…” ↩ ︎

  6. “Wikpedia: barrier Memory” en.wikipedia.org/wiki/Memory… ↩ ︎

  7. P. P. Pirinen. Barrier Techniques for Incremental tracing. In ACM SIGPLAN, 34(3), 20 — 25, October 1998. Dl.acm.org/doi/10.1145… ↩ ︎

  8. E. W. Dijkstra, L. Lamport, A. J. Martin, C. S. Scholten, and E. F. Steffens. On-the-fly garbage collection: An exercise in cooperation. Communications of the ACM, 21 (11), 966-975, 1978. www.cs.utexas.edu/users/EWD/t… ↩ ︎

  9. Real-time garbage collection on general-purpose Machines. Journal of Systems and Software, 11(3):181 — 198, 1990. www.sciencedirect.com/science/art… ↩ ︎

  10. Paul R Wilson. “Uniprocessor Garbage Collection Techniques” www.cs.cmu.edu/~fp/courses… ↩ ︎

  11. Performance · Go 1.1 Release Notes golang.org/doc/go1.1#p… ↩ ︎

  12. Changes to the garbage collector · Go 1.3 Release Notes golang.org/doc/go1.3#g… ↩ ︎

  13. Go 1.5 concurrent garbage collector pacing docs.google.com/document/d/… ↩ ︎

  14. Austin Clements. 2015. “Proposal: Dense mark bits and sweep – free allocation” go.googlesource.com/proposal/+/… ↩ ︎

  15. Runtime: ShrinkStack during Mark Termination Significantly Increases GC STW time github.com/golang/go/i… ↩ ︎

  16. Proposal: Eliminate STW stack re – scanning go.googlesource.com/proposal/+/… ↩ ︎

  17. Proposal: Eliminate STW stack re – scanning go.googlesource.com/proposal/+/… ↩ ︎

  18. Proposal: Separate the soft and hard heap size goal go.googlesource.com/proposal/+/… ↩ ︎

  19. Proposal: Simplify mark termination and eliminate mark 2 https://go.googlesource.com/proposal/+/master/design/26903-simplify-mark-termination.md ↩ ︎

  20. Proposal: Smarter Scavenging go.googlesource.com/proposal/+/… ↩ ︎

  21. Proposal: Scaling the Go page allocator go.googlesource.com/proposal/+/… ↩ ︎

  22. Austin Clements. 2015. “Proposal: Decentralized GC coordination” go.googlesource.com/proposal/+/… ↩ ︎

  23. Go 1.5 concurrent garbage collector pacing docs.google.com/document/d/… ↩ ︎

  24. Proposal: Separate the soft and hard heap size goal go.googlesource.com/proposal/+/… ↩ ︎

  25. src/runtime/mgc.go golang.org/src/runtime… ↩ ︎

  26. Go 1.5 concurrent garbage collector pacing docs.google.com/document/d/… ↩ ︎

  27. Runtime: Separate soft and Hard heap limits github.com/golang/go/c… ↩ ︎