Mark cleaning method

Before version 1.3, GO used the mark sweep method. The core idea is to scan objects and mark the scanned objects. The unscanned objects are garbage and need to be recycled.

  1. Let’s pause the whole program
  2. Scan from the root object, find all references, and mark them
  3. Start removing unmarked garbage
  4. Restore the entire program

The disadvantages of the tag sweep are obvious: STW is required, the program is stopped during this time, the tag scans the entire heap area, and can result in a large amount of debris once it is cleared. When applying for a large object, it is possible to trigger a GC again due to too many fragments.

Go was optimized in version 1.3 to stop STW after marking, allowing the sweep to run in parallel with the user program.

Go1.3 before:

  1. Stop the world
  2. tag
  3. Clean the
  4. Start the world, the program resumes

Go1.3:

  1. Stop the world
  2. tag
  3. Start the world, the program resumes
  4. Clean the

Tricolor marking

Go GC in version 1.5 and later is a tricolor notation. Root set: The root object, the first object checked by the garbage collector. This includes variables that can be determined at compile time for the entire life of the program, variables on each Goroutine execution stack and Pointers to the heap, and certain Pointers to heap memory during execution. White: All objects are white before GC starts. Gray: the objects are being searched. Black: the objects are searched

  1. All objects are white

2. GC starts scanning from the root, makes direct contact with the object, sets it to gray, and puts it into the gray collection

3. Iterate over the gray collection, mark the white object referenced by the gray object as gray and put it into the gray collection, mark itself as black and put it into the black object at the same time

4. Repeat step 3 until all the grays are black, at which point the rest is to be recycled.

5. After GC, all black objects are restored to white. Repeat the steps next time.

Barrier mechanisms

One problem: in step 3 above, if F has a new pointer to H, because F is already black, it will not be scanned, so the associated H object will be collected by GC, then F will inevitably fail to access H. The barrier mechanism can be simply understood as a series of pre-work tasks such as intercepting and verifying objects before they are created or deleted.

In order to keep the GC program and the user program concurrent, it is possible to keep the GC from incorrectly collecting objects that should not be collected as long as any of the three colors are consistent.

  • Strong trichromatic invariance: Black objects never point to white objects
  • Weak trichromatic invariance: The white object to which the black object points contains at least one reachable path from the gray object through the white object

Write barriers

The purpose of the read barrier is to prevent black objects from referring directly to white objects. An object can be created on the stack or on the heap. Because the stack is small and fast, write barriers are not suitable for the stack.

  1. All are white before GC starts
  2. GC marks grey (A, F) from root set
  3. Gray Object The referenced objects are gray (C, G) and turn black (A, F).
  4. Stack A refers to A1, heap F refers to F1, A1 unchanged
  5. Heap area write barrier, F1 is black
  6. No gray objects, stack area STW before preparing GC
  7. The stack area is re-marked to be grey-free
  8. End of stack STW
  9. Clear white objects D and H
  10. All objects return to white, waiting for the next GC

Assuming that in step 4 above, F does not refer to the new F1, but instead changes the reference from G to H, then it becomes:

The idea behind the insertion barrier is to mark all possible surviving objects as gray (H), so that G objects that are already gray are not collected by GC in this round, but only in the next round.

Advantages of the insertion barrier: THE GC can be executed with the user program except for the transient STW in the stack area

Disadvantages of the insert barrier: Stack areas are re-marked for collection and garbage may survive the round (G above)

  1. Initializing GC tasks, including enabling write barriers and mutator assist, and counting the number of tasks on root objects, requires STW.
  2. The Stack Scan phase collects variables from the global space and goroutine Stack space.
  3. Mark stage, perform the above three color marking method until there are no gray objects.
  4. In the Mark termination phase, start STW, re-scan the global pointer and stack. Since Mark and mutator are parallel, there may be new object assignment and pointer assignment during Mark. At this point, they need to be recorded through a write barrier and marked.
  5. In the Sweep phase, STW and write barriers are closed and white objects are cleaned.

Remove barriers

When an object is deleted, in order to satisfy weak trichromatic invariance, that is, to prevent the loss of the reachable path from the gray object to the white object, the deleted object, regardless of whether it is gray or white, is eventually marked gray.

  1. Objects are all white
  2. Gc starts scanning from root set, A and F are gray
  3. A removes A reference to C, triggering A deletion barrier and turning C gray
  4. All of them are black, except for H

Delete barriers, as opposed to insert barriers, do not require a stack scan after GC. However, there is also A disadvantage: C has been deleted by A, why C needs to be marked as gray to avoid the phenomenon of GC cleaning, in fact, this is to meet the weak tricolor invariance, suppose that when A deletes C, F references C, then if C is cleared, F will report an error when reading C. So the C and D rounds of GC will survive, and the next round will be cleared if it is not referenced. As you can see, the deletion barrier has a low reclaim precision. Even if the last pointer to it is deleted, an object can still live one GC before being cleared in the next GC.

Hybrid barrier

Go1.8 introduces hybrid barriers that combine the best of write and delete barriers. Prior to GO1.8, we knew that no write barrier was inserted into the stack space, and when GC was finished, the stack was scanned again, which took 10~100 ms. The mixed barrier logic is as follows:

  • GC starts by marking all objects on the stack black, without STW
  • New objects created on the stack during GC are marked black
  • The deleted downstream objects are marked in gray
  • Downstream objects that are added are marked gray

Scenario 1

  1. All objects touched by the scan stack are black
  2. Stack space A refers to heap space G, and stack space has no insertion barrier, so the color of G remains the same
  3. The heap space removes references from F to G, triggering the delete barrier, and G turns gray

Scenario 2

  1. All objects touched by the scan stack are black
  2. Stack space B, black, A reference to B, direct reference
  3. A B C D
  4. Delete the reference from C to D

STW phases

  • All currently running programs are paused, root nodes in memory are scanned and write barriers are added
  • Processor P (either the processor running code or the processor already in the IDLE list) is marked as stopped and no code is running. The scheduler separates each processor’s M from its corresponding processor P and puts it in the IDLE list. The Goroutine itself, they’re put in a global queue to wait

Memory tag

Golang uses a SPAN data structure to manage memory, in which a memory block is maintained and the allocation of memory blocks is represented by a bitmap allocBits, while gcmarkBits records the reference of each memory block mentioned above.

  • GcmarkBits corresponds to a bit of 1 (black), and this object will not be collected in this GC
  • The gcmarkBits bit is 0 (white), and this object will be cleaned up in the GC

  1. AllocBits records the allocation of each memory. 1: allocated 0: unallocated
  2. GcmarkBits records the marking condition of each piece of memory: 1: marked 0: unmarked
  3. AllocBits and gcmarkBits have exactly the same data structure,
  4. At the end of the tag, point the allocBits to gcmarkBits, and the tagged bits are alive, thus completing the memory reclamation
  5. GcmarkBits reallocates memory at the next markup

Clean up the stage

  1. Start a worker in the background waiting to clear memory, one mSPAN at a time

When the program starts running, Go sets up a background Worker(whose only task is to clean up memory), which goes to sleep and waits for the memory segment to be scanned.

  1. The allocation is executed when a range is required.

This operation is triggered when the application Goroutine tries to allocate new memory in heap memory. Memory segments are already distributed to the local cache McAche of each processor P, so it’s hard to keep track of which memory is cleaned up first. This is why Go moves all memory segments to McEntral in the first place. It will then make the local cache McAche request them again for instant cleanup.

The gc history

  1. Go V1.1: mark-purge method, STW is required for the whole process, STW time may be in the second level
  2. Go V1.3: Mark-sweep, the marking process still requires STW but the Sweep process is parallel (Mark and Sweep separated). Mark STW, Sweep concurrent), STW hundreds of ms
  3. Go V1.4: The Runtime code is basically changed from C and little assembly to GO and little assembly, including GC, to achieve accurate GC, reduce the heap size, and introduce write barriers for pointer writes, for 1.5 STW hundreds of ms
  4. Go V1.5: Introduce the three-color marking method of write barrier insertion technology, only start the write barrier insertion in the heap space, after all scanning, STW is required to rescan the stack space, concurrent Mark and Sweep, STW time is reduced to 10-40ms
  5. Go V1.6: Some changes in V1.5 that are not compatible with concurrent GC. Centralized GC coordination coroutine, change to state machine to achieve STW time reduced to 5-20ms
  6. Go V1.7: In GC, stack shrinkage is changed to concurrency, and the time of object allocation status change from FREEList to bitmap STW in SPAN is reduced to about 1-3ms
  7. Go V1.8: Introduce the tri-color notation of mixed write barrier technology, only start the mixed write barrier in the heap space, there is no need to rescan the stack space after GC, STW time is reduced to about 0.5ms
  8. Go V1.14: New page allocator is introduced to optimize the speed of memory allocation, STW is very short.

When to trigger GC

  1. GcTriggerHeap is triggered when the currently allocated memory reaches a threshold that is adjusted after each GC based on heap memory growth and CPU usage;

The default GC Percentage parameter at runtime is 100. After the last GC session of your program, the remaining memory was 10MB, so the next GC session will be triggered if the heap memory reaches 20MB. If set to 200, GC will be triggered the next time the heap reaches 30MB

  1. GcTriggerTime If the interval since the last GC reaches runtime. forcegCperiod (default: 2 minutes), the GC will start; The Sysmon thread is responsible for monitoring

3. GcTriggerCycle Starts GC(Runtime.gc ()) if garbage collection is not currently enabled

Satisfy one of the above three conditions.