This is the second day of my participation in the November Gwen Challenge. Check out the details: the last Gwen Challenge 2021

Garbage Collection (GC) is an automatic memory management mechanism in programming languages. Garbage is a block of memory that is no longer needed and cannot be reused if it is not cleaned up in time.

Garbage collection algorithm

Common garbage collection algorithms are:

  • Reference count: Each object maintains a reference count, which counts -1 if the object is destroyed, and when the count is 0, the object is reclaimed.
    • Advantages: Objects can be reclaimed quickly without running out of memory or reaching thresholds.
    • Disadvantages: Does not handle circular references well
  • Mark-clean: Starting with the root variable, all referenced objects are marked “referenced” and unmarked objects are reclaimed.
    • Advantages: Solves the disadvantages of reference counting.
    • Disadvantages: Need STW (stop the world), temporarily stop the program running.
  • Collection by generation: The collection space is divided into different generations according to the life cycle of objects. The long life cycle is put into the old generation, and the short life cycle is put into the new generation. Different generations have different collection algorithms and collection frequency.
    • Advantages: Good recovery performance
    • Disadvantages: Complex algorithm

GC for Go (garbage Collection)

Mark and sweep

This algorithm was used before Go V1.3 and has two main steps:

Mark Phase — “Sweep Phase”

  1. Pause the program business logic, identify unreachable objects, and mark them.

Note: The Mark and Sweep algorithm needs to be paused while it executes! Also known as STW(Stop the World). That’s when the program gets stuck.

Program reachable objects are 1, 2, and 4

  1. Start marking, find all reachable objects, and mark

Objects 1, 2, and 4 are marked

  1. Clears unmarked objects

  1. Program pause cancelled. The process is then repeated until the end of the program life cycle.

Tricolor concurrent notation

This algorithm was first used in Go V1.5. Tricolor is just a way of saying it is easy to abstract from the narrative. In fact, objects do not have colors. The tricolor here corresponds to the three states of the object during garbage collection:

  • Gray: The object is still waiting in the tag queue
  • Black: The object is marked and will not be cleaned in this GC
  • White: the object is unmarked and will be cleaned up in this GC
  1. All objects are white in their initial state.
  2. Start from the root node and iterate over all objects, turning them into gray objects (note: gray objects are root nodes).
  3. Iterate over the gray object, turning all objects referenced by the gray object (note: this refers to all objects referenced by the gray object, including those indirectly referenced by the gray node) into gray objects, and then turning the gray object traversed into black objects.
  4. Repeat step 3 until the gray objects are all black.
  5. Repeat with write-barrier to detect object changes. (Note: Since Mark and the user program are running in parallel, new object assignments may occur during the previous step, and write barriers were introduced to address this problem.)
  6. Collect all white objects (garbage).

For example:

  1. So the objects are white, and the call is as follows:

Root – > / A – A – > B > C/A < – > D; root->F; E; G->H;

  1. GC starts scanning and traverses from the root node. It finds that only A and F are root nodes and changes A and F to gray objects.

  1. GC continues to scan the gray object, and the nodes referenced in the gray object will also become gray objects. The nodes B, C, and D referenced by node A will become gray objects. Then all the children of node A will become black objects after traversal, while node F will also become black objects without children.

  1. GC iterates through the gray object until there are no nodes in the gray object. In this case, it turns B, C, and D into black objects when it finds that none of the children of B, C, and D are white.

  1. The remaining E, G, and H are white objects, which are reclaimed by GC.

  1. After the above garbage collection is complete, the GC does one more step, which is to re-color the black object into a white object for the next garbage collection.

Stop The World

In Golang, STW (Stop The World) is to Stop all goroutine, concentrate on garbage collection, and restore goroutine after garbage collection. The length of STW time has a direct impact on the execution of the application. Excessively long time is unacceptable for some Web applications, which is also one of the reasons for widespread criticism. In order to shorten the STW time, Golang constantly optimizes the garbage collection algorithm, and this situation has been greatly improved.

Garbage collection optimization

Write Barriers

The purpose of STW is to stop goroutine from changing memory during GC scans, and write barriers are a means of keeping Goroutine running at the same time as GC. While write barriers do not eliminate STW completely, they can greatly reduce STW time. Write barriers are a kind of switch that is turned on at a specific time in GC. When turned on, the pointer will be marked as passed, meaning that the pointer will not be collected in this cycle until the next GC. Newly allocated memory is marked immediately during the GC, without using write barriers, meaning that memory allocated during the GC is not reclaimed in the current GC.

Auxiliary GC (Mutator Assist)

To prevent memory allocation too quickly, during GC execution, if a Goroutine needs to allocate memory, the Goroutine participates in the GC’s work, that is, helps the GC do some of the work, a mechanism called Mutator Assist.

Garbage collection trigger time

  1. Each time memory allocation is checked to see if the current memory allocation has reached the threshold, and if so, GC is started immediately. The memory growth rate is controlled by the environment variable GOGC, which defaults to 100, starting GC whenever memory is doubled.

Threshold = last GC memory allocation * memory growth rate

  1. By default, GC is triggered every 2 minutes at most.

  2. You can also use runtime.gc () in program code to trigger GC manually. This is primarily used for GC performance testing and statistics.