G1 garbage collector for Java Virtual Machines

Let’s take a closer look at the latest popular representative of the current Java GC Collector: the G1 Collector (garbage-First Garbage Collector)

preface

The G1 collector is a server-side garbage collector designed for machines with multi-core processors and large memory. It was officially introduced in JDK 7U4 and designated as the official GC collector in JDK9. It satisfies high throughput while keeping GC pauses as short as possible. The G1 collector is designed specifically for the following application scenarios

The CMS collector can run concurrently with the application
Compress free memory fragments without lengthy GC pauses
GC pauses can be better predicted
You don’t want to sacrifice a lot of throughput performance
A larger Java Heap is not required

G1 is designed to replace CMS in the long term. There are several differences that make G1 a better solution for GC than CMS.

First point: G1 compacts free memory to make it compact enough by allocating regions instead of a fine-grained free list, reducing memory fragmentation.
Second point: G1’s STW is more controllable. G1 has added a predictive mechanism for pause times, allowing users to specify desired pause times.

1. G1 collector

Conventional GC collectors (serial, Parallel,CMS) invariably divide the heap into three contiguous Spaces of fixed size: young generation, old generation, and permanent generation

But the G1 takes a different approach, with a new memory layoutIn G1, the heap is divided into equal-size heap regions, typically more than 2,000, that are logically contiguous. Each region is marked with a unique generation identifier (Eden,survivor,old). Logically, Eden regions constitute Eden space, survivor regions constitute survivor space, and Old regions constitute Old space.

Run the -xx :NewRatio=n command to set the ratio between the new generation and the old generation. The default value is 2, that is, the ratio is 2:1. -xx :SurvivorRatio=n The ratio of Eden to Survivor can be configured. The default value is 8.

G1 runs in a manner similar to CMS in that there is a concurrent global marking phase to determine the survival of objects in the heap. After the concurrent marking is complete, G1 knows which regions have more free space (recyclable objects) and preferentially collects these empty regions, freeing up a large amount of free space. That’s why this method of Garbage collection is called G1.

G1 focuses its collection and compression activities on areas of the heap that are likely to be filled with recyclable objects (that is, garbage), uses a pause prediction model to meet a user-defined pause time goal, and selects the number of areas to collect based on the specified pause time goal.

It is important to note that G1 is not a real-time collector. It can meet the pause time target with a high probability, but not absolute certainty. Based on previously collected data, G1 estimates how many regions can be collected within a target time specified by the user. Therefore, the collector has a fairly accurate model of the cost of the collection area, which it uses to determine which and how many areas to collect within the pause time target.

1.1 the G1 region

Each Region in G1 has the same size. The Region size can be set using -xx :G1HeapRegionSize. The Region size ranges from 1M to 32M, and is an exponent of 2. If not, G1 will automatically determine the Heap size.

Decision logic:

Size = (heap minimum + heap maximum)/TARGET_REGION_NUMBER(2048) Then take size to the nearest power of 2 and keep size between [1M,32M].Copy the code

The specific code is as follows

// share/vm/gc_implementation/g1/heapRegion.cpp
// Minimum region size; we won't go lower than that.
// We might want to decrease this in the future, to deal with small
// heaps a bit more efficiently.
#define MIN_REGION_SIZE  (      1024 * 1024 )
// Maximum region size; we don't go higher than that. There's a good
// reason for having an upper bound. We don't want regions to get too
// large, otherwise cleanup's effectiveness would decrease as there
// will be fewer opportunities to find totally empty regions after
// marking.
#define MAX_REGION_SIZE  ( 32 * 1024 * 1024 )
// The automatic region size calculation will try to have around this
// many regions in the heap (based on the min heap size).
#define TARGET_REGION_NUMBER          2048
void HeapRegion::setup_heap_region_size(size_t initial_heap_size, size_t max_heap_size) {
  uintx region_size = G1HeapRegionSize;
  if (FLAG_IS_DEFAULT(G1HeapRegionSize)) {
    size_t average_heap_size = (initial_heap_size + max_heap_size) / 2;
    region_size = MAX2(average_heap_size / TARGET_REGION_NUMBER,
                       (uintx) MIN_REGION_SIZE);
  }
  int region_size_log = log2_long((jlong) region_size);
  // Recalculate the region size to make sure it's a power of
  // 2. This means that region_size is the largest power of 2 that's
  // <= what we've calculated so far.
  region_size = ((uintx)1 << region_size_log);
  // Now make sure that we don't go over or under our limits.
  if (region_size < MIN_REGION_SIZE) {
    region_size = MIN_REGION_SIZE;
  } else if (region_size > MAX_REGION_SIZE) {
    region_size = MAX_REGION_SIZE;
  }
}
Copy the code

1.2 Cross-generation References

At the time of the Young GC, objects in the Young section may still have references to the Old section, which is the problem with cross-generation references. In order to solve the problem of scanning the whole old age during Young GC, G1 introduced the concepts of Card Table and Remember Set, the basic idea is to exchange space for time. These two data structures are specifically designed to handle references from the Old to the Young section. References from Young to Old do not need to be handled separately, because objects in Young vary so much that there is no need to waste space to record them.

RSet: stands for Remembered Sets. This parameter is used to record all external references to the Region. Each Region maintains an RSet.
Card: The JVM divides memory into fixed-size cards. This is analogous to the concept of page in physical memory.

RS(Remember Set) is an abstract concept used to record a collection of Pointers from the non-collection part to the collection part. In the traditional generational garbage collection algorithm, RS(Remember Set) is used to record Pointers between generations. In the G1 collector, RS is used to record Pointers from other regions to one Region. Thus, a Region will have an RS. This recording provides a great benefit: instead of performing a full heap scan when retrieving a Region, you can simply check its RS to find external references, which are one of the roots of initial Mark.

So, if a thread changes a reference inside a Region, it must notify RS to change the record inside. To achieve this goal, G1 collector introduced a new structure, CT(Card Table) — Card Table. Each Region is divided into several cards of a fixed size. For each card, a Byte is used to record whether it has been modified. The card table is a collection of these bytes. In fact, if RS is understood as a conceptual model, CT can be said to be an implementation of RS.

Concurrency issues are also encountered with RS modifications. Because a Region can have multiple threads modifying it concurrently, they can also modify RS concurrently. To avoid such a conflict, the G1 garbage collector further divides RS into hash tables. Each thread is modified in its own hash table. Finally, RS is logically a collection of these hash tables. Hash tables are one of the common ways to implement RS. It has the great benefit of eliminating repetition. This means that the size of the RS will match the number of Pointers being modified. Without weight loss, the number of RS is equal to the number of write operations.

The dotted Table name of RS in the figure indicates that RS is not a separate and different data structure from Card Table, but that RS is a conceptual model. In fact, Card Table is an implementation of RS. For the maintenance of the RSet structure, you can refer to this article, which does not go into too much depth.

1.3 STAB

The STAB algorithm mentioned above for Global Concurrent Marking, what exactly is this STAB? STAB is called snapshot-at-the-beginning, and its purpose is to maintain concurrent GC correctness

SATB(snapshot-at-the-beginning) was originally used as a real-time garbage collector. The G1 garbage collector uses this technique to “look takes a snapshot of the set of live objects in the heap at the start of marking Cycle “). However, during the concurrent marking phase, the application may modify the original reference, such as deleting an original reference. This can cause the snapshot of the living object after the concurrent marking ends to be inconsistent with the SATB. G1 addresses this problem by introducing a write barrier during the concurrent markup phase: Whenever there is a reference update, G1 writes the value before the modification to a log buffer (which filters out the null reference) and scans the SATB during the final marking phase to correct the SATB error.

First, we will introduce the three-color marking algorithm.

Black: The root object, or both the object and its children are scanned
Gray: The object itself is scanned, but the child objects in the object have not been scanned
White: unscanned objects. After scanning all objects, the final white objects are unreachable, that is, garbage objects.

The process of GC Root scanning is as follows:

The color before GC scans C is as follows:

During the concurrent marking phase, the application thread changes this reference relationship

A.c=C
B.c=null
Copy the code

The result is as follows.

The scan results in the re-marking phase are as follows

In this case C will be recycled as garbage. The existing Snapshot objects A, B, and C are now A and B, and the integrity of Snapshot is damaged.

The G1 uses a pre-write barrier to solve this problem.

The pre-write barrier function records changes in reference relationships in a queue called SATb_mark_queue in the JVM source code. In the remark phase, the queue is scanned. In this way, the objects to which the old reference refers are marked and their descendants are marked recursively, so that no objects are missed and snapshot integrity is guaranteed.

2. G1 GC collection

G1 keeps YGC and adds a new MIXGC for collecting old ages. There is no Full GC in G1. The SERIAL Old Full GC is used in G1.

2.1 YGC collection

When Eden space is full, YGC is triggered. In G1, YGC still copies the surviving object to survivor space. When the surviving age of the object meets the promotion conditions, the object is promoted to old generation regions(old age).

G1 controls the cost of YGC by dynamically changing the number of young regions. In the process of YGC, STW(Stop the World application pause) is still used, and objects are concurrently copied by multi-threading to reduce the GC pause time.

YGC began

End of YGC

We see in the diagram that the survivable object in the Eden zone is copied to the new Survior zone.

Does YGC need to scan the entire old age?

We know that to determine whether an object is alive or not, we need to start from GC ROOTS node, and objects reachable from GC ROOTS node are alive. In YGC, objects from the old age are not recycled, which means GC ROOTS should contain objects from the old age. But scanning an entire generation can be time consuming and will definitely affect the performance of the entire GC! . Therefore, the structure of Card Table is used in CMS, which records references from old objects to new ones. The structure of Card Table is a contiguous byte[] array, scanning Card Table time is much less than the cost of scanning the entire old era! G1 follows this line of thinking, but uses a new data structure Remembered Set for Rset. Rsets record the relationships between objects in other regions and objects in this Region. Rsets belong to the points-into structure (who references my object). Card tables, on the other hand, are a points-out structure where each Card covers a certain range of Heap (typically 512Bytes). The RSet of G1 is implemented on the basis of Card Table: each Region records the Pointers of other regions and marks which Card ranges these Pointers belong to. The RSet is actually a Hash Table. The Key is the start address of another Region, and the Value is a set. The element in the RSet is the Index of the Card Table. Each Region has a corresponding Rset.

How exactly does RSet assist GC?

When doing YGC, you only need to select rsets of the Young Generation region as the root set. These Rsets record cross-generation references of old->young, avoiding scanning the whole old generation. However, in mixed GC, RSet of old->old is recorded in old generation, and reference of young->old is obtained by scanning all young generation regions, so it is not necessary to scan all old generation regions. So the introduction of Rsets greatly reduces the amount of GC work.

So YGC in G1 does not need to scan the whole old age, just scan the Rset to know which objects in the new generation are referenced by the old age.

2.2 Mixed GC

MIXGC in G1 selects all regions in the new generation, and collects some old regions with high returns according to the statistics of Global Concurrent marking, and tries to select the old regions with high returns for recycling within the cost target specified by users. So the memory region reclaimed by MIXGC is new generation + old generation.

Before introducing MIXGC, we need to understand Global Concurrent marking, the global concurrency marking. Because old-time recycling depends on that process.

Global concurrent token

The global concurrent marking process is divided into five phases

Initial Mark Indicates STW

The Initial Mark Initial Mark is an STW event that does its job by marking objects directly reachable by GC ROOTS. And push their fields into the marking stack for subsequent scanning. G1 uses an external bitmap to record mark information, rather than the mark bit in the mark Word of the object header. Because of STW, YGC usually borrows THE STW of YGC and starts Initial Mark, which is to start the global concurrency token, which is logically independent from YGC.

2. Root Region Scanning Root Region Scanning

A root zone scan starts with objects in Survior, marks objects that are referenced in the past, and marks their fields before marking the stack for subsequent scans. Unlike Initial Mark, Root Region Scanning does not require STW to run concurrently with the application. Root Region Scanning must be completed before YGC begins.

3. Concurrent Marking

STW is not required. Constantly fetching references from the scan stack recursively scans the entire heap of objects. Each scanned object is marked and its fields are pushed onto the scan stack. Repeat the scan process until the scan stack is empty. References recorded by the SATB write Barrier are also scanned. Concurrent Marking can be interrupted by YGC

4. Remark Final STW

STW operation. After the concurrent marking, each Java thread will have some remaining references to the SATB write Barrier record that have not been processed. This stage handles the rest of the references. Reference processing is also performed in this phase. Note that this pause is essentially different from CMS remark, which only scans the SATB buffer. CMS remark only scans the dirty cards in the mod-Union table plus the entire root set. At this point, the entire Young Gen (whether the object is alive or dead) is treated as part of the root set, and CMS Remark can be very slow.

5. Cleanup STW AND Concurrent

STW operation to count regions with and without living objects (Empty Region)

STW operation to update Rset
The Concurrent operation collects Empty regions into an allocatable Region queue.

After Global Concurrent marking, the Collector knows which regions have live objects. In addition, fully recyclable regions (with no living objects) are collected and added to the allocatable Region queue to reclaim the memory of these regions. For regions with living objects, G1 retrieves objects based on the statistical model from regions with the highest revenue and the cost less than the upper limit specified by the user. The collection of selected regions is called a collection set. 支那

The Cset in MIXGC is to select all the regions of young Gen, plus some old Gen regions with high revenue according to global Concurrent marking statistics.

Cset in YGC is the region of all young Gen selected. Control the overhead of young GC by controlling the number of regions of Young Gen.

Both YGC and MIXGC adopt multithreaded copy and clearance, and the whole process will STW. The low latency of G1 is due to the fact that the reclaim area becomes more precise and the range is smaller.

Best practices

1. G1 Collect related parameters

Command usage:

Java XX:+UseG1GC ‐XX:MaxGCPauseMillis=100 ‐XX:+PrintGCDetails ‐Xmx256m TestGC \ n-jar c:\javademos\demo\jfc\Java2D\Java2demo.jarCopy the code

## Use the G1 garbage collector -xx :+UseG1GC ## to set the log detail level to detail. -verboseGC (equivalent to -xx :+PrintGC) ## Sets the detail level to a finer level, showing the average time, minimum time, and maximum time for each stage. -xx :+PrintGCDetails ## Sets the desired maximum GC pause time indicator (which the JVM will try to achieve, but is not guaranteed to achieve). The default is 200 milliseconds. -xx :MaxGCPauseMillis ## Set the size of G1 region. The value is a power of 2 and ranges from 1 MB to 32 MB. ## The goal is to partition about 2048 regions based on the minimum Java heap size. The default is 1/2000 of the heap. -xx :G1HeapRegionSize=n ## Set the value of the number of STW worker threads. Sets the value of n to the number of logical processors. The value of ## n is the same as the number of logical processors, up to 8. -xx :ParallelGCThreads=n ## Set n to about 1/4 of the number of parallel garbage collection threads (ParallelGCThreads). -xx :ConcGCThreads=n ## Sets the Java heap usage threshold for the trigger marking cycle. The occupancy rate is 45% of the Java heap - XX: InitiatingHeapOccupancyPercent = nCopy the code

2. Recommendations for the G1 collector

Young generation size: Avoid explicitly setting the young generation size with the -xMN option or other related options such as -xx :NewRatio. Fixed the size of the young generation to override the pause time target.
Don’t be too strict with your pause time goals. The throughput goal for the G1 GC is 90% application time and 10% garbage collection time. When evaluating G1 GC throughput, don’t be too harsh with pause time goals. Being too demanding means you’re willing to incur more garbage collection overhead, which directly affects throughput

conclusion

In this article, I introduced a lot of principles and concepts about G1

Finally, a brief summary:

The G1 divides memory into regions of the same size.
G1 keeps YGC and adds a new MIXGC for collecting old ages. There is no Full GC in G1. The SERIAL Old Full GC is used in G1. The Cset in MIXGC is to select all the regions of young Gen, plus some old Gen regions with high revenue according to global Concurrent marking statistics. Cset in YGC is the region of all young Gen selected. Control the overhead of young GC by controlling the number of regions of Young Gen. Both YGC and MIXGC adopt multithreaded copy and clearance, and the whole process will STW.
The low latency of G1 is due to the fact that the reclaim area becomes more precise and the range is smaller.
Five stages of global simultaneous scoring.
STAB is used to maintain accuracy of concurrent GC.
Best practices for using G1
G1 GC logs are displayed

Related articles

G1 collector principle understanding and analysis
G1 collector