This is day 28 of my participation in the daily writing challenge.

This article begins with a brief introduction to garbage collection, then analyzes how the G1 collector works and its advantages over other garbage collectors, and finally presents some tuning practices.

What is garbage collection

Before we get to G1, we need to be clear about one thing: what is garbage collection? Simply put, garbage collection reclaims the memory occupied by objects that are no longer in use.

G1 collector

The G1 (Garbage-First) collector is designed to minimize the pauses that occur when working with very large heaps (larger than 4 GB). Compared with CMS, its advantage is that it produces far less memory fragmentation.

How to enable the G1 collector

-XX:+UseG1GC

G1 design principles

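G1 has been officially available since JDK 7u4, released in 2012. Oracle planned to make G1 the default garbage collector in JDK 9 (it replaced the Parallel collector as the default there, and CMS was deprecated in the same release). Why does Oracle recommend G1 so strongly, and what are its benefits?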

First and foremost, G1 is designed to make performance tuning simple.

The developer only needs to specify the following parameters:

-XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=200

  • -XX:+UseG1GC: enables the G1 garbage collector

  • -Xmx32g: sets the maximum heap size to 32 GB

  • -XX:MaxGCPauseMillis=200: sets the target maximum GC pause time to 200 ms

To tune G1, we usually only need to change the pause-time target for a given heap size.
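To confirm which collector a running JVM actually uses, the standard java.lang.management API can list the registered collectors and the JVM's startup flags. A minimal sketch; it assumes, as HotSpot does in practice, that G1's collector beans carry a "G1" prefix (for example "G1 Young Generation"), and the class name G1Check is hypothetical:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class G1Check {
    // Names of the collectors registered by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Assumption: G1's beans are named with a "G1" prefix, e.g. "G1 Young Generation".
    static boolean isG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("JVM flags:  " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("G1 enabled: " + isG1(names));
    }
}
```

Launched with the three flags above, this should report G1 collector names and list the flags among the input arguments.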

Second, G1 removes the physical separation between the young and old generations.

This way we no longer have to give each generation its own separate space and worry about whether each generation has enough memory. Instead, G1 divides the heap into regions, while still being a generational collector.

  • Some regions belong to the young generation. Collecting them still pauses all application threads (STW) and copies surviving objects to survivor or old regions.

  • The old generation is also divided into regions, and the G1 collector cleans it up by copying objects from one region to another.

  • This means that, in normal operation, G1 compacts the heap (at least part of it), so it does not suffer from the memory fragmentation problem of CMS.

  • G1 has a special kind of region called Humongous. If an object occupies more than 50% of a region's capacity, the G1 collector considers it a humongous (giant) object.

  • By default, humongous objects are allocated directly in the old generation, but if such an object is short-lived, this has a negative impact on the garbage collector.

  • To solve this problem, G1 provides Humongous regions dedicated to storing humongous objects.

If a single H region cannot hold a humongous object, G1 looks for contiguous H regions to store it. Sometimes G1 has to run a Full GC to find enough contiguous free regions.
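The humongous rule and the contiguous-region requirement come down to simple arithmetic. A sketch under the "more than 50% of a region" rule stated above; the class name and the 2 MB region size are assumptions for illustration, and real HotSpot measures sizes in heap words rather than bytes:

```java
final class Humongous {
    static final long MB = 1024L * 1024;

    // Per the rule above: an object larger than half a region is humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Contiguous regions needed to hold the object (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 2 * MB; // assumed -XX:G1HeapRegionSize=2m
        long[] sizes = {MB / 2, 3 * MB / 2, 5 * MB};
        for (long size : sizes) {
            boolean h = isHumongous(size, region);
            System.out.printf("%d bytes: humongous=%b, regions=%d%n",
                    size, h, h ? regionsNeeded(size, region) : 0);
        }
    }
}
```

With 4 MB regions, the 1.5 MB object would no longer be humongous, which is why raising -XX:G1HeapRegionSize is one of the remedies for humongous allocation failures.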

In Java 8, the permanent generation was also removed and replaced by Metaspace, which stores class metadata in native memory rather than in the Java heap.

Object allocation strategy

Speaking of large object allocation, we also need to look at the object allocation strategy. Allocation happens in three stages:

  • Allocation in a TLAB (Thread Local Allocation Buffer)

  • Allocation in Eden

  • Allocation in a Humongous region


  • If objects are allocated in a shared space, some synchronization mechanism is needed to manage the free-space pointer of that space.

  • Within Eden, each thread has its own portion for allocating objects, namely its TLAB, so allocation needs no synchronization between threads.

    • TLAB (controlled by -XX:+UseTLAB, enabled by default) gives each thread a local allocation buffer. Its purpose is to allocate objects as quickly as possible.

    • For objects that cannot be allocated in a TLAB, the JVM tries to allocate them in Eden. If Eden cannot accommodate the object, it can only be allocated in the old generation.
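The allocation path above can be modeled as bump-pointer allocation in a private buffer, with a compare-and-set on Eden's shared top pointer only when a buffer is refilled. This is a hypothetical sketch, not HotSpot's code: TlabSketch, the half-TLAB cutoff for direct Eden allocation, and -1 as the failure value are all assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

final class TlabSketch {
    // Shared Eden: threads claim chunks with a CAS on the shared top pointer.
    static final class Eden {
        private final AtomicLong top;
        private final long end;

        Eden(long start, long end) {
            this.top = new AtomicLong(start);
            this.end = end;
        }

        // Returns the start of the claimed chunk, or -1 if Eden is exhausted.
        long claim(long bytes) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + bytes;
                if (newTop > end) {
                    return -1;
                }
                if (top.compareAndSet(oldTop, newTop)) {
                    return oldTop;
                }
            }
        }
    }

    // Per-thread buffer: owned by one thread, so a plain pointer bump suffices.
    static final class Tlab {
        private final Eden eden;
        private final long tlabSize;
        private long top;
        private long end;

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        long allocate(long bytes) {
            if (top + bytes <= end) {          // fast path: no synchronization
                long addr = top;
                top += bytes;
                return addr;
            }
            if (bytes > tlabSize / 2) {        // assumed cutoff: allocate directly in shared Eden
                return eden.claim(bytes);
            }
            long chunk = eden.claim(tlabSize); // refill the TLAB from Eden
            if (chunk < 0) {
                return -1;                     // Eden exhausted: a real JVM would start a young GC
            }
            top = chunk + bytes;
            end = chunk + tlabSize;
            return chunk;
        }
    }
}
```

Only the refill and the direct Eden path touch shared state; every allocation that fits in the current buffer is a plain addition.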

G1 offers two GC modes, Young GC and Mixed GC, both of which are Stop The World (STW).

We’ll take a look at each of these modes.

G1 Young GC

Young GC collects the Eden regions and is triggered when Eden space runs out.

  1. Live objects in Eden are copied to Survivor space; if Survivor space is insufficient, some of them are promoted directly to the old generation.
  2. Live objects in the Survivor space are copied to a new Survivor space, and some of them are promoted to the old generation.
  3. Finally, Eden is empty, the GC stops, and the application threads resume.
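The three steps can be sketched as copying between spaces with an age counter. A simplified model under stated assumptions: liveness is given as a flag instead of being traced, capacities are counted in objects, and the names YoungGcSketch, survivorCapacity, and tenuringThreshold are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

final class YoungGcSketch {
    static final class Obj {
        final String name;
        final boolean live; // assumption: liveness is known up front instead of traced
        int age;

        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    List<Obj> survivor = new ArrayList<>();
    private final int survivorCapacity;   // counted in objects, for simplicity
    private final int tenuringThreshold;  // age at which survivors are promoted

    YoungGcSketch(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void youngGc() {
        List<Obj> toSurvivor = new ArrayList<>();
        // Step 1: live Eden objects go to the new survivor space; overflow is promoted.
        for (Obj o : eden) {
            if (!o.live) continue;        // dead objects are simply not copied
            o.age = 1;
            if (toSurvivor.size() < survivorCapacity) toSurvivor.add(o); else old.add(o);
        }
        // Step 2: live survivors age; old-enough ones (or overflow) are promoted.
        for (Obj o : survivor) {
            if (!o.live) continue;
            o.age++;
            if (o.age >= tenuringThreshold || toSurvivor.size() >= survivorCapacity) old.add(o);
            else toSurvivor.add(o);
        }
        // Step 3: Eden is empty again and the application threads can resume.
        eden.clear();
        survivor = toSurvivor;
    }
}
```

Dead objects cost nothing here because they are never copied, which is why copying collection suits a young generation where most objects die quickly.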

Question 1: If we only collect young-generation objects, how do we find all the root objects? Are all objects in the old generation roots?

G1 introduced the concept of the RSet. It stands for Remembered Set and tracks references into a given heap region.

  • CMS has a similar mechanism: a card table over the old generation records which old-generation areas contain references into the young generation.

    • This is a point-out structure: when scanning roots for a young GC, only these recorded areas need to be scanned, not the whole old generation.
  • G1 does not use point-out, because its regions are small and numerous. With point-out there would be a lot of wasted scanning (the same blocks of pointers scanned repeatedly), and references into regions that do not need to be collected would be scanned as well.

  • G1 uses point-in instead: each region records which other regions reference objects in it.

Question 2: The RSet lets root scanning avoid wasted work. But since there are many young regions, do we also need to record references between young regions?

  • No. Every GC collects all young regions, so references originating in young regions never need to be recorded; only references from old regions into young regions do (for mixed GCs, references between old regions are recorded as well).
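Point-in bookkeeping, together with the filtering rule from Question 2, fits in a few lines. A hypothetical sketch that tracks regions by index and records only which regions reference which (cards come next); references from young regions and within a single region are skipped because those regions are scanned anyway:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PointInRSet {
    private final boolean[] young; // hypothetical region model: index -> is young
    // Point-in: for each region, the OTHER regions that reference objects in it.
    private final Map<Integer, Set<Integer>> rsets = new HashMap<>();

    PointInRSet(boolean[] young) {
        this.young = young;
    }

    // Conceptually called by the write barrier when a field in `from` points into `to`.
    void recordReference(int from, int to) {
        if (from == to) return;     // same region: no RSet entry needed
        if (young[from]) return;    // young regions are always collected, so skip them as sources
        rsets.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    // During a GC, only these regions need scanning to find references into `region`.
    Set<Integer> referrers(int region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }
}
```

Because the set lives with the referenced region, collecting that region only requires looking at its own RSet rather than at every region that might point into it.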

If many objects are referenced, recording every single reference in the RSet can be expensive. To solve this problem, G1 introduces another concept, the Card Table.

  • A card table logically divides a region into contiguous, fixed-size areas. Each such area is called a card.

  • Cards are small, usually between 128 and 512 bytes (HotSpot uses 512-byte cards).

The card table is usually a byte array. Each card index (array subscript) identifies the address range covered by one card.

By default, every card is clean (not referenced). When a reference is written into a card's address range, the array entry for that card is set to "0", that is, marked dirty, and the RSet records that card's index.

Generally, the RSet is a hash table: the key is the start address of another region that references this one, and the value is a set whose elements are card table indices.
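The card table and the card-index RSet can be sketched together. This assumes 512-byte cards, HotSpot's encoding of clean as -1 and dirty as 0, and a single RSet for the region being referenced; CardTableSketch and onReferenceStore are hypothetical names, and a real G1 write barrier enqueues dirty cards for later refinement rather than updating the RSet directly:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CardTableSketch {
    static final byte CLEAN = -1;    // HotSpot encodes a clean card as -1
    static final byte DIRTY = 0;     // and a dirty card as 0, matching the text
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards

    final long heapBase;
    final long regionSize;
    final byte[] cards;
    // RSet of the referenced region: referencing region start -> dirty card indices.
    final Map<Long, Set<Integer>> rset = new HashMap<>();

    CardTableSketch(long heapBase, long heapSize, long regionSize) {
        this.heapBase = heapBase;
        this.regionSize = regionSize;
        this.cards = new byte[(int) (heapSize >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    long regionStart(long address) {
        return heapBase + (address - heapBase) / regionSize * regionSize;
    }

    // Conceptual write barrier: a field at fieldAddress now points into this RSet's region.
    void onReferenceStore(long fieldAddress) {
        int card = cardIndex(fieldAddress);
        cards[card] = DIRTY;
        rset.computeIfAbsent(regionStart(fieldAddress), k -> new TreeSet<>()).add(card);
    }
}
```

Recording a card index instead of each reference means that many writes into the same 512-byte range cost a single RSet entry; at scan time, only those cards are examined.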

Young GC phases

  • Phase 1: Root scanning: static and local variables are scanned
  • Phase 2: Update RS: the dirty card queue is processed to update the RSets
  • Phase 3: Process RS: find old-generation objects that reference young-generation objects
  • Phase 4: Object copy: surviving objects are copied to survivor/old regions
  • Phase 5: Reference processing: soft, weak, and phantom references are processed

G1 Mixed GC

Mixed GC not only performs a normal young-generation collection, but also reclaims some of the old regions identified by concurrent marking.

A mixed GC proceeds in two steps:

  • Global concurrent marking

  • Copying live objects (evacuation)

Global concurrent marking

Global concurrent marking runs before a mixed GC.

In G1, it mainly provides marking information for mixed GCs and is not a mandatory part of every GC. Global concurrent marking is carried out in five steps:

Initial Mark (STW)

At this stage, the G1 GC marks the roots. This phase piggybacks on a normal (STW) young-generation collection.

Root Region Scan

The G1 GC scans the survivor regions marked during the initial mark for references into the old generation and marks the referenced objects. This phase runs concurrently with the application (non-STW) and must complete before the next STW young-generation collection can start.

Concurrent Marking

The G1 GC finds reachable (live) objects across the entire heap. This phase runs concurrently with the application and can be interrupted by STW young-generation collections.

Final marking (Remark, STW)

This phase is an STW collection that completes the marking cycle. The G1 GC drains the SATB buffers, traces any live objects not yet visited, and performs reference processing.

Cleanup (STW)

  • In the final phase, the G1 GC performs STW accounting and RSet scrubbing.

    • During accounting, the G1 GC identifies regions that are completely free and regions that are candidates for mixed garbage collection.

    • The cleanup phase is partly concurrent: empty regions are reset and returned to the free list.
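The accounting done during cleanup amounts to sorting regions by liveness. A minimal sketch, assuming per-region live byte counts from marking and equally sized regions (so least live data means most garbage); CleanupSketch is a hypothetical name, and real G1 also leaves out regions that are too full of live data to be worth collecting:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CleanupSketch {
    static final class Region {
        final int id;
        final long liveBytes; // assumption: provided by concurrent marking

        Region(int id, long liveBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
        }
    }

    final List<Integer> freeList = new ArrayList<>();

    // Empty regions go straight back to the free list; the rest become mixed GC
    // candidates, ordered so that the region with the most garbage comes first.
    List<Region> cleanup(List<Region> oldRegions) {
        List<Region> candidates = new ArrayList<>();
        for (Region r : oldRegions) {
            if (r.liveBytes == 0) {
                freeList.add(r.id);
            } else {
                candidates.add(r);
            }
        }
        // Equal-sized regions: least live data means most reclaimable garbage.
        candidates.sort(Comparator.comparingLong(r -> r.liveBytes));
        return candidates;
    }
}
```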

Tri-color marking algorithm

To understand concurrent marking, we need the tri-color marking algorithm. It is a useful way to describe a tracing collector and to reason about the collector's correctness.

First, we classify objects into three types.

  • Black: The root object, or both the object and its children are scanned

  • Gray: The object itself is scanned, but the child objects in the object have not been scanned

  • White: unscanned objects. After scanning all objects, the final white objects are unreachable, that is, garbage objects

When the GC starts, it scans objects in the following steps:

The root objects are set to black, and the objects they reference are set to gray.

Traversal continues from the gray objects; once all of an object's children have been scanned, the object is set to black.

After traversing all reachable objects, all reachable objects turn black. Unreachable objects are white and need to be cleaned up.
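The marking loop above can be written as a worklist algorithm in which the worklist holds exactly the gray objects. A sketch over a hypothetical string-keyed object graph; here roots start gray and turn black once scanned, which ends in the same coloring as blackening them up front:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TriColor {
    enum Color { WHITE, GRAY, BLACK }

    // graph: object -> objects it references. Every object must appear as a key.
    static Map<String, Color> mark(Map<String, List<String>> graph, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String obj : graph.keySet()) {
            color.put(obj, Color.WHITE);             // everything starts unvisited
        }
        Deque<String> grayList = new ArrayDeque<>(); // holds exactly the gray objects
        for (String root : roots) {
            color.put(root, Color.GRAY);
            grayList.push(root);
        }
        while (!grayList.isEmpty()) {
            String obj = grayList.pop();
            for (String child : graph.getOrDefault(obj, Collections.emptyList())) {
                if (color.get(child) == Color.WHITE) {
                    color.put(child, Color.GRAY);
                    grayList.push(child);
                }
            }
            color.put(obj, Color.BLACK);             // all children seen: fully scanned
        }
        return color;                                // WHITE entries are garbage
    }
}
```

For a graph A -> B -> C plus an unreferenced D, marking from root A leaves A, B, and C black and D white.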

This looks fine, but if the application keeps running during marking, references between objects may change. This leads to the lost-object problem.

Consider the following state during a scan: A is black (fully scanned), B is gray, and C is white and referenced only by B.c.

At this point, the application does the following:

A.c=C

B.c=null

Now A references C, and B no longer does. When the collector continues, it scans B and turns it black, but it never revisits A, which is already black.

Clearly, C is still white at this point and would be reclaimed as garbage even though A references it, which is wrong. So how do we ensure that no live object is lost while the application is running? There are two possible approaches:

  • Record the reference when it is inserted
  • Record the reference when it is deleted

These correspond to the approaches taken by CMS and G1, respectively:

CMS uses Incremental update

CMS uses incremental update. When the write barrier sees a reference to a white object being stored into a field of a black object, it turns the white object gray. In other words, the reference is recorded when it is inserted.

SATB (snapshot-at-the-beginning)

G1 uses SATB (snapshot-at-the-beginning), which records references when they are deleted. It has 3 steps:

  1. At the start of marking, a logical snapshot of the object graph is taken, and every object live in that snapshot is treated as live.

  2. When a reference is overwritten, the write barrier enqueues the old referent, so that no previously referenced object stays white.

  3. This can leave floating garbage (objects that die during marking), which is collected in the next cycle.
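The lost-object scenario and both fixes can be simulated directly. A hypothetical sketch in which each object has a single field c, A and B are roots, and marking pauses right after A is scanned; the two barrier modes are simplified versions of the insertion-recording (incremental update) and deletion-recording (SATB) rules described above:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BarrierDemo {
    enum Barrier { NONE, INCREMENTAL_UPDATE, SATB }

    private final Barrier barrier;
    private final Map<String, String> fieldC = new HashMap<>(); // obj -> value of obj.c
    private final Set<String> black = new HashSet<>();
    private final Deque<String> gray = new ArrayDeque<>();      // worklist of gray objects

    private BarrierDemo(Barrier barrier) {
        this.barrier = barrier;
    }

    private void shade(String obj) {
        if (obj != null && !black.contains(obj) && !gray.contains(obj)) {
            gray.push(obj);                        // white -> gray
        }
    }

    private void scanOne() {
        String obj = gray.pop();
        shade(fieldC.get(obj));                    // visit the object's only field
        black.add(obj);                            // gray -> black
    }

    // Application write "obj.c = value", routed through the chosen write barrier.
    private void writeC(String obj, String value) {
        if (barrier == Barrier.SATB) {
            shade(fieldC.get(obj));                // record the old referent (deletion)
        }
        fieldC.put(obj, value);
        if (barrier == Barrier.INCREMENTAL_UPDATE && black.contains(obj)) {
            shade(value);                          // record the new referent (insertion)
        }
    }

    // Replays the scenario from the text and reports whether C ends up marked.
    static boolean cSurvives(Barrier barrier) {
        BarrierDemo heap = new BarrierDemo(barrier);
        heap.fieldC.put("A", null);
        heap.fieldC.put("B", "C");
        heap.fieldC.put("C", null);
        heap.shade("B");                           // roots A and B start gray
        heap.shade("A");
        heap.scanOne();                            // A scanned: A black, B gray, C white
        heap.writeC("A", "C");                     // A.c = C
        heap.writeC("B", null);                    // B.c = null
        while (!heap.gray.isEmpty()) {
            heap.scanOne();                        // marking resumes and finishes
        }
        return heap.black.contains("C");
    }
}
```

cSurvives(Barrier.NONE) returns false, reproducing the lost object, while both INCREMENTAL_UPDATE and SATB return true: the first catches the new reference from A, the second catches the deleted reference from B.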

G1 now knows which old regions contain the most garbage to reclaim. Once global concurrent marking is complete, mixed GCs begin. These collections are called "mixed" because they not only perform a normal young-generation collection, but also reclaim some of the old regions identified by concurrent marking.

(Figure: mixed garbage collection)

Mixed GC also uses a copying (evacuation) strategy: live objects are copied out of the collected regions, whose space is freed when the GC completes.

That concludes mixed GC. In the next section, we move on to tuning practices.

Tuning practices

MaxGCPauseMillis tuning

The most basic parameters for using G1 were described earlier:

-XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=200

As the name suggests, MaxGCPauseMillis is the maximum pause time allowed for a GC. G1 tries to keep each GC pause within this target, but it is a goal rather than a guarantee.

So how does G1 meet the pause-time target? This involves another concept, the CSet (collection set): the set of regions collected in a single garbage collection.

  • Young GC: all young-generation regions are selected. G1 controls the cost of a young GC by controlling the number of young regions.

  • Mixed GC: all young regions are selected, plus some old regions that the global concurrent marking statistics show to have a high reclaim benefit. G1 picks as many high-benefit old regions as possible while staying within the user's pause-time target.
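CSet selection under a pause target can be sketched as a greedy choice. This is a hypothetical model: each region's pause cost (predictedMs) and reclaimable bytes are taken as given, whereas real G1 predicts costs from its own statistics and applies additional limits on how many old regions a mixed GC takes:

```java
import java.util.ArrayList;
import java.util.List;

final class CSetSketch {
    static final class Region {
        final int id;
        final long reclaimableBytes; // garbage freed if this region is collected
        final double predictedMs;    // assumed cost of evacuating it

        Region(int id, long reclaimableBytes, double predictedMs) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedMs = predictedMs;
        }

        double efficiency() {
            return reclaimableBytes / predictedMs;
        }
    }

    // Young regions are always included; old regions are added greedily, best
    // bytes-per-millisecond first, until the next one would exceed the pause target.
    static List<Region> choose(List<Region> young, List<Region> oldCandidates, double pauseTargetMs) {
        List<Region> cset = new ArrayList<>(young);
        double predicted = 0;
        for (Region r : young) {
            predicted += r.predictedMs;
        }
        List<Region> sorted = new ArrayList<>(oldCandidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        for (Region r : sorted) {
            if (predicted + r.predictedMs > pauseTargetMs) {
                break;
            }
            cset.add(r);
            predicted += r.predictedMs;
        }
        return cset;
    }
}
```

A smaller pause target leaves less budget after the young regions, so fewer old regions fit into each mixed GC; this is the trade-off discussed next.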

Question 3: G1 tries to keep pauses within this limit, but what value should we set?

  • We need to balance throughput against MaxGCPauseMillis. If MaxGCPauseMillis is set too low, GCs become frequent and throughput drops.

  • If MaxGCPauseMillis is set too high, application pauses become longer. G1's default pause-time target is 200 milliseconds.

Other tuning parameters

-XX:G1HeapRegionSize=n

Sets the size of a G1 region. The value is a power of 2 and ranges from 1 MB to 32 MB. The goal is to have about 2048 regions based on the minimum Java heap size.
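The default region size follows from this rule. A sketch assuming the 2048-region target and the 1 MB to 32 MB bounds quoted above; the heap size passed in stands for whichever heap size your JDK bases the calculation on, and the class name is hypothetical:

```java
final class RegionSize {
    static final long MB = 1024L * 1024;
    static final long MIN_REGION = MB;       // lower bound quoted in the text
    static final long MAX_REGION = 32 * MB;  // upper bound quoted in the text
    static final long TARGET_REGIONS = 2048;

    static long defaultRegionSize(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size);     // round down to a power of two
        return Math.min(size, MAX_REGION);
    }

    public static void main(String[] args) {
        for (long gb : new long[] {1, 4, 32, 100}) {
            System.out.println(gb + " GB heap -> " + defaultRegionSize(gb << 30) / MB + " MB regions");
        }
    }
}
```

For example, a 4 GB heap gives 2 MB regions, and a 100 GB heap is capped at 32 MB.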

-XX:ParallelGCThreads=n

Sets the number of STW worker threads. By default, n equals the number of logical processors, up to a maximum of 8.

If there are more than eight logical processors, set the value of n to about 5/8 of the number of logical processors. This applies in most cases, except for larger SPARC systems, where the value of n can be about 5/16 of the number of logical processors.

-XX:ConcGCThreads=n

Sets the number of parallel marking threads. Set n to about 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
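The default thread counts can be computed the same way. A sketch assuming HotSpot's usual heuristics: ParallelGCThreads equals the CPU count up to 8, then 8 plus 5/8 of the CPUs beyond 8 (so "about 5/8" is an approximation), and ConcGCThreads is (ParallelGCThreads + 2) / 4, at least 1. Treat the exact formulas as assumptions, since they vary by JDK version and platform:

```java
final class GcThreads {
    // All CPUs up to 8, then 5/8 of the CPUs beyond 8 (assumed non-SPARC heuristic).
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    // Roughly a quarter of ParallelGCThreads, but never fewer than one thread.
    static int concGcThreads(int parallelThreads) {
        return Math.max((parallelThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.printf("cpus=%d ParallelGCThreads=%d ConcGCThreads=%d%n",
                cpus, parallel, concGcThreads(parallel));
    }
}
```

On a 16-CPU machine this yields 13 parallel threads and 3 concurrent marking threads.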

-XX:InitiatingHeapOccupancyPercent=45

Sets the Java heap occupancy threshold that triggers a marking cycle. The default is 45% of the entire Java heap.

Avoid using the following parameters:

Avoid explicitly setting the young generation size with the -Xmn option or related options such as -XX:NewRatio. Fixing the size of the young generation overrides the pause-time target.

Trigger a Full GC

In some cases, G1 triggers a Full GC. Up to JDK 9, G1 then falls back to serial, single-threaded collection, and the pause can last several seconds (JDK 10 made G1's Full GC parallel).

The entire application appears frozen and cannot handle any requests, which we certainly want to avoid. So when does a Full GC happen?

Concurrent mode failure

G1 starts the marking cycle, but the old generation fills up before a mixed GC can run, so G1 abandons the marking cycle. In this case, either increase the heap size or adjust the cycle (e.g., increase the number of threads with -XX:ConcGCThreads).

Failed promotion or evacuation

During a GC, G1 does not have enough memory for surviving or promoted objects, which triggers a Full GC. The log shows (to-space exhausted) or (to-space overflow).

The solution to this problem is:

  • Increase the value of the -XX:G1ReservePercent option (and increase the total heap size accordingly) to increase the amount of memory reserved for "to-space".

  • Decrease -XX:InitiatingHeapOccupancyPercent to start the marking cycle earlier.

  • Increase the number of parallel marking threads by increasing the value of the -XX:ConcGCThreads option.

Humongous object allocation failure

When a humongous object cannot find enough contiguous space, a Full GC is started to free up space. In this case, avoid allocating large numbers of humongous objects, increase the heap size, or increase -XX:G1HeapRegionSize so that the object is no longer humongous.