1. Important Concepts

1. Partition (Region)

G1 divides the heap into regions of equal size. Each partition can be Eden, Survivor, or Old, and each partition can belong to only one generation at a time. The size of a Region in G1 ranges from 1MB to 32MB. The exact size is determined by the Heap size, which tries to ensure that the entire Heap is divided into approximately 2048 regions. For example, if the Heap has 16G, then 16G / 2048 = 8MB, i.e. a Region is about 8MB. Of course, 2048 regions are not absolute. If the Heap is very large or very small, the total number of regions can exceed or be less than 2048. The total number of regions can also be set by parameter -xx :G1HeapRegionSize=n.

A Humongous Object that occupies more than half of a Region’s memory is called a Humongous Object. Humongous objects are allocated based on the size of the Region. In available regions, find several consecutive regions that are sufficient to put the object and allocate them to the Humongous object. If the size of the object is smaller than one region, the entire region will be occupied. If such a contiguous-region is not found, G1 uses Fail-Safe’s FGC directly to clean and compact the heap. Many YGC and OGC processes are concurrent, and the Humongous Object cannot allocate memory to allow the application thread to continue running. A full STW collection of memory must be performed. The following figure shows that successive regions consist of StartsHumongous and ContinuesHumongous regions.

A separate area for Humongous objects is created to avoid copying long-live large objects during GC. The purpose of opening a continuous Region to store only one Humongous Object is to enable G1 to collect Humongous Object more aggressively. Once this Object dead is found, all Regions occupied by it can be collected. There is no need to determine whether the Region is used by other objects and whether other objects are still live. For example, in the clean up phase of marking, YGC, and FGC, humongous objects are immediately collected if no reference is found.

2.Collection Set (CSet)

A collection of partitions that can be reclaimed. Data that survives in a CSet is moved to another availability zone during GC, and partitions in a CSet can come from Eden space, survivor space, or old age.

3.Remembered Set (RSet)

Each Region has an RSet, which records the index of the card that references objects in the Region. Belongs to the Point into structure. RSet allows the garbage collector to know which objects from other partitions reference objects from that partition without scanning the entire heap. As shown in the figure below, the objects in Region1 and Region3 both refer to objects in Region2, so both references are recorded in Region2’s RSet.

G1 GC adds a layer of structure to the points-out card table to form a points-into RSet: each region records which other regions have Pointers to it, and these Pointers are in the range of which cards. The RSet is actually a hash table. The key is the start address of another region, and the value is a set. The element in the RSet is the index of the card table. For example, if the key of region A’s RSet is Region B and the value contains A card with index 1234, it means that A card of Region B has A reference to Region A. Therefore, for Region A, the RSet records the points-into relationship. The card table still records the points-out relationship.

RSet updates are done through a post-write barrier, and RSet updates occur throughout the application lifecycle. The post-write barrier is invoked when a user thread writes a reference, such as x.f=y. The whole update process is as follows:

  1. Then check whether Y and X belong to the same Region. If y is not null and x belong to different regions, go to the next step.
  2. The rs_enqueue method is called. This method first checks whether the card where X resides is marked as dirty in the card table. If so, it ends. If it is not, mark the card as dirty and place the card x in the thread’s remembered Set log(also called dirty card queue). If the queue is full, This thread is placed in the global queue (filled RS buffers) and allocated a new queue for it.

This is what the write barrier does. When the number of logs in filled RS buffers reaches a threshold, a set of Concurrent Refinement Threads processes the card inside: First, set the status of the card to clean. Then, check whether all objects in the card have Pointers to other Region objects. If yes, update the RSet of the corresponding Region.

G1 limits the size of filled RS buffers. Threads that submit the dirty Card queue(remembered set log) above this value need to process these cards themselves, which has a performance impact.

4.Snapshot-At-The-Beginning (SATB)

SATB is a means to maintain the correctness of concurrent GC. The concurrency theory of G1 GC is based on SATB. The G1 garbage collector uses this algorithm to take snapshots of live objects during the marking phase. During the concurrent marking phase, the application thread may modify a reference to an object, such as deleting or pointing to another object, resulting in a snapshot of the living object at the end of the concurrent marking phase that is not the same as it was at the beginning. GC solves this problem by introducing a pre-write barrier: In a concurrent marking cycle, whenever there is a reference update, the application thread stores the object referenced by that object to the thread’s local SATBMarkQueue, also known as the SATB log buffer, before the reference is modified. If the SATB log buffer is full, The buffer is placed in the global SATB log buffer list, and then a SATB buffer is allocated to the thread. The concurrent marking thread periodically checks and processes the filled SATB log buffer, STW and processes all SATB log buffers during the final marking process.

For example, during concurrent marking, the business thread executes the following statement:

x.f = y
Copy the code

Change the reference to f in object X so that it points to y. Then the object pointed to by x.f may be dead or alive. According to SATB’s requirements, it needs to be marked as live. The code logic for pre-write is similar:

if (is-marking-active) {
    prev = x.f;
    if (prev != Null) {
        satb_enqueue(prev);
    }
}
Copy the code

In other words, if the reference of x.f changes during marking, the object originally pointed to by x.f should be added to the satb_enqueue and marked as live in an asynchronous manner.

5. Marking Bitmap and TAMS

Marking bitmap is a data structure used to mark living objects during Marking cycles, where each bit represents a starting address that can be assigned to an object. For example, in the figure below, addrN represents the starting address of an object. The green blocks represent live objects at the start address, while the remaining white blocks represent junk objects.

Marking bitmap is a global data structure. Addresses in each Region can be easily mapped to the bitmap. The G1 uses two bitmaps: previous bitmap and Next Bitmap. The previous bitmap holds the previous completed marker information, and the next bitmap is the bitmap created and recorded in the current concurrent marker cycle. When the current marker cycle ends, the next bitmap overwrites the previous bitmap.

Each Region maintains two Top at Mark Start (TAMS) to record the top position of the Region when the Region is marked. (Top is the dividing line between the Region with allocated objects and the Region with unallocated objects. The Region with a smaller address than TOP is the Region with allocated objects. The area larger than the top address is for unallocated objects. The two TAMS, prevTAMS and nextTAMS, represent the top position at the start of the previous tag and the top position at the start of the current tag. The positions between nextTAMS and TOP are considered to be token alive objects (implicit tokens, where objects are new objects created by the user thread after the start of the token cycle).

The figure above uses a Region as an example to help us clearly understand previous bitmap and next Bitmap.

  1. NextBitmap is the newly created empty bitmap. PrevTAMS points to the start address of Region BOTTOM and nextTAMS points to the current top.
  2. At this time, some bits have been marked in nextBitmap, and top has moved forward for a period of time. Between nextBitmap and TOP is the new object generated from the initial marking, which is the marked object by default.
  3. The C stage is the cleanup stage, which points prevBitmap to nextBitmap and nextBitmap to null, and points nextTAMS to bottom.
  4. In phase D, we see that at the start of a new concurrent markup cycle, prevBitmap stores the last markup information, while nextBitmap is still the newly created bitmap, and a new markup is opened.
  5. E, F is the repetition of B, C above.

6. Evacuation

Evacuation actually performs garbage collection. The Evacuation phase is STW and can be roughly divided into two steps: the first step is to select and Collect several regions from regions, called Collect Sets (Csets); The second step is to copy the living objects of these regions to the free regions and add the reclaimed regions to the free Region list.

Second, the process

1. New Generation waste Recycling (YGC)

When the Eden area is exhausted, garbage collection for the new generation will be started. The whole process STW, the collection of the new generation will be executed by multiple GC threads in parallel. After the collection of the new generation, the surviving objects will be copied to a new survivor area or the old era.

2. Concurrent Marking Cycle Phases

When the whole heap utilization rate has reached the threshold value, then open the concurrent marking cycles, the threshold value by the parameter – XX: InitiatingHeapOccupancyPercent specified, the default is 45%. Concurrent markup consists of the following five phases:

2.1. Initial Mark

The initial mark is to mark the object directly reachable by GC roots.

At this stage, will STW, due to the new generation garbage collection will STW, so will carry on a YGC, at this stage in the new generation garbage collection when initialize tags, so although can make the STW time longer, and increase the CPU overhead, but the initial marker than alone on a STW + a YGC STW time is shorter. In this phase, the nextBitmap is reset and nextTAMS of each Region are directed to the corresponding TOP.

2.2. Root Region Scan

Scans for objects in the Survivor region, marking the objects referenced by these objects. The Survivor zone objects are the objects generated by YGC in the initial marking stage, so they are considered to be the surviving objects and therefore used for marking roots scanning.

This phase is performed concurrently with the application and cannot be interrupted by YGC, which modifies survivor zones. If YGC has to be done during this phase, it pauses until the root zone scan is complete before garbage collection. In the following GC log, you can see that YGC starts to wait before the root region scan finishes.

50.819: [GC concurrent-root-region-scan-start] 51.408: [GC concurrent-root-region-scan-end, 0.5890230]... 350.994: [GC Pause (young) 351.093: [GC concurrent-root-region-scan-end, 0.6100090] 351.093: [GC Pause (young) 351.093: [GC concurrent-root-region-scan-end, 0.6100090] 351.093: [GC concurrent - mark - start], 0.37559600 secs]Copy the code

2.33. Concurrent Mark

Marks all reachable objects in the entire heap, executes concurrently with the user thread, and can be interrupted by YGC (because YGC STW). The concurrent tag uses the trace algorithm to find all living objects and records them in nextBitmap, because objects above nextTAMS are considered to be alive implicitly, so we only need to traverse those below nextTAMS.

This phase does deep traversal by scanning the marked live objects on nextBitmap and using a marking stack. Imagine traversing nextBitmap, using marking stack, three-color marking, etc.

When the concurrent marking thread is scanning and marking objects, it will also check the SATB log buffer list periodically, read the SATB log buffer from it, mark the objects in it, and the objects referenced by this object will be marked eventually.

2.44. Re-marking

This phase will STW and the GC thread will consume all of the SATB log buffer. If the user thread continues to run, the SATB log buffer will increase so that it cannot be consumed completely. In addition, I will also do some processing of reference(soft,weak, Phantom).

2.5. Clean up (the Cleanup)

This phase points prevBitmap to nextBitmap and prevTAMS to nextTAMS.

There are also the three most time-consuming operations at this stage:

  1. Count the live objects in regions and sort them from smallest to largest in proportion to the number of live objects (mark the most garbled regions as X and reclaim them in MixedGC).
  2. If a Region without a live object is found, it is cleared and returned to the free list.
  3. Clear the RSet of each Region. For example, if an object pointing to the current Region in a card is garbage, clear the records in the RSet. Since both YGC and Mixed GC will scan this RSet later, cleaning it will help improve the efficiency of RSet scanning in the subsequent cleaning process.

This procedure is STW when counting objects and cleaning up rsets, and is performed concurrently with the user thread when resetting regions with no live objects and returning them to the free list.

Therefore, the main purpose of this process is not to reclaim memory, only regions with no living objects are reclaimed.

3.Mixed GC

After the concurrent marking cycle is complete, G1 switches from performing YGC to performing MixedGC. MixedGC cleans up both the new generation Region and part of the old generation Region, so we can say that this period does YGC and cleans up the Region marked X at the end of the concurrent marking cycle. Like normal YGC, G1 completely empties Eden and adjusts survivor zones. Each MixedGC only cleans up a portion of the marked partitions from the old generation, so MixedGC will continue to occur until all the marked partitions have been cleaned up. After that, G1 switched from MixedGC mode to YGC mode until the condition of concurrent marking was reached again, and a new round of concurrent marking cycle was started.

In the big post, I gave a pseudo-G1 garbage collection running process, as shown in the figure below, and combined with the details in the previous section, you can understand the normal process of G1 GC.

G1 tuning

See [www.javaadu.online/?p=465] fourth, fifth…