1. Basic Concepts

1. An overview of G1

How G1 got its name

G1 gives priority to the regions that contain the most garbage and collects those first. That is why it is called Garbage First.

The characteristics of G1

  • Supports heaps of 10 GB or more

  • Object allocation and promotion are more efficient

  • Fragmentation is significantly reduced

  • Meets the -XX:MaxGCPauseMillis pause-time target as often as possible

2. Region

G1 divides the heap into roughly 2048 regions of equal size. Each region is 1 to 32 MB. The region size is chosen at JVM startup from the heap size; it is not changed while the application runs. These regions are the basic units of memory allocation and garbage collection. Eden and survivor spaces play the same role in G1 as in the other collectors, but in G1 their address spaces are not contiguous, whereas in the other collectors they are. The old generation also contains the “humongous” regions; a humongous object may span multiple contiguous regions.

-XX:G1HeapRegionSize (default: ergo) sets the size of a heap region. By default the size is derived from the initial and maximum heap size so that the heap contains roughly 2048 heap regions. The size of a heap region can vary from 1 to 32 MB, and must be a power of 2.

Ergo is short for ergonomic. For a G1 parameter it means that the JVM picks the value itself: you can read it as the “optimal” or “engineered” default.

The application always allocates ordinary objects in Eden regions first. Humongous objects are objects whose size is larger than or equal to half the size of a region.

In the figure above, gray is free memory, red is Eden regions, red marked with an “S” is survivor regions, and light blue is old generation regions (a humongous object occupies a run of contiguous regions).
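The sizing rule and the humongous threshold described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and method names are hypothetical, the heap size is taken as the average of the initial and maximum heap, and the result is rounded down to a power of two within the 1 to 32 MB range quoted in the text. The real JVM logic may differ in detail.

```java
class RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = MB;        // lower bound quoted in the text
    static final long MAX_REGION_BYTES = 32 * MB;   // upper bound quoted in the text
    static final long TARGET_REGION_COUNT = 2048;

    /** Picks a power-of-two region size so that the heap holds roughly 2048 regions. */
    static long regionSize(long initialHeapBytes, long maxHeapBytes) {
        long averageHeap = initialHeapBytes / 2 + maxHeapBytes / 2; // assumption: use the average
        long raw = Math.max(averageHeap / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        long powerOfTwo = Long.highestOneBit(raw);   // round down to a power of two
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    /** An object is humongous when it is at least half a region in size. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Number of contiguous regions a humongous object occupies. */
    static long regionsSpanned(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = regionSize(4096 * MB, 4096 * MB);
        System.out.println("region size (MB): " + region / MB);
        System.out.println("1 MB object humongous? " + isHumongous(MB, region));
        System.out.println("regions spanned by a 5 MB object: " + regionsSpanned(5 * MB, region));
    }
}
```

Under these assumptions a 4 GB heap and a 6 GB heap both end up with 2 MB regions, so any object of 1 MB or more would be treated as humongous.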

3. Evacuation

G1 reclaims space mostly by using evacuation: live objects found within selected memory areas to collect are copied into new memory areas, compacting them in the process. After an evacuation has been completed, the space previously occupied by live objects is reused for allocation by the application.

G1 copies live objects from the regions selected for collection into new regions and compacts them during the copy.

Why does G1 reduce memory fragmentation? Compare it with CMS: CMS uses a mark-sweep algorithm in the old generation, not a copying algorithm. When CMS sweeps, it only removes garbage and does not compact the old generation, which leaves many “holes” and therefore fragmented memory. G1, on the other hand, always copies live objects out of the regions it collects, so the freed regions are completely empty and the copied objects are packed together.

The word evacuation means moving people out of one place and into another.

In G1, reachable objects are evacuated: they are moved from one region to another.

A young GC evacuates the live objects of the Eden and survivor regions into new regions.
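As a rough illustration of why evacuation compacts memory, here is a toy model. Everything in it is hypothetical (the `Region` and `Obj` classes, the integer sizes, the `live` flag standing in for the result of marking), and it assumes every object fits into a single region.

```java
import java.util.ArrayList;
import java.util.List;

class EvacuationSketch {
    /** A toy object: a name, a size, and whether marking found it reachable. */
    static final class Obj {
        final String name;
        final int size;
        final boolean live;
        Obj(String name, int size, boolean live) { this.name = name; this.size = size; this.live = live; }
    }

    /** A toy region: a fixed capacity and the objects allocated in it. */
    static final class Region {
        final int capacity;
        final List<Obj> objects = new ArrayList<>();
        int used = 0;
        Region(int capacity) { this.capacity = capacity; }

        boolean tryAllocate(Obj o) {
            if (used + o.size > capacity) return false;
            objects.add(o);
            used += o.size;
            return true;
        }

        void reset() { objects.clear(); used = 0; }
    }

    /**
     * Copies the live objects of the collection set into destination regions,
     * packing them back to back, then frees each source region as a whole.
     */
    static List<Region> evacuate(List<Region> collectionSet, int regionCapacity) {
        List<Region> destinations = new ArrayList<>();
        Region current = new Region(regionCapacity);
        destinations.add(current);
        for (Region source : collectionSet) {
            for (Obj o : source.objects) {
                if (!o.live) continue;                 // garbage is simply left behind
                if (!current.tryAllocate(o)) {         // destination full: open a new one
                    current = new Region(regionCapacity);
                    destinations.add(current);
                    current.tryAllocate(o);            // assumption: object fits in an empty region
                }
            }
            source.reset();                            // the whole source region becomes free
        }
        return destinations;
    }
}
```

Because only live objects are copied and the source regions are released wholesale, no holes are left behind, which is the contrast with CMS drawn above.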

4. CSet

Which memory G1 collects is determined by the collection set (CSet). The CSet consists of all young generation regions plus, in a mixed collection, the old generation regions that contain the most garbage.

G1 meets the “maximum pause time” goal by controlling the number of regions collected at a time. But when we collect regions A, B, and C, how do we know whether objects in them are referenced from regions D and E? Do we need a full heap scan? Isn’t a full heap scan slow?

G1 has a concept called the Remembered Set (RSet). Each region has an RSet that records the references from other regions to objects in that region.

Note that a reference from an old generation object is ignored if that object was already determined to be garbage during concurrent marking.

And, finally, the live objects are moved to survivor regions, creating new if necessary. The now empty regions are freed and can be used for storing objects in again.

5. RSet

G1 has many regions, and each region is divided into cards. By default, each card is 512 bytes. When an object in a card of region A references an object in region B, the card is marked dirty and the address of the card is recorded in the RSet of the target region B. In this way, when region B is reclaimed, G1 knows which cards in other regions reference its objects. This avoids a full heap scan.
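The relationship between cards, regions, and RSets can be sketched as follows. This is a simplified model, not the JVM’s data structure: addresses are plain offsets from the heap base, the region size is assumed to be 1 MB, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CardTableSketch {
    static final int CARD_SHIFT = 9;              // 2^9 = 512 bytes per card
    static final long REGION_BYTES = 1L << 20;    // assumption: 1 MB regions

    /** Index of the card that covers a heap offset. */
    static long cardIndex(long heapOffset) { return heapOffset >>> CARD_SHIFT; }

    /** Index of the region that covers a heap offset. */
    static long regionIndex(long heapOffset) { return heapOffset / REGION_BYTES; }

    /** One RSet per target region: the cards in other regions that point into it. */
    private final Map<Long, Set<Long>> rsets = new HashMap<>();

    /** Records a reference stored at fieldOffset that points to targetOffset. */
    void recordReference(long fieldOffset, long targetOffset) {
        long from = regionIndex(fieldOffset);
        long to = regionIndex(targetOffset);
        if (from == to) return;                   // same-region references need no entry
        rsets.computeIfAbsent(to, k -> new TreeSet<>()).add(cardIndex(fieldOffset));
    }

    /** Cards to scan as extra roots when the given region is collected. */
    Set<Long> cardsPointingInto(long region) {
        return rsets.getOrDefault(region, Collections.<Long>emptySet());
    }
}
```

When a region is collected, only the cards listed in its RSet have to be scanned for incoming references, instead of the whole heap.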

When a card in region 1 is marked dirty, the reference is not written into region 2’s RSet immediately. Region 2’s RSet is a globally shared resource, so updating it immediately under concurrency could cause heavy lock contention. Instead, the dirty card is placed in the dirty card queue, which is consumed by G1’s refinement threads. How aggressively the queue is consumed depends on colored zones (green, yellow, red) defined over the length of the dirty card queue.

Another option to increase throughput is to try to decrease the amount of concurrent work. In particular, concurrent remembered set updates often require a lot of CPU resources. Increasing -XX:G1RSetUpdatingPauseTimePercent moves work from concurrent operation into the garbage collection pause. In the worst case, concurrent remembered set updates can be disabled by setting -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementGreenZone=2G -XX:G1ConcRefinementThreads=0. This mostly disables this mechanism and moves all remembered set update work into the next garbage collection pause.

In short: one way to increase throughput is to reduce the amount of concurrent work. Concurrent RSet updates often need a lot of CPU. You can increase -XX:G1RSetUpdatingPauseTimePercent so that more of the RSet updating is done inside the GC pause. In the worst case, you can turn off concurrent RSet updates so that all RSet update work moves into the next GC pause.

The passage above is quoted to show that RSet updates may run concurrently with application threads, but the RSet is also updated during the stop-the-world GC pause (as the GC log shows).

To maintain the remembered sets, during the runtime of the application, a Post-Write Barrier is issued whenever a write to a field is performed. If the resulting reference is cross-region, i.e. pointing from one region to another, a corresponding entry will appear in the Remembered Set of the target region. To reduce the overhead that the Write Barrier introduces, the process of putting the cards into the Remembered Set is asynchronous and features quite a number of optimizations. But basically it boils down to the Write Barrier putting the dirty card information into a local buffer, and a specialized GC thread picking it up and propagating the information to the remembered set of the referred region.
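The asynchronous hand-off described above can be modelled in a single thread. The sketch below is an illustration only: the names are hypothetical, heap addresses are plain offsets, regions are assumed to be 1 MB, and the refinement work that a separate GC thread would do is an ordinary method call here.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class WriteBarrierSketch {
    static final int CARD_SHIFT = 9;              // 512-byte cards
    static final long REGION_BYTES = 1L << 20;    // assumption: 1 MB regions

    final Map<Long, Long> heapFields = new HashMap<>();   // field offset -> referenced offset
    final Set<Long> dirtyCards = new HashSet<>();
    final Queue<Long> dirtyCardQueue = new ArrayDeque<>();
    final Map<Long, Set<Long>> rsets = new HashMap<>();   // target region -> cards pointing in

    /** The reference store plus the post-write barrier run by the application thread. */
    void storeReference(long fieldOffset, long targetOffset) {
        heapFields.put(fieldOffset, targetOffset);         // the actual write
        if (fieldOffset / REGION_BYTES == targetOffset / REGION_BYTES) return; // not cross-region
        long card = fieldOffset >>> CARD_SHIFT;
        if (dirtyCards.add(card)) {                        // enqueue each dirty card only once
            dirtyCardQueue.add(card);
        }
    }

    /** Work done later by a refinement thread: scan queued cards and update RSets. */
    int refine() {
        int processed = 0;
        while (!dirtyCardQueue.isEmpty()) {
            long card = dirtyCardQueue.remove();
            dirtyCards.remove(card);
            for (Map.Entry<Long, Long> field : heapFields.entrySet()) {
                long from = field.getKey();
                long to = field.getValue();
                if ((from >>> CARD_SHIFT) != card) continue;              // field not on this card
                if (from / REGION_BYTES == to / REGION_BYTES) continue;   // not cross-region
                rsets.computeIfAbsent(to / REGION_BYTES, k -> new HashSet<>()).add(card);
            }
            processed++;
        }
        return processed;
    }
}
```

The point of the design is that the application thread only pays for a cheap check and an enqueue; the more expensive RSet update happens later, off the application’s critical path.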

A note on wording: “boils down to” means “in essence comes down to”, and “propagate”, whose original meaning is to breed, here means to pass information along.

6. Three-color marking

How does G1’s concurrent marking phase work? Note that the GC marks live objects and copies only live objects. Garbage stays where it is and is usually not cleared at all, because the space it occupies is simply overwritten with new data later.

Mark objects with three colors:

  • Black: a root object, or an object that has been marked together with all of its children.

  • Gray: the object itself is marked, but its children have not been marked yet or are not all marked.

  • White: the object is not marked. An object that is still white when marking ends is garbage.

The process of tricolor marking:

After the above step, you can see that the gray object becomes black, and the white objects referenced by the gray object become gray.

After the previous step, you can see that the object that is still white is the garbage object.
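The coloring steps above can be written down as a small worklist algorithm. This is a minimal, non-concurrent sketch with a hypothetical `Node` class; real collectors encode the colors in bitmaps and mark stacks rather than in a field on each object.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Color color = Color.WHITE;                 // every object starts out white
        Node(String name) { this.name = name; }
    }

    /** Marks everything reachable from the roots; unreachable nodes stay white. */
    static void mark(List<Node> roots) {
        Deque<Node> grayStack = new ArrayDeque<>();
        for (Node root : roots) {
            if (root.color == Color.WHITE) {
                root.color = Color.GRAY;
                grayStack.push(root);
            }
        }
        while (!grayStack.isEmpty()) {
            Node node = grayStack.pop();
            for (Node child : node.children) {
                if (child.color == Color.WHITE) {  // white child of a gray object turns gray
                    child.color = Color.GRAY;
                    grayStack.push(child);
                }
            }
            node.color = Color.BLACK;              // all children visited: gray turns black
        }
    }
}
```

When `mark` returns there are no gray nodes left: every node is either black (live) or white (garbage).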

It is important to note that G1’s marking runs concurrently with the application, so it can suffer from the lost object problem (a live object is missed) or from floating garbage (objects that should be garbage are marked live).

In the next step, the GC thread is suspended and the application thread runs: it makes object A reference object C and sets object B’s reference to C to null.

In the above step, the GC thread is scheduled again. Object A is already black, which means that it and its children are considered marked, and object B’s reference to C is now null, so C stays white. But from the application thread’s point of view, object C is still referenced by object A. With this marking algorithm, object C would be reclaimed. That is obviously wrong.

7. SATB

G1 marking uses an algorithm called Snapshot-At-The-Beginning (SATB) . It takes a virtual snapshot of the heap at the time of the Initial Mark pause, when all objects that were live at the start of marking are considered live for the remainder of marking. This means that objects that become dead (unreachable) during marking are still considered live for the purpose of space-reclamation (with some exceptions). This may cause some additional memory wrongly retained compared to other collectors. However, SATB potentially provides better latency during the Remark pause. The too conservatively considered live objects during that marking will be reclaimed during the next marking.

In other words: G1 marking uses the SATB algorithm. G1 takes a logical snapshot of the heap at the initial-mark pause, and every object that was alive at the start of marking is considered alive for the rest of marking (even if it becomes unreachable later, it cannot be reclaimed in this cycle). This means some memory is retained unnecessarily. In return, the algorithm allows lower latency in the remark pause. The objects that were conservatively kept alive are reclaimed in the next marking cycle.

SATB (snapshot-at-the-beginning) means that a snapshot is taken before marking starts, and the tri-color marking runs against this snapshot. With the snapshot, object C from the tri-color example above is not lost, because the snapshot always contains the reference from object B to object C.

What if the application thread concurrently changes a reference to an object that is white in the snapshot? How does the marking thread notice? Remember the “write barrier” from the RSet section? A write barrier is used here as well. When a reference is overwritten by an assignment, the change is recorded and the recorded objects are marked in the remark phase.
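The lost object scenario with objects A, B, and C, and the way an SATB-style barrier avoids it, can be replayed in a small model. This is a hedged sketch: the `Node` class with a single reference field, the method names, and the explicit `markStep` calls that stand in for thread interleaving are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SatbSketch {
    static final class Node {
        final String name;
        Node field;        // a single reference field keeps the sketch small
        boolean marked;    // false = white, true = gray or black
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> grayQueue = new ArrayDeque<>();
    private final Deque<Node> satbBuffer = new ArrayDeque<>();
    private final boolean satbEnabled;

    SatbSketch(boolean satbEnabled) { this.satbEnabled = satbEnabled; }

    private void shade(Node n) {
        if (n != null && !n.marked) { n.marked = true; grayQueue.addLast(n); }
    }

    void initialMark(Node... roots) { for (Node r : roots) shade(r); }

    /** One unit of concurrent marking: scan a single gray object. */
    void markStep() {
        Node n = grayQueue.pollFirst();
        if (n != null) shade(n.field);
    }

    /** Reference store by the application; the barrier records the overwritten value. */
    void write(Node holder, Node newValue) {
        if (satbEnabled && holder.field != null) satbBuffer.addLast(holder.field);
        holder.field = newValue;
    }

    /** Remark: mark the recorded old values, then finish marking. */
    void remark() {
        while (!satbBuffer.isEmpty()) shade(satbBuffer.pollFirst());
        while (!grayQueue.isEmpty()) markStep();
    }

    /** Replays the A/B/C scenario and reports whether C ends up marked. */
    static boolean objectCSurvives(boolean satbEnabled) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        b.field = c;
        SatbSketch gc = new SatbSketch(satbEnabled);
        gc.initialMark(a, b);   // A and B are gray, C is white
        gc.markStep();          // A is scanned and becomes black
        gc.write(a, c);         // application: A now references C
        gc.write(b, null);      // application: B no longer references C
        gc.remark();
        return c.marked;
    }
}
```

Without the barrier, C is never marked although A still references it. With the barrier, the overwritten reference from B to C is recorded, so C is marked during remark.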

Another question: what if the application thread creates a new object that is not in the snapshot? Each region records two top-at-mark-start (TAMS) pointers, prevTAMS and nextTAMS. Objects above TAMS are newly allocated and are therefore considered implicitly marked.
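The implicit marking rule can be shown with one region and one pointer. This sketch is a simplification: it models only a single TAMS value (called `nextTams` here), uses offsets instead of addresses, and the names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class TamsSketch {
    static final class Region {
        private long top = 0;          // next free offset inside the region
        private long nextTams = 0;     // value of top when marking started
        private final Set<Long> markBitmap = new HashSet<>();

        /** Bump-pointer allocation; returns the offset of the new object. */
        long allocate(long size) {
            long address = top;
            top += size;
            return address;
        }

        /** Marking starts: remember where the allocated part of the region ended. */
        void startMarking() {
            nextTams = top;
            markBitmap.clear();
        }

        void mark(long address) { markBitmap.add(address); }

        /** Objects at or above TAMS were allocated during marking and count as live. */
        boolean isConsideredLive(long address) {
            return address >= nextTams || markBitmap.contains(address);
        }
    }
}
```

An object allocated after `startMarking` is treated as live without ever touching the bitmap, which is what “implicitly marked” means.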

8. Write barrier

Write barriers are mentioned in both the RSet and SATB sections. A write barrier runs when a reference is assigned. It is a small piece of code added by the JVM to record additional information, for example recording in the RSet of region B the card of region A that holds a reference into region B.

2. GC log analysis

1. Evacuation Pause: Fully Young

When the young generation is full, a young GC (YGC) is performed that copies the live objects of the Eden regions to survivor regions. But how large is the young generation, and what is the ratio between Eden and survivor? The official advice is: do not configure these. The size of the young generation is the key to how G1 meets its “maximum pause time” goal.

If you prefer high throughput, then relax the pause-time goal by using -XX:MaxGCPauseMillis or provide a larger heap. That is, for throughput, increase -XX:MaxGCPauseMillis or provide more heap memory.

If latency is the main requirement, then modify the pause-time target. Avoid limiting the young generation size to particular values by using options like -Xmn, -XX:NewRatio and others because the young generation size is the main means for G1 to allow it to meet the pause-time. Setting the young generation size to a single value overrides and practically disables pause-time control. That is, for latency, adjust the maximum pause time and do not configure -Xmn or -XX:NewRatio.

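There are parameters you can configure for the share of the heap used by the young generation: a minimum and a maximum (in HotSpot these are the experimental options -XX:G1NewSizePercent and -XX:G1MaxNewSizePercent).

The size of the young generation changes dynamically between these two values.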

  • User – Total CPU time that was consumed by Garbage Collector threads during this collection

  • Sys – Time spent in OS calls or waiting for system events

  • Real – Clock time for which your application was stopped. With the parallelizable activities during GC this number is ideally close to (user time + system time) divided by the number of threads used by Garbage Collector. In this particular case 8 threads were used. Note that due to some activities not being parallelizable, it always exceeds the ratio by a certain amount.

  • User time: the CPU time used by G1 threads during the collection. It is the sum of the time spent by each thread.

  • System time: the time spent in kernel mode during the GC or waiting for system events.

  • Wall time: the time that actually elapsed, approximately equal to (user + sys) / gcThreadsCount. It is usually a little longer than this formula suggests, because not all operations run in parallel.
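The relationship between the three times is simple arithmetic and can be checked with a helper like the one below. The class name, the method names, and the sample numbers in `main` are hypothetical and are not taken from a real log.

```java
class GcTimes {
    /** Ideal wall-clock time if all GC work were perfectly parallel. */
    static double idealRealSeconds(double userSeconds, double sysSeconds, int gcThreads) {
        if (gcThreads <= 0) {
            throw new IllegalArgumentException("gcThreads must be positive");
        }
        return (userSeconds + sysSeconds) / gcThreads;
    }

    /** Ratio of ideal to observed wall time; values well below 1 hint at serial work. */
    static double parallelEfficiency(double userSeconds, double sysSeconds,
                                     double realSeconds, int gcThreads) {
        return idealRealSeconds(userSeconds, sysSeconds, gcThreads) / realSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 8 GC threads, 0.08 s user, 0.00 s sys, 0.0125 s real.
        System.out.println("ideal real time: " + idealRealSeconds(0.08, 0.0, 8));
        System.out.println("efficiency: " + parallelEfficiency(0.08, 0.0, 0.0125, 8));
    }
}
```

If the observed real time is far above the ideal value, a large part of the pause was not parallel, or the GC threads were competing for CPU.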

The GC log above describes the parallel collection of the young generation. Key points:

  • The time spent collecting in parallel (this is wall time, the time that actually elapsed) and the number of threads collecting in parallel (this is also important; it should not be too large).

  • GC Worker Start: the times at which the parallel GC workers started, as timestamps relative to JVM startup that match the time at the beginning of the pause (the 0.134 in 0.134: [GC pause (G1 Evacuation Pause) (young), 0.0144119 secs]). It shows how much the start times of the G1 worker threads differ. A large difference between Min and Max deserves attention: there may be too many GC worker threads, or the machine may be overloaded, with other processes stealing CPU time from the GC.

  • GC Worker End: read it the same way as GC Worker Start.

2. G1 concurrent marking

When the occupancy of the old generation exceeds the InitiatingHeapOccupancyPercent (IHOP) threshold, concurrent marking starts.

By default, G1 enables adaptive IHOP (-XX:+G1UseAdaptiveIHOP): it calculates the best IHOP value by observing how long marking takes and how fast the old generation is allocated during marking. As you can see, the IHOP calculation relies on previous marking cycles. But if no marking has happened yet, how is the first value chosen? Until there are enough observations, -XX:InitiatingHeapOccupancyPercent is used as the initial value. Adaptive IHOP then aims for the first mixed collection to start when old generation occupancy reaches the maximum old generation size minus -XX:G1HeapReservePercent.

When adaptive IHOP is turned off (-XX:-G1UseAdaptiveIHOP), -XX:InitiatingHeapOccupancyPercent is always used as the threshold (the default is 45%).
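With adaptive IHOP turned off, the trigger is a fixed percentage check. The sketch below assumes that the percentage is measured against the total heap capacity; the class and method names are hypothetical, and the adaptive prediction is deliberately not modelled.

```java
class IhopSketch {
    static final int DEFAULT_IHOP_PERCENT = 45;   // default quoted in the text

    /** True when old generation occupancy exceeds the IHOP threshold. */
    static boolean shouldStartConcurrentMarking(long oldGenUsedBytes,
                                                long heapCapacityBytes,
                                                int ihopPercent) {
        if (heapCapacityBytes <= 0 || ihopPercent < 0 || ihopPercent > 100) {
            throw new IllegalArgumentException("invalid heap capacity or percentage");
        }
        long thresholdBytes = heapCapacityBytes * ihopPercent / 100;
        return oldGenUsedBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical occupancy values for a 4096 MB heap.
        System.out.println(shouldStartConcurrentMarking(1800 * mb, 4096 * mb, DEFAULT_IHOP_PERCENT));
        System.out.println(shouldStartConcurrentMarking(1900 * mb, 4096 * mb, DEFAULT_IHOP_PERCENT));
    }
}
```

Lowering the percentage makes marking start earlier, which leaves more headroom before the heap fills up at the cost of more frequent marking cycles.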

The entire process of concurrent marking:

  • Initial mark: marks all objects directly reachable from the GC roots. In G1, the initial mark is piggy-backed on a young GC, that is, it is carried out as part of the young GC pause.

  • Root region scan.

  • Concurrent marking.

  • Remark: a stop-the-world phase.

  • Cleanup: partly stop-the-world. In this phase, completely free regions are reclaimed and the proportion of live objects in each region is calculated, so that the regions can be sorted in preparation for garbage-first collection.

3. Evacuation Pause: Mixed

If the cleanup phase of concurrent marking already frees all the old regions, no mixed phase is needed. The mixed phase does not start immediately after concurrent marking. In general, there are several young GCs between the end of concurrent marking and the first mixed GC. A mixed GC collects not only the young generation but also part of the old generation.

Based on the preceding concurrent marking, the GC calculates the proportion of garbage in each old region and puts the old regions with the most garbage into the CSet. An old region is only eligible for the CSet if its proportion of live objects does not exceed -XX:G1MixedGCLiveThresholdPercent=85 (old generation regions with higher live object occupancy than this percentage aren’t collected in this space-reclamation phase).

A mixed GC selects old regions by their proportion of garbage. But how many old regions are collected in one mixed GC? That is controlled by -XX:G1OldCSetRegionThresholdPercent=10 (sets an upper limit on the number of old regions to be collected during a mixed garbage collection cycle; the default is 10 percent of the Java heap).

The copying algorithm of a mixed GC copies live objects to new regions. If the proportion of live objects in an old region is high, the copy is expensive and reclaims little, so such regions can be skipped. This is what -XX:G1HeapWastePercent=10 is for (sets the percentage of heap that you are willing to waste; the Java HotSpot VM does not initiate the mixed garbage collection cycle when the reclaimable percentage is less than the heap waste percentage; the default is 10 percent). That is, when less than 10% of the heap is reclaimable garbage, no mixed GC is started.
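How the three parameters interact can be sketched as a selection function. This is an approximation for illustration, not the JVM’s algorithm: it assumes equal-sized regions, uses the live percentage per region as the only input, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MixedGcSketch {
    static final class OldRegion {
        final String name;
        final int livePercent;   // share of the region occupied by live objects
        OldRegion(String name, int livePercent) { this.name = name; this.livePercent = livePercent; }
    }

    /** Returns the old regions one mixed GC would collect, most garbage first. */
    static List<OldRegion> selectOldRegions(List<OldRegion> oldRegions, int totalHeapRegions,
                                            int liveThresholdPercent,          // G1MixedGCLiveThresholdPercent
                                            int oldCSetRegionThresholdPercent, // G1OldCSetRegionThresholdPercent
                                            int heapWastePercent) {            // G1HeapWastePercent
        List<OldRegion> candidates = new ArrayList<>();
        long reclaimable = 0;   // sum of garbage percentages over all candidate regions
        for (OldRegion r : oldRegions) {
            if (r.livePercent > liveThresholdPercent) continue;  // too expensive to copy
            candidates.add(r);
            reclaimable += 100 - r.livePercent;
        }
        // Garbage in candidates as a percentage of the whole heap (equal-sized regions assumed).
        double reclaimablePercentOfHeap = (double) reclaimable / totalHeapRegions;
        if (reclaimablePercentOfHeap < heapWastePercent) {
            return new ArrayList<>();                            // not worth a mixed GC
        }
        candidates.sort(Comparator.comparingInt((OldRegion r) -> r.livePercent));
        int limit = totalHeapRegions * oldCSetRegionThresholdPercent / 100;
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }
}
```

In this model a region that is 90% live is never selected with a threshold of 85, and no old regions are selected at all when the total reclaimable garbage falls below the waste percentage.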

In the case of a cross-region reference, the reference needs to be recorded in the RSet of the region that the referenced object belongs to. This process is asynchronous.

3. In practice

In one production service, the GC log contained many “to-space exhausted” entries.

This happens when a GC wants to copy live objects but finds that there is not enough to-space to hold them, so a full GC (FGC) is needed. The answer is to reserve space with G1ReservePercent, so that the full GC does not happen.

-XX:G1ReservePercent=10

Sets the percentage of reserve memory to keep free so as to reduce the risk of to-space overflows. The default is 10 percent. When you increase or decrease the percentage, make sure to adjust the total Java heap by the same amount.

Original arguments:

-Xmx4096m -Xms4096m -verbose:gc -Xloggc:/home/logs/gc-19164.log
-XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/logs
-XX:ParallelGCThreads=4 -XX:ConcGCThreads=1
-XX:G1ReservePercent=10
-XX:-OmitStackTraceInFastThrow
-XX:+UseCGroupMemoryLimitForHeap

Optimized arguments:

-Xmx6144m -Xms6144m -verbose:gc -Xloggc:/home/logs/gc-19164.log
-XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/logs
-XX:ParallelGCThreads=4 -XX:ConcGCThreads=2
-XX:G1ReservePercent=20
-XX:-OmitStackTraceInFastThrow
-XX:+UseCGroupMemoryLimitForHeap
-XX:InitiatingHeapOccupancyPercent=40
-XX:MaxGCPauseMillis=40

The data for August 5 was collected after the optimization, and the data for July 31 before it.

The “total GC time” and “maximum GC time” show that the optimization had a significant effect.

Reference:

  • Oracle G1 official documentation

  • Garbage First Garbage Collector Tuning

  • G1: One Garbage Collector To Rule Them All

  • Tips for Tuning the Garbage First Garbage Collector

  • A better video explanation

  • G1 Log Analysis

  • Analysis of missing mark problem of three-color marking algorithm in concurrent condition
