Overview

This article introduces the basic principles of garbage collection (GC) and methods for GC tuning, based on the HotSpot JVM in JDK 1.8. After reading it, you should understand how to troubleshoot GC problems in production systems.

Reading time is about 30 minutes. The article covers:

  • GC fundamentals: tuning goals, GC event classification, JVM memory allocation strategy, GC log analysis, etc.
  • CMS principles and tuning
  • G1 principles and tuning
  • GC troubleshooting methods and solutions

GC Fundamentals

1 GC tuning goals

In most cases, GC tuning of a Java program focuses on two goals: responsiveness and throughput.

  • Responsiveness: how quickly a program or system responds to a request, for example the response time of a user's order query. Systems that demand fast responses cannot tolerate long pauses, so tuning focuses on keeping every pause short.

  • Throughput: the maximum amount of work the application completes in a given period, for example the number of tasks a batch system finishes in an hour. When optimizing for throughput, longer GC pauses are acceptable, because such applications care about finishing the work as early as possible rather than about responding quickly to individual requests.

In GC tuning, the application pauses caused by GC affect responsiveness, and the CPU time consumed by GC threads affects throughput.

2 Generational collection

Modern garbage collectors almost all use generational collection. The main idea is to divide the Java heap logically into two parts, the young generation and the old generation, and to apply different collection strategies to objects with different lifetimes and sizes.

  • Young Generation

The young generation is also called the new generation. Most objects are created there, and most of them are short-lived. Only a few objects survive each young-generation collection (also called Young GC, Minor GC, or YGC), so a copying algorithm can finish the collection at the cost of copying only those few survivors.

The young generation is divided into three areas: one Eden area and two Survivor areas (S0 and S1, also called From Survivor and To Survivor). Most objects are allocated in Eden. When Eden fills up, a Young GC copies the surviving objects into one of the Survivor areas. At the next Young GC, the live objects in Eden and in that Survivor area that do not yet qualify for promotion are copied into the other Survivor area. Each time an object is copied, its age increases by 1; once it reaches the promotion age threshold, it is moved to the old generation.

  • Old Generation

Objects that survive N collections in the young generation are placed in the old generation, where the survival rate is high. Old-generation collection usually uses a mark-compact algorithm.

3 GC event classification

Depending on the area collected, GC events are usually classified as Young GC, Old GC, Full GC, and Mixed GC.

(1) Young GC

A Young GC (or Minor GC) is triggered when the JVM cannot allocate space for a new object in the young generation, typically because Eden is full. The higher the allocation rate, the more frequent the Young GCs.

Every Young GC is a stop-the-world pause that suspends all application threads, but the pause is almost negligible compared with the pause caused by an old-generation GC.

(2) Old GC, Full GC, Mixed GC

  • Old GC cleans up only the old generation. Only the concurrent collection of CMS works in this mode.

  • Full GC cleans up the whole heap, including the young generation, the old generation, and the metaspace.

  • Mixed GC cleans up the entire young generation plus part of the old generation. Only G1 has this mode.

4 GC log analysis

The GC log is an important tool that records the time and result of every GC. By analyzing it, you can tune the heap and GC settings or improve the application's object allocation pattern. The following JVM options enable GC logging:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps  -XX:+PrintGCTimeStamps

The common meanings of the Young GC and Full GC logs are as follows:
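
As a rough illustration of how a Young GC line breaks down, the sketch below parses one line in the JDK 8 ParNew layout. The sample line and its numbers are invented for illustration, and the class and method names are hypothetical. In this layout the first "before->after(capacity)" triple describes the young generation and the second describes the whole heap, so the difference between the two reductions approximates the amount promoted to the old generation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls the "before->after(capacity)" figures out of one JDK 8 style GC log line. */
final class GcLogLineParser {
    // Matches an occupancy triple such as "279616K->34944K(314560K)".
    private static final Pattern OCCUPANCY =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Hypothetical sample line; the numbers are invented for illustration.
    static final String SAMPLE =
            "1.234: [GC (Allocation Failure) 1.234: [ParNew: 279616K->34944K(314560K), 0.0300000 secs] "
            + "279616K->50000K(1013632K), 0.0301000 secs] [Times: user=0.10 sys=0.01, real=0.03 secs]";

    /** Returns {beforeKb, afterKb, capacityKb} for the n-th triple on the line (0-based). */
    static long[] occupancy(String line, int n) {
        Matcher m = OCCUPANCY.matcher(line);
        for (int i = 0; m.find(); i++) {
            if (i == n) {
                return new long[] {
                    Long.parseLong(m.group(1)),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3))
                };
            }
        }
        throw new IllegalArgumentException("line has no occupancy triple #" + n);
    }

    /** KB moved to the old generation: what left the young generation but stayed in the heap. */
    static long promotedKb(String line) {
        long[] young = occupancy(line, 0); // first triple: young generation
        long[] heap = occupancy(line, 1);  // second triple: whole heap
        return (young[0] - young[1]) - (heap[0] - heap[1]);
    }

    public static void main(String[] args) {
        long[] young = occupancy(SAMPLE, 0);
        System.out.println("young: " + young[0] + "K -> " + young[1] + "K of " + young[2] + "K");
        System.out.println("promoted: " + promotedKb(SAMPLE) + "K");
    }
}
```

A promoted volume that stays high across many Young GC lines is a hint that Survivor space or the tenuring threshold deserves a look.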

Two free tools for graphical GC log analysis are recommended:

  • GCViewer: download the jar and run it directly
  • GCeasy: a web tool; upload the GC log and analyze it online

5 Memory allocation policy

Java manages memory automatically, which comes down to solving two problems: allocating memory for objects and reclaiming it. The most common allocation strategies are:

  • Objects are allocated in Eden first. In most cases, objects are allocated in the Eden area of the young generation. When Eden does not have enough free space, the virtual machine starts a Young GC.

  • Large objects go directly into the old generation. The JVM provides an object size threshold parameter (-XX:PretenureSizeThreshold; the default of 0 means that objects of any size are allocated in Eden first). Objects larger than the threshold are allocated directly in the old generation, which avoids copying large amounts of memory between Eden and the two Survivor areas.

  • Long-lived objects enter the old generation. Each time an object survives a garbage collection, its age increases by 1. Objects whose age exceeds the threshold (-XX:MaxTenuringThreshold, default 15) are promoted to the old generation.

  • Space allocation guarantee. Before a Young GC, the JVM estimates whether the old generation can hold the objects that the Young GC will promote, to decide whether a GC of the old generation must be triggered first. The estimate is based on the space allocation guarantee strategy:

ContinueSize: the largest contiguous free space in the old generation

If the Young GC succeeds (the promoted objects fit into the old generation), the guarantee has held and no Full GC is needed, which improves performance. If it fails, a “promotion failed” error appears, the guarantee has failed, and a Full GC is required.
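
The decision can be sketched as follows. This is a minimal sketch of the commonly described form of the check (JDK 6 Update 24 and later), not HotSpot's actual code; the class, method, and parameter names are hypothetical. The old generation's largest contiguous free space is compared first with the total size of the young-generation objects (the worst case, in which everything survives) and then with the average size of past promotions.

```java
/** Simplified model of the space allocation guarantee check before a Young GC. */
final class AllocationGuarantee {
    enum Decision { YOUNG_GC, FULL_GC }

    /**
     * @param continueSize    largest contiguous free space in the old generation (bytes)
     * @param youngUsed       total size of all objects currently in the young generation (bytes)
     * @param avgPromotedSize average size promoted by previous Young GCs (bytes)
     */
    static Decision decide(long continueSize, long youngUsed, long avgPromotedSize) {
        if (continueSize > youngUsed) {
            return Decision.YOUNG_GC; // safe: even if everything survives, it fits
        }
        if (continueSize > avgPromotedSize) {
            return Decision.YOUNG_GC; // risky: fits only if this GC promotes an average amount
        }
        return Decision.FULL_GC;      // old generation must be collected first
    }

    public static void main(String[] args) {
        System.out.println(decide(500, 400, 50)); // worst case fits
        System.out.println(decide(100, 400, 50)); // relies on the promotion average
        System.out.println(decide(40, 400, 50));  // not even the average fits
    }
}
```

The second branch is the risky one: if this particular Young GC promotes more than the historical average, the result is the promotion failure described above.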

  • Dynamic age determination. An object may be promoted before its age reaches the threshold set by MaxTenuringThreshold. After a Young GC, HotSpot adds up the sizes of the surviving objects age by age, starting from the youngest. If the running total exceeds half of one Survivor area (S0 or S1) at some age, the Survivor area cannot comfortably hold all the survivors, and objects of that age or older enter the old generation directly without waiting for MaxTenuringThreshold.

In addition, if S0 or S1 cannot hold the survivors of a Young GC, surviving objects that do not yet qualify for promotion also go straight into the old generation. This should be avoided as far as possible.
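
Dynamic age determination can be sketched as a running sum over an age table. This is a simplified model under stated assumptions, not HotSpot's code: the class and parameter names are hypothetical, and the 50% target corresponds to the default value of -XX:TargetSurvivorRatio.

```java
/** Simplified model of how the effective tenuring threshold is derived after a Young GC. */
final class TenuringThreshold {
    /**
     * @param bytesByAge           bytesByAge[a] = size of survivors with age a (index 0 unused)
     * @param survivorBytes        capacity of one Survivor area
     * @param targetSurvivorRatio  desired Survivor occupancy in percent (50 by default)
     * @param maxTenuringThreshold upper bound set by -XX:MaxTenuringThreshold
     */
    static int compute(long[] bytesByAge, long survivorBytes,
                       int targetSurvivorRatio, int maxTenuringThreshold) {
        long desired = survivorBytes * targetSurvivorRatio / 100;
        long total = 0;
        for (int age = 1; age < bytesByAge.length; age++) {
            total += bytesByAge[age];
            if (total > desired) {
                // Objects of this age or older will be promoted at the next Young GC.
                return Math.min(age, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold; // survivors fit comfortably, keep the configured limit
    }

    public static void main(String[] args) {
        // Survivor area of 10 units; ages 1..4 hold 2, 2, 2 and 1 units.
        long[] bytesByAge = {0, 2, 2, 2, 1};
        System.out.println(compute(bytesByAge, 10, 50, 15));
    }
}
```

In the example the running total passes the 5-unit target at age 3, so the effective threshold drops from 15 to 3.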

CMS principles and tuning

1 Terminology

Reachability analysis: the algorithm used to decide whether an object is alive. It starts from a set of objects called GC Roots (common GC Roots include the system class loader, objects referenced from thread stacks, and active threads) and follows object references outward. Each path it walks is called a reference chain. An object that is not connected to any GC Root by a reference chain is proven dead.
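
Reachability analysis is a graph traversal. The sketch below models objects as strings and references as an adjacency map, which is an assumption made purely for illustration; a real collector walks raw object headers and fields.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Marks everything reachable from the GC Roots; whatever is left unmarked is garbage. */
final class Reachability {
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (marked.add(obj)) { // first visit: follow its outgoing references
                for (String next : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                    work.push(next);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));
        refs.put("B", Arrays.asList("C"));
        refs.put("D", Arrays.asList("E")); // D and E reference each other
        refs.put("E", Arrays.asList("D")); // but no root reaches them
        System.out.println(reachable(refs, Arrays.asList("A")));
    }
}
```

D and E keep each other referenced yet are still collected, which is the advantage of reachability analysis over reference counting.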

Stop The World: the GC analyzes object reference relationships. To keep the analysis accurate, all Java threads must be stopped so that references do not change while they are being analyzed. This pause is called Stop The World (STW).

Safepoint: a special position in the code where the state of the virtual machine is safe and a thread can be paused if a GC is required. HotSpot uses voluntary suspension: running threads poll a flag and suspend themselves at the next safepoint when a pause has been requested.

2 CMS overview

CMS (Concurrent Mark Sweep) is a concurrent garbage collector that uses the mark-sweep algorithm and collects only the old generation. While CMS works, GC threads run concurrently with application threads as much as possible to reduce STW time.

Enable the CMS garbage collector with the following command-line arguments:

-XX:+UseConcMarkSweepGC

Note that CMS GC below refers to a collection of the old generation, whereas Full GC refers to a collection of the whole heap, including the young generation, the old generation, and the metaspace. The two are distinct.

3 Young-generation collection

The young-generation collectors that can be paired with CMS are Serial and ParNew. Both use a mark-copy algorithm and both trigger an STW event that stops all application threads. The difference is that Serial is single-threaded while ParNew is multi-threaded.

4 Old-generation collection

A CMS GC cycle is divided into seven phases so that as little work as possible happens during STW pauses, with the goal of the shortest possible pause time.

  • Phase 1: Initial Mark

This phase triggers the first STW event. Its goal is to mark the old-generation objects that are directly reachable, that is, objects referenced directly from GC Roots and objects referenced by live objects in the young generation.

This phase can run multi-threaded (single-threaded in JDK 7 and earlier, parallel from JDK 8; controlled by the parameter CMSParallelInitialMarkEnabled).

  • Phase 2: Concurrent Mark

The GC threads run concurrently with the application threads. Starting from the objects marked in phase 1, they traverse the object graph and recursively mark every object reachable from them.

  • Phase 3: Concurrent Preclean

The GC threads again run concurrently with the application threads. Because phase 2 ran alongside the application, some references may have changed in the meantime. Through card marking, the old generation is divided in advance into equally sized cards; whenever a reference changes, the JVM marks the affected card as dirty. This phase finds the dirty cards, re-marks from the references in them, and clears the dirty flag.

  • Phase 4: Concurrent Abortable Preclean

This phase also leaves the application threads running. It tries to do as much work as possible before the STW Final Remark in order to shorten that pause. It loops over two tasks: marking reachable old-generation objects and scanning the objects in dirty cards. The loop ends when any of the following holds: (1) the configured number of iterations is reached; (2) the loop's time limit is reached; (3) young-generation memory usage reaches its threshold.

  • Phase 5: Final Remark

This is the second and last STW phase of the cycle. Its goal is to finish marking all live objects in the old generation. It performs the following steps: (1) traverse the young-generation objects and re-mark; (2) re-mark from GC Roots; (3) traverse the dirty cards of the old generation and re-mark.

  • Phase 6: Concurrent Sweep

This phase runs concurrently with the application, without an STW pause, and reclaims garbage objects based on the marking results.

  • Phase 7: Concurrent Reset

This phase runs concurrently with the application and resets CMS's internal data structures in preparation for the next GC cycle.
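
The card marking that the two preclean phases rely on can be sketched as a small table indexed by heap offset. This is a simplified model: the class and method names are hypothetical, a boolean array stands in for HotSpot's byte-per-card table, and the 512-byte card size matches HotSpot's default.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy card table: one dirty flag per 512-byte card of the old generation. */
final class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards
    private final boolean[] dirty;

    CardTable(long heapBytes) {
        dirty = new boolean[(int) ((heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: runs whenever a reference field at this heap offset is updated. */
    void onReferenceWrite(long heapOffset) {
        dirty[(int) (heapOffset >>> CARD_SHIFT)] = true;
    }

    /** Preclean step: returns the dirty card indices for rescanning and clears their flags. */
    List<Integer> drainDirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                result.add(i);
                dirty[i] = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CardTable table = new CardTable(4096); // 8 cards
        table.onReferenceWrite(0);             // card 0
        table.onReferenceWrite(600);           // card 1
        table.onReferenceWrite(700);           // card 1 again
        System.out.println(table.drainDirtyCards());
    }
}
```

Rescanning only dirty cards is what lets the preclean phases catch up with the application without walking the whole old generation again.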

5 Common CMS problems

Long pauses in the Final Remark phase

About 80% of CMS's GC pause time is spent in Final Remark. When this pause is too long, the usual cause is stale references from the young generation into the old generation: the preceding Concurrent Abortable Preclean phase ran out of time before its loop finished, so no Young GC was triggered to clear those references.

Solution: add the parameter -XX:+CMSScavengeBeforeRemark. Running a Young GC right before Final Remark reduces the stale references from the young generation into the old generation and shortens the Final Remark pause. The drawback is that if a Young GC already ran during the previous phase (Concurrent Abortable Preclean), another one is still triggered.

Concurrent mode failure and promotion failed

Concurrent mode failure: a young-generation collection occurs while CMS is collecting and the old generation does not have enough room for the promoted objects. CMS then degrades to a single-threaded Full GC: all application threads are paused and all dead objects in the old generation are reclaimed.

Promotion failed: during a young-generation collection the old generation has enough free space in total to hold the promoted objects, but the free space is fragmented, so promotion fails and a single-threaded Full GC with compaction is triggered.

Both failures lead to long pauses. Common solutions:

  • Lower the threshold that triggers CMS GC, i.e. the value of -XX:CMSInitiatingOccupancyFraction, so that CMS GC starts earlier and enough free space remains
  • Increase the number of CMS threads with -XX:ConcGCThreads
  • Enlarge the old generation
  • Keep objects in the young generation until they are reclaimed, so that as few as possible enter the old generation

Memory fragmentation

CMS normally collects with the mark-sweep algorithm, which does not compact, so memory fragmentation keeps growing until compaction becomes necessary. Compaction is commonly triggered in these situations:

  • A Young GC in the young generation fails with promotion failed
  • The program calls System.gc() explicitly

The parameter CMSFullGCsBeforeCompaction sets how many Full GCs run before one of them compacts. The default of 0 means every Full GC compacts. The compaction uses the single-threaded Serial Old algorithm mentioned above, whose STW pause is very long, so compaction should happen as rarely as possible.

G1 principle and tuning

1 Introduction to G1

G1 (Garbage-First) is a server-oriented garbage collector that collects both the young and the old generation. It mainly targets machines with multi-core processors and large memory. Its main design goal is predictable, configurable STW pause times.

2 G1 Heap space division

  • Region

To collect a large heap with short pauses, G1 divides it into many regions of equal size. Each region can be an Eden region, a Survivor region, or an Old region, but belongs to only one generation at a time.

Logically, all Eden and Survivor regions together form the young generation, and all Old regions together form the old generation. The sizes of the two generations are controlled automatically by G1 and change continuously.

  • Humongous object

An object larger than half a region is considered a humongous object and is allocated directly in humongous regions in the old generation. These are a set of contiguous regions: each region holds at most one humongous object, and one humongous object may span several regions.
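
The humongous rule is simple arithmetic, sketched below with hypothetical class and method names. The example sizes are made up for illustration.

```java
/** Arithmetic behind G1's humongous-object rule. */
final class Humongous {
    /** An object is humongous when it is larger than half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous regions a humongous object occupies (rounded up). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long object = 600 * 1024; // a 600 KB array
        System.out.println(isHumongous(object, 1 * mb)); // humongous with 1 MB regions
        System.out.println(isHumongous(object, 2 * mb)); // ordinary with 2 MB regions
        System.out.println(regionsNeeded(5 * mb / 2, 1 * mb)); // 2.5 MB spans 3 regions
    }
}
```

The same 600 KB object is humongous with 1 MB regions but ordinary with 2 MB regions, which is why a larger region size can reduce humongous allocations.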

G1 divides the heap into regions for two reasons:

  • G1 does not have to process the whole heap every time; it processes only some regions per collection, which makes GC of very large heaps feasible
  • G1 computes the reclamation value of each region, both the time needed and the space gained, and reclaims as much memory as possible within the time limit. This keeps GC pauses within the configured target and is the origin of the name Garbage-First

3 G1 working modes

G1 provides two GC modes, Young GC and Mixed GC, which cover the young and old generations. Both are Stop The World.

  • Young GC: when young-generation space runs out, G1 triggers a Young GC to reclaim it. Young GC mainly collects Eden and is triggered when Eden is exhausted. Following generational collection and the copying algorithm, every Young GC selects all young-generation regions. It also calculates how much Eden and Survivor space the next Young GC will need and adjusts the number of young-generation regions dynamically to control Young GC cost.

  • Mixed GC: when old-generation occupancy reaches a threshold, a Mixed GC is triggered. It selects all young-generation regions plus a number of old-generation regions with a high reclamation payoff, chosen from the statistics of global concurrent marking (described below). Within the cost target set by the user, G1 controls Mixed GC cost through which old regions, and how many, it selects.
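
The selection of old regions for a Mixed GC can be pictured as a greedy choice under a time budget. This is a deliberately simplified sketch, not G1's actual policy: the Region type, its fields, and the per-region time predictions are hypothetical, and real G1 applies additional thresholds and a statistical prediction model.

```java
import java.util.ArrayList;
import java.util.List;

/** Greedy "garbage first" choice of old regions within a pause budget. */
final class CollectionSetChooser {
    static final class Region {
        final String name;
        final long garbageBytes;  // space that collecting this region would free
        final double predictedMs; // predicted cost of collecting this region

        Region(String name, long garbageBytes, double predictedMs) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.predictedMs = predictedMs;
        }

        double efficiency() {
            return garbageBytes / predictedMs; // bytes freed per millisecond
        }
    }

    static List<Region> choose(List<Region> oldCandidates, double budgetMs) {
        List<Region> sorted = new ArrayList<>(oldCandidates);
        // Highest payoff first.
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<Region> chosen = new ArrayList<>();
        double used = 0;
        for (Region r : sorted) {
            if (used + r.predictedMs > budgetMs) {
                break; // the next-best region would exceed the pause target
            }
            chosen.add(r);
            used += r.predictedMs;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> candidates = new ArrayList<>();
        candidates.add(new Region("A", 800, 10)); // 80 bytes/ms
        candidates.add(new Region("B", 300, 2));  // 150 bytes/ms
        candidates.add(new Region("C", 500, 20)); // 25 bytes/ms
        for (Region r : choose(candidates, 15)) {
            System.out.println(r.name);
        }
    }
}
```

With a 15 ms budget the sketch picks B and then A, and leaves C for a later Mixed GC.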

4 Global concurrent marking

Global concurrent marking mainly serves Mixed GC by identifying the regions with a high reclamation payoff. It has five phases:

  • Phase 1: Initial Mark. Pauses all application threads (STW) and marks the objects directly reachable from GC Roots (objects on native stacks, global objects, JNI objects). When the trigger condition is met, G1 does not start a marking cycle immediately; it waits for the next young-generation collection and piggybacks the initial mark on that collection's STW pause.

  • Phase 2: Root Region Scan. When the initial-mark pause ends, the young-generation collection has finished copying surviving objects to Survivor regions, and the application threads resume. To keep the marking algorithm correct, G1 must now scan the objects just copied into the Survivor regions, find those that hold references to old-generation objects, and mark them as roots. This process is called root region scanning, and the Survivor regions being scanned are called root regions. Root region scanning must finish before the next young-generation collection starts (concurrent marking may later be interrupted by several young-generation collections), because every GC produces a new set of surviving objects.

  • Phase 3: Concurrent Marking. Runs concurrently with the application threads and marks live objects in each region of the heap. This step can be interrupted by a Young GC. Marking must finish before the heap fills up. If concurrent marking takes long, several young-generation collections may occur while it runs.

  • Phase 4: Remark. As in CMS, all application threads are paused briefly (STW) to complete marking: objects changed during concurrent marking and any live objects not yet marked are marked, and the live-data accounting is completed.

  • Phase 5: Cleanup. Prepares for the upcoming Mixed GC phase and performs the bookkeeping needed for the next marking cycle:

    • Updates the RSet (remembered set, a hash map) of each region. An RSet records which old-generation objects point into its region: the key identifies a region that holds references into this region, and the value identifies the cards in that region where the references live. With RSets, G1 can determine which objects in a region are alive without scanning the whole heap.
    • Reclaims regions that contain no live objects
    • Computes the set of old-generation regions with a high reclamation payoff (based on free space and the pause goal)
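
The RSet described in the Cleanup phase can be modelled as a map from referencing region to card indices. This is a sketch with hypothetical names and plain integer indices; G1's real remembered sets use more compact, tiered data structures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy remembered set for one region: who points into me, and from which cards. */
final class RememberedSet {
    // key: index of a region holding references into this region
    // value: indices of the cards in that region where those references live
    private final Map<Integer, Set<Integer>> cardsByRegion = new HashMap<>();

    /** Called when a reference into this region is written at (fromRegion, fromCard). */
    void recordReference(int fromRegion, int fromCard) {
        cardsByRegion.computeIfAbsent(fromRegion, k -> new TreeSet<>()).add(fromCard);
    }

    /** Cards that must be scanned in fromRegion when this region is collected. */
    Set<Integer> cardsToScan(int fromRegion) {
        return cardsByRegion.getOrDefault(fromRegion, Collections.<Integer>emptySet());
    }

    int referencingRegionCount() {
        return cardsByRegion.size();
    }

    public static void main(String[] args) {
        RememberedSet rset = new RememberedSet();
        rset.recordReference(3, 7);
        rset.recordReference(3, 7); // duplicate writes collapse into one entry
        rset.recordReference(3, 9);
        rset.recordReference(5, 1);
        System.out.println(rset.cardsToScan(3));
        System.out.println(rset.referencingRegionCount());
    }
}
```

When the region is collected, only the recorded cards need scanning to find incoming references, instead of the whole heap.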

5 G1 tuning points

Full GC

G1's normal workflow contains no Full GC; one occurs only when garbage collection cannot keep up (or when it is triggered explicitly). G1's Full GC is a single-threaded Serial Old GC, which causes a very long STW pause, so avoiding it is the main tuning goal. The common causes are:

  • The program calls System.gc() explicitly
  • The old generation fills up during global concurrent marking (concurrent mode failure)
  • The old generation fills up during a Mixed GC (promotion failed)
  • During a Young GC, neither the Survivor space nor the old generation has enough room for the surviving objects

As with CMS, common solutions are:

  • Increase -XX:ConcGCThreads=n to add concurrent marking threads, or increase the number of parallel threads used during STW with -XX:ParallelGCThreads=n
  • Lower -XX:InitiatingHeapOccupancyPercent so that the marking cycle starts earlier
  • Increase the reserved memory with -XX:G1ReservePercent=n. The default of 10 reserves 10% of the heap; when Survivor regions lack room for newly promoted objects, G1 tries to use the reserve

Humongous object allocation

Every region used for humongous objects holds a single humongous object, and the rest of its space stays unused, which fragments the heap. When G1 cannot find suitable space for a humongous object, it starts a serial Full GC to free space. Raising the region size with -XX:G1HeapRegionSize turns many humongous objects back into ordinary objects that are allocated the normal way.

Do not set the size of the young generation

G1 adjusts the logical young generation dynamically in order to meet the pause-time target. Setting its size explicitly overrides this and disables pause-time control.

Average response time setting

Use the application's average response time as a reference when setting MaxGCPauseMillis. The JVM tries to meet the target; perhaps 90% or more of requests will meet it, but not necessarily all of them. Setting the value too low leads to frequent GC.

Tuning methods and ideas

How do you analyze the GC health of a system's JVM and optimize it?

The core idea of GC optimization is to let objects be allocated and reclaimed in the young generation as much as possible, to keep too many objects from entering the old generation (which causes frequent old-generation collections), and to give the system enough memory to reduce the number of young-generation collections. System analysis and optimization revolve around this idea.

1 Analyze the runtime state of the system

  • Requests per second, objects created per request, and memory consumed per request
  • Young GC frequency and the rate at which objects are promoted to the old generation
  • Old-generation memory usage, Full GC frequency, the causes of Full GC, and the causes of long Full GCs

Common tools:

  • jstat: a command-line tool shipped with the JDK that reports memory allocation rates, GC counts, and GC time
jstat -gc <pid> <interval in ms> <number of samples>

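The meanings of the returned values (sizes in KB, times in seconds) are as follows:

    • S0C, S1C and S0U, S1U: capacity and used space of the two Survivor areas
    • EC, EU: capacity and used space of Eden
    • OC, OU: capacity and used space of the old generation
    • MC, MU: capacity and used space of the metaspace
    • CCSC, CCSU: capacity and used space of the compressed class space
    • YGC, YGCT: number of Young GCs and their total time
    • FGC, FGCT: number of Full GCs and their total time
    • GCT: total GC time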

For example, jstat -gc 32683 1000 10 samples the GC state of the process with PID 32683 once per second, 10 times in total.

  • jmap: a command-line tool shipped with the JDK that shows how objects are distributed while the system runs. Common command formats:
// Print a per-class histogram of instance counts and bytes
jmap -histo <pid>
// Generate a heap dump snapshot and write it to the binary file dump.hprof in the current directory
jmap -dump:live,format=b,file=dump.hprof <pid>

  • jinfo: command format
jinfo <pid>

Shows the extended parameters of a running Java application, including Java system properties and JVM command-line flags.
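
GC counts and accumulated GC time are also available from inside the application through the standard java.lang.management API, which can be handy for feeding a monitoring system. A minimal sketch; the class name and output format are made up for illustration, and the collector names it prints depend on the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints the count and accumulated time of every collector in the running JVM. */
final class GcStats {
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

Sampling these counters periodically and taking differences gives GC frequency and average pause cost per interval.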

Other GC tools

  • Monitoring and alerting systems: Zabbix, Prometheus, Open-Falcon
  • Real-time memory monitoring tool bundled with the JDK: VisualVM
  • Off-heap memory monitoring: Java VisualVM with the Buffer Pools plug-in, Google perf tools, Java NMT (Native Memory Tracking)
  • GC log analysis: GCViewer, GCeasy
  • GC parameter checking and optimization: xxfox.perfma.com/

2 GC optimization cases

  • Frequent Full GC on a data analysis platform

The platform periodically analyzes user behavior in the app, supports report export, and uses the CMS collector. Data analysts found that pages often stalled when opened. The jstat command showed that after every Young GC about 10% of the surviving objects entered the old generation.

The cause was a Survivor space that was set too small: after each Young GC, the surviving objects did not fit into Survivor and were promoted early. The Survivor space was enlarged so that it could hold the survivors of a Young GC. After the adjustment, only a few hundred KB entered the old generation per Young GC, and Full GC frequency dropped sharply.

  • OOM in a service integration gateway

The gateway consumes Kafka data, processes it, and forwards the results to another Kafka queue. It ran out of memory (OOM) after a few hours of running, and again a few hours after each restart. A heap dump exported with jmap and analyzed in Eclipse MAT revealed the cause: the code logged the data of one business Kafka topic asynchronously; that topic carried a lot of data, so a large number of objects piled up in memory waiting to be logged, which caused the OOM.

  • Frequent, long Full GCs in an account permission management system

The system provides various account authentication services to external callers, and the service was often found to be unavailable. The Zabbix monitoring platform showed frequent, long Full GCs, usually triggered while the old generation was far from full. It turned out that the business code called System.gc().

Conclusion

GC problems have no shortcuts, and troubleshooting production performance problems is not easy. Beyond the principles and tools described in this article, it takes accumulated experience to truly reach optimal performance.

For reasons of space, the usage of common GC parameters is not covered here; I have posted it on GitHub: github.com/caison/cais…
