The introduction

The concept of over-generation and partition recycling was discussed in the basic part of GC, but the basic part is more based on the theory of some algorithms for GC, while this part will focus on the implementation of generational collector for a comprehensive detailed explanation. It will cover serial Collector, parallel Collector, tricolor tag, SATB algorithm, GC execution process, concurrent tag, CMS Collector and other knowledge. This paper focuses on the implementation of GC mechanism, namely Garbage Collector.

Heap space review and GC collector Overview

The scope of GC covers heap space and meta-space, and the main scope is heap space, so after a brief review of heap space, some concepts in GC will be explained, and then the GC collector.

1.1. Heap Space Review

In the previous”JVM runtime Memory partition”As mentioned, the heap space structure of the JVM depends on which GC collector is used at runtime. In all GC collectors, most of the heap space is divided into two categories: generation and partition:



As shown in the figure above, the generation reactor space will be divided into two regions: Cenozoic and senile, and Cenozoic will be divided intoEden * 1, Survivor * 2Three pieces. The new generation uses the copy algorithm, HotSpot because of the adjustmentEdenwithSurvivorThe ratio of the regions is zero8:1:1So the new generation is the most wasteful of memory10%, the maximum capacity is80% + 10% = 90%. And when theSurvivorIf the space is insufficient to store live objects, the system relies on the elderly generation to allocate the objects that meet the standards to the elderly generation.

1.2 overview of the GC collector

The garbage collection-related algorithms in the previous article are the methodology of the GC mechanism, and the garbage collector is the implementation of the GC mechanism.

But in the Java ecosystem, there are many TYPES of GC collectors, and there is no one-size-fits-all collector. This is because during actual development, we need to choose the most appropriate collector for the application, depending on the business type of the project.

But before we get to the GC collector, we need to understand a few common terms for GC collectors.

1.2.1. Interpretation of nouns in GC collector

There are some common terms in the GC collector that you have to know before you get to know the GC collector: serial collection, parallel collection, exclusive execution, concurrent execution, throughput, pause time, throughput first, response time first, and so on.

Serial, parallel and exclusive, concurrent
  • (1) serialSerialCollection: The situation where all user threads stop and a single GC thread reclaims the heap is called serial collection.
  • (2) parallelParallelCollection: A situation where all user threads stop and multiple GC threads reclaim the heap (with multi-core CPU support).
  • (3) monopolyMonopolyExecution: This is when the GC thread preempts all resources for execution and the entire application is stopped.
  • (4) concurrentConcurrentExecution: Concurrency here refers to the situation where the user thread and the GC thread execute simultaneously (alternately) without stopping a particular type of thread.
throughput

Throughput is an important indicator in performance optimization. It refers to the ratio of the CPU time spent executing user code to the total CPU time. In Java, throughput can be calculated as follows:

Throughput = total user code execution time/(total user code execution time + total garbage collection time).

If the JVM executes for 100 minutes online, where user code takes 99 minutes to execute, and garbage collection takes 1 minute, then throughput is 99min/(99min+1min)=99%.

Pause time

The pause time is the pause time for all user threads (the entire application) while the GC collector is working. Pauses can be long for GC collectors with exclusive classes. For a concurrent GC collector, the pause time of the program is shortened because the GC thread and the user thread are executed alternately, but the overall GC is not as efficient as the exclusive GC collector, so the throughput of the system is reduced.

Based on the nature of the exclusive and concurrent collectors, two new terms for tuning are introduced: throughput first and response time first. In contrast, when you design a system architecture to select a GC collector or to tune it, you end up chasing higher throughput and shorter response times.

  • Throughput first: To ensure higher throughput of the program, allow long pauses while GC occurs.
  • Response time first: To ensure a better user experience, some throughput can be sacrificed in exchange for faster response times. The shorter the pause time when GC occurs, the better.

1.2.2 overview of GC collectors in Java

In today’s official JDK, there are ten implementations of the JVM GC collector, respectivelySerial, ParNew, Parallel Insane, CMS, Serial Old (MSC), Parallel Old, G1, ZGC, Shenandoah, EpsilonEtc., as follows:



In the figure above, there are ten TYPES of GC collectors, which can be divided into generational and partitioned types based on their properties at collection time:

  • Generational collector:Serial, ParNew, Parallel Avenge, CMS, Serial Old (MSC), Parallel Old
  • Partition collector:G1, ZGC, Shenandoah

Epsilon is an exception. The garbage collector is provided by JDK11. The Java program that loads the garbage collector does not perform any GC operations during runtime, and will exit for OOM reasons once the allocated heap space is used up. The Epsilon collector is mainly used for testing before the program goes online, such as performance testing, memory stress testing, VM interface testing, etc. Choosing to load the Epsilon collector at program startup helps us filter out performance artifacts caused by the GC mechanism.

This article focuses on generational GC, so let’s look at generational collectors first. Six generational collectors, each of which works in different areas:

  • Cenozoic collector:Serial、ParNew、Parallel Scavenge
  • Generation collector:CMS, Serial Old (MSC), Parallel Old



As shown in the figure above, a line between the two indicates that two GC collectors can be used together, so there are six possible combinations:

The new generation Old generation
Serial CMS (Active) /Serial Old (Standby)
Serial Serial Old (MSC)
ParNew CMS (Active) /Serial Old (Standby)
ParNew Serial Old (MSC)
Parallel Scavenge Serial Old (MSC)
Parallel Scavenge Parallel Old

Be insane. The GC collector can be implemented as well as the Parallel Scavenge avenge.

Second, generational GC collector details

Generational GC collectors in the JVM, in addition to being divided into the new generation and the old generation, can also be divided into single-threaded and multithreaded collectors based on their collection process. Serial, Serial Old (MSC) are single-threaded collectors, while ParNew, Parallel Insane, CMS, Parallel Old are concurrent multi-threaded collectors. But let’s go through the GC collector from a generational perspective.

2.1 Detailed explanation of the new generation GC collector

The New generation collector includes the Serial, ParNew, Parallel Scavenge, and the Serial collector.

2.1.1 Serial Collector (Single thread)

Serial is the original generation collector, and it is also known as the Serial collector because it is a single-threaded GC collector. As the name implies, it runs single-threaded when GC work is performed, and the collector generates STW, which stops all user threads, when GC occurs. But because other user threads are stopped, there is no switching between threads when GC is performed. Therefore, it is very efficient to clean up on a single CPU machine. In general, for JVMS running in Client mode, this collector is a good choice for embedded GC.

SerialCollector summary:

Startup parameters:-XX:+UseSerialGC(After this parameter is enabled, the aged generation uses itMSC).

Collection action: serial GC, single thread.

Algorithm: Copy algorithm.

STW: The GC process is performed in STW.

When GC occurs, it is executed as follows:

Because the collector needs to be in the STW all the way through its GC, the user experience is poor on a system level. Be like you online see piece (point to a movie), see two minutes turn a few circles, see circle again after a period of time, repeatedly card ton…. This is obviously a difficult thing for you to accept.

2.1.2 ParNew collector (multi-threaded)

The ParNew collector is based on an evolutionary version of the Serial collector and, in the strictest sense, can be referred to as a multi-threaded version of the Serial collector, which also acts on the new generation region. The overall implementation is almost identical to the Serial collector, except that multiple thread collection is used during the GC collection phase.

ParNewCollector summary:

Startup parameters:-XX:+UseParNewGC.

Collecting actions: Parallel GC, multithreading.

Algorithm: Copy algorithm.

STW: The GC process takes place in the STW with multi-threaded collection.

When GC occurs, it is executed as follows:

Because the only difference between this collector and Serial is that it uses multiple threads, it still causes program pauses when GC occurs. But also because of the use of multithreaded recycling, it can significantly reduce the pause time of the system, resulting in a better user experience than Serial.

The collector, however, requires multi-core CPU support because it is multithreaded. The collector will enable a different number of GC threads depending on the number of CPU cores to achieve optimal garbage collection (also specified by the -xx :ParallelGCThreads parameter). But it may not be as efficient as Serial on a single-core machine.

In general, if your application is running in Server mode and the CMS collector is used in the past, then the new generation with ParNew is a good choice.

2.1.3. Parallel Avenge (multithreading)

The Parallel Insane is also a multithreaded GC collector for the new generation, but unlike the ParNew collector: ParNew controls the number of GC threads to reduce application pause times and is more concerned with the response time of the application than the Parallel Scavenge avenge, which is more concerned with the throughput of the application execution as a percentage of the total application execution time over a period of time.

Parallel ScavengeCollector summary:

Startup parameters:-XX:+UseParallelGC.

Collecting actions: Parallel GC, multithreading.

Algorithm: Copy algorithm.

STW: The GC process takes place in the STW with multi-threaded collection.

When GC occurs, it is executed as follows:

From the above summary, the PS collector and the ParNew collector do not appear to be much different. But the underlying GC framework on which they are based is completely different, and the focus is completely different. The goal of the PS collector is to allow a program to achieve a manageable Throughput, so PS is also called a Throughput first garbage collector.

The PS collector can precisely control the time and throughput ratio of GC occurrences with -xx :MaxGCPauseMillis and -xx :GCTimeRatio parameters. The main difference from the ParNew collector is that: The PS collector can also enable the JVM to enable an adaptive GC tuning policy by turning on the -xx :+UseAdaptiveSizePolicy parameter, which allows the JVM to adjust the throughput ratio and GC times based on the current system state to ensure optimal pause times and throughput.

  • So if you usePSAt collector time, if we manually set the GC time to be very small and then set the throughput ratio to be very high, wouldn’t the GC collection be perfect?
  • The answer is: no. Throughput must be sacrificed in pursuit of response time, and response time must be sacrificed in pursuit of throughput. If you set the GC time to very small by parameter, thenPSAt runtime, the Cenozoic space is reduced, such as from the original1GBTo adjust to800MBTo collect800MBThe space is bound to be faster than1GBMuch faster. But the corresponding collection frequency will increase, may be original60sCollect once, pause each collection100msAnd now that the memory has been turned down,40sOne GC occurs, and each GC pauses80msYou can compare the difference between the two:
  • 24min/1GBSpace -GC overhead:(24min/60s)*100ms=24000ms
  • 24min/800MBSpace -GC overhead:(24min/40s)*80ms=28800ms
  • So you end up with a result that while response time does decrease, throughput also decreases.

Therefore, in general, we should not manually adjust these parameters without much experience in tuning. Instead, we should turn on the JVM’s adaptive strategy and let the JVM adjust itself.

2.2. Full explanation of the old generation GC collector

The Old generation collector mainly includes CMS, Serial Old (MSC) and Parallel Old. Like the new generation collector, there are also single-thread and multi-thread collectors. Next, we analyze the Old generation collector in turn.

2.2.1 Serial Old (MSC) collector

Serial Old (MSC) is the same single-thread Serial collector as Serial collector, but MSC is a collector for the aged space, it uses a mark-collation algorithm to reclaim the aged space. At the same time, the collector can also be used as a CMS backup collector.

Serial Old (MSC)Collector summary:

Startup parameters:-XX:+UseSerialGC(After this parameter is enabled, the new generation will use itSerial).

Collection action: serial GC, single thread.

Algorithm: mark-collation algorithm.

STW: The GC process occurs in the STW, using a single thread to perform serial collection.

When GC occurs, it is executed as follows:

Serial Old (MSC) is not much different from the new generation collector Serial, and the collection process is also a single-thread Serial collection, which is the older version of Serial.

2.2.2 Parallel Old Collector (Multithreading)

The Parallel Old is an older version of the Parallel Avenge collector, which also uses multiple threads for Parallel collection and uses a mark-collation algorithm internally. Like the new generation of PS collectors, THE PO also seeks throughput first.

Parallel OldCollector summary:

Startup parameters:-XX:+UseParallelOldGC.

Collecting actions: Parallel GC, multithreading.

Algorithm: mark-collation algorithm.

STW: The GC process takes place in the STW with multi-threaded collection.

When GC occurs, it is executed as follows:

As an older version of the PS collector, PO features much the same as PS, so it is also suitable for throughput – or CPU-sensitive systems.

2.2.3 CMS collector (multi-threaded/concurrent)

The CMS collector, called ConcurrentMarkSweep, is a milestone in GC. It is the first collector to implement the concept of concurrent collection, where the GC thread works with the user thread without stopping the user thread. At the same time, this collector is the pursuit of the shortest collection time, belongs to multi-threaded collector, its internal use of mark-clear algorithm.

CMSCollector summary:

Startup parameters:-XX:+UseConcMarkSweepGC.

Collect action: Concurrent GC, multithreading in parallel.

Algorithm used: mark – clear algorithm.

STW: STW occurs during the GC process, but not the entire GC process is executed in the STW, using multithreaded collection.

When GC occurs, it is executed as follows:

As you can clearly see from the CMS execution diagram above, the collection process for CMS is significantly more complex than that for other GC collectors. The collection for CMS collectors can be divided into four steps: initial marking, concurrent marking, re-marking, and concurrent cleaning.

  • ① Initial mark: mark onlyGcRootNode directly associated with the object, the stage speed will be very fast, need inSTW.
  • ② Concurrent marking: this stage is mainly to do GC traceability (GcTracing), starting from the root node, the whole heap space is analyzed for reachability to find all viable objects, which are executed simultaneously by the GC thread and the user thread.
  • ③ Re-mark: This phase is mainly to correct the “concurrent mark” phase due to the user thread execution of the GC mark changes that part of the object, this phase needs to be inSTW, and the pause time in this phase is considerably longer than in the initial phase.
  • (4) Concurrent removal: in this stage, garbage objects other than living objects are mainly removed. This stage does not need to stop the user thread and is executed concurrently.
  • PS: There are actually two detailed operations between concurrent marking and re-marking: pre-cleaning and terminable pre-cleaning.

During the entire collection process, except for the initial marking and re-marking phases, all collection actions are performed concurrently with the user thread. As a result, the CMS collector’s pauses during GC are very brief and are superior to previous collectors in terms of user experience. Because of the concurrent collection and low pause delay nature of the CMS collector, it is also called concurrent low pause collector in some places.

From the above summary, CMS sounds great, but in reality, CMS has several fatal disadvantages: floating garbage generation and failure to collect, CPU dependency, and memory fragmentation after GC is complete.

  • ①CMS is a collector completely based on the development of multi-threaded environment. By default, the number of threads opened in the recycling process is(number of CPU cores + 3) / 4“, which means: a machine with eight cores should at least be turned on2 ~ 3GC threads. And when the number of CPU cores is less4The CMS GC thread can have a significant impact on user thread performance because half of the CPU is allocated to perform GC collection.
  • (2) due to the recycling of CMS collector is concurrent remove rubbish object, therefore, in the sweep phase user thread is still in the execution, and the user threads execute will inevitably cause the new waste, but this part of the new waste generated in the object cannot be marked, so can only wait until the next time the GC occurs to recycling, and this part of the garbage is known as “floating garbage”.
  • ③ Because CMS adopts mark-clear algorithm, a large number of memory fragments will be caused after the collection work is completed.
    • Why not use the standard integer algorithm? Because the CMS executes concurrently, all object references in the user thread need to be changed if the live objects are compressed into memory, which can be extremely complex and inefficient to implement.

Because CMS can generate floating garbage and memory fragmentation during collection, CMS generally must have an alternative collector as a backup, and the options are one and only: That is Serial Old (MSC), which is triggered by the Serial Old (MSC) collector when memory is too fragmented to allocate new objects, or when the ratio of living objects + floating garbage reaches a specified threshold after a collection. There are three key parameters that determine whether Serial Old (MSC) is triggered:

  • -XX:CMSInitIatingOccupancyFaction: You need to specify a percentage that will be triggered when the ratio of live objects + floating garbage reaches this valueMSCWork.
  • XX:UseCMSCompactAtFullCollection: This parameter is enabled by default. This parameter is triggered when the memory is too fragmented to allocate new objectsMSChappenFullGC.
  • XX:CMSFullGCsBeforeCompaction: This parameter can set the number of intervalsFullGCAfter the occurrence of a defragmentation of memoryFullGC(MSCThe default is0Every time,FullGCWill triggerMSCRecycling.

2.3 Summary of generational GC collector

Currently, the GC collectors that have been analyzed can be divided into the new generation and the old generation according to their generational characteristics. Based on the thread Angle, it can be divided into single-thread serial, multi-thread parallel collector. In terms of attention, it can be divided into throughput first and response time first.

In general, if your program is more focused on user experience, you can work with a response-first collector because it doesn’t pause for very long. However, if your program does not require a lot of interaction with the user, such as batch processing, order processing, report computation, scientific computation, etc., you can use throughput first collector, because high throughput can use CPU resources efficiently.

Three, collector combination scheme, CMS three-color marking and cross-generation reference

3.1 GC combination scheme analysis

In the second section, we looked at each of the different GC collectors in the JVM in detail, but which combination would be better for our program during actual development? There is no such thing as a perfect combination. The combination you choose to use as a collector for Your Java application depends more on your business scenario.

If your application is looking for low latency and high user interaction, then you can use a combination of ParNew + CMS (this was also an early choice on Taobao, but later used a self-developed JVM).

The Parallel Insane + Parallel Old collector is for you if your application is driven by high throughput and does a lot of background computing.

But when your program is written and deployed on a single or dual-core machine, the classic Serial + Serial Old combination is definitely your best choice.



Once again, focusing on this diagram, it’s worth noting that before JDK1.8, dashed line combinations were available, but after JDK1.8, the red line combination was removed and considered a deprecated collector combination (though it could be used if it was used). By JDK1.9, the red line combination was removed, meaning that it could no longer be specified as a collector in 1.9. In the later JDK14, the green line combination was also deprecated and officially removedCMSCollector, to giveG1Paving, usingG1Instead of aCMS.

3.1.1 Why can’t THE PS collector be used with the CMS collector?

Because in the HotSpot, the bottom there is a generational GC, the framework of Serial/SerialOld ParNew/CMS are implemented based on the framework, and within the framework of the new generation collector and old generation collector is used can match each other, this is the so-called mix – and – match rules. However, when the PS collector was implemented, it found that the original generational GC framework was not suitable, so it finally adopted its own special framework to implement, so the PS collector is not in the previous generational GC framework. Therefore, PS cannot be used with a CMS that uses that framework.

3.2. Three-color marking algorithm

Tricolor marking algorithm is a kind of concurrent marking algorithm widely used since CMS collector, it can make JVM in GC, only a short STW can realize an algorithm of living object marking. This is the fundamental reason why CMS in the JVM, and subsequent generation-neutral collectors, can achieve low latency.

Three-color marking idea: in this algorithm, the object is divided into black, white and gray colors, with the following definition: Black: the object that has been marked is still alive. Grey: The current object has been tagged, but the associated node (attribute member) has not been tagged. White: Unmarked objects or objects with no references (junk objects).

3.2.1. Execution process of tricolor marking

Without further ado, first on a three-color marking of the implementation process diagram:

  • A GC collector that implements the three-color marking algorithm creates three collections at startup: black, white, and gray, with all objects in the white collection at the beginning.
  • When GC occurs, a transient occursSTW, will all withGcRootsDirectly connected objects are moved into the gray set.
  • After the concurrent execution, the object in the gray set is traversed, and the object is marked alive according to the reachability analysis algorithm. When all the members of an object are marked, the object will be moved to the black set. At the same time, the marked members of the object are moved from the white collection to the gray collection.
  • Repeat the previous step until the gray collection is completely empty of objects.
  • Flags are fired again after all objects are completedSTWThrough thewrite-barrierWrite barriers detect whether objects have changed and re-mark them if they have, correcting “mislabeling” during concurrent marking.
  • Perform cleanup concurrently, reclaiming all objects in the white collection (because according toGCRootsAfter the node’s reachability analysis, all surviving objects will be moved from the white set to the black set, so the objects still in the white set must be garbage objects, which are the objects that need to be recycled).
  • Finally, after the cleanup is complete, indicating the end of the entire GC process, the marker is reset and all objects are put into the white collection again, waiting for the next GC to arrive.

3.2.2 Three-color marking – Mislabeling problems caused by concurrent marking

A GC collector using the tricolor tagging algorithm, in pursuit of low latency, usually ends STW after tagging objects directly associated with GCRoots, and instead marks other objects concurrently. But because the concurrent markup is done by the GC thread working in conjunction with the user thread, it can lead to the following:

One of the marked black objects suddenly breaks the reference to another object, causing another already marked black object to suddenly become garbage.

But because the object is already marked, the collector does not re-mark the object, and when the cleanup occurs, the collector does not reclaim the object because it was originally marked black. This is known as “mislabeling/mislabeling/multi-labeling” due to tricolor tagging, also known as floating garbage from concurrent tagging.

This problem is not a big deal, because the floating garbage generated by this mislabeling will still be collected in the next GC. It is a fact of the JVM that garbage will be “killed” sooner or later, so don’t worry too much about this problem.

3.2.3 Three-color marking – Missing marks caused by concurrent execution

Suppose that during the execution of the tricolor tag, the following situation occurs:

① A user thread breaks an unmarked white object connection during execution and then establishes a reference connection with an already marked black object. The diagram below:



The white object breaks the reference to the gray object on the left and establishes a new reference to the black object on the right.

② A user thread breaks the reference connection between a grey object and an unmarked white object in the process of execution just as the GC thread marks it. Then, when the GC marks the grey object and marks it as black, the white object is re-referenced. The diagram below:



Before GC marks, the white object is disconnected from the gray object, and four seconds later GC marks the gray object, at which point the white object is re-referenced to the black object.

In both cases, the GC thread does not re-mark the white object and its member objects because the “parent” of the re-referenced white object has already been marked black, so the white objects will remain in the white collection. The end result is that those living objects that still have references are “misjudged” as garbage objects to be removed. This situation directly affects the correctness of the application and is unacceptable.

Condition 1: The gray object is disconnected from the white object (either directly or indirectly). Condition 2: An object marked black is re-referenced to a white object. Missing marks can occur only when an object satisfies both conditions. To understand the last simple code example:

Object X = obj.fieldX; // get the obj.fieldx member object
obj.fieldX = null; // Break the reference to obj.fieldx
objA.fieldX = X; // Establish a reference between the X white object and the black object objA
Copy the code

From the point of view of the code above, if obj is a gray object, get its member fieldX and assign it to variable X so that the instance in the heap keeps a reference to variable X. Obj. FieldX = obj; obj = obj; obj = oba; obj = obj;

Gray object obj, white object obj.fieldx /X, black object objA. White object X disconnects from the grey object before GC marks the grey object obj member property, and then “hooks up” with the black object objA. At this point, white object X will remain in the white collection forever until the cleanup phase arrives and is “misjudged” as garbage collected.

In fact, the idea to solve the problem of missing marks is very simple, and before “concurrent programming” to solve the problem of thread safety, there are three necessary conditions for thread safety, destroy any of the conditions, thread safety problems will not appear. As we have just analyzed, there are two necessary conditions for the object missing label problem, so we only need to destroy any of them. For example, in the above case, we can simply record the X object by special means and then iterate over it once again as a gray object.

  • How does the collector with the three-color marking algorithm specifically solve the problem of missing marks?
  • CMS: Incremental update + write barrier
  • G1: STAB + Write barrier
  • ZGC: Read barrier

In this paper, the solution to the missing mark of CMS is firstly analyzed, and the solution to the missing mark problem of G1 and ZGC collector is described in the next article.

3.2.4 CMS to solve the problem of missing marks: incremental update + write barrier

Before we look at the write barrier, let’s take a look at HotSpot’s implementation of assigning values to object members. The general logic is as follows:

void oop_field_store(oop* field, oop new_value) { 
    *field = new_value; // Assignment: replace old value with new value
} 
Copy the code

The so-called write barrier refers to adding some logic before and after the assignment (similar to SpringAOP’s idea of tangle-oriented pre – and post-processing), as follows:

void oop_field_store(oop* field, oop new_value) {
    pre_write_barrier(field); // Write the front barrier
    *field = new_value; // Assignment: replace old value with new value
    post_write_barrier(field, value);  // Write the back barrier
} 
Copy the code

The CMS collector solves the missing mark problem by implementing the logic of incremental update in the post processing of the write barrier.

Incremental updating (Increment Update) are specially for new reference object, when an unmarked white object by other objects to reference, the white object can be recorded, as follows:

// Write the back barrier
void post_write_barrier(oop* field, oop new_value) {  
  if($gc_phase == GC_CONCURRENT_MARK && !isMarkd(field)) {
      remark_set.add(new_value); // Record the newly referenced object (white object)}}Copy the code

From above the source code can watch out: for the assignment of new references, in writing after the barrier will be put into a specific set of records, such as concurrent, after the completion of phase GCRoots traversal of the marking in to mark phase will go to find a set of reference, then the source tag is gray, and get back to scan these objects.

By writing barriers and incremental updates, CMS breaks the second condition of the previous analysis of missing marks: the object marked black re-establishes a reference relationship with the white object. By means of incremental update, the “source” of these re-established references will be restored to gray objects, and then will be marked again in the re-marking stage. Meanwhile, in order to avoid the problem of missing marks again in the re-marking stage, STW must occur in the re-marking stage. For a detailed implementation of write barriers in HotSpot, see BarrierSet source Code Analysis.

3.3. Cross-generation References

Cross-generation references refer to objects in the generation space that refer to objects in the generation, or objects in the generation that refer to objects in the generation. In this case, it is impossible to scan the surviving objects from the new generation to the old generation in the accessibility analysis, because the whole stack scan will occur, and the efficiency is bound to be very low.

In the HotSpot VIRTUAL machine, to address cross-generation references, a small space in memory is dedicated to maintaining these special references, so that the GC does not have to scan the entire heap space. The small space opened up is called memory set, card table.

3.1 Remember Set

When we all know that in the event of a new generation of GC by root of algorithm to judge the garbage objects, and then to collection of non living objects, but if there is old generation object refers to the new generation object, then according to the characteristics of root of algorithm, the old generation can be join scan range, this down to a new generation of GC cost too much. Therefore, in order to solve the problem of cross-generation reference, the recordset data structure is introduced in the new generation to record the reference pointer set from the non-collection area to the collection area, avoiding adding the whole old age to the scan scope when judging the object’s survival through the root reachable algorithm.

During GC, the GC collector only needs to determine whether a pointer to a collection region exists in a non-collection region from the memory set without a detailed root search process. Memory set can be realized according to the different memory size: (1) word width/word length accuracy: accurate to every word wide (32 bit or 64 bit), the precision of each reference pointer (2) the objects across generations: accurate to each object, the object contained in the field of inter-generational pointer reference (3) card: precision accurate to each piece of memory area, there are objects in memory areas across generations pointer

3.2. Card Table

Card table is the implementation of the third precision of the memory set, and is also the implementation of the memory set in the HotSpot VIRTUAL machine.

In HotSpot, the card table is implemented using a byte array: CARD_TABLE[this addredd >>9]=0, each element in the array corresponds to its identified memory area, called the card page. HotSpot uses a card page size of 2^9 or 512 bytes, which means that each consecutive 512 bytes in memory is treated as a card page as an element of the card table.

If an old generation object references a new generation object, the card element for that new generation object is set to 1, and if not, to 0. (GC collectors after G1 are not generational, so memory sets after G1 are implemented not through arrays, but through hash table structures.)

The JVM also maintains card pages through write barriers.

4. GC log interpretation

For GC mechanisms, this area is key for programmers to tune the JVM, which must read the logs generated after GC occurs. The parameters associated with GC logging in the JVM are as follows:

  • 1.-XX:+PrintGCor-verbose:gc: Prints GC logs
  • 2.-XX:+PrintGCDetails: Prints detailed GC logs
  • 3.-XX:+PrintGCTimeStamps: Outputs the GC timestamp (in base time)
  • (4)-XX:+PrintGCDateStamps: Outputs the GC timestamp (in the form of a date)
  • 5.-XX:+PrintHeapAtGC: Prints heap information before and after GC occurs
  • 6.-Xloggc:/xxx/xxx/xx.log: Path to save GC log files

The -xx :+PrintGC or -verbose:gc parameter can only output the total change of heap space during gc.

// Start parameters: -xMS8m -XMx8m -xx :+PrintGC
public class GC {
    static void newObject(a){
        for (int i = 0; i <= 10000; i++)
            new Object();
    }

    public static void main(String[] args) throws InterruptedException {
        for(;;) { newObject(); }}}Copy the code

After executing the above case, you should get the following log in your console:

[GC (Allocation Failure)  1527K->868K(7680K), 0.0011957 secs]
[GC (Allocation Failure)  1924K->1201K(7680K), 0.0032349 secs]
......
Copy the code

We randomly selected one of the output logs for analysis, as follows:

[GC[1] (Allocation Failure)[2] 1527K[3]->868K[4](7680K)[5], 0.0011957 secs[6]]

This log only prints a general view of the heap space. The log information is interpreted as follows:

  • [1]: Indicates the type of the GC
    • GCSaid:Young GCAnd the type of GC occurring in Cenozoic
    • Full GC: Global GC, The GC types of Cenozoic generation, aged generation and meta-space
  • [2]: Causes of this GC
    • Allocation Failure: GC caused by a failure to allocate a newly created object
    • Metadata GC Threshold: GC caused by metadata data reaching the allocated space threshold
    • System.gc(): Manually passed in the programSystem.gc()Triggered by the GC
    • .
  • [3] : The amount of heap used before GC occurs
  • [4] : The amount of heap space used after GC occurs
  • [5] : Total size of heap space
  • [6] : Duration of GC

The diagram below:

The rule of a GC log is as follows: GC type +GC cause + Heap space + Time.

4.1 interpretation of GC log details

As mentioned earlier, the -xx :+PrintGC parameter only prints the overall change in the heap at GC, which is almost impossible to extract useful information from online emergencies. For this reason, the -xx :+PrintGCDetails argument is generally used online to get GC log details. Examples are as follows:

// Start parameters: -xMS8m -XMx8m -xx :+PrintGCDetails
public class GC {
    // As GC Roots
    static List<Object> listObject = new ArrayList<>();
    
    // Populate the new generation space with objects
    static void newObject(a){
        for (int i = 0; i <= 100000; i++)
            new Object();
    }
    
    // The object created keeps a reference to GCRoots, enough for the object to be promoted into the tenured generation space
    static void oldObject(a){
        for (int i = 0; i <= 10000; i++)
            listObject.add(new Object());
    }

    public static void main(String[] args) throws InterruptedException {
        for(;;) { newObject(); oldObject(); }}}Copy the code

After running the above program, you can get the following log information (manually formatted for easy observation) :

[GC (Allocation Failure) [PSYoungGen: 1527K->492K(2048K)]
    1527K->892K(7680K), 0.0038507 secs] 
    [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC (Allocation Failure) [PSYoungGen: 1548K->483K(2048K)]
    1948K->1174K(7680K), 0.0009940 secs] 
    [Times: user=0.00 sys=0.00, real=0.00Secs] omits most logs of the same type....... [Full GC (Ergonomics) [PSYoungGen: 2016K->0K(2048K)] 
    [ParOldGen: 4822K->4807K(5632K)] 6839K->4807K(7680K), 
    [Metaspace: 3625K->3625K(1056768K)], 0.0393051 secs] 
    [Times: user=0.06 sys=0.00, real=0.04 secs]
[Full GC (Allocation Failure)[PSYoungGen: 693K->693K(2048K)] 
    [ParOldGen: 5245K->5226K(5632K)] 5938K->5919K(7680K), 
    [Metaspace: 3626K->3626K(1056768K)], 0.0312005 secs] 
    [Times: user=0.03 sys=0.00, real=0.03 secs]
Heap
 PSYoungGen total 2048K, used 754K [0x00000000ffd80000,...). eden space 1536K,49% used [0x00000000ffd80000,...). from space 512K,0% used [0x00000000fff00000,...). to space 512K,0% used [0x00000000fff80000,...). ParOldGen total 5632K, used 5226K [0x00000000ff800000,...). object space 5632K,92% used [0x00000000ff800000,...). Metaspace used 3657K, capacity 4540K, committed 4864K, reserved 1056768Kclass space used 402K.capacity 428K.committed 512K.reserved 1048576K
Copy the code

From the above GC log, it can be seen that after the program runs, apart from triggering a new generation OF GC, FullGC is finally triggered with more and more surviving objects in the later period.

At the end of the log, the usage of each Java memory space is also displayed, such as the usage of Eden, form, and TO in the new generation, the usage of space in the old generation, and the usage of metadata space.

Let’s start with a normal GC log and explain the information in the above logs.

4.1.1 YoungGC Log description

To start with, extract a normal GC log from the above log:

[GC[1] (Allocation Failure) [2][PSYoungGen[3]: 1527 k [4] [5] – > 492 k (2048 k) [6]], [7] – > 892 k, 1527 k [8] [9] (7680 k), 0.0038507 secs [10]] [Times: User = 0.00 [11] [12] sys = 0.00, real = 0.00 [13] secs]

The interpretation of this GC log is as follows:

  • [1]: The type of this GC (normalYoung GC)
  • [2] : The cause of this GC (GC caused by allocation failure)
  • [3] : Responsible for the collector and GC type of this GC (new generation GC of PS)
  • [4]: Used size of Cenozoic space before GC (1527KB)
  • [5]: Used size of Cenozoic space after GC recovery (492KB)
  • [6]: Total size of Cenozoic space allocation (2048KB)
  • [7]: Used size of Java heap space before GC (1527KB)
  • [8]: Used size of Java heap space after GC collection (892KB)
  • [9]: Total size allocated to the Java heap space (7680KB)
  • [10]: Total time of the GC process (0.0038507A second)
  • [11]: User time spent during the GC process (0)
    • It’s too short to say exactly how long it takes, not really zero.
  • [12]: System time of this GC process (0)
  • [13]: Actual time of the GC process (0)

The entire YoungGC log is shown in the figure above, where the rule is: GC type +GC cause +GC collector + Description of new generation + description of heap space + description of time.

4.1.2 FullGC log details

Here’s another excerpt from the FullGC log:

[Full GC[1] (Ergonomics[2]) [PSYoungGen[3]: 2016K[4]->0K[5](2048K)[6]] [ParOldGen:[7] 4822K[8]->4807K[9](5632K)[10]] 6839K[11]->4807K[12](7680K)[13], [Metaspace:[14] 3625K[15]->3625K[16](1056768K)[17]], secs[18]] [Times: User = 0.06 [19] sys = 0.00 [20], real = 0.04 secs [21]]

  • [1]: The type of the GC (globalFull GC)
  • [2] : The cause of this GC (it is expected that the triggered GC cannot be stored in the next allocation)
  • [3] : Collector in charge of the New generation GC (PS)
  • [4]: Used size of Cenozoic space before GC (2016KB)
  • [5]: The occupied size of Cenozoic space after GC recovery (0KB)
  • [6]: Total size of Cenozoic space allocation (2048KB)
  • [7] : Collector (PO) responsible for this generation of GC
  • [8]: Used size of tenured generation space before GC (4822KB)
  • [9]: Space occupied by the old generation after GC collection (4807KB)
  • [10]: Total size of space allocated to the aged generation (5632KB)
  • [11]: Used size of Java heap space before GC (6839KB)
  • [12]: Used size of Java heap space after GC collection (4807KB)
  • [13]: Total size allocated to the Java heap space (7680KB)
  • [14]: Recycling area (MetaspaceMetadata space)
  • [15]: Used size of metadata space before GC (3625KB)
  • [16]: How much metadata space is occupied after GC collection (3625KB)
  • [17]: Total size allocated to metadata space (1056768KB)
  • [18]: Total time of the GC process (0.0393051A second)
  • [19]: User time spent during the GC process (0)
  • [20]: System time of this GC process (0)
  • [21]: Actual time of the GC process (0.04A second)

The logs of each FullGC are as follows: GC type +GC cause + Description of new generation + description of old generation + description of heap space + metadata space + description of time consuming.

4.1.3 Causes of GC

The previous logs have seen several causes of GC, such as Allocation Failure, Ergonomics, Metadata GC Threshold, etc. How many causes can trigger GC? In the/SRC /share/vm/gc_interface/gcCause. CPP file (based on OPenJDK1.8 source code), the runtime GC trigger is defined as follows:

#include "precompiled.hpp"
#include "gc_interface/gcCause.hpp"

const char* GCCause::to_string(GCCause::Cause cause) {
  switch (cause) {
    case _java_lang_system_gc:
      return "System.gc()";

    case _full_gc_alot:
      return "FullGCAlot";

    case _scavenge_alot:
      return "ScavengeAlot";

    case _allocation_profiler:
      return "Allocation Profiler";

    case _jvmti_force_gc:
      return "JvmtiEnv ForceGarbageCollection";

    case _gc_locker:
      return "GCLocker Initiated GC";

    case _heap_inspection:
      return "Heap Inspection Initiated GC";

    case _heap_dump:
      return "Heap Dump Initiated GC";

    case _no_gc:
      return "No GC";

    case _allocation_failure:
      return "Allocation Failure";

    case _tenured_generation_full:
      return "Tenured Generation Full";

    case _metadata_GC_threshold:
      return "Metadata GC Threshold";

    case _cms_generation_full:
      return "CMS Generation Full";

    case _cms_initial_mark:
      return "CMS Initial Mark";

    case _cms_final_remark:
      return "CMS Final Remark";

    case _cms_concurrent_mark:
      return "CMS Concurrent Mark";

    case _old_generation_expanded_on_last_scavenge:
      return "Old Generation Expanded On Last Scavenge";

    case _old_generation_too_full_to_scavenge:
      return "Old Generation Too Full To Scavenge";

    case _adaptive_size_policy:
      return "Ergonomics";

    case _g1_inc_collection_pause:
      return "G1 Evacuation Pause";

    case _g1_humongous_allocation:
      return "G1 Humongous Allocation";

    case _last_ditch_collection:
      return "Last ditch collection";

    case _last_gc_cause:
      return "ILLEGAL VALUE - last gc cause - ILLEGAL VALUE";

    default:
      return "unknown GCCause";
  }
  ShouldNotReachHere(a); }Copy the code

From HotSpot source code, there are many different causes of GC being triggered. There are more than 20 possible causes in GC logs.

  • System.gc(): manually called in a Java programSystem.gc()Method to trigger the GC.
  • FullGCAlot: GC that fires periodically (JDK beta only, used for JVM development).
  • ScavengeAlot: GC that fires periodically (JDK beta only, used for JVM development).
  • Allocation ProfilerUse:-XaprofParameters run the program and trigger a GC when the JVM terminates (JFK1.8 is deprecated).
  • JvmtiEnv ForceGarbageCollection: forces a call to the local method librarynativeMethods:ForceGarbageCollection(jvmtiEnv* env)Triggered GC.
  • GCLocker Initiated GC: If the thread is executing in a JNI critical section, when GC is neededGCLockerWill prevent GC from occurring and prevent other threads from entering the JNI critical section until a GC is triggered when the last thread exits the critical section.
  • Heap Inspection Initiated GCThrough:jmapThe GC that is triggered when the command performs heap detection.
    • Heap detection command:jmap -histo:live <pid>
  • Heap Dump Initiated GCThrough:jmapThe GC that is triggered when the command takes a heap dump.
    • Heap dump command:jmap -dump:live,format=b,file=heap.out <pid>
  • WhiteBox Initiated Young GC: Triggered actively during the testYoung GC(Need to be increasedWhiteBoxtheAgentTo use).
  • Update Allocation Context Stats: This GC is only used to get updated allocation context statistics.
  • Allocation Failure: GC triggered by an object allocation failure due to insufficient memory.
  • Tenured Generation Full: GC triggered by insufficient memory in the aged generation.
  • Metadata GC Threshold: GC triggered by insufficient metadata space memory.
  • CMS collector related GC log information:
    • No GC: used to indicateCMSThe concurrent marking phase of.
    • CMS Generation Full: A FullGC occurs in the CMS.
    • CMS Initial Mark: Indicates the log information during the CMS initial marking phase.
    • CMS Final Remark: Indicates the log information about the CMS re-marking phase.
    • CMS Concurrent Mark: Indicates the log information about the CMS concurrent marking phase.
  • The two I didn’t get:
    • Old Generation Expanded On Last Scavenge
    • Old Generation Too Full To Scavenge
    • Leave a comment in the comments section if you understand these two things.
  • Ergonomics: GC triggered when space allocation is guaranteed in PS+PO combinations.
  • G1 collector related GC log information:
    • G1 Evacuation Pause: no idle in G1regionThe GC that is triggered when a partition causes an allocation failure.
    • G1 Humongous Allocation: noHumongousThe GC that is triggered when a large object is allocated.
  • Last ditch collection: GC that is triggered when metadata space allocates data and allocation fails and memory cannot be extended further.
  • ILLEGAL VALUE - last gc cause - ILLEGAL VALUE: Normally, this information is invisible.
  • unknown GCCause: GC triggered for unknown (undefined) reasons.

5. Summary of GC generation

In this chapter, we comprehensively describe some basic concepts of GC, generation collector, collection process of each collector, CMS collector and its execution process, three-color marking algorithm, three-color marking – missed mark/multi-mark problem, YoungGC, FullGC log interpretation, causes of GC induction and so on.

There is no such thing as a “best GC” in the JVM’s GC architecture, and the best solution can be achieved by using the right GC collector for different scenarios. The comparison of the various GC collectors is as follows:

The GC collector The GC properties Function area GC algorithm features Application scenarios
Serial Serial recovery The new generation Replication algorithm Speed of response priority Single-core machine /client program
Serial Old Serial recovery Old generation Standard and integral algorithm Speed of response priority Single-core machine /client program
ParNew Parallel recovery The new generation Replication algorithm Throughput priority A program that calculates much/little interaction
Parallel Scavenge Parallel recovery The new generation Replication algorithm Throughput priority A program that calculates much/little interaction
Parallel Old Parallel recovery Old generation Standard and integral algorithm Throughput priority A program that calculates much/little interaction
Parallel Old Parallel/concurrent collection Old generation Bid – clearing method Speed of response priority Programs that interact too much/calculate too little