An in-depth look at how the G1 garbage collector is implemented

Algorithms related to the garbage collector

Before looking at G1, let’s briefly review the algorithms associated with the garbage collector. Garbage collectors are generally referred to more as Tracing Garbage collections than Reference Counting Garbage collectors. Tracking the garbage collector USES the accessibility analysis method to determine which objects will be recycled, often choose some object as GC Roots, if the object can be directly or indirectly by GC Roots of object references, argues that the object can reach object (survival) cannot be recycled, otherwise, the object is inaccessible (garbage object) to be recycled.

1.1 Three-color labeling algorithm

When determining which objects in memory are garbage objects, the simplest marking algorithm can be adopted, that is, each object in memory is given a special marking bit, and the marked object is considered as a viable object, otherwise, it is garbage object. Then, the object graph can be recursively traversed from the object set of GC Roots. Mark objects in the object graph if they can be referenced directly or simply by objects in GC Roots. In theory, the algorithm can determine which objects in memory are alive and which are garbage, but the entire process must be paused and all memory regions processed by the application.

In order to solve the above problems, Dijkstra et al. proposed tri-color Marking algorithm in the paper on-the-fly Garbage Collection: An Exercise in Cooperation. Languages like Go, JavaScript, and Java all use variations of the three-color tag algorithm for memory reclamation.

** Three-color marking algorithm will create white, gray, black three sets, three sets respectively store only white objects, gray objects, black objects. White objects, representing objects that have not yet been marked or that have been marked and identified as garbage; The gray set represents the object in the tag, that is, the object graph has traversed itself, but has not finished traversing its reference object; Black objects that are marked and confirmed to be alive (normally, the color of the object mark changes only from white to gray and from gray to black). At first, the black set is usually empty, the gray set contains objects directly referenced by GC Roots, and other objects are in the white set. An object can only be in one of the white, gray, or black sets at any one time. ** The processing flow of the usual three-color marking algorithm is as follows:

1. Move the objects directly referenced by GC Roots from the white set to the gray set for all objects except GC Roots at first.

2. Take a gray object from the gray set and process the objects referenced by the object in turn. If it does not reference any object, it is moved directly into the black collection; If the object referenced by the gray object is in the white set, it will be moved to the gray set. Otherwise, it will not be processed directly. When all the objects referenced by the gray object are processed, it will be moved to the black set.

3. Repeat the process in Step 2 until the grey set is empty.

4. After the processing of the above steps, GC Roots and objects in the black set are alive objects, while objects in the white set are garbage objects. The last thing to do is to clean the garbage objects in the white set.

The figure below shows the processing flow of the three-color labeling algorithm when there are 8 objects except GC Roots.

At first, all objects except those in GC Roots are in the white set on the right.
After the objects directly referenced by GC Roots are moved from the white set to the gray set, A objects and F objects have been moved from the white set to the gray set.
After processing the B object referenced by an object in the gray set, the B object has moved from the white set to the gray set.
After processing C objects and D objects referenced by A objects in the gray set. At this point, object A has been moved from the gray set to the black set, and object C and object D have been moved from the white set to the gray set.
After processing the remaining B objects, C objects, D objects and F objects in the gray set, the images of B objects, C objects, D objects and F objects have been moved from the gray set to the black set.
After the above processing, the gray set is empty, and the three-color marking stage ends and reaches the cleaning stage. The E object, G object and H object in the white set are cleaned, and the final result is shown in the figure below.

1.2 Shortcomings of the three-color marking algorithm

If the application thread runs with the GC thread of the tricolor labeling algorithm, object mislabeling and missing labeling may occur. Object mislabeling is when an object that was originally garbage is marked black and considered alive. This occurrence does not cause an application error, but simply delays garbage collection until the next garbage collection. The object missing label means that the object originally marked as black is omitted and not marked, which eventually leads to the object being garbage collected and recycled in the white collection. Once this happens, the application will encounter an unknown exception, which can be insignificant or fatal.

Take a look at the example above to see how missing marks can occur.

Assume that before the GC thread is ready to mark the next step, the object’s mark state looks like the figure above. At this point, the GC thread will next process the F object in the grey set, since the F object does not reference any objects it will move directly into the black set, and the whole grey set will end with an empty flag. But what if the application thread adds a reference from F to G before the GC thread has finished moving F from gray to black?

Since the F object has finished marking (the actual GC thread already considers the F object to be black), the F object will eventually be successfully moved from the gray collection to the black collection. The GC thread will not be aware of the application’s new F object to G object reference, resulting in G object missing flag. In fact, producing a missing mark must satisfy one of the following two conditions.

The application thread adds a reference to the black object to the white object during the GC thread markup

The application thread removes the reference from the gray object to the white object during the GC thread tag

The missing mark example above is actually the first case, In the 3.2.1 Incremental approache section of Uniprocessor Garbage Collection Techniques, GC is divided into Snapshot-at-beginning according to different points of concern when dealing with missed standard Collectors and Incremental Update collectors. Snapshot-at-beginning collectors are concerned with the first case, and Incremental Update collectors are concerned with the second, G1 belongs to snapshot-at-beginning collectors, and CMS belongs to Incremental Update collectors. The G1 concurrent marking process deals with the application thread adding a reference from a black object to a white object. When a black object references a white object, the black object is reset to grey (technically a pre-write barrier). The CMS marking process is concerned that the processing thread deletes the reference from the gray object to the white object, that is, when the gray object deletes the reference from the white object, the white object will be directly gray (technical implementation is a post-write barrier).

What kind of technology is used to deal with the above two cases? It is simply a matter of making the GC thread aware of changes in object references, known as write barriers. The write barrier mentioned here is not the write barrier at the hardware level, but the write barrier at the software level. Its essence can be understood as adding a section before the write operation of reference assignment, which can be divided into pre-write barrier and post-write barrier according to the joining time of the section point. Below is a pseudo-code implementation of the write barrier used in G1 (from [HotSpot VM] for advice on how G1 algorithm works).

void oop_field_store(oop* field, oop new_value) {
  pre_write_barrier(field);             // pre-write barrier: for maintaining SATB invariant
  *field = new_value;                   // the actual store
  post_write_barrier(field, new_value); // post-write barrier: for tracking cross-region reference
}
Copy the code

G1 uses pre-write barriers to ensure the integrity of snapshot-at-the-beginning SATB (SATB) to be processed during concurrent marking (details will be analyzed later on how to implement G1 SATB). That is, when the GC thread tracks the snapshot of the object graph generated at the beginning of marking, Changes made by application threads to the object graph snapshot are aware of through the pre-write barrier. G1, on the other hand, uses post-write barriers to maintain new inter-zone references generated by application threads that need to be tracked during concurrent markers (this will be explained later when analyzing G1’s RSet).

1.3 Object clearing implementation

When the object is marked, the object can be cleared. In the concrete implementation, three methods can be used to clear, copy and compress marks. Token clearing is the simplest and most efficient, because it removes junk objects directly, but it also brings the problem of memory fragmentation, and object allocation has to use the free space list algorithm rather than the efficient pointer collision algorithm. The token copy algorithm typically takes up 50% more space by always using half of memory to store objects and leaving the other half empty. When garbage is collected, the surviving objects in the used space are copied directly into the empty memory and then directly empty the previously used half of memory. The tag compression algorithm takes into account the advantages of tag clearance and tag copy algorithm. After garbage collection, the surviving objects will be moved accordingly and the fragmented memory space will be compressed as far as possible.

2 generational garbage collector

Before looking at the G1 garbage collector, it is necessary to briefly review the other garbage collectors in HotSpot VM. The other garbage collectors in HotSpot VM prior to G1 were based on the generation and generation of garbage collection, collectively referred to as generational garbage collection. G1, on the other hand, is a generational and partitioning garbage collector.

2.1 Generational Garbage Collector Garbage collection process

The generational garbage collector divides Heap into Young Generation and Old Generation, and before JDK1.8 there was Permanent Generation. The New generation is further divided into Eden, From Space and To Space. The size of “From Space” and “To Space” is equal To each other and is called Survivor Spaces.

The division of Heap into new and old generations is based on the weak generation assumption, which can be observed in Java applications as well as other applications.

1. Most assigned objects are not referenced long term (considered alive) i.e. they die young.

2. Old objects rarely hold references from new objects.

Generation garbage collection is relatively frequent, and the algorithms used are efficient and fast, because the young generation space is usually small and may contain many objects that are no longer referenced. The garbage collection frequency of the old generation is relatively low, but because the old generation occupies more memory, garbage collection of the old generation is usually more time-consuming. The new generation and the old generation store objects of different ages respectively. Generally, the objects that have just allocated memory are stored in the new generation. After each garbage collection, if the object is still alive, its age will be increased by 1.

As the garbage collection of the new generation is relatively more frequent, the garbage collection of the new generation pays more attention to the timeliness of garbage collection, and usually adopts the replication algorithm or the tag clearance algorithm to deal with garbage collection. The garbage collection of the aged generation pays more attention to the spaciality of garbage collection, that is, whether more continuous memory can be released after garbage collection, and usually adopts compression algorithm to process garbage collection.

Let’s take a quick look at how objects are allocated and transferred between Eden, Survivor, and Old Generation.

Any new objects are allocated to the Eden space of the new generation, and when the Eden region cannot accommodate new objects, a Young GC is triggered. Both Survivor Spaces are initially empty (below is the case after several GCS).

Objects that are still referenced in Eden Space during the Young GC (survivable objects) are copied to the first Survivor Space (S0 Survivor Space). Unreferenced objects in Eden space will be deleted. Any object that survives one Young GC increases its age by 1. The objects in S0 Survivor Space below all undergo only one Young GC and are marked as 1.
The next time the Young GC still references objects (survivable objects) are copied To the previously empty Survivor space (To Survivor space, actually S1 Survivor space), and unreferenced objects in Eden space are deleted directly. The former S0 suvivor space is now called a Form survivor space, where the dependent referenced object is copied into a previously empty To survivor space. Unreferenced objects on Form survivor space will be deleted directly. When an object that is still referenced is copied from a Form survivor space To a To survivor space, its object age increases by one, indicating that the object has undergone another Young GC.

In the next Young GC, the same process is repeated. But then the Survivor space roles swap, so that From Survivor space becomes To Survivor space, and To Survivor space becomes From Survivor space. The purpose of this exchange is essentially to copy surviving objects from a used Survivor space into a cleared Survivor space.

** Surviving members of the new generation will be promoted to the old generation after many Young GC sessions. The figure below shows that when the MaxTenuringThreshold parameter is 8, surviving objects are promoted From From survivor space of the new generation to the old age. **

As Young GC continues to occur, the surviving objects of the new generation will continue to be promoted to the old generation. Eventually there will be no more room for newly promoted objects in the old age, at which point the Major GC is raised.

A detailed flow chart of when GC is triggered during object assignment and promotion can be found below (see the figure in Chapter 4 of Code Out Efficiently: The Java Development Manual, which goes into the JVM) :

The details of Thread Local Allocation Buffer (TLAB) and Promotion Local Allocation Buffer (PLAB) are not depicted in the figure above. In addition, the Full GC in the figure above can be confusing because it is so easily confused with the Major GC. The actual JVM specification and garbage collection literature do not define the Full and Major GC. In general, Full GC is considered to recycle garbage for both the new generation and the old generation, while Major GC is specifically for the old generation. The problem is that the Young GC causes the old garbage collection. Is it better to call the Full GC or the Major GC? Personally, I think Full GC is more appropriate, so you don’t have to worry too much about this. What is the difference between Minor vs. Major vs. Full GC and Major vs. Full GC? What are the trigger conditions? .

The garbage collector is not only responsible for the collection of memory, but also responsible for the allocation of objects. For more information on how to get started, see Memory Management in the Java HotSpot™ Virtual Machine, Plumbr Handbook Java Garbage Collection. If you want to learn more about garbage collection, check out the Garbage Collection Algorithm Manual: The Art of Automatic Memory Management.

2.2 Serial garbage collector

The Serial GARBAGE collector has only a single GC thread collecting garbage. Serial garbage collectors are generally simpler to implement, do not maintain complex data structures internally, and have a lower memory overhead. However, because only a single GC thread is doing garbage collection during STW (Stop The World), The garbage collection time is usually long and increases linearly with The memory footprint of The application. The garbage collector is suitable for scenarios with small memory footprint such as Client and embedded devices.

In The figure above, The gray arrow is The application thread and The black arrow is The GC thread. The application thread is usually multi-threaded while working. When The application thread stops working, it is also called SWT (Stop The World), and The serial garbage collector will start a GC thread to complete The garbage collection. Serial garbage collectors are usually divided into Serial New and Serial Old according to different generations of collection. They are responsible for collecting Young Generation and Old Generation respectively. Serial New uses the replication algorithm to clean up garbage, and Serial Old uses the compression algorithm to clean up garbage.

2.3 Parallel garbage collector

It is clear that serial garbage collectors cannot take advantage of multi-core cpus when garbage collection is under multi-core CPU architectures. Therefore, Parallel GC is deployed. The biggest difference between the Parallel GC and the serial GC is that in STW, the threads for garbage collection are changed from a single thread to multiple threads. Compared to the serial garbage collector, the overall time per GC is significantly reduced because the garbage collection work is divided among multiple threads. When the parallel garbage collector works, both the new generation and the old generation use threads to handle garbage collection in parallel. Like serial garbage collectors, Parallel garbage collectors are usually divided into ParNew and Parallel Old according to different generations of collection. They are responsible for collecting Young Generation and Old Generation respectively. Similarly, the new generation garbage recycling adopts replication algorithm to complete garbage cleaning, and the old generation adopts compression algorithm to complete garbage cleaning.

In the figure above, the gray arrows are application threads and the black arrows are GC threads. The parallel garbage collector’s thread that collects garbage at STW becomes multiple threads compared to the serial garbage collector, and these threads collect garbage at the same time.

2.4 Concurrent marking removes the garbage collector

Concurrent Mark Sweep (CMS) is a true Concurrent garbage collector on Hotspot VM. Concurrent refers to the GC thread working with the application thread, while the GC thread does not use STW while the application thread is also working, while Parallel refers to multiple GC threads working at the same time to clean up garbage. Many references refer to application threads as Mutator threads.

CMS is mainly responsible for recycling garbage of the old generation. When CMS is used, garbage collection of the New generation is usually completed by Serial New or ParNew, and the default garbage collector of the New generation is ParNew. When CMS collects old generation garbage, the overall garbage collection process is divided into multiple stages, and most of the stages are concurrent with application threads without STW. The overall garbage collection process of CMS can be divided into initial-mark, Concurrent Marking, Concurrent pre-cleaning, re-marking and Concurrent cleaning Sweeping), initializing marks and remarking all occur STW, but usually in a relatively short time. CMS initialization mark to mark in its earlier version is done by a single thread, later can pass – XX: + CMSParallelInitialMark and – XX: CMSParallelRemarkEnabled initializes the mark to mark phase respectively designated as multithreaded. Many Young GC may occur in the new generation when CMS carries out concurrent collection for the old generation. At this time, garbage collection of the old generation will be interrupted immediately and resumed after the end of the Young GC.

The gray arrow is the application thread and the black arrow is the GC thread. CMS starts multiple GC threads to mark GC Root in the Initial-mark phase, which is usually short. The CMS will enable multi-threading during both the concurrent marking and concurrent pre-cleanup phases, when the GC thread and the application thread work concurrently. In the figure above, the long black arrow in the Concurrent Making pre-cleaning stage represents the GC thread processing Concurrent Making work, and the short black arrow represents the thread processing pre-cleaning work. CMS also starts multiple GC threads during re-marking and SWT as in the initial-mark phase. The CMS GC thread works concurrently with the application thread during the concurrent cleanup phase.

One of the key issues in CMS tuning is finding the right time for the CMS to start concurrent work so that the CMS can complete all the concurrent work before the application runs out of available heap space. Concurrent work on the CMS usually starts after a Young GC, because the CMS initail-mark usually has fewer objects to mark after a Young GC. Another problem of CMS is the memory fragmentation problem of the aged generation. As CMS uses the tag clearing algorithm to reclaim the aged generation, the tag clearing algorithm is more efficient than the compression algorithm. However, since there is no memory compression and collation during garbage cleaning, the memory fragmentation problem will inevitably occur. The following two conditions result in concurrent mode failure due to memory fragmentation.

1. During Young GC, the surviving objects in Eden area are too large to be stored in Survivor area, leading to promotion failed. At this time, objects can only be placed in the aged generation, but due to the memory fragmentation problem, the aged generation also cannot place the objects. Finally, a Concurrent mode failure occurs, which causes a Full GC to reclaim the entire Heap space, resulting in a sudden increase in STW duration.

2. During the Young GC, the age of the surviving object in the Survivor region exceeds MaxTenuringThreshold and is promoted to the aged generation. However, due to memory fragmentation, the aged generation cannot put the object, and a Concurrent mode failure occurs, which triggers the Full GC.

Understanding GC pauses in the JVM, HotSpot’s CMS Collector. For more information about CMS tuning practices, refer to the nine common CMS GC problems that appeared in Java in these two articles

Here is a classic picture of how the garbage collector in HotSpot VM is combined to handle young and old generations. The Serial New, ParNew, and Parallel Avenge avenge are insane and insane. The Serial Old, Parallel Avenge, and the Parallel Avenge avenge are insane and insane. G1, in the middle, can deal with both the new and the old. The solid black lines in the figure represent which new generation garbage collectors work with which old generation garbage collectors. The dotted black line between CMS and Serial Old indicates that the CMS fails to be Full when a Concurrent mode failure occurs.

The Serial GC (Serial New & Serial Old), the Parallel GC (ParNew, Parallel Avenge, Parallel Old), CMS, Since the memory layout of the new generation and the old generation is continuous (virtual memory is continuous), these garbage collectors can only deal with a specific partition or the entire Heap when collecting garbage. This inevitably leads to a more or less linear positive correlation between the STW time of garbage collection and the memory occupied by the application, that is, the larger the memory occupied by the application, the longer the STW time will be when garbage collection is performed. While previous garbage collectors were generational garbage collectors, G1 pioneered the partitioning garbage collector (although G1 logically also has the concept of new generation and old generation). G1 divides the Heap into regions of equal size using didivide and conquer. Memory management can be implemented for these regions rather than for a specific Generation. Since the size of a Region is much smaller than that of a Generation, processing multiple regions is more efficient than processing one Generation.

3 overview of G1 garbage collector

The G1 (Garbage First) Garbage collector is another generational Garbage collector that followed the CMS collector and pioneered the partitioned Garbage collector. G1 tries its best to meet users’ requirements for pause time through time prediction model (users can specify the maximum pause time for garbage collection by -xx :MaxGCPauseMillis=XXX). G1 uses compression algorithm to optimize the partition where garbage collection is more. So it is called the Garbage First Garbage collector.

3.1 Memory layout in G1 garbage collection

G1 has the same concept of Eden Generation, Survivor Generation and Old Generation as the traditional generational garbage collector introduced above, but the biggest difference is that the relationship between these generations is logical. The memory layout of each Generation does not have continuity. G1 divides the Heap into regions, each of which has a size of 2 to the power of N and a value between 1M and 32M. Each Region belongs to a Generation, hence the concept of Eden Region, Survivor Region, and Old Region/Tenured Region (unlike a traditional generational garbage collector, There is no From and To Survivor in G1, because both Young GC, Mix GC, and Full GC objects are moved From one Region To another or cleared. G1 also has a Humongous Region for storing large objects (objects that occupy more than half the memory of a Region by default). Humongous Region may consist of multiple regions. However, a Region can store a maximum of one large object. If multiple regions are used to store a particularly large object, these regions are contiguous in layout. After many Young GC’s, Mix GC’s, Full GC’s, and object allocations (more on that later in G1), The roles of Eden Region, Survivor Region, Old Region, and Humongous Region change. That is, after the Region containing a specific Generation object is cleared, it can be used to store one of Eden objects, Survivor objects, Old objects, or Humongous objects.

The memory layout that divides memory into regions is more conducive to memory reclamation. Garbage collection can manage a small area of memory (dealing with memory allocation and reclamation) using the idea of divide-and-conquer. This avoids the Old Generation dilemma where garbage collection can only handle the entire Old Generation (processing the entire Old Generation together is usually very time-consuming and STW cannot be avoided in this process).

3.2 G1 Garbage collection cycle

From a global perspective, the G1 collector’s garbage Reclamation process alternates between the Young-only phase and the Space Reclamation phase, The following figure is from the HotSpot Virtual Machine Garbage Collection Tuning Guide on Oracle website.

Young – only stage

The Young-only phase actually involves multiple Young GCS and the entire concurrent marking process. It starts with some ordinary Young GC (the small blue dots in the figure above represent ordinary Young GC) and promotes the objects that meet the criteria to the old age. The old generation of memory more than threshold, will trigger the concurrent mark phase, the threshold value by the parameter – XX: InitiatingHeapOccupancyPercent = 65%, specify a default value of 65%. At the same time, G1 will turn on concurrent Young GC (the big blue one in the figure above represents concurrent Young GC) instead of regular Young GC. The entire concurrent marking phase is alternated with the normal Young GC. The concurrent Marking stage can be subdivided into Initial Marking, re-marking and Cleanup stages. The initial marking phase is actually done in a concurrent Young GC (commonly described in the literature as piggybacking). After the initialization mark is complete, several ordinary Yong GC may occur before entering the Remark phase (the yellow dot near the top in the figure above). The concurrent marker may then be interrupted by the Young GC and then the Cleanup phase (the yellow dot near the bottom of the figure).
Space Reclamation phase

When Cleanup is over, G1 enters the Space Reclamation phase, which consists of several mixGCs. Each MixGC cleans up some of the objects marked in the previous concurrent marking phase, and the MixGC is accompanied by some Young GC (the red dot in the figure above represents a MixGC). When G1 finds that there is not enough Space for clearing objects, it stops MixGC, and the Space Reclamation phase ends at the same time.

When the Space Reclamation phase ends, the G1 collection cycle starts again with a young-only phase. The trigger condition of the normal Young GC in young-only is the same as that of the previous generation garbage collector, Young GC, except that G1 handles for regions. That is, G1 decides whether to enable the Young GC based on whether there are any regions in Eden Region that can accommodate new objects. As a back-of-the-envelope strategy, G1 uses Full GC to process all regions when G1 garbage collection does not free enough memory to meet the memory requirements of new objects in the application.

3.3 Memory Set (RSet)

It has been learned that Young GC only processes the corresponding regions of the new generation, namely Eden Region and Survivor Region, which helps reduce the time of each Young GC. But what if Eden Region and Survivor Region hold references from older ages? Does the Young GC have to traverse all the regions in the Heap to determine which Eden Region and Survivor Region objects are garbage? This approach is obviously not desirable, as it will lengthen the Young GC.

An effective way to do this is for each Region of the new generation to maintain a collection of cross-generation references to the point-in of the old generation, so that when the Young GC is looking at the point-in collection, this collection is called the memory Set (RSet). Do you need this RSet in the old generation? As mentioned above, each Mix GC will reclaim part of the old generation’s regions. Without this memory set, the whole old generation’s regions will be scanned just like the Young GC. Therefore, the old generation’s regions also need to maintain a point-in collection. A set of Old Region point-in records, and a set of Young Region point-in records.

What about the implementation of RSet? Before you do that, you have to know the CardTable, which was in CMS before G1. CardTable is essentially a point-out data structure, indicating that a certain area has references to other areas. In G1, CardTable consists of byte arrays, and each element of the array is called card/CardPage. The CardTale maps to the entire heap, and each CardPage corresponds to 512B space in the heap. As shown in the figure below, in an 8GB heap, the length of CardTable is 16777215 (8GB / 512B); If -xx :G1HeapRegionSize is 2MB, that is, the size of each Region is 2MB, each Region has 4096 cardPages. CardTable will take up 16MB of additional memory space.

To find the CardPage of an object, you only need to apply the following formula.

CardPageIndex = (address of the object – start address of the heap) ÷ 512

RSet is implemented using HashMap. The HashMap’s key references the addresses of other regionRs of the Region, and its value is an array. The elements of the array are the subscripts of the CardPage in the CardTable of the object referencing the party.

As shown in the figure above, object Y in region B refers to object X in region A. The reference relationship spans two regions. In the RSet of region A, the address of region B is taken as the key, and the reference relationship is recorded with the subscript 79 of CardPage of object B as value, thus completing the record of cross-region reference. However, the granularity of this CardTable is a little coarse. After all, a CardPage has 512 bytes, and there may be multiple objects in a CardPage. Therefore, the whole CardPage associated with RSet needs to be scanned when scanning tags. The example in the figure above is to scan all cardpages with CardTable subscript 79.

In fact, G1’s RSet implementation in HotSpot VM is more complex than this (this is just one case, with Sparse granularity). There may be frequent update references in the application that make rsets of some regions popular regions. G1 uses different granularity to deal with RSet Popularity. RSet can be divided into Sparse, Fine and Coarse granularity. Rsets with different granularity use different data structures to record references of other regions point-in. This is the case with Sparse granularity. The following is a simplified definition of RSet data structure in G1 given in the paper Evaluating and improving remembered Sets in the HotSpot G1 garbage Collector.

Sparse Grained (SAPrSE in G1_rSET data structure above)

In the case of sparse granularity, a HashMap is adopted. The key of the HashMap refers to the address of another Regionr of the Region, and the value is an array. The elements of the array are subscripts of the CardPage corresponding to the reference object in the CardTable.

173 Fine Grained (in the G1_rset data structure above)

The key of the HashMap references the addresses of other regionRs of the Region. The value is a bitmap. The maximum number of bits in the bitmap represents the maximum number of cardpages that a Region can be split into. A bitmap value of 1 indicates that there are objects in CardPage of Region that reference the Region to which RSet belongs.

Coarse Grained (in the G1_rset data structure above)

In the coarse-grained case, a bitmap is used. The maximum number of bits in the bitmap represents how many regions the Heap can be split into. If the value is 1 on the bitmap, it indicates that there are objects in other regions that reference the Region to which RSet belongs. Because regions are of the same size, the starting address of each Region in the bitmap can be calculated from the starting address of the Heap.

G1 typically uses Refined Threads to maintain Rsets asynchronously, Each thread uses the post-write barrier to record cross-generation references and Old generation-to-generation references in its own local log buffer. When the local buffer becomes full, it is flushed to the global buffer, and Refinement Threads maintains the RSet by processing the global buffer. When Refinement Threads cannot efficiently handle the global log buffer, the application Threads will process the log buffer together, but this can be costly to the application Threads’ performance. If the global log buffers have not been processed during garbage collection, the GC thread will process them.

void oop_field_store(oop* field, oop new_value) {
  pre_write_barrier(field);             // pre-write barrier: for maintaining SATB invariant
  *field = new_value;                   // the actual store
  post_write_barrier(field, new_value); // post-write barrier: for tracking cross-region reference
}
Copy the code

3.4 Collecting Data Back (CSet)

A Collection Set (CSet), which represents a series of target partitions that are reclaimed each time a GC pauses. During any collection pause, all partitions of the CSet are freed, and internal surviving objects are moved to the allocated free partition. So whether young generation collection or mixed collection, the working mechanism is the same. The young generation collection CSet only contains the partitions of the young generation, while the mixed collection will select the partitions with the highest recycling returns from the candidate recycling partitions of the old generation through the heuristic algorithm and add them to the CSet.

4 In-depth analysis of G1 garbage collection

Heap layout, global cycle, RSet implementation, AND CSet in G1 have been briefly introduced, and the Young GC phase, concurrent markup phase, and Mix GC phase in G1 will be introduced in more detail.

4.1 Young GC Stage

As with generational garbage collector, Young GC in G1 is triggered when there is no Eden Region in G1 capable of holding new objects to be created. At the same time, each thread has a corresponding TLAB, and small objects are preferentially created directly in the TLAB. It has been learned that only all Young regions, Eden regions and Survivor regions will be reclaimed in G1’s Young GC phase. At the same time, if the memory ratio of the aged generation exceeds the specified threshold, the Young GC completes and initializes the bid phase together. After each Young GC, G1 will readjust the Cenozoic size according to the current Cenozoic size, minimum Cenozoic value, maximum Cenozoic value, target pause time and so on. Here’s a look at the GC logs for the general Young GC to see what phases are included in the Young GC. The JDK uses HotSpot 1.8.0_241 JDK and JVM parameters are as follows:

-XX:+UseG1GC -XX:G1HeapRegionSize=2m -Xms2g -Xmx2g -Xloggc:/Users/mac/Desktop/g1log -XX:+PrintGCDetails

-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

2021-12-29T10:03:58.217- 0800:0.244: [GC pause (G1 Evacuation Pause) (young), 0.0914253 secs] [Parallel Time: 90.3ms, GC Workers: 8] [GC WorkerStart (ms): Min: 244.0, Avg: 244.1, Max: 244.1, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 0.3, Max: 0.7, Diff: 0.7, Sum: 2.2]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0.0] 0] [ScanRS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2Copy (ms): Min: 89.1, Avg: 89.6, Max: 89.8, Diff: 0.7, Sum: 716.5]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.8] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 0.8] 8] [GC WorkerOther (ms): Min: 0.1, Avg: 0.2, Max: 0.2, Diff: 0.1, Sum: 1.3]
      [GC Worker Total (ms): Min: 90.1, Avg: 90.1, Max: 90.2, Diff: 0.1, Sum: 721.0]
      [GC Worker End (ms): Min: 334.2, Avg: 334.2, Max: 334.2, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.1ms] [Other: 1.1ms] [Choose CSet: 0.03ms] [Ref Proc: 0.3ms] [Ref Enq: 0.03ms] [Redirty Cards: 0.1ms] [Humongous Register: 0.1ms] [Humongous Reclaim: 0.0ms] [Free CSet: 0.0ms] [Eden: 102.0M(102.0M)- > 0.0B(88.0M)Survivors: 0.0 B - > 14.0 M Heap: 102.0M(2048.0M)- > 98.0M(2048.0M)]
 [Times: user=0.12 sys=0.28, real=0.09 secs] 
Copy the code

The GC Pause (G1 Evacuation Pause) (young) shown in the first line of the GC log above represents the Young GC for which THIS GC is paused as G1. The work of Young GC is divided into Parallel work and Other work, specifically [Parallel Time: 90.3ms, GC Workers: 8] and [Other: 1.1ms] in GC log. Parallel work, which is the main work of the Young GC, is broken down into the following parts.

External Root Scanning (Ext Root Scanning (MS):…)

Handles scanning external roots pointing to csets, such as registers, thread stacks, and so on.

Update RS (ms:…); Update RS (ms:…);

Update the RSet. RSet is used to record incoming references of other regions.

3, Processed Buffers (Processed Buffers:

As mentioned earlier, rsets are maintained by writing log Buffers first and then updating them. Processed Buffers are those that have not been Processed by the Refinement Thread since the start of the Young GC, ensuring the integrity of the RSet.

Scan RSets (Scan RS (MS):…)

Scan for references to this Region from other regions in RSets. This scanning time varies greatly due to different RSet Popularity and different granularity of data structure storage. Coarse takes the most time to scan in local mode.

5, Code Root Scanning (GC log [Code Root Scanning (MS):…)

Responsible for scanning the reference root of the compiled source code in CSet.

[Object Copy (MS):…

Responsible for copying surviving dependent objects from the new generation Region, namely Eden Region and Survivor Region, to the unused Survivor Region or the promoted objects to the Old Region.

7, Termination (Termination Attempts (MS):… and Termination Attempts (MS):…) in the GC log

After each GC Work thread completes its own Work, a finalization phase occurs, in which the completed Work of the Work thread synchronizes with the other Work threads and attempts to use the Work stealing algorithm to capture the Work of other threads that have not completed their Work. The Termination Attempts part represents that the Work thread successfully obtains Work and then Attempts again after processing. The process also tries again to acquire tasks from other unfinished worker threads.

Other work, this section is mainly some of their tasks including selecting Regino into CSet, reference processing, reference into queue, re-marking card pages as dirty pages, releasing CSet, processing Humongous objects, etc. For a more detailed explanation of G1 GC logs, see Collecting and Reading G1 Garbage Collector Logs – Part 2.

The old generation accounted for more than a memory – XX: InitiatingHeapOccupancyPercent specified threshold, the Young, the GC will be done by the concurrent initialization of the marking job. At this point, key information for GC Pause (G1 Evacuation Pause) (Young) (Initial-mark) will appear in the GC log, where initial-mark represents the initialization marker work that was slightly completed during the Young GC.

4.2 Specific implementation of SATB in G1

The garbage collector usually uses the TRI-color Marking algorithm to analyze object references in the Marking phase. G1 adopts the snapshot-at-the-beginning (SATB) algorithm proposed by Yuasa in real-time Garbage Collection on General Purpose Machines to deal with label omission.

The SATB algorithm ensures that all junk objects are identified by the snapshot after the concurrent marking begins. Newly allocated objects during concurrent tagging are considered alive and do not need to be traced, which helps reduce tagging overhead. G1 maintains two global bitmaps for concurrent markup, labeled Previous and Next. The previous bitmap holds the mark information of the previous concurrent mark, and the next bitmap holds the mark information of the current ongoing or recently completed concurrent mark. The mark information of the previous concurrent mark in the previous bitmap can be used directly in this issue. At the same time, each Region has several important Pointers PTAMS (the start position of the last concurrent mark), NTAMS (the start position of the next concurrent mark), Bottom (the start address of the Region), Top (the used address of the Region), and End (the End address of the Region). TAMS implementation is the abbreviation of Top at Mark Start, that is, each concurrent marker will put the corresponding pointer at the same position of the top finger, representing the end of the marker. At the beginning of each concurrent mark, the NTAMS pointer repoints to the position of the Top pointer. When the concurrent mark ends, the NTAMS pointer and PTAMS pointer interconnect, the next bitmap and the previous bitmap swap roles, and the new next bitmap is cleared. That is, the previous bitmap is cleared.

The figure above shows the positions of the Pointers to Bottom, PTAMS, NTAMS, Top, and End in a Region and the meanings of the areas between the Pointers. White, gray and black in the interval can be roughly understood as the three colors in the three-color marking method.

[Bottom, PTAMS) range

This area represents the interval of the last concurrent mark, PTAMS is the end position of the last concurrent mark, and the information of the last concurrent mark in this area can be directly used by the ongoing concurrent mark, that is, the ongoing concurrent mark knows which are garbage and which are viable objects in this area through the last bitmap.

[PTAMS, NTAMS) range

NTAMS is the end of the concurrent mark. At the beginning of the concurrent mark, G1 will create a snapshot for the [PTAMS, NTAMS) interval, which is actually the next bitmap, and then process the address of the bitmap mapping. Mark objects on these addresses as garbage or alive. The actual marking process is to have a pointer move from PTAMS pointer position all the way to NTAMS pointer position.

[NTAMS, Top) range

This area represents objects newly generated by the application thread during concurrent marking. As mentioned earlier, newly allocated objects are considered alive during concurrent marking, so this area is all black in the figure above. Concurrency marks start with the Top pointer in the same position as the NTAMS pointer, and each time the application thread generates a new object, the Top pointer moves to the right of the End pointer accordingly.

[Top, End) range

This area represents a Region that is not yet used.

Obviously, the GC thread will only handle the **[PTAMS, NTAMS) interval to complete the marking, while the application thread will affect the [Bottom, Top) interval. The effect of the application thread on the [NTAMS, Top] interval in the [Bottom, Top) interval does not affect the concurrent marking of the GC thread, because objects added to this part of the application thread are considered alive. G1 uses the pre-write barrier to ensure that the “Bottom, Top” [PTAMS, NTAMS] interval is correct. That is, if the application thread adds a reference to a white object within the [PTAMS, NTAMS) interval, the pre-write barrier will set the white object to a gray object, so that the object can be marked again without missing the mark. The effect of the application thread on the [Bottom, PTAMS) interval ** may affect the concurrent marking of the GC thread.

With that in mind, it should be clear to look at this diagram in the Initial Marking Pause/Concurrent Marking section of Sun G1’s garbage-First Garbage Collection.

In the Initial Marking phase, when the Region is marked for the first time, the PrevBitmap is empty, and there are block photos of **[PrevTAMS, NextTAMS) interval ** in NextBitmap. After the Marking of concurrency, which Bitmap objects are garbage objects will be determined. The PrevTAMS pointer is in the same position as the Bottom pointer, and the NextTAMS pointer is in the same position as the Top pointer.

In Remark phase, the live object and garbage object in the [PrevTAMS, NextTAMS) interval are marked, and the NextBitmap changes, where the black part represents the marked live object and the white part represents the garbage object. Also, as the application generates new objects, the position of the Top pointer is moved to the right from the NextTAMS pointer.

· In the Cleanup/GC Pauses stage, NextBitmap and PrevBitmap switch roles, and the NextTAMS pointer and PrevTAMS pointer switch positions.

In the next Marking phase, NextTAMS pointer points to Top pointer again, PrevBitmap guarantees the last Marking information, NextBitmap contains block photos of **[PrevTAMS, NextTAMS) interval **.

Mark the Remark phase again and repeat the above B thing.

· Indicate the Remark stage, Cleanup/GC Pauses, and then repeat the tasks of C above.

4.3 Concurrent marking phase

In the concurrent marking phase, garbage objects to be collected during the Mix GC are marked first and then sorted according to the memory space that can be freed by the Region. At the end of the marking phase, regions with no living objects are directly freed and added to the list of available regions. Concurrent Marking stage can be divided into five stages: Initial Mark, Root Region Scanning, Concurrent Marking, Remark, Cleanup, etc.

Initial stage of Mark

Accounted for more than the old generation of memory – XX: InitiatingHeapOccupancyPercent triggered when the specified threshold concurrent tags, concurrent marks the first stage of Initial Mark, the stage will be STW, its only scan GCRoot direct reference object, Since Young GC also scans objects referenced directly by GCRoot, Young GC completes Initial Mark’s work in passing. GC logs typically have key information for GC Pause (G1 Evacuation Pause) (Young) (initial-mark), where initial-mark represents the initialization marker work that is done incidentally while the Young GC is being performed.

Root Region Scanning

The actual scan is for objects referred to by the Survivor Region of the new generation, and this phase must be completed before the next GC pause, because Survivor Region references must be recognized before the Heap scans for surviving objects.

Concurrent Marking phase

Concurrent flag phase GC threads are concurrent with application threads, and the number of concurrent GC threads can be specified with -xx :ConcGCThreads. G1 uses a pre-write barrier to solve the problem of concurrent marker overcrowding caused by application threads updating the snapshot of the object graph created at the beginning of the concurrency, per thread. In the concurrent marking phase, each Region object is counted, which Region can reclaim more memory.

Remark phase

This phase is actually the last phase of the marker, SWT, and is responsible for processing the remaining references. This phase processes the unprocessed references of the previous SATB write Barrier record. However, it is essentially different from CMS remark, that is, G1 remark only needs to scan SATB buffer to suspend, while CMS remark needs to rescan all the dirty cards in the whole root set, and the whole Cenozoic generation will be regarded as part of the root set. Thus CMS Remark is likely to be very slow.

The Cleanup phase

This phase is STW, whose main job is to reset the tag state, such as the previous NextBitmap and PrevBitmap swap roles, and the NextTAMS pointer swap places with the PrevTAMS pointer. If a Region is not found, the system clears the Region and adds the Region to the free Region list. Of course, statistics on how much garbage can be collected by each Region are also done in this phase so that the CSet can be quickly determined when the Mix GC objects are moved.

4.4 Mix GC Phase

The MixGC stage is mainly responsible for recovering some regions of the old generation and all regions of the new generation. G1 will recycle regions marked in the concurrent marking stage according to the target pause time set by -xx :MaxGCPauseMillis. That is, a MixGC phase can contain multiple mixGCs, and G1 will stop MixGC when it finds that the memory acquired by garbage free objects is too small. Key information for **GC Pause (G1 Evacuation pause) (mixed)** will appear in the GC log when MixGC. As a back-of-the-envelope strategy, G1 uses Full GC to process all regions when G1 garbage collection does not free enough memory to meet the memory requirements of new objects in the application.

conclusion

This article analyzes the implementation principle of G1 garbage collector in detail, although there is no specific source code related to G1 garbage collector, but the basic knowledge of G1 are analyzed by shallow and deep. Hope to finish reading this article you will gain, limited to my limited ability in the article is not correct hope correction. Finally, welcome to pay attention to personal public insight source code.

reference

Java Performance Companion

Handbook of Garbage Collection Algorithms the Art of Automatic Memory Management

Memory Management in the Java HotSpot™ Virtual Machine

Plumbr Handbook Java Garbage Collection

Java Garbage Collection Basics

G1: One Garbage Collector To Rule Them All

HotSpot Virtual Machine Garbage Collection Tuning Guide

Collecting and reading G1 garbage collector logs – part 2

[HotSpot VM] Learn the principle of G1 algorithm

Java Virtual Machine 07 – Garbage Collector G1

Garbage-First Garbage Collection

Minor GC vs Major GC vs Full GC

Real-Time Garbage Collection on General Purpose Machines

memorymanagement.org Tri-color Marking

Wiki Tri-color Marking

On-the-Fly Garbage Collection: An Exercise in Cooperation

Uniprocessor Garbage Collection Techniques

Evaluating and improving remembered sets in the HotSpot G1 garbage collector

The interviewer asked me how does the G1 collector know when you are garbage?

At the end

Original is not easy, praise, look, forwarding is a great encouragement to me, concern about the public number insight source code is my biggest support. At the same time believe that I will share more dry goods, I grow with you, I progress with you.