Garbage collector

If collection algorithms are the methodology of memory reclamation, garbage collectors are its practice: the concrete implementations.

Serial

The Serial collector is the oldest and most basic collector. It is single-threaded, which means more than just using one processor or one collection thread for garbage collection: more importantly, it must suspend all other worker threads until the collection has finished.

The Serial collector is simple and efficient compared with other collectors, and in memory-constrained environments it has the smallest memory footprint of any collector. On user desktops and in some of the micro-service applications popular in recent years, the memory managed by the VM is not very large, which makes Serial a good fit.

ParNew

ParNew is essentially a multithreaded, parallel version of the Serial collector. Apart from using multiple threads simultaneously for garbage collection, it behaves exactly like Serial. It was the preferred young-generation collector in legacy systems prior to JDK 7, because apart from Serial it is the only collector that works with CMS.

By default it starts as many collection threads as there are processor cores. You can limit the number of garbage collection threads with the -XX:ParallelGCThreads parameter.

Parallel Scavenge

The Parallel Scavenge collector is also a young-generation collector, and it too is based on the mark-copy algorithm.

Parallel Scavenge differs from collectors such as CMS, which focus on minimizing the time user threads are paused during garbage collection. The goal of Parallel Scavenge is to achieve a controllable throughput, that is, the time the CPU spends running user code as a percentage of the total CPU time consumed (user code time plus garbage collection time).

The Parallel Scavenge collector provides two parameters for precise throughput control: -XX:GCTimeRatio sets the throughput target directly, and -XX:MaxGCPauseMillis sets the maximum garbage collection pause time, which the collector does its best to stay within.
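As a worked example of the throughput definition, the sketch below does the arithmetic in Java. It assumes the commonly documented reading of -XX:GCTimeRatio, where a value of N asks the collector to keep GC time at or below 1/(1+N) of total time. The class and method names are hypothetical.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
final class ThroughputMath {
    /** Throughput = time running user code / (user code time + GC time). */
    static double throughput(double userCodeTime, double gcTime) {
        return userCodeTime / (userCodeTime + gcTime);
    }

    /** Assumed meaning of -XX:GCTimeRatio=N: GC may use at most 1/(1+N) of total time. */
    static double maxGcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 minutes of user code plus 1 minute of GC.
        System.out.println(throughput(99, 1));
        // Fraction of total time allowed for GC when the ratio is 19.
        System.out.println(maxGcTimeFraction(19));
    }
}
```

Under this reading, a ratio of 19 allows up to 5% of total time for garbage collection, and a ratio of 99 allows up to 1%.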

Parallel Scavenge is also known as the “throughput-first collector”.

Serial Old

Serial Old is the old-generation version of the Serial collector. It is also single-threaded and uses the mark-compact algorithm.

It is mainly intended for VMs running in client mode.

On a server it has two possible uses: paired with the Parallel Scavenge collector in older JDK versions, or as the fallback for the CMS collector when a concurrent collection fails.

Parallel Old

Parallel Old is the old-generation version of the Parallel Scavenge collector. It collects with multiple threads and is based on the mark-compact algorithm. It was not available until JDK 6.

The combination of Parallel Scavenge and Parallel Old is preferred in applications where throughput is important or where processor resources are scarce.

CMS

CMS (Concurrent Mark Sweep) has long been one of the most commonly used JVM garbage collectors.

CMS is a concurrent garbage collector that uses the mark-sweep algorithm. It lets the GC threads run concurrently with user threads whenever possible, which avoids long stop-the-world (STW) pauses.

CMS does not collect the young generation; by default it collects only the old generation. CMS can additionally collect the permanent generation (or Metaspace) to avoid a Full GC caused by PermGen space running out. In JDK 6 this is controlled by -XX:+CMSClassUnloadingEnabled; the option is disabled by default before JDK 8 and enabled by default in JDK 8.

CMS has to be paired with a young-generation collector, an arrangement known as “generational collection.” The young-generation collectors that work with CMS are Serial and ParNew; ParNew is the usual choice because it supports multi-threaded execution.

With the CMS GC strategy, collections fall into three categories: Young GC (also known as Minor GC), Old GC (also known as Major GC or CMS GC), and Full GC. A Full GC collects the whole heap; its STW pause is long and has a great impact on the business, so Full GC should be avoided as far as possible.

CMS Working Process

The whole collection process is divided into four steps. Concurrent marking follows the initial mark phase and traces onward from the objects found there; because application threads and the marking thread run concurrently during this phase, the user does not perceive a pause:

  1. CMS Initial Mark (stop the world): marks only the objects that GC Roots reference directly, which is fast.
  2. CMS Concurrent Mark: traverses the entire object graph, starting from the objects directly associated with GC Roots. This is time-consuming but does not require user threads to be paused.
  3. CMS Remark (stop the world, incremental update): corrects the mark records of those objects whose marking changed because the user program kept running during concurrent marking.
  4. CMS Concurrent Sweep: cleans up and deletes the objects judged dead in the marking phases. Since surviving objects do not need to be moved, no STW is needed in this phase.
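The mark and sweep halves of these steps can be illustrated with a toy, single-threaded sketch over a named object graph. It deliberately ignores everything that makes CMS hard (concurrency, remark, incremental update), and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy, single-threaded mark-sweep; not how CMS is implemented.
final class ToyMarkSweep {
    /** Mark: collect every object reachable from the GC roots. */
    static Set<String> mark(Map<String, List<String>> heap, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) {
                // First visit: queue the objects this one references.
                pending.addAll(heap.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    /** Sweep: delete unmarked objects in place; survivors are not moved. */
    static void sweep(Map<String, List<String>> heap, Set<String> marked) {
        heap.keySet().retainAll(marked);
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = new HashMap<>();
        heap.put("a", List.of("b"));
        heap.put("b", List.of());
        heap.put("c", List.of("a")); // unreachable: no root or live object points to c
        sweep(heap, mark(heap, List.of("a")));
        System.out.println(heap.keySet());
    }
}
```

Because the sweep only removes entries and never relocates survivors, the free space it leaves behind is scattered, which is the source of the fragmentation problem discussed for mark-sweep collectors.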

Disadvantages:

  1. The collector slows the application down by occupying some of the threads, which reduces total throughput. The default number of collection threads started by CMS is (number of processor cores + 3) / 4.
  2. The CMS collector cannot handle “floating garbage”: garbage that user threads create while a concurrent collection is running and that can only be reclaimed in the next cycle. Because user threads keep running, CMS cannot wait until the old generation is almost completely full before collecting, as other collectors do; it must reserve some space for the program to use during concurrent collection. Under the earlier, conservative default, CMS was activated once 68% of the old generation was in use. If the reserved space turns out to be insufficient, a Concurrent Mode Failure occurs.
  3. CMS is based on the mark-sweep algorithm, which means that a lot of memory fragmentation is left at the end of a collection.
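To make the first disadvantage concrete, the following sketch evaluates the thread formula quoted in the list, (number of processor cores + 3) / 4, for a few core counts. Integer division is assumed, and the helper name is hypothetical.

```java
// Hypothetical helper illustrating the default CMS thread formula quoted above.
final class CmsThreads {
    /** (number of processor cores + 3) / 4, using integer division. */
    static int defaultCollectorThreads(int cores) {
        return (cores + 3) / 4;
    }

    public static void main(String[] args) {
        for (int cores : new int[] {2, 4, 8, 16}) {
            int threads = defaultCollectorThreads(cores);
            // Share of the cores occupied by collector threads during concurrent phases.
            System.out.printf("%d cores -> %d thread(s), %.0f%% of cores%n",
                    cores, threads, 100.0 * threads / cores);
        }
    }
}
```

On a two-core machine a single collector thread already occupies half of the processors, which is why the slowdown is most noticeable on small machines.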

The value of -XX:CMSInitiatingOccupancyFraction can be raised to increase the occupancy percentage that triggers CMS, reducing the frequency of collections and obtaining better performance. In JDK 6 the default start threshold of the CMS collector was raised to 92%. However, if the memory reserved during a CMS run cannot satisfy the program’s requests to allocate new objects, a “Concurrent Mode Failure” occurs. In that case the virtual machine has to freeze the user threads and temporarily fall back to the Serial Old collector to redo the old-generation collection, which makes the pause very long.
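The trade-off behind the trigger percentage can be shown with a toy model: the higher the threshold, the less headroom remains for allocations made while the concurrent cycle runs. The model, its inputs, and all names are assumptions made for illustration; the real collector’s accounting is more involved.

```java
// Toy model of the trigger threshold; names and inputs are assumptions.
final class CmsHeadroom {
    /** True once old-generation usage reaches the initiating occupancy percentage. */
    static boolean shouldStartCycle(long usedBytes, long capacityBytes, int occupancyPercent) {
        return usedBytes * 100 >= capacityBytes * occupancyPercent;
    }

    /** Space left for the application while a concurrent cycle is running. */
    static long headroomBytes(long capacityBytes, int occupancyPercent) {
        return capacityBytes * (100 - occupancyPercent) / 100;
    }

    /** The reserved space runs out if the program allocates more than the headroom. */
    static boolean concurrentModeFailure(long capacityBytes, int occupancyPercent,
                                         long allocatedDuringCycleBytes) {
        return allocatedDuringCycleBytes > headroomBytes(capacityBytes, occupancyPercent);
    }

    public static void main(String[] args) {
        long capacity = 1000L * 1024 * 1024;  // 1000 MB old generation
        long allocated = 100L * 1024 * 1024;  // 100 MB allocated during the cycle
        System.out.println(concurrentModeFailure(capacity, 68, allocated)); // 320 MB headroom
        System.out.println(concurrentModeFailure(capacity, 92, allocated)); // 80 MB headroom
    }
}
```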

Garbage First

The G1 collector is a milestone in the history of garbage collector technology. It pioneered the partial-collection-oriented design and the Region-based memory layout.

With JDK 8 Update 40, G1 gained support for concurrent class unloading. Only after that did Oracle officially call G1 a “fully-featured garbage collector”.

G1 is a garbage collector primarily for server applications

All garbage collectors before G1, including CMS, targeted either the entire young generation (Minor GC), the entire old generation (Major GC), or the whole Java heap (Full GC). G1, however, can assemble a Collection Set (commonly referred to as the CSet) from any part of the heap. The criterion is no longer which generation a piece of memory belongs to, but which memory holds the most garbage and yields the greatest benefit from collection. This is the Mixed GC mode of the G1 collector.

The region-based heap memory layout pioneered by G1 is key to its ability to achieve this goal.

G1 no longer insists on a fixed size and a fixed number of generation areas. Instead it divides the contiguous Java heap into multiple independent Regions of equal size. Each Region can act as Eden space or Survivor space of the young generation, or as old-generation space, as needed.

There is also a special kind of Region, the Humongous Region, for storing large objects. G1 considers an object large if its size exceeds half the capacity of a Region. The size of each Region can be set with the -XX:G1HeapRegionSize parameter; it ranges from 1MB to 32MB and must be a power of 2. Very large objects that exceed the capacity of an entire Region are stored in N consecutive Humongous Regions. Most of G1’s behaviors treat Humongous Regions as part of the old generation.
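The sizing rules in this paragraph can be written down as a few small checks. This is a sketch of the rules as stated here, not G1’s actual code; the class and method names are hypothetical.

```java
// Sketch of the Region sizing rules described above; names are hypothetical.
final class G1Regions {
    static final long MB = 1024L * 1024;

    /** Region size must be a power of 2 between 1 MB and 32 MB. */
    static boolean isValidRegionSize(long bytes) {
        return bytes >= MB && bytes <= 32 * MB && Long.bitCount(bytes) == 1;
    }

    /** An object is treated as large when it exceeds half of a Region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of consecutive Humongous Regions needed, rounded up. */
    static long humongousRegionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 4 * MB;
        System.out.println(isValidRegionSize(region));
        System.out.println(isHumongous(3 * MB, region));
        System.out.println(humongousRegionsNeeded(9 * MB, region));
    }
}
```

With 4MB Regions, a 3MB object counts as large, and a 9MB object needs three consecutive Humongous Regions.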

G1 uses the Region as the minimum unit of a single collection. That is, the memory reclaimed in each collection is an integer multiple of the Region size, which lets G1 systematically avoid collecting the entire Java heap at once.

The remembered set of G1 is essentially a hash table in its storage structure. The key is the starting address of another Region, and the value is a collection whose elements are card table index numbers. (This is a two-way card table structure: the card table records “whom I point to”, and this structure additionally records “who points to me”.)
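The hash-table structure described above might be sketched as follows for a single Region. The 512-byte card size and the choice to index cards relative to the start of the source Region are assumptions made for illustration, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of one Region's "who points to me" structure; not G1's actual layout.
final class RememberedSet {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card (assumed)

    // Key: start address of another Region that points into this one.
    // Value: indices of the cards in that Region holding such references.
    private final Map<Long, Set<Integer>> cardsByRegion = new HashMap<>();

    /** Record that a field at fieldAddress, inside the Region starting at regionStart, points to us. */
    void recordIncomingReference(long regionStart, long fieldAddress) {
        int cardIndex = (int) ((fieldAddress - regionStart) >>> CARD_SHIFT);
        cardsByRegion.computeIfAbsent(regionStart, k -> new TreeSet<>()).add(cardIndex);
    }

    /** Cards to scan as extra roots when this Region is collected. */
    Set<Integer> cardsFrom(long regionStart) {
        return cardsByRegion.getOrDefault(regionStart, Set.of());
    }

    public static void main(String[] args) {
        RememberedSet rset = new RememberedSet();
        long otherRegion = 0x100000L;
        rset.recordIncomingReference(otherRegion, otherRegion + 10);   // falls in card 0
        rset.recordIncomingReference(otherRegion, otherRegion + 1030); // falls in card 2
        System.out.println(rset.cardsFrom(otherRegion));
    }
}
```

Keeping one such table per Region is what lets a Region be collected on its own, and it is also where much of G1’s extra memory overhead comes from.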

G1 needs additional memory equivalent to roughly 10% to 20% of the Java heap capacity to keep the collector working.

For the concurrent marking phase, G1 uses the snapshot-at-the-beginning (SATB) algorithm. G1 gives each Region two pointers named Top at Mark Start (TAMS), which set aside part of the Region’s space for allocating new objects while a concurrent collection is in progress. The addresses of all objects newly allocated during concurrent collection must lie above these two pointers. If memory reclamation cannot keep up with memory allocation, G1 falls back to a Full GC, which results in a long STW pause.
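The role of TAMS can be sketched with a toy Region that uses bump-pointer allocation. For brevity the sketch keeps a single TAMS pointer rather than the two that G1 maintains, and it assumes that objects allocated above TAMS are simply treated as live for the current cycle. All names are hypothetical.

```java
// Toy Region with a single TAMS pointer; a simplification of what G1 keeps per Region.
final class TamsRegion {
    final long bottom;
    final long end;
    long top;   // next free address
    long tams;  // top-at-mark-start, fixed when concurrent marking begins

    TamsRegion(long bottom, long end) {
        this.bottom = bottom;
        this.end = end;
        this.top = bottom;
        this.tams = bottom;
    }

    /** Snapshot the allocation pointer at the start of concurrent marking. */
    void startMarking() {
        tams = top;
    }

    /** Bump-pointer allocation; returns the new object's address, or -1 if the Region is full. */
    long allocate(long size) {
        if (top + size > end) {
            return -1;
        }
        long address = top;
        top += size;
        return address;
    }

    /** Objects at or above TAMS were allocated during marking and are treated as live. */
    boolean isImplicitlyLive(long address) {
        return address >= tams && address < top;
    }
}
```

An object allocated before `startMarking()` has to be proven reachable by the marker, while anything allocated afterwards lands above TAMS and is kept without being traced.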

The operation of G1 can be roughly divided into the following four steps:

  1. Initial Marking: marks only the objects that GC Roots reference directly and modifies the value of the TAMS pointer. This phase pauses user threads (STW), but only briefly.
  2. Concurrent Marking: analyzes the reachability of objects in the heap starting from GC Roots, recursively scanning the whole object graph to find the objects to reclaim. This phase is time-consuming but runs concurrently with the user program.
  3. Final Marking: another short pause (STW) of user threads, used to process the last few SATB records that remain after the concurrent phase has ended.
  4. Live Data Counting and Evacuation: updates the statistics of each Region, sorts the Regions by reclamation value and cost, and makes a reclamation plan based on the pause time the user expects. Any number of Regions can be selected to form the Collection Set; the surviving objects in those Regions are copied to empty Regions, and the old Regions are then cleared entirely. Because this moves live objects, user threads must be paused; the work is done in parallel by multiple collector threads.
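The planning part of step 4 (rank Regions by value and cost, then pick what fits in the expected pause) can be sketched as a greedy selection. The value/cost model and every name below are assumptions made for illustration; G1’s real prediction model is more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of Collection Set selection; not G1's actual policy code.
final class CollectionSetChooser {
    static final class Region {
        final String name;
        final long garbageMb;    // estimated reclaimable space
        final double costMillis; // estimated time to evacuate the Region

        Region(String name, long garbageMb, double costMillis) {
            this.name = name;
            this.garbageMb = garbageMb;
            this.costMillis = costMillis;
        }

        double efficiency() {
            return garbageMb / costMillis;
        }
    }

    /** Greedily pick the most rewarding Regions that still fit in the pause budget. */
    static List<Region> choose(List<Region> candidates, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<Region> cset = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                cset.add(r);
                spent += r.costMillis;
            }
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> cset = choose(List.of(
                new Region("A", 8, 4.0), new Region("B", 6, 2.0), new Region("C", 1, 5.0)), 7.0);
        for (Region r : cset) {
            System.out.println(r.name);
        }
    }
}
```

With a 7 ms budget the sketch picks B and then A, and skips C because its cost would exceed the remaining budget while reclaiming very little.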

Apart from concurrent marking, every other phase of the G1 collector completely suspends user threads.

CMS uses a post-write barrier to maintain its card table. G1 uses post-write barriers for the same (but more complex) card table maintenance, and in addition uses pre-write barriers to track pointer changes during concurrent marking in order to implement the snapshot-at-the-beginning (SATB) algorithm.

G1’s write barriers are more complex and more resource-intensive than CMS’s. CMS’s write barrier is therefore implemented as a direct synchronous operation, whereas G1 has to use a message-queue-like structure: the pre-write and post-write barriers put the work to be done into queues, which are then processed asynchronously.
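A toy model may help picture the two barriers around a single reference store. The pre-write barrier enqueues the value about to be overwritten (for SATB), and the post-write barrier enqueues the updated holder for later card processing. The queues stand in for the asynchronous structure mentioned above; all names are hypothetical, and real barriers are emitted by the JIT compiler rather than written as Java methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of pre-write and post-write barriers; not HotSpot code.
final class WriteBarrierSketch {
    static final class Node {
        Node ref;
    }

    final Deque<Node> satbQueue = new ArrayDeque<>();      // overwritten values, processed later
    final Deque<Node> dirtyCardQueue = new ArrayDeque<>(); // holders whose card needs processing
    boolean markingActive;

    /** Store newValue into holder.ref with a pre-write and a post-write barrier. */
    void storeRef(Node holder, Node newValue) {
        // Pre-write barrier (SATB): remember the reference that is about to be overwritten.
        if (markingActive && holder.ref != null) {
            satbQueue.add(holder.ref);
        }
        holder.ref = newValue;
        // Post-write barrier: queue the holder so its card can be handled asynchronously.
        if (newValue != null) {
            dirtyCardQueue.add(holder);
        }
    }
}
```

The store itself only pays for an enqueue; draining `satbQueue` and `dirtyCardQueue` is left to other threads, which is the asynchronous processing the paragraph refers to.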

Shenandoah

Shenandoah is the first HotSpot garbage collector not developed by Oracle’s virtual machine team, and so it has inevitably been excluded by the “official” side. Shenandoah is a garbage collector available only in OpenJDK.

The Shenandoah collector’s working process can be roughly divided into the following nine stages:

  • Initial Marking: As in G1, objects directly associated with GC Roots are marked first. This stage is still “Stop The World”, but the pause time has nothing to do with heap size, only with the number of GC Roots.
  • Concurrent Marking: As in G1, this phase traverses the object graph and marks all reachable objects. It runs concurrently with user threads, and its duration depends on the number of live objects in the heap and the structural complexity of the object graph.
  • Final Marking: As in G1, the remaining SATB scans are processed, and the Regions with the highest collection value are identified and grouped into a Collection Set. There is also a short pause in the final marking phase.
  • Concurrent Cleanup: this phase cleans up Regions in which not a single live object was found (called Immediate Garbage Regions).
  • Concurrent Evacuation: concurrent evacuation is the core difference between Shenandoah and earlier HotSpot collectors. In this phase Shenandoah copies the surviving objects in the Collection Set into other, unused Regions. Copying objects is fairly simple if user threads are frozen, but becomes much more complicated when copying and user threads must run concurrently. The difficulty is that user threads may keep reading and writing an object while it is being moved. Moving the object is a one-off action, but afterwards every reference to it anywhere in memory still holds the old address, and changing them all in an instant is hard. Shenandoah solves these problems with read barriers and forwarding pointers known as “Brooks Pointers”. How long the concurrent evacuation phase runs depends on the size of the Collection Set.
  • Initial Update Reference: after objects have been copied during concurrent evacuation, all references in the heap that point to the old objects must be corrected to the new addresses. This operation is called reference updating. The initial phase does no real processing; it only establishes a thread rendezvous point to ensure that every collector thread has finished the object-moving work assigned to it during concurrent evacuation. It takes very little time and causes a very short pause.
  • Concurrent Update Reference: the reference update actually starts here. It runs concurrently with user threads, and its duration depends on the number of references involved. Unlike concurrent marking, it no longer needs to search along the object graph; it only needs to scan linearly in physical memory address order for reference-type fields and change old values to new ones.
  • Final Update Reference: once the references in the heap have been updated, the references held in GC Roots must be corrected as well. This is Shenandoah’s last pause, and its length depends only on the number of GC Roots.
  • Concurrent Cleanup: after concurrent evacuation and reference updating, no Region in the Collection Set contains any live object, so these Regions have all become Immediate Garbage Regions. A final concurrent cleanup reclaims their memory for future allocation of new objects.
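The forwarding-pointer idea used in the concurrent evacuation stage can be sketched with ordinary Java objects. Each object carries a forwarding reference that initially points to itself; after evacuation it points to the copy, and every access is resolved through it. Installing the pointer with a compare-and-set is an assumption of this sketch (so that only one copy wins if several threads race), and all names are hypothetical. A real collector operates on raw memory, not on Java objects.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy Brooks-pointer model; illustrative only.
final class BrooksSketch {
    static final class Obj {
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        int value;

        Obj(int value) {
            this.value = value;
            forwardee.set(this); // initially the object forwards to itself
        }
    }

    /** Barrier: every access goes through the forwarding pointer. */
    static Obj resolve(Obj ref) {
        return ref.forwardee.get();
    }

    /** Copy the object and install the forwarding pointer; the first copy installed wins. */
    static Obj evacuate(Obj old) {
        Obj copy = new Obj(old.value);
        return old.forwardee.compareAndSet(old, copy) ? copy : old.forwardee.get();
    }

    public static void main(String[] args) {
        Obj a = new Obj(1);
        Obj staleRef = a;             // a reference that still holds the old address
        Obj moved = evacuate(a);
        resolve(staleRef).value = 42; // the write lands in the new copy
        System.out.println(moved.value);
        System.out.println(resolve(staleRef) == moved);
    }
}
```

The extra hop in `resolve` is paid on every access, which is the fixed per-access overhead that forwarding pointers of this kind carry.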

ZGC

The Z Garbage Collector (ZGC) is a low-latency garbage collector introduced in JDK 11 (ZGC does not support class unloading in JDK 11; JDK 12 does).

The goals of ZGC and Shenandoah are highly similar: both aim for low latency, limiting garbage collection pauses to less than 10 milliseconds at any heap size, with as little impact on throughput as possible.

Features: ZGC is a garbage collector based on a Region memory layout, with (for now) no generations. It uses techniques such as read barriers, colored pointers, and memory multi-mapping to implement a concurrent mark-compact algorithm, with low latency as its primary goal.

ZGC regions can have large, medium, and small capacities:

  • Small Region: the capacity of a Region is fixed at 2MB. It is used to store Small objects smaller than 256KB.
  • Medium Region: with a fixed capacity of 32MB, it is used to store objects larger than or equal to 256KB but smaller than 4MB.
  • Large Region: the capacity is not fixed and can change dynamically, but it must be a multiple of 2MB. It is used for large objects of 4MB or more, and each Large Region holds only one large object. This means that, despite the name, a Large Region may actually be smaller than a Medium Region, with a capacity as low as 4MB. Large Regions are not relocated in the ZGC implementation (relocation is the ZGC step that copies objects during collection, described later), because copying a large object is very expensive.
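The size thresholds in the list above translate directly into a small classifier. This is a sketch of the rules as stated here; the class, enum, and method names are hypothetical.

```java
// Sketch of the Region size classes listed above; names are hypothetical.
final class ZgcRegionKind {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    enum Kind { SMALL, MEDIUM, LARGE }

    /** Pick the Region class for an object of the given size. */
    static Kind kindFor(long objectBytes) {
        if (objectBytes < 256 * KB) {
            return Kind.SMALL;   // stored in 2 MB Regions
        }
        if (objectBytes < 4 * MB) {
            return Kind.MEDIUM;  // stored in 32 MB Regions
        }
        return Kind.LARGE;       // one object per Region
    }

    /** Large Region capacity: the object size rounded up to a multiple of 2 MB. */
    static long largeRegionCapacity(long objectBytes) {
        long unit = 2 * MB;
        return (objectBytes + unit - 1) / unit * unit;
    }

    public static void main(String[] args) {
        System.out.println(kindFor(100 * KB));
        System.out.println(kindFor(256 * KB));
        System.out.println(kindFor(4 * MB));
        System.out.println(largeRegionCapacity(5 * MB) / MB);
    }
}
```

A 5MB object ends up in a 6MB Large Region, which is much smaller than a 32MB Medium Region.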

Implementing the concurrent algorithm: ZGC also uses read barriers, but its implementation is completely different from Shenandoah’s.

The colored pointer (also known as Tag Pointer or Version Pointer) technique is one of the most direct and purest approaches used by ZGC: it writes the tag information directly into the pointer that references the object. On Linux, the high 18 bits of a 64-bit pointer cannot be used for addressing, and the remaining 46 bits can still address 64TB of memory. ZGC’s colored pointers take the high 4 bits of those 46 bits to store four flags. From these flag bits the virtual machine can tell, just by looking at the pointer, the tri-color marking state of the referenced object, whether it has entered the relocation set, and whether it can only be reached through the finalize() method. A direct consequence is that only 42 address bits remain, so ZGC can manage no more than 4TB of memory.
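The bit arithmetic in this paragraph can be checked with a short sketch: 46 usable bits give 64TB, and giving up the top 4 of them for flags leaves 42 address bits, or 4TB. The flag names and their order within the four bits are assumptions made for illustration.

```java
// Sketch of a 42-bit address plus 4 flag bits; flag order is an assumption.
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;                        // 2^42 bytes = 4 TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Four flag bits directly above the address bits.
    static final long MARKED0     = 1L << 42;
    static final long MARKED1     = 1L << 43;
    static final long REMAPPED    = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    /** Strip the flag bits and return the plain address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Test whether a flag bit is set on the pointer. */
    static boolean has(long pointer, long flag) {
        return (pointer & flag) != 0;
    }

    /** Return the same address carrying exactly the given flag. */
    static long recolor(long pointer, long flag) {
        return address(pointer) | flag;
    }

    public static void main(String[] args) {
        long tb = 1L << 40;
        System.out.println((1L << 46) / tb);          // TB addressable with 46 bits
        System.out.println((ADDRESS_MASK + 1) / tb);  // TB left after taking 4 flag bits
        long p = recolor(0x1234L, MARKED0);
        System.out.println(has(p, MARKED0) + " " + Long.toHexString(address(p)));
    }
}
```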

Three advantages of colored pointers

  • Colored pointers allow a Region to be freed and reused as soon as its live objects have been moved out, without waiting for every reference to that Region in the heap to be corrected.
  • Colored pointers can significantly reduce the number of memory barriers used during garbage collection. Memory barriers, write barriers in particular, are usually there to record changes to object references. In fact, ZGC has so far not used any write barriers, only read barriers (partly thanks to colored pointers, and partly because ZGC does not yet support generational collection, so cross-generation references are naturally not an issue).
  • Colored pointers can serve as an extensible storage structure for recording more data related to object marking and relocation, to further improve performance in the future. The high 18 bits of a 64-bit pointer are still unused on Linux; although they cannot be used for addressing, they can be used to record information in other ways.

The ZGC operation process is divided into four stages

  • Concurrent Mark: as in G1 and Shenandoah, concurrent marking is the phase that traverses the object graph for reachability analysis. It is likewise preceded and followed by short pauses, similar to the initial mark and final mark of G1 and Shenandoah, and what those pauses do is also similar in purpose. Unlike G1 and Shenandoah, ZGC marks pointers rather than objects: the marking phase updates the Marked0 and Marked1 flag bits of the colored pointers.
  • Concurrent Prepare for Relocate: based on specific criteria, this phase works out which Regions are to be cleaned up in this collection and groups them into the Relocation Set. In addition, the class unloading and weak reference handling that ZGC supports from JDK 12 are completed in this phase.
  • Concurrent Relocate: relocation is the core phase of ZGC. It copies the surviving objects in the Relocation Set to new Regions and maintains a Forward Table for each Region in the Relocation Set, recording the mapping from old objects to new ones. Thanks to colored pointers, ZGC can tell from a reference alone whether an object is in the Relocation Set. If a user thread concurrently accesses such an object, the access is intercepted by the preset memory barrier, which immediately forwards it to the new copy according to the Region’s Forward Table and at the same time updates the reference so that it points directly to the new object. ZGC calls this behavior the “self-healing” ability of pointers. The advantage is that only the first access to an old object pays for forwarding, that is, it is slow only once, whereas Shenandoah’s Brooks forwarding pointer adds a fixed overhead to every access (slow every time). ZGC’s runtime load on the user program is therefore lower than Shenandoah’s.
  • Concurrent Remap: remapping corrects all references in the heap that still point to old objects in the Relocation Set. In terms of its goal this resembles Shenandoah’s concurrent reference update phase, but ZGC’s concurrent remapping is not an “urgent” task, because even stale references can heal themselves.
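The self-healing behavior described for the relocation stage can be illustrated with a toy load barrier. References are modeled as long values with one flag bit meaning “already remapped”; a stale reference takes the slow path once, is looked up in the forward table, and is rewritten in place so that later loads take the fast path. The bit position, the single shared forward table, and all names are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy load barrier with self-healing; not ZGC's actual barrier.
final class SelfHealingSketch {
    static final long REMAPPED = 1L << 44;           // assumed flag bit
    static final long ADDRESS_MASK = (1L << 42) - 1;

    final Map<Long, Long> forwardTable = new HashMap<>(); // old address -> new address
    int slowPathHits;

    /** Load the reference stored in slots[i], healing it if it is stale. */
    long load(long[] slots, int i) {
        long ref = slots[i];
        if ((ref & REMAPPED) != 0) {
            return ref & ADDRESS_MASK;               // fast path: reference is up to date
        }
        slowPathHits++;                              // slow path: taken once per stale slot
        long oldAddress = ref & ADDRESS_MASK;
        long newAddress = forwardTable.getOrDefault(oldAddress, oldAddress);
        slots[i] = newAddress | REMAPPED;            // self-heal: fix the reference in place
        return newAddress;
    }

    public static void main(String[] args) {
        SelfHealingSketch heap = new SelfHealingSketch();
        heap.forwardTable.put(0x1000L, 0x8000L);     // object moved from 0x1000 to 0x8000
        long[] fields = {0x1000L};                   // a field still holding the old address
        System.out.println(Long.toHexString(heap.load(fields, 0)));
        System.out.println(Long.toHexString(heap.load(fields, 0)));
        System.out.println(heap.slowPathHits);
    }
}
```

Both loads return the new address, but only the first one goes through the forward table; this is the “slow only once” property contrasted with a forwarding pointer that is followed on every access.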

Epsilon

Epsilon, which appeared in JDK 11, is a garbage collector that does not perform any garbage collection.

It can be used for performance tests and stress tests that need to strip out the impact of the garbage collector.

Garbage collector trade-offs

  • What are the main concerns of the application? If it does data analysis or scientific computing, where the goal is to get results as quickly as possible, then throughput is the main concern. For applications with an SLA, pause time directly affects service quality and in severe cases even causes request timeouts, so latency is the main concern. For client or embedded applications, memory footprint cannot be ignored.
  • What infrastructure runs the application? For example, the hardware specification: whether the system architecture is x86-32/64, SPARC, or ARM/AArch64; the number of processors and the amount of memory; and whether the operating system is Linux, Solaris, or Windows.
  • Who is the publisher of the JDK, and what is its version number? Is it ZingJDK/Zulu, Oracle JDK, OpenJDK, and so on?
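Several of the facts asked about in this list can be read from a running JVM. The sketch below uses standard system properties and the `java.lang.management` API to print the JDK vendor and version, the platform, the processor count, the maximum heap size, and the names of the collectors in use. The class name is hypothetical, and the exact collector names printed depend on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the environment facts that matter when choosing a collector.
final class GcEnvironment {
    /** Names of the collectors the running JVM has registered. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("JDK vendor:  " + System.getProperty("java.vendor"));
        System.out.println("JDK version: " + System.getProperty("java.version"));
        System.out.println("OS / arch:   " + System.getProperty("os.name")
                + " / " + System.getProperty("os.arch"));
        System.out.println("Processors:  " + rt.availableProcessors());
        System.out.println("Max heap MB: " + rt.maxMemory() / (1024 * 1024));
        System.out.println("Collectors:  " + collectorNames());
    }
}
```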

If ZGC doesn’t work on Windows, try Shenandoah

CMS generally handles heaps of less than about 4GB to 6GB well. For larger heaps, G1 deserves consideration.