JVM Quick Tuning Manual V1.0 bis: Common Garbage Collectors

If the collection algorithm is the methodology of memory collection, then the garbage collector is the concrete implementation of memory collection.

The Java Virtual Machine Specification does not prescribe how a garbage collector should be implemented, so the collectors provided by different vendors and different versions of the virtual machine may differ greatly. Virtual machines generally expose parameters that let users combine the collectors used for each generation according to the characteristics and requirements of their applications.
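As a minimal sketch of how to see which collectors a given virtual machine has actually selected, the standard `java.lang.management` API can list them at run time. The class name `CollectorListing` is hypothetical, and the collector names printed depend on the vendor, version, and startup parameters.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorListing {
    // Returns one descriptive line per collector registered in the running JVM.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both counters return -1 when the collector does not report the value.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Running the same program with different collector parameters should show different collector names, typically one entry for the new generation and one for the old generation.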

The figure shows seven collectors that act on different generations. If there is a line between two collectors, they can be used together. The region in which a collector is located indicates whether it is a new generation collector or an old generation collector.

Conceptual understanding

The terms concurrency and parallelism are concepts in concurrent programming, and in the context of talking about the garbage collector, they can be explained as follows.
- Parallel: Multiple garbage collection threads work in parallel while the user threads remain in a waiting state.
- Concurrent: User threads and garbage collection threads execute at the same time (not necessarily in parallel; they may alternate), so the user program keeps running while the garbage collector runs on another CPU.
Minor GC and Full GC
- Minor GC: Refers to garbage collection that occurs in the new generation. Since most Java objects are short-lived, Minor GC is very frequent and generally fast.
- Old generation GC (Major GC/Full GC): Refers to garbage collection that occurs in the old generation. A Major GC is often accompanied by at least one Minor GC (but not always; the collection strategy of the Parallel Scavenge collector, for example, can choose to perform a Major GC directly). Major GC is typically about 10 times slower than Minor GC.
Throughput
Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed. That is, throughput = time running user code / (time running user code + garbage collection time). For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute, the throughput is 99%.
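The throughput formula can be checked with a few lines of arithmetic. This is only an illustrative sketch; the class and method names are hypothetical.

```java
class ThroughputExample {
    // throughput = time running user code / (time running user code + GC time)
    static double throughput(double userCodeMinutes, double gcMinutes) {
        return userCodeMinutes / (userCodeMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // 100 minutes in total, 1 of them spent in GC, so 99 minutes of user code.
        double t = throughput(99.0, 1.0);
        System.out.printf("throughput = %.0f%%%n", t * 100); // prints "throughput = 99%"
    }
}
```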

1. Serial collector

The Serial collector is the most basic and oldest collector, and was once (prior to JDK 1.3.1) the only choice for collecting the new generation of the virtual machine.

Features: This collector is single-threaded. “Single-threaded” means not only that it uses a single CPU or a single collection thread to complete garbage collection, but also that it must suspend all other worker threads until the collection finishes (Stop The World).
Application scenario: The Serial collector is the default new generation collector for virtual machines running in Client mode.
Advantages: Simple and efficient (compared with the single-threaded operation of other collectors). In an environment limited to a single CPU, the Serial collector has no thread interaction overhead, so it can achieve the highest single-threaded collection efficiency.

2. ParNew collector

Features: The ParNew collector is the multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, the rest of its behavior, including all the control parameters available to the Serial collector, the collection algorithm, Stop The World, object allocation rules, and collection policies, is exactly the same as that of the Serial collector. In implementation, the two collectors also share a considerable amount of code.
Application scenario: The ParNew collector is the preferred new generation collector for many virtual machines running in Server mode.

An important reason is that, apart from the Serial collector, it is currently the only new generation collector that works with the CMS collector. In JDK 1.5, HotSpot introduced the CMS collector, a garbage collector that was almost revolutionary for strongly interactive applications. CMS was the first truly concurrent collector in the HotSpot virtual machine, allowing the garbage collection thread to work at the same time as user threads for the first time. Unfortunately, CMS, as an old generation collector, does not work with the Parallel Scavenge collector, which already existed in JDK 1.4.0. So when CMS was used to collect the old generation in JDK 1.5, the new generation could only use either the ParNew or the Serial collector.
Serial collector VS ParNew collector: The ParNew collector is by no means better than the Serial collector in a single-CPU environment. Because of the overhead of thread interaction, it is not even guaranteed to outperform the Serial collector in a two-CPU environment implemented through hyper-threading. However, as the number of available CPUs increases, it makes more efficient use of system resources during GC.

3. Parallel Scavenge collector

Features: The Parallel Scavenge collector is a new generation collector. It also uses the copying algorithm and is a parallel, multi-threaded collector.
Application scenarios: Shorter pause times suit programs that need to interact with users, where good response speed improves the user experience. High throughput uses CPU time efficiently to finish the program’s computing tasks as soon as possible, so it mainly suits background computation that does not need much interaction.
Comparative analysis:
- Parallel Scavenge collector VS CMS collector: Collectors such as CMS focus on minimizing the pause time of user threads during garbage collection, whereas the goal of the Parallel Scavenge collector is to achieve a controllable throughput. Because of this affinity for throughput, the Parallel Scavenge collector is often called the “throughput-first” collector.
- Parallel Scavenge collector VS ParNew collector: An important difference between the Parallel Scavenge collector and the ParNew collector is its adaptive tuning strategy.
  
  The Parallel Scavenge collector has a parameter, -XX:+UseAdaptiveSizePolicy. When this parameter is turned on, there is no need to manually specify details such as the size of the new generation, the ratio of Eden to the Survivor areas, or the age at which objects are promoted to the old generation. The virtual machine collects performance monitoring information about the running system and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput. This kind of tuning is called GC Ergonomics.

4. Serial Old collector

Features: Serial Old is the old generation version of the Serial collector. It is also a single-threaded collector and uses the mark-compact algorithm.
Application scenarios:
- Client mode: The Serial Old collector is mainly intended for virtual machines running in Client mode.
- Server mode: It has two main uses. One is to be paired with the Parallel Scavenge collector in JDK 1.5 and earlier. The other is to serve as a fallback for the CMS collector when a Concurrent Mode Failure occurs during concurrent collection.

5. Parallel Old collector

Features: Parallel Old is the old generation version of the Parallel Scavenge collector, using multiple threads and the mark-compact algorithm.
Application scenarios: Where throughput matters and CPU resources are sensitive, the combination of the Parallel Scavenge and Parallel Old collectors can be given priority.

This collector only became available in JDK 1.6. Before that, the new generation Parallel Scavenge collector was in a rather awkward position: if the new generation used Parallel Scavenge, the old generation had no choice but to use the Serial Old collector. Because the single-threaded Serial Old collector cannot exploit the multi-CPU processing power of a server and holds back server application performance, the Parallel Scavenge collector did not necessarily maximize throughput for the application as a whole. In environments with a large old generation and more advanced hardware, the throughput of this combination might not even match that of ParNew plus CMS. Only with the arrival of the Parallel Old collector did the “throughput-first” collector get a fitting partner.

6. CMS collector

Features: The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest possible collection pause time. At present, a large share of Java applications run on the server side of Internet sites or B/S systems. These applications pay special attention to service response speed and want the shortest system pause time to give users a better experience. The CMS collector is a good fit for such applications.

The CMS collector is based on the “mark-sweep” algorithm. Its operation is more complex than that of the previous collectors, and the whole process is divided into four steps:
- Initial mark (CMS initial mark): Marks only the objects that GC Roots can reach directly. It is fast but requires “Stop The World”.
- Concurrent mark (CMS concurrent mark): The process of GC Roots tracing.
- Remark (CMS remark): Corrects the marking records of objects whose marks changed because the user program kept running during concurrent marking. The pause in this phase is generally slightly longer than in the initial mark phase but much shorter than the time taken by concurrent marking. It still requires “Stop The World”.
- Concurrent sweep (CMS concurrent sweep): Clears the garbage objects.
Because the collector threads in the two longest phases, concurrent mark and concurrent sweep, can work alongside user threads, the CMS collector’s memory reclamation process is, overall, executed concurrently with user threads.
Advantages: CMS is an excellent collector, and its main advantages are already evident in its name: concurrent collection, low pauses.
Disadvantages:
- The CMS collector is sensitive to CPU resources. In fact, any program designed for concurrency is sensitive to CPU resources. During the concurrent phases it does not pause user threads, but it slows the application down and reduces overall throughput because it occupies some of the threads (or CPU resources). By default, the number of collection threads started by CMS is (number of CPUs + 3) / 4. That is, with four or more CPUs, the garbage collection threads take up no less than 25% of the CPU resources during concurrent collection, and this share falls as the number of CPUs grows. With fewer than four CPUs (say, two), the impact of CMS on the user program can become significant.
- The CMS collector is unable to handle floating garbage, and a “Concurrent Mode Failure” may occur, leading to another Full GC.
  
  Because user threads are still running during the CMS concurrent sweep phase, new garbage is naturally generated as the program runs. This garbage appears after the marking process, so CMS cannot dispose of it in the current collection and has to leave it for the next GC. This part of the garbage is called “floating garbage”. Also, because user threads keep running during garbage collection, enough memory must be set aside for them to use. The CMS collector therefore cannot wait until the old generation is almost completely full before collecting, as other collectors do; it must reserve part of the space for the program to use during concurrent collection. If the memory reserved while CMS is running cannot meet the program’s needs, a “Concurrent Mode Failure” occurs, and the virtual machine starts its fallback plan: it temporarily enables the Serial Old collector to redo the old generation collection, which results in a long pause.
- CMS is a collector based on the “mark-sweep” algorithm, which means that a large amount of memory fragmentation is left at the end of a collection.
  
  When there is too much fragmentation, allocating large objects becomes troublesome. Often the old generation still has plenty of free space, but no contiguous block is large enough for the current object, so a Full GC has to be triggered early.
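The default thread count in the first disadvantage above can be worked through with a short sketch. It assumes the formula uses integer division (as Java’s `int` division does); the class and method names are hypothetical.

```java
class CmsThreadCount {
    // Default from the text: (number of CPUs + 3) / 4.
    // Assumption: the result is truncated, as with Java int division.
    static int gcThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Fraction of the CPUs occupied by GC threads during the concurrent phases.
    static double cpuShare(int cpus) {
        return (double) gcThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8}) {
            System.out.printf("cpus=%d threads=%d share=%.0f%%%n",
                    cpus, gcThreads(cpus), cpuShare(cpus) * 100);
        }
    }
}
```

Under this assumption, two CPUs give one collection thread, which takes half of the available CPU resources, while four CPUs give one thread and a 25% share.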

7. G1 collector

Features: G1 (Garbage-First) is a garbage collector for server-side applications. The HotSpot development team gave it the mission of eventually replacing the CMS collector released in JDK 1.5. Compared with other GC collectors, G1 has the following characteristics.
- Parallelism and concurrency: G1 can take full advantage of multi-CPU, multi-core hardware, using multiple CPUs to shorten the Stop-The-World pause time. Where some other collectors need to pause Java threads to perform GC actions, the G1 collector can still let the Java program keep running by working concurrently.
- Generational collection: As with other collectors, the concept of generational collection remains in G1. Although G1 can manage the entire GC heap on its own without the cooperation of other collectors, it handles newly created objects differently from old objects that have existed for some time and survived multiple GCs, in order to get better collection results.
- Space compaction: Unlike the “mark-sweep” algorithm of CMS, G1 as a whole is a collector based on the “mark-compact” algorithm, while locally (between two Regions) it is based on the “copying” algorithm. Either way, both algorithms mean that G1 produces no memory fragmentation while running and leaves regular, usable free memory after a collection. This helps programs run for a long time, because allocating a large object does not trigger the next GC early for lack of contiguous memory.
- Predictable pauses: This is another big advantage of G1 over CMS. Reducing pause time is a concern shared by G1 and CMS, but in addition to pursuing low pauses, G1 can build a predictable pause-time model that lets users specify that, within a time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection.
Collectors before G1 collected the whole new generation or the whole old generation; G1 no longer does. With the G1 collector, the memory layout of the Java heap is very different from that of other collectors. It divides the entire Java heap into multiple independent Regions of equal size. The concepts of new generation and old generation are retained, but the two are no longer physically separated. Each is a collection of some of the Regions (which do not need to be contiguous).

The G1 collector is able to model predictable pause times because it can deliberately avoid collecting the entire Java heap at once. G1 tracks the value of the garbage accumulated in each Region (the amount of space that would be reclaimed and the empirical time needed to reclaim it), maintains a priority list in the background, and each time, based on the allowed collection time, collects the most valuable Regions first (hence the name Garbage-First). Dividing memory into Regions and reclaiming them by priority ensures that the G1 collector achieves the highest possible collection efficiency in a limited amount of time.
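The priority idea can be illustrated with a small greedy sketch. It is not HotSpot’s implementation: the `Region` fields, the value measure (reclaimable bytes per millisecond), and the selection rule are all simplifying assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class GarbageFirstSketch {
    // Hypothetical per-Region bookkeeping: reclaimable space and estimated collection time.
    static final class Region {
        final String name;
        final long reclaimableBytes;
        final double estimatedMillis;

        Region(String name, long reclaimableBytes, double estimatedMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.estimatedMillis = estimatedMillis;
        }

        // Assumed value measure: bytes reclaimed per millisecond of pause.
        double value() {
            return reclaimableBytes / estimatedMillis;
        }
    }

    // Picks Regions in descending value order, skipping any that would exceed the pause budget.
    static List<Region> select(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.value(), a.value()));
        List<Region> chosen = new ArrayList<>();
        double usedMillis = 0;
        for (Region r : sorted) {
            if (usedMillis + r.estimatedMillis <= pauseBudgetMillis) {
                chosen.add(r);
                usedMillis += r.estimatedMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("A", 8, 4.0));
        regions.add(new Region("B", 6, 2.0));
        regions.add(new Region("C", 1, 5.0));
        regions.add(new Region("D", 4, 4.0));
        for (Region r : select(regions, 7.0)) {
            System.out.println(r.name);
        }
    }
}
```

With a budget of 7 milliseconds, the sketch selects B and then A, the two Regions that reclaim the most space per millisecond, and leaves C and D for a later collection.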
Execution: The operation of the G1 collector can be roughly divided into the following steps:
- Initial Marking: Marks only the objects that GC Roots can reach directly and modifies the value of TAMS (Next Top at Mark Start), so that when the user program runs concurrently in the next phase, new objects can be created in the correct available Regions. This phase requires pausing threads, but takes very little time.
- Concurrent Marking: Starts from GC Roots and performs reachability analysis on the objects in the heap to find the live objects. This phase is time-consuming but can run concurrently with the user program.
- Final Marking: Corrects the part of the marking records that changed because the user program kept running during concurrent marking. The virtual machine records object changes during this time in the threads’ Remembered Set Logs, and the final marking phase merges the data from the Remembered Set Logs into the Remembered Set. This phase requires pausing threads, but can be executed in parallel.
- Live Data Counting and Evacuation: First sorts the reclamation value and cost of each Region, then draws up a reclamation plan according to the GC pause time the user expects. This phase could in fact run concurrently with the user program, but because only some of the Regions are reclaimed and the time is user-controllable, pausing user threads greatly improves collection efficiency.

8. Summary

Although we are comparing collectors, we are not trying to pick the best one. So far there is no single best collector, let alone a one-size-fits-all collector, so we only choose the collector that is most appropriate for the specific application. It doesn’t take much explanation to see that if there were a perfect collector that could be used universally in any situation, the HotSpot virtual machine would not need to implement so many different collectors.