1: Whim

Today, after dinner, I was full and didn’t feel like writing code, so I’m writing a blog post instead. Let’s do one about JVM garbage collectors.

2: Overview

The JVM specification doesn’t say how a garbage collector should be implemented, so collectors differ between vendors and between virtual machine versions. Here we’ll only look at the HotSpot virtual machine. Just as there is no best algorithm, there is no best garbage collector, only the best fit. All we can do is choose the most suitable collector for a specific application scenario.

3: Serial collector

The Serial collector is the most basic and oldest garbage collector (the new generation uses the copying algorithm, the old generation uses the mark-compact algorithm). As the name suggests, it is a single-threaded collector. Not only does it use just one garbage collection thread to do the work, it must also “Stop The World”, pausing all other worker threads until the collection is complete.

In summary:

  • The new generation uses the copying algorithm, stop-the-world
  • The old generation uses the mark-compact algorithm, stop-the-world

It does cause stop-the-world pauses while collecting, but just as every algorithm exists for a reason, so does the Serial collector: it is simple and efficient (compared with the single-threaded performance of other collectors). In an environment limited to a single CPU it has no thread-interaction overhead, so concentrating on GC naturally gives it the highest single-threaded collection efficiency. That makes the Serial collector a good choice for applications running in Client mode (it is still the default new generation collector for virtual machines running in Client mode). The disadvantages of the Serial collector are obvious, and the virtual machine developers are certainly aware of them, so they have kept working to reduce stop-the-world time. Pause times get shorter in later collector designs (but there are still pauses, and the search for a better garbage collector continues).

Characteristics

  • A collector for the new generation;
  • Uses the copying algorithm;
  • Single-threaded collection;
  • While collecting, all worker threads must be paused until it finishes, i.e. “Stop The World”;

Application scenarios

  • It is still the default new generation collector for HotSpot in Client mode.
  • It has advantages over other collectors: it is simple and efficient (compared with the single-threaded performance of other collectors);
  • For environments limited to a single CPU, the Serial collector achieves the highest single-threaded collection efficiency because it has no thread interaction (switching) overhead;
  • In a user’s desktop application scenario, the available memory is generally small (tens of megabytes to one or two hundred megabytes) and a collection can be completed in a fairly short time (tens of ms to a little over one hundred ms); as long as it does not happen frequently, this is acceptable

Set the parameters

Add this parameter to explicitly use the serial garbage collector: “-XX:+UseSerialGC”
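To see which collectors a given flag actually selects, a small Java program can ask the running JVM. This is only a minimal sketch: the class name ActiveCollectors is made up for illustration, and the collector names the JVM reports are implementation-specific and differ between JDK versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors the running JVM selected.
// Example launch with the flag from the text: java -XX:+UseSerialGC ActiveCollectors.java
final class ActiveCollectors {
    // Returns the collector names as reported by the JVM's management beans.
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```

Running it once with and once without the flag is a quick way to check that the setting took effect.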

4: ParNew collector (multi-threaded version of Serial collector – uses multiple threads for GC)

The ParNew collector is essentially a multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, it behaves the same as the Serial collector (control parameters, collection algorithms, collection strategies, and so on). It is the first choice for many virtual machines running in Server mode, and apart from the Serial collector it is currently the only one that works with the CMS collector. CMS is considered an epoch-making concurrent collector, so a collector that can be paired with it to make it work even better is naturally an indispensable part.

Characteristics

  • Apart from multithreading, its behavior and characteristics are the same as the Serial collector’s;
  • For example, the control parameters available to the Serial collector, the collection algorithm, Stop The World, object allocation rules, reclamation strategy, and so on;
  • It shares a lot of code with the Serial collector;

Application Scenarios:

In Server mode, the ParNew collector is a very important collector because, besides Serial, it is currently the only one that works with the CMS collector. In a single-CPU environment, however, it is no better than the Serial collector because of the thread interaction overhead.

Set the parameters

  • Selecting CMS with “-XX:+UseConcMarkSweepGC” makes ParNew the new generation collector by default;
  • “-XX:+UseParNewGC” forces the use of ParNew;
  • “-XX:ParallelGCThreads” specifies the number of garbage collection threads, overriding the number ParNew enables by default.

Why does only ParNew work with the CMS collector

  • CMS was HotSpot’s first truly concurrent collector, introduced in JDK 1.5; for the first time, garbage collection threads could work (basically) at the same time as user threads;
  • As an old generation collector, CMS cannot work with Parallel Scavenge, the new generation collector that already existed in JDK 1.4;
  • Parallel Scavenge (and G1) do not use the traditional GC collector code framework and are implemented independently; the other collectors share part of the framework code;

5: Parallel Scavenge collector

The Parallel Scavenge collector is a new generation collector that uses the copying algorithm and collects in parallel with multiple threads. Its focus is throughput (using the CPU efficiently), whereas collectors such as CMS focus more on the pause times of user threads (improving user experience). Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed, i.e. throughput = time running user code / (time running user code + garbage collection time). For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute of that, the throughput is 99%.
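The throughput formula is easy to check by hand, but here it is as a tiny Java sketch anyway. The class and method names are made up for illustration; the numbers are the 100-minute example from the paragraph above (99 minutes of user code, 1 minute of GC).

```java
final class Throughput {
    // throughput = time running user code / (time running user code + garbage collection time)
    static double of(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // 100 minutes in total, 1 of them spent in garbage collection
        double throughput = of(99, 1);
        System.out.println("throughput = " + throughput);
    }
}
```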

The Parallel Scavenge collector provides a number of parameters for finding the most suitable pause time or the maximum throughput. If you don’t know much about how the collector operates and manual tuning is difficult, you can leave memory management optimization to the virtual machine.

Characteristics

  • A collector for the new generation;
  • Uses the copying algorithm;
  • Multithreaded collection;
  • Collectors such as CMS focus on minimizing the pause time of user threads during garbage collection; the goal of Parallel Scavenge is to achieve a controllable throughput.

Application scenarios

  • The goal of high throughput is to reduce garbage collection time and allow user code to run longer;
  • When the application runs on multiple CPUs and has no particularly strict pause time requirement, that is, the program mainly does computation in the background without much interaction with the user;
  • For example, applications that perform batch processing, order processing (reconciliation, etc.), payroll, scientific calculations;

Set the parameters

The Parallel Scavenge collector provides two parameters for precise throughput control:

  • “-XX:MaxGCPauseMillis” controls the maximum garbage collection pause time. The value is a number of milliseconds greater than 0. Setting MaxGCPauseMillis to a smaller value may shorten pauses, but it may also reduce throughput, because garbage collection may then happen more frequently;

  • “-XX:GCTimeRatio” sets the ratio of garbage collection time to total time. The value is an integer n with 0 < n < 100, and it is equivalent to setting the throughput target. Garbage collection time as a fraction of total time is 1 / (1 + n). For example, -XX:GCTimeRatio=19 allows garbage collection 5% of the total time, 1 / (1 + 19); the default is n = 99, i.e. 1% = 1 / (1 + 99). The garbage collection time counted here is the total for the new and old generations. If the throughput goal is not met, the generation sizes are increased to maximize the time the user program runs;
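The 1 / (1 + n) rule for GCTimeRatio can be turned into a few lines of Java to try other values of n. This is just a sketch of the arithmetic in the text, with a made-up class name; it does not read anything from a real JVM.

```java
final class GcTimeRatio {
    // Fraction of total time that garbage collection may use for -XX:GCTimeRatio=n.
    // The range check follows the 0 < n < 100 rule stated in the text.
    static double gcFraction(int n) {
        if (n <= 0 || n >= 100) {
            throw new IllegalArgumentException("n must satisfy 0 < n < 100");
        }
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.println("GCTimeRatio=" + n + " -> GC fraction " + gcFraction(n));
        }
    }
}
```

With n = 19 the GC share is 5%, and with n = 99 it is 1%, matching the two examples above.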

6: Serial Old collector

The old generation version of the Serial collector, also single-threaded. It has two main uses: as a companion to the Parallel Scavenge collector in JDK 1.5 and earlier releases, and as a fallback for the CMS collector.

Characteristics

  • For the old generation;
  • Uses the “mark-compact” algorithm (mark, sweep, then compact);
  • Single-threaded collection;

Application scenarios

  • It is mainly used in Client mode.
  • It also has two other uses: (A) in JDK 1.5 and earlier, it is paired with the Parallel Scavenge collector; (B) it is the backup plan for the CMS collector, used when a Concurrent Mode Failure occurs during concurrent collection;

7: Parallel Old collector

The old generation version of the Parallel Scavenge collector. It uses multithreading and the mark-compact algorithm. Where throughput and CPU resources matter, the Parallel Scavenge plus Parallel Old combination can be given priority. It has only been available since JDK 1.6.

Characteristics

  • For the old generation;
  • Uses the “mark-compact” algorithm;
  • Multithreaded collection;

Application scenarios

  • Used in JDK 1.6 and later to replace the Serial Old collector;
  • Especially in Server mode with multiple CPUs; in scenarios that focus on throughput and are sensitive to CPU resources, the Parallel Scavenge plus Parallel Old combination is worth considering first.

Set the parameters

“-XX:+UseParallelOldGC”

8: Concurrent Mark Sweep (CMS) collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest possible collection pause. It is a very good fit for applications that focus on user experience.

Characteristics

  • For the old generation;
  • Based on the “mark-sweep” algorithm (no compaction, so it produces memory fragmentation);
  • Aims for the shortest collection pause time;
  • Concurrent collection, low pause;
  • CMS was HotSpot’s first truly concurrent collector, introduced in JDK 1.5; for the first time, garbage collection threads could work (basically) at the same time as user threads;

Application scenarios

  • Scenarios with a lot of user interaction (such as server-side applications of common web and B/S, browser/server, systems);
  • Where the shortest possible system pause is wanted and service response speed matters, to give users a better experience;

CMS collector process

As the words Mark Sweep in its name imply, the CMS collector is implemented with the mark-sweep algorithm, and the way it works is a bit more complex than the previous garbage collectors. The whole process can be divided into four steps:

  • Initial mark: pauses all other threads. The initial mark only marks objects directly reachable from the GC Roots, so it is fast;
  • Concurrent mark: this is the GC Roots tracing process. GC threads and user threads run at the same time, and a closure structure records the reachable objects. At the end of this phase, however, the closure structure is not guaranteed to contain all currently reachable objects, because user threads may keep updating reference fields and the GC threads cannot guarantee a real-time reachability analysis. So the algorithm keeps track of where these references are updated;
  • Remark: corrects the mark records of objects whose marks changed because the user program kept running during the concurrent mark (executed by multiple threads in parallel to improve efficiency). It requires “Stop The World”; the pause is slightly longer than the initial mark but much shorter than the concurrent mark;
  • Concurrent sweep: user threads are running, and at the same time the GC threads sweep the areas not marked as live and reclaim all garbage objects. Because the two longest phases, concurrent mark and concurrent sweep, run together with user threads, CMS memory reclamation as a whole is performed “concurrently” with the user threads.
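The four steps above, and which of them pause the application, can be written down as a small Java enum. This is only a summary of the list in code form with made-up names; it is not how HotSpot represents the phases internally.

```java
import java.util.ArrayList;
import java.util.List;

final class CmsCycleSketch {
    // The four CMS steps in order, flagged by whether user threads are paused.
    enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        REMARK(true),
        CONCURRENT_SWEEP(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    // Returns the phases that require "Stop The World".
    static List<Phase> pausingPhases() {
        List<Phase> result = new ArrayList<>();
        for (Phase phase : Phase.values()) {
            if (phase.stopTheWorld) {
                result.add(phase);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Phase phase : Phase.values()) {
            System.out.println(phase + (phase.stopTheWorld ? " (stop-the-world)" : " (concurrent)"));
        }
    }
}
```

Only the two short phases pause user threads, which is where the low pause times of CMS come from.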

Set the parameters

Specify the CMS collector: “-XX:+UseConcMarkSweepGC”

Disadvantages

(1) Sensitive to CPU resources

Programs designed for concurrency are all sensitive to CPU resources (a general trait of concurrent programs). The concurrent phases do not pause user threads, but because they take up some of the threads (or CPU resources), they slow the application down and reduce overall throughput (which is why CMS is not a good fit for something like an accounting system). The default number of CMS collection threads = (number of CPUs + 3) / 4. With 4 or more CPUs, the collection threads take no less than 25% of the CPU resources during concurrent collection, and that share tends toward 25% as the number of CPUs grows, which can still have a noticeable impact on user programs. With fewer than 4 CPUs the impact is larger and may be unacceptable: for example, with 2 CPUs one collection thread is started and takes 50% of the CPU resources (a collection thread occupies a CPU for as long as the collection runs).

  • To address this, “Incremental Concurrent Mark Sweep” (i-CMS) was introduced. Similar to using preemption to simulate multitasking, it lets collection threads and user threads run alternately to reduce the time the collection threads occupy the CPU. But the results were not ideal, and since JDK 1.6 it is officially no longer recommended.
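The default thread-count formula quoted above, (number of CPUs + 3) / 4, can be tabulated with a short Java sketch. The class name is made up, and the division is assumed to be integer division, which is what makes 2 CPUs come out as one thread and 50%.

```java
final class CmsThreads {
    // Default number of CMS collection threads as quoted in the text,
    // assuming integer division.
    static int defaultThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Share of the CPUs occupied by collection threads during the concurrent phases.
    static double cpuShare(int cpus) {
        return (double) defaultThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + defaultThreads(cpus)
                    + " thread(s), share " + cpuShare(cpus));
        }
    }
}
```

On 4, 8 and 16 CPUs the share comes out at 25%, while on 2 CPUs it is 50%, which is the case the text calls potentially unacceptable.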

(2) Floating garbage cannot be processed

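CMS cannot process floating garbage, which may lead to a “Concurrent Mode Failure”. Floating garbage is the garbage that user threads keep producing while the concurrent sweep is running; it appears after marking has finished, so CMS cannot reclaim it in the current cycle and has to leave it for the next GC.

  • Because of this, CMS has to reserve some memory for user threads during concurrent collection, unlike other collectors, which can wait until the old generation is almost full before collecting. In other words, CMS needs more space than other garbage collectors. “-XX:CMSInitiatingOccupancyFraction” sets the old generation occupancy at which CMS starts collecting, and so how much old generation space is held in reserve (see the terminology notes for details);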

(3) Produces a lot of memory fragmentation

Because CMS reclaims old generation objects with the “mark-sweep” algorithm, it produces a lot of space fragmentation after running for a long time, which can cause promotion of new generation objects to the old generation to fail. With too much fragmentation, allocating large objects becomes troublesome: the old generation may have plenty of free space in total but no contiguous block large enough for the current object, so a Full GC has to be triggered early.

  • The solution is to use “-XX:+UseCMSCompactAtFullCollection” and “-XX:CMSFullGCsBeforeCompaction” together.
  • UseCMSCompactAtFullCollection: to deal with space fragmentation, the CMS collector provides the “-XX:+UseCMSCompactAtFullCollection” switch. It makes CMS run a merge-and-compact pass over the fragmented memory when the situation above forces a Full GC. The compaction cannot run concurrently, so the pause gets longer. The switch is on by default, and it is meant to be used together with CMSFullGCsBeforeCompaction;
  • CMSFullGCsBeforeCompaction: compaction solves the fragmentation problem, but because it cannot run concurrently it lengthens the pause. So there is another parameter, “-XX:CMSFullGCsBeforeCompaction”, which sets how many Full GCs are performed without compaction before one is followed by compaction. This reduces the pause time spent on compaction. The default is 0, which means compaction is done at every Full GC. Since free space is no longer contiguous, CMS has to allocate memory from a “free list”, which is more expensive than simple “bump-the-pointer” allocation.

CMS vs. Parallel Old

Overall, compared with the Parallel Old collector, CMS reduces the application pause time of old generation garbage collection. However, it increases the pause time of new generation garbage collection, reduces throughput, and needs more heap space. The reason: CMS saves time by not compacting memory, but the free space is then no longer contiguous, so the collector can no longer simply use a pointer to the next free address when allocating memory for an object. Instead it has to use a free list: a list pointing to the unallocated regions is maintained, and each time memory is allocated for an object, a chunk of suitable size has to be found in the list. As a result, allocation in the old generation is more expensive than simple bump-the-pointer allocation. This also adds a burden to new generation garbage collection, because most objects in the old generation get there by being promoted during new generation collection. (When the new generation cannot fit a large object, it is allocated directly in the old generation.)
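To make the difference between the two allocation styles concrete, here is a toy Java sketch. Everything in it is hypothetical: addresses are plain integers, the free list uses a simple first-fit search, and real HotSpot allocators are far more elaborate. It is only meant to show why a free list costs more and why fragmentation can make an allocation fail even when enough total space is free.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class AllocationSketch {
    // Bump-the-pointer: free space is one contiguous block,
    // so allocating just advances a pointer.
    static final class BumpAllocator {
        private final int capacity;
        private int top = 0;

        BumpAllocator(int capacity) {
            this.capacity = capacity;
        }

        // Returns the start address, or -1 if the space is exhausted.
        int allocate(int size) {
            if (top + size > capacity) {
                return -1;
            }
            int address = top;
            top += size;
            return address;
        }
    }

    // Free list: free space is scattered, so allocating means
    // searching for a chunk that is large enough (first fit here).
    static final class FreeListAllocator {
        private final List<int[]> chunks = new ArrayList<>(); // each entry is {start, length}

        void addFreeChunk(int start, int length) {
            chunks.add(new int[] {start, length});
        }

        // Returns the start address, or -1 if no single chunk is large enough.
        int allocate(int size) {
            for (Iterator<int[]> it = chunks.iterator(); it.hasNext();) {
                int[] chunk = it.next();
                if (chunk[1] >= size) {
                    int address = chunk[0];
                    chunk[0] += size;
                    chunk[1] -= size;
                    if (chunk[1] == 0) {
                        it.remove();
                    }
                    return address;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator compacted = new BumpAllocator(100);
        System.out.println("bump: " + compacted.allocate(30) + ", " + compacted.allocate(30));

        FreeListAllocator fragmented = new FreeListAllocator();
        fragmented.addFreeChunk(0, 20);
        fragmented.addFreeChunk(50, 20);
        fragmented.addFreeChunk(90, 10);
        // 50 units are free in total, but no chunk can hold 30 units.
        System.out.println("free list: " + fragmented.allocate(10) + ", " + fragmented.allocate(30));
    }
}
```

The second free-list request fails although 40 units are still free in total, which is the "plenty of free space but no contiguous block" situation that forces an early Full GC.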

9: G1 collector

Earlier garbage collectors (Serial, Parallel, CMS) divide the heap memory into three fixed-size parts: the young generation, the old generation, and the permanent generation.

Note: All objects in heap memory can be considered Java objects.

G1 (Garbage-First) became a commercially available collector in JDK 7u4. It is a server-oriented garbage collector, aimed mainly at machines with multiple processors and large memory. It meets GC pause time requirements with very high probability while still delivering high throughput. It was seen as an important evolutionary feature of the HotSpot virtual machine in JDK 1.7. G1 is intended to replace CMS, and it became the default collector in JDK 9.

The characteristics of

  • Parallelism and concurrency

G1 makes full use of multi-CPU, multi-core hardware, using multiple CPUs (CPUs or CPU cores) to shorten stop-the-world pause times. Some GC actions that would require other collectors to pause Java threads can be carried out by G1 concurrently, letting Java programs keep running.

  • Generational collection

Although G1 can manage the entire GC heap independently without the cooperation of other collectors, it retains the concept of generations.

  • It can manage the entire GC heap (new and old generations) independently, without being paired with other collectors;
  • It can treat objects of different ages in different ways;
  • The generational concept remains, but the memory layout of the Java heap is very different;
  • The whole heap is divided into multiple independent regions (Regions) of equal size;
  • The new and old generations are no longer physically separated; each is a collection of Regions (which need not be contiguous);

  • Space compaction

Unlike CMS’s “mark-sweep” algorithm, G1 as a whole is a collector based on the “mark-compact” algorithm, while locally it is based on the “copying” algorithm.

  • As a whole, it is based on the mark-compact algorithm;
  • Locally (between two Regions), it is based on the copying algorithm. This is an implementation similar to the train algorithm; it does not produce memory fragmentation, which helps long-running applications;

(The train algorithm is the algorithm generational collectors use to provide time-bounded incremental collection in the mature object space. It will be covered in a later article.)

  • Predictable pauses

This is another big advantage of G1 over CMS. Reducing pause time is a shared goal of G1 and CMS, but besides pursuing low pauses G1 also builds a predictable pause time model: you can explicitly specify that, within a time slice of M milliseconds, garbage collection takes no more than N milliseconds. The result is high throughput together with low pauses.

Questions

Why can G1 achieve predictable pauses

G1 can systematically avoid collecting the entire Java heap at once. The G1 collector divides memory into independent regions (Regions) of equal size; the concepts of new generation and old generation are retained, but the generations are no longer physically isolated. G1 tracks how valuable each Region is to collect and maintains a priority list in the background. Each time, according to the allowed collection time, the Region with the highest value is reclaimed first (hence the name Garbage-First). This ensures the highest possible collection efficiency within a limited time;
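The "highest value first, within the allowed time" idea can be sketched in Java as follows. This is a deliberately simplified illustration, not G1's real policy: the Region fields, the value formula (reclaimable bytes per millisecond of estimated cost) and the greedy selection are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class GarbageFirstSketch {
    // Hypothetical per-region bookkeeping.
    static final class Region {
        final int id;
        final long garbageBytes;   // space that collecting this region would free
        final double costMillis;   // estimated time needed to collect it

        Region(int id, long garbageBytes, double costMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }

        // Assumed "value": bytes reclaimed per millisecond spent.
        double value() {
            return garbageBytes / costMillis;
        }
    }

    // Takes regions in order of decreasing value and keeps every region
    // that still fits into the remaining pause budget.
    static List<Region> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort((a, b) -> Double.compare(b.value(), a.value()));

        List<Region> chosen = new ArrayList<>();
        double remaining = pauseBudgetMillis;
        for (Region region : byValue) {
            if (region.costMillis <= remaining) {
                chosen.add(region);
                remaining -= region.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(0, 800, 4));
        regions.add(new Region(1, 100, 5));
        regions.add(new Region(2, 900, 3));
        regions.add(new Region(3, 500, 5));
        for (Region region : chooseCollectionSet(regions, 10)) {
            System.out.println("collect region " + region.id);
        }
    }
}
```

With a 10 ms budget the sketch picks regions 2 and 0, the two with the most garbage per millisecond, and leaves the rest for a later collection.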

The problem of an object being referenced by different regions

A Region cannot be isolated: objects in one Region can be referenced by objects in any other Region. So do we need to scan the entire Java heap to determine whether an object is alive? Other generational collectors have the same problem (it is just more pronounced in G1): does collecting the new generation mean scanning the old generation too? That would reduce the efficiency of Minor GC;

Solutions:

Whether it is G1 or another generational collector, the JVM uses a Remembered Set to avoid a global scan. Each Region has a corresponding Remembered Set. Every write to reference-type data triggers a Write Barrier, which checks whether the reference being written points to an object in a different Region from the reference-type data itself (other collectors: check whether an old generation object refers to a new generation object). If it does, the reference information is recorded, via the CardTable, in the Remembered Set of the Region that holds the referenced object. During garbage collection, the Remembered Set is added to the enumeration scope of the GC roots, which guarantees that no global scan is needed and nothing is missed.
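Here is a toy Java model of the write barrier and Remembered Set described above. It is a sketch under heavy assumptions: addresses are plain integers, REGION_SIZE is an arbitrary value chosen for the example, and the CardTable step is left out entirely, so the field address goes straight into the Remembered Set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RememberedSetSketch {
    static final int REGION_SIZE = 1024; // hypothetical region size in address units

    // For each region: addresses of fields in OTHER regions that point into it.
    private final Map<Integer, Set<Integer>> rememberedSets = new HashMap<>();

    static int regionOf(int address) {
        return address / REGION_SIZE;
    }

    // Called on every reference store: the field at fieldAddress
    // now points to the object at targetAddress.
    void writeBarrier(int fieldAddress, int targetAddress) {
        int from = regionOf(fieldAddress);
        int to = regionOf(targetAddress);
        if (from != to) {
            // Cross-region reference: remember it in the target region's set.
            rememberedSets.computeIfAbsent(to, key -> new HashSet<>()).add(fieldAddress);
        }
    }

    // Extra roots to scan when collecting one region, instead of scanning the whole heap.
    Set<Integer> extraRootsFor(int region) {
        return rememberedSets.getOrDefault(region, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch heap = new RememberedSetSketch();
        heap.writeBarrier(100, 2100);  // region 0 -> region 2, recorded
        heap.writeBarrier(2200, 2300); // region 2 -> region 2, not recorded
        System.out.println("extra roots for region 2: " + heap.extraRootsFor(2));
    }
}
```

When region 2 is collected, only the one remembered field at address 100 has to be treated as an extra root; nothing else outside the region needs scanning.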

Application scenarios

  • Server-side applications, on machines with large memory and multiple processors;
  • Its primary use is as a solution for applications that need low GC latency and have a large heap. For example, with a heap of about 6GB or more, predictable pause times can be below 0.5 seconds (in practice: switching the reconciliation system from the CMS collector to G1 cut the reconciliation time by more than 20 seconds).

G1 may do better than CMS in the following situations (these are not absolute): more than 50% of the Java heap is occupied by live data; the object allocation rate or promotion rate varies significantly; GC pauses are too long (longer than 0.5 to 1 second). Advice: if the current collector causes no problems, don’t rush to switch to G1; if your application is after low pauses, give G1 a try; whether it can replace CMS only becomes clear after testing with the real workload. (If G1 turns out to perform worse than CMS, stay with CMS.)

Set the parameters

You can use the following parameters to configure G1.

Use the G1 collector: “-XX:+UseG1GC”

When overall Java heap usage reaches this value, the concurrent marking phase starts; the default is 45: “-XX:InitiatingHeapOccupancyPercent”

Set the pause time target for G1; the default is 200 ms: “-XX:MaxGCPauseMillis”

Set the size of each Region, ranging from 1MB to 32MB. The goal is to have about 2048 Regions at the minimum Java heap size: “-XX:G1HeapRegionSize”

Minimum size of the new generation, default 5%: “-XX:G1NewSizePercent”

Maximum size of the new generation, default 60%: “-XX:G1MaxNewSizePercent”

Set the number of GC threads used during stop-the-world phases: “-XX:ParallelGCThreads”

Set the number of GC threads used in the concurrent phases: “-XX:ConcGCThreads”
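As a rough feel for the Region size rule above (1MB to 32MB, aiming for about 2048 Regions), here is a simplified Java sketch. It is an approximation for illustration only: rounding down to a power of two is an assumption of this example, and the actual HotSpot heuristic takes more inputs than a single heap size.

```java
final class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    // Simplified rule: heap / 2048, clamped to 1MB..32MB,
    // then rounded down to a power of two (an assumption of this sketch).
    static long regionSize(long heapBytes) {
        long raw = heapBytes / TARGET_REGIONS;
        long clamped = Math.max(MIN_REGION, Math.min(MAX_REGION, raw));
        return Long.highestOneBit(clamped);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 4096, 6144, 131072};
        for (long heapMb : heapsInMb) {
            System.out.println(heapMb + "MB heap -> " + regionSize(heapMb * MB) / MB + "MB regions");
        }
    }
}
```

Small heaps bottom out at 1MB Regions and very large heaps are capped at 32MB, so the Region count only stays near 2048 in between.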

During marking, G1 works out how much live data each Region holds. At collection time it can pick the Regions with little live data according to the pause time the user has set, which keeps both garbage collection and pause time under control without hurting throughput too much. The new algorithm used in the Remark phase and the compaction done during collection make up for the shortcomings of CMS. To quote the Oracle website: “G1 is planned as the long term replacement for the Concurrent Mark-Sweep Collector (CMS)”.