Introduction

Hello everyone, I'm South Orange. It has been almost two years since I first came into contact with Java. In those two years I have gone from a complete beginner who did not understand even a few of Java's data structures to a slightly more advanced beginner, and I have learned a lot along the way. Knowledge becomes more valuable the more it is shared, so I have been summarizing what I consider the key points for everyday study and interviews (including material learned from and quoted from more experienced developers), in the hope that it helps everyone.

Here are the two previous articles in this series. If you haven't read them yet, you can read them alongside this one:

  • Conquering the JVM — JVM Objects and object access location (1)
  • Conquer the JVM — Garbage collection for the JVM (Part 2)

The ideas for these JVM articles come from an expert at Ape Man Valley, whose technical skills are excellent and who puts a lot of effort into writing. The articles are a very satisfying read. (^_^)

If you are interested, you can follow my public account, where new articles appear first. You can also ask me there for the mind map.

The previous article covered the JVM’s garbage collection mechanism, giving you a general idea of what objects are collected and under what circumstances. In this article we will focus on a few specific garbage collectors.

As mentioned earlier, different garbage collectors are used for different generations according to their characteristics. Based on the earlier analysis, a young-generation collector needs high efficiency and fast collection, because the young generation is collected often. The old generation is collected far less often, and young-generation algorithms such as mark-copy should be avoided there.

1. Serial collector: Serial

The Serial collector is a young-generation garbage collector based on the mark-copy algorithm. It collects with a single thread, and application threads are not allowed to run while the GC thread is working. The application is therefore paused for the duration of a collection, which is known as stop-the-world (STW). In an environment limited to a single CPU, the Serial collector has no thread interaction (context switching) overhead, so it achieves the highest single-threaded collection efficiency.

2. Parallel collector: ParNew

ParNew is also a young-generation collector based on the mark-copy algorithm, and it can be seen as the multithreaded version of Serial. It uses multiple GC threads to collect in parallel, so on a multiprocessor machine it can greatly reduce stop-the-world time. Apart from using multiple threads, its behavior and characteristics are the same as those of the Serial collector.

Besides Serial, ParNew is the only young-generation collector that can work together with the CMS collector. CMS, an old-generation collector, cannot be paired with Parallel Scavenge, the young-generation collector that already existed in JDK 1.4. The reason is that Parallel Scavenge (like G1) was implemented independently and does not use the traditional GC collector code framework.

3. Throughput-first collector: Parallel Scavenge

First, a reminder: throughput is the ratio of CPU time spent running user code to the total CPU time consumed, that is, throughput = user code time / (user code time + garbage collection time). If the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute of that, the throughput is 99%.
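
To make the formula concrete, here is a small Java sketch of the same calculation. The class and method names are my own, chosen only for illustration.

```java
public class ThroughputExample {
    /** Throughput = user code time / (user code time + GC time). */
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    public static void main(String[] args) {
        // The VM runs for 100 minutes in total, 1 minute of which is garbage collection.
        double gcMinutes = 1.0;
        double userMinutes = 100.0 - gcMinutes;
        System.out.printf("throughput = %.0f%%%n", throughput(userMinutes, gcMinutes) * 100);
    }
}
```

The time unit does not matter as long as both arguments use the same one.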

The Parallel Scavenge collector is a young-generation collector that uses the mark-copy algorithm. It is similar to ParNew but focuses more on throughput: on top of what ParNew offers, it provides parameters for setting a desired pause time or throughput, and it then collects with that target in mind.

The Parallel Scavenge collector provides two parameters for controlling throughput precisely:

  • 1. -XX:MaxGCPauseMillis

    Sets the maximum garbage collection pause time, as a number of milliseconds greater than 0. Setting MaxGCPauseMillis to a smaller value may shorten pauses, but it may also reduce throughput, because garbage collection may then have to run more frequently.

  • 2. -XX:GCTimeRatio

    Sets the ratio of garbage collection time to total time. The value is an integer greater than 0 and less than 100. GCTimeRatio is not the throughput itself: with a value of N, the target share of time spent in garbage collection is 1/(1+N).
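
As a quick check of how a GCTimeRatio value maps to a GC time budget, here is a minimal sketch. It assumes the 1/(1+N) relation used for the throughput collector; the helper name gcTimeFraction is hypothetical and is not a JVM API.

```java
public class GcTimeRatioExample {
    /**
     * Target share of total time spent in GC for -XX:GCTimeRatio=n,
     * assuming the relation 1 / (1 + n). Hypothetical helper, not a JVM API.
     */
    static double gcTimeFraction(int n) {
        if (n <= 0 || n >= 100) {
            throw new IllegalArgumentException("expected 0 < n < 100, got " + n);
        }
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            double gcShare = gcTimeFraction(n);
            System.out.println("GCTimeRatio=" + n
                    + " -> GC share " + gcShare
                    + ", throughput target " + (1.0 - gcShare));
        }
    }
}
```

So a value of 19 allows roughly 5% of the time for GC, and 99 allows roughly 1%.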

4. CMS garbage collector

The Concurrent Mark Sweep (CMS) collector is a collector whose goal is to obtain the shortest collection pause time.

The CMS collector works only on the old generation and is based on the mark-sweep algorithm. Its process is divided into four steps:

  • 1. CMS Initial mark

The initial mark phase mainly does two things: first, it marks the old-generation objects directly reachable from the GC Roots; second, it marks the old-generation objects directly referenced from the young generation. "Directly" here means only the first-level objects that are directly associated with a GC Root. The initial mark phase is entirely STW, so the application is paused. The -XX:+CMSParallelInitialMarkEnabled parameter makes this phase mark in parallel with multiple threads, which reduces the pause time.

  • 2. CMS Concurrent Mark

The concurrent mark phase runs at the same time as the application and mainly does two things:

    • 1. Starting from the live objects marked in the initial mark phase, it traces and marks every reachable object. For example, with A->B, A is marked in the initial mark phase and B is marked in the concurrent mark phase.

    • 2. It marks as dirty the cards of objects promoted from the young generation to the old generation during the concurrent phase, of objects allocated directly in the old generation, and of old-generation objects whose references changed, so that the remark phase does not have to scan the whole old generation.

Because concurrent marking runs at the same time as the application, a reference chain A->B->C may change to A->C while marking is in progress, in which case C may never be marked during the concurrent mark phase. The marking phase uses the tri-color marking algorithm, which divides objects into three colors:

  • White: objects that have not been visited yet (objects still white at the end are treated as garbage).
  • Gray: objects that are being visited.
  • Black: objects that have been fully visited (they are not treated as garbage and will not be collected).

On its own, tri-color marking is full of holes when the application keeps changing references, so a write barrier is used. If A is already marked (already black) when a user thread changes the reference to A->C, the barrier automatically turns C gray so that C is still visited later.
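
The A->B->C case can be played through in a toy model. The following is only a sketch of tri-color marking with the write barrier described above, not how HotSpot implements it; all class and method names are made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> grayStack = new ArrayDeque<>();
    private final boolean barrierEnabled;

    TriColorSketch(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    /** Turns a white object gray and queues it for scanning. */
    void shade(Node n) {
        if (n.color == Color.WHITE) {
            n.color = Color.GRAY;
            grayStack.push(n);
        }
    }

    /** Mutator store. With the barrier on, a black source shades its new target. */
    void writeRef(Node from, Node to) {
        from.refs.add(to);
        if (barrierEnabled && from.color == Color.BLACK) {
            shade(to);
        }
    }

    /** Scans one gray object; returns false when no gray objects are left. */
    boolean markStep() {
        Node n = grayStack.poll();
        if (n == null) return false;
        for (Node child : n.refs) shade(child);
        n.color = Color.BLACK;
        return true;
    }

    /** Replays A->B->C changing to A->C during marking and returns C's final color. */
    static Color colorOfC(boolean barrierEnabled) {
        TriColorSketch gc = new TriColorSketch(barrierEnabled);
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.refs.add(b);
        b.refs.add(c);
        gc.shade(a);        // root A becomes gray
        gc.markStep();      // A is scanned: B turns gray, A turns black
        gc.writeRef(a, c);  // user thread adds A->C while A is already black
        b.refs.remove(c);   // user thread removes B->C
        while (gc.markStep()) { /* finish marking */ }
        return c.color;
    }

    public static void main(String[] args) {
        System.out.println("without barrier: C is " + colorOfC(false)); // WHITE, would be wrongly collected
        System.out.println("with barrier:    C is " + colorOfC(true));  // BLACK, kept alive
    }
}
```

Without the barrier C stays white even though A still references it, which is exactly the hole that the write barrier closes.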

But what if, during the concurrent phase, a user thread starts using again an object that had already been judged dead? That is what the remark phase is for.

  • 3. CMS Remark

Both the initial mark and remark steps require stop-the-world. The remark phase corrects the marking records of objects whose marks changed because the user program kept running during concurrent marking. For example, a user thread may create a new object during that time; the object is still white, but it must not be collected. This phase marks such objects again. Its pause is typically slightly longer than the initial mark pause, but much shorter than the concurrent mark phase.

  • 4. CMS Concurrent Sweep

Dead objects are cleared with the mark-sweep algorithm while the application keeps running.

Benefits of CMS Collector

Concurrent collection, low pause.

CMS Collector Disadvantages

  • The CMS collector is very sensitive to CPU resources.
  • The CMS collector cannot handle floating garbage.
  • The CMS collector is based on the mark-sweep algorithm, so it inherits all of that algorithm's disadvantages, such as memory fragmentation.

5. G1 garbage collector

G1 is a garbage collector for server-side applications. G1 has the following features:

  • 1. Redefined heap space: G1 breaks with the original fixed generational layout and divides the heap into regions. A region is not tied to one generation and can switch between the young generation and the old generation as needed.

  • 2. Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware and uses multiple CPUs to shorten stop-the-world pauses. Some GC work for which other collectors have to stop the Java threads can be done by G1 concurrently, while the Java program keeps running.

  • 3. Generational collection: Although G1 can manage the entire GC heap on its own without the help of other collectors, it keeps the concept of generational collection. It treats newly created objects differently from old objects that have been alive for a while and have survived multiple GCs, in order to get better collection results.

  • 4. Space compaction: Unlike CMS with its mark-sweep algorithm, G1 is based on the mark-compact algorithm when viewed as a whole, and on the mark-copy algorithm when viewed locally (between two regions).

  • 5. Predictable pauses: This is another big advantage of G1 over CMS. Reducing pause time is a concern of both G1 and CMS, but G1 additionally works toward a pause-time target that the user specifies.

Comparison of features of G1 and CMS:

Characteristic                                     G1       CMS
Concurrent and generational                        yes      yes
Maximizes release of heap memory                   yes      no
Low latency                                        yes      yes
Throughput                                         high     low
Compaction                                         yes      no
Predictability                                     strong   weak
Physical separation of young and old generations   no       yes

Regions

G1 redefines the heap space, breaking with the original generational layout and dividing the heap into regions.

  • G1 uses the idea of partitioning: the entire heap is divided into a number of memory regions of equal size.
  • Before G1, other collectors collected the whole young generation or the whole old generation. G1 no longer does. In its heap design, G1 stops fixing the collection scope to the young or old generation and divides the heap into many areas of the same size, each called a Region. A Region is a memory space with contiguous addresses.
  • Although the concepts of young generation and old generation are retained, the two are no longer physically isolated; each is a set of regions that need not be contiguous. All regions have the same size, a power of 2 between 1 MB and 32 MB, and the JVM aims to divide the heap into about 2048 regions.
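
The sizing rule in the last bullet can be sketched in a few lines. This is a simplification under my own assumptions (divide by 2048, round down to a power of two, clamp to 1 MB..32 MB); HotSpot's real heuristic may differ in detail, and the names are hypothetical.

```java
public class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    /** Simplified region size: heap / 2048, rounded down to a power of two, clamped to [1 MB, 32 MB]. */
    static long regionSize(long heapBytes) {
        long candidate = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(candidate); // largest power of two <= candidate
        return Math.min(powerOfTwo, MAX_REGION);
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 4, 6, 128};
        for (long gb : heapsInGb) {
            long size = regionSize(gb * 1024 * MB);
            System.out.println(gb + " GB heap -> " + (size / MB) + " MB regions");
        }
    }
}
```

Small heaps end up with fewer than 2048 regions because of the 1 MB lower bound, and very large heaps end up with more because of the 32 MB upper bound.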

Cards

G1 further divides each region into cards of 512 bytes. A card is the smallest unit of heap memory that is tracked, and the cards of all regions are recorded in the Global Card Table.

An object allocated in the JVM occupies a number of physically contiguous cards. When looking for references to objects in a region, the referencing objects can be found through the recorded cards (see RSet). Each memory reclamation processes the cards of the specified regions.
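
With 512-byte cards, mapping an address to its card is just a shift by 9 bits. The sketch below shows that arithmetic; the heap base address and the method names are assumptions made for the example, not HotSpot internals.

```java
public class CardTableSketch {
    static final int CARD_SHIFT = 9;              // 2^9 = 512 bytes per card
    static final int CARD_SIZE = 1 << CARD_SHIFT;

    /** Index of the card that covers the given address, relative to the start of the heap. */
    static long cardIndex(long heapBase, long address) {
        return (address - heapBase) >>> CARD_SHIFT;
    }

    /** Number of consecutive cards touched by an object of sizeBytes starting at start. */
    static long cardsSpanned(long heapBase, long start, long sizeBytes) {
        return cardIndex(heapBase, start + sizeBytes - 1) - cardIndex(heapBase, start) + 1;
    }

    /** Number of cards in one region of the given size. */
    static long cardsPerRegion(long regionBytes) {
        return regionBytes / CARD_SIZE;
    }

    public static void main(String[] args) {
        long heapBase = 0x1000L; // hypothetical start address of the heap
        System.out.println(cardIndex(heapBase, heapBase + 511));   // 0: still inside the first card
        System.out.println(cardIndex(heapBase, heapBase + 512));   // 1: first byte of the second card
        System.out.println(cardsSpanned(heapBase, heapBase + 500, 100)); // 2: the object straddles a card boundary
        System.out.println(cardsPerRegion(1024L * 1024L));         // 2048 cards in a 1 MB region
    }
}
```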

Heap size

The heap size can also be specified for G1.

G1 can also resize the heap automatically: when a young collection or mixed collection occurs, it computes the ratio of time spent in GC to time spent in the application. If GC is too frequent, increasing the heap size lowers the GC frequency, and the share of time spent in GC drops accordingly.

The target parameter -XX:GCTimeRatio sets the ratio of GC time to application time. It defaults to 9 for G1 and 99 for CMS, because CMS is designed to spend as little time as possible in GC. In addition, when space is insufficient, for example when an object allocation or an object transfer fails, G1 first tries to expand the heap. If the expansion fails, it falls back to a Full GC.

After a Full GC, the heap is also resized according to the heap size calculation.

Generations

G1 logically divides memory into a young generation and an old generation, and the young generation is further divided into Eden space and Survivor space. The young-generation space is not fixed, however: when the existing young regions are full, the JVM assigns new free regions to the young generation.

New objects in G1 are normally allocated in Eden. Objects that survive one garbage collection are moved to the Survivor area, and objects still alive after a number of collections (15) are moved to the Old area.

The whole young generation changes size dynamically between the initial size -XX:G1NewSizePercent (default 5% of the entire heap) and the maximum size -XX:G1MaxNewSizePercent (default 60%). The actual value is calculated from the target pause time -XX:MaxGCPauseMillis (200 ms by default), the amount by which the young generation needs to grow, and the remembered sets (RSets) of the regions.

This raises a question: why is the maximum generational age in the JVM 15, so that an object moves from the young generation to the old generation after 15 collections?

As mentioned in an earlier article (the one on locks in Java), the HotSpot object header consists of two main parts, the Mark Word and the Klass Pointer (type pointer). By default the Mark Word stores the object's HashCode, generational age, and lock flag bits. In HotSpot, four bits are used to store the generational age.

See? The generational age is stored in 4 bits, so it ranges from 0000 to a maximum of 1111, which is 15.
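
The arithmetic is easy to verify in Java. This is only a tiny illustration; the helper name is my own.

```java
public class AgeBitsExample {
    /** Largest unsigned value that fits in the given number of bits. */
    static int maxValue(int bits) {
        return (1 << bits) - 1;
    }

    public static void main(String[] args) {
        int ageBits = 4; // width of the age field in the Mark Word, as stated above
        int maxAge = maxValue(ageBits);
        System.out.println(Integer.toBinaryString(maxAge) + " = " + maxAge); // prints "1111 = 15"
    }
}
```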

Local allocation buffers

In part 1 of this JVM series we mentioned TLAB. Each application thread has exclusive use of a thread-local allocation buffer (TLAB) in which it creates objects. Most objects land in Eden (except for giant objects or when TLAB allocation fails), so TLABs belong to Eden regions.

Each GC thread also has its own local buffer (GCLAB) for moving objects during a collection; each collection copies objects into Survivor space or old-generation space. For objects promoted from Eden/Survivor space to Survivor/old-generation space, there is likewise a GC-exclusive local buffer, called the promotion local allocation buffer (PLAB).

G1 manages memory in units of regions and allocates objects in units of cards.

Humongous (giant) objects

Sometimes an object is too large to be allocated in a TLAB (for example, an object larger than one region).

If the size of an object exceeds the size of a region, enough consecutive regions to hold it are allocated directly in the old generation. The regions of a giant object must be contiguous, and the object is not moved after it has been allocated.

Before JDK 8u40, giant objects could only be reclaimed during the cleanup phase of the concurrent collection cycle or during a Full GC. From JDK 8u40 onward (including that version), a giant object can also be reclaimed in a young collection once no other object references it.

Remembered Set (RSet)

To avoid scanning the whole heap during an STW pause, G1 keeps a remembered set (RSet) for each region. Internally it works like a set of reverse pointers: it records the indexes of the cards that contain references to objects in the region. When a region is to be reclaimed, its RSet is scanned to determine whether the objects referencing the region are alive, and from that whether the objects in the region are alive.

However, not every reference needs to be recorded in an RSet. G1 collects the whole young generation at every GC, so references that come from young-generation objects do not need to be recorded; only references coming from old regions may produce RSet entries. Also, if a region is certain to be scanned, its references can be found completely without an RSet, so references that originate inside the same region do not need to be recorded either.

G1 maintains the RSets in order to obtain accurate region references. They are maintained by write barriers and concurrent refinement threads.
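
The recording rules above can be modelled in a few lines of Java. This is a heavily simplified sketch: it updates the RSet directly inside the barrier, whereas the text says G1 also involves concurrent refinement threads. Addresses, region numbers, and all names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RSetSketch {
    static final int CARD_SHIFT = 9; // 512-byte cards

    private final long regionSize;
    private final Set<Integer> youngRegions = new HashSet<>();
    private final Map<Integer, Set<Long>> rsets = new HashMap<>();

    RSetSketch(long regionSize) { this.regionSize = regionSize; }

    void markYoung(int region) { youngRegions.add(region); }

    int regionOf(long address) { return (int) (address / regionSize); }

    static long cardOf(long address) { return address >>> CARD_SHIFT; }

    /** Simplified write barrier: the field at fieldAddr now points to the object at targetAddr. */
    void recordStore(long fieldAddr, long targetAddr) {
        int from = regionOf(fieldAddr);
        int to = regionOf(targetAddr);
        if (from == to) return;                    // same region: found when the region is scanned
        if (youngRegions.contains(from)) return;   // young regions are always collected as a whole
        rsets.computeIfAbsent(to, k -> new TreeSet<>()).add(cardOf(fieldAddr));
    }

    /** Cards outside the region that hold references into it. */
    Set<Long> rsetOf(int region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        RSetSketch heap = new RSetSketch(mb);
        heap.markYoung(0);
        heap.recordStore(2 * mb + 1024, 1 * mb + 8);  // old region 2 -> region 1: recorded
        heap.recordStore(64, 1 * mb + 16);            // young region 0 -> region 1: skipped
        heap.recordStore(1 * mb + 32, 1 * mb + 64);   // inside region 1: skipped
        System.out.println("RSet of region 1: " + heap.rsetOf(1)); // prints "RSet of region 1: [4098]"
    }
}
```

When region 1 is collected, only card 4098 has to be scanned for incoming references instead of the whole heap.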

Collection Set (CSet)

A collection set (CSet) is the list of target regions to be reclaimed in a GC pause.

In any collection pause, all regions in the CSet are freed, and the live objects inside them are moved to newly allocated free regions. The CSet of a young collection contains only young regions, while a mixed collection additionally uses a heuristic to pick, from the old-generation candidate regions, the regions with the highest collection gain and adds them to the CSet.
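
To give a feel for what "highest collection gain" could mean, here is a greedy sketch. It is my own simplification, not G1's actual heuristic: each candidate gets a made-up reclaimable size and estimated cost, and regions are chosen by gain per millisecond until a pause budget is used up.

```java
import java.util.ArrayList;
import java.util.List;

public class CSetSketch {
    /** Hypothetical old-generation candidate region. */
    static final class Region {
        final String name;
        final long reclaimableBytes; // garbage that collecting this region would free
        final double costMs;         // estimated time to evacuate its live objects

        Region(String name, long reclaimableBytes, double costMs) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.costMs = costMs;
        }

        double gain() { return reclaimableBytes / costMs; }
    }

    /** Greedily picks the regions with the best gain that still fit into the pause budget. */
    static List<String> chooseOldRegions(List<Region> candidates, double budgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Double.compare(y.gain(), x.gain())); // highest gain first
        List<String> chosen = new ArrayList<>();
        double usedMs = 0;
        for (Region r : sorted) {
            if (usedMs + r.costMs > budgetMs) continue; // would exceed the budget, skip it
            chosen.add(r.name);
            usedMs += r.costMs;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> candidates = new ArrayList<>();
        candidates.add(new Region("r1", 8_000_000L, 10.0));
        candidates.add(new Region("r2", 2_000_000L, 10.0));
        candidates.add(new Region("r3", 6_000_000L, 5.0));
        System.out.println(chooseOldRegions(candidates, 15.0)); // prints "[r3, r1]"
    }
}
```

Region r2 is left for a later collection because it frees little memory for its cost and no longer fits into the 15 ms budget.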

G1 has two main GC modes, Young GC and Mixed GC, and both need to stop the world (STW).

Young GC (young-generation collection)

Process:

  • 1. When Eden is full, a Young GC is triggered (the Young GC is parallel and stop-the-world).
  • 2. Surviving objects in Eden regions are copied to a Survivor region or promoted directly to an Old region; surviving objects in a Survivor region are copied to a new Survivor region or promoted to an Old region.
  • 3. For the next Young GC, the sizes of Eden and Survivor are readjusted according to the required expansion and the remembered sets of the regions.

Mixed GC

A Mixed GC collects the young generation together with part of the old generation.

Process:

  • 1. Initial mark: (stop-the-world) it is carried out together with a normal Young GC and marks the Survivor regions (the root regions), because they may contain references into the old generation.
  • 2. Root region scan: Because a Young GC has just been performed, the surviving young objects are all in the Survivor regions, which are therefore called the root regions. This phase scans the Survivor regions for references into the old generation. It must finish before the next Young GC: no young collection can take place during this phase, so if Eden fills up, the Young GC has to wait until the phase ends.
  • 3. Concurrent marking: Finds the live objects in the whole heap. This phase runs concurrently with the application and can be interrupted by Young GCs, possibly several times.
  • 4. Remark: (stop-the-world) completes the marking of the last live objects. It uses the snapshot-at-the-beginning (SATB) algorithm, which is more efficient than the approach used by the CMS collector. Completely free regions are reclaimed at this stage.
  • 5. Cleanup: In this final stage, G1 performs STW operations for accounting and RSet scrubbing. During accounting, G1 identifies the regions that are completely free and the regions that are candidates for mixed collection. The cleanup phase is partly concurrent, namely when it resets the empty regions and returns them to the free list. Very little memory is actually reclaimed during the cleanup phase.

Full GC

Besides Young GC and Mixed GC, G1 has a fallback: when it fails to obtain a new region in the heap, it performs an STW, single-threaded Full GC. The Full GC marks and compacts the entire heap, so that in the end only live objects remain.

G1 triggers a Full GC in the following scenarios, and the log then shows to-space exhausted or Evacuation Failure:

  • 1. Promotion failure: After a concurrent cycle ends, a mixed collection cycle follows, cleaning up old-generation space along with the young collections. If space is cleaned up more slowly than it is consumed and the old generation runs short, promotion fails.
  • 2. Evacuation failure: During a young collection, the Survivor and Old regions do not have enough space for all surviving objects. This is certainly serious, since almost no space is left, and triggering a Full GC makes sense.
  • 3. Large object allocation failure: A Full GC is triggered when a large object is being allocated in the old generation and not enough contiguous regions can be found. Try not to create large objects, especially ones larger than a region.

The G1 garbage collector also has a number of parameters, described here:

  • -XX:+UseG1GC: use the G1 collector.
  • -XX:MaxGCPauseMillis=200: the target pause time; the default is 200 milliseconds. When choosing a value for -XX:MaxGCPauseMillis, do not base it on the average pause, but pick a value that 90% of the pauses should stay within. Remember that the pause-time target is a goal and will not always be met.
  • -XX:InitiatingHeapOccupancyPercent=45: the concurrent GC cycle starts once usage of the whole heap reaches this percentage; the default is 45%.
  • -XX:NewRatio=n: ratio of old generation to young generation. The default is 2, that is, 1/3 young generation and 2/3 old generation. Do not set a fixed size for the young generation.
  • -XX:SurvivorRatio=n: ratio of Eden to Survivor; the default is 8.
  • -XX:G1HeapRegionSize=n: size of each region. By default it is between 1 MB and 32 MB depending on the heap size; usually it is enough to specify the overall heap size.
  • -XX:ConcGCThreads=n: number of GC threads in the concurrent marking phase. Increasing this value makes concurrent marking finish faster. If it is not specified, the JVM calculates it as ConcGCThreads = (ParallelGCThreads + 2) / 4.
  • -XX:ParallelGCThreads=n: number of parallel garbage collection threads.
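
The ConcGCThreads formula uses integer division, which is easy to get wrong by hand. Here is a small sketch; the lower bound of 1 is my own assumption so that the result is never zero, and the method name is hypothetical.

```java
public class ConcGcThreadsExample {
    /**
     * Default from the formula above: (ParallelGCThreads + 2) / 4 with integer division.
     * The lower bound of 1 is an assumption of this sketch.
     */
    static int concGcThreads(int parallelGcThreads) {
        return Math.max(1, (parallelGcThreads + 2) / 4);
    }

    public static void main(String[] args) {
        for (int parallel : new int[] {1, 4, 8, 14}) {
            System.out.println("ParallelGCThreads=" + parallel
                    + " -> ConcGCThreads=" + concGcThreads(parallel));
        }
    }
}
```

For example, 8 parallel threads give (8 + 2) / 4 = 2 concurrent marking threads.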

6. Summary

JDK 1.7 default garbage collectors: Parallel Scavenge (young generation) + Parallel Old (old generation).

JDK 1.8 default garbage collectors: Parallel Scavenge (young generation) + Parallel Old (old generation).

JDK 1.9 default garbage collector: G1.
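
If you want to check which collectors your own JVM is actually running with, the standard java.lang.management API can list them. A minimal sketch (the class name is mine; the names printed depend on the JVM and its flags):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the garbage collectors registered in the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The output varies by JVM version and by flags such as -XX:+UseG1GC.
        System.out.println(collectorNames());
    }
}
```

On HotSpot, the Parallel collectors typically show up as "PS Scavenge" and "PS MarkSweep", and G1 as "G1 Young Generation" and "G1 Old Generation".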

We learn about the JVM garbage collectors not to write one ourselves, but to be able to choose appropriate object sizes and set appropriate parameters for different situations. With well-chosen JVM parameters and code review, each of us can write good code.

Conclusion

This is the last chapter on the JVM. I reviewed a lot of material while writing it, and I believe that after reading it you will have made some progress in interviews and tuning. If you found it useful, feel free to give it a like. See you!

And if you need the mind map, you can contact me. After all, the more knowledge is shared, the more valuable it becomes!