preface

GC matters a great deal in Java, whether for day-to-day JVM tuning or for the relentless bombardment of interview questions.

In this article, I will expand on GC in a question and answer format.

This article covers a lot of garbage-collection fundamentals, so it should help you understand where garbage collection, and the collectors that keep evolving, are coming from.

The GC implementation referred to in this article is, by default, HotSpot’s, unless otherwise specified.

I’m going to list all 18 questions, so you can think about how many you can answer.

All right, let’s start the show.

What are young GC, old GC, full GC, and mixed GC?

Answering this question requires knowing what generational GC is and why it exists. That was covered in a previous article, so have a look there if you are unsure.

Now let’s answer that question.

There are two main types of GC: Partial GC and Full GC.

Partial GC collects only part of the heap, and can be divided into young GC, old GC, and mixed GC.

  • Young GC: collects only the young generation.

  • Old GC: collects only the old generation.

  • Mixed GC: unique to the G1 collector; collects the entire young generation plus part of the old generation.

Full GC refers to collecting the entire heap: the young generation, the old generation, and the permanent generation if there is one.

There is also the term Major GC. In Understanding the Java Virtual Machine it refers to collecting only the old generation, i.e. it is equivalent to old GC, but many other sources treat it as equivalent to Full GC.

There is also the Minor GC, which refers to the GC of the young generation.

What is the young GC trigger condition?

It is generally said that a young GC is triggered when the young generation's Eden space is nearly full.

Why "generally"? Because some collector implementations perform a young GC right before a full GC.

Parallel Scavenge, for example, does this, although the ScavengeBeforeFullGC parameter can be turned off so that this young GC is skipped.

Other implementations may do the same, but the normal case is simply that Eden is nearly full.

There are two ways Eden's being nearly full is detected: a failed memory allocation for an object, or a failed memory allocation for a TLAB.

What are the full GC trigger conditions?

There are quite a few more triggers for this one. Let's take a look.

  • Based on statistics from previous collections, if the average size promoted by young GCs is larger than the old generation's remaining space, a full GC is triggered.

  • Full GC is also triggered if the permanent generation is full.

  • Large objects are allocated directly in the old generation; if the old generation does not have enough space for them, a full GC is triggered.

  • Allocation-guarantee failure, i.e. promotion failure: the young generation's to-space cannot hold the objects copied from Eden and from-space, or objects have reached the promotion age threshold, and the old generation cannot accommodate them either, so a full GC is triggered.

  • Commands such as System.gc() and jmap -dump also trigger a full GC.

Do you know what TLAB is?

This starts with how memory is allocated.

Generally speaking, creating an object means requesting memory from the heap's young generation, and the heap is globally shared. Memory in the young generation is laid out contiguously and divided by a pointer: used space on one side, free space on the other.

Allocating by pushing that pointer forward is called bump-the-pointer allocation.

As you can imagine, if multiple threads allocate objects at once, that pointer becomes a hot resource requiring mutual exclusion, and allocation becomes inefficient.

Hence the TLAB (Thread Local Allocation Buffer): a memory area dedicated to a single thread's allocations.

Only the owning thread may allocate objects in this area, although all threads can still read the objects stored there.

The idea behind TLAB is very simple: give each thread its own chunk of space, so that every thread allocates object memory only from its own plot and never fights over the hot pointer.

When this block of memory is used up, the thread requests a new one.

This idea is common elsewhere too. A distributed ID generator, for example, does not fetch one ID at a time; it fetches a batch, and requests another batch only when the current one is used up.
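The bump-the-pointer allocation inside a TLAB can be sketched in a few lines of Java. Everything here (the class and method names, the byte array standing in for raw memory) is illustrative, not HotSpot's actual implementation:

```java
// A minimal sketch of bump-the-pointer allocation inside a TLAB.
// All names here (Tlab, alloc, top) are made up for illustration.
class Tlab {
    private final byte[] buffer; // the thread-private chunk carved out of Eden
    private int top = 0;         // allocation pointer, bumped forward on each alloc

    Tlab(int size) {
        this.buffer = new byte[size];
    }

    // Returns the start offset of the allocated block, or -1 when the TLAB is
    // exhausted and the thread must retire it and request a fresh one.
    int alloc(int size) {
        if (top + size > buffer.length) {
            return -1;
        }
        int start = top;
        top += size; // "bump" the pointer: no lock needed, the buffer is thread-local
        return start;
    }
}
```

In HotSpot, TLABs are enabled by default (the UseTLAB flag), and when a thread cannot get a TLAB it falls back to allocating in shared Eden under atomic operations.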

You can see that each thread has its own allocation area, and the shorter arrow represents the allocation pointer inside the TLAB.

A new TLAB can be requested only when the current one is used up.

However, the size of each request varies, depending on the thread's history since it started: the TLAB will be larger for a thread that has been allocating heavily, and smaller for one that has hardly allocated at all.

TLAB can also waste space. Look at this diagram.

You can see there is only one slot left in the TLAB, but the object to be allocated needs two. A new TLAB must be requested, and the leftover slot in the old one is wasted.

In HotSpot a filler object is generated to plug this hole, because the heap must support linear traversal: the traversal finds each object's size from its header and skips ahead by that size to reach the next object, so there must be no gaps.
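To see why the hole must be plugged, here is a toy model of the linear walk. It is purely illustrative: `heap[addr]` stands in for reading the object's size from its header.

```java
// Toy model of linear heap traversal: heap[addr] holds the size (in slots) of
// the object starting at addr, mimicking a size read from the object header.
// The walk only works if every slot range is covered by some object, which is
// why HotSpot fills unused TLAB tails with dummy filler objects.
class HeapWalk {
    static int countObjects(int[] heap) {
        int addr = 0, count = 0;
        while (addr < heap.length) {
            count++;
            addr += heap[addr]; // skip over the current object to the next header
        }
        return count;
    }
}
```

If the wasted tail held no filler, `heap[addr]` there would be garbage and the walk would derail; the filler's "header" carries the hole's size so the walk simply steps over it.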

Of course, traversal can also be achieved with external records such as free lists.

In addition, TLAB can only allocate small objects, and large objects still need to be allocated in the shared Eden area.

So in short, TLAB is designed to avoid contention during object allocation.

Do you know what PLAB is?

PLAB is similar to TLAB; it stands for Promotion Local Allocation Buffers.

It is used when young-generation objects are promoted to the old generation.

When a multithreaded YGC runs in parallel, many objects may need to be promoted to the old generation at once, so the old generation's allocation pointer becomes "hot"; hence the PLAB.

Each GC thread first claims a block of old-generation memory as its PLAB, and then allocates the objects being promoted inside that block by bumping a pointer. Only claiming a new PLAB touches the shared free list, so contention on it is reduced and allocation is faster.

This is similar to the idea of TLAB.

The true cause of concurrent mode failure

Understanding the Java Virtual Machine says that because the CMS collector cannot handle floating garbage, a "Concurrent Mode Failure" may occur, which leads to another, fully "Stop The World" Full GC.

In other words, the book says this error causes a Full GC. In fact it is the other way round: a Full GC causes the error. Let's look at the source code; the version is OpenJDK 8.

First, search for where this error message is produced.

Find out who called report_concurrent_mode_interruption.

It is called in void CMSCollector::acquire_control_and_collect(...).

Note the line: CollectorState first_state = _collectorState;

A look at the CollectorState enumeration makes it clear: the error is reported when first_state shows the CMS cycle had not yet finished.

The acquire_control_and_collect method is what CMS uses to execute a foreground GC.

CMS is divided into foreground GC and background GC.

And a foreground GC is simply a Full GC.

So the error is thrown when a Full GC starts while a CMS GC is still in progress.

The root cause is that the allocation rate is too fast: the heap cannot be reclaimed in time, so a Full GC occurs.

It can also be that the heap-occupancy threshold that initiates the CMS GC is set too high, so the concurrent cycle starts too late.

Why is the backup Full GC single-threaded when concurrent mode failure occurs?

The following answer is from R大 (RednaxelaFX).

Laziness, due to insufficient resources. It's that simple; there was no technical obstacle. Big companies have done their own internal optimizations.

So why the laziness in the first place? The troubled CMS GC went through many upheavals. It was originally designed and implemented as a low-latency GC for Sun Labs' Exact VM.

However, Exact VM lost an internal battle with HotSpot VM for Sun’s real JVM, and CMS GC was later ported to HotSpot VM as the technical legacy of Exact VM.

While the port was in progress, Sun began to tire; by the time CMS GC was fully ported to HotSpot VM, Sun was on the verge of collapse.

Development resources shrank and developers left, and the HotSpot VM team of the time could only pick the most important work. By then Sun Labs' other GC implementation, Garbage-First GC (G1 GC), had become available.

G1 was seen as having more potential than CMS, because it incrementally compacts the heap and thus avoids the fragmentation that CMS can suffer from after running for a long time.

As a result, some of the limited development resources of the time went into productizing the G1 GC, and even that progressed slowly: after all, only one or two people were working on it.

Consequently there were not enough resources left to polish the details of CMS GC's supporting facilities, and parallelizing its backup Full GC was postponed.

But I'm sure you're wondering: doesn't HotSpot VM already have parallel GCs? Several of them, in fact?

Let’s take a look:

  • ParNew: a parallel young-gen GC; does not collect the old gen.

  • Parallel GC (Parallel Scavenge): a parallel young-gen GC similar to, but incompatible with, ParNew; also does not collect the old gen.

  • Parallel Old GC (PSCompact): a parallel full GC, but incompatible with ParNew/CMS.

So… That’s the thing.

HotSpot VM does already have parallel GCs, but the first two only collect the young gen during a young GC, and of them only ParNew can work with CMS.

As for parallel Full GC, Parallel Old exists, but it is incompatible with the CMS GC and therefore cannot serve as its backup Full GC.

Why can’t some collectors from older and newer generations be used in combination, such as ParNew and Parallel Old?

This image was drawn in 2008 by a member of HotSpot's GC team; G1 had not yet been released and was still in development, hence the question mark on it.

The answer was:

"ParNew" is written in the framework's style... "Parallel Old" is not written in the "ParNew" style.

HotSpot VM's own generational collectors are implemented within a framework, and only implementations inside the framework can be combined with each other.

One developer did not want to implement his collector within the framework and wrote it his own way; it tested well and was then absorbed into HotSpot VM, which led to the incompatibility.

I once saw a very vivid explanation of this: it is like a high-speed-rail locomotive being unable to pull the old green-skinned railway carriages; the electrics, the couplers and so on simply don't match.

How does a young-generation GC avoid a full heap scan?

In common generational GCs, a remembered set is used for this: it records the places in the old generation that may hold references to young-generation objects, avoiding a scan of the whole heap.

The figure above shows object precision and card precision; the card-precision variant is called a card table.

The heap is divided into blocks (card pages) of 512 bytes each, with one element of a byte array representing each block; an element marked dirty means that block may contain cross-generational references.

The implementation in HotSpot is the card table, which is maintained through a post-write barrier.
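The classic pseudocode for that barrier is a single unconditional store. Here is a hedged Java rendering; the 512-byte card size and the convention that 0 means dirty follow common descriptions of HotSpot, and all names are illustrative:

```java
// Sketch of the unconditional post-write barrier that keeps the card table
// up to date. Names are illustrative; HotSpot emits this in compiled code.
class CardTable {
    static final int CARD_SHIFT = 9;  // 2^9 = 512-byte card pages
    static final byte DIRTY = 0;      // HotSpot is usually described as using 0 for dirty
    static final byte CLEAN = 1;

    final byte[] cards;

    CardTable(int heapBytes) {
        cards = new byte[heapBytes >> CARD_SHIFT];
        java.util.Arrays.fill(cards, CLEAN);
    }

    // Executed after every reference store `obj.field = ref`. Note there is no
    // filtering: we never check whether obj is old or ref is young.
    void postWriteBarrier(int objAddress) {
        cards[objAddress >> CARD_SHIFT] = DIRTY;
    }
}
```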

In CMS, what needs recording is references from the old generation to the young generation, yet the write-barrier implementation applies no filtering at all.

That is, it does not first check that the current object is an old-generation object and that the reference points to a young-generation object before marking the card dirty.

Any reference assignment marks the object's card dirty, even though a YGC scans only the old generation's portion of the card table.

This keeps the write barrier cheap, which matters because reference assignments are extremely frequent.

So what's the difference between CMS's remembered set and G1's?

CMS's remembered-set implementation is the card table.

The most common remembered-set implementation is points-out. A remembered set records cross-generation references from non-collected areas into the collected area, but it is organized around where the reference lives, i.e. the non-collected area, hence "points-out".

In CMS, the card table serves the young-generation GC by recording references from the old generation into the young generation.

G1 is region-based, so on top of the points-out card table it adds a points-into structure.

A region needs to know which other regions have pointers into it, and in which cards those pointers lie.

In fact G1's remembered set is a hash table: the key is the start address of another region, and the value is a set of card-table indexes.
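That structure can be sketched as a plain Java map. This is illustrative only; the real G1 remembered set also switches between sparse, fine, and coarse granularities that this sketch ignores:

```java
import java.util.*;

// Sketch of a G1-style per-region remembered set: keyed by the start address
// of the region that points INTO this one, valued by the card indexes that
// hold those incoming references. Names are made up for illustration.
class RegionRememberedSet {
    private final Map<Long, Set<Integer>> table = new HashMap<>();

    void record(long fromRegionStart, int cardIndex) {
        table.computeIfAbsent(fromRegionStart, k -> new HashSet<>()).add(cardIndex);
    }

    // During evacuation only these cards are scanned, never the whole heap.
    Set<Integer> cardsFrom(long fromRegionStart) {
        return table.getOrDefault(fromRegionStart, Collections.emptySet());
    }
}
```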

If we look at this picture, it’s clear.

Maintaining the remembered set on every assignment of a reference field would be expensive, so G1 uses a logging write barrier (described below).

This is again an asynchronous idea: changes are first recorded in a queue, and when the queue exceeds a certain threshold a background thread drains it and updates the remembered set.

Why doesn't G1 maintain remembered sets from the young generation to the old generation?

G1 has young GC and Mixed GC.

The Young GC selects all regions of the young generation for collection.

The Mixed GC picks all of the young-generation regions plus some old-generation regions where collection yields the most benefit.

Since young-generation regions are always in the collection scope, there is no need to record references from the young generation into the old generation.

What do CMS and G1 do to maintain correct concurrency?

The previous article analyzed the two necessary and sufficient conditions for objects to be missed during concurrent marking:

  1. A reference from a black (already scanned) object to a white (unscanned) object is inserted.

  2. All references from gray objects to that white object are removed.

CMS and G1 break these two conditions with incremental update and SATB respectively, maintaining correctness while GC threads and application threads run concurrently.

Incremental update is used by CMS to break the first condition: the write barrier records the newly referenced white object, effectively marking it gray (it is added to the mark stack), so it is scanned again in the remark phase and not missed.

G1 uses snapshot-at-the-beginning (SATB) to break the second condition: the write barrier records the old reference before it is overwritten, and those old referents are scanned again later.

The name already says it: any object alive at the start of the GC is considered alive, as if a snapshot of the heap were taken.

Also, objects newly allocated during the GC are considered alive. Each region maintains Top-at-Mark-Start (TAMS) pointers; prevTAMS and nextTAMS record the position of the top pointer at the start of the previous and current concurrent marking cycles respectively.

The top pointer marks where the next object will be allocated in the region, so objects between nextTAMS and top are all considered implicitly alive.
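The TAMS liveness rule can be written down directly. This is a sketch under simplifying assumptions: addresses are plain longs and the external marking bitmap is modeled as a set:

```java
import java.util.Set;

// Sketch of G1's TAMS rule: objects allocated at or above nextTAMS during the
// current marking cycle are implicitly live; below it, liveness comes from the
// external marking bitmap. Illustrative only.
class TamsRule {
    static boolean consideredLive(long addr, long nextTams, Set<Long> markBitmap) {
        return addr >= nextTams || markBitmap.contains(addr);
    }
}
```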

In the remark phase, CMS with incremental update needs to rescan all thread stacks and the entire young generation, because new references may have appeared since the roots were first scanned; if the young generation holds many objects, this is time-consuming.

Note that this phase is STW, which is why CMS also provides the CMSScavengeBeforeRemark parameter to force a YGC before the remark phase.

G1 with SATB only needs to scan the SATB-recorded old references in the final marking phase, which is faster than CMS in this respect, but for the same reason it produces more floating garbage than CMS.

What is logging Write Barrier?

A write barrier hurts application performance: it is logic executed on every reference assignment, and that operation is very frequent. Hence the logging write barrier.

It moves part of the logic the write barrier would execute onto background threads, mitigating the impact on the application.

In the write barrier itself, only a log record is pushed onto a queue; background threads then take entries from the queue and perform the follow-up work. The asynchronous idea again.

Take the SATB write barrier as an example: each Java thread has its own fixed-length SATBMarkQueue, and the write barrier only pushes old references into it. When the queue fills up, it is added to the global SATBMarkQueueSet.

Once the global set exceeds a certain threshold, background threads scan and process it, i.e. tracing begins from the recorded references.
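The fast-path/slow-path split can be sketched like this. All names here are illustrative stand-ins for SATBMarkQueue and SATBMarkQueueSet, not G1's real classes:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a logging write barrier for SATB: the barrier only logs the old
// value into a thread-local queue; full queues are handed to a global set
// that background threads drain. Names are made up for illustration.
class SatbQueue {
    private final int capacity;
    private final List<Object> local = new ArrayList<>();
    private final List<List<Object>> globalSet; // stands in for SATBMarkQueueSet

    SatbQueue(int capacity, List<List<Object>> globalSet) {
        this.capacity = capacity;
        this.globalSet = globalSet;
    }

    // Pre-write barrier for `obj.field = newRef`: log the OLD referent before
    // it is overwritten, so the "snapshot at the beginning" stays complete.
    void preWriteBarrier(Object oldRef) {
        if (oldRef == null) return;       // nothing to snapshot
        local.add(oldRef);                // cheap fast path
        if (local.size() >= capacity) {   // slow path: hand the full buffer over
            globalSet.add(new ArrayList<>(local));
            local.clear();
        }
    }

    int pending() { return local.size(); }
}
```

The application thread pays only for the enqueue; the expensive marking work happens later on background threads.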

Logging write barriers are also used to maintain remembered sets.

A brief description of the G1 collection process

G1 has two broad phases: concurrent marking and object copying (evacuation).

Concurrent marking is SATB-based and can be divided into four stages:

1. Initial marking. This stage is STW: scan the root set and mark the objects directly reachable from the roots. In G1, marks are kept in an external bitmap rather than in object headers.

2. Concurrent marking. This stage runs concurrently with application threads: starting from the root-reachable objects marked in the previous step, tracing recursively scans all reachable objects. SATB also records changed references during this stage.

3. Final marking. This stage is STW: it processes the references recorded by SATB.

4. Cleanup. This stage is STW: it counts the live objects in each region according to the marking bitmap.

The object copy phase (evacuation) is STW: suitable regions are selected to form a collection set (CSet), and the live objects in the CSet are copied into new regions.

The bottleneck in G1 is the object copy phase, which takes considerable time to move objects.

A brief description of the CMS collection process

You can actually see the phases from the CollectorState enumeration mentioned in the previous question.

1. Initial mark. This stage is STW: scan the root set and mark the objects directly reachable from the roots.

2. Concurrent marking. This stage runs concurrently with application threads: starting from the root-reachable objects marked in the previous step, tracing recursively scans all reachable objects.

3. Concurrent precleaning. This stage runs concurrently with application threads; its purpose is to do some of the remark phase's work in advance, such as scanning dirty card-table areas and objects newly promoted to the old generation, because remark is STW and should be kept short.

4. Abortable preclean. Essentially the same as the previous stage, again sharing the remark phase's work.

5. Remark. This stage is STW: because reference relationships change during the concurrent stages, young-generation objects, GC roots, the card table and so on must be revisited to correct the marks.

6. Concurrent sweeping. This stage runs concurrently with application threads and cleans up the garbage.

7. Concurrent reset. This stage resets CMS's internal state, concurrently with application threads.

The bottleneck of CMS is the remark phase, which can spend a long time rescanning.

How do CMS write barriers maintain both the card table and incremental updates?

There is only one card table, and that is not enough to support both YGC and CMS's concurrent incremental updates.

Each YGC scans and then resets the card table, which wipes out the incremental-update records.

Therefore a mod-union table is also maintained: during concurrent marking, whenever a YGC needs to reset a card-table record, the corresponding position in the mod-union table is updated first.

In this way, the CMS remark phase can combine the current card table with the mod-union table to handle incremental updates and avoid missing objects.
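The interplay between the two tables can be sketched as a toy model. Here 1 stands for dirty purely for readability, and all names are made up for illustration:

```java
// Toy model of the mod-union table: before a YGC resets the card table, dirty
// bits recorded during concurrent marking are OR-ed into the mod-union table
// so CMS's remark phase still sees them. The dirty=1 convention and all names
// are illustrative only.
class ModUnionTable {
    static final byte DIRTY = 1, CLEAN = 0;

    final byte[] cardTable;
    final byte[] modUnion;

    ModUnionTable(int cards) {
        cardTable = new byte[cards];
        modUnion = new byte[cards];
    }

    void markDirty(int card) { cardTable[card] = DIRTY; }

    // What a YGC does while CMS is concurrently marking: preserve dirty cards
    // in the mod-union table, then reset the card table for its own use.
    void youngGcResetCards() {
        for (int i = 0; i < cardTable.length; i++) {
            if (cardTable[i] == DIRTY) modUnion[i] = DIRTY;
            cardTable[i] = CLEAN;
        }
    }

    // What remark consults: a card matters if either table says dirty.
    boolean dirtyForRemark(int card) {
        return cardTable[card] == DIRTY || modUnion[card] == DIRTY;
    }
}
```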

What are the two main goals of GC tuning?

These are the minimum pause time and throughput.

Minimum pause time: GC STW pauses suspend all application threads, during which the application appears frozen to users, so reducing STW time is key for latency-sensitive applications.

Throughput: for latency-insensitive applications, such as background computation, throughput is the key concern. They do not care how long each individual GC pause is, only that total pause time is low and throughput is high.

Here’s an example:

Scheme 1: GC pauses 100 ms per time, 5 times per second.

Scheme 2: GC pauses for 200 ms each time and pauses twice per second.

The first has lower latency per pause, while the second has higher throughput (400 ms of total pause per second versus 500 ms).

So when tuning, you need to be clear about the purpose of the application.

How do you tune GC?

This question comes up easily in interviews; grasp the core of the answer.

With today's generational GC, the idea of tuning is to have objects reclaimed in the young generation as much as possible, prevent too many objects from being promoted to the old generation, and reduce the allocation of large objects.

You need to balance generational size, garbage collection times, and pause times.

You need to fully monitor the GC, such as generation occupancy, YGC trigger frequency, Full GC trigger frequency, object allocation rate, and so on.

Then tune it according to the actual situation.

For example, if a Full GC occurs for no apparent reason, a third-party library may have called System.gc().

Frequent Full GCs may mean the memory-occupancy threshold that triggers the CMS GC is set too low, or that memory is so tight that objects cannot be allocated.

There are also the object age-promotion threshold, survivor spaces being too small, and so on. Each case needs concrete analysis, but the core idea stays the same.

Finally

There is actually still some ZGC content left uncovered; don't worry, the ZGC article is half-written and will be posted later.

Questions about GC are very common in interviews, but really only a handful of topics come up again and again. Remember what I said about grasping the core.

Of course, hands-on tuning experience is even better, so take opportunities at work: when something unusual happens, get actively involved and think it through; that is how real experience is gained.
