Today brings another batch of solid material. This is the first in a series of the most hardcore JVM deep dives on the web, starting with TLAB. Since the article is very long and everyone has different reading habits, it is available both as a single full article and split into parts:

  • The most hardcore JVM TLAB analysis on the web (single full-article edition, without extra material)
  • The most hardcore JVM TLAB analysis on the web — 1. Introduction to memory allocation ideas
  • The most hardcore JVM TLAB analysis on the web — 2. The TLAB lifecycle and open questions
  • The most hardcore JVM TLAB analysis on the web — 3. The JVM EMA expectation algorithm and TLAB-related JVM startup parameters
  • The most hardcore JVM TLAB analysis on the web — 4. Complete analysis of the basic TLAB process
  • The most hardcore JVM TLAB analysis on the web — 5. Full analysis of the TLAB source code
  • The most hardcore JVM TLAB analysis on the web — 6. Summary of TLAB-related hot questions and answers
  • The most hardcore JVM TLAB analysis on the web — 7. Parsing TLAB-related JVM logs
  • The most hardcore JVM TLAB analysis on the web — 8. Monitoring TLAB through JFR

10. Frequently asked questions about the TLAB process

I will keep updating this section to address any questions readers raise.

10.1. Why does a TLAB need to be filled with a dummy object when it is returned to the heap?

Mainly to ensure GC scanning efficiency. Only the owning thread knows which parts of its TLAB are allocated, and the TLAB is returned to the Eden region before a GC scan. If the unused part were left unfilled, the GC could not tell which part is used and which is not, and would need additional checks. Instead, the unused part is filled with an object that is certain to be reclaimable, i.e. a dummy object, so the GC can simply mark it and skip over that memory, improving scanning efficiency. This is safe because the memory already belongs to the TLAB; other threads cannot use it until after the next scan anyway. The dummy object is an int array. To guarantee there is always room to fill one in, the TLAB reserves space for a dummy object header (an int[] header), and the TLAB size must not exceed the maximum size of an int array; otherwise the unused space could not be covered by a dummy object.
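The reservation described above can be sketched with some simple arithmetic. This is an illustrative sketch, not HotSpot source: the 16-byte int[] header assumes a 64-bit JVM with compressed oops (12-byte object header plus 4-byte array length), and the TLAB size is a made-up figure.

```java
// Illustrative sketch: why a TLAB reserves room for a filler int[] header.
public class TlabFillerSketch {
    // Assumed 64-bit HotSpot layout with compressed oops:
    // 12-byte object header + 4-byte array length = 16 bytes.
    static final int INT_ARRAY_HEADER_BYTES = 16;

    // Usable part of a TLAB: total size minus the reserved filler header,
    // so any leftover tail can always be covered by a dummy int[].
    static long usableBytes(long tlabBytes) {
        return tlabBytes - INT_ARRAY_HEADER_BYTES;
    }

    public static void main(String[] args) {
        long tlab = 512 * 1024; // hypothetical 512 KiB TLAB
        System.out.println(usableBytes(tlab)); // 524272
    }
}
```

The same reasoning gives the upper bound mentioned above: since the filler is an int[], a TLAB can never be larger than what a single int array can describe.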

10.2. Why does a TLAB need a maximum waste-space limit?

When a thread needs a new TLAB, there may still be some room left in the old one. Before the old TLAB is returned to the heap, that remainder must be filled with a dummy object, which means it can never hold a real object; this is the "waste". Without a limit on this waste, a thread would reapply for a new TLAB whenever the remaining space is insufficient, reducing allocation efficiency, and more and more space would be occupied by dummy objects, resulting in more frequent GC.
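The limit itself can be sketched numerically. This is a hedged illustration of how HotSpot derives the threshold, assuming the `-XX:TLABRefillWasteFraction` flag with its default value of 64; the TLAB size is a made-up figure, and real HotSpot also grows this threshold slightly on each slow-path allocation.

```java
// Illustrative sketch of the refill-waste threshold (assumed defaults).
public class RefillWasteSketch {
    public static void main(String[] args) {
        long tlabWords = 64 * 1024;        // hypothetical TLAB size in heap words
        long tlabRefillWasteFraction = 64; // assumed -XX:TLABRefillWasteFraction default

        // If the leftover space exceeds this limit, the TLAB is kept and the
        // object is allocated outside it, rather than discarding (wasting)
        // the remainder by retiring the TLAB early.
        long refillWasteLimitWords = tlabWords / tlabRefillWasteFraction;
        System.out.println(refillWasteLimitWords); // 1024
    }
}
```

In other words, a thread only gives up its current TLAB when the unusable tail is a small fraction of the TLAB, bounding the total memory lost to dummy objects.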

10.3. Why is the TLAB refill count computed as 100 / (2 * TLABWasteTargetPercent)?

TLABWasteTargetPercent describes the maximum initially-wasted space as a percentage of the TLAB size.

First of all, the ideal situation is that all objects are allocated within TLABs, that is, the TLABs together fill up Eden. Before the next GC scan, memory returned to Eden is unavailable to other threads anyway, because the leftover space has been filled with dummy objects. Therefore, the memory consumed by all threads is: (expected number of allocating threads in the next epoch) × (TLAB size) × (number of refills per thread per epoch). Since objects are generally allocated in Eden, this total is ideally the whole of Eden. But that situation is too ideal: a GC scan can happen at any time, so there will always be some memory filled with dummy objects, i.e. wasted. Assume that, on average, half of each thread's current TLAB is wasted at the moment of a GC scan (note that only the latest TLAB contains waste; the assumption is that earlier TLABs have no waste at all). Then the percentage of wasted memory per thread, i.e. TLABWasteTargetPercent, equals:

1/2 × (1 / expected number of refills per thread per epoch) × 100

So the number of refills per thread per epoch equals 100 / (2 × TLABWasteTargetPercent) = 50 / TLABWasteTargetPercent, which is 50 with the default TLABWasteTargetPercent of 1.
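The derivation above can be checked with a couple of lines of arithmetic. The default of 1 for TLABWasteTargetPercent is the flag's documented default; everything else follows from the formula in the text.

```java
// Verifying the refill-count formula from the derivation above.
public class TlabRefillMath {
    public static void main(String[] args) {
        double tlabWasteTargetPercent = 1.0; // -XX:TLABWasteTargetPercent default

        // TLABWasteTargetPercent = 1/2 * (1 / refills) * 100
        //   =>  refills = 100 / (2 * TLABWasteTargetPercent) = 50 / TLABWasteTargetPercent
        double refillsPerEpoch = 100.0 / (2 * tlabWasteTargetPercent);
        System.out.println(refillsPerEpoch); // 50.0
    }
}
```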

10.4. Why ZeroTLAB

After a TLAB is allocated, the ZeroTLAB configuration determines whether every byte in it is set to 0. A TLAB is requested at the moment of object allocation, so the memory will be written immediately afterwards. Writing memory involves CPU cache lines, and in a multi-core environment it can also involve false sharing across cache lines. As an optimization, the JVM performs Allocation Prefetch here, trying to load this memory into the CPU cache, which means the most efficient time to write the memory is right when the TLAB is allocated.

When you create an object, each field is assigned an initial value, and most fields are initialized to 0. Moreover, when a TLAB is returned to the heap, the remaining space is filled with an int[] whose elements are all 0.

Therefore, zeroing the TLAB when it is first allocated avoids having to zero the same memory again later, and it can also exploit the Allocation Prefetch mechanism to work with CPU cache lines (more on Allocation Prefetch in another series).
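The zero-default guarantee mentioned above is easy to observe from plain Java: the language specification requires that fields start at their zero values no matter where in the heap the object lands, which is exactly the work that zeroing the TLAB up front pays for.

```java
// Object fields are guaranteed to start at their zero defaults.
public class ZeroInitDemo {
    static class Point {
        int x;       // defaults to 0
        long y;      // defaults to 0L
        boolean flag; // defaults to false
        Object ref;  // defaults to null
    }

    public static void main(String[] args) {
        Point p = new Point();
        System.out.println(p.x == 0 && p.y == 0L && !p.flag && p.ref == null); // true
    }
}
```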

10.5. Why does the JVM need to warm up, and why does Java code execute faster and faster? (considering only TLAB here; JIT, MetaSpace, GC, etc. are separate topics)

According to the previous analysis, each thread's TLAB size keeps changing according to the thread's allocation characteristics and gradually stabilizes. The size is determined mainly by the allocation-ratio EMA, which needs a certain number of samples to become meaningful; by default the EMA is not stable enough for roughly the first 100 samples, so TLAB sizes also change frequently when the program starts. As the program's threads stabilize and run for a while, each thread's TLAB size also stabilizes, adjusting to the size that best fits that thread's allocation characteristics. This brings the system closer to the ideal where GC occurs only when Eden is full and all objects in Eden are allocated efficiently through TLABs. This is, from the TLAB perspective, why Java code executes faster and faster.