Today is another day of hardcore content. This is the first in a series of deep-dive JVM articles, starting with TLAB. Since the article is very long and everyone has different reading habits, it is split into a single-page edition and a multi-part edition:

  • TLAB analysis of the most hardcore JVM on the web (single-page edition, without the extras)
  • 1. Introduction to the memory allocation idea
  • 2. The TLAB life cycle and open questions
  • 3. The JVM EMA expectation algorithm and TLAB-related JVM startup parameters
  • 4. Complete analysis of the basic TLAB process
  • 5. Full analysis of the TLAB source code
  • 6. Summary of hot TLAB Q&A
  • 7. Parsing TLAB-related JVM logs
  • 8. Monitoring TLAB through JFR

9. Source code analysis of OpenJDK HotSpot TLAB

If you find this chapter hard to follow, you can go straight to Chapter 10, the popular Q&A, which collects many common questions.

9.1. TLAB class composition

During thread initialization, if TLAB is enabled for the JVM (it is enabled by default and can be disabled with -XX:-UseTLAB), the TLAB is initialized.

A TLAB consists of the following fields (`HeapWord*` can be understood as an address in heap memory): src/hotspot/share/gc/shared/threadLocalAllocBuffer.hpp

```cpp
// The number of refills expected per GC cycle
static unsigned _target_refills;
// The maximum TLAB size
static size_t   _max_size;
// Space reserved for the Allocation Prefetch CPU cache optimization mechanism
static int      _reserve_for_allocation_prefetch;

// The main fields of a TLAB:
HeapWord* _start;              // TLAB start address
HeapWord* _top;                // address after the last allocated object (the bump pointer)
HeapWord* _end;                // TLAB end address usable for allocation; this excludes the
                               // reserved space (kept for the dummy object header)
HeapWord* _allocation_end;     // the real end address of the TLAB, including the reserved space
HeapWord* _pf_top;             // watermark for the Allocation Prefetch CPU cache optimization
size_t    _desired_size;       // desired TLAB size, including the reserved space, expressed in
                               // heap words, i.e. the byte size divided by HeapWordSize
size_t    _refill_waste_limit; // the maximum wasted space allowed when the remaining TLAB space
                               // cannot satisfy an allocation. If the remaining space is larger
                               // than this limit, the object is allocated directly in Eden and
                               // the TLAB is kept; if smaller, the current TLAB is returned to
                               // Eden and a new TLAB is requested from Eden
AdaptiveWeightedAverage _allocation_fraction; // EMA of the fraction of Eden occupied by this
                                              // thread's TLABs
size_t    _allocated_before_last_gc; // used together with the size allocated before the
                                     // previous GC to compute how much this thread allocated
                                     // during the current GC cycle

// Statistics about this thread's allocations:
unsigned  _number_of_refills;  // number of TLAB refills
unsigned  _fast_refill_waste;  // space wasted by fast refills
unsigned  _slow_refill_waste;  // space wasted by slow refills (a slow refill discards the old
                               // TLAB and fills a new one)
unsigned  _gc_waste;           // space wasted at GC time
unsigned  _slow_allocations;   // number of TLAB slow allocations
size_t    _allocated_size;     // allocated memory size
size_t    _bytes_since_last_sample_point; // JVM TI sampling metric, not relevant here
```

9.2. TLAB initialization

The global TLAB state is initialized first, at JVM startup: src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp

```cpp
void ThreadLocalAllocBuffer::startup_initialization() {
  // Initialize, i.e. zero, the statistics
  ThreadLocalAllocStats::initialize();

  // Assume that on average, during a GC scan, half of each thread's current TLAB
  // is wasted (only the most recent TLAB has waste; the earlier ones are assumed
  // fully used). Then the percentage of memory wasted per thread
  // (TLABWasteTargetPercent) equals 1/2 * 1/(refills per thread per epoch) * 100,
  // so the number of refills per thread per epoch is 50 / TLABWasteTargetPercent,
  // which is 50 by default.
  _target_refills = 100 / (2 * TLABWasteTargetPercent);
  // However, the initial _target_refills must be at least 2 to reduce the
  // possibility of GC during VM initialization.
  _target_refills = MAX2(_target_refills, 2U);

  // If the C2 JIT compiler exists and is enabled, reserve the Allocation Prefetch
  // space for the CPU cache optimization.
#ifdef COMPILER2
  if (is_server_compilation_mode_vm()) {
    int lines =  MAX2(AllocatePrefetchLines, AllocateInstancePrefetchLines) + 2;
    _reserve_for_allocation_prefetch = (AllocatePrefetchDistance + AllocatePrefetchStepSize * lines) /
                                       (int)HeapWordSize;
  }
#endif

  // Initialize the TLAB of the current (main) thread
  guarantee(Thread::current()->is_Java_thread(), "tlab initialization thread not Java thread");
  Thread::current()->tlab().initialize();

  log_develop_trace(gc, tlab)("TLAB min: " SIZE_FORMAT " initial: " SIZE_FORMAT " max: " SIZE_FORMAT,
                              min_size(), Thread::current()->tlab().initial_desired_size(), max_size());
}
```

Each thread maintains its own TLAB, and TLAB sizes differ across threads. A TLAB's size is mainly determined by the size of Eden, the number of threads, and the thread's object allocation rate. When a Java thread starts running, it first allocates its TLAB: src/hotspot/share/runtime/thread.cpp

```cpp
void JavaThread::run() {
  // initialize thread-local alloc buffer related fields
  this->initialize_tlab();
  // ... rest of the code omitted
}
```

Allocating the TLAB means calling the initialize method of ThreadLocalAllocBuffer: src/hotspot/share/runtime/thread.hpp

```cpp
void initialize_tlab() {
  // Initialize the TLAB unless it was disabled with -XX:-UseTLAB
  if (UseTLAB) {
    tlab().initialize();
  }
}

// Thread-Local Allocation Buffer (TLAB) support
ThreadLocalAllocBuffer& tlab() { return _tlab; }
ThreadLocalAllocBuffer _tlab;
```

The initialize method of ThreadLocalAllocBuffer initializes the TLAB fields we care about, described above: src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp

```cpp
void ThreadLocalAllocBuffer::initialize() {
  // Set the initial pointers; no memory has been allocated from Eden yet,
  // so they are all NULL
  initialize(NULL,  // start
             NULL,  // top
             NULL); // end

  // Compute the initial desired size and set it
  set_desired_size(initial_desired_size());

  // The total capacity of all TLABs; different GC implementations return different
  // values, usually the Eden size. For G1 GC, for example, it is
  // (_policy->young_list_target_length() - _survivor.length()) * HeapRegion::GrainBytes,
  // i.e. the young generation minus the survivor regions, which is Eden
  size_t capacity = Universe::heap()->tlab_capacity(thread()) / HeapWordSize;
  // Record in the EMA the fraction of Eden this thread's TLABs are expected to
  // occupy in one epoch: desired size * refills per epoch / capacity
  float alloc_frac = desired_size() * target_refills() / (float) capacity;
  _allocation_fraction.sample(alloc_frac);

  // Compute and set the initial refill waste limit:
  // desired size / TLABRefillWasteFraction
  set_refill_waste_limit(initial_refill_waste_limit());

  // Reset the statistics
  reset_statistics();
}
```

9.2.1. How is the initial expected size calculated?

src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp

```cpp
// Compute the initial desired size
size_t ThreadLocalAllocBuffer::initial_desired_size() {
  size_t init_sz = 0;
  // If the TLAB size is set with -XX:TLABSize, that is the initial desired size,
  // expressed in heap words, i.e. TLABSize / HeapWordSize
  if (TLABSize > 0) {
    init_sz = TLABSize / HeapWordSize;
  } else {
    // Get the expected number of allocating threads in the current epoch,
    // predicted by the EMA described earlier
    unsigned int nof_threads = ThreadLocalAllocStats::allocating_threads_avg();
    // Different GC implementations return different TLAB capacities, usually the
    // Eden size. For G1 GC, for example, it is
    // (_policy->young_list_target_length() - _survivor.length()) * HeapRegion::GrainBytes,
    // i.e. the young generation minus the survivor regions, which is Eden.
    // The initial desired size is the capacity (in heap words) divided by
    // (expected number of threads * refills per thread per epoch)
    init_sz  = (Universe::heap()->tlab_capacity(thread()) / HeapWordSize) /
               (nof_threads * target_refills());
    // Align to the object size
    init_sz = align_object_size(init_sz);
  }
  // Keep the size between min_size() and max_size()
  init_sz = MIN2(MAX2(init_sz, min_size()), max_size());
  return init_sz;
}

// The minimum size is determined by MinTLABSize, expressed in heap words and
// considering object alignment; the final alignment_reserve is the size of the
// dummy object's filled header. (The JVM's CPU cache prefetch is not considered
// here; it is examined in more detail in other sections.)
static size_t min_size() {
  return align_object_size(MinTLABSize / HeapWordSize) + alignment_reserve();
}
```

9.2.2. How is the maximum TLAB size determined?

Different GCs determine it in different ways:

For G1 GC it is the humongous object threshold, which in G1 is half the region size (larger objects are humongous and cannot live in a TLAB): src/hotspot/share/gc/g1/g1CollectedHeap.cpp

```cpp
// For G1 TLABs should not contain humongous objects, so the maximum TLAB size
// must be equal to the humongous object limit.
size_t G1CollectedHeap::max_tlab_size() const {
  return align_down(_humongous_object_threshold_in_words, MinObjAlignment);
}
```

For ZGC it is 1/8 of the page size; similarly, for Shenandoah GC it is 1/8 of the region size in most cases. Both expect that at least 7/8 of a region can be reclaimed without increasing the scan complexity when selecting the collection set (CSet): src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp

```cpp
MaxTLABSizeWords = MIN2(ShenandoahElasticTLAB ? RegionSizeWords : (RegionSizeWords / 8), HumongousThresholdWords);
```

src/hotspot/share/gc/z/zHeap.cpp

```cpp
const size_t ZObjectSizeLimitSmall = ZPageSizeSmall / 8;
```

For other GCs, the maximum is the maximum size of an int array, which is related to filling the free space of a retired TLAB with a dummy object; the reason was explained earlier.

9.3. TLAB allocates memory

When an object is created with new, instanceOop InstanceKlass::allocate_instance(TRAPS) is called: src/hotspot/share/oops/instanceKlass.cpp

```cpp
instanceOop InstanceKlass::allocate_instance(TRAPS) {
  bool has_finalizer_flag = has_finalizer(); // Query before possible GC
  int size = size_helper();  // Query before forming handle.

  instanceOop i;
  i = (instanceOop)Universe::heap()->obj_allocate(this, size, CHECK_NULL);
  if (has_finalizer_flag && !RegisterFinalizersAtInit) {
    i = register_finalizer(i, CHECK_NULL);
  }
  return i;
}
```

Its core is Universe::heap()->obj_allocate(this, size, CHECK_NULL), which allocates memory from the heap: src/hotspot/share/gc/shared/collectedHeap.inline.hpp

```cpp
inline oop CollectedHeap::obj_allocate(Klass* klass, int size, TRAPS) {
  ObjAllocator allocator(klass, size, THREAD);
  return allocator.allocate();
}
```

The global ObjAllocator implementation then performs the object memory allocation: src/hotspot/share/gc/shared/memAllocator.cpp

```cpp
oop MemAllocator::allocate() const {
  oop obj = NULL;
  {
    Allocation allocation(*this, &obj);
    HeapWord* mem = mem_allocate(allocation);
    if (mem != NULL) {
      obj = initialize(mem);
    } else {
      // The unhandled oop detector will poison local variable obj,
      // so reset it to NULL if mem is NULL.
      obj = NULL;
    }
  }
  return obj;
}

HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
  // If TLAB is enabled, allocate from the TLAB first
  if (UseTLAB) {
    HeapWord* result = allocate_inside_tlab(allocation);
    if (result != NULL) {
      return result;
    }
  }
  // Otherwise allocate outside the TLAB
  return allocate_outside_tlab(allocation);
}

HeapWord* MemAllocator::allocate_inside_tlab(Allocation& allocation) const {
  assert(UseTLAB, "should use UseTLAB");
  // Fast path: bump the TLAB pointer
  HeapWord* mem = _thread->tlab().allocate(_word_size);
  // If the fast allocation succeeded, return the address
  if (mem != NULL) {
    return mem;
  }
  // Otherwise take the TLAB slow-allocation path
  return allocate_inside_tlab_slow(allocation);
}
```

9.3.1. TLAB Fast allocation

src/hotspot/share/gc/shared/threadLocalAllocBuffer.inline.hpp

```cpp
inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
  // Verify that the pointers are valid, i.e. that _top lies
  // within the limits of _start and _end
  invariants();
  HeapWord* obj = top();
  // If the remaining space is large enough, bump the pointer and return
  if (pointer_delta(end(), obj) >= size) {
    set_top(obj + size);
    invariants();
    return obj;
  }
  return NULL;
}
```

9.3.2. TLAB slow allocation

src/hotspot/share/gc/shared/memAllocator.cpp

```cpp
HeapWord* MemAllocator::allocate_inside_tlab_slow(Allocation& allocation) const {
  HeapWord* mem = NULL;
  ThreadLocalAllocBuffer& tlab = _thread->tlab();

  // If the remaining TLAB space is larger than the maximum wasted space, keep the
  // current TLAB, record a slow allocation (which raises the waste limit), and
  // return NULL so the object is allocated outside the TLAB
  if (tlab.free() > tlab.refill_waste_limit()) {
    tlab.record_slow_allocation(_word_size);
    return NULL;
  }

  // Otherwise, recompute the TLAB size
  size_t new_tlab_size = tlab.compute_size(_word_size);

  // Return the current TLAB to Eden
  tlab.retire_before_allocation();

  if (new_tlab_size == 0) {
    return NULL;
  }

  // Compute the minimum size
  size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size);
  // Allocate a new TLAB from the heap
  mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size);
  if (mem == NULL) {
    assert(allocation._allocated_tlab_size == 0,
           "Allocation failed, but actual size was updated. min: " SIZE_FORMAT
           ", desired: " SIZE_FORMAT ", actual: " SIZE_FORMAT,
           min_tlab_size, new_tlab_size, allocation._allocated_tlab_size);
    return NULL;
  }
  assert(allocation._allocated_tlab_size != 0,
         "Allocation succeeded but actual size not updated. mem at: " PTR_FORMAT
         " min: " SIZE_FORMAT ", desired: " SIZE_FORMAT,
         p2i(mem), min_tlab_size, new_tlab_size);

  // If the JVM flag ZeroTLAB is enabled, zero all of the newly allocated TLAB
  if (ZeroTLAB) {
    // ..and clear it.
    Copy::zero_to_words(mem, allocation._allocated_tlab_size);
  } else {
    // ...and zap just allocated object.
  }

  // Set up the new TLAB and allocate the current object inside it
  tlab.fill(mem, mem + _word_size, allocation._allocated_tlab_size);
  // Return the allocated object's memory address
  return mem;
}
```

9.3.2.1. Maximum wasted space of TLAB

The initial value is TLAB size divided by TLABRefillWasteFraction: src/hotspot/share/gc/shared/threadLocalAllocBuffer.hpp

```cpp
size_t initial_refill_waste_limit() { return desired_size() / TLABRefillWasteFraction; }
```

Each slow allocation calls record_slow_allocation(size_t obj_size), which records the slow allocation and increases the TLAB's maximum wasted space:

src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp

```cpp
void ThreadLocalAllocBuffer::record_slow_allocation(size_t obj_size) {
  // On each slow allocation, increase refill_waste_limit by
  // refill_waste_limit_increment, i.e. the JVM flag TLABWasteIncrement
  set_refill_waste_limit(refill_waste_limit() + refill_waste_limit_increment());
  _slow_allocations++;
  log_develop_trace(gc, tlab)("TLAB: %s thread: " INTPTR_FORMAT " [id: %2d]"
                              " obj: " SIZE_FORMAT
                              " free: " SIZE_FORMAT
                              " waste: " SIZE_FORMAT,
                              "slow", p2i(thread()), thread()->osthread()->thread_id(),
                              obj_size, free(), refill_waste_limit());
}

// refill_waste_limit_increment is the JVM flag TLABWasteIncrement
static size_t refill_waste_limit_increment() { return TLABWasteIncrement; }
```

9.3.2.2. Recalculate TLAB size

The recalculation takes the smallest of: the heap space currently available for TLABs, the desired TLAB size plus the space being allocated, and the maximum TLAB size:

src/hotspot/share/gc/shared/threadLocalAllocBuffer.inline.hpp

```cpp
inline size_t ThreadLocalAllocBuffer::compute_size(size_t obj_size) {
  // Get the remaining heap space currently available for TLAB allocation
  const size_t available_size = Universe::heap()->unsafe_max_tlab_alloc(thread()) / HeapWordSize;
  // Take the smallest of: the available space, the desired size plus the
  // aligned object size, and the maximum TLAB size
  size_t new_tlab_size = MIN3(available_size, desired_size() + align_object_size(obj_size), max_size());

  // If the new size cannot even hold the object, return 0, meaning failure
  if (new_tlab_size < compute_min_size(obj_size)) {
    log_trace(gc, tlab)("ThreadLocalAllocBuffer::compute_size(" SIZE_FORMAT ") returns failure",
                        obj_size);
    return 0;
  }
  log_trace(gc, tlab)("ThreadLocalAllocBuffer::compute_size(" SIZE_FORMAT ") returns " SIZE_FORMAT,
                      obj_size, new_tlab_size);
  return new_tlab_size;
}
```

9.3.2.3. Put the current TLAB back into the heap

src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp

```cpp
// Called on the TLAB slow-allocation path to return the current TLAB to the heap
void ThreadLocalAllocBuffer::retire_before_allocation() {
  // Add the remaining TLAB space to the slow-refill waste statistics
  _slow_refill_waste += (unsigned int)remaining();
  // Perform the retire; during GC this is also called to return
  // every thread's TLAB to the heap
  retire();
}

// On the TLAB slow-allocation path, stats is NULL;
// when called during GC, stats records each thread's data
void ThreadLocalAllocBuffer::retire(ThreadLocalAllocStats* stats) {
  if (stats != NULL) {
    accumulate_and_reset_statistics(stats);
  }
  // If the current TLAB is valid
  if (end() != NULL) {
    invariants();
    // Record the bytes this thread allocated
    thread()->incr_allocated_bytes(used_bytes());
    // Fill the remaining space with a dummy object
    insert_filler();
    // Clear the current TLAB pointers
    initialize(NULL, NULL, NULL);
  }
}
```

9.4. GC-related TLAB operations

9.4.1. Prior to GC

Different GCs may implement this differently, but the timing of the TLAB operations is basically the same. Taking G1 GC as an example, before the actual GC:

src/hotspot/share/gc/g1/g1CollectedHeap.cpp

```cpp
void G1CollectedHeap::gc_prologue(bool full) {
  // Fill TLAB's and such
  {
    Ticks start = Ticks::now();
    // Make sure the heap is parsable
    ensure_parsability(true);
    Tickspan dt = Ticks::now() - start;
    phase_times()->record_prepare_tlab_time_ms(dt.seconds() * MILLIUNITS);
  }
  // Rest of the code omitted
}
```

Why make sure the heap memory is parsable? It allows faster scanning of the objects on the heap. And what does making the heap parsable actually involve? Mainly retiring every thread's TLAB and filling its unused part with a dummy object.

src/hotspot/share/gc/shared/collectedHeap.cpp

```cpp
void CollectedHeap::ensure_parsability(bool retire_tlabs) {
  // The real GC must happen at a safepoint; safepoints are
  // described in detail in a later section
  assert(SafepointSynchronize::is_at_safepoint() || !is_init_completed(),
         "Should only be called at a safepoint or at start-up");

  ThreadLocalAllocStats stats;
  for (JavaThreadIteratorWithHandle jtiwh; JavaThread *thread = jtiwh.next();) {
    BarrierSet::barrier_set()->make_parsable(thread);
    // If TLAB is enabled globally
    if (UseTLAB) {
      // If the TLABs should be retired
      if (retire_tlabs) {
        // Retire the TLAB, calling retire() from section 9.3.2.3
        thread->tlab().retire(&stats);
      } else {
        // Otherwise only fill the dummy object so the heap stays parsable
        thread->tlab().make_parsable();
      }
    }
  }
  stats.publish();
}
```

9.4.2. After GC

Different GC implementations may differ, but the timing of the TLAB operations is basically the same. Taking G1 GC as an example again, after GC:

When does _desired_size change, and how? After GC, every thread's expected TLAB size is recomputed: src/hotspot/share/gc/g1/g1CollectedHeap.cpp

```cpp
void G1CollectedHeap::gc_epilogue(bool full) {
  // Rest of the code omitted
  resize_all_tlabs();
}
```

src/hotspot/share/gc/shared/collectedHeap.cpp

```cpp
void CollectedHeap::resize_all_tlabs() {
  // Must happen at a safepoint; GC happens at a safepoint
  assert(SafepointSynchronize::is_at_safepoint() || !is_init_completed(),
         "Should only resize tlabs at safepoint");

  // If both UseTLAB and ResizeTLAB are enabled (both are by default)
  if (UseTLAB && ResizeTLAB) {
    for (JavaThreadIteratorWithHandle jtiwh; JavaThread *thread = jtiwh.next(); ) {
      // Recompute the expected TLAB size of each thread
      thread->tlab().resize();
    }
  }
}
```

Recomputing each thread's expected TLAB size: src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp

```cpp
void ThreadLocalAllocBuffer::resize() {
  assert(ResizeTLAB, "Should not call this otherwise");
  // Multiply the EMA of the allocation fraction by the TLAB capacity (the Eden
  // size) to estimate how much this thread will allocate in the next epoch
  size_t alloc = (size_t)(_allocation_fraction.average() *
                          (Universe::heap()->tlab_capacity(thread()) / HeapWordSize));
  // Divide by the refills per epoch to get the new expected size
  size_t new_size = alloc / _target_refills;
  // Keep the size between min_size and max_size
  new_size = clamp(new_size, min_size(), max_size());

  size_t aligned_new_size = align_object_size(new_size);

  log_trace(gc, tlab)("TLAB new size: thread: " INTPTR_FORMAT " [id: %2d]"
                      " refills %d  alloc: %8.6f desired_size: " SIZE_FORMAT " -> " SIZE_FORMAT,
                      p2i(thread()), thread()->osthread()->thread_id(),
                      _target_refills, _allocation_fraction.average(), desired_size(), aligned_new_size);
  // Set the new TLAB size
  set_desired_size(aligned_new_size);
  // Reset the refill waste limit to its initial value
  set_refill_waste_limit(initial_refill_waste_limit());
}
```