Preface

As a Java program runs, the JVM divides the memory it manages into several areas, collectively known as the run-time data areas. Areas shared between threads are created when the JVM starts and destroyed when it exits; thread-private areas are created when a thread starts and destroyed when it ends. As shown in the figure, the runtime data area consists of the following areas: the program counter, the Java virtual machine stack, the native method stack, the method area, and the heap.

Runtime data area

Program counter

According to the JVM’s running model, the JVM loads the compiled bytecode into memory before the program runs. When the program runs, the bytecode parser reads the bytecode in memory and interprets the bytecode instructions into fixed operations in sequence. During this process, the Program Counter Register holds the address of the bytecode being executed by the current thread.

From the principle of bytecode execution, the program counter seems dispensable under a single-threaded model: the bytecode parser translates the bytecode into fixed operations in order, and the program runs correctly even when there is a branch jump. In the real world, however, programs often complete a task through multi-threaded cooperation. The CPU allocates a time slice to each thread; a thread is suspended when its time slice is exhausted and will not run again until it gets another one. To ensure that the program runs correctly, the thread must resume from where it was suspended. With a program counter, the program continues to run correctly even across thread context switches.

Therefore, program counters are thread-private, which avoids interference between threads. The JVM allocates a very small memory space for each thread to use as its program counter, and this is the only runtime area for which the Java Virtual Machine specification does not specify an OutOfMemoryError.

If a thread is executing a native method, its program counter is undefined, because the JVM executes native methods through JNI calls into other native languages rather than through bytecode.

Java virtual machine stack

The JVM assigns each thread a private memory space called the Java virtual machine stack. A Java virtual machine stack is created as its thread is created, and acts like a stack in a traditional language (such as C): the JVM performs only two operations on it, pushing and popping stack frames. That is, the Java virtual machine stack is a last-in-first-out (LIFO) stack of stack frames.

The execution of each method is accompanied by the creation, pushing, and popping of a stack frame. A stack frame is the data structure used to store a method's data and partial results; it contains a Local Variable Table, an Operand Stack, and a Runtime Constant Pool Reference to the class of the current method. The model of the Java virtual machine stack is shown in the figure.

Java virtual machine stack model

Local Variable Table (LVT)

A local variable table (LVT) is an array of slots, indexed from 0, that stores all the input arguments and local variables of a method. The LVT stores types that are known at compile time, including the basic types (byte, char, short, int, long, float, double, boolean), object references (reference types), and returnAddress types (addresses pointing to a bytecode instruction).

LVT has the following characteristics:

  1. The zeroth slot is fixed to store the this pointer of the object to which the method belongs (for instance methods).
  2. Except for long and double, which each occupy two consecutive slots, every other type occupies only one slot.
  3. Variables are stored in the LVT in the order in which they are declared.

Consider the following code to verify these features of LVT:
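The original listing did not survive in this copy, so here is a minimal sketch consistent with the javap output discussed below (the class name JvmStackLvt and the method name showLvt come from the text; the exact variables and the return value are assumptions):

```java
// A hypothetical reconstruction: an instance method declaring locals of
// several types, so the LVT features above can be observed with javap -v.
public class JvmStackLvt {
    // instance method: slot 0 of its LVT holds the `this` pointer
    public long showLvt(int i) {   // argument i takes the slot after `this`
        long l = 2L;       // long: occupies two consecutive slots
        double d = 3.0;    // double: occupies two consecutive slots
        byte b = 1;        // each remaining variable takes a single slot
        char c = 'a';
        boolean flag = true;
        return flag ? l + (long) d + b + c + i : 0;  // combine the locals into a result
    }
}
```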


First compile it into a class file with the javac command:

javac -g JvmStackLvt.java  # compile the .java file with -g to keep debug info such as the LocalVariableTable

Using the javap command (for example, javap -v) to parse the class file, you can see the LVT of the showLvt method.


As the javap output shows, the zeroth slot of the LVT is named this, and its signature Lcom/yrunz/JDK/chapter1/JvmStackLvt; indicates that it is the this pointer of type JvmStackLvt, verifying feature 1. The index difference between the slots holding l and d and the slots of the variables adjacent to them is 2, indicating that l and d each occupy two slots while the other variables occupy only one, verifying feature 2. Variables are stored in the LVT in the same order as they are declared, verifying feature 3.

Operand stack (OS)

The operand stack (OS) is a last-in-first-out (LIFO) stack used to store intermediate results, method arguments, and return values while a method runs. The JVM provides instructions for pushing onto and popping off the OS; for example, the load family of instructions pushes values onto the stack, while the store family pops values off it.

Consider the following code:
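The listing is missing here; below is a minimal sketch (class and variable names are assumptions) whose bytecode exercises the operand stack:

```java
// A hypothetical example: add() compiles to iload_0, iload_1 (push the two
// arguments onto the OS), iadd (pop both and push the sum), istore_2 (pop
// the sum into the local c), then iload_2 and ireturn (push c and return it).
public class JvmStackOs {
    public static int add(int a, int b) {
        int c = a + b;
        return c;
    }
}
```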


After compiling it into a class file using the javac command, parse it using javap to get:


Since the javap output does not show the contents of the operand stack, we can infer the pushing and popping of the OS from the instruction code and the LVT, as shown in the figure.

The process of pushing and popping the operand stack

Runtime Constant Pool Reference

Each stack frame contains a Runtime Constant Pool Reference to the class to which the current method belongs, also known as a Symbolic Reference, which is used to dynamically link code during class loading. Dynamic linking resolves the name represented by the symbolic reference into an actual reference to a method or variable, achieving Late Binding at run time.

Consider the following code:
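The listing is missing here; the following sketch is reconstructed from the javap output described below (the field a and the methods f1()/f2() come from the text, while f1's body is an assumption):

```java
public class JvmStackCpr {
    int a;               // referenced from f2() by the getfield instruction
    public void f1() {   // referenced from f2() by the invokevirtual instruction
        a = 1;           // assumed body, only so the example does something observable
    }
    public void f2() {
        int i = a;       // getfield #2, resolved to com/yrunz/JDK/chapter1/JvmStackCpr.a:I
        f1();            // invokevirtual #3, resolved to com/yrunz/JDK/chapter1/JvmStackCpr.f1:()V
    }
}
```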


After compiling it into a class file using the javac command, parse it using javap to get:


As can be seen in the instruction code of f2(), symbolic references are used in two places: the getfield instruction references symbol #2, which is eventually resolved to com/yrunz/JDK/chapter1/JvmStackCpr.a:I, and the invokevirtual instruction references symbol #3, which is eventually resolved to com/yrunz/JDK/chapter1/JvmStackCpr.f1:()V.

The Java Virtual Machine specification states that the Java virtual machine stack may be implemented with a fixed size or may dynamically expand and contract. The JVM usually provides parameters to set the stack size or range, such as the -Xss parameter to set the stack size. A StackOverflowError is raised when a thread requests more stack space than the maximum size allowed by the JVM. In the case of a dynamically expandable stack, an OutOfMemoryError is thrown when sufficient memory cannot be allocated for an attempted expansion.
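A minimal sketch demonstrating the StackOverflowError described above: each recursive call pushes another stack frame until the Java virtual machine stack is exhausted (running with a smaller stack, e.g. java -Xss256k StackOverflowDemo, triggers it sooner):

```java
public class StackOverflowDemo {
    static int depth = 0;

    static void recurse() {
        depth++;    // every call adds one stack frame (with its own LVT and OS)
        recurse();  // never returns normally; the stack eventually overflows
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("stack overflowed after " + depth + " frames");
        }
    }
}
```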

Native method stack

The Native Method Stack is similar to the Java virtual machine stack, except that the Java virtual machine stack serves Java methods while the native method stack serves native methods. The Java Virtual Machine specification does not mandate the native method mechanism or its implementation language, and if a JVM does not support native methods, it need not implement a native method stack.

The native method stack can likewise be implemented with a fixed size or with dynamic expansion and contraction, so StackOverflowError and OutOfMemoryError can also be thrown in the corresponding scenarios.

Method area

The method area is shared between threads and is created at JVM startup to store class metadata, static variables, constants, the bytecode of ordinary methods, and so on. The method area can be implemented with a fixed size or with dynamic expansion and contraction, and an OutOfMemoryError is thrown when its memory cannot satisfy an allocation request.

For the HotSpot virtual machine, prior to JDK 1.8 the method area was implemented as the "Permanent Generation", which was a logical part of the heap, and two parameters were provided to adjust its size: -XX:PermSize to set the initial capacity and -XX:MaxPermSize to set the maximum capacity. Since JDK 1.8, HotSpot no longer has the concept of a "permanent generation": class metadata was migrated to a new area called Metaspace, while static variables, constants, and so on are stored in the heap. Instead of using heap memory, Metaspace is allocated from native memory, and by default its capacity is limited only by the available native memory. Similarly, HotSpot provides two parameters to adjust its size: -XX:MetaspaceSize for the initial capacity and -XX:MaxMetaspaceSize for the maximum capacity.

Runtime Constant Pool

The Runtime Constant Pool is the part of the method area where the constant pool information of a class file (including symbolic references and compile-time literal constants) is stored after the class file is loaded into memory. This information can be viewed by parsing the class file with javap. Consider the following code:
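The listing is missing in this copy; the sketch below is reconstructed from the discussion that follows (the names i2, i5, i6, i7, s1, b1, c1 and the strings come from the text; the values, modifiers, and layout are assumptions):

```java
public class MethodAreaRtcp {
    static final int i2 = 2;          // literal of a final member: enters the constant pool
    final int i5 = getFive();         // value produced by a run-time call: NOT pooled
    static final short s1 = 1;        // short/byte/char literals are stored as int entries
    static final byte b1 = 3;
    static final char c1 = 'a';
    static final int i6 = 2 + 3;      // folded at compile time: the sum 5 is pooled
    String str1 = "hello";            // string literals enter the constant pool
    String str2 = "world";
    String str3 = "hello" + "world";  // folded to "helloworld" at compile time

    int getFive() {
        return 5;
    }

    void f() {
        final int i7 = 7;             // final but a LOCAL variable: not pooled
    }
}
```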


After compiling it into a class file using the javac command, parse it using javap to get:


As the output shows, the class constant pool contains all the symbolic references and compile-time literal constants of the MethodAreaRtcp class.

Which literals are stored as literal constants?

1. String literals. When compiling Java code, the compiler adds string literals to the constant pool, such as "hello" and "world" in the example at positions #67 and #83, respectively.

2. Literals of basic-type member variables declared final. Note that the variable must be final and must be a class member variable for its literal to be added to the constant pool by the compiler, such as the value 2 of i2 at position #32 in the constant pool. Although i5 is a final member variable of a basic type, its value 5 is the result of a run-time function call and therefore does not appear in the constant pool; i7, although final, is a local variable, so its value 7 does not appear in the constant pool either.

In addition, short, byte, and char literal constants are widened to int by the compiler and stored in the constant pool. For example, the literal constants of s1, b1, and c1 correspond to positions #41, #44, and #47 in the constant pool, respectively, and all appear as Integer entries.

3. The results of folding literal constants. As a compiler optimization, the result of adding literal constants together is itself added to the constant pool. For example, i6 and str3 correspond to positions #38 and #85 in the constant pool.

New constants are added to the runtime constant pool during program execution.

During Java program execution, new constants that are not part of the class constant pool can be added to the runtime constant pool, so these constants cannot be found in the javap output.

1. The return value of the String.intern() method is added to the runtime constant pool. String.intern() returns the pooled string if an equal string (equal in the sense of equals()) already exists in the constant pool; otherwise, it adds the string to the constant pool and returns it. The following example verifies this behavior:
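A minimal sketch of this behavior (the exact strings are arbitrary):

```java
public class InternDemo {
    public static void main(String[] args) {
        String s1 = "hello";                    // the literal "hello" is pooled at class load
        String s2 = new String("hello");        // a fresh object on the heap
        System.out.println(s1 == s2);           // false: two distinct references
        System.out.println(s1 == s2.intern());  // true: intern() returns the pooled "hello"
        System.out.println(s1.equals(s2));      // true: the contents are equal
    }
}
```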


2. Basic-type wrapper classes also use constant pool technology. Some of Java's wrapper classes (Character, Byte, Short, Integer, Long, Boolean) use constant pooling, and their values are taken from the runtime constant pool when they are assigned numeric literals. In addition, the result of adding such wrapper values can also end up in the constant pool. It is important to note that this technique is only applied to values in the range [-128, 127]; outside this range, the objects are allocated on the heap. The following example demonstrates these features:
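A minimal sketch of the wrapper cache behavior (the values are chosen to sit just inside and just outside the cached range):

```java
public class WrapperCacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;        // within [-128, 127]: both boxed from the cache
        System.out.println(a == b);      // true: same instance
        Integer c = 128, d = 128;        // outside the range: two heap objects
        System.out.println(c == d);      // false: different instances
        System.out.println(c.equals(d)); // true: the values are still equal
    }
}
```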


The heap

The Heap is the largest area of the run-time data area; most objects (including class instances and arrays) are stored here. The heap is shared by all threads and is created at JVM startup. The objects we create with new are allocated here and are managed and destroyed by the Garbage Collector (GC); we never free an object's memory explicitly, which is one of the biggest differences between Java and C++. An OutOfMemoryError is thrown when there is not enough heap memory to create an object.

Generational management of the heap

The JVM manages the heap by Generation, dividing it into a Young Generation and an Old Generation; the young generation is further divided into Eden Space, From Survivor Space, and To Survivor Space, as shown in the figure.

Generational management of the heap

The main reason the JVM divides the heap into so many regions is to make it easier for garbage collectors to manage objects; modern garbage collectors typically use generational collection algorithms. In HotSpot, a collection of the young generation is called a Minor GC, a collection of the old generation is called a Major GC, and a collection of the young generation, old generation, and method area together is called a Full GC. If Full GCs occur too frequently, the heap memory is probably insufficient.

The young generation is where almost all objects are born. Objects in this region have short life cycles, and each time the garbage collector collects this area, a large number of objects are typically reclaimed. Young-generation objects are "small" and have a low survival rate, so the cost of copying them is low, and the copying algorithm is usually used for garbage collection. As shown in the figure, when collecting, the garbage collector copies surviving objects from Eden Space and From Survivor Space into To Survivor Space, then empties the first two spaces. At the next collection, the surviving objects in Eden Space and the former To Survivor Space are copied into the other survivor space, and so on, with the two survivor spaces alternating roles.

Young generation garbage collection
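The copying step described above can be sketched as a toy model (an illustration of the algorithm only, not how HotSpot is implemented):

```java
import java.util.ArrayList;
import java.util.List;

// Toy copying collector: live objects are copied out of Eden and the "from"
// survivor space into the "to" survivor space, the first two spaces are
// cleared, and the survivor spaces swap roles for the next cycle.
public class CopyingGcToy {
    static class Obj {
        final String name;
        boolean live;
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();

    void minorGc() {
        for (Obj o : eden) if (o.live) to.add(o);  // copy survivors out of Eden
        for (Obj o : from) if (o.live) to.add(o);  // copy survivors out of "from"
        eden.clear();                              // empty the first two spaces
        from.clear();
        List<Obj> tmp = from; from = to; to = tmp; // swap survivor-space roles
    }
}
```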

If objects in the young generation survive several rounds of garbage collection, the JVM promotes them to the old generation. In addition, when "large" objects are created and the young generation does not have enough free memory, the JVM allocates memory for them directly in the old generation. As a result, objects in the old generation tend to be long-lived or space-consuming and are expensive to copy, so the garbage collector usually uses a mark-sweep algorithm there. As shown in the figure, when collecting, the garbage collector marks the recyclable objects and then sweeps them directly.

Old generation garbage collection

Thread-local Allocation Buffer (TLAB)

As noted above, most objects are created in the Eden Space of the young generation. When multiple threads create objects at the same time, allocating memory becomes a thread-safety problem. If thread conflicts were resolved by locking, creating an object would involve acquiring and releasing a lock, which is very inefficient. The JVM solves this problem with a technique called the Thread Local Allocation Buffer (TLAB).

A TLAB is a thread-private memory space that the JVM carves out of Eden Space during thread initialization. Afterwards, when a thread needs to create an object, the JVM first tries to allocate it in the thread's own TLAB. Because TLABs are isolated from each other, no synchronization locking is needed, which greatly improves efficiency.

Thread-local Allocation Buffer (TLAB)

TLAB space is limited, and its size can be adjusted with JVM parameters. When an object that a thread needs to create cannot fit in the current TLAB, one of two things happens, depending on the JVM parameter configuration:

1. Create a new TLAB. In this case, the previous TLAB is "retired", and the thread's subsequent object allocations take place in the new TLAB. The disadvantage of this approach is that it is prone to memory fragmentation.

2. Allocate the object directly in Eden Space. In this case, the object is allocated directly in the Eden Space shared by all threads, so a synchronization lock mechanism is unavoidable.

Conclusion

To facilitate the memory management of Java programs, the JVM divides memory into five areas: the program counter, the Java virtual machine stack, the native method stack, the method area, and the heap, which together constitute the runtime data area. The first three are thread-private, and the last two are shared between threads. Thread sharing implies potential thread conflicts. For the method area this can be solved with a synchronization lock, because programs do not operate on it frequently. The heap is different: every object creation allocates memory there, and such frequent operations would incur a significant performance cost under a synchronization lock, so the JVM uses the TLAB technique instead.

The JVM hides many low-level details from us, and Java programmers can often get by just knowing that objects created with new are allocated on the heap and that their memory is reclaimed by the garbage collector. However, to tune performance or write more efficient Java code, knowledge of the JVM's memory management mechanisms is essential.