
Introduction

Anyone who has written C/C++ has run into the following two problems, no matter what kind of feature they were implementing:

  1. Memory management: when programming in C/C++ we have to manage memory ourselves, and the slightest carelessness risks a memory leak or overflow
  2. Cross-platform support: say we implement a chat tool in C/C++ and want it to run on Windows, macOS, Linux and other operating systems; for the network-communication part alone we have to call each operating system's own library functions, which is very costly

So the folks at Sun decided to create the Java language: programs run on the JVM, and the JVM takes care of both problems, memory management and cross-platform portability. The hope was that with this scheme programmers could put more of their energy into implementing features.

There are plenty of articles about JVM memory management on the web, but most of them have two problems:

  1. They don't go deep enough, leaving you with a feel for the big picture but nothing more
  2. The content is easy enough to understand, but it feels fragmented; the pieces of knowledge never connect, so the overall impression is incomplete

So today Little K will start from a real case and, from the perspective of the JVM source code, analyze in depth what happens to the case program inside the JVM, hopefully leaving you with a more thorough and more coherent picture.

Case

Suppose the Juejin community backend is developed in Java, and a Juejin programmer starts it with the following code:

package com.juejin;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class JueJinApplication {

    public static void main(String[] args) {
        SpringApplication.run(JueJinApplication.class, args);
    }
}

This is a classic Spring Boot startup class. After packaging it into a jar, we run it with the following java command:

java -cp juejin.jar com.juejin.JueJinApplication

What happens inside the JVM at this point?

JNI

If you have written Java, you know that the entry point of a Java program is the main method. So, to execute the main method in the jar above and to manage the program's memory, the JVM first has to find that main method, which lives in the JueJinApplication class, and load the class into the JVM; only then does the JVM have full control over the memory the main method uses.

So Sun's engineers set out to write the logic that finds the main method. As mentioned in the introduction, C/C++ code has to manage memory carefully by itself. Writing all of this lookup-and-load logic in C++ would mean managing that memory by hand, which is a lot of work. Their solution: JNI.

JNI defines a set of conventions for Java to interact with other programming languages. Through these conventions the interaction is two-way: we can call Java methods from C++, and conversely call C++ functions from Java. For example:
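To make this two-way contract concrete, here is a minimal, hypothetical sketch (the demo.Greeter class, its hello and upper methods, and the library name exist only for this illustration and are not part of the case): the JVM binds a Java native method to the C++ function below by name, and the same JNIEnv pointer lets the C++ side call a Java method back.

#include <jni.h>

// Hypothetical Java class, for illustration only:
//   package demo;
//   public class Greeter {
//       public native String hello(String name);   // Java -> C++
//       public static String upper(String s) {      // C++ -> Java
//           return s.toUpperCase();
//       }
//       static { System.loadLibrary("greeter"); }
//   }

// Java -> C++: the JVM binds Greeter.hello() to this function by its name.
extern "C" JNIEXPORT jstring JNICALL
Java_demo_Greeter_hello(JNIEnv* env, jobject self, jstring name) {
    // C++ -> Java: call the static Java method Greeter.upper(String) back.
    jclass cls = env->FindClass("demo/Greeter");
    jmethodID mid = env->GetStaticMethodID(
        cls, "upper", "(Ljava/lang/String;)Ljava/lang/String;");
    jobject result = env->CallStaticObjectMethod(cls, mid, name);
    return static_cast<jstring>(result);
}

The same handful of JNI calls (FindClass, GetStaticMethodID, CallStaticObjectMethod) are exactly what the JVM launcher itself uses, as we will see next.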

With JNI, Sun's engineers could implement the main-method lookup of our case in Java, as shown below:

The figure above shows how the JVM looks for the main method when the java command starts: the C++-implemented LoadMainClass function calls the Java-implemented checkAndLoadMain method to find and load the main method.

The part above the red line describes, in detail, how the JVM finds and loads com.juejin.JueJinApplication and its main method during startup:

  1. The JVM starts via the JLI_Launch function

  2. JLI_Launch internally calls ParseArguments to parse the startup arguments

  3. Since the startup argument is -cp, the launch mode is set to LM_CLASS, meaning a mainClass is specified for startup

  4. GetStaticMethodID is called to obtain the method ID of checkAndLoadMain

  5. NewPlatformString is called to build the argument to checkAndLoadMain, namely the name of the startup class com.juejin.JueJinApplication

  6. CallStaticObjectMethod is called to execute checkAndLoadMain, as shown in the yellow box on the right:

    • Since the launch mode is LM_CLASS, the SystemClassLoader loads the startup class mainClass, i.e. com.juejin.JueJinApplication, which of course includes its main method

From the process above we can see that, because checkAndLoadMain is a Java method, the JVM has to call it through JNI.

From this we can summarize the contract for calling a Java method through JNI:

  • Obtain the ID of the target Java method with the GetStaticMethodID function
  • Execute the target Java method with the CallStaticObjectMethod function

Keeping this contract in mind will help you find the entry point of the corresponding method when debugging the JVM source code.
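As a rough sketch of that contract in action, the function below mimics what a launcher could do to call checkAndLoadMain through the standard JNI API. The function name and the simplifications are mine, not the real LoadMainClass from the OpenJDK sources, and I use plain NewStringUTF where the description above mentions NewPlatformString; error handling is omitted.

#include <jni.h>

// Simplified sketch: look up checkAndLoadMain, build its argument, call it.
static jclass LoadMainClassSketch(JNIEnv* env, jint mode, const char* name) {
    // Step 4: obtain the method ID of checkAndLoadMain.
    jclass helper = env->FindClass("sun/launcher/LauncherHelper");
    jmethodID mid = env->GetStaticMethodID(
        helper, "checkAndLoadMain",
        "(ZILjava/lang/String;)Ljava/lang/Class;");

    // Step 5: turn the C string "com.juejin.JueJinApplication" into a Java String.
    jstring className = env->NewStringUTF(name);

    // Step 6: execute checkAndLoadMain; on the Java side it loads mainClass
    // with the system class loader and returns the loaded class.
    jobject mainClass = env->CallStaticObjectMethod(
        helper, mid, JNI_TRUE, mode /* e.g. LM_CLASS */, className);
    return static_cast<jclass>(mainClass);
}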

Those of you who looked closely at the figure may have noticed that something is missing. Yes, I should add: the JVM loads mainClass through different paths depending on the launch mode. In the figure I only drew two paths, -cp and -jar, because they are the two modes we usually use:

  • -cp: the startup class is specified on the command line; this is the path I described above.

  • -jar: a jar package is specified to start the program. This path has the following steps, shown as the purple line in the figure above:

    • The JVM sees that the startup argument is -jar, so it sets the launch mode to LM_JAR
    • Since the launch mode is LM_JAR, it reads the manifest file from the jar, extracts the Main-Class attribute, and obtains the name of mainClass from it
    • Just as in LM_CLASS mode, the SystemClassLoader then loads the startup class mainClass and its main method

The other two launch modes, LM_SOURCE and LM_MODULE, are left for interested readers to explore. To tie the two common paths together, a toy sketch of how the -cp and -jar branches choose the launch mode follows.
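The sketch below is a heavily simplified, hypothetical version of that branch: the enum values only mirror the mode names used above, and the real ParseArguments in the OpenJDK launcher handles far more options and also records the classpath, the jar name and the class name.

#include <cstring>

// Launch modes as referred to above (values are illustrative).
enum LaunchMode { LM_UNKNOWN = 0, LM_CLASS, LM_JAR, LM_MODULE, LM_SOURCE };

// Toy argument scan: -cp means "a main class is named on the command line"
// (LM_CLASS); -jar means "read Main-Class from META-INF/MANIFEST.MF" (LM_JAR).
static LaunchMode ParseLaunchMode(int argc, char** argv) {
    LaunchMode mode = LM_UNKNOWN;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-cp") == 0 || strcmp(argv[i], "-classpath") == 0) {
            mode = LM_CLASS;   // the main class name follows later on the command line
            i++;               // skip the classpath value itself
        } else if (strcmp(argv[i], "-jar") == 0) {
            mode = LM_JAR;     // the main class comes from the jar manifest
            i++;               // skip the jar file name
        }
    }
    return mode;
}

int main(int argc, char** argv) {
    return ParseLaunchMode(argc, argv) == LM_UNKNOWN ? 1 : 0;
}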

Our Java program is ultimately executed by the JVM, so the main method, once loaded into the JVM, is also processed and executed by the JVM.

But before showing you how the JVM executes the main method, Little K wants to walk through a bit of analysis:

We all know that the class files inside the jar contain bytecode, whether the jar was built by Maven or Gradle. We also know that:

The figure above shows the rule the CPU follows: going down the pyramid, access speed gradually declines, with registers being the fastest, the CPU caches next, and disk the slowest.

Since the program has no control over how the CPU cache is read and written, the JVM, wanting to execute the program efficiently, tries to put as much of the program as possible into registers so the CPU can process it quickly.

However, the program in our jar is bytecode, and as any computer science student knows, what the CPU works with are machine instructions, i.e. binary instructions. The JVM therefore has to convert the program's bytecode into machine instructions before the corresponding instructions can finally be placed into registers.

Thus, for the case in the introduction, the JVM performs this bytecode-to-instruction conversion when it loads JueJinApplication with the SystemClassLoader. PS: for readability, the machine instructions to the right of the arrows in the figure are written in assembly form.

But there is a problem: the class JueJinApplication and the annotation @SpringBootApplication from the case are shared across threads, whereas the contents of a register are read by a single thread. It is therefore not appropriate to write the class JueJinApplication and the annotation @SpringBootApplication into registers, so the JVM designed the MetaSpace to hold them both. There are already plenty of articles about MetaSpace and the JMM, so I won't go into them here.

What the main method of JueJinApplication executes, on the other hand, is thread-private and can go into registers. So today we focus on how the main method of JueJinApplication is translated into machine instructions.

Template interpretation execution

Let's first look at what the bytecode of the JueJinApplication class looks like (this is what javap -c prints):

public class com.juejin.JueJinApplication {
  public com.juejin.JueJinApplication();
    Code:
       0: aload_0
       1: invokespecial #1   // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2   // class com/juejin/JueJinApplication
       2: aload_0
       3: invokestatic  #3   // Method org/springframework/boot/SpringApplication.run:(Ljava/lang/Class;[Ljava/lang/String;)Lorg/springframework/context/ConfigurableApplicationContext;
       6: pop
       7: return
}

Here's a quick rundown of the structure, where Code marks the bytecode (a toy walk-through of these stack operations follows the list):

  • Bytecode in the JueJinApplication constructor:

    • aload_0: pushes the this reference onto the top of the stack
    • invokespecial #1: calls the constructor of JueJinApplication's superclass, java.lang.Object
  • Bytecode in the main method:

    • ldc #2: pushes the class JueJinApplication onto the top of the stack
    • aload_0: pushes the args parameter onto the top of the stack
    • invokestatic #3: calls the static method SpringApplication.run, whose arguments are the class JueJinApplication and args, and whose return type is ConfigurableApplicationContext
    • pop: pops the return value of SpringApplication.run, because the main method does not use it
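If the stack operations above feel abstract, the toy model below traces what the operand stack of main looks like while those four instructions execute. It has nothing to do with HotSpot's real data structures; it only pushes and pops symbolic strings.

#include <stack>
#include <string>
#include <iostream>

// Toy operand stack holding symbolic values instead of real references.
int main() {
    std::stack<std::string> operands;

    operands.push("Class<JueJinApplication>");   // ldc #2
    operands.push("args");                        // aload_0
    // invokestatic #3: SpringApplication.run pops both arguments and
    // pushes its ConfigurableApplicationContext return value.
    operands.pop();
    operands.pop();
    operands.push("ConfigurableApplicationContext");
    operands.pop();                               // pop: return value unused
    std::cout << "operand stack size after main: " << operands.size() << "\n"; // 0
    return 0;                                     // return
}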

With the bytecode of the JueJinApplication class in hand, one more premise has to be considered before these bytecode instructions can be converted into machine instructions: different CPU architectures have different instruction sets, and therefore different machine-instruction formats; for example, the x86 instruction set and the ARM instruction set differ. The JVM therefore designed the following scheme for converting the bytecode instructions of the main method in the JueJinApplication class into machine instructions (a simplified code sketch of the data structures involved appears after these steps):

  1. The Bytecodes structure defines all the bytecodes used by Java, and the JVM passes them to the TemplateTable. As shown in the top box of the figure, aload_0 and pop are among them and appear in the JueJinApplication class.

  2. Using the full set of bytecodes from the previous step, the TemplateTable generates a template for each bytecode, which defines the mapping between the bytecode and its machine-instruction template. Let's look at the template using the aload_0 bytecode instruction as an example:

    • aload_0 => ubcp|____|clvm|____, vtos, atos, aload_0, _, where => denotes the mapping between the aload_0 bytecode instruction and its machine-instruction template:

      • the left side of => is the bytecode instruction aload_0

      • the right side of => is the machine-instruction template for aload_0. The template contains five parameters:

        • flags: four flags are defined:

          • ubcp: whether the bytecode pointer is needed. If the method in the class file is a Java method, its bytecode instructions need this pointer, so the flag is true; if the method is a native method, implemented in C/C++ and called directly, no bytecode pointer is needed and the flag is false
          • disp: whether the template dispatches (jumps) within its own scope; for example, the goto instruction jumps to another instruction, so its flag is true
          • clvm: whether the vm_call function needs to be called; since aload_0 calls vm_call internally, its clvm is true, otherwise it would be false
          • iswd: whether this is the wide form of an instruction. For example, the iload bytecode instruction reads a variable from the local variable table and pushes it onto the top of the stack. As long as the local variable table holds at most 256 variables (2^8), iswd is false; if iload needs to address more than 2^8 local variables, the local variable table index is widened to 2^16 (65536 variables), and in that case iswd is true

          According to these flags, aload_0 appears in Java methods, so its ubcp is true.

        • aload_0: the generator function the aload_0 bytecode instruction uses to produce its machine instructions; a dedicated function is needed because one bytecode instruction usually corresponds to more than one machine instruction

        • vtos: the entry state of the aload_0 bytecode instruction, i.e. the entry address describing where the instruction's operand sits on the stack when it starts executing, described in detail below under "Top-of-stack cache"

        • atos: the exit state of the aload_0 bytecode instruction, which may serve as the entry state of the next instruction

        • _ : the local variable used by the aload_0 bytecode instruction; since aload_0 takes its input from the stack rather than from a local variable, this parameter is left empty (_)

    The JVM then hands these bytecode-to-machine-instruction-template mappings over to the TemplateInterpreterGenerator.

  3. The TemplateInterpreterGenerator invokes the assembler of the target CPU architecture to generate the machine instructions corresponding to each bytecode instruction. Sticking with the aload_0 bytecode instruction as the example:

    • Assume the JVM calls an x86 assembler to generate the machine instructions, as shown in the figure above:

      • On the left of the bottom blue box, aload_0 is the aload_0 parameter from the template in step 2, indicating that the aload_0 bytecode instruction uses this generator to produce its machine instructions
      • On the right of the bottom blue box, the aload_0 machine instructions are the machine instructions corresponding to the aload_0 bytecode instruction

      So "aload_0 => aload_0 machine instructions" represents the defined process by which the aload_0 bytecode instruction generates machine instructions.

  4. The TemplateInterpreterGenerator takes the aload_0 machine-instruction template obtained in step 2 and matches it against the aload_0 parameter of the x86 assembler from step 3 (the two aload_0 labels marked in red in the figure represent this match). It then calls that generator to produce the machine instructions for aload_0; the aload_0 instructions in the yellow box in the figure are the machine instructions corresponding to the aload_0 bytecode instruction.

  5. The generated aload_0 machine instructions are written into the ICache, the instruction cache

  6. In the same way as for aload_0, the JVM converts the other bytecode instructions in the main method of the JueJinApplication class into their machine instructions and writes them into the ICache as well.
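Since these data structures are hard to picture from prose alone, here is a heavily simplified sketch of the idea. All types, names and the emitted bytes are mine, invented for illustration; the real TemplateTable and TemplateInterpreterGenerator in the HotSpot C++ sources are far more elaborate.

#include <cstdint>
#include <vector>

// Simplified stand-ins for the pieces described above.
enum TosState { vtos, atos /* ... the other states ... */ };
enum Flags    { ubcp = 1, disp = 2, clvm = 4, iswd = 8 };

// One template: flags, tos-in, tos-out, and the generator function.
struct Template {
    int      flags;
    TosState tos_in;
    TosState tos_out;
    void (*generator)(std::vector<uint8_t>& icache);  // emits machine code
};

// A toy "assembler" routine for aload_0: it appends made-up opcode bytes to
// the buffer standing in for the ICache (not real x86 encoding).
static void gen_aload_0(std::vector<uint8_t>& icache) {
    icache.push_back(0x48);
    icache.push_back(0x8b);
}

// The table itself, indexed by bytecode number (aload_0 is 0x2a in the spec).
static Template template_table[256];

static void def(uint8_t bytecode, int flags, TosState in, TosState out,
                void (*gen)(std::vector<uint8_t>&)) {
    template_table[bytecode] = Template{flags, in, out, gen};
}

int main() {
    std::vector<uint8_t> icache;                      // stand-in for the ICache
    def(0x2a, ubcp | clvm, vtos, atos, gen_aload_0);  // the aload_0 mapping above
    template_table[0x2a].generator(icache);           // step 4: generate the code
    return 0;
}

The point of the sketch is only the shape of the flow: a def call records the template for a bytecode, and the generator function attached to that template is what actually emits machine code into the buffer standing in for the ICache.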

In the process above, the machine instructions executed by the JVM are generated directly from templates by the TemplateInterpreterGenerator; this way of interpreting and executing machine instructions is called template interpretation. It is only one of the forms in which the JVM executes Java programs; HotSpot also has two other forms, bytecode interpreted execution and C++ interpreted execution, which interested readers can look into on their own.

Top-of-stack cache

Earlier I mentioned that the purpose of converting bytecode into machine instructions is to write the converted instructions into registers and get the most out of the CPU. In the JVM, the mechanism behind this writing is called the top-of-stack cache. Let's take the aload_0 bytecode instruction of the main method as an example of how the JVM maintains the top-of-stack cache.

Writing the top-of-stack cache

After the machine instructions are generated, the JVM writes the converted machine instructions into registers. The figure above shows how the top-of-stack cache is written for the aload_0 bytecode instruction of the main method in the introduction case:

  1. After parsing the class file, the JVM knows that the main method's parameter is args, so args is pushed onto the top of the stack, see the dotted line in the figure above.

  2. The top-of-stack cache defines 10 states describing the type of the cached variable, as shown in the green box above. Let me explain them:

    • btos: caches a variable of type bool; the corresponding bep is the address of that variable on the stack
    • ztos: caches a variable of type byte; the corresponding zep is the address of that variable on the stack
    • ctos: caches a variable of type char; the corresponding cep is the address of that variable on the stack
    • stos: caches a variable of type short; the corresponding sep is the address of that variable on the stack
    • itos: caches a variable of type int; the corresponding iep is the address of that variable on the stack
    • ltos: caches a variable of type long; the corresponding lep is the address of that variable on the stack
    • ftos: caches a variable of type float; the corresponding fep is the address of that variable on the stack
    • dtos: caches a variable of type double; the corresponding dep is the address of that variable on the stack
    • atos: caches a variable of type object; the corresponding aep is the address of that variable on the stack
    • vtos: a special one, meaning the variable/parameter required by the instruction is already at the top of the stack and does not need to be cached; the corresponding vep is the address of that position on the stack

    How the operands on the stack change before and after an instruction executes is reflected through these *ep variables. Together, the *ep variables form an array called entry, as shown in the green part of the figure. An array is used because the state of an instruction before and after execution may need to be reflected on the stack through several ep variables.

    Because the 0 in aload_0 means fetching the variable at the top of the stack, that variable is already there; referring to the 10 states above, the state for this aload_0 is therefore vtos, and its vep is the address of the top of the stack. In the figure above, the vep in the entry array points to the top of the stack. Since aload_0 has no other operands, the other ep variables also point to the top of the stack.

  3. Each ep variable is then written into a two-dimensional array indexed as [top-of-stack state][bytecode instruction]; this two-dimensional array is the top-of-stack cache. As shown in the figure, entry is this two-dimensional array: the JVM writes each ep variable of the entry array, i.e. the position of the aload_0 operand on the stack, into cells such as [vtos][aload_0] and [atos][aload_0]. This completes writing the top-of-stack cache (a minimal sketch of the resulting structure follows).
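To make the shape of this cache easier to picture, here is a minimal sketch, again with invented types and names rather than the real HotSpot structures:

#include <cstdint>

// The 10 top-of-stack states described above.
enum TosState { btos, ztos, ctos, stos, itos, ltos, ftos, dtos, atos, vtos,
                number_of_states };

// One entry records, per tos state, where the instruction's operand lives on
// the stack: these are the *ep addresses from the figure.
struct EntryPoint {
    const void* ep[number_of_states];
};

// The top-of-stack cache itself: [tos state][bytecode] -> stack address.
static const void* tos_cache[number_of_states][256];

// Writing the cache for one bytecode: copy each *ep of its entry into the
// matching row, e.g. [vtos][aload_0], [atos][aload_0], and so on.
static void set_entry(uint8_t bytecode, const EntryPoint& entry) {
    for (int state = 0; state < number_of_states; state++) {
        tos_cache[state][bytecode] = entry.ep[state];
    }
}

int main() {
    EntryPoint aload_0_entry = {};
    long stack_top = 0;                       // stands in for the real top of stack
    for (int s = 0; s < number_of_states; s++) {
        aload_0_entry.ep[s] = &stack_top;     // every ep points at the top of stack
    }
    set_entry(0x2a, aload_0_entry);           // 0x2a is the aload_0 bytecode number
    return tos_cache[vtos][0x2a] == &stack_top ? 0 : 1;
}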

Reading the top-of-stack cache

With the top-of-stack cache in place, when the JVM executes the machine instructions of the main method it can locate an instruction's operand in the top-of-stack cache via instruction + operand, and finally hand the instruction to the CPU for execution. Taking the aload_0 bytecode instruction of the main method as the example, the process is as follows (a small self-contained sketch of the lookup comes after the steps):

Let’s focus on the red line:

  1. Using aload_0 + args, which means fetching the value of args from the top of the stack, the JVM locates the cell [vtos][aload_0] in the top-of-stack cache; this cell holds the stack position recorded by vep: the top of the stack
  2. Since the vep for vtos points to the top of the stack, the JVM takes the value of the parameter args from the top of the stack
  3. The value of args is passed to the CPU
  4. The CPU fetches the machine instructions for aload_0 from the ICache
  5. The CPU executes the machine instructions (the aload_0 instructions plus the value of the operand args)
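And here is the matching toy sketch of the read path, self-contained and again purely illustrative; the constants only mirror the names used above.

#include <cstdint>

// Toy read path, matching the write sketch above: index the 2-D cache by
// [tos state][bytecode] to find where the operand sits on the stack.
constexpr int     kNumTosStates = 10;   // btos ... vtos
constexpr int     kVtos         = 9;    // index of vtos in the state list
constexpr uint8_t kAload0       = 0x2a; // bytecode number of aload_0

static const void* tos_cache[kNumTosStates][256];
static long fake_operand_stack_top;     // stands in for the real stack slot of args

int main() {
    // Written during the "write" phase: [vtos][aload_0] -> top of stack.
    tos_cache[kVtos][kAload0] = &fake_operand_stack_top;

    // Steps 1-2: the JVM looks up [vtos][aload_0] and follows it to the top
    // of the stack, where the value of args is waiting.
    const void* args_location = tos_cache[kVtos][kAload0];

    // Steps 3-5: that value plus the aload_0 machine code from the ICache
    // are what the CPU actually executes (not modelled here).
    return args_location == &fake_operand_stack_top ? 0 : 1;
}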

Conclusion

In this article I focused on three things: JNI, template interpretation execution, and the top-of-stack cache. You may well still have related questions, such as:

  • How and when are stacks generated?
  • What exactly is stored on the stack: binary, hexadecimal, or does it depend on the data type?
  • How does the JVM manipulate stacks?

These are all good questions, and Little K will explain them in more detail in later articles.

Finally, Little K hopes that fellow Juejin readers, after studying this article, will come away inspired, having gained something and grown. Of course, if you have any questions, feel free to leave them in the comments section. I believe everyone will become a technology expert in the future!