1. Programming languages are compatible with the underlying system in two ways:

1. Compatibility through the compiler. Languages such as C and C++ achieve compatibility this way: the same source can run on different processor architectures (for example x86 and ARM). This capability is not a feature of the programming language itself; it is provided by the compiler. A specific compiler is developed for each hardware platform and operating system, and each one translates the same C/C++ source into machine instructions that match its target platform.

2. Compatibility through an intermediate language. Java, C# and similar languages all take this approach.

Java and C# source code is compiled into an intermediate language (IL), whose instructions are interpreted and executed by a virtual machine. At run time the virtual machine translates the intermediate language into machine instructions that match the underlying platform. The intermediate-language instructions generated from the source code are the same regardless of the platform the program ends up running on; compatibility is handled by the virtual machine.

2. Intermediate language translation

The virtual machine translates the intermediate language into the corresponding machine instructions and executes them.

1. One possible way to get from an intermediate language to machine code is to write a C program that interprets the bytecode one instruction at a time. When the program that executes the bytecode (the JVM itself) is compiled, the C code that handles each bytecode instruction is compiled into machine code. So when the virtual machine interprets a bytecode instruction, it is really executing the native machine code generated from that C code.

But this approach is inefficient.
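The interpretation approach described above can be sketched as a small switch-based loop in C. This is only an illustrative sketch, not how any real JVM is written: the function name `interpret` and the fixed stack size are assumptions made for the example, and only four opcodes are handled. The opcode values themselves are the ones the JVM specification assigns to `bipush`, `iadd`, `isub` and `ireturn`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A tiny subset of JVM opcodes (values as assigned by the JVM specification). */
enum {
    OP_BIPUSH  = 0x10,  /* push the next byte as an int */
    OP_IADD    = 0x60,  /* pop two ints, push their sum */
    OP_ISUB    = 0x64,  /* pop two ints, push their difference */
    OP_IRETURN = 0xac   /* return the int on top of the stack */
};

#define STACK_MAX 16    /* assumption: a small fixed-size operand stack */

/* Hypothetical helper: interprets `len` bytes of bytecode and returns the
 * value produced by ireturn. Each case is ordinary C, which the C compiler
 * turned into machine code when the interpreter itself was built. */
int interpret(const uint8_t *code, size_t len) {
    int32_t stack[STACK_MAX];
    int sp = 0;      /* number of values on the operand stack */
    size_t pc = 0;   /* index of the next bytecode to execute */

    while (pc < len) {
        uint8_t op = code[pc++];
        switch (op) {
        case OP_BIPUSH:
            assert(pc < len && sp < STACK_MAX);
            stack[sp++] = (int8_t)code[pc++];  /* operand is a signed byte */
            break;
        case OP_IADD:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            sp--;
            break;
        case OP_ISUB:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] - stack[sp - 1];
            sp--;
            break;
        case OP_IRETURN:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unsupported opcode in this sketch");
            return 0;
        }
    }
    return 0;  /* fell off the end without reaching ireturn */
}
```

For example, passing the six bytes `10 05 10 03 60 ac` (bipush 5, bipush 3, iadd, ireturn) to `interpret` returns 8. Every bytecode costs at least one fetch, one switch dispatch and several memory accesses to the stack array, which gives a feel for why this approach is slow.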

How the CPU executes code: to get the CPU to execute a piece of code, simply point the CS:IP register pair at the entry of that code.

The CS register stores the segment address and the IP register stores the offset address. Together, the values of CS and IP uniquely determine an address in memory. Before executing an instruction, the CPU uses these two registers to locate the target memory location and fetches the machine instruction stored there.

Example: use the function pointer syntax that C provides to make CS:IP point directly at a string of machine code.

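    #include <stdio.h>

    /* 32-bit x86 machine code for a function that returns a + b:
       push %ebp; mov %esp,%ebp; mov 0xc(%ebp),%eax;
       mov 0x8(%ebp),%edx; add %edx,%eax; pop %ebp; ret */
    const unsigned char code[] = "\x55\x89\xe5\x8b\x45\x0c\x8b\x55\x08\x01\xd0\x5d\xc3";

    int main() {
        int a = 5;
        int b = 3;
        int (*fun)(int, int);            // define the function pointer
        fun = (int (*)(int, int))code;   // point it at the machine code
        int r = fun(a, b);
        printf("%d\n", r);
        return 0;
    }

This example computes the sum of two integers. The fun pointer points to the beginning of the char array, so calling fun makes the CPU execute the bytes in the array as instructions. It only works when the program is built as a 32-bit x86 program and the memory holding the array is executable; many modern systems mark data as non-executable by default.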
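As a side note on how CS and IP together determine one memory address: in 8086 real mode the segment value is shifted left by 4 bits and the offset is added to it. The helper below is a hypothetical illustration of that arithmetic only (the name `real_mode_address` is made up for this sketch). A 32-bit program like the C example above runs in protected mode, where address translation works differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the address the CPU fetches from in 8086 real mode.
 * The segment value is multiplied by 16 (shifted left 4 bits) and the
 * offset is added to it. */
uint32_t real_mode_address(uint16_t cs, uint16_t ip) {
    uint32_t addr = ((uint32_t)cs << 4) + ip;
    assert(addr <= 0x10FFEFu);  /* largest value this formula can produce */
    return addr;
}
```

With CS = 0x2000 and IP = 0x0010, the result is 0x20010.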

3. Native compilation

Translating the intermediate language directly into machine code and running it is much more efficient than interpreting it with a C program. However, the intermediate language has its own model of memory management and code execution. As a result, even when a function takes only a few lines of intermediate-language code, the translated machine code contains many more instructions than machine code written directly for the same function. So it is still not very efficient.

To improve performance, the JVM provides a mechanism for translating the intermediate language (bytecode) directly into native machine instructions.

The compiler translates Java source code into the intermediate language, which is then handed to the virtual machine; the virtual machine in turn translates the intermediate language into instructions for the machine platform it runs on.

The so-called intermediate language is the Java bytecode instruction set. Every Java bytecode opcode is encoded in 8 bits (one byte), so there can be at most 256 distinct instructions.
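Because an opcode fits in a single byte, a lookup table with 256 slots is enough to cover every possible opcode value. The sketch below illustrates this; the helper names `opcode_name` and `opcode_from_name` are hypothetical, and only a handful of opcodes are filled in (their values are the ones assigned by the JVM specification).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One slot per possible opcode byte. Only a few real opcodes are listed;
 * every other slot stays NULL. */
static const char *const NAMES[256] = {
    [0x10] = "bipush",
    [0x60] = "iadd",
    [0x64] = "isub",
    [0x84] = "iinc",
    [0xac] = "ireturn",
    [0xb6] = "invokevirtual",
    [0xbb] = "new",
};

/* Hypothetical helper: maps an opcode byte to its mnemonic. A uint8_t can
 * never index outside the 256-entry table. */
const char *opcode_name(uint8_t opcode) {
    return NAMES[opcode] != NULL ? NAMES[opcode] : "(not in this sketch)";
}

/* Hypothetical helper: reverse lookup, returns -1 for unknown mnemonics. */
int opcode_from_name(const char *name) {
    assert(name != NULL);
    for (int op = 0; op < 256; op++) {
        if (NAMES[op] != NULL && strcmp(NAMES[op], name) == 0) {
            return op;
        }
    }
    return -1;
}
```

For instance, `opcode_name(0x60)` gives `"iadd"` and `opcode_from_name("ireturn")` gives `0xac`.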

1. Common assembly instructions

To study the internal implementation of the Java virtual machine's execution engine, it helps to first learn five simple kinds of assembly instructions.

(1) Data transfer instructions
These instructions mainly transfer data between registers and memory, and between registers and input/output ports.

    // pop the top of the stack into the eax register
    pop %eax

(2) Arithmetic instructions

    // add the number 3 to the value in the eax register
    add $3, %eax
    // add 1 to the value in the ebx register
    inc %ebx

(3) Logical operation instructions
AND, OR, NOT, shift left, shift right, and so on.

    // shift the value in the eax register left by 1 bit
    shl $1, %eax
    // AND the al register with binary 00111011
    and $0x3b, %al

(4) String instructions
Allocating a contiguous block of space, reading values from a contiguous block, transferring it, and so on.

(5) Program transfer instructions
These implement if/else decisions, for loops, while loops, function calls, and so on. Common ones: jmp (jump), loop (loop), ret (return), etc.

Java is an object-oriented programming language, so the JVM naturally needs a dedicated set of instructions to support operations on types.

1. Data exchange instructions

JVM memory is divided into the operand stack, the local variable table, the Java heap, the constant pool, and the method area. Data exchange instructions support transferring and exchanging data between these memory areas. The JVM performs its logical operations mainly on the operand stack (the iinc instruction is an exception: it operates directly on a local variable). Whether the data lives on the stack or in the constant pool, the JVM eventually moves it to the operand stack before operating on it. The hardware, by comparison, computes in registers: whether the data lives in the data segment or the code segment, the CPU eventually loads it into a register, and after the operation completes the result is written back.

The JVM specification also provides getfield and putfield, which exchange data between the fields of an object in the Java heap and the operand stack; getstatic and putstatic, which exchange data between the static fields of a class and the operand stack; and baload, bastore, caload, and castore, which exchange data between arrays in the JVM heap and the operand stack.

2. Function call instructions

Because Java has a rich variety of function types, the JVM has to support more ways of calling a function: invokevirtual, invokeinterface, invokespecial, invokestatic, and return. This is a richer set of function call instructions than the hardware supports. x86 mainly uses call and ret, and saves and restores the calling context by pushing physical registers onto the stack and popping them off again. The JVM has no physical registers, so it uses the operand stack and the PC register instead. To save the context, the JVM pushes a stack frame onto the Java stack; to restore it, the JVM pops that stack frame when the function returns.

When the JVM calls a function, it cannot simply jump to the corresponding code segment the way CPU hardware does. This is because the code of a Java function is not stored in a code segment but in the code cache. Each block of Java function code has an index position in the code cache, and the JVM eventually jumps to that position to carry out the call. Java functions must also be wrapped in classes, so when the JVM makes a function call it has to go through a series of operations, such as addressing, to locate the entry.

3. Arithmetic and logic instructions

The JVM's operation instructions mainly cover arithmetic operations, bit operations, comparison operations, logical operations, and so on.

    iadd: adds two ints
    isub: subtracts two ints
    fadd: adds two floats
    ddiv: divides two doubles

4. Control transfer instructions

Like CPU hardware, the JVM provides the common control transfer instructions.

The JVM also provides a set of instructions for creating objects. At the Java syntax level the keyword new instantiates an object, and the corresponding bytecode instruction is also called new. The JVM specification further provides "narrowing conversion" and "widening conversion" instructions; the latter is supported naturally inside the JVM.

In addition to the instructions above, the JVM specification provides many other instructions that have no counterpart on a physical CPU, for example the instruction for throwing exceptions (athrow) and the instructions for thread synchronization (monitorenter and monitorexit).
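The difference between instructions that go through the operand stack and iinc, which works directly on a local variable, can be illustrated with a small C sketch. The `Frame` struct and the `run_frame` function are hypothetical simplifications made up for this example; a real JVM stack frame holds more than this. The opcode values are the ones assigned by the JVM specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OP_ILOAD_0  = 0x1a,  /* push local variable 0 onto the operand stack */
    OP_ILOAD_1  = 0x1b,  /* push local variable 1 onto the operand stack */
    OP_ISTORE_2 = 0x3d,  /* pop the operand stack into local variable 2 */
    OP_IADD     = 0x60,  /* pop two ints, push their sum */
    OP_IINC     = 0x84,  /* iinc index, const: add const to a local in place */
    OP_RETURN   = 0xb1   /* return from the method */
};

#define MAX_LOCALS 4
#define MAX_STACK  4

/* Hypothetical, simplified stack frame: a local variable table plus an
 * operand stack. */
typedef struct {
    int32_t locals[MAX_LOCALS];
    int32_t stack[MAX_STACK];
    int sp;  /* number of values currently on the operand stack */
} Frame;

/* Runs bytecode against one frame until `return` or the end of the code. */
void run_frame(Frame *f, const uint8_t *code, size_t len) {
    size_t pc = 0;
    while (pc < len) {
        switch (code[pc++]) {
        case OP_ILOAD_0:
            assert(f->sp < MAX_STACK);
            f->stack[f->sp++] = f->locals[0];
            break;
        case OP_ILOAD_1:
            assert(f->sp < MAX_STACK);
            f->stack[f->sp++] = f->locals[1];
            break;
        case OP_ISTORE_2:
            assert(f->sp >= 1);
            f->locals[2] = f->stack[--f->sp];
            break;
        case OP_IADD:
            assert(f->sp >= 2);
            f->stack[f->sp - 2] += f->stack[f->sp - 1];
            f->sp--;
            break;
        case OP_IINC: {
            assert(pc + 1 < len);
            uint8_t index = code[pc++];
            int8_t delta = (int8_t)code[pc++];
            assert(index < MAX_LOCALS);
            f->locals[index] += delta;  /* the operand stack is not touched */
            break;
        }
        case OP_RETURN:
            return;
        default:
            assert(0 && "unsupported opcode in this sketch");
            return;
        }
    }
}
```

Starting with locals {5, 3, 0, 0}, the sequence iload_0, iload_1, iadd, istore_2, iinc 2 10, return leaves 18 in local variable 2 and an empty operand stack: the addition travels through the operand stack, while iinc updates the local variable table directly.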

Conclusion: the problem the Java language sets out to solve is how to achieve compatibility without the programmer having to care about the underlying technical details, and it achieves cross-platform compatibility through an intermediate language. Since the intermediate language is not native machine code, the machine cannot recognize it directly, so it cannot be run directly by the physical CPU. A virtual machine is used to interpret the intermediate language and translate it into the corresponding native machine instructions.

The efficient way is to translate Java bytecode instructions directly into native machine instructions, so that at run time the Java virtual machine directly calls the corresponding machine instructions. This calling mechanism relies mainly on the call and jmp instructions provided by the CPU.

PS: These study notes are based on the book “Uncovering the Java Virtual Machine – JVM Design principle and Implementation”. This is my first time writing and publishing a blog post, and the content is basically study notes. If there is any copyright infringement, please contact me and I will delete it. If you like it, please give it a like.