Virtual machine bytecode execution engine

The “virtual machine bytecode execution engine” is the JVM’s stack-based interpreter: the mechanism that executes the bytecode instructions stored in a Class file. In layman’s terms, it is the process by which the JVM reads bytecode instructions, runs them, and produces the result. Let’s look at it in detail.

The nature of a method call

Before describing the “bytecode execution engine,” let’s look at what stack frame-based method calls look like at the assembly level, taking the IA32 instruction set as an example.

IA32 programs use the stack frame data structure to support procedure calls (called methods in the Java language). Each procedure corresponds to a stack frame, and calling and returning from a procedure correspond to pushing and popping its stack frame. At any moment, only the stack frame at the top of the stack is active; it represents the state of the method that is currently executing. The top stack frame is delimited by two pointers, the frame pointer and the stack pointer, whose addresses are held in registers %ebp and %esp, respectively. The general structure of the stack is as follows:

The frame pointer points to the bottom of the current stack frame, not the bottom of the entire stack.

Let’s take a look at some C code:

#include<stdio.h>
void sayHello(int age)
{
    int x = 32;
    int y = 2323;
    age = x + y;
}

void main()
{
    int age = 22;
    sayHello(age);
}

This is a very simple piece of code. We compile it to assembly, omit the linking-related parts, and keep the core:

main:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$20, %esp
	movl	$22, -4(%ebp)
	movl	-4(%ebp), %eax
	movl	%eax, (%esp)
	call	sayHello
	leave
	ret
	
sayHello:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$16, %esp
	movl	$32, -4(%ebp)
	movl	$2323, -8(%ebp)
	movl	-8(%ebp), %eax
	movl	-4(%ebp), %edx
	addl	%edx, %eax
	movl	%eax, -12(%ebp)
	leave
	ret

The first two instructions of main are the same as the first two instructions of sayHello; we’ll come back to them later.

The subl instruction subtracts 20 from the address in register %esp. In other words, the stack pointer moves 20 bytes toward lower addresses (the stack grows downward), so the current stack frame is allocated 20 bytes. Next, movl writes the value 22 to address -4(%ebp), which is four bytes below the address held in the frame pointer register %ebp. If %ebp is 0x14, then 22 is stored at stack address 0x10.

The next movl instruction loads the value of age from -4(%ebp) into register %eax, and the one after it copies %eax to the top of the stack at (%esp), where sayHello will find its argument.

Now comes the core part, the call instruction. The CPU has a program counter (PC) that points to the next instruction, and a program often calls into another procedure. How is the state restored after the call so that execution can continue?

The solution is that the call instruction first pushes the return address onto the stack and then jumps to sayHello. We don’t see the push here because it is folded into the single call instruction.

Execution then jumps to the first instruction of sayHello, pushl, which pushes the value in register %ebp onto the stack. At this point %ebp still holds the caller’s frame pointer, so this is a save operation. The movl instruction then sets the frame pointer to the current position of the stack pointer, the top of the stack, and subl extends the stack by 16 bytes.

Next, write the values 32 and 2323 to different stack addresses that can be calculated relative to the address of the frame pointer.

The next instructions load y and x into registers %eax and %edx, respectively, add them, and leave the sum in %eax. The result is then written back to the stack at -12(%ebp).

The leave instruction is equivalent to the sum of the following two instructions:

movl %ebp, %esp
popl %ebp

What does that mean?

First the stack pointer is set back to the frame pointer, that is, to the bottom of the current stack frame, and then the saved frame pointer of the caller is popped into %ebp. After that, the stack frame occupied by sayHello can no longer be referenced, which effectively frees it.

The ret instruction pops the return address that call pushed, restoring the state before the call so that main continues executing.

That is basically how a method call works on IA32. On 64-bit x86-64, the number of general-purpose registers grows to 16, and registers are used preferentially for computing and passing parameters, which improves efficiency. However, the disadvantage of this register-based approach is poor portability: register usage differs from machine to machine. That is why Java chose a stack-based design.

Run time stack frame structure

In Java, a stack frame corresponds to a method call, and the local variables, operands, and return address involved in the method are stored in the stack frame. The stack frame size of each method is determined at compile time: how large the local variable table is and how deep the operand stack can get are written into the method’s Code attribute. At run time, therefore, the frame size is already fixed, and the memory can be calculated and allocated directly.

Local variable table

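The local variable table stores the method parameters and local variables used while a method runs. Its unit of capacity is the slot. Each slot must be able to hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress; a long or double value occupies two consecutive slots.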

As I understand it, a slot is like a black box: how many bytes it occupies is up to the implementation, but it is guaranteed to hold a value of any of these types.

Unlike the operand stack, which is accessed by pushing and popping, the local variable table is accessed by index. For example:

public void sayHello(String name) {
    int x = 23;
    int y = 43;
    x++;
    x = y - 2;
    long z = 234;
    x = (int) z;
    String str = new String("hello world ");
}

Let’s decompile to see its local variable table:

As you can see, the first entry in the local variable table is a reference named this, which refers to the current object in the heap. After it come the method parameter and the local variables x, y, z, and str.

By default, each of our instance methods takes the parameter this, which refers to the instance reference of the current class.
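
One way to observe this hidden parameter from Java code is through the java.lang.invoke API, which this article returns to later. The sketch below is only an illustration: the class name ImplicitThisDemo and the helper lengthHandleType are made up for this example. It looks up String.length(), which declares no parameters in source code, and inspects the type of the resulting method handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ImplicitThisDemo {
    // Returns the handle type of String.length(), an instance method that
    // declares no parameters in source code.
    static MethodType lengthHandleType() {
        try {
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            return length.type();
        } catch (ReflectiveOperationException e) {
            // Not expected for a public JDK method; wrapped to keep callers simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MethodType type = lengthHandleType();
        // The receiver appears as the first (and only) parameter of the handle.
        System.out.println(type.parameterCount());
        System.out.println(type.parameterType(0));
    }
}
```

The handle reports one parameter of type String even though length() takes none in source code: the receiver is passed like an ordinary leading argument, which matches the this entry at the start of the local variable table.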

The operand stack

The operand stack, also called the operation stack, does not use index access like the local variable table. It is a standard stack with push and pop operations, last in, first out. The operand stack is empty when a method starts executing; as the method runs, values are continually pushed onto it and popped off it until the method finishes.

The operand stack is a very important part of method execution: every intermediate result produced while a method runs is held on the operand stack.
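
To make the interplay between the local variable table and the operand stack concrete, here is a minimal sketch of how the statement x = y - 2 from the sayHello method above could be evaluated. Everything in it is an assumption for illustration: the class MiniStackMachine, the Op enum, and the Insn type are invented here, and the slot numbers (2 for x, 3 for y) assume slots are assigned in declaration order after this and name. The opcode names mirror real JVM mnemonics, but this is a toy interpreter, not how HotSpot is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniStackMachine {
    // A tiny subset of opcodes, named after the real JVM mnemonics.
    enum Op { ILOAD, ICONST, ISUB, ISTORE }

    // One instruction: an opcode plus an operand (slot index or constant; unused by ISUB).
    static final class Insn {
        final Op op;
        final int operand;
        Insn(Op op, int operand) { this.op = op; this.operand = operand; }
    }

    // Runs the instructions against a local variable table (index access)
    // and an operand stack (push and pop only).
    static void run(Insn[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case ILOAD:  stack.push(locals[insn.operand]); break;
                case ICONST: stack.push(insn.operand); break;
                case ISUB: {
                    int right = stack.pop();   // the value pushed last
                    int left = stack.pop();
                    stack.push(left - right);
                    break;
                }
                case ISTORE: locals[insn.operand] = stack.pop(); break;
            }
        }
    }

    public static void main(String[] args) {
        // Slots: 0 = this, 1 = name (both unused here), 2 = x, 3 = y.
        int[] locals = {0, 0, 23, 43};
        Insn[] code = {
            new Insn(Op.ILOAD, 3),   // push y
            new Insn(Op.ICONST, 2),  // push the constant 2
            new Insn(Op.ISUB, 0),    // pop both, push y - 2
            new Insn(Op.ISTORE, 2),  // pop the result into x
        };
        run(code, locals);
        System.out.println("x = " + locals[2]);  // prints "x = 41"
    }
}
```

Note that a single source statement needs four instructions here, because every operand has to pass through the stack.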

The return address

After a method calls another method, execution has to come back to the call site and continue with the rest of the caller’s body. The position just after the call is the “return address”: we need to guarantee that once the called method finishes, the CPU returns to that position and continues executing the caller.

Just as in the assembly code at the beginning, the return address is usually pushed onto the stack ahead of time, and when the method call ends, it is popped from the top of the stack to find where the caller should resume.

The method call

Method invocation is the core topic of this article. It solves the problem of how the virtual machine determines the target method of a call: a bytecode instruction asks to invoke a method, but that method may be overloaded or overridden, so how does the virtual machine decide which version to call? That is the only task at this stage.

First, let’s talk about resolution. As you saw in the previous article, when a class is loaded, symbolic references in the constant pool are replaced with direct references during the resolution phase. This includes symbolic references to methods, but only some of them; other methods can only be determined at run time and are not resolved at this point. We call resolution during class loading “static resolution.”

So which methods are resolved statically and which are resolved dynamically?

Take this code for example:

Object obj = new String("hello");
obj.equals("world");

Object has an equals method, and String overrides it. The code above obviously calls String’s equals method. If, when loading the class, we resolved the reference to equals directly to Object’s own equals method, obj would always call Object’s version, and polymorphism could never work.
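
The difference is observable. In the sketch below (the class name EqualsDispatchDemo and its helper methods are made up for illustration), two distinct String objects with the same contents are compared through variables whose static type is Object. Object’s own equals compares identity, so if the call had been bound to it, the result would be false.

```java
class EqualsDispatchDemo {
    // Compares two distinct String objects through Object-typed variables.
    static boolean equalsThroughObjectReference() {
        Object obj = new String("hello");
        Object other = new String("hello");
        // Resolved at run time to String.equals, which compares contents.
        return obj.equals(other);
    }

    // What an identity comparison, as done by Object.equals, would yield.
    static boolean identityComparison() {
        Object obj = new String("hello");
        Object other = new String("hello");
        return obj == other;
    }

    public static void main(String[] args) {
        System.out.println(equalsThroughObjectReference()); // true
        System.out.println(identityComparison());           // false
    }
}
```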

Only methods that are “knowable at compile time and invariant at run time” can be resolved statically when a class is loaded. These include private methods, static methods, instance constructors, and superclass methods.

All other methods are collectively called “virtual methods” and are not resolved during the resolution phase of class loading. Statically resolved methods pose no problem, because the virtual machine can find the method entry directly through the direct reference. For virtual methods, however, the virtual machine needs certain strategies to locate the actual method. Let’s take a look.

Static dispatch

First let’s look at some code:

public class Father {
}
public class Son extends Father {
}
public class Daughter extends Father {
}

public class Hello {
    public void sayHello(Father father){
        System.out.println("hello , i am the father");
    }
    public void sayHello(Daughter daughter){
        System.out.println("hello i am the daughter");
    }
    public void sayHello(Son son){
        System.out.println("hello i am the son");
    }
}

public static void main(String[] args){
    Father son = new Son();
    Father daughter = new Daughter();
    Hello hello = new Hello();
    hello.sayHello(son);
    hello.sayHello(daughter);
}

The following output is displayed:

hello , i am the father

hello , i am the father

Did you get it right? This is a common interview question that tests your understanding of method overloading and method dispatch. Let’s analyze it:

First, I need to introduce two concepts: the “static type” and the “actual type.” The static type is the declared type of a variable; the actual type is the type of the object it refers to at run time. In the example above, Father is the static type, and Son or Daughter is the actual type.

When generating bytecode instructions, our compiler selects the appropriate method to call based on the static type of the variable. For our example above:

Both calls to sayHello in main compile to the same invocation. Because the static type of both arguments is Father, the same method is called in both cases:

(LStaticDispathch/Father;)V

That is

public void sayHello(Father father){}

Any dispatch that relies on the static type to choose which version of a method to execute is called “static dispatch,” and method overloading is its typical manifestation. Note that static dispatch selects the method based only on the static type, regardless of what the actual type is.
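
A quick way to confirm that only the static type matters is to change it with a cast. The sketch below is a condensed variant of the example above, with assumptions of its own: the class StaticDispatchDemo is invented, the three types are nested in it to keep the example self-contained, and the overloads are static methods that return a string instead of printing.

```java
class StaticDispatchDemo {
    static class Father {}
    static class Son extends Father {}
    static class Daughter extends Father {}

    // Three overloads; the compiler picks one from the static type of the argument.
    static String sayHello(Father father)     { return "father"; }
    static String sayHello(Son son)           { return "son"; }
    static String sayHello(Daughter daughter) { return "daughter"; }

    public static void main(String[] args) {
        Father son = new Son();
        System.out.println(sayHello(son));        // static type Father: prints "father"
        System.out.println(sayHello((Son) son));  // static type Son: prints "son"
    }
}
```

The object is the same in both calls; only the static type of the argument expression differs, and that alone changes which overload the compiler selects.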

Dynamic dispatch

public class Father {
    public void sayHello(){
        System.out.println("hello world ---- father");
    }
}
public class Son extends Father {
    @Override
    public void sayHello(){
        System.out.println("hello world ---- son");
    }
}

public static void main(String[] args){
    Father son = new Son();
    son.sayHello();
}

Output result:

hello world ---- son

The subclass’s sayHello method is called. Now look at the bytecode instruction generated for the call:

See that? The invocation instruction the compiler generates names the method of the static type, so why does the call end up in the method of the actual type?

When we call an instance method, the receiver instance is first pushed onto the operand stack, and the invokevirtual instruction then performs the following steps:

1. Pop the top element of the operand stack, determine its actual type, and call that type C.
2. Look in type C for a method with the same simple name and descriptor as the method being called; if one exists, return a direct reference to it.
3. Otherwise, repeat the search in C’s parent classes, from the bottom up, and return a direct reference to the first matching method.
4. If no suitable method is found, throw a java.lang.AbstractMethodError.

Therefore, it is self-evident that our example here calls the sayHello method of subclass Son.
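
The search order in the steps above can be imitated with reflection. This is only a rough sketch under stated assumptions: the class VirtualLookupSketch, the helper findDeclaringClass, and the extra Grandson class are invented for illustration, and the match is made on name and parameter types only, without the return type or the access checks a real virtual machine performs.

```java
class VirtualLookupSketch {
    static class Father {
        public String sayHello() { return "father"; }
    }
    static class Son extends Father {
        @Override
        public String sayHello() { return "son"; }
    }
    static class Grandson extends Son {}  // inherits sayHello from Son

    // Starts at the receiver's actual class and walks up the superclass chain,
    // returning the first class that declares a matching method.
    static Class<?> findDeclaringClass(Object receiver, String name, Class<?>... paramTypes) {
        for (Class<?> c = receiver.getClass(); c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(name, paramTypes);
                return c;
            } catch (NoSuchMethodException e) {
                // Not declared in this class; continue with the parent class.
            }
        }
        throw new AbstractMethodError(name);
    }

    public static void main(String[] args) {
        Father receiver = new Grandson();  // static type Father, actual type Grandson
        System.out.println(findDeclaringClass(receiver, "sayHello").getSimpleName());
        System.out.println(receiver.sayHello());
    }
}
```

For a Grandson receiver the search misses in Grandson and succeeds in Son, which is the version the real call executes as well.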

How the virtual machine finds the right method in a class so accurately and efficiently varies between implementations, but the most common approach is a “virtual method table.” The concept is simple: the VM maintains a method table for each type, recording description information for all of that type’s methods. When the VM looks up a method, it only needs to search this table; if the method is not in the current type’s table, it searches the parent class’s table.

Support for dynamic typing features

A key feature of dynamically typed languages is that type checking happens at run time. That is, at compile time the compiler does not care what type a variable is or whether the method you call on it exists. For example:

Object obj = new String("hello-world");
obj.split("-");

In Java, these two lines will not compile, because the compiler sees that the static type of the variable obj is Object, and the Object class has no split method.

In dynamically typed languages, this code is fine.

Static languages check variable types at compile time and provide strict checks, while dynamic languages check actual variable types at run time, giving programs greater flexibility. There are pros and cons. Static languages have the advantage of security and the disadvantage of lack of flexibility, while dynamic languages have the opposite.

JDK 1.7 provides two ways to support dynamic features in Java: the invokedynamic instruction and the java.lang.invoke package. The two are implemented in a similar way, and we will only cover the basics of the latter.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// This is my own helper method
public static MethodHandle getSubStringMethod(Object obj) throws NoSuchMethodException, IllegalAccessException {
    // Define a method template: return type String[], one String parameter
    MethodType methodType = MethodType.methodType(String[].class, String.class);
    // Find the method that matches the given simple name and template, then bind the receiver
    return MethodHandles.lookup().findVirtual(obj.getClass(), "split", methodType).bindTo(obj);
}

public static void main(String[] args) throws Throwable {
    Object obj = new String("hello-world");
    // invokeExact requires the call site to match the handle type exactly: (String)String[]
    String[] strs = (String[]) getSubStringMethod(obj).invokeExact("-");
    System.out.println(strs[0]);
}

Output result:

hello

You see, even though the static type of obj is Object, this way I can bypass the compiler’s type checking and invoke the method I specify at run time.

I won’t go into how this is implemented; it is fairly complex, and I may cover it in a separate article later. In short, with this approach we don’t care what the static type of a variable is: as long as the object has the method we want to call, we can call it at run time.
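
As a small illustration of that last point, the sketch below calls isEmpty() on two unrelated types through one helper. The class DuckCallSketch and the method isEmptyDuck are hypothetical names for this example. It uses invoke rather than invokeExact, because the receiver is passed as a plain Object and the handle has to adapt the argument type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;

class DuckCallSketch {
    // Calls isEmpty() on any object whose class declares a public
    // boolean isEmpty() method, whatever its static type is.
    static boolean isEmptyDuck(Object obj) {
        try {
            MethodHandle handle = MethodHandles.lookup()
                    .findVirtual(obj.getClass(), "isEmpty", MethodType.methodType(boolean.class));
            // invoke (unlike invokeExact) converts the Object argument to the receiver type.
            return (boolean) handle.invoke(obj);
        } catch (Throwable t) {
            throw new IllegalStateException("no usable isEmpty() on " + obj.getClass(), t);
        }
    }

    public static void main(String[] args) {
        Object text = "";
        Object list = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(isEmptyDuck(text));  // true
        System.out.println(isEmptyDuck(list));  // false
    }
}
```

String and ArrayList share no common type that declares isEmpty(), yet both calls succeed because the method is looked up on the actual class at run time.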

To summarize, the HotSpot virtual machine interprets and executes methods based on the operand stack: intermediate results, method parameters, and so on are fetched and stored by pushing onto and popping off the stack. The biggest advantage of this mechanism is portability. A register-based execution mechanism, by contrast, depends heavily on the underlying hardware and cannot easily be made cross-platform. The disadvantage of the stack-based approach is equally obvious: the same operation takes relatively more instructions.

All the code, images and files in this article are stored in the cloud on my GitHub:

(https://github.com/SingleYam/overview_java)

