Let’s start with a few questions:

  1. What is the principle of polymorphism?
  2. MethodHandle mechanism.
  3. How the virtual machine finds the correct method when it executes code, how it executes the bytecode inside that method, and which memory structures are involved.

Runtime stack frame structure

The Java virtual machine uses the method as its most basic unit of execution, and the stack frame (Stack Frame) is the data structure that supports method invocation and method execution inside the virtual machine. It is also the element type of the virtual machine stack (Virtual Machine Stack) in the runtime data area. A stack frame stores a method’s local variable table, operand stack, dynamic linking information, and method return address.

The amount of memory allocated to a stack frame is not influenced by the program’s runtime variable data, but only by the program’s source code and the specific virtual machine implementation of the stack memory layout.

1. Local variable table

The local variable table (Local Variables Table) is a storage space for a set of variable values. It holds the method parameters and the local variables defined inside the method. When a Java program is compiled into a Class file, the max_locals data item of the method’s Code attribute records the maximum size of the local variable table that the method needs.

The capacity of the local variable table is measured in variable slots (Variable Slot). The Java Virtual Machine Specification does not say how much memory a variable slot should occupy. It only states, as a guideline, that each slot should be able to hold one value of type boolean, byte, char, short, int, float, reference, or returnAddress. All eight of these data types can be stored in 32 bits of physical memory or less.

Figure 8-1 Conceptual structure of stack frames

The Java virtual machine accesses the local variable table by index. Indexes start at 0 and run up to (but not including) the number of variable slots in the table.

Unlike class variables, local variables are not given a default “zero” value by the virtual machine, so a local variable must be assigned before it is used.
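As a rough illustration of how slots add up, the sketch below estimates how many slots the arguments of a method occupy. It relies on two rules from the Java Virtual Machine Specification that the text above does not spell out: long and double each take two consecutive slots, and an instance method receives `this` in slot 0. The names `SlotCounter`, `argSlots`, and `mixed` are made up for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: estimates how many local variable slots the
// arguments of a method occupy (the "Args_size" value printed by javap).
class SlotCounter {

    static int argSlots(Method m) {
        // Instance methods receive "this" in slot 0.
        int slots = Modifier.isStatic(m.getModifiers()) ? 0 : 1;
        for (Class<?> p : m.getParameterTypes()) {
            // long and double take two slots, every other type takes one.
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }

    // Sample methods used only for the demonstration.
    int calc() { return 0; }
    static void mixed(int a, long b, Object c, double d) { }

    public static void main(String[] args) throws Exception {
        System.out.println(argSlots(SlotCounter.class.getDeclaredMethod("calc")));
        System.out.println(argSlots(SlotCounter.class.getDeclaredMethod(
                "mixed", int.class, long.class, Object.class, double.class)));
    }
}
```

Running `main` prints 1 for the instance method `calc()` (only `this`) and 6 for the static method `mixed` (1 + 2 + 1 + 2).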

2. Operand stack

As with the local variable table, the maximum depth of the operand stack is written into the max_stack data item of the Code attribute at compile time. Each element of the operand stack can be any Java data type, including long and double. A 32-bit data type occupies one unit of stack capacity and a 64-bit data type occupies two.

When a method starts executing, its operand stack is empty. While the method runs, bytecode instructions write values to the operand stack and read values from it, that is, they push and pop. For example, arithmetic is carried out by pushing the operands onto the stack and then executing the arithmetic instruction, and when another method is called, the arguments are passed through the operand stack. The bytecode instruction iadd for integer addition, for instance, requires that the two elements closest to the top of the operand stack already hold two int values. When the instruction executes, the two ints are popped and added, and the sum is pushed back onto the stack.

The data types of the elements on the operand stack must exactly match what the sequence of bytecode instructions expects. In the case of the iadd instruction, the two elements closest to the top of the stack must both be of type int.
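The push, pop, and type-matching behavior described above can be modeled with a toy stack. This is only a sketch: the class `OperandStackDemo` is invented for illustration, and a real virtual machine enforces the type rule when it verifies the class rather than with a run-time check like the one here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an operand stack; not how a real JVM stores its operands.
class OperandStackDemo {

    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object value) {
        stack.push(value);
    }

    // Models iadd: pops two operands, checks that both are ints,
    // and pushes their sum back onto the stack.
    void iadd() {
        Object right = stack.pop();
        Object left = stack.pop();
        if (!(left instanceof Integer) || !(right instanceof Integer)) {
            throw new IllegalStateException("iadd requires two int operands");
        }
        stack.push((Integer) left + (Integer) right);
    }

    Object peek() {
        return stack.peek();
    }

    int depth() {
        return stack.size();
    }

    public static void main(String[] args) {
        OperandStackDemo s = new OperandStackDemo();
        s.push(1);
        s.push(2);
        s.iadd();
        System.out.println(s.peek() + " (depth " + s.depth() + ")");
    }
}
```

After `iadd` the two operands are gone and only the sum 3 remains, so the stack depth is 1.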

In most virtual machine implementations, some optimization is done to make the two stack frames overlap. In this way, part of the operand stack of the lower stack frame is overlapped with part of the local variable table of the upper stack frame, which not only saves some space, but more importantly, part of the data can be directly shared during method calls without additional parameter replication and transfer. The overlapped process is shown in Figure 8-2.

Figure 8-2 Data sharing between two stack frames

The interpreted execution engine of the Java virtual machine is called “stack-based execution engine”, where the “stack” is the operand stack.

3. Dynamic linking

Each stack frame contains a reference to the method it belongs to in the runtime constant pool; this reference is held to support dynamic linking (Dynamic Linking) during method invocation. The constant pool of a Class file contains a large number of symbolic references, and the method invocation instructions in the bytecode take symbolic references to methods in the constant pool as operands. Some of these symbolic references are converted to direct references during class loading or the first time they are used, which is called static resolution. The rest are converted to direct references each time they are used at run time, which is called dynamic linking.

4. Method return address

Once a method starts executing, there are only two ways to exit it. The first is that the execution engine encounters one of the method return bytecode instructions, in which case a return value may be passed to the calling method. This way of exiting is called “Normal Method Invocation Completion.”

The other is that an exception occurs during execution and is not handled within the method body. This way of exiting is called “Abrupt Method Invocation Completion.”

After a method exits, it must return to where it was when the original method was called in order for the program to continue. When a method returns, it may need to store some information in the stack frame to help restore the execution state of its upper calling method. In general, when a method exits normally, the value of the PC counter of the calling method can be used as the return address, and it is likely that this counter value will be stored in the stack frame. When a method exits with an exception, the return address is determined by the exception handler table, and this information is generally not stored in the stack frame.

Exiting a method is in effect the same as popping the current stack frame, so the operations that may be performed on exit are: restoring the local variable table and operand stack of the calling method, pushing the return value (if any) onto the operand stack of the caller’s stack frame, and adjusting the PC counter to point to the instruction that follows the method invocation instruction.
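The two ways of leaving a method can be observed from ordinary Java code. The following is a minimal sketch with invented names (`MethodExitDemo`, `divide`, `call`); it shows the effect seen by the caller, not the frame manipulation inside the virtual machine.

```java
class MethodExitDemo {

    // Completes normally when b != 0: the result is handed back to the caller.
    // Completes abruptly when b == 0: the ArithmeticException is not handled
    // in this method, so no return value reaches the caller.
    static int divide(int a, int b) {
        return a / b;
    }

    static String call(int a, int b) {
        try {
            // After a normal return, execution continues right after the call.
            return "returned " + divide(a, b);
        } catch (ArithmeticException e) {
            // After an abrupt completion, execution continues in the handler
            // that the caller's exception table provides for this region.
            return "threw " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(call(6, 3));
        System.out.println(call(6, 0));
    }
}
```

The first call prints "returned 2" and the second prints "threw ArithmeticException".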

5. Additional information

The Java Virtual Machine Specification allows virtual machine implementations to add information to the stack frame that is not described in the specification, such as information related to debugging and performance collection. This information depends entirely on the virtual machine implementation and is not detailed here. When discussing concepts, dynamic linking, the method return address, and this additional information are commonly grouped together and called stack frame information.

Method calls

Method invocation is not equivalent to the code in the method being executed. The only task in the method invocation stage is to determine the version of the method being invoked (that is, which method to call). The specific running process inside the method is not involved for the time being.

1. Resolution

The target of every method call is a symbolic reference in the constant pool of the Class file. During the resolution phase of class loading, some of these symbolic references are converted into direct references. This is possible only if the method has a determinable invocation version before the program actually runs and that version cannot change at run time. The invocation of this kind of method is called Resolution.

In the Java language, the methods that are “knowable at compile time and immutable at run time” are mainly static methods and private methods. The former are bound directly to the type and the latter cannot be accessed from outside, so neither kind can be overridden through inheritance or any other means. They are therefore suitable for resolution during class loading.

Different kinds of methods are invoked with different instructions in the bytecode instruction set. The Java virtual machine supports the following five method invocation bytecode instructions:

  • invokestatic. Used to call static methods.
  • invokespecial. Used to call the instance constructor <init>() method, private methods, and methods in parent classes.
  • invokevirtual. Used to call all virtual methods.
  • invokeinterface. Used to call interface methods; the object that implements the interface is determined at run time.
  • invokedynamic. The method referenced by the call site qualifier is resolved dynamically at run time and then executed. The dispatch logic of the previous four instructions is hardwired into the Java virtual machine, whereas the dispatch logic of invokedynamic is determined by a bootstrap method specified by the user.

Any method that can be invoked by the invokestatic or invokespecial instruction has a unique invocation version that can be determined in the resolution stage. In the Java language four kinds of methods meet this condition: static methods, private methods, instance constructors, and parent class methods. Together with methods modified by final (even though they are called with the invokevirtual instruction), these five kinds of method calls have their symbolic references resolved to direct references when the class is loaded.
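The five kinds of methods listed above can be put side by side in one small class. This is a sketch with invented names (`ResolutionDemo`, `Parent`, `Child`). The instruction names in the comments follow the text; the instruction a particular javac version actually emits may differ (newer compilers may, for example, use invokevirtual for private methods), so checking with javap is the only reliable way to see them.

```java
class ResolutionDemo {

    static class Parent {
        String who() {
            return "parent";
        }
    }

    static class Child extends Parent {

        Child() {
            super();                  // constructor call: invokespecial
        }

        static String util() {        // static method: invokestatic
            return "static";
        }

        private String secret() {     // private method: cannot be overridden
            return "private";
        }

        final String fixed() {        // final method: invokevirtual, but only one version exists
            return "final";
        }

        @Override
        String who() {
            return "child after " + super.who();   // parent method call: invokespecial
        }

        String all() {
            return util() + ", " + secret() + ", " + fixed() + ", " + who();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Child().all());
    }
}
```

In every one of these calls the target is fixed before the program runs; only the plain virtual call to `who()` from `all()` is selected by the actual type of the receiver.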

A resolution call is always a static process: it is fully determined at compile time, and all the symbolic references involved are turned into explicit direct references during the resolution phase of class loading, without being deferred to run time. The other main form of method invocation, dispatch (Dispatch), is much more complex. It can be static or dynamic, and it can be single dispatch or multiple dispatch depending on the number of factors the dispatch is based on. Combining these two dimensions gives four cases: static single dispatch, static multiple dispatch, dynamic single dispatch, and dynamic multiple dispatch. Let’s take a look at how method dispatch works in the virtual machine.

2. Dispatch

1. Static dispatch

Listing 8-6 Method static dispatch demo

package org.fenixsoft.polymorphic;

/**
 * Method static dispatch demo
 * @author zzm
 */
public class StaticDispatch {

    static abstract class Human {
    }

    static class Man extends Human {
    }

    static class Woman extends Human {
    }

    public void sayHello(Human guy) {
        System.out.println("hello,guy!");
    }

    public void sayHello(Man guy) {
        System.out.println("hello,gentleman!");
    }

    public void sayHello(Woman guy) {
        System.out.println("hello,lady!");
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        StaticDispatch sr = new StaticDispatch();
        sr.sayHello(man);
        sr.sayHello(woman);
    }
}

Running results:

hello,guy!
hello,guy!

In the code above, “Human” is called the Static Type of the variable, or its Apparent Type. “Man” is called the Actual Type of the variable, or its Runtime Type.

The virtual machine (or, more accurately, the compiler) chooses among overloads by the static type of the argument, not by its actual type. Because static types are known at compile time, the Javac compiler decides which overloaded version to use based on the static type of the argument, so sayHello(Human) is selected as the call target, and a symbolic reference to this method is written into the operands of the two invokevirtual instructions in the main() method.

All dispatch actions that rely on static types to determine the version of a method’s execution are called static dispatch. The most typical application of static dispatch is method overloading. Static dispatch occurs at compile time, so the action to determine static dispatch is not actually performed by the virtual machine.

Note that while the Javac compiler can determine an overloaded version of a method, in many cases this version is not “unique”; often the compiler can only pick a “relatively more suitable” version. The main reason is that a literal has no explicit static type, so its static type can only be understood and inferred from the rules of the language and its syntax.
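A char literal is a convenient way to see a “relatively more suitable” version being chosen. The sketch below uses an invented class `OverloadDemo` with three overloads and no sayHello(char); the compiler has to pick the closest match according to the overload resolution rules of the Java language.

```java
class OverloadDemo {

    static String sayHello(int arg) {
        return "hello int";
    }

    static String sayHello(long arg) {
        return "hello long";
    }

    static String sayHello(Object arg) {
        return "hello Object";
    }

    public static void main(String[] args) {
        // No overload takes a char, so the char is widened to int,
        // which is preferred over long and over boxing to Character.
        System.out.println(sayHello('a'));

        // The static type of the argument is now Object, so the Object
        // overload is chosen even though the value is still a Character.
        Object boxed = 'a';
        System.out.println(sayHello(boxed));
    }
}
```

The first call prints "hello int" and the second prints "hello Object": in both cases the choice is made from the static type of the argument at compile time.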

2. Dynamic dispatch

Dynamic dispatch is closely related to overriding (Override), another important manifestation of polymorphism in the Java language.

Listing 8-8 Method dynamic dispatch demo

package org.fenixsoft.polymorphic;

/**
 * Method dynamic dispatch demo
 * @author zzm
 */
public class DynamicDispatch {

    static abstract class Human {
        protected abstract void sayHello();
    }

    static class Man extends Human {
        @Override
        protected void sayHello() {
            System.out.println("man say hello");
        }
    }

    static class Woman extends Human {
        @Override
        protected void sayHello() {
            System.out.println("woman say hello");
        }
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        man.sayHello();
        woman.sayHello();
        man = new Woman();
        man.sayHello();
    }
}

Running results:

man say hello
woman say hello
woman say hello

How does the Java virtual machine decide which method to call?

The answer starts with the invokevirtual instruction. Its runtime resolution process is roughly divided into the following steps:

  1. Find the actual type of the object referenced by the first element at the top of the operand stack. Call it C.
  2. If a method matching both the descriptor and the simple name in the constant is found in type C, check access permissions. If the check passes, a direct reference to the method is returned and the search ends; if it fails, a java.lang.IllegalAccessError is thrown.
  3. Otherwise, repeat the search and check of step 2 on each parent class of C, from bottom to top along the inheritance hierarchy.
  4. If no suitable method is found, a java.lang.AbstractMethodError is thrown.

Because the first step of executing invokevirtual is to determine the actual type of the receiver at run time, the invokevirtual instructions in the two calls do not stop at resolving the symbolic reference in the constant pool to a direct reference; they also select the method version based on the actual type of the receiver. This process is the essence of method overriding in the Java language. Dispatch that determines the version of the method to execute at run time based on the actual type is called dynamic dispatch.
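The lookup steps above can be imitated with reflection. This is only a simulation under simplifying assumptions: the access check of step 2 is skipped, interfaces are ignored, and the names `VirtualLookupDemo` and `select` are invented. A real virtual machine does not use reflection for this.

```java
import java.lang.reflect.Method;

class VirtualLookupDemo {

    static abstract class Human {
        abstract String sayHello();

        String breathe() {
            return "human breathes";
        }
    }

    static class Man extends Human {
        @Override
        String sayHello() {
            return "man say hello";
        }
    }

    // Step 1: start from the actual type C of the receiver.
    // Steps 2 and 3: look for a matching method in C, then in each parent class.
    // Step 4: give up with AbstractMethodError if nothing matches.
    static Method select(Object receiver, String name, Class<?>... paramTypes) {
        for (Class<?> c = receiver.getClass(); c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredMethod(name, paramTypes);
            } catch (NoSuchMethodException notDeclaredHere) {
                // Not declared in this class: continue with the parent class.
            }
        }
        throw new AbstractMethodError(name);
    }

    public static void main(String[] args) {
        Human man = new Man();
        System.out.println(select(man, "sayHello").getDeclaringClass().getSimpleName());
        System.out.println(select(man, "breathe").getDeclaringClass().getSimpleName());
    }
}
```

For a `Man` receiver the search finds `sayHello` in `Man` itself and `breathe` one level up in `Human`, so the program prints Man and then Human.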

Are fields polymorphic in Java?

Listing 8-10 Fields are not polymorphic

package org.fenixsoft.polymorphic;

/**
 * Fields do not participate in polymorphism
 * @author zzm
 */
public class FieldHasNoPolymorphic {

    static class Father {
        public int money = 1;

        public Father() {
            money = 2;
            showMeTheMoney();
        }

        public void showMeTheMoney() {
            System.out.println("I am Father, i have $" + money);
        }
    }

    static class Son extends Father {
        public int money = 3;

        public Son() {
            money = 4;
            showMeTheMoney();
        }

        public void showMeTheMoney() {
            System.out.println("I am Son, i have $" + money);
        }
    }

    public static void main(String[] args) {
        Father gay = new Son();
        System.out.println("This gay has $" + gay.money);
    }
}

After running, the output result is:

I am Son, i have $0
I am Son, i have $4
This gay has $2

Both of the first two lines say “I am Son”. When a Son is created, the Father constructor is invoked implicitly first, and the call to showMeTheMoney() inside the Father constructor is a virtual method call, so the version actually executed is Son::showMeTheMoney(). That method reads the money field of the child class, which is still 0 at this point because it is not initialized until the Son constructor runs. The last statement of main() accesses the money field of the parent class through the static type, so it prints 2.
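The contrast between fields and methods can be isolated in a smaller example. The sketch below uses an invented class `FieldHidingDemo` and leaves the constructors out: the same object is read once through a Father-typed variable and once through a Son-typed variable.

```java
class FieldHidingDemo {

    static class Father {
        int money = 2;

        int getMoney() {
            return money;
        }
    }

    static class Son extends Father {
        int money = 4;   // hides Father.money, does not override it

        @Override
        int getMoney() {
            return money;
        }
    }

    public static void main(String[] args) {
        Son son = new Son();
        Father asFather = son;

        // Field access is bound to the static type of the variable.
        System.out.println(asFather.money);
        System.out.println(son.money);

        // Method calls are dispatched on the actual type of the object.
        System.out.println(asFather.getMoney());
    }
}
```

This prints 2, 4, and 4: the object carries both fields, the variable’s static type decides which field is read, and the virtual method call always reaches the Son version.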

3. Single dispatch and multiple dispatch

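The receiver of a method and the method’s parameters are collectively called the arguments of the method, that is, the factors a dispatch can be based on. Single dispatch selects the target method based on one of them; multiple dispatch selects it based on more than one.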

Listing 8-11 Single dispatch and multi-dispatch

/**
 * Single dispatch, multiple dispatch demo
 * @author zzm
 */
public class Dispatch {

    static class QQ {}
    static class _360 {}

    public static class Father {
        public void hardChoice(QQ arg) {
            System.out.println("father choose qq");
        }

        public void hardChoice(_360 arg) {
            System.out.println("father choose 360");
        }
    }

    public static class Son extends Father {
        public void hardChoice(QQ arg) {
            System.out.println("son choose qq");
        }

        public void hardChoice(_360 arg) {
            System.out.println("son choose 360");
        }
    }

    public static void main(String[] args) {
        Father father = new Father();
        Father son = new Son();
        father.hardChoice(new _360());
        son.hardChoice(new QQ());
    }
}

Running results:

father choose 360
son choose qq

Our first concern is the compiler’s selection at compile time, which is the static dispatch process. The target method is chosen based on two factors: whether the static type is Father or Son, and whether the method parameter is QQ or _360. The result of this selection is two invokevirtual instructions whose operands are symbolic references in the constant pool to the Father::hardChoice(_360) and Father::hardChoice(QQ) methods. Because the selection is based on two factors, static dispatch in the Java language is of the multiple dispatch type.

Now consider the virtual machine’s selection at run time, which is the dynamic dispatch process. When the line “son.hardChoice(new QQ())” is executed, or more precisely when the corresponding invokevirtual instruction is executed, the compiler has already determined that the signature of the target method must be hardChoice(QQ). At this point the virtual machine does not care whether the argument “QQ” is “Tencent QQ” or “Chery QQ”, because neither the static type nor the actual type of the argument affects the choice of method. The only factor that can affect the virtual machine’s choice is whether the actual type of the receiver is Father or Son. Because the selection is based on only one factor, dynamic dispatch in the Java language is of the single dispatch type.

4. Virtual machine implementation of dynamic dispatch

Dynamic dispatch is performed very frequently, and selecting a method version requires searching the metadata of the receiver type for a suitable target method at run time. For performance reasons, real Java virtual machine implementations therefore generally do not search the type metadata this often. A basic and common optimization is to create a virtual method table (Virtual Method Table, also called vtable) for the type in the method area; correspondingly, an interface method table (Interface Method Table, also called itable) is used when executing invokeinterface. Indexes into these tables are used instead of metadata lookups to improve performance. Figure 8-3 shows the virtual method table structure for the code in Listing 8-11.

Figure 8-3 Method table structure

The virtual method table stores the actual entry address of each method. If a method is not overridden in a subclass, the address entry in the virtual method table of the subclass is the same as the address entry of the same method in the parent class, pointing to the implementation entry of the parent class. If a subclass overrides this method, the address in the subclass’s virtual method table is replaced with the entry address pointing to the subclass’s implementation version.

The virtual method table is generally initialized during the linking phase of class loading. After the initial values of the class variables have been prepared, the virtual machine also initializes the virtual method table of the class.
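The copy-then-replace behavior of the table can be modeled in a few lines. The sketch below is a conceptual model only: the class `VTableDemo`, its string “entries”, and the helper methods are invented for illustration and say nothing about how a particular virtual machine lays out its tables.

```java
import java.util.ArrayList;
import java.util.List;

class VTableDemo {

    // Slot i of "signatures" and slot i of "entries" describe the same method.
    static final class VTable {
        final List<String> signatures = new ArrayList<>();
        final List<String> entries = new ArrayList<>();

        // A subclass table starts as a copy of the parent's table.
        static VTable inherit(VTable parent) {
            VTable t = new VTable();
            t.signatures.addAll(parent.signatures);
            t.entries.addAll(parent.entries);
            return t;
        }

        // Declaring a method either replaces the inherited entry (an override)
        // or appends a new slot at the end of the table.
        void declare(String signature, String owner) {
            int i = signatures.indexOf(signature);
            if (i >= 0) {
                entries.set(i, owner + "::" + signature);
            } else {
                signatures.add(signature);
                entries.add(owner + "::" + signature);
            }
        }

        int indexOf(String signature) {
            return signatures.indexOf(signature);
        }
    }

    static VTable objectTable() {
        VTable t = new VTable();
        t.declare("toString()", "Object");
        return t;
    }

    static VTable fatherTable() {
        VTable t = VTable.inherit(objectTable());
        t.declare("hardChoice(QQ)", "Father");
        t.declare("hardChoice(_360)", "Father");
        return t;
    }

    static VTable sonTable() {
        VTable t = VTable.inherit(fatherTable());
        t.declare("hardChoice(QQ)", "Son");
        t.declare("hardChoice(_360)", "Son");
        return t;
    }

    public static void main(String[] args) {
        System.out.println(fatherTable().entries);
        System.out.println(sonTable().entries);
    }
}
```

Because a method keeps the same index in the parent and child tables, switching the receiver type only means looking at a different table with the same index; the entry for `toString()` still points at the inherited version while the two `hardChoice` entries point at the Son versions.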

Dynamically typed language support

The invokedynamic instruction is a new addition to the bytecode instruction set. It is one of the improvements made for a JDK 7 project goal: implementing support for dynamically typed languages (Dynamically Typed Language).

1. Dynamically typed languages

What is a dynamically typed language? Its key feature is that the main process of type checking takes place at run time rather than at compile time. Conversely, languages that perform type checking at compile time, such as C++ and Java, are the most commonly used statically typed languages.

The fact that variables have no type but values have type is also a core feature of dynamically typed languages.

Statically typed languages can determine variable types at compile time. Their most significant benefit is that the compiler can provide comprehensive and rigorous type checking, so that potential problems related to data types are found while the code is being written, which benefits stability and makes it easier for projects to grow large. Dynamically typed languages, on the other hand, determine types at run time, which gives developers great flexibility. Some functions that take a lot of bloated code to implement in a statically typed language can be written clearly and concisely in a dynamically typed language, and clarity and concision usually mean higher development efficiency.

2. Java and dynamic typing

Support for dynamically typed languages at the Java virtual machine level had always been lacking, mainly in the area of method invocation. In the bytecode instruction set before JDK 7, the first operand of each of the four method invocation instructions (invokevirtual, invokespecial, invokestatic, invokeinterface) is a symbolic reference to the invoked method (a CONSTANT_Methodref_info or CONSTANT_InterfaceMethodref_info constant). As mentioned earlier, symbolic references to methods are generated at compile time, whereas a dynamically typed language can only determine the receiver of a method at run time.

Providing direct support for dynamic typing at the Java virtual machine level therefore became something the Java platform had to address. This is the technical background of the invokedynamic instruction and the java.lang.invoke package introduced by JSR 292 in JDK 7.

3. The java.lang.invoke package

The main purpose of this package is to provide a new mechanism for dynamically determining the target Method of a call, called a “Method Handle,” rather than relying solely on symbolic references to determine the target Method.

With method handles, the Java language gains a tool similar to function pointers or delegates. Listing 8-12 illustrates the basic usage of method handles. The println() method is called correctly whatever the type of obj turns out to be (the ad hoc ClassA, or System.out, which is a java.io.PrintStream).

Listing 8-12 Method handle demo

import static java.lang.invoke.MethodHandles.lookup;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;

/**
 * JSR 292 MethodHandle basic usage demo
 * @author zzm
 */
public class MethodHandleTest {

    static class ClassA {
        public void println(String s) {
            System.out.println(s);
        }
    }

    public static void main(String[] args) throws Throwable {
        Object obj = System.currentTimeMillis() % 2 == 0 ? System.out : new ClassA();
        // The following statement calls println correctly no matter which implementation class obj ends up being.
        getPrintlnMH(obj).invokeExact("icyfenix");
    }

    private static MethodHandle getPrintlnMH(Object receiver) throws Throwable {
        // MethodType represents a method type: the return type (the first argument of methodType())
        // followed by the parameter types (the second and subsequent arguments of methodType()).
        MethodType mt = MethodType.methodType(void.class, String.class);
        // lookup() comes from MethodHandles.lookup; findVirtual() searches the given class for a method
        // handle that matches the given method name and method type and that the caller may access.
        // Because this is a virtual method, the Java language rules give it an implicit first argument:
        // the receiver of the method, that is, the object "this" refers to. It used to be passed in the
        // argument list; now the bindTo() method does this.
        return lookup().findVirtual(receiver.getClass(), "println", mt).bindTo(receiver);
    }
}

The getPrintlnMH() method emulates the execution of the invokevirtual instruction, and its return value (a MethodHandle object) can be regarded as a “reference” to the method that is finally called.

From the above examples, using MethodHandle isn’t too difficult, but after looking at its use, the reader might wonder if reflection could have done the same thing already. Indeed, just from the perspective of the Java language, MethodHandle has a lot in common with Reflection in its use and effects. However, they also have these differences:

  • Both Reflection and MethodHandle essentially simulate method calls, but Reflection simulates them at the Java code level, while MethodHandle simulates them at the bytecode level. The three methods on MethodHandles.Lookup, findStatic(), findVirtual(), and findSpecial(), correspond to the access checking behavior of the bytecode instructions invokestatic, invokevirtual (and invokeinterface), and invokespecial. These low-level details are of no concern when using the Reflection API.
  • The java.lang.reflect.Method object in Reflection carries far more information than the java.lang.invoke.MethodHandle object in the MethodHandle mechanism. The former is a comprehensive image of the method on the Java side, including the method signature, the descriptor, Java-side representations of the attributes in the method attribute table, and runtime information such as access permissions. The latter contains only the information needed to execute the method. In developers’ everyday terms, Reflection is heavyweight and MethodHandle is lightweight.
  • Since MethodHandle simulates the method invocation bytecode instructions, the optimizations the virtual machine applies to those instructions (such as method inlining) can in theory also be supported for MethodHandle (although the implementation is still being improved). Calling methods through reflection makes it almost impossible to apply such call site optimizations directly.
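The differences above are easier to see when the same call is made both ways. The following sketch calls String.length() through java.lang.reflect.Method and through java.lang.invoke.MethodHandle; the class name `ReflectionVsMethodHandle` and its two helper methods are invented for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

class ReflectionVsMethodHandle {

    // Reflection: look the method up by name, then invoke it with boxed
    // arguments and a boxed result.
    static int viaReflection(String s) throws ReflectiveOperationException {
        Method m = String.class.getMethod("length");
        return (Integer) m.invoke(s);
    }

    // MethodHandle: findVirtual mirrors invokevirtual; the handle's type is
    // (String)int, with the receiver as the leading argument.
    static int viaMethodHandle(String s) throws Throwable {
        MethodHandle mh = MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class));
        // invokeExact requires the call site types to match the handle type exactly.
        return (int) mh.invokeExact(s);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaReflection("icyfenix"));
        System.out.println(viaMethodHandle("icyfenix"));
    }
}
```

Both calls return 8. The Method object could additionally be asked for modifiers, annotations, or the declaring class, whereas the MethodHandle exposes little more than its type, which matches the heavyweight versus lightweight distinction drawn above.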

4. The invokedynamic instruction

In a sense, the invokedynamic instruction and the MethodHandle mechanism play the same role. Both address the problem that the dispatch rules of the original four “invoke*” instructions are completely fixed inside the virtual machine, and both move the decision of how to find the target method from the virtual machine into user code, giving users (in the broad sense, including designers of other programming languages) greater freedom. The two follow the same idea and serve the same goal; one is implemented with upper-layer code and APIs, the other with bytecode and additional attributes and constants in the Class file. Therefore the invokedynamic instruction is not hard to understand once the previous MethodHandle example is understood.
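Plain Java source cannot emit an invokedynamic instruction for an arbitrary call, but the pieces it relies on can be exercised by hand. The sketch below assumes a hypothetical class `BootstrapSketch`: its `bootstrap` method has the usual shape of a bootstrap method (lookup context, name, and method type in, CallSite out), and `callThroughCallSite` does manually what the virtual machine would do the first time it reaches such a call site.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapSketch {

    // The method that the call site should end up calling.
    static String greet(String name) {
        return "hello, " + name;
    }

    // User code, not the virtual machine, decides what the call site is linked to.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(BootstrapSketch.class, name, type);
        return new ConstantCallSite(target);
    }

    // Stands in for an invokedynamic call site: run the bootstrap method once,
    // then invoke whatever target it linked.
    static String callThroughCallSite(String arg) throws Throwable {
        CallSite site = bootstrap(MethodHandles.lookup(), "greet",
                MethodType.methodType(String.class, String.class));
        return (String) site.dynamicInvoker().invokeExact(arg);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(callThroughCallSite("icyfenix"));
    }
}
```

Running `main` prints "hello, icyfenix". The dispatch decision lives entirely in `bootstrap`, which is the point the paragraph above makes about moving that decision into user code.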

Stack-based bytecode interpretation engine

1. Interpreted execution

Most of the program code has to go through the steps shown in Figure 8-4 before it can be converted into object code for the physical machine or an instruction set for execution by the virtual machine.

Figure 8-4 Compilation process

2. Stack-based instruction set and register-based instruction set

What’s the difference between a stack-based instruction set and a register-based instruction set? To take the simplest example, using the two instruction sets separately to compute the result of “1+1”, a stack-based instruction set would look like this:

iconst_1
iconst_1
iadd
istore_0

After the two iconst_1 instructions push the constant 1 onto the stack twice, the iadd instruction pops the top two values, adds them, and pushes the result back onto the top of the stack. Finally, istore_0 stores the value at the top of the stack into variable slot 0 of the local variable table. The instructions in this kind of instruction stream usually take no operands; they use the data on the operand stack as their input, and their results are also stored on the operand stack. With a register-based instruction set, the program might look like this:

mov  eax, 1
add  eax, 1

The mov instruction sets the value of the EAX register to 1, and the add instruction then adds 1 to it, leaving the result in the EAX register. Such two-address instructions are the mainstay of the x86 instruction set; each instruction takes two separate operands and relies on registers to access and store data.

The main advantage of stack-based instruction sets is portability: registers are provided directly by the hardware, and programs that rely directly on these hardware registers are inevitably constrained by the hardware.

The main disadvantage of stack-based instruction sets is that, in theory, execution is somewhat slower; all mainstream physical machine instruction sets use a register architecture. However, this applies only to interpreted execution: once the just-in-time compiler has produced a stream of physical machine assembly instructions, the instruction set architecture used by the virtual machine no longer matters.

3. Stack-based interpreter execution process

To understand this, look at the code in Listing 8-17.

Listing 8-17 A simple piece of arithmetic code

public int calc() {
    int a = 100;
    int b = 200;
    int c = 300;
    return (a + b) * c;
}

Use the javap command to look at its bytecode instructions, as shown in Listing 8-18.

Listing 8-18 Bytecode representation of a simple piece of arithmetic code

public int calc();
    Code:
        Stack=2, Locals=4, Args_size=1
         0:   bipush  100
         2:   istore_1
         3:   sipush  200
         6:   istore_2
         7:   sipush  300
        10:  istore_3
        11:  iload_1
        12:  iload_2
        13:  iadd
        14:  iload_3
        15:  imul
        16:  ireturn
}

javap shows that this code needs an operand stack of depth 2 and a local variable space of four variable slots. Based on this information, the seven figures from Figure 8-5 to Figure 8-11 describe how the code, the operand stack, and the local variable table change while Listing 8-17 executes.

Figure 8-5 Executing the instruction whose offset address is 0

First, execute the instruction at offset 0. The bipush instruction pushes a single-byte integer constant (-128 to 127) onto the top of the operand stack; it is followed by an operand that specifies the constant to push, in this case 100.

Figure 8-6 Executing the instruction whose offset address is 2

Execute the instruction at offset 2. The istore_1 instruction pops the integer value from the top of the operand stack and stores it in local variable slot 1. The next four instructions (those before offset 11) do the same kind of work: together with the first two they carry out the assignments of 100, 200, and 300 to the variables a, b, and c in the source code. The diagrams for these four instructions are omitted.

Figure 8-7 Executing the instruction whose offset address is 11

Execute the instruction at offset 11. The iload_1 instruction copies the integer value in local variable slot 1 onto the top of the operand stack.

Figure 8-8 Executing the instruction whose offset address is 12

Execute the instruction at offset 12. The iload_2 instruction works the same way as iload_1, pushing the integer value in slot 2 onto the stack. The main purpose of this figure is to show the state of the stack just before the next instruction, iadd, is executed.

Figure 8-9 Executing the instruction whose offset address is 13

Execute the instruction at offset 13. The iadd instruction pops the top two elements of the operand stack, adds them as integers, and pushes the result back onto the stack. After iadd completes, the original 100 and 200 are gone from the stack and their sum, 300, has been pushed in their place.

Figure 8-10 Executing the instruction whose offset address is 14

Execute the instruction at offset 14. The iload_3 instruction pushes the 300 stored in local variable slot 3 onto the operand stack, which now holds two integers, both 300. The next instruction, imul, pops the top two elements of the operand stack, multiplies them as integers, and pushes the result back onto the stack, exactly as iadd does for addition.

Figure 8-11 Executing the instruction whose offset address is 16

Execute the instruction at offset 16. The ireturn instruction is one of the method return instructions: it ends the execution of the method and returns the integer value at the top of the operand stack to the caller. With that, the execution of this method is complete.
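The whole walkthrough can be replayed with a toy interpreter. The sketch below is a simplified model with invented names (`TinyInterpreter`, `run`, `CALC`): instructions are plain strings, istore_1 is written as "istore 1", and only the handful of instructions used by calc() are supported. It is not how a real virtual machine decodes bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyInterpreter {

    // The instruction sequence of calc() from Listing 8-18, with slot
    // numbers written as separate operands for easier parsing.
    static final String[] CALC = {
        "bipush 100", "istore 1",
        "sipush 200", "istore 2",
        "sipush 300", "istore 3",
        "iload 1", "iload 2", "iadd",
        "iload 3", "imul",
        "ireturn"
    };

    static int run(String[] code) {
        int[] locals = new int[4];                  // Locals=4; slot 0 would hold "this"
        Deque<Integer> stack = new ArrayDeque<>();  // never grows beyond depth 2 here
        for (String line : code) {
            String[] parts = line.split(" ");
            switch (parts[0]) {
                case "bipush":
                case "sipush":
                    stack.push(Integer.parseInt(parts[1]));           // constant onto the stack
                    break;
                case "istore":
                    locals[Integer.parseInt(parts[1])] = stack.pop(); // stack top into a slot
                    break;
                case "iload":
                    stack.push(locals[Integer.parseInt(parts[1])]);   // slot onto the stack
                    break;
                case "iadd":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "imul":
                    stack.push(stack.pop() * stack.pop());
                    break;
                case "ireturn":
                    return stack.pop();                               // hand the result to the caller
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + line);
            }
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        System.out.println(run(CALC));
    }
}
```

Running `main` prints 90000, which is (100 + 200) * 300, the value that calc() returns.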