Project code: github.com/zexho994/gv…

The overall VM process

Java code
  --(javac)--> class bytecode
  --(class loader)--> class object inside the virtual machine
  --(parse)--> klass and oop objects
  --(interpret instructions)--> interpreter

Different types of VMs

Virtual machines use one of two execution architectures, depending on how the interpreter is implemented:

  1. Stack-based
  2. Register-based

JVM-family virtual machines are almost all stack-based. A stack-based design is relatively simple to implement and highly portable. Register-based virtual machines work more like a real CPU and tend to execute faster. V8 is one example. Lua's virtual machine was stack-based before version 5.0 and became register-based in 5.0.

Lars Bak led the development of V8, was one of the developers of HotSpot, and co-created the Dart language.

Stack-based virtual machine

Take the following piece of Java code:

public void foo(){
    int a = 1;
    int b = 2;
    int c = (a + b) * 5;
}

Different types of virtual machines compile this code into different instructions. For a stack-based virtual machine, the compiled instructions are:

public void foo();
  Code:
     0: iconst_1   // push the constant 1 onto the operand stack
     1: istore_1   // pop it into local variable slot 1 (a)
     2: iconst_2
     3: istore_2   // pop into slot 2 (b)
     4: iload_1    // push the value in local variable slot 1 onto the operand stack
     5: iload_2
     6: iadd       // add the two values on top of the operand stack
     7: iconst_5
     8: imul       // multiply the two values on top of the operand stack
     9: istore_3   // pop the result into slot 3 (c)
    10: return
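The following minimal Go sketch executes the listing above and makes the operand stack concrete. It is not gvm's actual interpreter. It uses the real JVM opcode values for these eleven instructions, so it also shows that each one fits in a single byte. The `run` function and its fixed slot numbering are illustrative assumptions.

```go
package main

import "fmt"

// Real JVM opcode values for the instructions used in foo().
const (
	opIconst1 = 0x04
	opIconst2 = 0x05
	opIconst5 = 0x08
	opIload1  = 0x1b
	opIload2  = 0x1c
	opIstore1 = 0x3c
	opIstore2 = 0x3d
	opIstore3 = 0x3e
	opIadd    = 0x60
	opImul    = 0x68
	opReturn  = 0xb1
)

// fooCode is foo()'s bytecode: eleven instructions, one byte each.
var fooCode = []byte{
	opIconst1, opIstore1, opIconst2, opIstore2,
	opIload1, opIload2, opIadd, opIconst5, opImul, opIstore3,
	opReturn,
}

// run interprets code with maxLocals local slots and an operand stack
// of capacity maxStack, and returns the final local variable table.
func run(code []byte, maxLocals, maxStack int) []int32 {
	locals := make([]int32, maxLocals)
	stack := make([]int32, 0, maxStack)
	push := func(v int32) { stack = append(stack, v) }
	pop := func() int32 {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for pc := 0; pc < len(code); pc++ {
		switch code[pc] {
		case opIconst1:
			push(1)
		case opIconst2:
			push(2)
		case opIconst5:
			push(5)
		case opIload1:
			push(locals[1])
		case opIload2:
			push(locals[2])
		case opIstore1:
			locals[1] = pop()
		case opIstore2:
			locals[2] = pop()
		case opIstore3:
			locals[3] = pop()
		case opIadd:
			b, a := pop(), pop() // top of stack is the right operand
			push(a + b)
		case opImul:
			b, a := pop(), pop()
			push(a * b)
		case opReturn:
			return locals
		default:
			panic(fmt.Sprintf("unsupported opcode 0x%02x", code[pc]))
		}
	}
	return locals
}

func main() {
	// Slot 0 would hold `this`; the stack never holds more than 2 values.
	locals := run(fooCode, 4, 2)
	fmt.Println("c =", locals[3]) // c = 15
}
```

Each arithmetic instruction takes its operands implicitly from the top of the stack. That is why the opcodes need no operand fields, and also why the loads and stores around them are required.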

Register-based virtual machine

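Its instructions resemble assembly language, because a CPU is essentially a register-based interpreter too. The biggest advantage of a register-based virtual machine is performance. Instructions address their operands directly by register number (random access), whereas a stack-based virtual machine must spend extra instructions moving values onto and off the operand stack.

In a register-based virtual machine, the addition is a single instruction: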

add ax, bx   // ax holds 1 and bx holds 2; the result is written back to ax
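For contrast, here is a hypothetical three-address register machine. Its instruction set is invented for this sketch and loosely follows the style of Lua 5.x. It computes the same `(1 + 2) * 5` as the stack bytecode:

```go
package main

import "fmt"

// Hypothetical instruction set: each instruction names its registers.
type opcode uint8

const (
	LOADK opcode = iota // R[a] = constant b
	ADD                 // R[a] = R[b] + R[c]
	MUL                 // R[a] = R[b] * R[c]
	RET                 // return R[a]
)

type instr struct {
	op      opcode
	a, b, c int32
}

// exec runs prog over a register file of n registers.
func exec(prog []instr, n int) int32 {
	r := make([]int32, n)
	for _, in := range prog {
		switch in.op {
		case LOADK:
			r[in.a] = in.b
		case ADD:
			r[in.a] = r[in.b] + r[in.c]
		case MUL:
			r[in.a] = r[in.b] * r[in.c]
		case RET:
			return r[in.a]
		}
	}
	return 0
}

func main() {
	// a lives in r1, b in r2, c in r3.
	prog := []instr{
		{LOADK, 1, 1, 0}, // a = 1
		{LOADK, 2, 2, 0}, // b = 2
		{ADD, 3, 1, 2},   // c = a + b, operands read directly from registers
		{LOADK, 4, 5, 0}, // r4 = 5
		{MUL, 3, 3, 4},   // c = c * 5
		{RET, 3, 0, 0},
	}
	fmt.Println(exec(prog, 5), len(prog)) // 15 6
}
```

Six instructions replace eleven, and `ADD` reads its operands directly by register number. The price is that every instruction carries register operands, so each one is wider than a one-byte stack opcode.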

Choosing a VM type

Why did the JVM choose a stack-based design? Historically, there were several reasons:

  • Stack-based instructions are highly platform independent, because they make no assumptions about how many registers the host CPU has.
  • Stack-based instructions are shorter. Many take only 1 byte because their operands are implicitly on the stack, whereas register-based instructions must also encode register addresses and typically need 2 bytes. Memory was a much scarcer resource at the time.
  • James Gosling was familiar with this approach (he had previously implemented a PostScript virtual machine).

Impact on Java

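Compiler reordering, one form of instruction reordering, is an IR optimization technique known as expression hoisting and expression sinking.

One motivation for it is the nature of the stack. In the example below, sinking a=1 so that it sits right before its only use in c=a+1 lets the value potentially stay on the operand stack, instead of being stored and immediately loaded again.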

a=1;         b=2;
b=2;         a=1;
c=a+1;  ==>  c=a+1;

Method execution

The stack frame

A stack frame is the unit in which a method executes. Each method invocation gets its own stack frame.

type Frame struct {
	framePC   uint   // program counter within the current method
	nextFrame *Frame // next frame on the thread's stack
	*LocalVars
	*OperandStack
	*klass.MethodKlass
	*Thread
}
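The local variable table and operand stack embedded in a frame can be as simple as fixed-size slices. The sketch below is an assumption about their shape, not gvm's code. `LocalVars`, `OperandStack`, and their methods are hypothetical, and only `int` values are modelled.

```go
package main

import "fmt"

// Slot holds one 32-bit value; in the JVM, long and double take two slots.
type Slot int32

// LocalVars is a fixed-size table indexed by slot number.
type LocalVars []Slot

func NewLocalVars(maxLocals uint16) LocalVars { return make(LocalVars, maxLocals) }

func (l LocalVars) SetInt(i uint, v int32) { l[i] = Slot(v) }
func (l LocalVars) GetInt(i uint) int32    { return int32(l[i]) }

// OperandStack has a fixed capacity of maxStack slots.
type OperandStack struct {
	size  uint
	slots []Slot
}

func NewOperandStack(maxStack uint16) *OperandStack {
	return &OperandStack{slots: make([]Slot, maxStack)}
}

func (s *OperandStack) PushInt(v int32) {
	s.slots[s.size] = Slot(v) // index out of range if maxStack is exceeded
	s.size++
}

func (s *OperandStack) PopInt() int32 {
	s.size--
	return int32(s.slots[s.size])
}

func main() {
	locals := NewLocalVars(4)
	stack := NewOperandStack(2)
	locals.SetInt(1, 1)                             // istore_1
	locals.SetInt(2, 2)                             // istore_2
	stack.PushInt(locals.GetInt(1))                 // iload_1
	stack.PushInt(locals.GetInt(2))                 // iload_2
	stack.PushInt(stack.PopInt() + stack.PopInt()) // iadd
	fmt.Println(stack.PopInt())                     // 3
}
```

Because the compiler already knows both sizes, the frame can allocate them once, up front, and never resize them.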

The virtual machine stack

The virtual machine stack is private to each thread and holds that thread's stack frames. When a method is about to execute, its frame is pushed onto the stack. When the method completes, the frame is popped.

type Thread struct {
	...
	*Stack // stack of frames
	...
}

type Stack struct {
	...
	maxSize uint   // maximum stack depth
	size    uint   // current stack depth
	top     *Frame // frame on top of the stack
	...
}
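Here is a minimal sketch of how such a stack might push and pop frames. It assumes frames are chained through `nextFrame` and that exceeding `maxSize` should surface as a `StackOverflowError`. The trimmed-down `Frame` and the `ErrStackOverflow` value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Frame is trimmed to what this sketch needs (hypothetical).
type Frame struct {
	method    string
	nextFrame *Frame // the frame below this one (its caller)
}

type Stack struct {
	maxSize uint
	size    uint
	top     *Frame
}

var ErrStackOverflow = errors.New("java.lang.StackOverflowError")

// Push links a new frame on top when a method is invoked.
func (s *Stack) Push(f *Frame) error {
	if s.size >= s.maxSize {
		return ErrStackOverflow
	}
	f.nextFrame = s.top
	s.top = f
	s.size++
	return nil
}

// Pop unlinks the top frame when its method returns.
func (s *Stack) Pop() *Frame {
	f := s.top
	if f == nil {
		return nil
	}
	s.top = f.nextFrame
	f.nextFrame = nil
	s.size--
	return f
}

func main() {
	s := &Stack{maxSize: 2}
	_ = s.Push(&Frame{method: "main"})
	_ = s.Push(&Frame{method: "additive"})
	fmt.Println(s.Push(&Frame{method: "deeper"})) // java.lang.StackOverflowError
	popped := s.Pop()
	fmt.Println(popped.method, s.top.method) // additive main
}
```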

Local variable table and operand stack size

How are the sizes of the local variable table and the operand stack determined? In the Code attribute, MaxStack is the maximum depth of the operand stack and MaxLocals is the number of slots in the local variable table.

type Attr_Code struct {
	NameIdx uint16
	name    string
	AttrLen uint32
	cp      constant_pool.ConstantPool
	// maximum depth of the method's operand stack at any point in time,
	// determined at compile time
	MaxStack uint16
	// number of local variable slots, including the parameters
	MaxLocals uint16
	codeLen   uint32
	code      []byte
	// exception table
	ExceptionTable []*ExceptionTable
	// attribute table
	attrCount uint16
	attrInfo  AttributesInfo
}

Both values are computed at compile time, stored in the class file, and read back when the class is loaded:

func (c *Attr_Code) parse(reader *classfile.ClassReader) {
	c.MaxStack = reader.ReadUint16()
	c.MaxLocals = reader.ReadUint16()
	c.codeLen = reader.ReadUint32()
	c.code = reader.ReadBytes(c.codeLen)
	c.ExceptionTable = parseExceptionTable(reader)
	c.attrCount = reader.ReadUint16()
	c.attrInfo = ParseAttributes(c.attrCount, reader, c.cp)
}
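`ClassReader` itself is not shown. A minimal version needs only big-endian reads, because the class file format stores every multi-byte value in big-endian order. The version below is assumed, not taken from gvm. The sample bytes encode a Code attribute body for `foo()` with the values one would expect javac to emit: max_stack = 2, and max_locals = 4 because slot 0 holds `this`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ClassReader is a hypothetical minimal reader over class file bytes.
type ClassReader struct {
	data []byte
}

func (r *ClassReader) ReadUint16() uint16 {
	v := binary.BigEndian.Uint16(r.data)
	r.data = r.data[2:]
	return v
}

func (r *ClassReader) ReadUint32() uint32 {
	v := binary.BigEndian.Uint32(r.data)
	r.data = r.data[4:]
	return v
}

func (r *ClassReader) ReadBytes(n uint32) []byte {
	b := r.data[:n]
	r.data = r.data[n:]
	return b
}

func main() {
	// Code attribute body (after name index and length):
	// max_stack=2, max_locals=4, code_length=11, then foo()'s opcodes.
	body := []byte{
		0x00, 0x02, 0x00, 0x04, 0x00, 0x00, 0x00, 0x0b,
		0x04, 0x3c, 0x05, 0x3d, 0x1b, 0x1c, 0x60, 0x08, 0x68, 0x3e, 0xb1,
	}
	r := &ClassReader{data: body}
	maxStack, maxLocals := r.ReadUint16(), r.ReadUint16()
	code := r.ReadBytes(r.ReadUint32())
	fmt.Println(maxStack, maxLocals, len(code)) // 2 4 11
}
```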

Method calls

The invoke instruction family

  • invokeinterface: invokes an interface method
  • invokespecial: invokes instance methods that need special handling (superclass methods, private methods, and instance initialization methods)
  • invokevirtual: invokes virtual methods, dispatched on the runtime type of the instance
  • invokestatic: invokes static methods
  • invokedynamic: invokes dynamically linked methods (new in Java 7, to support method calls in dynamically typed languages)

Call execution logic

public class Invokevirtual {
    
    public static void main(String[] args) {
        Invokevirtual invokevirtual = new Invokevirtual();
        int res = invokevirtual.additive(1, 2);
        GvmOut.to(res);
    }

    public int additive(int x, int y) {
        int r = x + y;
        return r;
    }

}

How are function calls implemented in Java?

Figure: instruction stream of the main method.

Figure: instruction stream of the additive method.
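At the bytecode level, javac compiles `additive` to `iload_1, iload_2, iadd, istore_3, iload_3, ireturn`. The call itself boils down to moving values between frames:

1. The caller pushes the receiver and the arguments onto its operand stack.
2. The invoke instruction pops them into the new frame's local variable slots, with the receiver in slot 0.
3. `ireturn` pushes the result back onto the caller's operand stack.

The Go sketch below models just that. `frame`, `invoke`, and the integer stand-in for the object reference are hypothetical simplifications, not gvm's `base.InvokeMethod`.

```go
package main

import "fmt"

// frame is a hypothetical, trimmed-down stack frame.
type frame struct {
	locals []int32
	stack  []int32
}

func (f *frame) push(v int32) { f.stack = append(f.stack, v) }

func (f *frame) pop() int32 {
	v := f.stack[len(f.stack)-1]
	f.stack = f.stack[:len(f.stack)-1]
	return v
}

// invoke pops the receiver plus argc arguments off the caller's operand
// stack into callee slots 0..argc, runs the callee, and pushes its int
// result back onto the caller's operand stack (what ireturn does).
func invoke(caller *frame, argc, maxLocals int, body func(*frame) int32) {
	callee := &frame{locals: make([]int32, maxLocals)}
	for i := argc; i >= 0; i-- { // the last argument is on top
		callee.locals[i] = caller.pop()
	}
	caller.push(body(callee))
}

// additive mirrors iload_1, iload_2, iadd, istore_3, iload_3, ireturn.
func additive(f *frame) int32 {
	f.push(f.locals[1])
	f.push(f.locals[2])
	f.push(f.pop() + f.pop())
	f.locals[3] = f.pop()
	f.push(f.locals[3])
	return f.pop()
}

func main() {
	mainFrame := &frame{locals: make([]int32, 3)}
	mainFrame.push(42) // stand-in for the object reference (aload_1)
	mainFrame.push(1)  // iconst_1
	mainFrame.push(2)  // iconst_2
	invoke(mainFrame, 2, 4, additive)     // invokevirtual additive:(II)I
	mainFrame.locals[2] = mainFrame.pop() // istore_2
	fmt.Println(mainFrame.locals[2])      // 3
}
```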

Static binding versus dynamic binding

Many articles online say that overloading is statically bound (compile-time polymorphism) and overriding is dynamically bound. That is not entirely accurate. An overloaded method can itself be overridden by a subclass, in which case the target is again only known at run time.

To be precise, in the Java Virtual Machine, static binding means the target method can be identified directly at resolution time. Dynamic binding means the target method must be identified at run time from the dynamic type of the receiver. In short:

  • With static binding, the method to call is already known at compile time.
  • With dynamic binding, the method to call is not known until run time.

Among the invoke instructions, invokestatic (for methods declared static) and invokespecial are statically bound, while invokevirtual and invokeinterface are dynamically bound.
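The difference comes down to which class the lookup starts from. In this hypothetical Go sketch (types and names invented for illustration), resolving against the declared type at the call site always yields the same method. Resolving against the receiver's runtime class picks up an override.

```go
package main

import "fmt"

// Class is a hypothetical, minimal runtime class: methods keyed by
// "name:descriptor", plus a link to the superclass.
type Class struct {
	name    string
	super   *Class
	methods map[string]string // key -> implementation label
}

// lookup searches c, then walks up the superclass chain.
func (c *Class) lookup(key string) (string, bool) {
	for k := c; k != nil; k = k.super {
		if impl, ok := k.methods[key]; ok {
			return impl, true
		}
	}
	return "", false
}

func main() {
	animal := &Class{name: "Animal", methods: map[string]string{"speak:()V": "Animal.speak"}}
	dog := &Class{name: "Dog", super: animal, methods: map[string]string{"speak:()V": "Dog.speak"}}
	cat := &Class{name: "Cat", super: animal}

	// Static view: start from the declared type at the call site.
	impl, _ := animal.lookup("speak:()V")
	fmt.Println("declared Animal ->", impl) // declared Animal -> Animal.speak

	// Dynamic view: same call site, start from each receiver's class.
	for _, recv := range []*Class{dog, cat} {
		impl, _ := recv.lookup("speak:()V")
		fmt.Println(recv.name, "->", impl) // Dog -> Dog.speak, then Cat -> Animal.speak
	}
}
```

Walking the chain on every call is the simplest correct approach. Production JVMs such as HotSpot typically precompute a virtual method table per class instead.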

Let's look at the code.

The logic of invokestatic

func (i *INVOKE_STATIC) Execute(frame *runtime.Frame) {
	cp := frame.Method().CP()
	constantMethod := cp.GetConstantInfo(i.Index).(*constant_pool.ConstantMethod)
	className := constantMethod.ClassName()
	perm := jclass.GetPerm()
	class := perm.Space[className]
	if class == nil {
		// load the class on first use
		class = jclass.ParseInstanceByClassName(className)
	}
	name, _type := constantMethod.NameAndDescriptor()
	// get the target method
	methodInfo, err := class.FindStaticMethod(name, _type)
	if err != nil {
		panic("[gvm]" + err.Error())
	}
	if !jclass.IsStatic(methodInfo.AccessFlag()) {
		panic("[gvm] invoke static error")
	}
	methodInfo.SetJClass(class)
	base.InvokeMethod(frame, methodInfo, true)
}

func (j JClass_Instance) FindStaticMethod(name, descriptor string) (*MethodInfo, error) {
	// scan this class's methods for a static one with matching name and descriptor
	for i := range j.Methods {
		methodInfo := j.Methods[i]
		if !IsStatic(methodInfo.accessFlag) {
			continue
		}
		mName := j.ConstantPool.GetUtf8(methodInfo.nameIdx)
		mDesc := j.ConstantPool.GetUtf8(methodInfo.descriptorIdx)
		if name != mName || mDesc != descriptor {
			continue
		}
		return j.Methods[i], nil
	}
	return nil, exception.GvmError{Msg: "cannot find static method named " + name}
}

The logic of invokevirtual

func (i *INVOKE_VIRTUAL) Execute(frame *runtime.Frame) {
	constantMethod := frame.Method().CP().GetConstantInfo(i.Index).(*constant_pool.ConstantMethod)
	methodNameStr, methodDescStr := constantMethod.NameAndDescriptor()
	// invokevirtual must not target <init> or <clinit>
	exception.AssertTrue(methodNameStr != "<init>" && methodNameStr != "<clinit>", "IncompatibleClassChangeError")
	classNameStr := constantMethod.ClassName()
	permSpace := jclass.GetPerm().Space
	jc := permSpace[classNameStr]
	if jc == nil {
		jc = jclass.ParseInstanceByClassName(classNameStr)
	}
	exception.AssertTrue(jc != nil, "NoClassDefFoundError")
	// look the method up in the class, then its superclasses, then its interfaces
	methodInfo, err, _ := jc.FindMethod(methodNameStr, methodDescStr)
	exception.AssertTrue(err == nil, "cannot find method "+methodNameStr)
	exception.AssertFalse(jclass.IsStatic(methodInfo.AccessFlag()), "IncompatibleClassChangeError")
	if jclass.IsProteced(methodInfo.AccessFlag()) {
		// TODO: for a protected method, check the relationship between caller and callee
	}
	base.InvokeMethod(frame, methodInfo, false)
}

func (j *JClass_Instance) FindMethod(name, descriptor string) (*MethodInfo, error, *JClass_Instance) {
	// 1. methods declared in this class
	for i := range j.Methods {
		methodInfo := j.Methods[i]
		if IsStatic(methodInfo.accessFlag) {
			continue
		}
		mName := j.ConstantPool.GetUtf8(methodInfo.nameIdx)
		mDesc := j.ConstantPool.GetUtf8(methodInfo.descriptorIdx)
		if mName == name && mDesc == descriptor {
			return j.Methods[i], nil, j
		}
	}
	// 2. the superclass chain (java/lang/Object has no superclass)
	if j.SuperClass != nil {
		if m, err, jc := j.SuperClass.FindMethod(name, descriptor); err == nil {
			return m, nil, jc
		}
	}
	// 3. the superinterfaces
	for i := range j.Interfaces {
		if m, err, jc := j.Interfaces[i].FindMethod(name, descriptor); err == nil {
			return m, nil, jc
		}
	}
	return nil, exception.GvmError{Msg: "cannot find method named " + name}, nil
}

A brief introduction to native (JNI) methods

In some scenarios Java alone cannot do the job. The most common case is making system calls, which goes through JNI (Java Native Interface).

For example, the Selector in NIO:

private native int poll0(long var1, int var3, long var4);

and CAS in Unsafe:

public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);

Calling a native method essentially calls a function implemented somewhere else, and the virtual machine takes care of the linking. In gvm, GvmOut is a print facility whose native methods are implemented in Go and run when an invoke instruction reaches them. Many other extensions can be built the same way with native methods.

public class GvmOut{
    public native static void to(int i);
    public native static void to(float i);
    public native static void to(double i);
    public native static void to(boolean i);
    public native static void to(long i);
    public native static void to(String i);
}
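One way a VM could link such methods is a simple global registry, sketched below in Go. The key format, `Register`, and `FindNative` are hypothetical, not gvm's API. When the invoke logic finds a method with the ACC_NATIVE flag (0x0100), it looks up a Go function by class, name, and descriptor instead of interpreting bytecode. Each `to` overload has a distinct descriptor, so each gets its own entry.

```go
package main

import "fmt"

// NativeMethod receives the arguments popped from the caller's operand
// stack, simplified here to a slice of interface values.
type NativeMethod func(args []any)

// registry maps "class.name:descriptor" to a Go implementation.
var registry = map[string]NativeMethod{}

func key(class, name, desc string) string { return class + "." + name + ":" + desc }

// Register binds a Java native method to a Go function.
func Register(class, name, desc string, m NativeMethod) {
	registry[key(class, name, desc)] = m
}

// FindNative is what invoke logic could call for an ACC_NATIVE method.
func FindNative(class, name, desc string) (NativeMethod, bool) {
	m, ok := registry[key(class, name, desc)]
	return m, ok
}

func init() {
	// Overloads of GvmOut.to differ only in descriptor.
	Register("GvmOut", "to", "(I)V", func(args []any) { fmt.Println(args[0].(int32)) })
	Register("GvmOut", "to", "(Ljava/lang/String;)V", func(args []any) { fmt.Println(args[0].(string)) })
}

func main() {
	if m, ok := FindNative("GvmOut", "to", "(I)V"); ok {
		m([]any{int32(3)}) // prints 3
	}
}
```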
