JVM class loading mechanism

The JVM is cross-platform and cross-language: any bytecode that conforms to the Java Virtual Machine Specification can be parsed and executed by it. In the case of Java, we write .java files and compile them into .class files with javac. A .class file contains bytecode that the JVM can read and parse; the virtual machine loads it and turns it into virtual machine instructions, which the execution engine then executes.

1. Klass model

Inside the JVM, a Java class is represented by an object of the Klass class.

  1. Static data types: the eight primitive data types built into the JVM
  2. Dynamic data types: types that are generated dynamically at runtime, such as the arrays in the example below

Example:

    int[] arr = new int[5];
    Object[] arr2 = new Object[5];
  • int[]: created with the newarray virtual machine instruction, which allocates a new array whose elements are of a primitive data type. TypeArrayKlass is the form in which an array of a primitive data type exists in the virtual machine.

  • Object[]: ObjArrayKlass is the form in which an array of a reference data type exists in the virtual machine.
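
As a small illustration of the point that array types are generated at runtime, the sketch below asks the JVM for the class of each array from the example. The class name ArrayKlassDemo and the helper runtimeName are made up for this illustration; the reflection calls are standard java.lang.Class methods.

```java
class ArrayKlassDemo {
    // Returns the name of the runtime class the JVM created for the given array.
    static String runtimeName(Object array) {
        return array.getClass().getName();
    }

    public static void main(String[] args) {
        int[] arr = new int[5];         // compiled to the newarray instruction
        Object[] arr2 = new Object[5];  // compiled to the anewarray instruction

        System.out.println(runtimeName(arr));                   // [I
        System.out.println(runtimeName(arr2));                  // [Ljava.lang.Object;
        System.out.println(arr.getClass().isArray());           // true
        System.out.println(arr.getClass().getComponentType());  // int
    }
}
```

Neither [I nor [Ljava.lang.Object; comes from a .class file; the virtual machine creates these array classes itself when they are first needed.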

2. Class loading process

From the moment a type is loaded into virtual machine memory until it is unloaded, its life cycle goes through seven phases: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, Preparation, and Resolution are collectively known as Linking.

Next, we'll take a closer look at the whole class loading process in the Java virtual machine, namely the specific actions performed in the five phases of loading, verification, preparation, resolution, and initialization.

Loading

JVM class loading is lazy: a class is not loaded until it is first actively used. The following six cases count as active use and trigger initialization, which in turn requires the class to be loaded first.

  1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the type has not been initialized, its initialization phase must be triggered first. Typical Java code that generates these four instructions:
  • Instantiating an object with the new keyword.
  • Reading or setting a static field of a type (except static fields that are modified by final and whose value was put into the constant pool at compile time).
  • Calling a static method of a type.
  2. When a reflective call is made on a type using the methods of the java.lang.reflect package and the type has not been initialized, its initialization must be triggered first.
  3. When a class is initialized and its parent class has not been initialized yet, the parent class must be initialized first.
  4. When the virtual machine starts, the user specifies a main class to execute (the one containing the main() method), and the virtual machine initializes this main class first.
  5. When using the dynamic language support added in JDK 7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class corresponding to that method handle has not been initialized, it must be initialized first.
  6. When an interface defines a default method (new in JDK 8, an interface method decorated with the default keyword) and one of the interface's implementation classes is initialized, the interface must be initialized before it.
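
The difference between active and passive use can be observed with a static initializer that records when it runs. The following is a minimal sketch, assuming hypothetical classes InitLog, Config, and ActiveUseDemo that are not part of the original article: reading a compile-time constant and creating an array of the type do not initialize Config, while calling a static method does.

```java
import java.util.ArrayList;
import java.util.List;

class InitLog {
    // Collects a message each time a static initializer runs.
    static final List<String> EVENTS = new ArrayList<>();
}

class Config {
    // Compile-time constant: javac inlines it into callers, so reading it
    // does not touch the Config class at run time.
    static final String NAME = "demo";
    static int counter = 0;

    static {
        InitLog.EVENTS.add("Config initialized");
    }

    static int next() {
        return ++counter;
    }
}

class ActiveUseDemo {
    public static void main(String[] args) {
        System.out.println(Config.NAME);        // passive use: prints demo
        Config[] slots = new Config[2];         // array creation is also passive
        System.out.println(InitLog.EVENTS);     // [] (Config not initialized yet)

        Config.next();                          // invokestatic: active use
        System.out.println(InitLog.EVENTS);     // [Config initialized]
        System.out.println(slots.length);
    }
}
```

However many times Config.next() is called afterwards, the log keeps a single entry, because a class is initialized at most once.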

Verification

  1. File format verification
  • Whether the file begins with the magic number 0xCAFEBABE.
  • Whether the major and minor version numbers are within the range the current virtual machine accepts.
  2. Metadata verification
  • Whether this class has a parent class (every class except java.lang.Object should have one).
  • Whether the parent of this class is a class that must not be inherited from (a class modified by final).
  • If the class is not abstract, whether it implements all the methods required by its parent class or interfaces.
  3. Bytecode verification
  • Ensure that the data types on the operand stack and the instruction sequence are compatible at every point. For example, it must never happen that an int is placed on the operand stack but is then loaded into the local variable table as a long.
  • Ensure that no jump instruction jumps to a bytecode instruction outside the method body.
  • Ensure that type conversions in the method body are always valid. For example, assigning a subclass object to a variable of the parent type is safe, but assigning a parent object to a variable of a subclass type, or to a completely unrelated type with no inheritance relationship, is dangerous and illegal.
  4. Symbolic reference verification
  • Whether the class can be found from the fully qualified name given as a string in the symbolic reference.
  • Whether the specified class contains the methods and fields described by the simple name and descriptor.
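
The file format checks in step 1 look at the first eight bytes of a class file: a four-byte magic number followed by the two-byte minor and major version numbers. The sketch below parses that header; the class ClassFileHeader and its methods are illustrative names, and the version range check is a simplification of what a real virtual machine does.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;

    ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Parses u4 magic, u2 minor_version, u2 major_version (big-endian).
    static ClassFileHeader parse(byte[] firstEightBytes) {
        ByteBuffer buf = ByteBuffer.wrap(firstEightBytes);
        int magic = buf.getInt();
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major);
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    // Simplified check: 45 is the oldest class file major version.
    boolean isAcceptedBy(int maxSupportedMajor) {
        return major >= 45 && major <= maxSupportedMajor;
    }

    public static void main(String[] args) throws IOException {
        String resource = ClassFileHeader.class.getSimpleName() + ".class";
        try (InputStream in = ClassFileHeader.class.getResourceAsStream(resource)) {
            if (in == null) {
                System.out.println("Class file not found on the class path");
                return;
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            ClassFileHeader h = parse(header);
            System.out.printf("magic=0x%08X major=%d minor=%d valid=%b%n",
                    h.magic, h.major, h.minor, h.hasValidMagic());
        }
    }
}
```

Running main reads the header of the compiled ClassFileHeader.class itself; the major version it prints depends on the JDK used to compile it (52 for Java 8, for example).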

Preparation

The preparation phase is the phase that formally allocates memory for the variables defined in a class (that is, static variables, those modified by static) and sets their initial values. Conceptually, the memory used by these variables should be allocated in the method area, but note that the method area itself is a logical area. In JDK 7 and earlier, when HotSpot used the permanent generation to implement the method area, the implementation matched this logical concept exactly. In JDK 8 and later, class variables are stored in the Java heap together with the Class object, so "class variables live in the method area" is purely a statement about the logical concept.

At this stage, static variables receive their initial (zero) values. Non-static variables are not involved; they are allocated and assigned when an object is instantiated. If a static variable is also modified by final and is a compile-time constant, the compiler adds a ConstantValue attribute to the field, and the variable is assigned that constant directly in the preparation phase instead of the zero value.
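
The zero value set during preparation can be made visible by reading a static variable before its own assignment has run. The sketch below does this through a helper method, which the compiler allows; PreparationDemo and its members are hypothetical names chosen for the illustration.

```java
class PreparationDemo {
    // These two initializers run first, before `value = 123` below is executed.
    static int seenValue = peekValue();
    static int seenConst = peekConst();

    static int value = 123;         // ordinary static: zero value until assigned
    static final int CONST = 123;   // compile-time constant (ConstantValue attribute)

    static int peekValue() {
        return value;
    }

    static int peekConst() {
        // javac inlines the constant here, and the field itself never
        // holds a zero value that static initializer code could observe.
        return CONST;
    }

    public static void main(String[] args) {
        System.out.println(seenValue);  // 0
        System.out.println(seenConst);  // 123
        System.out.println(value);      // 123
    }
}
```

seenValue captures 0 because the assignment of 123 to value has not happened yet when peekValue() is called, whereas the constant is already available.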

Resolution

The resolution phase is the process by which the Java virtual machine replaces symbolic references in the constant pool with direct references. The resolved information is stored in a ConstantPoolCache instance.

Symbolic references: a symbolic reference describes the referenced target with a set of symbols, which can be any form of literal as long as it locates the target unambiguously. Symbolic references are independent of the memory layout of the virtual machine implementation, and the referenced target has not necessarily been loaded into the virtual machine's memory yet. Virtual machine implementations can have different memory layouts, but the symbolic references they accept must all be the same, because the literal form of symbolic references is explicitly defined in the class file format of the Java Virtual Machine Specification.

Direct references: a direct reference is a pointer that points directly to the target, a relative offset, or a handle that locates the target indirectly. A direct reference is tied to the memory layout of the virtual machine implementation, so the direct references translated from the same symbolic reference on different virtual machine instances will generally not be the same. If a direct reference exists, its target must already exist in the virtual machine's memory.

  1. Class or interface resolution
  2. Field resolution
  3. Method resolution
  4.

Initialization

The initialization phase of a class is the last step of the class loading process. Apart from the loading phase, where the user application can take part through custom class loaders, the class loading actions are controlled entirely by the Java virtual machine. Only in the initialization phase does the Java virtual machine actually begin to execute the Java program code written in the class, handing control over to the application.

During the preparation phase, variables have already been given the initial zero value required by the system. During the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in the program code. Put more directly: the initialization phase is the process of executing the class constructor <clinit>() method. <clinit>() is not a method that the programmer writes directly in Java code; it is generated automatically by the javac compiler. Still, it is important to understand how this method is generated and which details of its execution can affect the behavior of the program. This part of the class loading process is closer to the everyday work of an ordinary developer than any other.

The <clinit>() method is produced by the compiler, which automatically collects the assignments to all class variables and the statements in static blocks (static {} blocks) and merges them. The order in which the compiler collects them is determined by the order in which the statements appear in the source file. A static block can only read variables defined before it; a variable defined after it can be assigned in the earlier static block, but cannot be read there.

Example code:

public class Test { 
    static { 
        i = 0; // Assigning to the variable compiles fine
        System.out.print(i); // The compiler reports "illegal forward reference"
    }
    static int i = 1; 
}
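
Because the compiler merges static blocks and class variable assignments in source order, that order decides the final value of a variable. The following sketch, with made-up class names, shows the two possible orders side by side; unlike the example above it compiles, because the variable is only assigned and never read before its declaration.

```java
class ClinitOrderDemo {
    static class AssignThenDeclare {
        static {
            i = 0;          // legal: assignment to a variable declared later
        }
        static int i = 1;   // collected second, so it wins
    }

    static class DeclareThenAssign {
        static int j = 1;
        static {
            j = 0;          // collected second, so it wins
        }
    }

    public static void main(String[] args) {
        System.out.println(AssignThenDeclare.i);  // 1
        System.out.println(DeclareThenAssign.j);  // 0
    }
}
```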

3. Verifying class loading in practice

  1. Execution order of code blocks and constructors:
     1) Superclass static code block
     2) Subclass static code block
     3) Superclass code block
     4) Superclass constructor
     5) Subclass code block
     6) Subclass constructor

Example:

package cn.edu.cqvie.jvm;

public class Test_2 extends Test_2_A {
    static {
        System.out.println("Subclass static code block");
    }

    {
        System.out.println("Subclass code block");
    }

    public Test_2() {
        System.out.println("Subclass Constructor");
    }

    public static void main(String[] args) {
        new Test_2();
    }
}

class Test_2_A {

    static {
        System.out.println("Superclass static code block");
    }

    {
        System.out.println("Parent code block");
    }

    public Test_2_A() {
        System.out.println("Parent class constructor");
    }

    public static void find() {
        System.out.println("Static method");
    }
}

// Output:
// Superclass static code block
// Subclass static code block
// Parent code block
// Parent class constructor
// Subclass code block
// Subclass Constructor
  2. Class loading demo 1
public class Test_1 {
    public static void main(String[] args) {
        System.out.printf(Test_1_B.str);
    }
}

class Test_1_A {
    public static String str = "A str";

    static {
        System.out.println("A Static Block");
    }
}

class Test_1_B extends Test_1_A {
    static {
        System.out.println("B Static Block");
    }
}

// Output:
// A Static Block
// A str
// "B Static Block" is not printed: str is declared in Test_1_A, so reading it
// through Test_1_B initializes only Test_1_A.

4. How reading a static field is implemented

The Java code is as follows:

public class Test_1 {
    public static void main(String[] args) {
        System.out.printf(Test_1_B.str);
    }
}

class Test_1_A {
    public static String str = "A str";

    static {
        System.out.println("A Static Block");
    }
}

class Test_1_B extends Test_1_A {
    static {
        System.out.println("B Static Block");
    }
}

How are static fields stored? There are two core concepts: before JDK 1.8 they were stored in the instanceKlass; in JDK 1.8 they are stored only in the mirror class (instanceMirrorKlass).

In Test_1_A

The value of the static variable str is stored in the StringTable, and the mirror class holds a pointer to the string.

In Test_1_B

str is a static field of Test_1_A, so it is not stored in the mirror class of Test_1_B. There are two possible ways to read a static field of Test_1_A through Test_1_B:

1. Look in the mirror class of Test_1_B first and return the field if it is there; if not, pass the request up the inheritance chain. Obviously, the cost of this algorithm grows with the depth of the inheritance chain, and its complexity is O(n).

2. Use another data structure that stores entries in key-value format, so that the query cost is O(1).

HotSpot uses the second approach and relies on another data structure, ConstantPoolCache. The constant pool class ConstantPool has an attribute _cache that points to it, and each entry in the cache is a ConstantPoolCacheEntry.
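
To make the trade-off between the two approaches concrete, here is a toy model in Java. It is only a sketch under simplifying assumptions and not HotSpot's implementation: ClassModel, findHolder, readByWalking, and readWithCache are hypothetical names, and the per-class map resolvedHolders merely plays the role that the constant pool cache plays in the real virtual machine.

```java
import java.util.HashMap;
import java.util.Map;

class StaticFieldLookupModel {
    // Hypothetical stand-in for a loaded class: its own static fields, a link
    // to its superclass, and a per-class cache of already resolved fields.
    static final class ClassModel {
        final String name;
        final ClassModel parent;
        final Map<String, Object> ownStatics = new HashMap<>();
        final Map<String, ClassModel> resolvedHolders = new HashMap<>();

        ClassModel(String name, ClassModel parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Walks up the inheritance chain until a class declaring the field is found.
    static ClassModel findHolder(ClassModel start, String field) {
        for (ClassModel c = start; c != null; c = c.parent) {
            if (c.ownStatics.containsKey(field)) {
                return c;
            }
        }
        throw new IllegalArgumentException(
                "No static field " + field + " reachable from " + start.name);
    }

    // Approach 1: walk the chain on every read, O(n) in the chain depth.
    static Object readByWalking(ClassModel start, String field) {
        return findHolder(start, field).ownStatics.get(field);
    }

    // Approach 2: resolve the holder once, cache it, then read in O(1).
    static Object readWithCache(ClassModel start, String field) {
        ClassModel holder =
                start.resolvedHolders.computeIfAbsent(field, f -> findHolder(start, f));
        return holder.ownStatics.get(field);
    }

    public static void main(String[] args) {
        ClassModel a = new ClassModel("Test_1_A", null);
        ClassModel b = new ClassModel("Test_1_B", a);
        a.ownStatics.put("str", "A str");

        System.out.println(readByWalking(b, "str"));            // A str
        System.out.println(readWithCache(b, "str"));            // A str
        System.out.println(b.resolvedHolders.get("str").name);  // Test_1_A
    }
}
```

The cache stores where the field lives rather than its value, so later writes to the field remain visible through the cached path.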

Where are the ConstantPoolCacheEntry objects stored? Directly after the ConstantPoolCache object in memory; see the code in \openjdk\hotspot\src\share\vm\oops\cpCache.hpp:

ConstantPoolCacheEntry* base() const {
  return (ConstantPoolCacheEntry*)((address)this + in_bytes(base_offset()));
}

ConstantPoolCache is a runtime data structure set aside for a constant pool. It holds interpreter runtime information for all field access and invoke bytecodes. The cache is created and initialized before the class is actively used. To see how it is read, look at \openjdk\hotspot\src\share\vm\interpreter\bytecodeInterpreter.cpp:

CASE(_getstatic):
        {
          u2 index;
          ConstantPoolCacheEntry* cache;
          index = Bytes::get_native_u2(pc+1);
          // QQQ Need to make this as inlined as possible. Probably need to
          // split all the bytecode cases out so c++ compiler has a chance
          // for constant prop to fold everything possible away.
          cache = cp->entry_at(index);
          if (!cache->is_resolved((Bytecodes::Code)opcode)) {
            CALL_VM(InterpreterRuntime::resolve_get_put(THREAD, (Bytecodes::Code)opcode),
                    handle_exception);
            cache = cp->entry_at(index);
          }
          ...

As the code shows, the ConstantPoolCacheEntry is fetched directly by index with cp->entry_at(index); it is resolved first only if it has not been resolved before.

5. Common tools

  1. Bytecode viewer: the IDEA plug-in jclasslib

  2. HSDB, a tool built into the JDK

  • Usage:
# Run this from the JDK installation directory
# Note: on macOS you may need sudo to avoid permission issues
sudo java -cp lib/sa-jdi.jar sun.jvm.hotspot.HSDB

  3. The javap command

View bytecode and decompilation information

javap -v xxx
  4. Where to look up virtual machine instructions