
The virtual machine loads the data that describes a class from a Class file into memory, verifies it, converts and resolves it, and initializes it, eventually forming Java types that the virtual machine can use directly. This is the class loading mechanism of the virtual machine. For the format of Class files, see this article: Java Class loading mechanism in detail.

According to the Java Virtual Machine Specification (Java SE 8), the loading process of a class or interface is divided into Loading, Linking, and Initialization. Linking is further divided into Verification, Preparation, and Resolution.

Loading is the process of finding the binary representation of a class or interface type with a particular name and creating the class or interface from that binary representation. Linking is the process of combining a class or interface into the run-time state of the Java virtual machine so that it can be executed. Initialization of a class or interface consists of executing its initialization method, <clinit>.

The five stages of loading, verification, preparation, initialization, and unloading begin in a fixed order. Resolution does not: in some cases it starts after initialization, in order to support runtime binding (also known as dynamic binding or late binding) in the Java language. In addition, these phases are usually interleaved, with one phase often invoking or activating another while it is still executing.

Here is how the virtual machine carries out each step.

Class loading timing

If a class is not an array class, it is created by a class loader that loads the corresponding Class (binary) file and takes it through the steps above. Array types have no external binary representation, so they are created by the virtual machine itself rather than by a class loader.

The virtual machine specification strictly defines only five cases in which a class must be initialized immediately; naturally, the three phases before initialization (loading, verification, and preparation) must have begun before then. The five cases are as follows:

  1. When one of four bytecode instructions (new, getstatic, putstatic, or invokestatic) is encountered and the class has not been initialized, its initialization must be triggered first. The common scenarios that produce these four instructions are instantiating an object with the new keyword, reading or setting a static field of a class (except static fields that are final and whose value was placed into the constant pool at compile time), and calling a static method of a class. Note: static members are associated with the class, not with objects of the class.
  2. When a reflective call is made on a class using the methods of the java.lang.reflect package, initialization must be triggered if the class has not already been initialized. Note: the reflection mechanism makes it possible, at run time, to learn all the fields and methods of any class and to call any method or access any field of any object. This ability to retrieve information and invoke methods dynamically is called the reflection mechanism of the Java language, so it is easy to see why the class needs to be initialized.
  3. When a class is initialized and its parent class has not been initialized, the initialization of the parent class must be triggered first. Note: similarly, the parent constructor must run before the subclass constructor can run.
  4. When the virtual machine starts, the user specifies a main class (the one containing the main() method) to execute, and the virtual machine initializes this main class first. Note: the main method is the entry point of the program.
  5. When using the dynamic language support of JDK 1.7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class that the method handle refers to has not been initialized, its initialization must be triggered first. Note: this is a new reflection-like mechanism in JDK 1.7 that operates on classes dynamically.

The behavior in these five scenarios is called an active reference to a class: the program actively references the class, and if the class has not been initialized, initialization is triggered first.

Any other way of referring to a class does not trigger initialization and is called a passive reference. Examples are referring to a static field of a parent class through a subclass, referring to a class through an array definition, and reading a static constant field of a class directly.
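The difference between the two kinds of reference can also be observed from inside a program. The following is a minimal sketch, not taken from the original article: ActiveReferenceDemo, Target, and LOG are hypothetical names, and the sketch relies on the three-argument Class.forName overload, whose second parameter controls whether the class is initialized.

```java
import java.util.ArrayList;
import java.util.List;

class ActiveReferenceDemo {
    /** Append-only record of what happened, in order (hypothetical helper). */
    static final List<String> LOG = new ArrayList<>();

    static class Target {
        static { LOG.add("Target"); }   // runs when Target is initialized
        static void touch() { }
    }

    static List<String> run() {
        String name = Target.class.getName();   // class literal: no initialization
        ClassLoader loader = ActiveReferenceDemo.class.getClassLoader();
        try {
            Class.forName(name, false, loader); // load only: passive
            LOG.add("after forName(initialize=false)");

            Class.forName(name, true, loader);  // reflective use: triggers initialization
            LOG.add("after forName(initialize=true)");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        Target.touch();                         // invokestatic: class is already initialized
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first run the log reads: after forName(initialize=false), Target, after forName(initialize=true). The static block runs exactly once, at the first active reference.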

1.1 Examples

1. Referencing a static field of a parent class through a subclass does not trigger initialization of the subclass:

public class SuperClass {
    static {
        System.out.println("SuperClass init!");
    }

    public static int value = 123;
}

class SubClass extends SuperClass {
    static {
        System.out.println("SubClass init!");
    }
}

class NotInitialization {
    public static void main(String[] args) {
        System.out.println(SubClass.value);
        //System.out.println(SubClass.class); // Class literal: SubClass is loaded but not initialized
        //SubClass subClass = new SubClass(); // Initialization takes place
    }
}

Result: the program prints "SuperClass init!" and then 123. For a static field, only the class that directly defines the field is initialized. Note: the subclass has been loaded by the virtual machine, but it has not entered the initialization phase.

Loading of the subclass can be observed by adding the VM argument -XX:+TraceClassLoading (replaced by -Xlog:class+load in JDK 9 and later).

2. Referencing a class through an array definition does not trigger initialization of the class:

Executing the anewarray bytecode makes the virtual machine create and initialize an array class for SuperClass[]. This array class is generated automatically by the virtual machine, without going through a class loader, and SuperClass itself is not initialized.

public class TestArrayNotInitialization {
    public static void main(String[] args) {
        SuperClass[] sca = new SuperClass[10];
    }
}

3. Reading a static final constant of a class does not trigger initialization of the class:

public class ConstClass {
    static {
        System.out.println("ConstClass init!");
    }
    public static final String HELLOWORLD = "hello world";
}

class TestStaticFinalNotInitialization {
    public static void main(String[] args) {
        System.out.println(ConstClass.HELLOWORLD);
    }
}

Start the program again with the VM argument -XX:+TraceClassLoading: the loaded classes do not include ConstClass. The constant was in fact optimized at compile time and stored in the constant pool of the TestStaticFinalNotInitialization class.

Decompile the class file with javap -v TestStaticFinalNotInitialization.class. In the author's output, the main method references a string that is constant #4 in the constant pool of its own class.

Constants #4 and #25 show that the value of the ConstClass field is stored in the constant pool of the TestStaticFinalNotInitialization class.
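To see that it is the compile-time constant, and not the final modifier alone, that avoids initialization, compare a constant field with a final field whose initializer is not a constant expression. This is a small sketch with hypothetical names (ConstantInliningDemo, Inlined, NotInlined), not code from the original article.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantInliningDemo {
    /** Records which classes ran their static initializers (hypothetical helper). */
    static final List<String> LOG = new ArrayList<>();

    static class Inlined {
        static { LOG.add("Inlined init"); }
        // Compile-time constant: javac copies the value into the caller's constant pool.
        static final String GREETING = "hello world";
    }

    static class NotInlined {
        static { LOG.add("NotInlined init"); }
        // Not a constant expression, so a read compiles to getstatic on NotInlined.
        static final String GREETING = String.valueOf("hello world".toCharArray());
    }

    static List<String> run() {
        String a = Inlined.GREETING;      // passive reference: Inlined is not initialized
        String b = NotInlined.GREETING;   // active reference: NotInlined is initialized
        LOG.add("values equal: " + a.equals(b));
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The log contains "NotInlined init" but never "Inlined init", even though both fields hold the same text.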

Class loading process

2.1 Loading

Note: loading is one phase of the overall "class loading" process. During the loading phase, the virtual machine needs to do three things:

  1. Obtain the binary byte stream that defines a class by its fully qualified name. Note that this byte stream does not have to come from a Class file: it can be read from a JAR package, fetched over the network (most typically for applets), generated from other files (as in JSP applications), and so on.
  2. Convert the static storage structure represented by this byte stream into the run-time data structure of the method area. The method area, like the Java heap, is a memory area shared by all threads. It stores information about the classes loaded by the virtual machine, constants, static variables, just-in-time compiled code, and so on. The format in which data is stored in the method area is defined by the virtual machine implementation.
  3. Generate a java.lang.Class object in memory that represents the class and serves as the access point for the class's data in the method area. The specification does not require this object to live in the Java heap; in the HotSpot virtual machine described here, the Class object is special in that, although it is an object, it is stored in the method area.

The loading stage of an array class is different. As "passive reference example 2" above shows, using an array class does not initialize the element class. Instead, the virtual machine executes a bytecode instruction such as newarray or anewarray and creates the array class itself (a class whose name starts with "["). The rules for creating an array class are:

  1. If the component type of the array is a reference type (not a primitive type), the component type is loaded recursively using the loading process described here, and the array class is associated with the class loader that loaded the component type.
  2. If the component type is not a reference type (for example, an int[] array), the Java virtual machine marks the array class as associated with the bootstrap class loader.
  3. The visibility of an array class is the same as the visibility of its component type. If the component type is not a reference type, the visibility of the array class defaults to public.
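These three rules can be checked with the reflection API. The sketch below uses a hypothetical private element class (Element) and relies on two documented behaviors of java.lang.Class: getClassLoader() returns null for classes associated with the bootstrap class loader, and getModifiers() reports an array class's access modifiers as those of its component type.

```java
import java.lang.reflect.Modifier;

class ArrayClassDemo {
    /** Set by Element's static initializer, so it can be read without touching Element. */
    static boolean elementInitialized = false;

    /** Hypothetical component type; private on purpose to show rule 3. */
    private static class Element {
        static { elementInitialized = true; }
    }

    /** Creates an Element[] and returns the array class the VM generated for it. */
    static Class<?> elementArrayClass() {
        Element[] elements = new Element[10];   // anewarray: Element is not initialized
        return elements.getClass();
    }

    public static void main(String[] args) {
        Class<?> arrayClass = elementArrayClass();
        System.out.println("array class name:       " + arrayClass.getName());
        System.out.println("Element initialized:    " + elementInitialized);
        // Rule 1: the array class shares the class loader of its component type.
        System.out.println("same loader as Element: "
                + (arrayClass.getClassLoader() == arrayClass.getComponentType().getClassLoader()));
        // Rule 2: primitive arrays belong to the bootstrap loader, reported as null.
        System.out.println("int[] loader:           " + int[].class.getClassLoader());
        // Rule 3: visibility follows the component type; primitive arrays are public.
        System.out.println("Element[] is public:    " + Modifier.isPublic(arrayClass.getModifiers()));
        System.out.println("int[] is public:        " + Modifier.isPublic(int[].class.getModifiers()));
    }
}
```

Note that creating the array, asking for its class, and asking for the component type never run Element's static block.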

Compared with the other phases of class loading, the loading phase (more precisely, the action within it that obtains the binary byte stream of the class) is the one developers can control most, because the loading can be done either with a class loader provided by the system or with a class loader the developer writes.

The first action of the loading phase, "obtaining the binary byte stream that defines a class by its fully qualified name", is performed by a class loader. Although the class loader only implements the loading action, its role in a Java program goes far beyond the loading phase. For any class, its uniqueness within the Java virtual machine is established by the class itself together with the class loader that loaded it. That is, even if two classes come from the same Class file, they are different classes as long as they are loaded by different class loaders. "Equality" here covers the results of the equals(), isAssignableFrom(), and isInstance() methods of the Class object that represents the class, as well as object ownership checks made with the instanceof keyword.
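The effect of the class loader on class identity can be reproduced with a small custom loader. The following is a sketch under stated assumptions: IsolatingLoader and Payload are hypothetical names, and the loader assumes that the class file of Payload is available as a resource through the parent loader (true for classes compiled to a directory or JAR on the class path). It defines only the one target class itself and delegates everything else to its parent.

```java
import java.io.IOException;
import java.io.InputStream;

class LoaderIdentityDemo {
    /** Hypothetical class that gets defined twice, once per class loader. */
    public static class Payload { }

    /** Defines one target class itself and delegates every other request to its parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(String target, ClassLoader parent) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve);   // normal parent delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineFromParentResource(name);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file bytes through the parent loader and defines the class here. */
        private Class<?> defineFromParentResource(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of Payload through a fresh IsolatingLoader. */
    static Class<?> loadIsolatedPayload() {
        String name = Payload.class.getName();
        ClassLoader parent = LoaderIdentityDemo.class.getClassLoader();
        try {
            return new IsolatingLoader(name, parent).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Class<?> other = loadIsolatedPayload();
        Object instance = other.getDeclaredConstructor().newInstance();
        System.out.println("same name:  " + other.getName().equals(Payload.class.getName()));
        System.out.println("same class: " + (other == Payload.class));
        System.out.println("isInstance: " + Payload.class.isInstance(instance));
        System.out.println("instanceof: " + (instance instanceof Payload));
    }
}
```

When the classes are loaded from class files on the class path, the two Class objects share a name but are not equal, and an instance created from the second copy is not an instance of the first.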

Parts of the loading phase and the linking phase (for example, some of the file format verification of the bytecode) are interleaved: linking may begin before loading has finished. Actions that run in the middle of the loading phase still belong to the linking phase, and the start times of the two phases still keep a fixed order: a class must begin loading before it can begin verification.

2.2 Verification

Verification is the first step of linking. Its purpose is to ensure that the information in the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine itself. Broadly, the verification phase performs four checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

  1. File format verification checks that the byte stream complies with the Class file format specification and can be processed by the current version of the virtual machine. For example: whether it starts with the magic number 0xCAFEBABE; whether the major and minor version numbers are within the range the current VM can handle; whether the constant pool contains constants of unsupported types (checked through the constant tag); whether any index that should point to a constant points to a non-existent constant or to a constant of the wrong type; whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8; whether any part of the Class file, or the file itself, has had information removed or appended…
  2. Metadata verification ensures that the metadata contains nothing that violates the Java language specification. For example: whether the class has a parent class (every class except java.lang.Object must have one); whether the parent class is a class that may not be extended (a class marked final); if the class is not abstract, whether it implements all the methods required by its parent class or interfaces; whether a field or method in the class conflicts with the parent class (for example, overriding a final field of the parent class, or declaring an overload that breaks the rules, such as identical parameters with a different return type)…
  3. Bytecode verification ensures that the methods of the verified class will not do anything at run time that endangers the security of the virtual machine. For example: ensure that the data types on the operand stack and the instruction sequence are consistent at every point, so that it can never happen that an int is pushed onto the operand stack and then loaded into the local variable table as a long; ensure that jump instructions do not jump to bytecode outside the method body; ensure that type conversions in the method body are valid. Bytecode verification is the most time-consuming part of the verification phase, and even so it cannot guarantee absolute safety.
  4. Symbolic reference verification takes place when the virtual machine converts symbolic references into direct references, that is, in the resolution phase, and ensures that resolution can be carried out properly. It checks, for example: whether the class named by the fully qualified name in a symbolic reference can be found; whether the specified class contains a field or method that matches the simple name and descriptor; whether the classes, fields, and methods in the symbolic reference are accessible to the current class (private, protected, public, default). If the code fails symbolic reference verification, an error is thrown: for example, java.lang.NoClassDefFoundError when the class cannot be found by its fully qualified name, or a subclass of java.lang.IncompatibleClassChangeError such as java.lang.IllegalAccessError, java.lang.NoSuchFieldError, or java.lang.NoSuchMethodError.
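The very first file format check is easy to imitate by hand. This is only an illustrative sketch of that one check, not how any virtual machine implements verification; ClassFileHeaderCheck and majorVersion are hypothetical names. It reads the first eight bytes of a Class file: the 4-byte magic number, then the 2-byte minor and 2-byte major version, all big-endian.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

class ClassFileHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** Returns major_version, or throws if the data does not start with the magic number. */
    static int majorVersion(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile);   // big-endian by default, like Class files
        int magic = buf.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException(String.format("bad magic number: 0x%08X", magic));
        }
        buf.getShort();                                // skip minor_version
        return buf.getShort() & 0xFFFF;                // major_version as an unsigned value
    }

    public static void main(String[] args) throws IOException {
        // Read the Class file of java.lang.Object from the running JDK.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("Object.class is not available as a resource here");
                return;
            }
            System.out.println("java.lang.Object major version: " + majorVersion(in.readAllBytes()));
        }
    }
}
```

A major version of 52 corresponds to Java SE 8; the value printed for java.lang.Object depends on the JDK that runs the program.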

2.3 Preparation

The preparation phase formally allocates memory for class variables and sets their initial values. The memory used by these variables is allocated in the method area.

Only class variables (variables marked static) are allocated in this phase, not instance variables, which are allocated in the Java heap together with the object when it is instantiated. Suppose a class variable is defined as "public static int value = 123;". Its value after the preparation phase is 0 rather than 123, because no Java method has started executing yet. The putstatic instruction that assigns 123 to value is compiled into the class constructor <clinit>() method, so the assignment of 123 is only performed during initialization.

As described above, the preparation phase usually assigns the zero value, but there are special cases. A constant marked final is assigned its specified value in the preparation phase, for example:

public static final int value = 123;

In the compiled Class file, the constant field has a ConstantValue attribute that refers to the specific value of the constant in the constant pool. The field is assigned that value during preparation, without any Java bytecode being executed. If the field is not a constant, it is assigned in the initialization stage, inside the <clinit>() method, by bytecode instructions such as ldc and putstatic. For more information about the Class file structure and the ConstantValue attribute, see: Java Class file structure.
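Both cases can be made visible by reading the fields while <clinit>() is still running, before the assignment statement for value has executed. The sketch below is an illustration with hypothetical names (PreparationDemo, Holder, peek); it reads the fields reflectively so that the compiler cannot inline the constant, and it assumes a HotSpot-style virtual machine that follows the behavior described above.

```java
import java.lang.reflect.Field;

class PreparationDemo {
    static class Holder {
        // These two initializers run first in <clinit>(), before 'value' is assigned.
        static int valueSeenEarly = peek("value");         // still the zero value
        static int constantSeenEarly = peek("CONSTANT");   // already set from ConstantValue

        static int value = 123;             // assigned by putstatic inside <clinit>()
        static final int CONSTANT = 123;    // compile-time constant with a ConstantValue attribute

        /** Reads a static int field of Holder reflectively, bypassing compile-time inlining. */
        static int peek(String fieldName) {
            try {
                Field field = Holder.class.getDeclaredField(fieldName);
                return field.getInt(null);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("value seen early:    " + Holder.valueSeenEarly);
        System.out.println("constant seen early: " + Holder.constantSeenEarly);
        System.out.println("value afterwards:    " + Holder.value);
    }
}
```

The early read of value yields 0, the early read of CONSTANT yields 123, and after initialization value is 123 as well.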

2.4 Resolution

The resolution phase is where the virtual machine replaces symbolic references in the constant pool with direct references.

A symbolic reference describes the referenced target with a set of symbols, which can be any form of literal as long as it locates the target unambiguously. The referenced target does not have to be loaded into memory yet. A direct reference can be a pointer straight to the target, a relative offset, or a handle that indirectly locates the target. Direct references depend on the memory layout of the virtual machine implementation.

It is common for the same symbolic reference to be resolved many times, so a virtual machine implementation can cache the result of the first resolution (recording the direct reference in the runtime constant pool and marking the constant as resolved) to avoid repeating the work. This rule does not hold for the invokedynamic instruction.

Resolution applies mainly to seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site specifier. The following describes the resolution process for the first four kinds:

2.4.1 Class or interface resolution

Assume the current code belongs to class D. To resolve a symbolic reference N that has never been resolved into a direct reference to a class or interface C, the virtual machine completes the following three steps:

  1. If C is not an array type, the virtual machine passes the fully qualified name represented by N to D's class loader to load C. During loading, metadata verification and bytecode verification may trigger the loading of other related classes, such as the parent class or the interfaces the class implements. If any exception occurs during loading, resolution fails.
  2. If C is an array type whose element type is an object, that is, the descriptor of N has a form like "[Ljava/lang/Integer", the element type of the array is loaded according to the rules in step 1. With the descriptor assumed above, the element type to load is "java.lang.Integer"; the virtual machine then generates an array class that represents the dimensions and element type of the array (see passive reference example 2 in the class loading timing section above).
  3. If nothing goes wrong in the steps above, C is now a valid class or interface in the virtual machine, but before resolution is complete, symbolic reference verification must confirm that D has access to C. If D does not have access, java.lang.IllegalAccessError is thrown.

2.4.2 Field resolution

To resolve an unresolved symbolic reference to a field, the CONSTANT_Class_info symbolic reference indexed by the class_index item of the field's CONSTANT_Fieldref_info entry is resolved first, that is, the symbolic reference to the class or interface the field belongs to. Call this class or interface C. Field resolution then searches C itself for a field whose simple name and descriptor match the target; if one exists, the search ends. Otherwise, if C implements interfaces, each interface and its parent interfaces are searched recursively from the bottom up according to the inheritance relationship. If that also fails and C is not java.lang.Object, the parent classes are searched recursively from the bottom up. If the search ends without a match, java.lang.NoSuchFieldError is thrown.
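The search order (the class itself, then its interfaces, then its parent classes) cannot be shown with a plain field access in source code, because javac rejects an ambiguous reference before the virtual machine ever sees it. As a rough analogy, the documented lookup algorithm of Class.getField follows the same order for public fields. The sketch below uses hypothetical types (Parent, Marker, Child) and demonstrates the reflective lookup only, not the virtual machine's own resolution code.

```java
import java.lang.reflect.Field;

class FieldLookupOrderDemo {
    public static class Parent {
        public static int ID = 1;
    }

    public interface Marker {
        int ID = 2;   // implicitly public static final
    }

    /** Declares no ID of its own, so a lookup has to search its supertypes. */
    public static class Child extends Parent implements Marker { }

    /** Looks up "ID" starting at Child and reports which type supplied the field. */
    static Class<?> declaringClassOfId() {
        try {
            Field field = Child.class.getField("ID");
            return field.getDeclaringClass();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Superinterfaces are searched before the superclass, so Marker wins over Parent.
        System.out.println("ID found in: " + declaringClassOfId().getSimpleName());
    }
}
```

Writing Child.ID directly in source code would not compile, since the compiler reports the reference as ambiguous.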

2.4.3 Class method resolution

Class method resolution follows search steps similar to field resolution, with an extra step that checks whether the method reference points to a class or an interface. The matching search for class methods also looks in the parent classes first and then in the interfaces.

2.4.4 Interface method resolution

Interface method resolution is similar to class method resolution, except that an interface has no parent class, so only the parent interfaces are searched recursively upward.

2.5 Initialization

The initialization phase is when the Java program code defined in the class actually starts executing. In the preparation phase, class variables were already assigned the initial values the system requires. In the initialization phase, class variables and other resources are initialized according to the plan the programmer has expressed in the program. Put another way, the initialization phase is the process of executing the class constructor <clinit>() method.

Here is a quick explanation of how the <clinit>() method executes:

  1. The <clinit>() method is generated by the compiler, which automatically collects all class variable assignments and the statements in static blocks (static {} blocks) of the class and merges them. The order in which the compiler collects them is the order in which the statements appear in the source file. A static block can only read variables defined before it; variables defined after it can be assigned in the static block but cannot be read.
  2. Unlike the instance constructor <init>() (the constructor of the class), the <clinit>() method does not need to call the parent class's <clinit>() explicitly: the virtual machine guarantees that the parent's <clinit>() method has finished before the subclass's <clinit>() method executes. Therefore, the first class whose <clinit>() method is executed in the virtual machine is always java.lang.Object.
  3. The <clinit>() method is not mandatory for a class or interface. The compiler may omit it if the class or interface has no static block and no assignment to a class variable, if it declares class variables but uses no class variable initializer or static initializer, or if it contains only static final variables whose initializers are compile-time constant expressions.
  4. Static blocks cannot be used in an interface, but an interface can still have variable initialization assignments, so an interface can get a <clinit>() method just as a class does. Interfaces differ from classes in that executing an interface's <clinit>() method does not require the parent interface's <clinit>() method to be executed first: the parent interface is initialized only when a variable defined in it is used. In addition, initializing a class that implements the interface does not execute the interface's <clinit>() method. (Since Java 8 there is one exception: a superinterface that declares default methods is initialized together with the implementing class.)
  5. The virtual machine ensures that a class's <clinit>() method is properly locked and synchronized in a multithreaded environment. If multiple threads try to initialize a class at the same time, only one thread executes the <clinit>() method, and all other threads block and wait until the active thread has finished executing it. A long-running operation in a class's <clinit>() method can therefore block many threads, and in practice this kind of blocking is often hard to spot.
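Rules 4 and 5 can be checked with a short experiment. This is a sketch with hypothetical names (ClinitRulesDemo, Named, Impl, Slow): the interface declares no default methods, so initializing the implementing class should leave it untouched, and a counter records how many times a deliberately slow static initializer runs when four threads race to use the class.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

class ClinitRulesDemo {
    /** Append-only, thread-safe event log (hypothetical helper). */
    static final List<String> LOG = new CopyOnWriteArrayList<>();
    static final AtomicInteger SLOW_INIT_RUNS = new AtomicInteger();

    static String record(String event) {
        LOG.add(event);
        return event;
    }

    interface Named {
        // Not a constant expression, so the interface gets a <clinit>() method.
        String NAME = record("Named <clinit>");
    }

    static class Impl implements Named {
        static { record("Impl <clinit>"); }
        static void touch() { }
    }

    static class Slow {
        static {
            SLOW_INIT_RUNS.incrementAndGet();
            try {
                Thread.sleep(100);   // keep the initializer busy so the threads overlap
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        static void touch() { }
    }

    static void run() {
        Impl.touch();                // rule 4: initializes Impl but not Named
        record("Impl used");
        String name = Named.NAME;    // first use of an interface field initializes Named

        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> Slow.touch());   // rule 5: all four race to initialize Slow
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        run();
        System.out.println(LOG);
        System.out.println("Slow initializer runs: " + SLOW_INIT_RUNS.get());
    }
}
```

The log shows Impl's initializer running before the "Impl used" marker and Named's initializer only after it, and the counter stays at 1 no matter how many threads touch Slow.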

2.5.1 Execution order example

class Father {
    public static int a = 1;
    static {
        a = 2;
    }
}

class Child extends Father {
    public static int b = a;
}

public class ClinitTest {
    public static void main(String[] args) {
        System.out.println(Child.b);
    }
}

Executing the code above prints 2, which means that b is assigned 2. Here is how that result comes about. In the preparation phase, memory is allocated for the class variables and their initial values are set, so both a and b get the default value 0; they receive the values specified in the program when the <clinit>() methods run. Reading Child.b triggers Child's <clinit>() method. According to rule 2, Father's <clinit>() method is executed before Child's. According to rule 1, the statements in <clinit>() are executed in the order in which the static blocks and static variable assignments appear in the source, so when Father's <clinit>() runs, a is first assigned 1 and then the static block assigns 2 to it. Child's <clinit>() method then runs and assigns b the value 2.

If we swap the order of the "public static int a = 1;" statement and the static block in Father, the program prints 1, because Father's <clinit>() method then executes the static block first and the "public static int a = 1;" statement second.

Also, after swapping the two, reading a inside the static block (for example, assigning a to another variable) is reported as an error at compile time because, according to rule 1, the static block can only assign to a, not read it.
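The swapped variant described above looks like this. It is a self-contained sketch in which the classes are nested inside a hypothetical SwappedOrderDemo class so that they do not clash with the earlier Father and Child.

```java
class SwappedOrderDemo {
    static class Father {
        static {
            a = 2;                       // assigning a later-declared variable is allowed
            // System.out.println(a);    // would not compile: illegal forward reference
        }
        static int a = 1;                // runs after the static block, so a ends up as 1
    }

    static class Child extends Father {
        static int b = a;                // Father is initialized first, so b becomes 1
    }

    public static void main(String[] args) {
        System.out.println(Child.b);     // prints 1
    }
}
```

Uncommenting the println inside the static block makes the compiler reject the class, which is the compile-time error mentioned above.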

2.5.2 Initialization deadlock example

/**
 * Demonstrates a class initialization deadlock.
 * The classes are assumed to live in the package com.ikang.JVM.staticfiled.
 */
public class InitLock extends Thread {
    private String name;

    private InitLock(String name) {
        this.name = name;
    }

    @Override
    public void run() {
        try {
            setName(name);
            Class.forName("com.ikang.JVM.staticfiled." + name);
            System.out.println("init " + name + " is ok!");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Thread 1: initializes ClassA first, which requests initialization of ClassB
        InitLock a = new InitLock("ClassA");
        a.start();
        // Uncomment to let thread 1 finish initializing ClassA and ClassB first;
        // thread 2 then finds both classes initialized and no deadlock occurs.
        // Without the pause, the deadlock depends on the two threads overlapping.
        // Thread.sleep(2000);
        // Thread 2: initializes ClassB first, which requests initialization of ClassA
        InitLock b = new InitLock("ClassB");
        b.start();
    }
}

class ClassA {
    static {
        try {
            // Initializing ClassB from here needs the initialization locks of both ClassA and ClassB
            Class.forName("com.ikang.JVM.staticfiled.ClassB");
            System.out.println("ClassB is ok!");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}

class ClassB {
    static {
        try {
            // Initializing ClassA from here needs the initialization locks of both ClassB and ClassA
            Class.forName("com.ikang.JVM.staticfiled.ClassA");
            System.out.println("ClassA is ok!");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}

Using jps and jstack, you can see that both threads are waiting and that a deadlock has actually occurred, but the thread state has not changed and is still RUNNABLE, which can mislead the developer.

Related articles:

  1. In-depth Understanding of the Java Virtual Machine
