The process in which the virtual machine loads the data describing a class from a Class file into memory, verifies it, transforms and resolves it, initializes it, and finally turns it into a Java type it can use directly, is called the class loading mechanism of the virtual machine.

In Java, types are loaded, linked, and initialized at run time rather than at compile time, unlike languages that perform linking at compile time. This costs some performance, but it brings great flexibility: Class files can be loaded not only from disk but also from network streams, or even from memory (dynamically generated Class files). Because types are bound at run time, new types can keep being added while the program runs.

The timing of class loading

The life cycle of a type consists of seven stages: loading, verification, preparation, resolution, initialization, using, and unloading.

Verification, preparation, and resolution are collectively called linking.

Loading reads the bytecode of a type from a file, the network, or memory into memory. Initialization runs the class initializer <clinit>(), which is different from calling a constructor (<init>()). Using is self-explanatory. Unloading removes the type from the method area.

The five stages loading -> verification -> preparation -> initialization -> unloading must start in this fixed order. Note that only the start order is fixed: the stages may overlap, with one starting before the previous one has finished. Resolution is not bound to this order; depending on the situation it may even happen after initialization, which is what allows Java to support late (dynamic) binding.

The specification does not dictate when the loading of a class must begin, so an implementation can load a class lazily, when it is first needed; when that class refers to other classes, those can in turn be loaded as they are needed, in a chain reaction.

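Initialization, by contrast, is strictly specified: a type must be initialized immediately in exactly the following six situations (loading, verification, and preparation naturally happen before that):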

  • 1. When one of the bytecode instructions new, getstatic, putstatic, or invokestatic is executed on a type that has not been initialized. In Java code this means: instantiating an object with the new keyword; reading or setting a static field (except a static final field whose value was already placed in the constant pool at compile time); and calling a static method.
  • 2. When a reflective call is made on a type (for example through Class.forName or the java.lang.reflect package) and the type has not been initialized yet.
  • 3. When a class is being initialized and its superclass has not been initialized yet, the superclass is initialized first.
  • 4. When the JVM starts, the main class specified by the user (the class containing main()) is initialized.
  • 5. When an interface declares default methods (added in JDK 8), initializing a class that implements it triggers the interface's initialization first.
  • 6. When a java.lang.invoke.MethodHandle instance resolves to a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class that the handle refers to has not been initialized, that class is initialized first.
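A small sketch can make the first two cases concrete. The class names (ActiveReferenceDemo, ByReflection, ByStaticCall) are hypothetical; each nested class logs when its static initializer runs, so the log shows which actions actually caused initialization. Class.forName(name, false, loader) serves as a control: it loads the class without initializing it.

```java
import java.util.ArrayList;
import java.util.List;

public class ActiveReferenceDemo {
    // Records, in order, which static initializers (<clinit>) have run.
    static final List<String> INIT_LOG = new ArrayList<>();

    static class ByReflection {
        static { INIT_LOG.add("ByReflection"); }
    }

    static class ByStaticCall {
        static { INIT_LOG.add("ByStaticCall"); }
        static void touch() { }
    }

    static List<String> runDemo() {
        ClassLoader loader = ActiveReferenceDemo.class.getClassLoader();
        String name = ActiveReferenceDemo.class.getName() + "$ByReflection";
        try {
            // initialize = false: the class is loaded, but its <clinit> does not run.
            Class.forName(name, false, loader);
            INIT_LOG.add("after forName(initialize=false)");
            // initialize = true: a reflective request that must initialize the class (case 2).
            Class.forName(name, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        // invokestatic on a class that has not been initialized yet (case 1).
        ByStaticCall.touch();
        return new ArrayList<>(INIT_LOG);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
        // [after forName(initialize=false), ByReflection, ByStaticCall]
    }
}
```

Loading alone leaves ByReflection uninitialized; only the reflective request with initialize=true and the static method call run the respective <clinit>() methods.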

These six situations are called active references to a type. Any other way of referring to a type does not trigger its initialization and is called a passive reference. Three typical passive references are:

  • 1. Referring to a static field of a superclass through a subclass initializes only the superclass, not the subclass.
  • 2. Creating an array of an object type (for example, new SuperClass[10]) does not initialize the element type.
  • 3. Referring to a constant (a static final field initialized with a compile-time constant) does not initialize the class that declares it, because the compiler has already copied the value into the constant pool of the referring class.
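The three passive references can be observed with a sketch like the one below (all class names are hypothetical). Each nested class records its own initialization; after running the three cases, only SuperClass should appear in the log.

```java
import java.util.ArrayList;
import java.util.List;

public class PassiveReferenceDemo {
    // Records which classes have run their static initializers (<clinit>).
    static final List<String> INIT_LOG = new ArrayList<>();

    static class SuperClass {
        static { INIT_LOG.add("SuperClass"); }
        static int value = 123;
    }

    static class SubClass extends SuperClass {
        static { INIT_LOG.add("SubClass"); }
    }

    static class ConstClass {
        static { INIT_LOG.add("ConstClass"); }
        // Compile-time constant: javac copies "hello world" into the caller's constant pool.
        static final String HELLO = "hello world";
    }

    static List<String> runDemo() {
        // Case 2: only the array class is created; SuperClass is not initialized here.
        SuperClass[] array = new SuperClass[10];
        // Case 3: the constant was inlined at compile time; ConstClass is not initialized.
        String s = ConstClass.HELLO;
        // Case 1: the field is declared in SuperClass, so only SuperClass is initialized.
        int v = SubClass.value;
        return new ArrayList<>(INIT_LOG);
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [SuperClass]
    }
}
```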

Class loading process

The class loading process itself consists of five stages: loading, verification, preparation, resolution, and initialization.

  • 1. Loading brings the binary data of the class into memory;
  • 2. Verification checks that the loaded Class file is valid;
  • 3. Preparation allocates memory for class variables, that is, variables modified by static (⚠️ not instance variables), and sets them to zero values (⚠️ zero values, not the initial values written in the code);
  • 4. Resolution replaces symbolic references with direct references;
  • 5. Initialization calls the <clinit>() method, which assigns the initial values written in the code.

Loading

During the loading stage, the virtual machine does three things:

  • 1. Obtains the binary byte stream that defines the class from its fully qualified name.
  • 2. Converts the static storage structure represented by that byte stream into the runtime data structures of the method area.
  • 3. Creates a java.lang.Class object in memory representing the class, which serves as the access point to this class's data.

The loading stage is carried out by a class loader, either the built-in one or a custom one, so developers can control it, in particular how the byte stream in step 1 is obtained, by writing their own class loader. Mind the terminology: "loading" is only the first stage of the whole "class loading" process, and the two terms are easy to confuse.

Because the specification places no restriction on where the binary byte stream in step 1 comes from, developers have a lot of room here: it can be read from a JAR file, fetched over the network, or computed at run time, which is what dynamic proxy technology relies on. For non-array classes, the loading stage can be performed either by the bootstrap class loader built into the virtual machine or by a user-defined class loader.
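Dynamic proxies are an easy way to see a class whose bytes never exist on disk. In this sketch (ProxyDemo and its handler are illustrative), java.lang.reflect.Proxy generates a class implementing Supplier in memory and defines it through the given class loader at run time.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

public class ProxyDemo {
    @SuppressWarnings("unchecked")
    static Supplier<String> makeProxy() {
        // Every call on the proxy instance is routed to this handler.
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "get":      return "generated at run time";
                case "toString": return "Supplier proxy";
                case "hashCode": return System.identityHashCode(proxy);
                case "equals":   return proxy == args[0];
                default: throw new UnsupportedOperationException(method.getName());
            }
        };
        // The JDK builds the bytes of a new class implementing Supplier and defines it here.
        return (Supplier<String>) Proxy.newProxyInstance(
                ProxyDemo.class.getClassLoader(),
                new Class<?>[] { Supplier.class },
                handler);
    }

    public static void main(String[] args) {
        Supplier<String> s = makeProxy();
        System.out.println(s.get());                          // generated at run time
        System.out.println(Proxy.isProxyClass(s.getClass())); // true
    }
}
```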

Array classes are a special case: they are not created by a class loader but directly by the virtual machine. If the component type of an array class (int[] -> int, Integer[] -> Integer) is a reference type, the array class is associated with the class loader that loaded the component type; otherwise it is associated with the bootstrap class loader.
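This rule can be checked with Class.getClassLoader(): for an array class it returns the same loader as for its element type, and null (the bootstrap class loader) when the element type is primitive. A minimal sketch, where Element is a hypothetical class:

```java
public class ArrayLoaderDemo {
    static class Element { }

    public static void main(String[] args) {
        // Primitive element type: the array class has no loader (null = bootstrap).
        System.out.println(int[].class.getClassLoader());    // null
        // Element type defined by the bootstrap loader: also null.
        System.out.println(String[].class.getClassLoader()); // null
        // Element type defined by an application loader: the array shares that loader.
        System.out.println(Element[].class.getClassLoader() == Element.class.getClassLoader()); // true
    }
}
```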

Once the binary stream has been loaded, it is stored in the method area in the format defined by the virtual machine implementation. A java.lang.Class object for the type is then created in memory and serves as the access point to that data.

Verification

The main purpose of verification is to ensure that the contents of a Class file do not endanger the security of the virtual machine.

Verification consists of four main parts. The first works on the raw byte stream, and only after it passes is the stream stored in the method area; the other three operate on the data already in the method area:

  • 1. File format verification. Checks that the byte stream follows the Class file format (for example, that it starts with the magic number 0xCAFEBABE) and that its version is supported by the current virtual machine. These are the most basic checks.
  • 2. Metadata verification. Performs semantic checks on the metadata of the class, such as whether it extends a final class, whether a non-abstract class implements all the abstract methods of its superclass and interfaces, and whether it overrides a method it is not allowed to override.
  • 3. Bytecode verification. The most complex part: it analyzes data flow and control flow in the method bodies (the Code attributes of the Class file). For example, it rejects an instruction that expects an int while the operand stack holds a long or double, a jump that lands somewhere it should not, and an invalid cast such as assigning a superclass object to a variable of a subclass type (the opposite of polymorphism, where a subclass object is assigned to a superclass reference).
  • 4. Symbolic reference verification. Put simply, it checks whether the class refers to types, fields, or methods that do not exist or that it is not allowed to access.

Verification is important but not strictly required for a program to run: once a class has passed, verification has no further effect on it. If all the code an application runs has already been used and verified repeatedly (there are no malicious types), verification can be switched off in production to shorten class loading time, for example with HotSpot's -Xverify:none option (deprecated since JDK 13).

Preparation

As the name suggests, this stage prepares class variables, that is, variables modified by static. Conceptually they live in the method area: in JDK 7 and earlier, HotSpot implemented the method area as the permanent generation, while in JDK 8 and later class variables are stored in the Java heap together with the Class object. Instance variables are not allocated here; they are allocated in the heap along with each object when it is instantiated.

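After the space is allocated, each class variable is set to the zero value of its type (0 for int, 0L for long, false for boolean, null for references, and so on). The initial value written in the code, as in static int value = 123, is assigned later, during initialization.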
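The zero value set during preparation can be made visible, because <clinit>() runs field initializers in source order. In this sketch (hypothetical names), peek() runs while <clinit>() is still assigning observed, so it reads value before 123 has been stored in it.

```java
public class ZeroValueDemo {
    // Runs first inside <clinit>(); 'value' still holds the zero value from preparation.
    static int observed = peek();
    // Assigned afterwards, still inside <clinit>().
    static int value = 123;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(observed + " " + value); // 0 123
    }
}
```

If value were a static final field with a constant initializer, javac would inline 123 into peek(), and the zero value would no longer be observable this way.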

Resolution

Resolution is the process of replacing the symbolic references in the constant pool with direct references. First, what are symbolic references and direct references?

Symbolic reference: a set of symbols that describes the referenced target. The symbols can be literals of any form, as long as they identify the target unambiguously.

Direct reference: a pointer to the target's memory address, a relative offset, or a handle that locates the target indirectly.

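The specification does not fix exactly when resolution happens; it only requires that a symbolic reference be resolved before a bytecode instruction that uses it is executed. An implementation can therefore resolve references eagerly when a class is loaded, or lazily when a symbol is first used.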

Resolution covers several kinds of symbolic references; the four main ones are:

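  • 1. Class or interface resolution. Suppose the current class is D and a symbolic reference N is to be resolved into a direct reference to a class or interface C. If C is not an array type, the virtual machine passes the fully qualified name represented by N to D's class loader to load C. If C is an array type whose elements are objects, the element type is first loaded in the same way and the virtual machine then creates the array class. Finally, the virtual machine checks that D may access C; if not, it throws IllegalAccessError.
  • 2. Field resolution. First resolve the class or interface the field belongs to, say C, as described above. Then search C itself for a field whose simple name and descriptor both match the target; if there is none, search the interfaces C implements (recursively, including their superinterfaces); if there is still none, search C's superclasses (recursively). If the search fails, throw NoSuchFieldError; if a field is found but is not accessible, throw IllegalAccessError.
  • 3. Method resolution. First resolve the class C the method belongs to, as for fields; if C turns out to be an interface, throw IncompatibleClassChangeError. Then search C and its superclasses for a method with a matching simple name and descriptor, and after that C's superinterfaces. If nothing matches, throw NoSuchMethodError; if the method is not accessible, throw IllegalAccessError.
  • 4. Interface method resolution. First resolve the interface C the method belongs to (if C is a class, throw IncompatibleClassChangeError), then search C, and if the method is not found there, C's superinterfaces. If nothing matches, throw NoSuchMethodError.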
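The field lookup order in item 2 can be modeled with reflection. This is only a sketch of the search order described in the Java Virtual Machine Specification (§5.4.3.2), not how HotSpot implements it; the access check is left out, and the types Named, Base, and Derived are hypothetical. Reflection is used because javac itself rejects the source expression Derived.ID as ambiguous.

```java
import java.lang.reflect.Field;

public class FieldResolutionSketch {
    interface Named { int ID = 1; }
    static class Base { public static int ID = 2; }
    static class Derived extends Base implements Named { }

    // Search order: C itself, then its superinterfaces (recursively), then its superclass (recursively).
    static Field lookupField(Class<?> c, String name, Class<?> type) {
        for (Field f : c.getDeclaredFields()) {
            if (f.getName().equals(name) && f.getType() == type) {
                return f;
            }
        }
        for (Class<?> superInterface : c.getInterfaces()) {
            Field f = lookupField(superInterface, name, type);
            if (f != null) {
                return f;
            }
        }
        Class<?> superClass = c.getSuperclass();
        return superClass == null ? null : lookupField(superClass, name, type);
    }

    // Mirrors the failure case of resolution; access checks (IllegalAccessError) are omitted.
    static Field resolveField(Class<?> c, String name, Class<?> type) {
        Field f = lookupField(c, name, type);
        if (f == null) {
            throw new NoSuchFieldError(name);
        }
        return f;
    }

    public static void main(String[] args) {
        // Named wins over Base because superinterfaces are searched before the superclass.
        System.out.println(resolveField(Derived.class, "ID", int.class)
                .getDeclaringClass().getSimpleName()); // Named
    }
}
```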

Initialization

Initialization is the first stage in which the virtual machine actually runs the Java code written in the class and hands control to the application: class variables and other resources are initialized according to the program's logic. Put another way, the initialization stage is the execution of the <clinit>() method.

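<clinit>() is generated automatically by the compiler, which collects all assignments to class variables and all statements in static{} blocks. They are collected in the order in which they appear in the source file, so a static block can only read variables declared before it: it may assign to a variable declared after it, but reading such a variable is rejected as an "illegal forward reference".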
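A sketch of this ordering rule (the class name is hypothetical): the static block may assign i before its declaration, but the declaration's initializer is collected after the block, so the final value is 1.

```java
public class ClinitOrderDemo {
    static {
        i = 0;                      // assigning a field declared further down is allowed
        // System.out.println(i);   // reading it here would be an "illegal forward reference"
    }

    static int i = 1;               // collected into <clinit>() after the block, so it runs last

    public static void main(String[] args) {
        System.out.println(i);      // 1
    }
}
```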

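Unlike the instance constructor <init>(), <clinit>() does not call the superclass's <clinit>() explicitly: the virtual machine guarantees that the superclass's <clinit>() has finished before the subclass's <clinit>() runs. As a consequence, static initializers of a superclass always run before those of its subclasses.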
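A classic illustration of this guarantee, as a sketch with hypothetical names: Sub's initializer reads A only after Parent's static block has changed it.

```java
public class ParentFirstDemo {
    static class Parent {
        public static int A = 1;
        static { A = 2; }           // part of Parent.<clinit>(), runs after A = 1
    }

    static class Sub extends Parent {
        public static int B = A;    // Parent.<clinit>() has already finished, so A is 2
    }

    public static void main(String[] args) {
        System.out.println(Sub.B);  // 2
    }
}
```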

<clinit>() is not mandatory: if a class has no static blocks and no assignments to class variables, the compiler does not generate it. Interfaces differ from classes here: running an interface's <clinit>() does not first run its parent interface's <clinit>(); the parent interface is initialized only when something it defines is actually used. Likewise, initializing a class that implements an interface does not necessarily run the interface's <clinit>() (interfaces with default methods are the exception noted earlier).
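A sketch of these interface rules, assuming a JVM that follows the Java Virtual Machine Specification (§5.5) for interfaces with default methods; all names are hypothetical. Each interface has a non-constant field so that it gets a real <clinit>(), and record() logs when it runs.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceInitDemo {
    static final List<String> INIT_LOG = new ArrayList<>();

    static int record(String name) {
        INIT_LOG.add(name);
        return 0;
    }

    // Non-constant initializers, so each interface has its own <clinit>().
    interface Plain {
        int TOKEN = record("Plain");
    }

    interface WithDefault {
        int TOKEN = record("WithDefault");
        default String hello() { return "hello"; }
    }

    static class PlainImpl implements Plain {
        static { record("PlainImpl"); }
    }

    static class DefaultImpl implements WithDefault {
        static { record("DefaultImpl"); }
    }

    static List<String> runDemo() {
        new PlainImpl();   // Plain is NOT initialized
        new DefaultImpl(); // WithDefault declares a default method, so it is initialized first
        return new ArrayList<>(INIT_LOG);
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [PlainImpl, WithDefault, DefaultImpl]
    }
}
```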

The virtual machine must make sure that <clinit>() is properly locked and synchronized in a multithreaded environment: if several threads try to initialize the same class at the same time, only one of them runs <clinit>() while the others block until it finishes, so time-consuming work in <clinit>() can block many threads. <clinit>() also runs only once: after one thread has completed it, other threads never execute it again.
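Because <clinit>() runs at most once under the virtual machine's initialization lock, it can be used for thread-safe lazy initialization, commonly known as the initialization-on-demand holder idiom. A sketch with hypothetical names that starts several threads at once and counts how many times the instance gets constructed:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyHolderDemo {
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    private LazyHolderDemo() {
        CONSTRUCTIONS.incrementAndGet();
    }

    // Holder's <clinit>() creates the instance; the JVM runs it at most once.
    private static class Holder {
        static final LazyHolderDemo INSTANCE = new LazyHolderDemo();
    }

    static LazyHolderDemo getInstance() {
        return Holder.INSTANCE; // first getstatic triggers Holder's initialization
    }

    static int constructionsAfter(int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at roughly the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                getInstance();
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return CONSTRUCTIONS.get();
    }

    public static void main(String[] args) {
        System.out.println(constructionsAfter(8)); // 1
    }
}
```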

Class loader

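A class is uniquely identified by the class itself together with the class loader that loaded it. Even if two classes come from the same Class file and are loaded by the same virtual machine, they are different classes if different class loaders loaded them. Preventing such duplicates is one of the advantages of the parent delegation model.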
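To see that the loader is part of a class's identity, the following sketch defines the same bytes twice through two separate loader instances. Everything here is illustrative: InMemoryLoader is a hypothetical loader, and emptyClass() hand-assembles a minimal class file (a public class with no members that extends Object, in the Java 8 class file format), so the class is also loaded from memory rather than from disk. The name GeneratedEmpty is assumed not to exist on the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ClassIdentityDemo {
    // Builds "public class <internalName> extends java.lang.Object" with no fields or methods.
    static byte[] emptyClass(String internalName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(5);                                   // constant_pool_count: #1..#4
            out.writeByte(1); out.writeUTF(internalName);        // #1 CONSTANT_Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 CONSTANT_Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 CONSTANT_Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 CONSTANT_Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical loader that defines exactly one class from an in-memory byte array.
    static class InMemoryLoader extends ClassLoader {
        private final String name;
        private final byte[] bytes;

        InMemoryLoader(String name, byte[] bytes) {
            super(ClassIdentityDemo.class.getClassLoader());
            this.name = name;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String requested) throws ClassNotFoundException {
            if (!requested.equals(name)) {
                throw new ClassNotFoundException(requested);
            }
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static Class<?> loadInFreshLoader(String name) {
        try {
            return new InMemoryLoader(name, emptyClass(name)).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadInFreshLoader("GeneratedEmpty");
        Class<?> b = loadInFreshLoader("GeneratedEmpty");
        System.out.println(a.getName().equals(b.getName())); // true: same name, same bytes
        System.out.println(a == b);                          // false: different defining loaders
    }
}
```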

Parent delegation mechanism

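From the virtual machine's point of view, there are only two kinds of class loaders. One is the bootstrap class loader (BootstrapClassLoader), which in HotSpot is implemented in C++ and is part of the virtual machine itself. The other kind is every other class loader: these are implemented in Java, exist outside the virtual machine, and extend the abstract class java.lang.ClassLoader. If you have used the Unsafe class, you may know that in JDK 8 Unsafe.getUnsafe() only works when called from a class loaded by the bootstrap class loader, which limits its use and keeps it safe.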

Since JDK 1.2, Java has kept a three-tier class loader architecture based on the parent delegation model.

  • 1. The bootstrap class loader (BootstrapClassLoader) loads the core libraries under <JAVA_HOME>/lib. User programs cannot reference it directly.
  • 2. The extension class loader (ExtensionClassLoader) loads the libraries under <JAVA_HOME>/lib/ext, and developers can use it directly. Since JDK 9 it has been replaced by the platform class loader.
  • 3. The application class loader (ApplicationClassLoader) loads the code on the user's class path. It is the default loader for user code and can also be used directly.
  • 4. Custom class loaders implement whatever special loading behavior programmers need.
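The hierarchy can be inspected by walking ClassLoader.getParent(). A sketch with a hypothetical class name; the loaders printed depend on the JDK: on JDK 8 the chain is typically the application class loader followed by the extension class loader, while on JDK 9 and later the platform class loader takes the extension loader's place. The bootstrap class loader itself is represented as null.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChainDemo {
    // Returns the defining loader of c followed by its ancestors; the bootstrap loader (null) is not included.
    static List<ClassLoader> parentChain(Class<?> c) {
        List<ClassLoader> chain = new ArrayList<>();
        for (ClassLoader l = c.getClassLoader(); l != null; l = l.getParent()) {
            chain.add(l);
        }
        return chain;
    }

    public static void main(String[] args) {
        for (ClassLoader l : parentChain(LoaderChainDemo.class)) {
            System.out.println(l);
        }
        System.out.println("null (bootstrap class loader)");
        // Core classes such as String are defined by the bootstrap loader, so the chain is empty.
        System.out.println(parentChain(String.class).isEmpty()); // true
    }
}
```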

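The parent-child relationship between class loaders is usually implemented by composition (each loader holds a reference to its parent) rather than by inheritance.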

In the parent delegation model, a class loader that receives a loading request first delegates it to its parent, and so on up to the bootstrap class loader. Only when the parent reports that it cannot load the class does the child try to load it itself, and if that fails too, a ClassNotFoundException is thrown. The benefit is that a given class is always loaded by the same loader, no matter how many loaders exist in the system or which of them received the request, because every request is passed to the top first; java.lang.Object, for example, is always loaded by the bootstrap class loader. Since type equality is judged by the class plus its loader, this guarantees that such a class is always the same type throughout the program.
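The delegation order can be written out explicitly by overriding loadClass(String, boolean). DelegatingLoader below is a hypothetical sketch that mirrors what java.lang.ClassLoader.loadClass already does (check already-loaded classes, ask the parent, then call findClass); in practice a custom loader normally overrides only findClass() and keeps the inherited delegation.

```java
public class DelegationSketch {
    // Hypothetical loader that spells out the parent delegation steps.
    static class DelegatingLoader extends ClassLoader {
        DelegatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                // 1. Has this loader already loaded the class?
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        // 2. Delegate upward first; a null parent stands for the bootstrap loader.
                        ClassLoader parent = getParent();
                        c = (parent != null)
                                ? parent.loadClass(name)
                                : Class.forName(name, false, null);
                    } catch (ClassNotFoundException e) {
                        // 3. Only if the parents fail, try this loader.
                        //    ClassLoader.findClass throws ClassNotFoundException by default.
                        c = findClass(name);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    static Class<?> loadThrough(String name) {
        try {
            return new DelegatingLoader(DelegationSketch.class.getClassLoader()).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both requests are satisfied by an ancestor, so the existing Class objects come back.
        System.out.println(loadThrough("java.lang.String") == String.class);                          // true
        System.out.println(loadThrough(DelegationSketch.class.getName()) == DelegationSketch.class); // true
    }
}
```

Because the new loader never defines anything itself, every request resolves to the Class object an ancestor already owns, which is exactly how delegation keeps one type from appearing twice.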