Hi, I’m Kunge

In engineering we deal with objects almost all the time. Have you ever wondered where these objects come from, and what actually happens when you create a new one?

You have probably guessed today’s topic: the class loading mechanism. It is worth understanding not only because it shows how the JVM works, but also because it explains some behavior that looks strange at first, such as the lazy singleton pattern below

public class Singleton {
  private Singleton() {}
  private static class LazyHolder {
    static final Singleton INSTANCE = new Singleton();
  }
  public static Singleton getInstance() {
    return LazyHolder.INSTANCE;
  }
}

At first glance it might seem that several Singleton instances could be created in a multithreaded environment. In fact, class initialization is thread-safe and runs only once, so the program is guaranteed to have a single Singleton instance even under concurrency. This comes down to the <clinit> method used for class initialization, which shows how useful it is to understand the underlying class loading mechanism.
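
To make the guarantee concrete, here is a minimal sketch that races several threads on getInstance(). The names HolderSingletonDemo, CONSTRUCTED, and collectInstances are my own additions for illustration and are not part of the original pattern.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class HolderSingletonDemo {
    static class Singleton {
        // Hypothetical counter, added only so the demo can count constructor calls.
        static final AtomicInteger CONSTRUCTED = new AtomicInteger();

        private Singleton() {
            CONSTRUCTED.incrementAndGet();
        }

        private static class LazyHolder {
            // Assigned inside LazyHolder's class initializer, which the JVM runs once.
            static final Singleton INSTANCE = new Singleton();
        }

        static Singleton getInstance() {
            return LazyHolder.INSTANCE;
        }
    }

    /** Calls getInstance() from many threads at once and returns the distinct instances seen. */
    static Set<Singleton> collectInstances(int threadCount) {
        Set<Singleton> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // line the threads up so they race on getInstance()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(Singleton.getInstance());
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + collectInstances(16).size());
        System.out.println("constructor calls: " + Singleton.CONSTRUCTED.get());
    }
}
```

Both numbers should be 1 no matter how many threads take part, because every thread ends up waiting on the same one-time initialization of LazyHolder.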

The mind map for this article is as follows:

Introduction to the class loading mechanism

The overall class loading process, which is also the life cycle of a class, is shown below

As you can see, a bytecode file must be loaded, linked (verification, preparation, and resolution, the last of which is sometimes translated as "parsing"), and initialized before it becomes a usable class, and only then can objects be created from the class

Note that the five stages in the red box of the figure (loading, verification, preparation, initialization, and unloading) have a fixed order, and class loading must begin these stages strictly in that order. Resolution is the exception: it may begin after initialization, mainly to support Java’s dynamic binding. So what does each stage actually do?

Loading

During the loading phase, the virtual machine needs to do three things

  1. Obtain the binary byte stream of a class by its fully qualified name
  2. Convert the static storage structure represented by this byte stream into the runtime data structure of the method area
  3. Generate a java.lang.Class object in memory that represents the class and acts as the access point for the class’s data in the method area

What is a Class object? It is what you get from calling getClass() on an instance, or from a class literal such as Foo.class

Each class has exactly one Class object, and all instances of the class share it. One point to note: the Class object lives in the heap, not in the method area (this holds for Java 7 and later). All objects are allocated in the heap, and a Class object is an object too, so it is also allocated in the heap. Quite a few people mix this up, so take care
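
A quick way to see that instances share one Class object is to compare references. This is only a small sketch, and the names ClassObjectDemo and Foo are made up for the example.

```java
class ClassObjectDemo {
    static class Foo {}

    /** True when two instances and the class literal all lead to the same Class object. */
    static boolean sharesOneClassObject() {
        Foo a = new Foo();
        Foo b = new Foo();
        return a.getClass() == b.getClass() && a.getClass() == Foo.class;
    }

    public static void main(String[] args) {
        System.out.println(sharesOneClassObject()); // true
    }
}
```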

Objects and classes are represented in HotSpot by a model called oop-klass, in which each object or class has a corresponding C++ class representation. As the diagram below shows, instance objects and Class objects are linked through this model. We will cover oop-klass in more detail in a separate article on the object model

Class metadata, that is, the information describing a class, is mainly allocated in the method area. In Java 8 the method area is implemented by metaspace, so class metadata is stored in metaspace.

Note that although this stage is called loading, some verification work already happens here, mainly the following

  • File format verification: for example, checking that the byte stream starts with the magic number 0xCAFEBABE and that the major and minor version numbers are acceptable to the current VM. Only after passing this check can the byte stream be stored in the method area of the Java VM’s memory.

  • Metadata verification: this check performs semantic analysis of the information described by the bytecode, for example ensuring that every class other than Object has a parent class. This means that once a class is loaded, its parent class, its parent’s parent, and so on are loaded as well (but not yet linked or initialized)
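
The magic number check can be observed directly. The sketch below builds a tiny class file by hand (a public class with no members) and hands it to defineClass, once with the right magic number and once with a wrong one. Everything named here (MagicNumberDemo, minimalClassFile, ByteLoader, tryDefine, and the class name Hello) is a hypothetical helper for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class MagicNumberDemo {
    /** Builds a minimal class file: public class `name` extends Object, with no members. */
    static byte[] minimalClassFile(String name, int magic) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(magic);                                  // magic number, normally 0xCAFEBABE
            out.writeShort(0);                                    // minor version
            out.writeShort(52);                                   // major version 52 = Java 8
            out.writeShort(5);                                    // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);                  // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                 // #2 Utf8, this class's name
            out.writeByte(7); out.writeShort(4);                  // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");   // #4 Utf8, superclass name
            out.writeShort(0x0021);                               // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                    // this_class = #1
            out.writeShort(3);                                    // super_class = #3
            out.writeShort(0);                                    // no interfaces
            out.writeShort(0);                                    // no fields
            out.writeShort(0);                                    // no methods
            out.writeShort(0);                                    // no attributes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** A loader that only exposes defineClass for the demo. */
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Returns "ok" if the bytes were accepted, otherwise the kind of error thrown. */
    static String tryDefine(int magic) {
        try {
            new ByteLoader().define("Hello", minimalClassFile("Hello", magic));
            return "ok";
        } catch (ClassFormatError e) {
            return "ClassFormatError";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDefine(0xCAFEBABE)); // ok
        System.out.println(tryDefine(0xCAFED00D)); // ClassFormatError
    }
}
```

The rejected byte stream never becomes a class at all, which is the point of doing this check before anything is stored in the method area.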

You might wonder why all this verification is needed. Aren’t bytecode files safe? Bytecode files are normally produced by a regular Java compiler, but they can also be edited by hand, and may be tampered with or injected with malicious bytecode, which poses unpredictable risks to the program. That is why verification during loading is necessary

To observe class loading, add the JVM argument -verbose:class or -XX:+TraceClassLoading when starting the Java program (on JDK 9 and later, -Xlog:class+load is the unified-logging equivalent). For example, suppose we write the following test class

public class Test {
    public static void main(String[] args) {}
}

After compiling, execute java -XX:+TraceClassLoading Test

You can see the following loading process

[Opened /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/rt.jar]
[Loaded java.lang.Object from /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/rt.jar]
[Loaded java.lang.CharSequence from /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/rt.jar]
[Loaded java.lang.String from /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/rt.jar]
... // the ellipsis stands for the many other classes loaded from lib/rt.jar
[Loaded Test from file:/Users/because/practice/]
...

Looking at the second-to-last line, you can see that the Test class is loaded. This is expected: Test must be loaded and initialized before its main method can run (more on the conditions for initialization later). But why were so many classes from rt.jar, such as Object and String, loaded before Test?

To answer that, we first have to be clear about another question: who actually performs class loading?

Parent delegation model

Class loading must be done by a class loader. A class is uniquely identified by its class loader together with its fully qualified name (package name + class name).
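
The "class loader + fully qualified name" rule can be checked with two separate loaders that define a class from the same bytes. This is a rough sketch: LoaderIdentityDemo, minimalClassFile, ByteLoader, defineTwice, and the class name Hello are all hypothetical names, and the class file is built by hand so the example does not depend on any file on disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LoaderIdentityDemo {
    /** Builds a minimal class file: public class `name` extends Object, with no members. */
    static byte[] minimalClassFile(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                             // magic number
            out.writeShort(0);                                    // minor version
            out.writeShort(52);                                   // major version 52 = Java 8
            out.writeShort(5);                                    // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);                  // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                 // #2 Utf8, this class's name
            out.writeByte(7); out.writeShort(4);                  // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");   // #4 Utf8, superclass name
            out.writeShort(0x0021);                               // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                    // this_class = #1
            out.writeShort(3);                                    // super_class = #3
            for (int i = 0; i < 4; i++) {
                out.writeShort(0);                                // no interfaces, fields, methods, attributes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** A loader that only exposes defineClass for the demo. */
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Defines the same bytes under the same name in two independent loaders. */
    static Class<?>[] defineTwice() {
        byte[] b = minimalClassFile("Hello");
        return new Class<?>[] {
            new ByteLoader().define("Hello", b),
            new ByteLoader().define("Hello", b)
        };
    }

    public static void main(String[] args) {
        Class<?>[] pair = defineTwice();
        System.out.println(pair[0].getName().equals(pair[1].getName())); // true
        System.out.println(pair[0] == pair[1]);                          // false
    }
}
```

Same name, same bytes, yet two distinct classes as far as the JVM is concerned, because the defining loaders differ.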

As that rule suggests, there is indeed more than one class loader. Why? Having several serves two main purposes: security and separation of responsibilities

First, security. Imagine there were only one class loader. We could define a class named java.lang.Virus, which would live in the same package as core classes such as java.lang.String and would therefore have access to the package-private methods of those core classes. Likewise, if a user defined their own java.lang.String and it were loaded by that single class loader, it could replace the original String class. Both are obviously huge security risks

Second, separation of responsibilities. The core classes in rt.jar have no special requirements and can be loaded directly. Because they are core classes, they should be loaded as soon as the program starts, and their loading can be further optimized for speed. Other bytecode files may be encrypted to guard against decompilation, in which case the class loader has to decrypt them while loading. Hot deployment likewise needs a class loader that loads files from a specified directory. If all of these features lived in a single class loader, that loader would become very heavy. The solution is to define multiple class loaders, each responsible for the bytecode files under a particular path and for whatever handling those files need, which achieves the separation of responsibilities

What class loaders are there in the JVM

There are three main class loaders

  1. The Bootstrap ClassLoader is responsible for loading rt.jar, resources.jar, and the other core libraries in the <JAVA_HOME>\lib directory, or the files specified by -Xbootclasspath

  2. The Extension ClassLoader is responsible for loading all libraries in the <JAVA_HOME>\lib\ext directory or in the path specified by the java.ext.dirs system property.

  3. The Application ClassLoader is responsible for loading the libraries on the user’s classpath, and we can use this class loader directly. In general, if no custom class loader is defined, this is the default.
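
You can list the loaders on your own JVM by walking getParent() from the system class loader. A small sketch follows; LoaderChainDemo and chainFrom are names I made up. Note that the printed class names depend on the JDK version: on JDK 9 and later the extension loader has been replaced by a platform class loader, so the output will not match the three names above exactly.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    /** Walks from the given loader up through getParent() and describes each loader. */
    static List<String> chainFrom(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        // The bootstrap loader has no Java object; the API represents it as null.
        chain.add("bootstrap (represented as null)");
        return chain;
    }

    public static void main(String[] args) {
        chainFrom(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        System.out.println(String.class.getClassLoader()); // null: String comes from the bootstrap loader
    }
}
```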

The primary role of a class loader is to load the binary stream of a bytecode file and eventually turn it into a class in the method area, represented by a Class object

Now that we know there are several class loaders, three questions need answering:

  1. How is it decided which class loader loads a given class?
  2. A class is identified by its class loader plus its fully qualified name, so how do we keep the same class from being loaded by more than one class loader?
  3. A class loader (java.lang.ClassLoader) loads classes, but it is itself a class, so who loads the class loader?

To solve these problems, class loaders are organized into a hierarchy that follows the parent delegation model

What is the parent delegation model

Take a look at the overall architecture diagram of the parent delegation model

As you can see, application classes are loaded by AppClassLoader by default. Each class is cached by the loader that loaded it, so the next time the class is needed it can be fetched from the cache rather than loaded again. At the same time, each class is loaded only by the loader responsible for it, which keeps the class unique. For example, java.lang.Object is only ever loaded by the BootstrapClassLoader, so there is only one Object class

How does a class loader load a class?

  1. When a class is loaded for the first time (say ArrayList), AppClassLoader does not load it right away. It first delegates to its parent, ExtClassLoader, which checks whether it has already loaded the class. If not, ExtClassLoader delegates further up to the BootstrapClassLoader, which loads the class from lib/rt.jar, generates the Class object, and caches it. The BootstrapClassLoader then returns the Class object to ExtClassLoader, which returns it to AppClassLoader, and instances of the class can then be created from it
  2. When new ArrayList() is executed again, ArrayList has already been loaded. AppClassLoader again delegates up to the BootstrapClassLoader, which finds the Class object in its cache and returns it back down the chain to AppClassLoader.
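
The steps above can be sketched as a loader that checks its cache, asks its parent, and only then tries to find the class itself. This is a simplified illustration, not the JDK’s actual source; ParentFirstDemo, TracingLoader, trace, and load are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class ParentFirstDemo {
    /** Simplified parent-first loader that records each step it takes. */
    static class TracingLoader extends ClassLoader {
        final List<String> trace = new ArrayList<>();

        TracingLoader(ClassLoader parent) {
            super(Objects.requireNonNull(parent, "this sketch needs a non-null parent"));
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);          // step 1: check this loader's cache
                if (c == null) {
                    try {
                        trace.add("delegating " + name + " to parent");
                        c = getParent().loadClass(name);     // step 2: ask the parent first
                    } catch (ClassNotFoundException e) {
                        trace.add("parent could not find " + name + ", trying findClass");
                        c = findClass(name);                 // step 3: only now try ourselves
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads a class through a fresh TracingLoader whose parent is the system class loader. */
    static Class<?> load(String name) {
        try {
            return new TracingLoader(ClassLoader.getSystemClassLoader()).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        Class<?> list = loader.loadClass("java.util.ArrayList");
        System.out.println(loader.trace);
        System.out.println(list.getClassLoader()); // null: defined by the bootstrap loader
    }
}
```

Even though the request starts at the bottom of the chain, the class that comes back is the one and only ArrayList defined by the bootstrap loader.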

Back to question 3: who loads the class loaders?

Both AppClassLoader and ExtClassLoader are subclasses of java.lang.ClassLoader, and they are loaded by the BootstrapClassLoader when the application starts, so these class loaders have to exist before any application class is loaded. Then who loads the BootstrapClassLoader? If it were loaded by yet another class loader, we would need another loader to load that one, and so on without end. In fact the BootstrapClassLoader is implemented in C++ inside the JVM. It is part of the JVM and already exists when the application starts, so it is created by the JVM itself and does not need to be loaded by any other loader. That is why it is also called the root loader

Core java.lang classes such as Object, String, and Class are important and used everywhere, so the BootstrapClassLoader loads them in advance when the application starts. Loading AppClassLoader and ExtClassLoader also needs core classes such as List, so those classes in rt.jar are loaded as well. This is why we saw these core classes loaded before the Test class

Does class loading always follow the parent delegation mechanism

No. A typical counterexample is class loading in Tomcat. Tomcat may host several web applications, and different applications may contain classes with the same package name and class name. The most common case is two applications that depend on the same third-party library in different versions. With plain parent delegation there would be only one Class object for such a class, which is obviously a problem. To keep each application’s classes apart, the parent delegation mechanism has to be broken, as follows:

The green part is the class loader that Tomcat creates for each Java project packaged as a WAR. That is, Tomcat automatically creates one class loader per project to load its WAR package. When a class in the WAR package is loaded, WebappClassLoader tries to load it first instead of delegating to its parent loader first. Because each WAR package has its own WebappClassLoader, the Class objects created for each WAR package are necessarily different, which isolates the classes of one application from those of another
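
Here is a rough sketch of the "child first" idea. It is not Tomcat’s code, and the real loader has more rules (for example, it still protects core Java classes). ChildFirstDemo, WebappLikeLoader, ownClasses, loadInTwoApps, and the class name Lib are hypothetical, and a hand-built class file stands in for a class inside a WAR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;

class ChildFirstDemo {
    /** Builds a minimal class file: public class `name` extends Object, with no members. */
    static byte[] minimalClassFile(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                             // magic number
            out.writeShort(0);                                    // minor version
            out.writeShort(52);                                   // major version 52 = Java 8
            out.writeShort(5);                                    // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);                  // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                 // #2 Utf8, this class's name
            out.writeByte(7); out.writeShort(4);                  // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");   // #4 Utf8, superclass name
            out.writeShort(0x0021);                               // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                    // this_class = #1
            out.writeShort(3);                                    // super_class = #3
            for (int i = 0; i < 4; i++) {
                out.writeShort(0);                                // no interfaces, fields, methods, attributes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Stand-in for a per-application loader: looks in its own classes before asking the parent. */
    static class WebappLikeLoader extends ClassLoader {
        private final Map<String, byte[]> ownClasses;

        WebappLikeLoader(Map<String, byte[]> ownClasses, ClassLoader parent) {
            super(parent);
            this.ownClasses = ownClasses;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null && ownClasses.containsKey(name)) {
                    byte[] b = ownClasses.get(name);
                    c = defineClass(name, b, 0, b.length);   // child first: define it ourselves
                }
                if (c == null) {
                    c = super.loadClass(name, false);        // otherwise fall back to parent delegation
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads the same class name in two separate "applications" that share one parent loader. */
    static Class<?>[] loadInTwoApps(String name) {
        Map<String, byte[]> war = Collections.singletonMap(name, minimalClassFile(name));
        ClassLoader shared = ClassLoader.getSystemClassLoader();
        try {
            return new Class<?>[] {
                new WebappLikeLoader(war, shared).loadClass(name),
                new WebappLikeLoader(war, shared).loadClass(name)
            };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?>[] apps = loadInTwoApps("Lib");
        System.out.println(apps[0] == apps[1]); // false: each application gets its own Lib class
    }
}
```

Classes the application does not bring itself, such as java.lang.Object, still come from the shared parent through the fallback branch.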

Finally, note that not every class is created by a class loader. Array classes are a special case: the Java virtual machine constructs them dynamically, directly in memory. However, because a class is uniquely identified by its class loader plus its fully qualified name, an array class still ends up associated with the namespace of some loader, and which one depends on the array’s element type (for int[] it is int, for String[] it is String). If the element type is a primitive type such as int, the array class is associated with the BootstrapClassLoader. If it is a reference type (such as a custom class Test, for the array Test[]), the array class is associated with the class loader that loads that type

The loading phase really matters, not only because it is the first phase of class loading, but also because it involves parent delegation and other principles. If it is not yet clear, I suggest reading it a few more times.

For example, suppose I have a Test class and a method declares Test[] list = new Test[10]; This triggers the loading of the Test class, but not its linking or initialization. So what are linking and initialization?

Linking

Linking consists of three stages: verification, preparation, and resolution.

Verification at this stage mainly covers two checks: bytecode verification and symbolic reference verification

Bytecode verification

In this phase, the method bodies of the class (the Code attribute in the class file) are verified and analyzed to ensure that the class’s methods will not endanger the virtual machine at run time, for example:

  • Ensure that no jump instruction jumps to a bytecode instruction outside the method body

  • Ensure that type conversions are valid, such as assigning a subclass object to a superclass variable, but not the other way around

  • ...

Symbolic reference verification

This verification matches the class against everything it refers to outside itself, that is, the symbolic references in its constant pool. When bytecode calls a method or refers to a class, that method or class appears in the bytecode as a symbolic reference, so the VM has to make sure the target can actually be found when it is used, and report an error if it cannot. As a simple example, suppose there are two classes, A and B, which clearly compile together without problems. If the B.class file then goes missing, running the main method of A needs to load class B, and an error is reported because B cannot be loaded without its B.class file

// B.java
public class B {}

// A.java
public class A {
    public static void main(String[] args) {
        B b = new B();
    }
}

Symbolic reference verification checks not only classes but also methods, fields, and so on

Note that class verification is not mandatory. If you are sure your class files are completely safe, you can pass -Xverify:none to disable class verification, which shortens class loading time.

Preparation

The preparation phase has two main purposes

  1. Allocate memory for the static fields of the loaded class and assign them their default initial values, such as 0 for a static variable of type int

  2. Some Java virtual machines also build other data structures related to the class hierarchy at this stage, such as the method tables used to implement dynamically bound virtual methods.
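
The default value assigned during preparation is normally overwritten before anyone sees it, but it can be made visible by reading a static field before its own initializer has run. A small sketch, with PreparationDemo, Order, early, value, and peek as made-up names:

```java
class PreparationDemo {
    static class Order {
        // Runs first during class initialization; `value` still holds its default from preparation.
        static int early = peek();
        // Runs second; only now does `value` receive its declared value.
        static int value = 10;

        static int peek() {
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(Order.early); // 0
        System.out.println(Order.value); // 10
    }
}
```

The 0 captured in early is the value from the preparation phase, while the 10 is assigned later, during initialization.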

Resolution

As mentioned earlier, symbolic reference verification takes place during this stage. The main job of resolution (sometimes translated as "parsing") is to turn the symbolic references in the class’s constant pool into direct references (specific addresses in memory), so that at run time they point to the corresponding class information in the method area. Take the code above as an example

// B.java
public class B {}

// A.java
public class A {
    public static void main(String[] args) {
        B b = new B();
    }
}

The bytecode file A.class contains a symbolic reference to B. When the main method runs, that symbolic reference is converted into a direct reference to B’s Class object. Because B has not been loaded yet, this also triggers the loading of B to create its Class object, so that the symbolic reference can become a direct reference. This is an example of class resolution, but symbolic references to methods, fields, and so on are resolved in the same way

Note, however, that this stage may happen after initialization, because resolution is only needed when the referenced item is actually used, for example when a method of the class is called. If a method is not used by the time the class is initialized, there is no need to resolve it at that point

Initialization

Two main things happen in this stage

  1. Run the initializers of static variables, assigning them their declared values
  2. Execute the contents of static code blocks

Whether it initializes a static variable or runs a static code block, the Java compiler gathers all of this code into a single method named <clinit>. The JVM uses locking to ensure that this method is executed only once, and only after initialization completes is the class truly ready for use. Also note that the JVM ensures the parent class’s <clinit> has completed before a subclass’s <clinit> runs, which makes sense from an inheritance point of view, since the subclass builds on its parent.
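
The parent-before-child ordering and the run-once guarantee are easy to observe with a shared log. A minimal sketch, where InitOrderDemo, LOG, Parent, Child, and touchChild are illustrative names:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class InitOrderDemo {
    /** Records the order in which the static initializers run. */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    static class Parent {
        static {
            LOG.add("Parent <clinit>");
        }
    }

    static class Child extends Parent {
        static int value = 1;

        static {
            LOG.add("Child <clinit>");
        }
    }

    /** Reads a non-constant static field of Child, which triggers Child's initialization. */
    static int touchChild() {
        return Child.value;
    }

    public static void main(String[] args) {
        touchChild();
        touchChild(); // the second access does not run any initializer again
        System.out.println(LOG); // [Parent <clinit>, Child <clinit>]
    }
}
```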

One important point: if a static variable is final, its type is a primitive type or String, and it is initialized with a compile-time constant, the field is marked with a constant value (the ConstantValue attribute) and is initialized by the JVM directly rather than in <clinit>, as with the static variable in the following class

public class Test {
	private static final int field = 1;
}

Because this field is a constant value, it is not put into <clinit> and is initialized by the JVM

So when does initialization happen? The Java Virtual Machine Specification lists six situations in which a class must be initialized immediately

1. When a new, getstatic, putstatic, or invokestatic bytecode instruction is encountered and the class has not been initialized yet, it must be initialized first. The most common Java code that produces these four instructions is:

  • Instantiating an object with the new keyword

  • Reading or setting a static field of a class (except static fields that are declared final and whose values were placed in the constant pool at compile time)

    If a static field declared in a parent class is read through a subclass, the parent class is initialized but the subclass is not, as the following code shows

    public class SuperClass {
        static {
            System.out.println("SuperClass init");
        }
        public static int value = 10;
    }

    public class SubClass extends SuperClass {
        static {
            System.out.println("SubClass init");
        }
    }

    public class NotInitialization {
        public static void main(String[] args) {
            System.out.println(SubClass.value);
        }
    }

    The output is

    SuperClass init
    10

    As you can see, reading a parent class’s static variable through a subclass initializes the parent class, but the subclass itself is not initialized

  • When a static method of a class is called

2. When a class is accessed reflectively through the java.lang.reflect package and it has not been initialized, its initialization is triggered first

3. When a class is initialized and its parent class has not been initialized yet, the parent class’s initialization is triggered first

4. When the virtual machine starts, the user specifies a main class to run (the class containing the main method), and the virtual machine initializes this main class first

5. When using the dynamic language support added in JDK 7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class that the handle refers to has not been initialized, its initialization is triggered first. In other words, the first call through such a MethodHandle initializes the class the method belongs to

6. When an interface declares a default method (an interface method marked with the default keyword, new in JDK 8) and a class implementing that interface is initialized, the interface must be initialized before it
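
The reflection case can be tried out with java.lang.reflect.Method. In this sketch, looking a static method up does not initialize the class, but invoking it does. ReflectionInitDemo, LOG, Target, ping, and observe are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ReflectionInitDemo {
    /** Records what happened, in order. */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    static class Target {
        static {
            LOG.add("Target initialized");
        }

        static String ping() {
            return "pong";
        }
    }

    static List<String> observe() {
        try {
            // A class literal and a method lookup alone do not initialize Target.
            Method ping = Target.class.getDeclaredMethod("ping");
            LOG.add("method looked up");
            // Invoking the static method reflectively is an active use, so Target is initialized first.
            ping.invoke(null);
            LOG.add("method invoked");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // [method looked up, Target initialized, method invoked]
    }
}
```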

The behaviors in these six scenarios are called active references to a type. All other ways of referring to a type do not trigger its initialization and are called passive references
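
Two common passive references are creating an array of a type and reading one of its compile-time constants. The sketch below logs when the static initializer actually runs; PassiveReferenceDemo, LOG, Probe, LIMIT, and observe are names invented for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class PassiveReferenceDemo {
    /** Records what happened, in order. */
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    static class Probe {
        static final int LIMIT = 42; // compile-time constant, inlined where it is used

        static {
            LOG.add("Probe initialized");
        }
    }

    static List<String> observe() {
        Probe[] slots = new Probe[10];   // passive: creates a Probe[] without initializing Probe
        LOG.add("array created, length " + slots.length);
        int limit = Probe.LIMIT;         // passive: the constant was folded in by the compiler
        LOG.add("constant read: " + limit);
        new Probe();                     // active: `new` triggers initialization
        LOG.add("instance created");
        return LOG;
    }

    public static void main(String[] args) {
        // [array created, length 10, constant read: 42, Probe initialized, instance created]
        System.out.println(observe());
    }
}
```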

After reading this far, I believe you can explain why the singleton pattern at the beginning is safe and works

Summary

How can loading, linking, and initialization be understood more intuitively? As I have often said, technical concepts are easier to grasp when mapped onto everyday life. Suppose we want to build a house. We start with the drawings (the bytecode file) and build according to them (loading) to get a house (the Class object). At this point the house is only a semi-finished product and is not ready for anyone to live in. Declaring the array Test[] list = new Test[10]; is like this: it only triggers loading, and since no method of Test is called there is no need for the later steps.

If you do want to move in, the bare house first has to be inspected (the verification step of linking). If something is wrong, for example a load-bearing wall has been knocked out and the house is unsafe, it fails the acceptance criteria and must be rejected. Once it passes, decorating can begin. You set aside space for a sofa and a TV (preparation). For now you only mark the spots, position A for the TV and position B for the sofa, which is like making symbolic references. When you actually want to watch TV, a mark is not enough, so you have to buy the TV and install it in its spot (resolution). When the decorating is finished (initialization is complete), the house is ready to use (the class is in a usable state) and can be handed over.

It is also easy to see why resolution can come after initialization: even though you reserved a spot for the TV, you can still live in the house without ever buying or watching one

You are welcome to follow my public account, Code Sea. Let’s make progress together ^_^