
Advanced topic: the JVM class loading phases in detail

Class loading phase overview

Note: the phases start in a fixed order, but they usually overlap: one phase may invoke or activate another while it is still running.


Loading

In the loading phase, as the name implies, the class or interface represented by a Class file is loaded into the virtual machine.

That raises three questions: when does the class load, who loads it (which class loader), and which Class file is loaded (what is the name of the class)?


1. When does it load? The Java Virtual Machine Specification imposes no requirement here. Each VM implementation decides for itself, so the timing differs between VMs.

2. Who loads it? A class loader does. A class is unique only in combination with the class loader that loaded it. This article focuses on the class loading phases; the next article covers class loaders.

3. Which Class file? The program must specify the fully qualified name of the class or interface (package name + class/interface name).


To summarize, the loading phase does three things:

1. Get the binary byte stream that defines the class, using its fully qualified name.

2. Convert the static storage structure represented by the byte stream into the runtime data structure of the method area.

3. Generate a java.lang.Class object representing the class in heap memory, as the access point to the class data in the method area.


Get the binary stream

The virtual machine specification does not say where or how to get the byte stream, only that it is obtained through the fully qualified name.

So the stream can come from archives (JARs), the network (web applets), encrypted files (decrypted dynamically on load), runtime generation (dynamic proxies), and so on. In every case the binary byte stream is fetched using the fully qualified name of the class.
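
To make the three loading steps concrete, here is a minimal sketch of a class loader that fetches the byte stream itself and hands it to the VM. The names Payload, ByteStreamLoader, and LoadingDemo are made up for this illustration, and the sketch assumes the compiled Class files are reachable as classpath resources (and Java 9+ for readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class used only as something to load.
class Payload {
    static String greet() { return "hello"; }
}

// Hypothetical loader that obtains the byte stream itself.
class ByteStreamLoader extends ClassLoader {
    ByteStreamLoader(ClassLoader parent) { super(parent); }

    // Step 1: get the binary byte stream by fully qualified name.
    static byte[] readClassBytes(String name) throws IOException {
        String path = "/" + name.replace('.', '/') + ".class";
        try (InputStream in = Payload.class.getResourceAsStream(path)) {
            if (in == null) throw new IOException("no Class file found for " + name);
            return in.readAllBytes(); // Java 9+
        }
    }

    // Steps 2 and 3: the VM parses the bytes and returns the java.lang.Class object.
    Class<?> defineFromBytes(String name) throws IOException {
        byte[] bytes = readClassBytes(name);
        return defineClass(name, bytes, 0, bytes.length);
    }
}

class LoadingDemo {
    public static void main(String[] args) throws Exception {
        ByteStreamLoader loader = new ByteStreamLoader(LoadingDemo.class.getClassLoader());
        Class<?> copy = loader.defineFromBytes(Payload.class.getName());
        System.out.println(copy.getName().equals(Payload.class.getName())); // same name
        System.out.println(copy == Payload.class);                          // same class?
        System.out.println(copy.getClassLoader() == loader);                // who defined it
    }
}
```

The copy has the same fully qualified name as Payload but is a different Class object, because a class is identified by its name together with the loader that defined it.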

Convert the byte stream to a runtime data structure

Converting the byte stream into a runtime data structure in the method area naturally requires some checks first, namely the file format verification of the verification phase.

If file format verification passes, the data is stored in the method area in the storage format the virtual machine uses for it (the specification does not prescribe the data structure of the method area, so different virtual machines use different structures).

As mentioned at the beginning, the phases are interleaved but their start order is fixed. This makes sense: without the binary byte stream obtained during loading, the verification phase would have nothing to verify.

Class objects are generated in the heap

The steps below:


Special case: arrays

Everything above describes loading for non-array classes. Developers can use custom class loaders to control how the binary byte stream is obtained.

Array classes are different: they are not created by a class loader but constructed dynamically in memory by the virtual machine. The element type of the array, however, still has to be loaded by a class loader.

The element type of an array is its type with all dimensions removed. A one-dimensional int array and a two-dimensional int array both have element type int. In other words, the element type is independent of the number of dimensions.
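
A small sketch of the element type idea using reflection. ArrayLoadingDemo, Element, and elementType are names made up for this example; the reflection calls themselves (isArray, getComponentType, getClassLoader) are standard java.lang.Class methods.

```java
class ArrayLoadingDemo {
    // Hypothetical element class, loaded by an ordinary class loader.
    static class Element {}

    // Strip every dimension to reach the element type.
    static Class<?> elementType(Class<?> type) {
        while (type.isArray()) {
            type = type.getComponentType();
        }
        return type;
    }

    public static void main(String[] args) {
        // Both arrays have the same element type, whatever their dimension.
        System.out.println(elementType(int[].class) == elementType(int[][].class));
        // Arrays of primitives have no class loader.
        System.out.println(int[].class.getClassLoader());
        // An array class reports the same loader as its element type.
        System.out.println(Element[][].class.getClassLoader() == Element.class.getClassLoader());
    }
}
```

The array classes themselves are never passed through defineClass; only Element goes through a class loader.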

Array type loading:


Linking


Verification

This step ensures that the information in the byte stream of the Class file meets the requirements of the virtual machine specification and will not compromise the security of the virtual machine when run as code.

1. Why is a verification phase needed? As mentioned in the loading phase above, binary byte streams can come from many sources, and can even be written by hand as zeros and ones. If they were not verified, loading incorrect or malicious code could crash the whole system.

Bytecode verification is therefore a necessary phase: it determines how robust the virtual machine is and makes it harder to attack. In terms of both code size and run-time cost, verification accounts for a large share of the class loading work.

2. What is verified?

The Java Virtual Machine Specification (Java SE 7 edition) spends about 130 pages on verification; interested readers can look it up. This article covers only the important parts of each step.

2.1. File format verification

This step was mentioned in the loading section above: when the loading phase stores the byte stream into a data structure in the method area, the Class file format has to be verified.

Once the file format is verified, the data is stored in the method area, so the later verification steps do not work on the binary stream directly but on the data structure in the method area.

Purpose: verify that the byte stream complies with the Class file format specification, so that it can be parsed correctly, stored in the method area, and handled by the current virtual machine version.

1. Check whether the magic number is correct.
2. Check whether the major and minor versions are within the range the current VM can handle.
3. Check whether the constant pool contains a constant of an unsupported type (tag).
4. Check whether any constant refers to an index that does not exist in the constant pool.
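
The first two checks can be sketched in a few lines. This is only an illustration of the idea, not how any real VM implements it; ClassHeaderCheck, checkHeader, and the maxMajor parameter (standing in for "the highest version this VM supports") are made up for the example. The layout used is the one defined by the Class file format: a 4-byte magic number 0xCAFEBABE, then a 2-byte minor version and a 2-byte major version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version if the header passes both checks, otherwise throws.
    static int checkHeader(byte[] classBytes, int maxMajor) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != MAGIC) {
            throw new ClassFormatError("bad magic number");
        }
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        if (major > maxMajor) {
            throw new UnsupportedClassVersionError(major + "." + minor);
        }
        return major;
    }

    public static void main(String[] args) throws IOException {
        // Read a real Class file shipped with the JDK and print its major version.
        try (InputStream in = ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (in == null) {
                System.out.println("Object.class is not visible as a resource here");
                return;
            }
            System.out.println(checkHeader(in.readAllBytes(), Integer.MAX_VALUE));
        }
    }
}
```

A real verifier goes on to walk the constant pool for checks 3 and 4; the sketch stops at the header.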


2.2. Metadata verification

The information described by the bytecode is analyzed semantically to ensure that it conforms to the requirements of the Java Language Specification.

Put simply, this step verifies the metadata of the class: checks on the parent class, on the field and method definitions of the class, and on data types.

Verification points, for example:

1. Does the class have a parent class?
2. Does the class inherit from a class that must not be inherited (final classes cannot be extended)?
3. If the class is not abstract, does it implement all the methods required by its parent class or interfaces?
4. Do the fields and methods of the class conflict with the parent class (for example, overriding a final field of the parent)?


2.3 Bytecode verification

After the metadata has been verified in the previous step, it is time to verify the method bodies.

This step could equally be called verification of the Code attribute (Code is the attribute that holds the method body in the Class file).

This is the most complicated step of the whole verification process: data flow and control flow analysis are used to establish that the program semantics are legal and logical, so that the method does nothing wrong or harmful to the virtual machine while running.

Verification points, for example:

1. Ensure that the data types on the operand stack always match the instruction sequence operating on them; for example, a value pushed as an int is never loaded as a long.
2. Jump instructions do not jump to bytecode instructions outside the method body.
3. Ensure that type conversions are valid.

Passing verification of the Code attribute does not necessarily mean the code in the method body is safe. A program cannot decide in general whether another program has a bug (interested readers can search for the "halting problem"), a well-known problem from discrete mathematics.

The verification points above rely on data flow and control flow analysis. That analysis is complex and made the verification phase slow, so starting with JDK 6 a new attribute, StackMapTable, was added to the attribute table of the Code attribute.

I cover this part in more detail in another blog post; in short:

The StackMapTable attribute records, for the start of each basic block of the method (the code is split into blocks by control flow), the state the local variable table and the operand stack should be in. Put simply, when execution reaches a given bytecode instruction (the start of a block), the attribute supplies the data types of the local variables and operand stack that need to be verified.

Previously the verifier had to infer these types and then check that they were consistent. Now it only needs to check that the records in the StackMapTable attribute are valid and then check type consistency, without the expensive inference.


2.4 Symbolic reference verification

This step takes place when the virtual machine converts symbolic references to direct references, that is, in the resolution (parsing) phase.

Verification points, for example:

1. Whether the class named by the fully qualified name string in a symbolic reference can be found.

2. Whether the specified class contains a method or field matching the simple name and the field or method descriptor.

3. Whether the classes, fields, and methods in symbolic references are accessible to the current class (private, protected, public, package).

The purpose of this step is to ensure that resolution can run normally. If symbolic reference verification fails, the virtual machine throws errors such as java.lang.NoSuchFieldError and java.lang.NoSuchMethodError.


Preparation

All that happens in this phase is allocating memory for static variables and assigning their initial values (the default zero value for ordinary static variables; the declared constant value for final static variables).

Note that this phase only concerns static variables (which can be accessed without creating an object). For instance variables (which cannot be accessed without an object), memory is allocated later, when the object is instantiated.

Ordinary static variables receive their declared value in the initialization phase; static variables that are final already receive theirs here. This is thanks to the ConstantValue attribute, which appears in the attribute table structure described earlier.

For example,

public static int value = 123;

public static final int value1 = 123;

After the preparation phase, value is 0 and value1 is 123.
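
The zero value left by preparation can actually be observed, as in this small sketch. PreparationDemo, seenEarly, constSeenEarly, peekValue, and peekConst are names invented for the example. One caveat: value1 is a compile-time constant, so javac also inlines it at the use site; the sketch shows the observable effect rather than the ConstantValue mechanism itself.

```java
class PreparationDemo {
    // These initializers run first during initialization,
    // before the initializer of value below has run.
    static int seenEarly = peekValue();
    static int constSeenEarly = peekConst();

    public static int value = 123;         // 0 after preparation, 123 after initialization
    public static final int value1 = 123;  // constant, already 123

    static int peekValue() { return value; }
    static int peekConst() { return value1; }

    public static void main(String[] args) {
        System.out.println(seenEarly);      // the default left by preparation
        System.out.println(constSeenEarly); // the constant
        System.out.println(value);          // the value assigned during initialization
    }
}
```

seenEarly captures 0 because value has only been prepared at that point, while constSeenEarly already sees 123.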

Here is:

Conceptually, static variables belong to the method area, which HotSpot implemented in the permanent generation before JDK 7. From JDK 7 on, static variables are allocated in the heap together with the Class object, and from JDK 8 the method area itself lives in native memory (metaspace). So "class variables live in the method area" was accurate before JDK 7 but is no longer literally true afterwards.


Resolution (parsing)

This phase is the process of converting symbolic references to direct references.

Note that the resolution done during class loading covers only the symbolic references that can be converted to direct references up front; this is called static linking. Some methods and fields can already be determined by the compiler. Overloaded methods are an example: which overload is called is decided at compile time from the number and types of the arguments (the types here are the static types of the arguments, not their actual types). The resolution phase can only convert symbolic references whose targets are guaranteed not to change at run time. Most other cases are dynamic linking, such as polymorphism through overriding, where the method to call or field to use is not known at compile time and must be determined at run time.

Terminology

Symbolic reference:

  1. A set of symbols describing the target of the reference. The symbols can be any form of literal, as long as they locate the target unambiguously.

  2. A symbolic reference lives in the Class file, while a direct reference is a memory address. So a symbolic reference does not require the target to be in memory yet; it only has to identify it.

For example, a field is represented in the constant pool by CONSTANT_Fieldref_info. How memory is allocated is up to each virtual machine, but the CONSTANT_Fieldref_info structure is the same for all of them.

A symbolic reference is the description of a field, class, or method that exists in the Class file. Symbolic references are the same from virtual machine to virtual machine.

Direct reference:

  1. A pointer that locates the target directly, a relative offset, or a handle that locates it indirectly. It depends on the VM's memory layout. Different VMs lay out memory differently, so pointers and offsets naturally differ.

A direct reference is what a symbolic reference (the description of a field, class, or method in the Class file) becomes once it is converted to a real memory address. (Later reads and writes all work on that address.) Because it is a memory address and memory layout differs between VMs, the direct reference for the same symbol differs from VM to VM.

An analogy: the same fruit has different names in different countries, but the fruit itself does not change. The fruit is like the symbolic reference, which is the same everywhere; the local name is like the direct reference, which differs from VM to VM.

Static link

Determined at compile time

A is the parent class, B is the subclass

public A a = new B();

public void invoke(A a) {}
public void invoke(B b) {}

When the variable a is passed to invoke, the first invoke() method is called. The static type of a is A and its actual type is B. Which overload of invoke is called is decided by the static type of the argument, which is known at compile time; converting the chosen symbolic reference to a direct reference is then the job of the resolution phase.

If I change the static type with a cast, can the target still be determined at compile time? In other words, is it still static linking?

Casting: suppose the variable is cast to B when invoke is called, as in invoke((B) a). The cast is known at compile time (it compiles to a checkcast bytecode instruction), so the new static type is known, and so is the overload to call (the second invoke method). This choice is not made by the resolution phase; it is made at compile time, and is called static dispatch. It follows that the static type of an expression can be changed (by a cast). For methods that are not overloaded, the reference can simply be converted in the resolution phase. For overloaded methods, if no overload matches the static type exactly, the compiler applies implicit conversions to the static type (widening to a superclass or implemented interface, then boxing, then variable-length arguments), and that conversion is also settled at compile time. Both steps end in a direct reference and do not conflict: the compiler picks the method version, converting the argument type automatically when there is no exact match, and resolution converts the chosen symbolic reference.
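
Here is a runnable sketch of the A/B example. The methods are made static and return a label purely to keep the demo short; StaticDispatchDemo is a name invented for the illustration.

```java
class StaticDispatchDemo {
    static class A {}
    static class B extends A {}

    static String invoke(A a) { return "invoke(A)"; }
    static String invoke(B b) { return "invoke(B)"; }

    public static void main(String[] args) {
        A a = new B();                      // static type A, actual type B
        System.out.println(invoke(a));      // overload chosen from the static type A
        System.out.println(invoke((B) a));  // the cast changes the static type to B
    }
}
```

The first call selects invoke(A) even though the object is a B; only the cast, which is visible to the compiler, switches the call to invoke(B).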

Remember the multiple method versions mentioned above? Overloading under static dispatch is also called multiple dispatch, because several method versions are involved (more on that later).

Dynamic linking

Can only be determined at run time

Chinese and foreign-language sources describe static and dynamic linking differently, for the following reason. If the static type of the argument is taken as the criterion, then both static dispatch and the resolution phase of class loading count as static linking.

But if the criterion is when the final target (the method version being called) is fixed, the two differ. In static dispatch, when no method matches the argument's static type, that static type is converted and a method with the converted parameter type is looked for. The resolution phase, by contrast, fixes its targets when the class is loaded. (Note that both static dispatch and resolution determine which method is called.)

Static linking: the direct reference to convert to is determined at compile time.

The static type of a variable is known at compile time, which is why this is called static linking. Here is why, and how, the static type can be determined.

To be clear: both the static type and the actual type can change. The difference is that a change of static type is made with a cast, which has a corresponding bytecode instruction in Java, so the static type after the change is still known at compile time. The actual type can only be determined at run time (see dynamic linking below).

Static linking can be divided into static dispatch and the resolution phase. The resolution phase converts symbolic references whose targets are known at compile time and cannot change at run time into direct references. These include constructors, private methods, and static methods.

These are called non-virtual methods: they cannot change at run time, and the compiler can rely on that.

Static dispatch, on the other hand, covers the case where the static type of a variable changes, in a way that is still fully determined at compile time.

The difference between static and dynamic dispatch: static dispatch chooses the method by the static type, while dynamic dispatch chooses it by the actual type of the variable. Static dispatch is settled at compile time; dynamic dispatch has to be settled at run time. Typically, overloading chooses the method from the static types of the arguments, while polymorphic calls, as in method overriding, are decided by the actual type of the receiver.

When no overload matches the static type exactly, the static type is also converted implicitly (widening to a superclass or implemented interface, then boxing, then variable-length arguments).
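
The order of these implicit conversions can be seen by removing overloads one at a time. In this sketch (all class and method names are made up), the same call say('a') is compiled against three different sets of overloads; for a primitive argument the widening step is a primitive widening from char to long.

```java
class OverloadOrderDemo {
    static class WithWidening {
        static String say(long x)      { return "long"; }
        static String say(Character x) { return "Character"; }
        static String say(char... x)   { return "char..."; }
    }

    static class WithoutWidening {
        static String say(Character x) { return "Character"; }
        static String say(char... x)   { return "char..."; }
    }

    static class VarargsOnly {
        static String say(char... x)   { return "char..."; }
    }

    public static void main(String[] args) {
        System.out.println(WithWidening.say('a'));    // widening wins over boxing
        System.out.println(WithoutWidening.say('a')); // boxing wins over varargs
        System.out.println(VarargsOnly.say('a'));     // varargs is the last resort
    }
}
```

All three choices are made by the compiler; nothing about the selection is left for run time.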

Dynamic linking: the reference to convert to is not fixed at compile time; which method is called is only settled at run time.

Dynamic linking depends on the actual type of a variable, which cannot be determined at compile time, only at run time. For example, if a condition is true the variable is assigned an instance of A, otherwise an instance of B. Unless the condition is a compile-time constant such as if (true), it has to be evaluated at run time, so the actual type is unknown at compile time. That is why this is called dynamic (run-time) linking.

From the compiler's point of view: every variable must have a declared type (its static type), which is later stored in the field table to record what type the variable is.

In other words, the static type can be read from the field table, which is produced at compile time. The static type can change, for example through a cast, but the type after the cast is also known at compile time (there is a corresponding bytecode instruction). So the static type, including any change to it, is always known at compile time. This is not limited to casts: when no overload matches the static type, the compiler changes it according to fixed rules (described above). Put simply, a field must declare a type, so the compiler naturally knows the type and any change to it. That is why this is called static linking (determined at compile time).

For method parameters, only the static type is declared, and the argument must match it. The actual type of the argument can vary at run time and cannot be pinned down in advance, but the static type is always known to the compiler. So for overloaded methods, the version finally called is fixed at compile time and written into the operand of the method call bytecode.

When a method is called on an object, however, the method that actually runs depends on the actual type of the object (for example, a subclass overrides a superclass method, a subclass object is created, and the subclass method ends up being called regardless of the static type). Since the actual type is unknown at compile time, this is dynamic linking (determined at run time). Searching for the method on every call would be too slow, so a virtual method table is built during linking to speed up the lookup; every class or interface has one. How does it help? If a subclass overrides a parent method, the table entry holds the address of the subclass's own version; if not, the entry points to the same address as the parent's entry. To make matching faster, a method that exists in both tables has the same index in the subclass's table as in the parent's.
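
A minimal sketch of dynamic dispatch, with hypothetical names (DynamicDispatchDemo, hello, shared, pick). The static type of the variable is A on every path; which hello() runs depends only on the object created at run time.

```java
class DynamicDispatchDemo {
    static class A {
        String hello()  { return "A.hello"; }
        String shared() { return "A.shared"; }  // not overridden by B
    }

    static class B extends A {
        @Override
        String hello() { return "B.hello"; }
    }

    // The actual type of the result depends on a run-time condition.
    static A pick(boolean useSubclass) {
        return useSubclass ? new B() : new A();
    }

    public static void main(String[] args) {
        A a = pick(args.length == 0);    // static type is A either way
        System.out.println(a.hello());   // chosen by the actual type
        System.out.println(a.shared());  // B inherits A's version
    }
}
```

In virtual method table terms, B's entry for hello() points at B's own code, while its entry for shared() points at the same code as A's entry.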

Finally, a few diagrams to explain all this (PS: drawing diagrams is even more tiring than writing text...)

Because the image is too big, I put it on ProcessOn, which you can view by clicking on the link below

Comprehensive summary of the resolution phase


When resolution happens

The virtual machine can choose to resolve symbolic references as soon as the class is loaded, or only when they are actually about to be used.

However, a symbolic reference must be resolved before any of the following instructions operates on it:

anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, ldc2_w, multianewarray, new, putfield, putstatic

Symbolic references to resolve:

Resolution mainly applies to class or interface, field, class method, interface method, method type, method handle, and call site specifier symbolic references.

These correspond to eight constant types in the constant pool: CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, CONSTANT_InterfaceMethodref_info, CONSTANT_MethodType_info, CONSTANT_MethodHandle_info, CONSTANT_Dynamic_info, and CONSTANT_InvokeDynamic_info.


Which types are resolved

Class resolution is essentially loading a class and then verifying access rights.

1.1.1 Which class loader loads the class or interface named by the fully qualified name (the symbolic reference)? If class A references class B, then B is loaded with the class loader of A.

Note: classes are normally loaded following the parent delegation model, and some class loaders are dedicated to loading classes under particular paths. The delegation can be broken, for example in SPI: ServiceLoader is loaded by the bootstrap class loader, but it has to load the developer's implementation classes (its load method accepts the developer's class), and those are not on the bootstrap loader's search path. So a parent loader needs to go through a child loader, which is done by introducing the thread context class loader (ContextClassLoader); by default it is the application class loader. Note also that the parent/child relation between class loaders is not Java inheritance but an internal field named parent.
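
The loader relationships in the note can be inspected directly. A small sketch (LoaderHierarchyDemo and depth are names made up here); getParent, getSystemClassLoader, and getContextClassLoader are standard API, and a null loader stands for the bootstrap class loader.

```java
import java.util.ServiceLoader;

class LoaderHierarchyDemo {
    // Follows the parent field via getParent() and counts loaders up to bootstrap (null).
    static int depth(ClassLoader loader) {
        int n = 0;
        while (loader != null) {
            n++;
            loader = loader.getParent();
        }
        return n;
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        // ServiceLoader itself comes from the bootstrap loader, reported as null.
        System.out.println(ServiceLoader.class.getClassLoader());
        // In a plain "java Main" run the context loader is usually the application loader.
        System.out.println(Thread.currentThread().getContextClassLoader() == app);
        // Number of loaders in the parent chain above and including the application loader.
        System.out.println(depth(app));
    }
}
```

Frameworks and application servers often replace the context class loader, so the second line is an observation about the default setup rather than a guarantee.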

Loading the class then goes through the class loading phases described above, so I will not repeat them here.

1.1.2 When the target is an array type, the element type of the array is loaded first, by the same class resolution process described above. Once the element type is loaded, the virtual machine generates a class for the array. What is this class for? As mentioned earlier, the element type has all dimensions removed, so the virtual machine automatically generates a class that represents the array's dimensions and elements.

Note that the element type is independent of dimensions: it is the type with all dimensions removed. For example, a two-dimensional array of class A and a three-dimensional array of class A both have element type A, so A is what gets resolved and loaded.

1.2 Verifying access permissions: if the class doing the resolving does not have permission to access the resolved class, resolution fails.

Field resolution. Step 1: the CONSTANT_Class_info symbolic reference of the class or interface that owns the field is resolved first, as in class resolution above. Step 2: the field is looked up. A field is identified by two pieces of information, its simple name and its descriptor. For example, in String a = ""; the simple name is a and the descriptor names the type String (Ljava/lang/String;). These two items are the basic information of a field, so the lookup is based on them.

If the field is found in the class or interface being searched, a direct reference to it is returned and the lookup ends. Otherwise the interfaces that the class or interface implements or extends are searched recursively, from the bottom up. If the field is still not found and the type is not an interface, its superclasses are searched from the bottom up (an interface can only extend interfaces, so the previous search covers it, while a class can both implement interfaces and inherit from a parent class). If the field is still not found, the lookup fails with NoSuchFieldError.

Step 3: permission verification. Verify that the class or interface resolving the field has permission to access it. If not, resolution of the field fails.

Initialization

As mentioned earlier, static final constants are assigned their values during the preparation phase, while static variables without final are assigned default values.

The initialization phase assigns the declared values to those static variables that so far hold only defaults, and also executes the static blocks. The compiler merges the static variable assignments and the static blocks into a single method called <clinit>. Inside it, assignments and blocks run in the order in which they appear in the Java source file.

Class

1. During initialization the parent class must be initialized first, so the parent's <clinit> method finishes before the subclass's <clinit> runs. Unlike the instance constructor <init>, <clinit> does not need to call the parent's method explicitly.

2. The Java virtual machine guarantees that the parent's <clinit> executes first. No explicit call is needed, unlike <init>, which must invoke the superclass constructor to make sure the superclass <init> completes.
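
The ordering rules for <clinit> can be checked with a small sketch. InitLog, Parent, Child, and InitOrderDemo are names made up for the example; the log list exists only to record the order of events.

```java
import java.util.ArrayList;
import java.util.List;

class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class Parent {
    static { InitLog.EVENTS.add("Parent static block"); }
}

class Child extends Parent {
    static int x = 1;       // first statement of Child's <clinit>
    static {
        x = 2;              // runs second, in source order
        InitLog.EVENTS.add("Child static block, x=" + x);
    }
    static int y = x + 1;   // runs last and sees x == 2
}

class InitOrderDemo {
    public static void main(String[] args) {
        System.out.println(Child.y);         // first use of Child triggers initialization
        System.out.println(InitLog.EVENTS);  // Parent's entry comes before Child's
    }
}
```

Nothing in Child calls into Parent explicitly; the VM runs Parent's <clinit> on its own before Child's.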

Interface

Interfaces cannot contain static code blocks, and their fields are static and final by default.

Note:

1. Before the <clinit> method of an interface runs, the <clinit> of its parent interface does not have to have run. The parent interface is initialized only when one of its variables is used.
2. The <clinit> method of an interface is not executed before an implementing class is initialized.
3. The <clinit> method is synchronized with a lock. When several threads initialize the same class, the others block until <clinit> finishes and the lock is released.

Conclusion:

Initialization simply means executing the <clinit> method that the compiler generates automatically, which runs the static variable assignments and static blocks in the order they appear in the Java file.
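
Point 2 can be observed with an interface whose field initializer has a side effect. All names here (IfaceLog, Config, ConfigImpl, InterfaceInitDemo) are made up for the sketch, and the interface deliberately declares no default methods, since those change when an interface is initialized.

```java
import java.util.ArrayList;
import java.util.List;

class IfaceLog {
    static final List<String> EVENTS = new ArrayList<>();

    // Records an event and returns the number of events so far.
    static int note(String event) {
        EVENTS.add(event);
        return EVENTS.size();
    }
}

interface Config {
    // Not a compile-time constant, so reading it triggers the interface's <clinit>.
    int LOADED_AT = IfaceLog.note("Config initialized");
}

class ConfigImpl implements Config {
    static { IfaceLog.note("ConfigImpl initialized"); }
    static void touch() {}
}

class InterfaceInitDemo {
    public static void main(String[] args) {
        ConfigImpl.touch();                   // initializes the class only
        System.out.println(IfaceLog.EVENTS);
        int position = Config.LOADED_AT;      // first use of the interface field
        System.out.println(IfaceLog.EVENTS + " " + position);
    }
}
```

After touch() only the ConfigImpl entry is in the log; the Config entry appears once its field is read.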

Use

Static variables and static blocks are handled by initialization, so what about instance variables and constructors? Those are handled by the <init> method, the instance constructor, which initializes the member variables (that is, instance variables) after the object has been created. The parent class's <init> method must be invoked explicitly before the rest of <init> executes.

Unloading

The conditions for unloading a type are strict:

2. The java.lang.Class object of the class is not referenced by any object or variable. Whenever the virtual machine loads a class into the method area, an object representing that class exists in the heap: its java.lang.Class object. This object is created when the class is loaded into the method area and cleared when the class is removed from the method area.

3. The ClassLoader that loaded the class has been reclaimed.