Let’s talk about class loading first and then the Class file structure. The class loading part is organized section by section, following the phases of the class life cycle diagram.

Class loading

1. Loading

Loading is one phase of the class loading process. During loading, the Java Virtual Machine does three things:

  • Gets the binary byte stream that defines a class by its fully qualified name.
  • Converts the static storage structure represented by that byte stream into the runtime data structure of the method area.
  • Generates a java.lang.Class object in memory that represents this class and serves as the access point for all of the class's data in the method area.

The Java Virtual Machine Specification's requirements for these three points are not especially specific, which leaves virtual machine implementations and Java applications a great deal of flexibility. For example, the rule "get the binary byte stream that defines a class by its fully qualified name" does not say where the stream must come from: it can be read from a ZIP package (the basis of the JAR, EAR, and WAR formats), fetched over the network, generated at run time (as with dynamic proxies), read from a database, and so on.

Array classes are a special case: the array class itself is not created by a class loader but is constructed directly in memory by the Java Virtual Machine. The element type of an array class (the type left after removing all dimensions), however, is still loaded by a class loader.
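A small sketch of how this shows up from Java code, relying on the documented behavior of Class.getClassLoader(): an array class reports the same loader as its element type, and an array of a primitive type reports null. The class names ArrayLoaderDemo and Element are hypothetical.

```java
// Sketch: array classes are built by the JVM but tied to their element type's loader.
class ArrayLoaderDemo {
    static class Element {}

    public static void main(String[] args) {
        // Element[] was never passed to defineClass; the JVM creates it on demand.
        Class<?> arrayClass = Element[].class;
        // Its reported loader is the loader that loaded the element type Element.
        System.out.println(arrayClass.getClassLoader() == Element.class.getClassLoader());
        // For a primitive element type there is no class loader, so null is returned.
        System.out.println(int[].class.getClassLoader());
    }
}
```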

2. Verification

The purpose of this phase is to ensure that the information contained in the byte stream of the Class file meets all the constraints of the Java Virtual Machine Specification, including the following four aspects:

  • File format verification: The first stage checks that the byte stream conforms to the Class file format specification and can be processed by the current version of the virtual machine.

    • Whether it starts with the magic number 0xCAFEBABE.
    • Whether the major and minor version numbers are within the range accepted by the current Java Virtual Machine.
    • Whether the constant pool contains constants of unsupported types (checked via each constant's tag).
    • Whether any index value that points to a constant refers to a constant that does not exist or has the wrong type.
    • Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8.
    • Whether any part of the Class file, or the file itself, has been truncated or has extra data appended.
    • And so on.
  • Metadata verification: The second stage performs semantic analysis of the information described by the bytecode to ensure that it complies with the requirements of the Java Language Specification.

    • Does this class have a parent class? (All classes except java.lang.Object should have a parent class.)
    • Whether this class's parent class is a class that may not be inherited from (a class declared final).
    • If the class is not abstract, does it implement all the methods required by its parent class or interfaces?
    • Whether fields or methods in the class conflict with the parent class (for example, overriding a final field of the parent class, or an illegal overload such as methods with identical parameters but different return types).
    • And so on.
  • Bytecode verification: The third stage is the most complex in the whole verification process. Its main purpose is to determine, through data-flow and control-flow analysis, that the program semantics are legal and logical. After the second stage has verified the data types in the metadata, this stage checks and analyzes the method bodies of the class (the Code attributes in the Class file) to ensure that the methods of the verified class will not do anything at run time that endangers the security of the virtual machine.

    • Ensure that at any moment the data types on the operand stack match the instruction sequence. For example, there must never be a case where an int is pushed onto the operand stack but later loaded into the local variable table as a long.
    • Ensure that no jump instruction jumps to a bytecode instruction outside of the method body.
    • Ensure that type conversions in the method body are always valid. For example, assigning a subclass object to a variable of its parent class type is safe, but assigning a parent class object to a variable of a subclass type, or to a completely unrelated type with no inheritance relationship, is dangerous and illegal.
    • And so on.
  • Symbolic reference verification: The fourth stage checks whether the class is missing, or is denied access to, the external classes, methods, fields, and other resources it depends on. This check actually takes place when the virtual machine converts symbolic references into direct references, that is, during the resolution phase.

    • Whether the fully qualified name described by a string in a symbolic reference can be used to find the corresponding class.
    • Whether the specified class contains fields and methods that match the field or method descriptor and simple name.
    • The accessibility (private, protected, public, package access) of the classes, fields, and methods in symbolic references.
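The file format checks can be observed from user code: handing malformed bytes to ClassLoader.defineClass makes the JVM throw ClassFormatError (UnsupportedClassVersionError is a subclass of it). This is a sketch; RawLoader, FormatCheckDemo, and the class name "Broken" are hypothetical, and the exact error messages depend on the JVM.

```java
// Sketch: the JVM's file format verification rejecting malformed class bytes.
class FormatCheckDemo {
    // Hypothetical helper loader that only exposes the protected defineClass method.
    static class RawLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Returns true if the JVM refuses the bytes with a ClassFormatError.
    static boolean rejected(byte[] bytes) {
        try {
            new RawLoader().define("Broken", bytes);
            return false;
        } catch (ClassFormatError e) {
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        // Wrong magic number: DEADBEEF instead of CAFEBABE.
        byte[] badMagic = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF, 0, 0, 0, 52};
        // Correct magic and version 52, but the stream ends before the constant pool.
        byte[] truncated = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(rejected(badMagic));
        System.out.println(rejected(truncated));
    }
}
```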

3. Preparation

The preparation phase allocates memory for class variables (that is, static variables, those declared with the static modifier) and sets their initial values. Conceptually, these variables live in the method area, but note that the method area is itself a logical concept. In JDK 7 and earlier, when HotSpot implemented the method area with the permanent generation, the implementation matched this logical concept exactly. In JDK 8 and later, class variables are stored in the Java heap together with the Class object, so "class variables are in the method area" is only a statement about the logical concept.

There are two easily confused points about the preparation phase worth emphasizing. First, memory is allocated only for class variables, not instance variables; instance variables are allocated in the Java heap along with the object when it is instantiated. Second, the initial value here is "normally" the zero value of the data type.

For example, given public static int value = 123;, the variable value holds 0, not 123, after the preparation phase, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is compiled into the class constructor <clinit>() method, so the assignment is only performed during class initialization. The "normally" covers one exception: a static final constant with a ConstantValue attribute, such as public static final int value = 123;, is set to 123 directly during preparation.
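The zero value left by preparation can be observed directly, because static initializers run in textual order inside <clinit>(). In this sketch (PrepareDemo and readValue are hypothetical names), the first initializer reads value before its own assignment has run.

```java
// Sketch: observing the zero value a static field holds after preparation.
class PrepareDemo {
    // <clinit>() runs initializers top to bottom, so this call happens before
    // "value = 123" and sees the zero value set during preparation.
    static int observed = readValue();
    static int value = 123;

    static int readValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(observed); // 0
        System.out.println(value);    // 123
    }
}
```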

4. Resolution

The resolution phase is the process by which the Java Virtual Machine replaces symbolic references in the constant pool with direct references.

  • Symbolic references: A symbolic reference uses a set of symbols to describe the referenced target. A symbol can be any form of literal, as long as it locates the target unambiguously. Symbolic references are independent of the memory layout of the virtual machine implementation, and the target of the reference has not necessarily been loaded into the virtual machine's memory yet. Different virtual machine implementations may use different memory layouts, but they must all accept the same symbolic references, because the literal form of symbolic references is clearly defined in the Class file format of the Java Virtual Machine Specification.
  • Direct references: A direct reference is a pointer to the target, a relative offset, or a handle that can indirectly locate the target. Direct references depend on the memory layout of the virtual machine implementation, so the same symbolic reference generally translates into different direct references in different virtual machine instances. If a direct reference exists, its target must already be present in the virtual machine's memory.

Resolution covers the following:

  • 1. Class or interface resolution
  • 2. Field resolution
  • 3. Method resolution
  • 4. Interface method resolution
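Resolution itself happens inside the JVM, but java.lang.invoke offers a loose analogy: a method is looked up by its owner class, simple name, and descriptor (the same textual form a Class file stores as a symbolic reference) and turned into a directly invocable MethodHandle. This is only an analogy, not the JVM's resolution code; ResolveDemo and twice are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Analogy sketch: class + name + descriptor (symbolic) -> callable handle (direct).
class ResolveDemo {
    static int twice(int x) {
        return 2 * x;
    }

    // The descriptor string uses the same notation as the Class file constant pool.
    static String descriptorOfTwice() {
        return MethodType.methodType(int.class, int.class).toMethodDescriptorString();
    }

    // Look up "twice" by owner class, simple name and type, then invoke the handle.
    static int callBySymbol(int arg) {
        try {
            MethodHandle handle = MethodHandles.lookup()
                    .findStatic(ResolveDemo.class, "twice", MethodType.methodType(int.class, int.class));
            return (int) handle.invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptorOfTwice()); // (I)I
        System.out.println(callBySymbol(21));    // 42
    }
}
```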

5. Initialization

The Java Virtual Machine Specification strictly defines six situations in which a class must be initialized immediately:

1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is executed and the type has not yet been initialized, its initialization must be triggered first. Typical Java code that generates these four instructions:

  • When you instantiate an object with the new keyword.
  • When reading or setting a static field of a type (except for static fields that are declared final and whose values were placed in the constant pool at compile time).
  • When a static method of a type is called.

2. When the methods of the java.lang.reflect package are used to make a reflective call on a type that has not yet been initialized, the type must be initialized first.

3. When initializing a class, if the parent class has not been initialized, trigger the initialization of the parent class first.

4. When the VM starts, the user needs to specify a main class (the class containing the main() method) to execute. The VM initializes this main class first.

5. When the dynamic language support added in JDK 7 is used, if a java.lang.invoke.MethodHandle instance ultimately resolves to a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class that the handle refers to has not been initialized, its initialization must be triggered first.

6. When an interface defines a default method, a feature added in JDK 8 (an interface method declared with the default keyword), the interface must be initialized before any class that implements it is initialized.

Only in the initialization phase does the Java Virtual Machine actually begin executing the Java code written in the class, handing control over to the application. In the preparation phase, class variables were already assigned the zero value required by the system once; in the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in code, which is done by executing the class constructor <clinit>() method.
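The triggers listed above can be contrasted with accesses that do not initialize a class. This sketch (all class names are hypothetical) logs each static initializer: reading a compile-time constant and creating an array do not initialize Child, and reading a static field inherited from Parent through Child initializes only Parent, the class that declares the field.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: which accesses trigger class initialization and which do not.
class InitTriggerDemo {
    static final List<String> log = new ArrayList<>();

    static class Parent {
        static { log.add("Parent"); }
        static int parentValue = 1;
    }

    static class Child extends Parent {
        static { log.add("Child"); }
        static final String CONSTANT = "hello"; // compile-time constant, inlined by javac
        static int childValue = 2;
    }

    private static List<String> cached;

    // Runs the scenario once (a class is initialized only once) and caches the snapshots.
    static synchronized List<String> steps() {
        if (cached == null) {
            List<String> out = new ArrayList<>();
            String s = Child.CONSTANT;  // no initialization: the value was copied into this class
            out.add(s + " " + log);
            Child[] arr = new Child[2]; // anewarray: Child is loaded but not initialized
            out.add(arr.length + " " + log);
            int p = Child.parentValue;  // field declared in Parent: only Parent initializes
            out.add(p + " " + log);
            int c = Child.childValue;   // getstatic on Child's own field: Child initializes
            out.add(c + " " + log);
            cached = out;
        }
        return cached;
    }

    public static void main(String[] args) {
        // Prints: "hello []", "2 []", "1 [Parent]", "2 [Parent, Child]"
        steps().forEach(System.out::println);
    }
}
```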

6. Classes and class loaders

Although a class loader is only used to perform the loading of classes, its role in a Java program goes well beyond the loading phase. For any class, its identity within the Java Virtual Machine is established jointly by the class loader that loaded it and the class itself; every class loader has its own independent class namespace. Put more plainly: comparing whether two classes are "equal" is only meaningful if both were loaded by the same class loader. Otherwise, even if they come from the same Class file and are loaded by the same Java Virtual Machine, they are not equal as long as their class loaders differ. "Equal" here includes the results of the Class object's equals(), isAssignableFrom(), and isInstance() methods, as well as checks made with the instanceof keyword.
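A sketch of the namespace rule: a hypothetical IsolatingLoader reads the .class bytes of a nested class Sample and defines them itself instead of delegating, so the resulting class has the same name but is a different Class object, and instanceof fails. It assumes the class files are reachable as resources through the parent loader (the normal case when running compiled classes from a classpath) and uses InputStream.readAllBytes, which requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch: the same .class bytes defined by two loaders give two different classes.
class NamespaceDemo {
    public static class Sample {}

    // Hypothetical loader: defines Sample itself from its .class bytes instead of delegating.
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(NamespaceDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Sample.class.getName())) {
                return super.loadClass(name, resolve); // every other class: normal delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String path = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(path)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                return c;
            }
        }
    }

    static Object isolatedSample() {
        try {
            return new IsolatingLoader().loadClass(Sample.class.getName())
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object other = isolatedSample();
        System.out.println(other.getClass().getName().equals(Sample.class.getName())); // same name
        System.out.println(other.getClass() == Sample.class); // but a different Class object
        System.out.println(other instanceof Sample);          // so instanceof is false
    }
}
```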

7. Parent delegation model, three-tier class loading architecture

(1) Three-tier class loaders: the bootstrap class loader, the extension class loader, and the application class loader.

  • Bootstrap class loader

Bootstrap Class Loader: This class loader is responsible for loading class libraries that are stored in the <JAVA_HOME>\lib directory, or in the path specified by the -Xbootclasspath parameter, and that the Java Virtual Machine recognizes (by file name, such as rt.jar and tools.jar). Libraries with unrecognized names are not loaded even if they are placed in the lib directory. Java programs cannot reference the bootstrap class loader directly; when writing a custom class loader, a user who needs to delegate a loading request to the bootstrap class loader can simply use null in its place.

  • Extension class loader

Extension Class Loader: This class loader is implemented in Java code as the sun.misc.Launcher$ExtClassLoader class. It is responsible for loading all class libraries in the <JAVA_HOME>\lib\ext directory, or in the paths specified by the java.ext.dirs system property. As the name "extension class loader" suggests, this is a mechanism for extending the Java system class libraries: the JDK team allows users to place general-purpose libraries in the ext directory to extend Java SE functionality. Since JDK 9, this extension mechanism has been replaced by the natural extensibility that modularity provides. Because the extension class loader is implemented in Java code, developers can use it directly in their programs to load Class files.

  • Application class loader

Application Class Loader: This class loader is implemented by sun.misc.Launcher$AppClassLoader. Because it is the return value of the getSystemClassLoader() method in the ClassLoader class, it is also sometimes called the "system class loader". It is responsible for loading all class libraries on the user classpath (ClassPath), and developers can also use it directly in their code. If an application has not defined its own class loader, this is generally the application's default class loader.
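To see the three tiers on a running JVM, one can walk the getParent() chain starting from ClassLoader.getSystemClassLoader(); the bootstrap loader appears as null at the top. A sketch with a hypothetical LoaderChainDemo class: on JDK 8 the loaders print as sun.misc.Launcher$AppClassLoader and ExtClassLoader, while on JDK 9 and later the middle tier is the platform class loader, so the printed names differ.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: walking the parent chain of the application (system) class loader.
class LoaderChainDemo {
    static List<String> chain(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader l = start; l != null; l = l.getParent()) {
            names.add(String.valueOf(l));
        }
        names.add("bootstrap (represented as null)"); // getParent() returns null at the top
        return names;
    }

    public static void main(String[] args) {
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes such as String come from the bootstrap loader, reported as null.
        System.out.println(String.class.getClassLoader());
    }
}
```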

(2) Parent delegation model

  • Workflow:

If a class loader receives a class loading request, it does not first try to load the class itself. Instead, it delegates the request to its parent class loader, and loaders at every level do the same, so every loading request should eventually reach the top-level bootstrap class loader. Only when the parent reports that it cannot complete the request (it did not find the required class in its search scope) does the child loader try to load the class itself.

  • Model advantage

Java classes, together with their class loaders, gain a prioritized hierarchy. For example, java.lang.Object, stored in rt.jar, is ultimately delegated to the bootstrap class loader at the top of the model no matter which class loader is asked to load it, so Object is guaranteed to be the same class in every class loader environment in the program. Conversely, without the parent delegation model, if each class loader loaded classes by itself and a user wrote a class named java.lang.Object and put it on the program's classpath, the system would contain several different Object classes. The most basic behavior of the Java type system could not be guaranteed, and applications would descend into chaos. If you are interested, try writing a Java class with the same name as an existing class in rt.jar: it compiles, but it can never be loaded and run.
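Besides delegation, the JDK adds a second guard: ClassLoader.defineClass refuses to define classes in java.* packages from an ordinary user loader and throws SecurityException before the bytes are even parsed. This sketch uses a hypothetical Definer loader and a hypothetical name demo.Fake, and passes only the 4-byte magic number so that a permitted name fails with ClassFormatError instead.

```java
// Sketch: a user class loader cannot define its own java.lang.Object.
class ProtectedNameDemo {
    // Hypothetical loader that exposes the protected defineClass method.
    static class Definer extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static String tryDefine(String name) {
        // Only the magic number: enough to show which check fires first.
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        try {
            new Definer().define(name, bytes);
            return "defined";
        } catch (SecurityException e) {
            return "SecurityException"; // rejected by package name before parsing
        } catch (ClassFormatError e) {
            return "ClassFormatError";  // name allowed, but the bytes are truncated
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(tryDefine("java.lang.Object")); // SecurityException
        System.out.println(tryDefine("demo.Fake"));        // ClassFormatError
        // Loading (rather than defining) java.lang.Object simply delegates upward.
        System.out.println(new Definer().loadClass("java.lang.Object") == Object.class); // true
    }
}
```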

Model implementation: the loadClass() method

loadClass() first checks whether the requested type has already been loaded. If not, it calls the parent loader's loadClass() method; if the parent is null, the bootstrap class loader is used as the parent by default. Only if the parent fails to load the class and throws ClassNotFoundException does the loader call its own findClass() method to try to load it.
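The three steps can be restated as a custom loader. This is a simplified sketch of the logic described above, not the JDK source: DelegatingLoader and its in-memory byte store are hypothetical, and delegation to the bootstrap loader is expressed with Class.forName(name, false, null), since user code cannot call the JDK's internal bootstrap lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified restatement of parent delegation in loadClass (not the JDK source).
class DelegatingLoader extends ClassLoader {
    private final Map<String, byte[]> store; // hypothetical in-memory class bytes

    DelegatingLoader(ClassLoader parent, Map<String, byte[]> store) {
        super(parent);
        this.store = store;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Has this loader already loaded the class?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Delegate upward; a null parent means the bootstrap loader.
                    ClassLoader parent = getParent();
                    c = (parent != null) ? parent.loadClass(name) : Class.forName(name, false, null);
                } catch (ClassNotFoundException e) {
                    // The parent could not find it: fall through to our own search.
                }
                // 3. Only now try this loader's own findClass().
                if (c == null) {
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = store.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }

    // Convenience: returns null instead of throwing ClassNotFoundException.
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        DelegatingLoader loader = new DelegatingLoader(null, new HashMap<>());
        System.out.println(tryLoad(loader, "java.lang.String") == String.class); // found by bootstrap
        System.out.println(tryLoad(loader, "com.example.Missing"));               // nobody finds it
    }
}
```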

8. Breaking the parent delegation model

The parent delegation chain is: custom class loader -> application class loader -> extension class loader -> bootstrap class loader. Breaking the parent delegation model means breaking this order or changing this chain.

  • Why break the parent delegation model?

For example, JDBC breaks the parent delegation model. Why? Since JDBC 4.0, we no longer need to call Class.forName() to load the driver; we just put the driver JAR on the project's classpath and the driver is loaded automatically. This auto-loading relies on SPI (Service Provider Interface), which database driver vendors have adopted: the META-INF/services directory of the driver JAR contains a file named java.sql.Driver that lists the fully qualified name of the Driver implementation. To create a connection to a database, one line is enough:

Connection con = DriverManager.getConnection(url, username, password);

Because each class loader can only load classes within its own scope, in some cases a parent class loader cannot load a required class, and the loading must instead be delegated to a child class loader.

The JDBC Driver interface is defined in the JDK, while its implementations are supplied by database vendors, for example the MySQL driver package. DriverManager, the class that loads and manages Driver implementations, is stored in JAVA_HOME/jre/lib/rt.jar and is therefore loaded by the bootstrap class loader. The Driver implementations, however, live in the vendors' JAR packages on the classpath. Under the class loading rule that when a loaded class references another class, the virtual machine loads the referenced class with the same class loader that loaded the first class, the bootstrap class loader would have to load the Driver implementations in those JARs. But by default the bootstrap class loader only loads classes from JAVA_HOME/jre/lib/rt.jar. Therefore the Driver implementations must be loaded by a child class loader (in practice the thread context class loader, which by default is the application class loader), and this breaks the parent delegation model.
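A sketch of the visibility problem and its workaround, using a hypothetical FakeDriver class in place of a real JDBC driver: a lookup through the bootstrap loader (passed as null to Class.forName) cannot see classpath classes, while the thread context class loader can. ServiceLoader.load(Class) is documented to use the thread context class loader, which is how SPI lookups reach vendor JARs.

```java
// Sketch: why bootstrap-level code needs a child loader to see application classes.
class ContextLoaderDemo {
    // Hypothetical stand-in for a vendor driver class on the application classpath.
    public static class FakeDriver {}

    static boolean visibleTo(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader); // a null loader means the bootstrap loader
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = FakeDriver.class.getName();
        // The bootstrap loader (which loads DriverManager) cannot see classpath classes.
        System.out.println(visibleTo(null, name));
        // The thread context class loader, by default the application loader, can.
        System.out.println(visibleTo(Thread.currentThread().getContextClassLoader(), name));
    }
}
```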

(2) Class file structure

Figure 2-1

1. Magic number: bytes 0~3 (u4)

Every Class file begins with the same four bytes, 0xCAFEBABE.

2. Minor version number: bytes 4~5 (u2)

3. Major version number: bytes 6~7 (u2)

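JDK 1.1 uses major version 45 (decimal), and each later JDK major release adds 1; for example, JDK 8 uses 45 + 8 - 1 = 52. A JVM can run Class files of earlier versions (backward compatible) but refuses to run Class files whose version is newer than it supports, so compatibility goes one way only.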
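A sketch of reading these first 8 bytes with DataInputStream, which reads big-endian values just as the Class file format stores them. ClassHeader and jdkRelease are hypothetical names; jdkRelease applies the 45 + N - 1 rule above, so it is only meaningful for releases that follow that numbering.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: reading magic, minor_version and major_version from the start of a Class file.
class ClassHeader {
    final int magic;
    final int minor;
    final int major;

    private ClassHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int magic = in.readInt();           // u4 at bytes 0~3
            int minor = in.readUnsignedShort(); // u2 at bytes 4~5
            int major = in.readUnsignedShort(); // u2 at bytes 6~7
            return new ClassHeader(magic, minor, major);
        } catch (IOException e) {
            throw new IllegalArgumentException("Truncated header", e);
        }
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    // JDK 1.1 = 45 and each release adds 1, so release N maps to 44 + N (JDK 8 -> 52).
    int jdkRelease() {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        ClassHeader h = parse(header);
        // Prints: true 0 52 JDK 8
        System.out.println(h.hasValidMagic() + " " + h.minor + " " + h.major + " JDK " + h.jdkRelease());
    }
}
```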

4. Constant pool: a u2 count followed by the constant entries

Bytes 8~9 hold the constant pool count (constant_pool_count). In Figure 2-1 it is 0x0016, i.e. 22 in decimal. Index 0 is reserved and left empty, so the pool holds 21 constants, at indices 1~21. Constants fall into two main categories: literals and symbolic references.

  • Literals: constants at the Java language level, such as text strings and the values of fields declared final.
  • Symbolic references include:
    • Packages exported or opened by modules
    • Fully qualified names of classes and interfaces
    • Field names and descriptors
    • Method names and descriptors
    • Method handles, method types, and invokedynamic entries (Method Handle, Method Type, Invoke Dynamic)
    • Dynamically computed call sites and constants (Dynamically-Computed Call Site and Dynamically-Computed Constant)

Figure 2-2

From byte 10 onward there are no fixed offsets, because each constant's length depends on its type. Reading by the figure: the tag of the first constant is a u1; the byte at offset 0x0A is 0x07, which corresponds to the CONSTANT_Class_info type. The structure of CONSTANT_Class_info is a u1 tag followed by a u2 name_index. The next two bytes, at offsets 0x0B~0x0C, are 0x0002, so name_index points to the second constant in the constant pool.
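The same reading steps as a sketch: skip the 8 header bytes, read constant_pool_count, then read the first entry's tag and, if it is CONSTANT_Class_info (tag 7), its name_index. The byte array in main is hypothetical; it reproduces only the values quoted from the figures (0x0016, 0x07, 0x0002), with an assumed header in front.

```java
import java.nio.ByteBuffer;

// Sketch: reading constant_pool_count and the first constant-pool entry.
class ConstantPoolPeek {
    static final int CONSTANT_Class = 7;

    // Returns {constant_pool_count, tag of entry #1, name_index or -1 if entry #1 is not a Class}.
    static int[] peek(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);         // big-endian by default
        buf.position(8);                                      // skip magic and version numbers
        int count = Short.toUnsignedInt(buf.getShort());      // u2 at bytes 8~9
        int tag = Byte.toUnsignedInt(buf.get());              // u1 at offset 0x0A
        int nameIndex = -1;
        if (tag == CONSTANT_Class) {
            nameIndex = Short.toUnsignedInt(buf.getShort());  // u2 name_index at 0x0B~0x0C
        }
        return new int[] {count, tag, nameIndex};
    }

    public static void main(String[] args) {
        // Hypothetical bytes: assumed header, then the values read from the figures.
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52,
                        0x00, 0x16, 0x07, 0x00, 0x02};
        int[] r = peek(bytes);
        // Prints: count=22 usable=21 tag=7 name_index=2
        System.out.println("count=" + r[0] + " usable=" + (r[0] - 1) + " tag=" + r[1] + " name_index=" + r[2]);
    }
}
```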

5. Access flags: u2

Immediately after the constant pool, the next two bytes are the access flags (access_flags), which identify class- or interface-level access information, including: whether this Class is a class or an interface; whether it is declared public; whether it is declared abstract; if it is a class, whether it is declared final; and so on. Flags are combined with bitwise OR, for example 0x0001 | 0x0020 = 0x0021.
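Because each flag is a single bit, decoding access_flags is a matter of testing each bit. A sketch with a hypothetical AccessFlags class; the flag values are the class-level access_flags values from the Java Virtual Machine Specification (ACC_MODULE was added with Java 9).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: decoding a class's access_flags bit by bit.
class AccessFlags {
    // Class-level access_flags values from the JVM Specification, in bit order.
    private static final Map<Integer, String> CLASS_FLAGS = new LinkedHashMap<>();
    static {
        CLASS_FLAGS.put(0x0001, "ACC_PUBLIC");
        CLASS_FLAGS.put(0x0010, "ACC_FINAL");
        CLASS_FLAGS.put(0x0020, "ACC_SUPER");
        CLASS_FLAGS.put(0x0200, "ACC_INTERFACE");
        CLASS_FLAGS.put(0x0400, "ACC_ABSTRACT");
        CLASS_FLAGS.put(0x1000, "ACC_SYNTHETIC");
        CLASS_FLAGS.put(0x2000, "ACC_ANNOTATION");
        CLASS_FLAGS.put(0x4000, "ACC_ENUM");
        CLASS_FLAGS.put(0x8000, "ACC_MODULE");
    }

    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : CLASS_FLAGS.entrySet()) {
            if ((flags & e.getKey()) != 0) {
                names.add(e.getValue());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        int flags = 0x0001 | 0x0020;                     // flags are combined with bitwise OR
        System.out.println(Integer.toHexString(flags)); // 21
        System.out.println(decode(flags));               // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601));              // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```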

6. Class index, superclass index, and interface index collection: u2 + u2 (u4 in total), then a u2 count followed by u2 entries

this_class and super_class are each a u2, and interfaces is a collection of u2 values; together they determine the inheritance relationships of the type in this Class file. The class index determines the fully qualified name of this class, and the superclass index determines the fully qualified name of its parent class. Because the Java language does not allow multiple inheritance, there is only one superclass index. Every Java class except java.lang.Object has a parent class, so every Java class except java.lang.Object has a non-zero superclass index (for java.lang.Object it is 0). The interface index collection describes which interfaces this class implements, listed in order from left to right after the implements keyword (or after the extends keyword if the Class file describes an interface).
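Reflection exposes roughly the same information, which makes for a quick check. In this sketch (all type names are hypothetical), getSuperclass() reflects super_class, getInterfaces() is documented to return interfaces in the order they appear in the implements clause, and Object.class.getSuperclass() is null, matching its zero superclass index. This is an approximation through the reflection API, not a Class file parser.

```java
import java.util.Arrays;

// Sketch: this_class, super_class and interfaces as seen through reflection.
class HierarchyDemo {
    interface Walker {}
    interface Swimmer {}
    static class Animal {}
    static class Duck extends Animal implements Swimmer, Walker {}

    public static void main(String[] args) {
        System.out.println(Duck.class.getName());                        // this class
        System.out.println(Duck.class.getSuperclass().getName());        // its single superclass
        System.out.println(Arrays.toString(Duck.class.getInterfaces())); // Swimmer, then Walker
        System.out.println(Object.class.getSuperclass());                // null
    }
}
```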

7. Collection of field tables

The field table (field_info) describes variables declared in an interface or class. A "field" in the Java language includes class-level variables and instance-level variables, but not local variables declared inside methods. Recall what information a field can carry in the Java language: its scope (the public, private, and protected modifiers), whether it is an instance or a class variable (static), its mutability (final), concurrency visibility (volatile, whether reads and writes are forced through main memory), whether it is serialized (the transient modifier), its data type (primitive type, object, or array), and its name. Each of these modifiers is Boolean: it is either present or absent, which makes flag bits a natural representation. The field's name and data type, by contrast, have no fixed length, so they can only be described by references to constants in the constant pool.

8. Collection of method tables

The method table is structured exactly like the field table. Because the volatile and transient keywords cannot modify methods, the ACC_VOLATILE and ACC_TRANSIENT flags are absent from the method table's access flags. Conversely, because the synchronized, native, strictfp, and abstract keywords can modify methods, the ACC_SYNCHRONIZED, ACC_NATIVE, ACC_STRICTFP, and ACC_ABSTRACT flags are added to the method table's access flags.
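The field and method access flags described above can be inspected through getModifiers() and decoded with java.lang.reflect.Modifier. A sketch with hypothetical names (MemberFlagsDemo, Counter): volatile and transient show up only on fields, synchronized only on the method. At the Class file level the bits overlap: 0x0040 is ACC_VOLATILE on a field but ACC_BRIDGE on a method, and 0x0080 is ACC_TRANSIENT on a field but ACC_VARARGS on a method.

```java
import java.lang.reflect.Modifier;

// Sketch: field and method access flags as reported by reflection.
class MemberFlagsDemo {
    static class Counter {
        private volatile int count;     // field flags: ACC_PRIVATE | ACC_VOLATILE
        transient String cache;         // field flags: ACC_TRANSIENT
        synchronized void increment() { // method flags: ACC_SYNCHRONIZED
            count++;
        }
    }

    static int fieldModifiers(String name) {
        try {
            return Counter.class.getDeclaredField(name).getModifiers();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static int methodModifiers(String name) {
        try {
            return Counter.class.getDeclaredMethod(name).getModifiers();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Modifier.toString(fieldModifiers("count")));      // private volatile
        System.out.println(Modifier.toString(fieldModifiers("cache")));      // transient
        System.out.println(Modifier.toString(methodModifiers("increment"))); // synchronized
    }
}
```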

9. Attribute table collection

Class files, field tables, and method tables can all carry their own collections of attribute tables to describe information specific to certain scenarios.

Panda Notebook