The author | TongHui

Java development engineer at Almond, focused on low-level technology.

Introduction: We all know that to run a Java file containing a main method, we compile it into a class file, load it into the JVM, and then run it. But this raises some questions: what exactly is in the compiled class file, and how does the JVM execute it? Let’s walk through a very simple example to see how the JVM works.

1. Preparation

The following are examples of Java files and class files:

Java file:

Class file:

2. Structure of the class file

As you can see, a class file really is just a sequence of bytes, which is why it is also called a bytecode file. Because the file is a byte stream, any value wider than one byte is stored in big-endian order, with the most significant byte first (little-endian order would store the least significant byte first). So what do these bytes mean, and how does the JVM parse them? Oracle defines the structure of a class file:

As you can see, every byte in a class file has a specific meaning. For example, the first four bytes are the magic number (CAFEBABE), which is the same for all class files. The next four bytes are the version number: two bytes for the minor version followed by two bytes for the major version. Another example is constant_pool, an array of cp_info entries; this is the constant pool, a very important field that I’ll focus on later.
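As a minimal sketch of how the start of a class file could be read, the Java snippet below checks the magic number and reads the two version items. The class name ClassFileHeader and its parse method are illustrative assumptions, not part of the JDK; java.nio.ByteBuffer is used because it reads multi-byte values in big-endian order by default.

```java
import java.nio.ByteBuffer;

// Illustrative sketch (hypothetical class name): reads the fixed-size start of a class file.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {               // u4 magic
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = in.getShort() & 0xFFFF;            // u2 minor_version
        int major = in.getShort() & 0xFFFF;            // u2 major_version
        return new ClassFileHeader(minor, major);
    }

    public static void main(String[] args) {
        // First eight bytes of a class file compiled for Java SE 8 (major version 52).
        byte[] start = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        ClassFileHeader header = parse(start);
        System.out.println(header.majorVersion + "." + header.minorVersion); // prints "52.0"
    }
}
```

The `& 0xFFFF` mask is needed because Java’s short is signed, while the u2 items in a class file are unsigned.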

If you need to see what each field represents, refer to The Java Virtual Machine Specification: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.1

The above structure may seem abstract, so take a look at the following diagram:

As you can imagine, because every byte in a class file has a fixed meaning, the JVM can work out exactly what the class file contains, such as its method and field information, from the format alone.

3. Important components of the class file

Now that we know the structure of a class file, let’s look at some of the important components of a class file.

3.1 Constant pool

The constant pool is the cp_info field in ClassFile you saw earlier. Let’s take an intuitive look at what a constant pool looks like:

This is the constant pool of HelloWorld.class. The constant pool is preceded by a two-byte count, constant_pool_count, whose value is the number of entries plus one; because the count is only two bytes, a constant pool can hold at most 65534 entries. The count is followed by the entries themselves. Each entry starts with a 1-byte tag that indicates what kind of constant it is. For example, a tag of 0A means the entry is a MethodRef constant.

In technical terms, the constant pool holds literals and symbolic references. Literals are things like text strings and constant values declared final. Symbolic references come in three kinds: fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors.

Note that constant pool indices start at 1, so other structures can use an index of 0 to indicate that they reference no constant pool entry.

To summarize the description above, all constant pool entries share the following general format:

cp_info {
   u1 tag;
   u1 info[];
}

In the constant pool, every cp_info entry (that is, every constant entry) follows this format: it starts with a single-byte tag that identifies the entry type, and the content of the info[] that follows is determined by the tag.

The types of tags are as follows:

Some common constants:

Class Info:

CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}
  • The value of tag is 7

  • name_index is the index of a constant pool entry, which must be of UTF8 Info type and holds the fully qualified name of the class or interface

UTF8 Info:

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}
  • The value of tag is 1

  • length is the number of bytes in the encoded string

  • bytes[length] holds the bytes of the string, in a modified UTF-8 encoding

Note: Method names and field names in a class file refer to UTF8 Info entries, and the length item of UTF8 Info is 2 bytes, so a method or field name can be at most 65535 bytes long when encoded.

String Info:

CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}
  • The value of tag is 8

  • string_index is the index of a constant pool entry, which must be of UTF8 Info type and holds the content of the string

Field_Ref Info:

CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
  • The value of tag is 9

  • class_index is the index of a constant pool entry, which must be of Class Info type

  • name_and_type_index is the index of a constant pool entry, which must be of NameAndType Info type

Method_Ref Info:

CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
  • The value of tag is 10

  • class_index is the index of a constant pool entry, which must be of Class Info type

  • name_and_type_index is the index of a constant pool entry, which must be of NameAndType Info type

NameAndType Info:

CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
}
  • The value of tag is 12

  • name_index is the index of a constant pool entry of UTF8 Info type that holds the field or method name

  • descriptor_index is the index of a constant pool entry of UTF8 Info type that holds the field or method descriptor
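To make the entry formats above concrete, here is a minimal sketch of a constant pool reader in Java. It handles only the six entry types described in this section, and the class name ConstantPoolSketch and the printable output format are my own illustrative choices. It assumes the input starts at the two-byte count. DataInputStream.readUTF is convenient here because it reads a 2-byte length followed by modified UTF-8 bytes, which is the layout of a UTF8 Info entry after its tag.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch (hypothetical names): parses the six entry types described above.
final class ConstantPoolSketch {
    // Returns a printable form of each entry; slot 0 stays null because indices start at 1.
    static String[] parse(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readUnsignedShort();            // number of entries + 1
            String[] pool = new String[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();           // u1 tag
                switch (tag) {
                    case 1:  pool[i] = "Utf8 " + in.readUTF(); break;
                    case 7:  pool[i] = "Class #" + in.readUnsignedShort(); break;
                    case 8:  pool[i] = "String #" + in.readUnsignedShort(); break;
                    case 9:  pool[i] = "Fieldref #" + in.readUnsignedShort()
                                     + ".#" + in.readUnsignedShort(); break;
                    case 10: pool[i] = "Methodref #" + in.readUnsignedShort()
                                     + ".#" + in.readUnsignedShort(); break;
                    case 12: pool[i] = "NameAndType #" + in.readUnsignedShort()
                                     + ":#" + in.readUnsignedShort(); break;
                    default:
                        // Other tags exist (Long and Double even occupy two slots);
                        // this sketch deliberately stops instead of guessing.
                        throw new IllegalArgumentException("tag " + tag + " is not handled by this sketch");
                }
            }
            return pool;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-built pool with three entries: Utf8 "Hi", Class #1, String #1.
        byte[] data = {0, 4, 1, 0, 2, 'H', 'i', 7, 0, 1, 8, 0, 1};
        String[] pool = parse(data);
        for (int i = 1; i < pool.length; i++) {
            System.out.println("#" + i + " = " + pool[i]);
        }
    }
}
```

A real parser would keep typed entries rather than strings; strings are used here only to keep the sketch short.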

3.2 Fields

As with the constant pool, the number of fields varies from class to class, so the field section starts with two bytes giving the number of fields in the class file, followed by the fields themselves.

Let’s take a look at the structure of a field:

    field_info {
        u2 access_flags;
        u2 name_index;
        u2 descriptor_index;
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }
  • access_flags holds the access modifiers of the field; it works like the access flags of a class, but the set of allowed flags is different

The access flags of a field

  • name_index is the index of the constant pool entry that holds the field name

  • descriptor_index is the index of the constant pool entry that holds the field descriptor

  • attributes_count is the number of attributes of the field

  • attributes[attributes_count] holds the attributes of the field
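As a hedged sketch of how the access flags of a field can be decoded, the Java snippet below tests each flag bit in turn. The mask values are the ones listed for fields in section 4.5 of the JVM specification (Java SE 8); the class name FieldAccessFlags and the describe method are illustrative names of my own.

```java
import java.util.StringJoiner;

// Illustrative sketch (hypothetical class name): turns a field's access flags into text.
final class FieldAccessFlags {
    // Masks and names for fields, as listed in section 4.5 of the JVM specification (Java SE 8).
    private static final int[] MASKS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080, 0x1000, 0x4000
    };
    private static final String[] NAMES = {
        "public", "private", "protected", "static", "final",
        "volatile", "transient", "synthetic", "enum"
    };

    static String describe(int accessFlags) {
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {   // each flag is a single bit
                joined.add(NAMES[i]);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0019));      // prints "public static final"
    }
}
```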

Note: A field descriptor encodes the type of the field. It is not the type name you write in source code, such as int or String, but a shorthand built from a few characters, like this:

  

So, for example, if the field type is String, its descriptor is Ljava/lang/String; and if the field type is int[][], its descriptor is [[I
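The descriptor shorthand can be sketched in Java as a small decoder that turns a field descriptor back into a source-like type name. The character codes follow section 4.3.2 of the JVM specification; the class name FieldDescriptors and the toTypeName method are hypothetical names used only for this illustration.

```java
// Illustrative sketch (hypothetical class name): decodes field descriptors.
final class FieldDescriptors {
    // Converts a field descriptor such as "[[I" into a source-like name such as "int[][]".
    static String toTypeName(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {   // each '[' adds one array dimension
            dims++;
        }
        String element = descriptor.substring(dims);
        String base;
        switch (element.charAt(0)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':                              // L<class name>; with '/' as separator
                base = element.substring(1, element.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder name = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            name.append("[]");
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTypeName("Ljava/lang/String;"));   // prints "java.lang.String"
        System.out.println(toTypeName("[[I"));                  // prints "int[][]"
    }
}
```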

Field attributes have the same structure as method attributes, which are described below.

3.3 Methods

Like the fields, the method section starts with a count of the methods, followed by the methods themselves.

Again, let’s look at the structure of a method:

    method_info {
        u2 access_flags;
        u2 name_index;
        u2 descriptor_index;
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }
  • access_flags plays the same role as for fields, but the set of allowed flags is different. The access_flags of a method can take the following values:

  • name_index has the same meaning as for fields; it gives the name of the method

  • descriptor_index also has the same meaning as for fields, except that the descriptor has a different form. Let’s see what it looks like:

A method descriptor is made up of two parts: the parameter descriptors and the return descriptor. It takes the following form:

( ParameterDescriptor* ) ReturnDescriptor

Each parameter descriptor is a field descriptor. The return descriptor is also a field descriptor, with one additional possibility, void, whose descriptor is:

VoidDescriptor: V

So, for example, if a method is declared as

Object m(int i, double d, Thread t) {..}

then its descriptor is

(IDLjava/lang/Thread;)Ljava/lang/Object;
  • attributes_count is the number of attributes, as for fields

  • attributes[attributes_count] holds the attributes themselves, as for fields; their number is given by attributes_count
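The method descriptor format described above can be taken apart with a short scan over the string. The following is a minimal sketch, assuming a well-formed descriptor; the class name MethodDescriptors and its two methods are illustrative, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical class name): splits a method descriptor into its parts.
final class MethodDescriptors {
    // Returns the parameter part as a list of individual field descriptors.
    static List<String> parameterDescriptors(String descriptor) {
        List<String> params = new ArrayList<>();
        int i = 1;                                     // skip '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') {      // array dimensions
                i++;
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);        // an object type ends at ';'
            }
            i++;                                       // consume the base-type char or ';'
            params.add(descriptor.substring(start, i));
        }
        return params;
    }

    // Returns everything after ')', which is the return descriptor.
    static String returnDescriptor(String descriptor) {
        return descriptor.substring(descriptor.indexOf(')') + 1);
    }

    public static void main(String[] args) {
        String d = "(IDLjava/lang/Thread;)Ljava/lang/Object;";
        System.out.println(parameterDescriptors(d));   // prints "[I, D, Ljava/lang/Thread;]"
        System.out.println(returnDescriptor(d));       // prints "Ljava/lang/Object;"
    }
}
```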

3.4 Attributes

3.4.1 Attribute structure

The attribute data structure can appear in the class file itself, in the field table, and in the method table. Some attributes are specific to one of these locations, and some are shared by all three.

The attributes are described as follows:

Instead of explaining each attribute in detail, let’s look at the most important attribute in the method table, the Code attribute. It is important because the code of our methods lives in it (what it actually stores is instructions). For the other attributes, refer to the JVM specification from Oracle: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7-300

3.4.2 Code Attribute

First look at the structure of the Code Attribute

Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

As you can see, the Code attribute is quite complex. Let’s briefly explain the meaning of each member:

  • attribute_name_index is the index of a constant pool entry, which must be of UTF8 Info type and have the value “Code”

  • attribute_length is the length of the attribute in bytes, not counting the first six bytes (attribute_name_index and attribute_length themselves)

  • max_stack is the maximum depth of the operand stack in the stack frame created at run time for the method that owns this Code attribute

  • max_locals is the size of the local variable table

  • code_length is the length in bytes of the compiled bytecode of the method that owns this Code attribute

  • code[code_length] holds the bytecode itself. The length of a Java method is therefore limited: although code_length is a 4-byte item, the specification requires its value to be less than 65536, so the compiled bytecode of one method can be at most 65535 bytes. A method should not be too long, or it will not compile

  • exception_table_length is the number of entries in the exception table, that is, the number of exception handlers (such as catch blocks) in the method

  • exception_table[exception_table_length] holds the exception handler entries; each one covers the code range [start_pc, end_pc) and gives the start of the handler (handler_pc) and the exception type it catches (catch_type)

  • attributes_count is the number of sub-attributes of the Code attribute; attributes are complicated precisely because they can be nested like this

  • attributes[attributes_count] holds the sub-attributes themselves
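Here is a rough sketch of how the members listed above could be read in order. It assumes the buffer is positioned just after attribute_name_index and attribute_length, skips the exception table entries, and does not parse the nested attributes. The class name CodeAttributeSketch is hypothetical, and the sample bytes in main are hand-written for illustration rather than copied from a real class file.

```java
import java.nio.ByteBuffer;

// Illustrative sketch (hypothetical class name): reads the body of a Code attribute.
final class CodeAttributeSketch {
    final int maxStack;
    final int maxLocals;
    final byte[] code;
    final int exceptionTableLength;
    final int attributesCount;

    private CodeAttributeSketch(int maxStack, int maxLocals, byte[] code,
                                int exceptionTableLength, int attributesCount) {
        this.maxStack = maxStack;
        this.maxLocals = maxLocals;
        this.code = code;
        this.exceptionTableLength = exceptionTableLength;
        this.attributesCount = attributesCount;
    }

    // Expects the buffer to start at max_stack, i.e. after the six-byte attribute header.
    static CodeAttributeSketch parseBody(ByteBuffer in) {
        int maxStack = in.getShort() & 0xFFFF;                  // u2 max_stack
        int maxLocals = in.getShort() & 0xFFFF;                 // u2 max_locals
        int codeLength = in.getInt();                           // u4 code_length
        if (codeLength <= 0 || codeLength >= 65536) {
            throw new IllegalArgumentException("code_length out of range: " + codeLength);
        }
        byte[] code = new byte[codeLength];
        in.get(code);                                           // u1 code[code_length]
        int exceptionTableLength = in.getShort() & 0xFFFF;      // u2 exception_table_length
        in.position(in.position() + exceptionTableLength * 8);  // skip entries: four u2 each
        int attributesCount = in.getShort() & 0xFFFF;           // u2 attributes_count
        return new CodeAttributeSketch(maxStack, maxLocals, code,
                                       exceptionTableLength, attributesCount);
    }

    public static void main(String[] args) {
        // Hand-written body: max_stack 2, max_locals 1, nine code bytes, no handlers, no sub-attributes.
        byte[] body = {0, 2, 0, 1, 0, 0, 0, 9,
                       (byte) 0xB2, 0, 2, 0x12, 3, (byte) 0xB6, 0, 4, (byte) 0xB1,
                       0, 0, 0, 0};
        CodeAttributeSketch attr = parseBody(ByteBuffer.wrap(body));
        System.out.println(attr.code.length);   // prints "9"
    }
}
```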

The Code attribute of HelloWorld.class:

3.4.3 Two sub-attributes of the Code attribute

Here are two additional sub-attributes of the Code attribute. Have you ever wondered why, when a program fails in an IDE, the IDE can pinpoint exactly which line of code went wrong? Or why, when we call a method in the IDE, we can see its parameter names, and why we can look up variable values by name while debugging? The key lies in these two sub-attributes of the Code attribute.

LineNumberTable

The structure of LineNumberTable

LineNumberTable_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 line_number_table_length;
    {   u2 start_pc;
        u2 line_number;
    } line_number_table[line_number_table_length];
}

start_pc is an index into the code[] array of the Code attribute, and line_number is the corresponding line number in the source file.
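To illustrate how such a table can be used, the sketch below looks up the source line for a given position in code[]: it picks the entry with the largest start_pc that does not exceed that position. Passing the table as two parallel arrays and the names LineNumbers and lineFor are simplifications of my own.

```java
// Illustrative sketch (hypothetical class name): maps a code[] index to a source line.
final class LineNumbers {
    // startPcs[i] and lineNumbers[i] together form entry i of line_number_table.
    // Returns -1 when no entry covers the given pc.
    static int lineFor(int pc, int[] startPcs, int[] lineNumbers) {
        int bestStart = -1;
        int line = -1;
        for (int i = 0; i < startPcs.length; i++) {
            // Keep the entry that starts closest to pc without going past it.
            if (startPcs[i] <= pc && startPcs[i] > bestStart) {
                bestStart = startPcs[i];
                line = lineNumbers[i];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[] startPcs = {0, 8};        // made-up table: code[0..7] is line 3, code[8..] is line 4
        int[] lineNumbers = {3, 4};
        System.out.println(lineFor(5, startPcs, lineNumbers));   // prints "3"
        System.out.println(lineFor(8, startPcs, lineNumbers));   // prints "4"
    }
}
```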

LocalVariableTable

The structure of LocalVariableTable

LocalVariableTable_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 local_variable_table_length;
    {   u2 start_pc;
        u2 length;
        u2 name_index;
        u2 descriptor_index;
        u2 index;
    } local_variable_table[local_variable_table_length];
}

Each entry of local_variable_table[local_variable_table_length] has the following members:

  • start_pc and length give the range of code[] indices in which the local variable has a value ([start_pc, start_pc + length))

  • name_index is the constant pool index of the variable name

  • descriptor_index is the constant pool index of the variable descriptor

  • index is the index of this local variable in the local variable table

The LocalVariableTable attribute describes the relationship between the slots of the local variable table in a stack frame and the variables defined in the Java source code. This lets other people who reference the method see its parameter names, and it lets a debugger look up values from the context by name.
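As a small sketch of the lookup a debugger might perform, the snippet below finds the source name of the variable that occupies a given slot at a given position in code[]. The Entry class is a simplified stand-in of my own: it stores the name directly instead of a constant pool index, and it leaves out the descriptor.

```java
// Illustrative sketch (hypothetical names): resolves a local variable slot to its source name.
final class LocalVariables {
    // Simplified entry: the name is already resolved from the constant pool.
    static final class Entry {
        final int startPc;
        final int length;
        final String name;
        final int slot;      // the "index" item: position in the local variable table

        Entry(int startPc, int length, String name, int slot) {
            this.startPc = startPc;
            this.length = length;
            this.name = name;
            this.slot = slot;
        }
    }

    // Returns the variable name for the slot at pc, or null if no entry covers it.
    static String nameAt(Entry[] table, int slot, int pc) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.startPc + e.length;   // [start_pc, start_pc + length)
            if (e.slot == slot && inRange) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry[] table = {new Entry(0, 9, "args", 0), new Entry(2, 7, "msg", 1)};   // made-up entries
        System.out.println(nameAt(table, 1, 4));   // prints "msg"
    }
}
```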

4. Execution engine

After the JVM has parsed the class file, it transforms it into a runtime structure, stores it in the method area (implemented as the permanent generation in HotSpot before Java 8), and creates a Class object that provides an interface for accessing the class data.

When executing a program, the JVM always starts with the main method: it finds main among the methods of the class, takes the bytecode of the method body from the Code attribute of main, and hands it to the execution engine. So to understand how the JVM executes code, you need to know something about bytecode.

4.1 Runtime stack frame structure

Let’s take a look at the runtime structure of the JVM:

Because the JVM is a stack-based virtual machine, almost all operations are carried out on a stack. Execution starts with main: a stack frame for main is created first, and then the instructions of main (from its Code attribute) are executed. Calling a method creates a new stack frame and pushes it, and when a method finishes, its frame (the top one) is popped.

4.2 JVM instructions

No matter what methods you write in a Java source file or how sophisticated the algorithms are, the compiler translates them one by one into the class file, and the bytes in the code[] array of each Code attribute are the bytecode instructions compiled from that method.

The instructions supported by the JVM can be roughly divided into three categories: those with no operands, those with one operand, and those with two operands. Because the JVM represents an opcode in a single byte, there can be at most 256 instructions.
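To show how the operand bytes determine where the next instruction starts, here is a toy disassembler covering just six opcodes. The opcode values come from the JVM specification; the class name TinyDisassembler, the output format, and the constant pool indices in the sample bytes are illustrative assumptions.

```java
// Illustrative sketch (hypothetical class name): decodes a handful of opcodes only.
final class TinyDisassembler {
    static String disassemble(byte[] code) {
        StringBuilder out = new StringBuilder();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;                 // mask: Java bytes are signed
            out.append(pc).append(": ");
            switch (opcode) {
                // No operand bytes: the instruction is one byte long.
                case 0x2A: out.append("aload_0"); pc += 1; break;
                case 0xB1: out.append("return"); pc += 1; break;
                // One operand byte: a short constant pool index.
                case 0x12: out.append("ldc #").append(code[pc + 1] & 0xFF); pc += 2; break;
                // Two operand bytes: a full constant pool index.
                case 0xB2: out.append("getstatic #").append(index(code, pc)); pc += 3; break;
                case 0xB6: out.append("invokevirtual #").append(index(code, pc)); pc += 3; break;
                case 0xB7: out.append("invokespecial #").append(index(code, pc)); pc += 3; break;
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Combines the two operand bytes into one index: (indexbyte1 << 8) | indexbyte2.
    private static int index(byte[] code, int pc) {
        return ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
    }

    public static void main(String[] args) {
        // Shape of a typical "print a string" method body; the indices are made up.
        byte[] code = {(byte) 0xB2, 0, 2, 0x12, 3, (byte) 0xB6, 0, 4, (byte) 0xB1};
        System.out.print(disassemble(code));
    }
}
```

For real class files, the javap -c tool shipped with the JDK produces this kind of listing.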

The general form of JVM instructions is as follows:

4.3 A closer look at several common instructions

The JVM has too many instructions to cover them all here, so I picked a few to look at.

4.3.1 invokespecial

invokespecial is used to call instance methods that need special handling: superclass methods, private methods, and instance initialization methods.

indexbyte1 and indexbyte2 are combined into a constant pool index, (indexbyte1 << 8) | indexbyte2. The constant pool entry at that index must be of MethodRef Info type. The instruction also creates a new stack frame, pops the arguments of the called method off the current operand stack, and places them in the local variable table of the new frame.

4.3.2 aload_n

aload_n loads a reference-type value from the local variable table onto the operand stack. The value of n (0 to 3) determines which slot of the local variable table of the current stack frame is loaded.

4.3.3 astore_n

astore_n pops a reference-type value off the operand stack and stores it in the local variable table, in the slot determined by the value of n.

That’s all for the instructions covered here. For the others, see the instruction set chapter of the JVM specification from Oracle, which describes each instruction in detail.

So the job of the execution engine is to carry out the behavior defined for each instruction.
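As a very rough sketch of what such an execution engine looks like at its core, the loop below fetches one opcode at a time and applies its effect to a stack frame. It supports only aload_n, astore_n, and return. The class and member names are illustrative; this is a simplified model written for this section, not the code of the Mini-JVM project or of a real JVM.

```java
// Illustrative sketch (hypothetical names): a fetch-and-dispatch loop for three kinds of instruction.
final class MiniInterpreter {
    // One stack frame: a local variable table plus an operand stack.
    static final class Frame {
        final Object[] locals;
        final Object[] operands;
        int sp;                                   // number of values on the operand stack

        Frame(int maxLocals, int maxStack) {
            this.locals = new Object[maxLocals];
            this.operands = new Object[maxStack];
        }

        void push(Object value) { operands[sp++] = value; }

        Object pop() { return operands[--sp]; }
    }

    static void run(byte[] code, Frame frame) {
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF;                       // fetch the next opcode
            if (opcode >= 0x2A && opcode <= 0x2D) {               // aload_0 .. aload_3
                frame.push(frame.locals[opcode - 0x2A]);
            } else if (opcode >= 0x4B && opcode <= 0x4E) {        // astore_0 .. astore_3
                frame.locals[opcode - 0x4B] = frame.pop();
            } else if (opcode == 0xB1) {                          // return
                return;
            } else {
                throw new UnsupportedOperationException("opcode not covered by this sketch: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        Frame frame = new Frame(2, 1);
        frame.locals[0] = "hello";
        run(new byte[] {0x2A, 0x4C, (byte) 0xB1}, frame);         // aload_0, astore_1, return
        System.out.println(frame.locals[1]);                      // prints "hello"
    }
}
```

Sizing the two arrays from max_locals and max_stack is the reason the Code attribute records those values up front.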

5. Summary

Because the JVM covers so much ground, only its main execution flow is analyzed here; class loading, linking (verification, preparation, resolution), and initialization are not explained. That does not mean they are unimportant, and they are worth paying attention to when we write code. I also wrote a small JVM that can run the example above; you can find it at https://github.com/thlcly/Mini-JVM.

6. References

  1. The Java® Virtual Machine Specification

  2. Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (2nd edition)

  3. Inside the Java Virtual Machine (2nd edition)
