1. The bytecode

Java’s ability to “compile once, run anywhere” is a big part of what it is today. The JVM and bytecode are the main reasons: any JVM that conforms to the JVM specification can run the same bytecode (.class files) on different platforms and operating systems and get the same results. It is called bytecode because the file is a sequence of bytes, each usually written as two hexadecimal digits, and the JVM reads it byte by byte. In Java, source code is compiled with the javac command into a .class file, which is then handed to the Java virtual machine for execution. On Android, the class files are packaged into a .dex file and handed to the DVM for execution.

Learning Java bytecode instructions gives you a better understanding of how code is structured underneath and of the implementation principles behind it. For example, string concatenation with + is implemented (by javac up to JDK 8) through the append method of StringBuilder. Looking at the execution steps from the bytecode perspective also deepens your understanding of Java code: you know not only what it does, but why.
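The StringBuilder point can be sketched in plain Java. This is a minimal illustration that assumes a JDK 8 era javac (newer compilers may emit an invokedynamic call instead); the class and method names are made up for the example.

```java
public class ConcatDemo {
    // What you write in source code.
    static String withPlus(String a, String b) {
        return a + b;
    }

    // Roughly what javac (JDK 8 and earlier) generates for the method above.
    static String withBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("Hello", "World"));
        System.out.println(withBuilder("Hello", "World"));
    }
}
```

Running javap -c on the compiled class is one way to check which form a particular compiler version actually produced.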

Knowledge of bytecode also makes bytecode instrumentation possible: tools such as ASM and AspectJ operate on code at the bytecode level and can achieve things that are hard to do in Java source code.

1. Format of bytecode

Here is a simple example whose bytecode structure we will analyze:

public class Main {
    public static void main(String[] args) {
        System.out.println("HelloWorld");
    }
}

In the figure above, the hexadecimal digits are the bytecode itself, and on the right are the bytecode instructions that the code compiles to.

The above may look like gibberish, but the JVM specification defines the bytecode format precisely, so the structure can be examined piece by piece.

1.1 Magic Number

The magic number’s only function is to determine whether the file is a Class file that the virtual machine can receive. Many file storage standards use magic numbers for identification, such as in GIF and JPEG headers. The definition of a magic number can be arbitrary, as long as the magic number is not widely adopted and does not cause confusion.

The magic number of a class file is 0xCAFEBABE. It was chosen back when Java was still called Oak; according to the original developers, they were looking for something fun and easy to remember. 0xCAFEBABE was picked because it evokes the popular Baristas coffee of the Peet’s Coffee brand, which also seems to foreshadow the later Java trademark.

1.2 Version Number

The four bytes after the magic number (00 00 00 34) store the version number of the class file. The first two bytes are the minor version, which is 0 in decimal. The last two bytes are the major version, 0x34, which is 52 in decimal. Major version 52 corresponds to Java 1.8, so this file was compiled for Java 1.8. A newer JDK is backward compatible with class files of earlier versions, but a virtual machine cannot run class files of a later version: even if the file format has not changed, it must refuse to execute class files whose version number is higher than its own.
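The magic number and version check can be sketched as follows. This is a minimal illustration, not a full parser: the class name and the output format are assumptions, and the input is the 8-byte header of a Java 8 class file (CA FE BA BE 00 00 00 34).

```java
import java.nio.ByteBuffer;

public class ClassHeader {
    // Reads the first 8 bytes of a class file: magic (u4), minor (u2), major (u2).
    static String describe(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        // For major versions from 46 upward, major - 44 is the Java release (52 -> 8).
        return "major=" + major + " minor=" + minor + " java=" + (major - 44);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        System.out.println(describe(header)); // major=52 minor=0 java=8
    }
}
```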

1.3 Constant Pool

There is a brief note in front of this section for those who are interested.

Following the version number is the entry to the constant pool, which can be understood as the repository of resources in the class file. It is the data structure most closely associated with the other items in the class file, and it is also one of the data items that take up the most space in the file.

The first two bytes of the constant pool (00 22) are the constant pool capacity counter. Contrary to the usual Java convention, this count starts at 1 rather than 0. Hexadecimal 22 is 34 in decimal; setting aside the reserved index 0, that means there are 33 constants in the pool. This matches the constant pool listing below, whose last entry is #33 = Utf8 (Ljava/lang/String;)V.

The constant pool entries follow the capacity counter. The pool stores two kinds of constants: literals and symbolic references. Literals are values such as strings and constants declared final in the source. Symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. When the virtual machine runs, it fetches the corresponding symbolic reference from the constant pool and resolves it to an actual memory address when the class is created or at run time. See the diagram below.

Each constant in the constant pool is a table. Before JDK 1.7 there were 11 different table structures. JDK 1.7 added three more (CONSTANT_MethodHandle_info, CONSTANT_MethodType_info, and CONSTANT_InvokeDynamic_info) to better support dynamic language calls, for a total of 14. The table structures are shown below.

In the figure above, tag is the flag used to distinguish constant types. For CONSTANT_Utf8_info, length is the length in bytes of the encoded string, and the length bytes that follow contain the string in modified UTF-8 encoding. u1, u2, u4, and u8 denote unsigned values of 1, 2, 4, and 8 bytes respectively.

Modified UTF-8 encodes characters as follows: characters from \u0001 to \u007f (ASCII codes 1 to 127) take one byte, characters from \u0080 to \u07ff take two bytes, and characters from \u0800 to \uffff take three bytes, following the ordinary UTF-8 rules. The main difference from ordinary UTF-8 is that the null character \u0000 takes two bytes instead of one. The variable-length scheme mainly saves space.
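The byte counts above can be checked with DataOutputStream.writeUTF, which the JDK documents as writing modified UTF-8 preceded by a two-byte length. The sketch below is only an illustration; the class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Demo {
    // Number of payload bytes writeUTF produces for s, excluding the u2 length prefix.
    static int encodedLength(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeUTF(s);
            return bytes.size() - 2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("A"));      // \u0041: one byte
        System.out.println(encodedLength("\u00e9")); // in \u0080..\u07ff: two bytes
        System.out.println(encodedLength("\u4e2d")); // in \u0800..\uffff: three bytes
        System.out.println(encodedLength("\u0000")); // null character: two bytes
    }
}
```

writeUTF also refuses strings whose encoded form exceeds 65535 bytes, because the length prefix is a u2 value just like the length field of CONSTANT_Utf8_info.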

Because every method name, field name, and similar string in a class file is stored as a CONSTANT_Utf8_info constant, the maximum length of that constant is also the maximum length of a Java method or field name. The length is a u2 value, so the maximum is 65535. A Java program that declares a variable or method whose name is longer than 64 KB will therefore not compile.

Going back to the example above, 00 22 is followed by 0A 0006 0014. The first byte, 0A, is 10 in decimal and stands for the constant type CONSTANT_Methodref_info. As the constant table shows, this type is followed by two u2 indexes, one pointing to a CONSTANT_Class_info and one to a CONSTANT_NameAndType_info. Converted to decimal, 0006 and 0014 are 6 and 20. It may not yet be clear what these numbers mean, but the constant pool as printed from the compiled class, shown below, makes it clear.

Constant pool:
   #1 = Methodref #6.#20 // java/lang/Object."<init>":()V
   #2 = Fieldref #21.#22 // java/lang/System.out:Ljava/io/PrintStream;
   #3 = String #23 // HelloWorld
   #4 = Methodref #24.#25 // java/io/PrintStream.println:(Ljava/lang/String;)V
   #5 = Class #26 // com/verzqli/snake/Main
   #6 = Class #27 // java/lang/Object
   #7 = Utf8 <init>
   #8 = Utf8 ()V
   #9 = Utf8 Code
  #10 = Utf8 LineNumberTable
  #11 = Utf8 LocalVariableTable
  #12 = Utf8 this
  #13 = Utf8 Lcom/verzqli/snake/Main;
  #14 = Utf8 main
  #15 = Utf8 ([Ljava/lang/String;)V
  #16 = Utf8 args
  #17 = Utf8 [Ljava/lang/String;
  #18 = Utf8 SourceFile
  #19 = Utf8 Main.java
  #20 = NameAndType #7:#8 // "<init>":()V
  #21 = Class #28 // java/lang/System
  #22 = NameAndType #29:#30 // out:Ljava/io/PrintStream;
  #23 = Utf8 HelloWorld
  #24 = Class #31 // java/io/PrintStream
  #25 = NameAndType #32:#33 // println:(Ljava/lang/String;)V
  #26 = Utf8 com/verzqli/snake/Main
  #27 = Utf8 java/lang/Object
  #28 = Utf8 java/lang/System
  #29 = Utf8 out
  #30 = Utf8 Ljava/io/PrintStream;
  #31 = Utf8 java/io/PrintStream
  #32 = Utf8 println
  #33 = Utf8 (Ljava/lang/String;)V

The first constant is a Methodref. It refers to a method of java/lang/Object, the superclass of Main, and holds two constant pool indexes: Class #6 and NameAndType #20. These are exactly the values 0006 and 0014 seen in the hexadecimal bytecode above.
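The decoding of 00 22 0A 0006 0014 described above can be sketched in a few lines. This is a minimal illustration that handles only a leading Methodref entry; the class name and output format are assumptions.

```java
import java.nio.ByteBuffer;

public class ConstantPoolStart {
    static final int CONSTANT_METHODREF = 10;

    // Decodes constant_pool_count and the first entry, assuming that entry is a Methodref.
    static String decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getShort() & 0xFFFF; // 0x0022 = 34, index 0 is reserved
        int tag = buf.get() & 0xFF;          // 0x0A = 10
        if (tag != CONSTANT_METHODREF) {
            throw new IllegalArgumentException("unexpected tag " + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;
        int nameAndTypeIndex = buf.getShort() & 0xFFFF;
        return "entries=" + (count - 1) + " #1=Methodref #" + classIndex + ".#" + nameAndTypeIndex;
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x22, 0x0A, 0x00, 0x06, 0x00, 0x14};
        System.out.println(decode(bytes)); // entries=33 #1=Methodref #6.#20
    }
}
```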

1.4 Access Flags

After the constant pool ends, the next two bytes are the access flags, which carry class-level or interface-level access information: whether this is a class or an interface, whether it is public, whether it is abstract, and, if it is a class, whether it is declared final. See the table below for the flags and their meanings.

Flag name        Value    Meaning
ACC_PUBLIC       0x0001   Whether the type is public
ACC_FINAL        0x0010   Whether it is declared final; only classes can set this
ACC_SUPER        0x0020   Whether superclass methods get the newer special handling when invoked with the invokespecial instruction; kept for compatibility with older compilers, and set by all newer compilers
ACC_INTERFACE    0x0200   Whether it is an interface; interfaces also have ACC_ABSTRACT set
ACC_ABSTRACT     0x0400   Whether it is abstract; cannot be set together with ACC_FINAL
ACC_SYNTHETIC    0x1000   The class was generated by the compiler, not from user code
ACC_ANNOTATION   0x2000   Whether it is an annotation type
ACC_ENUM         0x4000   Whether it is an enum

There are 16 flag bits available in access_flags, of which only eight are currently defined for classes (ACC_PRIVATE, ACC_PROTECTED, ACC_STATIC, ACC_VOLATILE, and ACC_TRANSIENT do not apply to classes; they belong to other tables). All unused bits must be 0. The format does not enumerate every combination of flags; instead, the individual flags are combined with the bitwise OR operator |.
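How the flags combine with | and are tested with & can be sketched as below. The flag values come from the table above; the class name and the example value 0x0021 (ACC_PUBLIC | ACC_SUPER, which is what a public class compiled by a current javac normally carries) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassAccessFlags {
    private static final int[] VALUES =
            {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    private static final String[] NAMES =
            {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
             "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"};

    // Returns the names of all class-level flags set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) { // test one bit at a time
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        int flags = 0x0001 | 0x0020; // combine flags with bitwise OR
        System.out.println(decode(flags)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```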

Let’s go back to the example

1.5 Class Index, Superclass Index, and Interface Index

The class index and the superclass index are both u2 values, and the interface index is a collection of u2 values. The class file uses these three items to determine the inheritance relationships of the class. They follow the access flags in that order, and in this example they are 0005 0006 0000: the class index is 5, the superclass index is 6, and the interface index collection has size 0. Looking these up in the constant pool listing above gives a one-to-one match (#5 is com/verzqli/snake/Main, #6 is java/lang/Object).

The class index determines the fully qualified name of the class, and the superclass index determines the fully qualified name of its superclass. Java does not allow multiple inheritance, so there is only one superclass index. Every class except java.lang.Object has a superclass, so its superclass index is not 0. The interface index collection describes which interfaces the class implements. The implemented interfaces are listed in the collection from left to right in the order they appear after implements (or after extends, if the class itself is an interface).

1.6 Set of Field tables (Field Info)

Field tables describe variables declared in classes and interfaces, including class-level variables as well as instance variables. However, local variables declared inside the method are not included. Describing a field in Java might contain the following information:

  • Field scope (public, private, protected modifiers)
  • Whether it is an instance variable or a class variable (static modifier)
  • Whether it is mutable (final modifier)
  • Concurrent visibility (volatile modifier, whether reads and writes are forced to go through main memory)
  • Whether it is serialized (transient modifier)
  • Field data type (primitives, objects, arrays)

Each of the modifiers above is a boolean, either present or absent, so it is well suited to being represented by flag bits in the same way as the class access flags. Field names and field data types can only be described by referring to constants in the constant pool. The access flags of fields and their meanings are shown in the following table.

Flag name       Value    Meaning
ACC_PUBLIC      0x0001   Whether the field is public
ACC_PRIVATE     0x0002   Whether the field is private
ACC_PROTECTED   0x0004   Whether the field is protected
ACC_STATIC      0x0008   Whether the field is static
ACC_FINAL       0x0010   Whether the field is declared final
ACC_VOLATILE    0x0040   Whether the field is declared volatile
ACC_TRANSIENT   0x0080   Whether the field is declared transient
ACC_SYNTHETIC   0x1000   The field was generated by the compiler, not from user code
ACC_ENUM        0x4000   Whether the field is an enum constant

The field table collection has two parts. The first is a two-byte count of the fields (fields_count). The second is the details of each field (field_info), in the order access_flags, name_index, descriptor_index, attributes_count, and attributes. Everything except the trailing attribute information, whose length varies, is a u2 value.

Moving on to the example, things get a little awkward because I forgot to declare a field in it. The first u2 value after the interface index is 0000, which means the number of fields is zero and no field data follows. To look at the structure of the field table, we can only assume a set of data:

Bytecode   Description              Content
00 01      Number of fields         1
00 02      Access flags             ACC_PRIVATE
00 03      Field name index         3
00 07      Field descriptor index   7
00 00      Number of attributes     0
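Reading that assumed data back can be sketched as follows, using the hypothetical bytes 00 01 00 02 00 03 00 07 00 00. The class name and output format are illustrative, and the sketch deliberately supports only fields without attributes.

```java
import java.nio.ByteBuffer;

public class FieldTableDemo {
    // Decodes fields_count followed by field_info entries that carry no attributes.
    static String decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int fieldsCount = buf.getShort() & 0xFFFF;
        StringBuilder out = new StringBuilder("fields=" + fieldsCount);
        for (int i = 0; i < fieldsCount; i++) {
            int accessFlags = buf.getShort() & 0xFFFF;
            int nameIndex = buf.getShort() & 0xFFFF;
            int descriptorIndex = buf.getShort() & 0xFFFF;
            int attributesCount = buf.getShort() & 0xFFFF;
            if (attributesCount != 0) {
                // A real parser would read attributesCount attribute_info structures here.
                throw new UnsupportedOperationException("attributes not handled in this sketch");
            }
            out.append(String.format(" [flags=0x%04X name=#%d descriptor=#%d attributes=%d]",
                    accessFlags, nameIndex, descriptorIndex, attributesCount));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0, 1, 0, 2, 0, 3, 0, 7, 0, 0};
        System.out.println(decode(bytes));
    }
}
```

The flag value 0x0002 printed for the field is ACC_PRIVATE in the field access flag table.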

The field table collection does not list fields inherited from a superclass or superinterface, but it may list fields that do not exist in the original Java code. For example, an inner class automatically gets a field that points to the instance of its outer class so that it can keep accessing the outer class. In addition, fields cannot be overloaded in the Java language, but in bytecode two fields with the same name are legal as long as their descriptors differ.

To make it easier to understand, here are some of the terms mentioned above

  • Fully qualified name: the fully qualified name of the Main class in this article is com/verzqli/snake/Main, that is, the class name with every . replaced by /. So that consecutive fully qualified names are not confused, a ; is usually appended to mark the end of the name.
  • Simple name: the name of a method or field without its type and parameter information. For example, the simple names of public void fun() and private int a are fun and a.
  • Descriptors of methods and fields: a descriptor describes the data type of a field, or the parameter list (number, types, and order) and return value of a method. Primitive types and void (no return value) are each written as a single character, and object types as L followed by the fully qualified name, as shown in the following table.

Descriptor   Meaning
I            Primitive type int
S            Primitive type short
J            Primitive type long (J rather than L, because L is taken by object types)
F            Primitive type float
D            Primitive type double
B            Primitive type byte
C            Primitive type char
Z            Primitive type boolean
V            Special type void
L            Object type, such as Ljava/lang/String;

For array types, each dimension is described with a prepended [. For example, String[] is recorded as [Ljava/lang/String;, String[][] as [[Ljava/lang/String;, and int[] as [I.

A method descriptor lists the parameters first and the return value second. The parameters are placed inside a pair of parentheses () in their exact declared order. For example, the descriptor of void fun() is ()V, and the descriptor of String.toString() is ()Ljava/lang/String;. The descriptor of public void multi(int i, String j, float[] c) is (ILjava/lang/String;[F)V.
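The descriptor rules above can be checked against the JDK itself: java.lang.invoke.MethodType can print the descriptor for a given return type and parameter list. The class and method names in this sketch are illustrative.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Builds the JVM method descriptor for the given return and parameter types.
    static String methodDescriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(methodDescriptor(void.class));   // void fun()
        System.out.println(methodDescriptor(String.class)); // String toString()
        // void multi(int i, String j, float[] c)
        System.out.println(methodDescriptor(void.class, int.class, String.class, float[].class));
    }
}
```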

1.7 Method Table Collection (Method Info)

The method table is stored and described in almost the same way as the field table. It also consists of two parts: the first is the method counter, and the second is the details of each method, in the order access_flags, name_index, descriptor_index, attributes_count, and attributes. The meaning of this data is also very similar to the field table; only the available access flags and the contents of the attribute table collection differ.

Type             Name               Count
u2               access_flags       1
u2               name_index         1
u2               descriptor_index   1
u2               attributes_count   1
attribute_info   attributes         attributes_count

The volatile and transient keywords cannot modify methods, so the method access flags do not include them. Conversely, synchronized, native, strictfp, and abstract can modify methods, so the corresponding flags have been added to the method access flags, as shown in the following table.

Flag name          Value    Meaning
ACC_PUBLIC         0x0001   Whether the method is public
ACC_PRIVATE        0x0002   Whether the method is private
ACC_PROTECTED      0x0004   Whether the method is protected
ACC_STATIC         0x0008   Whether the method is static
ACC_FINAL          0x0010   Whether the method is declared final
ACC_SYNCHRONIZED   0x0020   Whether the method is declared synchronized
ACC_BRIDGE         0x0040   Whether the method is a bridge method generated by the compiler
ACC_VARARGS        0x0080   Whether the method accepts a variable number of arguments
ACC_NATIVE         0x0100   Whether the method is native
ACC_ABSTRACT       0x0400   Whether the method is abstract
ACC_STRICTFP       0x0800   Whether the method is strictfp
ACC_SYNTHETIC      0x1000   Whether the method was generated automatically by the compiler

Continuing with the example, the method table data follows the field table data: 0002 0001 0007 0008 0001 0009.

Bytecode   Description               Content
00 02      Number of methods         2
00 01      Access flags              ACC_PUBLIC
00 07      Method name index         7
00 08      Method descriptor index   8
00 01      Attribute counter         1
00 09      Attribute name index      9

As the table shows, the method table contains two methods: the instance constructor <init> added by the compiler, and the main() method from the source code. The first method has the access flag ACC_PUBLIC, a method name index of 7 (<init>), and a method descriptor index of 8 (()V), which matches the constant pool entries shown earlier.

   #7 = Utf8 <init>
   #8 = Utf8 ()V
   #9 = Utf8 Code

The attribute counter is 1, which means the attribute table collection of this method holds one attribute. Its attribute name index is 9, which corresponds to Code in the constant pool, so this attribute holds the bytecode of the method.

Method overriding: if a superclass method is not overridden in a subclass, the method information from the superclass does not appear in the subclass's method table collection. The compiler may, however, add methods automatically, most typically the class constructor <clinit> and the instance constructor <init>.

Method overloading: in Java, to overload a method, the new method must have the same simple name as the original but a different signature. A signature is the collection of constant pool symbolic references for the parameters of a method. The return value is not part of the signature, so a method cannot be overloaded on the return value alone. In a class file, however, the notion of signature is broader, and two methods can coexist as long as their descriptors are not identical. That is, two methods with the same name and signature but different return types can legally coexist in the same class file, which Java syntax does not support but the class file format does.
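One place where two methods that differ only in return type do show up in a class file is a bridge method. The sketch below assumes javac's usual behaviour of generating a bridge for a covariant override; the class names are made up, and reflection is used only to peek at what ended up in the method table.

```java
import java.lang.reflect.Method;

public class BridgeDemo {
    static class Base {
        Object get() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String get() { return "derived"; } // covariant return type
    }

    // Counts the methods named "get" that Derived's class file declares.
    static int countGetMethods() {
        int n = 0;
        for (Method m : Derived.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (Method m : Derived.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
    }
}
```

Derived ends up with both String get() and a compiler-generated Object get(), whose descriptors ()Ljava/lang/String; and ()Ljava/lang/Object; differ only in the return type.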

1.8 Attribute Info

Attribute tables have appeared several times in the preceding sections. In a class file, the class itself, the field table, and the method table can each carry their own attribute table collection, which describes information specific to certain scenarios. Unlike the other data items in a class file, which have strict requirements on order, length, and content, the attribute table collection is a little looser: the attribute tables need not appear in a strict order, and any compiler may write its own attribute information into them as long as the names do not clash with existing attribute names. At run time the Java virtual machine ignores attributes it does not recognize. The predefined attributes are listed in the following table.

Attribute name                         Location                           Meaning
Code                                   Method table                       Bytecode instructions compiled from Java code
ConstantValue                          Field table                        Constant value of a field defined with the final keyword
Deprecated                             Class, method table, field table   Marks classes, methods, and fields declared deprecated
Exceptions                             Method table                       Exceptions the method declares it may throw
EnclosingMethod                        Class file                         Present only for local or anonymous classes; identifies the method that encloses the class
InnerClasses                           Class file                         List of inner classes
LineNumberTable                        Code attribute                     Mapping between Java source line numbers and bytecode instructions
LocalVariableTable                     Code attribute                     Description of the method's local variables
StackMapTable                          Code attribute                     Added in JDK 1.6; lets the new type checker verify that the types of local variables and operand stack entries match what the target method requires
Signature                              Class, method table, field table   Added in JDK 1.5 to support signatures involving generics. Any class, interface, constructor, or member whose generic signature includes type variables or parameterized types is recorded with a Signature attribute. Because Java generics are implemented by erasure, this attribute records the generic information so that erased types do not make signatures ambiguous
SourceFile                             Class file                         Name of the source file
SourceDebugExtension                   Class file                         Added in JDK 1.6; stores additional debugging information
Synthetic                              Class, method table, field table   Marks a method or field as generated automatically by the compiler
LocalVariableTypeTable                 Code attribute                     Added in JDK 1.5; uses signatures instead of descriptors in order to describe generic parameterized types after the introduction of generics
RuntimeVisibleAnnotations              Class, method table, field table   Added in JDK 1.5 to support annotations; lists the annotations that are visible at run time (through reflection)
RuntimeInvisibleAnnotations            Class, method table, field table   Added in JDK 1.5; the opposite of the above, lists the annotations that are not visible at run time
RuntimeVisibleParameterAnnotations     Method table                       Added in JDK 1.5; like RuntimeVisibleAnnotations, but applies to method parameters
RuntimeInvisibleParameterAnnotations   Method table                       Added in JDK 1.5; like RuntimeInvisibleAnnotations, but applies to method parameters
AnnotationDefault                      Method table                       Added in JDK 1.5; records the default value of an annotation element
BootstrapMethods                       Class file                         Added in JDK 1.7; holds the bootstrap method specifiers referenced by invokedynamic instructions

The name of each attribute is a reference to a CONSTANT_Utf8_info constant in the constant pool, while the structure of the attribute value is entirely up to the attribute itself. All that is required is a u4 length field stating how many bytes the attribute value occupies. An attribute that follows these rules has the structure shown in the following table.

Type   Name                   Count
u2     attribute_name_index   1
u4     attribute_length       1
u1     info                   attribute_length
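Because every attribute starts with a name index and a length, a reader can skip attributes it does not understand. The sketch below is a minimal illustration with made-up class and method names; its input is the SourceFile attribute of the example class as it would appear given the constant pool above (name #18, length 2, sourcefile_index #19).

```java
import java.nio.ByteBuffer;

public class AttributeSkipper {
    // Skips one attribute_info structure and returns its attribute_name_index.
    static int skipAttribute(ByteBuffer buf) {
        int nameIndex = buf.getShort() & 0xFFFF; // u2 attribute_name_index
        int length = buf.getInt();               // u4 attribute_length
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("bad attribute_length");
        }
        buf.position(buf.position() + length);   // jump over the info bytes
        return nameIndex;
    }

    public static void main(String[] args) {
        byte[] sourceFileAttr = {0x00, 0x12, 0x00, 0x00, 0x00, 0x02, 0x00, 0x13};
        ByteBuffer buf = ByteBuffer.wrap(sourceFileAttr);
        System.out.println("name index #" + skipAttribute(buf) + ", now at offset " + buf.position());
    }
}
```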

Because there are more than two dozen properties in the property list, the following is a brief description of a few properties.

  • 1.8.1 Code attribute

The code in the body of a Java method is processed by the javac compiler and eventually turned into bytecode instructions stored in the Code attribute. The Code attribute appears in the attribute collection of the method table, but not every method has one: abstract methods, such as those in interfaces or abstract classes, have no Code attribute. When a method does have a Code attribute, its structure is as shown in the following table.

Type             Name                     Count
u2               attribute_name_index     1
u4               attribute_length         1
u2               max_stack                1
u2               max_locals               1
u4               code_length              1
u1               code                     code_length
u2               exception_table_length   1
exception_info   exception_table          exception_table_length
u2               attributes_count         1
attribute_info   attributes               attributes_count

  • attribute_name_index: an index to a CONSTANT_Utf8_info constant whose value is fixed to “Code”, the name of the attribute.
  • attribute_length: the length of the attribute value. Because the attribute name index and the length field together take 6 bytes, this value is always the length of the whole attribute minus 6 bytes.
  • max_stack: the maximum depth of the operand stack. The operand stack never exceeds this depth at any point during the method's execution, and the virtual machine uses the value to size the operand stack in the stack frame at run time.
  • max_locals: the storage space required by local variables. The unit is the slot, the smallest unit the virtual machine uses to allocate memory for the local variable table. Data types of 32 bits or less (byte, char, float, int, short, boolean, and returnAddress) occupy one slot each, while the two 64-bit types, double and long, need two slots. Method parameters (including the hidden this in instance methods), explicit exception handler parameters (the exception declared by the catch block of a try-catch statement), and local variables defined in the method body all need to be stored in the local variable table. Because slots can be reused, max_locals is not simply the sum of all slots: once execution leaves the scope of a local variable, its slot can be used by other local variables, so the value is calculated mainly from the scopes of the variables.
  • code_length: the length of the bytecode. Although the field is a u4, the virtual machine limits the bytecode of a method to 65535 bytes (the u2 range), and the compiler refuses to compile a method that exceeds it.
  • code: the bytecode instructions generated by compilation. Each instruction opcode is a single byte of type u1.

When the virtual machine reads a byte of code, it can look up which instruction that byte stands for and so know whether the instruction is followed by operands and what those operands mean. A u1 value ranges from 0x00 to 0xFF, so at most 256 instructions can be expressed. The Java virtual machine currently defines the meaning of a little over 200 of these values; for the specific instructions, see the virtual machine bytecode instruction table. Exception tables are not mandatory for Code attributes and the remaining items matter less, so I'll skip them for now.
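The slot rules for parameters can be sketched as a small calculation. It computes the value javap prints as args_size; the class and method names are illustrative, and the sketch covers only the parameter part of max_locals, not locals declared in the body.

```java
public class SlotCounter {
    // Number of local variable slots taken by a method's parameters.
    static int argsSize(boolean isStatic, Class<?>... params) {
        int slots = isStatic ? 0 : 1; // slot 0 holds `this` in instance methods
        for (Class<?> p : params) {
            // long and double are 64-bit and need two slots; everything else needs one.
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argsSize(true, String[].class)); // static main(String[]): 1
        System.out.println(argsSize(false));                // no-arg constructor: 1 (this)
        System.out.println(argsSize(false, int.class, long.class, double.class, String.class)); // 7
    }
}
```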

  • 1.8.2 Exceptions attribute

    The Exceptions attribute sits at the same level as the Code attribute in the method table. Its purpose is to list the checked exceptions the method may throw, that is, the exceptions listed after the throws keyword in the method declaration. Its structure is shown in the following table.

Type   Name                    Count
u2     attribute_name_index    1
u4     attribute_length        1
u2     number_of_exceptions    1
u2     exception_index_table   number_of_exceptions

number_of_exceptions: the number of checked exceptions the method may throw. Each checked exception is represented by one entry of exception_index_table. exception_index_table: an index to a CONSTANT_Class_info constant in the constant pool, which represents the type of the checked exception.

  • 1.8.3 SourceFile attribute

The SourceFile attribute records the name of the source file from which the class file was generated. It can be switched off or on with javac's -g:none and -g:source options. For most classes the class name and the file name are the same, but there are special cases, such as inner classes, where they differ. If this attribute is not generated, the file name of the failing code is not shown in the stack trace when an exception is thrown. Its structure is shown in the following table:

Type   Name                   Count
u2     attribute_name_index   1
u4     attribute_length       1
u2     sourcefile_index       1

sourcefile_index: an index to a CONSTANT_Utf8_info constant in the constant pool, whose value is the file name of the source file.

  • 1.8.3 InnerClasses attribute

The InnerClasses attribute records the association between an inner class and its host class. If an inner class is defined in a class, the compiler generates an InnerClasses attribute for that class and for the inner classes it contains.

Type                 Name                   Count
u2                   attribute_name_index   1
u4                   attribute_length       1
u2                   number_of_classes      1
inner_classes_info   inner_classes          number_of_classes

number_of_classes: the number of inner class records. Each inner class is described by one inner_classes_info table, whose structure is as follows.

Type   Name                       Count
u2     inner_class_info_index     1
u2     outer_class_info_index     1
u2     inner_name_index           1
u2     inner_class_access_flags   1

inner_class_info_index: an index to a CONSTANT_Class_info constant in the constant pool, representing a symbolic reference to the inner class. outer_class_info_index: an index to a CONSTANT_Class_info constant in the constant pool, representing a symbolic reference to the host class. inner_name_index: an index to a CONSTANT_Utf8_info constant holding the simple name of the inner class, or 0 for an anonymous class. inner_class_access_flags: the access flags of the inner class, similar to the access_flags of a class.

  • 1.8.4 ConstantValue attribute

    The ConstantValue attribute tells the virtual machine to assign a value to a static variable automatically. Only variables modified with the static keyword (class variables) can use it. Consider int a=1 and static int a=1: the virtual machine assigns them in different ways and at different times. The former, an instance variable, is assigned in the instance constructor <init>. For the latter, a class variable, there are two options: assignment in the class constructor <clinit>, or the ConstantValue attribute. The current javac compiler generates a ConstantValue attribute if the variable is modified by both final and static and is of a primitive type or String. If the variable is not final, or is not a primitive or String, it is initialized in the <clinit> method.

    <clinit>: the class constructor. It runs once, when the JVM first initializes the class. The compiler generates it by collecting all class variable assignments (static fields) and static blocks (static{}) in the class, in the order in which the programmer wrote them in the source file.

    <init>: the instance constructor, called when an instance is created with the new operator or through the newInstance() method of Class or java.lang.reflect.Constructor. Objects can also come into being through the clone() method of an existing object or through deserialization with the readObject() method of java.io.ObjectInputStream, although those paths do not run the class's own constructor in the usual way.

    Unlike the instance constructor, the <clinit> method does not need to call the superclass's constructor explicitly. The virtual machine guarantees that the superclass's <clinit> has finished before the subclass's <clinit> runs, so the first <clinit> executed in the virtual machine is always that of java.lang.Object. Class-level variables and static blocks are therefore initialized first (<clinit>), and object-level information afterwards (<init>).

public class Main {
    static final int a = 1;
}

Bytecode:
  static final int a;
    descriptor: I
    flags: ACC_STATIC, ACC_FINAL
    ConstantValue: int 1

public class Main {
    static int a = 1;
}

Bytecode:
  public com.verzqli.snake.Main();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 12: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/verzqli/snake/Main;

  static {};
    descriptor: ()V
    flags: ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: iconst_1
         1: putstatic     #2 // Field a:I
         4: return
      LineNumberTable:
        line 13: 0
}

public class Main {
    int a = 1;
}

Bytecode:
  // You can see that the initialization is placed in the instance constructor of Main:
  public com.verzqli.snake.Main();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: aload_0
         5: iconst_1
         6: putfield      #2 // Field a:I
         9: return
}
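
The ordering of <clinit> and <init> described above can be observed at run time. The following is only a small sketch: the class names (InitOrderDemo, Parent, Child) and the LOG list are hypothetical and exist purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Shared log of initialization events, in the order they happen.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent <clinit>"); }    // becomes part of Parent's <clinit>
        Parent() { LOG.add("Parent <init>"); }    // becomes Parent's <init>
    }

    static class Child extends Parent {
        static int counter = 1;                   // assigned inside Child's <clinit>
        static { LOG.add("Child <clinit>"); }
        int field = 1;                            // assigned inside Child's <init>
        Child() { LOG.add("Child <init>"); }
    }

    public static void main(String[] args) {
        new Child();
        new Child(); // second instance: the <clinit> methods do not run again
        LOG.forEach(System.out::println);
    }
}
```

The log should show both class constructors (parent first) before any instance constructor, and no second round of class initialization for the second instance.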

2. Bytecode instructions

A bytecode instruction is identified by a one-byte opcode that stands for a particular operation, so the whole instruction set cannot contain more than 256 opcodes. Most instructions tied to a data type carry a special character in their opcode mnemonic that indicates which data type they serve, as shown in the following table:

Character  Meaning
i          Primitive type int
s          Primitive type short
l          Primitive type long (lowercase l)
f          Primitive type float
d          Primitive type double
b          Primitive type byte
c          Primitive type char
b          Primitive type boolean (shares the byte instructions, e.g. baload and bastore)
a          Object type reference

One caveat: most instructions do not support the integer types byte, char, short, and boolean directly. At compile time or run time, byte and short values are sign-extended to int, while boolean and char values are zero-extended to int. Arrays of these types are treated the same way: once loaded, their elements are processed with int-typed bytecode instructions.
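
The difference between sign extension and zero extension is easy to see from plain Java. This is a minimal sketch; WideningDemo and its method names are made up for the illustration.

```java
class WideningDemo {
    // byte -> int: the sign bit is copied into the upper bits (sign extension).
    static int widenByte(byte b) { return b; }

    // short -> int: also sign-extended.
    static int widenShort(short s) { return s; }

    // char -> int: the upper bits are filled with zeros (zero extension).
    static int widenChar(char c) { return c; }

    public static void main(String[] args) {
        System.out.println(widenByte((byte) 0xFF));     // -1
        System.out.println(widenShort((short) 0xFFFF)); // -1
        System.out.println(widenChar((char) 0xFFFF));   // 65535
    }
}
```

The same 16 bits (0xFFFF) give -1 as a short and 65535 as a char, which is exactly the sign-extend versus zero-extend distinction.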

2.1 Load and store instructions.

Load and store instructions transfer data back and forth between the local variable table and the operand stack of a stack frame.

  • <type>load_<index>: loads a local variable onto the operand stack. For example, iload_1 loads the int in slot 1 of the local variable table (slot 0 usually holds this in an instance method) onto the operand stack. The others are similar, e.g. dload_2, fload_3.
  • <type>store_<index>: stores the value on top of the operand stack into the local variable table. For example, istore_3 pops an int and stores it in slot 3; the suffix 3 suggests that the preceding slots already hold values.
  • <type>const_<value>: pushes a constant onto the operand stack. For example, iconst_3 pushes the constant 3.
  • Operand form: when the index exceeds 3, there is no underscore shorthand. The index is written as an operand instead, e.g. istore 6, and load works the same way. (Indexes above 255 additionally need the wide prefix.)
  • bipush, sipush, ldc: used instead of const when the constant is too large for it.

Which instruction pushes an int constant depends on its value:

  • From -1 to 5, the JVM uses an iconst instruction (iconst_m1 through iconst_5).
  • Otherwise, from -128 to 127, the JVM uses bipush.
  • Otherwise, from -32768 to 32767, the JVM uses sipush.
  • Otherwise, anywhere from -2147483648 to 2147483647, the JVM uses ldc, which loads the constant from the constant pool.

See the example:

public void save() {
    int a = 1;
    int b = 6;
    int c = 128;
    int d = 32768;
    float f = 2.0f;
}

Bytecode:
    Code:
      stack=1, locals=6, args_size=1
         0: iconst_1
         1: istore_1
         2: bipush        6    // push constant 6 (operand form)
         4: istore_2           // store the 6 on top of the stack into local variable slot 2
         5: sipush        128  // push constant 128
         8: istore_3           // store the 128 on top of the stack into local variable slot 3
         9: ldc           #2   // int 32768
        11: istore        4
        13: fconst_2
        14: fstore        5
        16: return
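
The selection rules for int constants can be written down as a small function. This is a sketch of the rule only, not of javac itself: PushInstructionChooser and pushInstructionFor are hypothetical names, and the returned strings are simplified mnemonics (a real ldc refers to a constant pool index such as #2 rather than the literal value).

```java
class PushInstructionChooser {
    // Returns a simplified mnemonic for pushing the given int constant.
    static String pushInstructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // -1 has the special mnemonic iconst_m1.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // one-byte signed operand
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // two-byte signed operand
        }
        return "ldc " + value;          // loaded from the constant pool
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 6, 128, 32768}) {
            System.out.println(v + " -> " + pushInstructionFor(v));
        }
    }
}
```

For the four int constants of the save() example this yields iconst_1, bipush, sipush, and ldc, matching the listing above.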

2.2 Operation instruction.

Arithmetic instructions fall into two groups: those that operate on integer data and those that operate on floating-point data. As mentioned above, arithmetic on byte, char, short, and boolean is carried out with the int instructions. Integer and floating-point instructions also behave differently on overflow and on division by zero. Each operation has one instruction per type, formed by putting the type character in front of the operation, as in the addition instructions iadd, ladd, fadd, dadd.

  • Addition: iadd, ladd, fadd, dadd
  • Subtraction: isub, lsub, fsub, dsub
  • Multiplication: imul, lmul, fmul, dmul
  • Division: idiv, ldiv, fdiv, ddiv
  • Remainder: irem, lrem, frem, drem
  • Negation: ineg, lneg, fneg, dneg
  • Shift: ishl, ishr, iushr, lshl, lshr, lushr
  • Bitwise OR: ior, lor
  • Bitwise AND: iand, land
  • Bitwise XOR: ixor, lxor
  • Local variable increment: iinc (for example, i++ in a for loop)
  • Comparison: dcmpg, dcmpl, fcmpg, fcmpl, lcmp

There is no need to memorize these instructions. Look them up when you need them and they will become familiar naturally. I won’t go into the precision loss of floating-point operations here.
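
The different behavior of integer and floating-point arithmetic on overflow and division by zero can be checked directly. A minimal sketch, with ArithmeticDemo and its method names invented for the illustration; the instruction names in the comments are what javac normally emits for these expressions.

```java
class ArithmeticDemo {
    static int addInts(int a, int b) { return a + b; }                // iadd
    static int intDivide(int a, int b) { return a / b; }              // idiv
    static int remainder(int a, int b) { return a % b; }              // irem
    static double doubleDivide(double a, double b) { return a / b; }  // ddiv

    public static void main(String[] args) {
        // Integer overflow wraps around silently.
        System.out.println(addInts(Integer.MAX_VALUE, 1)); // -2147483648
        // The remainder takes the sign of the dividend.
        System.out.println(remainder(-7, 3));              // -1
        // Floating-point division by zero yields Infinity or NaN, never an exception.
        System.out.println(doubleDivide(1.0, 0.0));        // Infinity
        System.out.println(doubleDivide(0.0, 0.0));        // NaN
        // Integer division by zero throws.
        try {
            intDivide(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```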

2.3 Type conversion instructions.

Conversion instructions convert between two numeric types. They are typically used to implement explicit casts in user code.

Widening numeric conversions (from a smaller type to a larger one), such as int to long, float, or double, need no explicit cast in the source code; the compiler inserts the conversion instruction itself. Example: int a=10; long b=a;

Narrowing conversions must be written as explicit casts. Example: long b=10; int a=(int)b;

The conversion instructions are simple and have the form <source type>2<target type>, such as i2l, l2i, i2f, i2d. For reference types, the counterpart of a narrowing conversion (casting a parent type down to a subclass) uses the checkcast instruction.

public class Main {
    public static void main(String[] args) {
        int a = 1;
        long b = a;
        Parent parent = new Parent();
        Son son = (Son) parent;
    }
}

Bytecode:
    Code:
      stack=2, locals=6, args_size=1
         0: iconst_1
         1: istore_1
         2: iload_1
         3: i2l
         4: lstore_2
         5: new           #2 // class com/verzqli/snake/Parent
         8: dup
         9: invokespecial #3 // Method com/verzqli/snake/Parent."<init>":()V
        12: astore        4
        14: aload         4
        16: checkcast     #4 // class com/verzqli/snake/Son
        19: astore        5
        21: return

Running this code fails with a ClassCastException, because an instance of the parent class cannot be cast to its subclass. The checkcast instruction is what performs this check and raises the error.
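
A runnable variant of the same idea, as a sketch: ConversionDemo, Parent, and Son are hypothetical classes for this illustration, and the instruction names in the comments are what javac normally emits.

```java
class ConversionDemo {
    static class Parent {}
    static class Son extends Parent {}

    // Widening: no cast in the source, javac inserts i2l.
    static long widen(int value) { return value; }

    // Narrowing: the cast is mandatory and compiles to l2i, which keeps only the low 32 bits.
    static int narrow(long value) { return (int) value; }

    // Downcast: compiles to checkcast, which throws if the object is not a Son.
    static boolean castFails(Parent p) {
        try {
            Son s = (Son) p;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(widen(10));                  // 10
        System.out.println(narrow(4294967297L));        // 1 (2^32 + 1 truncated to 32 bits)
        System.out.println(castFails(new Parent()));    // true
        System.out.println(castFails(new Son()));       // false
    }
}
```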

2.4 Object creation and access instructions

Although class instances and arrays are both objects, the Java virtual machine uses different bytecode instructions to create and manipulate them. Once an object is created, the fields of an instance or the elements of an array are read and written with the following access instructions.

  • new: creates an instance of a class.
  • newarray, anewarray, multianewarray: create an array.
  • getfield, putfield, getstatic, putstatic: access instance fields (non-static fields) and class fields (static fields, also called class variables).
  • baload, caload, saload, iaload, laload, faload, daload, aaload: the type character plus aload; loads an array element onto the operand stack.
  • bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore: the same pattern as above; stores the value on top of the operand stack into an array element.
  • arraylength: gets the length of an array.
  • instanceof, checkcast: check the type of a class instance.
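
To connect these instructions with source code, here is a small sketch in which each statement is annotated with the instructions javac normally emits for it. ObjectAccessDemo and its members are hypothetical; running javap -c on the compiled class is the way to confirm the exact output for your compiler version.

```java
class ObjectAccessDemo {
    static int created;           // class field: accessed with getstatic / putstatic
    int[] scores = new int[3];    // newarray int, then putfield (inside <init>)

    static int demo() {
        ObjectAccessDemo demo = new ObjectAccessDemo();   // new, dup, invokespecial <init>
        created = created + 1;                            // getstatic, putstatic
        demo.scores[0] = 42;                              // getfield, iastore
        String[] names = new String[2];                   // anewarray
        names[1] = "b";                                   // aastore
        int[][] grid = new int[2][3];                     // multianewarray
        Object boxed = names;
        boolean isArray = boxed instanceof String[];      // instanceof
        // iaload, arraylength, aaload + arraylength
        return demo.scores[0] + names.length + grid[1].length + (isArray ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 48
    }
}
```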

2.5 Operand stack management instructions

The Java virtual machine provides instructions that manipulate the operand stack directly, just like the operations on an ordinary stack data structure.

  • pop, pop2: pop one or two elements off the top of the operand stack.
  • dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2: duplicate the top one or two values and push the copies back, either on top or (the _x variants) further down the stack.
  • swap: swaps the top two values on the stack.
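
The effect of a few of these instructions can be modeled with an ordinary stack. The sketch below is a toy model, not JVM code: OperandStackDemo is a hypothetical class, and it ignores the fact that long and double values occupy two slots (which is what pop2 and dup2 exist for).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class OperandStackDemo {
    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object value) { stack.push(value); }

    // pop: discard the top value.
    Object pop() { return stack.pop(); }

    // dup: ..., v1 -> ..., v1, v1
    void dup() { stack.push(stack.peek()); }

    // swap: ..., v2, v1 -> ..., v1, v2
    void swap() {
        Object v1 = stack.pop();
        Object v2 = stack.pop();
        stack.push(v1);
        stack.push(v2);
    }

    // dup_x1: ..., v2, v1 -> ..., v1, v2, v1
    void dupX1() {
        Object v1 = stack.pop();
        Object v2 = stack.pop();
        stack.push(v1);
        stack.push(v2);
        stack.push(v1);
    }

    // Snapshot of the stack contents, bottom first.
    List<Object> bottomToTop() {
        List<Object> snapshot = new ArrayList<>(stack); // iteration starts at the top
        Collections.reverse(snapshot);
        return snapshot;
    }

    public static void main(String[] args) {
        OperandStackDemo s = new OperandStackDemo();
        s.push(1);
        s.push(2);
        s.dup();    // [1, 2, 2]
        s.pop();    // [1, 2]
        s.swap();   // [2, 1]
        s.dupX1();  // [1, 2, 1]
        System.out.println(s.bottomToTop());
    }
}
```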

2.6 Method invocation and return instructions

Method invocation needs only the following five instructions, plus the return family:

  • invokespecial: calls instance methods that require special handling, including instance initializer methods, private methods, and superclass methods.
  • invokestatic: calls static methods.
  • invokeinterface: calls interface methods. At run time it searches the object that implements the interface for the appropriate method and calls it.
  • invokevirtual: calls an instance method of an object, dispatching on the actual type of the object.
  • invokedynamic: resolves the method referenced by the call site specifier dynamically at run time and executes it. The dispatch logic of the other four instructions is fixed inside the Java virtual machine, whereas the dispatch logic of this instruction is determined by a user-defined bootstrap method.
  • ireturn, lreturn, freturn, dreturn, areturn, return: the return instructions. The prefix gives the type of the value being returned; the plain return is used for void methods.
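
The sketch below puts one call of each kind side by side. InvokeDemo and its methods are hypothetical names, and the instruction in each comment is what javac normally emits; note in particular that the instruction used for private methods depends on the compiler target, so treat that comment as an assumption to verify with javap.

```java
import java.util.function.Supplier;

class InvokeDemo {
    static String staticHelper() { return "static"; }
    private String privateHelper() { return "private"; }
    public String virtualHelper() { return "virtual"; }

    static String demo() {
        InvokeDemo d = new InvokeDemo();          // invokespecial (the <init> call)
        String a = staticHelper();                // invokestatic
        // invokespecial with older javac targets; newer targets (Java 11 and later)
        // may emit invokevirtual for private methods instead.
        String b = d.privateHelper();
        String c = d.virtualHelper();             // invokevirtual
        Supplier<String> s = () -> "dynamic";     // invokedynamic creates the lambda object
        String e = s.get();                       // invokeinterface (Supplier is an interface)
        return String.join(",", a, b, c, e);      // invokestatic; the method ends with areturn
    }

    public static void main(String[] args) {
        System.out.println(demo()); // static,private,virtual,dynamic
    }
}
```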

2.7 Exception handling instructions

Explicitly throwing an exception in a Java program (the throw statement) is implemented by the athrow instruction. Catch clauses, however, are not implemented by bytecode instructions but by exception tables, as the following example shows.

public class Main {
    public static void main(String[] args) throws Exception {
        try {
            Main a = new Main();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Bytecode:
  public static void main(java.lang.String[]) throws java.lang.Exception;
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=2, args_size=1
         0: new           #2 // class com/verzqli/snake/Main
         3: dup
         4: invokespecial #3 // Method "<init>":()V
         7: astore_1
         8: goto          16
        11: astore_1
        12: aload_1
        13: invokevirtual #5 // Method java/lang/Exception.printStackTrace:()V
        16: return
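
How an exception table is consulted can be sketched in a few lines. This is a toy model under stated assumptions: ExceptionTableDemo, Entry, and findHandler are hypothetical names, and the single table entry (from 0, to 8, target 11, type Exception) is assumed for the listing above, since the table itself is not shown there. The from offset is inclusive and the to offset exclusive, as in the class file format.

```java
import java.util.Collections;
import java.util.List;

class ExceptionTableDemo {
    // One row of an exception table; a null type stands for "any".
    static final class Entry {
        final int from;
        final int to;
        final int target;
        final Class<? extends Throwable> type;

        Entry(int from, int to, int target, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    // Returns the handler offset for an exception thrown at pc, or -1 if no row
    // matches (the exception then propagates to the caller).
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.from && pc < e.to;
            boolean typeMatches = e.type == null || e.type.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.target; // rows are searched in order, first match wins
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed table for the try block above.
        List<Entry> table = Collections.singletonList(new Entry(0, 8, 11, Exception.class));
        System.out.println(findHandler(table, 4, new RuntimeException())); // 11
        System.out.println(findHandler(table, 4, new Error()));            // -1
        System.out.println(findHandler(table, 13, new RuntimeException())); // -1
    }
}
```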

2.8 Synchronization instructions

The Java virtual machine supports both method-level synchronization and synchronization of a sequence of instructions inside a method; both are implemented with a monitor.

Method-level synchronization is implicit and needs no bytecode instructions. The virtual machine can tell whether a method is declared synchronized from the ACC_SYNCHRONIZED access flag in the method's method table structure. When the method is invoked, the invoking instruction checks whether that flag is set. If so, the executing thread must acquire the monitor before the method can run, and releases it when the method completes. While the method executes, the thread holds the monitor and no other thread can acquire the same one. If an exception is thrown inside a synchronized method and is not handled within it, the monitor held by the method is released automatically as the exception propagates out of the method.

Synchronizing a sequence of instructions is usually expressed with a synchronized block. The instruction set has monitorenter and monitorexit to support the synchronized keyword, as in the following example:

public class Main {
    public void main() {
        synchronized (Main.class) {
            System.out.println("synchronized");
        }
        function();
    }

    private void function() {
        System.out.printf("function");
    }
}

Bytecode:
    Code:
      stack=3, locals=3, args_size=1
         0: ldc           #2 // class com/verzqli/snake/Main
         2: dup              // duplicate the Main class reference on top of the stack
         3: astore_1         // store one copy of the reference in local variable 1
         4: monitorenter     // lock the object on top of the stack (Main) and start synchronization
         5: getstatic     #3 // Field java/lang/System.out:Ljava/io/PrintStream;
         8: ldc           #4 // String synchronized (ldc pushes this String at run time)
        10: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
        13: aload_1          // push local variable 1 (Main)
        14: monitorexit      // exit synchronization
        15: goto          23
        18: astore_2         // exception path: store the exception on top of the stack in local variable 2
        19: aload_1          // push local variable 1 (Main)
        20: monitorexit      // exit synchronization
        21: aload_2          // push the exception stored in local variable 2
        22: athrow           // rethrow the exception to the caller of main
        23: aload_0          // push this for the following method call
        24: invokespecial #6 // Method function:()V
        27: return

The compiler must ensure that, however the method completes, every monitorenter instruction executed in the method is matched by its corresponding monitorexit instruction, whether the method terminates normally or abnormally.
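
That the monitor really is released on the exception path can be checked with Thread.holdsLock. A minimal sketch, with MonitorDemo, LOCK, and the method names invented for the illustration:

```java
class MonitorDemo {
    private static final Object LOCK = new Object();
    static boolean heldInside;

    // Synchronized block: compiled to monitorenter / monitorexit.
    static void failInsideBlock() {
        synchronized (LOCK) {
            heldInside = Thread.holdsLock(LOCK);       // true while inside the block
            throw new IllegalStateException("boom");   // leaves through the exception path
        }
    }

    // Returns true if the monitor was released although the block ended with an exception.
    static boolean releasedAfterFailure() {
        try {
            failInsideBlock();
        } catch (IllegalStateException expected) {
            // ignored: we only care about the state of the monitor
        }
        return !Thread.holdsLock(LOCK);
    }

    // Synchronized method: no monitor instructions, only the ACC_SYNCHRONIZED flag.
    // A static synchronized method locks the Class object.
    static synchronized boolean heldInMethod() {
        return Thread.holdsLock(MonitorDemo.class);
    }

    public static void main(String[] args) {
        System.out.println(releasedAfterFailure()); // true
        System.out.println(heldInside);             // true
        System.out.println(heldInMethod());         // true
    }
}
```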

3. Examples

Everything so far has been dry theory, which is boring and hard to absorb. To really understand it, you have to look at examples.

Example 1:

Anyone who has been through interviews has probably met this question and then debated whether Java passes by value or by reference. Here is a bytecode analysis of it.

import java.util.Arrays;

public class Main {
    String str = "newStr";
    String[] array = {"newArray1", "newArray2"};

    public static void main(String[] args) {
        Main main = new Main();
        main.change(main.str, main.array);
        System.out.println(main.str);
        System.out.println(Arrays.toString(main.array));
    }

    private void change(String str, String[] array) {
        str = "newStrEdit";
        array[0] = "newArray1Edit";
    }
}

Output:
newStr
[newArray1Edit, newArray2]

Bytecode:
  private void change(java.lang.String, java.lang.String[]);
    descriptor: (Ljava/lang/String;[Ljava/lang/String;)V
    flags: ACC_PRIVATE
    Code:
      stack=3, locals=3, args_size=3
         0: ldc           #14 // String newStrEdit
         2: astore_1
         3: aload_2
         4: iconst_0
         5: ldc           #15 // String newArray1Edit
         7: aastore
         8: return

The bytecode of the main method can be ignored here. Focus on the change method, which is walked through below.

On entry to the method, before any of its body has executed, the local variable table already holds three values. The first is this, a reference to the current instance. An ordinary method can reach fields declared outside it precisely because slot 0 of its local variable table is this: to access such a field, the code pushes this with aload_0 and can then read every member variable of the class. Since the method takes two arguments, the next two slots hold references to those two objects, that is, their memory addresses on the heap.

str = "newStrEdit";
astore_1
This only overwrites slot 1 of the local variable table with a reference to the new string. The str field of main is untouched, so it still prints newStr.

array[0] = "newArray1Edit";
aastore
This writes through the reference in slot 2 into the array object on the heap, which is the same object that main.array refers to, so the caller sees the change.

Example 2:

The previous example should have given you a feel for the local variable table, so I’m not going to draw it for this one. This example is another common interview question: the execution order of try-catch-finally with return.

The finally block is code that always runs in the end. A return in finally overrides a return in try or catch. Modifying a local variable in finally does not change the value returned by try or catch, unless that value is a reference type and finally modifies the object it points to.

public static void main(String[] args) {
    Main a = new Main();
    System.out.println("args = [" + a.testFinally() + "]");
}

public int testFinally() {
    int i = 0;
    try {
        i = 2;
        return i;
    } catch (Exception e) {
        i = 4;
        return i;
    } finally {
        i = 6;
    }
}

Bytecode:
  public int testFinally();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=5, args_size=1
         0: iconst_0          // push constant 0
         1: istore_1          // store into local variable 1 (i): i=0
         2: iconst_2          // push constant 2
         3: istore_1          // store into local variable 1 (i): i=2
         4: iload_1           // push local variable 1 (i)
         5: istore_2          // save the value to return in local variable 2 (explained below)
         6: bipush        6   // push constant 6
         8: istore_1          // store into local variable 1: i=6
         9: iload_2           // load local variable 2
        10: ireturn           // return it: the result is 2
        11: astore_2          // catch path: store the caught exception in local variable 2
        12: iconst_4          // push constant 4
        13: istore_1          // store into local variable 1: i=4
        14: iload_1           // load local variable 1
        15: istore_3          // save local variable 1 in local variable 3, for the same reason as at 5
        16: bipush        6   // push constant 6
        18: istore_1          // store into local variable 1
        19: iload_3           // load local variable 3
        20: ireturn           // same pattern as above, generated for the catch-finally path
        21: astore        4   // finally code for the path on which an exception is thrown
        23: bipush        6
        25: istore_1
        26: aload         4
        28: athrow
      Exception table:
         from    to  target type
             2     6    11   Class java/lang/Exception // if the specified exception occurs in 2-6 (try), continue at 11
             2     6    21   any                       // if any other exception occurs in 2-6, continue at 21 (finally)
            11    16    21   any                       // if any exception occurs in 11-16 (catch), continue at 21
            21    23    21   any

Since Java 1.4, the javac compiler no longer generates jsr and ret instructions for finally. When exception handling includes a finally block, the compiler copies the content of the finally block onto every possible exit path to implement the finally semantics; offsets 21 to 28 are the copy for the path on which an exception propagates. In the Java source the finally block comes last, but in the generated bytecode its code is placed before each ireturn instruction. So, at the bytecode level, this explains why finally is always executed.

If try contains a return, the code in finally runs before the method returns, but a copy of the return value is saved first (offsets 5 and 15). finally modifies the original variable while the return in try returns the saved copy, so an assignment in finally may not change the result even though it executes; it depends on whether the variable is a primitive or a reference type. If you add a return to finally, however, the loads at offsets 9 and 19 become iload_1 and read the value changed in the finally block, and the exception path ends with iload_1 and ireturn as well. Take a look at the bytecode for yourself.
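
Both variants can be compared side by side. A minimal sketch; FinallyDemo and its method names are made up for the illustration:

```java
class FinallyDemo {
    // finally changes i after the return value has already been saved.
    static int withoutReturnInFinally() {
        int i = 0;
        try {
            i = 2;
            return i;      // the value 2 is copied to a temporary slot here
        } finally {
            i = 6;         // changes the variable, not the saved copy
        }
    }

    // A return in finally overrides the return in try.
    @SuppressWarnings("finally")
    static int withReturnInFinally() {
        int i = 0;
        try {
            i = 2;
            return i;
        } finally {
            i = 6;
            return i;      // this return wins
        }
    }

    public static void main(String[] args) {
        System.out.println(withoutReturnInFinally()); // 2
        System.out.println(withReturnInFinally());    // 6
    }
}
```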

Example 3:

Let’s take the same example as above, with one change.

 public static void main(String[] args) {
        Main a = new Main();
        System.out.println("args = [" + a.testFinally1() + "]");
        System.out.println("args = [" + a.testFinally2() + "]");
    }

    public StringBuilder testFinally1() {
        StringBuilder a = new StringBuilder("start");
        try {
            a.append("try");
            return a;
        } catch (Exception e) {
            a.append("catch");
            return a;
        } finally {
            a.append("finally");
        }
    }

    public String testFinally2() {
        StringBuilder a = new StringBuilder("start");
        try {
            a.append("try");
            return a.toString();
        } catch (Exception e) {
            a.append("catch");
            return a.toString();
        } finally {
            a.append("finally");
        }
    }

Output:
args = [starttryfinally]
args = [starttry]

I won’t list the complete bytecode here, since the two methods produce a lot of it; you can look at it yourself. Here is why the second result does not contain finally. In the first method, slot 1 of the local variable table holds the address of a StringBuilder. Before the finally code runs, that address is copied from variable 1 to variable 3, just as in the previous example, but note that the two addresses are the same, so the append in finally changes the very object that is returned. In the second method, a.toString() looks as if it belongs to the return, but it is part of the try block. return a.toString() takes two steps: toString() is called first and produces a new string, starttry, and only then does the method return it. Here is the bytecode logic:

      17: aload_1
      18: invokevirtual #12 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      21: astore_2
      22: aload_1
      23: ldc           #18 // String finally
      25: invokevirtual #8 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      28: pop
      29: aload_2
      30: areturn

You can clearly see that after append has concatenated “start” and “try”, toString() is called and its result is stored in local variable 2. finally does not make another copy as in the previous example; it keeps appending through the reference in local variable 1, and the result stays in the object that local variable 1 points to, while the string in local variable 2 is what gets returned. Note that the StringBuilder referenced by local variable 1 now contains starttryfinally, which is exactly the value returned by method 1.

4. How to quickly view the bytecode

In the IDE, go to Settings -> Tools -> External Tools and create a custom tool.

The first field is the path to javap.exe on your computer, the second is the command arguments you want, and the third is where the output is displayed. Once it is set up, right-click a source file to view its bytecode instructions.

5. Common questions

5.1 Why is a dup instruction executed after new creates an object, duplicating the value on top of the stack and pushing it again?

After new allocates the object, invokespecial <init> must be called to initialize it, and that call consumes the reference on top of the stack. dup therefore makes a copy of the address allocated by new first, so that one reference remains on the stack afterwards for the object to be used elsewhere, for example to be stored in a variable. Without it, no reference would be left on the stack after initialization and the object could not be accessed anywhere. This is why dup is emitted even when the object is never used afterwards: a reference to it is still left on the stack.
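
The stack depth during new, dup, invokespecial can be traced with a toy model. This is a sketch only: NewDupTrace and stackDepths are hypothetical names, the string "ref" stands in for the object reference, and the model assumes a constructor without arguments.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NewDupTrace {
    // Records the operand stack depth after each simulated instruction.
    static List<Integer> stackDepths(boolean withDup) {
        Deque<String> stack = new ArrayDeque<>();
        List<Integer> depths = new ArrayList<>();

        stack.push("ref");               // new: pushes a reference to the uninitialized object
        depths.add(stack.size());

        if (withDup) {
            stack.push(stack.peek());    // dup: second copy of the same reference
            depths.add(stack.size());
        }

        stack.pop();                     // invokespecial <init>: consumes one reference
        depths.add(stack.size());
        return depths;
    }

    public static void main(String[] args) {
        System.out.println(stackDepths(true));  // [1, 2, 1]: one reference left, e.g. for astore_1
        System.out.println(stackDepths(false)); // [1, 0]: nothing left to store or use
    }
}
```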

6. Summary

Originally I only wanted to jot down a few notes on bytecode instructions, but the article kept growing. Most of the theory here comes from “In-depth Understanding of the Java Virtual Machine” by Zhou Zhiming. With this much written, mistakes are inevitable.