Three ways to interpret bytecode instructions

What’s in the bytecode file?

When source code is compiled, the compiler generates a bytecode file: a binary class file that contains instructions for the JVM. This differs from C or C++, where the compiler generates machine code directly (one reason C is so efficient).

What is a byte code instruction?

A Java virtual machine instruction consists of a one-byte opcode, which identifies the operation, followed by zero or more operands, which supply the parameters that operation needs. Many instructions have no operands at all, only an opcode.

  • Format: opcode [operands]
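As a rough illustration of the opcode-plus-operands layout, the sketch below decodes a short instruction sequence. The class name TinyDisassembler and its decode method are hypothetical names chosen for this example; the opcode values are taken from the JVM instruction set, and only the handful needed here are handled.

```java
import java.util.ArrayList;
import java.util.List;

class TinyDisassembler {
    /** Decodes a few common instructions; any other opcode aborts the decoding. */
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF; // opcodes are unsigned bytes
            switch (opcode) {
                case 0x2A: out.add("aload_0");  pc += 1; break; // no operands
                case 0x05: out.add("iconst_2"); pc += 1; break; // no operands
                case 0x60: out.add("iadd");     pc += 1; break; // no operands
                case 0xAC: out.add("ireturn");  pc += 1; break; // no operands
                case 0xB4:   // getfield: followed by a 2-byte constant pool index
                case 0xB5: { // putfield: same operand layout
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    out.add((opcode == 0xB4 ? "getfield #" : "putfield #") + index);
                    pc += 3;
                    break;
                }
                default:
                    throw new IllegalArgumentException(
                            "unhandled opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] code = {0x2A, (byte) 0xB4, 0x00, 0x02, 0x05, 0x60, (byte) 0xAC};
        System.out.println(decode(code)); // [aload_0, getfield #2, iconst_2, iadd, ireturn]
    }
}
```

Only getfield and putfield carry operands here; the other four instructions are a single byte each.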

How can we read the binary bytecode that the virtual machine interprets and executes?

  • Method 1: Read the file byte by byte. This requires a hex editor plugin or a tool such as Binary Viewer.
  • Method 2: Use the javap command, the JDK's disassembly tool. Enter the following command in a terminal:
    • javap -v xxx.class
    • To write the output to a file: javap -v xxx.class > xxx.txt
  • Method 3: Use the jclasslib plug-in for IDEA, or the standalone jclasslib Bytecode Viewer client. (Better visualization)

Class file structure

  • Official documentation: docs.oracle.com/javase/spec…
  • The nature of a Class file: every Class file corresponds to exactly one class or interface definition. Conversely, a Class file does not necessarily exist as a file on disk. A Class file is a binary stream whose basic unit is the 8-bit byte.
  • Class file format
    • A Class file is not structured like a description language such as XML, because it has no delimiters. The data items in it are therefore strictly defined, both in order and in number: which byte means what, how long each item is, and in what order the items appear cannot change.
    • The Class file format stores data in a pseudo-structure similar to a C struct. This structure has only two data types: unsigned numbers and tables.
      • Unsigned numbers are the basic data types. u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or string values encoded in UTF-8.
      • A table is a composite data type made up of multiple unsigned numbers or other tables as data items; by convention, all table names end with “_info”. Tables describe hierarchical composite data, and the entire Class file is essentially one table. Since tables have no fixed length, a table is usually preceded by a number giving its entry count.
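The unsigned number types can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical, and the code relies on the fact (from the JVM specification, not stated above) that multi-byte items are stored high-order byte first.

```java
class UnsignedReaders {
    // u1: one unsigned byte (0..255)
    static int u1(byte[] b, int off) {
        return b[off] & 0xFF;
    }

    // u2: two bytes, high-order byte first (0..65535)
    static int u2(byte[] b, int off) {
        return (u1(b, off) << 8) | u1(b, off + 1);
    }

    // u4: four bytes; returned as a long so the value stays non-negative
    static long u4(byte[] b, int off) {
        return ((long) u2(b, off) << 16) | u2(b, off + 2);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x16};
        System.out.println(Long.toHexString(u4(data, 0))); // cafebabe
        System.out.println(u2(data, 4));                   // 22
        System.out.println(u1(data, 0));                   // 202
    }
}
```

The masking with 0xFF matters because Java's byte type is signed, while every value in a Class file is unsigned.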

The official documentation defines the structure as follows:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
| Type | Name | Description | Length | Count |
| --- | --- | --- | --- | --- |
| u4 | magic | Magic number; identifies the Class file format | 4 bytes | 1 |
| u2 | minor_version | Minor version number | 2 bytes | 1 |
| u2 | major_version | Major version number | 2 bytes | 1 |
| u2 | constant_pool_count | Constant pool counter | 2 bytes | 1 |
| cp_info | constant_pool | Constant pool table | n bytes | constant_pool_count-1 |
| u2 | access_flags | Access flags | 2 bytes | 1 |
| u2 | this_class | Class index | 2 bytes | 1 |
| u2 | super_class | Parent class index | 2 bytes | 1 |
| u2 | interfaces_count | Interface counter | 2 bytes | 1 |
| u2 | interfaces | Interface index collection | 2 bytes | interfaces_count |
| u2 | fields_count | Field counter | 2 bytes | 1 |
| field_info | fields | Field table | n bytes | fields_count |
| u2 | methods_count | Method counter | 2 bytes | 1 |
| method_info | methods | Method table | n bytes | methods_count |
| u2 | attributes_count | Attribute counter | 2 bytes | 1 |
| attribute_info | attributes | Attribute table | n bytes | attributes_count |

1. Magic number: the symbol of a Class file

Magic Number

  • The 4-byte unsigned integer at the beginning of every Class file is called the magic number.
  • Its sole purpose is to determine whether the file is a valid Class file that the virtual machine can accept. In other words, the magic number is the identifier of a Class file.
  • The magic value is fixed at 0xCAFEBABE and never changes.
  • If a Class file does not start with 0xCAFEBABE, the virtual machine throws the following error during file verification:
    Error: A JNI error has occurred, please check your installation and try again
    Exception in thread "main" java.lang.ClassFormatError: Incompatible magic value 1885430635 in class file StringTest
  • The use of magic numbers rather than extensions for identification is mainly for security reasons, since file extensions can be changed at will.
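A minimal sketch of the magic number check, assuming the file contents are already in a byte array; MagicCheck and its methods are hypothetical names, not part of any JDK API. As a side note, the value 1885430635 in the error message above is 0x7061636B, which is the ASCII text "pack", so that error is consistent with a text file (for example one beginning with "package") being passed off as a Class file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** Reads the first four bytes as one big-endian int (ByteBuffer's default order). */
    static int readMagic(byte[] fileBytes) {
        return ByteBuffer.wrap(fileBytes).getInt();
    }

    /** True only if the data is long enough and starts with 0xCAFEBABE. */
    static boolean looksLikeClassFile(byte[] fileBytes) {
        return fileBytes.length >= 4 && readMagic(fileBytes) == MAGIC;
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        byte[] text = "package demo;".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeClassFile(good)); // true
        System.out.println(looksLikeClassFile(text)); // false
        System.out.println(readMagic(text));          // 1885430635
    }
}
```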

2. Version of the Class file

  • The four bytes that follow the magic number store the version of the Class file. The fifth and sixth bytes hold the minor version number, minor_version, and the seventh and eighth bytes hold the major version number, major_version.
  • Together, they form the format version of the Class file. For example, if the major version number of a Class file is M and the minor version number is m, the format version of the Class file is M.m.
  • The mapping between version numbers and Java compilers is as follows:
    • Class file version numbers and platforms

      | Major version (decimal) | Minor version (decimal) | Compiler version |
      | --- | --- | --- |
      | 45 | 3 | 1.1 |
      | 46 | 0 | 1.2 |
      | 47 | 0 | 1.3 |
      | 48 | 0 | 1.4 |
      | 49 | 0 | 1.5 |
      | 50 | 0 | 1.6 |
      | 51 | 0 | 1.7 |
      | 52 | 0 | 1.8 |
      | 53 | 0 | 9 |
      | 54 | 0 | 10 |
      | 55 | 0 | 11 |
  • Class file major version numbers start at 45, and each major JDK release after JDK 1.1 increases the major version number by 1.
  • Different versions of the Java compiler produce Class files with different version numbers. A newer Java virtual machine can execute Class files generated by older compilers, but an older Java virtual machine cannot execute Class files generated by newer compilers; if it tries, the JVM throws java.lang.UnsupportedClassVersionError.
  • In practice, differences between the development environment and the production environment can cause this problem. Pay special attention to whether the JDK version used for development matches the JDK version in production.
    • If the JDK version of the virtual machine is 1.k (k >= 2), the supported Class file versions range from 45.0 to (44+k).0, inclusive.
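The version rules above can be sketched as follows. ClassVersion and its methods are hypothetical names, and the compatibility check is deliberately simplified: it compares major versions only and ignores the minor version.

```java
class ClassVersion {
    /** Returns "major.minor", read from bytes 4..7 of a Class file. */
    static String formatVersion(byte[] classBytes) {
        int minor = ((classBytes[4] & 0xFF) << 8) | (classBytes[5] & 0xFF);
        int major = ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
        return major + "." + minor;
    }

    /** Maps a major version to k in "JDK 1.k" (plain k from Java 9 on): 52 -> 8. */
    static int jdkRelease(int major) {
        return major - 44;
    }

    /** Simplified rule: a JVM loads a file only if the file's major version is not newer. */
    static boolean canLoad(int jvmMaxMajor, int fileMajor) {
        return fileMajor >= 45 && fileMajor <= jvmMaxMajor;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        System.out.println(formatVersion(header)); // 52.0
        System.out.println(jdkRelease(52));        // 8
        System.out.println(canLoad(52, 55));       // false: JDK 8 JVM, file compiled for 11
        System.out.println(canLoad(55, 52));       // true: JDK 11 JVM, file compiled for 8
    }
}
```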

3. Constant pool: Stores all constants

  • The constant pool is one of the richest areas of the Class file. It is also crucial for resolving the fields and methods in the Class file.
  • As the Java virtual machine has evolved, the content of the constant pool has grown richer. The constant pool can be regarded as the cornerstone of the entire Class file.
  • The version number is followed by the constant pool count and then the constant pool entries.
  • The number of constants in the constant pool is not fixed, so an unsigned number of type u2 is placed at the start of the constant pool to record the constant pool capacity count (constant_pool_count). Contrary to the usual Java convention, this count starts at 1 instead of 0.
    • | Type | Name | Count |
      | --- | --- | --- |
      | u2 (unsigned number) | constant_pool_count | 1 |
      | cp_info (table) | constant_pool | constant_pool_count-1 |
    • | Data type | Definition |
      | --- | --- |
      | Unsigned number | The basic data type. u1, u2, u4, and u8 represent 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or UTF-8 encoded string values. |
      | Table | A composite data structure made up of multiple unsigned numbers or other tables. All table names end with “_info”. Since tables have no fixed length, a table is usually preceded by a number giving its entry count. |
  • As the table above shows, the Class file describes the constant pool with a leading capacity counter (constant_pool_count) followed by a run of consecutive data items (constant_pool). We call this consecutive run of constant pool data the constant pool table.
    • The constant pool table stores the literals and symbolic references generated at compile time. After the class is loaded, this content is placed in the runtime constant pool in the method area.

Constant pool counter: constant_pool_count

  • Because the number of constants in the pool is not fixed, two bytes are needed to record the constant pool capacity count.
  • Constant pool capacity count (type u2): counting starts at 1, and the value indicates how many constants the pool holds. For example, constant_pool_count = 1 means that the constant pool contains 0 constant entries. The value in Demo is:

The value is 0x0016, which is 22 in decimal. Note that there are actually only 21 constants, with indexes ranging from 1 to 21. Why? Index 0 is deliberately left vacant so that data pointing into the constant pool can, in specific circumstances, express “does not reference any constant pool item” by using the index value 0.

Constant pool table: constant_pool []

  • constant_pool is a table structure with indexes ranging from 1 to constant_pool_count - 1, which also tells you how many constant entries there are.

  • The constant pool holds two main kinds of constants: literals and symbolic references.

    • Literals: primitive values, string constants, and so on
    • Symbolic references: symbolic references to classes, fields, methods, interfaces, and so on
  • It contains all string constants, class or interface names, field names, and other constants referenced in the Class file structure and its substructures. Every entry in the constant pool shares one feature: its first byte is a type marker that determines the format of the entry. This byte is called the tag byte.

  • Constant types and structures

    | Type | Tag | Description |
    | --- | --- | --- |
    | CONSTANT_Utf8_info | 1 | UTF-8 encoded string |
    | CONSTANT_Integer_info | 3 | Integer literal |
    | CONSTANT_Float_info | 4 | Floating-point literal |
    | CONSTANT_Long_info | 5 | Long integer literal |
    | CONSTANT_Double_info | 6 | Double-precision floating-point literal |
    | CONSTANT_Class_info | 7 | Symbolic reference to a class or interface |
    | CONSTANT_String_info | 8 | String literal |
    | CONSTANT_Fieldref_info | 9 | Symbolic reference to a field |
    | CONSTANT_Methodref_info | 10 | Symbolic reference to a method of a class |
    | CONSTANT_InterfaceMethodref_info | 11 | Symbolic reference to a method of an interface |
    | CONSTANT_NameAndType_info | 12 | Name and type of a field or method |
    | CONSTANT_MethodHandle_info | 15 | Method handle |
    | CONSTANT_MethodType_info | 16 | Method type |
    | CONSTANT_InvokeDynamic_info | 18 | Dynamic method call site |
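Because every entry starts with a tag byte, the constant pool can be walked without understanding the entries in detail. The sketch below shows the idea on a hand-built pool; ConstantPoolWalker is a hypothetical name, the index values inside the sample entries are illustrative only, and the entry sizes (including the rule that Long and Double entries occupy two index slots) come from the JVM specification rather than from the text above.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantPoolWalker {
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    /** Returns "index:tag" per entry; the input starts at constant_pool_count. */
    static List<String> walk(byte[] b) {
        int pos = 0;
        int count = u2(b, pos);
        pos += 2;
        List<String> out = new ArrayList<>();
        for (int i = 1; i < count; i++) { // valid indexes are 1 .. count-1
            int tag = b[pos] & 0xFF;
            pos += 1;
            out.add(i + ":" + tag);
            switch (tag) {
                case 1: pos += 2 + u2(b, pos); break;              // Utf8: u2 length + bytes
                case 3: case 4: pos += 4; break;                   // Integer, Float
                case 5: case 6: pos += 8; i++; break;              // Long, Double: two slots
                case 7: case 8: case 16: pos += 2; break;          // Class, String, MethodType
                case 9: case 10: case 11: case 12: case 18: pos += 4; break;
                case 15: pos += 3; break;                          // MethodHandle
                default: throw new IllegalArgumentException("unknown tag " + tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pool = {
            0x00, 0x04,                      // constant_pool_count = 4, so entries 1..3
            0x0A, 0x00, 0x02, 0x00, 0x03,    // #1 tag 10 (Methodref), two u2 indexes
            0x07, 0x00, 0x03,                // #2 tag 7 (Class), one u2 index
            0x01, 0x00, 0x03, 'n', 'u', 'm'  // #3 tag 1 (Utf8), length 3, "num"
        };
        System.out.println(walk(pool)); // [1:10, 2:7, 3:1]
    }
}
```

Only CONSTANT_Utf8_info needs its length read from the data; every other entry has a size fixed by its tag.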
Literals and symbolic references

Before interpreting these constants, we need to clarify a few concepts. The constant pool holds two main kinds of constants, literals and symbolic references, as the following table shows:

| Constant | Concrete kinds |
| --- | --- |
| Literal | Text strings; constant values declared as final |
| Symbolic reference | Fully qualified names of classes and interfaces; names and descriptors of fields; names and descriptors of methods |

  • Fully qualified name

com/dsh/test/Demo is the fully qualified name of a class: the “.” in the package name is simply replaced with “/”. To avoid confusion between consecutive fully qualified names, a “;” is usually appended to mark the end of the name.

  • Simple name

A simple name is a method or field name without any type or parameter decoration. For example, a method add() and a field num of a class have the simple names add and num respectively.

  • Descriptor

A descriptor describes the data type of a field, or the parameter list (number, types, and order) and return value of a method. Under the descriptor rules, the primitive types (byte, char, double, float, int, long, short, boolean) and void, which represents no return value, are each represented by one uppercase character, while an object type is represented by the character L followed by the fully qualified name of the object's class, as shown in the following table:

| Identifier | Meaning |
| --- | --- |
| B | Primitive type byte |
| C | Primitive type char |
| D | Primitive type double |
| F | Primitive type float |
| I | Primitive type int |
| J | Primitive type long |
| S | Primitive type short |
| Z | Primitive type boolean |
| V | void |
| L | Object type, for example Ljava/lang/Object; |
| [ | Array type; each [ represents one dimension, for example double[][][] is [[[D |

A method descriptor lists the parameters first and the return value second. The parameters are placed inside a pair of parentheses () in strict order. For example, the descriptor of the method java.lang.String toString() is ()Ljava/lang/String; and the descriptor of the method int abc(int[] x, int y) is ([II)I.
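The JDK can produce these descriptor strings itself, which makes the rules easy to check. The sketch below uses java.lang.invoke.MethodType; DescriptorDemo and its helper method are hypothetical names for this example.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    /** Builds the method descriptor for the given return type and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // java.lang.String toString()
        System.out.println(descriptor(String.class));                  // ()Ljava/lang/String;
        // int abc(int[] x, int y)
        System.out.println(descriptor(int.class, int[].class, int.class)); // ([II)I
        // void f(double[][][] d)
        System.out.println(descriptor(void.class, double[][][].class));    // ([[[D)V
    }
}
```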

  • Supplementary notes:

The virtual machine performs dynamic linking only when it loads the Class file. In other words, the final memory layout of the methods and fields is not stored in the Class file, so the symbolic references to these fields and methods cannot be used by the virtual machine directly without conversion. At run time, the virtual machine obtains the symbolic reference from the constant pool and, during the resolution phase of class loading, replaces it with a direct reference, that is, a concrete memory address. The differences and connections between symbolic and direct references are as follows:

  • Symbolic reference: a symbolic reference describes its target with a set of symbols, which can be any form of literal as long as it locates the target unambiguously. Symbolic references are independent of the memory layout implemented by the virtual machine, and the target of the reference has not necessarily been loaded into memory.
  • Direct reference: a direct reference can be a pointer to the target, a relative offset, or a handle that locates the target indirectly. A direct reference depends on the memory layout implemented by the virtual machine, so the same symbolic reference generally translates to different direct references on different virtual machine instances. If a direct reference exists, its target must already be present in memory.
Common types and structural details

Parsing all the constants in the constant pool

public class Demo {
    private int num = 1;

    public int add() {
        num = num + 2;
        return num;
    }
}

For example, the first entry starts with the tag 0a (10), so it is a symbolic reference to a method; this can be read directly from the bytecode file.

Summary of constant pool entry data

Why do byte, short, char, and boolean not appear among the constant types? Because the compiler treats all of them as int.

Summary 1

  • What these 14 tables (or constant entry structures) have in common is that each begins with a tag of type u1, which indicates which table structure, that is, which constant type, the entry uses.
  • In the constant pool, a CONSTANT_Utf8_info entry stores constant string information in modified UTF-8 encoding, such as literal strings, fully qualified names of classes or interfaces, simple names of fields or methods, and descriptors.
  • Another feature of the 14 constant entry structures is that 13 of them occupy a fixed number of bytes; only CONSTANT_Utf8_info occupies a variable number of bytes, with its size given by its length item. Why? The constant pool stores literals and symbolic references, and ultimately this content is expressed as strings. The size of a string is decided when the program is written: a class name, for example, can be long or short. The size is therefore not fixed before compilation, and only after compilation and UTF-8 encoding is the length known.

Summary 2

  • Constant pool: the resource repository of the Class file. It is the data item most closely connected with the other items in the Class file structure (many of the data items that follow point into it), and one of the data items that takes up the most space in the Class file.
  • Why does the constant pool contain these things?

Unlike C and C++, Java code has no “link” step when javac compiles it; linking is performed dynamically when the virtual machine loads the Class file. In other words, the final memory layout of each method and field is not stored in the Class file, so the symbolic references to these fields and methods cannot be used by the virtual machine until they are converted at run time into actual memory entry addresses. When the virtual machine runs, it retrieves the symbolic references from the constant pool and resolves them into concrete memory addresses at class creation or at run time. Class creation and dynamic linking are covered in more detail in the discussion of the virtual machine's class loading process.

4. Access flags (access_flags)

  • The access flags come immediately after the constant pool. This two-byte value identifies class-level or interface-level access information, including: whether this Class is a class or an interface; whether it is declared public; whether it is declared abstract; and, if it is a class, whether it is declared final. The access flags are listed below:
    | Flag name | Value | Meaning |
    | --- | --- | --- |
    | ACC_PUBLIC | 0x0001 | Declared public |
    | ACC_FINAL | 0x0010 | Declared final; only classes can set this flag |
    | ACC_SUPER | 0x0020 | Allows the new semantics of the invokespecial bytecode instruction (an improved way of calling superclass methods). Set by default for classes compiled after JDK 1.0.2 |
    | ACC_INTERFACE | 0x0200 | This is an interface |
    | ACC_ABSTRACT | 0x0400 | Declared abstract. Set for interfaces and abstract classes, not set for other types |
    | ACC_SYNTHETIC | 0x1000 | Not generated from user code (that is, generated by the compiler, with no counterpart in the source code) |
    | ACC_ANNOTATION | 0x2000 | This is an annotation |
    | ACC_ENUM | 0x4000 | This is an enumeration |
  • Class access flags are constants whose names start with ACC_.
  • Each flag is represented by setting a specific bit in the 16 bits of the access flags. For example, a public final class is marked ACC_PUBLIC | ACC_FINAL.
  • Setting ACC_SUPER lets a class locate the super.method() method of its parent class more accurately; modern compilers always set it.
  • In Demo.class, the value 0x0021 is the sum of ACC_PUBLIC (0x0001) and ACC_SUPER (0x0020).
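Decoding the class access flags is a matter of testing each bit in turn. A minimal sketch, using the flag values from the table above; ClassAccessFlags and decode are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    // Bit masks and names, in the same order as the table of class access flags
    static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Lists the names of all flags whose bit is set in the given u2 value. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601)); // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```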

Supplementary notes

  1. A Class file with the ACC_INTERFACE flag set represents an interface rather than a class; without the flag, it represents a class rather than an interface.
    • 1) If the ACC_INTERFACE flag is set, the ACC_ABSTRACT flag must also be set, and the ACC_FINAL, ACC_SUPER, and ACC_ENUM flags must not be set.
    • 2) If the ACC_INTERFACE flag is not set, the Class file may set any of the flags in the table above except ACC_ANNOTATION. Mutually exclusive flags are of course still excluded: ACC_FINAL and ACC_ABSTRACT must not be set at the same time.
  2. The ACC_SUPER flag determines which execution semantics the invokespecial instruction uses in a class or interface. Compilers targeting the Java virtual machine instruction set should set this flag. In Java SE 8 and later, the Java virtual machine treats every Class file as if ACC_SUPER were set, regardless of the actual value of the flag in the Class file and regardless of the version of the Class file.
    • 1) The ACC_SUPER flag exists for backward compatibility with code compiled by older Java compilers. In access_flags generated by compilers before JDK 1.0.2, the bit that now represents ACC_SUPER had no assigned meaning, and Oracle's Java virtual machine implementation ignored it if it was set.
  3. The ACC_SYNTHETIC flag means that the class or interface was generated by the compiler and does not appear in the source code.
  4. An annotation type must have the ACC_ANNOTATION flag set. If the ACC_ANNOTATION flag is set, the ACC_INTERFACE flag must also be set.
  5. The ACC_ENUM flag indicates that the class or its parent class is an enumerated type.

5. Class index, parent class index, interface index collection

  • After the access flags come the class index, the parent class index, and the implemented interfaces, in the following format:
    | Length | Meaning |
    | --- | --- |
    | u2 | this_class |
    | u2 | super_class |
    | u2 | interfaces_count |
    | u2 | interfaces[interfaces_count] |
  • These three items determine the inheritance relationships of the class.
    • The class index determines the fully qualified name of this class.
    • The parent class index determines the fully qualified name of this class's parent. Because the Java language does not allow multiple inheritance, there is only one parent class index. Every Java class except java.lang.Object has a parent class, so the parent class index is non-zero for every class except java.lang.Object.
    • The interface index collection describes the interfaces this class implements. The implemented interfaces are stored in the order they appear, left to right, after the implements keyword (or after the extends keyword if the Class file represents an interface).
  1. this_class (class index)

    • An unsigned 2-byte integer that is an index into the constant pool. It provides the fully qualified name of the class, such as com/atguigu/java1/Demo. The value of this_class must be a valid index into the constant pool table, and the constant pool entry at that index must be a CONSTANT_Class_info structure representing the class or interface defined by this Class file.
  2. super_class (parent class index)

    • An unsigned 2-byte integer that is an index into the constant pool. It provides the fully qualified name of the parent of the current class. If the class does not explicitly extend any class, the parent defaults to java/lang/Object. Since Java does not support multiple inheritance, there is only one parent class.
    • super_class must not point to a class that is final.
  3. interfaces

    • A collection of constant pool indexes that provides symbolic references to all implemented interfaces.
    • Since a class can implement multiple interfaces, the indexes of those interfaces are stored in an array. Each interface index also points to a CONSTANT_Class entry in the constant pool (which must, of course, be an interface rather than a class).
  • 3.1 interfaces_count (interface counter)
    • The value of interfaces_count is the number of direct superinterfaces of the current class or interface.
  • 3.2 interfaces[] (interface index collection)
    • Each value in interfaces[] must be a valid index into the constant pool table, and the length of the array is interfaces_count. The constant pool entry at each interfaces[i], where 0 <= i < interfaces_count, must be a CONSTANT_Class_info structure.
    • In interfaces[], the interfaces appear in the same left-to-right order as in the source code; that is, interfaces[0] corresponds to the leftmost interface in the source code.
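Reflection exposes the same three pieces of information at run time, which gives a quick way to see them without parsing bytes. This sketch reads loaded classes rather than the Class file itself; InheritanceDemo and its nested types are hypothetical names for this example.

```java
import java.io.Serializable;

class InheritanceDemo {
    interface Walker {}
    interface Swimmer {}
    static class Animal {}
    static class Duck extends Animal implements Walker, Swimmer, Serializable {}

    /** Summarizes the class, its parent, and its direct interfaces in declaration order. */
    static String describe(Class<?> c) {
        StringBuilder sb = new StringBuilder(c.getSimpleName());
        Class<?> parent = c.getSuperclass(); // null for java.lang.Object (super_class == 0)
        sb.append(" super=").append(parent == null ? "none" : parent.getSimpleName());
        sb.append(" interfaces=");
        for (Class<?> itf : c.getInterfaces()) { // same order as the implements clause
            sb.append(itf.getSimpleName()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe(Duck.class));   // Duck super=Animal interfaces=Walker Swimmer Serializable
        System.out.println(describe(Object.class)); // Object super=none interfaces=
    }
}
```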

6. Field table collection

fields

  • Describes the variables declared in an interface or class. Fields include class-level and instance-level variables, but not local variables declared inside methods or code blocks.
  • The name of a field and the data type it is declared with are not fixed, so they can only be described by referring to constants in the constant pool.
  • The field table points into the constant pool and describes the complete information for each field: for example, the field's name, its access modifier (public, private, or protected), whether it is a class variable or an instance variable (static modifier), and whether it is a constant (final modifier).

Notes

  • The field table collection does not list fields inherited from a parent class or an implemented interface, but it may list fields that do not exist in the original Java code. For example, to keep the outer class accessible, the compiler automatically adds a field pointing to the outer class instance to an inner class.
  • Fields cannot be overloaded in the Java language: two fields must have different names, regardless of whether their data types and modifiers are the same. For bytecode, however, two fields with the same name are legal as long as their descriptors differ.

6.1 fields_count (field counter)

The value of fields_count is the number of members in the fields table of the current Class file. It is stored in two bytes. Each member of the fields table is a field_info structure, and together they represent all class and instance fields declared by the class or interface, not including variables declared inside methods or fields inherited from a parent class or parent interface.

6.2 fields[] (field table)

  • Each member of the fields table must be a field_info structure that gives a complete description of one field of the current class or interface.
  • A field carries the following information. Each modifier is a Boolean: it is either present or absent.
    • Scope (public, private, protected modifiers)
    • Instance variable or class variable (static modifier)
    • Mutability (final modifier)
    • Concurrency visibility (volatile modifier, whether reads and writes are forced to go to main memory)
    • Serializability (transient modifier)
    • Field data type (primitive type, object, array)
    • Field name
  • Field table structure

As a table, the field table has its own structure:

| Type | Name | Meaning | Count |
| --- | --- | --- | --- |
| u2 | access_flags | Access flags | 1 |
| u2 | name_index | Field name index | 1 |
| u2 | descriptor_index | Descriptor index | 1 |
| u2 | attributes_count | Attribute counter | 1 |
| attribute_info | attributes | Attribute collection | attributes_count |

6.2.1 Field access flags

A field can be modified by a variety of keywords, such as the scope modifiers (public, private, protected), static, final, volatile, and so on. Like a class, a field therefore uses flag bits to record them. The field access flags are as follows:

| Flag name | Value | Meaning |
| --- | --- | --- |
| ACC_PUBLIC | 0x0001 | Whether the field is public |
| ACC_PRIVATE | 0x0002 | Whether the field is private |
| ACC_PROTECTED | 0x0004 | Whether the field is protected |
| ACC_STATIC | 0x0008 | Whether the field is static |
| ACC_FINAL | 0x0010 | Whether the field is final |
| ACC_VOLATILE | 0x0040 | Whether the field is volatile |
| ACC_TRANSIENT | 0x0080 | Whether the field is transient |
| ACC_SYNTHETIC | 0x1000 | Whether the field is generated automatically by the compiler |
| ACC_ENUM | 0x4000 | Whether the field is an enum constant |
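For the source-level modifiers, the constants in java.lang.reflect.Modifier use the same bit values as the field access flags above, so the JDK can render a flag value as text. A small sketch; FieldFlagsDemo, its two sample fields, and its helper methods are hypothetical names for this example.

```java
import java.lang.reflect.Modifier;

class FieldFlagsDemo {
    private static final int LIMIT = 10;     // ACC_PRIVATE | ACC_STATIC | ACC_FINAL
    private transient volatile int counter;  // ACC_PRIVATE | ACC_VOLATILE | ACC_TRANSIENT

    /** Formats a raw flag value as hex followed by the modifier keywords it encodes. */
    static String describe(int flags) {
        return String.format("0x%04X %s", flags, Modifier.toString(flags));
    }

    /** Looks up a field of this class by name and describes its modifiers. */
    static String flagsOf(String fieldName) {
        try {
            return describe(FieldFlagsDemo.class.getDeclaredField(fieldName).getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(flagsOf("LIMIT"));   // 0x001A private static final
        System.out.println(flagsOf("counter")); // 0x00C2 private transient volatile
    }
}
```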
6.2.2 Field name index

The value of the field name index is used to look up the corresponding entry in the constant pool, which holds the field's name.

6.2.3 Descriptor index

A descriptor describes the data type of a field, or the parameter list (number, types, and order) and return value of a method. Under the descriptor rules, the primitive types (byte, char, double, float, int, long, short, boolean) and void, which represents no return value, are each represented by one uppercase character, while an object type is represented by the character L followed by the fully qualified name of the object's class, as follows:

| Character | Type | Meaning |
| --- | --- | --- |
| B | byte | Signed byte |
| C | char | Unicode character, UTF-16 encoded |
| D | double | Double-precision floating-point number |
| F | float | Single-precision floating-point number |
| I | int | Integer |
| J | long | Long integer |
| S | short | Signed short integer |
| Z | boolean | Boolean value, true or false |
| V | void | void |
| L Classname; | reference | An instance of the class Classname |
| [ | reference | One array dimension |
6.2.4 Attribute table collection

A field may also have attributes that store additional information, such as an initial value or annotation information. The number of attributes is stored in attributes_count, and the attributes themselves are stored in the attributes array. Taking the ConstantValue attribute as an example, its structure is:

ConstantValue_attribute{
    u2 attribute_name_index;
    u4 attribute_length;
    u2 constantvalue_index;
}

Note: for the ConstantValue attribute, the value of attribute_length is always 2.

7. Method table collection

methods: points to a collection of constant pool indexes that fully describes the signature of each method.

  • In a bytecode file, each method_info entry corresponds to the information for one method of the class or interface: for example, the method's access modifier (public, private, or protected), its return type, and its parameters.
  • If a method is neither abstract nor native, its body is also present in the bytecode.
  • On the one hand, the methods table describes only the methods declared in the current class or interface, not methods inherited from a parent class or parent interface. On the other hand, the methods table may contain methods added automatically by the compiler, most typically the class (or interface) initialization method <clinit>() and the instance initialization method <init>().

Note: in the Java language, overloading a method requires, besides the same simple name as the original method, a signature that differs from the original. The signature is the collection of constant pool symbolic references for the method's parameters. Because the return value is not part of the signature, the Java language cannot overload an existing method by return type alone. In the Class file format, however, the notion of a signature is broader: two methods can coexist as long as their descriptors are not exactly the same. That is, although the Java language does not allow a class or interface to declare several methods with the same name and parameters, a bytecode file does allow them, provided that their return types differ.
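One place where this shows up in everyday code is the bridge method that javac generates when a class implements a generic interface. The sketch below, with the hypothetical names BridgeDemo, Greeting, and getterVariants, uses reflection to list both versions of get(), which share a name and an empty parameter list and differ only in return type. It assumes the class is compiled with javac.

```java
import java.lang.reflect.Method;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Supplier;

class BridgeDemo {
    // Source code declares only "String get()"; the compiler adds "Object get()"
    // as a bridge so that calls through Supplier.get() still reach it.
    static class Greeting implements Supplier<String> {
        @Override
        public String get() {
            return "hi";
        }
    }

    /** Collects the return type of every declared method named get, marking bridges. */
    static SortedSet<String> getterVariants() {
        SortedSet<String> out = new TreeSet<>();
        for (Method m : Greeting.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                out.add(m.getReturnType().getSimpleName() + (m.isBridge() ? " (bridge)" : ""));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(getterVariants()); // [Object (bridge), String]
    }
}
```

Both methods have the descriptor prefix () for their parameters; only the return part of the descriptor distinguishes them, which the Java language itself would not accept as an overload.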

7.1 Methods_count (method counter)

Methods_count specifies the number of members of the methods table in the current class file. It is represented by two bytes. Each member of the Methods table is a method_INFO structure.

7.2 Methods []

  • Each member of the MethodsMethod table must be a _info structure that represents a complete description of a method in the current class or interface. If a method_info structure access has neither ACC_NATIVE nor CC_ABSTRACT flags set in its _flags field, that structure should also contain the Java virtual machine directives used to implement the method.
  • The method_info structure can represent all methods defined in classes and interfaces, including instance methods, class methods, instance initializers, and class or interface initializers
  • The structure of the method table is the same as that of the field table.
    Type            Name              Meaning            Count
    u2              access_flags      Access flags       1
    u2              name_index        Method name index  1
    u2              descriptor_index  Descriptor index   1
    u2              attributes_count  Attribute counter  1
    attribute_info  attributes        Attribute set      attributes_count
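
The layout above can be read field by field with a big-endian stream. The following is a minimal sketch rather than a full class file parser: the `MethodInfoReader` and `MethodInfo` names are hypothetical, the input is assumed to start exactly at a method_info entry, and attribute bodies are skipped instead of decoded.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class MethodInfoReader {
    /** Hypothetical holder for the fixed-size fields of one method_info entry. */
    static final class MethodInfo {
        final int accessFlags;
        final int nameIndex;
        final int descriptorIndex;
        final int attributesCount;

        MethodInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
            this.accessFlags = accessFlags;
            this.nameIndex = nameIndex;
            this.descriptorIndex = descriptorIndex;
            this.attributesCount = attributesCount;
        }
    }

    /** Reads one method_info entry from the start of the given bytes. */
    static MethodInfo parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int accessFlags = in.readUnsignedShort();     // u2 access_flags
            int nameIndex = in.readUnsignedShort();       // u2 name_index
            int descriptorIndex = in.readUnsignedShort(); // u2 descriptor_index
            int attributesCount = in.readUnsignedShort(); // u2 attributes_count
            for (int i = 0; i < attributesCount; i++) {
                in.readUnsignedShort();                   // u2 attribute_name_index
                int length = in.readInt();                // u4 attribute_length (assumed to fit in an int)
                in.readFully(new byte[length]);           // skip info[attribute_length]
            }
            return new MethodInfo(accessFlags, nameIndex, descriptorIndex, attributesCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The name and descriptor are only indexes here; resolving them to text requires looking up the corresponding entries in the constant pool.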

7.2.1 Method table access flags

Like the field table, the method table has access flags. Some of the flags are the same as those of fields and some are different.

Flag name         Flag value  Meaning
ACC_PUBLIC        0x0001      Whether the method is public
ACC_PRIVATE       0x0002      Whether the method is private
ACC_PROTECTED     0x0004      Whether the method is protected
ACC_STATIC        0x0008      Whether the method is static
ACC_FINAL         0x0010      Whether the method is final
ACC_SYNCHRONIZED  0x0020      Whether the method is synchronized
ACC_BRIDGE        0x0040      Whether the method is a bridge method generated by the compiler
ACC_VARARGS       0x0080      Whether the method accepts a variable number of arguments
ACC_NATIVE        0x0100      Whether the method is native
ACC_ABSTRACT      0x0400      Whether the method is abstract
ACC_STRICT        0x0800      Whether the method is strictfp
ACC_SYNTHETIC     0x1000      Whether the method is generated automatically by the compiler
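
Because access_flags is a bit mask, several flags are usually set at once. The sketch below decodes a mask into flag names. The `MethodAccessFlags` class is hypothetical, and the masks are the method flag values from the Java virtual machine specification for Java 8, where 0x0040 and 0x0080 mean ACC_BRIDGE and ACC_VARARGS for methods.

```java
import java.util.ArrayList;
import java.util.List;

final class MethodAccessFlags {
    // Method access flag masks and their names, in ascending mask order.
    private static final int[] MASKS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020,
        0x0040, 0x0080, 0x0100, 0x0400, 0x0800, 0x1000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC", "ACC_FINAL", "ACC_SYNCHRONIZED",
        "ACC_BRIDGE", "ACC_VARARGS", "ACC_NATIVE", "ACC_ABSTRACT", "ACC_STRICT", "ACC_SYNTHETIC"
    };

    /** Returns the names of all flags whose bit is set in the given access_flags value. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // 0x0009 = 0x0001 | 0x0008, the value typically seen on "public static" methods.
        System.out.println(decode(0x0009));
    }
}
```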

8. Attribute table collection

  • The attribute table collection that follows the method table collection holds auxiliary information carried by the class file, such as the name of the source file of the class and any annotations with RetentionPolicy.CLASS or RetentionPolicy.RUNTIME. This information is typically used for verification and execution by the Java virtual machine and for debugging Java programs, and in-depth knowledge of it is usually not required.
  • In addition, field tables and method tables can have their own attribute tables, which describe information specific to certain scenarios.
  • The attribute table collection is less restrictive than the other parts of the class file: attributes are not required to appear in a strict order, and any compiler implementation may write its own attribute information into the attribute table as long as it does not reuse an existing attribute name. The Java virtual machine ignores attributes it does not recognize at run time.

8.1 attributes_count

The value of attributes_count is the number of members in the attributes table of the current Class file. Each entry in the attributes table is an attribute_info structure.

8.2 attributes[]

Each entry in the attributes table must be an attribute_info structure. The structure of an attribute is flexible, and every attribute only needs to satisfy the following general format.

8.2.1 General format for attributes

Type  Name                  Count             Meaning
u2    attribute_name_index  1                 Attribute name index
u4    attribute_length      1                 Attribute length
u1    info                  attribute_length  Attribute content

In other words, an attribute only has to specify its name and the number of bytes it occupies. The structure of the content itself can be defined freely.
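
This general format is what lets a reader step over attributes it does not understand: attribute_length always says how many bytes to skip. A minimal sketch, assuming the input starts at a u2 attributes_count followed by that many attributes (the `AttributeWalker` class and its output format are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class AttributeWalker {
    /** Returns "nameIndex:length" for every attribute without decoding any attribute body. */
    static List<String> summarize(byte[] attributesTable) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(attributesTable))) {
            int attributesCount = in.readUnsignedShort(); // u2 attributes_count
            for (int i = 0; i < attributesCount; i++) {
                int nameIndex = in.readUnsignedShort();   // u2 attribute_name_index
                int length = in.readInt();                // u4 attribute_length (assumed to fit in an int)
                in.readFully(new byte[length]);           // skip info[attribute_length], whatever it contains
                result.add(nameIndex + ":" + length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

A real tool would resolve each name index through the constant pool and dispatch on names it knows, such as Code or SourceFile, falling back to this skip for everything else.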

8.2.2 Attribute types

There are actually many types of attributes, and the Code attribute seen above is just one of them. Java 8 defines 23 attributes. The following are attributes predefined by the virtual machine:

Attribute name                        Location                          Meaning
Code                                  Method table                      Bytecode instructions compiled from Java code
ConstantValue                         Field table                       Constant value defined with the final keyword
Deprecated                            Class, method table, field table  Classes, methods, and fields declared deprecated
Exceptions                            Method table                      Exceptions the method is declared to throw
EnclosingMethod                       Class file                        Present only when a class is local or anonymous; identifies the enclosing method of the class
InnerClasses                          Class file                        Inner class list
LineNumberTable                       Code attribute                    Mapping from Java source line numbers to bytecode instructions
LocalVariableTable                    Code attribute                    Description of the method's local variables
StackMapTable                         Code attribute                    Added in JDK 1.6; lets the new type checker verify that the local variables and operand stack of the target method have the required types
Signature                             Class, method table, field table  Supports signatures that involve generics
SourceFile                            Class file                        Records the source file name
SourceDebugExtension                  Class file                        Stores additional debugging information
Synthetic                             Class, method table, field table  Marks a class, method, or field as generated automatically by the compiler
LocalVariableTypeTable                Code attribute                    Uses signatures instead of descriptors; added to describe generic parameterized types after generics were introduced
RuntimeVisibleAnnotations             Class, method table, field table  Supports annotations that are visible at run time
RuntimeInvisibleAnnotations           Class, method table, field table  Indicates which annotations are not visible at run time
RuntimeVisibleParameterAnnotations    Method table                      Like RuntimeVisibleAnnotations, but applies to method parameters
RuntimeInvisibleParameterAnnotations  Method table                      Like RuntimeInvisibleAnnotations, but applies to method parameters
AnnotationDefault                     Method table                      Records the default value of an annotation element
BootstrapMethods                      Class file                        Holds the bootstrap method specifiers referenced by invokedynamic instructions

8.2.3 Details of some attributes

① ConstantValue attribute: The ConstantValue attribute gives the value of a constant field. It is located in the attributes table of the field_info structure.

    ConstantValue_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 constantvalue_index; // Index of the field value in the constant pool. The constant pool entry at this index gives the constant value represented by the attribute (for example, a long value is a CONSTANT_Long entry).
    }

② Deprecated attribute: The Deprecated attribute was introduced in JDK 1.1 to support the @deprecated tag in documentation comments.

    Deprecated_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
    }

③ Code attribute: The Code attribute stores the bytecode of the method body. However, not every entry in the method table has a Code attribute. Abstract methods, such as those declared in interfaces or abstract classes, have no concrete method body and therefore no Code attribute. The structure of the Code attribute is shown below:

Type            Name                    Count                   Meaning
u2              attribute_name_index    1                       Attribute name index
u4              attribute_length        1                       Attribute length
u2              max_stack               1                       Maximum depth of the operand stack
u2              max_locals              1                       Storage space required by the local variable table
u4              code_length             1                       Length of the bytecode instructions
u1              code                    code_length             Bytecode instructions
u2              exception_table_length  1                       Exception table length
exception_info  exception_table         exception_table_length  Exception table
u2              attributes_count        1                       Attribute set counter
attribute_info  attributes              attributes_count        Attribute set

As you can see, the first two entries of the Code attribute match the general attribute format. That is, the Code attribute follows the general structure of an attribute, and the remaining entries are its own custom structure.
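
To make the layout concrete, here is a minimal sketch that reads the fields of a Code attribute in the order listed in the table. The `CodeAttributeReader` and `CodeInfo` names are hypothetical. The sketch assumes each exception table entry is four u2 values (start_pc, end_pc, handler_pc, catch_type, as in the Java virtual machine specification) and does not decode the nested attributes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CodeAttributeReader {
    /** Hypothetical holder for the decoded parts of a Code attribute. */
    static final class CodeInfo {
        final int maxStack;
        final int maxLocals;
        final byte[] code;
        final int exceptionTableLength;
        final int attributesCount;

        CodeInfo(int maxStack, int maxLocals, byte[] code, int exceptionTableLength, int attributesCount) {
            this.maxStack = maxStack;
            this.maxLocals = maxLocals;
            this.code = code;
            this.exceptionTableLength = exceptionTableLength;
            this.attributesCount = attributesCount;
        }
    }

    /** Parses a complete Code attribute, starting at attribute_name_index. */
    static CodeInfo parse(byte[] attribute) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(attribute))) {
            in.readUnsignedShort();                            // u2 attribute_name_index
            in.readInt();                                      // u4 attribute_length
            int maxStack = in.readUnsignedShort();             // u2 max_stack
            int maxLocals = in.readUnsignedShort();            // u2 max_locals
            byte[] code = new byte[in.readInt()];              // u4 code_length
            in.readFully(code);                                // u1 code[code_length]
            int exceptionTableLength = in.readUnsignedShort(); // u2 exception_table_length
            in.readFully(new byte[exceptionTableLength * 8]);  // skip entries of 4 x u2 each
            int attributesCount = in.readUnsignedShort();      // u2 attributes_count (nested attributes not read)
            return new CodeInfo(maxStack, maxLocals, code, exceptionTableLength, attributesCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a method whose body is a single `return` instruction (opcode 0xB1), the code array holds exactly that one byte.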

④ InnerClasses attribute: For ease of explanation, let C denote the class or interface represented by a Class file. If C's constant pool contains a CONSTANT_Class_info entry that represents a class or interface that is not a member of a package, then C's ClassFile structure must have an InnerClasses attribute in its attributes table. The InnerClasses attribute was introduced in JDK 1.1 to support inner classes and inner interfaces.

⑤ LineNumberTable attribute

  • The LineNumberTable attribute is an optional variable-length attribute located in the attributes table of the Code attribute.
  • The LineNumberTable attribute describes the mapping between Java source line numbers and bytecode offsets (the "bytecode line numbers"). It can be used during debugging to locate the source line being executed.
    • start_pc is the bytecode offset; line_number is the Java source line number.
  • LineNumberTable attributes can appear in any order in the attributes table of the Code attribute. In addition, multiple LineNumberTable attributes may together represent a given line of the source file. That is, LineNumberTable attributes need not correspond one-to-one with source lines.
  • Structure of the LineNumberTable attribute:
    LineNumberTable_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 line_number_table_length;
        {   u2 start_pc;
            u2 line_number;
        } line_number_table[line_number_table_length];
    }
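
A debugger or stack trace printer uses this table to turn a bytecode offset back into a source line. The sketch below shows one way to do that lookup and is not taken from any particular tool: the `LineNumberTableReader` class is hypothetical, and it picks the entry with the largest start_pc that does not exceed the given offset, scanning every entry because the table is not assumed to be sorted.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LineNumberTableReader {
    /** Parses a complete LineNumberTable attribute into {start_pc, line_number} pairs. */
    static int[][] parse(byte[] attribute) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(attribute))) {
            in.readUnsignedShort();                    // u2 attribute_name_index
            in.readInt();                              // u4 attribute_length
            int count = in.readUnsignedShort();        // u2 line_number_table_length
            int[][] table = new int[count][2];
            for (int i = 0; i < count; i++) {
                table[i][0] = in.readUnsignedShort();  // u2 start_pc
                table[i][1] = in.readUnsignedShort();  // u2 line_number
            }
            return table;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the source line for the given bytecode offset, or -1 if no entry covers it. */
    static int lineForPc(int[][] table, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] entry : table) {
            if (entry[0] <= pc && entry[0] > bestStart) {
                bestStart = entry[0];
                line = entry[1];
            }
        }
        return line;
    }
}
```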

⑥ LocalVariableTable attribute: The LocalVariableTable attribute is an optional variable-length attribute located in the attributes table of the Code attribute. Debuggers use it to determine information about the local variables of a method during execution.

  • LocalVariableTable attributes can appear in any order in the attributes table of the Code attribute. Each local variable in the Code attribute can have at most one LocalVariableTable attribute.
    • start_pc and length give the range of bytecode offsets in which the variable is live, from start_pc up to but not including start_pc + length (for example, start_pc 0 and length 10 cover offsets 0 up to 10).
    • index is the slot of the variable in the local variable table (slots can be reused).
    • name is the variable name.
    • descriptor describes the type of the local variable.
  • Structure of the LocalVariableTable attribute:
    LocalVariableTable_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 local_variable_table_length;
        {   u2 start_pc;
            u2 length;
            u2 name_index;
            u2 descriptor_index;
            u2 index;
        } local_variable_table[local_variable_table_length];
    }
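
Since slots are reusable, a slot number alone does not identify a variable. The bytecode offset is needed as well. The sketch below illustrates that lookup with made-up names (`LocalVariableLookup`, `Entry`). For brevity it stores the variable name directly, whereas a real entry stores name_index and descriptor_index into the constant pool.

```java
import java.util.Arrays;
import java.util.List;

final class LocalVariableLookup {
    /** Simplified local_variable_table entry with the name already resolved. */
    static final class Entry {
        final int startPc;
        final int length;
        final String name;
        final int index;

        Entry(int startPc, int length, String name, int index) {
            this.startPc = startPc;
            this.length = length;
            this.name = name;
            this.index = index;
        }

        /** True if pc lies in the half-open range [start_pc, start_pc + length). */
        boolean covers(int pc) {
            return pc >= startPc && pc < startPc + length;
        }
    }

    /** Returns the name of the variable occupying the slot at the given offset, or null if none. */
    static String nameAt(List<Entry> table, int slot, int pc) {
        for (Entry e : table) {
            if (e.index == slot && e.covers(pc)) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Slot 1 is reused: first by "i" (offsets 2..5), then by "s" (offsets 6..9).
        List<Entry> table = Arrays.asList(
                new Entry(0, 10, "this", 0),
                new Entry(2, 4, "i", 1),
                new Entry(6, 4, "s", 1));
        System.out.println(nameAt(table, 1, 3) + " " + nameAt(table, 1, 7));
    }
}
```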

⑦ Signature attribute: The Signature attribute is an optional fixed-length attribute in the attributes table of the ClassFile, field_info, or method_info structure. In the Java language, the Signature attribute records generic signature information for any class, interface, constructor, or member whose generic signature contains type variables or parameterized types.
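
The effect of the Signature attribute can be observed through reflection. In the sketch below (the `SignatureDemo` class and its `names` method are made up for illustration), the erased return type comes from the method descriptor, while the generic return type is recovered from the generic signature information stored in the class file.

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.List;

final class SignatureDemo {
    /** A method whose return type is a parameterized type. */
    static List<String> names() {
        return Collections.emptyList();
    }

    private static Method namesMethod() {
        try {
            return SignatureDemo.class.getDeclaredMethod("names");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Return type as given by the descriptor, after erasure. */
    static String erasedReturnType() {
        return namesMethod().getReturnType().getName();
    }

    /** Return type including type arguments, as recorded in the generic signature. */
    static String genericReturnType() {
        return namesMethod().getGenericReturnType().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(erasedReturnType());  // java.util.List
        System.out.println(genericReturnType()); // java.util.List<java.lang.String>
    }
}
```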

⑧ SourceFile attribute: format of the attribute

Type  Name                  Count  Meaning
u2    attribute_name_index  1      Attribute name index
u4    attribute_length      1      Attribute length
u2    sourcefile_index      1      Index of the source file name in the constant pool

As you can see, its total length is always fixed at 8 bytes.

⑨ Other attributes: The Java VM has more than 20 predefined attributes, and this section does not introduce them all. Once you understand the essence of attributes, you can easily interpret the others.

Bytecode parsing final result