Java can be “compiled once, run anywhere” because a JVM implementation exists for each supported operating system and platform, and because source code is compiled into bytecode files (.class files) with a fixed format that any of those JVMs can consume. This is why bytecode is so important to the Java ecosystem. It is called bytecode because the JVM reads the file one byte at a time, and each byte is conventionally written as two hexadecimal digits. The same bytecode files are read and executed by JVMs running on different platforms, which is what makes write once, run anywhere possible. Today the JVM no longer supports only Java: a number of other languages, such as Groovy, Scala, and Kotlin, also compile to JVM bytecode.

Bytecode file structure

A typical class file is divided into ten parts: MagicNumber, Version, Constant_pool, Access_flag, This_class, Super_class, Interfaces, Fields, Methods, and Attributes. The JVM specification requires that every bytecode file consist of these ten parts in this fixed order.

  1. Magic number. The first four bytes of every .class file are the magic number, which has the fixed value 0xCAFEBABE. Because it sits at the very beginning of the file, the JVM can use it to decide whether the file is likely to be a .class file and, if so, proceed with subsequent operations.
  2. Version number. The version number occupies the four bytes after the magic number. The first two bytes give the minor version and the last two bytes give the major version.
  3. Constant pool. Two kinds of constants are stored in the constant pool: literals and symbolic references. Literals include text strings and the values of constants declared final in code; symbolic references include the fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors. The constant pool is divided into two parts: the constant pool counter and the constant pool data area.
  • A. Constant pool counter (constant_pool_count): Because the number of constants is not fixed, two bytes are placed first to record the constant pool capacity.

  • B. Constant pool data area: The data area is composed of (constant_pool_count-1) cp_info structures, and one cp_info structure corresponds to one constant. There are 14 types of cp_info in bytecode (as of Java 8), each of which has a fixed structure.

  • Take CONSTANT_Utf8_info as an example. Its structure is shown on the left side of Figure 7 below. The first byte, “tag,” takes its value from the tag of the corresponding item in Figure 6; since the type is CONSTANT_Utf8_info, its value is “01.” The next two bytes, “length,” give the length of the string in bytes, and the following “length” bytes hold the string data. Extracting one cp_info structure from the bytecode in Figure 2 gives the result shown on the right side of Figure 7 below: this constant is a UTF-8 string with a length of one byte and the data “A”.
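The first three parts can be traced by hand on a tiny byte sequence. The following is a minimal sketch, not a full parser: `ClassFilePrefix`, its fields, and the `SAMPLE` array are hypothetical names made up for illustration, and the sample bytes are hand-built (a real class file rarely starts its constant pool with a Utf8 entry). It assumes the layout described above: a four-byte magic number, two bytes each for the minor and major version, a two-byte constant pool counter, and then one CONSTANT_Utf8_info entry with tag 01, length 0001, and data “A”.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class ClassFilePrefix {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;
    final String firstUtf8;

    private ClassFilePrefix(int magic, int minorVersion, int majorVersion,
                            int constantPoolCount, String firstUtf8) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
        this.firstUtf8 = firstUtf8;
    }

    // Hand-built sample bytes (an assumption for this sketch, not a real class file).
    static final byte[] SAMPLE = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic number
        0x00, 0x00,                                         // minor version 0
        0x00, 0x34,                                         // major version 52
        0x00, 0x02,                                         // constant_pool_count = 2, so one entry
        0x01,                                               // tag 01: CONSTANT_Utf8_info
        0x00, 0x01,                                         // length: 1 byte
        0x41                                                // data: 'A'
    };

    static ClassFilePrefix parse(byte[] bytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;

        int tag = buf.get() & 0xFF;
        if (tag != 1) {
            throw new IllegalArgumentException("this sketch only decodes tag 1 (Utf8)");
        }
        int length = buf.getShort() & 0xFFFF;
        byte[] data = new byte[length];
        buf.get(data);
        // Class files use a modified UTF-8; for plain ASCII it matches standard UTF-8.
        String text = new String(data, StandardCharsets.UTF_8);
        return new ClassFilePrefix(magic, minor, major, count, text);
    }

    public static void main(String[] args) {
        ClassFilePrefix p = parse(SAMPLE);
        System.out.printf("magic=0x%08X minor=%d major=%d count=%d utf8=%s%n",
                p.magic, p.minorVersion, p.majorVersion, p.constantPoolCount, p.firstUtf8);
    }
}
```

A real parser would loop over all (constant_pool_count-1) entries and dispatch on each tag; the sketch stops after the first entry to keep the byte-to-field mapping visible.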

4. Access flags

Describes whether the class file defines a class or an interface, and whether it is modified by modifiers such as public, abstract, and final. The JVM specification defines the access flags (access_flags) as shown in Figure 9 below. Note that the JVM does not enumerate every combination of access flags; instead, combinations are described with the bitwise OR operator. For example, a class declared public final has the access flag value ACC_PUBLIC | ACC_FINAL, namely 0x0001 | 0x0010 = 0x0011.

5. Current class index

The two bytes after the access flags describe the fully qualified name of the current class. These two bytes hold an index into the constant pool, from which the fully qualified name of the class can be found.

6. Parent class index

The two bytes after the current class index describe the fully qualified name of the parent class, again as an index into the constant pool.
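The access flags, the current class index, and the parent class index sit back to back as three two-byte values, so they can be decoded together. The sketch below is illustrative only: `ClassIdentityDemo`, `has`, and `readIdentity` are hypothetical names, and the constant pool indexes 2 and 3 in the sample are made up. The flag constants use the values the JVM specification assigns to class access flags.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class ClassIdentityDemo {
    // Class access flag values from the JVM specification.
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_FINAL     = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT  = 0x0400;

    // A flag is set when its bit survives a bitwise AND with the mask.
    static boolean has(int accessFlags, int flag) {
        return (accessFlags & flag) != 0;
    }

    // Reads access_flags, this_class, and super_class (two bytes each, big-endian).
    static int[] readIdentity(byte[] sixBytes) {
        ByteBuffer buf = ByteBuffer.wrap(sixBytes);
        int accessFlags = buf.getShort() & 0xFFFF;
        int thisClass = buf.getShort() & 0xFFFF;   // index into the constant pool
        int superClass = buf.getShort() & 0xFFFF;  // index into the constant pool
        return new int[] {accessFlags, thisClass, superClass};
    }

    public static void main(String[] args) {
        // Combining modifiers is a bitwise OR of the individual flags.
        int publicFinal = ACC_PUBLIC | ACC_FINAL;
        System.out.printf("0x%04X%n", publicFinal); // prints 0x0011

        // Sample bytes: flags 0x0011, this_class = #2, super_class = #3 (made-up indexes).
        int[] id = readIdentity(new byte[] {0x00, 0x11, 0x00, 0x02, 0x00, 0x03});
        System.out.println("public:    " + has(id[0], ACC_PUBLIC));
        System.out.println("final:     " + has(id[0], ACC_FINAL));
        System.out.println("interface: " + has(id[0], ACC_INTERFACE));
        System.out.println("this_class=#" + id[1] + " super_class=#" + id[2]);
    }
}
```

Resolving the two indexes to actual names would require the parsed constant pool, which this sketch leaves out.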

7. Interface index

The parent class index is followed by a two-byte interface counter that gives the number of interfaces directly implemented by the class (or, for an interface, directly extended by it). The next 2 × n bytes hold n two-byte constant pool indexes, one for each interface name.
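The counter-then-entries layout of the interface section can be sketched in a few lines. This is a minimal illustration under the assumption that the buffer is already positioned at the interface counter; `InterfaceTableDemo` and `readInterfaceIndexes` are hypothetical names, and the sample indexes 4 and 5 are made up.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class InterfaceTableDemo {
    // Reads the two-byte counter, then that many two-byte constant pool indexes.
    static int[] readInterfaceIndexes(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;
        int[] indexes = new int[count];
        for (int i = 0; i < count; i++) {
            indexes[i] = buf.getShort() & 0xFFFF;
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Sample bytes: counter = 2, followed by indexes #4 and #5 (made-up values).
        byte[] sample = {0x00, 0x02, 0x00, 0x04, 0x00, 0x05};
        int[] indexes = readInterfaceIndexes(ByteBuffer.wrap(sample));
        System.out.println(Arrays.toString(indexes)); // prints [4, 5]
    }
}
```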

8. Field table

The field table describes variables declared in classes and interfaces, including class-level variables and instance variables, but not local variables declared inside methods. The field table is also divided into two parts. The first part is two bytes giving the number of fields. The second part is a sequence of field_info structures with the details of each field.

9. Method table

The method table is also composed of two parts. The first part is two bytes giving the number of methods. The second part provides detailed information for each method. The details of a method are complex, including the method access flags, method name, method descriptor, and method attributes.
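In the JVM specification the entries of the field table and the method table share one layout: access flags, name index, descriptor index, and an attribute counter, each two bytes, followed by the attributes themselves. The sketch below reads such a table and skips over attribute bodies instead of interpreting them. `MemberTableDemo`, `Member`, and `readMembers` are hypothetical names, and the sample bytes in `main` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class MemberTableDemo {
    // One field_info or method_info entry, without its attribute bodies.
    static final class Member {
        final int accessFlags;
        final int nameIndex;
        final int descriptorIndex;
        final int attributesCount;

        Member(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
            this.accessFlags = accessFlags;
            this.nameIndex = nameIndex;
            this.descriptorIndex = descriptorIndex;
            this.attributesCount = attributesCount;
        }
    }

    // Reads the two-byte counter and then each entry; works for fields and methods alike.
    static List<Member> readMembers(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;
        List<Member> members = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int accessFlags = buf.getShort() & 0xFFFF;
            int nameIndex = buf.getShort() & 0xFFFF;
            int descriptorIndex = buf.getShort() & 0xFFFF;
            int attributesCount = buf.getShort() & 0xFFFF;
            for (int a = 0; a < attributesCount; a++) {
                buf.getShort();            // attribute name index (ignored here)
                int length = buf.getInt(); // four-byte attribute length; assumed to fit in an int
                buf.position(buf.position() + length); // skip the attribute body
            }
            members.add(new Member(accessFlags, nameIndex, descriptorIndex, attributesCount));
        }
        return members;
    }

    public static void main(String[] args) {
        // Sample: one member with flags 0x0002, name #5, descriptor #6,
        // and one attribute (name #7, length 2, body 00 08). All values are made up.
        byte[] sample = {
            0x00, 0x01,
            0x00, 0x02, 0x00, 0x05, 0x00, 0x06, 0x00, 0x01,
            0x00, 0x07, 0x00, 0x00, 0x00, 0x02, 0x00, 0x08
        };
        for (Member m : readMembers(ByteBuffer.wrap(sample))) {
            System.out.printf("flags=0x%04X name=#%d descriptor=#%d attributes=%d%n",
                    m.accessFlags, m.nameIndex, m.descriptorIndex, m.attributesCount);
        }
    }
}
```

Skipping by length is what lets a reader step past attributes it does not understand, which is why the complex method details do not have to be decoded to reach the next entry.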

10. Additional attributes

This item holds the basic information about attributes defined by the class or interface in the file.

A concrete example is provided to verify this structure

Starting from a Java source file, we disassemble the compiled bytecode file with javap, view it with ASM, and compare both against the raw binary to complete the interpretation.
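As a lighter-weight check that needs neither javap nor ASM, the first few fields can be read straight from a real class file using only the JDK. This is a swapped-in sketch rather than the comparison described above: `RealClassHeader` and `read` are hypothetical names, and the choice of `java/lang/Object.class` as input is an assumption made so that the example needs no extra files.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class RealClassHeader {
    // Returns {magic, minor_version, major_version, constant_pool_count}.
    static int[] read(String resourcePath) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (raw == null) {
                throw new IllegalArgumentException("class file not found: " + resourcePath);
            }
            // DataInputStream reads big-endian values, matching the class file format.
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            return new int[] {magic, minor, major, count};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] h = read("java/lang/Object.class");
        // The version and constant pool count depend on the JDK in use.
        System.out.printf("magic=0x%08X minor=%d major=%d constant_pool_count=%d%n",
                h[0], h[1], h[2], h[3]);
    }
}
```

The values printed here can be compared with the version and constant pool listing that `javap -v` reports for the same class.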

Summary of file structure

Reference sources: ……