Preface

Understanding the structure of compiled Java class files and their bytecode is essential to understanding the JVM in depth. This material (mainly the data structures of the class file) is a bit dry, but it is the foundation for a deeper understanding of the JVM’s memory, class loading, and so on.

Class file structure

A class file is a binary stream built on 8-bit bytes. The data items are arranged strictly in sequence with no delimiters between them, so almost everything stored in the class file is data the program needs in order to run. When a data item needs more than 8 bits, it is split into several 8-bit bytes that are stored with the high-order byte first (big-endian).
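As a minimal sketch of that byte order, the helper below reassembles multi-byte items from consecutive bytes, high-order byte first. The class name BigEndian and its method names are made up for illustration; they are not part of any JDK API.

```java
final class BigEndian {
    /** Combines two bytes, high-order byte first, into an unsigned 16-bit value (u2). */
    static int readU2(byte[] data, int offset) {
        // Mask with 0xFF because Java bytes are signed.
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    /** Combines four bytes, high-order byte first, into an unsigned 32-bit value (u4). */
    static long readU4(byte[] data, int offset) {
        // Use a long so that values above Integer.MAX_VALUE stay positive.
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }
}
```

In practice java.io.DataInputStream already reads values in this byte order, so hand-written helpers like these are mostly useful for understanding the layout.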

The class file format is as follows:

Type            Name                  Count
u4              magic                 1
u2              minor_version         1
u2              major_version         1
u2              constant_pool_count   1
cp_info         constant_pool         constant_pool_count - 1
u2              access_flags          1
u2              this_class            1
u2              super_class           1
u2              interfaces_count      1
u2              interfaces            interfaces_count
u2              fields_count          1
field_info      fields                fields_count
u2              methods_count         1
method_info     methods               methods_count
u2              attributes_count      1
attribute_info  attributes            attributes_count

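The class file format uses a pseudo-structure with only two kinds of data: unsigned numbers and tables. Unsigned numbers are the basic types, where u1, u2, and u4 stand for 1, 2, and 4 bytes. As you can see, each table is preceded by an xxx_count item, a leading capacity counter that records how many entries the table contains.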
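The first four items of the layout can be read with java.io.DataInputStream, which reads multi-byte values high-order byte first. The sketch below is illustrative only: the class ClassFileHeader is a made-up name, and the check against the magic number 0xCAFEBABE is a fact from the JVM specification that the text above does not mention.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads magic, minor_version, major_version and constant_pool_count. */
    static ClassFileHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();                    // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();          // u2 minor_version
            int major = in.readUnsignedShort();          // u2 major_version
            int poolCount = in.readUnsignedShort();      // u2 constant_pool_count
            return new ClassFileHeader(minor, major, poolCount);
        } catch (IOException e) {
            // Also covers truncated input (EOFException).
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the bytes of any compiled class to ClassFileHeader.parse should yield its version numbers; for example, a major version of 52 corresponds to Java 8.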

Data items

A class file contains many data items. We will not go through all of them one by one here, but mainly introduce the key ones:

  1. Constant pool
  2. Field table
  3. Method table
  4. Attribute table

Constant pool

Many of you have heard of the constant pool. Here it refers to the constant pool in the class file.

The constant pool mainly stores two kinds of constants: literals and symbolic references.

Literals: a literal is close to the notion of a constant at the Java language level, such as the text string "123" in String s = "123", or the value of a constant declared final. (The boxed values that the wrapper classes cache, for example Integer values from -128 to 127, are a runtime cache rather than constant pool entries.)

Symbolic references: these mainly cover the following three kinds of constants:

  • Fully qualified names of classes and interfaces
  • The name and descriptor of the field
  • The name and descriptor of the method

Java code is linked dynamically, when the virtual machine loads the class file. The class file does not store the final memory layout of any method or field, so the JVM cannot use the symbolic references to these fields and methods without converting them at run time. When the virtual machine runs, it obtains the symbolic references from the constant pool, then resolves them during class loading or at run time and translates them into concrete memory addresses.

Each constant in the constant pool is a table, and these tables share one feature: the first byte of each table is a tag of type u1, as follows:

Constant type                 Tag   Description
CONSTANT_Utf8                 1     A UTF-8 encoded Unicode string
CONSTANT_Integer              3     A literal of type int
CONSTANT_Float                4     A literal of type float
CONSTANT_Long                 5     A literal of type long
CONSTANT_Double               6     A literal of type double
CONSTANT_Class                7     A symbolic reference to a class or interface
CONSTANT_String               8     A literal of type String
CONSTANT_Fieldref             9     A symbolic reference to a field
CONSTANT_Methodref            10    A symbolic reference to a method declared in a class
CONSTANT_InterfaceMethodref   11    A symbolic reference to a method declared in an interface
CONSTANT_NameAndType          12    A partial symbolic reference to a field or method
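To make the tag-driven layout concrete, here is a rough sketch that walks the raw bytes of a constant pool and lists its entries. The class ConstantPoolLister is a made-up name, and the payload sizes per tag come from the JVM specification rather than from the table above. It handles only the tags listed in the table, so it rejects the newer tags (such as 15, 16, and 18) that real class files may contain.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolLister {
    /** poolBytes must start at the first cp_info entry, right after constant_pool_count. */
    static List<String> list(byte[] poolBytes, int constantPoolCount) {
        List<String> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(poolBytes))) {
            // Entries are numbered from 1 to constant_pool_count - 1.
            for (int i = 1; i < constantPoolCount; i++) {
                int tag = in.readUnsignedByte();         // u1 tag
                String text;
                switch (tag) {
                    case 1:  text = "Utf8 " + in.readUTF(); break;   // u2 length + bytes
                    case 3:  text = "Integer " + in.readInt(); break;
                    case 4:  text = "Float " + in.readFloat(); break;
                    case 5:  text = "Long " + in.readLong(); break;
                    case 6:  text = "Double " + in.readDouble(); break;
                    case 7:  text = "Class #" + in.readUnsignedShort(); break;
                    case 8:  text = "String #" + in.readUnsignedShort(); break;
                    case 9:  text = "Fieldref " + indexPair(in); break;
                    case 10: text = "Methodref " + indexPair(in); break;
                    case 11: text = "InterfaceMethodref " + indexPair(in); break;
                    case 12: text = "NameAndType " + indexPair(in); break;
                    default: throw new IllegalArgumentException("unsupported tag " + tag);
                }
                out.add("#" + i + " " + text);
                if (tag == 5 || tag == 6) {
                    i++;                                 // long and double occupy two slots
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    /** Reads two u2 constant pool indexes. */
    private static String indexPair(DataInputStream in) throws IOException {
        return "#" + in.readUnsignedShort() + ".#" + in.readUnsignedShort();
    }
}
```

The one subtlety worth noting is the slot skip for long and double constants: each takes up two index positions even though it is a single entry in the byte stream.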

Access flags

Access flags identify access information about a class or interface, such as whether it is a class or an interface, whether it is public, and whether it is declared final. The specific flags and their meanings are as follows:

Flag name         Value   Meaning
ACC_PUBLIC        0x0001  Whether the type is public
ACC_FINAL         0x0010  Whether the type is declared final; only classes can set this
ACC_SUPER         0x0020  Whether the new semantics of the invokespecial bytecode instruction are allowed
ACC_INTERFACE     0x0200  Whether this is an interface
ACC_ABSTRACT      0x0400  Whether the type is abstract; set for interfaces and abstract classes, clear for all other types
ACC_SYNTHETIC     0x1000  Whether this class was generated by something other than user code
ACC_ANNOTATION    0x2000  Whether this is an annotation
ACC_ENUM          0x4000  Whether this is an enumeration
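Because each flag occupies its own bit, the u2 access_flags value is simply the bitwise OR of the flags that apply. A small sketch of decoding it follows; the class ClassAccessFlags is a made-up name for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    private static final int[] VALUES = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all class-level flags set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

For instance, decode(0x0021) yields ACC_PUBLIC and ACC_SUPER, and decode(0x0601) yields ACC_PUBLIC, ACC_INTERFACE, and ACC_ABSTRACT.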

Class index, superclass index, and interface index collection

As we all know, Java uses single inheritance with multiple interface implementation. Every class except Object has exactly one parent class, so the class and its superclass are each unique, while a class can implement several interfaces, so the interfaces are represented as a collection. The class index and the superclass index are each a single u2 item, while the interface index collection is a group of u2 items.

The class index, superclass index, and interface index collection come right after the access flags, in that order. The class index and superclass index are u2 values that each point to a class descriptor constant of type CONSTANT_Class_info, which describes a specific class. The interface index collection starts with a u2 interface counter that records how many interfaces are implemented; if it is 0, the collection occupies no further bytes.

Field table collection

Field tables describe the variables declared in a class or interface. Fields include class variables and instance variables. Consider how a field is declared in Java, for example:

public static final String num = "13234";

As you can see, first comes the access scope: public, private, or protected. This determines which classes the field is visible to.

Next come the modifier keywords, which describe whether the field is an instance variable or a class variable (static), whether it is mutable (final), its visibility under concurrency (volatile), whether it can be serialized (transient), and so on.

Then come the field’s data type (primitive type, array, or object) and its name.

Each modifier is either present or absent, so it fits naturally into a Boolean flag bit. The field’s name and data type, by contrast, have no fixed length, so they are described by referencing constants in the constant pool.

The field table structure is as follows:

Type            Name                  Count
u2              access_flags          1
u2              name_index            1
u2              descriptor_index      1
u2              attributes_count      1
attribute_info  attributes            attributes_count

Field modifiers are stored in access_flags, whose possible values are as follows:

Flag name         Value   Meaning
ACC_PUBLIC        0x0001  Whether the field is public
ACC_PRIVATE       0x0002  Whether the field is private
ACC_PROTECTED     0x0004  Whether the field is protected
ACC_STATIC        0x0008  Whether the field is static
ACC_FINAL         0x0010  Whether the field is final
ACC_VOLATILE      0x0040  Whether the field is volatile
ACC_TRANSIENT     0x0080  Whether the field is transient
ACC_SYNTHETIC     0x1000  Whether the field is generated automatically by the compiler
ACC_ENUM          0x4000  Whether the field is an enum constant
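One way to see these flag values without parsing a class file is reflection: for a field, java.lang.reflect.Field.getModifiers() reports the same bit values as the table above. The sketch below reuses the num field from the earlier example; the class FieldFlagsDemo and its helper methods are made-up names for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class FieldFlagsDemo {
    public static final String num = "13234";

    /** Returns the modifier bits of a field declared in this class. */
    static int flagsOf(String fieldName) {
        try {
            Field field = FieldFlagsDemo.class.getDeclaredField(fieldName);
            return field.getModifiers();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Renders modifier bits as Java keywords, e.g. "public static final". */
    static String describe(int flags) {
        return Modifier.toString(flags);
    }
}
```

For num, the result is 0x0019, that is ACC_PUBLIC (0x0001) combined with ACC_STATIC (0x0008) and ACC_FINAL (0x0010).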

name_index is an index into the constant pool for the field’s simple name (“num” in the example above), and descriptor_index is an index into the constant pool for the field’s descriptor. Method tables use the same two items for a method’s name and descriptor.

Descriptors describe the data type of a field, or the parameter list (number, types, and order) and return value of a method. According to the descriptor rules, primitive types and void (which represents no return value) are each represented by a single uppercase character, while an object type is represented by the character L followed by the fully qualified name of the class (with “/” in place of “.”) and a terminating semicolon, as follows:

Identifier  Meaning
B           Primitive type byte
C           Primitive type char
D           Primitive type double
F           Primitive type float
I           Primitive type int
J           Primitive type long
S           Primitive type short
Z           Primitive type boolean
V           Special type void
L           Object type, for example Ljava/lang/Object;

For array types, each dimension is described by one leading “[” character. For example, a two-dimensional array of type int[][] is recorded as “[[I”.

When a descriptor describes a method, the parameter list comes first, in parentheses, followed by the return value. For example, the descriptor of void login() is “()V”, and the descriptor of java.lang.String toString() is “()Ljava/lang/String;”.
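The JDK can produce these descriptor strings itself, which is handy for checking the rules above. The sketch below relies on java.lang.invoke.MethodType.toMethodDescriptorString(), available since Java 7. The class Descriptors and its two helpers are made-up names, and deriving a field descriptor by stripping the leading “()” from a no-argument method descriptor is just a convenience trick used here for illustration.

```java
import java.lang.invoke.MethodType;

final class Descriptors {
    /** Method descriptor: parameter types in parentheses, then the return type. */
    static String method(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    /** Field descriptor, taken from a no-argument method descriptor "()<type>". */
    static String field(Class<?> type) {
        return method(type).substring(2);
    }
}
```

With these helpers, method(void.class) gives “()V”, method(String.class) gives “()Ljava/lang/String;”, and field(int[][].class) gives “[[I”, matching the examples in the text.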

Method table collection

Methods are described in the class file storage format in almost exactly the same way as fields. The method table has the same structure as the field table: access flags, name index, descriptor index, and attribute table collection. The meaning of the data is also very similar; the differences lie in the access flags and the attribute table collection.

Type            Name                  Count
u2              access_flags          1
u2              name_index            1
u2              descriptor_index      1
u2              attributes_count      1
attribute_info  attributes            attributes_count

Because volatile and transient cannot modify methods, the method access flags have no ACC_VOLATILE or ACC_TRANSIENT. Conversely, methods can be modified by keywords such as synchronized, native, strictfp, and abstract, so the method access flags add the corresponding flags. The flag values are as follows:

Flag name         Value   Meaning
ACC_PUBLIC        0x0001  Whether the method is public
ACC_PRIVATE       0x0002  Whether the method is private
ACC_PROTECTED     0x0004  Whether the method is protected
ACC_STATIC        0x0008  Whether the method is static
ACC_FINAL         0x0010  Whether the method is final
ACC_SYNCHRONIZED  0x0020  Whether the method is synchronized
ACC_BRIDGE        0x0040  Whether the method is a bridge method generated by the compiler
ACC_VARARGS       0x0080  Whether the method accepts a variable number of arguments
ACC_NATIVE        0x0100  Whether the method is native
ACC_ABSTRACT      0x0400  Whether the method is abstract
ACC_STRICTFP      0x0800  Whether the method is strictfp
ACC_SYNTHETIC     0x1000  Whether the method is generated automatically by the compiler
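Some of the method-only flags can be observed through reflection as well. The sketch below is illustrative: the class MethodFlagsDemo and its methods are made up, and it uses only java.lang.reflect.Method.getModifiers(), Method.isVarArgs(), and java.lang.reflect.Modifier.isSynchronized(), all standard JDK calls.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class MethodFlagsDemo {
    // Compiled with ACC_STATIC and ACC_SYNCHRONIZED set.
    static synchronized void login() { }

    // Compiled with ACC_STATIC and ACC_VARARGS set.
    static int sum(int... values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    /** Looks up a method declared in this class. */
    static Method find(String name, Class<?>... parameterTypes) {
        try {
            return MethodFlagsDemo.class.getDeclaredMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean isSynchronized(Method method) {
        return Modifier.isSynchronized(method.getModifiers());
    }
}
```

Here isSynchronized(find("login")) is true, and find("sum", int[].class).isVarArgs() is true. Note that a variable-arity method is looked up by its array parameter type, because that is how it appears in the descriptor.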

Method

The definition of a method can be expressed completely by its access flags, name index, and descriptor index, but where is the code inside the method? We mentioned the attribute table collection earlier: the Java code in a method is compiled into bytecode instructions and stored in an attribute named “Code” in the method’s attribute table collection.

Attribute table collection

The class file, field tables, and method tables can each carry their own attribute table collection, made up of attribute_info entries, to describe information that is specific to certain scenarios.

Unlike the other data items in the class file, which have strict requirements on order, length, and content, the attribute table collection is fairly relaxed. The attribute tables need not appear in any strict order, and as long as an attribute name does not clash with an existing one, anyone implementing a compiler can write their own attributes into the attribute table. The JVM automatically ignores attributes it does not recognize at run time.
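Skipping an unknown attribute is possible because every attribute carries its own length. The sketch below assumes the attribute_info layout from the JVM specification, which the text above does not spell out: a u2 attribute_name_index, a u4 attribute_length, and then attribute_length bytes of content. The class AttributeScanner is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class AttributeScanner {
    /**
     * attributeBytes must start at attributes_count.
     * Returns {attribute_name_index, attribute_length} for each attribute.
     */
    static List<int[]> scan(byte[] attributeBytes) {
        List<int[]> found = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(attributeBytes))) {
            int count = in.readUnsignedShort();          // u2 attributes_count
            for (int i = 0; i < count; i++) {
                int nameIndex = in.readUnsignedShort();  // u2 attribute_name_index
                int length = in.readInt();               // u4 attribute_length
                // The content is not interpreted; it is skipped using the length alone.
                if (length < 0 || in.skipBytes(length) != length) {
                    throw new IllegalArgumentException("truncated or oversized attribute");
                }
                found.add(new int[] {nameIndex, length});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

A reader that recognizes an attribute name can parse the content instead of skipping it; everything else is passed over the same way, which is what makes custom attributes harmless.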

The following table lists the attributes predefined as of Java 7:

Attribute name                        Location                          Meaning
Code                                  Method table                      Bytecode instructions compiled from Java code
ConstantValue                         Field table                       Constant value defined by the final keyword
Deprecated                            Class, method table, field table  Classes, methods, and fields declared deprecated
Exceptions                            Method table                      Exceptions thrown by the method
EnclosingMethod                       Class file                        Present only when a class is local or anonymous; identifies the enclosing method of the class
InnerClasses                          Class file                        Inner class list
LineNumberTable                       Code attribute                    Mapping between Java source line numbers and bytecode instructions
LocalVariableTable                    Code attribute                    Description of the method’s local variables
StackMapTable                         Code attribute                    New in JDK 1.6; lets the new type checker verify that the local variable and operand stack types of the target method match what is required
Signature                             Class, method table, field table  Supports signatures when generics are involved
SourceFile                            Class file                        Records the source file name
SourceDebugExtension                  Class file                        Stores additional debugging information
Synthetic                             Class, method table, field table  Marks classes, methods, or fields generated automatically by the compiler
LocalVariableTypeTable                Code attribute                    Uses signatures instead of descriptors; added after the introduction of generics to describe generic parameterized types
RuntimeVisibleAnnotations             Class, method table, field table  Supports annotations that are visible at run time
RuntimeInvisibleAnnotations           Class, method table, field table  Indicates which annotations are not visible at run time
RuntimeVisibleParameterAnnotations    Method table                      Similar to RuntimeVisibleAnnotations, but applies to method parameters
RuntimeInvisibleParameterAnnotations  Method table                      Similar to RuntimeInvisibleAnnotations, but applies to method parameters
AnnotationDefault                     Method table                      Records the default value of an annotation class element
BootstrapMethods                      Class file                        Holds the bootstrap method qualifiers referenced by invokedynamic instructions