preface

I have coveted bytecode for a long time, but due to its complexity, I have been delayed for a long time. I have referred to a large number of books in two days over the weekend to summarize this article. Since there are few direct bytecode analysis online, all of them are pasted with a bunch of concepts, which makes people difficult to understand, so this chapter will analyze one by one based on the actual bytecode.

This process will parse all the bytes in a class file one by one. It is recommended to compile a class as well.

Refer to books, websites

Inside the Java Virtual Machine

Introduction to The Java Virtual Machine

In-depth Understanding of advanced Features and Best Practices of the Java Virtual Machine JVM

An Inside Look at Java Web Technology

Docs.oracle.com/javase/spec…

www.ibm.com/developerwo…

Writing a Java file

Start by writing a simple Java class, then compile and generate a class file.

The first time to analyze the content as little as possible, otherwise there will be too much bytecode, analysis is difficult.

public class Test {
    private static final String name ="name";
    
    private  int age;
    
    private static  String addr;

    public Test(a) {}private  void print(a){
        System.out.println("print");
    }
    private static void printName(a){
        System.out.println(name);
    }
    public String getName(a){
        returnname; }}Copy the code

Then open it with WinHex, or some other hexadecimal file viewer, which compiles to only 703 bytes, which is not bad.

Structure of the preview

Magic and version

Most files are fixed with N bytes at the beginning to make sure it’s not a valid file, like changing the.jpg file to.txt, which is still viewable through the image viewer because the internal structure hasn’t changed, or PNG files, which start with 89504E470D0A1A0A, which is fixed, All PNG images start with this, but if you change one, the image viewer won’t be able to see it because it can’t be sure that it’s a PNG file, even though the data behind it will represent the image’s information.

The same goes for the class file, which starts with 4 bytes CAFEBABE. That’s what the JVM uses to verify that a file is a valid bytecode file. Other software, too, mostly doesn’t determine a file’s type by its name extension, but by the first N bytes of the file.

This is followed by versions, representing minor and major versions, which will compile differently on different JDK versions. Mine is JDK8, so 00000034. If 1.7, then 00000033.

Constant pool counter, constant pool

Since the contents of the constant pool are not fixed but increase with the amount of code written, 2 bytes are needed to count the size of the constant pool, so the 2 bytes following the version number are the size of the constant pool and the offset address is 0x00000008.

The constant pool is one of the largest data items ina Class file. It contains two main types of data: literals and symbolic references. Literals are things like strings, constants declared final, etc.

From there, it gets complicated. There’s something called a table, and a table is something that’s made up of different bytes that represent some data. There are 11 different table structures defined, and what they all have in common is that the first byte represents what type, and the second byte represents what type.

In the Test class, the constant pool size is 0x28, and the decimal pool size is 40, which means that it has 39 constants, because there is another 0 constant left out for special consideration, which is not evaluated.

The -v and -verbose javap arguments are equal, but the private method is not printed. The -p argument will be used for reference later.

The data obtained from 1 to 39 above is a table, and the first byte of each table represents the tag bit (tag). The value is 1-12, representing which constant type it belongs to currently (note that there is no data type with tag bit 2), as shown in the figure below.

There are too many constants to say all of them, so I’ll just list two as examples.

The first constant resolution

The first constant, 0x0A and 10 in decimal, is a CONSTANT_Methodref_info (symbolic reference to a method in a class), followed by two bytes that give the entry index of the class CONSTANT_Class_info that declares the referenced method. Its value is 7, which means that this 7 points to the seventh constant in the constant pool, followed by two bytes that are the index value of the name and type descriptor CONSTANT_NameAndType_info, which is 1B and 27 in decimal.

Looking at what constant number 7 and constant number 27 are, which is the following one, this is a cross-reference process, reference to reference, pointing to a CONSTANT_Utf8_info.

The second constant parsing

The second is that the constant has a value of 9, representing CONSTANT_Fieldref_info, and the last 2 bytes point to the index entry of the class or interface descriptor CONSTANT_Class_info that declares the field, The next two bytes are the index entry pointing to the field descriptor CONSTANT_NameAndType_info.This points back and forth, eventually pointing to a CONSTANT_Utf8_info character as well.

The same goes for the following constants, there are 39 of them, and at the end of these 39 constants, the following access flag is displayed.

Access tokens

The next two bytes in the constant pool are access flags, and the question is, how do we locate the offset of these two bytes? We have to add up the previous bytes.

Finally, it gets to the position shown below, with a value of 21.But what does 0021 stand for in the picture above? In fact, it is the butt of the turtle. Take a look at the table below.

Corresponding to the Test class, first of all, he is the public, so there is worth 0001, and in line with the ACC_SUPER, value is 0020, final operation way is to put them on or operations, namely ACC_PUBLIC | ACC_SUPER = 1 | 20 = 21.

If you change this class to abstract, its value will be 0421.

Class index, superclass index

The next four bytes after the access flag are the class index (2 bytes) and the parent index (2 bytes), each of which points to a CONSTANT_Class_info constant entry, Name_index in CONSTANT_Class_info points to the index of the CONSTANT_Utf8_info constant in the CONSTANT_Class_info constant list of type CONSTANT_Utf8_info. This index is used to obtain the fully qualified name string in the CONSTANT_Utf8_info constant.

For example, 05 points to CONSTANT_Class_info, CONSTANT_Class_info points to a CONSTANT_Utf8_info, which has a value of Test, as does the parent class, which defaults to Java /lang/Object.

Interface counter, interface table.

The two bytes following the class index and the superclass index are the interface counter, which represents the number of direct superclass interfaces of the current class or interface, or 0 if the class implements no interfaces. If it is 0, the following implementation interface structure table does not exist.

The interface table is an array containing the index set of the superclass interface of the current class or interface in the constant pool list, through which the fully qualified name of the superclass interface of the current class or interface can be determined.

But now that the interface counter is 0, the information behind the interface is no longer visible. To verify the transformation, add two more interfaces to the code above, and then look at the transformation.As you can see, the number of interfaces becomes 2, followed by 0x0008 and 0x0009, which refer to the class data of the constant pool.

Field counter, field table

The two bytes immediately following the interface counter and interface table are the field counter, followed by the field table, which is used to record the total number of field tables (like the interface counter above), i.e. the total number of class variables (static) and instance variables in a class.

There are three fields defined in the Test class, so its value is 0x0003, and the n bytes after it will represent the field information, which is the field table, Each field contains the field scope (public, protected, private), class or instance level variable (static), variability (final), concurrent visibility (volatile), serializable (transient modifier), and field data type (base type) , object, array) and field names.

The following is the information composition of the field.The access flag for the field.

Field name analysis

Private static Final String name =”name”; 2 bytes is 0 x001a before, so, on behalf of his decorate, calculation is 0 x0002 | 0 x0008 | 0 x0010, through the calculator can count is 1 a.

The next two bytes are 0x0008, which represents the name of the field, which is a reference to the constant pool, and then we go back to the constant pool, and the field name is name, which matches ours.

The next two bytes are the field descriptor, which is also a reference to the constant pool. The descriptor describes the data type of the field. Basic data types are described using B(byte), C(char), D(double), F(float), I(int), J(long), S(short), Z(Boolean), V(void), while object types are composed of the character L plus the fully qualified name of the object. The name field is of type String, so its value is Ljava/lang/String

What is the ninth value in the constant poolLjava/lang/String, also consistent.And if it’s an array, it’s preceded by one[If it’s multidimensional, add a few more[N dimensions plus N[

The next two bytes are the attribute count, which is the size of the final Attributes item. Each attribute is made up of an Attribute_info table. In this class, it is 1, so it has one attribute_info structure.

Here is the structure of the attribute_info.

In further analysis, the first two bytes are the index to CONSTANT_Utf8_info in the constant pool, representing the simple name of the current attribute.

But its value is ConstantValue. What does that mean? Constant value that represents the definition of the final keyword.ConstantValue is also a structure consisting of eight bytes,

  1. Attribute_name_index: a reference to the constant pool, which must be “ConstantValue”.

  2. Attribute_length: it’s fixed, 4 bytes, 2.

  3. Constantvalue_index: a reference to a literal in the constant pool. In this class, its value is 6, so if you look it up in the constant pool, its value must be the string “name”.

The figure below shows the sixth entry in the constant pool.

Field AGE analysis

Int age (0x0002); int age (0x0002); int age (0x0002); Javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class: javap-verbose test. class Int is I, look in the constant pool, the 12th index is I, so we can confirm that the first 6 bytes are 0x0002000B000C.

We’re looking at the next data, which is exactly what we thought it would be.

The next two bytes are 0, so there is no more data for this field, and then the last field.

Field ADDR analysis

The last field is defined as private static String addr; The first two bytes are 0x000A and the last two bytes are the index of the constant pool. In the constant pool, the 13th value is addr. Hex is D, followed by 2 bytes of data type, String will be represented as Ljava/lang/String; In the constant pool, it is the ninth, so the first six bytes are 000A 000B 0009.

The last two bytes are 0, so there is no data related to this field. Now comes the method information, which is the hard part, because the instruction is designed.

Method counter, method table

Field analysis, the method, the same logic, there methods counters, tables, followed by 2 bytes at the end of the above structure is the way to counter, if is 0, means there is no way, method table does not exist, but in this class, method has four, there will be four method table is behind.

A method table represents complete information about a method, such as method modifiers, return values, method parameters, and so on.

The structure of the method table is the same as that of the field table, as shown in the following figure.But the modifier flag will be different, and the method modifier flag is as follows.

Analysis of construction method

In this class, the first method is the constructor. The first two bytes are 0001, and the last two bytes are the name of the method, pointing to the index in the constant pool. The constructor is<init> The next two bytes are the return values. The return value of the constructor is()V.Behind 2 bytes are attribute counter, if 0 means no attribute table, but he has one here, so, there is a property sheet, behind the attribute table structure in the above said, the first two bytes is the property name, pointing to the constant pool index, here, he is 0010, attribute names are in the constant poolCode.

This Code is the bytecode instructions compiled by Java, but it’s a little more complicated here.

Then the attribute_length(4 bytes) in the attribute table represents the size of the info entry, 33 in this case, 51 in decimal, so the next 51 bytes are the constructor code.

The next method starts 51 bytes after attribute_length.

That’s the circle down here, 51 bytes.

So I’m also going to look at the Code property table, which is a little bit more complicated than it used to be.

Forget the first two bytes and start with the third:

Max_stack: maximum depth of operand stack max_locals: storage space required by local variables CODE_length: length of Java bytecode instructions code_length exption_table_length: Size of subsequent EXCEPtion_TABLE EXCEPtion_TABLE: the size of the exception entry is exption_table_length attributes_count: The size of the attributes following the Attributes: Attributes_info structure, size is attributes_count.Copy the code

If you want to view the corresponding hex represents what instruction, then need to check the doc, website: docs.oracle.com/javase/spec…

Let’s do the actual analysis again, from which we can locate the beginning of the command:

Aload_0: Pushes the first reference type local variable to the top of the stack.

Invokespecial: Invokes the superclass constructor and instantiates the method.


:()V is a reference to Java /lang/Object.

Finally, B1 means the instruction return. When this instruction ends, the method ends.

After that exception_TABLE_length is 00000, so the exception_INFO table does not exist. The next two bytes are the total number of attribute_info tables. In the Test class, its value is 2, so there are two Attribute_info tables.

The first property table is LineNumberTable. LineNumberTable is used to record the mapping between line numbers in Java files and line numbers in bytecodes.

There is also a LocalVariableTable table that describes the mapping between local variables within local variables and variables defined in Java code.

I’m not going to do any research on these two tables.

Print method analysis

The second method print(), whose modifier is private, has no other modifier, so the first two bytes are 0002, followed by the method name, method return type, total number of attributes, and attribute table.

B2 0002 12 03 B6 0004 B1

Respectively represent:

B2: instruction is getstatic, obtaining a static field of the specified class, and press it into the stack, the data in 0002 is a constant pool, behind the value is Java/lang/System. Out: Ljava/IO/PrintStream;

Print (int, float, String) print (system.out.println) print (system.out.println) print (system.out.println) print (system.out.println)

B6: instruction is invokevirtual, call the instance method, parameter is 0004, in the constant pool is a Java/IO/PrintStream println (Ljava/lang/String;) V method references.

B1: return

The third and fourth methods are also the same, not to say.

Property counter, property table

Behind the method counters, tables is counters, attribute table, this structure in the above mentioned earlier, such as in the Test of the final is 0001001900000002001 a, represents a property sheet (2 bytes), is at the back of the property sheet data,

The decimal value of 0x0019 is 25, which in the constant pool is SourceFile.

SourceFile is also a table structure that represents the SourceFile name. It consists of eight bytes, the last eight bits.

The last two bytes are references in the constant pool, and in this case, the 26 he’s referring to isTest.java

This is the end of the file, but there are many table structures left unsaid, such as LineNumberTable and InnerClasses.

We’ll talk more about that later.