Hello everyone, I'm Second Brother. The holiday is over, so let's get back to learning!

Today I'm picking up a knife to dissect a Java class file.

A popular saying in computer science is that "any problem in computer science can be solved by adding an intermediate layer". For Java, the JVM is that layer. It is what makes "Write once, run anywhere" possible: the class file compiled from one piece of source code can run on different operating systems.

Java is cross-platform because the JVM, as the middle layer, provides a different implementation for each operating system. Take JDK 11 as an example: there is a separate build for each supported operating system.

With a JVM for each operating system, the same class file compiled from our source code can run on all of them, because each JVM translates the bytecode into instructions for its own platform. That is how the cross-platform goal is achieved. So what exactly is this class file? How does the JVM recognize it?

Let's use IDEA to write a simple piece of Java code in a file named Hello.java.

package com.itwanger.jvm;

class Hello {
    public static void main(String[] args) {
        System.out.println("Hello!");
    }
}

After you click the compile button, IDEA automatically generates a file named Hello.class in the corresponding package directory under target/classes. Double-click it to open it and it looks like this:

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

package com.itwanger.jvm;

class Hello {
    Hello() {
    }

    public static void main(String[] args) {
        System.out.println("Hello!");
    }
}

It looks just like the source code, but with an empty constructor, right? This is what the class file looks like after being decompiled by Fernflower, the decompiler built into IDEA. So what does the actual class file look like?

You can run the xxd Hello.class command in the Terminal panel to view it.

Huh? It looks like gibberish, doesn't it? This is the class file in hexadecimal form; the magic of xxd is that it dumps any given file as hexadecimal.

01. Magic number

The first line starts with a special sequence, cafebabe, which is the magic number by which the JVM identifies a class file. When the JVM loads a class, it checks whether the file starts with this magic number and throws a ClassFormatError if it doesn't.
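The check itself is easy to imitate. The following is only a minimal sketch of the idea, not how any real JVM implements it; the class name MagicNumberCheck and the sample bytes are made up for illustration.

```java
import java.nio.ByteBuffer;

class MagicNumberCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true when the first four bytes are CA FE BA BE.
    static boolean hasClassMagic(byte[] bytes) {
        if (bytes.length < 4) {
            return false;
        }
        // ByteBuffer reads big-endian by default, which matches the class file byte order.
        return ByteBuffer.wrap(bytes).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        // First eight bytes of the class file shown by xxd: magic, then the version bytes.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x37};
        System.out.println(hasClassMagic(header));                            // true
        System.out.println(hasClassMagic(new byte[] {0x50, 0x4B, 0x03, 0x04})); // false
    }
}
```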

cafebabe reads as "cafe babe", and the Java logo is a steaming cup of coffee, which shows how deep the bond between Java and coffee runs.

02. Version number

The four bytes following the magic number, 0000 0037, hold the minor version number (first two bytes) and the major version number (last two bytes). So the major version number is 55 (0x37 in decimal), which corresponds to Java 11, and the minor version number is 0.

The previous LTS release was Java 8, with a major version number of 52, which means Java 9 is 53 and Java 10 is 54; both Java 9 and Java 10 were interim releases. The next LTS release is Java 17, expected to launch in September 2021.
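To make the mapping concrete, here is a small sketch that reads the two version fields and converts the major version to a Java release. The offset of 44 is simply derived from the pairs above (52 is Java 8, 55 is Java 11); the class and method names are invented for this example.

```java
import java.nio.ByteBuffer;

class ClassVersion {
    // Reads minor_version and major_version, which follow the 4-byte magic number.
    static int[] readVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        buf.getInt();                              // skip the magic number
        int minor = buf.getShort() & 0xFFFF;       // u2, read as unsigned
        int major = buf.getShort() & 0xFFFF;       // u2, read as unsigned
        return new int[] {minor, major};
    }

    // Major version 52 is Java 8 and 55 is Java 11, so the offset is 44.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x37};
        int[] v = readVersion(header);
        // prints: minor=0, major=55, Java 11
        System.out.println("minor=" + v[0] + ", major=" + v[1] + ", Java " + javaRelease(v[1]));
    }
}
```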

03. Constant pool

Immediately after the version number comes the constant pool, which stores string constants and larger integers. Whenever these values are used, they are looked up by their index in the constant pool.

Java defines primitive data types such as boolean, byte, short, char, and int, all of which are treated as ints in the constant pool. Let's take a look at some simple Java code.

public class ConstantTest {
    public final boolean bool = true;
    public final char aChar = 'a';
    public final byte b = 66;
    public final short s = 67;
    public final int i = 68;
}

The boolean true is 0x01 in hexadecimal, the character 'a' is 0x61, the byte 66 is 0x42, the short 67 is 0x43, and the int 68 is 0x44. The positions of these compiler-generated integer constants in the class file are shown below.

The first byte, 0x03, indicates that the constant is of type CONSTANT_Integer_info, one of the 14 constant types defined as of Java 8 (later releases add a few more). CONSTANT_Float_info, CONSTANT_Long_info, and CONSTANT_Double_info are identified by 0x04, 0x05, and 0x06 respectively.

An int or a float takes up 4 bytes; a long or a double takes up 8 bytes. Let's look at the maximum value of a long.

public class ConstantTest {
    public final long ong = Long.MAX_VALUE;
}

Let's see where it is in the class file. 05 is the tag, which indicates that the constant is a long, and the 8 bytes that follow, 7f ff ff ff ff ff ff ff, are the maximum value of a long.
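As a rough sketch of how the four numeric entries could be decoded from raw bytes (tag first, then a 4-byte or 8-byte big-endian value), consider the following. NumericConstant is a made-up helper for this article, and the byte arrays are written out by hand from the values discussed above.

```java
import java.nio.ByteBuffer;

class NumericConstant {
    // Decodes one numeric constant pool entry: a 1-byte tag plus a 4- or 8-byte value.
    static Object decode(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry); // big-endian by default
        int tag = buf.get() & 0xFF;
        switch (tag) {
            case 3: return buf.getInt();     // CONSTANT_Integer_info, 4 bytes
            case 4: return buf.getFloat();   // CONSTANT_Float_info, 4 bytes
            case 5: return buf.getLong();    // CONSTANT_Long_info, 8 bytes
            case 6: return buf.getDouble();  // CONSTANT_Double_info, 8 bytes
            default:
                throw new IllegalArgumentException("not a numeric constant tag: " + tag);
        }
    }

    public static void main(String[] args) {
        byte[] intEntry = {0x03, 0x00, 0x00, 0x00, 0x44};
        byte[] longEntry = {0x05, 0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff,
                            (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        System.out.println(decode(intEntry));                                 // 68
        System.out.println(decode(longEntry).equals(Long.MAX_VALUE));         // true
    }
}
```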

Next, let’s look at one more piece of code.

class Hello {
    public final String s = "hello";
}

The string "hello" is 68 65 6c 6c 6f in hexadecimal. Let's see where it is in the class file.

The first byte, 0x01, is the tag for the CONSTANT_Utf8_info type, and the second and third bytes, 0x00 0x05, give the length of the byte array that follows.
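This layout (tag, two-byte length, then the bytes) happens to be what DataInputStream.readUTF expects after the tag, so a sketch of a decoder can lean on it. Utf8Constant is a hypothetical helper name, and the entry bytes are typed in by hand from the values above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class Utf8Constant {
    // Decodes a CONSTANT_Utf8_info entry: tag 0x01, u2 length, then the string bytes.
    static String decode(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != 1) {
                throw new IllegalArgumentException("not a Utf8 tag: " + tag);
            }
            // readUTF reads a 2-byte length followed by that many modified UTF-8 bytes.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {0x01, 0x00, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f};
        System.out.println(decode(entry)); // hello
    }
}
```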

Alongside the CONSTANT_Utf8_info type there is also CONSTANT_String_info, which represents a string object (s in the previous code) and is identified by 0x08. The former stores the actual value of the string, while the latter does not contain the string's contents and only holds an index to the CONSTANT_Utf8_info in the constant pool. Let's see where it is in the class file.

CONSTANT_String_info finds the CONSTANT_Utf8_info through index 19.

There is also CONSTANT_Class_info, which represents classes and interfaces. Its structure is similar to that of CONSTANT_String_info: the first byte is the tag, with the value 0x07, and the next two bytes are a constant pool index pointing to a CONSTANT_Utf8_info whose string stores the fully qualified name of the class or interface.

Take the class in Hello.java as an example. Its fully qualified name is com/itwanger/jvm/Hello, which is 636f6d2f697477616e6765722f6a766d2f48656c6c6f in hexadecimal and is stored as a CONSTANT_Utf8_info. Where in the class file is the CONSTANT_Class_info that points to it?

jclasslib Bytecode Viewer is a tool for visualizing bytecode, and it can be installed directly from the IDEA plugin marketplace. Once it is installed, select the class file and choose Show Bytecode With Jclasslib in the View menu to see the key information about the class file.

As you can see from the figure above, the total size of the constant pool is 23, and the CONSTANT_Class_info at index 04 points to the CONSTANT_Utf8_info at index 21, whose value is com/itwanger/jvm/Hello. 21 is 0x15 in hexadecimal. With this information, we can find the location of the CONSTANT_Class_info in the class file.

The first byte, 0x07, is the tag for CONSTANT_Class_info; the two bytes after it, 0x0015, are the index.
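The lookup can be sketched in a few lines. Here the constant pool is replaced by a plain map that only contains the one Utf8 entry at index 21, which is a deliberate simplification; ClassInfoLookup is an invented name.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class ClassInfoLookup {
    // Reads the u2 name index out of a 3-byte CONSTANT_Class_info entry (tag 0x07).
    static int nameIndex(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF;
        if (tag != 7) {
            throw new IllegalArgumentException("not a Class tag: " + tag);
        }
        return buf.getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the constant pool: only index 21 is modelled.
        Map<Integer, String> utf8Entries = new HashMap<>();
        utf8Entries.put(21, "com/itwanger/jvm/Hello");

        int index = nameIndex(new byte[] {0x07, 0x00, 0x15});
        System.out.println(index + " -> " + utf8Entries.get(index)); // 21 -> com/itwanger/jvm/Hello
    }
}
```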

There is also CONSTANT_NameAndType_info, which describes a field or method. Its tag is 12, or 0x0c in hexadecimal. After the tag, the first two bytes are the constant pool index of the field or method name, and the last two bytes are the constant pool index of its descriptor, which gives the type of the field, or the parameter types and return type of the method.

Take a look at this code.

class Hello {
    public void testMethod(int id, String name) {
    }
}

Using jclasslib, you can see that CONSTANT_NameAndType_info contains two indexes.

One is 4 and the other is 5, and the composition of CONSTANT_NameAndType_info can be shown in the following figure.

The location in the corresponding class file is shown below.
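If you are curious what the descriptor string behind the second index looks like for testMethod, the standard java.lang.invoke.MethodType class can produce it. This is just a convenient way to see the string, not how the compiler writes it; NameAndTypeDescriptor is a made-up wrapper.

```java
import java.lang.invoke.MethodType;

class NameAndTypeDescriptor {
    // Builds the JVM method descriptor for the given return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void testMethod(int id, String name)
        String desc = descriptor(void.class, int.class, String.class);
        System.out.println("testMethod : " + desc); // testMethod : (ILjava/lang/String;)V
    }
}
```

The name (testMethod) and this descriptor are the two Utf8 constants that the two indexes of CONSTANT_NameAndType_info point to.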

This is followed by CONSTANT_Fieldref_info, CONSTANT_Methodref_info and CONSTANT_InterfaceMethodref_info, which have similar structures and can be represented by the pseudocode below.

CONSTANT_*ref_info {
  u1 tag;
  u2 class_index;
  u2 name_and_type_index;
}

If you have studied symbol tables in C, this pseudocode will look familiar.

  • tag is the identifier: Fieldref is 9 (0x09 in hexadecimal), Methodref is 10 (0x0a), and InterfaceMethodref is 11 (0x0b).
  • class_index is the constant pool index of a CONSTANT_Class_info, which gives the class or interface that the field, method, or interface method belongs to.
  • name_and_type_index is the constant pool index of a CONSTANT_NameAndType_info. For a Fieldref it gives the field name and field type; for a Methodref, the method name, parameter types, and return type; for an InterfaceMethodref, the interface method name, parameter types, and return type.
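Since the three entry types share one layout, one small parser covers them all. The sketch below is only an illustration; RefInfo is an invented class, and the sample bytes (0a 00 03 00 0d) are hypothetical rather than copied from a real class file.

```java
import java.nio.ByteBuffer;

class RefInfo {
    final int tag;
    final int classIndex;
    final int nameAndTypeIndex;

    RefInfo(int tag, int classIndex, int nameAndTypeIndex) {
        this.tag = tag;
        this.classIndex = classIndex;
        this.nameAndTypeIndex = nameAndTypeIndex;
    }

    // Parses a 5-byte entry: u1 tag, u2 class_index, u2 name_and_type_index.
    static RefInfo parse(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF;
        if (tag < 9 || tag > 11) {
            throw new IllegalArgumentException("not a *ref tag: " + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;
        int nameAndTypeIndex = buf.getShort() & 0xFFFF;
        return new RefInfo(tag, classIndex, nameAndTypeIndex);
    }

    String kind() {
        switch (tag) {
            case 9:  return "Fieldref";
            case 10: return "Methodref";
            default: return "InterfaceMethodref"; // tag 11, guaranteed by parse
        }
    }

    public static void main(String[] args) {
        // Hypothetical entry: a Methodref pointing at class #3 and name-and-type #13.
        RefInfo ref = parse(new byte[] {0x0a, 0x00, 0x03, 0x00, 0x0d});
        System.out.println(ref.kind() + " class=#" + ref.classIndex + " nameAndType=#" + ref.nameAndTypeIndex);
    }
}
```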

There are also CONSTANT_MethodHandle_info, CONSTANT_MethodType_info, and CONSTANT_InvokeDynamic_info, which I won't go into here; you can pick up the knife and try them yourself.

Phew, the constant pool, the most complex part of the class file, has now been dissected. It wasn't easy!

04. Access flags

The area immediately following the constant pool holds the access flags, which identify access information for the class or interface: is it a class or an interface? Is it public? Is it abstract? Is it final? And so on. A total of 16 flag bits are available, but only seven of them are commonly used.

Look at a simple enumeration code.

public enum Color {
    RED,GREEN,BLUE;
}

jclasslib shows that the access flags are 0x4031 [public final enum].

The location in the corresponding class file is shown below.
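The value 0x4031 is simply several single-bit flags OR-ed together. The sketch below splits it apart using the class-level flag values from the JVM specification; the class name and the lowercase labels are my own choice.

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    // Class-level access flag bits and their names, in ascending bit order.
    static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    static final String[] NAMES = {"public", "final", "super", "interface",
                                   "abstract", "synthetic", "annotation", "enum"};

    // Returns the name of every flag bit that is set in the given value.
    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 0x4031 = 0x4000 (enum) | 0x0020 (super) | 0x0010 (final) | 0x0001 (public)
        System.out.println(decode(0x4031)); // [public, final, super, enum]
    }
}
```

The extra super bit (0x0020) is set by the compiler on ordinary classes, and jclasslib leaves it out of its bracketed summary.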

05. this_class, super_class, interfaces

this_class is the constant pool index of the current class, super_class is the index of its parent class, and interfaces lists the interfaces it implements.

The simple code below implements no interface and extends the Object class by default.

class Hello {
    public static void main(String[] args) {
    }
}

jclasslib lets you see the class inheritance.

  • this_class points to the CONSTANT_Class_info at index 2 in the constant pool.
  • super_class points to the CONSTANT_Class_info at index 3 in the constant pool.
  • The interfaces information is empty because the class implements no interface.

The location in the corresponding class file is shown below.

06. Field table

The fields defined in a class, both static and non-static, are stored in the field table.

Let’s look at a piece of code like this.

public class FieldsTest {
    private String name;
}

There is only one field: the modifier is private, the type is String, and the field name is name. The following pseudocode can be used to represent the structure of a field.

field_info {
  u2 access_flags;
  u2 name_index;
  u2 descriptor_index;
  u2 attributes_count;                         // attributes are covered in section 08
  attribute_info attributes[attributes_count];
}

  • access_flags holds the access flags of the field, such as whether it is public, private, or protected, whether it is static, whether it is final, and so on.
  • name_index is the index of the field name, pointing to a CONSTANT_Utf8_info in the constant pool. In the example above, its value is name.
  • descriptor_index is the index of the field's type descriptor, also pointing to a CONSTANT_Utf8_info in the constant pool. Different data types are described by different rules.

1) For primitive data types, use a single character, such as I for int and B for byte.

2) For reference data types, use the form L***;, which starts with L and ends with a semicolon, such as Ljava/lang/String; for the String type.

3) For arrays, add the prefix [, such as [Ljava/lang/String; for a String array.
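The three rules can be turned into a small decoder that converts a descriptor back into a readable type name. This is a hand-written sketch for illustration only (FieldDescriptor is not a JDK class), and it assumes the input is a well-formed field descriptor.

```java
class FieldDescriptor {
    // Turns a field descriptor such as "[Ljava/lang/String;" into "java.lang.String[]".
    static String toTypeName(String descriptor) {
        // Rule 3: every leading '[' adds one array dimension.
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String base;
        switch (element.charAt(0)) {
            // Rule 1: primitive types use a single character.
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            // Rule 2: reference types are written as L<internal name>;
            case 'L': base = element.substring(1, element.length() - 1).replace('/', '.'); break;
            default: throw new IllegalArgumentException("unknown descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTypeName("I"));                   // int
        System.out.println(toTypeName("Ljava/lang/String;"));  // java.lang.String
        System.out.println(toTypeName("[Ljava/lang/String;")); // java.lang.String[]
    }
}
```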

The corresponding location in the class file is shown below.

07. Method table

Method tables are similar to field tables, except that they are used to store method information, including method names, method parameters, and method signatures.

Take the main method for example.

public class MethodsTest {
    public static void main(String[] args) {
    }
}

First look at the general information using jclasslib.

  • The access flags are public static.
  • The method name is main.
  • The method takes an array of strings as its parameter; the return type is void.

The corresponding location in the class file is shown below.
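Those three pieces of information are stored as an access flags value, a name, and a descriptor. As a sketch, the standard Modifier and MethodType classes can turn them back into something that reads like source code. The flag value 0x0009 (public 0x0001 plus static 0x0008) and the descriptor ([Ljava/lang/String;)V are what this main method would be expected to carry; MainMethodInfo is an invented helper, and Modifier.toString is only reliable here because the flags are simple ones.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Modifier;

class MainMethodInfo {
    // Rebuilds a source-like signature from access flags, a name, and a method descriptor.
    static String describe(int accessFlags, String name, String descriptor) {
        MethodType type = MethodType.fromMethodDescriptorString(
                descriptor, MainMethodInfo.class.getClassLoader());
        StringBuilder params = new StringBuilder();
        for (Class<?> p : type.parameterList()) {
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(p.getSimpleName());
        }
        return Modifier.toString(accessFlags) + " " + type.returnType().getSimpleName()
                + " " + name + "(" + params + ")";
    }

    public static void main(String[] args) {
        // prints: public static void main(String[])
        System.out.println(describe(0x0009, "main", "([Ljava/lang/String;)V"));
    }
}
```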

08. Attribute table

The attribute table is the last part of a class file, and attributes also typically appear inside fields and methods.

Let’s look at a piece of code like this.

public class AttributeTest {
    public static final int DEFAULT_SIZE = 128;
}

There is only one constant, DEFAULT_SIZE, which is a field: a static variable marked final. Let's look at one of the most important attributes, ConstantValue, in jclasslib; it represents the initial value of the static variable.

  • Attribute name index points to the constant in the constant pool whose value is ConstantValue.
  • Attribute length has a fixed value of 2 because the index that follows is only two bytes long.
  • Constant value index points to the specific constant in the constant pool; if the constant is an int, it is a CONSTANT_Integer_info.

I have drawn a picture that can completely represent the structure of the field, including the attribute list.

The corresponding location in the class file is shown below.
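Put together, a ConstantValue attribute is eight bytes: a two-byte name index, a four-byte length that is always 2, and a two-byte value index. The sketch below parses that shape; the index values 7 and 8 in the sample bytes are hypothetical, and ConstantValueAttribute is a made-up class name.

```java
import java.nio.ByteBuffer;

class ConstantValueAttribute {
    final int nameIndex;
    final int valueIndex;

    ConstantValueAttribute(int nameIndex, int valueIndex) {
        this.nameIndex = nameIndex;
        this.valueIndex = valueIndex;
    }

    // Parses u2 attribute_name_index, u4 attribute_length (must be 2), u2 constantvalue_index.
    static ConstantValueAttribute parse(byte[] attribute) {
        ByteBuffer buf = ByteBuffer.wrap(attribute);
        int nameIndex = buf.getShort() & 0xFFFF;
        long length = buf.getInt() & 0xFFFFFFFFL;
        if (length != 2) {
            throw new IllegalArgumentException("ConstantValue length must be 2, got " + length);
        }
        int valueIndex = buf.getShort() & 0xFFFF;
        return new ConstantValueAttribute(nameIndex, valueIndex);
    }

    public static void main(String[] args) {
        // Hypothetical bytes: name at constant #7, value at constant #8.
        ConstantValueAttribute attr = parse(new byte[] {0x00, 0x07, 0x00, 0x00, 0x00, 0x02, 0x00, 0x08});
        System.out.println("name=#" + attr.nameIndex + " value=#" + attr.valueIndex);
    }
}
```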

Take a look at this code.

public class MethodCode {
    public static void main(String[] args) {
        foo();
    }

    private static void foo() {
    }
}

The main method calls foo. Let's look at one of its most important attributes, Code, in jclasslib; this is where the key information about a method is stored.

  • Attribute name index points to the constant in the constant pool whose value is Code.
  • Attribute length is the length of the attribute value.
  • Bytecode stores the actual bytecode instructions.
  • Exception table holds the exception handling information inside the method.
  • Maximum stack size is the maximum depth the operand stack can reach at any time while the method executes.
  • Maximum local variable is the size of the local variable table. Note that it is not equal to the sum of all local variables in the method: when a scope ends, the slots of its local variables can be reused.
  • Code length is the length of the bytecode instructions.

The location in the corresponding class file is shown below.
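As a sketch of how these fields sit next to each other, the parser below reads the body of a Code attribute (the part after the name index and attribute length). The sample bytes are an assumption of what an empty static method such as foo would plausibly contain (stack 0, locals 0, a single return instruction 0xb1), not bytes copied from the real file; CodeAttribute is an invented class.

```java
import java.nio.ByteBuffer;

class CodeAttribute {
    final int maxStack;
    final int maxLocals;
    final byte[] code;
    final int exceptionTableLength;

    CodeAttribute(int maxStack, int maxLocals, byte[] code, int exceptionTableLength) {
        this.maxStack = maxStack;
        this.maxLocals = maxLocals;
        this.code = code;
        this.exceptionTableLength = exceptionTableLength;
    }

    // Parses u2 max_stack, u2 max_locals, u4 code_length, code bytes, u2 exception_table_length.
    static CodeAttribute parse(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        int maxStack = buf.getShort() & 0xFFFF;
        int maxLocals = buf.getShort() & 0xFFFF;
        int codeLength = buf.getInt();
        byte[] code = new byte[codeLength];
        buf.get(code); // the bytecode instructions themselves
        int exceptionTableLength = buf.getShort() & 0xFFFF;
        return new CodeAttribute(maxStack, maxLocals, code, exceptionTableLength);
    }

    public static void main(String[] args) {
        // Assumed body for an empty static method: one instruction, 0xb1 (return).
        byte[] body = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, (byte) 0xb1,
                       0x00, 0x00, 0x00, 0x00};
        CodeAttribute attr = parse(body);
        System.out.println("stack=" + attr.maxStack + " locals=" + attr.maxLocals
                + " codeLength=" + attr.code.length);
    }
}
```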

That more or less completes the analysis of the inside of a class file, and I hope it helps you. This was my first time holding the knife and my hand was shaking a little, so if you spot any shortcomings, feel free to point them out mercilessly in the comments section!

At the end of this article, I recommend a Java learning tutorial with 115K+ stars on GitHub, which I organized personally. It covers Java basics, Java containers, Java concurrency, the Java virtual machine, and Java IO, so it is very comprehensive.

This article is very hardcore, and dissecting the class file took me a long time and wore me out. Please give it a thumbs up!