Abstract: Java defines a bytecode format that is independent of operating system and hardware. This bytecode is stored in Java class files, whose internal format is platform independent and recognized by the virtual machine.

This article is shared from the Huawei Cloud Community post “Java in-depth class file”; original author: technology torchbearer.

The Java language is cross-platform: write once, run anywhere. It is cross-platform because Java defines a bytecode format that is independent of operating system and hardware. This bytecode is stored in Java class files, whose internal format the virtual machine recognizes in the same way on Linux and on Windows. It is much like an HTML file: once a specification is agreed on, any system that follows the specification can display the content.

The language independence of the JVM

What is the JVM for?

Some would say it runs Java. Could it run Python too?

The first answer is true, but incomplete; the JVM isn’t just about running Java programs.

In fact, what runs on the JVM is not the Java source file itself but the class file.

And Java is not the only language that can be compiled into class files.

This is the language independence of the JVM.

Whether the JVM can run Python depends only on whether there is a tool that compiles Python into class files.

Of course, doing so makes little sense: Python has its own runtime environment and, in some respects, a more powerful and richer core library than Java.

Each language has its own platform, so there is no need to force it onto the JVM.

Still, the class file is worth understanding.

As a programmer, have you ever wanted to create a language of your own, ideally one you could write in Chinese?

But reality, or a college mentor, pours cold water on the idea.

Spend three to five years studying compiler theory first, and then think about implementing it.

After three to five years, it would be far too late.

Now, with the JVM, there seems to be a glimmer of hope.

The nature of the class file

To realize that vision, in other words to develop a compiler, the first step is to take the class file itself apart.

In any case, a class file is essentially a binary stream organized in 8-bit bytes.

Remember: binary.

To prove this, we need a tool, for example Sublime.

It is not a dedicated hexadecimal viewer but an editor that can display a binary file as hexadecimal.

It seems to bundle some Python components. To start it, run sublime_text.exe directly.

Then select the class file and open it, as shown in the image below.

Dizzying to look at, right? What on earth is this?

I said binary, yet this is hexadecimal. The two are the same data in different notation.
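One hexadecimal digit covers exactly four bits, so two digits describe one byte. A minimal sketch that prints a byte both ways; the class name HexVsBinary and the sample values are made up for illustration:

```java
final class HexVsBinary {
    // Renders one byte as two hexadecimal digits, e.g. 0x34 -> "34".
    static String toHex(int value) {
        return String.format("%02X", value & 0xFF);
    }

    // Renders the same byte as eight binary digits, e.g. 0x34 -> "00110100".
    static String toBinary(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        // Left-pad with zeros so every byte shows all eight bits.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        int[] sample = {0x00, 0x34, 0xFF}; // arbitrary sample bytes
        for (int b : sample) {
            System.out.println(toHex(b) + " = " + toBinary(b));
        }
    }
}
```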

If you don’t want to look at hexadecimal, you can use javap and go straight to the bytecode instructions (see “How a piece of Java code executes” for more details).

If you don’t want to open the command line either, there’s a tool called jclasslib that provides a graphical interface, and there’s a plug-in for IDEA.

But that’s not the point. Let’s ignore that.

Class file structure revealed

There are only two kinds of data in the class file format: unsigned numbers and tables.

Unsigned numbers are the basic data type. They describe numbers, index references, quantities, and UTF-8 encoded string values. By length they are written u1, u2, u4, and u8, meaning unsigned numbers of 1, 2, 4, and 8 bytes.

A table is a composite type made up of unsigned numbers and other tables, comparable to an object type.

Next, let’s walk through the composition of the class file in order, based on what Sublime displays.
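Multi-byte values in a class file are stored big-endian, which is also the default byte order of java.nio.ByteBuffer, so the unsigned types map directly onto buffer reads. The sketch below is illustrative only; the class name UnsignedReader and the sample bytes are assumptions, not taken from the class file discussed in this article.

```java
import java.nio.ByteBuffer;

final class UnsignedReader {
    // Reads one u1, one u2, and one u4 in that order and returns them as longs.
    static long[] readU1U2U4(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);   // big-endian by default
        long u1 = buf.get() & 0xFF;               // 1 byte, masked to 0..255
        long u2 = buf.getShort() & 0xFFFF;        // 2 bytes, masked to 0..65535
        long u4 = buf.getInt() & 0xFFFFFFFFL;     // 4 bytes, masked to 0..2^32-1
        return new long[] {u1, u2, u4};
    }

    // A u8 fills all 64 bits of a long, so it is rendered as an unsigned string.
    static String readU8(byte[] data) {
        return Long.toUnsignedString(ByteBuffer.wrap(data).getLong());
    }

    public static void main(String[] args) {
        byte[] sample = {0x01, 0x00, 0x34, 0x00, 0x00, 0x01, 0x00}; // made-up bytes
        long[] v = readU1U2U4(sample);
        System.out.println("u1=" + v[0] + " u2=" + v[1] + " u4=" + v[2]);
    }
}
```

Java has no unsigned primitive types, which is why each value is masked into a wider signed type.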

(1) The first four bytes of a class file are the magic number, which determines whether the file is a class file that the virtual machine can accept.

For example, the magic number above is CAFEBABE.

Every valid class file starts with these four bytes; you can open several class files to check.

(2) The next four bytes hold the version: two bytes for the minor version, then two bytes for the major version

The above indicates that the JDK version is 1.8.

PS: The major version number for JDK 1.1 is 45, and each later major release adds 1, so JDK 1.8 is 52 (decimal), which is 34 in hexadecimal.
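The version arithmetic can be sketched as follows. The class name ClassVersion and the method toJavaRelease are made up for illustration; the mapping itself relies on major version 45 belonging to JDK 1.0.2 and 1.1, with each later feature release adding 1.

```java
final class ClassVersion {
    // Maps a class file major version to a Java feature release number:
    // 45 -> 1 (JDK 1.0.2 and 1.1), 46 -> 2 (JDK 1.2), ..., 52 -> 8 (JDK 1.8).
    static int toJavaRelease(int major) {
        if (major < 45) {
            throw new IllegalArgumentException("unknown major version: " + major);
        }
        return major - 44;
    }

    public static void main(String[] args) {
        // 0x34 is the hexadecimal major version seen in the class file.
        System.out.println("0x34 = " + 0x34 + " -> Java " + toJavaRelease(0x34));
    }
}
```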

(3) Next comes the constant pool

The above is the constant pool counter. From it we calculate that there are 15 constants: the counter value minus 1, because constant pool indexes start at 1, not 0.

Viewing the constant pool with javap confirms that there are indeed 15 constants.
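Steps (1) to (3) cover the first ten bytes of the file, and they can be read in one pass. This is a minimal sketch, not a full parser: the class name ClassHeader is made up, and the sample bytes in main are reconstructed from the values described in the text (version 0x34 and 15 constants, so a counter of 0x0010), not copied from the original screenshot.

```java
import java.nio.ByteBuffer;

final class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Parses magic (u4), minor version (u2), major version (u2), and
    // the constant pool counter (u2) from the start of a class file.
    static ClassHeader parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        long magic = buf.getInt() & 0xFFFFFFFFL;
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        return new ClassHeader(minor, major, count);
    }

    // Index 0 is reserved, so the usable indexes are 1 .. counter - 1.
    int constantCount() {
        return constantPoolCount - 1;
    }

    public static void main(String[] args) {
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor version
            0x00, 0x34,                                         // major version
            0x00, 0x10                                          // constant pool counter
        };
        ClassHeader h = parse(sample);
        System.out.println("major=" + h.major + " constants=" + h.constantCount());
    }
}
```

To try it on a real file, the byte array could come from java.nio.file.Files.readAllBytes.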

(4) The constant pool is followed by access flags, which are mainly divided into the following categories

Let’s go back to the source code for the class.

Java code

public class ByteCode {
    public ByteCode() {
    }
}

This class is not an interface, not abstract, not an enum, not synthetic, and not final; it is public, and it was compiled by a compiler later than JDK 1.2, so the flags that apply are ACC_PUBLIC and ACC_SUPER.

ACC_PUBLIC and ACC_SUPER have the values 0001 and 0020, which combine (bitwise OR) to 0021. Its position is shown below:
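Because each flag occupies its own bit, a flags value can be decoded by testing one mask at a time. A minimal sketch, where the class name ClassAccessFlags and the method decode are made up for illustration and the mask values are the class-level flags defined by the class file format:

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200,
        0x0400, 0x1000, 0x2000, 0x4000
    };

    // Returns the names of all flags whose bit is set in the given value.
    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0001 | 0x0020)); // the 0021 value from the text
    }
}
```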

(5) Class index, superclass index, and interface index

  • The access flags above are followed by the class index, whose value is 0002; it refers to the second entry of the constant pool.
  • The class index is followed by the superclass index, whose value is 0003; it refers to the third entry of the constant pool.
  • The superclass index is followed by the interface counter, whose value is 0000, which means that the class does not implement any interfaces.

(6) Field table, method table, attribute table

After the three indexes comes the field table

The field counter is 0000, which means the class has no fields.

As shown above, the method table is divided into four parts

  • The method table counter is 0001, which means there is one method
  • The method access flag is 0001, which stands for ACC_PUBLIC
  • The method name index is 0004, which refers to the fourth entry of the constant pool
  • The method descriptor index is 0005, which refers to the fifth entry of the constant pool, and so on.
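The four values in the list are consecutive u2 fields, so they can be read in order. A minimal sketch, assuming the eight bytes 00 01 00 01 00 04 00 05 reconstructed from the values above; the class name MethodTableHead is made up, and whatever follows the descriptor index is deliberately not parsed:

```java
import java.nio.ByteBuffer;

final class MethodTableHead {
    final int methodsCount;
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;

    private MethodTableHead(int methodsCount, int accessFlags, int nameIndex, int descriptorIndex) {
        this.methodsCount = methodsCount;
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
    }

    // Reads the method counter and the first three fields of the first method.
    static MethodTableHead parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        int count = buf.getShort() & 0xFFFF;
        int flags = buf.getShort() & 0xFFFF;
        int name = buf.getShort() & 0xFFFF;
        int descriptor = buf.getShort() & 0xFFFF;
        return new MethodTableHead(count, flags, name, descriptor);
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x01, 0x00, 0x01, 0x00, 0x04, 0x00, 0x05};
        MethodTableHead m = parse(sample);
        System.out.println("methods=" + m.methodsCount
                + " public=" + ((m.accessFlags & 0x0001) != 0)
                + " name=#" + m.nameIndex
                + " descriptor=#" + m.descriptorIndex);
    }
}
```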

Bytecode instructions

Bytecode instructions get a section of their own. They are stored in the Code attribute of each entry in the method table and can be classified as follows:

(1) Load and store instructions

For this part, see “How a piece of Java code executes”.

(2) Arithmetic instructions


Java code

public class Test {
    public void add(int a, int b) {
        System.out.println(a + b);
        System.out.println(a - b);
        System.out.println(a * b);
        System.out.println(a / b);
    }
}

The bytecode instructions are as follows:
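In short, each operator on two int operands compiles to a single arithmetic instruction: iadd, isub, imul, or idiv. A sketch with one operator per method; ArithmeticDemo is a made-up name, and the opcode in each comment is the arithmetic instruction javac emits for int operands:

```java
final class ArithmeticDemo {
    static int add(int a, int b) { return a + b; } // iadd
    static int sub(int a, int b) { return a - b; } // isub
    static int mul(int a, int b) { return a * b; } // imul
    static int div(int a, int b) { return a / b; } // idiv, throws ArithmeticException if b == 0

    public static void main(String[] args) {
        System.out.println(add(6, 3));
        System.out.println(sub(6, 3));
        System.out.println(mul(6, 3));
        System.out.println(div(6, 3));
    }
}
```

Running javap -c on the compiled class shows these opcodes between the loads of the operands and the return.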

(3) Type conversion instructions


Java code

public class Test {
    public void add(int a, int b) {
        int c = 1;
        long d = c;
    }
}

Bytecode instruction:
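The conversion happens at the assignment long d = c, where javac inserts an i2l instruction to widen the int to a long. A sketch of that case plus its opposite; ConversionDemo is a made-up name, and the narrowing method with l2i is added for contrast and is not part of the original example:

```java
final class ConversionDemo {
    // Implicit widening: the compiler inserts i2l, no cast is needed.
    static long widen(int c) {
        long d = c;
        return d;
    }

    // Explicit narrowing: the cast compiles to l2i, which keeps the low 32 bits.
    static int narrow(long d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(widen(1));
        System.out.println(narrow((1L << 32) + 5)); // the high bits are discarded
    }
}
```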

(4) Instance creation instruction

Not much to say here: it is new.

(5) Array creation instructions


Java code

public class Test {
    public void add(int a, int b) {
        int[] c = new int[4];
        String[] d = new String[5];
    }
}

Bytecode instruction:

(6) Field access instructions


Java code

public class Test {
    private static String name = "1";
    private String age = "2";

    public static void main(String[] args) {
        Test test = new Test();
        String a = test.age;
        String b = Test.name;
    }
}

Bytecode instruction:

(7) Array access instructions


Java code

public static void main(String[] args) {
    String[] a = new String[5];
    a[1] = "2";
    String b = a[1];
}

Bytecode instruction:

(8) Instance type check instruction

That is instanceof. I’ll skip the demo.

(9) Method return instructions

That is return, shown here only for completeness.
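The return instruction comes in one variant per kind of return value. A minimal sketch; ReturnDemo and its method names are made up, and each comment names the return instruction javac emits for that return type:

```java
final class ReturnDemo {
    static int intValue()         { return 1; }    // ireturn
    static boolean booleanValue() { return true; } // ireturn (boolean, byte, char, short use int)
    static long longValue()       { return 1L; }   // lreturn
    static float floatValue()     { return 1f; }   // freturn
    static double doubleValue()   { return 1d; }   // dreturn
    static String reference()     { return "a"; }  // areturn
    static void nothing()         { }              // return

    public static void main(String[] args) {
        nothing();
        System.out.println(intValue() + " " + longValue() + " " + reference());
    }
}
```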

5. Exception handling

Take a look at the code directly:

Java code

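import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class Test {
    public void test() {
        InputStream in = null;
        try {
            in = new FileInputStream("i.txt");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            try {
                // in is still null here if the file could not be opened
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}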

The code is a typical file stream operation; unlike the earlier examples, it catches two exceptions.

So, how do the bytecode instructions handle these exceptions?

At the bottom of the output there is an Exception table, which records all the exception handling data.

Take the first row of the exception table as an example: from and to are bytecode offsets, not source line numbers. If an exception of the listed type occurs at an offset from 12 (inclusive) to 16 (exclusive), execution jumps directly to offset 19 (target).
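The lookup rule can be sketched as a small simulation. This is an illustration, not how any particular JVM implements it: the names ExceptionTableDemo, Entry, and findHandler are made up, and the catch type IOException in main is an assumption, since the text gives only the from, to, and target values of the first row.

```java
import java.io.IOException;

final class ExceptionTableDemo {
    // One row of an exception table.
    static final class Entry {
        final int from;
        final int to;
        final int target;
        final Class<? extends Throwable> type;

        Entry(int from, int to, int target, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    // Returns the handler offset for an exception thrown at pc,
    // or -1 if no row matches. Rows are checked in table order.
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.from && pc < e.to; // from inclusive, to exclusive
            if (inRange && e.type.isInstance(thrown)) {
                return e.target;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // from, to, target come from the text; the catch type is assumed.
        Entry[] table = { new Entry(12, 16, 19, IOException.class) };
        System.out.println(findHandler(table, 13, new IOException("demo")));
        System.out.println(findHandler(table, 16, new IOException("demo")));
    }
}
```

When no row matches, the real JVM ends the current method abruptly and repeats the search in the caller, which the sketch represents with -1.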

6. Boxing and unboxing

This is a subject that cannot be skirted.

As anyone with a Java background knows, Java has eight primitive data types, each with a wrapper class: int has Integer, long has Long, and so on.

In general, a primitive value and its wrapper object can be assigned to each other. But what is the logic behind it?

Java code

public class Test {
    public static void main(String[] args) {
        Integer i = 1;
        int a = 2;
        int b = 3;
        i = a;
        b = i;
    }
}

Let’s look at bytecode instructions

From the bytecode instructions, we can see three boxing and unboxing operations

  • The first time, the valueOf method of Integer is called to convert the constant 1 to an Integer.
  • The second time, the valueOf method of Integer is called again, converting the value 2 at the top of the stack to an Integer.
  • The third time, the intValue method is called, converting the Integer to an int and assigning the value to b.

The first two are boxing; the last is unboxing.

This is the underlying implementation of boxing and unboxing.
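Written out by hand, the three calls look like this. A minimal sketch; BoxingDemo and run are made-up names, and each comment shows the source line from the example that the explicit call stands for:

```java
final class BoxingDemo {
    static int run() {
        Integer i = Integer.valueOf(1); // Integer i = 1;  (boxing)
        int a = 2;
        int b = 3;
        i = Integer.valueOf(a);         // i = a;          (boxing)
        b = i.intValue();               // b = i;          (unboxing)
        return b;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```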
