Java bytecode

A computer only understands zeros and ones, so a program written in any language must eventually be translated into machine code before it can run. Because machine code is platform-specific, such a program normally has to be recompiled for every platform it targets. When Java was first released, it was famously advertised with the slogan:

Write Once, Run Anywhere.

To achieve this, Sun and other vendors released JVM implementations for many platforms. All of them share one job: loading and executing the same platform-independent bytecode. Source code therefore no longer has to be translated directly into platform-specific zeros and ones. It is compiled into bytecode, which the JVM on each platform reads and executes, and that is how "write once, run anywhere" is achieved. Today the JVM no longer supports only Java, and a number of JVM-based languages have emerged, such as Groovy, Scala, and Kotlin.

The variables, keywords, and operators in source code are ultimately compiled into sequences of bytecode instructions. The semantics that bytecode instructions can express are richer than those of the Java language itself, which is why other JVM-based languages can offer features that Java does not support.

Example

The following steps through bytecode with a simple example.

//Main.java
package com.rhythm7;

public class Main {

    private int m;

    public int inc() {
        return m + 1;
    }
}

Run the following command to generate a Main.class file in the current directory.

javac Main.java

Open the generated class file in a hex viewer and you will see the following content:

cafe babe 0000 0034 0013 0a00 0400 0f09
0003 0010 0700 1107 0012 0100 016d 0100
0149 0100 063c 696e 6974 3e01 0003 2829
5601 0004 436f 6465 0100 0f4c 696e 654e
756d 6265 7254 6162 6c65 0100 0369 6e63
0100 0328 2949 0100 0a53 6f75 7263 6546
696c 6501 0009 4d61 696e 2e6a 6176 610c
0007 0008 0c00 0500 0601 0010 636f 6d2f
7268 7974 686d 372f 4d61 696e 0100 106a
6176 612f 6c61 6e67 2f4f 626a 6563 7400
2100 0300 0400 0000 0100 0200 0500 0600
0000 0200 0100 0700 0800 0100 0900 0000
1d00 0100 0100 0000 052a b700 01b1 0000
0001 000a 0000 0006 0001 0000 0003 0001
000b 000c 0001 0009 0000 001f 0002 0001
0000 0007 2ab4 0002 0460 ac00 0000 0100
0a00 0000 0600 0100 0000 0800 0100 0d00
0000 0200 0e

As for the hexadecimal in the file, apart from the cafe babe at the beginning, the rest roughly translates as: what the hell...

Don’t panic, heroes. Let’s start with the cafe babe we recognize. The first four bytes of the file are called the magic number; the virtual machine accepts only class files that start with cafe babe, so these four bytes identify the file as bytecode. The next four bytes, 0000 0034, hold the minor version (0) and the major version (0x0034, which is 52 in decimal). Major version numbers start at 45: JDK 1.0 and 1.1 both use 45.x, and each major release after that adds one. So 52 means the class file was compiled by JDK 1.8. You can check your JDK version by running java -version:

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

The results were verified.
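The header layout described above can also be checked in code. The following is a minimal sketch, not part of the original article: the class name ClassHeader and its methods are made up for illustration, and the input is just the first eight bytes of the hex dump shown earlier rather than a real file on disk.

```java
import java.nio.ByteBuffer;

class ClassHeader {
    final int magic;   // u4, expected to be 0xCAFEBABE
    final int minor;   // u2 minor version
    final int major;   // u2 major version

    private ClassHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Parses the first eight bytes of a class file. ByteBuffer is big-endian
    // by default, which matches the byte order of the class file format.
    static ClassHeader parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int magic = buf.getInt();
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return new ClassHeader(magic, minor, major);
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) {
        // The first eight bytes of the hex dump: cafe babe 0000 0034
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34
        };
        ClassHeader h = parse(header);
        System.out.printf("magic ok=%b, minor=%d, major=%d%n",
                h.hasValidMagic(), h.minor, h.major);
    }
}
```

To inspect a real class file, the same parse method could be fed the bytes returned by java.nio.file.Files.readAllBytes.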

Next comes the constant pool. Analyzing the hexadecimal directly would be cumbersome, so instead we will examine the class file in a more readable form.

Decompile bytecode files

Bytecode files can be disassembled with javap, a tool that ships with the JDK. Run javap -help to see how it is used:

Usage: javap <options> <classes>
where possible options include:
  -help  --help  -?        Output this usage message
  -version                 Version information
  -v  -verbose             Output additional information
  -l                       Output line number and local variable tables
  -public                  Display only public classes and members
  -protected               Display protected/public classes and members
  -package                 Display package/protected/public classes and members (default)
  -p  -private             Display all classes and members
  -c                       Disassemble the code
  -s                       Output internal type signatures
  -sysinfo                 Display system information (path, size, date, MD5 hash)
  -constants               Display final constants
  -classpath <path>        Specify where to find user class files
  -cp <path>               Specify where to find user class files
  -bootclasspath <path>    Override the location of bootstrap class files

Run javap -verbose -p Main.class to view the output:

Classfile /E:/JavaCode/TestProj/out/production/TestProj/com/rhythm7/Main.class
  Last modified 2018-4-7; size 362 bytes
  MD5 checksum 4aed8540b098992663b7ba08c65312de
  Compiled from "Main.java"
public class com.rhythm7.Main
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#18         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#19         // com/rhythm7/Main.m:I
   #3 = Class              #20            // com/rhythm7/Main
   #4 = Class              #21            // java/lang/Object
   #5 = Utf8               m
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lcom/rhythm7/Main;
  #14 = Utf8               inc
  #15 = Utf8               ()I
  #16 = Utf8               SourceFile
  #17 = Utf8               Main.java
  #18 = NameAndType        #7:#8          // "<init>":()V
  #19 = NameAndType        #5:#6          // m:I
  #20 = Utf8               com/rhythm7/Main
  #21 = Utf8               java/lang/Object
{
  private int m;
    descriptor: I
    flags: ACC_PRIVATE

  public com.rhythm7.Main();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/rhythm7/Main;

  public int inc();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: getfield      #2                  // Field m:I
         4: iconst_1
         5: iadd
         6: ireturn
      LineNumberTable:
        line 8: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       7     0  this   Lcom/rhythm7/Main;
}
SourceFile: "Main.java"

Bytecode file information

The first seven lines show the location of the class file, its last-modified time, file size, MD5 checksum, the source file it was compiled from, the fully qualified name of the class, and the JDK minor and major version numbers. Next come the access flags of the class, here ACC_PUBLIC and ACC_SUPER. The access flags have the following meanings:

Flag name        Flag value   Meaning
ACC_PUBLIC       0x0001       Whether the type is public
ACC_FINAL        0x0010       Whether the type is declared final; only classes can set this
ACC_SUPER        0x0020       Whether the new semantics of the invokespecial bytecode instruction are used
ACC_INTERFACE    0x0200       Marks this as an interface
ACC_ABSTRACT     0x0400       Whether the type is abstract; set for interfaces and abstract classes, clear for other types
ACC_SYNTHETIC    0x1000       Marks this class as not generated from user code
ACC_ANNOTATION   0x2000       Marks this as an annotation
ACC_ENUM         0x4000       Marks this as an enumeration
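Because each flag occupies its own bit, a class's access flags are stored as a single bitmask. In the hex dump shown earlier, the two bytes 0021 that follow the constant pool are that mask for Main. The sketch below decodes such a mask using the values from the table; the AccessFlags class and its decode method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessFlags {
    // Class-level access flags, in ascending bit order (values from the table).
    private static final Map<Integer, String> CLASS_FLAGS = new LinkedHashMap<>();
    static {
        CLASS_FLAGS.put(0x0001, "ACC_PUBLIC");
        CLASS_FLAGS.put(0x0010, "ACC_FINAL");
        CLASS_FLAGS.put(0x0020, "ACC_SUPER");
        CLASS_FLAGS.put(0x0200, "ACC_INTERFACE");
        CLASS_FLAGS.put(0x0400, "ACC_ABSTRACT");
        CLASS_FLAGS.put(0x1000, "ACC_SYNTHETIC");
        CLASS_FLAGS.put(0x2000, "ACC_ANNOTATION");
        CLASS_FLAGS.put(0x4000, "ACC_ENUM");
    }

    // Returns the names of all flags whose bit is set in the mask.
    static List<String> decode(int mask) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> flag : CLASS_FLAGS.entrySet()) {
            if ((mask & flag.getKey()) != 0) {
                names.add(flag.getValue());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0031)); // [ACC_PUBLIC, ACC_FINAL, ACC_SUPER]
    }
}
```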

Constant pool

The constant pool can be thought of as the resource repository of a class file. It holds two main kinds of constants: literals and symbolic references. Literals are close to the notion of constants in the Java language, such as text strings and the values of final constants. Symbolic references are a concept from compiler theory and include the following three kinds:

  • Fully qualified names of classes and interfaces
  • Names and descriptors of fields
  • Names and descriptors of methods

Unlike C/C++, the JVM links a class file dynamically when the class is loaded. This means that field and method symbolic references only obtain their actual memory entry addresses after being converted at run time. While running, the virtual machine fetches the symbolic references from the constant pool and resolves them into concrete memory addresses during class creation or at run time.

Let's look at the constant pool in the decompiled output, starting with the first constant and the entries it refers to:

#1 = Methodref          #4.#18         // java/lang/Object."<init>":()V
#4 = Class              #21            // java/lang/Object
#7 = Utf8               <init>
#8 = Utf8               ()V
#18 = NameAndType        #7:#8          // "<init>":()V
#21 = Utf8               java/lang/Object

The first constant is a method reference that points to constants #4 and #18. Constant #4 is a class entry pointing to #21, and #18 is a name-and-type entry pointing to #7 and #8. Following the chain to the end yields the comment shown to the right of the first constant:

java/lang/Object."<init>":()V

This refers to the instance constructor of java/lang/Object. Main does not declare a constructor of its own, so its default constructor calls the constructor of the parent class, and the direct parent of Main is Object. The descriptor ()V means the method takes no parameters and returns V, that is, void.
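The way one constant is resolved by following indices to other constants can be sketched in a few lines. This is a simplified model rather than a real class file parser: the Entry type and the resolve method are invented for illustration, the pool holds only the entries from the javap listing that are needed here, and unlike javap the output does not put quotes around <init>.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantPoolDemo {
    // Simplified constant pool entry: a tag plus either two indices or a UTF-8 string.
    static final class Entry {
        final String tag;
        final int first;
        final int second;
        final String text;

        Entry(String tag, int first, int second, String text) {
            this.tag = tag;
            this.first = first;
            this.second = second;
            this.text = text;
        }
    }

    // A subset of the constant pool of Main, copied from the javap output.
    static Map<Integer, Entry> samplePool() {
        Map<Integer, Entry> pool = new HashMap<>();
        pool.put(1, new Entry("Methodref", 4, 18, null));
        pool.put(2, new Entry("Fieldref", 3, 19, null));
        pool.put(3, new Entry("Class", 20, 0, null));
        pool.put(4, new Entry("Class", 21, 0, null));
        pool.put(5, new Entry("Utf8", 0, 0, "m"));
        pool.put(6, new Entry("Utf8", 0, 0, "I"));
        pool.put(7, new Entry("Utf8", 0, 0, "<init>"));
        pool.put(8, new Entry("Utf8", 0, 0, "()V"));
        pool.put(18, new Entry("NameAndType", 7, 8, null));
        pool.put(19, new Entry("NameAndType", 5, 6, null));
        pool.put(20, new Entry("Utf8", 0, 0, "com/rhythm7/Main"));
        pool.put(21, new Entry("Utf8", 0, 0, "java/lang/Object"));
        return pool;
    }

    // Follows the index chain until only UTF-8 text is left.
    static String resolve(Map<Integer, Entry> pool, int index) {
        Entry e = pool.get(index);
        switch (e.tag) {
            case "Utf8":
                return e.text;
            case "Class":
                return resolve(pool, e.first);
            case "NameAndType":
                return resolve(pool, e.first) + ":" + resolve(pool, e.second);
            case "Methodref":
            case "Fieldref":
                return resolve(pool, e.first) + "." + resolve(pool, e.second);
            default:
                throw new IllegalArgumentException("unsupported tag " + e.tag);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Entry> pool = samplePool();
        System.out.println(resolve(pool, 1)); // java/lang/Object.<init>:()V
        System.out.println(resolve(pool, 2)); // com/rhythm7/Main.m:I
    }
}
```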

Similarly, the second constant can be analyzed:

#2 = Fieldref           #3.#19         // com/rhythm7/Main.m:I
#3 = Class              #20            // com/rhythm7/Main
#5 = Utf8               m
#6 = Utf8               I
#19 = NameAndType        #5:#6          // m:I
#20 = Utf8               com/rhythm7/Main

This declares a field m of type I, where I stands for int. The type identifiers used in bytecode are as follows:

Identifier   Meaning
B            Primitive type byte
C            Primitive type char
D            Primitive type double
F            Primitive type float
I            Primitive type int
J            Primitive type long
S            Primitive type short
Z            Primitive type boolean
V            Special type void
L            Object type, terminated by a semicolon, such as Ljava/lang/Object;

For array types, each dimension is described by a leading “[” character. For example, a two-dimensional array of type java.lang.String[][] is recorded as “[[Ljava/lang/String;”.
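If you want to see these identifiers without reading class files, the JDK can produce descriptor strings itself. The sketch below uses java.lang.invoke.MethodType from the standard library; the DescriptorDemo class and its descriptor helper are names chosen here purely for illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Builds the bytecode descriptor for a method with the given
    // return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int inc()
        System.out.println(descriptor(int.class));                             // ()I
        // void <init>()
        System.out.println(descriptor(void.class));                            // ()V
        // void check(Object, String)
        System.out.println(descriptor(void.class, Object.class, String.class));
        // (Ljava/lang/Object;Ljava/lang/String;)V
        // String[][] f(long, boolean)
        System.out.println(descriptor(String[][].class, long.class, boolean.class));
        // (JZ)[[Ljava/lang/String;
    }
}
```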

Method table collection

After the constant pool come the descriptions of the fields and methods of the class, which are stored in the bytecode as collections of tables. We will again skip the hexadecimal content of the file and go straight to the decompiled output.

private int m;
  descriptor: I
  flags: ACC_PRIVATE

This declares a private field m of type int.

public com.rhythm7.Main();
   descriptor: ()V
   flags: ACC_PUBLIC
   Code:
     stack=1, locals=1, args_size=1
        0: aload_0
        1: invokespecial #1                  // Method java/lang/Object."<init>":()V
        4: return
     LineNumberTable:
       line 3: 0
     LocalVariableTable:
       Start  Length  Slot  Name   Signature
           0       5     0  this   Lcom/rhythm7/Main;

This is the constructor Main(). It returns void and is public. The main attributes in Code are:

  • stack

The maximum depth of the operand stack. The JVM allocates the operand stack in the stack frame according to this value, which is 1 here.

  • locals

The storage required for local variables, measured in slots. A slot is the smallest unit the VM uses when allocating memory for local variables, commonly described as 4 bytes. Method parameters (including the hidden parameter this of instance methods), explicit exception handler parameters (the exception declared by a catch block in try-catch), and local variables defined in the method body are all stored in the local variable table. Note that locals is not necessarily the sum of the slots occupied by all local variables, because slots can be reused.

  • args_size

The number of method arguments. It is 1 here because every instance method has a hidden parameter this.

  • attribute_info

The method body. 0, 1, and 4 are the bytecode “line numbers” (offsets). The code pushes the first reference-type local variable (this) onto the top of the stack, then invokes an instance method on it, namely the method referenced by constant #1 in the constant pool, shown in the comment as java/lang/Object."<init>":()V. Finally it executes return, which ends the method.

  • LineNumberTable

This attribute describes the mapping between source line numbers and bytecode line numbers (bytecode offsets). You can use the javac options -g:none or -g:lines to suppress or request this information. Without a LineNumberTable, the source line number cannot be reported when the program throws an exception, and you cannot debug the program by source line.

  • LocalVariableTable

This attribute describes the relationship between the local variables in the stack frame and the variables defined in the source code. You can use -g:none or -g:vars to suppress or request it. Without this information, parameter names are lost when others reference the method, and placeholders such as arg0 and arg1 are used instead. Start is the bytecode offset at which the local variable becomes visible, Length is the length of the range over which it is visible, Slot is its position in the stack frame, Name is the variable name, and Signature is its type signature.

The inc() method of Main can be read the same way: this is pushed onto the stack, field #2 is fetched from it and placed on top of the stack, the int constant 1 is pushed, the top two values are added, and the resulting int is returned.
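To make the operand stack concrete, here is a toy interpreter for the five instructions of inc(). It is only a sketch under simplifying assumptions: instructions are plain strings without operands, getfield always reads the single field m, and the MainObject class is a hypothetical stand-in for an instance of Main. It also records the deepest stack level reached, which matches the stack=2 value reported by javap.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IncInterpreter {
    // Hypothetical stand-in for an instance of Main with its field m.
    static final class MainObject {
        final int m;

        MainObject(int m) {
            this.m = m;
        }
    }

    // Executes the instructions and returns { return value, maximum stack depth }.
    static int[] run(String[] code, MainObject self) {
        Object[] locals = { self };               // slot 0 holds "this"
        Deque<Object> stack = new ArrayDeque<>(); // the operand stack
        int maxDepth = 0;
        for (String op : code) {
            switch (op) {
                case "aload_0":                   // push the reference in slot 0
                    stack.push(locals[0]);
                    break;
                case "getfield":                  // pop the object, push its field m
                    stack.push(((MainObject) stack.pop()).m);
                    break;
                case "iconst_1":                  // push the int constant 1
                    stack.push(1);
                    break;
                case "iadd": {                    // pop two ints, push their sum
                    int right = (Integer) stack.pop();
                    int left = (Integer) stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "ireturn":                   // return the int on top of the stack
                    return new int[] { (Integer) stack.pop(), maxDepth };
                default:
                    throw new IllegalArgumentException("unsupported instruction " + op);
            }
            maxDepth = Math.max(maxDepth, stack.size());
        }
        throw new IllegalStateException("code ended without a return instruction");
    }

    public static void main(String[] args) {
        String[] inc = { "aload_0", "getfield", "iconst_1", "iadd", "ireturn" };
        int[] result = run(inc, new MainObject(41));
        System.out.println("returned " + result[0] + ", max stack depth " + result[1]);
    }
}
```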

SourceFile

Source file name


In practice

Analyzing try-catch-finally

From the simplest example above, you can get an idea of what source code looks like when compiled into bytecode. Let’s use this knowledge to analyze a Java problem: What value does this method return when an exception occurs and when no exception occurs? Think about it, and then we’ll see what happens.

public class TestCode {
    public int foo() {
        int x;
        try {
            x = 1;
            return x;
        } catch (Exception e) {
            x = 2;
            return x;
        } finally {
            x = 3;
        }
    }
}

To find out what foo() returns in each case, use the same trick as before:

javac TestCode.java
javap -verbose TestCode.class

Look at the bytecode of the foo method:

public int foo();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=5, args_size=1
         0: iconst_1        // push int 1 onto the stack -> top of stack = 1
         1: istore_1        // store the int on top of the stack into the second local variable -> local 2 = 1
         2: iload_1         // push the second local variable (int) onto the stack -> top of stack = 1
         3: istore_2        // !! store the int on top of the stack into the third local variable -> local 3 = 1

         4: iconst_3        // push int 3 onto the stack -> top of stack = 3
         5: istore_1        // store the int on top of the stack into the second local variable -> local 2 = 3
         6: iload_2         // !! push the third local variable (int) onto the stack -> top of stack = 1
         7: ireturn         // return the int on top of the stack -> 1

         8: astore_2        // -> local 3 = Exception
         9: iconst_2        // -> top of stack = 2
        10: istore_1        // -> local 2 = 2
        11: iload_1         // -> top of stack = 2
        12: istore_3        // !! -> local 4 = 2

        13: iconst_3        // -> top of stack = 3
        14: istore_1        // -> local 2 = 3
        15: iload_3         // !! -> top of stack = 2
        16: ireturn         // -> 2

        17: astore        4 // store the reference on top of the stack into the fifth local variable -> local 5 = any
        19: iconst_3        // push int 3 onto the stack -> top of stack = 3
        20: istore_1        // store the int on top of the stack into the second local variable -> local 2 = 3
        21: aload         4 // push the fifth local variable (a reference) onto the stack
        23: athrow          // throw the exception on top of the stack
      Exception table:
         from    to  target type
             0     4     8   Class java/lang/Exception  // an Exception thrown between offsets 0 and 4 jumps to offset 8
             0     4    17   any                        // any throwable other than Exception jumps to offset 17
             8    13    17   any
            17    19    17   any

Offsets 4 and 5 and offsets 13 and 14 of the bytecode perform the same operation: push the int 3 onto the operand stack and store it into the second local variable. This is exactly the content of the finally block in our source code. In other words, finally is handled by copying the finally block into every possible branch, just before that branch returns. Note, however, that the assignment to x in the finally block does not affect the return value: the value to return has already been copied into another local variable (istore_2 and istore_3) before the finally code runs, and it is that copy which is loaded and returned. So the final results are:

  • If no exception occurs: returns 1
  • If an Exception or one of its subclasses is thrown: returns 2
  • If a throwable that is not an Exception or one of its subclasses is thrown: the throwable is rethrown and no value is returned
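These results are easy to confirm by running the code. The sketch below is a variant of foo(), not the original method: it adds a hypothetical boolean parameter fail so that the catch branch can be triggered on demand. The second method is a hand-written approximation of what the bytecode does on the normal path, with the return value saved before the inlined finally code runs.

```java
class FinallyDemo {
    // Same shape as foo(), plus a flag (an addition for this demo)
    // that forces an exception inside the try block.
    static int foo(boolean fail) {
        int x;
        try {
            x = 1;
            if (fail) {
                throw new IllegalStateException("forced failure");
            }
            return x;
        } catch (Exception e) {
            x = 2;
            return x;
        } finally {
            x = 3; // runs on every path, but the return value is already saved
        }
    }

    // Approximation of the normal path as the bytecode executes it.
    static int normalPathByHand() {
        int x = 1;
        int saved = x; // istore_2: copy the value that will be returned
        x = 3;         // inlined finally block
        return saved;  // iload_2, ireturn
    }

    public static void main(String[] args) {
        System.out.println(foo(false));         // 1
        System.out.println(foo(true));          // 2
        System.out.println(normalPathByHand()); // 1
    }
}
```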

The example above comes from the book In-depth Understanding of the Java Virtual Machine: Advanced Features and Best Practices of the JVM. The meaning of each instruction can be looked up in the table of virtual machine bytecode instructions in Appendix B of that book.

How Kotlin extension functions are implemented

Kotlin provides extension functions, a language feature that lets you add custom methods to any type. The following example adds a sayHello method to Any:

//SayHello.kt
package com.rhythm7

fun Any.sayHello() {
    println("Hello")
}

Once compiled, use javap to look at the bytecode of the generated SayHelloKt.class file.

Classfile /E:/JavaCode/TestProj/out/production/TestProj/com/rhythm7/SayHelloKt.class
Last modified 2018-4-8; size 958 bytes
 MD5 checksum 780a04b75a91be7605cac4655b499f19
 Compiled from "SayHello.kt"
public final class com.rhythm7.SayHelloKt
 minor version: 0
 major version: 52
 flags: ACC_PUBLIC, ACC_FINAL, ACC_SUPER
Constant pool:
    // constant pool omitted
{
 public static final void sayHello(java.lang.Object);
   descriptor: (Ljava/lang/Object;)V
   flags: ACC_PUBLIC, ACC_STATIC, ACC_FINAL
   Code:
     stack=2, locals=2, args_size=1
        0: aload_0
        1: ldc           #9                  // String $receiver
        3: invokestatic  #15                 // Method kotlin/jvm/internal/Intrinsics.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
        6: ldc           #17                 // String Hello
        8: astore_1
        9: getstatic     #23                 // Field java/lang/System.out:Ljava/io/PrintStream;
       12: aload_1
       13: invokevirtual #28                 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
       16: return
     LocalVariableTable:
       Start  Length  Slot  Name   Signature
           0      17     0 $receiver   Ljava/lang/Object;
     LineNumberTable:
       line 4: 6
       line 5: 16
   RuntimeInvisibleParameterAnnotations:
     0:
       0: #7()
}
SourceFile: "SayHello.kt"

Looking at the header, Kotlin has generated a class for the file SayHello.kt named com.rhythm7.SayHelloKt. Since we never intended SayHello to be an instantiable class when we wrote SayHello.kt, SayHelloKt cannot be instantiated: it has no constructor. Looking at its only method, we find that Any.sayHello() is implemented as a static final method:

public static final void sayHello(java.lang.Object);

So when we call Any.sayHello() elsewhere, we are in effect calling the static method SayHelloKt.sayHello(Object), as seen from Java. Incidentally, because the receiver type is Any, which is non-null, the compiler inserts a null check at the beginning of the method body: it calls kotlin.jvm.internal.Intrinsics.checkParameterIsNotNull(Object value, String paramName) to verify that the incoming object is not null. If we declare the extension as Any?.sayHello() instead, that check does not appear in the compiled bytecode.
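Written by hand in Java, the generated class would look roughly like the sketch below. It is an approximation for illustration only: the class is named SayHelloKtSketch to make clear it is not the real compiler output, Objects.requireNonNull stands in for the Kotlin intrinsic (and may throw a different exception type than the intrinsic does), and the private constructor is there because a Java class cannot be written without any constructor at all.

```java
import java.util.Objects;

final class SayHelloKtSketch {
    // Java always emits a constructor, so this sketch hides it instead.
    private SayHelloKtSketch() {
    }

    // The extension receiver becomes an ordinary first parameter.
    public static void sayHello(Object receiver) {
        // Stand-in for Intrinsics.checkParameterIsNotNull(receiver, "$receiver")
        Objects.requireNonNull(receiver, "$receiver");
        System.out.println("Hello");
    }

    public static void main(String[] args) {
        // Kotlin: "some object".sayHello()
        sayHello("some object");
        // Kotlin: 42.sayHello()
        sayHello(42);
    }
}
```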