Overview

  • Java bytecode is to the virtual machine what assembly language is to a physical computer: the basic set of execution instructions.
  • A Java virtual machine instruction consists of a one-byte number that identifies a particular operation (the opcode), followed by zero or more parameters that the operation needs (the operands). Because the Java virtual machine is built around an operand stack rather than registers, most instructions have no operands and consist of an opcode only.
  • Because opcodes are limited to one byte (0 to 255), the instruction set can contain at most 256 opcodes.
  • Official documentation: docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html
  • Familiarity with virtual machine instructions is valuable for generating bytecode dynamically, decompiling class files, and patching class files. Reading bytecode is therefore a basic skill for understanding the Java virtual machine, and it requires fluency with the common instructions.

Execution model

Leaving exception handling aside, the following pseudocode can be read as the most basic execution model of the Java virtual machine interpreter:

do {
    advance the PC register by 1;
    fetch the opcode from the bytecode stream at the position indicated by the PC register;
    if (the opcode has operands) fetch the operands from the bytecode stream;
    perform the operation defined by the opcode;
} while (bytecode length > 0);
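
The loop above can be sketched as a toy interpreter. This is only an illustration under stated assumptions, not how a real JVM is implemented: the class ToyInterpreter and its run method are hypothetical names, and only five real opcodes are handled (iconst_1 = 0x04, iconst_2 = 0x05, bipush = 0x10, iadd = 0x60, ireturn = 0xac).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter; it understands only a handful of real opcodes.
final class ToyInterpreter {
    static final int ICONST_1 = 0x04;
    static final int ICONST_2 = 0x05;
    static final int BIPUSH   = 0x10; // followed by one signed byte operand
    static final int IADD     = 0x60;
    static final int IRETURN  = 0xac;

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int pc = 0;                                // program counter
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff;        // fetch the opcode and advance the PC
            switch (opcode) {
                case ICONST_1: stack.push(1); break;
                case ICONST_2: stack.push(2); break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // fetch the operand
                case IADD: {
                    int v2 = stack.pop();
                    int v1 = stack.pop();
                    stack.push(v1 + v2);
                    break;
                }
                case IRETURN:  return stack.pop();
                default: throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
        throw new IllegalStateException("fell off the end of the code");
    }

    public static void main(String[] args) {
        // iconst_1, bipush 40, iadd, iconst_1, iadd, ireturn
        byte[] code = { ICONST_1, BIPUSH, 40, IADD, ICONST_1, IADD, (byte) IRETURN };
        System.out.println(run(code)); // prints 42
    }
}
```

Note that only bipush reads an operand from the bytecode stream; the other instructions take everything they need from the operand stack, which is why most instructions are a single byte.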

Bytecode and data types

In the Java virtual machine instruction set, most instructions carry the data type they operate on. For example, the iload instruction loads an int from the local variable table onto the operand stack, while the fload instruction loads a float.

For most type-specific bytecode instructions, a letter in the opcode mnemonic indicates which data type the instruction serves:

  • i stands for int
  • l stands for long
  • s stands for short
  • b stands for byte
  • c stands for char
  • f stands for float
  • d stands for double

Some instructions have no letter in their mnemonic that indicates the type, such as arraylength: it carries no type character, but its operand can only ever be an array object.

Other instructions, such as the unconditional jump instruction goto, are data type independent.

Most instructions do not support the integer types byte, char, and short, or boolean. The compiler sign-extends byte and short data to int, and zero-extends boolean and char data to int, at compile time or run time. Similarly, values read from arrays of type boolean, byte, short, and char are handled with the corresponding int instructions. As a result, most operations on boolean, byte, short, and char data actually use int as the computational type.
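
The difference between sign extension and zero extension can be observed directly in Java source. The following is a minimal sketch; the class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical demo class: widening byte/short sign-extends, widening char zero-extends.
final class ExtensionDemo {
    static int widenByte(byte b)   { return b; } // sign-extended to int
    static int widenShort(short s) { return s; } // sign-extended to int
    static int widenChar(char c)   { return c; } // zero-extended to int

    public static void main(String[] args) {
        System.out.println(widenByte((byte) 0xFF));     // prints -1
        System.out.println(widenShort((short) 0xFFFF)); // prints -1
        System.out.println(widenChar((char) 0xFFFF));   // prints 65535
    }
}
```

The same bit pattern (all ones) becomes -1 for the signed types and 65535 for char, because char is the only unsigned integral type.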

Instruction categories

Introducing and learning every instruction takes a lot of time. To make the basic instructions easier to get familiar with, the JVM bytecode instruction set is divided here into nine broad categories by purpose.

  • Load and store instructions
  • Arithmetic instructions
  • Type conversion instructions
  • Object creation and access instructions
  • Method invocation and return instructions
  • Operand stack management instructions
  • Comparison and control instructions
  • Exception handling instructions
  • Synchronization control instructions

When performing value-related operations:

  • An instruction can fetch data from local variables, the constant pool, objects in the heap, method calls, system calls, and so on, and push that data (either a value or an object reference) onto the operand stack.
  • An instruction can also pop one or more values from the operand stack (popping several times) to carry out assignment, addition, subtraction, multiplication, division, passing method arguments, system calls, and so on.

1. Load and store instructions

Load and store instructions are used to pass data back and forth between the local variable table and operand stack of a stack frame.

2. Common instructions

1. Local variable push instructions: load a local variable onto the operand stack: xload, xload_<n> (where x is i, l, f, d, or a, and n is 0 to 3)
2. Constant push instructions: load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>
3. Store instructions: store a value from the operand stack into the local variable table: xstore, xstore_<n> (where x is i, l, f, d, or a, and n is 0 to 3); for array elements, xastore (where x is i, l, f, d, a, b, c, or s)
4. Instruction that extends the index range of local variable access: wide.

Some of the instruction mnemonics listed above end in angle brackets (e.g. iload_<n>). Such a mnemonic actually stands for a group of instructions (for example, iload_<n> stands for iload_0, iload_1, iload_2, and iload_3). These groups are special forms of a general instruction that takes one operand (such as iload). The special forms appear to have no operand, so none needs to be fetched; the operand is implicit in the instruction itself.

  • For example, iload_0 pushes the value at index 0 of the local variable table onto the operand stack.

Apart from that, their semantics are exactly the same as the general instruction (for example, iload_0 means exactly the same as iload with an operand of 0). The letter between the angle brackets indicates the type of the implicit operand: <n> is a non-negative integer, <i> is an int, <l> is a long, <f> is a float, and <d> is a double.
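
To make the size difference concrete, here is a small sketch that returns the bytes for loading an int local variable at a given index. The class and method names are hypothetical; the opcode values used (iload_0 = 0x1a, iload = 0x15, wide = 0xc4) are the ones defined by the instruction set.

```java
// Hypothetical helper: encodes "load int local variable at index" in its shortest form.
final class LoadEncoding {
    static byte[] encodeIload(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        if (index <= 3) {
            return new byte[] { (byte) (0x1a + index) };        // iload_<n>: operand is implicit
        }
        if (index <= 255) {
            return new byte[] { 0x15, (byte) index };           // iload with a one-byte index
        }
        // wide iload with a two-byte index (high byte first)
        return new byte[] { (byte) 0xc4, 0x15, (byte) (index >> 8), (byte) index };
    }

    public static void main(String[] args) {
        System.out.println(encodeIload(2).length);   // prints 1
        System.out.println(encodeIload(10).length);  // prints 2
        System.out.println(encodeIload(300).length); // prints 4
    }
}
```

Since the first few slots hold this and the leading parameters, the one-byte forms cover the most frequent accesses.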

A closer look at the operand stack and the local variable table

1. Operand stack

As we know, Java bytecode is the instruction set used by the Java virtual machine. As such, it is inseparable from the stack-based computing model of the Java virtual machine. During interpreted execution, whenever a frame is allocated for a Java method, the Java virtual machine also sets aside space for an operand stack that holds the operands of computations and their results.

The Java virtual machine requires the operands of each instruction to be pushed onto the operand stack before the instruction executes. When an instruction executes, the Java virtual machine pops the operands the instruction needs and pushes the instruction's result back onto the stack. Take the addition instruction iadd as an example. If the two elements at the top of the stack are int 1 and int 2 before the instruction executes, iadd pops these two ints and pushes their sum, int 3, onto the stack. Because iadd consumes only the top two elements, it does not care about any element further down the stack (the question mark in the figure), let alone modify it.

2. Local variable table

Another important part of a Java method's frame is the local variable area, where bytecode can cache the results of computations. In effect, the Java virtual machine treats the local variable area as an array that holds, in order, the this pointer (non-static methods only), the arguments passed in, and the local variables declared in the bytecode.

As in the operand stack, values of type long and double occupy two slots; values of every other type occupy one.

For example:

public void foo(long l, float f) {
    {
        int i = 0;
    }
    {
        String s = "Hello,World";
    }
}

Corresponding diagram:

  • The first slot holds this
  • The second variable, l, is a long and occupies two slots
  • The third variable, f, is a float and occupies one slot
  • i and s share the same slot
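
The slot assignment for foo can be reproduced with a few lines of code. This is a simplified sketch that only models the rule stated above (two slots for long and double, one for everything else); SlotLayout and its methods are hypothetical names, not part of any JVM API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that assigns local variable slots for an instance method.
final class SlotLayout {
    // long and double take two slots; every other type takes one.
    static int slotSize(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Returns the first slot index of each parameter; slot 0 is reserved for 'this'.
    static Map<String, Integer> layout(String[] names, Class<?>[] types) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        slots.put("this", next++);
        for (int i = 0; i < names.length; i++) {
            slots.put(names[i], next);
            next += slotSize(types[i]);
        }
        slots.put("<next free>", next); // first slot available to block-local variables
        return slots;
    }

    public static void main(String[] args) {
        // Parameters of foo(long l, float f)
        System.out.println(layout(new String[] { "l", "f" },
                                  new Class<?>[] { long.class, float.class }));
        // prints {this=0, l=1, f=3, <next free>=4}
    }
}
```

The first free slot after the parameters is 4, which is the slot that i and s take turns using because their scopes do not overlap.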

The part of the stack frame most relevant to performance tuning is the local variable table. Variables in the local variable table are also important garbage collection roots: any object referenced directly or indirectly from the local variable table is not collected.

When a method executes, the virtual machine uses the local variable table to pass the argument values to the method's parameters.

Local variable push instructions

A local variable push instruction pushes a value from the local variable table onto the operand stack.

  • These instructions fall broadly into two groups:
    • xload_<n> (x is i, l, f, d, or a; n is 0 to 3)
    • xload (x is i, l, f, d, or a)
    • Note: here, x indicates the data type.
  • The instruction xload_<n> pushes local variable n onto the operand stack, for example iload_1, fload_0, and aload_0. Here aload_<n> pushes an object reference.
  • The instruction xload takes the local variable index as an operand. It is used when the index may be greater than 3, for example iload and fload.

Constant push instructions

Constant push instructions push a constant onto the operand stack. Depending on the data type and the value being pushed, they are divided into the const family, the push family, and the ldc family.

const family: pushes specific constants that are implicit in the instruction itself. The instructions are iconst_<i> (i from -1 to 5), lconst_<l> (l from 0 to 1), fconst_<f> (f from 0 to 2), dconst_<d> (d from 0 to 1), and aconst_null. For instance,

  • iconst_m1 pushes -1 onto the operand stack;
  • iconst_x (x from 0 to 5) pushes x onto the stack;
  • lconst_0 and lconst_1 push the long values 0 and 1, respectively;
  • fconst_0, fconst_1, and fconst_2 push the float values 0, 1, and 2, respectively;
  • dconst_0 and dconst_1 push the double values 0 and 1, respectively;
  • aconst_null pushes null onto the operand stack.

The naming follows a clear pattern. The first character of an instruction mnemonic usually indicates the data type: i for int, l for long, f for float, d for double, and, by convention, a for object references. If an instruction implies its operand, the operand follows an underscore.

push family: consists mainly of bipush and sipush. They differ in the size of the value they take: bipush takes an 8-bit integer as its operand and sipush takes a 16-bit integer; both push that value onto the stack.

ldc family: when none of the above is sufficient, the general-purpose ldc instruction can be used. It takes an 8-bit operand that is an index into the constant pool pointing to an int, float, or String, and pushes the indicated item onto the stack. Similarly, ldc_w takes a 16-bit index (two bytes) and therefore supports a larger index range than ldc. If the item to push is a long or a double, the ldc2_w instruction is used instead, in the same way.

The summary is as follows:

Type                               Instruction   Range
int (boolean, byte, char, short)   iconst        [-1, 5]
                                   bipush        [-128, 127]
                                   sipush        [-32768, 32767]
                                   ldc           any int value
long                               lconst        0, 1
                                   ldc2_w        any long value
float                              fconst        0, 1, 2
                                   ldc           any float value
double                             dconst        0, 1
                                   ldc2_w        any double value
reference                          aconst        null
                                   ldc           String literal, Class literal
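
The int rows of the table amount to a simple decision rule: use the smallest instruction whose range contains the value. The sketch below expresses that rule; ConstantPush and instructionFor are hypothetical names, and the returned text is only a readable label (a real ldc takes a constant pool index, not the value itself).

```java
// Hypothetical helper: picks the shortest instruction that can push a given int constant.
final class ConstantPush {
    static String instructionFor(int value) {
        if (value == -1) return "iconst_m1";
        if (value >= 0 && value <= 5) return "iconst_" + value;
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) return "bipush " + value;
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) return "sipush " + value;
        return "ldc " + value; // label only: the real operand is a constant pool index
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(5));     // prints iconst_5
        System.out.println(instructionFor(6));     // prints bipush 6
        System.out.println(instructionFor(128));   // prints sipush 128
        System.out.println(instructionFor(32768)); // prints ldc 32768
    }
}
```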

Examples (figures): an int example; examples of other types.

Instructions that pop into the local variable table

These instructions pop the top element of the operand stack and store it at a given position in the local variable table; they are used to assign values to local variables. They mainly take the form of store instructions, such as xstore (x is i, l, f, d, or a) and xstore_<n> (x is i, l, f, d, or a; n is 0 to 3).

  • The instruction istore_<n> pops an int from the operand stack and assigns it to local variable n.
  • Because the instruction xstore carries no implicit index, it takes a one-byte operand that specifies the target position in the local variable table.

In general, a store instruction needs an operand that specifies where in the local variable table to put the popped element. To keep instructions as small as possible, however, the dedicated istore_1 instruction stores the popped element at position 1 of the local variable table. Likewise, istore_0, istore_2, and istore_3 pop an element from the top of the operand stack and store it at positions 0, 2, and 3 of the local variable table, respectively.

Because the first few positions of the local variable table are used so often, this approach increases the number of instructions but greatly reduces the size of the generated bytecode. If the local variable table is large and the target slot index is greater than 3, the istore instruction is used with an additional operand that gives the slot index.

Analysis of the foo method

  • The local variable table has at most 5 slots (one for this, two for l, one for f, and one shared by i and s)

  • The two code blocks share a slot: once the first block has finished executing, the second block reuses its slot


2. Arithmetic instructions

  1. Arithmetic instructions perform a particular operation on values from the operand stack (typically two) and push the result back onto the operand stack.

  2. Broadly speaking, arithmetic instructions fall into two kinds: those that operate on integer data and those that operate on floating-point data.

  3. byte, short, char, and boolean types: within each kind, there are dedicated arithmetic instructions for specific Java virtual machine data types. However, there are no arithmetic instructions that directly support byte, short, char, or boolean; operations on these types are carried out with int instructions. Values from arrays of type boolean, byte, short, and char are likewise handled with the corresponding int instructions. The actual types and computational types in the Java virtual machine are:

    Actual type      Computational type   Category
    boolean          int                  1
    byte             int                  1
    char             int                  1
    short            int                  1
    int              int                  1
    float            float                1
    reference        reference            1
    returnAddress    returnAddress        1
    long             long                 2
    double           double               2
  4. Arithmetic can overflow: adding two large positive integers, for example, can produce a negative number. The Java virtual machine does not raise an exception on integer overflow. When processing integer data, only the division instructions (idiv, ldiv) and the remainder instructions (irem, lrem) make the virtual machine throw an ArithmeticException, and only when the divisor is 0.

  5. Rounding modes. Round to nearest: the JVM requires every floating-point result to be rounded to the appropriate precision. An inexact result must be rounded to the representable value closest to the exact value; if two representable values are equally close, the one whose least significant bit is zero is preferred. Round toward zero: when converting a floating-point number to an integer, this mode chooses the value of the target type that is closest to the exact value without being larger in magnitude.

  6. Use of NaN. When a floating-point operation overflows, the result is a signed infinity; when the result of an operation is not mathematically defined, it is NaN. All arithmetic operations that have NaN as an operand return NaN.

    public void method1(){
        int i = 10;
        double j = i / 0.0;
        System.out.println(j);  // Infinity

        double d1 = 0.0;
        double d2 = d1 / 0.0;
        System.out.println(d2); // NaN: not a number
    }
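
The integer side of points 4 and 6 can be checked in the same way. The following is a small sketch with hypothetical class and method names: integer addition wraps around silently, while integer division by zero throws.

```java
// Hypothetical demo: integer overflow wraps around, integer division by zero throws.
final class IntegerArithmetic {
    static int add(int a, int b)    { return a + b; } // integer addition
    static int divide(int a, int b) { return a / b; } // integer division

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1)); // prints -2147483648
        try {
            divide(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException caught");
        }
        System.out.println(1 / 0.0); // prints Infinity: floating-point division does not throw
    }
}
```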

All arithmetic instructions

The arithmetic instructions are:

  • Addition instructions: iadd, ladd, fadd, dadd
  • Subtraction instructions: isub, lsub, fsub, dsub
  • Multiplication instructions: imul, lmul, fmul, dmul
  • Division instructions: idiv, ldiv, fdiv, ddiv
  • Remainder instructions: irem, lrem, frem, drem (rem stands for remainder)
  • Negation instructions: ineg, lneg, fneg, dneg (neg stands for negation)
  • Increment instruction: iinc
  • Bitwise instructions, which are divided into:
    • Shift instructions: ishl, ishr, iushr, lshl, lshr, lushr
    • Bitwise OR instructions: ior, lor
    • Bitwise AND instructions: iand, land
    • Bitwise XOR instructions: ixor, lxor
  • Comparison instructions: dcmpg, dcmpl, fcmpg, fcmpl, lcmp

Case 1:

public void method2(){
    float i = 10;
    float j = -i;
    i = -j;
}

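Bytecode analysis

0 ldc #4 <10.0>   // push the float constant 10.0 onto the operand stack
2 fstore_1        // pop it and store it in local variable 1 (i)
3 fload_1         // push the value of i onto the operand stack
4 fneg            // negate the value at the top of the stack
5 fstore_2        // pop the result and store it in local variable 2 (j)
6 fload_2         // push the value of j onto the operand stack
7 fneg            // negate the value at the top of the stack
8 fstore_1        // pop the result and store it in local variable 1, updating i
9 return          // return from the method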

Case 2

Increment and addition operations

Example of an XOR operation

A static method has no this in its local variable table

  • So the method parameter i occupies index 0

Prefix ++ versus postfix ++

  • When no assignment is involved, the bytecode is the same
    public void method6(){
        int i = 10;
        //i++;
        ++i;
    }
    0 bipush 10
    2 istore_1
    3 iinc 1 by 1
    6 return
  • When assignment is involved
    public void method7(){
        int i = 10;
        int a = i++;
    
        int j = 20;
        int b = ++j;
    }
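    • With i++, the value is loaded first and the variable is incremented afterwards, so a is 10
    • With ++j, the variable is incremented first and its value is loaded afterwards, so b is 21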
     0 bipush 10
     2 istore_1
     3 iload_1
     4 iinc 1 by 1
     7 istore_2
     8 bipush 20
    10 istore_3
    11 iinc 3 by 1
    14 iload_3
    15 istore 4
    17 return

Comparison instructions

  • A comparison instruction compares the two elements at the top of the stack and pushes the result onto the stack.
  • The comparison instructions are dcmpg, dcmpl, fcmpg, fcmpl, and lcmp.
    • As with the earlier instructions, the first character d stands for double, f for float, and l for long.
  • For double and float, there are two versions of the comparison instruction because of NaN. For float, these are fcmpg and fcmpl. They differ only in the result they produce when a NaN value is encountered during the comparison.
  • The instructions dcmpl and dcmpg work the same way for double; their meaning can be inferred from their names, so they are not described further here.
  • The lcmp instruction is for long values. Because long has no NaN value, a single instruction is enough.

For example, fcmpg and fcmpl both pop two operands from the stack and compare them. Let v2 be the element at the top of the stack and v1 the element just below it. If v1 = v2, 0 is pushed; if v1 > v2, 1 is pushed; if v1 < v2, -1 is pushed. The two instructions differ only when a NaN value is encountered: fcmpg pushes 1, whereas fcmpl pushes -1.
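
The rule just described can be written down as two small Java methods. This is a simulation for illustration only; FloatCompare and its methods are hypothetical names that mirror the instruction mnemonics.

```java
// Hypothetical simulation of the fcmpg and fcmpl results described above.
final class FloatCompare {
    // Shared ordering when neither operand is NaN.
    private static int ordered(float v1, float v2) {
        return v1 > v2 ? 1 : (v1 == v2 ? 0 : -1);
    }

    static int fcmpg(float v1, float v2) {
        if (Float.isNaN(v1) || Float.isNaN(v2)) return 1;  // NaN makes fcmpg push 1
        return ordered(v1, v2);
    }

    static int fcmpl(float v1, float v2) {
        if (Float.isNaN(v1) || Float.isNaN(v2)) return -1; // NaN makes fcmpl push -1
        return ordered(v1, v2);
    }

    public static void main(String[] args) {
        System.out.println(fcmpg(1f, 2f));        // prints -1
        System.out.println(fcmpl(1f, 2f));        // prints -1
        System.out.println(fcmpg(1f, Float.NaN)); // prints 1
        System.out.println(fcmpl(1f, Float.NaN)); // prints -1
    }
}
```

Having both variants lets a compiler pick the one for which any comparison involving NaN ends up evaluating to false.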

3. Type conversion instructions

1. About type conversion instructions

  • Type conversion instructions convert between two different numeric types (mainly the seven primitive types other than boolean).
  • These conversions are generally used to implement explicit type conversions in user code, or to handle cases where the type-specific instructions of the bytecode instruction set do not map one-to-one onto data types.

Widening type conversion (Widening Numeric Conversions)

  1. Conversion rules:

The Java virtual machine directly supports the following widening numeric conversions (safe conversions from a smaller-range type to a larger-range type). That is, no explicit cast is required in the source code. They include:

  • From int to long, float, or double. The corresponding instructions are i2l, i2f, and i2d
  • From long to float or double. The corresponding instructions are l2f and l2d
  • From float to double. The corresponding instruction is f2d

In short: int -> long -> float -> double

  2. Precision loss
  • A widening conversion never loses information by exceeding the maximum value of the target type. For example, converting from int to long, or from int to double, loses no information at all: the value before and after the conversion is exactly the same.
  • Converting from int or long to float, or from long to double, may lose precision: some of the least significant bits of the value may be lost, and the resulting floating-point value is the integer value correctly rounded under IEEE 754 round-to-nearest mode.
  • Although a widening conversion can in fact lose precision, it never causes the Java virtual machine to throw a runtime exception.
    @Test
    public void upCast2(){
        int i = 123123123;
        float f = i;
        System.out.println(f); // 1.2312312E8: precision is lost

        long l = 123123123L;
        l = 123123123123123123L;
        double d = l;
        System.out.println(d); // the low-order digits are lost: precision is lost
    }

3. Additional note: widening conversions from byte, char, and short to int do not actually exist as instructions. When a byte is converted to int, the virtual machine performs no real conversion; it simply passes the value through the operand stack. When a byte is converted to long, i2l is used, which shows that the byte is already treated as an int internally; short is handled in the same way. This approach has two characteristics:

  • On the one hand, it reduces the number of actual data types. If a separate set of instructions were provided for short and for byte, the number of instructions would grow considerably, yet the virtual machine is designed to encode opcodes in a single byte, so the total cannot exceed 256. Treating short and byte as int to save instructions therefore makes sense.
  • On the other hand, because a slot in the local variable table is fixed at 32 bits, a byte or a short stored in the local variable table occupies 32 bits either way. From this point of view, too, there is no need to distinguish these types.

Narrowing type conversion (Narrowing Numeric Conversions)

  1. Conversion rules

The Java virtual machine also directly supports the following narrowing type conversions:

  • From int to byte, short, or char. The corresponding instructions are i2b, i2s, and i2c
  • From long to int. The corresponding instruction is l2i
  • From float to int or long. The corresponding instructions are f2i and f2l
  • From double to int, long, or float. The corresponding instructions are d2i, d2l, and d2f
  • There is no s2b instruction; converting a short to a byte is handled with i2b
     public void downCast3(){
        short s = 10;
        byte b = (byte)s;
    }
    
    0 bipush 10
    2 istore_1
    3 iload_1
    4 i2b
    5 istore_2
    6 return
  2. Precision loss

A narrowing conversion can produce a result with a different sign or a different order of magnitude, so it is likely to lose numeric precision. Although a narrowing conversion can overflow the upper bound, underflow the lower bound, or lose precision, the Java Virtual Machine specification explicitly states that a narrowing numeric conversion never causes the virtual machine to throw a runtime exception.

@Test
public void downCast4(){
    int i = 128;
    byte b = (byte)i;
    System.out.println(b); // -128, because a byte holds at most 127
}
  3. Additional notes

3.1 When a floating-point value is narrowed to an integer type T (T is either int or long), the following rules apply:

  • If the floating-point value is NaN, the result is 0 of type int or long.
  • If the floating-point value is not infinite, it is rounded toward zero under IEEE 754 to obtain an integer value v. If v is within the range of the target type T (int or long), the result is v. Otherwise, the result is the largest positive or the smallest negative integer that T can represent, depending on the sign of v.

3.2 When a double is narrowed to float, the value is rounded to the nearest value representable as float, using round-to-nearest mode. The final result follows these three rules:

  • If the absolute value of the result is too small to be represented as a float, a positive or negative zero of type float is returned.
  • If the absolute value of the result is too large to be represented as a float, a positive or negative infinity of type float is returned.
  • A NaN value of type double is, as specified, converted to a NaN value of type float.
  • Example:
    @Test
    public void downCast5(){
        double d1 = Double.NaN; // 0.0/0.0
        int i = (int)d1;
        System.out.println(d1); // NaN
        System.out.println(i);  // 0

        double d2 = Double.POSITIVE_INFINITY; // Infinity
        long l = (long)d2;
        int j = (int)d2;
        System.out.println(l);                 // 9223372036854775807
        System.out.println(Long.MAX_VALUE);    // 9223372036854775807
        System.out.println(j);                 // 2147483647
        System.out.println(Integer.MAX_VALUE); // 2147483647

        float f = (float)d2;
        System.out.println(f);  // Infinity
        float f1 = (float)d1;
        System.out.println(f1); // NaN
    }

4. Object creation and access instructions

Java is an object-oriented programming language, and the virtual machine supports object orientation in depth at the bytecode level. A series of instructions is dedicated to object operations; they can be further divided into creation instructions, field access instructions, array operation instructions, and type checking instructions.

Creation instructions

Although both class instances and arrays are objects, the Java virtual machine creates and manipulates them with different bytecode instructions:

  1. Instruction for creating class instances:
  • new
    • It takes one operand, an index into the constant pool that identifies the type to create. When it completes, it pushes a reference to the new object onto the stack.

  2. Instructions for creating arrays:
  • newarray, anewarray, multianewarray
    • newarray: creates an array of a primitive type
    • anewarray: creates an array of a reference type
    • multianewarray: creates a multidimensional array
    • An array such as strArray, declared with a two-dimensional type but allocated with only its first dimension (for example new String[10][]), is created with anewarray rather than multianewarray, because only a one-dimensional array is created in the heap

These creation instructions are used to create objects and arrays. Because objects and arrays are used so widely in Java, these instructions appear very frequently.
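
The following sketch lines up typical array creation expressions with the instruction each one is expected to compile to. The class and method names are hypothetical, and the instruction named in each comment is what javac normally emits; it can be confirmed by running javap -c on the compiled class.

```java
// Hypothetical demo: one method per array creation instruction.
final class ArrayCreation {
    static int[] primitiveArray()    { return new int[10]; }     // newarray int
    static Object[] referenceArray() { return new Object[10]; }  // anewarray java/lang/Object
    static int[][] fullMatrix()      { return new int[10][10]; } // multianewarray, 2 dimensions
    static String[][] firstDimOnly() { return new String[10][]; } // anewarray of String[] elements

    public static void main(String[] args) {
        System.out.println(fullMatrix()[0].length);  // prints 10: inner arrays were created
        System.out.println(firstDimOnly()[0]);       // prints null: inner arrays do not exist yet
    }
}
```

The last two methods show the distinction: multianewarray allocates every dimension that has a size, whereas new String[10][] only allocates one array whose elements are null references.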

Field access instructions

Once an object has been created, its fields, or the elements of an array instance, can be read and written through the object access instructions.

  • Instructions that access class fields (static fields, also called class variables): getstatic, putstatic
  • Instructions that access instance fields (non-static fields, also called instance variables): getfield, putfield

For example, the getstatic instruction takes one operand, an index to a Fieldref entry in the constant pool. It fetches the object or value that the Fieldref identifies and pushes it onto the operand stack.

public void sayHello(){
    System.out.println("hello");
}

Corresponding bytecode instructions:

0 getstatic #8 <java/lang/System.out> 
3 ldc #9 <hello>
5 invokevirtual #10 <java/io/PrintStream.println>
8 return

Comparing getstatic and putstatic with getfield and putfield
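
A minimal sketch that exercises all four instructions is shown here. FieldAccessDemo and its members are hypothetical names; the comments state which instructions a Java compiler is expected to use for each access, which can be checked with javap -c.

```java
// Hypothetical demo: static fields use getstatic/putstatic, instance fields use getfield/putfield.
final class FieldAccessDemo {
    static int total; // class (static) field
    int count;        // instance field

    void increment() {
        count = count + 1; // read with getfield, written with putfield (object reference on the stack)
        total = total + 1; // read with getstatic, written with putstatic (no object reference needed)
    }

    public static void main(String[] args) {
        FieldAccessDemo a = new FieldAccessDemo();
        FieldAccessDemo b = new FieldAccessDemo();
        a.increment();
        b.increment();
        System.out.println(a.count + " " + b.count + " " + total); // prints 1 1 2
    }
}
```

The instance field access needs the object reference (this) on the operand stack first, while the static field access names the class through the constant pool only.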

Array operation instructions

  1. The array operation instructions are mainly xastore and xaload. Specifically:
  • Instructions that load an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
  • Instructions that store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore

That is:

Array type       Load instruction   Store instruction
byte (boolean)   baload             bastore
char             caload             castore
short            saload             sastore
int              iaload             iastore
long             laload             lastore
float            faload             fastore
double           daload             dastore
reference        aaload             aastore

  • Instruction that takes the length of an array: arraylength
    • This instruction pops the array reference at the top of the stack, obtains the length of the array, and pushes the length onto the stack.

  2. Description
  • The xaload instructions push an array element onto the stack. For example, saload and caload push an element of a short array and of a char array, respectively. When an xaload instruction executes, the top of the operand stack must hold the array index i and the element below it must hold the array reference a. The instruction pops these two elements and pushes a[i] onto the stack.
  • The xastore instructions store into array elements. iastore, for example, assigns a value to a given index of an int array. Before iastore executes, three elements must be in place at the top of the operand stack: the value (on top), the index, and the array reference. iastore pops these three elements and stores the value at the given index of the array.
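
The stack discipline of iaload and iastore can be simulated with a Deque standing in for the operand stack. This is an illustrative sketch only; ArrayOps and its methods are hypothetical names that mirror the instruction mnemonics.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation of iastore and iaload on an explicit operand stack.
final class ArrayOps {
    // iastore: ..., arrayref, index, value -> ...
    static void iastore(Deque<Object> stack) {
        int value = (Integer) stack.pop();   // top of stack
        int index = (Integer) stack.pop();
        int[] array = (int[]) stack.pop();
        array[index] = value;
    }

    // iaload: ..., arrayref, index -> ..., value
    static void iaload(Deque<Object> stack) {
        int index = (Integer) stack.pop();   // top of stack
        int[] array = (int[]) stack.pop();
        stack.push(array[index]);
    }

    public static void main(String[] args) {
        Deque<Object> stack = new ArrayDeque<>();
        int[] a = new int[3];

        stack.push(a); stack.push(1); stack.push(42);
        iastore(stack);                      // a[1] = 42

        stack.push(a); stack.push(1);
        iaload(stack);                       // pushes a[1]
        System.out.println(stack.pop());     // prints 42
    }
}
```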

Type checking instructions

Instructions that check the type of a class instance or array: instanceof, checkcast.

  • The checkcast instruction checks whether a cast is allowed. If it is, checkcast leaves the operand stack unchanged; otherwise it throws a ClassCastException.
  • The instanceof instruction determines whether a given object is an instance of a given class and pushes the result onto the operand stack.
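
In source code these two instructions correspond to the instanceof operator and to a reference cast. The sketch below uses hypothetical class and method names to show the difference in behavior: one yields a boolean, the other either passes the reference through or throws.

```java
// Hypothetical demo: instanceof yields a boolean, a failed cast throws ClassCastException.
final class TypeChecks {
    static boolean isString(Object o) { return o instanceof String; } // instanceof
    static String asString(Object o)  { return (String) o; }          // checkcast

    public static void main(String[] args) {
        System.out.println(isString("hi")); // prints true
        System.out.println(isString(42));   // prints false
        System.out.println(isString(null)); // prints false
        try {
            asString(42);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught");
        }
    }
}
```

A null reference is a useful edge case: instanceof reports false for it, while a cast of null succeeds and simply yields null.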