
Preface

Some time ago, studying algorithms left me dizzy, so over the past few days I picked up the book In-depth Understanding of Java Virtual Machine again, this time mainly reading its third part. I have always believed that knowledge has to be summarized continually and that different topics reinforce one another; only then can I apply what I have learned and build a solid foundation.

Concepts

Bytecode

Bytecode refers to the fixed-format .class files that the compiler (javac) generates from .java files for the JVM to execute. They are called bytecode files because the JVM reads them one byte at a time; in a hex dump, each byte appears as a group of two hexadecimal digits. JVM implementations exist for different operating systems and platforms, which is the basis of Java's claim of "compile once, run anywhere".

This has a further consequence: thanks to the JVM specification, any bytecode file that conforms to the specification can run on the JVM, whatever produced it. This is what makes other JVM languages (Scala, Kotlin, Groovy) possible, and those languages can offer features and syntactic sugar that Java does not have.

Bytecode enhancement

Bytecode enhancement means modifying Java bytecode after it has been generated in order to extend its functionality, which amounts to modifying the application's binary files. It is mainly used to reduce redundant code and to shield developers from underlying implementation details.

Implementation mechanisms

  • Extend the methods of the original class by dynamically creating a subclass that inherits from it.
  • Directly modify the class file generated for the original class. Many class-tracing tools use this approach, modifying the bytecode either at class-load time or at runtime.
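The first mechanism can be pictured with an ordinary hand-written subclass. This is only a sketch of the idea: a real tool would generate the subclass at runtime, and the names Service and EnhancedService are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SubclassEnhancementSketch {
    /** The original class, left untouched. */
    public static class Service {
        public String operation() {
            return "operation";
        }
    }

    /** Hand-written stand-in for a subclass that a tool would generate at runtime. */
    public static class EnhancedService extends Service {
        public final List<String> log = new ArrayList<>();

        @Override
        public String operation() {
            log.add("start");             // logic added before the original method
            try {
                return super.operation(); // delegate to the original implementation
            } finally {
                log.add("end");           // logic added after the original method
            }
        }
    }

    public static void main(String[] args) {
        EnhancedService s = new EnhancedService();
        System.out.println(s.operation() + " " + s.log); // prints "operation [start, end]"
    }
}
```

Because this approach relies on inheritance, it cannot enhance final classes or final methods; modifying the class file directly does not have that restriction.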

Basics

Bytecode format

A .java file is compiled (javac) to produce a .class file.

As shown below, the original code is on the left and the compiled bytecode is on the right:

Bytecode file parsing:

  • Magic number: The first four bytes of every class file are the magic number, which identifies the file as a class file that the virtual machine can accept. The magic number has the fixed value 0xCAFEBABE.

Interestingly, the fixed value was chosen by James Gosling, the father of Java, to spell CafeBabe (coffee baby), and Java's icon is a cup of coffee.

  • Version number: The four bytes after the magic number. The first two bytes are the minor version (minor_version) and the last two are the major version (major_version). Converting each two-byte hexadecimal value to decimal gives the corresponding version number; for example, a major version of 0x0034 is 52, which corresponds to Java 8.

  • Constant pool: The size of the constant pool is not fixed; it depends on the number of constants in the class. It begins with a two-byte count (constant_pool_count). Converting that value to decimal and subtracting one gives the number of constant pool slots, because entries are indexed from 1 up to the count minus one.

    Constant pool entry types:

    Constant type                      Tag   Description
    CONSTANT_Utf8_info                 1     UTF-8 encoded string
    CONSTANT_Integer_info              3     Integer literal
    CONSTANT_Float_info                4     Floating-point literal
    CONSTANT_Long_info                 5     Long integer literal
    CONSTANT_Double_info               6     Double-precision floating-point literal
    CONSTANT_Class_info                7     Symbolic reference to a class or interface
    CONSTANT_String_info               8     String literal
    CONSTANT_Fieldref_info             9     Symbolic reference to a field
    CONSTANT_Methodref_info            10    Symbolic reference to a method of a class
    CONSTANT_InterfaceMethodref_info   11    Symbolic reference to a method of an interface
    CONSTANT_NameAndType_info          12    Partial symbolic reference to a field or method (name and type)
    CONSTANT_MethodHandle_info         15    Method handle
    CONSTANT_MethodType_info           16    Method type
    CONSTANT_InvokeDynamic_info        18    Dynamic method call site

    Constant pool distribution:

  • Access flags: The two bytes after the constant pool, describing whether the class file defines a class or an interface and whether it is declared public, abstract, final, and so on. The access flags are:

    Flag name        Value    Meaning
    ACC_PUBLIC       0x0001   Declared public
    ACC_PRIVATE      0x0002   Declared private
    ACC_FINAL        0x0010   Declared final; can be set only for classes
    ACC_SUPER        0x0020   Use the new semantics of the invokespecial bytecode instruction. The semantics of invokespecial changed in JDK 1.0.2; to distinguish which semantics apply, this flag is set for all classes compiled after JDK 1.0.2
    ACC_INTERFACE    0x0200   An interface
    ACC_ABSTRACT     0x0400   Declared abstract; set for interfaces and abstract classes, clear for all other classes
    ACC_SYNTHETIC    0x1000   Not generated from user code
    ACC_ANNOTATION   0x2000   An annotation
    ACC_ENUM         0x4000   An enumeration
  • Current class index: The two bytes after the access flags. They hold an index into the constant pool, through which the fully qualified name of the current class can be found.

  • Superclass index: The two bytes after the current class index. Like the current class index, they hold a constant pool index, here leading to the fully qualified name of the parent class.

  • Interface index: A two-byte interface counter follows the superclass index and gives the number of interfaces directly implemented by the class. The next 2 × n bytes hold, for each of the n interfaces, the constant pool index that leads to its name.

  • Field table: The field table describes the variables declared in a class or interface, including class-level variables and instance variables, but not local variables declared inside methods.

  • Method table: Follows the field table and likewise has two parts. The first part is two bytes giving the number of methods; the second part holds the detailed information for each method.

  • Attribute table: The last part of the class file, holding basic information about the attributes that belong to the class or interface defined in the file.
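The layout in the list above can be walked with a DataInputStream, which reads big-endian unsigned values in the same way the class file stores them. The following is a minimal sketch, not a full parser: it stops after the interface counter, and the class, field, and method names (ClassHeaderSketch, Header, parse, describe) are this example's own. Tags 17, 19, and 20 are not in the table above; they were added to the format by later Java versions and are skipped here so that newer class files do not fail.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassHeaderSketch {
    /** Items read from the front of a class file; the field names are this sketch's own. */
    public static class Header {
        public int minor, major, constantPoolCount, accessFlags, thisClass, superClass, interfacesCount;
    }

    public static Header parse(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != 0xCAFEBABE) {                 // u4 magic number
            throw new IOException("not a class file");
        }
        Header h = new Header();
        h.minor = in.readUnsignedShort();                 // u2 minor version
        h.major = in.readUnsignedShort();                 // u2 major version
        h.constantPoolCount = in.readUnsignedShort();     // u2 constant pool count
        // Entries are indexed from 1 to constantPoolCount - 1; skip each one by its tag.
        for (int i = 1; i < h.constantPoolCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: skip(in, in.readUnsignedShort()); break;      // Utf8: u2 length, then bytes
                case 3: case 4: skip(in, 4); break;                   // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;              // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                case 9: case 10: case 11: case 12: case 17: case 18: skip(in, 4); break;
                case 15: skip(in, 3); break;                          // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        h.accessFlags = in.readUnsignedShort();
        h.thisClass = in.readUnsignedShort();             // constant pool index
        h.superClass = in.readUnsignedShort();            // constant pool index
        h.interfacesCount = in.readUnsignedShort();
        return h;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    /** Decodes a few of the access flag bits into words. */
    public static String describe(int flags) {
        StringBuilder sb = new StringBuilder();
        if ((flags & 0x0001) != 0) sb.append("public ");
        if ((flags & 0x0010) != 0) sb.append("final ");
        if ((flags & 0x0200) != 0) sb.append("interface ");
        if ((flags & 0x0400) != 0) sb.append("abstract ");
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a .class file as the only argument.
        Header h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("version " + h.major + "." + h.minor + ", flags: " + describe(h.accessFlags));
    }
}
```

Note that the access flags are a bit mask, so several flags can be set at once; a value of 0x0021, for instance, is ACC_PUBLIC combined with ACC_SUPER.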

Format definitions used in bytecode files:

  • u2 and u4 denote two bytes and four bytes respectively.
  • The pseudo-structure of a class file has only two kinds of data: unsigned numbers and tables.

    Unsigned numbers are the basic data types; u2 and u4 are unsigned numbers of two and four bytes respectively. The remaining items, cp_info, field_info, method_info, and attribute_info, are tables.

Common bytecode tools

  • IDEA plugin for viewing decompiled bytecode: jclasslib
  • IDEA plugin: ASM Bytecode Outline

Bytecode enhancement

Bytecode enhancement is a family of techniques for modifying existing bytecode or dynamically generating new bytecode files.

JDK dynamic proxy

A JDK dynamic proxy uses reflection to generate, at runtime, a proxy class that implements the target's interfaces. Each call on the proxy is routed to an InvocationHandler, which handles it before the concrete method is invoked. The interface:

public interface Demo {
	public int add(int x, int y);	
}

Interface implementation class:

public class DemoImpl implements Demo {
	@Override
	public int add(int x, int y) {
		return x + y;
	}
}

The proxy handler class:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class JdkProxyFactory implements InvocationHandler {
    private Object target; // The object being proxied
    public JdkProxyFactory(Object target) {
        this.target = target; // The proxied object is passed in when the factory is constructed
    }
    public Object createProxy() {
        // Three arguments: class loader, interfaces to implement, invocation handler
        return Proxy.newProxyInstance(target.getClass().getClassLoader(),
        	target.getClass().getInterfaces(), this);
    }
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println("jdk proxy invoke test!");
        return method.invoke(target, args); // Forward the call to the proxied object
    }
}

Disadvantage: JDK dynamic proxies require the target class to implement an interface; a class that implements no interface cannot be proxied this way. The same applies to the AOP implementation in Spring, which falls back to CGLIB dynamic proxies when the target implements no interface.
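To see a dynamic proxy in action, here is a minimal self-contained sketch using only java.lang.reflect. The names Calculator, CalculatorImpl, wrap, and calls are hypothetical and exist only for this illustration; the handler records each method name instead of printing.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyUsageSketch {
    public interface Calculator {
        int add(int x, int y);
    }

    public static class CalculatorImpl implements Calculator {
        @Override
        public int add(int x, int y) {
            return x + y;
        }
    }

    /** Method names seen by the handler, in call order. */
    public static final List<String> calls = new ArrayList<>();

    /** Wraps the target in a proxy whose handler records the call and then forwards it. */
    public static Calculator wrap(Calculator target) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName());
            return method.invoke(target, args);
        };
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] { Calculator.class },
                handler);
    }

    public static void main(String[] args) {
        Calculator c = wrap(new CalculatorImpl());
        System.out.println(c.add(1, 2));                       // prints 3
        System.out.println(Proxy.isProxyClass(c.getClass()));  // prints true
        System.out.println(calls);                             // prints [add]
    }
}
```

The cast to Calculator works only because the proxy class implements that interface, which is exactly why a target without interfaces cannot be handled this way.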

ASM

ASM can modify .class bytecode files directly, or dynamically change a class's behavior before the class is loaded into the JVM.

The process is as follows: a ClassReader reads the original bytecode, visitors process it, and a ClassWriter produces the new bytecode, which replaces the original.

Core API

  • ClassReader: Reads an already compiled .class file.
  • ClassWriter: Rebuilds a compiled class, for example changing the class name, fields, or methods, and generates the bytecode for the new class.
  • Visitor classes: The core API processes the bytecode from top to bottom, with a different visitor for each region of the bytecode file, such as MethodVisitor for methods, FieldVisitor for class variables, AnnotationVisitor for annotations, and so on.
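The reader, visitor, writer pipeline is easier to follow in a toy model. The sketch below does not use the ASM API at all: InsnVisitor, Recorder, StartEndAdapter, and accept are hypothetical names, and "instructions" are plain strings. It only illustrates how a visitor in the middle of the chain can inject extra events while delegating everything else to the next visitor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VisitorChainSketch {
    /** Toy visitor: by default every event is passed on to the next visitor in the chain. */
    public static class InsnVisitor {
        protected final InsnVisitor next;

        public InsnVisitor(InsnVisitor next) {
            this.next = next;
        }

        public void visitCode() {
            if (next != null) next.visitCode();
        }

        public void visitInsn(String insn) {
            if (next != null) next.visitInsn(insn);
        }
    }

    /** End of the chain (the "writer" role): records every instruction it receives. */
    public static class Recorder extends InsnVisitor {
        public final List<String> out = new ArrayList<>();

        public Recorder() {
            super(null);
        }

        @Override
        public void visitInsn(String insn) {
            out.add(insn);
        }
    }

    /** Middle of the chain: injects extra instructions, then delegates. */
    public static class StartEndAdapter extends InsnVisitor {
        public StartEndAdapter(InsnVisitor next) {
            super(next);
        }

        @Override
        public void visitCode() {
            super.visitCode();
            next.visitInsn("print start");   // injected at the start of the method body
        }

        @Override
        public void visitInsn(String insn) {
            if (insn.equals("return")) {
                next.visitInsn("print end"); // injected just before the return
            }
            next.visitInsn(insn);
        }
    }

    /** The "reader" role: replays an existing instruction list into a visitor. */
    public static void accept(List<String> insns, InsnVisitor visitor) {
        visitor.visitCode();
        for (String insn : insns) {
            visitor.visitInsn(insn);
        }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        accept(Arrays.asList("print operation A ...", "return"), new StartEndAdapter(recorder));
        System.out.println(recorder.out);
        // prints [print start, print operation A ..., print end, return]
    }
}
```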

Implementing AOP with ASM

The original base class:

public class A {
	public void operation() {
		System.out.println("operation A ...");
	}
}

Custom visitor classes:

import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class MyClassVisitor extends ClassVisitor implements Opcodes {
    public MyClassVisitor(ClassVisitor cv) { super(ASM5, cv); }
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
        // Class A has two methods: the no-argument constructor and operation(). The constructor is not enhanced here
        if (!name.equals("<init>") && mv != null) {
            mv = new MyMethodVisitor(mv);
        }
        return mv;
    }

    class MyMethodVisitor extends MethodVisitor implements Opcodes {
        public MyMethodVisitor(MethodVisitor mv) { super(Opcodes.ASM5, mv); }
        @Override
        public void visitCode() {
            super.visitCode();
            mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitLdcInsn("start"); // Print "start" at the beginning of the method
            mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        }
        @Override
        public void visitInsn(int opcode) {
            if ((opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN) || opcode == Opcodes.ATHROW) {
                mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
                mv.visitLdcInsn("end"); // Print "end" before the method returns
                mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
            }
            System.out.println(opcode); // Debug output while the class is being rewritten
            mv.visitInsn(opcode);
        }
    }
}

Main class:

import java.io.File;
import java.io.FileOutputStream;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;

public class Generator {
    public static void main(String[] args) throws Exception {
        // Read
        ClassReader classReader = new ClassReader("cn.vgbhfive.bytecodedemo.A");
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        // Process
        ClassVisitor classVisitor = new MyClassVisitor(classWriter);
        classReader.accept(classVisitor, ClassReader.SKIP_DEBUG);
        byte[] data = classWriter.toByteArray();
        // Output
        File f = new File("/bin/cn/vgbhfive/bytecodedemo/A.class");
        try (FileOutputStream fout = new FileOutputStream(f)) {
            fout.write(data);
        }
        System.out.println("generator A success!!!!!");
    }
}

Output:

start
operation A ...
end

Javassist

ASM works with bytecode at the instruction level, and the most immediate impression after getting started with it is that a framework operating at that level is rather obscure to use. There is another kind of framework: Javassist, which emphasizes manipulating bytecode at the source level.

When you implement bytecode enhancement with Javassist, you do not need to care about the rigid structure of bytecode; its advantage is that it is simple and quick to program with. You can change the structure of a class or generate a class dynamically by writing Java source code, without needing to understand virtual machine instructions.

Core API

  • CtClass (compile-time class): Compile-time class information, an abstract representation of a class file in code. A CtClass object representing a class file can be obtained through the class's fully qualified name.
  • ClassPool: From a development perspective, a ClassPool is a HashMap that stores CtClass information; the key is the class name and the value is the CtClass object for that name.
  • CtMethod: A method in a class.
  • CtField: A field in a class.

Demo

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

public class JavassistTest {
    public static void main(String[] args) throws Exception {
        ClassPool cp = ClassPool.getDefault();
        CtClass aa = cp.get("cn.vgbhfive.bytecodedemo.A");
        CtMethod m = aa.getDeclaredMethod("operation");
        m.insertBefore("{ System.out.println(\"start\"); }"); // Runs at the start of operation()
        m.insertAfter("{ System.out.println(\"end\"); }");    // Runs before operation() returns
        Class<?> c = aa.toClass();
        aa.writeFile("F://workSpace/projects");
        A a = (A) c.newInstance();
        a.operation();
    }
}

Output:

start
operation A ...
end

Reloading classes at runtime

Above, we saw how to modify the methods of a class through its bytecode. This raises the next question: how do we reload the modified bytecode in a JVM that is already running?

Instrument

Instrument is a library provided by the JVM for modifying classes that have already been loaded; it supports instrumentation (probe-insertion) services written in Java. It relies on the Attach API mechanism of JVMTI.

To use Instrument's class-modification capability, implement its ClassFileTransformer interface to define a class file transformer. The interface's transform() method is called when a class file is loaded; inside it, the bytecode passed in can be rewritten or replaced with ASM or Javassist as described above, and the resulting new bytecode array is returned.
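The contract of transform() can be tried out without any bytecode library. The sketch below uses only the JDK; the class name FilteringTransformerSketch is hypothetical, and the "rewrite" is just a copy of the input bytes standing in for the output of ASM or Javassist. It shows two details of the interface: the className argument uses '/' separators rather than dots, and returning null tells the JVM to leave the class unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class FilteringTransformerSketch implements ClassFileTransformer {
    private final String targetInternalName;

    /** Takes a dotted class name and stores it in the '/'-separated form that transform() receives. */
    public FilteringTransformerSketch(String targetClassName) {
        this.targetInternalName = targetClassName.replace('.', '/');
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // transform() is called for every class, so ignore all but the target.
        if (!targetInternalName.equals(className)) {
            return null; // null means "no change"
        }
        // Placeholder: a real transformer would return rewritten bytecode here.
        return classfileBuffer.clone();
    }

    public static void main(String[] args) {
        FilteringTransformerSketch t = new FilteringTransformerSketch("cn.vgbhfive.bytecodedemo.A");
        byte[] fake = { 1, 2, 3 }; // stand-in bytes, not a real class file
        System.out.println(t.transform(null, "java/lang/String", null, null, fake) == null);           // prints true
        System.out.println(t.transform(null, "cn/vgbhfive/bytecodedemo/A", null, null, fake).length);  // prints 3
    }
}
```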

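A new class file transformer:

import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

public class TestTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // transform() is called for every class; only rewrite A (className uses '/' separators)
        if (!"cn/vgbhfive/bytecodedemo/A".equals(className)) {
            return null; // null leaves the class unchanged
        }
        try {
            ClassPool cp = ClassPool.getDefault();
            CtClass aa = cp.get("cn.vgbhfive.bytecodedemo.A");
            CtMethod m = aa.getDeclaredMethod("operation");
            m.insertBefore("{ System.out.println(\"start\"); }");
            m.insertAfter("{ System.out.println(\"end\"); }");
            return aa.toBytecode();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}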

With the class file transformer in place, we still need an agent, which is how Instrument gets injected into the JVM. When the agent is attached to a JVM, it performs the bytecode replacement and the JVM reloads the class.

import java.lang.instrument.Instrumentation;

import cn.vgbhfive.bytecodedemo.A;

public class TestAgent {
    public static void agentmain(String args, Instrumentation inst) {
        // Register our own transformer, which uses Javassist for the bytecode replacement;
        // 'true' allows it to retransform classes that are already loaded
        inst.addTransformer(new TestTransformer(), true);
        try {
            inst.retransformClasses(A.class); // Retransform the loaded class so the new bytecode takes effect
            System.out.println("Agent Load Done.");
        } catch (Exception e) {
            System.out.println("agent load failed!");
        }
    }
}

JVMTI (JVM Tool Interface) is a set of tool interfaces that the JVM provides for operating on the JVM itself. Many kinds of operations are possible through JVMTI. For example, event hooks can be registered through the interface; when a JVM event occurs, the corresponding predefined hook is fired, which lets a tool respond to individual JVM events. Events include class file loading, exceptions being thrown and caught, thread start and end, entering and leaving critical sections, member variable modification, GC start and end, method entry and exit, contention and waiting on critical sections, VM start and exit, and so on.

The Attach API provides communication between JVM processes, for example letting one process load an agent into another JVM that is already running.

This is not the focus of this article, so I will not go into detail. If you are interested, you can look it up yourself.
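As a rough sketch of how an agent gets into a running JVM, the Attach API's VirtualMachine class can be used from a separate process. The class name AttachSketch and its command-line arguments are this example's own; it assumes a JDK (not a bare JRE) so that com.sun.tools.attach is available, and it assumes the agent jar's manifest declares Agent-Class, plus Can-Retransform-Classes: true if the agent calls retransformClasses.

```java
import com.sun.tools.attach.VirtualMachine;

public class AttachSketch {
    /** Attaches to the JVM with the given process id and asks it to load the agent jar. */
    public static void attachAndLoad(String pid, String agentJarPath) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJarPath); // the target JVM then invokes the agent's agentmain method
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: AttachSketch <pid> <agent-jar>");
            return;
        }
        attachAndLoad(args[0], args[1]);
    }
}
```

The process id of the target JVM can be found with a tool such as jps.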


Common scenarios

  • Hot deployment: Modify a live service without redeploying it, for example to add instrumentation points or logging.
  • Mock: Mock certain services during testing.
  • Performance diagnostic tools: For example, BTrace uses Instrument to trace a running JVM non-intrusively and monitor state information at the class and method level.

Conclusion

Programming calls for staying humble and being good at asking questions, and also for finding a way to solve each problem you raise, so that the loop is closed. Repeating this closed loop is how you build your own body of knowledge. Another point is to be good at summarizing what you have learned, so that what you take in and what you put out stay in balance.


References

  • Bytecode enhancement technology exploration
  • "Deep Understanding Java Virtual Machine", part 3
  • Java bytecode technology (2) Bytecode enhancement ASM, JavaAssist, Agent, Instrumentation


Personal note

Everything on this blog is notes the author made while learning. If anything infringes your rights, contact me and it will be removed. If you use the content elsewhere, please cite the source.