• Java object behavior

  • java.lang.instrument.Instrumentation

  • Direct manipulation of bytecode

  • BTrace

  • Arthas

  • Our vision-be


On the distant planet Siesi, in Sevo City in the country of Java, two young programmers are stuck: their program is misbehaving and, for the moment, they cannot see where the problem lies. The following dialogue ensues:

“Debug it.”

“It’s a production machine. No debug port is open.”

“Look at the log. What is the request value and what is the return value?”

“That code didn’t print a log.”

“Change the code, log it, and release it again.”

“I suspect a thread pool problem. Restarting would destroy the evidence.”

After a few dozen seconds of silence: “It is said that the highest level of troubleshooting is to find a problem by reviewing the code.”

After a silence dozens of times longer than the last one: “After reading through that code 17 times, I finally came to a conclusion.”

“The conclusion?”

“I haven’t gotten to the point where I can find problems just by reviewing code.”

Java object behavior

The problem at the beginning of this article is essentially the problem of dynamically changing the behavior of existing objects in memory.

So we need to figure out where in the JVM the behavior of objects lives, and whether it can be changed.

An object describes a thing through two aspects: behavior and properties.

Here’s an example:

public class Person {
    private int age;
    private String name;

    public void speak(String str) {
        System.out.println(str);
    }

    public Person(int age, String name) {
        this.age = age;
        this.name = name;
    }
}

In the Person class above, age and name are properties, and speak is behavior. An object is an instance of a class; the properties of each object belong to that object alone, but the behavior is shared by all objects of the class. For example, let’s say we now create two objects, personA and personB, from the Person class:

Person personA = new Person(43, "lixunhuan");
personA.speak("I am Li Xunhuan");
Person personB = new Person(23, "afei");
personB.speak("I am Afei");

personA and personB each have their own name and age, but share the same behavior: speak. Imagine we were the designers of the Java language: how would we store the behavior and properties of objects?

“It’s very simple. Attributes follow objects, and each object stores a copy. Behavior is a communal thing, pulled out and put in a separate place.”

“Huh? Pulling out the common parts sounds just like code reuse.”

“The great Way is simple, and many paths lead to the same destination.”

In other words, the first step is to find this common place to store the behavior of the object. After some searching, we found this description:

Method area is created on virtual machine startup, shared among all Java virtual machine threads and it is logically part of heap area. It stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors.

Java object behavior (methods, functions) is stored in the method area.
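The split between per-object properties and shared behavior can be observed from plain Java, at least indirectly. The sketch below is only an illustration: `SharedBehaviorDemo`, `shareBehavior`, and `speakMethod` are names invented here, and comparing `Class` objects is a Java-level observation, not a direct view into the method area.

```java
import java.lang.reflect.Method;

class SharedBehaviorDemo {
    // Simplified copy of the article's Person class (speak returns instead of printing)
    static class Person {
        final int age;
        final String name;

        Person(int age, String name) {
            this.age = age;
            this.name = name;
        }

        String speak(String str) {
            return name + " says: " + str;
        }
    }

    // True when both objects are described by the very same Class object,
    // i.e. they share one set of method definitions.
    static boolean shareBehavior(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    // The Method handle is looked up on the class, not on any instance.
    static Method speakMethod() {
        try {
            return Person.class.getDeclaredMethod("speak", String.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Creating `new Person(43, "lixunhuan")` and `new Person(23, "afei")` gives two objects with different field values, while `shareBehavior` reports that both are backed by the same class, and `speakMethod()` is obtained without touching either instance.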

“Where does the data in the method area come from?”

“The data in the method area is extracted from the class file at class loading time.”

“Where does the class file come from?”

“Compiled from Java or other JVM-compliant source code.”

“Where does the source code come from?”

“Silly question. Written by hand, of course!”

“Working backwards: writing the code is not the issue, compiling is not the issue, but loading… Is there a way to reload a class that has already been loaded? If so, we could modify the bytecode of the target method and reload the class, so that the behavior (the method) stored in the method area changes without touching the properties or the state of existing objects. That would solve our problem.

But doesn’t this violate the JVM’s class loading principles? After all, we don’t want to change the ClassLoader.”

“Young man, go take a look at java.lang.instrument.Instrumentation.”

java.lang.instrument.Instrumentation

After reviewing the documentation, we find two methods of interest: redefineClasses and retransformClasses. One redefines a class, the other transforms it. The two are much alike. Here is the description of redefineClasses:

This method is used to replace the definition of a class without reference to the existing class file bytes, as one might do when recompiling from source for fix-and-continue debugging. Where the existing class file bytes are to be transformed (for example in bytecode instrumentation) retransformClasses should be used.

redefineClasses replaces the existing class definition with class file bytes that we supply ourselves, while retransformClasses takes the existing class file bytes, modifies them, and installs the result.

Of course, it is not safe to replace classes directly at run time. Exceptions can be thrown if a new class file references a class that doesn’t exist, or if a field of a class is removed. So, as documented, Instrumentation imposes many restrictions:

The redefinition may change method bodies, the constant pool and attributes. The redefinition must not add, remove or rename fields or methods, change the signatures of methods, or change inheritance. These restrictions maybe be lifted in future versions. The class file bytes are not checked, verified and installed until after the transformations have been applied, if the resultant bytes are in error this method will throw an exception.

What we can do is essentially modify the bodies of existing methods, which is enough for the problem we started with: printing a log. Of course, there are many other useful things we can do with retransformation besides printing logs, which we’ll cover below.
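To make the retransform flow concrete, here is a minimal sketch of an agent that registers a retransformation-capable transformer and asks the JVM to retransform one class. The class name `LoggingAgent`, the target `com/example/Person`, and the pass-through transformer body are assumptions for illustration; a real agent would rewrite the bytes at the marked spot, and its jar manifest would need `Agent-Class` and `Can-Retransform-Classes: true`.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;

class LoggingAgent {
    // Hypothetical target, in the internal (slash-separated) form passed to transformers
    static final String TARGET = "com/example/Person";

    static class TargetTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (!TARGET.equals(className)) {
                return null; // null means "leave this class unchanged"
            }
            // A real agent would modify method bodies here (for example with ASM).
            // This sketch returns an unmodified copy of the bytes.
            return classfileBuffer.clone();
        }
    }

    // Entry point the JVM calls when the agent is loaded into a running process
    public static void agentmain(String args, Instrumentation inst)
            throws UnmodifiableClassException {
        TargetTransformer transformer = new TargetTransformer();
        inst.addTransformer(transformer, true); // true: may be used for retransformation
        try {
            String dottedName = TARGET.replace('/', '.');
            for (Class<?> c : inst.getAllLoadedClasses()) {
                if (c.getName().equals(dottedName) && inst.isModifiableClass(c)) {
                    inst.retransformClasses(c); // re-runs transformers on the loaded class
                }
            }
        } finally {
            inst.removeTransformer(transformer);
        }
    }
}
```

Because the transformer only replaces method bodies in an already loaded class, existing objects keep their fields and state, which is exactly the property the dialogue above was looking for.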

So how do we get the class file we need? The easiest way is to modify the Java source, recompile it to get a new class file, and call redefineClasses to replace the old definition. But what about classes whose source code we don’t have (or can’t obtain, or can’t easily modify)? As far as the JVM is concerned, source code in any JVM-compliant language, whether Java or Scala, is compiled into class files.

The JVM operates on class files, not source code. So, in this sense, we can say that “the JVM is language agnostic.” In this case, with or without the source code, all we really need to do is modify the class file.
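Since everything downstream works on class file bytes, it can help to see that those bytes are ordinary data we can read. The sketch below reads the fixed header of a class file (magic number, minor version, major version) through the class’s own resource stream; `ClassFileHeader` and `readHeader` are names made up for this example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    // Returns {magic, minorVersion, majorVersion} read from the class file of c
    static int[] readHeader(Class<?> c) {
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // 0xCAFEBABE for every valid class file
            int minor = data.readUnsignedShort(); // minor_version
            int major = data.readUnsignedShort(); // major_version
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The header looks the same whether the class was compiled from Java, Scala, or anything else that targets the JVM, which is the sense in which the JVM is language agnostic.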

Direct manipulation of bytecode

Java is a language that software developers can read, class bytecode is a language that the JVM can read, and class bytecode is ultimately interpreted by the JVM as a machine-readable language. Every language is a human creation.

So in theory (and indeed in practice) humans can read any of these languages, and whoever can read them can modify them. We could skip the Java compiler and write bytecode files directly if we wanted to, but that would run against the times: high-level languages were designed to serve us humans, and developing in them is far more efficient than in machine languages.

Bytecode files are much less readable to humans than Java code. Nevertheless, some brilliant programmers have created frameworks for directly editing bytecode, providing interfaces that allow us to easily manipulate bytecode files, inject methods to modify classes, dynamically create a new class, and so on. One of the best known frameworks is ASM, which is used to manipulate bytecodes in cglib, Spring and other frameworks.

As we all know, Spring’s AOP is implemented with dynamic proxies: at runtime Spring dynamically creates proxy classes that hold a reference to the proxied class and perform some mysterious operations before and after the proxied methods are executed.

So how does Spring create proxy classes at run time? The beauty of dynamic proxies is that instead of manually writing proxy class code for every class that needs to be proxied, Spring dynamically creates a class as needed at run time. It does not assemble Java source from strings, compile it into a class file, and then load it; Spring simply “creates” a class file directly and loads it. The tool for creating class files is ASM.
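The idea of a proxy class that is generated at run time and wraps calls with extra work can be tried with the JDK alone. Note that this sketch uses `java.lang.reflect.Proxy`, the JDK’s built-in interface-based proxy mechanism, as a stand-in; it is not ASM and not Spring’s own code. `ProxyDemo`, `Speaker`, `traced`, and `LOG` are illustrative names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ProxyDemo {
    interface Speaker {
        String speak(String str);
    }

    // Records what the proxy did around each call
    static final List<String> LOG = new ArrayList<>();

    // Wraps target in a proxy whose class is generated by the JDK at run time
    static Speaker traced(Speaker target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("before " + method.getName());
            try {
                return method.invoke(target, args); // delegate to the real object
            } finally {
                LOG.add("after " + method.getName());
            }
        };
        return (Speaker) Proxy.newProxyInstance(
                Speaker.class.getClassLoader(),
                new Class<?>[] { Speaker.class },
                handler);
    }
}
```

No source file for the proxy class ever exists: the JDK assembles its bytecode in memory and defines it directly, which is the same shortcut the paragraph above attributes to Spring.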

So far, we know that we can use the ASM framework to directly manipulate the class file, add a piece of code to the class to print the log, and then retransform.

BTrace

So far, we’ve stayed at the level of theoretical description. So how do you do that? Let’s start with a few questions:

  1. In our project, who is going to find the bytecode, modify it, and then trigger the retransform? We are not prophets, and we cannot know in advance whether we will run into the problem described at the beginning of this article. Given the cost-effectiveness, it is not realistic to build code for modifying and reloading bytecode into every project.

  2. What if the JVM is not local, but remote?

  3. What if you don’t even know how to use ASM? Could the approach be more generic, more “foolproof”?

Fortunately, thanks to BTrace, we don’t have to write our own. What is BTrace? BTrace is open source and the project description is extremely brief:

A safe, dynamic tracing tool for the Java platform.

BTrace is a safe, Java-based tool that provides dynamic tracing. It is built on ASM, the Java Attach API, and Instrument, and provides many annotations for users. With these annotations, we can write BTrace scripts (simple Java code) to do what we want, without getting bogged down in ASM’s bytecode manipulation.

Here is a simple example provided by BTrace: intercept every method whose name starts with read in all classes in the java.io package, and print the class name, method name, and arguments. When the program’s IO load is high, the output makes it easy to see which classes are responsible.

package com.sun.btrace.samples;

import com.sun.btrace.annotations.*;
import com.sun.btrace.AnyType;
import static com.sun.btrace.BTraceUtils.*;

/**
 * This sample demonstrates regular expression
 * probe matching and getting input arguments
 * as an array - so that any overload variant
 * can be traced in "one place". This example
 * traces any "readXX" method on any class in
 * java.io package. Probed class, method and arg
 * array is printed in the action.
 */
@BTrace
public class ArgArray {
    @OnMethod(
        clazz="/java\\.io\\..*/",
        method="/read.*/"
    )
    public static void anyRead(@ProbeClassName String pcn, @ProbeMethodName String pmn, AnyType[] args) {
        println(pcn);
        println(pmn);
        printArray(args);
    }
}

Let’s look at another example: every 2 seconds, print the number of threads created so far.

package com.sun.btrace.samples;

import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;
import com.sun.btrace.annotations.Export;

/**
 * This sample creates a jvmstat counter and
 * increments it everytime Thread.start() is
 * called. This thread count may be accessed
 * from outside the process. The @Export annotated
 * fields are mapped to jvmstat counters. The counter
 * name is "btrace." + <className> + "." + <fieldName>
 */
@BTrace public class ThreadCounter {

    // create a jvmstat counter using @Export
    @Export private static long count;

    @OnMethod(
        clazz="java.lang.Thread",
        method="start"
    )
    public static void onnewThread(@Self Thread t) {
        // updating counter is easy. Just assign to
        // the static field!
        count++;
    }

    @OnTimer(2000)
    public static void ontimer() {
        // we can access counter as "count" as well
        // as from jvmstat counter directly.
        println(count);
        // or equivalently ...
        println(Counters.perfLong("btrace.com.sun.btrace.samples.ThreadCounter.count"));
    }
}

Doesn’t reading the usage above give you ideas? Plenty come to mind: for example, seeing when a HashMap triggers a rehash and how many elements the container holds at that moment.

With BTrace, the problem at the beginning of this article can be solved perfectly. As for BTrace’s specific features and how to write scripts, the BTrace project on GitHub contains plenty of explanations and examples, and online articles introducing BTrace are as countless as the sands of the Ganges, so we won’t go into them here.

We understand the principles, we have the tools to support them, and it’s up to us to be creative and use them in the right context.

Since BTrace solves all of the problems we’ve mentioned above, what is the architecture of BTrace?

BTrace mainly has the following modules:

  1. BTrace scripts: With annotations defined by BTrace, we can easily develop scripts as needed.

  2. Compiler: compiles BTrace scripts into BTrace class files.

  3. Client: sends the class file to the Agent.

  4. Agent: built on the Java Attach API. The Agent can dynamically attach to a running JVM, start a BTrace Server there, and receive BTrace scripts from the client. It parses the script, finds the classes to modify according to the rules in the script, modifies their bytecode, and then calls the retransform interface of Java Instrument so that the change in object behavior takes effect.
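The “attach to a running JVM” step performed by the client side can be sketched with the JDK’s Attach API (module `jdk.attach`). This is a simplified illustration, not BTrace’s actual client: `AttachSketch`, `listJvms`, and `loadAgentInto` are hypothetical names, and the pid, agent jar path, and agent arguments are whatever the caller supplies.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.ArrayList;
import java.util.List;

class AttachSketch {
    // Lists the JVMs visible to the Attach API as "pid displayName"
    static List<String> listJvms() {
        List<String> result = new ArrayList<>();
        for (VirtualMachineDescriptor d : VirtualMachine.list()) {
            result.add(d.id() + " " + d.displayName());
        }
        return result;
    }

    // Attaches to the JVM with the given pid and loads an agent jar into it.
    // The JVM then invokes the agentmain method of the jar's Agent-Class.
    static void loadAgentInto(String pid, String agentJar, String agentArgs)
            throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJar, agentArgs);
        } finally {
            vm.detach();
        }
    }
}
```

Once the agent is running inside the target process, it is the agent (not the client) that holds the `Instrumentation` instance and performs the retransform, which matches the division of labor between Client and Agent in the list above.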

The overall BTrace architecture is roughly as follows:

[Figure: BTrace workflow]

BTrace ultimately relies on Instrument to replace classes. As mentioned above, for safety reasons Instrument comes with many restrictions, and BTrace is no exception. BTrace is “read-only” with respect to the JVM, so BTrace scripts are subject to the following restrictions:

  1. Object creation is not allowed

  2. Creating arrays is not allowed

  3. Throwing exceptions is not allowed

  4. Catching exceptions is not allowed

  5. Calling methods of other objects or classes is not allowed; only the static methods provided by com.sun.btrace.BTraceUtils (data processing and output utilities) may be called

  6. Modifying class fields is not allowed

  7. Instance variables and instance methods are not allowed; only public static void methods are allowed

  8. Inner classes, nested classes are not allowed

  9. Synchronization methods and blocks are not allowed

  10. Loops are not allowed

  11. Extending other classes is not allowed (except, of course, java.lang.Object)

  12. Interface implementation is not allowed

  13. Assert is not allowed

  14. Using Class objects (class literals) is not allowed

That is a lot of restrictions, but they are understandable. BTrace changes bytecode, yet apart from outputting the requested information it must not affect the normal operation of the program.

Arthas

BTrace scripts come with a learning cost. It would be nice to have common functions packaged up and exposed as simple commands. Alibaba’s engineers thought of this long ago, and last year Alibaba open-sourced its own Java diagnostic tool, Arthas.

Arthas provides simple command-line operations and is powerful. The technical principles behind it are roughly the same as those described in this article. The Arthas documentation is quite comprehensive; refer to it for more details.

The purpose of this article is to explain the ins and outs of Java dynamic tracing technology. Once you have mastered the principles behind it, you can develop your own “frozen throne” if you wish.

Our vision-be

Now, let’s try to “look down” on these problems from higher ground.

Java Instrument holds promise for runtime dynamic tracing, the Attach API provides an “entry point” for runtime dynamic tracing, and ASM makes it much easier for “humans” to manipulate Java bytecode.

Based on the Instrument and Attach APIs, our predecessors have created tools such as JProfiler, jvisualvm, and BTrace. Built on ASM came cglib and dynamic proxies, followed by the widely used Spring AOP.

Java is a static language and does not allow data structures to change at runtime. However, after Java 5 introduced Instrument and Java 6 introduced the Attach API, things began to change. Despite the many restrictions, our predecessors, working within the small “read-only” space left to them, still created a variety of brilliant technologies that greatly improve the efficiency with which software developers locate problems.

The computer must be one of the greatest inventions in human history: from electromagnetic induction, where magnetism generates electricity, to high and low voltages representing the bits 0 and 1, to binary representing a handful of primitive types, to primitive types representing an unlimited variety of objects, and finally to the interaction of those objects simulating real life and even the entire universe.

Two thousand five hundred years ago, the Tao Te Ching said, “The Tao gives birth to One, One gives birth to Two, Two gives birth to Three, and Three gives birth to all things.”

Two thousand five hundred years from now, computers will probably do the same.