This is day 18 of my participation in the writing challenge.

Background

Production and development environments always differ, most commonly in the amount of data. As a result, some lag problems that never show up in the test environment appear in production. This is especially true in small teams, which usually have no dedicated testers, so most production lag problems only surface after launch.

This raises two questions:

  1. How do we locate the problem code while the system is running in production?
  2. How do we locate it without modifying the original code?

The most basic approach is to find the slow interface and measure the response time of each method by printing timestamps before and after each method call.
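For reference, the manual approach looks roughly like the minimal Java sketch below. It is an illustration, not code from the project: slowQuery() and readParam() are hypothetical stand-ins for real business methods, and timed() is a helper made up for this example. Every call site has to be wrapped by hand, which is exactly the kind of code change question 2 wants to avoid.

```java
import java.util.concurrent.TimeUnit;

// Manual timing: wrap each call with timestamps and print the elapsed time.
// slowQuery() and readParam() are hypothetical stand-ins for real business methods.
final class ManualTiming {

    // Simulated slow step, e.g. a database query.
    static void slowQuery() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    }

    // Simulated fast step, e.g. reading a request parameter.
    static int readParam() {
        return Integer.parseInt("42");
    }

    // Takes a timestamp before and after the call, prints the difference in ms and returns it.
    static long timed(String name, Runnable step) {
        long start = System.nanoTime();
        step.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(name + " took " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Every call site has to be edited like this, then rebuilt and redeployed.
        timed("readParam", ManualTiming::readParam);
        timed("slowQuery", ManualTiming::slowQuery);
    }
}
```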

What's wrong with this basic approach?

  1. If developers have no access to the production environment, how can they reproduce the problem?
  2. If many methods are called, locating the slow one by response timestamps is inefficient.

The Arthas tool

Arthas’s official website

Arthas is an open-source Java diagnostic tool from Alibaba. Besides locating the slow interfaces described above, Arthas also helps with the following problems [1]:

  1. Which JAR was this class loaded from? Why are all kinds of class-related exceptions being thrown?
  2. Why didn't the code I changed run? Did I forget to commit? Did I deploy the wrong branch?
  3. When a problem occurs, you can't debug it online. Is the only option to add logging and redeploy?
  4. A particular user's data is processed incorrectly online, but it can't be debugged online and can't be reproduced offline!
  5. Is there a global view of the system's health?
  6. Is there a way to monitor the JVM's status in real time?
  7. How can I quickly find the application's hot spots and generate a flame graph?
  8. How can I find an instance of a class directly inside the JVM?
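Question 1 can at least be answered by hand with plain JDK APIs, as in the minimal sketch below (WhichJar and locationOf() are names made up for this example). It asks a class for its CodeSource; core JDK classes have none. Arthas reports the same information without touching the code, for example through its sc -d command.

```java
import java.security.CodeSource;

// Reports where a class was loaded from, using only JDK APIs (names here are illustrative).
final class WhichJar {

    // Returns the JAR or directory URL of the class, or "none" when the JVM
    // records no code source (for example for core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "none";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println("String   -> " + locationOf(String.class));
        System.out.println("WhichJar -> " + locationOf(WhichJar.class));
    }
}
```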

A few common commands:

watch: observe method execution data

monitor: monitor method execution

trace: show the internal call path of a method and print the time spent at each node along that path

stack: print the call path through which the current method was invoked

tt: a "time tunnel" for method execution data that records the arguments and return values of every call to a specified method, so that calls made at different times can be inspected

Arthas can also generate flame graphs, which this article won't cover in detail.

Starting Arthas

After downloading Arthas, open a command line in its folder and start it with the following command:

java -jar arthas-boot.jar
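arthas-boot.jar attaches to a Java process that is already running: on startup it lists the local Java processes and asks you to choose one. If you want a disposable target to practise on, the hypothetical program below mimics the controller/service structure used in the rest of this article. DemoController, DemoService and all their methods are invented for this example, and the sleeps only simulate slow work.

```java
// Hypothetical demo target for Arthas; none of these classes exist in the article's project.
final class DemoController {

    // Plays the role of the traced interface method.
    static int index() {
        return DemoService.getCompanys();
    }

    public static void main(String[] args) {
        // Call index() about once a second so trace always has invocations to report.
        while (!Thread.currentThread().isInterrupted()) {
            index();
            DemoService.pause(1000);
        }
    }
}

final class DemoService {

    static int getCompanys() {
        return haveManageAllCompanyRight() ? paginate() : 0;
    }

    // Simulates a slow permission check.
    static boolean haveManageAllCompanyRight() {
        pause(30);
        return true;
    }

    // Simulates a paginated query that returns one page of 10 rows.
    static int paginate() {
        pause(20);
        return 10;
    }

    // Sleeps without forcing callers to handle InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With it running, select its process in arthas-boot and try trace DemoController index. The call into DemoService.getCompanys() should appear as a single node; seeing inside it is what Method 1 further down is for.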

Trace command

This section shows how to trace the execution time of each method.

Online documentation for the trace command

The most basic use of trace is to record a method's call path and the time spent in each call:

trace class-pattern method-pattern

For example, to trace the index() interface method of the class com.xxxx.productmanage.ProductManageController:

trace com.xxxx.productmanage.ProductManageController index

The effect is as follows:

Then we call the interface backed by index(), i.e. refresh the corresponding page in the browser.

View the command line output:

The execution time of each method is clearly visible (some symbols display incorrectly, apparently because of the terminal's language setting), so it is easy to find the slow method.

Going deeper

As the output shows, this command only reports one level: the time of each method called directly by index(). What if you want to drill into those methods, including ones in other classes?

For example, we want to look inside getCompanys(), which takes a long time.

Method 1

trace -E class1|class2 method1|method2
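With -E, the class and method arguments are treated as regular expressions, so | separates alternatives. The minimal Java sketch below only illustrates those regex semantics with java.util.regex; it is not how Arthas matches internally, whole-name matching is an assumption of the sketch, and com.example is a placeholder package.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrates "class1|class2" and "method1|method2" as regex alternations.
final class RegexPatternDemo {

    // Keeps only the names that the whole regex matches.
    static List<String> matching(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> classes = List.of(
                "com.example.ProductManageController",
                "com.example.ProductManageService",
                "com.example.OrderService");
        List<String> methods = List.of("index", "getCompanys", "getOrders");

        System.out.println(matching(
                "com.example.ProductManageController|com.example.ProductManageService", classes));
        System.out.println(matching("index|getCompanys", methods));
    }
}
```

Because . matches any character in a regular expression, escaping it as \. makes a class pattern strict.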

For the case above, we run:

trace -E com.xxxxx.ProductManageController|com.xxxx.ProductManageService index|getCompanys

The output is:

Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 2) cost in 147 ms, listenerId: 6

Then call the interface again; the output is as follows:

[arthas@18804]$ trace -E com.highmall.suppliermanage.productmanage.ProductManageController|com.highmall.suppliermanage.productmanage.ProductManageService index|getCompanys
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 2) cost in 147 ms, listenerId: 6
`---ts=2021-06-17 18:24:01;thread_name=XNIO-1 task-3;id=55;is_daemon=false;priority=5;TCCL=com.jfinal.server.undertow.hotswap.HotSwapClassLoader@6156496
    `---[73.303623ms] com.highmall.suppliermanage.productmanage.ProductManageController:index()
        +---[0.080517ms] com.xxxx.productmanage.ProductManageController:getHeader() #46
        +---[0.223587ms] com.xxxx.productmanage.ProductManageController:getParaToInt() #47
        +---[0.034004ms] com.xxxx.productmanage.ProductManageController:getParaToInt() #48
        +---[0.029192ms] com.xxxx.productmanage.ProductManageController:getPara() #49
        +---[0.030154ms] com.xxxx.productmanage.ProductManageController:getParaToInt() #50
        +---[71.959859ms] com.xxxx.ProductManageService:getCompanys() #51
        |   `---[71.861378ms] com.xxxx.productmanage.ProductManageService:getCompanys()
        |       +---[38.061978ms] com.xxxx.productmanage.ProductManageService:haveManageAllCompanyRight() #157
        |       +---[0.083083ms] com.jfinal.kit.StrKit:notBlank() #162
        |       `---[32.933605ms] com.jfinal.plugin.activerecord.Db:paginate() #172
        `---[0.446852ms] com.xxxx.productmanage.ProductManageController:renderAppJson() #52
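To read the tree, compare each node with its parent. The small Java program below just redoes that arithmetic with the numbers from the trace output above; TraceShare and share() are names made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Recomputes each node's share of its parent from the trace output above.
final class TraceShare {

    // Percentage of the parent's time spent in one child call.
    static double share(double childMs, double parentMs) {
        return 100.0 * childMs / parentMs;
    }

    public static void main(String[] args) {
        double indexMs = 73.303623;           // ProductManageController:index()
        double getCompanysCallMs = 71.959859; // getCompanys() as seen from index() (#51)
        double getCompanysBodyMs = 71.861378; // time measured inside getCompanys()

        Map<String, Double> children = new LinkedHashMap<>();
        children.put("haveManageAllCompanyRight()", 38.061978);
        children.put("StrKit.notBlank()", 0.083083);
        children.put("Db.paginate()", 32.933605);

        System.out.printf(Locale.ROOT, "getCompanys(): %.1f%% of index()%n",
                share(getCompanysCallMs, indexMs));
        children.forEach((name, ms) -> System.out.printf(Locale.ROOT,
                "  %s: %.1f%% of getCompanys()%n", name, share(ms, getCompanysBodyMs)));
    }
}
```

getCompanys() accounts for about 98% of index(), and inside it haveManageAllCompanyRight() and Db.paginate() take roughly 53% and 46% respectively, so those two calls are where to look next.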

Screenshot below:

Method 2

I haven't been able to reproduce this method, whether because of something I did wrong or because of the Arthas version, but the original documentation describes it, so I'll keep it here for now.

Open another command-line window and run:

telnet localhost 3658
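If telnet cannot connect, it helps to first check whether anything is listening on that port at all. The Java sketch below is a generic TCP reachability check, not an Arthas feature; PortCheck and isListening() are hypothetical names, and 3658 is simply the port used in the command above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Generic TCP check: can a connection to host:port be opened within the timeout?
final class PortCheck {

    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;   // something accepted the connection
        } catch (IOException e) {
            return false;  // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        // 3658 is the telnet port used in the command above.
        System.out.println("localhost:3658 listening? " + isListening("localhost", 3658, 1000));
    }
}
```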

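This connects to the running Arthas instance. Then run the following command to add a listener to the existing trace (according to the Arthas documentation, the --listenerId value should be the listenerId printed when the first trace command started):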

trace com.xxxxxx.productmanage.ProductManageService getCompanys --listenerId 1

Then go back to the original command window and call the interface again to see the effect:

The first terminal still shows only one level of calls. It didn't work for me!

[1] What can Arthas do for you