Authors | R&D engineers at Centaline Bank, currently working in the middleware group of the technology platform department on distributed caching, message queues, and related middleware.

The official Arthas community is holding an essay contest with prizes.

Arthas is an open source diagnostic tool for Java applications. Thanks to its powerful troubleshooting and diagnostic capabilities, Arthas has been widely followed and used by developers since it was open-sourced. It has topped GitHub Trending several times and has been recommended and shared by many technical media outlets in China.

I. Customizing and adapting Arthas

Arthas provides access to running JVMs through a simple interactive command line, so that execution problems in online applications can be located and diagnosed quickly. Its commands can also modify code dynamically, taking effect immediately without a service restart. The working principle is as follows:

1. Attach to the JVM: attach to the running JVM by its PID through the attach mechanism.
2. View and modify JVM bytecode: use instrumentation to add to or modify the bytecode of the running JVM and so implement the enhancement logic.
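The attach step can be illustrated with the JDK's Attach API (com.sun.tools.attach). This is only a sketch of the mechanism, not Arthas's own launcher code: the class name AttachSketch and the agent JAR path passed on the command line are hypothetical, and the example assumes it runs on a full JDK where the jdk.attach module is available.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

import java.util.ArrayList;
import java.util.List;

class AttachSketch {

    /** Lists the JVMs visible to the current user as "pid displayName" strings. */
    static List<String> listJvms() {
        List<String> result = new ArrayList<>();
        for (VirtualMachineDescriptor descriptor : VirtualMachine.list()) {
            result.add(descriptor.id() + " " + descriptor.displayName());
        }
        return result;
    }

    /** Attaches to the JVM with the given PID and loads a Java agent JAR into it. */
    static void attachAndLoad(String pid, String agentJarPath) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            // The agent's agentmain method receives an Instrumentation instance,
            // which is what allows bytecode of loaded classes to be transformed.
            vm.loadAgent(agentJarPath);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        listJvms().forEach(System.out::println);
        if (args.length == 2) {
            // args[0] = target PID, args[1] = path to a (hypothetical) agent JAR
            attachAndLoad(args[0], args[1]);
        }
    }
}
```

Run without arguments, the sketch only lists candidate processes, much like the process selection menu that arthas-boot prints at startup.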

At the end of 2018, Centaline Bank assigned staff to research Arthas: to learn the main features available in the open source community and to understand the overall engineering structure by reading through the Arthas project. The implementation works as follows: at its base, Arthas calls ManagementFactory from rt.jar to obtain internal JVM information, and interacts with the back end by assembling commands, executing them, and returning the results. The project as a whole is simple, clear, and easy to pick up.
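As a small illustration of what ManagementFactory exposes, the sketch below reads a few values from the platform MXBeans of the current JVM. Which MXBeans Arthas actually reads is not stated here, so the selection (runtime, memory, thread, and class loading beans) and the class name JvmInfoSketch are assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.RuntimeMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

class JvmInfoSketch {

    /** Collects a few basic facts about the current JVM through its platform MXBeans. */
    static Map<String, Object> snapshot() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        Map<String, Object> info = new LinkedHashMap<>();
        info.put("vmName", runtime.getVmName());
        info.put("uptimeMillis", runtime.getUptime());
        info.put("heapUsedBytes", heap.getUsed());
        info.put("threadCount", ManagementFactory.getThreadMXBean().getThreadCount());
        info.put("loadedClassCount", ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
        return info;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```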

In early 2019, the Centaline Bank technology team began rolling out Arthas to locate and diagnose online problems.

In order to protect our customers’ sensitive information and ensure the stable operation of our business systems in the production environment, we have customized some of Arthas’s functions to hide some of its commands:

1. watch: the watch command shows a method's input parameters and return value without any logging code, which may expose customers' sensitive information.
2. mc, redefine: mc combined with redefine can hot-update code, which does not comply with our bank's production operation management specifications.

At the same time, to meet our own needs, we custom-developed additional commands such as gc:

1. gc: displays in real time the occupancy percentage of the young and old generations, the number of collections, the time they consume, and so on.

Going forward, we plan to use Arthas to locate and diagnose problems in all development and test environments and in some production environments. We are also promoting Arthas internally, through technical sharing sessions, to the bank's application development teams.

II. Key features we use

Beyond the commands used for everyday troubleshooting, Arthas has several powerful features that the Centaline Bank technical team is particularly fond of.

1. target-ip

By default, Arthas listens only on 127.0.0.1. To connect remotely, specify the listening IP address with the --target-ip parameter:
java -jar arthas-boot.jar --target-ip IP

Once a remotely reachable IP is bound, you can connect to Arthas over Telnet or HTTP to troubleshoot:

Web access: IP:8563

Telnet access: IP:3658

When an online application has a problem, the affected machine can be isolated and Arthas started with --target-ip, so that several engineers can troubleshoot at the same time over remote connections.

2. trace

View the internal call path of a method and print the time spent at each node on the path:
trace ClassName methodName

The trace command lets you track down, layer by layer, where the time is being spent, which is useful for performance tuning.

3. ognl

OGNL is an open source expression language for Java used to access data. It supports type conversion, calling object methods, operating on collections, and more, and some powerful operations can be carried out through it.

  • Execute a static method
Call a static method with ognl:
ognl "@ClassName@methodName(parameter)"
  • Get a static property
Get a static field with ognl:
ognl "@ClassName@fieldName"
  • Example: change the log level
Find the classloader hash code of the current class:
sc -d ClassName | grep classLoaderHash
Get the logger with OGNL:
ognl -c ***** '@ClassName@logger'
Set the logger level for this class only:
ognl -c ***** '@ClassName@logger.setLevel(@ch.qos.logback.classic.Level@DEBUG)'
Set the logger level globally:
ognl -c ***** '@org.slf4j.LoggerFactory@getLogger("root").setLevel(@ch.qos.logback.classic.Level@DEBUG)'

4. gc

gc is a feature custom-developed by our bank, modeled on the jstat -gcutil pid timeinterval command; the pid is obtained from Arthas itself. timeinterval is the sampling interval in milliseconds (default: 1 s):
gc -i timeinterval -n 5
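The bank's gc command itself is not published. As a rough sketch of how comparable figures can be read inside a JVM, the example below samples the heap memory pools and garbage collector MXBeans at a fixed interval. The class name GcSampleSketch, the output format, and the reading of -n as "number of samples" are all assumptions made for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class GcSampleSketch {

    /** Builds one sample: heap pool occupancy plus cumulative GC counts and times. */
    static List<String> sample() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue; // skip non-heap pools and pools that are no longer valid
            }
            long committed = usage.getCommitted();
            double percent = committed > 0 ? 100.0 * usage.getUsed() / committed : 0.0;
            lines.add(String.format("pool %-30s used %6.2f%%", pool.getName(), percent));
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("collector %-25s count %d time %d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    /** Prints the given number of samples, sleeping intervalMillis between them. */
    static void run(long intervalMillis, int count) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            sample().forEach(System.out::println);
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Loosely mirrors "gc -i 1000 -n 5", assuming -n means the number of samples.
        run(1000, 5);
    }
}
```

Pool and collector names depend on the garbage collector in use, so the sketch prints whatever the JVM reports rather than hard-coding young and old generation names.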

III. Application and practice cases

Here are some examples of how we apply Arthas in practice. (Because the bank's internal code is confidential, all code in the examples below was written to reproduce the scenarios.)

Case 1: The system CPU usage is high

Fault description: Business staff reported that one page in the back-office management system took a very long time to respond. The server's CPU usage was high, at about 80%.

1. Start Arthas and attach it to the appropriate Java process

Note: Arthas must be started by the same user that started the Java process.

# Start Arthas
java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.2.0
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 11360 org.gradle.launcher.daemon.bootstrap.GradleDaemon
  [2]: 12196 com.durian.ddp.Application
# Select the number of the Java process to attach to
2
...

2. Run the thread command to view the threads with high CPU usage

After starting Arthas and attaching it to the corresponding Java process, run thread -n 5 to see the stacks of the five threads with the highest CPU usage.

# Show the 5 threads with the highest CPU usage
thread -n 5
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:94)
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:92)
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:92)
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:92)
    at ***.TreeUtil.recursiveTree(TreeUtil.java:74)
    at ***.getOwnerDeparmentTree(DepartmentServiceImpl.java:550)
    ...
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:94)
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:92)
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:92)
    at ***.TreeUtil.findMenuChildren(TreeUtil.java:92)
    at ***.TreeUtil.recursiveTree(TreeUtil.java:74)
    at ***.getOwnerDeparmentTree(DepartmentServiceImpl.java:550)
    ...
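For readers curious how a "busiest threads" view can be produced from inside a JVM, here is a minimal sketch based on ThreadMXBean. It ranks threads by cumulative CPU time, which is a simplification (a usage percentage would need two samples taken some interval apart), and it is not presented as the way Arthas computes its output. The class name BusyThreadsSketch and the stack depth of 8 are arbitrary choices for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BusyThreadsSketch {

    /** Returns stack snapshots of up to n live threads, highest cumulative CPU time first. */
    static List<ThreadInfo> topThreads(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> idAndCpu = new ArrayList<>(); // pairs of {threadId, cpuNanos}
        if (mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled()) {
            for (long id : mx.getAllThreadIds()) {
                long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
                if (cpuNanos >= 0) {
                    idAndCpu.add(new long[] {id, cpuNanos});
                }
            }
        }
        idAndCpu.sort(Comparator.comparingLong((long[] pair) -> pair[1]).reversed());

        List<ThreadInfo> result = new ArrayList<>();
        for (long[] pair : idAndCpu) {
            if (result.size() == n) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(pair[0], 8); // keep at most 8 stack frames
            if (info != null) { // null means the thread ended in the meantime
                result.add(info);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (ThreadInfo info : topThreads(5)) {
            System.out.println("\"" + info.getThreadName() + "\" state=" + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```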

3. Run the monitor command to view the call count and elapsed time of the method

The thread output shows that the CPU is mainly consumed by the findMenuChildren method of TreeUtil. Use the monitor command to see how many times this method is called and how long it takes.

# Use a 5 s statistics cycle and measure the findMenuChildren method of TreeUtil
monitor -c 5 ***.TreeUtil findMenuChildren

The monitor output shows that the average time per invocation is 17-20 ms, but the method is called a great many times, so the page as a whole responds slowly.

4. Decompile TreeUtil with jad to view the source code

[arthas@12196]$ jad com.durian.ddp.utils.TreeUtil
ClassLoader:
+-sun.misc.Launcher$AppClassLoader@18b4aac2
  +-sun.misc.Launcher$ExtClassLoader@244038d0
Location:
/*
 * Decompiled with CFR.
 *
 * Could not load the following classes:
 *  ***.ResourceTreeVo
 */
...
public class TreeUtil {
    public static ResourceTreeVo findMenuChildren(ResourceTreeVo resourceTreeVo, List<ResourceTreeVo> treeNodes) {
        for (ResourceTreeVo resource : treeNodes) {
            if (!resourceTreeVo.getResourceId().equals(resource.getResourceParentId())) continue;
            if (resourceTreeVo.getChildResourceVo() == null) {
                resourceTreeVo.setChildResourceVo(new ArrayList());
            }
            resourceTreeVo.getChildResourceVo().add(TreeUtil.findMenuChildren(resource, treeNodes));
        }
        return resourceTreeVo;
    }
    public static List<ResourceTreeVo> recursiveTree(List<ResourceTreeVo> list) {
        ArrayList<ResourceTreeVo> trees = new ArrayList<ResourceTreeVo>();
        for (ResourceTreeVo treeNode : list) {
            if (!StringUtils.isEmpty(treeNode.getResourceParentId())) continue;
            trees.add(TreeUtil.findMenuChildren(treeNode, list));
        }
        return trees;
    }
}

The source shown by jad makes the business logic clear: it builds a tree from a list using the resourceParentId field of the ResourceTreeVo objects. findMenuChildren is recursive, and every call traverses the entire ResourceTreeVo list, an O(n) scan, to find the child nodes. Since it is called once per node, building the whole tree costs O(n²), which becomes very time-consuming when the list holds many elements.

5. Solve the problem

To solve the problem, build a parentId -> list map from the list in advance. Each node can then fetch its list of child nodes directly from the map, so the time complexity of building the whole tree is O(n).
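A minimal sketch of this map-based approach follows. The nested ResourceTreeVo class is a hypothetical, stripped-down stand-in for the real class (only the ID, parent ID, and child list are modeled), and buildTree is an illustrative name, not the bank's actual fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TreeBuildSketch {

    /** Hypothetical, minimal stand-in for the ResourceTreeVo class in the decompiled code. */
    static class ResourceTreeVo {
        final String resourceId;
        final String resourceParentId; // null or empty for root nodes
        final List<ResourceTreeVo> childResourceVo = new ArrayList<>();

        ResourceTreeVo(String resourceId, String resourceParentId) {
            this.resourceId = resourceId;
            this.resourceParentId = resourceParentId;
        }
    }

    /** Builds the forest in O(n): one pass to index children by parent ID, one pass to link them. */
    static List<ResourceTreeVo> buildTree(List<ResourceTreeVo> nodes) {
        Map<String, List<ResourceTreeVo>> childrenByParent = new HashMap<>();
        List<ResourceTreeVo> roots = new ArrayList<>();
        for (ResourceTreeVo node : nodes) {
            if (node.resourceParentId == null || node.resourceParentId.isEmpty()) {
                roots.add(node);
            } else {
                childrenByParent.computeIfAbsent(node.resourceParentId, k -> new ArrayList<>()).add(node);
            }
        }
        for (ResourceTreeVo node : nodes) {
            List<ResourceTreeVo> children = childrenByParent.get(node.resourceId);
            if (children != null) {
                node.childResourceVo.addAll(children); // a map lookup replaces the full list scan
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        List<ResourceTreeVo> nodes = Arrays.asList(
                new ResourceTreeVo("1", null),
                new ResourceTreeVo("2", "1"),
                new ResourceTreeVo("3", "1"),
                new ResourceTreeVo("4", "2"));
        for (ResourceTreeVo root : buildTree(nodes)) {
            System.out.println(root.resourceId + " has " + root.childResourceVo.size() + " direct children");
        }
    }
}
```

Because each node is touched a constant number of times and every child lookup is a hash map access, the work grows linearly with the list size instead of quadratically.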

Case 2: The number of application thread connections is abnormal

Symptom: The server's handles were exhausted, and one application was holding too many of them.

1. Run the thread command to view information about threads

Start Arthas, attach it to the corresponding Java process, and run thread to see what the threads are doing.

# Check thread status
thread

A large number of MasterListener-myMaster-* threads remain connected and are never released.

Pick one of these threads and view its stack:
thread id

The stack shows that these threads are created by redis.clients.jedis.JedisSentinelPool$MasterListener, so the next step is to find out where JedisSentinelPool$MasterListener is called from.

2. Run the stack command to view the stack information

stack redis.clients.jedis.JedisSentinelPool$MasterListener

After an application request is triggered, the stack command prints the call stack. It shows that getJedis ends up in JedisSentinelPool$MasterListener on every request. Next, we decompile the RedisUtil class.

3. Decompile with the jad command to view the code

# decompile the RedisUtil class
jad cn.com.zybank.testredis.starter.RedisUtil

Looking at the getJedis method, you can see that every call to getJedis creates a new JedisSentinelPool.

Every time Redis is used, getJedis is called and creates a new JedisSentinelPool, which starts a MasterListener-myMaster-* thread. This thread keeps listening and is never released automatically, so as the number of application requests grows, the number of threads keeps growing until the connection limit is reached.

4. Solve the problem

The fix is simply to create one global JedisSentinelPool and take a Redis connection from it whenever one is needed; the code is not shown here.
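The original fix is not reproduced in this article. Purely as an illustration of the "one global pool" idea, the sketch below uses the initialization-on-demand holder idiom. StubSentinelPool is a hypothetical stand-in for JedisSentinelPool so that the example compiles without the Jedis library, and the string returned by getResource is a placeholder for a real Redis connection.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolSketch {

    /** Hypothetical stand-in for JedisSentinelPool so the sketch compiles without Jedis. */
    static class StubSentinelPool {
        static final AtomicInteger CREATED = new AtomicInteger();

        StubSentinelPool() {
            CREATED.incrementAndGet(); // a real pool would start its MasterListener threads here
        }

        String getResource() {
            return "connection-from-pool"; // placeholder for a real Redis connection
        }
    }

    /** Holder idiom: the JVM creates the pool exactly once, on first use, in a thread-safe way. */
    private static class PoolHolder {
        static final StubSentinelPool POOL = new StubSentinelPool();
    }

    /** Every caller borrows from the same pool instead of constructing a new one. */
    static String getJedis() {
        return PoolHolder.POOL.getResource();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            getJedis();
        }
        System.out.println("pools created: " + StubSentinelPool.CREATED.get()); // prints "pools created: 1"
    }
}
```

With the real library, each borrowed connection would also need to be returned to the pool after use, typically by closing it.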

IV. Summary and suggestions

Before we adopted Arthas, troubleshooting online problems often meant checking the network and working through jps, jstack, jmap, jhat, jstat, hprof, and similar tools, which took time and effort. Now most common problems can be located easily and resolved quickly with Arthas.

One click to install and start Arthas

  • Method 1: Implement Arthas one-click remote diagnosis using Cloud Toolkit

Cloud Toolkit is a free local IDE plug-in released by Alibaba Cloud that helps developers develop, test, diagnose, and deploy applications more efficiently. With the plug-in, a local application can be deployed with one click to any server or to the cloud (ECS, EDAS, ACK, ACR, mini programs, etc.). It also has built-in Arthas diagnostics, Dubbo tools, a terminal, file upload, Function Compute, and a MySQL executor. It supports not only mainstream versions of IntelliJ IDEA but also Eclipse, PyCharm, Maven, and others.

We recommend installing Cloud Toolkit as an IDEA plug-in to use Arthas: t.tb.cn/2A5CbHWveOX…

  • Method 2: Download directly

Address: github.com/alibaba/art…

As our use of Arthas has broadened, we have found some features that could be improved, and we hope to see them enhanced:

  • With trace, as soon as there is an asynchronous call in the call chain, the stack is cut off and the child thread cannot be traced, so you have to follow the trace manually layer by layer, which is inefficient.
  • We hope the tt command can gain an asynchronous switch; when it is enabled, COST would show how long it takes to obtain the result asynchronously.

Arthas has reached 24,000 GitHub stars so far. We hope Arthas gains more attention and affection from developers around the world, and we look forward to more high-quality domestic projects like Arthas being open-sourced.

The Arthas essay campaign is in full swing

The Arthas team is running an official call for essays. If you have:

  • Problems identified using Arthas
  • Source code interpretation of Arthas
  • Suggestions for Arthas
  • Any other Arthas-related content, with no restrictions

We welcome you to take part in the essay campaign, and there are prizes to be won.
