Hello, Arthas

This is the 7th day of my participation in the August Text Challenge.

Start with a question

The service is responding slowly. What is it doing? Which thread is consuming the most CPU?

Machine monitoring: there happens to be no dimension that shows CPU usage per thread.
top -Hp pid: find the ID of the thread that uses the most CPU at the operating-system level.
printf '%x\n' threadId: convert the decimal thread ID to hexadecimal.
jstack pid | grep hexThreadId: finally, find out which thread is driving the CPU usage.

How would we do this with Arthas?
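The decimal-to-hexadecimal step of the manual approach can also be expressed in Java. This is a minimal sketch: the class name ThreadIdHex and the thread ID 12345 are made up for illustration, and it assumes the jstack output lists native thread IDs in the form nid=0x...

```java
// Sketch: turn the decimal thread ID shown by `top -Hp` into the
// hexadecimal form that can be searched for in jstack output.
final class ThreadIdHex {

    /** Formats a decimal thread ID as a 0x-prefixed lowercase hex string. */
    static String toNid(long decimalThreadId) {
        return "0x" + Long.toHexString(decimalThreadId);
    }

    public static void main(String[] args) {
        // 12345 is a made-up ID; pass the real one as the first argument.
        long tid = args.length > 0 ? Long.parseLong(args[0]) : 12345L;
        System.out.println("nid=" + toNid(tid));
    }
}
```

The printed value is what you would pass to grep when filtering the jstack output.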

What is Arthas?

Arthas is an open-source Java diagnostic tool from Alibaba. What developers dread most is a problem in production that they cannot locate: the test environment does not reproduce it, and the logs do not show it. Arthas is designed to troubleshoot online problems without a restart, trace Java code dynamically, and monitor the state of the JVM in real time. It is a window through which programmers can observe their running programs and troubleshoot problems as they happen.

Why can it troubleshoot problems in real time without restarting? Java has shipped the java.lang.instrument package since 1.5. The package offers two entry points, premain and agentmain; agentmain arrived in Java 1.6 together with the ability to attach to a running JVM. Arthas uses this technique to modify method bytecode at run time and enhance the behavior of the code. With instrumentation behind it, what else can Arthas solve?
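To make the two entry points concrete, here is a minimal sketch of an agent class. It is not how Arthas itself is written: the class name DemoAgent and the lastEntry field are hypothetical, and a real agent would be declared public in its own file and packaged in a jar whose manifest names it under Premain-Class and Agent-Class.

```java
import java.lang.instrument.Instrumentation;

// Sketch of a Java agent exposing both entry points (hypothetical class).
// Declare it public in DemoAgent.java when packaging a real agent jar.
final class DemoAgent {

    // Remembers which entry point ran last; only here for illustration.
    static volatile String lastEntry = "none";

    /** Invoked before main() when the JVM starts with -javaagent:agent.jar. */
    public static void premain(String agentArgs, Instrumentation inst) {
        lastEntry = "premain";
        report(inst);
    }

    /** Invoked when the agent is loaded into an already running JVM via attach. */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        lastEntry = "agentmain";
        report(inst);
    }

    private static void report(Instrumentation inst) {
        if (inst == null) {
            return; // only null when a test calls the entry point directly
        }
        System.out.println(lastEntry + ": "
                + inst.getAllLoadedClasses().length + " classes loaded, "
                + "retransform supported: " + inst.isRetransformClassesSupported());
    }
}
```

The Instrumentation handle passed to either method is what allows a tool to register class transformers and rewrite bytecode of classes that are already loaded.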

Running the dashboard command

Using the thread command

-b finds the thread that is blocking other threads.
-n N shows the N busiest threads, sorted by CPU usage.
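For a feel of what sorting threads by CPU usage involves, the JDK's own ThreadMXBean can produce a rough in-process equivalent. This is only a sketch and not Arthas's implementation: the class name BusyThreads is hypothetical, and it ranks threads by CPU time accumulated since they started rather than by usage over a sampling interval.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: list the threads of the current JVM with the most accumulated CPU time.
final class BusyThreads {

    /** Returns up to n lines of "thread name: CPU ms", busiest first. */
    static List<String> topByCpu(int n) {
        List<String> lines = new ArrayList<>();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return lines; // this JVM cannot report per-thread CPU time
        }
        List<long[]> samples = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has died
            if (cpuNanos >= 0) {
                samples.add(new long[] {id, cpuNanos});
            }
        }
        samples.sort(Comparator.comparingLong((long[] s) -> s[1]).reversed());
        for (long[] s : samples) {
            if (lines.size() >= n) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(s[0]);
            if (info != null) {
                lines.add(info.getThreadName() + ": " + (s[1] / 1_000_000) + " ms");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        topByCpu(3).forEach(System.out::println);
    }
}
```

The difference in practice is that Arthas attaches from outside, so no such code has to be deployed into the service beforehand.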

Let's look at another problem

I changed some code, so why did it not take effect after deployment? Did I forget to commit? Did I deploy the wrong branch?

Add logs and release again: usually this means changing the code, releasing, waiting for the deployment, checking the logs, fixing the bug, and releasing once more.
Go to the webapps directory, find the corresponding class or JAR package, download it to your local machine, and open it with JAD.

Using jad

Decompiles the source of a specified class that is already loaded:

jad com.facishare.paas.foundation.boot.Injector

Can you do a little more?

A static variable in the program does not seem to hold the expected value at run time. How do you check it? Use getstatic.
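Without a tool like this, reading a static field from inside the process would mean writing reflection code such as the sketch below. The names StaticPeek, Config, and retryLimit are hypothetical and exist only for the example; this is not how Arthas implements the command.

```java
import java.lang.reflect.Field;

// Sketch: read the current value of a static field through reflection.
final class StaticPeek {

    // Hypothetical class holding the static value we want to inspect.
    static class Config {
        private static int retryLimit = 3;
    }

    /** Returns the current value of the named static field, private or not. */
    static Object readStatic(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            field.setAccessible(true); // allow access to private fields
            return field.get(null);    // null receiver because the field is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readStatic(Config.class, "retryLimit"));
    }
}
```

The drawback is plain: code like this has to be written, released, and deployed before it can answer the question, which is exactly the cycle the command avoids.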

Can you do a little more? For example, an interface request is very slow. Why is it slow?

Using trace

Shows the time spent at each node of a single call path.

trace -n 10 org.apache.commons.lang.StringUtils isBlank

The parts are: command | execute 10 times | class name | method name

trace *StringUtils isBlank

The parts are: command | class name, pattern matched | method name, pattern matched

trace is built on bytecode enhancement, so it inevitably adds time to every node of the call path. Therefore, do not leave trace running all the time in production, and do not compare trace timings directly with the timing data from a load test.

Can you do a little more? The result of processing user data is not as expected, so what are the input parameters and return values of the key functions?

tt

In its most basic use, tt records the context of every invocation of the given method. When something goes wrong, all the input arguments, the return value, and any thrown exception of each call are on record, so troubleshooting can be targeted.

However, many frameworks stash context information in the ThreadLocal of the calling thread. When a recorded call is invoked again, the calling thread is a different one and Arthas cannot carry the ThreadLocal data over, so this information is lost.

The tt command saves references to the objects of the invocation, but only references. If an input parameter is modified inside the method, or the returned object is processed further afterwards, tt will not show the exact value it had at the time of the call. This is why the main recommendation is the watch command.
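The reference-only caveat is ordinary Java aliasing, which the following sketch reproduces without Arthas. The class ReferenceSnapshot and its recordedArg field are hypothetical stand-ins for a recorder that keeps a reference instead of a copy.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a recorder that stores only a reference sees later mutations.
final class ReferenceSnapshot {

    // Stand-in for the recorder: holds a reference to the argument, not a copy.
    static List<String> recordedArg;

    static void handle(List<String> names) {
        recordedArg = names;      // "record" the argument on method entry
        names.add("added-later"); // the method then mutates its own argument
    }

    /** Runs one call and returns what the recorder shows afterwards. */
    static String demo() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        handle(names);
        return recordedArg.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [alice, added-later]
    }
}
```

At the moment of the call the list held only "alice", yet the recorded view shows both elements, because both variables point at the same object.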

watch

Lets you easily observe invocations of a specified method. What can be observed: the return value (returnObj), the thrown exception (throwExp), the input parameters (params), the current object (target), and the elapsed time (cost). All of these can be used in OGNL expressions.

watch -n 1 com.facishare.social.abs.AbstractStandardSocialService batchObject2Feed "{params,returnObj}" '#cost>200' -x 2

The parts are: command | common parameters | class name | method name | OGNL expression | condition (optional) | further common parameters

Commands generally support -n, the number of times to execute. It avoids the situation where a hot function is matched, the screen keeps refreshing, and the output cannot be stopped.
Commands that can print object information support -x, the depth to which objects are expanded.

watch also supports -e, which observes the call only when the method throws an exception.
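What a watch with a cost condition expresses can be imitated in plain Java by wrapping a call, timing it, and reporting only when the threshold is exceeded. This is a sketch of the idea, not of Arthas internals: WatchSketch and observe are hypothetical names, and only params, returnObj, and cost are covered.

```java
import java.util.function.Function;

// Sketch: observe one call and report it only if it was slower than a threshold.
final class WatchSketch {

    /**
     * Calls method with params and returns a report line if the call took
     * longer than thresholdMs milliseconds; returns null otherwise.
     */
    static <P, R> String observe(Function<P, R> method, P params, double thresholdMs) {
        long start = System.nanoTime();
        R returnObj = method.apply(params);
        double cost = (System.nanoTime() - start) / 1_000_000.0; // milliseconds
        if (cost > thresholdMs) {
            return "params=" + params + ", returnObj=" + returnObj + ", cost=" + cost + "ms";
        }
        return null; // condition not met, nothing to report
    }

    public static void main(String[] args) {
        // A deliberately slow function so that the 200 ms condition is met.
        Function<String, Integer> slowLength = s -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return s.length();
        };
        System.out.println(observe(slowLength, "hello", 200));
    }
}
```

With watch, the same observation is attached to an existing method at run time, so the wrapper never has to be written into the application code.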

Can you do a little more? How do you see where the CPU hot spots are in your application right now?

Using profiler / flame graphs

profiler samples the application with async-profiler and generates a flame graph.

profiler start -d 300

Can you do a little more? Use OGNL expressions in Arthas

OGNL basics

Official introduction: commons.apache.org/proper/comm…

Expressions are evaluated against the current object and can be used in several ways:

Property access, such as name or name.text
Method calls, such as hashCode()
Array access, such as listeners[0]

Arthas is so good, let’s use it!

Attention!

Arthas class enhancement is based on bytecode technology, which invalidates JIT-compiled code and degrades performance.
Enhancing Java's own classes can wreak havoc on the JVM, for example enhancing toString when the enhancement itself calls toString.
trace makes it easy to locate the performance defects behind high response times, but it is best to trace only one call path at a time: the more traces run at once, the greater the performance impact.
If an interface is called very frequently, commands such as tt record too much data in memory. Add the -n parameter to limit the number of executions.
If a command produces too much output, the screen keeps scrolling and the command cannot be terminated, which affects service performance.
An improperly used * wildcard in commands that support pattern matching makes the matched range too large.
When using asynchronous tasks, do not run too many background commands at the same time, or the performance of the target JVM may suffer.
When you are done, use stop to shut down the Arthas server and restore the enhanced classes.