Author: the reds no. 7
Link: https://yq.aliyun.com/articles/69520?utm_content=m_10360

This article comes from Alibaba's internal technology forum, where the original was well received. The author has since published it to the cloud community so that it can be read from outside the company. Hollis trimmed the article, removing the introductions of tools that are only available inside Alibaba and the links that can only be reached from Alibaba's internal network.


I run into many problems in my daily work, and some tools play a considerable role in solving them. I am writing them down here for two reasons. First, as notes, so that I can look things up quickly when I forget them. Second, to share: I hope readers will also show the tools that help them most, so that we can all improve together.

Enough chit-chat, let's get started.


tail

tail -f is the most commonly used form.

tail -300f shopbase.log # show the last 300 lines, then follow the file in real time as it is written

grep

grep forest f.txt # search in a file
grep forest f.txt cpf.txt # search in multiple files
grep 'log' /home/admin -r -n # search every file under a directory for the keyword
cat f.txt | grep -i Shopbase
grep 'shopbase' /home/admin -r -n --include *.{vm,java} # only files with the given extensions
grep 'shopbase' /home/admin -r -n --exclude *.{vm,java} # skip files with the given extensions
seq 10 | grep 5 -A 3 # the match plus the 3 lines after it
seq 10 | grep 5 -B 3 # the match plus the 3 lines before it
seq 10 | grep 5 -C 3 # the match plus 3 lines on each side; this is usually all you need
cat f.txt | grep -c 'SHOPBASE'
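The three context flags are easy to mix up, so here is a minimal sketch that runs them against seq 10, where the output is predictable. The variable names are only for illustration.

```shell
# -A N: the matching line plus N lines After it
after=$(seq 10 | grep -A 3 5 | paste -sd' ' -)
# -B N: the matching line plus N lines Before it
before=$(seq 10 | grep -B 3 5 | paste -sd' ' -)
# -C N: the matching line plus N lines of Context on both sides
context=$(seq 10 | grep -C 3 5 | paste -sd' ' -)

echo "after:   $after"     # 5 6 7 8
echo "before:  $before"    # 2 3 4 5
echo "context: $context"   # 2 3 4 5 6 7 8
```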

awk

1 Basic Commands

awk '{print $4,$6}' f.txt
awk '{print NR,$0}' f.txt cpf.txt
awk '{print FNR,$0}' f.txt cpf.txt
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'

2 match

awk '/ldb/ {print}' f.txt # lines that match ldb
awk '!/ldb/ {print}' f.txt # lines that do not match ldb
awk '/ldb/ && /LISTEN/ {print}' f.txt # lines that match both ldb and LISTEN
awk '$5 ~ /ldb/ {print}' f.txt # lines whose fifth column matches ldb

3 Built-in variables

NR: the number of records awk has read since it started, split by the record separator. The default record separator is a newline, so by default NR is the number of lines read. NR stands for Number of Records.

FNR: when awk processes multiple input files, NR does not restart at 1 after the first file is finished; it keeps accumulating. FNR exists for that reason: it restarts at 1 for every new file.

NF: the number of fields the current record is split into. NF stands for Number of Fields.
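A minimal sketch of the difference, using two small throwaway files. The file names follow the commands above; their contents are made up for the example.

```shell
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/f.txt"      # 2 records, with 3 and 2 fields
printf 'x y z w\n'    > "$tmp/cpf.txt"    # 1 record, with 4 fields

# NR keeps counting across files, FNR restarts per file, NF is per record
result=$(cd "$tmp" && awk '{print FILENAME, "NR=" NR, "FNR=" FNR, "NF=" NF}' f.txt cpf.txt)
echo "$result"
# f.txt NR=1 FNR=1 NF=3
# f.txt NR=2 FNR=2 NF=2
# cpf.txt NR=3 FNR=1 NF=4

rm -rf "$tmp"
```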

find

sudo -u admin find /home/admin /tmp /usr -name \*.log (search multiple directories)
find . -iname \*.txt (case-insensitive match)
find /usr -type l (all symbolic links under /usr)
find /usr -type l -name "z*" -ls (symbolic link details, e.g. inode and directory)
find /home/admin -size
find /home/admin -type f -perm 777 -exec ls -l {} \;
find /home/admin -atime -1 (files accessed within 1 day)
find /home/admin -ctime -1 (files whose status changed within 1 day)
find /home/admin -mtime -1 (files modified within 1 day)
find /home/admin -amin -1 (files accessed within 1 minute)
find /home/admin -cmin -1 (files whose status changed within 1 minute)
find /home/admin -mmin -1 (files modified within 1 minute)
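To try a few of these predicates safely, the sketch below builds a throwaway directory and searches it. The directory layout and file names are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/app.log" "$tmp/sub/NOTES.TXT"
ln -s "$tmp/app.log" "$tmp/sub/link"

logs=$(cd "$tmp" && find . -name '*.log')     # exact-case name match
txts=$(cd "$tmp" && find . -iname '*.txt')    # case-insensitive name match
links=$(cd "$tmp" && find . -type l)          # symbolic links only

echo "$logs"     # ./app.log
echo "$txts"     # ./sub/NOTES.TXT
echo "$links"    # ./sub/link

rm -rf "$tmp"
```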

pgm

Query, in batch, the logs on the vm-shopbase machines that match a condition:

pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'

tsar

tsar is our company's own collection tool. It persists the collected historical data on disk, which makes it very handy for quickly querying historical system data. Real-time data can be queried too. It is installed on most machines.

tsar # show the metrics for the most recent day
tsar --live # show real-time metrics, refreshed every five seconds by default
tsar -d 20161218 # show the data for a given day; it seems at most four months of data can be viewed
tsar --mem
tsar --load
tsar --cpu
# these can also be combined with -d to query a single metric on a given day

top

Besides showing basic information, top is mostly used together with other tools to investigate JVM problems.

ps -ef | grep java
top -H -p pid

Take the thread ID, convert it from decimal to hexadecimal, and then look it up in the jstack output to see what that thread is actually doing.
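The conversion can be done with printf. In this sketch tid_to_nid is a hypothetical helper name, and 2815 is just a sample number; jstack prints native thread IDs as nid=0x... in each thread header.

```shell
# Convert a decimal thread ID (the PID column of `top -H`) into the
# hexadecimal form that jstack shows in its nid= field.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

tid_to_nid 2815    # prints 0xaff

# With a real process and thread ID, the lookup would be along the lines of:
# jstack <pid> | grep -A 20 "nid=$(tid_to_nid <tid>)"
```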

other

netstat -nat | awk '{print $6}' | sort | uniq -c | sort -rn # count the current connections by state; watch for a high CLOSE_WAIT count
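To see what the pipeline produces without touching a live machine, the sketch below feeds it a few made-up lines in netstat's column layout (the addresses and ports are invented).

```shell
# Sample data: column 6 is the connection state, as in `netstat -nat`
sample='tcp 0 0 10.0.0.1:8080 10.0.0.2:51000 ESTABLISHED
tcp 0 0 10.0.0.1:8080 10.0.0.3:51001 CLOSE_WAIT
tcp 0 0 10.0.0.1:8080 10.0.0.4:51002 CLOSE_WAIT
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN'

# Same pipeline as above: extract the state, count each one, sort by count descending
counts=$(printf '%s\n' "$sample" | awk '{print $6}' | sort | uniq -c | sort -rn)
echo "$counts"

# The most frequent state ends up on the first line
top_state=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $2}')
echo "$top_state"    # CLOSE_WAIT
```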


btrace

The first thing to mention is btrace. It is a real heavy weapon for troubleshooting in production and pre-release environments. I'll skip the introduction and go straight to the code.

1. Check who is currently calling the add method of ArrayList, and print the thread's call stack only when the ArrayList's size exceeds a threshold (479 in the script below)

@OnMethod(clazz = "java.util.ArrayList", method = "add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))
public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
    if (getInt(field("java.util.ArrayList", "size"), instance) > 479) {
        println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ", method:" + method + ", size:" + getInt(field("java.util.ArrayList", "size"), instance));
        jstack();
        println();
        println("===========================");
        println();
    }
}

2. Monitor the return value and the request parameters when a service method is called

@OnMethod(clazz = "com.taobao.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl", method = "nav", location = @Location(value = Kind.RETURN))
public static void mt(long userId, int current, int relation, String check, String redirectUrl, @Return AnyType result) {
    println("parameter# userId:" + userId + ", current:" + current + ", relation:" + relation + ", check:" + check + ", redirectUrl:" + redirectUrl + ", result:" + result);
}

For more, see https://github.com/btraceio/btrace

Note:

  1. In my observation, the output of release 1.3.9 is unstable; the correct result may only appear after the probe has been triggered several times.

  2. Keep the range of classes matched by the regular expression under control; otherwise the application will very likely hang with the CPU maxed out.

  3. Because btrace works by bytecode injection, the application must be restarted to return it to its normal state.

Greys

Here are a few cool features (some of which overlap with btrace):

sc -df xxx: outputs details of the given class, including its source location and classloader structure

trace class method: I really like this feature! I saw it in JProfiler a long time ago. It prints the elapsed time of the current method call, broken down by each method it calls.

javOSize

Take one feature, classes: it changes the content of a class by modifying its bytecode, and the change takes effect immediately. That lets you quickly add a log statement somewhere and look at the output. The downside is that it is very intrusive to the code, but it is a great tool if you know what you are doing.

Its other functions can easily be covered by Greys and btrace, so I won't go into them.

JProfiler

I used to rely on JProfiler to diagnose many problems, but now Greys and btrace can handle almost all of them. In addition, most problems occur in production environments (which are network-isolated), so I rarely use it anymore, but it still deserves a mention. Official site: https://www.ej-technologies.com/products/jprofiler/overview.html


eclipseMAT

It can be used as an Eclipse plug-in or run as a standalone program. For details, see http://www.eclipse.org/mat/


jps

I use only one command:

sudo -u admin /opt/taobao/java/bin/jps -mlvV

jstack

Common usage:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815

Native + Java stack:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815

jinfo

View the JVM startup parameters as follows:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815

jmap

Three uses

1. Check the heap

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815

2. Dump the heap

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815

or

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815

3. See what is taking up the heap. Combined with zprofiler and btrace, this makes troubleshooting far more powerful, like giving a tiger wings.

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10

jstat

jstat has many parameters, but this one is enough:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000

jdb

jdb is still in regular use today. It can be used to debug in the pre-release environment. Assuming your pre-release java_home is /opt/taobao/java/ and the remote debug port is 8000, run:

sudo -u admin /opt/taobao/java/bin/jdb -attach 8000

If the startup output appears, jdb has started successfully, and you can then set breakpoints and debug. For the specific parameters, see the official Oracle documentation: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html

CLHSDB

With CLHSDB you can often see more interesting things; I won't go into detail here. I've heard that tools like jstack and jmap are built on it.

sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB

For more detail, see this post by RednaxelaFX: http://rednaxelafx.iteye.com/blog/1847971


key promoter

You may not remember a shortcut key after seeing it once, but surely you will after seeing it several times?


maven helper

A good helper for analyzing Maven dependencies.


1. Which file was your class loaded from?

-XX:+TraceClassLoading
The output looks like: [Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]

2. Write a dump file when the application runs out of memory

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof


Is it too much to give this its own headline? Everyone has dealt with this annoying kind of case at one point or another. With all the approaches below, surely one of them will crack it.

mvn dependency:tree > ~/dependency.txt

Prints all dependencies.

mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId

Prints only the dependencies matching the specified groupId and artifactId.

-XX:+TraceClassLoading

Add this to the VM startup script. Details of the loaded classes then appear in the Tomcat startup output.

-verbose

Add this to the VM startup script. Details of the loaded classes then appear in the Tomcat startup output.

greys:sc

The sc command of Greys also shows clearly where the current class was loaded from.

tomcat-classloader-locate

The following URL tells you where the current class was loaded from:

curl http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObjec


dmesg

If your Java process has quietly disappeared and left hardly any clues, dmesg may well have what you are looking for.

sudo dmesg | grep -i kill | less

Search for the keyword oom_killer. The results look similar to the following:

[6710782.021013] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[6710782.070639] [<ffffff81118898>] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Killed process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB

The output shows that the Java process was killed by the system's OOM killer with a score of 854. To explain: the OOM killer (Out-Of-Memory killer) monitors the machine's memory consumption. When the machine is about to run out of memory, it scans all processes, scores them according to certain rules (memory usage, running time, and so on), picks the process with the highest score, and kills it to protect the machine.

dmesg log time conversion formula: actual log time = 1970-01-01 (Greenwich) + (current time in seconds - seconds since the system booted + log time printed by dmesg) seconds:

date -d "1970-01-01 UTC $(echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ')+12288812.926194" | bc) seconds"
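The same formula can be wrapped in a small function so that the arithmetic is easy to check in isolation. This is only a sketch: dmesg_to_epoch is a hypothetical helper name, and the numbers in the example call are made up.

```shell
# Convert a dmesg timestamp (seconds since boot) into epoch seconds.
# Arguments: current epoch seconds, system uptime in seconds, dmesg timestamp.
dmesg_to_epoch() {
    awk -v now="$1" -v up="$2" -v ts="$3" 'BEGIN { printf "%d\n", now - up + ts }'
}

# Deterministic check with made-up numbers: 1000000 - 500 + 100
dmesg_to_epoch 1000000 500 100    # prints 999600

# On a live Linux machine (GNU date assumed), a dmesg timestamp could be converted with:
# date -d "@$(dmesg_to_epoch "$(date +%s)" "$(cut -d' ' -f1 /proc/uptime)" 6710782.021013)"
```

Passing the current time and the uptime as arguments keeps the function free of side effects; only the commented-out call depends on /proc/uptime.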

All that remains is to find out why memory usage grew so large that it triggered the OOM killer.


RateLimiter

Want fine-grained control over QPS? For example, if you call an interface and the other side explicitly requires you to cap your QPS at 400, how do you enforce that? This is where RateLimiter comes in. For details, see http://ifeve.com/guava-ratelimite
