Logging

Logging is the most common troubleshooting method. Estimate the daily log volume and how long logs must be retained. When you request disk space, keep 35% of it in reserve as headroom for unexpected traffic.
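The sizing rule can be sketched as a small calculation. This is only an illustration: the function name and the sample workload (2 GB per day, kept for 30 days) are assumptions, and the only figure taken from the text is the 35% reserve.

```python
import math

# Share of the disk kept free for unexpected traffic (the 35% from the text).
RESERVED_FRACTION = 0.35


def required_disk_gb(daily_log_gb, retention_days,
                     reserved_fraction=RESERVED_FRACTION):
    """Return the disk size to request, rounded up to whole GB."""
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    expected_gb = daily_log_gb * retention_days
    # The expected logs may fill only the non-reserved share of the disk.
    return math.ceil(expected_gb / (1 - reserved_fraction))


if __name__ == "__main__":
    # Hypothetical workload: 2 GB of logs per day, retained for 30 days.
    print(required_disk_gb(2, 30))
```

With these assumed numbers, 60 GB of expected logs must fit into 65% of the disk, so the request comes to 93 GB.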

Generally, log the entry and exit of every externally exposed method, and log before and after every call to a third party. The logged content is mainly the input and output parameters. In my concise logging specification (Github.com/xiexiaojing…), I define several commonly used classes that inject this logging as aspects (AOP).
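As a rough illustration of entry/exit logging injected from outside the method body, here is a Python decorator standing in for an aspect. It is not the code from the specification mentioned above; `log_entry_exit` and `query_price` are hypothetical names.

```python
import functools
import logging

logger = logging.getLogger("service")


def log_entry_exit(func):
    """Log input parameters on entry and the result (or the exception) on exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Records the traceback, then lets the caller see the error.
            logger.exception("error in %s", func.__name__)
            raise
        logger.info("exit %s result=%r", func.__name__, result)
        return result
    return wrapper


@log_entry_exit
def query_price(sku, quantity):
    # Hypothetical externally exposed method; a real one might call a third party.
    return 100 * quantity


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    query_price("sku-1", quantity=2)
```

The business method stays free of logging statements, which is the point of the aspect approach: the logging rule lives in one place and is applied uniformly.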


Monitoring

The traditional approach is to turn on the GC log when the JVM shows problems such as frequent GC, at some cost in efficiency. Today, however, it is common across the industry to report these metrics to a monitoring system instead, such as Prometheus, which is commonly used in Europe and America, or Xiaomi’s Falcon, which is widely used in China.

Monitoring systems also collect statistics on the database, call volume, response time, and so on. Meituan-Dianping’s CAT integrates all of these and adds dependency analysis, heartbeat reports, business dashboards, and more, which is convenient.


Alerting

An alert does more than reveal that a problem exists. It can also carry, as part of the alert message, the stage that execution had reached when the problem occurred, which makes the problem quick to locate.
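A minimal sketch of an alert that carries the stage reached, under stated assumptions: `StageTracker`, `send_alert`, and `process_order` are hypothetical names, and the alert channel is simulated with a plain list rather than a real SMS or chat integration.

```python
class StageTracker:
    """Remembers the last stage a job reached so an alert can report it."""

    def __init__(self, job_name):
        self.job_name = job_name
        self.stage = "not started"

    def enter(self, stage):
        self.stage = stage

    def alert_message(self, error):
        return (f"[ALERT] job={self.job_name} stage={self.stage} "
                f"error={type(error).__name__}: {error}")


def send_alert(message, sink):
    # Stand-in for a real alert channel; here the message is appended to a list.
    sink.append(message)


def process_order(order, sink):
    tracker = StageTracker("process_order")
    try:
        tracker.enter("validate")
        if order["amount"] <= 0:
            raise ValueError("amount must be positive")
        tracker.enter("charge")
        # ... a hypothetical payment call would go here ...
        tracker.enter("done")
        return True
    except Exception as exc:
        # The alert names the stage that was executing when the failure happened.
        send_alert(tracker.alert_message(exc), sink)
        return False


if __name__ == "__main__":
    alerts = []
    process_order({"amount": 0}, alerts)
    print(alerts[0])
```

Whoever receives the alert can see at once that the failure happened during validation, without searching the logs first.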


Linux commands

tcpdump

If you receive unexpected data and do not know where it comes from, the simplest method is to capture packets with tcpdump and analyze them to identify the source.


netstat

netstat is usually used to check the state of the connections on a port. The common TCP states are:

  • LISTEN: Waiting for a connection request from any remote TCP and port.
  • SYN-SENT: Waiting for a matching connection request after having sent a connection request.
  • SYN-RECEIVED: Waiting for the acknowledgment that confirms the connection request, after having both received and sent a connection request.
  • ESTABLISHED: An open connection; data can be exchanged.
  • FIN-WAIT-1: Waiting for a connection termination request from the remote TCP, or for an acknowledgment of the termination request previously sent.
  • FIN-WAIT-2: Waiting for a connection termination request from the remote TCP.
  • CLOSE-WAIT: Waiting for a connection termination request from the local user.
  • CLOSING: Waiting for the remote TCP to acknowledge the connection termination request.
  • LAST-ACK: Waiting for an acknowledgment of the connection termination request previously sent to the remote TCP.
  • TIME-WAIT: Waiting long enough to be sure that the remote TCP has received the acknowledgment of its connection termination request.
  • CLOSED: No connection state at all.
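When there are many connections, it helps to count them by state. The sketch below parses text in the shape that `netstat -ant` prints on Linux, where state names use underscores (for example `TIME_WAIT`). The sample rows are made up for illustration, and `count_states` is a hypothetical helper.

```python
from collections import Counter

# Made-up sample in the shape of `netstat -ant` output.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8083            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:8083           10.0.0.9:51544          ESTABLISHED
tcp        0      0 10.0.0.5:8083           10.0.0.9:51550          TIME_WAIT
tcp        0      0 10.0.0.5:8083           10.0.0.7:40112          TIME_WAIT
tcp        0      0 10.0.0.5:41000          10.0.0.8:3306           CLOSE_WAIT
"""


def count_states(netstat_output):
    """Count TCP connections per state in netstat-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with tcp/tcp6 and carry the state in the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(state, n)
```

A growing CLOSE_WAIT count usually points at the local application not closing its sockets, which matches the definition above: that state waits on the local user.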



Reproducing the problem

I pretended to pass by countless times, just for a chance encounter with you…


When this line is applied to someone trying to reproduce an online problem, it usually describes someone on the verge of tears.


If a program running online hits a problem that needs troubleshooting, use the most orthodox methods first (logs, monitoring, and alerts), and then Linux commands. Reproducing problems such as memory leaks and concurrency bugs is costly, and it does very little to help quick recovery.


Online debugging

If you still cannot determine the cause after reproducing the problem, there is one last trick, to be used only as a last resort: online debugging.

Setup: the online machine runs Linux, and IntelliJ runs on the local machine.


Copy the JVM arguments that IntelliJ generates in its remote debug run configuration.

Add those arguments to the online service's startup parameters and restart it:

sudo svc -du /service/XXXX/

Check whether the debug port is listening:

lsof -i :8083

Once the port is up, start remote debug from the local IntelliJ. Be careful: there was once an online incident in which someone started remote debug against the production service, a request hit the breakpoint, and it hung there.

