Online application monitoring explained in one article

"The online service is down and needs a restart"? Programmers who have worked in R&D for a long time gradually shift their attention to how their applications behave in production. Imagine being fast asleep at 2 a.m. when your WeChat group suddenly lights up: "Service stopped, restart it…" Could you stand being dragged out of a dream like that?

Today's article has three main parts: application service status monitoring, monitoring based on application logs, and a step up (the old hand takes you flying). Let's talk a little about application monitoring.

Serious statement:

1. Today's content is fairly brain-burning, so please drink a can of Six Walnuts in advance!

2. But I believe that if you keep reading to the end, the trip will definitely have been worth it!

1. Application service status monitoring

A production application service generally has to run 7 x 24 with 99.99% availability. Besides making the application itself robust, you also need some kind of daemon to watch over it. Otherwise, what do you do when the service appears to be dead?

The first thing that comes to mind is a small, compact shell script made up of a few Linux commands, optionally paired with a crontab scheduled task, acting as the watchdog for the application service. Without further ado, let's look at a typical monitor.sh script (using Tomcat as the example).

The sparrow may be small, but it has all its organs, so let's dissect it.

How to determine if an application is in suspended animation?

A health-check URL is configured for heartbeat detection. If a request to it returns status code 200, the application can still provide service. If the status code is not 200, check whether the application's process ID exists. If it does, the application is in suspended animation: the process is alive but no longer serving requests.
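The monitor.sh script itself is not reproduced here, so what follows is only a minimal sketch of the check just described. The function names, the three state labels, and the 5-second timeout are assumptions made for illustration, not part of the original script.

```shell
#!/bin/sh
# Sketch only: function names, state labels and timeout are illustrative assumptions.

# Print the HTTP status code returned by a health-check URL.
# curl prints 000 when the connection fails or times out.
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Classify the application from the status code and the process ID (may be empty).
#   200                 -> healthy
#   not 200, PID found  -> suspended (process alive but not serving)
#   not 200, no PID     -> stopped
classify_state() {
    status="$1"
    pid="$2"
    if [ "$status" = "200" ]; then
        echo "healthy"
    elif [ -n "$pid" ]; then
        echo "suspended"
    else
        echo "stopped"
    fi
}
```

A caller might combine the two as `classify_state "$(http_status "$HEALTH_URL")" "$pid"`, where HEALTH_URL is whatever heartbeat URL the application exposes.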

How to implement suspended animation application restart?

Get the corresponding process ID with ps -ef | grep "tomcat" | grep -w 'tomcat' | grep -v "grep" | awk '{print $2}'. If a process ID is found, kill the process, then invoke the start command to complete the service restart.
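The lookup-then-restart step could be sketched roughly as below. The function names and the use of kill -9 are assumptions; the PID filter is split out so it can be tried on sample ps output without touching real processes.

```shell
#!/bin/sh
# Sketch only: function names and the kill signal are illustrative assumptions.

# Read `ps -ef` output on stdin and print the PIDs (column 2) of lines that
# contain the keyword as a whole word, skipping the grep process itself.
extract_pids() {
    grep -w "$1" | grep -v "grep" | awk '{print $2}'
}

# Kill every process matching the keyword, then run the start command.
restart_app() {
    keyword="$1"
    start_cmd="$2"
    for pid in $(ps -ef | extract_pids "$keyword"); do
        # Forceful kill, on the assumption that a hung process may not
        # react to the default TERM signal.
        kill -9 "$pid"
    done
    sleep 2   # short pause before starting again
    "$start_cmd"
}
```

With a Tomcat installed under a hypothetical /app/tomcat, the call would look like `restart_app tomcat /app/tomcat/bin/startup.sh`. Note that the keyword match is broad: any process whose command line contains the word is killed, so the keyword needs to be specific enough.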

In the shell script, the check above runs in a loop every 60 seconds. Alternatively, I often use the crontab facility that Linux provides to call the monitoring script periodically. The same monitor.sh script serves as the example, with one minor modification: the loop statements are commented out.

With the script written, the next task is scheduling it with crontab (if this is the first time you have heard of crontab, a quick search on Google or Baidu will catch you up).

*/1 * * * * /app/script/monitor.sh > /dev/null 2>&1

There are two things to consider if you are going to try the above solution:

Note 1: Modify the corresponding directories, including the tomcat directory, script directory, and heartbeat detection URL.

Note 2: Make sure that executable permissions are assigned to shell scripts.

A small script solves a big problem, so don't look down on it: this is a case of four ounces moving a thousand pounds.

In fact, based on past experience, a moment's thought shows that the same approach works just as well for monitoring other, non-Tomcat services.

This is the most basic, simplest, and most practical scheme for monitoring application service status. Did you get it?

2. Monitoring based on application logs

As anyone who has worked on a financial project knows, logs are the Aladdin's lamp of last resort for tracking down system bugs.

Today, with microservices in full swing, services are becoming finer-grained and the division of labor between modules clearer. As a result, troubleshooting from logs is becoming tedious.

Is it possible to put all the microservices' logs in one place? There are already plenty of approaches. So how are logs collected? How are the collected logs stored? How should the stored logs be displayed? How do you implement an alarm?

How are logs collected?

Common log collection schemes in the industry fall into two kinds: direct collection and the agent approach.

In direct collection, the application itself uploads logs straight to the storage layer or server, for example through a Log4j appender.

In agent mode, an agent service is deployed on the application machine to collect logs and push them to the storage layer or server. The application itself is only responsible for generating logs.

Direct collection fits the following scenario: when a standalone agent cannot be deployed to collect the logs, for example on a load-balancing device with no spare resources for one, direct collection has to be considered.

Agent mode fits the following scenario: when an application writes its logs to disk as files, an agent can collect them, which keeps log collection loosely coupled from the application. Compared with direct collection, agent collection is more scalable and maintainable.
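To make the agent idea concrete, here is a toy sketch in shell. It is not how Filebeat or Flume work internally; the collector URL variable, the function names, and the one-request-per-line behavior are all assumptions made for illustration. It also relies on tail -F, which GNU and BSD tail provide but strict POSIX does not.

```shell
#!/bin/sh
# Sketch only: COLLECTOR_URL, the log path and all function names are hypothetical.

# Default sender: POST one log line to a collector over HTTP.
send_http() {
    curl -s -o /dev/null -m 5 -X POST --data-binary "$1" "$COLLECTOR_URL"
}

# Read log lines on stdin and hand each one to the sender command named in $1.
forward_lines() {
    sender="$1"
    while IFS= read -r line; do
        "$sender" "$line"
    done
}

# The agent follows the file the application writes and forwards new lines.
# The application only writes the file and knows nothing about the collector.
run_agent() {
    tail -n 0 -F "$1" | forward_lines send_http
}
```

The loose coupling shows up in run_agent: swapping the storage layer means swapping the sender function, while the application keeps writing the same file. Real agents add batching, retries, and position tracking on top of this.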

What are the common log collection tools in the industry?

Plenty of wheels have already been invented.

Logstash, Filebeat, Apache Flume, syslog/rsyslog/syslog-ng on Linux, Facebook's Scribe, and so on.

If you have stuck with it this far, you are probably looking thoroughly confused (laugh-cry), but that doesn't matter: treat today as a chance to broaden your knowledge.

Today I'd like to mention two that I have used: Filebeat from Elastic and Flume from Apache.

Filebeat is written in Go and ships as a single binary with no dependencies to deploy. It uses very few resources, weighs in at a lightweight 3 MB, works out of the box, and in my own testing was particularly convenient to use. It is a product of the evolution of the ELK architecture. Have you heard of ELK? (laugh-cry)

Flume is written in Java. I used it mainly by integrating it into our project framework to provide log collection: stripping out some redundant parts, extending some functions, and doing secondary development on top of it (when I have time I will write a dedicated article about Flume secondary development, so stay tuned).

How are the collected logs stored?

Another pile of wheels appears.

Elasticsearch, MongoDB, HDFS, and time-series databases such as InfluxDB, OpenTSDB, RRD, etc.

In my work scenarios Elasticsearch has been the best fit, because we mostly locate problems by keyword search. Every wheel has its own use cases, so I won't expand on that here.

What visualization tools are available for log analysis?

Yeah, you guessed it. More wheels on the horizon.

Kibana: a display tool built on Node.js that provides log display, aggregation, search, analysis dashboards, and more.

Grafana: written in Go, it focuses on time-series charts for specific metrics such as CPU and I/O utilization.

How do you implement an alarm?

The long march is nearly over, with only one step left. Once log collection is in place, you will want to check whether keywords such as error or exception appear, and send an alarm notification whenever one does. Implementing that is not so easy.
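For a single local log file, the keyword check can be sketched in a few lines of shell. This is only a toy version under stated assumptions: the function names, the keyword list, and the threshold parameter are illustrative, and the actual notification (mail, chat webhook, and so on) is left out. Alerting over centrally collected logs takes considerably more than this.

```shell
#!/bin/sh
# Sketch only: function names, keywords and threshold are illustrative assumptions.

# Count lines in a log file that contain any alarm keyword (case-insensitive).
count_alarm_lines() {
    grep -c -i -E 'error|exception' "$1"
}

# Print an alert message when the count reaches the threshold (default 1);
# print nothing otherwise. Sending the message somewhere is up to the caller.
check_log() {
    file="$1"
    threshold="${2:-1}"
    # grep exits non-zero when nothing matches, so ignore its exit status.
    count=$(count_alarm_lines "$file" || true)
    if [ "${count:-0}" -ge "$threshold" ]; then
        echo "ALERT: $count matching line(s) in $file"
    fi
}
```

Because this rescans the whole file on every call, a real setup would track the last-read position or run the query against the log store instead.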

That is a lot of talk about log collection. The stack I use most often is ELK, and I will write a detailed article about it later.

3. A step up: the old hand shows off a little and takes you flying

So far, you have seen how to monitor application service status and the idea behind log-based monitoring. But have you ever wondered how a single request is actually handled? How many systems does a request pass through? Which nodes does it go through?

Let me introduce the concept of APM (application performance monitoring). If you have time, take a close look at the following three APM components.

First: Zipkin, a distributed tracing system open-sourced by Twitter, which mainly covers data collection, storage, search, and presentation.

Second: Pinpoint, a distributed tracing component open-sourced in Korea, an APM tool written in Java for large-scale distributed systems.

Third: SkyWalking, an excellent APM component made in China, a system for tracing, alerting on, and analyzing the business operation of distributed Java applications.

There are thousands of wheels, and there’s always one for you.

Final words

It is winter for the Internet industry and the environment is rough, so all you can do is improve yourself! Be strong! Improve yourself!!

Before I knew it I had typed out this many words, and I don't know how much you took away. If it helped you even a little, there is no need to praise it; recommending it to the friends around you would make me very happy.
