1. Background

Logging is such a cliché among programmers that you probably hear about it every day. I had just graduated from university when I joined the company, right in the middle of a business handover: services owned by other departments were being transferred to us for maintenance. Before one handover, the WeChat official-account features of that business went down, and the engineer responsible had great difficulty locating the fault. There was no logging along the full call chain, only an error log at the entry point, so it took several more deployments, each adding another round of logs, before the problem was finally pinned down. Around the same time another department had the opposite problem: given the service's traffic volume, it printed so many logs that disk I/O was saturated and the whole service went down.

Several years have gone by, yet the same problems keep playing out in different companies and different departments. Many developers, for the sake of their own short-term convenience, do not take logging seriously. That may be fine while you are still the one maintaining the code, since you can probably spot a problem at a glance, but it becomes a real burden once the code is handed over to someone else. Some readers of my official account have also asked me questions about logging, so here, combining my everyday experience with the Alibaba Java Development Manual, I have written down what I consider the best practices for logging.

2. Best practices

2.1 Classify log levels properly

A logging system defines six levels that control which logs are output:

  • TRACE: fine-grained online tracing; I have practically never used it.
  • DEBUG: debugging information that is not important in normal operation and is only needed in unusual scenarios.
  • INFO: routine informational output; this is the level we use most often, since troubleshooting business problems usually relies on it.
  • WARN: warning messages, usually known, expected business errors that the code can handle.
  • ERROR: exceptions or errors that we cannot handle; genuine failures belong here.
  • FATAL: the program must terminate immediately; it is rarely used, and we do not use it in business code.

Although there are six log levels, only three of them (INFO, WARN, ERROR) really matter in day-to-day business development. TRACE and FATAL are essentially never used. DEBUG mostly appears when developing infrastructure or tooling code, because such code prints logs on behalf of the business side implicitly; if the information is not particularly important, use DEBUG.

In practice, many developers mix up ERROR and WARN. ERROR logs usually trigger alerts such as SMS notifications, yet some people mark every error as ERROR, including things like "user has no permission" or "insufficient balance", and an SMS bombardment inevitably follows. Errors of this kind are part of the normal business flow and should be logged at WARN instead. A good way to handle this is to have all business exceptions inherit from a common base exception, in our case BizException, then catch that type separately and log it at WARN:

        try {
            // do something
        } catch (BizException e) {
            LOG.warn("biz exception", e);
        } catch (Exception e) {
            LOG.error("exception", e);
        }

Of course, this logic can also be handled through an aspect (AOP): put as much information as possible into the exception itself and do the logging in one unified place.
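A rough sketch of that idea, assuming a Spring MVC application and the BizException base class from above (the handler class, method names and response bodies here are illustrative, not a fixed recipe):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.ExceptionHandler;
    import org.springframework.web.bind.annotation.RestControllerAdvice;

    @RestControllerAdvice
    public class GlobalExceptionHandler {

        private static final Logger LOG = LoggerFactory.getLogger(GlobalExceptionHandler.class);

        // Known business failures: part of the normal flow, log at WARN so no alert fires
        @ExceptionHandler(BizException.class)
        public ResponseEntity<String> handleBiz(BizException e) {
            LOG.warn("biz exception", e);
            return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(e.getMessage());
        }

        // Anything unexpected: log at ERROR so monitoring and SMS alerts are triggered
        @ExceptionHandler(Exception.class)
        public ResponseEntity<String> handleOther(Exception e) {
            LOG.error("exception", e);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("internal error");
        }
    }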

Dynamic log level adjustment

As mentioned above, there are six log levels, but a logging system also has an overall output threshold. In business services we usually set it to INFO, meaning only logs at INFO level or above are written to file. Now consider the following two situations:

  • A bug is triggered in a service and a flood of error logs is generated, which not only hurts the machine's performance but also keeps sending error SMS alerts.
  • You need DEBUG-level logs to troubleshoot a problem, but changing the level to DEBUG would normally require another deployment.

Dynamic log level adjustment solves both scenarios. When a flood of error logs appears, you can raise the level immediately to stop the printing from making things worse; when you need DEBUG-level detail, you can switch the output level to DEBUG on the fly.

The methods for dynamic log level adjustment are as follows:

  • In Spring Boot 1.5 and later, import spring-boot-starter-actuator and change the log level through the actuator's HTTP endpoint.
  • Use arthas and modify the level via OGNL, as in the following command (a plain-Java equivalent is sketched after this list):
ognl -c 1be6f5c3 '@org.slf4j.LoggerFactory@getLogger("root").setLevel(@ch.qos.logback.classic.Level@DEBUG)'

  • Build your own HTTP-based tooling for switching levels; Meituan published an article on this earlier (tech.meituan.com/2017/02/17/…) that you can refer to.
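The OGNL command above simply calls the Logback API, so the same switch can be done in plain Java and exposed through whatever admin endpoint you already have. A minimal sketch, assuming Logback is the slf4j backend (the class and method names are illustrative):

    import ch.qos.logback.classic.Level;
    import ch.qos.logback.classic.LoggerContext;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LogLevelSwitcher {

        /**
         * Change the root logger's level at runtime, e.g. setRootLevel("DEBUG").
         */
        public static void setRootLevel(String levelName) {
            LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
            // Level.toLevel falls back to DEBUG if the name cannot be parsed
            context.getLogger(Logger.ROOT_LOGGER_NAME).setLevel(Level.toLevel(levelName));
        }
    }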

2.2 Use placeholders wisely

There is a classic interview question about the code below. Can you spot the problem with it?

        LOG.info("a:" + a + "b" + b );

If the output level is WARN, this INFO log will never actually be printed, but the string concatenation still happens. Many people therefore wrap the call in a level check:

        if (LOG.isInfoEnabled()){
            LOG.info("a:" + a + "b" + b );
        }

But writing it this way is rather verbose, which is exactly where placeholders come in:

        LOG.info("a:{}, b:{}", a, b);


A single line does the job. Note that the logging framework does nothing magical here: placeholder substitution is still plain string formatting, with the arguments filled in one by one, it just happens only when the log is actually going to be printed. For this reason the Vipshop Java manual recommends sticking to the "a:" + a + ", b:" + b style for ERROR logs, since an ERROR log is printed every time anyway and deferring the concatenation saves nothing. Personally, I would not worry too much about placeholder performance: use placeholders for INFO, which typically accounts for 99% of a system's log volume, while ERROR makes up only about 1%.

Do not use placeholders for exceptions

        try {
            // do something
        } catch (Exception e) {
            // wrong: the placeholder consumes the exception via toString(), losing the stack trace
            LOG.error("exception :{}", e);
        }

Many people write exception logging exactly like the code above. Filling a placeholder effectively calls toString() on the argument, so passing the exception into a placeholder throws away the stack trace and makes troubleshooting harder. The point to stress is: never pass an exception through a placeholder; pass it as the last argument after the message, and the framework will print the full stack trace automatically.
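For comparison, a corrected version of the snippet above simply drops the placeholder and passes the exception as the last argument (the message text is only an example):

        try {
            // do something
        } catch (Exception e) {
            // no placeholder for the exception: passed as the last argument,
            // so the whole stack trace is written to the log
            LOG.error("exception while doing something", e);
        }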

2.3 Select a proper log output mode

There are two log output modes, synchronous and asynchronous, and in business services we generally choose asynchronous. In log4j2 there are two asynchronous options:

  • AsyncAppender: buffers log events in an ArrayBlockingQueue and writes them out on a background thread.
  • AsyncLogger: buffers log events with the Disruptor framework and writes them out on a background thread.

Disruptor is a high-performance queue, so using AsyncLogger in log4j2 can raise log throughput. In practice, though, it is the mode we use less often, and in some places it has been banned outright after Disruptor-based logging caused CPU usage to spike. In general, AsyncAppender is good enough for our purposes.
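As a rough sketch of the AsyncAppender approach, assuming log4j2 XML configuration (the appender names, file paths, pattern and sizes below are illustrative assumptions):

    <Appenders>
        <RollingFile name="FileAppender" fileName="${LOG_HOME}/app.log"
                     filePattern="${LOG_HOME}/app-%d{yyyy-MM-dd}-%i.log">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} - %msg%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="500 MB"/>
            </Policies>
        </RollingFile>
        <!-- Async wraps the file appender: events go into an ArrayBlockingQueue
             and are written by a background thread -->
        <Async name="AsyncFile" bufferSize="1024">
            <AppenderRef ref="FileAppender"/>
        </Async>
    </Appenders>

The Root logger (not shown) would then reference AsyncFile rather than FileAppender directly.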

2.4 Retain logs properly

Once the output mode is settled, we need to think about how logs are retained, because disk space is not infinite and logs must eventually expire or be deleted. The Alibaba Java Development Manual explicitly requires that logs be kept for at least 15 days, and that logs of sensitive or important user operations be kept for at least six months. In log4j2, expired logs can be deleted with a configuration like the following:

    <DefaultRolloverStrategy max="30">
        <Delete basePath="${LOG_HOME}/" maxDepth="2">
            <IfFileName glob="*.log" />
            <IfLastModified age="30d" />
        </Delete>
    </DefaultRolloverStrategy>

With this configuration the logs are kept for at most 30 days. Of course, you can also set up a scheduled task on the machine to delete old logs.

When there are many machines, troubleshooting gets harder. The most primitive approach is to log in to the machines one by one; later we used polysh, so a command executed on one machine would run on all the others automatically, but that is still inconvenient. This is where ELK comes in: Logstash collects all the logs, Elasticsearch stores them, and Kibana provides a visual interface for analysis. Storing logs in Elasticsearch is therefore a good option.

2.5 Output logs sensibly

If a system prints large volumes of useless logs, performance suffers, so we need to think about what is actually worth printing: log what helps us, not everything indiscriminately.

Troubleshooting through logs usually crosses service boundaries, and the entries in different services' logs do not always line up well, so we need something to tie them together. That something is a traceId: with a single traceId we can pull up the log entries for the whole call chain, which makes searching far easier.
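One minimal way to do this, assuming a servlet-based service and slf4j's MDC (the filter, header name and MDC key below are illustrative assumptions), is to attach a traceId to every log line of a request:

    import java.io.IOException;
    import java.util.UUID;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import org.slf4j.MDC;

    public class TraceIdFilter implements Filter {

        @Override
        public void init(FilterConfig filterConfig) {
        }

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            // Reuse the upstream traceId if one was propagated, otherwise create one
            String traceId = ((HttpServletRequest) request).getHeader("X-Trace-Id");
            if (traceId == null || traceId.isEmpty()) {
                traceId = UUID.randomUUID().toString();
            }
            MDC.put("traceId", traceId);
            try {
                chain.doFilter(request, response);
            } finally {
                // Threads are pooled, so always clear the MDC when the request ends
                MDC.remove("traceId");
            }
        }

        @Override
        public void destroy() {
        }
    }

With this in place, adding %X{traceId} to the log pattern makes every line of the request carry the same id, and downstream calls can propagate the header so the whole chain shares it.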

2.6 Keep sensitive information out of logs

In 2018, Facebook suffered a data breach, and the whole industry suddenly started paying attention to the leakage of sensitive information. Logs are an easy place for such leaks: if a user's name, phone number and so on are printed into the logs, they can easily be stolen by bad actors, so we should be especially careful with sensitive fields when logging. I have written a separate article on log desensitization for those interested: How to design log desensitization plug-in.
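As a tiny illustration of the idea (the helper class, its method name and the 11-digit phone format are assumptions for this example, not part of any standard library), sensitive fields can be masked before they ever reach the logger:

    public final class LogMaskUtil {

        private LogMaskUtil() {
        }

        /**
         * Mask the middle digits of an 11-digit phone number, e.g. 13812345678 -> 138****5678.
         */
        public static String maskPhone(String phone) {
            if (phone == null || phone.length() != 11) {
                return phone;
            }
            return phone.substring(0, 3) + "****" + phone.substring(7);
        }
    }

    // usage: LOG.info("user login, phone:{}", LogMaskUtil.maskPhone(phone));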

2.7 Split logs into separate files

Many people dump every log into the same file, which makes searching for information painful. We can split logs into multiple files instead, for example one file per kind of middleware, with HTTP, RPC, MQ and so on each getting its own log file, which makes it much easier to find what you need when troubleshooting.
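A rough log4j2 sketch of this split (the logger and appender names below are illustrative assumptions):

    <Loggers>
        <!-- RPC calls go to their own file; additivity="false" keeps them out of the main log -->
        <Logger name="RPC_LOG" level="info" additivity="false">
            <AppenderRef ref="RpcFileAppender"/>
        </Logger>
        <!-- MQ producing/consuming logs likewise get a dedicated file -->
        <Logger name="MQ_LOG" level="info" additivity="false">
            <AppenderRef ref="MqFileAppender"/>
        </Logger>
        <Root level="info">
            <AppenderRef ref="AppFileAppender"/>
        </Root>
    </Loggers>

In code you then obtain the named logger explicitly, for example LoggerFactory.getLogger("RPC_LOG"), instead of the usual class-based logger.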

2.8 Third-party Tools

Even with all these rules, logs will not always capture exactly what we want; some methods may have no logging at all, yet we still need to investigate problems in them. In such cases we can turn to third-party tools such as Arthas: commands like watch and trace can stand in for the logs we never wrote. These tools are not a silver bullet, though. They can only observe data flowing through the system right now, so for historical data we still have to rely on the logging system itself.

3. Conclusion

Of course, log optimization is not limited to the points above; many scenarios need to be optimized in combination with the actual business. I hope this helps you make good use of logging, so that no problem is ever hard to troubleshoot!

If you found this article helpful, following and sharing it is the biggest support you can give me.