Preface

Logging is both familiar and unfamiliar to client developers. Like code comments and coding style, it adds no functionality by itself and is usually not tied to your KPIs. But when a problem occurs online and you have no idea where to start, you find yourself saying, “I wish I had printed a log here.”

However, for a long time there has been no reasonable logging specification, so logs are abused and misused again and again. To address this pain point, this article starts from three questions, “Why print logs?”, “How should logs be printed?”, and “When should logs be printed?”, and draws up a log printing specification suited to client development.

WHY: Why are logs printed?

In terms of client development, the common reasons for printing logs are as follows:

1. Verify that the logic execution is correct

Compared with inefficient breakpoint debugging, effective log output at the key nodes of the execution flow lets you quickly verify whether parameters, process, and results meet expectations and adjust the code in time, which improves debugging efficiency in the development and test phases.

2. Monitor the running status of components

When a product’s functions depend on one or more long-running components or services, an application monitoring system built on the log system can monitor their running status in real time. A fault can then trigger a timely warning, and developers can be notified to handle it before it spreads.

3. Restore the fault site

Due to the fragmentation of end-devices and the unpredictability of user behavior, the pre-launch testing phase often fails to cover all contingencies.

When a problem occurs online, it is particularly important that the logs provide enough information to restore the user’s scenario and behavior at that time, so that the cause can be located quickly and unnecessary disputes and blame are reduced.

4. Record the track of user operations

With the help of data analysis tools, log data can be used to analyze users’ habits and preferences and to build user profiles, so that the product can offer personalized services and become more competitive.

HOW: How should logs be printed?

Before we go into this, let’s clarify what the log levels are:

Log levels

In ascending order of importance, they are:

· DEBUG: Debugging information that is useful only during development.

Logs of this level are mainly used in the development/test phase. The content and format of logs can be adjusted according to the actual debugging needs of developers. Generally, logs of this level contain parameter information, process information, and return value information.

Note that logs of this level must not be brought into production. It is recommended to build a debug-mode check into the API of the encapsulated logging utility class.

· INFO: Expected log information for general usage.

This level of logs records specific service behaviors in the scenarios mentioned in reasons 2, 3, and 4. Use it selectively and output only content that is meaningful to the result, to avoid excessive log output that exhausts the device’s storage space.

· WARN: Information about potential problems that have not yet caused a serious error.

This level of logs mainly applies to the scenarios mentioned in reasons 2 and 3. Logs of this level usually involve problems that can be predicted in advance and whose impact is controllable. Such problems generally do not affect the normal execution of service processes; they include but are not limited to missing parameters, invalid parameters, and task timeouts. For this level, the context at the time the problem occurred should be recorded in as much detail as possible for later log analysis.

· ERROR: Information about problems that have already caused a serious error.

This level of logs mainly applies to the scenarios mentioned in reasons 2 and 3. It covers unpredictable exceptions or errors with a wide impact that may crash the application or severely block the normal execution of service processes and that require timely manual intervention. In addition to the context information at the time of the problem, logs of this level must include the full exception stack so that the problem can be located quickly and fixed in time.
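The level rules above can be sketched as a small utility. This is only an illustration under stated assumptions: `LogLevel`, `SimpleLog`, and the `debugMode` flag are hypothetical names rather than the actual logging utility used in this article, and the output goes to an in-memory list instead of logcat so that the sketch runs on any JVM.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

/** Log levels in ascending order of importance, as listed above. */
enum LogLevel { DEBUG, INFO, WARN, ERROR }

/** Hypothetical logging helper; keeps lines in memory for demonstration only. */
final class SimpleLog {
    private final boolean debugMode;              // assumed to come from a build flag
    private final List<String> lines = new ArrayList<>();

    SimpleLog(boolean debugMode) {
        this.debugMode = debugMode;
    }

    void d(String tag, String msg) { log(LogLevel.DEBUG, tag, msg, null); }
    void i(String tag, String msg) { log(LogLevel.INFO, tag, msg, null); }
    void w(String tag, String msg) { log(LogLevel.WARN, tag, msg, null); }
    void e(String tag, String msg, Throwable t) { log(LogLevel.ERROR, tag, msg, t); }

    private void log(LogLevel level, String tag, String msg, Throwable t) {
        // DEBUG logs must not reach production: drop them unless debug mode is on.
        if (level == LogLevel.DEBUG && !debugMode) {
            return;
        }
        StringBuilder line = new StringBuilder()
                .append(level).append('/').append(tag).append(": ").append(msg);
        if (t != null) {
            // Append the full stack trace of the exception to the log line.
            StringWriter stack = new StringWriter();
            t.printStackTrace(new PrintWriter(stack, true));
            line.append('\n').append(stack);
        }
        lines.add(line.toString());
    }

    List<String> lines() {
        return lines;
    }
}
```

With `new SimpleLog(false)`, a call to `d(...)` leaves no trace, while `i(...)`, `w(...)`, and `e(...)` are still recorded. A real implementation would forward the line to the platform logger instead of a list.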

The following example uses a common login module to show how the different log levels are used:

  1. First, assume that our login module offers three login methods: verification code login, password login, and third-party platform login. We can define a separate tag for each login method:

    public static final String TAG_LOGIN = "login";
    public static final String TAG_LOGIN_IDENTIFYING_CODE = "login_identifying_code";
    public static final String TAG_LOGIN_PASSWORD = "login_password";
    public static final String TAG_LOGIN_THIRD_PARTY = "login_third_party";

  2. Assume that the user chooses verification code login, enters a mobile phone number, and clicks the “Get Verification Code” button. The client then requests the interface for obtaining the verification code, disables the button, and starts a timer. After one minute the button becomes available again and the verification code can be requested again. Until the timer starts, repeated clicks on the button must be ignored to avoid duplicate requests.

  3. To verify that both the duplicate-click protection and the timed re-enabling of the button work, we can print the following DEBUG level logs with the verification code login TAG:

    LogUtil.d(TAG_LOGIN_IDENTIFYING_CODE, "Repeated click within the time interval, return without processing");
    ...
    LogUtil.d(TAG_LOGIN_IDENTIFYING_CODE, "Start timing, button is unavailable");
    ...
    LogUtil.d(TAG_LOGIN_IDENTIFYING_CODE, "Current remaining seconds: " + second);
    ...
    LogUtil.d(TAG_LOGIN_IDENTIFYING_CODE, "Time is over, button is available");

With these logs we can also cover other test scenarios that are hard to verify by eye, such as whether the countdown continues normally after the application returns from the background, and whether the countdown ends normally when the user switches from verification code login to another login method.

  4. Assume that the user receives the verification code text message and logs in successfully after entering the code. We need to print the specific business behaviors in this process at the INFO level, so that the user’s behavior track and the on-site information at the time of a fault can be restored during later log analysis:

    LogUtil.i(TAG_LOGIN_IDENTIFYING_CODE, "Request verification code interface, phone: " + phone);
    ...
    LogUtil.i(TAG_LOGIN_IDENTIFYING_CODE, "Request verification code interface succeeded, response: " + response.toString());
    ...
    LogUtil.i(TAG_LOGIN_IDENTIFYING_CODE, "Request verification code login interface, phone: " + phone + ", identifyingCode: " + identifyingCode);
    ...
    LogUtil.i(TAG_LOGIN_IDENTIFYING_CODE, "Request verification code login interface succeeded, response: " + response.toString());
    ...
    LogUtil.i(TAG_LOGIN, "Start synchronizing user configuration...");
    ...
    LogUtil.i(TAG_LOGIN, "Start synchronizing message notifications...");
    ...
  5. When the verification code times out, the user can request it again by clicking the “Get Verification Code” button, but the situation should be recorded in a WARN level log together with its context, so that its frequency can be counted later to find room for optimization:

    String msg = new StringBuilder("Verification code timeout, ")
            .append("countryCode: ").append(countryCode)
            .append(", phone: ").append(phone)
            .append(", time: ").append(DateUtil.format(System.currentTimeMillis()))
            .append(", networkAvailable: ").append(NetworkUtil.isNetworkAvailable(getContext()))
            .append(", networkType: ").append(NetworkUtil.getNetworkType(getContext()))
            .toString();
    LogUtil.w(TAG_LOGIN_IDENTIFYING_CODE, msg);
  6. When the “Get Verification Code” interface is unavailable and the login process is blocked, we need to print, at the ERROR level in the request failure callback, either the error code and error message returned by the interface or the exception stack it threw, to help locate the cause quickly and fix it in time:

    @Override
    public void onFailure(Response response, IOException e) {
        if (response != null) {
            // The interface answered with an error: log its code and message.
            LogUtil.e(TAG_LOGIN_IDENTIFYING_CODE, "Request to obtain verification code failed, code: " + response.getCode() + ", msg: " + response.getMsg());
        } else {
            // No response at all: log the exception with its stack.
            LogUtil.e(TAG_LOGIN_IDENTIFYING_CODE, "Request to obtain verification code failed", e);
        }
    }

The other two login methods are handled similarly and are not repeated here; just make sure to use the correct TAG. Next, let’s enumerate some specific logging rules:

Log specification

1. Control the log level so that log content matches its level

Different log levels indicate different degrees of importance. If a level is misused, important logs are drowned out by noise.

    // Misuse: a routine event logged at the ERROR level
    LogUtil.e(TAG, "Send a Msg");

2. Use tags properly to quickly filter out specified logs

The suggested naming method is to divide the TAG by module granularity, order the module names by dependency from largest to smallest, and separate them with underscores. This lets you check at different granularities whether a module is working properly.

Note that older versions of Android limit the logcat TAG to 23 characters. Use reasonable, understandable abbreviations, and try not to exceed three module levels.

For example, the following two tags represent:

  • msgserv_ws_keepalive: Message access service / WebSocket module / heartbeat keepalive function
  • msgserv_ws_msgqueue: Message access service / WebSocket module / message queue function

If I only want to focus on the heartbeat keepalive function, I can filter by the full tag msgserv_ws_keepalive. If I want to check whether the entire WebSocket module is working properly, I can filter by msgserv_ws.

    public static final String TAG = "msgserv_ws_keepalive";
    ...
    // TAG is omitted below for convenience
    LogUtil.i("Received a pong frame, do nothing");
    LogUtil.i("onPause()");
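The naming rule could be enforced by a small helper. The sketch below is only an assumption about how such a helper might look: `LogTags`, `of`, and `matches` are hypothetical names, the separator is the underscore seen in the example tags, and the limits are the 23 characters and three module levels mentioned above.

```java
import java.util.Locale;

/** Hypothetical helper that builds hierarchical tags such as "msgserv_ws_keepalive". */
final class LogTags {
    static final int MAX_TAG_LENGTH = 23;   // tag length limit mentioned in the text
    static final int MAX_LEVELS = 3;        // recommended maximum module depth

    private LogTags() {}

    /** Joins module names from the largest to the smallest with underscores. */
    static String of(String... modules) {
        if (modules.length == 0 || modules.length > MAX_LEVELS) {
            throw new IllegalArgumentException("Expected 1 to " + MAX_LEVELS + " module levels");
        }
        String tag = String.join("_", modules).toLowerCase(Locale.ROOT);
        if (tag.length() > MAX_TAG_LENGTH) {
            throw new IllegalArgumentException(
                    "Tag longer than " + MAX_TAG_LENGTH + " characters: " + tag);
        }
        return tag;
    }

    /** Returns true when a tag equals the prefix or lies below it, e.g. "msgserv_ws". */
    static boolean matches(String tag, String prefix) {
        return tag.equals(prefix) || tag.startsWith(prefix + "_");
    }
}
```

For example, `LogTags.of("msgserv", "ws", "keepalive")` produces `msgserv_ws_keepalive`, and `LogTags.matches(tag, "msgserv_ws")` selects every tag of the WebSocket module while rejecting an unrelated tag such as `msgserv_wss_x`.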

3. Ensure that important information is complete to avoid invalid logs

A large number of invalid logs not only occupies the device’s storage space but also makes valid logs harder to find, which hinders locating and solving problems quickly. So before printing a log, ask yourself: What is the purpose of this log? Does it really help solve the problem?

    LogUtil.e(TAG, "Websocket connection was closed", e);
    LogUtil.w(TAG, "Request failed with code: " + code);
    LogUtil.d(TAG, "Current download progress: ");

    // Example: invalid log, missing return code and description
    LogUtil.w(TAG, "Download failed");

    LogUtil.d(TAG, "1");
    LogUtil.d(TAG, "2");
    LogUtil.d(TAG, "3");

4. Keep the log content concise and clear without affecting readability

    // Builds a multi-line message, one request field per line
    String msg = new StringBuilder()
            .append("Request method: ").append(request.method()).append("\n")
            .append("Request url: ").append(request.url()).append("\n")
            .append("Request headers: ").append(request.headers()).append("\n")
            .append("Request body: ").append(request.body()).append("\n")
            .toString();
    LogUtil.d(TAG, msg);

    // Logs the whole request through its toString()
    LogUtil.i(TAG, request.toString());

5. Use StringBuilder instead of string concatenation to handle many arguments

When developing in Java, concatenating strings with + can create many intermediate String objects, especially when the message is built step by step or inside a loop. If there are many parameters, you are advised to use StringBuilder instead of string concatenation.

    // String concatenation with many arguments
    LogUtil.d(TAG, "Request method: " + request.method() + "\n"
            + "Request URL: " + request.url() + "\n"
            + "Request headers: " + request.headers() + "\n"
            + "Request body: " + request.body());

6. Logs containing sensitive information must be desensitized, encrypted, or not output at all

Logs printed in day-to-day development must not leak sensitive information. If logs are persisted locally, their content should be encrypted.
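As an illustration of desensitization, the sketch below masks the middle of a phone number before it is logged. `Desensitizer` and `maskPhone` are hypothetical names, and keeping the first three and last four characters is only one possible masking rule; a project should choose rules that fit its own privacy requirements.

```java
/** Hypothetical helper for masking sensitive values before they are logged. */
final class Desensitizer {
    private Desensitizer() {}

    /** Keeps the first 3 and last 4 characters and replaces the rest with '*'. */
    static String maskPhone(String phone) {
        if (phone == null) {
            return "null";
        }
        if (phone.length() <= 7) {
            return "****";              // too short to reveal any part safely
        }
        StringBuilder masked = new StringBuilder(phone.length());
        masked.append(phone, 0, 3);
        for (int i = 3; i < phone.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(phone, phone.length() - 4, phone.length());
        return masked.toString();
    }
}
```

A log statement would then print `Desensitizer.maskPhone(phone)` instead of the raw `phone` value.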

7. To print an entity class defined by the Java language, you must override the toString() method

For an entity class defined in Java, the default toString() outputs only the class name and the object’s hash code, which has no reference value.

    @Override
    public String toString() {
        return JsonUtil.toJson(this);
    }
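The difference can be seen in a minimal sketch that does not depend on any JSON utility. `UserConfig` and `RawConfig` are hypothetical classes invented for this illustration only.

```java
/** Hypothetical entity class that overrides toString() for logging. */
final class UserConfig {
    private final String userId;
    private final boolean pushEnabled;

    UserConfig(String userId, boolean pushEnabled) {
        this.userId = userId;
        this.pushEnabled = pushEnabled;
    }

    @Override
    public String toString() {
        // List the fields that are useful when reading a log line.
        return "UserConfig{userId='" + userId + "', pushEnabled=" + pushEnabled + "}";
    }
}

/** Same kind of class without an override: falls back to Object.toString(). */
final class RawConfig {
}
```

Logging `new UserConfig("u1", true)` prints `UserConfig{userId='u1', pushEnabled=true}`, while logging `new RawConfig()` prints only the class name followed by `@` and a hexadecimal hash code.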

8. Avoid instability and additional performance loss due to the introduction of the log system

As mentioned earlier, logging adds no functionality by itself, but log statements are still code, and code carries hidden stability risks and performance costs that developers must watch closely. It is best to support online degradation: when logging causes adverse effects, you can stop printing logs of a certain level or stop printing logs altogether.

    // Note: this line throws a NullPointerException if message is null
    LogUtil.d(TAG, "Insert a new message: " + message.getId());
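One possible way to support degradation and to keep a faulty log statement from breaking the business flow is sketched below. Everything here is an assumption made for illustration: `DegradableLog`, its integer level constants, and the `setMinLevel` switch are hypothetical, and in a real project the minimum level might come from a remote configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical logger whose output can be degraded at runtime. */
final class DegradableLog {
    // Priorities follow the order used in this article: DEBUG < INFO < WARN < ERROR.
    static final int DEBUG = 0, INFO = 1, WARN = 2, ERROR = 3, OFF = 4;

    private volatile int minLevel = DEBUG;            // assumed to be remotely adjustable
    private final List<String> out = new ArrayList<>(); // in-memory sink for the demo

    void setMinLevel(int level) {
        this.minLevel = level;
    }

    synchronized void log(int level, String tag, Supplier<String> message) {
        if (level < minLevel) {
            return;                                   // degraded: the message is never built
        }
        String text;
        try {
            text = message.get();
        } catch (RuntimeException ex) {
            // A broken log statement must not crash the business flow.
            text = "<log message failed: " + ex + ">";
        }
        out.add(tag + ": " + text);
    }

    synchronized List<String> output() {
        return new ArrayList<>(out);
    }
}
```

Passing the message as a `Supplier` means that a degraded level costs almost nothing, and an exception thrown while building the message (for example by a null object) is turned into a log line instead of a crash. Calling `setMinLevel(DegradableLog.OFF)` stops all output.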

9. Set a proper cache duration for log files and periodically clear expired logs

It is recommended to remove expired log files by date using a FIFO policy. The maximum cache time can be set according to the product’s service characteristics (for example, whether it has regular weekly or monthly activities). You can check for expired files on every read/write operation, or create a background task that periodically checks for and deletes them.
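The clearing policy could look roughly like the following sketch. `LogFileCleaner` and `deleteExpired` are hypothetical names, and the sketch assumes that a file’s last-modified time is a good enough indicator of its date; a project that encodes the date in the file name would sort by name instead.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical cleaner that deletes log files older than a maximum cache time. */
final class LogFileCleaner {
    private LogFileCleaner() {}

    /**
     * Deletes files in logDir whose last-modified time is older than maxAgeMillis,
     * oldest first (FIFO). Returns the number of files deleted.
     */
    static int deleteExpired(File logDir, long maxAgeMillis, long nowMillis) {
        File[] files = logDir.listFiles(File::isFile);
        if (files == null) {
            return 0;                       // not a directory, or an I/O error occurred
        }
        Arrays.sort(files, Comparator.comparingLong(File::lastModified));
        int deleted = 0;
        for (File file : files) {
            if (nowMillis - file.lastModified() <= maxAgeMillis) {
                break;                      // sorted by age: all remaining files are newer
            }
            if (file.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

The method can be called either before each write or from a periodic background task, for example `LogFileCleaner.deleteExpired(logDir, 7L * 24 * 60 * 60 * 1000, System.currentTimeMillis())` for a seven-day cache time.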

10. Prohibit direct use of third-party logging framework APIs to avoid solution fragmentation

Once the specification is established, the actual log processing can be implemented by a third-party logging framework. However, to keep the solution unified and replaceable, log printing should be encapsulated in a logging utility class that acts as the facade (following the facade pattern), and project code should print logs only through this utility class.
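A minimal sketch of such a facade is shown below. `LogBackend` and `AppLog` are hypothetical names, and the default backend simply writes to standard output; in a real project the backend would be an adapter around whichever logging framework the team has chosen.

```java
/** Backend contract; a third-party framework would be adapted to this interface. */
interface LogBackend {
    void print(String level, String tag, String message);
}

/** Facade used by project code; the backend can be swapped without touching callers. */
final class AppLog {
    // Default backend for the sketch: plain standard output.
    private static volatile LogBackend backend =
            (level, tag, message) -> System.out.println(level + "/" + tag + ": " + message);

    private AppLog() {}

    /** Replaces the backend, for example when switching logging frameworks. */
    static void setBackend(LogBackend newBackend) {
        backend = newBackend;
    }

    static void i(String tag, String message) { backend.print("INFO", tag, message); }
    static void w(String tag, String message) { backend.print("WARN", tag, message); }
    static void e(String tag, String message) { backend.print("ERROR", tag, message); }
}
```

Because callers only ever see `AppLog`, replacing the underlying framework is a single `setBackend` call rather than a change across the whole code base.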

WHEN: When should logs be printed?

This article does not cover all of the logging scenarios, but only lists some common scenarios, which can be expanded according to the actual needs of the project.

1. Execution process of the core business of the product

Needless to say, whether the core business of the product runs normally determines the final quality of the product, affects the company’s reputation in the industry, and is directly linked to revenue. On the one hand, a comprehensive log system is needed to help uncover hidden technical defects. On the other hand, developers need logs to locate problems and respond quickly to user feedback.

2. Cross-terminal/cross-application/cross-module communication process

Common cases include external interface requests and responses, where the log mainly covers the request method, request headers, request parameters, response code, and response content, all of which affect the success rate of interface requests and data display. Similar cases are data sharing between different applications and route navigation between different service modules of the same application.

3. Initial startup configuration of important components

The initial startup configuration of an important component has a direct impact on the overall performance of an application. You can print the initial startup configuration parameters of a component to verify whether there are exceptions caused by parameter configuration errors.

4. Behavior/state switching of long-running components

Components that run for a long time are greatly affected by device memory, power, network, and user operations. You need to monitor the running status of components in real time through logs to ensure their normal running.

5. Decisions in multi-branch logic

Typically, there are multiple conditional branches in the code, or multiple policy classes to choose from, and you need to determine which branch or policy you entered to verify that the process is executing as expected.

6. User interactions that affect the result

A representative example is the search module. A search can be triggered from the input box, historical search terms, hot search terms, the hot list, suggested terms, and other entry points. To determine which entry point triggered a search, the specific user interaction needs to be recorded.

7. Calling a method that has a high probability of failure

When a function depends on external conditions being valid, calling it is likely to fail. For example, persisting data needs enough storage space, and requesting an external interface needs an available network. When we call such methods, we usually need to verify that the external conditions hold and handle the case where they do not. Proper log printing helps us verify that this handling is reasonable and works.

8. The calling process of the third-party SDK

Since we have no control over the implementation of a third-party SDK, we cannot guarantee that introducing it will not affect the stability of the application.

To guard against this, we need to print the information returned by the third-party SDK during its API calls, so that we can communicate with the SDK provider promptly and effectively when problems occur.

Conclusion

Drawing up the specification is only the first step; sticking to it over the long term is the key. We must also understand that good logs are not written in one go but are adjusted continuously as problems are found in actual use. It is suggested that future code reviews pay attention to log output, that team members discuss better log content and formats together, and that bad logging habits be corrected in time.