Original: Taste of Little Sister (WeChat official account: XjjDog). Sharing is welcome; please credit the source.

In day-to-day development we often use Logback to print logs, and those logs can contain sensitive content such as mobile phone numbers, card numbers and email addresses, which is a data-security risk.

However, if business code handles this itself, every log statement has to repeat the desensitization. That is tedious, clutters the code, and some places will inevitably be missed.

So we need a unified solution: enhance Logback so that log messages are checked and desensitized in one place before they are written to disk.

First, where the requirements come from

Log processing usually has to meet a few common requirements:

1) Truncation of extremely long log messages: a log message printed by the program may be very large, for example more than 1 MB, which seriously affects system performance and usually carries little value. Such messages should be truncated or discarded.

2) Unified log format: business logs in production are normally collected, analyzed and stored on demand, so a unified log format is essential for downstream data processing.

To avoid misconfigured log formats, we should standardize the format, build it in by default and restrict changes.

The log format usually contains system information used for data classification (for example project name, deployment cluster name, IP address, cloud platform, rack) as well as MDC parameter values set dynamically at runtime. The resulting format must be consistent.

3) Desensitization: desensitization is required for strings with specific rules in logs, such as mobile phone numbers.

Second, the core design idea

Instead of using the default pattern parameter to specify the format, we assemble the format from a fixed-field section plus custom fields.

The customizable fields can be system properties or a list of MDC fields. The fixed section, usually the message header, contains the time, IP, project name, and so on.

Logback provides the MessageConverter mechanism, which lets us transform the formattedMessage of an event before it is printed. What the Logger finally prints is whatever the Converter returns.

Based on this, we can perform the two main operations, truncation of over-long messages and content desensitization, in the convert method.
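To make the idea concrete without pulling in Logback, here is a minimal plain-Java sketch of the two steps as a function from the formatted message to the text that gets printed. The class name, the small MAX_LENGTH value and the "<<<" truncation marker are assumptions chosen for illustration; it is not the converter presented later in this article.

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical stand-in for the converter hook: formatted message in, final text out.
final class ConvertPipelineSketch {
    // Assumed values for illustration; the article makes both configurable.
    static final int MAX_LENGTH = 32;
    static final Pattern PHONE = Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");

    // Step 1: cut over-long messages and mark the cut.
    static final Function<String, String> TRUNCATE =
            m -> m.length() > MAX_LENGTH ? m.substring(0, MAX_LENGTH) + "<<<" : m;

    // Step 2: blank out every match with a same-length run of '*'.
    static final Function<String, String> MASK =
            m -> PHONE.matcher(m).replaceAll(r -> "*".repeat(r.group().length()));

    static String convert(String formattedMessage) {
        return TRUNCATE.andThen(MASK).apply(formattedMessage);
    }

    public static void main(String[] args) {
        System.out.println(convert("call 18611001100 now"));
    }
}
```

Truncating first means the regex never scans more than MAX_LENGTH characters, which is the performance concern raised in the requirements.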

Third, design and coding

Design concept

CommonPatternLayoutEncoder: a subclass of PatternLayoutEncoder that defines the log format by concatenating the fixed-field section, custom fields, system properties, MDC attributes, and so on.

At the same time, it uses Logback's option feature to pass dynamic parameters to the MessageConverter, splicing everything into one string that becomes the pattern attribute. The configuration parameters the Converter needs, such as maximum message length, regular expression and replacement policy, are therefore declared on the Encoder.

ComplexMessageConverter:

Performs the message conversion. It operates only on the message part passed in calls such as Logger.info(String message, Throwable ex); the throwable stack trace is not touched (and indeed cannot be modified here).

The Converter reads the option list passed by the Encoder and initializes the related processing classes. Internally it matches sensitive information with regular expressions.

DataSetPatternLayoutEncoder (optional):

Mainly used to constrain the log format of data-set logs. It does not filter sensitive information itself; the format is designed for data analysis.

The main code

Below is the main code of CommonPatternLayoutEncoder.java; see the comments for details.

package ch.qos.logback.classic.encoder;  
  
import ch.qos.logback.classic.PolicyEnum;  
import ch.qos.logback.classic.Utils;  
  
import java.text.MessageFormat;  
  
import static ch.qos.logback.classic.Utils.DOMAIN_DELIMITER;  
import static ch.qos.logback.classic.Utils.FIELD_DELIMITER;  
  
/**
 * Applies to file-based appenders.
 *
 * Constrains our log format and adds filtering of sensitive information.
 * Use regex to specify the expression to match; strings that match are processed according to policy:
 * 1) replace: mask part of the string, e.g. 18611001100 > 186****1100
 * 2) drop: discard the entire log message
 * 3) erase: mask the whole string, e.g. 18611001100 > ***********
 *
 * depth: the regex matching depth, 128 by default. Matching stops once this many matches have been handled.
 * If a message is very long we should not replace everything, otherwise performance problems may follow.
 * maxLength: the maximum length of a single message (excluding the throwable).
 *
 * For extensibility, users can still configure pattern directly. In that case regex, policy, depth and the
 * other options do not take effect, but maxLength still does.
 * Example format:
 * %d{yyyy-MM-dd/HH:mm:ss.SSS}|IP_OR_HOSTNAME|REQUEST_ID|REQUEST_SEQ|^_^|
 * SYS_K1:%property{SYS_K1}|SYS_K2:%property{SYS_K2}|MDC_K1:%X{MDC_K1:--}|MDC_K2:%X{MDC_K2:--}|^_^|
 * [%t] %-5level %logger{50} %line - %m{o1,o2,o3,o4}%n
 * Domain1 is mandatory and cannot be extended.
 * Domain2 is a K-V structure that is easy to parse; it can be empty.
 * Domain3 is the normal message part, where %m carries options that the Converter can then retrieve.
 */
public class CommonPatternLayoutEncoder extends PatternLayoutEncoder {

    protected static final String PATTERN_D1 = "%d'{'yyyy-MM-dd/HH:mm:ss.SSS'}'|{0}|%X'{'requestId:--'}'|%X'{'requestSeq:--'}'";
    protected static final String PATTERN_D2_S1 = "{0}:%property'{'{1}'}'";
    protected static final String PATTERN_D2_S2 = "{0}:%X'{'{1}:--'}'";
    protected static final String PATTERN_D3_S1 = "[%t] %-5level %logger{50} %line - ";
    // 0: maximum message length (truncated if exceeded), 1: regular expression, 2: policy,
    // 3: search depth (regex matching stops once the depth is exceeded)
    protected static final String PATTERN_D3_S2 = "%m'{'{0},{1},{2},{3}'}'%n";

    protected String mdcKeys; // keys from the MDC, separated by commas
    protected String regex = "-"; // if null or "-", the policy and depth arguments are ignored
    protected int maxLength = 2048; // maximum length of a single message
    protected String policy = "replace"; // what to do with a string that matches
    protected int depth = 128;
    protected boolean useDefaultRegex = true;
    // Mobile phone number: 11 digits, with no digit immediately before or after
    protected static final String DEFAULT_REGEX = "'((?<!\\d)1[3-9]\\d{9}(?!\\d))'";
    // System properties; if not specified, the defaults are used
    protected String systemProperties;
    protected static final String DEFAULT_SYSTEM_PROPERTIES = "project,profiles,cloudPlatform,clusterName";

    @Override
    public void start() {
        if (getPattern() == null) {
            StringBuilder sb = new StringBuilder();
            String d1 = MessageFormat.format(PATTERN_D1, Utils.getHostName());
            sb.append(d1);
            sb.append(FIELD_DELIMITER)
                    .append(DOMAIN_DELIMITER)
                    .append(FIELD_DELIMITER);
            // Set system properties; if none are configured, fall back to the defaults first
            if (systemProperties == null || systemProperties.isEmpty()) {
                systemProperties = DEFAULT_SYSTEM_PROPERTIES;
            }
            // System properties
            String[] properties = systemProperties.split(",");
            for (String property : properties) {
                String value = Utils.getSystemProperty(property);
                if (value == null) {
                    System.setProperty(property, "-"); // initialize missing properties
                }
                sb.append(MessageFormat.format(PATTERN_D2_S1, property, property))
                        .append(FIELD_DELIMITER);
            }
            // Splice in the MDC parameters
            if (mdcKeys != null) {
                String[] keys = mdcKeys.split(",");
                for (String key : keys) {
                    sb.append(MessageFormat.format(PATTERN_D2_S2, key, key));
                    sb.append(FIELD_DELIMITER);
                }
                sb.append(DOMAIN_DELIMITER)
                        .append(FIELD_DELIMITER);
            }
            sb.append(PATTERN_D3_S1);
            if (PolicyEnum.codeOf(policy) == null) {
                policy = "-";
            }
            if (maxLength < 0 || maxLength > 10240) {
                maxLength = 2048;
            }
            // A custom regex takes precedence; otherwise use the default
            if (!regex.equalsIgnoreCase("-")) {
                useDefaultRegex = false;
            }
            if (useDefaultRegex) {
                regex = DEFAULT_REGEX;
            }
            sb.append(MessageFormat.format(PATTERN_D3_S2, String.valueOf(maxLength), regex, policy, String.valueOf(depth)));
            setPattern(sb.toString());
        }
        super.start();
    }

    public String getMdcKeys() { return mdcKeys; }
    public void setMdcKeys(String mdcKeys) { this.mdcKeys = mdcKeys; }
    public String getRegex() { return regex; }
    public void setRegex(String regex) { this.regex = regex; }
    public int getMaxLength() { return maxLength; }
    public void setMaxLength(int maxLength) { this.maxLength = maxLength; }
    public String getPolicy() { return policy; }
    public void setPolicy(String policy) { this.policy = policy; }
    public int getDepth() { return depth; }
    public void setDepth(int depth) { this.depth = depth; }
    public Boolean getUseDefaultRegex() { return useDefaultRegex; }
    public boolean isUseDefaultRegex() { return useDefaultRegex; }
    public void setUseDefaultRegex(boolean useDefaultRegex) { this.useDefaultRegex = useDefaultRegex; }

    @Override
    public String getPattern() { return super.getPattern(); }

    @Override
    public void setPattern(String pattern) { super.setPattern(pattern); }

    public String getSystemProperties() { return systemProperties; }
    public void setSystemProperties(String systemProperties) { this.systemProperties = systemProperties; }
}

Code walkthrough

Here's a quick look at the code above.

An MDC parameter is declared as %X{key}. If the key does not exist, an empty string is printed. A default value is declared with :-. For example, %X{key:--} prints "-" if the key does not exist.

In logback, an option list must be declared on a conversion word, and a custom converter must be registered for that word with <conversionRule> for the options to take effect. In this example it is mainly the message that is reshaped, so the options are declared on %m in the form %m{o1,o2,...}, with multiple options separated by commas. The literal values of o1 and o2 can then be read in the Converter. Put simply, parameters can only be passed to a Converter as options on a conversion word; there is no other way.

In particular, if an option contains { or }, it must be enclosed in single quotes, for example %m{2048,'\\d{11}','replace','128'}. For readability, it is recommended to enclose every option in single quotes.

In addition, if you need a system property in the log format, declare it with %property{key}.

Note that the pattern templates in the Encoder are assembled with MessageFormat, where literal braces must be escaped with single quotes. For instance,

MessageFormat.format("Show the formatting of '{'{0}'}'.", "hello")

Output:

Show the formatting of {hello}.
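The same quoting rule is what lets the Encoder build the %m field with its option list. The sketch below reuses the template string of PATTERN_D3_S2 from the code above; the class and method names are hypothetical. It also shows why the numeric options are converted with String.valueOf first: MessageFormat formats numeric arguments with locale-specific grouping, so a raw int could come out as something like "2,048".

```java
import java.text.MessageFormat;

final class PatternTemplateSketch {
    // Same template shape as PATTERN_D3_S2 in the encoder: quoted braces are literal.
    static final String TEMPLATE = "%m'{'{0},{1},{2},{3}'}'%n";

    static String messageField(int maxLength, String regex, String policy, int depth) {
        // Numbers are turned into strings up front so MessageFormat inserts them verbatim
        // instead of applying locale-specific number formatting.
        return MessageFormat.format(TEMPLATE,
                String.valueOf(maxLength), regex, policy, String.valueOf(depth));
    }

    public static void main(String[] args) {
        // The regex argument is inserted as-is; its own quotes and braces are not re-parsed.
        System.out.println(messageField(2048, "'\\d{11}'", "replace", 128));
    }
}
```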

A few of the more important parameters follow.

useDefaultRegex

Whether to use the default expression, which matches mobile phone numbers: 11 consecutive digits with no digit immediately before or after.
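As a rough illustration of what "no digit immediately before or after" means, the sketch below compiles the same expression as DEFAULT_REGEX (without the surrounding single quotes and capture group, which exist only for logback option quoting) and probes a few inputs. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class DefaultRegexSketch {
    // 11 digits starting with 1 and a second digit in 3-9,
    // with a negative lookbehind and lookahead that reject neighboring digits.
    static final Pattern PHONE = Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");

    static boolean containsPhone(String text) {
        return PHONE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPhone("tel:18611001100."));     // a standalone 11-digit number
        System.out.println(containsPhone("order 9186110011007"));  // embedded in a longer digit run
        System.out.println(containsPhone("id 12345678901"));       // second digit outside 3-9
    }
}
```

The lookarounds keep the expression from firing on order numbers or timestamps that merely contain an 11-digit run.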

regex

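We also allow users to supply a custom expression. For it to take effect, useDefaultRegex must be false; the start() method above switches it to false automatically whenever regex is set to anything other than "-".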

maxLength

The default value is 2048. A message longer than this is truncated. The value is configurable; in the code above, values below 0 or above 10240 fall back to 2048.

policy

How to handle a string that the regex matches. (For the processing rules, see ComplexMessageConverter below.)

A) drop: the whole message is replaced by a terminating symbol. For example:

My mobile phone number is 18611001100

will be turned into:

><

B) replace: the default policy. Sensitive information is replaced with * except for the first three and last four characters. For example:

My mobile phone number is 18611001100

will be turned into:

My mobile phone number is 186****1100

C) erase: every matched string is replaced with * of equal length. For example:

My mobile phone number is 18611001100

will be turned into:

My mobile phone number is ***********
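The three policies can be sketched in plain Java as follows. This is a simplified illustration under assumptions, not the converter itself: the class and method names are hypothetical, the phone-number expression is hard-coded, and no depth limit is applied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PolicySketch {
    static final Pattern PHONE = Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");

    // Same-length mask. With keepEnds=true and at least 11 characters, the first three
    // and last four characters survive; in every other case all characters become '*'.
    static String mask(String match, boolean keepEnds) {
        int n = match.length();
        if (keepEnds && n >= 11) {
            return match.substring(0, 3) + "*".repeat(n - 7) + match.substring(n - 4);
        }
        return "*".repeat(n);
    }

    static String apply(String message, String policy) {
        Matcher m = PHONE.matcher(message);
        if (!m.find()) {
            return message;            // nothing sensitive, message unchanged
        }
        if (policy.equals("drop")) {
            return "><";               // the whole message is discarded
        }
        boolean keepEnds = policy.equals("replace");
        // replaceAll resets the matcher, so the first match is processed as well.
        return m.replaceAll(r -> mask(r.group(), keepEnds));
    }

    public static void main(String[] args) {
        String msg = "My mobile phone number is 18611001100";
        System.out.println(apply(msg, "drop"));
        System.out.println(apply(msg, "replace"));
        System.out.println(apply(msg, "erase"));
    }
}
```

Because replace and erase both keep the length of the match, the message length never changes, which the converter relies on when it edits the message in place.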

depth

The match depth is the maximum number of successful matches in a message; once it is reached, matching stops. This is mainly a performance safeguard. The default value is 128: if a message contains 200 phone numbers, matching and replacing stop after the first 128 and the remaining numbers are left as they are.
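A minimal sketch of a depth-limited pass, assuming the erase policy and a hard-coded phone-number expression (the class and method names are hypothetical). The matcher runs directly over the StringBuilder being edited; this is only safe because each replacement has exactly the same length as the match, so the offsets the matcher holds stay valid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DepthLimitSketch {
    static final Pattern PHONE = Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");

    static String eraseUpTo(String message, int depth) {
        StringBuilder sb = new StringBuilder(message);
        Matcher m = PHONE.matcher(sb);
        int hits = 0;
        // Check the budget before searching, so no extra find() runs once it is used up.
        while (hits < depth && m.find()) {
            // Same-length replacement: the builder's length and later offsets do not move.
            sb.replace(m.start(), m.end(), "*".repeat(m.end() - m.start()));
            hits++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(eraseUpTo("a 18611001100 b 13900000000 c 13700000000", 2));
    }
}
```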

mdcKeys

The list of MDC parameters to splice into the pattern. For example, mdcKeys="name,address" adds the following to the pattern:

name:%X{name:--}|address:%X{address:--}
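The fragment can be reproduced with a few lines of plain Java. The template string is the one used for PATTERN_D2_S2 above; the "|" field delimiter, the trimming of keys and the class name are assumptions for illustration. The Encoder itself appends the delimiter after every key, while this sketch joins the keys to produce exactly the text shown.

```java
import java.text.MessageFormat;
import java.util.Arrays;
import java.util.stream.Collectors;

final class MdcPatternSketch {
    static final String FIELD_DELIMITER = "|";               // assumed value of Utils.FIELD_DELIMITER
    static final String MDC_TEMPLATE = "{0}:%X'{'{1}:--'}'";  // quoted braces are literal

    static String mdcFragment(String mdcKeys) {
        return Arrays.stream(mdcKeys.split(","))
                .map(String::trim)
                .map(key -> MessageFormat.format(MDC_TEMPLATE, key, key))
                .collect(Collectors.joining(FIELD_DELIMITER));
    }

    public static void main(String[] args) {
        System.out.println(mdcFragment("name,address"));
    }
}
```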

The main job of the Encoder is to assemble a pattern like this:

%d{yyyy-MM-dd/HH:mm:ss.SSS}|IP_OR_HOSTNAME|REQUEST_ID|REQUEST_SEQ|^_^|
    SYS_K1:%property{SYS_K1}|SYS_K2:%property{SYS_K2}|MDC_K1:%X{MDC_K1:--}|MDC_K2:%X{MDC_K2:--}|^_^|
    [%t] %-5level %logger{50} %line - %m{2048,'(\\d{11})','replace',128}

In this format, domain1 is mandatory and cannot be extended.

Domain2 is assembled dynamically from the system properties and mdcKeys specified in the configuration file. It has a K-V structure, which is easy to parse, and it can be empty.

Domain3 is the regular message part, where %m carries the options that the Converter then retrieves.
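To show why the K-V layout of domain2 is convenient downstream, here is a small parsing sketch. It assumes the domains are separated by "|^_^|" and the fields by "|", as in the example format; the class name and the sample log line are invented for illustration, and values that themselves contain "|" are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class DomainParseSketch {
    static final String DOMAIN_SEPARATOR = "|^_^|"; // assumed from the example format

    // Returns the key/value pairs of domain2, or an empty map if the line has no three domains.
    static Map<String, String> parseDomain2(String line) {
        String[] domains = line.split(Pattern.quote(DOMAIN_SEPARATOR), 3);
        Map<String, String> fields = new LinkedHashMap<>();
        if (domains.length < 3) {
            return fields;
        }
        for (String kv : domains[1].split("\\|")) {
            int colon = kv.indexOf(':');
            if (colon > 0) {
                fields.put(kv.substring(0, colon), kv.substring(colon + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2024-01-01/10:00:00.000|10.0.0.1|req-1|1|^_^|project:demo|name:-|^_^|"
                + "[main] INFO  c.e.Demo 42 - hello";
        System.out.println(parseDomain2(line));
    }
}
```

Limiting the split to three parts keeps the message in domain3 intact even if it happens to contain the separator.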

Log format converter

package ch.qos.logback.classic.pattern;  
  
import ch.qos.logback.classic.PolicyEnum;  
import ch.qos.logback.classic.spi.ILoggingEvent;  
  
import java.util.List;  
import java.util.regex.Matcher;  
import java.util.regex.Pattern;  
  
/**
 * Log format converter. Logback creates one instance for each appender, so compatibility needs to be
 * considered at the configuration level.
 * Its main purpose is to match the message against the configured regex, replace the strings that match,
 * and return the corrected message.
 */
public class ComplexMessageConverter extends MessageConverter {

    protected String regex = "-";
    protected int depth = 0;
    protected String policy = "-";
    protected int maxLength = 2048;
    private ReplaceMatcher replaceMatcher = null;

    @Override
    public void start() {
        List<String> options = getOptionList();
        // Extract the options if they are present
        if (options != null && options.size() == 4) {
            maxLength = Integer.valueOf(options.get(0));
            regex = options.get(1);
            policy = options.get(2);
            depth = Integer.valueOf(options.get(3));
            if ((regex != null && !regex.equals("-"))
                    && (PolicyEnum.codeOf(policy) != null)
                    && depth > 0) {
                replaceMatcher = new ReplaceMatcher();
            }
        }
        super.start();
    }

    @Override
    public String convert(ILoggingEvent event) {
        String source = event.getFormattedMessage();
        if (source == null || source.isEmpty()) {
            return source;
        }
        // Reason for the elaborate handling: keep string conversion, reallocation and
        // character moves to a minimum by sharing a single builder
        if (source.length() > maxLength || replaceMatcher != null) {
            StringBuilder sb = null;
            // The message is too long
            if (source.length() > maxLength) {
                sb = new StringBuilder(maxLength + 6);
                sb.append(source.substring(0, maxLength))
                        .append("<<<"); // append three terminator characters
            }
            // The matcher is enabled
            if (replaceMatcher != null) {
                // maxLength was not exceeded
                if (sb == null) {
                    sb = new StringBuilder(source);
                }
                return replaceMatcher.execute(sb, policy);
            }
            return sb.toString();
        }
        return source;
    }

    class ReplaceMatcher {
        Pattern pattern;

        ReplaceMatcher() {
            pattern = Pattern.compile(regex);
        }

        String execute(StringBuilder source, String policy) {
            Matcher matcher = pattern.matcher(source);
            int i = 0;
            while (matcher.find() && (i < depth)) {
                i++;
                int start = matcher.start();
                int end = matcher.end();
                if (start < 0 || end < 0) {
                    break;
                }
                String group = matcher.group();
                switch (policy) {
                    case "drop":
                        return "><"; // return immediately on the first match
                    case "replace":
                        source.replace(start, end, facade(group, true));
                        break;
                    case "erase":
                    default:
                        source.replace(start, end, facade(group, false));
                        break;
                }
            }
            return source.toString();
        }
    }

    /**
     * Masks the string without changing its length.
     *
     * @param source   the matched string
     * @param included whether to keep the first three and last four characters
     * @return the masked string
     */
    public static String facade(String source, boolean included) {
        int length = source.length();
        StringBuilder sb = new StringBuilder();
        // Length of at least 11: keep the first three and last four, replace the middle with *.
        // Shorter than 11 or included=false: replace everything with *.
        if (length >= 11) {
            if (included) {
                sb.append(source.substring(0, 3));
            } else {
                sb.append("***");
            }
            sb.append(repeat('*', length - 7));
            if (included) {
                sb.append(source.substring(length - 4));
            } else {
                sb.append(repeat('*', 4));
            }
        } else {
            sb.append(repeat('*', length));
        }
        return sb.toString();
    }

    private static String repeat(char t, int times) {
        char[] r = new char[times];
        for (int i = 0; i < times; i++) {
            r[i] = t;
        }
        return new String(r);
    }
}

This class initializes a matcher from the options declared by CommonPatternLayoutEncoder (regex, maxLength, policy, depth) and uses it to match and replace parts of the message. Regular-expression matching costs CPU, and we also want to avoid creating many new strings while processing a message, which would cost memory. So the processing tries to keep a single main copy of the message and does not change its length when replacing, which avoids rebuilding the string.

The Converter only works together with <conversionRule>, as shown in the configuration sample below. Note, however, that each Appender creates its own Converter instance based on the <conversionRule>, so the Converter must be designed with compatibility in mind.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <conversionRule conversionWord="m" converterClass="ch.qos.logback.classic.pattern.ComplexMessageConverter"/>
  
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">  
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">  
            <level>INFO</level>  
        </filter>  
        <file>Your log file name</file>  
        <Append>true</Append>  
        <prudent>false</prudent>  
        <encoder class="ch.qos.logback.classic.encoder.CommonPatternLayoutEncoder">  
            <useDefaultRegex>true</useDefaultRegex>  
            <policy>replace</policy>  
            <maxLength>2048</maxLength>  
        </encoder>  
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">  
            <FileNamePattern>Your log name.%d{yyyy-MM-dd}.%i</FileNamePattern>
            <maxFileSize>64MB</maxFileSize>  
            <maxHistory>7</maxHistory>  
            <totalSizeCap>6GB</totalSizeCap>  
        </rollingPolicy>  
    </appender>  
  
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">  
        <encoder class="ch.qos.logback.classic.encoder.ConsolePatternLayoutEncoder"/>  
    </appender>
</configuration>

Note conversionWord="m" in the <conversionRule> node: m corresponds to %m in the pattern, and the options list is obtained from %m.

Because CommonPatternLayoutEncoder already fixes the pattern format, we no longer need to declare the pattern parameter explicitly in logback.xml. This keeps the format of business logs uniform. Of course, if a special case requires customization, you can still declare a pattern to override the default format.
