1. Background

My company recently required that sensitive data be desensitized everywhere, presumably in response to the Facebook data breach.

In general, data must be desensitized wherever it is output. Our data is output in three places:

  • Interface return values
  • Logs
  • Databases

This article covers log desensitization. In code, sensitive data reaches the log in two ways:

1. Sensitive data is in the method parameters

LOGGER.info("person mobile:{}", mobile);

For this case I suggest simply writing a utility class. The parameter name mobile is not available at logging time, so the logger cannot tell which argument is sensitive. I considered matching the arguments with regular expressions, but running a regex on every logging call is far too slow, and a string that merely looks like a phone number without being sensitive would be desensitized as well.

LOGGER.info("person mobile:{}", DesensitizationUtil.mobileDesensitiza(mobile));
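A minimal sketch of such a utility follows. Only the method name comes from the call above; the masking rule (keep the first three and last four characters) and the handling of null or short input are my assumptions for illustration.

```java
/** Hypothetical utility: only the method name appears in the article. */
final class DesensitizationUtil {

    private DesensitizationUtil() {
    }

    /**
     * Masks the middle of a mobile number, e.g. 13812345678 becomes 138****5678.
     * Assumed rule: keep the first 3 and the last 4 characters.
     */
    static String mobileDesensitiza(String mobile) {
        if (mobile == null) {
            return null;
        }
        // Too short to keep a prefix and a suffix: mask every character.
        if (mobile.length() < 8) {
            return repeat('*', mobile.length());
        }
        int maskedLength = mobile.length() - 7;
        return mobile.substring(0, 3)
                + repeat('*', maskedLength)
                + mobile.substring(mobile.length() - 4);
    }

    private static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Because the caller decides which argument to mask, no pattern matching is needed at logging time.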

2. Sensitive data is in the parameter object

Person person = new Person(); 
person.setMobile(mobile); 
LOGGER.info("person :{}", person);

In our business this second form is by far the most common, because we usually want to log the whole parameter. With the first form the sensitive value has to be extracted and passed on its own; with the second we pass a single object and the log text comes from its toString(). There are two approaches to desensitizing this case:

  • Modify toString(). There are three ways to do this:
  1. Edit the toString() source directly. This is cumbersome and inefficient: every class that needs desensitization has to be changed by hand, or you have to write an IDEA plugin that rewrites toString() automatically.
  2. Modify the abstract syntax tree at compile time to rewrite toString(), as Lombok does. I looked into this earlier; it is difficult to develop, and I may cover it in a later update.
  3. Modify the class bytecode at load time using the Instrumentation interface together with the ASM library. The inconvenience is that the JVM needs an extra startup parameter, -javaagent:<agent jar path>. I have implemented this, but it turned out not to be general enough.

  • All three ways of modifying toString() are cumbersome. The alternative is to generate the log text some other way instead of calling toString(). The following sections explain how.
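For reference, the first of the three toString() options (editing the method by hand) could look like the sketch below. The Person class mirrors the earlier example; the name field, the mask helper and its rule are assumptions. It also shows why the approach scales poorly: every class with a sensitive field needs the same treatment.

```java
/** Illustrative bean: the name field and the masking rule are assumed. */
final class Person {
    private String name;
    private String mobile;

    void setName(String name) {
        this.name = name;
    }

    void setMobile(String mobile) {
        this.mobile = mobile;
    }

    /** Hand-written toString() that never prints the raw mobile number. */
    @Override
    public String toString() {
        return "Person{name=" + name + ", mobile=" + mask(mobile) + "}";
    }

    // Keeps the first 3 and last 4 characters; anything shorter is fully hidden.
    private static String mask(String mobile) {
        if (mobile == null || mobile.length() < 8) {
            return "****";
        }
        return mobile.substring(0, 3) + "****" + mobile.substring(mobile.length() - 4);
    }
}
```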

2. Plan

The first thing to understand is what happens when we call logger.info. The figure below shows the flow for the asynchronous case (our projects all log asynchronously, because synchronous logging is too slow).



Log4j2 provides many extension points that you can customize when needed. For example, Meituan's XMDT unified log and offline alarm log are implemented as custom appenders, and the unified log also wraps LogEvent.

We can use the same extensibility in Log4j2 to meet our own requirements.

2.1 Customize the Converter for PatternLayout

That is, modify step 8 in the figure above by overriding the converter and adding filtering logic.

Advantages:

This approach looks ideal: it has little impact on logging performance, because all the filtering logic lives inside PatternLayout.

Disadvantages:

The awkward part is that at this point I only have the string that has already been generated. I would have to match keywords one by one and then desensitize the data that follows each keyword, which is too complicated. I considered optimizing the matching with an algorithm (the kind a comment system uses to filter tens of thousands of sensitive words), but the cost was too high, so I gave up.
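To illustrate the keyword matching this approach forces, here is a rough sketch that works on an already formatted log line. The key name mobile, the key=value or key:value shape of the text, and the masking rule are all assumptions. Every sensitive key would need its own pattern, and every log line would pay for every pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-processing of a finished log line. */
final class FormattedLineMasker {

    // Matches "mobile=" or "mobile:" followed by an 11-digit number (assumed format).
    private static final Pattern MOBILE =
            Pattern.compile("(mobile\\s*[=:]\\s*)(\\d{3})\\d{4}(\\d{4})");

    private FormattedLineMasker() {
    }

    /** Replaces the middle four digits after the keyword with asterisks. */
    static String mask(String formattedLine) {
        Matcher matcher = MOBILE.matcher(formattedLine);
        return matcher.replaceAll("$1$2****$3");
    }
}
```

A number that follows a different keyword is left alone, which is correct here but also shows the weakness: anything logged under a key the pattern list does not know stays in clear text.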

2.2 Customizing the Global Filter

While working on the first approach I hit a bottleneck, because I had not yet analyzed the whole Log4j2 pipeline. I figured that looking at the full pipeline might suggest more ideas, which is how the figure above came about.

Why is approach 2.1 not feasible? Mainly because it only sees strings that have already been generated. So I wondered whether I could change how the string is generated. A log line is just a string, and it does not matter how that string is produced.

That is when I thought of JSON, which is also a string and is our usual data exchange format. If the log text is generated as JSON, we can filter the values during serialization and desensitize the ones that need it.

Of course, JSON serialization and toString() can differ noticeably in speed, so I use Fastjson here. Fastjson uses ASM bytecode generation to avoid the overhead of reflection, and as the performance benchmark below shows, the impact is negligible.

So we actually need two filters: a Log4j2 filter that intercepts log events for desensitization, and a Fastjson filter that processes individual fields during JSON serialization.
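The idea of filtering values during serialization can be sketched without any dependency. The code below is not Fastjson: it uses plain reflection as a stand-in so that it runs on the JDK alone, and it only handles flat beans without string escaping. In the real setup the filterValue role would be played by a Fastjson ValueFilter. The class name, the list of sensitive field names and the masking rule are assumptions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Dependency-free stand-in for "serialize to JSON with a value filter". */
final class JsonDesensitizer {

    /** Field names treated as sensitive (assumed list). */
    private static final Set<String> SENSITIVE =
            new HashSet<>(Arrays.asList("mobile", "idCard"));

    private JsonDesensitizer() {
    }

    /** Sees every field name and value before it is written, like a value filter. */
    static Object filterValue(String name, Object value) {
        if (value instanceof String && SENSITIVE.contains(name)) {
            String s = (String) value;
            return s.length() < 8
                    ? "****"
                    : s.substring(0, 3) + "****" + s.substring(s.length() - 4);
        }
        return value;
    }

    /** Writes a flat bean as a JSON-like string, filtering each value on the way. */
    static String deepToString(Object bean) {
        if (bean == null || bean instanceof String
                || bean instanceof Number || bean instanceof Boolean) {
            return String.valueOf(bean);
        }
        StringBuilder sb = new StringBuilder("{");
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            Object value;
            try {
                value = filterValue(field.getName(), field.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(field.getName()).append("\":");
            if (value instanceof String) {
                sb.append('"').append(value).append('"');
            } else {
                sb.append(value);
            }
        }
        return sb.append('}').toString();
    }

    /** Demo bean mirroring the Person example. */
    static final class Person {
        final String name;
        final String mobile;

        Person(String name, String mobile) {
            this.name = name;
            this.mobile = mobile;
        }
    }
}
```

The point of the design is that the filter works on field names and typed values, not on finished text, so no keyword matching is needed.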

Advantages:

Adding the filter to the log4j.xml configuration file makes it take effect globally.

Disadvantages:

1. Because it is global, every log parameter is serialized to JSON instead of going through toString(), which may be unacceptable for services that need extreme performance (for example, where even 1 ms more is too much).

2. As the figure shows, our filter runs at the first step, before Log4j2's own level filter. Since we sometimes adjust the log level dynamically, a log would be converted even when it is below the current output level, which is wasted work.

The second point has been addressed: I do the level check up front in the filter and reject the event directly if its level is not enabled.

Example code is as follows:

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.core.Filter;
import org.apache.logging.log4j.core.Logger;
import org.apache.logging.log4j.core.config.Node;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.config.plugins.PluginAttribute;
import org.apache.logging.log4j.core.config.plugins.PluginFactory;
import org.apache.logging.log4j.core.filter.AbstractFilter;

@Plugin(name = "CrmSensitiveFilter", category = Node.CATEGORY, elementType = Filter.ELEMENT_TYPE, printObject = true)
public class CrmSensitiveFilter extends AbstractFilter {
    private static final long serialVersionUID = 1L;

    // Switch that turns desensitization on or off
    private final boolean enabled;

    private CrmSensitiveFilter(final boolean enabled, final Result onMatch, final Result onMismatch) {
        super(onMatch, onMismatch);
        this.enabled = enabled;
    }

    @Override
    public Result filter(final Logger logger, final Level level, final Marker marker, final Object msg, final Throwable t) {
        return filter(logger, level, marker, null, msg);
    }

    @Override
    public Result filter(Logger logger, Level level, Marker marker, String msg, Object... params) {
        if (!this.enabled) {
            return onMatch;
        }
        // Level check done up front: skip the conversion for events below the logger's level
        if (level == null || logger.getLevel().intLevel() < level.intLevel()) {
            return onMismatch;
        }
        if (params == null || params.length <= 0) {
            return super.filter(logger, level, marker, msg, params);
        }
        // deepToString is the Fastjson-based conversion (not shown in this excerpt)
        for (int i = 0; i < params.length; i++) {
            params[i] = deepToString(params[i]);
        }
        return onMatch;
    }

    @PluginFactory
    public static CrmSensitiveFilter createFilter(@PluginAttribute("enabled") final Boolean enabled,
                                                  @PluginAttribute("onMatch") final Result match,
                                                  @PluginAttribute("onMismatch") final Result mismatch) throws IllegalArgumentException,
                                                                                                     IllegalAccessException {
        return new CrmSensitiveFilter(enabled, match, mismatch);
    }
}



2.3 Override MessageFactory

The drawback of the global filter is that it cannot be applied selectively, so I turned to the third step in the figure, where the log content is generated and the Message is produced.

By overriding MessageFactory we can generate our own Message, and at the code level we can choose whether a logger obtained from LogManager uses our MessageFactory or the default one, for example by passing the factory to LogManager.getLogger(Class, MessageFactory).

The Message is still generated with the Fastjson value filter described above.
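What a custom Message has to do can be imitated in a few lines: walk the {} placeholders and substitute each parameter after it has gone through a conversion function. The sketch below is not Log4j2's ParameterizedMessage and does not handle escaping or Throwable arguments. The class name and the toText parameter are hypothetical; in the real code the conversion would be the desensitizing deepToString.

```java
import java.util.function.Function;

/** Dependency-free imitation of placeholder substitution with a custom conversion. */
final class PlaceholderFormatter {

    private PlaceholderFormatter() {
    }

    /**
     * Replaces each "{}" in the pattern with toText applied to the next parameter.
     * Placeholders without a matching parameter are left as they are.
     */
    static String format(String pattern, Function<Object, String> toText, Object... params) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int next = 0;
        while (params != null && next < params.length) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break;
            }
            sb.append(pattern, from, at).append(toText.apply(params[next++]));
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }
}
```

Because the conversion is handed in per call, one logger can desensitize its parameters while another keeps the default behaviour, which is the advantage of this approach over the global filter.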

Advantages:

The behavior can be chosen per LOGGER rather than globally.

Disadvantages:

It only works with Log4j2 and does not apply to other logging frameworks such as Logback.

Here is part of the code:

import org.apache.logging.log4j.message.AbstractMessageFactory;
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.ObjectMessage;

public class DesensitizedMessageFactory extends AbstractMessageFactory {
    private static final long serialVersionUID = 1L;

    /**
     * Instance of DesensitizedMessageFactory.
     */
    public static final DesensitizedMessageFactory INSTANCE = new DesensitizedMessageFactory();

    /**
     * @param message The message pattern.
     * @param params The message parameters.
     * @return The Message.
     *
     * @see org.apache.logging.log4j.message.MessageFactory#newMessage(String, Object...)
     */
    @Override
    public Message newMessage(String message, Object... params) {
        return new DesensitizedMessage(message, params);
    }

    /**
     * @param message The object to log.
     * @return An ObjectMessage wrapping the desensitized text of the object.
     */
    @Override
    public Message newMessage(Object message) {
        return new ObjectMessage(DesensitizedMessage.deepToString(message));
    }
}

3. Usage

Our team's business project used Log4j 2.6 with the filter approach. After an upgrade to 2.7, desensitization suddenly stopped working. Reading the source showed that filter handling had changed, and the problem appeared when a log call had two or fewer parameters.

Choose the option that best fits your business scenario:

For Log4j versions below 2.6, use the filter; for versions above 2.6, use the MessageFactory.

3.1 Filter Configuration (Optional)

Find log4j.xml (each environment has its own copy).

Add the filter configuration directly inside the outermost node. The enabled attribute is the on/off switch: true turns desensitization on, false turns it off.

3.2 MessageFactory Configuration (Optional)

Create a file: log4j2.component.properties

Input: log4j2.messageFactory=log.message.DesensitizedMessageFactory

4. Performance benchmarks

The benchmarks focus on the efficiency of printing logs.

Hardware:

4 cores, 8 GB memory

Operating system:

Linux

JRE:

1.8.0_101, initial heap size 4 GB

Warm-up strategy:

Before the tests start, a global warm-up runs all tests several times and stops once the running time is stable, which ensures that all required classes are loaded. Before each individual test, an independent warm-up repeats that test 64 times so that the JIT compiler has fully optimized the code.

Execution strategy:

Each test is run in a loop, starting at 200 iterations and increasing in steps of 200 up to 1000. Each loop count is run 10 times; the highest and lowest results are discarded and the rest are averaged.
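The execution strategy can be sketched as a small harness. This is not the code used for the measurements in this article; the class and method names are hypothetical, and only the numbers (64 warm-up runs, loop counts from 200 to 1000 in steps of 200, 10 runs with the highest and lowest dropped) come from the description above.

```java
import java.util.Arrays;

/** Hypothetical harness following the warm-up and execution strategy described above. */
final class TrimmedBenchmark {

    private TrimmedBenchmark() {
    }

    /** Average of the samples after dropping one highest and one lowest value. */
    static double trimmedMean(double[] samples) {
        if (samples.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    /** Wall-clock time in milliseconds for the given number of task runs. */
    static double timeMillis(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    /** Returns one trimmed mean per loop count: 200, 400, 600, 800, 1000. */
    static double[] run(Runnable task) {
        // Independent warm-up: 64 repetitions before measuring.
        for (int i = 0; i < 64; i++) {
            task.run();
        }
        double[] results = new double[5];
        for (int step = 0; step < results.length; step++) {
            int iterations = 200 * (step + 1);
            double[] samples = new double[10];
            for (int r = 0; r < samples.length; r++) {
                samples[r] = timeMillis(task, iterations);
            }
            results[step] = trimmedMean(samples);
        }
        return results;
    }
}
```

Passing a Runnable that logs one line with desensitization and another that logs without it gives two series that can be compared loop count by loop count.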

Test results:

The results above show that the growth rate is basically stable.

Logging with desensitization takes about 1.5 times as long as logging without it.

On average, one log line without desensitization took 0.1255 ms to generate and one with desensitization took 0.18825 ms, a difference of about 0.06 ms.

Assuming a request prints at most 10 to 20 logs, desensitization adds about 0.6 ms to 1.2 ms per request (10 to 20 logs at about 0.06 ms each), which I consider negligible for the request as a whole.

The performance of this approach is therefore good enough for use in production.


