Data security is an everyday concern. When developers build interfaces, they often expose sensitive information such as user accounts, mobile phone numbers, and ID card numbers. Others can then call those interfaces to harvest our user data, and private information leaks. If we put a desensitization (data masking) layer between the database and the application, sensitive information is filtered out of the data on its way into the application. The application only ever sees desensitized data, so every interface it exposes returns desensitized data, and the sensitive originals are never leaked. I recently chatted with some friends about how data desensitization software works, so I used Java to build a simple desensitization program and am recording it here.

The goal

Add desensitization to an application system that is already running, without disrupting its normal operation and without modifying any data in its database.

Scenarios and approaches

Most relational databases today use a client/server (C/S) architecture. The database vendor provides a driver package through which the application interacts with the database, and the application and the database communicate through that driver over TCP (often with SSL). Given this, I came up with two approaches to data desensitization. More ideas are very welcome; I would be grateful for them.

The first is to build a database plug-in on the server (the machine where the database runs) that performs the desensitization;

The second is to desensitize the data on the client (application) side.

Since I am most familiar with Java, I implemented the second approach. What follows desensitizes Java applications that use relational databases, though the idea can of course be applied to applications that use other kinds of databases. The detailed approach is as follows:

Problems encountered

Putting this approach into practice raises the following problems. If you have already solved them, feel free to skip the rest, ha ha.

  • How to add extra logic to a customer's application system without its source code
  • Where to add the desensitization logic
  • How to get our code to run inside an application that is already running

The technical implementation

First, let's solve the problems above and clear away the technical obstacles.

For the first problem: although there is no source code, there are bytecode (class) files. We can locate the bytecode of the relevant class, parse out its methods, and insert logic into them, which calls for bytecode enhancement. The common bytecode enhancement libraries are ASM and Javassist. Javassist offers a source-level API that is easy to read, while ASM works directly at the bytecode level. Each has its pros and cons; ASM is used here.

For the second problem: Java defines a uniform specification for database interaction, JDBC, which each database vendor implements in its driver package. An application's low-level database interaction goes through JDBC, so as long as our desensitization logic is added inside the driver package, it can control the data.

Since we can enhance bytecode anyway, why not enhance the socket layer instead? Working at the socket level would mean implementing each database's communication protocol (and dealing with details such as SSL encryption). Implementing every vendor's protocol is a huge job, and building on JDBC saves all of that work; it's really the lazy option, lol.

ResultSet is the JDBC interface through which query results are returned. The data it returns has already been parsed into plaintext, so controlling this interface controls all outgoing data. In this implementation, the desensitization logic is added to the driver's implementation class of the ResultSet interface. In addition, desensitized data that flows out may later be saved back to the database, so incoming data must be handled as well. This requires enhancing the driver's implementations of Statement, PreparedStatement, and related interfaces.
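
The program described here rewrites the driver's ResultSet implementation class with ASM. As a simpler, self-contained illustration of the same interception point, here is a minimal sketch that swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) for bytecode rewriting: it wraps any ResultSet and passes String values returned by getString, getNString, and getObject through a masking function. The class name MaskingResultSet and the choice of which getters to filter are assumptions for illustration, not the author's code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper: proxy-based stand-in for the ASM-enhanced driver class.
public final class MaskingResultSet {
    // Getters whose String results are passed through the masker (assumed selection).
    private static final Set<String> VALUE_GETTERS = Set.of("getString", "getNString", "getObject");

    private MaskingResultSet() {}

    /** Wraps a driver ResultSet so that String values it returns are desensitized. */
    public static ResultSet wrap(ResultSet delegate, UnaryOperator<String> masker) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result;
            try {
                result = method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the driver's own SQLException unchanged
            }
            // next(), close(), getInt(), ... pass through untouched.
            if (VALUE_GETTERS.contains(method.getName()) && result instanceof String) {
                return masker.apply((String) result);
            }
            return result;
        };
        return (ResultSet) Proxy.newProxyInstance(
                MaskingResultSet.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                handler);
    }
}
```

For example, MaskingResultSet.wrap(rs, s -> s.substring(0, 3) + "****" + s.substring(7)) would turn an 11-digit phone number such as 13812345678 into 138****5678. The inbound side (Statement, PreparedStatement) is not covered by this sketch.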

For the third problem, we need a Java Agent, which lets us load our enhanced classes into an application after it starts so that the new desensitization logic takes effect. There are two ways to load a Java Agent. The first is to add the -javaagent startup parameter:

-javaagent:${agent.jar.path}/agent.jar
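
A minimal sketch of the agent side, assuming JDK 11 or later: the same jar can expose premain (used with -javaagent) and agentmain (used by the Attach API), and both register a ClassFileTransformer from java.lang.instrument. The target class names are passed as agent arguments because they differ between drivers and driver versions. DesensitizeAgent, ResultSetTransformer, parseTargets, and rewrite are hypothetical names, and rewrite is only a placeholder where the ASM transformation would go.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.HashSet;
import java.util.Set;

public final class DesensitizeAgent {
    // Internal names (slash-separated) of the driver classes to enhance; supplied via agent args.
    static volatile Set<String> targets = Set.of();

    /** Entry point for -javaagent (manifest attribute Premain-Class). */
    public static void premain(String agentArgs, Instrumentation inst) {
        install(agentArgs, inst);
    }

    /** Entry point for the Attach API (manifest attribute Agent-Class). */
    public static void agentmain(String agentArgs, Instrumentation inst) throws Exception {
        install(agentArgs, inst);
        // Classes loaded before the attach must be retransformed explicitly
        // (requires Can-Retransform-Classes: true in the manifest).
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (targets.contains(c.getName().replace('.', '/')) && inst.isModifiableClass(c)) {
                inst.retransformClasses(c);
            }
        }
    }

    static void install(String agentArgs, Instrumentation inst) {
        targets = parseTargets(agentArgs);
        inst.addTransformer(new ResultSetTransformer(), true); // true = may retransform
    }

    /** Parses "a.b.C,d.e.F" into internal names such as "a/b/C". */
    static Set<String> parseTargets(String agentArgs) {
        if (agentArgs == null || agentArgs.isBlank()) {
            return Set.of();
        }
        Set<String> names = new HashSet<>();
        for (String s : agentArgs.split(",")) {
            if (!s.isBlank()) {
                names.add(s.trim().replace('.', '/'));
            }
        }
        return Set.copyOf(names);
    }

    static final class ResultSetTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className == null || !targets.contains(className)) {
                return null; // null tells the JVM to keep the class unchanged
            }
            return rewrite(classfileBuffer);
        }
    }

    /** Placeholder: the ASM rewrite would go here; this sketch returns the bytes unchanged. */
    static byte[] rewrite(byte[] original) {
        return original;
    }
}
```

The jar's manifest needs Premain-Class and/or Agent-Class pointing at this class, plus Can-Retransform-Classes: true so that agentmain can retransform classes the driver has already loaded. Agent arguments follow an = sign, e.g. -javaagent:agent.jar=com.example.jdbc.SomeResultSetImpl (a made-up class name).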

The second is to use the Attach API, which loads the agent at runtime. Note that on JDK 8 and earlier the Attach API lives in tools.jar, which must be added to the classpath; from JDK 9 on it is provided by the jdk.attach module. The command format is as follows:

${java_home}/bin/java -cp "${java_home}/lib/tools.jar:${attach.api.jar}" ${package.class.name} ${pid} ${agent.jar}

(On Windows, use ; instead of : as the classpath separator.)
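
The ${package.class.name} in the command is a small launcher class. Here is a minimal sketch of such a launcher built on com.sun.tools.attach.VirtualMachine; the class name AttachLauncher and its argument order (pid, agent jar, optional agent arguments) are assumptions chosen to match the shape of the command. It must be compiled and run with a JDK, not a bare JRE.

```java
import com.sun.tools.attach.VirtualMachine;

// Hypothetical launcher: attaches to a running JVM and loads the agent jar into it.
public final class AttachLauncher {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: AttachLauncher <pid> <agent.jar> [agentArgs]");
            System.exit(1);
        }
        attach(args[0], args[1], args.length > 2 ? args[2] : null);
    }

    static void attach(String pid, String agentJar, String agentArgs) throws Exception {
        validatePid(pid);
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            // Runs the agentmain method named by the jar's Agent-Class attribute in the target JVM.
            vm.loadAgent(agentJar, agentArgs);
        } finally {
            vm.detach(); // closes our connection; the agent stays loaded in the target
        }
    }

    static void validatePid(String pid) {
        if (pid == null || !pid.matches("\\d+")) {
            throw new IllegalArgumentException("not a process id: " + pid);
        }
    }
}
```

By default, JDK 9 and later refuse to let a JVM attach to itself, so the launcher is run as a separate process, as in the command above.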

Technologies involved:

- ASM
- Java Agent
- Java Attach API

With this in place, the desensitization program works in any Java application that uses a database driver implementing the JDBC specification. That covers the overall idea. The source code itself is fairly simple, so I won't go through it piece by piece; let's go straight to the example.

Example

We use the Attach approach as the example to show how to add desensitization to a running application system; the -javaagent approach is similar. The main steps are:

  1. Start a third-party application
  2. Prepare the desensitization program JAR
  3. Get the PID of the application and attach the desensitization program to it
  4. Test desensitization, making sure the application is unaffected: it must work normally with desensitization enabled, and continue to work normally after it is disabled
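
For step 3, the JDK's jps -l command lists running JVMs and their PIDs. The same information is available from the Attach API; here is a minimal sketch that prints the PIDs whose display name (usually the main class or jar path) contains a keyword. The class name FindPid and the substring matching are assumptions for illustration.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for finding the target application's PID.
public final class FindPid {
    /** Returns the ids whose display name contains the keyword. */
    static List<String> match(Map<String, String> displayNameById, String keyword) {
        List<String> ids = new ArrayList<>();
        displayNameById.forEach((id, name) -> {
            if (name != null && name.contains(keyword)) {
                ids.add(id);
            }
        });
        return ids;
    }

    public static void main(String[] args) {
        String keyword = args.length > 0 ? args[0] : "";
        Map<String, String> jvms = new LinkedHashMap<>();
        for (VirtualMachineDescriptor d : VirtualMachine.list()) {
            jvms.put(d.id(), d.displayName()); // on HotSpot, id() is the process id
        }
        System.out.println(match(jvms, keyword));
    }
}
```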

Conclusion

The example demonstrates the data desensitization program: knowing only the PID of the application's process, we can add desensitization to the system. Desensitization rules can currently be defined by keyword, regular expression, or field name. With desensitization enabled, the application still processes data normally and its use is unaffected. Scope: Java programs that use a driver implementing the JDBC specification; I have tested it myself with MySQL and Oracle. Of course, new features can be built on this foundation, such as using NLP to recognize names, mobile phone numbers, and ID numbers for desensitization, making it more automatic and intelligent, or supporting more sources of sensitive words, including databases and Redis. This article is limited by my own understanding; corrections of any mistakes or omissions are welcome.
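
The three rule types (keyword, regular expression, field name) could be combined roughly as follows. This is a minimal sketch, not the author's rule engine: the class name MaskingRules, the precedence (a field-name match masks the whole value; otherwise keywords and then regular-expression matches are masked inside it), and the keep-first-3/last-4 masking format are all assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical rule set covering field-name, keyword, and regex rules.
public final class MaskingRules {
    private final Set<String> sensitiveColumns; // field-name rules, stored lower-case
    private final List<String> keywords;        // keyword rules
    private final List<Pattern> patterns;       // regular-expression rules

    public MaskingRules(Set<String> sensitiveColumns, List<String> keywords, List<Pattern> patterns) {
        this.sensitiveColumns = sensitiveColumns.stream()
                .map(c -> c.toLowerCase(Locale.ROOT))
                .collect(Collectors.toUnmodifiableSet());
        this.keywords = List.copyOf(keywords);
        this.patterns = List.copyOf(patterns);
    }

    /** Applies the rules to one value read from the given column (column may be null). */
    public String apply(String column, String value) {
        if (value == null) {
            return null;
        }
        // Field-name rule: the whole value of a sensitive column is masked.
        if (column != null && sensitiveColumns.contains(column.toLowerCase(Locale.ROOT))) {
            return maskMiddle(value);
        }
        String out = value;
        // Keyword rule: each occurrence is replaced by the same number of asterisks.
        for (String k : keywords) {
            if (!k.isEmpty()) {
                out = out.replace(k, "*".repeat(k.length()));
            }
        }
        // Regex rule: each match keeps only its first 3 and last 4 characters.
        for (Pattern p : patterns) {
            Matcher m = p.matcher(out);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                m.appendReplacement(sb, Matcher.quoteReplacement(maskMiddle(m.group())));
            }
            m.appendTail(sb);
            out = sb.toString();
        }
        return out;
    }

    /** Keeps the first 3 and last 4 characters; short values are fully masked. */
    static String maskMiddle(String s) {
        if (s.length() <= 7) {
            return "*".repeat(s.length());
        }
        return s.substring(0, 3) + "*".repeat(s.length() - 7) + s.substring(s.length() - 4);
    }
}
```

For instance, with the pattern 1\d{10} configured, the value "Contact 13812345678 now" read from a column that is not listed becomes "Contact 138****5678 now". In a full program, the rule lists would be loaded from whichever sensitive-word source is configured.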
