Database security has always been a major concern for users, especially in core systems and specialized fields. Once you hold sensitive data at a certain scale, you must constantly guard against leakage and loss, which is why database management and use involve cumbersome processes and complex permissions. This brings us to an important security technique: data desensitization.

Data desensitization

Data desensitization means transforming sensitive information stored in the database according to fixed rules, so that it cannot be read or leaked at will. Sensitive fields such as ID card numbers, mobile phone numbers, and bank card numbers are desensitized before they are shown. The most common form is data masking, in which the middle characters are replaced with *.

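For example, a phone number that starts with 123 and ends with 6789 is displayed as 123******6789 after desensitization: the first three and last four digits are kept, and the digits in between are replaced with *.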
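The masking rule in this example can be sketched in a few lines of Python. This is only an illustration: the function name `mask_middle`, its parameters, and the sample number are assumptions made for the sketch, not part of any product mentioned in this article.

```python
def mask_middle(value: str, keep_head: int = 3, keep_tail: int = 4,
                mask_char: str = "*") -> str:
    """Keep the first and last few characters and mask everything in between."""
    if len(value) <= keep_head + keep_tail:
        # Too short to keep both ends safely: mask the whole value.
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    tail = value[len(value) - keep_tail:]
    return value[:keep_head] + mask_char * hidden + tail


# Made-up sample number, used only for illustration.
print(mask_middle("13812345678"))  # 138****5678
```

Masking the whole value when it is too short is a design choice of this sketch; a real tool might reject such input instead.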

News about data leaks and user data being sold is now common, and data security and network security are getting more and more attention. Regulators are also continually introducing policies that require enterprises to protect and desensitize data, and some overseas enterprises manage data even more strictly.

As database engineers, the people closest to the database, we need to understand and pay attention to data desensitization technology.

Desensitization methods

There are broadly two kinds of data desensitization: static and dynamic.

Static desensitization

Static desensitization is easy to understand: given a known data structure and a known set of fields to desensitize, a desensitization algorithm processes the data and converts it into another form.

Static desensitization usually extracts the data from the database first, runs it in batches through a tool that implements the desensitization algorithm, and then loads the desensitized data wherever it is needed. This approach needs no direct connection to the application or the live database; it is pure data processing.
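The extract, transform, and load flow can be sketched as follows. This is a minimal sketch under assumptions: the exported data is a CSV file, the desensitization algorithm is a salted one-way hash (chosen here only as an example of converting data into another form), and the names `pseudonymize` and `desensitize_export` as well as the sample rows are invented for illustration.

```python
import csv
import hashlib
import io


def pseudonymize(value, salt="demo-salt"):
    """One-way transform: the same input always maps to the same token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def desensitize_export(source, target, sensitive_columns):
    """Copy CSV rows from source to target, transforming the sensitive columns."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column in sensitive_columns:
            row[column] = pseudonymize(row[column])
        writer.writerow(row)


# In-memory stand-ins for the exported file and the cleaned output file.
exported = io.StringIO("id,name,phone\n1,Jack,13812345678\n2,Carl,15687654321\n")
cleaned = io.StringIO()
desensitize_export(exported, cleaned, sensitive_columns=["name", "phone"])
print(cleaned.getvalue())
```

Because the transform is deterministic, the same phone number maps to the same token in every table, so joins on the desensitized data still work.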

Dynamic desensitization

Dynamic desensitization analyzes queried data in real time to check whether it contains anything that must be desensitized. If it does, the data is passed through the appropriate desensitization algorithm before being returned to the application.

Today, dynamic desensitization is usually implemented as a broker (proxy) layer placed between the application and the database. The broker intercepts each SQL query, or the result set it returns, and checks whether sensitive information is involved and of what type. If so, it either rewrites the SQL with the corresponding desensitization algorithm or simply transforms the result set before returning it. Both approaches exist.
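The result-set variant of the broker can be sketched as below. This is a toy illustration, not how any real product works: an in-memory SQLite database stands in for the backend, the class name `MaskingProxy` and the sample row are invented, and sensitive columns are detected purely by the column names in the result set.

```python
import sqlite3


def mask_middle(value, keep_head=3, keep_tail=4):
    """Keep the first and last few characters and star out the rest."""
    text = str(value)
    if len(text) <= keep_head + keep_tail:
        return "*" * len(text)
    hidden = len(text) - keep_head - keep_tail
    return text[:keep_head] + "*" * hidden + text[len(text) - keep_tail:]


class MaskingProxy:
    """Hypothetical broker sitting between the application and the database."""

    def __init__(self, connection, sensitive_columns):
        self._connection = connection
        self._sensitive = set(sensitive_columns)

    def query(self, sql, params=()):
        cursor = self._connection.execute(sql, params)
        # cursor.description has one 7-tuple per result column; item 0 is the name.
        names = [column[0] for column in cursor.description]
        rows = []
        for row in cursor.fetchall():
            rows.append(tuple(
                mask_middle(value)
                if name in self._sensitive and value is not None else value
                for name, value in zip(names, row)
            ))
        return names, rows


# Demo: SQLite stands in for the real database behind the broker.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (id INTEGER, name TEXT, phone TEXT)")
db.execute("INSERT INTO student VALUES (1, 'Jack', '13812345678')")
proxy = MaskingProxy(db, sensitive_columns={"phone"})
print(proxy.query("SELECT id, name, phone FROM student"))
```

Matching on result column names is easy to defeat (for example with `SELECT phone AS p`), which is one reason a real broker has to parse the SQL itself rather than look only at the result set.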

Data desensitization platform

There are many data desensitization platforms available today. We also have a dynamic data desensitization platform of our own, TDMP. We recently ran functional tests against a TiDB cluster, and the functions the platform supports work normally on TiDB.

This was expected: because TiDB is highly compatible with the MySQL protocol and syntax, many MySQL tools can be used directly with TiDB.

The technical implementation

A brief introduction to the overall idea behind TDMP desensitization:

  1. Tdmp-dm is deployed as a reverse proxy between the O&M tools and applications on one side and the database on the other.
  2. When a user executes a query, the system uses the user's permissions and the statement itself to check whether sensitive information is involved and whether the user is allowed to view it.
  3. If desensitization is triggered, the system chooses one of two methods: rewriting the SQL statement directly, or modifying the query's result set. Which method and which desensitization algorithm are used depends on the user configuration, the size of the query result, a performance evaluation, and so on.
  4. Finally, the desensitized data is returned to the user or the application.
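The decision in steps 2 and 3 can be sketched as a small function. Everything here is an assumption made for illustration: the names, the row threshold, and in particular the direction of the size heuristic (pushing large results down to the database by rewriting the SQL) are not taken from TDMP.

```python
from enum import Enum


class Strategy(Enum):
    PASS_THROUGH = "pass_through"        # nothing to desensitize
    REWRITE_SQL = "rewrite_sql"          # let the database do the masking
    MASK_RESULT_SET = "mask_result_set"  # mask rows inside the proxy


ROW_THRESHOLD = 10_000  # hypothetical cut-off, not a real TDMP setting


def choose_strategy(touches_sensitive, user_may_view, estimated_rows,
                    configured=None):
    """Pick how to desensitize one query."""
    if not touches_sensitive or user_may_view:
        return Strategy.PASS_THROUGH
    if configured is not None:
        # An explicit user configuration wins over the heuristics.
        return configured
    if estimated_rows > ROW_THRESHOLD:
        # Assumption: large results are cheaper to mask inside the database.
        return Strategy.REWRITE_SQL
    return Strategy.MASK_RESULT_SET


print(choose_strategy(True, False, estimated_rows=50))
```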

At first glance the idea is quite simple, but implementing SQL detection, the desensitization algorithms, and the selection and optimization logic is fairly difficult in practice. SQL detection is the hardest part: anyone who has done much database work knows that SQL comes in endless varieties, so writing a generic detection algorithm is troublesome.

The core of the desensitization implementation is actually somewhat similar to the SQL parsing layer of a database, that is, to the way TiDB Server works:

  1. Parsing: the user's SQL query is parsed into a custom tree structure that is very similar to an abstract syntax tree.
  2. Rule-based checking: to decide whether a query needs desensitization, we build a rule list (covering user privileges, user-configured rules, and a series of built-in checks), traverse it, and record every rule that applies to the query.
  3. Physical optimization: desensitizing in a database middleware has a large impact on query performance, because each affected value has to be modified. So when choosing how to actually carry out the desensitization, besides following the user's configuration we also optimize based on the size of the queried data and the complexity of the query.
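The rule traversal and the SQL rewrite can be sketched as follows. This sketch assumes the parser has already reduced the query to a table name and a column list; `SelectQuery`, `MaskRule`, and the helper functions are hypothetical names. The generated expression uses only the standard MySQL string functions `CONCAT`, `LEFT`, and `RIGHT`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectQuery:
    """Stand-in for the parsed tree of a simple SELECT (assumed parser output)."""
    table: str
    columns: tuple


@dataclass(frozen=True)
class MaskRule:
    """Hypothetical rule: mask one column unless the user is exempt."""
    table: str
    column: str
    exempt_users: frozenset = frozenset()


def matching_rules(query, user, rules):
    """Traverse the rule list and record every rule that applies to the query."""
    return [
        rule for rule in rules
        if rule.table == query.table
        and rule.column in query.columns
        and user not in rule.exempt_users
    ]


def rewrite(query, hits):
    """Rewrite the SELECT so the database itself returns masked values."""
    masked = {rule.column for rule in hits}
    parts = []
    for col in query.columns:
        if col in masked:
            parts.append(f"CONCAT(LEFT({col}, 3), '****', RIGHT({col}, 4)) AS {col}")
        else:
            parts.append(col)
    return f"SELECT {', '.join(parts)} FROM {query.table}"


query = SelectQuery("student", ("id", "phone"))
rules = [MaskRule("student", "phone", frozenset({"dba"}))]
print(rewrite(query, matching_rules(query, "app", rules)))
```

The fixed `'****'` in the rewritten expression is a simplification; a real rewrite would have to handle values of different lengths.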

The whole process is very similar to how TiDB Server processes SQL:

SQL -> Parser -> RBO -> CBO -> Executor

More information about the TDMP platform is available on the official website, www.dc-tdmp.com/, so I won't go into too much detail here.

Some thoughts on TiDB

Desensitization function

The simple implementation idea described above is very similar to how TiDB Server executes SQL. So one idea is to add a simple desensitization function directly to TiDB Server, for example a masking function:

select id, data_mask(name, '*'), data_mask(phone, '·') from student;

Result returned:

| id | name | phone |
|----|------|----------|
| 1 | J**k | 123…999 |
| 2 | C**l | 156…888 |

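Another option is a function that replaces each value with substitute data instead of masking it:

select id, data_change(name), data_change(phone) from student;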



Result returned:

| id | name | phone |
|----|------|----------|
| 1 | sdan | 5486446468 |
| 2 | wnru | 6848789786 |
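What a data_change-style function might do can be sketched in Python. This is a guess based only on the sample result above, where letters appear to be replaced by random letters and digits by random digits of the same length; the real behavior of such a function is not specified in this article.

```python
import random
import string


def data_change(value, rng):
    """Replace each letter or digit with a random one of the same kind."""
    changed = []
    for ch in value:
        if ch.isdigit():
            changed.append(rng.choice(string.digits))
        elif ch.isalpha():
            changed.append(rng.choice(string.ascii_lowercase))
        else:
            # Keep separators such as "-" so the overall format is preserved.
            changed.append(ch)
    return "".join(changed)


rng = random.Random()  # pass random.Random(seed) for repeatable output
print(data_change("Jack", rng), data_change("13812345678", rng))
```

Unlike masking, the output still looks like real data, which is useful for test environments but means the replacement cannot be recognized as desensitized at a glance.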

Sensitive field declaration

Besides adding desensitization functions like these, TiDB could also support permission-based desensitization policies. For example, a sensitive keyword could be added to a column definition to declare the column sensitive. When an ordinary database user queries the column, it is desensitized automatically.

create table student(
    id int,
    name varchar(255) sensitive,
    phone int sensitive,
    primary key(id)
);


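How a server might act on such a declaration can be sketched as follows. This is purely hypothetical: the sensitive keyword is a proposal, not existing TiDB syntax, and the `Column` type, the privilege flag, and the full-length masking are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Column metadata as it might be recorded from the table definition."""
    name: str
    sensitive: bool = False


# Metadata matching the proposed student table above.
STUDENT = (
    Column("id"),
    Column("name", sensitive=True),
    Column("phone", sensitive=True),
)


def render_row(columns, row, user_is_privileged):
    """Return the row unchanged for privileged users, masked otherwise."""
    if user_is_privileged:
        return tuple(row)
    return tuple(
        "*" * len(str(value)) if column.sensitive else value
        for column, value in zip(columns, row)
    )


print(render_row(STUDENT, (1, "Jack", "13812345678"), user_is_privileged=False))
```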

TiDB desensitization extension tool

Many desensitization platforms are available, but they are heavyweight: they are troublesome to use and require paid deployment and a whole set of supporting services, so they are generally used by large, well-funded enterprises. A small enterprise may need only a simple desensitization function, and in that case a lightweight desensitization tool is enough.

Like an extension in PostgreSQL, such a tool would let you use simple desensitization in TiDB by downloading an extension and running a single command. We are therefore considering separating and simplifying the desensitization function of the TDMP platform and turning it into a very simple, easy-to-use tool.

As for extensions in general, TiDB, as an open source database, would need to provide some stable interfaces so that community members can develop small functional components as extensions. This would help TiDB's ecosystem grow and bring in more developers, making the community richer and more active.