For the past few nights at home I kept getting strange phone calls: “Bro, are you XXX? This is XXX, a high-end private men’s club…” Damn. I froze for a second, then cursed them out. Then I turned around, trying to look righteous, with a slightly ingratiating smile: “Honey, hear me out, I really didn’t do anything, you have to believe me!”

Slap!

After rubbing my face and thinking it over, I figured some unscrupulous website must have sold my personal information. On today’s Internet everyone is effectively running around naked, and personal information no longer belongs to the individual. This kind of thing is hardly surprising anymore, and more often than not an insider is behind it.

As developers, what we can do is make sure user data does not leak through our hands. So today let’s talk about one of the measures Internet companies use internally to prevent private data leaks: data desensitization.

What is data desensitization

First, what is data desensitization? It is also known as data masking. Given a set of desensitization rules and policies, sensitive data such as mobile phone numbers and bank card numbers is transformed or modified, so that the sensitive data cannot be used directly in an untrusted environment.

Government agencies, the medical industry, financial institutions and mobile operators adopted data desensitization relatively early, because they hold users’ most sensitive private data, and a leak would have severe consequences.

Data desensitization is also common in everyday life. For example, when we buy something on Taobao, the merchant’s account information in the order details is covered with * so that the merchant’s private information is not disclosed. That is one form of data desensitization.

Data desensitization is divided into static data desensitization (SDM) and dynamic data desensitization (DDM):

Static data desensitization

Static data desensitization (SDM) applies to scenarios where data is extracted from the production environment, desensitized, and then distributed for testing, development, training, and data analysis.

Sometimes we need to copy production data to test or development databases for troubleshooting or data analysis, but for security reasons sensitive data must not be stored in a non-production environment. In this case, the sensitive data has to be desensitized as it leaves the production environment, before it is used anywhere else.

In this way, the desensitized data is isolated from the production environment, which meets business needs while keeping the production data secure.

As shown in the figure above, the user’s real name, mobile phone number, ID card number and bank card number are desensitized through schemes such as replacement, invalidation, shuffling and symmetric encryption.

Dynamic data desensitization

Dynamic data desensitization (DDM) is generally used in the production environment to desensitize sensitive data in real time as it is accessed. Sometimes the same sensitive data has to be desensitized to different degrees depending on who reads it. For example, different roles and permissions get different desensitization schemes.
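To make the role-based idea concrete, here is a minimal sketch in Python. The role names (admin, support), the `ROLE_RULES` table and both function names are assumptions made up for illustration. They do not come from any particular product.

```python
def mask_phone(phone: str) -> str:
    # Keep the first 3 and last 4 characters, hide everything in between.
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


# Hypothetical role table: each role maps to the rule applied at read time.
ROLE_RULES = {
    "admin": lambda value: value,  # full access, no desensitization
    "support": mask_phone,         # partial view
}


def read_phone(phone: str, role: str) -> str:
    # Unknown roles fall back to the strictest rule: hide every character.
    rule = ROLE_RULES.get(role, lambda value: "*" * len(value))
    return rule(phone)


print(read_phone("13651300000", "admin"))
print(read_phone("13651300000", "support"))  # 136****0000
```

The stored value never changes. Only the view returned to each caller differs, which is what separates this from static desensitization.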

Note: while removing the sensitive content, desensitization has to preserve the original data characteristics, business rules and relationships between data, so that development, testing and data analysis are not affected and the data stays consistent and valid before and after desensitization. In short: desensitize it however you like, as long as it doesn’t get in the way of my using it.

Data desensitization scheme

A data desensitization system lets you define and write desensitization rules for different business scenarios, and applies them to individual sensitive fields in database tables.

There are many ways to desensitize data. The schemes are described one by one below.
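A rough sketch of what “rules per sensitive field” could look like in Python. The field names, the `RULES` table and `desensitize_row` are hypothetical, and a real system would load such rules from configuration rather than hard-code them.

```python
# Hypothetical rule table: sensitive field name -> desensitization function.
RULES = {
    "name": lambda v: v[:1] + "*" * (len(v) - 1),  # keep the first character
    "phone": lambda v: v[:3] + "****" + v[-4:],    # keep first 3 and last 4
}


def desensitize_row(row: dict, rules: dict = RULES) -> dict:
    # Fields without a rule pass through unchanged; the input row is not modified.
    return {
        field: rules[field](value) if field in rules else value
        for field, value in row.items()
    }


row = {"id": 1, "name": "Alice", "phone": "13651300000"}
print(desensitize_row(row))
```

Keeping the rules in one table makes it easy to use a different rule set for each business scenario.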

1. Invalidation

The invalidation scheme desensitizes sensitive data by truncating, encrypting, or hiding the field value so that it is no longer usable. Usually special characters (such as *) replace the real value. This way of hiding sensitive data is simple, but the drawback is that users cannot tell what the original data looked like. To see the complete information, the user has to be asked to authorize the query.

For example, if we replace part of a real ID number with *, it becomes “220724 ****** 3523”. Very simple.
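A minimal Python sketch of this kind of masking. The function name `invalidate` and its defaults (keep 6 leading and 4 trailing characters) are assumptions for illustration, and the ID number used is made up.

```python
def invalidate(value: str, keep_start: int = 6, keep_end: int = 4,
               mask_char: str = "*") -> str:
    # If the value is too short to keep both ends, hide all of it.
    if len(value) <= keep_start + keep_end:
        return mask_char * len(value)
    hidden = len(value) - keep_start - keep_end
    # Slice from an explicit index so keep_end=0 also works.
    tail = value[len(value) - keep_end:]
    return value[:keep_start] + mask_char * hidden + tail


print(invalidate("220724199001013523"))  # 220724********3523
```

Here the number of * characters follows the length of the hidden part. A fixed number of * could be used instead, so that even the original length is not revealed.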

2. Random value

Random value replacement changes sensitive data by swapping letters for random letters, digits for random digits, and text for random text. The advantage of this scheme is that it preserves the original data format to some extent, and users often do not notice the change.

In the example, the name and ID number fields are desensitized with random values. Randomizing given names and surnames is a bit special, because it needs a surname dictionary to draw from.
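A small Python sketch of format-preserving random replacement. The function names and the tiny `SURNAMES` and `GIVEN_NAMES` lists are placeholders invented for this example. A real setup would use a proper name dictionary.

```python
import random
import string


def randomize(value: str, seed=None) -> str:
    # Digits become random digits and ASCII letters become random letters of
    # the same case; every other character (separators etc.) is kept as-is.
    rng = random.Random(seed)
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


# Placeholder dictionaries, assumed for illustration only.
SURNAMES = ["Wang", "Li", "Zhang", "Liu"]
GIVEN_NAMES = ["Wei", "Fang", "Lei", "Na"]


def random_name(seed=None) -> str:
    rng = random.Random(seed)
    return f"{rng.choice(SURNAMES)} {rng.choice(GIVEN_NAMES)}"


print(randomize("AB-2207-xy"))
print(random_name())
```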

3. Data replacement

Data replacement is similar to the invalidation scheme above, except that the real value is replaced with a preset dummy value instead of special characters. For example, we could set every mobile phone number to “13651300000”.
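In code this can be as simple as a lookup table of dummy values. A sketch in Python, where `DUMMY_VALUES` and `replace_with_dummy` are hypothetical names:

```python
# Hypothetical table of preset dummy values per sensitive field.
DUMMY_VALUES = {"phone": "13651300000"}


def replace_with_dummy(row: dict, dummies: dict = DUMMY_VALUES) -> dict:
    # Any field listed in the table gets the preset value; others are untouched.
    return {field: dummies.get(field, value) for field, value in row.items()}


print(replace_with_dummy({"name": "Alice", "phone": "13912345678"}))
```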

4. Symmetric encryption

Symmetric encryption is a special desensitization scheme because it is reversible. Sensitive data is encrypted with an encryption key and algorithm, and the ciphertext keeps the same format as the original data in terms of its logical rules.
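To show what “reversible and format-preserving” means, here is a toy Python sketch that shifts each digit by a key-derived amount. It is deliberately simplistic and NOT secure. The function names are made up, and a real system would use a vetted algorithm (for example AES, or a standardized format-preserving mode) from a maintained library.

```python
import hashlib
import hmac


def _shifts(key: bytes, length: int) -> list:
    # Derive one shift in 0..9 per position from the key (toy keystream).
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return [byte % 10 for byte in stream[:length]]


def encrypt_digits(plain: str, key: bytes) -> str:
    # Input must consist of digits only; the output has the same length.
    return "".join(
        str((int(ch) + s) % 10) for ch, s in zip(plain, _shifts(key, len(plain)))
    )


def decrypt_digits(cipher: str, key: bytes) -> str:
    # Subtracting the same shifts restores the original digits.
    return "".join(
        str((int(ch) - s) % 10) for ch, s in zip(cipher, _shifts(key, len(cipher)))
    )


key = b"demo-key"
masked = encrypt_digits("13651300000", key)
print(masked, decrypt_digits(masked, key))
```

Only someone holding the key can recover the original value, which is what sets this scheme apart from the irreversible ones above.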

5. Average

The average scheme is often used in statistical scenarios. For numeric data, we first compute the mean, then distribute the desensitized values randomly around that mean in such a way that the sum of the data stays unchanged.

After averaging the price field, its total stays the same, but every desensitized value lies near the mean of 60.
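One way to get both properties at once (values near the mean, sum unchanged) is to add zero-sum noise to the mean. A Python sketch, with `mean_desensitize` and the `spread` parameter as assumed names and the sample prices invented for illustration:

```python
import random


def mean_desensitize(values, spread=5.0, seed=None):
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    # Draw random noise, then subtract its average so the noise sums to zero.
    noise = [rng.uniform(-spread, spread) for _ in values]
    drift = sum(noise) / len(noise)
    return [mean + n - drift for n in noise]


prices = [40, 50, 60, 70, 80]  # mean is 60
masked = mean_desensitize(prices)
print(masked, sum(masked))
```

Because the noise sums to zero, the total equals the original total up to floating-point rounding, and every value stays within 2 * spread of the mean.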

6. Offset and rounding

This scheme changes numeric data by applying a random shift. Offsetting and rounding keeps the range of values roughly realistic while still protecting the data. Compared with the previous schemes, the result is closer to the real data, which is valuable in big data analysis scenarios.

For example, the create_time value 2020-12-08 15:12:25 becomes 2018-01-02 15:00:00.

In practice, desensitization rules usually combine several of these schemes to achieve a higher level of security.
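A Python sketch of offset and rounding for a timestamp: shift by a random number of whole days, then round down to the hour. The function name and the `max_shift_days` parameter are assumptions. The size of the shift is a policy choice, so the result will not reproduce the exact date in the example above.

```python
import random
from datetime import datetime, timedelta


def offset_and_round(ts: datetime, max_shift_days: int = 30, seed=None) -> datetime:
    rng = random.Random(seed)
    # Offset: move the timestamp by a random number of whole days.
    shifted = ts + timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    # Rounding: drop minutes and seconds so only the hour survives.
    return shifted.replace(minute=0, second=0, microsecond=0)


original = datetime(2020, 12, 8, 15, 12, 25)
print(offset_and_round(original))
```

The hour of day is preserved, so time-of-day analysis still works even though the exact moment is hidden.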

Conclusion

Whether static or dynamic, desensitization ultimately aims to prevent private data from being abused inside the organization and from leaving the organization without being desensitized. As programmers, guarding that data is the least we can do.
