Why we developed our own DLP

With the rapid development of information technology, the Internet has become an essential tool for everyday office work, communication, and collaboration. At the same time, as the data economy grows, serious data security incidents such as data leaks and data abuse occur frequently.

ByteDance holds massive amounts of data and therefore has higher requirements for the security of data storage and transmission. In addition, because ByteDance has many business models with complex desensitization requirements, existing DLP products cannot meet its needs for sensitive-information recognition rules or efficient desensitization.

To better protect ByteDance's data security and privacy, the ByteDance security team independently developed a sensitive data protection solution, GoDLP. GoDLP has been applied successfully to data desensitization at ByteDance, and the security team hopes to share this work as open source and grow together with the industry.

What is ByteDance's DLP?

Introduction to GoDLP

GoDLP provides a set of schemes for recognizing and handling sensitive data, including sensitive data recognition algorithms, desensitization methods, business-defined configuration options, and the ability to process data at scale. GoDLP can apply a variety of privacy compliance standards to grade and label raw data, determine its sensitivity level, and apply the corresponding desensitization.
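The grade-then-treat flow described above can be sketched in a few lines of Go. This is only an illustration using the standard library, not GoDLP's implementation: the `Level` type, the two regular expressions, and the policy (fully mask high-grade matches, keep the first half of low-grade ones) are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Level is a hypothetical sensitivity grade; GoDLP's real grading may differ.
type Level int

const (
	Low Level = iota + 1
	High
)

// rule pairs a detector with the grade given to whatever it matches.
type rule struct {
	name  string
	re    *regexp.Regexp
	level Level
}

// Two illustrative rules; the patterns are simplified assumptions.
var rules = []rule{
	{"MAC", regexp.MustCompile(`\b(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}\b`), Low},
	{"EMAIL", regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`), High},
}

// treat applies the treatment chosen for a grade: High matches are fully
// masked, Low matches keep their first half. Matches are ASCII here, so
// byte slicing is safe.
func treat(match string, lv Level) string {
	keep := 0
	if lv < High {
		keep = len(match) / 2
	}
	return match[:keep] + strings.Repeat("*", len(match)-keep)
}

// protect runs every rule over s and rewrites each match according to its grade.
func protect(s string) string {
	for _, r := range rules {
		lv := r.level
		s = r.re.ReplaceAllStringFunc(s, func(m string) string { return treat(m, lv) })
	}
	return s
}

func main() {
	fmt.Println(protect("mail alice@example.com mac 06-06-06-AA-BB-CC"))
}
```

Keeping the grade on the rule and the treatment in a separate function means a stricter compliance standard could be modeled by changing the policy alone, without touching the detectors.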

What can GoDLP do?

GoDLP supports a wide range of structured data (JSON data, KV data, Golang maps) and unstructured data (multi-language strings).
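To make the structured-data case concrete, here is a minimal sketch of how a JSON document might be walked so that values under sensitive keys are masked. It uses only the Go standard library and does not call GoDLP; the `sensitiveKeys` list and the full-mask treatment are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sensitiveKeys is a hypothetical list of field names treated as sensitive.
var sensitiveKeys = map[string]bool{"email": true, "phone": true}

// maskValues walks a decoded JSON value in place and stars out string
// values stored under sensitive keys, recursing into objects and arrays.
func maskValues(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			if s, ok := child.(string); ok && sensitiveKeys[strings.ToLower(k)] {
				t[k] = strings.Repeat("*", len(s))
			} else {
				t[k] = maskValues(child)
			}
		}
	case []interface{}:
		for i, child := range t {
			t[i] = maskValues(child)
		}
	}
	return v
}

// maskJSON decodes a JSON string, masks it, and encodes it again.
// json.Marshal writes object keys in sorted order.
func maskJSON(in string) (string, error) {
	var doc interface{}
	if err := json.Unmarshal([]byte(in), &doc); err != nil {
		return "", err
	}
	out, err := json.Marshal(maskValues(doc))
	return string(out), err
}

func main() {
	out, err := maskJSON(`{"user":{"name":"bob","phone":"13900000000"},"tags":["a"]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Key-based masking is only one possible strategy; values in structured data can also be scanned with the same content rules used for free text.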

  • Automatic discovery of sensitive data
    • GoDLP has a variety of built-in sensitive data recognition rules, which cover more than 80% of the recognition rules available on the market, so that as much sensitive data as possible is identified

  • Flexible desensitization of sensitive data
    • GoDLP supports a variety of desensitization algorithms, and users can choose how to desensitize data according to their needs

  • Business-specific configuration options
    • In addition to the default rules for identifying and processing sensitive information, businesses can configure custom YAML rules to suit their situation, and GoDLP performs the corresponding data processing according to the configuration passed in.
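The "variety of desensitization algorithms" mentioned above can be illustrated with three common techniques: partial masking, tag replacement, and hashing. The following is a standalone sketch; the function names `maskKeep`, `replaceTag`, and `hashValue` are hypothetical and are not GoDLP's API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maskKeep keeps the first `prefix` and last `suffix` characters and stars
// out the rest. If nothing would be left to hide, or an argument is
// negative, the whole value is masked.
func maskKeep(s string, prefix, suffix int) string {
	r := []rune(s)
	if prefix < 0 || suffix < 0 || prefix+suffix >= len(r) {
		return strings.Repeat("*", len(r))
	}
	hidden := len(r) - prefix - suffix
	return string(r[:prefix]) + strings.Repeat("*", hidden) + string(r[len(r)-suffix:])
}

// replaceTag swaps the value for a label naming its type.
func replaceTag(kind string) string {
	return "<" + kind + ">"
}

// hashValue returns the first 12 hex characters of the SHA-256 digest, so
// equal inputs stay comparable without exposing the raw value.
func hashValue(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	phone := "13900000000"
	fmt.Println(maskKeep(phone, 3, 4))
	fmt.Println(replaceTag("PHONE"))
	fmt.Println(hashValue(phone))
}
```

Which method fits depends on the use case: partial masking keeps values recognizable to their owner, tags suit logs, and hashing preserves joinability across records. An unsalted hash of a low-entropy value such as a phone number can be reversed by brute force, so a production system would normally add a secret key.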

GoDLP performance

The following results were measured with all sensitive data types enabled:

  • A 1K file takes 4.1 ms and uses 128.6 KB of memory
  • A 1M file takes 4.1 s and uses 118.6 MB of memory

No.   File size   Execution time   Memory footprint
1     1K          4.1 ms           128.6 KB
2     10K         41.4 ms          1257.4 KB
3     100K        406.9 ms         12.3 MB
4     1M          4.1 s            118.6 MB

Test environment: 8-core VM, Linux, AMD64, Intel(R)Xeon(R)[email protected]; Code version: V2.0.0
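The figures in the table scale almost linearly with input size. The short program below reproduces that arithmetic, computing throughput and the ratio of memory use to input size for each row. It assumes 1M = 1024K and 1 MB = 1024 KB, which the original table does not state.

```go
package main

import "fmt"

// row holds one line of the benchmark table, normalized to KB and milliseconds.
type row struct {
	sizeKB float64
	millis float64
	memKB  float64
}

// Values copied from the table; MB figures are converted assuming 1 MB = 1024 KB.
var rows = []row{
	{1, 4.1, 128.6},
	{10, 41.4, 1257.4},
	{100, 406.9, 12.3 * 1024},
	{1024, 4100, 118.6 * 1024},
}

// throughputKBps returns how many KB of input are processed per second.
func throughputKBps(r row) float64 {
	return r.sizeKB / (r.millis / 1000)
}

// memRatio returns memory used per KB of input.
func memRatio(r row) float64 {
	return r.memKB / r.sizeKB
}

func main() {
	for _, r := range rows {
		fmt.Printf("%6.0f KB: %.1f KB/s, memory about %.0fx input\n",
			r.sizeKB, throughputKBps(r), memRatio(r))
	}
}
```

Under these assumptions, every row works out to between 240 and 250 KB/s of throughput and a memory footprint of roughly 119 to 129 times the input size, which is consistent with linear time and memory growth.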

GoDLP in practice at ByteDance

GoDLP is already widely used within ByteDance. The business lines that have integrated it include Douyin, e-commerce, Feishu, Huoshan Engine, cloud security, finance, education, games, and many others.

How to use GoDLP

Open-source project on GitHub: github.com/bytedance/g…

First, install the GoDLP module using the go get command.

go get github.com/bytedance/godlp@latest

Second, refer to the API documentation in the README to work with sensitive data.

The following code desensitizes strings. For more API examples, see the mainrun/mainrun.go file in the source code.

package main

import (
	"fmt"

	dlp "github.com/bytedance/godlp"
)

func dlpDemo() {
	// The caller string identifies the service that uses the engine.
	caller := "replace.your.caller"
	if eng, err := dlp.NewEngine(caller); err == nil {
		// Load the built-in default rules before processing any data.
		eng.ApplyConfigDefault()
		fmt.Printf("DLP %s Demo:\n\n", eng.GetVersion())
		inStr := `My email is [email protected], I live at 43 Beisanhuan West Road, Haidian District, Beijing. MAC address: 06-06-06-AA-BB-CC 13900000000`
		// Deidentify returns the desensitized string, the detection results, and an error.
		if outStr, _, err := eng.Deidentify(inStr); err == nil {
			fmt.Printf("\t1. Deidentify( inStr: %s )\n", inStr)
			fmt.Printf("\toutStr: %s\n", outStr)
			//eng.ShowResults(results)
			fmt.Println()
		}
		// Release the engine when it is no longer needed.
		eng.Close()
	} else {
		fmt.Println("[dlp] NewEngine error: ", err.Error())
	}
}

func main() {
	dlpDemo()
}

Third, compile and run the program. The output looks like this:

% go run main.go
DLP v1.2.15 Demo:

	1. Deidentify( inStr: My email is [email protected], 18612341234 is my phone number. Where do you live? I live at 43 West Beisanhuan Road, Haidian District, Beijing, MAC address: 06-06-06-AA-BB-CC 13900000000 )
	outStr: My email is a * * * @ * * * * * * * *, * * * * * * 186 is my phone Where do you live? I live in haidian district of Beijing north third ring road DianQu north third ring road no. * * * *, the MAC address of the 06-06-06 - * * - * * - * * recipient: a * * * * * * phone number: 00 139 * * * * * *

Follow-up plans

GoDLP will be maintained and updated over the long term, and you are welcome to get in touch with us.

Acknowledgments and contact

GoDLP would not exist without the hard work of its developers. We would like to extend our sincere thanks to the ByteDance colleagues who wrote the code for GoDLP.

You are welcome to join the official ByteDance GoDLP discussion group via GitHub for follow-up communication and feedback.