An opportunity

Recently, a classmate's Tampermonkey userscript for copying text from Baidu Wenku stopped working again. I took this opportunity to re-analyze Baidu Wenku's anti-copy strategy and see how Baidu meets this requirement at low cost.

A look at Baidu Wenku's anti-copy strategy

1. Look at the text rendering

Open Baidu Wenku and pick any document. Inspecting the elements shows that every document page is rendered to a canvas. Because canvas exposes the fillText [1] interface for drawing text directly, putting text on a canvas is very easy. Text drawn on a canvas is just pixels, though, and cannot be selected, so the text-copy problem is solved at the root.

Digression: the previous version of Baidu Wenku used two stacked canvases. The lower canvas rendered the text from the raw data, and the upper canvas listened for mouse events to simulate text selection and copying. My guess is that this version dropped the feature to stop Tampermonkey scripts from abusing it.
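To make the fillText step concrete, here is a minimal sketch of drawing one run of text. The helper name `drawText` and the element id `page` are made up for illustration and are not taken from Baidu's code.

```javascript
// Draw one run of text onto a 2D canvas context.
// `ctx` is expected to behave like a CanvasRenderingContext2D.
function drawText(ctx, text, x, y, sizePx) {
  ctx.font = `${sizePx}px sans-serif`; // font size and family
  ctx.textBaseline = "top";            // treat (x, y) as the top-left corner
  ctx.fillText(text, x, y);            // the result is pixels, not selectable text
}

// Browser usage (assumes a <canvas id="page"> element exists):
//   const ctx = document.getElementById("page").getContext("2d");
//   drawText(ctx, "Hello, Wenku", 30, 30, 16);
```

Once fillText returns, the canvas only holds pixels, which is why the browser's normal selection and copy do not work on it.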

2. Look at the data source

So where does the data used to draw the canvas come from? It is unlikely to be rendered on the server side: if it were, the output would most likely be an image or SVG, which raises server cost and hurts the user experience (clarity and download size). Looking further, we can find some suspicious requests in the Network panel of the developer tools.

These lengthy JSON documents are the drawing information for the canvas, and we will only analyze the key fields. Field C is the Unicode-escaped text, field P holds the width, height and position of the text, and the remaining fields are probably related to text style and document format. In other words, the complete canvas drawing information can be obtained by parsing the JSON returned by these requests.
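As a rough sketch of that parsing step: the payload shape below (a `body` array whose items carry lowercase `c` and `p` keys, with `x`, `y`, `w`, `h` inside `p`) is an assumption made for illustration, not Baidu's actual schema.

```javascript
// Hypothetical payload: text arrives as \uXXXX escapes inside JSON.
const samplePayload = String.raw`{"body":[
  {"c":"\u767e\u5ea6","p":{"x":30,"y":30,"w":32,"h":16}},
  {"c":"\u6587\u5e93","p":{"x":62,"y":30,"w":32,"h":16}}
]}`;

// Turn the payload into a flat list of draw commands.
// JSON.parse decodes the \uXXXX escapes, so no extra decoding step is needed.
function toDrawCommands(jsonText) {
  const page = JSON.parse(jsonText);
  return page.body.map((item) => ({
    text: item.c,
    x: item.p.x,
    y: item.p.y,
    width: item.p.w,
    height: item.p.h,
  }));
}

const commands = toDrawCommands(samplePayload);
```

Each command then maps onto one fillText call at the given position.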

3. Sketch the overall process

Based on the first two steps, we can roughly infer the overall front-end flow Baidu Wenku uses to prevent text copying. Each step of this flow can be found in the front-end JS files; the implementation details are not discussed here.

4. Existing problems

Because the JSON data is entirely plaintext, a malicious script or plug-in can parse it directly and obtain the full document content at very low cost. And if the JSON is encrypted before transfer but the decryption function is written in JS, then even with obfuscation, cracking that decryption function is cheap.

Implementing a "Plus" version of Baidu Wenku

Based on the analysis above, our requirements are: prevent text copying, encrypt the original data as well, and raise the cost of cracking the key decryption function.
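To show how little work the plaintext case takes, here is a sketch of what such a script could do. It assumes items of the hypothetical form `{ c: text, p: { x, y } }`; the real field layout may differ.

```javascript
// Rebuild readable text from positioned fragments:
// sort top-to-bottom, then left-to-right, and start a new line when y changes.
function extractPlainText(items) {
  const sorted = [...items].sort((a, b) => a.p.y - b.p.y || a.p.x - b.p.x);
  const lines = [];
  let currentY = null;
  for (const item of sorted) {
    if (item.p.y !== currentY) {
      lines.push("");
      currentY = item.p.y;
    }
    lines[lines.length - 1] += item.c;
  }
  return lines.join("\n");
}

// Fragments may arrive in any order; position is enough to restore them.
const recovered = extractPlainText([
  { c: "world", p: { x: 80, y: 30 } },
  { c: "Hello ", p: { x: 30, y: 30 } },
  { c: "Second line", p: { x: 30, y: 50 } },
]);
```

No canvas access is needed at all, which is why hiding the text behind a canvas does not protect the underlying data.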

1. Technology selection

A script written in JS, however heavily it is obfuscated or encrypted, is still a script file that can be debugged with breakpoints. Locating the key function is therefore fairly easy, and dynamic debugging quickly reveals the decryption process. So we cannot keep the decryption function, or the decrypted plaintext, in JS files or on the JS stack.
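One way to picture this weakness: if the decryption routine is an ordinary JS function that other scripts can reach, it can be wrapped so that every plaintext it returns is captured. The `page.decrypt` function below is a toy stand-in (it shifts each character code down by one) and is purely hypothetical.

```javascript
// Replace obj[name] with a wrapper that reports every return value.
// Returns a function that restores the original.
function hookReturnValue(obj, name, onResult) {
  const original = obj[name];
  obj[name] = function (...args) {
    const result = original.apply(this, args);
    onResult(result, args);
    return result;
  };
  return () => {
    obj[name] = original;
  };
}

// Hypothetical page code with a toy "decryption" routine.
const page = {
  decrypt(cipher) {
    return [...cipher]
      .map((ch) => String.fromCharCode(ch.charCodeAt(0) - 1))
      .join("");
  },
};

const leaked = [];
const unhook = hookReturnValue(page, "decrypt", (plain) => leaked.push(plain));
page.decrypt("ifmmp"); // the page decrypts as usual and the hook records the result
unhook();
```

Obfuscation changes names and control flow, but the plaintext still has to pass through a JS value at some point, and that is where a hook or a breakpoint catches it.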

For this reason, we chose WebAssembly [2] (WASM) as the core of the solution. Because WASM is a binary format, it is hard to read and far less convenient to step through with breakpoints than a JS script, so recovering the key functions through static analysis is expensive. There are many ways to write WASM. Inspired by the article "Rust Is the Future of JavaScript Infrastructure" [3], I chose Rust as the WASM development language; Rust already has a fairly complete toolchain for WASM.
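To give a feel for what "binary format" means in practice, here is a tiny hand-assembled module that exports a single `add` function, loaded through the standard WebAssembly JavaScript API. It is only an illustration and has nothing to do with the module built in this article, which is compiled from Rust.

```javascript
// A complete WASM module: (func (export "add") (param i32 i32) (result i32) ...)
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compile and instantiate is fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const { add } = new WebAssembly.Instance(wasmModule).exports;
const sum = add(2, 3); // 5
```

Even this one-line function is opaque without a disassembler, and a real module compiled from Rust is many thousands of such bytes.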

2. Overall process

Below is the overall process after the WASM-based rework. The key steps, including the DOM operations, are moved into the WASM layer. We also take user permission tiers into account: VIP users can copy text directly.

3. Analysis of key technical points

First, following Baidu Wenku's format, we define the raw render information: cipher is the ciphertext, position is the placement information, and font_style is the text style.

const info = [
  {
    cipher: "\u{1d}\u{14}\u{14}",
    position: { x: 30, y: 30 },
    font_style: { size: 16 },
  },
  {
    cipher: "\u{19}\u{1a}\t",
    position: { x: 50, y: 50 },
    font_style: { size: 30 },
  },
];

We define the interface between the JS layer and the WASM layer as encrypt_canvas. It takes the encrypted render information and the user identity token needed for rendering.

encrypt_canvas({ render_info: info, user_token: "1234567" });

In the WASM layer, we call the decryption function to obtain the plaintext and manipulate the DOM via web-sys. We ended up implementing three kinds of views, based on the canvas, image, and div tags.

The canvas view issues drawing instructions, the image view calls the canvas's to_data_url() method [5] (toDataURL in JS), and the div view uses absolute positioning for layout. The source code of the implementation is in the appendix, and the implementation details are not discussed here.
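For readers who want to see the decryption step as data flow, here is a sketch in JavaScript. In this article's design the step runs inside the WASM module, so writing it in JS is only for readability. The sample ciphertexts above become readable strings when each character is XORed with 0x7b; whether that is the exact scheme used in the appendix source is an assumption on my part.

```javascript
// Assumed single-byte XOR key; inferred from the sample data, not confirmed.
const ASSUMED_KEY = 0x7b;

// XOR every character code with the key and rebuild the string.
function xorDecrypt(cipher, key = ASSUMED_KEY) {
  let plain = "";
  for (const ch of cipher) {
    plain += String.fromCharCode(ch.charCodeAt(0) ^ key);
  }
  return plain;
}

const first = xorDecrypt("\u{1d}\u{14}\u{14}"); // "foo"
const second = xorDecrypt("\u{19}\u{1a}\t");    // "bar"
```

XOR is symmetric, so calling the same function on "foo" gives back the first ciphertext. A real deployment would want a much stronger cipher; the point here is only where the plaintext first appears.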
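The div view's absolute positioning can be sketched as follows. The helper `absoluteStyle` is a hypothetical name; the `position` and `font_style` fields are the ones defined earlier, and the actual implementation does this from Rust through web-sys rather than in JS.

```javascript
// Build an inline style string that pins one text fragment at its position.
function absoluteStyle({ position, font_style }) {
  return [
    "position: absolute",
    `left: ${position.x}px`,
    `top: ${position.y}px`,
    `font-size: ${font_style.size}px`,
  ].join("; ");
}

const style = absoluteStyle({
  position: { x: 30, y: 30 },
  font_style: { size: 16 },
});
// "position: absolute; left: 30px; top: 30px; font-size: 16px"

// Browser usage (assumes a positioned container element `pageEl`):
//   const div = document.createElement("div");
//   div.style.cssText = style;
//   div.textContent = "foo";
//   pageEl.appendChild(div);
```

Unlike the canvas and image views, text placed this way is real DOM text and can be selected.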

4. Effect display

My take on front-end security

Absolute security is impossible on the front end, because all the data and code needed to run the page has already been downloaded to the client, so reversing it is only a matter of time. What we can do is tip the balance: a small investment in anti-reversing measures can dramatically raise the cost of reversing.

This reminds me of reversing Android applications. The app's bytecode can easily be disassembled with tools into a fairly readable language such as Smali, and the code can be debugged dynamically through techniques such as instrumentation. To improve security, developers move key functions into a native .so library. An attacker can still attach a debugger and step through the ARM assembly, but at that point the code is very hard to read and the cost of reversing rises sharply. If the .so library is also protected by obfuscation or a packer, the bar for reversing is raised even further.

In my experience, and purely in terms of security, Java and .so libraries relate to each other much as JS and WASM do (although WASM was probably not created with front-end security in mind). I have not yet found a way to debug WASM dynamically. If WASM binaries can also be obfuscated and packed, their security will be strengthened further.

Appendix

  1. Source code of the implementation: github.com/ascodelife/…

References

  1. Canvas fillText interface – developer.mozilla.org/zh-CN/docs/…
  2. WebAssembly Introduction – developer.mozilla.org/zh-CN/docs/…
  3. Rust Is the Future of JavaScript Infrastructure – juejin.cn/post/703099…
  4. rustwasm.github.io/wasm-bindge…
  5. Canvas toDataURL function – developer.mozilla.org/zh-CN/docs/…