Fan Yang, a programmer who wants to write stuff with code.

Background

I am currently working on insurance-related projects. Regulatory requirements from the Health Insurance Administration demand that the insurance purchase process be traceable after the fact, so that when a dispute arises between a user and the insurance company there is evidence to consult. For example, a user may insist that he insured both himself and his wife at the time, while the insurer's backend shows only one order. Simply showing the user the backend data will not convince him.

The most persuasive approach is to record the user's actual purchase process as a video. When a dispute arises, the video itself is the proof.

DOM snapshot

When we want to see the state of the user's page at a certain point in the insurance process, we simply record the DOM structure of the page at that moment, along with its CSS styles, and then re-render them in the browser to achieve the backtracking effect.

const cloneDoc = document.documentElement.cloneNode(true);  // record
document.replaceChild(cloneDoc, document.documentElement);  // playback

This enables a point-in-time DOM snapshot. But the recorded cloneDoc is still just an object in memory and does not implement remote recording.

Serialization

To achieve remote recording, we need to serialize the cloneDoc object into a string, save it to the server, then pull it from the server during playback, and give it to the browser to re-render.

const serializer = new XMLSerializer(); // XMLSerializer is a browser API that serializes a DOM node into a string
const str = serializer.serializeToString(cloneDoc);
document.documentElement.innerHTML = str;
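The server round trip itself is simple. Below is a minimal sketch; the /api/snapshot endpoint is hypothetical, and I assume axios is available for HTTP:

// Record: serialize the snapshot and upload it (/api/snapshot is hypothetical).
const serializer = new XMLSerializer();
const snapshot = serializer.serializeToString(document.documentElement.cloneNode(true));
await axios.post("/api/snapshot", { html: snapshot });

// Playback: pull the snapshot back down and hand it to the browser to re-render.
const { data } = await axios.get("/api/snapshot");
document.documentElement.innerHTML = data.html;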

At this point, we have completed the remote recording of a point in time in the user interface.

Periodic snapshots

But our goal is to record video, and a single DOM snapshot is obviously not enough. Anyone familiar with animation knows that motion is produced by playing at least 24 frames per second in sequence. A quick aside: when our eyes observe an image, it lingers on the retina for roughly 16.7ms, a phenomenon known as persistence of vision. That is why a picture seems to fade away "gradually" rather than vanish instantly.

So when an animation plays, the second frame is shown just as the first is fading from the retina, which gives the impression that the picture is continuous and moving. Yet character movement in animation can still feel a little slow and unnatural. Why? Do the math: 1 second / 24 frames ≈ 41.7 milliseconds per frame, well above the roughly 16.7ms interval the human eye can resolve, so the motion feels slightly choppy.

To achieve smoother motion, many games and films render at 60 frames per second: 1 second / 60 frames ≈ 16.7ms, which matches the persistence of vision, so the picture feels fluid. Your computer monitor most likely refreshes at 60Hz as well.

Anyway, back to business. From the above we know that to record video we need at least 24 frames of data per second, that is, one clone of the page content every 1000ms / 24 ≈ 41.7 milliseconds.

setInterval(() => {
  const cloneDoc = document.documentElement.cloneNode(true);
  const str = serializer.serializeToString(cloneDoc);
  axios.post(address, str); // save to the server
}, 41.7);

Now the picture moves, but on closer inspection this approach does not hold up, for several reasons:

  • Cloning the entire page 24 times per second incurs a huge performance cost and seriously degrades the user experience.
  • Uploading 24 full copies of the page content to the server every second also means huge network overhead.
  • On playback, rendering 24 complete HTML documents per second is far more than a browser can realistically handle.
  • And if the page does not change, those 24 frames may be exactly identical, so there is no need to clone that many times.

Incremental snapshots

Given the drawbacks of periodic snapshots, we need only clone the entire page once, right after the page finishes initializing. From then on, whenever the page changes, we record just the changed part. The benefits are obvious:

  • Only the changes are recorded, which is far smaller than the entire page, so the cost to page performance and the network overhead both drop dramatically.
  • We record only when something actually changes, which eliminates the mass of duplicate data.
  • On playback, we first render the first frame (the full page content), then apply the recorded changes in chronological order. This lets us trace back the user's actions like a video.

For example, suppose there are four divs on the page, and the page changes twice: first dom2 turns red, then dom4 turns green. The recorded data would look something like this:

var events = [
  { /* full HTML content */ },
  { id: 'dom2', type: '#fff -> red' },
  { id: 'dom4', type: '#fff -> green' }
];

The recorded data is an array of three elements. The first is the full HTML content; the second describes dom2 turning red; the third describes dom4 turning green. With the data recorded above, we can render events[0] first, then apply events[1] and events[2] in order to turn dom2 red and dom4 green. In theory, this completes the full loop: recording the page, shipping it to a remote server, and playing it back.
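To make the playback half concrete, here is a toy player built on the data format above. The html, style, and time fields are hypothetical additions for illustration; this is not rrweb's actual event format.

// A toy player: render the full snapshot first, then apply each
// incremental change at its recorded offset. Event fields are illustrative.
function replay(events) {
  document.documentElement.innerHTML = events[0].html; // first frame: the full page
  for (const event of events.slice(1)) {
    setTimeout(() => {
      const el = document.getElementById(event.id);
      if (el) el.style.background = event.style; // apply the recorded change
    }, event.time); // offset in milliseconds from the start of the recording
  }
}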

MutationObserver

In the previous step we worked out recording and playback in theory. But how do we implement it? How do we know when the page changes, and what exactly changed? In fact, browsers already provide a very powerful API for this: MutationObserver. It reports DOM updates in batches. Using the same example as above, let's change the background colors of dom2 and dom4:

setTimeout(() => {
  let dom2 = document.getElementById("dom2");
  dom2.style.background = "red";
  let dom4 = document.getElementById("dom4");
  dom4.style.background = "green";
}, 5000);

const callback = function (mutationsList, observer) {
  for (const mutation of mutationsList) {
    if (mutation.type === "childList") {
      console.log("a child element was added or removed");
    } else if (mutation.type === "attributes") {
      console.log("an attribute changed");
    }
  }
};

document.addEventListener("DOMContentLoaded", function () {
  const observer = new MutationObserver(callback);
  observer.observe(document.body, {
    attributes: true,
    childList: true,
    subtree: true,
  });
});

The data handed to the callback is a list of MutationRecord objects. As you can see, each record captures only the DOM element that changed (target) and the kind of change (type). With this, we can use MutationObserver to implement the incremental snapshot idea.
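As a rough sketch, the callback can flatten each record into a plain, serializable event. The event shape below is made up for illustration; rrweb's real serializer assigns every DOM node a stable numeric id, since live DOM references cannot be serialized.

// Sketch: turn each MutationRecord into a plain object that can be
// stringified and uploaded. Nodes are identified by their id attribute
// here purely for simplicity.
const events = [];
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    events.push({
      type: mutation.type,                       // "attributes" | "childList" | "characterData"
      targetId: mutation.target.id || null,      // simplification: use the element's id attribute
      attribute: mutation.attributeName || null, // which attribute changed, if any
      time: Date.now(),                          // timestamp so playback can keep the order
    });
  }
});
observer.observe(document.body, { attributes: true, childList: true, subtree: true });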

Interactive elements

With MutationObserver we can record element additions, removals, and attribute changes, but it does not capture input into interactive elements such as input, textarea, or select. For those, we listen for the input and change events, which covers the case where the user types manually. However, some values are set directly by code, which fires neither input nor change; in that case we can hijack the setter of the corresponding property:

const input = document.getElementById("input");
Object.defineProperty(input, "value", {
  get: function () {
    console.log("input value read");
  },
  set: function (val) {
    console.log("input value updated");
  },
});
input.value = 123;
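One caveat about the snippet above: defining get/set directly on the instance discards the input's native behavior, so the value would no longer actually change. A safer sketch grabs the property descriptor from HTMLInputElement.prototype and delegates to the original getter and setter:

// Sketch: intercept programmatic writes to input.value while keeping
// the input working, by delegating to the prototype's native descriptor.
const input = document.getElementById("input");
const descriptor = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value");
Object.defineProperty(input, "value", {
  get() {
    return descriptor.get.call(this);  // native getter
  },
  set(val) {
    console.log("input value set by code:", val);
    descriptor.set.call(this, val);    // native setter actually updates the value
  },
});
input.value = 123; // logs the write, and the input really updates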

That is the general idea behind recording and replaying the browser, and it is also the core idea of the open-source tool rrweb (record and replay the web). Of course, rrweb also handles recording the mouse trail, the browser window size, sandboxing, time calibration, and so on, which I won't go into here. Interested readers can consult the official rrweb documentation.

rrweb

The section above covered the core ideas behind rrweb's recording and playback; here is a brief look at how to use it. See the rrweb user guide for more. Install it via npm:

npm install --save rrweb

Recording

const events = [];

let stopFn = rrweb.record({
  emit(event) {
    events.push(event);
    if (events.length > 100) {
      // stop recording after 100 events
      stopFn();
      // serialize events to a string and save it to the server
    }
  },
});
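To fill in the "save it to the server" comment, here is a minimal sketch; the /api/record endpoint and the 10-second flush interval are assumptions of mine (rrweb's own guide suggests a similar periodic upload):

// Sketch: periodically flush recorded events to a hypothetical endpoint.
function save() {
  if (events.length === 0) return;
  const body = JSON.stringify(events);
  events.length = 0; // clear the buffer once a copy has been taken
  axios.post("/api/record", body, {
    headers: { "Content-Type": "application/json" },
  });
}
setInterval(save, 10 * 1000); // upload every 10 seconds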

Playback

const events = []; // the events previously recorded and pulled back from the server
const replayer = new rrweb.Replayer(events);
replayer.play();

Static resource aging problem

Here is a recording I made. You can see that the recorded data contains an externally linked image, which means playback depends on that image still being available. But as the project iterates, the image may well disappear, and on playback it will simply fail to load. And it is not just images: externally linked CSS, font files, and so on have the same problem. Going back to the insurance scenario at the start of this article, suppose the coverage details were on a poster on the website. The customer might say: "I saw coverage of 1.5 million back then, but now it says 1 million." How do you prove the poster said 1 million?

Converting JSON to video

So the safest solution is to convert rrweb's raw data into video: no matter how much the site changes or how many iterations it goes through, the video is unaffected. My approach was to run a headless browser on the server with Puppeteer, play the recorded data back inside it, capture a fixed number of screenshots per second, and finally stitch them into a video with FFmpeg.

Frame rate. I went with 50 frames per second, which means taking a screenshot every 20ms.

Capture timing. There is a pitfall here: a Puppeteer screenshot takes about 300ms. Suppose the page plays back in real time and we use setInterval to capture every 20ms; in practice, consecutive captures end up about 300ms apart. For the second frame we want the picture at the 20ms mark of the video, but by then the replay has already advanced to around 320ms.

Play and pause. To cancel out the cost of the slow screenshots, before each capture I seek the replay to the corresponding time point and pause it there, so each screenshot shows exactly the frame we want.
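For context, here is roughly the setup the capture loop below assumes: a Puppeteer page that plays the events back and exposes its replayer as window.chromePlayer, a readable stream that screenshots are pushed into, and a fluent-ffmpeg command consuming that stream. The wiring and names here are a sketch of my own, not a fixed API.

const puppeteer = require("puppeteer");
const ffmpeg = require("fluent-ffmpeg");
const { Readable } = require("stream");

async function setup(replayPageUrl) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(replayPageUrl); // this page exposes the replayer as window.chromePlayer
  const readable = new Readable({ read() {} }); // screenshot buffers are pushed in manually
  const command = ffmpeg(readable).inputFormat("image2pipe"); // consume the stream of PNG frames
  return { browser, page, readable, ffmpeg: command };
}

With that in place, the capture loop pauses the replay at each frame's time point before taking the screenshot: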

updateCanvas() {
  if (this.imgIndex * 20 >= this.timeLength) {
    // the total number of frames is computed in advance; stop once we have them all
    this.stopCut();
    return;
  }
  this.page.screenshot({
    type: "png",
    encoding: "binary",
  }).then((buffer) => {
    this.readable.push(buffer); // push the screenshot into the readable stream
    this.page.evaluate((data) => {
      window.chromePlayer.pause(data * 20); // seek the replay to this frame's time point and pause
    }, this.imgIndex);
    this.imgIndex++;
    this.updateCanvas();
  });
}

Outputting the video

stopCut() {
  this.readable.push(null); // after the last screenshot, push null to signal that the stream has ended
  this.ffmpeg
    .videoCodec("mpeg4")   // video codec; I output mp4 here
    .videoBitrate("1000k") // bitrate, a key factor in video sharpness
    .inputFPS(50)          // frame rate, a key factor in smoothness; must match the screenshots taken per second
    .on("end", () => {
      console.log("\nvideo converted successfully");
    })
    .on("error", (e) => {
      console.log("error happened: " + e);
    })
    .save("./res.mp4");    // output the video
}

Conclusion

Because of the performance of Puppeteer screenshots, converting 1 second of rrweb data to video currently takes about 15 seconds, which is nowhere near good enough. If you have ideas for doing better, you are welcome to join the project and help build a more stable, efficient, and powerful rrweb-to-video tool. Here is the source address.