Architecture design of a new version of Youdao Cloud Note Editor (Part 1)

While developing the new editor for Youdao Cloud Note, we ran into many practical problems and came to appreciate how deep the front-end technology and technology selection questions in this area are. In this article we share the architecture of the new editor and some of its implementation details, in the hope that they will be a useful reference for anyone developing a rich text editor or designing the architecture of a complex system.

By Jin Xin

Edit/Ryan

Source/Youdao Technology Team

1. Rich text editor background

1.1 What is an editor

In front-end development, an editor is a functional module that lets users edit plain text, rich text, code, multimedia content, and so on. Taking Cloud Note as an example, the editor is the green area in the following figure. An editor generally consists of an editing area, a cursor, a toolbar, a right-click menu, and other modules, and typically supports editing text, setting text and paragraph styles, inserting multimedia content, undo and redo, and copy, cut, and paste.

1.2 Brief history of editor development

Editors can be traced back to typewriters. Here is a common typewriter.

We can compare the construction of a typewriter with that of an editor. The typewriter's paper corresponds to the editor's editing area, and the typewriter's cursor corresponds to the editor's cursor. Even the behavior of typing on the keyboard carries over from the typewriter to the editor:

When a letter key is struck, the character is entered after the cursor;
When the space bar is struck, a space is inserted after the cursor;
When the backspace key is struck, the character before the cursor is deleted;
When the newline key is struck, the cursor moves to the start of the next line.

On computers, plain text editors appeared first, such as vi, Vim, and Emacs, which are commonly used on Linux systems. They edit plain text data and introduced core editor functions such as undo and redo, copy, cut and paste, and find and replace.

With the rise of graphical user interfaces, plain text was no longer enough. Users also needed to apply formatting and typesetting information to passages of text.

At the same time, demand grew for inserting richer content, such as images, graphics, and tables, into documents. Rich text editors emerged to meet these needs, epitomized by Microsoft Word and Kingsoft’s WPS.

Word and WPS were the most capable rich text editors among desktop clients, and they remain powerful today.

But they were designed as stand-alone word processors, so they have several limitations: they do not support Internet audio and video formats, they rely on the local file system for storage and backup, and multi-person collaboration relies on copying files back and forth.

Today, many people do not need the complex interactions and powerful functions that Word provides. Instead, they need a lightweight rich text editor that supports more Internet data formats, makes storage and backup easy, and offers multi-person collaborative editing.

Browser-based rich text editors grew out of these design ideas. Representative products include Google Docs, Youdao Cloud Note, Evernote, and Shimo Docs.

These browser-based rich text editors have the following features:

They are developed with Web technology and run in the browser environment;
Their functions are simpler than Word's, keeping only the most commonly used rich text editing functions;
They support pictures, attachments, videos, audio, maps, and other Internet resources;
Documents can be backed up to a network drive and synchronized across multiple devices;
Documents can be shared and viewed, and can be edited by multiple people in real time.

Of course, browser-based rich text editors went through several rounds of technical iteration and innovation to reach their current state.

1.3 Four elements of a browser-based rich text editor

In modern browsers, a rich text editor developed with Web technology generally follows the classic MVC pattern: the view is rendered from the data model, and operations on the view modify the data model through the controller. Specifically, the following four problems need to be solved:

Model:

The model consists of an in-memory model and a storage model. The storage model is used for data storage, synchronization, and backup, so factors such as bandwidth, storage volume, serialization efficiency, and the efficiency of verifying model correctness should be considered. The in-memory model is used for rendering. It is generally more complex than the storage model, because it adds the extra attributes needed for rendering on top of the storage model.

Render:

Rendering refers to how the in-memory model is rendered into a Web page. All browser-based rich text editors render the in-memory model as an HTML page, but their typesetting strategies differ slightly: most editors rely on HTML and CSS for typesetting, while a few, such as Google Docs, implement their own typesetting engine.

Edit:

Editing refers to how to provide an editing area in which the user edits the document, and how to detect the user's editing actions in that area so that the controller can be notified to modify the data model. The browser provides the contentEditable attribute, which makes an element editable. Most editors build their editing area this way, intercepting events on the contentEditable element and passing them to the controller. A few editors, such as Google Docs, implement their own editing area and event system.

Instructions:

Instructions are what the controller generates in response to the editing actions it receives from the editing area. They modify the in-memory model, so that the model is updated and the cycle is completed. This part depends on the data model. If the data model is HTML, the editor can use execCommand to modify the HTML directly. If the data model is custom, the editor can listen for or intercept events in the editing area and infer the user's intent in order to generate instructions that modify the data model. Modifying data through instructions makes it easier to implement undo and redo, restoring historical versions, collaborative editing, and other functions.
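
As a rough illustration of why instructions make undo and redo convenient, the sketch below applies an instruction to a deliberately simplified model (a plain string) and derives its inverse. The names `EditInstruction`, `applyInstruction`, and `invertInstruction` are hypothetical and are not taken from the Youdao editor.

```typescript
// Hypothetical instruction types; a real editor would define many more.
type EditInstruction =
  | { kind: "insertText"; offset: number; text: string }
  | { kind: "deleteText"; offset: number; text: string };

// Apply one instruction to a plain-string model and return the new model.
function applyInstruction(model: string, ins: EditInstruction): string {
  if (ins.kind === "insertText") {
    return model.slice(0, ins.offset) + ins.text + model.slice(ins.offset);
  }
  // deleteText: remove the recorded text starting at the offset.
  return model.slice(0, ins.offset) + model.slice(ins.offset + ins.text.length);
}

// Each instruction carries enough data to build its inverse,
// so undo is just "apply the inverse instruction".
function invertInstruction(ins: EditInstruction): EditInstruction {
  return ins.kind === "insertText"
    ? { kind: "deleteText", offset: ins.offset, text: ins.text }
    : { kind: "insertText", offset: ins.offset, text: ins.text };
}

const typed: EditInstruction = { kind: "insertText", offset: 5, text: " world" };
const afterTyping = applyInstruction("hello", typed);
const afterUndo = applyInstruction(afterTyping, invertInstruction(typed));
```

Because every change is recorded as data rather than as a direct mutation, the same list of instructions could in principle also be replayed to restore a historical version or sent to other clients for collaborative editing.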

1.4 Evolution of browser-based rich text editor technology

Based on how they address the four problems above, browser-based rich text editors can be divided into four generations:

The first generation:

The design is based entirely on browser APIs. The data model is HTML itself, rendering uses native HTML, the editing area is created with contentEditable, and instructions that modify the HTML are executed through the browser's execCommand.

This type of editor usually appears in blog posts of the “implement a rich text editor in XX lines of code” kind. Few mature open source or commercial editors take this approach. The main problem lies in the execCommand interface:

Only a limited set of commands is provided. For example, there is no command for inserting a to-do list;
Some of the provided commands have limited functionality. For example, the ‘fontSize’ command accepts only the values 1 to 7, so arbitrary font sizes cannot be set;
The result is browser-dependent. For example, the ‘bold’ command wraps the selected text in a b tag on some browsers and in a strong tag on others.

The second generation:

Because of the limitations of execCommand, second-generation editors generally stop modifying the HTML document through the browser's execCommand interface. Instead, they modify it with their own implementation of execCommand and with custom instructions, which allows more flexibility and versatility.

The main problem with this type of editor is that different HTML structures can mean the same thing. For example, the following two lines of HTML both represent bold, italic text, but their structures differ, which makes it difficult to tell whether two pieces of data are the same.
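
As an assumed illustration of such a pair (the exact fragments used here are an assumption), `<b><i>text</i></b>` and `<i><b>text</b></i>` differ as strings but carry the same meaning. The sketch below uses a hypothetical helper, `flattenFragment`, that only handles this very simple case: one text run, no attributes.

```typescript
// Two assumed HTML fragments that both mean "bold and italic text".
const htmlA: string = "<b><i>text</i></b>";
const htmlB: string = "<i><b>text</b></i>";

// Flatten a simple, well-formed fragment into its text plus an
// order-independent (sorted) list of inline marks.
function flattenFragment(html: string): { text: string; marks: string[] } {
  const openTags = html.match(/<[a-z]+>/g) || [];
  const marks = openTags.map((tag) => tag.slice(1, -1)).sort();
  const text = html.replace(/<\/?[a-z]+>/g, "");
  return { text, marks };
}

// Comparing raw HTML says the fragments differ...
const sameString = htmlA === htmlB;
// ...while comparing the flattened form says they are equivalent.
const sameMeaning =
  JSON.stringify(flattenFragment(htmlA)) === JSON.stringify(flattenFragment(htmlB));
```

Real HTML is far less regular than this, which is part of why comparing HTML-based document data is hard in practice.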

The third generation:

To address the fact that HTML with different structures can carry the same meaning, third-generation editors stopped using HTML as both the document model and the rendering output, and instead adopted a custom data model, such as an XML or JSON data model. The same data model always renders to the same HTML, and custom instructions ensure that the same operation always modifies the document model in the same way.

At present, common editor products such as Youdao Cloud Note and Shimo Docs, and open source editor libraries such as Slate, Draft, and Quill, are all third-generation editors, which meet the needs of most application scenarios. However, because the editable area of the rendered page is still based on contentEditable, the editor has to infer user behavior from intercepted events and generate the corresponding instructions to modify the data model. Whenever a user action is not intercepted or is handled incorrectly, it may modify the contentEditable element directly, leaving the data and the view inconsistent. Such bugs are difficult to locate and fix, and they appear often when adapting the editor to mobile devices.

The fourth generation:

To eliminate the uncontrollable events caused by contentEditable, fourth-generation editors, represented by Google Docs, abandon contentEditable completely and implement their own typesetting engine. The typesetting engine controls the paging and layout of the document and renders the data into HTML on the page. Because contentEditable is gone, the editor must also solve technical problems such as drawing the cursor and selection and listening for text input events, in order to provide an editing experience like the browser's native one.

Compared with the third generation, the fourth-generation editor completely avoids the bugs caused by contentEditable and is more extensible. The trade-offs are greater development difficulty, an experience that falls short of native, and possible performance problems. Currently, only Google Docs and a few others have adopted this architecture.

2. Technology selection for the new Cloud Note editor

Based on the four elements of a rich text editor described in the previous section, we summarize the technology choices of the four generations of editors in the following table:

Weighing extensibility against implementation difficulty, we made the following technology choices for the new Youdao Cloud Note editor:

Model: a custom JSON data format serves as the in-memory model, and its compressed version serves as the storage model;

Render: the view is rendered with the React framework, with typesetting left to the browser;

Edit: the editor does not rely on contentEditable; it intercepts browser events to determine user interactions and implements the cursor and selection itself;

Instructions: a rich set of custom rich text editing instructions is implemented, and execCommand is re-implemented.

2.1 Model

The new Youdao Cloud Note editor uses a custom JSON data format as the in-memory model. The storage model is a compressed version of the in-memory model and corresponds to it directly, which reduces errors during serialization and deserialization.

Document model:

The in-memory model of the new editor has three layers: document, paragraph, and text. The top-level object is a document (yellow). A document contains multiple paragraphs (blue), and each paragraph contains at least one text (red).

For a three-tier document model, we can naturally think of using a tree structure to represent it, as shown in the figure below:

Since the JSON format naturally represents nested tree structures, our three-tier document model can be represented as the following JSON structure:
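
A minimal sketch of what such a three-layer structure could look like is given below. Apart from `content` on the text node, which the article mentions later, the field names (`type`, `paragraphs`, `texts`) are assumptions made for illustration and are not the editor's actual schema.

```typescript
// Assumed field names; only `content` on the text node comes from the article.
interface TextNode { type: "text"; content: string }
interface ParagraphNode { type: "paragraph"; texts: TextNode[] } // at least one text
interface DocNode { type: "document"; paragraphs: ParagraphNode[] }

const sampleDoc: DocNode = {
  type: "document",
  paragraphs: [
    { type: "paragraph", texts: [{ type: "text", content: "First paragraph" }] },
    { type: "paragraph", texts: [{ type: "text", content: "Second paragraph" }] },
  ],
};

// Walk the tree to collect plain text, one line per paragraph.
function toPlainText(doc: DocNode): string {
  return doc.paragraphs
    .map((p) => p.texts.map((t) => t.content).join(""))
    .join("\n");
}

const plainText = toPlainText(sampleDoc);
// The nested object serializes to JSON and back without extra mapping code.
const roundTripped: DocNode = JSON.parse(JSON.stringify(sampleDoc));
```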

Rich text representation:

The data model of a rich text editor needs to account for both the inline styles and the paragraph styles of text:

Inline styles apply to text. Each piece of text may have different inline styles, such as bold, italic, text color, background color, font, and size.

Paragraph styles apply to paragraphs. An entire paragraph has exactly one paragraph style, covering properties such as alignment, line height, and paragraph indentation.

Since paragraphs form their own layer in our three-layer document model, with corresponding paragraph nodes, we only need to add a field for the paragraph style to the paragraph node. We added a data field to the paragraph node, with a style attribute inside it that holds the paragraph style, as shown in the figure below:

To represent inline styles, a text node with only a content field is not sufficient. We need to split the text into individual characters and attach a field representing inline styles to each one. For example, we can use a chars array in which each element represents one character: its text field holds the character itself, and its marks array holds the inline styles on that character. The figure below shows a piece of rich text represented this way.
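
The paragraph style field and the per-character representation might be sketched as follows. The names `data`, `style`, `chars`, `text`, and `marks` follow the description above; the shape of each mark (`type` plus an optional `value`), the style properties, and the sample text "a rich text" are assumptions for illustration.

```typescript
// The shape of a mark and the style properties are assumptions.
interface InlineMark { type: string; value?: string }
interface CharNode { text: string; marks: InlineMark[] }
interface TextNode { chars: CharNode[] }
interface ParagraphNode {
  data: { style: { align?: string; lineHeight?: number } };
  texts: TextNode[];
}

const richMarks: InlineMark[] = [
  { type: "bold" },
  { type: "italic" },
  { type: "backgroundColor", value: "red" },
];

// Build one character node per character, each with its own copy of the marks.
function toChars(content: string, marks: InlineMark[]): CharNode[] {
  return content.split("").map((ch) => ({
    text: ch,
    marks: marks.map((m) => ({ ...m })),
  }));
}

const styledParagraph: ParagraphNode = {
  data: { style: { align: "center", lineHeight: 1.5 } },
  texts: [
    {
      chars: [
        ...toChars("a ", []),
        ...toChars("rich", richMarks),
        ...toChars(" text", []),
      ],
    },
  ],
};
```

Note how the three marks are repeated on each of the four styled characters. This is easy to reason about but stores the same style data several times.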

Leaf node:

In the representation above, the bold, italic, and red background styles of the word “rich” are stored separately on each of the four characters r, i, c, and h, which is redundant. Drawing on how HTML renders and on the implementations of some open source editors, we defined the following merge rule:

If successive character nodes have exactly the same inline style, they can be merged into a single leaf node.

For example, in the example above, r, i, c, and h are four consecutive characters with exactly the same inline style, so they can be merged into a single leaf node. Similarly, “a” and “text” can each be merged into a leaf node, so the text node simplifies to:

To sum up, a simplified text node that supports inline styles is itself a tree containing one or more leaf nodes. Each leaf node holds a run of text and the inline styles shared by that text, and adjacent leaf nodes must never have exactly the same inline styles, as shown in the figure below:
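
The merge rule can be sketched as a single pass over the character array. This is an illustrative implementation under assumed type shapes, not the editor's actual code. `marksKey` and `mergeChars` are hypothetical names, and the sample text (including which leaf the spaces belong to) is an assumption.

```typescript
interface InlineMark { type: string; value?: string }
interface CharNode { text: string; marks: InlineMark[] }
interface LeafNode { text: string; marks: InlineMark[] }

// Order-independent key, so [bold, italic] compares equal to [italic, bold].
function marksKey(marks: InlineMark[]): string {
  return marks.map((m) => `${m.type}=${m.value || ""}`).sort().join(";");
}

// Merge runs of consecutive characters with identical marks into leaves.
function mergeChars(chars: CharNode[]): LeafNode[] {
  const leaves: LeafNode[] = [];
  for (const ch of chars) {
    const last = leaves[leaves.length - 1];
    if (last !== undefined && marksKey(last.marks) === marksKey(ch.marks)) {
      last.text += ch.text; // same style as the previous leaf: extend it
    } else {
      leaves.push({ text: ch.text, marks: ch.marks }); // style changed: new leaf
    }
  }
  return leaves;
}

const plainChar = (c: string): CharNode => ({ text: c, marks: [] });
const boldChar = (c: string): CharNode => ({ text: c, marks: [{ type: "bold" }] });

const sampleChars: CharNode[] = [
  ..."a ".split("").map((c) => plainChar(c)),
  ..."rich".split("").map((c) => boldChar(c)),
  ..." text".split("").map((c) => plainChar(c)),
];
const mergedLeaves = mergeChars(sampleChars);
```

Here eleven character nodes collapse into three leaves, and no two adjacent leaves share the same set of marks, which matches the invariant stated above.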

2.2 Rendering

The new Cloud Note editor still relies on the browser for typesetting rather than a self-developed typesetting engine like that of Google Docs. We believe the browser's layout engine is powerful enough to meet the needs of everyday text editing; only more advanced features such as wrapping text around images, paging, and columns cannot be achieved. A self-developed typesetting engine would require a great deal of development and testing effort and might cause performance problems, so for the time being we use the browser for typesetting and rendering.

We use the React framework to render the data model as components. For the three-layer document-paragraph-text model and the leaf model inside text nodes, we designed nested React components, as shown in the figure below:

The Document component renders Document data.
The Paragraph component renders Paragraph data and is a child of the Document component.
The Text component renders Text data and is a child of the Paragraph component.
The Leaf component renders Leaf data and is a child of the Text component.

The text fragment and inline styles in a leaf node only need to be rendered as a single tag with a style attribute. This validates the rich text model we designed: it simplifies the rendering logic and keeps the rich text rendering code very lightweight.
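
To show how little logic the nested rendering needs, the sketch below mirrors the Document, Paragraph, Text, and Leaf nesting with plain functions that build an HTML string. This is a stand-in for React components, chosen so the example runs without dependencies. The tag choices (`div`, `p`, `span`), the `leaves` field, and a leaf whose styles are already resolved to CSS properties are all assumptions.

```typescript
// Assumed shapes: each leaf carries text plus already-resolved CSS properties.
interface LeafNode { text: string; style: { [prop: string]: string } }
interface TextNode { leaves: LeafNode[] }
interface ParagraphNode { texts: TextNode[] }
interface DocNode { paragraphs: ParagraphNode[] }

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// A leaf becomes one tag with a style attribute (omitted when there is no style).
function renderLeaf(leaf: LeafNode): string {
  const css = Object.keys(leaf.style)
    .map((prop) => `${prop}:${leaf.style[prop]}`)
    .join(";");
  const attr = css ? ` style="${escapeHtml(css)}"` : "";
  return `<span${attr}>${escapeHtml(leaf.text)}</span>`;
}

// Each function below plays the role of one component in the nesting.
function renderText(t: TextNode): string {
  return t.leaves.map((leaf) => renderLeaf(leaf)).join("");
}
function renderParagraph(p: ParagraphNode): string {
  return `<p>${p.texts.map((t) => renderText(t)).join("")}</p>`;
}
function renderDocument(d: DocNode): string {
  return `<div>${d.paragraphs.map((p) => renderParagraph(p)).join("")}</div>`;
}

const renderedHtml = renderDocument({
  paragraphs: [
    {
      texts: [
        {
          leaves: [
            { text: "a ", style: {} },
            { text: "rich", style: { "font-weight": "bold", "font-style": "italic" } },
            { text: " text", style: {} },
          ],
        },
      ],
    },
  ],
});
```

In a React version, each function would become a component that receives its node as props and returns the corresponding element, with the same parent-child structure.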

That’s all for this issue. Next Wednesday we will publish “Editor Architecture Design of New Version of Youdao Cloud Note (Part 2)”, which continues with editing and instructions and further explains the layered architecture of the new editor.

Please follow the Youdao Technology Team.

– END –