Three A: Architecture Design and Practice of an Online Rich Text Editor

On December 5th, InfoQ held the GMTC conference in Shenzhen. Three A, an engineer working on the Yuque editor at Ant Group, was invited to give the talk “Architecture Design and Practice of Online Rich Text Editor”. The following content was compiled from the on-site speech.

Good afternoon, everyone. My name is Han Cong. I am currently in charge of research and development for the Yuque document editor on the Yuque team at Ant Group.

Today, what I want to share with you is the architecture design and practice of the rich text editor.

The Yuque editor family

First, let’s get to know Yuque’s family of editors. So far we have produced seven different types of editors. The first is the document editor, which is built on traditional DOM technology. The second is the directory editor, also built on DOM technology. The third is the worksheet, which is built on Canvas. The fourth and fifth are graph-type editors built on SVG. The sixth is the slide presentation editor, also built on SVG.

Get to know the Yuque document editor

Today I will share with you the oldest member of the family: the document editor. Let’s take a brief look at its interface. It is a very classic layout. At the top is the toolbar area, where we place common, high-frequency functions. On the right is the function extension panel area, where the document outline is permanently shown. Depending on what the user is doing, other function panels appear here as well, such as the image settings panel and the attachment download panel. In the middle is the editing area of the document. This is the core working area of the editor, and most of the editor’s code deals with the user interactions that happen here.

This is what the document editor looks like once it has been filled with content.

How a rich text editor works

What is behind a rich text editor like this? From my point of view, it comes down to being clear about two questions. The first is how to render rich text in the browser. The second is how to edit rich text in the browser.

How do we render rich text in the browser?

First of all, we need to be clear about what rich text is. Traditionally, rich text is defined in contrast to plain text. Simply put, it is text with rich formatting. Back to the question: how do we render this content in the browser? That is exactly what the browser’s content rendering technologies are for. Browsers give us three of them: SVG, Canvas, and HTML + CSS.

Which of these three should we choose to render our rich text? My answer is HTML + CSS. Why? It is simple enough, and it is easy to extend. In general, HTML + CSS is the simplest of the three and needs the least code to achieve the same UI effect.

How do we edit rich text in the browser?

Now let’s look at the second question: how do we edit rich text? Once this is figured out, the veil of mystery over the editor is basically lifted. For most of today’s editors, the answer is contenteditable.

contenteditable is an HTML attribute that makes a DOM element editable. This capability is ideal for building a rich text editor. All we need to do is set this attribute on the root node of our editor to turn on the editing state. Once an element becomes editable, the browser also takes care of basic functions such as selection and cursor movement.
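As a minimal sketch of turning on the editing state: the names `EditableRoot` and `enableEditing` below are hypothetical and only illustrate the idea. The one real detail is the `contenteditable` attribute itself.

```typescript
// Hypothetical minimal shape of the editor root.
// A real HTMLElement in the browser satisfies this interface.
interface EditableRoot {
  setAttribute(name: string, value: string): void;
}

// Turn on the browser's built-in editing for the editor's root node.
function enableEditing(root: EditableRoot): void {
  root.setAttribute("contenteditable", "true");
}
```

In a browser, the root passed in would be the editor's container element, for example one obtained with `document.getElementById`.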

With these two questions answered, you can see that for front-end engineers the editor as a whole has no great technical barrier. All that remains is to implement the editor’s functionality step by step. That was a brief explanation of how a rich text editor works.

The Yuque document editor

In the next part we turn to the Yuque document editor itself, to understand how the architecture behind it is designed and implemented.

The evolution of the Yuque editor

First let’s take a look at the history of the Yuque editor. It has been around for about six years since its birth and has gone through four generations.

The first-generation editor, in 2016, was a Markdown editor, not yet a rich text editor. We built it as secondary development on top of CodeMirror. At that time we mainly served our internal engineers.

In 2017, we entered the era of rich text editors. The second-generation editor was secondary development on top of slate.js.

In 2018, our third-generation editor came online. This generation was developed in-house and works on the basis of the contenteditable attribute I mentioned earlier. It is our longest-running editor so far, having been online for nearly three years before it was replaced by our fourth-generation editor in April. The underlying technology of the fourth-generation editor is also contenteditable, but it has been redesigned around microkernel ideas.

Today we will focus on the fourth-generation editor and mention the third generation in passing. We will not talk about the first and second generations, because they are from so long ago.

Third generation document editor

Third generation document editor architecture

Let me show you the third-generation editor. Its architecture consists of two main parts. The first part is responsible for creating and managing the UI, typically things like the sidebar and the toolbar. The second part is the editing engine, called Engine. This is where all the rich text editing is done. It consists of a kernel called Core and a series of plug-ins. The plug-ins work together with the kernel to deliver the rich text editing functionality that is at the heart of the entire editor. That is the framework of the third-generation editor.

Document initialization process

Next is the document initialization process of the third-generation editor. The whole process is very simple. When the editor receives an initialization request, it parses the content, converts it into a DOM tree, and then transforms that tree. The purpose of the transformation is to normalize tags with the same semantics into a single tag. This simplifies the later algorithms, because they only have to deal with as few kinds of nodes as possible.

After normalization, the tree is handed to our Schema for filtering. The Schema does two things: it filters out illegal nodes, and it filters out illegal attributes. After Schema filtering we get a cleaner DOM tree in which every node and attribute can be understood and recognized by our editor. We then serialize this tree into HTML and hand it to the editor to render in one go, which completes the initialization of the whole document.
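The normalize, filter, and serialize steps can be sketched roughly as follows. This is an illustration under assumptions, not the editor's real code: `SketchNode`, `TAG_ALIASES`, `SCHEMA`, and the three functions are hypothetical names, and a plain object tree stands in for the parsed DOM.

```typescript
// Hypothetical in-memory node, standing in for a parsed DOM node.
interface SketchNode {
  tag: string; // "#text" marks a text node
  attrs: Record<string, string>;
  children: SketchNode[];
  text?: string;
}

// Assumed normalization table: tags with the same meaning map to one tag.
const TAG_ALIASES: Record<string, string> = { b: "strong", i: "em" };

// Assumed schema: the allowed tags and, per tag, the allowed attributes.
const SCHEMA: Record<string, string[]> = {
  "#text": [],
  p: [],
  strong: [],
  em: [],
  a: ["href"],
};

function normalize(node: SketchNode): SketchNode {
  return {
    ...node,
    tag: TAG_ALIASES[node.tag] ?? node.tag,
    children: node.children.map(normalize),
  };
}

// Drop nodes whose tag is not in the schema and strip unknown attributes.
function filterBySchema(node: SketchNode): SketchNode | null {
  const allowedAttrs = SCHEMA[node.tag];
  if (allowedAttrs === undefined) return null;
  const attrs: Record<string, string> = {};
  for (const name of allowedAttrs) {
    const value = node.attrs[name];
    if (value !== undefined) attrs[name] = value;
  }
  const children = node.children
    .map(filterBySchema)
    .filter((child): child is SketchNode => child !== null);
  return { ...node, attrs, children };
}

// Serialize to HTML. No escaping is done; this is a sketch only.
function serialize(node: SketchNode): string {
  if (node.tag === "#text") return node.text ?? "";
  const attrs = Object.entries(node.attrs)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  const inner = node.children.map(serialize).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// Example input: an unknown attribute, a <b> alias, and a disallowed <script>.
const parsed: SketchNode = {
  tag: "p",
  attrs: { onclick: "x()" },
  children: [
    { tag: "b", attrs: {}, children: [{ tag: "#text", attrs: {}, children: [], text: "hi" }] },
    { tag: "script", attrs: {}, children: [] },
  ],
};
const cleaned = filterBySchema(normalize(parsed));
const html = cleaned ? serialize(cleaned) : "";
// html is "<p><strong>hi</strong></p>"
```

Normalizing before filtering means the schema only has to list one tag per meaning, which is the simplification the normalization step is there to provide.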

Third generation document editor features

A major characteristic of our third-generation editor is that it is DOM-centric. Every feature was developed with a single goal, which was to produce the desired effect on DOM nodes. This is very simple and straightforward, but it can be difficult to maintain.

A new generation of document editors

So we started developing the fourth-generation editor. After some small internal discussions we settled on design goals based on lessons learned from the third-generation editor. The first goal is to keep data and view separate. The second is to keep our data structures strictly controlled.

Next, let’s look at the architecture of the fourth-generation editor. It is a typical three-tier architecture in which each layer has very specific responsibilities. At the bottom is the kernel layer, which creates an abstract document data structure for the entire editor and controls reads and writes to that structure. The second layer is the Engine layer, whose core purpose is to present documents to the user. The third layer is the Editor layer, whose goal is to provide the user interface.

Editor architecture

Start with the kernel layer, which contains two main modules: the IO module and the Model module. The IO module controls data exchange and flow between the editor and the outside world. The Model module creates the document model and defines a standard process for changing the document. The implementation of this layer runs not only in the browser but also on Yuque’s server side, where it is used to manipulate data.

The second layer is the Engine layer, which contains two modules. The first is the View module, which computes a node tree suitable for rendering in the browser from the data maintained in the kernel. The second is the Renderer module, which renders that node tree to the browser.

The third layer is the Editor layer. It has only one module and does a very lightweight job: it creates the editor’s main DOM nodes and provides them to plug-ins that need UI. The toolbar plug-in, for example, mounts its UI components onto the nodes created by the Editor and presents them to the user.

Data change process

In the new generation of rich text editors, we strictly control the flow of data changes. Whenever a change is made, whatever its cause (initialization or user interaction, for example), it must first be committed to the kernel. Only after the kernel confirms the change is it pushed to the View module in the Engine layer. Once View has finished its calculation, the result is pushed to the Renderer module, which does the actual rendering. All plug-ins must follow this data change process.
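The one-way flow described above might look roughly like this. Only the order of the steps (kernel, then View, then Renderer) comes from the talk. The class shapes, the `Change` type, and the rule that an empty path is rejected are assumptions made for illustration.

```typescript
// Hypothetical change record: where in the document, and what text.
type Change = { path: number[]; text: string };

class Renderer {
  readonly rendered: string[] = [];

  render(viewNode: string): void {
    // A real renderer would patch DOM nodes here.
    this.rendered.push(viewNode);
  }
}

class View {
  private readonly renderer: Renderer;

  constructor(renderer: Renderer) {
    this.renderer = renderer;
  }

  // Compute something renderable from a confirmed change, then push it on.
  onConfirmed(change: Change): void {
    const path = change.path.join(".");
    this.renderer.render(`<span data-path="${path}">${change.text}</span>`);
  }
}

class Kernel {
  readonly log: Change[] = [];
  private readonly view: View;

  constructor(view: View) {
    this.view = view;
  }

  // Every change, whatever its origin, enters here first.
  commit(change: Change): boolean {
    // Assumed validation rule: a change without a path is rejected
    // and therefore never reaches the view or the renderer.
    if (change.path.length === 0) return false;
    this.log.push(change);
    this.view.onConfirmed(change);
    return true;
  }
}
```

Because plug-ins only ever call `commit`, nothing reaches the renderer unless the kernel has accepted it first.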

In this generation, each plug-in can be split into up to three parts, one per layer, and a plug-in decides which layers it contributes to based on its actual functional needs. So far, our self-developed editor project contains 103 plug-ins.

Let’s take a look at the data types supported by the fourth-generation editor. The first two are standard data formats: plain text and HTML. These two formats are supported by all rich text editors (not only Yuque’s, and even by some code editors), because they are the two data types most relevant to the clipboard.

The third data format is the internal data format of our new editor, called inode. The fourth is the Lake format, which is the internal data format of the third-generation editor.

IO subsystem

Next, look at the IO subsystem. Let’s use reading and writing HTML as an example. In the editor there is a plug-in called HTMLDataSource that registers a data type with the kernel, telling the IO module that there is a data format called HTML.

The other two plug-ins are HTMLReader and HTMLWriter. With these three plug-ins registered, the editor can read and write data in HTML format. But this is not enough. HTMLReader and HTMLWriter are framework plug-ins: they recognize the syntax of HTML but do not understand the semantics of HTML content. For HTMLReader and HTMLWriter to correctly interpret the content of HTML data, they also need the support of functional plug-ins.

For example, to read or write HTML containing an h1 tag, the Heading plug-in has to provide the conversion for h1. To write a bold attribute, a plug-in such as Bold has to provide the conversion for it. Layer upon layer of plug-ins like this give our new editor a very flexible IO subsystem, one that fully meets our current needs for reading and writing data in every format.
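The layering of a data source, a reader, and feature plug-ins can be sketched as a small registry. Everything named here (`IORegistry`, `TagConverter`, `DocNode`, and the fallback to a paragraph) is hypothetical. The talk only names HTMLDataSource, HTMLReader, HTMLWriter, Heading, and Bold, and does not show their APIs. To stay short, the reader stand-in receives an already tokenized tag and text instead of parsing HTML.

```typescript
// Hypothetical document node produced by reading.
type DocNode = { type: string; text: string };

// What a feature plug-in (such as a heading plug-in) might contribute.
interface TagConverter {
  tag: string; // the HTML tag this plug-in understands
  toNode(text: string): DocNode;
}

class IORegistry {
  private readonly formats = new Set<string>();
  private readonly converters = new Map<string, TagConverter>();

  // Role of a data source plug-in: announce that a format exists.
  registerFormat(name: string): void {
    this.formats.add(name);
  }

  // Role of a feature plug-in: supply the meaning of one tag.
  registerConverter(converter: TagConverter): void {
    this.converters.set(converter.tag, converter);
  }

  // Stand-in for a reader: it knows the syntax (tag plus text)
  // but asks feature plug-ins what the tag means.
  read(format: string, tag: string, text: string): DocNode {
    if (!this.formats.has(format)) {
      throw new Error(`unknown format: ${format}`);
    }
    const converter = this.converters.get(tag);
    // Assumed fallback: unrecognized tags become plain paragraphs.
    return converter ? converter.toNode(text) : { type: "paragraph", text };
  }
}

const registry = new IORegistry();
registry.registerFormat("html");
registry.registerConverter({
  tag: "h1",
  toNode: (text) => ({ type: "heading", text }),
});
```

With this wiring, `registry.read("html", "h1", "Title")` yields a heading node, while a tag that no plug-in has claimed falls back to a paragraph.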

Schema, the guardian of document structure

Let’s look at the Schema subcomponent. It is small in itself, but it is responsible for protecting the document’s data structure.

The Command interface

Then there is the Command in the editor. Many editors implement their features with the command pattern. A Command is the carrier that implements a specific editor feature. All effects, including user input, cursor control, and font size changes, are carried out in Commands.

In the fourth-generation editor, every data modification made by a Command goes to the kernel through the Editing component.

The following is the definition of the Command interface. A Command defines three constants that represent its state:
● The first state indicates that the Command is unavailable at the current position.
● The second state indicates that the Command has already been executed at the current position.
● The third state indicates that the Command has not yet been executed at the current position.
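A Command with these three states could be sketched as below. The constant names (`Disabled`, `Active`, `Inactive`), the method names, and the toy context are all assumptions. The talk describes the states but does not show the actual identifiers.

```typescript
// Assumed names for the three states described above.
const CommandState = {
  Disabled: 0, // not available at the current position
  Active: 1, // already executed at the current position
  Inactive: 2, // available but not yet executed
} as const;
type CommandState = (typeof CommandState)[keyof typeof CommandState];

// Hypothetical Command interface: report a state, and execute.
interface Command<Context> {
  queryState(ctx: Context): CommandState;
  execute(ctx: Context): void;
}

// Toy context: is the cursor inside a code block, and is the text bold?
interface ToyContext {
  inCodeBlock: boolean;
  bold: boolean;
}

const boldCommand: Command<ToyContext> = {
  queryState(ctx) {
    if (ctx.inCodeBlock) return CommandState.Disabled;
    return ctx.bold ? CommandState.Active : CommandState.Inactive;
  },
  execute(ctx) {
    if (boldCommand.queryState(ctx) === CommandState.Disabled) return;
    // Simplification: the toy context is mutated directly, whereas a real
    // Command would submit its change to the kernel via Editing.
    ctx.bold = !ctx.bold;
  },
};
```

A toolbar button can use the queried state directly: greyed out for the first state, highlighted for the second, and normal for the third.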

Finally, to wrap up our architecture analysis, let’s look at the document initialization process.

The initialization request is sent to the kernel first. When the kernel receives it, it relies on the IO subsystem to parse the data. The output of the IO subsystem is a node tree in inode format. This tree is then handed to the Editing component in the Model module. Editing defines the entire process for editing document data: it creates a Job, which then adds each node in the tree to the document tree in our kernel.

For each node that is mounted, the Job generates a corresponding operation. The Schema validation we just mentioned also happens during this process. When all nodes have been mounted, the whole Job is committed and a ContentChange event is fired. This event carries the full list of operations in the change and delivers it to the Engine layer above. The View module in the Engine layer listens for this event, takes the operation list, runs its calculation over it, and converts it into node changes. The node changes are then pushed to the Renderer module, which manipulates the actual DOM nodes accordingly so that the changes show up in the browser.

This completes the entire document initialization process. Under this architecture, the rendering flow triggered by user actions is essentially the same as the one triggered by initialization; the only difference is the trigger point. Initialization is triggered through the IO subsystem, while changes caused by user operations are triggered by a Command. After that, the flow is exactly the same.
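The job-and-operations part of this flow can be sketched as follows. The terms Editing, Job, operation, Schema validation, and ContentChange come from the talk. The shapes `InodeSketch` and `Operation`, the `ALLOWED_TYPES` set, and the choice to skip illegal nodes rather than reject the whole job are assumptions.

```typescript
// Hypothetical node in the inode-format tree produced by the IO subsystem.
interface InodeSketch {
  type: string;
  children: InodeSketch[];
}

// Hypothetical operation: one is recorded per mounted node.
interface Operation {
  kind: "insert";
  type: string;
  depth: number;
}

// Assumed schema: the node types the document may contain.
const ALLOWED_TYPES = new Set(["doc", "paragraph", "heading", "text"]);

type ContentChangeListener = (operations: Operation[]) => void;

class Editing {
  private readonly listeners: ContentChangeListener[] = [];

  onContentChange(listener: ContentChangeListener): void {
    this.listeners.push(listener);
  }

  // One job: mount every node, record one operation per node,
  // then commit once and fire a single ContentChange event.
  run(root: InodeSketch): void {
    const operations: Operation[] = [];
    const mount = (node: InodeSketch, depth: number): void => {
      // Schema check (assumed behavior: illegal nodes are skipped).
      if (!ALLOWED_TYPES.has(node.type)) return;
      operations.push({ kind: "insert", type: node.type, depth });
      node.children.forEach((child) => mount(child, depth + 1));
    };
    mount(root, 0);
    for (const listener of this.listeners) listener(operations);
  }
}
```

A Command would enter the same `run` path with its own small change, which is why the flow after the trigger point is identical for initialization and for user edits.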

Future goals

The last part is about the future goals of the document editor. First, we will tackle editing performance issues, such as typing lag and the handling of large documents. Second, we will turn rich text editing into a basic building-block capability that can be offered to other editors.

That’s the end of my talk. Thank you.
