Introduction

About six years ago, in my senior year of university, I got my first taste of the front end as a front-end intern. Frameworks such as React and Vue were taking over jQuery's territory. In front-end engineering, Gulp/Grunt had barely gained a foothold before the Webpack wave swept in. The rising "big front end" concept made no secret of its ambition to unify all front-end devices. New technologies came and went, bewildering and deeply unsettling. Every newcomer hoped to ride the waves of this great change and make a name for themselves, yet also feared being caught in the turbulence and left stranded on the beach.

At that time, the mainstream view of front-end career planning was: "spend three years becoming proficient, then pick a specialized direction, dig deep, and build your own moat." What were the specializations? After some searching, and guided by experts on Zhihu, I summarized them as: kernel, rich text, visualization, and engineering (today, of course, there are far more). The experts probably never imagined that a few casual words would shape six years of my time. I spent one year as an intern and two years after graduation focused on graphics. Then, by chance, I joined Kingsoft (Jinshan) to develop new Web-based rich text/typesetting software. After that, I moved to Toutiao to work on graphic-and-text content creation, and I have now worked in this field for more than three years. Measured against a specialized field that takes ten years to master, three years seems insignificant, so I wrote this with some trepidation, afraid of expressing things wrongly and embarrassing myself. Still, in a spirit of sharing, encouraged by colleagues from WPS and Toutiao (and two or three sips of Master Kong jasmine tea for courage), I have boldly written this introduction to "rich text" in the hope of offering some experience or ideas to fellow developers.

Corrections are welcome.

What's inside an editor

From an abstract point of view, an editor can be divided into four layers: "data", "typesetting" (layout), "rendering", and "interaction".

Data

Data work falls roughly into two parts. The first is the conversion between the source file and the in-memory data structure: reading, parsing, and structuring, plus the corresponding inverse operations. The second is how the in-memory data structure changes when editing happens, i.e. the familiar create, read, update, and delete operations.

Reading and parsing

Although source file formats differ, they all come down to reading and parsing, which cover essentially the main steps of a compiler front end. Of course, a typical editor lives at the application layer and does not involve low-level hardware optimization; obtaining the syntax tree usually marks the end of parsing.

For example, HTML parsing follows essentially these same steps, although each stage has more specific implementation details.
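As a rough illustration of the read -> parse -> structure pipeline and its inverse, here is a toy sketch for a tiny HTML subset (no attributes, comments, entities, or self-closing tags). The names `tokenize`, `parse`, and `serialize` are hypothetical; a real HTML parser follows the WHATWG tokenization and tree-construction algorithm, which is far more involved.

```javascript
// Toy reader/parser for a tiny HTML subset. Not a real HTML parser.
// Step 1 (lexing): split the source into open-tag, close-tag, and text tokens.
function tokenize(source) {
  const tokens = [];
  const re = /<\/?([a-z][a-z0-9]*)>|([^<]+)/gi;
  let match;
  while ((match = re.exec(source)) !== null) {
    if (match[1]) {
      const isClose = match[0][1] === '/';
      tokens.push({ type: isClose ? 'close' : 'open', tag: match[1].toLowerCase() });
    } else {
      tokens.push({ type: 'text', value: match[2] });
    }
  }
  return tokens;
}

// Step 2 (parsing/structuring): build a tree with an explicit stack of open elements.
function parse(tokens) {
  const root = { tag: 'root', children: [] };
  const stack = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === 'open') {
      const node = { tag: token.tag, children: [] };
      parent.children.push(node);
      stack.push(node);
    } else if (token.type === 'close') {
      if (parent.tag !== token.tag) throw new Error(`Unexpected </${token.tag}>`);
      stack.pop();
    } else {
      parent.children.push({ text: token.value });
    }
  }
  if (stack.length !== 1) throw new Error('Unclosed tag');
  return root;
}

// Inverse operation: turn the in-memory tree back into source text.
function serialize(node) {
  if ('text' in node) return node.text;
  const inner = node.children.map(serialize).join('');
  return node.tag === 'root' ? inner : `<${node.tag}>${inner}</${node.tag}>`;
}
```

`serialize(parse(tokenize(s)))` round-trips well-formed input in this subset; malformed input throws instead of being repaired, which is one of the many places where real HTML parsing differs.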

In-memory data structures and how they change

In the projects I have worked on, data is usually manipulated through index-like, array-style access. From the outside, rich text looks like a one-dimensional array in which each item is a formatted node. The advantage of this approach is that it is very intuitive.

// Target text: "Bytedance"
insertText(0, 'Toutiao, '); // Toutiao, Bytedance
setBold(1);                 // Bytedance (bold)
setColor(2, 'blue');        // Bytedance (blue)
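To make the intuitive model concrete, here is a minimal sketch that assumes one array item per character, each carrying its own style object. `insertText`, `setStyle`, and `toPlainText` are hypothetical helpers (note that `insertText` here takes the document as its first argument). The sketch shows how simple the model is, and also that every character pays for its own style object, which is the cost that ranges are meant to remove.

```javascript
// Naive model: one array item per character, each with its own style object.
function insertText(doc, index, text) {
  const items = [...text].map((ch) => ({ ch, style: {} }));
  doc.splice(index, 0, ...items);
}

// Apply a style to the half-open character range [start, end).
function setStyle(doc, start, end, style) {
  for (let i = start; i < end; i++) {
    doc[i] = { ...doc[i], style: { ...doc[i].style, ...style } };
  }
}

function toPlainText(doc) {
  return doc.map((item) => item.ch).join('');
}

const doc = [];
insertText(doc, 0, 'Bytedance');
insertText(doc, 0, 'Toutiao, ');
setStyle(doc, 9, 18, { bold: true }); // "Bytedance" occupies indices 9..17
```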

Real implementations, however, are not that simple.

For performance reasons, the concept of a "range" is introduced to make the data more compact:

[
    { start: 0, end: 100, content: 'Byte...dance', style: { /* ... */ } },
    { start: 101, end: 200, content: 'Tou...tiao', style: { /* ... */ } },
]

In addition, offsets are introduced so that, when the data is modified, updating one range does not set off an avalanche of changes to all the ranges after it.

// For example, each range's start may be stored as an offset relative to the previous range

[
    { startOffset: 0, endOffset: 100, content: 'Byte...dance', style: { /* ... */ } },
    { startOffset: 0, endOffset: 100, content: 'Tou...tiao', style: { /* ... */ } },
]
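A minimal sketch of the relative-offset idea, assuming each range stores only its own `length` instead of absolute `start`/`end` positions (all field and function names here are hypothetical). Absolute positions are derived by summing lengths, so inserting text touches exactly one range.

```javascript
// Each range stores its own length; absolute positions are never stored.
function createDoc(pieces) {
  return pieces.map((p) => ({ content: p.content, length: p.content.length, style: p.style || {} }));
}

// Convert a document-wide offset into (range index, offset inside that range).
// Ties at a boundary resolve to the earlier range. This linear scan is O(n).
function locate(doc, offset) {
  let start = 0;
  for (let i = 0; i < doc.length; i++) {
    if (offset <= start + doc[i].length) return { index: i, inner: offset - start };
    start += doc[i].length;
  }
  throw new RangeError(`offset ${offset} is out of bounds`);
}

function insertAt(doc, offset, text) {
  const { index, inner } = locate(doc, offset);
  const range = doc[index];
  range.content = range.content.slice(0, inner) + text + range.content.slice(inner);
  range.length = range.content.length; // only this range changes; later ranges are untouched
}

// An absolute start is computed on demand rather than kept in sync.
function absoluteStart(doc, index) {
  let start = 0;
  for (let i = 0; i < index; i++) start += doc[i].length;
  return start;
}

const doc = createDoc([{ content: 'Bytedance', style: { bold: true } }, { content: 'Toutiao' }]);
insertAt(doc, 4, '-'); // doc[0] becomes "Byte-dance"; doc[1] is not modified
```

The price of this design is that finding a range by offset now needs a prefix sum, which is exactly what a balanced tree with cached subtree lengths speeds up.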

A common approach is to define an ordering (such as document position) and build a red-black tree on it (a balanced binary tree, so insertion, deletion, update, and lookup are all O(log n) in the ideal case), and then optimize for high-frequency operations or bottlenecks. I learned this technique at my previous job, and a similar idea appears in the VS Code team's Text Buffer Reimplementation (the piece tree).
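The core trick behind such trees can be sketched without the red-black balancing: each node caches the total text length of its left subtree (VS Code's piece tree keeps a similar `size_left` field), so finding the piece that contains an offset is a single root-to-leaf walk. This sketch assumes a static tree built balanced from an array; insertion and rebalancing are omitted, and all names are hypothetical.

```javascript
class PieceNode {
  constructor(piece) {
    this.piece = piece;                        // { content, style }
    this.left = null;
    this.right = null;
    this.leftLength = 0;                       // total content length of the left subtree
    this.subtreeLength = piece.content.length; // total content length of this subtree
  }
}

// Build a balanced tree from pieces that are already in document order.
function build(pieces, lo = 0, hi = pieces.length - 1) {
  if (lo > hi) return null;
  const mid = (lo + hi) >> 1;
  const node = new PieceNode(pieces[mid]);
  node.left = build(pieces, lo, mid - 1);
  node.right = build(pieces, mid + 1, hi);
  node.leftLength = node.left ? node.left.subtreeLength : 0;
  node.subtreeLength =
    node.leftLength + node.piece.content.length + (node.right ? node.right.subtreeLength : 0);
  return node;
}

// Find the piece containing a document offset in O(tree height).
function find(root, offset) {
  let node = root;
  while (node) {
    const len = node.piece.content.length;
    if (offset < node.leftLength) {
      node = node.left;
    } else if (offset < node.leftLength + len) {
      return { piece: node.piece, inner: offset - node.leftLength };
    } else {
      offset -= node.leftLength + len; // skip the left subtree and this piece
      node = node.right;
    }
  }
  return null; // offset is past the end of the document
}

const root = build([
  { content: 'Byte' }, { content: 'dance' }, { content: ', ' }, { content: 'Tou' }, { content: 'tiao' },
]);
```

After an edit, only the cached lengths on the path from the changed node to the root need updating, which is what keeps edits logarithmic once rebalancing is added.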

Typesetting

Typesetting is roughly equivalent to arranging the data in space.

Two factors affect typesetting: the attributes of the elements themselves and the typesetting rules applied to them. Compared with the data layer, which is closer to computer science fundamentals, typesetting is more about "business" (that is, different typesetting standards).

In a rich text editor, "rich" in the narrow sense refers to mixed, multi-format text and images; in the broad sense it covers every element a front-end device can display, such as images, text, hyperlinks, videos, files, and even various embedded inline applications.

Elements

The front end deals most often with images and text. Note, however, that because browsers are cross-platform, the CSS properties available to text and images are often already abstracted or stripped down. Fonts, for example, can be described in far more detail outside CSS (see FreeType), and images differ by format (PNG, JPEG). In a few cases, images render inconsistently between the Web and other platforms, and one of these stripped-down properties may be the cause.

Typesetting rules

Typesetting rules are extremely complex. The familiar box model and flexbox layout on the front end are different sets of typesetting rules, and the same elements obviously behave differently under them. Interestingly, fonts themselves contain all sorts of intricate rules. For example, "VA" takes up a different amount of space than "V" and "A" set separately, because of kerning pairs. Typesetting rules within text are also often language dependent. In Arabic, for instance, letters change shape depending on their neighbors: the separate letters ب س ب ب and the joined word بسبب are the same sequence of characters. When left-to-right (LTR) and right-to-left (RTL) text are mixed, the Unicode Bidirectional (Bidi) Algorithm comes into play.
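To show how kerning pairs change measured width, here is a toy measurement function. The advance widths and kerning values are made-up numbers in font units (real fonts keep this kind of data in OpenType tables such as `hmtx`, `kern`, and `GPOS`), and the lookup tables and default width are hypothetical.

```javascript
// Made-up metrics in font units, for illustration only.
const advanceWidths = { V: 667, A: 667, T: 611, o: 556 };
const kerningPairs = { VA: -74, AV: -74, To: -80 };

// Sum each glyph's advance width, then adjust for every adjacent pair found in the kerning table.
function measure(text) {
  let width = 0;
  for (let i = 0; i < text.length; i++) {
    width += advanceWidths[text[i]] ?? 500; // hypothetical default for unknown glyphs
    if (i + 1 < text.length) {
      width += kerningPairs[text[i] + text[i + 1]] ?? 0; // pair adjustment, usually negative
    }
  }
  return width;
}
```

With these numbers, `measure('VA')` is 1260 while `measure('V') + measure('A')` is 1334. Production editors usually delegate this, together with Arabic contextual shaping, to a shaping engine such as HarfBuzz rather than doing table lookups by hand.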

Unfortunately, I have not implemented any of these typesetting rules in depth, so I will only mention a few here.

Rendering

Once the "data" has been through "typesetting", the process of submitting the position information of each element to the front-end device is called "rendering".

In my experience, "rendering" in rich text is much simpler than "rendering" in graphics. Rich text rendering emphasizes schedulers and layers, as well as viewports, mostly for the sake of the interactive experience. Under performance constraints, rendering gives priority to important content inside the visible area and defers content that is outside the visible area or less important; in other words, rendering is split into chunks and stages.
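A minimal sketch of viewport-first scheduling, assuming each typeset block has a hypothetical `top`, `height`, and numeric `priority`. Visible blocks come first (higher priority first, then document order), off-screen blocks follow, and the work is cut into fixed-size batches that a real editor might spread across frames with `requestAnimationFrame` or `requestIdleCallback`.

```javascript
// Plan render order: visible blocks first, then by priority (higher first), then by position.
function planRender(blocks, viewport, batchSize = 2) {
  const visible = (b) => b.top < viewport.bottom && b.top + b.height > viewport.top;
  const ordered = [...blocks].sort((a, b) => {
    const va = visible(a) ? 0 : 1;
    const vb = visible(b) ? 0 : 1;
    if (va !== vb) return va - vb;                                 // visible blocks first
    if (a.priority !== b.priority) return b.priority - a.priority; // more important first
    return a.top - b.top;                                          // then document order
  });
  // Split into batches; each batch is one unit of work for a single frame.
  const batches = [];
  for (let i = 0; i < ordered.length; i += batchSize) {
    batches.push(ordered.slice(i, i + batchSize).map((b) => b.id));
  }
  return batches;
}
```

For blocks `a` (top 0), `b` (top 100, priority 1), `d` (top 200), and `c` (top 900, priority 5) with a viewport of 0 to 250, the plan is `[['b', 'a'], ['d', 'c']]`: `c` has the highest priority but is off-screen, so it waits.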

When optimizing a rendering module, some techniques feel familiar. For example, separating computation (the non-layout part) from commit (analogous to the render and commit phases in React) keeps the option of trading space for time and doing expensive work as early as possible, which enables strategies such as offscreen rendering and double buffering to avoid long blank screens.
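The compute/commit split behind double buffering can be sketched with plain data. This is a hypothetical `DoubleBuffer` class, not a browser API: the expensive "compute" phase writes into a back buffer that is never displayed, and "commit" swaps the buffers in one cheap step, so the display never sees a half-built frame.

```javascript
class DoubleBuffer {
  constructor() {
    this.front = []; // what the display currently shows
    this.back = [];  // frame being prepared
  }
  // Expensive work happens here, off-screen; `front` is untouched.
  compute(drawCommands) {
    this.back = drawCommands.map((cmd) => ({ ...cmd }));
  }
  // Cheap swap: the only step the display can observe.
  commit() {
    [this.front, this.back] = [this.back, this.front];
    return this.front;
  }
}

const buffer = new DoubleBuffer();
buffer.compute([{ op: 'text', value: 'Hello', x: 0, y: 0 }]);
// Before commit, buffer.front is still the previous (empty) frame.
buffer.commit();
```

In a browser, the back buffer might be an `OffscreenCanvas` or a hidden layer; the point of the sketch is only the ordering of compute and commit.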

There are also some novel concepts, such as "font fallbacks". Because of licensing and other restrictions, when certain commercial fonts are involved, the editor uses a fallback strategy that picks a similar font from a list, much like font-family: 'Microsoft YaHei', sans-serif in CSS. Emoji, as a special kind of font, also need their own fallbacks because of platform differences.

Interaction

The interaction layer, on the other hand, is familiar territory for most front-end engineers. Interaction covers two aspects: interaction with the editor itself, and ordinary front-end components (toolbars, sidebars, etc.) that interact less with the editor.

A complete editing cycle is usually triggered by an interaction (such as typing or drag and drop) that modifies the data, which triggers typesetting, which finally notifies the renderer to refresh.
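The cycle above can be sketched as a single dispatch function that wires the four layers together. `createEditor`, the action shape, and the injected `layout` and `render` callbacks are all hypothetical; the data layer is reduced to a plain string to keep the flow visible.

```javascript
// One editing cycle: interaction -> data change -> typesetting -> render refresh.
function createEditor({ layout, render }) {
  let doc = ''; // data layer (a plain string here, for brevity)
  return {
    // Interaction layer entry point, e.g. called from a keydown or drop handler.
    dispatch(action) {
      if (action.type === 'insert') {
        doc = doc.slice(0, action.at) + action.text + doc.slice(action.at); // 1. modify data
      }
      const lines = layout(doc); // 2. typesetting
      render(lines);             // 3. notify the renderer
      return lines;
    },
  };
}

// Usage: a fixed-width "typesetter" that wraps every 5 characters, and a renderer that records frames.
const frames = [];
const editor = createEditor({
  layout: (text) => text.match(/.{1,5}/g) ?? [],
  render: (lines) => frames.push(lines),
});
editor.dispatch({ type: 'insert', at: 0, text: 'Bytedance' });
```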

On the Web front end, the steps are familiar: JavaScript (calling APIs to change data) -> layout (Style, Layout) -> rendering (Paint, Composite). Because the layout engine is a core part of the browser, none of these abstractions are missing.

Another interesting concept is hit testing. You can think of it simply as a pure function that takes screen coordinates (x, y) and returns the data context at that position (such as index, node, and format). Hit testing connects the interaction layer with the data and typesetting layers, which makes all kinds of custom interactions possible. For example, to simulate a hover effect, listen for mouse movement and check whether the current mouse coordinates hit a text element; if they do, treat it as a hover. This approach breaks through common front-end limitations (for instance, it does not depend on a DOM hierarchy) and can simulate familiar interactions, such as custom cursors and selections, on other surfaces, such as Canvas on the Web. A simple front-end hit-testing implementation uses binary search over the ordered typeset data and a sequential scan over unordered elements (such as floating elements).
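A sketch of that strategy over a hypothetical typeset structure: `lines` sorted by `y`, `glyphs` in each line sorted by `x`, and an unordered `floats` array. Floats are scanned linearly and checked first on the assumption that they sit on top of the text flow; lines and glyphs are found with binary search.

```javascript
// Binary search: index of the last item whose key(item) <= value, or -1 if none.
function lastAtOrBefore(items, value, key) {
  let lo = 0, hi = items.length - 1, found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (key(items[mid]) <= value) { found = mid; lo = mid + 1; } else { hi = mid - 1; }
  }
  return found;
}

function hitTest(layout, x, y) {
  // Unordered floating elements: sequential scan.
  for (const f of layout.floats) {
    if (x >= f.x && x < f.x + f.width && y >= f.y && y < f.y + f.height) {
      return { kind: 'float', node: f.node };
    }
  }
  // Ordered text flow: binary search by line, then by glyph.
  const lineIndex = lastAtOrBefore(layout.lines, y, (line) => line.y);
  if (lineIndex === -1) return null;
  const line = layout.lines[lineIndex];
  if (y >= line.y + line.height) return null; // below the last line
  const glyphIndex = lastAtOrBefore(line.glyphs, x, (g) => g.x);
  if (glyphIndex === -1) return null;
  const glyph = line.glyphs[glyphIndex];
  if (x >= glyph.x + glyph.width) return null; // past the end of the line
  return { kind: 'text', index: glyph.index, format: glyph.format };
}

const layout = {
  lines: [
    { y: 0, height: 20, glyphs: [
      { x: 0, width: 10, index: 0, format: {} },
      { x: 10, width: 10, index: 1, format: { bold: true } },
    ] },
    { y: 20, height: 20, glyphs: [{ x: 0, width: 10, index: 2, format: {} }] },
  ],
  floats: [{ x: 100, y: 0, width: 50, height: 50, node: 'image-1' }],
};
```

A simulated hover then reduces to calling `hitTest` from a `mousemove` handler and comparing the result with the previous one.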

When editors meet the Web

"Data" + "typesetting" + "rendering" + "interaction" together make up a platform-independent rich text editor implementation.

But the complexity is often daunting. Fortunately, the development of the Web has made rich text editors easier to build. Browsers, which handle HTML + CSS + JS, largely shield developers from the difficulty of implementing data, typesetting, and rendering, allowing them to focus on the details of "interaction".

In the world of rich text on the Web, there is a commonly used classification into levels:

  • Lv0: textarea + document.execCommand
  • Lv1: contenteditable + observable + parser
  • Lv2: self-typesetting (the editor does its own layout) and self-drawing (the editor does its own rendering)

Lv0 editors offer very limited presentation and compatibility and are used mostly as form components.

Lv1 editors (the approach taken by most open source rich text editors) listen for DOM changes and, by constraining them to a document model, can unify editing behavior to a large degree. However, for some non-standardized behaviors (such as the cursor and selection), there are many subtle differences and bugs across browsers.
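One simple way to "constrain" whatever the browser produced inside a contenteditable is to normalize the parsed tree against a schema. The sketch below is hypothetical (it is not ProseMirror's schema API): it uses the same `{ tag, children }` / `{ text }` shape as a parsed DOM, keeps allowed children, unwraps disallowed elements such as a pasted `<span>` while keeping their content, and drops text where text is not allowed.

```javascript
// Hypothetical schema: which child kinds each tag may contain.
const schema = {
  root: ['p', 'h1'],
  p: ['b', 'i', 'text'],
  h1: ['text'],
  b: ['text'],
  i: ['text'],
};

function normalize(node) {
  const allowed = schema[node.tag] ?? [];
  const children = [];
  for (const child of node.children ?? []) {
    const kind = 'text' in child ? 'text' : child.tag;
    if (allowed.includes(kind)) {
      children.push(kind === 'text' ? child : normalize(child));
    } else if (kind !== 'text') {
      // Disallowed element: drop the wrapper, keep its content as if it belonged to the parent.
      children.push(...normalize({ ...child, tag: node.tag }).children);
    }
    // Disallowed bare text (e.g. directly under root) is dropped in this sketch.
  }
  return { tag: node.tag, children };
}

const dirty = {
  tag: 'p',
  children: [
    { text: 'Hi ' },
    { tag: 'span', children: [{ text: 'there' }, { tag: 'b', children: [{ text: '!' }] }] },
  ],
};
const clean = normalize(dirty); // the <span> wrapper is gone; its text and <b> survive
```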

The pain points of Lv0 and Lv1 can be effectively resolved in an Lv2 editor. However, Lv2 editors have obvious drawbacks of their own. First, even a small feature may need to be implemented across the data, typesetting, rendering, and interaction layers, which makes development expensive. Second, the underlying model is hard to repurpose: for example, it is difficult to build a Notion-like product on typesetting software based on OOXML (the DOCX source format).

These levels only indicate implementation difficulty (or the degree of control over details), not product quality, since most products are driven by requirements rather than by technology. Having had the privilege of working on both Lv1 and Lv2 editors, I hold a perhaps immature view: "For a Web rich text editor in a non-specialized business scenario, where user experience and development schedule are the goals, Lv1 > Lv2."

The reason is simple. Presentation on the Web is flexible enough: its layout covers most common scenarios. It is also fast enough: modern browsers have been squeezed hard for performance (take a peek at a post or two on the V8 blog), whereas a self-typesetting Web editor running on single-threaded JS, or on limited multithreading (Web Workers), will more or less hit performance bottlenecks and is less smooth than the browser's native contenteditable. Most importantly, an editor built on top of HTML is easier to learn and cheaper to develop and maintain than a self-typesetting one. If a team lacks deep editor-development expertise, an Lv1 Web rich text editor is a good choice.

ProseMirror

Here I'd like to recommend the ProseMirror framework. Its author, Marijn Haverbeke, who also wrote CodeMirror, Acorn, and Lezer, has been immersed in Web editors for more than a decade and is currently active in the open source community as an independent software developer.

ProseMirror is an excellent framework: it implements the basic features of an editor, with clear schema-based parsing support, a complete transaction system, and even support for collaborative editing.
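To illustrate why a step-based transaction system is useful, here is a toy sketch that is explicitly NOT ProseMirror's API: each step is a plain replace object that can be applied to a document string and inverted. Recording inverses as steps are applied is what makes undo straightforward, and having edits as data is also what collaborative editing builds on. `Transaction`, `applyStep`, and `invertStep` are hypothetical names.

```javascript
// A step replaces doc[from, to) with `insert`.
function applyStep(doc, step) {
  return doc.slice(0, step.from) + step.insert + doc.slice(step.to);
}

// The inverse puts back whatever the step removed from `doc` (the document before the step).
function invertStep(doc, step) {
  return { from: step.from, to: step.from + step.insert.length, insert: doc.slice(step.from, step.to) };
}

class Transaction {
  constructor(doc) {
    this.doc = doc;
    this.steps = [];
    this.inverted = []; // kept in reverse order so undo replays them last-step-first
  }
  replace(from, to, insert) {
    const step = { from, to, insert };
    this.inverted.unshift(invertStep(this.doc, step));
    this.doc = applyStep(this.doc, step);
    this.steps.push(step);
    return this; // allow chaining
  }
  undo() {
    return this.inverted.reduce(applyStep, this.doc);
  }
}

const tr = new Transaction('Hello world').replace(0, 5, 'Hi').replace(2, 2, ',');
// tr.doc is now "Hi, world"; tr.undo() rebuilds "Hello world"
```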

Unlike other rich text components, however, ProseMirror is positioned more as the skeleton of an editor than as a full-fledged editor. This means developers can flexibly build many kinds of editors on top of it: a rich text workbench like UEditor, a Notion-style block editor, or a graphic typesetting tool like Xiumi. If you have strong customization needs, ProseMirror is well worth a try.

However, ProseMirror also has clear drawbacks in practice. On the one hand, it was written by a developer abroad and ships with only a few English examples and fairly obscure APIs, and because editors are a niche field, Chinese documentation is scarce. On the other hand, ProseMirror is designed as an extensible skeleton rather than a plug-and-play component: as the project itself puts it, the core library prioritizes modularity and customizability over simplicity and is not an easy drop-in component. As a result, ProseMirror has a steeper learning and usage curve than other frameworks, which discourages some developers.

Ad time

So is there a framework that keeps ProseMirror's extensibility while also providing plug-and-play plugins? Take a look at Syllepsis. (Tongue firmly in cheek.)

Syllepsis was born at Toutiao as a rich text editor built for professional content creators. It was later extended to other departments, and its internal version is now used by more than 30 departments, including Toutiao, Xigua Video, Xingfuli, and Dongchedi. You can think of Syllepsis simply as a React component built on top of ProseMirror (there was once a Vue version, but for organizational reasons only the extensible interfaces remain), with two goals:

  1. Provide common editing plugins with a more concise interface, so that it works out of the box with simple configuration.
  2. Keep ProseMirror's extensibility, retaining the ability to customize when existing plugins do not meet requirements.

Of course, the open source project is still in its infancy, and you can learn more about Syllepsis's capabilities in its documentation. I believe an active ecosystem is the key to the long-term development of an open source project, so everyone is welcome to share opinions in Issues, report bugs, and suggest features or requirements for rich text editors.

Conclusion

An editor is a genuinely complex module, and it is difficult for one person to implement everything. I have simply built on the work of those who came before and, drawing on my own work, shared my views. Given limited space and ability, many topics are only touched on briefly; corrections and additions from peers are welcome. Finally, thanks to Liejin from WPS and Haibao from Alibaba for the writing inspiration and revision suggestions.
