If you find a feature too complex to explain simply, you are probably heading in the wrong direction!


L1 capability means implementing rich-text styling operations yourself instead of relying on the browser’s built-in execCommand. I have been researching this in my spare time since 4.3, about 2-3 weeks now, and want to share my notes so far.

  • Writing a demo (partially done)
  • Studying classic open-source editors: Quill, Slate, ProseMirror

PS: This is only an intermediate stage. I will keep researching and writing demos, so this write-up is somewhat scattered and covers a lot of ground.

Writing a demo

[Important reminder] This part is only meant to aid understanding. Some of the demo’s design is flawed and I will correct it later, so do not be misled by it!

Based on two premises

  • Separate the model from the view, and render the page with a vDOM
  • Abstract the selection and range

Abstract model

[Note] "Model" means the data model, sometimes also called state, content, doc, and so on. Here we call it the model throughout.

I define IBlockNode and IInlineNode. The top level must be IBlockNode, the bottom level must be IInlineNode, and the model is an array of block nodes (see Slate).

The abstraction mirrors the DOM structure exactly, which makes it easy to understand. Text nodes must be wrapped in a span; otherwise the model cannot be modified.

window.content1 = [
    {
        selector: 'p',
        children: [
            { selector: 'span', children: ['Welcome'] },
            {
                selector: 'a',
                props: { href: 'https://www.wangeditor.com/', target: '_blank' },
                children: [{ selector: 'span', children: ['wangEditor'] }]
            },
            { selector: 'span', children: ['Rich text Editor'] }
        ]
    },
    {
        selector: 'p',
        children: [
            { selector: 'span', children: ['Welcome'] },
            { selector: 'b', children: [{ selector: 'span', children: ['wangEditor'] }] },
            { selector: 'span', children: ['Rich text Editor'] }
        ]
    },
    {
        selector: 'p',
        children: [
            { selector: 'img', props: { src: 'http://www.wangeditor.com/imgs/logo.jpeg', alt: 'logo' } }
        ]
    }
]

The vDOM, diff, and patch use github.com/snabbdom/sn…

Abstract selection range

This follows Quill’s design.

interface IVRange {
    offset: number
    length: number
}

Calculation rules:

  • Start at 0
  • For text, add textContent.length
  • For non-text content such as an image, add 1
  • For block nodes (p, li, td, etc.), also add 1. This matters: otherwise the end of line N cannot be distinguished from the start of line N+1
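The counting rules above can be sketched as a small recursive function. This is only an illustration: the `SketchNode` shape, the `BLOCK_SELECTORS` and `VOID_SELECTORS` sets, and the function names are assumptions made for this sketch, not the demo’s actual code.

```typescript
// Hypothetical, simplified model node used only for this sketch.
type SketchNode = {
  selector: string
  children?: Array<SketchNode | string>
}

// Assumed sets: which selectors count as blocks and which as non-text leaves.
const BLOCK_SELECTORS = new Set(['p', 'li', 'td'])
const VOID_SELECTORS = new Set(['img'])

// Length that one node contributes to the vRange coordinate space.
function nodeLength(node: SketchNode | string): number {
  if (typeof node === 'string') return node.length   // text: + textContent.length
  if (VOID_SELECTORS.has(node.selector)) return 1    // non-text such as an image: +1
  let total = 0
  for (const child of node.children ?? []) total += nodeLength(child)
  if (BLOCK_SELECTORS.has(node.selector)) total += 1 // block boundary: +1
  return total
}

// Total length of a model (an array of block nodes), starting from 0.
function modelLength(model: SketchNode[]): number {
  return model.reduce((sum, node) => sum + nodeLength(node), 0)
}

const sketchModel: SketchNode[] = [
  { selector: 'p', children: [{ selector: 'span', children: ['abc'] }] },
  { selector: 'p', children: [{ selector: 'img' }] },
]

console.log(modelLength(sketchModel))
```

Here the first paragraph contributes 3 (text) + 1 (block) and the second contributes 1 (image) + 1 (block), so the end of line 1 (offset 3) and the start of line 2 (offset 4) are different positions.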

Two functions are important here (both involve a depth-first traversal of the DOM tree):

  • When the selection changes, compute the vRange from the native range
  • Set the editor’s actual selection from a vRange, e.g. selection.updateSelection({ offset: 5, length: 0 })


The basic idea

  • Call a command to modify the model
  • The model change updates the view
  • Reset the selection range

Abstract base commands, extensible with custom commands

  • insertInlineNodes, e.g. bold or italic
  • replaceText, for entering and deleting text
  • replaceBlockNode, e.g. heading, list
  • removeNode
  • deleteText
  • insertBlockNodes
  • replaceInlineNodes, e.g. removing a link
  • deleteBlockNodes
  • … (The base commands should be enumerable; otherwise things get confusing.)

Getting the selected nodes

For example, to turn a p into an h1, or to bold a piece of text, you need to know which nodes are selected (the nodes in the model). So, from the vRange and the model, you need to find the selected nodes and their parent nodes. In the demo code this is selectedNodeAndParentsArr, a 2D array:

private selectedNodeAndParentsArr: Array<Array<IBlockNode | IInlineNode>> = []

For example, if you click inside a p, the result is [[p, span]] (text must be wrapped in a span). If you drag across text inside a p that consists of plain text, bold text, and plain text, the result is [[p, span], [p, b, span], [p, span]]. Once you know the selected nodes, you can modify them, that is, modify the model.
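A minimal sketch of that lookup, assuming a simplified node shape and the offset rules described earlier (text counts its length, an empty leaf such as an image counts 1, a block adds 1). `TreeNode`, `VRange`, `BLOCKS`, and `findSelectedPaths` are hypothetical names, and the boundary handling (a collapsed range picks the first leaf that contains it) is a choice made for this sketch only.

```typescript
// Hypothetical shapes for this sketch; text is assumed to always sit inside a leaf like span.
type TreeNode = { selector: string; children?: Array<TreeNode | string> }
type VRange = { offset: number; length: number }

const BLOCKS = new Set(['p', 'li', 'td'])

// Walk the model with a running offset and collect, for every leaf that the
// range touches, the chain [ancestors..., leaf].
function findSelectedPaths(model: TreeNode[], range: VRange): TreeNode[][] {
  const result: TreeNode[][] = []
  const selStart = range.offset
  const selEnd = range.offset + range.length
  let cursor = 0

  const visit = (node: TreeNode, parents: TreeNode[]): void => {
    const chain = [...parents, node]
    const children = node.children ?? []
    // A leaf holds only text (span) or nothing at all (img).
    const isLeaf = children.every((c) => typeof c === 'string')
    if (isLeaf) {
      const len = children.length === 0
        ? 1
        : children.reduce((n, c) => n + (c as string).length, 0)
      const start = cursor
      const end = cursor + len
      const hit = range.length === 0
        ? result.length === 0 && selStart >= start && selStart <= end // caret: first match only
        : start < selEnd && end > selStart                            // dragged range: overlap
      if (hit) result.push(chain)
      cursor = end
    } else {
      for (const child of children) {
        if (typeof child !== 'string') visit(child, chain)
      }
    }
    if (BLOCKS.has(node.selector)) cursor += 1 // block boundary
  }

  model.forEach((node) => visit(node, []))
  return result
}

const paragraph: TreeNode = {
  selector: 'p',
  children: [
    { selector: 'span', children: ['Welcome '] },
    { selector: 'b', children: [{ selector: 'span', children: ['wangEditor'] }] },
    { selector: 'span', children: [' editor'] },
  ],
}

const toSelectors = (paths: TreeNode[][]) => paths.map((p) => p.map((n) => n.selector))
console.log(toSelectors(findSelectedPaths([paragraph], { offset: 3, length: 0 })))
console.log(toSelectors(findSelectedPaths([paragraph], { offset: 5, length: 15 })))
```

The first call models a click inside the first span; the second models a drag that starts in the first span and ends in the last one.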

Modify the model and update the view

  • Run a command, such as command.do('bold', true) or command.do('color', 'red')
  • Each command’s code transforms nodes; for bold, <span>x</span> becomes <b><span>x</span></b>
  • Run the base commands, modify the model, update the view, and update the selection and range. Note that text-formatting operations like bold can get very complicated because of selection issues!

About contenteditable

Trying to draw the cursor myself, then giving up

At first I wanted to draw the cursor myself, without contenteditable:

  • range.getClientRects() gives the position of the selection
  • Draw an absolutely positioned div at that position, containing a blinking div and an input
  • Listen for keydown on the input, then handle text entry, deletion, Enter, and moving the cursor up and down

I gave up because I could not keep the input focused while dragging a selection, so I could not listen for keydown. Later research showed that editors that draw their own cursor all need an iframe (as a third-party lib, I do not want to use an iframe), and some even draw their own selection (Youdao Cloud Notes??). In a video of a talk by ProseMirror’s author, he also said this approach causes a lot of bugs.

Deciding to use contenteditable

My research shows that classic editors such as Quill, Slate, ProseMirror, and TinyMCE all use contenteditable, so it should be the right direction.

But there are some dissenting voices:

  • The classic one is why-contenteditable-is-terrible
  • A post from Youdao Cloud Notes says that with contenteditable you must hijack keydown to modify the model, which can let the view be modified behind your back. The view should be generated from the model and must not change on its own.

This is why I did not want to use contenteditable at first. But it seems to be the only viable trade-off, so I decided to use contenteditable.

About the mutation observer

Back when I wanted to do without contenteditable, I had to handle text input myself. The plan was:

  • Listen for keydown on the input (with debouncing to be considered)
  • Modify the model, update the view, and update the selection

There are two types of content changes

With contenteditable, however, the editing area is open to arbitrary changes, and it is easy to miss cases when hijacking keydown. Then I came across a diagram showing a MutationObserver being used to listen for changes to the editor.

So I divided the editor’s editing operations into two categories:

  • External API calls, such as JS commands for bold or heading
  • Changes made inside the editing area: typing, deleting, line breaks, and so on, all possible because it is contenteditable

For the second category, I could use a MutationObserver to listen for changes without having to hijack the various keydown events.

The mutation observer cannot capture every change

But I soon found that the mutations reported when I press Enter inside a contenteditable area are nearly impossible to make sense of!

I spent two days of spare time trying to decipher the MutationObserver records for an Enter line break, and failed. I gave up, not because of the difficulty, but because even a working solution would have been a very complicated design. Good design should be simple, reasonable, easy to read, and easy to understand.

You can try it yourself.

The mutation observer can only listen for text changes

Quill uses the MutationObserver only to listen for text changes. Everything else (Enter, delete, etc.) still hijacks keydown and modifies the model.

mutations.length === 1 && mutations[0].type === 'characterData'
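To make the condition above concrete, here is a small sketch that wraps it in a predicate. `MutationLike` is a hypothetical structural subset of the DOM’s MutationRecord so the snippet runs outside a browser, and `isPlainTextChange` is an illustrative name, not something taken from Quill’s source.

```typescript
// Structural subset of the DOM MutationRecord; only the field the check needs.
type MutationLike = { type: 'attributes' | 'characterData' | 'childList' }

// True only when a batch of mutations is a single text edit,
// which is the one case this approach lets the observer handle.
function isPlainTextChange(mutations: MutationLike[]): boolean {
  return mutations.length === 1 && mutations[0].type === 'characterData'
}

console.log(isPlainTextChange([{ type: 'characterData' }]))
console.log(isPlainTextChange([{ type: 'childList' }, { type: 'characterData' }]))
```

In a browser, a predicate like this would sit inside a MutationObserver callback registered with characterData, childList, and subtree enabled. Batches that fail the check (typically Enter or delete, which produce childList records) would be left to the keydown handlers.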

The demo stalls at the Enter line-break problem

I thought long and hard about the Enter line-break problem but found no solution, so the demo stops here.

  <a href="xxx">

For example, given the HTML structure above, how should the model handle a line break at an arbitrary position? What if I drag to select a range and then press Enter? None of this is easy to explain with the current design.

So I concluded that the problem must lie in my own design.

Researching classic open-source editors

When studying other products, look at several at once so they complement each other; you cannot learn everything from a single one. I spent a few days of spare time on Quill, Slate, and ProseMirror and got a general picture of their main flows, though the details still need exploring.

All three editors share some design traits, which made me realize several mistakes in my earlier design. These are all very important!

  • Do not mutate the model; regenerate it (immutable data, e.g. with immer). Otherwise side effects get out of hand and grow bigger and messier.
  • Do not modify the model directly. The flow is command -> operation -> model -> vDOM & patch view
  • The model does not mirror the DOM structure; it is flattened or even linear, which makes range operations easier

What matters most when studying other editors?

Energy is limited, so be sure to grasp the core and focus on the main issues:

  • The important concepts and data structures, such as Quill’s Delta
  • The complete flow from a command to the final rendered view, including each intermediate stage

Immutable data

Immutable data makes it easy to split modules, reduces complexity, and lets you write pure functions with no side effects that are easy to test. It is important, but performance must be considered, so use a suitable tool such as immer (Immutable.js is not recommended because its API is costly to learn).

The value of operations

Operations are atomic; see for example juejin.cn/post/691712… Operations should be low-level, enumerable, and not extensible.

A command may contain multiple operations. For example, drag to select some text and press Enter: this command contains several operations, a remove followed by a split node.

Collaborative editing relies on atomic operations: operations are sent to peers and then merged, as in the common OT algorithm. I have not started looking into this yet.

To undo, you invert the operation and apply the result, as in inverse here. [Note] With collaborative editing, undo cannot simply overwrite the editor’s content; it must “undo only your own changes, not others’”. So you need to find your own operations and invert them.
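A minimal sketch of atomic operations with inverses, under strong simplifying assumptions: the content is a flat string instead of a node tree, and there are only two operation types. The names `TextOp`, `applyOp`, `inverseOp`, and `undoAll` are hypothetical; the operation type names are borrowed from Slate’s insert_text and remove_text, but the shapes are simplified (no path).

```typescript
// Two enumerable, low-level operations over a flat string (an assumption of this sketch).
type TextOp =
  | { type: 'insert_text'; offset: number; text: string }
  | { type: 'remove_text'; offset: number; text: string }

// Pure function: returns new content, never mutates the old one.
function applyOp(content: string, op: TextOp): string {
  if (op.type === 'insert_text') {
    return content.slice(0, op.offset) + op.text + content.slice(op.offset)
  }
  return content.slice(0, op.offset) + content.slice(op.offset + op.text.length)
}

// Each operation carries enough data to build its inverse.
function inverseOp(op: TextOp): TextOp {
  return op.type === 'insert_text'
    ? { type: 'remove_text', offset: op.offset, text: op.text }
    : { type: 'insert_text', offset: op.offset, text: op.text }
}

// Undo a command: invert its operations and apply them in reverse order.
function undoAll(content: string, ops: TextOp[]): string {
  return [...ops].reverse().reduce((c, op) => applyOp(c, inverseOp(op)), content)
}

// One command, two operations: remove the selected text, then break the line.
const enterOverSelection: TextOp[] = [
  { type: 'remove_text', offset: 5, text: ' big' },
  { type: 'insert_text', offset: 5, text: '\n' },
]

const before = 'hello big world'
const after = enterOverSelection.reduce(applyOp, before)
console.log(JSON.stringify(after))
console.log(undoAll(after, enterOverSelection) === before)
```

Because remove_text stores the removed text, the operation list alone is enough to undo it. In a collaborative setting, a per-user list like this is what would be inverted.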

Flattening the model

The above is from ProseMirror’s documentation, and Slate works the same way. That is, all text formatting, whether b, i, u, link, color, bgColor, fontSize, etc., is laid out flat. I did not anticipate this when I started the demo.

Quill goes further and uses a linear model directly, representing the tree structure with a linear one. More on that below.

[
    {
        type: 'p',
        attrs: {},
        children: [
            { type: 'text', text: 'aaa' },
            { type: 'text', text: 'bbb', marks: { bold: true } },
            { type: 'text', text: 'ccc', marks: { bold: true, em: true }, style: { color: 'red' } },
            { type: 'text', text: 'ddd', marks: { bold: true, em: true, link: 'xxxx' } }
        ]
    },
    {
        // ...
    }
]

Once the model is flattened (no <i>-style nodes, no nested levels), there is a qualitative change. The model is still a node tree, but it is at most about 3 levels deep (table, tr, td), so it is fast to traverse.

  • Easy to compute a range { offset, length } (with nesting, the calculation is cumbersome and error-prone)
  • Easy to modify text styles: bold or color can be applied to any arbitrarily selected text
  • Easy to split nodes: the Enter line break (highlighted above) is just a splitNodes, which is very simple
  • Easy to clean up and merge (very complex if there is nesting)
    • Remove text nodes whose content is empty
    • Merge two adjacent text nodes with the same attributes
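With flat text nodes, both splitting and cleanup reduce to simple array passes. The sketch below assumes a hypothetical `FlatText` node with optional marks; `splitAt`, `normalize`, and `sameMarks` are illustrative names, not functions from Slate or ProseMirror.

```typescript
type FlatMarks = { bold?: boolean; em?: boolean; link?: string }
type FlatText = { type: 'text'; text: string; marks?: FlatMarks }

// Two nodes can merge only if every mark has the same value on both.
function sameMarks(a: FlatMarks = {}, b: FlatMarks = {}): boolean {
  const keys = Object.keys({ ...a, ...b }) as Array<keyof FlatMarks>
  return keys.every((k) => a[k] === b[k])
}

// Split one block's children at a text offset (what pressing Enter needs).
function splitAt(nodes: FlatText[], offset: number): [FlatText[], FlatText[]] {
  const before: FlatText[] = []
  const after: FlatText[] = []
  let pos = 0
  for (const node of nodes) {
    const end = pos + node.text.length
    if (end <= offset) before.push(node)
    else if (pos >= offset) after.push(node)
    else {
      // The offset falls inside this node: cut it in two, keeping its marks.
      before.push({ ...node, text: node.text.slice(0, offset - pos) })
      after.push({ ...node, text: node.text.slice(offset - pos) })
    }
    pos = end
  }
  return [before, after]
}

// Cleanup: drop empty text nodes and merge adjacent nodes with equal marks.
function normalize(nodes: FlatText[]): FlatText[] {
  const out: FlatText[] = []
  for (const node of nodes) {
    if (node.text === '') continue
    const prev = out[out.length - 1]
    if (prev && sameMarks(prev.marks, node.marks)) {
      out[out.length - 1] = { ...prev, text: prev.text + node.text }
    } else {
      out.push(node)
    }
  }
  return out
}

const line: FlatText[] = [
  { type: 'text', text: 'aaa' },
  { type: 'text', text: 'bbb', marks: { bold: true } },
]
const [head, tail] = splitAt(line, 4)
console.log(head.map((n) => n.text), tail.map((n) => n.text))
console.log(normalize([...head, ...tail]).map((n) => n.text))
```

Splitting at offset 4 cuts the bold node, and normalizing the two halves back together restores the original two nodes, which is the kind of round trip that would be much harder with nested b, i, and a nodes.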

model -> view

Since the model does not correspond exactly to the DOM structure, it cannot be rendered directly as vDOM. For example, does { bold: true } render as <b> or <strong>? When both bold and em are present, is it <b><em> or <em><b>? Who wraps whom?

So a parser sits between the model and the vDOM. Quill uses formats. Slate has developers write React components. I have not figured out ProseMirror yet.
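One way to answer "who wraps whom" is for the renderer to fix a wrapping order once, independent of the order in which marks were applied. The sketch below renders a flat text leaf to an HTML string; the order (link outside bold outside em), the choice of b over strong, and the names `RenderLeaf`, `renderLeaf`, and `escapeHtml` are all assumptions made for illustration, not how any of the three editors does it.

```typescript
type RenderMarks = { bold?: boolean; em?: boolean; link?: string }
type RenderLeaf = { text: string; marks?: RenderMarks }

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Wrap from the inside out in a fixed order: em first, then bold, then link.
function renderLeaf(leaf: RenderLeaf): string {
  const marks = leaf.marks ?? {}
  let html = escapeHtml(leaf.text)
  if (marks.em) html = `<em>${html}</em>`
  if (marks.bold) html = `<b>${html}</b>`
  if (marks.link !== undefined) html = `<a href="${escapeHtml(marks.link)}">${html}</a>`
  return html
}

console.log(renderLeaf({ text: 'ddd', marks: { bold: true, em: true, link: 'xxxx' } }))
console.log(renderLeaf({ text: 'aaa' }))
```

Because the model stores only the set of marks, the same function could be swapped for one that emits Markdown or vnodes without touching the model.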

Also, does the model have to be rendered as HTML? Could it be rendered as Markdown?


Quill

Content data structure

An OT-based model: { retain, insert, delete } (each can carry attributes)

The abstraction of selection

Quill’s content is text-based; an embed (image, video, etc.) can be treated as a special kind of text.

  • A character of text takes one unit of length
  • An embed takes one unit of length
  • A block takes one unit of length; otherwise the end of line N cannot be told apart from the start of line N+1

So Quill abstracts the selection into a simple { index, length }

Linear structure of content [important]

"Content" here is the model mentioned above. The model above is a node tree with flattened text. Quill’s content is a linear structure (an array), and it can render the DOM from that linear structure. Great design! It is also a natural OT model, so it supports collaborative editing out of the box.

A linear structure is easier to operate on by range: modifying text attributes, inserting and deleting text, Enter line breaks, and so on. Compare this with the node structure above.

How does content ultimately represent the DOM structure? How does it represent lists and similar block structures? Refer to the demo.
// Add the following code to the demo to view the content at any time
document.body.addEventListener('click', () => {
  console.log('contents', editor.getContents())
})

As you can see, \n plays an important role: Quill uses \n to mark the end of a block. Tables are expressed the same way; they are a little more complicated, but the principle is the same.
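To illustrate how a linear list of inserts can still describe blocks, the sketch below groups insert ops into lines, treating each \n as the end of a block and reading the block’s attributes from the op that carries that \n. The op shape follows Quill’s insert-with-attributes form, but it is limited to string inserts (no embeds), and `InsertOp`, `Line`, and `toLines` are hypothetical names for this sketch.

```typescript
// Simplified op: string inserts only (embeds are left out of this sketch).
type InsertOp = { insert: string; attributes?: Record<string, unknown> }
type Line = { text: string; blockAttributes: Record<string, unknown> }

// Group a linear op list into lines; every '\n' closes the current block.
function toLines(ops: InsertOp[]): Line[] {
  const lines: Line[] = []
  let text = ''
  for (const op of ops) {
    const parts = op.insert.split('\n')
    parts.forEach((part, i) => {
      text += part
      if (i < parts.length - 1) {
        // The attributes on the newline describe the block it ends.
        lines.push({ text, blockAttributes: op.attributes ?? {} })
        text = ''
      }
    })
  }
  return lines
}

const contents: InsertOp[] = [
  { insert: 'Title' },
  { insert: '\n', attributes: { header: 1 } },
  { insert: 'Hello ' },
  { insert: 'world', attributes: { bold: true } },
  { insert: '\n' },
  { insert: 'item' },
  { insert: '\n', attributes: { list: 'bullet' } },
]

for (const line of toLines(contents)) {
  console.log(JSON.stringify(line.text), JSON.stringify(line.blockAttributes))
}
```

Inline attributes such as bold stay on the text ops, while block-level attributes such as header or list ride on the newline, so no nesting is needed to express a heading, a paragraph, and a list item in one flat array.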

A delta is an operation

Demo: codepen.io/quill/pen/d… A change to the content generates a delta, which then produces the new content.

Delta is also based on the OT model { retain, insert, delete } (with optional attributes), just like content. But do not confuse the two.

  • Content represents the editor’s current content
  • A delta represents a change to the content, i.e. an operation. A single change may involve multiple deltas.
  • [Note] Content is not simply the deltas concatenated; it is computed by transformation. For example, a delta may contain delete, while content contains only insert
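A simplified sketch of how a change delta is applied to content. To keep it short, content is modelled as a plain string and attributes are ignored, so this is not Quill’s real Delta implementation; `ChangeOp` and `applyDelta` are hypothetical names.

```typescript
// A change is a list of retain / insert / delete steps over the old content.
type ChangeOp = { retain: number } | { insert: string } | { delete: number }

function applyDelta(content: string, delta: ChangeOp[]): string {
  let result = ''
  let index = 0 // read position in the old content
  for (const op of delta) {
    if ('retain' in op) {
      result += content.slice(index, index + op.retain) // keep this stretch
      index += op.retain
    } else if ('insert' in op) {
      result += op.insert                               // new text, old position unchanged
    } else {
      index += op.delete                                // skip over deleted text
    }
  }
  return result + content.slice(index)                  // anything not mentioned is kept
}

const oldContent = 'Hello world\n'
const change: ChangeOp[] = [{ retain: 6 }, { delete: 5 }, { insert: 'Quill' }]
console.log(JSON.stringify(applyDelta(oldContent, change)))
```

The delta contains retain and delete steps, yet the resulting content is again something that could be written with inserts alone, which is why content has to be recomputed from a delta rather than concatenated with it.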

In other words, Quill supports collaborative editing: the OT data model is the foundation, and delta is the concrete implementation.

Parchment is the vDOM

Content is an OT model and cannot be rendered directly to the DOM, so two more steps are needed:

  • Formats: how bold, link, image, etc. are rendered
  • Parchment blots (the vDOM and vnodes)

Quill’s main flow is roughly: command / format / text change -> generate delta[] -> recompute content -> regenerate Parchment according to formats -> render the DOM.


Slate

Data model

Quill is a text-based linear structure, while Slate is a node-based tree structure. Its text nodes, however, are also flattened.

For a sample, see juejin.cn/post/691712…

Selection and range

Because Slate is a node tree rather than a linear structure, Quill’s { index, length } abstraction of a range does not suit it. Slate uses:

  • Path locates a specific node
  • Point pins down an exact position; it consists of a path and an offset
  • Range is represented by two points, anchor and focus
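A small sketch of these three concepts over a toy tree. The Path, Point, and range shapes follow the description above (a path is a list of child indexes, a point is a path plus an offset, a range is an anchor and a focus); the element and text shapes and the `getNode` helper are simplified assumptions for this sketch, not Slate’s actual API.

```typescript
type Path = number[]                               // child indexes from the root down
type Point = { path: Path; offset: number }        // a node plus an offset inside its text
type TreeRange = { anchor: Point; focus: Point }   // where the selection starts and ends

type ToyText = { text: string }
type ToyElement = { type: string; children: Array<ToyElement | ToyText> }

// Follow a path from the root: one array lookup per level, no offset counting.
function getNode(root: ToyElement, path: Path): ToyElement | ToyText {
  let node: ToyElement | ToyText = root
  for (const index of path) {
    if (!('children' in node)) throw new Error('path goes below a text node')
    const next: ToyElement | ToyText | undefined = node.children[index]
    if (next === undefined) throw new Error('path points outside the tree')
    node = next
  }
  return node
}

const toyDoc: ToyElement = {
  type: 'root',
  children: [
    { type: 'p', children: [{ text: 'aaa' }, { text: 'bbb' }] },
    { type: 'p', children: [{ text: 'ccc' }] },
  ],
}

// A selection from inside 'aaa' to inside 'bbb'.
const selection: TreeRange = {
  anchor: { path: [0, 0], offset: 1 },
  focus: { path: [0, 1], offset: 2 },
}

console.log(getNode(toyDoc, selection.anchor.path))
console.log(getNode(toyDoc, selection.focus.path))
```

Resolving a point costs one lookup per tree level, whereas converting a linear { index, length } into a node would require walking and counting text from the start of the document.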

For a sample, see juejin.cn/post/691712…

Slate could compute { index, length } if it had to. But Path works better with the node tree structure, since path computations are faster.

Nine operations

Quill’s delta is an OT data model with only three operations, { retain, insert, delete }. Slate has nine, designed to fit the node tree structure.

  • Six node-related
    • insert_node
    • merge_node
    • move_node
    • remove_node
    • set_node
    • split_node
  • Two text-related
    • insert_text
    • remove_text
  • One selection-related
    • set_selection

Source code reference: github.com/ianstormtay…

Moreover, every operation has an inverse operation, which makes undo easy (as described above).

renderElement and renderLeaf are equivalent to Quill’s formats

Slate is only an editor controller; it does not care about the view, which developers write themselves.

For example, see juejin.cn/post/691712…

Main process

custom command -> Transforms.xxx(editor, …) -> editor.apply(operation) -> rebuild model -> React render

  • Transforms are the equivalent of base commands, on which you build your own custom commands
  • Each transform function may:
    • Call other transforms
    • Generate at least one operation, then call editor.apply(operation)
  • Inside editor.apply, immer is used to produce immutable data


ProseMirror

ProseMirror feels very abstract and hard to understand. But given its standing in the community, there must be a lot to learn from it. I have only spent one day on it so far, so I cannot write much.

Still, its data structure and content-modification flow match the main ideas above.

Its model is a node tree structure, similar to Slate’s.

Not just text and images

Text-and-image editing is the foundation, but the design must not consider only text and images. It should be comprehensive and form a closed loop.

What needs to be considered?

  • The data format and structure in the model
  • How to render to vDOM and DOM
  • How to represent a range
  • Whether the operation types are sufficient
  • Whether collaborative editing is supported


My earlier misunderstanding of embed

I used to think that everything complex could be treated as an embed, and an earlier blog post reflected this. I gradually realized I was on the wrong track.

Embed has nothing to do with complexity; what defines it is that it is not text. Complex things can be handled separately, but that is a different matter, not embed.

Embed must be non-text

An image is the most typical embed. It has some raw data { href, alt, … } and renders as a non-text "block" that cannot be edited, split, or taken apart. Other examples are video, audio, and formulas.

Therefore, table and codeBlock are not embeds.

Complex text structures

  • Table (e.g. merged cells)
  • codeBlock

These two are the most important; the rest matter less.

So far I have some preliminary findings, limited to Quill. Other editors still need to be investigated and compared.

Future plans

The L1 editor kernel is very complex and needs further research. Next I will:

  • Dive deeper into Quill, Slate, and ProseMirror
  • Survey other editors such as TinyMCE, CKEditor, and Editor.js
  • Study collaborative editing in detail; otherwise operations cannot be designed properly