This article has participated in the activity of “New person creation Ceremony”, and started the road of digging gold creation together.

The challenges of real-time collaborative editing

When it comes to the difficulty of real-time collaborative editing, people’s first reaction is basically collaborative conflict processing.

To deal with conflict

In fact, the solutions for conflict management are relatively mature, including:

  1. Edit lock: When someone is editing a document, the system locks the document to prevent others from editing at the same time.
  2. Diff-patch: Based on the similar idea of Git and other version management, it compares and merges contents, including GNU Diff-patch, Myer’s diff-Patch and other schemes.
  3. Final conformance implementation: Operational Transformation (OT), conflict-free Replicated Data Type (CRDT).

The implementation of edit locks is simple and crude, but directly affects the user experience. Diff-patch can merge conflicts by itself or hand over to users when conflicts occur. The OT algorithm is the solution used in Google Docs, and the Atom editor uses CRDT.

OT and CRDT

The OT and CRDT approaches are similar in that they provide ultimate consistency. The difference is how they operate:

  • OT does this by changing operations
    • OT will split and convert editing operations to achieve the effect of conflict handling
    • OT does not include specific implementation, so it needs to be implemented by the project itself, but it can carry out high-precision conflict handling according to the needs of the project
  • CRDT does this by changing state
    • Basically, CRDT is a data structure that, when updated with the same set of operations, will always converge to the same representation, even if the operations are applied in a different order
    • CRDT comes in two ways: operation-based and state-based

OT is primarily used for text, which is often complex and unextensible. The CRDT implementation is simple, but Google, Microsoft, CKSource, and many others rely on OT for a reason. The current state of CRDT research supports collaboration on two main types of data: plain text and arbitrary JSON structures.

For more advanced structures, such as rich text editing, OT trades complexity for user expectations, while CRDT focuses more on data structures. As data structure complexity increases, the time and space complexity of the algorithm increases exponentially, resulting in performance challenges. Therefore, most real-time collaborative editing is now implemented based on OT algorithm.

Version management

In the scenario of multi-person collaboration, in order to ensure user experience, the DIFF-Patch /OT algorithm is generally used to handle conflicts. To ensure that each user action is updated in the correct sequence, an incremented version number is maintained and updated every time a new change is made.

Data version update

For the data version to update as expected, several conditions are required:

  • The collaborative data version is updated properly
  • The lost data version was successfully pulled. Procedure
  • The submitted data version increases in order

How do you understand these premises? Let’s take an example.

Ming opens a document that pulls the data version 100 from the server. At this time, the server sent a message, saying that someone had updated the version to 101, so Xiaoming needed to update the data of the 101 version to the interface, which was a normal update of collaborative data version.

Xiaoming edited it based on the latest 101 version and produced a new operation data. When Ming submitted the data to the server, the server saw that Ming’s data was based on version 101, and told Ming that the latest version was now 110. Xiao Ming can only go to the server to patch back the 102-110 version, which is the successful patch of the lost data version.

After the data version of 102-110 was pulled back, Xiao Ming’s previous operation data needed to be conflicted with these data versions respectively, and an operation data based on the 110 version was finally obtained. At this time, Ming submits the data to the server again, and the server accepts it and assigns the 111 version to Ming. Therefore, Ming upgrades his local data version to 111, which is an orderly increment of the submitted data version.

Maintain data task queues

To manage these versions, we need to maintain a data queue of user actions for orderly submission of data. The responsibilities of this queue include:

  • The user operation data is queued properly
  • Queue tasks are normally submitted to the access layer
  • The queue task is submitted abnormally and retry
  • The queue task is removed after being successfully submitted

Such a queue may also face the possibility that the user suddenly closes the page, and we need to maintain a cache of data that the user edits but does not commit to the server when the user opens the page again. In addition to the browser shutdown, there is also the network interruption caused by the change of network conditions during the editing process. In this case, we also need to take the user’s operation offline to the local, and continue uploading when the network is restored.

Room management

Due to the need of multi-person collaboration, compared with the ordinary Web page, but also more room and user management. Users in the same document can be considered in the same room. In addition to being able to see who is in the same room, we can receive messages from each other. In the document scenario, each action of the user can be regarded as a message.

But documents are different from regular room chats in that the user’s actions cannot be lost, and strict version order is required. The user’s operation content may be large, for example, the user copied and pasted a 10W or 20W table content, such messages obviously cannot be transmitted all at once. In this case, in addition to considering websockets for their own data compression (HTTP itself supports compression), we also need to implement our own sharding logic. When it comes to data sharding, it is followed by how to deal with sharding and the loss of sharding data.

Multiple modes of communication

There are many ways for the front and back end to communicate, including HTTP short polling (Polling), Websocket, HTTP long polling (long-polling), and SSE (Server-sent Events).

We can also see that different online document teams use different communication methods. For example, Google Docs uses Ajax for upstream data and HTTP for downstream data. Ajax is used for upstream data of graphite documents, SSE is used for downstream data push; Jinshan documents, feishu documents, Tencent documents are using Websocket transmission.

Each communication mode has its own advantages and disadvantages, including compatibility, resource consumption, real-time performance, etc., which may also be related to the background architecture of the business team. Therefore, when designing the connection layer, we should consider the interface extensibility and reserve support for various modes.

Each grid is a rich text editor

In addition to real-time collaborative editing, the Excel project faces many other challenges. Everyone knows rich text editors suck, but in Excel, every grid is a rich text editor.

The rich text

There are several processing methods for editing rich text:

  • A simple div incrementcontenteditableProperty, native to the browserexecCommandperform
  • Div + event listener to maintain a set of editor states (including cursor states)
  • Textarea + event listeners maintain a set of editor states

For the contenteditable property, manipulating the selected text (italics, color) requires determining the cursor position, using Range to determine where the selected text is, and then determining whether the text has been processed and needs to be overwritten, removed, or retained. Compatibility issues often arise. In general, complex editors such as Atom and VSCode implement contenteditable functionality on their own, using div+ event listening. Ace Editor, Kingsoft documents, etc., use hidden Textarea to receive input and render it into div to achieve editing effect.

Copy and paste

In general, when a single cell or multiple cells are copied, we get the raw data of the cell, so we need to do two steps: convert the data to rich text (splicing table/tr/ TD elements, etc.) and write to the clipboard.

The paste process also involves taking content from the clipboard, converting that content into cell data, and submitting action data. There is also the possibility of uploading images, parsing rich text of various kinds, and each cell can be made exponentially more complex by setting some properties, including merged cells, row height and column width, filters, functions, and so on.

Copy and paste Function modules Copy and paste can be divided into two types based on application scenarios:

  1. Internal copy and paste.
  2. External copy and paste.

Internal copy and paste refers to the copy and paste in its own product. Because a copy and paste process involves a lot of calculation and parsing, internal copy and paste can consider whether to directly write cell data into the clipboard, so that the data can be directly obtained during the pasting. It eliminates time-consuming and resource-consuming steps such as converting data to rich text and parsing rich text to cell data.

The external copy and paste is more related to the compatibility of various similar Excel editing products and the compatibility of system clipboard content format, and the code implementation is particularly complicated.

How complex table rendering is

Generally speaking, there are two ways to draw a table:

  1. The DOM.
  2. Drawing canvas.

The well-known handsonTable open source library in the industry is based on DOM to achieve rendering, but it is obvious that DOM rendering of 100,000 or millions of cells will cause great performance problems. Therefore, many Web spreadsheets are realized based on Canvas + superimposed DOM. Using Canvas to realize spreadsheets also requires consideration of visual area, scrolling operation and canvas hierarchy, as well as some performance problems faced by Canvas itself. Including how canvas straight out.

Table rendering involves merging cells, selecting, zooming, freezing, rich text, and wrapping. Let’s see how complex it is.

Word wrap

In general, a cell wrap is represented on the data store and consists of only: cell contents + line feed properties. But when such a data needs to be rendered, it faces some calculations for wrapping:

We need to find the column width of the column and then branch the render layer based on the contents of the cell. As shown, a string of text is divided into three lines according to the calculation of the branch logic. After line wrapping, it may also involve the adjustment caused by the row height of the cell being propped up. The adjustment of row height may also affect the rendering results of some centered attributes of other cells in the line, which needs to be recalculated.

Therefore, when we wrap a column of cells, we can cause massive recalculation and rendering, which also involves a significant performance cost.

Frozen area

The freeze feature divides our table into four zones, left, right and up and down frozen and unfrozen. The complexity of frozen region mainly lies in the processing of some special cases of boundary, including region selection, image cutting and so on. Let’s look at a picture:

Figure, for an image, although it is placed directly on the whole table, but when it falls into the data layer, it actually belongs to a certain grid. On the frozen region edit, we need to shard it, but we still need to show the original image regardless of which region is selected:

This means that in canvas, when we get the position of mouse click, we also need to calculate whether the corresponding clicked grid belongs to the coverage scope of the picture.

Alignment with cell overflow

There are three types of horizontal alignment for a cell: left, center, and right. Overwriting occurs when a cell is not wrapped and its contents exceed the width of the cell:

In other words, when we draw a certain cell, we also need to calculate whether the nearby cell overflows into the current cell. If there is an overflow, we need to draw in this cell. In addition, the overflow logic may need to be adjusted and updated when a column is hidden.

All the points listed above are just a few of the more detailed points, but table rendering also involves the logic of hiding cells and rows, dragging and dropping, zooming and selection, as well as the complex calculation of cell borders. In addition, as canvas rendering is the content of a screen, scrolling of the page and updating of collaborative data may also lead to frequent updating and drawing of the canvas.

Data management challenges

When each grid supports rich text content, in the scenario of hundreds of thousands or millions of cells, the storage of data on the disk and the data change of user operation also pose a great challenge.

Atomic operation

Similar to database transactions, for spreadsheets we can break down user actions into indivisible atomic actions. Why do you do that? In fact, it is mainly convenient for conflict processing of OT algorithm. It can perform conflict calculation and conversion of specific logic for each non-separable atomic operation, and finally fall into the storage.

For example, if we insert a child table, in addition to inserting itself, we may need to move other child tables. So, for a child table, our operations might include:

  • insert
  • rename
  • mobile
  • delete
  • Update the content
  • .

All user actions on child tables can be combined into the final effect, provided they are broken down carefully enough, and the actions that are no longer separable are the final atomic actions. For example, a copy and paste child table can be split into insert-renaming – update contents; Clipping a child table can be split into insert – update – delete – move other child tables. By analyzing user behavior, we can extract these basic actions and take a concrete example:

As shown in the figure, for the server, the final result is to add two child tables, one is John’s “worksheet 2”, the other is John’s “worksheet 2 (automatic renaming)”.

In implementation, concurrent operations are handled using a Tranform function that takes two operations that have been applied to the same document state (but on different clients) and calculates the expected changes to the new operation that can be applied after the second operation and retain the first.

The names of the OT functions used on different OT systems may vary, but they can be divided into two categories:

  • Inclusion transformation/ Forward transformation: indicates asIT(opA, opB).opATo contain in an efficient wayopBTo convert an operation into another operationopB'.
  • Exclusion transformation/ Backward transformation: expressed asET(opA, opB).opAIn an effective wayopBTo convert an operation to another operationopB''.

Some OT systems use both IT and ET functions, while some use only IT functions. OT of the complexity of the functional design depends on many factors: whether the OT system does support consistency maintenance and support Undo/Redo, which convert properties to meet, whether or not to use ET, OT operation model is general, each action is according to the character of data in (a single object) and according to the string object (sequence), layered and other structures, etc.

In addition to local conflict processing after the client receives the collaborative message from the server, the server may also perform conflict processing after receiving two messages based on the same version successively. In order to ensure the final consistency of the algorithm, both the local and the server have a consistent set of conflict handling logic.

Version rollback/redo

Undo/Redo is a basic capability for most editors, and document editing is no exception. Earlier we mentioned the concept of real-time collaborative versioning, where each user action may be split into multiple atomic actions.

In this scenario, Undo/Redo involves both the recovery of falling disk data and the handling of conflicts during the restoration of user operations. In a collaborative scenario, if some operation data of others is received during the editing process, will the Undo Undo operation of others be reversed?

In fact, the idea of Undo based on OT algorithm is relatively simple. Usually, invert() is implemented for each atomic operation, and a new atomic operation is generated and applied.

The transform function can be divided into IT and ET, and Undo can be implemented in two ways:

  • Inv & IT: invert + inclusion transformation
  • ET & IT: exclusion transformation + inclusion transformation

Regardless of the algorithm, the basic idea of OT for Undo is to convert the inverse of an operation (the operation to be undone) into a new form based on the effect of those operations performed after the operation, so that the transformed inverse can achieve the correct Undo effect. However, if the user receives a new collaborative operation during editing, when the user performs Undo, the atomic operation generated through reverse operation also needs to be conflicted with these new collaborative messages to ensure the final consistency.

data

For cells that support rich text, in addition to setting some properties of each cell, including data format verification, function calculation, width and height, border, fill color, etc., it is also necessary to maintain some data of rich text format and associated pictures in the cell. These data in the face of hundreds of thousands or even millions of cells, data transmission and storage also bring a lot of challenges.

The revision and restoration of revision records, how to optimize memory, how to optimize the size of data, how to efficiently use data, how to reduce the time and space complexity of computation, etc., have become some difficult problems faced by the data layer.

END

The above list only takes up a small part of the entire Excel project, in addition to workers, menu bars, various features like data formats, functions, images, charts, filtering, sorting, smart drag, import and export, region permissions, search and replace, Each feature faces a variety of challenges due to the complexity of the project.

In addition, functional decoupling between modules, how to organize and design the architecture of 100W+ code, how to optimize the code loading process, problems caused by multi-party collaboration, maintenance/readability of the project, performance optimization and so on are all problems that we often need to think about.

conclusion

The biggest feeling of working on a project like this is that you don’t have to scratch your head to think about what else a project can do, because there are so many things that can be done. Code quality, maintainability, and readability are also often underrated for many businesses. Because of the limitations of the project itself (relatively simple), we often cannot find the points that we can dig deeply. Therefore, we have to improve efficiency as much as possible through automation and configuration, but what we can do is also limited, and our growth is also limited.

People often joke about the low ceiling at the front and say they are facing obsolescence at 35. Regardless of personal interests, enthusiasm and their own bottlenecks, most of the time, conditions are not allowed, the business scenario is relatively simple, so there is no scene to play their ability. I used to think it was ok to study after work, but wouldn’t you kill two birds with one stone if you just went to work and did what you loved?