This article was written by Shang Wenmo, with revisions and changes.

1. Foreword

I have compiled a large number of IM technical articles (see the “Reference materials” section at the end of this article), and those on message reliability and consistency account for a large proportion. The reason is that, however dazzling the variety of product features and technical characteristics across IM systems, guaranteeing message reliability and consistency is almost always essential to the quality of an IM product.

Imagine an IM app in which you cannot even tell whether the other party has received your message, or in which the chat content shows up on the other side as nonsense because the messages are badly out of order. Users will not keep such an app on their phones overnight (they will uninstall it right away), because the most basic chat logic does not work and the software has lost its very purpose as an IM tool.

On the other hand, IM systems have no standard (the XMPP protocol once tried to solve this problem, but it turned out not to be practical). Almost every IM uses its own proprietary protocol and its own implementation logic, so even for the same technical problem it is difficult to find a fixed implementation routine or a standard solution.

So this article offers one solution to the “reliability” and “consistency” problems of IM messaging, but whether it is right for you is a matter of judgment. In other words: the content of this article is for reference only. For a concrete solution, consider your own system architecture and implementation, read several articles on this topic, take the best from each, and find the technical solutions and ideas that suit you. That is the most sensible approach.

2. Introduction

Instant messaging (IM) systems must address two problems: message reliability and message consistency (PS: if you are not sure what an IM system is, read “Introduction to Zero-Base IM Development (1): What Is an IM System?”).

Informally, these two problems can be described as follows:

  • 1) Message reliability: simply put, no message is lost. One party sends a message, and the message reaches the other party successfully and is displayed correctly;
  • 2) Message consistency: the messages seen by the sender and by both sides of the session are consistent, meaning that no message is duplicated or out of order.

This article starts from the typical IM message sending logic, gives a simple explanation of how the message reliability and consistency problems arise, and presents technical solutions for reference. The solutions may not be perfect, but I hope they give you some inspiration for solving these problems in your own IM.

3. Typical IM message sending process

The general implementation process of IM message sending can be divided into two stages:

  • 1) The sender sends a message; the server receives it and returns an ACK to the sender;
  • 2) The server pushes the message to the receiver.

The success of sending a message is determined primarily by the first phase — whether the message is received by the server.

For the message sender, a message can be in one of three states:

  • 1) Sending;
  • 2) Sent successfully;
  • 3) Sending failed.

Specifically, the meanings of these three states are as follows:

  • 1) Sending: from the moment the sender triggers the send event until it receives the corresponding ACK from the server;
  • 2) Sent successfully: the sender has received the ACK for the message from the server;
  • 3) Sending failed: the number of resend attempts has exceeded the preset limit and no ACK has been received.
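The three sender-side states above can be sketched as a small state holder. This is only an illustration, not code from the original article: the names `SendState`, `OutgoingMessage`, `on_ack` and `on_retries_exhausted` are hypothetical.

```python
from enum import Enum


class SendState(Enum):
    SENDING = "sending"
    SENT = "sent"
    FAILED = "failed"


class OutgoingMessage:
    """Hypothetical sender-side record of one message and its state."""

    def __init__(self, msg_id, body):
        self.msg_id = msg_id
        self.body = body
        # A message enters SENDING as soon as the send event is triggered.
        self.state = SendState.SENDING

    def on_ack(self, acked_id):
        # Only the ACK for this message, arriving while it is still
        # in flight, marks it as sent successfully.
        if acked_id == self.msg_id and self.state is SendState.SENDING:
            self.state = SendState.SENT

    def on_retries_exhausted(self):
        # Called when the resend limit is exceeded without any ACK.
        if self.state is SendState.SENDING:
            self.state = SendState.FAILED
```

Note that a message which has already reached `SENT` or `FAILED` ignores later events in this sketch; a real client would probably also let the user retry a failed message manually.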

The corresponding message sending process is shown in the following figure:

[Figure: message sending flow chart]

4. IM message reliability

Owing to limited space, for the basic concepts and detailed principles of IM message reliability you are advised to read “Introduction to Zero-Base IM Development (3): What Is IM System Reliability?”. This article focuses on the solution.

4.1 Retransmission mechanism

In the first stage of message sending (see section “3. Typical IM message sending process”), a retransmission mechanism helps ensure that the message reaches the server:

  • 1) Decide whether the message should be resent according to whether the corresponding ACK is received within a certain period of time;
  • 2) Resend the message if the preset duration is exceeded;
  • 3) When the number of retransmissions exceeds the preset limit, stop retransmitting, judge the message as failed, and update its sending status.
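The retransmission steps above can be sketched as a simple loop. This is a minimal, blocking illustration under stated assumptions, not the MobileIMSDK implementation: `send` and `wait_for_ack` are hypothetical callbacks supplied by the caller, and the default `timeout` and `max_retries` values are arbitrary.

```python
def send_with_retry(send, wait_for_ack, message, timeout=2.0, max_retries=3):
    """Send `message`, resending until an ACK arrives or retries run out.

    send(message)              -- hypothetical: transmits the message once
    wait_for_ack(msg_id, secs) -- hypothetical: True if the ACK for msg_id
                                  arrived within `secs` seconds
    Returns "sent" or "failed".
    """
    retries = 0
    while True:
        send(message)
        if wait_for_ack(message["id"], timeout):
            return "sent"
        if retries >= max_retries:
            # Retry limit reached: stop resending and report failure.
            return "failed"
        retries += 1
```

With `max_retries=3` the message is transmitted at most four times: the initial send plus three retransmissions. A production client would normally run this on timers rather than blocking, and would keep the message ID unchanged across retransmissions so that the server can recognize duplicates.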

PS: For a complete, solution-level code implementation, refer to the implementation of the QoS mechanism in MobileIMSDK.

4.2 Checking Session Records

In the second stage of message sending (see section “3. Typical IM message sending process”), the server pushes the message to the receiver. If the connection is broken at that moment, the message is lost.

Therefore, to ensure message integrity, each time a connection is established the client should fetch the session records based on the timestamp of the last message (ACK) it received, and the server returns all messages from that period at once. (PS: in medium and large applications, pulling messages is not a simple matter; for details, read “IM Development Practical Notes: How to Gracefully Achieve Reliable Delivery of Large Volumes of Offline Messages”.)
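The idea of pulling session records on connect can be sketched as follows. This is a simplified in-memory illustration: `fetch_since`, `Receiver` and the `ts` field are hypothetical names, and the sketch assumes that server timestamps are unique and increasing within a session (otherwise a strict `>` comparison could skip messages that share a timestamp).

```python
def fetch_since(session_records, last_ts):
    """Server side (hypothetical): every record newer than last_ts, oldest first."""
    newer = [r for r in session_records if r["ts"] > last_ts]
    return sorted(newer, key=lambda r: r["ts"])


class Receiver:
    """Client side (hypothetical): tracks the timestamp of the last message seen."""

    def __init__(self):
        self.last_ts = 0
        self.inbox = []

    def on_message(self, record):
        self.inbox.append(record)
        self.last_ts = max(self.last_ts, record["ts"])

    def on_connect(self, session_records):
        # After (re)connecting, pull everything missed while offline.
        for record in fetch_since(session_records, self.last_ts):
            self.on_message(record)
```

Calling `on_connect` twice in a row adds nothing the second time, because `last_ts` has already advanced past every stored record.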

Another way to ensure message integrity is to add scheduled polling.

[Figure: connection establishment flow chart]

4.3 Two issues to consider

Message resending and session record checking need to consider two issues:

  • 1) Whether the message will be sent repeatedly;
  • 2) Whether the message order will be scrambled.

Here are two examples.

About message retransmission:

  • 1) If the message is lost before it reaches the server, the server never received it. When the sender resends the message and the server receives it successfully, no duplicate is produced;
  • 2) If the server receives the message but the ACK it returns is lost, the sender will resend the same message, which may cause the message to be duplicated.

On message order:

  • 1) If the sender sends three consecutive messages, and the first and third are received by the server but the second is lost, should the third message be recorded?
  • 2) If the second message then reaches the server after being resent, should it be placed before or after the third (the server usually timestamps each record)?

5. IM message consistency

As in the previous section, for the basic concepts and detailed principles of IM message consistency you are advised to read “Introduction to Zero-Base IM Development (4): What Is Message Timing Consistency in IM Systems?”.

5.1 Using message UUIDs for deduplication

For message retransmission, you can add a UUID attribute to each message as its unique identifier. The UUID of a retransmitted message stays unchanged, and the front end deduplicates messages based on the UUID. That is the general idea.
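A minimal sketch of UUID-based deduplication, assuming the UUID is generated once when the message is created and reused on every retransmission. The names `new_message` and `Deduplicator` are hypothetical; only Python's standard `uuid` module is used.

```python
import uuid


def new_message(body):
    # The UUID is assigned once, at creation time; retransmissions reuse
    # the same dict (or a copy of it), so the UUID never changes.
    return {"uuid": str(uuid.uuid4()), "body": body}


class Deduplicator:
    """Hypothetical receiving-side filter that drops already-seen UUIDs."""

    def __init__(self):
        self._seen = set()
        self.accepted = []

    def accept(self, message):
        """Return True if the message is new, False if it is a duplicate."""
        if message["uuid"] in self._seen:
            return False
        self._seen.add(message["uuid"])
        self.accepted.append(message)
        return True
```

In this sketch the set of seen UUIDs grows without bound; a real client would likely scope it to a session or to a recent time window.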

PS: Message ID generation is also a big technical topic for IM. If you are interested, you can read the following series:

IM Message ID Technical Topic (1): WeChat’s Massive IM Chat Message Sequence Number Generation Practice (Algorithm Principle)

IM Message ID Technical Topic (2): WeChat’s Massive IM Chat Message Sequence Number Generation Practice (Disaster Recovery Solution)

IM Message ID Technical Topic (3): Decoding the Chat Message ID Generation Strategy of Rongyun IM Products

IM Message ID Technical Topic (4): Deep Decryption of Meituan’s Distributed ID Generation Algorithm

IM Message ID Technical Topic (5): Technical Implementation of the Open Source Distributed ID Generator UidGenerator

IM Message ID Technical Topic (6): Deep Decryption of Didi’s High-Performance ID Generator (Tinyid)

5.2 Message ordering using vector clocks

For message ordering: in a chat, the order of messages strongly affects what the sender is trying to express, and missing or reversed messages may make the conversation incoherent or even distort its meaning. Therefore, the order of a single sender's messages must be guaranteed, while the relative order of messages from the two sides of a session has to be decided according to the actual situation.

Intuitively, a message whose status is “sending” should not yet be visible to the other party; only a message that has been sent successfully should be. In the implementation, however, a send is judged successful once the server receives the message and returns an ACK, not once the other party receives it.

The question then arises: if one message is in the sending state and another message is received at that moment, should the received message be placed before or after the message being sent?

This is a question of context: the key is which message the sender had already seen when writing their own.

Here is an idea: refer to the vector clock algorithm used in distributed systems (see “Vector Clock Algorithm in Distributed Systems”).

First, a brief description of the vector clock algorithm:

Vector clock algorithms are used in distributed systems to establish a partial order of events and to capture causality. A system contains N nodes, and each node keeps its own logical clock. The vector clock of the whole system is composed of these N logical clocks (one dimension per node) and is carried in the body of every message a node generates.

In brief, the vector clock algorithm works as follows:

  • 1) In the initial state, every entry of the vector is 0;
  • 2) Each time a node processes a local event, it increments its own clock by 1;
  • 3) Each time a node sends a message, it attaches the system vector clock, including its own clock, to the message;
  • 4) Each time a node receives a message, it updates its vector clock: its own clock is incremented by 1, and for every other node's entry it compares the locally retained value with the value in the message body and takes the maximum;
  • 5) When a node receives multiple messages at the same time, it determines whether a partial order relationship exists between the vector clocks of those messages.

For point 5) above:

  • 1) If a partial order relationship exists, the vector clocks are merged and the larger one in the partial order is taken;
  • 2) If no partial order relationship exists, the messages are concurrent and the merge cannot be performed.

Partial ordering: if every dimension of vector A is greater than or equal to the corresponding dimension of vector B (or vice versa), a partial ordering relationship exists between A and B; otherwise there is no partial ordering relationship.
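The rules above can be sketched in a few lines. This is a simplified illustration that represents a vector clock as a tuple with one entry per node; the function names `tick`, `on_receive` and `compare` are hypothetical.

```python
def tick(clock, node):
    """Local event (or send) at `node`: increment that node's own entry."""
    updated = list(clock)
    updated[node] += 1
    return tuple(updated)


def on_receive(local, remote, node):
    """Receive at `node`: take the element-wise maximum, then increment own entry."""
    merged = [max(a, b) for a, b in zip(local, remote)]
    merged[node] += 1
    return tuple(merged)


def compare(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent' for clocks a and b."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    a_ge_b = all(x >= y for x, y in zip(a, b))
    if a_le_b and a_ge_b:
        return "equal"
    if a_le_b:
        return "before"
    if a_ge_b:
        return "after"
    # Neither dominates the other: no partial order relationship exists.
    return "concurrent"
```

For example, with three nodes, node 0 sends with clock `tick((0, 0, 0), 0)`, which is `(1, 0, 0)`. Node 1 receives it and computes `on_receive((0, 0, 0), (1, 0, 0), 1)`, which is `(1, 1, 0)`, so the send is 'before' the receive. An unrelated event on node 2 has clock `(0, 0, 1)`, which is 'concurrent' with both.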

For an IM, sorting chat messages really means handling the context of each message and determining the causal relationships between messages.

Borrowing from the vector clock algorithm: suppose a session has N parties. The system vector clock is then composed of N clocks (one per party), it is carried in the body of every message a party sends, and messages are sorted according to it.

The specific implementation idea is as follows:

  • 1) The system vector clock is initialized to (0, 0, …, 0), with N dimensions;
  • 2) When a node sends a message, it updates the system vector clock: its own clock increases by one, while the entries of the other nodes remain unchanged;
  • 3) When a node receives a message, it updates the system vector clock: its own clock increases by one, and for every other node's entry it compares the locally retained value with the value in the message and takes the maximum;
  • 4) The message order is determined by the partial order relationship of the system vector clocks in the message bodies.

For point 4) above:

  • 1) If the partial order relationship can be determined, the messages are displayed from smallest to largest according to it;
  • 2) If the partial order relationship between several messages cannot be determined, they are displayed in their natural order (the order in which they were received).
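One way to realize this display rule is a stable sort that respects the partial order and otherwise keeps arrival order. The sketch below is an assumption-laden illustration, not the author's implementation: `happened_before` and `display_order` are hypothetical names, each message is a dict with a `clock` tuple, and the input list is assumed to be in arrival order.

```python
def happened_before(a, b):
    """True if clock a is strictly smaller than clock b in the partial order."""
    return a != b and all(x <= y for x, y in zip(a, b))


def display_order(messages):
    """Order messages by causality; unrelated messages keep arrival order.

    Repeatedly picks the earliest-arrived message that has no causal
    predecessor among the messages not yet placed (quadratic, for clarity).
    """
    remaining = list(messages)
    ordered = []
    while remaining:
        for i, candidate in enumerate(remaining):
            has_predecessor = any(
                happened_before(other["clock"], candidate["clock"])
                for other in remaining
                if other is not candidate
            )
            if not has_predecessor:
                ordered.append(remaining.pop(i))
                break
    return ordered
```

As an example, suppose party 0 sends "hello" with clock `(1, 0, 0)`, party 1 receives it and replies with clock `(1, 2, 0)`, and party 2, having seen neither, sends an "aside" with clock `(0, 0, 1)`. If the reply arrives first, followed by "hello" and then the aside, `display_order` puts "hello" before the reply and leaves the aside where it arrived, because its clock is concurrent with the other two.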

In theory, vector clocks can solve most message consistency problems, but the actual user experience has to be considered in the implementation.

The most important question is whether to enforce ordering: that is, whether to move messages around when the order actually displayed is inconsistent with the partial order given by the vector clocks.

Here is an example: in a multi-party conversation, one party is very slow to receive or send messages. After the last message he saw, the others started a new topic, and only then was his message about the previous topic finally sent and received by the others.

The question then arises: should the message about the previous topic appear last, or be moved to an earlier position?

  • 1) If the message is displayed at the end but its content is unrelated to the current topic, others may be confused;
  • 2) If the message is moved to an earlier position, others may not see it, or may find it awkward that an extra message has suddenly appeared.

IM scenarios are many and complex, and more often than not you need to think from a product perspective.

On the question of whether messages need to be sorted, only a relatively general solution is proposed here: do not enforce sorting in the live session, and sort the session history according to the partial order of the vector clocks.

6. Summary

For the reliability and consistency of IM messages: the message retransmission mechanism ensures that a message is successfully received by the server, and the session record check ensures that the messages received are complete, which together ensure the reliability of the whole sending process. UUID-based deduplication and ordering based on the vector clock algorithm provide a solution for message consistency.

In short, IM systems look simple, but in fact the water runs as deep as the sea. If you are new to IM development, you can study systematically starting from the article “One Article Is Enough for Beginners: Developing Mobile IM from Scratch”. If you consider yourself an IM veteran, you may want to read articles on large-scale IM architecture design.

7. Reference materials

[1] Introduction to Zero-Base IM Development (3): What Is IM System Reliability?

[2] Introduction to Zero-Base IM Development (4): What Is Message Timing Consistency in IM Systems?

[3] Implementing IM Message Delivery Guarantees (1): Ensuring Reliable Delivery of Online Real-Time Messages

[4] Implementing IM Message Delivery Guarantees (2): Ensuring Reliable Delivery of Offline Messages

[5] How to Ensure the “Timing” and “Consistency” of IM Real-Time Messages?

[6] A Low-Cost Method for Ensuring IM Message Timing

[7] IM Group Chat Messages Are So Complex: How to Ensure None Are Lost or Duplicated?

[8] How to Design a “Retry on Failure” Mechanism for a Completely Self-Developed IM?

[9] IM Development Practical Notes: How to Gracefully Achieve Reliable Delivery of Large Volumes of Offline Messages

[10] Message Reliability and Delivery Mechanisms of Mobile IM from the Client’s Perspective

[11] An IM Architecture for 100 Million Users, Part 2: Reliability, Ordering, Weak-Network Optimization

[12] From Novice to Expert: How to Design a Distributed IM System with 100 Million Messages

This article has been simultaneously published at: www.52im.net/thread-3574…