This article was written by Shang Wenmo and has been revised and edited.

1. Preface

Instant Messaging Network (52im.net) has published a large number of IM technical articles (see the "References" section at the end of this article), and a large proportion of them deal with message reliability and consistency. The reason is simple. However dazzling an IM system's product features and technical characteristics may be, guaranteeing message reliability and consistency is essential to the quality of any IM product.

Imagine an IM app in which you cannot even tell whether the other party received your message. Imagine too that the chat history on the other side arrives so badly out of order that it reads like nonsense. Users would not keep such an app on their phones overnight, and would most likely uninstall it at the first chance. An app that cannot deliver the most basic chat logic has lost the very point of being IM software.

On the other hand, IM systems have no common standard. The XMPP protocol once tried to provide one, but in practice it did not really succeed. Almost every IM system uses its own proprietary protocol with its own implementation logic. As a result, even when two systems face the same technical problem, there are hardly any fixed implementation routines or standard solutions to follow.

This article therefore presents one author's solution to the IM message "reliability" and "consistency" problems, and whether it suits you is a matter of judgment. Treat its content as a reference only. Combine it with your own system architecture and implementation, and read several articles on this topic on 52im.net. Then take the essence and work out the technical solution and approach that fit your own system. That is the most sensible way to proceed.

(This article is also published at: http://www.52im.net/thread-35…)

2. About this article

As we all know, instant messaging (IM) systems must solve the problems of message reliability and message consistency. (PS: If you are not sure exactly what an IM system is, read "Introduction to Zero-based IM Development (1): What Is an IM System?".)

In plain terms, these two problems are:

  • 1) Message reliability: simply put, no message is lost. When one party in a session sends a message, the message reaches the other party successfully and is displayed correctly.
  • 2) Message consistency: this covers consistency on the sender's side as well as consistency between both sides of the session. Messages should be neither duplicated nor out of order.

This article starts from a typical IM message-sending flow and explains, in simple terms, where the reliability and consistency problems come from and how they can be addressed. The technical solutions may not be perfect, but hopefully they will give you some ideas for solving IM technical problems.

3. Typical IM message sending process

A typical IM message-sending implementation can be divided into two stages:

  • 1) The sender sends the message, and the server receives it and returns an ACK for that message to the sender;
  • 2) The server pushes the message to the receiver.

Whether a message is sent successfully depends primarily on the first stage, that is, on whether the server received the message.

From the sender's point of view, a message can be in one of three states:

  • 1) Sending;
  • 2) Sent successfully;
  • 3) Send failed.

Specifically, these three states mean:

  • 1) Sending: the sender has triggered the send event but has not yet received the server's ACK for this message;
  • 2) Sent successfully: the sender has received the ACK corresponding to the message;
  • 3) Send failed: the message has been resent more than a preset number of times, and still no ACK has been received.
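The three sender-side states can be modeled as a small state machine. The following is a minimal Python sketch. The names `SendState`, `OutgoingMessage` and `TRANSITIONS` are hypothetical. The assumption that a failed message may be retried manually, which returns it to the sending state, is illustrative and not part of the original design.

```python
from enum import Enum


class SendState(Enum):
    """Sender-side states of an outgoing message."""
    SENDING = "sending"  # send triggered, no ACK from the server yet
    SENT = "sent"        # the server's ACK for this message has arrived
    FAILED = "failed"    # resend limit exceeded without an ACK


# Allowed transitions. Assumption: a FAILED message may be retried manually
# (going back to SENDING); SENT is terminal.
TRANSITIONS = {
    SendState.SENDING: {SendState.SENT, SendState.FAILED},
    SendState.FAILED: {SendState.SENDING},
    SendState.SENT: set(),
}


class OutgoingMessage:
    def __init__(self, text: str) -> None:
        self.text = text
        self.state = SendState.SENDING  # a new message starts in SENDING

    def move_to(self, new_state: SendState) -> None:
        # Reject transitions the state machine does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


msg = OutgoingMessage("hello")
msg.move_to(SendState.SENT)  # the server's ACK arrived
```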

The corresponding message sending process is shown in the figure below:

4. IM message reliability

Space is limited here. For the basic concepts and detailed principles of IM message reliability, we recommend reading "Introduction to Zero-based IM Development (3): What Is IM System Reliability?". This article focuses on the solution.

4.1 Resending mechanism

To ensure that messages are sent successfully in the first stage (see Section 3, "Typical IM message sending process"), a resending mechanism is set up:

  • 1) Decide whether to resend a message based on whether its ACK has been received within a certain period of time;
  • 2) If no ACK arrives within the preset timeout, send the message again;
  • 3) Once the number of resends exceeds the preset limit, stop resending, judge the message to have failed, and update its sending status accordingly.
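The three rules above can be sketched as a simple retry loop. This is a minimal, synchronous Python sketch that rests on a few assumptions. `send_once` is a hypothetical transport hook that transmits the message once and reports whether the server's ACK arrived within the timeout. The default timeout and resend limit are arbitrary example values, and `flaky_transport` is a fake used only for illustration. A real client would more likely drive this from an asynchronous timer than from a blocking loop.

```python
from typing import Callable

# Hypothetical transport hook: sends the message once and returns True if the
# server's ACK arrives within `timeout_s` seconds, False otherwise.
SendOnce = Callable[[str, float], bool]


def send_with_retry(msg_id: str, send_once: SendOnce,
                    timeout_s: float = 5.0, max_resends: int = 3) -> str:
    """Return "sent" if an ACK arrives, or "failed" after max_resends resends."""
    # One initial attempt plus up to `max_resends` resends.
    for _ in range(1 + max_resends):
        if send_once(msg_id, timeout_s):
            return "sent"  # ACK received: stop resending
        # No ACK within timeout_s: loop around and resend the same msg_id.
    return "failed"  # limit exceeded: mark the message as failed


def flaky_transport(drops: int) -> SendOnce:
    """Fake transport for illustration: the first `drops` attempts get no ACK."""
    attempts = 0

    def send_once(msg_id: str, timeout_s: float) -> bool:
        nonlocal attempts
        attempts += 1
        return attempts > drops

    return send_once


print(send_with_retry("m1", flaky_transport(2)))   # sent (third attempt is ACKed)
print(send_with_retry("m2", flaky_transport(10)))  # failed (4 attempts, no ACK)
```

Every resend reuses the same `msg_id`. Section 5.1 relies on exactly this property to deduplicate messages on the server.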

PS: For a complete, solution-level code implementation, refer to the QoS mechanism implemented in MobileIMSDK.

4.2 Checking session records

In the second stage of sending a message (see Section 3, "Typical IM message sending process"), the server pushes the message to the receiver. If the connection is broken at that point, the message will be lost.

Therefore, to ensure message integrity, the client should query the session records each time a connection is established. It uses the timestamp of the last message it acknowledged (ACKed) to retrieve all messages from the period since then. For reliably delivering a large backlog of offline messages, see reference [9].
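The reconnect-time check can be sketched as follows. This minimal Python sketch assumes a hypothetical in-memory `SessionStore` on the server that keeps a session's messages in timestamp order. It also assumes a hypothetical `Client` that remembers the timestamp of the last message it acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class StoredMessage:
    ts: int       # server timestamp assigned when the server recorded the message
    msg_id: str
    text: str


@dataclass
class SessionStore:
    """Hypothetical server-side session record, kept in timestamp order."""
    messages: list[StoredMessage] = field(default_factory=list)

    def since(self, last_ack_ts: int) -> list[StoredMessage]:
        # Everything newer than the last message the client acknowledged.
        return [m for m in self.messages if m.ts > last_ack_ts]


@dataclass
class Client:
    last_ack_ts: int = 0
    inbox: list[StoredMessage] = field(default_factory=list)

    def on_reconnect(self, store: SessionStore) -> None:
        # Pull the missed messages and advance the checkpoint as each is ACKed.
        for m in store.since(self.last_ack_ts):
            self.inbox.append(m)
            self.last_ack_ts = m.ts


store = SessionStore([StoredMessage(100, "a", "hi"),
                      StoredMessage(200, "b", "there"),
                      StoredMessage(300, "c", "!")])
client = Client(last_ack_ts=100)  # "a" was ACKed before the connection dropped
client.on_reconnect(store)
print([m.msg_id for m in client.inbox])  # ['b', 'c']
```

The sketch compares timestamps with a strict `>`, so it assumes server timestamps are unique within a session. If two messages can share a timestamp, a per-session, monotonically increasing sequence number makes a safer checkpoint.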

Another method is to add scheduled polling to check message integrity. The detailed idea is shown in the following figure.

Connection establishment flowchart:

4.3 Two issues to consider

When resending messages and checking session records, two issues need to be considered:

  • 1) Whether the message will be sent repeatedly;
  • 2) Whether the order of messages will be disrupted.

Here are two examples.

On message resending:

  • 1) If the message is lost before it reaches the server, the server never received it. When the sender resends the lost message, the server receives it successfully and no duplicate is created;
  • 2) If the server received the message but the ACK it returned was lost, the sender will send the same message again, which may cause a duplicate message.

On message order issues:

  • 1) Suppose the sender sends three messages in succession, the server receives the first and third, and the second is lost. Will the third message be recorded?
  • 2) If the second message is then resent and reaches the server, should it be placed before or after the third one, given that the server timestamps every message it records?

5. IM message consistency

As in the previous section, for the basic concepts and detailed principles of IM message consistency, we recommend reading "Introduction to Zero-based IM Development (4): What Is Message Timing Consistency in IM Systems?".

5.1 Deduplicating messages with a UUID

To handle resent messages, attach a UUID to each message as its unique identifier. A resent message keeps the same UUID as the original, so a message that arrives twice can be recognized and discarded. That is the general idea.
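Here is a minimal Python sketch of UUID-based deduplication on the server side. `DedupServer` is a hypothetical name. The unbounded `seen` set is a simplification, and a production system would likely bound it, for example by time window or per session.

```python
import uuid


class DedupServer:
    """Hypothetical server that stores each message UUID at most once."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.stored: list[tuple[str, str]] = []

    def receive(self, msg_uuid: str, text: str) -> str:
        # Store only the first copy of each UUID.
        if msg_uuid not in self.seen:
            self.seen.add(msg_uuid)
            self.stored.append((msg_uuid, text))
        # Always ACK, even for a duplicate: the sender is resending because
        # it never saw the first ACK.
        return msg_uuid


server = DedupServer()
msg_uuid = str(uuid.uuid4())       # generated once by the sender, reused on resend
server.receive(msg_uuid, "hello")
server.receive(msg_uuid, "hello")  # resend after a lost ACK (case 2 in Section 4.3)
print(len(server.stored))          # 1
```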

PS: Message IDs are also a big technical topic in IM. You can read the following series:

  • IM Message ID Technology Topic (1): WeChat's Serial Number Generation Practice for Massive IM Chat Messages (Algorithm Principles)
  • IM Message ID Technology Topic (2): WeChat's Serial Number Generation Practice for Massive IM Chat Messages (Disaster Recovery Solution)
  • IM Message ID Technology Topic (3): Decrypting the Chat Message ID Generation Strategy of Rongyun IM
  • IM Message ID Technology Topic (4): An In-Depth Look at Meituan's Distributed ID Generation Algorithm
  • IM Message ID Technology Topic (5): Technical Implementation of the Open-Source Distributed ID Generator UidGenerator
  • IM Message ID Technology Topic (6): An In-Depth Look at Didi's High-Performance ID Generator (Tinyid)

5.2 Sorting messages with vector clocks

Message order matters in chat because it shapes what the sender means. Missing messages or a reversed order can make a conversation incoherent or even cause misunderstandings. The order in which one sender sends messages must therefore be preserved. The order of messages between the two parties in a session needs to be handled according to the actual situation.

Intuitively, a message that is still in the sending state should not be visible to the other party; only a message that has been sent successfully should be. In practice, however, a message counts as sent successfully once the server receives it and returns an ACK, not when the other party receives it.

This raises a question. Suppose a message is still in the sending state when a message from the other party arrives. Should the received message be ordered before or after the one being sent?

This is a question of context, and the key point is which messages the sender had seen when sending this message.

One approach is to borrow the vector clock algorithm from distributed systems (see "Vector Clock Algorithm in Distributed Systems").

First, a brief description of the vector clock algorithm:

The vector clock algorithm is used to establish a partial order of events and to determine causality in distributed systems. Suppose a system contains N nodes, each of which keeps its own logical clock. The vector clock of the whole system is made up of these N logical clocks, and it is carried in every message a node generates.

In short, the vector clock algorithm works as follows:

  • 1) In the initial state, every component of the vector is 0;
  • 2) Each time a node finishes processing an event, it increments its own clock by 1;
  • 3) Each time a node sends a message, it sends its vector clock (which includes its own clock) along with the message;
  • 4) Each time a node receives a message, it updates its vector clock. It increments its own clock by 1, and for every other node it compares the locally stored clock value with the value in the message body and takes the maximum;
  • 5) When a node receives several messages at the same time, it checks whether the vector clocks of those messages have a partial order relationship.

Regarding point 5) above:

  • 1) If there is a partial order relationship, the vector clocks are merged by taking the one that is larger in the partial order;
  • 2) If there is no partial order relationship, they cannot be merged.

Partial order relationship: if every component of vector A is greater than or equal to the corresponding component of vector B (or vice versa), then A and B have a partial order relationship. Otherwise they have none, and the two events are concurrent.
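The rules above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the original article. `Node`, `leq` and `compare` are hypothetical names, and clocks are plain tuples of integers. `compare` returns `None` when two clocks have no partial order relationship, meaning the events are concurrent.

```python
from typing import Optional

VectorClock = tuple[int, ...]


def leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every component of a is <= the matching component of b."""
    return all(x <= y for x, y in zip(a, b))


def compare(a: VectorClock, b: VectorClock) -> Optional[str]:
    """Return "before", "after", "equal", or None when a and b are concurrent."""
    if a == b:
        return "equal"
    if leq(a, b):
        return "before"
    if leq(b, a):
        return "after"
    return None  # no partial order relationship


class Node:
    def __init__(self, index: int, n: int) -> None:
        self.i = index
        self.clock = [0] * n  # rule 1: every component starts at 0

    def send(self) -> VectorClock:
        # Rules 2-3: sending is an event, so bump our own clock, then attach a copy.
        self.clock[self.i] += 1
        return tuple(self.clock)

    def receive(self, incoming: VectorClock) -> None:
        # Rule 4: component-wise max for the other nodes, +1 for our own clock.
        for j, v in enumerate(incoming):
            if j != self.i:
                self.clock[j] = max(self.clock[j], v)
        self.clock[self.i] += 1


a, b = Node(0, 2), Node(1, 2)
m1 = a.send()           # (1, 0)
b.receive(m1)           # b's clock becomes [1, 1]
m2 = b.send()           # (1, 2)
m3 = a.send()           # (2, 0): a has not seen m2
print(compare(m1, m2))  # before
print(compare(m2, m3))  # None (concurrent)
```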

For IM, sorting chat messages means handling the context of each message, that is, determining the causal relationships between messages.

Applying the vector clock algorithm, assume a session has N participants. The system vector clock then consists of N components, one per participant. Every party carries the vector clock in the messages it sends, and messages are sorted according to their vector clocks.

The specific implementation ideas are as follows:

  • 1) The system vector clock is initialized to (0, 0, …, 0), with N components;
  • 2) When a node sends a message, it updates the system vector clock: its own clock is incremented by one, and the other components remain unchanged;
  • 3) When a node receives a message, it updates the system vector clock. It increments its own clock by one, and for each other node it compares the locally stored value with the value in the message and takes the maximum;
  • 4) The display order of messages is determined by the partial order relationship between the vector clocks carried in the message bodies.

Regarding point 4) above:

  • 1) If a partial order relationship can be determined, messages are displayed from smallest to largest according to that relationship;
  • 2) If the partial order relationship between several messages cannot be determined, they are displayed in their natural order (the order in which they were received).
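One way to apply both display rules is a topological sort over the "happened-before" relation, with ties broken by arrival order. This choice of algorithm is an assumption, since the article does not specify one. The Python sketch below uses Kahn's algorithm with a min-heap keyed on each message's arrival index. Causally earlier messages therefore always come first, and messages with no partial order relationship fall back to their natural order. `display_order` and `happened_before` are hypothetical names.

```python
import heapq


def happened_before(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """Strict partial order: a <= b component-wise and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def display_order(messages: list[tuple[str, tuple[int, ...]]]) -> list[str]:
    """messages: (msg_id, vector_clock) pairs, listed in arrival order.

    Returns msg_ids so that every causally earlier message comes first;
    ties between unordered messages are broken by arrival order.
    """
    n = len(messages)
    indegree = [0] * n  # how many messages must be displayed before each one
    successors: list[list[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and happened_before(messages[i][1], messages[j][1]):
                successors[i].append(j)
                indegree[j] += 1
    # Kahn's algorithm; the heap key is the arrival index (rule 2 tie-break).
    ready = [i for i in range(n) if indegree[i] == 0]
    heapq.heapify(ready)
    result = []
    while ready:
        i = heapq.heappop(ready)
        result.append(messages[i][0])
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                heapq.heappush(ready, j)
    return result


# Arrival order: m3, m2, m1. m1 precedes both others; m2 and m3 are concurrent.
arrivals = [("m3", (2, 0)), ("m2", (1, 2)), ("m1", (1, 0))]
print(display_order(arrivals))  # ['m1', 'm3', 'm2']
```

The pairwise comparison makes this O(n²) in the number of messages. That is acceptable for a page of chat history but would need refinement for large batches.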

In theory, vector clocks solve most message consistency problems, but a real implementation must also take the user experience into account.

The main question is whether to force sorting. When the actual display order differs from the vector-clock order, should messages be moved around?

For example, in a multi-party conversation, suppose one participant's network is so slow that he can neither send nor receive messages. After he saw the last message, the others started a new topic. Only then was his reply about the previous topic finally sent and received by the others.

The question is whether this message about the previous topic should be displayed at the end or moved to an earlier position.

  • 1) If it is displayed at the end, its content has nothing to do with the current topic, which may confuse the others;
  • 2) If it is moved to an earlier position, the others may never see it, or may be startled when an extra message suddenly appears earlier in the conversation.

IM scenarios are numerous and complex, and more often than not these issues need to be considered from a product perspective.

On the question of whether messages should be reordered, only a general suggestion is offered here. Do not force reordering within a live session, but sort the session history according to the partial order relationship of the vector clocks.

6. Summary

This article proposes a solution for the reliability and consistency of IM messages. A message resending mechanism ensures that messages are successfully received by the server. Session-record checks ensure the integrity of received messages. Together they guarantee reliability across the whole message-sending process. For consistency, UUID-based deduplication and vector-clock-based message sorting keep messages free of duplicates and in a sensible order.

In short, systems like IM look simple but run much deeper than they appear. If you are new to IM development, you can start with a systematic study in "Beginner's Entry: Developing Mobile IM from Zero". If you consider yourself an IM veteran, a collection of articles on large-scale IM architecture design may be a useful reference.

7. References

[1] Introduction to Zero-based IM Development (3): What Is the Reliability of an IM System?
[2] Introduction to Zero-based IM Development (4): What Is Message Timing Consistency in an IM System?
[3] Implementing IM Message Delivery Guarantees (1): Ensuring Reliable Delivery of Online Real-Time Messages
[4] Implementing IM Message Delivery Guarantees (2): Ensuring Reliable Delivery of Offline Messages
[5] How to Ensure the "Timing" and "Consistency" of IM Real-Time Messages?
[6] A Discussion of a Low-Cost Method to Ensure IM Message Timing
[7] IM Group Chat Messages Are So Complex: How to Ensure They Are Neither Lost Nor Duplicated?
[8] How Should a Fully Self-Developed IM Design a "Failure Retry" Mechanism?
[9] IM Development Tips: How to Elegantly Achieve Reliable Delivery of a Large Number of Offline Messages
[10] Mobile IM Message Reliability and Delivery Mechanisms from the Client's Perspective
[11] IM Architecture Technology for Hundreds of Millions of Users (Part 2): Reliability, Ordering, Weak-Network Optimization, etc.
[12] From Novice to Expert: How to Design a Distributed IM System Handling 100 Million Messages

This article has also been published on the "IM Technosphere" official account.


