This article was originally shared by the Rongyun technical team under the title “Comprehensive Analysis of the IM Message Synchronization Mechanism”. The content has been summarized and revised in detail to make it easier to understand.

1. Content Overview

The most basic and most important requirements of an instant messaging (IM) system are the timeliness and accuracy of messages. Timeliness is reflected in low delay; accuracy means that messages are not lost, not duplicated, and not out of order.

Taking into account business scenarios, system complexity, network traffic, terminal energy consumption, and other factors, we carefully designed the message sending and receiving mechanism of our distributed IM message system, which operates at the hundred-million level, and refined and optimized it over time into the current reliable message delivery mechanism.

The overall approach is:

  • 1) Clients and servers cooperate and complement each other;
  • 2) Multiple mechanisms provide guarantees at different levels;
  • 3) The upstream and downstream flows are split and handled separately.

Based on the technical practice of Rongyun’s hundred-million-level IM message system, this article summarizes the reliable delivery mechanism for distributed IM messages, in the hope that it helps with your IM development and learning.

2. The overall principle of message interaction between client and server

2.1 Overview

A complete IM message interaction usually consists of two segments:

  • 1) Message upstream segment: the message sender sends the message to the server through the IM real-time channel;
  • 2) Message downstream segment: the server delivers the message to the final message receiver according to certain policies.

2.2 Message upstream segment

The upstream segment relies on the IM real-time channel to deliver the message to the server.

Reliable delivery in this segment has to be guaranteed at the protocol layer, which must provide reliable, ordered, bidirectional byte-stream transmission. This is achieved by our self-developed communication protocol, RMTP (RongCloud Message Transfer Protocol).

The client and server use a long connection to transfer data based on RMTP.

RMTP interaction diagram:

As shown in the preceding figure, the protocol layer uses QoS and ACK mechanisms to ensure the reliability of data transmission in the upstream segment of IM messages.
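As a rough illustration of the ACK idea only (the details of RMTP are not given in this article), the sketch below keeps each packet on the sender until it is acknowledged and lets the receiver drop duplicates when a packet is retransmitted. The `Sender` and `Receiver` classes and the packet-id scheme are hypothetical names chosen for this example and are not part of RMTP.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Hypothetical sender: keeps every packet until the peer acknowledges it."""
    next_id: int = 1
    pending: dict = field(default_factory=dict)  # packet_id -> payload

    def send(self, payload):
        packet_id = self.next_id
        self.next_id += 1
        self.pending[packet_id] = payload
        return packet_id, payload

    def on_ack(self, packet_id):
        # An ACK releases the packet; a duplicate ACK is harmless.
        self.pending.pop(packet_id, None)

    def retransmit(self):
        # A real client would call this from a timer; resend everything unacked.
        return sorted(self.pending.items())


@dataclass
class Receiver:
    """Hypothetical receiver: delivers each packet id once, but always ACKs."""
    seen: set = field(default_factory=set)
    delivered: list = field(default_factory=list)

    def on_packet(self, packet_id, payload):
        if packet_id not in self.seen:
            self.seen.add(packet_id)
            self.delivered.append(payload)
        return packet_id  # the ACK sent back to the sender


sender, receiver = Sender(), Receiver()
pid, data = sender.send("hello")
# First attempt: the packet arrives, but the ACK is lost on the way back.
receiver.on_packet(pid, data)
# Retransmission: the receiver drops the duplicate and ACKs again.
for retry_id, retry_data in sender.retransmit():
    sender.on_ack(receiver.on_packet(retry_id, retry_data))
```

After the retry, the message has been delivered exactly once and the sender holds no pending packets, which is the "not lost, not duplicated" property the protocol layer is expected to provide.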

2.3 Message downstream segment

In summary, there are three main behaviors in the message downstream segment.

1) The client actively pulls messages. There are two triggers for an active pull:

  • (1) Offline message pull: triggered when a new connection is established with the IM service, to fetch the messages that were not received while the client was offline.
  • (2) Timed message pull: a timer is started after the client last receives a message and fires, for example, every 3-5 minutes. It serves two purposes. One is to recover from notification delivery failures caused by unpredictable factors such as the network and intermediate devices, which would otherwise leave the server and client states inconsistent. The other is to protect the state machine of the service layer by means of this request.
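The timed pull described above can be modeled as an idle timer that is reset whenever a message arrives. The following is a minimal sketch under that assumption; `PullScheduler`, its method names, and the 180-second default (the low end of the 3-5 minute range) are hypothetical and only illustrate the idea.

```python
class PullScheduler:
    """Hypothetical idle timer that decides when the client should issue a timed pull."""

    def __init__(self, interval_seconds=180.0):
        self.interval = interval_seconds
        self.last_activity = 0.0  # time of the last received message or pull

    def on_message_received(self, now):
        # Receiving a message restarts the timer.
        self.last_activity = now

    def due(self, now):
        # True once the client has been quiet for a full interval.
        return now - self.last_activity >= self.interval

    def on_pull_sent(self, now):
        # A pull also restarts the timer, so pulls repeat once per interval.
        self.last_activity = now


scheduler = PullScheduler(interval_seconds=180.0)
scheduler.on_message_received(now=100.0)
quiet_for_100s = scheduler.due(now=200.0)   # still inside the interval
quiet_for_180s = scheduler.due(now=280.0)   # interval elapsed, pull is due
```

Passing the current time in as a parameter, instead of reading a clock inside the class, keeps the sketch easy to test.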

2) The server actively sends messages (direct messages):

This is one of the online message delivery mechanisms. The server sends the message content directly to the client. It suits low message frequency and continuous interaction, such as ordinary conversation between two people or within a group.

3) The server actively sends notifications (notification pull):

This is also one of the online message delivery mechanisms. Put simply, the server sends the client a notification containing a timestamp and other content that can serve as a sorting index. After receiving the notification, the client compares the timestamp in the notification with its own local data and, if needed, initiates a message pull.

This mechanism suits high message volumes: for example, a user who belongs to many large groups, each with many members in heated discussion. Notification pull effectively reduces the number of network round trips between client and server and packs multiple messages into one response to increase the data payload. It preserves both timeliness and performance.

Schematic diagram of client server downstream message interaction:

3. Concrete realization of message interaction between client and server

As mentioned in the previous section, we split the flow of message interaction: upstream and downstream.

3.1 Upstream

The upstream process must preserve the order in which messages are sent. The best way to keep messages ordered is to partition them by userId and then sort by timestamp. In a distributed deployment, each user is assigned to a fixed service server (PS: different terminals of the same account connect to the same service server), which makes upstream ordering easier. Having all of a user’s terminals on the same server also makes multi-terminal maintenance easier.

Client connection process:

  • 1) The client obtains the token for connection through the App Server;
  • 2) The client uses the token to obtain, through the navigation service, the specific access server (CMP) to connect to. The navigation service calculates the access server from the userId and returns it, so that a given client always connects to the same access server (CMP).
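The article does not say how the navigation service maps a userId to an access server. One simple possibility, assumed here purely for illustration, is to hash the userId and take the result modulo the number of servers. The function name `pick_access_server` and the server names are hypothetical.

```python
import hashlib


def pick_access_server(user_id, servers):
    """Map a userId to one server deterministically (assumed hash-modulo scheme)."""
    # A stable hash is needed: Python's built-in hash() of a str varies between processes.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


access_servers = ["cmp-1", "cmp-2", "cmp-3"]  # hypothetical server names
phone_target = pick_access_server("user-42", access_servers)
desktop_target = pick_access_server("user-42", access_servers)
```

Because the result depends only on the userId, every terminal of the same account is directed to the same access server. A plain modulo remaps most users when the server list changes, so a production system might prefer a scheme such as consistent hashing; that choice is outside what the article describes.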

The schematic diagram is as follows:

To summarize: after the client sends a message, the access service routes it to the designated message server according to the userId. The message server generates the message ID and decides, based on the time of the previous message, whether to adjust the timestamp of the current message (if the same timestamp already exists, the new one is pushed back). The timestamp, along with the message ID, is then returned to the client in the Ack. The upstream message is then cached and persisted under userId + timestamp, and subsequent service operations use this timestamp.
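The upstream handling summarized above can be sketched as follows. This is a minimal sketch, not the actual implementation: `MessageServer` is a hypothetical class, the in-memory dictionary stands in for the cache and persistent store, and the simple counter is only a placeholder for the globally unique message ID generator.

```python
import itertools


class MessageServer:
    """Hypothetical upstream handler: assigns an ID and a per-user unique timestamp."""

    def __init__(self):
        self._last_ts = {}               # user_id -> last timestamp assigned (ms)
        self._ids = itertools.count(1)   # placeholder for a real global ID generator
        self.outbox = {}                 # (user_id, timestamp) -> message body

    def on_upstream(self, user_id, body, now_ms):
        last = self._last_ts.get(user_id, 0)
        # If the clock has not moved past the previous message, push the timestamp back.
        timestamp = now_ms if now_ms > last else last + 1
        self._last_ts[user_id] = timestamp
        message_id = next(self._ids)
        self.outbox[(user_id, timestamp)] = body  # stored under userId + timestamp
        return {"message_id": message_id, "timestamp": timestamp}  # Ack contents


server = MessageServer()
ack1 = server.on_upstream("alice", "first", now_ms=1000)
ack2 = server.on_upstream("alice", "second", now_ms=1000)  # same millisecond
```

Here the second message arrives in the same millisecond as the first, so it is stored one millisecond later and the sender's outbox keeps a strict order.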

The above business process is called the upstream process, and the messages stored in the upstream process are outbox messages.

PS: A note about message IDs:

We use a globally unique message ID generation strategy, which ensures that messages can be identified and deduplicated by ID. The structure of the message ID is shown below.

For details about how to generate unique IDs in distributed scenarios, see “IM Message ID Technology Special Topic (III): Decoding the Chat Message ID Generation Strategy of Rongyun IM Products”.

3.2 Downstream

After the message node finishes the upstream flow, the message is delivered to the message node responsible for the target user, where it enters the downstream flow.

In the downstream flow, the node decides whether the timestamp needs to be moved forward, based on the target userId and the timestamp generated for this message during the upstream process.

If an update is required, the timestamp is incremented until it no longer duplicates an existing timestamp for that user.

After this processing, the order in which messages are stored for the target user is consistent with the order in which the client sorts them after receiving them, and timestamps within the same session are ordered. In this way, the messages of one receiving user will not be out of order.

So far we have introduced the downstream interaction process of the message. Its concrete implementation is not simple, and the details are expanded below.
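The "increment until not repeated" rule for the receiving user can be illustrated with a few lines of code. The function `store_downstream` and the dictionary used as the receiver's inbox are assumptions made for this sketch.

```python
def store_downstream(inbox, timestamp, body):
    """Insert into one receiver's inbox (timestamp -> body), bumping on collision."""
    while timestamp in inbox:
        timestamp += 1  # move forward until the timestamp is unique for this user
    inbox[timestamp] = body
    return timestamp


bob_inbox = {}
store_downstream(bob_inbox, 1000, "from alice")
store_downstream(bob_inbox, 1000, "from carol")  # collides, stored at 1001
store_downstream(bob_inbox, 1001, "from dave")   # collides, stored at 1002
ordered = [bob_inbox[ts] for ts in sorted(bob_inbox)]
```

Since every message of a receiver ends up with a distinct timestamp, sorting by timestamp gives the server store and the client the same order.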

1) Direct message:

That is, the server actively sends a message (to the target client):

  • 1) The client SDK determines the timestamp of the latest message stored locally and uses it for sorting and other logic;
  • 2) For a given user, only one message is sent directly at a time, and the others are sent as notifications. During notification pull, the client uses the timestamp of its latest local message as the starting point of the pull;
  • 3) During delivery, if the previous message delivery has not finished, the next message is not sent directly (s_msg) but as a notification (s_NTF).

Direct message logic diagram:
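The choice between s_msg and s_NTF in the list above amounts to a small per-user state machine. The sketch below assumes that a direct delivery counts as finished when the client's ack arrives; the class `DownstreamSession` and its method names are hypothetical.

```python
class DownstreamSession:
    """Hypothetical per-user delivery state: one direct message in flight at most."""

    def __init__(self):
        self.direct_in_flight = False

    def on_new_message(self):
        if self.direct_in_flight:
            # The previous direct delivery has not finished: notify instead.
            return "s_NTF"
        self.direct_in_flight = True
        return "s_msg"

    def on_direct_ack(self):
        # Assumption: the client's ack marks the direct delivery as finished.
        self.direct_in_flight = False


session = DownstreamSession()
first = session.on_new_message()    # nothing in flight, sent directly
second = session.on_new_message()   # first not yet acked, sent as notification
session.on_direct_ack()
third = session.on_new_message()    # channel is free again, sent directly
```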

2) Notification pull:

That is, the server actively sends a notification (to the target client):

  • 1) The server puts the timestamp of the current message in the notification body and delivers the notification to the client;
  • 2) After receiving the notification, the client compares it with its local message timestamp and decides whether to send pull-message signaling;
  • 3) After receiving the pull-message signaling, the server queries the message list (up to 200 messages or 5M of data) starting from the timestamp carried in the signaling and responds to the client;
  • 4) After receiving the messages, the client sends an ack to the server, and the server maintains the status;
  • 5) The timestamp the client uses to pull messages is the timestamp of the latest local message on the client.
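Step 3 of the list above, where the server answers a pull with a bounded batch, might look like the following sketch. It reads "200 or 5M" as 200 messages or 5 MiB of message bodies, which is an interpretation and not something the article states precisely. The function `query_batch` and the `(timestamp, body)` tuple layout are hypothetical.

```python
def query_batch(inbox, since_ts, max_count=200, max_bytes=5 * 1024 * 1024):
    """Return messages newer than since_ts, capped by count and by total body size."""
    batch, size = [], 0
    for ts, body in sorted(inbox):
        if ts <= since_ts:
            continue  # the client already has this message
        body_size = len(body.encode("utf-8"))
        # Always return at least one message so the client can make progress.
        if batch and (len(batch) >= max_count or size + body_size > max_bytes):
            break
        batch.append((ts, body))
        size += body_size
    return batch


inbox = [(ts, "m%d" % ts) for ts in range(1, 451)]   # 450 stored messages
first_batch = query_batch(inbox, since_ts=0)         # capped by the count limit
last_batch = query_batch(inbox, since_ts=400)        # fewer than 200 remain
small = query_batch([(1, "aaaa"), (2, "bbbb"), (3, "cc")], 0, max_bytes=8)
```

The client would then pull again, starting from the timestamp of the last message in the batch it received.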

Schematic diagram:

In the figure above, steps 3-7 may need to be repeated several times, for the following reasons:

  • A. If the client receives too many messages at once, the response becomes very large and its transmission demands high network quality, so messages are delivered in batches limited by count and by size;
  • B. If too many messages are pulled at once, processing them consumes a lot of client resources and may make the client stutter, resulting in a poor experience.
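On the client side, the repeated pull can be sketched as a loop that keeps requesting batches until the local timestamp catches up with the one carried in the notification. The names `pull_until_caught_up` and `fetch`, and the batch size of 4 used in the demonstration, are assumptions for this example.

```python
def pull_until_caught_up(fetch, local_latest_ts, notified_ts):
    """Pull batches until the local cursor reaches the notified timestamp."""
    cursor = local_latest_ts
    received = []
    rounds = 0
    while cursor < notified_ts:
        batch = fetch(cursor)      # one pull request to the server
        if not batch:
            break                  # nothing newer on the server
        received.extend(batch)     # a real client would also send its ack here
        cursor = batch[-1][0]      # next pull starts from the latest local message
        rounds += 1
    return cursor, received, rounds


# Stand-in for the server: ten stored messages, at most four returned per pull.
server_inbox = [(ts, f"msg-{ts}") for ts in range(1, 11)]


def fetch(since_ts, limit=4):
    return [m for m in server_inbox if m[0] > since_ts][:limit]


cursor, received, rounds = pull_until_caught_up(fetch, local_latest_ts=3, notified_ts=10)
```

If the local timestamp is already at or beyond the notified one, the loop body never runs and no pull is sent, which matches the comparison the client performs after receiving a notification.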

3) Switching logic between direct message and notification pull on the server:

It mainly involves updating the state machine.

The following diagram integrates the direct message and notification pull process for updates to the state machine:

At this point, the whole core process of message sending and receiving is introduced, and the remaining content will introduce multi-terminal online message synchronization processing.

4. Multi-terminal online message synchronization

Following the upstream and downstream phases of a message, multi-terminal synchronization is likewise divided into sender multi-terminal synchronization and receiver multi-terminal synchronization.

4.1 Sender multi-terminal synchronization

As described earlier in the client connection process (see section 3.1 of this article), multiple clients of the same user converge on the same service, which makes it easy to maintain the multiple terminals of a userId.

The specific logic is:

  • 1) After a user has connected successfully from multiple terminals, one terminal sends a message. When the message reaches the CMP (IM access service), the CMP performs basic checks and then obtains the connections of the user’s other terminals;
  • 2) The service wraps the client’s upstream message as a server downstream message and delivers it directly to the user’s other clients. This completes the sender’s multi-terminal copy. The CMP then delivers the message to the IM service, and the normal sending and delivery process follows.

Regarding point 2 above, the sender’s multi-terminal synchronization does not go through the IM Server, which has the following advantages:

  • 1) Relatively fast;
  • 2) The fewer service nodes it passes through, the smaller the probability of problems.
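The sender-side synchronization described in this section can be sketched as follows. The class `AccessServer` stands in for the CMP; its fields, method names, and the tuple used to mark a downstream packet are hypothetical, and in-memory lists replace real connections.

```python
class AccessServer:
    """Hypothetical CMP model: copies a sender's message to that user's other terminals."""

    def __init__(self):
        self.connections = {}  # user_id -> {terminal_id: packets delivered to it}
        self.forwarded = []    # messages handed on to the IM service

    def connect(self, user_id, terminal_id):
        self.connections.setdefault(user_id, {})[terminal_id] = []

    def on_upstream(self, user_id, terminal_id, message):
        terminals = self.connections.get(user_id, {})
        if terminal_id not in terminals:  # basic check on the sending terminal
            raise ValueError("sender terminal is not connected")
        for other_id, delivered in terminals.items():
            if other_id != terminal_id:
                # Wrap the upstream message as a downstream one for the other terminals.
                delivered.append(("downstream", message))
        # Only afterwards does the message continue to the IM service.
        self.forwarded.append((user_id, message))


access = AccessServer()
access.connect("alice", "phone")
access.connect("alice", "desktop")
access.connect("bob", "phone")
access.on_upstream("alice", "phone", "hi")
```

Alice's desktop receives the copy directly from the access server, while her phone (the sending terminal) and Bob's phone receive nothing at this stage.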

4.2 Receiver multi-terminal synchronization

The specific logic is:

  • 1) After receiving the message, the IM service first determines the delivery range for the receiver, that is, which terminals of the receiving user should receive the message;
  • 2) The IM service sends the range and the current message to the CMP, which matches the receiver’s terminals against the range and then delivers the message.

In the usual application scenario of receiver multi-terminal synchronization, the range covers all terminals.

However, there are some special services. For example, when I control the state of another terminal from client A, I may need to send command messages; in that case the range is needed so that the messages are delivered only to specific terminals.
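Matching terminals against a delivery range can be sketched in a few lines. The function `match_terminals`, the use of `None` to mean "all terminals", and the terminal names are assumptions made for this example.

```python
def match_terminals(connected, scope=None):
    """Return the connected terminals that fall inside the delivery range."""
    if scope is None:
        # Default case: the message goes to every connected terminal.
        return sorted(connected)
    return sorted(t for t in connected if t in scope)


connected = {"ios", "web", "desktop"}
everyone = match_terminals(connected)                 # ordinary chat message
only_web = match_terminals(connected, {"web"})        # command aimed at one terminal
nobody = match_terminals(connected, {"android"})      # target terminal is offline
```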

At this point, we have shared the core IM message processing flow, which provides a reliable message delivery mechanism through layers of logic. (This article has been simultaneously published at: www.52im.net/thread-3638…)

5. Reference materials

[1] “Introduction to Zero-Base IM Development (II): What Is a Real-Time IM System?”
[2] “Introduction to Zero-Base IM Development (III): What Is IM System Reliability?”
[3] “Introduction to Zero-Base IM Development (IV): What Is Message Timing Consistency in IM Systems?”
[4] “Realization of IM Message Delivery Guarantee Mechanism (I): Guaranteeing Reliable Delivery of Online Real-Time Messages”
[5] “Realization of IM Message Delivery Guarantee Mechanism (II): Guaranteeing Reliable Delivery of Offline Messages”
[6] “IM Development Dry Goods Sharing: How to Gracefully Realize the Reliable Delivery of a Large Number of Offline Messages”
[7] “Understanding the ‘Reliability’ and ‘Consistency’ Problems of IM Messages and Discussion on Solutions”
[8] “How to Ensure the ‘Timing’ and ‘Consistency’ of IM Real-Time Messages?”
[9] “IM Group Chat Messages Are So Complex, How to Ensure They Are Not Lost and Not Duplicated?”
[10] “Message Reliability and Delivery Mechanism of Mobile IM from the Perspective of the Client”
[11] “A Set of IM Architecture Technology for 100 Million Users (Part 2): Reliability, Order, Weak Network Optimization, etc.”

The following are other articles shared by the Rongyun technical team:

[1] “IM Message ID Technology Special Topic (III): Decoding the Chat Message ID Generation Strategy of Rongyun IM Products”
[2] “Sharing of Rongyun Technology: Optimization Practice of Real-Time Audio and Video First Frame Display Time Based on WebRTC”
[3] “Sharing of Rongyun Technology: Practice of Network Link Preservation Technology of Rongyun Android IM Products”