1 Background

Multi-terminal scenario: messages are synchronized across a user's mobile phone, PC, and tablet.

2 Message storage model

Generally, the core architecture of an IM system is divided into three parts: a message management module, a message synchronization module, and a notification module. The functions of these three modules are as follows:

  • The message management module is mainly responsible for receiving and storing messages.
  • The message synchronization module is mainly responsible for storing and pushing downlink message data and its status.
  • The notification module is mainly responsible for maintaining third-party channels and notification functions.

The core of the message management module is the message storage model, and the choice of storage model directly affects how the message synchronization module is implemented. Mainstream IM systems organize messages, sessions, and the relationship between the two in different ways, but the approaches fall into two forms: write diffusion with read aggregation, and read diffusion with write aggregation. Read diffusion and write diffusion describe how the messages of a group session are stored, as described below.

  • With read diffusion, messages belong to the session. This is equivalent to keeping a Conversation_Message table in the database that contains all messages generated by the session. The advantage of this storage mode is storage efficiency: each message is stored once, and only the binding between the session and its messages needs to be saved.
  • With write diffusion, the messages generated by the session are delivered to each recipient's Message_Inbox table. It is similar to a personal mail inbox: it holds all of the individual's sessions, with the messages in each session arranged in the chronological order in which they were generated. The benefit of this storage mode is flexible message state management, because each message in the session can present a different state to each recipient.

If read diffusion is adopted, keeping data consistent and applying data changes becomes the bottleneck of system performance when data is modified concurrently. Therefore, the case described below uses write diffusion to implement the message storage model, accepting a higher storage cost in exchange for better update performance.
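The difference between the two storage forms can be illustrated with a minimal in-memory sketch. The dictionaries below only stand in for the Conversation_Message and Message_Inbox tables; the function names, the user names, and the "read" flag are assumptions made for illustration, not part of any real IM system.

```python
from collections import defaultdict

# Read diffusion: one shared message list per session.
conversation_message = defaultdict(list)   # session_id -> [text, ...]

# Write diffusion: one inbox per recipient, each entry with its own state.
message_inbox = defaultdict(list)          # user_id -> [{"msg": ..., "read": ...}]


def send_read_diffusion(session_id, text):
    """Store the message once; every member reads it from the session."""
    conversation_message[session_id].append(text)


def send_write_diffusion(members, text):
    """Copy the message into each member's inbox with per-recipient state."""
    for user in members:
        message_inbox[user].append({"msg": text, "read": False})


members = ["alice", "bob", "carol"]
send_read_diffusion("group-1", "hello")
send_write_diffusion(members, "hello")

# Marking the message as read for bob touches only bob's copy.
message_inbox["bob"][0]["read"] = True

stored_once = len(conversation_message["group-1"])
stored_per_member = sum(len(box) for box in message_inbox.values())
print(stored_once, stored_per_member)  # prints: 1 3
```

The sketch shows the trade-off in miniature: write diffusion stores one copy per recipient (three entries instead of one), but a state change for one recipient never contends with the others.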

3 Message synchronization module

The core problem of multi-terminal synchronization is keeping data consistent across terminals. The IM system needs to record the order of messages and the synchronization point of each terminal to achieve eventual consistency of messages.

Since we use write diffusion to record messages, the system needs to:

  • Create a Message_Inbox for each user to store messages for that user.
  • Create an incrementing sync_id for each message to record the order of the messages.
  • Record the synchronization ID of the user on each client.

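By comparing the synchronization ID recorded for each client with the sync_id values in the user's Message_Inbox, the system can tell which messages a client is missing and deliver them, which achieves multi-terminal data synchronization. The detailed implementation method is as follows.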
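As a rough illustration of this comparison, the following in-memory sketch keeps one inbox, one counter, and one synchronization point per device for a single user. The class name `UserSyncState`, its methods, and the device names are hypothetical; a real system would keep this state in a shared store rather than in process memory.

```python
class UserSyncState:
    """In-memory stand-in for one user's inbox and per-device sync points."""

    def __init__(self):
        self.next_sync_id = 0     # monotonically increasing counter
        self.inbox = []           # (sync_id, message) pairs in sync_id order
        self.device_points = {}   # device_id -> last sync_id delivered

    def append(self, message):
        """Assign the next sync_id to the message and store it."""
        self.next_sync_id += 1
        self.inbox.append((self.next_sync_id, message))
        return self.next_sync_id

    def pending(self, device_id):
        """Messages the device has not received yet (sync_id > its point)."""
        point = self.device_points.get(device_id, 0)
        return [(sid, msg) for sid, msg in self.inbox if sid > point]

    def ack(self, device_id, sync_id):
        """Advance the device's sync point; never move it backwards."""
        current = self.device_points.get(device_id, 0)
        self.device_points[device_id] = max(current, sync_id)


bob = UserSyncState()
for text in ["m1", "m2", "m3"]:
    bob.append(text)

bob.ack("phone", 2)               # the phone already received m1 and m2
phone_pending = bob.pending("phone")
pc_pending = bob.pending("pc")    # the PC has never synchronized
```

Here `phone_pending` contains only the third message, while `pc_pending` contains all three, because a device without a recorded point is treated as starting from zero.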

1. Refine the data structure. Extract a unified message data structure from the various events in the IM system, such as new messages, read messages, and the addition or deletion of session information. An example of a message data structure is as follows:

#include <string>

// Unified envelope for every IM event
struct message {
  int type;          // Business type
  std::string data;  // Business data
};
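To show how such an envelope might travel through the system, here is a small Python sketch that serializes it to JSON and back. The type codes `NEW_MESSAGE` and `READ_MESSAGE` and the helper names `encode` and `decode` are assumptions for illustration; only the `type` and `data` fields and the sample payload values come from this article.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Message:
    type: int   # business type: which kind of event this is
    data: str   # business data, opaque to the synchronization layer


# Hypothetical type codes, chosen only for this example.
NEW_MESSAGE = 1
READ_MESSAGE = 2


def encode(message):
    """Serialize the envelope to a JSON string."""
    return json.dumps(asdict(message), sort_keys=True)


def decode(raw):
    """Rebuild the envelope from its JSON string."""
    return Message(**json.loads(raw))


payload = json.dumps({"msgid": 991, "cid": 123, "text": "hello"})
wire = encode(Message(type=NEW_MESSAGE, data=payload))
restored = decode(wire)
```

Keeping `data` as an opaque string means the synchronization layer only needs to understand `type`, so new event kinds can be added without changing how messages are queued.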

2. Select a storage product. The selection basis mainly includes the following two points:

  • The system needs to assign an incrementing sync_id to each message in the Message_Inbox, so the product used to store message data must support an atomically incremented counter and an ordered queue.
  • Complete messages are saved to persistent storage (such as PolarDB) by the message management module of the IM system, whereas Message_Inbox data does not need to be persisted and only has to be kept for a period of time (such as one week), so the required storage capacity is modest.

The Redis version of the cloud database, with its counters and Sorted Sets, meets both requirements.

3. Use the Hash structure of the Redis cloud database to store each user's synchronization ID for every client device.

4 Case study

The following figure shows a detailed implementation of multi-endpoint synchronization based on a case study.

After a new message is stored in the database, the message push logic is triggered. The system looks up the current synchronization point of each of the user's client devices by user name, fetches from the message queue all messages between that point and the latest point, and pushes them to the device. After the push, it updates the device's synchronization point. Sample code for the key steps is shown below.

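Because a Redis key holds a single data type, the counter, the Sorted Set, and the Hash in the commands below each use their own key for the user bob: sync_id:bob, inbox:bob, and sync_point:bob.

4.1 Storing a new message

sync_id = INCR sync_id:bob
ZADD inbox:bob $sync_id 'message:{type:new_message, data:"{msgid:991,cid:123,text:\"hello\"}"}'

4.2 Obtaining a range of messages

ZRANGEBYSCORE inbox:bob 100103 100310

4.3 Obtaining the synchronization points of client devices

HGETALL sync_point:bob

4.4 Adding or updating client device information

HSET sync_point:bob dev_1001 100103
HSET sync_point:bob dev_1002 100202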
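The push flow described in this section can be sketched in Python as two functions that take a Redis client as a parameter. This is a minimal sketch under stated assumptions, not a definitive implementation: the key prefixes `sync_id:`, `inbox:`, and `sync_point:`, the function names, and the `send` callback are hypothetical, and `r` is assumed to be a redis-py style client created with `decode_responses=True`.

```python
def store_message(r, user, message):
    """Assign the next sync_id and add the message to the user's inbox."""
    sync_id = r.incr(f"sync_id:{user}")            # atomic counter
    r.zadd(f"inbox:{user}", {message: sync_id})    # score = sync_id
    return sync_id


def push_pending(r, user, send):
    """Push to each registered device the messages past its sync point."""
    points = r.hgetall(f"sync_point:{user}")       # device -> sync point
    for device, point in points.items():
        # The "(" prefix makes the lower bound exclusive, so the last
        # message already delivered to the device is not sent again.
        batch = r.zrangebyscore(
            f"inbox:{user}", f"({int(point)}", "+inf", withscores=True
        )
        if not batch:
            continue
        for message, _score in batch:
            send(device, message)                  # assumed transport callback
        # Record the highest sync_id delivered as the device's new point.
        latest = int(batch[-1][1])
        r.hset(f"sync_point:{user}", device, latest)
```

With redis-py, a caller would pass something like `redis.Redis(decode_responses=True)` as `r` and a function that writes to the device's connection as `send`. Two caveats apply to this sketch: Sorted Set members are unique, so each stored message string needs a distinguishing field such as the message ID, and a device must already have an entry in the Hash (for example with point 0) before it receives anything.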

Conclusion

IM has become one of the most common communication methods on the Internet. With the rich data structures of the Redis version of the cloud database, you can build a highly available IM system. Redis can accelerate not only the message synchronization module described in this article but also the message storage module of the IM system, which helps you build a reliable IM system that supports large-scale access.