This article walks you through building a lightweight IM server from scratch. The overall design and architecture of the IM system were described in my previous post. If you haven't read it, please see "Developing an IM (instant messaging) server from scratch".

This article will give you more details about the implementation. I will explain how to build a complete and reliable IM system from three aspects.

  1. Reliability
  2. Security
  3. Storage design

Reliability

What is reliability? For an IM system, reliability means at minimum that messages are not lost, not duplicated, and not delivered out of order. Only when all three hold can the system be said to offer a good chat experience.

No lost messages

Let's start with never losing messages.

First, recall the server architecture designed in the previous article.

Let's start with a simple example. When Alice sends a message to Bob, it might travel along a path like this:

  1. client --> connector
  2. connector --> transfer
  3. transfer --> connector
  4. connector --> client

Any hop on this path may fail. Although TCP is reliable, it only guarantees reliability at the transport layer, not at the application layer.

For example, the connector may receive the message from the client in step 1 but fail to forward it to the transfer in step 2. Bob never receives the message, and Alice has no idea that delivery failed.

If Bob is offline, the message path is:

  1. client --> connector
  2. connector --> transfer
  3. transfer --> mq

If, in step 3, the transfer receives the message from the connector but fails to store it as an offline message, delivery has also failed. To ensure reliability at the application layer, we need an ACK mechanism that lets the sender confirm that the message has been received.

Concretely, we imitate TCP and build an ACK mechanism at the application layer.

TCP acknowledges data in units of bytes, whereas we acknowledge in units of messages. First, each time a sender sends a message, it must wait for an ACK from the other party. The ACK must carry the ID of the message being acknowledged so that the sender can match it to the message it sent.

Second, the sender maintains a queue of messages waiting for an ACK. Each time a message is sent, it is enqueued together with a timer.

In addition, a thread keeps polling the queue. If a message has timed out without receiving an ACK, the thread takes it out and resends it.

A message whose ACK has not arrived before the timeout can be handled in two ways:

  1. Like TCP, keep resending until an ACK is received.
  2. Set a maximum number of retries. If no ACK arrives after that many attempts, fall back to a failure mechanism to save resources. For example, if the connector has not received an ACK from the client for a long time, it can actively close the connection to the client and store the remaining unsent messages as offline messages. After being disconnected, the client can try to reconnect to the server.
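The wait queue and the second strategy can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name AckWaitQueue, the poll method, the String payload, and passing the current time in as a parameter (so the behavior is deterministic) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Sender-side queue of messages still waiting for an ACK (hypothetical names). */
class AckWaitQueue {
    /** One unacknowledged message plus its retry bookkeeping. */
    private static final class Pending {
        final String payload;
        long deadlineMillis;
        int retries;

        Pending(String payload, long deadlineMillis) {
            this.payload = payload;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final Map<Long, Pending> waiting = new ConcurrentHashMap<>();
    private final BiConsumer<Long, String> transport; // stands in for the real network write
    private final long timeoutMillis;
    private final int maxRetries;

    AckWaitQueue(BiConsumer<Long, String> transport, long timeoutMillis, int maxRetries) {
        this.transport = transport;
        this.timeoutMillis = timeoutMillis;
        this.maxRetries = maxRetries;
    }

    /** Sends a message and remembers it until its ACK arrives. */
    void send(long id, String payload, long nowMillis) {
        waiting.put(id, new Pending(payload, nowMillis + timeoutMillis));
        transport.accept(id, payload);
    }

    /** Called when an ACK carrying this message ID is received. */
    void onAck(long id) {
        waiting.remove(id);
    }

    /**
     * Meant to be called periodically from a single polling thread.
     * Resends timed-out messages and returns the IDs that used up all retries,
     * so the caller can hand them to its failure mechanism.
     */
    List<Long> poll(long nowMillis) {
        List<Long> failed = new ArrayList<>();
        for (Map.Entry<Long, Pending> e : waiting.entrySet()) {
            Pending p = e.getValue();
            if (nowMillis < p.deadlineMillis) {
                continue; // not timed out yet
            }
            if (p.retries >= maxRetries) {
                waiting.remove(e.getKey()); // give up on this message
                failed.add(e.getKey());
            } else {
                p.retries++;
                p.deadlineMillis = nowMillis + timeoutMillis;
                transport.accept(e.getKey(), p.payload); // resend
            }
        }
        return failed;
    }

    int pendingCount() {
        return waiting.size();
    }
}
```

In a real connector, the IDs returned by poll would be the ones to store as offline messages before closing the connection. Dropping the maxRetries check gives the first strategy, which keeps resending until an ACK arrives.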

No duplicates, no disorder

Sometimes an ACK arrives late because of network conditions, and the sender resends the message. The receiver therefore needs a deduplication mechanism, which is implemented by giving every message a unique ID. The ID does not have to be globally unique; it only needs to be unique within a session, such as a conversation between two people or a group chat. If the network connection is dropped and re-established, the session's IDs start from 0 again.

The recipient maintains the ID of the last message received in the current session, called lastId. Each time a new message is received, its ID is compared to lastId to see whether it is consecutive. If it is not, the message is placed in a temporary queue for later processing.

For example:

  • The lastId of the current session is 1, and the server then receives msg(id=2). The message is consecutive, so the server processes it and sets lastId to 2.

  • If the server instead receives msg(id=3), the message has arrived out of order. It is queued and processed only after lastId becomes 2 (that is, after the server has received and finished processing msg(id=2)).

Therefore, a message is new only if msgId > lastId && !queue.contains(msgId); otherwise it is a duplicate. If a duplicate is received, you can conclude that the earlier ACK was not delivered, and send the ACK again.

After a message is received, the complete processing flow is shown in the following pseudocode:

    import java.util.concurrent.CompletableFuture;
    import java.util.function.Consumer;

    /** A message that arrived out of order, buffered together with its consumer. */
    class ProcessMsgNode {
        private final Message message;
        private final Consumer<Message> consumer;

        ProcessMsgNode(Message message, Consumer<Message> consumer) {
            this.message = message;
            this.consumer = consumer;
        }

        Message getMessage() { return message; }
        Consumer<Message> getConsumer() { return consumer; }
    }

    public CompletableFuture<Void> offer(Long id, Message message, Consumer<Message> consumer) {
        if (isRepeat(id)) {
            // Duplicate message: the earlier ACK was probably lost, so acknowledge again
            sendAck(id);
            return CompletableFuture.completedFuture(null);
        }
        if (!isConsist(id)) {
            // Not consecutive: buffer the message until the missing ones arrive
            notConsistMsgMap.put(id, new ProcessMsgNode(message, consumer));
            return CompletableFuture.completedFuture(null);
        }
        return process(id, message, consumer);
    }

    private CompletableFuture<Void> process(Long id, Message message, Consumer<Message> consumer) {
        return CompletableFuture
            .runAsync(() -> consumer.accept(message))
            .thenAccept(v -> sendAck(id))
            .thenAccept(v -> lastId.set(id))
            .thenComposeAsync(v -> {
                // If the next message is already buffered, take it out and process it
                ProcessMsgNode node = notConsistMsgMap.remove(nextId(id));
                if (node != null) {
                    return process(nextId(id), node.getMessage(), node.getConsumer());
                }
                // No next message is buffered yet
                return CompletableFuture.<Void>completedFuture(null);
            })
            .exceptionally(e -> {
                logger.error("[process received msg] has error", e);
                return null;
            });
    }
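The pseudocode above does not show the helpers isRepeat, isConsist and nextId. The following is a minimal synchronous sketch of how they might look for one session, assuming IDs increase by 1 and messages are plain strings. The class name OrderedReceiver and the processed and acks lists (which stand in for the real consumer and for sendAck) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Per-session receiver that drops duplicates and restores message order (hypothetical). */
class OrderedReceiver {
    private long lastId;                                                // last processed ID
    private final Map<Long, String> notConsistMsgMap = new HashMap<>(); // early arrivals
    final List<String> processed = new ArrayList<>();                   // stands in for the consumer
    final List<Long> acks = new ArrayList<>();                          // stands in for sendAck(id)

    OrderedReceiver(long lastId) {
        this.lastId = lastId;
    }

    /** Duplicate: already processed, or already buffered as an early arrival. */
    boolean isRepeat(long id) {
        return id <= lastId || notConsistMsgMap.containsKey(id);
    }

    /** Consecutive: exactly the ID that follows lastId. */
    boolean isConsist(long id) {
        return id == nextId(lastId);
    }

    /** Assumption: IDs within a session increase by 1. */
    long nextId(long id) {
        return id + 1;
    }

    synchronized void offer(long id, String message) {
        if (isRepeat(id)) {
            acks.add(id); // as in the article: acknowledge a duplicate again
            return;
        }
        if (!isConsist(id)) {
            notConsistMsgMap.put(id, message); // buffer until the gap is filled
            return;
        }
        process(id, message);
        // Drain buffered messages that have now become consecutive
        String next;
        while ((next = notConsistMsgMap.remove(nextId(lastId))) != null) {
            process(nextId(lastId), next);
        }
    }

    private void process(long id, String message) {
        processed.add(message);
        acks.add(id);
        lastId = id;
    }
}
```

With lastId = 1, offering id=3 and then id=2 processes the two messages in ID order, and offering id=2 a second time only triggers another ACK.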

Security

Both chat logs and offline messages are backed up on the server, so message security and the protection of users' privacy are critical. Therefore, all messages must be encrypted. In the storage module, two basic tables maintain user information and the relation chain: the im_user user table and the im_relation relation table.

  • The im_user table stores common user information, such as usernames and passwords.
  • The im_relation table records friend relationships. Its structure is as follows:

    CREATE TABLE `im_relation` (
      `id` bigint(20) COMMENT 'id',
      `user_id1` varchar(100) COMMENT 'user id',
      `user_id2` varchar(100) COMMENT 'user id',
      `encrypt_key` char(33) COMMENT 'AES key',
      `gmt_create` timestamp DEFAULT CURRENT_TIMESTAMP,
      `gmt_update` timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      UNIQUE KEY `USERID1_USERID2` (`user_id1`,`user_id2`)
    );

  • user_id1 and user_id2 are the user IDs of the two friends. To avoid duplicate rows, the pair is always stored so that user_id1 < user_id2, and a unique composite index is added on the two columns.
  • encrypt_key is a randomly generated key. When the client logs in, it retrieves all of the user's relations from the database and keeps them in memory for subsequent encryption and decryption.
  • When a client sends a message to a friend, it takes the key of that relation from memory, encrypts the message with it, and sends the result. Similarly, when a message is received, the corresponding key is used to decrypt it.
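As an illustration of per-relation encryption, here is a minimal sketch using the JDK's javax.crypto API. The article only says that the key is an AES key. The AES/GCM mode, the 128-bit key size, the IV-prefixed output layout, and the names RelationCrypto and orderedPair are assumptions made for this example. The orderedPair helper also assumes that plain string comparison matches the database's ordering of user IDs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Encrypts and decrypts chat content with a per-relation AES key (hypothetical helper). */
class RelationCrypto {
    private static final int IV_BYTES = 12;  // recommended IV length for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a random key, as would be stored in encrypt_key for a new relation. */
    static SecretKey newRelationKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV || ciphertext || tag, suitable for a binary content column. */
    static byte[] encrypt(SecretKey key, String plainText) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // a fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(); fails if the data was tampered with or the key is wrong. */
    static String decrypt(SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Orders two user IDs so that the first is the smaller one (user_id1 < user_id2). */
    static String[] orderedPair(String a, String b) {
        return a.compareTo(b) < 0 ? new String[] {a, b} : new String[] {b, a};
    }
}
```

A client would look up the key by the ordered pair of user IDs, call encrypt before sending, and call decrypt on receipt.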

The complete client login process is as follows:

  1. The client invokes the REST interface to log in.
  2. The client invokes the REST interface to obtain all of the user's relations.
  3. The client sends a greet message to the connector.
  4. The connector pulls offline messages and pushes them to the client.
  5. The connector updates the user's session.

Why does the connector push offline messages before updating the session? Let's think about what would happen if the order were reversed:

  1. The user Alice logs in to the server.
  2. The connector updates the session.
  3. The connector pushes offline messages.
  4. At this point Bob sends a message to Alice.

If the offline messages are still being pushed when Bob sends a new message to Alice, the server pushes the new message immediately after obtaining Alice's session. In this case, the new message may be interleaved with the offline messages, and the messages Alice receives would be out of order.

We must ensure that offline messages are delivered before new messages.

So we push the offline messages first and update the session afterwards. While the offline messages are being pushed, Alice's status is still "offline", so any new message from Bob is simply written to the im_offline table. Alice only becomes "online" after all data in the im_offline table has been read. This avoids the disorder.
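The effect of this ordering can be seen in a small single-threaded model. This is only a toy sketch under stated assumptions: LoginOrderDemo, route, onGreet and the midPush hook (which simulates Bob sending a message while the push is in progress) are hypothetical names, and an in-memory queue stands in for the im_offline table.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Single-threaded toy model of routing and login order on the connector (hypothetical). */
class LoginOrderDemo {
    private final Set<String> sessions = new HashSet<>();                  // users considered online
    private final Map<String, Deque<String>> imOffline = new HashMap<>();  // stands in for im_offline
    final List<String> pushed = new ArrayList<>();                         // messages in push order

    /** Pushes directly if the recipient has a session, otherwise stores the message offline. */
    void route(String toUser, String msg) {
        if (sessions.contains(toUser)) {
            pushed.add(msg);
        } else {
            imOffline.computeIfAbsent(toUser, k -> new ArrayDeque<>()).add(msg);
        }
    }

    /**
     * Steps 4 and 5 of the login flow: drain offline messages first, then update the session.
     * midPush runs once after the first offline message has been pushed.
     */
    void onGreet(String user, Runnable midPush) {
        Deque<String> offline = imOffline.computeIfAbsent(user, k -> new ArrayDeque<>());
        boolean hookFired = false;
        while (!offline.isEmpty()) {
            pushed.add(offline.poll());
            if (!hookFired) {
                hookFired = true;
                midPush.run(); // a new message arrives while Alice still counts as offline
            }
        }
        sessions.add(user); // only now do new messages go straight to the client
    }
}
```

If m1 and m2 are stored while Alice is offline and m3 is routed from the midPush hook, m3 is appended to the offline queue and pushed after m2, so the order m1, m2, m3 is preserved. A message routed after onGreet returns is pushed directly.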

Storage design

Storing offline messages

When the user is offline, the offline message must be stored on the server and pushed after the user goes online. After understanding the previous section, storing offline messages is easy. Add an offline message table im_offline. The table structure is as follows:

    CREATE TABLE `im_offline` (
      `id` int(11) COMMENT 'id',
      `msg_id` bigint(20) COMMENT 'message id',
      `msg_type` int(2) COMMENT 'message type',
      `content` varbinary(5000) COMMENT 'message content',
      `to_user_id` varchar(100) COMMENT 'recipient id',
      `has_read` tinyint(1) COMMENT 'whether the message has been read',
      `gmt_create` timestamp COMMENT 'create time',
      PRIMARY KEY (`id`)
    );

msg_type distinguishes message types (chat, ACK). The encrypted message content is stored as a byte array. When a user comes online, records are pulled with the condition to_user_id = the user's ID.

Prevent repeated push of offline messages

Let’s consider the case of multiple logins, where Alice has two devices logged in at the same time. In this case, we need some mechanism to ensure that offline messages are read only once.

A CAS (compare-and-set) mechanism is used to achieve this:

  1. First, select all records where has_read = false.
  2. For each message, check whether has_read is false and, if so, set it to true. This is a single atomic operation:

    update im_offline set has_read = true where id = ${msg_id} and has_read = false

  3. If the update succeeds, push the message. If it fails, do not push it.
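The claim step can be illustrated with an in-memory stand-in for the table. This sketch is an assumption-laden analogy rather than the article's database code: OfflineClaimDemo, store, tryClaim and pull are hypothetical names, and AtomicBoolean.compareAndSet plays the role of the conditional update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** In-memory stand-in for im_offline that shows the CAS claim (hypothetical). */
class OfflineClaimDemo {
    // message id -> has_read flag
    private final Map<Long, AtomicBoolean> hasRead = new ConcurrentHashMap<>();

    void store(long id) {
        hasRead.put(id, new AtomicBoolean(false));
    }

    /**
     * In-memory equivalent of:
     * update im_offline set has_read = true where id = ? and has_read = false
     * Returns true only for the single caller that flips the flag.
     */
    boolean tryClaim(long id) {
        AtomicBoolean flag = hasRead.get(id);
        return flag != null && flag.compareAndSet(false, true);
    }

    /** What one device does at login: read unread rows, then push only the claimed ones. */
    List<Long> pull() {
        List<Long> candidates = new ArrayList<>();
        for (Map.Entry<Long, AtomicBoolean> e : hasRead.entrySet()) {
            if (!e.getValue().get()) {
                candidates.add(e.getKey()); // step 1: has_read = false
            }
        }
        List<Long> toPush = new ArrayList<>();
        for (Long id : candidates) {
            if (tryClaim(id)) {             // steps 2 and 3: push only if the CAS succeeded
                toPush.add(id);
            }
        }
        return toPush;
    }
}
```

Against a real database, the same decision can be made from the affected-row count that JDBC's PreparedStatement.executeUpdate returns for the update statement: 1 means this device claimed the message, 0 means another device already did.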

I believe that by now you can build a complete and usable IM server on your own. Feel free to leave questions in the comments section.

GitHub link: github.com/yuanrw/IM. If you found this helpful, please give it a star!