This article was originally shared by the Rongyun technical team. The original title, "Message Discarding Strategy for Mass Message Distribution in Chat Rooms", has been revised.

1. Introduction

With the popularity of livestreaming applications, and especially of livestream e-commerce ("livestreaming with goods"), live rooms with very large audiences have become the norm.

Real-time interaction in such heavily populated live rooms is extremely frequent. Technically, it takes the form of real-time messages of many kinds: user chat, bullet-screen comments, gifts, likes, mutes, system notices, and so on.

Distributing such a massive volume of real-time messages without bringing down the server, and without flooding the client and causing the app to lag (and thus hurting the user experience), clearly calls for dedicated techniques and implementation strategies.

Conceptually, real-time message distribution in a live room is the same as in a traditional online chat room; it is only that in the traditional Internet era the number of simultaneously online users was never this large. The magnitude differs, but the technical model fully applies.

Drawing on our livestreaming practice, this article shares our accumulated experience with real-time message distribution in a single live room with a million online users, and we hope it gives you some inspiration.

2. Series of articles

This is the sixth in a series of articles:

  • Chat Technology of Live Broadcast Systems (I): The Road to Practical Real-time Push for the Million-user Online Meipai Live Barrage System
  • Chat Technology of Live Broadcast Systems (II): Technical Practice of Alibaba's E-commerce IM Messaging Platform in Group Chat and Live Broadcast
  • Chat Technology of Live Broadcast Systems (III): Evolution of the Message Architecture for a Single 15-Million-user Online WeChat Live Chat Room
  • Chat Technology of Live Broadcast Systems (IV): Evolution of Baidu Live's Real-time Messaging Architecture for Massive Numbers of Users
  • Chat Technology of Live Broadcast Systems (V): Cross-process Rendering and Stream Pushing in WeChat Mini Game Livestreaming on Android
  • Chat Technology of Live Broadcast Systems (VI): Real-time Chat Message Distribution in a Live Room with a Million Users Online (* this article)

3. Technical challenges

Let's take a live room watched by a million people as an example and look at the technical challenges involved.

1) A live broadcast sees wave after wave of message peaks, such as "screen-flooding" messages: massive numbers of real-time messages sent by many users at the same moment, usually with largely identical content. If every one of them were displayed, the client would suffer lag and message delays, seriously hurting the user experience.

2) With such massive message volume, if the server stored every message for a long time, cache usage would surge, and memory and storage would become the performance bottleneck.

3) In other scenarios, such as notification messages generated by a room administrator's actions or system notices, the messages are generally more important. How do we prioritize their delivery rate?

To meet these challenges, our services have to be optimized around the business scenario.

4. Architectural model

Our architecture model diagram is as follows:

As shown in the figure above, the main services are briefly described below.

1) Live broadcast service:

Its main function is to cache the basic information of a live room: the member list, mute/block relationships, whitelisted users, and so on.

2) Message service:

Each node caches user-relationship information and the queues of messages awaiting processing.

Specifically, it does two main things.

User-relationship synchronization with the live broadcast service:

  • A) When a member actively joins or leaves, the live broadcast service syncs the change to the message service;

  • B) When a user is found to be offline during message distribution, the message service syncs this back to the live broadcast service.

Message sending:

  • A) After performing the necessary checks, the live broadcast service broadcasts the message to the message services;

  • B) The live broadcast service itself does not cache message content.

3) Zk (Zookeeper):

Its main function is service registration: each service instance registers itself in Zk, and the registration data is used to compute the routing target of traffic between services.

Specifically:

  • A) Live broadcast service: traffic is routed by live-room ID;

  • B) Message service: traffic is routed by user ID (a routing sketch follows below).
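
To make this routing concrete, here is a minimal Go sketch of hashing a room ID or user ID onto a registered instance. The instance lists and the plain modulo hash are illustrative assumptions on our part; the article does not prescribe a specific algorithm, and a production system would more likely use consistent hashing:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickInstance hashes a routing key (live-room ID or user ID) onto one
// of the service instances registered in Zookeeper. A plain modulo hash
// is shown for brevity; consistent hashing would move fewer keys when
// the instance list changes.
func pickInstance(key string, instances []string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return instances[h.Sum32()%uint32(len(instances))]
}

func main() {
	liveRoomNodes := []string{"live-1:8000", "live-2:8000"}            // hypothetical
	messageNodes := []string{"msg-1:9000", "msg-2:9000", "msg-3:9000"} // hypothetical

	fmt.Println(pickInstance("room-42", liveRoomNodes))  // route by room ID
	fmt.Println(pickInstance("user-1001", messageNodes)) // route by user ID
}
```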

4) Redis:

Primarily used as a level-2 cache, and as a backup of in-memory data when a service is upgraded (restarted).

5. Overall message distribution design

The complete message-distribution logic of the live broadcast service consists mainly of a message distribution flow and a message pull flow.

5.1 Message Distribution Process

As shown in the figure above, our message distribution process consists of the following steps:

  • 1) When user A sends a message in the live room, the live broadcast service processes it first;

  • 2) The live broadcast service syncs the message to each message-service node;

  • 3) Each message-service node delivers a pull notification to all members cached on that node;

  • 4) As shown for "message service-1", a pull notification is sent to user B (a sketch of this fan-out follows this list).
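
The following Go sketch models steps 2 and 3. All type and function names are illustrative, not Rongyun's actual API; this and the later sketches assume a single illustrative chatroom package:

```go
package chatroom

// Message is a live-room message as it flows through the system
// (fields are illustrative).
type Message struct {
	RoomID string
	Body   string
	Ts     int64 // send time, unix milliseconds
}

// MessageService models one message-service node, which caches the
// members routed to it by user ID.
type MessageService struct {
	localMembers map[string]bool // user IDs cached on this node
}

// OnBroadcast runs when the live broadcast service syncs a message to
// this node (step 2); the node notifies every locally cached member to
// pull (step 3).
func (s *MessageService) OnBroadcast(msg Message, notifyPull func(userID string)) {
	for userID := range s.localMembers {
		notifyPull(userID)
	}
}

// Broadcast is the live broadcast service side: after validation, fan
// the message out to every message-service node.
func Broadcast(msg Message, nodes []*MessageService, notifyPull func(string)) {
	for _, node := range nodes {
		node.OnBroadcast(msg, notifyPull)
	}
}
```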

In addition, because of the large message volume, the distribution process includes a notification merge mechanism, which applies to step 3 above.

The notification merge mechanism of step 3 works as follows:

  • A) Add each member to the to-be-notified queue (if the member is already queued, just refresh the notification message time);

  • B) A delivery thread polls the to-be-notified queue;

  • C) Send a pull notification to each user in the queue.

Through the notification merge mechanism, we guarantee that in any one round the delivery thread sends at most one pull notification to the same user; that is, multiple messages are merged into a single notification. This markedly improves server performance and reduces network traffic between the client and the server.
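
A minimal Go sketch of such a merge queue, assuming an in-memory pending map per message-service node; because enqueueing the same user twice only refreshes the timestamp, the "one notification per user per round" guarantee falls out naturally:

```go
package chatroom

import (
	"sync"
	"time"
)

// Notifier implements the notification-merge queue on one
// message-service node (names are illustrative).
type Notifier struct {
	mu      sync.Mutex
	pending map[string]time.Time // userID -> time of the latest message
}

func NewNotifier() *Notifier {
	return &Notifier{pending: make(map[string]time.Time)}
}

// Enqueue adds a member to the to-be-notified queue, refreshing the
// message time if the member is already queued (step A).
func (n *Notifier) Enqueue(userID string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.pending[userID] = time.Now()
}

// Run polls the pending queue once per interval (step B) and sends a
// single pull notification per queued user (step C), however many
// messages arrived for that user since the last round.
func (n *Notifier) Run(interval time.Duration, sendPull func(userID string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		n.mu.Lock()
		batch := n.pending
		n.pending = make(map[string]time.Time)
		n.mu.Unlock()
		for userID := range batch {
			sendPull(userID)
		}
	}
}
```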

PS: With a large message volume, the notification merge mechanism above is well suited to an Actor-style distributed implementation. Interested readers can further study "The Actor Model Works So Well for Distributed High Concurrency" and "Distributed Computing Techniques: the Actor Computing Model".

5.2 Message Pull Process

As shown in the figure above, our message pull process mainly consists of the following steps:

  • 1) After receiving the notification, user B sends a message-pull request to the server;

  • 2) The request is handled by the "message service-1" node;

  • 3) "Message service-1" returns a list of messages from the message queue, based on the timestamp of the last message supplied by the client (see the pull logic below ▼);

  • 4) User B receives the new messages.

The specific pull logic of step 3 above is shown in the figure below:
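
In Go, the pull step might look like the sketch below, reusing the Message type from the fan-out sketch and assuming the per-room queue is kept ordered by send time:

```go
package chatroom

import "sort"

// RingQueue is the bounded, time-ordered message queue the message
// service keeps per live room (a simplified model of the ring queue in
// the figure).
type RingQueue struct {
	msgs []Message // ordered by Ts, oldest first
	max  int       // maximum length before the oldest message is dropped
}

// PullSince implements step 3: return every cached message newer than
// the timestamp of the last message the client saw.
func (q *RingQueue) PullSince(lastTs int64) []Message {
	i := sort.Search(len(q.msgs), func(i int) bool {
		return q.msgs[i].Ts > lastTs
	})
	return q.msgs[i:]
}
```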

6. Discard policy for message distribution

For users in a live room, many messages actually carry little practical value, for example the masses of repeated screen-flooding messages, activity notifications, and the like. To improve the user experience, this type of message can be strategically discarded. (This is the biggest difference from real-time chat in IM: in IM, discarding messages is never allowed.)

PS: It is the discard strategy for message distribution, together with the notification merge mechanism of the previous section, that makes it possible to distribute massive volumes of live-room messages stably and smoothly.

Our discard strategy consists of the following three parts:

  • 1) the uplink rate-limit (discard) policy;

  • 2) the downlink rate-limit (discard) policy;

  • 3) the anti-discard policy for important messages.

As shown below:

Let’s explain them one by one.

1) Uplink rate-limit (discard) policy:

For uplink rate limiting, the default limit is 200 messages per second, adjustable to business needs. Messages sent after the limit is reached are discarded by the live broadcast service and are never synced to the message-service nodes.
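
One possible shape for this limiter is a fixed one-second window counter, as sketched below; the article does not say which rate-limiting algorithm is actually used, so this is only an assumption:

```go
package chatroom

import (
	"sync"
	"time"
)

// UplinkLimiter enforces the per-room uplink limit with a fixed
// one-second window. The default limit is 200 messages per second.
type UplinkLimiter struct {
	mu     sync.Mutex
	window int64 // current one-second window, unix seconds
	count  int   // messages accepted in the current window
	limit  int   // e.g. 200, adjustable per business needs
}

// Allow reports whether a message may be fanned out; once the limit is
// reached, further messages in the same second are discarded by the
// live broadcast service and never reach the message-service nodes.
func (l *UplinkLimiter) Allow(now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if sec := now.Unix(); sec != l.window {
		l.window, l.count = sec, 0
	}
	if l.count >= l.limit {
		return false // discard: over the per-second limit
	}
	l.count++
	return true
}
```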

2) Downlink rate-limit (discard) policy:

Downlink rate limiting is controlled through the length of the message ring queue (see the detailed pull logic in "5.2 Message Pull Process"): once the queue reaches its maximum length, the oldest messages are discarded.

In addition, after each pull notification is sent, the server marks the user as "pulling"; once the user has actually pulled the messages, the mark is removed.

The role of the "pulling" mark: when a new message arrives for a user who still carries the mark, and less than 2 seconds have passed since the mark was set, no further notification is sent (this reduces pressure on the client; only the notification is discarded, never the message). If more than 2 seconds have passed, the user is notified again. (If several consecutive notifications fail to trigger a pull, the user-offline handling kicks in, so notifications are not repeated endlessly.)

Whether messages are discarded therefore depends on the client's pull speed (which is affected by client performance and network conditions). A client that pulls messages promptly loses none at all.
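
Continuing the ring-queue sketch, the two downlink mechanisms might look as follows; the 2-second threshold comes from the text above, while the concrete shapes are assumed:

```go
package chatroom

import "time"

// Push appends a message to the ring queue; once the maximum length is
// reached, the oldest message is discarded (the downlink discard).
func (q *RingQueue) Push(m Message) {
	if len(q.msgs) >= q.max {
		q.msgs = q.msgs[1:] // drop the oldest message
	}
	q.msgs = append(q.msgs, m)
}

// shouldNotify models the "pulling" mark: a user notified less than
// 2 seconds ago is not notified again for new messages. Only the
// notification is suppressed; the message stays in the queue.
func shouldNotify(markedAt, now time.Time) bool {
	return now.Sub(markedAt) >= 2*time.Second
}
```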

3) Anti-discard policy for important messages:

As mentioned earlier, in the livestreaming scenario some messages should have higher priority and must never be discarded.

For example, the notification messages generated by a room administrator's actions, or system notices.

For this scenario, we introduced the notions of a message whitelist and message priority to guarantee that such messages are not discarded. As the figure at the start of this section shows, there can be more than one message ring queue; keeping important messages separate from ordinary live-room messages ensures that they are never dropped.
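
A sketch of this whitelist routing is shown below; the msgType key and the whitelist shape are assumptions made for illustration, not taken from real Rongyun code:

```go
package chatroom

// RoomQueues separates important messages from ordinary chat, so that
// trimming the ordinary ring queue never touches them.
type RoomQueues struct {
	normal    *RingQueue
	important []Message       // admin/system messages: never trimmed
	whitelist map[string]bool // message types that must not be dropped
}

// Route sends whitelisted message types to the protected queue and
// everything else to the bounded ring queue.
func (r *RoomQueues) Route(m Message, msgType string) {
	if r.whitelist[msgType] {
		r.important = append(r.important, m)
		return
	}
	r.normal.Push(m)
}
```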

Together, the uplink and downlink rate-limit (discard) policies ensure that:

  • 1) the client will not lag or fall behind because of massive message volume;

  • 2) the screen is never flooded with more messages than the eye can follow;

  • 3) the server's storage pressure stays low, so the service never hits a memory bottleneck from massive message volume.

7. Final words

As the mobile Internet develops, the real-time messaging business models and loads of live rooms keep expanding and changing, and more challenges no doubt lie ahead. Our service will keep pace with the times and keep delivering better solutions and strategies to meet them.


(This article has been simultaneously published at: www.52im.net/thread-3799…)