The original title of this article is "Practice of Baidu Live Message Service Architecture". It was first shared by the Baidu App Message Center team on the "Baidu Geek Talk" public account. To make the content easier to read, the layout has been optimized and the content re-divided; the link to the original article appears at the end.

1. Introduction

A complete live broadcasting system has two core functions:

1) Real-time audio/video push and pull streaming;

2) Sending and receiving the message stream of the live room (chat messages, bullet comments/barrage, control instructions, etc.).

This article shares the architectural design practice and evolution of the Baidu Live message system.

Although the chat interaction in a live room takes the form of an ordinary IM chat message stream, the live message stream carries far more than user chat.

Beyond chat, real-time notifications of users' interactive behaviors in the room (gift sending, room entry, likes, purchases, product recommendations by the host, requests to join the mic, and so on) are also delivered as messages.

In addition, some special scenarios, such as closing a live room or switching the live stream, also rely on real-time delivery of the message stream.

The message stream can therefore be regarded as the basic capability supporting real-time interaction between host and audience and real-time control of the live room. If real-time audio/video streaming is the soul of a live broadcasting system, the message stream can be called its skeleton, and its importance is self-evident.

So how do you build a live message system, and what challenges have to be solved? This article walks through both questions.


(This article is also published at: http://www.52im.net/thread-35… )

2. A series of articles

This is the fourth article in a series:

"Chat Technology of Live Broadcast Systems (Part 1): Real-Time Push Technology Practice of the Million-Online Meipai Live Barrage System"

"Chat Technology of Live Broadcast Systems (Part 2): Technical Practice of the Alibaba E-commerce IM Messaging Platform in Group Chat and Live Broadcast Scenarios"

"Chat Technology of Live Broadcast Systems (Part 3): Evolution of the Message Architecture Behind WeChat Live Chat Rooms with 15 Million Online in a Single Room"

"Chat Technology of Live Broadcast Systems (Part 4): Architecture Evolution Practice of a Real-Time Messaging System for Massive Users in Baidu Live" (this article)

3. Differences from ordinary IM group chat

Live-room chat is often compared with the ordinary IM group chat function.

Group chat is a familiar IM scenario. Live-room chat and group chat have similarities, but also essential differences.

Comparing the characteristics of the two, the main differences between live messaging and IM group chat are:

1) Different number of participants: an IM group chat has participants in the thousands. A hot, large-scale live broadcast (National Day parade, Spring Festival Gala, and so on) can accumulate millions or even tens of millions of users in a single room, with millions online at the same time.

2) Different organizational relationships: joining and leaving an IM group are relatively low-frequency operations, so the member set is fairly stable and changes slowly. Users enter and leave a live room very frequently; a hot room faces tens of thousands of users coming in and out every second.

3) Different duration: once created, an IM group may last a long time, from a few days to a few months. Most live rooms last no more than a few hours.

4. Core technical challenges

Based on the comparison between live messaging and IM group chat in the previous section, we can extract two core technical challenges for the messaging system of a live broadcast.

Challenge one: maintaining the user set of a live room

1) Tens of thousands of users enter and leave a single room every second (in practice, the peak is no more than 20,000 QPS for entering and no more than 20,000 QPS for leaving);

2) Millions of users are online at the same time in a single room;

3) The cumulative user count of a single room reaches tens of millions.

Maintaining these two sets (millions online, tens of millions cumulative) under roughly 40,000 QPS of updates per second brings some pressure, but storage with high read/write performance, such as Redis, can handle it.
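To make this concrete, below is a minimal sketch of such set maintenance on Redis, assuming the go-redis v9 client; the key layout, hash tags and TTL are illustrative assumptions, not Baidu's actual schema.

```go
package presence

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// RoomPresence tracks the online set and the cumulative set of one live room.
type RoomPresence struct {
	rdb    *redis.Client
	roomID string
}

// Enter records a user entering the room: O(1) SADD into both sets.
func (p *RoomPresence) Enter(ctx context.Context, userID string) error {
	online := fmt.Sprintf("room:{%s}:online", p.roomID)
	total := fmt.Sprintf("room:{%s}:total", p.roomID)
	pipe := p.rdb.Pipeline()
	pipe.SAdd(ctx, online, userID)
	pipe.SAdd(ctx, total, userID)
	pipe.Expire(ctx, online, 24*time.Hour) // assumption: rooms rarely outlive a day
	_, err := pipe.Exec(ctx)
	return err
}

// Leave removes the user from the online set only; the cumulative set keeps growing.
func (p *RoomPresence) Leave(ctx context.Context, userID string) error {
	return p.rdb.SRem(ctx, fmt.Sprintf("room:{%s}:online", p.roomID), userID).Err()
}

// Counts returns (online, cumulative) sizes; SCARD is O(1) in Redis.
func (p *RoomPresence) Counts(ctx context.Context) (online, total int64, err error) {
	online, err = p.rdb.SCard(ctx, fmt.Sprintf("room:{%s}:online", p.roomID)).Result()
	if err != nil {
		return 0, 0, err
	}
	total, err = p.rdb.SCard(ctx, fmt.Sprintf("room:{%s}:total", p.roomID)).Result()
	return online, total, err
}
```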

Challenge two: message delivery with millions of users online

With millions of users online, the message volume is huge in both directions. From the perspective of the live audience:

1) Message real-time: if the message server simply smooths out peaks, queued peak messages increase overall latency, and the delay can accumulate; messages then drift far from the video stream on the timeline, hurting the real-time interaction of viewers.

2) Client experience and performance: the client displays all kinds of user chat and system messages, generally no more than 10 to 20 per screen. If more than 20 messages arrive per second, the on-screen list refreshes almost continuously. Counting gift special effects as well, processing and rendering a large volume of messages puts a sustained high load on the client; for a client watching for a long time, a continuous flood of messages creates significant performance pressure, and the effect accumulates.

Since challenge one is not hard to solve, the rest of this article focuses on challenge two.

5. Technical design objectives

Considering the technical challenges above and the live business scenarios, we can define fairly clear targets for the message system.

The technical design objectives are roughly as follows:

1) Real-time: end-to-end message latency should be within seconds;

2) Performance: the message service should support delivery to more than one million simultaneously online users in a single room;

3) Peak handling: for excess messages at the peak, discarding is a reasonable and appropriate strategy;

4) Based on a reasonable end-user experience, the number of messages delivered per second in a single room is capped at some N.

The core problem thus becomes: how to deliver no more than N messages per second to millions of users in a room within S seconds (assume N <= 20 and S <= 2).

6. Drawing inspiration from ordinary IM group chat

6.1 How ordinary IM group chat sends and receives messages

IM group chat data flow and pressure points:

As shown in the figure above, let's first walk through the send/receive flow of an ordinary group chat message in detail:

1) For group Group-1, allocate a group public message mailbox group-mbox-1;

2) User-1 in Group-1 sends message msg-1 from mobile app APP-1;

3) When the server receives msg-1, it checks whether User-1 has permission; if so, it stores msg-1 into group-mbox-1 and generates the corresponding msgID-1;

4) The server looks up the group member list GroupUserList-1;

5) Based on GroupUserList-1, it splits out all individual members of the group: User-1, User-2 ... User-N;

6) For each user User-i, it then has to query the user's devices device-i-1, device-i-2 ... device-i-m (one account may be logged in on multiple devices);

7) For each device device-i-j, the long-connection layer maintains an independent long connection connect-j serving that device; since connect-j is established dynamically by APP-1 on the device, resolving the mapping between device-i-j and connect-j requires a routing service Route;

8) Once connect-j is found, the notification groupmsg-notify-1 can be sent down through connect-j (steps 4-8 are sketched after this list);

9) If APP-1 on device-i-j is online, User-i immediately receives groupmsg-notify-1 over connect-j;

10) After receiving groupmsg-notify-1, the message SDK inside APP-1 issues a pull request fetchMsg to the server based on latestMsgID, the ID of the last message in its local history, pulling all messages of Group-1 from latestMsgID+1 up to the newest;

11) On receiving fetchMsg, the server reads the corresponding messages from group-mbox-1 and returns them to the client; if there are too many, paging may be required;

12) The app displays all messages of Group-1 from latestMsgID+1 to the newest; after the user reads them in the session, the read state of the new messages or of the session has to be set.
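To make the notify path (steps 4 to 8) concrete, here is a minimal sketch in Go; the storage, routing and long-connection interfaces are hypothetical stand-ins for the services described above, not their real APIs.

```go
package groupchat

import "context"

// Hypothetical interfaces standing in for the real storage, routing and
// long-connection services in the flow above.
type GroupStore interface {
	Members(ctx context.Context, groupID string) ([]string, error) // steps 4-5
	Devices(ctx context.Context, userID string) ([]string, error)  // step 6
}
type Router interface {
	Lookup(ctx context.Context, deviceID string) (connID string, err error) // step 7
}
type ConnService interface {
	Notify(ctx context.Context, connID string, n GroupMsgNotify) error // step 8
}

type GroupMsgNotify struct {
	GroupID string
	MsgID   int64
}

// FanOut walks member -> device -> connection and pushes a notify to each.
// For a million-member group, every stage becomes a million-scale query per
// message, which is exactly the pressure analysed in section 6.2.
func FanOut(ctx context.Context, gs GroupStore, rt Router, cs ConnService,
	groupID string, msgID int64) error {
	users, err := gs.Members(ctx, groupID)
	if err != nil {
		return err
	}
	for _, uid := range users {
		devices, err := gs.Devices(ctx, uid)
		if err != nil {
			continue // a real system would retry or report the error
		}
		for _, dev := range devices {
			connID, err := rt.Lookup(ctx, dev)
			if err != nil {
				continue
			}
			_ = cs.Notify(ctx, connID, GroupMsgNotify{GroupID: groupID, MsgID: msgID})
		}
	}
	return nil
}
```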

6.2 Main pressure of ordinary IM group chat

If this whole flow, from sending the notification to the client pull, is fully reused, then for a single message msg-1 sent by User-1, supporting real-time group messaging at the million scale runs into roughly the following million-per-second challenges.

First: reading out the million-entry member list GroupUserList-1 within seconds is the first million-scale challenge for storage and services.

Second: splitting out every individual user User-i of the group means querying millions of devices device-i-j within seconds, the second million-scale challenge for storage and services.

Third: for all those device-i-j, querying millions of connections connect-j through the dynamic routing service Route within seconds is the third million-scale challenge for storage and services.

Fourth: when delivering through the long connections connect-j, the long-connection service must push millions of groupmsg-notify-1 notifications within seconds, a million-scale challenge for the long-connection service.

Fifth: every client APP-1 that receives the notification then issues a fetchMsg pull request, so the message mailbox service must sustain one million QPS of pulls, also a million-scale challenge; and since each client's latestMsgID may differ, possible optimizations become more complex and more costly.

Sixth: if the vast majority of users chat while online, setting the read state also puts millions of QPS of pressure on the server.

Clearly, fully reusing the group-chat message flow puts enormous pressure on both the messaging service and the long-connection service.

6.3 General IM group chat optimization scheme

Pressure points after optimized IM group chat data flow:

As shown in the figure above, let's examine whether each of these million-scale challenges leaves room for optimization:

1) For ① splitting the user list and ② querying each user's devices: if the two stores are merged, that is, the room's user-list storage is extended to also carry device information, one million-QPS user-to-device lookup can be removed, so this can be optimized;

2) For ④ the downlink notification plus ⑤ the reliable pull fetchMsg: since live messages tolerate partial loss and discarding, a one-way push without a pull is acceptable for most users whose connections stay online; so this can be optimized to keep only the downlink notification (carrying the message body) and drop the client pull;

3) For ⑥ setting the read state: in a live scenario this can simply be dropped.

The optimizations above remove the million-scale pressure of ②, ⑤ and ⑥, but three million-scale steps remain: ① splitting the user list, ③ dynamic routing queries, and ④ long-connection delivery.

For ① splitting the user list: to support million-scale user-list queries, the conventional idea is batch queries keyed by groupID, e.g. 100 users per query, so 10,000 QPS of queries covers a million users; keyed by groupID, the user data can be spread over multiple master-slave instances and shards, and with a reasonable sharding granularity hot spots can be avoided. This is basically achievable, but it may consume a lot of storage resources.

For ③ dynamic routing queries: on the surface this looks like ①, but it is somewhat different. The group's user list is keyed by groupID and kept in one table or a few sharded tables, whereas device-i-j lookups are completely scattered. Batch querying is also needed here, but a completely scattered device-information query cannot be optimized around specific keys; the dynamic routing service has to support millions of QPS of queries overall.

For ④ long-connection delivery: since the long-connection service does not depend on external storage, if a single instance can deliver at the 10,000 QPS level, then 100 instances can deliver at the million level overall.

Based on the above analysis, supporting million-scale message delivery seems, at first glance, not that hard: just optimize the user-list and dynamic-routing storage and queries and scale out the long-connection capacity. The catch is that all of this consumes a large amount of storage and machine resources.

Considering the realities of the live business, the picture is not optimistic:

1) On the one hand, when there is no hot broadcast, the peak concurrent online count of a single room may be under 10,000 or even under 1,000, and in the early stage of the business the overall peak of online live viewers may be under 100,000. This means the resources reserved for a million-scale peak sit tens of times over-provisioned most of the time;

2) On the other hand, if an extremely hot broadcast arrives, it may need to support not one million but more than five million online users (for example the National Day military parade or the Spring Festival Gala). In that case, every big broadcast requires estimating the possible online peak in advance; if the estimate exceeds the current capacity, ① the user list, ③ the dynamic routing query and ④ the long-connection service each have to be scaled out and load-tested, or, where acceptable, degraded or rejected.

In practice, it is hard to estimate the online peak accurately, which leads to low actual resource utilization, frequent scaling operations and high operations cost. Whether to choose this scheme is therefore a real dilemma.

6.4 Multi-group scheme for ordinary group chat

A scheme of splitting the audience into multiple groups has also been proposed.

For example, if one group supports at most 10,000 users, opening 100 groups can cover a million users; creating a virtual group tying these 100 groups together looks feasible.

However, careful analysis shows that the problems above (① splitting the user list, ③ dynamic routing queries, ④ long-connection delivery) still carry high pressure that cannot be avoided.

In addition, multiple groups introduce other problems:

1) Problem 1: messages are not synchronized across groups. If two users watch the same broadcast together but belong to different groups, they see completely different messages.

2) Problem 2: in a live scenario users come and go dynamically, so group membership is very unstable and the online peak fluctuates widely. If new groups are opened dynamically as the audience grows, the first group may be crowded while a newly opened group starts with few users; if many groups are opened at the peak, users scatter as the heat fades, leaving some groups with few users and little chat interaction, which then calls for shrinking and merging groups. Balancing users across groups to get a good product result is quite hard.

Based on the above analysis, we did not choose the multi-group scheme.

7. Message architecture practice based on the multicast (mcast) scheme

Following the analysis of the IM-group-chat-style design in the previous section, this section introduces the proposal and evolution of the multicast mcast scheme, our live message architecture that supports millions of simultaneously online users with high concurrency in real time.

7.1 Thinking outside the original framework

Should we adopt the IM-group-chat-based optimization scheme from the previous section, or find another way?

Setting aside the group send/receive flow for a moment: if one step is indispensable for delivering a message, it is the push over the long connection. A message never reaches the user without being sent over a long connection.

Of course, polling could replace long-connection push for fetching messages, but its performance pressure and real-time behavior are far worse, so it is not discussed here.

If, when delivering a message, we could simply hand the long-connection service something like a groupID, and the long-connection service could directly split the message to all group-related connections connect-j that it holds, the million-scale user-list split and dynamic routing queries could be skipped entirely.

In that case, the delivery pressure is borne mainly by the long-connection service; the rest of the server side needs no multi-system scaling, and the optimization of live messages can be greatly simplified.

Following this idea: effectively, the long-connection service also builds groups, but groups of connections (connect). Based on this idea of connection groups, we designed the long-connection multicast mcast mechanism.

The basic concepts of long connection multicast are summarized as follows:

1) Each long-connection multicast mcast has a globally unique identifier mcastID;

2) A multicast mcast supports management operations such as create, delete, modify and query;

3) A multicast mcast is a set of long connections (connect) of online users;

4) A user User-i, on device device-i-j, for a specific app app-k, creates a unique long connection connect-j-k (for now, no distinction is made between logged-in and anonymous users);

5) The membership relation between a multicast mcast and its long connections connect-j-k needs no additional independent storage; it is maintained on each instance of the long-connection service. The multicast route McastRoute-m is a set of long-connection service instances (LcsList) that records every long-connection service instance LCS-j on which some connection connect-i has joined mcast-m (see the sketch after this list).
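The relation described in point 5 amounts to two small lookup tables: one on the routing layer (mcast to LCS instances) and one inside every long-connection instance (mcast to local connections). Below is a minimal in-memory sketch; the type and field names are illustrative, not the real implementation.

```go
package mcast

import "sync"

// McastRoute lives in the routing layer: for each mcast it records the set of
// long-connection service (LCS) instances holding at least one member
// connection. It corresponds to McastRoute-m / McastLcsList-m above.
type McastRoute struct {
	mu        sync.RWMutex
	lcsByCast map[string]map[string]struct{} // mcastID -> set of LCS instance IDs
}

// McastConnTable lives inside each LCS instance: for each mcast it keeps the
// local connections that joined it (McastConnectList-m above).
type McastConnTable struct {
	mu          sync.RWMutex
	connsByCast map[string]map[string]struct{} // mcastID -> set of local connIDs
}

func NewMcastRoute() *McastRoute {
	return &McastRoute{lcsByCast: make(map[string]map[string]struct{})}
}

func NewMcastConnTable() *McastConnTable {
	return &McastConnTable{connsByCast: make(map[string]map[string]struct{})}
}

// AddInstance records that LCS instance lcsID now serves mcastID.
func (r *McastRoute) AddInstance(mcastID, lcsID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.lcsByCast[mcastID] == nil {
		r.lcsByCast[mcastID] = make(map[string]struct{})
	}
	r.lcsByCast[mcastID][lcsID] = struct{}{}
}

// AddConn records that local connection connID has joined mcastID.
func (t *McastConnTable) AddConn(mcastID, connID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.connsByCast[mcastID] == nil {
		t.connsByCast[mcastID] = make(map[string]struct{})
	}
	t.connsByCast[mcastID][connID] = struct{}{}
}
```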

The logical process of joining a multicast mcast:

1) The client calls the message SDK to join mcast-m;

2) The message SDK issues an uplink request McastJoin(mcast-m) over the long connection;

3) The business layer receives the McastJoin request of connection connect-i from long-connection instance LCS-i and checks the legality of mcast-m;

4) The routing layer on the business side creates the multicast route McastRoute-m for mcast-m and adds the long-connection instance LCS-i to it;

5) The business layer asks the long-connection instance LCS-i that handled the McastJoin to add the originating connection connect-i to its local McastConnectList-m (steps 3-5 are sketched after this list).

Leaving a multicast is essentially symmetric: the client calls the message SDK to leave mcast-m and issues an uplink McastLeave(mcast-m); the long-connection server updates the routing and McastConnectList-m accordingly.
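Assuming the two structures from the previous sketch, the server side of the join path (steps 3 to 5) could look roughly like this; the validator interface and instance IDs are illustrative, and error handling is trimmed.

```go
package mcast

import (
	"context"
	"fmt"
)

// Validator stands in for the business-layer legality check in step 3.
type Validator interface {
	CanJoin(ctx context.Context, mcastID, connID string) error
}

// JoinServer glues the routing layer and the local connection table of one
// long-connection instance; it sketches steps 3-5 of the McastJoin flow.
type JoinServer struct {
	lcsID    string // ID of this long-connection instance (LCS-i)
	route    *McastRoute
	conns    *McastConnTable
	validate Validator
}

// HandleMcastJoin processes an uplink McastJoin(mcast-m) arriving on connID.
func (s *JoinServer) HandleMcastJoin(ctx context.Context, mcastID, connID string) error {
	if s.validate != nil {
		if err := s.validate.CanJoin(ctx, mcastID, connID); err != nil {
			return fmt.Errorf("mcast join rejected: %w", err)
		}
	}
	s.route.AddInstance(mcastID, s.lcsID) // step 4: register this LCS in the route
	s.conns.AddConn(mcastID, connID)      // step 5: register the connection locally
	return nil
}
```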

7.5 Multicast mcast message push

Multicast mcast data flow and pressure points:

The long-connection message push based on multicast mcast is a 1:M * 1:N diffusion and amplification process.

The specific process is as follows:

1) A message msg-1 is pushed with the multicast mcast-m as its destination;

2) Based on the destination mcast-m, the back-end business module picks a multicast routing/distribution instance McastRoute-i by consistent hashing and sends msg-1 to it;

3) The distribution instance McastRoute-i looks up the multicast route McastRoute-m of mcast-m, finds the corresponding list of access instances McastLcsList-m, splits out all long-connection access instances LCS-1 ... LCS-M serving mcast-m, and sends msg-1 to them concurrently (see the sketch after this list);

4) A long-connection instance LCS-j that receives the msg-1 push looks up its local connect list McastConnectList-m of mcast-m, finds all connections connect-m-1 ... connect-m-N in mcast-m, and delivers msg-1 to the client SDKs sdk-m-1 ... sdk-m-N;

5) On receiving msg-1, a client message SDK sdk-m-o hands it to the upper business layer (for example the live SDK).
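Reusing the two structures sketched earlier, the 1:M * 1:N fan-out can be expressed roughly as below; the inter-instance RPC client and the per-connection send function are stand-ins for the real transport.

```go
package mcast

import (
	"context"
	"sync"
)

// LcsClient is a stand-in RPC client for pushing a payload to one
// long-connection service instance (level-1 fan-out, tens to hundreds wide).
type LcsClient interface {
	PushToInstance(ctx context.Context, lcsID, mcastID string, payload []byte) error
}

// Distribute is level 1: the routing module splits msg-1 to every LCS
// instance recorded for mcast-m and pushes concurrently.
func (r *McastRoute) Distribute(ctx context.Context, cli LcsClient, mcastID string, payload []byte) {
	r.mu.RLock()
	instances := make([]string, 0, len(r.lcsByCast[mcastID]))
	for id := range r.lcsByCast[mcastID] {
		instances = append(instances, id)
	}
	r.mu.RUnlock()

	var wg sync.WaitGroup
	for _, lcsID := range instances {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			_ = cli.PushToInstance(ctx, id, mcastID, payload) // errors would be logged
		}(lcsID)
	}
	wg.Wait()
}

// FanOutLocal is level 2: inside one LCS instance, deliver to every local
// connection that joined the mcast; sendFn hides the actual socket write.
func (t *McastConnTable) FanOutLocal(mcastID string, payload []byte,
	sendFn func(connID string, payload []byte)) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	for connID := range t.connsByCast[mcastID] {
		sendFn(connID, payload)
	}
}
```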

Now consider the performance pressure of this multicast mcast mechanism:

1) Route maintenance: the main pressure comes from McastJoin and McastLeave; the peak join rate is hard to push beyond 20,000 QPS, two orders of magnitude below a million;

2) Push routing: the first-level split from McastRoute to long-connection instances is typically tens to hundreds wide, so its cost is very low;

3) Local delivery: within a single long-connection instance, the push is sent concurrently to multiple connections in one process. After optimization, online measurements show that an instance holding 250,000 long connections can stably deliver 80,000 mcast pushes per second; we plan conservatively with a capacity of 50,000 QPS per instance. Long-connection instances are fully independent and easy to scale horizontally.

In summary, for a downlink of 1,000,000 QPS, 20 long-connection instances are enough (20 * 50,000 = 1,000,000) with some margin; 5,000,000 QPS needs no more than 100 instances; and 10,000,000 QPS, if run at the heavier 80,000 QPS per instance, can be carried by 125 instances.

It seems that, with this multicast mcast mechanism, we have an efficient long-connection delivery path that supports a million QPS, and the current long-connection capacity can carry it essentially without expansion. Whether it fully meets the needs of live broadcast scenarios, however, still requires further discussion.

7.7 The message peak problem

The multicast mcast mechanism above seems able to spread one message per second to 1,000,000 or even 5,000,000 users.

But the reality is that a popular live room produces a large number of uplink chat messages every second, and beyond chat there are many kinds of system messages sent regularly and irregularly, such as audience counts, room entries, likes and shares.

Assume 100 messages of all kinds per second at the peak: 1,000,000 * 100 = 100,000,000 deliveries per second; at 50,000 QPS per instance this naively requires 2,000 instances. Although this is already far better than the old group-chat approach, the system would still face heavy resource redundancy or large-scale expansion to cope with peaks. Is there a better solution?

A common optimization idea we considered here is to improve system throughput through batch aggregation.

If the 100 messages arriving within each second are aggregated and packed into a single delivery, the downlink QPS of the long-connection system stays at 1,000,000, while the number of messages delivered per second reaches 100,000,000. This aggregation scheme is indeed feasible.

The cost of aggregation is added message latency: aggregating over one second adds 500 ms of average delay. The loss in user experience is small, while the volume of deliverable messages grows 100-fold, so by a combined cost/benefit assessment this is reasonable. Given the realities of live streaming, second-level aggregation and delay are acceptable for most scenarios.
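Below is a minimal sketch of such per-mcast aggregation with a fixed flush window; the one-second interval and the downstream flush callback are assumptions consistent with the description above, not the production implementation.

```go
package mcast

import (
	"sync"
	"time"
)

// Aggregator buffers messages per mcast and flushes each buffer as one packed
// delivery every interval, so the downlink stays at "one push per mcast per
// interval" regardless of how many messages arrive within the window.
type Aggregator struct {
	mu       sync.Mutex
	pending  map[string][][]byte // mcastID -> buffered message bodies
	interval time.Duration
	flush    func(mcastID string, batch [][]byte) // e.g. pack, compress, Distribute
}

func NewAggregator(interval time.Duration, flush func(string, [][]byte)) *Aggregator {
	a := &Aggregator{
		pending:  make(map[string][][]byte),
		interval: interval,
		flush:    flush,
	}
	go a.loop()
	return a
}

// Add queues one message for its mcast and returns immediately.
func (a *Aggregator) Add(mcastID string, msg []byte) {
	a.mu.Lock()
	a.pending[mcastID] = append(a.pending[mcastID], msg)
	a.mu.Unlock()
}

func (a *Aggregator) loop() {
	ticker := time.NewTicker(a.interval)
	defer ticker.Stop()
	for range ticker.C {
		a.mu.Lock()
		batches := a.pending
		a.pending = make(map[string][][]byte)
		a.mu.Unlock()
		for mcastID, batch := range batches {
			a.flush(mcastID, batch) // one downstream push per mcast per tick
		}
	}
}
```

With a one-second window, for example agg := NewAggregator(time.Second, flushFn) keeps the downlink at one push per mcast per second even when 100 messages arrive within that second.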

7.8 The message bandwidth problem

With the aggregation approach of the previous section, the aggregation delay is bounded and the per-instance QPS problem of the long-connection service is solved; the next problem to face is the per-instance bandwidth pressure.

For example, when a single long-connection instance needs to deliver 100 messages per second to 10,000 long connections, with an average message size of 2 KB, the required bandwidth is 2K * 100 * 10,000 * 8 = 15,625 Mbps, which already exceeds the capacity of a single physical machine's 10 Gbps NIC.

From a global perspective, the total bandwidth would be as high as 1.5 Tbps, which also pressures the machine-room egress. Such a bandwidth cost is too high; bandwidth usage has to be reduced, or a better alternative found.

Facing this problem of excessive downlink bandwidth, we adopted data compression without changing the business data.

Compression, however, is CPU-intensive, and because of the real-time nature of live streaming we cannot chase the compression ratio alone. After balancing compression ratio, compression delay and CPU cost, and tuning the compression library, the measured average compression ratio reached 6.7:1, shrinking the data to roughly 15% of its original size, so 15,625 Mbps * 15% = 2,344 Mbps, about 2.29 Gbps. A single machine with a 10 Gbps NIC can then carry up to about 42,700 long connections; that does not reach 50,000, but it is basically acceptable.
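For illustration only, here is how one aggregated batch could be length-prefixed and compressed with Go's standard gzip at a speed-oriented level; the actual service tuned a specific compression library and parameters that the article does not name, so the library choice here is an assumption.

```go
package mcast

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
)

// PackBatch length-prefixes each message, concatenates them, and gzips the
// result. BestSpeed trades some ratio for lower CPU and latency, mirroring
// the ratio/CPU/delay balance discussed above.
func PackBatch(batch [][]byte) ([]byte, error) {
	var raw bytes.Buffer
	for _, msg := range batch {
		var lenBuf [4]byte
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(msg)))
		raw.Write(lenBuf[:])
		raw.Write(msg)
	}

	var out bytes.Buffer
	zw, err := gzip.NewWriterLevel(&out, gzip.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(raw.Bytes()); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return nil, err
	}
	return out.Bytes(), nil
}
```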

From the global perspective, the peak bandwidth is also cut to no more than 230 Gbps, a significant saving.

7.9 The client performance problem

Looking further: in a live scenario there is not only a high peak message volume, there is also a sustained high message volume for the whole duration of the broadcast. This is a pressure for the server and equally a challenge for the client.

A sustained high message volume means:

1) On the one hand, the client faces obvious pressure in receiving and rendering;

2) On the other hand, messages refreshing the live interface too fast and in too large a volume also hurt the user experience.

Therefore, balancing user experience against client performance, the message server adds a combined mechanism of message priority classification plus rate-limiting frequency control on top of aggregation: a single client no longer has to absorb 100 messages per second, the per-second downlink volume is trimmed, and with each long-connection instance delivering 50,000 to 80,000 pushes per second, CPU and bandwidth remain stably within capacity.
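One simple way to express such priority-aware frequency control is to keep a fixed per-window delivery budget and drop the excess, preferring higher priorities; the priority levels and the budget below are illustrative, not the actual classification used by the service.

```go
package mcast

// Priority levels; higher means more important (e.g. gifts, control messages).
type Priority int

const (
	PriorityLow Priority = iota
	PriorityNormal
	PriorityHigh
)

type PendingMsg struct {
	Prio Priority
	Body []byte
}

// Throttle keeps at most budget messages for one flush window, scanning from
// the highest priority down and dropping the low-priority excess, so a single
// client never has to render more than budget messages per window.
func Throttle(msgs []PendingMsg, budget int) []PendingMsg {
	if len(msgs) <= budget {
		return msgs
	}
	kept := make([]PendingMsg, 0, budget)
	for p := PriorityHigh; p >= PriorityLow && len(kept) < budget; p-- {
		for _, m := range msgs {
			if m.Prio != p {
				continue
			}
			kept = append(kept, m)
			if len(kept) == budget {
				break
			}
		}
	}
	return kept
}
```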

We provide a real-time delivery mechanism based on message priority:

1) For high-priority messages, delivery is triggered immediately without waiting for aggregation, adding no aggregation delay;

2) For ordinary and low-priority messages, delayed aggregation still applies.

7.11 The user online problem

The starting point of the multicast mcast mechanism is to guarantee message delivery to online users, to tolerate partial message loss for users who are not online, and to pay a reasonable amount of technical complexity and cost in exchange for a balance of service quality and performance at the scale of millions of highly concurrent online users.

For message delivery to online users, a key question is how to keep the user's long connection online.

To improve the access stability and reachability of the long-connection service, we optimized the following aspects.

1) Access points:

The long-connection service has deployed access points with the three major domestic carriers in North China, East China and South China; for live scenes with some overseas users, an independent access point in a Hong Kong machine room has also been added.

2) HTTPDNS:

For users affected by DNS hijacking and resolution failures, the message SDK connects to an HTTPDNS service and optimizes the local cache, forming a multi-level DNS safeguard that improves the reliability of domain-name resolution and reduces hijacking and error rates (see "In-depth Sharing of Baidu APP's Best Practices for Mobile Networking (Part 1): DNS Optimization").

3) Heartbeat optimization:

The long-connection heartbeat is an important keep-alive and liveness-probing mechanism. Given the strong real-time requirements of live scenes, after a McastJoin the heartbeat is switched to a smart heartbeat with a shorter interval that the server can adjust dynamically, so that a broken connection is detected as early as possible.

This allows the message SDK to detect a connection anomaly in time and quickly reconnect on its own initiative.

4) Broken-connection recovery:

When a user in the room has already joined a multicast mcast and the long connection then breaks, the long-connection server actively or passively triggers removal of the corresponding multicast member.

When the long connection is re-established, the live business layer listens for the connection-recovery signal and rejoins the multicast mcast to restore the multicast message path.

To sum up, the multicast mcast mechanism:

1) Effectively solves real-time message delivery to millions of simultaneously online users;

2) Tolerates partial message loss during brief disconnections or message floods;

3) Meets the design goals of live-scene messaging.

The characteristics of the multicast mcast mechanism are:

1) The pressure on the message service and the routing layer is light; the overall pressure is borne only by the long-connection layer, which is easy to scale horizontally;

2) Delivery based on delayed aggregation, together with compression and rate limiting, solves the downlink QPS and bandwidth problems well;

3) The system's overall downlink QPS and bandwidth are fully controllable: with 1,000,000 users online the maximum downlink is 1,000,000 QPS, with 5,000,000 online it is 5,000,000 QPS, and a single instance delivers a stable 50,000 to 80,000 QPS. It is therefore easy to judge the overall system capacity and whether a special event needs expansion;

4) Although the mcast mechanism was designed for live broadcast scenarios, the design itself is general and can be applied to other push scenarios that require real-time delivery to a large number of online users as a group.

8. Further extensions of the multicast mcast message architecture

Although the multicast mcast mechanism solves real-time delivery to millions of online users, live message scenarios keep expanding, and innovative live businesses keep raising new demands.

Correspondingly, the multicast mcast service also has to keep evolving, extending and optimizing in both depth and breadth. The following focuses on history messages and gift messages.

8.1 Supporting room history messages

Users who have just entered a room need to see some recent chat history to feel the interactive atmosphere and to catch up with the progress of the broadcast; users interested in the chat history may also want to trace back further. This creates the need for chat history.

To support this kind of history requirement, the extension is to open a multicast public message mailbox service (mcast-mbox) for each multicast mcast that requests it.

The logic goes like this:

1) User messages and other messages that need to be persisted are all written into this message mailbox;

2) A client can specify the mcastID, a time range and a message count to pull the history messages of that multicast.

The following supplements the concept and use of the message mailbox.

What is the message mailbox service?

1) A message msg in the mailbox has a unique message identifier msgID;

2) A message msg also carries fields such as sender, receiver, message type and message body, which can be ignored here;

3) Each message can have an expiration time and cannot be accessed after it expires;

4) Each message can have a read state;

5) A message mailbox mbox has a unique mailbox identifier mboxID;

6) A mailbox mbox is a container storing an ordered list of messages msgList, sorted by msgID;

7) The mailbox service supports writing a single message or a batch of messages into a specified mailbox mbox;

8) The mailbox service supports looking up a single message or a batch of messages in a specified mailbox mbox by msgID;

9) The mailbox service supports range lookups in a specified mailbox mbox from msgID-begin to msgID-end (see the sketch after this list).

In practice, the most commonly used operation is the range pull by msgID. The mailbox service here is essentially the Timeline model; interested readers can refer to "4. Timeline model" in "Discussion on Synchronization and Storage Scheme of Chat Messages in Modern IM System".
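As a rough illustration, such a mailbox can be modeled on a Redis sorted set scored by msgID, so that the range lookup in point 9 maps directly onto ZRANGEBYSCORE; the key naming and the go-redis client are assumptions, not the actual storage used by the service.

```go
package mailbox

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// Mbox is a per-mcast public mailbox: an ordered list of messages keyed by
// msgID, i.e. the Timeline model referenced above.
type Mbox struct {
	rdb    *redis.Client
	mboxID string
	ttl    time.Duration // mailbox expiry; live rooms are short-lived
}

func (m *Mbox) key() string { return fmt.Sprintf("mbox:{%s}", m.mboxID) }

// Append writes one message under its msgID. In a real system the member
// would usually be just the msgID, with the body stored separately.
func (m *Mbox) Append(ctx context.Context, msgID int64, body string) error {
	pipe := m.rdb.Pipeline()
	pipe.ZAdd(ctx, m.key(), redis.Z{Score: float64(msgID), Member: body})
	pipe.Expire(ctx, m.key(), m.ttl)
	_, err := pipe.Exec(ctx)
	return err
}

// Range pulls messages with fromID < msgID <= toID, oldest first, which is
// exactly the "latestMsgID+1 .. newest" style pull described earlier.
func (m *Mbox) Range(ctx context.Context, fromID, toID int64, limit int64) ([]string, error) {
	return m.rdb.ZRangeByScore(ctx, m.key(), &redis.ZRangeBy{
		Min:   fmt.Sprintf("(%d", fromID), // exclusive lower bound
		Max:   fmt.Sprintf("%d", toID),
		Count: limit,
	}).Result()
}
```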

8.2 Supporting live gift messages

Gift message:

Gift message scenario analysis:

1) When a user sends a gift to the host, the host needs to receive the gift message promptly and reliably in order to give timely feedback;

2) The user who sends the gift sees a local effect immediately and has no strong need for a message notification;

3) Other users in the room need to receive the gift message so that the gift effect can be shown, the interactive atmosphere improves and other users are encouraged to send gifts;

4) Gift messages involve user orders and purchase behavior, so they must be confirmed and delivered by the server;

5) Gift messages have a clearly higher priority than other chat messages and system messages.

Based on the above analysis, the following technical extensions were made for live messaging:

1) Add an independent reliable-message multicast channel (multicast mcast-2 in Figure 4) for sending and receiving high-priority reliable messages, isolated at the data-flow level from ordinary messages and system messages to reduce mutual interference;

2) For the ordinary-viewer message SDK, although the gift multicast channel is a new, independent channel, its send/receive logic is the same as that of the ordinary multicast channel;

3) For the host side, the client message SDK uses a push-pull combination on the gift multicast channel to guarantee that every gift message arrives; even after a brief disconnection, all gift messages must still be obtained (a sketch follows this list);

4) For the host side, in extreme cases where the long connection fails, the message SDK can poll a short-connection interface to pull the gift multicast mailbox as a last resort.

With this independent reliable multicast channel, the gift message reach rate has exceeded 99.9%, excluding a few abnormal scenarios such as the host logging out without closing the room and occasional loss of tracking data.
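A rough sketch of the host-side push-pull combination in point 3: push keeps latency low, while a gap in msgIDs (or a reconnect) triggers a backfill pull from the gift mailbox. The contiguous-msgID assumption and the interfaces are illustrative; the Range call mirrors the mailbox sketch in section 8.1.

```go
package giftchan

import "context"

// GiftMailbox is the pull side of the reliable gift channel; Range mirrors
// the mailbox sketch in section 8.1.
type GiftMailbox interface {
	Range(ctx context.Context, fromID, toID int64, limit int64) ([]string, error)
}

// AnchorReceiver consumes gift messages on the host side. lastID is the
// largest gift msgID applied so far; msgIDs are assumed to be contiguous.
type AnchorReceiver struct {
	mbox   GiftMailbox
	lastID int64
	apply  func(msgID int64, body string) // hand the gift to the UI/business layer
}

// OnPush handles a gift arriving over the long connection. If the msgID is
// not the next expected one, the missing range is backfilled by a pull.
func (r *AnchorReceiver) OnPush(ctx context.Context, msgID int64, body string) error {
	if msgID <= r.lastID {
		return nil // duplicate, already applied
	}
	if msgID > r.lastID+1 {
		if err := r.backfill(ctx, r.lastID, msgID-1); err != nil {
			return err
		}
	}
	r.apply(msgID, body)
	r.lastID = msgID
	return nil
}

// backfill pulls (fromID, toID] from the gift mailbox, e.g. after a detected
// gap or when the long connection is re-established.
func (r *AnchorReceiver) backfill(ctx context.Context, fromID, toID int64) error {
	bodies, err := r.mbox.Range(ctx, fromID, toID, toID-fromID)
	if err != nil {
		return err
	}
	for i, b := range bodies {
		id := fromID + int64(i) + 1
		r.apply(id, b)
		r.lastID = id
	}
	return nil
}
```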

8.3 Other developments in live messaging

Over the course of Baidu Live's development, the live message service has also faced many other fundamental problems and challenges brought by innovative businesses.

These problems now have reasonably good solutions; a few are listed below for reference:

1) How to support multiple client environments: Android, iOS, H5, mini programs and PC;

2) How to let the same live-room messages be accessed from the Baidu App and matrix apps such as Haokan Video, Quanmin Video and Tieba;

3) How to support non-logged-in users: IM generally serves logged-in users, while live scenes must also serve anonymous users;

4) Whether there is a degraded channel for clients to obtain messages if the long-connection service runs into a serious problem;

5) How to perform machine and human review of live messages, and how to support both review-before-publish and publish-before-review modes;

6) How to support messages that span multiple live rooms;

7) How the live message service supports innovative businesses such as quiz live streams, live e-commerce and co-streaming on mic.

Due to space limitations these issues are not discussed in detail here; interested readers are welcome to discuss them.

9. Review and look ahead

In the years since Baidu Live launched, the live message service has risen to every challenge and overcome one difficulty after another, providing solid technical support and guarantees for Baidu Live.

Going forward, the live message service will keep investing in support for innovative live businesses, finer-grained message tiering, and the stability and performance of the basic service, laying a solid foundation and continuing to innovate so as to support faster and better growth of the live business.

Appendix: More related articles

[1] Articles about IM group chat:

"Fast Fission: Witnessing the Evolution of WeChat's Powerful Backend Architecture from 0 to 1 (Part 1)"

"How to Ensure the 'Timing' and 'Consistency' of IM Real-Time Messages?"

"Should Online Status Synchronization in IM Single Chat and Group Chat Use 'Push' or 'Pull'?"

"IM Group Messaging Is So Complicated: How to Ensure Messages Are Neither Lost Nor Duplicated?"

"WeChat Backend Team: Practice Sharing of Optimizing and Upgrading the WeChat Backend Asynchronous Message Queue"

"How to Ensure the Efficiency and Real-Time Performance of Large-Scale Group Message Push in Mobile IM?"

"Discussion on Synchronization and Storage Scheme of Chat Messages in Modern IM System"

"Discussion on the Disorder of IM Group Chat Messages"

"How to Implement the Read Receipt Function of IM Group Chat Messages?"

"Should IM Group Chat Messages Be Stored as One Copy (Read Diffusion) or Multiple Copies (Write Diffusion)?"

"Design Practice of a Highly Available, Easily Scalable, High-Concurrency IM Group Chat and Single Chat Architecture"

"[Technical Thought Experiment] Would It Be Technically Possible to Pull 1.4 Billion Chinese People into One WeChat Group?"

"IM Group Chat: Besides Sending Messages in a Loop, What Other Ways Are There? How to Optimize?"

"Netease Yunxin Technology Sharing: A Practical Summary of the Technical Solution for 10,000-Person Group Chat in IM"

"Alibaba DingTalk Technology Sharing: The King of Enterprise IM: DingTalk's Excellence in Backend Architecture"

"Discussion on a Storage-Space-Efficient Implementation of the Read/Unread Function of IM Group Chat Messages"

