Live barrage (bullet comments) is one of the core functions of a live-streaming system. How do you quickly build a barrage system with good scalability, and how do you keep up with rapid business growth? Many engineers and architects surely have their own ideas. The author of this article is an architect at Meipai who saw its live barrage system grow from nothing to large scale. This article summarizes his experience building it.

Wang Jingbo graduated from Xi'an Jiaotong University and worked at NetEase and Sina Weibo, where he was responsible for building open-platform business and technical systems. He joined Meitu in September 2015, working in the Architecture Platform Department, where he is currently responsible for several core businesses and infrastructure components, including the barrage service, the feed service, task scheduling, and the quality-control system. He has more than ten years of back-end R&D experience, with substantial hands-on experience building highly available, highly concurrent systems. Reach him at [email protected].

Live barrage refers to the user-presence, gift, comment, like, and other messages in a live room, and is an important means of interaction there. From November 2015 to now, the system has gone through three stages of evolution and can now support a million concurrent online users. It is a good illustration of evolving a system in balance with the project's stage of development. The three stages are rapid launch, construction of the high-availability guarantee system, and evolution to long connections.

Rapid launch

The message model

At the initial design stage, the core requirements for the Meipai live barrage system were to launch quickly and to support a million concurrent online users. Based on these two points, our strategy was to use HTTP polling in the early and middle stages and to replace it with a long-connection scheme later on. So while the business team developed the HTTP polling solution, the infrastructure R&D team was hard at work building the long-connection system.

Compared with IM scenarios, live-broadcast messaging has several characteristics:

  • Messages must be timely; stale messages mean little to users;

  • Group chat is loose: users may join or leave a room at any time;

  • After a user joins a room, messages from a brief offline period (say, while answering a phone call) do not need to be re-delivered.

For users, there are three typical operations in a live room:

  • Enter the room and pull the list of users currently watching;

  • Continuously receive barrage messages while in the room;

  • Send messages of their own.

We treat gifts, comments, and user data uniformly as messages. After evaluation, we chose Redis SortedSet to store them. The message model is as follows (a sketch follows the list):

  • A user sends a message via ZADD, where the score is the message's relative time;

  • Messages are received from a room via ZRANGEBYSCORE, polled every two seconds;

  • On entering a room, the viewer list is fetched via ZRANGE.
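Here is a minimal sketch of that model, assuming Jedis 4.x as the Redis client and millisecond timestamps as scores; the key names are illustrative, not the production ones:

```java
import java.util.List;
import redis.clients.jedis.Jedis;

// Minimal sketch of the SortedSet message model (key names are assumptions).
public class BarrageStore {
    private final Jedis jedis = new Jedis("127.0.0.1", 6379);

    // Send: ZADD, with the message's timestamp (ms) as the score.
    public void send(String roomId, long timeMs, String message) {
        jedis.zadd("barrage:" + roomId, timeMs, message);
    }

    // Receive: ZRANGEBYSCORE for everything newer than the last score seen;
    // the front end polls this every two seconds.
    public List<String> poll(String roomId, long lastScore) {
        return jedis.zrangeByScore("barrage:" + roomId, "(" + lastScore, "+inf");
    }

    // Enter room: ZRANGE over the viewer set.
    public List<String> viewers(String roomId, int limit) {
        return jedis.zrange("viewers:" + roomId, 0, limit - 1);
    }
}
```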

 

So the overall flow is:

  • Write path: front-end machine -> Kafka -> processor -> Redis

  • Read path: front-end machine -> Redis

There is, however, a hidden concurrency problem: users can lose messages.

Suppose a user starts pulling from comment number 6 while two other users are posting comments 10 and 11 concurrently. If comment 11 is written first, the puller happens to fetch numbers 6, 7, 8, 9, 11. Its next pull then starts from number 12. The result: the user never sees message number 10.

To solve this problem, we added two mechanisms:

  • On the front-end machine, messages of the same type in the same room are written to the same Kafka partition;

  • On the processor, messages of the same type in the same room are written to Redis serially, guarded by synchronized. A sketch of both mechanisms follows.
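A minimal sketch of the two mechanisms, assuming the Java Kafka client; the topic name and the per-stream lock map are illustrative stand-ins, not the production code:

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedBarragePipeline {
    private final KafkaProducer<String, String> producer;
    // one lock object per (roomId, messageType) stream on the processor
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();
    private final BarrageStore store = new BarrageStore(); // from the sketch above

    public OrderedBarragePipeline(Properties kafkaProps) {
        this.producer = new KafkaProducer<>(kafkaProps);
    }

    // Front-end machine: keying by (roomId, messageType) makes Kafka's default
    // partitioner route one stream to one partition, preserving its order.
    public void publish(String roomId, String type, String payload) {
        producer.send(new ProducerRecord<>("barrage-messages", roomId + ":" + type, payload));
    }

    // Processor: synchronized per stream, so writes to Redis stay serial.
    public void consume(String roomId, String type, long timeMs, String payload) {
        Object lock = locks.computeIfAbsent(roomId + ":" + type, k -> new Object());
        synchronized (lock) {
            store.send(roomId, timeMs, payload);
        }
    }
}
```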

With the message model settled and the concurrency problem solved, development went smoothly, the system launched quickly, and the predetermined goals were met.

Solving the problems exposed after launch

After launch, as volume gradually grew, three serious problems were exposed; we solved them one by one.

Problem 1: messages were written to Redis serially. When a room produced a large volume of messages, they accumulated in Kafka, causing large delivery delays.

Solutions:

  • Message write path: front-end machine -> Kafka -> processor -> Redis.

  • Front-end machine: if the delay is small, write to only one Kafka partition; if the delay is large, write the room's message type across multiple partitions.

  • Processor: if the delay is small, take the lock and write to Redis serially; if the delay is large, drop the lock. This yields four combinations, i.e., four throughput tiers:

    • One partition, locked serial writes to Redis; maximum concurrency: 1.

    • Multiple partitions, locked serial writes to Redis; maximum concurrency: the number of Kafka partitions.

    • One partition, unlocked parallel writes to Redis; maximum concurrency: the processor's thread-pool size.

    • Multiple partitions, unlocked parallel writes to Redis; maximum concurrency: the number of Kafka partitions × the processor's thread-pool size.

  • Measuring the delay: the front-end machine stamps each message with a unified timestamp when writing it; when the processor picks the message up, delay = current time - timestamp.

  • Tier selection: automatic, at the granularity of one message type in one room; a sketch follows this list.
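As a hedged illustration of the automatic switch, here is a sketch; the delay thresholds below are pure assumptions, since the article specifies only the four tiers and that selection is automatic per (room, message type):

```java
// Sketch of automatic tier selection per (room, message type).
// The thresholds are illustrative assumptions, not production values.
public class TierSelector {
    enum Tier {
        ONE_PARTITION_LOCKED,     // max concurrency: 1
        MULTI_PARTITION_LOCKED,   // max concurrency: #Kafka partitions
        ONE_PARTITION_UNLOCKED,   // max concurrency: processor pool size
        MULTI_PARTITION_UNLOCKED  // max concurrency: #partitions * pool size
    }

    // delayMs = processor's current time minus the front-end timestamp
    Tier select(long delayMs) {
        if (delayMs < 1_000) return Tier.ONE_PARTITION_LOCKED;
        if (delayMs < 3_000) return Tier.MULTI_PARTITION_LOCKED;
        if (delayMs < 10_000) return Tier.ONE_PARTITION_UNLOCKED;
        return Tier.MULTI_PARTITION_UNLOCKED;
    }
}
```

Note the trade-off the tiers encode: each step up sacrifices some write ordering (first the lock, then the single partition) in exchange for throughput.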

Problem 2: every user poll for the latest messages required a Redis ZRANGEBYSCORE, and the Redis slaves became a severe performance bottleneck.

Solutions:

  • Local cache: each front-end machine pulls a room's messages roughly once per second; when users poll the front-end machine, the data is served from this local cache (sketched after the cost note below);

  • The number of messages returned is adjusted automatically by room size: small rooms return messages over a wider time span, while large rooms apply stricter limits on both time span and message count.

 

Note: this local cache differs from the usual local-caching problem in one big way: cost.

If we cached every room's messages: assume 1,000 concurrent rooms, 5 message types per room, a cache refresh every second, and 40 front-end machines; the resulting Redis QPS would be 1000 × 5 × 40 = 200,000. That cost is too high, so the local cache is enabled automatically only for large rooms, not for small ones.
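A minimal sketch of that per-room refresher, reusing the BarrageStore wrapper from earlier; the two-second window and all names are illustrative assumptions:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the front-end local cache, enabled only for large rooms.
// One refresher pulls from Redis about once per second; every user poll on
// this machine is then served from memory, so Redis sees one reader per
// front-end machine instead of one per viewer.
public class RoomMessageCache {
    private final ConcurrentHashMap<String, List<String>> latest = new ConcurrentHashMap<>();
    private final ScheduledExecutorService refresher = Executors.newScheduledThreadPool(1);
    private final BarrageStore store = new BarrageStore(); // from the earlier sketch

    public void enableFor(String roomId) {
        refresher.scheduleAtFixedRate(() -> {
            long now = System.currentTimeMillis();
            latest.put(roomId, store.poll(roomId, now - 2_000)); // last 2 seconds
        }, 0, 1, TimeUnit.SECONDS);
    }

    public List<String> read(String roomId) {
        return latest.getOrDefault(roomId, List.of());
    }
}
```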

Problem 3: barrage data must also support playback. After a broadcast ends, its data stays in Redis, and playback reads compete with live traffic for Redis CPU.

Solutions:

  • After the broadcast ends, the data is backed up to MySQL;

  • A separate set of playback Redis instances was added;

  • A playback local cache was added on the front-end machine.

 

During playback, data is read in the following order: local cache -> playback Redis -> MySQL. Both the local cache and the playback Redis need to hold only part of a given room's messages of a given type, which keeps capacity well under control. The playback Redis also uses SortedSet, so the data structure stays consistent across the whole system. A sketch of the read chain follows.
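A minimal sketch of that fallback chain, with each storage layer behind an assumed common interface (all names are illustrative):

```java
import java.util.List;

// Sketch of the playback read order: local cache -> playback Redis -> MySQL.
public class PlaybackReader {
    interface Layer { List<String> read(String roomId, long from, long to); }

    private final Layer localCache, playbackRedis, mysql;

    PlaybackReader(Layer localCache, Layer playbackRedis, Layer mysql) {
        this.localCache = localCache;
        this.playbackRedis = playbackRedis;
        this.mysql = mysql;
    }

    List<String> read(String roomId, long from, long to) {
        List<String> msgs = localCache.read(roomId, from, to);   // partial, hot data
        if (msgs == null) msgs = playbackRedis.read(roomId, from, to); // partial
        if (msgs == null) msgs = mysql.read(roomId, from, to);   // full archive
        return msgs;
    }
}
```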

High availability guarantee

Dual data centers in the same city

They are divided into a master data center and a slave data center: writes go to the master, while reads are shared between the two. This ensures that a failure of a single data center can be recovered from quickly.

Rich degradation mechanisms

Full-link service monitoring

After the high-availability work was completed, TFBOYS held four live broadcasts on Meipai. Peak concurrent viewers across the four broadcasts approached one million, with 28.6 million views, 29.8 million comments, and 2.623 billion likes. Throughout the broadcasts, the system ran stably and withstood the pressure.

Replacing short-connection polling with long connections

Details of the overall long-connection architecture:

  • Before using a long connection, the client calls the routing service to obtain the IP address of the connection layer. The routing layer has two features: (a) gray release by percentage; (b) black and white lists keyed by uid and deviceId. Blacklisted clients may not use long connections; whitelisted clients may use them even when the feature is switched off or they fall outside the gray-release range. These two features ensure a smooth switch between long and short connections.

  • The client works as follows: (a) it supports both long and short connections, according to the routing-service configuration; (b) it degrades automatically: if the long connection fails three times in a row, it falls back to short connections; (c) it automatically reports long-connection performance data.

  • The connection layer is responsible only for maintaining long connections with clients and contains no push business logic. This greatly reduces how often it must be restarted, keeping user connections stable.

  • The push layer stores the subscription relationships between users and live rooms and performs the actual pushing. The connection layer and push layer together are decoupled from the live-broadcast business and need not be aware of business changes.

  • The long-connection business module validates users entering a live room.

  • Servers communicate through TARDIS, a framework developed by our infrastructure R&D team on top of gRPC, using etcd for service discovery.

The long-connection message model

We adopted a subscribe/push model. The basics:

For example, suppose user 1 subscribes to live room A, and room A receives a new message:

  • The push layer queries the subscription relationships, learns that user 1 subscribes to room A and is attached to connection-layer node 1, and notifies that node of the new message;

  • On receiving the notification, connection-layer node 1 waits a short time (milliseconds), then pulls user 1's messages once and pushes them to user 1. The brief wait lets several notifications coalesce into a single pull.

For a large room (one with many subscribers), the notify/pull model between the push layer and the connection layer automatically degrades to a broadcast model, sketched below together with the normal path.
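Below is a minimal sketch of both paths in one place; the subscription maps, the 10,000-subscriber cutoff, and the RPC stubs are all illustrative assumptions rather than the production design:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the push layer's notify/pull model with the large-room broadcast
// downgrade (all names, the threshold, and the RPC stubs are assumptions).
public class PushLayer {
    private static final int BROADCAST_THRESHOLD = 10_000; // assumed cutoff

    // roomId -> userIds subscribed to that room
    private final Map<String, Set<String>> subscribers = new ConcurrentHashMap<>();
    // userId -> connection-layer node holding that user's long connection
    private final Map<String, String> userNode = new ConcurrentHashMap<>();

    public void onRoomMessage(String roomId) {
        Set<String> users = subscribers.getOrDefault(roomId, Set.of());
        if (users.size() >= BROADCAST_THRESHOLD) {
            // Large room: skip per-user notifications and broadcast the
            // room's new-message signal to every connection-layer node.
            broadcastToAllNodes(roomId);
            return;
        }
        for (String userId : users) {
            // Normal room: notify the node; it waits a few milliseconds so
            // several notifications coalesce, then pulls once and pushes.
            notifyNode(userNode.get(userId), roomId, userId);
        }
    }

    private void notifyNode(String node, String roomId, String userId) { /* TARDIS/gRPC call */ }
    private void broadcastToAllNodes(String roomId) { /* TARDIS/gRPC call */ }
}
```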

Over three client release iterations, we replaced short connections with long connections on both Android and iOS. Thanks to gray-release support and the black/white lists, the switch was very smooth and invisible to users.

Summary and outlook

This article has reviewed the system's development: polling in the early and middle stages and long connections later on, meeting the predetermined goals and practicing the principle of balanced evolution. Looking ahead, we plan to work on the following:

  • With the data center in Beijing, connection latency can be high in some southern regions. How do we bring long connections closer to the user?

  • Further evolution of the message model.

Hiring notice:

The Meitu architecture team focuses on infrastructure: virtualization platform construction, streaming media, cloud storage, communication services for tens of millions of concurrent users, audio/video codecs, and more. We urgently need enthusiasts in these fields to join us; you may choose to work in Beijing, Xiamen, or Shenzhen, with excellent pay and benefits. The positions most in demand are:

  • Go/C development engineers: over 50% of our code is written in Go

  • Java Development Engineer

  • Audio and video codec researchers

  • Docker virtualization infrastructure R&D engineers

If interested, please contact: [email protected] or [email protected] or [email protected]
