This article was originally published by Kellyliang, an engineer on the WeChat development team, on the "WeChat Background Team" public account. It has since been revised and updated.

1. Introduction

With the growth of livestreaming scenes in WeChat, these businesses' demand for a temporary message channel (real-time messages among online users) kept increasing, and the live chat room component was born. The live chat room component is a room-based temporary message channel that mainly provides message sending and receiving and online status statistics.

This article reviews the technical design and architecture evolution of the WeChat live chat room message component, which supports a massive number of simultaneous online users in a single room, hoping to offer some inspiration for the design of real-time chat message architectures in live interaction.

This article has also been published on the "Instant Messaging Technology Circle" public account.

2. Related articles

  • Technical Challenges and Architecture Evolution of Tencent QQ with 140 Million Online Users (PPT)
  • How to Ensure the Efficiency and Real-Time Performance of Mass Group Message Push in Mobile IM?
  • Discussion on Synchronization and Storage of Chat Messages in Modern IM Systems
  • Summary of Architectural Design Steps for Massive Social Systems, Based on Weibo Application Scenarios
  • Design Practice of a Highly Available, Scalable, Concurrent IM Group Chat and Single Chat Architecture
  • Ali Technology Sharing: E-commerce IM Messaging Platform, Technology Practice in Group Chat and Live Broadcast
  • A WebSocket Chat Room Demo Based on Node.js + socket.io

3. The 15 million online challenge

After video number (Channels) live streaming launched, the product team asked that the live streaming backend be technically capable of supporting 15 million simultaneous online users in a single room. Receiving this project naturally brought to mind a very interesting proposition: can we pull 1.3 billion people into one group chat?



This article walks through the thinking behind the chat room component's evolution, explores this proposition further, and tries to offer an answer that comes closer to it.

4. Live chat room 1.0 architecture

As shown in the figure above, the live chat room 1.0 architecture is relatively simple and direct, without many complex technical components.

This architecture was built in 2017. It mainly served WeChat e-sports live rooms, and its core goal was a high-performance, highly real-time, highly scalable message sending and receiving architecture.

5. Message diffusion scheme selection: Read diffusion

Standard group messages in WeChat use a write-diffusion mechanism, but live chat rooms differ greatly from standard WeChat group chats: room membership is huge and changes constantly, so writing a copy of every message to every member is infeasible.

Moreover, a user follows only one chat room at a time, which determines that message diffusion in the live chat room should use a read-diffusion mechanism: each message is written once to the room's message list, and every online member reads from that shared list.

6. LongPolling mechanism

To let users receive new messages in real time, we adopted the LongPolling pattern.

Many people ask why we didn't use WebSocket. There are three reasons:

  • 1) WebSocket mainly relies on push mode, and pushed messages may be lost, so a pull mode is still needed as a fallback;
  • 2) In push mode, it is very difficult to accurately maintain the online list at every moment;
  • 3) LongPolling is essentially short connections, which are easier to implement on the client side.
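The mechanism can be sketched as a toy in-process model — a hedged illustration only, with invented names (`ChatRoomChannel`, `poll`), not WeChat's actual implementation:

```python
import threading

class ChatRoomChannel:
    """Toy long-polling channel: poll() returns immediately when new
    messages exist, otherwise holds the request until notified or a
    timeout expires (all names here are illustrative)."""

    def __init__(self):
        self.messages = []                 # room message list, index = seq
        self.cond = threading.Condition()

    def send(self, msg):
        with self.cond:
            self.messages.append(msg)
            self.cond.notify_all()         # wake any held poll() calls

    def poll(self, client_seq, timeout=1.0):
        """Return (latest_seq, messages after client_seq)."""
        with self.cond:
            if client_seq >= len(self.messages):
                self.cond.wait(timeout)    # hold: no new messages yet
            return len(self.messages), self.messages[client_seq:]

room = ChatRoomChannel()
room.send("hello")
seq, msgs = room.poll(0, timeout=0.1)      # returns at once: a message exists
```

Because each poll is an ordinary short request that either returns data or times out, the client simply loops, carrying the last seq it saw — which is what makes the pull-based fallback and the online-list accounting straightforward.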

7. Stateless cache design

Obviously, naive read diffusion would cause huge disk-read pressure. Following common practice, a cache layer, recvsvr, was added, as shown in the architecture diagram above.

An ordinary cache is passive: misses penetrate through to storage, which is unfriendly to chat rooms prone to sudden traffic spikes. With asynchronous thread tasks, both problems can be solved.

**① Real-time notification:** after a message is written to the message list, a notification is sent to the recvsvr cluster.

**② Asynchronous pull:** when a recvsvr machine receives the notification, it triggers an asynchronous thread to pull the new messages.

**③ Fallback polling:** when a recvsvr machine receives a request for a chat room, it triggers polling of that room, ensuring the message list is accessed at least once per second. This avoids stale caches caused by lost notifications, and also lets a machine rebuild its data automatically on startup.

**④ Lock-free reads:** messages are read without locks through read/write table separation and atomic switching.

**⑤ Sect deployment:** when load grows, rooms can be allocated to a newly deployed sect.

The stateless message cache design not only greatly improved system performance but also helped the chat room build a highly scalable message sending and receiving architecture.
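The lock-free read in step ④ can be illustrated with a minimal double-buffer sketch — an assumed reconstruction of the idea, not the actual recvsvr code:

```python
import threading

class DoubleBufferCache:
    """Read/write table separation with an atomic switch: readers always
    see a complete, immutable snapshot and never take a lock."""

    def __init__(self):
        self._read = []                # snapshot currently served to readers
        self._write = []               # buffer filled by the async pull task
        self._lock = threading.Lock()  # serializes writers only

    def read(self):
        # Rebinding self._read is atomic in CPython, so a reader never
        # observes a half-built list.
        return self._read

    def update(self, messages):
        with self._lock:
            self._write = list(messages)                       # build aside
            self._read, self._write = self._write, self._read  # atomic switch

cache = DoubleBufferCache()
cache.update([1, 2])
cache.update([1, 2, 3])
```

Writers build the new message table off to the side and publish it with a single pointer swap, so a burst of readers costs no lock contention at all.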

8. Technical pain points

Despite its high messaging performance, version 1.0 did not reach the goal of 15 million simultaneous online users in a single room.

Further analysis of the overall architecture and logic revealed four pain points that were holding us back:

  • 1) In large live rooms, the message channel does not guarantee that every message is delivered: if the co-mic success signaling is lost, the co-mic feature becomes unavailable, and lost big-gift animation signaling leads to customer complaints;
  • 2) A room's online list is aggregated by recvsvr into a single statsvr, a single-point bottleneck: one machine failing causes the online counts of some rooms to jump and makes the online list, gift rankings, and so on unavailable;
  • 3) There is no historical online statistics feature;
  • 4) The bare LongPolling mechanism cannot control request volume while messages keep updating.

9. Live chat room 2.0 architecture

From the pain points analyzed above, we can summarize the problems chat room 2.0 needs to solve:

  • 1) Solve the loss of important signaling, ensuring feature reliability under hotspot access;
  • 2) Solve the single-point bottleneck of online statistics, ensuring the scalability of the online statistics module under hotspot access;
  • 3) Implement efficient and accurate historical online statistics, ensuring high performance and accuracy under large data volumes;
  • 4) Control traffic flexibly, further improving isolation and disaster recovery, ensuring system availability under hotspot access.

10. Priority message list

**The problem:** recvsvr keeps only the latest 2000 messages, so some messages are evicted from the cache before clients receive them.

In chat room 1.0 we established that write diffusion is not feasible, so this problem cannot be solved with write diffusion.

**Another straightforward solution** is to write important system signaling to a separate list and have recvsvr read both message tables. The cost is that recvsvr's traffic to KV nearly doubles, so we wondered whether there was a better solution.

**Going back to a detail of the 1.0 architecture:** in most cases, recvsvr learns of new messages as soon as they arrive, so it does not pull many messages per request, and messages are not lost at this step.

So we can move the separation of message tables into recvsvr:

**① Write with priority:** messages are still written to a single message table, with important signaling marked by priority. (**Purpose:** save RPC overhead)

**② Separate lists:** recvsvr splits its cache into a normal message list and an important message list. (**Purpose:** minimize changes)

**③ Priority collection:** the client carries both the normal seq and the important seq when collecting; the important message table is collected first, then the normal one. (**Purpose:** important messages first)

Through this simple optimization, we provided a reliable channel for critical messages at minimal modification cost, achieving zero loss of co-mic and big-gift animation signaling.
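Steps ① to ③ can be sketched as follows — illustrative names and capacities, not the actual recvsvr data structures:

```python
class PriorityMessageCache:
    """Two-list design: the normal list is bounded and may evict old
    entries, the important list is kept reliably and collected first."""

    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.normal = []      # (seq, msg), bounded like the 2000-message cache
        self.important = []   # (seq, msg), important signaling

    def write(self, seq, msg, important=False):
        if important:
            self.important.append((seq, msg))
        else:
            self.normal.append((seq, msg))
            self.normal = self.normal[-self.capacity:]   # evict oldest

    def collect(self, normal_seq, important_seq):
        """Client carries both seqs; important messages come back first."""
        imp = [(s, m) for s, m in self.important if s > important_seq]
        norm = [(s, m) for s, m in self.normal if s > normal_seq]
        return imp + norm

cache = PriorityMessageCache(capacity=2)
cache.write(1, "chat")
cache.write(2, "gift-animation", important=True)
cache.write(3, "chat")
cache.write(4, "chat")        # seq 1 is evicted from the normal list
```

Even after the normal list evicts seq 1, the important signaling at seq 2 survives and is always delivered ahead of ordinary chat.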

11. Distributed online statistics

11.1 Write to shared memory in active/standby mode

Referring to the online module for WeChat devices, we came up with this plan:

  • ① Select a sect for each live room;
  • ② Select one machine as master by RoomID, and read and write the shared memory on that machine;
  • ③ The master synchronizes the RoomID's data to the other machines in the sect; if the master goes down, data can be read from and written to another machine.

Advantages and disadvantages of this scheme:

  • **1) Advantage:** solves the failover problem.
  • **2) Disadvantages:** the active/standby synchronization scheme is complex; and since reads and writes all go through the master, large live rooms still hit single-machine hotspot problems.

**Conclusion:** use distributed storage as the central node for the data.

11.2 Write to TableKV

As shown above:

  • ① Use a TableKV table to store the online list, each row recording a user ID and an active timestamp;
  • ② Periodically update each user's heartbeat time to keep them online.

Advantages and disadvantages of this scheme:

  • **Advantages:** solves the failover problem, and the data is distributed;
  • **Disadvantages:** 15 million online users with a 10s heartbeat means 90 million writes/min, so writing through a single table hits concurrency and performance problems; and since rows cannot be deleted from disk in real time when users go offline, historical active users far outnumber current online users, causing data redundancy.

Breaking these down one by one: the single-key bottleneck can be solved by splitting the key; data redundancy can be solved by full replacement in key-value storage; and the penetration problem can borrow the recvsvr implementation.

**So we arrived at a better solution:** key splitting + read/write separation + asynchronous aggregation and flushing to disk.

**① Distributed statistics:**

  • (1) each machine is responsible for part of the online statistics;
  • (2) within each machine, data is hashed into multiple shards by uin;
  • (3) each shard corresponds to one KV key.

**② Merge data:** each machine pulls the data of all keys and merges it into a complete online list.

**③ Asynchronous aggregated updates:** heartbeats only update memory; an asynchronous task cleans up offline users and serializes the list into a single key-value.

**④ Asynchronous pull:** an asynchronous task pulls and merges the data from ②.

**⑤ Atomic switch:** the complete online list is double-buffered, with lock-free atomic switching between the two pointers, achieving lock-free queries.

Thus we improved the performance of heartbeat updates and online queries, and achieved distributed deployment and parallel scaling of the online statistics module.
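A compressed sketch of steps ① and ③ — hashing heartbeats into shards and asynchronously cleaning up offline users. The shard count, TTL, and all names are invented for illustration:

```python
import time

NUM_SHARDS = 4  # invented; a real deployment would use far more shards/keys

class OnlineStats:
    """Heartbeats are hashed by uin into shards; aggregation merges the
    shards into a complete online list, dropping stale users."""

    def __init__(self, ttl=10.0):
        self.ttl = ttl                                   # heartbeat lifetime
        self.shards = [{} for _ in range(NUM_SHARDS)]    # uin -> last beat

    def heartbeat(self, uin, now=None):
        now = time.time() if now is None else now
        self.shards[hash(uin) % NUM_SHARDS][uin] = now   # memory-only update

    def aggregate(self, now=None):
        """Async task: clean up offline users, merge the full list."""
        now = time.time() if now is None else now
        online = []
        for shard in self.shards:
            for uin, beat in list(shard.items()):
                if now - beat > self.ttl:
                    del shard[uin]                       # offline: remove
                else:
                    online.append(uin)
        return online

stats = OnlineStats(ttl=10.0)
stats.heartbeat("user-a", now=0.0)
stats.heartbeat("user-b", now=5.0)
```

Because each shard maps to its own KV key and aggregation happens off the hot path, heartbeat writes never contend on a single key.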

12. History online statistics based on HyperLogLog

12.1 Requirements

Historical online statistics means the UV of users who have ever watched the live stream; in the product, it is the "XXX people have watched" count in video number live.

In the distributed online statistics section above, we saw that using TableKV to record the full member list is not feasible.

**Another idea** is to use a BloomFilter for compressed, deduplicated tracking, with an extra count maintained for accumulation.

But this has two problems:

  • first, consistency between the BloomFilter and the count must be maintained;
  • second, BloomFilter accuracy is tied to its compression rate, and good accuracy still requires a fairly large amount of space.

So we surveyed the industry's UV counting solutions and finally settled on Redis's HyperLogLog, which performs cardinality estimation on 64-bit integers with minimal space complexity.

12.2 What is HyperLogLog?

HyperLogLog is a probabilistic data structure that estimates the approximate cardinality of a set. The algorithm originates from the Bernoulli process.

**Bernoulli process:** take a coin where tails is 0 and heads is 1, and keep flipping until the result is 1.

If n Bernoulli experiments are performed, with the number of flips in each recorded as Ki, then letting Kmax = max(Ki), an estimate can be made: n ≈ 2^Kmax.

HyperLogLog adds bucketing and harmonic-mean optimizations to the Kmax calculation, making it more accurate than the bare Bernoulli estimate.

The main content of optimization:

  • ① Hash the data to be counted into a 64-bit integer;
  • ② Use the low 14 bits to locate the bucket;
  • ③ In the remaining high bits, find the position of the first 1 as the Ki of the Bernoulli process;
  • ④ Update the bucket's value: Rj = max(Rj, Ki);
  • ⑤ When estimating, compute the harmonic mean DV over all buckets to replace the Kmax above.

**Space complexity:** HyperLogLog stores only the m buckets (m = 2^14 = 16384), so the raw space cost is already low. With bit compression, HyperLogLog squeezes the whole data structure into at most 12 KB.
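The five steps above can be condensed into a toy HyperLogLog. The bucket count is shrunk to 16 for readability (the real structure uses 2^14 buckets), so this illustrates the math rather than being production code:

```python
import hashlib

BUCKET_BITS = 4              # real HLL uses 14 bits => 16384 buckets
M = 1 << BUCKET_BITS
ALPHA = 0.673                # bias-correction constant for m = 16

def _hash64(item):
    """Step ①: hash the item into a 64-bit integer."""
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")

class HyperLogLog:
    def __init__(self):
        self.buckets = [0] * M

    def add(self, item):
        h = _hash64(item)
        j = h & (M - 1)              # step ②: low bits pick the bucket
        rest = h >> BUCKET_BITS
        k = 1                        # step ③: position of the first 1 bit
        while rest & 1 == 0 and k < 64:
            rest >>= 1
            k += 1
        self.buckets[j] = max(self.buckets[j], k)   # step ④: Rj = max(Rj, Ki)

    def estimate(self):
        # step ⑤: harmonic mean across buckets replaces the single Kmax
        return ALPHA * M * M / sum(2.0 ** -r for r in self.buckets)

hll = HyperLogLog()
for i in range(1000):
    hll.add(i)
```

Note that adding a duplicate never changes any bucket, which is exactly why the structure counts distinct users rather than total events.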

12.3 TableKV + Hyperloglog

Since HyperLogLog produces approximate values, the error is more noticeable when the cardinality is small, so we use TableKV to round out the experience when the historical online count is small:

  • ① When the historical online count is small, double-write TableKV + HyperLogLog, and take TableKV's selectCount as authoritative;
  • ② When the historical online count is large, write only HyperLogLog, and take its estimate as authoritative;
  • ③ The online statistics module periodically merges the online list into HyperLogLog to avoid data loss.

**The final result we achieved:** historical online counts below 10,000 are exact, and accuracy above that threshold exceeds 95%.
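The hybrid can be sketched like this — `_StubHLL` stands in for a real HyperLogLog, the `exact` set stands in for the TableKV member table, and the threshold is an assumed value:

```python
SMALL_THRESHOLD = 10000   # assumed cutoff between exact and estimated counting

class _StubHLL:
    """Stand-in for a real HyperLogLog (exact here, for illustration only)."""
    def __init__(self):
        self.seen = set()
    def add(self, uin):
        self.seen.add(uin)
    def estimate(self):
        return float(len(self.seen))

class HistoryUV:
    def __init__(self, hll):
        self.exact = set()    # stands in for the TableKV member table
        self.hll = hll

    def record(self, uin):
        self.hll.add(uin)                    # HLL is always fed
        if len(self.exact) < SMALL_THRESHOLD:
            self.exact.add(uin)              # double-write while small

    def count(self):
        if len(self.exact) < SMALL_THRESHOLD:
            return len(self.exact)           # TableKV selectCount: exact
        return int(self.hll.estimate())      # large: HLL estimate

uv = HistoryUV(_StubHLL())
for i in range(5):
    uv.record(i)
uv.record(0)                                  # duplicate: UV unchanged
```

Feeding the HLL from the very first viewer is what makes the later switchover seamless: by the time the exact table is abandoned, the sketch already holds the full history.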

13. Traffic isolation: the VIP sect

**As we all know:** large live rooms bring explosive request volume, and we cannot let a large room's failure take down the majority of small rooms.

**Moreover:** a large live room is highly visible, so its good experience must also be guaranteed, which requires more machines than a small room needs.

**Furthermore:** the number of requests the chat room sends to the KV layer is proportional to the number of machines, so spreading small live rooms across many machines causes a lot of unnecessary overhead.

**Given this:** we borrowed WeChat Pay's approach to separating large and small merchants, isolating traffic by setting up a VIP sect for chat rooms.

As shown above:

  • ① For predictable large live events, assign the VIP sect in advance;
  • ② All other live rooms go to ordinary sects;
  • ③ Strategies are tiered by live room size; only large rooms' online lists use key splitting.

Although this still depends on manual operation, it diverts most large-live-stream traffic and also reduces the pressure the whole system puts on the KV layer.

**Why not switch to the VIP sect automatically?**

This is future work. We have some preliminary plans, but we still need to verify the impact of the switching process and refine the strategy further. Valuable suggestions are welcome.

14. Flow control under automatic flexibility

Under the LongPolling mechanism (see section 6 of this article), a room with 1 million online users generates at least 60 million requests per minute, and 15 million online generates 900 million per minute. LogicSvr is a CPU-intensive service; at 300 thousand requests/min per machine, at least 3000 machines would be needed.

So there must be flexible measures to manage request volume and strike a balance between experience and cost.

Simply having LogicSvr reject requests cannot achieve this, because LongPolling makes the client issue a new request immediately after receiving a response: the faster LogicSvr rejects, the larger the request volume grows, and the more likely it is to snowball.

When recvsvr has no new messages, the request can be held at the proxy layer, waiting for a connection timeout or a LongPolling notify.

We can therefore exploit this property, letting requests or responses be held in the proxy for a period of time to reduce the request frequency.

As shown above:

  • ① Set the collection interval according to the room's online count;
  • ② Add a field to the client context recording the time of the last successful collection;
  • ③ Within one interval after a successful collection, hold the request at the proxy layer;
  • ④ Discard LongPolling notifies according to the room's online count.
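Steps ① and ③ amount to computing, per request, how long the proxy should hold it — a sketch with invented tier thresholds:

```python
# (min_online, hold_seconds) tiers: all thresholds are invented for illustration
INTERVAL_TIERS = [
    (10_000_000, 10.0),
    (1_000_000, 8.0),
    (100_000, 4.0),
    (0, 1.0),
]

def hold_time(online, last_collect_ts, now):
    """Seconds the proxy should hold this request, given the room's
    online count and the client's last successful collection time."""
    for min_online, interval in INTERVAL_TIERS:
        if online >= min_online:
            break
    elapsed = now - last_collect_ts
    return max(0.0, interval - elapsed)
```

Because the hold happens in the proxy rather than as a rejection, a client that just collected simply waits out the remainder of its interval instead of hammering LogicSvr with retries.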

**Effect in a 4-million-online load test:** once the adaptive measures kicked in, the 8-10s interval tier was triggered, and request volume was 58% lower than the projected value without them, effectively controlling the pressure large rooms put on LogicSvr.

15. Results

1) Supporting the stable operation of multiple businesses:

2) Load test with 15 million simultaneous online users:

16. Summary and outlook

Through abstracting the problems, analyzing them precisely, and designing accordingly, we completed the live chat room 2.0 iteration, reaching or exceeding the standard of supporting 15 million simultaneous online users in a single room in terms of performance, reliability, scalability, and disaster recovery.

Further optimizations will follow, such as automatically switching large rooms from ordinary sects to the VIP sect, and an important message channel for individual users within a room, making the chat room ever more capable. (This article is simultaneously published at: www.52im.net/thread-3376…)