## Company introduction

In 2018, LeEco Cloud Computing Co., Ltd. was rebranded as New LeEco Cloud Computing Co., Ltd. New LeEco Cloud Computing Co., Ltd. is one of the core business segments of the listed New LeEco system and is responsible for all infrastructure and cloud computing services of the New LeEco ecosystem. New LeEco Cloud focuses its business on the video cloud and the IoT cloud: it is committed to becoming a leading provider of intelligent entertainment cloud technology for the connected home and, with the IoT cloud at its core, to building smarter home and community solutions.

New LeEco Cloud has deep technical reserves in the video industry and is an industry leader in video on demand, live streaming, distribution, media technology, and video content understanding. Its IoT cloud focuses on home security, intelligent connectivity, environmental health, and related areas to provide complete solutions.

## Project background

When watching a live video, it is almost inevitable that you will miss some wonderful moments because of various interruptions. What would the experience be like if you could quickly rewind time? LeEco Cloud’s “Moonlight Box” makes up for that regret perfectly, so wonderful moments are no longer missed.

## Project challenges

The live “Moonlight Box” is an important service of the LeEco Cloud live PaaS platform. It lets viewers shift back and look back at any moment during a broadcast, and at the end of the broadcast it provides an instant “seconds-back” capability that quickly turns the live signal into video on demand for distribution, which significantly improves the live viewing experience and gives live operations more possibilities. Across three rounds of product and engineering iteration, Moonlight Box witnessed the rapid growth of live streams from the ten-thousand level to the million level. What challenges did we meet along the way? How did the allocation strategy for live streams evolve? What upgrades were needed for source-station slicing and index storage? And how did we ensure smooth upgrades through continuous iteration? We will answer these questions across the three major iterations of Moonlight Box.

## Moonlight Box V1.0

The live PaaS platform evolved from the back-end technology department that supported LeTV Group’s own live broadcast business. It had continuously served LeTV.com, LeTV TV, the set-top box, LeTV Sports, and LeTV Music for more than 5 years, and the early number of live streams was at the ten-thousand level (note: the number of live stream IDs; one live stream can be understood as one signal). The live signals were dominated by 7x24-hour long-running broadcasts, supplemented by short broadcasts such as press conferences and concerts (note: when such a short broadcast has no live content, a designated standby clip is usually configured to continuously replace the live signal source, which improves the playback experience when the stream is cut off). Therefore, in the V1.0 architecture, the live production scheduling and allocation algorithm adopted a simple configuration strategy that statically bound each live stream to a device group. The slices and indexes corresponding to the live streams used simple local storage, and live broadcasting, time-shifted look-back, and marked recording were all served in parallel within that device group.

Note: green indicates a live stream that is in use for a long time; purple indicates that the live signal is temporarily interrupted but the source station is configured to play a standby clip, which improves the experience when the live stream is cut off.
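The V1.0 allocation logic can be pictured as a static lookup table that maps each live stream to a fixed device group. Below is a minimal sketch of that idea; the stream IDs and group names are invented for illustration and are not the platform’s real configuration:

```python
# Minimal sketch of the V1.0 static allocation strategy: each live stream
# is bound to one device group by configuration, and its slices/indexes
# stay on that group's local storage. IDs and names are illustrative.

STREAM_TO_GROUP = {
    "letv_sports_01": "group-a",        # 7x24 long-running broadcast
    "letv_music_01": "group-a",
    "press_conference_42": "group-b",   # short broadcast with a standby clip
}

def device_group_for(stream_id: str) -> str:
    """Return the device group statically bound to a live stream."""
    try:
        return STREAM_TO_GROUP[stream_id]
    except KeyError:
        raise ValueError(f"stream {stream_id!r} has no configured device group")

if __name__ == "__main__":
    print(device_group_for("letv_sports_01"))  # -> group-a
```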

As the live PaaS platform opened up, a large number of live streams were brought onto the platform, and the main business shifted toward short, intermittent broadcasts such as shows and press conferences. If each live stream were still permanently bound to its own live servers under the original strategy, the number of servers would multiply and resource costs would soar. To solve this problem, the Moonlight Box architecture was upgraded to V1.1.

## Moonlight Box V1.1

In V1.1, live streams are produced on demand. To keep customer traffic safe, the dispatcher assigns both an active and a standby device to produce each stream. When the active node fails, an active/standby switchover is performed automatically so that viewers do not notice the interruption.
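A minimal sketch of this on-demand dispatch idea is shown below, assuming a hypothetical dispatcher that simply picks the two least-loaded devices as the active and standby producers:

```python
# Sketch of V1.1 on-demand dispatch: when a stream starts, pick an active
# and a standby device; if the active node fails, promote the standby.
# Device names, loads, and the selection rule are illustrative assumptions.

def assign_active_standby(stream_id, device_load):
    """Pick the two least-loaded devices as (active, standby) for a stream."""
    ranked = sorted(device_load, key=device_load.get)
    if len(ranked) < 2:
        raise RuntimeError("need at least two devices for active/standby")
    return {"stream": stream_id, "active": ranked[0], "standby": ranked[1]}

def failover(assignment):
    """Promote the standby to active when the active node fails."""
    assignment["active"], assignment["standby"] = assignment["standby"], None
    return assignment

if __name__ == "__main__":
    load = {"dev-1": 12, "dev-2": 3, "dev-3": 7}
    a = assign_active_standby("concert_2018_05", load)
    print(a)            # active: dev-2, standby: dev-3
    print(failover(a))  # standby promoted after an active-node failure
```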

With the rapid growth of the business and the rapid rise in daily live streams, the platform expanded the live source-station clusters. However, the allocation strategy gives priority to binding a stream to the device that already holds its time-shift data (note: this policy ensures that a viewer can look back over the whole broadcast on the same device). As a result, serious skew can appear during actual operation, causing obvious hot-spot problems. The cluster therefore has to report per-device traffic so that the scheduler can decide whether standby streams need to be migrated to rebalance the cluster.

Note: the dotted arrows indicate that part of the live streams are migrated when skew occurs. Green indicates a live stream that is being played, red a live stream that is about to be migrated, and yellow a live stream that has been migrated.
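The rebalancing decision can be sketched roughly as follows; the skew threshold and the rule of draining the hottest device into the coldest one are assumptions for illustration, not the platform’s actual algorithm:

```python
# Rough sketch of hot-spot detection and stream migration in V1.1.
# A device is treated as "hot" when its stream count exceeds the cluster
# average by a threshold, and streams are moved to the coldest device.
# The threshold and sample data are illustrative assumptions.

SKEW_THRESHOLD = 1.5  # hot if load > 1.5x the cluster average (assumed)

def plan_migrations(streams_per_device):
    """Return a list of {'from': dev, 'to': dev} migration plans."""
    avg = sum(streams_per_device.values()) / len(streams_per_device)
    plans = []
    for dev, count in streams_per_device.items():
        while count > avg * SKEW_THRESHOLD:
            coldest = min(streams_per_device, key=streams_per_device.get)
            plans.append({"from": dev, "to": coldest})
            streams_per_device[dev] = count = count - 1
            streams_per_device[coldest] += 1
    return plans

if __name__ == "__main__":
    print(plan_migrations({"dev-1": 40, "dev-2": 10, "dev-3": 10}))
```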

We alleviated the hot-spot problem through stream migration, but this approach lags behind the traffic, so we needed a new architecture to solve the problem at its root. Before introducing the new architecture, let us quickly review the protocols and files used by the live streaming business. HTTP Live Streaming (HLS) is a streaming protocol defined by Apple. It is implemented on top of HTTP, and the transmitted content consists of M3U8 description files and TS media files. An M3U8 file is a text description of the media files and consists of a series of tags.
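As a concrete illustration, the short Python sketch below prints what a minimal live M3U8 media playlist looks like; the segment names and durations are invented for the example:

```python
# Sketch of a minimal live HLS media playlist (M3U8): a plain-text file
# made of tags that describe the TS media segments. Segment names and
# durations are invented for illustration.

segments = [("seg_1001.ts", 6.0), ("seg_1002.ts", 6.0), ("seg_1003.ts", 5.8)]

lines = [
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    "#EXT-X-TARGETDURATION:6",
    "#EXT-X-MEDIA-SEQUENCE:1001",
]
for name, duration in segments:
    lines.append(f"#EXTINF:{duration:.3f},")
    lines.append(name)

print("\n".join(lines))
```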

As the business kept growing, the storage pressure on the whole live cluster became severe, and the I/O bottleneck had to be removed as soon as possible. In this context, we first migrated the TS slices to LeS3 (the LeEco Cloud object storage system), but the video index was still managed in a master/slave mode, so the next focus became finding a clustered storage solution for the M3U8 index. Because the slice duration differs between live streams (usually set between 2 and 10 seconds), the write peak of one of the largest clusters in Beijing is around 30,000 QPS, and the workload is write-heavy and read-light. A traditional master/slave RDS cannot handle this on a single node, so sharding (splitting databases and tables) would be required, but sharding has many drawbacks: it is intrusive and unfriendly to the application, and the usual proxy-based solutions not only demand extra expertise but also carry many limitations. Live streaming also requires flexible scalability, and the cost of repeated re-sharding and expansion is very high, which would leave hidden risks for the business. During this period we came across TiDB: it supports multi-active deployment with no single point of failure, scales horizontally, and is compatible with MySQL, all of which fit our business needs very well. In addition, TiDB’s installation, deployment, and monitoring are very well done, so we decided to test it and see the results.
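Since TiDB speaks the MySQL protocol, the M3U8 segment index can be written with an ordinary MySQL client. The sketch below shows one possible table layout and write path; the schema, column names, and connection settings are hypothetical and only illustrate the write-heavy access pattern, not the schema we actually deployed:

```python
# Sketch of storing the per-stream segment index in TiDB over the MySQL
# protocol. Table layout and connection settings are hypothetical.
import pymysql

DDL = """
CREATE TABLE IF NOT EXISTS hls_segment_index (
    stream_id    VARCHAR(64)  NOT NULL,
    seq_num      BIGINT       NOT NULL,
    ts_url       VARCHAR(255) NOT NULL,
    duration_ms  INT          NOT NULL,
    created_at   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (stream_id, seq_num),
    KEY idx_created (stream_id, created_at)
)
"""

def write_segments(conn, stream_id, segments):
    """Append newly produced TS segments to the index (write-heavy path)."""
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO hls_segment_index (stream_id, seq_num, ts_url, duration_ms) "
            "VALUES (%s, %s, %s, %s)",
            [(stream_id, seq, url, dur) for seq, url, dur in segments],
        )
    conn.commit()

if __name__ == "__main__":
    # Hypothetical TiDB endpoint; 4000 is TiDB's default MySQL port.
    conn = pymysql.connect(host="tidb.example.internal", port=4000,
                           user="moonbox", password="***", database="live")
    with conn.cursor() as cur:
        cur.execute(DDL)
    write_segments(conn, "letv_sports_01",
                   [(1001, "les3://bucket/seg_1001.ts", 6000)])
```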

## Moonlight Box V1.2

After about a week of functional and stress testing of our common scenarios on TiDB, the results met our expectations: in terms of storage capacity, QPS, and response time, it could support the requirement of “fast time-shift and look-back”. During the test we also exchanged ideas with the PingCAP engineers and settled production details such as the deployment architecture, hardware selection, table structure, and index optimization. After the TiDB production cluster went online, on top of the original V1.1 architecture we wrote the HLS look-back index of each live stream to TiDB in addition to local storage, in order to verify TiDB’s stability in a real production environment. During this period we also ran fault drills, such as restarting one of the PD, TiKV, and TiDB nodes: there was no service unavailability and no data loss. We then carried out a pilot migration of the Moonlight Box service of one live cluster in Beijing, using grayscale traffic cut-over to gradually switch the time-shift, look-back, and seconds-back requests of live streams onto TiDB, and it ran stably. Today, the Moonlight Box service of live clusters all over the country runs on TiDB.
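The dual-write and grayscale cut-over described above can be sketched as follows; the percentage-based routing rule and the helper names are assumptions for illustration rather than the platform’s actual implementation:

```python
# Sketch of the verification path used while migrating the look-back index:
# every segment is written to both local storage and TiDB (dual write),
# while reads are gradually routed to TiDB for a grayscale subset of streams.
# Routing rule and store interfaces are illustrative assumptions.
import zlib

GRAYSCALE_PERCENT = 20  # share of streams whose reads go to TiDB (assumed)

def write_index(stream_id, segment, local_store, tidb_store):
    """Dual write: keep local storage authoritative, mirror into TiDB."""
    local_store.append(stream_id, segment)
    tidb_store.append(stream_id, segment)

def read_backend(stream_id):
    """Route look-back reads to TiDB for a stable subset of streams."""
    bucket = zlib.crc32(stream_id.encode()) % 100
    return "tidb" if bucket < GRAYSCALE_PERCENT else "local"

if __name__ == "__main__":
    for sid in ("letv_sports_01", "concert_2018_05", "press_conference_42"):
        print(sid, "->", read_backend(sid))
```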

## Summary review

We have walked through the three stages of the “Moonlight Box” in detail. Finally, let us briefly review them with a table, as follows:

## Online effect data

By storing the M3U8 data in a unified TiDB cluster, the structure of the live source station is greatly simplified, and the problems of load skew and expansion are solved at the source. At the same time, TiDB handles this write-heavy, read-light business scenario effectively, as shown below:

  • Production performance of a single HLS device increased by 200%;

  • The allocation and scheduling strategy for live streams was simplified, eliminating the skew problem within the cluster;

  • The source-station structure was simplified, removing the coupling with upstream and downstream systems;

  • TiDB’s built-in high availability improved overall system availability;

  • Load balancing on top of TiDB gracefully solves the problem of elastic scaling with live streaming traffic.

## Status and plan

At present, Moonlight Box V1.2 has been serving the three business lines of standard live broadcasting, mobile live broadcasting, and live CDN continuously and stably. The peak write QPS to TiDB in a core live cluster in Beijing reaches about 25,000; with the double caching of the CDN and HLS_Consumer, the peak read QPS is about 5,000. Next, we will also migrate an internal data analysis system for live broadcasting to TiDB.

The general configuration of a TiDB cluster corresponding to a single live broadcast cluster is as follows:

Official support: Finally, thanks again to the PingCAP team for their work on TiDB. Awesome! We look forward to continued cooperation in the future!

About the author: Bin Liu, development engineer at New LeEco Cloud, mainly involved in the continuous iteration of LeEco’s carousel live broadcasting and commercial live broadcast PaaS architectures.