Abstract:

This summer, the World Cup was held in Russia. At the recently concluded special session of the Chongqing Cloud Computing Flying Technology Gathering, Qiu Liangke, a technology expert with Ali Video Cloud, used the World Cup as a case study to discuss with participants the challenges of keeping live broadcasts at the ten-million-viewer scale highly stable, and the practices used to meet them. This is the full text of the speech.

The 2018 Russia World Cup ran from June 14 to July 15, a full month with 64 matches. Youku became the official new media partner of the World Cup designated by CCTV. As Youku's sibling company, Aliyun provided technical support for the live broadcast and took part in the World Cup alongside it. Throughout the tournament, Aliyun's services had to remain uninterrupted, stable and smooth. In addition to Youku, Aliyun also supported the World Cup live broadcasts of CCTV5, CNTV and Migu, carrying 70% of the World Cup traffic across the whole network. On Youku alone, the match between France and Argentina reached 20 million concurrent users; together with the other customers, the total was well into the tens of millions. An event this large and this long is a great challenge for any live broadcasting platform.

So what are the technical challenges of live coverage of the World Cup?

  • The first challenge is content concentration. The core content of the World Cup is the CCTV5 signal, plus commentary channels and multi-angle channels, for a total of just over 10 live channels. If the source station has a problem, every live broadcast has a problem.
  • The second challenge is high concurrency. The World Cup has up to tens of millions of concurrent online viewers, and changes in the number of users are very pronounced. If even a temporary fault occurs on the live broadcast platform, all users may re-request the stream or turn to other systems, which puts great pressure on the system.
  • The third challenge is security. The World Cup is a world-class event. A security incident during the broadcast would have a very large impact, and the platform would bear unavoidable responsibility.

Facing these challenges, Aliyun had to keep the World Cup live broadcast stable, smooth and secure at all times. Behind this lies a complex set of schemes and logic, which we will go through in three parts: stability, security and monitoring.

Stability

The following figure shows the architecture of World Cup live broadcast stability, which is divided into four parts: source production link, video cloud center, CDN and client.

The first part is the source production link. The original signals usually cannot be used directly by the cloud platform; for example, signals with a very high bit rate are not suitable for transmission. So there is generally a source production link that encodes the signal and exposes it through a source station service. The second part is the video cloud center. The live broadcast platform has a centralized architecture: all functions required for live broadcast, such as transcoding, screenshots, recording and watermarking, are performed in the center. The third part is the CDN, the link that actually bears the user load of the whole World Cup live broadcast; tens of millions of users are served from the CDN. The fourth part is the client. Next, we describe how Aliyun ensures service stability in each part.

I. Source production link

  • Multiple signal source input: Users are advised to feed multiple signal sources into the production link. With a single signal source, a failure of that source interrupts the service. If only one signal source is available, users are advised to bring it in over multiple links, so that a single link failure does not take the signal down.
  • Primary and backup on-premises (offline) encoders: As mentioned above, the original signal source cannot be used directly, so encoders are used. We generally require that the primary and backup encoders can each obtain multiple signal sources in real time and support real-time switching. There are two switching modes. The first is a direct primary/backup mode, in which only one encoder outputs the source at any given time. The second is a dual-backup mode, in which both encoders provide their output to the cloud platform at the same time. In this mode, the encoder output is eventually synchronized to the video cloud center by stream pulling or stream pushing. With pulling, the video cloud center pulls from multiple source stations at the same time, so that it still receives data smoothly if any one source station or encoder has a problem. With pushing, the initiative lies with the user, who should push the stream to the video cloud center from multiple source stations or encoders at the same time.
  • Multi-egress push/pull: The source station must have multiple egress links, because relying on a single egress link is a network risk.
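
The pull-mode behaviour described above, where the center reads from several source stations so that one failed station or encoder does not interrupt the stream, can be sketched roughly as follows. This is only an illustration, not Aliyun's implementation: the `Source` class, the `pick_source` function and the 3-second staleness threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Source:
    """One source station/encoder output (hypothetical model)."""
    name: str
    last_frame_at: float  # time in seconds at which the last frame arrived


def pick_source(sources: Sequence[Source], now: float,
                current: Optional[str] = None,
                stale_after: float = 3.0) -> Optional[str]:
    """Return the name of the source to read from.

    A source counts as alive if it delivered a frame within `stale_after`
    seconds (an assumed threshold). The current source is kept while it is
    alive, to avoid needless switching; otherwise the first alive source in
    priority order is chosen. Returns None if every source is stale.
    """
    alive = [s.name for s in sources if now - s.last_frame_at <= stale_after]
    if current in alive:
        return current
    return alive[0] if alive else None


sources = [Source("primary", last_frame_at=100.0),
           Source("backup", last_frame_at=104.5)]
# At t=105 the primary has been silent for 5 s, so the backup is chosen.
chosen = pick_source(sources, now=105.0, current="primary")
```

Keeping the current source while it is still alive is a design choice of this sketch: it avoids flapping between two healthy inputs.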

II. Video cloud center

  1. Multi-network access: to match the multi-link input from the source, the video cloud center also accepts streams over multiple networks.
  2. Stream merging: the multiple streams arriving at the video cloud center are turned into a single stream by a dedicated merge component.
  3. Distributed deployment: components are deployed in a distributed manner. For major events, we use independent resources in a dedicated machine room, which does not affect other businesses.
  4. Automatic detection: the status of each component is detected automatically. If a problem is found, the switchover completes within 10 seconds to keep the live stream continuous.
  5. Aligned slices: for live broadcasts of events with relatively little interaction, H264 is used. The optimization is that the slices output for every bit rate have aligned lengths and aligned I-frames. The advantage is that when a downstream player switches between bit rates, the picture stays continuous with no visible jump.
  6. Slice double write: when slicing, each stream is written to two OSS stores at the same time, so that the downstream CDN has two origins to pull from.
  7. Remote backup: in view of the importance of the live broadcast, on top of the architecture described above, a remote backup of the whole center is also deployed.
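
The slice alignment in item 5 can be illustrated with a small sketch. It assumes every rendition is encoded with the same frame rate and the same I-frame interval, and that a slice may only start on an I-frame; `segment_boundaries` and `make_frames` are hypothetical helpers written for this example, not part of any real slicer.

```python
from typing import List, Tuple

Frame = Tuple[float, bool]  # (presentation time in seconds, is_keyframe)


def segment_boundaries(frames: List[Frame], target: float) -> List[float]:
    """Return the start time of every slice.

    A new slice starts at the first I-frame that is at least `target`
    seconds after the start of the current slice, so every slice begins
    with an I-frame.
    """
    starts: List[float] = []
    for pts, is_key in frames:
        if not is_key:
            continue
        if not starts or pts - starts[-1] >= target:
            starts.append(pts)
    return starts


def make_frames(duration: float, fps: int, gop_seconds: float) -> List[Frame]:
    """Generate a synthetic stream with one I-frame every `gop_seconds`."""
    gop = int(fps * gop_seconds)
    return [(i / fps, i % gop == 0) for i in range(int(duration * fps))]


# Two renditions (say, a high and a low bit rate) with the same frame rate
# and the same 2-second I-frame interval.
high = make_frames(duration=12, fps=25, gop_seconds=2)
low = make_frames(duration=12, fps=25, gop_seconds=2)

high_cuts = segment_boundaries(high, target=4.0)
low_cuts = segment_boundaries(low, target=4.0)
# Both renditions are cut at the same timestamps, so a player can switch
# bit rate at any slice boundary without a gap or overlap.
```

If one rendition used a different I-frame interval, its cut points would drift away from the others, which is the kind of discontinuity the alignment is meant to avoid.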

III. CDN and client

  1. With OSS dual write, the CDN reads from both OSS stores at the same time and, based on real-time detection, pieces together the good slices. An abnormal or slow write to either OSS store does not affect content delivery.
  2. Although the CDN architecture is itself distributed, the scale of the World Cup is so large that primary/backup disaster recovery is also adopted for the CDN center. If one CDN center has a problem, traffic switches to the other immediately. In addition, each region has multiple L2 nodes. If an L2 node has a problem at some moment, nearby L2 nodes take over immediately, without affecting service quality.
  3. The CDN's load balancing and scheduling optimization keep the service smooth.
  4. Finally, a suggestion for the client side. Because there are many kinds of clients, some may stall or fail to play when one of the switchovers above occurs, so the client needs to retry its CDN requests.
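
The client-side retry suggested in item 4 might look like the sketch below. The talk only says that the client should retry; the exponential backoff with random jitter, the rotation between two domains, the function name `fetch_with_retry` and the example URLs are all assumptions of this example. The jitter is there so that a large number of clients retrying at once do not all hit the CDN at the same instant.

```python
import random
import time
from typing import Callable, List, Sequence


def fetch_with_retry(urls: Sequence[str],
                     fetch: Callable[[str], bytes],
                     max_attempts: int = 4,
                     base_delay: float = 0.2,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch one slice, retrying on network failure.

    Attempts rotate through `urls` (for example a primary and a backup
    domain). Between attempts the client waits a random time of up to
    base_delay * 2**attempt seconds.
    """
    last_error: Exception = RuntimeError("no attempt made")
    for attempt in range(max_attempts):
        url = urls[attempt % len(urls)]
        try:
            return fetch(url)
        except OSError as err:  # network-level failure
            last_error = err
            if attempt < max_attempts - 1:
                sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise last_error


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request: the first domain always fails."""
    calls.append(url)
    if "cdn-a" in url:
        raise ConnectionError("simulated switchover in progress")
    return b"slice-data"


data = fetch_with_retry(
    ["https://cdn-a.example.com/live/1.ts",
     "https://cdn-b.example.com/live/1.ts"],
    fake_fetch,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```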

Security

In safeguarding the World Cup live broadcast, content security is as important as stability. Users can ensure content security through several schemes: content review in the Aliyun cloud director console, IP whitelists for stream pushing and pulling at the source, push stream authentication, and HTTPS verification and anti-hijacking for pulled streams. For copyrighted content such as the World Cup, Aliyun also provides playback authentication and secondary authentication, DRM protection of the video itself, sub-domain blocking, regional restrictions and other schemes to prevent stream theft and hotlinking.
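
As a rough illustration of what push or playback authentication involves, the sketch below signs a stream path with an expiry time and verifies it on the server side. It is a generic HMAC-based scheme and does not reproduce Aliyun's actual URL authentication format; the parameter names `expires` and `token`, the secret and the path are all made up for the example.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def sign_url(path: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry time and an HMAC-SHA256 token to a stream path."""
    message = f"{path}:{expires_at}".encode()
    token = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&token={token}"


def verify_url(url: str, secret: bytes, now: Optional[int] = None) -> bool:
    """Accept the URL only if the token matches and it has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        token = query["token"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if now > expires_at:
        return False  # link has expired
    message = f"{parsed.path}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token, expected)


secret = b"demo-secret"  # hypothetical shared key
url = sign_url("/live/worldcup", secret, expires_at=1000)
```

Because the path is part of the signed message, a token copied from one stream cannot be reused for another, and the expiry limits how long a leaked link stays useful.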

Monitoring

Once the architecture has solved the stability problem and extensive security preparations are in place, the live stream still needs monitoring. Aliyun's monitoring of the World Cup live stream is divided into the following three parts.

The first is the monitoring of the technical environment

This covers the CPU usage, memory, network, disk and so on of every device in the live broadcast center and the CDN. In case of a failure, it works with the schemes described earlier to switch automatically. If abnormal or sub-healthy conditions occur, O&M personnel are alerted so they can quickly locate and handle the problem.

The second is application monitoring

This covers the processes, ports, QPS load and live delay of each program. As with environment monitoring, a failure triggers the switching mechanism, while an exception raises an alarm for manual handling.
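
The split between "failure triggers a switchover" and "exception raises an alarm" can be sketched as a simple decision function. The metric names and threshold values here are hypothetical and chosen only for illustration; the talk does not give the real ones.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    NONE = "none"
    ALARM = "alarm"            # notify O&M personnel
    SWITCHOVER = "switchover"  # trigger the automatic switching mechanism


# Hypothetical thresholds: (warning level, failure level) for each metric.
THRESHOLDS = {
    "cpu_percent": (80.0, 98.0),
    "memory_percent": (85.0, 98.0),
    "live_delay_seconds": (5.0, 15.0),
}


def decide(metrics: Dict[str, float], process_alive: bool,
           port_open: bool) -> Action:
    """Map one monitoring sample to an action.

    A dead process, a closed port or any metric at its failure level means
    switchover. A metric at its warning level (sub-healthy) means alarm.
    Metrics missing from the sample are ignored.
    """
    if not (process_alive and port_open):
        return Action.SWITCHOVER
    action = Action.NONE
    for name, (warn, fail) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue
        if value >= fail:
            return Action.SWITCHOVER
        if value >= warn:
            action = Action.ALARM
    return action


healthy = decide({"cpu_percent": 35.0, "live_delay_seconds": 2.0}, True, True)
sub_healthy = decide({"cpu_percent": 90.0}, True, True)
failed = decide({"cpu_percent": 20.0}, process_alive=False, port_open=True)
```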

The third is business monitoring

The following figure shows a composite monitoring view of the live broadcast service. Green indicates normal, while yellow marks lines that may be abnormal, with frame loss or packet loss. Clicking a line shows the status of that stream at the current time, such as whether its timestamps over the past hour are continuous and increasing or contain jumps, so that client compatibility problems do not cause a bad user experience.
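
A timestamp check of the kind described here could be sketched as follows. The 40 ms frame interval (25 frames per second), the tolerance and the function name `check_timestamps` are assumptions for the example.

```python
from typing import List, Tuple


def check_timestamps(timestamps_ms: List[int],
                     frame_interval_ms: int = 40,
                     tolerance_ms: int = 20) -> List[Tuple[int, str]]:
    """Return (index, problem) pairs for suspicious timestamps.

    "backward" means the timestamp did not increase; "jump" means the gap
    to the previous frame is larger than the expected interval plus the
    tolerance.
    """
    problems: List[Tuple[int, str]] = []
    for i in range(1, len(timestamps_ms)):
        delta = timestamps_ms[i] - timestamps_ms[i - 1]
        if delta <= 0:
            problems.append((i, "backward"))
        elif delta > frame_interval_ms + tolerance_ms:
            problems.append((i, "jump"))
    return problems


# A stream that jumps forward at index 4 and goes backward at index 6.
sample = [0, 40, 80, 120, 400, 440, 430, 470]
issues = check_timestamps(sample)
```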

For frame rate monitoring: during stream merging in the video cloud center, we combine the different streams into one in real time to smooth out jitter. The following four pictures show monitoring of the same stream over the same period. The upper three streams are merged into the bottom one, which has fewer spikes and is more stable.
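
To see why merging redundant inputs gives a steadier output, consider the simplified sketch below. It assumes the redundant copies carry identical frames with identical timestamps and merges them in one batch; a real merge component would work on live data with a bounded buffer, and the talk does not describe its internals. `merge_streams` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

Frame = Tuple[int, bytes]  # (timestamp in ms, payload)


def merge_streams(streams: Iterable[List[Frame]]) -> List[Frame]:
    """Merge redundant copies of one stream into a single stream.

    Frames are keyed by timestamp, and the first copy seen for a timestamp
    wins. A frame dropped by one input is filled in by any other input
    that still has it.
    """
    merged: Dict[int, bytes] = {}
    for stream in streams:
        for ts, payload in stream:
            merged.setdefault(ts, payload)
    return sorted(merged.items())


# Three inputs, each missing different frames.
a = [(0, b"f0"), (40, b"f1"), (120, b"f3")]
b = [(0, b"f0"), (80, b"f2"), (120, b"f3")]
c = [(40, b"f1"), (80, b"f2")]

merged = merge_streams([a, b, c])
# Every input has gaps, but the merged stream has all four frames.
```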

In addition, the system collects and monitors real-time data such as the slow-speed ratio on the server side and the stall rate on the client side, and tunes quality in combination with the client.

In addition to the service architecture, multi-dimensional security strategy and full-link monitoring described above, Aliyun video cloud also provides users with capabilities such as live broadcast disaster recovery, smart clipping, long-distance intelligent scheduling, startup within seconds, bit rate control, ultra-clear video, high-speed channels, hotlinking prevention, a cloud director console, advertisement identification and ET subtitles.

It is believed that, after supporting the live broadcast of the World Cup, Ali Video Cloud will have accumulated more technology for live event scenarios, so as to create more value for customers in the future and bring users a better viewing experience.

This article is original content of the Yunqi Community (cloud habitat community) and may not be reproduced without permission.