Speaker: Que Han, senior technical expert, Aliyun CDN live streaming
Topic: The Double 11 live broadcast is a typical case among many live streaming events, and it cannot be separated from live streaming itself. This talk therefore covers five areas: an overview of live streaming, live streaming architecture, business functions, live streaming monitoring, and Double 11.
During the annual Double 11 shopping festival, two systems come under the greatest pressure: the first is Alipay transactions and the second is the CDN. So what exactly is CDN technology?
To take a simple example: whether you shop on the Taobao website or in the mobile app, the product details page shows a lot of pictures, and some merchants also upload videos to present their products more vividly. When merchants upload pictures and videos, the files are typically stored on servers in Alibaba’s main data center in Hangzhou. The people viewing the product, however, may be in distant northeast China or in Hainan. If they fetched the files directly from the Hangzhou data center, the network path would be very long and would often cross carriers, so the experience would be poor and pictures might fail to load.
With CDN technology, the pictures, videos, and other static files stored in Hangzhou are distributed to hundreds of data centers across the country. A user in Beijing who opens the product details page is then served directly from a data center in Beijing that belongs to the same carrier network (a telecom user is routed to a telecom data center), so the data is fetched quickly from a nearby node. In a product details page, pictures and videos account for almost all of the data, and the text is negligible by comparison, so distributing pictures and videos quickly is what keeps the user experience smooth.
Overview of Live Video
Live streaming took off around 2014, represented by Inke, Douyu, Panda TV, and Zhanqi TV. There are two main scenarios: mobile live streaming and game live streaming. In mobile live streaming, both pushing and playback happen in the mobile app developed by each platform. Game live streaming mostly uses the open-source OBS to push the stream, and most playback happens in web players.
Similarities and differences between live video and VOD
From the player’s point of view, live video and video on demand (VOD) behave the same way: after establishing a connection with the server, the player keeps reading audio and video data and renders it. Seen this way, VOD and live streaming differ very little. The main difference is that VOD can be fast-forwarded, rewound, or started from any position, while a live stream cannot be fast-forwarded and can only be watched at the current point of the broadcast.
Aliyun live streaming: solution overview
The pushing side sends the live video stream through edge nodes, which accelerate it on its way to the live center. The live center performs processing such as transcoding, screenshots, recording, and watermarking. The stream is then distributed through the CDN to the various players, whose SDKs apply optimizations such as near-instant startup and handling of poor network conditions.
Features of live streaming service: pushing and playing in a live streaming system
For live video, the two most important steps are pushing the stream and playing it. Pushing generally uses the RTMP protocol, and common push clients include OBS, mobile apps, and FFmpeg. For playback, HTTP-FLV and HLS can be used in addition to RTMP. RTMP and HTTP-FLV are streaming transports, while HLS is delivered as files through file acceleration. Common players include Flash, VLC, HTML5, and mobile apps. In the Aliyun live streaming system, most live distribution is done through streaming, and only a small part through accelerated file distribution.
Features of live streaming service: streaming distribution in detail and the CDN live streaming system
Let’s look at streaming distribution in more detail. In live streaming, streaming distribution uses long-lived connections, for pushing and for playing alike. A broadcast may last two hours, and the stream is not interrupted during those two hours. The server receives audio and video data frame by frame and delivers it to the player the same way. Whatever the transport protocol, each frame is encapsulated as an FLV tag, and every audio or video frame carries a timestamp.
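To make the "FLV tag with a timestamp" idea concrete, here is a minimal sketch that decodes the 11-byte tag header defined by the FLV file format. The names `FlvTagHeader` and `parse_flv_tag_header` are made up for this example, and the sketch says nothing about how the Aliyun servers actually handle tags internally.

```python
from dataclasses import dataclass

# Tag type values defined by the FLV file format.
TAG_TYPES = {8: "audio", 9: "video", 18: "script"}


@dataclass
class FlvTagHeader:
    tag_type: str
    data_size: int      # size of the tag body in bytes
    timestamp_ms: int   # timestamp of this frame in milliseconds
    stream_id: int


def parse_flv_tag_header(buf: bytes) -> FlvTagHeader:
    """Decode the 11-byte header that precedes every FLV tag body."""
    if len(buf) < 11:
        raise ValueError("an FLV tag header is 11 bytes long")
    tag_type = buf[0] & 0x1F                      # low 5 bits hold the type
    data_size = int.from_bytes(buf[1:4], "big")   # 24-bit body size
    # 24-bit timestamp, plus an extension byte that holds the top 8 bits.
    timestamp = int.from_bytes(buf[4:7], "big") | (buf[7] << 24)
    stream_id = int.from_bytes(buf[8:11], "big")
    return FlvTagHeader(TAG_TYPES.get(tag_type, "unknown"),
                        data_size, timestamp, stream_id)


# A video tag with a 5-byte body and a timestamp of 1000 ms.
header = parse_flv_tag_header(bytes([9, 0, 0, 5, 0, 0x03, 0xE8, 0, 0, 0, 0]))
print(header.tag_type, header.timestamp_ms)
```

The timestamp is what lets a server or player order frames and measure frame rate, regardless of which protocol carried the tag.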
The following figure shows the relationship between pushers, players, and servers. We can describe pushing and playing as publishing and subscribing.
Server A on the left carries two streams. For the first stream, the streamer pushes directly to server A (publishing), and the stream has two players, that is, two subscribers. The second stream has three subscribers, but its publisher is not a streamer pushing directly: server A pulls the stream from server B (back-to-source) and republishes it. That back-to-source link is itself a subscriber on server B, and the publisher on server B is the streamer’s push. This cascading relationship between servers is what makes up the CDN live streaming network.
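The publish and subscribe relationship in the figure can be modeled with a few lines of code. This is a minimal sketch under stated assumptions: the `Server` and `Stream` classes, the string labels such as `"pull:B"`, and the stream names are all invented for illustration and are not part of any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    name: str
    publisher: str                      # "streamer" or "pull:<server name>"
    subscribers: list = field(default_factory=list)


class Server:
    def __init__(self, name):
        self.name = name
        self.streams = {}

    def publish(self, stream_name, publisher="streamer"):
        self.streams[stream_name] = Stream(stream_name, publisher)

    def subscribe(self, stream_name, subscriber, upstream=None):
        if stream_name not in self.streams:
            if upstream is None:
                raise KeyError(f"{stream_name} is not available on {self.name}")
            # Back-to-source: this server subscribes on the upstream server
            # and republishes the stream locally.
            upstream.subscribe(stream_name, f"server:{self.name}")
            self.publish(stream_name, publisher=f"pull:{upstream.name}")
        self.streams[stream_name].subscribers.append(subscriber)


# Rebuild the figure: stream1 is pushed to A, stream2 is pushed to B.
a, b = Server("A"), Server("B")
a.publish("stream1")
b.publish("stream2")
for player in ("player1", "player2"):
    a.subscribe("stream1", player)
for player in ("player3", "player4", "player5"):
    a.subscribe("stream2", player, upstream=b)

print(a.streams["stream2"].publisher)    # pull:B
print(b.streams["stream2"].subscribers)  # ['server:A']
```

Note that server B sees only one subscriber for the second stream, however many players watch it on server A. That fan-out at each level is what lets the cascade scale.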
Features of live streaming service: audio and video frames and where delay comes from
A live video stream carries audio frames and video frames. Every audio frame can be decoded independently: once the player has received any audio frame from the server, it can decode and play that frame on its own.
Video frames are divided into key frames and non-key frames. A key frame can be decoded and rendered independently, so the picture is visible immediately. A non-key frame cannot, because decoding it depends on the video frames before it. The advantage of a key frame is independent decoding, and the disadvantage is that it carries a lot of data. A non-key frame, by contrast, is very small, often only a few KB or even about 1 KB. Whenever a player starts watching a live stream, the data sent to it should begin with a key frame. Otherwise the picture is garbled at first and the experience is poor.
An important concept here is the GOP: the interval between two video keyframes. The following image shows the audio and video sequence of a stream, in which the two keyframes are about 10 seconds apart. The arrow at the bottom marks the current position of the stream, and the dotted part that follows is data that has not been pushed yet. At this moment the previous keyframe appeared about 3 seconds ago. If a player joins now, we cannot start sending from the arrow, because the player would show a garbled picture. Instead we send from the previous keyframe, so the player starts playing from 3 seconds ago, and that is where delay comes from. The larger the GOP, the greater the average delay, and the smaller the GOP, the smaller the average delay. In general, the delay is 2-4 seconds for mobile live streaming and 8-10 seconds for game live streaming.
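The link between GOP length and delay can be checked with a little arithmetic. The sketch below covers only the delay caused by starting from the previous keyframe and ignores network transit and player buffering. It also assumes evenly spaced keyframes and join times spread uniformly over a GOP. The function names are made up for this example.

```python
def join_delay(join_time_s: float, gop_s: float) -> float:
    """Delay caused by starting playback from the previous keyframe.

    Assumes keyframes occur at 0, gop_s, 2 * gop_s, ... seconds.
    """
    return join_time_s % gop_s


def average_join_delay(gop_s: float, samples: int = 1000) -> float:
    """Average delay over join times spread evenly across one GOP."""
    step = gop_s / samples
    # Sample the midpoint of each step so the average is not biased low.
    total = sum(join_delay((i + 0.5) * step, gop_s) for i in range(samples))
    return total / samples


# The situation in the figure: 10-second GOP, last keyframe 3 seconds ago.
print(join_delay(13, 10))  # 3
```

Under these assumptions the average works out to half the GOP length, which is why a larger GOP means a larger average delay.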
Features of live streaming service: data cached on the server
With the CDN picture and video acceleration described earlier, content is stored as files on each edge server, intermediate server, and origin server, and a file read today is identical to the same file read tomorrow. For live video, the stored data changes in real time. The server keeps the data starting from the most recent keyframe, and every time a new keyframe arrives, the previously cached data is discarded. This way a player that joins at any moment starts watching from the latest position of the broadcast.
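The caching rule described above can be sketched as a small class. This is an illustration only: `GopCache`, its method names, and the frame kinds `"key"`, `"video"`, and `"audio"` are assumptions for the example. Dropping frames that arrive before the first keyframe is also a choice made here, not something the talk specifies.

```python
class GopCache:
    """Keeps only the frames from the most recent keyframe onward."""

    def __init__(self):
        self.frames = []  # list of (kind, timestamp_ms, payload)

    def push(self, kind, timestamp_ms, payload=b""):
        if kind == "key":
            # A new keyframe makes everything cached before it obsolete.
            self.frames.clear()
        elif not self.frames:
            # Nothing cached yet: wait for the first keyframe.
            return
        self.frames.append((kind, timestamp_ms, payload))

    def snapshot(self):
        """Frames to send first to a player that joins right now."""
        return list(self.frames)


cache = GopCache()
cache.push("key", 0)
cache.push("video", 40)
cache.push("audio", 46)
cache.push("key", 2000)   # the earlier three frames are discarded here
cache.push("video", 2040)
print([ts for _, ts, _ in cache.snapshot()])  # [2000, 2040]
```

A new player always receives a snapshot that begins with a keyframe, so it can render a clean picture immediately.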
CDN live broadcast architecture and service functions
The diagram shows the CDN live streaming network architecture, which is built on the publish and subscribe relationship. On the left, the streamer pushes the stream to an L1 edge node. The stream is accelerated upstream through an L2 node and finally reaches the central data center. On the right, players play from the nearest L1 node, usually an edge node that covers the local area. If that node already has the stream, it returns it directly. If not, it pulls the stream from an L2 node, and if the L2 node does not have it either, the request goes all the way to the central data center. Any hop along the way can suffer jitter, so the CDN automatically reschedules and switches links to keep the stream stable.
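The lookup order described above (L1, then L2, then the center) can be sketched as follows. This is only an illustration: the function `fetch`, the tier names, and the use of plain sets to represent which streams a node currently holds are assumptions for the example, not the actual CDN implementation.

```python
def fetch(stream, tiers):
    """Find the first tier that holds the stream, searching edge to center.

    tiers is an ordered list of (name, set_of_stream_names). On a hit, every
    tier closer to the player also starts holding the stream, because the
    data is relayed back down through those tiers.
    """
    for index, (name, streams) in enumerate(tiers):
        if stream in streams:
            for _, lower_streams in tiers[:index]:
                lower_streams.add(stream)
            return name
    return None  # nobody is publishing this stream


l1, l2, center = set(), set(), {"gala"}
tiers = [("L1", l1), ("L2", l2), ("center", center)]

print(fetch("gala", tiers))  # center
print(fetch("gala", tiers))  # L1
```

The first player in a region pays the cost of going back to the center, and later players in the same region are served from the L1 node.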
Live center
The live center plays a very important role in the CDN live streaming system. Its main components are:
- Streaming media servers: hold the thousands of live streams pushed by users.
- Video processing component: performs transcoding, screenshots, recording, and slicing.
- Pull component: fetches streams in cases where the stream is not actively pushed to the live center.
- Scheduling component: resolves which server each stream is located on.
- Live API: typically used by customers to query data about online streams and historical streams.
- Monitoring component: used to monitor the stability of each stream, server, and component.
Transcoding service
Transcoding is a service closely tied to live streaming. The picture on the left is the original stream before transcoding, at a bit rate of a little over 3000 Kbps. The picture on the right is the transcoded video, at a little over 300 Kbps. Transcoding is generally used to help live streaming platforms save bandwidth and to handle cases where playback is not smooth on the player side.
Screenshot service
The screenshot service periodically captures key frames from the video stream and saves them as pictures. Live streaming platforms generally use these as the thumbnail of each live room. When users see a room thumbnail on the platform and find that it has changed after a refresh, that is the screenshot service at work.
The live streaming system also includes other services, such as recording, callbacks, authentication, blacklists and whitelists, relaying streams to other vendors, and video-only or audio-only playback.
Live streaming monitoring
A live video stream has many attributes, and they need to be monitored. We generally monitor the following indicators:
- Live video bit rate: Generally, the bit rate is around 500 Kbps to 1 Mbps for mobile live streaming and around 1.5 Mbps to 4 Mbps for game live streaming.
- Live playback: The current number of online viewers and the total downstream traffic.
- Frame rate: Audio and video each have a fixed frame rate, 30 to 60 frames per second for audio and 15 to 40 frames per second for video. When a stream is healthy, its frame rate plots as a smooth, flat line, and a spike indicates network jitter. With second-level frame rate charts for the full link, troubleshooting becomes very convenient, and problems with streams or reported by users can be located quickly.
- Full-link monitoring: Monitoring covers the entire distributed link. When a customer reports a problem, the system uses full-link monitoring to locate it, and then removes the faulty node or switches the link.
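The frame rate indicator above can be sketched in code. This is a minimal illustration that assumes the monitor records the arrival time of every frame in milliseconds. The function names and the 20% tolerance are assumptions for the example, not values from the talk.

```python
from collections import Counter


def frames_per_second(arrival_times_ms):
    """Count how many frames arrived in each one-second bucket."""
    counts = Counter(t // 1000 for t in arrival_times_ms)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Seconds with no frames at all are reported as 0, not skipped.
    return [counts.get(second, 0) for second in range(first, last + 1)]


def jitter_seconds(fps_series, expected_fps, tolerance=0.2):
    """Indexes of the seconds whose frame rate is far from the expected one."""
    limit = expected_fps * tolerance
    return [i for i, fps in enumerate(fps_series)
            if abs(fps - expected_fps) > limit]


# Second 0 is healthy (25 frames), second 1 stalls, second 2 bursts.
arrivals = ([i * 40 for i in range(25)]
            + [1000 + i * 100 for i in range(10)]
            + [2000 + i * 25 for i in range(40)])
series = frames_per_second(arrivals)
print(series)                      # [25, 10, 40]
print(jitter_seconds(series, 25))  # [1, 2]
```

A stall followed by a burst is the typical shape of network jitter on such a chart: frames are held up for a moment and then arrive together. In practice the last bucket may be incomplete and would need to be excluded.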
Contingency plans for the Double 11 gala live broadcast
Finally, let’s look at what we did for the live broadcast of the Double 11 gala. The gala has been broadcast live on Double 11 since 2015. It promotes Double 11 and draws users to it, so stability is critical and the level of assurance is very high. The picture below is the overall topology diagram, and it covers essentially all the contingency plans for the gala broadcast.
On the far left is the outside broadcast van, which pushes the video stream to the live center. Ordinary streamers push to the live center over the public network, where jitter is quite likely. For the Double 11 gala we push over dedicated lines instead. The plan does not rely on a single dedicated line, because one line would be a single point of failure. We therefore generally use two dedicated lines from different carriers to protect the transmission link.
From on-site capture onward there are four streams. The streams with the narrow suffix carry the same picture processed with narrowband HD technology. sh01 and sh02 back each other up over different dedicated lines, and so do sh01_narrow and sh02_narrow. In addition, sh01 has both a primary push link and a cold standby link, to guard against failure of the on-site push equipment. If a stream has a problem, the playback side switches to a different URL.
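The switching rule on the playback side can be sketched as below. The stream names come from the description above. Everything else is hypothetical: the backup table, the function names, the idea of a `healthy` set fed by monitoring, and the `live.example.com` URL are placeholders, since the talk does not describe how the switch is implemented.

```python
# Each stream is backed up by its counterpart on the other dedicated line.
BACKUP = {
    "sh01": "sh02",
    "sh02": "sh01",
    "sh01_narrow": "sh02_narrow",
    "sh02_narrow": "sh01_narrow",
}


def pick_stream(preferred, healthy):
    """Return the preferred stream, or its backup if the preferred one is down."""
    if preferred in healthy:
        return preferred
    backup = BACKUP[preferred]
    if backup in healthy:
        return backup
    raise RuntimeError(f"neither {preferred} nor {backup} is healthy")


def play_url(stream, base="https://live.example.com/gala"):
    """Build a playback URL (the host and path are placeholders)."""
    return f"{base}/{stream}.flv"


# sh01_narrow has a problem, so the player is pointed at sh02_narrow.
stream = pick_stream("sh01_narrow", healthy={"sh01", "sh02", "sh02_narrow"})
print(play_url(stream))  # https://live.example.com/gala/sh02_narrow.flv
```

Keeping the backup for each stream on a different dedicated line means a single line failure never takes out both a stream and its replacement.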
Double 11 gala transcoding: narrowband HD
The Double 11 gala used narrowband HD. Narrowband HD is Aliyun’s own video encoding technology, which is optimized for the subjective quality perceived by the human eye. In terms of parameters, the peak and average bit rates with and without narrowband HD are basically the same, but the picture quality with narrowband HD is better. It has the following characteristics:
- It evolves along with advances in coding standards, and its gains stack on top of theirs
- It can save about 20% of the bandwidth that is generally wasted in the industry
- It optimizes complex scenes that are generally handled poorly in the industry
- Ordinary viewers can see the difference in quality at a glance
Automated stream-watching plan
For Double 11 we also run an automated stream-watching plan, in which a program plays every stream and outputs its frame rate. A transcoded stream runs at about 25 frames per second, and the points highlighted in the middle of the chart indicate network or transcoding problems. Once a problem is identified, the corresponding action is taken.
Manual stream-watching plan and more
Beyond ruling out network problems, there can also be issues such as audio and video falling out of sync or a garbled picture at on-site capture. For each type of player, therefore, someone checks the actual result of every stream. In addition, operations staff watch the overall traffic and the load level of the CDN nodes. If bandwidth runs very high, we also take measures such as lowering the bit rate to save bandwidth.