“Live streaming with goods” was one of the most representative phrases of 2020. How, then, should traditional e-commerce integrate a live streaming system and guarantee the best possible viewing experience during a broadcast? Based on an online talk by He Shuzhao, senior architect at Tencent Cloud, this article explains in detail the architecture design of a large-scale, low-latency e-commerce live streaming system, as well as the difficulties, technical challenges, and breakthroughs of e-commerce live streaming.

Article / He Shuzhao

Transcribed and edited / LiveVideoStack

Replay of the talk:

http://scrmtech.gensee.com/webcast/site/vod/play-6ced83f94af24094b6d8329948addb09

In this talk I will share Tencent Cloud's recent practice in low-latency e-commerce live streaming: the system architecture design and weak-network optimization, covering three topics:

  • Difficulties, challenges, and technical breakthroughs of e-commerce live streaming
  • Architecture design of a large-scale, low-latency e-commerce live streaming system
  • Weak-network optimization and interactive links in a low-latency live streaming system

Difficulties, challenges, and technical breakthroughs of e-commerce live streaming

E-commerce live streaming projects fall into two types. First, live streaming or short-video companies embracing e-commerce: the challenge they face is not live streaming technology but the design of the e-commerce system itself. Second, offline e-commerce businesses embracing live streaming under the new conditions created by the epidemic: the challenge they face is how to introduce live streaming into their e-commerce systems.

E-commerce live streaming is essentially “e-commerce + live streaming”. The live streaming part is real-time streaming media, which depends heavily on the entire link from host to viewer. If any stage of that end-to-end link fails, users may be unable to buy goods and the conversion rate drops.

Architecture design of a large-scale, low-latency e-commerce live streaming system

A standard e-commerce system is designed around seven steps: browse the product → place the order → pay → view the order → track logistics → confirm receipt → return the goods.

There are three modules, as shown above: convenient entry points and channels; a fast, interactive, expertly guided shopping experience; and better service support. I think live streaming belongs to the live shopping-guide role of the second module.

E-commerce live streaming has become a hot topic recently partly because of the epidemic. The other reason is that in traditional page- or shelf-style e-commerce, customers discover products by articulating a need, searching, and then deciding whether to buy. As new technologies arrive, this process needs to fit users' needs more closely, and e-commerce live streaming matches that trend: a professional shopping guide who understands your pain points replaces the tedious comparison shopping everyone knows, effectively moving the offline mall experience online.

From my observation, e-commerce live streaming is only just beginning: the model has only recently emerged, and people are still exploring the experience, the interaction, even virtual try-on. There is still great room for development, and e-commerce live streaming will be a hot trend.

To build good e-commerce live streaming on top of existing live streaming technology, we first need to understand the industry's standard live streaming architecture. As shown in the figure above, the end-to-end architecture divides into four main parts, and the overall flow is as follows: the host (or the customer's origin server) pushes the stream over RTMP, using a push SDK or open-source tools, to the streaming media processing center (a central node or central machine room), which performs extensive processing; the stream is then distributed through the CDN, and finally viewers watch through an SDK or an H5 page on the web.
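For the “open-source tools” ingest path mentioned above, here is a minimal sketch of an RTMP push, driving ffmpeg from Python. The ingest URL and stream key are hypothetical placeholders, and the flags are the standard ffmpeg ones for an RTMP/FLV push, not anything Tencent Cloud specific:

```python
import subprocess

INGEST_URL = "rtmp://push.example.com/live/stream-key"  # hypothetical ingest endpoint


def push_rtmp(source: str, url: str = INGEST_URL) -> None:
    """Re-encode `source` to H.264/AAC and push it over RTMP as an FLV stream."""
    cmd = [
        "ffmpeg",
        "-re",              # read input at its native frame rate (simulate a live source)
        "-i", source,
        "-c:v", "libx264",  # H.264 video, the codec RTMP/FLV carries
        "-c:a", "aac",
        "-f", "flv",        # RTMP transports an FLV-formatted stream
        url,
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    push_rtmp("demo.mp4")   # any local file stands in for the host's camera feed
```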

The four access points in the figure are where live streaming is integrated into e-commerce; the most important are the host side and the viewer side. The host side needs deeply customized app development: the existing e-commerce system must be integrated with the live streaming system at the interface level, and push-stream support such as the SDK must be integrated. Most important of all is a good viewer experience, so that the broadcast arouses the desire to buy and carries the viewer all the way to placing an order. The viewer side therefore needs to combine existing e-commerce capabilities with technical capabilities from the audio/video field.

In addition, the live streaming backend must communicate with the e-commerce backend, so that people, goods, and hosts are managed consistently and data can be updated in time during a large flash sale.

The figure shows the development process and workload, sorted according to recent customer needs:

  • The host side's workload is mainly product and UI work: designing a good product and a good user experience. The remaining technical work can be iterated on top of the existing system and is easy to integrate through the cloud live streaming SDK interfaces.
  • The server side's challenge is R&D capability: rapidly iterating live CDN capability on top of the e-commerce capability. Since these capabilities all live in the cloud, integration is simple; next come room and user management, for which Tencent Cloud provides many demos that can be used directly, and finally the integrated result is combined with the e-commerce system.
  • The viewer side mainly reuses the existing commodity-related and UI capabilities.

As for evaluating the effort, I see two directions: one is product and UI, designing the experience around the available capabilities; the other is technical work, that is, developing the host side, server side, and viewer side in parallel on top of the cloud.

Weak-network optimization and interactive links in a low-latency live streaming system

The architecture in the figure omits some details; the points that need attention and handling during a live broadcast are discussed below.

As shown in the figure above, the host pushes the stream through the SDK to the data center of the uplink access point; after relevant processing, the data center transcodes the stream, and a three-level CDN back-to-source architecture then pulls the stream, passively triggered by viewers. Tencent Cloud's design goal is to avoid unnecessary waste: only when a viewer actually needs a given stream is the pull initiated, and only then is transcoding started.
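A toy sketch of this passively triggered, pull-based design follows, with illustrative class and method names rather than Tencent Cloud's actual implementation: an edge layer only asks the layer above it for a stream when the first viewer arrives, and releases the pull when the last viewer leaves, so idle streams consume no bandwidth or transcoding:

```python
class Origin:
    """Stand-in for the next CDN layer up (or the central machine room)."""

    def pull(self, stream_id: str) -> None:
        print(f"start pulling/transcoding {stream_id}")

    def release(self, stream_id: str) -> None:
        print(f"stop pulling {stream_id}")


class EdgeNode:
    def __init__(self, origin: Origin):
        self.origin = origin
        self.active: dict[str, int] = {}   # stream_id -> subscriber count

    def on_viewer_join(self, stream_id: str) -> None:
        if self.active.get(stream_id, 0) == 0:
            # First viewer: trigger the upstream pull. Until this moment the
            # stream cost nothing on this path.
            self.origin.pull(stream_id)
        self.active[stream_id] = self.active.get(stream_id, 0) + 1

    def on_viewer_leave(self, stream_id: str) -> None:
        self.active[stream_id] -= 1
        if self.active[stream_id] == 0:
            self.origin.release(stream_id)   # tear down the now-idle pull


edge = EdgeNode(Origin())
edge.on_viewer_join("room-1")    # first viewer triggers the upstream pull
edge.on_viewer_join("room-1")    # second viewer reuses the existing pull
edge.on_viewer_leave("room-1")
edge.on_viewer_leave("room-1")   # last viewer gone: the pull is released
```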

Looking across the whole Tencent Cloud live streaming architecture, the uplink comes in three forms:

  1. The most common: pushing to the cloud over RTMP.
  2. RTMP pull: the customer operates its own upstream origin server, and Tencent Cloud pulls the stream from it over RTMP.
  3. HLS pull: the stream is pulled up into Tencent Cloud over HLS, where it is processed, accelerated, and distributed over the various protocols.

On the distribution side, HTTP-FLV is the most common protocol. HLS is mostly used on the web or for long videos, while RTMP is rarely used for distribution.

If latency is a concern, RTMP push is normally chosen for the uplink, giving an end-to-end delay that can be controlled within 2-5 seconds. HTTP-FLV is generally chosen for the downlink, and its delay is likewise 2-5 seconds.

HLS is commonly used on the web side. It is based on HTTP segments, each collecting a stretch of data; if segment sizes are inconsistent, the overall delay becomes large, usually more than 10 seconds. RTP is the ultimate optimization, with latency under 100 ms; most mic-linking (co-streaming) is done this way.
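The 10-second-plus figure for HLS follows from simple arithmetic: an HLS player conventionally starts several segments (typically about three) behind the live edge, so latency scales with segment duration. A quick check, with illustrative segment sizes:

```python
def hls_latency(segment_seconds: float, segments_buffered: int = 3) -> float:
    """Rough HLS startup latency: segments buffered behind the live edge."""
    return segment_seconds * segments_buffered


for seg in (2, 4, 6):
    print(f"{seg}s segments -> ~{hls_latency(seg):.0f}s latency")
# 2s segments -> ~6s; 6s segments -> ~18s. Typical 4-6s segments land in the
# "usually more than 10 seconds" range quoted above.
```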

The delay in Tencent Cloud's live streaming architecture divides into three parts:

  1. Delay introduced at the push end: host-side capture, preprocessing, encoding, sending, and the upstream network;
  2. Delay produced by cloud processing, including link delay, transcoding delay, and protocol differences, mapping to uplink → transcoding → downlink;
  3. Delay at the downlink receiver, which is strongly network-dependent, mapping to receive → decode → post-process → display.

Analyzing where these delays arise reveals which parts can be optimized.

The main optimization and technical directions for low-latency live streaming are:

  1. Conventional optimization of the uplink and downlink on top of the original CDN architecture;
  2. QUIC-based optimization: QUIC improves on HTTP/2 as a transport (it underlies HTTP/3), and its optimization effect is noticeable;
  3. WebRTC-based optimization.

In the future, with the development of 5G and even 6G technology, there will be still more optimization directions for live streaming.

The main ways to monitor and evaluate live streaming quality cover six metrics: stall rate, delay, playback-failure rate, first-frame time, video frame rate, and video bitrate. The first four directly reflect the quality of a given broadcast.

On the CDN, before low-latency optimization, stalls are diagnosed along the path shown in the figure. The first thing to establish is the scope of the stall: does it affect all users in the room, or only some?

If all users stall, the upstream must be checked, since an upstream problem stalls the whole room. First make a comparison check, then confirm whether the frame rate and bitrate of the upstream push are normal and whether the stream flows smoothly; all of this can be obtained from the Tencent Cloud console.

If only some users stall, check the downstream pull: inspect the pull node they hit, or examine the user's stall logs.
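The triage above can be condensed into a small decision function. The thresholds and metric names are illustrative assumptions, not a real Tencent Cloud API:

```python
def diagnose_stall(room_stall_ratio: float, push_fps: float, push_kbps: float,
                   expected_fps: float, expected_kbps: float) -> str:
    """Room-wide stalls point upstream; per-user stalls point downstream."""
    if room_stall_ratio > 0.8:                 # nearly everyone in the room stalls
        if push_fps < expected_fps * 0.9 or push_kbps < expected_kbps * 0.8:
            return "upstream: push frame rate / bitrate abnormal"
        return "upstream: check the host's network and push link"
    # Only some users stall: inspect their edge node and client stall logs.
    return "downstream: check the user's pull node and stall logs"


print(diagnose_stall(room_stall_ratio=0.95, push_fps=9, push_kbps=800,
                     expected_fps=15, expected_kbps=1500))
# -> upstream: push frame rate / bitrate abnormal
```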

To optimize stalling, the work on the host push side is as follows (a GOP-to-frames calculation follows the list):

  • Network diagnosis: choose a high-quality network.
  • Set reasonable encoding parameters, e.g. a frame rate of at least 15 fps.
  • Set the GOP to a reasonable value; large shows and e-commerce broadcasts generally use 1-2 seconds.
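Since encoders express the keyframe interval in frames, the 1-2 second GOP recommendation translates to GOP seconds × fps. A quick helper, assuming x264-style settings (ffmpeg's `-g` flag takes a frame count):

```python
def keyframe_interval(gop_seconds: float, fps: int) -> int:
    """Convert a GOP length in seconds to an encoder keyframe interval in frames."""
    return int(gop_seconds * fps)


fps = 15                            # the minimum frame rate suggested above
print(keyframe_interval(2, fps))    # -> 30: pass as `-g 30` to ffmpeg/x264
```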

The player side needs to do the following (a bitrate-selection sketch follows the list):

  • Check CPU usage: a saturated CPU itself causes stalls.
  • Use a bitrate and frame rate appropriate to the network environment.
  • Prefer hardware-accelerated decoding over soft decoding.
  • Adjust the playback buffer: increase the player's buffer so that when network jitter is large, the extra cache absorbs it.
  • Diagnose the network; when conditions are poor, prompt the user to switch networks.
  • Dynamically adjust the playback bitrate: with ordinary HLS pull, multiple bitrates can be matched; with FLV pull, the Tencent Cloud SDK can switch bitrate seamlessly.
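A minimal sketch of the dynamic-bitrate idea for a multi-rate ladder: pick the highest rendition that fits under the measured downlink bandwidth, leaving headroom for jitter. The ladder and headroom factor are illustrative assumptions:

```python
RENDITIONS_KBPS = [500, 1000, 2000, 4000]    # illustrative bitrate ladder


def pick_bitrate(downlink_kbps: float, headroom: float = 0.8) -> int:
    """Highest rendition fitting under the bandwidth budget, else the lowest."""
    budget = downlink_kbps * headroom         # keep margin for network jitter
    fitting = [r for r in RENDITIONS_KBPS if r <= budget]
    return max(fitting) if fitting else min(RENDITIONS_KBPS)


print(pick_bitrate(2500))   # -> 2000: 2500 * 0.8 = 2000 kbps budget
```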

All of the above can be checked through the Tencent Cloud console.

For the delay of standard live streaming, CDN-based delay optimization likewise splits into two parts:

Host push side:

  • Network diagnosis: choose a high-quality network.
  • Set the GOP to a reasonable value: shrinking the GOP lowers delay proportionally but raises the stall rate, so balance the two; 1-2 seconds is generally recommended.
  • Adjust buffers: OBS push in particular uses an adaptive buffer, and the SDK's buffers need to adapt as well.
  • On the server, avoid transcoding where possible and push directly at a medium bitrate.

User player side (the fast-play/slow-play strategy in the last bullet is sketched after the list):

  • Cache the IP addresses resolved by DNS, or resolve DNS in parallel.
  • Authenticate asynchronously: play first, authenticate afterward. The authentication server is generally designed so that the stream starts first and authentication runs in the background; if authentication fails, the stream is subsequently disconnected.
  • Set the playback buffer reasonably: the larger the buffer, the longer the delay. With ijkplayer or other open-source players, keeping the buffer under 1 second is recommended, and the network cache should be sized to the users' delay requirement, generally 1-4 seconds (for very low delay, such as RTMP mic-linking, set it under 1 second).
  • Selective frame dropping: applied on both the CDN and the player. If the CDN finds a user consuming slowly on a poor network, it drops frames selectively, and the player applies the same policy to reduce playback delay.
  • SDK fast-play strategy: when the network is good, catch up to the live edge at 1.5x playback speed; when conditions worsen, switch to slow-play, balancing delay against stalling.
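That fast-play/slow-play strategy amounts to a buffer-driven rate controller. In this sketch the 1.5x figure comes from the text, while the buffer thresholds and the 0.9x slow-play rate are illustrative assumptions:

```python
def playback_rate(buffer_seconds: float,
                  high_mark: float = 2.0,
                  low_mark: float = 0.5) -> float:
    """Pick a playback speed that keeps the jitter buffer near its target."""
    if buffer_seconds > high_mark:
        return 1.5   # fast-play to drain accumulated delay
    if buffer_seconds < low_mark:
        return 0.9   # slow-play to avoid a stall while the buffer refills
    return 1.0       # buffer near target: play at normal speed


print(playback_rate(3.0))   # -> 1.5: too much buffered, catch up to live
```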

If the stream is pushed with the Tencent Cloud SDK and the uplink rate collapses while the encoder's audio and video bitrates stay unchanged, data accumulates on the sender and stalls follow; once the accumulation crosses the red line, stalls and delay are certain.

The Tencent Cloud SDK mainly watches three parameters: the network uplink rate SPD, and the video and audio encoding bitrates VRA and ARA. Under normal conditions, VRA + ARA ≈ SPD.
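A sketch of the health check this relation implies, with hypothetical metric values and an assumed tolerance: sustained SPD below the combined encoded bitrate means the sender is queueing data it cannot ship, which surfaces as the accumulation described above.

```python
def uplink_congested(vra_kbps: float, ara_kbps: float, spd_kbps: float,
                     tolerance: float = 0.95) -> bool:
    """True when measured uplink rate falls below the encoded bitrate (VRA + ARA)."""
    return spd_kbps < (vra_kbps + ara_kbps) * tolerance


if uplink_congested(vra_kbps=1800, ara_kbps=128, spd_kbps=1500):
    # Typical reactions: lower the encoder bitrate, or warn the host.
    print("uplink congested: reduce encoder bitrate or prompt a network switch")
```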

The downlink callback parameters of the Tencent Cloud SDK are richer still, and tuning them can mitigate delay and stalling.

For low-threshold users, the Tencent Cloud SDK offers three modes: automatic mode (adjusts to network conditions), speed mode (no link delay is introduced and the cache is set to about 1 second), and smooth mode. E-commerce and show broadcasts usually choose speed mode.

If the delay and stall requirements are stricter, there are two matching optimization schemes: one based on QUIC and one based on WebRTC. Tencent Cloud currently supports QUIC acceleration: by adding a flag to the RTMP push, the stream is carried over QUIC. This acceleration is generally built on speed mode, and the downlink can be accelerated through QUIC or WebRTC as well.

The current mainstream approach is WebRTC acceleration; its advantage is that the SDK needs few changes. With a standard OBS-protocol push, the upstream pipeline carries a delay of 3-10 seconds even when every stage runs at its default.

Under the WebRTC strategy, the uplink is still ingested over the standard RTMP protocol; the downlink is then served over WebRTC, transcoding and CDN distribution are optimized, and a proxy connects the service directly to the SDK. When the client integrates the SDK, or simply uses the Chrome browser, the delay can be kept within 1 second.

The second optimization strategy is to use TRTC technology, carrying the uplink over WebRTC/RTC during mic-linking interaction. Thanks to UDP-based acceleration, the delay between a WebRTC client and its nearest server, and between a TRTC client and its nearest server, is very small. This strategy suits scenarios where the host links mics with viewers, or multiple hosts PK each other.

TRTC data analysis under UDP mode puts the end-to-end delay at about 350 milliseconds, so optimization focuses on the sessions that exceed 350 milliseconds.

Push-side delay is usually measured through instrumentation buried in the SDK; the overall push-side time is under 100 milliseconds. Capture usually takes about 30 milliseconds; preprocessing about 30 milliseconds, with special effects costing more than none; encoding generally under 50 milliseconds, with low-end devices taking longer. The jitter buffer accounts for much of the push-side time.

Analysis of the network time shows that WebRTC technology can keep it within 50 milliseconds.

The player side takes less than 100 milliseconds (the overall budget is summed in the sketch after this list):

  • Decoding generally takes under 20 milliseconds, with roughly 5% of cases exceeding 50 milliseconds;
  • Rendering generally takes under 20 milliseconds, with special effects costing more than none;
  • On the player side, network fluctuation has a large influence, introducing 20-200 milliseconds.
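Summing the typical component figures quoted above reproduces the roughly 350 ms end-to-end delay reported for the UDP/TRTC path. The player-side jitter allowance is an assumed value inside the quoted 20-200 ms range, chosen toward the upper end so the sketch lines up with that total:

```python
# Component delay budget in milliseconds, using the figures from the text.
BUDGET_MS = {
    "capture":        30,
    "preprocess":     30,   # higher with beauty filters / special effects
    "encode":         50,   # upper bound; low-end devices take longer
    "uplink network": 50,   # with WebRTC-style transport
    "decode":         20,
    "render":         20,
    "player jitter": 150,   # assumed, within the 20-200 ms span above
}
print(sum(BUDGET_MS.values()), "ms end-to-end")   # -> 350 ms, the TRTC figure
```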

The data in the table gives a detailed protocol comparison: when network quality fluctuates, playback delay does not grow, and once the network recovers, the delay recovers promptly.

WebRTC's network control and playback strategy put smoothness first: even in a weak network environment, playback continues rather than stalling constantly.

In low-latency live streaming, besides the two main directions of traditional CDN live streaming and WebRTC live streaming, there is also QUIC-based optimization, mostly applied on the CDN downlink.

Comparing stall rates with QUIC enabled and disabled under weak-network conditions shows that enabling QUIC gives a slightly better stall rate. Several large vendors are currently running their own QUIC optimizations and tests as well.

Similarly, comparing delay with QUIC enabled and disabled shows that, with the network in a steady state, enabling QUIC reduces delay by about 100 milliseconds.