Driven by the epidemic, online education has seen explosive growth. As one of the online teaching platforms recommended by the Ministry of Education, Wisdom Tree provides online course content, an online teaching platform, and whole-process teaching services for nearly 2,000 colleges and universities, more than 300,000 teachers, and more than 10 million college students in China. Since April 2020, Wisdom Tree has officially integrated UCloud's URTC real-time audio and video product, which provides services such as audio and video capture, processing, encoding and decoding, transmission, cloud transcoding, and stream mixing.
Built on the 31 availability zones, 29 dedicated lines, and 500+ acceleration nodes that UCloud has deployed worldwide, URTC provides a stable, smooth online course experience with an average latency of 300 ms and supports ultra-high concurrency of millions of users.
To fully guarantee real-time interaction and a smooth experience for teachers and students in the Wisdom Tree online classroom, URTC has done a great deal of optimization work on the underlying network transmission. Through nearest-access-point selection worldwide, a self-developed HTTPDNS scheduling algorithm, and packet loss retransmission, it achieves high-quality communication over weak networks. Video still plays smoothly under 30% packet loss, and audio communication remains normal under 70% packet loss.
This article focuses on URTC's quality optimization practice in dealing with the network transmission path, network congestion, packet loss, and related problems.
URTC transmission path optimization
The diagram above shows the simplified architecture of the URTC media service cluster. URTC gives users nearest access through globally deployed servers. The following focuses on URTC's optimization practice for the network transmission path.
Traditional live broadcasting often uses DNS to distribute access points. DNS resolution is slow, and it is also exposed to hijacking, so it does not provide reliable access. URTC instead uses HTTP-DNS to select and allocate access points. To make the request fast and accurate, URTC sends HTTP-DNS requests to several addresses at the same time. To speed up access, users of large operators such as China Unicom, China Telecom, and China Mobile are assigned access points by comparison with historical access records. For users of other operators, URTC probes dynamically with ping and fits the measured delay and packet loss to determine the optimal access point.
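The probing step for non-major operators can be pictured with a minimal sketch. It assumes a simple linear cost (RTT plus a penalty proportional to loss); the source does not state the actual fitting function, and the access point names, the `ProbeResult` type, and the `loss_penalty_ms` weight are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProbeResult:
    access_point: str   # hypothetical access point name
    rtt_ms: float       # average ping round-trip time
    loss_rate: float    # fraction of probes lost, 0.0 to 1.0


def score(probe: ProbeResult, loss_penalty_ms: float = 1000.0) -> float:
    """Lower is better: RTT plus a penalty proportional to packet loss.

    The linear form and the penalty weight are assumptions for illustration.
    """
    return probe.rtt_ms + loss_penalty_ms * probe.loss_rate


def pick_access_point(probes: Sequence[ProbeResult]) -> str:
    """Return the access point with the lowest combined cost."""
    if not probes:
        raise ValueError("no probe results")
    return min(probes, key=score).access_point


probes = [
    ProbeResult("ap-a", rtt_ms=30.0, loss_rate=0.10),  # fastest, but lossy
    ProbeResult("ap-b", rtt_ms=60.0, loss_rate=0.01),  # slower, nearly clean
    ProbeResult("ap-c", rtt_ms=90.0, loss_rate=0.00),  # slowest, clean
]
best = pick_access_point(probes)
```

With these sample numbers the lowest-RTT node is not chosen, because its loss penalty outweighs its latency advantage.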
Link management and allocation of transmission
We adopt a centralized routing management system in which all relay nodes are peers. Relay nodes in different centers are connected to one another to form a graph, and every connection between nodes is assigned a path weight. The forwarding control center plans the optimal path through real-time calculation and allocates a path when a user makes a request. It also monitors the transmission paths and switches links when a fault occurs.
To reduce transmission delay and improve throughput between data centers, we use a self-developed private protocol based on UDP for transmission between data centers.
Through the three optimizations above, URTC effectively addresses nearest access, transmission path allocation, failover, and other transmission path problems. We still have to guarantee the quality of the transmitted content, however, especially over the user's "last mile". We therefore further optimize the quality of the transmitted content, mainly by addressing network congestion and packet loss recovery.
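Path planning over a weighted relay graph can be sketched with a shortest-path search. The source does not name the algorithm the forwarding control center uses; Dijkstra's algorithm is swapped in here for illustration, and the node names and weights are hypothetical.

```python
import heapq


def best_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: weight}}; returns (cost, path)."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []  # destination unreachable


# Hypothetical relay nodes; weights could stand for measured link cost.
relay_graph = {
    "beijing": {"shanghai": 30.0, "guangzhou": 45.0},
    "shanghai": {"beijing": 30.0, "guangzhou": 25.0, "singapore": 80.0},
    "guangzhou": {"beijing": 45.0, "shanghai": 25.0, "singapore": 50.0},
    "singapore": {"shanghai": 80.0, "guangzhou": 50.0},
}
cost, path = best_path(relay_graph, "beijing", "singapore")

# Failover: drop a faulty link from a copy of the graph and re-plan.
degraded = {node: dict(edges) for node, edges in relay_graph.items()}
del degraded["guangzhou"]["singapore"]
del degraded["singapore"]["guangzhou"]
backup_cost, backup_path = best_path(degraded, "beijing", "singapore")
```

Re-running the search on the degraded graph is the simplest form of link switching; a production system would more likely update weights continuously from monitoring data.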
URTC transmission quality optimization
1. Optimization of network congestion algorithm
Many algorithms currently exist for handling network congestion, such as CUBIC, LEDBAT, SCReAM, BBR, GCC, and PCC. They fall roughly into three categories: loss-based, delay-based, and machine-learning-based. URTC adopts the GCC algorithm and tunes it for different usage scenarios:
- Live relay (restreaming) scenario
In the live relay scenario, URTC mainly handles the user's upstream push. Users here are not very sensitive to delay (800 ms to 2 s) but have relatively high requirements for the quality of the transmitted content, so URTC leans toward quality assurance and lowers its sensitivity to jitter. We therefore reduce URTC's GCC algorithm to a loss-based congestion control algorithm, with the goal of keeping push-stream quality as high as possible within the delay that users can accept.
- Co-streaming (Lianmai) scenario
The co-streaming scenario emphasizes real-time interaction, and users are highly sensitive to delay (less than 400 ms). The algorithm therefore needs to adapt more quickly to network changes to ensure better real-time performance. Here we raise the sensitivity of URTC's GCC algorithm to jitter and delay, tuning it as a delay-based congestion control algorithm.
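For reference, the loss-based half of GCC can be sketched as follows. The thresholds and factors are those given in the public IETF draft of GCC (decrease above 10% loss, increase below 2%); the values URTC uses after tuning are not stated in this article, so treat these as the draft's defaults rather than URTC's configuration. The function name is hypothetical.

```python
def loss_based_update(rate_bps: float, loss_fraction: float) -> float:
    """One update step of a loss-based rate controller (GCC draft defaults).

    loss_fraction is the share of packets reported lost in the last
    feedback interval, between 0.0 and 1.0.
    """
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss.
        return rate_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Almost no loss: probe upward by 5%.
        return rate_bps * 1.05
    # Between 2% and 10%: hold the current rate.
    return rate_bps


rate = 1_000_000.0
for reported_loss in (0.00, 0.05, 0.20):
    rate = loss_based_update(rate, reported_loss)
```

A controller of this kind ignores delay variation entirely, which matches the relay scenario's reduced sensitivity to jitter; the delay-based mode used for co-streaming reacts to queuing delay before loss occurs and is not shown here.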
2. Packet loss recovery scheme
Packet loss has many causes, including bit errors in the transmission channel, unstable wireless communication, signal attenuation and interference, network congestion, packets that do not arrive on time, and system jitter. One point must be emphasized here. Because the goal of RTC is very low latency, the definition of packet loss has to take delayed arrival and excessive jitter into account. In both cases, packets whose delay or jitter is too large should also be counted as lost in RTC. Only with a clear definition of packet loss can the problem be solved appropriately.
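The broader definition can be illustrated with a small sketch that counts late packets as lost. The 400 ms deadline is borrowed from the co-streaming latency bound mentioned earlier and is an assumption, as are the function name and the sample delays.

```python
from typing import Optional, Sequence


def is_effectively_lost(arrival_delay_ms: Optional[float],
                        deadline_ms: float = 400.0) -> bool:
    """A packet counts as lost if it never arrived (None) or arrived too late."""
    return arrival_delay_ms is None or arrival_delay_ms > deadline_ms


def effective_loss_rate(delays_ms: Sequence[Optional[float]],
                        deadline_ms: float = 400.0) -> float:
    """Share of packets that are unusable for real-time playback."""
    if not delays_ms:
        return 0.0
    lost = sum(is_effectively_lost(d, deadline_ms) for d in delays_ms)
    return lost / len(delays_ms)


# One packet never arrives, one arrives after the deadline.
delays = [35.0, 120.0, None, 650.0, 80.0]
network_loss = sum(d is None for d in delays) / len(delays)
rtc_loss = effective_loss_rate(delays)
```

In this sample the network dropped one packet in five, but from the player's point of view two in five were unusable.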
Traditional anti-packet loss algorithms
Traditional anti-packet-loss algorithms mainly include NACK (negative acknowledgement, a retransmission request for lost packets), FEC (forward error correction), and ACK (acknowledgement).
NACK is an active retransmission request for data that has not been received, which allows fairly precise loss recovery. However, NACK also brings some problems:
1. Overly dense NACK requests easily trigger retransmission storms, resulting in a low retransmission success rate and wasted bandwidth; 2. Even when retransmission is not dense, a high packet loss rate lowers the chance that the retransmission request and feedback reach the source, because the feedback packets themselves are also lost; 3. NACK inevitably introduces extra delay, one RTT in the best case; 4. In multi-party scenarios, NACK can consume a large amount of users' bandwidth, causing fluctuations in network transmission.
FEC essentially relies on redundant data. For example, the simplest redundancy scheme sends each piece of data multiple times, say 3 times, to improve the chance that it arrives complete and correct. There are many FEC algorithms: XOR, RS-FEC, fountain codes, and so on. FEC has the advantage of reducing end-to-end delay through redundant data, but it also brings additional problems: 1. FEC consumes extra bandwidth for redundancy, and poor control leads to more serious congestion and packet loss; 2. Over-forwarding of FEC packets in multi-party scenarios also consumes excessive bandwidth on the viewing users' side.
ACK is acknowledgement-based: the sender uses the acknowledged sequence numbers to decide whether to retransmit data or send new data. ACK is a commonly used mechanism, but to improve network efficiency, acknowledgements are usually cumulative or delayed, which introduces additional delay.
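As an illustration of the XOR variant mentioned above, the sketch below protects a group of packets with one parity packet and rebuilds a single lost packet without any retransmission. It assumes all packets in a group have the same length and that at most one is lost; real RTP FEC formats also protect header fields and lengths, which is omitted here. The function names are hypothetical.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR all packets together byte by byte (equal lengths assumed)."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None) from the parity."""
    if received.count(None) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet")
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet leaves the missing one.
    return xor_parity(present + [parity])


group = [b"pkt0", b"pkt1", b"pkt2"]
parity = xor_parity(group)             # sent alongside the group: 33% overhead
arrived = [b"pkt0", None, b"pkt2"]     # the middle packet was lost in transit
rebuilt = recover(arrived, parity)
```

The trade-off described above is visible in the numbers: one parity packet per three media packets costs a third more bandwidth, and it only helps when a single packet per group is lost.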
URTC technology optimization
Weighing the advantages and disadvantages of the traditional anti-loss algorithms above, URTC adopts a combined NACK+FEC+ARQ scheme for packet loss recovery, with many technical optimizations in the concrete implementation. These mainly include:
- Upstream (publishing) end
By linking the three algorithms dynamically, URTC can adjust the ratio of retransmitted and redundant data on the fly. When packet loss is low and RTT is low, data is recovered through NACK. When packet loss is high and RTT is low, and no feedback packet has been received for a long time, the remote end automatically adjusts the ratio of redundant data, retransmitted data, and actual media data, and then derives a new target bit rate. With high RTT, whether packet loss is low or high, NACK is shut down and only FEC is used. Of course, as the FEC proportion increases, the media data sent by the remote end is reduced accordingly.
At the same time, in audio-sensitive scenarios, URTC transmits audio first. When audio and video compete for bandwidth, video quality is gradually reduced, and video is eventually suspended, to protect audio quality.
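The linkage between NACK and FEC on the upstream end can be summarized as a small decision function. This is a sketch of the rules as stated in this section only; the loss and RTT thresholds are assumptions, since the article gives no numeric boundaries between "low" and "high", and the names are hypothetical.

```python
from enum import Enum


class Recovery(Enum):
    NACK_ONLY = "nack"
    NACK_AND_FEC = "nack+fec"
    FEC_ONLY = "fec"


def choose_recovery(loss_fraction: float, rtt_ms: float,
                    loss_threshold: float = 0.10,
                    rtt_threshold_ms: float = 200.0) -> Recovery:
    """Pick a recovery mode from current loss and RTT (thresholds assumed)."""
    if rtt_ms >= rtt_threshold_ms:
        # High RTT: a retransmission would arrive too late, so NACK is off.
        return Recovery.FEC_ONLY
    if loss_fraction < loss_threshold:
        # Low loss, low RTT: retransmission alone is cheap and sufficient.
        return Recovery.NACK_ONLY
    # High loss, low RTT: mix retransmission with redundancy.
    return Recovery.NACK_AND_FEC


modes = [
    choose_recovery(loss_fraction=0.02, rtt_ms=40.0),
    choose_recovery(loss_fraction=0.30, rtt_ms=40.0),
    choose_recovery(loss_fraction=0.02, rtt_ms=350.0),
    choose_recovery(loss_fraction=0.30, rtt_ms=350.0),
]
```

A real implementation would also rebalance the shares of media, retransmission, and redundancy inside the target bit rate; that step is left out because the article does not describe how the ratios are computed.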
- Server side
On the server side, URTC keeps a buffered window for each user. A traditional media server generally does pure forwarding and leaves packet loss and congestion control to the client. The problem is that the server then cannot perceive network changes when the network jitters, and so cannot detect congestion early. URTC therefore adds a network congestion module on the server side, whose main job is to sense the network state, counter network jitter, and reduce the instantaneous packet loss and retransmission storms that jitter causes.
For end users on poor networks, URTC first notifies the remote end to reduce the bit rate. Once the bit rate reaches its lower limit, data is discarded from the cache window to keep the receiver's latency low. For users on different networks, the server also sends redundant data according to the current network state.
The cache window stores a single copy of the data, and each user only records its own current read position, which reduces memory pressure.
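A single stored copy with per-user read positions can be sketched as below. This is a minimal illustration, not URTC's implementation: the class and method names are hypothetical, and eviction here is driven purely by a fixed capacity, whereas the article describes discarding based on bit-rate limits.

```python
from collections import deque


class SharedCacheWindow:
    """One stored copy of recent packets; each user keeps only a read cursor."""

    def __init__(self, capacity: int):
        self.packets = deque(maxlen=capacity)
        self.base_seq = 0    # sequence number of packets[0]
        self.read_pos = {}   # user id -> next sequence number to read

    def push(self, payload: bytes) -> None:
        if len(self.packets) == self.packets.maxlen:
            self.base_seq += 1  # the oldest packet is about to be evicted
        self.packets.append(payload)

    def subscribe(self, user: str) -> None:
        # New users start at the live edge of the window.
        self.read_pos[user] = self.base_seq + len(self.packets)

    def read(self, user: str) -> list:
        end = self.base_seq + len(self.packets)
        # A slow user whose cursor fell out of the window skips evicted data.
        start = max(self.read_pos[user], self.base_seq)
        self.read_pos[user] = end
        return [self.packets[seq - self.base_seq] for seq in range(start, end)]


window = SharedCacheWindow(capacity=3)
window.subscribe("fast")
window.subscribe("slow")
window.push(b"a")
window.push(b"b")
fast_first = window.read("fast")
for payload in (b"c", b"d", b"e"):
    window.push(payload)
fast_second = window.read("fast")
slow_all = window.read("slow")
```

The slow reader never sees the two oldest packets: they were dropped from the shared window, which keeps its playback close to live at the cost of a gap.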
- Downstream (subscribing) end
URTC uses a jitter buffer to counter jitter and adopts an intelligent playout strategy. The strategy is driven by a state machine with states such as filling, playing, slowing down, waiting, and speeding up. Depending on the state, data goes through different processing logic to keep playback smooth and latency low. At the same time, the NACK strategy is tied to RTT: the request interval is adjusted according to the retransmission success rate, to prevent the retransmission storms and bandwidth waste that NACK requests can cause.
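The playout state machine can be pictured with a small transition function. The state names follow the list above, but the transition rules, the 100 ms target, and the half and double thresholds are assumptions made for illustration; the article does not describe the actual conditions.

```python
from enum import Enum


class PlayState(Enum):
    FILLING = "filling"   # accumulating data before playback starts
    PLAYING = "playing"   # normal-speed playout
    SLOW = "slow"         # stretch playout to let the buffer refill
    FAST = "fast"         # speed up playout to cut accumulated latency
    WAITING = "waiting"   # buffer ran dry, nothing to play


def next_state(state: PlayState, buffered_ms: float,
               target_ms: float = 100.0) -> PlayState:
    """Choose the next playout state from the amount of buffered media."""
    if buffered_ms <= 0:
        return PlayState.WAITING
    if state in (PlayState.FILLING, PlayState.WAITING):
        # Do not start playing until the buffer reaches its target depth.
        return PlayState.PLAYING if buffered_ms >= target_ms else PlayState.FILLING
    if buffered_ms < 0.5 * target_ms:
        return PlayState.SLOW
    if buffered_ms > 2.0 * target_ms:
        return PlayState.FAST
    return PlayState.PLAYING


state = PlayState.FILLING
history = []
for buffered in (40.0, 120.0, 30.0, 0.0, 60.0, 150.0, 250.0):
    state = next_state(state, buffered)
    history.append(state)
```

Slowing down before the buffer empties and speeding up when it overfills is what lets a player trade a little playback-rate distortion for fewer stalls and bounded latency.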
Through the quality and algorithm engineering optimizations described in this article, URTC improved the loss resistance of audio from 20% to 70% and the loss resistance of uplink video from 20% to 30%. In the future, URTC will continue to improve the stability, low latency, and fluency of its real-time audio and video services, and is committed to providing more enterprises and developers with high-quality, highly reliable SDK services.