Background

Real-time audio and video calls are ubiquitous in daily life, spanning social, security, transportation, and other domains. Complex usage scenarios, stringent requirements, and inconsistent network environments pose great challenges for real-time audio and video calls. We have done some work in this direction. Although we still lag behind the optimization work of larger companies, and much remains to be improved, I would like to share our current progress and work.

0.1 Network Transmission:

There are two kinds of network transport, namely TCP and UDP; their relative advantages and disadvantages are shown in the mind map below. Many factors affect the quality of network transmission, including network congestion and packet loss. These factors directly determine the quality of real-time video calls and strongly affect the user experience, which is the fundamental reason we optimize.

0.2 Definition of weak network:

For real-time audio and video calls, a weak network is characterized by a complex, heterogeneous network environment with protocol irregularities, network anomalies, and network errors. A weak network cannot sustain high-quality transmission: the receiving end cannot receive a continuous stream of media packets, causing abnormal audio and video artifacts such as mosaic, smearing, and black screens. For real-time audio and video this is fatal: it directly degrades the user experience and leads to product quality problems and customer complaints.

0.3 Real-time audio and video features

Low latency and high quality, the two core requirements of real-time audio and video calls, are inherently contradictory. High quality requires the sender to send streams at the highest possible resolution and quality, which demands high bandwidth and a good network environment and tolerates little packet loss or jitter. Low latency, by contrast, is less strict about the network: it tolerates some packet loss and jitter within a certain range, because otherwise space must be traded for time through buffering, which sacrifices real-time behavior and fails the low-delay requirement. It is a classic spear-and-shield trade-off: these harsh conditions can only be met by seeking breakthroughs on the spear side (throughput) while strengthening the shield side (loss protection).

1. FEC

WebRTC FEC can be implemented in three ways: RED (RFC 2198), ULPFEC (RFC 5109), and FlexFEC (not yet standardized at the time). Since ULPFEC is transported inside RED's shell, we use ULPFEC for FEC protection.

Its basic principle: at the sending end, redundant FEC packets are generated for a group of media packets by XORing their payloads. If some media packets are lost in transit, XORing the received media packets with the FEC packets recovers the lost ones. This avoids spending network resources on NACK retransmissions.
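As a minimal sketch of the XOR recovery idea (one FEC packet protecting a group of equal-length payloads; real ULPFEC adds protection masks and handles unequal lengths, which is omitted here):

```python
def xor_packets(packets):
    """XOR packet payloads byte by byte; the result over a full group
    is the FEC packet, and XORing the survivors with the FEC packet
    reproduces a single missing packet."""
    size = max(len(p) for p in packets)
    fec = bytearray(size)
    for p in packets:
        for i, b in enumerate(p):
            fec[i] ^= b
    return bytes(fec)

# Three media packets protected by one FEC packet.
media = [b"pkt-A", b"pkt-B", b"pkt-C"]
fec = xor_packets(media)

# Packet B is lost in transit; XOR the survivors with the FEC packet.
recovered = xor_packets([media[0], media[2], fec])
print(recovered)  # b'pkt-B'
```

Note that a single XOR packet can only recover one loss per protection group; losing two packets from the same group defeats it, which is why FEC is combined with NACK below.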

Since WebRTC already implements this part well, only compatibility testing is needed; it is not treated as a key optimization target, so the stock WebRTC implementation is used.

2. NACK

2.1 Introduction to NACK:

NACK stands for Negative ACKnowledgement. It is one of WebRTC's error-recovery mechanisms: the receiver uses it to indicate that it has not received a particular packet.

NACK messages are sent via RTCP to the media sender, which decides whether to retransmit the lost packets based on their availability in its cache and its estimate of whether retransmission is still useful. The sender maintains a buffer queue: if the requested packet is in the queue, it is taken from the queue and sent again; if it is not, it is not resent, and the decoder will never receive that packet.
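A minimal sketch of such a sender-side history buffer (the class name, capacity, and eviction policy are illustrative, not WebRTC's actual types):

```python
from collections import OrderedDict

class RtxBuffer:
    """Sender-side packet history: recently sent packets are kept so
    that NACKed sequence numbers can be retransmitted while still
    available; the oldest entries are evicted as new packets arrive."""
    def __init__(self, capacity=600):
        self.capacity = capacity
        self.packets = OrderedDict()  # seq -> payload

    def store(self, seq, payload):
        self.packets[seq] = payload
        if len(self.packets) > self.capacity:
            self.packets.popitem(last=False)  # drop the oldest packet

    def handle_nack(self, seqs):
        """Return payloads to retransmit; sequence numbers that have
        already been evicted are silently skipped (not resent)."""
        return [self.packets[s] for s in seqs if s in self.packets]
```

The eviction step models the case described above where a requested packet is no longer in the buffered queue and therefore cannot be resent.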

2.2 Optimization and improvement:

The current system still uses the old version of the NACK process, which works as follows:

  • After the RTP module decapsulates packets, they are delivered to the JitterBuffer module in arrival order.
  • When a packet is inserted into the JitterBuffer, its sequence number is compared and sorted. If the sequence number is newer than the last one seen, the NACK list is updated: every sequence number between the previously saved sequence number + 1 and the newly received sequence number is recorded in missing_sequence_numbers.
  • A WebRTC thread periodically traverses the current NACK list to check for packets that have still not arrived. If any exist, it assembles and sends a NACK RTCP packet to the sender, requesting retransmission of the missing packets.
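The insertion step above can be sketched as follows (a simplified model; sequence-number wraparound at 2^16, which real RTP handles, is ignored here):

```python
def update_nack_list(missing, last_seq, new_seq):
    """When a newer sequence number arrives, record every gap between
    the previously highest seq and the new one as missing; if the
    arriving packet was itself on the missing list, remove it."""
    if new_seq in missing:          # a late or retransmitted packet filled a gap
        missing.discard(new_seq)
        return missing, last_seq
    if new_seq > last_seq:          # newer packet: mark the gap as missing
        missing.update(range(last_seq + 1, new_seq))
        last_seq = new_seq
    return missing, last_seq

missing, last = set(), 10
missing, last = update_nack_list(missing, last, 14)  # 11, 12, 13 go missing
missing, last = update_nack_list(missing, last, 12)  # 12 arrives late
print(sorted(missing))  # [11, 13]
```

The periodic thread then simply packs whatever remains in `missing` into a NACK RTCP message.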

NACK is a confirmation process for packets that have not arrived. The original implementation contained many nested functions and complex parts, so after studying it further we optimized it in two rounds. The approximate flow chart of the two rounds of optimization is as follows:

There is another problem NACK cannot avoid. If the packet loss rate is high, or jitter and network heterogeneity cause severe out-of-order arrival (for example, jitter above 200 ms), packets that have not yet arrived will be requested and resent many times. This causes network congestion and, especially at higher resolutions, easily leaves video frames incomplete and undecodable, producing mosaic or black screens.

Therefore, we added and adjusted several judgment conditions, optimized the conditions for caching and emptying the queue, and partially tuned the process of assembling complete video frames, mitigating the above situation and improving the user experience.

3. Bandwidth Adaptation

3.1 Bandwidth adaptation:

During a real-time video call, encoder parameters are configured per device according to its frame-rate capabilities, so the sender maintains a maximum-frame-rate parameter set. Captured images are encoded, packetized, and sent over the network. However, if the network deteriorates while the sender keeps pushing frames at the current bitrate, or if unstable encoding performance means the image frame sequence cannot be consumed on the capture side, the sender must lower the frame rate or drop frames to relieve its sending pressure.

In WebRTC, when the encoded bitrate overloads the link or the load is uneven, the relevant interfaces of the MediaOptimization class can be called to lower or raise the frame rate, and with it the bitrate, making effective use of the available bandwidth and preventing either a deteriorating network or insufficient bandwidth from hurting the user experience.

3.2 Framework

Sender: Estimates the available bandwidth based on the packet loss rate

Receiving end: Calculates available bandwidth based on packet arrival time

Combination: the receiver sends REMB feedback to the sender, which then determines the final sending rate from both the sender-side and receiver-side bandwidth estimates.

3.3 Sender side

The basic principle of the sender-side bandwidth estimation algorithm is as follows: it reads the packet loss rate from RTCP reports, dynamically assesses the current network condition, and decides whether to increase or decrease the bandwidth target. If the bandwidth needs to be reduced, the TFRC algorithm is used to smooth the transition and avoid abrupt increases or decreases.
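A minimal sketch of this loss-based control, with constants following the GCC draft (grow about 5% when loss is under 2%, hold in the middle band, back off proportionally above 10%); the TFRC smoothing step mentioned above is omitted:

```python
def loss_based_estimate(prev_bps, loss_fraction):
    """GCC-style loss-based control: probe upward when loss is low,
    hold in the middle band, back off proportionally to the loss
    fraction once it exceeds 10%."""
    if loss_fraction < 0.02:
        return int(1.05 * prev_bps)                        # probe upward
    if loss_fraction > 0.10:
        return int(prev_bps * (1 - 0.5 * loss_fraction))   # back off
    return prev_bps                                        # hold

print(loss_based_estimate(1_000_000, 0.01))  # 1050000
print(loss_based_estimate(1_000_000, 0.20))  # 900000
```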

3.4 Receiver side

The basic principle of the receiver-side bandwidth estimation algorithm is to read statistics from the received RTP data and estimate the current network bandwidth. In WebRTC, a Kalman filter framework processes the send and receive timestamps of each frame to estimate current congestion and utilization, evaluate and correct the bandwidth estimate, and thereby steer the sending rate.
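The raw signal this estimator works from is the per-frame delay gradient between send and receive timestamps; a simplified sketch is below (WebRTC smooths this noisy signal with the Kalman filter, which is omitted here):

```python
def delay_gradient(send_ts, recv_ts):
    """Per-frame one-way delay variation:
    d(i) = (recv_i - recv_{i-1}) - (send_i - send_{i-1}).
    A persistently positive trend suggests queues are building up,
    i.e. the link is over-used and bandwidth should be reduced."""
    return [(recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
            for i in range(1, len(send_ts))]

send = [0, 33, 66, 99]     # frames sent every 33 ms
recv = [50, 84, 120, 158]  # arrivals drift later each frame
print(delay_gradient(send, recv))  # [1, 3, 5]
```

A gradient hovering around zero indicates a stable path; the growing values in the example signal congestion.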

4. Optimization and Improvement

4.1 Optimization Points

The main optimization points of our work are as follows:

Two rounds of NACK optimization: an overall improvement of the previous version of the algorithm (reconstructing the original R4X-related code and adopting the same NACK algorithm as iOS). On this basis, we optimized the cache-queue and queue-emptying judgment conditions, the process of obtaining complete decodable frames, the discard strategy for long-cached frames, and the JitterBuffer parameters.

Overall FEC / dynamic-resolution / NACK strategy optimization: based on network conditions, i.e. the packet loss rate, RTT, and related parameters, together with the average jitter over 5 s, an overall scheme dynamically adjusts the current combination. It adds redundancy to counter the stuttering that high packet loss causes under FEC, and gradually lowers the resolution under heavy loss to keep playback smooth.

Network storm suppression optimization: WebRTC has a suppression policy for retransmission traffic. When the share of retransmitted packets reaches 35%, NACK retransmissions and FEC redundant packets are no longer sent. However, this ratio is very unfriendly to 720P streams with over 30% packet loss. Properly tuned, the policy can still avoid network storms while satisfying NACK retransmission requests; at 720P with 30% packet loss, a frame rate of 15-25 fps can be achieved for smooth playback.
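The suppression check can be sketched as a simple ratio gate (the function name is illustrative; the 35% threshold comes from the policy described above, and the actual WebRTC mechanism accounts for it inside bitrate allocation rather than as a standalone predicate):

```python
def allow_retransmission(rtx_bps, media_bps, cap=0.35):
    """Suppress NACK/FEC traffic once retransmissions exceed a fixed
    share of the media bitrate, to avoid a feedback-driven storm
    where retransmissions themselves worsen the congestion."""
    return rtx_bps <= cap * media_bps

print(allow_retransmission(300_000, 1_000_000))  # True
print(allow_retransmission(400_000, 1_000_000))  # False
```

Raising the cap (or making it resolution-aware) is what trades storm protection against frame completeness at 720P under heavy loss.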

4.2 Optimization Results

The overall solution has passed testing and acceptance in several laboratory scenarios, covering IP and SIP calls and 720P and VGA resolutions, and improves the user experience compared with the previous version. On wired networks in particular, the improvement for IP live streaming is obvious, and the overall subjective score rose markedly. At the same time, delay measurement was built from scratch and can now cover environments below 300 ms. The scheme has been deployed in production, is recognized by users, and leads overall in head-to-head comparisons with competitors.

5. Issues and Measures

5.1 Congestion Detection:

We need to locate the current network status quickly and accurately, so that once congestion occurs the sending strategy can be adjusted promptly. A project for this has already been started.

5.2 Weak-network control in the latest WebRTC

In the latest WebRTC, the NACK module is an independent module integrated into the ViE module and decoupled from the JitterBuffer, enabling real-time monitoring of network packet loss and independent transmission of NACK requests.

Advantages: the NACK list can be obtained in real time, and decoupling from the JitterBuffer makes it easier and faster to access. A project to adopt this has already been started.

5.3 The weak-network frontier

We attended the Agora RTE2020 Real-Time Internet Conference, where Agora demonstrated smooth playback at 720P over 2.0 Mbps of bandwidth with 65% packet loss.

1. Application of deep reinforcement learning to congestion control;

2. Deep optimization of real-time H264 video encoder algorithm.

These are directions we will investigate and study in the future. We also look forward to discussing and learning with colleagues who have relevant experience.