Livestreaming and interactive livestreaming attracted great attention in 2016 and 2017, giving rise to numerous livestreaming apps. With the gradual rise of interactive livestreaming, interaction has become a strong demand for these apps. However, packet loss, delay, jitter, and other problems in real networks still seriously affect the live broadcast experience. To address these problems, this article introduces the network QoS technology behind NetEase Cloud Communication Live broadcast (https://netease.im/live), aiming to help readers understand how to guarantee the smoothness and clarity of a live broadcast as far as possible even under extremely poor network conditions.

While watching a live broadcast or interacting with the host, the audience's perception of audio and video fluency and clarity can be represented by objective metrics such as video frame rate, video PSNR (or SSIM) score, and audio MOS score. A higher video frame rate brings higher video fluency, a higher video PSNR (or SSIM) score brings higher video clarity, and a higher audio MOS score brings higher audio fluency and clarity. How, then, can we improve network quality through network QoS technology, and thereby improve these objective metrics? Below we discuss one-way live broadcast and interactive live broadcast separately.

Fluency and clarity of one-way live broadcast

One-way live broadcast here refers to the mode in which audio and video streams are pushed to a CDN over RTMP/TCP, and the audience then pulls the stream to watch. As we all know, TCP is a connection-oriented transport layer protocol that guarantees reliable delivery. By calling the open source library librtmp, developers can easily implement an RTMP push-stream service. However, when packet loss and jitter occur on the network, TCP's congestion control policy throttles the sending bitrate of the push end, causing sudden stuttering on the audience side and hurting audio and video fluency.

In general, the strategies for dealing with network packet loss include forward error correction (FEC), audio RED redundancy, and retransmission, while adaptive bitrate adjustment of the audio and video streams deals with network bandwidth limitations. Because of the particularities of TCP, we cannot design flexible retransmission or adaptive bitrate strategies on top of it: the amount and pacing of data transmission are controlled entirely by the protocol itself. In this case, what we can do is detect the available network bandwidth promptly and effectively, and adjust the output bitrate of the audio and video encoders accordingly, achieving bitrate self-adaptation. Concretely, we obtain network information through each platform's TCP socket interfaces (iOS, Android, or Windows), sense network congestion, estimate the available bandwidth, and adjust the target bitrate of the audio and video encoders in time to prevent stuttering and keep playback fluent.
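As a concrete illustration, here is a minimal sketch of this approach on Linux/Android, where the SIOCOUTQ ioctl reports how many bytes are still queued unsent in the kernel's TCP send buffer; a queue that keeps growing means the network cannot drain data at the current bitrate. The watermark thresholds, rate limits, and the set_encoder_bitrate hook are illustrative assumptions, not part of the NetEase implementation.

```cpp
// Sketch: probe the TCP send queue and adapt the encoder bitrate.
// Linux/Android only: SIOCOUTQ reports bytes still queued in the
// kernel send buffer, i.e. data the network has not yet drained.
#include <sys/ioctl.h>
#include <linux/sockios.h>  // SIOCOUTQ
#include <algorithm>
#include <cstdio>

// Hypothetical encoder hook -- stubbed here; wire to your encoder's API.
static void set_encoder_bitrate(int bps) {
  std::printf("encoder target: %d bps\n", bps);
}

class TcpSendProbe {
 public:
  TcpSendProbe(int fd, int start_bps) : fd_(fd), bitrate_bps_(start_bps) {}

  // Call periodically, e.g. once per second.
  void Tick() {
    int queued = 0;
    if (ioctl(fd_, SIOCOUTQ, &queued) < 0) return;

    if (queued > kHighWatermarkBytes) {
      // Congestion: back off quickly so the queue can drain.
      bitrate_bps_ = std::max(kMinBps, bitrate_bps_ * 7 / 10);
    } else if (queued < kLowWatermarkBytes) {
      // Queue nearly empty: probe upward slowly.
      bitrate_bps_ = std::min(kMaxBps, bitrate_bps_ * 105 / 100);
    }
    set_encoder_bitrate(bitrate_bps_);
  }

 private:
  // Illustrative thresholds; tune for the target scenario.
  static constexpr int kHighWatermarkBytes = 64 * 1024;
  static constexpr int kLowWatermarkBytes = 8 * 1024;
  static constexpr int kMinBps = 200'000;    // 200 kbps floor
  static constexpr int kMaxBps = 2'000'000;  // 2 Mbps ceiling
  int fd_;
  int bitrate_bps_;
};
```

On iOS and Windows the same idea applies using whatever send-buffer or throughput statistics the platform's socket API exposes; only the probe changes, not the adjustment policy.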
Fluency and clarity of interactive live broadcast

Interactive live broadcast here specifically refers to the mode in which the co-streaming participant (linker) pushes audio and video streams to a relay server over RTP/UDP, the server then forwards them to a CDN over RTMP/TCP, and the audience pulls the stream to watch. Unlike TCP, UDP does not care whether data reaches the peer in a timely and reliable manner. We can therefore layer a variety of techniques on top of UDP to make delivery reliable, for example forward error correction (FEC), audio RED redundancy, and retransmission, choosing the appropriate strategy according to the network conditions and the media data. The following sections introduce each of these techniques in turn.

Bandwidth estimation

The job of bandwidth estimation is to measure the currently available network bandwidth accurately and use it to guide the bit allocation of the audio and video encoders, so that the actual sending bitrate does not exceed the available bandwidth, thereby avoiding growing delay and packet loss. Common methods estimate bandwidth from packet loss or from delay variation. Google's WebRTC contains a complete bandwidth estimation implementation that is well worth studying; a minimal loss-based sketch appears after the forward error correction section below.

Error concealment

When some of the received audio or video data has been lost, how can it be recovered? From the decoder's point of view, lost data can be reconstructed from one or more previous frames of video (or audio). In practice, however, the common video error concealment methods often leave mosaic artifacts in the restored image, so the result is poor. For that reason video error concealment is rarely used; instead, the completeness of each frame is checked before decoding: complete frames are fed to the decoder and incomplete ones are simply discarded. Audio is a different story: audio error concealment generally works much better than video. Popular audio codecs such as Opus, iLBC, and iSAC/SILK all contain their own PLC (Packet Loss Concealment) module, so the decoder conceals errors automatically when it detects a lost frame, and the result is acceptable in practice.

Forward error correction

Forward error correction amounts to sending extra data alongside the original packets: either copies of the original data, or the result of a computation over several original packets. If original data is lost in transit, the redundant data helps recover it, at the cost of some network bandwidth. Video data differs from audio data in that video packets are large, usually close to the MTU size, and the audience is less sensitive to end-to-end video delay than to audio delay, so video can use large FEC groups for forward error correction. Audio packets, by contrast, are small, and packet headers account for a much larger share of each packet than they do for video, so RED redundancy lets multiple audio frames share a single packet header and improves data utilization. Moreover, if audio used FEC, the grouping would inevitably add latency and hurt the call experience. FEC is therefore better suited to forward error correction for video, and RED to redundancy for audio.
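Returning to the bandwidth estimation point above, the sketch below illustrates a simple loss-based estimator in the spirit of the loss-based part of WebRTC's congestion controller: back off in proportion to the reported loss when loss is heavy, probe gently upward when loss is negligible, and hold steady in between. The class name, rate limits, and exact thresholds are illustrative assumptions.

```cpp
// Sketch of a loss-based bandwidth estimator (WebRTC-inspired).
#include <algorithm>
#include <cstdint>

class LossBasedBwe {
 public:
  explicit LossBasedBwe(int64_t start_bps) : estimate_bps_(start_bps) {}

  // loss_fraction: fraction of packets lost since the last RTCP
  // receiver report, in [0.0, 1.0]. Returns the updated estimate,
  // which then guides the encoder / FEC bit allocation.
  int64_t OnReceiverReport(double loss_fraction) {
    if (loss_fraction > 0.10) {
      // Heavy loss: multiplicative decrease proportional to the loss.
      estimate_bps_ = static_cast<int64_t>(
          estimate_bps_ * (1.0 - 0.5 * loss_fraction));
    } else if (loss_fraction < 0.02) {
      // Negligible loss: probe upward by about 5%.
      estimate_bps_ = static_cast<int64_t>(estimate_bps_ * 1.05);
    }
    // Between 2% and 10% loss: hold the current estimate.
    estimate_bps_ = std::clamp<int64_t>(estimate_bps_, kMinBps, kMaxBps);
    return estimate_bps_;
  }

 private:
  static constexpr int64_t kMinBps = 100'000;
  static constexpr int64_t kMaxBps = 5'000'000;
  int64_t estimate_bps_;
};
```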
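And here is a minimal sketch of the simplest form of FEC discussed above: a single XOR parity packet computed over a group of media packets, which lets the receiver rebuild any one lost packet in the group. Production schemes (e.g., ULPFEC/FlexFEC in WebRTC, or Reed-Solomon codes) tolerate more losses per group, and a real packet format would also carry per-packet lengths; this sketch omits that bookkeeping.

```cpp
// Sketch: single-parity XOR FEC over a group of media packets.
// Losing two or more packets in one group defeats this scheme.
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR-accumulate 'src' into 'parity', growing parity if needed.
void XorInto(std::vector<uint8_t>& parity, const std::vector<uint8_t>& src) {
  if (src.size() > parity.size()) parity.resize(src.size(), 0);
  for (size_t i = 0; i < src.size(); ++i) parity[i] ^= src[i];
}

// Sender side: build the redundant parity packet for a group.
std::vector<uint8_t> BuildParity(
    const std::vector<std::vector<uint8_t>>& group) {
  std::vector<uint8_t> parity;
  for (const auto& pkt : group) XorInto(parity, pkt);
  return parity;
}

// Receiver side: recover the single missing packet by XOR-ing the
// parity with every packet of the group that did arrive.
std::vector<uint8_t> RecoverMissing(
    const std::vector<std::vector<uint8_t>>& received,
    const std::vector<uint8_t>& parity) {
  std::vector<uint8_t> missing = parity;
  for (const auto& pkt : received) XorInto(missing, pkt);
  return missing;  // payload of the lost packet (padded to max length)
}
```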
Retransmission

Besides forward error correction, when the network RTT is small, the receiver can ask the sender to retransmit packets that were lost in the network; this is retransmission. Compared with FEC and RED, retransmission makes far better use of bandwidth, because only the packets that were actually lost are resent, in a targeted manner.

Given the audience's sensitivity to audio delay, audio retransmission is generally not used unless the network RTT is very small. Most video, however, uses retransmission for error recovery, in two forms that differ in what is requested: the I frame request and the packet request. An I frame request is used when the receiver can no longer decode and the sender's GOP is very long: the receiver asks the sender to emit an I frame promptly, so that the display can recover from that I frame as soon as possible. A packet request asks the sender for specific lost packets; to serve such requests, the sender must cache the packets it has already sent.

The above briefly introduces several methods and strategies for improving fluency and clarity in live streaming. In practice, we also need to consider how to combine these techniques organically, how the server and client cooperate, the user's scenario, and the performance of the client device.
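As a closing sketch of the packet-request path described above, the following shows a sender-side retransmission cache: the receiver NACKs missing sequence numbers, the sender resends what it still has cached, and it falls back to requesting a fresh I frame from the encoder when a packet has already aged out of the cache. The names, callbacks, and cache size are illustrative assumptions, not the NetEase implementation.

```cpp
// Sketch: sender-side packet cache serving NACK (packet request),
// with an I frame request as the fallback recovery path.
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

struct SentPacket {
  uint16_t seq;
  std::vector<uint8_t> payload;
};

class RetransmitCache {
 public:
  RetransmitCache(std::function<void(const SentPacket&)> resend,
                  std::function<void()> request_keyframe)
      : resend_(std::move(resend)),
        request_keyframe_(std::move(request_keyframe)) {}

  // Record every outgoing packet, keeping memory bounded.
  void OnPacketSent(SentPacket pkt) {
    cache_.push_back(std::move(pkt));
    if (cache_.size() > kMaxCached) cache_.pop_front();
  }

  // Handle a NACK (list of missing sequence numbers) from the receiver.
  void OnNack(const std::vector<uint16_t>& missing_seqs) {
    for (uint16_t seq : missing_seqs) {
      bool found = false;
      for (const auto& pkt : cache_) {
        if (pkt.seq == seq) { resend_(pkt); found = true; break; }
      }
      if (!found) {
        // Too old to retransmit: ask the encoder for a fresh I frame
        // so the receiver can resynchronize without the lost data.
        request_keyframe_();
        return;
      }
    }
  }

 private:
  static constexpr size_t kMaxCached = 1024;  // illustrative bound
  std::deque<SentPacket> cache_;
  std::function<void(const SentPacket&)> resend_;
  std::function<void()> request_keyframe_;
};
```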