In the post-epidemic era, cloud conferencing has become known to and used by more and more enterprises, and is becoming the "new infrastructure" of the digital office. The underlying technologies supporting cloud conferencing, such as RTC and audio/video technology, have also attracted much attention and are enabling more industries to improve their digital management and contact-free service capabilities. Zhang Jun, General Manager of the KT Cloud R&D Center, was invited to share two major difficulties in RTC: compatibility with a massive range of devices, and dealing with complex networks. He also shared KT Cloud's customer insights, technical thinking, and innovation practice in this area.

This content was shared by Zhang Jun, General Manager of the KT Cloud R&D Center, in the second half of the video conference session.

It is a great honor to give the first talk on behalf of KT Cloud. I have been working on conferencing for the past 20 years. Today I will share my recent practical experience with device compatibility and network handling in video conferencing.

First, let me introduce myself. I joined WebEx in 2003 and stayed on after Cisco acquired WebEx in 2007. In 2014, Cisco and TCL teamed up to establish KT Cloud. It has been six and a half years from the end of 2014 to today. In that time, we have built a basic audio and video platform, developed video conferencing and intelligent customer service on top of it, and integrated these audio and video products with relevant emerging technologies.

1. Introduction to KT Cloud

The picture above shows KT Cloud's current products. The company's team brings together many people who have worked at Cisco, Huawei, IBM, and other enterprises, and who have very rich experience in developing audio and video products.

2. Empowering all industries with audio and video technology

We have made adaptations for many terminals. Some terminals that need audio and video technology may be unexpected: doorbells, sweeping robots, and drones also need the ability to join audio and video conferences. Our self-developed universal audio and video SDK can connect all kinds of devices, whether used directly on the public cloud or deployed in a private cloud.

3. Two difficulties in RTC

RTC has two difficulties: how to be compatible with a massive number of devices, and how to deal with complex and changeable networks.

3.1 Adapting a wide range of terminals

Let's start with devices. The diagram illustrates how video conferencing covers needs from the pocket to the boardroom: from the simplest phones and pads on the far left, to desktop devices, to two-screen and three-screen telepresence. Those of you who have used video conferencing have probably heard of TelePresence, which has life-size screens; many television interviews are conducted using TelePresence devices.

3.2 Adapting hardware terminals and VDI

Many people may not think of this when they have meetings at home, but most companies still use SIP/H.323 hardware terminals for meetings in conference rooms, and some manufacturing customers with security requirements use VDI virtual desktops. This kind of hardware can run into many problems during meetings. VDI point-to-point calls can suffer stuttering, because audio and video have to detour through the virtual desktop server, and special measures must be taken to make audio, video, and the conference client work on that server. Compatibility with the SIP and H.323 protocols is also very complex: we need to be compatible with at least the major international vendors, such as Cisco and Polycom, and the earliest major domestic vendors, such as Huawei and ZTE. These vendors' older devices and firmware versions from many years ago are especially challenging to support; I will talk about some details later.

3.3 Adapting a large number of home appliances

By last year, with the rise of the Internet of Everything, and with the strategy of our parent company TCL, which puts screens on a wide variety of home appliances, to bring full audio and video to all of them, we had begun to expand the types of devices we adapt. This is also a very big challenge. In the past, we mainly targeted Android phones; now Android TVs need adaptation for CPU, GPU, and resolution, from portrait to landscape. The market has also rolled out sweeping robots with cameras, range hoods with screens, refrigerators with screens, and smart doorbells with screens and cameras, and these devices vary widely. Some run Linux, such as the Raspberry Pi-based sweeping robot at the bottom right; the smart doorbell uses Huawei LiteOS; and some chips developed by manufacturers, such as the Qigan chip, run FreeRTOS, so we have to adapt at both the hardware and operating-system levels.

I think this is very meaningful. Elderly people and children at home are used to watching TV; with a camera on the new TV, we can conveniently check on them by phone when we are out, without them having to perform any operations. Through audio and video, we can care for family at any time, and features such as home messaging and visitor calls across all home appliances can produce synergies. Many years ago, we were in the 2B meeting-room market; since the epidemic, we have also been adapting audio and video to a large number of home appliances, whose chips are very different from those used in mobile phones.

3.4 Octopus: a cross-platform, all-terminal SDK

We put a lot of effort into these terminal adaptations. All device-side functionality is implemented in pure C. WebRTC is written in C++; using C makes the code highly portable, keeps the installation package very small, adapts to ultra-low-power devices, and lets us use different compilers for different platforms. Despite the porting and tailoring, we achieved very good results in resistance to weak networks and in first-frame latency.

4. Audio stutter under weak networks

I have sketched the audio pipeline here in somewhat simplified form. From sending to receiving, audio as a whole is a little simpler than video; we will get to the more complicated video processing in a moment. Most of the overall loss happens on the network side, with some loss on the device side and in signal processing. For loss resistance and congestion control, the audio and video techniques are similar.

4.1 Comparison of anti-packet loss algorithms for audio

The diagram above briefly summarizes and compares audio anti-packet-loss techniques. Audio does not consume much bandwidth overall: even for 64 kbps wideband audio, doubling the bitrate with FEC redundancy is acceptable. Many customers still want Web-based audio and video SDKs so they can integrate directly with their OA and CRM systems in the browser. In the browser, using the in-band FEC of Opus, loss resistance has an upper limit; our own soft client has more anti-weak-network means, such as adjusting redundancy dynamically according to the packet loss rate. There has already been a lot of sharing at LVS about packet loss retransmission and error concealment, so I won't go into details this time.
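As a minimal sketch of the dynamic redundancy adjustment described above, the loss rate observed over a recent window can be mapped to an FEC redundancy ratio. The function names, thresholds, and the 2.5x headroom factor are illustrative assumptions, not KT Cloud's actual algorithm:

```python
def fec_redundancy(loss_rate: float, max_ratio: float = 2.0) -> float:
    """Map an observed packet loss rate (0.0-1.0) to an FEC redundancy
    ratio: extra bandwidth as a fraction of the media bitrate, e.g.
    1.0 means one redundant packet per media packet."""
    if loss_rate <= 0.02:            # negligible loss: skip redundancy
        return 0.0
    # Cover the observed loss with some headroom, but cap the ratio.
    # For ~64 kbps audio, even doubling the bitrate (ratio 1.0) is
    # usually an acceptable cost.
    return min(max_ratio, 2.5 * loss_rate)

def audio_bitrate_with_fec(media_kbps: float, loss_rate: float) -> float:
    """Total audio bitrate including the FEC overhead."""
    return media_kbps * (1.0 + fec_redundancy(loss_rate))
```

Browser clients limited to Opus in-band FEC cannot apply a policy like this, which is one reason the soft client resists heavier loss.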

4.2 Octopus Audio Anti-packet Loss

On the client side, our self-developed redundancy algorithm works better: in actual tests, audio remains acceptable at roughly 70% network packet loss. In browsers and on SIP terminals, there are not as many optimization options as in the soft client; we can only use the in-band FEC of Opus, which resists roughly 40% packet loss.

5. Octopus Video QoS Indicators

Video QoS can be broken down into the following major metrics: first-frame time, frame rate, latency, resolution, bit rate, and stutter.
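Several of these metrics can be derived from frame arrival timestamps alone. The sketch below (an illustrative assumption, not Octopus's actual reporting code) computes frame rate, stutter count, and first-frame time from a list of arrival times, treating any inter-frame gap above a threshold as a freeze:

```python
def video_qos(frame_times_ms, freeze_threshold_ms=200):
    """Derive frame rate and stutter metrics from frame arrival times
    (milliseconds). A gap above freeze_threshold_ms counts as a freeze."""
    if len(frame_times_ms) < 2:
        return {"fps": 0.0, "freezes": 0, "first_frame_ms": None}
    gaps = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    duration_s = (frame_times_ms[-1] - frame_times_ms[0]) / 1000.0
    return {
        "fps": len(gaps) / duration_s,
        "freezes": sum(1 for g in gaps if g > freeze_threshold_ms),
        "first_frame_ms": frame_times_ms[0],  # first-frame latency proxy
    }
```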

5.1 Whole process of RTC video

The diagram above is a simplified view of the entire RTC video pipeline. It is more complicated than audio. At the heart of video transmission is a bit-rate controller, which must control the encoding bit rate, FEC redundancy, and packet-loss retransmission rate, and finally the pacing rate at which the sender transmits to the receiving end. It has gone through many generations of iteration: at first it relied entirely on the receiver feeding back the receivable bandwidth in REMB messages; now bandwidth is mainly estimated at the sender, based on transport-cc feedback messages from the receiver.

5.2 Octopus video anti-packet loss

Finally, with the optimizations in the links above, video can resist about 60% packet loss. Lost packets are retransmitted in response to NACK requests; in low-RTT scenarios the effect of NACK retransmission is quite noticeable, and it is combined with some FEC error correction. For SIP video, resolution and bit rate need to be adjusted dynamically according to the packet loss rate and delay reported by SIP devices. The software and hardware of SIP terminals are hard to control, and there are many vendors, so they are harder to tune than the soft client.
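The core of NACK-based retransmission is that the receiver scans the RTP sequence numbers it has received and requests every gap. A minimal sketch, ignoring sequence-number wraparound and retry timers that a real implementation must handle:

```python
def missing_seqs(received_seqs):
    """Return the sequence numbers to request via NACK: every gap
    between the packets actually received (no wraparound handling)."""
    seqs = sorted(set(received_seqs))
    if len(seqs) < 2:
        return []
    missing = []
    for a, b in zip(seqs, seqs[1:]):
        missing.extend(range(a + 1, b))   # everything strictly between
    return missing
```

This is why NACK works best at low RTT: each requested packet costs at least one round trip before it can be played out, so at high RTT FEC carries more of the load.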

5.3 Video streaming control — BBR

I won't talk much about BBR this time. In April last year, Google removed this code from WebRTC because it was not as effective as GCC.

5.4 Video streaming control — GCC

The diagram above shows the classic Google congestion control (GCC) model; the following example uses its PACER module.

5.5 Using PACER for video traffic shaping

Here is an example. Suppose the participant pushing and sharing streams in the conference is a SIP hardware terminal: it shares the desktop through the computer's HDMI cable, and its video is also pushed through the SIP terminal. The other end is also a SIP terminal. This is called dual stream in the industry: the video stream on the left and the shared stream on the right, displayed on two screens. Some older SIP terminals have limited processing power and can run into problems.

A CFO at a client I met last year likes to use a desktop SIP terminal for sharing. The I-frames of the shared content are very large: the financial statements are dense with numbers and marked in various colors. Under a weak network, the receiving end sometimes sees a corrupted, unclear picture, because both ends are SIP and the receiving device is relatively old. When such dense I-frames arrive, the receiver, with limited CPU and memory, drops packets in the kernel. So flow control must be done when forwarding: shape the whole traffic stream before it is forwarded. For the share stream in particular, the effect of flow control is obvious. I-frames and P-frames can differ in size by nearly ten times, and sending them uncontrolled produces congestion, so the forwarding stage must do flow control well, like a dam holding back a flood.

6. Bandwidth Estimation

In the past, bandwidth estimation relied mainly on receiver feedback. Now WebRTC mainly estimates bandwidth at the sender, using transport-cc feedback from the receiver. Unlike the REMB messages fed back by the receiver, this approach includes a delay filter.
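The delay filter's core signal is the inter-group delay variation: how much more time packets took to arrive than to be sent. A persistently positive trend suggests a growing queue, so the estimate should be lowered. A minimal sketch of the idea, greatly simplified relative to GCC's actual arrival-time filter and adaptive threshold:

```python
def delay_gradients(send_ms, recv_ms):
    """Per-packet delay variation: d(i) = (recv_i - recv_{i-1})
    - (send_i - send_{i-1}). Positive values mean packets are being
    spaced out in flight, i.e. a queue is building along the path."""
    return [
        (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        for i in range(1, len(send_ms))
    ]

def overusing(gradients, threshold_ms=5.0):
    """Crude over-use detector: accumulated gradient above a threshold.
    Real GCC uses a trendline estimator with an adaptive threshold."""
    return sum(gradients) > threshold_ms
```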

7. Using FMO for video error concealment

We analyzed earlier how screen sharing between two SIP terminals can leave the receiving end with a corrupted picture under a weak network; video has similar problems under weak networks. Peers who build conferencing systems know that when receiving on SIP terminals, the layout may be speaker mode or equal-split mode. In that case, the video is sent through an MCU, which mixes it and pushes it to the receiving end. We add FMO when re-encoding on the server, which helps prevent errors from being magnified and avoids long-lasting, large-area picture corruption. There are many details here that you can look up yourself. Its disadvantage is that it increases encoding complexity and bandwidth, but any optimization comes at a cost.
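To illustrate why FMO limits the damage: in a dispersed (checkerboard-style) macroblock-to-slice-group map, neighbouring macroblocks belong to different slices, so when one slice is lost, each missing macroblock still has intact neighbours to conceal from. The sketch below uses a simplified two-group checkerboard, not the exact H.264 map-type formula:

```python
def checkerboard_group(mb_x, mb_y):
    """Simplified two-slice-group checkerboard map in the spirit of
    H.264 FMO 'dispersed' mode (not the exact spec formula)."""
    return (mb_x + mb_y) % 2

def intact_neighbours(mb_x, mb_y, lost_group, width, height):
    """Count the 4-neighbours of a lost macroblock that belong to the
    surviving slice group, i.e. are available for concealment."""
    n = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        x, y = mb_x + dx, mb_y + dy
        if 0 <= x < width and 0 <= y < height \
                and checkerboard_group(x, y) != lost_group:
            n += 1
    return n
```

With this layout, an interior macroblock in a lost slice is surrounded on all four sides by decoded macroblocks, whereas with contiguous slices a loss wipes out a whole horizontal band with no nearby intact data.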

8. Octopus signaling upgrade

With audio and video now resisting weak networks better, signaling is the next thing to optimize, so we considered a signaling upgrade. At first, many peers used WebSocket for signaling, and some transmitted signaling over private RUDP protocols. The new WebRTC standards push WebTransport, and we used it to upgrade our signaling. Compared with TCP-based WebSocket, it has many advantages. A quick list: faster connection handshake, connection multiplexing, a built-in FEC mechanism in the protocol, and support for connection migration. If you move from WiFi to 4G, for example when leaving home for the office, the meeting reconnection process is much smoother when switching networks.

9. WebTransport

As a quick comparison: the new WebTransport abandons the earlier design that ran directly on QUIC. The new standard is WebTransport over HTTP/3, which is an improvement over the current WebSocket and DataChannel signaling.

10. Full-link quality monitoring to feed back into transmission optimization and device compatibility

Ultimately, a full-link quality monitoring system is needed to feed back into transmission optimization and device compatibility. Just as unit tests act as a safety net when we refactor code, we need many metrics to provide feedback: monitor transmission optimization and device compatibility, and do a good job of reporting indicators and events.

That’s all for today, thank you!
