On the morning of May 29, at the QCon Beijing Global Software Development Conference, Zhao Jiayu, founder of Paileyun (Pano), was invited to speak as the first technical expert in the “Audio and Video Service Solutions” track. He gave an in-depth analysis of real-time audio and video technology architecture and explained Paileyun’s best practices in video calling, interactive whiteboards and weak-network resilience. At the same time, the industry’s first “real-time audio and video solution for art teaching” was officially released, drawing wide attention and praise.

QCon is a comprehensive technology conference hosted by InfoQ, part of Geekbang Technology. It is aimed at technical leaders, architects, engineering directors and developers with more than five years of working experience, and serves as a venue for sharing technical innovations and best practices. In addition to Beijing, it is held every year in London, Tokyo, New York, São Paulo, Shanghai, San Francisco and other cities.

As products move online and go digital, real-time audio and video has become a hard requirement in every industry. As a leader among the new generation of real-time audio and video cloud services, the Pano team has nearly 20 years of experience in RTC technology and focuses on exploring the value and practical application of these technologies in business scenarios. The following is a transcript of the speech.

#1 Real-time audio and video: “two highs and one low”

Real-time audio and video places strong emphasis on “two highs and one low”: high quality, high fluency and low latency.

High quality means high-fidelity audio with no echo or noise and moderate volume, together with clear video;

High fluency means smooth audio and video without stuttering;

Low latency is a key metric for real-time audio and video products. According to ITU-T recommendations, user experience is very good when the end-to-end delay is below 200 ms; between 200 and 400 ms the experience is degraded but acceptable; above 400 ms most users are dissatisfied.
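The latency bands above can be expressed as a small helper, for example to label measurements on a monitoring dashboard. This is only a minimal sketch: the function name and the band labels are illustrative assumptions, and the thresholds are simply the 200 ms and 400 ms figures quoted in the talk.

```python
def rate_latency(delay_ms: float) -> str:
    """Map an end-to-end delay (in milliseconds) to the three bands quoted above."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms < 200:
        return "very good"
    if delay_ms <= 400:
        return "degraded but acceptable"
    return "unsatisfactory"


if __name__ == "__main__":
    for sample in (80, 250, 600):
        print(sample, "ms ->", rate_latency(sample))
```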

To achieve “two highs and one low”, the system architecture, audio and video codecs, global networking, server deployment, weak-network resilience and other parts all need to be optimized. In general we try to satisfy all three metrics, but in practice it is often impossible to have everything, so trade-offs must be made among them. In video conferencing, high fluency and low latency are generally guaranteed, and quality may be sacrificed. In scenarios such as beauty live streaming and e-commerce live streaming, high quality and low latency may matter more, and some fluency is sacrificed. In most scenarios, low latency should be guaranteed as far as possible.

#2 Overall architecture of audio and video technology

Paileyun’s technical architecture originated from a video conferencing architecture and has evolved and iterated from there.

A typical video conferencing architecture is multi-DC and distributed: clients connect to nearby servers, and media servers may cascade within and across DCs. The architecture is complex and involves many technical questions, including how to give clients nearby access, how to do global networking and intelligent routing, how to select and optimize audio and video codecs, whether servers should use an SFU or an MCU, how to achieve end-to-end QoS, how to interconnect with the PSTN and with traditional SIP terminals, and how to do cascading.
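As an illustration of nearby access, one simple approach is for the client to probe several candidate servers and connect to the one with the lowest typical round-trip time. The sketch below assumes the probes have already been collected; the function name, the server names and the use of the median are illustrative assumptions, not a description of Pano's actual scheduling logic.

```python
from statistics import median


def pick_access_server(rtt_samples_ms: dict[str, list[float]]) -> str:
    """Return the candidate whose median probe RTT is lowest.

    Servers with no successful probes (empty sample list) are skipped.
    The median is used so that a single slow probe does not dominate.
    """
    candidates = {
        name: median(samples)
        for name, samples in rtt_samples_ms.items()
        if samples
    }
    if not candidates:
        raise ValueError("no reachable server")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    probes = {
        "dc-a": [12.0, 15.0, 90.0],   # one outlier, median is 15 ms
        "dc-b": [30.0, 31.0, 29.0],
        "dc-c": [],                   # unreachable during probing
    }
    print(pick_access_server(probes))
```

A production scheduler would normally also weigh server load, packet loss and carrier information rather than RTT alone.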

From another perspective, Paileyun’s technical architecture is also a typical microservice architecture. It includes service governance modules such as a configuration center, service registration and discovery, global scheduling, and authentication and authorization, as well as components that ensure high availability, such as rate limiting and circuit breaking, elastic scaling, monitoring and alerting, a big data platform, and Pano Backbone. The core functions include audio and video calls, interactive whiteboard, and interactive live streaming.

#3 Practice of interactive whiteboard technology

The interactive whiteboard is a necessary function in online education, enterprise training and other scenarios. A good interactive whiteboard product needs to solve the following problems:

Performance: is latency low enough? Is handwriting synchronized in real time during frequent interaction? Does synchronization fail or the screen go blank on a weak network? We use a proprietary data format with aggressive compression to keep the amount of data as small as possible; whiteboard drawing and rendering use native technology to keep memory and CPU consumption low; and network transmission runs over our global acceleration network, Pano Backbone, to ensure real-time delivery across countries and carriers.

How is animated courseware supported? Is audio and video playback inside courseware supported? We developed our own transcoding engine and whiteboard engine, which reproduce animated courseware with very high fidelity and support various animation effects as well as audio and video file playback.

How are synchronized stream pushing and synchronized recording together with audio and video achieved? Because we provide audio, video and whiteboard together, a single API call is enough to record and push audio, video and whiteboard in sync, which is very convenient for users.
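To give a feel for why a compact data format matters for handwriting synchronization, here is a minimal sketch of one common technique: storing each stroke point as a delta from the previous point and writing the deltas as variable-length integers. The format and all function names are assumptions made for illustration; Paileyun's private format is not described in the talk.

```python
def _zigzag(n: int) -> int:
    """Map signed ints to unsigned so that small negatives stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def _unzigzag(z: int) -> int:
    return (z >> 1) if not z & 1 else -((z + 1) >> 1)


def _put_varint(out: bytearray, value: int) -> None:
    """Append an unsigned int using 7 bits per byte, low bits first."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def _get_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_stroke(points: list[tuple[int, int]]) -> bytes:
    """Encode integer (x, y) points as a count followed by zigzag varint deltas."""
    out = bytearray()
    _put_varint(out, len(points))
    prev_x = prev_y = 0
    for x, y in points:
        _put_varint(out, _zigzag(x - prev_x))
        _put_varint(out, _zigzag(y - prev_y))
        prev_x, prev_y = x, y
    return bytes(out)


def decode_stroke(data: bytes) -> list[tuple[int, int]]:
    """Inverse of encode_stroke."""
    count, pos = _get_varint(data, 0)
    points, x, y = [], 0, 0
    for _ in range(count):
        dx, pos = _get_varint(data, pos)
        dy, pos = _get_varint(data, pos)
        x += _unzigzag(dx)
        y += _unzigzag(dy)
        points.append((x, y))
    return points


if __name__ == "__main__":
    stroke = [(1000, 800), (1002, 801), (1005, 803), (1004, 806)]
    packed = encode_stroke(stroke)
    print(len(packed), "bytes, round trip ok:", decode_stroke(packed) == stroke)
```

Because consecutive points of a handwritten stroke are close together, most deltas fit in a single byte: the four-point stroke in the example packs into 11 bytes, compared with 32 bytes if each coordinate were sent as a fixed 32-bit integer.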

#4 Practice of weak-network resilience technology

The Internet has several characteristics. First, it is essentially best-effort delivery, not reliable delivery: when congestion or weak networks occur, packets may be lost. Second, the Internet emphasizes fairness: all packets are treated equally, and under congestion they may be dropped indiscriminately. When we design weak-network countermeasures we also need to respect fairness and avoid making the congestion control algorithm overly aggressive. Finally, Internet traffic faces cross-border, cross-continent and cross-carrier problems. Common network problems include limited bandwidth, jitter, delay, packet loss and reordering.

To cope with weak networks, both “hard” and “soft” measures are needed. The hard side mainly refers to infrastructure and scheduling algorithms, and the soft side refers to the various QoS algorithms.

At the network infrastructure layer, we built Pano Backbone, a global real-time transmission acceleration network that provides global acceleration, intelligent routing and nearby access for users.

At the software layer, our QoS algorithm consists of several parts. The first part is bandwidth estimation and congestion detection; congestion is generally judged from packet loss, delay, RTT and other indicators. Second, once congestion is detected, various packet-loss countermeasures can be applied, including FEC, ARQ and RED, each with its own advantages and disadvantages. Finally, network QoS is not only about countering packet loss; it also includes congestion control methods such as smooth sending (pacing), slow start and keyframe requests.
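As a concrete illustration of reacting to packet loss, the sketch below implements a simple loss-based rate update. The thresholds (back off above 10% loss, probe upward below 2%) follow the loss-based controller described in the IETF Google Congestion Control draft; they are used here as an assumption for illustration and are not a description of Pano's own QoS algorithm. The function name and the bitrate limits are hypothetical.

```python
def next_send_rate(
    current_bps: float,
    loss_fraction: float,
    min_bps: float = 50_000,
    max_bps: float = 5_000_000,
) -> float:
    """Return the next target send rate from the loss observed in the last report.

    - loss above 10%: reduce the rate in proportion to the loss
    - loss below 2%:  probe upward by 5%
    - otherwise:      hold the current rate
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.10:
        target = current_bps * (1 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        target = current_bps * 1.05
    else:
        target = current_bps
    # Clamp to the configured range (illustrative limits).
    return max(min_bps, min(max_bps, target))


if __name__ == "__main__":
    rate = 1_000_000.0
    for loss in (0.0, 0.01, 0.05, 0.20, 0.30):
        rate = next_send_rate(rate, loss)
        print(f"loss={loss:.0%} -> {rate:,.0f} bps")
```

A real controller would combine this with a delay-based estimate, since rising queuing delay usually signals congestion before packets are actually lost.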

In any weak-network countermeasure, we should not forget the “two highs and one low”. Every QoS algorithm has scenarios where it applies, and using it in the wrong scenario can bring serious side effects. Take ARQ: under heavy packet loss a packet may have to be retransmitted many times, so even if the RTT is only 100 ms, the packet may take 2 seconds to finally arrive, which is unacceptable in a real-time system. Another example is FEC: at 70% packet loss, achieving a 99.9% packet recovery rate with five original packets requires a redundancy rate of about 790%, which consumes a great deal of bandwidth. When weak networks occur, we need to choose different algorithms for different scenarios, and ultimately ensure high quality, high fluency and low latency.
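The two examples above can be checked with a back-of-the-envelope calculation. The sketch below makes several simplifying assumptions that are not stated in the talk: packet losses are independent, each ARQ attempt costs roughly one RTT, the ARQ example is evaluated at the same 70% loss as the FEC example, and FEC uses an ideal erasure code in which any k received packets out of n recover the k originals. The function names are illustrative.

```python
import math


def arq_attempts_needed(loss: float, target: float) -> int:
    """Smallest number of transmissions so that at least one arrives
    with probability >= target, assuming independent losses."""
    return math.ceil(math.log(1 - target) / math.log(loss))


def fec_packets_needed(k: int, loss: float, target: float) -> int:
    """Smallest total packet count n such that at least k of n arrive
    with probability >= target (ideal erasure code, independent losses)."""
    n = k
    while True:
        # Probability that fewer than k packets arrive (block unrecoverable).
        p_fail = sum(
            math.comb(n, i) * (1 - loss) ** i * loss ** (n - i)
            for i in range(k)
        )
        if 1 - p_fail >= target:
            return n
        n += 1


if __name__ == "__main__":
    attempts = arq_attempts_needed(loss=0.7, target=0.999)
    print("ARQ attempts:", attempts, "-> about", attempts * 100, "ms at 100 ms RTT")

    k = 5
    n = fec_packets_needed(k, loss=0.7, target=0.999)
    print("FEC total packets:", n, "redundancy:", f"{(n - k) / k:.0%}")
```

Under these assumptions ARQ needs 20 transmissions, which at 100 ms per round trip is about 2 seconds, and FEC needs 44 packets in total for 5 originals, a redundancy of 780%. That is in the same range as the figure of about 790% quoted above; the exact number depends on the FEC code and on how the recovery rate is defined.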

#5 Launch of the art teaching audio and video solution

Online education is a very important application scenario for real-time audio and video. Combining our technical experience with customer needs, we launched the art teaching audio and video solution to help customers build highly interactive, immersive online art classrooms. With multi-camera capability, teachers and students can see each other’s video and drawing board at the same time; angle correction adjusts the drawing board image to a straight-on view; video annotation lets the teacher point out the key points of a painting at any time; animated courseware makes painting instruction convenient; cloud recording makes it easy to replay and review after class; and Pano Backbone provides nearby access for users worldwide.

Paileyun was founded in 2019, with investment led by Sequoia China Seed Fund. It is the only real-time interactive communication cloud service provider in China with a video conferencing background. The company brings together a large number of senior technical experts in audio, video, whiteboard, networking, AI and other fields. By integrating the Pano SDK, enterprise developers can quickly build scenarios such as interactive small classes, super small classes, dual-teacher large classes, voice chat rooms, video social networking, live streaming, in-game voice, video customer service, telemedicine and office collaboration, with worldwide coverage.