On January 13, 2018, Gao Zehua, the "music craftsman" of Agora.io, gave the talk "WebRTC Architecture Optimization and Practice" at "The Way of Architect Training — Aurora Developer Salon (JIGUANG MEETUP)". As the exclusive video partner, IT Dakashuo (WeChat ID: itdakashuo) was authorized by the host and the speaker to release the video.

Word count: 2500 | Reading time: 5 minutes

Access to guest speech videos and PPT:
suo.im/597g90


Abstract

WebRTC is now supported by almost all major browsers, including Firefox, Chrome, Opera, and Edge. WebRTC 1.0 standardization is also at a very advanced stage, and more and more companies are using WebRTC and adding it to their applications. So what should enterprises look out for when building WebRTC applications? This talk covers the typical architecture of standard WebRTC systems, common WebRTC pitfalls and optimization practices, browser adaptation, platform compatibility practices, and more.

What is WebRTC?

WebRTC is first and foremost a standard: the W3C WebRTC 1.0 standard is now available, and version 2.0 is in the works (as of the time of the talk). Second, it is a protocol that developers can follow to communicate with WebRTC. It is also an engine, since its predecessor was the GIPS audio and video engine. And because a large number of audio and video algorithms live under that engine, WebRTC can also be said to be a collection of algorithms.

WebRTC’s past lives

As mentioned above, GIPS is the predecessor of WebRTC, and GIPS's development actually falls into two stages: first Global IP Sound, then Global IP Solution. In the Global IP Sound stage it served as an audio communication engine in various embedded devices, mainly responsible for echo cancellation, noise reduction, codecs, and other basic functions.

After that, video joined the audio service, which is the Global IP Solution stage. Later, in working with customers, they kept adding IP communication and RTP protocol support to gain the ability to connect over the network. However, due to poor commercial decisions and a failed sales strategy, the company was eventually acquired by Google for more than 60 million dollars.

In terms of technical evolution, GIPS to WebRTC was really an encapsulation of the engine layer, which along the way added the network module GIPS had been missing. Normally, providing audio and video services requires a server; to avoid that model, WebRTC adopts P2P communication.

How to use WebRTC

In my view, more than 99% of real-time-communication apps are increasingly inseparable from WebRTC. Even when an application's code framework differs, WebRTC still has many classic algorithms worth learning from.

WebRTC can serve as a starting point for learning and porting algorithms, as an audio and video engine once abstracted, or as the basis of a PaaS or SaaS service.

Although WebRTC has many different applications, different goals lead to different emphasis on WebRTC's own capabilities. You need to read the corresponding code, find the deficient parts, and improve on them. For example, to build SaaS or PaaS services, be prepared for server-side work.

Use WebRTC to provide SaaS

WebRTC can be used to build a SaaS service, but first you need to decide what kind of SaaS you want. For example, audio and video communication services that are not Web-based must consider how the clients will communicate with each other.

Different industries need different services; call centers and education, for example, use the Web directly. But WebRTC does not behave consistently across browsers, and video codec support varies among them.

When using P2P, apply it flexibly according to service-quality requirements and user characteristics. For example, P2P may be the better choice when most users are on WiFi, while on 4G networks P2P performs worse.

If users have high service-quality requirements, you are advised to set up servers and deploy O&M monitoring and load balancing, so that back-end problems stay invisible to users.

For purely Web-based SaaS services these measures are still not enough, and many details need to be worked out. For example, a user's audio device, or a browser's audio stack on a particular machine, may be poorly adapted, resulting in problems such as echo.

Since the bottom layer is not built by us, we can only remind users to test the service before using it, or offer other suggestions.

Use WebRTC to provide PaaS

PaaS provides a platform for vendors to build on. Unlike SaaS, it does not go deep into the upper-layer application; its main goal is to provide more stable and efficient communication services.

In a SaaS service we can analyze user behavior and optimize for the usage scenarios we see. In a PaaS service, however, customers differ so much (they may come from IoT, education, games, and other fields) that the original WebRTC engine is definitely not enough.

Therefore, in addition to the various basic capabilities a SaaS needs, we also have to abstract a set of APIs, adapt to each mobile device, and provide value-added functions according to the application scenario, including scenario-specific optimization and package tailoring.

How to use WebRTC to develop an AV engine

Since WebRTC is essentially an audio and video engine, it is very convenient to use as an AV SDK.

Of course, the premise is that the user has enough experience to tune the parameters for the specific scenario.

The audio part of WebRTC is encapsulated in four modules: ANM (network module), APM (processing module), ACM (codec module), and ADM (device module). Video has four corresponding modules, eight in total. In my opinion, the heart of WebRTC is these eight modules and the configuration of parameters between them.

However, this also brings a corresponding disadvantage: we still have to do functional optimization, performance optimization, and various special optimizations on mobile devices ourselves, which makes debugging ever heavier.

How to learn WebRTC algorithm

Only after learning WebRTC's algorithms can we explain to customers, at different levels, why we adopted the current scheme rather than another. The essence of WebRTC is the underlying engine, and to understand that part you need to study its algorithms.

As mentioned above, WebRTC has 8 modules and 2 engines; the audio modules include APM, ACM, and ADM, and the video modules include VNM, VPM, and VCM.

APM

APM covers AGC (automatic gain control), ANS (noise suppression), DE (delay estimation), and the NLP (nonlinear processing) part of echo cancellation. Let's look at what's inside, step by step.

There are two common classes of echo cancellation algorithm: adaptive filtering and NLP. Adaptive filtering was already done well enough before WebRTC; research there has basically stagnated, though multi-channel and stereo echo cancellation may still hold some research value. NLP is different, with many more details left to explore, because the acoustic design of every device cavity differs, and so does the nonlinear part.
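
To make the adaptive-filtering half concrete, here is a minimal NLMS (normalized least mean squares) sketch. This is a textbook illustration of the idea, not WebRTC's actual AEC code, and the simulated echo path (2-sample delay, 0.6x gain) is invented for the demo.

```python
# Minimal NLMS adaptive filter: it learns an estimate of the echo path so
# the predicted echo can be subtracted from the microphone signal.
# A textbook sketch, not WebRTC's actual AEC implementation.

def nlms_cancel(far, mic, taps=8, mu=0.5, eps=1e-8):
    """far: far-end (loudspeaker) samples; mic: microphone samples.
    Returns the residual signal after echo subtraction."""
    w = [0.0] * taps                 # filter weights = echo-path estimate
    buf = [0.0] * taps               # recent far-end samples, newest first
    residual = []
    for n in range(len(mic)):
        buf = [far[n]] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # predicted echo
        e = mic[n] - y                               # what is left over
        norm = sum(xi * xi for xi in buf) + eps      # input power (for step normalization)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        residual.append(e)
    return residual

# Simulated echo path: the mic hears the far-end signal 2 samples late, at 0.6x.
far = [1.0 if i % 5 == 0 else 0.0 for i in range(400)]
mic = [0.6 * far[i - 2] if i >= 2 else 0.0 for i in range(400)]
residual = nlms_cancel(far, mic)
print(max(abs(e) for e in residual[200:]) < 1e-3)  # echo is gone → True
```

Once the filter converges, the residual carries only the near-end speech; it is this residual that the NLP stage then works on.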

There are two echo cancellation modules in WebRTC, AEC and AECM (the mobile variant). AEC's NLP algorithm takes many details into account; if you want to study it, its patent descriptions are well worth reading.

DE is the delay estimation module, and by now almost all delay estimation uses this set of algorithms. The main task of AGC is to amplify the non-noise part and attenuate the noise part. The key lies in how to distinguish the noise, which amounts to a noise reduction problem.
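
In its simplest form, delay estimation picks the lag that maximizes the cross-correlation between the far-end reference and the observed signal. WebRTC's DE module uses a more robust binary-spectrum matching scheme; the sketch below (with an invented 13-sample delay) only shows the underlying idea.

```python
# Conceptual delay estimation: choose the lag that maximizes the
# cross-correlation between the reference (far-end) signal and the
# observed (microphone) signal. WebRTC's actual DE module matches
# binary spectra instead; this is only the basic concept.
import random

def estimate_delay(ref, observed, max_lag):
    best_lag, best_score = 0, float("-inf")
    n = len(ref) - max_lag                 # correlation window length
    for lag in range(max_lag + 1):
        score = sum(ref[i] * observed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

random.seed(42)
ref = [random.uniform(-1.0, 1.0) for _ in range(200)]
observed = ([0.0] * 13 + ref)[:len(ref)]   # same signal, 13 samples late
print(estimate_delay(ref, observed, 30))   # recovers the 13-sample lag
```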

AGC in WebRTC is bundled with VAD. The VAD uses a GMM (Gaussian mixture model) to judge statistically whether the current frame is voice, and the result feeds into AGC. Although the AGC parameters still need tuning, the algorithm itself is good and can be used directly.
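
The way VAD gates AGC can be sketched with a deliberately simplified stand-in: WebRTC's real VAD is a GMM over spectral features, not a plain energy threshold, and its AGC limits how fast gain changes; the frame values and thresholds below are invented for the demo.

```python
# Sketch of VAD-gated AGC: frames classified as voice are scaled toward a
# target level, noise-only frames are left untouched. A conceptual
# stand-in only; WebRTC's VAD/AGC are far more sophisticated.

def vad_agc(frames, vad_threshold=0.01, target_rms=0.5):
    out = []
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy > vad_threshold:               # crude "is this voice?" decision
            gain = target_rms / energy ** 0.5    # AGC: move frame toward target RMS
            out.append([s * gain for s in frame])
        else:
            out.append(list(frame))              # likely noise: do not amplify it
    return out

quiet_voice = [0.2, -0.2, 0.2, -0.2]     # low-level speech frame
noise = [0.001, -0.001, 0.001, -0.001]   # near-silent background frame
processed = vad_agc([quiet_voice, noise])
# The voice frame is boosted to roughly 0.5 RMS; the noise frame is unchanged.
```

The point of the gating is exactly what the talk describes: without the VAD decision, AGC would happily amplify background noise along with speech.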

ACM

The audio codecs in WebRTC include iLBC, iSAC, and Opus. iLBC is a narrowband codec, iSAC is a wideband codec, and Opus is a unified full-band speech and audio codec. Opus performs very well when CPU power and bandwidth are plentiful.

ANM

ANM does bandwidth estimation and congestion control. Since bandwidth is plentiful nowadays, few people do bandwidth estimation for audio, while it is still common for video. Interestingly, the audio bandwidth estimate is written into the iSAC codec itself.

Packet loss can be classified into random loss and bursty loss; congestion produces bursty loss. For example, continuous packet loss, or the inability to send packets at all, can be regarded as congestion.
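
The random-versus-bursty distinction can be illustrated by examining gaps in the received sequence numbers. This is a conceptual sketch, not WebRTC's congestion controller, and the burst-length threshold is an invented parameter.

```python
# Illustrative classifier for the loss patterns described above: isolated
# gaps in the received sequence numbers count as random loss, while runs
# of consecutive missing packets are treated as bursty (congestion) loss.

def classify_losses(received_seq, burst_len=3):
    """received_seq: sorted RTP-style sequence numbers that arrived."""
    random_loss, bursty_loss = 0, 0
    for prev, cur in zip(received_seq, received_seq[1:]):
        gap = cur - prev - 1          # packets missing between two arrivals
        if gap == 0:
            continue
        if gap >= burst_len:
            bursty_loss += gap        # long run of misses: likely congestion
        else:
            random_loss += gap        # short gap: likely random loss
    return random_loss, bursty_loss

# Packets 3 and 10-13 never arrived: one random loss, one 4-packet burst.
print(classify_losses([1, 2, 4, 5, 6, 7, 8, 9, 14, 15]))  # → (1, 4)
```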

The PLC in ANM is a time-stretching (fast-play / slow-play) algorithm. For example, to recover 200 milliseconds of audio from a 100-millisecond packet, PLC slows the packet down so that it covers 200 milliseconds, and in this way copes with network packet loss. The advantage of PLC lies in this variable-speed playback.
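
The stretch idea can be shown with naive linear-interpolation resampling. Note this is only the concept of "playing fewer samples over a longer interval": WebRTC's real PLC (in NetEQ) uses pitch-preserving WSOLA-style stretching, which sounds far better than plain resampling.

```python
# Naive time-stretch sketch: expand N input samples to cover a longer
# interval by linear interpolation. Illustrates the "slow down 100 ms of
# data to fill 200 ms" idea from the talk, not WebRTC's actual algorithm.

def stretch(samples, factor):
    out_len = int(len(samples) * factor)
    out = []
    for j in range(out_len):
        pos = j / factor                 # fractional position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

packet_100ms = [float(i) for i in range(800)]   # 100 ms of audio at 8 kHz
stretched = stretch(packet_100ms, 2.0)          # now covers 200 ms
print(len(stretched))                            # → 1600
```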

Video

Compared with the audio modules, the video part leaves a great deal of room to explore, which makes differentiation easier.

Video is more demanding on the network, and live streaming and communication are the common scenarios: streaming emphasizes clarity, while communication emphasizes fluency. The different emphases bring more parameters to tune, for example trading off loss resilience against codec choice, codec choice against noise reduction, as well as hardware adaptation.