This article is compiled from Chen Gong’s talk at the RTC conference, “Web Real-Time Audio and Video Service Architecture and Practice”. You are welcome to join the RTC developer community to exchange experience with other developers and take part in more technical events.

Chen Gong is responsible for the technical architecture of audio and video communication on the web. He holds a Ph.D. from the University of Science and Technology of China. He was formerly a multimedia architect in Intel’s Server Division, where he led the construction of WebRTC-based video conferencing solutions, and earlier worked in Marvell’s Video Business Division on multimedia system frameworks, participating in Google TV, OTT, and other projects.

What are the features of real-time communication on the web?

First, on the browser side, it relies on the browser’s ability to capture audio and video, together with powerful in-page rendering, which lays the foundation for a high-definition communication experience. Compared with mobile, the screen is also larger and window selection more flexible.

Second, it is cross-platform. We all know how particular browsers can be across terminals: they are available not only on PCs and mobile devices, but are also embedded in some well-known social apps. This calls for a cross-platform experience, and more and more browsers now support WebRTC, which is a defining feature of real-time web communication.

Third, no installation and convenient access. Thanks to the popularity of WebRTC, real-time communication on the web no longer requires installing any plug-ins.

In what scenarios can web real-time communication be applied?

First is live streaming, which is very popular. Hosts often want to broadcast from the web, because the screen is larger, the video is clearer, and the processing capacity is stronger. Interestingly, one of our clients uses our web SDK to monitor live broadcasts: as everyone knows, a live platform runs a great many rooms, and 40-50 of them can be monitored on a single web page. The web real-time audio and video SDK makes a convenient live-broadcast inspector.

Another is online education. The teacher’s end is generally on a PC, and if an application has to be installed, teachers who are not familiar with computers find the configuration troublesome. If there is an installation-free web scheme, they will use it, and the experience is better. Besides audio and video, online education also needs screen sharing and a whiteboard; since both are H5 technologies, integrating them with a web SDK is very convenient.

The last is video conferencing. If you have ever joined one through a browser at work, you know the experience: HR sends you a link, you click it at the appointed time, and then come instructions about what you need to install, which makes things complicated. With installation-free WebRTC, the video conferencing experience takes a step up. These are the typical scenarios, but there are also telemedicine, enterprise collaboration, and so on.

From a technical point of view, is real-time communication on the web mature? Is it ready?

When it comes to real-time communication in the browser, Webex is the first thing that comes to mind. Its experience is very good, and it cultivated a large group of users, showing them that video communication could be conducted in the browser and opening up a market. Its disadvantage, shared with GoToMeeting, is that browser plug-ins and extensions must be installed, which is very inconvenient. The alternative is to install an application on the PC that captures and processes the audio and video streams, with the web side used only for rendering and presentation.

For a long time, web users felt that this was simply how the technology worked, that this was the experience, and that it could not be improved. It was not until 2011 that WebRTC came along, promoted by Google. WebRTC’s experience is notable for being install-free. Now, after almost six years of development, doubts remain: will Google abandon the project, and will the major browser vendors resist opening up the browser ecosystem?

Is WebRTC mature and ready for productization?

First, take a look at the current state of WebRTC support across browsers and platforms.

WebRTC was first supported by Firefox and Chrome, followed by Opera, which shares the same kernel. In the last two years Microsoft proposed ORTC, a protocol somewhat similar to WebRTC; ORTC did not go well, and WebRTC is now supported in Edge. Following Apple’s recent iOS update, WebRTC is available in Safari 11. On Android, WebRTC has been supported for a long time.

Looking at the market share of these browsers, it is not hard to see that apart from IE, which accounts for 8% and does not support WebRTC, all the others do.

Now look at it from the protocol stack point of view. The WebRTC 1.0 spec has been finalized except for some details, and standards support keeps improving across browsers. Although Google is the main promoter of the technology, its own browser, Chrome, does not support WebRTC 1.0 especially well, because Google has done a lot of experimental work on WebRTC internally. The Chrome team says Chrome will be fully compliant with WebRTC 1.0 by the end of this year.

On the other hand, look at the open source community. Kurento, for example, is a powerful multimedia processing framework that supports the WebRTC protocol stack; it can serve as a media server with background transcoding and OpenCV processing capabilities. Licode is a lightweight WebRTC communication platform whose server mode is pure forwarding. Janus is a relatively lightweight WebRTC communication gateway. Also of interest is react-native-webrtc: more and more developers are moving to front-end JavaScript, especially abroad, and this open source project wraps a WebRTC module so that developers can easily build real-time, WebRTC-compatible applications on mobile phones.

Finally, look at the ecosystem. At the CPaaS level, companies like Agora, Twilio, and TokBox are contributing.

Market analysts are very optimistic about WebRTC, and the market size is expected to reach $6.49 billion by 2022. Overall, WebRTC technology has never been in better shape.

For developers, how do you use this technology and build a WebRTC system? There is a well-trodden path.

We know that WebRTC is, at bottom, based on P2P connections: point-to-point communication. Many developers start by testing P2P quality. Suppose a product manager tells a developer, “WebRTC is so hot right now; build me a WebRTC system.” Eighty percent of developers will deliver a mesh WebRTC architecture like this.

So what characterizes an architecture built entirely on point-to-point communication? Delay is smaller. But there is one big drawback: with point-to-point transmission, every user must send their own video stream to every other user, which puts great pressure on uplink bandwidth. In addition, each stream is captured and encoded independently, which is a serious test of the browser’s encoding capacity. One might ask: can we capture and encode once, then send that one stream to the different receivers? Unfortunately, no. The WebRTC protocol performs end-to-end quality policy optimization, and its policy adjustments are all realized through end-to-end RTCP, so the source must encode separately for, and transmit separately to, each receiver.
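The uplink pressure of a full mesh can be made concrete with a little arithmetic. The helpers below are purely illustrative (they are not part of WebRTC or any SDK), but they show why mesh stops scaling quickly:

```javascript
// In a full-mesh call with n participants, every peer encodes and
// uploads a separate stream to each of the other n - 1 peers.
function meshStreamsPerPeer(n) {
  return n - 1;
}

// Total streams crossing the network: n peers, each uploading n - 1 streams.
function meshTotalStreams(n) {
  return n * (n - 1);
}

// Uplink bandwidth one peer needs, given a per-stream bitrate in kbps.
function meshUplinkKbps(n, perStreamKbps) {
  return meshStreamsPerPeer(n) * perStreamKbps;
}

// A 6-person mesh at 1000 kbps per stream demands 5 Mbps of uplink
// from every participant, already beyond many home connections.
console.log(meshUplinkKbps(6, 1000)); // 5000
```

The quadratic growth of `meshTotalStreams` is the reason the survey cited below finds so few pure-P2P deployments.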

The table in the lower right lists systems currently using WebRTC on the Internet, according to a respected industry body. Only 19% are pure P2P.

A developer who delivers the architecture just described will get the product sent back, because, as everyone knows, uplink bandwidth is precious and limited. To solve this problem, developers dig deeper and discover that real-world WebRTC architectures actually have a node in the middle: the server. The typical media server only forwards multiple streams and does no background media processing or transcoding.

Both Janus and Licode, the open source projects mentioned above, can act as forwarding media servers. The feature of this design is that each user uploads only one stream to the intermediate server, saving bandwidth. Since the SFU only forwards, its impact on delay is also relatively small. The disadvantage shows up when two receivers have different downlink capacities, say one at 500 Kbps and the other at 2 Mbps: because the source sends only one stream, it is hard to adapt to receivers of differing capability.
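The bandwidth trade-off between mesh and SFU can be summarized in a small sketch (an illustrative helper, not any SDK’s API): the SFU collapses every peer’s uplink to a single stream while the downlink count stays the same.

```javascript
// Per-peer link counts for the two topologies discussed so far.
// In mesh, a peer both uploads to and downloads from every other peer.
// With an SFU, a peer uploads once; the server fans the stream out.
function linkCounts(n, topology) {
  if (topology === "mesh") {
    return { uplinkPerPeer: n - 1, downlinkPerPeer: n - 1 };
  }
  if (topology === "sfu") {
    return { uplinkPerPeer: 1, downlinkPerPeer: n - 1 };
  }
  throw new Error("unknown topology: " + topology);
}

console.log(linkCounts(6, "mesh")); // 5 up, 5 down per peer
console.log(linkCounts(6, "sfu"));  // 1 up, 5 down per peer
```

The savings are entirely on the uplink, which is exactly the scarce resource the previous paragraph identified.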

A pure forwarding media server still has drawbacks, so developers then try adding mixed streaming at the middle node. In this architecture, after the server receives the different video streams it decodes, composites, and re-encodes them, then sends video streams with different profiles to different receivers according to their bandwidth. This solves the problem that bandwidth adaptation cannot be achieved with multiple receivers. The disadvantage is also obvious: transcoding in the middle inevitably adds delay, and transcoding servers are expensive, but downstream bandwidth is indeed saved.
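The per-receiver adaptation an MCU enables can be sketched as a simple profile lookup. The profile names and thresholds below are illustrative assumptions, not values from any real server:

```javascript
// After mixing, the MCU re-encodes one output per bandwidth tier and
// serves each receiver the best profile its downlink can carry.
// Thresholds here are hypothetical, ordered from richest to leanest.
const PROFILES = [
  { name: "1080p", minKbps: 2000 },
  { name: "720p",  minKbps: 1000 },
  { name: "360p",  minKbps: 500 },
  { name: "180p",  minKbps: 0 },
];

function pickProfile(downlinkKbps) {
  // First profile whose floor the receiver's estimate clears.
  return PROFILES.find((p) => downlinkKbps >= p.minKbps).name;
}

console.log(pickProfile(2500)); // "1080p"
console.log(pickProfile(500));  // "360p"
```

Each extra tier is another encode on the server, which is where the transcoding cost mentioned above comes from.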

With the several typical WebRTC system architectures introduced, developers can easily build a demo system on the open source projects above without spending much time. Is the story over here? Not even close. There are more hidden pitfalls.

The technical difficulties behind it

The figure above shows what separates a demo from a relatively stable product.

The first is availability. WebRTC is based on P2P, and P2P availability and connection success rates have long been criticized; tunneling and NAT traversal techniques alone do not solve the problem.

Platform interworking: WebRTC is still young, and companies in many fields have developed their own communication protocols, generally used on native clients: mobile, PC, Mac, and Windows. Interworking with them brings problems, and there are many pitfalls. WebRTC uses an end-to-end optimized transport strategy; if you break that link apart, so that the uplink and sender are WebRTC but the receiver is your own client, there is a great deal of policy matching to do.

Encoder selection: audio and video coding is central to real-time communication. For video, WebRTC supports VP8/VP9 and H.264. One might choose H.264 because it adapts well on mobile, but H.264 support in Chrome is not very mature, so many commercial products still use VP8. When mobile interoperability matters, the choice of encoder has to be weighed carefully.
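The trade-off described above can be expressed as a small negotiation sketch. This is a hypothetical helper (real codec negotiation happens via SDP offer/answer), but it captures the decision logic:

```javascript
// Pick a video codec both sides support. When mobile interop matters,
// prefer H.264 (widely hardware-accelerated on phones); otherwise fall
// back to VP8, which is the safer choice in Chrome per the text above.
function chooseCodec(localCodecs, remoteCodecs, preferH264ForMobile) {
  const shared = localCodecs.filter((c) => remoteCodecs.includes(c));
  if (preferH264ForMobile && shared.includes("H264")) return "H264";
  if (shared.includes("VP8")) return "VP8";
  if (shared.length > 0) return shared[0];
  return null; // no common codec: video cannot be negotiated
}

console.log(chooseCodec(["VP8", "H264"], ["H264"], true)); // "H264"
```

In a real system the same decision also has to weigh hardware vs. software codec availability on each end, as the SDK section later notes.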

Weak-network resistance: WebRTC has its own transmission policies, such as packet-loss retransmission, FEC, bandwidth estimation, and dynamic bit-rate adjustment. But as the earlier architecture diagrams showed, once a forwarding node sits in the middle, the end-to-end transmission link no longer holds, so resisting weak networks becomes a headache. Optimizing the uplink and downlink separately, on the browser and on the forwarding server, involves many difficulties.
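To see why FEC helps on a weak network where retransmission costs a round trip, here is the idea in its simplest form: one XOR parity packet protecting a group of equal-length media packets. Real WebRTC FEC (e.g. FlexFEC) is far more elaborate; this is only a sketch of the core mechanism.

```javascript
// Build one parity packet as the XOR of a group of equal-length packets.
function makeParity(packets) {
  const len = Math.max(...packets.map((p) => p.length));
  const parity = new Uint8Array(len);
  for (const p of packets) {
    for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
  }
  return parity;
}

// If exactly one packet of the group is lost, XOR-ing the parity with
// every packet that did arrive reconstructs the missing one, with no
// retransmission round trip.
function recover(received, parity) {
  const missing = Uint8Array.from(parity);
  for (const p of received) {
    for (let i = 0; i < p.length; i++) missing[i] ^= p[i];
  }
  return missing;
}
```

The cost is redundancy bandwidth (one extra packet per group here), which is exactly the knob a forwarding server can tune differently for the uplink and each downlink.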

Multi-user scenarios: take a typical small live-streaming scene with many receivers and one sender. With a pure WebRTC transmission strategy, the estimated downlink bandwidth differs across receivers, so their demands on the sender and on the transmitted bit rate differ too. Anyone with testing experience will find that when WebRTC serves a multi-person scene with more than 4 or 5 receivers, the sent bit rate becomes very low and the picture turns blurry.
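The blurry-picture effect follows directly from the congestion-control logic: end to end, the sender can only go as fast as its slowest receiver. A sketch of the contrast with a server in the middle (illustrative helpers, not a real API):

```javascript
// End-to-end: the send bitrate converges to the weakest receiver's
// bandwidth estimate, so one bad link degrades everyone's picture.
function sendBitrateEndToEnd(receiverEstimatesKbps) {
  return Math.min(...receiverEstimatesKbps);
}

// Via a forwarding server: the sender uploads at its own capacity and
// each receiver is served at its own estimate (capped by the source).
function sendBitrateViaServer(senderUplinkKbps, receiverEstimatesKbps) {
  return {
    senderKbps: senderUplinkKbps,
    perReceiverKbps: receiverEstimatesKbps.map((r) =>
      Math.min(r, senderUplinkKbps)
    ),
  };
}

// One 500 kbps receiver drags everyone down in the end-to-end case.
console.log(sendBitrateEndToEnd([2000, 500, 1500])); // 500
```

Decoupling the links this way is precisely the gateway strategy described in the SDK section later.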

Although the major browsers have begun to support WebRTC, this so-called support still carries many compatibility issues. Yellow in the figure above marks partial compatibility; only Firefox’s support is relatively good, and Safari is currently a hot topic. All these technical difficulties must be faced when turning a WebRTC demo into a product.

Intelligent routing: transmission between the browser and the server can be optimized, but there is still traffic between the distributed servers themselves. A vendor providing WebRTC services needs very good intelligent routing and optimization of server-to-server transmission, for example across operators and across borders.

High-availability O&M: without going into detail, the service must stay available and the service processes must not go down.

Massive concurrency: WebRTC vendors typically provide on-demand scaling, but the architecture must be designed to support scaling to massive concurrency.

Global monitoring: a system that monitors the quality and stability of the service as it runs online.

Investigation tools: another very important point. When you provide WebRTC real-time communication, you must be able to investigate problems. For example, when communication is delayed, what is the cause, and which factors degrade the experience? This requires a very complete set of troubleshooting tools.

Having said all that, two points can be summed up. Turning a WebRTC demo into a truly mature product service first requires a professional team, one that covers audio and video experts as well as communication and transmission experts, which demands strong industry experience. High-availability operation and maintenance, real-time monitoring, and investigation tools all require the accumulated experience of real engineers and experts.

Finally, an introduction to the Agora Web SDK, again from several aspects: core quality, ease of integration, flexible extension of functions, service stability, and global monitoring capability.

  • Core quality: Agora runs a large-scale network, SD-RTN™, which guarantees global transmission. The Web SDK uses a distributed gateway architecture deployed at the edge of this network to improve availability.

  • Focus on communication: thanks to Agora’s distributed gateway architecture, we can receive adaptation information from different clients and flexibly adjust policies per platform. As a result, we can monitor each browser relatively well.

  • Flexible transport strategy: we transform the end-to-end WebRTC transport link into end-to-gateway and gateway-to-end segments, with plenty of FEC and policy optimization configured in between; we can also send multiple streams. Differentiated encoder selection: we choose different encoders according to each terminal’s capabilities, taking into account whether the end supports hardware or only software codecs. Together, these points let the Agora Web SDK deliver a high-quality service.

  • Fast integration: only four lines of code are needed to access our video communication channel, and a small audio/video chat program can generally be completed in four or five minutes. We also provide thorough documentation and a simple, readable demo. Flexible extension of functions: traditional WebRTC targets communication scenarios, while our Web SDK currently also supports live-streaming scenarios, bypass stream pushing, and server-side recording.

  • Global monitoring capability: we currently provide a Dashboard and service quality reports showing transmission quality, transmission effect, user access, and other per-channel data. We also track global network indicators, including packet loss, delay, and jitter. On the right is part of the problem diagnosis system. Why does it matter? When users of real-time communication experience delay or jitter, we can tell where the problem lies; combined with this information, we know very clearly what is wrong with a particular channel’s transmission path online and where it can be tuned.
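Of the network indicators mentioned above, jitter is the least obvious to compute. A common definition (the one in RFC 3550 for RTP) is a smoothed running average of how much packet spacing at the receiver deviates from the sender’s pacing. A sketch over hypothetical (send, recv) timestamp pairs in milliseconds:

```javascript
// RFC 3550 interarrival jitter: for each consecutive packet pair,
// take the change in transit time and fold 1/16 of it into a running
// average. Constant network delay yields zero jitter.
function interarrivalJitter(packets) {
  let jitter = 0;
  for (let i = 1; i < packets.length; i++) {
    const prevTransit = packets[i - 1].recv - packets[i - 1].send;
    const transit = packets[i].recv - packets[i].send;
    const d = Math.abs(transit - prevTransit);
    jitter += (d - jitter) / 16; // smoothing factor from RFC 3550
  }
  return jitter;
}

// Perfectly even delivery: every packet takes 10 ms, so jitter is 0.
console.log(interarrivalJitter([
  { send: 0, recv: 10 },
  { send: 20, recv: 30 },
  { send: 40, recv: 50 },
])); // 0
```

A monitoring dashboard would aggregate this per channel and per region to flag the degraded links the text describes.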

Course preview: “Building Web Real-Time Video Call from 0”

  • Time: 16:00-17:00, November 29 (Wednesday)

  • Content:

    • Necessary background knowledge of communication and live broadcasting
    • Agora SDK API call logic
    • The development environment
    • Demo source code interpretation. Demo functions include creating a channel, joining a channel, leaving a channel, enabling or disabling the camera, muting, and testing the channel.
    • How to test. What tests should be conducted for video call and live broadcast, and how can simple tests be run?
  • How to participate

    This course is live online.

    • First, visit the link below or click “read the original text” to register, then wait for the course to begin.
    • To enter the course, visit the link below at 3:45 pm on November 29.
  • Sign up link

www.itdks.com/liveevent/d…