1. What is WebRTC

Short for Web Real-Time Communication, WebRTC is an API that enables web browsers to conduct real-time voice and video conversations. Google open-sourced the project on June 1, 2011, and it entered the World Wide Web Consortium (W3C) standardization track with support from Google, Mozilla, and Opera.

1.1 Composition

  • VideoEngine
  • VoiceEngine (audio engine)
  • Session management
  • iSAC: audio codec
  • VP8: video codec from Google's own WebM project
  • APIs (native C++ API, Web API)

1.2 Important APIs

The WebRTC native APIs follow the WebRTC specification. They can be divided into the Network Stream API, the RTCPeerConnection API, and the Peer-to-peer Data API.

1.2.1 Network Stream API

  • MediaStream: represents a stream of media data.
  • MediaStreamTrack: represents a single media source in the browser.

1.2.2 RTCPeerConnection

  • RTCPeerConnection: an RTCPeerConnection object allows two browsers to communicate with each other directly.
  • RTCIceCandidate: represents an ICE protocol candidate.
  • RTCIceServer: represents an ICE server.

1.2.3 Peer-to-Peer Data API

  • DataChannel: an interface representing a bidirectional data channel between two peers.
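To make the RTCIceCandidate entry concrete: its `candidate` attribute is a string in SDP syntax. A minimal parser sketch, assuming the standard field order from RFC 5245 (the address used below is a documentation example):

```javascript
// Sketch: parse the candidate string carried by an RTCIceCandidate.
// Format (RFC 5245): "candidate:<foundation> <component> <transport>
// <priority> <address> <port> typ <type> ..."
function parseCandidate(line) {
  const parts = line.replace(/^candidate:/, '').split(' ');
  return {
    foundation: parts[0],
    component: Number(parts[1]),           // 1 = RTP, 2 = RTCP
    transport: parts[2].toLowerCase(),     // usually "udp"
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[parts.indexOf('typ') + 1], // host | srflx | prflx | relay
  };
}

// Example: a server-reflexive candidate discovered via STUN.
const c = parseCandidate(
  'candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx');
```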

2. The architecture

2.1 Description of Color Labels

  • Purple: the API layer for Web developers;
  • Solid blue: the API layer for browser vendors;
  • Dotted blue: components whose implementation the browser vendor can override.

2.2 Architecture Components

2.2.1 Your Web App

  • Using the Web API exposed by WebRTC-enabled browsers, Web developers can build real-time audio and video communication applications.

2.2.2 Web API

The WebRTC standard API (JavaScript) for third-party developers makes it easy to build Web applications such as browser-based video chat.

These APIs can be divided into the Network Stream API, the RTCPeerConnection API, and the Peer-to-peer Data API; the individual interfaces are described in section 1.2 above.
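A minimal sketch of how a page might wire these APIs together. The STUN URL and the `signaling` object are hypothetical placeholders, not part of WebRTC itself:

```javascript
// Hypothetical STUN URL; any RTCIceServer entry works here.
function makeConfig() {
  return { iceServers: [{ urls: 'stun:stun.example.org' }] };
}

// `signaling` stands in for an application-defined channel (e.g. WebSocket);
// WebRTC deliberately leaves signaling to the developer.
async function startCall(signaling) {
  const pc = new RTCPeerConnection(makeConfig());

  // Network Stream API: capture a local MediaStream, add its tracks.
  const stream = await navigator.mediaDevices.getUserMedia(
    { audio: true, video: true });
  for (const track of stream.getTracks()) pc.addTrack(track, stream);

  // Peer-to-peer Data API: a bidirectional channel alongside the media.
  const channel = pc.createDataChannel('chat');
  channel.onmessage = (e) => console.log('peer said:', e.data);

  // Trickle ICE: forward each RTCIceCandidate as it is gathered.
  pc.onicecandidate = (e) => { if (e.candidate) signaling.send(e.candidate); };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(offer); // the answer comes back via pc.setRemoteDescription()
  return pc;
}
```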

2.2.3 WebRTC Native C++ API

The native C++ API layer makes it easy for browser vendors to implement the WebRTC standard Web API and abstracts away the underlying digital signal processing.

2.2.4 Transport/Session

  • Transport/session layer
    • The session components are implemented with parts of the libjingle library, without using the XMPP/Jingle protocol.
  • a. RTP stack
    • The Real-time Transport Protocol.
  • b. STUN/ICE
    • The STUN and ICE components allow call connections to be established across different types of networks.
  • c. Session management
    • An abstract session layer that provides session establishment and management. Its protocol is left to the application developer to implement.

2.2.5 VoiceEngine

The voice engine is a framework containing a complete chain of audio processing, covering everything from sound-card capture to the network transport end.

PS: VoiceEngine is one of WebRTC's most valuable technologies; it was open-sourced after Google acquired GIPS, an industry leader in VoIP technology. More on that in future articles.

  • a. iSAC
    • Internet Speech Audio Codec
    • A wideband and super-wideband audio codec for VoIP and audio streaming; the default codec of the WebRTC audio engine
    • Sampling frequency: 16 kHz, 24 kHz, or 32 kHz (default: 16 kHz)
    • Adaptive rate: 10 kbit/s to 52 kbit/s
    • Adaptive packet size: 30–60 ms
    • Algorithmic delay: frame size + 3 ms
  • b. iLBC
    • Internet Low Bitrate Codec
    • A narrowband voice codec for VoIP and audio streaming
    • Sampling frequency: 8 kHz
    • Bit rate: 15.2 kbit/s with 20 ms frames, 13.33 kbit/s with 30 ms frames
    • Standardized in IETF RFC 3951 and RFC 3952
  • c. NetEQ for Voice
    • A speech signal-processing component implemented in the audio software
    • The NetEQ algorithm combines adaptive jitter control with speech packet-loss concealment, allowing it to adapt quickly and precisely to changing network conditions while keeping sound quality high and buffering delay minimal.
    • This GIPS technology is an industry leader at mitigating the effect of network jitter and voice packet loss on voice quality.
    • PS: NetEQ is another very valuable piece of WebRTC. It markedly improves VoIP quality and works best when integrated with the AEC, NR, AGC, and related modules.
  • d. Acoustic Echo Canceler (AEC)
    • A software-based signal-processing component that removes, in real time, the echo picked up by the microphone.
  • e. Noise Reduction (NR)
    • A software-based signal-processing component that suppresses certain kinds of background noise typical of VoIP (hiss, fan noise, etc.).
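As a quick sanity check of the codec figures above, the payload size per frame follows directly from bit rate and frame length; a small sketch (the function name is illustrative, not a WebRTC API):

```javascript
// Payload bytes per frame = bitrate (bit/s) * frame length (ms) / 8000.
function bytesPerFrame(bitrateBps, frameMs) {
  return Math.round((bitrateBps * frameMs) / 8000);
}

// iLBC figures from the list above:
const ilbc20 = bytesPerFrame(15200, 20); // 15.2 kbit/s, 20 ms frames
const ilbc30 = bytesPerFrame(13333, 30); // 13.33 kbit/s, 30 ms frames
// iSAC near the top of its adaptive range:
const isac30 = bytesPerFrame(32000, 30); // 32 kbit/s, 30 ms frames
```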

2.2.6 VideoEngine

  • WebRTC video processing engine
    • VideoEngine is a framework providing a complete video-processing pipeline, from camera capture through network transmission to video display.
  • a. VP8
    • A video codec; the default codec of the WebRTC video engine
    • VP8 is well suited to real-time communication because it was designed primarily for low latency.
    • PS: The VPx codecs were open-sourced after Google acquired On2. VPx is now part of the WebM project, one of the HTML5 standards Google is committed to promoting.
  • b. Video Jitter Buffer
    • The video jitter buffer reduces the impact of network jitter and video packet loss.
  • c. Image enhancements

The image-quality enhancement module processes the images captured by the webcam, including brightness detection, color enhancement, and noise reduction, to improve video quality.
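The Video Jitter Buffer above can be sketched as a reorder buffer keyed by sequence number; a toy version that ignores timestamps and sequence wrap-around:

```javascript
// Toy jitter buffer: holds out-of-order packets and releases them only
// in sequence-number order, smoothing network jitter for the decoder.
class JitterBuffer {
  constructor(firstSeq) {
    this.nextSeq = firstSeq;  // next sequence number owed to the decoder
    this.pending = new Map(); // seq -> payload, waiting for their turn
  }
  push(seq, payload) {
    this.pending.set(seq, payload);
  }
  pop() {
    // Release every payload that is now contiguous with nextSeq.
    const ready = [];
    while (this.pending.has(this.nextSeq)) {
      ready.push(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
    }
    return ready;
  }
}
```

A real jitter buffer (like WebRTC's) additionally adapts its depth to measured jitter and triggers loss concealment when a gap never fills; this sketch shows only the reordering core.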

3. The video

  • The video part of WebRTC covers capture, codecs (I420/VP8), encryption, media files, image processing, display, and network transmission with flow control (RTP/RTCP).

3.1 Video Capture: Video_Capture

  • The source code is in webrtc\modules\video_capture\main, which contains the interface and the per-platform implementations.
  • On Windows, WebRTC uses DirectShow to enumerate video devices and capture video data, which means most video capture devices are supported. It cannot, however, handle capture cards that require a dedicated driver, such as the Hikon HD card.
  • Video capture supports multiple media types, such as I420, YUY2, RGB, and UYVY, and can control frame size and frame rate.
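The capture formats above differ in how many bytes one uncompressed frame takes; a small sketch of the arithmetic (the lookup table is mine, summarizing the standard pixel depths of these formats):

```javascript
// Bits per pixel: I420 is planar 4:2:0 (12 bpp), YUY2/UYVY are packed
// 4:2:2 (16 bpp), RGB24 is 24 bpp.
const BITS_PER_PIXEL = { I420: 12, YUY2: 16, UYVY: 16, RGB24: 24 };

// Bytes needed for one uncompressed frame of the given format and size.
function frameBytes(format, width, height) {
  return (width * height * BITS_PER_PIXEL[format]) / 8;
}
```

This is why capture pipelines usually convert to I420 early: at 640x480 an I420 frame is half the size of the same frame in RGB24.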

3.2 Video codec –video_coding

  • The source code is in the webrtc\modules\video_coding directory.
  • WebRTC uses I420/VP8 codec technology. VP8 was open-sourced after Google acquired On2 and is also used in the WebM project. It provides higher-quality video from less data, which makes it particularly suitable for applications such as video conferencing.

3.3 Video Encryption –video_engine_encryption

  • Video encryption is part of WebRTC's video_engine and sits roughly at the video application layer. It secures the data on both ends of a point-to-point video call and prevents video data from leaking on the Web.
  • Video data is encrypted at the sender and decrypted at the receiver with a key negotiated by the two sides, at some cost in video-processing performance. Encryption can also be skipped entirely, which is better for performance.
  • The input to encryption may be the raw stream or the encoded stream. It is presumably the encoded stream, since that keeps the encryption cost lower, but this needs further study.

3.4 Video media file –media_file

  • The source code is in the webrtc\modules\media_file directory.
  • Its function is to use a local file as the video source, somewhat like a virtual camera; the supported format is AVI.
  • WebRTC can also record audio and video to local files, which is a practical feature.

3.5 Video image processing –video_processing

  • The source code is in the webrtc\modules\video_processing directory.
  • Video image processing operates on every frame, including brightness detection, color enhancement, and noise reduction, to improve video quality.

3.6 Video display –video_render

  • The source code is in the webrtc\modules\video_render directory.
  • On Windows, WebRTC displays video with Direct3D9 and DirectDraw; this is the only supported method.

3.7 Network transmission and flow control

  • For networked video, data transmission and flow control are the core value. WebRTC uses the mature RTP/RTCP technology.
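The RTP packets carrying the media begin with a fixed 12-byte header; a minimal parser sketch (field layout per RFC 3550, function name is mine):

```javascript
// Parse the fixed 12-byte RTP header (RFC 3550).
function parseRtpHeader(buf) {
  return {
    version: buf[0] >> 6,          // always 2
    padding: !!(buf[0] & 0x20),
    marker: !!(buf[1] & 0x80),     // e.g. marks the last packet of a frame
    payloadType: buf[1] & 0x7f,    // codec id; VP8 uses a dynamic PT
    sequence: buf.readUInt16BE(2), // for loss detection and reordering
    timestamp: buf.readUInt32BE(4),
    ssrc: buf.readUInt32BE(8),     // identifies the media stream
  };
}

// Build a sample header to parse: version 2, payload type 96, seq 7.
const pkt = Buffer.alloc(12);
pkt[0] = 0x80;
pkt[1] = 96;
pkt.writeUInt16BE(7, 2);
pkt.writeUInt32BE(90000, 4);
pkt.writeUInt32BE(0x1234, 8);
const header = parseRtpHeader(pkt);
```

RTCP runs alongside RTP on a separate channel, feeding back the loss and jitter statistics that drive flow control.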

4. The audio

The audio part of WebRTC includes devices, codecs (iLBC/iSAC/G.722/PCM16/RED/AVT, plus NetEQ), encryption, sound files, voice processing, sound output, volume control, audio-video synchronization, and network transmission with flow control (RTP/RTCP).

4.1 Audio Device — Audio_device

  • The source code is in webrtc\modules\audio_device\main, which contains the interface and the per-platform implementations.
  • On Windows, WebRTC uses Windows Core Audio and Windows Wave to manage audio devices, and it also provides a mixer manager.
  • Through the audio device module you get sound output, volume control, and related functions.

4.2 Audio codec — Audio_coding

  • The source code is in the webrtc\modules\audio_coding directory.
  • WebRTC uses iLBC/iSAC/G.722/PCM16/RED/AVT codec technologies.
  • WebRTC also provides NetEQ, a jitter-buffer and packet-loss-concealment module that improves sound quality and minimizes latency.
  • Another core feature is mixing for voice conferencing.
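At its core, the conference mixing mentioned above reduces to summing samples with saturation; a toy sketch (the function name is mine, not WebRTC's):

```javascript
// Mix two PCM16 streams by summing samples and clamping to the 16-bit
// range, so loud overlapping speakers saturate instead of wrapping.
function mixPcm16(a, b) {
  const out = new Int16Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = Math.max(-32768, Math.min(32767, a[i] + b[i]));
  }
  return out;
}

const mixed = mixPcm16(new Int16Array([1000, -30000]),
                       new Int16Array([2000, -30000]));
```

A production mixer also applies per-participant gain and avoids mixing a speaker's own audio back to them; clamping is just the minimum needed for correctness.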

4.3 Voice Encryption: voice_engine_encryption

  • Like video, WebRTC offers voice encryption.

4.4 Sound Files

  • This feature lets you use local files in PCM or WAV format as the audio source.
  • WebRTC can also record audio to local files.

4.5 Sound Processing — Audio_processing

  • The source code is in the webrtc\modules\audio_processing directory.
  • Sound processing operates on the audio data and includes acoustic echo cancellation (AEC), AECM (AEC for mobile), automatic gain control (AGC), noise suppression (NS), and voice activity detection (VAD) to improve sound quality.
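To illustrate the VAD component above in its simplest form: flag a frame as speech when its average energy exceeds a threshold. WebRTC's real VAD is far more sophisticated (it models speech and noise statistically), and the threshold below is an arbitrary assumption:

```javascript
// Toy energy-based voice-activity detector over a PCM16 frame.
// threshold is in units of mean squared sample amplitude (assumed value).
function isSpeech(frame, threshold = 1e6) {
  let energy = 0;
  for (const s of frame) energy += s * s;
  return energy / frame.length > threshold;
}

const silence = new Int16Array(160);          // one 10 ms frame at 16 kHz
const speech = new Int16Array(160).fill(5000); // loud constant signal
```

Downstream modules use the VAD decision to, for example, skip encoding silent frames or freeze AGC adaptation during pauses.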

4.6 Network Transmission and flow control

Like video, WebRTC uses mature RTP/RTCP technology.

5. Several streaming video formats

  • HTTP-FLV: HTTP transport; FLV packaging; low latency; continuous stream; playable in HTML5 after demuxing (flv.js).
  • RTMP: TCP transport; FLV tag packaging; low latency; continuous stream; not supported by HTML5.
  • HLS: HTTP transport; M3U8 playlist with TS segment files; high latency; sliced files; playable in HTML5 after demuxing (hls.js).
  • DASH: HTTP transport; MP4/3GP/WebM packaging; high latency; sliced files; plays directly in HTML5 if the DASH file list contains MP4/WebM files.
  • WebSocket-FLV: WebSocket transport; FLV packaging; low latency; continuous stream; playable in HTML5 after demuxing (flv.js).
  • WebRTC: WebSocket transport; SDP session description; low latency.
  • RTSP: HTTP transport.

6.