Daily sentence

Life is full of challenges everywhere. Face them with confidence, travel light, and tomorrow's battlefield will always be full of hope.

Previously

After introducing the concept and development history of WebRTC in the previous article, we now begin a preliminary exploration of its functions and principles. 🌺 [WebRTC principle exploration] The future can be expected: the birth and development of WebRTC

Technology review

WebRTC concept definition

WebRTC, short for Web Real-Time Communication, is a technology that enables Web browsers to conduct real-time voice or video conversations. It originated from technology that Google acquired in 2010 when it bought Global IP Solutions for $68.2 million.

WebRTC functional category

  • WebRTC is an open source project designed to enable browsers to provide a simple JavaScript interface for real-time communication (RTC).

  • WebRTC can transmit not only video but also other data such as text and pictures. It is important to note that WebRTC is not limited to browsers: browsers implement WebRTC’s native interfaces according to the standard WebRTC protocols, and WebRTC is also supported on Android and iOS.

WebRTC applications include the following four main concepts

  • Signaling servers

  • ICE servers

  • Media servers

  • JavaScript API

WebRTC has been incorporated into the HTML5 standard

Currently, browsers that support WebRTC include Chrome, Firefox, Opera, and Microsoft Edge

  • WebRTC does not specify specific signaling protocols, which are left to the application implementation.

  • WebRTC uses the JSEP protocol to establish sessions

  • WebRTC uses ICE to implement NAT traversal.

  • WebRTC clients can transfer media point-to-point.

Core components of WebRTC

  • Audio and video engines: Opus, VP8/VP9, H.264

  • Transport layer protocol: The underlying transport protocol is UDP

  • Media protocol: SRTP/SRTCP

  • Data protocol: DTLS/SCTP

  • P2P NAT traversal: STUN/TURN/ICE/Trickle ICE

  • Signaling and SDP negotiation: HTTP/WebSocket/SIP, Offer Answer model

The following diagram shows a simplified internal structure of WebRTC:

  • At the bottom is the hardware.

  • Above are the audio capture module and video capture module.

  • The middle part is the audio and video engine:

    • The audio engine is responsible for audio capture and transmission, with functions such as noise reduction and echo cancellation.

    • The video engine is responsible for network jitter optimization and for codec optimization for transmission over the Internet.

On top of the audio and video engines is a set of C++ APIs, and on top of the C++ APIs is the JavaScript API exposed to the browser.

JSEP

  • JSEP (JavaScript Session Establishment Protocol) is a signaling API that allows developers to build more powerful applications and increase flexibility in signaling Protocol selection.

  • What JSEP does is, on the one hand, provide interfaces such as createOffer() that web applications can call to generate SDP, and on the other hand provide ICE-related interfaces. These functions are implemented by the browser; the application typically uses WebSocket to transport the WebRTC signaling (Offer/Answer), as sketched after this list.

    • The key to establishing a session is media negotiation. Although WebRTC does not specify a specific signaling protocol, media negotiation uses SDP protocol.

    • Even if the Web application uses JSEP without any additional signaling protocol, it is still possible to establish a link between two WebRTC clients (for example, two logins of the same WebRTC client), as long as the application can parse the Offer/Answer messages delivered over WebSocket and extract the SDP and ICE information from them.

  • The codelab demo on GitHub uses JSEP directly to generate Offer/Answer signaling without any other signaling protocol, and then transmits it over the WebSocket protocol.

  • JSEP is not a signaling protocol, so signaling protocols such as SIP can be introduced on the basis of JSEP to complete the application functions of WebRTC.
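As a rough illustration of the JSEP flow above, here is a minimal sketch that generates an offer with createOffer(), applies it with setLocalDescription(), and pushes the SDP to the other side over a plain WebSocket. The signaling URL and the { type, sdp } message shape are assumptions of this example, not part of any WebRTC standard.

// Minimal JSEP-style offer generation; the signaling endpoint and message
// format ({ type, sdp }) are assumptions of this sketch.
const signaling = new WebSocket('wss://example.com/signaling'); // hypothetical endpoint
const pc = new RTCPeerConnection();

async function sendOffer() {
  // createOffer() produces the SDP; setLocalDescription() applies it locally.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // The application, not the browser, decides how to deliver the SDP.
  signaling.send(JSON.stringify({ type: 'offer', sdp: offer.sdp }));
}

signaling.onmessage = async ({ data }) => {
  const msg = JSON.parse(data);
  if (msg.type === 'answer') {
    // Feed the remote SDP back into the peer connection.
    await pc.setRemoteDescription({ type: 'answer', sdp: msg.sdp });
  }
};

signaling.onopen = sendOffer;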

Signaling server

  • A signaling server is used to exchange information between the two users. Although WebRTC communication itself is point-to-point, a server is still required to set up the connection and pass some information.

WebRTC does not define the protocol used to establish the signaling of the channel, so any transport mode can be used, such as WebSocket, XMPP, SIP, AJAX.

  • You can use a real-time transport protocol such as WebSocket to exchange data, or use simple GET/POST polling to retrieve data from the server.

The signaling server transmits the following kinds of data:

  1. Media capabilities and settings to be negotiated
  2. Identification and authentication of the session participants
  3. Session control: indicating progress, changing the session, and terminating the session

Only the first is strictly required; the rest can be adjusted freely according to business needs. A minimal relay that handles just this forwarding role is sketched below.
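The sketch below is one possible minimal signaling relay written for Node.js with the third-party ws package (an assumption of this example; any transport would do). It simply forwards offer/answer/candidate messages between connected clients and performs none of the optional identity or session-control duties listed above.

// Minimal signaling relay: forwards every message to all other connected clients.
// Assumes Node.js and the npm "ws" package; the port and message format are arbitrary.
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    // Blindly relay offer/answer/ICE-candidate messages to every other peer.
    for (const client of wss.clients) {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(message.toString());
      }
    }
  });
});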

SDP protocol
  • The most important function of media negotiation is the exchange of session description protocol (SDP) between two browsers participating in point-to-point communication.

  • The SDP contains all the information required to configure the browser’s RTP media stack, including the media types (audio, video, data), the codecs required, the parameters or settings for those codecs, and information about bandwidth.

In addition, the signaling channel is used to exchange candidate addresses for ICE hole punching.
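For reference, the following is a heavily trimmed, illustrative SDP fragment of the kind exchanged during negotiation; real offers generated by a browser are much longer and the exact lines vary by implementation.

v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000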

Signaling interworking scheme

Interworking between WebRTC and SIP

For WebRTC to communicate with SIP, two layers must be addressed: the signaling layer and the media layer.

The two networks use different signaling mechanisms. Therefore, signaling must be converted to complete media negotiation and establish sessions. The media layer should complete the coding conversion, as well as RTP/SRTP conversion and other functions.

The communication at the signaling level is discussed here.

There are two solutions for SIP and WebRTC signaling communication:

  • Implement the SIP protocol stack in JavaScript and develop the WebRTC application on top of it. The signaling sent by the WebRTC client is then SIP signaling, usually carried over WebSocket as the transport protocol.

    • This allows WebRTC clients to register directly with a WebSocket-enabled SIP server. JsSIP and SIPML5 are both solutions of this kind; a sketch follows this list.
  • Use a conversion gateway to translate between the two protocols so that the two sides can communicate. An open source gateway project of this kind is webrtc2sip.

    • webrtc2sip is a fairly complete gateway: it implements both the signaling layer and the media layer, offers powerful transcoding, and can also act directly as a media gateway that decodes and bridges the media at both ends.
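As a rough sketch of the first approach (SIP over WebSocket in JavaScript), the snippet below uses the JsSIP library. The server address, account, and password are placeholders, and the exact configuration options should be checked against the JsSIP documentation.

// SIP-over-WebSocket client sketch using JsSIP; server and credentials are placeholders.
const socket = new JsSIP.WebSocketInterface('wss://sip.example.com:7443');
const ua = new JsSIP.UA({
  sockets: [socket],
  uri: 'sip:alice@example.com',
  password: 'secret'
});
ua.start(); // registers with the WebSocket-enabled SIP server

// Place a call; JsSIP builds the SDP and drives RTCPeerConnection internally.
ua.call('sip:bob@example.com', {
  mediaConstraints: { audio: true, video: true }
});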

ICE server

  • The key to point-to-point communication is that the two browsers can send and receive data packets directly. However, browsers and mobile phones generally access the Internet through routers, so network address translation (NAT) is involved.

  • IP addresses inside the NAT are private and cannot be reached from outside; the address assigned to the NAT itself is a public address, and the NAT maps traffic to a public address each time it forwards a packet from the inside to the outside.

  • Interactive Connectivity Establishment (ICE) is a standard NAT traversal protocol that uses STUN and TURN servers to establish connections.

  • A STUN server is used to traverse the NAT and obtain the browser's candidate addresses, including its private addresses and the public IP address of the outermost NAT.

  • The signaling channel is used to exchange candidate addresses; once a browser has sent and received candidates, it starts connectivity checks, and if a check succeeds it uses that candidate pair to send media.

  • In most cases, a direct peer connection can be established by hole punching. However, if NAT or firewall restrictions are too strict to establish a direct connection, the media can only be relayed through a TURN server; a configuration sketch follows this list.
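In code, the STUN and TURN servers are simply passed to RTCPeerConnection as iceServers. The Google STUN address below also appears later in this article, while the TURN URL and credentials are placeholders.

// ICE configuration sketch: a public STUN server plus a placeholder TURN relay.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478', // placeholder TURN server
      username: 'user',                   // placeholder credentials
      credential: 'pass'
    }
  ]
});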

Media server

A media server is not required, but one can be considered for multi-party sessions or for additional media processing. For conferences involving multiple browsers, a centralized media server can be used: each browser then only needs to establish a single connection to the media server. The advantage of this architecture is that it can scale to very large sessions while minimizing the processing each browser must do when new participants join. The media server can also analyze, process, and record the media.

JavaScript interface

getUserMedia

You can capture video or audio by calling navigator.getUserMedia(); the constraints parameter specifies whether video and/or audio should be captured. Here is a simple example:

var constraints = {
  audio: false,
  video: true
};
var video = document.querySelector('video');

// Attach the captured stream to the <video> element.
function successCallback(stream) {
  if (window.URL) {
    video.src = window.URL.createObjectURL(stream);
  } else {
    video.src = stream;
  }
}

function errorCallback(error) {
  console.log('navigator.getUserMedia error: ', error);
}

navigator.getUserMedia(constraints, successCallback, errorCallback);

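Note that the callback form of navigator.getUserMedia shown above is deprecated; in current browsers the same capture is usually written with the promise-based navigator.mediaDevices.getUserMedia and srcObject, roughly as follows.

// Modern, promise-based capture using the same constraints as above.
const constraints = { audio: false, video: true };
const video = document.querySelector('video');

navigator.mediaDevices.getUserMedia(constraints)
  .then(stream => {
    video.srcObject = stream; // preferred over createObjectURL(stream)
  })
  .catch(error => {
    console.log('getUserMedia error: ', error);
  });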

RTCPeerConnection

RTCPeerConnection is one of the most important interfaces in WebRTC. It is used to configure the ICE servers and exchange SDP. The connection process is as follows:

Create an RTCPeerConnection object
  1. The RTCPeerConnection constructor takes a configuration object that specifies the ICE servers. Here we use a public STUN server provided by Google:

let configuration = {
  iceServers: [{
    urls: "stun:stun.l.google.com:19302"
  }]
};
let pc = new RTCPeerConnection(configuration);
  2. Add the local media stream to the RTCPeerConnection object

pc.addStream(localStream);

Exchange SDP descriptors through Offer and Answer

  1. A and B establish a PC instance respectively

    • A creates an Offer signaling message containing A's SDP descriptor via the createOffer() method provided by the PC.

    • A passes its SDP descriptor to its PC instance through the setLocalDescription() method provided by the PC.

  2. User A sends the Offer signaling to user B through the server

    • User B extracts the SDP descriptor contained in user A's Offer signaling and delivers it to user B's PC instance through the setRemoteDescription() method provided by the PC

    • User B creates an Answer signaling message containing user B's SDP descriptor via the createAnswer() method provided by the PC

    • User B uses the setLocalDescription() method provided by the PC to deliver user B's SDP descriptor to user B's PC instance

  3. User B sends the Answer signaling to User A through the server

After receiving the Answer signaling from B, A extracts B's SDP descriptor and passes it to A's PC instance by calling setRemoteDescription().
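Putting these steps together, the sketch below shows both sides of the exchange in one place; sendToB and sendToA stand in for whatever signaling transport the application uses and are hypothetical helpers, not WebRTC APIs.

// Offer/Answer exchange sketch; sendToB/sendToA are hypothetical signaling helpers.
const pcA = new RTCPeerConnection();
const pcB = new RTCPeerConnection();

async function callerCreatesOffer() {
  const offer = await pcA.createOffer();
  await pcA.setLocalDescription(offer);
  sendToB({ type: 'offer', sdp: offer.sdp });   // step 2: via the signaling server
}

async function calleeHandlesOffer(offerMsg) {
  await pcB.setRemoteDescription(offerMsg);
  const answer = await pcB.createAnswer();
  await pcB.setLocalDescription(answer);
  sendToA({ type: 'answer', sdp: answer.sdp }); // step 3: via the signaling server
}

async function callerHandlesAnswer(answerMsg) {
  await pcA.setRemoteDescription(answerMsg);
}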

ICE hole punching

  1. When network candidates are available, they are sent to the peer’s browser via the signaling server
pc.onicecandidate = function(event) {
  if (event.candidate) {
    sendToServer(event.candidate)
  }
};
  2. When a candidate is received from the peer, add it to the connection
// receivedCandidate is the candidate data delivered by the signaling server
let candidate = new RTCIceCandidate(receivedCandidate);
pc.addIceCandidate(candidate);
  3. Listen for the media stream sent by the peer and play it
pc.onaddstream = event => {
  remoteVideo.src = window.URL.createObjectURL(event.stream);
};

RTCDataChannel

RTCDataChannel is part of the RTCPeerConnection API, and a data channel can only be created after an instance of RTCPeerConnection has been created.

Data channels can be used to send text or files.

pc = new RTCPeerConnection();
dc = pc.createDataChannel('dc');
dc.onmessage = event => console.log(event.data);
// Wait until the channel is open before sending.
dc.onopen = () => {
  dc.send('text');
  dc.send(new ArrayBuffer(32));
};

On the other end, you can use the ondatachannel event to get the RTCDataChannel object

pc.ondatachannel = event => dc = event.channel;
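As a follow-up on sending files, one common pattern is to read the file into ArrayBuffer chunks and push them through the same channel; the 16 KB chunk size and the helper below are illustrative choices, not part of the WebRTC API.

// Illustrative file transfer over a data channel in 16 KB chunks.
function sendFile(dc, file) {
  const chunkSize = 16 * 1024;
  let offset = 0;
  const reader = new FileReader();

  reader.onload = () => {
    dc.send(reader.result);            // send one ArrayBuffer chunk
    offset += chunkSize;
    if (offset < file.size) readNextChunk();
  };

  function readNextChunk() {
    reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
  }

  readNextChunk();
}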