A detailed explanation of WebRTC fundamentals

Preface

When you first encounter WebRTC, it is easy to be confused by some of its obscure concepts. After a period of reading and practice, I have written this article to sort out, in detail, the basics needed to get started with WebRTC.

This post is also synced to my GitHub.

RTC

RTC stands for Real-Time Communications, which usually refers to real-time audio and video communication.

What is WebRTC

Web Real-Time Communications (WebRTC), a part of RTC, is a real-time communication technology that lets web applications or sites establish peer-to-peer connections between browsers, without relying on an intermediary, to transmit video and audio streams or other arbitrary data.

Advantages and disadvantages of WebRTC

Advantages

Cross-platform (Web, Windows, macOS, Linux, iOS, Android)
Free of charge, with no plug-ins or installation required; supported by mainstream browsers
Strong NAT traversal (hole punching) capability, built on key NAT and firewall traversal technologies such as STUN, TURN, and ICE

Disadvantages

No ready-made server-side solution; you have to design and deploy the servers yourself
Audio device adaptation problems

Application scenarios

Audio and video call
Video/teleconference
Remote host access
Online education (live mic, screen recording, shared remote desktop)

Collect audio and video data

The navigator.mediaDevices.getUserMedia() method prompts the user for permission to use media input devices, that is, the computer’s webcam and microphone. Once permission is granted, it resolves with a MediaStream object. Assigning that MediaStream to the srcObject of a video tag on the page lets users see and hear their own audio and video, and gives us the audio and video stream to send.

async function createLocalMediaStream() {
  // On insecure origins (not https or localhost), navigator.mediaDevices is undefined
  const localStream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  })
  // Show the captured stream in the <video id="local"> element
  document.getElementById('local').srcObject = localStream
}

See also: MediaDevices.getUserMedia()
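The call can fail, for example when the page is not served from a secure context or when the user denies permission. Below is a minimal sketch of a defensive wrapper. The helper name openLocalMedia and its return shape are assumptions made for illustration, and the navigator object is passed in as a parameter so that the function can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of WebRTC): wraps getUserMedia with basic checks.
// `nav` is normally the browser's `navigator` object.
async function openLocalMedia(nav, constraints = { video: true, audio: true }) {
  // On insecure origins (not https or localhost) mediaDevices is undefined.
  if (!nav || !nav.mediaDevices || typeof nav.mediaDevices.getUserMedia !== 'function') {
    return { stream: null, error: 'unsupported-or-insecure-context' }
  }
  try {
    const stream = await nav.mediaDevices.getUserMedia(constraints)
    return { stream, error: null }
  } catch (err) {
    // Typical names: NotAllowedError (permission denied), NotFoundError (no device).
    return { stream: null, error: err.name || 'unknown-error' }
  }
}
```

In a page this could be called as `const { stream, error } = await openLocalMedia(navigator)`, showing a message to the user when `error` is not null.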

RTCPeerConnection

We call the two ends of a WebRTC communication peers. Peer-to-peer communication means that the two clients are connected directly and do not need an intermediate server to relay the data. A successfully established connection is called a PeerConnection, and one WebRTC session can contain multiple PeerConnections.

RTCPeerConnection is the API for creating peer-to-peer connections. It represents a WebRTC connection from the local computer to a remote peer, and provides methods to create, maintain, monitor, and close that connection.

Create two connection instances

const peerA = new RTCPeerConnection()
const peerB = new RTCPeerConnection()

API

pc.createOffer: Creates the offer; returns the SDP Offer information
pc.setLocalDescription: Sets the local SDP description. The local description is the SDP we need to send to the other side; it specifies the properties of the local end of the connection, including the media format
pc.setRemoteDescription: Sets the remote SDP description, that is, the SDP sent by the peer; it describes properties of the remote end of the connection, such as the codecs the peer uses
pc.createAnswer: Creates the answer on the receiving end; returns the SDP Answer information
pc.ontrack: Event handler that fires when a media track arrives from the remote peer, after the remote SDP description has been set
pc.onicecandidate: Event handler that fires each time a local ICE candidate is gathered, after the local SDP description has been set; the candidate should be sent to the peer
pc.addIceCandidate: Adds the network information (an ICE candidate) received from the peer to the connection

Basic flow of WebRTC audio and video communication

One party invokes getUserMedia to open the local camera
Media Negotiation (Signaling Exchange)
Establish communication

Media negotiation

Media negotiation refers to SDP exchange.

The figure above shows the general process of media negotiation:

The initiator, Amy, creates an Offer, saves it by calling setLocalDescription, and sends it to the receiver, Bob, through the signaling server
After receiving Amy’s Offer, Bob calls setRemoteDescription to save it and then creates an Answer. The Answer is likewise saved with setLocalDescription and sent back to Amy through the signaling server
After receiving Bob’s Answer, Amy calls setRemoteDescription to save it
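The three steps above can be sketched as two small functions. This is only an illustration under stated assumptions: startCall, handleSdpMessage, the send callback, and the message shape { type, sdp } are hypothetical names, and pc is expected to be an RTCPeerConnection (or any object exposing the same four methods).

```javascript
// Caller side (Amy): create the offer, save it locally, send it via signaling.
// `send` is a hypothetical function that delivers a message to the other
// peer through the signaling server.
async function startCall(pc, send) {
  const offer = await pc.createOffer()
  await pc.setLocalDescription(offer)
  send({ type: 'offer', sdp: offer.sdp })
}

// Both sides: handle an SDP message that arrived through signaling.
async function handleSdpMessage(pc, message, send) {
  if (message.type === 'offer') {
    // Callee (Bob): save the remote offer, then create, save and send the answer.
    await pc.setRemoteDescription({ type: 'offer', sdp: message.sdp })
    const answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    send({ type: 'answer', sdp: answer.sdp })
  } else if (message.type === 'answer') {
    // Caller (Amy): save the remote answer; the SDP exchange is complete.
    await pc.setRemoteDescription({ type: 'answer', sdp: message.sdp })
  }
}
```

Passing the connection and the send function in as parameters keeps the sketch independent of any particular signaling transport.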

When you first see this process, you may feel a little confused about what is going on. Here I will list some possible doubts to explain one by one:

1. Why do we need media negotiations?

The purpose of media negotiation is to let both parties find the media capabilities they both support, so that audio and video communication between them becomes possible. For example, suppose two people want to chat: one speaks only Chinese and likes discussing front-end development; the other speaks only English and dislikes front-end technology. After exchanging profiles, they find they have no way to chat. But if a third person speaks Chinese and likes studying code and front-end technology, then after exchanging profiles with the first person the two can chat happily. In the same way, the sender and the receiver can communicate only if they find media capabilities that both of them support.

2. What is the media negotiation doing?

Media negotiation is the process of exchanging SDP. The session initiator creates an offer and sends it to the receiver through the signaling server. The receiver creates an Answer and returns it to the sender to complete the exchange.

3. What is SDP?

Session Description Protocol (SDP) is a general-purpose, text-based protocol. It is not a transport protocol itself: the SDP text has to be carried by some other channel (in WebRTC, the signaling channel), while the media it describes is carried over transport protocols such as RTP.

SDP describes multimedia sessions for the purposes of session announcement, session invitation, and session initiation. In layman’s terms, it expresses the capabilities of each side, recording information such as the audio and video codec types, codec-related parameters, and the transport protocol.

When exchanging SDP, the communication parties will compare the received SDP with their own SDP and take out the intersection between them, which is the result of negotiation, namely, the audio and video parameters and transmission protocol used in the audio and video communication between the two parties.
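To make the idea of "taking the intersection" concrete, here is a deliberately simplified sketch that reads the codec names from the a=rtpmap lines of two SDP strings and keeps the ones both sides list. The function names listCodecs and commonCodecs are hypothetical. Real negotiation is performed by the browser per media section and also considers payload types and format parameters, so this only illustrates the principle.

```javascript
// Extract codec names from the a=rtpmap lines of an SDP string.
// Example line: "a=rtpmap:111 opus/48000/2" yields "opus/48000/2".
function listCodecs(sdp) {
  const codecs = []
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:\d+ (.+)$/.exec(line.trim())
    if (match) codecs.push(match[1].toLowerCase())
  }
  return codecs
}

// Keep only the codecs that both sides declared, in the offerer's order.
function commonCodecs(offerSdp, answererSdp) {
  const theirs = new Set(listCodecs(answererSdp))
  return listCodecs(offerSdp).filter((codec) => theirs.has(codec))
}
```

For instance, if one side lists opus, PCMU and VP8 while the other lists opus, H264 and VP8, the common set is opus and VP8.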

4. What are offer and answer?

When two parties establish peer-to-peer communication, the SDP message sent by the initiator is called the Offer, and the SDP message sent back by the receiver is called the Answer.

Therefore, offer and answer are essentially objects with SDP information, so they are also called SDP Offer and SDP Answer.

Print offer and answer information as shown below

5. Signaling and signaling server

Signaling usually refers to the control information transmitted between devices to coordinate the operation of various devices on a network.

In WebRTC communication, the sender sends an Offer SDP and the receiver replies with an Answer SDP. But how does the Offer SDP get to the other side? This requires a mechanism for coordinating communication and delivering control messages, and that process is called signaling.

The server that carries out signaling is called the signaling server. It acts as a middleman that helps establish the connection and is responsible for:

Signaling processing, such as delivering media negotiation messages
Room management. For example, when a user connects, the user tells the signaling server its room number, and the signaling server finds the peer in the same room so that the two can try to communicate. It also notifies users when someone joins or leaves the room, tells a user whether the room is full, and so on. For this reason it is also called a room server.
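The room bookkeeping described above can be sketched as a small in-memory registry. Everything here is hypothetical and only for illustration: the class name RoomRegistry, the two-peer limit, and the return shapes are assumptions, not part of WebRTC or of the author's server.

```javascript
// Hypothetical in-memory room registry for a signaling server.
// A limit of 2 peers per room is an assumption for one-to-one calls.
const MAX_PEERS_PER_ROOM = 2

class RoomRegistry {
  constructor() {
    this.rooms = new Map() // roomId -> Set of userIds
  }

  // Returns { ok, peers }: `peers` are the users already in the room,
  // i.e. the ones to notify and to start negotiating with.
  join(roomId, userId) {
    const members = this.rooms.get(roomId) || new Set()
    if (members.has(userId)) {
      return { ok: true, peers: [...members].filter((id) => id !== userId) }
    }
    if (members.size >= MAX_PEERS_PER_ROOM) {
      return { ok: false, reason: 'room-full', peers: [] }
    }
    const peers = [...members]
    members.add(userId)
    this.rooms.set(roomId, members)
    return { ok: true, peers }
  }

  // Returns the remaining users, who should be told that `userId` left.
  leave(roomId, userId) {
    const members = this.rooms.get(roomId)
    if (!members) return []
    members.delete(userId)
    if (members.size === 0) this.rooms.delete(roomId)
    return [...members]
  }
}
```

A signaling server would call join when a user connects with a room number, relay the result to the listed peers, and call leave when the connection closes.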

WebRTC does not specify which implementation the signaling must use. Currently, WebSocket + JSON/SDP is widely used in the industry. WebSocket provides the signaling transmission channel, and JSON/SDP encapsulates the signaling content.
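As an illustration of the JSON side of this approach, here is a minimal sketch of a message envelope that could be sent over a WebSocket. The envelope fields (type, room, from, payload), the list of message types, and the function names are all assumptions made for this example; WebRTC itself does not define them.

```javascript
// Hypothetical message envelope: { type, room, from, payload }.
const SIGNAL_TYPES = ['join', 'offer', 'answer', 'candidate', 'leave']

// Serialize a signaling message to the text that is sent over the socket.
function encodeSignal(type, room, from, payload = null) {
  if (!SIGNAL_TYPES.includes(type)) throw new Error(`Unknown signal type: ${type}`)
  return JSON.stringify({ type, room, from, payload })
}

// Returns the parsed message, or null if the text is not a valid signal.
function decodeSignal(text) {
  try {
    const message = JSON.parse(text)
    if (message === null || typeof message !== 'object') return null
    return SIGNAL_TYPES.includes(message.type) ? message : null
  } catch {
    return null
  }
}
```

With a browser WebSocket named socket, an offer could then be sent with `socket.send(encodeSignal('offer', roomId, userId, { sdp: offer.sdp }))`, and incoming text decoded with decodeSignal before being dispatched by type.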

ICE

After media negotiation is complete, WebRTC begins to establish network connectivity, a process known as Interactive Connectivity Establishment (ICE).

ICE is not a protocol itself but a framework that combines the STUN and TURN protocols (for NAT traversal).

ICE is started by calling setLocalDescription() at each end. It proceeds as follows:

Collect candidates
Exchange candidates
Try to connect in priority order

The steps above involve some obscure concepts, so let me explain them one by one.

1. What is Candidate?

For example, if you want to use a socket to connect to a server, you must know some basic information about it, such as the server’s IP address, its port number, and the transport protocol it uses. Only with this information can you establish a connection to the server. A candidate is the basic information WebRTC uses to describe an address at which an endpoint can be reached, so a candidate is a set of information that includes at least an IP address, a port number, and a protocol.
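To see these fields concretely, the sketch below splits the text form of a candidate (the candidate property of an RTCIceCandidate) into its parts. The function name parseCandidate is hypothetical and the sample addresses come from documentation ranges; in a browser the same fields are also exposed directly as properties of the RTCIceCandidate object.

```javascript
// Parse the text of an ICE candidate such as:
// "candidate:842163049 1 udp 1694498815 203.0.113.5 46154 typ srflx raddr 192.168.1.10 rport 46154"
// Field order: foundation, component, transport, priority, address, port, "typ", type.
function parseCandidate(text) {
  const parts = text.trim().replace(/^a=/, '').replace(/^candidate:/, '').split(/\s+/)
  if (parts.length < 8 || parts[6] !== 'typ') return null
  const [foundation, component, protocol, priority, address, port, , type] = parts
  return {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx", "prflx" or "relay"
  }
}
```

The address, port and protocol fields are exactly the "at least" information mentioned above; the remaining fields help ICE order and pair candidates.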

2. Collect Candidate

There are three types of ICE candidates in WebRTC:

Host candidate: the IP address and port of the local network interface card (NIC). It is obtained from the device’s network adapter and has the highest priority. Under the hood, WebRTC first tries to establish a connection within the local area network.
Server reflexive candidate (reflection candidate): the external IP address and port after NAT, obtained from a STUN server. From the server’s response, the client learns its own address on the public network. Its priority is lower than that of the host candidate: if the LAN connection fails, WebRTC tries to connect using the IP address and port of the reflexive candidate.
Relay candidate: the forwarding IP address and port of the relay, provided by a TURN server. It has the lowest priority and is used only if the first two fail.
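The priority order above is reflected in the numeric priority that each candidate carries. The following sketch applies the formula and the recommended type preferences from the ICE specification (RFC 8445); the function name candidatePriority and the default arguments are choices made for this example. The specification also defines a fourth type, the peer reflexive candidate (prflx), which is discovered during connectivity checks.

```javascript
// Type preferences recommended by the ICE specification (RFC 8445).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 }

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId)
}
```

With the defaults, a host candidate gets 2130706431, a server reflexive candidate 1694498815, and a relay candidate 16777215, which matches the host > reflexive > relay order described above.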

When creating an RTCPeerConnection, you can specify ICE server addresses in the constructor. If you do not, only host candidates are gathered, which generally means the connection can only be made within the local network.

Each time WebRTC gathers an available candidate, an icecandidate event is fired. To pass the gathered candidates to the peer, we assign a callback function to onicecandidate and send each candidate through the signaling server inside it. When the peer receives a candidate, it calls addIceCandidate to add the candidate to its connection.

With this callback function, we can obtain every candidate that WebRTC gathers, and it is also the place to send the gathered candidates to the peer.

peer.onicecandidate = (event) => {
  if (event.candidate) {
    // ... send event.candidate to the peer through the signaling server
  }
}

// Called after receiving candidate information from the signaling server,
// to add the remote candidate to the local ICE agent
peer.addIceCandidate(candidate)

3. Exchange Candidate

After WebRTC gathers candidates, it sends them to the peer through the signaling system. On receiving them, the peer pairs each one with its local candidates to form CandidatePairs (connection candidate pairs). With CandidatePairs, WebRTC can start trying to establish a connection. Note that candidates are not exchanged only after all of them have been gathered; they are exchanged as they are gathered.

CandidatePair: one local Candidate, one remote Candidate

Once CandidatePairs have been formed, WebRTC attempts to connect. As soon as it finds a reachable CandidatePair, it stops trying the remaining ones, although candidates can still be exchanged when new ones are discovered.
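Because candidates are exchanged while they are still being gathered, a remote candidate can arrive through signaling before the remote description has been set, and addIceCandidate rejects in that state. A common workaround is to buffer early candidates. The sketch below shows one way to do so; createCandidateQueue is a hypothetical helper, and pc is expected to be an RTCPeerConnection (or any object with an addIceCandidate method).

```javascript
// Hypothetical helper: buffers remote candidates until the remote
// description has been set, then adds them in arrival order.
function createCandidateQueue(pc) {
  const pending = []
  let remoteReady = false
  return {
    // Call for every candidate received from the signaling server.
    async add(candidate) {
      if (remoteReady) await pc.addIceCandidate(candidate)
      else pending.push(candidate)
    },
    // Call right after pc.setRemoteDescription() has resolved.
    async flush() {
      remoteReady = true
      while (pending.length > 0) await pc.addIceCandidate(pending.shift())
    },
  }
}
```

The receiving side would route every incoming candidate message to add and call flush once, immediately after its setRemoteDescription call completes.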

4. What is NAT?

Network Address Translation (NAT) is a technology that lets devices on a private network reach the public network by mapping their private addresses and ports to public ones.

I won’t go into the details here. There are easy-to-understand articles explaining NAT in the community, and I recommend reading one before you continue:

What exactly is a “NAT” in a home network?

5. What is STUN?

STUN stands for Session Traversal Utilities for NAT. It allows a client behind a NAT (or behind multiple NATs) to discover its own public IP address and port. Using that address to connect directly is what is commonly called “hole punching” or “NAT traversal”.

To put it more bluntly, a STUN server is used to obtain a computer’s public IP address.

Google provides a site for testing STUN/TURN services, where you can test a given STUN service.

How should we understand “hole punching”? If you want to get to the other side of a mountain, you normally have to follow the mountain path from one foot of the mountain to the other. But if a tunnel runs straight through the mountain, the two points at the feet of the mountain are connected directly. That tunnel is the “hole”, and it is what makes the connection peer-to-peer.

From this point of view, the mechanism for punching holes is called ICE, and the server that helps punch them is the STUN service.

Before two users communicate, each first sends a request to a STUN service on the public network to obtain its own public address, and then forwards that address to the peer through the signaling server. In this way, the two users learn each other’s public addresses and can communicate directly using them.

You can configure this with the STUN server provided by Google:

const config = {
  iceServers: [
    {
      urls: 'stun:stun.l.google.com:19302',
    },
  ],
}
const peer = new RTCPeerConnection(config)

When the client is behind a symmetric NAT, hole punching with STUN fails, and the TURN service is required.
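Why symmetric NAT defeats STUN can be illustrated with a toy model. This is not a real NAT and ignores filtering rules; the class name ToyNat, the modes, and the documentation-range addresses are assumptions for this sketch. The key difference is that a cone-style NAT reuses one external port for an internal address regardless of destination, while a symmetric NAT allocates a new external port for every destination.

```javascript
// Toy model of NAT port mapping (an illustration, not a real NAT).
// "cone": one external port per internal address, reused for every destination.
// "symmetric": a new external port for every (internal address, destination) pair.
class ToyNat {
  constructor(mode, publicIp = '203.0.113.7') {
    this.mode = mode
    this.publicIp = publicIp
    this.nextPort = 40000
    this.mappings = new Map()
  }

  // Returns the public "ip:port" that `destination` sees for `internal`.
  externalAddress(internal, destination) {
    const key = this.mode === 'symmetric' ? `${internal}->${destination}` : internal
    if (!this.mappings.has(key)) this.mappings.set(key, this.nextPort++)
    return `${this.publicIp}:${this.mappings.get(key)}`
  }
}
```

With the cone mode, the address reported by the STUN server is the same one the remote peer will see, so the peer can use it. With the symmetric mode, the mapping toward the peer differs from the one the STUN server observed, so the address learned through STUN is of no use to the peer.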

6. What is TURN?

TURN (Traversal Using Relays around NAT) is an extension of STUN.

If a direct connection cannot be established with the public addresses obtained through STUN, the client can request a public IP address from a TURN server to use as a relay address. Media data is then forwarded through the TURN server, which acts as a relay when the peer-to-peer connection fails. This is the last resort.

Its purpose is to solve the problem that symmetric NAT cannot be traversed. Unlike other relay protocols, it allows a client to communicate with multiple peers simultaneously using a single relay address. This makes up for STUN’s inability to traverse symmetric NAT.

Unlike STUN servers, TURN servers act as intermediaries and consume a lot of bandwidth to forward multimedia data.

As for the ICE hole-punching process itself, developers only need to configure the STUN/TURN server addresses and call the corresponding functions; WebRTC takes care of the rest under the hood.
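Configuring both services amounts to listing them in iceServers. The sketch below builds such a configuration object; the function name buildIceConfig is hypothetical, and the TURN URL and credentials are placeholders you would replace with those of your own TURN server. The urls, username and credential fields are the standard fields of an ICE server entry.

```javascript
// Builds the configuration object passed to `new RTCPeerConnection(config)`.
// The `turn` argument is optional; its values are placeholders for your own server.
function buildIceConfig(turn) {
  const iceServers = [{ urls: 'stun:stun.l.google.com:19302' }]
  if (turn) {
    iceServers.push({
      urls: turn.urls,
      username: turn.username,
      credential: turn.credential,
    })
  }
  return { iceServers }
}
```

In a browser this could be used as `new RTCPeerConnection(buildIceConfig({ urls: 'turn:turn.example.com:3478', username: 'demo', credential: 'secret' }))`, where the host name and credentials are made up for the example.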

The simplest peer-to-peer communication in practice

The STUN/TURN service is not required if the communication takes place within a local LAN.

To simulate the simplest peer-to-peer communication locally in the browser, we create both peers in the same page with local code, so no signaling service is needed to pass the SDP information. Less than 50 lines of JS code can achieve the following effect:

index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Browser local same-page analog point-to-point connection (no Signaling service Edition)</title>
    <style>
      video {
        width: 300px;
        height: 250px;
      }
    </style>
  </head>
  <body>
    My local video:
    <video id="local" autoplay></video>
    Remote access to my local video:
    <video id="local-from-remote" autoplay></video>

    <script src="./index.js"></script>
  </body>
</html>

index.js

let localStream

// Capture the local camera and show it in the "local" video element
const createLocalMediaStream = async () => {
  localStream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: false,
  })
  document.getElementById('local').srcObject = localStream
}

const createPeerConnection = async () => {
  const peerA = new RTCPeerConnection()
  const peerB = new RTCPeerConnection()

  // Add a local media stream
  localStream.getTracks().forEach((track) => {
    peerA.addTrack(track, localStream)
  })

  // Listen for ICE candidate events
  peerA.onicecandidate = (event) => {
    if (event.candidate) {
      peerB.addIceCandidate(event.candidate) // Set the ICE candidate
    }
  }

  // Listen to get media data (if peerA has added media stream data)
  peerB.ontrack = (event) => {
    document.getElementById('local-from-remote').srcObject = event.streams[0]
  }

  /**
   * Media negotiation (exchange SDP)
   */
  const offer = await peerA.createOffer()
  await peerA.setLocalDescription(offer)

  await peerB.setRemoteDescription(offer)
  const answer = await peerB.createAnswer()
  await peerB.setLocalDescription(answer)

  await peerA.setRemoteDescription(answer)
}

const main = async () => {
  await createLocalMediaStream()
  await createPeerConnection()
}
main()

Full source code

If this feels manageable, you can go on to try the signaling version (Socket + room management):

Client: webrtc-client
Signaling server: webrtc-server

Conclusion

This article gives only a general overview of the WebRTC process, but it explains the related concepts in detail. I suggest coming back to it while you are learning, as the concepts will become clearer on a second reading.
