Introduction

WebRTC is an open-source project, still under active development, that aims to provide real-time, peer-to-peer communication between web applications.

WebRTC provides a simple JavaScript API that lets developers build web applications with real-time audio and video. It is also intended to support native mobile apps. The API hides how WebRTC works underneath, so to use it well you need to understand those underlying mechanisms as well as the API itself.

This article is for readers who are new to WebRTC, especially those unfamiliar with how it works. To keep things as understandable as possible, we use simple terms and analogies to explain its underlying principles in detail.

Getting started

To establish a WebRTC connection between Party A and Party B, two steps are needed:

  1. Locate each other on the network.

  2. Notify the other party to set up a WebRTC connection.

Step 1: Find the other party

Finding the other party in WebRTC is similar to making a phone call. To talk to someone on the phone, you first dial their number; the same is true when someone wants to call you. The phone number serves as the user’s identity, and the telecommunications system uses that identity to locate the user.

Web applications, however, cannot dial each other. There are many browsers in the world, several can run on the same system at once, and a browser has no unique ID comparable to a phone number. The system the browser runs on does have an identifier, though: its IP address, which can be used to locate it.

It is not that simple, though. Most of these systems sit behind network address translation (NAT) devices, which exist both for security and because IPv4 offers only a limited number of public addresses. A NAT device assigns private IP addresses to the systems on its local network. A private address is valid and visible only within that network and cannot be used to receive traffic from the outside world, because systems outside the network do not know which public address corresponds to the device inside it.

Because of the NAT, a peer does not know its own public IP address: it only sees the private address the NAT assigned to it. It therefore cannot hand its public address to another peer in order to accept connections. To put it more intuitively: if you want someone to call you, you must give them your phone number. With NAT in the way, it is like staying in a hotel where your room’s number is hidden from the outside world; calls to the hotel are answered at reception and forwarded to your room on request. That kind of indirect connection is not what peer-to-peer techniques are built on.
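To make the NAT problem concrete, here is a toy model of a NAT translation table in TypeScript. It is a sketch under simplifying assumptions, not how any particular router behaves: the `ToyNat` class and its method names are hypothetical, and the addresses come from documentation-only ranges. It shows that the host inside only ever knows its private endpoint, that the public endpoint the NAT picks for it is visible only from outside, and that unsolicited inbound traffic has nowhere to go.

```typescript
// Toy NAT for illustration only (hypothetical class): real NATs differ in how
// they allocate ports and which inbound packets they let through.
type Endpoint = { ip: string; port: number };

class ToyNat {
  private publicIp: string;
  private nextPort = 40000; // first public port this toy NAT hands out
  private outbound = new Map<string, number>(); // "privateIp:port" -> public port
  private inbound = new Map<number, Endpoint>(); // public port -> private endpoint

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // A LAN host sends a packet out: the NAT rewrites the source to its public address.
  translateOutbound(src: Endpoint): Endpoint {
    const key = `${src.ip}:${src.port}`;
    let port = this.outbound.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.outbound.set(key, port);
      this.inbound.set(port, { ...src });
    }
    return { ip: this.publicIp, port };
  }

  // A packet arrives from outside: it is forwarded only if a mapping already exists.
  translateInbound(publicPort: number): Endpoint | undefined {
    return this.inbound.get(publicPort); // undefined = unsolicited, dropped
  }
}

const nat = new ToyNat("203.0.113.7");
const laptop: Endpoint = { ip: "192.168.1.10", port: 5000 };

// The laptop only knows 192.168.1.10:5000; the outside world sees the translated endpoint.
const seenOutside = nat.translateOutbound(laptop);
console.log(seenOutside); // { ip: '203.0.113.7', port: 40000 }
console.log(nat.translateInbound(40001)); // undefined: nobody inside asked for this traffic
```

The key point is the asymmetry: the mapping is created by outbound traffic, and the laptop itself never learns the value `40000`.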

To overcome this problem, we use a protocol called ICE (Interactive Connectivity Establishment). ICE’s job is to find the best path between two peers. It can connect peers directly when there is no NAT, and it can still connect them when NAT is in the way. The ICE framework gives us ICE candidates. An ICE candidate is simply an object containing an IP address and port at which a peer may be reachable (for example its local address, or its public address as seen from outside the NAT), along with other connection-related information.
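In the browser, each candidate is delivered as an `RTCIceCandidate` whose `candidate` property is a single text line in the candidate-attribute format from the ICE specifications (originally RFC 5245). The sketch below parses such a line into a plain object so you can see what “address, port, and other connection-related information” looks like in practice. The `IceCandidate` type and `parseCandidate` function are illustrative names, the sample line is hand-written, and extension fields beyond `raddr`/`rport` are skipped.

```typescript
// Parse an ICE candidate line (as carried in RTCIceCandidate.candidate or an
// SDP "a=candidate:" attribute). Minimal sketch: extensions such as
// "generation" or "ufrag" are ignored.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface IceCandidate {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string; // "udp" or "tcp"
  priority: number;
  address: string; // IP address this candidate offers
  port: number;
  type: CandidateType;
  relatedAddress?: string; // raddr: e.g. the private address behind a srflx candidate
  relatedPort?: number;
}

function parseCandidate(line: string): IceCandidate {
  const body = line.replace(/^a=/, "").replace(/^candidate:/, "");
  const parts = body.trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error(`not an ICE candidate line: ${line}`);
  }
  const cand: IceCandidate = {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7] as CandidateType,
  };
  // Remaining tokens come in name/value pairs.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === "raddr") cand.relatedAddress = parts[i + 1];
    if (parts[i] === "rport") cand.relatedPort = Number(parts[i + 1]);
  }
  return cand;
}

const c = parseCandidate(
  "candidate:842163049 1 udp 1694498815 203.0.113.5 54321 typ srflx raddr 192.168.1.10 rport 54321 generation 0"
);
console.log(c.type, c.address, c.port, c.relatedAddress);
// srflx 203.0.113.5 54321 192.168.1.10
```

Here the `srflx` candidate carries the public address seen from outside the NAT, while `raddr` records the private address it maps to.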

In the absence of NAT, ICE is straightforward because the peer’s public IP address is directly available. When NAT is present, ICE relies on servers implementing Session Traversal Utilities for NAT (STUN) and/or Traversal Using Relays around NAT (TURN).
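“Finding the best path” comes down to ranking candidates. RFC 8445 (section 5.1.2) recommends computing a candidate’s priority as 2^24 × type preference + 2^8 × local preference + (256 − component ID), with recommended type preferences of 126 for host, 110 for peer-reflexive, 100 for server-reflexive (discovered via STUN), and 0 for relayed (TURN) candidates. The sketch below applies that formula; the default local preference of 65535 assumes a host with a single network interface, and real browsers may pick different values.

```typescript
// Candidate priority using the formula recommended in RFC 8445, section 5.1.2.
type CandType = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences: direct paths rank above relayed ones.
const TYPE_PREFERENCE: Record<CandType, number> = {
  host: 126, // address of a local interface
  prflx: 110, // peer-reflexive: learned during connectivity checks
  srflx: 100, // server-reflexive: public address reported by a STUN server
  relay: 0, // address allocated on a TURN server
};

function candidatePriority(type: CandType, localPreference = 65535, componentId = 1): number {
  return (
    2 ** 24 * TYPE_PREFERENCE[type] +
    2 ** 8 * localPreference +
    (256 - componentId)
  );
}

for (const t of ["host", "srflx", "relay"] as const) {
  console.log(t, candidatePriority(t));
}
// host 2130706431
// srflx 1694498815
// relay 16777215
```

Because the type preference occupies the top bits, any host candidate outranks any STUN-derived candidate, which in turn outranks any TURN relay.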

A STUN server is involved only in discovering the public IP address. Once the WebRTC connection is established, all further communication flows directly between the peers. With TURN, by contrast, the TURN server stays in the path even after the connection is set up, relaying the traffic between the peers.
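To show what “discovering the public IP address” involves, here is a sketch of both halves of a STUN exchange as defined in RFC 5389: building a Binding Request, and decoding the XOR-MAPPED-ADDRESS attribute in the server’s success response. It assumes IPv4 only, skips MESSAGE-INTEGRITY and FINGERPRINT, and does no networking; a real client would send the request over UDP to a STUN server and should use a cryptographically random transaction ID. The sample response is built by hand, not captured from a server.

```typescript
// Minimal STUN (RFC 5389) helpers: IPv4 only, no MESSAGE-INTEGRITY/FINGERPRINT.
const MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;
const BINDING_SUCCESS = 0x0101;
const XOR_MAPPED_ADDRESS = 0x0020;

// 20-byte header: message type, body length (0 here), magic cookie, 96-bit transaction ID.
function bindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, BINDING_REQUEST);
  view.setUint16(2, 0);
  view.setUint32(4, MAGIC_COOKIE);
  msg.set(transactionId, 8);
  return msg;
}

// Reads the public ("mapped") address the server saw our request come from.
function parseXorMappedAddress(msg: Uint8Array): { ip: string; port: number } | undefined {
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (msg.length < 20 || view.getUint16(0) !== BINDING_SUCCESS) return undefined;
  if (view.getUint32(4) !== MAGIC_COOKIE) return undefined;
  const end = Math.min(msg.length, 20 + view.getUint16(2));
  let offset = 20;
  while (offset + 4 <= end) {
    const type = view.getUint16(offset);
    const len = view.getUint16(offset + 2);
    const value = offset + 4;
    if (type === XOR_MAPPED_ADDRESS && len >= 8 && view.getUint8(value + 1) === 0x01) {
      const port = view.getUint16(value + 2) ^ (MAGIC_COOKIE >>> 16); // X-Port
      const addr = (view.getUint32(value + 4) ^ MAGIC_COOKIE) >>> 0; // X-Address
      const ip = [24, 16, 8, 0].map((s) => (addr >>> s) & 0xff).join(".");
      return { ip, port };
    }
    offset = value + Math.ceil(len / 4) * 4; // attribute values are padded to 4 bytes
  }
  return undefined;
}

// Hand-built success response reporting 203.0.113.7:40000 (illustrative values).
const txId = new Uint8Array(12).fill(7);
const resp = new Uint8Array(32);
const rv = new DataView(resp.buffer);
rv.setUint16(0, BINDING_SUCCESS);
rv.setUint16(2, 12); // one attribute: 4-byte header + 8-byte value
rv.setUint32(4, MAGIC_COOKIE);
resp.set(txId, 8);
rv.setUint16(20, XOR_MAPPED_ADDRESS);
rv.setUint16(22, 8);
rv.setUint8(25, 0x01); // address family: IPv4
rv.setUint16(26, 40000 ^ (MAGIC_COOKIE >>> 16));
rv.setUint32(28, ((203 << 24) | (0 << 16) | (113 << 8) | 7) ^ MAGIC_COOKIE);
console.log(parseXorMappedAddress(resp)); // { ip: '203.0.113.7', port: 40000 }
```

Once the peer has learned its public endpoint this way, the STUN server has nothing more to do, which matches the division of labor described above.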

Because of STUN’s limitations, however, we sometimes have to rely on TURN: STUN alone has a success rate of only 86%.
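In the browser API, STUN and TURN servers are supplied to `RTCPeerConnection` through the `iceServers` field of its configuration, and listing a TURN server next to a STUN server is what gives ICE a relay to fall back on. The sketch below mirrors the shape of `RTCConfiguration` with locally declared types so it runs outside a browser; the hostnames and credentials are placeholders, and the `hasRelayFallback` helper is a hypothetical convenience, not part of the WebRTC API.

```typescript
// Local mirror of the browser's RTCIceServer / RTCConfiguration shape, so this
// runs without DOM typings. In a browser the object would be passed to
// new RTCPeerConnection(config).
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: "all" | "relay"; // "relay" = use TURN candidates only
}

// Placeholder hosts and credentials: replace with your own servers.
const config: PeerConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: ["turn:turn.example.org:3478?transport=udp", "turns:turn.example.org:5349"],
      username: "demo-user",
      credential: "demo-secret",
    },
  ],
};

// Hypothetical helper: true if at least one TURN server is configured, i.e. ICE
// has a relayed path to fall back on when STUN-based traversal fails.
function hasRelayFallback(cfg: PeerConfig): boolean {
  return cfg.iceServers.some((s) =>
    ([] as string[]).concat(s.urls).some((u) => /^turns?:/.test(u))
  );
}

console.log(hasRelayFallback(config)); // true
```

Setting `iceTransportPolicy` to `"relay"` forces all traffic through TURN, which can be a convenient way to test that the fallback actually works.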

Step 2: Notify the peer to set up the WebRTC connection

Now that we have ICE candidates, the next step is to send them to the peer we want to connect with. Along with the candidates we send a session description, which covers session information, timing, and the media to be exchanged. The candidates and the session description are bundled into one object and described using SDP (Session Description Protocol). Sometimes the candidates are not bundled with the session description but are sent separately as they are discovered; this is called Trickle ICE.
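An SDP description is plain text with one `<type>=<value>` field per line: session-level fields (`v=`, `o=`, `s=`, `t=`) come first, and each `m=` line starts a media description with its own attributes, including any bundled `a=candidate:` lines. The sketch below splits a description along those boundaries and counts bundled candidates, a quick way to tell whether candidates travelled inside the SDP or are being trickled separately. The function names are illustrative and the sample SDP is trimmed and hand-written; a real browser offer has many more lines.

```typescript
// Split an SDP blob into session-level lines and per-media sections.
interface MediaSection {
  mline: string; // e.g. "m=audio 9 UDP/TLS/RTP/SAVPF 111"
  attributes: string[]; // the lines that follow it, up to the next m= line
}

interface ParsedSdp {
  session: string[];
  media: MediaSection[];
}

function parseSdp(sdp: string): ParsedSdp {
  const result: ParsedSdp = { session: [], media: [] };
  for (const line of sdp.split(/\r?\n/)) {
    if (line === "") continue;
    if (line.startsWith("m=")) {
      result.media.push({ mline: line, attributes: [] });
    } else if (result.media.length === 0) {
      result.session.push(line);
    } else {
      result.media[result.media.length - 1].attributes.push(line);
    }
  }
  return result;
}

// Candidates bundled into the description show up as a=candidate: lines.
function countCandidates(parsed: ParsedSdp): number {
  return parsed.media.reduce(
    (n, m) => n + m.attributes.filter((a) => a.startsWith("a=candidate:")).length,
    0
  );
}

const sample = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 0.0.0.0",
  "a=rtpmap:111 opus/48000/2",
  "a=candidate:1 1 udp 2130706431 192.168.1.10 54400 typ host",
  "a=candidate:2 1 udp 1694498815 203.0.113.7 40000 typ srflx raddr 192.168.1.10 rport 54400",
].join("\r\n");

const parsed = parseSdp(sample);
console.log(parsed.session.length, parsed.media.length, countCandidates(parsed)); // 4 1 2
```

With Trickle ICE the same media section would typically arrive with no `a=candidate:` lines, and the candidates would follow as separate messages.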

To establish a connection, we need to “send” this information to the other peer. But how can we transfer candidates and session descriptions when we know only our own address and not the receiving peer’s? And since no WebRTC connection exists yet, through what medium does this information travel?

The answer to all these questions lies in a concept called signaling. Before a WebRTC connection exists, we need some medium to carry the information above between the peers and tell them how to locate and connect to each other. This is where the signaling mechanism comes in. As the name implies, it exchanges connection signals (ICE candidates, session descriptions, and so on) between the two peers that intend to connect.

WebRTC does not define any standard for this signaling mechanism; it lets developers build one of their choice. Signaling can be as simple as copying and pasting the information to the other peer, or it can use communication channels such as WebSockets, Socket.IO, or Server-Sent Events. In short, signaling is just a pattern for exchanging connection-related information so that peers can recognize each other and begin communicating over WebRTC.
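Since WebRTC leaves signaling to the application, the sketch below models about the simplest signaling server possible: an in-memory relay that forwards messages between registered peers. It is a hypothetical stand-in for a WebSocket or Socket.IO server; the `SignalingRelay` class, the peer IDs, and the message shape (`offer`, `answer`, `candidate`) are assumptions for illustration, not part of any WebRTC standard. Note that the relay never inspects the SDP or candidates; it only needs to know who each message is for.

```typescript
// Messages exchanged during signaling. WebRTC does not define this format;
// it is an application-level choice (hypothetical here).
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

interface Envelope {
  from: string;
  to: string;
  payload: SignalMessage;
}

// In-memory stand-in for a WebSocket/Socket.IO signaling server.
class SignalingRelay {
  private peers = new Map<string, (env: Envelope) => void>();

  register(id: string, onMessage: (env: Envelope) => void): void {
    this.peers.set(id, onMessage);
  }

  // Round-trips through JSON as a network hop would; the payload is never inspected.
  send(env: Envelope): boolean {
    const deliver = this.peers.get(env.to);
    if (!deliver) return false; // recipient is not connected
    deliver(JSON.parse(JSON.stringify(env)) as Envelope);
    return true;
  }
}

const relay = new SignalingRelay();
const inboxB: Envelope[] = [];
relay.register("A", () => {});
relay.register("B", (env) => inboxB.push(env));

relay.send({ from: "A", to: "B", payload: { kind: "offer", sdp: "v=0" } });
relay.send({
  from: "A",
  to: "B",
  payload: { kind: "candidate", candidate: "candidate:1 1 udp 2130706431 192.168.1.10 54400 typ host" },
});
console.log(inboxB.map((e) => e.payload.kind)); // [ 'offer', 'candidate' ]
```

Swapping the in-memory `Map` for real sockets changes the transport but not the pattern: register, then forward by recipient ID.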

Quick review

Let’s review the whole process again to better understand it.

If peer A wants to establish a WebRTC connection with peer B, the following happens:

  1. Peer A uses ICE to gather its ICE candidates. In most cases this requires a STUN (Session Traversal Utilities for NAT) or TURN (Traversal Using Relays around NAT) server.
  2. Peer A bundles its ICE candidates and session description into an object. Peer A stores this object as its local description (its own connection information) and sends it to peer B through the signaling mechanism. This object is called the offer.
  3. Peer B receives the offer and stores it as its remote description (the connection information of the peer at the other end) for later use. Peer B then generates its own ICE candidates and session description, stores them as its local description, and sends them to peer A through the signaling mechanism. This object is called the answer. (Note: as mentioned earlier, the ICE candidates in steps 2 and 3 can also be sent separately.)
  4. Peer A receives the answer from peer B and stores it as its remote description. Both peers now have each other’s connection information and can start communicating via WebRTC!
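The offer/answer exchange in these four steps is what the browser tracks in `RTCPeerConnection.signalingState`. The following is a simplified, synchronous model of that state machine, assuming only the `stable`, `have-local-offer`, and `have-remote-offer` states (no provisional answers, rollback, or real SDP) and leaving out step 1, candidate gathering. `PeerModel` and its methods are hypothetical; in a browser the corresponding calls are `createOffer()`, `createAnswer()`, `setLocalDescription()`, and `setRemoteDescription()`, all of which return promises.

```typescript
// Simplified model of the offer/answer state machine behind
// RTCPeerConnection.signalingState (no pranswer, rollback, or real SDP).
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type Description = { type: "offer" | "answer"; sdp: string };

class PeerModel {
  state: SignalingState = "stable";
  local?: Description;
  remote?: Description;
  private name: string;

  constructor(name: string) {
    this.name = name;
  }

  setLocalDescription(desc: Description): void {
    if (desc.type === "offer" && this.state === "stable") this.state = "have-local-offer";
    else if (desc.type === "answer" && this.state === "have-remote-offer") this.state = "stable";
    else throw new Error(`${this.name}: cannot set local ${desc.type} in ${this.state}`);
    this.local = desc;
  }

  setRemoteDescription(desc: Description): void {
    if (desc.type === "offer" && this.state === "stable") this.state = "have-remote-offer";
    else if (desc.type === "answer" && this.state === "have-local-offer") this.state = "stable";
    else throw new Error(`${this.name}: cannot set remote ${desc.type} in ${this.state}`);
    this.remote = desc;
  }
}

const peerA = new PeerModel("A");
const peerB = new PeerModel("B");

// Step 2: A creates an offer, keeps it as its local description, and signals it to B.
const offer: Description = { type: "offer", sdp: "(A's session description and candidates)" };
peerA.setLocalDescription(offer);

// Step 3: B stores the offer as its remote description and replies with an answer.
peerB.setRemoteDescription(offer);
const answer: Description = { type: "answer", sdp: "(B's session description and candidates)" };
peerB.setLocalDescription(answer);

// Step 4: A stores the answer; both sides are back in "stable" with both descriptions set.
peerA.setRemoteDescription(answer);
console.log(peerA.state, peerB.state); // stable stable
```

Calling the methods out of order (for example, applying an answer before any offer) throws, which mirrors the errors a browser raises when signaling messages arrive in the wrong sequence.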
