This is the third part of the RTC series. The previous part introduced the local connection process of RTCPeerConnection. To connect with a remote user, the SDP and ICE exchange has to travel over the network, and all of this happens before the connection is established. Since both SDP and ICE candidates are plain string data, any server that can relay strings can act as a signaling server. We can build our own signaling service as required, on top of WebSocket, HTTP, SIP, or other protocols.

Server:

Because Socket.IO makes two-way communication easy to handle, this article uses it as an example to implement a simple P2P signaling service; other protocols work in much the same way.
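
The snippets in this section assume a running Socket.IO server exposing the io instance. A minimal, hypothetical bootstrap might look like this (the port and the permissive CORS setting are placeholders for local testing, not part of the original demo):

// server.js: minimal standalone Socket.IO server (assumed setup)
const { Server } = require("socket.io");

// port 3000 and the wildcard CORS origin are placeholders; adjust to your environment
const io = new Server(3000, {
  cors: { origin: "*" }
});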

In order to communicate with another user, a user first needs to know who that user is. The signaling server therefore needs some kind of room mechanism. Its exact form depends on the business scenario (it might be a chat room or a public lobby), but in short, users need a way to find each other on the network. Here we implement a simple user list:

io.on("connection".async (socket) => {
  const userList = [...(await io.allSockets())];
  io.emit("user-list", userList);
});
Copy the code

When user A initiates a connection to user B, A only needs to send the content it would previously have passed directly to B to the server instead, and tell the server to forward it to B. Likewise, B only needs to send its messages to the server and specify A as the target. The server simply forwards the content without needing to understand it at all:

socket.on("message".(msg) = > {
  consttarget = io.sockets.sockets.get(msg.target); target? .emit("message", { from: socket.id, data: msg.data, type: `on-${msg.type}` });
});
Copy the code

That completes the server. Now let’s look at the client implementation.

Client:
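
The client snippets that follow assume some scaffolding: a Socket.IO connection, a few DOM elements, and a rendered user list. A rough sketch of what that might look like (the element IDs, the server URL, and the empty pcConfig are assumptions, not part of the original demo):

// client scaffolding (assumed): connect to the signaling server and render the user list
const socket = io("http://localhost:3000"); // placeholder URL

const userWrapper = document.querySelector("#user-list");    // assumed element id
const localVideo = document.querySelector("#local-video");   // assumed element id
const remoteVideo = document.querySelector("#remote-video"); // assumed element id

let pc;        // the RTCPeerConnection, created in createPeerConnection()
let targetId;  // id of the peer we are currently talking to
const pcConfig = { iceServers: [] }; // filled in later, see the STUN/TURN section

// render the ids pushed by the server's "user-list" event
socket.on("user-list", (userList) => {
  userWrapper.innerHTML = "";
  userList.forEach((id) => {
    const item = document.createElement("div");
    item.innerText = id;
    userWrapper.appendChild(item);
  });
});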

The biggest change on the client side compared to the previous local demo is that the sender and the receiver are now separate, so their logic has to be split apart. Let’s first recall the SDP interaction flow:

  1. A calls createOffer to get offerA
  2. A calls setLocalDescription(offerA)
  3. A sends offerA to B
  4. B calls setRemoteDescription(offerA)
  5. B calls createAnswer to get answerB
  6. B calls setLocalDescription(answerB)
  7. B sends answerB to A
  8. A calls setRemoteDescription(answerB)

Here, A and B are on two different pages, and they do not communicate directly. The signaling server is added in the middle, and the interaction flow becomes like this:

  • Sender:

    • createOffer
    • setLocalDescription
    • Send the offer to the server
    • Receive the answer from the server
    • setRemoteDescription
  • Receiver:

    • Receive the offer from the server
    • setRemoteDescription
    • createAnswer
    • setLocalDescription
    • Send the answer to the server

Let’s implement the sending client first. Since the sender is the one who actively initiates the connection, it needs to know who the target is. Here the simple user list serves as the demonstration: clicking a user in the list starts a call to that user:

userWrapper.addEventListener('click', async (e) => {
  targetId = e.target.innerText;
  // ignore clicks on ourselves or on empty areas
  if (targetId && targetId !== socket.id) {
    const localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    localVideo.srcObject = localStream;
    createPeerConnection();
    // adding tracks triggers the negotiationneeded event below
    localStream.getTracks().forEach(track => pc.addTrack(track, localStream));
  }
});

Create the RTCPeerConnection inside createPeerConnection. The offer can be created and sent in the negotiationneeded event callback, and ICE candidates are sent from the icecandidate listener (shown later):

function createPeerConnection() {
  pc = new RTCPeerConnection(pcConfig);
  // fires after addTrack; create the offer and send it to the target via the signaling server
  pc.addEventListener('negotiationneeded', async () => {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    socket.emit('message', {
      type: 'offer',
      target: targetId,
      data: { offer }
    });
  });
}

The sender then waits for the answer forwarded back by the server:

case 'on-answer':
  // the answer created by the receiver, forwarded back by the server
  await pc.setRemoteDescription(data.answer);
  break;
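
The case branches in this article all live inside one client-side message listener. Its exact shape is not shown in the original; a plausible dispatcher skeleton might look like this (the destructured fields match the server’s forwarding format above):

// assumed dispatcher: every "case" snippet in this article sits inside this switch
socket.on("message", async ({ from, type, data }) => {
  switch (type) {
    case "on-offer":
      // receiver side: handle the incoming offer (shown below)
      break;
    case "on-answer":
      // sender side: apply the answer forwarded back by the server
      await pc.setRemoteDescription(data.answer);
      break;
    case "on-candidate":
      // both sides: add the forwarded ICE candidate (shown below)
      await pc.addIceCandidate(data.candidate);
      break;
  }
});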

On the receiving end, all data comes from the signaling server, so we start with the message listener:

case 'on-offer':
  // an offer arrived: remember who sent it, prepare the connection and local media
  targetId = from;
  createPeerConnection();
  await pc.setRemoteDescription(data.offer);
  const localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  localVideo.srcObject = localStream;
  localStream.getTracks().forEach(track => {
    pc.addTrack(track, localStream);
  });
  // answer the offer and send it back through the signaling server
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  socket.emit('message', {
    type: 'answer',
    target: targetId,
    data: { answer },
  });
  break;

The createPeerConnection logic is the same as on the sender’s side. The receiver adds its tracks while it is already answering an offer (the signaling state is not stable), so negotiationneeded does not fire and no offer needs to be created. The answer is forwarded back to the sender and handled as shown above, which completes the SDP exchange.

After the SDP exchange comes the ICE exchange. The ICE logic is very simple and requires no handshake: each candidate is sent as soon as it is gathered, and each received candidate is added to the PeerConnection:

pc.addEventListener('icecandidate', e => {
  // send each locally gathered candidate to the peer via the signaling server
  if (e.candidate) {
    socket.emit('message', {
      type: 'candidate',
      target: targetId,
      data: { candidate: e.candidate }
    });
  }
});

case 'on-candidate':
  // add the candidate forwarded from the other side
  await pc.addIceCandidate(data.candidate);
  break;

To get the remote stream, just listen for the track event:

pc.addEventListener('track', e => {
  // attach the remote stream to the video element once
  if (remoteVideo.srcObject !== e.streams[0]) {
    remoteVideo.srcObject = e.streams[0];
  }
});
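
Putting the client pieces together: since both ends build the connection the same way, the negotiationneeded, icecandidate and track listeners can all be registered inside createPeerConnection. One possible consolidated version, reusing the handlers shown above:

// a possible consolidated createPeerConnection shared by sender and receiver
function createPeerConnection() {
  pc = new RTCPeerConnection(pcConfig);

  // only the sender reaches this: its addTrack happens in the stable signaling state
  pc.addEventListener('negotiationneeded', async () => {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    socket.emit('message', { type: 'offer', target: targetId, data: { offer } });
  });

  // forward every gathered ICE candidate to the peer through the signaling server
  pc.addEventListener('icecandidate', (e) => {
    if (e.candidate) {
      socket.emit('message', { type: 'candidate', target: targetId, data: { candidate: e.candidate } });
    }
  });

  // attach the remote stream when tracks arrive
  pc.addEventListener('track', (e) => {
    if (remoteVideo.srcObject !== e.streams[0]) {
      remoteVideo.srcObject = e.streams[0];
    }
  });
}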

That is a complete, simple WebRTC P2P system. The signaling server only takes part while the connection is being established; once the connection succeeds, the audio and video added via addTrack flow between the peers directly, without the signaling server being involved.

The P2P demo above, however, only works when both peers are on the same intranet, which brings us to another key topic in P2P: hole punching and NAT traversal.

STUN/TURN

Because of NAT, an external host cannot directly establish a connection with a host inside a private network. To make this possible, a hole has to be punched in our own NAT device, through which the external host can reach the intranet user.

A STUN/TURN service provides this NAT traversal capability. The STUN/TURN information is passed as a parameter when the RTCPeerConnection is created:

const config = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:numb.viagenie.ca',
      username: '[email protected]',
      credential: 'muazkh'
    }
  ]
};
new RTCPeerConnection(config);
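
To verify that the STUN/TURN servers are actually being used, one quick check is to log the type of each gathered candidate; this is just a debugging sketch:

// rough check while connecting:
// "host" = local candidate, "srflx" = obtained via STUN, "relay" = relayed via TURN
pc.addEventListener('icecandidate', (e) => {
  if (e.candidate) {
    console.log(e.candidate.type, e.candidate.candidate);
  }
});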

The STUN/TURN server setup process is not covered here; you can look into the Coturn project for that.

Multi-party architectures

The above is the implementation for the one-to-one scenario. Real applications also need multi-party audio and video, and there are several common schemes for handling multi-party scenarios:

  • P2P: peer-to-peer connections, as in the one-to-one session above. In a multi-party session, a P2P connection has to be established between every pair of participants.

    • Client: in an n-person session, each client pushes n-1 streams and pulls n-1 streams
    • Advantages: the server never has to touch the media streams
    • Disadvantages: the more participants, the heavier the load on each client; it cannot handle sessions with many people
  • MCU: Multipoint Control Unit, a central server. Every user pushes their stream to this server, the server mixes the streams into a single one, and each client pulls that mixed stream.

    • Client: in an n-person session, each client pushes 1 stream and pulls 1 stream
    • Advantages: saves client resources; well suited to broadcast-style scenarios
    • Disadvantages: the stream is fixed on the server and cannot be customized per client, so flexible business requirements cannot be met
  • SFU: Selective Forwarding Unit. The central server only forwards streams, and each client subscribes to the streams it needs.

    • Client: in an n-person session, each client pushes 1 stream and pulls up to n-1 streams
    • Advantages: flexible enough to meet complex requirements
    • Disadvantages: pulling many streams can still strain client performance, so a balance has to be struck based on the actual business
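
To make the numbers concrete, take a four-person session as an example: with P2P each client pushes 3 streams and pulls 3; with an MCU each client pushes 1 stream and pulls 1 mixed stream; with an SFU each client pushes 1 stream and pulls up to 3.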

Multi-party audio and video products have different business models, so the appropriate architecture needs to be chosen based on the specific scenario.