WebRTC stands for Web Real-Time Communication. It addresses the Web's inability to capture audio and video, and provides peer-to-peer (that is, browser-to-browser) video interaction. Broken down, it consists of three parts:

  • MediaStream: Captures audio and video streams
  • RTCPeerConnection: Transmits audio and video streams (generally used in peer-to-peer scenarios)
  • RTCDataChannel: transmits binary data between peers (often used for streaming uploads)

In practice, though, the pure peer-to-peer scenario isn't used all that much. Compared with the live-streaming business that exploded last year, live broadcast is where WebRTC sees more use. For Web live streaming, we usually need two ends:

  • Host end: Record and upload videos
  • Audience: Download and watch the video

I won't cover the audience end here; a later article will introduce it separately (there is simply too much to fit in). Here we mainly discuss how the host side uses WebRTC. Simplified, the host side's job divides into two tasks: recording video and uploading video. Keep these two goals in mind; we'll use WebRTC to achieve them below.

A basic understanding of WebRTC

The WebRTC standard is developed primarily by two organizations:

  • Web Real-Time Communications (WEBRTC), a W3C working group: defines the browser APIs
  • Real-Time Communication in Web-Browsers (RTCWEB), an IETF working group: defines the required protocols, data formats, and security mechanisms

Of course, our first goal is to learn which browser APIs are defined and how to use them. Only after that do we need to learn the related protocols and data formats. Going step by step like this suits our study better.

WebRTC's audio and video processing is mainly handled by its Audio/Video Engine. The processing pipeline is as follows:

  • Audio: captured by a physical device, then run through noise reduction, echo cancellation, jitter/packet-loss concealment, and encoding.
  • Video: captured by a physical device, then run through image enhancement, synchronization, jitter/packet-loss concealment, and encoding.

Finally, the MediaStream object is exposed to the upper-layer API. That is, MediaStream is the middle layer connecting the WebRTC API to the underlying physical streams. So, for better understanding, here's a brief introduction to MediaStream.

MediaStream

MS (MediaStream) exists as a helper object. It handles things such as filtering audio/video streams and obtaining recording permission. MS consists of two parts: MediaStreamTrack and MediaStream.

  • MediaStreamTrack represents a single type of data stream. If you have used video-editing software, the word "track" should be familiar; in layman's terms, you can think of them as equivalent.
  • MediaStream is a complete audio/video stream. It can contain zero or more MediaStreamTracks. Its main job is to keep its tracks playing in sync, for example keeping the sound synchronized with the video picture.

We won't go into too much depth here; let's just cover the basic MediaStream object. In general, we can get one by instantiating MediaStream:

// In practice you would pass tracks or another stream as arguments;
// the bare call here is just for demonstration.
let ms = new MediaStream();

We can look at the object properties attached to ms:

  • active [Boolean]: indicates whether the current MS is active (that is, playable).
  • id [String]: uniquely identifies the current MS, e.g. "f61641ec-ee78-4317-9415-58acac066a4d".
  • onactive: fired when the stream becomes active.
  • onaddtrack: fired when a new track is added.
  • oninactive: fired when the stream becomes inactive.
  • onremovetrack: fired when a track is removed.

There are other methods on the prototype chain; I'll pick out the one that matters here.

  • clone(): returns an independent copy of the current MS stream. This method is often used when you need to operate on the MS stream.
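To tie the pieces above together, here is a minimal sketch (the handlers only log, and the bare MediaStream() call is for illustration; in practice the tracks would come from getUserMedia or another source):

// A bare stream just for illustration
var ms = new MediaStream();

console.log(ms.active); // false until a live track is added
console.log(ms.id);     // e.g. "f61641ec-ee78-4317-9415-58acac066a4d"

ms.onaddtrack = function (e) {
  console.log('track added:', e.track.kind); // "audio" or "video"
};
ms.onremovetrack = function (e) {
  console.log('track removed:', e.track.kind);
};

// clone() returns an independent copy with a brand-new id
var copy = ms.clone();
console.log(copy.id !== ms.id); // true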

As mentioned earlier, MS can be used for filtering and similar purposes. How does it do that? MS has another important concept called Constraints. It regulates whether the currently captured data meets requirements, since different devices expose different capture settings. A commonly used set:

{
    "audio": true,            // whether to capture audio
    "video": {                // video-related settings
        "width": {
            "min": "381",     // minimum width of the video
            "max": "640"
        },
        "height": {
            "min": "200",     // minimum height
            "max": "480"
        },
        "frameRate": {
            "min": "10",      // minimum frame rate
            "max": "28"       // maximum frame rate
        }
    }
}

How do you know which properties can be tuned on your device? You can call navigator.mediaDevices.getSupportedConstraints() directly to get the set of tunable properties. Generally, this is mostly relevant for video. As a small sketch (using only standard calls, and reusing the values from the constraints snippet above), you might probe support before building a constraints object:
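// Probe which constrainable properties this browser/device understands
var supported = navigator.mediaDevices.getSupportedConstraints();

var video = {};
if (supported.width)     video.width     = { min: 381, max: 640 };
if (supported.height)    video.height    = { min: 200, max: 480 };
if (supported.frameRate) video.frameRate = { min: 10, max: 28 };

var constraints = { audio: true, video: video };

Now that we know about MS, it's time to really get into the WebRTC API. Let's take a look at the basic WebRTC API.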

The common WebRTC APIs are as follows, but for browser-compatibility reasons the corresponding prefixes need to be added:

W3C Standard           Chrome                   Firefox
--------------------------------------------------------------
getUserMedia           webkitGetUserMedia       mozGetUserMedia
RTCPeerConnection      webkitRTCPeerConnection  RTCPeerConnection
RTCSessionDescription  RTCSessionDescription    RTCSessionDescription
RTCIceCandidate        RTCIceCandidate          RTCIceCandidate

You can smooth over these differences with a simple shim like the one below, or use adapter.js to make up for the inconvenience:

navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia

Here, let’s do it step by step. If you want to interact with video, you should first capture audio and video.

Capture audio and video

To capture audio and video in WebRTC, you only need to use one API, getUserMedia(). The code is simple:

navigator.getUserMedia = navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia;

// Set the capture constraints
var constraints = {
  audio: false,
  video: true
};

var video = document.querySelector('video');

function successCallback(stream) {
  window.stream = stream; // this is the MediaStream instance mentioned above
  if (window.URL) {
    video.src = window.URL.createObjectURL(stream); // create a playable src for the video
  } else {
    video.src = stream;
  }
}

function errorCallback(error) {
  console.log('navigator.getUserMedia error: ', error);
}

// The basic call format of getUserMedia:
navigator.getUserMedia(constraints, successCallback, errorCallback);

For a detailed demo, see WebRTC. If you use the Promise form, getUserMedia is written as:

navigator.mediaDevices.getUserMedia(constraints)
    .then(successCallback)
    .catch(errorCallback);

The comments above should make the basics clear. One caveat: when capturing video, be explicit about the constraint parameters you actually need.
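Also note that window.URL.createObjectURL(stream) reflects the era this snippet comes from; current browsers instead attach the stream via the video element's srcObject property. A minimal sketch of the modern form:

navigator.mediaDevices.getUserMedia({ audio: false, video: true })
  .then(function (stream) {
    var video = document.querySelector('video');
    video.srcObject = stream; // preferred over createObjectURL today
    video.play();
  })
  .catch(function (error) {
    console.log('getUserMedia error: ', error);
  });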

Once you have your own video, how do you share it with others? (Think of this as the broadcast step.) WebRTC provides RTCPeerConnection to help us establish a connection quickly. However, that is only the middle step of establishing a peer-to-peer link; there are complicated procedures and additional protocols involved, so let's go through them step by step.

Basic content of WebRTC

WebRTC transmits video packets over UDP. The advantage is low latency and not being overly concerned with packet order. However, UDP is merely a transport-layer protocol; WebRTC still has a lot of work to do on top of it:

  1. Traverse the NAT layers to find the specified peer
  2. Negotiate basic information so that both parties can play the video correctly
  3. Secure the information during transmission

The overall architecture is as follows:

Protocols such as ICE/STUN/TURN will be covered later. First, let's look at how the two sides negotiate information; this is usually what we call signaling.

Signaling task

Signaling is essentially a negotiation process. Before both ends can enter WebRTC video communication, they need to know some basic information:

  • Instructions to open/close a connection
  • Video information, such as codecs, codec settings, bandwidth, and video format
  • Key data, equivalent to the master key in HTTPS, used to secure the connection
  • Network information, such as the IP addresses and ports of both parties

However, the signaling process is not written into the specification, meaning it doesn't matter which protocol you use as long as it is secure. Why? Because different applications each have the negotiation method best suited to them. For example:

  • Single-gateway protocols (SIP/Jingle/ISUP), suited to calling mechanisms (VoIP, Voice over IP)
  • Custom protocols
  • Multi-gateway protocols
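If you go the custom route, the wire format is entirely up to you. As a purely illustrative sketch (the helper and field names here are invented, not taken from any spec):

// Hypothetical signaling envelope; WebRTC mandates no wire format
function makeSignal(type, payload) {
  return JSON.stringify({ type: type, payload: payload });
}

// e.g. send makeSignal('offer', offer.sdp) after createOffer,
//      makeSignal('candidate', evt.candidate) from onicecandidate,
//      and makeSignal('bye', null) to end the session.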

We can simulate a signaling channel ourselves. Its principle is simply to relay information; for convenience, we can use Socket.IO to create rooms that give the two ends a channel to exchange information, as in the sketch below.
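A rough client-side sketch of such a channel (the 'join' and 'message' event names are invented for illustration; a matching Socket.IO server would have to relay them within the room):

function SignalingChannel(roomId) {
  var self = this;
  this.socket = io();               // connect to the Socket.IO server
  this.socket.emit('join', roomId); // enter a "room" so two peers can meet
  this.socket.on('message', function (msg) {
    if (self.onmessage) self.onmessage(msg); // hand incoming messages to the app
  });
}

SignalingChannel.prototype.send = function (msg) {
  this.socket.emit('message', msg); // relay through the server to the other peer
};

The SignalingChannel used in the following snippets can be thought of as exactly this kind of thin wrapper.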

Establishing the PeerConnection

Let's say we've now set up a communication channel through Socket.IO. We can then move on to the RTCPeerConnection part to establish the connection. First, we should use signaling to exchange basic information. So what is that information? WebRTC has already defined it for us at the bottom layer: the Session Description Protocol (SDP). We use signaling to deliver each side's SDP and make sure the two parties match up correctly. The underlying engine parses the SDP automatically (thanks to JSEP), so we don't have to parse it manually. Suddenly the world feels great... So let's see how this is done.

// Make use of the channel already created
var signalingChannel = new SignalingChannel();
// Create the RTC connection; this is equivalent to creating a peer endpoint
var pc = new RTCPeerConnection({});

// Promise-based capture
navigator.mediaDevices.getUserMedia({ "audio": true })
  .then(gotStream)
  .catch(logError);

function gotStream(stream) {
  pc.addStream(stream);
  // Create the local SDP
  pc.createOffer(function(offer) {
    pc.setLocalDescription(offer);
    signalingChannel.send(offer.sdp);
  });
}

function logError() { ... }

What does the SDP format look like? Here's a sample; there's no need to understand it in depth:

v=0
o=- 1029325693179593971 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:nHtT
a=ice-pwd:cuwglAha5fBmGljFXWntH1VN
a=fingerprint:sha-256 24:63:EB:DD:18:1B:BB:5E:B3:E8:C5:D7:92:F7:0B:44:EC:22:96:63:64:76:1A:56:64:DE:6B:CE:85:C6:64:78
a=setup:active
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=inactive
a=rtcp-mux
...

The preceding procedure is the peer-to-peer negotiation process. There are two basic concepts: offer and answer.

  • Offer: the host end provides other users with the basic information of its own live stream
  • Answer: the viewer end replies to the host, indicating whether the stream can be played normally

The specific process is:

  1. The host side generates its SDP description via createOffer
  2. The host sets the local description with setLocalDescription
  3. The host sends the offer SDP to the viewer
  4. The viewer sets the remote description with setRemoteDescription
  5. The viewer creates its own SDP description with createAnswer
  6. The viewer sets the local description with setLocalDescription
  7. The viewer sends the answer SDP to the host
  8. The host sets the remote description with setRemoteDescription
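To make steps 4-7 concrete, the viewer side might look roughly like this (a sketch; SignalingChannel is the hypothetical channel from earlier, assumed here to deliver the raw offer SDP):

var pc = new RTCPeerConnection({});
var signalingChannel = new SignalingChannel();

signalingChannel.onmessage = function (offerSdp) {
  var offer = new RTCSessionDescription({ type: 'offer', sdp: offerSdp });
  pc.setRemoteDescription(offer)             // step 4
    .then(function () {
      return pc.createAnswer();              // step 5
    })
    .then(function (answer) {
      return pc.setLocalDescription(answer)  // step 6
        .then(function () {
          signalingChannel.send(answer.sdp); // step 7: send the answer back
        });
    });
};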

However, the above only establishes the connection information for the two ends; it doesn't yet involve transmitting the video itself, that is, the UDP transport. UDP transport is painful work. In a client-server model you could simply transmit directly, but this is peer-to-peer: think about it, you're now using your own computer as the server. How do you break through the firewall? How do you find a usable port? How do you cross network segments? So we need additional protocols, namely STUN/TURN/ICE, to help us complete this transport task.

NAT/STUN/TURN/ICE

During UDP transmission, you will inevitably encounter a Network Address Translator (NAT). Its main job is to pass messages from other network segments to the machines in the segment it is responsible for. However, when our UDP packets pass through the NAT host, the packet won't be forwarded successfully unless the NAT already holds a mapping entry for the destination machine. A client-server setup doesn't hit this problem, but since we transmit peer-to-peer, we can't avoid it.

To solve this, we need to establish an end-to-end connection. The solution is simple: set up a server in the middle that keeps the destination machine's mapping entry alive in the NAT. The common protocols are STUN, TURN, and ICE. What's the difference between them?

  • STUN: the most basic NAT traversal server; it keeps an entry reserved in the NAT for the specified machine
  • TURN: a relay server used when STUN fails
  • ICE: selects the most efficient delivery path among many STUN + TURN servers

So, the above three are usually used in combination. Their roles in PeerConnection are shown below:

When ICE is involved, we also need to pre-set the designated STUN/TURN servers when instantiating the peer connection.

var ice = {
  "iceServers": [
    // STUN server
    { "url": "stun:stun.l.google.com:19302" },
    // TURN servers
    {
      "url": "turn:192.158.29.39:3478?transport=udp",
      "credential": "JZEOEt2V3Qb0y27GRntt2u2PAYA=",
      "username": "28224511:1379330808"
    },
    {
      "url": "turn:192.158.29.39:3478?transport=tcp",
      "credential": "JZEOEt2V3Qb0y27GRntt2u2PAYA=",
      "username": "28224511:1379330808"
    }
  ]
};
var signalingChannel = new SignalingChannel();
// The ICE servers must be passed in when instantiating the peer connection
var pc = new RTCPeerConnection(ice);
navigator.getUserMedia({ "audio": true }, gotStream, logError);

function gotStream(stream) {
  pc.addStream(stream); // add the stream to the connection
  pc.createOffer(function(offer) {
    pc.setLocalDescription(offer);
  });
}

// Use ICE to listen for the gathering of candidates
pc.onicecandidate = function(evt) {
  if (evt.target.iceGatheringState == "complete") {
    pc.createOffer(function(offer) {
      console.log("Offer with ICE candidates: " + offer.sdp);
      signalingChannel.send(offer.sdp);
    });
  }
};
...

In ICE processing there are also two state properties, iceGatheringState and iceConnectionState. Reflected in code:

  pc.onicecandidate = function(evt) {
    evt.target.iceGatheringState;
    pc.iceGatheringState;
  };
  pc.oniceconnectionstatechange = function(evt) {
    evt.target.iceConnectionState;
    pc.iceConnectionState;
  };

Of course, the main player is onicecandidate.

  • iceGatheringState: the state of gathering local candidates. It has three values:
    • new: the candidate object has just been created
    • gathering: ICE is collecting local candidates
    • complete: ICE has finished collecting local candidates
  • iceConnectionState: the state of the connection to remote candidates. Its states are more complex; there are seven in total: new/checking/connected/completed/failed/disconnected/closed
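A small sketch that simply logs the two state machines as they advance (assuming pc is an existing RTCPeerConnection):

pc.onicecandidate = function () {
  console.log('gathering:', pc.iceGatheringState);   // new -> gathering -> complete
};
pc.oniceconnectionstatechange = function () {
  console.log('connection:', pc.iceConnectionState); // new/checking/connected/...
  if (pc.iceConnectionState === 'failed') {
    console.warn('ICE failed; a TURN relay may be required');
  }
};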

However, to better explain the basic flow of WebRTC connection establishment, let's simulate it with a single-page connection. Suppose there are two users, pc1 and pc2: pc1 captures the video, then pc2 establishes a connection with pc1 to achieve a pseudo-live-broadcast effect. Straight to the code:

  var servers = null;
  // Add pc1 to global scope so it's accessible from the browser console
  window.pc1 = pc1 = new RTCPeerConnection(servers);
  pc1.onicecandidate = function(e) {
    onIceCandidate(pc1, e);
  };
  // Add pc2 to global scope so it's accessible from the browser console
  window.pc2 = pc2 = new RTCPeerConnection(servers);
  pc2.onicecandidate = function(e) {
    onIceCandidate(pc2, e);
  };
  pc1.oniceconnectionstatechange = function(e) {
    onIceStateChange(pc1, e);
  };
  pc2.oniceconnectionstatechange = function(e) {
    onIceStateChange(pc2, e);
  };
  // Once the candidate has been added, the stream will play
  pc2.onaddstream = gotRemoteStream;
  // pc1 adds the local stream to the connection
  pc1.addStream(localStream);

  pc1.createOffer(
    offerOptions
  ).then(
    onCreateOfferSuccess,
    error
  );

  function onCreateOfferSuccess(desc) {
    // desc is the SDP data
    pc1.setLocalDescription(desc).then(
      function() {
        onSetLocalSuccess(pc1);
      },
      onSetSessionDescriptionError
    );
    trace('pc2 setRemoteDescription start');
    pc2.setRemoteDescription(desc).then(
      function() {
        onSetRemoteSuccess(pc2);
      },
      onSetSessionDescriptionError
    );
    trace('pc2 createAnswer start');
    pc2.createAnswer().then(
      onCreateAnswerSuccess,
      onCreateSessionDescriptionError
    );
  }

The code above on its own is probably a little confusing; for something concrete, refer to the single-page live demo. While viewing the page, you can open the console to watch the process in action. One thing you'll notice is that onaddstream fires before SDP negotiation is complete. That is one of the design flaws of this API, and the W3C has removed it from the standard; for our demonstration purposes, though, it isn't a problem. Let's go through the whole process step by step.

  1. pc1 createOffer start
  2. pc1 setLocalDescription start // pc1's SDP
  3. pc2 setRemoteDescription start // pc1's SDP
  4. pc2 createAnswer start
  5. pc1 setLocalDescription complete // pc1's SDP
  6. pc2 setRemoteDescription complete // pc1's SDP
  7. pc2 setLocalDescription start // pc2's SDP
  8. pc1 setRemoteDescription start // pc2's SDP
  9. pc2 received remote stream: at this point the receiving end can already play video. pc2's onaddstream listener fires and receives the remote video stream; note that pc2's SDP negotiation has not completed yet.
  10. The state of pc1's local candidates changes, triggering pc1's onicecandidate. The candidates are then added to the other side via the pc2.addIceCandidate method.
  11. pc2 setLocalDescription complete // pc2's SDP
  12. pc1 setRemoteDescription complete // pc2's SDP
  13. pc1 addIceCandidate success: the candidate is added successfully.
  14. oniceconnectionstatechange fires to check the state of pc1's remote candidates. When the state reaches completed, pc2's onicecandidate event is triggered.
  15. pc2 addIceCandidate success.
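Because both peers live in the same page in this demo, no signaling channel is needed to exchange candidates; an onIceCandidate helper consistent with the code above (a sketch, not copied verbatim from the demo) can simply hand each candidate straight to the other peer:

// Same-page demo only: pass candidates directly to the other peer
function onIceCandidate(pc, event) {
  if (event.candidate) {
    var other = (pc === pc1) ? pc2 : pc1;
    other.addIceCandidate(new RTCIceCandidate(event.candidate))
      .then(function () {
        console.log(getName(pc) + ' addIceCandidate success');
      })
      .catch(function (err) {
        console.log('addIceCandidate error: ', err);
      });
  }
}

function getName(pc) {
  return (pc === pc1) ? 'pc1' : 'pc2';
}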

In addition, there is another concept, RTCDataChannel, which I won't touch on here. If you are interested, see WebRTC and Web performance optimization for further study.