Web Real-Time Communication (WebRTC) is an API that can be used in Web apps such as video chat, audio chat, and P2P file sharing.

— MDN Web Docs

WebRTC is a new front in the long war for an open and unencumbered Web. ✌️

— Brendan Eich

I share these quotes because their popular framing makes a long story short.

What is WebRTC?

Imagine mobile phones, TVs, and computers all communicating through a single platform. Imagine how easy it would be to add video chat and P2P data sharing to your site. That is the vision of WebRTC.

WebRTC is an open source project that enables real-time communication of audio, video, and data in web applications. In real-time communication, capturing and processing audio and video is a complicated job: streaming codecs, noise reduction, echo cancellation, and so on. In WebRTC, all of this is handled by the browser's low-level implementation; we can take the optimized media streams directly and render them to the local screen and speakers, or forward them to a peer. To make a long story short, WebRTC is an API that lets web browsers transmit voice, video, and data in real time.

WebRTC was outlined at the Google I/O conference in 2013: io13webrtc.appspot.com/#1

WebRTC set out to standardize plug-in-free, real-time audio, video, and data communication. The motivations were:

  • Many web services already use RTC, but require downloads, native applications, or plug-ins;
  • Downloading, installing, and upgrading plug-ins is complex, error-prone, and annoying;
  • Plug-ins can be difficult to deploy, debug, troubleshoot, and so on;
  • Plug-ins may require licensed technology, complex integration, and expensive fees;

Therefore, the guiding principle of the WebRTC project is that APIs should be open source, free, standardized, browser-built, and more efficient than existing technologies.

A timeline

  • In 1876, Bell invented the telephone;
  • Global IP Solutions was founded in 1990 in Stockholm, Sweden;
  • Its audio processing engine was later used by Skype, Tencent QQ, WebEx, Google, and others;
  • In February 2010, Google acquired On2, gaining its VPx series of video codecs and open-sourcing them;
  • In May 2010, Google acquired the GIPS engine from VoIP software developer Global IP Solutions for $68.2 million and open-sourced it as well;
  • The WebRTC open source project was born from this: the GIPS audio and video engine, with the open VPx codecs replacing H.264;
  • In 2011, Google launched the WebRTC open source project and its standardization work;
  • In 2012, WebRTC was integrated into Chrome;
  • In 2013, the Google I/O presentation "Real-time communication with WebRTC";
  • In 2017, after years of refinement, the WebRTC 1.0 standard entered the Candidate Recommendation stage;
  • In 2017, Edge and Safari added WebRTC support; since then, all mainstream browsers are compatible ✅;

WebRTC architecture

Color key for the architecture diagram:

  • The purple area is the API layer for web developers;
  • The solid blue line marks the API layer for browser vendors;
  • The dotted blue line marks components that browser vendors can customize.

As we can see in the figure above, WebRTC has three modules:

  • Voice Engine
    • iSAC/iLBC codecs (audio codecs; the former for wideband and ultra-wideband, the latter for narrowband);
    • NetEQ for Voice (handles network jitter and voice packet loss);
    • Echo Canceler / Noise Reduction;
  • Video Engine
    • VP8 codec (video image codec);
    • Video Jitter Buffer (handles video jitter and packet loss);
    • Image enhancements;
  • Data Transport
    • SRTP (secure real-time transport protocol for audio and video streams);
    • Multiplexing;
    • P2P: STUN + TURN + ICE (for NAT and firewall traversal);
    • In addition, DTLS (datagram transport layer security) may be used for encryption and key negotiation;
    • All WebRTC communication is built on UDP;

Core components

  • Audio and video engines: Opus, VP8/VP9, H.264;
  • Transport layer protocol: the underlying transport is UDP;
  • Media protocols: SRTP / SCTP;
  • Data protocol: DTLS / SCTP;
  • P2P NAT traversal: STUN / TURN / ICE / Trickle ICE;
  • Signaling and SDP negotiation: HTTP / WebSocket / SIP, using the Offer/Answer model;

WebRTC audio and video engine

  • The bottom layer is the hardware; above it sit the audio capture and video capture modules;
  • The middle part is the audio and video engines. The audio engine handles audio capture and transmission, with noise reduction, echo cancellation, and similar features; the video engine handles network jitter optimization and codec optimization for Internet transmission;
  • On top of the audio and video engines sits a set of C++ APIs, and on top of the C++ APIs the browsers expose the JavaScript APIs;

WebRTC protocol stack

  • The WebRTC core protocols are built on top of UDP;
  • ICE, STUN, and TURN handle NAT traversal; they solve the problem of obtaining and binding the externally mapped address, and provide a keep-alive mechanism;
  • DTLS encrypts the transmitted content and can be thought of as the UDP version of TLS. In WebRTC this layer is mandatory: all WebRTC components must be encrypted, and the JavaScript API can only be used from secure origins (HTTPS or localhost). The signaling mechanism is not defined by the WebRTC standard, so you must make sure it also uses a secure protocol;
  • SRTP and SCTP are the encapsulation and transport-control protocols for media data;
  • SCTP is a stream control transport protocol that provides TCP-like features; it can be layered over UDP, and in WebRTC it sits above DTLS;
  • RTCPeerConnection is used to establish and maintain the end-to-end connection and to provide efficient audio and video streaming;
  • RTCDataChannel supports end-to-end transmission of arbitrary binary data;
  • The WebRTC protocol stack includes:
    • ICE: Interactive Connectivity Establishment (RFC 5245);
    • STUN: Session Traversal Utilities for NAT (RFC 5389);
    • TURN: Traversal Using Relays around NAT (RFC 5766);
    • SDP: Session Description Protocol (RFC 4566);
    • DTLS: Datagram Transport Layer Security (RFC 6347);
    • SCTP: Stream Control Transmission Protocol (RFC 4960);
    • SRTP: Secure Real-time Transport Protocol (RFC 3711);

Faced with these unfamiliar protocols I was also at a loss (¬_¬), so let's keep studying from a different angle.

How a WebRTC call works

Let's think about the pain points of a WebRTC call. For example, how do you make a real-time audio and video call between two devices with completely different network environments and multimedia hardware?

Media negotiation

First, both ends need to negotiate the media formats they each support.

As shown in the figure above, assume there are two devices, Peer A and Peer B. Through negotiation, the two devices learn that the video codec they both support is H.264. This exchange of media information is carried out with the SDP protocol mentioned earlier.

The Session Description Protocol (SDP) describes the initialization parameters of streaming media. The protocol was published by the IETF as RFC 2327. SDP was originally a part of the Session Announcement Protocol (SAP); the first version was released in April 1998, and it has since been widely used together with RTSP and SIP, as well as on its own to describe multicast sessions.

– Wikipedia

Therefore, in WebRTC, media capabilities are ultimately presented through SDP. Before transmitting media data, we should first negotiate media capabilities to see which encoding methods and resolutions are supported by both sides. The method of negotiation is to exchange media capability information through the signaling server.

The WebRTC media negotiation flow is shown in the figure above.

  • Step 1: Amy calls the createOffer method to create an offer message. The content of the offer is Amy's SDP information.
  • Step 2: Amy calls the setLocalDescription method to save the local SDP information.
  • Step 3: Amy sends the offer message to Bob via the signaling server.
  • Step 4: After Bob receives the offer message, he calls the setRemoteDescription method to store it.
  • Step 5: Bob calls the createAnswer method to create an answer message; likewise, the content of the answer is Bob's SDP information.
  • Step 6: Bob calls the setLocalDescription method to save the local SDP information.
  • Step 7: Bob sends the answer message to Amy via the signaling server.
  • Step 8: After Amy receives the answer message, she calls the setRemoteDescription method to save it.

We can simulate this process in JavaScript:

// A signaling service is used to exchange the SDP between the two ends
const signaling = new SignalingChannel();
// Create an RTC peer connection
const pc = new RTCPeerConnection(null);

// Triggered when the connection needs to (re)negotiate media data
pc.onnegotiationneeded = async () => {
    // Create the offer and save it locally
    await pc.setLocalDescription(await pc.createOffer());
    // Then hand it to the signaling service to transfer
    signaling.send({desc: pc.localDescription});
};

// Use the signaling service to exchange SDP information
signaling.onmessage = async ({desc}) => {
    if (desc) {
        // After receiving an SDP offer, an SDP answer is required
        if (desc.type === 'offer') {
            await pc.setRemoteDescription(desc); // Save the remote SDP
            // Set the local SDP and answer the remote end
            await pc.setLocalDescription(await pc.createAnswer());
            // Send it back through signaling
            signaling.send({desc: pc.localDescription});
        } else if (desc.type === 'answer') {
            // If an answer is received, store it locally; the media information exchange is complete
            await pc.setRemoteDescription(desc);
        }
    }
};

As for the media format, I will try to study it further in the future.
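In the meantime, if you are curious what the browser will actually offer, a quick, hedged way to peek at the codec lines of the SDP is to create a throwaway RTCPeerConnection; the filtering below is purely illustrative:

// Minimal sketch: inspect the media capabilities the browser advertises in its SDP offer.
const probe = new RTCPeerConnection();
probe.addTransceiver('audio');
probe.addTransceiver('video');
probe.createOffer().then((offer) => {
    // Print only the media sections (m=) and codec mappings (a=rtpmap) of the SDP
    offer.sdp
        .split('\r\n')
        .filter((line) => line.startsWith('m=') || line.startsWith('a=rtpmap'))
        .forEach((line) => console.log(line));
    probe.close();
});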

Network negotiation

Each end needs to understand the other's network so that a path for communication can be found. Here is a summary before we jump into another long and complicated topic that I don't fully understand yet.

The ideal steps for network negotiation are:

  • Obtain the public (extranet) IP address mapping of the current end
  • Exchange network information through the signaling service: the signaling constant in the code above

Question after question:

  • But how do we get the current end's public IP address mapping?
  • And why distinguish between intranet and extranet IP addresses?

A quick story about IPv4

Note: The following story is from the website 🐒 :

On February 3, 2011 (Chinese New Year), IANA announced that the last five address blocks in the IPv4 address space had been allocated to the five regional registries. On April 15, 2011, the Asia-Pacific registry, APNIC, announced that the region had run out of IPv4 addresses except for a few reserved blocks. All of a sudden IPv4 addresses soared in value as an endangered resource, and web companies paid huge sums to buy up the remaining spare addresses. In fact, the IPv4 address shortage is not a new problem: as early as 20 years before, the prospect of running out of IPv4 addresses was already in front of Internet pioneers. It makes one wonder what technology delayed the crisis for as long as 20 years.

IPv4 stands for Internet Protocol version 4. IPv4 defines a hypernet that spans heterogeneous networks and assigns a globally unique IP address to each node of the Internet. If we compare the Internet to a postal system, an IP address is the equivalent of a complete address, including the city, block, and house number. IPv4 uses a 32-bit integer to represent an address, so the address space is at most 2^32, or about 4.3 billion. Considering the devices that could be connected to the Internet when IP was created, such a space seemed large and hard to exhaust in a short time. But reality exceeded expectations: computer networks expanded rapidly over the following decades, and the number of network terminals grew explosively.

To make matters worse, the 4.3 billion addresses were divided into class A, B, C, and D networks and reserved ranges with different prefix lengths for routing and administration purposes. There are 127 class A network segments, each containing about 16.78 million host addresses, and 16,384 class B segments, each containing 65,536 host addresses.

IANA assigned class A network addresses to very large enterprises and organizations, one segment at a time, and class B network addresses to medium-sized enterprises or educational institutions, also one segment at a time. This allocation policy wasted IP addresses badly: many allocated addresses were never really used, yet consumption stayed fast. By the early 1990s, network experts realized that at this pace IPv4 addresses would soon run out. As a result, people began to consider alternatives to IPv4 and took a series of measures to slow the consumption of IPv4 addresses.

The Internet originally hoped that everyone would interconnect, with IP addresses guaranteeing unique and accurate connectivity, but nobody expected so many terminal devices to join. Hence NAT, which has proven to help slow the depletion of the available IP address space. NAT is specified in RFC 2663.

NAT & NAT traversal

NAT is network address translation: it replaces the address information in the IP packet header. NAT is usually deployed at the egress of an organization's network, translating internal IP addresses into the egress IP address to provide public-network reachability and upper-layer protocol connectivity.

RFC 1918 reserves three private address ranges:

  • 10.0.0.0 – 10.255.255.255
  • 172.16.0.0 – 172.31.255.255
  • 192.168.0.0 – 192.168.255.255

These three ranges fall within the class A, B, and C address spaces respectively. They are not assigned to specific users; IANA reserves them as private addresses, and they can be used inside any organization or enterprise. The difference between these addresses and other Internet addresses is that they can only be used internally and cannot be used as globally routable addresses. For a network that needs Internet access but uses private IP addresses internally, a NAT gateway is deployed at the organization's egress. When packets leave the private network for the Internet, the source IP address is replaced with a public IP address, usually the address of the egress device's interface. When an external access request reaches its destination, it appears to have been initiated by the organization's egress device, so the requested server can send its response back to the egress gateway over the Internet. The egress gateway then replaces the destination address with the private-network host's address and forwards the packet to the internal network. In this way, requests and responses between a private-network host and a public-network server complete without either end being aware of the translation, and a large number of internal hosts no longer need public IP addresses.

All NATs fall into one of several categories:

  1. Static NAT: a single private IP address is mapped to a single public address, i.e. one private IP address is translated into one public IP address.
  2. Dynamic NAT: multiple private IP addresses are mapped onto a pool of public IP addresses. This is used when we know how many users will need to access the Internet at a given point in time.
  3. PAT (NAT overloading): many local (private) IP addresses are translated into a single public IP address, with port numbers used to distinguish the traffic, that is, to tell which traffic belongs to which address. This is the most common approach because it is cost-effective: thousands of users can reach the Internet through a single real global (public) IP address.

STUN protocol

Introduction

STUN (Simple Traversal of UDP through NATs) is a network protocol that lets a client behind a NAT discover its public address, which type of NAT it sits behind, and which Internet-side port the NAT has bound to a given local port. This information is used to set up UDP communication between two hosts that both sit behind NAT routers. The default port number is 3478. STUN classifies NAT implementations into four categories:

  1. Full-cone NAT, also known as one-to-one NAT
  • All requests from the same internal IP address and port are mapped to the same external IP address and port, and any external host can send packets to the internal host via that mapped external IP address and port.

  2. (Address-)restricted-cone NAT
  • All requests from the same internal IP address and port are mapped to the same external IP address and port. Unlike the full cone, an external host can only send packets to an internal host that has previously sent packets to it.

  3. Port-restricted cone NAT
  • Very similar to the restricted-cone NAT, except that the restriction also includes the port number: an external host with IP address X and port P can only send packets to an internal host that has previously sent packets to X:P.

  4. Symmetric NAT
  • Every request from the same internal IP address and port to a specific destination IP address and port is mapped to the same external IP address and port; if the same host sends packets from the same source address and port to a different destination, the NAT uses a different mapping. In addition, only an external host that has received packets from the internal host can send packets back to it.

Approach

Once the client has learned the UDP port on its NAT's public side, communication can begin. If the NAT is a full cone, either party can initiate communication. If it is a restricted cone or port-restricted cone, both sides must start transmitting at roughly the same time.

Note that it is not necessary to use STUN itself to use the technique described in the STUN RFC; you can design a separate protocol and integrate the same functionality into a server running that protocol (as TURN does).

Protocols like SIP use UDP packets to transfer audio and video data over the Internet. Unfortunately, because both ends of the communication often sit behind NAT, a connection cannot be created using traditional methods. This is where STUN comes in.

STUN is a client-server protocol. A VoIP phone or software package might include a STUN client, and the RTCPeerConnection interface in WebRTC lets us query a STUN server directly. The client sends a request to the STUN server, which reports back the public IP address of the NAT router and the port the NAT has opened to allow incoming traffic back to the internal network, so the client can assemble correct UDP packets.

The response also lets the STUN client determine the type of NAT in use, because different NAT types handle incoming UDP packets differently. Three of the four main types can be traversed this way: full-cone NAT, restricted-cone NAT, and port-restricted cone NAT; symmetric NAT (also known as bidirectional NAT), which is often used in large corporate networks, cannot.
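To see this in practice, here is a minimal, hedged sketch: RTCPeerConnection is pointed at a public STUN server (stun.l.google.com:19302 is the commonly cited example; substitute your own), and the server-reflexive (srflx) candidate it gathers carries the public address that the STUN server observed:

// Minimal sketch, run in a browser; the STUN server URL is an assumption, use your own.
const pc = new RTCPeerConnection({
    iceServers: [{urls: 'stun:stun.l.google.com:19302'}]
});

pc.onicecandidate = ({candidate}) => {
    if (!candidate) return; // a null candidate means gathering has finished
    // A candidate string looks roughly like:
    // "candidate:... 1 udp ... <public ip> <port> typ srflx raddr <local ip> rport <local port>"
    if (candidate.candidate.includes('typ srflx')) {
        console.log('Public mapping discovered via STUN:', candidate.candidate);
    }
};

// Creating a data channel and an offer is enough to start ICE candidate gathering
pc.createDataChannel('probe');
pc.createOffer().then((offer) => pc.setLocalDescription(offer));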

Algorithm

Note: in the flowchart above, once a path ends in the red box, UDP communication is not possible; once it ends in a yellow or green box, a connection is possible.

TURN protocol

Symmetric NAT can be traversed using the TURN protocol. TURN allows a host to use a relay service to exchange packets with its peer. TURN differs from other relay protocols in that it allows a client to communicate with multiple peers at once through a single relay address. This compensates nicely for STUN's inability to traverse symmetric NAT.

RTCPeerConnection attempts to establish direct communication between peers over UDP.

If this fails, RTCPeerConnection tries TCP. If TCP also fails, a TURN server can be used as a fallback to relay data between the endpoints. To reiterate: TURN is used to relay audio/video/data streams between peers. The TURN server has a public address, so peers can reach it even when they are behind a firewall or proxy. TURN servers have a conceptually simple job (relaying data streams), but unlike STUN servers they consume a lot of bandwidth; in other words, TURN servers need to be beefier.
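For reference, a hedged sketch of how a TURN server is typically handed to RTCPeerConnection (the host name and credentials below are placeholders, not a real service); setting iceTransportPolicy to 'relay' forces all traffic through the relay, which is a convenient way to confirm that the TURN path actually works:

// Minimal sketch; turn.example.org and the credentials are hypothetical placeholders.
const pc = new RTCPeerConnection({
    iceServers: [
        {urls: 'stun:stun.example.org:3478'},
        {
            urls: 'turn:turn.example.org:3478',
            username: 'neo',      // must match a user configured on the TURN server
            credential: 'tape'
        }
    ],
    // Optional: gather only relay candidates, so the call succeeds only if TURN works
    iceTransportPolicy: 'relay'
});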

I'll dig into the specific principles another time; they are not the focus of this exploration. The important thing is that with these two protocols we can easily obtain the public address mapping of the current end.

Deploying the STUN and TURN services

Note: the STUN service must be deployed on a machine that has its own public IP address. I chose an Ali Cloud ECS instance for the experiment.

  • The STUN and TURN server source code can be obtained from code.google.com/p/rfc5766-turn-server, which also links to several sources of server installation information.
  • VM images for Amazon Web Services are also available.
  • Another TURN server is restund, which provides source code and also offers AWS images.

Here I explore another STUN/TURN implementation, coturn. The coturn server is a complete implementation of the STUN/TURN protocols and supports P2P traversal through firewalls; it supports TCP, UDP, TLS, and DTLS connections, and runs on Linux and macOS (Windows is not supported).

Deployment steps:

Ali Cloud panel configuration

  1. Enable the UDP access rule for port 3478 in the security group, as shown in the following figure:

  2. View the intranet IP address and public (extranet) IP address of the instance and record them:

Deploying the service on the ECS instance

  1. git clone / configure / make / make install
git clone https://github.com/coturn/coturn
cd coturn
./configure
make
sudo make install

# If libevent2 is not installed, install libevent-devel first
yum install libevent-devel
# openssl-devel should also be installed
yum -y install openssl-devel
  2. The configuration file
  • After coturn compilation is complete, a configuration file template is generated automatically at /usr/local/etc/turnserver.conf.default
  • Here is a minimal configuration, turnserver.conf.min:
# server
listening-port=3478          # the default port is 3478
listening-ip=172.16.205.16   # the internal IP address the service listens on
external-ip=118.178.181.100  # the public IP address of the current server
realm=stun.neotape.live      # domain flag; not sure what to put here
no-tls                       # turn off TLS for this minimal service
no-dtls                      # likewise turn off DTLS
mobility                     # enable Mobility ICE, which allows streams to move between devices
no-cli                       # disable the local telnet CLI management interface
verbose                      # verbose log output
fingerprint                  # message fingerprinting, used in WebRTC messages
lt-cred-mech                 # relaying WebRTC through TURN requires long-term credential mode
stale-nonce=3600             # provides more secure access to the TURN service
user=neo:tape                # create a single user in the user database

For the full set of commands and configuration options, see the coturn wiki: github.com/coturn/cotu…

  3. Start the service

turnserver -c turnserver.conf.min   # you can then view the service log
  4. Verify that the service works
  • webrtc.github.io/samples/src…

TURN service validation

STUN service validation. If it fails, you will see an error message, such as the Timeout error below:

These are the two protocols commonly used in WebRTC, STUN and TURN, and coturn is the open source server we use for both. At this point, the task of obtaining the public address mapping of the current end is complete.

Note that ICE, unlike STUN and TURN, is not a protocol but a framework that integrates STUN and TURN. The coturn open source project integrates both STUN and TURN capabilities.

Exchanging network information

The term WebRTC uses to describe network information is a candidate.

Then we need to use the signaling service to exchange candidate information between the two ends:

// A little more work is done than in the example above
const signaling = new SignalingChannel();
// The STUN/TURN configuration can be passed straight into RTCPeerConnection
const configuration = {iceServers: [{urls: 'stuns:stun.example.org'}]};
const pc = new RTCPeerConnection(configuration);

// Triggered when the local ICE agent needs to pass a candidate to the peer via the signaling server
pc.onicecandidate = ({candidate}) => signaling.send({candidate});

pc.onnegotiationneeded = async () => {
  try {
    await pc.setLocalDescription(await pc.createOffer());
    signaling.send({desc: pc.localDescription});
  } catch (err) {
    console.error(err);
  }
};

signaling.onmessage = async ({desc, candidate}) => {
  try {
    if (desc) {
      if (desc.type === 'offer') {
        await pc.setRemoteDescription(desc);
        const stream =
          await navigator.mediaDevices.getUserMedia(constraints);
        stream.getTracks().forEach((track) =>
          pc.addTrack(track, stream));
        await pc.setLocalDescription(await pc.createAnswer());
        signaling.send({desc: pc.localDescription});
      } else if (desc.type === 'answer') {
        await pc.setRemoteDescription(desc);
      } else {
        console.log('Unsupported SDP type.');
      }
    } else if (candidate) {
      // Save the candidate information received from the peer
      await pc.addIceCandidate(candidate);
    }
  } catch (err) {
    console.error(err);
  }
};

Signaling server

WebRTC cannot create a connection without some kind of server in the middle, which we call the signaling channel. Whether it's e-mail, a postcard, or a carrier pigeon, information can be exchanged over any communication channel; it's up to you. This also means there are many ways to build a signaling service, as long as the two RTC endpoints can correctly receive and save each other's SDP and candidates. I use Socket.IO to implement a simple signaling service:

const os = require('os'); // operating system lib
const nss = require('node-static'); // node-static server lib
const http = require('http'); // node http lib
const socketIO = require('socket.io'); // socket.io lib

const fs = new (nss.Server)('./server/template'); // fs here is not file system :( but file server :) anyway

// create an http server
const app = http.createServer((req, res) => {
    fs.serve(req, res);
}).listen(8080);

const io = socketIO.listen(app); // attach socket.io to the http server created above

// when io finds a client connected
io.sockets.on('connection', function (socket) {

    // convenience func to send server messages to the client
    function notify() {
        let array = ['[[[Server Notifications]]]:']; // init message list
        array.push.apply(array, arguments); // push args to message list
        socket.emit('notify', array); // emit messages
    }

    // got a message, then resend it to the other client
    socket.on('message', function (message) {
        notify('Client said:', message);
        socket.broadcast.emit('message', message); // broadcast the message to other clients; here the maximum number of clients is 2
    });

    socket.on('create or join', function (room) {
        notify('Received request to create or join room: ' + room);
        let clientsInRoom = io.sockets.adapter.rooms[room]; // members currently in the room
        let nums = clientsInRoom ? Object.keys(clientsInRoom.sockets).length : 0;
        notify(`Room ${room} now has ${nums} client(s)`);

        // if there is no member in the current room
        if (nums === 0) {
            socket.join(room); // create a new room
            notify(`Client ID ${socket.id} created room ${room}`);
            socket.emit('created', room, socket.id); // tell the client the room was created successfully
        } else if (nums === 1) {
            notify(`Client ID ${socket.id} joined room ${room}`);
            io.sockets.in(room).emit('join', room); // emit a join signal to the peer already in the room
            socket.join(room); // join the room
            socket.emit('joined', room, socket.id); // tell the client it joined the room successfully
            io.sockets.in(room).emit('ready'); // both clients are now in the room, so emit ready to both of them
        } else { // max two clients
            socket.emit('full', room);
        }
    });

    socket.on('ipaddr', function () {
        const ifaces = os.networkInterfaces(); // get network interfaces
        for (let dev in ifaces) {
            ifaces[dev].forEach(function (details) {
                if (details.family === 'IPv4' && details.address !== '127.0.0.1') {
                    socket.emit('ipaddr', details.address); // emit the server ip address to the socket
                }
            });
        }
    });

    // bye handler
    socket.on('bye', function () {
        console.log('received bye from client');
        notify('bye~');
    });
});

Conclusion

With a signaling service plus the STUN & TURN services, we can complete the entire process of establishing a WebRTC call. The diagram above is a snapshot of how the connection is made.

WebRTC JavaScript APIs

  • getUserMedia(): capture audio and video
  • RTCPeerConnection: stream audio and video between users
  • RTCDataChannel: transfer data between users
  • MediaRecorder: record audio and video

With these APIs, we can implement many interesting RTC features on the Web.
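Of these, MediaRecorder is the only one not demonstrated later in this article, so here is a minimal, hedged sketch of recording a getUserMedia stream into a Blob (the five-second duration and the lack of an explicit mimeType are just illustrative choices):

// Minimal sketch: record the camera and microphone for a few seconds and collect a Blob.
navigator.mediaDevices.getUserMedia({audio: true, video: true})
    .then((stream) => {
        const chunks = [];
        const recorder = new MediaRecorder(stream); // let the browser pick a supported mimeType

        recorder.ondataavailable = (event) => chunks.push(event.data);
        recorder.onstop = () => {
            const blob = new Blob(chunks, {type: recorder.mimeType});
            console.log('Recorded', blob.size, 'bytes of', blob.type);
            // The blob can now be played back, uploaded, or downloaded via URL.createObjectURL(blob)
        };

        recorder.start();
        setTimeout(() => recorder.stop(), 5000); // stop after 5 seconds
    })
    .catch((err) => console.error('getUserMedia failed:', err.name));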

WebRTC browser compatibility: webrtc-adapter

Origin

Adapter.js has been around since WebRTC's early days in late 2012 or early 2013. It was a very small project at the time, not even 150 lines. Its main feature was to hide prefix differences like webkitRTCPeerConnection and mozRTCPeerConnection, and to provide helper functions for attaching a MediaStream to an HTML media element. As browser support for WebRTC grew, adapter.js came to be used to mask the differences between browsers and provide a unified interface. Its complexity has kept increasing and it currently exceeds 2,200 lines of code.

Usage

  • As an npm module: www.npmjs.com/package/web… (see the sketch after this list)
  • As a script include: webrtc.github.io/adapter/ada…
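As a small, hedged sketch of the npm route (assuming a bundler resolves the package), importing webrtc-adapter applies the shims as a side effect, and the export also reports which browser it detected:

// Minimal sketch, assuming a module bundler such as webpack or vite.
import adapter from 'webrtc-adapter';

// Importing the module is enough to patch the prefixed/legacy APIs.
console.log(adapter.browserDetails.browser, adapter.browserDetails.version);

// From here on, getUserMedia / RTCPeerConnection can be used through a unified interface.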

WebRTC audio and video calls

Secure origin restriction

Because of the secure-origin policy of mainstream browsers, the audio and video capture capabilities and the WebRTC JavaScript APIs are not available by default when the page is served from anything other than localhost or an HTTPS address. There are two solutions:

  1. Launch Chrome from the terminal with the flag --unsafely-treat-insecure-origin-as-secure="http://xxx"
  2. Use HTTPS (WSS enabled for WebSocket)
  • For example, I use nginx to forward port 443 to the Node service, so the WebSocket path needs an additional forwarding rule:

location /ws/ {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

getUserMedia API

Introduction

MediaDevices.getUserMedia() prompts the user for permission to use a media input device and produces a MediaStream containing tracks of the requested media types. The stream can contain a video track (from a hardware or virtual video source such as a camera, video capture device, or screen sharing service), an audio track (likewise from a hardware or virtual audio source such as a microphone or A/D converter), or other kinds of tracks.

It returns a Promise. On success it resolves with a MediaStream object; if the user denies permission, or the requested media source is unavailable, the Promise is rejected with PermissionDeniedError or NotFoundError respectively.

The returned Promise may neither resolve nor reject, because the user is not required to make a choice and can simply ignore the permission prompt.
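If that pending-forever case matters for your UI, one hedged workaround (the helper below is illustrative, not a standard API) is to race the call against a timeout:

// Minimal sketch: give up waiting for the permission prompt after a while.
function getUserMediaWithTimeout(constraints, ms = 15000) {
    const timeout = new Promise((resolve, reject) =>
        setTimeout(() => reject(new Error('getUserMedia prompt timed out')), ms));
    return Promise.race([navigator.mediaDevices.getUserMedia(constraints), timeout]);
}

getUserMediaWithTimeout({audio: true, video: true})
    .then((stream) => console.log('Got tracks:', stream.getTracks().map((t) => t.kind)))
    .catch((err) => console.warn(err.message));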

You can usually obtain MediaDevices through navigator.mediaDevices, for example:

navigator.mediaDevices.getUserMedia(constraints)
.then(function(stream) {
  /* Use this stream */
})
.catch(function(err) {
  /* Handle error */
});

Syntax

var promise = navigator.mediaDevices.getUserMedia(constraints)

Parameters

constraints

As a MediaStreamConstraints object, it specifies the media type of the request and the corresponding parameters.

The constraints parameter is a MediaStreamConstraints object that contains both video and audio members and is used to specify the media type of the request. At least one or both types must be specified simultaneously. If the browser cannot find the specified media type or fails to meet the parameters, the returned Promise object is in the Rejected state, with NotFoundError as the parameter to the Rejected callback.

{audio: true, video: true}

If true is set for a media type, the resulting stream needs to have tracks of that type. If one of these is not available for some reason, getUserMedia() will generate an error.

Because the user's camera and microphone capabilities cannot be inspected directly for privacy reasons, an application uses the constraints parameter to request the camera and microphone capabilities it needs or wants. Here's an example of the camera resolution the app would like to use:

{
  audio: true,
  video: { width: 1280, height: 720 }
}

The browser will try to satisfy the request parameters, but it may return other resolutions if the request parameters are not exactly satisfied or if the user chooses to override the request parameters.

To force a specific size, use the keywords min, max, or exact (exact means min == max). The following constraints require a minimum resolution of 1280×720:

{
  audio: true,
  video: {
    width: { min: 1280 },
    height: { min: 720 }
  }
}

If the camera does not support the requested resolution or higher, the Promise returned is in the Rejected state, NotFoundError as the Rejected callback argument, and the user will not be prompted for authorization.

The reason for the difference in behavior is that the keywords min, max, and exact are inherently mandatory, in contrast to plain requested values and the ideal keyword. Here is a more complete example:

{
  audio: true,
  video: {
    width: { min: 1024, ideal: 1280, max: 1920 },
    height: { min: 776, ideal: 720, max: 1080 }
  }
}

When a request contains an Ideal value, this value carries a higher weight, meaning that the browser will first try to find the Settings or camera (if the device has more than one camera) that are closest to the specified ideal value.

Simple request values can also be interpreted as applying ideal values, so our first request to specify resolution can also be written as follows:

{
  audio: true,
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 }
  }
}

Not all constraints are numbers. For example, on mobile devices the following prefers the front-facing camera (if one exists):

{ audio: true, video: { facingMode: "user" } }

To force a rear camera, please use:

{ audio: true, video: { facingMode: { exact: "environment" } } }

Return value

Return a Promise whose successful callback takes a MediaStream object as an argument.

Exceptions

On failure, the returned Promise is rejected, and its rejection callback receives a DOMException object as its argument. Possible exceptions are listed below (a small handling sketch follows the list):

  • AbortError: although the user and the operating system both granted access to the device hardware, and no hardware problem occurred that would raise NotReadableError, some other problem still made the device unusable.

  • NotAllowedError: the user rejected the access request of the current browser instance, or denied access for the current session, or has globally denied all media access requests. Older versions of the specification used SecurityError, but SecurityError has been given a new meaning in the new version.

  • NotFoundError: no media type satisfying the requested parameters could be found.

  • NotReadableError: a hardware, browser, or page-level error occurred at the operating system level, making the device inaccessible even though the user had granted authorization.

  • OverconstrainedError: the specified constraints cannot be satisfied by any device. The exception is an object of type OverconstrainedError with a constraint property naming the constraint that could not be met and a message property containing a human-readable explanation. Because this exception can be thrown even before the user has granted permission, it can be used as a fingerprinting surface.

  • SecurityError: the use of device media is disabled on the Document in which getUserMedia() was called. Whether this mechanism is on or off depends on the preferences of each user.

  • TypeError: the constraints object is empty, or all of its media types are set to false.
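Putting the list above into practice, a hedged sketch of branching on the exception name (the messages are illustrative):

// Minimal sketch: react differently depending on which DOMException getUserMedia() rejects with.
navigator.mediaDevices.getUserMedia({audio: true, video: true})
    .then((stream) => { /* use the stream */ })
    .catch((err) => {
        switch (err.name) {
            case 'NotAllowedError':
                console.warn('The user (or a policy) denied access to the camera/microphone.');
                break;
            case 'NotFoundError':
                console.warn('No media device satisfies the requested media types.');
                break;
            case 'OverconstrainedError':
                console.warn('This constraint cannot be satisfied:', err.constraint);
                break;
            default:
                console.error('getUserMedia failed:', err.name, err.message);
        }
    });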

Examples

Width and height

This example sets the camera resolution and assigns the resulting MediaStream to a video element.

// We want a camera resolution as close as possible to 1280x720
var constraints = { audio: true, video: { width: 1280, height: 720 } };

navigator.mediaDevices.getUserMedia(constraints)
.then(function(mediaStream) {
  var video = document.querySelector('video');
  video.srcObject = mediaStream;
  video.onloadedmetadata = function(e) {
    video.play();
  };
})
.catch(function(err) { console.log(err.name + ": " + err.message); }); // always check for errors at the end

Using the new API in older browsers

This is an example of using navigator.mediaDevices.getUserMedia() with a polyfill so that it also works in older browsers. Note that this polyfill does not fix the remaining differences in constraint syntax, which means constraints may not work well in some browsers. It is recommended to use the adapter.js polyfill, which handles constraints, instead.

// Older browsers may not implement mediaDevices at all, so we can set an empty object first
if (navigator.mediaDevices === undefined) {
  navigator.mediaDevices = {};
}

// Some browsers partially support mediaDevices. We cannot set getUserMedia directly to the object
// This may overwrite existing attributes. Here we will only add the getUserMedia property if it is not there.
if (navigator.mediaDevices.getUserMedia === undefined) {
  navigator.mediaDevices.getUserMedia = function(constraints) {

    // First, get getUserMedia, if there is one
    var getUserMedia = navigator.webkitGetUserMedia || navigator.mozGetUserMedia;

    // Some browsers don't implement it at all - so return an error to the Promise reject to keep a unified interface
    if (!getUserMedia) {
      return Promise.reject(new Error('getUserMedia is not implemented in this browser'));
    }

    // Otherwise wrap a Promise for the old navigator.getUserMedia method
    return new Promise(function(resolve, reject) {
      getUserMedia.call(navigator, constraints, resolve, reject);
    });
  }
}

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
.then(function(stream) {
  var video = document.querySelector('video');
  // Older browsers may not have srcObject
  if ("srcObject" in video) {
    video.srcObject = stream;
  } else {
    // Prevent it from being used in newer browsers as it is no longer supported
    video.src = window.URL.createObjectURL(stream);
  }
  video.onloadedmetadata = function(e) {
    video.play();
  };
})
.catch(function(err) {
  console.log(err.name + ":" + err.message);
});

Frame rate

In some cases, such as when limited bandwidth transport is used on WebRTC, a lower frame rate may be more appropriate.

var constraints = { video: { frameRate: { ideal: 10, max: 15 } } };

Front or rear camera

On a mobile device (phone)

var front = false;
document.getElementById('flip-button').onclick = function() { front = !front; };

var constraints = { video: { facingMode: (front ? "user" : "environment") } };

Permissions

To use getUserMedia() in an installable app such as the Firefox OS app, you need to specify the following permissions in the declaration file:

"permissions": {
  "audio-capture": {
    "description": "Required to capture audio using getUserMedia()"
  },
  "video-capture": {
    "description": "Required to capture video using getUserMedia()"}}Copy the code

Video stream acquisition and transmission

// Client side
// RTCPeerConnection is assumed to have been created already
let pc = new RTCPeerConnection({...configs});

// Get the video stream
navigator.mediaDevices.getUserMedia({
    audio: false,
    video: true
})
    .then(gotStream)
    .catch(function (e) {
        alert('getUserMedia() error: ' + e.name);
    });

// Handle the local video stream
function gotStream(stream) {
    localVideo.srcObject = stream;
    sendMessage('got user media'); // Notify the peer to establish a link
    // if this is the end that actively initiates the link
    if (isInitiator) {
        maybeStart();
    }
}

signaling.on('message', function(message) {
    if (message === 'got user media') {
        maybeStart();
    }
});

function maybeStart() {
    // isChannelReady means both devices have joined the session
    if (typeof localStream !== 'undefined' && isChannelReady) {
        console.log('>>>>>> creating peer connection');
        createPeerConnection();
        pc.addStream(localStream); // Add the local stream so it is transmitted to the peer
        // if this is the end that actively initiates the link
        if (isInitiator) {
            doCall(); // Initiate a call to perform the SDP exchange
        }
    }
}

function createPeerConnection() {
    try {
        pc = new RTCPeerConnection({...configs});
        pc.onicecandidate = handleIceCandidate;
        pc.onaddstream = handleRemoteStreamAdded; // At this point the peer's video stream can be obtained
    } catch (e) {
        console.log('Failed to create PeerConnection:', e.message);
    }
}

WebRTC data transfer

RTCDataChannel API

developer.mozilla.org/zh-CN/docs/…

Code sample

var pc = new RTCPeerConnection(servers,
  {optional: [{RtpDataChannels: true}]});

pc.ondatachannel = function(event) {
  receiveChannel = event.channel;
  receiveChannel.onmessage = function(event) {
    document.querySelector("div#receive").innerHTML = event.data;
  };
};

sendChannel = pc.createDataChannel("sendDataChannel", {reliable: false});

document.querySelector("button#send").onclick = function () {
  var data = document.querySelector("textarea#send").value;
  sendChannel.send(data);
};

Closing notes

My little Demo

Try a call at: neotape.live

Debug WebRTC In Chrome

chrome://webrtc-internals/

Video Chat for the Web, Android, and iOS: WebRTC is also available on Android and iOS

docs.google.com/presentatio…

WebRTC & RTMP comparison: RTMP is the more common choice for live-streaming scenarios

segmentfault.com/a/119000001…

WebRTC samples: some examples

webrtc.github.io/samples/

WebRTC in the real world: a good article

www.html5rocks.com/en/tutorial…

WebRTC Experiments: a collection of WebRTC experiments

github.com/muaz-khan/W…

WebRTC's past and present lives: a good translated article

blog.coding.net/blog/gettin…

WebRTC 1.0: Real-time Communication Between Browsers: the WebRTC API documentation

www.w3.org/TR/webrtc/

RTCMultiConnection.js: an open source WebRTC library

www.rtcmulticonnection.org/

Pure Go implementation of the WebRTC API

github.com/pion/webrtc