Background: With the outbreak of COVID-19, real-time audio and video systems such as online video conferencing and online education became essential tools, and market demand for stable, efficient audio and video systems has exploded.

When it comes to audio and video technology, front-end developers inevitably talk about WebRTC. Thanks to its excellent real-time performance and cross-platform support, WebRTC has attracted wide attention ever since its birth. In this article, we take a closer look at this darling of the audio and video world.

MDN's definition of WebRTC: WebRTC (Web Real-Time Communications) is a real-time communication technology that allows Web applications or sites to establish peer-to-peer connections between browsers without an intermediary, enabling the transmission of video streams, audio streams, or any other data. WebRTC includes standards that make it possible to create peer-to-peer data sharing and teleconferencing without installing any plug-ins or third-party software.

WebRTC framework

In general, the architecture of WebRTC looks like this:

As we can see, a simple point-to-point communication system mainly consists of four parts:

  • WebRTC client: responsible for producing and consuming audio and video data. It typically sits behind a NAT, on an intranet.

  • NAT: Network Address Translation, a protocol that maps a device's intranet address to a public address.

  • Signaling server: used to transmit signaling data such as SDP and candidates.

  • STUN/TURN server:

    • STUN: used to discover the public address of a device behind a NAT. The WebRTC client sends a request to a STUN server on the public network to learn its public address and whether it can be reached (through the router).
    • TURN: for symmetric NATs that a STUN server cannot traverse, a TURN server can act as a relay and forward the data between the peers.
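STUN and TURN servers are handed to WebRTC through the iceServers configuration of RTCPeerConnection. A minimal sketch (the URLs and credentials below are placeholders, not real servers):

```javascript
// Placeholder STUN/TURN endpoints -- substitute your own servers.
const iceConfig = {
  iceServers: [
    // STUN: only helps discover the public address.
    { urls: 'stun:stun.example.com:3478' },
    // TURN: relays media when direct traversal fails; requires credentials.
    { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass' },
  ],
};

// In the browser:
// const pc = new RTCPeerConnection(iceConfig);
```

If no TURN server is configured, connections behind symmetric NATs will simply fail rather than fall back to relaying.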

Point-to-point communication principle:

  1. First, each client connects to the signaling server, through which the two parties exchange necessary information about each other, such as the audio and video formats they support (the public IP address and port are not yet known at this point).

  2. Each client then connects to the STUN server to obtain its own public IP address and port and to learn whether NAT traversal is possible. If traversal is not supported, the client falls back to the TURN server for relayed communication.

  3. After a WebRTC client obtains its public IP address and port, it exchanges candidate information with the other party through the signaling server. Once both parties have each other's addresses, they can attempt NAT traversal and establish a P2P connection.

Implementing point-to-point communication with WebRTC

To achieve point-to-point communication, we need to go through the following steps:

  1. Detect local audio and video devices and collect audio and video.
  2. Establish a connection with the peer through the signaling server.
  3. Create an RTCPeerConnection object:
    • Bind audio and video data
    • Conduct media negotiations
    • Exchange candidate information
  4. Transfer and render audio and video data.

Let’s walk through these steps one by one.

Prerequisite knowledge

Before going through the steps for implementing point-to-point communication, let’s review a few underlying concepts.

MediaStreamTrack

MediaStreamTrack is the basic media unit in WebRTC. A MediaStreamTrack contains a single type of media (such as audio or video) produced by a media source (a media device or recorded content). A single track may contain multiple channels: a stereo sound source, for example, is composed of multiple audio channels but is still regarded as a single track.

MediaStreamTrack MDN document

MediaStream

MediaStream is a collection of MediaStreamTracks and can contain zero or more tracks. A MediaStream ensures that all the tracks it contains are played in sync.

MediaStream MDN document

Source and sink

In the WebRTC source code, a MediaTrack is composed of a corresponding source and sink.

// src/pc/video_track.cc
void VideoTrack::AddOrUpdateSink(rtc::VideoSinkInterface<VideoFrame>* sink,
                                 const rtc::VideoSinkWants& wants) {
  RTC_DCHECK(worker_thread_->IsCurrent());
  VideoSourceBase::AddOrUpdateSink(sink, wants);
  rtc::VideoSinkWants modified_wants = wants;
  modified_wants.black_frames = !enabled();
  video_source_->AddOrUpdateSink(sink, modified_wants);
}
 
void VideoTrack::RemoveSink(rtc::VideoSinkInterface<VideoFrame>* sink) {
  RTC_DCHECK(worker_thread_->IsCurrent());
  VideoSourceBase::RemoveSink(sink);
  video_source_->RemoveSink(sink);
}

The browser maintains a media pipeline from source to sink. The source produces media data: static resources such as multimedia files and Web resources, and dynamic resources such as audio captured by a microphone or video captured by a camera. The sink consumes what the source produces, for example media elements such as video and audio tags, or an RTCPeerConnection that transmits the source to the remote end over the network. RTCPeerConnection can play the roles of source and sink at the same time: as a sink, it can reduce the bit rate of the obtained source, scale it, or adjust its frame rate before transmitting it to the remote end; as a source, it passes the received remote stream to the local renderer.
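The source→sink relationship can be illustrated with a toy model (this is not the WebRTC API, just a sketch of the idea): a source pushes each produced frame to every sink registered on it, mirroring the AddOrUpdateSink/RemoveSink calls above.

```javascript
// Toy model of the source/sink pipeline -- illustrative only, not WebRTC code.
class Source {
  constructor() {
    this.sinks = new Set();
  }
  addOrUpdateSink(sink) {
    this.sinks.add(sink); // a real implementation would also apply the sink's "wants"
  }
  removeSink(sink) {
    this.sinks.delete(sink);
  }
  push(frame) {
    // Every registered sink consumes each produced frame.
    this.sinks.forEach(sink => sink.onFrame(frame));
  }
}

class RenderSink {
  constructor() {
    this.frames = [];
  }
  onFrame(frame) {
    this.frames.push(frame); // a real sink would render or forward the frame
  }
}

const camera = new Source();
const video = new RenderSink();
camera.addOrUpdateSink(video);
camera.push('frame-1');
camera.push('frame-2');
// video.frames is now ['frame-1', 'frame-2']
```

In this picture, RTCPeerConnection is simply an object that implements both interfaces: a sink for the local capture source, and a source for the local renderer.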

MediaTrackConstraints

MediaTrackConstraints describes the capabilities of a MediaTrack and the one or more values each capability may take, and is used to select and control sources. MediaTrackConstraints can be passed as a parameter to applyConstraints() to control a track's properties, and getConstraints() returns the most recently applied custom constraints.

const constraints = {
  width: { min: 640, ideal: 1280 },
  height: { min: 480, ideal: 720 },
  advanced: [{ width: 1920, height: 1280 }, { aspectRatio: 1.333 }],
};

// { video: true } is also a MediaTrackConstraints object that specifies the
// requested media type and the corresponding parameters.
navigator.mediaDevices.getUserMedia({ video: true }).then(mediaStream => {
  const track = mediaStream.getVideoTracks()[0];
  track
    .applyConstraints(constraints)
    .then(() => {
      // Do something with the track such as using the Image Capture API.
    })
    .catch(e => {
      // The constraints could not be satisfied by the available devices.
    });
});

More different constraints:

{audio: true, video: {width: 1280, height: 720}}
{audio: true, video: {facingMode: "user"}}
{audio: true, video: {facingMode: {exact: "environment"}}}

How to play a MediaStream

A MediaStream object can be assigned directly to the srcObject property of an HTMLMediaElement.

video.srcObject = stream;

srcObject MDN document

Detect local audio and video devices and collect audio and video

Detect local audio and video devices

With MediaDevices.enumerateDevices() we can get a list of the media input and output devices available on the machine, such as microphones, cameras, and headsets.

// Get the media devices
navigator.mediaDevices.enumerateDevices().then(res => {
  console.log(res);
});

Each media input in the list can be used as a value in MediaTrackConstraints. For example, an audio input device audioDeviceInput can be set as the value of the audio property in MediaTrackConstraints:

const constraints = { audio: audioDeviceInput };

Passing this constraints object as a parameter to MediaDevices.getUserMedia(constraints) yields the MediaStream of that device.
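Putting the two together, a small helper can pick a specific device from the enumerateDevices() result and build the constraints for it. The pickDevice and constraintsFor helpers below are our own sketch, not part of the WebRTC API:

```javascript
// Pick the first device of the given kind ('audioinput', 'videoinput', 'audiooutput').
function pickDevice(devices, kind) {
  return devices.find(device => device.kind === kind) || null;
}

// Build getUserMedia constraints that target a specific device by its deviceId.
function constraintsFor(device) {
  if (!device) return null;
  const track = { deviceId: { exact: device.deviceId } };
  return device.kind === 'audioinput' ? { audio: track } : { video: track };
}

// In the browser:
// navigator.mediaDevices.enumerateDevices()
//   .then(devices => navigator.mediaDevices.getUserMedia(
//     constraintsFor(pickDevice(devices, 'videoinput'))));
```

Note that deviceId values (and device labels) are only fully populated after the user has granted media permission at least once.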

Collect local audio and video data

Local media can be accessed by calling MediaDevices.getUserMedia(). When this method is called, the browser prompts the user for permission to use a media input, and the media input then produces a MediaStream containing tracks of the requested media types. The stream can contain video tracks (from hardware or virtual video sources such as cameras, video capture devices, or screen-sharing services), audio tracks (also from hardware or virtual audio sources such as microphones or A/D converters), or some other track type.

navigator.mediaDevices.getUserMedia(constraints)
  .then(function(stream) {
    /* Use this stream */
    video.srcObject = stream;
  })
  .catch(function(err) {
    /* Handle error */
  });

MediaDevices.enumerateDevices MDN document

MediaDevices.getUserMedia MDN document

Establish a connection with the peer through the signaling server

The signaling server mainly helps us with business logic (such as joining or leaving a room), media negotiation, and candidate exchange.

A signaling server can be implemented in many ways; here we use Node.js and Socket.IO to build a simple one.

Creating an HTTP Server

// Provide the HTTP service
let http = require('http');
let express = require('express');

let app = express();
let http_server = http.createServer(app);
http_server.listen(8081, '127.0.0.1');

Introduce Socket.IO to enable real-time communication between the two ends

let http = require('http'); // provides the HTTP service
const { Server } = require('socket.io');
let express = require('express');

let app = express();

// HTTP server
let http_server = http.createServer(app);
http_server.listen(8081, '127.0.0.1');

const io = new Server(http_server, {
  // CORS configuration
  cors: {
    origin: ['http://127.0.0.1:3000', 'http://localhost:3000'],
    credentials: true,
  },
});

Listen for messages from the client

socket.on('messageName', messageHandler)

Client joins the room

socket.join(roomId);

Send messages to clients in the room

socket.to(roomId).emit('messageName', data);

Forward messages

socket.on('message', ({ roomId, data }) => {
  socket.to(roomId).emit('message', data);
});

Create an RTCPeerConnection object

The RTCPeerConnection interface represents a WebRTC connection between the local computer and a remote peer. It provides methods to create, maintain, monitor, and close the connection.

const pc = new RTCPeerConnection()

Bind audio and video data

We can bind audio and video data to the RTCPeerConnection object using the addTrack method (or the deprecated addStream).

mediaStream.getTracks().forEach(track => {
  peerConnection.addTrack(track, mediaStream);
});

Conduct media negotiations

Media negotiation is the exchange of both parties' SDP information, which includes the audio and video codecs, source addresses, and timing information.

The caller obtains the local SDP (the offer), calls pc.setLocalDescription(offer) to set the local SDP, and sends it to the remote end through the signaling server.

pc.createOffer().then(offer => {
  console.log('------ got local offer', offer);
  // Set the local SDP
  pc.setLocalDescription(offer);
  // Send the local SDP to the remote end
  signalServer.send({ type: 'offer', value: offer });
});

After receiving the offer from the remote end, the callee invokes pc.setRemoteDescription(offer) to set the remote SDP, then calls pc.createAnswer() to create the local SDP and sets it with pc.setLocalDescription(answer). Finally, it sends the answer SDP to the remote end through the signaling server.

const onGetRemoteOffer = offer => {
  console.log('------ got remote offer', offer);
  // The remote end initiated a call; start establishing the connection.
  // Set the remote SDP
  pc.setRemoteDescription(offer);
  // Create the local SDP
  pc.createAnswer().then(answer => {
    // Set the local SDP
    pc.setLocalDescription(answer);
    console.log('------ got local answer', answer);
    // Send the local SDP to the remote end
    signalServer.send({ type: 'answer', value: answer });
  });
};

After receiving the answer from the remote end, the caller invokes pc.setRemoteDescription(answer) to set the remote SDP.

const onGetRemoteAnswer = answer => {
  console.log('------ got remote answer', answer);
  // Set the remote SDP
  pc.setRemoteDescription(answer);
};

ICE

When media negotiation is complete, WebRTC starts setting up the network connection. The prerequisite is that each client knows the public address of its peer. The process of obtaining and exchanging these public addresses is called ICE (Interactive Connectivity Establishment).

Collect

WebRTC handles candidate collection internally. After a candidate is collected, WebRTC notifies the upper layer through the onicecandidate event.

pc.addEventListener('icecandidate', event => {
  const candidate = event.candidate;
  if (candidate) {
    console.log('------ got local candidate:', candidate);
    // ...
  }
});
Exchange

After the candidate is collected, the candidate information can be sent to the remote end through the signaling system.

// Send the candidate to the remote end
signalServer.send({ type: 'candidate', value: candidate });

After receiving a candidate from the peer, the remote end pairs it with a local candidate to form a CandidatePair. With candidate pairs, WebRTC can start trying to establish a connection.

// Got a remote candidate
const onGetRemoteCandidate = candidate => {
  console.log('------ got remote candidate', candidate);
  pc.addIceCandidate(candidate);
};

Remote audio and video data receiving and rendering

When both parties get the candidate information from the other end, WebRTC attempts to establish a connection internally. Once the connection is established, audio and video data begins to flow from sender to receiver.

Through the track event of RTCPeerConnection object, we can receive remote audio and video data and render it.

pc.addEventListener('track', e => {
  console.log('------ got remote media data:', e);
  if (remoteVideo.srcObject !== e.streams[0]) {
    remoteVideo.srcObject = e.streams[0];
  }
});

The complete code

Signaling server code

'use strict';

let http = require('http'); // provides the HTTP service
const { Server } = require('socket.io');
let express = require('express');

const MaxUserNum = 2;

let app = express();
const roomsInfo = {};
const userRoomInfo = {};

// HTTP server
let http_server = http.createServer(app);
http_server.listen(8081, '127.0.0.1');

const io = new Server(http_server, {
  cors: {
    origin: ['http://127.0.0.1:3000', 'http://localhost:3000'],
    credentials: true,
  },
});

io.sockets.on('connection', socket => {
  console.log('got a connection');

  // Forward messages
  socket.on('message', ({ roomId, data }) => {
    console.log('message, room: ' + roomId + ', data type: ' + data.type);
    socket.to(roomId).emit('message', data);
  });

  socket.on('join', ({ roomId }) => {
    if (!roomId) return;
    socket.join(roomId);
    console.log(`${socket.id} join ${roomId}`);

    if (!roomsInfo[roomId]) {
      roomsInfo[roomId] = {};
    }
    roomsInfo[roomId][socket.id] = socket;

    // Record which rooms the user has joined
    if (!userRoomInfo[socket.id]) {
      userRoomInfo[socket.id] = [];
    }
    userRoomInfo[socket.id].push(roomId);

    let userNum = Object.keys(roomsInfo[roomId]).length;

    if (userNum <= MaxUserNum) {
      // Notify the user that they joined the room
      socket.emit('joined', { roomId, userNum });
      // Notify the other users in the room
      if (userNum > 1) {
        socket.to(roomId).emit('otherjoined', { roomId, userId: socket.id });
      }
    } else {
      // The room is full
      socket.leave(roomId);
      socket.emit('full', { roomId, userNum });
    }
  });

  const onLeave = ({ roomId }) => {
    if (!roomId) return;
    socket.leave(roomId);
    roomsInfo[roomId] && roomsInfo[roomId][socket.id] && delete roomsInfo[roomId][socket.id];
    userRoomInfo[socket.id] &&
      (userRoomInfo[socket.id] = userRoomInfo[socket.id].filter(id => id !== roomId));
    console.log(
      'someone leaved the room, the user number of room is: ',
      roomsInfo[roomId] ? Object.keys(roomsInfo[roomId]).length : 0,
    );
    // Notify other users that someone has left
    socket.to(roomId).emit('bye', { roomId, userId: socket.id });
    // Notify the user that they left the room
    socket.emit('leaved', { roomId });
  };

  // The user leaves the room
  socket.on('leave', onLeave);

  // disconnect
  socket.on('disconnect', () => {
    console.log(socket.id, 'disconnect, and clear user`s Room', userRoomInfo[socket.id]);
    if (userRoomInfo[socket.id]) {
      userRoomInfo[socket.id].forEach(roomId => {
        onLeave({ roomId });
      });
      delete userRoomInfo[socket.id];
    }
  });
});

The client-side signaling server wrapper

import { io, Socket } from 'socket.io-client';

interface Option {
  onJoined?: (message: { roomId: string; userNum: number }) => void;
  onOtherJoined?: (message: { roomId: string; userId: number }) => void;
  onMessage: (data: { type: string; value: any }) => void;
  onFull?: (message: { roomId: string }) => void;
  onBye?: (message: { roomId: string; userId: number }) => void;
  onLeaved?: (message: { roomId: string }) => void;
  serverUrl?: string;
}

export default class SignalServer {
  socket: Socket;
  roomId: string;

  constructor(option: Option) {
    this.init(option);
  }

  init(option: Option) {
    this.socket = io(option.serverUrl || 'http://127.0.0.1:8081/');
    this.socket.connect();
    this.socket.on(
      'joined',
      option.onJoined ||
        (({ roomId, userNum }) => {
          console.log('i joined a room', roomId);
          console.log('current user number:', userNum);
        }),
    );
    this.socket.on(
      'otherjoined',
      option.onOtherJoined ||
        (({ roomId, userId }) => {
          console.log('other user joined, userId', userId);
        }),
    );
    this.socket.on('message', option.onMessage);
    this.socket.on(
      'full',
      option.onFull ||
        (({ roomId }) => {
          console.log(roomId, 'is full');
        }),
    );
    this.socket.on(
      'bye',
      option.onBye ||
        (({ roomId, userId }) => {
          console.log(userId, 'leaved', roomId);
        }),
    );
    this.socket.on('leaved', option.onLeaved || (({ roomId }) => {}));
    window.addEventListener('beforeunload', () => {
      this.leave();
    });
  }

  send(data) {
    if (!this.roomId) return;
    this.socket.emit('message', { roomId: this.roomId, data });
  }

  join(roomId) {
    this.roomId = roomId;
    this.socket.emit('join', { roomId });
  }

  leave() {
    this.roomId && this.socket.emit('leave', { roomId: this.roomId });
    this.roomId = '';
  }
}

Client code

import React, { useEffect, useState, useRef, useMemo } from 'react';
import { Button, Input, message } from 'antd';
import SignalServer from '../components/SignalServer';

import './index.less';

const pcOption = {};

type State = 'init' | 'disconnect' | 'waiting' | 'canCall' | 'connecting';

const Simple1v1 = () => {
  // Media data passed from the remote end
  const remoteMediaStream = useRef<MediaStream>(null);
  // Media data captured by local devices
  const localMediaStream = useRef<MediaStream>(null);
  const localVideo = useRef<HTMLVideoElement>(null);
  const remoteVideo = useRef<HTMLVideoElement>(null);
  // Signaling server object
  const signalServer = useRef<SignalServer>(null);
  const peerConnection = useRef<RTCPeerConnection>(null);

  const [roomId, setRoomId] = useState('');
  const [state, setState] = useState<State>('disconnect');

  const tip = useMemo(() => {
    switch (state) {
      case 'init':
        return 'Getting media data...';
      case 'disconnect':
        return 'Enter a room ID and join the room';
      case 'waiting':
        return 'Waiting for the other party to join...';
      case 'canCall':
        return 'Click call to start a call';
      case 'connecting':
        return 'In a call';
      default:
        return '';
    }
  }, [state]);

  useEffect(() => {
    // Initialize the signaling server
    signalServer.current = new SignalServer({ onMessage, onJoined, onOtherJoined });

    const initPeerConnection = () => {
      console.log('------ initializing the local pc object');
      // Create the pc instance
      peerConnection.current = new RTCPeerConnection(pcOption);
      const pc = peerConnection.current;
      // Listen for candidate gathering events
      pc.addEventListener('icecandidate', event => {
        const candidate = event.candidate;
        if (candidate) {
          console.log('------ got local candidate:', candidate);

          // Send the candidate to the remote end
          signalServer.current.send({ type: 'candidate', value: candidate });
        }
      });

      // Listen for media data from the remote end
      pc.addEventListener('track', e => {
        console.log('------ got remote media data:', e);
        if (remoteVideo.current.srcObject !== e.streams[0]) {
          remoteVideo.current.srcObject = e.streams[0];
        }
      });
    };

    // Get the local media data
    const getLocalMediaStream = () => {
      navigator.mediaDevices.getUserMedia({ audio: false, video: true }).then(mediaStream => {
        console.log('------ got local device media data:', mediaStream);
        if (mediaStream) {
          localVideo.current.srcObject = mediaStream;
          localMediaStream.current = mediaStream;

          // Bind the local media data to the pc object
          if (localMediaStream.current) {
            console.log('------ binding local media data to the pc object');
            localMediaStream.current.getTracks().forEach(track => {
              peerConnection.current.addTrack(track, localMediaStream.current);
            });
          }
        }
      });
    };

    initPeerConnection();

    getLocalMediaStream();

    return () => {
      // Destroy the mediaStream data before leaving the page
      localMediaStream.current &&
        localMediaStream.current.getTracks().forEach(track => track.stop());
      remoteMediaStream.current &&
        remoteMediaStream.current.getTracks().forEach(track => track.stop());

      // Destroy the local pc
      peerConnection.current && peerConnection.current.close();
    };
  }, []);

  const join = () => {
    if (!roomId || state !== 'disconnect') return;
    signalServer.current.join(roomId);
    setState('waiting');
  };

  const onJoined = ({ roomId, userNum }) => {
    message.success('Joined the room; current user count: ' + userNum);
    console.log('------ joined the room; current user count: ' + userNum);

    if (userNum === 1) {
      setState('waiting');
    } else {
      setState('canCall');
    }
  };

  const onOtherJoined = data => {
    console.log('------ someone joined the room');
    setState('canCall');
  };

  const call = () => {
    if (state !== 'canCall') return;
    // Start establishing the connection
    setState('connecting');

    const pc = peerConnection.current;

    // Get the local SDP (offer)
    pc.createOffer().then(offer => {
      console.log('------ got local offer', offer);

      // Set the local SDP
      pc.setLocalDescription(offer);

      // Send the local SDP to the remote end
      signalServer.current.send({
        type: 'offer',
        value: offer,
      });
    });
  };

  const onMessage = ({ type, value }) => {
    switch (type) {
      case 'offer':
        onGetRemoteOffer(value);
        break;
      case 'answer':
        onGetRemoteAnswer(value);
        break;
      case 'candidate':
        onGetRemoteCandidate(value);
        break;
      default:
        console.log('unknown message');
    }
  };

  const onGetRemoteAnswer = answer => {
    console.log('------ got remote answer', answer);

    const pc = peerConnection.current;

    // Set the remote SDP
    pc.setRemoteDescription(answer);
  };

  const onGetRemoteOffer = offer => {
    console.log('------ got remote offer', offer);
    // The remote end initiated a call; start establishing the connection
    setState('connecting');

    const pc = peerConnection.current;

    // Set the remote SDP
    pc.setRemoteDescription(offer);

    // Create the local SDP
    pc.createAnswer().then(answer => {
      // Set the local SDP
      pc.setLocalDescription(answer);

      console.log('------ got local answer', answer);
      // Send the local SDP to the remote end
      signalServer.current.send({
        type: 'answer',
        value: answer,
      });
    });
  };

  // Got a remote candidate
  const onGetRemoteCandidate = candidate => {
    console.log('------ got remote candidate', candidate);

    peerConnection.current.addIceCandidate(candidate);
  };

  return (
    <div className="one-on-one">
      <h1>Simple1v1{tip && `-${tip}`}</h1>
      <div className="one-on-one-container">
        <div className="one-on-one-operation">
          <div className="room-selector operation-item">
            <Input
              value={roomId || undefined}
              disabled={state !== 'disconnect'}
              onChange={e => setRoomId(e.target.value)}
              placeholder="Enter room ID"></Input>
            <Button disabled={state !== 'disconnect'} onClick={join} type="primary">
              Join room
            </Button>
          </div>
          <div className="call-btn operation-item">
            <Button disabled={state !== 'canCall'} onClick={call} type="primary">
              call
            </Button>
          </div>
        </div>
        <div className="videos">
          <div className="local-container">
            <h3>local-video</h3>
            <video autoPlay controls ref={localVideo}></video>
          </div>
          <div className="remote-container">
            <h3>remote-video</h3>
            <video autoPlay controls ref={remoteVideo}></video>
          </div>
        </div>
      </div>
    </div>
  );
};

export default Simple1v1;


Implementation effect