
WebRTC

WebRTC, which stands for Web Real-Time Communication, is an API that enables web browsers to conduct real-time voice or video conversations. Compared with the RTMP protocol commonly used in live streaming, WebRTC has very low latency, and it bundles solutions to a large number of client-side multimedia and transmission problems, including audio and video codecs, synchronization, bandwidth estimation, QoS, AEC, and so on. As a result, P2P real-time voice calls can be implemented easily on WebRTC-enabled devices and browsers. To implement P2P real-time voice calling, the following questions need to be answered:

  1. How do the two ends of a call find each other and establish a connection?
  2. How do we obtain video and audio from the local device?
  3. How do we transfer the local video and audio to the peer?
  4. If the devices on the two ends of the call are different, how do we ensure that the video and audio sent by one end can be correctly decoded by the other?

WebRTC provides solutions to each of these problems in turn.

Establish a connection

Establishing a WebRTC connection relies mainly on a signaling service and STUN/TURN services.

Signaling service

Since the two parties know virtually nothing about each other at the beginning of a call, a “middleman” is needed to forward messages, and the signaling service plays this role. The two ends that want to communicate register with the signaling service after startup, typically by establishing a WebSocket connection to it.

WebRTC does not specify the communication protocol between the client and the signaling service; besides WebSocket, other transports such as Socket.IO can also be used.

When a client connects to the signaling service, it can tell the signaling service whom it wants to talk to, and the signaling service looks up the corresponding user in its registration list. If the user is not found, it rejects the client’s request; otherwise, it forwards the request message to that user. The request message contains the candidate information needed to establish an RTC connection, which includes the IP address of client A. Once B knows A’s IP address, it can send a connection request to A.
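To make the forwarding role concrete, here is a minimal signaling-relay sketch using aiohttp WebSockets. It is not part of the demo described later; the /ws route, the user query parameter, and the message shape ({"to": ...}) are all assumptions for illustration.

import json
from aiohttp import web, WSMsgType

clients = {}  # registered user id -> WebSocket connection

async def signaling(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    user = request.query["user"]
    clients[user] = ws  # register this client on connect
    try:
        async for msg in ws:
            if msg.type == WSMsgType.TEXT:
                data = json.loads(msg.data)
                peer = clients.get(data["to"])
                if peer is None:
                    # the requested user is not registered: reject
                    await ws.send_json({"error": "peer not registered"})
                else:
                    # forward the message (e.g. SDP or candidates) to the peer
                    await peer.send_json(data)
    finally:
        clients.pop(user, None)
    return ws

app = web.Application()
app.router.add_get("/ws", signaling)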

STUN service

As mentioned earlier, A sends a request message containing its own IP address to the signaling service, which forwards the message to B. So how does A learn its own IP address? That is where the STUN service comes in. When a client queries the STUN service, the STUN service returns the client’s internal and public IP addresses. The STUN service does not have to be self-hosted: Google offers a free STUN service at stun.l.google.com:19302. You can try it on the WebRTC samples Trickle ICE page: click “Gather candidates” to collect local candidate information. A candidate whose type is host corresponds to an intranet address, while a candidate whose type is srflx corresponds to a public address. After client A sends its internal and public IP addresses to client B, client B can use those addresses to try to establish a connection.
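The same idea can be sketched in Python: aiortc can be configured with Google’s public STUN server, and it gathers host and srflx candidates while setting the local description. A minimal example, assuming aiortc is installed:

import asyncio
from aiortc import RTCConfiguration, RTCIceServer, RTCPeerConnection

async def gather_candidates():
    config = RTCConfiguration(
        iceServers=[RTCIceServer(urls="stun:stun.l.google.com:19302")]
    )
    pc = RTCPeerConnection(configuration=config)
    pc.createDataChannel("probe")  # at least one media/data section is needed
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)  # ICE gathering runs here
    # host and srflx candidates show up as "a=candidate" lines in the SDP
    print(pc.localDescription.sdp)
    await pc.close()

asyncio.run(gather_candidates())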

TURN service

Because of firewalls, the request that B attempts to send to A may be blocked; the actual behavior depends on the type of NAT (see the reference on firewall traversal technology for details). If both A and B are behind symmetric NAT or port-restricted NAT, an RTC connection cannot be established between them directly, and a TURN service is needed to relay the data. There are many mature TURN implementations on GitHub, such as coturn/coturn and pion/turn, that can be deployed and used directly. Besides relaying data, a TURN service also provides STUN functionality, so it acts as a superset of STUN and returns a relay-type candidate in addition to the host and srflx candidates. The client collects these candidates and tries them in the order host, srflx, relay. If a client establishes the RTC connection using a relay candidate, the connection depends on the TURN service for forwarding.
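In aiortc, a TURN server can be listed alongside STUN in the ICE server configuration, so relay candidates are available as a fallback. A sketch, where the turn: URL and credentials are placeholders for your own coturn/pion deployment:

from aiortc import RTCConfiguration, RTCIceServer, RTCPeerConnection

config = RTCConfiguration(
    iceServers=[
        RTCIceServer(urls="stun:stun.l.google.com:19302"),
        # placeholder TURN server and credentials
        RTCIceServer(
            urls="turn:turn.example.com:3478",
            username="demo",
            credential="secret",
        ),
    ]
)
pc = RTCPeerConnection(configuration=config)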

SDP negotiation

To establish a connection, each side needs to know not only the other’s address but also the audio and video capabilities of the other side. The two parties exchange this information through SDP negotiation. An SDP description contains the audio formats, video formats, and candidate information; see Session_Description_Protocol for details. SDP negotiation essentially determines which formats the other side supports, and therefore which formats to use when sending audio and video to it. The data exchanged during SDP negotiation is forwarded by the signaling service.
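The offer/answer exchange can be sketched with two in-process aiortc peer connections, where the “signaling service” is simply passing objects between the two ends. A minimal illustration, not the demo code:

import asyncio
from aiortc import RTCPeerConnection

async def negotiate_pair():
    caller, callee = RTCPeerConnection(), RTCPeerConnection()
    caller.createDataChannel("chat")
    # the caller describes the formats it supports
    offer = await caller.createOffer()
    await caller.setLocalDescription(offer)
    # "signaling": hand the offer to the callee
    await callee.setRemoteDescription(caller.localDescription)
    # the callee answers with the subset it also supports
    answer = await callee.createAnswer()
    await callee.setLocalDescription(answer)
    # "signaling": hand the answer back to the caller
    await caller.setRemoteDescription(callee.localDescription)
    await asyncio.gather(caller.close(), callee.close())

asyncio.run(negotiate_pair())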

The following figure shows the connection establishment process.

Once the above metadata has been exchanged, the two parties can finally begin communicating.

Get local audio and video

The browser

Current mainstream browsers support requesting camera and microphone access through the MediaDevices.getUserMedia() interface:

var promise = navigator.mediaDevices.getUserMedia(constraints);

The resolution and frame rate of the video can be specified with the constraints parameter. For example:

navigator.mediaDevices.getUserMedia({
  audio: true,
  video: { width: 1280, height: 720 }
});

This requests both audio and video, with a video resolution of 1280x720; see MediaDevices/getUserMedia for the full interface. Note that most browsers only allow access to the camera and microphone over HTTPS connections.

The operating system

If you are not using a browser, you can also obtain camera and microphone access through the interface provided by the operating system. The Python library aiortc, described later, wraps these interfaces, so the audio and video streams can be retrieved directly once the operating system type is determined:

import platform
from aiortc.contrib.media import MediaPlayer

# capture options; these values follow aiortc's webcam example
options = {"framerate": "30", "video_size": "640x480"}

if platform.system() == "Darwin":
    webcam = MediaPlayer(
        "default:none", format="avfoundation", options=options
    )
elif platform.system() == "Windows":
    webcam = MediaPlayer(
        "video=Integrated Camera", format="dshow", options=options
    )
else:
    webcam = MediaPlayer("/dev/video0", format="v4l2", options=options)

aiortc

Currently, most browsers support WebRTC, but if you want a browser to communicate with a server over WebRTC, the server needs to support WebRTC as well. aiortc is a Python implementation of WebRTC based on asyncio that takes full advantage of Python coroutines. Project address: github.com/aiortc/aior…

demo

The demo logic is as follows:

  1. The client captures video in the browser and sends it to the server implemented with aiortc
  2. The server replaces the face in the video with Daniel Wu 😁 and returns it to the client
  3. The client displays the new video received from the server

Complete demo code 👉🏻 github.com/tsonglew/ai…

Signaling service implementation

The signaling service forwards data between the two ends of the communication. For convenience, the browser in the demo sends data to the signaling service through HTTP requests instead of a WebSocket connection, and the signaling service runs in the same process as the WebRTC server: when the signaling service receives a request from the client, it hands the data to the server through shared memory. The demo’s signaling service also acts as a web server and serves static files to the front end.

Providing static files

Serve the home page:

async def index(request):
    content = open(os.path.join(ROOT, "index.html"), "r").read()
    return web.Response(content_type="text/html", text=content)

Serve the JavaScript file:

async def javascript(request):
    content = open(os.path.join(ROOT, "client.js"), "r").read()
    return web.Response(content_type="application/javascript", text=content)
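For completeness, here is how these handlers might be wired together with aiohttp. This wiring is a sketch based on the snippets in this article (the route paths and port are assumptions), not necessarily identical to the demo code:

from aiohttp import web

app = web.Application()
app.router.add_get("/", index)
app.router.add_get("/client.js", javascript)
app.router.add_post("/offer", offer)  # the signaling endpoint defined below
web.run_app(app, host="0.0.0.0", port=8080)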

Providing the signaling service

Here the signaling service receives the client’s SDP via an HTTP request, forwards the SDP to the server, and returns the server’s SDP to the client:

pcs = set()

async def offer(request):
    # Receive the client request
    params = await request.json()
    # Parse the client's SDP
    offer = RTCSessionDescription(sdp=params["sdp"], type=params["type"])
    # Create the server connection object
    pc = RTCPeerConnection()
    # Save the server connection object in the global set pcs
    pcs.add(pc)
    # Run the server logic
    await server(pc, offer)
    # Return the server's SDP to the client
    return web.Response(
        content_type="application/json",
        text=json.dumps(
            {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}
        ),
    )

Client implementation

The client contains the following parts:

  1. WebRTC negotiation process
  2. Get the camera and send the video to the server
  3. Receive the new video from the server and display it on the page

Initialize the WebRTC connection

function start() {
	// WebRTC connection parameters, using Google's STUN service
	var config = {
		sdpSemantics: 'unified-plan',
		iceServers: [{ urls: ['stun:stun.l.google.com:19302'] }]
	};
	// Create a WebRTC connection object
	pc = new RTCPeerConnection(config);

	// Send the local video stream to the server over the RTC connection
	localVideo.srcObject.getVideoTracks().forEach(track => {
		pc.addTrack(track);
	});

	// Bind the video stream returned by the server to the serverVideo element
	pc.addEventListener('track', function (evt) {
		if (evt.track.kind == 'video') {
			document.querySelector('video#serverVideo').srcObject = evt.streams[0];
		}
	});
}

WebRTC negotiation process

function negotiate() {
  // Create the local SDP on the client
	return pc.createOffer().then(function (offer) {
    // Record the local SDP
		return pc.setLocalDescription(offer);
	}).then(function () {
		var offer = pc.localDescription;
    // Send the local SDP to the signaling service
		return fetch('/offer', {
			body: JSON.stringify({
				sdp: offer.sdp,
				type: offer.type,
			}),
			headers: {
				'Content-Type': 'application/json'
			},
			method: 'POST'
		});
	}).then(function (response) {
    // Receive and parse the server's SDP
		return response.json();
	}).then(function (answer) {
		// Record the server's SDP
		return pc.setRemoteDescription(answer);
	});
}

Obtaining a Local Camera

navigator.mediaDevices.getUserMedia({
	video: true
}).then(stream => {
	// Bind the locally obtained video stream to the localVideo element
	localVideo.srcObject = stream;
	localVideo.addEventListener('loadedmetadata', () => {
		localVideo.play();
	});
});

Server-side implementation

The server processes the following logic:

  1. Handle WebRTC negotiation process
  2. Receive the video from the client
  3. Use the Cascade Classifier provided by OpenCV to locate faces
  4. Use the prepared image to replace the faces in the video
  5. Send the replaced video back to the client

Handle WebRTC negotiation process

async def server(pc, offer):
    # Monitor RTC connection status
    @pc.on("connectionstatechange")
    async def on_connectionstatechange():
        print("Connection state is %s" % pc.connectionState)
        # Close the RTC connection when the RTC connection is disconnected
        if pc.connectionState == "failed":
            await pc.close()
            pcs.discard(pc)

    # Listen for video streams from the client
    @pc.on("track")
    def on_track(track):
        print("======= received track: ", track)
        if track.kind == "video":
            # Face replacement for the video stream
            t = FaceSwapper(track)
            # Bind the replaced video stream
            pc.addTrack(t)
            
    # Record the client's SDP
    await pc.setRemoteDescription(offer)
    # Generate local SDP
    answer = await pc.createAnswer()
    # Record the local SDP
    await pc.setLocalDescription(answer)

Replace the face

FaceSwapper inherits from aiortc’s VideoStreamTrack. aiortc calls FaceSwapper’s recv() method to get a video frame and sends it to the client over the RTC connection. Here we first initialize a face detector, self.face_detector, using the cascade XML file provided by OpenCV, and prepare the image, self.face, that will replace detected faces. self.track stores the original video stream.

class FaceSwapper(VideoStreamTrack):
    kind = "video"

    def __init__(self, track):
        super().__init__()
        self.track = track
        self.face_detector = cv2.CascadeClassifier("./haarcascade_frontalface_alt.xml")
        self.face = cv2.imread("./face.png")
        ...

The aiortc layer calls recv() in a loop and sends each returned video frame to the client. self.next_timestamp() controls the frame rate and provides the timing parameters for the generated frame. To produce the returned frame, we read a frame from the original video stream, use the face detector to locate faces in it, replace each detected region with the prepared image, and finally return the resulting frame.

class FaceSwapper(VideoStreamTrack):
    ...
    async def recv(self):
        # Generate the time parameter corresponding to the video frame
        timestamp, video_timestamp_base = await self.next_timestamp()
        # Read a frame from the original video stream
        frame = await self.track.recv()
        
        # Convert the video frame to a bgr24 numpy array for later processing
        frame = frame.to_ndarray(format="bgr24")
        # Detect face location
        face_zones = self.face_detector.detectMultiScale(
            cv2.cvtColor(frame, code=cv2.COLOR_BGR2GRAY)
        )
        # Replace the corresponding position with the prepared image
        for x, y, w, h in face_zones:
            # Resize the image so that it fills the detected face area before replacing
            face = cv2.resize(self.face, dsize=(w, h))
            # Perform the replacement procedure
            frame[y : y + h, x : x + w] = face
        # Convert the modified Numpy array back to video frames
        frame = VideoFrame.from_ndarray(frame, format="bgr24")
        
        # Fill the video frame parameters
        frame.pts = timestamp
        frame.time_base = video_timestamp_base
        
        # Return the video frame
        return frame

The final result

References

  • Firewall traversal technology
  • Python implements STUN+TURN+P2P chat
  • Session_Description_Protocol
  • WebRTC_API/Connectivity
  • Cascade Classifier