1. Introduction

In a typical audio/video session, local audio and video data pass through at least three stages on their way to the remote end: capture -> encode -> send. The path the data travels from the capture module to the sending module is what we call the audio/video data pipeline. In the next few articles we will use video data as the example and discuss how WebRTC builds this pipeline: how the data is captured, and how it flows, station by station, from the capture module to the network transmission module and finally out onto the wire.


2. Video capture

As the starting point of the data pipeline, the video capture module is responsible for collecting raw video frames from the video source and pushing them to the next station of the pipeline: the local rendering module for local preview, or the encoding module for compression.


The video source can be a camera, a captured desktop or window (remote desktop, video-stream-based electronic whiteboards and similar applications), or even a video or image file on disk. WebRTC provides a camera-based video capture framework, which is the focus of this article. WebRTC also provides a desktop/window capture framework, but its interface differs from the camera-based capture interface, and the whole video pipeline is built on top of the camera capture interface. This leads to a problem: when screen-capture data needs to be pushed in as a video source, the adapter pattern has to be used to wrap it behind the camera-style capture interface. We have done exactly that in a video-stream-based interactive whiteboard, but it is not the focus of this article and will be covered separately.


The video capture module is platform specific: macOS/iOS generally uses the AVFoundation or QuickTime framework, Linux generally uses the V4L2 library, Android uses the Camera1 or Camera2 framework, and Windows uses DirectShow (DS) or MediaFoundation (MF). Because WebRTC is a very active project, the code structure keeps changing. For example, the April 2019 code still contained VideoCaptureMF, with a comment suggesting that MediaFoundation be used on Vista and above, while in the November 2019 code the MediaFoundation-related code had been removed. Likewise, the macOS/iOS and Android capture code has been moved to the sdk/objc and sdk/android directories. This article is based on the code under modules/video_capture: the platform-independent code sits directly in that directory, and the platform-specific implementations live in modules/video_capture/windows and modules/video_capture/linux, as shown in the figure:




2.1 UML of the video capture classes





The DeviceInfo interface provides device-enumeration functionality; an instance of the platform-specific subclass is composed into the VideoCaptureModule. It can:

  • Enumerate the number of devices and obtain the name of a device.
  • Enumerate all the capabilities supported by a device (VideoCaptureCapability: resolution, maximum frame rate, color space, interlaced or progressive).
  • Get the capability of a device that best matches an externally requested capability (a usage sketch follows this list).
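
A minimal usage sketch of these DeviceInfo calls (method names as declared in modules/video_capture/video_capture.h; the requested 1280x720@30 capability and the printf logging are purely illustrative, and error handling is omitted):

```cpp
#include <cstdint>
#include <cstdio>
#include <memory>

#include "modules/video_capture/video_capture_factory.h"

// Enumerate capture devices and query their capabilities through DeviceInfo.
void EnumerateCaptureDevices() {
  std::unique_ptr<webrtc::VideoCaptureModule::DeviceInfo> info(
      webrtc::VideoCaptureFactory::CreateDeviceInfo());
  if (!info) return;

  for (uint32_t i = 0; i < info->NumberOfDevices(); ++i) {
    char name[256] = {0};
    char unique_id[256] = {0};
    // Human-readable name plus the unique id used by the other DeviceInfo calls.
    info->GetDeviceName(i, name, sizeof(name), unique_id, sizeof(unique_id));

    // Walk every capability (resolution / max frame rate / pixel format) the device reports.
    int32_t num_caps = info->NumberOfCapabilities(unique_id);
    for (int32_t c = 0; c < num_caps; ++c) {
      webrtc::VideoCaptureCapability cap;
      info->GetCapability(unique_id, c, cap);
      std::printf("%s: %dx%d @ %d fps\n", name, cap.width, cap.height, cap.maxFPS);
    }

    // Ask for the supported capability closest to what the application would like to use.
    webrtc::VideoCaptureCapability wanted, best;
    wanted.width = 1280;
    wanted.height = 720;
    wanted.maxFPS = 30;
    info->GetBestMatchedCapability(unique_id, wanted, best);
  }
}
```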


VideoCaptureModule is the virtual base class of the capture module. It defines a set of generic interface functions for video capture (an abridged version follows the list):
  • StartCapture/StopCapture start and stop video capture (platform dependent).
  • CaptureStarted reports whether capture is currently running (platform dependent).
  • RegisterCaptureDataCallback/DeRegisterCaptureDataCallback register and deregister the data callback (platform independent).
  • SetApplyRotation/GetApplyRotation set and query whether video rotation is applied (platform independent).
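
An abridged version of the interface (condensed from modules/video_capture/video_capture.h; several members and the original comments are omitted):

```cpp
// Condensed from modules/video_capture/video_capture.h; not the complete interface.
class VideoCaptureModule : public rtc::RefCountInterface {
 public:
  class DeviceInfo;  // device enumeration, described above

  // Register/deregister the sink that receives captured frames (platform independent).
  virtual void RegisterCaptureDataCallback(
      rtc::VideoSinkInterface<VideoFrame>* dataCallback) = 0;
  virtual void DeRegisterCaptureDataCallback() = 0;

  // Start/stop capturing with the given capability (platform dependent).
  virtual int32_t StartCapture(const VideoCaptureCapability& capability) = 0;
  virtual int32_t StopCapture() = 0;
  virtual bool CaptureStarted() = 0;

  // Whether captured frames should be rotated to compensate for the device orientation.
  virtual bool SetApplyRotation(bool enable) = 0;
  virtual bool GetApplyRotation() = 0;

 protected:
  ~VideoCaptureModule() override {}
};
```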


The VideoCaptureImpl class is an implementation subclass of VideoCaptureModule. It does three things:
  • Declares the static Create() method that creates the platform-specific VideoCaptureImpl subclass: VideoCaptureDS on Windows, VideoCaptureV4L2 on Linux. The method is declared in one place and implemented in several places; when compiling for a given platform, only that platform's implementation is built.
  • Leaves the platform-related interfaces, mainly starting and stopping video capture, to be implemented in the platform-specific subclasses;
  • Implements the platform-independent interfaces: registering the video data callback and applying video rotation. Registering the data callback assigns an object that implements the VideoSinkInterface interface to the VideoCaptureImpl::_dataCallBack member; once the capture module obtains a video frame, it pushes it out through that object's OnFrame() method. A usage sketch follows this list.
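
A hedged sketch of how an upper layer might drive the capture module through VideoCaptureFactory (the FrameCounter sink and the 640x480@30 capability are purely illustrative; in the real pipeline the registered sink is the VideoSource discussed later):

```cpp
#include <memory>

#include "api/scoped_refptr.h"
#include "api/video/video_frame.h"
#include "api/video/video_sink_interface.h"
#include "modules/video_capture/video_capture_factory.h"

// Illustrative sink: becomes VideoCaptureImpl::_dataCallBack and receives every frame in OnFrame().
class FrameCounter : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
 public:
  void OnFrame(const webrtc::VideoFrame& frame) override { ++frames_; }
  int frames() const { return frames_; }

 private:
  int frames_ = 0;
};

rtc::scoped_refptr<webrtc::VideoCaptureModule> StartCamera(const char* unique_id,
                                                           FrameCounter* sink) {
  // Create() builds the platform-specific subclass (VideoCaptureDS, VideoCaptureV4L2, ...).
  rtc::scoped_refptr<webrtc::VideoCaptureModule> module =
      webrtc::VideoCaptureFactory::Create(unique_id);
  if (!module) return nullptr;

  module->RegisterCaptureDataCallback(sink);

  webrtc::VideoCaptureCapability cap;
  cap.width = 640;
  cap.height = 480;
  cap.maxFPS = 30;
  if (module->StartCapture(cap) != 0) {
    module->DeRegisterCaptureDataCallback();
    return nullptr;
  }
  return module;
}
```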


2.2 Data flow inside the capture module

1. Take VideoCaptureDS as an example. After the platform-specific capture code obtains a video frame, the platform-specific ProcessCapturedFrame() method handles it and passes the frame directly to VideoCaptureImpl::IncomingFrame().

2. VideoCaptureImpl::IncomingFrame() rotates the frame as requested, converts it to I420 with the libyuv library, and attaches an NTP timestamp. After this processing, IncomingFrame() passes the frame on to VideoCaptureImpl::DeliverCapturedFrame().
3. VideoCaptureImpl::DeliverCapturedFrame() calls VideoSinkInterface::OnFrame() to hand the frame to the callback object _dataCallBack, i.e. the next station, and the frame is thereby pushed out of the capture module. The toy sketch below mirrors this hand-off.
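
The hand-off can be mirrored in a toy, self-contained form (the names follow WebRTC's, but this is an illustration, not the actual VideoCaptureImpl code; rotation and the libyuv conversion are reduced to a comment):

```cpp
#include <cstdint>

struct Frame { int width = 0; int height = 0; int64_t ntp_time_ms = 0; };

// Stand-in for rtc::VideoSinkInterface<VideoFrame>.
struct VideoSink {
  virtual void OnFrame(const Frame& frame) = 0;
  virtual ~VideoSink() = default;
};

// Stand-in for VideoCaptureImpl.
class MiniCaptureImpl {
 public:
  void RegisterCaptureDataCallback(VideoSink* sink) { data_callback_ = sink; }

  // Step 2 of the list above: rotate / convert to I420 / stamp, then hand the frame on.
  void IncomingFrame(Frame raw, int64_t ntp_now_ms) {
    // (rotation and the libyuv conversion to I420 would happen here)
    raw.ntp_time_ms = ntp_now_ms;
    DeliverCapturedFrame(raw);
  }

 private:
  // Step 3 of the list above: push the frame to the registered sink, out of the capture module.
  void DeliverCapturedFrame(const Frame& frame) {
    if (data_callback_) data_callback_->OnFrame(frame);
  }
  VideoSink* data_callback_ = nullptr;  // plays the role of VideoCaptureImpl::_dataCallBack
};
```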


3. Building the pipeline

As the bottom-most module, the video capture module has to cooperate with the upper modules so that the captured video data reaches the rendering and encoding modules above it, providing the data pipeline with a continuous stream of video frames. From the control-flow perspective, the upper modules create and start the video capture module during initialization and stop and destroy it at the end. From the data-flow perspective, the captured video data is handed to the upper modules through the callback interface, which takes it to the next station of the pipeline.


3.1 VideoCapture->VideoTrack pipeline

Whether a video frame is destined for the local rendering module or for the encoder, it first passes through the VideoTrack object. In terms of control flow, the process of creating a VideoTrack object is exactly the process of building the VideoCapture->VideoTrack pipeline:




In terms of data flow, the video data flows in the direction opposite to the creation order:




The related class diagram is as follows:




1. VideoCaptureModule->VideoSource
The VideoCaptureModule is composed into the VideoSource object as its data source. VideoSource implements the VideoSinkInterface interface and registers itself with the VideoCaptureModule, so video frames flow from VideoCaptureModule to VideoSource.


VideoSource holds a very important member, the VideoBroadcaster object, whose UML class diagram is shown below.




On the one hand, VideoBroadcaster implements VideoSinkInterface and therefore acts as a sink: after VideoSource receives a video frame from the capture module, the frame first flows into the internal VideoBroadcaster member rather than out of VideoSource directly. On the other hand, both VideoSource and VideoBroadcaster implement the VideoSourceInterface interface, so externally VideoSource acts as a video source and exposes the registration method AddOrUpdateSink(). Internally this method calls VideoBroadcaster's AddOrUpdateSink(), so the downstream VideoSink is actually registered with the VideoBroadcaster and stored in its member std::vector<SinkPair> sinks_. At this point it is easy to see that VideoBroadcaster both receives the data flow and knows the next stations (possibly more than one), so VideoBroadcaster::OnFrame() can loop over the sinks and call each one's OnFrame() to broadcast the frame. Why is it designed this way? The WebRTC 1.0 specification states that a video source can be shared by multiple video tracks, and this is how that sharing is realized.
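
A minimal sketch of the fan-out, driving rtc::VideoBroadcaster directly (the CountingSink class is illustrative; in the real pipeline the registered sinks are, for example, the local renderer and the VideoStreamEncoder):

```cpp
#include "api/video/i420_buffer.h"
#include "api/video/video_frame.h"
#include "media/base/video_broadcaster.h"

// Illustrative sink that only counts the frames it receives.
class CountingSink : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
 public:
  void OnFrame(const webrtc::VideoFrame& frame) override { ++count_; }
  int count_ = 0;
};

void BroadcastDemo() {
  rtc::VideoBroadcaster broadcaster;

  // Two downstream stations register themselves; both end up in the sinks_ vector.
  CountingSink renderer_sink;
  CountingSink encoder_sink;
  broadcaster.AddOrUpdateSink(&renderer_sink, rtc::VideoSinkWants());
  broadcaster.AddOrUpdateSink(&encoder_sink, rtc::VideoSinkWants());

  // One frame pushed into the broadcaster ...
  webrtc::VideoFrame frame =
      webrtc::VideoFrame::Builder()
          .set_video_frame_buffer(webrtc::I420Buffer::Create(640, 480))
          .set_timestamp_us(0)
          .build();
  broadcaster.OnFrame(frame);

  // ... is delivered to every registered sink: both counters are now 1.
}
```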



2. VideoSource->VideoTrackSource
VideoTrackSource does not implement VideoSinkInterface, so strictly speaking video data does not flow into VideoTrackSource. It composes a VideoSource object and implements the VideoSourceInterface interface: a VideoSink added to VideoTrackSource is added to the VideoSource, and from there further down to the VideoBroadcaster. Externally, VideoTrackSource is the video source.


In addition, VideoTrackSource implements the interfaces related to the video source state and the NotifierInterface used for state notification, which report the state of the video source to the layer above (VideoTrack). Since this is not relevant to the data-flow discussion, it is only mentioned here and not detailed.


3. VideoTrackSource->VideoTrack
Just like VideoTrackSource, VideoTrack does not implement VideoSinkInterface, so video data does not flow into VideoTrack either. It composes the VideoTrackSource and thereby indirectly implements the VideoSourceInterface interface. Any station that wants to pull video frames out of VideoTrack simply implements VideoSinkInterface and registers itself through VideoTrack's AddOrUpdateSink(). Because the VideoSink is passed down through VideoTrackSource->VideoSource->VideoBroadcaster, it ultimately receives the video frames from the VideoBroadcaster. The toy sketch below illustrates this delegation chain.
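
The delegation can be pictured with a toy, self-contained sketch (the class names mirror WebRTC's, but this is not the real code; the actual classes also negotiate VideoSinkWants, handle threading, and so on):

```cpp
#include <vector>

struct Sink { virtual void OnFrame(int frame) = 0; virtual ~Sink() = default; };

// rtc::VideoBroadcaster: the only place where the sinks are actually stored.
struct Broadcaster {
  std::vector<Sink*> sinks_;
  void AddOrUpdateSink(Sink* sink) { sinks_.push_back(sink); }
};

// VideoSource: owns the broadcaster and feeds it the frames coming from the capture module.
struct Source {
  Broadcaster broadcaster_;
  void AddOrUpdateSink(Sink* sink) { broadcaster_.AddOrUpdateSink(sink); }
};

// VideoTrackSource: wraps a Source and forwards sink registration to it.
struct TrackSource {
  Source* source_ = nullptr;
  void AddOrUpdateSink(Sink* sink) { source_->AddOrUpdateSink(sink); }
};

// VideoTrack: wraps a TrackSource; a sink registered here ends up in the broadcaster.
struct Track {
  TrackSource* track_source_ = nullptr;
  void AddOrUpdateSink(Sink* sink) { track_source_->AddOrUpdateSink(sink); }
};
```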


VideoTrack also implements the ObserverInterface interface, acting as an observer that receives the video source state reported by VideoTrackSource.


VideoTrack also implements the VideoTrackInterface interface, which provides an important attribute: ContentHint. This attribute tells the encoder what to sacrifice when the bit rate drops: lower the frame rate, or lower the resolution? For desktop capture applications we should set this attribute to kDetailed or kText, so that the encoder does not reduce the resolution and does not push the quantization parameter (QP) too high.
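
A hedged example of setting the hint on a track created for screen sharing (ContentHint and set_content_hint() are declared in api/media_stream_interface.h; whether kText or kDetailed fits better depends on the content):

```cpp
#include "api/media_stream_interface.h"
#include "api/scoped_refptr.h"

// Tell the encoder that spatial detail matters more than motion for this track, so it should
// keep the resolution (and a reasonable QP) and drop frame rate instead when bitrate is tight.
void ConfigureScreenShareTrack(
    rtc::scoped_refptr<webrtc::VideoTrackInterface> screen_track) {
  screen_track->set_content_hint(webrtc::VideoTrackInterface::ContentHint::kText);
  // For camera content the default kNone (or kFluid, which favors frame rate) is usually better.
}
```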


3.2 VideoTrack to local rendering

From the previous description it is clear how video frames flow toward VideoTrack (even though they do not literally flow through the VideoTrack class), and we also know how to get video data out of VideoTrack:

1) Implement the VideoSinkInterface interface;

2) Register it through VideoTrack's AddOrUpdateSink(). This is in fact exactly what local rendering does: either use the platform-specific rendering classes provided by WebRTC directly, which all implement the VideoSinkInterface interface, or implement your own renderer class that implements VideoSinkInterface, receives the video frame in OnFrame() and renders it. When the renderer is registered through VideoTrack's AddOrUpdateSink(), it is passed down to the VideoBroadcaster and held there, and it receives video frames directly from the VideoBroadcaster, as in the sketch below.
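
A hedged sketch of the second option, a hand-rolled renderer (MyRenderer is illustrative; the drawing itself is platform specific and reduced to a comment):

```cpp
#include "api/media_stream_interface.h"
#include "api/scoped_refptr.h"
#include "api/video/video_frame.h"
#include "api/video/video_sink_interface.h"

// Illustrative renderer: implementing VideoSinkInterface is all it takes to become a station.
class MyRenderer : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
 public:
  void OnFrame(const webrtc::VideoFrame& frame) override {
    // Convert frame.video_frame_buffer() (I420) to whatever the UI toolkit needs and draw it.
  }
};

void AttachLocalPreview(rtc::scoped_refptr<webrtc::VideoTrackInterface> local_track,
                        MyRenderer* renderer) {
  // Registration travels VideoTrack -> VideoTrackSource -> VideoSource -> VideoBroadcaster,
  // so afterwards the broadcaster calls renderer->OnFrame() for every captured frame.
  local_track->AddOrUpdateSink(renderer, rtc::VideoSinkWants());
}
```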



A UML class diagram of the rendering classes provided by WebRTC:




3.3 VideoTrack to encoder

To explain how video frames in VideoTrack arrive at the encoder, the first problem is to figure out which class represents the encoder in WebRTC so that we can study the flow of video data.


The VideoStreamEncoder class in WebRTC represents a video encoder: it takes raw video frames as input and produces an encoded bitstream as output. The class is located in src/video/video_stream_encoder.h; the following screenshot shows the class:




With the destination in mind, the next step is to analyze how the video stream flows from VideoTrack to VideoStreamEncoder step by step, and how the pipeline is built.


In terms of data flow, the data passes through the following objects on its way from VideoTrack to VideoStreamEncoder:




The UML class diagram of these objects and their relationships is shown below. From the previous analysis, we know that in order to actually receive video frames a class must implement the VideoSinkInterface interface, whose OnFrame() receives the frame from the previous station. Looking at the class diagram below, only VideoStreamEncoder is really a VideoSink; the VideoTrack is passed along as an object member, station by station, toward the VideoStreamEncoder. Since VideoTrack implements the VideoSourceInterface, the VideoStreamEncoder can in turn be set back onto the VideoTrack as a sink. By the earlier conclusion, the VideoStreamEncoder is eventually stored in the VideoBroadcaster, and the VideoBroadcaster then passes video frames directly to the VideoStreamEncoder.




In terms of control flow, without diving into the details and looking only at WebRTC's outer API, the pipeline is set up by three calls: PeerConnection->AddTrack(), PeerConnection->CreateOffer(), and PeerConnection->SetLocalDescription(). The following sections briefly analyze what each of these three methods contributes internally to building the video pipeline; a hedged sketch of the calling sequence follows.
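
A hedged sketch of that calling sequence (the stream id "local_stream" is arbitrary, and the observer object, whose OnSuccess/OnFailure handling is application specific, is passed in from outside):

```cpp
#include <string>
#include <vector>

#include "api/peer_connection_interface.h"
#include "api/scoped_refptr.h"

// Wires one local video track into the sending pipeline via the three outer API calls.
void PublishLocalVideo(rtc::scoped_refptr<webrtc::PeerConnectionInterface> pc,
                       rtc::scoped_refptr<webrtc::VideoTrackInterface> video_track,
                       webrtc::CreateSessionDescriptionObserver* create_offer_observer) {
  // 1. AddTrack(): creates the VideoRtpSender (and, under Unified Plan, an RtpTransceiver).
  auto sender_or_error = pc->AddTrack(video_track, {"local_stream"});
  if (!sender_or_error.ok()) return;

  // 2. CreateOffer(): collects local capabilities into an SDP offer (one m-line per transceiver).
  webrtc::PeerConnectionInterface::RTCOfferAnswerOptions options;
  pc->CreateOffer(create_offer_observer, options);

  // 3. In create_offer_observer->OnSuccess(desc): send `desc` to the peer over signaling and call
  //    pc->SetLocalDescription(set_local_observer, desc);  // completes the video pipeline
}
```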


1. AddTrack()
After a VideoTrack is created, a VideoRtpSender object is created for each VideoTrack that is to be sent via the PeerConnection->AddTrack() interface, and the VideoTrack becomes a member of that VideoRtpSender, realizing the logical VideoTrack->VideoRtpSender flow. In addition, if the SDP uses kUnifiedPlan semantics, an independent RtpTransceiver object is created for each track, combined with the VideoRtpSender that holds the track, and added to the PeerConnection's array of RtpTransceivers.


For the discussion in this article, the two important members of the VideoRtpSender object are track_ and media_channel_: the VideoTrack and the WebRtcVideoChannel, i.e. the previous and the next stop of the video stream. Executing AddTrack() does not yet associate the two; it only stores the VideoTrack in the VideoRtpSender. The binding is eventually completed when VideoRtpSender->SetSsrc() is called. Two questions therefore remain for later:
  • When is VideoRtpSender->SetSsrc() called?
  • In kUnifiedPlan mode media_channel_ is not created together with the VideoRtpSender, so when and where is it created?


2. CreateOffer()
The internal flow of PeerConnection->CreateOffer() is very complex: it collects the local audio/video capabilities and the network-layer transport capabilities and assembles them into an SDP description. Although this method is not directly involved in building the video pipeline, it provides the information that the subsequent PeerConnection->SetLocalDescription() call needs in order to complete the pipeline.


The process of PeerConnection->CreateOffer() is as follows:




Two functions highlighted in the diagram deserve special attention:

PeerConnection::GetOptionsForUnifiedPlanOffer() iterates over all RtpTransceivers in the PeerConnection and creates a media description object, MediaDescriptionOptions, for each one; in the final SDP each MediaDescriptionOptions becomes one m-line. Per the earlier analysis one track corresponds to one RtpTransceiver, so in essence each track corresponds to one m-line in the SDP. The traversal produces all the media descriptions, which are stored in a MediaSessionOptions object and passed along until MediaSessionDescriptionFactory::CreateOffer() finally uses it to create the SDP.


The other is MediaSessionDescriptionFactory::CreateOffer(), the SDP creation step, which creates a corresponding MediaContent for each media object, i.e. for each audio, video and data track. The right side of the figure shows the process of creating the VideoContent for a video track; the static method CreateStreamParamsForNewSenderWithSsrcs(), highlighted in yellow, generates a unique SSRC value for each RtpSender. RtpSender->SetSsrc() is not called yet: at this point the SSRC exists only in the SDP message, waiting to be applied by SetLocalDescription().


3. SetLocalDescription()
In the success callback of CreateOffer(), on the one hand we send the offer SDP to the peer through signaling; on the other hand we call SetLocalDescription() to apply it locally.


The rough flow of SetLocalDescription() is as follows:




As shown above, the SetLocalDescription() process is quite complex, and we focus on the creation and association of key nodes in the video pipeline. The key functions are shown in yellow in the figure above.



Creation of the objects in the pipeline:
1) PeerConnection::UpdateTransceiverChannel() checks whether each RtpTransceiver in the PeerConnection already has a MediaChannel; if not, it calls WebRtcVideoEngine::CreateMediaChannel() to create a WebRtcVideoChannel object and assigns it to the RtpTransceiver's RtpSender and RtpReceiver. This answers the earlier question of the VideoRtpSender's media_channel_ member being empty;


2) PeerConnection::UpdateSessionState() applies the information in the SDP to the WebRtcVideoChannel created in the previous step, calling WebRtcVideoChannel::AddSendStream() to create a WebRtcVideoSendStream; if there are multiple video tracks, multiple WebRtcVideoSendStreams are created, one matching each track. The WebRtcVideoSendStream objects are stored in the WebRtcVideoChannel's std::map member send_streams_, keyed by SSRC. When a WebRtcVideoSendStream is created, its constructor creates a VideoSendStream, and the VideoSendStream constructor in turn creates the VideoStreamEncoder object. At this point all the relevant objects have been created.


Building the pipeline:
As analyzed earlier, the VideoRtpSender->SetSsrc() method is very important; it is finally called at the end of PeerConnection::ApplyLocalDescription(). It triggers the track to be passed from the VideoRtpSender to the WebRtcVideoChannel and then to the WebRtcVideoSendStream, where it becomes the WebRtcVideoSendStream's source_ member. This establishes the logical VideoRtpSender->WebRtcVideoChannel->WebRtcVideoSendStream pipeline;


WebRtcVideoSendStream::SetVideoSend() then triggers a call to VideoSendStream::SetSource(), which passes the WebRtcVideoSendStream as the video source (see the earlier class diagram: WebRtcVideoSendStream implements VideoSourceInterface) to the VideoSourceProxy member of VideoStreamEncoder. Inside VideoSourceProxy::SetSource(), WebRtcVideoSendStream::AddOrUpdateSink() is called in the reverse direction to add the VideoStreamEncoder as a VideoSink (see the earlier class diagram: VideoStreamEncoder implements VideoSinkInterface) to the WebRtcVideoSendStream. Note that WebRtcVideoSendStream::AddOrUpdateSink() calls source_->AddOrUpdateSink() to add the VideoStreamEncoder further down into the VideoTrack (as described earlier, the VideoTrack has already been passed into the WebRtcVideoSendStream as a member). This establishes the logical WebRtcVideoSendStream->VideoSendStream->VideoStreamEncoder section of the pipeline; the toy sketch below illustrates the SetSource()/AddOrUpdateSink() hand-shake.
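
A toy, self-contained sketch of that hand-shake (names only mirror WebRTC's; the real classes forward through VideoSourceProxy, VideoTrack and the VideoBroadcaster as described above):

```cpp
#include <cstdio>

struct FrameSink {
  virtual void OnFrame(int frame) = 0;
  virtual ~FrameSink() = default;
};
struct FrameSource {
  virtual void AddOrUpdateSink(FrameSink* sink) = 0;
  virtual ~FrameSource() = default;
};

// Plays the role of VideoStreamEncoder: it is a sink, and SetSource() registers it back on the source.
class MiniEncoder : public FrameSink {
 public:
  void SetSource(FrameSource* source) { source->AddOrUpdateSink(this); }
  void OnFrame(int frame) override { std::printf("encoding frame %d\n", frame); }
};

// Plays the role of WebRtcVideoSendStream: a source whose AddOrUpdateSink() forwards the sink
// further upstream (ultimately to VideoTrack and the VideoBroadcaster).
class MiniSendStream : public FrameSource {
 public:
  void AddOrUpdateSink(FrameSink* sink) override { sink_ = sink; }
  void DeliverFrame(int frame) { if (sink_) sink_->OnFrame(frame); }

 private:
  FrameSink* sink_ = nullptr;
};

int main() {
  MiniSendStream send_stream;
  MiniEncoder encoder;
  encoder.SetSource(&send_stream);  // SetSource() calls AddOrUpdateSink() in the reverse direction
  send_stream.DeliverFrame(1);      // ... and frames now flow source -> encoder
}
```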


At this point, from the sender's point of view, the entire pipeline from capture to the encoder has been established.


4. Summary

1. From the perspective of the API that WebRTC exposes, the four calls CreateVideoTrack(), AddTrack(), CreateOffer() and SetLocalDescription() establish the video pipeline from the source all the way to the encoder. The details, of course, are complicated.
2. Although many classes are involved, a video frame essentially does not flow through that many objects on its way from the capture module to the encoder module. The receiving objects all implement the VideoSinkInterface interface, and the video frames arrive in these objects' OnFrame() methods. Data in WebRTC always flows from Source to Sink.


About the author

Li Yi, senior C/C++ engineer at Good Future (TAL).
