WebRTC SDP details and analysis

WebRTC stands for Web Real-Time Communication, a web implementation of real-time communication (RTC). The project was open-sourced by Google and standardized together with the IETF and the W3C. WebRTC is gaining support from more and more vendors in China, and its application prospects keep broadening. We have therefore set up a column to share the WebRTC research done inside Ali Cloud.

This is the first article of the Ali Cloud video cloud WebRTC technology column. Using WebRTC SDP examples and key attributes as the starting point, the authors give an in-depth analysis of SDP and share some practical experience from Ali Cloud technical experts, hoping to help or inspire you. Later articles in the column will explain and analyze WebRTC ICE, DTLS, SRTP, RTCP, and TURN in detail. You are welcome to follow our public account.

Authors: Forget Li, senior technical expert at Ali Cloud, responsible for Ali Cloud RTC server research and development; Tai Yi, senior development engineer at Ali Cloud, engaged in Ali Cloud RTC server research and development.

Overview

In a narrow sense, WebRTC refers to the browser side. How do browsers exchange data directly with each other? They cannot do it entirely on their own and have to rely on servers, generally the following:

Signaling server: exchanges room and conference media information, as well as messages during the conference. It uses the SDP protocol for media description, which is the focus of this article.
ICE servers: divided into the STUN server, which helps two clients punch holes through NAT to establish a P2P connection, and the TURN server, which relays the traffic directly if hole punching fails. ICE address information is called a Candidate and can be exchanged inside the SDP or separately via Trickle ICE.
SFU or MCU server: when several people are in a meeting and every endpoint sends its data directly to every other endpoint, the topology is called MESH, which has obvious limitations. With SFU forwarding, a client sends only one upstream to the server and receives one downstream per other client, while the MCU goes further: each client has only one upstream and one (mixed) downstream.

Note: Besides transmission, another important feature of WebRTC is security, namely DTLS. Some DTLS information is carried in the SDP. Later technical articles will introduce DTLS.

Next, we introduce the SDP protocol.

What’s SDP

The key SDP attribute diagram at the beginning of this article has helped us to get a global view of what SDP looks like. The SDP describes media sessions, network information, security features, and transmission policies. Each SDP attribute in the figure plays a different role in different application scenarios and should not be underestimated.

Next, the official definition: the Session Description Protocol (SDP) is a text-based protocol for describing sessions. It is not a transport protocol and relies on other protocols (such as SIP and HTTP) to exchange the media information needed for media negotiation between two session entities.

The Offer and Answer of WebRTC both contain SDP. Related RFCs include:

1998, RFC2327
2006, RFC4566

Analysis of a WebRTC SDP example

Offer and Answer

WebRTC uses the offer-answer model to exchange SDP, with SDP in Offer and SDP in Answer. For example, Alice and Bob communicate via WebRTC:

// Alice Offer
v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
a=msid-semantic: WMS gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:l5KU
a=ice-pwd:+Sxmm3PoJUERpeHYL0HW4/T9
a=ice-options:trickle
a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE:DA:64:A0:37:7E:61:CB:9D:91:9B:44:F6:C9:AC:3B:37:1C:00:15:4C:5A:B5:67:74
a=setup:actpass
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 2527104241
a=ssrc:2527104241 cname:JPmKBgFHH5YVFyaJ
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS c7072509-df47-4828-ad03-7d0274585a56
a=ssrc:2527104241 mslabel:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
a=ssrc:2527104241 label:c7072509-df47-4828-ad03-7d0274585a56

// Bob Answer
v=0
o=- 5443219974135798586 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
a=msid-semantic: WMS uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:MUZf
a=ice-pwd:4QhikLcmGXnCfAzHDB++ZjM5
a=ice-options:trickle
a=fingerprint:sha-256 2A:5A:B8:43:66:05:B3:6A:E9:46:36:DF:DF:20:11:6A:F6:11:EA:D9:4E:26:E3:CE:5A:3A:C6:8D:03:49:7B:DE
a=setup:active
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 3587783331
a=ssrc:3587783331 cname:INxZnBV2Sty1zlmN
a=ssrc:3587783331 msid:uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs a3b297e7-cdbe-464e-a32c-347465ace055
a=ssrc:3587783331 mslabel:uiZ7cB0hsFDRGgTIMNp6TajUK9dOoHi43HVs
a=ssrc:3587783331 label:a3b297e7-cdbe-464e-a32c-347465ace055

Remark: In Chrome, first open chrome://webrtc-internals, then open Alice’s page and click the Share button, then open Bob’s page and click the Share button. The Offer and Answer above then appear in webrtc-internals.

After the SDP is exchanged, the Candidates are exchanged:

// Alice Candidate
candidate:1912876010 1 udp 2122260223 30.2.220.94 52832 typ host generation 0 ufrag l5KU network-id 1 network-cost 10
candidate:1015535386 1 tcp 1518280447 30.2.220.94 9 typ host tcptype active generation 0 ufrag l5KU network-id 1 network-cost 10

// Bob Candidate
candidate:1912876010 1 udp 2122260223 30.2.220.94 51551 typ host generation 0 ufrag MUZf network-id 1 network-cost 10

In the end, the Candidate pair selected for communication between Alice and Bob uses the UDP channel:

The video packets sent by Alice:

The video packets Alice received from Bob:

Generally speaking, the party that publishes a stream sends the Offer first, and the receiving party replies with the Answer. For example, when a client pushes a stream to an SFU, the client sends the Offer, the SFU returns the Answer, the client pushes the stream to the SFU, and the SFU forwards it to the other clients. Licode and Janus both work this way. With this approach, a client that needs to pull streams from other clients generally uses another PeerConnection, on which it receives an Offer from the SFU, generates an Answer, and replies to the SFU.

However, the Offer does not have to come from the publisher. The receiver can send the Offer and the publisher can reply with the Answer. For example, with an SFU like MediaSoup, the client first sends an Offer to the SFU, and the SFU only checks the media capabilities in it. The SFU then generates its own Offer (containing the streams of the other clients in the meeting, or no SSRC if nobody else is present) and sends it to the client, and the client replies to the SFU with an Answer. The advantage is that when other clients join or a stream changes (for example, video is turned off or on), a re-offer can be used: the SFU always initiates the new Offer and the client answers, so there is only one interaction pattern between the SFU and the client.

SDP Structure

An SDP description consists of a session level and a media level. For details, see RFC4566. Fields marked with an asterisk (*) are optional. The common contents are as follows:

Session description (session level)
   v=  (protocol version)
   o=  (originator and session identifier)
   s=  (session name)
   c=* (connection information -- not required if included in all media)
   One or more time descriptions ("t=" and "r=" lines; see below)
   a=* (zero or more session attribute lines)
   Zero or more media descriptions

Time description
   t=  (time the session is active)

Media description, if present
   m=  (media name and transport address)
   c=* (connection information -- optional if included at session level)
   a=* (zero or more media attribute lines)

Compare this with Alice’s Offer (which includes only video, with audio not enabled):

// Session description
v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
c=IN IP4 0.0.0.0

// Time description
t=0 0

// Session Attributes
a=group:BUNDLE video
a=msid-semantic: WMS gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS

// Media description
m=video 9 UDP/TLS/RTP/SAVPF 96 97
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:l5KU
a=ice-pwd:+Sxmm3PoJUERpeHYL0HW4/T9
a=ice-options:trickle
a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE:DA:64:A0:37:7E:61:CB:9D:91:9B:44:F6:C9:AC:3B:37:1C:00:15:4C:5A:B5:67:74
a=setup:actpass
a=mid:video
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 2527104241
a=ssrc:2527104241 cname:JPmKBgFHH5YVFyaJ
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS c7072509-df47-4828-ad03-7d0274585a56
a=ssrc:2527104241 mslabel:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
a=ssrc:2527104241 label:c7072509-df47-4828-ad03-7d0274585a56

SDP lines are order-dependent. For example, a=rtpmap:96 is followed by all the settings related to it, until the next a=rtpmap line or some other attribute begins.
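
As a rough illustration of the session-level and media-level split described above, the following Python sketch groups the lines of an SDP by section. The function name split_sdp is hypothetical and not part of any WebRTC library. It assumes that everything before the first m= line is session level and that each m= line opens a new media description.

```python
def split_sdp(sdp):
    """Split an SDP text into session-level lines and per-media sections."""
    session, media = [], []
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # a new media description begins
        elif media:
            media[-1].append(line)    # belongs to the most recent m= line
        else:
            session.append(line)      # everything before the first m= line
    return session, media


offer = """v=0
o=- 2397106153131073818 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE video
m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=mid:video
a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
"""

session, media = split_sdp(offer)
print(len(session), "session lines,", len(media), "media section(s)")
```

Because the grouping depends only on line order, shuffling the lines of an SDP changes its meaning, which is why order matters when assembling one by hand.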

SDP lines have no unified schema, that is, no single fixed rule can parse every line. The SDP grammar only describes the general form of the attributes, and the concrete syntax of each attribute has to follow that attribute's own definition in RFC 4566, for example:

a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]

When parsing SDP, every SDP line starts with key=. When the key is a, RFC4566 defines two forms:

a=<attribute>
a=<attribute>:<value>

For example, in c=IN IP4 0.0.0.0 the key is c. In a=rtcp-mux the key is a, the attribute is rtcp-mux, and there is no value. In a=rtpmap:96 VP8/90000 the key is a, the attribute is rtpmap, and the value is 96 VP8/90000.

Note that not every colon (:) separates an attribute from its value: only the first one does. The value itself can also contain colons, for example:

a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
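
A minimal Python sketch of this parsing rule follows. The helper name parse_sdp_line is an assumption made for illustration. It splits on the first = to get the key and, for a lines, on the first colon only, so colons inside the value are left intact.

```python
def parse_sdp_line(line):
    """Return (key, attribute, value) for one SDP line.

    For keys other than "a" the attribute is None and the value is
    everything after "key=". For a=<attribute> the value is None.
    """
    key, _, rest = line.strip().partition("=")
    if key != "a":
        return key, None, rest
    # Split on the FIRST colon only: the value may contain more colons.
    attribute, colon, value = rest.partition(":")
    return key, attribute, (value if colon else None)


for line in (
    "c=IN IP4 0.0.0.0",
    "a=rtcp-mux",
    "a=rtpmap:96 VP8/90000",
    "a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE",
    "a=extmap:2 urn:ietf:params:rtp-hdrext:toffset",
):
    print(parse_sdp_line(line))
```

Splitting with a plain line.split(":") would break the fingerprint and extmap values into many pieces, which is the mistake the remark above warns about.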

Session Level Field

The session-level SDP fields are mainly v, o, s, c, b, and t.

v(version)

SDP version. The value is fixed to 0.

o(origin)

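Describes the originator of the session. In RFC 4566 its form is o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>.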

s(session name)

The name of the session. Each SDP can contain only one s line, and its value cannot be empty.

c(connection data)

Carries the connection information of the session, which is essentially the IP address. The c line can appear in the session-level description and in each media-level description. If both levels have a c line, the media-level one takes precedence. Because WebRTC exchanges address information through ICE Candidates, the c line is not used there, but that does not make it useless: in SIP video conferencing scenarios the c line is essential, and this field is covered again at the end of this article.

b (bandwidth)

Represents the recommended bandwidth for session or media use.

t(timing)

Specifies the start and end times of the session. If both start and end times are 0, that means the session is permanent.

Refer to RFC 4566 for a more detailed description of session level fields.

Media Codecs

After the session-level description come zero or more media-level descriptions, for example:

// Session Description
v=0
......

// Audio Media Description
m=audio 9 UDP/TLS/RTP/SAVPF 111
......

// Video Media Description
m=video 9 UDP/TLS/RTP/SAVPF 96 97
......

This SDP describes one audio and one video media section. The format of the m line is defined in RFC4566:

m=<media> <port> <proto> <fmt> ...

The numbers 111, 96, and 97 are the fmt values, that is, the payload types that identify the media codecs of audio and video respectively. The rtpmap, rtcp-fb, and fmtp attributes that follow describe them in more detail.

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:audio
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10; useinbandfec=1

m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
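
To make the relation between the fmt list on the m line and the rtpmap, fmtp, and rtcp-fb attributes concrete, here is a small Python sketch that builds a per-payload-type codec table for one media description. The function name parse_media_section and the dictionary layout are assumptions for illustration only. A wildcard line such as a=rtcp-fb:* would simply end up under the key "*" in this sketch.

```python
def parse_media_section(lines):
    """Parse one media description into (media, port, proto, codecs)."""
    media, port, proto, *fmts = lines[0][len("m="):].split()
    codecs = {pt: {} for pt in fmts}   # keeps the payload type order of the m line
    for line in lines[1:]:
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            # <encoding name>/<clock rate>[/<encoding parameters>]
            name, clock_rate, *_ = encoding.split("/")
            codecs.setdefault(pt, {}).update(name=name, clock_rate=int(clock_rate))
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            codecs.setdefault(pt, {})["fmtp"] = params
        elif line.startswith("a=rtcp-fb:"):
            pt, feedback = line[len("a=rtcp-fb:"):].split(" ", 1)
            codecs.setdefault(pt, {}).setdefault("rtcp_fb", []).append(feedback)
    return media, int(port), proto, codecs


audio = [
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "a=mid:audio",
    "a=rtpmap:111 opus/48000/2",
    "a=rtcp-fb:111 transport-cc",
    "a=fmtp:111 minptime=10; useinbandfec=1",
]
print(parse_media_section(audio))
```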

Remark: The m line covers not only audio and video but also other media types such as application (BFCP) and text.

Remark: The a=mid attribute can be considered the unique ID of each m description. For example, with a=mid:audio, audio is the ID of that m description. The mid value can also be a number: with a=mid:0, the ID is 0. The mid value is used together with the BUNDLE grouping attribute. For example, a=group:BUNDLE audio video means that the m descriptions whose mid is audio and video are multiplexed on one transport.

Remark: The number 9 on the m line is the transport port of the media type. In RTC scenarios the addresses in the ICE Candidates are used for data transmission, so the port on the m line is not used. In SIP scenarios, however, the m-line port matters: it is the RTP port and is conventionally an even number. Combined with the IP address in the c line of the session-level description, it gives the transport address of that SIP media stream.

Remark: rtx stands for retransmission. In the video section above, payload type 97 is rtx with apt=96, which means that 97 carries retransmissions of the stream encoded with payload type 96 (VP8).
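
The apt association can be extracted mechanically. The following Python sketch, with the hypothetical helper name rtx_associations, looks for payload types whose rtpmap encoding name is rtx and reads the apt parameter from their fmtp line.

```python
def rtx_associations(lines):
    """Map each rtx payload type to the payload type it retransmits (apt)."""
    rtx_pts, apt = [], {}
    for line in lines:
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            if encoding.split("/")[0].lower() == "rtx":
                rtx_pts.append(pt)
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            for param in params.split(";"):
                name, _, value = param.strip().partition("=")
                if name == "apt":
                    apt[pt] = value
    return {pt: apt[pt] for pt in rtx_pts if pt in apt}


video = [
    "m=video 9 UDP/TLS/RTP/SAVPF 96 97",
    "a=rtpmap:96 VP8/90000",
    "a=rtpmap:97 rtx/90000",
    "a=fmtp:97 apt=96",
]
print(rtx_associations(video))
```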

The media streams themselves are identified by SSRC:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=ssrc:2582129002 cname:8Y1pmIKBijmWeALu
a=ssrc:2582129002 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H bab38910-40cd-4581-9a20-e3f558abb397
a=ssrc:2582129002 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H
a=ssrc:2582129002 label:bab38910-40cd-4581-9a20-e3f558abb397

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=ssrc:565530905 cname:8Y1pmIKBijmWeALu
a=ssrc:565530905 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H 2c533cfe-b6bf-41a8-93f0-1ca031436702
a=ssrc:565530905 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H
a=ssrc:565530905 label:2c533cfe-b6bf-41a8-93f0-1ca031436702

Remark: The SSRC lines describe the media streams that will be sent, and they can appear in both the Offer and the Answer. For example, when a client communicates with MediaSoup, MediaSoup always sends the Offer to the client. That Offer contains the SSRCs of the streams MediaSoup wants to send (streams forwarded from other clients). The client’s Answer in turn contains the SSRCs of the streams the client will push, and the direction of both is sendrecv.

Remark: msid corresponds to MediaStream.id, that is, it represents a media source. Different SSRCs can therefore belong to different media sources.
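
As a sketch of how these a=ssrc lines can be read, the Python snippet below groups the attributes by SSRC. The helper name parse_ssrcs is hypothetical. Lines such as a=ssrc-group: are deliberately skipped because they do not start with a=ssrc: and have a different syntax.

```python
from collections import defaultdict


def parse_ssrcs(lines):
    """Group a=ssrc attributes by SSRC: {ssrc: {attribute: value}}."""
    ssrcs = defaultdict(dict)
    for line in lines:
        if not line.startswith("a=ssrc:"):
            continue
        ssrc, _, attr = line[len("a=ssrc:"):].partition(" ")
        name, _, value = attr.partition(":")
        ssrcs[int(ssrc)][name] = value
    return dict(ssrcs)


audio = [
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "a=ssrc:2582129002 cname:8Y1pmIKBijmWeALu",
    "a=ssrc:2582129002 msid:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H bab38910-40cd-4581-9a20-e3f558abb397",
    "a=ssrc:2582129002 mslabel:34fD1qguf2v79436S1khLkth8Nb6LbedcF9H",
    "a=ssrc:2582129002 label:bab38910-40cd-4581-9a20-e3f558abb397",
]
for ssrc, attrs in parse_ssrcs(audio).items():
    stream_id, track_id = attrs["msid"].split()
    print(ssrc, "stream:", stream_id, "track:", track_id)
```

In this example the first token of msid equals mslabel and the second equals label, which matches the lines shown above.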

How is the final codec determined? The other party gives it in the Answer. For example, the Offer above lists multiple codecs, and the Answer selects among them:

m=audio 9 UDP/TLS/RTP/SAVPF 111

m=video 9 UDP/TLS/RTP/SAVPF 100 102 127 125 108 124
a=rtpmap:100 H264/90000
a=rtpmap:102 H264/90000
a=rtpmap:127 H264/90000
a=rtpmap:125 H264/90000
a=rtpmap:108 red/90000
a=rtpmap:124 ulpfec/90000

Although the video payload types 100, 102, 127, and 125 differ, they are all H.264, while 108 (red) and 124 (ulpfec) are FEC mechanisms that protect the H.264 stream.

PlanB and UnifiedPlan

The Media Codecs section above does not say how multiple streams are described. In practice, both audio and video can have multiple SSRCs, and each SSRC may use the same or a different codec. In Internet video conferencing, for example, the codec may be H.264 when connecting with mobile terminals, while other terminals may use other codecs.

If the SSRCs do not share the same codec, putting them into one m description is a problem, and this is the crux of the difference between PlanB and UnifiedPlan. With PlanB there is only one m=audio and one m=video, so the streams inside each must use the same codec, and multiple media streams are told apart by SSRC. With UnifiedPlan there can be multiple m=audio and m=video descriptions. Each stream has its own m description and can therefore use a different codec.

PlanB and UnifiedPlan are in fact two different SDP negotiation modes of WebRTC for scenarios with multiple media sources. In terms of Stream and Track: a Stream may contain an AudioTrack and a VideoTrack, and with multiple Streams there are even more Tracks. If each Track corresponds to exactly one m description of its own, it is UnifiedPlan. If one m line describes multiple Tracks (distinguished by track ID), it is PlanB.

Note: Plan B and UnifiedPlan formats are compatible when there is only one audio stream and one video stream.
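
The distinction can be approximated with a simple heuristic, sketched below in Python. This is only an assumption-laden illustration, not how any browser decides: the function name guess_sdp_semantics is hypothetical, and it relies on tracks being announced through a=ssrc ... msid: lines. If any single m description announces more than one distinct track, the SDP looks like PlanB. Otherwise it is either UnifiedPlan or the single-stream case in which both formats look alike.

```python
def guess_sdp_semantics(sdp):
    """Heuristic: several tracks inside one m= section suggests PlanB."""
    tracks_per_section = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            tracks_per_section.append(set())
        elif line.startswith("a=ssrc:") and tracks_per_section:
            _, _, attr = line.partition(" ")
            name, _, value = attr.partition(":")
            if name == "msid":
                # value is "<stream id> <track id>"
                tracks_per_section[-1].add(value)
    if any(len(tracks) > 1 for tracks in tracks_per_section):
        return "plan-b"
    return "unified-plan-or-single-track"


plan_b_like = """m=video 9 UDP/TLS/RTP/SAVPF 96
a=ssrc:1111 msid:streamA trackA
a=ssrc:2222 msid:streamB trackB
"""
print(guess_sdp_semantics(plan_b_like))
```

Several SSRCs that share the same msid (for example a media SSRC and its rtx SSRC) count as one track here, so they do not trigger the PlanB branch.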

Remark: Chrome originally supported PlanB, and the latest versions support UnifiedPlan. See Need to Implement WebRTC “Unified Plan” for Multistream.

PlanB refer to the following figure:

UnifiedPlan refer to the following figure:

Candidate

A Candidate is a candidate transport address. The client generates multiple Candidates of different types and transports, such as host, relay, UDP, and TCP, as shown below:

sdpMid: audio, sdpMLineIndex: 0, candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:2213672593 1 udp 2122260223 30.2.228.19 55061 typ host
sdpMid: audio, sdpMLineIndex: 0, candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host
sdpMid: video, sdpMLineIndex: 1, candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714
sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618

Remark: The trailing attributes, such as generation 0 ufrag kce9 network-id 1 network-cost 10, have been removed here. They are extensions of the Candidate description and are related to connectivity checks.

The client generated six Candidates: three for audio and three for video, two TCP and four UDP, four host and two relay. The other side naturally has its own Candidates as well. The local Candidates are then paired with the remote ones to form CandidatePairs, on which ICE Connectivity Checks are run. A Candidate also carries network attributes, such as network-cost, that are used in ICE Connectivity Checks.
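
The counts above can be reproduced with a short Python sketch. The helper parse_candidate is hypothetical. It assumes the usual layout of a candidate string: six positional fields (foundation, component, transport, priority, address, port) followed by key/value pairs such as typ, raddr, and rport.

```python
from collections import Counter


def parse_candidate(text):
    """Parse 'candidate:<foundation> <component> <transport> ...' into a dict."""
    fields = text.split(":", 1)[1].split()
    foundation, component, transport, priority, address, port = fields[:6]
    cand = {
        "foundation": foundation,
        "component": int(component),
        "transport": transport.lower(),
        "priority": int(priority),
        "address": address,
        "port": int(port),
    }
    # The remaining fields come in key/value pairs: typ, raddr, rport, ...
    rest = fields[6:]
    cand.update(zip(rest[0::2], rest[1::2]))
    return cand


candidates = [
    "candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host",
    "candidate:2213672593 1 udp 2122260223 30.2.228.19 55061 typ host",
    "candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host",
    "candidate:3446803041 1 tcp 1518280447 30.2.228.19 9 typ host",
    "candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714",
    "candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618",
]
parsed = [parse_candidate(c) for c in candidates]
print(Counter(c["typ"] for c in parsed))
print(Counter(c["transport"] for c in parsed))
```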

Remark: Besides host and relay there are also the srflx and prflx Candidate types. Their definitions and differences will be introduced in a later article on ICE.

We will give a detailed analysis of ICE Connectivity Checks later, involving STUN protocol. Ice-related SDP information is summarized below.

SDP and Candidates are exchanged through signaling. Suppose the peer provides only a relay Candidate, for example:

sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 51542 typ relay raddr 42.120.74.91 rport 56380

In this case, the CandidatePair that is finally connected is relay to relay, as shown below:

From this figure, you can see the bit rate, number of packets, RTT and packet loss rate of the transmission channel.

In fact, since our client has a host Candidate, it will try to connect to the relay directly using the host Candidate:

sdpMid: audio, sdpMLineIndex: 0, candidate:2213672593 1 udp 2122260223 30.2.228.19 51068 typ host

Statistics Conn-audio-1-1
googActiveConnection false

Of course, this CandidatePair is unusable because the connection cannot be established.

Remark: WebRTC can switch between multiple Candidates, which will be analyzed again in the article on ICE Connectivity Checks.

The client itself generated two relay Candidates, one for audio and one for video. This is where BUNDLE, described below, comes in.

Bundle and RTCP-MUX

During transmission, media channels can be multiplexed in two ways: audio and video multiplexing, and RTP and RTCP multiplexing.

RTP and RTCP multiplexing means that the sender uses a single transport channel (a single port) to send both RTP and RTCP:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=rtcp-mux

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=rtcp-mux

At this point, the Receiver must be ready to receive RTCP data on the RTP port and must reserve some resources, such as RTCP bandwidth.

For audio and video multiplexing, only one Candidate is used in the end. Take the client’s own SDP Offer and its two relay Candidates:

a=group:BUNDLE audio video

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=mid:audio

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=mid:video

sdpMid: video, sdpMLineIndex: 1, candidate:150963819 1 udp 41885439 182.92.80.26 54400 typ relay raddr 42.120.74.91 rport 37714
sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 59241 typ relay raddr 42.120.74.91 rport 49618

That is, audio and video may each have their own Candidates at first, but if the peer also agrees to BUNDLE, only one Candidate is used in the end. For example, if the Answer is:

a=group:BUNDLE audio video

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=mid:audio

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=mid:video

sdpMid: audio, sdpMLineIndex: 0, candidate:150963819 1 udp 41885439 182.92.80.26 51542 typ relay raddr 42.120.74.91 rport 56380

In the end they only transmit with one Candidate. As shown below:

rtcp-mux multiplexes RTP and RTCP onto a single port, which simplifies NAT traversal, and BUNDLE multiplexes multiple media streams onto the same port. This not only simplifies candidate harvesting and the other ICE-related SDP attributes but also simplifies NAT traversal further.
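
As a rough sketch of what BUNDLE means for the number of transports, the Python snippet below reads the a=group:BUNDLE line and the a=mid lines and counts how many media transports the SDP asks for. The function names are hypothetical, and the count ignores rtcp-mux (without it, each transport would additionally need a separate RTCP component).

```python
def bundle_groups(sdp):
    """Return the lists of mids that each a=group:BUNDLE line puts together."""
    groups = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=group:BUNDLE"):
            mids = line.split()[1:]
            if mids:
                groups.append(mids)
    return groups


def transports_needed(sdp):
    """One transport per BUNDLE group plus one per m description outside any group."""
    mids = [l.strip()[len("a=mid:"):] for l in sdp.splitlines()
            if l.strip().startswith("a=mid:")]
    groups = bundle_groups(sdp)
    bundled = {mid for group in groups for mid in group}
    return len(groups) + sum(1 for mid in mids if mid not in bundled)


offer = """a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:audio
m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=mid:video
"""
print(bundle_groups(offer), transports_needed(offer))
```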

rtcp-mux is an important SDP attribute related to RTC transport. Its SDP negotiation rules are as follows:

If the Offer carries the rtcp-mux attribute and the answerer wants to multiplex RTP and RTCP onto a single port, the Answer must also carry this attribute.
If the Offer does not carry the rtcp-mux attribute, the Answer must not carry it either, and the answerer must not multiplex RTP and RTCP onto a single port.
Negotiation and use of rtcp-mux must be bidirectional.
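
The three rules above can be written down as a tiny decision function. This is a minimal Python sketch with hypothetical function names, not the logic of any particular server.

```python
def answer_carries_rtcp_mux(offer_has_mux, answerer_wants_mux):
    """Decide whether the Answer carries a=rtcp-mux."""
    # Rule 2: never answer with rtcp-mux if the Offer did not carry it.
    if not offer_has_mux:
        return False
    # Rule 1: the Answer carries it only if the answerer also wants to multiplex.
    return answerer_wants_mux


def ports_per_media(offer_has_mux, answer_has_mux):
    """Rule 3: multiplexing applies only when both directions agreed on it."""
    return 1 if (offer_has_mux and answer_has_mux) else 2


for offer_has_mux in (True, False):
    answer_has_mux = answer_carries_rtcp_mux(offer_has_mux, answerer_wants_mux=True)
    print(offer_has_mux, answer_has_mux, ports_per_media(offer_has_mux, answer_has_mux))
```

The second iteration shows the situation discussed next: an Offer without rtcp-mux forces separate RTP and RTCP ports even though the answerer would have been willing to multiplex.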

Here is an example. If a client subscribing to a stream does not carry the rtcp-mux attribute in its Offer, the server assumes that the client does not support rtcp-mux and does not multiplex RTCP. Instead, the server creates two transport channels, one for RTP and one for RTCP. Only when ICE and DTLS have succeeded on both channels is the subscription transport considered established, and only then is media sent to the client.

Imagine that the rtcp-mux attribute is missing from the Offer through an oversight: the server will never become ready. SDP looks like simple text, but only after running into a few pitfalls in real projects do we understand more deeply what the SDP attributes mean and how they work in RTC scenarios.

Remark: For more details on rtcp-mux negotiation, refer to RFC 8035.

Remark: For details on how RTP and RTCP packets are told apart by header fields when rtcp-mux is in use, see RFC 5761.

ICE Connectivity

Here we only explain the information related to SDP and ICE Connectivity Checks, and the specific process will be analyzed separately in other articles.

The ICE-related information in the SDP includes:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=ice-ufrag:kce9
a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE
a=ice-options:trickle

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=ice-ufrag:kce9
a=ice-pwd:M31WxfrwmrFvPws4+tPdbsCE
a=ice-options:trickle

ice-ufrag and ice-pwd are the username fragment and password used by the ICE short-term credential mechanism. trickle indicates that the SDP need not contain candidate information: Candidates are exchanged separately through signaling, so candidate harvesting and connectivity checks can proceed in parallel, which speeds up session establishment.

DTLS

Here we only describe the information about DTLS in the SDP. The specific DTLS handshake process will be analyzed separately in the technical articles related to DTLS.

The DTLS-related information in the SDP includes:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=fingerprint:sha-256 B0:A2:B3:AB:0B:A3:44:22:B1:C8:69:52:ED:04:E8:5A:A4:C3:7A:A6:55:F3:BA:76:62:26:4B:F7:9F:DD:F1:BD
a=setup:actpass

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
a=fingerprint:sha-256 B0:A2:B3:AB:0B:A3:44:22:B1:C8:69:52:ED:04:E8:5A:A4:C3:7A:A6:55:F3:BA:76:62:26:4B:F7:9F:DD:F1:BD
a=setup:actpass

fingerprint is the hash of the certificate used in the DTLS handshake. It lets each side verify that the certificate presented by the client or server has not been tampered with.
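
The format of the fingerprint value can be sketched in Python as follows. The sketch assumes that the fingerprint is the hash of the DER-encoded certificate written as colon-separated uppercase hex octets. The function names are hypothetical, and the byte string used in the example is a stand-in, not a real certificate.

```python
import hashlib


def fingerprint_value(cert_der, algorithm="sha-256"):
    """Format the hash of a DER-encoded certificate the way a=fingerprint does."""
    digest = hashlib.new(algorithm.replace("-", ""), cert_der).hexdigest().upper()
    octets = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return f"{algorithm} {':'.join(octets)}"


def matches(sdp_line, cert_der):
    """Check a certificate received in DTLS against the a=fingerprint line from the SDP."""
    algorithm, _, _ = sdp_line.strip()[len("a=fingerprint:"):].partition(" ")
    expected = "a=fingerprint:" + fingerprint_value(cert_der, algorithm)
    return sdp_line.strip().lower() == expected.lower()


fake_cert = b"stand-in bytes, not a real DER certificate"
line = "a=fingerprint:" + fingerprint_value(fake_cert)
print(line)
print(matches(line, fake_cert), matches(line, b"tampered certificate"))
```

Because the SDP travels over the signaling channel while the certificate arrives in the DTLS handshake, a mismatch between the two is what reveals tampering.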

setup specifies the DTLS role, that is, who is the DTLS client (active) and who is the DTLS server (passive). A side that can take either role uses actpass. Here we are actpass, so the other party decides the final DTLS roles in its Answer:

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
a=fingerprint:sha-256 B1:FD:D6:2D:94:4E:33:A1:8C:9D:EF:ED:EB:AC:CC:2D:E2:37:15:9B:24:8C:BF:F2:7D:6A:B3:81:23:AA:13:54
a=setup:active

Since the peer chose active, it is the DTLS client and we can only be the DTLS server. The peer sends the DTLS ClientHello to start the DTLS handshake.
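
The role decision can be sketched as a small Python function. The table of allowed Offer/Answer combinations reflects the setup attribute rules of RFC 4145 as I understand them (holdconn is left out), so treat it as an assumption rather than a normative reference. The function name dtls_roles is hypothetical.

```python
# Offer setup value -> setup values the Answer may choose (assumed, holdconn omitted).
ALLOWED_ANSWERS = {
    "actpass": {"active", "passive"},
    "active": {"passive"},
    "passive": {"active"},
}


def dtls_roles(offer_setup, answer_setup):
    """Return (offerer_role, answerer_role), each 'client' or 'server'."""
    if answer_setup not in ALLOWED_ANSWERS.get(offer_setup, set()):
        raise ValueError(f"setup:{answer_setup} is not a valid answer to setup:{offer_setup}")
    # The active side sends the ClientHello, so it is the DTLS client.
    if answer_setup == "active":
        return "server", "client"
    return "client", "server"


print(dtls_roles("actpass", "active"))
```

With the Offer and Answer shown above (actpass answered by active), the offerer ends up as the DTLS server and waits for the ClientHello.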

Stream Direction

A media stream has one of four directions: sendonly, recvonly, sendrecv, and inactive. They can appear in either the session-level or the media-level description.

sendonly indicates that data is only sent. For example, a client pushing a stream to an SFU carries the sendonly attribute in its Offer (or Answer).
recvonly indicates that data is only received. For example, a client subscribing to a stream from an SFU carries the recvonly attribute in its Offer (or Answer).
sendrecv indicates two-way transmission. For example, a client that joins a video conference and wants both to publish its own stream and to subscribe to others’ streams carries the sendrecv attribute in its Offer (or Answer).
inactive indicates that no media is sent or received. For example, if the moderator temporarily mutes user A in an RTP-based video conference, user A’s media-level description of audio carries the inactive attribute, indicating that audio data cannot be sent.

Note: According to RFC 4566, the sendonly and recvonly attributes apply only to the media, not to the associated media control protocols. In RTP-based media sessions, for example, RTCP packets are still sent in recvonly mode and are still received and processed normally in sendonly mode.

The four media direction attributes are very important. Check them carefully when assembling the SDP to make sure the media direction is correct.

For example, suppose a client subscribes to a stream from a server. If the client’s Offer carries sendonly instead of recvonly, then even though the signaling level clearly expresses a subscription, a server that validates all SDP attributes comprehensively and strictly (as it should) will not send the media stream to the client, and the Answer it returns may not carry any SSRC at all.
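
A strict server of the kind described above might apply checks like the ones in this Python sketch. The table of allowed answer directions follows the offer/answer rules of RFC 3264 as I understand them and should be read as an assumption. The function names are hypothetical.

```python
# Offer direction -> directions the Answer may use (assumed from RFC 3264).
ALLOWED_ANSWER_DIRECTIONS = {
    "sendrecv": {"sendrecv", "sendonly", "recvonly", "inactive"},
    "sendonly": {"recvonly", "inactive"},
    "recvonly": {"sendonly", "inactive"},
    "inactive": {"inactive"},
}


def server_may_send(offer_direction):
    """The server can send media only if the client offered to receive it."""
    return offer_direction in ("recvonly", "sendrecv")


def is_valid_answer(offer_direction, answer_direction):
    return answer_direction in ALLOWED_ANSWER_DIRECTIONS[offer_direction]


# A subscriber that mistakenly offers sendonly will not be sent any media.
print(server_may_send("recvonly"), server_may_send("sendonly"))
```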

RTCP Feedback

Next, let’s look at rtcp-fb, a media-level SDP attribute that tells us which RTCP feedback messages the media session can respond to. It is an important SDP attribute related to QoS.

m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack pli

In this SDP, the m description is a video section with VP8 encoding and payload type 96. For payload type 96, transport-cc indicates TWCC support for network congestion control, while ccm fir and nack pli indicate that FIR and PLI requests are handled, so a key frame can be sent on request. Support for generic NACK, which allows lost RTP packets to be retransmitted (ARQ), is advertised by a separate a=rtcp-fb:96 nack line, as in Alice’s Offer earlier.

When working with SIP, I ran into a pitfall: after sending a PLI request to a certain type of SIP device, no key frame came back. After some digging, I found that the rtcp-fb description of this device was as follows:

m=video 16402 RTP/AVP 34
a=rtpmap:34 H263/90000
a=fmtp:34 CIF4=1; CIF=1; QCIF=1; SQCIF=1
a=sendrecv
a=rtcp-fb:* ccm tmmbr
a=rtcp-fb:* ccm fir

In other words, this device supports only FIR requests and cannot process PLI requests. For professional and rigorous systems or devices, the SDP fully reflects the capabilities they have, and also lets us discover the capabilities they lack. Every SDP attribute exists for a reason and should not be ignored.

Note: rtcp-fb cannot be used in the session-level description, only at the media level, and the proto field of its m description must specify AVPF.

Note: The form a=rtcp-fb:* ccm fir also exists. The asterisk is a wildcard, indicating that every media codec in this m description supports FIR processing and key frame feedback.
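
The pitfall above could have been caught with a check like the following Python sketch, which looks up whether a payload type advertises a given feedback type and honors the * wildcard. The helper name supports_feedback is hypothetical.

```python
def supports_feedback(section_lines, payload_type, feedback):
    """True if the media description advertises `feedback` for `payload_type`."""
    for line in section_lines:
        if not line.startswith("a=rtcp-fb:"):
            continue
        pt, _, value = line[len("a=rtcp-fb:"):].partition(" ")
        # "*" applies to every payload type of this m description.
        if pt in ("*", payload_type) and value.strip() == feedback:
            return True
    return False


sip_device = [
    "m=video 16402 RTP/AVP 34",
    "a=rtpmap:34 H263/90000",
    "a=sendrecv",
    "a=rtcp-fb:* ccm tmmbr",
    "a=rtcp-fb:* ccm fir",
]
# Decide which key frame request to use before sending it.
if supports_feedback(sip_device, "34", "nack pli"):
    print("request key frames with PLI")
elif supports_feedback(sip_device, "34", "ccm fir"):
    print("request key frames with FIR")
```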

Compare with SIP SDP

In the RTC and SIP scenarios, SDP descriptions are different at the transport, media, and signaling levels.

Transmission Level

Connection process. In the RTC scenario, audio and video media streams are generally established through ICE + DTLS. The SIP scenario has no such process, so the SDP attributes related to ICE/DTLS, such as ice-ufrag, ice-pwd, setup, and fingerprint, are absent.
Port multiplexing. In the RTC scenario, audio and video streams as well as RTP and RTCP are multiplexed on a single port. Streams are told apart by SSRC, and RTP and RTCP are told apart by header field values. In the SIP scenario ports are not multiplexed: each stream and RTP/RTCP use separate ports, so there is no rtcp-mux attribute, no grouping attribute such as BUNDLE, and no ssrc attribute.
Link detection. In RTC scenarios, ICE uses STUN to probe the link and discover the peer’s egress address after NAT mapping, which is called srflx. In SIP scenarios, the egress address after NAT mapping has to be obtained through peer address discovery.
Address information. In the RTC scenario, the peers exchange their IP addresses and ports through SDP candidates. In the SIP scenario, they exchange them through the IP address on the c line and the port on the m line.

// RTC scenario
a=candidate:1 1 udp 2013266431 30.27.136.138 14306 typ host

// SIP scenario
c=IN IP4 30.41. 5131.
m=audio 2352 RTP/AVP 107 108 114 104 105 9 18 8 0 101 123
m=video 2374 RTP/AVP 97 126 96 34 123

Media Level

Screen sharing. In the SIP scenario, BFCP is used to negotiate screen sharing and the a=content attribute distinguishes the main stream from the shared stream. In the RTC scenario, external (business) signaling is used to negotiate screen sharing, and the SDP descriptions of the main stream and the shared stream are the same.
Media codec. Opus plus H.264/VP8 is currently the common audio and video codec combination in the RTC scenario. In the SIP scenario, many devices do not support Opus and use older audio codecs such as G722, PCMA, and PCMU. For video, H.264 is generally supported, but VP8 is not.

Signaling Level

SDP exchange. Both are Offer/Answer models. In RTC scenarios, SDP is mainly exchanged through HTTP/TCP. Generally, SDP information is carried in HTTP body. In the SIP scenario, SDP can be exchanged over UDP, TCP, or TLS, and SDP information is carried in the INVITE and 200 OK packets.

Summary

The SDP format itself is very simple. The difficulty lies in the many attributes that different application scenarios (such as traditional SIP video conferencing or RTC) have added, and in what they mean. These attributes are scattered across numerous RFCs and drafts, and a comprehensive understanding takes considerable effort (PS: every time I say this I feel overwhelmed. There are too many WebRTC RFCs, they all reference one another, and reading them all costs you 0.2 diopters of eyesight).

In the next post, we’ll focus on WebRTC ICE, including connectivity detection, state switching, Trickle and Nomination. Thanks for reading.
