This series of articles surveys several streaming media transport protocols, including RTP/RTCP and RTMP. By working through the design details of each protocol in depth, it aims to give developers in the streaming media field some inspiration.

Author: Yi Shu Review: Tai Yi

Introduction

RTP, the Real-time Transport Protocol, provides end-to-end transport services for real-time data such as interactive audio and video. These services include payload type identification, sequence numbering, timestamping, and delivery monitoring. Applications typically run RTP on top of UDP to make use of its multiplexing and checksum services. However, RTP can also be used with other suitable underlying protocols, and if the underlying network supports multicast, RTP can deliver data to multiple destinations.

Note that RTP provides no mechanism of its own to ensure timely delivery or quality of service (QoS); it relies on lower-layer services for these. RTP guarantees neither reliable delivery nor in-order delivery, and it does not assume that the underlying network is reliable or delivers packets in sequence. The receiver can use the sequence numbers in RTP to put received packets back in order.

RTP and RTCP

  • Real-time Transport Protocol (RTP), which carries the real-time data.

  • RTP Control Protocol (RTCP), which monitors QoS and conveys information about the participants in a session. RTCP has no explicit membership control or session set-up procedures, but what it conveys is sufficient for loosely controlled sessions; it is not meant to cover all of an application's control requirements.

RTP is a protocol in a newer style that follows the principles of application level framing and integrated layer processing. That is, RTP is meant to be tailored to the needs of a particular application and is often integrated into the application itself rather than implemented as a separate layer. RTP is deliberately designed as an incomplete protocol framework.

RTP usage scenarios

The following examples describe some of the features of RTP. The examples are chosen to illustrate the basic operations of RTP-based applications, not to say that RTP can only be used for such applications.

Simple multicast audio conferencing

A working group is holding an audio conference over the Internet using the IP multicast service. Through some allocation mechanism, the group obtains a multicast group address and a pair of ports, one for audio data and the other for RTCP packets. The address and ports are distributed to all participants. If privacy is desired, the data and control packets can be encrypted, in which case the encryption key must also be distributed to the participants.

The audio conferencing application might send audio in packets of 20 ms each. Each chunk of audio data is preceded by an RTP header, and the result is then encapsulated in a UDP packet. The RTP header identifies the type of audio encoding carried in the packet, so that the sender can change the encoding during the conference, for example to accommodate a participant on a low-bandwidth link or to react to network congestion.

Packet networks such as the Internet occasionally lose packets, deliver them out of order, and delay them by variable amounts. To cope with this, the RTP header carries timing information and a sequence number, which the receiver uses to reconstruct the timing produced by the source. In this example, each 20 ms chunk of audio is played out in sequence. Timing is reconstructed separately for each source of RTP packets in the conference. The receiver can also use the sequence number to estimate how many packets have been lost.
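The reordering and loss counting described here can be sketched in a few lines of Python. This is only an illustration: the function names `seq_delta` and `reorder_and_count_lost` are made up for this example, and it assumes that all packets come from a single source and span far fewer than 32768 sequence numbers, so that wraparound of the 16-bit counter is unambiguous.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a in 16-bit sequence space (wraps at 65536)."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_count_lost(seqs: list[int]) -> tuple[list[int], int]:
    """Return the sequence numbers in playout order and the number missing.

    Hypothetical helper: assumes one source and a window much smaller
    than 32768 packets.
    """
    if not seqs:
        return [], 0
    base = seqs[0]
    # Sort by distance from the first packet seen, so 65535 sorts before 0.
    ordered = sorted(set(seqs), key=lambda s: seq_delta(s, base))
    expected = seq_delta(ordered[-1], ordered[0]) + 1
    return ordered, expected - len(ordered)


# Packets arrive out of order across the wrap; sequence number 1 never arrives.
ordered, lost = reorder_and_count_lost([65534, 0, 65535, 2])
print(ordered, lost)
```

A real receiver would do this per SSRC and inside a bounded jitter buffer rather than over a complete list.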

Because participants join and leave during the conference, it is useful to know who is taking part at any moment and how well they are receiving the audio data. For this purpose, each conference client periodically sends a reception report, together with the name of its user, on the RTCP port. If a participant is receiving the data poorly, the sender may need to change the encoding. Besides the user name, other identifying information may be included, subject to control bandwidth limits. A client that leaves the conference sends an RTCP BYE packet.

Audio and video conferencing

If a conference carries both audio and video, the two media are transmitted as separate RTP sessions. That is, separate RTP and RTCP packets are transmitted for each medium, using different multicast addresses and/or port pairs. There is no direct coupling between the audio and video sessions at the RTP level, except that a participant in both sessions should use the same canonical name (CNAME) in the RTCP packets of both, so that the two sessions can be associated.

One motivation for this separation is that some participants can choose to receive only one medium (audio only, for example). Even though audio and video are distributed separately, a receiver can still play them back in sync by using the timing information carried in the RTCP packets of both sessions.
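As a rough illustration of how that timing information can be used, the sketch below maps RTP timestamps from two sessions onto a shared wallclock. It assumes the receiver has already extracted, for each stream, one pair of (wallclock time, RTP timestamp) from an RTCP sender report and knows the media clock rate. The class and function names, and the sample numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SenderReportRef:
    """One (wallclock, RTP timestamp) pair taken from an RTCP sender report."""
    wallclock_seconds: float  # reference clock time, in seconds
    rtp_timestamp: int        # RTP timestamp for the same instant
    clock_rate: int           # media clock rate in Hz (e.g. 8000 or 90000)


def rtp_to_wallclock(ref: SenderReportRef, rtp_timestamp: int) -> float:
    """Map an RTP timestamp of this stream onto the shared wallclock."""
    # Signed 32-bit difference, so the result survives timestamp wraparound.
    diff = ((rtp_timestamp - ref.rtp_timestamp + 0x80000000) & 0xFFFFFFFF) - 0x80000000
    return ref.wallclock_seconds + diff / ref.clock_rate


# Made-up reference pairs for an 8 kHz audio stream and a 90 kHz video stream.
audio_ref = SenderReportRef(1000.0, 160_000, 8_000)
video_ref = SenderReportRef(1000.5, 900_000, 90_000)

# Both of these media units were sampled at wallclock time 1001.0,
# so they should be played out together.
print(rtp_to_wallclock(audio_ref, 168_000))
print(rtp_to_wallclock(video_ref, 945_000))
```

The RTP timestamps of the two streams are unrelated numbers; only after mapping through their own reference pairs do they become comparable.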

Mixers & Translators

So far, we have assumed that all participants want to receive media data in the same format. This is not always appropriate: some participants may be connected through a low-speed link while most others enjoy high-speed access. Instead of forcing everyone onto a lower-bandwidth, reduced-quality audio encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area.

The mixer resynchronizes the audio packets received from the different participants, mixes them into a single stream, re-encodes that stream with a lower-bandwidth encoding, and forwards it to the low-bandwidth participants. The mixer records in the RTP header (in the CSRC list) which sources contributed to a mixed packet, so that the receiver can still tell who is currently speaking.

In addition, some participants may sit behind an application-level firewall that does not let IP multicast packets through. For these participants mixing may not be necessary; another kind of RTP-level relay, a translator, can be used instead. Two translators are installed, one on each side of the firewall. The outer translator forwards all multicast packets it receives over a secure connection to the inner translator, which then sends them on to the participants on the internal network.

Layered encodings

Multimedia applications should be able to adjust their transmission rate to the capacity of the receiver or to network congestion. Many implementations place the responsibility for rate control on the sender. This does not work well with multicast, because receivers have different bandwidth conditions. The result is a "weakest link" (barrel) effect, in which the receiver with the least bandwidth drags down the quality of the whole conference.

Therefore, the task of rate adaptation should be assigned to the receivers. The sender splits the media into several streams for participants with different bandwidths (for example 500K, 2M, and 5M), each stream is sent to its own multicast address, and each receiver joins only the streams that its bandwidth allows.

Definitions

  • RTP payload: Data transmitted in RTP packets, such as audio sampling data or compressed video data.
  • RTP packet: A packet consisting of the fixed-length RTP header, a possibly empty list of contributing sources, and the RTP payload. Some underlying protocols may define their own encapsulation of RTP packets. Typically one packet of the underlying protocol contains a single RTP packet, but several RTP packets may be carried together if the encapsulation permits.
  • RTCP packet: An RTP control packet. It starts with a fixed-length RTCP header, followed by structured elements whose layout depends on the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound packet in a single packet of the underlying protocol.
  • Port: An abstraction that transport protocols use to distinguish among multiple destinations within one host. RTP relies on the lower-layer protocol to provide a mechanism such as ports to multiplex the RTP and RTCP packets of a session.
  • Transport address: The combination of a network address and a port that identifies a transport-level endpoint.
  • RTP media type: The collection of payload types that can be carried within a single RTP session.
  • Multimedia session: A set of concurrent RTP sessions among a common group of participants, for example the audio session and the video session of a video conference.
  • RTP session: An association among a group of participants communicating with RTP. A participant can take part in several RTP sessions at the same time. In a multimedia session, each medium is typically carried in its own RTP session, unless the media are deliberately encoded into a single data stream. A participant distinguishes RTP sessions by the transport addresses on which it receives them. All participants in an RTP session may share a common destination transport address, as with IP multicast, or each participant may have its own, as with unicast. In the unicast case, a participant may receive from all other participants on the same port pair (RTP and RTCP), or may use a different port pair for each.
  • Synchronization source (SSRC): The source of a stream of RTP packets. It is identified by a 32-bit SSRC identifier carried in the RTP header, so that it does not depend on the network address. All packets from one synchronization source share the same timing and sequence number space, so a receiver groups and orders packets by SSRC. Examples of synchronization sources are the sender of a stream of packets derived from a single signal source such as a microphone or a camera, or an RTP mixer. An SSRC may change its data format, for example the audio encoding, over time. SSRC identifiers are chosen randomly but must be unique within an RTP session; uniqueness is maintained with the help of RTCP. If a participant sends several media streams in one RTP session, each stream must use a different SSRC.
  • Contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer writes into the RTP header a list of the SSRC identifiers of all sources that contributed to a mixed packet; this list is the CSRC list.
  • End system: An application that generates the content sent in RTP packets and consumes the content of received RTP packets. An end system can act as one or more synchronization sources in an RTP session, but typically acts as only one.
  • Mixer: An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner, and forwards a new RTP packet. Because the timing of the input sources is generally not synchronized, the mixer adjusts the timing among the streams and generates its own timing for the combined stream. All packets leaving the mixer therefore carry the mixer's SSRC.
  • Translator: An intermediate system that forwards RTP packets with their original SSRC identifiers intact.
  • Monitor: An application that receives the RTCP packets sent by participants in an RTP session, in particular the reception reports. It estimates the current QoS for distribution monitoring, fault diagnosis, and long-term statistics. A monitor may be built into a conference application, or it may be a separate third-party application that receives RTCP packets but sends nothing itself.
  • Non-RTP means: Protocols and mechanisms that may be needed in addition to RTP to provide a usable service. In multimedia conferences in particular, a control protocol may distribute multicast addresses and encryption keys, negotiate the encryption algorithm, and define dynamic mappings between RTP payload type values and the payload formats they represent.

Byte order, data alignment, time format

All integer fields use network byte order (big-endian), and numeric constants are expressed in decimal unless otherwise specified.

All header data is aligned to its natural length: 16-bit fields are aligned on even offsets and 32-bit fields on offsets divisible by 4. Bytes used as padding are set to zero.

Wallclock time (absolute date and time) is represented in the timestamp format of the Network Time Protocol (NTP), that is, in seconds elapsed since 0h UTC on January 1, 1900. A full NTP timestamp is a 64-bit unsigned fixed-point number, with the integer part in the first 32 bits and the fractional part in the last 32 bits. Some RTP fields use a more compact form that keeps only the middle 32 bits of the 64-bit value: the low 16 bits of the integer part followed by the high 16 bits of the fractional part.

The NTP timestamp wraps around to zero in 2036, but because RTP only uses differences between pairs of NTP timestamps, this does not matter. As long as the two timestamps in a pair are known to lie within 68 years of each other, modular arithmetic gives the correct result for subtraction and comparison, and the wraparound can be ignored.
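To make the two formats concrete, here is a minimal Python sketch that builds a 64-bit NTP timestamp from Unix time, extracts the compact middle 32 bits, and subtracts two compact values with modular arithmetic. The function names are made up for this example; the constant 2208988800 is the number of seconds between the NTP epoch (1900) and the Unix epoch (1970). The sketch assumes non-negative Unix times.

```python
NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def unix_to_ntp64(unix_seconds: float) -> int:
    """Build a 64-bit NTP timestamp: 32-bit seconds, 32-bit fraction."""
    seconds = int(unix_seconds) + NTP_UNIX_OFFSET
    fraction = int((unix_seconds % 1) * (1 << 32))
    # Masking the seconds models the wrap to zero in 2036.
    return ((seconds & 0xFFFFFFFF) << 32) | (fraction & 0xFFFFFFFF)


def ntp64_to_compact32(ntp64: int) -> int:
    """Keep the middle 32 bits: low 16 of the seconds, high 16 of the fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF


def compact32_diff_seconds(later: int, earlier: int) -> float:
    """Difference of two compact timestamps, correct across a wrap."""
    return ((later - earlier) & 0xFFFFFFFF) / 65536.0


print(hex(ntp64_to_compact32(unix_to_ntp64(0.5))))
# The second value has wrapped past zero, yet the difference is still 1 second.
print(compact32_diff_seconds(0x00008000, 0xFFFF8000))
```

The compact form covers only about 18 hours (2^16 seconds) with a resolution of 1/65536 s, which is enough for the round-trip and delay measurements it is used for.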

RTP data transfer protocol

RTP fixed-length header field

The RTP header format is as follows:

The first 96 bits (12 bytes) of the header are present in every RTP packet, while the CSRC list is present only in packets produced by a mixer. The fields have the following meaning:

  • Version (V) : 2 bits, RTP Version number. Currently, 2 is used. (1 for the first RTP draft)
  • Padding (P) : 1 bit. If this field is set, the end of the packet contains one or more Padding bytes. These Padding bytes are not the content of the payload. The last padding byte identifies how many padding bytes (including itself) need to be ignored. Padding may be used by encryption algorithms that require blocks of fixed length. Padding may also be used by some lower-level protocols to send multiple RTP packets at once.
  • Extension (X): 1 bit. If this bit is set, the fixed header is followed by exactly one header extension.
  • CSRC count (CC): 4 bits. The number of CSRC identifiers that follow the fixed header.
  • Marker (M): 1 bit. The interpretation of the marker is defined by a profile. (For the relationship between profiles and RTP, see RFC 3551; my understanding is that a profile supplements RTP to meet the needs of a particular class of applications.) The marker is intended to flag significant events in the packet stream, such as frame boundaries. A profile may define additional marker bits, or remove the marker bit altogether to lengthen the payload type field.
  • Payload type (PT): 7 bits. This field identifies the format of the RTP payload and determines how the application interprets it. A profile may specify a default static mapping from payload type codes to payload formats; additional payload types can be defined dynamically through non-RTP means. The payload type may change during an RTP session, but it should not be used to multiplex separate media streams. As noted earlier, different media streams should be carried in different sessions.
  • Sequence number: 16 bits. The sequence number increments by one for each RTP packet sent. The receiver can use it to detect packet loss and to restore packet order. The initial value should be random, which makes known-plaintext attacks on encryption more difficult.
  • Timestamp: 32 bits. The timestamp reflects the sampling instant of the first byte of data in the RTP packet. It must be derived from a clock that increments monotonically and linearly in time, because it is used for synchronization and for jitter calculations, and the resolution of that clock must be sufficient for both; one tick per video frame, for example, is typically not enough. If RTP packets are generated periodically, the timestamp is taken from the sampling clock rather than from the system clock. For example, if every RTP packet in an audio stream carries 20 ms of audio, the timestamp of each successive packet advances by the number of samples in 20 ms (160 at an 8 kHz sampling rate), regardless of the system time at which the packet is sent. Like the sequence number, the initial timestamp should be random. Several RTP packets generated at the same instant, for example packets that belong to the same video frame, carry the same timestamp. Timestamps of different media streams may advance at different rates and usually have independent random offsets. So although these timestamps suffice to reconstruct the timing of a single stream, streams cannot be synchronized by comparing their RTP timestamps directly. Instead, for each stream the RTP timestamp is paired with a timestamp from a reference clock (wallclock) that is shared by all the streams to be synchronized. Comparing these pairs gives the offset between the RTP timestamps of different streams. The pairs are not sent in every RTP packet, but at a lower rate in RTCP packets.
  • SSRC: 32 bits. This field identifies the synchronization source. The identifier should be chosen randomly so that no two synchronization sources in the same RTP session share an SSRC. Although the probability of a collision is small, every RTP implementation must be prepared to detect and resolve collisions.
  • CSRC list: 0 to 15 items, 32 bits each. The CSRC list identifies the sources (by SSRC) that contributed to the payload of this packet. The number of identifiers is given by the CC field. If there are more than 15 contributing sources, only 15 can be listed.
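The field list above maps directly onto a small parser. The following Python sketch decodes the 12-byte fixed header and the CSRC list from a raw packet using the standard `struct` module. The `RtpHeader` class and `parse_rtp_header` function are names invented for this illustration, and the sketch does not handle the header extension or padding.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: list = field(default_factory=list)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    # Byte 0: V(2) P(1) X(1) CC(4); byte 1: M(1) PT(7); then seq, timestamp, SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrc = list(struct.unpack_from(f"!{cc}I", packet, 12)) if cc else []
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )


# A hand-built packet: V=2, CC=1, M=1, PT=96, seq=0x1234, ts=160, one CSRC.
sample = bytes.fromhex("81e01234" "000000a0" "deadbeef" "11223344") + b"payload"
print(parse_rtp_header(sample))
```

The payload (and any header extension) starts at offset 12 + 4 × CC.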

RTP Session multiplexing

In RTP, multiplexing is provided by the destination transport address (network address and port number), which is different for each RTP session.

Separate audio and video streams should not be carried in a single RTP session and demultiplexed by payload type or SSRC. Interleaving packets of different media types under the same SSRC would introduce the following problems:

  1. Suppose two audio streams share an RTP Session and use the same SSRC. If one of them changes the encoding, this causes the payload type to change, but there is no way to let the receiver know which audio stream has changed the encoding.

  2. An SSRC identifies a single timing and sequence number space. Interleaving several payload types would require separate timing spaces if their media clock rates differ, and separate sequence number spaces to tell which stream suffered packet loss.

  3. RTCP sender and receiver reports can describe only one timing and sequence number space per SSRC, and they carry no payload type field.

  4. An RTP mixer could not combine interleaved streams of incompatible media into one stream.

  5. Carrying multiple media in one RTP session also gives up the following advantages:

  • Using different network paths or network resource allocations for each medium
  • Receiving only a subset of the media (audio only when the network is poor, for example)
  • Processing different media differently at the receiver

Using a different SSRC for each medium while still sending them in the same RTP session avoids the first three problems, but not the last two.

Profile-specific modifications to the RTP header

The existing RTP header is sufficient for the functions that most applications need. Where necessary, a profile may modify the header fields, provided that monitoring and statistics functions continue to work.

RTP header extension

RTP provides an extension mechanism that lets upper-layer applications carry custom information in the RTP header. An implementation that receives a header extension it does not understand ignores it.

Note that this header extension has limitations. Additional information that is only meaningful for a particular payload format is better carried in the payload section than in a header extension.

If the X bit in the RTP header is set to 1, a variable-length header extension must follow the fixed header, after the CSRC list if there is one. The extension starts with a 4-byte header of its own. The second 16 bits of that header give the length of the extension as a count of 32-bit words, not including the 4-byte extension header itself, so zero is a valid length. Only one extension can be appended to the RTP header. Because an application may need several kinds of extension, the first 16 bits of the extension header are left for the developer to define, for example as an identifier or as parameters.
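A short Python sketch of reading such an extension is given below. It assumes the caller has already parsed the fixed header, found the X bit set, and computed the offset just past the CSRC list (12 + 4 × CC). The function name `parse_header_extension` is made up for this example.

```python
import struct


def parse_header_extension(packet: bytes, offset: int) -> tuple[int, bytes, int]:
    """Read the header extension that starts at `offset`.

    Returns (profile_defined_value, extension_data, payload_offset).
    `offset` is assumed to be 12 + 4 * CC, i.e. just past the CSRC list.
    """
    if len(packet) < offset + 4:
        raise ValueError("truncated extension header")
    # First 16 bits: defined by the profile/application.
    # Second 16 bits: length in 32-bit words, excluding this 4-byte header.
    profile_defined, length_words = struct.unpack_from("!HH", packet, offset)
    start = offset + 4
    end = start + 4 * length_words
    if len(packet) < end:
        raise ValueError("truncated extension data")
    return profile_defined, packet[start:end], end


# 12 dummy header bytes, then an extension with ID 0xABCD and one 32-bit word.
pkt = bytes(12) + bytes.fromhex("abcd0001" "01020304") + b"xyz"
print(parse_header_extension(pkt, 12))
```

A receiver that does not recognize the profile-defined value can simply skip to `payload_offset`, which is how unknown extensions get ignored.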

RTP control protocol

All participants in a session periodically send control packets, which RTCP distributes using the same mechanism as the data packets. The underlying protocol must provide multiplexing of data and control packets, for example by using separate UDP port numbers. RTCP performs the following four functions:

  1. The primary function is to provide feedback on the quality of the data distribution. This is an integral part of RTP's role as a transport protocol and is related to the flow and congestion control functions of other transport protocols. The feedback may be used directly to control adaptive encodings. Sending reception feedback reports to all participants lets each one judge whether the distribution problems it sees are local or global. With a distribution mechanism such as IP multicast, an entity such as a network service provider that is not otherwise part of the session can also receive the feedback and act as a third-party monitor to diagnose distribution problems. This feedback function is performed by the RTCP sender and receiver reports.

  2. RTCP carries a persistent transport-level identifier for each RTP source, called the canonical name (CNAME). Because the SSRC may change in mid-session, for example after a program restart, receivers need the CNAME to keep track of each participant. Receivers may also use the CNAME to associate multiple data streams from the same participant, for example to synchronize audio and video. Inter-media synchronization additionally requires the NTP and RTP timestamps that data senders include in their RTCP packets.

  3. The first two functions require all participants to send RTCP packets. The rate at which they are sent must therefore be controlled so that RTP keeps working when a large number of participants join the session. Because every participant sends its control packets to all the others, each participant can independently observe the total number of participants.

  4. As an optional function, RTCP can convey a small amount of session control information, such as participant identification. This is most useful in loosely controlled sessions. RTCP can serve as a convenient channel for reaching all participants, but it should not be expected to meet all of an application's control requirements, which are usually handled by a higher-level session control protocol.

Functions 1 to 3 should be used in all environments, and in the IP multicast environment in particular. Designers of RTP applications should avoid mechanisms that only work in unicast mode and do not scale to large numbers of users. RTCP transmission may be controlled separately for senders and receivers, for example on unidirectional links where receivers cannot send feedback.

Note: in source-specific multicast (SSM), there is only one sender, and receivers cannot use multicast to communicate directly with the other members. In this case, the suggestion is to turn off the receivers' normal RTCP transmission entirely and to use an adaptation of RTCP for SSM to collect receiver feedback.

RTCP packet format

RTCP defines a number of packet types to transmit different control information:

  • SR: Sender report, carrying transmission and reception statistics from participants that are active senders.
  • RR: Receiver report, carrying reception statistics from participants that only receive data.
  • SDES: Source description items, including CNAME.
  • BYE: Indicates that a participant is leaving the session.
  • APP: Application-specific functions.

Each RTCP packet begins with a fixed header similar to that of RTP data packets, followed by structured elements whose length varies with the packet type but which always end on a 32-bit boundary. Because of this alignment and the length field in the fixed header, RTCP packets are stackable: several of them can be concatenated without any separators to form a compound RTCP packet, which is sent in a single packet of the underlying protocol. No explicit count of the individual packets is needed, because the underlying protocol supplies the overall length that marks the end of the compound packet.
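The stacking can be illustrated with a small Python sketch that walks a received datagram and splits it into individual RTCP packets using only the length field. It relies on one detail from RFC 3550 that is not spelled out above: the 16-bit length field holds the packet length in 32-bit words minus one, header included. The function name `split_compound_rtcp` is hypothetical.

```python
import struct


def split_compound_rtcp(datagram: bytes) -> list[tuple[int, int, bytes]]:
    """Split a compound RTCP packet into (packet_type, count, body) tuples.

    `count` is the 5-bit field in the first byte (for example the number of
    report blocks); `body` is everything after the 4-byte header.
    """
    packets = []
    offset = 0
    while offset < len(datagram):
        if len(datagram) - offset < 4:
            raise ValueError("trailing bytes shorter than an RTCP header")
        b0, packet_type, length_words = struct.unpack_from("!BBH", datagram, offset)
        if b0 >> 6 != 2:
            raise ValueError("unsupported RTCP version")
        # Per RFC 3550, the length field is (number of 32-bit words) - 1.
        size = 4 * (length_words + 1)
        if offset + size > len(datagram):
            raise ValueError("RTCP packet runs past the end of the datagram")
        packets.append((packet_type, b0 & 0x1F, datagram[offset + 4:offset + size]))
        offset += size
    return packets


# An empty RR (type 201) followed by a BYE (type 203), both for SSRC 1.
compound = bytes.fromhex("80c90001" "00000001" "81cb0001" "00000001")
for packet_type, count, body in split_compound_rtcp(compound):
    print(packet_type, count, body.hex())
```

No separator or packet count is needed: the loop stops when the offsets add up to the datagram length supplied by UDP.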

The individual RTCP packets in a compound packet can be processed independently, with no requirement on their order or combination. For the protocol to function properly, however, the following constraints apply:

  • Reception statistics (in SR or RR) should be sent as often as the bandwidth constraints allow, to maximize the resolution of the statistics. Every periodically transmitted compound RTCP packet must therefore include a report packet.
  • New receivers need the CNAME of a source as soon as possible in order to identify the source and to begin associating media for purposes such as synchronization. Every compound RTCP packet must therefore include an SDES CNAME, unless the compound packet is split so that part is encrypted and part is sent as plain text (described later).
  • The number of packet types that may appear first in a compound packet needs to be limited. This increases the number of constant bits in the first word and reduces the probability that stray or unrelated packets are mistaken for RTCP packets.

Therefore, a compound packet must contain at least two individual RTCP packets, in the following order:

  • Encryption prefix: If and only if the compound packet is to be encrypted, it is prefixed with a random 32-bit number. If the encryption algorithm requires padding, the padding is added to the last RTCP packet in the compound packet.
  • SR or RR: The first RTCP packet in a compound packet must always be a report packet, which simplifies header validation. This holds even if no data has been sent or received, in which case an empty RR is sent, and even if the only other RTCP packet in the compound packet is a BYE.
  • Additional RRs: A single SR or RR packet holds at most 31 reception reports. If reception statistics are reported for more than 31 sources, additional RR packets follow the initial report packet.
  • SDES: An SDES packet containing a CNAME item must be included in every compound packet. Other SDES items may be included if the application needs them, subject to bandwidth constraints.
  • BYE or APP: Other RTCP packet types, including those yet to be defined, may follow the SDES in any order, except that BYE should be the last packet sent with a given SSRC/CSRC.
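As a rough sketch, the structural rules in this list can be checked on the sequence of packet types found in a compound packet. The packet type numbers (SR = 200, RR = 201, SDES = 202, BYE = 203, APP = 204) are the values assigned in RFC 3550; the function name `check_compound_order` is invented here. The check works on packet types only, so it cannot verify that the SDES actually contains a CNAME item, and it does not model the encryption prefix.

```python
SR, RR, SDES, BYE, APP = 200, 201, 202, 203, 204


def check_compound_order(packet_types: list[int]) -> list[str]:
    """Return a list of rule violations for a compound RTCP packet.

    `packet_types` is the sequence of RTCP packet types in the order they
    appear in the (decrypted) compound packet.
    """
    problems = []
    if len(packet_types) < 2:
        problems.append("fewer than two RTCP packets")
    if not packet_types or packet_types[0] not in (SR, RR):
        problems.append("first packet is not SR or RR")
    if SDES not in packet_types:
        problems.append("no SDES packet")
    return problems


print(check_compound_order([RR, SDES, BYE]))   # well-formed
print(check_compound_order([SDES, BYE]))       # report packet missing
```

Checks like these are also a cheap way to reject datagrams that are not RTCP at all before parsing them further.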

An individual RTP participant should send only one compound RTCP packet per reporting interval, so that the RTCP bandwidth per participant is estimated correctly, except when the compound packet is split for partial encryption. If there are so many sources that not all the necessary RR packets fit into one compound packet without exceeding the MTU of the network path, only the subset that fits is included in each interval. The subsets are selected round-robin over multiple intervals so that all sources are eventually reported.

To reduce packet overhead, it is recommended that translators and mixers combine individual RTCP packets from the multiple sources they forward into one compound packet whenever feasible. An example of a compound packet as a mixer might produce it is shown below:

If the overall length of a compound packet would exceed the MTU of the network path, it should be split into several shorter compound packets that are sent separately. This does not impair the RTCP bandwidth estimation, because each of the shorter compound packets must itself begin with an SR or RR and therefore represents at least one distinct participant. The number of compound packets a mixer sends thus roughly matches, or is lower than, the number of RTCP packets it received.

An implementation should ignore incoming RTCP packets whose type it does not recognize. Additional RTCP packet types may be registered with IANA.

RTCP transmission period

RTP is designed so that an application scales automatically from sessions with a few participants to sessions with thousands. In an audio conference, for example, the data traffic is inherently self-limiting because only one or two people speak at a time, so with multicast distribution the data rate on any given link stays roughly constant regardless of the number of participants. Control traffic is different: every participant sends RTCP packets continuously. If the reception reports of every participant were sent at a constant rate, the control traffic would grow linearly with the number of participants. Therefore, the interval between RTCP packets must increase dynamically as the number of participants grows.

For each session, the data traffic is assumed to be subject to an aggregate limit, the session bandwidth, which is divided among the participants. This bandwidth may be reserved and the limit enforced by the network. If there is no reservation, other constraints may apply, depending on the environment, that establish a reasonable maximum for the session. The session bandwidth may be chosen according to the cost of the network resources it consumes, or according to prior knowledge of the bandwidth available to the session.

The session bandwidth is somewhat independent of the media encoding, but the choice of encoding may be constrained by it. Typically, the session bandwidth is the sum of the nominal bandwidths of the senders expected to be active at the same time. For an audio conference this is usually the bandwidth of a single sender, since normally only one person speaks at a time. With layered encodings, each layer is a separate RTP session with its own session bandwidth.

The session bandwidth is expected to be supplied by a session management application, but an audio conferencing application may set a default based on the single-sender data bandwidth of the encoding chosen for the session. The application may also be constrained by multicast network bandwidth or other factors. All participants in a session must use the same value for the session bandwidth, so that they all calculate the same RTCP interval.

The session bandwidth figure is expected to include the headers of the lower transport and network layers (for example UDP and IP), since that is what a resource reservation system needs to know. The application can also be expected to know which of these protocols are in use. Link-layer headers are not included, because a packet is encapsulated in different link-layer headers as it travels.

Control traffic should use only a small, known fraction of the session bandwidth, so that the transport of media data is not impaired. It is recommended that 5% of the session bandwidth be allotted to RTCP, and that at least one quarter of the RTCP bandwidth be dedicated to the participants that are sending data. In this way a newly joined receiver quickly obtains the CNAMEs of the senders even when receivers are numerous. When senders make up more than one quarter of the participants, they instead get their proportional share of the full RTCP bandwidth. A profile may also turn off receiver reports entirely. The RTP standard does not recommend this, but it can be appropriate for systems with unidirectional links or for sessions that need no feedback from receivers.

The interval between RTCP packets from each participant has a fixed minimum, so that bursts of RTCP packets do not exceed the bandwidth limit while the number of participants is small or changing quickly. When an application starts, it should also wait before sending its first RTCP packet, typically for half the minimum interval, so that the calculated interval converges more quickly to the correct value. The recommended fixed minimum interval is 5 seconds.

Upper-layer applications of RTP may use a shorter minimum RTCP sending interval, but they must still follow these principles:

  • For multicast sessions, only data senders may use the shorter RTCP sending interval.
  • For unicast sessions, both senders and receivers may use the shorter RTCP interval, and they may skip the initial delay before sending the first RTCP packet.
  • All sessions should still determine participant timeouts from the fixed minimum interval (5 seconds), so that participants that do not use the shorter interval are not timed out prematurely.
  • The recommended shorter minimum interval, in seconds, is 360 divided by the Session bandwidth in kilobits per second (kb/s). With this formula, the minimum RTCP sending interval drops below 5 seconds once the Session bandwidth exceeds 72 kb/s.
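As a quick check of the 72 kb/s threshold, the shorter minimum interval can be computed directly. This is a minimal sketch; the function name is hypothetical, and the bandwidth is assumed to be expressed in kilobits per second, as in RFC 3550.

```python
def reduced_min_interval(session_bw_kbps):
    """Shorter minimum RTCP interval in seconds: 360 / bandwidth (kb/s)."""
    return 360.0 / session_bw_kbps


print(reduced_min_interval(72))   # exactly the fixed 5-second minimum
print(reduced_min_interval(144))  # doubling the bandwidth halves the interval
```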

In addition, in order to ensure the normal operation of RTCP in large sessions, the existing algorithm also has the following characteristics:

  • The interval for sending RTCP packets grows linearly as the number of Session participants increases.
  • The RTCP sending interval is randomly scaled by a factor between 0.5 and 1.5, to prevent a large number of participants from sending RTCP packets at the same time.
  • The average size of compound RTCP packets is estimated dynamically from all packets sent and received, so the algorithm adapts to changes in the amount of control information carried.
  • The RTCP interval is calculated from the number of known Session participants. A participant that has just joined a Session therefore tends to underestimate the Session size and use an interval that is too short, especially when many people join at once. The “timer reconsideration” algorithm addresses this with a simple back-off mechanism that holds back RTCP packets while the Session size keeps growing.
  • When someone sends a BYE packet or leaves the Session due to timeout, the RTCP sending interval should be shortened.
  • BYE packets are treated differently from other RTCP packets. A participant that wants to leave may send its BYE packet before its next scheduled RTCP packet. If a large number of users leave at the same time, however, BYE packets are also subject to a back-off mechanism that prevents a flood.

Maintaining the number of Session members

As we already know, calculating the RTCP sending interval requires the number of members in the entire Session. When a new node is heard, it is counted toward the Session total and added to a table indexed by SSRC (or CSRC) identifier so that it can be tracked. A new node may be treated as valid only after multiple packets have been received from it, or after its SDES packet (with CNAME) has been received. When a node sends a BYE message, its entry may be deleted. However, because packets delayed in the network can still arrive after the BYE, it is better to mark the entry as “received BYE” and delete it after a delay if nothing else arrives from that node.

If a node receives no packet from another node for several RTCP intervals, it may mark that node as inactive or delete it. Packet loss can never be ruled out, so a single missed interval is not enough; the timeout is therefore usually the RTCP interval multiplied by a factor greater than 1.

For sessions with a very large number of participants, it may not be possible to maintain an SSRC table to store all participants. It is common to simplify the SSRC table, but it is important to note that no matter how simplified the table is, you should not underestimate the total number of participants. It is permissible to overestimate the total number of participants.

Rules for sending and receiving RTCP packets

First, any implementation that operates in a multicast environment or in a unicast environment with multiple nodes must follow the RTCP interval rules described earlier. To send and receive RTCP packets, each participant in a Session maintains the following state:

  • TP: the time when the last RTCP packet was sent;
  • TC: the current time;
  • TN: the time when the next RTCP packet is scheduled to be sent;
  • P-members: the estimated number of Session members at the time TN was last calculated;
  • Members: the current estimate of the number of Session members;
  • Senders: the current estimate of the number of data senders;
  • RTCP_BW: the target bandwidth for RTCP;
  • WE_Sent: whether this participant has sent data since its second-to-last RTCP packet was sent;
  • AVG_RTCP_Size: the average compound RTCP packet size, including transport layer and network layer headers;
  • Initial: whether this participant has not yet sent any RTCP packet.

Calculating the RTCP sending interval

In order for the RTP protocol to scale, the RTCP sending interval must be adjusted as the total number of Session members changes. Based on the state above, the RTCP packet interval is calculated as follows:

  1. If the number of media stream senders is at most 25% of the total number of members, the interval depends on whether the current node is a media stream sender (determined by WE_Sent). For a sender, the formula is Senders * AVG_RTCP_Size / (25% * RTCP_BW); for a receiver, the formula is (Members - Senders) * AVG_RTCP_Size / (75% * RTCP_BW). When senders make up more than 25% of the members, senders and receivers are treated equally, and the formula is Members * AVG_RTCP_Size / RTCP_BW.

  2. If the participant has not yet sent any RTCP packet, the minimum sending interval (Tmin) is 2.5 seconds; otherwise, it is 5 seconds.

  3. The deterministic sending interval (Td) is the larger of the value calculated in the first step and Tmin.

  4. The actual interval T is Td randomly scaled by a factor between 0.5 and 1.5.

  5. Finally, the interval is divided by e - 3/2 = 1.21828 to compensate for the effect of the “timer reconsideration” algorithm (which causes the bandwidth actually used by RTCP to fall below the intended bandwidth).
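The five steps above can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function and parameter names are hypothetical, rtcp_bw is assumed to be in bytes per second and avg_rtcp_size in bytes, and the sender test uses "at most 25%" as in RFC 3550.

```python
import math
import random

COMPENSATION = math.e - 1.5  # e - 3/2 = 1.21828...


def deterministic_interval(members, senders, rtcp_bw, we_sent,
                           avg_rtcp_size, initial):
    """Steps 1-3: return Td in seconds."""
    if senders <= 0.25 * members:
        if we_sent:
            n, bw = senders, 0.25 * rtcp_bw            # senders share 25%
        else:
            n, bw = members - senders, 0.75 * rtcp_bw  # receivers share 75%
    else:
        n, bw = members, rtcp_bw                       # everyone shares equally
    t_min = 2.5 if initial else 5.0                    # step 2
    return max(t_min, n * avg_rtcp_size / bw)          # step 3


def rtcp_interval(members, senders, rtcp_bw, we_sent,
                  avg_rtcp_size, initial, rng=random):
    """Steps 4-5: randomize Td and apply the compensation factor."""
    td = deterministic_interval(members, senders, rtcp_bw, we_sent,
                                avg_rtcp_size, initial)
    return td * rng.uniform(0.5, 1.5) / COMPENSATION


# Example: 100 members, 2 senders, 1000 bytes/s of RTCP bandwidth
print(deterministic_interval(100, 2, 1000.0, False, 100.0, False))
print(rtcp_interval(100, 2, 1000.0, False, 100.0, False))
```

In this example a receiver gets Td = 98 * 100 / 750, about 13 seconds, while a sender's computed value (0.8 seconds) is raised to the 5-second minimum.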

Initialization

When a participant joins a Session, the state is initialized as follows: TP = 0, TC = 0, Senders = 0, p-members = 1, Members = 1, WE_Sent = false, RTCP_BW = 5% * Session bandwidth, Initial = true, and AVG_RTCP_Size is set to the expected size of the first RTCP packet that will be sent later. The sending interval T is then calculated from this initial state and the first packet is scheduled for TN = T. Finally, the participant adds its own SSRC to the member table.

Receiving RTP and non-BYE RTCP packets

When a participant (A) receives an RTP or RTCP packet from another participant and the SSRC of the packet is not yet in A's member table, A adds it to the table and updates the Session total (Members). The same is done for each CSRC in the packet.

If an RTP packet is received and its SSRC is not in the sender table, A adds it to the sender table and updates the total number of Senders.

Each time a compound RTCP packet is received, the average RTCP packet size (AVG_RTCP_Size) is updated by the following formula: AVG_RTCP_Size = (1/16) * last_rtcp_packet_size + (15/16) * previous_avg_rtcp_size.
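The update is an exponentially weighted moving average in which each new packet contributes 1/16 of the estimate. A minimal sketch, with a hypothetical function name:

```python
def update_avg_rtcp_size(avg_rtcp_size, packet_size):
    """Blend the newest compound RTCP packet size into the running average."""
    return packet_size / 16 + 15 * avg_rtcp_size / 16


avg = 160.0
for size in (320, 320, 320):   # three larger packets arrive in a row
    avg = update_avg_rtcp_size(avg, size)
    print(avg)                 # moves toward 320 a little on each update
```

The small weight keeps one unusually large or small packet from swinging the computed interval.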

Receiving RTCP BYE packets

When an RTCP BYE packet is received, its SSRC is looked up in the member table. If found, the entry is removed and the total number of members (Members) is updated. The same is done in the sender table: the entry is deleted if found, and the total number of Senders is updated.

In addition, to make the RTCP transmission rate adapt to the number of users in the Session, the following “reverse reconsideration” algorithm is executed when a BYE packet reduces Members below p-members:

  1. TN is updated as follows: TN = TC + (Members / p-members) * (TN - TC).

  2. TP is updated as follows: TP = TC - (Members / p-members) * (TC - TP).

  3. The next RTCP packet is rescheduled for the new TN, which is earlier than the original one.

  4. p-members is set to the value of Members.

This algorithm does not cover one corner case: when most (but not all) participants of a large Session leave at the same time, premature timeouts can make the estimated Session size drop to zero for a short while. The estimate recovers quickly, though, and the situation is rare and harmless enough to be treated as a minor concern.
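The rescheduling steps for a received BYE can be sketched as a pure function. The function name and tuple return value are hypothetical; the guard that applies the update only when Members has dropped below p-members follows RFC 3550.

```python
def reverse_reconsider(tc, tp, tn, members, pmembers):
    """Scale the remaining wait (and the elapsed time) by members / pmembers.

    Returns the updated (tp, tn, pmembers).
    """
    if members >= pmembers:
        return tp, tn, pmembers          # nothing to shorten
    ratio = members / pmembers
    new_tn = tc + ratio * (tn - tc)      # step 1: next send moves earlier
    new_tp = tc - ratio * (tc - tp)      # step 2: last send moves later
    return new_tp, new_tn, members       # step 4: pmembers = members


# Half of the members leave while 10 s remain until the next report
print(reverse_reconsider(tc=100.0, tp=90.0, tn=110.0, members=50, pmembers=100))
```

In the example the remaining wait shrinks from 10 seconds to 5, matching the halved membership.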

SSRC timeout

Periodically, each participant must check whether any other participant has been silent for too long. This check should run at least once per RTCP period. If a participant has timed out, its SSRC is removed from the member and sender tables and the corresponding counts are updated.

Member table: a participant is generally considered timed out when nothing has been received from it for more than 5 sending intervals (calculated without random scaling).

Sender table: a sender is normally removed after 2 sending intervals without an RTP packet.

If a member is determined to have timed out, the interval adjustment algorithm described in the previous section (reverse reconsideration) is applied.
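A timeout sweep over the two tables might look like the sketch below. The table layout (a dict mapping SSRC to the time the participant was last heard) and the function name are assumptions made for illustration; the multipliers 5 and 2 are the values given above.

```python
def timed_out(last_seen, tc, interval, multiplier):
    """Return the SSRCs not heard from since tc - multiplier * interval."""
    cutoff = tc - multiplier * interval
    return {ssrc for ssrc, seen_at in last_seen.items() if seen_at < cutoff}


member_last_heard = {0x1111: 0.0, 0x2222: 80.0}   # any RTP or RTCP packet
sender_last_rtp = {0x1111: 0.0, 0x2222: 80.0}     # RTP packets only

# At tc = 100 s with a 5 s interval: members expire after 25 s of silence,
# senders after 10 s without RTP.
print(timed_out(member_last_heard, 100.0, 5.0, multiplier=5))
print(timed_out(sender_last_rtp, 100.0, 5.0, multiplier=2))
```

Here only the first SSRC is dropped from the member table, while both are dropped from the sender table.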

Send countdown

As we know, RTCP packets are sent periodically. A countdown timer is set to expire at TN, and each time it expires the following operations are performed:

  1. The transmission interval T is calculated, including the random scaling factor.

  2. If TP + T <= TC, an RTCP packet is sent immediately, TP is set to TC, a new T is calculated, and TN is set to TC + T. If TP + T > TC, no RTCP packet is sent and TN is set to TP + T. In both cases the timer is reset to expire at TN.

  3. p-members is set to Members.

If an RTCP packet was sent, Initial is set to false and AVG_RTCP_Size is updated as follows: AVG_RTCP_Size = (1/16) * last_rtcp_packet_size + (15/16) * previous_avg_rtcp_size.
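The timer handler can be sketched as below. The state dictionary, the two callbacks, and the function name are all hypothetical. Following RFC 3550, the deferred case reschedules at TP + T, and the send case recomputes T before setting TN.

```python
def on_timer_expiry(state, tc, compute_interval, send_rtcp):
    """Handle one expiry of the RTCP timer; return True if a packet was sent.

    compute_interval() returns a randomized interval T in seconds.
    send_rtcp() transmits a compound packet and returns its size in bytes.
    """
    t = compute_interval()
    if state["tp"] + t <= tc:
        size = send_rtcp()
        state["tp"] = tc
        state["tn"] = tc + compute_interval()   # fresh T for the next round
        state["initial"] = False
        state["avg_rtcp_size"] = size / 16 + 15 * state["avg_rtcp_size"] / 16
        sent = True
    else:
        state["tn"] = state["tp"] + t           # reconsider: wait longer
        sent = False
    state["pmembers"] = state["members"]
    return sent


state = {"tp": 0.0, "tn": 5.0, "members": 3, "pmembers": 1,
         "initial": True, "avg_rtcp_size": 100.0}
print(on_timer_expiry(state, 5.0, lambda: 5.0, lambda: 200), state)
```

A fixed interval of 5 seconds is used in the example only to keep the result predictable; a real caller would pass the randomized interval calculation.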

Sending a BYE Message

When a participant wants to leave a Session, it sends a BYE message to the others. To prevent a flood of BYE packets when a large number of users leave the Session at the same time, the following procedure applies when the number of Session members exceeds 50:

  1. When a participant decides to leave, TP is set to TC, Members and p-members are set to 1, Initial is set to true, WE_Sent is set to false, Senders is set to 0, and AVG_RTCP_Size is set to the size of the compound BYE packet. The RTCP sending interval (T) is then calculated, and the BYE packet is scheduled for TN = TC + T.

  2. Every time the departing participant receives a BYE message from someone else, Members increases by one, whether or not that sender is in the member table. Members is increased only for BYE packets, not for other RTCP or RTP packets. Similarly, AVG_RTCP_Size is updated only with the sizes of received BYE packets. Senders stays at 0.

  3. The BYE packet is then sent by the same logic as a regular RTCP packet, as described above; only the maintenance of the state values differs. This scheme lets BYE packets go out promptly while keeping their total bandwidth under control. In the worst case, RTCP packets occupy 10% of the Session bandwidth: 5% for regular RTCP packets and 5% for BYE packets.
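The state changes in steps 1 and 2 can be sketched as two small functions. The state dictionary, the function names, and the compute_interval callback are hypothetical; the callback stands in for the interval calculation described earlier.

```python
def start_leaving(state, tc, bye_size, compute_interval):
    """Step 1: reset the state and schedule the BYE packet."""
    state.update(tp=tc, members=1, pmembers=1, initial=True,
                 we_sent=False, senders=0, avg_rtcp_size=float(bye_size))
    state["tn"] = tc + compute_interval(state)
    return state


def on_bye_while_leaving(state, bye_size):
    """Step 2: only BYE packets from others are counted while leaving."""
    state["members"] += 1
    state["avg_rtcp_size"] = bye_size / 16 + 15 * state["avg_rtcp_size"] / 16
    return state


state = start_leaving({}, tc=100.0, bye_size=40,
                      compute_interval=lambda s: 2.5)
on_bye_while_leaving(state, bye_size=40)
print(state)
```

Because Members restarts at 1 and grows only with the BYE packets heard, the computed interval stretches out exactly when many participants are leaving at once.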

Some participants may not want to send a BYE message in this manner and may leave without sending anything. Such cases are handled by the timeout mechanism.

If the total number of participants in a Session is 50 or fewer, a participant may send a BYE message directly or follow the preceding scheme.

There is one more rule that must be followed in every case: a participant that has never sent a single RTP or RTCP packet must not send a BYE packet when leaving a Session.

Updating WE_Sent

When a participant sends an RTP packet while WE_Sent is false, it sets WE_Sent to true and adds itself to the sender table. Conversely, if it has not sent an RTP packet for more than two RTCP sending intervals, it removes itself from the sender table and sets WE_Sent to false.

Bandwidth allocation for SDES packets

In addition to the mandatory CNAME, SDES packets can carry other information, such as NAME and EMAIL. Upper-layer applications can also define custom items, but they should not add so much custom information that it slows down the entire RTCP protocol. It is recommended that this additional content use no more than 20% of a participant's RTCP bandwidth. Also, do not assume that every upper-layer application will include all SDES items. Upper-layer applications allocate bandwidth to items based on actual usage, generally by controlling how often each item is sent.

For example, an application's SDES may contain only CNAME, NAME, and EMAIL, with NAME allocated more bandwidth than EMAIL, because the NAME is displayed all the time whereas the EMAIL may only be shown when the user clicks to view it. The SDES carries the CNAME in every RTCP interval. If the RTCP interval is five seconds, the SDES carries one extra item besides the CNAME every 15 seconds. Over two minutes that gives eight extra items: NAME seven times and EMAIL once.
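The NAME and EMAIL schedule from this example can be written out as a small function. This is a sketch of that one example only; the function name and the 1-based interval counter are assumptions.

```python
def extra_sdes_item(interval_index):
    """Return the extra SDES item for a given RTCP interval (1-based).

    CNAME is sent in every interval and is not modelled here.
    """
    if interval_index % 3 != 0:
        return None                     # CNAME only
    slot = interval_index // 3          # every third interval carries an extra
    return "EMAIL" if slot % 8 == 0 else "NAME"


# 24 intervals of 5 seconds cover two minutes
schedule = [extra_sdes_item(i) for i in range(1, 25)]
print(schedule.count("NAME"), schedule.count("EMAIL"))
```

Running it over 24 intervals shows NAME seven times and EMAIL once, which matches the two-minute cycle described above.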
