Author | Vitaly Suturikhin

Translation | air-conditioning xu

Low broadcast latency has become a mandatory requirement in any tender for building headends and CDNs. This requirement used to apply only to sports broadcasts, but operators now demand low latency from broadcast equipment vendors in every area: news, concerts, performances, interviews, talk shows, debates, esports, and more.

What is low latency?

In general, latency is the time difference between the moment a particular video frame is captured by a device (camera, player, encoder, etc.) and the moment that frame is displayed on the end user's screen.

What is low latency video streaming?

Low latency should not degrade the quality of signal transmission: minimal buffering is used during encoding and multiplexing, while a smooth, clear picture is maintained on any device's screen. Another prerequisite is reliable delivery: all lost packets should be recovered, and transmission over an open network should not cause problems.

More and more services are migrating to the cloud to save on rack space, power, and hardware costs. This raises the requirement for low latency at high RTT (Round Trip Time), especially when transmitting the high bit rates of HD and UHD video, for example when the cloud server is in the US and the content consumer is in Europe.

In this article, we will analyze the solutions currently on the market for low-latency broadcasting.

UDP

Probably the first technique to come into widespread use in modern television broadcasting and to be associated with the term “low latency” was multicast of streaming content as MPEG TS over UDP. In general, this format is suitable for closed, lightly loaded networks where the packet loss rate is minimal: for example, broadcast from the encoder to the headend modulator (usually in the same server rack), or IPTV over dedicated copper or fiber-optic lines with amplifiers and repeaters.
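As a rough illustration, here is a minimal Python sketch of such a setup: reading an MPEG-TS file and pushing it to a UDP multicast group inside a closed network. The group address, port, and file name are placeholders, and the pacing is deliberately simplistic.

```python
import socket
import time

# Minimal sketch: MPEG-TS over UDP multicast inside a closed network.
MCAST_GROUP, MCAST_PORT = "239.255.0.1", 1234          # placeholder multicast address
TS_PACKETS_PER_DATAGRAM = 7                             # 7 x 188-byte TS packets fit a ~1500-byte MTU
CHUNK = 188 * TS_PACKETS_PER_DATAGRAM

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # keep traffic inside the LAN

with open("channel.ts", "rb") as ts:                    # placeholder transport stream file
    while True:
        datagram = ts.read(CHUNK)
        if not datagram:
            break
        sock.sendto(datagram, (MCAST_GROUP, MCAST_PORT))
        time.sleep(0.001)  # crude pacing; a real sender paces to the stream bitrate
```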

This technique is widely used and exhibits good latency characteristics. The combined latency of encoding, transmission, and decoding achieved by companies on the market over Ethernet is no more than 80 ms at 25 frames per second; at higher frame rates, the delay is even lower.

Figure 1. UDP broadcast delay measurement

The upper half of Figure 1 shows the signal from the SDI capture card, and the lower half shows the signal after the encoding, multiplexing, broadcasting, receiving, and decoding stages. As shown, the second signal arrives one frame later (in this case, at 25 fps, one frame is 40 ms). A similar solution was used for the 2017 Confederations Cup and the 2018 FIFA World Cup, where only a modulator, a distributed DVB-C network, and a TV as the terminal device were added to the chain, resulting in a total delay of 220-240 ms.

What if the signal travels over an external network? Various issues must be overcome: interference, traffic shaping, blocked channels, hardware errors, cable damage, and software-level problems. In this case, not only low latency but also retransmission of lost packets is required.

In the case of UDP, forward error correction (FEC) with redundancy (additional check traffic, or overhead) does a good job. At the same time, the requirements on network throughput grow, and latency and redundancy increase with the percentage of packets expected to be lost. The share of packets that FEC can recover is always limited and can vary greatly over an open network, so transmitting large amounts of data reliably over long distances requires adding a large amount of excess traffic.
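To make the redundancy trade-off concrete, here is a toy Python sketch of XOR-parity FEC (an illustration only, not the SMPTE 2022-1 scheme production systems typically use): one parity packet per group of four adds 25% overhead and can recover a single lost packet per group.

```python
from functools import reduce

def xor_packets(packets):
    """XOR equal-length packets byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def protect(group):
    """Sender side: append one XOR parity packet to a group of data packets."""
    return list(group) + [xor_packets(group)]

# Four 188-byte "TS packets" protected with 25% overhead (1 parity per 4 data packets).
group = [bytes([i]) * 188 for i in range(4)]
sent = protect(group)

# Pretend packet #2 was lost on the wire; XOR of the survivors plus parity rebuilds it.
received = sent[:2] + sent[3:]
assert xor_packets(received) == group[2]
```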

TCP

Let’s look at technologies based on the TCP protocol (reliable delivery). If the checksum of a received packet does not match the expected value (set in the TCP packet header), the packet is retransmitted. If the client and server do not support the selective acknowledgment (SACK) extension, the entire TCP packet chain (from the lost packet onward) is retransmitted at a lower rate.

Previously, TCP was generally avoided for low-latency live streaming because of the added latency of error checking, packet retransmission, the three-way handshake, “slow start,” and congestion prevention (the TCP slow start and congestion avoidance phases). Even with very wide channels, the latency before transmission starts can be up to five RTTs, and increasing throughput has very little effect on this latency.

In addition, an application broadcasting over TCP has no control over the protocol itself (its timeouts or retransmission window size): TCP transmission is implemented as a single continuous stream, and the application can be “frozen” indefinitely until an error occurs. Higher-level protocols have no way of flexibly configuring TCP to minimize broadcast problems.
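For illustration, the Python sketch below (the last option assumes Linux) shows roughly the few latency-related knobs an application does have on a TCP socket; the retransmission timers and congestion control discussed above remain inside the kernel and cannot be tuned per broadcast.

```python
import socket

# The limited set of latency-related socket options an application can set on TCP.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)         # disable Nagle batching
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)   # cap kernel send buffering

# Linux-only: give up on unacknowledged data after 3 s instead of after many minutes.
if hasattr(socket, "TCP_USER_TIMEOUT"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 3000)

sock.connect(("example.com", 1935))   # placeholder endpoint, e.g. an RTMP ingest server
```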

At the same time, some protocols work effectively over UDP even over open networks and long distances.

Let’s consider and compare implementations of various protocols. Among TCP-based protocols and data transfer formats we will look at RTMP, HLS, and CMAF; among UDP-based ones, WebRTC and SRT.

RTMP

RTMP was a proprietary Macromedia protocol (now owned by Adobe) that gained popularity along with Flash-based applications. It comes in several varieties, with support for TLS/SSL encryption, and even a UDP-based variant called RTMFP (Real-Time Media Flow Protocol, for peer-to-peer connections). RTMP splits the data stream into segments whose size can change dynamically. Within a channel, audio- and video-related packets can be interleaved and multiplexed.

Figure 2. RTMP broadcast implementation use case

RTMP builds several virtual channels over which audio, video, metadata, and so on are transmitted. Most CDNs no longer support RTMP as the protocol for distributing traffic to end users. However, Nginx has its own RTMP module that supports the plain RTMP protocol, which runs on top of TCP and uses the default port 1935. Nginx can act as an RTMP server and redistribute the content it receives from RTMP streams. In addition, RTMP is still a popular protocol for delivering traffic to a CDN, from which it is then distributed using other protocols.
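As a hedged example, a stream can be pushed to such an ingest point from Python by wrapping ffmpeg; the server URL, application name, and stream key below are placeholders.

```python
import subprocess

# Push a local file to an RTMP ingest point (for example an nginx server built with the
# RTMP module, listening on the default port 1935). URL and stream key are placeholders.
subprocess.run([
    "ffmpeg",
    "-re",                                   # read the input at its native frame rate
    "-i", "input.mp4",
    "-c:v", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
    "-c:a", "aac",
    "-f", "flv",                             # RTMP carries an FLV-packaged stream
    "rtmp://localhost:1935/live/stream-key",
], check=True)
```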

Currently, Flash is outdated and unsupported: browsers have either limited support for it or banned it altogether. RTMP is not supported in HTML5 and is difficult to use in browsers (playback requires the Adobe Flash plug-in). To get around firewalls, RTMPT was used (RTMP encapsulated in HTTP requests, using the standard ports 80/443 instead of 1935), but this significantly affected latency and redundancy (RTT and overall latency increase by about 30%, according to various estimates). Despite this, RTMP is still popular, for example for ingest to YouTube or social media (Facebook's RTMPS).

The main disadvantages of RTMP are the lack of support for the HEVC/VP9/AV1 codecs and the restriction to only two tracks. In addition, RTMP does not include an absolute timestamp in the data packet header; it carries only offsets calculated from the frame rate, so the decoder does not know exactly when to decode the stream. The receiving component therefore has to generate samples for decoding at a uniform rate, and the buffer must be increased by the size of the packet jitter.

Another problem with RTMP is the retransmission of lost TCP packets described earlier. Acknowledgement of receipt (ACK) is not sent to the sender immediately, in order to keep return traffic low; an ACK or NACK is sent to the broadcaster only after a chain of packets has been received.

By various estimates, the latency for broadcasting with RTMP through the full path (RTMP encoder → RTMP server → RTMP client) is at least two seconds.

CMAF

The Common Media Application Format (CMAF) is a format developed by Apple and Microsoft, who brought it to MPEG for standardization, for adaptive broadcasting over HTTP (with the bit rate adapting to the available network bandwidth). Typically, Apple's HTTP Live Streaming (HLS) uses MPEG TS segments, while MPEG DASH uses fragmented MP4. The CMAF standard was released in July 2017. In CMAF, fragmented MP4 segments (ISOBMFF) are transmitted over HTTP, and the same content is described by two different playlists for the specific player: iOS (HLS) or Android/Microsoft (MPEG DASH).

By default, CMAF (like HLS and MPEG DASH) is not designed for low-latency broadcasting. But industry interest in low latency is growing, so some vendors offer extensions of the standard, such as low-latency CMAF. This extension assumes that both the broadcaster and the receiver support two approaches:

  1. Chunked encoding: the segment is divided into chunks (small fragments with moof+mdat MP4 boxes which together make up a whole segment suitable for playback) and sent before the whole segment is assembled.

  2. Chunked transfer encoding: chunks are sent to the CDN (origin) using HTTP/1.1 chunked transfer encoding. An HTTP POST request is issued for the entire segment only once, e.g. every 4 seconds, after which 100 small chunks (one frame per chunk at 25 fps) can be sent within the same request (sketched below). The player can also request a segment that is not yet complete: the CDN delivers the finished portion using chunked transfer encoding and then keeps the connection open until new chunks are appended to the segment being downloaded. Once the entire segment has been formed on the CDN side, its transmission to the player is completed.
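A minimal sketch of the chunked-transfer idea from point 2, in Python with the requests library: passing a generator as the request body makes requests use HTTP/1.1 chunked transfer encoding, so each CMAF chunk leaves for the origin as soon as it exists. The URL is a placeholder, and make_chunk stands in for a real packager.

```python
import requests

def cmaf_chunks(segment_number, chunks_per_segment=100):
    """Yield CMAF chunks (moof+mdat pairs) as the encoder/packager produces them."""
    for i in range(chunks_per_segment):
        yield make_chunk(segment_number, i)   # hypothetical helper returning one moof+mdat blob

# A generator body makes requests send "Transfer-Encoding: chunked", so the CDN can
# start forwarding the first frames long before the 4-second segment is complete.
requests.post(
    "https://origin.example.com/live/stream/segment-42.m4s",   # placeholder origin URL
    data=cmaf_chunks(segment_number=42),
    headers={"Content-Type": "video/mp4"},
)
```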

Figure 3. Standard partitioned CMAF

Buffering (of at least 2 seconds) is required to allow switching between profiles. Because of this and possible distribution issues, the developers of the standard claim a delay of under 3 seconds. At the same time, important features are preserved: scaling through a CDN to thousands of concurrent clients, encryption (with Common Encryption support), HEVC and WebVTT (subtitle) support, guaranteed delivery, and compatibility with different players (Apple/Microsoft). On the downside, low-latency CMAF requires support on the player side (handling chunked segments and advanced operations on internal buffers). However, incompatible players can still consume the content as ordinary CMAF, with the standard latency of HLS or DASH.

LL-HLS

In June 2019, Apple released specifications for low-latency HLS.

It consists of the following parts:

  1. Partial segments (fragmented MP4 or TS) with a duration as short as 200 ms are generated and published even before the whole segment they belong to is complete. Outdated parts are periodically removed from the playlist;

  2. The server side can use HTTP/2 push to send the updated playlist along with a new part. However, this proposal was removed in the January 2020 revision of the specification;

  3. The server is responsible for holding the request (blocking) until a playlist version containing the new part is available; blocking playlist reload eliminates polling (see the sketch after this list);

  4. Instead of the full playlist, only playlist deltas are sent (the baseline playlist is stored once, after which only the increments are transferred as they occur);

  5. The server announces upcoming new partial fragments (preloading hints);

  6. Playlist information for adjacent renditions is loaded in parallel to speed up switching.
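As a rough sketch of points 3 and 4, the Python snippet below issues a blocking, delta playlist request using the delivery directives from Apple's LL-HLS specification; the playlist URL and sequence numbers are placeholders.

```python
import requests

# Blocking playlist reload with delta update, as described in points 3 and 4 above.
resp = requests.get(
    "https://cdn.example.com/live/stream/index.m3u8",   # placeholder playlist URL
    params={
        "_HLS_msn": 268,     # block until media sequence number 268 exists...
        "_HLS_part": 2,      # ...and its partial segment 2 is available
        "_HLS_skip": "YES",  # return a delta playlist instead of the full one
    },
    timeout=10,
)
playlist = resp.text         # contains EXT-X-PART / EXT-X-PRELOAD-HINT tags for the player
```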

Figure 4. How LL HLS works

Where the specification is fully supported by both the CDN and the player, the expected latency is under 3 seconds. HLS is very widely used for open-network broadcasting thanks to its excellent scalability, encryption and adaptive bitrate support, cross-platform functionality, and backward compatibility, which is useful when players do not support LL-HLS.

WebRTC

WebRTC (Web Real-Time Communication) is an open-source project introduced by Google in 2011. It is used in Google Hangouts, Slack, BigBlueButton, and YouTube Live. WebRTC is a set of standards, protocols, and JavaScript APIs, and implements end-to-end encryption in peer-to-peer connections using DTLS-SRTP. The technology requires no third-party plug-ins or software and can pass through firewalls without loss of quality or added delay (for example, during video conferencing in a browser). For video playback, the UDP-based implementation of WebRTC is usually used.

The protocol works as follows: a host sends a connection request to the peer it wants to connect to. Before a connection between peers is established, they communicate with each other through a third-party signaling server. Each peer then asks a STUN server “Who am I?” (how can I be reached from the outside?).

There are public Google STUN servers (for example, stun.l.google.com:19302). The STUN server returns the list of IP addresses and ports through which the current host can be reached, and ICE candidates are formed from this list. The second peer does the same. The ICE candidates are exchanged via the signaling server, and at this stage the point-to-point connection is established, forming a peer-to-peer network.
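As an illustration of the “Who am I?” step, here is a small self-contained Python sketch that sends a STUN Binding Request (RFC 5389) to the public Google server mentioned above and reads the reflexive (external) address back from the XOR-MAPPED-ADDRESS attribute.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442  # fixed STUN magic cookie (RFC 5389)

def stun_binding_request(server=("stun.l.google.com", 19302)):
    """Ask a public STUN server 'who am I?' and return the external (ip, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    txn_id = os.urandom(12)
    # Header: type=Binding Request (0x0001), length=0, magic cookie, transaction ID.
    sock.sendto(struct.pack("!HHI12s", 0x0001, 0, MAGIC_COOKIE, txn_id), server)
    data, _ = sock.recvfrom(2048)

    msg_type, msg_len, _cookie, resp_txn = struct.unpack("!HHI12s", data[:20])
    assert msg_type == 0x0101 and resp_txn == txn_id    # Binding Success Response

    # Walk the attributes looking for XOR-MAPPED-ADDRESS (type 0x0020).
    offset = 20
    while offset < 20 + msg_len:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        if attr_type == 0x0020:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            ip_int = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", ip_int)), port
        offset += 4 + attr_len + (-attr_len % 4)          # attributes are 32-bit aligned
    return None

if __name__ == "__main__":
    print(stun_binding_request())   # e.g. ('203.0.113.17', 54321)
```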

If a direct connection cannot be established, a so-called TURN server acts as a relay/proxy and is also added to the list of ICE candidates.

The SCTP (application data) and SRTP (audio and video data) protocols are responsible for multiplexing, sending, congestion control, and reliable delivery. DTLS is used for the “handshake” exchange and subsequent traffic encryption.

Figure 5. WebRTC protocol stack

Opus and VP8 are used as the codecs. The maximum supported resolution is 720p at 30 fps, with a bitrate of up to 2 Mbps.

One security drawback of WebRTC is that the real IP address is revealed even behind NAT and when Tor or proxy servers are used. Because of its connection structure, WebRTC is not suitable for large numbers of simultaneous viewers (it is difficult to scale) and is currently rarely supported by CDNs. Finally, WebRTC is inferior to other protocols in coding quality and in the maximum amount of data transmitted.

WebRTC is not available in Safari and is only partially available in Bowser and Edge. Google claims a latency of less than one second. At the same time, the protocol can be used not only for video conferencing but also for file transfer and other applications.

SRT

SRT (Secure Reliable Transport) is a protocol developed by Haivision in 2012. It is based on UDT and ARQ packet recovery, and supports AES-128 and AES-256 encryption. In addition to listener (server) mode, it supports caller (client) and rendezvous (both parties initiate the connection) modes, which allow connections to be established through firewalls and NAT. The SRT “handshake” is carried out within existing security policies, allowing external connections without opening permanent external ports in the firewall.
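A hedged example of the listener and caller modes, assuming an ffmpeg build with libsrt; the host, port, and passphrase below are placeholders.

```python
import subprocess

# Sender side: wait for a caller on port 9000 and stream a file as MPEG-TS over SRT
# with an AES passphrase. Requires ffmpeg compiled with libsrt.
sender = subprocess.Popen([
    "ffmpeg", "-re", "-i", "input.ts", "-c", "copy", "-f", "mpegts",
    "srt://0.0.0.0:9000?mode=listener&passphrase=0123456789",
])

# Receiver side (run on the other machine): connect as the caller and play the stream:
#   ffplay "srt://sender.example.com:9000?mode=caller&passphrase=0123456789"

sender.wait()
```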

SRT carries a timestamp in every packet, which allows playback at a rate equal to the stream's encoding rate without heavy buffering, smoothing out jitter (variation in packet arrival times) to match the incoming bit rate. Unlike TCP, where the loss of one packet can cause the entire packet chain to be retransmitted, SRT identifies the specific lost packet by its number and retransmits only that packet. This has a positive effect on latency and redundancy.

Retransmitted packets have a higher priority than the regular stream. Unlike standard UDT, SRT completely redesigned the retransmission architecture and reacts immediately when a packet is lost. The technique is a variant of selective repeat/reject ARQ. Notably, a particular lost packet may only be retransmitted a fixed number of times: when a packet's age exceeds 125% of the total configured latency, the sender skips it. SRT also supports FEC, and it is up to the user to decide which of the two techniques to use (or both) to balance the lowest latency against the highest transmission reliability.
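A toy Python sketch of the “too late to matter” rule described above (an illustration of the idea, not libsrt code):

```python
def worth_retransmitting(packet_age_ms: float, configured_latency_ms: float) -> bool:
    """Simplified version of the rule above: once a packet has aged past ~125% of the
    configured end-to-end latency, retransmitting it is pointless because the receiver
    has already moved on, so the sender skips it."""
    return packet_age_ms <= 1.25 * configured_latency_ms

assert worth_retransmitting(100, configured_latency_ms=120)       # still useful
assert not worth_retransmitting(200, configured_latency_ms=120)   # skipped by the sender
```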

Figure 6. How SRT works on an open network

Data transmission in SRT can be bidirectional: both sides can send data at the same time, and either can act as the listener or the initiator of the connection. Rendezvous mode is used when both parties need to initiate the connection. The protocol has an internal multiplexing mechanism that allows several streams from one session to be multiplexed into a single connection over one UDP port. SRT is also suitable for fast file transfer, a capability first introduced in UDT.

SRT has a network congestion control mechanism. Every 10 milliseconds the sender receives the latest data about the RTT and its variation, the available buffer size, the packet reception rate, and the approximate bandwidth of the current link. SRT imposes a minimum interval between two consecutively sent packets. Packets that cannot be delivered in time are dropped from the queue.

The developers claim that the minimum latency achievable with SRT is 120 ms, with buffers set to the minimum for short-distance transfers over closed networks. The recommended total delay for stable broadcasts is 3-4 RTT. In addition, SRT handles long-distance (thousands of kilometers) and high-bitrate (10 Mbps and above) transmission better than its competitor RTMP.

Figure 7. SRT broadcast delay test

In the example above, the SRT broadcast delay measured in the laboratory was 3 frames at 25 fps, i.e. 40 ms * 3 = 120 ms. From this we can conclude that the very low latency of about 0.1 second achievable with UDP broadcast can also be achieved with SRT. SRT does not match the scalability of HLS or DASH/CMAF, but it is well supported by CDNs and restreamers, and can also broadcast directly to end users in listener mode via media servers.

In 2017, Haivision open-sourced the SRT library and created the SRT Alliance, which includes more than 350 members.

Conclusion

To summarize, the following table compares the protocols (the numbered notes below refer to entries in the table):

  1. Transmission from the CDN to the end user is not supported; content contribution up to the last mile (for example, to a CDN or restreamer) is supported.

  2. Not supported in browsers

  3. Not supported in Safari

All things open source and well documented catch on fast these days. It is safe to say that formats like WebRTC and SRT have a long-term future in their respective applications. These protocols outperform adaptive broadcasting over HTTP in minimum latency while maintaining reliable transmission, low redundancy, and encryption support (AES for SRT, DTLS/SRTP for WebRTC).

In addition, the RIST protocol, SRT's “little brother” (in terms of when it was created, not in terms of functionality and capabilities), has recently been gaining popularity, but that is a topic for another day. Meanwhile, RTMP is being aggressively squeezed out of the market by new competitors, and the lack of native browser support makes wide use unlikely any time soon.

Original title | Low Latency Streaming Protocols SRT, WebRTC, LL-HLS, UDP and TCP, RTMP Explained

Original link | ottverse.com/low-latency…
