Currently, real-time video systems such as Skype, FaceTime, and WebRTC are built from two separate components: a "video codec" that compresses the video, and a network "transport protocol" that transmits packets at a rate the network can sustain and avoids overloading it.

Over the years, video codecs and transport protocols have been designed and built by various companies and then integrated into applications such as Skype and FaceTime.

However, each component has its own control loop. The transport protocol runs a "congestion control" algorithm that limits how much data enters the network when congestion looms, while the codec runs its own "rate control" algorithm that adjusts compression based on information the transport protocol provides. The two loops must cooperate; when they fall out of sync, or when network conditions change abruptly, the video can stall or glitch.

Salsify merges these two components: a single control loop governs both frame-by-frame compression and packet-by-packet transmission, which lets the video stream adapt quickly to network changes and avoid delays.

Current video encoders cannot accurately predict the compressed size of an individual frame; instead, codecs control the average "bitrate" over several frames (for example, through VBV constraints). If the encoder produces a frame larger than what the transport protocol believes the network can absorb, the application will usually transmit it anyway; once the transport later determines that this is causing congestion, it may pause feeding input to the video encoder to let the congestion drain.

But this is not a good solution, because the oversized frame has already congested the network. A better approach is to ask the encoder to try again whenever it produces a compressed frame that is too large. Rather than re-encoding the same frame at lower quality, the application should take a fresh frame from the camera and encode it at a quality whose compressed size will not clog the network. In practice, this means frames should be sent only when they will not cause packet loss and delay, not at a fixed rate.
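This decision logic can be sketched in a few lines. The following is a toy simulation, not Salsify's actual code: `frame_sizes` and `budgets` stand in for the real encoder and the transport's congestion-control estimate, and the quality-adjustment steps are illustrative assumptions.

```python
# Minimal sketch of a single control loop over frames: encode each fresh
# camera frame at the current quality level, send it only if its compressed
# size fits the transport's byte budget, and otherwise discard it and
# target a lower quality for the NEXT camera frame (never re-encode).

def run_loop(frame_sizes, budgets, quality):
    """frame_sizes[i][q]: compressed size of frame i at quality level q
    (higher q = higher quality, larger output); budgets[i]: bytes the
    transport believes the network can absorb when frame i is captured."""
    log = []
    for i, budget in enumerate(budgets):
        size = frame_sizes[i][quality]
        if size <= budget:
            log.append((i, "send"))
            quality = min(quality + 1, len(frame_sizes[i]) - 1)  # probe up
        else:
            log.append((i, "skip"))        # frame discarded, never sent
            quality = max(quality - 1, 0)  # aim smaller next time
    return log
```

The key point is the `skip` branch: an over-budget frame is thrown away rather than queued, so congestion never builds up inside the application.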

Salsify's purely functional video codec does exactly that. It is fully compatible with Google's VP8 format, but its encode and decode operations are pure functions with no side effects, operating on an explicit representation of the decoder's inter-frame "state." This lets the Salsify encoder compress a frame relative to any decoder state, so the application can safely skip frames at the encoder's output, not just at its input.
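To see why an explicit state makes skipping safe, consider a toy purely functional "codec" in which the state is simply the previously decoded frame and encoding emits a diff against it. This is an illustration of the functional interface only; the real Salsify codec implements VP8, not this diff scheme.

```python
# Toy functional codec: encode/decode are pure functions of an explicit
# state, returning the successor state instead of mutating anything.
# Because the sender can encode against ANY saved decoder state, it can
# drop an already-encoded frame (e.g. frame B below) and encode the next
# camera frame against the state the receiver actually reached.

def encode(state, frame):
    diff = [f - s for f, s in zip(frame, state)]
    return frame[:], diff              # (state decoder will reach, output)

def decode(state, diff):
    frame = [s + d for s, d in zip(state, diff)]
    return frame, frame                # (new state, decoded picture)
```

A stateful encoder would have silently advanced its internal state after encoding the skipped frame, leaving sender and receiver out of sync; here, the sender just reuses the last acknowledged state.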

Salsify's codec guarantees that the sender never transmits frames into a congested network (discarding already-encoded frames if necessary) and imposes no fixed frame rate. It also lets the sender track the available network capacity closely by producing two versions of each frame: one at slightly higher quality and one at slightly lower quality than the last successfully sent frame. After inspecting the actual compressed size of each option, the application selects one of them, or discards both.

Salsify compresses video in Google's VP8 format, which has since been succeeded by VP9 and H.265. Its purely functional VP8 encoder/decoder is a modified version of the one built for ExCamera (presented last year), where the functional design made it possible to split video encoding into tiny tasks, smaller than the interval between key frames, and run thousands of them in parallel on AWS Lambda. In Salsify, the same design lets the system explore multiple execution paths of the codec. Salsify's congestion-control scheme is based on Sprout-EWMA, which in turn builds on early work using packet-pair and packet-train estimates of available bandwidth. Its loss-recovery strategy is related to the approach used in Mosh.
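The core of an EWMA-style throughput estimator, in the spirit of Sprout-EWMA, fits in a few lines. This is a generic sketch, not Salsify's implementation; the smoothing factor and the per-interval sampling scheme are illustrative assumptions.

```python
def ewma_throughput(samples, alpha=0.1):
    """Exponentially weighted moving average of observed throughput
    samples (e.g. bytes received per tick). Recent samples get weight
    alpha; the running estimate keeps the remaining (1 - alpha)."""
    est = samples[0]
    for s in samples[1:]:
        est = alpha * s + (1 - alpha) * est
    return est
```

The estimate reacts to capacity changes while smoothing out per-packet noise; the sender sizes the next frame's target against it.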

Taken together, Salsify's innovations lie in merging codec rate control with transport congestion control, in the purely functional video codec, and in sending encoded frames only when the network can accommodate them.

The researchers tested Salsify against Microsoft's Skype, Google Hangouts, Apple's FaceTime, and the Internet-standard WebRTC as implemented in Google Chrome. On average, Salsify's latency was about a quarter of the competitors', and image quality improved by at least 60 percent.

Detailed test results appear in the paper's evaluation figures.

Salsify's technical team says they would love to see Salsify adopted by industry and are willing to help, but technically it is not that easy. Had Salsify simply been a better video codec or a better transport protocol, adoption would be straightforward; instead, its benefits come from combining the two, which means an existing application cannot pick it up without significant refactoring. After talking with industry, the team concluded that it is hard to prove Salsify's gains are worth that effort, and hard to deliver its new features through small changes to existing applications right now.

The Salsify project was developed by Stanford University students with funding from the National Science Foundation and the Defense Advanced Research Projects Agency (DARPA). It also has support from Google, Huawei, VMware, Dropbox, Facebook, and The Stanford Platform Lab. The papers and raw data are open to the public, and the source code is open source.

Original link:

https://snr.stanford.edu/salsify/

Source code:

https://github.com/excamera/alfalfa

Related papers:

https://www.usenix.org/system/files/conference/nsdi18/nsdi18-fouladi.pdf