Long time no see! It has been nearly four months since I last updated this blog.

Last year’s final article covered some optimizations for our Electron desktop client, and this one is also related to the Electron client we are developing: we have recently been working out how to play the conference video stream in real time on an Electron page.



The video conferencing interface is the last page in our client that has not been migrated to the Web. It is developed entirely in native code, so development efficiency is relatively low: building animation effects, for example, is painful, and it is hard to keep up with changing product requirements. So we wondered: could we play the video streams received by the underlying WebRTC library on the Web page instead? Or, for that matter, why not communicate directly through the browser’s WebRTC API?

To answer the latter first: our conference logic and audio/video processing have already been extracted into independent, cross-platform modules that are maintained in one place. Besides, the browser’s WebRTC API is a very high-level interface; it is a black box that cannot be customized or extended, is hard to diagnose when problems occur, and is tied to the browser. The biggest reason, though, is that the change would be fairly large and the schedule does not allow it.

So for now we can only choose the former: the underlying library pushes the video stream to the Electron page, which plays it in real time. Before this I had hardly touched audio/video development, and what came to mind was something like live streaming, with the underlying library as the “broadcaster” and the Web page as the “viewer”.


Because the video stream is only forwarded locally, we do not need to consider complex network conditions or bandwidth limitations. The only requirements are low latency and low resource consumption:

  • Our video conferencing audio and video are separate. Only one mixed audio stream is transmitted over SIP, while the conference video may have multiple channels and is transmitted over WebRTC. We do not need to handle audio (it is played directly by the underlying library), which means video playback latency must be low enough that audio and video do not drift out of sync.
  • Browser compatibility is not a concern: the browser bundled with Electron is Chromium 80.
  • Local forwarding means we do not need to worry about network conditions or bandwidth limits.




I only recently came into contact with audio/video development through this work and know just a little about it, so there are bound to be problems in this article. Below, follow me, an audio/video beginner, as we explore the possible solutions together.




Table of Contents

  • ① Typical Web live streaming solutions
    • RTMP push stream
    • RTMP pull stream
    • RTMP low latency optimization
  • ② JSMpeg & Broadway.js
    • The relay server
    • Pushing the stream
    • Video playback
    • Multi-process optimization
    • A quick word about Broadway.js
  • ③ Render YUV directly
  • Further reading




① Typical Web live streaming solutions

There are many solutions for Web live streaming (see the article “What You Need to Know About Web Live Streaming”):

  • Real-Time Messaging Protocol (RTMP): an Adobe protocol with low latency and good real-time performance. The browser needs Flash to play it, but it can also be converted into an HTTP/WebSocket stream and fed to flv.js for playback.
  • Real-time Transport Protocol (RTP): WebRTC is based on RTP/RTCP. Excellent real-time performance, suitable for video surveillance, video conferencing and IP telephony.
  • HTTP Live Streaming (HLS): an HTTP-based streaming protocol proposed by Apple. Well supported by Safari and by newer versions of Chrome, with some mature third-party solutions available.


HLS latency was too high to meet our requirements, so we ruled it out from the start. Most of the material we found was about RTMP, which shows how widely RTMP is used in China, so we decided to give it a try:


The RTMP server can be built directly on node-media-server. The code is very simple:

const NodeMediaServer = require('node-media-server')

const config = {
  // RTMP server for RTMP push and pull streams
  rtmp: {
    port: 1935, // 1935 is the standard port for RTMP
    chunk_size: 0,
    gop_cache: false,
    ping: 30,
    ping_timeout: 60,
  },
  // HTTP/WebSocket stream, exposed to flv.js
  http: {
    port: 8000,
    allow_origin: '*',
  },
}

var nms = new NodeMediaServer(config)
nms.run()




RTMP push stream

FFmpeg is an indispensable tool for audio/video development. This article uses it to capture the camera, perform various conversions and processing, and finally push the video stream. Let’s see how to push RTMP with FFmpeg.

The following command lists all supported device types:

All commands in this article are executed under macOS; other platforms are similar, so look up the equivalents yourself.

$ ffmpeg -devices
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
 D  avfoundation    AVFoundation input device
 D  lavfi           Libavfilter virtual input device
  E sdl,sdl2        SDL2 output device


avfoundation is usually used for device capture on macOS. The following command lists all input devices available on the current machine:

$ ffmpeg -f avfoundation -list_devices true -i ""
[AVFoundation input device @ 0x7f8487425400] AVFoundation video devices:
[AVFoundation input device @ 0x7f8487425400] [0] FaceTime HD Camera
[AVFoundation input device @ 0x7f8487425400] [1] Capture screen 0
[AVFoundation input device @ 0x7f8487425400] AVFoundation audio devices:
[AVFoundation input device @ 0x7f8487425400] [0] Built-in Microphone
[AVFoundation input device @ 0x7f8487425400] [1] Boom2Device


We’ll use the FaceTime HD Camera input device to capture the video and push the RTMP stream:

$ ffmpeg -f avfoundation -r 30 -i "FaceTime HD Camera" -c:v libx264 -preset superfast -tune zerolatency -an -f flv rtmp://localhost/live/test


A little explanation of the above command:

  • -f avfoundation -r 30 -i "FaceTime HD Camera": capture video from the FaceTime HD Camera at a frame rate of 30 fps
  • -c:v libx264: encode the output video as H.264; RTMP usually uses H.264
  • -f flv: RTMP generally uses the FLV container format
  • -an: ignore the audio stream
  • -preset superfast -tune zerolatency: x264 encoding preset and tuning options, which affect video quality and compression speed


Container format and encoding format are the two most fundamental concepts in audio/video development. Container (packaging) format: a box for storing video information; the encoded audio, video, subtitles, scripts and so on are combined according to a specification into a container-format file. Common container formats include AVI, MPEG, FLV and MOV. Encoding format: the main purpose of encoding is compression. The audio/video stream captured from a device is a raw bitstream (rawvideo format), i.e. data that has not been encoded or compressed. For example, for a 720p, 30fps, 60-minute movie, the raw stream is roughly 12 bit × 1280 × 720 × 30 × 60 × 60 ≈ 1.2 Tbit, or about 140 GiB. That is far too expensive to store on a file system or transfer over the network, so we need to encode and compress it. H.264 is one of the most common encoding formats.
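
To make that number concrete, here is a quick back-of-the-envelope calculation (a sketch assuming YUV420’s 12 bits per pixel; the exact figure depends on the pixel format):

// Raw size of a 720p, 30fps, 60-minute video at 12 bits per pixel (YUV420)
const bitsPerPixel = 12
const totalBits = bitsPerPixel * 1280 * 720 * 30 * 60 * 60
console.log((totalBits / 8 / 1024 ** 3).toFixed(0) + ' GiB') // ≈ 139 GiB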




RTMP pull stream

The simplest way to test whether pushing and pulling work is the ffplay player:

$ ffplay rtmp://localhost/live/test


Flash is obsolete, so to play the RTMP stream in a Web page we need flv.js. flv.js is probably familiar to everyone (gossip: what do you think of the Bilibili flv.js author earning less than 5000 yuan a month?); it is the FLV player open-sourced by Bilibili. According to the official introduction:

flv.js works by transmuxing FLV file stream into ISO BMFF (Fragmented MP4) segments, followed by feeding mp4 segments into an HTML5 <video> element through Media Source Extensions API.


As mentioned above, FLV (Flash Video) is a video container format; flv.js simply remuxes FLV into the MP4 (ISO BMFF) container format.





flv.js supports pulling binary video streams through HTTP streaming, WebSocket or a custom data source. The following example uses flv.js to pull the node-media-server video stream:

<script src="https://cdn.bootcss.com/flv.js/1.5.0/flv.min.js"></script>
<video id="video"></video>
<button id="play">play</button>
<script>
  if (flvjs.isSupported()) {
    const videoElement = document.getElementById('video');
    const play = document.getElementById('play');
    const flvPlayer = flvjs.createPlayer(
      {
        type: 'flv',
        isLive: true,
        hasAudio: false,
        url: 'ws://localhost:8000/live/test.flv',
      },
      {
        enableStashBuffer: true,
      },
    );
    flvPlayer.attachMediaElement(videoElement);
    play.onclick = () => {
      flvPlayer.load();
      flvPlayer.play();
    };
  }
</script>


The complete sample code is here




RTMP low latency optimization

Push side

The FFmpeg push side can reduce latency with some encoding parameters. The main directions are improving encoding efficiency and shrinking buffers, sometimes at the expense of picture quality and bandwidth. The article “FFmpeg transcoding delay test and settings optimization” summarizes some measures, for reference:

  • Disable sync-lookahead
  • Lower rc-lookahead, but not below 10 (the default is -1)
  • Reduce threads (e.g. from 12 to 6)
  • Disable rc-lookahead
  • Disable B-frames
  • Reduce the GOP size
  • Use the x264 -preset fast/faster/veryfast/superfast/ultrafast options
  • Use the -tune zerolatency option


node-media-server

node-media-server can also reduce latency by shrinking the buffer size and turning off the GOP cache.
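
Concretely, that just means tightening the fields already shown in the node-media-server config at the top of this article; a minimal sketch of the relevant part (these are the values we used, not universal recommendations):

const config = {
  rtmp: {
    port: 1935,
    chunk_size: 0, // keep the RTMP chunk size small to reduce server-side buffering
    gop_cache: false, // don't replay a cached GOP to newly attached players: slower first frame, lower latency
    ping: 30,
    ping_timeout: 60,
  },
  // http config unchanged
}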


flv.js side

On the flv.js side, turning off enableStashBuffer improves real-time performance. In actual tests, flv.js may still accumulate delay over time, which can be corrected with a manual seek.
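
For reference, a minimal sketch of both tricks (the option name comes from the flv.js API; the 1-second threshold and 3-second interval are arbitrary choices of mine):

const videoElement = document.getElementById('video')
const flvPlayer = flvjs.createPlayer(
  { type: 'flv', isLive: true, hasAudio: false, url: 'ws://localhost:8000/live/test.flv' },
  { enableStashBuffer: false } // no IO stash buffer: lower latency, less tolerance for jitter
)
flvPlayer.attachMediaElement(videoElement)
flvPlayer.load()
flvPlayer.play()

// Chase the live edge: if playback has fallen behind the buffered end, seek forward
setInterval(() => {
  const buffered = videoElement.buffered
  if (buffered.length > 0) {
    const end = buffered.end(buffered.length - 1)
    if (end - videoElement.currentTime > 1) {
      videoElement.currentTime = end - 0.3
    }
  }
}, 3000)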




After a lot of fiddling, the best latency we achieved was about 400ms, and we could not get it any lower (if you are familiar with this area, please enlighten me). Worse, when receiving the actual push from the underlying library, playback was far from ideal, with all kinds of stutter and delay. With limited time and knowledge it was hard for us to pinpoint the cause, so we temporarily gave up on this approach.




② JSMpeg & Broadway.js

Jerry Qu’s “HTML5 Live Video (II)” introduced me to solutions like JSMpeg and Broadway.js.

These two libraries do not rely on the browser’s built-in video playback stack: they decode video with a pure JS/WASM decoder and then draw it directly with Canvas 2D or WebGL. Broadway.js currently does not support audio; JSMpeg does (based on WebAudio).


In a simple test, the latency of JSMpeg and Broadway.js was much lower than RTMP and basically met our requirements. Here is a brief introduction to how JSMpeg is used; Broadway.js is similar and is discussed briefly further down. Their basic pipeline is as follows:



The relay server

Because FFmpeg cannot push a stream directly to the Web page, we still need a relay server that receives the pushed video stream and forwards it to the page player over WebSocket.

FFmpeg supports pushing over HTTP, TCP, UDP and other protocols. HTTP push is the most convenient for us to handle, and since everything stays on the local machine there is no significant performance difference between these protocols.

Create an HTTP server to receive the pushed stream; the push path is /push/:id:

this.server = http
  .createServer((req, res) => {
    const url = req.url || '/'
    if (!url.startsWith('/push/')) {
      res.statusCode = 404
      // ...
      return
    }

    const id = url.slice(6)

    // Disable timeout
    res.connection.setTimeout(0)

    // Forward out
    req.on('data', (c) => {
      this.broadcast(id, c)
    })

    req.on('end', () => {
      /* ... */
    })
  })
  .listen(port)




The stream is then sent out over WebSocket; the page can pull the video stream via ws://localhost:PORT/pull/{id}:

/**
 * Pull streams using WebSocket
 */
this.wss = new ws.Server({
  server: this.server,
  // Pull the stream through /pull/{id}
  verifyClient: (info, cb) => {
    if (info.req.url && info.req.url.startsWith('/pull')) {
      cb(true)
    } else {
      cb(false, undefined, '')
    }
  },
})

this.wss.on('connection', (client, req) => {
  const url = req.url
  const id = url.slice(6)

  console.log(`${prefix} new player attached: ${id}`)

  let buzy = false
  const listener = {
    id,
    onMessage: (data) => {
      // Push a frame only when the previous send has been acknowledged
      if (buzy) {
        return
      }

      buzy = true
      client.send(data, { binary: true }, function ack() {
        buzy = false
      })
    },
  }

  this.attachListener(listener)

  client.on('close', () => {
    console.log(`${prefix} player detached: ${id}`)
    this.detachListener(listener)
  })
})
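
The broadcast, attachListener and detachListener methods used above are not shown in the snippets. A minimal sketch of how they might look, assuming the relay server class keeps a this.listeners = {} map keyed by stream id (the method names match the calls above; the implementation is my own simplification):

// Inside the relay server class; this.listeners = {} is initialized in the constructor
attachListener(listener) {
  const list = this.listeners[listener.id] || (this.listeners[listener.id] = [])
  list.push(listener)
}

detachListener(listener) {
  const list = this.listeners[listener.id] || []
  this.listeners[listener.id] = list.filter((l) => l !== listener)
}

// Called for every chunk received on /push/:id; fans it out to all attached players
broadcast(id, chunk) {
  for (const listener of this.listeners[id] || []) {
    listener.onMessage(chunk)
  }
}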




Pushing the stream

Here FFmpeg is again used as the push example:

$ ffmpeg -f avfoundation -r 30 -i "FaceTime HD Camera" -f mpegts -codec:v mpeg1video -an  -bf 0 -b:v 1500k -maxrate 2500k http://localhost:9999/push/test

A brief explanation of the FFmpeg command:

  • -f mpegts -codec:v mpeg1video -an: use the MPEG-TS container format and MPEG-1 video encoding, and ignore audio
  • -bf 0: the JSMpeg decoder cannot yet handle B-frames correctly, so this disables them (see this article for what I/B/P frames are)
  • -b:v 1500k -maxrate 2500k: set the average and maximum bit rate of the pushed stream. In our tests JSMpeg is prone to artifacts and array-out-of-bounds crashes when the bit rate is too high.

JSMpeg also requires the video width to be a multiple of 2. FFmpeg can handle this by filtering (cropping) the video or by setting the video size (-s), but the extra conversion consumes some CPU:

ffmpeg -i in.mp4 -f mpeg1video -vf "crop=iw-mod(iw\,2):ih-mod(ih\,2)" -bf 0 out.mpg




Video playback

<canvas id="video-canvas"></canvas>
<script type="text/javascript" src="jsmpeg.js"></script>
<script type="text/javascript">
  const canvas = document.getElementById('video-canvas')
  const url = 'ws://localhost:9999/pull/test'
  var player = new JSMpeg.Player(url, {
    canvas: canvas,
    audio: false.pauseWhenHidden: false.videoBufferSize: 8 * 1024 * 1024,})</script>
Copy the code

The API is simple: we pass a canvas to JSMpeg, disable audio, and set a larger buffer size to absorb bit-rate fluctuations.


See the full code here




Multi-process optimization

In actual tests, JSMpeg video latency is between 100ms and 200ms; of course this also depends on the video quality, the machine’s performance and other factors.

Limited by machine performance and decoder efficiency, JSMpeg is very likely to produce artifacts or out-of-bounds memory access for video streams with a high average bit rate (roughly above 2000k in my rough tests).


Therefore we have to lower the video quality and resolution to bring the bit rate down. This does not fundamentally solve the problem, though, and it is one of the pain points of using JSMpeg; see JSMpeg’s performance notes.


Because decoding itself is a CPU-intensive operation and is performed by the browser, CPU usage is quite high (the CPU usage for a single page and player on my machine is around 16%), and the JSMpeg player can be difficult to recover if it crashes abnormally.

In our practical application scenario, a page might play multiple videos, and if all the videos were decoded and rendered in the main browser process, the page experience would be poor. Therefore, it is best to separate JSMpeg into the Worker to ensure that the main process can respond to user interactions and that JSMpeg crashes do not affect the main process.

The good news is that running JSMpeg in a Worker is easy: Workers can open their own WebSocket connections, and a Canvas can create an OffscreenCanvas object via transferControlToOffscreen() and transfer it to the Worker, enabling off-screen rendering.

The worker script (jsmpeg.worker.js) is similar to the code above, with message handling added:

importScripts('./jsmpeg.js')

this.window = this

this.addEventListener('message', (evt) => {
  const data = evt.data

  switch (data.type) {
    // Create the player
    case 'create':
      const { url, canvas, ...config } = data.data
      this.id = url
      this.player = new JSMpeg.Player(url, {
        canvas,
        audio: false,
        pauseWhenHidden: false,
        videoBufferSize: 10 * 1024 * 1024,
        ...config,
      })
      break

    // Destroy the player
    case 'destroy':
      try {
        if (this.player) {
          this.player.destroy()
        }
        this.postMessage({ type: 'destroyed' })
      } catch (err) {
        console.log('[jsmpeg.worker] Destroy failed:', this.id, err)
        this.postMessage({
          type: 'fatal',
          data: err,
        })
      }

      break
  }
})

// Ready
this.postMessage({ type: 'ready', data: {} })




Now look at the main process: it creates an off-screen rendering canvas via transferControlToOffscreen(), letting JSMpeg migrate into the Worker seamlessly:

const video = document.getElementById('video')
const wk = new Worker('./jsmpeg.worker.js')

wk.onmessage = (evt) => {
  const data = evt.data
  switch (data.type) {
    case 'ready':
      // Create the OffscreenCanvas object
      const oc = video.transferControlToOffscreen()

      wk.postMessage(
        {
          type: 'create',
          data: {
            canvas: oc,
            url: 'ws://localhost:9999/pull/test',
          },
        },
        [oc] // Note: the canvas is passed as a transferable object
      )

      break
  }
}
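
One benefit of this isolation is that a broken player can be recovered by simply throwing the worker away and starting over. A rough sketch of that idea (the 'ready'/'create'/'fatal' message types follow the worker code above; the restart strategy and the canvas swap are my own illustration):

function createPlayer(canvasEl, url) {
  const worker = new Worker('./jsmpeg.worker.js')

  worker.onmessage = (evt) => {
    switch (evt.data.type) {
      case 'ready': {
        // A canvas can only be transferred once, so each worker gets its own element
        const oc = canvasEl.transferControlToOffscreen()
        worker.postMessage({ type: 'create', data: { canvas: oc, url } }, [oc])
        break
      }
      case 'fatal': {
        // The player could not be recovered: kill the worker, swap in a fresh canvas and retry
        worker.terminate()
        const fresh = document.createElement('canvas')
        fresh.id = canvasEl.id
        canvasEl.replaceWith(fresh)
        createPlayer(fresh, url)
        break
      }
    }
  }

  return worker
}

createPlayer(document.getElementById('video'), 'ws://localhost:9999/pull/test')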




A quick word about Broadway.js

There is also a JSMpeg-like solution: Broadway.js. It is an H.264 decoder compiled from Android’s H.264 decoder with the Emscripten toolchain. It accepts raw H.264 streams, with some limitations: weighted prediction for P-frames and CABAC entropy coding are not supported.


Push example:

$ ffmpeg -f avfoundation  -r 30 -i "FaceTime HD Camera"  -f rawvideo -c:v libx264 -pix_fmt yuv420p -vprofile baseline -tune zerolatency -coder 0 -bf 0 -flags -loop -wpredp 0 -an  http://localhost:9999/push/test




Client example:

const video = document.getElementById('video')
const url = `ws://localhost:9999/pull/test`
const player = new Player({
  canvas: video,
})
const ws = new WebSocket(url)
ws.binaryType = 'arraybuffer'

ws.onmessage = function (evt) {
  var data = evt.data
  if (typeof data !== 'string') {
    player.decode(new Uint8Array(data))
  } else {
    console.log('get command from server: ', data)
  }
}

See the full code here

In tests, JSMpeg and Broadway consumed about the same amount of CPU for streams of the same quality and size, but Broadway is not limited by bit rate, so there were no artifacts and no crashes. Of course, for high-quality video, the resource consumption of both the FFmpeg conversion and Broadway playback is considerable.


Other similar schemes:

  • wfs html5 player for raw h.264 streams.




③ Render YUV directly

Going back to the beginning of this article: what the underlying library actually gets from WebRTC are raw YUV video frames, i.e. frame-by-frame images that have not been encoded or compressed. All the schemes described above add extra encoding, demuxing and decoding steps, and their final output is also YUV video frames; the last step in every one of them is to convert those YUV frames to RGB and render them to a Canvas.

Couldn’t we just forward the raw YUV frames and render them directly to the Canvas, cutting out the intermediate encoding and decoding? What would that look like? Let’s give it a try.


A previous article has already attempted this: “IVWEB plays with WASM series: WebGL YUV rendering image practice”. Let’s build one with reference to it.

As for what YUV is, I won’t cover the basics here; look it up if needed. The size of one YUV frame can be calculated with this formula: (width * height * 3) >> 1, i.e. YUV420p takes 1.5 bytes per pixel.
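
For intuition, here is where that 1.5 comes from in YUV420p: a full-resolution Y plane followed by quarter-resolution U and V planes (a small sketch, not tied to any library):

// Byte layout of one YUV420p frame
function yuv420pLayout(width, height) {
  const ySize = width * height              // 1 byte per pixel
  const uvSize = (width / 2) * (height / 2) // U and V are subsampled 2x in both directions
  return {
    frameSize: ySize + 2 * uvSize, // equals (width * height * 3) >> 1
    yOffset: 0,
    uOffset: ySize,
    vOffset: ySize + uvSize,
  }
}

console.log(yuv420pLayout(320, 240).frameSize) // 115200 bytes per frame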

So as long as we know the video dimensions, we can cut the video stream into individual frames. Next, create a new relay server to receive the push stream; here the raw YUV stream is cut into frame-by-frame image data and sent to the browser:


this.server = http.createServer((req, res) => {
  // ...
  const parsed = new URL('http://host' + url)
  let id = parsed.searchParams.get('id'),
    width = parsed.searchParams.get('width'),
    height = parsed.searchParams.get('height')

  const nwidth = parseInt(width)
  const nheight = parseInt(height)

  const frameSize = (nwidth * nheight * 3) >> 1

  // Cut the stream by byte size
  const stream = req.pipe(new Splitter(frameSize))

  stream.on('data', (c) => {
    this.broadcast(id, c)
  })
  // ...
})

Splitter cuts the Buffer by a fixed byte size.
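
Splitter itself is just a small Transform stream; a minimal sketch of what it could look like (the class name matches the usage above, the implementation is my own):

const { Transform } = require('stream')

class Splitter extends Transform {
  constructor(chunkSize) {
    super()
    this.chunkSize = chunkSize
    this.buffer = Buffer.alloc(0)
  }

  _transform(data, _encoding, callback) {
    this.buffer = Buffer.concat([this.buffer, data])
    // Emit one complete frame at a time
    while (this.buffer.length >= this.chunkSize) {
      this.push(this.buffer.subarray(0, this.chunkSize))
      this.buffer = this.buffer.subarray(this.chunkSize)
    }
    callback()
  }
}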


How do we render the YUV? We can refer to the JSMpeg WebGL renderer and the Broadway.js WebGL renderer. Here we use the YUVCanvas.js file from Broadway directly:

const renderer = new YUVCanvas({
  canvas: video,
  type: 'yuv420',
  width: width,
  height: height,
})

// Receive a YUV frame via WebSocket and extract the Y/U/V components
function onData(data) {
  const ylen = width * height
  const uvlen = (width / 2) * (height / 2)

  renderer.render(
    data.subarray(0, ylen),
    data.subarray(ylen, ylen + uvlen),
    data.subarray(ylen + uvlen, ylen + uvlen + uvlen),
    true
  )
}




Note that both the JSMpeg and Broadway canvas renderers require the video dimensions to be multiples of 8; anything else will raise an error. The “IVWEB plays with WASM series: WebGL YUV rendering image practice” article deals with this problem.


Finally, take a look at the FFMPEG push example:

$ ffmpeg -f avfoundation -r 30 -i "FaceTime HD Camera" -f rawvideo -c:v rawvideo -pix_fmt yuv420p "http://localhost:9999/push?id=test&width=320&height=240"


See the full code here




Let’s look at a simple resource-consumption comparison. My device is a 15-inch MacBook Pro; the video source is the camera, at 320×240 resolution, uyvy422 pixel format, 30 fps.

In the table below, J means JSMpeg, B means Broadway, and Y means direct YUV rendering.

               CPU (J/B/Y)           Memory (J/B/Y)          Average bit rate (J/B/Y)
ffmpeg         9% / 9% / 5%          12MB / 12MB / 9MB       1600k / 200k / 27000k
Relay server   0.6% / 0.6% / 1.4%    18MB / 18MB / 42MB      N/A
Player         16% / 13% / 8%        70MB / 200MB / 50MB     N/A


The results show that rendering YUV directly uses the fewest resources. Because the stream is uncompressed its bit rate is very high, but the local environment is not bandwidth-limited, so this is not a problem. We can also let the browser schedule rendering with requestAnimationFrame, dropping accumulated frames to keep playback latency low.
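
A rough sketch of that idea, reusing the onData function from the rendering example above and assuming a WebSocket ws with binaryType = 'arraybuffer': keep only the newest frame and let requestAnimationFrame decide when to draw, so a busy tab drops frames instead of accumulating latency.

let latestFrame = null

ws.onmessage = (evt) => {
  // Always overwrite: if rendering falls behind, older frames are simply dropped
  latestFrame = new Uint8Array(evt.data)
}

function drawLoop() {
  if (latestFrame) {
    onData(latestFrame) // extract Y/U/V and render, as shown earlier
    latestFrame = null
  }
  requestAnimationFrame(drawLoop)
}

requestAnimationFrame(drawLoop)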

One follow-up: if the YUV scheme is adopted, the relay server will be under more pressure forwarding data over WebSocket.





Further reading

  • Live streaming principles and Web live streaming in practice
  • IVWEB plays with WASM series: WebGL YUV rendering image practice
  • Low-latency live streaming applications
  • Live streaming protocols and H5-based video surveillance solutions
  • HTML5 Live Video (II)