Development history and scenarios of live broadcasting

  • Development history of live broadcasting: it started with native PC clients, moved to the browser on PC, then to native mobile apps, and now to the H5 end in the mobile browser. It has evolved along with the underlying technology.

Live streaming architecture and process

  • Live streaming architecture: two ends and one server (the push-stream end, the pull-stream end, and the streaming media server)

Streaming runs through the entire live broadcast process, which is essentially the transmission and conversion of audio and video streams.

  • Live broadcast process: audio and video capture -> audio and video processing -> audio and video encoding and encapsulation -> push streaming -> streaming media server -> CDN distribution -> pull streaming -> decoding and playback

The following explains each process one by one:

  • Audio capture: the analog signal in the environment is captured by the device as raw PCM-encoded data, which is then compressed into formats such as MP3, AAC or OGG for distribution.
  • Video capture: video is essentially a series of pictures played back in succession. Images are captured by a camera or similar device as raw YUV-encoded data, which is then encoded with codecs such as H.264 and packaged into formats such as MP4 or FLV for distribution. Capture methods include camera capture, screen recording, and streaming directly from a file (a browser-side capture sketch follows the encoding and encapsulation discussion below).
  • Audio and video processing: once capture is complete, the streams are usually processed before encoding and compression in order to add extra effects such as beauty filters, watermarks and sound effects.
  • Encoding and encapsulation:

Video encoding is a very important part of live broadcasting. For streaming media transmission, the encoding speed and compression ratio directly affect transmission efficiency. For example, hundreds of megabytes of raw video data can be compressed by the H.264 codec down to a small fraction of that size. The principle of compression is to remove redundant information:

  1. Spatial redundancy: adjacent pixels within an image are correlated.
  2. Temporal redundancy: the content of adjacent frames is similar.
  3. Visual redundancy: human vision is insensitive to certain details, so those details can be compressed away.
  4. Knowledge redundancy: regular structure can be predicted from prior knowledge (e.g. from earlier image content) and compressed.

Video encapsulation is how the encoded video is stored; it can be understood as a container, and it also provides an index into the streaming media data. The FLV format and the MPEG2-TS format (TS for short) are commonly used in live broadcasting.
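The capture steps above can be sketched on the browser side with getUserMedia and MediaRecorder. This is a minimal illustration rather than the article's implementation: the recording MIME type and the uploadChunk helper are assumptions, and server-side transcoding into H.264/AAC and FLV/TS is out of scope.

async function captureAndRecord() {
  // Capture raw audio (PCM) and video (YUV) frames from microphone and camera
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

  // Compress the raw frames into a container the browser supports (format varies by browser)
  const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });

  recorder.ondataavailable = (event) => {
    // Each chunk is an encoded, encapsulated segment that could be pushed upstream
    uploadChunk(event.data); // hypothetical upload helper
  };

  recorder.start(1000); // emit a chunk roughly every second
}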

  • Live push/pull streaming and the push/pull streaming protocols
  1. RTMP (Real-Time Messaging Protocol): Adobe's real-time messaging protocol. Built on TCP, it is a network protocol for real-time data transmission, mainly used for audio/video transfer and data communication between the Flash platform and streaming media servers that support it. It is currently the mainstream streaming transport protocol and supports both ends (push and pull). Protocol features:
  • Low delay, about 1-2 s;
  • It relies on Flash and is not supported in H5.
  2. HLS (HTTP Live Streaming): a streaming media transmission protocol proposed by Apple. It is a pull-stream protocol.
  • Well supported and highly compatible in H5; on PC it is only supported by Safari.
  • The encoding format is H.264 + AAC.
  • High delay (about 20 s). Why is the delay high? Because the player has to wait for whole TS segments to be produced and downloaded before it can play.
  • Shortening the TS segment duration and the number of slices can reduce the delay, but given the pressure that high-frequency TS requests put on the server, HLS optimization has to find a balance: adjust the slicing strategy so that delay is reduced while performance is preserved. (The original article shows a screenshot of a sliced TS playlist here.)
  3. HTTP-FLV (Flash Video): also introduced by Adobe. HTTP + FLV means the push end encapsulates the audio and video data in FLV format and transmits it to the client over HTTP. Features:
  • Low latency, around 2 s; iOS 9 and below are not supported;
  • Encoding format H.264 + AAC;
  • Since H5 does not support Flash, we need Bilibili's open-source JS library flv.js, which transmuxes the FLV stream into fMP4 and feeds it to the video element via MSE.
  • Feasible solution: if the HTTP-FLV protocol is adopted, a degradation scheme is needed, flv.js (167 KB) has to be introduced, and the end's internal and external environments need to be tested for compatibility problems (a playback sketch follows this list).
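A minimal playback sketch of the degradation idea above, assuming the same live stream is published both as HTTP-FLV and as an HLS playlist; the URLs are placeholders and the snippet is an illustration rather than a production setup:

import flvjs from 'flv.js';

const video = document.getElementById('player');

if (flvjs.isSupported()) {
  // MSE is available: play the low-latency HTTP-FLV stream through flv.js
  const player = flvjs.createPlayer({
    type: 'flv',
    isLive: true,
    url: 'https://example.com/live/stream.flv', // placeholder URL
  });
  player.attachMediaElement(video);
  player.load();
  player.play();
} else {
  // Degradation: fall back to HLS, which iOS Safari plays natively
  video.src = 'https://example.com/live/stream.m3u8'; // placeholder URL
  video.play();
}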

MSE (Media Source Extensions): by itself the video element does not let us operate on the video stream; the emergence of MSE makes that possible. So how does it work?

It replaces the video element's src with a MediaSource object backed by one or more SourceBuffers. If you do not need precise control over timing, media quality and memory release, using the video and source tags is a simpler but adequate solution. However, MSE is essentially unavailable on iOS, which is why flv.js has poor compatibility with iOS.

You can see that the flv.js source code calls several MSE APIs.
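A minimal MSE sketch, independent of flv.js, showing the core API calls; the codec string and segment URL are assumptions for illustration:

const video = document.querySelector('video');
const mediaSource = new MediaSource();

// Replace the video src with an object URL that points at the MediaSource
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  // Create a buffer for fMP4 segments (the codec string is an example)
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');

  // Fetch an fMP4 segment and feed it to the buffer; in flv.js this data
  // comes from FLV tags remuxed into fMP4 on the fly
  fetch('/segments/init.mp4') // placeholder URL
    .then((res) => res.arrayBuffer())
    .then((buf) => sourceBuffer.appendBuffer(buf));
});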

Putting the H5 live broadcast project into practice

Overall structure: the most basic and also the most core parts are two: push/pull streaming and the long connection, that is, real-time communication.

  • Realizing real-time communication in the live room – WebSocket

Usually, when we want real-time communication with the server, the simplest way is polling, but the high request frequency puts pressure on the server. WebSocket solves this problem: it establishes full-duplex communication over a single TCP connection, meaning the server and the client can push data to each other, thereby achieving real-time communication.

  • However, we cannot rely on the long connection alone. In theory the connection stays open, but in cases such as network disconnection, connection timeout or the process being killed, neither side automatically detects that the connection has dropped.
  • So there is a heartbeat mechanism: the client tells the server "I am online and can receive messages", which is the ping, and the server tells the client the same, which is the pong.
  • Long connection keep-alive strategy:
  1. Start a heartbeat timer and send a message, namely a ping, every 10 seconds
  2. In ws.onopen(), start a setInterval timer that checks every second whether no message has been received for more than 30 seconds. If so, check the connection status; if it is disconnected, reconnect
  3. As a last-ditch strategy, if the long connection fails multiple times, fall back to Ajax polling while continuing to retry the long connection
let ws = new WebSocket(url);
let receiveTime = Date.now(); // time of the most recent message
let loopTimer = null;

// The connection succeeded: start the liveness check loop
ws.onopen = () => {
  loopCheckValid();
};

// A message is received: record when it arrived
ws.onmessage = () => {
  receiveTime = Date.now();
};

// The connection is closed
ws.onclose = () => {
  reconnect();
};

// The connection failed
ws.onerror = () => {
  reconnect();
};

// Check every second; if no message has arrived for 30 seconds,
// verify the connection state and reconnect if necessary
function loopCheckValid() {
  loopTimer = setInterval(() => {
    if (Date.now() - receiveTime >= 30 * 1000) {
      handle(); // e.g. check readyState and call reconnect()
    }
  }, 1000);
}

  • Another strategy:
  1. The client pings the server at intervals
  2. A timeout timer is started each time the client sends a ping
  3. When the server receives a ping, it responds with a pong
  4. If the client receives the pong from the server, the connection is normal and the timer is cleared
  5. If the client's timer times out without receiving a pong, the connection is considered broken and the client reconnects (sketched below)
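A minimal sketch of this ping/pong strategy; the message format, interval and timeout values are assumptions for illustration, and reconnect() is the reconnection routine mentioned above:

const PING_INTERVAL = 10 * 1000; // how often the client pings
const PONG_TIMEOUT = 5 * 1000;   // how long to wait for the server's pong
let pongTimer = null;

function startHeartbeat(ws) {
  setInterval(() => {
    // 1. the client pings the server at intervals
    ws.send(JSON.stringify({ type: 'ping' }));

    // 2. start a timeout timer for this ping
    clearTimeout(pongTimer);
    pongTimer = setTimeout(() => {
      // 5. no pong arrived before the timer fired: the connection is broken
      reconnect();
    }, PONG_TIMEOUT);
  }, PING_INTERVAL);

  // 3. the server replies to each ping with a pong (server side, not shown)
  ws.addEventListener('message', (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'pong') {
      // 4. pong received: the connection is alive, cancel the timeout timer
      clearTimeout(pongTimer);
    }
  });
}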

Other related

Other live broadcast scenarios include, for example, co-anchoring (lianmai) live broadcast, which uses WebRTC, and live broadcast in WeChat mini programs. A WeChat mini program can also implement co-anchoring live broadcast, but it is limited by the capabilities the mini program platform supports.

In terms of reach, H5 live broadcast works best, and it saves the cost of duplicating development for the other scenarios.