Abstract: SEI is used to solve the problem of audio and picture falling out of sync during the recording and playback of data streams.

Text | ZEGO Web SDK development team

In June this year, ZEGO launched the industry's first data stream recording PaaS solution, which breaks away from the traditional recording service and achieves a 100% faithful recording restore effect (click to see the solution introduction article).

During data stream recording and playback, we need to combine the audio/video picture and the whiteboard picture into a single playback view, which is presented through a simulated player for synchronized playback. In this process, the recorded audio and video may be delayed by network jitter; if this is not handled in time, the playback progress, the audio/video picture, and the other pictures can fall out of sync.

So how do we deal with this situation?

In this article, we start from the basic concepts of SEI, look at how it can be used to solve the audio and picture synchronization problem, and share some pitfalls we ran into during development.

What is SEI

1. Introduction to SEI

SEI, or Supplemental Enhancement Information, is part of the video bitstream and provides a way to add extra information to the video stream. It is a feature of the H.264/H.265 video compression standards.

In the H.264/AVC encoding format, the header of each NAL unit contains a type field indicating the type of the NAL unit. When type = 6, the NAL unit carries supplemental enhancement information (SEI).

SEI information can be inserted both when the video content is generated and while it is being transmitted.
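For example, given a raw H.264 NAL unit (with its start code or length prefix already stripped), a minimal sketch of detecting an SEI unit just reads the low five bits of the header byte:

function isSeiNalu(nalu) {
    // the low 5 bits of the NAL header byte hold the unit type; 6 means SEI
    return (nalu[0] & 0x1f) === 6;
}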

2. Basic characteristics of SEI

  • Not required for decoding. In other words, the SEI has no direct influence on the decoding process.
  • Potentially useful to the decoding process (for fault tolerance and error correction): you can write decoding-time logic based on the information inserted into the SEI.
  • Embedded in the video bitstream and read from it.

SEI applications

Using the SEI’s ability to store data, you can also:

  • Pass encoder parameters
  • Transfer video copyright information
  • Pass camera parameters
  • Pass clipping events during content generation
  • Pass custom messages

Enterprises can use these SEI capabilities to implement business functions suited to their own scenarios.

How to use SEI to implement business logic

Let's take the Web side as our starting point and walk through the process of reading the SEI and putting it to use.

1. Insert the SEI into the video stream

Before SEI reading can be implemented, the SEI must be inserted into the audio-video stream. The insertion methods and rules are not covered here; you can search the web for detailed procedures.
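As a rough illustration only (not the insertion code used by the recording service), an H.264 SEI NAL unit carrying an unregistered user-data payload (payload type 5) could be assembled as follows; the 16-byte uuid is application-defined, and emulation-prevention bytes are omitted for brevity:

// Sketch: build an H.264 SEI NAL unit (type 6) with an unregistered
// user-data payload (payload type 5), per the H.264 SEI syntax.
function buildSeiNalu(uuid, payload) {
    const bytes = [0x06, 0x05]; // NAL header (type 6 = SEI), payload type 5
    // payload size is encoded as a run of 0xFF bytes plus a final remainder byte
    let size = uuid.length + payload.length;
    while (size >= 255) {
        bytes.push(0xff);
        size -= 255;
    }
    bytes.push(size);
    return Uint8Array.from([...bytes, ...uuid, ...payload, 0x80]); // 0x80 = RBSP trailing bits
}

// e.g. carry a timestamp as a custom message (myUuid is hypothetical):
// buildSeiNalu(myUuid, new TextEncoder().encode(JSON.stringify({ ts: Date.now() })));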

2. Read data on the Web platform

hjplayer.js is an audio and video plug-in that decodes and transmuxes FLV streams and HLS TS streams into fragmented MP4. The MP4 fragments are then fed into the HTML5 video element through the Media Source Extensions API, and the plug-in provides callback methods for SEI information.

Plug-in initialization:

const videoElement = document.getElementById('videoElement');
const player = new HJPlayer({
    type: 'flv',
    url: 'http://xxx.xxx.com/xxxx.flv',
});
player.attachMediaElement(videoElement);
player.load();
player.play();
player.on(HJPlayer.Events.GET_SEI_INFO, (e) => {
    console.log(e); // SEI Message
});

This callback delivers the information read from the SEI, but that SEI does not correspond to the current playback position; it corresponds to how far the video has been read into the buffer. In other words, the SEI delivered by the callback belongs not to the frame currently playing but to a future frame. We therefore need to work out which frame each returned SEI belongs to.

3. Get the position of the returned SEI

To obtain the position of the returned SEI, you need to modify the hjplayer.js source code.

Before we can modify it, we need to understand how SEI reading works:

  • First, hjplayer.js is a wrapper around flv.js, which works by transmuxing FLV streams into ISO BMFF (fragmented MP4) segments and then feeding those segments to a native HTML5 video element through Media Source Extensions.

  • Then, while the FLV stream is being transmuxed, each MP4 fragment is parsed; the SEI information is obtained by parsing the information carried in the NALUs.

Because parsing is done fragment by fragment, we cannot know the exact position of each SEI, but we can know the position of the fragment that contains it; from the fragment's position we can calculate the approximate position of the SEI.

Let's get the position of the fragment containing the SEI by modifying the hjplayer.js source code. Without further ado, here is the modified source:

// HJPlayer/src/Codecs/FLVCodec/Demuxer/FLVDemuxer.ts
_parseAVCVideoData(
    arrayBuffer: ArrayBuffer,
    dataOffset: number,
    dataSize: number,
    tagTimestamp: number,
    tagPosition: number,
    frameType: number,
    cts: number
) {
    const le = this._littleEndian;
    const v = new DataView(arrayBuffer, dataOffset, dataSize);
    const units: Array<NALUnit> = [];
    let length = 0;
    let offset = 0;
    const lengthSize = this._naluLengthSize;
    const dts = this._timestampBase + tagTimestamp;
    let isKeyframe = frameType === 1; // from FLV Frame Type constants

    while (offset < dataSize) {
        if (offset + 4 >= dataSize) {
            Log.warn(
                this.Tag,
                `Malformed Nalu near timestamp ${dts}, offset = ${offset}, dataSize = ${dataSize}`
            );
            break; // data not enough for next Nalu
        }
        // Nalu with length-header (AVC1)
        let naluSize = v.getUint32(offset, !le); // Big-Endian read
        if (lengthSize === 3) {
            naluSize >>>= 8;
        }
        if (naluSize > dataSize - lengthSize) {
            Log.warn(this.Tag, `Malformed Nalus near timestamp ${dts}, NaluSize > DataSize!`);
            return;
        }
        const unitType = v.getUint8(offset + lengthSize) & 0x1f;
        if (unitType === 5) { // IDR
            isKeyframe = true;
        }
        const data = new Uint8Array(arrayBuffer, dataOffset + offset, lengthSize + naluSize);
        const unit: NALUnit = { type: unitType, data };
        if (unitType === 6) { // SEI: expose the payload and the position of the tag being read
            try {
                const unitArray: Uint8Array = data.subarray(lengthSize);
                this.eventEmitter.emit(Events.GET_SEI_INFO, { sei: unitArray, tagPosition });
            } catch (e) {
                Log.log(this.Tag, 'parse sei info error!');
            }
        }
        units.push(unit);
        length += data.byteLength;
        offset += lengthSize + naluSize;
    }

    if (units.length) {
        const track = this._videoTrack;
        const avcSample: AvcSampleData = {
            units,
            length,
            isKeyframe,
            dts,
            cts,
            pts: dts + cts
        };
        if (isKeyframe) {
            avcSample.fileposition = tagPosition;
        }
        track.samples.push(avcSample);
        track.length += length;
    }
}

In the source code above, the _parseAVCVideoData method is where the SEI information is parsed. The tagPosition parameter identifies the position of the tag currently being read, and it is exposed where the Events.GET_SEI_INFO callback is emitted. Dividing tagPosition by totalLength, the total byte length of the video resource, gives the percentage of the file read so far, from which the approximate position of the SEI can be calculated.
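As a rough sketch of that calculation (totalLength and the saveSeiForFrame helper are illustrative assumptions, not hjplayer.js APIs; totalLength could come from the Content-Length response header):

player.on(HJPlayer.Events.GET_SEI_INFO, ({ sei, tagPosition }) => {
    // fraction of the file that had been read when this SEI was parsed
    const ratio = tagPosition / totalLength;
    // approximate media time of the fragment containing the SEI
    const approxTime = ratio * videoElement.duration;
    saveSeiForFrame(approxTime, sei); // associate the SEI with the nearest frame node
});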

If you need to know the SEI's position more precisely, you can read smaller fragments at a time. This makes the calculation more accurate but adds some performance overhead.

4. Use the timestamp stored in the SEI to correct the video progress

Taking advantage of the SEI's ability to store data, we store the timestamp of each position in the video stream inside the SEI and use that data as the baseline for the playback time.

Step 1: Calculate the current position of the SEI, for example an SEI located at the 10s mark;

Step 2: Based on the calculated SEI position, find the frame node corresponding to that position, and save the timestamp recorded in the SEI on that frame node;

Step 3: From the timestamp and the playback start time, calculate the baseline progress of the video at the current frame. If the difference between the video's actual progress and the baseline progress exceeds a certain threshold, correct the video back to the baseline progress.

Let's walk through an example to understand the idea:

Below is the timeline of one region of the playback player; assume the playback player starts at timestamp T1:

  1. When the playback player reaches 7s, a video stream comes in and starts playing from progress 0;
  2. When the playback player reaches 10s, the video stream has played 3s;
  3. At the 10s mark, the frame node carries SEI information whose stored timestamp is T2;
  4. The baseline playback progress of the video stream is therefore C = T2 - T1 - 7s;
  5. If C minus the current video stream progress of 3s (that is, C - 3s) is greater than 0.5s, the current video stream progress is adjusted to C, keeping the video stream picture in sync with the other, non-video pictures (see the sketch below).
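Expressed as a minimal sketch (the names are illustrative, and the SEI payload is assumed to have already been decoded into a millisecond timestamp):

const THRESHOLD = 0.5; // allowed drift between actual and baseline progress, in seconds

function correctVideoProgress(videoElement, seiTimestampMs, playerStartTsMs, streamStartOffsetSec) {
    // baseline progress C = T2 - T1 - 7s, in seconds
    const baseline = (seiTimestampMs - playerStartTsMs) / 1000 - streamStartOffsetSec;
    if (baseline - videoElement.currentTime > THRESHOLD) {
        videoElement.currentTime = baseline; // snap back to the baseline progress
    }
}

For the example above, correctVideoProgress(videoElement, T2, T1, 7) moves the stream from 3s to C whenever C - 3s exceeds 0.5s.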

This is how the timestamp stored in the SEI is used to correct the video progress and keep audio and picture synchronized during playback.

hjplayer.js pitfalls and workarounds

Besides retrieving the SEI with the hjplayer.js plug-in, we also use it for basic audio and video operations such as playing, fast-forwarding, and rewinding.

The following common problems may come up during use; here is how we handle each of them.

Problem 1: Handling the waiting state

After the user drags the video progress into an uncached area, the video enters the waiting state: it keeps loading and can neither play nor seek normally. In this case, the player instance's unload and load methods need to be called to reload the video.

Example code is as follows:

const videoElement = document.getElementById('videoElement');
const player = new HJPlayer(
    {
        type: 'flv',
        url: 'http://xxx.xxx.com/xxxx.flv',
    },
    {
        // ... user config
    }
);
player.attachMediaElement(videoElement);
player.load();
player.play();
// ...
videoElement.addEventListener('waiting', () => {
    player.unload();
    player.load();
});

Problem 2: Jumping to an uncached area

When the user drags the video progress into an uncached area, a loading icon appears on the video, playback stalls at the current position, seeking and playing no longer work, and the video stays in the waiting state.


We can avoid this by doing the following:

Step 1: Set the lazyLoad property

const videoElement = document.getElementById('videoElement');
const player = new HJPlayer(
    {
        type: 'flv',
        url: 'http://xxx.xxx.com/xxxx.flv',
    },
    {
        lazyLoad: false,
        // ... user config
    }
);

Setting the lazyLoad property to false means the HTTP connection will not be dropped once enough of the video has been buffered. Even so, when a long video is loaded, buffering still stops after a certain point.

Step 2: Listen for buffering progress and record it on the player instance

videoElement.addEventListener('progress', () => {
    const len = videoElement.buffered.length;
    if (len) {
        // record the end of the last buffered range on the player instance
        player.process = videoElement.buffered.end(len - 1);
    }
});


In the buffering progress callback, we record how far the current video has buffered.

Step 3: Adjust the seek method

function seek(targetTime) {
    if (player.task) return;
    player.task = setInterval(() => {
        const process = player.process;
        if (targetTime > process) {
            // target not buffered yet: move just behind the buffered end to keep loading
            videoElement.currentTime = process - 2;
        } else {
            videoElement.currentTime = targetTime;
            clearInterval(player.task);
            player.task = null;
        }
    }, 100);
}

Here a timer polls the current buffering progress: if the buffered position is still behind the target position, the playback position is moved to just before the buffered end, which actively triggers requests for further resources, until the target position has been buffered.

At this point, the jump to an uncached area is handled.

Summary

Data stream recording is a convenient, efficient, ready-to-use standardized PaaS solution, formed by refining and optimizing ZEGO's self-developed technology for education enterprises; it breaks away from the traditional recording service and achieves a 100% faithful recording restore effect.

This article has explained supplemental enhancement information (SEI) and its application: ZEGO uses FLV audio and video streams to carry SEI, and the SEI carries validation information that serves as the baseline for checking audio and video playback progress. In this way, SEI keeps multiple replayed streams synchronized in real time, restores the live session to the greatest possible degree, and improves the quality of recorded playback.

For more detailed information on data stream recording, see the official ZEGO documentation: doc-zh.zego.im/article/112…