Web developers have long wanted to use audio and video on the Web, but in the early days, traditional Web technologies couldn’t embed audio and video on the Web, so proprietary technologies like Flash and Silverlight became popular for handling this content.

These technologies work fine, but have a number of issues, including poor support for HTML/CSS features, security issues, and feasibility issues.

Fortunately, when the HTML5 standard was released, it included a number of new features, including the <video> and <audio> elements.

This article will take you through eight aspects to explore front-end Video players and mainstream streaming media technologies. After reading this article, you will know the following:

  • Why is the Video source address of some web pages in the form of Blob URL?
  • What are HTTP Range requests and related concepts of streaming media technology?
  • Understand the concept of HLS, DASH, adaptive bitrate streaming technology and streaming media encryption technology;
  • Understand the FLV file structure and flv.js: its features, usage restrictions and internal working principle;
  • Understand the MSE (Media Source Extensions) API and related usage;
  • Understand the principles of video players, multimedia container formats and the difference between Fragmented MP4 and MP4;

In the last part of “What Bob has to say”, Bob will introduce how to achieve player screenshots, how to generate GIF based on screenshots, how to use Canvas to play video and how to achieve chroma keys and other functions.

First, the traditional playback mode

Most Web developers are familiar with the <video> element, as used in the following HTML snippet:

<video id="mse" autoplay=true playsinline controls="controls">
   <source src="https://h5player.bytedance.com/video/mp4/xgplayer-demo-720p.mp4" type="video/mp4">
   Your browser does not support the video tag.
</video>

After the above code is rendered by the browser, a Video player will be displayed in the page, as shown below:

(photo: https://h5player.bytedance.com/examples/)

Using Chrome Developer Tools, we can see that playing the video file “xgplayer-demo-720p.mp4” sends three HTTP requests:

In addition, it is clear from the figure that the status code for the first two HTTP request responses is “206”. Here we examine the request and response headers for the first HTTP request:

The request headers above include a Range: bytes=0- header, which checks whether the server supports range requests. If the response contains an Accept-Ranges header whose value is not “none”, the server supports range requests.

In the response headers above, Accept-Ranges: bytes indicates that ranges are expressed in bytes. The Content-Length header is also useful because it gives the full size of the video to download.

1.1 Requesting a specific range from the server

If the server supports Range requests, you can use the Range header to generate such requests. This header indicates which part or portions of the file should be returned by the server.

1.1.1 Single range

We can request a certain part of the resource. Here we use the REST Client extension in Visual Studio Code to test. In this example, we use the Range header to request the first 1024 bytes of the www.example.com home page.

For the single range request made with the REST Client, the server returns a response with the status code 206 Partial Content. The “Content-Length” header now indicates the size of the requested range (rather than the size of the entire file), and the “Content-Range” header indicates where that range sits within the entire resource.
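
The single range request described above can also be sketched in JavaScript. This is only a minimal sketch: the helper names buildRangeHeader, parseContentRange and fetchFirstKilobyte are made up for illustration, and the total size 146515 in the comment is an invented value, not a real measurement.

```javascript
// Build the value of a Range header for a single byte range (inclusive).
// Omitting `end` produces an open-ended range such as "bytes=0-".
function buildRangeHeader(start, end) {
  return end === undefined ? `bytes=${start}-` : `bytes=${start}-${end}`;
}

// Parse a Content-Range value such as "bytes 0-1023/146515" (illustrative value).
// Returns null when the value does not match; `total` is null when the size is "*".
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === '*' ? null : Number(match[3]),
  };
}

// Hypothetical usage: request the first 1024 bytes of a resource with fetch().
async function fetchFirstKilobyte(url) {
  const response = await fetch(url, {
    headers: { Range: buildRangeHeader(0, 1023) },
  });
  return {
    // 206 means the range was honoured; 200 means the whole resource was sent.
    status: response.status,
    range: parseContentRange(response.headers.get('Content-Range') || ''),
    body: await response.arrayBuffer(),
  };
}
```

Whether the Content-Range header is readable from script on a cross-origin response depends on the server's CORS configuration, so treat the `range` field as optional.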

1.1.2 Multiple ranges

The Range header also supports requesting multiple parts of a document at once. The ranges are separated by commas. For example:

$ curl http://www.example.com -i -H "Range: bytes=0-50, 100-150"

The following response information is returned for this request:

Because we requested multiple parts of the document, each part has its own “Content-Type” and “Content-Range” information, and the parts of the response body are separated using the boundary parameter.

1.1.3 Conditional range request

When requesting further fragments of a resource, you must make sure that the resource has not been modified since the last fragment was received.

The If-Range header makes a range request conditional: if the condition is met, the server honors the range request and returns a 206 Partial Content response with the corresponding body. If the condition is not met, the server returns “200 OK” along with the entire resource. If-Range can be used with either a Last-Modified validator or an ETag, but not both.

1.1.4 Response to a range request

There are three status codes associated with range requests:

  • If the request succeeds, the server returns the “206 Partial Content” status code.
  • If the requested range exceeds the size of the resource, the server returns the status code “416 Requested Range Not Satisfiable”.
  • If range requests are not supported, the server returns a “200 OK” status code.
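
To tie the conditional request and the three status codes together, here is a small sketch. The function names buildResumeHeaders and interpretRangeStatus are hypothetical helpers introduced only for this illustration, and the returned labels are arbitrary.

```javascript
// Build headers for resuming a download from byte `offset`.
// `validator` is the ETag or Last-Modified value saved from the earlier
// response; when it is given, the request becomes a conditional range request.
function buildResumeHeaders(offset, validator) {
  const headers = { Range: `bytes=${offset}-` };
  if (validator) {
    headers['If-Range'] = validator;
  }
  return headers;
}

// Map the status code of a range response to what the body contains.
function interpretRangeStatus(status) {
  switch (status) {
    case 206:
      return 'partial'; // the body holds only the requested bytes
    case 200:
      return 'full'; // ranges unsupported or condition failed: whole resource
    case 416:
      return 'unsatisfiable'; // the requested range lies outside the resource
    default:
      return 'other';
  }
}
```

A client that receives 'full' while resuming should discard the fragments it already holds, because the body it just received is the complete, possibly changed, resource.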

The remaining two requests will not be analyzed in detail here. If you are interested, you can use Chrome Developer Tools to view the specific request messages.

From the third request, we know that the entire video is about 7.9 MB in size. If the video file is too large or the network is unstable, it takes a long time before playback can start, which seriously degrades the user experience.

So how do we solve this problem? We can use streaming media technology, which is introduced next.

Second, streaming media

Streaming media refers to a technology and process that compresses a series of media data, sends the data in segments over the Internet, and transmits video and audio for immediate viewing. This technology lets data packets be sent like flowing water. Without it, the entire media file must be downloaded before it can be used.

Streaming media is therefore a new way of transmitting media (audio streams, video streams, text streams, image streams, animation streams and so on) rather than a new medium. Its main technical feature is streaming transmission, the general term for techniques that deliver media over a network as a continuous flow. There are two main ways to implement streaming: progressive streaming and real-time streaming.

Common streaming media protocols on the network at present:

As the table above shows, different protocols have different advantages and disadvantages. In practice, we usually choose the best streaming protocol that the target platforms support. For example, HTTP-FLV is a good choice for live streaming in the browser: its performance is better than RTMP+Flash, and its latency can match or beat RTMP+Flash.

Because HLS has relatively high latency, it is generally only suitable for VOD (video on demand) scenarios. However, thanks to its good compatibility on mobile devices, it can also be used for live streaming when high latency is acceptable.

At this point, some readers may be curious about the visible difference between using streaming media technology and the traditional playback mode for the <video> element. Here is a quick comparison, using the common HLS streaming protocol as an example.

Looking at the figure above, it is clear that when the HLS streaming protocol is used, the src attribute of the <video> element is a blob: URL. To understand this, we first need to talk about Blob and Blob URL.

2.1 Blob

Blob (Binary Large Object) represents a large object of binary type. In a database management system, it is binary data stored as a single entity. Blobs are usually video, sound, or multimedia files. “In JavaScript, an object of type Blob represents immutable, file-like raw data.”

A Blob consists of an optional string type (usually a MIME type) and blobParts:

MIME (Multipurpose Internet Mail Extensions) types associate a kind of file with the application used to open it. When a file of a given type is accessed, the browser automatically opens it with the specified application. MIME types are mostly used to identify client-defined file types and to specify how certain media files are opened.

Common MIME types include text/html for HTML documents (.html), image/png for PNG images (.png), and text/plain for plain text (.txt).

To get a better feel for Blob objects, let’s use the Blob constructor to create a myBlob object, as shown below:

As you can see, the myBlob object has two properties: size and type. The size property is the size of the data in bytes, and type is a string containing the MIME type. Blobs don’t necessarily represent data in a JavaScript-native format. For example, the File interface is based on Blob, inheriting Blob functionality and extending it to support files on the user’s system.
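
For readers who want to try this in a console, a minimal sketch of creating a Blob and reading its two properties might look like the following. The HTML string used as content is an arbitrary example, not the one from the original screenshot.

```javascript
// blobParts is an array of strings, ArrayBuffers, TypedArrays or other Blobs.
// The second argument carries the optional MIME type.
const myBlob = new Blob(['<h1>Hello</h1>'], { type: 'text/html' });

// size is the byte length of the data, type is the MIME type string.
const blobInfo = { size: myBlob.size, type: myBlob.type };
```

Because the content here is pure ASCII, its size in bytes equals the number of characters in the string.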

2.2 Blob URL/Object URL

Blob URL/Object URL is a pseudo-protocol that allows Blob and File objects to be used as URL sources for images, links for downloading binary data, and so on. In the browser, we create Blob URLs with the URL.createObjectURL method, which takes a Blob object and creates a unique URL for it of the form blob:<origin>/<uuid>.

The browser internally stores a URL → Blob mapping for each URL generated by URL.createObjectURL. Such URLs are therefore short, yet they give access to the Blob. A generated URL is valid only while the current document is open. If the Blob URL you visit no longer exists, the browser returns a 404 error.

Blob URLs may seem like a convenient solution, but they have a side effect. Because the URL → Blob mapping is stored, the Blob itself stays in memory and the browser cannot free it. The mapping is cleared automatically when the document is unloaded, at which point the Blob is released. In a long-lived application, however, that will not happen soon. So once we create a Blob URL, the Blob stays in memory even when it is no longer needed.

To address this, we can call URL.revokeObjectURL(url) to remove the reference from the internal mapping, which allows the Blob to be deleted (if there are no other references) and its memory to be freed.
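
One way to make sure every Blob URL is revoked is to wrap creation and cleanup in a single helper. The sketch below assumes the URL is only needed synchronously; withBlobUrl is a hypothetical name, not a browser API.

```javascript
// Create a Blob URL, hand it to `use`, and always revoke it afterwards,
// even if `use` throws. Returns whatever `use` returns.
function withBlobUrl(blob, use) {
  const url = URL.createObjectURL(blob);
  try {
    return use(url);
  } finally {
    // Remove the URL -> Blob mapping so the Blob can be garbage collected.
    URL.revokeObjectURL(url);
  }
}

// Example: inspect the generated URL while it is still valid.
const blobUrlPrefix = withBlobUrl(new Blob(['hi']), (url) => url.slice(0, 5));
```

When the URL is assigned to an image or media element, loading happens asynchronously, so in that case the revoke call would have to wait until the element has finished loading rather than run in a finally block.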

2.3 Blob vs ArrayBuffer

In addition to “Blob objects” on the front end, you may also encounter “ArrayBuffer objects”. It is used to represent a generic, fixed-length buffer of raw binary data. Instead of manipulating the contents of the ArrayBuffer directly, you need to create a TypedArray object or DataView object that represents the buffer in a specific format, and use that object to read and write the contents of the buffer.

Blob objects and ArrayBuffer objects have different characteristics. The differences between them are as follows:

  • Unless you need the write/edit capabilities provided by ArrayBuffer, the Blob format is probably the better choice.
  • Blob objects are immutable, whereas ArrayBuffers can be manipulated through TypedArrays or DataViews.
  • An ArrayBuffer lives in memory and can be manipulated directly. A Blob can live on disk, in cache memory, or in other locations that are not directly accessible.
  • Although a Blob can be passed directly as an argument to other functions, for example window.URL.createObjectURL(), you may still need a File API such as FileReader to work with its contents.
  • Blob and ArrayBuffer objects can be converted into each other:
    • Use FileReader’s readAsArrayBuffer() method to convert a Blob object into an ArrayBuffer object;
    • Use the Blob constructor, as in new Blob([new Uint8Array(data)]), to convert an ArrayBuffer object into a Blob object.
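
The two conversions can be sketched as a pair of small helpers. Note one substitution: instead of FileReader, this sketch uses the promise-based Blob.prototype.arrayBuffer() method, which does the same job in current browsers. The helper names are hypothetical.

```javascript
// ArrayBuffer -> Blob: wrap the buffer in a typed array and pass it as a blobPart.
function arrayBufferToBlob(buffer, type) {
  return new Blob([new Uint8Array(buffer)], { type: type || '' });
}

// Blob -> ArrayBuffer: resolves with a copy of the Blob's bytes.
function blobToArrayBuffer(blob) {
  return blob.arrayBuffer();
}

// Round trip: three bytes in, the same three bytes out.
function roundTrip(bytes) {
  const blob = arrayBufferToBlob(Uint8Array.from(bytes).buffer);
  return blobToArrayBuffer(blob).then((buffer) => Array.from(new Uint8Array(buffer)));
}
```

The Blob to ArrayBuffer direction is asynchronous in both variants, because the Blob’s data may not be in memory yet.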

In a front-end AJAX scenario, we might also use Blob or ArrayBuffer objects in addition to the usual JSON format:

function GET(url, callback) {
  let xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer'; // or xhr.responseType = "blob";
  xhr.onload = function(e) {
    if (xhr.status != 200) {
      alert("Unexpected status code " + xhr.status + " for " + url);
      return false;
    }
    callback(new Uint8Array(xhr.response)); // or new Blob([xhr.response]);
  };
  xhr.send(); // dispatch the request
}

In the example above, setting xhr.responseType to a different value gives us the response data in the corresponding type. With this background covered, we now turn to HLS, a streaming protocol that is widely used today.

Third, HLS

3.1 Introduction to HLS

HTTP Live Streaming (abbreviated HLS) is an HTTP-based network transport protocol proposed by Apple as part of its QuickTime X and iPhone software systems. It works by splitting the entire stream into small HTTP-based files and downloading a few at a time. While the media stream is playing, the client can choose to download the same resource from a number of alternate sources encoded at different rates, allowing the streaming session to adapt to different data rates.

In addition, when the user’s signal strength wobbles, the video stream adjusts dynamically to provide excellent reproduction.

(photo: https://www.wowza.com/blog/hls-streaming-protocol)

Initially, only iOS supported HLS. But HLS is now a proprietary format that is supported by almost all devices. As the name implies, the HLS (HTTP Live Streaming) protocol delivers video content over a standard HTTP Web server. This means you don’t need to integrate any special infrastructure to distribute HLS content.

HLS has the following features:

  • HLS will play video encoded using H.264 or HEVC/H.265 codecs.
  • HLS will play audio encoded using AAC or MP3 codecs.
  • HLS video streams are typically cut into 10-second segments.
  • The transport/package format for HLS is MPEG-2 TS.
  • HLS supports DRM (Digital Rights Management).
  • HLS supports various advertising standards, such as VAST and VPAID.

Apple proposed HLS mainly to solve some problems with the RTMP protocol. For example, RTMP does not use standard HTTP to transmit data, so it may be blocked by firewalls in some network environments. HLS transmits data over HTTP, so it is usually not blocked by firewalls. It is also easy to deliver HLS media streams through a CDN (Content Distribution Network).

3.2 HLS adaptive bitrate streaming

HLS is an adaptive bitrate streaming protocol, so HLS streams can dynamically adapt the video resolution to each viewer’s network conditions. If you are on high-speed WiFi, you can stream HD video to your phone. If you are on a bus or subway with a limited data connection, you can watch the same video at a lower resolution.

When starting a streaming session, the client downloads an Extended M3U (M3U8) Playlist file containing metadata to find available media streams.

(photo: https://www.wowza.com/blog/hls-streaming-protocol)

To help you understand, let’s take a look at the specific M3U8 file using an online example provided by hls.js, a JavaScript implemented HLS client.


#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2149280,CODECS="mp4a.40.2,avc1.64001f",RESOLUTION=1280x720,NAME="720"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=246440,CODECS="mp4a.40.5,avc1.42000d",RESOLUTION=320x184,NAME="240"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=460560,CODECS="mp4a.40.5,avc1.420016",RESOLUTION=512x288,NAME="380"
url_4/193039199_mp4_h264_aac_7.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=836280,CODECS="mp4a.40.2,avc1.64001f",RESOLUTION=848x480,NAME="480"
url_6/193039199_mp4_h264_aac_hq_7.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=6221600,CODECS="mp4a.40.2,avc1.640028",RESOLUTION=1920x1080,NAME="1080"
url_8/193039199_mp4_h264_aac_fhd_7.m3u8

From the M3U8 file of the Master Playlist, we can see that the video is available in the following five resolutions:

  • 1920×1080 (1080p)
  • 1280×720 (720p)
  • 848×480 (480p)
  • 512×288
  • 320×184
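
To make the idea of adaptive bitrate selection more concrete, here is a rough sketch of how a client could read the variants out of a master playlist and pick one for a measured bandwidth. It is a simplification and not how hls.js is implemented; parseMasterPlaylist and pickVariant are hypothetical names, and real players use more elaborate bandwidth estimation and switching rules.

```javascript
// Extract { bandwidth, resolution, uri } for each #EXT-X-STREAM-INF entry.
function parseMasterPlaylist(text) {
  const lines = text.split(/\r?\n/).map((line) => line.trim()).filter(Boolean);
  const tag = '#EXT-X-STREAM-INF:';
  const variants = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].startsWith(tag)) continue;
    const attrs = {};
    // Attribute values are either quoted strings (which may contain commas)
    // or unquoted tokens that end at the next comma.
    for (const m of lines[i].slice(tag.length).matchAll(/([A-Z0-9-]+)=("[^"]*"|[^,]*)/g)) {
      attrs[m[1]] = m[2].replace(/^"|"$/g, '');
    }
    const next = lines[i + 1];
    variants.push({
      bandwidth: Number(attrs.BANDWIDTH),
      resolution: attrs.RESOLUTION || null,
      // The URI of the media playlist is the line that follows the tag.
      uri: next && !next.startsWith('#') ? next : null,
    });
  }
  return variants;
}

// Pick the highest-bandwidth variant that fits the measured throughput
// (bits per second), falling back to the lowest one.
function pickVariant(variants, measuredBps) {
  const sorted = [...variants].sort((a, b) => a.bandwidth - b.bandwidth);
  let chosen = sorted[0];
  for (const variant of sorted) {
    if (variant.bandwidth <= measuredBps) chosen = variant;
  }
  return chosen;
}
```

With the bandwidth values from the playlist above, a connection measured at roughly 1 Mbit/s would end up on the 848×480 variant under this rule.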

The media playlist for each resolution is defined in its own M3U8 file. Here we take the 720p video as an example and look at its M3U8 file:

#EXTINF:10.000,
url_462/193039199_mp4_h264_aac_hd_7.ts
#EXTINF:10.000,
url_463/193039199_mp4_h264_aac_hd_7.ts
#EXTINF:10.000,
url_464/193039199_mp4_h264_aac_hd_7.ts
...
#EXTINF:10.000,
url_525/193039199_mp4_h264_aac_hd_7.ts
#EXT-X-ENDLIST

When the user selects a video at a given resolution, the corresponding media playlist (M3U8 file) is downloaded; it lists the information for each segment. The transport/packaging format of HLS is MPEG-2 Transport Stream (MPEG-2 TS), a standard format for transmitting and storing video, audio and communication protocol data. It is used in digital television broadcasting systems such as DVB, ATSC and IPTV.

“Note that multiple TS files can be merged into a video file in MP4 format with off-the-shelf tools.” To protect video copyright, we can encrypt the segments with a symmetric encryption algorithm such as AES-128. During playback, the client obtains the symmetric key from the key server address configured in the M3U8 file and then downloads the segments. After a segment is downloaded, the client decrypts it with the matching symmetric algorithm and plays it.

If you are interested in this process, see the video-hls-encrypt project on GitHub. It introduces an HLS-based video encryption solution in a simple way and provides complete code examples.

(photo: https://github.com/hauk0101/video-hls-encrypt)

After introducing Apple’s HLS (HTTP Live Streaming) technology, let’s move on to DASH, another HTTP-based dynamic adaptive streaming technology.

Fourth, DASH

4.1 Introduction to DASH

“Dynamic Adaptive Streaming over HTTP (abbreviated DASH, also known as MPEG-DASH) is an adaptive bitrate streaming technology that enables high-quality streaming over the Internet from conventional HTTP web servers.” Similar to Apple’s HTTP Live Streaming (HLS) solution, MPEG-DASH breaks content down into a series of small HTTP-based file segments, each containing a short interval of playable content from a presentation that may be several hours long.

The content is made available at a variety of bitrates, so each interval has alternative segments encoded at different bitrates. While the content is being played back by an MPEG-DASH client, the client automatically selects which alternative to download and play based on current network conditions. It picks the segment with the highest bitrate that can still be downloaded in time for playback, avoiding stalls and rebuffering events. As a result, an MPEG-DASH client can adapt seamlessly to changing network conditions and provide high-quality playback with fewer stalls and rebuffering events.

MPEG-DASH is the first HTTP-based adaptive bitrate streaming solution to become an international standard. MPEG-DASH should not be confused with a transport protocol; the transport protocol it uses is TCP. “Unlike HLS, HDS and Smooth Streaming, DASH is codec-agnostic, so it can carry content encoded in any format, such as H.265, H.264 or VP9.”

Although HTML5 does not support MPEG-DASH directly, several JavaScript implementations allow MPEG-DASH to be used in web browsers through the HTML5 Media Source Extensions (MSE). Other JavaScript implementations, such as the BitDash player, support DRM-protected MPEG-DASH using the HTML5 Encrypted Media Extensions. Combined with WebGL, MPEG-DASH’s HTML5-based adaptive bitrate streaming also enables efficient live and on-demand streaming of 360° video.

4.2 DASH Important Concepts

  • MPD: a media file manifest, which is similar to the M3U8 file of HLS.
  • Representation: corresponds to one alternative output. For example, 480p video, 720p video and 44.1 kHz audio are each described by a Representation.
  • Segment: each Representation is divided into multiple Segments. Segments fall into 4 categories, the most important of which are the Initialization Segment (each Representation contains one Init Segment) and the Media Segment (each Representation contains several Media Segments).

(photo: https://blog.csdn.net/yue_huang/article/details/78466537)

In China, Bilibili started using DASH technology in 2018. If you are interested in why they chose it, you can read the article “Why we use DASH”.

What does an MPD file look like? Here is the MPD file from the DASH example of the Watermelon video player (xgplayer):

<!-- MPD file Generated with GPAC version 0.7.2-dev-rev559-g61a50f45-master at 2018-06-11T11:40:23.972Z -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" minBufferTime="PT1.500S" type="static" mediaPresentationDuration="PT0H1M30.080S" maxSegmentDuration="PT0H0M1.000S" profiles="urn:mpeg:dash:profile:full:2011">
 <ProgramInformation moreInformationURL="http://gpac.io">
  <Title>xgplayer-demo_dash.mpd generated by GPAC</Title>
 </ProgramInformation>

 <Period duration="PT0H1M30.080S">
  <AdaptationSet segmentAlignment="true" maxWidth="1280" maxHeight="720" maxFrameRate="25" par="16:9" lang="eng">
   <ContentComponent id="1" contentType="audio" />
   <ContentComponent id="2" contentType="video" />
   <Representation id="1" mimeType="video/mp4" codecs="mp4a.40.2,avc3.4D4020" width="1280" height="720" frameRate="25" sar="1" startWithSAP="0" bandwidth="6046495">
    <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
    <BaseURL>xgplayer-demo_dashinit.mp4</BaseURL>
    <SegmentList timescale="1000" duration="1000">
     <Initialization range="0-1256"/>
     <SegmentURL mediaRange="1257-1006330" indexRange="1257-1300"/>
     <SegmentURL mediaRange="1006331-1909476" indexRange="1006331-1006374"/>
     ...
     <SegmentURL mediaRange="68082016-68083543" indexRange="68082016-68082059"/>
    </SegmentList>
   </Representation>
  </AdaptationSet>
 </Period>
</MPD>

(File source: https://h5player.bytedance.com/examples/)

When playing a video, watermelon video player will automatically request the corresponding fragment to play according to the MPD file.
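
Since every SegmentURL in this MPD addresses a byte range inside a single MP4 file, fetching a segment comes down to the HTTP range requests covered earlier. The sketch below shows that mapping only; it is not the player’s actual code, segmentRangeHeaders is a hypothetical helper, and a regular expression is used purely to keep the example short (in a browser, DOMParser would be the more robust way to read the XML).

```javascript
// Collect one Range header value per <SegmentURL mediaRange="start-end"> element.
function segmentRangeHeaders(mpdText) {
  const headers = [];
  const pattern = /<SegmentURL\b[^>]*\bmediaRange="(\d+)-(\d+)"/g;
  for (const match of mpdText.matchAll(pattern)) {
    headers.push(`bytes=${match[1]}-${match[2]}`);
  }
  return headers;
}
```

Each returned value could then be sent as the Range header of a request for the file named in BaseURL.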

We have already mentioned Bilibili, next we have to mention its famous open source project – flv.js, but before introducing it we need to know about the FLV streaming media format.

Fifth, FLV

5.1 FLV File Structure

FLV is short for Flash Video. The FLV streaming media format is a video format that emerged with the launch of Flash MX. Because FLV files are very small and load very quickly, the format made it practical to watch video over the network. It effectively solved the problem that the SWF files exported after importing video into Flash were too large to be used well on the network.

The FLV file consists of FLV Header and FLV Body, and the FLV Body consists of a series of tags:

5.1.1 FLV header

FLV header (9 bytes):

  • 1-3: The first three bytes are the file format signature (“FLV”: 0x46 0x4C 0x56).
  • 4-4: The fourth byte is the version (0x01).
  • 5-5: The first five bits of the fifth byte are reserved and must be 0.
    • The sixth bit of the fifth byte is the audio flag (TypeFlagsAudio).
    • The seventh bit of the fifth byte is also reserved and must be 0.
    • The eighth bit of the fifth byte is the video flag (TypeFlagsVideo).
  • 6-9: Bytes 6 to 9 hold the length of the header (DataOffset), whose value is 0x00000009.
  • The length of the entire header is usually 9 (3+1+1+4).
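
The header layout above translates fairly directly into code. The following is a minimal sketch of reading those 9 bytes from a Uint8Array; parseFlvHeader is a hypothetical helper, not part of flv.js.

```javascript
// Parse the 9-byte FLV header from a Uint8Array.
function parseFlvHeader(bytes) {
  if (bytes.length < 9) {
    throw new Error('An FLV header needs at least 9 bytes');
  }
  // Bytes 1-3: signature "FLV" (0x46 0x4C 0x56).
  if (bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56) {
    throw new Error('Missing FLV signature');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const flags = bytes[4];
  return {
    version: bytes[3],
    // Sixth bit counted from the most significant bit: mask 0x04.
    hasAudio: (flags & 0x04) !== 0,
    // Eighth (least significant) bit: mask 0x01.
    hasVideo: (flags & 0x01) !== 0,
    // Bytes 6-9: big-endian 32-bit value.
    headerLength: view.getUint32(5),
  };
}
```

For a typical file with both audio and video, the fifth byte is 0x05, so both flags come out true.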

5.1.2 Basic tag format

The tag header information occupies 15 bytes:

  • 1-4: Size of the previous tag (4 bytes). For the first tag this is 0.
  • 5-5: Tag type (1 byte): 0x08 audio; 0x09 video; 0x12 script data.
  • 6-8: Size of the tag content (3 bytes).
  • 9-11: Timestamp (3 bytes, in milliseconds). It is always 0 for the first tag, and 0 for script tags.
  • 12-12: Timestamp extension (1 byte). It extends the timestamp to 4 bytes (to store longer FLV durations) and serves as the highest byte of the timestamp.
  • 13-15: streamID (3 bytes), always 0.

During FLV playback, tags are played in the order of their timestamps. Any timing information embedded in the payload data itself is ignored.
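
The 15-byte layout described above can be read with a DataView in the same way. This is an illustrative sketch only: parseFlvTagHeader is a hypothetical helper, and it treats the combined timestamp as an unsigned 32-bit value.

```javascript
// Parse the 15 bytes described above (previous tag size + tag header),
// starting at `offset` inside a Uint8Array.
function parseFlvTagHeader(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset + offset, 15);
  // Read a big-endian 24-bit unsigned integer at `pos`.
  const uint24 = (pos) => (view.getUint8(pos) << 16) | view.getUint16(pos + 1);
  const timestampLow = uint24(8); // bytes 9-11
  const timestampExt = view.getUint8(11); // byte 12, highest byte of the timestamp
  return {
    previousTagSize: view.getUint32(0), // bytes 1-4
    type: view.getUint8(4), // byte 5: 0x08 audio, 0x09 video, 0x12 script data
    dataSize: uint24(5), // bytes 6-8
    timestamp: ((timestampExt << 24) | timestampLow) >>> 0, // milliseconds
    streamId: uint24(12), // bytes 13-15, always 0
  };
}
```

A parser would then skip `dataSize` bytes of payload to reach the next 15-byte block.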

The detailed structure of FLV format is shown in the figure below:


5.2 Introduction to flv.js

flv.js is an HTML5 Flash Video (FLV) player written in pure JavaScript that relies on Media Source Extensions under the hood. In practice, it automatically parses FLV files and feeds the audio and video data to the native HTML5 video element, making it possible for browsers to play FLV without Flash.

5.2.1 Features of flv.js

  • Supports playing FLV files encoded with H.264 + AAC/MP3;
  • Supports playing multi-segment video;
  • Supports playing low-latency HTTP FLV live streams;
  • Supports playing FLV live streams transmitted over WebSocket;
  • Compatible with Chrome, Firefox, Safari 10, IE11 and Edge;
  • Very low overhead, with support for browser hardware acceleration.

5.2.2 Limitations of flv.js

  • The MP3 audio codec does not work on IE11/Edge;
  • HTTP FLV live streaming is not supported in all browsers.

5.2.3 Using flv.js

<script src="flv.min.js"></script>
<video id="videoElement"></video>
<script>
    if (flvjs.isSupported()) {
        var videoElement = document.getElementById('videoElement');
        var flvPlayer = flvjs.createPlayer({
            type: 'flv',
            url: 'http://example.com/flv/video.flv'
        });
        flvPlayer.attachMediaElement(videoElement);
        flvPlayer.load();
        flvPlayer.play();
    }
</script>

5.3 How flv.js works

flv.js works by transmuxing the FLV file stream into ISO BMFF (Fragmented MP4) segments and then feeding those MP4 segments to an HTML5 <video> element through the Media Source Extensions API.

(photo: https://github.com/bilibili/flv.js/blob/master/docs/design.md)

For a more detailed introduction to how flv.js works, interested readers can read the article on the real-time interactive streaming media player from the Huajiao open source project. We have now introduced hls.js and flv.js, two mainstream streaming solutions; both owe their success to the quiet support of Media Source Extensions behind the scenes. So let’s take a look at MSE (Media Source Extensions).

Sixth, MSE


The Media Source Extensions API (MSE) provides plug-in-free, Web-based streaming media. With MSE, media streams can be created in JavaScript and played using the audio and video elements.

In recent years, it has become possible to play video and audio in Web applications without plug-ins. However, the basic architecture is too simple for streaming: it can only play a whole track at once and cannot split or merge several buffers. Early streaming media mainly relied on Flash, with Flash Media Server delivering video streams over the RTMP protocol.

With Media Source Extensions (MSE), things are different. MSE lets us replace the usual single-file src value of a media element with a reference to a MediaSource object (a container for information such as the ready state of the media to be played), which in turn references multiple SourceBuffer objects (representing the different chunks of media that make up the entire stream).

To help you understand, let’s look at the basic MSE data flow:

MSE gives us finer-grained control over how much and how often content is fetched, and over memory usage details such as when buffers are evicted. Its extensible API is the foundation for building adaptive bitrate streaming clients, such as DASH or HLS clients.

Creating media assets that work with MSE in modern browsers is time-consuming and takes a lot of computing resources and energy. In addition, external utilities must be used to convert the content into a suitable format. Although browsers support a variety of MSE-compatible media containers, H.264 video, AAC audio and the MP4 container are very common, so MSE needs to be compatible with these mainstream formats. MSE also provides an API that lets developers check at run time whether a container and codec are supported.

6.2 MediaSource interface

MediaSource is the interface of the Media Source Extensions API that represents a source of media data for an HTMLMediaElement object. A MediaSource object can be attached to an HTMLMediaElement for playback on the client. Before introducing the MediaSource interface, let’s take a look at its structure:

(photo: https://www.w3.org/TR/media-source/)

To understand the MediaSource diagram, we need to look at the main steps a client player goes through to play a video stream:

Get streaming media -> parse the protocol -> demux the container -> decode audio and video -> play audio and render video (audio and video must be kept in sync).

Because raw captured audio and video data is relatively large, encoders such as H.264 or AAC are usually used to compress the original media signals so that they are easier to transmit over the network. The most common media signals are video, audio and subtitles. Everyday movies, for example, are composed of different media signals: in addition to moving pictures, most movies also contain audio and subtitles.

Common video codecs are H.264, HEVC, VP9 and AV1. Audio codecs include AAC, MP3 and Opus. Each kind of media signal has many different codecs. Here we take the Watermelon video player demo as an example to get an intuitive feel for audio tracks, video tracks and subtitle tracks:

Now let’s get started with the MediaSource interface.

6.2.1 State

enum ReadyState {
    "closed", // The source is not currently attached to a media element.
    "open",   // The source has been opened by a media element and data can be appended to its SourceBuffer objects.
    "ended"   // The source is still attached to a media element, but endOfStream() has been called.
};

6.2.2 Abnormal stream termination

enum EndOfStreamError {
    "network", // Terminates playback and signals a network error.
    "decode"   // Terminates playback and signals a decoding error.
};

6.2.3 Constructor

interface MediaSource : EventTarget {
    readonly attribute SourceBufferList    sourceBuffers;
    readonly attribute SourceBufferList    activeSourceBuffers;
    readonly attribute ReadyState          readyState;
             attribute unrestricted double duration;
             attribute EventHandler        onsourceopen;
             attribute EventHandler        onsourceended;
             attribute EventHandler        onsourceclose;

    SourceBuffer addSourceBuffer(DOMString type);
    void         removeSourceBuffer(SourceBuffer sourceBuffer);
    void         endOfStream(optional EndOfStreamError error);
    void         setLiveSeekableRange(double start, double end);
    void         clearLiveSeekableRange();
    static boolean isTypeSupported(DOMString type);
};

6.2.4 Properties
  • MediaSource.sourceBuffers (read-only): Returns a SourceBufferList object containing the list of SourceBuffer objects associated with this MediaSource.
  • MediaSource.activeSourceBuffers (read-only): Returns a SourceBufferList object containing the subset of the SourceBuffer objects in MediaSource.sourceBuffers that provide the currently selected video track, the enabled audio tracks and the shown/hidden text tracks.
  • MediaSource.readyState (read-only): Returns an enum value describing the state of the current MediaSource: not attached to a media element (closed), attached and ready to receive SourceBuffer objects (open), or attached but with the stream ended via MediaSource.endOfStream() (ended).
  • MediaSource.duration: Gets and sets the duration of the stream currently being presented.
  • onsourceopen: Sets the event handler for the sourceopen event.
  • onsourceended: Sets the event handler for the sourceended event.
  • onsourceclose: Sets the event handler for the sourceclose event.
6.2.5 Methods
  • MediaSource.addSourceBuffer(): Creates a new SourceBuffer with the given MIME type and adds it to the MediaSource's sourceBuffers list.
  • MediaSource.removeSourceBuffer(): Removes the specified SourceBuffer from the sourceBuffers list of the MediaSource object.
  • MediaSource.endOfStream(): Signals the end of the stream.
6.2.6 Static methods
  • MediaSource.isTypeSupported(): Returns a Boolean value indicating whether the given MIME type is supported by the current browser, that is, whether a SourceBuffer object can be successfully created for that MIME type.
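
As a rough illustration of how isTypeSupported() is typically used, the sketch below picks the first MIME type from a candidate list that the platform reports as supported. The helper name pickSupportedType and the candidate list are assumptions made for this example and are not part of the MSE API; the second parameter exists only so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: return the first MIME type accepted by
// MediaSource.isTypeSupported(), or null when none is accepted
// (or when MSE is not available at all).
function pickSupportedType(candidates, mediaSourceCtor = globalThis.MediaSource) {
  if (!mediaSourceCtor || typeof mediaSourceCtor.isTypeSupported !== 'function') {
    return null; // MSE is not available on this platform
  }
  for (const type of candidates) {
    if (mediaSourceCtor.isTypeSupported(type)) return type;
  }
  return null;
}

// Example candidate list (an assumption; adjust it to the streams you serve).
const candidates = [
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"',
  'video/webm; codecs="vp9, opus"',
];
const chosen = pickSupportedType(candidates);
console.log(chosen === null ? 'No supported type found' : 'Using ' + chosen);
```

Checking support before calling addSourceBuffer() avoids the exception that addSourceBuffer() throws for an unsupported type.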
6.2.7 Example
var vidElement = document.querySelector('video');

if (window.MediaSource) { // (1)
  var mediaSource = new MediaSource();
  vidElement.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', sourceOpen);
} else {
  console.log("The Media Source Extensions API is not supported.");
}

function sourceOpen(e) {
  URL.revokeObjectURL(vidElement.src);
  var mime = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
  var mediaSource = e.target;
  var sourceBuffer = mediaSource.addSourceBuffer(mime); // (2)
  // Note: MSE expects the MP4 data to be in fragmented (fMP4) form.
  var videoUrl = 'hello-mse.mp4';
  fetch(videoUrl) // (3)
    .then(function (response) {
      return response.arrayBuffer();
    })
    .then(function (arrayBuffer) {
      sourceBuffer.addEventListener('updateend', function (e) { // (4)
        if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
      });
      sourceBuffer.appendBuffer(arrayBuffer); // (5)
    });
}

The above example illustrates how to use the MSE API, so let’s examine the main workflow:

  • (1) Check whether the current platform supports the Media Source Extensions API. If so, create a MediaSource object and bind the sourceopen event handler.
  • (2) Create a new SourceBuffer with the given MIME type and add it to the MediaSource's sourceBuffers list.
  • (3) Download the video stream from the remote streaming server and convert it into an ArrayBuffer object.
  • (4) Add an updateend event handler to the sourceBuffer object so that endOfStream() is called once the data has been appended.
  • (5) Append the video data, now in ArrayBuffer form, to the sourceBuffer object.
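
The example above appends the whole file in a single appendBuffer() call. In practice a stream usually arrives as several segments, and because appendBuffer() works asynchronously, the next segment can only be appended after the updateend event has fired. Below is a minimal sketch of that queueing pattern. It assumes the SourceBuffer is idle (updating is false) when the function is called, and appendSegments is a hypothetical helper name, not part of the MSE API.

```javascript
// Hypothetical helper: append ArrayBuffer segments to a SourceBuffer one at a
// time, waiting for "updateend" before appending the next one.
function appendSegments(sourceBuffer, segments) {
  return new Promise((resolve, reject) => {
    const queue = Array.from(segments);

    function cleanup() {
      sourceBuffer.removeEventListener('updateend', appendNext);
      sourceBuffer.removeEventListener('error', onError);
    }

    function onError() {
      cleanup();
      reject(new Error('SourceBuffer reported an error while appending'));
    }

    function appendNext() {
      if (queue.length === 0) {
        cleanup();
        resolve(); // every segment has been appended
        return;
      }
      try {
        sourceBuffer.appendBuffer(queue.shift());
      } catch (err) {
        cleanup();
        reject(err); // e.g. the buffer was busy or the quota was exceeded
      }
    }

    sourceBuffer.addEventListener('updateend', appendNext);
    sourceBuffer.addEventListener('error', onError);
    appendNext();
  });
}
```

Inside a sourceopen handler this could be used roughly as appendSegments(sourceBuffer, [initSegment, mediaSegment]).then(() => mediaSource.endOfStream()), where initSegment and mediaSegment are ArrayBuffers that were fetched beforehand.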

That concludes our brief introduction to the MSE API. If you want to learn more about its practical application, take a closer look at the “hls.js” or “flv.js” projects. Next, Bob will introduce the multimedia container formats on which audio and video are built.

Seven, multimedia packaging format

Generally, a complete video file consists of audio and video data. Common files such as AVI, RMVB, MKV, ASF, WMV, MP4, 3GP and FLV are really just container (packaging) formats. H.264, HEVC, VP9 and AV1 are video coding formats, while MP3, AAC and Opus are audio coding formats. For example, if an H.264-encoded video stream and an AAC-encoded audio stream are packaged according to the MP4 container standard, the result is a video file with the .mp4 extension, which is the MP4 video file we commonly see.

The main purpose of audio and video coding is to compress the volume of the raw data. The container format (also known as a multimedia container), such as MP4 or MKV, is used to store and transmit the encoded data and to organize the audio, video and subtitle data according to certain rules. It also carries meta information, such as which coding types the stream contains and the timestamps. The player can use this information to select matching decoders and to synchronize audio and video.

To better understand multimedia packaging formats, let’s review the principles of the video player.

7.1 Principles of a video player

A video player is software that can play video stored as digital signals; the term also refers to electronic devices with video playback capability. Most video players (apart from a few that only handle raw waveform files) carry decoders to restore compressed media files, and they also have a built-in set of algorithms for frequency conversion and buffering. Most video players can also play audio files.

The basic processing flow of video playback generally includes the following stages:

(1) Protocol parsing

The signaling data is stripped from the raw streaming media data so that only the audio and video data is retained. For example, data transmitted over RTMP is output as FLV-format data after protocol parsing.

(2) Demuxing (decapsulation)

The compressed audio data and the compressed video data are separated from each other. Common container formats include MP4, MKV, RMVB, FLV and AVI; their job is to put already compressed video and audio data together. For example, after demuxing FLV data, an H.264-encoded video stream and an AAC-encoded audio stream are output.

(3) Decoding

The compressed video and audio data is restored to uncompressed raw video and audio data. Audio compression standards include AAC, MP3 and AC-3; video compression standards include H.264, MPEG-2 and VC-1. Decoding yields uncompressed video color data such as YUV420P or RGB, and uncompressed audio data such as PCM.

(4) Audio and video synchronization

The decoded video and audio data are synchronized and sent to the system's graphics card and sound card, respectively, for playback.

Now that we understand the principles of a video player, let's introduce multimedia container formats.

7.2 Multimedia Encapsulation Format

For digital media data, a container is something that stores multimedia data together, like a packing case. It can package audio and video data, combining two originally independent kinds of media data, or it can hold only one type of media data.

Multimedia containers are sometimes also called encapsulation formats; they simply provide a “shell” for the encoded multimedia data. In other words, all processed audio, video and subtitles are packed into one file container for presentation to the audience, and this packing process is called encapsulation. Commonly used container formats include MP4, MOV, TS, FLV and MKV. Here we introduce the familiar MP4 container format.

7.2.1 MP4 container format

MPEG-4 Part 14 (MP4) is one of the most commonly used container formats, and its files usually have the .mp4 extension. It is used for Dynamic Adaptive Streaming over HTTP (DASH) and can also be used for Apple's HLS streaming. MP4 is based on the ISO base media file format (MPEG-4 Part 12), which in turn is based on the QuickTime file format. MPEG stands for Moving Picture Experts Group, a joint working group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). MPEG was created to set standards for audio and video compression and transmission.

MP4 supports a variety of codecs. The common video codecs are H.264 and HEVC, and the common audio codec is AAC, the successor to the well-known MP3 audio codec.

An MP4 file is made up of a series of boxes, and the box is its smallest unit. All data in an MP4 file is stored in boxes. Each box has a type and a length and can be thought of as a block of data. A box can contain other boxes; such a box is called a container box.

An MP4 file starts with exactly one ftyp box, which marks the file as MP4 and carries some information about the file. The file also has exactly one moov box (Movie Box). It is a container box that can have multiple child boxes, and the metadata in these child boxes describes the structure of the media data.
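
To make the box layout more concrete, here is a minimal sketch that walks the top-level boxes of an MP4 buffer. It relies on the box header layout of the ISO base media file format: a 4-byte big-endian size followed by a 4-byte type, where a size of 1 means a 64-bit size follows and a size of 0 means the box runs to the end of the file. The function name listTopLevelBoxes is an assumption for this example, and the sketch does not descend into container boxes such as moov.

```javascript
// List the top-level boxes of an ISO BMFF (MP4) buffer.
// Returns an array of { type, offset, size } objects.
function listTopLevelBoxes(buffer) {
  const view = new DataView(buffer);
  const boxes = [];
  let offset = 0;

  while (offset + 8 <= view.byteLength) {
    let size = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      view.getUint8(offset + 4),
      view.getUint8(offset + 5),
      view.getUint8(offset + 6),
      view.getUint8(offset + 7)
    );
    let headerSize = 8;

    if (size === 1) {
      // A 64-bit "largesize" field follows the type field.
      if (offset + 16 > view.byteLength) break;
      size = Number(view.getBigUint64(offset + 8));
      headerSize = 16;
    } else if (size === 0) {
      // The box extends to the end of the file.
      size = view.byteLength - offset;
    }

    if (size < headerSize) break; // malformed box, stop parsing
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}
```

In a browser it could be called as listTopLevelBoxes(await file.arrayBuffer()) for a File picked by the user; for a regular MP4 file one would expect box types such as ftyp, moov and mdat. Reading a whole file into memory is only reasonable for small files.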

Some readers may wonder what the actual structure of an MP4 file looks like. With the online tool provided by mp4box.js, you can easily inspect the internal structure of a local or online MP4 file:

mp4box.js online address: https://gpac.github.io/mp4box.js/test/filereader.html

Because the structure of MP4 files is fairly complex (see the picture below), we will not go further here; interested readers can read related articles.

Next, we'll introduce the Fragmented MP4 container format.

7.2.2 Fragmented MP4 container format

The ISO base media file format that MP4 builds on allows boxes to be organized in a fragmented manner. This means an MP4 file can be organized as a series of short metadata/data box pairs instead of one long metadata/data pair. The structure of a Fragmented MP4 file is shown in the figure below, which contains only two fragments:

(photo – https://alexzambelli.com/blog/2009/02/10/smooth-streaming-architecture/)

Fragmented MP4 contains three crucial boxes: moov, moof and mdat.

  • moov (Movie Metadata Box): stores file-level meta information for the multimedia file.
  • mdat (Media Data Box): the same as the mdat box in an ordinary MP4 file, it stores the media data. The difference is that an ordinary MP4 file typically has only one mdat box, while a Fragmented MP4 file has one mdat box per fragment.
  • moof (Movie Fragment Box): stores fragment-level meta information. This box type does not exist in ordinary MP4 files, while a Fragmented MP4 file has one moof box per fragment.

A fragment in a Fragmented MP4 file consists of a moof box and an mdat box. Each fragment can contain an audio track or a video track, and it carries enough meta information for the fragment to be decoded independently. The fragment structure is shown in the following figure:
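
The moof/mdat pairing described above can be checked with a few lines of code. The following is a simplified sketch that reads only the top-level box types and counts how many moof boxes are directly followed by an mdat box; a result of 0 suggests an ordinary MP4 file. The function names are assumptions for this example, and for brevity the sketch stops at boxes that use the special size values 0 or 1.

```javascript
// Read the 4-character types of all top-level boxes.
// Simplified: only plain 32-bit box sizes are handled.
function topLevelBoxTypes(buffer) {
  const view = new DataView(buffer);
  const types = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset);
    if (size < 8) break; // special size (0 or 1) or malformed box: stop here
    let type = '';
    for (let i = 4; i < 8; i++) {
      type += String.fromCharCode(view.getUint8(offset + i));
    }
    types.push(type);
    offset += size;
  }
  return types;
}

// Count fragments: every "moof" box immediately followed by an "mdat" box.
function countFragments(buffer) {
  const types = topLevelBoxTypes(buffer);
  let fragments = 0;
  for (let i = 0; i + 1 < types.length; i++) {
    if (types[i] === 'moof' && types[i + 1] === 'mdat') fragments++;
  }
  return fragments;
}
```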

(photo – https://alexzambelli.com/blog/2009/02/10/smooth-streaming-architecture/)

With the mp4box.js online tool, we can also clearly view the internal structure of a Fragmented MP4 file:

We have now introduced two container formats, MP4 and Fragmented MP4. The main differences between them are summarized in a picture:

Eight, what Bob has to say

8.1 How to preview videos locally

Local video preview is mainly implemented with the URL.createObjectURL() method. The static URL.createObjectURL() method creates a DOMString containing a URL that represents the object given in the parameter. The lifetime of this URL is tied to the document in the window in which it was created. The new object URL represents the specified File object or Blob object.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Example of video local preview</title>
  </head>
  <body>
    <h3>Bob: Video local preview example</h3>
    <input type="file" accept="video/*" onchange="loadFile(event)" />
    <video
      id="previewContainer"
      controls
      width="480"
      height="270"
      style="display: none;"
    ></video>

    <script>
      const loadFile = function (event) {
        const reader = new FileReader();
        reader.onload = function () {
          const output = document.querySelector("#previewContainer");
          output.style.display = "block";
          // Wrap the file contents in a Blob and point the video at its object URL.
          output.src = URL.createObjectURL(new Blob([reader.result]));
        };
        reader.readAsArrayBuffer(event.target.files[0]);
      };
    </script>
  </body>
</html>
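
Since a File is itself a Blob, the FileReader step is not strictly required: the File object can be passed to URL.createObjectURL() directly. The sketch below shows that variant together with URL.revokeObjectURL(), which releases the URL once the preview is no longer needed. The helper names createPreviewUrl and releasePreviewUrl are assumptions for this example, and a stand-in Blob is used in place of a real video file.

```javascript
// Create a preview URL directly from a File or Blob; no FileReader needed.
function createPreviewUrl(fileOrBlob) {
  return URL.createObjectURL(fileOrBlob);
}

// Release the URL when the preview is no longer needed so the
// underlying data can be freed.
function releasePreviewUrl(url) {
  URL.revokeObjectURL(url);
}

// Usage with a stand-in Blob; in the page above this would be
// event.target.files[0], and the URL would be assigned to output.src.
const sampleBlob = new Blob(['not a real video'], { type: 'video/mp4' });
const previewUrl = createPreviewUrl(sampleBlob);
console.log(previewUrl.startsWith('blob:')); // object URLs use the blob: scheme
releasePreviewUrl(previewUrl);
```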

8.2 How to take player screenshots

The player screenshot function is mainly implemented with the CanvasRenderingContext2D.drawImage() API. The CanvasRenderingContext2D.drawImage() method of the Canvas 2D API provides several ways to draw an image onto the canvas.

The syntax of the drawImage API is as follows:

void ctx.drawImage(image, dx, dy);
void ctx.drawImage(image, dx, dy, dWidth, dHeight);
void ctx.drawImage(image, sx, sy, sWidth, sHeight, dx, dy, dWidth, dHeight);

The image parameter is the element to draw into the context. Any canvas image source (CanvasImageSource) is allowed, for example: CSSImageValue, HTMLImageElement, SVGImageElement, HTMLVideoElement, HTMLCanvasElement, ImageBitmap or OffscreenCanvas.
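
When the five-argument form is used and the canvas does not have the same aspect ratio as the video, the frame is stretched. One common way to avoid this, shown here as a sketch, is to compute a centered destination rectangle that preserves the aspect ratio. The helper name fitRect is an assumption for this example and is not part of the Canvas API.

```javascript
// Compute the destination rectangle (dx, dy, dWidth, dHeight) that scales a
// source of srcWidth x srcHeight to fit inside the canvas, keeping its
// aspect ratio and centering it.
function fitRect(srcWidth, srcHeight, canvasWidth, canvasHeight) {
  const scale = Math.min(canvasWidth / srcWidth, canvasHeight / srcHeight);
  const dWidth = srcWidth * scale;
  const dHeight = srcHeight * scale;
  return {
    dx: (canvasWidth - dWidth) / 2,
    dy: (canvasHeight - dHeight) / 2,
    dWidth,
    dHeight,
  };
}

// Example: a 200x100 frame drawn into a 100x100 canvas.
const rect = fitRect(200, 100, 100, 100);
console.log(rect); // { dx: 0, dy: 25, dWidth: 100, dHeight: 50 }
```

The result would then be passed on as ctx.drawImage(video, rect.dx, rect.dy, rect.dWidth, rect.dHeight), with video.videoWidth and video.videoHeight as the source size.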

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Player Screenshot Example</title>
  </head>
  <body>
    <h3>Bob: Player screenshot example</h3>
    <video id="video" controls="controls" width="460" height="270" crossorigin="anonymous">
      <!-- Please replace with the actual video address -->
      <source src="https://xxx.com/vid_159411468092581" />
    </video>
    <button onclick="captureVideo()">Screenshot</button>
    <script>
      let video = document.querySelector("#video");
      let canvas = document.createElement("canvas");
      let img = document.createElement("img");
      img.crossOrigin = "";
      let ctx = canvas.getContext("2d");

      function captureVideo() {
        // Size the canvas to the video's intrinsic resolution, then copy the current frame.
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
        img.src = canvas.toDataURL();
        document.body.append(img);
      }
    </script>
  </body>
</html>

Now that we know how to capture each frame of the video, we can combine this with the GIF encoding provided by the gif.js library to quickly turn captured video frames into a GIF animation. Bob will not go into further detail here; interested readers can read the article “Using JS to capture video clips directly and generate GIF animations”.

8.3 How to play video on Canvas

To play video on Canvas, we mainly use ctx.drawImage(video, x, y, width, height) to draw the current frame of the video, where the video parameter is the video element in the page. If we repeatedly grab the current picture of the video at a certain frequency and render it to the canvas, we get the effect of playing video with Canvas.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Use Canvas to play video</title>
  </head>
  <body>
    <h3>Bob: Use Canvas to play video</h3>
    <video id="video" controls="controls" style="display: none;">
      <!-- Please replace with the actual video address -->
      <source src="https://xxx.com/vid_159411468092581" />
    </video>
    <canvas
      id="myCanvas"
      width="460"
      height="270"
      style="border: 1px solid blue;"
    ></canvas>
    <div>
      <button id="playBtn">Play</button>
      <button id="pauseBtn">Pause</button>
    </div>
    <script>
      const video = document.querySelector("#video");
      const canvas = document.querySelector("#myCanvas");
      const playBtn = document.querySelector("#playBtn");
      const pauseBtn = document.querySelector("#pauseBtn");
      const context = canvas.getContext("2d");
      let timerId = null;

      // Draw the current video frame, then schedule the next draw.
      function draw() {
        if (video.paused || video.ended) return;
        context.clearRect(0, 0, canvas.width, canvas.height);
        context.drawImage(video, 0, 0, canvas.width, canvas.height);
        timerId = setTimeout(draw, 0);
      }

      playBtn.addEventListener("click", () => {
        if (!video.paused) return;
        video.play();
        draw();
      });

      pauseBtn.addEventListener("click", () => {
        if (video.paused) return;
        video.pause();
        clearTimeout(timerId);
      });
    </script>
  </body>
</html>
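
The example above redraws the canvas with setTimeout(draw, 0). An alternative worth considering is requestAnimationFrame, which lets the browser schedule the redraw together with the display refresh. The following is a sketch of such a loop, not the author's implementation; startRenderLoop is a hypothetical helper name, and the schedule parameter is only there so the loop can be driven by something other than requestAnimationFrame.

```javascript
// Draw the current video frame on every animation frame until the video
// pauses or ends, or until the returned stop function is called.
function startRenderLoop(video, ctx, width, height,
                         schedule = (cb) => requestAnimationFrame(cb)) {
  let stopped = false;

  function frame() {
    if (stopped || video.paused || video.ended) return;
    ctx.drawImage(video, 0, 0, width, height);
    schedule(frame); // ask for the next frame
  }

  frame();
  return function stop() {
    stopped = true;
  };
}
```

With the elements from the page above, the play handler could call video.play().then(() => startRenderLoop(video, context, canvas.width, canvas.height)), and the pause handler only needs to call video.pause(), because the loop checks video.paused on each frame.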

8.4 How to Achieve Chroma Keying (Green Screen Effect)

In the previous example we used Canvas to play video, so some readers may wonder why we should draw video on Canvas at all. Isn't the video tag good enough? The reason is that Canvas provides the getImageData and putImageData methods, which let developers dynamically change what each frame displays. In this way we can manipulate the video data in real time and composite various visual effects into the video frames being displayed.

For example, the MDN tutorial “Manipulating video using canvas” demonstrates how to perform chroma keying (the green screen or blue screen effect) with JavaScript code. Chroma keying, also known as color keying, is a compositing technique. Chroma refers to a solid color, and key refers to keying that color out. The person or object is filmed in front of a green screen, and the green background is then removed and replaced with another background. This technique is widely used in film, television and game production. Color keying is also an important part of virtual studios and visual effects.

Let’s take a look at the key code:

processor.computeFrame = function computeFrame() {
    // Draw the current video frame and read back its pixels (RGBA, 4 bytes per pixel).
    this.ctx1.drawImage(this.video, 0, 0, this.width, this.height);
    let frame = this.ctx1.getImageData(0, 0, this.width, this.height);
    let l = frame.data.length / 4;

    for (let i = 0; i < l; i++) {
      let r = frame.data[i * 4 + 0];
      let g = frame.data[i * 4 + 1];
      let b = frame.data[i * 4 + 2];
      // Make pixels that match the key color fully transparent.
      if (g > 100 && r > 100 && b < 43)
        frame.data[i * 4 + 3] = 0;
    }
    this.ctx2.putImageData(frame, 0, 0);
    return;
}

The computeFrame() method above takes one frame of data and performs chroma keying on it. With chroma keying we can also implement a purely client-side, real-time masked barrage (bullet comments that do not cover people). Bob will not go into detail here; interested readers can read the article by the Chuangyu front-end team, “Bullet comments that don't block people! A purely client-side real-time mask barrage based on chroma keying”.
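
The thresholds in the key code above appear to be tuned to one specific background color. To experiment with other key colors, the per-pixel step can be isolated into a small function that works on any RGBA pixel array, such as the data property of an ImageData object. The following is a sketch under that assumption; the names applyChromaKey and isGreenScreen, and the green thresholds, are illustrative choices rather than values from the MDN tutorial.

```javascript
// Make every pixel that matches the key color fully transparent.
// pixels is an RGBA array (4 bytes per pixel), e.g. ImageData.data.
function applyChromaKey(pixels, isKeyColor) {
  for (let i = 0; i < pixels.length; i += 4) {
    if (isKeyColor(pixels[i], pixels[i + 1], pixels[i + 2])) {
      pixels[i + 3] = 0; // alpha channel
    }
  }
  return pixels;
}

// Illustrative predicate for a green screen: green is bright and clearly
// dominates red and blue. The thresholds are assumptions to be tuned.
function isGreenScreen(r, g, b) {
  return g > 100 && g > r * 1.4 && g > b * 1.4;
}

// Example: one pure green pixel and one reddish pixel.
const samplePixels = new Uint8ClampedArray([0, 255, 0, 255, 200, 50, 50, 255]);
applyChromaKey(samplePixels, isGreenScreen);
console.log(samplePixels[3], samplePixels[7]); // 0 255
```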

Nine, reference resources

  • Baike – Streaming
  • MDN – Video_and_audio_content
  • MDN – Range_requests
  • MDN – Media_Source_Extensions_API
  • Wiki – MPEG-DASH
  • w3.org – Media Source Extensions
