This is the 31st day of my participation in the August More Text Challenge.

What is a video

Video is actually a series of pictures played in quick succession. If 24 pictures are played in one second, the human eye no longer sees independent pictures but a moving image. One picture is called a frame, and the number of pictures played per second is called the frame rate. Common frame rates are 24, 30 and 60 frames per second.

Video is made up of pictures, and pictures are made up of pixels, say 1024×768 of them. Each pixel has R, G and B components of 8 bits each, 24 bits in total.

A quick note on binary:

One decimal digit can represent 0-9, 10 different values; one binary digit can represent 0 or 1, 2 different values.
A binary digit is called a bit. The byte is the basic unit of computer storage: 1 byte = 8 bits.
One hexadecimal digit can represent 0-15, 16 different values, so 1 byte = 8 bits = 2 hexadecimal digits.
A pixel written in hexadecimal as #FFFFFF takes 6 hexadecimal digits, i.e. 3 bytes.
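A quick JavaScript sanity check of the pixel arithmetic (just an illustration):

// A white pixel (#FFFFFF) is 24 bits, i.e. 3 bytes or 6 hex digits
const pixel = 0xFFFFFF;
console.log(pixel.toString(2).length); // 24 -> 24 bits
console.log(pixel.toString(16));       // "ffffff" -> 6 hex digits = 3 bytes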

Assuming a frame rate of 30, the amount of data per second of video is:

30 × 1024 × 768 × 24 = 566,231,040 bits = 70,778,880 bytes

One minute of video is 4,246,732,800 bytes, roughly 4 GB.

1 Byte = 8 bit
1 KB = 1024 Byte (2^10 Byte)
1 MB = 1024 KB (2^20 Byte)
1 GB = 1024 MB (2^30 Byte)

Clearly this amount of data is far too large for network transmission and storage, so the video needs to be compressed, that is, encoded.
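The same arithmetic as a small JavaScript sketch, using the example values above:

// Raw (uncompressed) video size for 1024x768, 24 bits per pixel, 30 fps
const width = 1024, height = 768;
const bitsPerPixel = 24;   // 8 bits each for R, G and B
const fps = 30;

const bitsPerSecond = fps * width * height * bitsPerPixel;  // 566,231,040 bits
const bytesPerMinute = (bitsPerSecond / 8) * 60;            // 4,246,732,800 bytes
console.log((bytesPerMinute / 1024 ** 3).toFixed(2));       // ≈ 3.96 GB per minute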

The encoding process

Video can be compressed because video and images have the following kinds of redundancy (a toy illustration follows the list):

  • Spatial redundancy: adjacent pixels of an image are strongly correlated; neighboring pixels usually change gradually rather than abruptly. There is no need to store every pixel in full: store one every few pixels and compute the ones in between with an algorithm.
  • Temporal redundancy: adjacent images in a video sequence have similar content. Consecutive frames do not change abruptly, so a frame can be predicted and inferred from frames already seen.
  • Visual redundancy: the human visual system is insensitive to certain details, so some data can be discarded without the viewer noticing.
  • Coding redundancy: different pixel values occur with different probabilities. Frequent values get shorter codes and rare values longer ones, as in Huffman coding.
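As a toy illustration of temporal redundancy (not how a real codec works): instead of storing every frame in full, store the first frame plus the per-pixel difference to the next frame. For slowly changing video most differences are zero, which compresses very well.

const frame1 = [10, 10, 11, 12];  // hypothetical pixel values of frame 1
const frame2 = [10, 10, 11, 13];  // the next frame barely changes
const delta = frame2.map((v, i) => v - frame1[i]); // [0, 0, 0, 1]
// frame2 can be reconstructed as frame1 + delta, and delta is mostly zeros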

After encoding, those many frames of pictures become an unintelligible string of binary data. Different encoding methods (algorithms) lead to different encoding formats; common ones are H.264, MPEG-4 and VP8.

One thing to note here is that encoding formats are patent-encumbered, so different browsers support different encoding formats. As a result, some encodings may not play in some browsers, or may play sound without picture.

The one thing we front-end developers need to keep in mind is that the video encoding format supported by all major browsers is H.264.

A summary, plus a few more concepts

Resolution

The screen is made up of pixels. The common "1080p" means the screen has 1080 pixels vertically and 1920 columns horizontally, about 2.07 million pixels in total. 2K is 2560×1440, about 3.69 million pixels.

Here is Baidu Encyclopedia's explanation:

Display resolution (screen resolution) is the precision of the on-screen image: how many pixels the display can show. Since the dots, lines and planes on a screen are all made of pixels, the more pixels a monitor can display, the finer the picture and the more information fits in the same screen area, which makes resolution a very important performance indicator. Think of the whole image as a large checkerboard; resolution is the number of points where the grid lines intersect. For a given display resolution, the smaller the screen, the sharper the image; conversely, for a fixed screen size, the higher the resolution, the sharper the image.

Resolution has some effect on video size, and a higher resolution generally means a clearer picture, but clarity also depends on the bit rate.

Frame rate (FPS)

FPS stands for Frames Per Second: the number of frames played per second.

Baidu Encyclopedia explanation:

Because of the physiology of the human eye, images played at more than 24 frames per second are perceived as continuous; this phenomenon is called persistence of vision. That is why film is shot frame by frame and then played back quickly. For games, first-person shooters in particular care about FPS: below 30 FPS the game feels choppy, hence the quip that "frame rate matters most in an FPS". Frames per second (FPS), or frame rate, is the number of frames a graphics processor can render each second. Higher frame rates give smoother, more realistic animation. 30 fps is generally acceptable, and raising it to 60 fps noticeably improves interactivity and realism, but beyond about 75 fps the gain in smoothness is hard to perceive. If the frame rate exceeds the screen's refresh rate, the extra frames are simply wasted, because the monitor cannot update that fast.

Bit rate (bitrate)

Bit rate is also called bitrate. Frame rate is how many frames are played in 1 s; by analogy, bit rate is how many bits 1 s of video contains.

This parameter determines whether the video is clear or not.

Two 1080p videos can be 1 GB or 4 GB in size. The larger one stores more data per second of video: the bit rate is higher, the compression ratio is lower, and the picture is clearer.

What is the bit rate of a 1080p, 100-minute, 1 GB video?

Total time: 100 minutes = 100 × 60 s = 6000 s
Total data: 1 GB = 1024 MB = 1024 × 1024 KB = 1024 × 1024 × 1024 Byte = 8,589,934,592 bit
Bit rate (data / time): 8,589,934,592 / 6000 ≈ 1.4 Mbit/s
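The same calculation as a small JavaScript helper (size in GB, duration in minutes):

function bitRate(sizeGB, minutes) {
  const bits = sizeGB * 1024 ** 3 * 8; // GB -> bytes -> bits
  return bits / (minutes * 60);        // bits per second
}
console.log(bitRate(1, 100)); // 1431655.77... ≈ 1.4 Mbit/s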

Both frame rate and resolution affect video size, but bit rate is the main factor. If a very short video you come across at work is surprisingly large, its bit rate is most likely very high, and to ease network transmission the bit rate needs to be lowered. Generally speaking, mainstream video platforms use bit rates around 1 Mbit/s.

What is audio

The audio part involves a few concepts I am less sure about; if anything is unclear or wrong, please point it out in the comments and I will fix it.

Sound is produced by vibrating objects. The speed of vibration (frequency) determines the pitch of the sound (high or low), the amplitude of vibration determines the volume (loudness), and the vibration characteristics of the object determine the timbre. Pitch, volume and timbre are called the three elements of sound.

Sound can be represented by analog signals. The concept of analog signals is as follows:

An analog signal is information represented by a continuously varying physical quantity. The signal's amplitude, frequency or phase changes continuously over time, and within a continuous time interval the quantity carrying the information can take any value at any moment.

(figure: an analog waveform; the horizontal axis is time)

An analog signal is continuous and is easily disturbed during transmission, so it is usually converted into a discrete digital signal before being transmitted. The conversion mainly involves sampling, quantization and encoding.

You can think of sampling as picking points on the curve: take 48,000 points between 0 and 1 s and you get 48,000 samples, so the sampling rate is 48,000 Hz.

Quantization digitizes the sound along the vertical (Y) axis and determines the range of values a sample can take. For example, if a sample is represented with 16 bits of data, its range is [-32768, 32767], i.e. 65,536 possible values.
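A toy JavaScript sketch of sampling and quantization, using the 48 kHz / 16-bit figures above and a 440 Hz sine wave standing in for the analog signal:

const sampleRate = 48000;
const samples = new Int16Array(sampleRate); // 1 second of audio
for (let n = 0; n < sampleRate; n++) {
  const t = n / sampleRate;                  // sampling: one point every 1/48000 s
  const v = Math.sin(2 * Math.PI * 440 * t); // "analog" value in [-1, 1]
  samples[n] = Math.round(v * 32767);        // quantization: map into [-32768, 32767]
}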

Encoding records the sampled and quantized data in a certain format. The raw audio data is PCM (Pulse Code Modulation) data, which is described by the following parameters:

  • Sample size
    • How many bits store one sample; 16 bits is common. The larger the sample size, the more precisely each sample is represented.
  • Sampling rate
    • How many samples are taken per second.
  • Number of channels
    • Left channel, right channel, stereo (two channels).

The audio bit rate (the amount of data in 1 s of audio) is calculated as:

Audio bit rate = sample size × sampling rate × number of channels

Take CD sound quality as an example: the quantization format is 16 bits (2 bytes), the sampling rate is 44,100 Hz, and there are 2 channels.

44100 × 16 × 2 = 1,411,200 bit/s ≈ 1378.125 kbit/s

If it is 1 minute, the required size is

1378.125 × 60 / 8 / 1024 ≈ 10.09 MB
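The same arithmetic in JavaScript:

const sampleRate = 44100, sampleSize = 16, channels = 2;
const audioBitRate = sampleRate * sampleSize * channels; // 1,411,200 bit/s
console.log(audioBitRate / 1024);                        // 1378.125 kbit/s
console.log(audioBitRate * 60 / 8 / 1024 / 1024);        // ≈ 10.09 MB per minute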

Reference content of this section:

  • Baidu Encyclopedia – Analog signal and ANALOG-to-digital conversion

  • Baidu Encyclopedia – Digital signal

  • Audio and video learning – Master the basics of audio

Audio coding

CD-quality audio, at about 10 MB per minute, is too large and also needs to be compressed (encoded).


Common audio encoding formats are WAV, MP3 and AAC.

Audio is not encoded in as many ways as video, and it can be played in almost any browser.

The exact internal structure of each encoding format is not covered here.

Encapsulation format

We package the video data and audio data together and add some basic information such as resolution, duration and title to form a single file; the file's layout is called the encapsulation (container) format. Common container formats are MP4, MOV, MPEG, WebM and so on.

The main container formats are as follows:

| Name | Developed by | Streaming support | Supported video encodings | Supported audio encodings | Typical use |
| ---- | ------------ | ----------------- | ------------------------- | ------------------------- | ----------- |
| MP4 | MPEG | Yes | MPEG-2, MPEG-4, H.264, etc. | AAC, MPEG-1, etc. | Internet video sites |
| TS | MPEG | Yes | MPEG-4, H.264, etc. | MPEG-1 Layer I/II/III, AAC | IPTV, digital TV |
| FLV | Adobe Inc. | Yes | Sorenson, VP6, H.264 | MP3, ADPCM, Linear PCM, AAC, etc. | Internet video sites |
| MKV | CoreCodec Inc. | Yes | Almost all formats | Almost all formats | Internet video sites |
| AVI | Microsoft Inc. | No | Almost all formats | Almost all formats | BitTorrent movies |

As the table shows, the container format is largely independent of the video encoding: the video inside an MP4 file may be encoded as H.264 or as MPEG-4. This is why the same MP4 file plays in some browsers but not in others; whether it plays is determined by the encoding format of the video stream.

The above table refers to:

  • Video audio codec technology zero-based learning method – Lei Xiaohua

That covers the theory; now let's get practical.

FFmpeg, the essential tool for working with audio and video

A complete, cross-platform solution to record, convert and stream audio and video.

FFmpeg is the most commonly used open-source software for audio and video processing. This article mainly introduces its command-line tools.

First, install FFmpeg:

  • macOS: install FFmpeg
  • Windows: install FFmpeg
  • Linux: install FFmpeg

If these don't cover your platform, please search for instructions yourself.

Some concepts

Supported encapsulation formats

As a refresher: a common video file is really a container holding video data, audio data, optional subtitles, and some metadata (title, duration, resolution, and so on). Common container formats, usually reflected in the file suffix, include:

MP4
MKV
WebM
AVI

To view the container formats FFmpeg supports:

ffmpeg -formats

For example, if you want to check whether FLV format is supported, you can run the following command:

ffmpeg -formats | grep -i flv

Supported encoding formats

As mentioned above, audio and video files need to be encoded and compressed before they can be saved as files. Different encoding formats have different compression rates, resulting in different sharpness and file sizes.

Common video encoding formats are as follows:

H.262
H.264
H.265
MPEG-4

Common audio encoding formats are as follows:

MP3
AAC

To list the encoding formats FFmpeg supports, both video and audio:

ffmpeg -codecs

Encoders

An encoder is a library that implements a particular encoding format; only with an encoder for a given format can audio/video be encoded into, or decoded from, that format.

FFmpeg's built-in video encoders:

libx264: the most popular open-source H.264 encoder
NVENC: an H.264 encoder that uses NVIDIA GPUs

Audio encoders:

libfdk_aac
aac

The following command displays the installed encoders

ffmpeg -encoders

ffprobe: viewing media information

ffprobe is a command-line tool that ships with FFmpeg. It shows a file's metadata and the encoding information of its audio and video streams, and it is very simple to use:

ffprobe 1.mp4

The output lists the container format, the duration and overall bit rate, and the codec parameters of each stream.

The ffmpeg command-line format

ffmpeg {1} {2} -i {3} {4} {5}

The five parts are as follows:

    1. Global parameters
    2. Input file parameters
    3. Input file (required)
    4. Output file parameters
    5. Output file (required)

Here's a simple example:

ffmpeg -i 1.mp4 output.webm

The command above converts an MP4 container into a WebM container; only the input file and the output file are specified.

Since no encoding format is specified for the video or audio stream, ffmpeg chooses the encodings itself. After the conversion you can check them with ffprobe: ffprobe output.webm

If you only want to change the container and keep the encoding unchanged, add -c copy to the output file parameters:

ffmpeg -i 1.mp4 -c copy output.avi

Common parameters

  • -c: Specify encoder
  • -c copy: Copy directly, without recoding (this is faster)
  • -c:v: Specifies the video encoder
  • -c:a: Specifies the audio encoder
  • -i: Specifies the input file
  • -an: Removes the audio stream
  • -vn: Remove the video stream
  • -preset: Trades encoding speed against compression (and thus output quality at a given size). Available values: ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow.
  • -y: Overwrites files with the same name without confirmation.
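To tie a few of these flags together, here is a small Node.js sketch that invokes ffmpeg with the parameters above (the file names are placeholders):

const { execFile } = require('child_process');

execFile('ffmpeg', [
  '-y',              // overwrite the output file without asking
  '-i', 'input.mp4', // input file
  '-c:v', 'libx264', // video encoder
  '-c:a', 'aac',     // audio encoder
  '-preset', 'fast', // speed/quality trade-off
  'output.mp4',
], (error, stdout, stderr) => {
  if (error) throw error;
  console.log(stderr); // ffmpeg writes its progress log to stderr
});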

Common usage

I suggest you create a new folder and try out all the commands

Viewing Video Information

Use FFProbe to view video information

Parameter format: ffprobe [options] [input_file]

# Simplest use
ffprobe 1.mp4
# Hide the banner; show only stream-related information
ffprobe -hide_banner 1.mp4
# Print each stream's information as JSON
ffprobe -print_format json -show_streams 1.mp4
# Show container format information
ffprobe -show_format 1.mp4

Parameter Description:

  • -hide_banner: omit the welcome/configuration banner and show only the media information
  • -show_format: show information about the container format

Converting the encoding format

Convert container format and encoding format.

If you do not specify encoding formats for the video and audio streams, ffmpeg decides which encodings to use based on the output container format.

ffmpeg -i input.mp4 output.mpeg

To specify the video stream's encoding format:

Convert to H.264 encoding, usually using the encoder libx264

# Convert to H.264
ffmpeg -i 1.mp4 -c:v libx264 output.mp4
# Convert to H.265
ffmpeg -i 1.mp4 -c:v libx265 output-265.mp4

Parameter Description:

  • -c:v Specifies the video encoder
  • -c:a Specifies the audio encoder

Convert container format

MP4 to AVI:

ffmpeg -i input.mp4 -c copy output.avi

In the example above only the container changes; the encoding inside stays the same, so -c copy copies the streams directly without re-encoding, which is much faster.

# ffmpeg detects the input encoding and picks output encodings automatically
ffmpeg -i input.mp4 output.webm

The encoding format can be specified manually during conversion

# The encoding formats can also be specified manually
ffmpeg \
  -y \
  -c:a libfdk_aac -c:v libx264 \
  -i input.mp4 \
  -c:v libvpx-vp9 -c:a libvorbis \
  output.webm
# -y: global parameter; the -c:a/-c:v before -i: input file parameters;
# -i input.mp4: input file; the second -c:v/-c:a: output file parameters; output.webm: output file

Adjust the bit rate

Adjusting bitrate refers to changing the bit rate of the encoding, usually used to reduce the size of a video file.

Bit rate = number of bits in the video (bit) / video duration (s)

For example, take a 2.6 MB video and convert its size to bits: 2.6 × 1024 × 1024 × 8 = 21,810,380.8 bits.

The video lasts 22 s.

Bit rate: 21,810,380.8 / 22 ≈ 991 kbit/s

Set bit rate:

# Set the output bit rate to 1.5 Mbit/s (the actual output bit rate will deviate slightly)
ffmpeg -i input.mp4 -b 1.5M output.mp4
# By default ffmpeg uses variable bit rate (VBR): still scenes get fewer bits, busy scenes get more

Parameter description:

  • -b Specifies the bit rate of the video stream and audio stream
  • You can use -b:v -b:a to specify the bitrates for video and audio streams, respectively

You can also specify the minimum bit rate, maximum bit rate, and buffer size manually:

ffmpeg -i input.mp4 -minrate 964k -maxrate 3856k -bufsize 2000k output.mp4

For live streams such as video conferencing, a constant bit rate (CBR) can be used:

Fixed bit rate means that all frames use the same bit rate

# Set -b, -minrate and -maxrate to the same value, and give -bufsize a buffer size
ffmpeg -i input.mp4 -b 0.5M -minrate 0.5M -maxrate 0.5M -bufsize 1M output.mp4

Change resolution

# Change the resolution to 640x480
ffmpeg -i input.mp4 -vf scale=640:480 output.mp4

# Scale to width 480 and keep the original aspect ratio
ffmpeg -i input.mp4 -vf scale=480:-1 output.mp4

Parameter description:

  • -vf: apply a video filter
  • -vf scale=...: adjust the resolution

Separate the video (remove the audio from a video)

# Keep the video stream and drop the audio stream
ffmpeg -i input.mp4 -c:v copy -an output.mp4

Parameter description:

  • -an: removes the audio stream

Extract the audio

ffmpeg -i input.mp4 -vn -c:a copy output.aac

Parameter description:

  • -vn: removes the video stream
  • -c:a copy: keeps the audio encoding unchanged

Add the audio track

Add external audio to the video, such as background music or voice-over

# If the video already has an audio track, the new audio will not be added; remove the original audio first
# There are two inputs; ffmpeg merges them into a single output file
ffmpeg -i input.aac -i input.mp4 output.mp4

Screenshots

Take a series of screenshots covering 1 second of video, starting at a given time:

# Starting at 00:00:05, screenshot 1 second of video
ffmpeg -y -i input.mp4 -ss 00:00:05 -t 00:00:01 output_%03d.jpg

If you just want to capture one frame

# Capture a single frame at 00:00:10
ffmpeg -ss 00:00:10 -i input.mp4 -vframes 1 -q:v 2 output.jpg

Parameter description:

  • -vframes 1: capture only one frame
  • -q:v 2: output image quality, typically in the range 1-5 (lower means higher quality)

Split one MP4 file into multiple smaller MP4 files

You can specify a start time and duration, as well as an end time

ffmpeg -ss <start> -i <input> -t <duration> -c copy <output>
ffmpeg -ss <start> -i <input> -to <end> -c copy <output>

Example:

# Cut the segment from 0 s to 5 s
ffmpeg -ss 00:00:00 -i 1.mp4 -c copy -t 5 aqgy-1.mp4
# Example: split a 1-minute-30-second video into 4 segments
ffmpeg -ss 00:00:00 -i 1.mp4 -c copy -t 00:00:22 aqgy-1.mp4
ffmpeg -ss 00:00:22 -i 1.mp4 -c copy -t 00:00:22 aqgy-2.mp4
ffmpeg -ss 00:00:44 -i 1.mp4 -c copy -t 00:00:22 aqgy-3.mp4
ffmpeg -ss 00:01:06 -i 1.mp4 -c copy -t 00:00:22 aqgy-4.mp4

Principle of video player

Let’s talk about how the player plays the video, either locally or in the browser.

The general process is as follows:

Protocol resolution parses streaming-protocol data into container data in a standard format. Streaming protocols include RTMP, HLS and HTTP. Take the HTTP video link http://111.229.14.189/file/1.mp4 as an example: after the protocol is resolved, we get the file 1.mp4 in its container format.

Demuxing (unpacking) separates the audio stream and the video stream inside the 1.mp4 file. After demuxing we get compressed video data in H.264 format and compressed audio data in AAC format.

Audio and video decoding converts the compressed data back into raw data: image frames (YUV data) for video and PCM samples for audio.

Audio-video synchronization aligns the decoded video and audio using the parameter information obtained during demuxing, then sends them to the system's graphics card and sound card for playback.

When we open http://111.229.14.189/file/1.mp4 in a browser, the process is roughly the one above; a conceptual sketch follows.
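Here is the pipeline as conceptual JavaScript pseudocode (every function name is hypothetical, chosen only to mirror the four stages):

async function playUrl(url) {
  const container = await resolveProtocol(url);          // HTTP/RTMP/HLS -> e.g. an MP4 file
  const { videoStream, audioStream } = demux(container); // -> H.264 data + AAC data
  const frames = decodeVideo(videoStream);               // -> YUV image frames
  const samples = decodeAudio(audioStream);              // -> PCM samples
  synchronize(frames, samples); // A/V sync, then hand off to the graphics card and sound card
}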

Audio and video knowledge in the browser

Why some videos won't play

We just walked through how a browser plays a video. In the decoding stage, the browser supports only a limited set of decoders; when it cannot decode a given video stream, the process is interrupted and the video cannot play.

Front-end developers should remember: MP4 files with H.264 video and AAC audio play in all major browsers, because although the H.264 encoding format is patented, it can be used free of charge for this kind of playback.

The audio and video tags

<video controls poster="1.jpg" src="1.mp4" loop muted></video>
<audio controls src="1.mp3"></audio>

src specifies the resource address; poster specifies a cover image for the video; controls tells the browser to display its playback UI (each browser styles it differently).

Commonly used attributes

Here are the general attributes for video and audio

Common events

Common events for video and audio

Commonly used methods

  • play(): starts playback
  • pause(): pauses playback

With the above properties, events, and methods, we can do a lot of things, such as customizing the player, using the player to preview videos locally, and so on.

The basic idea of implementing a video player in the browser

The overall idea is as follows (a sketch follows the list):

  1. Listen for the loadedmetadata event to get the video's duration (duration) and real dimensions (videoWidth, videoHeight)
  2. When play/pause is clicked, call the video element's play/pause methods
  3. While playing, listen for timeupdate, read the current playing time (currentTime) and compute the progress bar's position
  4. When the progress bar is dragged, set the current playing time: the reverse of step 3
  5. Show a loading indicator while the video first loads: set a loading flag to true during the component's first render, and clear it once the video metadata has loaded
  6. Show loading while buffering: listen for the waiting event to turn loading on, and for the canplay event to turn it off
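A minimal sketch of these six steps (the element ids and the setLoading helper are assumptions for illustration):

const video = document.querySelector('video');
const progress = document.querySelector('#progress'); // e.g. an <input type="range">
const toggle = document.querySelector('#toggle');

function setLoading(on) { /* show or hide a spinner in the UI */ }

setLoading(true); // step 5: loading is on during the first render

video.addEventListener('loadedmetadata', () => { // step 1
  console.log(video.duration, video.videoWidth, video.videoHeight);
  setLoading(false);
});

toggle.addEventListener('click', () => { // step 2
  video.paused ? video.play() : video.pause();
});

video.addEventListener('timeupdate', () => { // step 3
  progress.value = (video.currentTime / video.duration) * 100;
});

progress.addEventListener('input', () => { // step 4: dragging the bar seeks the video
  video.currentTime = (progress.value / 100) * video.duration;
});

video.addEventListener('waiting', () => setLoading(true)); // step 6: buffering
video.addEventListener('canplay', () => setLoading(false));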

Please refer to my article for more details

Some recommended learning resources in the audio and video field

For audio and video development in China, one name you cannot get around is Lei Xiaohua. He has passed away, but the articles he left behind live on.

Start from "Video and audio codec technology: zero-based learning methods" and follow whatever links interest you, working outward from that starting point.

If you need WebRTC live streaming, see the column on building a live audio and video system from scratch. (It is hosted on my own server with low bandwidth, so it will lag when many people visit; access with care.)

If you are a complete beginner, chapters 5, 7, 8, 9 and 10 of Li Chao's course "Audio and video introduction for beginners" cover the necessary theoretical knowledge.

If you are interested in FFmpeg, check out this English tutorial with about 68,000 GitHub stars: github.com/leandromore…


After writing through the weekend nights I still feel unsatisfied. On the one hand, a lot of content is left unwritten: streaming protocols, analysis of common audio/video bitstreams, WebRTC on the front end, processing video with Canvas, and Canvas video effects. On the other hand, I feel uneasy about what I did write, worried it is not written well, since I still know too little.

I will keep improving this article; please leave corrections and suggestions in the comments section.