From White Wolf Stack: check out the original.

When it comes to audio and video, everyone has watched movies (video) and listened to music (audio), and at the very least knows that an MP4 is a video file and an MP3 is an audio file.

So what are the properties of an audio or video file? Taking video as an example, the ffmpeg -i command can be used to view information about a media file.

```
» ffmpeg -i r1ori.mp4
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with Apple LLVM version 10.0.0 (clang-1000.10.44.4)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gpl --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-chromaprint --enable-frei0r --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libgme --enable-libgsm --enable-libmodplug --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-librsvg --enable-librtmp --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtesseract --enable-libtwolame --enable-libvidstab --enable-libwavpack --enable-libwebp --enable-libzmq --enable-opencl --enable-openssl --enable-videotoolbox --enable-libopenjpeg --disable-decoder=jpeg2000 --extra-cflags=-I/usr/local/Cellar/openjpeg/2.3.0/include/openjpeg-2.3 --enable-nonfree
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 544x960, 1732 kb/s, 29.83 fps, 29.83 tbr, 11456 tbn, tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
```

In addition to the video's meta information, the output includes the configuration FFmpeg was compiled with. You can hide this banner with the -hide_banner parameter. The full command is as follows:

```
» ffmpeg -i r1ori.mp4 -hide_banner
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 544x960, 1732 kb/s, 29.83 fps, 29.83 tbr, 11456 tbn, tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
At least one output file must be specified
```

Let’s focus on a few lines of this output:

  1. Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4': Input #0 is the first file we passed in with ffmpeg -i. The subscript starts at 0, which means we can input multiple files; in fact, FFmpeg also supports multiple output files.
  2. Metadata is the video's meta information.
  3. The Duration line gives the playing time of the video (58.53 seconds), the start time (0), and the bitrate of the entire file (1870 kbit/s).
  4. Stream #0:0(und): Video: h264: this line says the first stream of the file is the video stream. Its encoding format is H264 (tagged avc1 in the container), each frame is stored as yuv420p data, the resolution is 544x960, the bitrate of the video stream is 1732 kbit/s, and the frame rate is 29.83 frames per second.
  5. Stream #0:1(und): Audio: aac: this line says the second stream of the file is the audio stream, encoded as AAC (tagged mp4a), with an LC profile, a sampling rate of 44.1 kHz, a stereo channel layout, and a bitrate of 129 kbit/s.
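
If you want the same properties in a more machine-readable form, ffprobe (shipped alongside ffmpeg) can print them; a minimal sketch, assuming the same r1ori.mp4 file as above:

```
# Show container-level info and per-stream details, suppressing the banner and log noise
ffprobe -v error -show_format -show_streams r1ori.mp4
```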

We’re starting to see some unfamiliar terms, so let’s go through them one by one.

Containers

In a video file like the one above, the different data streams (video stream, audio stream, and sometimes a subtitle stream, etc.) are encapsulated together in a file called a container. Familiar formats such as MP4, AVI, and RMVB are all multimedia container formats; in general, the suffix of a multimedia file is its container format.

We can think of a container as something like a bottle or a jar.
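
Because a container merely wraps the streams, changing containers does not require touching the encoded data. A minimal sketch, with hypothetical input.mp4/output.mkv file names:

```
# Copy all streams as-is (-c copy) into a different container; nothing is decoded or re-encoded
ffmpeg -i input.mp4 -c copy output.mkv
```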

Encoding and decoding (codec)

Encoding: recording and storing video and audio according to a certain format or specification, called a codec. Encoding can be thought of as processing the contents of the container.

Common video encoding formats are H264, H265, etc.; common audio encoding formats are MP3, AAC, etc.

Decoding: turning compressed video and audio data back into uncompressed raw video and audio data. For example, if we want to process a piece of audio (say, add an echo to it), we need to decode the audio file first and re-encode it afterwards.
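
As a sketch of what "decoding to raw data" means in practice, the commands below (with hypothetical file names) decode a video stream to raw YUV frames and an audio stream to raw PCM samples:

```
# Decode the video stream to raw yuv420p frames (no container, no compression)
ffmpeg -i input.mp4 -f rawvideo -pix_fmt yuv420p output.yuv

# Decode the audio stream to raw PCM: signed 16-bit little-endian, 44.1 kHz, stereo
ffmpeg -i input.mp4 -f s16le -ar 44100 -ac 2 output.pcm
```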

Software decoding ("soft decoding"): using software so that the CPU performs the work of decoding the video file.

Hardware decoding ("hard decoding"): to reduce the load on the CPU, the GPU takes over part of the video processing that the CPU would otherwise do.

Software decoding has to process a large amount of video data, so it is CPU-hungry; a single FFmpeg command can max out the CPU.

By comparison, hardware decoding is very efficient, but its drawbacks are also obvious: it cannot match software decoding when it comes to subtitles, picture quality, and other processing. If I remember correctly, Qiniu Cloud (a fairly professional audio and video platform) does not currently support hardware decoding.
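
For reference, FFmpeg can be asked to use a hardware path explicitly. A hedged sketch, assuming a macOS build with VideoToolbox enabled (as in the banner above) and hypothetical file names; the available -hwaccel methods differ per platform and build:

```
# Decode via the VideoToolbox hardware decoder and encode with the hardware H264 encoder
ffmpeg -hwaccel videotoolbox -i input.mp4 -c:v h264_videotoolbox -b:v 2M output.mp4
```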

FFmpeg is the most common open-source software-decoding library; it performs software decoding with codecs for formats such as H264, H265, and MPEG-4.

In today’s audio and video field, FFmpeg supports almost every audio and video codec; it is very powerful.

Transcoding: converting a video from one format to another, for example converting an FLV file to an MP4 file.

```
ffmpeg -i input.flv output.mp4
```
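
When no codecs are specified, ffmpeg picks defaults for the output container. A sketch of the same conversion with the codecs spelled out explicitly (libx264 for video, the built-in aac encoder for audio):

```
# Transcode video to H264 and audio to AAC explicitly instead of relying on defaults
ffmpeg -i input.flv -c:v libx264 -c:a aac output.mp4
```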

Bit rate

Bit rate, also called code rate, indicates the number of bits the encoder outputs per second. The unit is kbps, where b stands for bit, the basic unit of computer data size.

For example, with the same compression algorithm (we’ll cover several different ones later), the higher the bit rate, the higher the quality of the video.

For a compressed file, following the above understanding, a rough calculation is: bit rate = file size / duration.

For example, r1ori.mp4 is 13.7 MB in size and 59 seconds long, so its bit rate is approximately (13.7 × 1024 × 8) / 59 ≈ 1900 kbit/s.

Conversion: 1 MB = 8 Mbit = 1024 KB = 8192 Kbit

Because other factors (container overhead, metadata, and so on) also contribute to file size, this calculation only yields an approximation of the bit rate.
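
To read the exact values the container reports rather than estimating, ffprobe can print size, duration, and bit rate directly; a minimal sketch against the same file:

```
# Print duration (seconds), size (bytes) and overall bit_rate (bits per second)
ffprobe -v error -show_entries format=duration,size,bit_rate -of default=noprint_wrappers=1 r1ori.mp4
```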

Fixed and variable bit rates

In the early years, audio was encoded at a Constant Bitrate (CBR), meaning the encoder outputs a fixed bit rate; Variable Bitrate (VBR) came later. With a fixed bit rate it is difficult to balance "calm" pictures against "intense" ones. A variable bit rate, by contrast, gives finer control over the encoder: it spends more bits where there is more detail or the picture is intense, and fewer bits on relatively calm pictures. For a given output quality, VBR therefore has the advantage, and it is what we usually choose for storage.
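
A sketch of how the two modes are typically requested with the libx264 encoder (hypothetical file names; -crf is libx264's quality-targeted VBR mode, and true CBR is approximated by pinning minrate/maxrate to the target):

```
# Roughly constant bitrate: target 1 Mbit/s and clamp rate control to it
ffmpeg -i input.mp4 -c:v libx264 -b:v 1M -minrate 1M -maxrate 1M -bufsize 2M output_cbr.mp4

# Variable bitrate: ask for a constant quality level (CRF) and let the bitrate float
ffmpeg -i input.mp4 -c:v libx264 -crf 23 output_vbr.mp4
```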

Frame and frame rate

A frame is an image.

Frame rate, measured in frames per second (FPS), is the number of frames output per second; you can also think of it as the number of times the picture is refreshed per second.

Anyone who plays games has felt this keenly: when the game stutters, the picture jumps between frames and is anything but smooth.

Frame rate affects the smoothness of the picture: the higher the frame rate, the smoother the picture.

Due to persistence of vision (when an object moves quickly, the human eye retains its image for about 1/24 of a second after it disappears), film and video generally use a minimum frame rate of 24, that is, each frame is displayed for 1/24 ≈ 0.042 seconds.
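
A sketch of changing a video's frame rate with FFmpeg's fps filter (hypothetical file names):

```
# Resample the video to 24 frames per second
ffmpeg -i input.mp4 -vf fps=24 output.mp4
```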

Resolution

Resolution should already be familiar; video sites commonly offer Blu-ray 1080P, ultra HD 720P, and HD 540P.

Resolution can be understood as the size of the video picture, namely its width and height; 720P means the picture is 720 pixels high.

Having understood bit rate and frame rate, we can see that "higher resolution means clearer video" is not absolute; what matters more is balancing bit rate, frame rate, and resolution against each other.

In general, we prefer videos that are both small (easy to store) and high-resolution (pleasant to watch).
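
A sketch of changing resolution with the scale filter (hypothetical file names; -2 keeps the aspect ratio while rounding the width to an even number, which H264 requires):

```
# Downscale to 720 pixels high, deriving the width from the aspect ratio
ffmpeg -i input.mp4 -vf scale=-2:720 output_720p.mp4
```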

Lossy and lossless

First of all, what is raw audio and video data? Raw data is the data captured by audio and video devices, without any processing. Raw audio data is in PCM format, and raw video data is in YUV format.
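
One consequence of being "raw" is that such files carry no metadata at all, so the properties must be supplied by hand. A sketch, assuming the YUV/PCM files produced in the decoding example earlier and the properties of r1ori.mp4:

```
# Play raw video: the pixel format and frame size must be stated explicitly
ffplay -f rawvideo -pixel_format yuv420p -video_size 544x960 output.yuv

# Wrap raw PCM into a WAV file: sample format, rate and channel count must be stated
ffmpeg -f s16le -ar 44100 -ac 2 -i output.pcm output.wav
```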

Lossy and lossless refer to whether information is lost, here from the perspective of multimedia data compression. Lossy compression is also called destructive compression; this does not mean the data cannot be decompressed afterwards, only that the original cannot be perfectly restored. For example, our common MP3 and MP4 files are lossy compressed.

Take audio encoding as an example. The sound in audio comes from nature; we capture it with some technical means and then store it according to certain algorithms.

At this stage, the sound we store cannot be fully restored to what was in nature; in that sense, any audio encoding is lossy.

Some readers may want to object here: haven't I read that raw audio data in PCM format is lossless?

In fact, PCM encoding only approaches lossless; it achieves the highest possible fidelity to the signal, and is therefore conventionally regarded as lossless compression.

Fine, but if I want to hear the most authentic sound captured from nature, why compress it at all?

Raw data is too large to store conveniently.

Even if it is stored, it is not easy to transmit and requires a great deal of bandwidth.

Besides, today's video compression ratios are very high; even the well-known 4K and 8K seem to meet our needs.

Multiplexer and demultiplexer

Returning to containers, there are two operations we frequently perform on them.

Taking the audio and video data out of a container is called demuxing (decapsulation), and is done by a demuxer (also called a demultiplexer).

Packing processed audio and video data into a container is called muxing (encapsulation), and is done by a muxer (also called a multiplexer).
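
A sketch of both operations with FFmpeg (hypothetical file names): the first two commands demux the audio and video out of a container without re-encoding, and the third muxes them back into a new one (depending on your FFmpeg version, the last step may need -bsf:a aac_adtstoasc added):

```
# Demux: extract the AAC audio stream (-vn drops video) and the H264 video stream (-an drops audio)
ffmpeg -i input.mp4 -vn -c:a copy audio.aac
ffmpeg -i input.mp4 -an -c:v copy video.h264

# Mux: wrap the two elementary streams back into an MP4 container
ffmpeg -i video.h264 -i audio.aac -c copy output.mp4
```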

We will keep updating audio and video concepts under this article. If you find any concept hard to understand, please leave me a message, and I will collect and supplement it.