Preface

The name FFMpeg has two parts: “FF” stands for “Fast Forward”, and “mpeg” comes from the Moving Picture Experts Group.

According to the official description, FFMpeg is a complete, cross-platform solution for recording, converting and streaming audio and video. Simply put, as long as your work involves audio and video development, you basically can't get around this tool.

1. A quick start

For a quick start with FFMpeg, I suggest reading Ruan Yifeng's “FFmpeg video processing tutorial”, which covers the basic concepts of audio and video processing, such as the containers, encoding formats and encoders FFMpeg supports, as well as common uses of FFMpeg, such as viewing file information, converting encoding formats, and extracting audio.
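For reference, a quick sketch of those common uses (the file names are placeholders; the extraction example assumes the source's audio track is AAC, otherwise pick a matching extension or re-encode):

# View basic information about a media file (streams, duration, codecs)
$ ffmpeg -i input.mp4

# Convert between container/encoding formats
$ ffmpeg -i input.avi output.mp4

# Extract the audio track without re-encoding it (assuming the track is AAC)
$ ffmpeg -i input.mp4 -vn -c:a copy output.aac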

2. Basic knowledge of audio and video

When I started using FFMpeg myself, I found that using it well requires some basic knowledge of audio and video, so here is a summary.

Short videos are so popular now that I'm sure you watch them often, and the composition of a video is not complicated: it is a combination of images, audio and subtitles.

For images, there are two concepts that need to be distinguished: image format and color space. An image format describes how an image is compressed, encoded and stored, such as JPEG and PNG. A color space is a mathematical description of color, and different ways of representing it give different color models. The most commonly used color models are RGB (used in computer graphics), YUV (used in video systems), and CMYK (used in color printing). (You will see YUV a lot later.)
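As a small illustration of the difference (a sketch with placeholder file names, not part of the original walkthrough): the same video frame can be saved as a PNG image (stored as RGB) or dumped as raw YUV420p samples:

# Grab one frame as a PNG image
$ ffmpeg -i input.mp4 -frames:v 1 frame.png

# Dump one frame as raw YUV420p pixel data
$ ffmpeg -i input.mp4 -frames:v 1 -pix_fmt yuv420p -f rawvideo frame.yuv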

For audio, there are also two important concepts: raw captured audio data (such as PCM) and compressed audio data (such as AAC), both of which will come up again later.
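For example (a sketch with assumed file names and bitrates): a WAV file carrying raw PCM samples can be compressed to AAC, and compressed audio can be decoded back to raw PCM:

# Compress raw PCM audio (in a WAV container) to AAC
$ ffmpeg -i input.wav -c:a aac -b:a 128k output.aac

# Decode compressed audio back to raw PCM samples (16-bit little-endian, 44.1 kHz stereo)
$ ffmpeg -i output.aac -f s16le -acodec pcm_s16le -ar 44100 -ac 2 decoded.pcm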

For subtitles, there are three common formats: SRT, SSA and ASS.

SRT is a plain-text subtitle format and the simplest one, because it only consists of timestamps and the subtitle text, for example:

# The first line is the sequence number of the subtitle
# The second line is the time range, down to the millisecond
# The third line is the displayed text content
0
00:00:00,000 --> 00:00:01,000
Suppose Zhang San carries $100,000 to invest

1
00:00:02,000 --> 00:00:03,000
After converting into RMB, the bank has $100,000 more in foreign exchange

SSA is a more advanced subtitle format than SRT, and the similar ASS format is essentially an upgraded version of SSA: ASS is SSA v4.00+, i.e. it is built on the SSA 4.00+ specification. Here is what the ASS subtitles look like:

# This is the ASS subtitle converted from the SRT subtitle above
# Script Info: contains the header and general information of the script
# V4+ Styles: contains all style definitions
# Events: contains all script events, including captions, comments, pictures, etc.
[Script Info]
; Script generated by FFmpeg/Lavc58.91.100
ScriptType: v4.00+
PlayResX: 384
PlayResY: 288
ScaledBorderAndShadow: yes

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0,0,0,,Suppose Zhang San carries $100,000 to invest
Dialogue: 0,0:00:02.00,0:00:03.00,Default,,0,0,0,,After converting into RMB, the bank has $100,000 more in foreign exchange

3. Building a video

The reason I started using FFMpeg was that I wanted to generate a video from images and then add audio and subtitles to produce a finished video, so here are the steps I went through to build it.

3.1 Project Structure

The audio and video generated in this walkthrough are uploaded to GitHub and can be viewed here:

# Project structure
$ tree -L 1
.
├── add_audio       # Add audio
├── add_caption     # Add subtitles
└── img_to_video    # Picture to video

3.2 Generating a video from images

To demonstrate, I grabbed a random picture from the Internet:

The command for converting pictures to videos is as follows:

$ ffmpeg -r 25 -i img001.jpg -vcodec libx264 -pix_fmt yuv420p one_img_to_video.mp4
...
[libx264 @ 0x7faf5b809200] i8c dc,h,v,p: 65% 19%  9%  7%
[libx264 @ 0x7faf5b809200] kb/s:8960.40

Here is a breakdown of each parameter:

  • -r: rate, used to set the video frame rate. The frame rate is the number of frames displayed per second; 30, 25 and 24 FPS are common. Here it is set to 25 FPS, i.e. 25 images per second.
  • -i: input, the input source file.
  • -vcodec: the video encoder; here libx264 (H.264).
  • -pix_fmt: the pixel format; yuv420p is one of the YUV formats mentioned above.
  • one_img_to_video.mp4: the output file name.

The generated video is extremely short (it shows as 0 seconds), because the frame rate is set to 25 FPS but only one image was supplied: a single frame lasts just 1/25 of a second, so there simply aren't enough images.

There are two solutions: lower the frame rate (not recommended) or increase the number of images (recommended).

I started by lowering the frame rate to increase the duration (my requirement was that a single image be displayed for about 10 seconds). Since 25 FPS means 25 images per second, setting the rate to 0.1 FPS means 1 image every 10 seconds. The test is as follows:

$ ffmpeg -r 0.1 -i img001.jpg -vcodec libx264 -pix_fmt yuv420p one_img_to_video_small_rate.mp4

As you can see from the screenshot below, the desired duration of the MP4 was indeed achieved, but the file had problems: editing software did not handle it well (for example, when trimming clips), and adding audio and subtitles later behaved strangely.
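If you want to double-check the duration without opening a player, ffprobe can print it directly (a sketch; the exact value reported may vary slightly with container rounding):

# Print the container duration in seconds
$ ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 one_img_to_video_small_rate.mp4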

The second way is to increase the number of images. I realized this after using a video editing app, because it is the same principle as dragging a picture out on the timeline to lengthen the clip:

Duplicating the image in batch can be done with any small script (a sketch is included after the command below), but the number of images needs to be calculated first. For example, a 10-second video at 25 FPS requires 10 × 25 = 250 images:

# This is used when feeding in multiple images
# %03d matches zero-padded numbers: 001, 002, 003, ...
$ cd img_to_video

$ ffmpeg -r 25 -i img/img%03d.jpg -vcodec libx264 -pix_fmt yuv420p multi_img_to_video.mp4
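For completeness, this is the kind of script I mean for duplicating the image; the img/ directory and the 250-image count are assumptions taken from the 10-second, 25 FPS example above:

# Duplicate one source image into a zero-padded sequence img/img001.jpg ... img/img250.jpg
$ mkdir -p img
$ for i in $(seq 1 250); do cp img001.jpg "$(printf 'img/img%03d.jpg' "$i")"; done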

Some of you might wonder why I pass the -pix_fmt yuv420p argument every time. This is another pitfall: some programs, such as QuickTime Player on the Mac, cannot play MP4 files generated without this parameter.

The reason can be found in the official documentation. Because we generate the video from an image sequence (a series of images), the corresponding demuxer is image2, which is why some articles add an extra -f image2 parameter to the command above (it works either way). With this demuxer the default pix_fmt is not yuv420p; it is taken from the first image instead, and the format a JPG decodes to is not yuv420p, so the resulting video is unrecognizable to some players.
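A quick way to verify the pixel format of a generated file is ffprobe (a sketch, using the file generated above):

# Should report pix_fmt=yuv420p when -pix_fmt yuv420p was passed
$ ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1 multi_img_to_video.mp4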

3.3 Adding audio to the video

The video generated in this way has no sound, so we need to add audio to it through FFMpeg.

Sometimes the audio format we get is not MP3, but WAV, in which case we can use the following command to convert:

$ ffmpeg -i input.wav -vn -ar 44100 -ac 2 -b:a 192k output.mp3

  • -vn: disable video, to make sure no video stream ends up in the output.
  • -ar: set the audio sampling frequency. For output streams it defaults to the frequency of the corresponding input stream. For input streams it only makes sense for audio grab devices and raw demuxers, and is mapped to the corresponding demuxer option.
  • -ac: set the number of audio channels; 2 here ensures stereo. For output streams it defaults to the number of input audio channels. For input streams it only makes sense for audio grab devices and raw demuxers, and is mapped to the corresponding demuxer option.
  • -b:a: set the audio bitrate, here exactly 192 kbit/s.
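To confirm the conversion worked as intended, ffprobe can show the sample rate, channel count and bitrate of the result (a sketch, not part of the original steps):

# Inspect the audio stream of the converted file
$ ffprobe -v error -select_streams a:0 -show_entries stream=sample_rate,channels,bit_rate -of default=noprint_wrappers=1 output.mp3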

The explanation above mentions demultiplexing, but what is it? When we open a multimedia file, the first step is demultiplexing, also called demux. Why is this step needed and what does it actually do? A multimedia file contains both audio and video, and the two are compressed separately, because their compression algorithms differ; since the algorithms differ, they also have to be decoded separately. Although audio and video are compressed separately, the compressed streams are bundled together for transmission. So the first step of decoding is to separate the audio and video streams that are tied together, which is demultiplexing. Simply put, demultiplexing separates the audio stream from the video stream for subsequent decoding.
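As a rough illustration of demultiplexing (assuming input.mp4 contains one video stream and one AAC audio stream), the two bundled streams can be pulled apart without any decoding:

# Keep only the video stream (drop the audio), without re-encoding
$ ffmpeg -i input.mp4 -an -c:v copy video_only.mp4

# Keep only the audio stream (drop the video), without re-encoding
$ ffmpeg -i input.mp4 -vn -c:a copy audio_only.m4a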

After converting, you can add the audio to the video, using the image-generated video from above (note that WAV audio can also be added directly; I just prefer MP3).

# Copy the video
$ cp img_to_video/multi_img_to_video.mp4 add_audio/input.mp4

# There are several ways to add audio:

# Method 1: stream copy (not recommended)
# There is no encoding/decoding step here, only demuxing and remuxing, so it is very fast,
# but my test was not successful, so it is not recommended
$ ffmpeg -i input.mp4 -i input.mp3 -codec copy audio_copy.mp4

# Method 2: manually select specific streams (not recommended)
$ ffmpeg -i input.mp4 -i input.mp3 -map 0:v -map 1:a -c copy audio_manually.mp4

# Method 3: re-encode (tested and working)
$ ffmpeg -i input.mp4 -i input.mp3 -c:a aac -c:v libx264 audio_recode.mp4

# Sometimes the audio is longer than the video, e.g. audio 20 s and video 10 s;
# the command above would stretch the output to 20 s.
# Add the -shortest parameter if the output should end with the shorter input
$ ffmpeg -i input.mp4 -i input.mp3 -c:a aac -c:v libx264 -shortest audio_recode_short.mp4

3.4 Adding subtitles to the video

After adding the audio, you can add the subtitles. As for producing them, you can write the SRT file by hand or use a ready-made conversion tool, such as this one:

TXT to SRT Converter

It is very convenient to use: each line of text becomes one line of subtitles, and you just set the start times at the end (they don't have to match the actual narration exactly):

The command for adding SRT subtitles is as follows:

# Copy the previously generated video
$ cp add_audio/audio_recode.mp4 add_caption/input.mp4

# Add subtitles
$ ffmpeg -i input.mp4 -vf subtitles=input.srt video_with_srt.mp4

# If the following error is thrown:
#   Too many packets buffered for output stream 0:1
# it is because some of the data causes one stream to be processed too fast,
# so the queue overflows while the container is being muxed.
# It can be resolved by increasing the muxing queue size, e.g. setting the maximum to 1024:
$ ffmpeg -i input.mp4 -vf subtitles=input.srt -max_muxing_queue_size 1024 video_with_srt.mp4

Sometimes we need to customize the style or position of the subtitles. In that case, we can first convert the SRT subtitles to ASS and then adjust them. With FFMpeg installed, the conversion takes a single command; otherwise you can use online tools such as subtitle sauce.

FFMpeg conversion command:

$ ffmpeg -i input.srt output.ass

The command for adding ASS subtitles is as follows:


$ ffmpeg -i input.mp4 -vf "ass=output.ass" video_with_ass.mp4

The final effect is as follows:

To control the font, size, display position, etc. of the subtitles, modify the [V4+ Styles] section.

# The Format line contains the field names and the Style line contains the field values
# Fontname: font
# Fontsize: font size
# MarginL: distance from the subtitles to the left edge, ranging from 0 to PlayResX
# MarginR: distance from the subtitles to the right edge, ranging from 0 to PlayResX
# MarginV: vertical margin of the subtitles, ranging from 0 to PlayResY
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,0

Note: Refer to this article for descriptions of other parameters

Suppose I want to change the font size to 20 and move the subtitles up; the corresponding changes are as follows:

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,50,0

Then add it again:

$ ffmpeg -i input.mp4 -vf "ass=new.ass" video_with_new_ass.mp4

The final effect is as follows:

Final words

That is the whole process of building a complete video with FFMpeg. I hope it helps!

References

  • TXT to SRT
  • FFmpeg Formats
  • Convert audio files to MP3 using FFmpeg
  • FFmpeg: Too many packets buffered for output stream 0:1