Is there a way to process audio and video purely on the front end? For example, letting the user select a video and set any frame of it as the cover, without uploading the whole video to the back end for processing. After some exploration, I got a basic implementation of this feature working; here is a complete demo: FFmpeg WASM video frame capture.

It supports MP4/MOV/MKV/AVI and other formats. The basic idea is this:

Use a file input to let the user select a video file, read it as an ArrayBuffer, and pass it to the WASM-compiled FFmpeg for processing. The output RGB data is then drawn to a canvas, or converted to Base64 and used as the src attribute of an img tag, to form an image. (A canvas can also use a video DOM element directly as the drawImage source and grab a video frame that way, but the video element can only play a few formats. This article focuses on the FFmpeg approach, because FFmpeg can do much more than this; frame capture is just one example.)

Here’s a question: why use FFmpeg instead of plain JS? Because the mature multimedia-processing libraries are written in C, FFmpeg being one of them; it is open source, and it can be compiled to WASM and used in web pages. JS libraries for multimedia processing are scarce, the complexity of writing demultiplexing (demux) and video decoding in JS is predictably high, and codecs written directly in JS would also be much slower. So we use what already exists.

Step 1 is compilation (skip to step 2 if you’re not interested in the compilation process)

1. Compile FFmpeg to WASM

At first I thought it would be very difficult, but it turns out it’s not, because a project called videoconverter.js has already done this conversion. The key is to disable some unneeded features in configure; otherwise syntax errors are reported during compilation. The WASM is built with emsdk. emsdk’s installation tutorial is very clear; it mainly uses scripts that detect your system and download the matching prebuilt files. emcc is the C compiler, em++ is the C++ compiler, and emar packages multiple .o object files into a single .a library file.

Download the source code from FFmpeg’s official website.

(1) configure

Unzip it, enter the directory, and run the following command:

emconfigure ./configure --cc="emcc" --enable-cross-compile --target-os=none --arch=x86_32 --cpu=generic \
    --disable-ffplay --disable-ffprobe --disable-asm --disable-doc --disable-devices --disable-pthreads --disable-w32threads --disable-network \
    --disable-hwaccels --disable-parsers --disable-bsfs --disable-debug --disable-protocols --disable-indevs --disable-outdevs --enable-protocol=file

configure generates the Makefile: the configure phase detects the build environment and the given parameters, then writes the resulting build commands into the Makefile.

The main job of emconfigure is to set the compiler to emcc, but this alone is not enough, because FFmpeg has several submodules whose compilers would not all be switched to emcc. FFmpeg’s configure can specify a custom compiler with the --cc parameter. On the Mac the C compiler is usually /usr/bin/clang; here it is set to emcc.

The disable options turn off features that do not work under WASM. For example, --disable-asm disables the parts written in assembly, because the assembly syntax is incompatible with emcc and produces syntax errors during compilation. --disable-hwaccels disables hardware decoding: some graphics cards can decode video directly, so the application does not have to decode in software, and hardware decoding performs significantly better than software decoding, but it is unavailable here. With the accelerated assembly paths disabled, running prints a warning:

[swscaler @ 0x105c480] No accelerated colorspace conversion found from yuv420p to rgb24.

but it does not affect usage.

(configure also reports a segmentation fault at the end, but it does not matter.)

After the configure command completes, the Makefile and associated configuration files are generated.

(2) make

make starts the actual compilation phase; run the following command to compile:

emmake make

On a Mac, you’ll find that an error occurs when it tries to package multiple .o files into one .a file:

AR libavdevice/libavdevice.a

fatal error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar: fatal error in /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib

To solve this, change the packaging command from ar to emar and remove the ranlib step, by modifying the ffbuild/config.mak file:

# change ar to emar
- AR=ar
+ AR=emar

# remove ranlib
- RANLIB=ranlib
+ #RANLIB=ranlib

Then run emmake make again.

After compilation, an overall ffmpeg file is generated in the ffmpeg directory, and subdirectories such as libavcodec produce files like libavcodec.a. These are the bitcode files we will use later; bitcode is the intermediate code the compiler produces.

(The final strip -o ffmpeg ffmpeg_g step will fail, but it does not matter; just replace the strip command with cp ffmpeg_g ffmpeg.)

2. Using FFmpeg

FFmpeg consists mainly of several lib directories:

  • libavcodec: provides encoding and decoding
  • libavformat: demultiplexing (demux) and multiplexing (mux)
  • libswscale: image scaling and pixel format conversion

Take an MP4 file as an example. MP4 is a container format. First, the libavformat API is used to demultiplex the MP4 file and find where the audio and video streams sit inside it. The video stream obtained this way is still compressed, so libavcodec is used to decode it into a YUV image, and finally libswscale converts the image from YUV to RGB format.
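
To make the division of labor concrete, here is a minimal sketch of that pipeline against the current FFmpeg C API, with error handling omitted (the article’s demo follows an older tutorial, so this is illustrative rather than the demo’s code):

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>

int extract_first_frame(const char *path) {
    AVFormatContext *fmt = NULL;
    avformat_open_input(&fmt, path, NULL, NULL);      // libavformat: open the container
    avformat_find_stream_info(fmt, NULL);

    // find the video stream and a decoder for it
    const AVCodec *codec = NULL;
    int vidx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, &codec, 0);

    AVCodecContext *dec = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(dec, fmt->streams[vidx]->codecpar);
    avcodec_open2(dec, codec, NULL);

    AVPacket *pkt = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {            // demux: read compressed packets
        if (pkt->stream_index == vidx &&
            avcodec_send_packet(dec, pkt) >= 0 &&
            avcodec_receive_frame(dec, frame) >= 0) { // decode: packet -> YUV frame
            // libswscale: convert YUV to RGB24
            struct SwsContext *sws = sws_getContext(
                frame->width, frame->height, dec->pix_fmt,
                frame->width, frame->height, AV_PIX_FMT_RGB24,
                SWS_BILINEAR, NULL, NULL, NULL);
            uint8_t *rgb[4]; int linesize[4];
            av_image_alloc(rgb, linesize, frame->width, frame->height,
                           AV_PIX_FMT_RGB24, 1);
            sws_scale(sws, (const uint8_t * const *)frame->data, frame->linesize,
                      0, frame->height, rgb, linesize);
            // ... rgb[0] now holds width * height * 3 bytes of RGB data ...
            av_freep(&rgb[0]);
            sws_freeContext(sws);
            break;
        }
        av_packet_unref(pkt);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&dec);
    avformat_close_input(&fmt);
    return 0;
}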

There are two ways to use FFmpeg. The first is to compile the ffmpeg file from step 1 directly into WASM:

# copy it to a .bc suffix first, because emcc distinguishes file formats by the extension
cp ffmpeg_g ffmpeg.bc
emcc ffmpeg.bc -o ffmpeg.html

This generates ffmpeg.js and ffmpeg.wasm. ffmpeg.js loads and compiles the WASM file and provides a global Module object for operating the FFmpeg API inside the WASM. With this in place, you call the FFmpeg API through Module in JS.

However, I find this way more troublesome: JS data types and C data types differ quite a bit, and calling the C API frequently from JS means a lot of awkward data passing, because implementing a frame-capture feature requires many FFmpeg API calls.

So I use the second way: write the C code first, implement the feature in C, and finally expose a single interface to JS. That way JS and WASM only need to communicate through one API, instead of the frequent calls of the first way.

So the task becomes two steps:

Step 1: write a C function that uses FFmpeg to save a video frame as an image.

Step 2: compile it to WASM and exchange the data with JS.

The first step mainly follows an FFmpeg tutorial: ffmpeg tutorial. Its code can be copied over almost as-is; the only small problem is that the FFmpeg version it uses is a little old, so some API parameters need adjusting. The code has been uploaded to GitHub, see: cfile/simple.c.

To use it, as described in the readme, compile it into an executable called simple with the following command:

gcc simple.c -lavutil -lavformat -lavcodec `pkg-config --libs --cflags libavutil` `pkg-config --libs --cflags libavformat` `pkg-config --libs --cflags libavcodec` `pkg-config --libs --cflags libswscale` -o simple

Then, to use it, just pass the path of a video file:

./simple mountain.mp4

A PPM image is generated in the current directory.
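
PPM is a deliberately simple format, just a text header followed by raw RGB bytes, which is why the tutorial uses it for output. A sketch of that kind of save function (the name and signature here are assumed, not copied from simple.c):

#include <stdio.h>
#include <stdint.h>

// write width * height RGB24 pixels as a binary PPM (P6) file;
// "rgb" points to the pixel data, one row every "linesize" bytes
static void save_ppm(const uint8_t *rgb, int linesize,
                     int width, int height, const char *path) {
    FILE *f = fopen(path, "wb");
    if (!f) return;
    fprintf(f, "P6\n%d %d\n255\n", width, height);  // PPM text header
    for (int y = 0; y < height; y++)                // then raw RGB rows
        fwrite(rgb + y * linesize, 1, (size_t)width * 3, f);
    fclose(f);
}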

simple.c calls FFmpeg APIs that read the file from the hard disk automatically. This needs to be changed to read from memory; that is, we hand FFmpeg a buffer that is already in memory, so that later the data can come straight from a JS buffer: simple-from-memory.c. I won’t analyze the C code in detail here; it is mostly a matter of calling the right APIs, which is simple enough once you know how to use them, though documentation on FFmpeg for web development is relatively scarce. The sketch below shows the core mechanism.
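
The standard FFmpeg mechanism for reading from memory is a custom AVIOContext with read and seek callbacks, so that libavformat pulls bytes from a buffer instead of a file. A minimal sketch of that idea, under the assumption that simple-from-memory.c works this way (details may differ):

#include <stdio.h>
#include <string.h>
#include <libavformat/avformat.h>

typedef struct {
    const uint8_t *data;  // the whole file held in memory (e.g. handed over from JS)
    size_t size;
    size_t pos;
} MemSource;

// read callback: copy up to buf_size bytes from memory into FFmpeg's buffer
static int mem_read(void *opaque, uint8_t *buf, int buf_size) {
    MemSource *src = opaque;
    size_t left = src->size - src->pos;
    if (left == 0) return AVERROR_EOF;
    int n = buf_size < (int)left ? buf_size : (int)left;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

// seek callback: needed for MP4/MOV, where the moov box may sit at the end
static int64_t mem_seek(void *opaque, int64_t offset, int whence) {
    MemSource *src = opaque;
    if (whence == AVSEEK_SIZE) return src->size;
    if (whence == SEEK_CUR) offset += src->pos;
    else if (whence == SEEK_END) offset += src->size;
    src->pos = (size_t)offset;
    return offset;
}

AVFormatContext *open_from_memory(MemSource *src) {
    uint8_t *iobuf = av_malloc(4096);  // IO buffer handed over to FFmpeg
    AVIOContext *avio = avio_alloc_context(iobuf, 4096, 0 /* read-only */,
                                           src, mem_read, NULL, mem_seek);
    AVFormatContext *fmt = avformat_alloc_context();
    fmt->pb = avio;                    // use our IO instead of a file path
    avformat_open_input(&fmt, NULL, NULL, NULL);
    return fmt;
}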

This completes the first step. The second step is to change the data input to come from JS and the output to be returned to JS.

3. Interaction between JS and WASM

The WASM version is implemented in web.c (process.c is the processing logic split out of simple.c). In web.c, there is one function exposed for JS to call, named setFile:

EMSCRIPTEN_KEEPALIVE // this macro marks the function as one to be exported
ImageData *setFile(uint8_t *buff, const int buffLength, int timestamp) {
    // process ...
    return result;
}

Three arguments need to be passed:

  • buff: the raw video data (passed in from a JS ArrayBuffer)
  • buffLength: the total size of buff in bytes
  • timestamp: the time of the video frame to capture (the demo’s JS passes the form’s seconds value × 1000; see the seek sketch below)
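
How this timestamp maps to a frame is up to the C side. With the FFmpeg API, the usual approach is to seek near the requested time and then decode forward to the exact frame; the sketch below shows that idea, as an assumption about web.c rather than its actual code:

#include <libavformat/avformat.h>

// jump near the requested time, then decode forward to the exact frame;
// "ms" is the timestamp in milliseconds, as the JS demo passes it
static int seek_to_ms(AVFormatContext *fmt, int64_t ms) {
    int64_t ts = ms * AV_TIME_BASE / 1000;  // milliseconds -> AV_TIME_BASE units
    // AVSEEK_FLAG_BACKWARD lands on the keyframe at or before ts;
    // packets are then decoded until the wanted timestamp is reached
    return av_seek_frame(fmt, -1, ts, AVSEEK_FLAG_BACKWARD);
}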

The return value is a pointer to an ImageData struct:

typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t *data;
} ImageData;

It has three fields: the width and height of the picture, and the RGB data.
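
On the C side, the struct must be heap-allocated so that the pointer stays valid after setFile returns. A sketch of what filling it can look like, with an assumed helper name and assumed linesize handling:

#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t *data;
} ImageData;

// copy one decoded RGB24 frame into a heap-allocated ImageData;
// both allocations must later be freed from JS via Module._free
static ImageData *make_image(const uint8_t *rgb, int linesize,
                             uint32_t width, uint32_t height) {
    ImageData *img = malloc(sizeof(ImageData));
    img->width = width;
    img->height = height;
    img->data = malloc((size_t)width * height * 3);
    for (uint32_t y = 0; y < height; y++)  // drop any per-row padding (linesize)
        memcpy(img->data + (size_t)y * width * 3,
               rgb + (size_t)y * linesize, (size_t)width * 3);
    return img;
}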

After writing these C files, compile them:

emcc web.c process.c ../lib/libavformat.bc ../lib/libavcodec.bc ../lib/libswscale.bc ../lib/libswresample.bc ../lib/libavutil.bc \
    -Os -s WASM=1 -o index.html -s EXTRA_EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' -s ALLOW_MEMORY_GROWTH=1 -s TOTAL_MEMORY=16777216

This uses the libavcodec.bc and other files generated in step 1. These files have a dependency order and must not be reversed. Some of the parameters are worth explaining:

-o index.html: output an HTML file; index.js and index.wasm are generated along with it. The generated index.html itself is not needed.

-s EXTRA_EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]': export the runtime methods ccall and cwrap, which are used to wrap exported C functions for JS.

-s TOTAL_MEMORY=16777216: sets the total WASM memory to 16MB, which is also the default. This value must be a multiple of 64KB (the WASM page size).

-s ALLOW_MEMORY_GROWTH=1: automatically grow the memory when the total size is exceeded.

Next, write an HTML page: add an input[type=file] control and include the index.js generated above, which loads index.wasm and provides a global Module object for operating the WASM API, including the functions we told the compiler to export. The code looks like this:

<html>
<head>
    <meta charset="utf-8">
    <title>FFmpeg WASM video frame capture</title>
</head>
<body>
<form>
    <p>Please select a video (it is processed locally and will not be uploaded)</p>
    <input type="file" required name="file">
    <label>Time (seconds)</label><input type="number" step="1" value="0" required name="time">
    <input type="submit" value="Get image" style="font-size:16px;">
</form>
<!-- this canvas is used to draw the extracted image -->
<canvas width="600" height="400" id="canvas"></canvas>
<!-- include index.js -->
<script src="index.js"></script>
<script>
!function() {
    let setFile = null;
    // fires once the WASM has been downloaded and compiled
    Module.onRuntimeInitialized = function () {
        console.log('WASM initialized done!');
        // wrap the exported core handler function
        setFile = Module.cwrap('setFile', 'number', ['number', 'number', 'number']);
    };
}();
</script>

The WASM must finish downloading and compiling before it can be used, so Module provides the onRuntimeInitialized callback.

cwrap’s first argument is the function name, the second is the return type, and the third is the list of argument types. Since the return value is a pointer address, which is a 32-bit number, the JS type 'number' is used.

Then read the contents of the input file into a buffer:

let form = document.querySelector('form');
// listen for the change event
form.file.onchange = function () {
    if (!setFile) {
        console.warn('WASM is not loaded and parsed yet, please wait');
        return;
    }
    let fileReader = new FileReader();
    fileReader.onload = function () {
        // this.result is the file's raw binary data as an ArrayBuffer;
        // wrap it in a Uint8Array
        let buffer = new Uint8Array(this.result);
        // ...
    };
    // read the file
    fileReader.readAsArrayBuffer(form.file.files[0]);
};

The file content is read into a Uint8Array, an array in which each element is an unsigned 8-bit integer, i.e. the numeric value of one byte.

The next key question is: how do we pass this buffer to WASM’s setFile function? This requires understanding WASM’s memory heap model.

4. Memory heap model for WASM

Module.buffer represents the memory WASM uses, and Module.HEAP8 is an 8-bit typed-array view onto it.

This is the key to data interaction between JS and WASM. From JS, you put the data into the HEAP8 array and then tell WASM the pointer address of the data and how much memory it occupies. Conversely, when WASM wants to return data to JS, it also puts the data into this HEAP8 and then returns the pointer address and the length.

But we cannot just pick an arbitrary location; we need to use the allocation API it provides, Module._malloc:

// the file's raw binary data, stored in a Uint8Array
let buffer = new Uint8Array(this.result);
// request the needed amount of memory on the heap;
// the return value is the starting pointer address
let offset = Module._malloc(buffer.length);
// fill in the data
Module.HEAP8.set(buffer, offset);
// finally call the WASM function
let ptr = setFile(offset, buffer.length, +form.time.value * 1000);

Call _malloc with the required memory size and it returns the offset of the allocated memory, which is effectively an index into the HEAP8 array; then use Uint8Array’s set method to fill in the data. We then pass the offset pointer address to setFile and tell it how much memory the data occupies. This is how JS transmits data to WASM.

setFile returns a pointer address that points to a struct:

typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t *data;
} ImageData;

The first 4 bytes hold the width, the next 4 bytes hold the height, and after that comes the pointer to the picture’s RGB data, which is also 4 bytes. The data length is omitted here because it can be computed as width * height * 3.

So [ptr, ptr + 4) stores the width, [ptr + 4, ptr + 8) stores the height, and [ptr + 8, ptr + 12) stores the pointer to the image data, as the following code shows:

let ptr = setFile(offset, buffer.length, +form.time.value * 1000);
let width = Module.HEAPU32[ptr / 4],
    height = Module.HEAPU32[ptr / 4 + 1],
    imgBufferPtr = Module.HEAPU32[ptr / 4 + 2],
    imageBuffer = Module.HEAPU8.subarray(imgBufferPtr,
                      imgBufferPtr + width * height * 3);

HEAPU32 is similar to HEAP8, except that it reads memory as unsigned 32-bit numbers, which suits us, since all three fields are 32-bit. Its unit is 4 bytes, whereas ptr counts single bytes, so ptr / 4 gives the index. There is no need to worry about ptr not being divisible by 4, because the allocation is aligned.

With that, we have the RGB data of the image and can draw it onto a canvas.

5. Drawing the image on canvas

Use the canvas ImageData class, as shown in the following code:

// canvas/ctx: the visible canvas and its 2D context;
// memCanvas/memContext: an off-screen canvas used as an intermediate
function drawImage(width, height, buffer) {
    let imageData = ctx.createImageData(width, height);
    let k = 0;
    // copy the buffer into the ImageData
    for (let i = 0; i < buffer.length; i++) {
        // note that the buffer data is RGB, while ImageData is RGBA
        if (i && i % 3 === 0) {
            imageData.data[k++] = 255;
        }
        imageData.data[k++] = buffer[i];
    }
    imageData.data[k] = 255; // alpha of the last pixel
    memCanvas.width = width;
    memCanvas.height = height;
    canvas.height = canvas.width * height / width;
    memContext.putImageData(imageData, 0, 0, 0, 0, width, height);
    ctx.drawImage(memCanvas, 0, 0, width, height, 0, 0, canvas.width, canvas.height);
}
drawImage(width, height, imageBuffer);

After drawing, the memory must be freed; otherwise, after a few drawImage calls, the page’s memory usage will climb to one or two gigabytes:

drawImage(width, height, imageBuffer);
// free the memory
Module._free(offset);
Module._free(ptr);
Module._free(imgBufferPtr);

The C code also has to free the memory it requests along the way, otherwise the leak is quite serious. If everything is freed correctly, malloc returns the same address (16358200 in my runs) every time; if not, memory has to be expanded again on each run, and ever-increasing offset addresses are returned.
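
On the C side, that means releasing every FFmpeg object created during the call. A sketch of typical end-of-call cleanup, assuming the objects from the earlier sketches (the demo’s actual cleanup may differ):

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>

// end-of-call cleanup: every object created during one setFile run must be
// released, or the WASM heap keeps growing (the names here are illustrative)
static void cleanup(AVFormatContext *fmt_ctx, AVIOContext *avio_ctx,
                    AVCodecContext *dec_ctx, struct SwsContext *sws_ctx,
                    AVFrame *frame) {
    av_frame_free(&frame);
    sws_freeContext(sws_ctx);
    avcodec_free_context(&dec_ctx);
    avformat_close_input(&fmt_ctx);
    if (avio_ctx) {
        av_freep(&avio_ctx->buffer);   // FFmpeg may have replaced the original buffer
        avio_context_free(&avio_ctx);
    }
}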

But overall this still consumes a lot of memory.

6. Existing problems

After FFmpeg is initialized, the page’s memory rises to 500MB. If a 300MB file is selected, memory rises to 1.3GB, because calling setFile requires a 300MB malloc, and during setFile’s execution the C code mallocs another 300MB context variable: handling the MOV/M4V formats needs a context that large to obtain the moov information. Altogether that adds up to more than 1GB. Moreover, WebAssembly Memory can only grow and never shrink, which means the expanded memory stays around. For regular MP4 files the context variable only needs 1MB, which keeps memory under 1GB.

The second problem is that the generated WASM file is quite large: the raw wasm file is 12.6MB, and 5MB after gzip.

This is because FFmpeg itself is quite large. If you can dig into the source code and slim it down, by disabling or excluding more unused features, or by extracting only the useful code, it could shrink further, but that is considerably more difficult.

The third problem is the robustness of the code. Besides trying to get memory usage down, some out-of-bounds memory access issues need handling, because running it sometimes throws this exception:

Uncaught RuntimeError: memory access out of bounds

Despite these problems, it does run. It may not yet be worth deploying to production, but it can be optimized step by step.

Beyond the example in this article, FFmpeg can be used to implement other features that let web pages handle multimedia directly. Basically, whatever FFmpeg can do can also run in a web page, and WASM performance is higher than running JS directly.