Introduction: With the development of the mobile Internet, the 5G-driven wave of video is coming. However, many users are discouraged by the complex features and steep learning curve of traditional video editors. To address this, the Baidu Baijiahao R&D team, working from creators' actual needs, developed a simple and easy-to-use online video editing and publishing tool: the Baijiahao online video editor. This article introduces the editor's technical principles, architecture, and evolution in detail, and offers a glimpse of the technical cooperation and innovation mechanisms inside Baidu.

Preface

With the rapid development of the mobile Internet, people are increasingly accustomed to watching video on their phones. As a content production platform, Baijiahao needs to provide authors with easy-to-use video editing and publishing tools, and the online video editor was born from this demand. This article takes a close look at the technology behind the Baijiahao video editor.

Glossary

BOS: Baidu Object Storage (BOS), which provides stable, secure, efficient, and scalable storage services

VOD: video-on-demand service; in this article it refers specifically to Baidu VideoWorks (formerly the VOD audio/video on-demand service)

1. What are the functions of an online video editor?

1.1 Basic functions of the editor

We investigated local video editors and listed some of the features that video editors must implement:

  • Material source file management, loading and editing

  • Multi-track editor

  • Drag and drop operations (Add/remove footage, add/remove effects, quick clipping, switching tracks, etc.)

  • Audio and video track separation

  • Material effect (embossing, nostalgia, etc.), transition animation (fade in and out, spiral, etc.), material animation (single point zoom, simulated shaking, etc.)

  • Subtitle editing and embedding

  • Video preview

  • Rendering and exporting in multiple formats

1.2 Unique features of the online editor

An online video editor must implement the functions above as well, but the specific implementations differ. For example:

  • Material management: upload and delete source files

  • Video preview: a simple preview implemented in front-end JS

  • Export: the online video editor mainly serves Baijiahao publishers, so it does not export video files but connects to the video publishing flow

In addition, relying on the Baidu and Baijiahao technology stack, it also provides extra features such as speech-to-subtitles, subtitle-to-speech (TTS), and Baijiahao article-to-video.

2. How to implement an online video editor?

2.1 Back-end technology selection

FFmpeg is the industry's most widely used integrated video codec framework; it is both powerful and highly efficient at encoding and decoding. The back-end service therefore uses FFmpeg as its codec foundation.

2.2 Introduction to FFmpeg

FFmpeg is free software that can record, convert, and stream audio and video in many formats. It includes libavcodec, an audio/video codec library used in many projects, and libavformat, an audio/video container format library.

△ Figure 1 FFmpeg

2.2.1 FFmpeg features

  • Free software with open source code;

  • Ships with many filters (plug-ins) that cover all current business needs;

  • Supports third-party filters (plug-ins) to meet future business needs;

  • Supports custom and dynamic compilation, minimizing memory consumption;

  • Supports remote files (such as HTTP and FTP) as input, reducing local disk usage;

  • Supports GPU encoding/decoding, reducing CPU consumption and improving codec speed (this business does not use a GPU cluster);

  • Simple syntax, convenient for secondary encapsulation or assembly.

2.2.2 Command Line Usage

Figure 2 FFmpeg command line usage

Example 1:

ffmpeg -i in.wmv -vcodec libxvid out.mp4
ffmpeg -framerate 1 -t 1 -loop 1 -i "http://pic.rmb.bdstatic.com/2b18b480a1f2d15e3667e01c45dfc157.jpeg" -vcodec libx264 -pix_fmt yuv420p -y test.mp4

2.2.3 Basic rules of FFmpeg Filter

The avfilter component in FFmpeg is usually translated as "filter", in the sense of filtering or transforming a stream.

Broadly, any editing of decoded multimedia data is called filtering, and the components and plug-ins that perform that editing are called filters.

Examples include audio pitch shifting/speed change, video frame insertion/extraction, and clipping/trimming/concatenating/overlaying.

△ Figure 3 FFmpeg transcoding and Filter process

2.2.4 Basic filter and its schematic diagram

Using a basic filter is as simple as adding -vf with the desired filter between the input file (and its options) and the output file (and its options). For example:

  • Scale (static)

ffmpeg -i video_1080p.mp4 -vf scale=w=640:h=360 video_360p.mp4

△ Figure 4 scale diagram

  • Zoompan (dynamic)
ffmpeg -framerate 1 -t 1 -loop 1 -i "http://pic.rmb.bdstatic.com/2b18b480a1f2d15e3667e01c45dfc157.jpeg" -vf "zoompan=z='if(eq(on,0),1,if(lt(zoom,1.25),zoom+0.0005,1.25))':d=16.06*25:x='if(lt(zoom,1.25),0,(x+1))':y='if(lt(zoom,1.25),0,(y+1))':s='1024x720'" -y tmp.mp4

△ Figure 5 zoompan diagram

  • Blur (boxblur)
ffmpeg -i tmp.mp4 -filter_complex "boxblur=luma_radius='min(h,w)/30':luma_power=2" -y boxblur.mp4

△ Figure 6. Boxblur diagram

  • Overlay (overlay)
ffmpeg -i tmp.mp4 -i watermark.png -filter_complex "[1:v]scale=-2:48[logo];[0:v][logo]overlay=48:48" -y watermark.mp4

△ Figure 7 Overlay Diagram

2.2.5 FFmpeg pipe syntax

Rules:

  • Name a stream with [name]

  • Separate filters within a chain with ,

  • Separate filter chains with ;

  • The i-th input file is referenced as [i-1]

  • The video and audio streams of the first input file are [0:v] and [0:a]

  • The final output stream name can be omitted

For example:

-filter_complex "
[0:v]split[front][back];                            // split the input into front and back streams
[back]                                              // background stream
  scale=1280:-2,                                    // scale to output width 1280
  boxblur=luma_radius='min(h,w)/30':luma_power=2,   // blur
  crop=iw:720[background];                          // crop to 1280x720
[front]scale=-2:720[foreground];                    // scale to output height 720
[background][foreground]overlay=(W-w)/2:(H-h)/2     // overlay foreground over background
"

△ FIG. 8 Diagram of pipeline filter flow

Actual effect:

△ Figure 9 Execution results of pipeline filter flow

2.3 Front-end technology selection

The front-end interface is built with React, and quick preview is implemented on top of the browser's HTML5 audio/video player. Adjustment parameters are passed to the player through HTML tags to achieve simple negative, emboss, and black-and-white playback effects, and transitions are simulated by overlaying animated images on the video.

Because of the performance limits and complexity of front-end preview solutions, the front-end quick preview can only show part of the editing effects.

2.4 Functional boundary and interaction of front and rear ends

2.4.1 Functional Boundaries of the Front and Rear ends

Before specific feature development, the front-end and back-end functional boundaries should be drawn according to requirements and technical capabilities. For example:

Front-end interface implementation

  • User interaction with the video editor

  • Simple video preview (The preview effect and the final result may be different due to the difference between the backend technology stack and the resources used)

  • Convert the results of user operations in the edit interface into a timeline data structure

  • …

Back-end service implementation

  • Translate the timeline into FFmpeg commands

  • Call the video publishing process after the video is generated

  • …

Both the front and back ends need to be implemented

  • Subtitles <==> audio

  • Material to upload

  • …

According to our functional requirements and the front/back-end division, the user interface of the Baijiahao online video editor can be roughly divided into three areas:

  • The functional area inside the yellow line

  • The multi-track editing area inside the green line

  • Quick preview area inside the red line

△ Figure 10 Interface areas of the Baijiahao online video editor

2.4.2 Timeline data structure

In order to interact between the front and back ends, we need to define a data structure that is easy to load, modify, and store by the front-end multi-track editor, and easy to extract structured data by the back end.

We define a timeline data structure in which the tracks in the timeline correspond to the tracks in the multi-track editor:

{
  "timeline": {
    "video_track": [              // video track
      {
        "start": 0.0,             // start time
        "end": 1.5,               // end time = start + duration * speed
        "type": "video",          // video, image, blank
        "height": 720,
        "width": 1280,
        "in_effect": "fade_in",   // entry effect
        "out_effect": "fade_out", // exit effect
        "style": "negative",      // effect: negative, blur, emboss, black-and-white, etc.
        "duration": 1.5,          // length
        "speed": 1,               // playback speed
        "animation": "zoompan",   // animation of the resource, e.g. camera shake, pan-zoom
        "sourceUrl": "http://*.baidu.com/c20ad4d76fe97759aa27a0c99bff6710.mp4"
      }
    ],
    "audio_track": [              // audio track
      {
        "start": 0.0,             // start time
        "end": 1.5,               // end time = start + duration * speed
        "type": "video",          // video, silence
        "in_effect": "fade_in",   // entry effect
        "out_effect": "fade_out", // exit effect
        "style": "jazz",          // equalizer effect: jazz, rock, vocals, etc.
        "duration": 1.5,          // length
        "speed": 1,               // playback speed
        "sourceUrl": "http://*.baidu.com/c20ad4d76fe97759aa27a0c99bff6710.mp3",
        "auto_subtitle": true     // speech-to-subtitles
      }
    ],
    "subtitle": [                 // subtitle track
      {
        "start": 0.0,             // start time
        "end": 1.5,               // end time = start + duration * speed
        "type": "silence",
        "style": "Arial,23,yellow,white", // effect: font, size, color, outline color
        "duration": 1.5,          // length
        "text": "this is a subtitle",
        "pos_x": 100,             // subtitle position
        "pos_y": 200,             // subtitle position
        "tts": true               // synthesize speech from the subtitle
      }
    ],
    "watermark": [
      {
        "start": 0.0,             // start time
        "end": 1.5,               // end time = start + duration * speed
        "style": "0.8",           // style params
        "duration": 1.5,
        "sourceUrl": "http://*.baidu.com/c20ad4d76fe97759aa27a0c99bff6710.png",
        "pos_x": 100,             // watermark position
        "pos_y": 200,             // watermark position
        "height": 100,            // watermark height
        "width": 100              // watermark width
      }
    ]
  },
  "author_info": {},              // author information
  "extra": {}                     // other information
}
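As a sketch of how such a structure can be consumed, the snippet below derives clip end times and the overall timeline length using the end = start + duration * speed convention from the comments above. Field names follow the structure; the helper functions themselves are illustrative, not the production code.

```python
def clip_end(clip):
    """End time per the timeline convention: end = start + duration * speed."""
    return clip["start"] + clip["duration"] * clip["speed"]

def total_duration(timeline):
    """Overall timeline length: the latest end time across the clip tracks."""
    ends = [
        clip_end(clip)
        for track in ("video_track", "audio_track")
        for clip in timeline.get(track, [])
    ]
    return max(ends, default=0.0)

# A stripped-down timeline using the fields defined above.
timeline = {
    "video_track": [{"start": 0.0, "duration": 1.5, "speed": 1}],
    "audio_track": [{"start": 1.5, "duration": 2.0, "speed": 1}],
}
```

A back end can use derived values like these to validate incoming timelines, e.g. rejecting clips whose stored `end` disagrees with the computed one.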

2.4.3 Asynchronous Invocation and Polling

When the user finishes editing, they click the "Save" button to submit. The front end then packages all resource elements in the multi-track editor into a timeline structure and passes it to the back-end service, which parses the structure and invokes FFmpeg for video encoding and decoding.

This back-end phase is computationally intensive, typically taking 2-5 times the length of the timeline to produce the final video. Therefore, after the "Save" click, the front end uses an asynchronous call plus periodic polling to check whether back-end video composition is complete.
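The submit-then-poll pattern can be sketched as follows. The `/compose` and `/status` endpoints, the `job_id` field, and the state names are hypothetical placeholders, not Baijiahao's real API; `post` and `get` stand in for HTTP calls.

```python
import time

def submit_and_poll(post, get, timeline, interval=2.0, timeout=600.0):
    """Submit a timeline for rendering, then poll until the job finishes.

    post/get stand in for HTTP calls to the back end; endpoint paths
    and field names are illustrative.
    """
    job = post("/compose", timeline)          # asynchronous: returns immediately
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get("/status", job["job_id"])
        if status["state"] in ("done", "failed"):
            return status
        time.sleep(interval)                  # periodic polling
    raise TimeoutError("video composition did not finish in time")
```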

2.5 Back-end timeline translation process

As mentioned earlier, the back-end service translates the timeline passed from the front end into an FFmpeg command.

The main flow of this step is shown below:

△ Figure 11 FFmpeg command diagram of timeline translation
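To make the translation step concrete, here is a heavily simplified, hypothetical sketch that turns a one-clip timeline into an FFmpeg argument list. The real Service module covers far more cases; `negate` is FFmpeg's actual filter for the negative effect.

```python
def translate(timeline, output="out.mp4"):
    """Translate a minimal timeline into FFmpeg arguments.

    Toy version of the Service module's translation step: one video
    clip, optional scaling, optional negative effect.
    """
    clip = timeline["video_track"][0]
    filters = []
    if "width" in clip and "height" in clip:
        filters.append("scale={}:{}".format(clip["width"], clip["height"]))
    if clip.get("style") == "negative":
        filters.append("negate")              # FFmpeg's negative filter
    args = ["ffmpeg", "-i", clip["sourceUrl"]]
    if filters:
        args += ["-vf", ",".join(filters)]    # chain filters with commas
    args += ["-y", output]
    return args
```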

3. Implementation of the Baijiahao online video editor

3.1 Overall architecture of the Baijiahao video editor

△ Figure 12 Overall architecture

3.2 User interface and service Interface

The video editor is currently available in two forms: a graphical interface for end users and a service interface for developers.

The graphical interface is integrated into the Baijiahao content creation console and is now open to some Baijiahao authors. The audio transcoding and video merging services provided through the service interface have also been applied to Baijiahao's online services.

3.3 Business layer: Timeline translation

In the business layer, to isolate internal and external network requests, a UI-layer module is added to handle video editing requests from the graphical interface. The Service module, developed in PHP, is the editor's core: it normalizes the two request types (graphical interface and service interface), translates the timeline data structure into directly executable FFmpeg commands, and sends them to the offline scheduling module for execution.

The Service module of the business layer mainly completes the following tasks during translation:

3.3.1 Image and video

  • The proportions and sizes of incoming video/images may not match the final output, such as portrait video shot on phones or images downloaded from the Internet. Previously, the industry handled mismatched aspect ratios either by leaving black bars or by cropping. With the rise of mobile short video, it is now popular to replace the black bars with a blurred, enlarged background image, as shown in Figure 13.

  • **Zoompan:** For incoming static images, it is common to add camera motion so they do not look rigid and display better.
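As an illustration of the blurred-background handling, a helper along these lines could emit the corresponding filtergraph for a given output size. It mirrors the pipe-filter example from section 2.2.5 and is a sketch, not the production code.

```python
def blurred_background_graph(out_w, out_h):
    """Build a filter_complex string that letterboxes a clip over a
    blurred, enlarged copy of itself instead of black bars.

    Mirrors the split/scale/boxblur/crop/overlay pipe from section 2.2.5.
    """
    return (
        "[0:v]split[front][back];"
        "[back]scale={w}:-2,"
        "boxblur=luma_radius='min(h,w)/30':luma_power=2,"
        "crop=iw:{h}[background];"
        "[front]scale=-2:{h}[foreground];"
        "[background][foreground]overlay=(W-w)/2:(H-h)/2"
    ).format(w=out_w, h=out_h)
```

The returned string would be passed to FFmpeg as the `-filter_complex` argument.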

3.3.2 Video connection and transition

  • **concat:** Merge the incoming images/video streams into one longer video track.

  • **overlay:** Add a layer of transition animation at the moment one clip joins the next, to avoid a jarring hard cut.

△ Figure 13 Overlay Adds cutscenes

3.3.3 Audio

  • Concatenate the audio of the incoming videos, voice-overs, and TTS narration into one long audio track.

  • Add BGM according to the user's choice to give the video more atmosphere.

  • Handle fade-ins and fade-outs to avoid abrupt transitions.

3.3.4 Subtitles

  • Add the ASS special-effects subtitle header.

  • Generate an ASS subtitle file from the text in the timeline.

  • Burn the ASS subtitle file into the video stream.

△ Figure 14 special effects subtitle header
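The second step can be sketched as follows: generate a minimal .ass script from the timeline's subtitle entries. The style header here is deliberately minimal (real headers carry the special-effect styles shown in Figure 14), and the burning step would then hand the file to FFmpeg's subtitles filter.

```python
def to_ass_time(seconds):
    """Format seconds as an ASS timestamp: H:MM:SS.cc (centiseconds)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return "{}:{:02d}:{:02d}.{:02d}".format(h, m, s, cs)

def subtitles_to_ass(subtitles):
    """Render timeline subtitle entries as a minimal .ass script.

    The header is a bare-bones assumption; production headers carry
    full style definitions.
    """
    header = (
        "[Script Info]\nScriptType: v4.00+\n\n"
        "[Events]\nFormat: Layer, Start, End, Style, Text\n"
    )
    lines = [
        "Dialogue: 0,{},{},Default,{}".format(
            to_ass_time(s["start"]), to_ass_time(s["end"]), s["text"])
        for s in subtitles
    ]
    return header + "\n".join(lines) + "\n"
```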

3.3.5 Assembly

  • Combine all filter commands into a pipe filter stream to generate a filter stream script.

  • Upload the filter stream script and the generated ASS subtitles to BOS at the same time, so that subsequent FFmpeg commands can read and execute them directly.

3.3.6 Other

  • Blank positions need blank video/audio of the right length inserted, so that the generated video's timeline matches the editor interface.

  • Longer text needs fine-grained splitting so that each subtitle stays in sync with its TTS narration (this step is computed at the UI layer).
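The blank-filling rule above can be sketched like this: scan a track's clips in start order and insert placeholder segments into any gaps, so downstream concatenation preserves each element's on-screen time. Clip fields follow the timeline structure; the function is illustrative.

```python
def fill_gaps(clips, blank_type="blank"):
    """Insert blank segments wherever consecutive clips leave a gap,
    so the rendered video keeps the editor's timeline positions."""
    filled, cursor = [], 0.0
    for clip in sorted(clips, key=lambda c: c["start"]):
        if clip["start"] > cursor:            # gap before this clip
            filled.append({"type": blank_type,
                           "start": cursor,
                           "end": clip["start"]})
        filled.append(clip)
        cursor = max(cursor, clip["end"])
    return filled
```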


3.4 Internal Services

The business layer involves queries and calls for user information, material information, speech synthesis, and more. These functions are provided by Baijiahao and internal Baidu services.

3.5 Offline Scheduling

Dispatch is a distributed task scheduling system responsible for spreading FFmpeg requests evenly across multiple instances (or containers), uploading the generated resources to BOS/VOD, and calling back the Service-layer module with the results of task execution.

FFmpeg, the open-source audio/video codec software introduced earlier, executes the final FFmpeg commands and produces the audio/video files.

4. Offline scheduling framework: distributed FFmpeg scheduling

4.1 Dispatch Architecture Diagram

△ Figure 15 Dispatch architecture

4.2 Implementation principle of Dispatch

  • When an instance starts, it registers itself in a Redis hash: member = IP, value = current queue length : current status : updated timestamp;

  • After receiving a request from the Service-layer module, if the instance's own queue length is 0, it executes the request locally; otherwise it forwards the request to the healthy instance with the shortest queue;

  • Before forwarding, it reads all Dispatch entries from Redis, parses each instance's IP, queue length, status, and updated timestamp, and selects the best instance according to these rules;

  • When consuming a queued request, it fetches the input files, pipe filter stream script, and ASS subtitle file from BOS, invokes FFmpeg to execute the filter stream script, writes the output file to local disk, and uploads it to BOS/VOD;

  • Based on the request parameters, it calls back the Service-layer module interface to update the task status.
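The selection rule can be sketched as follows, given registry entries shaped as described above (member = IP, value = "queue length : status : updated timestamp"). The "ok" status string and the 30-second staleness threshold are assumptions for illustration.

```python
import time

def pick_instance(registry, now=None, stale_after=30):
    """Choose the best instance to forward an FFmpeg request to.

    registry: dict of ip -> "queue_length:status:updated_timestamp",
    mirroring the Redis hash described above. Instances that are not
    healthy or whose heartbeat is stale are skipped.
    """
    now = time.time() if now is None else now
    best_ip, best_queue = None, None
    for ip, value in registry.items():
        queue, status, updated = value.split(":")
        if status != "ok" or now - float(updated) > stale_after:
            continue                          # skip unhealthy or stale instances
        if best_queue is None or int(queue) < best_queue:
            best_ip, best_queue = ip, int(queue)
    return best_ip
```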

5. Article-to-video project: a technical attempt built on the video editor's back-end service

5.1 Edit videos by scene

In contrast to the video editor, the article-to-video project's user interface does away with the timeline and uses the concept of a "scene" instead: one image plus one paragraph of text is a scene, and scenes are concatenated into the video.

△ Figure 16 Creating video by scene unit (design draft)

5.2 From landing-page URL to video

Thanks to the simple scene concept, a landing-page URL can be transformed into scenes automatically, allowing article/gallery authors to start editing and creating video content with one click.

Figure 17 shows a flow chart of this authoring process.

△ Figure 17 FLOW of URL to video

Once the scenes are converted into a timeline, the video editor's interface can be invoked to generate and publish the video.

5.3 Article-to-video demo

To close, several videos generated during the technical verification of the article-to-video project show the actual effect. Please watch them at this link: mp.weixin.qq.com/s/wHrQS9DXE…

6. Summary and outlook

6.1 Combination innovation, adapt to the trend

The Baijiahao online video editor's technology can be summarized simply: the back end uses PHP to translate the timeline data produced by front-end JS into FFmpeg commands, and executes FFmpeg through the Dispatch scheduling framework to produce the final video. Seen this way, the technology involves no high threshold or complex abstract model; our innovation lies mainly in combining existing technologies into a new solution that fits the trend.

With the arrival of the video wave, not only do ordinary users have a large demand for video content, but creators also long for convenient video editing tools. Baijiahao has always taken the creators' perspective, striving to provide them with ever-better video editors. We hope that through our efforts, creators riding the video wave will have powerful oars.

6.2 Technology sharing and win-win cooperation

As the Baijiahao online video editor developed, it attracted the attention of the Media Cloud team from Baidu ACG, and the two teams held in-depth technical exchanges on online video editing technology.

Subsequently, Baidu Media Cloud developed an intelligent video fast-editing service based on this technology. Thanks to Media Cloud's long-term technical accumulation and deep work on the underlying video editing technology, the service uses intelligent sharding plus GPU encoding/decoding, improving video editing and composition efficiency several times over. It also provides new features, making online video editing technology more practical.

At present, Baijiahao is gradually migrating the video editor and the underlying general video editing capability to Media Cloud's intelligent video fast-editing service. As the exporter of online video editor technology, the Baijiahao team has begun to enjoy the dividends of this technical cooperation.

The original link: mp.weixin.qq.com/s/wHrQS9DXE…

———-  END  ———-
