About the author

Kong Weile is a senior Android development engineer on the Qiniu Cloud client team, focusing on audio/video and graphics/image processing. An OpenGL expert, he has participated in developing the live streaming and co-streaming (Lianmai) SDKs, led the architecture design and implementation of the short video SDK, and has extensive experience in client architecture design and performance optimization.

History of short video

Figure 1 shows the history of short video and live streaming. As everyone knows, 2016 was the breakout year of live streaming, when many live streaming platforms were born, such as Panda, Inke, and Betyu. In 2017, the popularity of short video was no less than that of live streaming. One might think short video only became popular in 2017, but in fact, as early as 2015, short video apps such as Kuaishou, Miaopai, and Meipai had already appeared. At that time I happened to be working on a short video App at YY. After I joined Qiniu, I participated in the development of the live streaming and co-streaming (Lianmai) SDKs on the client team, and later began to focus on the short video SDK, committed to making it the best and most useful short video SDK possible.

In 2016, there were 150 million mobile short video users in China, and the number is expected to reach 240 million this year, a growth rate of 58.2%. Clearly, the popularity of short video keeps rising. In recent years the production model for short video has kept evolving, from UGC to PGC and then to the latest MCN (multi-channel network), and both the production capacity and the quality of content have improved greatly.

Figure 2 shows how widely short video is applied across various industries.


Difficulties in developing a short video App

Having introduced the history and trends of short video, the following focuses on the prerequisite knowledge and the difficulties involved in developing one:

1. The inherent threshold of the audio/video field: a deep understanding of the H.264 and AAC encoding formats and their details; how to bring two audio streams to the same parameters when mixing, which algorithm to use for mixing, and so on.

2. Graphics and images: processing camera preview data with OpenGL and general image processing; audio/video encoding and decoding require understanding the RGB and YUV color-space data formats and how to convert between them (see the sketch after this list). Some of these operations, such as beauty filters, layer blending, zoom in/out, rotation, and image cropping, can be done more efficiently with OpenGL.

3. Platform APIs: be familiar with your platform's camera, microphone, codec, and multimedia-processing APIs, otherwise some of them will consume a great deal of time.

4. Advanced features: video editing and advanced features such as beauty, filters, MV special effects, multi-segment shooting, and text effects each place high demands on many aspects of the technology.

5. Compatibility across system versions and device models: this is a perennial problem. Whether on iOS or Android, there are more and more models and system versions, which inevitably brings compatibility issues. For example, video encoded on a small number of Android models cannot be played back on iOS; such compatibility issues have to be addressed.

6. Performance and resource-usage optimization: the computing resources available to a mobile application are strictly limited by the operating system. When performing complex work such as audio/video capture, rendering, and encoding, the application must still have enough resources to run smoothly, which requires developers to have strong tuning skills.
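
As a small illustration of the color-space knowledge mentioned in point 2, the sketch below converts a single pixel of NV21 (the default Android camera preview format) to ARGB using the limited-range BT.601 integer approximation. It is a minimal, illustrative helper, not SDK code; in practice whole frames are converted on the GPU (with an OpenGL shader) or with a library such as libyuv.

    // Minimal sketch: convert one pixel from NV21 (YUV420SP) preview data to ARGB.
    public final class YuvUtil {
        public static int nv21PixelToArgb(byte[] nv21, int width, int height, int x, int y) {
            int yIndex = y * width + x;
            int uvIndex = width * height + (y / 2) * width + (x & ~1);
            int yVal = nv21[yIndex] & 0xFF;
            int v = nv21[uvIndex] & 0xFF;       // NV21 stores V first, then U
            int u = nv21[uvIndex + 1] & 0xFF;
            int c = yVal - 16, d = u - 128, e = v - 128;
            int r = clamp((298 * c + 409 * e + 128) >> 8);
            int g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
            int b = clamp((298 * c + 516 * d + 128) >> 8);
            return 0xFF000000 | (r << 16) | (g << 8) | b;
        }

        private static int clamp(int value) {
            return value < 0 ? 0 : (value > 255 ? 255 : value);
        }
    }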

Solving the above difficulties is the first order of business, but development time is another question developers must consider. Building a good short video App, from getting familiar with the audio/video field, to solving system compatibility issues, to writing the complex business logic and the corresponding UI, takes three to six months, which costs a great deal of time and energy. Our team also hit plenty of pitfalls at the beginning of short video SDK development, and it took nearly a month for the SDK to truly stabilize. With that groundwork done, we can now help an App integrate the short video SDK in about a week.


Short video SDK architecture design

Next, I will introduce some of the main work our team did while building the short video SDK, the most important of which is the SDK's architecture design, including the design philosophy, the architecture diagram, the overall data flow, and the design of the individual modules.

1. SDK architecture design philosophy

When talking about the SDK's design philosophy, naming conventions have to be mentioned. In keeping with Qiniu's corporate principle of being “simple and reliable”, our naming conventions are unified, simple, and refined. For example, the externally exposed core classes are prefixed with PLShortVideo, as shown in Figure 3, naming the recording, trimming, and editing modules respectively. Parameter configuration classes follow the PLxxxSetting pattern (Figure 4), and interface callback classes follow the PLxxxListener pattern.

The second principle we follow is high modularity with pluggable modules. High modularity requires that every class and method “lives up to its name” and “does only its own job”, which keeps the logic clear; it also promotes reuse and reduces duplicated code. Figure 5 shows the core transcoding class in the SDK: because both trimming and editing need a decode-and-re-encode pass when saving, this core transcoding class achieves a high degree of reuse.
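
As a hypothetical sketch of that reuse (the names below are illustrative, not the SDK's actual classes), the trimming and editing modules differ only in which effects they apply to each frame, so the decode, process, and re-encode loop can live in a single shared core:

    // Illustrative sketch of a shared transcode core; names are hypothetical.
    interface FrameProcessor {
        // Returns the (possibly new) OpenGL texture id after applying effects.
        int onFrame(int textureId, long presentationTimeUs);
    }

    final class TranscodeCore {
        private final FrameProcessor processor;

        TranscodeCore(FrameProcessor processor) {
            this.processor = processor;
        }

        void transcode(String srcPath, String dstPath) {
            // 1. Demux and hardware-decode srcPath.
            // 2. For each decoded frame, call processor.onFrame(...).
            // 3. Hardware-encode the result and mux it into dstPath.
        }
    }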

Figure 6 shows how the short video SDK is divided into packages. From the table we can clearly see the division of functionality: different features are placed in different packages. We did not adopt FFmpeg software decoding and encoding; instead, we use the Android and iOS system APIs for hardware decoding and encoding as much as possible. This not only reduces the package size but is also much faster. Although it adds a lot of technical difficulty and means stepping into many pitfalls, we stick with this approach. When introducing third-party libraries, we also configure and trim them carefully to strictly control the package size, keeping the total “small but refined” (about 1.5 MB). The last item in the table is the built-in filter module, whose filter resources can be copied selectively; the SDK detects them automatically. These are some of our ideas on module design.

The third point is decoupling from the UI. As the screenshots of different Apps in Figure 7 show, every App has its own design. A short video SDK absolutely must not restrict how users design their UI. Some short video SDKs on the market hard-code the UI as part of the SDK, which is very unfriendly to customers designing their own interface. We took a different approach: the SDK and the UI are decoupled, the customer's UI is fully customizable, and there is only one place in the entire SDK that accepts a view:

PLShortVideoRecorder#prepare(GLSurfaceView preview, …)
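
A minimal usage sketch of that decoupling (only prepare(GLSurfaceView, …) is confirmed above; the constructor and the commented-out settings are assumptions): the App keeps full control of its layout and simply hands its own GLSurfaceView to the SDK.

    // The App owns the layout; only the GLSurfaceView is passed to the SDK.
    // Constructor and setting parameters are assumptions, not verified signatures.
    GLSurfaceView preview = findViewById(R.id.my_preview);  // declared in the App's own XML
    PLShortVideoRecorder recorder = new PLShortVideoRecorder();
    recorder.prepare(preview /*, camera / microphone / encoding settings ... */);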

Next is extensibility: we follow the principle of being highly extensible and open. During recording and editing there are data callbacks, which allow third-party libraries to provide beauty, filters, stickers, special effects, and other features.
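
Such a data callback could look roughly like the following (a hypothetical interface for illustration; the actual callback names in the SDK may differ): a third-party beauty or filter library receives every captured frame as an OpenGL texture and returns a processed texture.

    // Hypothetical frame-callback interface for plugging in third-party
    // beauty / filter / sticker libraries; the names are illustrative only.
    interface VideoFilterListener {
        // Called on the OpenGL thread for every captured frame.
        // Returns the texture id to be rendered and encoded (possibly a new texture).
        int onDrawFrame(int texId, int texWidth, int texHeight, long timestampNs);
    }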

Finally, configurable parameters: besides general parameters such as camera resolution, frame rate, and microphone sample rate, parameters for beauty and other features can also be configured.
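
A sketch of what such configuration could look like, following the PLxxxSetting naming convention from Figure 4 (the specific class and setter names here are assumptions for illustration, and the values are examples):

    // Illustrative configuration following the PLxxxSetting convention;
    // class and setter names are assumptions, values are examples.
    PLCameraSetting cameraSetting = new PLCameraSetting();
    cameraSetting.setPreviewSize(1280, 720);
    cameraSetting.setPreviewFrameRate(30);

    PLMicrophoneSetting microphoneSetting = new PLMicrophoneSetting();
    microphoneSetting.setSampleRate(44100);

    PLFaceBeautySetting beautySetting = new PLFaceBeautySetting();
    beautySetting.setWhitenLevel(0.5f);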

2. Short video SDK architecture

Figure 8 shows the architecture diagram of the Android short video SDK, which can be divided into four layers. The first layer is the application layer (applications built on the SDK); the second layer is the SDK's external interface layer (classes prefixed with PLShortVideo); the third layer is the core layer, mainly internal modules (both Java and native); the fourth layer is the Android system layer.

Figure 9 shows the overall data flow. The input module supports two ways of acquiring data: one is capturing through the camera and microphone, where the captured data can be processed (beauty, face recognition, etc.); the other is importing and decoding files. The editing module provides very rich functionality, such as adding subtitles, MV effects, and background music. The encoding module mainly supports H.264 software/hardware encoding and AAC software/hardware encoding; the encoded data is muxed into MP4 and sent to the output module, which can save it locally or upload it over HTTP. A few of these modules are described in more detail below.
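
For the hardware-encoding path, configuring an H.264 encoder on Android goes through MediaCodec/MediaFormat roughly as follows (a minimal sketch; the resolution, bitrate, and key-frame interval are just example values, not the SDK's actual defaults):

    import android.media.MediaCodec;
    import android.media.MediaCodecInfo;
    import android.media.MediaFormat;
    import java.io.IOException;

    final class EncoderSketch {
        // Returns a started H.264 hardware encoder that takes frames from an input Surface.
        static MediaCodec createH264Encoder(int width, int height) throws IOException {
            MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
            format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                    MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface); // feed frames via a Surface
            format.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);        // example: 2 Mbps
            format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
            format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);        // one key frame per second
            MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
            encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
            encoder.createInputSurface();    // OpenGL renders into this Surface (keep a reference in real code)
            encoder.start();
            return encoder;
        }
    }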

Figure 10 is a schematic diagram of the recording module. Its key task is acquiring frame data. Video frames can be obtained not only from the camera but also from screen recording, while audio frame data mainly comes from the microphone. The dashed Filter module implements the built-in beauty/filter features; in addition, thanks to the texture and YUV data callback mechanism, third-party libraries can also provide beauty, filters, special effects, and so on. The processed data then goes through OpenGL for cropping, scaling, rotation, and other operations. This work could be done on the CPU, but it would be slow; the GPU is the more sensible choice. Finally, once the texture is obtained, it is split into two paths: one for rendering and one for encoding and muxing. The two threads share the same texture, which greatly reduces resource usage and improves the SDK's efficiency.
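
Sharing one texture between the render thread and the encoder thread relies on sharing the OpenGL context. On Android this is typically done by passing the preview thread's EGLContext as the share context when the encoder thread creates its own context; a minimal sketch with EGL14 (error handling omitted):

    import android.opengl.EGL14;
    import android.opengl.EGLConfig;
    import android.opengl.EGLContext;
    import android.opengl.EGLDisplay;

    final class SharedContextSketch {
        // Creates an EGL context for the encoder thread that shares textures with
        // the preview (render) thread's context.
        static EGLContext createSharedContext(EGLDisplay display, EGLConfig config,
                                              EGLContext previewContext) {
            int[] attribs = { EGL14.EGL_CONTEXT_CLIENT_VERSION, 2, EGL14.EGL_NONE };
            // Passing previewContext instead of EGL_NO_CONTEXT makes texture ids created
            // on the preview thread valid on the encoder thread as well.
            return EGL14.eglCreateContext(display, config, previewContext, attribs, 0);
        }
    }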

Figure 11 shows a schematic diagram of the editing module. First a video file is imported (either shot with the short video SDK or imported from elsewhere). After demuxing, we get the corresponding frame data; the audio and video decoders then produce PCM data and textures respectively, which are fed into the editing engine. There, various kinds of processing can be applied (watermarks, text effects, background music, multi-track audio mixing, etc.). The edited data then goes down two paths: one is rendered for playback, the other is transcoded and saved.

Figure 12 shows the implementation idea behind MV special effects. The data captured by the camera does not need to be decoded, while the frame data of the MV video file must be decoded before it can be processed. The main role of SurfaceTexture is to deliver a callback for each decoded frame, notifying you that the texture can be updated on the OpenGL thread. Since these notifications may come from multiple threads at the same time, it is important to lock around the frame callback to keep the MV frames from going out of sync. After the update, the corresponding textures are obtained and can be blended to produce the final MV effect.
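
A minimal sketch of that mechanism (an illustrative class, not SDK code): the MV decoder renders into a Surface backed by a SurfaceTexture, the frame-available callback may arrive on another thread, so a lock and a flag are used, and updateTexImage() is only ever called on the OpenGL thread.

    import android.graphics.SurfaceTexture;

    final class MvFrameSource implements SurfaceTexture.OnFrameAvailableListener {
        private final SurfaceTexture surfaceTexture;   // backed by an OES texture id
        private final Object frameLock = new Object();
        private boolean frameAvailable = false;

        MvFrameSource(int oesTextureId) {
            surfaceTexture = new SurfaceTexture(oesTextureId);
            surfaceTexture.setOnFrameAvailableListener(this);
            // new Surface(surfaceTexture) would be handed to the MV decoder as its output.
        }

        @Override
        public void onFrameAvailable(SurfaceTexture st) {
            // May be called from another thread; just record that a frame arrived.
            synchronized (frameLock) {
                frameAvailable = true;
                frameLock.notifyAll();
            }
        }

        // Called on the OpenGL thread before drawing: wait for a decoded MV frame so the
        // camera and MV textures stay in sync, then latch it into the OES texture.
        void awaitAndUpdateTexture() throws InterruptedException {
            synchronized (frameLock) {
                while (!frameAvailable) {
                    frameLock.wait();
                }
                frameAvailable = false;
            }
            surfaceTexture.updateTexImage();   // must run on the thread that owns the GL context
        }
    }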

Figure 13 shows the module diagram of the log system. The log system exists mainly to make troubleshooting convenient and to locate and debug problems quickly. We output the SDK version, device model, system version, and key configuration one by one, so that users can troubleshoot based on this information.
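
On Android this environment information is easy to collect; a minimal sketch (the SDK_VERSION constant and log tag are illustrative placeholders):

    import android.os.Build;
    import android.util.Log;

    final class SdkLogger {
        private static final String TAG = "ShortVideo";
        private static final String SDK_VERSION = "x.y.z";   // illustrative placeholder

        static void dumpEnvironment() {
            Log.i(TAG, "SDK version: " + SDK_VERSION);
            Log.i(TAG, "Device model: " + Build.MANUFACTURER + " " + Build.MODEL);
            Log.i(TAG, "System version: Android " + Build.VERSION.RELEASE
                    + " (API " + Build.VERSION.SDK_INT + ")");
            // Key configuration (resolution, bitrate, sample rate, ...) would be logged here too.
        }
    }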


Pitfalls we stepped into

Of course, the development process is never entirely smooth; there are always a few pitfalls to step into before the SDK gets better. Here are some of the pitfalls we hit and how we went about diagnosing them.

Some edited video clips showed garbled (corrupted) frames

After analyzing some sample videos provided by customers, we found that the problem occurred precisely with High Profile videos containing bi-directionally predicted B-frames. As shown in Figure 14, the B-frame (3) sits in the middle and references the P-frames (2, 4) on either side; that is the order in which the frames are displayed. But in frame storage and during decoding, the B-frame (3) comes after the two P-frames.

The PTS of the next frame must be greater than or equal to the PTS of the previous frame:

PTS(i+1) >= PTS(i)

MediaExtractor, however, reads video frames in DTS order, which means the PTS can jump backwards. That is exactly where the problem lay: re-muxing in DTS order inevitably drops the trailing B-frames, which causes the garbled frames. At present MediaMuxer only supports muxing B-frames on Android 7.0 and above, so unless your App's minimum supported version is 7.0, it is advisable to use FFmpeg for muxing.
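
The backward jump is easy to observe by walking the video track of such a file with MediaExtractor and comparing consecutive sample times; a minimal sketch:

    import android.media.MediaExtractor;
    import android.media.MediaFormat;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    final class BFrameCheck {
        // Walks the video track in extraction (DTS) order and reports every place the
        // presentation timestamp jumps backwards, i.e. where B-frames appear.
        static void findBackwardPts(String path) throws IOException {
            MediaExtractor extractor = new MediaExtractor();
            extractor.setDataSource(path);
            for (int i = 0; i < extractor.getTrackCount(); i++) {
                MediaFormat format = extractor.getTrackFormat(i);
                if (format.getString(MediaFormat.KEY_MIME).startsWith("video/")) {
                    extractor.selectTrack(i);
                    break;
                }
            }
            ByteBuffer buffer = ByteBuffer.allocate(2 * 1024 * 1024);
            long previousPtsUs = Long.MIN_VALUE;
            while (extractor.readSampleData(buffer, 0) >= 0) {
                long ptsUs = extractor.getSampleTime();
                if (ptsUs < previousPtsUs) {
                    System.out.println("PTS went backwards: " + ptsUs + " < " + previousPtsUs);
                }
                previousPtsUs = ptsUs;
                extractor.advance();
            }
            extractor.release();
        }
    }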

“Fake it until you make it” is my closing word of encouragement. Our client team has made many attempts to polish the SDK into its best possible state, and we went through a lot of blood, sweat, and tears before achieving today's results. We still have plenty of room to improve, and we will keep working hard. Thank you!

-END-