
The full text is 3979 words, and the expected reading time is 9 minutes.

Introduction:

Bilibili ("B station") has become the largest video barrage (bullet-screen) website in China, and other video platforms, as well as comics, reading, and other content platforms, have also added barrage features. Barrage has become an important means of content interaction.

Inside Baidu there are many media product lines, including short video, long video, and live streaming, and they all face the same barrage problems: a large volume of barrages hurting the video experience, plus many business requirements such as interaction (likes), character barrages, and colored barrages.

At present, open-source barrage implementations on the market are mostly based on the DanmakuFlameMaster library, which is no longer maintained and supports relatively simple functionality, making integration, maintenance, and extension difficult. For these reasons, it is important to build a barrage component that is flexible to integrate and rich in gameplay. BDDMBarrage is the barrage SDK we developed to meet these requirements. It supports customized barrage styles, data-source injection, barrage piercing through people, and other features.

This article walks through our barrage solution. For ease of understanding, we start from the macroscopic barrage architecture, then go into the portrait segmentation technology and algorithms, server deployment, barrage masks and mask caching, the problems encountered during development and the strategies used to solve them, and client-side performance optimization. Finally, we share some thoughts and plans for future work.

Take a look at the final implementation:


1. Barrage architecture diagram

  • Barrage data management module: its main job is to feed data to the barrage rendering module. It first checks the barrage cache module for data at the current time point; on a hit, the data goes straight to the rendering module, and on a miss, it triggers a network request for barrage data and hands the data over once it arrives. Because barrage is highly time-sensitive, this module includes a prefetch strategy so that when the rendering module asks for data it hits the cache as often as possible, reducing latency from network requests.

  • Barrage rendering module: the barrage time engine makes it easy to uniformly control the overall barrage speed, e.g. double speed, slow motion, and pause. The barrage scheduling module provides a rigorous track algorithm: it sets a reasonable speed based on each barrage's own length and guarantees that no two barrages in the same track ever collide (see the sketch after this list). The module also supports customized barrage styles and interactions: an access party can design the barrage UI to match its own app, and can explore new ways to liven up the app's interactive atmosphere. For example, the Fanle app designs character barrages around its stories, runs operations barrages, and can also add VIP barrages.

  • Barrage piercing module: this module uses AI image segmentation to generate a series of mask files, processes them into masks appropriate for the player's video cropping mode, and finally renders the corresponding mask onto the barrage view in sync with the video timeline. The concrete scheme is introduced later.
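As an illustration of the track algorithm, here is a minimal sketch of a per-track collision check for scrolling barrages. It assumes items enter at the right edge and scroll left at a constant per-item speed; all names are hypothetical, not the BDDMBarrage API.

```kotlin
// Hypothetical per-track collision check for scrolling barrages.
data class DanmakuItem(val enterTime: Long, val width: Float, val speed: Float) // speed in px/ms

class Track(private val screenWidth: Float) {
    private var last: DanmakuItem? = null

    /** True if `next`, entering at next.enterTime, can never touch `last`. */
    fun canAccept(next: DanmakuItem): Boolean {
        val prev = last ?: return true
        // Tail position of prev at the moment next enters on the right edge.
        val prevTail = screenWidth - (next.enterTime - prev.enterTime) * prev.speed + prev.width
        // 1) prev's tail must already be fully on screen, or they overlap at launch.
        if (prevTail > screenWidth) return false
        // 2) If next is faster, it must not catch prev before prev exits on the left.
        if (next.speed > prev.speed) {
            val prevExitTime = prev.enterTime + (screenWidth + prev.width) / prev.speed
            val catchTime = next.enterTime + (screenWidth - prevTail) / (next.speed - prev.speed)
            if (catchTime < prevExitTime) return false
        }
        return true
    }

    fun accept(item: DanmakuItem) { last = item }
}
```

One way to derive speed from length, matching the "reasonable speed per barrage" idea, is speed = (screenWidth + width) / displayDuration, so longer barrages move faster but every barrage stays on screen for the same duration.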

2. Barrage piercing through people

After the barrage component was integrated into several product lines, we found that every one of them set the number of barrage tracks to 3, specifically out of concern that too many barrages would hurt users' video consumption experience. This protects the viewing experience, but greatly weakens the atmosphere the barrage creates. So the barrage component's engineers set out to build a piece of "black technology", barrage piercing, hoping to solve this problem. Here's a video showing where we are so far.

Barrage piercing architecture diagram:

From bottom to top, the whole pipeline is divided into the algorithm side, the server side, and the client side:

First, the algorithm side extracts video frames at 32 frames per second and performs face recognition on each frame; with face tracking and smoothing, it generates the face metadata for every frame. Second, the server deduplicates highly similar face metadata across frames and then packages the metadata into one compressed packet per 3 minutes of video. On the client SDK side, the compressed packets for the relevant time period are prefetched from the server according to playback progress, and at each frame the barrage view is blended with the face metadata for mixed rendering.

The following sections describe what each module and submodule does:

1. Algorithm side

1) Video frame extraction module: the video stream is sampled at a rate of 32 frames per second (configurable). The higher the extraction rate, the smoother the mask and the more refined its picture quality; however, the downstream face recognition takes more time, and the performance cost on the phone, such as memory and power consumption, also grows.

2) Model training module: multiple images of the show's characters, shot from multiple angles, are fed in to train the model and generate a per-show face library; combined with the pre-trained celebrity library, these two libraries greatly improve the accuracy of face detection;

3) Face detection: identifies the faces in each video frame and outputs the face contour data;

4) Face similarity: to reduce network transmission pressure, when two frames have a similarity greater than 95%, one or more of them are discarded, as sketched below.
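A minimal sketch of this filter, assuming "similarity" is measured as the intersection-over-union (IoU) of consecutive binary masks; the 95% threshold comes from the text, the metric itself is our assumption.

```kotlin
// Drop a frame when its binary mask overlaps the previously kept mask
// by more than 95% IoU (assumed similarity metric).
fun iou(a: BooleanArray, b: BooleanArray): Double {
    var inter = 0
    var union = 0
    for (i in a.indices) {
        if (a[i] && b[i]) inter++
        if (a[i] || b[i]) union++
    }
    return if (union == 0) 1.0 else inter.toDouble() / union
}

fun dedupe(masks: List<BooleanArray>): List<BooleanArray> {
    val kept = mutableListOf<BooleanArray>()
    for (mask in masks) {
        if (kept.isEmpty() || iou(kept.last(), mask) <= 0.95) kept.add(mask)
    }
    return kept
}
```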

2. Server side

1) Frame metadata management: manages the frame data provided by the algorithm side, partitions the large volume of video metadata by video and by time period within each video, builds a mapping index, and provides the SDK with the metadata group for a given time period of a video;

2) Merge: the algorithm side emits metadata frame by frame, but the client cares about how a given face changes over time, so the server merges the per-frame metadata and regroups it into per-face data (see the sketch after this list);

3) Barrage service: provides the basic barrage data.
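A minimal sketch of the merge step, assuming the algorithm side tags each detected face with a track id; the field names are hypothetical.

```kotlin
// Regroup per-frame face metadata into per-face time series,
// which is the shape the client actually consumes.
data class FaceFrame(val frameIndex: Int, val trackId: Int, val contour: String)

fun mergeByFace(frames: List<FaceFrame>): Map<Int, List<FaceFrame>> =
    frames.groupBy { it.trackId }
        .mapValues { (_, faceFrames) -> faceFrames.sortedBy { it.frameIndex } }
```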

3. Client SDK

1) Rendering module: there are two candidate schemes:

▌ The first is to draw directly with Canvas blending via setXfermode. This mode selectively composites the two layers on the canvas: over the human figure we draw only the mask layer and skip the barrage layer, which produces the masking effect.
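A minimal sketch of this scheme on Android, assuming the mask arrives as a Bitmap whose opaque pixels cover the human figure; the class and method names are illustrative, not the SDK's API.

```kotlin
import android.content.Context
import android.graphics.*
import android.view.View

// Draw the barrage into an offscreen layer, then punch a human-shaped
// hole with PorterDuff DST_OUT so nothing covers the person.
class MaskedDanmakuView(context: Context) : View(context) {
    var maskBitmap: Bitmap? = null
    private val holePaint = Paint().apply {
        xfermode = PorterDuffXfermode(PorterDuff.Mode.DST_OUT)
    }

    override fun onDraw(canvas: Canvas) {
        val saved = canvas.saveLayer(0f, 0f, width.toFloat(), height.toFloat(), null)
        drawDanmaku(canvas) // normal barrage drawing (text items, etc.)
        maskBitmap?.let {   // opaque mask pixels erase the barrage beneath them
            canvas.drawBitmap(it, null, Rect(0, 0, width, height), holePaint)
        }
        canvas.restoreToCount(saved)
    }

    private fun drawDanmaku(canvas: Canvas) { /* draw barrage items here */ }
}
```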

▌ The second is to use OpenGL and, in the fragment shader, output the appropriate color according to the mask texture passed in. Our initial implementation used OpenGL drawing. By reading the source code, we found that the two schemes are consistent in their underlying implementation: Canvas is ultimately also a drawing API backed by Surface.
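For the OpenGL scheme, a plausible fragment shader (embedded here as a Kotlin string constant) keeps barrage pixels only where the mask reports no person; this is our assumption of the shader's shape, not the SDK's actual shader.

```kotlin
// GLSL ES fragment shader: scale the barrage color by (1 - mask alpha),
// so fully masked (person) pixels become transparent.
const val MASKED_FRAGMENT_SHADER = """
    precision mediump float;
    varying vec2 vTexCoord;
    uniform sampler2D uDanmakuTex; // rendered barrage layer
    uniform sampler2D uMaskTex;    // per-frame human mask
    void main() {
        vec4 danmaku = texture2D(uDanmakuTex, vTexCoord);
        float person = texture2D(uMaskTex, vTexCoord).a;
        gl_FragColor = danmaku * (1.0 - person);
    }
"""
```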

2) Face data cache: caches the index table for the whole video, uses it to locate the specific mask package, and picks from that package the mask matching the current playback progress (see the sketch after this list);

3) Basic barrage control APIs and configuration APIs.
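A minimal sketch of the two-level lookup in item 2), assuming the index maps time intervals to mask packages (matching the {vid}_{interval}_{index}.zip naming described below); all field names are hypothetical.

```kotlin
class Mask // decoded SVG contour for one frame

// Index entry: which zip package covers which time span of the video.
data class IndexEntry(val startMs: Long, val endMs: Long, val zipName: String)

class MaskCache(private val index: List<IndexEntry>, private val fps: Int = 32) {
    // zipName -> masks unpacked from that package, in frame order.
    private val unpacked = mutableMapOf<String, List<Mask>>()

    fun maskAt(positionMs: Long): Mask? {
        val entry = index.firstOrNull { positionMs in it.startMs until it.endMs } ?: return null
        val masks = unpacked[entry.zipName] ?: return null // package not downloaded yet
        val frame = ((positionMs - entry.startMs) * fps / 1000).toInt()
        return masks.getOrNull(frame)
    }
}
```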

3. Service deployment

1) Environment dependencies: FFmpeg, Python 2.7, OpenCV, NumPy

  • Face detection service: 2 QPS

  • Portrait segmentation service: 10 QPS

2) Offline data storage structure

Directories and file suffixes used during offline processing. The directory name is {vid}_{media_id}: a folder is generated per video based on its vid and media_id, containing the following sub-directories:

  1. frame (extracted frame files, .jpg)

  2. humanseg (Base64 image data after portrait segmentation, .json)

  3. contour_png (contour images generated during processing, .png)

  4. contour_svg (contours saved as SVG, .svg)

  5. zip (final packaged files, .zip)

  6. mapping (index file, .json)

  7. log (script logs)

3) Frame extraction script

Frames are extracted using an internal Baidu portrait script.


4. Problems with a face model built into the SDK

We also tried a scheme with the face model built into the client. It ran into the following problems:

1. A large amount of frame data is generated during playback (roughly one frame every 16 ms). Model recognition speed hits a performance bottleneck and frames are dropped, so the mask effect is not refined enough; edge handling around heads suffers the most.

2. With on-device recognition, the phone's CPU consumption rises, and with it the power consumption; it may also increase the player's stall rate, and the overall memory pressure is very high.


5. Thorny issues

1) Mask files are too large

If a 2-minute video is frame-extracted and segmented at 32 frames per second, we get 3,840 mask files. The mask currently produced by the image segmentation operator is a binary image (PNG format) of roughly 100+ KB, so a 2-minute video generates about 375 MB of mask files in total. Designed to such a spec, the barrage mask files could be larger than the video itself and consume more bandwidth, so this clearly cannot ship. In addition, because the mask files are so large, downloading them takes a long time: the video would already have been playing for a while with the mask still downloading, a very poor user experience.

We attacked this problem from two directions. First, convert the binary images into SVG files: SVG is pure XML and compresses very well, and we only need to record the binary image's human contour (a set of points) in the SVG. We can also flexibly adjust the granularity of the recorded contour, which further tunes the SVG size (sketched below); in the end we shrank a binary image from 100+ KB down to a few hundred bytes. Second, store the mask set as segmented compressed packages, which enables download-while-playing: rendering can begin as soon as the first package is downloaded, improving the user experience, and masks are only downloaded up to the point the video has played, which also saves bandwidth.
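As a sketch of the granularity knob, naive every-n-th-point decimation before writing the SVG path; the real pipeline may use a smarter simplification, so treat this as an assumption.

```kotlin
// Keeping every n-th contour point shrinks the SVG path roughly n-fold.
data class Pt(val x: Int, val y: Int)

fun decimate(contour: List<Pt>, step: Int): List<Pt> =
    contour.filterIndexed { i, _ -> i % step == 0 }

// "M x,y x,y ... Z": coordinates after the moveto act as implicit line-tos.
fun toSvgPath(contour: List<Pt>): String =
    contour.joinToString(" ", prefix = "M ", postfix = " Z") { "${it.x},${it.y}" }
```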

SVG compressed package format:

▎ Zip file naming rule:

{vid}_{interval}_{index}.zip

Example: 4752528107223374247_10_0.zip

▎ SVG file naming rule:

{index}.svg (example: 0000001.svg)

▎ Index file structure: index.json

2) Mobile memory consumption is too high

Phones have limited memory, especially low-end devices, and video apps already consume a lot of it, so the barrage SDK's memory consumption directly determines whether it is usable.

Each mask file is about 100 to 200 KB, and our masks amount to roughly 32 frames per minute, which is still sizable even after merging. Taking 100 KB per mask, the memory usage for one minute is:

Total memory = 32 × 100 KB = 3,200 KB ≈ 3.125 MB

This is acceptable on iOS, but on Android occupying about 3 MB per minute performs poorly even with memory reclamation: ordinary reclamation itself costs performance and leads to visible app stutter.

Solution:

Allocate a fixed block of local memory according to the video duration and recycle within it. This avoids frequent memory reclamation and caps unbounded memory growth. A sketch of the idea follows.
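A minimal sketch of that fixed pool: a ring of preallocated buffers reused in place, so per-frame mask handling causes no new allocations. Sizes and names are illustrative.

```kotlin
// Fixed ring of reusable mask buffers; the oldest slot is overwritten
// instead of allocating per frame, so garbage collection stays quiet.
class MaskBufferPool(slots: Int, slotBytes: Int) {
    private val buffers = Array(slots) { ByteArray(slotBytes) }
    private var next = 0

    fun obtain(): ByteArray {
        val buf = buffers[next]
        next = (next + 1) % buffers.size
        return buf
    }
}

// e.g. MaskBufferPool(slots = 8, slotBytes = 200 * 1024) holds ~8 masks
// at the 200 KB upper bound mentioned above.
```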

6. Future outlook

Producing face data is costly: running the script for a 5-minute video on a single service takes about 2 hours. But the data produced is rich; besides the binary face and figure maps, it can also yield other body data, such as people's movement trajectories. As a next step, we plan to turn the face-data script, the body-data script, and the portrait movement trajectory into basic building blocks. On top of these, many innovative cases can be built, such as barrage that interacts with the people on screen, barrage that follows a person, and swapping different characters' heads in the video.

