
This article was published by the Tencent Video Cloud terminal team in the Cloud+ Community column.

Chang Qing graduated and joined Tencent in 2008. She has worked on client development ever since, contributing to PC QQ, mobile QQ, QQ Internet of Things and other products. She currently leads the optimization and delivery of audio and video terminal solutions in the Tencent Video Cloud team, helping customers obtain first-class audio and video solutions at a controllable R&D cost. The team's current product line includes interactive live streaming, video on demand, short video, real-time video calls, image processing, AI and more.

Preface

Before AlphaGo became popular, Go was a pastime that few people played. Count how many of your friends knew how to play it (not counting those who tried a couple of games on a whim after AlphaGo became famous). By contrast, Texas Hold 'em is so popular that our team decided to drop the traditional lottery at one of our annual meetings and let Texas Hold 'em decide the prize list instead.

Why is a game as profound as Go less popular than Texas Hold 'em? Are the rules too complicated?

The real reason is that the barrier to entry is too high. In Go, a gap in skill turns into a crushing gap in win rate: if one of two players is a single level weaker, that player has little chance of winning. This has turned Go into a game for a small group of highly skilled players, where beginners mostly experience being beaten. Texas Hold 'em is different. Whether or not you are a pro, luck plays a big part, which makes it a game that everyone can play.

The same is true of video editing. Shortly after computers became able to handle multimedia work, a lot of video editing software appeared. When I go home for the Spring Festival, for example, I can see a copy of “Meeting Pictures from Entry to Mastery” on my elders' bookshelf. This kind of professional software is what older people, taking up photography at home after they retire, use to add special effects to their photos and videos or to cut a clip. I quickly lost interest after flipping through a few pages: there were too many procedural operations that would take a lot of time to study and practice, and I lacked the motivation to learn the software.

Thus, the threshold of entry determines whether an activity or a product is generally accepted by the public or remains in a small professional circle.

Not so long ago, video editing was a niche activity. Even though many people were interested in creating their own videos, they were stopped at the door by professional tools. Only when Kuaishou and Douyin emerged was the barrier to entry suddenly lowered. As long as you know how to use a mobile phone, you can quickly add special effects to your videos and do some personalized editing. This is what made the two phenomenally popular apps take off, as in the saying “Kuaishou in the north, Douyin in the south”.

What I want to tell you, though, is that the transition from professional audio and video software to Kuaishou and Douyin for the masses was not a simple adjustment at the interaction level. There is a complex story behind it. Today, I'll take you through the technical side of that story.

Principles of editing

The following diagram briefly shows the general process of video editing, which is similar in all video editing software:

The first detail to understand is that video files on mobile phones (such as MP4 or MOV) cannot be used directly for special effects editing, because the data is compressed. Editing these encoded files directly is very difficult. Without decoding them, all we can do is simple trimming and splicing.

Here is a real-life analogy. Say you have just moved into a new home and go to IKEA with your partner to choose furniture. IKEA's service is very thoughtful: at the entrance they give you a card and a pencil so that you can write down the numbers of the furniture or accessories you want to buy, and then either pick everything up in the warehouse in one go or arrange home delivery. This is all very convenient, until you find yourself over budget and decide to swap some of the more expensive models for cheaper ones. Now there is a difficulty, because you cannot do that from the numbers alone. You have to go back to the display area and find the product behind each number before you can tell whether there is a cheaper one to choose from. Otherwise, all you can do is cross some models off the list.

The same is true of video editing. If the encoded file is not decoded, the computer cannot interpret the images and sounds it contains, which limits what we can do. This explains the complexity of the figure above:

To apply special effects to audio and video, the original file (such as MP4 or MOV) must first be split into separate video and audio streams. Take the video stream as an example: we need to decode it frame by frame, so that each decoded frame is a picture the computer can display. From this picture, the computer knows the color and brightness of every pixel, and it can then add subtitles, apply dynamic effects, or overlay widgets on the screen. The same is true for sound. An encoded AAC file is only suitable for transmission and storage. The computer needs to decode it into a PCM waveform first to know the amplitude at each sample point, and only then can it edit the sound.

Adding video and sound effects is not the end, though. What users really want is an edited video, not a pile of pictures and sounds. So after the effects have been superimposed on the decoded pictures, the pictures must be fed to a video encoder once again and then combined with the audio stream into a new video file.

Preview function

The above process may seem complicated, but it is only the last step in video editing, the “final generation” of the editor. Before this step, someone must use the editor to decide what type of effects to add at what time.

However, no one knows in advance which effect belongs at which moment of a video. Users need to improvise according to the specific content, so the preview function has to be a P0 (top-priority) feature.

As you can see above, a simple preview is not technically difficult in principle: just play back the pictures and sounds after the effects have been applied. The real difficulty is optimizing the performance of the on-screen effects while keeping audio and picture in sync.

Because previewing a video is different from previewing a single picture, the preview must be smooth. Otherwise the user will feel obvious lag while editing, which is as uncomfortable as a fish bone stuck in the throat and makes a good product experience hard to achieve.

Performance optimization and audio-picture synchronization are still relatively easy to solve. The real difficulty is frame-by-frame preview: whichever frame the user wants to see the effect on should appear immediately, which is the so-called “heart move” in the title.

Here we face a technical problem that is hard to overcome:

Frame-by-frame preview

Despite the rapid progress of mobile phone chips in the past two years, their limited power budget means that PCs can still outperform mobile phones. But have you noticed that even on a PC this powerful you can suffer from “drag lag”, the stutter you get when you drag the progress bar with the mouse?

This is because behind this seemingly simple stutter there is a huge performance trap, namely the decoding rules of the decoder, which I explain with the following figure:

Let's say you drag the mouse from picture 3 to picture 13. On the face of it, all the computer needs to do is read picture 13 and display it, which does not sound difficult.

But in fact, when the encoder processed picture 13, it kept only the difference between picture 13 and picture 12 in order to reduce the storage space the picture takes. This means that to show picture 13, the computer first needs to decode picture 12. And to decode picture 12, it needs to decode picture 11…

In the end, the computer has to go all the way back to picture 7 to break this chain. Picture 7 is special: when the encoder processed it, it did not refer to any other picture, so restoring it does not depend on the content of any other picture (and for that very reason, it takes up more space on disk than the other pictures).
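The dependency chain described above can be sketched with a toy codec. This is only an illustration, not how a real video codec works: a "picture" is a short list of numbers, the difference is a plain per-pixel subtraction, and the keyframe positions (indexes 0 and 6, i.e. pictures 1 and 7) are assumptions chosen to match the example. The function names are made up for this sketch.

```python
# Toy model only: real codecs use motion-compensated prediction, not plain
# per-pixel subtraction. A "picture" here is just a list of pixel values.

def encode(frames, keyframe_indexes):
    """Store keyframes whole and every other frame as a difference
    from the frame before it. Index 0 must be a keyframe."""
    encoded = []
    for i, frame in enumerate(frames):
        if i in keyframe_indexes:
            encoded.append(("key", list(frame)))
        else:
            prev = frames[i - 1]
            encoded.append(("delta", [c - p for c, p in zip(frame, prev)]))
    return encoded


def decode_frame(encoded, index):
    """Rebuild a single frame from the encoded stream."""
    start = index
    while encoded[start][0] != "key":      # walk back to the nearest keyframe
        start -= 1
    pixels = list(encoded[start][1])
    for i in range(start + 1, index + 1):  # replay every difference in order
        pixels = [p + d for p, d in zip(pixels, encoded[i][1])]
    return pixels


# 20 made-up pictures with 3 "pixels" each.
frames = [[n, n * 2, n * 3] for n in range(20)]
# Assumption: pictures 1 and 7 (indexes 0 and 6) are the keyframes.
encoded = encode(frames, keyframe_indexes={0, 6})

# Picture 13 (index 12) is stored only as its difference from picture 12,
# so rebuilding it means starting at picture 7 and replaying pictures 8 to 13.
picture_13 = decode_frame(encoded, 12)
```

In this sketch, `encoded[12]` holds only the small difference list, which is why the picture cannot be shown without first rebuilding everything back to the keyframe.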

Of course, this is the simplest case. As technology has progressed, commonly used encoders have adopted the more advanced Main Profile and High Profile encoding modes, in which the reference relationships between pictures are more complex: a picture may refer not only to earlier pictures but also to later ones. The complexity and the amount of computation increase further.

This explains why dragging has an unpredictable wait time. Sometimes you get lucky and land on picture 7 in the figure above, and the screen jumps to it immediately, just as you wish. Or, if you drag to picture 14, the computer is out of luck and has to decode seven pictures first before it can show the one you want.
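A minimal sketch of why the wait varies, assuming the simplest case where every non-key picture depends only on the picture before it. Picture numbers are 1-based as in the text. Only picture 7 is named as a keyframe in the article; the other keyframe positions (1 and 19) and the function name are assumptions for illustration.

```python
from bisect import bisect_right


def frames_to_decode(target, keyframes):
    """Number of pictures that must be decoded to display `target`,
    counting the target itself. `keyframes` is a sorted list of
    1-based picture numbers."""
    if not keyframes or target < keyframes[0]:
        raise ValueError("no keyframe at or before the target picture")
    # Nearest keyframe at or before the target picture.
    nearest = keyframes[bisect_right(keyframes, target) - 1]
    return target - nearest + 1


# Assumption: keyframes at pictures 1, 7 and 19.
keyframes = [1, 7, 19]
costs = {n: frames_to_decode(n, keyframes) for n in range(7, 19)}
# costs[7] is 1 (a keyframe, shown at once), while costs[14] is 8:
# pictures 7 to 13 must be decoded before picture 14 itself.
```

Under these assumptions the cost of a drag ranges from one decoded picture to a whole group of them, depending purely on where the user happens to land.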

Video preprocessing

To solve this problem, there are only two ways we can go:

One way is to wait for Moore's Law to progress to the point where a dozen frames can be processed in an instant. But Andy and Bill's Law also tells you that any improvement Moore's Law delivers will be eaten up by 4K and 8K in a flash.

The other way is to solve the problem with engineering. Our approach is this: when the user edits an already generated MP4 or MOV file, we preprocess it first. That is, before the file is handed to the editor, it goes through a round of processing that makes the video easier to edit.

Of course, if you think this is too much effort just to improve the performance of frame-by-frame preview, think again. Our engineering team makes full use of this extra time, and many time-consuming but necessary tasks are completed in this phase. The preview toolbar in the figure below, for example, is built by stitching together screenshots taken at regular intervals during preprocessing.
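As a rough sketch of the "screenshots at regular intervals" idea, the preprocessing pass could first work out the timestamps at which to capture a thumbnail. The function name and the interval value are assumptions for illustration; the article does not say which interval the team uses or how the screenshots are captured.

```python
def thumbnail_timestamps(duration_s, interval_s):
    """Times (in seconds) at which to grab a screenshot for the preview
    strip: one at the start, then one every `interval_s` seconds."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


# Hypothetical values: a 10-second clip sampled every 2.5 seconds.
stamps = thumbnail_timestamps(10.0, 2.5)
```

Because the decoder already visits every frame during preprocessing, collecting these screenshots in the same pass adds little extra work compared with seeking to each timestamp later.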

Production of special effects

We all know that animation works by switching rapidly from one picture to the next: the slight differences between pictures accumulate to create the illusion of movement. The faster the pictures change, the less noticeable the individual steps are and the smoother the motion looks.

The various effects in our short video editor follow these two principles:

  • The effects themselves change frame by frame, with each one slightly different from the next.
  • The effect changes are superimposed on the original screen changes to ensure the flow of the effects is natural and not rigid.

Take the third video effect in our demo as an example. It is implemented like this:

(1) Take the first picture of the original video, enlarge it 1.1 times, crop out the central part, and set it to 50% transparency. This gives a new translucent picture, which forms the ghost image. Superimpose it on the original picture and you have the first processed picture (with the ghost effect).

(2) Do the same for the second picture, with one small difference: instead of 1.1 times, the magnification factor is now 1.2 times, so the blurred ghost formed by the 50% transparent overlay is one ring larger than the previous one. These growing rings, played in sequence, create the illusion of the picture radiating outward.

(3) The third picture is handled in a similar way. This time the magnification factor is 1.3 times, so the ghost expands by another ring…

(4) By doing this step by step, the processed images will form a new set of continuous animations that already contain the desired video effects.
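The steps above can be sketched in pure Python on small grayscale pictures stored as nested lists. This is a minimal illustration under stated assumptions, not the SDK's implementation: the function names are made up, the enlargement uses nearest-neighbour sampling, and a real editor would do this work on the GPU.

```python
def zoom_center(image, scale):
    """Enlarge `image` by `scale` (>= 1) and keep the central part at the
    original size, using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Map each output pixel back to its source position.
            sy = round(cy + (y - cy) / scale)
            sx = round(cx + (x - cx) / scale)
            row.append(image[sy][sx])
        out.append(row)
    return out


def ghost_frame(image, scale, alpha=0.5):
    """Blend a 50%-transparent enlarged copy over the original picture."""
    zoomed = zoom_center(image, scale)
    return [
        [(1 - alpha) * o + alpha * z for o, z in zip(orig_row, zoom_row)]
        for orig_row, zoom_row in zip(image, zoomed)
    ]


def apply_ghost_effect(frames, start=1.1, step=0.1):
    """Picture 1 uses 1.1x, picture 2 uses 1.2x, picture 3 uses 1.3x, ..."""
    return [ghost_frame(f, start + i * step) for i, f in enumerate(frames)]


# Three identical made-up 9x9 pictures, so only the effect changes over time.
image = [[y * 9 + x for x in range(9)] for y in range(9)]
processed = apply_ghost_effect([image, image, image])
```

The magnification factor is the only value that changes from picture to picture, which matches the first principle above: the effect itself changes slightly with every frame.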

Compared with the other parts, the special effects themselves are technically relatively simple, video effects in particular.

The real difficulty is flexible customization. Because abstraction has a cost, we have not yet opened up the ability for developers to design video effects themselves. For now, we can only build the effects one by one in a more direct way. Truly letting customers customize effects at will still needs a period of effort.


This article has been authorized by the author for publication in the Tencent Cloud+ Community.
