Nowadays, the pursuit of ever-higher video frame rates keeps growing, because high-frame-rate video looks much smoother and can greatly improve the viewing experience.

Existing cameras can already shoot video at 25 frames per second (FPS), 60 FPS, 240 FPS, or more.

At a time when most films run at 24 FPS, Ang Lee’s Gemini Man pushed film technology forward by shooting at 120 FPS.

However, high-frame-rate camera equipment demands a great deal of storage and is expensive, so it has not become widespread. To obtain high-frame-rate video without professional equipment, video frame interpolation technology came into being.

Super SloMo, the AI “imagination” method proposed by Nvidia, stands out among the many video frame interpolation techniques. Even a video shot at only 30 FPS can be supplemented by Super SloMo to 60 FPS, 240 FPS, or even higher.

Advantages and disadvantages of traditional frame interpolation methods

To better understand Super SloMo, let’s first take a look at the existing, more traditional video frame interpolation techniques.

Frame sampling

Frame sampling simply uses the key frames themselves as the compensation frames; in essence, it lengthens the display time of each key frame, which is equivalent to doing no interpolation at all. Other than a higher frame rate in the file properties and a larger file size at the same video quality, there is no visual improvement.

Advantages: frame sampling consumes few resources and is fast.

Disadvantages: the video may still not look smooth.
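
To make the “equivalent to no interpolation” point concrete, here is a minimal sketch, assuming frames is a list of decoded images:

```python
# Frame sampling: double the frame rate by repeating every key frame.
# The "new" frames are plain copies, so motion does not get any smoother.
def sample_frames(frames):
    doubled = []
    for frame in frames:
        doubled.append(frame)
        doubled.append(frame)  # compensation frame = copy of the key frame
    return doubled
```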

Frame blending

Blending, as the name suggests, increases the transparency of the preceding and following key frames and mixes them into a new frame to fill the gap.

Advantages: computation takes little time.

Disadvantages: not very effective. Because the original key frames are simply made translucent and overlaid, the contours of moving objects show obvious blur where the two frames overlap, so the perceived smoothness of the video improves only slightly.
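
As an illustration, a minimal blending sketch might look like this (assuming frame0 and frame1 are same-sized uint8 NumPy images; the 50/50 mix is exactly what causes the translucent ghosting described above):

```python
import numpy as np

# Frame blending: the filler frame is a transparency mix of the two key frames.
def blend_frames(frame0, frame1, alpha=0.5):
    mixed = (1 - alpha) * frame0.astype(np.float32) + alpha * frame1.astype(np.float32)
    return mixed.astype(np.uint8)
```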

Motion compensation

Motion Estimation and Motion Compensation (MEMC) searches for moving blocks based on the differences between two frames in both the horizontal and vertical directions, and then computes the intermediate frames by analyzing the motion trend of the image blocks.

MEMC is mainly used in TVs, monitors, and mobile devices to raise the video frame rate and give the audience a smoother look and feel.

Advantages: reduces motion judder, weakens trailing and ghosting, and improves picture clarity.

Disadvantages: when the background behind a moving object is complex, artifacts appear around object edges.
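
To give a feel for the motion-estimation half of MEMC, here is a toy block-matching sketch (grayscale NumPy frames assumed; real MEMC chips use far more sophisticated search and compensation logic):

```python
import numpy as np

# For one block in frame0, exhaustively search a window in frame1 for the
# best match (minimum sum of absolute differences, SAD). The resulting
# motion vector is what the compensation step uses: placing the block at
# half this displacement approximates its position in the middle frame.
def estimate_block_motion(frame0, frame1, y, x, block=16, search=8):
    ref = frame0[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > frame1.shape[0] or xx + block > frame1.shape[1]:
                continue
            cand = frame1[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = int(np.abs(ref - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec
```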

Optical flow method

The optical flow method is an important direction in computer vision research. It automatically generates new in-between frames by inferring the trajectory of pixel motion from the preceding and following frames, similar in spirit to how motion blur is computed.

Advantages: smoother picture with less perceptible stutter.

Disadvantages: computationally heavy and time-consuming; also sensitive to lighting, so large lighting changes easily produce visibly wrong, garbled frames.
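
A rough sketch of flow-based interpolation, assuming OpenCV’s Farneback algorithm as the flow estimator (chosen for illustration; production interpolators use more robust estimators):

```python
import cv2
import numpy as np

# Estimate per-pixel motion from frame0 to frame1, then backward-warp
# frame0 halfway along the flow to approximate the middle frame.
def interpolate_midframe(frame0, frame1):
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Each mid-frame pixel samples frame0 half a motion vector "backwards".
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame0, map_x, map_y, cv2.INTER_LINEAR)
```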

Super SloMo: an AI frame interpolation method that can be called an industry classic

At CVPR 2018, Nvidia presented the paper **Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation**, which proposed Super SloMo and attracted wide attention in the industry.

View the paper: Please click here

Super SloMo differs from the traditional methods: it uses a deep neural network to interpolate frames. The basic idea is to train on a large number of ordinary videos and slow-motion videos, so that the network learns to infer and generate high-quality super-slow-motion video from normal video.

Block diagram of the Super SloMo method, comprising the optical flow computation module (left) and the arbitrary-time flow interpolation module (right)

The method proposed by the Super SloMo team builds its entire framework on two U-Net fully convolutional neural networks.

First, one U-Net computes the bidirectional optical flow between the adjacent input images. These flows are then linearly combined at each time step to approximate the bidirectional optical flow of the intermediate frame.
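
In the paper, this linear combination takes a simple closed form. A sketch in PyTorch, assuming flow01/flow10 are the U-Net’s forward and backward flows as [B, 2, H, W] tensors and t lies in (0, 1):

```python
# Linear approximation of the bidirectional flow at intermediate time t,
# as given in the Super SloMo paper:
#   F_t->0 = -(1 - t) * t * F_0->1 + t^2 * F_1->0
#   F_t->1 =  (1 - t)^2 * F_0->1 - t * (1 - t) * F_1->0
def approximate_intermediate_flow(flow01, flow10, t):
    flow_t0 = -(1 - t) * t * flow01 + t * t * flow10
    flow_t1 = (1 - t) * (1 - t) * flow01 - t * (1 - t) * flow10
    return flow_t0, flow_t1
```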

To deal with artifacts around motion boundaries, the second U-Net refines the approximate optical flow and predicts soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame.
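
A sketch of that final warp-and-fuse step (PyTorch again; v0 is the predicted visibility map for frame 0, with the map for frame 1 taken as 1 - v0, as in the paper; shapes are assumed to be [B, C, H, W] for images, [B, 2, H, W] for flows with (x, y) displacement channels, and [B, 1, H, W] for v0):

```python
import torch
import torch.nn.functional as F

# Backward-warp an image along a flow field with bilinear sampling.
def backward_warp(img, flow):
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(img.device)   # [H, W, 2]
    target = grid + flow.permute(0, 2, 3, 1)                      # sample positions
    target = torch.stack((2 * target[..., 0] / (w - 1) - 1,       # normalize to [-1, 1]
                          2 * target[..., 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(img, target, align_corners=True)

# Fuse the two warped inputs, weighted by time and visibility.
def fuse_intermediate(img0, img1, flow_t0, flow_t1, v0, t):
    w0, w1 = (1 - t) * v0, t * (1 - v0)
    return (w0 * backward_warp(img0, flow_t0) +
            w1 * backward_warp(img1, flow_t1)) / (w0 + w1 + 1e-8)
```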

In addition, the parameters of Super SloMo’s optical flow network and interpolation network are independent of the specific time step being interpolated (the time step is simply an input to the network). It can therefore interpolate frames at arbitrary time steps between two frames in parallel, breaking through the limitation of many single-frame interpolation methods.
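
For example, synthesizing seven in-between frames for 8x slow motion just means evaluating the same networks at seven time steps (an illustrative snippet):

```python
# t is an ordinary network input, so intermediate frames at arbitrary
# (and independent) time steps can be generated, even in parallel.
timesteps = [i / 8 for i in range(1, 8)]   # t = 1/8, 2/8, ..., 7/8
```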

Original video (top), super-slow-motion video after Super SloMo frame interpolation (bottom)

The authors report that, using their unoptimized PyTorch code, generating seven intermediate frames at 1280×720 resolution takes 0.97 seconds on a single NVIDIA GTX 1080 Ti and 0.79 seconds on a Tesla V100.

To train the network, the authors collected multiple 240 FPS videos from YouTube and handheld cameras: 1,100 video segments in total, comprising 300,000 individual frames at 1080×720 resolution. The videos range from indoor to outdoor, from static to moving cameras, and from everyday activities to professional sports, covering a wide variety of scenes.

The model was then validated on other datasets, and the results show that its performance improves significantly over existing methods on those datasets.

Check out the official demo video below to see more of the effects: Tutorial Portal

Follow the tutorial to run Super SloMo with one click

Although the authors of the Nvidia paper have not yet released the dataset or code, a GitHub user named avinashpaliwal has open-sourced his own PyTorch implementation of Super SloMo, with results almost identical to those described in the paper.

The details of the project are as follows:

Super SloMo: super-slow-motion frame interpolation

Run environment: PyTorch 0.4.1

Language version: Python 3.6

Training visualization: TensorboardX

Training data set: Adobe 240 FPS

Project address: Please click here

Since model training and testing are done with PyTorch 0.4.1 and CUDA 9.2, both must be installed, along with an NVIDIA graphics card.

In addition, the model cannot be trained directly on video, so ffmpeg needs to be installed to extract frames from the videos. Once all these preparations are in place, download the Adobe 240 FPS dataset for training.
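
A generic ffmpeg invocation for this kind of frame extraction might look like the following (input.mp4 and the frames directory are placeholders; the directory must already exist):

```
! ffmpeg -i input.mp4 frames/%05d.jpg
```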

However, you can also skip all of this preparation and run Super SloMo with one click.

We found a corresponding tutorial on OpenBayes, a Chinese machine learning compute container service platform. Datasets, code, and compute are all provided, so even complete beginners can get started easily.

Tutorial portal link: Click here

Tutorial Usage Guide

First sign up for and log in to OpenBayes, open “Public Tutorials” under the Public Resources menu, and select this tutorial: “PyTorch Implementation of Super-Slomo Super Slow Motion Shots”.

The sample demo file in the tutorial is super-slomo.ipynb. Running it installs the environment and shows the final super-slow-motion frame interpolation result.

You can also use your own video footage by changing lightning-dick-clip.mp4 in the generated code below to your own video file name.

The “scale” parameter controls the slow-motion factor of the generated video; for example, setting it to 4 means 4x slow playback.

Generated code:

```
! python3 'Super-SloMo/eval.py' 'lightning-dick-clip.mp4' --checkpoint='/openbayes/input/input0/SuperSloMo.ckpt' --output='output-tmp.mp4' --scale=4
print('Done')
```

Convert video format code:

```
! ffmpeg -i output-tmp.mp4 -vcodec libx264 -acodec aac output.mp4
```

In this tutorial, a video from the Internet is interpolated with Super SloMo, giving the following results:

Slowed down to 4x, every martial arts move is clearly visible

At present, the platform also offers free weekly vGPU hours, which are plenty to complete this tutorial, so hurry up and give it a try!

Get your hands dirty and make amazing super slow motion
