This article comes from the RTC developer community. Visit the community to exchange real-time audio/video and interactive live streaming development experience with other developers and take part in more technical activities.

In an era where good looks rule, beauty filters and special effects have become standard features of many live streaming platforms, whether they run on the Web or on mobile. Some products are even experimenting with AR to add experiences that attract users. Implementing any of these features in a live stream, however, places further demands on the developer's tech stack, whether the product is a WebRTC-based video call or online education app on the Web, or a live streaming app on Android or iOS. This article briefly explains how live streaming special effects work and the difficulties developers need to pay attention to.

How live streaming special effects work

A live stream goes through the following stages: capture, pre-processing, encoding, transmission, decoding, post-processing, and playback. Special effects are normally applied right after the camera captures the video image, that is, during the pre-processing stage.

The process of implementing live effects is as follows:

  • Video capture: there are three capture sources: the camera, screen recording, and streaming from video files. Camera capture is the most common in live streaming. On Android, SurfaceTexture is used to process the image stream and apply effects and filters to the captured frames. A SurfaceTexture is a texture that can be thought of as middleware between the camera and the view: the camera hands the captured frames to the SurfaceTexture, beautification is applied there, and the result is then handed to a SurfaceView for rendering (see the sketch after this list).

  • Pre-processing: the captured frames are processed. For example, denoising algorithms such as mean blur, Gaussian blur, and median filtering are used to "smooth the skin" of the original video; the GPUImage library can be used to add filters; or tools such as ARCore and ARKit can add real-time AR effects to the video.

  • Encoding: after the frames are processed, they are encoded at an appropriate bit rate and in an appropriate format.

  • Finally, push the stream to the CDN.
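
To make the capture step more concrete, here is a minimal sketch of the camera-to-SurfaceTexture handoff described above. It assumes the legacy android.hardware.Camera API and an existing OpenGL ES context on the calling thread; the class name and structure are illustrative, not a fixed API.

```java
import android.graphics.SurfaceTexture;
import android.hardware.Camera;
import android.opengl.GLES11Ext;
import android.opengl.GLES20;

public class CameraTextureCapture {
    private Camera camera;
    private SurfaceTexture surfaceTexture;
    private int oesTextureId;

    public void start() throws Exception {
        // Create an OES texture that the camera can write preview frames into.
        int[] tex = new int[1];
        GLES20.glGenTextures(1, tex, 0);
        oesTextureId = tex[0];
        GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, oesTextureId);

        // Wrap the texture in a SurfaceTexture and hand it to the camera.
        surfaceTexture = new SurfaceTexture(oesTextureId);
        surfaceTexture.setOnFrameAvailableListener(st -> {
            // A new camera frame is ready; schedule a render pass on the GL thread.
        });

        camera = Camera.open();
        camera.setPreviewTexture(surfaceTexture);
        camera.startPreview();
    }

    // Call on the GL thread for each new frame: update the texture, then run the
    // beauty/filter shaders on oesTextureId and draw the result to the on-screen surface.
    public void onDrawFrame() {
        surfaceTexture.updateTexImage();
    }
}
```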

To implement beautification, both WebRTC-based mobile clients and Web clients can use GPUImage. A combination of WebRTC, React Native, and GPUImage also works, but it requires modifying the react-native-webrtc source code.
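
As a hedged illustration of the GPUImage approach, the sketch below applies a Gaussian blur (a common building block of skin smoothing) to a single bitmap using the android-gpuimage library. Package layout and filter names can differ between library versions, and in a real live stream the GPUImage pipeline would be attached to the camera texture rather than to an offline bitmap.

```java
import android.content.Context;
import android.graphics.Bitmap;

// Note: in android-gpuimage 2.x the filters live in the .filter subpackage.
import jp.co.cyberagent.android.gpuimage.GPUImage;
import jp.co.cyberagent.android.gpuimage.GPUImageGaussianBlurFilter;

public class FilterExample {
    // Apply a light Gaussian blur to a frame as a simple "skin smoothing" step.
    public static Bitmap smooth(Context context, Bitmap input) {
        GPUImage gpuImage = new GPUImage(context);
        gpuImage.setImage(input);
        gpuImage.setFilter(new GPUImageGaussianBlurFilter(2.0f));
        return gpuImage.getBitmapWithFilterApplied();
    }
}
```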

Difficulties in development

There are many examples of special effects, filters, and even AR in live streaming, and we have previously shared implementations based on ARCore and ARKit. However, there are several difficulties developers need to be aware of.

First, lack of scalability and flexibility

If you develop directly on WebRTC, WebRTC provides a renderer based on the GLSurfaceView component. Because GLSurfaceView, like SurfaceView, lives in its own window rather than in the normal view hierarchy, it does not support animation or transformation effects. So if you want to draw somewhere else, or get access to the video data, it becomes more difficult.

Second, the need to modify the source code

Camera data cannot be obtained through WebRTC's native API. If you want to add beautification, a lot of changes are needed. Modifying the react-native-webrtc source code, as mentioned above, is only part of the work; you may also need to adjust the WebRTC source code itself rather than using it out of the box, which requires developers to be familiar with WebRTC.

Third, performance and power consumption

Performance and power consumption issues are most obvious on the Android platform. Typically, when processing images, we can feed YUV data to the CPU for image processing and then hand the result to a software or hardware encoder. However, this leads to high CPU usage, which in turn increases power consumption and slows down the app's responsiveness. Therefore, we should complete the graphics processing on the GPU as much as possible and make better use of the hardware.
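
For reference, the CPU path described above looks roughly like the following on Android: the camera delivers NV21 (YUV) byte arrays to a preview callback, and every frame is then processed and copied on the CPU. The class name is hypothetical; the callback API itself is standard Android.

```java
import android.hardware.Camera;

public class CpuYuvPath {
    public static void attach(Camera camera) {
        camera.setPreviewCallback(new Camera.PreviewCallback() {
            @Override
            public void onPreviewFrame(byte[] nv21, Camera cam) {
                // CPU-side work: denoise/beautify the YUV buffer, convert color
                // formats, then hand the result to the encoder. All of this
                // raises CPU usage and power consumption.
            }
        });
    }
}
```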

The same goes for encoding. The advantage of software encoding is its flexibility, but its disadvantages are high power consumption and degraded performance. Hardware encoding is the better choice, with higher speed and lower power consumption. Its problem, however, is that the optimizations and parameters available to you depend on the interfaces the hardware vendor exposes, and hardware encoding has compatibility issues on some Android phones.
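
Below is a minimal sketch of the GPU-friendly alternative, assuming Android's MediaCodec hardware encoder with a Surface input so the filtered frames never leave the GPU; the resolution, bit rate, and class name are illustrative.

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;

public class HardwareEncoderSketch {
    public static MediaCodec createEncoder(int width, int height) throws Exception {
        MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
        // Surface input: the GPU renders the filtered frames directly into the
        // encoder's input surface, avoiding a CPU-side memory copy.
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2);

        MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);

        // Render the effect output (e.g. via EGL) onto this surface.
        Surface inputSurface = encoder.createInputSurface();
        encoder.start();
        // ... then drain the encoded H.264 output buffers and push the stream to the CDN.
        return encoder;
    }
}
```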

Fourth, hardware compatibility

WebRTC-based and other self-developed solutions also need to consider hardware compatibility. iOS devices are relatively simple, but Android devices have compatibility problems due to differing chips and system versions.

Agora SDK 2.1: more flexible live effects

In contrast to such self-developed solutions, the Agora SDK opens up capture and rendering, giving developers more freedom in handling video data. As shown in the green section of the diagram below, these processing stages are open to developers, bringing greater flexibility and scalability.

Capture: the Agora SDK supports custom video sources. Using the SDK's helper classes, you can easily build camera, screen-sharing, or file-based video sources.

Adding special effects: the Agora SDK's new interfaces work with the Android SurfaceTexture component directly and pass the texture to the GPU; the SDK's hardware encoder then encodes the video. The whole pipeline makes maximum use of the hardware and involves no memory copies, which not only gives better performance and power consumption and avoids slowing down the app, but also means you do not need to worry about hardware codec issues.

Rendering: the Agora SDK exposes a video renderer interface that lets developers render to the standard Android SurfaceView, TextureView, or a custom View component, depending on their existing business.
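
As a rough sketch of how a custom video source plugs in, the code below implements the mediaio IVideoSource interface introduced in SDK 2.1 and registers it with the engine. The method names follow the 2.1 documentation, but check the official API reference for the exact signatures in your SDK version.

```java
import io.agora.rtc.mediaio.IVideoFrameConsumer;
import io.agora.rtc.mediaio.IVideoSource;
import io.agora.rtc.mediaio.MediaIO;

public class MyTextureVideoSource implements IVideoSource {
    private IVideoFrameConsumer consumer;

    @Override
    public boolean onInitialize(IVideoFrameConsumer consumer) {
        // The SDK hands us a consumer; the capture/effects pipeline feeds it frames
        // (for example the OES texture produced by the camera + filter sketch earlier).
        this.consumer = consumer;
        return true;
    }

    @Override
    public boolean onStart() { return true; }

    @Override
    public void onStop() { }

    @Override
    public void onDispose() { consumer = null; }

    @Override
    public int getBufferType() {
        // Deliver frames to the SDK as textures rather than byte buffers.
        return MediaIO.BufferType.TEXTURE.intValue();
    }
}

// Usage (rtcEngine is an initialized RtcEngine instance):
// rtcEngine.setVideoSource(new MyTextureVideoSource());
// A custom renderer is registered similarly, e.g. via rtcEngine.setLocalVideoRenderer(...).
```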

What the new features change

Before the 2.1 update:

In versions prior to 2.1, developers pushed video by sharing the texture's EGL context and texture ID with the SDK through pushExternalVideoSource.

After the 2.1 update:

With the two new features in version 2.1, custom video source and custom renderer, you can achieve the desired effects more flexibly. Developers can either share the texture ID as before, or pass the texture through system components such as SurfaceTexture or Surface. For example, the TextureSource class encapsulates a SurfaceTexture object; developers can use it to create an EGL surface and then draw directly onto that surface.

These two features give us much more room for imagination in video and image rendering, and more scenarios can be brought into live streaming, such as the AR scenes we built by combining ARCore and ARKit, or games like Douyin's "dancing machine".

In Agora SDK 2.1, we added several new interfaces for custom video sources and custom renderers. Click here for more details and how to call these interfaces.