Where is video innovation headed at the edge? What will the next-generation video cloud platform look like? At LiveVideoStackCon 2021 Beijing, we invited Lu Zhihang, senior RTC product expert at Huawei Cloud Media Services, to discuss the next-generation competitiveness of video cloud and edge cloud platforms, with Serverless as the carrier platform and open, professional media processing capabilities as the core value direction.

Article | Lu Zhihang

Edited by | LiveVideoStack

Let me briefly introduce myself. My name is Lu Zhihang, from the Huawei Cloud Media Service product department. I joined Huawei after graduation and have been working on video-related businesses ever since, from carriers' IPTV and OTT services to public cloud CDN and live streaming. For the past two years I have been working on RTC, and I am now the product manager for RTC.

The reason I want to talk about Serverless is closely tied to how our team has developed. The cloud video team was established at the very founding of Huawei Cloud, staffed with audio and video experts developing the business. After several years of accumulation, we gradually rolled out live streaming, VOD, AR, VR, free-viewpoint, and RTC services. As the business expanded, the architecture evolved as well: deployment moved from offline to online, all the way to today's fully cloud-native setup. We are now exploring and building Serverless, so I will share some Serverless-related content. The sharing has four parts. First, I will review the past and present of Serverless and where it is heading. Second, I will distill the pitfalls we hit while building Serverless into the key elements of a Serverless platform. Third, I will cover the Serverless platform architecture and its other key features. Finally, I will give an overall summary.

1. Where is Serverless heading?

First, take a look at where Serverless is headed.

There is more and more discussion about Serverless in the industry. What is the definition of Serverless? According to CNCF, there are three keywords: functions, pay-as-you-go, and elastic scaling.

Let's review the evolution of architecture in the software industry. In the early days we used virtual machines to deploy the business. Developers needed to set up their own operating system, such as CentOS or Ubuntu, the runtime, such as Java's JRE, and a set of middleware, such as the messaging middleware Kafka. Only on top of these layers could they write business code, build application software, and meet the final requirements. But heterogeneous hardware, operating systems, and versions made applications hard to scale, migrate, and operate. Docker and Kubernetes emerged at just the right moment. Container technology completely changed how developers work, and people found that a microservice framework plus containers made deployment much easier. Since then, computing resources have become ubiquitous: an application can run wherever a container can run.

But programmers' pursuit of "laziness" never ends, so Serverless technology appeared. With it, developers do not even need to care about containers, runtimes, or middleware; all the basic capabilities an application needs are provided uniformly by the underlying platform, and developers only need to focus on their own application logic.

Once developers no longer care about the underlying resources, their way of developing no longer needs to take those resources into account, so Serverless is a new development paradigm. From a resource perspective, Serverless has also quietly formed a new computing paradigm.

As a new development paradigm, Serverless also requires corresponding changes in software architecture. The traditional monolith is a completely closed architecture: business modules are coupled, and a change to one affects the whole. The microservices architecture decouples business modules through standard interfaces and uses container technology to make development more efficient, but developers still need to care about where containers are deployed and are tightly bound to DevOps. The Serverless architecture goes one step further, letting developers return to the essence of the business and avoid non-business operations and maintenance. Applications are built as functions, which decouples them at an even finer granularity.

In the development process we focus only on the business, while everything below the waterline of the iceberg, from application deployment and compute affinity to heterogeneous parallelism and elastic scaling, becomes non-business work handled entirely by the Serverless platform. Developers only need to care about the business logic.

While the benefits of Serverless are obvious, productizing Serverless has been a long process of continuous accumulation. Based on container technology, public cloud vendors build a large computing resource pool behind the containers and give the containers automatic scaling and pay-as-you-go billing for developers to use. This saves a lot of trouble, but it is still not entirely satisfying: we want it to be even easier to use existing software capabilities to quickly build applications and meet requirements. So public cloud vendors open up the capabilities of existing cloud services through function interfaces, turning cloud services into backend services. Developers only need to invoke the function interfaces provided by these backends in the functions they write themselves to meet their business requirements, that is, Function as a Service.
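To make this FaaS-plus-backend-services model concrete, here is a minimal TypeScript sketch. The service clients (transcodeService, messageService) and the event shape are hypothetical placeholders, not a real Huawei Cloud API; they only illustrate the development model described above.

```typescript
// A user-uploaded object triggers the function; the event shape is an assumption.
interface UploadEvent {
  bucket: string;
  objectKey: string; // e.g. a video file just uploaded by a user
}

// Hypothetical backend-service clients exposed to the function runtime.
declare const transcodeService: { submitJob(src: string, preset: string): Promise<string> };
declare const messageService: { publish(topic: string, payload: unknown): Promise<void> };

// The developer writes only this handler; scaling, billing and the runtime are handled by the platform.
export async function handler(event: UploadEvent): Promise<{ jobId: string }> {
  const source = `${event.bucket}/${event.objectKey}`;
  const jobId = await transcodeService.submitJob(source, "h264-720p"); // call a backend-service function interface
  await messageService.publish("transcode-jobs", { jobId, source });   // notify downstream consumers
  return { jobId };
}
```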

To sum up, among the mature products of public cloud vendors, the value of Serverless is also evolving toward application building.

We have also noticed that Serverless is not only for public cloud scenarios but also for edge and terminal scenarios; these are simply different deployment locations for Serverless. Akamai, for example, has a good implementation at the edge: by opening up the access logic and authentication logic of its edge CDN, it gives users better access. On the terminal side, Huawei Consumer Cloud exposes the business capabilities of HMS Core on Huawei phones to the server side through WiseFunction. For example, at the beginning of the epidemic last year, the assistant (minus-one) screen of Huawei phones showed real-time epidemic data for each province. We built it with Serverless technology, wrote the JavaScript business logic and page layout in functions, and brought it online in three days.

To sum up, Serverless has taken different product forms on the cloud, the edge, and the device. These different product forms give us a basic sense of where Serverless will develop next.

Facing the cloud, the focus is on function interfaces and event triggers that connect the various cloud services, creating value for building complete application solutions.

Facing the edge, and combined with the audio and video industry, we believe the key is to open up full coverage of media business capabilities centered on cloud-edge integration, and to provide professional, customizable video cloud services.

Facing the device, the focus is on an open framework that integrates front end and back end to enable rapid delivery of front-end services.

2. What kind of Serverless platform does the video cloud need?

Having looked at the evolution of the Serverless platform, we need to discuss what kind of Serverless platform audio and video services need in order to develop better.

To answer that question, we focus on the main core stages of video: production, processing, and distribution.

If our video business is built on a Serverless platform, we want that platform to have sufficient computing resources and sound management logic to facilitate the production and processing of video content.

At the same time, we hope the platform can manage a unified transmission network, efficiently carry RTC, live streaming, VOD, and other services, and adjust dynamically according to the SLAs of our different video services.

Finally, we hope the platform is built on a global network, effectively shields video developers from the complexity caused by regional and carrier differences, and manages enough edge nodes to bring video closer to users. A platform with these characteristics can better promote the development of the video business.

Beyond these external application resources, we also have demands on internal software resources, which mainly concern two problems: first, operation and maintenance cost; second, waste of resources. For O&M cost, we hope the Serverless platform manages cluster, middleware, and infrastructure O&M in a unified way, exposing only application-code O&M externally. For resource waste, managing such a vast pool of cloud and edge resources requires matching bandwidth and compute: for example, by exploiting the day-and-night peaks and troughs of video traffic and shifting load within a region between high-bandwidth nodes and high-compute nodes, the platform can schedule the business flexibly and effectively reduce the TCO of the data centers.

With these different requirements in mind, let's come back to the audio and video business itself, the most colorful and varied of all businesses, flooded with all kinds of developer demands. We therefore hope that video atomic capabilities, especially the atomic capabilities of externally facing cloud services, can be abstracted into sufficiently fine-grained functions that developers can quickly orchestrate and use. Take the common RTC bypass (relay) live-streaming scenario, where anchors connect for mic-link and PK: we hope the RTC stream-forking and routing logic, the RTMP relay logic, and the CDN playback-authentication logic can each be abstracted into an atomic capability, so the desired bypass live scene can be assembled quickly and flexibly. Similarly, voice, subtitles, and meeting minutes can be refined into atomic capabilities and opened up.
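As an illustration of this kind of atomic-capability orchestration, here is a hypothetical TypeScript sketch of the bypass live scenario. forkRtcStream, relayToRtmp, and signCdnPlayUrl are assumed names, not real platform interfaces; only the composition idea follows the text.

```typescript
declare function forkRtcStream(roomId: string): Promise<{ streamUrl: string }>;      // RTC stream-forking logic
declare function relayToRtmp(streamUrl: string, rtmpTarget: string): Promise<void>;  // RTMP relay logic
declare function signCdnPlayUrl(channel: string, viewerId: string): Promise<string>; // CDN playback-auth logic

// Orchestrate the three atomic functions into one business flow.
export async function startBypassLive(roomId: string, channel: string, viewerId: string) {
  const { streamUrl } = await forkRtcStream(roomId);                   // 1. fork the mixed RTC stream
  await relayToRtmp(streamUrl, `rtmp://push.example.com/${channel}`);  // 2. relay it to the CDN ingest
  return signCdnPlayUrl(channel, viewerId);                            // 3. return a signed playback URL
}
```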

Therefore, we believe a Serverless platform for the video cloud drives an architecture that is more fully decoupled, letting development teams adapt more quickly. Take the full life cycle of a video, each stage of which we expect to be decoupled atomically. Audio and video capture can come from multiple viewpoints and self-capture sources. The captured data can be pre-processed with gesture detection, ROI detection, subtitle detection, background replacement, and so on. On the sending side, audio and video can be previewed and rendered locally. The processed data can then be encoded with different codecs such as H.264, H.265, or VP8, and distributed over different links, such as a public overlay RTN or dedicated lines, to the consumer side. The consumer side chooses an appropriate decoder, hardware or software, followed by post-processing and rendering. Post-processing can also apply frame interpolation to improve smoothness or deblurring to enhance picture quality.

We expect every stage of the video pipeline to be built as an atomic capability on the Serverless platform, and then combined through orchestration of those atomic capabilities to liberate productivity.
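A minimal sketch of what such an orchestration of the video life cycle might look like, assuming a hypothetical orchestrate() helper and stage names; it only illustrates composing atomic stages, not an actual platform API.

```typescript
type Stage =
  | { kind: "capture"; source: "camera" | "screen" }
  | { kind: "preprocess"; steps: ("roi-detect" | "subtitle-detect" | "background-replace")[] }
  | { kind: "encode"; codec: "h264" | "h265" | "vp8" }
  | { kind: "distribute"; network: "overlay-rtn" | "dedicated-line" }
  | { kind: "decode"; mode: "hardware" | "software" }
  | { kind: "postprocess"; steps: ("frame-interpolation" | "deblur")[] };

// Each stage maps to an atomic function on the platform; the orchestrator wires them together.
declare function orchestrate(pipeline: Stage[]): Promise<void>;

export async function runVideoLifecycle() {
  await orchestrate([
    { kind: "capture", source: "camera" },
    { kind: "preprocess", steps: ["roi-detect", "background-replace"] },
    { kind: "encode", codec: "h265" },
    { kind: "distribute", network: "overlay-rtn" },
    { kind: "decode", mode: "hardware" },
    { kind: "postprocess", steps: ["frame-interpolation"] },
  ]);
}
```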

3. Huawei Cloud: Serverless exploration and construction on the cloud-native media network

This diagram shows the service architecture of Huawei Cloud Video based on the Serverless platform.

At the bottom is the terminal layer, including RTC terminals, live-streaming terminals, and industry video (VIS) terminals.

Next is the platform layer, which has two deployment modes: large cloud Regions and edge nodes. Regions and edge nodes have the same functionality, including the application gateway, sandbox, container, and scheduling control. Applications can deploy functions directly inside the sandbox or deploy container images as functions. Starting, stopping, prewarming, and scaling of functions are all managed by the scheduling control module inside the node. Each edge node is also equipped with an application gateway so that the node can expose the necessary APIs externally. Data coordination, resource coordination, and function deployment between the large cloud Regions and the edge nodes are managed centrally by the distributed Serverless OS. Huawei named the kernel of this distributed Serverless OS "Yuanrong", which means commander-in-chief, expecting it to fully command the entire cloud-edge integrated media network. The distributed Serverless OS provides users with functions, workflows, and a unified IDE tool for writing and orchestrating functions. Its operation is imperceptible to users, and it contains global scheduling (centralized resource scheduling and node scaling management), a distributed database, global access, and other capabilities.
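As a rough illustration of how a function might be described for this platform, here is a hypothetical deployment descriptor in TypeScript. The field names are assumptions, not the real Yuanrong interface, but they show the points from the text: placement across Region and edge, exposure via the application gateway, and scaling left to the scheduling control module.

```typescript
interface FunctionDeployment {
  name: string;
  image: string;                                      // container image deployed as a function, or inline code in a sandbox
  placement: { regions: string[]; edgeNodes: "auto" | string[] };
  gateway: { path: string; auth: "token" | "none" };  // exposed through the node's application gateway
  scaling: { minInstances: number; maxInstances: number; prewarm: boolean }; // executed by scheduling control
}

export const subtitleFn: FunctionDeployment = {
  name: "live-subtitle",
  image: "registry.example.com/media/subtitle:1.0",
  placement: { regions: ["cn-north-4"], edgeNodes: "auto" }, // the OS decides which edge nodes to use
  gateway: { path: "/v1/subtitle", auth: "token" },
  scaling: { minInstances: 0, maxInstances: 50, prewarm: true },
};
```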

Finally, the media function layer. We deeply integrate the core business components of cloud video, such as the SFU of RTC, the LAS of live streaming, the MPC for transcoding, and the MBS for messaging, with the Yuanrong platform, deploying them as functions in container form, and we provide function interfaces on top of these business components. Through these function interfaces we build a function ecosystem: both self-built and third-party functions can run on our Serverless OS platform. For example, various third-party beauty-filter algorithms can be turned into functions and run on our MPC.

We summarize this architecture with four key phrases: Edge Native, Serverless, NoOps, and API Adapter.

One key feature of this cloud-edge Serverless platform is that core cloud services sink to the edge to achieve cloud-edge business collaboration, meeting the needs of data synchronization, task coordination, management coordination, and security coordination between cloud and edge. For example, algorithms trained with ModelArts on Huawei Cloud's EI artificial-intelligence platform can be pushed down directly to edge nodes for use, and transcoding tasks in cloud Regions with low real-time requirements can be moved to edge nodes. Through cloud-edge collaboration, the edge truly becomes an extension of the public cloud, expanding the public cloud's service scope and resource utilization.

The second key feature is cloud-edge mixed scheduling with joint elasticity of traffic and resources. In the architecture diagram we mentioned the global scheduling capability of the Yuanrong kernel; it is at the heart of our distributed Serverless OS kernel. It predicts future trends in media services by combining the traffic data of edge nodes with the characteristics of media streaming services, feeds the results to Yaoguang, the resource scheduling platform, which performs prediction and allocation calculations for resource scheduling, and the functions are ultimately deployed through the Serverless base platform. In this way, resources and the service SLA quality of edge nodes are precisely coordinated and reach a balance point.
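A simplified sketch of this feedback loop, assuming hypothetical collectEdgeMetrics, predictLoad, and requestResourcePlan helpers; only the control-loop shape (observe edge traffic, predict media load, hand the plan to the resource scheduler, which then deploys functions) follows the description above.

```typescript
interface EdgeMetrics { nodeId: string; concurrentStreams: number; bandwidthMbps: number }

declare function collectEdgeMetrics(): Promise<EdgeMetrics[]>;
declare function predictLoad(history: EdgeMetrics[]): Map<string, number>;           // nodeId -> expected streams
declare function requestResourcePlan(expected: Map<string, number>): Promise<void>;  // hand off to the scheduler

export async function schedulingTick() {
  const metrics = await collectEdgeMetrics();  // 1. observe traffic on edge nodes
  const expected = predictLoad(metrics);       // 2. predict near-future media load
  await requestResourcePlan(expected);         // 3. let the resource scheduler rebalance and deploy functions
}
```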

The third key feature is an integrated development experience for fast iteration of video applications. This is the feature that gives developers the most direct benefit of Serverless. Business functions written in the IDE environment run directly in the Region and at the edge; as long as a function is atomic, it can run in a Region or on an edge node at will. To let functions start at the millisecond level and adapt to resource changes caused by sudden, steep business growth, Huawei Cloud Serverless has three optimizations: 1. Optimize the startup phase through scheduling, code caching, and prewarming so that functions start faster. 2. Provide an inter-function communication framework to make function calls more flexible. 3. Scale functions in and out dynamically based on function type, historical traffic, and other factors.
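To illustrate the third point, here is a toy scale-out decision that weighs function type and recent traffic history. The thresholds and the prewarm() call are assumptions for illustration, not the platform's actual policy.

```typescript
interface FunctionStats { name: string; type: "media-stream" | "batch"; rpsHistory: number[] }

declare function prewarm(name: string, instances: number): Promise<void>;

export async function autoscale(stats: FunctionStats) {
  const recent = stats.rpsHistory.slice(-5);
  const avgRps = recent.length ? recent.reduce((a, b) => a + b, 0) / recent.length : 0;
  // Media-stream functions are prewarmed more aggressively, since cold starts hurt latency most.
  const perInstanceRps = stats.type === "media-stream" ? 20 : 100;
  const target = Math.ceil(avgRps / perInstanceRps);
  await prewarm(stats.name, target);  // keep `target` warm instances ready for the next traffic wave
}
```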

The fourth key feature is function-level flexible orchestration of the media business. We open up the entire business capability of cloud video as functions: the media-stream function set handles media stream transceiving, the intelligence-analysis function set handles EI-related processing, the video-editing function set handles video editing, the real-time-processing function set handles real-time stream enhancement, and so on. Through sound function-interface design, every function can be orchestrated and used directly. The open orchestration capability then lets customers compose complete video business flows and quickly meet customized business needs.

Let me give a few practical examples. Scenario 1: terminal access authentication. Based on this architecture we reworked the access-authentication function. Originally, engineers hard-coded each customer's access-authentication rules into an authentication server, and terminals could access media servers such as the SFU or LAS only after authentication passed. With the Serverless platform, we write these authentication rules as functions and deploy them to edge nodes. When a terminal connects through the application gateway of an edge node, the corresponding authentication function is triggered; once the function reports that authentication has passed, access to the corresponding media server is allowed.
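A hypothetical sketch of such an edge authentication function; the event shape and the verifyToken helper are assumptions, but the flow (the gateway triggers the function, the function decides whether the terminal may reach the SFU or LAS) follows the scenario above.

```typescript
interface AccessEvent { appId: string; userId: string; token: string; target: "sfu" | "las" }

declare function verifyToken(appId: string, userId: string, token: string): Promise<boolean>;

export async function authHandler(event: AccessEvent): Promise<{ allow: boolean; reason?: string }> {
  const valid = await verifyToken(event.appId, event.userId, event.token); // customer-specific rule
  if (!valid) return { allow: false, reason: "invalid token" };
  // Further per-customer rules (IP allowlist, concurrency limits, ...) could be added here.
  return { allow: true }; // the gateway then forwards the terminal to the SFU or LAS
}
```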

Scenario 2: real-time audio captioning. This case was developed jointly by the RTC team and the Huawei Cloud Meeting team. Subtitles have high real-time requirements, not every customer needs the feature, the requirements are not yet standardized, and the traffic shows very obvious peaks and troughs, so Serverless lets us adjust resources quickly and flexibly. We fork the audio out of the SFU through a stream-splitting function, send the audio data to decoding, then to ASR for speech-to-text analysis, and finally push the text to the client SDK through MBS. The latency of the original audio stream is not increased, and the subtitle requirement is satisfied.
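The captioning pipeline could be composed of atomic functions roughly as follows; forkAudio, decodeAudio, speechToText, and publishToClients are illustrative names only, not real interfaces.

```typescript
declare function forkAudio(roomId: string): AsyncIterable<Uint8Array>;           // split audio off the SFU
declare function decodeAudio(frame: Uint8Array): Promise<Float32Array>;          // decode to PCM
declare function speechToText(pcm: Float32Array): Promise<string | null>;        // ASR
declare function publishToClients(roomId: string, text: string): Promise<void>;  // push via MBS to the SDK

export async function runCaptioning(roomId: string) {
  for await (const frame of forkAudio(roomId)) {  // runs beside the media path,
    const pcm = await decodeAudio(frame);          // so the original stream's latency is untouched
    const text = await speechToText(pcm);
    if (text) await publishToClients(roomId, text);
  }
}
```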

Scenario 3: real-time rendering at the edge. Today's phones come in a huge variety of sizes and capabilities, and vendors go to great lengths to adapt to every model. To achieve consistent rendering and shield model differences, the idea of rendering beauty filters in the cloud emerged. We built a real-time RTC edge-rendering case on the edge Serverless platform. The corresponding function capability is deployed at the edge, client signaling is used as the trigger condition, and the stream pulled from the SFU goes through decoding, beauty processing, and re-encoding before being sent back to the SDK, completing the real-time rendering flow. Here we also worked closely with Yuanrong. For example, Yuanrong needed to expose the host's GPU to our function to achieve GPU pass-through, and after pass-through the GPU can switch quickly between different containers, which was the biggest problem Yuanrong solved for us.
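A sketch of the decode, beauty, encode loop in this scenario, with hypothetical function names and a GPU handle standing in for the passed-through device; only the pipeline shape follows the description.

```typescript
type RawFrame = { width: number; height: number; data: Uint8Array };

declare function pullFromSfu(roomId: string): AsyncIterable<Uint8Array>;          // encoded frames in
declare function decodeFrame(packet: Uint8Array): Promise<RawFrame>;
declare function applyBeauty(frame: RawFrame, gpu: unknown): Promise<RawFrame>;   // runs on the passed-through GPU
declare function encodeFrame(frame: RawFrame): Promise<Uint8Array>;
declare function pushToSdk(roomId: string, packet: Uint8Array): Promise<void>;    // rendered frames back out

export async function edgeRender(roomId: string, gpuHandle: unknown) {
  for await (const packet of pullFromSfu(roomId)) {
    const raw = await decodeFrame(packet);
    const pretty = await applyBeauty(raw, gpuHandle); // consistent rendering regardless of phone model
    await pushToSdk(roomId, await encodeFrame(pretty));
  }
}
```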

During testing we also timed the flow with a stopwatch; across several groups of measurements, the end-to-end delay was basically kept within 150 ms.

4. Summary

The architecture of cloud video keeps evolving along with cloud-native technology, and it will continue to evolve now and in the future. We have a few thoughts about the future of cloud video to share with you:

First, the value of cloud video: it should not only support rapid business innovation in video solutions, but also drive large-scale collaboration and concurrency among surrounding services.

Second, cloud video products: build a programmable media-network platform that provides advanced live streaming and RTC business-processing capabilities, real-time video AI capabilities, and programmable content-production and media-processing pipeline capabilities, and use these capabilities to drive business innovation and create new value.

Third, the core technologies of cloud video: edge Serverless, a high-density OS, and global scheduling. Use these core technologies to drive performance improvements and maximize business capabilities.

That’s all I have to share, thank you.


Call for speakers

LiveVideoStackCon 2022 Shanghai is now openly recruiting speakers. Whatever the size of your company, your title or level, veteran or newcomer, as long as your content helps technical people, everything else is secondary. You are welcome to submit your personal information and topic description to [email protected], and we will give feedback within 24 hours.