Introduction: The 5G era brings massive view computing scenarios. Alibaba Cloud's edge computing nodes focus on moving video to the cloud and processing it there; in this article, a senior technical expert at Alibaba Cloud interprets the technology and architecture capabilities behind massive view computing.

Author: Hu Fan

Fundamental changes in data carriers and the distribution of computing power

With their strong information-carrying capacity, video and images have become the main carrier of data content and the main way information is transmitted. The large bandwidth, low latency and wide connectivity of 5G activate applications such as cloud video surveillance, cloud gaming and the Internet of Things, and the extension from the consumer Internet to the industrial Internet further drives the explosion of terminal applications and view data.

These terminals and data are dispersed, massive and of relatively low value density. Taking cameras as an example, IHS research points out that about one billion surveillance cameras are currently watching the world, continuously generating video and image data. The volume reaches the ZB level, but most of it is of low value; what we really need to retain are the fragments covering events of interest, together with their structured information. Such scenarios and requirements pose serious challenges to, and fundamentally change, the way we compute and store this data.

A new edge-based distributed architecture for data access, computing and caching addresses this problem effectively: traffic and computing converge locally, significantly reducing network transmission costs, improving computing efficiency and meeting 5G's scenario-based requirement for low-latency processing.

Technical challenges of building business systems on the edge

Edge node resources are massive, distributed and heterogeneous, which poses great challenges to the business: service adaptation, elasticity and high availability all require dedicated handling. The business system perceives these details, and if they are handled poorly the business can even be damaged.

To build a business system on the edge, the technical challenges mainly come from the following aspects:

1. Edge nodes are scattered and multi-level; there are many of them, but each is small, so management is complex. Interactive access must deal with specific locations and multiple entry points, such as where computation runs and where data is stored.

2. Resource heterogeneity forces the business to select resources, and the distribution of resource types across nodes may be unbalanced, for example different computing resources such as CPU, GPU and ARM arrays.

3. A single node has weak elasticity, while the node pool as a whole is highly elastic; scaling must therefore take deployment location and business adaptation into account.

4. Cutover of a single node, and the complex network environment between edge and edge and between cloud and edge, may cause service jitter or even single-point unavailability. The business system must handle issues such as service drift, and the situation becomes more complicated when tasks are stateful.

How can we address these challenges so that using massive distributed nodes feels as simple and consistent as using the central cloud? Ideally, there should be only one interaction surface.

View computing: a location-insensitive platform for computing, caching and connection

View computing solves this problem well: built on the extensive ENS infrastructure, it provides a location-insensitive computing and caching platform, together with a platform that connects view terminals to the cloud, so that view data can go to the cloud more smoothly.

As shown in the figure above, the infrastructure layer manages, virtualizes and slices resources to form a unified resource pool and provide security and isolation capabilities. The view PaaS platform performs unified scheduling of network, computing and storage, shielding both the heterogeneity and the physical location of resources. Based on business characteristics, terminal location and resource state, it matches and coordinates edge resources with terminals, responding to the business with low latency and high security while staying highly available, and making the locations of computing, storage and connection imperceptible to the business.

For example, in camera-to-cloud scenarios such as security, education and training, and transportation and logistics, device access and streaming-media ingestion and processing take node states such as available computing power, network bandwidth and storage capacity into account, and select the most suitable nearby node, whose location is close to the content production side (the camera). Cloud gaming and similar scenarios, by contrast, require specific rendering resources (such as ARM boards) and should be closer to the content consumption side (the mobile phone). When many viewers watch a live stream, it can be pushed to the CDN network for distribution and remote viewing.
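As a rough illustration of this scheduling idea, the sketch below scores candidate edge nodes by available compute, bandwidth, storage and distance to the anchor endpoint (producer or consumer). All names, weights and numbers are hypothetical; the actual view computing scheduler is not described in this article.

```python
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    cpu_free: float       # available compute, normalized to 0..1
    bw_free: float        # available network bandwidth, 0..1
    storage_free: float   # available storage capacity, 0..1
    distance_km: float    # distance to the anchor endpoint (see comment below)

def score(node: EdgeNode) -> float:
    # Nodes closer to the anchor endpoint score higher. For camera-to-cloud
    # scenarios the anchor is the producer (camera); for cloud gaming it is
    # the consumer (phone), so the caller measures distance accordingly.
    proximity = 1.0 / (1.0 + node.distance_km / 100.0)
    resources = 0.4 * node.cpu_free + 0.3 * node.bw_free + 0.3 * node.storage_free
    return 0.6 * proximity + 0.4 * resources

def pick_node(candidates: list[EdgeNode]) -> EdgeNode:
    """Select the most suitable nearby node from the candidate set."""
    return max(candidates, key=score)

# Example: two candidate nodes for a camera ingest task.
nodes = [
    EdgeNode("hangzhou-edge-01", cpu_free=0.7, bw_free=0.5, storage_free=0.8, distance_km=12),
    EdgeNode("shanghai-edge-03", cpu_free=0.9, bw_free=0.9, storage_free=0.9, distance_km=180),
]
print(pick_node(nodes).name)
```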

View computing cloud-edge-end collaboration architecture

The core of the view computing platform is its cloud-edge-end collaborative architecture:

1. The terminal device is responsible for collecting and aggregating view and other data, and for decoding and displaying views (i.e. the thin terminal). It can also handle command input and control, or perform simple computations according to configurations and rules issued from the cloud.

2. View computing builds a low-latency device access gateway on the scattered edge nodes, implementing a variety of terminal-to-cloud connection protocols (such as GB28181 and RTMP) and accepting both real-time streams and fast uploads of video and image files. Computation and periodic storage take place on the node or an adjacent node, while computation results (such as structured AI analysis data) and data that needs long-term persistence are quickly returned to the cloud through a secure, accelerated edge-to-cloud channel.

3. The central cloud manages nodes and devices and performs unified scheduling, metadata aggregation and so on. Each terminal device is mapped to a virtual device in the cloud, a projection of the physical world (the shadow device). Device operation, management and configuration act on the shadow and are issued through the signaling channel, executed and acknowledged quickly; when a physical device powers off or goes offline abnormally, its context is preserved and synchronized promptly once it comes back online.
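A minimal sketch of the shadow-device idea described above, with hypothetical field and method names: configuration changes are applied to the shadow; if the physical device is offline they are queued, and the saved context is synchronized when it reconnects.

```python
import time

class ShadowDevice:
    """In-cloud projection of a physical terminal (hypothetical structure)."""

    def __init__(self, device_id: str):
        self.device_id = device_id
        self.online = False
        self.desired = {}    # configuration the cloud wants on the device
        self.reported = {}   # configuration last reported by the device
        self.pending = []    # signaling queued while the device is offline

    def configure(self, key: str, value) -> None:
        # Operations act on the shadow, not directly on the device.
        self.desired[key] = value
        command = {"set": {key: value}, "ts": time.time()}
        if self.online:
            self._send(command)
        else:
            self.pending.append(command)   # context is preserved

    def on_reconnect(self, reported_config: dict) -> None:
        # When the device comes back online, synchronize the saved context.
        self.online = True
        self.reported = reported_config
        for command in self.pending:
            self._send(command)
        self.pending.clear()

    def _send(self, command: dict) -> None:
        # Placeholder for the real signaling channel.
        print(f"[{self.device_id}] issue {command}")

cam = ShadowDevice("camera-0001")
cam.configure("resolution", "1080p")      # device offline: queued on the shadow
cam.on_reconnect({"resolution": "720p"})  # queued config is delivered now
```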

What users see through view computing is one cloud and one interaction surface, instead of a distributed cloud with N scattered entry points. The cloud-edge collaborative architecture can find the best balance among cost, latency and reliability; on the cost side, for example, network bandwidth, computing and storage costs all need to be weighed together.

Location-insensitive multi-point collaborative computing

The view computing service provides three kinds of location-insensitive computation for view data:

1. Basic view computation: including transcoding, recording, screenshots, etc. Coding optimization achieves a high compression ratio, saving 20%-40% of storage space and transmission bandwidth at the same picture quality.

2. View AI computation: relying on the computer vision algorithms accumulated by DAMO Academy, view computing provides scenario-based structural analysis of views and AI capabilities such as target detection and tracking.

3. Custom computation: self-service uploading and hosting of operators lowers the cost of algorithm access and makes it easy for users and algorithm providers to integrate their algorithms into the view computing service. Besides operators and parameters, the computing pattern itself can be customized to business needs (see the sketch after this list).
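As a sketch of what hosting a custom operator could look like, the interface below is purely illustrative (this article does not publish the actual view computing operator contract): an operator declares the resources it needs and a processing entry point that the platform invokes on each media segment.

```python
from abc import ABC, abstractmethod

class Operator(ABC):
    """Hypothetical contract for a user-hosted operator."""

    #: resources the scheduler should reserve per instance (illustrative values)
    requirements = {"cpu": 1.0, "gpu": 0, "memory_mb": 512}

    @abstractmethod
    def process(self, segment: bytes, params: dict) -> dict:
        """Consume one media segment and return structured results."""

class BlurDetector(Operator):
    requirements = {"cpu": 0.5, "gpu": 0, "memory_mb": 256}

    def process(self, segment: bytes, params: dict) -> dict:
        # A real operator would decode the segment and run an algorithm;
        # here we only report the segment size as a stand-in result.
        return {"segment_bytes": len(segment), "blurry": False}

# The platform side would instantiate the operator and stream segments to it.
op = BlurDetector()
print(op.process(b"\x00" * 4096, {"threshold": 0.8}))
```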

The biggest characteristic of these computations is that "computation moves with the network": computation is carried out wherever the data flows on the network, avoiding transmitting the full data back to the central cloud for processing, so that computing power sinks toward the edge and terminal computation floats upward.

This network is Alibaba Cloud's edge collaboration network, which integrates terminal-edge, edge-edge and edge-center collaboration, shields the complex network environment from upper-layer applications, and injects computing and caching capabilities while providing high-quality nearby access and data transmission.

Besides typical local computing scenarios such as camera-to-cloud, scenarios such as Internet live streaming can also use view computing for edge transcoding and real-time AI analysis to improve the overall user experience. For example, a live stream no longer needs to be sent back to the center and then distributed to the edge; transcoding and compression are done directly at the nearest node. Around 80% of streams are cold (no one or very few people watching) and can simply converge locally, while hot streams are transcoded and distributed from the nearest node, reducing latency and stutter so that playback is smoother on the client. Throughout the process the terminal only needs to access a unified domain name and never perceives where the computation actually runs: location-insensitive multi-point collaborative computing completes the data processing as effortlessly as using CDN acceleration.
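The cold/hot split described above can be pictured roughly as follows; the viewer-count threshold and function name are assumptions used only for illustration.

```python
def route_live_stream(stream_id: str, viewers: int, hot_threshold: int = 10) -> str:
    """Decide where a live stream is handled (illustrative policy).

    Cold streams (few or no viewers) converge at the ingest edge node and are
    transcoded and stored there; hot streams are transcoded at the edge and
    then pushed to the CDN for wide distribution.
    """
    if viewers < hot_threshold:
        return f"{stream_id}: transcode and keep at ingest edge node (cold)"
    return f"{stream_id}: transcode at edge, push to CDN for distribution (hot)"

print(route_live_stream("room-42", viewers=3))
print(route_live_stream("room-7", viewers=2500))
```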

Customizable scenario computing

In many scenarios you may already have your own operators or applications, or operators from third-party algorithm vendors. View computing provides an open, customizable framework for scenario computing: you can host operators or applications on view computing, truly letting users run their own computations.

The computing platform is divided into three layers; from bottom to top they are the computing environment, computing scheduling and computing services.

1. The computing environment, i.e. the production and control layer for computing resources, is responsible for producing resources such as containers and VMs, for file storage, and for releasing, installing, deploying and configuring operating-system software and operator applications, as well as for log monitoring. This layer also provides basic application isolation.

2. Computing scheduling implements elastic scaling of resources and multi-dimensional global load balancing. On top of the security isolation of lower-layer resources such as containers, this layer performs global planning and coordination to resolve resource contention.

3. Computing services host and assess operators and actually distribute computations. They also profile computing tasks to iteratively improve the accuracy of resource-consumption assessment. Take live transcoding as an example: besides output parameters such as codec, resolution and frame rate, the content of the input source also affects actual resource consumption to some extent, and the computing power consumed by each transcoding channel fluctuates dynamically, which affects the accuracy of the scheduler's resource allocation. Dynamic analysis and calibration are therefore needed to keep the scheduler's allocation watermark consistent with the actual resource watermark (a calibration sketch follows this list).
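One way to picture this calibration, with purely hypothetical class names and numbers: keep an exponentially weighted estimate of the measured cost of each transcoding profile and feed it back into the scheduler's allocation watermark.

```python
class ConsumptionCalibrator:
    """Keeps the scheduler's estimated cost per transcoding profile in line
    with measured consumption (illustrative; alpha and costs are made up)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.estimates = {}   # profile -> estimated CPU cores per channel

    def report(self, profile: str, measured_cores: float) -> None:
        # Blend the new measurement into the running estimate (EMA).
        prev = self.estimates.get(profile, measured_cores)
        self.estimates[profile] = (1 - self.alpha) * prev + self.alpha * measured_cores

    def allocation_for(self, profile: str, channels: int, default: float = 1.0) -> float:
        # The scheduler reserves capacity based on the calibrated estimate.
        return self.estimates.get(profile, default) * channels

cal = ConsumptionCalibrator()
cal.report("1080p30_h264", 0.8)   # measured: simple scene, cheap to encode
cal.report("1080p30_h264", 1.4)   # measured: complex scene, more expensive
print(round(cal.allocation_for("1080p30_h264", channels=10), 2))
```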

The whole access process is very simple:

1. Upload and manage operators, and configure computation templates and parameters; the cloud evaluates compatibility and resource consumption.

2. Apply online for computing power and other resources, such as the maximum concurrency for each computing specification; the cloud evaluates and confirms the request, then the operators are distributed and deployed to each computing node.

3. When content is ingested or a user triggers a task, the cloud streams and computes the data, and feeds the results back to the user in real time (a hypothetical API sketch follows this list).
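Put together, the three steps could look like the hypothetical client calls below; the class, method names and URLs are invented for illustration and do not correspond to a published SDK.

```python
class ViewComputingClient:
    """Hypothetical client used only to illustrate the access flow."""

    def upload_operator(self, package_path: str, template: dict) -> str:
        # Step 1: upload the operator; the cloud checks compatibility and cost.
        print(f"upload {package_path}, cloud evaluates compatibility and consumption")
        return "operator-123"

    def request_capacity(self, operator_id: str, spec: str, max_concurrency: int) -> None:
        # Step 2: apply for capacity; the cloud confirms and deploys to nodes.
        print(f"apply for {max_concurrency}x {spec}; deploy {operator_id} to edge nodes")

    def submit_task(self, operator_id: str, source_url: str) -> dict:
        # Step 3: ingest content, compute near the source, return results.
        print(f"stream {source_url} into {operator_id} and compute near the source")
        return {"status": "ok", "result_url": "https://example.invalid/result.json"}

client = ViewComputingClient()
op_id = client.upload_operator("detector.tar.gz", {"resolution": "1080p"})
client.request_capacity(op_id, spec="gpu.small", max_concurrency=50)
print(client.submit_task(op_id, "rtmp://example.invalid/live/cam01"))
```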

Taking cloud gaming as an example, the streaming image that loads the game package and renders the video stream is itself an operator or application. After the game vendor uploads the game package and configures the rendering specification, the cloud performs the corresponding adaptation, resource assessment and dynamic allocation.

Location-insensitive distributed storage

With the computing platform in place, storing data in the cloud is the next problem to solve. Data sources are dispersed, and value density and usage scenarios differ: content such as live sports events is of high value and needs persistent storage for recorded playback, while the streams from video surveillance cameras are of relatively low value; only the clips covering key events need to be kept, and most data only needs to be cached for a few days or months.

How do we solve the access-latency, availability and cost problems of storing data in a decentralized, tiered way?

View computing provides a location-insensitive distributed storage solution based on edge distributed file caching plus central OSS persistence. Data sources around the world access the nearest edge node through view computing, and the cache location follows the location of data access and computation to ensure overall affinity. Periodic data is cached at the edge, while high-value data that must be stored long-term, together with structured analysis data, is returned to central storage.
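A rough sketch of the placement rule described above, with hypothetical field names and retention values: periodic data stays in the edge cache near where it was produced and computed, while event clips and structured results go back to central storage.

```python
from dataclasses import dataclass

@dataclass
class ViewObject:
    key: str
    produced_at_node: str   # edge node where the data was ingested/computed
    is_event_clip: bool     # fragment covering an event of interest
    is_structured: bool     # e.g. structured AI analysis results

def place(obj: ViewObject, edge_ttl_days: int = 30) -> dict:
    """Decide where an object lives and for how long (illustrative policy)."""
    if obj.is_event_clip or obj.is_structured:
        # High-value data is persisted centrally for the long term.
        return {"tier": "central-oss", "ttl_days": None}
    # Ordinary periodic data stays close to where it is accessed and computed.
    return {"tier": f"edge-cache@{obj.produced_at_node}", "ttl_days": edge_ttl_days}

print(place(ViewObject("cam01/2024-06-01/seg-17.ts", "hangzhou-edge-01", False, False)))
print(place(ViewObject("cam01/2024-06-01/event-03.mp4", "hangzhou-edge-01", True, False)))
```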

At the same time, view computing uses the edge acceleration network to solve the high latency, low speed and frequent interruptions of uploading large files across regions and carriers, relaying and accelerating the return of data to the cloud.

Users still see and use a single cloud, regardless of where the data is physically stored.

Distributed Cache Platform

Location-insensitive storage access is implemented by the distributed cache platform, which provides nearby-access, large-capacity, cost-effective periodic file caching. Within the cache period, multi-point cooperative storage scheduling and multi-node, multi-copy redundancy deliver high service availability and high data reliability.

View computing provides flexible cache access and scheduling policies (nationwide, regional, per-carrier, or custom node scope). It is also compatible with the way central OSS is used (SDK/API): after downloading the OSS SDK, you only need to change the endpoint to the access domain name to switch to the distributed cache, with almost zero development cost. Compared with central OSS, the concept of Region is removed, and a unified, centralized domain name is used for access and management. There really is just one cloud.
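In code, the switch described above amounts to pointing the standard OSS SDK at a different endpoint. The sketch below uses the real oss2 Python SDK, but the unified access domain is a placeholder, not an actual view computing endpoint, and the credentials are elided.

```python
import oss2

auth = oss2.Auth("<access_key_id>", "<access_key_secret>")

# Central OSS: a region-specific endpoint.
central = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-bucket")

# Distributed edge cache: same SDK and same calls, only the endpoint changes to
# the unified access domain (placeholder below) and the Region concept disappears.
edge = oss2.Bucket(auth, "https://<unified-access-domain>", "my-bucket")

# The upload call is identical in both cases.
edge.put_object("cam01/2024-06-01/seg-17.ts", b"...video segment bytes...")
```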

How is this location insensitivity achieved? The key points:

1. Physical files are cached on edge nodes, while control data and file metadata are aggregated in the center for centralized management and retrieval.

2. File writes and reads use 302 scheduling: the client writes to the unified domain name and, after storage scheduling, is redirected to the real physical location to read or write.

3. Node state and capacity are monitored in real time; when a single point becomes unwritable, traffic automatically migrates to other nodes with imperceptible service drift and switchover, and data is quickly replicated and synchronized once the node recovers.

4. Multi-node, multi-copy redundant storage allows traffic to be shifted quickly when a single point is unavailable, and also supports load balancing when traffic is heavy (a sketch of the 302 write flow follows this list).
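The 302 write path could be exercised roughly as below; the unified domain is a placeholder and the flow is a sketch under that assumption. The client asks the unified domain, is redirected to the real node chosen by storage scheduling, and uploads there.

```python
import requests

UNIFIED = "https://<unified-access-domain>"   # placeholder unified entry point

def write_object(key: str, data: bytes) -> requests.Response:
    # Step 1: write against the unified domain; do not auto-follow redirects,
    # so the scheduling decision is visible to the client.
    first = requests.put(f"{UNIFIED}/{key}", data=data, allow_redirects=False)
    if first.status_code == 302:
        # Step 2: the Location header points at the real physical node chosen
        # by storage scheduling; upload the data there.
        return requests.put(first.headers["Location"], data=data)
    return first

resp = write_object("cam01/seg-17.ts", b"...")
print(resp.status_code)
```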

View connection platform and full-lifecycle PaaS services

To help view data get to the cloud more smoothly, view computing provides a platform that connects terminals to the cloud and a PaaS service covering the full lifecycle of views, including collection, computation and content consumption. The connection capabilities mainly include:

1. Access and control of devices

2. Access and management of view content

3. View processing and view storage, built respectively on the location-insensitive computing platform and cache platform introduced above. View processing provides basic and advanced capabilities such as transcoding, AI analysis, encryption and streaming rendering; view storage provides access and retrieval capabilities, lifecycle cleanup policies, and policies for returning data to cloud storage and archiving.

Safe and easy-to-use one-click cloud access for view terminals

Currently, there are three mainstream solutions for moving view terminals to the cloud:

1. GB/T 28181, the national standard in the security field, suffers from complex access, weak security and missing functionality: signaling is transmitted in plaintext, and the data stream has essentially no authentication beyond simple identification by SSRC, which cannot effectively prevent collision or forgery. There are also adaptation and migration issues between the 2011 and 2016 versions of the standard, so the overall cloud-access experience is poor.

2. ONVIF, proposed in 2008, is supported by a large number of device manufacturers worldwide, but its multicast discovery mechanism does not work over the public cloud, making it cloud-unfriendly. Its interaction is based on HTTP and uses the SOAP protocol format to define signaling content, with large communication latency.

3. Many device manufacturers have introduced their own private protocols and cloud-access schemes. These are varied, closed and black-box; cloud access built for one cannot be reused for another, leading to much duplicated construction.

View computing's one-click cloud solution provides open, easy-to-use, safe and flexible one-click cloud access for terminals. Its main features are:

Compatible with the national standard, ONVIF and other protocols, it adapts to all kinds of terminals and solves both the complexity and security problems of national-standard access and the public-cloud access problem of ONVIF.

The device access gateway, built on the wide coverage of Alibaba Cloud's edge nodes, guarantees nearby access and reuses the CDN's low-latency transmission and acceleration network as well as its multi-protocol access, ensuring low-latency device communication, signaling control and data-stream ingestion.

The core signaling channel enables transparent two-way communication, and vendors and developers can customize control signaling.

Alibaba Cloud's open device-cloud protocol: ODCAP

The core of the one-click cloud solution in the view computing connection platform is built on ODCAP (Open Device Cloud Access Protocol). We will fully open the protocol's content to support independent access by diverse devices from any vendor.

Terminals going to the cloud are interconnected through the network, and the ODCAP protocol supports a variety of network topologies:

1. The device is on an internal network and reaches the public network through firewall NAT, or is relayed through a device gateway.

2. The device connects directly from a public-network environment, for example devices with 4G/5G connectivity.

3. ODCAP also supports a cascading mode, in which sub-devices connect to an upper-level device through other protocols; the directly connected device shields the differing access methods of its sub-devices and accesses the cloud platform uniformly via ODCAP.

The ODCAP protocol supports many types of devices, enabling all kinds of terminals to go to the cloud. Different devices have different functions; to describe them uniformly, we define each device through a device model that includes four levels:

1. Resources: the various kinds of data generated by the device, such as real-time video streams, video and image files, and structured data produced by terminal-side AI analysis

2. Configuration: the device's configuration information

3. Events: events triggered by the device

4. Services: the functional services provided by the device
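As an illustration only (ODCAP's actual schema is not given in this article), a device model along these lines could be declared as follows:

```python
# Hypothetical device-model declaration with the four levels described above.
ipc_camera_model = {
    "model": "ipc-camera",
    "resources": [                      # data the device produces
        {"name": "main_stream", "type": "live_video", "codec": "h264"},
        {"name": "snapshot", "type": "image_file"},
        {"name": "ai_metadata", "type": "structured_data"},
    ],
    "configuration": {                  # device configuration information
        "resolution": "1920x1080",
        "frame_rate": 25,
    },
    "events": [                         # events the device can trigger
        "motion_detected",
        "device_offline",
    ],
    "services": [                       # functions the device exposes
        "ptz_control",
        "reboot",
    ],
}
```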

In the future, Alibaba Cloud view computing will share more of its latest product capabilities, solutions and technical practices on the "Alibaba Cloud Edge Plus" public account. You are welcome to join the discussion.

This article is original content from Alibaba Cloud and may not be reproduced without permission.