Abstract:

The League of Legends S8 global finals recently came to an end, with the Chinese team IG shutting out FNC to win the championship. The match attracted a great deal of attention from Chinese netizens and also posed technical challenges for live streaming platforms. The Huya live streaming platform, combined with Alibaba Cloud's edge node solution, delivered low-latency, stable, and smooth real-time interaction for 70 million online users on the day of the final.

So what are the characteristics and technical challenges of live streaming large events like S8? Why push business down to the edge? How should a platform choose between building its own edge nodes and working with a cloud service vendor? How does Edge Node Service (ENS) provide technical support, and what targeted optimizations does it offer? This article answers these questions.

Why push your business down to the edge?

Interactive live streaming usually includes core business modules such as audio and video stream pushing, transcoding, distribution, and playback, as well as interactive business logic such as bullet comments (danmaku) and gifting. In terms of experience, there are requirements on picture clarity, playback smoothness, and playback latency. Based on these characteristics and the experience requirements of live streaming, the main technical challenges are as follows:

1. Guaranteeing performance when traffic surges

Streamers and viewers of a live event are concentrated, and the event time is scheduled in advance, so a sudden load arrives at the start time and puts the system under heavy high-concurrency pressure. In addition, flashpoint moments during a match cause even faster swings in traffic and load. For example, when an unexpected play suddenly breaks a deadlock, viewers respond with a flood of bullet comments. Because every message is delivered to all viewers in the room at the same moment, data volume grows explosively, and the instantaneous load can be imagined.

There are usually two fundamental and effective ways to deal with surging traffic and high concurrency. The first is to reserve enough resource headroom so that processing capacity meets demand at peak traffic, which requires good resource elasticity. The second is to use a traffic scheduling system to distribute the load across different resources for parallel processing, relieving pressure on any single point.
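The two approaches can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the function names, the headroom ratio, and the node names and capacities are hypothetical and do not come from the Huya or ENS systems.

```python
def required_capacity(expected_peak: float, headroom: float = 0.5) -> float:
    """Capacity to reserve so that the expected peak still leaves `headroom` spare."""
    if expected_peak < 0 or headroom < 0:
        raise ValueError("expected_peak and headroom must be non-negative")
    return expected_peak * (1.0 + headroom)


def distribute(load: float, capacities: dict[str, float]) -> dict[str, float]:
    """Split `load` across nodes in proportion to each node's capacity."""
    total = sum(capacities.values())
    if load > total:
        raise ValueError("load exceeds total reserved capacity")
    return {node: load * cap / total for node, cap in capacities.items()}


# Hypothetical edge nodes and their bandwidth capacities in Gbps.
nodes = {"edge-a": 40.0, "edge-b": 40.0, "edge-c": 20.0}
shares = distribute(80.0, nodes)
for node, share in shares.items():
    print(f"{node}: {share:.1f} Gbps")
```

A real scheduler would also weigh network distance, carrier, and node health; proportional splitting is simply the easiest policy to reason about.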

2. Guaranteeing core experience metrics such as instant start-up, low latency, and smoothness

The core experience metrics of live streaming directly determine the user's streaming and viewing experience. The most common metrics are playback latency and playback smoothness. There are many ways to improve them, such as scheduling and network link quality optimization, protocol optimization, P2P transmission, and player optimization.

The key to guaranteeing these metrics lies in content delivery speed and network transmission quality. For example, live stream distribution relies on the CDN to guarantee network speed and distribution quality. By having users access the nearest node, it reduces network latency and avoids the instability of access across networks or over complex network links. In fact, beyond live stream distribution, more business modules in a live streaming scenario can be placed at the edge, such as transcoding after stream pushing and bullet comment distribution, making full use of the edge network and edge computing to achieve a better experience.

3. Service stability

The importance of service stability is beyond doubt, especially when streaming large-scale events: once a stability problem occurs, the impact is severe. Service stability depends first on architecture and solution design. System risk points need high-availability design so that a single point of failure does not affect the whole link, and core modules must have failover capability or a degradation plan. Second, stability depends on monitoring and on operation and maintenance, so that faults are found in time and problems can be located and fixed efficiently.

Facing all of the above challenges, more and more live streaming platforms have been pushing their business down to the edge to make full use of the edge network and edge computing. First, this reduces network latency and gives users a better viewing experience. Second, it relieves pressure on central and single-point resources, absorbs instantaneous load, and helps the business get through traffic peaks smoothly.

So what problems will we face if we build our own nodes?

As illustrated above, there are many pain points and challenges associated with building your own infrastructure:

1. Heavy assets and high cost

First, self-built infrastructure means handling everything from supply chain management, such as commercial and server procurement, to node construction, which leads to heavy asset investment and high cost.

2. Poor elasticity

Second, when unexpected business demand arises, new nodes take a long time to deliver and offer little flexibility, and many resources sit idle after a temporary business peak.

3. Difficult operation and maintenance

In addition, self-built infrastructure is an operations challenge. First, you must handle the whole lifecycle of an edge node, from construction and delivery to operational management. Second, you must manage the edge node at the physical server level, the operating system level, and the application software level, and you need a set of tools for viewing logs remotely and for locating and troubleshooting problems. All of this places high demands on the automation and visualization of operation and maintenance, that is, "white-screen" operations carried out through a graphical console.

4. Security and reliability challenges

The last challenge is security and reliability. First, the reliability of the edge data center (DC) environment depends on the services of third-party carriers, so a variety of complex situations must be handled, including countermeasures for carrier network cutovers. Edge nodes also need timely detection and scheduling capabilities to cope with all kinds of possible hardware and software failures. These requirements are reflected directly in the design and development of the business architecture, and the challenges and costs are considerable. In addition, security must cover aspects such as network traffic security and host security, and developing a security solution at every level is very expensive. Take DDoS protection as an example: when an IP address on an edge node is attacked, the network of the whole node may become unavailable, and achieving the desired protection may require deploying a combined software and hardware solution on every edge node.

Edge Node Service (ENS), launched by Alibaba Cloud, targets these scenarios and addresses the pain points and challenges that customers encounter with self-built edge facilities. ENS extends the boundary of Alibaba Cloud's public cloud to the edge, works together with the public cloud to fully support customers’ complex “center + edge” business architectures, and truly brings the cloud's infrastructure capabilities down to the user's side. The service is now available on the official website and was applied and validated in supporting the S8 event on the Huya live streaming platform.

How Edge Node Service (ENS) supports live streaming scenarios

Let’s return to live streaming itself, a business scenario that depends heavily on content distribution capability. A CDN supports nearby stream pushing by streamers as well as the distribution of and access to live streams, ensuring low latency end to end and greatly reducing distribution bandwidth cost. ENS builds on existing CDN nodes to offer richer computing, storage, network, and security capabilities, so it can easily host both the customer’s own CDN system and the live streaming business modules that run in the edge DC.

1. Better support for elastic resource demand

As mentioned above, the most effective way to deal with bursts and instantaneous traffic growth is to reserve enough resources, which is really a demand for resource elasticity. Today, a business with a “center-terminal” architecture can easily obtain this capability by using Alibaba Cloud's elastic computing services. Because live streaming is distributed through a large number of CDN nodes, a large part of its elasticity demand sits at the edge, which makes it a “center-edge-terminal” architecture.

ENS is designed for this kind of architecture and scenario. Relying on its complete coverage of regions and carriers across the country, as well as the ample computing and bandwidth resources of its nodes, it can support the elastic resource demand of live streaming for large-scale campaigns and events. ENS provides application and image delivery capabilities, and resources can be created in about 1 minute, which greatly improves the efficiency of scaling out and scaling in.

2. Complete and open scenario service capability

ENS encapsulates the complex underlying infrastructure and network environment of edge nodes and provides customers with standard computing, storage, network, and security capabilities. Customers do not need to care about differences in the underlying facilities and environment, or about underlying operation and maintenance issues.

With these edge instances and computing resources, any live streaming module that is suitable for the edge can be moved there by specifying resource specifications and bandwidth usage. ENS provides a variety of storage solutions and DDoS protection, which fully cover the basic capability requirements of live streaming and other scenarios.

3. Reliable and continuous service capability

ENS is designed and developed on the Apsara Edge architecture of Alibaba Cloud's Apsara 2.0. It inherits the technology accumulated in the Apsara system over many years and is combined with Alibaba Cloud's globally leading automated operation and maintenance system to provide reliable, continuous service. ENS instances and computing resources achieve high availability through underlying automatic migration, and nodes have full-coverage network monitoring that detects network jitter in real time. ENS supports customer-level resource isolation to avoid resource contention, which keeps live streaming modules at the edge running stably.

4. Convenient and efficient operation and maintenance support

ENS has a complete, easy-to-use web management console and OpenAPI. They support remote online management of edge instances, real-time visual monitoring of operating metrics such as CPU, storage I/O, and network traffic, and visual statistical analysis of data, greatly improving the capability and efficiency of monitoring and operations.

5. Significantly lower central bandwidth cost

ENS saves resource construction and development costs in the early stage of a service, operation, maintenance, and management costs later on, and bandwidth costs between the center and the edge. According to statistics, central bandwidth cost can be reduced by 30% on average.

ENS optimizations for live event scenarios

The Alibaba Cloud ENS team has also made many technical optimizations for live streaming of regular campaigns, sports events, and e-sports. For major events, the team starts risk assessment and support planning in advance, closely tracks the live streaming platform's business metrics, performance metrics, and stability from the start of the tournament, and keeps staff on duty during every match to find and solve problems in time.

1. Fixing packet loss in the bullet comment service on edge nodes

Bullet comments are a very common form of interaction on live streaming platforms. Technically, they generally follow a multi-user online chat room architecture in which each online user's message is broadcast to all other online users. At key moments of a match, this frequently produces sudden, very high instantaneous traffic, a pattern that differs from other, more regular business traffic. Sampling bandwidth curves of different time granularity are shown in the figure below.

With a coarser sampling granularity, the bandwidth curve looks smooth and shows no problem at all: the node's bandwidth headroom appears sufficient. At one-second granularity, however, instantaneous bandwidth fluctuates sharply and its peaks are very high. These bursts can saturate a server's network interface card (NIC), and the combined bursts from multiple servers in a single node put pressure on the switch's instantaneous processing capacity, so packets are lost.
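The effect of sampling granularity is easy to reproduce with a toy trace. The numbers below are invented for illustration (a 60-second trace at 200 Mbps with two one-second bursts at 1000 Mbps); they are not measurements from the S8 event, and `resample_mean` is a hypothetical helper.

```python
def resample_mean(per_second_mbps: list[float], window_s: int) -> list[float]:
    """Average a per-second bandwidth series over consecutive windows of `window_s` seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    averaged = []
    for start in range(0, len(per_second_mbps), window_s):
        window = per_second_mbps[start:start + window_s]
        averaged.append(sum(window) / len(window))
    return averaged


# Hypothetical trace: steady 200 Mbps with two 1-second bursts at 1000 Mbps.
trace = [200.0] * 60
trace[10] = 1000.0
trace[40] = 1000.0

per_minute = resample_mean(trace, 60)
print(f"1-minute average: {per_minute[0]:.1f} Mbps")
print(f"1-second peak:    {max(trace):.1f} Mbps")
```

The one-minute average stays close to the baseline, while the per-second peak is several times higher, which is why coarse monitoring alone cannot reveal this kind of packet loss.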

ENS uses NIC traffic shaping to smooth out instantaneous data bursts. After traffic shaping was applied, validation in production showed good results, and packet loss was effectively avoided.
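The article does not say which shaping algorithm ENS uses. A token bucket is one common way to shape traffic, and the simulation below is a minimal sketch of that idea only, in one-second ticks, with hypothetical rate and burst values. In production, shaping is normally done by the operating system or the NIC rather than in application code.

```python
def shape(arrivals: list[int], rate: int, burst: int) -> list[int]:
    """Simulate a token-bucket shaper in 1-second ticks.

    arrivals[t] is the amount of data offered in second t. The result is the
    amount actually sent in each second. Excess data is queued and sent later,
    not dropped, so the output may be longer than the input.
    """
    if rate <= 0 or burst <= 0:
        raise ValueError("rate and burst must be positive")
    tokens = burst   # the bucket starts full
    backlog = 0      # data waiting to be sent
    sent = []
    tick = 0
    while tick < len(arrivals) or backlog > 0:
        offered = arrivals[tick] if tick < len(arrivals) else 0
        tokens = min(burst, tokens + rate)  # refill, capped at the bucket size
        backlog += offered
        out = min(backlog, tokens)          # send only what the tokens allow
        tokens -= out
        backlog -= out
        sent.append(out)
        tick += 1
    return sent


# Hypothetical burst: 1000 units arrive in a single second.
shaped = shape([100, 100, 1000, 100, 100], rate=300, burst=300)
print(shaped)
```

The shaped output never exceeds the bucket size in any one second, and the total amount sent equals the total offered: the burst is spread over the following seconds at the cost of a short delay.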

2. Fixing packet loss on some network transmission service instances

Live streaming quality depends on network transmission. Some network transmission service instances experience packet loss, and ENS has also optimized for this problem. The first step is to check whether the switch and the instances are abnormal, whether CPU load is high, and whether the severity of packet loss is closely correlated with bandwidth. The per-core CPU load of the instance is then analyzed further to locate the source of the problem.

For example, when a few cores are heavily loaded and the network adapter’s processing capability is insufficient, the RPS feature can be enabled on the instance to keep its latency and packet loss rate normal. ENS instances can still deliver stable performance and service under 5 times the normal business load.
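Assuming that RPS here means Linux Receive Packet Steering, it is configured per receive queue by writing a hexadecimal CPU bitmap to `/sys/class/net/<dev>/queues/rx-<n>/rps_cpus`. The sketch below only computes the bitmap and the file path; it does not write anything, since that requires root. The device name `eth0` and the CPU list are hypothetical, and the article does not describe how ENS applies the setting.

```python
def rps_cpu_mask(cpus: list[int]) -> str:
    """Return the hex CPU bitmap for `rps_cpus`, in comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu  # set the bit for this CPU
    hex_digits = format(mask, "x")
    groups = []
    while hex_digits:
        groups.append(hex_digits[-8:])  # 8 hex digits = one 32-bit group
        hex_digits = hex_digits[:-8]
    return ",".join(reversed(groups))   # most significant group first


def rps_path(dev: str, queue: int) -> str:
    """Return the sysfs file that holds the RPS CPU bitmap for one receive queue."""
    return f"/sys/class/net/{dev}/queues/rx-{queue}/rps_cpus"


# Hypothetical example: steer packets from queue 0 of eth0 to CPUs 0-3.
print(rps_path("eth0", 0), "<-", rps_cpu_mask([0, 1, 2, 3]))
```

Spreading receive processing across more cores in this way addresses the case where one or two cores handling NIC interrupts are saturated while the others are idle.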

3. Customized monitoring

Given the business characteristics of live streaming and the core concerns during large-scale campaigns, sports events, and e-sports, the Alibaba Cloud edge computing team carried out in-depth analysis based on past experience and communication with customers, and built visual monitoring of the data for this business scenario.

ENS supports customized development of monitoring and alerting capabilities that are closer to users’ business needs. For network jitter and the real-time service capability of nodes and instances, it ensures that problems are detected and reported to customers immediately, links up with customers’ systems, and enables problems to be handled in the shortest possible time. For major events, an automated on-call roster is drawn up according to the match schedule, forming a dedicated green channel for response and handling to keep services stable.

ENS multi-storage solutions and security capabilities

In addition to live streaming, ENS has a number of key capabilities that support other edge scenarios.

1. Multiple storage solutions support diversified scenarios

Different service scenarios place different requirements on instance block storage, such as storage capacity, I/O performance, and storage reliability. In a stateful scenario, data must not be lost or must be recoverable. In a CDN scenario, massive storage space and high I/O performance are required. It is difficult for a single storage solution to satisfy all of these requirements to the fullest. ENS has therefore designed and developed a variety of storage solutions, such as cloud disks and local disks, to support various storage requirements. According to online feedback, stable service capabilities have been established in I/O performance, I/O throughput, storage capacity, storage high availability, and other aspects.

2. Security capability of edge nodes

ENS provides network traffic security and supports DDoS detection and scrubbing. When an IP address on an edge node is attacked, the edge node can detect the attack in real time, within seconds, and automatically scrub the traffic. Combined with the IP black hole capability, this limits the scope of risk so that the edge node continues to provide stable, highly available service. Customers can also use Cloud Shield and other Alibaba Cloud security products to achieve better overall protection.
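The article gives no detail on how detection works. As a purely illustrative sketch, second-level detection can be pictured as counting packets per destination IP in each one-second window and comparing the count against two thresholds. The function name, the thresholds, and the action labels are all hypothetical and are not the ENS implementation.

```python
from collections import Counter


def classify_traffic(
    packets: list[tuple[int, str]], scrub_pps: int, blackhole_pps: int
) -> dict[tuple[int, str], str]:
    """Decide an action per (second, destination IP) from packets-per-second counts.

    Each packet is a (second, dst_ip) pair. Counts at or above `blackhole_pps`
    are marked "blackhole", counts at or above `scrub_pps` are marked "scrub",
    and normal traffic is left out of the result.
    """
    if not 0 < scrub_pps <= blackhole_pps:
        raise ValueError("expected 0 < scrub_pps <= blackhole_pps")
    per_second = Counter(packets)  # (second, dst_ip) -> packet count
    actions = {}
    for key, count in per_second.items():
        if count >= blackhole_pps:
            actions[key] = "blackhole"
        elif count >= scrub_pps:
            actions[key] = "scrub"
    return actions


# Hypothetical traffic: 10.0.0.2 is hit moderately in second 0 and heavily in second 1.
packets = [(0, "10.0.0.1")] * 5 + [(0, "10.0.0.2")] * 50 + [(1, "10.0.0.2")] * 500
print(classify_traffic(packets, scrub_pps=10, blackhole_pps=100))
```

Keeping the decision per IP address is what limits the blast radius: only the attacked address is scrubbed or black-holed, while other addresses on the same node keep serving traffic.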

In addition to live broadcasting, how do other businesses apply ENS?

Edge nodes can support two types of business scenarios: full network coverage and localized service.

The first category, wide coverage of the whole network, mainly applies to online businesses in the Internet industry. The target range of such services generally has no regional restriction, so edge nodes should also cover the whole network. For example, CDN is itself a typical edge computing scenario and can run seamlessly on ENS, bringing significant improvements in overall cost and in operation and maintenance capability. Interactive live streaming and real-time audio and video communication need low latency and bandwidth cost savings. Probing and monitoring services use the edge closest to the user to match the real service and network environment, in order to check the correctness of service logic, service stability, and the performance metrics of core services. Game acceleration and SD-WAN essentially run a software gateway or software router on edge nodes and achieve acceleration and security by optimizing network protocols and network links. Here the role of edge nodes is similar to that in real-time communication.

The second category is localized service, which focuses on ultra-low-latency, high-bandwidth business scenarios within 10 kilometers that require a latency of less than 1 ms. Such scenarios tend to involve traditional industries or offline businesses and have regional characteristics. Examples include moving video surveillance to the cloud in City Brain scenarios, moving video AI applications such as automatic recognition, sales, and monitoring in some stores to the cloud in new retail scenarios, and moving some IT facilities of local industries to the cloud. Mature Alibaba Cloud cases are gradually taking shape in these areas.

ENS is now officially available on the Alibaba Cloud website, where you can apply for activation online, create computing resources in the console, deploy your business, and test the results. You can also use the console directly to manage resources, view consumption, and monitor services dynamically. For product inquiries, join the DingTalk group: 21740823

This month, ENS also launched a special 30% discount for the whole year. Click the link to learn more about the promotion and to purchase.

As the largest cloud computing service provider in China, Alibaba Cloud has accumulated a full set of practical experience and solutions for supporting live streaming of large-scale events. It has more than 1500 CDN nodes worldwide and a large number of edge nodes connecting the last 10 kilometers to users. Combined with video cloud solutions such as real-time media processing, intelligent content moderation, cloud directing, cloud editing, and security protection, it comprehensively improves the capability and efficiency of live streaming platforms. Today, the deep integration of its very-large-scale media processing platform with CDN and ENS combines ample computing capacity with low latency and low cost, so that video analysis and computation can be completed directly on edge nodes, creating more value for customers.