Author: Wang Ao | Source: Serverless public account

Takeaway

The USENIX Annual Technical Conference (ATC) is a top conference in the field of computer systems and is included in the list of Class A international conferences recommended by the China Computer Federation (CCF). This year, 341 papers were submitted and 64 were accepted, for an acceptance rate of 18.8%.

The Alibaba Cloud Serverless team was the first to propose a decentralized, fast image distribution technique for the FaaS scenario, and the team's paper was accepted by USENIX ATC '21. The following presents the core content of the paper, which focuses on shortening the end-to-end cold-start latency of the Custom Container Runtime in Alibaba Cloud Function Compute.

USENIX ATC '21 will be held online from July 14 to 16. For conference and paper information, see: www.usenix.org/conference/…

Abstract

Serverless computing (FaaS) is a new cloud computing paradigm. It lets customers focus only on their own business logic, while the underlying virtualization, resource management, and elastic scaling are maintained by the cloud provider. Serverless computing supports the container ecosystem, unlocking a wide variety of business scenarios. However, because container images are complex and large, and FaaS workloads are highly dynamic and unpredictable, many industry-leading products and technologies cannot be applied well to a FaaS platform. Efficient container distribution is therefore a real challenge on FaaS platforms.

In this paper, we design and propose FaaSNet, a lightweight, highly scalable system middleware. It uses an image acceleration format for container distribution; its target scenario is large-scale container image starts (function cold starts) under bursty traffic in FaaS. The core component of FaaSNet is the Function Tree (FT), a decentralized, self-balancing binary tree topology in which all nodes are equivalent.

We integrated FaaSNet into Alibaba Cloud Function Compute (FC). Experimental results show that, under highly concurrent requests, FaaSNet starts containers 13.4 times faster than unmodified FC. In addition, when bursts of requests destabilize end-to-end latency, FaaSNet takes 75.2% less time than FC to restore latency to normal levels.

Main text

1. Background and challenges

FC announced support for custom container images in September 2020 (developer.aliyun.com/article/772…), and AWS Lambda announced Lambda Container Image support in December of that year, showing that FaaS is embracing the broader trend of the container ecosystem. In February 2021, Function Compute launched function image acceleration (developer.aliyun.com/article/781…). Together, these features let users seamlessly migrate their containerized business logic to the Function Compute platform and start GB-scale images in seconds.

When large-scale requests cause many functions to cold-start at once in the FC backend, even with image acceleration, the bandwidth of the Container Registry comes under great pressure: many machines pull image data from the same Container Registry simultaneously. This causes a bandwidth bottleneck or throttling in the container registry service, lengthening the time to pull and download image data (even in the accelerated image format). A more direct approach is to raise the bandwidth capacity of the FC backend registry, but that does not solve the underlying problem and incurs additional system overhead.

1) Workload analysis

We first analyzed online data from FC's two major regions (Beijing and Shanghai):

  • Figure (a) analyzes image-pull latency during function cold starts in the FC system: in Beijing and Shanghai, ~80% and ~90% of pulls, respectively, take longer than 10 seconds.
  • Figure (b) shows the share of the image pull in the whole cold start: for 80% of functions in Beijing and 90% of functions in Shanghai, the image pull accounts for more than 60% of the total cold-start latency.

The workload analysis shows that most of a function's cold-start time is spent acquiring container image data, so optimizing this part of the latency can greatly improve cold-start performance.

According to the online operations history, one large customer instantaneously issued 4,000 concurrent pulls of a function image that was 1.8 GB before decompression and 3-4 GB after decompression. The moment this burst of requests began pulling the container, we received a throttling alarm from the container registry service; the latency of some requests was lengthened, and in severe cases container startup failures were reported. These are the problem scenarios we need to solve.

2) State-of-the art comparison

There are several related technologies available in academia and industry to speed up the distribution of images, such as:

Alibaba's DADI: www.usenix.org/conference/…

Dragonfly: github.com/dragonfly/d…

Uber's open-source Kraken: github.com/uber/kraken…

  • DADI

DADI provides a very efficient image acceleration format that supports on-demand reads (FaaSNet also makes use of this container acceleration format). For image distribution, DADI adopts a tree topology networked among nodes at the granularity of the image layer: each layer corresponds to one tree topology, so each VM participates in multiple logical trees. DADI's P2P distribution relies on several root nodes with large resource specifications (CPU and bandwidth) to serve as the back-to-source data path and as the managers of the peer topology. DADI's tree structure is static; because container provisioning typically doesn't last long, the DADI root node by default dissolves the topology after 20 minutes rather than maintaining it indefinitely.

  • Dragonfly

Dragonfly is also a P2P image and file distribution network, consisting of SuperNodes (master nodes) and dfget (peer nodes). Like DADI, Dragonfly relies on several large SuperNodes to support the whole cluster; it also manages and maintains a fully connected topology through the central SuperNodes (multiple dfget nodes each contribute different pieces of the same file to achieve point-to-point transmission to the target node). The performance of the SuperNodes is a potential bottleneck for the throughput of the entire cluster.

  • Kraken

Kraken's Origin and Tracker nodes manage the whole network as central nodes, and an Agent runs on each peer node. Kraken's Tracker only organizes and manages peer connections in the cluster; Kraken lets the peer nodes carry out data transfers among themselves. However, Kraken is also a layer-based container image distribution network, so its networking logic likewise becomes a relatively complex fully connected mode.

From this walkthrough of three industry-leading technologies, we can see several commonalities:

  • First, all three use the image layer as the distribution unit. This networking granularity is too fine and may lead to many active data connections on each peer node.

  • Second, all three depend on central nodes to manage the networking logic and coordinate the peers within the cluster. The central nodes of DADI and Dragonfly are also responsible for back-to-source data. In production, this design requires deploying several machines with large specifications to carry very high traffic, plus tuning to reach the desired performance.

With these preconditions in mind, consider the machines in FC's ECS architecture: each has 2 CPU cores, 4 GB of memory, and 1 Gbps of internal network bandwidth, and their life cycles are unreliable; they may be reclaimed at any time.

This leads to three serious problems:

  • Insufficient internal bandwidth leads to bandwidth contention under full connectivity, degrading data transmission performance. A fully connected topology is also not function-aware, which easily creates security problems in FC: because the machines that execute function logic are not trusted by FC's system components, there is a risk that tenant A could intercept tenant B's data.

  • CPU and bandwidth specifications are limited. Because functions are billed pay-per-use, the machine life cycle in our cluster is unreliable, so we cannot, as the three systems above do, dedicate several machines from the pool as central nodes to manage the whole cluster: the system overhead of such machines would become a major burden, their reliability cannot be guaranteed, and their failure would bring the cluster down with them. What FC needs is networking technology that can be formed instantly and inherits the pay-as-you-go character.

  • Multi-function problems. None of the three systems above has a function-awareness mechanism. For example, in DADI's P2P, hosting too many images on a single node may turn it into a hotspot and degrade performance. A more serious problem is that multi-function pulls are inherently unpredictable: when multi-function pulls saturate the bandwidth, other services downloading from remote ends at the same time, such as code packages and third-party dependencies, are also affected, causing availability problems for the whole system.

With these issues in mind, we’ll explain the FaaSNet design in detail in the next section.

2. Design scheme – FaaSNet

As discussed above, the three mature P2P schemes in the industry do not achieve function-level awareness, mostly use fully connected topologies within the cluster, and place certain demands on machine performance. These assumptions do not fit the system realities of FC's ECS fleet. We therefore propose the Function Tree (FT), a function-aware, function-level logical tree topology.

1) FaaSNet architecture

The gray parts in the figure are the components FaaSNet adds or modifies; the white modules continue FC's existing system architecture. Notably, all of FaaSNet's Function Trees are managed on the FC scheduler; on each VM, a VM agent cooperates with the scheduler over gRPC to exchange upstream and downstream messages. The VM agent is also responsible for fetching image data from its upstream node and distributing it to its downstream nodes.

2) Decentralized function/image-level self-balancing tree topology

To solve the above three problems, we first raise the topology granularity to the function/image level, which effectively reduces the number of network connections on each VM. In addition, we designed a tree topology based on the AVL tree. Next, we elaborate on our Function Tree design.

Function Tree

  • Decentralized, self-balancing binary tree topology

The FT design is inspired by the AVL tree algorithm. In an FT there is currently no concept of node weight; all nodes are equivalent (including the root). Whenever a node joins or leaves the tree, the whole tree maintains a perfectly balanced structure: the absolute height difference between the left and right subtrees of any node is at most 1. When a node joins or is deleted, the FT adjusts its own shape (through left/right rotations) to restore balance. The figure below shows a right-rotation example: node 6 is reclaimed, which leaves the subtree rooted at node 1 height-unbalanced; a right rotation restores balance, and State 2 shows the final state after rotation, with node 2 as the new root. Note: every node represents an ECS machine in FC.
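To make the join/reclaim rebalancing concrete, here is a minimal, self-contained Python sketch of an AVL-style peer tree. It is not FC's production code: the `Peer` class, the `vm_id` ordering key, and all helper names are illustrative assumptions; only the rotation logic follows the description above.

```python
class Peer:
    """One ECS machine in the tree (illustrative)."""
    def __init__(self, vm_id):
        self.vm_id = vm_id
        self.left = None     # downstream child
        self.right = None    # downstream child
        self.height = 1

def height(node):
    return node.height if node else 0

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def balance(node):
    return height(node.left) - height(node.right) if node else 0

def rotate_right(y):
    # Promote the left child, e.g. when reclaiming a peer leaves the
    # left subtree too tall (the figure's node-6 example).
    x = y.left
    y.left, x.right = x.right, y
    update_height(y)
    update_height(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update_height(x)
    update_height(y)
    return y

def rebalance(node):
    update_height(node)
    b = balance(node)
    if b > 1:                          # left-heavy
        if balance(node.left) < 0:
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:                         # right-heavy
        if balance(node.right) > 0:
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def join(node, vm_id):
    """A new ECS machine joins the tree."""
    if node is None:
        return Peer(vm_id)
    if vm_id < node.vm_id:
        node.left = join(node.left, vm_id)
    else:
        node.right = join(node.right, vm_id)
    return rebalance(node)

def reclaim(node, vm_id):
    """A machine is reclaimed; the tree rebalances itself."""
    if node is None:
        return None
    if vm_id < node.vm_id:
        node.left = reclaim(node.left, vm_id)
    elif vm_id > node.vm_id:
        node.right = reclaim(node.right, vm_id)
    else:
        if node.left is None or node.right is None:
            return node.left or node.right
        succ = node.right                 # in-order successor
        while succ.left:
            succ = succ.left
        node.vm_id = succ.vm_id
        node.right = reclaim(node.right, succ.vm_id)
    return rebalance(node)

# Peers join and leave at will; tree height stays O(log N) throughout.
root = None
for vm_id in range(1, 7):
    root = join(root, vm_id)
root = reclaim(root, 6)
```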

In an FT, all nodes are equivalent, and each node's main responsibilities are: 1. pull data from its upstream node; 2. distribute data to its two downstream child nodes. Note that we do not designate a special root node in an FT: the only difference between the root and other nodes is that its upstream is the source, and the root is not responsible for any metadata management. The next section describes how we manage metadata.
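A sketch of this per-node data path: every node resolves a cache miss from its parent, and only the (otherwise ordinary) root reads from the registry. The class and method names here are hypothetical; the real system streams accelerated-image data between VM agents over gRPC.

```python
class Registry:
    def read(self, chunk_id):
        return b"chunk-%d" % chunk_id   # stand-in for real image data

class FTNode:
    def __init__(self, registry=None, parent=None):
        self.registry = registry   # consulted only when parent is None
        self.parent = parent
        self.cache = {}            # chunk_id -> bytes

    def read(self, chunk_id):
        # On a miss, the root fetches from the registry; every other
        # node fetches from its parent, so the registry sees one reader.
        if chunk_id not in self.cache:
            upstream = self.parent if self.parent else self.registry
            self.cache[chunk_id] = upstream.read(chunk_id)
        return self.cache[chunk_id]

root = FTNode(registry=Registry())
child = FTNode(parent=root)
child.read(0)   # the miss walks up through the root; registry serves once
```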

  • Overlap of multiple FTs on peer nodes

A peer node will host different functions of the same user, so a peer node inevitably sits in multiple FTs. As shown in the figure above, the example has three FTs, belonging to func 0-2. Because each FT is managed independently, every node can find its correct upstream even when transmissions overlap.

In addition, we limit the maximum number of functions a machine can hold, which implements the function-awareness property and further mitigates the problem of uncontrollable multi-function data pulls.
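A hedged sketch of what function-aware placement with such a cap could look like on the scheduler side; the cap value, data structures, and names are illustrative assumptions, not FC's actual scheduler implementation.

```python
MAX_FUNCTIONS_PER_VM = 8   # hypothetical threshold

class TreePlacer:
    def __init__(self):
        self.trees = {}     # function_id -> set of vm_ids in that FT
        self.ft_count = {}  # vm_id -> number of FTs the VM has joined

    def place(self, function_id, candidate_vms):
        tree = self.trees.setdefault(function_id, set())
        for vm in candidate_vms:
            if vm in tree:
                return vm   # VM already serves this function's FT
            if self.ft_count.get(vm, 0) < MAX_FUNCTIONS_PER_VM:
                tree.add(vm)   # join the FT, respecting the cap
                self.ft_count[vm] = self.ft_count.get(vm, 0) + 1
                return vm
        raise RuntimeError("all candidate VMs are at the function cap")
```

The cap bounds how many FTs any one VM participates in, so a single machine's bandwidth can never be contended by an unbounded number of simultaneous image pulls.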

Discussion of design correctness

  • Because all nodes in an FT are equivalent, the integration with FC depends on no central node;
  • No topology manager lives inside the cluster; the topology is maintained by an FC system component (the scheduler), which ships the in-memory tree state to each peer node over gRPC along with the create-container request;
  • FT adapts naturally to the high dynamism of FaaS workloads, automatically updating its shape when any number of nodes join or leave the cluster;
  • Building FT at the coarser granularity of a function, on a binary-tree data structure, greatly reduces the number of network connections on each peer node;
  • Networking isolated per function naturally realizes function awareness, improving system security and stability.

3. Performance evaluation

In the experiments, we selected the image of the Alibaba Cloud database DAS application scenario, with Python as the base image; the container image was 700+ MB before decompression and had 29 layers. We present the stress-test portion here; please refer to the paper for the full results. In the test, we compared against Alibaba's DADI, Dragonfly, and Uber's open-source Kraken framework.

1) Stress test

The latency recorded in the stress test is the average user-perceived end-to-end cold-start latency. First, image acceleration significantly improves end-to-end latency compared with traditional FC. However, as concurrency grows, more machines pull data from the central Container Registry simultaneously, causing competition for network bandwidth and increasing end-to-end latency (orange and purple bars). In FaaSNet, thanks to the decentralized design, no matter how much concurrency pressure there is on the source, only one root node pulls data from the source and distributes it downward, so the system is highly scalable and average latency does not rise as concurrency pressure increases.
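This scalability follows from the tree shape: a balanced binary FT reaches N peers in about log2(N) hops, while the registry always serves a single reader. A quick sanity check in Python (the 4,000-peer data point mirrors the incident described earlier; per-hop latency itself is not modeled):

```python
# Depth of a balanced binary tree: only the root touches the registry,
# and data reaches all N peers in ~log2(N) peer-to-peer hops.
import math
for n in (100, 1000, 4000):
    print(n, "peers ->", math.ceil(math.log2(n + 1)), "hops")
```

Even at the 4,000-VM scale, data crosses only about a dozen peer hops, each consuming peer bandwidth rather than registry bandwidth.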

At the end of the stress-test section, we explored how performance evolves when functions with different images (multiple functions) are placed on the same VM. Here, we compared FaaSNet against FC with image acceleration enabled and DADI P2P attached (FC (DADI+P2P)).

The vertical axis in the figure above shows normalized end-to-end latency. As the number of functions with different images grows, DADI P2P builds more per-layer trees, and given the small specification of each ECS in FC, the bandwidth pressure on each VM becomes excessive and performance degrades, stretching end-to-end latency beyond 200%. FaaSNet, which establishes connections at the image level, holds far fewer connections than DADI P2P's layer trees and thus still maintains good performance.
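As a rough illustration of why this matters for the 29-layer test image (illustrative arithmetic only, not measured data): with per-layer trees, a VM joins one tree per layer, each holding roughly one upstream plus two downstream connections, whereas FaaSNet holds one such tree per function.

```python
# Rough per-VM connection counts for the 29-layer test image.
LAYERS = 29
CONNS_PER_TREE = 3          # 1 upstream + 2 downstream in a binary tree
for funcs in (1, 5, 10):
    per_layer_trees = funcs * LAYERS * CONNS_PER_TREE   # DADI-style
    per_function_trees = funcs * CONNS_PER_TREE         # FaaSNet-style
    print(funcs, "functions:", per_layer_trees, "vs", per_function_trees)
```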

Conclusion

High scalability and fast image distribution enable FaaS providers to better unlock custom-container-image scenarios. FaaSNet uses a lightweight, decentralized, self-balancing Function Tree to avoid the performance bottlenecks of central nodes; it introduces no additional system overhead and fully reuses FC's existing system components and architecture. FaaSNet performs networking on the fly, driven by workload dynamics, achieving function awareness without workload profiling or pre-processing.

FaaSNet's target scenarios are not limited to FaaS. In many cloud-native settings, such as Kubernetes and Alibaba SAE, it can excel at handling sudden traffic, relieving the pain point of cold starts degrading user experience and fundamentally addressing slow container cold starts.

FaaSNet is the first work by a domestic cloud vendor published at a top international conference on accelerating container startup under bursty traffic in the serverless scenario. We hope this work provides new opportunities for container-based FaaS platforms, fully opening the door to the container ecosystem and unlocking more application scenarios such as machine learning and big-data analysis tasks.