Summary: This article introduces how Function Compute, by moving to a next-generation IaaS base of Shenlong (X-Dragon) bare metal servers and secure containers, further reduces the absolute latency of cold starts and significantly lowers their frequency.

Author: Xiuzong (修踪)

Background

In August 2020, Function Compute pioneered support for deploying functions as container images. AWS Lambda followed at re:Invent in December 2020, and domestic FaaS vendors announced container support as a major feature in June 2021. Cold starts have always been a FaaS pain point, and the arrival of container images dozens of times larger than code packages made them developers' biggest concern.

During the design phase of container image support, Function Compute decided that developers should experience images exactly the way they experience code packages: with second-level elasticity and the ease of use and extreme flexibility of FaaS itself, free of friction and trade-offs. The ideal user experience is that function invocations barely feel the extra latency of transferring image data from a remote registry.

There are two directions for optimizing image-accelerated cold starts: reducing the absolute latency and reducing the probability that a cold start occurs at all. Since container image support went live, we have cut absolute latency in stages through image acceleration. Building on that, this article introduces Function Compute's next-generation IaaS base of Shenlong bare metal servers and secure containers, which further reduces absolute latency and greatly lowers the cold start frequency.

Optimization process

(Using a single image as an example)

First-generation architecture: ECS virtual machines

Phase 1 (March 2021): On-demand loading to reduce data transfer

Previously, all image data had to be pulled before a container could start, so data that would never be used was downloaded in full and dominated preparation time. Our first optimization was therefore to skip as much unused image data as possible and load on demand. Using image acceleration technology, we eliminated the time spent pulling unused data and brought custom image cold starts in Function Compute from the minute level down to the second level.
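To make the on-demand idea concrete, here is a minimal Python sketch (our own illustration with made-up chunk sizes, not Function Compute's actual implementation) in which image data is fetched from the registry only when the container actually reads it:

```python
CHUNK_SIZE = 4096

def fetch_chunk_from_registry(blob: bytes, index: int) -> bytes:
    """Stands in for a ranged GET against the remote image registry."""
    start = index * CHUNK_SIZE
    return blob[start:start + CHUNK_SIZE]

class LazyImageReader:
    """Serves container reads from a local chunk cache, pulling misses on demand."""

    def __init__(self, remote_blob: bytes):
        self.remote_blob = remote_blob   # stands in for the registry-side image
        self.cache = {}                  # chunk index -> bytes already local
        self.chunks_fetched = 0

    def read(self, offset: int, size: int) -> bytes:
        data = bytearray()
        first = offset // CHUNK_SIZE
        last = (offset + size - 1) // CHUNK_SIZE
        for idx in range(first, last + 1):
            if idx not in self.cache:    # only a miss crosses the network
                self.cache[idx] = fetch_chunk_from_registry(self.remote_blob, idx)
                self.chunks_fetched += 1
            data += self.cache[idx]
        skip = offset - first * CHUNK_SIZE
        return bytes(data[skip:skip + size])

# A 1 MiB "image" of which startup only ever touches the first 8 KiB:
image = bytes(1024 * 1024)
reader = LazyImageReader(image)
reader.read(0, 8192)
print(f"fetched {reader.chunks_fetched} of {len(image) // CHUNK_SIZE} chunks")
# -> fetched 2 of 256 chunks: unused image data never leaves the registry
```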

Phase 2 (June 2021): Record the I/O trace of container instance startup and prefetch the image data during subsequent instance startups

We found that function instances exhibit a highly consistent I/O access pattern during container startup and initialization. Exploiting the fact that a FaaS platform schedules resources according to how applications actually run, we record a desensitized I/O trace the first time a function instance starts; when subsequent instances start, the trace serves as a hint to prefetch the image data locally ahead of time, further reducing cold start latency.
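Continuing the sketch above (again purely illustrative: record_trace, prefetch, and the chunk-level trace format are our own assumptions, not the platform's real interfaces), trace-driven prefetching looks roughly like this:

```python
def record_trace(reader: LazyImageReader, startup_reads) -> list[int]:
    """First start: serve reads normally while logging which chunks are touched."""
    touched = set()
    for offset, size in startup_reads:
        reader.read(offset, size)
        first = offset // CHUNK_SIZE
        last = (offset + size - 1) // CHUNK_SIZE
        touched.update(range(first, last + 1))
    return sorted(touched)   # desensitized: chunk indices only, no payload data

def prefetch(reader: LazyImageReader, trace: list[int]) -> None:
    """Later starts: pull the traced chunks before the container asks for them."""
    for idx in trace:
        reader.cache[idx] = fetch_chunk_from_registry(reader.remote_blob, idx)

startup_reads = [(0, 8192), (65536, 4096)]            # pattern from first boot
trace = record_trace(LazyImageReader(image), startup_reads)

warm = LazyImageReader(image)
prefetch(warm, trace)                                  # done while scheduling
for offset, size in startup_reads:                     # container now boots
    warm.read(offset, size)
print(f"on-demand fetches after prefetch: {warm.chunks_fetched}")  # -> 0
```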

Both optimizations significantly reduce the absolute latency of cold starts, but because traditional ECS VMs are recycled after sitting idle for a while, a cold start is triggered again whenever a fresh machine comes up. How to reduce the frequency of cold starts therefore became one of the key problems for the next stage.

Next-generation architecture: elastic bare metal servers (Shenlong) + microVMs

When designing the next-generation architecture, we considered not only cold start frequency but also the impact of caching on startup latency. We therefore built Serverless Caching: a data-driven, intelligent, and efficient caching system that exploits the characteristics of different storage services to co-optimize software and hardware and further improve the Custom Container experience. A Shenlong machine in the Function Compute backend is rotated out far less often than an idle ECS VM is recycled, so from the user's perspective warm starts become much more frequent: after a cold start, the cache persists on the Shenlong machine, and the cache hit rate can exceed 90%.

Compared with ECS virtual machines, the Shenlong bare metal plus microVM architecture leaves much more room for image acceleration:

  • Lower back-to-origin bandwidth pressure and less duplicated data storage. When thousands of instances start simultaneously, read amplification against the image registry and write amplification of disk storage space are reduced by at least two orders of magnitude compared with ECS VMs (see the back-of-the-envelope sketch after this list).
  • VM-level security isolation allows Function Compute components to safely form an availability-zone-level cache network that can be even faster than cloud disks.
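A back-of-the-envelope calculation with assumed numbers (image size, instance count, and microVM density per host are all hypothetical) shows where the two orders of magnitude come from:

```python
image_gb = 1.0
instances = 5000
instances_per_host = 100   # assumed microVM density on one Shenlong machine

# ECS-VM model: every instance pulls and stores its own copy of the image.
ecs_registry_reads_gb = instances * image_gb
ecs_disk_writes_gb = instances * image_gb

# Bare-metal model: one pull per host, shared read-only by its microVMs.
hosts = instances // instances_per_host
bm_registry_reads_gb = hosts * image_gb
bm_disk_writes_gb = hosts * image_gb

print(f"registry reads: {ecs_registry_reads_gb:.0f} GB -> {bm_registry_reads_gb:.0f} GB "
      f"({ecs_registry_reads_gb / bm_registry_reads_gb:.0f}x less)")
# 5000 GB -> 50 GB: two orders of magnitude, before any availability-zone-level
# cache sharing between hosts is even counted.
```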

Landing the Function Compute Custom Container on Shenlong also improves resource utilization and reduces cost, a win for users and platform operations alike.

The Serverless Caching architecture provides more optimization potential without increasing resource usage costs.

(L1 to L4 are successive cache tiers, ordered from nearest and lowest latency to farthest and highest latency)
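In the spirit of that tiered design, a minimal sketch of a nearest-first lookup with backfill might look as follows (tier names and latency figures are our own assumptions, not measured values):

```python
class Tier:
    """One cache tier; latency_ms is an assumed round-trip cost, not measured."""
    def __init__(self, name: str, latency_ms: float):
        self.name, self.latency_ms = name, latency_ms
        self.store: dict[str, bytes] = {}

def lookup(tiers, key, registry):
    """Probe L1..L4 in order; on a hit, backfill every nearer tier."""
    cost = 0.0
    for i, tier in enumerate(tiers):
        cost += tier.latency_ms
        if key in tier.store:
            for nearer in tiers[:i]:           # promote toward L1 for next time
                nearer.store[key] = tier.store[key]
            return tier.store[key], cost
    data = registry[key]                        # missed everywhere: origin pull
    for tier in tiers:
        tier.store[key] = data
    return data, cost + 100.0                   # assumed registry latency (ms)

tiers = [Tier("L1", 0.1), Tier("L2", 0.5), Tier("L3", 2.0), Tier("L4", 8.0)]
registry = {"layer:abc": b"<layer bytes>"}

_, cold = lookup(tiers, "layer:abc", registry)  # every tier misses
_, warm = lookup(tiers, "layer:abc", registry)  # served straight from L1
print(f"cold path {cold:.1f} ms vs warm path {warm:.1f} ms")  # 110.6 vs 0.1
```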

Horizontal comparison

By this point we had pushed image acceleration quite far, so we selected four typical images from Function Compute's public use cases and ran them on several major cloud vendors at home and abroad (anonymized as vendor A and vendor B) for a horizontal comparison. Each image was invoked every 3 hours, repeated many times; a sketch of the measurement loop and the results follow.
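A minimal sketch of such a measurement loop (illustrative only; the endpoint URL and invocation details are placeholders, and the 3-hour gap is chosen so idle instances are recycled and each call is a fresh cold start):

```python
import statistics
import time
import urllib.request

def invoke_function(url: str) -> float:
    """Returns end-to-end latency in seconds for one HTTP-triggered invocation."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.monotonic() - start

def benchmark(url: str, rounds: int = 8, interval_s: float = 3 * 3600):
    samples = []
    for _ in range(rounds):
        samples.append(invoke_function(url))
        time.sleep(interval_s)   # long enough for idle instances to be recycled
    return statistics.median(samples), max(samples)

# median_s, worst_s = benchmark("https://example.com/ai-inference")  # placeholder URL
```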

1. AI online inference – cat and dog recognition

The image contains an image recognition application built on the TensorFlow deep learning framework. Alibaba Cloud Function Compute and vendor A run it normally, though vendor A performs poorly; vendor B cannot run it at all. In the figure below, the latency figures for Function Compute and vendor A cover the end-to-end cost of image pull, container start, and the inference request, while vendor B's figure covers only the image pull, yet is still the slowest. FC is comparatively stable, showing that Function Compute holds a clear advantage for CPU-heavy workloads such as AI inference.

Baseline: warm start from cloud disk (gray); colored bars show each vendor's extra overhead

2. Python Flask web service

This image is a typical web service built in Python with the Flask framework, chosen to test how efficiently each cloud product performs on-demand loading. Both FC and vendor A show fluctuations, but vendor A's are far more pronounced.

Baseline: warm start from cloud disk (gray); colored bars show each vendor's extra overhead

3. Python machine learning operations

This image also bundles the Python runtime, and each vendor again shows its characteristic behavior: vendor B still downloads the image in full on every request, while vendor A has optimizations but remains unstable.

Baseline: warm start from cloud disk (gray); colored bars show each vendor's extra overhead

4. Cypress Headless Chrome

This image contains a headless-browser test flow. Vendor A cannot run it due to programming model limitations and runtime incompatibilities, and vendor B fails to finish application initialization even after 71.1 seconds. Function Compute clearly still performs well on I/O-heavy images.

Baseline: warm start from cloud disk (gray); colored bars show each vendor's extra overhead. The green area marks end-to-end times better than the baseline.

Recommended best practices

Support for container technology is a must-have feature for FaaS. Containers improve portability and delivery agility, while cloud services cut operational and idle costs and provide elastic scaling. Combining custom images with Function Compute directly addresses the pain of migrating large bodies of business logic to the cloud.

A FaaS platform running containers must eliminate as much extra overhead as possible so that the experience approaches running locally; stable, fast execution is also the mark of an excellent FaaS. FC's image loading optimizations and greatly reduced cold start frequency guarantee exactly that. Application portability should likewise be smooth, neither restricting how developers work nor raising the bar for adoption. Function Compute custom images support standard HTTP services, allow free configuration of the listening port, provide read-write access, ship multiple tool chains and diverse deployment options, impose no mandatory wait for image preparation, offer HTTP triggers that do not depend on other cloud services, and support features such as custom domain names.
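For illustration, a custom image only needs to run a standard HTTP server such as the minimal sketch below (port 9000 is an assumed value; in practice the port is freely configurable in the function settings):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body the trigger delivered, then echo a summary.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"processed %d bytes" % len(body))

    do_GET = do_POST  # serve HTTP triggers of either method

if __name__ == "__main__":
    # Listen on all interfaces so the platform can route invocations in.
    HTTPServer(("0.0.0.0", 9000), Handler).serve_forever()
```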

Function Compute custom images suit, but are not limited to, AI inference, big data analysis, game settlement, online education, and audio/video processing. We recommend ACR EE, the enterprise edition of Alibaba Cloud Container Registry, whose built-in image acceleration removes the manual steps of enabling accelerated pulls and preparing accelerated images that plain ACR requires.

AI/ML online inference

Inference workloads depend on bulky underlying training frameworks and heavy data processing. Common AI frameworks such as TensorFlow easily push images to the GB level, demand significant CPU, and make elastic scaling even more challenging. Function Compute custom images address these needs well: users simply take the underlying framework's image, package their data processing logic on top of it into a new image, avoid the migration cost of changing the runtime environment, and gain the fast turnaround that elastic scaling provides. Workloads such as song preference inference and AI image recognition integrate seamlessly with Function Compute to elastically absorb large volumes of dynamic online inference requests.
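The pattern that benefits most from the high warm-start rate described earlier is loading the heavyweight model once per instance; a minimal sketch (load_model is a hypothetical stand-in for an expensive framework call such as loading a TensorFlow SavedModel):

```python
import time

def load_model():
    """Stand-in for e.g. tf.saved_model.load('/opt/model') inside the image."""
    time.sleep(2)                      # pretend the GB-scale model takes 2 s
    return lambda features: "cat" if sum(features) > 0 else "dog"

MODEL = load_model()                   # cold start cost: paid once per instance

def handler(features):
    """Per-request path: reuses the already-loaded model, no re-initialization."""
    return {"label": MODEL(features)}

if __name__ == "__main__":
    t0 = time.time()
    print(handler([0.3, 0.9]), f"served in {time.time() - t0:.3f}s")
```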

Lightweight and flexible ETL

Services run on data, and data processing often consumes substantial resources to serve fast, efficient data change requests. Custom images give data processing the same security isolation as Function Compute's other runtimes while preserving users' freedom to package their data processing logic into an image. They deliver smooth migration with very low extra startup latency, meeting users' needs for secure, efficient, and elastic data processing in scenarios such as database governance and the Internet of Things.

Battle Settlement

Games of all kinds typically schedule daily tasks into short windows that gather many players at once, and the resulting battle settlement is a form of data processing: to keep players from losing patience, combat data verification usually has to finish within a few seconds, and per-player settlement time must not degrade as the player count grows. The business logic of such processing is complex yet highly repetitive, and packaging the player data processing logic into a Function Compute custom image elastically absorbs the burst of similar settlement requests in a short period.

Future plans

Our optimizations for Function Compute custom images aim to make the extra latency of container image transfer imperceptible, giving cloud-native developers the best possible experience. The optimization will not stop: our ultimate goal is to all but eliminate the overhead of container image pulls, and to keep scaling out rapidly even when the image registry becomes a bottleneck during massive scale-ups. Backed by further improvements to Serverless Caching, the Custom Container feature will in the future help web applications and job-style workloads on Kubernetes run seamlessly on Function Compute. Letting Kubernetes handle resident, steady-state workloads while Serverless services absorb volatile computation will increasingly become a cloud-native best practice.

This article is original content from Alibaba Cloud and may not be reproduced without permission.