Introduction: Through continuous optimization of its elastic components and the application lifecycle, SAE achieves second-level elasticity, and with its core competitiveness in elastic capability, scenario richness, and stability it is the best choice for bringing traditional applications to Serverless.

Author: Jing Xiao

The advent of the Serverless era

Serverless, as the name implies, is a "serverless" architecture: it shields developers from the complexity of server operation and maintenance so that they can devote more energy to designing and implementing business logic. In a Serverless architecture, developers only need to focus on developing the upper-layer application logic, while complex operations such as resource provisioning, environment setup, load balancing, and scaling are handled by the platform. The Cloud Native Architecture whitepaper summarizes the features of Serverless as follows:

  • Fully managed computing services: customers only need to write code to build applications, without shouldering the homogeneous, heavy burden of server-centric work such as infrastructure development, operation and maintenance, security, and high availability;
  • Generality: able to support all important types of applications on the cloud;
  • Automatic elastic scaling: users no longer need to plan capacity for resource usage in advance;
  • Pay-as-you-go billing: enterprises can effectively reduce usage costs, with no need to pay for idle resources.

Looking back at the development of Serverless, attention to it exploded between the first proposal of the concept in 2012 and the launch of AWS Lambda. Expectation and imagination around Serverless gradually ignited the whole industry, but its adoption in production has been less optimistic: a gap remains between the Serverless concept and real production processes, one that challenges people's ingrained usage experience and habits.

Alibaba Cloud firmly believes that Serverless is the inevitable direction after cloud native, and has launched FC, SAE, and other cloud products covering different fields and different types of application loads, continuously promoting the popularization and development of the Serverless concept.

In the current Serverless market landscape, Alibaba Cloud's Serverless product capability ranks first in China and leads globally. The quadrant chart of last year's Forrester evaluation clearly shows Alibaba Cloud matching AWS in the Serverless field. Alibaba Cloud also has the largest share of Serverless users in China: the 2020 China Cloud Native User Survey reports that 66% of Serverless users run on Alibaba Cloud, and the Serverless technology adoption survey shows that more and more developers and enterprise users are already using, or plan to use, Serverless technology in their core businesses.

Exploring Serverless elasticity

Elasticity is one of the core capabilities of the cloud, and the issue it addresses is the mismatch between capacity planning and actual cluster load. Comparing the two pictures: if resources are arranged through advance planning, the prepared quantity rarely matches actual demand, resulting in either wasted resources or resource shortages, which in turn leads to excessive cost or even damaged business. What we expect from extreme elasticity is that prepared resources almost match actual demand, so that overall resource utilization is higher and costs rise and fall with the business, without capacity problems ever affecting application availability. This is the value of elasticity.

The implementation of elasticity divides into scalability and fault tolerance. Scalability means the underlying resources can adapt, to a certain extent, to changes in metrics, while fault tolerance ensures through elastic self-healing that the instances serving the application remain healthy. The value of these capabilities lies in improving application availability while reducing costs: on the one hand, resource usage tracks actual application consumption; on the other, availability at peak is improved, allowing the application to adapt flexibly to constant market change.

The following describes and analyzes three common elastic scaling modes.

The first is IaaS elastic scaling. For example, Alibaba Cloud ESS can trigger the corresponding ECS scale-out or scale-in operations through CloudMonitor alarm rules, dynamically add or remove SLB back-end servers and RDS whitelist entries to ensure availability, and achieve elastic self-healing through health checks. ESS defines the scaling group, the basic unit of elastic scaling, as a collection of ECS instances in the same application scenario together with their associated SLB and RDS. It supports multiple rule types at the same time, such as simple rules, step rules, target tracking rules, and prediction rules. The user workflow is: create a scaling group and scaling configuration, create scaling rules, and then monitor the elastic execution.

The second is Kubernetes elastic scaling, focusing here on horizontal elasticity (HPA); the representative products are K8s and its corresponding managed cloud offerings, such as Alibaba Cloud Container Service. As the infrastructure and platform for application operation and maintenance, K8s provides built-in capabilities mainly around container-level management and orchestration, and its elastic capability focuses on dynamic horizontal scaling of the underlying Pods. The K8s HPA polls Pod monitoring data, compares it with the target value, uses a real-time algorithm to compute the desired replica count, and then increases or decreases the Workload's replicas accordingly. In actual use, users create and configure the corresponding metric sources, elastic rules, and the Workload, and can view the elastic execution through events.
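For reference, the documented HPA scaling rule can be sketched in a few lines of Python (the tolerance defaults to 0.1 upstream; the values in the example are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """K8s HPA rule: desired = ceil(current_replicas * current / target),
    holding steady while the ratio stays inside the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:  # within tolerance: no change
        return current_replicas
    return math.ceil(current_replicas * ratio)

# e.g. 4 replicas at 80% average CPU with a 50% target -> ceil(4 * 1.6) = 7
print(hpa_desired_replicas(4, 80.0, 50.0))
```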

The third is application-portrait elastic scaling, used mainly inside Internet companies, such as Alibaba's ASI capacity platform. The capacity platform provides capacity prediction and capacity change decision services, guides underlying capacity change components such as AHPA/VPA to perform elastic scaling, and revises the capacity portrait based on the elastic results. Elastic scaling is driven mainly by portraits plus metrics, and scaling risk is reduced through advance scale-out plus real-time correction. The whole pipeline uses ODPS and machine learning to process data such as instance monitoring and to generate application portraits, for example baseline portraits, elastic portraits, and big-promotion portraits, and uses the capacity platform to perform portrait injection, change control, fault circuit breaking, and other operations. The user workflow is: generate a capacity portrait from historical data or experience when onboarding an application, correct the portrait in real time from monitored metrics, and monitor and view the elastic execution.

From the comparison it can be seen that, in the abstract, the elastic scaling of all these products works in essentially the same way: a trigger source, elastic decision-making, and a trigger action. The trigger source generally relies on an external monitoring system to collect and process node and application metrics. Elastic decision-making is generally based on periodic polling and algorithmic decisions, sometimes supplemented by historical data analysis, prediction, and user-defined scheduled policies. The trigger action scales instances horizontally while providing change records and external notifications. On this basis, each product competes on scenario richness, efficiency, and stability, and improves the transparency of the elastic system through observability, which eases troubleshooting, guides elastic optimization, and improves user experience and stickiness.
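That shared trigger-source / decision / action structure can be summarized in a minimal control loop; the callables below (`fetch_metric`, `get_replicas`, and so on) are placeholders to be supplied by a platform, not any product's actual API:

```python
import time

def autoscaler_loop(fetch_metric, get_replicas, decide, scale_to, notify,
                    interval_s=30):
    """Generic autoscaling control loop: trigger source -> decision -> action."""
    while True:
        metric = fetch_metric()            # trigger source: external monitoring
        current = get_replicas()
        desired = decide(metric, current)  # decision: polling + algorithm/policy
        if desired != current:
            scale_to(desired)              # action: horizontal scale out/in
            notify(f"scaled {current} -> {desired} (metric={metric})")
        time.sleep(interval_s)
```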

The elastic scaling models of these products also differ. IaaS elastic scaling is a long-established capability: long accumulation, powerful and rich functionality, with cloud vendors' capabilities tending toward homogeneity; its elastic efficiency is limited compared with containers, and it is strongly bound to the underlying IaaS resources. Kubernetes, as an open source product, continuously optimizes and iterates its elasticity and best practices through community effort, which better matches the demands of most development and operations personnel; its elastic behavior and APIs are highly abstract, but its extensibility is insufficient for custom requirements. Application-portrait elastic scaling carries the characteristics of the group's internal environment: it is designed around the group's application state and elastic demands, and focuses on pain points such as resource pool budget and cost optimization, scale-in risk, and complexity. It is not easy to replicate or extend, especially for small and medium external customers.

From their end-state goals, we can see the difference between Internet enterprises and public cloud vendors:

  • Internet companies, with their pronounced traffic characteristics, often run internal applications with many dependencies and slow startup, and have many demands on overall resource pool capacity watermarks, inventory and budget management, and online-offline colocation. Their elastic scaling therefore leans on capacity portraits for advance scale-out, with Metrics-computed data as real-time correction. The goal is for the capacity portrait to be accurate enough to reach the desired resource utilization.
  • Public cloud vendors serve external customers, provide more general and universal capabilities, and meet the differentiated needs of different users through extensibility. In the Serverless scenario in particular, the emphasis is on the application's ability to cope with unexpected traffic. The goal is to achieve near-on-demand use of application resources and full-process service availability through metric monitoring and extreme elasticity, without capacity planning.

Serverless elasticity in practice

As the best practice of cloud computing, the direction of cloud-native development, and the trend of future evolution, Serverless's core value lies in fast delivery, intelligent elasticity, and lower cost.

SAE is an application-oriented Serverless PaaS platform that supports mainstream development frameworks such as Spring Cloud and Dubbo. Users can deploy applications to SAE without code modification, using resources on demand and paying by usage, which fully exploits the Serverless advantage of saving the cost of idle resources. In terms of experience, it is fully managed and O&M-free: users only need to focus on core business development, while application lifecycle management, microservice governance, logging, monitoring, and other functions are handled by SAE.

The competitiveness of elasticity lies mainly in scenario richness, efficiency, and stability. First, let's look at how SAE optimizes elastic efficiency.

Through statistical and visual analysis of the full lifecycle of SAE applications, covering scheduling, init container creation, pulling the user image, creating the user container, and user container startup plus application startup, their time proportions are simplified in the schematic diagram. We can see that the time of the whole application lifecycle concentrates in scheduling, pulling the user image, and the application's cold start. The scheduling phase takes long because SAE opens up the user's VPC there; this step is strongly coupled to scheduling and is slow, and it also suffers long-tail creation timeouts and retries after long-tail creation failures.

The questions are: can scheduling be made faster, or can the scheduling phase be skipped altogether? Pulling the user image includes both downloading and decompressing it, which is especially slow for large images; the optimization ideas are whether the pull can use a cache and whether decompression can be sped up. As for cold start, SAE hosts a large number of monolithic and microservice Java applications, which tend to have many startup dependencies, slow loading and configuration, and a long initialization process, so cold start can take minutes. The direction of optimization is whether the cold start process can be avoided while keeping users unaware and applications unmodified.

First, SAE adopted the in-place upgrade capability. SAE initially used K8s' native Deployment rolling upgrade strategy for releases, which creates a new-version Pod and then destroys the old-version Pod to complete the upgrade. An in-place upgrade, by contrast, updates only one or more container versions within a Pod, without affecting the Pod object as a whole or its other containers. It is implemented with the K8s Patch capability to upgrade containers in place, and the K8s readinessGates capability to keep traffic lossless during the upgrade.
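For illustration only (SAE drives this through OpenKruise's CloneSet rather than manual patching), the K8s mechanism behind an in-place container upgrade is a patch of a Pod's container image, one of the few Pod spec fields that is mutable; pod and image names here are hypothetical:

```python
from kubernetes import client, config

# Assumes a reachable cluster and a pod "demo" in namespace "default".
config.load_kube_config()
core = client.CoreV1Api()

# Strategic-merge patch: replace only the image of the container named "app".
patch = {"spec": {"containers": [{"name": "app",
                                  "image": "registry.example.com/app:v2"}]}}
core.patch_namespaced_pod(name="demo", namespace="default", body=patch)
# The kubelet restarts only the "app" container with the new image;
# the Pod object, its IP, and its node placement stay put (no rescheduling).
```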

The most important gains are avoiding rescheduling and avoiding the rebuilding of sidecar containers (ARMS, SLS, AHAS), which shrinks the deployment time from the whole Pod lifecycle down to just pulling and creating the business container. And since there is no scheduling, the new image can be cached on the node in advance to improve elastic efficiency. SAE adopted CloneSet from Alibaba's open source OpenKruise project as the new application workload, and its in-place upgrade capability improved overall elastic efficiency by 42%.

At the same time, SAE adopts image preheating, in two forms. The first is preheating before scheduling: SAE caches common base images on all nodes to avoid frequent pulls from the remote registry. The second is preheating for scenarios where scheduling can be anticipated: with CloneSet's in-place upgrade, the upgrade process knows which nodes the instances are on, so image pulls for the nodes of later batches can start while the first batch deploys the new version, parallelizing scheduling and user image pulling. This technique gave SAE a 30% improvement in elastic efficiency.

The optimization above targets image pulling. For image decompression, traditional container startup requires downloading the full image data and then unpacking it, even though startup may only use part of the image content, so container startup is slow. SAE uses image acceleration technology to automatically convert the original standard image format into an accelerated image that supports on-demand random reads, so image data can be downloaded and decompressed online, greatly improving application distribution efficiency. In addition, the P2P distribution capability provided by ACR effectively reduces image distribution time.

For cold start, SAE, in collaboration with Dragonwell 11, provides an enhanced AppCDS (Application Class Data Sharing) startup acceleration strategy. With this technology, the class list is captured when the application first starts and dumped into a shared class archive; when the application starts again, the shared archive is used to start it, effectively reducing cold start time. In an SAE deployment, the cache file generated after the application starts is placed on a shared NAS, and the next release starts the application from that cache. Overall cold start efficiency improved by 45%.
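As a rough illustration of the mechanism using standard OpenJDK 11 AppCDS flags (Dragonwell's enhanced AppCDS automates these steps and its exact flags differ; the jar and file names are hypothetical):

```python
import subprocess

JAR = "app.jar"  # hypothetical application jar

# 1. Trial run: record which classes get loaded.
subprocess.run(["java", "-XX:DumpLoadedClassList=app.classlist",
                "-jar", JAR], check=True)

# 2. Dump the shared class archive (on SAE this would land on the shared NAS).
subprocess.run(["java", "-Xshare:dump",
                "-XX:SharedClassListFile=app.classlist",
                "-XX:SharedArchiveFile=app.jsa", "-cp", JAR], check=True)

# 3. Later starts map the archive instead of re-loading and parsing classes.
subprocess.run(["java", "-Xshare:on", "-XX:SharedArchiveFile=app.jsa",
                "-jar", JAR], check=True)
```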

Besides optimizing efficiency across the application lifecycle, SAE also optimized the elastic scaling process itself, which consists of obtaining elastic metrics, making decisions on them, and performing the scaling operation. For metric acquisition, basic monitoring data is already collected at second-level granularity; for layer-7 application monitoring metrics, SAE is planning a transparent traffic interception scheme to ensure real-time acquisition. In the decision phase, the elastic component reconciles multiple queues concurrently and monitors queue backlog and latency in real time.

SAE elastic scaling includes a powerful metric matrix, rich policy configuration, a complete notification and alarm mechanism, and comprehensive observability. It supports multiple data sources: native MetricsServer and MetricsAdapter, Prometheus, cloud products such as SLS, CMS, and SLB, and external gateway routing. It supports many metric types: CPU, MEM, QPS, RT, TCP connection count, inbound and outbound bytes, disk usage, Java thread count, GC count, and custom metrics. After metrics are captured and preprocessed, customized elastic policies can be configured for the specific scenario: fast scale-out with fast scale-in, fast scale-out with slow scale-in, scale-out only, scale-in only, DRYRUN, adaptive scaling, and so on.

More refined elastic parameters can also be configured, such as instance upper and lower bounds, metric intervals, step size and percentage ranges, cooldown and warm-up time, metric collection period and aggregation logic, and CRON expressions; event-driven capability will be supported later. After an elastic trigger fires, the corresponding scale-out or scale-in operation is executed, with traffic drained first so the flow stays lossless, and users can be informed in real time through the notification and alarm channels (DingTalk, webhook, phone, email, SMS). Elastic scaling provides full observability: decision time and decision context are clearly displayed, and instance state is traceable with instance SLA monitoring.
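Conceptually, such a rule bundles a metric target with behavior parameters. The sketch below is an illustrative data model only, not SAE's actual configuration schema:

```python
from dataclasses import dataclass

@dataclass
class ScalingRule:
    """Illustrative shape of a metric-based elastic rule (not SAE's real schema)."""
    metric: str = "cpu"        # cpu / mem / qps / rt / ...
    target: float = 60.0       # e.g. 60% average CPU
    min_replicas: int = 2
    max_replicas: int = 50
    scale_out_step: int = 10   # max instances added per adjustment
    scale_in_step: int = 1     # slow scale-in: shrink one at a time
    cooldown_s: int = 300      # wait between scale-in operations
    warmup_s: int = 60         # grace period for newly started instances
    policy: str = "fast-out-slow-in"

rule = ScalingRule(metric="qps", target=1000)
```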

SAE's elasticity is also competitive in scenario richness. Here are the four scenarios SAE currently supports:

  • Timed elasticity: when the application's traffic load cycle is known, the number of instances can scale periodically by time of day, day of week, or date. For example, keep 10 instances from 8 am to 8 pm to handle daytime traffic, and keep only two instances, or even zero, for the rest of the day due to low traffic. This mode suits scenarios with periodic resource usage and is applicable in the securities, medical, government, and education industries.
  • Metric elasticity: you can configure rules on the expected monitoring metrics, and SAE keeps the application's metrics stable within the configured rules; the default mode is fast scale-out with slow scale-in to ensure stability. For example, set the target CPU to 60%, target QPS to 1000, and the instance range to 2-50. This suits applications with burst traffic or typical periodic traffic, mainly on Internet, gaming, and social networking platforms.
  • Mixed elasticity: combines timed elasticity and metric elasticity, letting you configure different metric rules for different times, days of the week, or dates, thus flexibly covering complex scenarios (see the sketch after this list). For example, from 8 am to 8 pm the target CPU is 60% with an instance range of 10-50, while at other times the instance range is 2-5. This suits scenarios combining periodic resource usage with burst or typical periodic traffic, mostly in the Internet, education, and catering industries.
  • Adaptive elasticity: SAE optimizes for traffic surge scenarios. Using a traffic window, it computes whether the current metrics indicate a surge, scales out with a degree of redundancy proportional to the surge intensity, and disallows scale-in while in surge mode.
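A minimal sketch of the mixed-elasticity idea, choosing instance bounds by time window before applying the metric rule (the times and bounds mirror the example above; the helper names are hypothetical):

```python
from datetime import datetime
import math

def replica_bounds(now: datetime):
    """Daytime (08-20h) uses the wide range from the example; night stays small."""
    return (10, 50) if 8 <= now.hour < 20 else (2, 5)

def desired_replicas(current: int, cpu_pct: float, now: datetime,
                     target_cpu: float = 60.0) -> int:
    lo, hi = replica_bounds(now)
    want = math.ceil(current * cpu_pct / target_cpu)  # metric rule
    return max(lo, min(hi, want))                     # clamped by the time rule

# 12 instances at 90% CPU at 2 pm -> ceil(12 * 1.5) = 18, within 10..50
print(desired_replicas(current=12, cpu_pct=90.0, now=datetime(2021, 5, 1, 14)))
```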

Stability is an important part of building SAE's elastic capability: ensuring that customer applications scale out and in as expected, and remain available throughout the process, is a key concern. SAE elastic scaling follows the overall principle of fast scale-out and slow scale-in, ensures execution stability through multi-level smoothing and anti-jitter, and scales out in advance through the adaptive capability when metrics surge. For stability, SAE currently supports four levels of elastic smoothing configuration, illustrated in the sketch after this list:

  • First-level smoothing: configure the metric collection period, the time window of a single collection, and the metric calculation and aggregation logic
  • Second-level smoothing: configure the tolerance of metric values and interval elasticity
  • Third-level smoothing: configure the scaling step, percentage, and upper and lower limits per unit time
  • Fourth-level smoothing: configure the scale-out/scale-in cooldown window and instance warm-up time
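The following sketch shows how the second- through fourth-level parameters might interact in one decision (illustrative logic only, not SAE's implementation):

```python
import time

def smooth_decision(current, raw_desired, *, tolerance=0.1,
                    max_step=5, cooldown_s=300, last_scale_ts=0.0):
    """Apply tolerance band, per-adjustment step cap, and cooldown to a raw target."""
    # Level 2: ignore changes inside the tolerance band around the current size.
    if abs(raw_desired - current) <= current * tolerance:
        return current
    # Level 4: refuse to act while still inside the cooldown window.
    if time.time() - last_scale_ts < cooldown_s:
        return current
    # Level 3: cap how far a single adjustment may move.
    delta = max(-max_step, min(max_step, raw_desired - current))
    return current + delta
```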

Serverless elasticity best practices

SAE elastic scaling effectively handles automatic scale-out when an instantaneous traffic peak arrives and automatic scale-in when the peak ends. With high reliability, no O&M, and low cost, it keeps applications running smoothly. The following best practices are recommended for elastic configuration.

  • Configure health checks and lifecycle management

You are advised to configure application health checks to ensure overall availability during elastic scaling, so that your application receives traffic only once it has started, is running, and is ready. You are also advised to configure a PreStop hook in lifecycle management so that instances go offline gracefully during scale-in.
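SAE's health check and lifecycle settings map onto K8s-style probe and hook fields; the sketch below shows the shape (field names follow the K8s Pod spec, while the endpoints, ports, and timings are placeholders):

```python
# Illustrative K8s-style container settings behind "health check + PreStop".
container_settings = {
    "readinessProbe": {                  # gate traffic until the app is ready
        "httpGet": {"path": "/health/ready", "port": 8080},
        "initialDelaySeconds": 10,
        "periodSeconds": 5,
    },
    "livenessProbe": {                   # restart the container if it hangs
        "httpGet": {"path": "/health/live", "port": 8080},
        "periodSeconds": 10,
    },
    "lifecycle": {                       # graceful offline during scale-in
        "preStop": {"exec": {"command": ["sh", "-c", "sleep 20"]}}
    },
}
```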

  • Use an exponential retry mechanism

To avoid service invocation exceptions caused by elasticity delay, application startup, or graceful online/offline transitions, you are advised to invoke services with an exponential retry mechanism.
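A minimal sketch of exponential backoff with jitter (the delays and attempt count are illustrative):

```python
import random
import time

def call_with_retry(invoke, max_attempts=5, base_delay_s=0.1, max_delay_s=5.0):
    """Retry a service call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids herds
```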

  • Optimize application startup speed

To improve elastic efficiency, you are advised to optimize application creation speed in the following aspects:

  • Software package optimization: optimize application startup time, reducing the startup duration caused by external dependencies such as class loading and caching
  • Image optimization: slim the image to reduce pull time during instance creation; the open source tool Dive can analyze image layer information to guide targeted slimming
  • Java application startup optimization: SAE, together with Dragonwell 11, provides accelerated application startup for Java 11 users

  • Configure elastic scaling metrics

SAE supports basic monitoring and multiple application monitoring metrics; choose flexibly based on the current application's profile (CPU-sensitive, memory-sensitive, or I/O-sensitive).

You can view historical data of basic and application monitoring metrics (for example the peak, P99, and P95 values over the past 6 hours, 12 hours, one day, or seven days) to estimate target metric values. You can also use load testing tools such as PTS to learn how many concurrent requests the application can handle, how much CPU and memory it needs, and how it responds under high load, in order to estimate the application's peak capacity.

The target value of a metric trades availability off against cost; choose a policy accordingly (see the sketch after this list):

  • Availability-first policy: configure the metric target at 40%
  • Availability/cost balanced policy: configure the metric target at 50%
  • Cost-first policy: configure the metric target at 70%
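Given a measured peak load, the instance count implied by each policy follows from simple arithmetic; a back-of-envelope sketch with a hypothetical peak figure:

```python
import math

def instances_for_target(peak_total_cpu_pct: float, target_pct: float) -> int:
    """Instances needed so average per-instance CPU stays at the target.
    peak_total_cpu_pct: summed CPU demand at peak, in single-instance percents."""
    return math.ceil(peak_total_cpu_pct / target_pct)

peak = 2000.0  # hypothetical: peak demand equals 20 fully loaded instances
for name, target in [("availability-first", 40), ("balanced", 50),
                     ("cost-first", 70)]:
    print(f"{name}: target {target}% -> {instances_for_target(peak, target)} instances")
# availability-first: 50, balanced: 40, cost-first: 29
```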

When configuring elasticity, sort out the upstream and downstream dependencies, middleware, and databases, and configure elastic rules or throttling and degradation measures for them as well, to ensure availability across the whole chain during scale-out.

After elastic rules are configured, continuously monitor and tune them so that capacity tracks the actual application load more closely.

  • Memory metric configuration

For memory metrics, note that some application types manage memory dynamically (for example the Java JVM's memory management, or glibc's malloc and free): idle memory is not returned to the operating system in time, so the physical memory consumed by an instance does not drop promptly, adding instances does not lower average memory consumption, and scale-in therefore never triggers. Memory metrics are not recommended for this type of application.

  • Java runtime optimization: release physical memory to strengthen the correlation between memory metrics and business load

With the Dragonwell runtime environment, enabling ElasticHeap via JVM parameters allows dynamic elastic scaling of the Java heap, reducing the physical memory that the Java process actually occupies.

  • Minimum number of instances

You are advised to set the minimum number of instances for elastic scaling to at least 2, and to configure vSwitches across multiple availability zones (AZs), to prevent the application from stopping entirely when an underlying node or availability zone fails and instances are evicted.

  • Maximum number of instances

When configuring the maximum number of instances, ensure that enough IP addresses are available; otherwise new instances cannot be added. You can view the available IPs of the application's current vSwitch in the console; if they run low, replace the vSwitch or add another one.

  • Maximum elasticity

On the application overview page you can view the applications with elastic scaling enabled, find those that have reached their maximum instance count, and re-evaluate whether the maximum is configured appropriately. If the expected maximum exceeds the product limit (currently 50 instances per application), you can request a higher limit through a support ticket.

  • Availability zone rebalancing

After elastic scaling is triggered, instances may be distributed unevenly across availability zones. You can check each instance's AZ in the instance list; if the distribution is unbalanced, restart the application to rebalance it across AZs.

  • Automatically restore elastic configurations

When a change order is executed, such as an application deployment, SAE stops the application's current elastic scaling configuration to avoid conflicts between the two operations. If you want the elastic configuration restored after the change order completes, select automatic system recovery during deployment.

  • Elastic history

SAE's elastic behavior can currently be inspected through events: scaling times, scaling actions, and real-time and historical decision records with visualized decision context, so you can measure whether the elastic scaling policy is effective and adjust it as needed.

  • Elastic event notification

Combined with multiple notification channels such as DingTalk, webhook, SMS, and phone, you can learn of elastic triggers in a timely manner.

Finally, a case from an online education customer who adopted SAE's elastic scaling. During the COVID-19 pandemic, the customer's business traffic soared 7-8x, putting hardware costs and business stability at great risk. With a traditional ECS architecture, the customer would have had to upgrade its infrastructure in a very short time, a great challenge in both cost and effort. With SAE, the customer enjoyed the technical dividend of Serverless at zero transformation cost. Combined with SAE's multi-scenario elastic policy configuration, elastic self-adaptation, and real-time observability, the customer's service SLA at peak was guaranteed, and the extreme elastic efficiency saved up to 35% of hardware costs.

To sum up, the development direction of elasticity, especially in the Serverless scenario, emphasizes the ability to handle sudden traffic. The goal is near-on-demand use of application resources and full-process service availability through metric monitoring and extreme elasticity, without capacity planning. Through continuous optimization of its elastic components and the application lifecycle, SAE achieves second-level elasticity, with core competitiveness in elastic capability, scenario richness, and stability. SAE is the best choice for bringing traditional applications to Serverless with zero transformation.

This article is original content from Alibaba Cloud and may not be reproduced without permission.