Authors: Yuan Yi, Zi Bai

Takeaway

In the era of cloud-native containers, users face a variety of business scenarios: periodic workloads, on-demand Serverless, and so on. When using auto scaling, they run into problems of one kind or another, the most prominent being elastic lag and cold starts. An Alibaba Cloud team and the intelligent time-series team of Alibaba DAMO Academy jointly developed the AHPA elastic prediction product. Its core idea is to build a "timed plan" from detected cycles and scale out in advance according to that plan. With business stability guaranteed, resources can truly be used on demand.

Background

There are two main reasons why users' expectations of cloud elasticity keep rising. One is the rise of the cloud-native concept, as the industry moves from the VM era to the container era. The other is the rise of new business models that are designed on the cloud from day one and therefore naturally demand elasticity.

With the cloud, users no longer need to build infrastructure themselves from physical machines and server rooms; the cloud provides highly flexible infrastructure, and its biggest advantage is elastic resource supply. In the cloud-native era especially, demand for elasticity keeps growing: in the VM era, elasticity operated at the minute level of manual operations, while in the container era it has reached the second level. Users' expectations of the cloud also differ across business scenarios:

  • Cyclical business scenarios: New businesses such as live streaming, online education, and gaming share a pronounced periodicity, which pushes customers to think about elastic business architectures. Combined with the cloud-native concept, it is natural to spin up a batch of instances on demand and release them when the peak passes.

  • Arrival of Serverless: The core concept of Serverless is on-demand use and automatic elasticity, with no capacity planning required. But once you actually start using Serverless, problems of one kind or another appear, with elastic lag and cold starts being the biggest concerns. For latency-sensitive services this is unacceptable.

So, facing the scenarios above, can Kubernetes' existing elasticity solutions cope?

Problems with traditional elasticity schemes

Kubernetes offers three ways to manage the number of application instances: a fixed instance count, HPA, and CronHPA. The biggest problem with a fixed instance count is the obvious waste of resources during business troughs. HPA was created to solve that waste, but HPA's elastic triggering lags behind demand, so resource supply lags as well, and resources that arrive too late can degrade business stability. CronHPA scales on a schedule, which seems to solve the lag, but how fine should the schedule's granularity be, and must the timed policy be adjusted manually every time business volume changes? Doing so is onerous and error-prone.
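For reference, a standard HPA scales reactively: it acts only after an observed metric crosses its target, which is exactly the source of the lag described above. A minimal sketch of such a manifest (the workload name `demo-app` and the 70% threshold are illustrative, not from AHPA):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app          # hypothetical target workload
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scaling starts only AFTER usage crosses this
```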

AHPA elastic prediction

Advanced Horizontal Pod Autoscaler (AHPA) elastic prediction scales out in advance based on detected cycles. Since any plan can miss, the planned instance count also needs real-time adjustment, so the solution has two elastic strategies: active prediction and passive prediction. Active prediction identifies the cycle length with the RobustPeriod algorithm [1], extracts the periodic trend with the RobustSTL algorithm [2], and proactively predicts the number of application instances for the next cycle. Passive prediction sets the instance count from the application's real-time data and handles burst traffic well. In addition, AHPA provides a bottom-line protection policy: users can set upper and lower bounds on the instance count. The effective instance count in the AHPA algorithm is the maximum of the active prediction, the passive prediction, and the bottom-line policy.
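One simple reading of this combination rule can be sketched in a few lines of Python. This is not AHPA's actual implementation; in particular, clamping the result to the user-configured upper bound is our interpretation of the bounds policy:

```python
# Illustrative sketch of AHPA's combination rule: the effective replica count
# is the maximum of the active prediction, the passive prediction, and the
# user-set lower bound, capped at the user-set upper bound (our interpretation).

def effective_replicas(active_pred, passive_pred, lower_bound, upper_bound):
    wanted = max(active_pred, passive_pred, lower_bound)
    return min(wanted, upper_bound)

# Burst traffic: the passive prediction (12) overrides the periodic plan (8).
print(effective_replicas(active_pred=8, passive_pred=12,
                         lower_bound=2, upper_bound=50))   # 12

# Quiet period: the lower bound (2) keeps a safety floor of instances.
print(effective_replicas(active_pred=1, passive_pred=1,
                         lower_bound=2, upper_bound=50))   # 2
```

Taking the maximum across strategies is what lets passive prediction override the plan when real traffic exceeds the forecast, while the bounds keep the result inside a safe envelope.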

Architecture

Elasticity must operate on the premise of business stability. Its core purpose is not only to help users save costs, but also to improve overall business stability, free users from operations work, and build core competitiveness. The basic principles of AHPA's architecture design are:

  • Stability: scale elastically while ensuring the stability of user services

  • O&M-free: imposes no extra O&M burden on users; no new Controller or Autoscaler configuration is added on the user side, and the configuration semantics are clearer than HPA's

  • Serverless: application-oriented design; users do not need to care about instance-count configuration

The architecture is as follows:

  • Rich data metrics: CPU, Memory, QPS, RT, and external metrics are supported

  • Stability guarantee: AHPA's elastic logic combines proactive warm-up, a passive backstop, and degradation protection to ensure stable resource supply.

    • Active prediction: forecasts the trend over time from historical data; suited to periodic applications.
    • Passive prediction: real-time prediction that prepares resources on the fly for burst-traffic scenarios.
    • Degradation protection: supports configuring maximum and minimum instance counts over multiple time ranges.
  • Multiple scaling modes: AHPA supports the Knative, HPA, and Deployment scaling modes:

    • Knative: in Serverless application scenarios, scales on concurrent requests, QPS, and RT, and mitigates cold starts
    • HPA: simplifies HPA elastic policy configuration, lowers the barrier to adopting elasticity, and solves the cold-start problem
    • Deployment: scales a Deployment directly
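To make the active-prediction idea above concrete (detect the cycle, then plan the next one in advance), here is a plain-Python stand-in. AHPA's real RobustPeriod and RobustSTL algorithms are far more robust to noise, trend, and multiple periodicities; this sketch only illustrates the two-step shape:

```python
# Conceptual stand-in for AHPA's active prediction (NOT RobustPeriod/RobustSTL):
# (1) detect the dominant cycle length via autocorrelation,
# (2) use the most recent full cycle as the plan for the next one.

def detect_period(series, max_lag=None):
    """Pick the lag with the highest autocorrelation as the cycle length."""
    n = len(series)
    max_lag = max_lag or n // 2
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) or 1.0

    def acf(lag):
        return sum((series[i] - mean) * (series[i + lag] - mean)
                   for i in range(n - lag)) / var

    return max(range(2, max_lag + 1), key=acf)

def forecast_next_cycle(series, period):
    """Naive seasonal forecast: replay the most recent full cycle."""
    return series[-period:]

# Synthetic load curve with a cycle of 6 samples, repeated over 4 cycles.
load = [10, 30, 80, 80, 30, 10] * 4
period = detect_period(load)
print(period)                          # 6
print(forecast_next_cycle(load, period))
```

With the next cycle planned this way, scale-out can be scheduled before the peak arrives instead of after the metric spikes, which is the essence of "timed planning".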

Adaptation scenarios

AHPA is suited to the following scenarios:

  • Clearly periodic workloads, such as live streaming, online education, and game services
  • A fixed instance count plus an elastic backstop, for example absorbing burst traffic on top of normal service
  • Instance-count configuration recommendations

Prediction effect

After AHPA elasticity is enabled, we provide a visualization page for viewing its effect. Below is an example of prediction based on the CPU metric, compared against HPA:

Description:

  • Predict CPU Observer: blue indicates the actual CPU usage under HPA, and green indicates the predicted CPU usage. The green curve lies above the blue one, indicating that the predicted capacity is sufficient.

  • Predict POD Observer: blue indicates the actual number of pods scaled out by HPA, and green indicates the predicted number of pods. The green curve lies below the blue one, indicating that fewer pods are needed under prediction.
  • Periodicity: based on 7 days of historical data, the prediction algorithm detects that the application is periodic.

Conclusion: the prediction results show that the predicted elasticity trend matches expectations.

Invitation to trial

Click here to view the product documentation for AHPA elastic prediction in Alibaba Cloud Container Service. AHPA has started an invitational user trial; interested users are welcome to click "Submit a ticket" in the documentation to apply for the whitelist. We look forward to your trial and feedback.

References

[1] Qingsong Wen, Kai He, Liang Sun, Yingying Zhang, Min Ke, and Huan Xu. RobustPeriod: Robust Time-Frequency Mining for Multiple Periodicity Detection. In Proc. of the 2021 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD 2021), Xi'an, China, Jun. 2021.

[2] Qingsong Wen, Jingkun Gao, Xiaomin Song, Liang Sun, Huan Xu, and Shenghuo Zhu. RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series. In Proc. of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 5409-5416, Honolulu, Hawaii, Jan. 2019.

[3] Qingsong Wen, Zhe Zhang, Yan Li, and Liang Sun. Fast RobustSTL: Efficient and Robust Seasonal-Trend Decomposition for Time Series with Complex Patterns. In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2020), San Diego, CA, Aug. 2020.