Recently, "RobustScaler: QoS-Aware Autoscaling for Complex Workloads", a paper co-authored by the Alibaba Cloud Container Service team and the data decision team of DAMO Academy, was accepted by ICDE 2022, a top international conference on data management and databases. ICDE, SIGMOD, and VLDB are the three leading international academic conferences in the database field, and all three are on the list of Class A international conferences recommended by the China Computer Federation (CCF).

Alibaba Cloud Container Service (ACK) manages a large number of Kubernetes clusters and has accumulated rich experience in cluster management and in cluster operation and maintenance. On that basis, it built the intelligent operation and maintenance platform CIS (Container Intelligence Service), which aims to solve operation and maintenance problems through intelligent means. The data decision team of DAMO Academy has worked for many years on time series analysis, forecasting, anomaly detection, and AIOps. The team has published dozens of papers at KDD, SIGMOD, ICDE, and AAAI, holds many Chinese and US patents, and has won international awards including the ICASSP 2022 AIOps Challenge.

Enterprise business traffic often shows pronounced peaks and troughs. Running a fixed number of instances sized for the peak wastes a great deal of resources during the troughs. Configuring elastic scaling for applications is an effective way to improve resource utilization.

Existing elastic scaling policies in Kubernetes, such as HPA and CronHPA, share the problem that scaling is triggered late, which degrades the application's quality of service. How can an application be scaled out and in ahead of time, based on its historical data, while still guaranteeing its quality of service?
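To make the lag concrete, here is a minimal sketch of a purely metric-driven scaler. It uses the replica formula given in the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric). Everything else is an assumption made for illustration: the load trace, the per-pod target, and the rule that a decision only takes effect one step later. The real HPA's tolerance and stabilization settings are ignored.

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Replica formula from the Kubernetes HPA documentation."""
    return math.ceil(current_replicas * current_metric / target_metric)

def simulate_reactive_scaling(load_per_step, start_replicas, target_per_pod):
    """Return (replicas_in_service, overloaded_steps) for a load trace.

    Assumption: a scaling decision made at step t only takes effect at
    step t + 1, which stands in for metric collection and pod startup time.
    """
    replicas = start_replicas
    replicas_in_service = []
    overloaded_steps = []
    for step, load in enumerate(load_per_step):
        replicas_in_service.append(replicas)
        per_pod = load / replicas
        if per_pod > target_per_pod:
            overloaded_steps.append(step)
        # The new replica count is only available in the next step.
        replicas = max(1, hpa_desired_replicas(replicas, per_pod, target_per_pod))
    return replicas_in_service, overloaded_steps

# Hypothetical trace in requests per second, with a burst starting at step 2.
load = [200, 200, 800, 800, 800, 200]
in_service, overloaded = simulate_reactive_scaling(
    load, start_replicas=2, target_per_pod=100
)
print(in_service)  # [2, 2, 2, 8, 8, 8]
print(overloaded)  # [2]
```

In this toy trace the burst arrives at step 2 while only two replicas are in service, so each pod sees four times its target load until the extra replicas arrive one step later. Scaling ahead of the burst is what would close that gap.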

To solve this problem, we propose RobustScaler, an intelligent elastic scaling framework based on the non-homogeneous Poisson process (NHPP) and stochastically constrained optimization. We also develop a specialized alternating direction method of multipliers (ADMM) to train the NHPP model efficiently, and we prove that the optimization-based proactive strategy guarantees the application's quality of service. Extensive experiments show that RobustScaler outperforms common autoscaling strategies in a variety of real-world scenarios and performs well on complex, periodic workloads.
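As a rough illustration of the idea, and not the paper's actual formulation, the sketch below treats request arrivals as an NHPP with a hypothetical periodic rate. It integrates the rate over a look-ahead window to get the mean number of arrivals, then picks the smallest instance count whose capacity covers the arrivals with probability at least 1 − ε. The rate function, `per_instance_capacity`, and `epsilon` are all assumptions made for this example. The ADMM training step is not shown, and the rate is simply taken as given.

```python
import math

def intensity(t_minutes):
    """Hypothetical arrival rate (requests/minute) with a 24-hour cycle."""
    return 20.0 + 15.0 * math.sin(2.0 * math.pi * t_minutes / 1440.0)

def expected_arrivals(rate_fn, start, end, steps=1000):
    """Integrate the rate over [start, end] with the trapezoidal rule.

    For an NHPP, the number of arrivals in the window is Poisson
    distributed with this integral as its mean.
    """
    h = (end - start) / steps
    total = 0.5 * (rate_fn(start) + rate_fn(end))
    for i in range(1, steps):
        total += rate_fn(start + i * h)
    return total * h

def poisson_quantile(mean, prob):
    """Smallest k such that P(N <= k) >= prob for N ~ Poisson(mean)."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < prob:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k

def instances_needed(mean_arrivals, per_instance_capacity, epsilon):
    """Instances such that P(arrivals > total capacity) <= epsilon."""
    demand = poisson_quantile(mean_arrivals, 1.0 - epsilon)
    return max(1, math.ceil(demand / per_instance_capacity))

# Compare a 5-minute window at the daily peak with one at the daily trough.
peak_mean = expected_arrivals(intensity, 360.0, 365.0)
trough_mean = expected_arrivals(intensity, 1080.0, 1085.0)
peak_instances = instances_needed(peak_mean, per_instance_capacity=50, epsilon=0.01)
trough_instances = instances_needed(trough_mean, per_instance_capacity=50, epsilon=0.01)
print(peak_instances, trough_instances)
```

Because the instance count is derived from the forecast rate rather than from an observed metric, it can be computed and applied before the window begins.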

The RobustScaler algorithm has been applied in the AHPA component of the intelligent operation and maintenance platform CIS. CIS consists of four modules: anomaly discovery, anomaly localization, anomaly repair, and anomaly prediction. Its functions include periodic inspection, network diagnosis, runtime diagnosis, CVE vulnerability repair, application configuration optimization, and many others. AHPA is one of the core components of CIS. The component architecture is shown in the figure below. The AHPA elastic strategy is divided into proactive prediction and reactive prediction. Proactive prediction identifies periodic trends in historical data and predicts the number of application instances for the next cycle in advance. Reactive prediction sets the number of instances from the application's real-time data, so it responds well to burst traffic. In addition, AHPA adds a fallback protection policy in which users can set upper and lower bounds on the number of instances. The number of instances that AHPA finally applies is the maximum among the proactive prediction, the reactive prediction, and the fallback policy.
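The selection rule in the last sentence above can be sketched as follows. This is an illustration under stated assumptions, not AHPA's implementation: the function and parameter names are hypothetical, and capping the result at the user-set upper bound is an assumption about how that bound is applied.

```python
def effective_instances(predicted_from_history, predicted_from_realtime,
                        lower_bound, upper_bound):
    """Combine the two predictions with the user-set bounds.

    Takes the maximum of the history-based prediction, the real-time
    prediction, and the lower bound, then caps the result at the upper
    bound (the capping step is an assumption, not stated in the article).
    """
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    wanted = max(predicted_from_history, predicted_from_realtime, lower_bound)
    return min(wanted, upper_bound)

# A burst that the history-based forecast missed: the real-time value wins.
print(effective_instances(4, 9, lower_bound=2, upper_bound=20))  # 9
# Both predictions are low: the lower bound acts as the safety net.
print(effective_instances(4, 3, lower_bound=6, upper_bound=20))  # 6
```

Taking the maximum means that a missed forecast or a quiet real-time signal can never pull the instance count below what the other source asks for.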

The AHPA component is now in public beta by invitation. For details, see the AHPA elastic prediction product documentation for Alibaba Cloud Container Service. Interested users can click "Submit work order" in that document to apply for the whitelist [1]. We look forward to your trial and feedback.

Related link

[1] Application whitelist: help.aliyun.com/document_de…