Co-authors | Yan Xun, Senior Engineer, Alibaba Cloud EDAS team; Andy Shi, Developer Advocate, Alibaba Cloud; Tom Kerkhove, Azure Architect at Codit, KEDA maintainer, CNCF Ambassador
Source | Alibaba Cloud Native public account

When scaling Kubernetes, there are a few areas to consider, and if you are new to Kubernetes, this may feel a little overwhelming.

In this blog post, we will briefly explain the areas to consider, how KEDA makes application auto-scaling simple, and why Alibaba Cloud Enterprise Distributed Application Service (EDAS) has fully standardized on KEDA.

Scaling Kubernetes

When managing Kubernetes clusters and applications, you need to carefully monitor things like:

  • Cluster capacity – Do we have enough available resources to run our workload?
  • Application workload – Does the application have enough resources available? Can it keep up with the work to be done (e.g., queue depth)?

To automate this, you would usually set up alerts to get notified, or even use autoscaling. Kubernetes is a great platform that provides this functionality out of the box.

Clusters can be scaled easily by using the Cluster Autoscaler component, which monitors the cluster for pods that cannot be scheduled due to resource shortages and adds/removes nodes accordingly.

Because the Cluster Autoscaler only kicks in once pods fail to schedule, there may be a period of time during which your workloads are not up and running.

Virtual Kubelet (a CNCF sandbox project) is a great help, allowing you to add a “Virtual node” to a Kubernetes cluster on which pods can be scheduled.

By doing so, platform vendors (such as Alibaba, Azure, HashiCorp, and others) allow you to spill pending pods out of the cluster until the cluster itself can provide the required capacity, alleviating the problem.

In addition to scaling clusters, Kubernetes also allows you to scale applications easily:

  • The Horizontal Pod Autoscaler (HPA) allows you to add/remove pods in your workload to scale out/in (add or remove replicas).
  • The Vertical Pod Autoscaler (VPA) allows you to add/remove resources on your pods to scale up/down (add or remove CPU or memory).

All of this gives you a good starting point for scaling your application.

Limitations of HPA

While the HPA is a good starting point, it focuses on the pod's own metrics, allowing you to scale based on CPU and memory. That said, you can fully configure how it should autoscale, which makes it powerful.
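As a minimal illustration (the Deployment name and thresholds here are hypothetical), a CPU-based HPA looks roughly like this:

```yaml
# Minimal HPA sketch: scales a hypothetical Deployment "my-app"
# between 2 and 10 replicas based on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```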

While this is ideal for some workloads, you usually want to scale on metrics that live elsewhere, such as Prometheus, Kafka, a cloud provider, or other events.

Thanks to external metrics support, you can install metrics adapters that serve metrics from a variety of external services and let the HPA autoscale on them through the metrics server.
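As a hedged illustration (the metric name and target value are hypothetical and depend on which adapter you install), an HPA consuming such an external metric differs from the CPU example above only in its metrics section:

```yaml
# HPA sketch targeting an external metric served by an installed
# metrics adapter; "queue_messages_ready" is a hypothetical metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-external-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_messages_ready
        target:
          type: AverageValue
          averageValue: "30"
```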

However, it is important to note that you can run only one external metrics server in a cluster, which means you have to choose a single source for your custom metrics.

You can use Prometheus, with tools like Promitor to bring metrics in from other providers, as a single source of truth to scale on, but that requires a lot of plumbing and work.

There must be an easier way… Yes, using Kubernetes Event-driven Autoscaling (KEDA)!

What is KEDA?

Kubernetes Event-driven Autoscaling (KEDA) is a single-purpose, event-driven autoscaler for Kubernetes that can be easily added to a Kubernetes cluster to scale applications.

Its goal is to make automatic application scaling very simple and to optimize costs by supporting scale-to-zero.

KEDA abstracts away all of the scaling infrastructure and manages everything for you, allowing you to scale on more than 30 systems or to extend it with your own scalers.

You just need to create a ScaledObject or ScaledJob that defines the object you want to scale and the triggers you want to use; KEDA will take care of the rest!
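As a minimal sketch using the KEDA v2 API (the Deployment name, queue name, and authentication reference are hypothetical), a ScaledObject that scales on RabbitMQ queue length, all the way down to zero when idle, looks roughly like this:

```yaml
# ScaledObject sketch: KEDA scales the hypothetical Deployment
# "my-app" on RabbitMQ queue length, down to zero when idle.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app              # hypothetical Deployment to scale
  minReplicaCount: 0          # scale-to-zero when there is no work
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders     # hypothetical queue
        mode: QueueLength
        value: "20"           # target messages per replica
      authenticationRef:
        name: rabbitmq-auth   # hypothetical TriggerAuthentication
```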

You can scale anything, even a CRD from another tool you are using, as long as it implements the /scale subresource.

So, has KEDA reinvented the wheel? No! Instead, it extends Kubernetes by using the HPA underneath, feeding it external metrics from KEDA's own metrics adapter, which replaces all other adapters.

Last year, KEDA joined the CNCF as a sandbox project, with plans to propose moving to the incubation stage later this year.

Alibaba's practices based on OAM/KubeVela and KEDA

As the main enterprise PaaS product on Alibaba Cloud, Enterprise Distributed Application Service (EDAS) has served countless developers on the public cloud at huge scale for many years. From an architectural perspective, EDAS is built on the KubeVela project. Its overall architecture is shown in the figure below.

On the production side, EDAS integrates with the ARMS monitoring service on Alibaba Cloud to provide fine-grained application-level monitoring metrics. The EDAS team added an ARMS scaler to the KEDA project to perform autoscaling. They also added some features and fixed some bugs in the KEDA v1 version, including:

  • When there are multiple triggers, the values are summed up instead of being left as individual values.
  • When creating KEDA HPAs, the name length is limited to 63 characters to avoid violating DNS naming rules.
  • Triggers could not be disabled, which could cause problems in production.

The EDAS team is actively contributing these fixes upstream to KEDA, and some of them have already been added to the v2 release.

Why Alibaba Cloud standardized on KEDA as the autoscaler for its applications

For its autoscaling feature, EDAS initially used the upstream Kubernetes HPA with CPU and memory as its two metrics. However, as the user base grew and needs diversified, the EDAS team quickly discovered the limitations of the upstream HPA:

  1. Limited support for custom metrics, especially application-level fine-grained metrics. The upstream HPA focuses on container-level metrics, such as CPU and memory, which are too coarse for applications. Metrics that reflect application load, such as RT (response time) and QPS (queries per second), are not readily supported. Yes, the HPA can be extended, but that ability is limited when it comes to application-level metrics; the EDAS team was often forced to fork code when trying to introduce fine-grained application-level metrics.
  2. No support for scaling to zero. Many users need to scale to zero when their microservices are not being used. This requirement is not limited to FaaS/serverless workloads; it saves costs and resources for all users. Currently, the upstream HPA does not support this feature.
  3. No support for scheduled scaling. Another strong demand from EDAS users is scheduled scaling. Again, the upstream HPA does not provide this capability, and the EDAS team needed to find an alternative that avoids vendor lock-in (see the sketch after this list for how KEDA expresses this).
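For instance (a hedged sketch; the workload name and schedule are hypothetical), KEDA's cron scaler can express scheduled scaling declaratively:

```yaml
# Cron-trigger sketch: holds the hypothetical Deployment "my-app"
# at 10 replicas during business hours in the given timezone.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
spec:
  scaleTargetRef:
    name: my-app              # hypothetical Deployment to scale
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Shanghai
        start: "0 8 * * *"    # scale up at 08:00
        end: "0 20 * * *"     # scale back down at 20:00
        desiredReplicas: "10"
```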

Based on these requirements, the EDAS team began planning a new version of the EDAS autoscaling feature. Meanwhile, EDAS introduced OAM in early 2020, overhauling its underlying core components. OAM provides EDAS with standardized, pluggable application definitions to replace its internal Kubernetes application CRD. The extensibility of the model enables EDAS to integrate easily with any new feature from the Kubernetes community. Against this backdrop, the EDAS team attempted to combine the requirements for EDAS's new autoscaling feature with a standard implementation of OAM's autoscaling capability.

Based on these use cases, the EDAS team settled on three criteria:

  1. The auto-scaling feature should present itself as a simple atomic feature without the need to attach any complex solutions.
  2. Metrics should be pluggable, so the EDAS team can customize them and build on them to support various requirements.
  3. It must support scale-to-zero out of the box.

After a detailed evaluation, the EDAS team chose the KEDA project, which was open-sourced by Microsoft and Red Hat and has been donated to the CNCF. KEDA provides several useful scalers by default and supports scale-to-zero out of the box. It provides fine-grained autoscaling for applications. It has the concepts of scalers and metrics adapters, supports a powerful plug-in architecture, and provides a unified API layer. Most importantly, KEDA is designed to focus only on autoscaling, so it can be easily integrated into OAM features. Overall, KEDA is a great fit for EDAS.

Looking to the future

Next, Alibaba is actively working on AIOps-driven KEDA features, with the goal of bringing intelligent decision-making to its autoscaling behavior. This will enable autoscaling decisions based on expert systems and historical data analysis, making use of the application QoS trigger and database metrics trigger newly implemented in Alibaba's KEDA components. We therefore expect a more powerful, smarter, and more stable KEDA-based autoscaling feature to be released soon.