Technically, containerized applications can help organizations be more cost competitive, but Kubernetes is full of cost traps that can put you over budget. Fortunately, there are strategies for controlling cloud costs, and automatic scaling is one of them. Kubernetes comes with three built-in auto-scaling mechanisms to help you do this. The better they work together, the lower the cost of running the application.

1. Horizontal Pod Autoscaler (HPA) can automatically scale the number of pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet based on CPU utilization. Beyond CPU utilization, it can also scale based on custom metrics provided by other applications.

In production environments, the usage of many applications fluctuates, which means that adding or removing pod replicas in real time is more cost-effective. This is where the Horizontal Pod Autoscaler (HPA) helps by automating this action.

When to use HPA? It is ideal for scaling stateless applications, though it can also handle stateful ones. HPA can be used in conjunction with the Cluster Autoscaler (CA) to achieve the greatest cost savings for frequently changing workloads: the CA reduces the number of active nodes when the number of pods shrinks.

How does the HPA work? The HPA monitors the pods to see whether the number of replicas needs to change. To determine this, it takes the average of the per-pod metric values and checks whether removing or adding replicas would bring that average closer to the target.

For example, suppose you deploy with a target CPU utilization of 50%, and the five pods currently running average 75% CPU utilization. To bring the pod average closer to the target, the HPA controller will add three replicas.
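The replica count the HPA arrives at follows its standard scaling formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick sketch of the arithmetic from the example above:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """HPA scaling formula: scale the replica count by the ratio of the
    observed metric to the target metric, rounding up."""
    return math.ceil(current_replicas * current_value / target_value)

# Five pods averaging 75% CPU against a 50% target:
print(desired_replicas(5, 75, 50))  # 8 replicas, i.e. three more than today
```

Rounding up is what keeps the controller conservative: it would rather run one extra replica than leave the average above target.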

HPA best practices. Provide resource metrics for each pod: the HPA needs the Metrics Server installed in the Kubernetes cluster.

Configure resource requests for each container: the HPA makes scaling decisions based on the observed CPU utilization of a pod, calculated as a percentage of the resource requests of its containers. If some containers lack resource requests, the calculation will be inaccurate and can lead to poor scaling decisions.

Use custom metrics: another source of HPA scaling decisions is custom metrics. HPA supports two types of custom metrics, pod metrics and object metrics; make sure you use the correct target type. You can also use external metrics from third-party monitoring systems.
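Putting the points above together, a minimal HPA manifest targeting 50% average CPU utilization might look like the sketch below (the Deployment name `web` and the replica bounds are placeholders, not values from the original text):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # placeholder: the workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # the 50% target used in the example above
```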

2. Vertical Pod Autoscaler (VPA) automatically sets CPU and memory requests based on container resource utilization, allowing pods to be scheduled appropriately on nodes so that each one gets the resources it needs. It can shrink the requests of containers that over-request resources, or raise them for under-provisioned containers, based on their usage over time.

This automatic scaling mechanism raises and lowers the CPU and memory resource requests of pod containers to align allocated cluster resources with actual usage. The VPA also needs access to the Kubernetes Metrics Server, and it only works on pods managed by a controller (such as a ReplicationController or Deployment), because applying new values means replacing the pod.

Tip: use VPA together with HPA only if your HPA configuration does not use CPU or memory to set its scaling target.

When to use VPA? A workload may hit high utilization at some point, but permanently raising its requests and limits to cover the peak is not a good idea: you may waste CPU or memory resources and limit the nodes that can run the pod. Spreading the workload across multiple application instances can also be tricky, which is where the Vertical Pod Autoscaler comes in handy.

How does VPA work? A VPA deployment consists of three components:

  • Recommender: monitors resource utilization and calculates target values. That is, it checks historical resource utilization and current patterns, and recommends ideal resource request values
  • Updater: checks whether a pod's resource requests need to be updated
  • Admission Controller: overrides resource requests when pods are created

Since Kubernetes does not allow changing the resource requests of a running pod, the VPA first terminates the old pod and then injects the updated values into the new pod's specification.

VPA best practices. Avoid using the VPA with Kubernetes versions earlier than 1.11.

Run the VPA with updateMode: "Off" first to observe the resource usage of the pods you want to autoscale. This gives you recommended CPU and memory requests and a solid basis for later adjustments.
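A recommendation-only VPA of this kind might be sketched as follows (the Deployment name `web` is a placeholder):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # placeholder: the workload to observe
  updatePolicy:
    updateMode: "Off"    # recommend only; never evict or rewrite running pods
```

With this in place, the Recommender's suggested requests appear in the object's status and can be inspected with `kubectl describe vpa web-vpa` before you let the VPA act on pods.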

If the workload has frequent spikes between high and low utilization, the VPA may be overly aggressive, constantly replacing pods. In this case, HPA works better.

3. Cluster Autoscaler (CA) adds or removes nodes in a Kubernetes cluster to adjust its capacity. Because the Cluster Autoscaler works at the infrastructure level, it needs permission to add and remove infrastructure, and you should manage that authorization securely (for example, following the principle of least privilege).

When to use the Cluster Autoscaler? This mechanism works well if you want to optimize costs by dynamically sizing the number of nodes to match current cluster utilization. It is an excellent tool for workloads designed to scale with dynamic demand.

How does the Cluster Autoscaler work? It checks for unschedulable pods, and it also calculates whether all currently deployed pods could be consolidated onto a smaller number of nodes. If the Cluster Autoscaler identifies nodes whose pods can be rescheduled onto other nodes in the cluster, it evicts those pods and removes the spare nodes.

Cluster Autoscaler best practices

  • When deploying the Cluster Autoscaler, use the release that matches your Kubernetes version (see the compatibility list).
  • Check that the nodes in each node group have the same CPU and memory capacity: the Cluster Autoscaler assumes that every node in a node group is identical, and will not work correctly otherwise.
  • Ensure that all autoscaled pods specify resource requests.
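For the last point, "specifying resource requests" means every container in the pod template declares `resources.requests`, since both the scheduler and the Cluster Autoscaler reason in terms of requested capacity. A minimal sketch (the image and the request values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app
      image: nginx:1.25      # illustrative image
      resources:
        requests:
          cpu: 250m          # the Cluster Autoscaler and HPA both rely on these
          memory: 256Mi
```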

To sum up, automatic scaling mechanisms are very valuable for controlling cloud costs, but they require careful manual configuration:

  • Prevent HPA and VPA conflicts: You need to check whether your HPA and VPA policies end up in conflict. Keep an eye on costs to prevent them from spiraling out of control.
  • Balance the three mechanisms: You need to balance a combination of the three mechanisms to ensure that workloads support peak loads and keep costs to a minimum when loads are low.