How does Kubernetes Autoscaling work? That’s a question we’ve been asked a lot lately.

So this article will explain how Kubernetes Autoscaling works and the advantages it can provide when scaling clusters.

What is Autoscaling?

Imagine filling two buckets with water from a tap. We want to make sure that when the first bucket is 80% full, we start filling the second bucket. The solution is as simple as installing a pipe connecting the two buckets at the right height. And when we want to hold more water, we simply add more buckets in the same way.

The same goes for our applications and services: the elastic scaling of cloud computing frees us from having to manually provision physical servers or virtual machines. Mapping the “buckets of water” analogy onto “applications consuming computing resources”:

  • Bucket – the scaling unit – what we scale
  • The 80% mark – the metric and trigger – when we scale
  • Pipes – the scaling operation – how we scale

What do we scale?

In the Kubernetes cluster environment, as users we typically scale two things:

Pods – For an application, suppose we run X replicas. When requests exceed the capacity of those X Pods, we need to scale out the application. For this process to work seamlessly, our Nodes should have enough resources available to successfully schedule and run the additional Pods;

Nodes – The total capacity of all Nodes represents our cluster capacity. If workload demands exceed this capacity, we need to add Nodes to the cluster to ensure efficient scheduling and execution of the workload. As Pods keep growing, we may run out of available resources on the existing Nodes and have to add more Nodes to increase the total resources available at the cluster level;
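
Whether an additional Pod fits on a Node is decided by the scheduler from the resource requests declared on its containers, so it helps to set them explicitly. A minimal sketch of such a Deployment (the name, image, and request values are illustrative, not from the article):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app          # illustrative name
spec:
  replicas: 3              # the "X replicas" we may later scale
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: nginx:1.21              # placeholder image
        resources:
          requests:
            cpu: 200m                  # the scheduler sums these requests per Node
            memory: 128Mi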

When do we scale?

Typically, we monitor a metric continuously, and when the metric crosses a threshold we act on a resource by scaling it. For example, we might measure the average CPU consumption of a Pod and trigger a scaling operation when CPU consumption exceeds 80%.

However, one metric does not suit all use cases, and the right metric may vary by application type: for message queues, the number of messages in the waiting state may be the metric; for memory-intensive applications, memory consumption may be more appropriate. If we have a business application where a Pod of a given size can process about 1,000 transactions per second, we might use that metric and scale out when a Pod reaches 850 or more.
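
As an illustration, with the autoscaling/v2beta1 API described later in this article, such a per-Pod transactions metric could be expressed as a Pods-type metric. This is only a sketch: it assumes a custom metrics adapter (for instance the Prometheus adapter) exposes a metric named transactions_per_second, and the Deployment name is made up.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: business-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: business-app                       # illustrative Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: transactions_per_second   # assumed custom metric
      targetAverageValue: "850"              # scale out when a Pod averages 850+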

We have only considered scaling out above, but when workload utilization drops, there should also be a way to scale back in gracefully, without interrupting requests that are still being processed.

How do we scale?

For Pods, we simply change the number of replicas in the Deployment or ReplicationController; for Nodes, we need a way to call the cloud provider’s API and create a new instance that joins the cluster.
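
Done by hand, scaling Pods is a one-line operation (the Deployment name is illustrative):

$ kubectl scale deployment hello-app --replicas=5

The autoscalers described below do the same thing automatically, driven by metrics instead of by hand.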

Kubernetes Autoscaling

With this understanding in place, let’s look at how Kubernetes implements autoscaling.

Cluster Autoscaler

The Cluster Autoscaler is used to dynamically scale the cluster (its Nodes). It continuously watches Pods and adds Nodes when Pods cannot be scheduled (their PodCondition reports them as unschedulable). This is much more effective than looking at a cluster-wide CPU percentage. Since a Node can take a minute or more to create (depending on factors such as the cloud provider), a Pod may have to wait some time before it can be scheduled.

Within a cluster, we might have multiple Node pools, for example one pool for billing applications and another for machine learning workloads. The Cluster Autoscaler provides various flags and settings to adjust Node scaling behavior; see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md for more details.

For scale-down, the Cluster Autoscaler looks at the average utilization of a Node along with other relevant factors; for example, if a Node is running Pods that cannot be rescheduled elsewhere, that Node cannot be removed from the cluster. The Cluster Autoscaler terminates Nodes gracefully, and by default a Node must be unneeded for about 10 minutes before its Pods are relocated and the Node is removed.
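
As a rough illustration, both behaviors are configured through flags on the cluster-autoscaler binary itself. The node-group names below are made up, the cloud provider is assumed to be AWS, and the scale-down values shown are the documented defaults:

# illustrative invocation; the format of --nodes is min:max:node-group-name
./cluster-autoscaler \
  --cloud-provider=aws \
  --nodes=1:10:billing-nodes \
  --nodes=0:4:ml-nodes \
  --balance-similar-node-groups \
  --scale-down-utilization-threshold=0.5 \
  --scale-down-unneeded-time=10m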

Horizontal Pod Autoscaler (HPA)

The HPA is a control loop that monitors and scales the Pods of a deployment. It works by creating an HPA object that references a Deployment or ReplicationController, on which we define the scaling threshold and the minimum and maximum replica counts. The earliest GA version of the HPA (autoscaling/v1) only supported CPU as a metric; the current version (autoscaling/v2beta1), still in beta, supports memory and other custom metrics. Once the HPA object is created and it can query the metrics for its target Pods, you can see it report the details:

$ kubectl get hpa
NAME               REFERENCE                     TARGETS  MINPODS MAXPODS REPLICAS AGE
helloetst-ownay28d Deployment/helloetst-ownay28d 8% / 60% 1       4       1        23h
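
An HPA like the one above can also be created imperatively; the command below targets the same Deployment with the same 60% CPU threshold and 1–4 replica bounds shown in the output:

$ kubectl autoscale deployment helloetst-ownay28d --cpu-percent=60 --min=1 --max=4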

We can tune the Horizontal Pod Autoscaler by setting flags on the controller manager; a sketch of how they appear on the command line follows this list:

  • --horizontal-pod-autoscaler-sync-period determines how often the HPA queries metrics for the Pods it watches. The default interval is 30 seconds.
  • The default interval between two scale-up operations is 3 minutes, which can be controlled with --horizontal-pod-autoscaler-upscale-delay.
  • The default interval between two scale-down operations is 5 minutes, controlled with --horizontal-pod-autoscaler-downscale-delay.
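
For illustration, this is roughly how those flags look on the kube-controller-manager command line; on managed clusters they are usually not directly editable, so treat this only as a sketch:

kube-controller-manager \
  --horizontal-pod-autoscaler-sync-period=30s \
  --horizontal-pod-autoscaler-upscale-delay=3m \
  --horizontal-pod-autoscaler-downscale-delay=5m
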
Metrics and cloud providers

For metrics, the cluster should expose the Kubernetes metrics APIs (https://github.com/kubernetes/metrics), either through Heapster or through API aggregation with a metrics server; the Metrics Server is the preferred method for Kubernetes 1.9 and above. For scaling Nodes, we should enable and configure the appropriate cloud provider in the cluster; see https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/ for more details.
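
Once a metrics pipeline is in place, a quick sanity check is to query the resource metrics directly; these commands assume the metrics API is already being served by Metrics Server or Heapster:

$ kubectl top nodes
$ kubectl top pods

If these return values instead of errors, the HPA has the data it needs.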

Some plugins

There are also some nice related plug-ins, such as:

  • Vertical Pod Autoscaler – https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
  • addon-resizer – https://github.com/kubernetes/autoscaler/tree/master/addon-resizer

To sum up: the next time someone asks “How does Kubernetes Autoscaling work?”, I hope this article will be helpful to you.

It’s time for commercials again

Kubernetes introduces a series of conceptual abstractions that fit an ideal distributed scheduling system very well. However, the large number of difficult technical concepts also creates a steep learning curve, which directly raises the bar for using Kubernetes.

The open-source PaaS Rainbond packages these technical concepts into production-ready application abstractions that serve as a management panel on top of Kubernetes, with no special learning required. That includes the elastic scaling covered in this article: Rainbond supports both horizontal and vertical scaling.

In addition, Kubernetes itself is a container orchestration tool and does not provide management workflows, whereas Rainbond does, out of the box: DevOps, automated operations, microservice architecture, and an application marketplace.

To learn more: https://www.goodrain.com/scene/k8s-docker

Rainbond on GitHub: https://github.com/goodrain/rainbond