Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set, or stateful set based on observed CPU or memory utilization (or on custom metrics, or metrics provided by the application). Note that HPA does not apply to objects that cannot be scaled, such as a DaemonSet.

In K8S, there are the following elastic scaling modes:

  • Pod Horizontal Scaling (HPA)
  • Pod Vertical Scaling (VPA)
  • Cluster Node Elastic Scaling (CA)

Different types of elastic scaling suit different scenarios.


If cluster resources are insufficient, the CA (Cluster Autoscaler) automatically provisions new computing resources and adds them to the cluster.


In practice, this is usually done through services offered by public cloud providers; Alibaba Cloud, for example, provides an elastic scaling service. Combining autoscaling with scheduled tasks during traffic peaks helps avoid service outages as well as excessive server costs.

Cloud vendor solutions listed on the Kubernetes official site: github.com/kubernetes/…

In my experience, using CA for node autoscaling is not as convenient as using the cloud vendors' own autoscaling services.


Sometimes you cannot scale out by increasing the number of Pods, for example with a database. In this case, you can use VPA to scale the Pod vertically, e.g. by adjusting its CPU and memory.


This works in conjunction with monitoring, which usually requires deploying metrics-server, though you can also use Prometheus plus custom scripts.

Official site: github.com/kubernetes/…

For a real deployment walkthrough, see this blog post: www.jianshu.com/p/94ea8bee4…


The HPA is the most common elastic scaling mode. It is well suited to scaling stateless services out and in.

Principle of HPA

The HPA is implemented as a control loop whose period is controlled by the --horizontal-pod-autoscaler-sync-period flag of the controller manager (default is 15 seconds).

During each period, the controller manager queries the resource utilization based on the metrics specified in each HPA definition.

The controller manager gets metrics from either the resource metrics API (for each Pod resource metric) or the custom metrics API (for all other metrics).

  • For each Pod’s resource metric (such as CPU), the controller fetches the metric from the resource metrics API for each Pod targeted by the HPA. If a target utilization value is set, the controller calculates the utilization as a percentage of the corresponding resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly.

    The controller then takes the average of the utilization or raw value (depending on the specified target type) across all targeted Pods and produces the ratio used to scale to the desired number of replicas.

    Note that if some containers in a Pod do not set the relevant resource request, the Pod’s CPU utilization is undefined, and the autoscaler takes no action on that metric.

  • For custom metrics per Pod, the controller works similarly to resource metrics per Pod, except that it uses raw values instead of utilization values.

  • For object metrics and external metrics, a single metric describing the object in question is obtained.

    This metric is compared with the target value to produce the ratio described above. In the autoscaling/v2beta2 API version, this value can optionally be divided by the number of Pods before the comparison is made.
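The core calculation described in the bullets above can be sketched in a few lines of Python. This is a simplified illustration of the documented ratio rule (desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)); the 10% tolerance band reflects the controller's default, and stabilization windows are ignored:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale the replica count by the ratio of the
    observed metric to the target, skipping changes within tolerance."""
    ratio = current_metric / target_metric
    # Within the tolerance band (default 10%), keep the current count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# Example: 3 Pods averaging 50% CPU against a 10% target scale out to 15
print(desired_replicas(3, 50, 10))  # 15
```

Because the result is rounded up with ceil, the autoscaler errs on the side of slightly more capacity rather than less.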

How metrics are collected

  • Kubelet captures Pod CPU and memory usage every 10 seconds

  • Horizontal Pod Autoscaler checks the Pod index every 15 seconds

  • Every minute, the Metrics Server aggregates these Metrics and exposes them to the K8S API

HPA deployment in practice

Install metrics-server

Official site: github.com/kubernetes-…

Install with one command (if your Kubernetes version is 1.19 or later):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

If your version is earlier than 1.19, use the following command instead (this is the compatibility release, supporting Kubernetes 1.8 through 1.21):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml

Viewing the Deployment Status

kubectl get po -n kube-system | grep metrics

Configure the HPA

The example here is the springboot-hive service deployed via Deployment in the previous section; see juejin.cn/post/697645…

You can configure the HPA with a single command, or, for convenience, through the Rancher UI.

If you use the command line:

kubectl autoscale deployment springboot-hive --cpu-percent=10 --min=1 --max=10

To make load testing easier, the target average CPU utilization is set to 10%.
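The same HPA can also be expressed declaratively. Below is a sketch of an equivalent manifest, assuming the autoscaling/v2 API (available on Kubernetes 1.23+; older clusters would use autoscaling/v2beta2 with the same spec shape):

```yaml
# Declarative equivalent of the kubectl autoscale command above
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-hive
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-hive
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10
```

Apply it with kubectl apply -f, which keeps the HPA definition under version control alongside the Deployment.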

HPA interface

Load testing

After configuring the HPA, since this is a development environment with no real traffic, CPU usage will not rise on its own, so we load-test the Pod. The tool used is WebBench.

WebBench repository: github.com/EZLippi/Web…

Load-test command (-t is the duration of the benchmark in seconds; -c is the number of concurrent clients). Start 100 clients requesting the interface simultaneously for 60 seconds:

webbench -t 60 -c 100 http://k8s-node1:30112/

Soon a batch of new Pods was running.

Scaling stops once 10 Pods are running, because we set the maximum to 10. After the load test stops, there is a wait of about 5 minutes before the Pods are gradually reclaimed back down to the defined minimum of 1.


learnk8s.io/kubernetes-…

kubernetes.io/docs/tasks/…

kubernetes.io/docs/tasks/…