Overview

Kubernetes' Horizontal Pod Autoscaler (HPA) can automatically scale the number of Pod replicas based on CPU utilization, memory utilization, and other custom metrics, so that the overall metric level of a workload matches the target value set by the user. This article introduces the HPA feature of Tencent Cloud's container service TKE and uses it to implement automatic horizontal scaling of Pods.

Usage scenarios

The HPA feature gives containerized services a highly flexible, self-adaptive capacity: when business load surges, the workload can rapidly scale out to the replica count needed to handle the spike, and when load drops it can scale in appropriately to free compute resources for other services. The entire process is automated and requires no manual intervention. It is well suited to businesses with large traffic fluctuations, many services, and frequent scaling, such as e-commerce, online education, and financial services.

How it works

The Pod horizontal autoscaling feature is implemented by a Kubernetes API resource and a controller. Resource utilization metrics determine the behavior of the controller: it periodically adjusts the number of replicas of a service's Pods based on their resource utilization so that the workload's metric level matches the target value set by the user. The scaling process and its components are as follows:

Note: This feature is currently in beta. Pod horizontal autoscaling does not apply to objects that cannot be scaled, such as DaemonSet resources.

HPA Controller: the controller that implements the HPA scaling logic.

Metrics Aggregator: the controller usually obtains metrics from a set of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io). The metrics.k8s.io API is typically provided by the Metrics Server, and the community edition supplies only basic CPU and memory metric types. Compared with the community edition, TKE collects metrics with a custom Metrics Server and supports a wider range of HPA trigger metrics, including CPU, memory, disk, network, and GPU metrics. For details, see the TKE autoscaling metrics description.

Tip: The controller can also obtain metrics directly from Heapster; however, as of Kubernetes 1.11, fetching metrics from Heapster is deprecated.

HPA algorithm for calculating the target replica count: for the TKE HPA scale-out and scale-in algorithm, see its working principle; for the detailed algorithm, see the algorithm details.
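At its core, the upstream HPA algorithm computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal shell sketch of that calculation, using illustrative per-Pod values in Kbps (the 199/150 pair mirrors the measurements discussed later in this article):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Metric values here are per-Pod averages, e.g. outbound bandwidth in Kbps.
desired_replicas() {
  current_replicas=$1; current_value=$2; target_value=$3
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (current_replicas * current_value + target_value - 1) / target_value ))
}

desired_replicas 1 199 150   # 1 Pod at 199 Kbps vs a 150 Kbps target -> prints 2
desired_replicas 2 75 150    # 2 Pods at 75 Kbps each -> prints 1 (scale-in candidate)
```

The real controller adds tolerance checks and readiness handling on top of this formula, so this sketch only illustrates the basic ratio calculation.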

Prerequisites

  • You have registered a Tencent Cloud account.
  • You have logged in to the Tencent Cloud Container Service console.
  • A TKE cluster has been created. For details about creating a cluster, see Creating a Cluster.

Steps

Step 1: Deploy the test workload

Taking a Deployment-type workload as an example, create a single-replica workload named “hpa-test” that serves as a web service. For how to create a Deployment workload in the TKE console, see Deployment Management. The result of this example is shown in the figure below:

Step 2: Configure the HPA

In the TKE console, bind an HPA configuration to the test workload. For how to bind and configure the HPA, see the HPA procedure. This example configures a policy that triggers scale-out when the network outbound bandwidth reaches 0.15 Mbps (150 Kbps).
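For reference, an equivalent policy can also be expressed as a Kubernetes HPA manifest and applied with kubectl. The fragment below is a hypothetical sketch using the autoscaling/v2beta2 API and a Pods-type custom metric; the metric name for TKE's network outbound bandwidth (`k8s_pod_network_transmit_bandwidth` here) is an assumed example, and the available API version and metric names depend on the TKE cluster version:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-test
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Pods
    pods:
      metric:
        name: k8s_pod_network_transmit_bandwidth  # assumed TKE metric name
      target:
        type: AverageValue
        averageValue: 150k                        # 0.15 Mbps per Pod
```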

Step 3: Verify the function

Start a temporary Pod in the cluster to act as a mock client and test the configured HPA functionality:

kubectl run -it --rm --restart=Never --image=alpine hpa-test -- /bin/sh

Run the following command in the temporary Pod to send a large number of requests to the “hpa-test” service in a short period of time, increasing its outbound traffic bandwidth:

# hpa-test.default.svc.cluster.local is the in-cluster domain name of the service; press Ctrl+C to stop the script
while true; do wget -q -O - hpa-test.default.svc.cluster.local; done

After running the simulated-request command in the test Pod, the Pod count monitoring of the workload (figure below) shows that the workload scaled out to two replicas at 16:21, from which we can infer that an HPA scale-out event was triggered.

The network outbound bandwidth monitoring of the workload (figure below) shows that the outbound bandwidth rose to about 199 Kbps at 16:21, exceeding the target value set in the HPA. This further confirms that the HPA scale-out algorithm was triggered and added one replica to meet the configured target, so the workload's replica count became two.

Note: The HPA scaling algorithm does not rely solely on the formula above; it also evaluates in multiple dimensions whether scaling is needed, so actual behavior may differ slightly from expectations. For details, see the algorithm details.

The following figure shows that the network outbound bandwidth has dropped back to its level before the scale-out; according to the HPA logic, the workload now meets the conditions for scale-in.

However, the Pod count monitoring of the workload (below) shows that the workload did not scale in until 16:30. This is because the HPA algorithm applies a default tolerance window of 5 minutes before scaling in, to prevent thrashing caused by short-lived metric fluctuations; see Cooldown/Delay support for details. As the figure shows, 5 minutes after the load script was stopped, the HPA scaling algorithm reduced the workload back to its originally configured replica count.
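In newer Kubernetes versions (1.18+ with the autoscaling/v2beta2 API), this scale-in tolerance can be tuned per HPA through the behavior field; cluster-wide it corresponds to the controller flag --horizontal-pod-autoscaler-downscale-stabilization (default 5m0s). A minimal fragment of the spec, assuming the cluster supports this API version:

```yaml
spec:
  behavior:
    scaleDown:
      # Wait for 5 minutes of consistently low load before scaling in
      stabilizationWindowSeconds: 300
```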

When an HPA scaling event occurs in TKE, it appears in the event list of the corresponding HPA instance, as shown in the figure below. Note that the event list shows both a “first occurrence time” and a “last occurrence time”: the former is when the same event first occurred, and the latter is when it most recently occurred. The list shows the scale-out event at 16:21:03 and the scale-in event at 16:29:42, consistent with the times seen in the workload monitoring.

In addition, the workload's event list records the replicas added and removed when HPA scaling occurs. As shown in the figure below, the workload's scale-out and scale-in times also match the HPA event list: replicas were added at 16:21:03 and removed at 16:29:42.
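The same events can be inspected from the command line instead of the console. Assuming the HPA instance and Deployment are both named hpa-test, the standard kubectl commands are:

```shell
# Show the HPA's current metrics, replica range, and its event history
kubectl describe hpa hpa-test

# Show the Deployment's events, including replica set scale-up/scale-down records
kubectl describe deployment hpa-test

# Or filter cluster events for the object directly
kubectl get events --field-selector involvedObject.name=hpa-test
```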

Conclusion

This example demonstrates the HPA feature of TKE, using the TKE-defined network outbound bandwidth metric as the workload's HPA scaling metric. When the workload's actual metric value exceeds the target configured in the HPA, the HPA computes an appropriate replica count with its scale-out algorithm and scales the workload out, keeping the workload's metrics within expectations and its operation healthy and stable. When the actual value falls well below the configured target, the HPA computes an appropriate replica count after the tolerance period and scales in, releasing idle resources and improving resource utilization. Throughout the process, both the HPA and the workload record corresponding events, making the entire scaling history traceable.