
Preface

For a Kubernetes cluster, elastic scaling in general should include the following:

  • Cluster Autoscaler (CA)
  • Vertical Pod Autoscaler (VPA)
  • Horizontal Pod Autoscaler (HPA)

Elastic scaling depends on cluster monitoring data such as CPU and memory. This article introduces the data link and implementation principles behind it, explains the monitoring system in Kubernetes, and finally answers some common questions:

  1. What is VPA? What is HPA?
  2. What are the differences between metrics-server, cAdvisor, and kube-state-metrics?
  3. Why does kubectl top report errors?
  4. Why does kubectl top pod differ from top run inside the pod?

What are VPA and HPA?

The Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) solutions have been developed to address the huge fluctuation in business service load and the gap between actual and estimated resource usage.

In order to make full use of existing cluster resources and optimize cost, when a service that occupies a large amount of resources needs to scale out, you should first optimize the workload's own resource requests (the gap between request and actual usage). Only when the cluster's resource pool cannot satisfy the workload's actual resource demand should you adjust the total amount of resources in the pool to guarantee availability.

So, in general, elastic scaling should include:

  1. Cluster Autoscaler: automatically scales the cluster capacity (number of nodes). It is tied to automatic node provisioning and relies on the elastic scaling of the underlying IaaS; it is mainly used for container clusters running on VMs.
  2. Vertical Pod Autoscaler: automatically scales workload pods vertically (resource configuration), for example by calculating and adjusting the limit/request of the pod template in a Deployment, based on the workload's historical load metrics.
  3. Horizontal Pod Autoscaler: automatically scales workload pods horizontally, for example by adjusting a Deployment's replicas, based on the workload's real-time load metrics.

VPA and HPA optimize from the perspective of the business workload.

VPA solves the problem of inaccurately estimated resource quotas (pod CPU and memory limit/request).

HPA solves the problem that business load fluctuates heavily and the replica count would otherwise have to be adjusted manually, again and again, based on monitoring alerts.

With HPA, subsequent replica changes of the Deployment no longer need to be managed manually; the HPA controller automatically adjusts them according to how busy the service is. There is also a special HPA with a fixed policy: cronHPA, which sets the scale-out/scale-in times and the corresponding replica counts directly in cron format. It can simply be understood as scheduled scaling. This type of HPA does not need to detect how busy the service is dynamically; it is a static HPA, suitable for services whose traffic changes on a fixed schedule.
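
As a quick illustration of an ordinary (dynamic) HPA, the sketch below creates one imperatively; the Deployment name web and the thresholds are made-up examples:

[root@centos ~]$ kubectl autoscale deployment web --cpu-percent=50 --min=2 --max=10   # keep average CPU around 50%, between 2 and 10 replicas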

How is HPA implemented?

Since HPA automatically adjusts the number of replicas according to the workload's load, the implementation idea is easy to imagine: monitor how busy the service is, scale the replicas out when the business is busy, and scale them back in when the load comes down. The keys to horizontal scaling are therefore:

  • How to measure how busy the workload is
  • What policy to use to adjust the replica count

Kubernetes provides a standard metrics interface through which the HPA controller can query the busyness metrics of the Deployment associated with any HPA object. The busyness metrics can be customized per service: you only need to declare, in the corresponding HPA, which metrics of the associated Deployment to scale on.
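
For example, a minimal HPA manifest might look like the sketch below; the Deployment name web and the 50% CPU target are assumptions, and on clusters older than v1.23 the API version would be autoscaling/v2beta2:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: default
spec:
  scaleTargetRef:              # the workload whose replica count the HPA controller adjusts
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:                     # the "busyness" metric compared against its target value
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50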

With the standard metrics query interface in place, we also need an implementation of the metrics API server that provides the various metrics data. As we know, all core components of Kubernetes communicate with each other through the apiserver, so as an extension of the Kubernetes API, the metrics API server is naturally implemented on top of the API aggregation layer. A metrics query from the HPA controller is then automatically forwarded through the apiserver's aggregation layer to the real metrics API server on the back end, corresponding to the Prometheus Adapter and metrics-server shown in the figure below.

The data link can be roughly summarized as: HPA controller / kubectl top → API server aggregation layer → metrics API server (metrics-server or Prometheus Adapter).

The API aggregation mechanism is used by configuring an APIService resource object. Here is the metrics APIService configuration:

[root@centos ~]$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        32d

[root@centos ~]$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

As mentioned above, the APIService provides an API named v1beta1.metrics.k8s.io and binds it to a Service resource object named metrics-server.

Therefore, access to metrics-server is as follows:

/apis/metrics.k8s.io/v1beta1 ---> metrics-server.kube-system.svc ---> x.x.x.x

+-----------+     +------------+     +--------------------------+     +------------------------------+
| requester +---->+ API Server +---->+ Service: metrics-server  +---->+ Pod: metrics-server-xxx-xxx  |
+-----------+     +------------+     +--------------------------+     +------------------------------+

Once metrics-server can be accessed this way, HPA, kubectl top, and so on work normally.
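
You can also verify this data path yourself by querying the aggregated metrics API directly; a sketch (jq is only used for pretty-printing and is optional):

[root@centos ~]$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
[root@centos ~]$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods" | jq .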

How does the HPA controller use the metrics data for scaling control? In other words, what replica adjustment mechanism does it use?

The HPA controller reconciles each HPA object periodically (the interval is set by horizontal-pod-autoscaler-sync-period). As shown on the right side of the figure, the HPA defines the metric types and the expected target values. During each reconcile, the controller queries the metrics API in real time for the latest value of that metric for the HPA's target workload (under the current replica count) and compares it with the expected target value.

First, the comparison determines the scaling direction: if no scaling is needed, the current replica count is returned directly; otherwise, the algorithm corresponding to the HPA's metric target type is used to calculate the desired replica count for the Deployment, and the Deployment's scale interface is called to adjust the replicas. The overall goal is to keep the final (average) metric value across the Deployment's pods as close as possible to the level the user expects.

Note that the HPA target metric is a single target value, not a range.
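
For reference, ignoring tolerances and stabilization windows, the replica calculation documented for the HPA controller boils down to:

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, with a 50% CPU utilization target, 3 replicas currently averaging 100% CPU would be scaled to ceil(3 * 100 / 50) = 6 replicas.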

Monitoring system

metrics-server

When the metrics API concept was introduced, a new monitoring architecture was also proposed officially, in which monitoring data is divided into two categories:

  • Core metrics: collected from kubelet, cAdvisor, and so on, and exposed by metrics-server for consumption by the dashboard, the HPA controller, etc.
  • Custom metrics: exposed by the Prometheus Adapter through the custom.metrics.k8s.io API, supporting any metric collected by Prometheus.

The core metrics contain only the CPU and memory of nodes and pods. Generally speaking, core metrics are sufficient for HPA, but if you want to drive HPA with a custom metric, such as request QPS or 5xx error count, you need custom metrics.

At present, custom metrics in Kubernetes are generally provided by Prometheus and then aggregated into the apiserver by k8s-prometheus-adapter, achieving the same effect as the core metrics (metrics-server).
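
If a Prometheus Adapter is installed, you can inspect what it exposes in the same way; a sketch, where the metric name http_requests_per_second is a hypothetical example that depends on your adapter rules:

[root@centos ~]$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
[root@centos ~]$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .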

kubelet

As mentioned above, both Heapster and metrics-server only relay and aggregate data; both obtain it by calling the kubelet API, and the metrics are actually collected by the cAdvisor module inside the kubelet code. You can access port 10255 (read-only-port) on a node to obtain monitoring data:

  • Kubelet summary metrics: 127.0.0.1:10255/metrics
  • cAdvisor metrics: 127.0.0.1:10255/metrics/cadvisor, which exposes container-level data
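
A rough way to look at this on a node is sketched below; note that the read-only port 10255 is disabled on many newer clusters, in which case you have to go through the authenticated port 10250 with a token:

[root@centos ~]$ curl -s http://127.0.0.1:10255/metrics/cadvisor | grep container_memory_working_set_bytes | head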

cadvisor

cAdvisor not only collects information about all running containers on a machine, including CPU usage, memory usage, network throughput, and file system usage, but also provides a basic query interface and an HTTP API so that other components can fetch the data. In Kubernetes it is built into the kubelet and enabled by default.

When cAdvisor gets metrics, it is really just a forwarder, and its data comes from cgroup files.

cgroup

The values in cgroup files are the ultimate source of the monitoring data. For example:

  • The value of memory usage comes from /sys/fs/cgroup/memory/docker/[containerId]/memory.usage_in_bytes
  • If no memory limit is set, limit = machine_mem; otherwise it comes from /sys/fs/cgroup/memory/docker/[containerId]/memory.limit_in_bytes
  • Memory usage percentage = memory.usage_in_bytes / memory.limit_in_bytes
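
A quick sketch of reading these files directly on a node, assuming cgroup v1 and the Docker cgroup driver (the container ID is a placeholder):

[root@centos ~]$ CID=<containerId>
[root@centos ~]$ cat /sys/fs/cgroup/memory/docker/$CID/memory.usage_in_bytes
[root@centos ~]$ cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes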

Answer the questions

1 What is VPA? What is HPA?

See the beginning of the article

2 What is the difference between metrics-server and kube-state-metrics?

  • metrics-server collects CPU and memory usage metrics (by calling the kubelet, as described above) and exposes them through the aggregated API; its core function today is to provide decision metrics for components such as the HPA controller.
  • kube-state-metrics focuses on obtaining the latest state of the various Kubernetes resource objects, such as Deployments or DaemonSets. The reason kube-state-metrics is not folded into metrics-server is that their focuses are fundamentally different: metrics-server is essentially a monitoring pipeline that simply fetches, formats, and writes existing data to specific storage, while kube-state-metrics keeps a snapshot of the cluster state in memory and derives new metrics from it, but has no ability to store or export that history.
  • To put it another way, kube-state-metrics could itself serve as a data source for metrics-server, although it is not used that way at present.
  • In addition, monitoring systems such as Prometheus do not use metrics-server data; they collect and integrate metrics themselves (Prometheus's capabilities cover what metrics-server provides). Prometheus does, however, monitor the metrics-server component itself, for example the running status of the metrics-server pod, and it uses kube-state-metrics for that, alerting when appropriate.
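
A quick sketch of what kube-state-metrics itself exposes; the Service name and namespace are assumptions and depend on how it was deployed:

[root@centos ~]$ kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
[root@centos ~]$ curl -s http://127.0.0.1:8080/metrics | grep kube_deployment_status_replicas | head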

3 Why does kubectl top report errors?

If kubectl top reports an error, run it with higher verbosity, for example kubectl top pod -v=10, to see the underlying API requests and locate the problem. Common causes:

  1. metrics-server is not deployed, or its pod is not running properly; check the pod's logs.
  2. A "not found" error usually means metrics for the pod have not been collected yet; collection takes about one minute by default.
  3. Check whether kubelet's port 10255 is open. By default metrics-server uses this read-only port to fetch metrics; you can also add certificates to the metrics-server configuration and switch to the authenticated port 10250.
[root@centos ~]$ kubectl top node -v=10
I0925 10:28:36.123146    6251 loader.go:375] Config loaded from file /data/home/mervinwang/.kube/config
I0925 10:28:36.124198    6251 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl-1.16/v1.16.3 (linux/amd64) kubernetes/9e16566" -H "Authorization: Bearer" 'https://x.x.x.x:19170/api?timeout=32s'
I0925 10:28:36.154052    6251 round_trippers.go:443] GET https://x.x.x.x:19170/api?timeout=32s 200 OK in 29 milliseconds
I0925 10:28:36.154097    6251 round_trippers.go:449] Response Headers:
I0925 10:28:36.154119    6251 round_trippers.go:452] Content-Length: 158
I0925 10:28:36.154181    6251 round_trippers.go:452] Date: Sat, 25 Sep 2021 02:28:36 GMT
I0925 10:28:36.154194    6251 round_trippers.go:452] Audit-Id: 75ac3ab4-f3f8-4b8f-9abc-701256a805d3
I0925 10:28:36.154252    6251 round_trippers.go:452] Content-Type: application/json
I0925 10:28:36.154321    6251 request.go:968] Response Body: {"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"cls-o5eb179u.ccs.tencent-cloud.com:60002"}]}
I0925 10:28:36.154683    6251 round_trippers.go:423] curl -k -v -XGET -H "User-Agent: kubectl-1.16/v1.16.3 (linux/amd64) kubernetes/9e16566" -H "Authorization: Bearer" -H "Accept: application/json, */*" 'https://x.x.x.x:19170/apis?timeout=32s'
I0925 10:28:36.233840    6251 round_trippers.go:443] GET https://x.x.x.x:19170/apis?timeout=32s 200 OK in 79 milliseconds
I0925 10:28:36.233883    6251 round_trippers.go:449] Response Headers:
I0925 10:28:36.233895    6251 round_trippers.go:452] Audit-Id: d7b4c46c-3dff-42a8-a64c-8b2423372710
I0925 10:28:36.233912    6251 round_trippers.go:452] Content-Type: application/json
I0925 10:28:36.233927    6251 round_trippers.go:452] Date: Sat, 25 Sep 2021 02:28:36 GMT
......

4 Why does kubectl top pod differ from top run inside the pod?

The calculation logic of kubectl top pod is different from that of the top command inside the container, so the two cannot be compared directly. In addition, even if a limit is set on the pod, top inside the pod still shows the machine's total memory and CPU, not the amount the pod is allowed to use.

  • The RSS of a process shown by top is all the physical memory used by the process (file_rss + anon_rss), i.e. anonymous pages + mapped pages.
  • The cgroup RSS (anonymous and swap cache memory) does not include shared memory. Neither of them includes the file cache.
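
For reference, the memory value reported by kubectl top pod is the container's working set, which cAdvisor derives roughly as usage minus the inactive file cache. A sketch of reproducing it from inside a container, assuming cgroup v1 (<pod> is a placeholder):

[root@centos ~]$ kubectl exec -it <pod> -- sh
/ # usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
/ # inactive_file=$(grep -w total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')
/ # echo $((usage - inactive_file))    # roughly the value kubectl top pod reports, in bytes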