1 Problems to be solved

If a cluster is allocated to multiple users, you need to use a quota to limit users’ resource usage, such as the number of CPU cores, memory size, and number of GPU cards. Otherwise, resources may be used up by certain users, resulting in unfair resource allocation.

In most cases, the cluster’s native ResourceQuota mechanism works just fine. However, with the expansion of cluster size and the increase of task types, our quota management rules need to be adjusted:

  • ResourceQuota is designed for a single cluster, but in practice our development/production environments often span multiple clusters.
  • Most tasks are submitted to the cluster through higher-level resource objects such as Deployment and MPIJob, and we want to make the quota decision at the submission stage of these objects. However, ResourceQuota computes resource requests at pod granularity, so it cannot meet this requirement.

Based on the above issues, we need to manage quotas ourselves. Kubernetes provides a dynamic admission mechanism that lets us write custom plug-ins to implement admission control on requests. Our quota management solution starts from there.

2 Principle of Dynamic Admission

A request entering the K8s cluster is received by the API server and then passes through the following stages in order:

  1. Authentication/authorization
  2. Admission control (mutating)
  3. Object schema validation
  4. Admission control (validating)
  5. Persistence

A request is processed by each of the first four stages in turn, and each stage decides whether it may proceed. Only after all of them pass is the request persisted, that is, stored in the ETCD database, and thereby becomes a successful request. In the admission control (mutating) phase, mutating admission webhooks are called to modify the content of the request. In the admission control (validating) phase, validating admission webhooks are called to check whether the request meets certain requirements and to decide whether it is allowed or rejected. These webhooks are extension points that can be developed independently and deployed into the cluster.

Although a webhook in the admission control (mutating) phase could also check and reject requests, the invocation order of mutating webhooks is not guaranteed, so another webhook might still modify the resources in the request after our check. Therefore, we implement resource quota management by deploying a validating admission webhook that performs quota checks and configuring it to be called in the admission control (validating) phase.

3 Solution

3.1 How to Deploy the Validation Service in the Cluster

Using a custom validating admission webhook in a K8s cluster requires deploying:

  1. A ValidatingWebhookConfiguration object (the cluster must have the ValidatingAdmissionWebhook plug-in enabled), which defines which resource objects (Pod, Deployment, MPIJob, etc.) are intercepted and provides the callback address of the service that performs the actual validation. It is recommended to use an in-cluster Service as the address of the validation service. (A sketch of such a configuration follows this list.)
  2. The service that actually performs the validation, reachable at the address configured in the ValidatingWebhookConfiguration.
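
For illustration, here is a minimal Go sketch of such a configuration built with the k8s.io/api/admissionregistration/v1 types. The webhook name, namespace, Service name, and path are made-up placeholders; the real values depend on how the validation service is deployed.

package quota

import (
    admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildWebhookConfig returns a ValidatingWebhookConfiguration that intercepts
// CREATE and UPDATE of Deployments and forwards them to the in-cluster Service
// quota-webhook-svc in namespace quota-webhook at path /validate.
func buildWebhookConfig(caBundle []byte) *admissionregistrationv1.ValidatingWebhookConfiguration {
    failurePolicy := admissionregistrationv1.Fail
    sideEffects := admissionregistrationv1.SideEffectClassNone
    path := "/validate"
    return &admissionregistrationv1.ValidatingWebhookConfiguration{
        ObjectMeta: metav1.ObjectMeta{Name: "quota-validating-webhook"},
        Webhooks: []admissionregistrationv1.ValidatingWebhook{{
            Name:                    "quota.example.com",
            AdmissionReviewVersions: []string{"v1"},
            SideEffects:             &sideEffects,
            FailurePolicy:           &failurePolicy,
            ClientConfig: admissionregistrationv1.WebhookClientConfig{
                Service: &admissionregistrationv1.ServiceReference{
                    Namespace: "quota-webhook",
                    Name:      "quota-webhook-svc",
                    Path:      &path,
                },
                CABundle: caBundle, // the webhook must be served over HTTPS
            },
            Rules: []admissionregistrationv1.RuleWithOperations{{
                Operations: []admissionregistrationv1.OperationType{
                    admissionregistrationv1.Create,
                    admissionregistrationv1.Update,
                },
                Rule: admissionregistrationv1.Rule{
                    APIGroups:   []string{"apps"},
                    APIVersions: []string{"v1"},
                    Resources:   []string{"deployments"},
                },
            }},
        }},
    }
}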

In a single-cluster environment, the validation service is deployed in a cluster as a Deployment. In a multi-cluster environment, the following options are available:

  1. Use solutions such as Virtual Kubelet or cluster federation to merge the clusters into a single logical cluster, which reduces the problem to the single-cluster deployment above.
  2. Deploy the validation service as a Deployment in one (or more) of the clusters, and ensure network connectivity between the service and every cluster.

Note that in both single-cluster and multi-cluster environments, the validation service needs to monitor resource usage, which is typically done by a single instance. So if the service runs with multiple replicas, a primary instance has to be chosen.

3.2 How to Implement the Validation Service

3.2.1 Validation Service Architecture

3.2.1.1 Basic Components

  • API server: the entry point of cluster requests; calls the validating admission webhook to validate a request
  • API: the admission service interface, which uses the AdmissionReview data structure agreed with the cluster for both requests and responses (a handler sketch follows this list)
  • Quota Usage Service: the interface for querying resource usage
  • Admissions: the admission implementations, one for each resource type such as Deployment and MPIJob
  • Resource Validator: validates resource requests against quotas
  • Quota Adapter: connects to external quota services for the validator to query
  • Resource Usage Manager: maintains resource usage and performs quota checks
  • Informers: watch cluster resources (Deployment, MPIJob, etc.) through the watch mechanism provided by K8s to maintain current resource usage
  • Store: stores resource usage data; it can be backed by the service's local memory or by a Redis service
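
To show how the API component follows the cluster's AdmissionReview convention, here is a minimal Go sketch of a webhook handler. The handler name, path, and the checkQuota helper are assumptions for illustration; checkQuota stands in for the dispatch to the per-resource admissions and the quota check described below.

package main

import (
    "encoding/json"
    "log"
    "net/http"

    admissionv1 "k8s.io/api/admission/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// checkQuota is a stand-in: in the real service it would dispatch to the
// admission for the request's resource type, compute the requested resources,
// and ask the resource usage manager whether the quota allows them.
func checkQuota(req *admissionv1.AdmissionRequest) (allowed bool, reason string) {
    return true, ""
}

// serveValidate decodes the AdmissionReview sent by the API server, runs the
// quota check, and writes an AdmissionReview back with Allowed set accordingly.
func serveValidate(w http.ResponseWriter, r *http.Request) {
    var review admissionv1.AdmissionReview
    if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
        http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
        return
    }

    allowed, reason := checkQuota(review.Request)
    review.Response = &admissionv1.AdmissionResponse{
        UID:     review.Request.UID,
        Allowed: allowed,
    }
    if !allowed {
        review.Response.Result = &metav1.Status{Message: reason}
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(&review)
}

func main() {
    http.HandleFunc("/validate", serveValidate)
    // The validation service must expose its API over HTTPS.
    log.Fatal(http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil))
}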

3.2.1.2 Basic Process for Determining Resource Quotas

Take the example of a user creating a Deployment resource:

  1. The user creates a Deployment whose resource definition must carry an annotation identifying the application group, such as ti.cloud.tencent.com/group-id: 1, which means the resources are requested from application group 1. (If the resource carries no application group information, it is either rejected or submitted to a default application group, for example application group 0.)
  2. The request is received by the API server. Because the ValidatingWebhookConfiguration is correctly configured in the cluster, during the admission control (validating) phase the API server calls the validating admission webhook API deployed in the cluster, sending the AdmissionReview structure specified by K8s as the request and expecting an AdmissionReview structure as the response.
  3. After receiving the request, the quota validation service hands it to the admission responsible for Deployment resources, which calculates the resources the request will newly occupy or release depending on whether the operation is CREATE or UPDATE.
  4. The resources to be requested are extracted from the Deployment's spec.template.spec.containers[*].resources.requests fields, for example cpu: 2, memory: 1Gi, denoted apply (see the sketch after this list).
  5. The resource validator asks the quota adapter for the quota information of application group 1, for example cpu: 10, memory: 20Gi, denoted quota, and then, together with the apply obtained above, applies to the resource usage manager for the resources.
  6. The resource usage manager continuously watches Deployment resource usage through the informer and maintains it in the store. The store can use local memory, which requires no external dependency, or Redis as the storage medium, which makes it easier to scale the service horizontally.
  7. When the resource usage manager receives the request from the resource validator, it looks up in the store the resources currently occupied by application group 1, for example cpu: 8, memory: 16Gi, denoted usage. It then checks whether apply + usage <= quota; if the quota is not exceeded, the request is approved, and the result is finally returned to the API server.
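
The following Go sketch illustrates steps 4 and 7 under assumed names: applyOf extracts the requested resources of one pod template from a Deployment (multiplying by the replica count is omitted for brevity), and withinQuota checks apply + usage <= quota for every resource named in the quota.

package quota

import (
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
)

// applyOf sums spec.template.spec.containers[*].resources.requests into a
// single ResourceList, e.g. cpu: 2, memory: 1Gi.
func applyOf(d *appsv1.Deployment) corev1.ResourceList {
    apply := corev1.ResourceList{}
    for _, c := range d.Spec.Template.Spec.Containers {
        for name, q := range c.Resources.Requests {
            sum := apply[name].DeepCopy()
            sum.Add(q)
            apply[name] = sum
        }
    }
    return apply
}

// withinQuota reports whether apply + usage stays within quota for every
// resource that the quota constrains.
func withinQuota(apply, usage, quota corev1.ResourceList) bool {
    for name, limit := range quota {
        sum := usage[name].DeepCopy()
        sum.Add(apply[name])
        if sum.Cmp(limit) > 0 {
            return false
        }
    }
    return true
}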

The above is the basic process for implementing resource quota checks. A few details are worth adding:

  • The validation service must expose its API over HTTPS.
  • For each resource type to be controlled, such as Deployment and MPIJob, a corresponding admission and informer need to be implemented.
  • Each resource type may exist in several API versions, for example Deployment has apps/v1, apps/v1beta1, etc.; the handling needs to be compatible with the versions actually used in the cluster.
  • For UPDATE requests, whether existing pod instances will be rebuilt must be determined according to the resource type, so that the amount of requested resources is calculated correctly.
  • Besides K8s native resource types such as cpu, if quota control is needed for custom resource types such as GPU type, the corresponding annotation must be specified in the resource request, for example ti.cloud.tencent.com/gpu-type: V100 (see the small sketch after this list).
  • When the resource usage manager works with usage, apply, and quota, two problems may arise: races between concurrent resource requests, and requests that pass quota validation but whose resources then fail to be created. These two problems are explained in the following sections.
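
As a small illustration of the annotation-based points above, the sketch below reads the application group and the custom GPU type from a Deployment's annotations; the annotation keys are the ones used in the examples, and the fallback behaviour is left to the caller.

package quota

import appsv1 "k8s.io/api/apps/v1"

// groupAndGPUType extracts quota-related annotations from a Deployment.
func groupAndGPUType(d *appsv1.Deployment) (groupID, gpuType string) {
    ann := d.GetAnnotations()
    groupID = ann["ti.cloud.tencent.com/group-id"] // empty: reject, or fall back to a default group such as 0
    gpuType = ann["ti.cloud.tencent.com/gpu-type"] // e.g. V100, used when GPU quotas are split by type
    return groupID, gpuType
}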

3.2.2 Races Between Resource Requests

Because resource requests can arrive concurrently:

  1. usage must be updated immediately after a resource request is approved
  2. updates to usage require concurrency control

In step 7, when the resource usage manager checks the quota, it queries the application group's resource usage. This usage value is maintained and kept up to date by the informers, but there is a time lag between a resource request passing validation in the validating admission webhook and the informer observing the new resource. During this window more resource requests may arrive, and the usage value will be inaccurate. Therefore usage must be updated immediately after a resource request is approved.

In addition, updates to usage require concurrency control. For example:

  1. Application group 2 has quota cpu: 10 and usage cpu: 8.
  2. Two requests, deployment1 and deployment2, arrive for application group 2, each with an apply of cpu: 2.
  3. deployment1 is checked first: apply + usage = cpu: 10, which does not exceed the quota, so the deployment1 request is allowed.
  4. usage is updated to cpu: 10.
  5. deployment2 is then checked: because usage has been updated to cpu: 10, apply + usage = cpu: 12, which exceeds the quota, so the request is rejected.

In the above process it is clear that usage is the key shared variable; it must be queried and updated serially. Without concurrency control, deployment1 and deployment2 may both read usage as cpu: 8, both requests would be allowed, and the quota would effectively be exceeded: the user ends up occupying more resources than the quota allows.

Possible solutions:

  • Queue resource requests and have a single service instance consume and process them.
  • Protect the shared variable usage with a lock, and query and update usage only inside the critical section (sketched below).
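
A minimal sketch of the lock-based option, reusing the withinQuota helper from the earlier sketch and keeping per-group usage in memory; the type and method names are assumptions:

package quota

import (
    "sync"

    corev1 "k8s.io/api/core/v1"
)

// usageManager keeps per-application-group usage behind a mutex so that the
// quota check and the usage update happen in one critical section.
type usageManager struct {
    mu    sync.Mutex
    usage map[string]corev1.ResourceList // application group ID -> current usage
}

// tryApply atomically checks apply + usage <= quota and, on success, adds
// apply to usage immediately so that a concurrent request sees the new value.
func (m *usageManager) tryApply(groupID string, apply, quota corev1.ResourceList) bool {
    m.mu.Lock()
    defer m.mu.Unlock()

    if m.usage == nil {
        m.usage = map[string]corev1.ResourceList{}
    }
    current := m.usage[groupID]
    if !withinQuota(apply, current, quota) {
        return false // would exceed the quota: reject the request
    }

    updated := corev1.ResourceList{}
    for name, q := range current {
        updated[name] = q.DeepCopy()
    }
    for name, q := range apply {
        sum := updated[name].DeepCopy()
        sum.Add(q)
        updated[name] = sum
    }
    m.usage[groupID] = updated
    return true
}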

3.2.3 Resource Creation Failure

Because of the race problem described above, we required usage to be updated immediately after a resource request is approved, but this introduces a new problem. After stage 4, admission control (validating), the requested resource object enters stage 5, persistence, where exceptions may still occur (for example, another webhook rejects the request, the cluster loses power, or ETCD fails), so the task is never actually written to the cluster database. In this case we have already increased the usage value during the validation phase, and a task that does not actually occupy any quota is counted as occupying it. As a result, users may end up unable to use all the resources their quota specifies.

To solve this problem, the background service periodically refreshes the usage value of every application group globally. Thus, if usage was increased during the validation phase but the task never actually made it into the database, the global refresh eventually resets usage to the exact amount of resources the application group occupies in the cluster at that time.
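
A sketch of this periodic global refresh, again under assumed names: rebuild stands for recomputing per-group usage from the informers' current view of the cluster, and the result simply overwrites the in-memory usage of the usageManager from the earlier sketch.

package quota

import (
    "time"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/util/wait"
)

// startGlobalResync replaces the usage map with a fresh snapshot every
// resyncPeriod, discarding increments made for objects that never reached ETCD.
func startGlobalResync(m *usageManager, rebuild func() map[string]corev1.ResourceList,
    resyncPeriod time.Duration, stopCh <-chan struct{}) {
    go wait.Until(func() {
        fresh := rebuild()
        m.mu.Lock()
        m.usage = fresh
        m.mu.Unlock()
    }, resyncPeriod, stopCh)
}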

However, in rare cases a global refresh may happen exactly when a resource creation request that will eventually be persisted to ETCD has already passed webhook validation but has not yet been persisted. Such moments can still let users exceed their quotas. Continuing the earlier example: suppose a global refresh happens right after deployment1 has updated the usage value. At that moment deployment1's information is not yet in ETCD, so the global refresh resets usage to the old value, which lets deployment2 pass as well and the quota limit is exceeded. In practice, however, the time from validation to persistence is short, and with a low-frequency global refresh this almost never happens. If stricter requirements arise later, a more elaborate solution can be adopted to close this gap.

3.2.4 How Native ResourceQuota Handles These Problems

For the request races and resource creation failures described above, the native quota management mechanism in K8s, ResourceQuota, adopts similar solutions:

Immediate updates to resolve the race problem

After a quota check passes, the quota's usage is updated immediately. K8s uses optimistic locking to control concurrent updates (see the implementation of checkQuotas in the K8s source code), which solves the race problem.

The most relevant source code of checkQuotas:

// now go through and try to issue updates. Things get a little weird here:
// 1. check to see if the quota changed. If not, skip.
// 2. if the quota changed and the update passes, be happy
// 3. if the quota changed and the update fails, add the original to a retry list
var updatedFailedQuotas []corev1.ResourceQuota
var lastErr error
for i := range quotas {
    newQuota := quotas[i]
    // if this quota didn't have its status changed, skip it
    if quota.Equals(originalQuotas[i].Status.Used, newQuota.Status.Used) {
        continue
    }
    if err := e.quotaAccessor.UpdateQuotaStatus(&newQuota); err != nil {
        updatedFailedQuotas = append(updatedFailedQuotas, newQuota)
        lastErr = err
    }
}

The newQuota.Status.Used field records the resource usage of the quota. If a resource request against this quota is approved, by the time this code runs the requested amount has already been added to the Used field. Equals is then called: if the Used field has not changed, no new resources were requested and the quota is skipped. Otherwise, e.quotaAccessor.UpdateQuotaStatus is executed, which immediately updates the quota information in ETCD according to newQuota.Status.

Periodic global updates to resolve creation failures

Resource usage is also recalculated globally at regular intervals (see the implementation of Run in the K8s source code), which solves the problem of resources that fail to be created after passing the quota check.

The most relevant source code for Run:

// the timer for how often we do a full recalculation across all quotas
go wait.Until(func() { rq.enqueueAll() }, rq.resyncPeriod(), stopCh)

Here rq is the receiver of the controller corresponding to the ResourceQuota object. This controller keeps running a Run loop that reconciles all ResourceQuota objects. Within this loop, enqueueAll is called periodically, that is, all ResourceQuota objects are enqueued and their Used values are recalculated, which is the global update.


4 References

  • Controlling Access to the Kubernetes API
  • Dynamic Admission Control
  • A Guide to Kubernetes Admission Controllers
  • In-depth understanding of Kubernetes Admission Webhook
  • Github.com/kubernetes/…
  • Admission Webhooks: Configuration and Debugging Best Practices – Haowei Cai, Google