Contents of this tutorial series (published so far):

K8S (01): Deploying the latest version of a K8S cluster on Ubuntu

K8S (02): Getting to know the resource objects in K8S

K8S (03): Understanding the nature of a Pod through the pause container

K8S (04): Understanding the third type of container in a Pod: the init container

K8S (05): Labels and selectors, a scheduling tool (group scheduling)

K8S (06): Taints and tolerations, a scheduling tool (pressure eviction)

K8S (07): Affinity and anti-affinity, a scheduling tool (service disaster recovery)

Assigning a Pod to a node that can satisfy the Pod's resource requests is called scheduling.

Ideally, your cluster has enough resources to create all the Pods you want, so you have no reason to care how much capacity each node has left, nor any reason to care about the details of how K8S schedules Pods.

In reality, cluster resources are limited, so you need to plan your nodes in order to allocate and use them properly.

For example, decide which machines are high-performance machines, which are ordinary machines, and which are dedicated machines, and try to keep ordinary applications off the high-performance machines.

In addition, some applications need multiple replicas deployed across different failure domains for high availability.

These needs break down into three topics:

  • Labels and selectors
  • Taints and tolerations
  • Affinity and anti-affinity

The previous article covered labels and selectors. This article covers taints and tolerations.

1. Understanding taints through an everyday analogy

For example, suppose you go to the hospital to see a doctor. After diagnosing you and understanding your condition, the doctor is going to prescribe some medicine for you.

But prescribing requires matching the medicine to the patient: for the same fever, adults and children take different drugs.

Therefore, to prevent misuse, the pharmacy sets an applicable scope for each antipyretic drug (equivalent to the taint in K8S):

  • Aspirin: suitable for adults (as a fever reducer, not for children)
  • Ibuprofen: suitable for children

Based on the patient's symptoms, the doctor diagnoses a fever and searches the system for fever-reducing drugs (the equivalent of K8S's scheduling process). He finds two: aspirin and ibuprofen.

Given that the patient is a child, aspirin is ruled out in favor of ibuprofen.

That is where the taint comes in. In this analogy:

  • Aspirin and ibuprofen are the nodes in K8S
  • The applicable scope printed on a drug is the taint on the Node in K8S
  • The patient who needs the medicine is the Pod in K8S

As you can see, taints work from the Node's point of view: they keep out Pods that cannot tolerate them.

2. The difference between taints and labels

What is the difference between a taint and a label? This is the first thing we have to figure out when studying taints.

Because labels and taints work differently, they suit different scenarios.

Labels are usually used to divide nodes into groups; a Pod that selects a group (for example via nodeSelector) can only be scheduled onto nodes in that group, and this constraint is mandatory.

Taints are usually used to mark nodes as dedicated nodes: by default ordinary Pods cannot be scheduled onto them, unless you add a matching toleration to the Pod.

3. Tolerations and taints

A taint is applied to a Node. You can think of it as the Node publicly declaring its "shortcomings" (not real shortcomings): if you want to be scheduled onto me, you must declare that you can tolerate these shortcomings; otherwise you cannot be scheduled here.

Use the following command to taint worker02, while worker01 has no taints:

kubectl taint nodes worker02 gpu=true:NoSchedule
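To confirm the taint has been applied, you can inspect the node; for example (a quick check, the exact output format may vary across kubectl versions):

kubectl describe node worker02 | grep Taints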

Normally, without special configuration, the Pods you create cannot be scheduled onto worker02; they can only go to worker01, even if you use nodeSelector to target worker02.

Only by adding the following toleration (under .spec) to the Pod does it become possible for the Pod to be created on worker02:

tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
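For context, here is a minimal Pod manifest showing where the toleration sits under .spec (the Pod name and image are just placeholders for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"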

It is important to note that this says the Pod "may" be scheduled there, not that it "definitely will" be.

If Pods must end up on the GPU machine, you also need the labels covered in the previous article. So how do the two divide the work?

  • Label: implements precise scheduling
  • Taint: avoids unnecessary waste

The machine has a GPU, and a Pod that does not declare a toleration for the GPU taint cannot be scheduled onto it.

If a Pod that does not use the GPU is created on a GPU node, that wastes resources, which should not be accepted or allowed.

A toleration only means "I can be scheduled onto a machine with a GPU"; without this configuration, the Pod cannot be scheduled onto such a machine.
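A common way to satisfy both sides is to combine the taint from earlier with a label on the same node, and to give the Pod both the toleration and a nodeSelector. A sketch, reusing the gpu key from above (the label key and value are just an example):

# the taint from earlier repels Pods that do not tolerate it
kubectl taint nodes worker02 gpu=true:NoSchedule

# the label lets GPU Pods target this node through nodeSelector
kubectl label nodes worker02 gpu=true

The Pod then carries the toleration shown earlier plus a nodeSelector of gpu: "true", so it is both allowed onto worker02 and steered to it.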

If you want to remove an existing taint, append a minus sign to the taint in the command used above:

kubectl taint nodes worker02 gpu=true:NoSchedule-

4. Toleration configuration

A toleration consists of several key fields (an example follows the list):

  • key: the taint key to match (mandatory)
  • value: the taint value. If operator is Equal, value is mandatory; if operator is Exists, value is omitted
  • operator: the comparison operator, either Exists or Equal (with Equal, value must match exactly)
  • effect: the taint effect to match, one of NoSchedule, PreferNoSchedule, and NoExecute
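For instance, a toleration that uses Exists omits value and matches any taint with the given key; a sketch based on the gpu taint above:

tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"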

Among these, effect is the hardest to understand. To understand it, we need to look at how Kubernetes matches tolerations against taints.

Simply put, a Node can carry multiple taints, and a Pod can carry multiple tolerations.

Kubernetes processes multiple taints and tolerations like a filter: it starts with all of a node's taints and filters out those for which the Pod has a matching toleration. The effect values of the remaining, unfiltered taints determine whether the Pod can be assigned to the node (see the example after this list). Specifically:

  • If at least one unfiltered taint has effect NoSchedule, Kubernetes will not schedule the Pod onto the node.
  • If no unfiltered taint has effect NoSchedule but at least one has effect PreferNoSchedule, Kubernetes will try not to schedule the Pod onto the node.
  • If at least one unfiltered taint has effect NoExecute, Kubernetes will not schedule the Pod onto the node (if it is not already running there) and will evict the Pod from the node (if it is already running there).
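A small example of this filtering, with hypothetical keys key1 and key2. Suppose a node carries two taints:

kubectl taint nodes worker02 key1=value1:NoSchedule
kubectl taint nodes worker02 key2=value2:NoExecute

and the Pod declares only this toleration:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"

The key1 taint is filtered out, but the key2 taint with effect NoExecute remains, so the Pod will not be scheduled onto the node, and if it was already running there it will be evicted.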

5. How Kubernetes itself uses taints

How are taints used inside Kubernetes itself?

On every cluster node there is a kubelet service, which monitors resources such as CPU, memory, disk space, and filesystem inodes on that node.

When one or more of these resources reach a certain level of consumption, the kubelet marks the node with one or more taints whose effect is NoExecute.

For example, if memory is under pressure, it taints the node with node.kubernetes.io/memory-pressure.

If disk space is under pressure, it taints the node with node.kubernetes.io/disk-pressure.

If PIDs are under pressure, it taints the node with node.kubernetes.io/pid-pressure.

If Pods are already running on the node and they do not tolerate any of these three taints, the kubelet starts the eviction process, evicting them one by one, until the node is no longer under resource pressure; it then removes the taint and ends the eviction.

Such tolerations often include tolerationSeconds, which specifies how long the Pod may keep running after the taint appears, in other words how long the eviction can be delayed.
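For instance, following the description above, a toleration like this sketch (memory-pressure is just one of the pressure taints mentioned; tolerationSeconds only applies to the NoExecute effect) would let the Pod keep running for up to 300 seconds after the taint appears before being evicted:

tolerations:
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300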

Besides the taints above, there are other common built-in taints:

  • node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node condition Ready having the value "False".
  • node.kubernetes.io/unreachable: the node controller cannot reach the node. This corresponds to the node condition Ready having the value "Unknown".
  • node.kubernetes.io/network-unavailable: the node's network is unavailable.
  • node.kubernetes.io/unschedulable: the node cannot be scheduled.

The effect of these taints is usually NoSchedule, to prevent new Pods from being scheduled onto a node where they cannot run.
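For example, the unschedulable taint is what you get when you cordon a node; on recent Kubernetes versions you can check it in the node description (shown here only as an illustration):

kubectl cordon worker02
kubectl describe node worker02 | grep Taints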

6. Advanced development with taints

The principles of taints have largely been covered above. In practice, they are widely used to implement dedicated nodes.

However, to make a node truly dedicated, labels and nodeSelector must cooperate with the taint.

So, to schedule Pods onto a dedicated node, you have to add both a toleration and a nodeSelector to the Pod.

Is there a way to collapse these two steps into one?

K8S has the concept of admission controllers, which can be understood as plug-ins inside the API server component that intercept API requests and do something when you operate on objects.

Depending on what they do, these admission plug-ins fall into two categories:

  • MutatingAdmissionWebhook: can modify the configuration of an object
  • ValidatingAdmissionWebhook: can validate an object

Admission control runs in two phases: in the first phase, mutating admission controllers run; in the second phase, validating admission controllers run. Some controllers act as both.

If any controller in either phase rejects the request, the entire request is rejected immediately and an error is returned to the end user.

Since MutatingAdmissionWebhook can modify an object's configuration, isn't that exactly what we need?

We can write a custom MutatingAdmissionWebhook that, whenever it finds a Pod with the following toleration:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

automatically adds the following node selector configuration to the Pod. Of course, if the Pod already has this configuration, the webhook can simply overwrite it or skip it:

nodeSelector:
  gpu: "true"
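For reference, such a webhook also has to be registered with the API server through a MutatingWebhookConfiguration object. Below is a rough sketch, assuming the mutation logic runs as a Service named add-nodeselector in the default namespace (all names and the path are hypothetical; the caBundle and the webhook server itself are omitted):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: add-nodeselector
webhooks:
- name: add-nodeselector.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      namespace: default
      name: add-nodeselector
      path: /mutate
    # caBundle: <base64-encoded CA certificate that signs the webhook server's certificate>
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE"]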

Writing a custom admission controller is actually not difficult. Kubernetes itself ships with quite a few admission controllers that you can imitate instead of writing everything from scratch; their code lives under plugin/pkg/admission in the k8s.io/kubernetes source tree.

Note that some admission controllers are both mutating and validating.

Below I pick one of Kubernetes' built-in admission controllers to show you how the mutating and validating parts work.

7. PodNodeSelector

Create a new namespace named iswbm:

kubectl create namespace iswbm

Then use the kubectl edit command to add the following annotation to the Namespace (alternatively, you can specify a corresponding configuration file on the apiserver):

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: env=test
  name: iswbm

With this annotation in place, any Pod created in this namespace can only be scheduled onto nodes carrying the env=test label; this is the mutating part, analogous to a MutatingAdmissionWebhook.

If the intersection of the Pod's own nodeSelector and the namespace's PodNodeSelector leaves no node that satisfies the conditions, the Pod is rejected outright; this is the validating part, analogous to a ValidatingAdmissionWebhook.

All of this is achieved by the PodNodeSelector admission controller automatically adding a nodeSelector to Pods in the namespace.
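A minimal way to try this out, assuming the PodNodeSelector admission plugin is enabled on the apiserver (it is not part of the default set) and that some node already carries the env=test label:

# label a node so the namespace's selector can be satisfied
kubectl label nodes worker01 env=test

# create a Pod in the iswbm namespace and check the injected nodeSelector
kubectl run nginx --image=nginx -n iswbm
kubectl get pod nginx -n iswbm -o jsonpath='{.spec.nodeSelector}'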

Reference documents:

  1. Taints and tolerations
  2. Admission controllers