Moment For Technology

Kubernetes Notes (17) - Scheduling policies

Posted on June 23, 2022, 4:58 p.m. by Tracy McGill
Category: The back-end Tag: The back-end kubernetes


17 Scheduling Policies

The Master mainly runs the cluster's control plane components: the API server (kube-apiserver), the scheduler (kube-scheduler), and the controller manager (kube-controller-manager). The Master also depends on etcd as its storage backend.

In a cluster deployed with kubeadm, the Master's control plane components run as static Pods. In essence, these components are processes on the Master node that serve the cluster as a whole; the Master is not responsible for running workloads.

Worker nodes are responsible for running the workload Pods. The user only needs to submit a task to the Master and does not need to care which node the workload Pod ends up running on.

17.1 POD Creation Process

The node on which a user-created task finally runs is decided by the scheduler on the Master node. Users may define their own scheduling constraints for a workload; if nothing is defined, the default scheduler behavior is used.

When we run kubectl describe pods myapp to view a Pod's information, the Events field contains information about the scheduling result.

The scheduler selects, from a large number of nodes, one that meets the Pod's runtime requirements, and the selected node is then recorded in etcd through the API server. Each kubelet continuously watches the API server for changes that concern its own node. When such a change appears, the kubelet fetches the corresponding manifest from the API server and creates the Pod according to the definition in that manifest.
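The flow above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the ApiServer class, its methods, and the node name node-1 are hypothetical stand-ins invented for illustration, not the real Kubernetes API or client library.

```python
# Toy, in-memory model of the Pod creation flow; not the real Kubernetes API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:
    name: str
    node_name: Optional[str] = None  # stays empty until the scheduler binds the Pod


@dataclass
class ApiServer:
    """Hypothetical stand-in for the API server and its etcd-backed store."""
    pods: dict = field(default_factory=dict)

    def create_pod(self, pod: Pod) -> None:
        self.pods[pod.name] = pod

    def bind(self, pod_name: str, node_name: str) -> None:
        # The scheduler's decision is recorded on the stored Pod object.
        self.pods[pod_name].node_name = node_name

    def pods_for_node(self, node_name: str) -> list:
        # What a kubelet would learn by watching for Pods bound to its node.
        return [p for p in self.pods.values() if p.node_name == node_name]


api = ApiServer()
api.create_pod(Pod("myapp"))            # 1. the user submits the Pod
api.bind("myapp", "node-1")             # 2. the scheduler records the chosen node
to_start = api.pods_for_node("node-1")  # 3. the kubelet on node-1 sees the Pod
```

In this model the kubelet on any other node never sees myapp, which mirrors the idea that each kubelet only reacts to changes concerning its own node.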

17.2 Service Creation Process

When a user creates a Service, the request is submitted to the API server, which writes the manifest to etcd. The kube-proxy on each node watches the API server for changes to Service resources. When a change occurs, the kube-proxy on each node creates the iptables/IPVS rules for the Service.

In terms of communication, kubectl, kubelet, and kube-proxy are all clients of the API server. When these components interact with the API server, the data format is JSON, while the internal data serialization format is Protocol Buffers (protobuf).

17.3 Dimensions of Resource Restrictions

  1. Resource requests: The minimum resources required to run a Pod
  2. Resource limits: The maximum resources a Pod is allowed to occupy
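As a rough illustration of how the minimum requirement matters for placement, the sketch below checks whether a Pod's requests still fit on a node. The function name fits, the resource keys, and all numbers are assumptions made up for this example; the real scheduler's accounting is more involved.

```python
def fits(requests: dict, allocatable: dict, already_requested: dict) -> bool:
    """Return True if every requested resource fits in what is still unreserved."""
    return all(
        already_requested.get(res, 0) + amount <= allocatable.get(res, 0)
        for res, amount in requests.items()
    )


# Hypothetical node: 4 CPUs (in millicores) and 8 GiB of memory (in MiB).
node_allocatable = {"cpu_millicores": 4000, "memory_mib": 8192}
# Sum of the requests of the Pods already placed on the node.
node_requested = {"cpu_millicores": 3500, "memory_mib": 2048}

small_pod = {"cpu_millicores": 250, "memory_mib": 512}
large_pod = {"cpu_millicores": 1000, "memory_mib": 512}

small_fits = fits(small_pod, node_allocatable, node_requested)
large_fits = fits(large_pod, node_allocatable, node_requested)
```

Here small_pod fits because 3500 + 250 stays within 4000 millicores, while large_pod does not. Limits play no part in this check; they cap what a running Pod may consume.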

17.4 Scheduler Scheduling Process

  1. Pre-selection (predicate) phase: Exclude every node that does not meet the requirements for running this Pod, such as minimum resource requirements, maximum resource quotas, and whether required ports are already occupied
  2. Priority (scoring) phase: A score is calculated for each remaining node using a series of priority functions, and the nodes are ranked by that score
  3. Select phase: If the priority phase yields more than one top-scoring node, one of them is selected at random
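The three phases can be sketched as a small pipeline. This is a minimal illustration, not the scheduler's actual code: the schedule function, the dict-based pod and node shapes, and the two example functions are all hypothetical.

```python
import random
from typing import Callable, Optional

Predicate = Callable[[dict, dict], bool]  # (pod, node) -> does the node pass?
Priority = Callable[[dict, dict], int]    # (pod, node) -> score for the node


def schedule(pod: dict, nodes: list, predicates: list, priorities: list,
             rng=random) -> Optional[str]:
    # 1. Pre-selection: keep only the nodes that pass every predicate.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # no node qualifies
    # 2. Priority: sum the scores of all priority functions for each node.
    totals = {n["name"]: sum(f(pod, n) for f in priorities) for n in feasible}
    best = max(totals.values())
    # 3. Select: choose at random among the nodes tied for the best score.
    winners = [name for name, score in totals.items() if score == best]
    return rng.choice(winners)


def enough_cpu(pod: dict, node: dict) -> bool:
    return node["free_cpu"] >= pod["cpu"]


def more_free_cpu(pod: dict, node: dict) -> int:
    return node["free_cpu"] // 1000


nodes = [
    {"name": "node-1", "free_cpu": 500},
    {"name": "node-2", "free_cpu": 3000},
    {"name": "node-3", "free_cpu": 2000},
]
chosen = schedule({"cpu": 1000}, nodes, [enough_cpu], [more_free_cpu])
```

With these made-up numbers node-1 is filtered out, node-2 scores 3 and node-3 scores 2, so node-2 is chosen without any tie to break.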
  • Fields in the Pod spec that affect scheduling (see kubectl explain pods.spec)
nodeName          # Directly specify the node the Pod runs on
nodeSelector      # Select a node according to the labels on the node
  • Other factors that affect scheduling

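Node affinity scheduling: expressed through the nodeSelector field (or the more expressive nodeAffinity rules)

Affinity between Pods: a Pod tends to run together with certain other Pods, for example, in the same equipment room or on the same machine

Anti-affinity between Pods: a Pod tends not to run together with certain other Pods. For example, two Pods that listen on the same node port, or a Pod that holds confidential data

Taints: marks placed on certain nodes so that Pods are kept away from them unless they tolerate the taint

Tolerations: a Pod declares which node taints it can tolerate. If a new taint that the Pod does not tolerate appears on the node while the Pod is running, the Pod may be evicted

Eviction: the node gives the Pod a limited time to leave the node.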
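The label and taint matching described above can be sketched as follows. This is deliberately simplified: the helper names are hypothetical, and a toleration here matches only on an exact key, value, and effect, whereas real tolerations support further matching options.

```python
def matches_node_selector(node_selector: dict, node_labels: dict) -> bool:
    """Every key/value pair in the Pod's nodeSelector must appear in the node's labels."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def tolerates(taint: dict, tolerations: list) -> bool:
    """Simplified assumption: a toleration must equal the taint's key, value and effect."""
    return any(
        t.get("key") == taint["key"]
        and t.get("value") == taint.get("value")
        and t.get("effect") == taint["effect"]
        for t in tolerations
    )


def untolerated_taints(node_taints: list, tolerations: list) -> list:
    """Taints on the node that the Pod does not tolerate."""
    return [t for t in node_taints if not tolerates(t, tolerations)]


# Made-up labels, taints and tolerations for illustration.
node_labels = {"disktype": "ssd", "zone": "room-a"}
node_taints = [{"key": "dedicated", "value": "db", "effect": "NoSchedule"}]
db_tolerations = [{"key": "dedicated", "value": "db", "effect": "NoSchedule"}]

selector_ok = matches_node_selector({"disktype": "ssd"}, node_labels)
blocked_for_plain_pod = untolerated_taints(node_taints, [])
blocked_for_db_pod = untolerated_taints(node_taints, db_tolerations)
```

A Pod with no tolerations is blocked by the dedicated taint, while a Pod carrying the matching toleration is not.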

17.4 Pre-selection factors

A node passes pre-selection only if it satisfies all of the following pre-selection conditions.

  1. CheckNodeConditionPred
Check whether the node is in a normal condition.
  2. GeneralPredicates
This predicate groups several sub-policies, each with its own role:
HostName: If the Pod sets pods.spec.nodeName, check whether the node's name matches it.
PodFitsHostPorts: Check whether any pods.spec.containers.ports.hostPort requested by the Pod's containers is already occupied on the node. If a required hostPort is not available, the Pod cannot be scheduled onto that host.
MatchNodeSelector: If pods.spec.nodeSelector is defined on the Pod, check whether the node's labels match it.
PodFitsResources: Check whether the node has enough resources to meet the basic requirements of this Pod.
  3. NoDiskConflict (not enabled by default)
Check whether the storage volumes defined by the Pod conflict with volumes already in use on the node.
  4. PodToleratesNodeTaints
Check whether the node's taints (nodes.spec.taints) are all covered by the Pod's toleration list (pods.spec.tolerations).
  5. PodToleratesNodeNoExecuteTaints
Check whether the Pod tolerates NoExecute taints on the node. NoExecute concerns Pods that are already running: if a Pod is running on a node and the node later gains a NoExecute taint that the Pod does not tolerate, the node evicts the Pod. Without NoExecute, a newly added taint does not evict Pods already running on the node; their placement is accepted as a fait accompli, which is the default behavior.
  6. CheckNodeLabelPresence (not enabled by default)
Check for the presence of the specified label on the node. If the node carries the specified label, the node passes the check.
  7. CheckServiceAffinity (not enabled by default)
A Service can have multiple Pods. For example, if its Pods are all running on machines 1, 2, and 3, but not on machines 4, 5, and 6, then CheckServiceAffinity means that newly added Pods of that Service are also placed on machines 1, 2, and 3. The benefit of this concentration is that internal communication between the Pods of a Service becomes more efficient.
  8. MaxEBSVolumeCount
Ensure that the number of mounted Amazon EBS volumes does not exceed the configured maximum. The default value is 39.
  9. MaxGCEPDVolumeCount
Ensure that the number of mounted GCE persistent disk volumes does not exceed the configured maximum. The default value is 16.
  10. MaxAzureDiskVolumeCount
Ensure that the number of attached Azure disk volumes does not exceed the configured maximum. The default value is 16.
  11. CheckVolumeBinding
Check whether the node can satisfy the PVCs requested by the Pod, taking into account PVCs that are already bound.
  12. NoVolumeZoneConflict
Check whether deploying the Pod on this host would cause a volume conflict, given the zone (for example, equipment room) restrictions of its volumes.
  13. CheckNodeMemoryPressure
Check whether the node is under memory pressure.
  14. CheckNodeDiskPressure
Check whether the node is under disk pressure.
  15. CheckNodePIDPressure
Check whether the node's PID resources are under pressure.
  16. MatchInterPodAffinity
Check whether the node satisfies the Pod's inter-Pod affinity and anti-affinity requirements.
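To make one of these checks concrete, here is a minimal sketch of the idea behind PodFitsHostPorts. The function name and the dict layout loosely mirror pods.spec.containers.ports.hostPort but are assumptions for illustration; this version compares port numbers only.

```python
def pod_fits_host_ports(pod_containers: list, used_host_ports: set) -> bool:
    """Return True if none of the hostPorts the Pod asks for is already taken on the node."""
    wanted = {
        port["hostPort"]
        for container in pod_containers
        for port in container.get("ports", [])
        if port.get("hostPort")  # containers without a hostPort impose no constraint
    }
    return wanted.isdisjoint(used_host_ports)


# Hypothetical Pod: one container asks for hostPort 8080, the other for none.
containers = [
    {"name": "web", "ports": [{"containerPort": 80, "hostPort": 8080}]},
    {"name": "sidecar", "ports": [{"containerPort": 9090}]},
]

fits_free_node = pod_fits_host_ports(containers, {22, 443})
fits_busy_node = pod_fits_host_ports(containers, {8080})
```

The node where 8080 is already in use fails the check and would be excluded during pre-selection.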

17.5 Preferred Function

Every priority function is run against each node that passed pre-selection. The results of all priority functions are added up per node, and the node with the highest total score wins.

  1. least_requested.go
Prefer the node with the lowest resource consumption. For CPU, the score is based on the idle ratio: (total capacity - sum(requested)) x 10 / total capacity.
  2. balanced_resource_allocation.go
Balanced resource usage means that CPU and memory utilization are similar. The closer the CPU and memory utilization are to each other, the higher the score, and the node with the highest score wins.
  3. node_prefer_avoid_pods.go
Check whether the node carries the annotation "scheduler.alpha.kubernetes.io/preferAvoidPods". The absence of this annotation indicates that the node is suitable for running the Pod.
  4. taint_toleration.go
The Pod's spec.tolerations are checked against the node's spec.taints. The more taints (with the PreferNoSchedule effect) the Pod does not tolerate, the lower the score.
  5. selector_spreading.go
Find the Service, StatefulSet, ReplicaSet, and similar objects that select the current Pod. The fewer Pods matched by those selectors are already running on a node, the higher that node's score. The aim is to spread the Pods matched by the same label selector across multiple nodes.
  6. interpod_affinity.go
Iterate over the Pod's affinity terms and add the weight of each term that matches to the node's total. The larger the total, the higher the score, and the node with the highest score wins.
  7. most_requested.go
Prefer the node whose resources are already most heavily requested, so that one node's resources are used up first where possible. This is the opposite of least_requested.
  8. node_label.go
The score is based on whether the node has a given label, regardless of the label's value.
  9. image_locality.go
Score nodes by the total size of the images already present on the node that the current Pod requires.
  10. node_affinity.go
Check how well each node matches the node selection terms in the Pod object. The more terms a node matches, the higher its score.
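The least_requested and most_requested formulas, and the idea behind balanced_resource_allocation, can be written out as a short sketch. The maximum score of 10 follows the formula given for least_requested; the exact form of balanced_score is an assumption chosen to match the description (closer utilization scores higher) and may differ from the real implementation.

```python
MAX_SCORE = 10


def least_requested_score(requested: int, capacity: int) -> float:
    """(capacity - requested) x 10 / capacity: emptier nodes score higher."""
    if capacity == 0 or requested > capacity:
        return 0.0
    return (capacity - requested) * MAX_SCORE / capacity


def most_requested_score(requested: int, capacity: int) -> float:
    """The opposite preference: fuller nodes score higher."""
    if capacity == 0 or requested > capacity:
        return 0.0
    return requested * MAX_SCORE / capacity


def balanced_score(cpu_fraction: float, mem_fraction: float) -> float:
    """Assumed form: the closer CPU and memory utilization are, the higher the score."""
    return (1 - abs(cpu_fraction - mem_fraction)) * MAX_SCORE


# Hypothetical node with 4000 millicores, of which 1000 are already requested.
least = least_requested_score(1000, 4000)
most = most_requested_score(1000, 4000)
even = balanced_score(0.5, 0.5)
skewed = balanced_score(0.25, 0.75)

# Scores from different priority functions are added up per node.
total = least + balanced_score(0.25, 0.25)
```

With these numbers a node that is one quarter full scores 7.5 under least_requested but only 2.5 under most_requested, which shows how the two functions pull in opposite directions.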

17.6 Selecting functions

When more than one node ties for the highest score, one of them is selected at random.

