Introduction to the Scheduler

The Scheduler is the Kubernetes component that assigns defined Pods to nodes in the cluster. That sounds simple enough, but there is a lot to consider:

  • Fairness: how to ensure that each node gets allocated its share of Pods
  • Efficient resource utilization: cluster resources should be used to the fullest extent possible
  • Efficiency: scheduling should perform well and be able to place large numbers of Pods as quickly as possible
  • Flexibility: users should be able to control the scheduling logic according to their own needs

The Scheduler runs as a separate program. Once started, it watches the API Server for Pods whose PodSpec.NodeName is empty and creates a binding for each such Pod indicating which node it should be placed on
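
For reference, the binding is itself an API object in the core v1 group. Below is a minimal sketch of what the scheduler effectively submits for a Pod; the Pod and node names are placeholders:

apiVersion: v1
kind: Binding
metadata:
  name: some-pod        # the Pod being bound (placeholder name)
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: k8s-node01      # the node chosen for the Pod (placeholder name)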

Scheduling process

Scheduling happens in several steps. First, nodes that do not meet the conditions are filtered out; this is the predicate stage. Then the nodes that pass are ranked; this is the priority stage. Finally, the node with the highest priority is selected. If any of these steps returns an error, the error is returned directly

The predicate stage uses a series of algorithms, including:

  • PodFitsResources: whether the node's remaining resources are greater than the resources requested by the Pod
  • PodFitsHost: if the Pod specifies a NodeName, check whether the node's name matches it
  • PodFitsHostPorts: whether the ports already in use on the node conflict with the ports requested by the Pod
  • PodSelectorMatches: filters out nodes that do not match the labels specified by the Pod
  • NoDiskConflict: the volumes already mounted on the node do not conflict with the volumes specified by the Pod, unless both are read-only

If no suitable node survives the predicate stage, the Pod remains in the Pending state and scheduling is retried until some node meets the conditions. If more than one node passes this step, the priority stage continues: the remaining nodes are ranked by their priority scores
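
To see why a Pod is stuck in Pending, you can list pending Pods and inspect the scheduling events on one of them (the Pod name below is a placeholder):

#List Pods that are still Pending
kubectl get pods --field-selector=status.phase=Pending
#The Events section shows FailedScheduling messages explaining which predicates failed
kubectl describe pod <pod-name>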

The priorities consist of a series of key-value pairs, where the key is the name of a priority item and the value is its weight (how important that item is). Priority options include:

  • LeastRequestedPriority: the score is computed from CPU and Memory utilization; the lower the utilization, the higher the score. In other words, this priority favors nodes with a lower percentage of resource usage
  • BalancedResourceAllocation: the closer a node's CPU and Memory utilization rates are to each other, the higher the score. This should be used together with the item above, not on its own
  • ImageLocalityPriority: favors nodes that already have the images the Pod needs; the larger the total size of the images already present, the higher the score

Each node's final score is the weighted sum of its scores on all priority items, and the node with the highest total is chosen

Custom scheduler

In addition to the built-in Kubernetes scheduler, you can also write your own. A Pod selects which scheduler handles it by naming it in the spec.schedulerName field. For example, the following Pod selects my-scheduler instead of the default default-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels: 
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0

Node affinity

pod.spec.nodeAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft requirement (preference)
  • requiredDuringSchedulingIgnoredDuringExecution: hard requirement

requiredDuringSchedulingIgnoredDuringExecution

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels: 
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
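
The kubernetes.io/hostname label used above is set automatically on every node. To see the value it takes on your nodes (and any other labels you might match against), you can run:

kubectl get nodes --show-labels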

preferredDuringSchedulingIgnoredDuringExecution

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values: 
            - qikqiak
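
This preference only has an effect if some node actually carries the source=qikqiak label. A sketch of adding it to a node (the node name is a placeholder):

kubectl label node <node-name> source=qikqiak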

Combining hard and soft rules

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - qikqiak

Key-value operators

  • In: the label's value is in the given list
  • NotIn: the label's value is not in the given list
  • Gt: the label's value is greater than the given value (see the sketch after this list)
  • Lt: the label's value is less than the given value
  • Exists: the label exists (no value is compared)
  • DoesNotExist: the label does not exist
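
As a quick illustration (the disktype and cpu-count label keys here are hypothetical, not labels Kubernetes sets for you), Exists takes no values field, while Gt compares the label's value against a single integer written as a string:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype           # hypothetical label: only require that it exists
          operator: Exists
        - key: cpu-count          # hypothetical label holding an integer value
          operator: Gt
          values:
          - "8"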

Pod affinity

pod.spec.affinity.podAffinity/podAntiAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft requirement (preference)
  • requiredDuringSchedulingIgnoredDuringExecution: hard requirement

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: myapp:v1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values: 
              - pod-2
          topologyKey: kubernetes.io/hostname
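
The example above assumes that Pods labeled app=pod-1 (for the affinity rule) and app=pod-2 (for the anti-affinity rule) already exist in the cluster; a minimal sketch of the first one:

apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: pod-1
spec:
  containers:
  - name: pod-1
    image: myapp:v1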

The affinity and anti-affinity scheduling policies are compared as follows:

Scheduling policy | Labels matched | Operators                               | Topology domain support | Scheduling goal
nodeAffinity      | host labels    | In, NotIn, Exists, DoesNotExist, Gt, Lt | No                      | place the Pod on specified hosts
podAffinity       | Pod labels     | In, NotIn, Exists, DoesNotExist         | Yes                     | place the Pod in the same topology domain as the specified Pod
podAntiAffinity   | Pod labels     | In, NotIn, Exists, DoesNotExist         | Yes                     | place the Pod in a different topology domain from the specified Pod

Taint and Toleration

Node affinity is a property of Pods (a preference or a hard requirement) that attracts them to a particular class of nodes. A taint, by contrast, enables a node to repel a particular class of Pods

Taints and tolerations work together to keep Pods off inappropriate nodes. One or more taints can be applied to each node, which means Pods that do not tolerate those taints will not be accepted by that node. Tolerations, when applied to Pods, mean that those Pods can, but are not required to, be scheduled onto nodes with matching taints

Taint

ⅰ. The composition of a taint

Using the kubectl taint command, you can place a taint on a Node. A taint creates a mutually exclusive relationship between the Node and Pods, allowing the Node to refuse to schedule Pods, or even to evict Pods already running on it

Each taint consists of the following parts:

key=value:effect

Each taint has a key and a value as its label, where value can be empty, and effect describes what the taint does. The taint effect currently supports the following three options:

  • NoSchedule: Kubernetes will not schedule Pods onto a Node with this taint
  • PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a Node with this taint
  • NoExecute: Kubernetes will not schedule Pods onto a Node with this taint and will evict Pods already running on it

ⅱ. Setting, viewing and removing taints

#Set a taint
kubectl taint nodes node1 key1=value1:NoSchedule
#View taints: in the node description, look for the Taints field
kubectl describe node node1
#Remove a taint
kubectl taint nodes node1 key1:NoSchedule-

Toleration (tolerations)

A Node with taints set establishes a mutually exclusive relationship with Pods based on the taint effects NoSchedule, PreferNoSchedule and NoExecute, so Pods will, to varying degrees, not be scheduled onto that Node. However, tolerations can be set on Pods: a Pod with a matching toleration tolerates the taint and can be scheduled onto the tainted Node

pod.spec.tolerations

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
  tolerationSeconds: 3600
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

  • The key, value, and effect must match the taint settings on the Node
  • When operator is Exists, the value field is ignored
  • tolerationSeconds describes how long the Pod can keep running on the Node once it is marked for eviction

ⅰ. If no key is specified, taints with any key are tolerated:

tolerations:
- operator: "Exists"

ⅱ. If no effect is specified, all effects of taints with the given key are tolerated

tolerations:
- key: "key"
  operator: "Exists"

ⅲ. If there are multiple master nodes, you can set the following taint so that master resources are not wasted (Pods are only scheduled there when necessary):

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule
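
To confirm which taints a node currently carries (the node name is a placeholder), check the Taints field in its description:

#Look at the Taints field in the output
kubectl describe node <node-name> | grep Taints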

Specifying a Scheduling Node

ⅰ. Pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's scheduling policy entirely; this match is forced

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  template:
    metadata: 
      labels:
        app: myweb
    spec:
      nodeName: k8s-node01
      containers:
      - name: myweb
        image: myapp:v1
        ports:
        - containerPort: 80
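
After this Deployment is applied, all seven replicas should land on k8s-node01 regardless of predicates or priorities; you can verify the placement with:

kubectl get pods -o wide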

ⅱ. Pod.spec.nodeSelector: nodes are selected through the Kubernetes label-selector mechanism; the scheduler matches the labels and then schedules the Pod onto the target node

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        type: backEndNode1
      containers:
      - name: myweb
        image: harbor/tomcat:8.5-jre8
        ports:
        - containerPort: 80
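
For this selector to match, the target node must carry the type=backEndNode1 label. A sketch of adding it (the node name is a placeholder):

kubectl label node <node-name> type=backEndNode1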