Introduction to the Scheduler

The Scheduler is the Kubernetes component that assigns defined Pods to nodes in the cluster. That sounds simple enough, but there is a lot to consider:

  • Fairness: how to ensure that each node gets allocated its share of Pods
  • Efficient resource utilization: cluster resources should be used to the fullest extent possible
  • Efficiency: scheduling should perform well and be able to place large numbers of Pods as quickly as possible
  • Flexibility: users should be able to control the scheduling logic according to their own needs

The Scheduler runs as a separate program. Once started, it watches the API Server for Pods whose PodSpec.NodeName is empty and creates a binding for each such Pod indicating which node it should be placed on
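
For reference, the binding is itself an API object in the core v1 group. Below is a minimal sketch of what the scheduler effectively submits for a Pod; the Pod and node names are placeholders:

apiVersion: v1
kind: Binding
metadata:
  name: some-pod        # the Pod being bound (placeholder name)
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: k8s-node01      # the node chosen for the Pod (placeholder name)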

Scheduling process

Scheduling happens in several steps. First, nodes that do not meet the conditions are filtered out; this is the predicate stage. Then the nodes that pass are ranked; this is the priority stage. Finally, the node with the highest priority is selected. If any of these steps returns an error, the error is returned directly

The predicate stage uses a series of algorithms, including:

  • PodFitsResources: whether the node's remaining resources are greater than the resources requested by the Pod
  • PodFitsHost: if the Pod specifies a NodeName, check whether the node's name matches it
  • PodFitsHostPorts: whether the ports already in use on the node conflict with the ports requested by the Pod
  • PodSelectorMatches: filters out nodes that do not match the labels specified by the Pod
  • NoDiskConflict: the volumes already mounted on the node do not conflict with the volumes specified by the Pod, unless both are read-only

If no suitable node survives the predicate stage, the Pod remains in the Pending state and scheduling is retried until some node meets the conditions. If more than one node passes this step, the priority stage continues: the remaining nodes are ranked by their priority scores
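
To see why a Pod is stuck in Pending, you can list pending Pods and inspect the scheduling events on one of them (the Pod name below is a placeholder):

#List Pods that are still Pending
kubectl get pods --field-selector=status.phase=Pending
#The Events section shows FailedScheduling messages explaining which predicates failed
kubectl describe pod <pod-name>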

The priorities consist of a series of key-value pairs, where the key is the name of a priority item and the value is its weight (how important that item is). Priority options include:

  • LeastRequestedPriority: the score is computed from CPU and Memory utilization; the lower the utilization, the higher the score. In other words, this priority favors nodes with a lower percentage of resource usage
  • BalancedResourceAllocation: the closer a node's CPU and Memory utilization rates are to each other, the higher the score. This should be used together with the item above, not on its own
  • ImageLocalityPriority: favors nodes that already have the images the Pod needs; the larger the total size of the images already present, the higher the score

Each node's final score is the weighted sum of its scores on all priority items, and the node with the highest total is chosen

Custom scheduler

In addition to the built-in Kubernetes scheduler, you can also write your own. A Pod selects which scheduler handles it by naming it in the spec.schedulerName field. For example, the following Pod selects my-scheduler instead of the default default-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels: 
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0

Node affinity

pod.spec.nodeAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft requirement (preference)
  • requiredDuringSchedulingIgnoredDuringExecution: hard requirement

requiredDuringSchedulingIgnoredDuringExecution

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels: 
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
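
The kubernetes.io/hostname label used above is set automatically on every node. To see the value it takes on your nodes (and any other labels you might match against), you can run:

kubectl get nodes --show-labels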

preferredDuringSchedulingIgnoredDuringExecution

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values: 
            - qikqiak
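
This preference only has an effect if some node actually carries the source=qikqiak label. A sketch of adding it to a node (the node name is a placeholder):

kubectl label node <node-name> source=qikqiak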

Combining hard and soft rules

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - qikqiak

Key-value operators

  • In: the label's value is in the given list
  • NotIn: the label's value is not in the given list
  • Gt: the label's value is greater than the given value (see the sketch after this list)
  • Lt: the label's value is less than the given value
  • Exists: the label exists (no value is compared)
  • DoesNotExist: the label does not exist
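
As a quick illustration (the disktype and cpu-count label keys here are hypothetical, not labels Kubernetes sets for you), Exists takes no values field, while Gt compares the label's value against a single integer written as a string:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype           # hypothetical label: only require that it exists
          operator: Exists
        - key: cpu-count          # hypothetical label holding an integer value
          operator: Gt
          values:
          - "8"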

Pod affinity

pod.spec.affinity.podAffinity/podAntiAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft requirement (preference)
  • requiredDuringSchedulingIgnoredDuringExecution: hard requirement

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: myapp:v1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values: 
              - pod-2
          topologyKey: kubernetes.io/hostname
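
The example above assumes that Pods labeled app=pod-1 (for the affinity rule) and app=pod-2 (for the anti-affinity rule) already exist in the cluster; a minimal sketch of the first one:

apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: pod-1
spec:
  containers:
  - name: pod-1
    image: myapp:v1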

The affinity and anti-affinity scheduling policies are compared as follows:

Scheduling policy | Labels matched | Operators                               | Topology domain support | Scheduling goal
nodeAffinity      | host labels    | In, NotIn, Exists, DoesNotExist, Gt, Lt | No                      | place the Pod on specified hosts
podAffinity       | Pod labels     | In, NotIn, Exists, DoesNotExist         | Yes                     | place the Pod in the same topology domain as the specified Pod
podAntiAffinity   | Pod labels     | In, NotIn, Exists, DoesNotExist         | Yes                     | place the Pod in a different topology domain from the specified Pod

Taint and Toleration

Node affinity is a property of Pods (a preference or a hard requirement) that attracts them to a particular class of nodes. A taint, by contrast, enables a node to repel a particular class of Pods

Taints and tolerations work together to keep Pods off inappropriate nodes. One or more taints can be applied to each node, which means Pods that do not tolerate those taints will not be accepted by that node. Tolerations, when applied to Pods, mean that those Pods can, but are not required to, be scheduled onto nodes with matching taints

Taint

ⅰ. The composition of a taint

Using the kubectl taint command, you can place a taint on a Node. A taint creates a mutually exclusive relationship between the Node and Pods, allowing the Node to refuse to schedule Pods, or even to evict Pods already running on it

Each taint consists of the following parts:

key=value:effect

Each taint has a key and a value as its label, where value can be empty, and effect describes what the taint does. The taint effect currently supports the following three options:

  • NoSchedule: Kubernetes will not schedule Pods onto a Node with this taint
  • PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a Node with this taint
  • NoExecute: Kubernetes will not schedule Pods onto a Node with this taint and will evict Pods already running on it

ⅱ. Setting, viewing and removing taints

#Set a taint
kubectl taint nodes node1 key1=value1:NoSchedule
#View taints: in the node description, look for the Taints field
kubectl describe node node1
#Remove a taint
kubectl taint nodes node1 key1:NoSchedule-

Toleration (tolerations)

A Node with taints set establishes a mutually exclusive relationship with Pods based on the taint effects NoSchedule, PreferNoSchedule and NoExecute, so Pods will, to varying degrees, not be scheduled onto that Node. However, tolerations can be set on Pods: a Pod with a matching toleration tolerates the taint and can be scheduled onto the tainted Node

pod.spec.tolerations

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
  tolerationSeconds: 3600
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

  • The key, value, and effect must match the taint settings on the Node
  • When operator is Exists, the value field is ignored
  • tolerationSeconds describes how long the Pod can keep running on the Node once it is marked for eviction

ⅰ. If no key is specified, taints with any key are tolerated:

tolerations:
- operator: "Exists"

ⅱ. If no effect is specified, all effects of taints with the given key are tolerated

tolerations:
- key: "key"
  operator: "Exists"

ⅲ. If there are multiple master nodes, you can set the following taint so that master resources are not wasted (Pods are only scheduled there when necessary):

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule
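
To confirm which taints a node currently carries (the node name is a placeholder), check the Taints field in its description:

#Look at the Taints field in the output
kubectl describe node <node-name> | grep Taints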

Specifying a Scheduling Node

ⅰ. Pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's scheduling policy entirely; this match is forced

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  template:
    metadata: 
      labels:
        app: myweb
    spec:
      nodeName: k8s-node01
      containers:
      - name: myweb
        image: myapp:v1
        ports:
        - containerPort: 80
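
After this Deployment is applied, all seven replicas should land on k8s-node01 regardless of predicates or priorities; you can verify the placement with:

kubectl get pods -o wide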

ⅱ. Pod.spec.nodeSelector: nodes are selected through the Kubernetes label-selector mechanism; the scheduler matches the labels and then schedules the Pod onto the target node

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        type: backEndNode1
      containers:
      - name: myweb
        image: harbor/tomcat:8.5-jre8
        ports:
        - containerPort: 80
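
For this selector to match, the target node must carry the type=backEndNode1 label. A sketch of adding it (the node name is a placeholder):

kubectl label node <node-name> type=backEndNode1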