To make it easier to learn the Kubernetes system, I have put together a series of Kubernetes articles covering the basics, installation steps, and the other topics that make up the whole Kubernetes system. I believe that after reading this series you will have a deeper understanding of Kubernetes.

Typically, we do not need to care about which node the scheduler assigns a Pod to. Sometimes, however, we need to add scheduling constraints: for example, some applications should run only on nodes with SSD storage, some applications should run on the same node as each other, and so on.

As of Kubernetes 1.11, the node affinity feature is still in Beta.

nodeSelector

First we plan labels for the nodes, and then, when creating the Deployment or Pod, we use the nodeSelector field to specify which nodes the Pod may run on.
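
Before the Pod is created, the target nodes must already carry the planned label. A minimal sketch of that step (node1 is a placeholder name, not from the original article; substitute a real node from your cluster):

# Attach the disktype=ssd label to a node
kubectl label nodes node1 disktype=ssd

# Confirm which labels each node carries
kubectl get nodes --show-labels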

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: docker.io/nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
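
A quick way to verify the result, assuming the manifest above is saved as nginx-ssd.yaml (a file name chosen here only for illustration):

# Create the Pod and check which node it was scheduled to;
# the NODE column should show a node labeled disktype=ssd
kubectl apply -f nginx-ssd.yaml
kubectl get pod nginx -o wide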

nodeSelector will eventually be deprecated in future versions; the affinity policies described below are the recommended approach.

Affinity and anti-affinity

The scheduling model offered by nodeSelector is relatively simple. Affinity and anti-affinity configurations provide more flexible scheduling policies, with the main enhancements being:

  • Richer expression support, rather than only an AND of exact label matches
  • Soft/preference scheduling rules can be expressed instead of only hard requirements
  • Scheduling constraints can be based on Pod labels, not just node labels

The affinity feature consists of two kinds of configuration:

Node affinity

Node affinity was introduced in Kubernetes 1.2. Like nodeSelector, it lets us constrain which nodes a Pod can be scheduled onto, and it supports two modes: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. The former is a hard requirement: if no node satisfies it, the Pod is not scheduled. The latter is a preference: the scheduler tries to satisfy it, but if no node does, the Pod is still scheduled onto a node that does not match. IgnoredDuringExecution means that if a node's labels change while the Pod is running, the Pod keeps running on that node even though the affinity rule is no longer satisfied.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1               # The value ranges from 1 to 100
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: nginx
    image: docker.io/nginx
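
The manifest assumes the nodes already carry the kubernetes.io/e2e-az-name and another-node-label-key labels. For a local test they could be attached by hand, roughly as follows (node names are placeholders):

# Hypothetical labels matching the affinity example above
kubectl label nodes node1 kubernetes.io/e2e-az-name=e2e-az1
kubectl label nodes node2 kubernetes.io/e2e-az-name=e2e-az2 another-node-label-key=another-node-label-value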

In addition to In, the operator used in a label expression can be NotIn, Exists, DoesNotExist, Gt or Lt. If multiple nodeSelectorTerms are specified, a node is eligible as long as one of the terms is satisfied; if a term contains multiple matchExpressions, all of them must be satisfied for the node to be eligible.
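
Since the expressions use the standard label-selector syntax, the set of nodes an In term would match can be previewed with kubectl (a rough check for the values above, not what the scheduler itself runs):

# List the nodes the In expression above would select
kubectl get nodes -l 'kubernetes.io/e2e-az-name in (e2e-az1, e2e-az2)'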

Inter-pod affinity and anti-affinity

This feature was added in Kubernetes 1.4. It lets the scheduling decision depend on the labels of Pods that are already running on nodes, rather than on node labels alone. In words, the rule is: "if one or more Pods satisfying condition Y are already running in topology X, then this Pod should (or, for anti-affinity, should not) run in X". Because Pods are namespaced while nodes are not, the administrator can specify which namespaces the rule applies to when configuring it. The topology X is specified with topologyKey: a topology can be a node, a rack, an equipment room or a region (for example North America or Asia), and it is still expressed as a label on the node.
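
The topologyKey values used below are ordinary node labels; which topology labels your nodes already carry can be inspected with kubectl (zone and region labels are typically added by the cloud provider):

# Show the hostname and zone topology labels of each node as extra columns
kubectl get nodes -L kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone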

There are two types:

  • requiredDuringSchedulingIgnoredDuringExecution: a hard requirement that must be satisfied
  • preferredDuringSchedulingIgnoredDuringExecution: a soft preference

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0

Label matching supports the In, NotIn, Exists and DoesNotExist operators. In principle, topologyKey can be any valid node label key, but there are a few constraints:

  • For affinity, and for requiredDuringScheduling anti-affinity, topologyKey must not be empty
  • For requiredDuringScheduling anti-affinity, the LimitPodHardAntiAffinityTopology admission controller restricts topologyKey to kubernetes.io/hostname; modify or disable that admission controller to lift the restriction
  • For preferredDuringScheduling anti-affinity, an empty topologyKey is interpreted as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region
  • Apart from the cases above, topologyKey can be any legal label key

This article comes from my CSDN blog; for the full address, see: blog.csdn.net/jettery/art…

The inter-pod affinity policy requires a considerable amount of computation and may significantly slow down scheduling in large clusters, so it is not recommended in clusters with more than 100 nodes. The inter-pod anti-affinity policy requires that nodes be labeled consistently: every node in the cluster must carry a label matching the topologyKey. If some nodes are missing that label, unintended behavior may occur.
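
Before relying on anti-affinity, it is worth checking that every node actually carries the label used as topologyKey. One way to sketch that check:

# Compare the total node count with the count of nodes carrying the zone label
kubectl get nodes --no-headers | wc -l
kubectl get nodes -l failure-domain.beta.kubernetes.io/zone --no-headers | wc -l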

Common scenarios

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine

The example above creates a Deployment with three replicas and uses an inter-pod anti-affinity rule to constrain where they are placed: if a node already runs an instance carrying the same label, no further replica is scheduled there. This prevents several replicas of the same application from landing on a single node.
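
Once the Deployment is applied, the spread can be verified; each of the three replicas should appear on a different node:

# Each app=store Pod should show a distinct value in the NODE column
kubectl get pods -l app=store -o wide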

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine

This creates three replicas of the web service. As with the Redis configuration above, the anti-affinity rule first ensures that two web replicas never land on the same node; the pod affinity rule then requires each web replica to be placed on a node that is already running a Pod labeled app=store, i.e. alongside the Redis cache.
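
The resulting layout can be checked once both Deployments are running; each node should host one store Pod and one web-store Pod:

# Sort by node to see the store / web-store pairs grouped together
kubectl get pods -l 'app in (store, web-store)' -o wide --sort-by=.spec.nodeName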

References

  1. Affinity in Kubernetes
  2. Assigning Pods to Nodes