Kubernetes is an open source container orchestration engine for automating the deployment, scaling, and management of containerized applications. However, not every project needs microservices, and not every project needs Kubernetes. For example, there is no need to deploy an admin back end, a scheduled-task service, or a non-distributed database in containers. Kubernetes is better suited to deploying distributed microservice applications.

Over the past two days I finished reading the book Kubernetes Source Code Analysis. This post collects some of the key points from the book.

(Self-drawn mind map of learning route)

Kubernetes architecture

(Image source: Kubernetes Source Code Analysis, Kubernetes architecture diagram)

Kubernetes uses a client/server (C/S) architecture. The system is divided into two parts: Master and Node. The Master is the server (master node), and a Node is a client (worker node).

As the brain of the cluster, the master node manages all worker nodes, decides which worker nodes Pods are scheduled to, and controls the overall state of the cluster. A node here refers to a machine such as a cloud virtual server.

A worker node manages containers, and monitors and reports the running status of all Pods running on it.

The components running on the master node are kube-apiserver, kube-controller-manager, and kube-scheduler.

kube-apiserver exposes and serves Kubernetes “resource group/resource version/resource” in a RESTful style. All components in the cluster operate on resource objects through kube-apiserver, which is also the only core component in the cluster that interacts with the Etcd cluster.

kube-controller-manager manages nodes, Pod replicas, services, endpoints, namespaces, and ServiceAccounts in the Kubernetes cluster. It is responsible for ensuring that the actual state of the system converges to the desired state. It provides a number of controllers by default, including the Deployment, StatefulSet, Namespace, and PersistentVolume controllers. Each controller watches the current state of its resource objects across the cluster in real time through the interface provided by kube-apiserver, and when a fault changes the system state, it tries to repair the system back to the desired state.

kube-scheduler is responsible for finding a suitable node for each Pod resource object in the Kubernetes cluster to run on. The scheduler schedules only one Pod resource object at a time, and the process of finding a suitable node for a Pod resource object is called a scheduling cycle. The scheduler watches Pod and Node resource objects across the cluster; when it observes a new Pod resource object, the scheduling algorithm selects the optimal node for it.

The components running on a worker node are kubelet, kube-proxy, and the container component (the container runtime).

kubelet receives, processes, and reports on the tasks delivered by kube-apiserver. When the kubelet process starts, it registers the node's information with kube-apiserver. It is mainly responsible for creating, modifying, monitoring, deleting, and evicting the Pod resource objects on its node, and for managing the Pod life cycle. The kubelet component implements three open interfaces: CRI (Container Runtime Interface), CNI (Container Network Interface), and CSI (Container Storage Interface).

As the network proxy on a node, kube-proxy runs on every Kubernetes node. It watches kube-apiserver for changes to Service and Endpoints resources and configures load balancers such as iptables or IPVS to provide unified TCP/UDP traffic forwarding and load balancing for a group of Pods. It only handles requests addressed to Kubernetes Services and their backend Pods.

Resource concepts

In Kubernetes, resources are the core concept, and the entire ecosystem operates around resources. Kubernetes is essentially a resource control system that registers, manages, schedules, and maintains the state of resources.

Kubernetes groups and versions its resources:

  • Group: the resource group
  • Version: the resource version
  • Resource: the resource
  • Kind: the resource kind (category)

Resource objects and resource operation methods:

  • Resource object (Resource Object): a resource object contains fields such as the resource group, resource version, and resource kind.
  • Resource operation methods (Verbs): each resource has operation methods that implement CRUD operations against Etcd. Kubernetes supports 8 resource operation methods: create, delete, deletecollection, get, list, patch, update, and watch.

Kubernetes supports two types of resource groups, those with a group name and those without:

  • Resource groups with a group name: represented as <group>/<version>/<resource>, e.g. apps/v1/deployments;
  • Resource group without a group name: the core resource group, represented as /<version>/<resource>, e.g. /v1/pods.

The RESTful APIs provided by Kubernetes use the GVR (resource group/resource version/resource) to build request paths, as shown in the following table:

Path                  Resource     Resource operation methods
/api/v1/configmaps    ConfigMap    create, delete, deletecollection, get, list, patch, update, watch
/api/v1/pods          Pod          create, delete, deletecollection, get, list, patch, update, watch
/api/v1/services      Service      create, delete, deletecollection, get, list, patch, update, watch
...

The path of a resource group with a group name is prefixed with /apis, and the path of the resource group without a group name (the core group) is prefixed with /api. For example, in /api/v1/configmaps, v1 is the resource version and configmaps is the resource name.
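
To make the mapping concrete, here is a small sketch of how the apiVersion field of a description file relates to these two path prefixes. Only the identifying fields are shown, so the two fragments are not complete manifests, and the paths in the comments are cluster-wide collection paths in the same simplified form as the table above.

```yaml
# Resource group with a group name: group "apps", version "v1"
# Collection path: /apis/apps/v1/deployments
apiVersion: apps/v1
kind: Deployment
---
# Core resource group (no group name): version "v1"
# Collection path: /api/v1/configmaps
apiVersion: v1
kind: ConfigMap
```

For namespaced resources, the path of a single namespace additionally contains a namespaces/<namespace> segment between the version and the resource name.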

Resources can also have subresources; for example, pods have a log subresource. Running kubectl logs [pod] is equivalent to a GET on /api/v1/namespaces/[namespace]/pods/[pod]/log.

Kubernetes supports eight resource operation methods, but not every resource has to support all eight. A subresource such as pods/log supports only the get operation, because logs only need to be viewed.
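
One place where resource groups, resources, subresources, and operation methods all appear together is an RBAC Role. The sketch below is only an illustration: RBAC is not covered in this post, and the Role name is hypothetical. Each rule names an API group, a resource, and the operation methods allowed on it. In such rules the Pod log subresource is written pods/log.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-log-reader          # hypothetical name
  namespace: default
rules:
  # "" is the core resource group (the one without a group name)
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  # A subresource is written as <resource>/<subresource>; logs only need "get"
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  # A resource in a named group: apps/v1/deployments
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
```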

The Kubernetes system supports namespaces. Each Namespace acts as a “virtual cluster”. Different namespaces can be isolated from each other. Namespaces are used to divide different environments, such as the production environment, test environment, and development environment. Namespaces can also be used to divide unrelated projects, such as project A and project B.
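
As a minimal sketch of the environment split described above, the following description file declares one Namespace per environment. The namespace names are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production      # hypothetical name
---
apiVersion: v1
kind: Namespace
metadata:
  name: test            # hypothetical name
---
apiVersion: v1
kind: Namespace
metadata:
  name: development     # hypothetical name
```

A namespaced resource is then placed in one of these environments through its metadata.namespace field.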

Resource object description file definition

Kubernetes resources can be divided into built-in resources and custom resources, which are defined through the resource object description file. A resource object is described by five fields: Group/Version, Kind, MetaData, Spec, and Status.

Using the Service resource description file as an example, the configuration is as follows:

apiVersion: v1
kind: Service
metadata:
  name: test-service
  namespace: default
spec:
  # ...
  • apiVersion: the Group/Version. Service belongs to the core resource group, so there is no resource group name; v1 is the resource version;
  • kind: the resource kind;
  • metadata: defines metadata such as the resource name and namespace;
  • spec: describes the desired state of the Service;
  • status: describes the actual state of the resource object. It is hidden and does not need to be configured; the Kubernetes system provides and updates it.
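
Because the spec in the example above is elided, here is a sketch of what a filled-in Service description file might look like. The selector label, the service type, and the port numbers are assumptions for illustration, not the configuration from the book. The status field is left out because the system fills it in.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-service
  namespace: default
spec:
  type: ClusterIP
  # Pods carrying this label become the back ends of the Service (assumed label)
  selector:
    app: test-server
  ports:
    - name: http
      protocol: TCP
      port: 80            # port exposed by the Service (assumed)
      targetPort: 8080    # container port that receives the traffic (assumed)
```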

Pod scheduling

Pod resource objects support priority and preemption. When kube-scheduler runs, it schedules Pod resource objects according to their priority: high-priority Pod resource objects are placed at the front of the scheduling queue and get suitable nodes first, and only then are suitable nodes selected for low-priority Pod resource objects.

When a higher-priority Pod resource object cannot find a suitable node, the scheduler tries to preempt a node from lower-priority Pod resource objects. Preemption evicts the lower-priority Pod resource objects from the node they are on so that the higher-priority Pod resource object can run there. The evicted low-priority Pod resource objects re-enter the scheduling queue and wait for a suitable node to be selected again.

By default, if priority is not enabled, all existing Pod resource objects have a priority of 0. The steps for configuring a priority for a Pod resource object are as follows:

  • 1. Create a PriorityClass resource object from a PriorityClass resource object description file, configured as follows:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  # The name must be a valid DNS subdomain name, so it is written in lowercase
  name: main-resource-high-priority
value: 10000
globalDefault: false
description: "highest priority"

  • value: the priority. A higher value means a higher priority.
  • globalDefault: whether this is the global default. When it is true, Pods that do not specify a priority use this priority by default.
  • 2. Modify the Pod resource object description file to assign the priority to the Pod.

When a Pod resource is configured through a Deployment, you only need to add a priorityClassName field under the Pod spec (spec.template.spec) in the Deployment description file, as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-server
  namespace: default
spec:
  replicas: 1
  # selector and template are required by apps/v1; the app label is an assumed value
  selector:
    matchLabels:
      app: test-server
  template:
    metadata:
      labels:
        app: test-server
    # Pod configuration
    spec:
      containers:
        - name: test-server-pod
          image: test-server:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: http-port
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: common-config
      serviceAccountName: admin-sa
      # Must match the name of the PriorityClass created in step 1
      priorityClassName: main-resource-high-priority

Affinity scheduling

Affinity scheduling is another way to influence scheduling. kube-scheduler automatically selects a globally or locally optimal node (for example, a node with sufficient hardware resources and low load) for each Pod resource object. In production, you usually want more control over how Pod resource objects are scheduled. For example, you may want Pod resource objects that do not depend on GPU hardware to be assigned to nodes without GPUs, and Pod resource objects that do depend on GPU hardware to be assigned to nodes with GPUs. Developers only need to label those nodes, and the scheduler can then use the labels to schedule Pod resource objects. This scheduling policy is called affinity and anti-affinity scheduling.

  • Affinity (Affinity): used to deploy multiple services close to each other. For example, the Pod resource objects of two services (such as an ad click service and an IP query service) are scheduled to the same node where possible to reduce network overhead.
  • Anti-affinity (Anti-Affinity): schedules the multiple replica instances of one service's Pod resource object to different nodes for high availability. For example, the order service's Pod expects three replicas, each deployed on a different node.

Pod resource objects currently support two kinds of affinity and one kind of anti-affinity:

  • NodeAffinity: node affinity. It schedules a Pod resource object to specific nodes, for example scheduling Pods that need a GPU to nodes that have a GPU;
  • PodAffinity: Pod resource object affinity. It schedules a Pod resource object close to another Pod resource object, for example to the same host, hardware cluster, or equipment room, to shorten network transmission delay;
  • PodAntiAffinity: Pod resource object anti-affinity. It schedules the multiple replica instances of a Pod resource object to different nodes, different hardware clusters, and so on, which reduces risk and improves the availability of the Pod resource object.
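
The three mechanisms can be combined in one Pod description file. The sketch below is an illustration under assumptions: the node label gpu=true, the app labels, the Pod name, and the image are hypothetical, and the choice of a required rule versus a preferred rule for each mechanism is only one possible setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-service                 # hypothetical name
  labels:
    app: order-service
spec:
  affinity:
    # NodeAffinity: only nodes labeled gpu=true are eligible (assumed node label)
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu
                operator: In
                values: ["true"]
    # PodAffinity: prefer the node that already runs the ip-query-service Pod
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: ip-query-service
            topologyKey: kubernetes.io/hostname
    # PodAntiAffinity: never share a node with another order-service replica
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: order-service
          topologyKey: kubernetes.io/hostname
  containers:
    - name: order-service
      image: order-service:latest     # hypothetical image
```

The topologyKey decides what "the same place" means: kubernetes.io/hostname compares individual nodes, while a zone label would compare whole zones.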

Built-in scheduling algorithm

By default, kube-scheduler provides two types of scheduling algorithms: pre-selection scheduling algorithms and preferred scheduling algorithms.

  • Pre-selection scheduling algorithm: checks whether a node meets the conditions for running the Pod resource object to be scheduled. If it does, the node is added to the list of available nodes;
  • Preferred scheduling algorithm: calculates a final score for each available node. kube-scheduler takes the node with the highest score as the best node to run the Pod resource object.

References

[1] Zheng Dongxu. Kubernetes Source Code Analysis [M]. Beijing: Publishing House of Electronics Industry, 2020.
[2] Kubernetes official documentation. kubernetes.io