System environment

  • Docker version: 19.03.8
  • kubeadm version: 1.16.15
  • Kubernetes version: 1.16.15
  • Kubernetes master nodes: 3
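If you need to confirm these versions on a host before following along, a quick check looks roughly like this (standard Docker/kubeadm/kubectl commands, nothing cluster-specific assumed):

docker version --format '{{.Server.Version}}'   # Docker engine version
kubeadm version -o short                        # kubeadm version
kubectl version --short                         # client and API server versions
kubectl get nodes -o wide                       # node roles and kubelet versions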

Operation process

  • Query the master node information
[root@k8s-portal-master1 olami]# kubectl get nodes | grep master
k8s-portal-master1   Ready   master   367d   v1.16.15
k8s-portal-master2   Ready   master   367d   v1.16.15
k8s-portal-master3   Ready   master   367d   v1.16.15

The k8s-portal-master1 node became abnormal and was reset; we now want it to rejoin the cluster as a master node.

  • The node fails to join the cluster after the reset
[root@k8s-portal-master1 ~]# kubeadm join 10.3.175.168:6443 --token b8mdec.mh10mojlfl4zdqrd --discovery-token-ca-cert-hash sha256:23921aa3bd9d8acd048633613a9174c4d52caf404739a67b71bd55075a52mq56 --control-plane
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.0. Latest validated version: 18.09
    [WARNING Hostname]: hostname "k8s-portal-master1" could not be reached
    [WARNING Hostname]: hostname "k8s-portal-master1": lookup k8s-portal-master1 on 119.29.29.29:53: no such host
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-portal-master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.3.175.165 10.3.175.168 10.3.175.165 10.3.175.166 10.3.175.167 10.3.175.168]
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.3.175.165:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher

The join stops at the etcd health check: kubeadm reports the etcd cluster as unhealthy because it cannot reach the etcd endpoint on k8s-portal-master1 (https://10.3.175.165:2379).
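Before changing anything, the three etcd endpoints can be probed directly from a healthy master. A minimal sketch, assuming the stacked-etcd pod name and the certificate paths used later in this walkthrough:

kubectl -n kube-system exec etcd-k8s-portal-master2 -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     --endpoints=https://10.3.175.165:2379,https://10.3.175.166:2379,https://10.3.175.167:2379 \
     endpoint health'
# the 10.3.175.165 endpoint is expected to report unhealthy, matching the check-etcd error above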

  • Run the kubectl command to view cluster information
[root@k8s-portal-master2 ~]# kubectl get nodes | grep master
k8s-portal-master2   Ready   master   367d   v1.16.15
k8s-portal-master3   Ready   master   367d   v1.16.15

The k8s-portal-master1 node is not in the node list, confirming that it has not been added back to the cluster.

  • View kubeadm-config information about the cluster
[root@k8s-portal-master2 ~]# kubectl describe configmaps kubeadm-config -n kube-system
Name:         kubeadm-config
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
ClusterConfiguration:
----
apiServer:
  certSANs:
  - 10.3.175.165
  - 10.3.175.166
  - 10.3.175.167
  - 10.3.175.168
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.3.175.168:6443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: gcr.azk8s.cn/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.16.15
networking:
  dnsDomain: cluster.local
  podSubnet:
scheduler: {}
ClusterStatus:
----
apiEndpoints:
  k8s-portal-master1:
    advertiseAddress: 10.3.175.165
    bindPort: 6443
  k8s-portal-master2:
    advertiseAddress: 10.3.175.166
    bindPort: 6443
  k8s-portal-master3:
    advertiseAddress: 10.3.175.167
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus

Events:  <none>

The k8s-portal-master1 node still appears in kubeadm-config, which means its information is still recorded in etcd. The cluster was set up with kubeadm in a stacked topology, where etcd runs together with the control plane, so each master node hosts an etcd container. When a master node is removed from the Kubernetes cluster without being removed from etcd, its member entry stays in the etcd member list, and that stale member is what makes the health check fail.
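To see the stale entry without reading the whole describe output, the ClusterStatus block can be pulled out of the ConfigMap directly (an optional check, not part of the fix):

kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterStatus}'
# the apiEndpoints section still lists k8s-portal-master1 (10.3.175.165)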

  • Obtain etCD cluster node information
[root@k8s-portal-master2 ~]# kubectl get pods -n kube-system | grep etcd
etcd-k8s-portal-master2                      1/1     Running   0          88m
etcd-k8s-portal-master3                      1/1     Running   0          124m
  • Enter etCD and delete etCD member information
[root@k8s-portal-master2 ~]# kubectl exec -it etcd-k8s-portal-master2 sh -n kube-system
# export ETCDCTL_API=3
# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
# etcdctl member list
81823df8357bcc71, started, k8s-portal-master3, https://10.3.175.167:2380, https://10.3.175.167:2379
9d7d493298ff2c5f, started, k8s-portal-master1, https://10.3.175.165:2380, https://10.3.175.165:2379
fac8c4b57ce3b0af, started, k8s-portal-master2, https://10.3.175.166:2380, https://10.3.175.166:2379
# etcdctl member remove 9d7d493298ff2c5f
Member 9d7d493298ff2c5f removed from cluster bd092b6d7796dffd
# etcdctl member list
81823df8357bcc71, started, k8s-portal-master3, https://10.3.175.167:2380, https://10.3.175.167:2379
fac8c4b57ce3b0af, started, k8s-portal-master2, https://10.3.175.166:2380, https://10.3.175.166:2379
#
# exit
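The same removal can also be done without an interactive shell. A one-shot variant under the same assumptions (pod name, certificate paths, and the member ID taken from the list output above):

kubectl -n kube-system exec etcd-k8s-portal-master2 -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     member remove 9d7d493298ff2c5f'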
  • Reset the k8s-portal-master1 environment
[root@k8s-portal-master1 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W1215 12:49:00.069254    4316 reset.go:96] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get node name from kubelet config: open /etc/kubernetes/kubelet.conf: no such file or directory
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1215 12:49:02.730590    4316 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W1215 12:49:02.733156    4316 cleanupnode.go:99] [reset] Failed to evaluate the "/var/lib/kubelet" directory. Skipping its unmount and cleanup: lstat /var/lib/kubelet: no such file or directory
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
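The reset output above asks for some cleanup to be done by hand on k8s-portal-master1. A hedged sketch of those steps (only run them if the rules and files are actually present and not needed by anything else):

# flush iptables rules left behind by kube-proxy
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# clear IPVS tables if the cluster runs kube-proxy in IPVS mode
ipvsadm --clear
# remove the stale kubeconfig mentioned at the end of the reset output
rm -f $HOME/.kube/config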
  • Copy the required certificates to k8s-portal-master1 (see the sketch below) and join the cluster again
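A minimal sketch of the certificate copy, assuming the shared certificates are pulled from k8s-portal-master2 (10.3.175.166) over SSH; this is the usual file set for a control-plane join without --certificate-key, so adjust the source host and paths to your environment:

# run on k8s-portal-master1
mkdir -p /etc/kubernetes/pki/etcd
scp root@10.3.175.166:/etc/kubernetes/pki/{ca.crt,ca.key,sa.key,sa.pub,front-proxy-ca.crt,front-proxy-ca.key} /etc/kubernetes/pki/
scp root@10.3.175.166:/etc/kubernetes/pki/etcd/{ca.crt,ca.key} /etc/kubernetes/pki/etcd/
scp root@10.3.175.166:/etc/kubernetes/admin.conf /etc/kubernetes/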
[root@k8s-portal-master1 ~]# kubeadm join 10.3.175.168:6443 --token b8mdec.mh10mojlfl4zdqrd --discovery-token-ca-cert-hash sha256:23921aa3bd9d8acd048633613a9174c4d52caf404739a67b71bd55075a52mq56 --control-plane
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.0. Latest validated version: 18.09
    [WARNING Hostname]: hostname "k8s-portal-master1" could not be reached
    [WARNING Hostname]: hostname "k8s-portal-master1": lookup k8s-portal-master1 on 119.29.29.29:53: no such host
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-portal-master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.3.175.165 10.3.175.168 10.3.175.165 10.3.175.166 10.3.175.167 10.3.175.168]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
{"level":"warn","ts":"2020-12-15T12:52:08.222+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.3.175.165:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s-portal-master1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-portal-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
  • Query the master node information
[root@k8s-portal-master1 olami]# kubectl get nodes | grep master
k8s-portal-master1   Ready   master   2m     v1.16.15
k8s-portal-master2   Ready   master   367d   v1.16.15
k8s-portal-master3   Ready   master   367d   v1.16.15

After k8s-portal-master1 rejoins, the cluster is back to its normal three-master state.
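As a final check, all three etcd pods should be running again (a quick verification using the same names as above):

kubectl -n kube-system get pods -o wide | grep etcd
# expect etcd-k8s-portal-master1, -master2 and -master3 all in Running state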