This is the fifth day of my participation in the Gwen Challenge.

Error keyword

The error appears when running kubeadm join …

[check-etcd] Checking that the etcd cluster is healthy

error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.8.18.105:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher

I. Problem description

The Kubernetes cluster has three master nodes:

  • k8s-master01
  • k8s-master02
  • k8s-master03

The k8s-master02 master node needed a kernel and software upgrade, so it was first temporarily removed from the cluster. After the upgrade was complete, I tried to rejoin it to the Kubernetes cluster with kubeadm by running the following command:

kubeadm join k8s-lb:16443 --token j7p6dr.zx0rh80lqn8unpty \
    --discovery-token-ca-cert-hash sha256:41a1353a03c99f46868294c28f9948bbc2cca957d98eb010435a493112ec7caa \
    --control-plane --certificate-key 5990f26f91d034a464692c13b31160d6d20df54fd8e3988d560e315c6ddb61aa

During execution, the following log appears, showing that the etcd health check failed:

...
[control-plane] Creating static Pod manifest for "kube-controller-manager"
... manifests.go:150] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0329 00:01:51.373807   19209 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.8.18.105:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher

The key message is error execution phase check-etcd: kubeadm failed in the check-etcd phase, i.e. while verifying that the existing etcd cluster is healthy before adding the new control-plane node.
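As the last line of the output suggests, you can re-run the join with a higher verbosity level to get the full stack trace and see exactly which etcd endpoint kubeadm cannot reach. This is simply the join command from above with the standard --v=5 flag appended:

kubeadm join k8s-lb:16443 --token j7p6dr.zx0rh80lqn8unpty \
    --discovery-token-ca-cert-hash sha256:41a1353a03c99f46868294c28f9948bbc2cca957d98eb010435a493112ec7caa \
    --control-plane --certificate-key 5990f26f91d034a464692c13b31160d6d20df54fd8e3988d560e315c6ddb61aa \
    --v=5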

II. Problem analysis

1. View the cluster node list

[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS   ROLES    AGE     VERSION
k8s-master01   Ready    master   4d20h   v1.18.2
k8s-master03   Ready    master   4d20h   v1.18.2
k8s-node01     Ready    worker   4d18h   v1.18.2

You can see that k8s-master02 is indeed no longer in the node list.

2. Check the Kubeadm configuration
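The ClusterStatus shown below is stored in the kubeadm-config ConfigMap in the kube-system namespace; assuming the default kubeadm names, one way to view it is:

kubectl -n kube-system describe configmaps kubeadm-config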

ClusterStatus:
----
apiEndpoints:
  k8s-master01:
    advertiseAddress: 172.20.5.11
    bindPort: 6443
  k8s-master02:
    advertiseAddress: 172.20.5.12
    bindPort: 6443
  k8s-master03:
    advertiseAddress: 172.20.5.13
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus

You can see that k8s-master02 still appears in the kubeadm ClusterStatus configuration, and this information is stored in etcd.

3. Analysis and solution

Because the cluster was built with kubeadm using the stacked etcd topology, each master node runs its own etcd container instance, and together these form the etcd cluster. When a master node is removed from the cluster, its etcd member entry is not removed automatically and remains in the etcd member list.

Therefore, we need to go into etcd and manually delete the stale member entry.
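Before deleting anything, you can optionally confirm from a surviving master which etcd endpoints are actually unreachable. This is only a sketch; it assumes the etcd Pod name (etcd-k8s-master01), endpoints, and certificate paths shown in the next section:

## Check the health of every etcd endpoint (run from k8s-master01; 172.20.5.12 is expected to fail)
kubectl -n kube-system exec etcd-k8s-master01 -- sh -c "ETCDCTL_API=3 etcdctl --endpoints=https://172.20.5.11:2379,https://172.20.5.12:2379,https://172.20.5.13:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint health"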

III. Solution

1. Get the list of etcd Pods

[root@k8s-master01 ~]# kubectl get pods -n kube-system | grep etcd
etcd-k8s-master01                         1/1     Running            4          4d20h
etcd-k8s-master03                         1/1     Running            1          4d20h

2. Enter the etcd container and delete the stale member information

Pick either of the two etcd Pods above and exec into it with kubectl:

[root@k8s-master01 ~]# kubectl exec -it -n kube-system etcd-k8s-master01 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.

Once inside the container, run the following commands:

## Enable the etcdctl v3 API
export ETCDCTL_API=3
## Configure an etcdctl alias with the endpoint and certificates
alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
## Check the etcd cluster member list
etcdctl member list
a9b6a1341829d62a, started, k8s-master03, https://172.20.5.13:2380, https://172.20.5.13:2379, false
d1c737a26ea4dd70, started, k8s-master01, https://172.20.5.11:2380, https://172.20.5.11:2379, false
fe2d4a2a33304913, started, k8s-master02, https://172.20.5.12:2380, https://172.20.5.12:2379, false
## Remove the stale k8s-master02 member
etcdctl member remove fe2d4a2a33304913
Member fe2d4a2a33304913 removed from cluster
## Exit the container
exit
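For reference, the same member removal can also be done without an interactive shell by running etcdctl through kubectl exec from a master node. This is just an alternative sketch that reuses the Pod name, certificate paths, and member ID shown above:

## Remove the stale member in one command from k8s-master01
kubectl -n kube-system exec etcd-k8s-master01 -- sh -c "ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member remove fe2d4a2a33304913"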

3. Try to join the cluster again

Use kubeadm to try to add k8s-master02 to the cluster again. Before running the join command, log in to the k8s-master02 server and clear its previous kubeadm state:

$ kubeadm reset

Then try joining the Kubernetes cluster:

[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
{"level":"warn","ts":"2020-12-22T11:...560+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://172.20.5.12:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s-master02 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-master02 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS   ROLES    AGE     VERSION
k8s-master01   Ready    master   4d20h   v1.18.2
k8s-master02   Ready    master   7m38s   v1.18.2
k8s-master03   Ready    master   4d20h   v1.18.2
k8s-node01     Ready    worker   4d18h   v1.18.2