How do I remove an rc, Deployment, or Service that is in an inconsistent state

In some cases you will find that kubectl hangs during a delete, and a later kubectl get shows that half of the resources were deleted while the rest cannot be deleted:

[root@k8s-master ~]# kubectl get -f fluentd-elasticsearch/
NAME                          DESIRED   CURRENT   READY   AGE
rc/elasticsearch-logging-v1   0         2         2       15h
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kibana-logging   0         1         1            1           15h
Error from server (NotFound): services "elasticsearch-logging" not found
Error from server (NotFound): daemonsets.extensions "fluentd-es-v1.22" not found
Error from server (NotFound): services "kibana-logging" not found

Delete the remaining deployment, service, or rc objects with commands like the following:

kubectl delete deployment kibana-logging -n kube-system --cascade=false
kubectl delete deployment kibana-logging -n kube-system --ignore-not-found
kubectl delete rc elasticsearch-logging-v1 -n kube-system --grace-period=0 --force
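
After running these, it helps to confirm that the half-deleted objects are really gone. A minimal verification sketch; the resource names are the ones from the listing above, so adjust them to your own:

# list what is left of the logging components
kubectl get deploy,rc,svc -n kube-system | grep -E 'kibana|elasticsearch|fluentd'

# force-delete anything that still refuses to go away
kubectl delete rc elasticsearch-logging-v1 -n kube-system --grace-period=0 --force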

How to reset etcd if the objects above still cannot be deleted

rm -rf /var/lib/etcd/*

Then reboot the master node.

After resetting etcd, you need to write the network configuration back:

etcdctl mk /atomic.io/network/config '{"Network":"192.168.0.0/16"}'
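
To confirm the key was written, you can read it back with the same etcd v2 API that mk uses:

etcdctl get /atomic.io/network/config
# expected output: {"Network":"192.168.0.0/16"}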

The apiserver fails to start

Each startup attempt reported the following problem:

start request repeated too quickly for kube-apiserver.service

Check /var/log/messages. In my case, the startup failure was because ca.crt and related files could not be found after ServiceAccount was enabled:

May 21 07:56:41 k8s-master kube-apiserver: Flag --port has been deprecated, see --insecure-port instead.
May 21 07:56:41 k8s-master kube-apiserver: F0521 07:56:41.692480 universal_validation.go:104] Validate server run options failed: unable to load client CA file: open /var/run/kubernetes/ca.crt: no such file or directory
May 21 07:56:41 k8s-master systemd: kube-apiserver.service: main process exited, code=exited, status=255/n/a
May 21 07:56:41 k8s-master systemd: Failed to start Kubernetes API Server.
May 21 07:56:41 k8s-master systemd: Unit kube-apiserver.service entered failed state.
May 21 07:56:41 k8s-master systemd: kube-apiserver.service failed.
May 21 07:56:41 k8s-master systemd: kube-apiserver.service holdoff time over, scheduling restart.
May 21 07:56:41 k8s-master systemd: start request repeated too quickly for kube-apiserver.service
May 21 07:56:41 k8s-master systemd: Failed to start Kubernetes API Server.

When deploying logging components such as fluentd, many of these problems come from enabling the ServiceAccount option and the security configuration it requires, so in the end you still need to configure ServiceAccount properly.
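
Before changing anything, it helps to confirm which file the apiserver is actually missing. A small diagnostic sketch; the file path comes from the log above, and the config path assumes the systemd-based install used later in this article:

# does the CA file the apiserver wants actually exist?
ls -l /var/run/kubernetes/ca.crt

# which CA / serviceaccount flags are configured?
grep -iE 'client-ca-file|service-account' /etc/kubernetes/apiserver

# recent apiserver errors
journalctl -u kube-apiserver --no-pager | tail -n 20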

Permission denied errors

During the fluentd configuration, the error cannot create /var/log/fluentd.log: Permission denied appears. This is caused by SELinux being enabled. Change SELINUX=enforcing to SELINUX=disabled in /etc/selinux/config and then reboot.
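
A quick sketch of checking and disabling SELinux; setenforce 0 avoids a reboot for the current session, while the config edit makes the change permanent:

getenforce                         # prints Enforcing if SELinux is active
setenforce 0                       # switch to permissive immediately
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# reboot afterwards so the permanent setting takes effect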

ServiceAccount-based configuration

First, generate all the necessary keys; replace k8s-master with your master's hostname.

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=k8s-master" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048
# the IP below is the cluster IP of the kubernetes service, i.e. the output of:
# kubectl get services --all-namespaces | grep 'default' | grep 'kubernetes' | grep '443' | awk '{print $3}'
echo subjectAltName=IP:10.254.0.1 > extfile.cnf
openssl req -new -key server.key -subj "/CN=k8s-master" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out server.crt -days 10000
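
It is worth verifying that the generated server certificate actually contains the cluster IP as a subjectAltName before pointing the apiserver at it:

openssl x509 -in server.crt -noout -text | grep -A1 'Subject Alternative Name'
# should show IP Address:10.254.0.1 (the kubernetes service cluster IP found above)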

If you set these parameters in the /etc/kubernetes/apiserver configuration file and start via systemctl start kube-apiserver, startup fails with the error:

Validate server run options failed: unable to load client CA file: open /root/keys/ca.crt: permission denied

However, the API server can be started from the command line:

/usr/bin/kube-apiserver --logtostderr=true --v=0 \
  --etcd-servers=http://k8s-master:2379 \
  --address=0.0.0.0 --port=8080 --kubelet-port=10250 \
  --allow-privileged=true \
  --service-cluster-ip-range=10.254.0.0/16 \
  --admission-control=ServiceAccount \
  --insecure-bind-address=0.0.0.0 \
  --client-ca-file=/root/keys/ca.crt \
  --tls-cert-file=/root/keys/server.crt \
  --tls-private-key-file=/root/keys/server.key \
  --basic-auth-file=/root/keys/basic_auth.csv \
  --secure-port=443 &>> /var/log/kubernetes/kube-apiserver.log &
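
The --basic-auth-file referenced above must exist before startup. A minimal example of its contents, assuming the static basic-auth file format used by older apiservers (password,user,uid); the values are placeholders:

# /root/keys/basic_auth.csv  (format: password,user,uid)
admin123,admin,1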

Start controller-manager from the command line:

/usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://k8s-master:8080 --root-ca-file=/root/keys/ca.crt --service-account-private-key-file=/root/keys/server.key &>> /var/log/kubernetes/kube-controller-manage.log &

etcd does not start – problem <1>

etcd is the zookeeper of the Kubernetes cluster: almost every service depends on etcd being up, for example flanneld, apiserver, docker...

The etcd error log is as follows:

May 24 13:39:09 k8s-master systemd: Stopped Flanneld overlay address etcd agent.
May 24 13:39:28 k8s-master systemd: Starting Etcd Server...
May 24 13:39:28 k8s-master etcd: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379,http://etcd:4001
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: etcd Version: 3.1.3
May 24 13:39:28 k8s-master etcd: Git SHA: 21fdcc6
May 24 13:39:28 k8s-master etcd: Go Version: go1.7.4
May 24 13:39:28 k8s-master etcd: Go OS/Arch: linux/amd64
May 24 13:39:28 k8s-master etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1
May 24 13:39:28 k8s-master etcd: the server is already initialized as member before, starting as etcd member...
May 24 13:39:28 k8s-master etcd: listening for peers on http://localhost:2380
May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:2379
May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:4001
May 24 13:39:28 k8s-master etcd: recovered store from snapshot at index 140014
May 24 13:39:28 k8s-master etcd: name = master
May 24 13:39:28 k8s-master etcd: data dir = /var/lib/etcd/default.etcd
May 24 13:39:28 k8s-master etcd: member dir = /var/lib/etcd/default.etcd/member
May 24 13:39:28 k8s-master etcd: heartbeat = 100ms
May 24 13:39:28 k8s-master etcd: election = 1000ms
May 24 13:39:28 k8s-master etcd: snapshot count = 10000
May 24 13:39:28 k8s-master etcd: advertise client URLs = http://etcd:2379,http://etcd:4001
May 24 13:39:28 k8s-master etcd: ignored file 0000000000000001-0000000000012700.wal.broken in wal
May 24 13:39:29 k8s-master etcd: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 148905
May 24 13:39:29 k8s-master etcd: 8e9e05c52164694d became follower at term 12
May 24 13:39:29 k8s-master etcd: newRaft 8e9e05c52164694d [peers: [8e9e05c52164694d], term: 12, commit: 148905, applied: 140014, lastindex: 148905, lastterm:
May 24 13:39:29 k8s-master etcd: enabled for version 3.1
May 24 13:39:29 k8s-master etcd: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store
May 24 13:39:29 k8s-master etcd: set the cluster version to 3.1 from store
May 24 13:39:29 k8s-master etcd: starting server... [version: 3.1.3, cluster version: 3.1]
May 24 13:39:29 k8s-master etcd: raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory
May 24 13:39:29 k8s-master systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
May 24 13:39:29 k8s-master systemd: Failed to start Etcd Server.
May 24 13:39:29 k8s-master systemd: Unit etcd.service entered failed state.
May 24 13:39:29 k8s-master systemd: etcd.service failed.
May 24 13:39:29 k8s-master systemd: etcd.service holdoff time over, scheduling restart.

Core statement:

raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory

Go to that directory, delete 0.tmp, and then start etcd again.
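
Concretely, the recovery amounts to something like the following; the paths are taken from the log line above:

cd /var/lib/etcd/default.etcd/member/wal/
ls -ld 0.tmp      # confirm it is indeed a directory
rm -rf 0.tmp
systemctl start etcd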

etcd does not start – timeout problem <2>

Background: three etcd nodes were deployed, and one day all three went down at the same time. After a restart, the k8s cluster appeared to work normally, but a check of the components showed that etcd on one node could not be started.

Investigation showed that the system time on that node was wrong. After correcting the time and restarting etcd, it still would not come up; the error was:

Mar 05 14:27:15 k8s-node2 etcd[3248]: etcd Version: 3.3.13
Mar 05 14:27:15 k8s-node2 etcd[3248]: Git SHA: 98d3084
Mar 05 14:27:15 k8s-node2 etcd[3248]: Go Version: go1.10.8
Mar 05 14:27:15 k8s-node2 etcd[3248]: Go OS/Arch: linux/amd64
Mar 05 14:27:15 k8s-node2 etcd[3248]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Mar 05 14:27:15 k8s-node2 etcd[3248]: the server is already initialized as member before, starting as etcd member ...
Mar 05 14:27:15 k8s-node2 etcd[3248]: peerTLS: cert = /opt/etcd/ssl/server.pem, key = /opt/etcd/ssl/server-key.pem, ca = , trusted-ca = /opt/etcd/ssl/ca.pem, client-cert-auth = false, crl-file =
Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for peers on https://192.168.25.226:2380
Mar 05 14:27:15 k8s-node2 etcd[3248]: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for client requests on 127.0.0.1:2379
Mar 05 14:27:15 k8s-node2 etcd[3248]: listening for client requests on 192.168.25.218:2379
Mar 05 14:27:15 k8s-node2 etcd[3248]: member 9c166b8b7cb6ecb8 has already been bootstrapped
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Mar 05 14:27:15 k8s-node2 systemd[1]: Failed to start Etcd Server.
Mar 05 14:27:15 k8s-node2 systemd[1]: Unit etcd.service entered failed state.
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service failed.
Mar 05 14:27:15 k8s-node2 systemd[1]: etcd.service holdoff time over, scheduling restart.
Mar 05 14:27:15 k8s-node2 systemd[1]: Starting Etcd Server...
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_LISTEN_PEER_URLS, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_ADVERTISE_CLIENT_URLS, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER_TOKEN, but unused: shadowed by corresponding flag
Mar 05 14:27:15 k8s-node2 etcd[3258]: recognized environment variable ETCD_INITIAL_CLUSTER_STATE, but unused: shadowed by corresponding flag

Solutions:

In my experience, the failure of a single etcd node does not have a big impact on the cluster: the cluster can still be used normally, but the failed etcd node should be brought back. The solution is as follows:

  1. Go to the etcd data directory and back up the existing data

    Back up original data:

    cd /var/lib/etcd/default.etcd/member/

    cp * /data/bak/

  2. Delete all data files in this directory

    rm -rf /var/lib/etcd/default.etcd/member/*

  3. Stop etcd on the other two nodes as well, because all etcd members need to be started together; the cluster is usable once they all come up (a health-check sketch follows this list).

    # master node
    systemctl stop etcd
    systemctl restart etcd
    # node1
    systemctl stop etcd
    systemctl restart etcd
    # node2
    systemctl stop etcd
    systemctl restart etcd
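
Once all three members are back up, a quick health check is useful. A sketch using etcdctl's v3 API; the endpoint and certificate paths below are taken from the log output earlier and may differ in your environment:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.25.218:2379 \
  --cacert=/opt/etcd/ssl/ca.pem \
  --cert=/opt/etcd/ssl/server.pem \
  --key=/opt/etcd/ssl/server-key.pem \
  endpoint health
# repeat for each member, or pass all three endpoints comma-separated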

Configure host trust in CentOS

On each server, run the following command under the user that will establish host trust, to generate a public/private key pair:

ssh-keygen -t rsa

You can then see the generated public key file under ~/.ssh/.

Copy the public key to the target host. The first time you will be asked for the password; after that it is no longer needed:

ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected] (-p 2222)

Omit -p to use the default port 22; add -p if the SSH port has been changed. The ~/.ssh/authorized_keys file on the target records the public keys of the servers that are allowed to access it.
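
If ssh-copy-id is not available, the same thing can be done manually (a sketch; adjust user, host, and port):

cat ~/.ssh/id_rsa.pub | ssh -p 2222 [email protected] \
  "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"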

Test whether you can now log in without a password:

ssh 192.168.199.132 (-p 2222)

Modify the CentOS host name

hostnamectl set-hostname k8s-master1

Enabling copy and paste for CentOS in VirtualBox

If a package is not installed, or a command produces no output, change update to install and run it again:

yum install update
yum update kernel
yum update kernel-devel
yum install kernel-headers
yum install gcc
yum install gcc make

Then run sh VBoxLinuxAdditions.run.
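
A sketch of the full sequence, assuming the Guest Additions CD image has already been inserted from the VirtualBox Devices menu (the CD device name may differ on your system):

mount /dev/cdrom /mnt
cd /mnt
sh ./VBoxLinuxAdditions.run
reboot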

A deleted Pod stays in the Terminating state

You can run the following command to force the deletion

kubectl delete pod NAME --grace-period=0 --force
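
If the pod is stuck because of finalizers, clearing them is an alternative (a sketch; replace NAME and the namespace with your own):

kubectl patch pod NAME -n default -p '{"metadata":{"finalizers":null}}'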

A deleted namespace stays in the Terminating state

You can force the deletion by using the following script

[root@k8s-master1 k8s]# cat delete-ns.sh
#!/bin/bash
set -e

useage(){
    echo "useage:"
    echo " delns.sh NAMESPACE"
}

if [ $# -lt 1 ]; then
    useage
    exit
fi

NAMESPACE=$1
JSONFILE=${NAMESPACE}.json
# dump the namespace, edit it (remove the entries under spec.finalizers), then PUT it back
kubectl get ns "${NAMESPACE}" -o json > "${JSONFILE}"
vi "${JSONFILE}"
curl -k -H "Content-Type: application/json" -X PUT --data-binary @"${JSONFILE}" http://127.0.0.1:8001/api/v1/namespaces/"${NAMESPACE}"/finalize
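
Note that the script PUTs to 127.0.0.1:8001, so it assumes a local kubectl proxy is already running, and in the vi step you remove the entries under spec.finalizers before saving. A sketch of the surrounding usage:

kubectl proxy --port=8001 &      # expose the API on 127.0.0.1:8001
./delete-ns.sh my-stuck-namespace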

What might happen if a container has valid CPU/memory requests but does not specify limits?

Let’s create a corresponding container that only has requests but no limits.

- name: busybox-cnt02
  image: busybox
  command: ["/bin/sh"]
  args: ["-c", "while true; do echo hello from cnt02; sleep 10;done"]
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"

What’s the problem with creating this container?

In a normal environment this causes no problems, but in a resource-constrained cluster, containers without limits can have their resources seized by other pods, which may cause the container's application to fail. You can have limits applied to pods automatically by using a LimitRange policy, provided the LimitRange rules are configured in the namespace in advance.
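
As a sketch of that, a LimitRange in the namespace makes Kubernetes fill in default limits and requests for containers that omit them; the name and values below are only examples:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits          # example name
spec:
  limits:
  - type: Container
    default:                    # applied as limits when a container sets none
      cpu: "500m"
      memory: "200Mi"
    defaultRequest:             # applied as requests when a container sets none
      cpu: "100m"
      memory: "100Mi"

Apply it with kubectl apply -f limit-range.yaml -n <namespace>; containers created afterwards in that namespace get these defaults automatically.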