Background:

For the initial 1.15 cluster built with kubeadm, see: 2020-07-22 – Tencent Cloud – SLB – Kubeadm High Availability Cluster. In 2019-09-23 – k8s – 1.15.3 – update 1.16.0 the cluster moved to 1.16, which was later bumped to 1.16.15 (for minor version upgrades I only wrote up the process). The most recent upgrade took it to 1.17.17: Kubernetes 1.16.15 upgraded to 1.17.17. The plan is still to keep upgrading to the latest 1.21, but a project is currently being tested in the online environment, so upgrades are on hold and the priority is to expand the cluster instead. Conveniently, containerd was already used in the 1.20.5 cluster tests, so the new worker node will run containerd, and the remaining nodes in this environment can be replaced gradually later. The main reason for replacing nodes is that the early worker nodes were all Tencent Cloud CVMs with 8 cores and 16 GB of memory. That was enough at the beginning, but after load testing and various other tests the pods' resource requests and limits have grown, the nodes ended up somewhat oversold during scheduling, and OOM problems appeared. So the plan is to add 16-core, 32 GB CVM nodes. Of course, the master nodes and the other worker nodes still run the Docker runtime and have not been replaced yet!

Worker node basic information:

System       IP          Kernel
CentOS 8.2   10.0.4.48   4.18

1. Worker node initialization:

Complete the system initialization the same way as in the CentOS 8 + kubeadm 1.20.5 + Cilium + Hubble environment write-up.

1. Change hostname:

hostnamectl set-hostname sh02-node-01

A word about my cluster and naming: all of my environments live in Tencent Cloud's Shanghai region. The online Kubernetes environment sits in Shanghai Zone 3 on a private network, and those nodes are named k8s-node-0x. 10.0.4.48 is in Shanghai Zone 2, so the name should reflect the zone; let's call it sh02-node-0x. Going forward, nodes in other zones will simply be prefixed sh0x to tell them apart. Spreading across zones is worth doing (everything used to be concentrated in Zone 3; mixing zones at least adds a little disaster tolerance... although with Tencent Cloud's network that hasn't helped much so far, since past incidents were mostly network problems). If the business volume grows, a multi-region or multi-cloud setup is still the way to go.

2. Close swap

swapoff -a
sed -i 's/.*swap.*/#&/' /etc/fstab

3. Close the selinux

setenforce  0 
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/sysconfig/selinux 
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config 
sed -i "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/sysconfig/selinux 
sed -i "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/selinux/config

4. Turn off the firewall

systemctl disable --now firewalld
chkconfig firewalld off

5. Adjust file open number and other configuration

cat> /etc/security/limits.conf <<EOF
* soft nproc 1000000
* hard nproc 1000000
* soft nofile 1000000
* hard nofile 1000000
* soft  memlock  unlimited
* hard memlock  unlimited
EOF

6. yum update

yum update -y
yum -y install gcc bc gcc-c++ ncurses ncurses-devel cmake elfutils-libelf-devel openssl-devel flex* bison* autoconf automake zlib* libxml* libmcrypt* libtool-ltdl-devel* make pcre pcre-devel openssl jemalloc-devel tcl libtool vim unzip wget lrzsz bash-comp* ipvsadm ipset jq sysstat conntrack conntrack-tools libseccomp socat curl git psmisc nfs-utils tree net-tools crontabs iftop nload strace bind-utils tcpdump htop telnet lsof

7. Add the IPVS modules (CentOS 8 ships kernel 4.18 by default; use this block for kernels below 4.19)

:> /etc/modules-load.d/ipvs.conf
module=(
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
br_netfilter
)
for kernel_module in ${module[@]}; do
    /sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
done

Kernel 4.19 or greater

:> /etc/modules-load.d/ipvs.conf
module=(
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
)
for kernel_module in ${module[@]}; do
    /sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
done

Load the IPVS module

systemctl daemon-reload
systemctl enable --now systemd-modules-load.service

Query if IPVS is loaded

#  lsmod | grep ip_vs
ip_vs_sh               16384  0
ip_vs_wrr              16384  0
ip_vs_rr               16384  0
ip_vs                 172032  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          172032  6 xt_conntrack,nf_nat,xt_state,ipt_MASQUERADE,xt_CT,ip_vs
nf_defrag_ipv6         20480  4 nf_conntrack,xt_socket,xt_TPROXY,ip_vs
libcrc32c              16384  3 nf_conntrack,nf_nat,ip_vs

8. Optimize system parameters (not necessarily optimal, take what you need)

Note: turning off IPv6 here is especially important... I've been bitten by it before.

cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# required so iptables sees the bridged traffic
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches=89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.overcommit_memory=1
vm.panic_on_oom=0
vm.swappiness = 0
EOF
sysctl --system
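Just to be safe, I like to verify that the handful of settings kubeadm's preflight checks complain about actually took effect (this quick check is my own addition, not part of the original steps):

# these should report the values set above: ip_forward = 1, bridge-nf-call-iptables = 1, disable_ipv6 = 1
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.ipv6.conf.all.disable_ipv6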

9. Containerd installation

dnf install dnf-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sudo yum update -y && sudo yum install -y containerd.io
containerd config default > /etc/containerd/config.toml
# Replace containerd's default sandbox image and set SystemdCgroup to true.
# Edit /etc/containerd/config.toml:
#   sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
#     SystemdCgroup = true
systemctl daemon-reload
systemctl restart containerd
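The steps above only restart containerd; I would also enable it so the runtime comes back after a reboot (my own addition here):

# make sure containerd starts on boot, otherwise kubelet has nothing to talk to after a restart
systemctl enable --now containerd
systemctl status containerd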

10. Configure CRI client crictl

cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
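With the crictl config in place, a couple of quick calls confirm that it can actually talk to containerd (a sanity check of my own, not from the original steps):

crictl info     # prints the runtime status as JSON if the socket is reachable
crictl images   # lists images pulled through the CRI (empty on a fresh node)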

11. Install kubeadm (CentOS 8 has no dedicated Kubernetes yum repo, so the CentOS 7 / el7 repo is used)

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Remove any old versions first, then check which versions are available:
# yum list --showduplicates kubeadm --disableexcludes=kubernetes
# Install kubeadm/kubelet/kubectl 1.17.17 to match the current cluster version:
yum install -y kubeadm-1.17.17 kubelet-1.17.17 kubectl-1.17.17 --disableexcludes=kubernetes
systemctl enable kubelet.service

12. Modify the Kubelet configuration

vi /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS= --cgroup-driver=systemd --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock
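Since kubelet is told to use the systemd cgroup driver here, it's worth double-checking that the containerd config from step 9 really has SystemdCgroup enabled (my own check; a simple grep is enough):

grep SystemdCgroup /etc/containerd/config.toml
# expected: SystemdCgroup = true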

13. Journal/rsyslog settings, to avoid collecting logs twice and wasting system resources (adjust to your own needs)

sed -ri 's/^\$ModLoad imjournal/#&/' /etc/rsyslog.conf
sed -ri 's/^\$IMJournalStateFile/#&/' /etc/rsyslog.conf
sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config
journalctl --vacuum-size=200M
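journalctl --vacuum-size only trims what is already on disk; if you also want a persistent cap, a journald drop-in like the one below works (my own addition; the 200M value simply mirrors the vacuum size above):

mkdir -p /etc/systemd/journald.conf.d
cat <<EOF > /etc/systemd/journald.conf.d/size.conf
[Journal]
SystemMaxUse=200M
EOF
systemctl restart systemd-journald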

2. Generate a token and the token-ca-cert-hash on the master node (any control plane node will do)

[root@k8s-master-01 ~]# kubeadm token create
W0629 13:59:57.505803   16857 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0629 13:59:57.505843   16857 validation.go:28] Cannot validate kubelet config - no validator is available
8nyjtd.xeza5fz4yitj62sx
[root@k8s-master-01 ~]# kubeadm token list
TOKEN                     TTL         EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
8nyjtd.xeza5fz4yitj62sx   23h         2021-06-30T13:59:57+08:00   authentication,signing   <none>                                                     system:bootstrappers:kubeadm:default-node-token
[root@k8s-master-01 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
56ccafb865957c0692f5737cd8778553910c1049ef238a7781b7a39f5fd3a99a
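If you'd rather not assemble the join command by hand, kubeadm can also print the complete command in one go (just a convenience note; it produces the same token and hash as the two commands above):

kubeadm token create --print-join-command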

3. Add the worker node to the cluster

kubeadm join 10.0.0.36:6443 --token 8nyjtd.xeza5fz4yitj62sx --discovery-token-ca-cert-hash sha256:56ccafb865957c0692f5737cd8778553910c1049ef238a7781b7a39f5fd3a99a



IPv4 forwarding has to be turned on, and of course IPv6 stays disabled (admittedly I ran the join before changing the hostname and tuning the system parameters). If you follow the steps above first, none of this is a problem!

The finished worker node looks like this:

4. Verify that the sh02-node-01 node has joined the cluster

kubectl get nodes -o wide
kubectl describe nodes sh02-node-01
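The CONTAINER-RUNTIME column of kubectl get nodes -o wide should already show containerd for the new node; if you only want that one field, a jsonpath query pulls it out (a convenience of my own, not from the original write-up):

kubectl get node sh02-node-01 -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
# the new node reports containerd://..., while the old Docker nodes report docker://...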



5. Next:

1. Kick the TM-node-002 node out of the cluster

My tm-node-002 node was a temporary addition with 4 cores and 8 GB of memory. First make it unschedulable, then kick it out of the cluster (a sketch of the remaining drain/delete steps follows the cordon output below).

[root@k8s-master-01 ~]# kubectl cordon tm-node-002 
node/tm-node-002 cordoned
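Cordoning only marks the node unschedulable. To actually remove it later, the remaining steps would roughly look like this (a sketch, using the drain flags as they exist on 1.17):

# evict the remaining pods (on 1.17 the flag is still --delete-local-data)
kubectl drain tm-node-002 --ignore-daemonsets --delete-local-data
# remove the node object from the cluster
kubectl delete node tm-node-002
# then, on tm-node-002 itself, wipe the kubeadm state
kubeadm reset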



The test-ubuntu-01 node is left alone; it exists only so the developers can connect directly to the Kubernetes cluster network.

Then look at the pod distribution on the tm-node-002 node:

kubectl describe node tm-node-002



2. Reschedule a POD

1. Reschedule a pod (the nacos-1 pod)

The nacos-1 pod gets killed and rescheduled (the other nodes already have much more of their resources allocated, so by the scheduling policy shouldn't it land on my newly added sh02-node-01 node?)

[root@k8s-master-01 ~]# kubectl delete pods nacos-1 -n qa
pod "nacos-1" deleted
[root@k8s-master-01 ~]# kubectl get pods -n qa -o wide

2. Forgot to install the NFS client

Sure enough, nacos-1 was scheduled onto sh02-node-01, but at first it wouldn't start. What was going on? My StorageClass is backed by NFS, and sh02-node-01 didn't have the NFS client installed, so the PVC could not be mounted:

[root@sh02-node-01 ~] yum install nfs-*
[root@sh02-node-01 ~] systemctl restart kubelet
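If you want to confirm NFS actually works from the node before letting the pod retry, a quick manual check against the NFS server helps (the server address below is a placeholder for whatever your StorageClass points at):

rpm -q nfs-utils
# <nfs-server-ip> is a placeholder; use the NFS server configured in the StorageClass
showmount -e <nfs-server-ip>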

Note: I restarted kubelet just to be safe. Installing the NFS client alone didn't seem to take effect right away, and after restarting kubelet everything was fine.

3. The iptables problem

kubectl logs -f nacos-01 -n qa

But a second look at the nacos-1 log still showed an error, and a closer look pointed to an iptables problem... it turned out sh02-node-01 had iptables enabled:

systemctl stop iptables
chkconfig iptables off

Postscript:

1. Verify that Docker and containerd runtimes can coexist in the same cluster

2. Get familiar with the ctr command

3. Keep upgrading to 1.21

4. For the StorageClass, find time to integrate Tencent Cloud CBS (already verified in other environments)

5. How can I migrate the Elasticsearch storage quickly? Do I keep using COS backups? Not looking forward to that.

6. Of course, eventually containerd will replace docker