Data disaster recovery (DR) is one of the most important tasks in cluster maintenance. Its purpose is to guard against catastrophic problems such as accidental deletion, machine failure, virus intrusion, service unavailability caused by overwhelming load, and data loss.
For services to run reliably and stably, data backup needs to be designed at the very beginning of cluster setup and put into practice once the cluster is running.
Data to be backed up
- All data of a K8s cluster is stored in etcd, so backing up a K8s cluster essentially means backing up the etcd data
- However, some configuration lives on the servers themselves, such as the kubelet configuration file and the cluster's CA component certificates
- In addition, stateful services such as MySQL and MongoDB have their own data to back up
- Keep more than two backup copies
- Store backups in different locations: on different machines, cross-region or cross-province
- Use scripts plus open source tools
- Run backups as scheduled tasks according to defined rules
- Recovering a faulty cluster
- Repairing incomplete data
- Ensuring reliable and stable operation of services
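Putting the points above together (scripted backups, scheduled runs, multiple retained copies), a minimal cron-driven routine might look like the sketch below. The directory layout, retention count, and endpoint are assumptions to adapt; the actual etcdctl call is commented out so the rotation logic can run anywhere:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical snapshot-and-rotate routine. The paths and KEEP count are
# assumptions; on a real control-plane node, uncomment the etcdctl call.
backup_etcd() {
  local backup_dir="$1" keep="$2"
  mkdir -p "$backup_dir"
  local snap="$backup_dir/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"

  # etcdctl --endpoints=https://127.0.0.1:2379 \
  #   --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  #   --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  #   --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  #   snapshot save "$snap"
  : > "$snap"  # stand-in so the rotation below is runnable without etcd

  # Rotation: keep only the newest $keep snapshots, delete the rest.
  ls -1t "$backup_dir"/etcd-snapshot-*.db | tail -n +"$((keep + 1))" | xargs -r rm -f
}
```

Scheduled from cron (e.g. `0 1 * * * /usr/local/bin/etcd-backup.sh`), with the resulting files then copied to a second machine or object storage per the rules above.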
Perform regular recovery drills based on data importance or data volume.
Periodically check whether backup data has been deleted or locked.
Periodically check for changes in backup data size.
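The periodic checks above can be automated; a minimal sketch (the directory layout, age threshold, and minimum-size threshold are assumptions to tune for your data volume):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sanity check for a snapshot directory: fail if the newest
# backup is missing, too old, or suspiciously small.
check_backups() {
  local dir="$1" max_age_seconds="$2" min_bytes="$3"
  local newest
  newest=$(ls -1t "$dir"/etcd-snapshot-*.db 2>/dev/null | head -n 1 || true)
  if [ -z "$newest" ]; then
    echo "no backups found in $dir"; return 1
  fi

  local now age size
  now=$(date +%s)
  age=$(( now - $(stat -c %Y "$newest") ))
  size=$(stat -c %s "$newest")

  if [ "$age" -gt "$max_age_seconds" ]; then
    echo "stale: $newest is ${age}s old"; return 1
  fi
  if [ "$size" -lt "$min_bytes" ]; then
    echo "suspicious: $newest is only ${size} bytes"; return 1
  fi
  echo "ok: $newest (${size} bytes, ${age}s old)"
}
```

Run from cron and wire the non-zero exit status into whatever alerting you already have.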
- Use etcd's built-in snapshot function to back up the data
- Back up the data with the etcd-backup tool
- Back up a K8s cluster's etcd with the etcd-backup-restore tool
K8s cluster tools
- Back up, migrate, or replicate the K8s cluster using the Velero tool
- Back up the resource manifests with kube-dump
- Back up stateful applications using the Stash tool (untested)
etcd's built-in commands
Download an etcd release package onto a K8s node, then point etcdctl at the cluster's certificate and key files:
```shell
etcdctl --endpoints=https://ip:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save /backup/etcd-snapshot.db
```
```shell
etcdctl snapshot restore /backup/etcd-snapshot.db
mv default.etcd/member/ /var/lib/etcd/
```
By default, snapshot restore writes the data into a default.etcd/member/ directory under the current path, so the mv command moves it into etcd's data directory /var/lib/etcd/ (alternatively, pass --data-dir=/var/lib/etcd to restore there directly).
etcd-backup tool
The tool is written in Go, and the code is clear and concise. Since I have been writing tools in Go myself, the source can be modified if necessary to add more features, such as uploading backups to a designated server or OSS.
Source package: github.com/gravitation...
We have to compile it ourselves, and there are no packaging or build instructions, so we use Go modules:
```shell
go mod init etcd-backup
go build
```
```shell
etcd-backup backup --etcd-cafile /etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --etcd-keyfile /etc/kubernetes/pki/etcd/healthcheck-client.key \
  --etcd-servers https://ip:2379
```
```shell
etcd-backup restore --etcd-servers https://ip:2379
```
This tool comes in a community edition and an enterprise edition, so its functionality is somewhat limited. It has not been tested here for various reasons; interested readers can try it.
kube-dump tool
This tool backs up the cluster as YAML manifests.
Install the dependencies and the tool:
```shell
# kubectl
curl -sLo ~/.local/bin/kubectl \
  https://storage.googleapis.com/kubernetes-release/release/v1.20.2/bin/linux/amd64/kubectl
chmod +x ~/.local/bin/kubectl

# jq
curl -sLo ~/.local/bin/jq \
  https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
chmod +x ~/.local/bin/jq

# yq
curl -sLo ~/.local/bin/yq \
  https://github.com/mikefarah/yq/releases/download/v4.5.0/yq_linux_amd64
chmod +x ~/.local/bin/yq

# kube-dump
curl -sLo ~/.local/bin/kube-dump \
  https://raw.githubusercontent.com/WoozyMasta/kube-dump/v1.0.6/kube-dump
chmod +x ~/.local/bin/kube-dump
```
```shell
./kube-dump all -d /backupdir/
```
Velero tool
Velero is a fully open-source backup, migration, and replication tool, and the rest of this article focuses on hands-on practice with it.
We will operate step by step according to the latest version of the documentation: velero.io/docs/v1.6/
Overview
Velero (formerly known as Heptio Ark) gives you tools for backing up and restoring Kubernetes cluster resources and persistent volumes.
You can run Velero through a cloud provider or locally.
- Back up your cluster and restore it in case of loss.
- Migrate cluster resources to other clusters.
- Replicate your production cluster to development and test clusters.
Velero consists of two components:
- A server that runs on your cluster
- A command-line client that runs locally
How it works
Each Velero operation (on-demand backup, scheduled backup, restore) is a custom resource, defined with a Kubernetes Custom Resource Definition (CRD) and stored in etcd.
Velero also includes controllers that handle custom resources to perform backups, restores, and all related operations.
You can back up or restore all objects in the cluster, or you can filter objects by type, namespace, and/or label.
Velero is ideal for disaster recovery use cases and for taking snapshots of application state before performing system operations, such as upgrades, on the cluster.
On-demand backup:
- Uploads a tarball of the copied Kubernetes objects to cloud object storage, or calls the cloud API to create snapshots of persistent volumes
Scheduled backup:
- Backups run at intervals according to a cron expression
Restore:
- Restores all objects and persistent volumes from a previously created backup, or only a subset of them
- A TTL (time to live) can be set so that expired backups are deleted automatically
When you run velero backup create test-backup:
The Velero client calls the Kubernetes API server to create a Backup object.
The BackupController notices the new Backup object and validates it.
The BackupController begins the backup process and collects the data to be backed up by querying the API server for resources.
The BackupController makes a call to the object storage service (for example, AWS S3) to upload the backup file.
By default, velero backup create creates disk snapshots of any persistent volumes. You can adjust the snapshots by specifying additional flags; run velero backup create --help to see the available ones. Snapshots can be disabled with the option --snapshot-volumes=false.
Download the release package, unpack it, and copy the binary to /usr/local/bin:
```shell
wget https://github.com/vmware-tanzu/velero/releases/download/v1.6.1/velero-v1.6.1-linux-amd64.tar.gz
tar -zxvf velero-v1.6.1-linux-amd64.tar.gz
cp velero-v1.6.1-linux-amd64/velero /usr/local/bin/
```
For backup storage, Velero supports a variety of third-party backends, including object storage from Tencent Cloud and Alibaba Cloud, since it allows the community to build its own plug-ins.
Here minio is used as an example.
The Velero package provides a deployment file for starting minio; the default minio account and password are minio/minio123.
```shell
kubectl create -f velero-v1.6.1-linux-amd64/examples/minio/00-minio-deployment.yaml
```
```shell
sed -i "/type: /s#ClusterIP#NodePort#" velero-v1.6.1-linux-amd64/examples/minio/00-minio-deployment.yaml
```
However, this change alone does not expose the web console, because the latest minio images assign the console a dynamic port that is different on every start; you have to check the minio pod log to find out which port serves the UI. The S3 API still listens on port 9000, and the velero install step below must use port 9000, because that is the API port.
```shell
kubectl expose deployment minio -n velero --type=NodePort --name=minio-nodeport --target-port=40610
```
The UI is then reachable through the NodePort that forwards to 40610 (the console port found in the pod log for this run).
If the command line feels inconvenient, you can make the change in the Rancher UI instead: find the minio service and edit it.
Create the credentials file:
```shell
vim velero-v1.6.1-linux-amd64/examples/minio/credentials-velero
```
```ini
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
```
Configuration reference: github.com/vmware-tanz...
```shell
velero install \
  --provider aws \
  --bucket velero \
  --secret-file /root/velero-v1.6.1-linux-amd64/examples/minio/credentials-velero \
  --use-volume-snapshots=false \
  --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000
```
The installation process takes a while (see figure).
View all velero services
```shell
kubectl get all -n velero
```
Example of Backup and Restoration
Using springboot-hive as an example; the portal is juejin.cn/post/697645...
There is only one Deployment and one Service in the devops namespace.
If you find this example troublesome, you can use the official sample in the velero-v1.6.1-linux-amd64/examples/nginx-app/ directory.
```shell
velero backup create backup-devops-ns --include-namespaces devops
```
Check the backup status:
```shell
velero backup describe backup-devops-ns
```
View backup data in minio
Delete the devops namespace:
```shell
kubectl delete ns/devops
kubectl get all -n devops   # verify the namespace's resources are gone
```
Run the restore command:
```shell
velero restore create --from-backup backup-devops-ns
```
Check whether resources are restored
```shell
kubectl get all -n devops
```
Velero Resource viewing command
```shell
velero get backup     # list backups
velero get schedule   # list scheduled backups
velero get restore    # list restores
velero get plugins    # list plugins
velero uninstall      # uninstall Velero (including the minio created earlier)
```
Scheduled backups:
```shell
# Back up daily at 01:00
velero create schedule NAME --schedule="0 1 * * *"
# Back up daily at 01:00, retaining each backup for 72 hours
velero create schedule NAME --schedule="0 1 * * *" --ttl 72h
# Back up every 5 hours
velero create schedule NAME --schedule="@every 5h"
# Back up a single namespace (for example, panshi-qtc-dev) once a day
velero create schedule NAME --schedule="@every 24h" --include-namespaces panshi-qtc-dev
```
Restore to another namespace
```shell
velero restore create backup-devops-ns --from-backup backup-devops-ns-20210708174958 --namespace-mappings devops:devops-1
```
A velero restore does not overwrite existing resources: it only restores resources that do not currently exist in the cluster. Existing resources are therefore not rolled back to an earlier version; if you need a rollback, delete the existing resources before restoring.
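Because restore never overwrites, rolling a namespace back to a backed-up state means deleting first. A sketch using the example names from above (adapt the namespace and backup name to your own):

```shell
# Roll the devops namespace back to the backed-up state:
# restore does not overwrite, so delete the live resources first.
kubectl delete ns devops
velero restore create --from-backup backup-devops-ns
```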
Backup with PV
For Kubernetes 1.7.2 and above, PV backup is already supported; as you can see from the two official examples, the procedure is exactly the same.
However, you need to specify the parameter when installing Velero: --use-restic
For more details, see velero.io/docs/v1.6/c...
Cloud vendor object storage
Velero supports a variety of backend storage, including Alibaba Cloud and Tencent Cloud. If you want to back up to a public cloud, refer to the third-party plug-in support mentioned on the official website: velero.io/docs/v1.6/s...