Moment For Technology

August k8s series 15: cluster backup | more challenges

Posted on Dec. 3, 2022, 9:54 a.m. by 熊雅涵
Category: Back-end Tags: back-end, kubernetes

Data disaster recovery (DR) is one of the most important tasks in cluster maintenance. It protects against catastrophic problems such as accidental deletion, machine failure, virus intrusion, and service outages or data loss caused by force majeure.

For services to run reliably and stably, backups should be designed when the cluster is first set up and put into practice as soon as the cluster is running.


Data to be backed up

  • All K8s cluster state is stored in etcd, so backing up the cluster mainly means backing up the etcd data
  • Some configuration lives on the nodes themselves, such as the kubelet configuration file and the cluster's CA certificates
  • In addition, stateful services such as MySQL and MongoDB hold their own data
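
The node-level files in the second bullet are not part of an etcd snapshot, so they need their own archive. Below is a minimal sketch, assuming a kubeadm-style layout: /etc/kubernetes/pki is the certificate directory used later in this article, while /var/lib/kubelet/config.yaml and the /backup target are assumptions, and archive_node_config is a hypothetical helper name.

```shell
#!/bin/sh
# archive_node_config OUT PATH...
# Packs every PATH that exists into the gzip-compressed tar archive OUT
# and skips the ones that are missing. Hypothetical helper, not part of any tool.
archive_node_config() {
  out="$1"
  shift
  existing=""
  for p in "$@"; do
    if [ -e "$p" ]; then
      existing="$existing $p"
    else
      echo "skipping missing path: $p" >&2
    fi
  done
  if [ -z "$existing" ]; then
    echo "nothing to archive" >&2
    return 1
  fi
  # $existing is left unquoted on purpose so it splits into separate arguments;
  # this sketch therefore assumes the paths contain no whitespace.
  tar -czf "$out" $existing
}

# Example call (paths are assumptions, adjust them to your nodes):
# archive_node_config "/backup/node-config-$(date +%Y%m%d).tar.gz" \
#   /etc/kubernetes/pki /var/lib/kubelet/config.yaml
```

Skipping missing paths instead of failing keeps the same script usable on control-plane and worker nodes, which do not hold the same files.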


  • Keep more than two backup copies
  • Store the copies on different machines, across regions or provinces
  • Use scripts plus open-source tools
  • Run the backups as scheduled tasks that follow defined rules
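
Keeping several copies also means deciding when old ones are removed. The sketch below keeps the newest few snapshots in a directory and deletes the rest. It assumes files named etcd-YYYYmmdd-HHMMSS.db (a naming convention chosen for this example, not something the tools enforce), and prune_backups is a hypothetical helper.

```shell
#!/bin/sh
# prune_backups DIR KEEP
# Keeps the KEEP newest snapshots in DIR and deletes the older ones.
# Assumes names like etcd-YYYYmmdd-HHMMSS.db, so that sorting by name
# is the same as sorting by time, and that names contain no newlines.
prune_backups() {
  dir="$1"
  keep="$2"
  ls -1 "$dir"/etcd-*.db 2>/dev/null | sort -r | tail -n +"$(($keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}

# Example: keep the three newest snapshots.
# prune_backups /backup 3
```

A scheduled task can call this right after each new snapshot is written. Copying the snapshot to a second machine or region is a separate step that this sketch does not cover.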


  • Recover a faulty cluster
  • Repair incomplete data
  • Ensure reliable and stable operation of services

Disaster events

  • Perform regular recovery drills based on data importance or data volume

  • Periodically check whether backup data is deleted or locked

  • Periodically check for data size changes
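
The last two checks are easy to automate. The sketch below compares the newest backup with the previous one and fails when a file is missing, empty, or has changed in size by more than a given percentage. The helper name check_backup_size and the threshold are assumptions for illustration.

```shell
#!/bin/sh
# check_backup_size PREV CUR MAX_PCT
# Succeeds when both files exist and are non-empty, and the size of CUR
# differs from PREV by at most MAX_PCT percent. Hypothetical helper.
check_backup_size() {
  prev="$1"
  cur="$2"
  max_pct="$3"
  for f in "$prev" "$cur"; do
    if [ ! -s "$f" ]; then
      echo "backup missing or empty: $f" >&2
      return 1
    fi
  done
  prev_size=$(wc -c < "$prev")
  cur_size=$(wc -c < "$cur")
  diff=$(($cur_size - $prev_size))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - $diff))
  fi
  # Integer percentage change relative to the previous backup.
  pct=$(($diff * 100 / $prev_size))
  if [ "$pct" -gt "$max_pct" ]; then
    echo "size changed by ${pct}% (limit ${max_pct}%)" >&2
    return 1
  fi
  echo "ok: size changed by ${pct}%"
}

# Example: alert when the snapshot grows or shrinks by more than 20%.
# check_backup_size /backup/etcd-old.db /backup/etcd-new.db 20
```

A sudden drop in size often points to a truncated or partial backup, which is why the check looks at both directions of change.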


etcd tools

  • Back up data with etcd's built-in snapshot feature
  • Back up data with the etcd-backup tool
  • Back up etcd running in a K8s cluster with the etcd-backup-restore tool


K8s cluster tools

  • Back up, migrate, or replicate a K8s cluster with Velero
  • Back up resource manifests with kube-dump
  • Back up stateful applications with Stash (untested)



Built-in etcd commands

Download an etcd release package, place it on a K8s node, and pass the private key and certificate files when running etcdctl

Backup

etcdctl --endpoints=https://ip:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  snapshot save /backup/etcd-snapshot.db


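Restore

etcdctl snapshot restore /backup/etcd-snapshot.db && mv /default.etcd/member/ /var/lib/etcd/

By default the restore writes its output to default.etcd/member/ under the current directory (/ in this example). In this case, run the mv command to move it to the mounted data directory /var/lib/etcd/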
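
The restore can also write straight to a chosen directory with the --data-dir flag of etcdctl snapshot restore, which avoids depending on the current directory. The wrapper below is a sketch under assumptions: restore_snapshot and the ETCDCTL variable are hypothetical names, /var/lib/etcd-restored is an example target, and stopping etcd before swapping the directory into place is left out.

```shell
#!/bin/sh
# ETCDCTL can be overridden, for example to point at a specific binary.
ETCDCTL="${ETCDCTL:-etcdctl}"

# restore_snapshot SNAPSHOT DATA_DIR
# Restores SNAPSHOT into DATA_DIR and refuses to touch a directory that
# already exists, so a live data directory is never overwritten by mistake.
restore_snapshot() {
  snapshot="$1"
  data_dir="$2"
  if [ ! -f "$snapshot" ]; then
    echo "snapshot not found: $snapshot" >&2
    return 1
  fi
  if [ -e "$data_dir" ]; then
    echo "refusing to overwrite existing data dir: $data_dir" >&2
    return 1
  fi
  "$ETCDCTL" snapshot restore "$snapshot" --data-dir "$data_dir"
}

# Example: restore next to the live directory, then swap it in by hand.
# restore_snapshot /backup/etcd-snapshot.db /var/lib/etcd-restored
```

Restoring into a fresh directory first makes it possible to inspect the result before it replaces /var/lib/etcd.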

etcd-backup tool

The tool is written in Go, and its code is clear and concise. Because the author has been writing tools in Go, the source can be modified when needed to add more features, such as uploading the backup to a designated server or to OSS

The tool is distributed as a source package, so we need to compile it ourselves. There are no packaging or build instructions, so here we use Go modules

go mod init etcd-backup
go mod tidy   # resolve the dependencies before building
go build

Backup

etcd-backup backup --etcd-cafile /etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --etcd-keyfile /etc/kubernetes/pki/etcd/healthcheck-client.key \
  --etcd-servers https://ip:2379

Restore

etcd-backup restore --etcd-servers https://ip:2379

Stash tool

This tool comes in a community edition and an enterprise edition, and the community edition is more limited in functionality. For various reasons it has not been tested here; interested readers can try it

kube-dump tool

This tool backs up the cluster's resources as YAML manifests


Install dependencies and tools

# The download URLs are not given here; set KUBECTL_URL, JQ_URL, YQ_URL and
# KUBE_DUMP_URL to the release URLs for your platform first.

# kubectl
curl -sLo ~/.local/bin/kubectl "$KUBECTL_URL"
chmod +x ~/.local/bin/kubectl

# jq
curl -sLo ~/.local/bin/jq "$JQ_URL"
chmod +x ~/.local/bin/jq

# yq
curl -sLo ~/.local/bin/yq "$YQ_URL"
chmod +x ~/.local/bin/yq

# kube-dump
curl -Lso ~/.local/bin/kube-dump "$KUBE_DUMP_URL"
chmod +x ~/.local/bin/kube-dump

Backup

./kube-dump all -d /backupdir/

Velero tool

A fully open-source tool for backup, migration, and replication. The rest of this article is a hands-on walkthrough of it

We follow the documentation for the latest version at the time of writing, step by step: velero.io/docs/v1.6/

Overview

Velero (formerly known as Heptio Ark) gives you tools for backing up and restoring Kubernetes cluster resources and persistent volumes.

You can run Velero with a cloud provider or on-premises.

Velero lets you:

  • Back up your cluster and restore it in case of loss.

  • Migrate cluster resources to other clusters.

  • Replicate your production cluster to development and test clusters.

Velero consists of:

  • A server that runs on your cluster

  • A command-line client that runs locally

How it works

Each Velero operation (on-demand backup, scheduled backup, restore) is a custom resource, defined with a Kubernetes Custom Resource Definition (CRD) and stored in etcd.

Velero also includes controllers that handle custom resources to perform backups, restores, and all related operations.

You can back up or restore all objects in the cluster, or you can filter objects by type, namespace, and/or label.

Velero is ideal for disaster recovery use cases and for taking snapshots of application state before performing system operations, such as upgrades, on the cluster.

On-demand backup:

  • Uploads a tarball of the copied Kubernetes objects to cloud object storage and, if requested, calls the cloud provider API to create snapshots of persistent volumes

Scheduled backup:

  • Backs up data at recurring intervals, specified by a cron expression

Restore:

  • Restores all objects and persistent volumes from a previously created backup, or only a filtered subset

Backup expiration:

  • A TTL (time to live) set on a backup determines when the expired backup is deleted

Backup workflow

When you run velero backup create test-backup:

  1. The Velero client calls the Kubernetes API server to create a Backup object.

  2. The BackupController notices the new Backup object and validates it.

  3. The BackupController begins the backup process. It collects the data to back up by querying the API server for resources.

  4. The BackupController makes a call to the object storage service (for example, AWS S3) to upload the backup file.

By default, velero backup create makes disk snapshots of any persistent volumes. You can adjust the snapshots by specifying additional flags. Run velero backup create --help to see the available flags. Snapshots can be disabled with the option --snapshot-volumes=false.
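
Because a backup is an ordinary custom resource, its progress through the workflow above can be read back from the API server. The sketch below polls the status.phase field of the Backup object until it reaches a final phase. It assumes Velero runs in its default velero namespace; wait_for_backup, backup_phase, and POLL_SECONDS are hypothetical names for this example.

```shell
#!/bin/sh
# backup_phase NAME: prints the phase of the Backup object NAME.
# Assumes Velero runs in its default namespace, velero.
backup_phase() {
  kubectl -n velero get backups.velero.io "$1" -o jsonpath='{.status.phase}'
}

# wait_for_backup NAME [TRIES]
# Polls until the backup finishes; POLL_SECONDS sets the delay between polls.
wait_for_backup() {
  name="$1"
  tries="${2:-60}"
  while [ "$tries" -gt 0 ]; do
    phase=$(backup_phase "$name" 2>/dev/null) || phase=""
    case "$phase" in
      Completed)
        echo "backup $name completed"
        return 0
        ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2
        return 1
        ;;
    esac
    tries=$(($tries - 1))
    sleep "${POLL_SECONDS:-5}"
  done
  echo "timed out waiting for backup $name" >&2
  return 1
}

# Example:
# velero backup create test-backup && wait_for_backup test-backup
```

For interactive use, velero backup create also accepts --wait, which blocks until the backup finishes; a polling helper like this one is mainly useful inside scripts that need their own timeout.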

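Installation

Download the release archive, extract it, and copy the velero binary to /usr/local/bin

wget https://github.com/vmware-tanzu/velero/releases/download/v1.6.1/velero-v1.6.1-linux-amd64.tar.gz
tar -zxvf velero-v1.6.1-linux-amd64.tar.gz
cp velero-v1.6.1-linux-amd64/velero /usr/local/bin/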

For backup storage, Velero supports a variety of third-party providers, including object storage from Tencent Cloud and Aliyun, because it allows the community to build its own plug-ins

MinIO is used as an example

The Velero package provides a deployment file for starting MinIO. The default MinIO account and password are minio/minio123

kubectl create -f velero-v1.6.1-linux-amd64/examples/minio/00-minio-deployment.yaml

Open ports

sed -i "/type: /s#ClusterIP#NodePort#" velero-v1.6.1-linux-amd64/examples/minio/00-minio-deployment.yaml

The change above is not enough to reach the web console, because the latest MinIO image assigns the console port dynamically. The API port is still 9000

The console port is different on every start, so check the MinIO pod log to find it, and expose that port in order to see the web page. The Velero installation below must still use port 9000, because that is the API port

kubectl expose deployment minio -n velero --type=NodePort --name=minio-nodeport --target-port=40610

Here 40610 is the console port found in the log, and the UI is reached through the NodePort that forwards to it

If the command line is inconvenient, you can make the same change in the Rancher UI: find the MinIO service and edit it

Create the credentials file

vim velero-v1.6.1-linux-amd64/examples/minio/credentials-velero

[default]
aws_access_key_id = minio
aws_secret_access_key = minio123

Deploy Velero

Configure and install

velero install \
  --provider aws \
  --bucket velero \
  --secret-file /root/velero-v1.6.1-linux-amd64/examples/minio/credentials-velero \
  --use-volume-snapshots=false \
  --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000

The installation process will take a while. See figure

View all Velero resources

kubectl get all -n velero

Example of Backup and Restoration

Backup

This example uses the springboot-hive application

The devops namespace contains only one Deployment and one Service

If you find this example troublesome, you can use the official examples directory instead: velero-v1.6.1-linux-amd64/examples/nginx-app/

Create the backup

velero backup create backup-devops-ns --include-namespaces devops

Check the progress

velero backup describe backup-devops-ns

View backup data in minio


Delete the devops namespace

kubectl delete ns/devops
kubectl get all -n devops   # should return no resources

Run the restore command

velero restore create --from-backup backup-devops-ns

Check whether the resources are restored

kubectl get all -n devops

Velero resource viewing commands

velero get backup     # list backups
velero get schedule   # list scheduled backups
velero get restore    # list restores
velero get plugins    # list plugins

Uninstalling Velero also removes the MinIO deployment created earlier

Scheduled backups

# Back up every day at 01:00
velero create schedule SCHEDULE_NAME --schedule="0 1 * * *"

# Back up every day at 01:00 and keep each backup for 72 hours
velero create schedule SCHEDULE_NAME --schedule="0 1 * * *" --ttl 72h

# Back up every 5 hours
velero create schedule SCHEDULE_NAME --schedule="@every 5h"

# Back up one namespace once a day (for example, panshi-qtc-dev)
velero create schedule SCHEDULE_NAME --schedule="@every 24h" --include-namespaces panshi-qtc-dev

Restore to another namespace

velero restore create backup-devops-ns --from-backup backup-devops-ns-20210708174958 --namespace-mappings devops:devops-1

Notes

  • velero restore does not overwrite existing resources; it only restores resources that do not exist in the current cluster. Existing resources therefore cannot be rolled back to a previous version this way. If a rollback is required, delete the existing resources before restoring.
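
The delete-then-restore sequence for a rollback can be sketched as follows. The namespace and backup names are the ones used earlier in this article, and rollback_namespace is a hypothetical helper. The operation is destructive, so treat it as an illustration of the order of steps and not as a ready-made procedure.

```shell
#!/bin/sh
# rollback_namespace NAMESPACE BACKUP
# Deletes NAMESPACE, waits until it is gone, then restores it from BACKUP.
# Destructive: everything created in the namespace after the backup is lost.
rollback_namespace() {
  ns="$1"
  backup="$2"
  kubectl delete namespace "$ns" --wait=true || return 1
  velero restore create --from-backup "$backup" --include-namespaces "$ns" --wait
}

# Example, using the names from this article:
# rollback_namespace devops backup-devops-ns
```

The restore runs only if the delete succeeds, so a failed delete never leads to a partial restore on top of existing resources.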

Backup with PV

PV backup is already supported on Kubernetes 1.7.2 and above. As the two official examples show, it works exactly the same way as the backup above

velero.io/docs/v1.6/e...

However, you need to specify this parameter when installing Velero: --use-restic

For more details, see velero.io/docs/v1.6/c...
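
With the restic integration in Velero v1.6, pod volumes are opted in to backup through the backup.velero.io/backup-volumes annotation on the pod. The sketch below wraps that annotation in a small helper; annotate_for_restic and the pod and volume names in the example are hypothetical.

```shell
#!/bin/sh
# annotate_for_restic NAMESPACE POD VOLUMES
# VOLUMES is a comma-separated list of the pod's volume names (not PVC names).
annotate_for_restic() {
  kubectl -n "$1" annotate "pod/$2" "backup.velero.io/backup-volumes=$3" --overwrite
}

# Example with hypothetical names:
# annotate_for_restic devops mysql-0 data
```

Once the pod is annotated, a regular velero backup create that includes its namespace also backs up the listed volumes.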

Cloud vendor object storage

Velero supports a variety of back-end storage, including Alibaba Cloud and Tencent Cloud. If you want to back up to a public cloud, refer to the third-party plug-in support listed on the official website: velero.io/docs/v1.6/s...

