In today’s article, we will talk about data safety and security. Safety and security are two different concepts, even though they are often rendered by the same word in Chinese. The former means that, in case of hardware failure or human error, lost data can be recovered through backups (such as snapshots) and redundant storage (such as multiple replicas). Security focuses more on whether data can be accessed illegally. In this article, we will focus on the snapshot functionality provided by Elasticsearch and on how to manage snapshots automatically with the Snapshot Lifecycle Management feature, which is available starting with version 7.6.

The basic steps are as follows:

  • Register a repository (a plugin is required for cloud storage)
  • Create a snapshot
  • Restore a snapshot (to the same cluster or to a different cluster)
  • Automate the process with Snapshot Lifecycle Management

A snapshot is a backup taken from a running Elasticsearch cluster. You can take a snapshot of the entire cluster, including all of its data streams and indices, or snapshot only specific data streams or indices.

A snapshot repository must be registered before a snapshot can be created.

Snapshots can be stored in local or remote repositories. Remote repositories can reside on Amazon S3, HDFS, Microsoft Azure, Google Cloud Storage, and other platforms supported by repository plugins.
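For reference, a shared file system repository can also be registered directly through the snapshot API instead of the Kibana UI. A minimal sketch, assuming the repository is called my_repo and uses the directory created later in this article:

PUT _snapshot/my_repo
{
  "type": "fs",
  "settings": {
    "location": "/Users/liuxg/my_repo"
  }
}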

Snapshots are incremental: each snapshot stores only the data that is not already contained in earlier snapshots. This allows you to take frequent snapshots with minimal overhead.

You can restore a snapshot into a running cluster. By default this restores all the data streams and indices in the snapshot, but you can also choose to restore only the cluster state, or only specific data streams or indices.
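For example, a restore request that brings back only one index and skips the global cluster state could look like the following sketch (the repository and snapshot names are placeholders):

POST _snapshot/my_repo/my_snapshot/_restore
{
  "indices": "kibana_sample_data_logs",
  "include_global_state": false
}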

You can use Snapshot Lifecycle Management to automatically create and manage snapshots.

For information on how to create a snapshot and how to restore it using the restore API, see my previous article “Elasticsearch: Cluster Backup Snapshot and Restore API”. In that tutorial, we had to create and restore each snapshot manually.
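As a quick reminder, creating a snapshot manually is a single API call. A sketch, with placeholder repository and snapshot names:

PUT _snapshot/my_repo/snapshot_1?wait_for_completion=true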

 

Prepare the data

In today’s tutorial, we will use the sample data that comes with Kibana. Open the Kibana interface:

Click Add data:

Now our sample data is imported into Elasticsearch, where an index called kibana_sample_data_logs is created. In the following sections, we will back up the data in this index.

 

Create a local folder to store the snapshot files

We create the following folder on our computer:

$ mkdir my_repo
$ cd my_repo/
$ pwd
/Users/liuxg/my_repo

As shown above, my snapshots will be saved under the path /Users/liuxg/my_repo. This directory will be different in your environment. Make a note of this path; it will be used in the following sections.

 

Create and restore snapshots using Snapshot Lifecycle Management

First, let’s open the Kibana interface:

To create a snapshot, we must first register a repository. Click the Register a repository button above:

Above, we provide a unique repository name and choose Shared file system as the repository type. We could also enable the Source-only option, which saves 50% or more disk space, but it has a drawback: when restoring, the data must be reindexed to rebuild the inverted index. Click the Next button above:

Above, we fill the Location field with the path /Users/liuxg/my_repo that we created earlier.

As mentioned above, we need to set path.repo in Elasticsearch’s config/elasticsearch.yml:
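In my case the setting looks like this (use the path you noted earlier):

path.repo: ["/Users/liuxg/my_repo"]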

After the above changes, we need to restart Elasticsearch. Click the Register button above:

Click Verify repository above:

It says the connection was successful. Click the Close link.
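The same verification can also be done from the Dev Tools console, assuming the repository is named my_repo:

POST _snapshot/my_repo/_verify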

Next, let’s click Policies and create a policy:

Above, we fill in the index kibana_sample_data_logs that we want to back up. Of course, we could also back up all data streams and indices.

Above, we configure retention: expired snapshots are deleted after 2 hours, while keeping a minimum of 10 and a maximum of 20 snapshots.

Click Create policy above, and we have created a policy for lifecycle management:

An overview of the current policy is displayed on the right above. Let’s click the Close link.
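For reference, an equivalent policy can also be created directly with the SLM API. The following is only a sketch; the policy id, snapshot name pattern, and repository name are assumptions, while the every-minute schedule and the retention settings match what we configured in the UI:

PUT _slm/policy/kibana-logs-policy
{
  "schedule": "0 * * * * ?",
  "name": "<kibana-logs-{now/d}>",
  "repository": "my_repo",
  "config": {
    "indices": ["kibana_sample_data_logs"],
    "include_global_state": true
  },
  "retention": {
    "expire_after": "2h",
    "min_count": 10,
    "max_count": 20
  }
}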

On the top right, we can see a Play button. When we click it, a snapshot is generated immediately. In general, we don’t need to do this, because the policy’s schedule already generates a snapshot every minute. After a minute, click the Snapshots tab:

Above, we can see that it automatically generates a snapshot every minute.
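The Play button has an API equivalent as well; assuming the policy id used in the sketch above, the policy can be triggered immediately with:

POST _slm/policy/kibana-logs-policy/_execute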

Next, let’s delete the index kibana_sample_data_logs:

DELETE kibana_sample_data_logs

Now this index no longer exists in our Elasticsearch. Next, we restore it using the restore function. We select one of the snapshots to work with:

Click Snapshots and press the Restore button:

In this step, we can choose among the options shown above. We are not going to restore the global state, even though we chose to include the global state when we created the snapshot earlier. Click the Next button:

Click the Next button:

Click Restore the snapshot:

It shows that the index has been successfully restored. Let’s check kibana_sample_data_logs again:

GET _cat/indices
green open .apm-custom-link               jh4aRSmHRqGCMbWPy2oOyw 1 0     0  0   208b   208b
green open .kibana-event-log-7.9.1-000001 IqgkQDbRRmm73sGv1fvBbQ 1 0     1  0  5.5kb  5.5kb
green open .kibana_task_manager_1         LiCob6msRuWfHht5cDYpBw 1 0     6 83 72.6kb 72.6kb
green open .apm-agent-configuration       WS2paOjhQgOVr__USM1ChA 1 0     0  0   208b   208b
green open kibana_sample_data_logs        MAsGd4g3SMyqLZ2sFDyhSQ 1 0 14074  0 11.2mb 11.2mb
green open .kibana_1                      nJ3OTQThTo2Jg7LZbWGYjg 1 0   115  2 11.1mb 11.1mb

The index kibana_sample_data_logs exists again.

Let’s look at the Snapshots list again:

We will notice that no new snapshot was created between 4:06 and 4:12, because the index kibana_sample_data_logs had been deleted. Starting at 4:12, the automatic backups resume, one every minute.
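The snapshot list can also be retrieved from the console; a sketch, assuming the repository is named my_repo:

GET _cat/snapshots/my_repo?v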

We can check the files under the path we used to register the repository:

$ pwd
/Users/liuxg/my_repo
$ ls
index-68                        meta-yplAJp48QeyEERYgltmIqA.dat
index.latest                    snap--qXeq_vvQAedC1PHHFgMSw.dat
indices                         snap-13q3h1cxQVCOWJXGIjdavg.dat
meta--qXeq_vvQAedC1PHHFgMSw.dat snap-1So-vOVNQVavUAGddTUxfA.dat
meta-13q3h1cxQVCOWJXGIjdavg.dat snap-3NTTWbxCQYa13XsEQTUkaw.dat
meta-1So-vOVNQVavUAGddTUxfA.dat snap-5Bt4J72mRzC4_eeK0MXM_A.dat
meta-3NTTWbxCQYa13XsEQTUkaw.dat snap-6Q2Su9GXQrSjL-31fZfwKw.dat
meta-5Bt4J72mRzC4_eeK0MXM_A.dat snap-7DE5EMHOSQ-kU1ejXaUu8Q.dat
meta-6Q2Su9GXQrSjL-31fZfwKw.dat snap-8PqwnWtWQumqBNWl8BmefQ.dat
meta-7DE5EMHOSQ-kU1ejXaUu8Q.dat snap-9PNAe_zwSWOd8VtqO_1UHQ.dat
meta-8PqwnWtWQumqBNWl8BmefQ.dat snap-9x6iCjTjSgKQzCu7eVQdzQ.dat
meta-9PNAe_zwSWOd8VtqO_1UHQ.dat snap-AlyhL4pbT_ytL5RkqERvNA.dat
meta-9x6iCjTjSgKQzCu7eVQdzQ.dat snap-D1mlS3cXTEGP4uvCdj40Fg.dat
meta-AlyhL4pbT_ytL5RkqERvNA.dat snap-FknMPZnRQzW7uHqiYkCdpg.dat
meta-D1mlS3cXTEGP4uvCdj40Fg.dat snap-FsVxPFsaRJSIaT-d1_rDNQ.dat
meta-FknMPZnRQzW7uHqiYkCdpg.dat snap-GGCMBl2XQRmyUmY_kWrfDQ.dat
meta-FsVxPFsaRJSIaT-d1_rDNQ.dat snap-GMMiP7WuSYSrB1IqoCNAfA.dat
meta-GGCMBl2XQRmyUmY_kWrfDQ.dat snap-JGdGHsVETtyA5effratsCg.dat
meta-GMMiP7WuSYSrB1IqoCNAfA.dat snap-L3wBfEZGR86qlJ6ACf2SLw.dat
meta-JGdGHsVETtyA5effratsCg.dat snap-McgDErqfR3OQ-ELyX1S-xw.dat
meta-L3wBfEZGR86qlJ6ACf2SLw.dat snap-OE1WcJ-JTW2pQDvFKLCyLg.dat
meta-McgDErqfR3OQ-ELyX1S-xw.dat snap-P2pk-rwERtqg4mItguACDQ.dat
meta-OE1WcJ-JTW2pQDvFKLCyLg.dat snap-P3wMjD3GSeGNnyrL9Ju9GA.dat
meta-P2pk-rwERtqg4mItguACDQ.dat snap-PAaMRknpR9SAEktnOLx4Vg.dat

We can see that there are many snap-*.dat and meta-*.dat files.