When we actually use Elasticsearch, over time we often find it necessary to scale out. Perhaps we underestimated the data volume when we first created the index, and we now need more primary shards to speed up ingest. So how do we turn one large index into an index with more, smaller shards? The answer is the Split Index API. Its basic usage is as follows:

POST /my-index-000001/_split/split-my-index-000001
{
  "settings": {
    "index.number_of_shards": 2
  }
}

The API comes in two forms:

POST /<index>/_split/<target-index>

PUT /<index>/_split/<target-index>

Before we can split an index, the following prerequisites must be met:

  • The index must be read-only
  • The cluster health status must be green

We can make an index read-only by:

PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true 
  }
}

The split index API allows you to split an existing index into a new index, where each original primary shard is split into two or more primary shards in the new index.
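Why must each shard split into a whole-number multiple? Internally, Elasticsearch routes each document with a consistent hash over the routing-shard space. The sketch below is a simplified model (the real implementation hashes the routing value with Murmur3); it shows why every document on an original shard lands on one of the shards that shard was split into, so no documents ever need to move between groups:

```python
def shard_for(routing_hash: int, num_shards: int, num_routing_shards: int) -> int:
    """Simplified model of Elasticsearch's document routing:
    take the hash modulo the routing-shard space, then scale it
    down to the actual number of primary shards."""
    routing_factor = num_routing_shards // num_shards
    return (routing_hash % num_routing_shards) // routing_factor

# With number_of_routing_shards = 30, a document that lived on shard s
# of a 5-shard index can only land on shard 2*s or 2*s + 1 after a
# split to 10 shards.
for h in range(1000):
    before = shard_for(h, 5, 30)
    after = shard_for(h, 10, 30)
    assert after in (2 * before, 2 * before + 1)
```

This is why a split is cheap: each new shard only needs to copy documents from the single source shard it came from.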

The number of times an index can be split (and the number of shards each original shard can be split into) is determined by the index.number_of_routing_shards setting. The number of routing shards specifies the hash space that is used internally to distribute documents across shards with consistent hashing. For example, a five-shard index with number_of_routing_shards set to 30 (5 x 2 x 3) can be split by a factor of 2 or 3. In other words, it can be split as follows:

  • 5→10→30
  • 5→15→30
  • 5→30
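The rule above can be captured in a few lines: a target shard count is valid if it is a whole multiple of the current count and still divides number_of_routing_shards evenly. Below is an illustrative helper (not part of Elasticsearch) that enumerates the valid one-step split targets:

```python
def valid_split_targets(num_shards: int, num_routing_shards: int) -> list:
    """Return every shard count this index could be split to in one step."""
    return [
        target
        for target in range(num_shards + 1, num_routing_shards + 1)
        if target % num_shards == 0           # must be a whole multiple
        and num_routing_shards % target == 0  # must still divide the hash space
    ]

print(valid_split_targets(5, 30))   # [10, 15, 30]
print(valid_split_targets(10, 30))  # [30]
```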

 

Hands-on practice

Preparing the data

In today’s tutorial, we’ll use the sample data that ships with Kibana. Open the Kibana interface and click Add data to import a sample data set.

Once the sample data has been imported into Elasticsearch, an index called kibana_sample_data_logs is created.

 

Use the Split Index API to split the index

As required, we first make the index read only:

PUT kibana_sample_data_logs/_settings
{
  "index.blocks.write": true
}

Run the above command in Kibana. We can then query the settings with:

GET kibana_sample_data_logs/_settings

The command returns:

{
  "kibana_sample_data_logs" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-1",
        "blocks" : {
          "write" : "true"
        },
        "provided_name" : "kibana_sample_data_logs",
        "creation_date" : "1602127165075",
        "number_of_replicas" : "0",
        "uuid" : "oyJwGhQATvCil2rWAC6nqg",
        "version" : {
          "created" : "7080099"
        }
      }
    }
  }
}

The above shows that blocks.write is true, meaning that we can no longer write any data to the index. We can also see that the index currently has only one primary shard, since number_of_shards is 1.
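If you are scripting this check rather than reading the JSON by eye, both facts can be verified programmatically from the _settings response. A minimal sketch, using the response shown above abbreviated to the fields we care about:

```python
import json

# Abbreviated _settings response from above
response = json.loads("""
{
  "kibana_sample_data_logs": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "blocks": { "write": "true" }
      }
    }
  }
}
""")

index_settings = response["kibana_sample_data_logs"]["settings"]["index"]
assert index_settings["blocks"]["write"] == "true"  # index is read-only
assert index_settings["number_of_shards"] == "1"    # a single primary shard
```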

Next, we type the following command in Kibana:

POST kibana_sample_data_logs/_split/kibana_sample_data_logs_split
{
  "settings": {
    "index.number_of_shards": 2
  }
}

Above, we split kibana_sample_data_logs into 2 primary shards, with kibana_sample_data_logs_split as the target index name. The return value of the above command is:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "kibana_sample_data_logs_split"
}

To view the process, run the following command:

GET _cat/recovery/kibana_sample_data_logs_split

The output is:

index                         shard time  type         stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
kibana_sample_data_logs_split 0     484ms local_shards done  n/a         n/a         127.0.0.1   node1       n/a        n/a      0     0               100.0%        15          0     0               100.0%        11573815    0            0                      100.0%
kibana_sample_data_logs_split 1     503ms local_shards done  n/a         n/a         127.0.0.1   node1       n/a        n/a      0     0               100.0%        15          0     0               100.0%        11573815    0            0                      100.0%

It shows 100% complete.

We can compare the document counts of the two indices as follows:

GET kibana_sample_data_logs/_count

It shows:

{
  "count" : 14074,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

Next, we look at the settings of kibana_sample_data_logs_split:

GET kibana_sample_data_logs_split/_settings
{
  "kibana_sample_data_logs_split" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "initial_recovery" : {
              "_id" : null
            }
          }
        },
        "number_of_shards" : "2",
        "routing_partition_size" : "1",
        "auto_expand_replicas" : "0-1",
        "blocks" : {
          "write" : "true"
        },
        "provided_name" : "kibana_sample_data_logs_split",
        "resize" : {
          "source" : {
            "name" : "kibana_sample_data_logs",
            "uuid" : "oyJwGhQATvCil2rWAC6nqg"
          }
        },
        "creation_date" : "1602127847864",
        "number_of_replicas" : "0",
        "uuid" : "rhs6k0P6QNudSVO1MauQZA",
        "version" : {
          "created" : "7080099",
          "upgraded" : "7080099"
        }
      }
    }
  }
}

number_of_shards is 2, indicating that there are two primary shards. We can use the following command to check the number of documents:

GET kibana_sample_data_logs_split/_count

The command above shows:

{
  "count" : 14074,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  }
}
Copy the code
GET _cat/shards/kibana_sample_data_logs_split?v

The command above shows:

index                         shard prirep state   docs store ip        node
kibana_sample_data_logs_split 1     p      STARTED 7076 4.8mb 127.0.0.1 node1
kibana_sample_data_logs_split 0     p      STARTED 6998 4.7mb 127.0.0.1 node1

We can see that there are two primary shards.
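A quick sanity check on the numbers above: since a split only redistributes documents, the per-shard document counts should add up to the count of the source index.

```python
# Per-shard document counts from the _cat/shards output above
shard_docs = {0: 6998, 1: 7076}

total = sum(shard_docs.values())
assert total == 14074  # matches the _count of the original index
print(total)  # 14074
```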

If you want to convert the above split index back into one large index, please read my other article “Shrink Elasticsearch Index by Reducing the number of Shards via the Shrink API.”
