You can reduce the size of an Elasticsearch index by using the Shrink API to give it fewer primary shards. In Elasticsearch, each index is made up of multiple shards, and every shard in the cluster consumes resources such as CPU, memory, and file descriptors; at the same time, shards are what enable parallel processing and therefore good performance. With time series data, most of the reading and writing happens against the index for the current date.

Once an index stops receiving write requests and is only read from occasionally, it no longer needs that many shards, and if many such indexes accumulate, together they can consume a great deal of computing resources.

In cases where you want to reduce the index size, you can use the Shrink API to reduce the number of primary shards.

This process is the opposite of the one in my previous article, “Split Index API – Splitting a large index into more shards,” where we split one index into a new index with more primary shards.

Introducing the Shrink API

The Shrink Index API allows you to shrink an existing index into a new index with fewer primary shards. The number of primary shards requested for the target index must be a factor of the number of primary shards in the source index. For example, an index with 8 primary shards can be shrunk to 4, 2, or 1 primary shard, and an index with 15 primary shards can be shrunk to 5, 3, or 1. If the number of primary shards in the index is a prime number, it can only be shrunk to a single primary shard. Before we can shrink, the following conditions must be met:

  • A copy of every shard in the index (either the primary or a replica) must reside on the same node.
  • The index must be read-only.
  • The health status of the index must be green (see Health Status; a quick check is shown below).
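To verify the last condition, one simple option is to query the cluster health API for the index; the index name here is the same hypothetical my_source_index used in the command that follows:

GET _cluster/health/my_source_index

The response should report "status" : "green" before you proceed.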

In a nutshell, we can satisfy the first two conditions with the following command:

PUT /my_source_index/_settings
{
  "settings": {
    "index.number_of_replicas": 0,                                 (1)
    "index.routing.allocation.require._name": "shrink_node_name",  (2)
    "index.blocks.write": true                                     (3)
  }
}

In the above:

  1. Remove all replicas of the index.
  2. Relocate all of the index’s shards to a single node named shrink_node_name. For details on how to place primaries and replicas on the same node, see “Using Shard Filtering to Control Index Allocation to which node,” or refer to index-level shard allocation filtering.
  3. Block writes to this index. Metadata changes, such as deleting the index, are still allowed.

The shrink steps

The shrink operation first creates the target index with the same definition as the source index, but with a smaller number of primary shards. It then hard-links the segments from the source index into the target index. Finally, it recovers the target index as if it were a closed index that had just been reopened.
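For orientation, the request itself is a single POST to the _shrink endpoint. A minimal sketch with placeholder index names looks like this; the full request used in the hands-on section below also passes extra settings and an alias:

POST /my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_shards": 1
  }
}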


Hands-on practice

Following the previous article, “Split Index API – Splitting a large index into more shards,” we first split an index into a new index with two primary shards.

GET _cat/shards/kibana_sample_data_logs_split?v

The command above shows:

index                         shard prirep state   docs store ip        node
kibana_sample_data_logs_split 1     p      STARTED 7076 4.8mb 127.0.0.1 node1
kibana_sample_data_logs_split 0     p      STARTED 6998 4.7mb 127.0.0.1 node1

The kibana_sample_data_logs_split index has two primary shards. How do we shrink this split index back down to a single primary shard? Of course, the exact number can be chosen to match your own requirements, as long as it satisfies the factor rule described above.

According to the conditions above, we need to gather the index onto a single node. To do this, we first find out which node the kibana_sample_data_logs_split index currently lives on:

GET _cat/shards/kibana_sample_data_logs_split?v

The command above shows:

index                         shard prirep state   docs store ip        node
kibana_sample_data_logs_split 1     p      STARTED 7076 4.8mb 127.0.0.1 node1
kibana_sample_data_logs_split 0     p      STARTED 6998 4.7mb 127.0.0.1 node1

The output shows that kibana_sample_data_logs_split is on node1. In real-world use the node name will of course differ, and primaries and replicas will be spread across different nodes, giving output more like the following:

index               shard prirep state   docs store ip      node
my-index-2019.01.10 2     p      STARTED 193  101mb x.x.x.x xlq9p7ep
my-index-2019.01.10 2     r      STARTED 193  101mb x.x.x.x xf5edowk
my-index-2019.01.10 4     p      STARTED 197  101mb x.x.x.x xlq9p7ep
my-index-2019.01.10 4     r      STARTED 197  101mb x.x.x.x xf5edowk
my-index-2019.01.10 3     r      STARTED 184  101mb x.x.x.x xlq9p7ep
my-index-2019.01.10 3     p      STARTED 184  101mb x.x.x.x xf5edowk
my-index-2019.01.10 1     r      STARTED 180  101mb x.x.x.x xlq9p7ep
my-index-2019.01.10 1     p      STARTED 180  101mb x.x.x.x xf5edowk
my-index-2019.01.10 0     p      STARTED 187  101mb x.x.x.x xlq9p7ep
my-index-2019.01.10 0     r      STARTED 187  101mb x.x.x.x xf5edowk

As per the above requirements, we execute the following command:

PUT kibana_sample_data_logs_split/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.require._name": "node1",
    "index.blocks.write": true
  }
}

The command above removes the index’s replicas, allocates all primary shards to node1, and blocks writes. We can then check the status of all shards with the following query:

GET _cat/shards/kibana_sample_data_logs_split?v

The results shown above are:

index                         shard prirep state   docs store ip        node
kibana_sample_data_logs_split 1     p      STARTED 7076 4.8mb 127.0.0.1 node1
kibana_sample_data_logs_split 0     p      STARTED 6998 4.7mb 127.0.0.1 node1

Clearly, all shards are now on node1.

Next, we use the _shrink Index API to shrink the index:

POST kibana_sample_data_logs_split/_shrink/kibana_sample_data_logs_shrink
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "index.codec": "best_compression",
    "index.routing.allocation.require._name": null, 
    "index.blocks.write": null 
  },
  "aliases": {
    "my_search_indices": {}
  }
}

In the request above we also cleared the allocation requirement and re-enabled writes on the target index. After running the command, we can monitor the recovery as follows:

GET _cat/recovery/kibana_sample_data_logs_shrink?human&detailed=true

We can view the following information:

kibana_sample_data_logs_shrink 0 345ms local_shards done n/a n/a 127.0.0.1 node1 n/a n/a 0 0 100.0% 30 0 0 100.0% 10093464 0 0 100.0%

We can see that the recovery is 100% complete.
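If you would rather block in a single call until the new index is fully allocated, one option (the timeout value here is only an example) is to wait on its health status; since the target was created with zero replicas, it turns green as soon as the primary shard is active:

GET _cluster/health/kibana_sample_data_logs_shrink?wait_for_status=green&timeout=30s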

To query the number of kibana_sample_data_logs_shrink primary shards, run the following command:

GET _cat/shards/kibana_sample_data_logs_shrink?v

The command above returns:

index                          shard prirep state   docs  store ip        node
kibana_sample_data_logs_shrink 0     p      STARTED 14074 9.6mb 127.0.0.1 node1

Let’s take a look at the number of documents:

GET kibana_sample_data_logs_shrink/_count

The results above show:

{
  "count" : 14074,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

This is exactly the document count of the original index (7076 + 6998 = 14074).

In many cases, we can even reduce the number of segments in this index to 1. We can do this as follows:

POST kibana_sample_data_logs_shrink/_forcemerge?max_num_segments=1

The command above will return:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

This helps speed up searches against the index.
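To confirm that the force merge really did leave a single segment per shard, one optional check is to list the segments of the index:

GET _cat/segments/kibana_sample_data_logs_shrink?v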