
This analysis builds on the ES cluster set up in the previous chapter.

1. Shards

What is a shard?

Sharding exists to solve two problems of a single index: storing a very large number of documents, and slow search responses. An index is divided into multiple pieces, each of which is called a shard. Each shard is itself a fully functional “index” that can be placed on any node in the cluster.

Shards exist for two important reasons: 1) they allow capacity to be scaled out horizontally; 2) they allow distributed, parallel operations across shards, which improves throughput.

Each shard is a Lucene index, and an Elasticsearch index is a collection of Lucene indexes. When a query is performed, the request is sent to every shard belonging to the Elasticsearch index, and the results from each shard are merged and returned.
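This scatter-gather flow can be sketched in a few lines of Python. The shard data and the `search` helper below are hypothetical illustrations of the idea, not the ES API:

```python
from heapq import nlargest

# Each "shard" holds its own local hits as (doc_id, score) pairs.
shard_0 = [("a", 0.9), ("b", 0.4)]
shard_1 = [("c", 0.7), ("d", 0.2)]
shard_2 = [("e", 0.8)]

def search(shards, top_n):
    # The query is "sent" to every shard; each shard returns its local hits.
    all_hits = [hit for shard in shards for hit in shard]
    # The coordinating node then merges the per-shard results into one ranking.
    return nlargest(top_n, all_hits, key=lambda hit: hit[1])

print(search([shard_0, shard_1, shard_2], 3))
```

Running this returns the top hits across all three shards merged into a single score-ordered list, which is exactly the merge step described above.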

Testing:

First, let's see what shards look like and how they behave by creating an index. Create an index named test with a single shard by executing the following command in Kibana:

PUT /test/
{
  "settings":{
    "index":{
      "number_of_shards" : "1",
      "number_of_replicas" : "0"
    }
  }
}

Check the result in elasticsearch-head: test has a single shard, located on node-2:

Next, let's create test1 with two shards:

PUT /test1/
{
  "settings":{
    "index":{
      "number_of_shards" : "2",
      "number_of_replicas" : "0"
    }
  }
}

As you can see in the figure above, the two shards are distributed across two nodes. What if there were four shards? Then one of the nodes would be allocated two shards, as shown below:
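The even split observed above can be sanity-checked with a small sketch. ES's real allocator weighs many more factors (disk usage, awareness attributes, and so on); `shards_per_node` is a hypothetical helper showing only the even-spread intuition:

```python
def shards_per_node(shards: int, nodes: int):
    # Spread `shards` over `nodes` as evenly as possible:
    # every node gets the base amount, and the remainder goes one-per-node.
    base, extra = divmod(shards, nodes)
    return sorted((base + (1 if i < extra else 0) for i in range(nodes)),
                  reverse=True)

print(shards_per_node(2, 3))  # two shards on a 3-node cluster
print(shards_per_node(4, 3))  # four shards: one node carries two
```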

2. Replicas

What is a replica?

In a network/cloud environment, failure can happen at any time: a shard or node may go offline or disappear for any reason, so a failover mechanism is very useful and highly recommended. For this purpose, Elasticsearch allows you to create one or more copies of a shard; these copies are called replicas.

Replicas exist for two important reasons: 1) improved availability — note that a replica is never allocated on the same node as its primary shard; 2) improved throughput — search operations can run in parallel across all replicas.

Testing:

Create test3 with one shard and two replicas:

PUT /test3/
{
  "settings":{
    "index":{
        "number_of_shards" : "1",
      "number_of_replicas" : "2"
    }
  }
}

As shown in the figure above, we see three green 0s. The 0 on node-0 has a bold border, which marks the primary shard; the 0s on the other two nodes have thin borders, which marks them as replicas.

Usually, on a three-node cluster we create two replicas, so the three copies of the data are distributed evenly across the three nodes. What happens if we create three replicas instead?

As shown above, an extra replica is left Unassigned. It is actually redundant: every node already holds either the primary shard or one of its replicas, so an additional replica has nowhere to go and serves no purpose.
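The “one copy of each shard per node” constraint reduces to simple arithmetic: each shard has 1 primary plus R replica copies, and at most one copy per node can be allocated. The helper below is a hypothetical sketch of that rule, not anything from the ES API:

```python
def unassigned_copies(primary_shards: int, replicas: int, nodes: int) -> int:
    # Each shard has 1 primary + `replicas` copies; since no two copies of
    # the same shard may share a node, at most `nodes` of them can be placed.
    copies_per_shard = 1 + replicas
    return primary_shards * max(0, copies_per_shard - nodes)

# 1 shard, 2 replicas, 3 nodes: everything fits (the test3 example above).
print(unassigned_copies(1, 2, 3))
# 1 shard, 3 replicas, 3 nodes: one copy has nowhere to go and stays Unassigned.
print(unassigned_copies(1, 3, 3))
```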

A combination of shards and replicas

First, let's look at what an index looks like when neither shards nor replicas are specified.

PUT /test6/
{
  "settings":{
    "index":{
    }
  }
}

As shown above, the defaults apply: one shard and one replica.

Next, let's create an index with two shards and two replicas:

PUT /test7/
{
  "settings":{
    "index":{
        "number_of_shards" : "2",
      "number_of_replicas" : "2"
    }
  }
}

And now three shards and two replicas:

PUT /test8/
{
  "settings":{
    "index":{
        "number_of_shards" : "3",
      "number_of_replicas" : "2"
    }
  }
}

As shown above, the shards and replicas are evenly distributed across the nodes.

3. Shrinking and expanding the cluster

Shrinking

If one of the three nodes goes down, what happens to the shards and replicas of test8? Let's shut down node-2 and check in head.

As shown above, node-2 becomes Unassigned. Notice the three shards in the red box I marked: they went offline together with the node. Even so, the index keeps working, because the data is sharded and replicated on the remaining nodes.

More interestingly, shard 2, which was on node-2, has been reassigned to node-0:

To sum up: when running an ES cluster, we must configure shards and replicas carefully to ensure high availability of the whole cluster.

Let's stop node-1 as well and see what happens. In head we find that the entire cluster is no longer accessible.

The cluster-status query API is no longer accessible either:

Conclusion: when only one node is left in this cluster, the whole cluster becomes unavailable.

Expansion

After the outage test above, let's restart the stopped services, beginning with node-1.

As shown above, the cluster can be accessed again.

At this point, with only two nodes available, we create an index with three shards and two replicas.

PUT /test9/
{
  "settings":{
    "index":{
        "number_of_shards" : "3",
      "number_of_replicas" : "2"
    }
  }
}

As shown above, the shards and replicas are distributed without problems. This is what it looks like after node-2 is restored:

As shown above, node-2 holds only replicas; no primary shards were moved back to node-2.

4. Index routing calculation

The experiments above showed us the allocation rules for shards and replicas. So every time we index or look up a document, what routing method does ES use to find the corresponding shard and reach the data we want?

In fact, ES locates where each document is stored through a hash calculation, using the following formula:

shard = hash(routing) % number_of_primary_shards

routing is a variable value, which defaults to the document's _id and can be set to a custom value. The hash function turns routing into a number, which is then divided by number_of_primary_shards (the number of primary shards) to get the remainder. That remainder, between 0 and number_of_primary_shards - 1, identifies the shard where the document lives.

An important rule follows from this formula: the number of primary shards is fixed when the index is created and never changes. Any change in the number of primary shards would change the result of the formula, and documents indexed earlier could no longer be found.
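As a rough illustration of the formula (Elasticsearch actually hashes the routing value with Murmur3; `zlib.crc32` stands in here only to keep the demo deterministic):

```python
import zlib

def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    # Simplified stand-in for ES routing:
    # shard = hash(routing) % number_of_primary_shards
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# With 3 primary shards, every document id maps to shard 0, 1, or 2.
for doc_id in ["doc-1", "doc-2", "doc-3", "doc-4"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))

# The same id always routes to the same shard, so lookups are deterministic.
assert route_to_shard("doc-1", 3) == route_to_shard("doc-1", 3)
# Changing the primary-shard count changes the mapping, which is exactly why
# number_of_shards is fixed at index-creation time.
```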