“This is the 22nd day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

First, write process

New, index, and delete requests are write operations that must be completed on the master shard before they can be copied to the relevant replica shard.

The sequence of steps required to create, index, and delete documents:

The end sends a new, index, or delete request to Node 1
The document_idDetermine that the document belongs to Shard 0. The request is forwarded to Node 3 because the primary shard from Shard 0 is currently assigned to Node 3.
Node 3 performs the request on the main shard. If successful, it forwards the request to the Node 1 and Node 2 replica shards in parallel. Once all replica shards report success, Node 3 reports success to the coordinating Node, which reports success to the client.

By the time the client receives a successful response, the document changes have been executed in the master shard and all replica shards, and the changes are safe. There are optional request parameters that allow you to influence this process, possibly improving performance at the expense of data security.

These options are rarely used because Elasticsearch is already fast, as shown below:

parameter	meaning
consistency	Consistency. Under the default Settings, even if it’s just trying to perform a write _ _ before operation, the primary shard will require must have specified number (quorum) (or in other words, also must want to have most) copy of the shard in the active state are available, and to perform write _ _ operations (including copy of fragmentation can be primary shard or copy fragmentation). This is to avoid data inconsistency caused by _ write operations when a Network partition fails. _ Specified quantity _ that is:int( (primary + number_of_replicas) / 2 ) + 1The consistency parameter can be set to one (_ write _ is allowed as long as the master shard state is OK),all (_ write _ is allowed only if the master and all replica shards are in good condition), or quorum. The default value is quorum, which means that most shard copies will be allowed to operate _ write _ if the status is fine. Note that number_of_replicas in the specified number calculation formula refers to the number of replicas specified in the index setting, not the number of replicas currently in the active processing state. If your index setting specifies that the current index has three replica shards, then the specified number of shards is calculated:int( (primary + 3 replicas) / 2 ) + 1 = 3If you start only two nodes at this point, you will not have the required number of active shard copies, and therefore you will not be able to index or delete any documents.
timeout	What happens if there are not enough replica sharding? Elasticsearch will wait, hoping for more shards to appear. By default, it waits up to 1 minute. If you need to, you can use the timeout argument to terminate it earlier: 100 100 milliseconds, and 30s is 30 seconds.

parameter

meaning

consistency

Consistency. Under the default Settings, even if it’s just trying to perform a write _ _ before operation, the primary shard will require must have specified number (quorum) (or in other words, also must want to have most) copy of the shard in the active state are available, and to perform write _ _ operations (including copy of fragmentation can be primary shard or copy fragmentation). This is to avoid data inconsistency caused by _ write operations when a Network partition fails. _ Specified quantity _ that is:int( (primary + number_of_replicas) / 2 ) + 1The consistency parameter can be set to one (_ write _ is allowed as long as the master shard state is OK),all (_ write _ is allowed only if the master and all replica shards are in good condition), or quorum. The default value is quorum, which means that most shard copies will be allowed to operate _ write _ if the status is fine. Note that number_of_replicas in the specified number calculation formula refers to the number of replicas specified in the index setting, not the number of replicas currently in the active processing state. If your index setting specifies that the current index has three replica shards, then the specified number of shards is calculated:int( (primary + 3 replicas) / 2 ) + 1 = 3If you start only two nodes at this point, you will not have the required number of active shard copies, and therefore you will not be able to index or delete any documents.

timeout

What happens if there are not enough replica sharding? Elasticsearch will wait, hoping for more shards to appear. By default, it waits up to 1 minute. If you need to, you can use the timeout argument to terminate it earlier: 100 100 milliseconds, and 30s is 30 seconds.

Tips: The new index has 1 replica shard by default, which means that two active shard copies should be required to meet the specified number. However, these default Settings prevent us from doing anything on a single node. To avoid this problem, require that the specified number be executed only when number_of_replicas is greater than 1.

Second, reading process

We can retrieve documents from the master shard or from any other replica shard

The sequence of steps to retrieve a document from a master or replica shard:

The client sends a fetch request to Node 1
The node uses documentation_idTo determine that the document belongs to Shard 0. A replica shard of shard 0 exists on all three nodes. In this case, it forwards the request to Node 2
Node 2 returns the document to Node 1, which then returns the document to the client

When processing read requests, the coordination node achieves load balancing by polling all replica shards on each request. Documents that have been indexed may already exist on the master shard but have not been copied to the replica shard when the document is retrieved. In this case, the replica shard may report that the document does not exist, but the master shard may successfully return the document. Once the index request is successfully returned to the user, the document is available in both the master and replica shards.

3. Update process

Partially updating a document combines the read and write process described earlier:

The steps to partially update a document are as follows:

The client sends an update request to Node 1.
It forwards the request to Node3 where the main shard is located.
Node 3 retrieves the document from the master shard and modifies it_sourceJSON in the field, and try to re-index the primary shard document. If the document has been modified by another process, it retries Step 3 and aborts after retry_on_conflict.
If Node 3 successfully updates the document, it forwards the new version of the document in parallel to shards on Node 1 and Node 2 for re-indexing. Once all replica shards return success, Node 3 returns success to the coordinator, and the coordinator returns success to the client.

Tips: When the master shard forwards changes to the replica shard, it does not forward update requests. Instead, it forwards a new version of the complete document. Remember that these changes will be forwarded to the replica shard asynchronously, and there is no guarantee that they will arrive in the same order in which they were sent. If Elasticsearch only forwards change requests, you might apply the changes in the wrong order, resulting in a corrupted document.

Iv. Multi-document operation process

The MODE of the MGET and Bulk apis is similar to the single-document mode. The difference is that the coordinating node knows in which shard each document exists. It breaks down the entire multi-document request into multi-document requests per shard and forwards these requests in parallel to each participating node.

Once the coordinating node receives the response from each node, it collects the response from each node into a single response and returns it to the client

The sequence of steps required to retrieve multiple documents with a single MGET request:

The client sends an MGET request to Node 1.
Node 1 builds multiple document fetch requests for each shard and then forwards these requests in parallel to nodes that host each required master or replica shard. Once all replies are received, Node 1 builds the response and returns it to the client.

Routing parameters can be set for each document in the Docs array.

Bulk API, which allows multiple create, index, delete, and update requests to be executed in a single batch request.

The BULK API is executed in the following sequence:

The client sends bulk requests to Node 1.
Node 1 creates a batch request for each Node and forwards these requests in parallel to each Node host that contains the master shard.
The master shard performs each operation in sequence, one after the other. When each operation succeeds, the master shard forwards the new document (or deletes it) to the replica shard in parallel, and then performs the next operation. Once all of the replica shards report success of all operations, the node reports success to the coordinator node, which collects and returns the responses to the client.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

ElasticSearch Advanced 4: Sharding operation process

First, write process

Second, reading process

3. Update process

Iv. Multi-document operation process

ElasticSearch Advanced 4: Sharding operation process

First, write process

Second, reading process

3. Update process

Iv. Multi-document operation process

Related Posts

Distributed Transaction Topic (4) : TCC for Distributed transaction Solutions

Recommend 9 GitHub practice projects (online exam, Fake Meituan, fake Douyin, Fake B station, fake Toutiao…)

Predicate and Stream operations in Java