An overview of the Elasticsearch architecture

Elasticsearch is one of the most popular pieces of middleware in the world. To use it well, you need to understand its architecture. Elasticsearch is distributed middleware, and its architecture diagram shows how it achieves high availability and high performance.

Architecture diagram

1. The unit of data is the document. A document is a single record.

2. Documents are stored in shards, which are distributed across the machines in the cluster (P1, P2 and P3 are shards).

3. An index is made up of multiple shards, so its data is spread over multiple ES nodes (here the index is made up of shards P1, P2 and P3).

4. Each shard has zero or more replicas distributed across the nodes (R1, R2 and R3 are the replicas of P1, P2 and P3 respectively).

5. A primary shard serves both reads and writes (P1, P2 and P3, shown in red).

6. A replica shard synchronizes data from its primary shard and serves reads (R1, R2, R3).
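
The layout described above can be modeled in a few lines. This is only a minimal sketch: the node names and the simple "next node" placement rule are assumptions for illustration, and the real Elasticsearch allocator is considerably more sophisticated. It does show the key property that a replica never sits on the same node as its primary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShardCopy:
    shard_id: int   # which slice of the index this copy holds
    primary: bool   # True for P*, False for R*
    node: str       # hypothetical node name

def layout(nodes: list[str], num_primaries: int, num_replicas: int) -> list[ShardCopy]:
    """Place each primary on one node and its replicas on the following nodes."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies apart")
    copies = []
    for shard_id in range(num_primaries):
        for k in range(num_replicas + 1):
            # k == 0 is the primary; k >= 1 are replicas on different nodes
            node = nodes[(shard_id + k) % len(nodes)]
            copies.append(ShardCopy(shard_id, primary=(k == 0), node=node))
    return copies

copies = layout(["node1", "node2", "node3"], num_primaries=3, num_replicas=1)
for c in copies:
    label = ("P" if c.primary else "R") + str(c.shard_id + 1)
    print(label, "->", c.node)
```

With three nodes and one replica per shard, losing any single node still leaves at least one copy of every shard, which is where the high availability comes from.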

Expansion mechanism

1. An index consists of multiple primary shards and multiple replica shards.

2. Primary shards serve both reads and writes. The number of primary shards is fixed when the index is created, so the write capacity of the index is determined up front.

3. The number of replica shards can be changed dynamically, so read capacity can keep being expanded.

4. Compared with MySQL, ES is built for distributed scale-out, so its read and write capacity can grow well beyond that of a single MySQL instance.
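
As an illustration of point 3, the replica count is changed through the index settings API (`PUT /<index>/_settings` with `index.number_of_replicas`). The sketch below only builds the request and does not send it; the base URL `http://localhost:9200` and the helper name `build_replica_update` are assumptions made for this example.

```python
import json
from urllib.request import Request

def build_replica_update(base_url: str, index: str, replicas: int) -> Request:
    """Build (but do not send) a settings update that changes the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical local cluster; the index name is the one used later in this article.
req = build_replica_update("http://localhost:9200", "test-index-20220311", 2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

There is no equivalent live setting for the number of primary shards, which is the asymmetry points 2 and 3 describe.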

Optimistic locking controls concurrency

Pass if_primary_term=1&if_seq_no=4 on the request to control concurrency:

POST /test-index-20220311/_update/MViAd38BX_oAU-6DNdFf?if_primary_term=1&if_seq_no=4
{
  "doc": {
    "name": "aaa3"
  }
}

The document has changed since it was read (its seqNo is now 5), so the request fails with the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[MViAd38BX_oAU-6DNdFf]: version conflict, required seqNo [4], primary term [1]. current document has seqNo [5] and primary term [1]",
        "index_uuid" : "z8wkuCSBRpK2lVvxSF2AVg",
        "shard" : "0",
        "index" : "test-index-20220311"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[MViAd38BX_oAU-6DNdFf]: version conflict, required seqNo [4], primary term [1]. current document has seqNo [5] and primary term [1]",
    "index_uuid" : "z8wkuCSBRpK2lVvxSF2AVg",
    "shard" : "0",
    "index" : "test-index-20220311"
  },
  "status" : 409
}
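
The compare-and-set behavior behind this error can be sketched with a toy in-memory shard. This is not how Elasticsearch is implemented; `ToyShard` and `VersionConflict` are hypothetical names, and the model only keeps the one rule that matters here: an update is applied only if the caller's seq_no and primary term still match the stored document.

```python
class VersionConflict(Exception):
    """Raised when the caller's seq_no / primary_term is stale (HTTP 409 in ES)."""

class ToyShard:
    def __init__(self, primary_term: int = 1):
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> {"source", "seq_no", "primary_term"}

    def _stamp(self, doc_id, source):
        # Every successful write gets a fresh, increasing sequence number.
        self.docs[doc_id] = {
            "source": source,
            "seq_no": self.next_seq_no,
            "primary_term": self.primary_term,
        }
        self.next_seq_no += 1
        return self.docs[doc_id]

    def index(self, doc_id, source):
        return self._stamp(doc_id, dict(source))

    def update(self, doc_id, partial, if_seq_no, if_primary_term):
        current = self.docs[doc_id]
        if (current["seq_no"], current["primary_term"]) != (if_seq_no, if_primary_term):
            raise VersionConflict(
                f"required seqNo [{if_seq_no}], current document has seqNo [{current['seq_no']}]"
            )
        return self._stamp(doc_id, {**current["source"], **partial})

shard = ToyShard()
seen = shard.index("doc-1", {"name": "aaa1"})  # both clients read this version
# Client A updates first and succeeds.
shard.update("doc-1", {"name": "aaa2"}, if_seq_no=seen["seq_no"], if_primary_term=1)
try:
    # Client B still holds the old seq_no, so its update is rejected.
    shard.update("doc-1", {"name": "aaa3"}, if_seq_no=seen["seq_no"], if_primary_term=1)
except VersionConflict as exc:
    print("409:", exc)
```

The rejected client is expected to re-read the document, pick up the new seq_no, and retry.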

Data routing algorithm

1. ES reads and writes each document on one specific shard, so the shard must be chosen by a deterministic hash. This guarantees that the shard a document is read from is the shard it was written to.

2. The ES routing algorithm hashes the routing value and then takes the result modulo the number of primary shards, so the same routing value always lands on the same shard.

3. This is why the number of primary shards cannot be changed after an index is created: documents that are already written could no longer be routed to the right shard. To change the number of primary shards, copy the data into a new index with _reindex, which rehashes every document.
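
The hash-then-modulo rule above can be sketched as follows. CRC32 is used here only as a stand-in hash, an assumption for illustration; Elasticsearch uses its own hash function internally, so the shard numbers printed will not match a real cluster.

```python
import zlib

def route(routing: str, num_primary_shards: int) -> int:
    """shard = hash(routing value) % number of primary shards."""
    # Stand-in hash for illustration; not the function Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

doc_ids = ["MViAd38BX_oAU-6DNdFf", "user-1", "user-2", "user-3"]
for doc_id in doc_ids:
    print(doc_id, "| 3 shards ->", route(doc_id, 3), "| 4 shards ->", route(doc_id, 4))
```

Comparing the two columns shows why the primary shard count is fixed: as soon as the divisor changes, many existing documents map to a different shard than the one they were written to.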

Write data flow (Write according to route value)

1. A user can send a write request to any node, for example node1. That node acts as the coordinating node for the request.

2. The coordinating node uses the routing algorithm to work out which shard the document belongs to. If the result is shard 2, the coordinating node forwards the request to the node that holds the primary of shard 2.

3. After the primary shard has written the document, it replicates the write to the replica shards and returns once replication completes.

Waiting for too many shard copies hurts write performance, while waiting for too few risks data loss. The request parameters wait_for_active_shards=10&timeout=10s control this trade-off: how many shard copies must be active before the write proceeds, and how long to wait for them.

The default routing value is the document ID.
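
The write path above can be sketched with a toy model. Everything here is a simplification and an assumption for illustration: shard copies are plain dicts, CRC32 stands in for the real routing hash, and the check fails immediately where a real cluster would wait up to `timeout` for enough copies to become active.

```python
import zlib

class ShardGroup:
    """One primary plus its replicas, each modeled as a plain dict."""
    def __init__(self, num_replicas: int):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]

    def active_copies(self) -> int:
        # In this toy model every copy is always active.
        return 1 + len(self.replicas)

def write(groups, doc_id, source, wait_for_active_shards=1):
    # Step 2: the coordinating node routes by document ID (the default routing value).
    shard_id = zlib.crc32(doc_id.encode("utf-8")) % len(groups)
    group = groups[shard_id]
    # Pre-flight check: enough copies must be active before the write proceeds.
    if group.active_copies() < wait_for_active_shards:
        raise RuntimeError("not enough active shard copies")
    # Step 3: write the primary first, then replicate to each replica.
    group.primary[doc_id] = source
    for replica in group.replicas:
        replica[doc_id] = source
    return shard_id

groups = [ShardGroup(num_replicas=1) for _ in range(3)]
shard_id = write(groups, "MViAd38BX_oAU-6DNdFf", {"name": "aaa3"}, wait_for_active_shards=2)
print("written to shard", shard_id)
try:
    # Only 2 copies exist per shard, so requiring 10 can never be satisfied.
    write(groups, "doc-2", {"name": "bbb"}, wait_for_active_shards=10)
except RuntimeError as exc:
    print("rejected:", exc)
```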

Write data flow (primary shard writes internally)

1. The incoming request (the raw document) is written to the in-memory buffer.

2. At the same time, the operation is logged to the translog buffer.

3. The memory buffer is refreshed every 1s by default. A refresh builds the inverted index and writes a new segment file to the OS cache; the OS cache writes it to disk later.

4. The translog buffer is also synced to disk.

5. A document can be searched only after a refresh has created its segment. If newly written data must be searchable immediately, trigger a refresh manually.
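
The effect of the refresh step can be shown with a toy model of a primary shard. The class and method names are hypothetical, and segments are plain Python lists rather than inverted indexes; the point is only that a document sitting in the memory buffer is invisible to search until a refresh turns the buffer into a segment.

```python
class ToyPrimaryShard:
    def __init__(self):
        self.memory_buffer = []  # step 1: raw documents waiting for a refresh
        self.translog = []       # step 2: operation log kept for durability
        self.segments = []       # searchable segments created by refresh

    def index(self, doc: dict) -> None:
        self.memory_buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self) -> None:
        # Step 3: turn the buffered documents into a new searchable segment.
        if self.memory_buffer:
            self.segments.append(list(self.memory_buffer))
            self.memory_buffer.clear()

    def search(self, predicate) -> list:
        # Step 5: only documents that reached a segment can be found.
        return [doc for segment in self.segments for doc in segment if predicate(doc)]

shard = ToyPrimaryShard()
shard.index({"name": "aaa3"})
before = shard.search(lambda d: d["name"] == "aaa3")
shard.refresh()  # the manual refresh mentioned in step 5
after = shard.search(lambda d: d["name"] == "aaa3")
print(before, after)  # [] [{'name': 'aaa3'}]
```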

Read data flow (Read by route value)

1. Reads by routing value are usually not issued by users directly. They typically happen when the coordinating node fetches document data during a search, normally using the document ID as the routing value.

2. The coordinating node first uses the routing algorithm to work out which shard the document belongs to, then gets the list of all copies of that shard (the primary and its replicas).

3. The coordinating node picks one copy by round-robin and forwards the read request to it. The read can miss if that copy has not finished synchronizing; the write policy above can be used to avoid this.

"Reading data" here means reading a document by its routing value. It is not the same thing as a search, but it can be seen as the data-fetching part of a search.
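
The round-robin choice in step 3 can be sketched like this. It is a minimal illustration of the policy as described in this article, with hypothetical copy labels; a real cluster may apply a smarter selection strategy.

```python
import itertools

class CopySelector:
    """Rotate through all copies (primary and replicas) of one shard."""
    def __init__(self, copies):
        self._cycle = itertools.cycle(copies)

    def next_copy(self):
        return next(self._cycle)

# Hypothetical copies of shard 2: the primary on node2 and a replica on node3.
selector = CopySelector(["P2@node2", "R2@node3"])
picks = [selector.next_copy() for _ in range(4)]
print(picks)  # ['P2@node2', 'R2@node3', 'P2@node2', 'R2@node3']
```

Because replicas take part in the rotation, adding replicas spreads read load over more nodes, which is how read capacity scales.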

Search data flow (search read)

1. A user sends a search request, usually with some query conditions, and it arrives at a coordinating node.

2. The coordinating node forwards the request to one copy (primary or replica, chosen by round-robin) of every shard in the index. Together these copies cover the complete index data, which is necessary because a search runs over the whole index, not over a single shard.

3. Each shard finds the documents that match the query and returns their doc IDs.

4. The coordinating node receives the per-shard results, then merges, sorts and paginates them.

5. Finally, the coordinating node uses the doc IDs to read the documents from the corresponding shards (the read flow described above), again choosing copies by round-robin.
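
The two phases above (query each shard for IDs, then fetch the documents) can be sketched as a small scatter-gather. Shards are plain dicts, and the `views` field used as a score is a made-up example; real scoring and pagination are far richer than this.

```python
import heapq

def query_phase(shard_docs, predicate, score, size):
    """Step 3: one shard returns (score, doc_id) for its best matches, not the documents."""
    hits = [(score(doc), doc_id) for doc_id, doc in shard_docs.items() if predicate(doc)]
    return heapq.nlargest(size, hits)

def search(shards, predicate, score, size):
    # Step 2: send the query to one copy of every shard.
    per_shard = [(i, query_phase(docs, predicate, score, size)) for i, docs in enumerate(shards)]
    # Step 4: merge and sort the per-shard results, keeping one page.
    merged = heapq.nlargest(
        size, ((s, doc_id, i) for i, hits in per_shard for s, doc_id in hits)
    )
    # Step 5: fetch the full documents by ID from the shards that hold them.
    return [shards[i][doc_id] for _, doc_id, i in merged]

shards = [
    {"a": {"name": "aaa1", "views": 5}, "b": {"name": "bbb", "views": 9}},
    {"c": {"name": "aaa2", "views": 7}},
    {"d": {"name": "aaa3", "views": 1}},
]
results = search(
    shards,
    predicate=lambda d: d["name"].startswith("aaa"),
    score=lambda d: d["views"],
    size=2,
)
print([d["name"] for d in results])  # ['aaa2', 'aaa1']
```

Returning only IDs and scores in the first phase keeps the merge cheap: full documents are transferred only for the hits that make it onto the final page.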