Overall architecture diagram

1. An ES cluster is composed of multiple nodes

2. An ES data index consists of multiple shards, which are distributed across the nodes of the ES cluster

3. ES shards are either P (primary) shards or R (replica) shards

4. The primary shard handles both reads and writes, while a replica shard serves only reads

5. Once the index is created, the number of primary shards cannot be changed (see the data routing algorithm), while the number of replica shards can be increased or decreased
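
Why the number of primary shards is fixed becomes clearer with a small sketch of the routing step. The commonly documented form of the routing algorithm is roughly shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The function name route_to_shard is hypothetical, and zlib.crc32 is only a stand-in for the real hash function; this is an illustration under those assumptions, not the ES implementation.

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard number for a document from its routing key."""
    # Assumption: crc32 stands in for the real hash function. All that matters
    # here is that the hash is deterministic for a given key.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Where each document lives with 5 primary shards, and where the same
# formula would look for it if the count were changed to 6.
placed_with_5 = {d: route_to_shard(d, 5) for d in doc_ids}
placed_with_6 = {d: route_to_shard(d, 6) for d in doc_ids}

moved = sum(1 for d in doc_ids if placed_with_5[d] != placed_with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number depends on the primary shard count, changing that count would make most existing documents unfindable at their computed location. Replica shards are copies and do not take part in the formula, so their number can change freely.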

Single shard architecture diagram

1. A shard contains several kinds of files: segments, a commit point, a translog, and .del files

2. The segment is an inverted index with all data stored in it.

3. The commit point records the segments that have been written to disk

4. The translog stores the write log, which can be replayed to restore data

5. The .del file records which documents in a segment have been updated or deleted. The segment data itself is never modified, which improves write concurrency

6. Segments are merged periodically. During a merge, the documents marked in the .del file are dropped and the remaining data is written to a new segment (segment rewrite).
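
The interplay of these files can be sketched with a toy in-memory model. Everything here (the class ToyShard, its method names, and the use of dicts and sets in place of on-disk files) is hypothetical and heavily simplified; it is meant only to show that segments stay immutable, that updates and deletes are recorded on the side, and that a merge is what actually drops old data.

```python
class ToyShard:
    """Toy model of one shard: immutable segments, .del markers, translog, commit point."""

    def __init__(self):
        self.buffer = {}        # documents written but not yet in a segment
        self.segments = []      # each segment: {doc_id: document}, never modified
        self.deleted = []       # per segment: set of doc IDs marked deleted (.del)
        self.translog = []      # operations logged since the last flush
        self.commit_point = []  # positions of the segments known to be on disk

    def index(self, doc_id, doc):
        self.translog.append(("index", doc_id, doc))
        self._mark_deleted(doc_id)  # an update marks the old copy, then writes a new one
        self.buffer[doc_id] = doc

    def delete(self, doc_id):
        self.translog.append(("delete", doc_id, None))
        self._mark_deleted(doc_id)
        self.buffer.pop(doc_id, None)

    def _mark_deleted(self, doc_id):
        # Record the deletion next to the segment instead of rewriting it.
        for segment, dels in zip(self.segments, self.deleted):
            if doc_id in segment:
                dels.add(doc_id)

    def flush(self):
        """Turn the buffer into a new segment, update the commit point, clear the translog."""
        if self.buffer:
            self.segments.append(dict(self.buffer))
            self.deleted.append(set())
            self.buffer = {}
        self.commit_point = list(range(len(self.segments)))
        self.translog = []

    def merge(self):
        """Rewrite all segments into one, dropping documents marked in .del."""
        live = {}
        for segment, dels in zip(self.segments, self.deleted):
            for doc_id, doc in segment.items():
                if doc_id not in dels:
                    live[doc_id] = doc
        self.segments, self.deleted, self.commit_point = [live], [set()], [0]

    def replay(self, translog):
        """Re-apply logged operations, e.g. after losing the in-memory buffer."""
        for op, doc_id, doc in translog:
            if op == "index":
                self.index(doc_id, doc)
            else:
                self.delete(doc_id)

    def get(self, doc_id):
        if doc_id in self.buffer:
            return self.buffer[doc_id]
        for segment, dels in zip(reversed(self.segments), reversed(self.deleted)):
            if doc_id in segment and doc_id not in dels:
                return segment[doc_id]
        return None


shard = ToyShard()
shard.index("1", {"title": "hello"})
shard.index("2", {"title": "world"})
shard.flush()                                # first segment holds docs 1 and 2
shard.index("1", {"title": "hello again"})   # update: old copy is only marked
shard.delete("2")
shard.flush()
first_segment_before_merge = dict(shard.segments[0])
marked_before_merge = set(shard.deleted[0])
shard.merge()                                # marked documents are dropped here
```

In this sketch the first segment still contains both original documents until merge() runs; only the .del set changes. A real shard also distinguishes making data searchable from committing it to disk, which the toy flush() collapses into one step.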

Single segment structure (inverted index)

1. Inverted Index

A. This is the part we care about most, and it is the inverted index in the usual sense

B. Before a document is written, it is broken down into mappings from keywords to document IDs, and the keywords are sorted

C. When we search (that is, read), we can quickly find the matching document IDs by keyword

2. Stored Fields

This part stores the mapping from each document ID to that document's field data, which is equivalent to the raw data

3. Document Values

This part stores document fields column by column, keeping all values of one field together. This makes sorting and aggregation fast and takes advantage of columnar storage
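
The three structures can be illustrated together with a small sketch. The sample documents, the whitespace tokenizer, and all variable names are assumptions made for illustration; real segments use compact on-disk formats rather than Python dicts.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: {"title": "quick brown fox", "price": 30},
    2: {"title": "lazy brown dog", "price": 10},
    3: {"title": "quick red fox", "price": 20},
}

# 1. Inverted index: keyword -> list of document IDs, with keywords sorted.
postings = defaultdict(list)
for doc_id in sorted(docs):
    for term in set(docs[doc_id]["title"].split()):
        postings[term].append(doc_id)
inverted_index = dict(sorted(postings.items()))

# 2. Stored fields: document ID -> original field data (row oriented).
stored_fields = dict(docs)

# 3. Document values: one column per field, document ID -> value.
price_column = {doc_id: doc["price"] for doc_id, doc in docs.items()}


def search(term):
    """Look up the document IDs for a keyword in the inverted index."""
    return inverted_index.get(term, [])


hits = search("quick")                                   # [1, 3]
by_price = sorted(hits, key=price_column.__getitem__)    # [3, 1]
total_price = sum(price_column[d] for d in hits)         # 50
results = [stored_fields[d] for d in by_price]
```

The search touches only the inverted index, the sort and the sum touch only the price column, and the stored fields are read last, just for the documents that are actually returned.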