Preface

  • This article applies to Elasticsearch 7.10
  • Elasticsearch 7.10 is built on Lucene 8.7
  • Lucene 8.7 official documentation on index file extensions: https://lucene.apache.org/cor…

Related reading

  • ElasticSearch retrieval over 10 billion documents: cases and principles
  • Day 7: How is data stored in ElasticSearch
  • A Dive into the Elasticsearch Storage

Excerpt of the file listing for one shard's index directory

/data/nodes/0/indices/wLxsr8mrTfq1ZVro5eAKig/3/index$ ll -ah
-rw-r--r-- 1 elasticsearch elasticsearch  746 Mar 5 09:10 _as2.fdm
-rw-r--r-- 1 elasticsearch elasticsearch 3.2G Mar 5 09:10 _as2.fdt
...
-rw-r--r-- 1 elasticsearch elasticsearch  67M Mar 5 09:14 _as2.kdd
-rw-r--r-- 1 elasticsearch elasticsearch 207K Mar 5 09:14 _as2.kdi
-rw-r--r-- 1 elasticsearch elasticsearch  508 Mar 5 09:14 _as2.kdm
-rw-r--r-- 1 elasticsearch elasticsearch  14M Mar 5 09:10 _as2.nvd
-rw-r--r-- 1 elasticsearch elasticsearch  463 Mar 5 09:10 _as2.nvm
-rw-r--r-- 1 elasticsearch elasticsearch  614 Mar 5 09:14 _as2.si
-rw-r--r-- 1 elasticsearch elasticsearch 535M Mar 5 09:14 _as2_Lucene80_0.dvd
-rw-r--r-- 1 elasticsearch elasticsearch  11K Mar 5 09:14 _as2_Lucene80_0.dvm
-rw-r--r-- 1 elasticsearch elasticsearch 440M Mar 5 09:13 _as2_Lucene84_0.doc
-rw-r--r-- 1 elasticsearch elasticsearch 119M Mar 5 09:13 _as2_Lucene84_0.pos
-rw-r--r-- 1 elasticsearch elasticsearch 350M Mar 5 09:13 _as2_Lucene84_0.tim
-rw-r--r-- 1 elasticsearch elasticsearch 6.7M Mar 5 09:13 _as2_Lucene84_0.tip
-rw-r--r-- 1 elasticsearch elasticsearch 5.2K Mar 5 09:13 _as2_Lucene84_0.tmd
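You can total a shard directory's sizes per extension to see which file types dominate, as in the listing above. The sketch below is a minimal, read-only helper. The names `extension_of`, `size_by_extension` and `scan_index_dir` are hypothetical and were made up for this illustration. The helper folds per-field names such as `_as2_Lucene84_0.doc` into their bare extension and groups every `segments_N` file together.

```python
import os
from collections import defaultdict


def extension_of(filename: str) -> str:
    """'_as2.fdt' -> 'fdt', '_as2_Lucene84_0.doc' -> 'doc', 'segments_5' -> 'segments_N'."""
    if filename.startswith("segments_"):
        return "segments_N"
    _, dot, ext = filename.rpartition(".")
    return ext.lower() if dot else filename


def size_by_extension(entries):
    """entries: iterable of (file name, size in bytes); returns [(ext, total)] largest first."""
    totals = defaultdict(int)
    for name, size in entries:
        totals[extension_of(name)] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def scan_index_dir(path):
    """Read-only scan of one shard's index directory (only names and sizes are read)."""
    entries = [(e.name, e.stat().st_size) for e in os.scandir(path) if e.is_file()]
    return size_by_extension(entries)
```

Usage would look like `scan_index_dir("/data/nodes/0/indices/<index-uuid>/<shard>/index")`. The placeholders in angle brackets stand for your own index UUID and shard number.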

Excerpt of file sizes from several indexes

Files accessed during an ES query

  • Image source: the article “Let's Talk About ElasticSearch Query Latency Spikes”
  • The red circles in the figure mark the larger files, which have the greatest impact on disk I/O

Each file type is explained below.

.tim (larger)

  • Name: Term Dictionary
  • Brief Description: The term dictionary, stores term info
  • For each term, stores its postings metadata (such as document frequency) and pointers into the inverted lists in .doc, .pos and .pay
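Logically, the term dictionary is a sorted map from each term to its postings metadata. The toy model below illustrates that idea under stated assumptions. `TermInfo`, `TermDictionary` and the `doc_offset` field are hypothetical names, and Lucene's real .tim is block-encoded rather than a Python list. Because terms are kept in sorted order, an exact lookup is a binary search and a prefix query is a contiguous range scan.

```python
import bisect
from dataclasses import dataclass


@dataclass
class TermInfo:
    doc_freq: int         # number of documents containing the term
    total_term_freq: int  # total occurrences across all documents
    doc_offset: int       # hypothetical file pointer into the .doc file


class TermDictionary:
    """Toy term dictionary: terms kept sorted, each paired with its postings metadata."""

    def __init__(self, entries):
        self.terms = sorted(entries)
        self.infos = [entries[t] for t in self.terms]

    def lookup(self, term):
        # binary search over the sorted terms
        i = bisect.bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.infos[i]
        return None

    def terms_with_prefix(self, prefix):
        # sorted order makes a prefix query a contiguous range scan
        out = []
        for t in self.terms[bisect.bisect_left(self.terms, prefix):]:
            if not t.startswith(prefix):
                break
            out.append(t)
        return out
```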

.doc (large)

  • Name: Frequencies
  • Brief Description: Contains the list of docs which contain each term along with frequency
  • In other words, the posting list of each term: the docids of the documents that contain it, plus the term's frequency in each of those documents
  • Note that Lucene's docid is an integer local to the segment, not the ES _id. The ES _id is stored in the .fdt file.
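The sketch below shows what a posting list holds and why docids are plain integers. Docids are assigned in indexing order within a segment, so each list comes out already sorted. Lucene's postings format writes docids as deltas (gaps) inside compressed blocks. This sketch shows only the delta step, not the block compression, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_postings(docs):
    """docs: list of token lists; the list index plays the role of the segment-local docid."""
    postings = defaultdict(list)
    for docid, tokens in enumerate(docs):
        for term, freq in sorted(Counter(tokens).items()):
            postings[term].append((docid, freq))  # (docid, term frequency in that doc)
    return dict(postings)


def delta_encode(docids):
    """Store gaps between ascending docids; small numbers compress better."""
    prev, out = 0, []
    for d in docids:
        out.append(d - prev)
        prev = d
    return out


def delta_decode(gaps):
    """Running sum of the gaps restores the original docids."""
    total, out = 0, []
    for g in gaps:
        total += g
        out.append(total)
    return out
```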

.fdt (large)

  • Name: Field Data
  • Brief Description: The stored fields for documents
  • Stores the values of each document's stored fields; _source is a special stored field.
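Lucene compresses stored fields in chunks of several documents. Fetching one document's _source therefore means locating and decompressing its whole chunk, which is part of why .fdt reads are I/O-heavy. The sketch below is a minimal model of that layout under stated assumptions. It uses zlib and JSON as stand-ins for Lucene's own compression and encoding, and `write_stored_fields` and `read_document` are hypothetical names. The `(first_docid, byte_offset)` list plays the role of the .fdx pointers.

```python
import json
import zlib


def write_stored_fields(docs, chunk_size=2):
    """Pack docs into compressed chunks. Returns (data, index), where index holds
    (first docid in chunk, byte offset of chunk) pairs, similar in spirit to .fdx."""
    data = bytearray()
    index = []
    for start in range(0, len(docs), chunk_size):
        blob = zlib.compress(json.dumps(docs[start:start + chunk_size]).encode("utf-8"))
        index.append((start, len(data)))
        data += len(blob).to_bytes(4, "little") + blob  # length prefix, then chunk
    return bytes(data), index


def read_document(data, index, docid):
    # pick the last chunk whose first docid <= docid (linear scan for brevity)
    first, offset = max((e for e in index if e[0] <= docid), key=lambda e: e[0])
    length = int.from_bytes(data[offset:offset + 4], "little")
    chunk = json.loads(zlib.decompress(data[offset + 4:offset + 4 + length]))
    return chunk[docid - first]  # the whole chunk was decompressed to get one doc
```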

.pos (large)

  • Name: Positions
  • Brief Description: Stores position information about where a term occurs in the index
  • Present for fields indexed with positions (typically full-text fields); records the position of each term occurrence within each document.
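Positions are what make phrase queries possible. To match a phrase, the terms must occur at consecutive positions in the same document. The minimal sketch below shows that check for a single document. `positions` and `phrase_match` are hypothetical names, and Lucene's real phrase matching iterates over posting lists rather than token lists.

```python
def positions(tokens):
    """Map each term to the positions where it occurs in one document."""
    pos = {}
    for i, term in enumerate(tokens):
        pos.setdefault(term, []).append(i)
    return pos


def phrase_match(tokens, phrase):
    """True if the phrase terms appear at consecutive positions."""
    pos = {t: set(p) for t, p in positions(tokens).items()}
    if not phrase or any(t not in pos for t in phrase):
        return False
    # try every occurrence of the first term as a starting point
    return any(all(start + k in pos[t] for k, t in enumerate(phrase))
               for start in pos[phrase[0]])
```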

.dvd (larger), .dvm

  • Name: Per-Document Values
  • Brief Description: Encodes additional scoring factors or other per-document information.
  • .dvd: DocValues data
  • .dvm: DocValues metadata. DocValues are a forward (document-to-value) index kept in column-oriented storage, used for sorting and aggregations
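The difference between row-oriented stored fields and column-oriented doc values shows up in how an aggregation or sort reads data. A column lets it scan just one field across all docids instead of loading whole documents. The sketch below models only that layout idea. Lucene actually encodes each column compactly, for example keyword values as ordinals into a per-field dictionary. `to_columns`, `terms_agg` and `sort_by` are hypothetical names.

```python
from collections import Counter

# Row-oriented view (like stored fields): one record per document, docid = list index
rows = [
    {"city": "Paris", "price": 30},
    {"city": "Berlin", "price": 12},
    {"city": "Paris", "price": 25},
]


def to_columns(rows, fields):
    """Column-oriented view (like doc values): one array per field, indexed by docid."""
    return {f: [r.get(f) for r in rows] for f in fields}


def terms_agg(column):
    # a terms aggregation only needs to scan this one column
    return Counter(v for v in column if v is not None)


def sort_by(column, reverse=False):
    # docids ordered by the column's values (assumes no missing values)
    return sorted(range(len(column)), key=lambda d: column[d], reverse=reverse)
```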

segments_N

  • Name: Segments File
  • Brief Description: Stores information about a commit point
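Each commit writes a new segments_N file with a higher generation N, and Lucene opens the index from the highest generation. Lucene encodes N in base 36, so a plain string comparison can pick the wrong file: "segments_z" (35) sorts after "segments_10" (36). The sketch below is a minimal illustration; `generation` and `latest_commit` are hypothetical helper names.

```python
import re

_SEGMENTS = re.compile(r"^segments_([0-9a-z]+)$")


def generation(filename):
    """Return the base-36 generation of a segments_N file, or None for other files."""
    m = _SEGMENTS.match(filename)
    return int(m.group(1), 36) if m else None


def latest_commit(filenames):
    """Pick the segments_N file with the highest generation (the newest commit point)."""
    gens = [(generation(f), f) for f in filenames if generation(f) is not None]
    return max(gens)[1] if gens else None
```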

write.lock

  • Name: Lock File
  • Brief Description: The Write lock prevents multiple IndexWriters from writing to the same file.
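The idea behind write.lock is that only the first writer to acquire the lock file may open an IndexWriter on the directory. The sketch below uses create-exclusive file semantics, which is closer to Lucene's SimpleFSLockFactory. Elasticsearch uses native OS file locks by default, so treat this as an illustration of the idea only. `SimpleWriteLock` and `LockObtainFailed` are hypothetical names defined here.

```python
import os
import tempfile


class LockObtainFailed(Exception):
    pass


class SimpleWriteLock:
    """Hypothetical create-exclusive lock on <index_dir>/write.lock."""

    def __init__(self, index_dir):
        self.path = os.path.join(index_dir, "write.lock")
        self.fd = None

    def __enter__(self):
        try:
            # O_EXCL makes creation fail if another writer already holds the file
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise LockObtainFailed(self.path) from None
        return self

    def __exit__(self, *exc):
        os.close(self.fd)
        os.remove(self.path)  # release so the next writer can acquire it
        return False


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    with SimpleWriteLock(d):
        try:
            SimpleWriteLock(d).__enter__()
        except LockObtainFailed:
            print("second writer rejected")
```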

.si

  • Name: Segment Info
  • Brief Description: Stores metadata about a segment

.cfs, .cfe

  • Name: Compound File
  • Brief Description: An optional “virtual” file consisting of all the other index files for systems that frequently run out of file handles.
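A compound file packs a segment's small files into one data file (.cfs) plus an entry table (.cfe) of offsets and lengths, so a reader keeps one file handle open instead of many. The sketch below shows only that layout. The real format adds headers and checksums, and `pack_compound` and `open_subfile` are hypothetical names.

```python
import io


def pack_compound(files):
    """Concatenate sub-files into one blob (.cfs) and record (offset, length) per name (.cfe)."""
    data = io.BytesIO()
    entries = {}
    for name, content in files.items():
        entries[name] = (data.tell(), len(content))
        data.write(content)
    return data.getvalue(), entries


def open_subfile(cfs, entries, name):
    """Read one 'virtual' file back out of the compound blob."""
    offset, length = entries[name]
    return cfs[offset:offset + length]
```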

.fnm

  • Name: Fields
  • Brief Description: Stores information about the fields
  • Field metadata, such as each field's name, number and index options

.fdx

  • Name: Field Index
  • Brief Description: Contains pointers to field data

.tip

  • Name: Term Index
  • Brief Description: The index into the Term Dictionary
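  • An index into the term dictionary (.tim): it directs a lookup to the right block of .tim instead of scanning the whole file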
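The .tip/.tim split is a two-level lookup. A small index finds the one block of the term dictionary that could contain the term, and only that block is then read. Lucene's real .tip is an FST over term prefixes (the BlockTree terms format), not the flat list of first terms used in this simplified sketch. `build_blocks` and `lookup` are hypothetical names.

```python
import bisect


def build_blocks(sorted_terms, block_size=3):
    """Cut sorted terms into blocks (stand-in for .tim) and keep each block's
    first term in a small index (stand-in for .tip)."""
    blocks = [sorted_terms[i:i + block_size] for i in range(0, len(sorted_terms), block_size)]
    index = [b[0] for b in blocks]
    return blocks, index


def lookup(blocks, index, term):
    # 1) the small index narrows the search to a single block
    i = bisect.bisect_right(index, term) - 1
    if i < 0:
        return False  # term sorts before every block
    # 2) only that block of the term dictionary has to be read
    return term in blocks[i]
```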

.pay

  • Name: Payloads
  • Brief Description: Stores additional per-position metadata information such as character offsets and user payloads

.nvd, .nvm

  • Name: Norms
  • Brief Description: Encodes length and boost factors for docs and fields
  • .nvd: Norms data
  • .nvm: Norms metadata
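Norms record a per-document, per-field length factor, and scoring uses it to favour matches in shorter fields. The sketch below uses a common textbook form of BM25, Elasticsearch's default similarity, to show where the field length enters. Lucene's BM25Similarity differs in details and stores the length in a lossy compact encoding. The function name and default parameters here are illustrative.

```python
import math


def bm25(tf, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term; doc_len is what the norm supplies."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_len  # > 1 for longer-than-average fields
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)
```

With the same term frequency, a field of length 5 scores higher than one of length 50 when the average length is 10.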

.tvx

  • Name: Term Vector Index
  • Brief Description: Stores offset into the document data file

.tvd

  • Name: Term Vector Data
  • Brief Description: Contains term vector data.
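Term vectors invert the direction of .doc and .pos. Instead of mapping term to documents, they store, for one document's field, each term with its frequency and positions. They are written only when the field mapping enables `term_vector`. The sketch below builds that per-document structure; `term_vector` is a hypothetical function name.

```python
def term_vector(tokens):
    """Per-document term -> (frequency, positions), with terms in sorted order."""
    tv = {}
    for pos, term in enumerate(tokens):
        freq, positions = tv.get(term, (0, []))
        tv[term] = (freq + 1, positions + [pos])
    return dict(sorted(tv.items()))
```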

.liv

  • Name: Live Documents
  • Brief Description: Info about what documents are live
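Segments are never rewritten in place. A delete (or the delete half of an update) clears a bit in the live-docs bitset, and searches skip docids whose bit is 0. The space is reclaimed when segments are merged. The sketch below is a minimal bitset with one bit per docid; `LiveDocs` is a hypothetical class name.

```python
class LiveDocs:
    """One bit per docid: 1 = live, 0 = deleted."""

    def __init__(self, max_doc):
        self.max_doc = max_doc
        self.bits = bytearray(b"\xff" * ((max_doc + 7) // 8))  # all docs start live

    def delete(self, docid):
        # clear bit (docid % 8) in byte (docid // 8)
        self.bits[docid >> 3] &= ~(1 << (docid & 7)) & 0xFF

    def is_live(self, docid):
        return bool(self.bits[docid >> 3] & (1 << (docid & 7)))

    def num_deleted(self):
        return sum(not self.is_live(d) for d in range(self.max_doc))
```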

.dii, .dim

  • Name: Point values
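  • Brief Description: Holds indexed points, if any
  • In the shard listing above, the point files appear as _as2.kdd, _as2.kdi and _as2.kdm (point data, index and metadata)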
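Elasticsearch indexes numeric, date, IP and geo fields as points, which Lucene stores in a BKD tree. The one-dimensional sketch below illustrates only the core pruning idea, not the BKD algorithm itself. Values are sorted into leaves with min/max bounds, and a range query skips leaves that cannot match. `build_leaves` and `range_query` are hypothetical names.

```python
def build_leaves(values, leaf_size=4):
    """Sort (value, docid) pairs and cut them into leaves with (min, max, pairs)."""
    pairs = sorted((v, d) for d, v in enumerate(values))
    leaves = [pairs[i:i + leaf_size] for i in range(0, len(pairs), leaf_size)]
    return [(leaf[0][0], leaf[-1][0], leaf) for leaf in leaves]


def range_query(leaves, lo, hi):
    """Docids with lo <= value <= hi, using leaf bounds to skip work."""
    hits = []
    for vmin, vmax, leaf in leaves:
        if vmax < lo or vmin > hi:
            continue  # whole leaf outside the range: never read
        if lo <= vmin and vmax <= hi:
            hits.extend(d for _, d in leaf)  # fully inside: no per-value check
        else:
            hits.extend(d for v, d in leaf if lo <= v <= hi)
    return sorted(hits)
```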

This article is from qbit snap