Elasticsearch index storage mechanism deduction

For a better article format, see:
https://www.yuque.com/terencexie/geekartt/es-index-store

As an open source search engine, Elasticsearch relies on an important data structure called the inverted index. Inverted indexes are usually large, and building them is time-consuming, so how to store them becomes a very important question.
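For readers who have not met the data structure before, here is a minimal sketch of an inverted index in Python. It is only an illustration: the function name `build_inverted_index` and the whitespace tokenization are assumptions made for this example, not how Lucene actually analyzes text.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (an assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "elasticsearch stores segments",
    2: "lucene segments are immutable",
}
index = build_inverted_index(docs)
print(index["segments"])  # [1, 2]
print(index["lucene"])    # [2]
```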

An inverted index, of course, cannot simply live in memory; it must be persisted so that an index, once built, can be reused. On disk, the inverted index is stored as immutable segments.

The question of how Elasticsearch stores its inverted index therefore becomes the question of how to store these disk segments efficiently.

Since a segment is a disk file, the most intuitive optimization is to accumulate data in memory and write it out in batches, which saves disk I/O overhead. Thus, Elasticsearch introduced an in-memory buffer to hold the inverted index of each document. When the buffered data reaches a certain amount, it is written from the in-memory buffer to disk.
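The batching idea can be sketched with a toy model. The class `BufferedIndexer`, its `batch_size` threshold, and the `disk_writes` counter are all hypothetical names made up for this illustration; real Elasticsearch decides when to write based on refresh timing and buffer size rather than a simple document count.

```python
class BufferedIndexer:
    """Toy model: accumulate documents in memory, write them out in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # hypothetical threshold, in documents
        self.buffer = []              # the in-memory buffer
        self.segments = []            # stands in for segments written to disk
        self.disk_writes = 0          # number of simulated disk writes

    def index(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self._write_segment()

    def _write_segment(self):
        # One simulated disk write covers the whole batch.
        self.segments.append(tuple(self.buffer))
        self.buffer.clear()
        self.disk_writes += 1


indexer = BufferedIndexer(batch_size=3)
for i in range(7):
    indexer.index(f"doc-{i}")

print(indexer.disk_writes)  # 2
print(indexer.buffer)       # ['doc-6']
```

Seven documents cost two disk writes instead of seven, and the last document is still waiting in memory, which is exactly the source of the delay discussed next.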

On the other hand, in Lucene’s world, all searching depends on disk segments: until a document’s index is in a segment, the document is not searchable.

Thus, as shown in the figure above, there is a time gap between “index creation” and “index searchable” (this is why Elasticsearch is called a near-real-time search engine). Obviously, our next goal is to shorten this time gap as much as possible.

One tool that immediately comes to mind is provided by the operating system: the disk page cache. The disk page cache is the OS’s effort to optimize disk I/O. It follows the same principle as the in-memory buffer that Elasticsearch introduced: minimize disk I/O as much as possible.

But what makes the disk page cache different from a user-created memory buffer is that, once data is added to the disk page cache, it is a file as far as the OS as a whole is concerned. “Conceptually” it is a real disk file (although physically it is still in memory). So once the inverted index reaches the disk page cache, it can be treated as a disk segment and is therefore searchable.

And since the disk page cache is in memory, moving data from the in-memory buffer to the disk page cache takes much less time than moving it from the in-memory buffer to a disk segment.

So, by introducing the disk page cache, we have shortened the time from “create index” to “index searchable”, bringing Elasticsearch closer to real time. Elasticsearch provides two operations for these transitions:

Elasticsearch’s refresh() API moves the contents of the in-memory buffer to the disk page cache; by default it runs once per second.
Elasticsearch’s flush() API persists the contents of the disk page cache to disk; by default it fires every 30 minutes.
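The difference between the two operations can be sketched with a toy three-tier model. `ToyShard` and its attributes are hypothetical names invented for this illustration; it models only the visibility and durability rules described above, not Lucene’s actual file handling.

```python
class ToyShard:
    """Toy model of the three storage tiers described above."""

    def __init__(self):
        self.buffer = []      # in-memory buffer: not searchable
        self.page_cache = []  # segments in the OS page cache: searchable, not durable
        self.disk = []        # segments persisted on disk: searchable and durable

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Move buffered documents into a new segment in the page cache.
        if self.buffer:
            self.page_cache.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Refresh first, then persist every cached segment to disk.
        self.refresh()
        self.disk.extend(self.page_cache)
        self.page_cache = []

    def search(self, term):
        # Only segments are searched; the in-memory buffer is invisible.
        segments = self.page_cache + self.disk
        return [doc for seg in segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("hello world")
before = shard.search("hello")  # [] because the document is still in the buffer
shard.refresh()
after = shard.search("hello")   # ['hello world'] once it is in a cached segment
shard.flush()                   # the segment is now on disk as well
```

In this sketch a document becomes searchable at refresh time but becomes durable only at flush time, which is the gap the rest of the article deals with.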

If the contents of the disk page cache have not yet been flushed to disk and the server suffers an extreme failure such as a power outage, will the data held in the disk page cache be lost?

Of course it will!

To guard against this, Elasticsearch adds a temporary disk backup for the disk page cache, the translog, which persists the contents currently held in the disk page cache.

As can be seen, although the translog is intended to be a persistent backup of the inverted index in the disk page cache, it is itself an OS file. Like the inverted index, it therefore has to travel from the disk page cache (the translog’s own page cache entries) to disk. By default, the translog is persisted from the disk page cache to disk every 5 seconds.

Since the translog is only a temporary backup of the inverted index in the disk page cache, when Elasticsearch fires the flush() API, the corresponding translog is cleared. The temporary backup is no longer needed, because the data it covered has actually been persisted to disk at this point.
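The role of the translog can be sketched as a toy model as well. `ToyShardWithTranslog` is a hypothetical class for illustration only. It assumes the translog itself has already reached disk (that is, it ignores the 5-second sync window), and it collapses the in-memory buffer and the page cache into a single volatile list for brevity.

```python
class ToyShardWithTranslog:
    """Toy model: a translog protects data that has not been flushed yet."""

    def __init__(self):
        self.page_cache = []  # indexed but not yet flushed documents (volatile)
        self.disk = []        # flushed documents (durable)
        self.translog = []    # durable log of operations since the last flush

    def index(self, doc):
        self.translog.append(doc)    # record the operation in the translog
        self.page_cache.append(doc)  # keep the indexed data in volatile memory

    def flush(self):
        self.disk.extend(self.page_cache)
        self.page_cache = []
        self.translog = []  # the backup is no longer needed after a flush

    def crash_and_recover(self):
        self.page_cache = []                   # power loss wipes volatile memory
        self.page_cache = list(self.translog)  # replay the translog on restart


shard = ToyShardWithTranslog()
shard.index("a")
shard.flush()
shard.index("b")
shard.crash_and_recover()

print(shard.disk)        # ['a']
print(shard.page_cache)  # ['b']
```

Document "a" survives because it was flushed, and document "b" survives because it could be replayed from the translog.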

With the translog, Elasticsearch narrows the window of possible data loss to 5 seconds: at most, the last 5 seconds of translog entries can be lost.
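The timings discussed so far correspond to index-level settings. The sketch below builds a request body for creating an index with those values spelled out. The setting names `index.refresh_interval`, `index.translog.durability`, and `index.translog.sync_interval` exist in Elasticsearch, but the defaults vary by version (as far as I know, recent versions default to `request` durability, which syncs the translog on every request rather than every 5 seconds), so treat the values as illustrative and check the documentation for your version.

```python
import json

# Settings that mirror the timings described in this article (illustrative values).
settings = {
    "index": {
        "refresh_interval": "1s",     # buffer -> page cache once per second
        "translog": {
            "durability": "async",    # sync the translog on a timer
            "sync_interval": "5s",    # translog page cache -> disk every 5 seconds
        },
    }
}

# JSON body that could be sent when creating an index.
body = json.dumps({"settings": settings}, indent=2)
print(body)
```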

Fortunately, Elasticsearch also keeps additional copies of the data on replica nodes. Even if one node loses 5 seconds of data, the data can be retrieved from other data nodes. This further reduces the probability of data loss. (Of course, data loss is still possible in theory, but as long as the probability is reduced to a level that is acceptable in engineering, that is fine. There is never a 100 percent guarantee in engineering, only an acceptable range.)

In this way, we have chained the entire Elasticsearch storage mechanism together. The whole mechanism revolves around how to store disk segments efficiently:

To reduce the overhead of disk I/O, the in-memory buffer was introduced.
To reduce the time gap between “create index” and “index searchable”, the disk page cache was introduced.
To keep a temporary backup of the disk page cache, the translog was introduced.

When studying technology, we often face a variety of complex “knowledge points” that are hard to remember and do not help us grasp the big picture behind them. The trick, then, is to find an approach that fits how human cognition works: start from a few assumptions and deduce the rest.
