
Original link: mp.weixin.qq.com/s/djT9muWXt… Author: Wei Mu


First pit: Is ES really real-time?

When you write data to ES and receive a success response, a query may still return stale data. Why?

To answer this question, we have to start with how data is indexed. The whole process involves ES shards, the Lucene Index, Segments, and Documents.

An ES shard is a Lucene Index, and each Lucene Index consists of multiple Segments; in other words, a Segment is a subset of a Lucene Index.

[Figure: the relationship between Lucene Index, Segment, and Document]

A Lucene Index can store multiple segments, and each Segment can store multiple documents.

The process of data indexing consists of the following steps.

  1. When a new Document is created, it is first written to a new Segment; existing Segments are never modified in place. When a Document is deleted, it is only marked with a delete flag in its old Segment. When a Document is updated, the old version is marked as deleted and the new version is written to a new Segment.

  2. When a shard receives a write request, the request is first written to the Translog, and the Document is stored in the in-memory buffer; the Translog records every change. (Note: data in the memory buffer cannot be searched yet.)

  3. Every 1 second (by default), a refresh is executed: the data in the memory buffer is written to a new Segment held in the filesystem cache, and only then does the new data become searchable.

From this indexing process we can see that ES is not strictly real-time but near-real-time: newly written data only becomes searchable after the next refresh, roughly a 1-second delay by default, so users should expect queries to lag slightly behind writes.
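If that default lag matters for a particular index, the refresh behaviour can be tuned or triggered explicitly through the standard ES REST API. A minimal sketch, assuming a hypothetical index named my_index (the values are illustrative):

```
# my_index is a hypothetical example index
# Trade freshness for indexing throughput by refreshing less often
PUT /my_index/_settings
{
  "index.refresh_interval": "5s"
}

# Or force a refresh right after an important write (extra overhead per call)
POST /my_index/_refresh
```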

Second pit: data loss after an ES crash and recovery

Every 1 second (by default, configurable), the data in the memory buffer is written to a Segment; that data is searchable but not yet persisted to disk (the gray buckets in the original article's diagram). If ES goes down at this point, the data is lost.

How do we prevent this data loss? The problem can be addressed with Lucene's Commit operation: merge multiple Segments and persist them to disk, turning the gray buckets into green, durable ones.

However, the Commit operation has a drawback: it is I/O-intensive, so it cannot be executed on every write, and any data written after the last Commit is still lost if ES crashes before the next one.

This is where the Translog comes in: every change is recorded in the Translog and can be replayed after a restart. But Translog data is not written straight to disk either; it only reaches disk after an fsync. If the system crashes before the Translog has been fsynced, that data is still lost, so guaranteeing the integrity of ES data becomes the pressing problem. Here I share two Translog-based approaches.
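For reference, ES exposes this Lucene commit through its flush API: a flush persists the in-memory Segments to disk and starts a new Translog generation, so the flushed data no longer depends on the Translog for recovery. A minimal sketch, again with a hypothetical index name:

```
# my_index is a hypothetical example index
POST /my_index/_flush
```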

  1. Set index.translog.durability to request: every write is fsynced before the request returns; if the system performs well enough with this setting, use it;

  2. Set index.translog.durability to async: after any ES crash and restart, compare the primary datastore with ES and backfill whatever data ES is missing.

To emphasize: when is the Translog fsynced? With index.translog.durability set to request, an fsync happens on every request, which costs ES some write performance. Alternatively, set index.translog.durability to async, so the Translog is fsynced only once every index.translog.sync_interval (5s by default) instead of on every request.
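As a concrete reference, here is a minimal sketch of creating an index with the async durability mode (the index name order_index and the 5s interval are illustrative; request is the default and needs no extra settings):

```
# order_index is a hypothetical example index
PUT /order_index
{
  "settings": {
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s"
  }
}
```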

Third pit: the deeper the page, the slower the query

This paging pit is closely tied to how ES processes read requests, so we first need to look at that process in detail.

There are two main stages.

  1. The Query Phase: the coordinating node first forwards the request to all shards. Each shard runs the query locally and builds a result queue holding Document IDs and search scores, which it returns to the coordinating node. The coordinating node then builds a global queue, merges all the received result sets, and sorts them globally.

  2. The Fetch Phase: the coordinating node fetches the complete Documents from the shards using the Document IDs in the merged result set; the shards return the full Documents, and the coordinating node returns the final results to the client.

Across this read process, the Elasticsearch cluster actually has to return number_of_shards * (from + size) entries to the coordinating node, sort them on that single node, and then return just size entries to the client.

For example, with 5 shards and a query for results 10000 to 10010 (from = 10000, size = 10), how many entries does each shard return to the coordinating node?

I’m telling you it’s not 10, it’s 10,010.

In other words, the coordinating node has to sort 10010 * 5 = 50050 records in memory, so the deeper a user pages, the slower the query becomes.
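To make that cost concrete, a deep-paging request of this shape forces each of the 5 shards to build a 10010-entry queue (the index and field names are illustrative):

```
# order_index and total_amount are hypothetical examples
GET /order_index/_search
{
  "from": 10000,
  "size": 10,
  "sort": [
    { "total_amount": "asc" }
  ]
}
```

Note that with the default max_result_window of 10000, ES would actually reject this request outright, which is exactly the safeguard discussed next.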

So how can we handle the ES paging problem better? To protect performance, ES provides the max_result_window setting, which defaults to 10000: when from + size > max_result_window, ES rejects the request with an error.
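If a larger window is genuinely required, the limit can be raised per index, but doing so only shifts more memory and sorting cost onto the coordinating node rather than removing it. A minimal sketch (index name illustrative):

```
# order_index is a hypothetical example index
PUT /order_index/_settings
{
  "index.max_result_window": 20000
}
```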

In system design, therefore, we generally prevent users from paging too deeply, which is acceptable in most real-world scenarios. This is also the approach I took in my earlier design.

If users genuinely need deep paging, the search_after feature in ES solves the problem, although it cannot jump to an arbitrary page.

For example, if the query pages through orders by total amount and the last order on the previous page has total_amount = 10, the query for the next page looks like this:

{" query ": {" bool" : {" must ": [{" term" : {". User user_name. Keyword ":" Li Daxia "}}], "must_not" : [], "should" : [] } }, "from": 0, "size": 2, "search_after": [ "10" ], "sort": [ { "total_amount": "asc" } ], "aggs": {} }Copy the code

The value in search_after is the sort-field value taken from the last result of the previous query.

