In general, the Client sends a write request to ES; ES receives the data, writes it to a disk file, and returns a response saying the write succeeded, and the Client is done.

Now let's take a closer look at what happens inside.

2. Overall structure of ES

An ES cluster has multiple server nodes. An ES Index is split into multiple shards, and each shard has multiple copies.

One copy is the primary shard, which is responsible for writing data. The others are replica shards, which cannot accept writes; they can only synchronize data from the primary and serve read requests.

When ES receives a write request, it routes it to the primary copy of the target shard.
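The routing rule is, in essence, `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. A minimal sketch of the idea, using Python's `crc32` as a stand-in for the murmur3 hash ES actually uses:

```python
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document.

    ES hashes the _routing value (default: the _id) and takes it
    modulo the number of primary shards; crc32 here is just a
    stand-in for ES's murmur3 hash.
    """
    routing_hash = zlib.crc32(doc_id.encode("utf-8"))
    return routing_hash % num_primary_shards

# The same _id always lands on the same shard:
assert route_to_shard("doc-42", 5) == route_to_shard("doc-42", 5)
```

This is also why the number of primary shards cannot be changed after index creation: doing so would invalidate the routing of every existing document.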

Each shard is a Lucene index containing multiple segment files and a commit point file.

The segment files store the Documents; the commit point records which segment files are currently in use.

The primary shard receives the write request and writes the Document data into a segment.

3. Isn't it inefficient to write segments directly?

If every write operation went straight to disk, I/O efficiency would be low.

So ES first puts the data to be written into an in-memory Buffer.

Memory is fast but volatile: data in it is lost on failure. So ES also keeps a log file, the Translog. Like MySQL's Binlog, it records every operation; if ES crashes, the data can be recovered from the Translog after restart.

Because the log file is append-only, with no other logic involved, writing to it is fast.

Even the Translog is not fsync'ed to disk on every operation: it has its own memory buffer that is flushed to disk every 5 seconds. In extreme cases, up to 5 seconds of data can therefore be lost; this behavior can be adjusted to guarantee strict data safety.
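The 5-second behavior corresponds to ES's translog index settings. The setting names below are real ES settings; note that in recent ES versions the default durability is `request` (fsync on every request), and the 5-second window only applies in `async` mode:

```python
# Index settings controlling translog durability, shown as the JSON body
# one would send with PUT /<index>/_settings. "async" trades up to
# sync_interval of potential data loss for higher write throughput.
translog_settings = {
    "index.translog.durability": "async",  # fsync in the background...
    "index.translog.sync_interval": "5s",  # ...every 5 seconds
}
```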

Thus, when data arrives at the primary shard, it first enters the Buffer, and the operation record is appended to the Translog.
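The write path so far can be sketched as a toy model (not ES's real code; the class and method names are illustrative):

```python
class PrimaryShard:
    def __init__(self):
        self.buffer = []    # in-memory doc buffer, lost on crash
        self.translog = []  # append-only op log (stands in for the on-disk file)

    def index(self, doc: dict) -> None:
        # 1. record the operation so it can be replayed after a crash
        self.translog.append(("index", doc))
        # 2. stage the doc in memory; it becomes searchable only after refresh
        self.buffer.append(doc)

    def recover(self) -> None:
        """After a crash the buffer is gone; replay the translog."""
        self.buffer = [doc for op, doc in self.translog if op == "index"]

shard = PrimaryShard()
shard.index({"_id": "1", "title": "hello"})
shard.buffer = []   # simulate a crash wiping memory
shard.recover()
assert len(shard.buffer) == 1
```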

4. How does data get from the Buffer into a Segment file?

ES executes a refresh every second: it creates a new segment file, writes the data from the buffer into that segment, and empties the buffer.

The data written into the segment goes through Lucene, which builds an inverted index for it, so it becomes searchable.
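A toy sketch of refresh and its effect on search visibility (illustrative names; a segment is modeled as a plain list rather than a real inverted index):

```python
class Shard:
    def __init__(self):
        self.buffer = []
        self.segments = []  # each segment: a list of docs (really an inverted index)

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)

    def refresh(self) -> None:
        """Runs every second by default (index.refresh_interval)."""
        if self.buffer:
            self.segments.append(list(self.buffer))  # new immutable segment
            self.buffer.clear()

    def search(self, term: str) -> list:
        # only data that has reached a segment is visible to search
        return [d for seg in self.segments for d in seg if term in d["text"]]

s = Shard()
s.index({"text": "near real time"})
assert s.search("real") == []   # not yet refreshed -> not searchable
s.refresh()
assert len(s.search("real")) == 1
```

This one-second gap between indexing and visibility is why ES is called "near real-time".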

5. How to further improve the efficiency of writing segments?

This relies on a performance optimization at the operating system level: newly written segment files land in the OS file system cache first, and the operating system decides when to flush them to the physical disk, unless the program explicitly calls fsync.

For performance reasons, ES does not call fsync here; it can afford this because the Translog already records every step.

So don't worry that the segment file has not yet physically hit the disk: once it is in the operating system's file system cache, it can be searched.

6. What happens as the Translog file grows larger?

As more data is written, the Translog grows larger and needs to be cleaned up.

Two conditions can trigger the cleanup:

  1. The Translog size reaches a configured threshold.
  2. 30 minutes have elapsed.

If either condition is met, a commit operation is triggered.

Operation process:

  1. Perform a refresh.
  2. Force all segments that have not yet been persisted before this commit to be written to their physical files.
  3. Create a commit point recording all segments included in this commit, and write it to the commit point file.
  4. Empty the Translog: the segments are now safely on disk, so the earlier log entries are no longer needed.

This commit process is called a Flush.
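The four commit steps above can be sketched as follows (a toy model with illustrative names, not ES's real code):

```python
class Shard:
    def __init__(self):
        self.buffer = []
        self.cached_segments = []  # segments only in the OS file system cache
        self.disk_segments = []    # segments fsync'ed to physical files
        self.commit_point = []     # ids of the segments covered by the last commit
        self.translog = []

    def refresh(self) -> None:
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        # 1. refresh: move buffered docs into a segment
        self.refresh()
        # 2. fsync all segments not yet on disk
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        # 3. write a new commit point listing the durable segments
        self.commit_point = list(range(len(self.disk_segments)))
        # 4. these operations are durable now; the translog can be cleared
        self.translog.clear()

s = Shard()
s.translog.append(("index", {"id": 1}))
s.buffer.append({"id": 1})
s.flush()
assert s.translog == [] and len(s.disk_segments) == 1
```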

7. How to deal with too many segments?

As the flow above shows, segment files accumulate quickly, up to one new segment per second, which can seriously hurt search performance.

ES has a background process that merges these segment files: it combines smaller segments into a larger one and updates the segment records at the commit point.

The merge procedure also cleans up deleted data.

When ES receives a delete request, it does not actually remove the document. Instead, it marks the document as deleted in a ‘.del’ file.

During a merge, documents marked in the ‘.del’ file are skipped, so the newly merged segment contains no deleted data.
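A minimal sketch of a merge that skips deleted documents (toy model: a segment is just a list of docs, and `deleted_ids` stands in for the per-segment ‘.del’ files):

```python
def merge(segments: list, deleted_ids: set) -> list:
    """Combine small segments into one larger segment,
    dropping any document marked as deleted."""
    merged = [doc for seg in segments for doc in seg
              if doc["id"] not in deleted_ids]
    return [merged]  # the new segment list replaces the old ones

segments = [[{"id": 1}], [{"id": 2}], [{"id": 3}]]
new_segments = merge(segments, deleted_ids={2})
assert new_segments == [[{"id": 1}, {"id": 3}]]
```

Only after such a merge is the space occupied by deleted documents actually reclaimed.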

8. Summary

During a write, ES routes the request to the server node holding the target shard. The Doc is first put into the Buffer, and the operation log for the request is appended to the Translog.

Every second, the Refresh operation writes data from the Buffer into a new Segment file and clears the Buffer.

Refresh actually writes the Segment into the file system cache; it is flushed to the physical file later, during the Flush operation.

A Flush is performed when the Translog grows too large, or every 30 minutes.

Flush first executes a Refresh, then flushes all cached Segment files to physical files, creates a Commit Point recording the Segments involved in this commit, writes the Commit Point file, and finally clears the Translog.

The ES Merge program is responsible for merging small segments and cleaning up deleted documents.
