I have been maintaining the search function of the product during this period. Every time I see ElasticSearch on the management platform with such efficient query efficiency, I am curious about how it achieves it.

This is even faster than my local MySQL query by primary key.

To this end, I searched for relevant information:

There are many answers to this question on the Internet, and the general meaning is as follows:

  • ES is based onLuceneFull-text search engine, it will divide the data after saving the index, good at managing a large number of index data, relative toMySQLI’m not good at constantly updating data and correlating queries.

Said not very thorough, did not analyze the relevant principle; But since we’ve repeatedly mentioned indexes, let’s look at the difference from an index point of view.

MySQL index

Starting from MySQL, the word index is familiar to everyone. It usually exists in some query scenarios, which is a typical case of space for time.

The following takes the InnoDB engine as an example.

Common data structures

If we were to design the MySQL index ourselves, what would our options be?

Hash table

The first thing we should think of is the hash table, which is a very common and efficient query-and-write data structure, the Java equivalent of a HashMap

For example, if we want to query data with id=3, we need to hash 3, and then find the corresponding position in the array.

However, if we want to query interval data such as 1≤id≤6, the hash table will not satisfy very well, because it is unordered, so we have to traverse all the data to find out which data belong to the interval.

Orderly array

Ordered array query efficiency is also very high, when we want to query id=4 data, just need binary search can also be efficient to locate the data O(logn).

At the same time, because the data is also ordered, so naturally can also support interval query; So ordered arrays are good for indexing?

Of course not, it has another big problem; If we inserted data with id=2.5, we would have to move all subsequent data one bit at the same time, which would make the write efficiency very low.

Balanced binary trees

Since ordered arrays are not efficient to write, let’s look at the efficient ones, it’s easy to think of binary trees; Here we take a balanced binary tree as an example:

Due to the properties of the balanced binary tree:

The left node is smaller than the parent node and the right node is larger than the parent node.

So if we want to query data with id=11, we can simply query 10 — >12 — >11 to find the data. The time is O(logn), and the same is true when we write the data.

However, interval range search is still not well supported. Suppose we want to query the data of 5≤ ID ≤20, we need to query the left subtree of 10 nodes first and then the right subtree of 10 nodes to finally query all the data.

This leads to an inefficient query.

Jump table

Skiptables may not be as common as the hash tables, ordered arrays, and binary trees mentioned above, but Redis sort set uses them.

Here we briefly introduce the advantages of the data structure implemented by the jump table.

We know that even an ordered list is not very efficient, because it can’t do binary search using array indices, so it’s O (n).

But we can also subtly optimize the linked list to realize binary search in a disguised way, as shown in the figure below:

We can extract first-level index, second-level index for the lowest level data, and we can extract n-level index according to the different data volume.

When we query, we can use the index here to achieve binary search.

If you want to query the data with id=13, you only need to traversal 4 nodes 1 — >7 — >10 — >13 to query the data. When the number of nodes increases, the efficiency will be more obvious.

At the same time, interval query is also supported. Similar to the query of a single node just now, you only need to query to the starting node, and then traverse (linked list ordered) to the target node in order to query out the whole range of data.

And since we don’t store the actual data on the index, just a pointer, the space is negligible compared to the bottom linked list where the data is stored.

Optimization of Balanced Binary Trees

But InnoDB in MySQL doesn’t use skip tables, it uses a data structure called B+ tree.

This data structure is not like the binary tree that college teachers often talk about as the basic data structure, because this kind of data structure is evolved from the basic data structure according to the requirements scenario in the actual engineering.

For example, the B+ tree here can be thought of as an evolution of a balanced binary tree.

Just now we mentioned that the interval query efficiency of binary tree is not high, which can be optimized:

Optimized on the basis of the original binary tree: all non-leaves do not store data, but only serve as indexes of leaf nodes, and all data are stored in leaf nodes.

In this way, the data of all leaf nodes are stored in an orderly way, which can support interval query very well.

Just query to the start node position and then iterate through the leaf nodes.

When there is a large amount of data, it is obvious that the index file cannot be stored in memory, although it is fast but consumes a lot of resources; So MySQL stores the index files directly on disk.

This is a bit different from the Elasticsearch index mentioned later.

Since the index is stored on disk, we want to minimize disk IO (disk IO is not an order of magnitude more efficient than memory).

It is obvious that the number of IO is closely related to the height of the tree. The lower the height of the tree, the fewer IO will be, and the better performance will be.

So how do you reduce the height of the tree?

We can try to change the binary tree into a tritree, so that the height of the tree will drop a lot, so that the number of IO when querying data will naturally decrease, and the query efficiency will also improve a lot.

That’s where the B plus tree comes from.

Some suggestions for using indexes

In fact, through the understanding of B+ tree in the figure above, we can also optimize some small details of daily work; For example, why do we need primary keys to be incrementing in order?

Assuming that the primary key data we write is unordered, it is possible that the id of the data we write later will be smaller than the id of the data we wrote before, which may require moving the written data while maintaining the B+ tree index.

If you write incrementing data, you don’t have this consideration. You just need to write it one at a time.

That’s why we require the database primary key to be trend increasing. The most reasonable way to do this is to increase the primary key automatically, regardless of the tables.

The overall idea is similar to skip tables, with some adjustments for usage scenarios (such as storing all data in leaf nodes).

ES index

Now let’s see how Elasticsearch uses indexing.

Is row index

In ES, a data structure called an inverted index is used; Before we talk about inverted indexes, let’s talk about the opposite of inverted indexes.

For example, the way we can query a specific object by doc_id is called using a forward index, which can also be understood as a hash table.

It’s essentially looking up value by key.

For example, if doc_id=4, you can quickly query name= Jetty Wang,age=20.

Inverted index

So if I go the other way and I want to query what data in name contains Li? So how do you query efficiently?

Just looking at the forward index mentioned above is of no use. You can only iterate through all the data in order to determine if the name contains Li. It’s very inefficient.

But if we rebuild an index structure:

When you query data in the name, you simply need to query the data contained in the Posting List through this index structure and then query the data in the final data by mapping.

This index structure is actually an inverted index.

Term Dictionary

But how to query Li efficiently in this index structure? Combined with our previous experience, as long as we arrange terms in order, we can use binary tree to search the data structure of the tree and query the data under O (logn).

The process of dividing a text into independent terms is actually what we often call word segmentation.

Combining all the terms together is a Term Dictionary, which can also be called a word Dictionary.

  • English word segmentation is relatively simple, just need to separate the text through Spaces, punctuation marks can break the word, Chinese is relatively complex, but there are many open source tools to support (because it is not the focus of this article, interested in word segmentation can search).

When we have a huge amount of text, there will be a lot of terms after word segmentation. Such an inverted index data structure is certainly not enough to be stored in memory, but it is not so efficient if it is stored in disk like MySQL.

Term Index

So we can choose a compromise. Since we cannot put the entire Term Dictionary in memory, we can create an index for the Term Dictionary and put it in memory.

In this way, we can efficiently query the Term Dictionary and finally query the Posting List through the Term Dictionary.

This also reduces disk IO several times compared to B+ trees in MySQL.

For this Term Index, we can store it in a Trie tree, which is often called a dictionary tree.

More information about dictionary trees can be found here.

If we search for Term beginning with j, the first step is to find out where the Term beginning with j is in the Term Dictionary file by searching the Term Index in memory (this position can be a file pointer, maybe an interval range).

Then, all terms in this position interval are taken out. Since they have been sorted, binary search can be used to quickly locate the specific location. This will query out the Posting List.

Finally, the location information in the Posting List can be used to retrieve the target data from the original file.

More optimization

Of course, Elasticsearch does a lot of specific optimizations. When we retrieve two fields, we can use Bitmap to optimize them.

For example, if we need to query data from name=li and age=18, we need to retrieve the data from these two fields.

The easiest way to do this is to traverse both sets separately, pulling out duplicate data, but this is obviously inefficient.

Then we can use bitmaps for storage (and save storage space), and use the innate bits and * calculations to get the result. *

[1, 3, 5] Tail 10101

[1, 2, 4, 5] Tail 11011

Thus the sum of the two binary arrays can be obtained:

10001[1, 5]

Finally, the Posting List is set as [1, 5]. Naturally, this is much more efficient.

There is no special optimization for the same query requirement in MySQL, but the second field is screened after the data with small amount of data is filtered, so the efficiency is naturally not as high as ES.

Of course, in the latest release of ES, the Posting List will also be compressed. Please refer to the official documentation for details of the compression rules, but I won’t describe them here.


Finally, let’s summarize:

From the above content, we can see that no matter how complex the product is, it is ultimately composed of basic data structure, and it is only targeted optimization for different application scenarios. Therefore, only after laying the foundation of data structure and algorithm and looking at a new technology or middleware can we quickly start, and even know the optimization direction by ourselves.

Finally draw a pie, I will try to do a standalone version of the search engine in accordance with the idea of ES inverted index, only to write it again to deepen understanding.

For a better reading experience, visit

Your thumb up and share is the biggest support for me