From: blog.csdn.net/qq924862077…



2. How MySQL implements indexes



In MySQL, indexes are a storage-engine-level concept: different storage engines implement them differently. This article focuses on the index implementations of the MyISAM and InnoDB storage engines.


2.1 MyISAM index implementation


In MyISAM, index and data are stored separately: the index lives in a ".MYI" file and the data in a ".MYD" file, both named after the table.

MyISAM’s index is also called “non-clustered” to distinguish it from InnoDB’s clustered index.


2.2 InnoDB index implementation

InnoDB also uses B+Tree as an index structure, but the implementation is quite different from MyISAM.

Above is a diagram of InnoDB's primary index (which is also the data file). As you can see, the leaf nodes contain the complete data records; this kind of index is called a clustered index. Because the InnoDB data file itself is organized by primary key, InnoDB requires every table to have a primary key (MyISAM does not require one). If no primary key is specified explicitly, MySQL automatically selects a column that uniquely identifies each record as the primary key; if no such column exists, MySQL generates a hidden field for the InnoDB table to serve as the primary key, 6 bytes long and of long-integer type.

The second diagram shows a secondary index (here the ASCII codes of English characters serve as the comparison keys). The clustered-index design makes search by primary key very efficient, but a secondary-index search requires two lookups: first search the secondary index to obtain the primary key, then use the primary key to retrieve the record from the primary index.

Understanding how the different storage engines implement indexes is very helpful for using and optimizing them correctly. Knowing InnoDB's index implementation, for example, makes it easy to see why overly long fields are not recommended as primary keys: every secondary index references the primary index, so a long primary key makes all secondary indexes large. Likewise, using a non-monotonic field as a primary key is a bad idea in InnoDB, because the data file itself is a B+Tree; non-monotonic primary keys force the data file to be split and rebalanced constantly as new records are inserted, which is inefficient. An auto-increment field is a good choice for a primary key.


Traditional relational databases mostly use B-Tree/B+Tree data structures. Lookup in such a tree costs O(log N), and inserting or updating a node does not require moving all the other nodes, so a tree-structured index balances write and query performance. When fast updates are not required, we can trade them away, for example by pre-sorting the data, in exchange for smaller storage and faster retrieval; the cost is slow updates. Let's take a closer look at how the Lucene index is implemented.


3. How the Lucene index is implemented

3.1 Inverted Index

The core of Lucene’s fast search is an inverted index. What is an inverted index?

There are several concepts here. Let’s look at a practical example. Suppose we have the following data:

docid | age | gender
----- | --- | ------
1     | 18  | female
2     | 20  | female
3     | 18  | male

Each row here is a document, and each document has a docid. The inverted index for these documents is:

age:
  18 → [1, 3]
  20 → [2]

gender:
  female → [1, 2]
  male   → [3]

As you can see, the inverted index is built per field: each field has its own inverted index. The values 18, 20, female and male are called terms, and [1, 3] is a posting list: an array of ints storing the ids of all documents that match a given term. So what are the term dictionary and the term index?
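To make the concepts concrete, here is a minimal sketch of building a per-field inverted index from the sample documents above (field names and values are taken from the example; the structure is illustrative, not Lucene's actual on-disk format):

```python
docs = [
    {"docid": 1, "age": 18, "gender": "female"},
    {"docid": 2, "age": 20, "gender": "female"},
    {"docid": 3, "age": 18, "gender": "male"},
]

def build_inverted_index(docs):
    # field -> term -> posting list (sorted docids)
    index = {}
    for doc in docs:
        for field, term in doc.items():
            if field == "docid":
                continue
            index.setdefault(field, {}).setdefault(term, []).append(doc["docid"])
    return index

index = build_inverted_index(docs)
# index["age"][18] is the posting list [1, 3]
```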


3.2 Term Dictionary

To find a given term quickly, all terms are sorted and looked up by binary search, giving O(log N) lookups, just like finding a word in a dictionary; this sorted list of terms is the term dictionary. So far this looks just like a traditional database, so why are queries faster?


3.3 Term Index

Suppose we have many terms, such as:

Carla,Sara,Elin,Ada,Patty,Kate,Selena

In this order, finding a particular term must be very slow, because the terms are not sorted and every one of them has to be examined. After sorting, the list becomes:

Ada,Carla,Elin,Kate,Patty,Sara,Selena

Now binary search can locate a target term much faster than a full traversal. This is the term dictionary. With a term dictionary, a term can be found in O(log N) disk seeks. But random disk reads are still very expensive (roughly 10 ms per access), so we want to read the disk as rarely as possible and cache some data in memory. The whole term dictionary, however, is usually too large to fit in memory, and that is why there is a term index. The term index is a bit like the table of contents of a dictionary, for example:
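The binary search over the sorted term dictionary can be sketched with the example names above (`find_term` is an illustrative helper, not a Lucene API):

```python
import bisect

# The sorted term dictionary from the example
term_dictionary = ["Ada", "Carla", "Elin", "Kate", "Patty", "Sara", "Selena"]

def find_term(terms, target):
    # O(log N) comparisons instead of a full scan
    i = bisect.bisect_left(terms, target)
    if i < len(terms) and terms[i] == target:
        return i
    return -1
```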

Terms starting with A …………… page Xxx

Terms starting with C …………… page Xxx

Terms starting with E …………… page Xxx

If all terms were English words, the term index might really consist of 26 entries, one per initial letter. In practice, though, a term is not necessarily English text; it can be any byte array. Moreover, the 26 initial letters do not each cover the same number of terms: there may be no terms starting with "x" but many starting with "s". The actual term index is a tree:

This tree does not contain all terms; it contains prefixes of terms. The term index lets you quickly locate an offset in the term dictionary and then search from there.

So the term index does not need to store every term, only a mapping from some prefixes to blocks of the term dictionary. Combined with FST (Finite State Transducer) compression, the term index can be cached entirely in memory.

Now we can answer the question "why is Elasticsearch/Lucene faster than MySQL?". MySQL has only one layer, the term dictionary, stored on disk as a tree; retrieving a term requires several random disk accesses. Lucene adds a term index on top of the term dictionary to speed up retrieval, and the term index is cached in memory as a tree. After finding the position of the corresponding term-dictionary block in the in-memory term index, Lucene goes to disk to find the term itself, which greatly reduces the number of random disk accesses.

Two additional points are worth mentioning. First, the term index is kept in memory as an FST (Finite State Transducer), which is extremely memory-efficient. Second, the term dictionary is stored on disk in blocks, and each block is compressed with a common prefix: for example, if all the words in a block start with "Ab", the "Ab" can be stored once and omitted from each word. This lets the term dictionary save further disk space.
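The common-prefix trick for a term-dictionary block can be sketched as follows (the block contents are invented for illustration; Lucene's actual block format is more elaborate):

```python
import os

def compress_block(terms):
    # Store the shared prefix once and only the per-term suffixes
    prefix = os.path.commonprefix(terms)
    return prefix, [t[len(prefix):] for t in terms]

def decompress_block(prefix, suffixes):
    return [prefix + s for s in suffixes]
```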


3.4 FST

FSTs are finite-state machines that map a term (byte sequence) to an arbitrary output.

Suppose we want to map mop, moth, pop, star, stop and top (term prefixes in the term index) to the sequence numbers 0, 1, 2, 3, 4, 5 (block locations in the term dictionary). The simplest way is to define a Map and let each term look up its own slot, but from the point of view of using less memory, is there a better way? The answer is: FST.

In the diagram:

⭕️ indicates a state.

→ indicates a state transition; the letter and number on an edge indicate the input byte and its weight.

Each word is split into individual letters and represented with ⭕️ and →; weights of 0 are not displayed. Where a ⭕️ branches, the weights are marked, and the weights along a complete path sum to the word's sequence number.


FST stores all terms in bytes. This compression can effectively reduce the storage space so that term index can fit into memory, but it also requires more CPU resources for lookups.
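The path-weight idea can be sketched in code. The states and arc weights below are hand-built for the six example words (a real FST is constructed and minimized automatically, and stores raw bytes rather than Python dicts); the output of a lookup is the sum of the weights along the path:

```python
# state -> {input char: (arc weight, next state)}
fst = {
    "start": {"m": (0, "m"), "p": (2, "p"), "s": (3, "s"), "t": (5, "t")},
    "m":   {"o": (0, "mo")},
    "mo":  {"p": (0, "final"), "t": (1, "mot")},
    "mot": {"h": (0, "final")},
    "p":   {"o": (0, "po")},
    "po":  {"p": (0, "final")},
    "s":   {"t": (0, "st")},
    "st":  {"a": (0, "sta"), "o": (1, "sto")},
    "sta": {"r": (0, "final")},
    "sto": {"p": (0, "final")},
    "t":   {"o": (0, "to")},
    "to":  {"p": (0, "final")},
}

def fst_lookup(term):
    # Sum the weights along the path spelled out by the term
    state, output = "start", 0
    for ch in term:
        weight, state = fst[state][ch]
        output += weight
    return output
```

For example, stop follows s(3) → t(0) → o(1) → p(0), giving 3 + 1 = 4.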


3.5 Compression Technology

Besides compressing the term index with an FST, Lucene also compresses posting lists. But a posting list only stores document ids; does it really need compression? The answer is definitely yes. Suppose, for example, we index a gender field over tens of millions of documents: each of the two posting lists would contain millions of document ids. So posting lists must be compressed. How can document ids be compressed effectively?


3.5.1 Frame Of Reference

Frame of Reference is delta-encoding compression: large numbers are turned into small increments and stored in as few bytes as possible. Posting lists are sorted, and one advantage of sorted lists is that they compress well, as the following illustration shows:

With deltas, each large id is replaced by its increment over the previous one; the increments are then grouped into blocks, and each block is stored with just enough bits for its largest value, so a small delta like 2 no longer occupies a full int.
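A sketch of the delta step (the posting list below is invented for illustration; Lucene additionally packs the deltas into fixed-size bit-packed blocks):

```python
def delta_encode(postings):
    # Sorted ids become a first value plus small gaps
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def bits_needed(deltas):
    # Bits per value for this block, instead of a full 32-bit int each
    return max(d.bit_length() for d in deltas)

postings = [73, 300, 302, 332, 343, 372]
deltas = delta_encode(postings)   # [73, 227, 2, 30, 11, 29]
```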


3.5.2 Roaring bitmaps

To understand Roaring Bitmaps, we have to start with bitmaps. A bitmap is a simple data structure. Suppose we have the posting list:

[1, 3, 4, 7, 10]

The corresponding bitmap is:

[1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

This is intuitive: a 0/1 at each position indicates whether the value exists; for example, the value 10 corresponds to the 10th position, whose bit is 1. One byte can thus represent eight document ids. Older versions of Lucene (before 5.0) compressed posting lists this way, but it is still not efficient enough: with 100 million documents, a single index field needs 12.5 MB of storage, and we tend to have many fields. So someone came up with Roaring Bitmaps, a more efficient data structure.
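A plain bitmap can be sketched like this (helper names are illustrative):

```python
def to_bitmap(postings, size):
    # One bit per possible docid; one byte covers eight ids
    bits = bytearray((size + 7) // 8)
    for docid in postings:
        bits[docid // 8] |= 1 << (docid % 8)
    return bits

def contains(bits, docid):
    return bool(bits[docid // 8] & (1 << (docid % 8)))

bm = to_bitmap([1, 3, 4, 7, 10], size=11)   # fits in 2 bytes
```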

Roaring splits the posting list into blocks of 65536 ids: the first block holds document ids 0 to 65535, the second 65536 to 131071, and so on. Each id is represented as <quotient, remainder> after dividing by 65536, so the remainders within each block fall in the range 0 to 65535. Since the ids within a block are bounded, each block can then be stored in whatever way is most efficient for its contents.

Why is 65535 the limit? Because 65535 = 2^16 - 1 is the maximum value that fits in two bytes, the storage unit of a short. If a block has more than 4096 values, it is encoded as a bit set; otherwise it is stored as a simple array using 2 bytes per value.

So why use 4096 to distinguish between an array and a bitmap threshold?

The space required by the bitmap is constant: 65536/8 = 8192 bytes, while a short[] needs 2*N bytes (N being the number of elements); the two cost the same at exactly N = 4096, as shown below:
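The container split can be sketched as follows (a simplification: real Roaring implementations use packed 16-bit arrays and bitmap containers, not Python lists):

```python
def to_roaring(postings):
    # id -> (id >> 16, id & 0xFFFF): high 16 bits pick the container,
    # low 16 bits are stored inside it
    containers = {}
    for docid in postings:
        containers.setdefault(docid >> 16, []).append(docid & 0xFFFF)
    # Over 4096 values: bitmap (fixed 8192 bytes); otherwise short array
    return {
        high: ("bitmap" if len(lows) > 4096 else "array", lows)
        for high, lows in containers.items()
    }
```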

3.6 Joint Index Query

Consider the joint query age=18 AND gender=female. The general process is: for the filter condition age=18, first find the approximate position of 18 in the term dictionary via the term index, then locate the term 18 exactly in the term dictionary. This yields either a posting list or a pointer to the posting list's location. Querying gender=female works the same way and produces a second posting list. The two posting lists then have to be intersected.

This "AND" merge is not a trivial operation. In MySQL, if you index both age and gender, typically only the more selective of the two indexes is used in the query, and the other condition is evaluated in memory by filtering the retrieved rows. So how do you actually combine two indexes? There are two ways:

A) Use the skip list data structure: traverse the gender and age posting lists simultaneously, using each list's current value to skip ahead in the other.

B) Use the bitset data structure: compute a bitset for the gender filter and one for the age filter, then AND the two bitsets.


Merge using Skip List


Above are three posting lists. We need to AND them to produce their intersection. Start with the shortest posting list and iterate from small to large, skipping elements where possible: for example, once we reach 13 in the green list, we can skip the blue list's 3, because 3 is smaller than 13.

The whole process is as follows:

Next -> 2
Advance(2) -> 13
Advance(13) -> 13
Already on 13
Advance(13) -> 13 MATCH!!!
Next -> 17
Advance(17) -> 22
Advance(22) -> 98
Advance(98) -> 98
Advance(98) -> 98 MATCH!!!

The resulting intersection is [13, 98], found much faster than fully traversing all three posting lists. The precondition is that each list supports an Advance operation for moving forward quickly. What kind of list supports such leapfrogging? The skip list:
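The merge can be sketched as below. The three lists are hypothetical (the original figure is not reproduced here); they are chosen so that the intersection comes out to [13, 98] as in the trace above. Here `advance` is a simple binary search; in Lucene it is backed by the skip list described next:

```python
import bisect

def advance(postings, target):
    # First element >= target, or None if the list is exhausted
    i = bisect.bisect_left(postings, target)
    return postings[i] if i < len(postings) else None

def intersect(lists):
    lists = sorted(lists, key=len)          # drive from the shortest list
    result = []
    for candidate in lists[0]:
        if all(advance(l, candidate) == candidate for l in lists[1:]):
            result.append(candidate)
    return result
```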

Conceptually, for a long Posting list such as:

[1, 3, 13, 101, 105, 108, 255, 256, 257]

We can split the list into three blocks:

[1, 3, 13] [101, 105, 108] [255, 256, 257]

The second skip list can then be constructed:

[1, 101, 255]

Each of 1, 101 and 255 points to its own block, which allows quick jumps across blocks.
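A minimal two-level sketch over the blocked list from the text (`skip_advance` is an illustrative helper; a real skip list may have more levels and probabilistic structure):

```python
import bisect

blocks = [[1, 3, 13], [101, 105, 108], [255, 256, 257]]
skip_level = [1, 101, 255]   # first element of each block

def skip_advance(target):
    # Jump to the last block whose leading value is <= target,
    # then scan forward for the first element >= target
    b = max(bisect.bisect_right(skip_level, target) - 1, 0)
    for block in blocks[b:]:
        for value in block:
            if value >= target:
                return value
    return None
```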

Lucene naturally compresses each block as well, using the Frame of Reference encoding described above. As a bonus, the skip list not only saves the CPU cost of traversing skipped elements, it also avoids decompressing the compressed blocks that get skipped over.


Merge with bitset

With bitsets it is very straightforward: just AND them bitwise and you get the final intersection.
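A sketch using Python's arbitrary-precision ints as bitsets (the posting lists are invented; real bitset merges work the same way over fixed-size words):

```python
def to_bitset(postings):
    bits = 0
    for docid in postings:
        bits |= 1 << docid
    return bits

def from_bitset(bits):
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

# AND is a single operator on the two bitsets
intersection = from_bitset(to_bitset([1, 3, 13, 98]) & to_bitset([13, 17, 98]))
```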


Conclusion

Lucene/Elasticsearch achieves efficient retrieval by moving as much of the search structure as possible from disk into memory.