What is the index

MySQL officially defines an Index as: an Index is a data structure that helps MySQL obtain data efficiently. The data structure used by MySQL is B+ tree

Here I recommend you to read a book, “In-depth Understanding of Computer Systems.”

1.1 Locality principle

Access to programs and data are tend to gather in groups, in a time period, use only a fraction of that information will be used in the near future are likely to be and now are using information in the address space is the nearby (called a spatial locality), or recently visited the program code and data, quickly accessed possibility is very large (local time).

1.2 Disk Prefetch

A page is a logical block of memory. Operating systems often divide the main memory and disk storage areas into contiguous blocks of equal size. Each block is called a page (in many operating systems, the page size is usually 4K)

1.3 introduction

In the use of database, database query is usually one of the most important functions of the database. But each lookup algorithm can only be applied to a particular data structure.

  • Binary lookup, for example, requires the data to be retrieved in order
  • And binary tree search can only be applied to the binary search tree, but the organizational structure of the data itself can’t completely satisfy all kinds of data structure (for example, theoretically impossible to two columns at the same time in order to organize), so, in the data, also maintains a database system to meet specific lookup algorithm data structure, the data structure reference (pointing) data in some way, This allows you to implement advanced lookup algorithms on these data structures. This data structure is called an index.

Indexes are usually stored as files on disk, and index retrieval requires disk I/O operations. So the most important metric for evaluating a data structure as an index is the incremental complexity of the number of disk I/O operations during a lookup.

  1. Indexes are data structures that help MYSQL efficiently retrieve data
  2. Indexes are stored in the file system
  3. The file storage format of the index depends on the storage engine
  4. Index file structure: hash, binary tree, B tree, B+ tree

Ii. Classification of indexes

2.1 the hash

id name
1 Mobile phone
2 The computer
3 The tablet

Here is a mysql data file with two columns Id and name. If we use hash format, we just need to calculate the hash value of a certain column, and take a modulus of it according to the length of the array, we can get the position from 0 to 7N subscripts, so that the efficiency is actually quite high. However, using hash tables for storage, it has certain disadvantages:

  1. To use hash storage, all data files need to be added to the memory, which consumes memory space
  2. If all queries are equivalent queries, then hash is really fast, but in an enterprise or real work environment, range look-up data is more than equivalent queries, because hash is not suitable, so there is no format for hash storage in mysql

2.2 binary tree

Index format:



For tree was an update he fell over the order of the inside, don’t come up to look at a structure, first to understand what the tree, the tree is a tree roots, and then there are more than n branches, these branches are some tree structure, the more you have multiple tree branch (element), the search efficiency will be low, so there will be a binary tree, Binary tree why good point, because it is binary tree has two branches, but two branches, can cause an effect, is a time when every time we are in search for data, similar to binary search, but also has its own binary tree is not a good place, everybody can see us in the picture above the index of the binary tree format, The nodes on the left are shorter (only three reads), while the nodes on the right are much longer (five reads), resulting in a deeper tree.Each time a node in the tree is read, there will be an IO. The higher the depth is, the higher the IO will be, which will affect the efficiency of data reading, hence (balanced binary tree) and (red black tree).

Balanced binary tree: Maintain a balance, that is, the left child the difference between the tree and the height of the right subtree, is not greater than 1, but for us the above format is not suitable, because he has been more than 1, but AVL tree will have a problem is to adjust the number of too frequent, it involves an operation is rotating, a left-handed and a right-handed, It takes N rotations to maintain balance, which is a waste of time. Every time you add or delete something, you have to go through N rotations, which is inefficient

AVL Trees (Balanced binary search Trees) : Balanced binary search Trees

Red and black tree: Itself is a balanced tree, but it had a balance in the middle, and the loss is part of the balance of performance, but also maintain the relative balance, it made such an operation, is the eldest son, tree height, as long as no more than twice the shortest subtree, and at the same time it introduces in the red-black tree red and black two nodes information, With that information it can help us to do a balance, in the AVL tree has the balance, and the red and black tree had two kinds of rotation and discoloration for balance, red and black tree AVL tree is advanced, it lost a part of the balance of performance, but maintain our insert, and delete data efficiently, while it lost part of the performance, but it is still a balanced tree, Since it’s a balanced tree, its oldest tree is no more than twice as big as the shortest tree, which means that if the shortest tree is 4, then the oldest tree is 8, so when we’re looking for data, it’s not a binary search, and it’s inefficient again

Whether it is binary tree or red-black tree, the tree depth is too deep, resulting in more I/OS, affecting the efficiency of data reading. The most important thing is to reduce I/OS

IO is a bottleneck in our IT industry, one is the disk I/o is a network IO, we as software development, there is no way to adjust the hardware bottlenecks, only from decrease our IO from within the program, we have two directions, one is to reduce the number of IO, one is to reduce the amount of IO, from the two aspects to solve, For example, we used to read the data 10 times, but now we only need to read the data once, which is 10 times less I/O, we used to read 1MB of data, now we only need to read 1KB of data, which is why we do not recommend using select * from when writing mysql queries. Because this query will query N more fields, ORIGINALLY I only want two fields, but gave me 30 fields, this will cause the IO increase, so we will consider, can not reduce the number of index, so the following leads to our — B tree

2.3 B tree

B tree features:

  • All the keys are distributed throughout the tree
  • It is possible to end the search at non-leaf nodes and make a search in the whole set of keywords. The performance is close to binary search
  • Each node has at most m subtrees
  • The root node has at least two subtrees
  • Branch nodes have at least M /2 subtrees (all branch nodes except root and leaf nodes)
  • All leaf nodes are in the same layer, and each node can have up to m-1 keys, in ascending order



B tree structure description:



Illustration of example diagram:

Each node occupies a disk block. On a node, there are two ascending keywords and three Pointers to the child root node, which store the address of the disk block where the child node resides. The three scope fields divided by the two keywords correspond to the scope fields of the data of the subtree pointed to by the three Pointers. The root node is used as the column, and the keywords are 16 and 34. The data range of the subtree to which the P1 pointer points is less than 16, the data range of the subtree to which the P2 pointer points is 16-34, and the data range of the P3 pointer is greater than 34

Search keyword (28) procedure:

  1. Find disk block 1 according to the node, read the memory
  2. Compare keyword 28 to find pointer P2 to disk block 1 in the interval (16,34)
  3. Find disk block 3 according to P2 pointer, read into memory
  4. Compare keyword 28 in interval (25,31) to find pointer P2 to disk block 3
  5. Disk block 8 (P2); disk block 8 (P2);
  6. Find keyword 28 in the keyword list in disk block 8

Disadvantages:

  • Each node has a key and also contains data. The storage space of each page is limited. If data is large, the number of keys stored in each node will be smaller
  • When a large amount of data is stored, the depth is large, and the DISK I/O count increases, affecting the query performance

2.4 B + tree

B+Tree is an optimization based on BTree. The changes are as follows:

  • B+Tree each node can contain more nodes for two reasons. The first reason is to reduce the height of the Tree, and the second reason is to change the data range into multiple ranges. The more ranges, the faster the data retrieval
  • Non-leaf nodes store keys (disks 1,2 and 3 are storage keys). Leaf nodes store keys and data
  • The sequential query performance is higher when two Pointers on leaf nodes are connected (meeting the prefetch feature of disks)

If there are no other nodes under the current disk block, it is a leaf node; otherwise, it is a non-leaf node

Structure:



Note: On B + Tree has two head pointer, a pointer to the root node, another point to the smallest leaf nodes keyword, and all the leaf nodes (i.e., data nodes) is a kind of chain ring structure, therefore the B + Tree can be two kinds of query operations, to the extent of the primary key is a lookup and paging search, another starts from the root node, Do a random lookup.

Mysql storage engine

3.1 mysql innoDB (leaf nodes directly place data)

id name
1 The computer
2 Mobile phone
3 The refrigerator
4 air conditioner
5 fan
6 Color TV

3.1 mysql innoDB (leaf nodes directly place data)

Store the corresponding row record

1, InnoDB by B + Tree structure to the primary key index creation, and leaf nodes stored in the record, if there is no primary key, so can choose only one key, if there is no unique key, then generates a six row_id to a primary key 2, if you create index key is the other fields, so the record is stored in the leaf node of the primary key, Then find the corresponding record through the primary key index

Create an index on name

The name column stores the ID, and then uses the ID to find the corresponding key and data

Mysql 3.1 MyISAM

The following 0X0022 is actually the address, showing that according to our ID, find our address, and then use the address to find the corresponding table data

Classification of indexes

There are five types of mysql index: primary key index, unique index, normal index and full text index, composite index. By adding indexes to fields, you can speed up data reading and improve the concurrency and stress tolerance of projects

  • Primary key index:

    A PRIMARY KEY is a unique index, but it must be specified as the PRIMARY KEY. There can only be one PRIMARY KEY per table

  • The only index

    All values for indexed columns can occur only once, that is, they must be unique and can be null

  • Normal index

    Basic index type. The value can be null without any restriction on uniqueness

  • The full text indexing

    The index type of a full-text index is FULLTEXT. Full-text indexes can be created on vARCHAR, CHAR, or text columns

  • Composite index

    An index of multiple column values designed for combined searches

Mysql storage engine

1 MyISAM InnoDB
The index type Non-clustered index Clustering index
Support transactions no is
Support table locks is is
Support line lock no is
Support foreign keys no is
Full text index support is Yes (supported after 5.6)
Use action types A large number of select Lots of INSERTS, deletes, updates

summary

When writing this article, the small farmers’ company group is constantly receiving messages, because there are problems in the project that I need to solve, today’s mysql index mechanism is over here, for the article do not understand or questions, welcome students to leave a message below, small farmers will see the first time to reply to you, thank you, everyone cheer ~