"MySQL Interview Cheat Sheet": Index Interview Points, Part One

I'm Fat Brother, an amateur interviewer!

I'm Embarrassed, a rookie actively looking for a job. Embarrassed says: what I fear most in an interview is a question so broad that I can't quickly pin down the key points!


The main interview points in this installment

What is your understanding of indexes?
From the computer's point of view, why are indexes fast?
Why not use a hash structure as the index structure?
Why not use a binary tree as the index structure?
Why use B+Tree instead of B-Tree?
Since indexes speed up queries, should you index as many columns of a table as possible?


What is your understanding of indexes?

When it comes to indexes, the first thing that comes to mind is probably the table of contents of a dictionary. According to MySQL's official definition, an index is a data structure that helps MySQL retrieve data efficiently.

In essence, an index is an ordered data structure built for fast lookups.

In short, it can be likened to a dictionary's table of contents or a train timetable.

From the computer's point of view, why are indexes fast?

When a computer fetches data from disk and loads it into memory, it typically goes through three time-consuming steps:

1. Seek (time): the time taken to move the head to the track that holds the data.
2. Rotation delay (time): the time spent waiting for the sector that holds the data to rotate under the head.
3. Data transfer (time): the time taken to load the data into memory.

Each load of data is called a disk IO, and the time of one IO operation = seek + rotation delay + data transfer (the transfer time is so short that it is negligible).

In other words, the actual transfer into memory is very quick, and most of the time of an IO operation is spent on seeking and rotation delay.

A typical IO operation takes only a few milliseconds. Take 4 ms as an example: it looks very short, but if a table with a million rows needs one IO per row, loading it all takes 1,000,000 × 4 ms = 4000 s, which is catastrophic for any system.

So what we need to do is reduce the number of disk IOs, and that is the whole point of using an index! With an index, a lookup typically needs only 2 to 4 disk IOs even on data at the billion-row level, which is a very good thing!
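The arithmetic above can be checked with a small sketch. It assumes, as the text does, one disk IO per row at 4 ms per IO; the function name `total_io_seconds` is made up for illustration.

```python
def total_io_seconds(io_count: int, io_ms: float = 4.0) -> float:
    """Total time in seconds for io_count disk IOs of io_ms milliseconds each."""
    return io_count * io_ms / 1000.0

# Worst case from the text: one IO for each of a million rows.
print(total_io_seconds(1_000_000))  # 4000.0

# With an index: at most 4 IOs for one lookup (upper bound quoted in the text).
print(total_io_seconds(4))          # 0.016
```

The gap between 4000 s and 0.016 s comes entirely from the number of IOs, not from making each IO faster.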

Why not use a hash structure as the index structure?

In normal business scenarios, most queries are range queries like:

select id, name, age from sys_user where age between 18 and 28;

If a hash structure is used as the index, the storage engine computes a hash code for the indexed value of each row, and the hash index stores that hash code.

Hash codes are computed directly from the value and look random: they do not preserve the order of the original values.

Because the hash codes have no order, the data is scattered, and two rows with adjacent values will most likely end up in different buckets (blocks).

In the worst case, a range query costs one disk IO for every record found (horrible).

A hash structure is very good at exact lookups and full-value matches, so looking up a single record is very efficient, with O(1) time complexity. However, range lookups are the most common kind in day-to-day business, so a hash structure is not a suitable choice.

Just remember one thing: hash indexes are good for exact lookups and full-value matches, not for range lookups.

In MySQL, the Memory engine and the NDB engine currently support hash indexes.
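The contrast between exact and range lookups can be sketched in a few lines. This is only an illustration with Python's built-in `dict` and a sorted list, not how MySQL implements either index type; the sample rows and function names are invented.

```python
import bisect

# Hypothetical sample rows: (age, name) pairs for a sys_user-like table.
users = [(35, "dave"), (18, "alice"), (42, "erin"), (28, "carol"), (21, "bob")]

# Hash-style index: age -> names. Exact lookups are O(1) on average,
# but the keys carry no order, so a range query cannot be one contiguous read.
hash_index = {}
for age, name in users:
    hash_index.setdefault(age, []).append(name)

# Ordered index: entries sorted by age, similar in spirit to B+Tree leaves.
ordered_index = sorted(users)
ages = [age for age, _ in ordered_index]

def exact_lookup(age):
    """Exact match through the hash-style index."""
    return hash_index.get(age, [])

def range_lookup(low, high):
    """age BETWEEN low AND high: two binary searches, then one contiguous slice."""
    start = bisect.bisect_left(ages, low)
    end = bisect.bisect_right(ages, high)
    return [name for _, name in ordered_index[start:end]]

print(exact_lookup(28))      # ['carol']
print(range_lookup(18, 28))  # ['alice', 'bob', 'carol']
```

With only the hash-style index, the range query would have to probe every possible age between 18 and 28 or scan all buckets, which is the scattered access pattern described above.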

Why not use a binary tree as the index structure?

Let's first look at the structure of a binary tree.

Each node of a binary tree has at most two child nodes. This makes the tree tall, which increases the number of IOs. In special cases the tree can degenerate into a linked list, which is equivalent to a full table scan with an IO for every node.

Suppose a binary tree is used as the index and, ideally, it is a complete binary tree. A complete binary tree with n nodes has a depth of ⌊log2(n)⌋ + 1

(where ⌊x⌋ denotes the largest integer not greater than x)

If a piece of data sits at level 100 of the binary tree, 100 disk IOs are required to find it. In the worst case, the binary tree degenerates into a linked list, i.e., a skewed binary tree.

Similarly, even a balanced binary tree is still very tall.
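To get a feel for the numbers, here is a small sketch of the depth formula floor(log2(n)) + 1 for a complete binary tree, next to the degenerate linked-list case. The function names are made up for illustration, and one disk IO per level is the same simplification the text uses.

```python
def complete_tree_depth(n: int) -> int:
    """Depth of a complete binary tree with n >= 1 nodes: floor(log2(n)) + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n.bit_length()  # for n >= 1 this equals floor(log2(n)) + 1

def degenerate_tree_depth(n: int) -> int:
    """Depth when every node has a single child, i.e. a linked list."""
    return n

for n in (7, 1_000_000, 100_000_000):
    print(n, complete_tree_depth(n), degenerate_tree_depth(n))
```

For a million keys, even the ideal binary tree has 20 levels, and for a hundred million it has 27, far more than the 2 to 4 IOs an index is expected to need.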

Why use B+Tree instead of B-Tree?

A binary tree is very tall, which increases disk IO during queries, so what about B-Tree? A B-Tree node can hold more data and the tree is shorter, so why not choose it? Why B+Tree instead?

B-Tree is a multiway balanced search tree. Compared with a binary tree, it greatly reduces the number of disk IOs. However, each node of a B-Tree contains not only the key (index value) but also the data (the whole row).

So why not use the B-Tree structure? Same old problem: the number of disk IOs!

We know that MySQL reads data in pages (disk blocks), and each page (or disk block) can hold only a limited amount of data.

If each row is large, only a small number of index entries fit on each page.

Therefore, when the table holds a lot of data, the B-Tree becomes deep and queries need more disk IOs, which hurts query efficiency.

B+Tree is an optimization of B-Tree that makes it better suited as an index structure for external storage:

1. Non-leaf nodes store only keys (index information).
2. All data records are stored in the leaf nodes, which sit on the same level and are ordered by key.

Advantages: because the non-leaf nodes of a B+Tree store only keys, each page can hold many more index entries, the tree stays very short, and fewer disk IOs are needed. Generally 2 to 4 IOs are enough to find the desired record.

And because the table data is stored in order in the leaf nodes of the B+Tree, range lookups are very efficient too!
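A back-of-the-envelope sketch shows why keeping rows out of the non-leaf nodes matters. The 16 KB page size is InnoDB's default; the 14-byte index entry (8-byte key plus 6-byte child pointer) and the 1 KB row size are assumptions chosen for illustration, and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes
ENTRY_SIZE = 8 + 6     # assumed: 8-byte key + 6-byte child pointer
ROW_SIZE = 1024        # assumed: average row size in bytes

def bplus_tree_capacity(height: int) -> int:
    """Approximate number of rows a B+Tree with `height` levels can address."""
    fanout = PAGE_SIZE // ENTRY_SIZE       # index entries per non-leaf page
    rows_per_leaf = PAGE_SIZE // ROW_SIZE  # full rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf

def btree_entries_per_page() -> int:
    """Entries per page if every node also stored the full row, as in a B-Tree."""
    return PAGE_SIZE // (ENTRY_SIZE + ROW_SIZE)

for height in (2, 3, 4):
    print(height, bplus_tree_capacity(height))
print(btree_entries_per_page())
```

Under these assumptions a three-level B+Tree addresses about 21.9 million rows and a four-level one about 25.6 billion, while a node that also carries 1 KB rows fits only 15 entries per page instead of 1170.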

Since indexes speed up queries, should you index as many columns of a table as possible?

Although indexes speed up queries and reduce the number of disk IOs, blindly creating too many of them greatly increases the time and space cost of index maintenance.

Let's start with the benefits of indexing:

1. Fewer IOs and more efficient retrieval.
2. Lower data sorting cost, which reduces CPU consumption.

Time cost

Because an index is an ordered, fast-lookup structure, keeping it ordered and fast to search requires constant adjustment, and that adjustment takes time.

Creating and maintaining indexes takes time: as rows are inserted, deleted, and updated, every index on the table has to be maintained dynamically, which slows down data modification.

And this time cost grows as the amount of data grows!
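The maintenance cost can be made concrete with a toy model. This is only a sketch: `ToyTable` is a hypothetical class that keeps one sorted Python list per indexed column, which is not how MySQL stores indexes, but it shows that every insert has to touch every index.

```python
import bisect

class ToyTable:
    """Toy table with one sorted list per indexed column (illustration only)."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # total number of index entries written

    def insert(self, row: dict) -> None:
        row_id = len(self.rows)
        self.rows.append(row)
        # Every index must be updated so that it stays sorted.
        for col, entries in self.indexes.items():
            bisect.insort(entries, (row[col], row_id))
            self.index_writes += 1

rows = [{"id": i, "name": f"user{i}", "age": 18 + i % 40} for i in range(1000)]

one_index = ToyTable(["id"])
three_indexes = ToyTable(["id", "name", "age"])
for row in rows:
    one_index.insert(row)
    three_indexes.insert(row)

print(one_index.index_writes)      # 1000
print(three_indexes.index_writes)  # 3000
```

The same 1000 inserts cost three times as many index writes with three indexes, and each of those writes also has to find the right position in an ordered structure.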

Space cost

Each index is a B+Tree that stores index keys and references to the table's rows, and that takes up space.

With a clustered index, where the row data is stored together with the primary key in the index file, the space cost is even higher.

Stay tuned for part two of Embarrassed's beginner guide to indexes!

For more content, follow the WeChat official account: jiongyao fat matter (or search: jiongmefeishi)
