Original article: mp.weixin.qq.com/s/krJ7vXnNH… Reprinted from the "yes training guide" official account; all pictures are from the group chat.

Of course, a good blogger should have his "beloved concubines" (doting fans), and I am one of them. But recently the blogger has unexpectedly abandoned us, so we fans had to take a look at what he has been busy with. Let's get to the point: the story in this article is very good, and it is also being recommended because little Y [the protagonist below] has come under attack. The MySQL empire may be stalled, and all the evidence points to little Y. Little Y is trying to build the MySQL empire, as shown in the pictures:

Living an intemperate life

Relentlessly touting himself

Recruiting "beloved concubines"

Moving to a different venue to relax; no video embedded here, but you can still see how little Y relaxes

Stepping up efforts to build the MySQL empire

An honest man doesn't build the MySQL empire

Slacking off online while the code compiles, with no regard for building the MySQL empire

The crowd saw right through it; it was just a hint

Group members actually defend this man who won't concentrate on building the empire, and even get jealous over him

Even wondering whether a group member is one of his "beloved concubines"

Taking leave to write, but not to build the MySQL empire,

Warmly greeting the "beloved concubines", then trying to "poison" everyone [his Moments are only visible for three days, so no screenshot]

Successfully straying from the MySQL empire yet again

Once again…

Once again indulging himself

Showing off, and being honest about it

He wants to build the MySQL empire, but little Y has been hit

Little Y has been knocked out

I'm little M, from the planet Kalabara. I love data and aspire to be a data manager, so I applied to Company Y; I heard they have a huge amount of data. The interview was pretty easy: I beat a bunch of applicants who bragged about 996 by promising three 007s. And now, I have taken up my post.

1

Today is my first day at work. I'm sitting bolt upright at my desk, waiting for my boss to grace me with his presence. At 2 p.m., Boss Y opens the door, yawns, and looks at me. "007, is something wrong with your back? Why are you sitting so straight?" "Hello boss, my name is little M, not 007. 007 is my love for the company, it's my life…" "Hold on, hold on. See that dozen or so pieces of data on the desk in front of you? Your job is to manage it: inserts, lookups, deletes, and updates, whenever they are needed." Excited and trembling, I replied, "Yes, Sir!"

2

It took me ten minutes to go over the data on my desk. Yes, that's right: ten minutes for a dozen or so pieces of data, because this is my first job and I love it! The data comes from a department called User, and every record has the same structure:
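A minimal sketch of what such a record might look like; only the id is certain from the story (plus the name that comes up at the very end), the rest is an assumption:

```python
from dataclasses import dataclass

@dataclass
class UserRecord:
    # Only id (and, much later, name) are mentioned in the story;
    # treat this shape as illustrative rather than the real User table.
    id: int
    name: str = ""
```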

Since this is a bunch of data with sequential IDs, I figured I would simply keep the records in order and connect them with a linked list.

Every time I look for a piece of data, I start from the head and walk forward. Ordered, easy. I'm such a genius. So day after day, year after year, the User data keeps growing, and now there are about a hundred records.
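A minimal sketch of this first approach, assuming a plain singly linked list kept in id order and a head-to-tail scan on every lookup (all names here are mine):

```python
class Node:
    """One User record on the singly linked list, kept in ascending id order."""
    def __init__(self, record):
        self.record = record
        self.next = None

class RecordList:
    def __init__(self):
        self.head = None

    def insert(self, record):
        """Walk to the right spot so the list stays ordered by id (O(n))."""
        node = Node(record)
        if self.head is None or record.id < self.head.record.id:
            node.next, self.head = self.head, node
            return
        cur = self.head
        while cur.next is not None and cur.next.record.id < record.id:
            cur = cur.next
        node.next, cur.next = cur.next, node

    def find(self, target_id):
        """Start from the head and walk forward: O(n) per lookup."""
        cur = self.head
        while cur is not None and cur.record.id <= target_id:
            if cur.record.id == target_id:
                return cur.record
            cur = cur.next
        return None  # we walked past where it would be, so it is not there
```

Each lookup touches, on average, half the list, which is exactly the pain little M runs into as the data grows.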

The linked list gets longer and longer, and every time the boss comes to me for data, the lookup takes longer and longer. And it's not only the big boss: many of the smaller bosses also come to me for data. I'm exhausted, and my back is gradually losing its straightness.

3

May 1st, late at night. Today is Labor Day, and I'm still working overtime at the company. This can't go on; it's time to bring out my secret weapon, the one I can only summon in the dead of night when I'm alone: snail noodles (luosifen)!

Yes, that's it! Because I've found that after a bowl of snail noodles, my mind becomes very clear and my thinking especially divergent. I even went to the hospital to have my brain checked; the doctor said the scans looked perfectly normal, though judging from the vibe I might not be. Anyway, I know snail noodles really do clear my head, because my brain is already spinning! Got it! I suddenly remembered that in college there was a course called Data Structures that covered binary search. There are now nearly a thousand pieces of ordered data, so I huffed and puffed and split them into groups of ten.

Then I set up a slot for each group, and each slot records the address of the largest record in its group.

This way, I can find records quickly with binary search! Say I have 10 groups of data and I want to find the record whose ID equals 12:

  1. First compute the middle slot: (1 + 10) / 2 = 5 (rounded down). Follow that slot's address to the fifth group; its largest ID is 50, and 12 < 50, so keep halving.
  2. (1 + 5) / 2 = 3. Follow the address to the third group; its largest ID is 30, and 12 < 30, so keep halving.
  3. (1 + 3) / 2 = 2. Follow the address to the second group; its largest ID is 20, and 12 < 20, so keep halving.
  4. (1 + 2) / 2 = 1. Follow the address to the first group; its largest ID is 10, and 12 > 10, so now we know ID 12 must be in the second group.
  5. The first and second groups may look separate, but their records are actually linked end to end, so starting right after the first group's last record I can walk the singly linked list and find the record with ID 12.

Convenient, isn't it? To sum up:

  1. Use binary search over the slots to find the group the target belongs to.
  2. Then walk the singly linked list within that group to reach the record.

When the amount of data is large, this kind of lookup is very fast, because the search time drops from O(n) to roughly O(log n)! I call this the page directory. I'm such a genius!
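Here is a rough sketch of the page directory idea under the same assumptions as the story: records sit on a singly linked list in id order, each group of up to ten records gets a slot that remembers the group's largest record, and a lookup binary-searches the slots before walking the short list inside the chosen group. Slot indices are 0-based in code, unlike the 1-based numbering in the walkthrough above.

```python
class Record:
    def __init__(self, id_):
        self.id = id_
        self.next = None  # singly linked list inside the page

class Page:
    GROUP_SIZE = 10  # the story's "groups of ten"

    def __init__(self, ids):
        # Build the ordered singly linked list of records.
        records = [Record(i) for i in sorted(ids)]
        for a, b in zip(records, records[1:]):
            a.next = b
        self.head = records[0] if records else None
        # One slot per group, holding (a reference to) the group's largest record.
        self.slots = [records[min(g + self.GROUP_SIZE, len(records)) - 1]
                      for g in range(0, len(records), self.GROUP_SIZE)]

    def find(self, target_id):
        # 1. Binary search the slots for the first group whose largest id >= target.
        lo, hi = 0, len(self.slots) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.slots[mid].id < target_id:
                lo = mid + 1
            else:
                hi = mid
        # 2. Walk the singly linked list from the start of that group
        #    (the record right after the previous group's largest, or the page head).
        rec = self.slots[lo - 1].next if lo > 0 else self.head
        while rec is not None and rec.id <= self.slots[lo].id:
            if rec.id == target_id:
                return rec
            rec = rec.next
        return None

# Mirroring the walkthrough: ten groups whose largest ids are 10, 20, ..., 100.
page = Page(range(1, 101))
assert page.find(12).id == 12   # lands in the second group, as in the story
assert page.find(101) is None
```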

4

And so it goes, day after day, year after year. The amount of User data keeps growing, and I find that every query means dragging out the entire list. My little arms and legs can hardly carry it any more. So, on a dark and windy night, I take out the snail noodles again. Aha!

I split the data at boundaries of a thousand records, and I call each chunk of a thousand records a page. Yes, I came up with the idea again: I divided all the data into pages and connected the pages with a doubly linked list.

Now, instead of dragging out all the data at once for every query, I can go through it page by page. Inside each page, the records are still grouped as before. Here's how I find data now:

  1. Every search still starts from the first page.
  2. Inside a page I look up the record with the binary-search-over-slots method from before; if it isn't there, I follow the linked list to the next page.

As a result, lookups aren't actually any faster; I just go page by page instead of dragging out all the data at once. But at least my arms are free.
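A rough sketch of this stage, with each "page" holding up to a thousand ids and pages chained into a doubly linked list; the lookup still starts at page one and walks forward, which is why it is no faster overall (the names and the simplified in-page search are mine):

```python
import bisect

class DataPage:
    """A page of up to ~1000 sorted ids, chained to its neighbours."""
    def __init__(self, ids):
        self.ids = sorted(ids)
        self.prev = None
        self.next = None

    def contains(self, target_id):
        # Stand-in for the slot/group binary search shown earlier.
        i = bisect.bisect_left(self.ids, target_id)
        return i < len(self.ids) and self.ids[i] == target_id

def link_pages(chunks):
    """Build the doubly linked list of pages and return the first page."""
    pages = [DataPage(chunk) for chunk in chunks]
    for a, b in zip(pages, pages[1:]):
        a.next, b.prev = b, a
    return pages[0]

def find(first_page, target_id):
    """No directory yet: start at page one and move page by page."""
    page = first_page
    while page is not None:
        if page.ids and target_id <= page.ids[-1]:
            return page.contains(target_id)  # it could only be on this page
        page = page.next
    return False

first = link_pages([range(1, 1001), range(1001, 2001), range(2001, 3001)])
assert find(first, 1500) and not find(first, 5000)
```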

5

As the company grew bigger and bigger, the User data exploded. There were more and more pages, and the bosses, big and small, started to complain. The big boss said, "I asked you to find one person and it took you an hour? Do you want your year-end bonus or not?" "Alas, that person was on the last page. I flipped and flipped until I nearly died. Life is so hard!" I grumbled inwardly, but I knew things couldn't go on like this. My head may break and my blood may flow, but my year-end bonus must not shrink! Don't ask; if you ask, the answer is snail noodles! Sure enough, snail noodles never fail: there is a solution! I gave each page a unique page number. Then, borrowing the design of a book's table of contents, I made a special kind of page whose contents are nothing but directory entries. I call it the directory page.

You see, now I can go through the directory page to find the corresponding data page. For example, say I'm now looking for the record with ID 500:

  1. I go through the directory page entries and learn that the record is on page 1.
  2. Then I look up the record within that page with the good old binary search!

Some people may worry that a directory page has a size limit too; what if the entries don't fit? No problem, it's the same as with data pages: just add another directory page!

Others will probably ask: once there are many directory pages, searching the directory pages becomes slow too! You're right, but that doesn't stump clever me! We can simply add another level of directory on top of the directory pages, a directory of directories!

This way, every query starts from the root directory, and in just a few hops I can find the data I want! I call this whole structure an index!
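A rough sketch of the whole thing: each directory entry keeps the smallest id of a child page plus a reference to it, directory pages can themselves sit under a higher-level directory, and a lookup walks down from the root instead of scanning data pages one by one. The class names and the two-level shape are mine; real InnoDB pages carry far more structure.

```python
import bisect

class LeafPage:
    """Bottom level: the actual records (ids only here), kept sorted."""
    def __init__(self, ids):
        self.ids = sorted(ids)
        self.min_id = self.ids[0]

    def find(self, target_id):
        i = bisect.bisect_left(self.ids, target_id)
        return i < len(self.ids) and self.ids[i] == target_id

class DirectoryPage:
    """A directory page: (smallest id of child, child page) entries, sorted."""
    def __init__(self, children):
        self.children = children
        self.keys = [child.min_id for child in children]
        self.min_id = self.keys[0]

    def find(self, target_id):
        # Pick the right-most child whose smallest id <= target, then descend.
        i = max(bisect.bisect_right(self.keys, target_id) - 1, 0)
        return self.children[i].find(target_id)

# Six data pages of a thousand ids each, two mid-level directory pages, one root:
# the tree-shaped structure Boss Y ends up calling a B+ tree.
leaves = [LeafPage(range(start, start + 1000)) for start in range(1, 6001, 1000)]
root = DirectoryPage([DirectoryPage(leaves[:3]), DirectoryPage(leaves[3:])])
assert root.find(500)        # the story's example: id 500 is on the first page
assert not root.find(123456)
```

Each level cuts the candidates down by the directory's fan-out, which is why a few hops from the root are enough even for a huge table.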

Boss Y saw my design and patted my head. "007, nice work. Your structure looks like a tree, and I like to brag, so how about calling this thing a B tree?" "No boss, I don't think a plain B is classy enough; how about a B+ tree?" "Fine. 007, I'll send you your year-end bonus in advance!" Ding! Alipay payment received: 0.1 yuan. Me: "…" "Oh right, 007, these User IDs are too hard to remember. Next time I'll just give you a name and have you find the record." Inside me: "@#%^&%……" The story isn't finished; stay tuned ~


Ha ha, this is my first time writing this kind of story, and it was a bit of fun. It covers how MySQL's InnoDB clustered index is designed and how MySQL finds data. I remember an interviewer at Ali asking me this very question, wanting me to explain in as much detail as possible how data is located through the index. Heh heh, now you know. However, because it's a story, some of the descriptions are inaccurate, such as the "one page per thousand records" part. Below I correct them and add a few points; please take note!

  • MySQL pages default to 16 KB, so a page holds as many records as fit into that size, not a fixed count.
  • Within a page, records are linked with a singly linked list; between pages, the links form a doubly linked list.
  • When a page is full, inserting another record means splitting its records across two pages (a toy sketch follows this list).
  • For example, if you insert into the middle of a full page, records get pushed into the next page; if that page is also full, the splits cascade, which hurts performance. That is why ordered (e.g. auto-increment) primary keys are recommended: new rows then never land in the middle of a page.
  • By default, every MySQL page contains a minimum record and a maximum record (the infimum and supremum) that store no user data, much like dummy nodes in a linked list.
  • Besides the records, a page also stores metadata such as the FileHeader and PageHeader; detail at that level you might remember today and forget in two weeks, and so will I.
  • An individual record has a lot of internal detail as well.
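A toy sketch of the page-split point above, with a deliberately tiny capacity so the behaviour is easy to see; real InnoDB splits operate on 16 KB pages and are considerably more involved, so treat this purely as an illustration.

```python
import bisect

class TinyPage:
    CAPACITY = 4  # toy value; a real InnoDB page is bounded by 16 KB, not a record count

    def __init__(self, ids=None):
        self.ids = sorted(ids or [])
        self.next = None

    def insert(self, id_):
        """Insert in id order; if the page overflows, move the upper half to a new page."""
        bisect.insort(self.ids, id_)
        if len(self.ids) <= self.CAPACITY:
            return None
        new_page = TinyPage(self.ids[self.CAPACITY // 2:])
        self.ids = self.ids[:self.CAPACITY // 2]
        new_page.next, self.next = self.next, new_page
        return new_page  # the directory level above would also need a new entry

# A monotonically increasing id only ever lands on (and splits) the tail page,
# while a random id can force a split in the middle of the page chain.
page = TinyPage([10, 20, 30, 40])
assert page.insert(25) is not None
```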

Most of you probably know InnoDB's B+ tree index design and can say why B+ trees are used, but few people ever mention the binary search inside the page. In fact, this design is very common: the Kafka index also uses binary search; generally, whenever the data is large and ordered, binary search shows up. The next time an interviewer asks, tell them this and watch them go "wow". By the way, what I described above is only the general layout of the index structure. If you want the details, you can read "Understanding MySQL from the Roots", for example to see exactly which fields the FileHeader has. Personally, though, I don't think you need to go that deep; I can't remember it all anyway, and grasping the essence is enough. If you like this kind of story-style article, leave a comment to let me know and I'll write more of them.