Which data structure do MySQL indexes use? Why use a B+ tree as the index?

MySQL indexes use B+ trees.

For this question, first think it through yourself: if you had to design a database index, how would you design it?

As usual, we approach it in three steps: Why? What? How?

Why do you need an index? What is an index? How does an index work?

Then: why do we need B+ trees? What is a B+ tree? How are B+ trees used?

Answer: the main job of most programs is to handle data: writing, querying, transforming, and outputting it. The most vivid analogy for an index is the relationship between a book’s content and its table of contents. The table of contents is the index; with it we can quickly find the page number of the content we want.

Why B+ trees? There are several reasons:

  • If a balanced binary tree such as an AVL tree is used, the tree becomes too tall. An index query has to read from disk, and each node visited costs one disk I/O, so the number of I/O operations per lookup must be kept as small as possible. A B+ tree node holds many keys, so the tree stays shallow.

  • B+ trees are usually compared with B-trees. The key difference is that in a B+ tree every key appears in the leaf nodes (a dense index), and the leaf nodes form an ordered linked list. For a range search such as 15 to 50, a B-tree needs an in-order traversal of the tree, whereas a B+ tree simply walks the leaf nodes sequentially.
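To make the tree-height argument concrete, here is a rough back-of-the-envelope sketch. The row count and the B+ tree fan-out of 1000 children per page are assumptions chosen for illustration, not MySQL constants, and the function only gives an idealized lower bound on the number of levels.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


ROWS = 10_000_000      # assumed table size
BPLUS_FANOUT = 1000    # assumed children per B+ tree page (hypothetical)

# A balanced binary tree (such as AVL) has a fan-out of 2.
binary_levels = levels_needed(ROWS, 2)
bplus_levels = levels_needed(ROWS, BPLUS_FANOUT)

print(binary_levels, bplus_levels)
```

Under these assumptions the binary tree needs about 24 levels and the B+ tree about 3. If each level costs one disk read, that is the difference between roughly 24 I/Os and 3 I/Os per lookup.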

What is the leftmost matching principle?

First of all:

Leftmost prefix matching rule: MySQL follows the leftmost prefix matching rule when using a composite (multi-column) index. That is, a query is matched against the index columns starting from the leftmost column of the composite index.

For example, suppose we have a student table and create a composite index index_magor_class(magor,class) on the college number and the class. This index consists of two columns.

An index is stored as a B+ tree, so a composite index is also a B+ tree; the difference is that each entry in its nodes holds multiple values, one per indexed column.

Example: for the composite index index_magor_class(magor,class), the index tree would look like the following figure.

Entries are sorted first by magor, then by class, and so on for any further columns in the index.

If the query passes only the class, the WHERE condition cannot use the composite index. If it passes only the college number, it may use the composite index. It is only "may" because the optimizer decides: for example, when the table holds very little data, a full scan can be faster than going through the index.
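The sort order explains the rule. The sketch below is a toy model, not MySQL internals: the composite index is represented as a Python list sorted by (magor, class), and the student IDs are made-up data. A lookup by magor can binary-search to one contiguous slice, while a lookup by class alone has to scan every entry.

```python
from bisect import bisect_left

# Toy model of index_magor_class(magor, class): entries sorted by (magor, class).
# Each entry is (magor, class, student_id); all values are invented.
index_entries = sorted([
    (1, 2, 1003), (1, 1, 1001), (2, 1, 1004),
    (1, 1, 1002), (2, 3, 1005), (3, 2, 1006),
])


def find_by_magor(magor):
    """Leftmost column known: binary search narrows to one contiguous slice."""
    lo = bisect_left(index_entries, (magor,))
    hi = bisect_left(index_entries, (magor + 1,))
    return index_entries[lo:hi]


def find_by_class(class_no):
    """Leftmost column unknown: matches are scattered, so every entry is checked."""
    return [entry for entry in index_entries if entry[1] == class_no]


print(find_by_magor(1))
print(find_by_class(2))
```

Note how the entries with class 2 are not adjacent in the sorted list, which is why the index order cannot help when only the class is given.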

How do you design indexes when creating a table? What index optimizations are there?

1. Use a covering index for queries to avoid going back to the table.

Explanation: if you want to know the title of chapter 11 of a book, do you need to turn to the page where chapter 11 starts? Browsing the table of contents is enough; here the table of contents plays the role of a covering index.

For example, if your primary key index is the student number, then select only the student number from the table and no other fields. In general, select only the fields you actually need and avoid select * unless it is really necessary; otherwise the query has to go back to the table.

Example: the index types that can be created are primary key index, unique index, and ordinary index. A covering index is not an index type but a query effect; in the explain output, the extra column shows: using index.

Here is what going back to the table means. Suppose the primary key index of our table is the student number, and we also built an index on the mobile phone number. If the query condition is the mobile phone number, there are two cases:

  • If the only field we select is the student number, the phone number index already contains it, so the data comes straight from that index and there is no need to go back to the table;

  • If we select other fields as well, the query first looks up the student number by phone number, and then uses the student number to fetch the row from the primary key index. This second step is called going back to the table.
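The two cases can be sketched with a toy model in which Python dictionaries stand in for the two index trees. The names `primary_index`, `phone_index`, and `select_by_phone` and all the data are hypothetical; the point is only to count how many index lookups each query needs.

```python
# Toy model: the primary key index maps student number -> full row,
# the secondary index maps phone number -> student number. Data is invented.
primary_index = {
    1001: {"student_no": 1001, "phone": "13800000001", "name": "Li"},
    1002: {"student_no": 1002, "phone": "13800000002", "name": "Wang"},
}
phone_index = {row["phone"]: pk for pk, row in primary_index.items()}


def select_by_phone(phone, columns):
    """Return (result, index_lookups) for: select columns ... where phone = ?"""
    lookups = 1
    student_no = phone_index[phone]          # lookup in the secondary index
    entry = {"phone": phone, "student_no": student_no}
    if all(col in entry for col in columns):
        # Covered: the secondary index already holds every requested column.
        return {col: entry[col] for col in columns}, lookups
    row = primary_index[student_no]          # go back to the table via the primary key
    lookups += 1
    return {col: row[col] for col in columns}, lookups


print(select_by_phone("13800000001", ["student_no"]))
print(select_by_phone("13800000002", ["name"]))
```

Selecting only the student number finishes after one lookup, while selecting the name needs a second lookup in the primary key index.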

2. Build a unique index on any field that is unique in business terms, even when the uniqueness spans a combination of fields. Note: do not assume that a unique index hurts insert speed; the loss is negligible, while the gain in lookup speed is significant. In addition, even if validation at the application layer is very thorough, as long as there is no unique index, Murphy’s law says dirty data will eventually appear.

3. Joins across more than three tables are prohibited. The data types of the joined fields must be identical. When querying across multiple tables, make sure the joined fields are indexed. Note: even for a two-table join, pay attention to the table indexes and SQL performance.

4. When creating an index on a varchar field, you must specify the index length. There is no need to index the whole field; choose the length according to how well the prefix distinguishes the actual text. Note: index length and selectivity trade off against each other. For string data, an index of length 20 generally has a selectivity above 90%. You can measure it with count(distinct left(column name, index length))/count(*).
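The selectivity formula in tip 4 can be tried outside the database. The following is a minimal sketch that applies the same ratio to an in-memory list; the function name `prefix_selectivity` and the sample values are made up for illustration.

```python
def prefix_selectivity(values, prefix_len):
    """Same ratio as count(distinct left(col, prefix_len)) / count(*)."""
    if not values:
        return 0.0
    distinct_prefixes = {value[:prefix_len] for value in values}
    return len(distinct_prefixes) / len(values)


# Invented sample column values.
emails = [
    "alice@example.com",
    "alicia@example.com",
    "bob@example.com",
    "bobby@example.com",
]

for length in (3, 5):
    print(length, prefix_selectivity(emails, length))
```

On this sample a prefix of 3 characters distinguishes only half of the rows, while 5 characters already distinguishes all of them, so a longer index would add size without adding selectivity.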

5. Left-fuzzy or full-fuzzy matching is strictly prohibited in page searches; if it is really needed, use a search engine instead. Note: index files follow the B-tree leftmost prefix matching rule. If the left part of the value is not fixed, the index cannot be used.

6. SQL performance optimization goal: reach at least the range level, require the ref level, and preferably const. Description:

  • 1) const: at most one row in a single table matches (primary key or unique index), and the data can be read during the optimization phase.

  • 2) ref: an ordinary (non-unique) index is used.

  • 3) range: a range scan over the index. Counterexample: the explain result shows type=index, which is a full scan of the index’s physical file and is very slow. This level is lower than range and only marginally better than a full table scan.

7. When building a composite index, put the most selective column on the far left. Example: for where a=? and b=?, if the values of column a are almost unique, an index idx_a alone is enough. Note: if a query mixes equality and non-equality conditions, put the equality column first when building the index. For example, for where c>? and d=?, d must come first in the index even if c is more selective, that is, build the composite index idx_d_c.
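Why the equality column should come first can be seen by counting how many index entries each column order has to visit. This is a toy model with invented data, where a sorted list of tuples stands in for the index; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Invented data: c takes 100 values, d takes 5 values, every combination exists.
rows = [(c, d) for c in range(100) for d in range(5)]


def scanned_with_d_c(c_min, d_val):
    """Index on (d, c): entries with d = d_val and c > c_min are one contiguous slice."""
    index = sorted((d, c) for c, d in rows)
    lo = bisect_right(index, (d_val, c_min))
    hi = bisect_left(index, (d_val + 1,))
    return hi - lo


def scanned_with_c_d(c_min, d_val):
    """Index on (c, d): every entry with c > c_min is visited, then filtered on d."""
    index = sorted(rows)
    lo = bisect_left(index, (c_min + 1,))
    return len(index) - lo


print(scanned_with_d_c(89, 3), scanned_with_c_d(89, 3))
```

For the condition c > 89 and d = 3, the (d, c) order visits only the 10 matching entries, while the (c, d) order visits all 50 entries with c > 89 and discards most of them.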

8. Prevent implicit type conversion caused by mismatched field types, which makes the index unusable.

Have you used MyBatis? Do you understand its level 1 and level 2 caches?

  • Level 1 cache

    The MyBatis level 1 cache is scoped to the SqlSession and is enabled by default. When the same SQL query is executed twice in the same SqlSession, the first execution queries the database and writes the result to the cache, and the second reads directly from the cache. Every query checks the cache first; on a miss, the database is queried and the result is written to the cache. Internally the cache is a HashMap whose key is built from hashcode + statementId + SQL and whose value is the Java object mapped from the query result set. When an insert, update, or delete is executed and committed in the SqlSession between two queries, the SqlSession cache is cleared.

  • Level 2 cache

    The level 2 cache is scoped to the mapper (namespace) and is not enabled by default. The first time a SQL statement under a mapper is called, for example to query user information, the result is stored in the level 2 cache area of that mapper. The second time the same SQL is called through the mapper file of that namespace, the result is read from the corresponding level 2 cache.
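The level 1 cache behaviour described above (check the cache, fall through to the database on a miss, clear on write) can be illustrated with a toy model. This is not MyBatis code: `ToySession`, its methods, and the simplified cache key of statement ID plus parameter are all hypothetical stand-ins.

```python
class ToySession:
    """Toy model of a session-scoped (level 1) cache; not MyBatis code."""

    def __init__(self, database):
        self.database = database   # dict standing in for a table: id -> value
        self.cache = {}            # (statement_id, key) -> cached result
        self.db_hits = 0           # how many times the "database" was queried

    def select(self, statement_id, key):
        cache_key = (statement_id, key)
        if cache_key not in self.cache:
            self.db_hits += 1      # cache miss: query the database
            self.cache[cache_key] = self.database.get(key)
        return self.cache[cache_key]

    def update(self, key, value):
        self.database[key] = value
        self.cache.clear()         # a write empties the whole session cache


session = ToySession({1: "Tom"})
first = session.select("findUser", 1)
second = session.select("findUser", 1)    # served from the cache
hits_before_write = session.db_hits
session.update(1, "Jerry")
third = session.select("findUser", 1)     # cache was cleared, so the database is hit again
```

The second select does not touch the database, and the select after the update sees the new value because the write cleared the cache first.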

MySQL master/slave synchronization: do you understand the binlog?

  • Every change made on the Master database is immediately recorded in its binary log (binlog) file.

  • The Slave database starts an I/O thread that connects to the Master and requests the binary log of the Master’s changes.

  • The Slave I/O thread receives the binary log and saves it in its own relay log file.

  • The Slave has an SQL thread that periodically checks the relay log for changes and applies them to its data.

MySQL database and table sharding: how is it designed?

Why?

When a table reaches tens of millions of rows, queries against it take longer and longer, and a join query on top of that may well hang. The purpose of splitting tables is to reduce the load on the database and shorten query time.

MySQL uses table locks and row locks to ensure data integrity. A table lock means nobody else can operate on the table until I have finished with it. A row lock works the same way for a single row: other SQL statements must wait until I am done with that row before they can operate on it.

When? (When do I need to split tables?)

Splitting databases and tables is recommended only when a single table exceeds 5 million rows or 2GB in size. Note: if the data volume is not expected to reach this level within three years, do not split databases and tables when creating the table.

Counterexample: a business will have only 20,000 rows of data after three years, yet it is split into 1024 tables. Q: Why did you design it this way? A: Isn’t 1024 tables the standard?

How? (Strategies for splitting databases and tables)

Split vertically or horizontally.

Sharding middleware, for reference:

  • Sharding-sphere: formerly sharding-JDBC, the sub-database sub-table middleware from Dangdang;

  • TDDL: a JAR, Taobao Distribute Data Layer;

  • Mycat: middleware.

Note: research the pros and cons of each tool yourself, giving priority to the official websites and communities.

Horizontal split (by rows), common routing rules:

  • Modulo: Angela has seen a setup with 64 tables numbered 0 to 63, routed by userId % 64.

  • Fixed-digit split: take two specified digits of the userId, for example the 2nd and 3rd digits from the end, which gives 00 to 99, 100 tables in total (100 databases, 100 tables).

  • Hash: hash the userId, then take the result modulo the number of tables;

  • Range: split by userId range, for example one table per 10 million userIds. This is used less often because it easily creates hot spots.

Vertical split (by business domain or by columns):

  • Put the tables of different business domains in different databases, for example order tables, user information tables, and marketing tables each in their own database;

  • Store large fields in a separate table

  • Store infrequently used fields in a separate table
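The userId-based routing rules listed above (modulo, fixed digits, hash, range) can each be written as a small function. This is a minimal sketch under assumptions: the function names are hypothetical, the fixed-digit rule assumes the 2nd and 3rd digits from the end, and CRC32 is an arbitrary stand-in for whatever hash a real middleware uses.

```python
import zlib


def route_modulo(user_id: int, table_count: int = 64) -> int:
    """Modulo rule: table number is userId % table_count."""
    return user_id % table_count


def route_fixed_digits(user_id: int) -> str:
    """Fixed-digit rule: 2nd and 3rd digits from the end form a suffix 00..99."""
    return str(user_id).zfill(3)[-3:-1]


def route_hash(user_id: str, table_count: int = 64) -> int:
    """Hash rule: hash first (CRC32 chosen arbitrarily), then take the modulo."""
    return zlib.crc32(user_id.encode("utf-8")) % table_count


def route_range(user_id: int, rows_per_table: int = 10_000_000) -> int:
    """Range rule: one table per block of rows_per_table userIds."""
    return user_id // rows_per_table


print(route_modulo(130), route_fixed_digits(123456), route_range(25_000_000))
```

The range rule also shows why hot spots appear: newly registered users have the largest userIds, so they all land in the last table.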

If userId is used as the sharding key, how do you query by phone number?

Create a phone number index table that stores the mapping from phone number to userId. The lookup works like going back to the table: first query the userId by phone number, then query the data by userId.
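The two-step lookup can be sketched as follows. It is a toy model with in-memory dictionaries in place of real tables; the shard count of 4, the modulo routing, and the names `insert_user` and `find_by_phone` are assumptions for illustration.

```python
TABLE_COUNT = 4  # assumed number of user shards

# User shards routed by userId % TABLE_COUNT; each maps userId -> row.
user_shards = [dict() for _ in range(TABLE_COUNT)]
# Index table: phone number -> userId.
phone_to_user_id = {}


def insert_user(user_id, phone, name):
    """Write the row to its shard and record the phone -> userId mapping."""
    shard = user_shards[user_id % TABLE_COUNT]
    shard[user_id] = {"user_id": user_id, "phone": phone, "name": name}
    phone_to_user_id[phone] = user_id


def find_by_phone(phone):
    """Step 1: phone -> userId. Step 2: userId -> shard -> row."""
    user_id = phone_to_user_id.get(phone)
    if user_id is None:
        return None
    return user_shards[user_id % TABLE_COUNT].get(user_id)


insert_user(10, "13800000001", "Li")
print(find_by_phone("13800000001"))
```

Without the index table, a query by phone number would have to check every shard, because the phone number says nothing about which shard holds the user.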