Contents

  • Preface
  • Index-related questions
  • Transaction-related questions
  • Table structure design
  • Storage engine questions
  • Miscellaneous questions
  • References

Preface

This article is aimed mainly at developers, so it does not cover MySQL deployment or other operations topics. It is also fairly long, so bring some patience, and perhaps some melon seeds and mineral water.

A while ago, I systematically studied MySQL and had some practical operation experience. When I happened to see an interview article related to MySQL, I found that I couldn’t answer some questions well. Although I knew most of the knowledge points, I couldn’t connect them together.

So I decided to put together this MySQL Soul 100 Questions, answering the questions myself so that I understand each knowledge point a little more deeply.

This article does not explain MySQL in detail starting from how to use select. It focuses on the MySQL knowledge that developers need, including indexes, transactions, and optimization, presented as answers to questions that come up frequently in interviews. If you have other MySQL interview questions that you find interesting or difficult, you can leave them in the comments or email them to [email protected], and I will include them in this article with your name.

Most of the content here, especially indexes and transactions, is explained in more detail in the following articles, for those who want to go deeper:

MySQL indexing is extremely optimized

Mysql transaction extreme isolation level

Index-related questions

For more on indexes, see the article on MySQL index principles and optimization.

1. What is an index?

An index is a data structure that helps us quickly find data.

2. What data structure is an index?

The data structure of an index depends on the implementation of the specific storage engine. The indexes used in MySQL include hash indexes, B+ tree indexes, and others. The default index of the InnoDB storage engine, which we use most often, is the B+ tree index.

3. What’s the difference between a Hash index and a B+ tree?

First, understand the underlying implementation of Hash indexes and B+ tree indexes:

A hash index is built on a hash table. To look up data, you apply the hash function once to the search key to locate the matching index entry, and then go back to the table to fetch the actual row. A B+ tree is a multi-way balanced search tree. Every lookup starts from the root node and reaches the key at a leaf node, and the query then decides whether it needs to go back to the table for the row data.

So you can see that they have the following differences:

  • Hash indexes are (in general) faster for equality queries, but they cannot serve range queries.

Because a hash index stores entries by hash value, the order of the index entries has nothing to do with the order of the original keys, so range queries cannot be supported. All nodes of a B+ tree follow an ordering rule (the left child is smaller than its parent and the right child is larger; multi-way trees follow the same idea), which naturally supports ranges.

  • Hash indexes do not support sorting by the index.

  • Hash indexes do not support fuzzy (prefix) queries or left-most prefix matching on multi-column indexes. The reason is again that hash output is unpredictable: the hash values of AAAA and AAAAB have no correlation.

  • A hash index always has to go back to the table to fetch the row, whereas a B+ tree can answer a query from the index alone when certain conditions are met (clustered index, covering index, and so on).

  • Hash indexes, while fast for equality queries, are not stable; their performance is unpredictable. When a key value has a large number of duplicates, hash collisions occur and efficiency may be very poor. The query efficiency of a B+ tree is relatively stable: every query goes from the root node to a leaf node, and the height of the tree is low.

Therefore, in most cases, choosing a B+ tree index directly gives stable and good query speed, and there is no need to use a hash index.
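The difference between the two structures can be sketched in a few lines of Python. This is only a toy model, not how MySQL stores indexes: a dict stands in for the hash index, a sorted list searched with bisect stands in for the ordered leaves of a B+ tree, and the table contents and function names are made up for illustration.

```python
import bisect

# Toy table: row id -> (name, age). All values are invented for the example.
rows = {1: ("ann", 31), 2: ("bob", 19), 3: ("cat", 25), 4: ("dan", 19), 5: ("eve", 42)}

# Hash-style index on age: key -> list of row ids. Keys have no order.
hash_index = {}
for row_id, (_, age) in rows.items():
    hash_index.setdefault(age, []).append(row_id)

# Ordered index on age: (age, row_id) pairs kept sorted, like B+ tree leaves.
ordered_index = sorted((age, row_id) for row_id, (_, age) in rows.items())


def hash_equal(age):
    """Equality lookup: a single hash probe."""
    return sorted(hash_index.get(age, []))


def ordered_range(low, high):
    """Range lookup low <= age < high: binary search, then a sequential read."""
    start = bisect.bisect_left(ordered_index, (low,))
    end = bisect.bisect_left(ordered_index, (high,))
    return [row_id for _, row_id in ordered_index[start:end]]


def hash_range(low, high):
    """The hash index has no order, so a range query must test every key."""
    return sorted(r for age, ids in hash_index.items() if low <= age < high for r in ids)
```

Both structures can answer the range query, but only the ordered one can do it by seeking to the first match and reading neighbours; the hash version has to visit every key.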

4. As mentioned above, a B+ tree does not need to go back to the table when the query is served by a clustered index or a covering index. What is a clustered index?

In a B+ tree index, a leaf node may store only the current key value, or it may store the key value together with the entire row. These are the non-clustered index and the clustered index, respectively. In InnoDB, only the primary key index is a clustered index. If there is no primary key, a unique key is chosen to build the clustered index. If there is no unique key either, a key is generated implicitly to build the clustered index.

When a query uses the clustered index, the entire row can be read at the matching leaf node, so there is no need to go back to the table.

5. Will non-clustered indexes always be queried back into the table?

Not necessarily. It depends on whether all the fields the query needs are contained in the index. If they are, there is no need to go back to the table.

For a simple example, suppose the employee table has an index on age. For the query select age from employee where age < 20; the leaf nodes of the index already contain age, so no lookup back to the table is performed.
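To make the idea of going back to the table concrete, here is a small Python model. It assumes a clustered index that maps the primary key to the full row and a secondary index on age whose entries hold only (age, primary key); the table contents and the function name are hypothetical, and a real engine is of course far more involved.

```python
import bisect

# Clustered index: primary key -> full row (the leaf holds the whole row).
clustered = {
    1: {"id": 1, "name": "ann", "age": 31},
    2: {"id": 2, "name": "bob", "age": 19},
    3: {"id": 3, "name": "cat", "age": 18},
}

# Secondary index on age: sorted (age, primary key) pairs, no other columns.
age_index = sorted((row["age"], pk) for pk, row in clustered.items())


def select_where_age_lt(columns, limit_age):
    """Return (result rows, number of lookups back to the clustered index)."""
    end = bisect.bisect_left(age_index, (limit_age,))
    results, back_to_table = [], 0
    for age, pk in age_index[:end]:
        entry = {"age": age, "id": pk}  # what the index leaf already holds
        if all(col in entry for col in columns):
            results.append(tuple(entry[col] for col in columns))
        else:
            back_to_table += 1  # fetch the full row via the primary key
            row = clustered[pk]
            results.append(tuple(row[col] for col in columns))
    return results, back_to_table
```

Selecting only age is served entirely from the secondary index, while adding name to the column list forces one extra lookup per matching row.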

6. What are the factors to consider when building an index?

When building an index, consider how often each field is used: fields that frequently appear in query conditions are the better candidates. For a composite (joint) index, you also need to consider the order of its columns. There are other things to consider as well, such as not creating too many indexes, which puts pressure on the table when it is written. It all depends on the actual table structure and on how the queries are made.

7. What is a composite index? Why does the order of its columns matter?

MySQL can build one index over several fields at once, called a composite index (also known as a joint or multi-column index). For a query to hit a composite index, its conditions must match the fields one by one in the order in which they were listed when the index was created; otherwise the index cannot be matched.

The specific reason is as follows:

MySQL needs the index to be ordered in order to use it. Suppose a composite index is created on (name, age, school). The entries are sorted by name first; entries with the same name are sorted by age; and entries with the same name and age are sorted by school.

At query time, the index as a whole is strictly ordered only by name, so the name field must first be matched with an equality condition. Among the matching entries, the rows are then strictly ordered by age, so the age field can be used for an index search, and so on. Therefore, pay attention to the column order when creating a composite index. In general, put the columns that are queried most often or that have high selectivity first. Further adjustments can be made case by case, depending on the specific queries or the table structure.
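The ordering argument above can be sketched with sorted Python tuples standing in for the entries of a multi-column index on (name, age, school). The data and the helper names seek_prefix and scan_school are invented for this illustration.

```python
import bisect

# Multi-column index on (name, age, school): entries sorted column by column.
index = sorted([
    ("ann", 20, "mit"), ("ann", 22, "cmu"), ("bob", 19, "mit"),
    ("bob", 22, "ucl"), ("cat", 20, "cmu"),
])


def seek_prefix(*prefix):
    """Binary-search all entries whose leading columns equal `prefix`."""
    start = bisect.bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # sorted order guarantees there is no later match
        matches.append(entry)
    return matches


def scan_school(school):
    """No leading column given: the order is useless, every entry is checked."""
    return [entry for entry in index if entry[2] == school]
```

A lookup on name, or on name and age, can seek straight to a contiguous run of entries. A lookup on school alone finds matches scattered through the index, which is why the leading columns have to be supplied first.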

8. Is the created index used? Or how do I know why this statement is running slowly?

MySQL provides the EXPLAIN command to view the execution plan of a statement. Before executing a statement, MySQL passes it through the query optimizer and produces an analysis of the statement, known as the execution plan, which contains a lot of information. You can tell whether an index is hit from the possible_keys, key, and key_len fields, which describe the indexes that could be used, the index actually used, and the length of the index used.

9. In what cases can an index be created for a column but not used in the query?

  • The condition is a not-equal comparison.
  • The column takes part in a mathematical operation or a function call.
  • A LIKE pattern starts with a wildcard, for example '%aaa'.
  • MySQL estimates that a full table scan is faster than using the index.
  • With a composite index, the condition on the first column is a range query; the condition on the second column then cannot use the index, even though the query follows the left-most prefix rule.

In these cases, MySQL cannot use the index (or chooses not to).
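Checking a plan and seeing an index go unused can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN, its output has no possible_keys, key, or key_len columns, and its optimizer makes its own choices. The employee table and the idx_age index are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_age ON employee (age)")
conn.executemany(
    "INSERT INTO employee (name, age) VALUES (?, ?)",
    [("emp%d" % i, 18 + i % 40) for i in range(200)],
)


def plan(sql):
    """Return SQLite's plan text (a stand-in for reading MySQL's EXPLAIN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Plain comparison on the indexed column: the index can be searched.
uses_index = plan("SELECT * FROM employee WHERE age = 20")
# The column takes part in arithmetic: the index on age cannot be searched.
no_index = plan("SELECT * FROM employee WHERE age + 1 = 21")
print(uses_index)
print(no_index)
```

In MySQL the equivalent check is to run EXPLAIN on both statements and compare the key column of the output.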

Transaction-related questions

1. What are transactions?

The classic example for understanding transactions is a bank transfer. I believe everyone is familiar with it, so I will not go over it again here.

A transaction is a series of operations that conform to ACID properties. The most common understanding is that the operations in a transaction either all succeed or all fail. But that’s not enough.

2. What is ACID? Can you explain it in detail?

A=Atomicity

Atomicity, as I said above, is all or nothing. It is impossible to perform only part of the operation.

C=Consistency

The system (database) always moves from one consistent state to another; there is no intermediate state.

I=Isolation

Isolation: in general, the changes made by a transaction are not visible to other transactions until it is fully committed. Note the emphasis on "in general": there are exceptions.

D=Durability

Durability: once a transaction is committed, its changes are kept permanently; even a system crash does not affect the outcome of the transaction.
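The transfer example and the all-or-nothing behaviour can be sketched as follows. The code uses Python's built-in sqlite3 module as a stand-in database, so it illustrates atomicity in general rather than anything specific to InnoDB; the account table, the names, and the transfer function are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances():
    """Current balances as a dict, for inspection."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

If the check fails after the first update has already credited the receiver, the rollback undoes that credit too, so the database never shows money that came from nowhere.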

3. What about having multiple transactions going on at the same time?

The concurrent execution of multiple transactions generally causes the following problems:

  • Dirty read: transaction A reads uncommitted content from transaction B, and transaction B rolls back later.
  • Non-repeatable read: even if transaction A may only read what transaction B has committed, the same query run twice within transaction A can return different results, because transaction B committed in between.
  • Phantom read: transaction A reads a range of rows while transaction B inserts a row into that range in the meantime, so A sees a row appear out of nowhere, like a "hallucination".

4. How to solve these problems? MySQL transaction isolation level

The four isolation levels of MySQL are as follows:

  • READ UNCOMMITTED

This is the exception mentioned under isolation above: at this level, other transactions can see changes that a transaction has not yet committed. This causes dirty reads (reading uncommitted changes of another transaction that are later rolled back).

The performance advantage of this level is not large enough to outweigh its many problems, so it is rarely used.

  • READ COMMITTED

Other transactions can only read the committed part of the transaction. This isolation level has the problem of non-repeatable reads, where two reads in the same transaction get different results because another transaction has modified the data.

  • REPEATABLE READ

The repeatable read isolation level solves the non-repeatable read problem above (as the name indicates), but a new problem remains: phantom reads. Suppose you read the rows with id > 10 and the read locks all the rows involved. If another transaction then inserts a new row with id = 11, the insert does not conflict with those locks, because the row is new. The next query in the current transaction then finds a row with id = 11 that the previous query did not return, and if the current transaction tries to insert id = 11 itself, it gets a primary key conflict.

  • SERIALIZABLE

This is the highest isolation level and solves all of the problems mentioned above, because it forces all operations to be executed serially, which leads to a very rapid decline in concurrency performance and is therefore not very common.
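The difference between READ COMMITTED and REPEATABLE READ described above can be illustrated with a toy versioned store. This is a deliberately simplified model resting on one assumption: reads are served from committed versions, READ COMMITTED looks at the latest commit on every read, and REPEATABLE READ keeps the snapshot taken when the transaction started. The Store and Transaction classes are invented for the sketch and do not model locking or phantom reads.

```python
class Store:
    """Toy versioned store: a list of (commit_time, value), oldest first."""

    def __init__(self, value):
        self.clock = 0
        self.versions = [(0, value)]

    def commit(self, value):
        """Another transaction commits a new value."""
        self.clock += 1
        self.versions.append((self.clock, value))

    def read_as_of(self, time):
        """Newest value committed at or before `time`."""
        return [v for t, v in self.versions if t <= time][-1]


class Transaction:
    def __init__(self, store, level):
        self.store, self.level = store, level
        self.start_time = store.clock  # snapshot point for REPEATABLE READ

    def read(self):
        if self.level == "READ COMMITTED":
            return self.store.read_as_of(self.store.clock)  # latest commit
        if self.level == "REPEATABLE READ":
            return self.store.read_as_of(self.start_time)  # fixed snapshot
        raise ValueError("level not modelled: " + self.level)
```

When another transaction commits between two reads, the READ COMMITTED transaction sees the new value on its second read, while the REPEATABLE READ transaction keeps seeing the value it started with.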

5. What isolation level does InnoDB use?

InnoDB uses repeatable read isolation by default.

6. Do you know about MySQL locks?

When a database has concurrent transactions, data inconsistencies may occur, and some mechanism is needed to ensure the order of access. The locking mechanism is such a mechanism.

It is like a hotel room: if people could go in and out at will, many people would end up fighting over the same room. So a lock is installed on the door. Only the person who gets the key can enter and lock the room, and others can use it only after that person has finished.

7. What locks does MySQL have? Doesn't locking like the above hurt concurrency?

By type, locks are divided into shared locks and exclusive locks.

Shared lock: also known as read lock. When the user wants to read the data, a shared lock is placed on the data. Multiple shared locks can be added simultaneously.

Exclusive lock: also known as a write lock. When a user wants to write data, an exclusive lock is placed on the data. Only one exclusive lock can be held at a time, and it is mutually exclusive with other exclusive locks and with shared locks.

In the hotel example above, there are two kinds of user behavior. One is viewing the room, and it is acceptable for several users to view it together. The other is actually staying for the night, during which neither people who want to stay nor people who want to view the room are allowed in.
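The rules for shared and exclusive locks amount to a small compatibility table. The sketch below encodes it in Python; "S" and "X" are the usual abbreviations for shared and exclusive, and the can_grant helper is a made-up name for the illustration.

```python
# Compatibility of a requested lock with a lock already held by someone else.
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share the data
    ("S", "X"): False,  # a reader must wait for the writer
    ("X", "S"): False,  # a writer must wait for the readers
    ("X", "X"): False,  # only one writer at a time
}


def can_grant(requested, held):
    """True if `requested` ("S" or "X") conflicts with none of the held locks."""
    return all(COMPATIBLE[(requested, mode)] for mode in held)
```

In the hotel picture, any number of viewers map to "S" locks that coexist, while an overnight guest maps to an "X" lock that excludes everyone else.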

The granularity of locking depends on the specific storage engine. Across MySQL storage engines there are row-level, page-level, and table-level locks; InnoDB itself provides row-level and table-level locking.

From row level to page level to table level, locking overhead goes from large to small, and concurrency also goes from high to low.

Table structure design

1. Why try to have a primary key?

Primary keys ensure the uniqueness of data rows in the entire table. Even if the business data has no natural primary key, it is recommended to add an auto-increment ID column as the primary key. With a primary key in place, later deletes, updates, and queries may be faster, and the range of data an operation touches is safely bounded.

2. Does the primary key use an autoincrement ID or a UUID?

It is recommended to use the autoincrement ID instead of the UUID.

In the InnoDB storage engine, the primary key index is the clustered index: the leaf nodes of the primary key B+ tree store the primary key together with the full row data, in key order. With an auto-increment ID, new rows simply keep being appended at the end. With a UUID, the value of an incoming ID has no fixed order relative to the existing ones, so inserts cause a lot of data movement and then a lot of fragmentation, which in turn degrades insert performance.

In general, in the case of large data volumes, the performance is better with auto-increment primary keys.
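The effect of key order on inserts can be imitated with a sorted Python list standing in for the ordered leaf level of the clustered index. This is only an analogy: real InnoDB pages split and fill differently, random 128-bit integers are used here as a stand-in for UUIDs, and count_mid_inserts is a hypothetical helper.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys into a sorted list; count inserts that do not land at the end."""
    ordered, mid_inserts = [], 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position != len(ordered):
            mid_inserts += 1  # existing entries after `position` must shift
        ordered.insert(position, key)
    return mid_inserts


rng = random.Random(0)  # fixed seed so the run is repeatable
auto_increment_keys = list(range(1, 1001))
random_keys = [rng.getrandbits(128) for _ in range(1000)]  # stand-in for UUIDs

print(count_mid_inserts(auto_increment_keys), count_mid_inserts(random_keys))
```

Increasing keys always land at the end, so nothing has to move; randomly distributed keys almost always land somewhere in the middle.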

(Figure omitted: a performance comparison of inserting 100W (1 million) and 300W (3 million) rows; the table whose name ends in _uuid is the test that uses a UUID primary key.)

As for the primary key being the clustered index: if there is no primary key, InnoDB selects a unique key as the clustered index, and if there is no unique key either, it generates an implicit primary key.

If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.

If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.

3. Why is the field required to be not NULL?

The MySQL official documentation puts it this way:

NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte.

NULL values take up extra space and cause many unexpected mismatches in your program.

4. What fields should be used to store user password hashes?

Fixed-length strings such as password hashes, salts, and user ID numbers should be stored as char rather than varchar, to save space and improve retrieval efficiency.

Storage engine questions

1. Which storage engines does MySQL support?

MySQL supports multiple storage engines, such as InnoDB, MyISAM, Memory, and Archive. In most cases InnoDB, the default storage engine of MySQL, is the best choice.

2. What is the difference between InnoDB and MyISAM?

  • InnoDB supports transactions; MyISAM does not.
  • InnoDB supports row-level locking, while MyISAM supports only table-level locking.
  • InnoDB supports MVCC, while MyISAM does not.
  • InnoDB supports foreign keys, while MyISAM does not.
  • Before MySQL 5.6, InnoDB did not support full-text indexes while MyISAM did; since 5.6, InnoDB supports them as well.

Miscellaneous questions

1. What is the difference between char and varchar in MySQL?

char is a fixed-length type: if you declare char(10), the column occupies space for 10 characters no matter how much content is actually stored. varchar is variable length: the declared length is only the maximum. The space used is the actual length plus 1, where the extra byte records how much space the value uses (columns that can exceed 255 bytes need 2 length bytes).

In terms of retrieval efficiency, char > varchar. So use char when the length of a field's values is fixed, and otherwise prefer varchar. For example, use char to store the MD5 hash of a user's password.
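The storage rule can be written down as two small functions. The sketch assumes a character set with one byte per character (multi-byte character sets change the numbers), and the function names are made up for the illustration.

```python
def char_storage(declared_length, value):
    """CHAR(n): always n bytes, shorter values are padded (1 byte per character)."""
    if len(value) > declared_length:
        raise ValueError("value too long for CHAR(%d)" % declared_length)
    return declared_length


def varchar_storage(declared_length, value):
    """VARCHAR(n): actual length plus a 1- or 2-byte length prefix."""
    if len(value) > declared_length:
        raise ValueError("value too long for VARCHAR(%d)" % declared_length)
    prefix = 1 if declared_length <= 255 else 2
    return len(value) + prefix
```

A three-character value costs 10 bytes in char(10) but 4 bytes in varchar(10), while a value that always fills the declared length is one byte cheaper in char.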

2. What do varchar(10) and int(10) represent?

In varchar(10), the 10 is the requested length, which is also the maximum length of data that can be stored. In int(10), the 10 is only the display width: values shorter than 10 digits are padded with zeros (when the column is declared ZEROFILL). That is, int(1) and int(10) can store the same range of numbers and occupy the same amount of space; they differ only in how values are displayed.
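The point that the width affects only display can be mimicked in Python. The display function below is a hypothetical helper that imitates how a ZEROFILL column pads a value; it is not a MySQL API.

```python
def display(value, width, zerofill=False):
    """Render a non-negative integer the way a ZEROFILL column of `width` would."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    text = str(value)
    return text.zfill(width) if zerofill else text
```

A value wider than the display width is shown in full rather than truncated, which is why int(1) and int(10) hold the same numbers.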

3. How many binlog formats does MySQL have? What is the difference between them?

There are three formats, Statement, Row and mixed.

  • In statement mode, the unit of recording is the statement: every SQL statement that modifies data is recorded. Because SQL execution depends on context, the relevant context has to be saved along with the statement, and some statements that use certain functions cannot be recorded and replicated correctly.
  • In row mode, the unit of recording is the change to each row. Basically every change can be recorded, but many operations change a large number of rows (such as alter table), which makes the log large.
  • Mixed. A compromise where statement records are used for normal operations and row records are used when statement is not available.

In addition, the row level has been optimized in the new version of MySQL to record statements instead of row by row when table structure changes.
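Why some statements cannot be replicated safely in statement mode can be shown with a toy model. Two differently seeded random generators stand in for a function such as UUID() that returns a different value on every server; the table layout and the replicate function are invented for the sketch, and nothing here reads or writes a real binlog.

```python
import random


def apply_statement(table, rng):
    """Like `UPDATE t SET token = UUID()`: the value depends on where it runs."""
    for row in table:
        row["token"] = rng.getrandbits(64)


def replicate(fmt):
    """Return True if the replica ends up identical to the primary."""
    primary = [{"id": 1, "token": None}, {"id": 2, "token": None}]
    replica = [dict(row) for row in primary]
    apply_statement(primary, random.Random(1))  # executes on the primary
    if fmt == "statement":
        # The statement text is logged and executed again on the replica.
        apply_statement(replica, random.Random(2))
    elif fmt == "row":
        # The changed row images are logged and copied as they are.
        replica = [dict(row) for row in primary]
    else:
        raise ValueError("unknown format: " + fmt)
    return primary == replica
```

Replaying the statement text produces different values on the replica, while copying row images cannot diverge; mixed mode exists to switch to row records in exactly such cases.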

4. How do you handle very large pagination offsets?

Large paging is generally handled in two directions.

  • The database level, which is our main focus here (even though its effect is limited). A statement like select * from table where age > 20 limit 1000000,10 leaves room for optimization. It has to load 1,000,000 rows and then discards basically all of them, so fetching 10 rows is of course slow. We can change it to select t.* from table t join (select id from table where age > 20 limit 1000000,10) x on t.id = x.id (written as a join on a derived table because MySQL does not allow LIMIT directly inside an IN subquery). This still walks a million entries, but thanks to the covering index every field the inner query needs is in the index, so it is very fast. And if the IDs are continuous, we can also use select * from table where id > 1000000 limit 10, which is efficient as well. There are many possible optimizations, but the core idea is the same: reduce the amount of data loaded.
  • Reduce such requests from the requirements side. The main thing is to avoid this kind of requirement (jumping directly to a specific page millions of rows in) and to allow only page-by-page viewing or navigation along a given path, which is predictable and cacheable. This also guards against ID leaks and sustained malicious attacks.
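The three query shapes from the database-level approach can be compared on a small table. The sketch runs on Python's built-in sqlite3 module as a stand-in, so it only shows that the rewritten queries return the same page, not how fast MySQL executes them. The person table, the index name, and the added ORDER BY id (needed to make the pages deterministic) are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO person (id, age) VALUES (?, ?)",
    [(i, 18 + i % 50) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_person_age ON person (age)")


def page_by_offset(offset, size):
    """Plain LIMIT/OFFSET: walks and discards `offset` rows first."""
    sql = "SELECT id, age FROM person WHERE age > 20 ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (size, offset)).fetchall()


def page_by_id_join(offset, size):
    """Pages over ids only, then joins back to fetch the full rows."""
    sql = (
        "SELECT p.id, p.age FROM person p JOIN ("
        "  SELECT id FROM person WHERE age > 20 ORDER BY id LIMIT ? OFFSET ?"
        ") x ON p.id = x.id ORDER BY p.id"
    )
    return conn.execute(sql, (size, offset)).fetchall()


def page_after_id(last_id, size):
    """Seeks past the last id already shown instead of counting an offset."""
    sql = "SELECT id, age FROM person WHERE age > 20 AND id > ? ORDER BY id LIMIT ?"
    return conn.execute(sql, (last_id, size)).fetchall()
```

The last variant only works for page-by-page navigation, because it needs the last id of the previous page; that matches the requirement-side advice of not allowing jumps to arbitrary pages.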

In fact, to solve the problem of large paging, we mainly rely on cache. We can check the content in advance predictably, cache it to redis and other K-V databases, and return it directly.

In the Alibaba Java Development Manual, the solution to large paging is similar to the first one mentioned above.

5. Do you pay attention to SQL execution time in your business systems? Do you collect statistics on slow queries? How do you optimize slow queries?

In business systems, apart from queries by primary key, I measure every query's execution time on a test database. Slow query statistics are mainly collected by the operations team, who regularly report the slow queries in the business back to us.

To optimize a slow query, first work out why it is slow. Does the query condition fail to hit an index? Does it load columns that are not needed? Or is there simply too much data?

So optimization proceeds in those three directions:

  • First analyze the statement to see whether it loads extra data: it may fetch extra rows and then discard them, or load many columns that the result does not need. Analyze the statement and rewrite it.

  • Analyze a statement’s execution plan to see how it uses the index, and then modify the statement or index so that the statement matches the index as closely as possible.

  • If statement optimization is no longer possible, consider whether the amount of data in the table is too large, and if so, split the table horizontally or vertically.

6. Following on from the above, can you give examples suited to horizontal and vertical table splitting?

Horizontal splitting divides a table by row. Suppose we have a user table whose primary key is the user ID. The table holds a lot of data, more than 100 million rows, so queries against the single table perform poorly. We can split the table by primary key ID, either by the trailing digits of the ID or by ID range. If it is split into 100 tables by the last two digits, 00 to 99, each table holds only about 100W (1 million) rows, and query efficiency can then meet the requirements.
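Routing by the trailing digits of the ID can be sketched as a single function. The table naming scheme user_00 to user_99 and the function name are assumptions made for the illustration, not something the original design prescribes.

```python
def shard_table(user_id, shards=100):
    """Pick the table for a user id by its trailing digits (id modulo shards)."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return "user_%02d" % (user_id % shards)
```

With evenly distributed IDs every table receives the same share of rows, and any lookup by user ID touches exactly one table.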

Vertical splitting divides a table by column. Suppose we have an article table with the fields id, title, summary, and content. The system displays articles as a refreshable list that contains only the title and summary; the body content is needed only when the user clicks an article to open its details. If the data volume is large, keeping the large and infrequently used content column in the same table slows down queries on it. We can split the table in two: id-title-summary and id-content. When the user opens the details, the content is fetched by primary key. The only added storage is the small primary key field, so the cost is small.

Of course, table splitting is closely tied to the business, so do your research and benchmarks before splitting. Do not act blindly on your own assumptions.

7. What are stored procedures? What are the pros and cons?

Stored procedures are pre-compiled SQL statements. 1. Put plainly, a stored procedure is a named block of SQL statements that implements some function (inserting, deleting, updating, or querying one or more tables); you give the block a name and call it by that name whenever you need the function. 2. A stored procedure is a precompiled code block, so it executes efficiently. One stored procedure can replace a large number of SQL statements, which reduces network traffic, speeds up communication, and helps data security to a certain extent.

In Internet projects, however, stored procedures are not recommended. A well-known example is the Alibaba Java Development Manual, which prohibits their use. My understanding is that Internet projects iterate very quickly, have short life cycles, and see more frequent staff turnover than traditional projects. Under those conditions stored procedures are harder to manage, and they are less reusable than logic written in the service layer.

8. Describe the three normal forms

First normal form: no column can be split any further. Second normal form: non-primary-key columns depend on the whole primary key, not on just a part of it. Third normal form: non-primary-key columns depend only on the primary key and not on other non-primary-key columns.

When designing a database structure, try to follow the three normal forms, and if you do not, there must be a good reason. In practice, we often compromise on database design for the sake of performance.

9. What is the difference between # and $ in MyBatis?

A slightly odd question has slipped in here. I just want to document it separately because it comes up so often.

# treats the incoming content as a string parameter, while $ splices the incoming value directly into the SQL statement.

So # can protect against SQL injection attacks to a certain extent.
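The same contrast exists in any database API, so it can be demonstrated outside MyBatis. The sketch below uses Python's built-in sqlite3 module: a bound ? parameter plays the role of #, and string concatenation plays the role of $. The users table and both function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_bound(name):
    """Like #: the value is sent as a bound parameter and never parsed as SQL."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def find_spliced(name):
    """Like $: the value is pasted into the SQL text before it is parsed."""
    return conn.execute("SELECT name FROM users WHERE name = '" + name + "'").fetchall()


# Input crafted to close the quote and add a condition that is always true.
malicious = "nobody' OR '1'='1"
```

With the bound parameter the malicious input is just an odd name that matches nothing; with splicing it rewrites the WHERE clause and returns every row.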


The end.

All of the above are my personal thoughts. If you find any mistakes, corrections in the comments are welcome.

Reprints are welcome; please credit the author and keep the original link.

Contact email: [email protected]

For more study notes, see my personal blog or follow the WeChat official account: HuYan ten