This article is included in the author's open-source project on GitHub.

Table of contents:

1. Talk about the MySQL architecture diagram

2. How is a SQL query executed in MySQL?

3. How do you optimize SQL in your daily work?

4. How do you read an EXPLAIN execution plan, and what does each field mean?

5. Do you track SQL execution time in your business system? Do you collect slow query statistics? How do you optimize slow queries?

6. Differences between clustered and non-clustered indexes

7. Why use a B+ tree rather than an ordinary binary tree?

8. What is the difference between a Hash index and a B+ tree index? How do you choose when designing an index?

9. What is the leftmost prefix principle? What is the leftmost matching principle?

10. In what scenarios is an index not suitable?

11. What are the advantages and disadvantages of indexes?

12. Have you ever run into a MySQL deadlock, and how did you solve it?

13. What are optimistic and pessimistic database locks, and how do they differ?

14. Are you familiar with MVCC? Do you know its underlying principles?

15. The four properties of MySQL transactions and how they are implemented

16. What are the transaction isolation levels? What is MySQL's default isolation level?

17. What are phantom reads, dirty reads, and non-repeatable reads?

18. What should you do if the MySQL database CPU usage surges?

19. How do you solve MySQL master-slave replication delay?

20. If you were asked to design database and table sharding, briefly describe what you would do.

Database architecture

Talk about the MySQL architecture diagram

When describing MySQL's logical architecture to the interviewer, draw the following diagram if a whiteboard is available.

MySQL's logical architecture is divided into three layers: (1) the first layer handles connection processing, authentication, security, and so on; (2) the second layer parses and optimizes SQL; (3) the third layer is the storage engine.

How does an SQL query execute in MySQL?

  • First, check whether the statement has permission. If it does not, an error message is returned directly. If it does, the query cache is checked first (only in versions before MySQL 8.0).
  • On a cache miss, the parser performs lexical analysis, extracting key elements such as SELECT from the SQL statement, and then checks the statement for syntax errors, such as whether the keywords are correct.
  • Finally, the optimizer determines the execution plan and permissions are verified. If there is no permission, an error message is returned directly; if there is, the storage engine interface is called and the execution result is returned.

SQL optimization

How do you optimize SQL on a daily basis?

This question can be answered in several dimensions:

1. Optimize the table structure

(1) Use numeric fields whenever possible

If a field holds only numeric information, avoid designing it as a character type: doing so reduces query and join performance and increases storage overhead. The engine compares strings character by character when processing queries and joins, whereas a numeric type needs only one comparison.

(2) Use varchar instead of char whenever possible

A variable-length field takes only as much space as the stored value needs, which saves storage space.

(3) If an index column contains a large amount of duplicate data, the index can be dropped

For example, an index on a gender column, whose values are almost exclusively male, female, and unknown, is ineffective.

2. Optimize the query

  • Try to avoid using the != or <> operators in the WHERE clause
  • Try to avoid using OR to join conditions in the WHERE clause
  • Avoid SELECT * in queries
  • Avoid testing fields for NULL in the WHERE clause

3. Index optimization

  • Create indexes on fields used as query conditions and in ORDER BY
  • Avoid creating too many indexes; prefer composite indexes

How do you read an EXPLAIN execution plan, and what does each field mean?

Adding the EXPLAIN keyword before a SELECT statement returns information about its execution plan.

(1) id column: the sequence number of the SELECT statement. MySQL divides SELECT queries into simple queries and complex queries.

(2) select_type column: indicates whether the corresponding row is a simple or complex query.

(3) table column: indicates which table this row of the EXPLAIN output is accessing.

(4) type column: one of the most important columns. It represents the join type or access type, that is, how MySQL decides to look up rows in the table. From best to worst: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

(5) possible_keys column: shows which indexes the query might use.

(6) key column: shows which index MySQL actually uses to optimize access to this table.

(7) key_len column: shows the number of bytes of the index that MySQL uses. This value can be used to work out which columns of the index are used.

(8) ref column: shows the columns or constants that are compared against the index named in the key column to look up rows.

(9) rows column: the number of rows MySQL estimates it must read and examine. Note that this is not the number of rows in the result set.

(10) Extra column: displays additional information, such as Using index, Using where, and Using temporary.
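To get a feel for reading a plan without a MySQL server at hand, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Note the assumptions: SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN and does not produce the columns listed above, and the table and index names (orders, idx_orders_customer) are made up for illustration. The point is only the habit: check the plan, add an index, check again.

```python
import sqlite3

# SQLite stands in for MySQL here; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def plan_details(connection, sql):
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # columns are (id, parent, notused, detail)


plan_before = plan_details(conn, QUERY)  # no index yet: a full scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_details(conn, QUERY)   # the plan now names the index

print("before:", plan_before)
print("after: ", plan_after)
```

In MySQL the equivalent check would be to compare the type and key columns before and after adding the index.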

Do you track SQL execution time in your business system? Do you collect slow query statistics? How do you optimize slow queries?

When writing SQL, we should get into the habit of analyzing it with EXPLAIN. As for slow query statistics, the operations team compiles them and sends them to us regularly.

Ideas for optimizing slow queries:

  • Analyze the statement to see whether unnecessary fields or data are loaded
  • Analyze the SQL execution plan to see whether an index is hit
  • If the SQL is complex, simplify its structure
  • If the table holds too much data, consider splitting it into multiple tables

Indexes

The difference between clustered and non-clustered indexes

You can answer in the following four dimensions:

(1) A table can have only one clustered index, but it can have multiple non-clustered indexes.

(2) For a clustered index, the logical order of the key values determines the physical order of the corresponding rows in the table; for a non-clustered index, the logical order of the index entries differs from the physical storage order of the rows on disk.

(3) An index is described by a tree data structure. We can understand a clustered index as follows: the leaf nodes of the index are the data nodes. The leaf node of a non-clustered index is still an index node, but it holds a pointer to the corresponding data block.

(4) Clustered index: physical storage is sorted by the index. Non-clustered index: physical storage is not sorted by the index.

Why use a B+ tree rather than an ordinary binary tree?

You can look at it from several dimensions: whether queries are fast enough, whether efficiency is stable, how much data can be stored, and how many disk reads a lookup needs. Then answer in turn: why not an ordinary binary tree, why not a balanced binary tree, and why a B+ tree rather than a B tree?

(1) Why not an ordinary binary tree?

If a binary tree degenerates into a linked list, a lookup is equivalent to a full table scan. Compared with an ordinary binary search tree, a balanced binary tree has more stable search efficiency and a faster overall search speed.

(2) Why not a balanced binary tree?

We know that querying data in memory is much faster than querying data on disk. If a tree is used as the index, each step of a lookup reads one node from disk, which corresponds to one disk block. A balanced binary tree stores only one key and its data per node, whereas a B tree node can store many more keys, so the tree is shorter, fewer disk reads are needed, and queries are faster.

(3) Why a B+ tree rather than a B tree?

B+ tree non-leaf nodes do not store data, only keys, while B tree nodes store both keys and data. The default page size in InnoDB is 16KB. If a node does not store data, it can hold more keys, so the order of the tree is larger and the tree is shorter and fatter. Fewer disk I/O operations are then needed to find the data, and queries are faster.

In a B+ tree index, all data is stored in the leaf nodes, which are kept in sorted order and linked together as a list. This makes range queries, sorted queries, grouped queries, and deduplicating queries very simple.
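The "shorter and fatter" argument can be made concrete with some rough arithmetic. The sketch below takes the 16KB page size from the text; everything else is an assumption for illustration only: an 8-byte key, a 6-byte child pointer, and a 1KB average row. It counts only leaf-level capacity, so treat the results as estimates, not exact InnoDB figures.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size, as stated in the text

# Assumptions for illustration only:
KEY_BYTES = 8       # e.g. a BIGINT key
POINTER_BYTES = 6   # pointer to a child page
ROW_BYTES = 1024    # average size of a full row


def fanout(page_size, entry_bytes):
    """How many entries of the given size fit in one page."""
    return page_size // entry_bytes


def max_rows(height, keys_per_node, rows_per_leaf):
    """Rows reachable in a tree with `height` levels (leaf level included)."""
    return keys_per_node ** (height - 1) * rows_per_leaf


def levels_needed(total_rows, keys_per_node, rows_per_leaf):
    """Smallest height whose leaf level can hold `total_rows`."""
    height = 1
    while max_rows(height, keys_per_node, rows_per_leaf) < total_rows:
        height += 1
    return height


# B+ tree: internal nodes hold only key + pointer, leaves hold rows.
keys_only = fanout(PAGE_SIZE, KEY_BYTES + POINTER_BYTES)
rows_per_leaf = fanout(PAGE_SIZE, ROW_BYTES)

# B-tree-like node: every entry carries key + pointer + row data.
keys_with_data = fanout(PAGE_SIZE, KEY_BYTES + POINTER_BYTES + ROW_BYTES)

bplus_levels = levels_needed(20_000_000, keys_only, rows_per_leaf)
btree_levels = levels_needed(20_000_000, keys_with_data, keys_with_data)

print(keys_only, rows_per_leaf, max_rows(3, keys_only, rows_per_leaf))
print(bplus_levels, btree_levels)
```

Under these assumptions an internal node holds about 1,170 keys, so three levels cover roughly 21.9 million rows, while a node that also carries row data holds only about 15 entries and needs a much taller tree for the same data.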

What is the difference between a Hash index and a B+ tree index? How did you choose to design the index?

  • B+ trees support range queries, but Hash indexes do not.
  • B+ trees support the leftmost prefix principle of composite indexes, but Hash indexes do not.
  • B+ trees support ORDER BY sorting, but Hash indexes do not.
  • Hash indexes are more efficient than B+ trees for equality queries.
  • When a B+ tree index is used with LIKE for fuzzy queries, a pattern with a trailing wildcard (a % at the end) can still be optimized, whereas a Hash index cannot serve fuzzy queries at all.
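The difference comes down to ordering. As a rough sketch (not how MySQL implements either index), a Python dict can stand in for a hash index and a sorted list with bisect for the ordered leaves of a B+ tree. The sample keys and values are made up.

```python
import bisect

# Hypothetical (key, value) rows
rows = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (25, "m")]

# Hash-style index: key -> value, with no ordering among keys.
hash_index = {key: value for key, value in rows}

# Ordered index: keys kept sorted, standing in for linked B+ tree leaves.
sorted_keys = sorted(key for key, _ in rows)


def hash_lookup(key):
    """Equality lookup: a single hash probe."""
    return hash_index.get(key)


def ordered_range(low, high):
    """Keys with low <= key <= high, located by two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


def hash_range(low, high):
    """A hash index has no order, so a range query must inspect every key."""
    return sorted(key for key in hash_index if low <= key <= high)


print(hash_lookup(42))
print(ordered_range(5, 25))
print(hash_range(5, 25))
```

Both range functions return the same keys, but the ordered version jumps straight to the start of the range and reads a contiguous slice, while the hash version has to touch every entry and sort afterwards.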

What is the leftmost prefix principle? What is the leftmost matching principle?

The leftmost prefix principle means that the leftmost column takes precedence. When creating a multi-column index, place the column used most frequently in WHERE clauses on the far left, according to business requirements.

When we create a composite index such as (a1, a2, a3), it is equivalent to creating the indexes (a1), (a1, a2), and (a1, a2, a3). This is the leftmost matching principle.
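A small sketch of the matching rule, under a deliberately simplified model: only equality conditions are considered, and the function name usable_prefix is made up for illustration. A real optimizer has more rules than this.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap stops the match; later columns cannot be used
        used.append(column)
    return used


INDEX = ("a1", "a2", "a3")

print(usable_prefix(INDEX, {"a1", "a2"}))  # matches (a1, a2)
print(usable_prefix(INDEX, {"a1", "a3"}))  # only a1, because a2 is missing
print(usable_prefix(INDEX, {"a2", "a3"}))  # nothing, because a1 is missing
```

This is why the order of columns in a composite index matters: a query that filters only on a2 and a3 gets no help from an index that starts with a1.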

In what scenarios is an index not suitable?

  • Tables with a small amount of data are not suitable for indexing
  • Frequently updated fields are also not suitable for indexing
  • Fields with low selectivity are not suitable for indexing (such as gender)

What are the advantages and disadvantages of indexing?

(1) Advantages:

  • Unique indexes ensure that the data of each row in a database table is unique
  • Indexes can speed up data query and reduce query time

(2) disadvantages:

  • Creating and maintaining indexes takes time
  • Indexes take up physical space. In addition to data table space, each index takes up a certain amount of physical space
  • When data in a table is added, deleted, or changed, the index must be maintained dynamically.

Locks

Has MySQL ever had a deadlock problem, and how did you solve it?

Yes, I have. My usual procedure for troubleshooting deadlocks is:

(1) Check the deadlock log with show engine innodb status;

(2) Identify the SQL statements involved in the deadlock;

(3) Analyze which locks those statements take;

(4) Reproduce the deadlock case;

(5) Analyze the deadlock log;

(6) Analyze the deadlock result.

What are optimistic and pessimistic database locks and the difference between them?

(1) Pessimistic lock:

A pessimistic lock is single-minded and insecure: it belongs only to the current transaction and constantly worries that its beloved data may be modified by other transactions. So once a transaction has acquired a pessimistic lock, other transactions cannot modify the data and must wait for the lock to be released before they can proceed.

(2) Optimistic lock:

An optimistic lock is "optimistic" in that it does not expect the data to change very often. It therefore lets multiple transactions work on the data at the same time and checks for conflicts only when an update is written.

Implementation: Optimistic locking is generally implemented using the version number mechanism or CAS algorithm.
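The version number mechanism can be sketched as follows. This is a minimal example using Python's built-in sqlite3 module as a stand-in for MySQL; the stock table and its columns are hypothetical. The key idea is that the UPDATE only succeeds if the version is still the one that was read.

```python
import sqlite3

# SQLite stands in for MySQL; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO stock VALUES (1, 10, 0)")
conn.commit()


def read_stock(connection, item_id):
    """Read the current quantity together with its version number."""
    return connection.execute(
        "SELECT quantity, version FROM stock WHERE id = ?", (item_id,)
    ).fetchone()


def try_update(connection, item_id, new_quantity, expected_version):
    """Write only if nobody bumped the version since we read it."""
    cursor = connection.execute(
        "UPDATE stock SET quantity = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_quantity, item_id, expected_version),
    )
    connection.commit()
    return cursor.rowcount == 1  # 0 rows updated means the version was stale


# Two workers read the same snapshot, then both try to write.
quantity_a, version_a = read_stock(conn, 1)
quantity_b, version_b = read_stock(conn, 1)
first_ok = try_update(conn, 1, quantity_a - 1, version_a)   # version matches
second_ok = try_update(conn, 1, quantity_b - 1, version_b)  # stale version

print(first_ok, second_ok, read_stock(conn, 1))
```

The worker whose update is rejected would normally re-read the row and retry, which is why this approach suits data that is rarely contended.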

Are you familiar with MVCC? Do you know how it works?

MVCC stands for Multi-Version Concurrency Control.

InnoDB implements MVCC in MySQL mainly to improve the database's concurrent performance and to handle read-write conflicts in a better way, so that even when reads and writes conflict, reads can proceed concurrently without locking and without blocking.

Transactions

The four properties of MySQL transactions and how they are implemented

  • Atomicity: a transaction is executed as a whole; either all of the database operations it contains are executed, or none of them are.
  • Consistency: data is not corrupted before or after the transaction. If A transfers $10 to B, the combined total of A and B stays the same whether or not the transaction succeeds.
  • Isolation: when multiple transactions access the database concurrently, they are isolated from each other, that is, one transaction does not affect the execution of the others. In short, transactions mind their own business and do not interfere with one another.
  • Durability: once a transaction completes, the changes it made to the database are persisted in the database.
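The transfer example above can be turned into a small sketch of atomicity and consistency. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the account table, names, and balances are made up. The two UPDATE statements either both take effect or are both rolled back.

```python
import sqlite3

# SQLite stands in for MySQL; the table and balances are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    with connection:  # commits on success, rolls back if an exception escapes
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        (balance,) = connection.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers the rollback


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM account"))


transfer(conn, "A", "B", 10)  # succeeds

try:
    transfer(conn, "A", "B", 500)  # fails halfway and is rolled back
    failed = False
except ValueError:
    failed = True

print(balances(conn), failed)
```

After the failed transfer the balances are exactly what they were before it started, and the combined total of A and B never changes.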

What are the transaction isolation levels? What is the default isolation level for MySQL?

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable

MySQL's default transaction isolation level is Repeatable Read.

What are phantom reads, dirty reads, and non-repeatable reads?

Transaction A and transaction B execute alternately. Transaction A is disturbed by transaction B because it reads data that transaction B has not yet committed. This is a dirty read.

Within the scope of one transaction, two identical queries that read the same record return different data. This is a non-repeatable read.

Transaction A queries a range and gets a result set. Another concurrent transaction B inserts data into or deletes data from that range and quietly commits. Transaction A then queries the same range again, and the two reads return different result sets. This is a phantom read.

In practice

What should you do when the MySQL database CPU usage surges?

Investigation process:

(1) Use the top command to determine whether mysqld or something else is causing the load.

(2) If mysqld is the cause, run show processlist and check the session states to see whether any resource-intensive SQL is running.

(3) Identify the expensive SQL and check whether its execution plan is accurate, whether an index is missing, and whether the data volume is too large.

Handling:

(1) Kill those threads (while watching whether CPU usage drops).

(2) Make the corresponding adjustments (for example, add indexes, rewrite the SQL, or change memory parameters).

(3) Run the SQL again.

Other situations:

It is also possible that each SQL statement consumes few resources, but a large number of sessions suddenly connect and CPU usage soars. In this case, work with the application side to analyze why the number of connections spiked, and adjust accordingly, for example by limiting the number of connections.

How do you resolve master-slave delay in MySQL?

Master-slave replication takes place in five steps:

  • Step 1: Update events (UPDATE, INSERT, DELETE) on the master are written to the binlog.
  • Step 2: The slave initiates a connection to the master.
  • Step 3: The master creates a binlog dump thread and sends the binlog contents to the slave.
  • Step 4: After the slave starts, it creates an I/O thread that reads the binlog contents sent by the master and writes them to the relay log.
  • Step 5: The slave also creates an SQL thread that reads from the relay log, executes the update events starting from the position Exec_Master_Log_Pos, and writes the updates to the slave database.

Reasons for master-slave synchronization delay

A server opens N connections to clients, so there can be a large number of concurrent updates, but on the slave only a single thread reads the binlog. If a statement takes a little longer to execute on the slave, or a statement has to lock a table, a large backlog of SQL from the master builds up that has not yet been synchronized to the slave. This results in master-slave inconsistency, known as master-slave delay.

Solutions to master-slave synchronization delay

  • The master handles update operations and has higher safety requirements than the slave, so some settings that the master needs, such as sync_binlog=1 and innodb_flush_log_at_trx_commit=1, can be relaxed on the slave.
  • Use better hardware for the slave.
  • Use a slave purely as a backup rather than serving queries from it. With its load reduced, it executes the SQL in the relay log more efficiently.
  • Add more slaves to spread the read pressure and reduce the load on each server.

If you were asked to design database and table sharding, briefly describe what you would do.

Database and table sharding schemes:

  • Horizontal database sharding: splits the data in one database into multiple databases based on a field and a strategy (such as hash or range).
  • Horizontal table sharding: splits the data in one table into multiple tables based on a field and a strategy (such as hash or range).
  • Vertical database splitting: places different tables in different databases based on the business each table belongs to.
  • Vertical table splitting: splits the fields of one table into different tables (a main table and an extension table) based on how actively each field is used.
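As a rough sketch of the hash strategy for horizontal sharding, the function below maps a sharding key to a database and a table. All names and sizes here are assumptions for illustration: four databases, eight tables per database, and the order_db_N.t_order_M naming scheme. Middleware normally handles this routing for you.

```python
import zlib

DB_COUNT = 4        # hypothetical number of databases
TABLES_PER_DB = 8   # hypothetical number of tables in each database


def route(user_id):
    """Map a sharding key to (database index, table index) by hashing."""
    digest = zlib.crc32(str(user_id).encode("utf-8"))  # stable across runs
    slot = digest % (DB_COUNT * TABLES_PER_DB)
    return slot // TABLES_PER_DB, slot % TABLES_PER_DB


def physical_table(user_id):
    """Build the hypothetical physical table name for a sharding key."""
    db_index, table_index = route(user_id)
    return f"order_db_{db_index}.t_order_{table_index}"


print(physical_table(12345))
```

A stable hash such as CRC32 is used instead of Python's built-in hash(), which is randomized per process for strings and would route the same key differently after a restart. Note that changing DB_COUNT or TABLES_PER_DB remaps most keys, which is one reason capacity planning matters when sharding.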

Common middleware for database and table sharding:

  • sharding-jdbc
  • Mycat

Problems you may encounter with database and table sharding

  • Transaction issues: you need to use distributed transactions
  • Cross-node join problem: this can be solved by issuing two queries
  • COUNT, ORDER BY, GROUP BY, and aggregate functions across nodes: the results are retrieved separately on each node and then merged on the application side.
  • Data migration, capacity planning, and capacity expansion
  • ID problem: after the database is sharded, you can no longer rely on the database's own primary key generation mechanism; the simplest option to consider is a UUID
  • Sorted paging across shards
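For cross-node ORDER BY, COUNT, and paging, the "retrieve on each node, then merge on the application side" approach can be sketched like this. The per-shard rows are made-up sample data, each list assumed to be already sorted by the ORDER BY key, as if every shard had run the same sorted query.

```python
import heapq
from itertools import islice

# Hypothetical per-shard results, each already sorted by the ORDER BY key.
shard_results = [
    [(1, "o-101"), (4, "o-104"), (9, "o-109")],
    [(2, "o-202"), (3, "o-203"), (8, "o-208")],
    [(5, "o-305"), (6, "o-306"), (7, "o-307")],
]


def merged_page(sorted_shards, offset, limit):
    """Global ORDER BY with paging over rows that are sorted per shard."""
    merged = heapq.merge(*sorted_shards)  # lazy k-way merge of sorted inputs
    return list(islice(merged, offset, offset + limit))


def global_count(per_shard_counts):
    """A cross-shard COUNT is the sum of the per-shard counts."""
    return sum(per_shard_counts)


page = merged_page(shard_results, offset=3, limit=3)
total = global_count(len(rows) for rows in shard_results)

print(page, total)
```

One caveat: to return a correct global page, each shard has to supply its first offset + limit rows, not just limit rows, so deep pages get expensive. That is the main reason sorted paging across shards is listed as a problem.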

Author: Lei Xiaoshuai

Author of the GitHub open-source project “Java Eight-part essay”, focused on Java interview techniques and advanced Java learning, helping you get through the interview grind, land an offer from a big tech company, and earn promotions and raises!

About the author:

☕ Studied for a few years: master's graduate of Huazhong University of Science and Technology;

😂 Worked at several big tech companies: Huawei, NetEase, Baidu……

😘 Has always believed that technology can change the world, and is willing to stay true to that original aspiration and cheer on fellow technologists!