MySQL database basic knowledge and optimization of the core knowledge, how much you have mastered

Search the public account zhang on wechat, reply to the interview manual, and get the PDF version of this document and more interview materials.

Recommended reading:

  • Java basic knowledge of the latest version of high-frequency interview questions

  • Computer network high frequency interview questions latest edition

  • Java set high frequency interview questions latest edition

  • MySQL transactions and locks

  • Database index high frequency interview questions latest edition

What are the main categories of SQL statements? *

  • Data Definition Language (DDL) : CREATE, DROP, and ALTER operate logical structures, including table structures, views, and indexes.
  • Data Query Language (DQL) : The database Query Language is MAINLY SELECT
  • Data Manipulation Language (DML) : Includes INSERT, UPDATE, and DELETE
  • Data Control Language (DCL) : allows permission Control operations, such as GRANT, REVOKE, COMMIT, and ROLLBACK.

What are the SQL constraints? * *

  • Primary key constraint: A primary key is a combination of one or more columns in a table that uniquely identifies each row in the table. A table has only one primary key, and primary key constrained columns cannot be empty.
  • Foreign key constraint: A foreign key constraint is used to establish a relationship between two tables by specifying which column of the main table is referenced. Only the primary key of the primary table can be used as a foreign key by the secondary table. The columns of the constrained secondary table may not be primary keys. Therefore, to create a foreign key constraint, you need to define the primary key of the primary table and then the foreign key of the secondary table.
  • Unique constraints: A table can define multiple unique constraints to ensure that a column of data in the table does not have the same value.
  • Default constraint: When inserting new data, the system assigns a default value to the row if no data is specified, or NULL if no default value is set.
  • Check constraint: Check uses logical expressions to determine the validity of data and limits the range of values that can be entered in one or more columns. When a column updates data, the input must satisfy the Check constraint.

What is a subquery? * *

Subquery: Use the results of one query in another query

Subqueries can be divided into the following categories:

  • Standard quantum query: refers to a subquery that returns a value. You can use the operators =,>,<,>=,<=,<> to compare the scalar results of the subquery. Generally, the subquery is placed on the right side of the comparison formula.

    SELECT * FROM user WHERE age = (SELECT max(age) from user) // Query oldest personCopy the code
  • Column subquery: the result of a subquery is one column with n rows. It is usually used to query a field in a table. You can use operators such as IN, ANY, SOME, and ALL, but not directly

    SELECT num1 FROM table1 WHERE num1 > ANY (SELECT num2 FROM table2)
    Copy the code
  • Row subquery: indicates the row n columns of the result returned by the subquery

    SELECT * FROM user WHERE (age,sex) = (SELECT age,sex FROM user WHERE name="zhangsan")
    Copy the code
  • Table subquery: a table whose subquery is n rows and n columns

    SELECT * FROM student WHERE (name,age,sex) IN (SELECT name,age,sex FROMClass1) // Find the class in the student table1The students of classCopy the code

Do you know some of the connection queries in MySQL? * * *

MySQl connection query can be divided into external join, internal join, cross join

  • Outer join

    External joins are mainly divided into LEFT JOIN, RIGHT JOIN and full JOIN.

    Left outer join: Displays all the data in the left table and the eligible data in the right table. The unqualified data in the right table is null.

Right outer join: Displays all data in the left table and the eligible data in the right table. The unqualified data in the right table is null.

MySQL does not support full external connections.

  • Inner join: Only the data that meets the conditions is displayed

  • Cross join: A join that uses cartesian products.

    The Cartesian product of two sets X and Y is expressed as X × Y, where the first object is a member of X and the second object is a member of all possible ordered pairs of Y. For example: A = {A, b}, b = {0}, A * b = {(A, 0), (A, 1), (A, 2), (b, 0), (b, 1), (b, 2)}

For example: There are two tables, L and R.

L table

A B
a1 b1
a2 b2
a3 b3

R table

B C
b1 c1
b2 c2
b4 c3
  • Select L. ‘*’,R. ‘*’ from L left join R on L.b=R.b

    A B B C
    a1 b1 b1 c1
    a2 b2 b2 c2
    a3 b3 null null
  • Select L. ‘*’,R. ‘*’ from L right join R on L.b=R.b

    B C A B
    b1 c1 a1 b1
    b2 c2 a2 b2
    b4 c3 null null
  • Select L. ‘*’,R. ‘*’ from L inner join R on L.b=R.b

    A B B C
    a1 b1 b1 c1
    a2 b2 b2 c2
  • Select L. ‘*’,R. ‘*’ from L,R

    A B B C
    a1 b1 b1 c1
    a1 b1 b2 c2
    a1 b1 b4 c3
    a2 b2 b1 c1
    a2 b2 b2 c2
    a2 b2 b4 c3
    a3 b3 b1 c1
    a3 b3 b2 c2
    a3 b3 b4 c3

What is the difference between in and exists in mysql? * *

In and EXISTS are generally used for subqueries.

  • If exists is used, the external query will be performed first, and each row of data queried will be entered into the inner table query to see whether the conditions are met. In typically retrieves the result set from the inline table query, and then returns the data from the outer query matching the result set.
  • An index is used in either an inner table query or an outer table query.
  • Exists only uses indexes for inner table queries
  • Generally speaking, exist is more efficient when the result set of sub-query is large and the appearance is small. When the subquery finds a small result set and a large appearance, in is more efficient.
  • For not IN and NOT EXISTS, the efficiency of not EXISTS is higher than that of NOT in. It has nothing to do with the result set of the subquery, because NOT IN scans all tables on both inner and outer surfaces without using indexes. Indexes on tables can be used in a not EXISTS subquery.

What is the difference between varchar and char? * * *

  • Varchar means changed length, char means fixed length. When inserted characters exceed their length, insertions are rejected in strict mode with an error message, and insertions are truncated in normal mode. For example, char(5) is 5 regardless of the length of the inserted character. If the inserted character is less than 5, it is supplemented by a space. For vARCHar (5), if the inserted character length is less than 5, the stored character length is the inserted character length and will not be filled.
  • The storage capacity varies. For a CHAR, a maximum of 255 characters can be stored. For vARCHAR, the maximum number of characters can be 65532.
  • A char has a fixed length and is stored faster than a vARCHar. However, it takes up more space than a vARCHar. Varchar has higher space utilization but slower storage speed, which is a time-for-space strategy.

Int (10); char(10); varchar(10); * * *

The 10 in int(10) represents the length of the displayed data, while char(10) and vARCHar (10) represent the size of the stored data.

What is the difference between DROP, DELETE and TRUNCate? * *

drop delete truncate
speed fast Delete line by line, slow faster
type DDL DML DDL
The rollback Do not roll back Can be rolled back Do not roll back
Delete the content If you drop the entire table, all rows and indexes are deleted Delete some or all of the table data while the table structure is still in place Delete all data from table

In general, use DROP to delete an entire table, use DELETE to delete part of the table, and use TRUNCate to delete all data of the table while retaining the table structure.

The difference between a UNION and a UNION ALL? * *

Both union and union all are used to join two sets of results together.

  • The union will rehash and sort the result. The union all will return the merged result directly without rehash or sorting.
  • The performance of union All is better than that of Union.

What is a temporary table, when is it used, and when is it dropped? *

MySQL creates temporary tables that store intermediate result sets during SQL statement execution. These tables are called temporary tables. Temporary tables are visible only to the current connection.

Temporary tables are classified into memory temporary tables and disk temporary tables. MEMORY temporary tables use the MEMORY storage engine and disk temporary tables use the MyISAM storage engine.

Temporary tables are commonly used in the following situations:

  • Subquery in FROM
  • DISTINCT query with ORDER BY
  • Temporary tables are created when the ORDER BY and GROUP BY clauses are different
  • Using UNION queries produces temporary tables

How to optimize large table data query? * * *

  • The index optimization
  • SQL statement optimization
  • Horizontal split
  • The vertical resolution
  • Create intermediate tables
  • Use caching technology
  • Fixed-length tables are faster to access
  • Smaller columns access faster

Do you understand slow log queries? Statistics too slow query? How to optimize slow queries? * * *

Slow queries are used to log SQL statements whose execution time exceeds a certain threshold.

Related parameters:

  • Slow_query_log: indicates whether slow log query is enabled. 1 indicates that slow log query is enabled, and 0 indicates that slow log query is disabled.
  • Slow_query_log_file: path for storing slow query logs of the MySQL database.
  • Long_query_time: slow query threshold. If the QUERY time of an SQL statement exceeds the threshold, it will be recorded in a log.
  • Log_queries_not_using_indexes: Queries that do not use indexes are recorded in the slow query log.
  • Log_output: indicates the log storage mode. FILE: saves logs to a FILE. TABLE: saves logs to the database.

How to optimize slow queries?

  • Analyze the execution plan of the SQL statement to check whether the index of the SQL statement matches
  • Optimize the database structure by splitting tables with many fields into multiple tables, or consider creating intermediate tables.
  • Optimize LIMIT paging.

Why a primary key? * *

A primary key is a unique identifier that uniquely distinguishes each row in a table. Without a primary key, updating or deleting a particular row in a table can be difficult because a row cannot be uniquely and accurately identified.

Is the primary key an autoincrement ID or a UUID? * *

Benefits of using an incremented ID:

  • The field length is much smaller than the UUID.
  • Database automatic numbering, according to the order of storage, easy to search
  • You don’t have to worry about primary key duplication

Disadvantages of using an incremented ID:

  • Because the service volume increases automatically, it is easy to be queried by others in some service scenarios.
  • When data migration occurs, or table merges can be cumbersome
  • In high concurrency scenarios, competitive auto-lock will reduce the throughput of the database

UUID: a universal unique identifier (UUID) calculated based on data such as the current time, counter, and hardware id.

Advantages of using UUID:

  • Unique identifier, does not consider the problem of duplication, in the data split, merge can also achieve global uniqueness.
  • It can be generated in the application layer to improve the throughput of the database.
  • There is no need to worry about leakage of business volume.

Disadvantages of using UUID:

  • Because the UUID is generated randomly, random I/OS occur, affecting the insertion speed and causing low disk usage.
  • UUID occupies a large space. The more indexes you create, the greater the impact.
  • The comparison between UUids is much slower than the self-added ID, which affects the query speed.

In general, MySQL recommends using an incremented ID. Because InnoDB storage engines in MySQL, the primary key index is a kind of clustering index, the primary key index of B + tree leaf nodes stored in order according to the primary key and the data, if the primary key index is the ID, you just need to back are arranged in sequence, if the UUID, the ID is randomly generated, in inserting data will cause a lot of data movement, A large amount of memory fragmentation is generated, resulting in a decrease in insert performance.

Why is the field set to not NULL? * *

First, NULL is NOT the same as NULL. NULL does NOT take up space, whereas NULL does. Therefore, NULL values can still be inserted after being set to NOT NULL.

The field is set to not NULL for several reasons:

  • A NULL value will affect the statistics of some functions, such as count. If a NULL value is encountered, the record will not be counted.

  • The B tree does not store NULL, so the index does not use NULL.

  • The NOT IN subquery returns NULL values IN the case of NULL values.

    For example, the user table is as follows

    id username
    0 zhangsan
    1 lisi
    2 null

    select * from `user` where username NOT IN (select username from `user` where id ! = 0), this query should find zhangsan, but the result is null.

  • When MySQL performs comparison, NULL will participate in field comparison. Because NULL is a special data type, the database needs special data processing during processing, which increases the complexity of database record processing.

How to optimize data access during query? * * *

In terms of reducing data access:

  • Use indexes correctly and try to achieve index coverage
  • Optimize the SQL execution plan

In terms of returning less data:

  • Data paging
  • Return only the required fields

In terms of reducing server CPU overhead:

  • Fair use sort
  • Reduce comparison operations
  • Complex operations are handled on the client side

In terms of increasing resources:

  • Client multi-process parallel access
  • Database parallel processing

How to optimize long and difficult query statements? * *

  • Decompose a large query into several smaller queries
  • Decompose associated query to make cache more efficient

How do I optimize LIMIT paging? * *

  • When the LIMIT offset is large, the query efficiency decreases. You can record the maximum ID retrieved each time and use the ID for the next query

  • Create composite indexes

How to optimize UNION query * *

If the result set does not need to be de-duplicated or sorted, it is better to use UNION ALL.

How to optimize the WHERE clause * * *

  • Do not use it in a WHERE clause! = and <> are not equal to determine, which will cause the index to drop for full table scan.
  • Do not use null or null values in the WHERE clause. Try to set the field to not NULL.
  • Try to use union all instead of or
  • Index the columns involved in WHERE and Order by
  • Minimize the use of in or not in. Full table scan is performed
  • Using parameters in the WHERE clause results in a full table scan
  • Avoid expression or function operations on fields in the WHERE clause that cause the storage engine to drop the index and perform a full table scan

What causes slow EXECUTION of SQL statements? * * *

  • If the SQL statement is executed slowly only occasionally, it may be because of a lock, or because the redo log is full, and data in the redo log must be synchronized to disk.
  • If the SQL statement is always slow, it may be that the field has no index or the field has an index but no index.

What is the execution order of SQL statements? *

SELECT DISTINCT 
	select_list 
FROM 
	left_table 
LEFT JOIN 
	right_table ON join_condition 
WHERE 
	where_condition 
GROUP BY 
	group_by_list 
HAVING 
	having_condition 
ORDER BY 
	order_by_condition
Copy the code

The execution sequence is as follows:

  • FROM: When querying an SQL statement, join the tables on either side of the keyword as a Cartesian product and create a virtual table V1. Virtual tables are views, and the data comes from the execution results of multiple tables.

  • ON: Filters the results of the FROM connection ON and creates virtual table V2

  • JOIN: Add the left table filtered by ON and create a new virtual table V3

  • WHERE: Filters virtual table V3 WHERE and creates virtual table V4

  • GROUP BY: Creates virtual table V5 BY grouping the records in V4

  • HAVING: Filter V5 to create virtual table V6

  • SELECT: Create virtual table V7 by filtering the results in V6 according to SELECT

  • DISTINCT: Create virtual table V8 BY using the GROUP BY clause. You do not need to use DISTINCT because the unique values in a column are grouped into a GROUP, and each GROUP returns only one row of records. Therefore, h is different from h.

  • ORDER BY: Sorts the results in the V8 table.

Database optimization

How to optimize large tables? * * *

  • Scoping of data: Avoid queries that do not have any scoping conditions.
  • Read/write separation: the master library is responsible for writing and the slave library is responsible for reading.
  • Vertical split table: A table is divided into multiple tables by field, and each table stores a portion of its fields.
  • Horizontal partition table: In the same database, the data of one table is divided into multiple tables according to certain rules.
  • Single table optimization: Optimize the fields, indexes, and query SQL in the table.
  • Add the cache

What is vertical sub – table, vertical sub – database, horizontal sub – table, horizontal sub – database? * * *

Vertical split table: A table is divided into multiple tables by field, and each table stores a portion of its fields. Commonly used fields are placed in one table and less commonly used fields in another.

Advantages of vertical table:

  • Avoiding IO contention reduces the probability of locking tables. Because large fields are less efficient, the first data volume is large, and the reading time is long. Second, large fields take up more space and store fewer rows on a page, resulting in more I/O operations.

  • It can improve the query efficiency of popular data.

Vertical database: Tables are classified by service and deployed to different databases. Different databases can be placed on different servers.

Advantages of vertical repository:

  • This reduces service coupling and facilitates hierarchical management of different services.
  • It can increase the number of I/O and database connections and solve the bottleneck of hardware resources on a single machine.

Disadvantages of vertical split (library, table) :

  • The primary key is redundant and redundant columns need to be managed
  • Transaction processing becomes complex
  • There is still the problem of too much data in a single table

Horizontal partition table: In the same database, the data of the same table is divided into multiple tables according to certain rules.

Advantages of horizontal table:

  • The problem of large data volume in a single table is solved
  • Avoid IO contention and reduce the probability of locking tables

Horizontal database: Split the data of the same table into different databases according to certain rules. Different databases can be placed on different servers.

Advantages of horizontal repository:

  • The bottleneck problem of large data volume in single database is solved
  • IO conflicts are reduced, lock contention is reduced, and the failure of one database does not affect other databases (availability), improving system stability and availability

Disadvantages of horizontal split (table, library) :

  • Shard transaction consistency is difficult to resolve
  • Cross-node JOIN performance is poor and the logic becomes complicated
  • Data expansion is difficult and difficult to maintain

In system design should be according to the business when coupled to determine the scheme of vertical depots and vertical table, the data access should be considered when the pressure is not particularly great caching, reading and writing methods of separation, if the amount of data is very big, or sustained growth can consider depots table, horizontal split the logic involved is more complex, common solutions are client architecture and evil agent architecture.

How to deal with the ID key after dividing the database into different tables? * * *

After a database is divided into different tables, the ids of each table cannot start from 1. Therefore, a global ID is required. You can set the global ID using the following methods:

  • UUID: Advantages: Locally generated ID, no need to remote call; Globally unique is not repeated. Disadvantages: Large footprint, not suitable for indexing.

  • Database increment ID: when the database increment ID is used after the sub-database and sub-table table, a special database is needed to generate the primary key. Each time the service receives a request, it first inserts a meaningless data into the database, obtains a self-increment ID of the database, and uses this ID to write data into the sub-database and sub-table. Advantages: Simple and easy to implement. Disadvantages: Bottlenecks in high concurrency. The system structure is shown below (the picture is from the network)

  • Redis generates ID: Advantages: Independent of database, good performance. Disadvantages: Introducing new components can increase the complexity of the system

  • Twitter’s Snowflake algorithm is a 64-bit LONG ID with 1 unused bit, 41bit for the number of milliseconds, 10bit for the work machine ID, and 12bit for the serial number.

    1bit: The first bit is 0 by default, because the first bit in binary is negative if it is 1, but the ID cannot be negative.

    41bit: Indicates the time stamp, in milliseconds.

    10bit: records the working machine ID. Five bits indicate the equipment room ID and five bits indicate the machine ID.

    12bit: records different ids generated in the same millisecond.

  • Meituan Leaf distributed ID generation system, Meituan Dianping distributed ID generation system

MySQL replication principle and process? How to implement master slave replication? * * *

MySQL replication: To ensure data consistency between the primary server and secondary server, after data is inserted to the primary server, the secondary server automatically synchronizes the modified data from the primary server.

Principle of master-slave replication:

There are three main threads for master-slave replication: binlog thread, I/O thread, and SQL thread.

  • Binlog thread: Responsible for writing data changes on the primary server to the Binary log.
  • I/O thread: Reads Binary logs from the primary server and writes them to the Relay log of the secondary server.
  • SQL thread: Is responsible for reading the relay log, parsing out the data changes that have been made in the master server and replaying them in the slave server

The copying process is as follows (picture from network) :

  1. The Master writes the record of the operation to the binlog before each transaction updates the data.
  2. The Slave Slave library connects to the Master library and creates as many binlog dump threads as there are Master slaves. When the binlog of the Master node changes, binlog dump notifies all slaves and sends the corresponding binlog to them.
  3. After receiving the binlog, the I/O thread writes it to the Relay log.
  4. The SQL thread reads the relay log and replays it from the slave server.

Here is an easy-to-understand diagram.

The role of master-slave replication:

  • High availability and failover
  • Load balancing
  • The data backup
  • Upgrading test

Do you know about read-write separation? * * *

Read/write separation mainly depends on primary/secondary replication, which serves as read/write separation service.

Advantages of read-write separation:

  • The master server writes and the slave server reads, alleviating lock contention
  • Secondary servers can use MyISAM to improve query performance and save system overhead
  • Increase redundancy and improve availability