preface

Recently my friend Xiao Wang was looking for a job, and then an interviewer asked him if he knew “the specific execution process of query SQL”.

Wang said he didn’t know, and the interviewer directly said to Wang: Young man rattail juice, how even so simple you don’t know?

After listening to xiao Wang blurt out: hum! Interviewer you do not speak martial arts, not according to routine card ah, you should ask index related knowledge ah, this I times clear.

After listening to Wang’s description, I searched this knowledge point in my mind. Unfortunately, I did not find relevant content in my knowledge base. Then I went to the wall to think about it, and then this article was born.

Note: This article mainly uses MySql as an example; Speaking of MySql, a bit of nagging about MariaDB, MySql’s sister database, which is now very common.

What is MariaDB?

After MySql was acquired by Oracle, the founders of MySql were concerned about the future of MySql database development (slow development, closed, and possibly closed source), so they created a branch, MariaDB, using the new Maria storage engine by default, which is an upgraded version of the MyISAM storage engine in MySql.

Main line of this article:

(1) Overall architecture description of MySql;

2. Description of each node of the Server layer;

③ InnoDB storage engine description;

MySql Architecture Description

Let’s not say anything, but post a big picture from the Internet:

Is the picture above clear? Not clear, then don’t worry, let’s paste another one:

Description of Server Service layer:

As can be seen from the above architecture diagram, the Server layer is mainly composed of connector, query cache, parser/analyzer, optimizer and executor, which will be mainly described below.

The connector

When the client wants to operate on the database, the premise is to establish a good connection with the database. Connectors are responsible for establishing connections with clients, obtaining permissions, maintaining and managing connections.

Connection mode:

MySQL supports both short and long connections. Short connections are closed immediately after the operation is complete.

The long connection can be kept open, reducing the cost of creating and releasing the connection on the server, and can be used by later applications for access. We typically use long connections in connection pools.

Note when using long connection:

The default duration of a long connection between the client and the server is 8 hours. After 8 hours, the MySql server will disconnect the connection. If the client requests again, the connection will be reported as disconnected.

And keeping the connection long consumes memory. If the connection is inactive for a long time, the MySQL server will be disconnected. How do you check the 8-hour timeout?

Show global variables like 'wait_timeout'; -- Show global variables like' interactive_timeout';Copy the code

The following result is obtained after execution: default is 28800 seconds, 8 hours.

Connections in connection pools used in general projects are long-connected; (e.g. Druid, C3P0, DBCP, etc.)

Here is an example of a practical problem caused by a long connection timeout disconnection:

A friend’s company has a management system, and this system uses Mysql. But he recently encountered a problem: the system obviously worked the day before yesterday, but the next day when he went to the company, he couldn’t open it. As long as he restarted the system, he didn’t know what the reason was.

Finally, I checked the log and found that all the connections in the connection pool were disconnected because the time between the day before and the next day was more than 8 hours. Alas, such a small knowledge point led to several days of confusion, really shouldn’t ah, or knowledge is not comprehensive ah.

Ok, now we have the cause of the problem, but how can we fix it?

Solution to the long connection timeout disconnection:

(1) Regularly disconnect the long connection. Use for a period of time, or in the program to determine the execution of a large memory – consuming query, disconnect, then query and reconnect.

If you are using MySQL 5.7 or later, you can re-initialize the connection resource by executing mysql_reset_connection after each large operation. This process does not require reconnection and re-authentication of permissions, but restores the connection to the state it was in when it was created.

The query cache

MySQL caching is turned off by default, which means caching is not recommended. Why?

MySql does not enable caching by default.

Mainly due to the limitations of its usage scenarios:

Key (SQL statement) -value (data value); So if the SQL statement (key) as long as there is a little difference will be directly database query;

②, because the data in the table is not invariable, most of it is often changed, and when the data in the database changes, then the corresponding cache data related to this table needs to be removed;

Note that MySQL 8.0 directly removed the entire query cache function, meaning that this function has been completely removed from MySQL 8.0.

analyzer

The work of the analyzer is mainly to parse the SQL statement to be executed, finally get the abstract syntax book, and then use the preprocessor to determine whether the table in the abstract syntax tree exists, if so, then determine whether the SELECT projection column field exists in the table, etc.

Lexical analysis

Lexical analysis is used to break down SQL into non-divisible atomic symbols, called tokens. It classifies them into keywords, expressions, literals, and operators based on the dictionaries provided by different database dialects.

Syntax analysis

Parsing is the process of converting SQL statements into abstract syntax trees based on the tokens (atomic symbols) extracted from lexical analysis. The following is a direct example of how to look at a SQL abstract syntax book:

The SQL statement:

SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18
Copy the code

SQL statement after lexical analysis, syntax analysis to obtain the abstract syntax as follows:

Pictures from: shardingsphere.apache.org/document/le…

Note that for ease of understanding, the tokens for keywords in the abstract syntax tree are green, the tokens for variables are red, and gray indicates that further splitting is required.

The preprocessor

Pre-processing is used to perform semantic verification on the generated abstract syntax tree. Semantic verification is used to verify the queried table and select projection column fields to determine whether the table and field exist.

The optimizer

Optimizer functions: it is mainly used to parse SQL syntax tree, through MySQL data dictionary and statistics content, through a series of operations, so as to obtain an execution plan.

What is the sequence of operations that go through during optimization? The following is a brief description:

(1) Logical transformation: for example, SQL where condition exists 8>9, that logical transformation is to simplify the constant expression in the syntax tree directly, simplify to false; In addition to simplification and constant expression calculation.

②, cost optimization: is by paying some data statistical analysis of the price, to get the SQL execution can go to the index, and what index; In addition, in multi-table associated query, the order of final table join is determined.

In the analysis of whether to go through the index query, it is through the dynamic data sampling statistical analysis; As long as it is a statistical analysis, there may be analysis errors, so when SQL execution does not go to the index, also take into account this factor.

How to check the MySql execution plan?

Add the explain keyword before the SQL statement executed.

Extension: How does Oracle view execution plans? Refer to this article to see if an index is used in a query statement through an execution plan

actuator

MySQL knows what you want to do from the parser, it knows how to do it from the optimizer, so it goes to the executor phase and starts executing statements. If the table does not have permission to perform operations on the connected object, an error will be returned. If so, execute according to the generated execution plan.

As can be seen from the architecture diagram at the beginning of this article, the storage engine is connected to the executor, and the executor calls the data operation by calling the API interface provided by the storage engine.

Storage Engine Description

A storage engine is a component that performs actual operations on the underlying physical data and provides various apis for the Server Server layer to manipulate data. MySQL supports plug-in storage engines, including InnoDB, MyISAM, Memory, etc. Typically, MySQL uses InnoDB as its default storage engine.

An overview of features supported by the InnoDB storage engine

extension

InnoDB storage engine is divided into Memory Structures and Disk Structures as shown in the figure below.

If you want to learn more, please refer to this article. InnoDB storage engine is divided into memory architecture and disk architecture.

Follow + like + favorite + comment yo

If this article is helpful to you, please wave your love to make a fortune of the little hand under the praise ah, your support is my continuous creation of power, thank you!

You can VX search [Muzi-lei] public number, adhere to the high quality of original Java technology articles, is worth your attention!

The resources

Query SQL execution process and MySQL architecture analysis

(2) What happens to the underlying SQL query statement when it is executed?