Click “Technical blog post of Brother Flathead” above and select “Follow the public account”

Technical articles first time delivery!

As a Java developer, it’s common to write SQL statements, but do you know the processing logic behind SQL statements? For example, the following SQL statement:

select * from user where id=1Copy the code

After executing this statement, we get user information with ID 1. So what does the MySQL server do with this SQL statement? In this article we will check the processing logic of SQL statements in MySQL database.

What are the benefits of understanding the internal processing logic of SQL statements in the MySQL database? When we encounter some exceptions or problems with MySQL, we can get to the root of the problem and locate it more quickly.

To better understand the internal processing logic of SQL statements, we can first look at the basic architecture diagram of MySQL, so that we can stand in a higher Angle to overlook the MySQL database. The basic architecture diagram of MySQL is as follows:

The basic architecture of MySQL

From the figure, we can clearly see the MySQL architecture, each module and the execution process of SQL statements. The MySQL database as a whole can be divided into two parts: Server layer and storage engine layer. The Server layer is common, while the storage engine layer can be extended in the form of plug-ins. An SQL statement goes through three modules: link management, parsing and optimization, and finally storage engine. Let’s talk about these three modules.

Connection management

Connection management is the first step encountered in the process of SQL statement execution. Link management is like a gate, controlling the interaction between the client and the Server. The main work of connection management is the identity authentication of the client and the management of the connection thread.

When each client establishes a connection with the Server, the Server creates a thread to interact with the client. The first item of interaction is to verify the identity of the client. The authentication credential is based on the host information, user name, and password carried by the client when initiating a connection request. If the authentication fails, the connection task is terminated and the Access denied for user error is returned.

If authentication is successful, connection management will also do one thing, in the table to access the query the user permissions, in this connection, the powers of a subsequent judgment is based on the read permissions as the basis, that is to say after the connection is successful, even if the administrator do to this user permissions to modify permissions validation will not affect the connection.

Connection management needs to be done is relatively simple, mainly responsible for the client and server connection, of course on the connection thread connection management is optimized, and not after each client to perform the task, it threw the thread, connection management will take these threads cached, waiting for new connections, this would not be frequent creating and destroying threads, It saves money.

Analysis and Optimization

After connection management is complete, the second step of SQL statement execution is parsing and optimization. This step is very complex, and all operations of SQL statement query are performed here. We can break this down into four small steps.

The query cache

There is also caching on the MySQL server, which is a very weak feature, why? You’ll find out when you’re done.

When the MySQL server receives the query request, it will first check the query cache to see if the statement has been executed before. Previously executed statements and their results may be cached directly in memory as key-value pairs. Key is the query statement and value is the query result. If your query can find the key directly in the cache, the value will be returned directly to the client. If the statement is not in the query cache, subsequent execution phases continue. After the execution is complete, the results are stored in the query cache.

However, MySQL’s query cache hit ratio is very low. The main reason is that if two query requests are different on any character (e.g., space, comment, case), the cache will not hit.

And cache may obtain the wrong data, with some system functions, for example, may be the same function call twice will produce different results, such as function NOW, each call to generate the latest current time, if you call the function in a query request, that even if the query request text information are all the same, Two queries at different times should also get different results. If the first query is cached, it is wrong to use the first query directly in the second query.

In addition to this, MySQL cache failures are very frequent. The MySQL cache system monitors every table involved, and whenever the structure or data of the table is changed, If an INSERT, UPDATE, DELETE, TRUNCATE TABLE, ALTER TABLE, DROP TABLE, or DROP DATABASE statement is used against this TABLE, all cache queries using this TABLE will be invalidated and removed from the cache!

You can see that the query cache is a bit of a chore. Caching does more harm than good to the MySQL database, so the whole query cache feature was removed from MySQL 8.0

Parsing and preprocessing

If the query cache is not hit, then it is time to enter the formal query phase. The MySQL server first parses the text because the client program sends a request that is just a piece of text.

The SQL statement is first parsed by keywords and a “parse tree” is generated. The MySQL parser validates and parses queries using MySQL syntax rules, such as whether the keywords are used correctly, whether the keywords are in the correct order, or whether the quotes match before and after.

Preprocessing further checks the parse tree against some MySQL rules, such as tables and columns, names and aliases to see if they are ambiguous, and so on.

Query optimization

After parsing and preprocessing, you know what tables to query, what columns to query, what conditions to query, and so on. But what is the best way to query? This is what query optimization is for. The MySQL optimizer makes some optimizations to our statements, such as outer join to inner join, expression simplification, subquery to join, and so on. The result of optimization is an execution plan that indicates which indexes should be used for the query and in what order the tables should be joined.

actuator

The executor executes the optimized query execution plan and interacts with the storage engine to complete the data query and return the final data result.

SQL > select * from table T; select * from table T; select * from table T; select * from table T; The query also calls Precheck to verify permissions before the optimizer).

mysql> select * from user where ID=1; ERROR 1142 (42000): SELECT command denied to user 'b'@'localhost' for table 'T'Copy the code

If you have permission, open the table and continue. When a table is opened, the executor uses the interface provided by the table engine according to its definition.

For example, in our example table user, assuming that the ID field has no index, the execution flow of the executor would look like this:

1. Call InnoDB engine interface to get the first row of this table, check whether the ID value is 10, if not, skip, if yes, save this row in the result set;

2. Call the engine interface to fetch “next row” and repeat the same logic until the last row of the table is fetched.

3. The executor returns the recordset composed of all the rows that meet the conditions in the above traversal process as the result set to the client.

At this point, you’re done executing the SQL statement, which is actually quite complicated internally.

The storage engine

At this point, the SQL statement is done, but it is the storage engine that deals with the real data. The storage engine is the wrapper module that the MySQL server uses to store and extract the data. We know that a table is made up of rows of records, but this is only a logical concept. The storage engine is responsible for how records are represented physically, how data is read from the table, and how data is written to specific physical storage.

To achieve different functions, MySQL provides a variety of storage engines. The specific storage structure of tables managed by different storage engines may be different, and the access algorithm adopted may also be different. For example, the default InnoDB storage engine after MySQL5.7.

It can be seen that the execution of a SQL statement is very complex, involving a lot of modules, the article is over here, thank you for reading, I hope this article is helpful to your study and work, if you think the article is useful, welcome to like + forward.

Selected articles of the Past
Share a few Redis interview questions, don’t worry about the interview
The little things you ignore at work are the secret to getting ahead
By Redis cluster, talk about the data distribution algorithm in the cluster
ConcurrentHashMap no problem