This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.

I think it’s important for a developer to understand how MySQL executes a query.

So first of all, what is the architecture of MYSQL? And then what about the flow of executing a query?

MYSQL Architecture

Take a look at an architecture diagram, as follows:

Module,

  1. Connector: JDBC for PHP, Python, Java, etc.

  2. Management Serveices & Utilities: system Management and control tools, including backup and restoration, MySQL replication, and clustering.

  3. Connection Pool: a Connection Pool that manages resources that need to be buffered, including user passwords, permission threads, and so on.

  4. SQL Interface: used to receive SQL commands from users and return the query results required by users.

  5. Parser: Parses SQL statements.

  6. Optimizer: query Optimizer;

  7. Cache and Buffer: Query caches, including table caches, Key caches, permission caches, etc.

  8. Pluggable Storage Engines: Pluggable Storage Engines provides apis for the service layer to use to deal with specific files.

Architectural layering

MySQL is divided into three layers: the connection layer that connects to the client, the service layer that actually performs the operations, and the storage engine layer that deals with the hardware.

The connection layer

To connect to MySQL server port 3306, our client must establish a connection with the server, so managing all connections, verifying the identity and permissions of the client, these functions are completed in the connection layer.

The service layer

The connection layer passes SQL statements to the service layer, which in turn contains a series of processes:

For example, the judgment of the query cache, call the corresponding interface according to the SQL, and parse the lexical and syntax of our SQL statements (such as how to identify keywords, how to identify aliases, syntax errors and so on).

Then there is the optimizer, MySQL bottom will optimize our SQL statement according to certain rules, and finally give the executor to execute.

The storage engine

Storage engines are where our data is actually stored, and there are different storage engines supported in MySQL. Then you have memory or disk.

SQL execution flow

Let’s take a look at the MySQL workflow using a query statement as an example.

select name from user where id=1 and age>20; 
Copy the code

First, let’s look at a graph based on which the rest of the process is based:

The connection

The first step for a program or tool to operate on a database is to establish a connection to the database.

There are two types of connections in the database:

  • Short connection: short connection is closed immediately after the operation is completed.
  • Long connections: Long connections can be kept open, reducing the cost of creating and releasing connections for subsequent applications.

Establishing a connection is more troublesome, first to send a request, send a request to verify the account password, verify the end to see what you have access to, so in the process of use, try to use a long connection.

Keeping a long connection consumes memory. If the connection is inactive for a long time, the MySQL server will be disconnected. You can view the default time using an SQL statement:

show global variables like 'wait_timeout';
Copy the code

This time is controlled by wait_timeout and the default is 28800 seconds, 8 hours.

The query cache

MySQL has a built-in cache module. After executing the same query, we find that the cache does not take effect. Why? MySQL cache is turned off by default.

show variables like 'query_cache%';
Copy the code

Default off means not recommended, why not use MySQL’s built-in cache?

This is mainly because MySQL’s built-in cache has limited application scenarios:

The first is that it requires the SQL statement to be identical, with an extra space in the middle, and to be considered different depending on the case of the letter.

The second is that when any data in the table changes, all the caches in the table will be invalidated, so it is not suitable for applications with a large number of data updates.

So caching is better handled by ORM frameworks (MyBatis has level 1 caching enabled by default) or independent caching services such as Redis.

In MySQL 8.0, the query cache has been removed.

Parsing and preprocessing

Why can a SQL statement be recognized? If you execute a random string hello, the server returns an error of 1064:

[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'hello' at line 1

This is the parser and preprocessor module for MySQL.

This step is to perform lexical and syntactic analysis and semantic parsing of statements based on SQL syntax.

lexing

Lexical analysis is the breaking of an entire SQL statement into words.

For example, a simple SQL statement: select name from user where id = 1 and age >20;

It’s going to recognize select, which is a query, and then it’s going to recognize the user, so you want to do a query in this table, and then it’s going to recognize the conditions after where, which I used to have to look up.

Syntax analysis

Syntax analysis checks the SYNTAX of SQL statements, such as whether single quotes are closed, and generates a data structure based on the syntax rules defined by MySQL. This data structure is called select_lex.

For example, if the syntax in English is “I use is, You use are”, it must not be allowed if it is not correct. After parsing, You will receive “You hava an error in your SQL syntax” message if your SQL statement does not conform to the rules.

The preprocessor

If I write a lexically and syntactically correct SQL, but the table name or field does not exist, where can I get an error? Is it in the execution layer of the database or in the parser? For example: select * from hello;

Or when parsing error, parsing SQL link there is a preprocessor. It examines the generated parse tree to resolve semantics that the parser cannot resolve. For example, it checks for table and column names, names and aliases to make sure there is no ambiguity. After preprocessing, a new parse tree is obtained.

Query optimizer

Is there only one way to execute an SQL statement? Or is the SQL that the database ultimately executes the same SQL that we send?

The answer is no. An SQL statement can be executed in many ways and return the same result. They are equivalent. But if there are so many ways to execute, how do these ways of executing come about? Which one do you choose to implement? According to what judgment criteria to choose?

This is the MySQL query Optimizer module. The purpose of the query optimizer is to generate different Execution plans based on the parse tree, and then select the best Execution Plan. MySQL uses a cost-based optimizer, which is the least expensive Execution Plan.

You can use this command to see the cost of the query:

show status like 'Last_query_cost';
Copy the code

What types of optimizations can the MySQL optimizer handle?

Two simple examples:

1. When we perform associated query on multiple tables, which table’s data should be used as the benchmark table.

2. Select an index when multiple indexes are available.

In fact, modules of the optimizer are essential for every kind of database, with the goal of optimizing query efficiency as much as possible through complex algorithms. But the optimizer is not a panacea, not garbage SQL statements can be automatically optimized, not every time can choose the optimal execution plan, we still need to pay attention to when writing SQL statements.

The execution plan

The optimizer eventually turns the parse tree into an execution_plans, which is a data structure. Of course, this execution plan may not be the optimal one, as MySQL may not cover all execution plans.

How do we check the MySQL execution plan? For example, if there are multiple tables in an associated query, which table should be queried first? What indexes might be used when executing the query, and what indexes are actually used?

MySQL provides a tool to execute a plan. We can see the execution plan information by prefacing the SQL statement with EXPLAIN.

EXPLAIN select name from user where id=1;
Copy the code

The storage engine

Before introducing storage engines, ask two questions:

1. From a logical point of view, where is our data stored, or in what structure?

2. Where is the execution plan? Who does it?

This section describes the storage engine

In a relational database, data is stored in a Table. We can think of this table as an Excel spreadsheet. So our table stores data at the same time, but also organize the data storage structure, this storage structure is determined by our storage engine, so we can also call the storage engine table type.

In MySQL, a variety of storage engines are supported. They can be replaced, so they are called plug-in storage engines. Why support so many storage engines? Isn’t one kind enough?

In MySQL, each table can specify its storage engine, instead of one storage engine per database. Storage engine usage is in tables. Furthermore, the storage engine can be modified after the tables are created.

How to choose a storage engine?

  • If you have high data consistency requirements and need transaction support, you can choose InnoDB.

  • You can select MyISAM if data is frequently queried and updates are seldom, which requires high query performance.

  • If you need a temporary table to query, you can choose Memory.

  • If all storage engines cannot meet your requirements and the technical capability is sufficient, you can develop a storage engine in C language according to the internal manual of the official website. (dev.mysql.com/doc/interna…

Execution engine

Who uses execution plans to manipulate storage engines? This is the execution engine (executor), which uses the corresponding API provided by the storage engine to perform operations.

Why do we change the storage engine of the table without making any changes? Because different functional storage engines implement the same API.

Finally, the data is returned to the client, even if there is no result.

chestnuts

Or the above SQL statement as an example, to comb through the entire SQL execution process.

select name from user where id = 1 and age >20;
Copy the code
  1. Use the connector to query whether the current actor’s role has permissions. Access denied for user (denied for user)

  2. The next step is to query the cache, first to see if there is any in the cache, if there is, then there is no need to go down, just return the result to the client; If not, execute the parser and preprocessor. (MySQL 8.0 directly removed the entire query cache function)

  3. Syntax parser and preprocessing is mainly to analyze the SYNTAX and syntax of THE SQL statement is correct, no problem will proceed to the next step, to the query optimizer;

  4. The query optimizer will perform some optimizations on the SQL statement to see which method is the most cost-effective, and will execute which SQL statement.

    • Select * from user where id = 1 and age > 20;
    • Select * from user where id = 1 and age > 20;
  5. After the optimizer decides which solution to choose, the execution engine executes it. The result is then returned to the client.

conclusion

If this article is helpful to you, I encourage you to click your little finger, like, follow and favorites after reading it.