This is the 8th day of my participation in the First Challenge 2022. For details: First Challenge 2022.


Previous article: # How is an SQL query executed (part 1)? — Handling connections

2. Parsing and optimization

After the server receives the request from the client, it also needs to go through query caching, lexical parsing and pre-processing, and query optimization.

2.1 Querying Cache information

If we execute the same query twice, will the response time be shorter the second time than the first time?

For those of you who have used the Redis cache tool before, it is natural to think that MySQL should check the cache when it receives a query request to see if it has executed this command before. If the cache hits, the result is returned directly; Otherwise, query again and add to the cache.

MySQL does have a caching module built in.

Now I have a table with 500W rows and no index. If I execute the following command twice, will the second one be faster?

SELECT * FROM t_user WHERE user_name = 'mu'Copy the code

Will not! The cache does not take effect. Why? MySQL has its caching disabled by default. Take a look at the query_cache_type variable setting.

mysql> show variables like 'query_cache_type';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_cache_type             | OFF     |
+------------------------------+---------+
Copy the code

Default off means not recommended. Why doesn’t MySQL recommend users to use their own caching feature?

  1. The MySQL cache system is very limited in application scenarios. It requires that THE SQL statements must be identical. An extra space or a change in case are considered as two different SQL statements
  2. Cache failures are very frequent. Any changes to a table’s data invalidate all caches for that table. For frequently updated tables, the cache hit ratio is very low!

Therefore, the function of caching is more suitable for professional ORM framework (for example, MyBatis enables level 1 caching by default) or independent cache service Redis.

MySQL8.0 has removed caching completely

2.2 Parser & Preprocessor (Parser & Preprocessor)

Now that I’ve skipped the caching step, what do I need to do?

If I randomly type the string chanmufeng into the client terminal, the server returns a 1064 error

mysql> chanmufeng;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'chanmufeng' at line 1
Copy the code

How does the server figure out that my input is wrong? This is where MySQL’s Parser comes in. It consists of two steps: lexical parsing and parsing

2.2.1 Lexical analysis

The following SQL statements are used as an example

SELECT * FROM t_user WHERE user_name = 'mu' AND age > 3; SELECT * FROM t_user WHERE user_name = 'mu' AND age > 3;Copy the code

The analyzer will first do “lexical analysis”, which is to break a complete SQL statement into words. For example, a simple SQL statement will be broken into eight symbols, what type each symbol is, and where it starts and ends.

MySQL recognizes this from the keyword SELECT that you type, which is a query statement. It also identifies the string t_user as “table name t_user” and the string user_name as “column user_name”.

2.2.2 Grammar analysis

After lexical analysis, we need to do grammatical analysis.

Based on the results of lexical analysis, the parser determines whether the SQL statement you entered meets the MySQL syntax based on the syntax rules, such as whether the single quotation marks are closed or whether the keyword is spelled correctly.

The parser generates a data structure based on the SQL statement, which we call the parse tree.

I misspelled it on purposeSELECTMySQL > select * from ‘MySQL’;

Mysql > ELECT * FROM t_user WHERE user_name = 'my_user_name' AND age > 3; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ELECT * FROM t_user WHERE user_name = 'mu wind' at line 1Copy the code

Lexical parsing is a very basic function, Java compiler, Baidu search engine if you want to identify statements, must also have lexical parsing function.

Any database middleware that parses SQL for routing must also have lexic and syntactic capabilities, such as Mycat, Sharding-JDBC (Druid Parser), etc. There are also many open source lexical parsing tools on the market, such as LEX, Yacc, etc.

2.2.3 Preprocessor

If we write a syntactic and lexically-sound SQL, but the field and table names do not exist, at what stage does the error occur?

Lexical analysis and parsing are unable to know what tables and fields exist in the database. Knowing this information also requires another tool of the parsing phase, the preprocessor.

It examines the generated parse tree to resolve semantics that the parser cannot resolve. For example, it checks for table and column names, names and aliases to make sure there is no ambiguity. After preprocessing, a new parse tree is obtained.

Parsing and preprocessing is essentially a compilation process that involves lexical parsing, syntax, and semantic analysis. We won’t go into more details, but interested readers can check out books on compilation principles.

2.3 Query Optimizer and query execution plan

MySQL > select table, column, and search criteria from MySQL

Not yet. The MySQL authors are worried that our SQL is too garbage, so they have designed something called query optimizer to help us improve query efficiency.

2.3.1 What is a query optimizer?

Is there only one way to execute an SQL statement? Or is the SQL that the database ultimately executes the same SQL that we send?

It isn’t. An SQL statement can be executed in many ways and return the same result. They are equivalent.

To take a very simple example, say you execute the following statement:

SELECT * FROM t1, t2 WHERE t1.id = 10 AND t2.id = 20
Copy the code
  • We can fetch the records with ID =10 from table T1, and then associate them with table T2 according to the ID value, and then judge whether the value of ID in t2 is equal to 20.
  • You can also fetch the records with ID =20 from table T2, and then associate them with table T1 according to the ID value, and then determine whether the value of ID in t1 is equal to 10.

The logical result of these two methods of execution is the same, but the efficiency of execution will be different. If there are so many ways of execution, how do these ways of execution come to be? Which one do you choose to implement? According to what judgment criteria to choose?

This is what the MySQL query Optimizer module does.

The purpose of the query optimizer is to generate different Execution plans based on the parse tree, and then select the best Execution Plan. MySQL uses a cost-based optimizer, which will be the least expensive Execution Plan.

2.3.2 What exactly does the optimizer do?

To give two simple examples:

  1. When we perform associated query on multiple tables, which table’s data is used as the benchmark table.
  2. Select an index when multiple indexes are available.

In fact, modules of the optimizer are essential for every kind of database, implementing complex algorithms to optimize query efficiency as much as possible.

In detail, the query optimizer does the following:

  • Subquery optimization
  • Equivalent predicate rewriting
  • Condition of reduction
  • External connection elimination
  • Nested connections are eliminated
  • The connection to eliminate
  • Semantic optimization

This article will not explain the details of optimization, you should first understand the overall architecture of MySQL, specific details will be introduced separately

But the optimizer is not a panacea. If the SQL statement is written too crappy, the optimizer won’t save you. So when you write SQL statements, you still need to consciously optimize them.

2.3.3 Executing the Plan

So what do I get when I optimize it? The optimizer eventually turns the parse tree into a query execution plan.

The query execution plan shows how the query will be executed later, such as multiple table associated query, which table to query first, and which indexes to actually use when multiple indexes are available during query execution.

MySQL provides a tool for viewing execution plans. We can see the execution plan information by adding EXPLAIN in front of the SQL statement.

mysql> EXPLAIN SELECT * FROM t_user WHERE user_name = ''; +----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+------------- + | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+------------- + | 1 | SIMPLE | t_user | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | 100.00 Using the where | +----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+------------- +Copy the code

For more detailed information, you can also use FORMAT=JSON, or turn on optimizer Trace.

mysql> EXPLAIN FORMAT=JSON SELECT * FROM t_user WHERE user_name = '';
Copy the code

The text will not cover every parameter of the execution plan in detail. It will be rather complicated. You should first understand the overall architecture of MySQL, and then introduce the details separately