Every time we execute an SQL, for example:

mysql> select * from T where ID=10;Copy the code

You may be curious to know how mysql parses and queries the data from the database. In this video, I want to show you how MySQL works

In general, MySQL can be divided into two parts: the Server layer and the storage engine layer.

The Server layer includes connectors, query caches, analyzers, optimizers, actuators, etc., and covers most of MySQL’s core service functions, as well as all built-in functions (such as date, time, math, encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, etc.

The storage engine layer is responsible for data storage and extraction. It is a plug-in architecture that supports multiple storage engines such as InnoDB, MyISAM and Memory. The most commonly used storage engine is InnoDB, which has been the default storage engine since MySQL 5.5.5.

In other words, if you do not specify an engine type when creating a table, InnoDB will be used by default. However, you can also select another engine by specifying the type of storage engine, such as using engine=memory in the CREATE Table statement to specify that a memory engine is used to create tables. Different storage engines provide different table data access methods and support different functions. There is a Server layer, which runs from connectors to actuators, for different storage engines. Let’s start with the Server layer.

Server layer

Setting up a connection (Connectors&Connection Pool)

Establish connection with MySQL through client/server communication protocol. The communication mode between the MySQL client and server is “half-duplex”. For every MySQL connection, there is a thread state at all times to identify what the connection is doing.

Communication mechanism:

  • Full duplex: The ability to send and receive data at the same time, such as a normal phone call.
  • Half-duplex: A time when data is either sent or received, but not both. For example, early walkie-talkies
  • Simplex: Only send data or only receive data. Like a one-way street

Thread status: show processList; // User root can view all threads. Other users can only view their own threads

Querying the cache (Cache&Buffer)

Once the connection is established, the select statement can be executed, and the second logical step is to query the cache. This is the place where one can optimize the MySQL, if open the query cache and in the process of query cache query to exactly the same SQL statement, you will query results returned to the client directly, if there is no open the query cache or the SQL query to the same will be handed over to the parser to syntactic semantic parsing, and generate the parse tree “”

show variables like ‘%query_cache%’; // Check whether the query cache is enabled, space size, limit, etc

show status like ‘Qcache%’; // See more detailed cache parameters, available cache space, cache blocks, cache size, etc

The query cache is a point that can be optimized, but I would rather not use the query cache because the disadvantages of the query cache outweigh the advantages.

  1. The invalidation of the query cache is particularly frequent; whenever there is an update to a table, all the query caches on that table are cleared. So you might save it one minute, and then empty it the next.

  2. For databases that are heavily updated, the hit ratio of the query cache can be very low. Unless your business is a static table that updates only once a long time. For example, a system configuration table, the queries on this table are suitable for query caching.

Fortunately, MySQL also provides this “use as needed” approach. You can set the parameter query_cache_type to DEMAND so that the query cache is not used for the default SQL statements. For statements that you do want to use the query cache, you can explicitly specify it with SQL_CACHE, as in the following statement

mysql> select SQL_CACHE * from T where ID=10;Copy the code

Note that MySQL 8.0 removes the entire query cache feature.

The Parser (Parser)

If there is no cache hit, or the query cache is not turned on at all, the parser is the next step. The parser, also called a parser, parses the SQL sent by the client to produce a “parse tree.”

Syntax parsing: An SQL statement is composed of multiple strings and Spaces. MySQL needs to identify what each character is and what it stands for.

MySQL recognizes that this is a query statement by typing the keyword “select”. It also identifies the string “T” as “table name T” and the string “ID” as “column ID”.

After this recognition is done, “parsing” is done. Based on the results of the lexical analysis, the parser will determine whether the SQL statement you entered meets the MySQL syntax based on the syntax rules

If your statement is incorrect, You will receive an error message “You have an error in your SQL syntax”, such as the following statement select missing the starting letter “S”.


mysql> elect * from t where ID=1;

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from t where ID=1' at line 1
Copy the code

A common syntax error will indicate the first place where the error occurred, so you should focus on what comes right after “use near”.

Parse tree: Parse tree is used to parse keywords and non-keywords of SQL. For example, select username from userinfo by keyword and non-keyword.

Preprocessor: Further checks whether the parse tree is valid according to some mysql rules. For example, check whether the queried table name and column name are correct, and whether the table permission is available

The Optimizer (Optimizer)

Generate the optimal execution plan from the parse tree. MySQL uses a number of optimization strategies to generate the optimal execution plan, which can be divided into two categories: static optimization (compile time optimization) and dynamic optimization (run time optimization).

Equivalent transformation strategy

  • A =5 and a =5
  • A < b and a=5
  • Based on the joint index, adjust the condition position and so on

Optimize count, min, Max and other functions

  • The InnoDB engine min function only needs to find the leftmost part of the index
  • The InnoDB engine Max function only needs to find the rightmost index
  • MyISAM engine count(*), no need to calculate, directly return

Premature termination of a query

  • When you use the LIMIT query, you get the data required by the LIMIT instead of continuing to iterate through the next data

In the optimization of the

  • MySQL will first sort in queries, and then use dichotomy to find data. For example, where id in (2,1,3)

As in (1, 2, 3)

Such as:

mysql> select * from t1 join t2 using(ID)  where t1.c=10 and t2.d=20;
Copy the code
  • We can first fetch the ID value of the record c=10 from table T1, then associate the ID value to table T2, and then determine whether the d value in T2 is equal to 20.
  • Select * from t2 where d=20; select * from T2 where D =20; select * from T2 where D =20;

The logical result of the two execution methods is the same, but the efficiency of the execution will be different, and the optimizer’s role is to decide which one to use.

Execution engine

Query execution engine is responsible for executing SQL statements, begin to execute, must determine first you for the table T have permission to execute the query, if not, it will return without permission error, as shown below (on the project implementation, if a query cache, can be in when you return to the query cache, do the authentication. Queries also invoke precheck validation rights before the optimizer). If you have permission, the query execution engine will open the table to continue execution. At this time, the query execution engine will get the query results and return them to the client based on the storage engine type of the table in the SQL statement and the corresponding API interface to interact with the cache of the underlying storage engine or physical files. If query caching is enabled, the SQL statement and results are stored intact in the Cache&Buffer, and subsequent execution of the same SQL statement returns results directly.

This is the end of MySQL’s operating mechanism.