If you want to have an in-depth understanding of MySQL, you should first understand how MySQL statements are implemented. If you understand the execution process of MySQL statements, you can more quickly analyze the cause of the problem, or carry out reasonable optimization.

MySQL architecture

The architecture diagram of MySQL is shown below and consists of the following parts: connector, cache, analyzer, optimizer, executor and storage engine.

MySQL can be divided into server layer and storage engine layer, server layer includes connectors, analyzers, optimizers and actuators, mainly responsible for SQL syntax parsing, implementation of built-in functions, triggers, views and so on. Storage engine layer is responsible for data storage and extraction, storage engine is plug-in, MySQL supports storage engine InnoDB, MyISAM, Memory and so on. Currently, InnoDB is the default storage engine for mysql.

The connector

Connectors are responsible for establishing network connections with clients, verifying user names and passwords, verifying user rights, and maintaining and managing connections.

After the network connection is established, the user name and password are verified. After the user name and password are verified, the connector queries the permission of the user in the permission table. After that, the permission logic in the connection will depend on the permissions read at that point. This means that after a user successfully connects, modifying the user’s permission will not affect the established connection. The permission will take effect only after the connection is re-established.

MySQL’s network connection uses a multithreaded model, maintaining a thread pool and selecting a thread from the pool to process each new connection request. You can use the show processList command to see all the connections that are currently established.

+------------+--------------+--------------------+------------------+---------+-------+-------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +------------+--------------+--------------------+------------------+---------+-------+-------+------------------+ | 49788 | 1801071833 | user_name | 10.1.1.1: test_db | Sleep | 131 | | NULL | | 2309292411 | user_name | 10.1.1.1:57642 | test_db | Query | 0 | NULL | show processlist |Copy the code

ID Indicates the ID of the thread that establishes the connection. If the client has not acted for a period of time, Sleep is displayed in the Command column, indicating that the connection is idle. A multithreaded model is bound to have a limited number of connections, so if the client is inactive for too long, the connector will automatically disconnect and reclaim the thread.

The cache

After the connection is established, the query statement can be executed. The query statement first queries whether the statement is cached in the cache because the execution result of the MySQL query statement may be stored in the cache in k-V format. The SQL statement is used as the KEY, and the query result is used as the value.

However, the MySQL cache is not recommended because the MySQL cache invalidates frequently. If a table is updated, all caches in the table will be invalidated. Therefore, the cache hit ratio is very low. Not as flexible and efficient as using Redis or Memcached in the business layer.

analyzer

If the cache misses or is not used, the query will arrive at the parser, which is a programming language parser that parses SQL. The work of the analyzer is divided into two parts:

1. Lexical analysis: During lexical analysis, the analyzer will analyze each string separated by Spaces or commas in SQL statements, extract the SELECT keyword, map the string identified as the table name in the statement to the MySQL table, and map each column to the field in the table.

2 Syntax analysis: Syntax analysis refers to whether the entire SQL statement meets the syntax requirements. If yes, the statement can be executed successfully. If no, an error is reported.

The optimizer

The optimizer’s functionality can be described in a sentence, but it is very important to determine the performance of the query. The optimizer determines which index to use when there are multiple indexes in the table. Or deciding which table to associate with which table during a join table query.

actuator

The function of the actuator is to call the API of the storage engine to store or retrieve data. Before invoking the API of the storage engine, permission verification is performed to check whether the user has the operation permission on the table. If the index does not match, the storage engine scans the table one by one until it finds the specified data and returns it to the server layer. If an index is hit, the storage engine scans the hit data until it finds the specified data. If the index type is const, the storage engine hits it directly and returns.

How is the query statement implemented

Having said the architecture of MySQL, let’s use an example to summarize how a query language is implemented

select * from t where id = 123 and name = 'tom'
Copy the code

1 The client establishes a network connection with the MySQL server. The connection statement is as follows:

Mysql -h 127.0.0.1 -p 3306 -u'name' -p'password! ' database_name -A --default-character-set=utf8
Copy the code

This statement specifies the address of the MySQL server as 127.0.0.1, i.e. the local host, port number as 3306, user name as name, and password as password. Specify the library name as database_name and the default character set as UTF8.

2 After the connection is complete, if the MySQL cache mechanism is enabled, the system checks whether the cache is hit. If the cache is hit, the system returns the data in the cache. If the cache is not hit, the system continues.

For example, select is the SQL keyword, t is the table name, and ID and name are the fields in the table name. Then analyze whether the SQL syntax is normal and the statement can be executed normally.

4 The optimizer analyzes whether there is an index on the field ID and name and which index should be selected. If table T is the primary key id, then the parser will go directly to the primary key index.

5 The actuator checks whether the user has the read permission before executing the user. After permission verification, the executor invokes the API of the storage engine to query the data and returns the data to the client.

How is the update statement implemented

An update statement execution will experience a query that go through several stages, connector connection is established, syntax analyzer analysis, the optimizer to choose the index and actuator API call storage engines, compared with the query, the update statement is more complex, because MySQL InnoDB engine to ensure that the data is not lost in the database after the machine downtime.

Also use an example to summarize how the query statement is implemented

update t set name = 'tom' where id = 123
Copy the code

Procedure 1 Establish a network connection between the client and the MySQL server

2 The parser determines that this is an update statement

3 The optimizer selects the primary key index, assuming id as the primary key of the table

4 The executor first checks whether there is a row of data in table T whose ID is 123 in the memory. If there is no row of data in table T, the executor retrives the row from the memory through the storage engine

5 The executor changes the name field to Tom to get a new line

6 The storage engine writes the new row to the memory and writes redo log. The redo log is in the prepare state

7 Write bin log logs to the executor

8 The storage engine changes the redo log status to commit

Mysql asynchronously writes data from memory to disk at regular intervals. This process is similar to that of a file system with a page cache. A file is written to the cache first and then a separate process is used to flush data to disk. Mysql uses redo log logs so that data is not lost even if the server goes down and can be recovered from redo log logs.

Redo log and bin log

1 Redo log logs are written by the server layer. Bin log logs are written by the storage engine.

The redo log is a physical log that records what was changed on a data page. The bin log is used to record logical operations. In statement mode, bin log records SQL statements.

3 Redo log Indicates that data is written in a cycle. When the space is used up, data is flushed to disks and then the space is cleared. Bin log Logs are appended.

4 Redo log is used to restore data after a database crash. Bin log is used for active/standby synchronization and data backup.