MySQL logical architecture analysis

Refer to the link: www.bilibili.com/video/BV1iq…

1. Logical architecture analysis

1.1. The server handles client requests

First, Mysql is a typical C/S architecture, and the server program uses mysqld

Regardless of how the client and server processes communicate, the final result is that the client process sends a piece of text (SQL statement) to the server process, which then sends a piece of text (processing result) to the client process.

What does the server process do with the request sent by the client process to produce the final result? The following uses the query request as an example:

Let’s have a look at it in detail:

  • Connectors: Refers to interaction with SQL in a different language [client program other than mysql server, language specific].

  • Connection Pool: provides multiple threads for client and server interactions to process connections, manage buffered user connections, thread processing, and other cached requirements. MySQL database connection layer.

  • Management Serveices & Utilities: System Management and control tools. Backup, security, replication, clustering, etc. [Basic service components].

  • SQL Interface: receives SQL commands from users and returns the query results.

  • Parser: SQL statement Parser [Parser, semantic parsing, syntax tree generation].

  • Optimizer: a query Optimizer that is used by SQL statements to optimize queries before they are queried. Is to optimize the client request query, according to the client request query statement, and some statistical information in the database, on the basis of a series of algorithms to analyze, get an optimal strategy, tell the following program how to obtain the results of the query statement.

    • For Example:select uid,name from user where gender = 1;thisselect Query based onwhere Statement to select, rather than first all the table query out latergenderFiltering; Then according to theuidandnameInstead of pulling out all the attributes and filtering them later, project them. Finally, the two query conditions are joined to generate the final query result.
  • Caches & Buffers: query cache, caching query results as key:value [deprecated in mysql8.0]

  • Pluggable Storage Engines: Pluggable Storage Engines. The most important thing that differentiates MySQL from other databases is its plug-in table storage engine (note: the storage engine is table-based, not database-based). Interacting with the underlying file system, the ** storage engine is mainly InnoDB.

  • File System: Data is stored on disks.

  • Files&logs: Log files

MySQL database Compared to other databases, MySQL is a little different, its architecture can be used in many different scenarios and work well. It is mainly reflected in the architecture of storage engine. Plug-in storage engine architecture separates query processing from other system tasks and data storage and extraction. In this architecture, storage engines can be selected based on business requirements and actual requirements.

1.2 logical architecture layer

1.2.1 Layer 1: Connection layer

At the top layer are some client and connection services, including local SOCK communication and most TCP/IP-like communication implemented based on client/server tools. Mainly complete some similar connection processing, authorization and authentication, and related security schemes. On this layer, the concept of thread pools is introduced to provide threads for clients that are securely accessed through authentication. SSL – based secure links can also be implemented at this layer. The server also validates that it has operational permissions for each client that is securely connected.

The first thing the system (client) does before accessing the MySQL server is to establish a TCP connection.

After the connection is successfully established through the three-way handshake, the MySQL server performs identity authentication and obtains permissions for the account password transmitted by TCP.

  • If the user name or password is incorrect, an “Access denied for user” message will be received, and the client program will stop running
  • If the user name and password are authenticated, the system checks the permission of the account and its connection from the permission table. The subsequent permission judgment logic depends on the read permission

When a TCP connection receives a request, it must be allocated to a thread that interacts exclusively with the client. So there’s also a thread pool to go through the process. Each connection fetches threads from the thread pool, eliminating the overhead of creating and destroying threads.

1.2.2 Layer 2: Service layer

MySQL core services function layer, which is the core of MySQL, such as SQL interface, including query cache, parser, parse tree, preprocessor, query optimizer. Perform query parsing, analysis, query cache, built-in functions, stored procedures, triggers, and views. The SELECT operation checks whether the query cache is matched. If the query cache is matched, the cache data is returned.

SQL Interface: indicates the SQL Interface

  • Receives SQL commands from the user and returns the results that the user needs to query. Such as the SELECT… FROM calls SQL Interface
  • MySQL supports multiple SQL language interfaces such as DML (Data Manipulation Language), DDL (Data Definition Language), stored procedures, views, triggers, and custom functions

Parser: the Parser

  • In the parser for SQL statement syntax analysis, semantic analysis. The SQL statement is decomposed into a data structure and passed to the following steps, on which the SQL statement is passed and processed. If an error is encountered in the decomposition composition, then the SQL statement is unreasonable.

  • When the SQL command is delivered to the parser, the parser verifies and parses it, creates a syntax tree for it, enriches the query syntax tree based on the data dictionary, and verifies whether the client has the permission to execute the query. After the syntax tree is created, MySQL also optimizes and rewrites SQl queries syntactically.

Optimizer: query Optimizer

  • After the SQL statement is parsed but before the query, the query optimizer is used to determine the execution path of the SQL statement and generate an execution plan. The execution plan indicates which indexes should be used for the query (full table or index), the join order between the tables, and finally calls the methods provided by the storage engine to actually execute the query according to the steps in the execution plan and returns the query results to the user.

  • It uses a select-project-connect strategy for queries. SELECT id,name FROM student WHERE gender = ‘female ‘; The SELECT query is selected based on the WHERE statement, rather than gender filtering after all the tables are queried. The SELECT query first projects attributes based on ID and name, rather than fetching all attributes and then filtering them, concatenating the two criteria to produce the final query result.

Caches & Buffers: Query the cache component

  • MySQL maintains some caches and buffers. For example, Query Cache is used to Cache the results of a SELECT statement. If the Query result can be found in Query Cache, the whole process of Query parsing, optimization, and execution is not needed, and the result is directly reported to the client.

  • This caching mechanism consists of a series of small caches. Such as table cache, record cache, key cache, permission cache and so on

  • This query cache can be shared between different clients.

  • As of MySQL 5.7.20, query caching is not recommended and was removed in MySQL 8.0, mainly due to the extremely low hit ratio.

Quick story: If I asked you what the value of 9+8×16-3×2×17 was, you would probably use a calculator to work it out and end up with 35. If I were to ask you again what 9+8 times 16-3 times 2 times 17 is, would you do it again with a stupid calculation? We just did the math, so just say the answer.

1.2.3 Layer 3: Engine layer

Storage Engines are responsible for the Storage and extraction of data in MySQL, performing operations on the underlying data maintained at the physical server level. The server communicates with the Storage engine through apis. Different storage engines have different functions, so you can select them according to your actual needs.

MySQL 8.0.25 supports the following storage engines by default:

1.2.4 Storage Layer

The data storage layer stores data on file systems running on raw devices and interacts with storage engines. All the data, the database, the table definition, the contents of each row of the table, and the indexes, are stored on the file system as files and interact with the storage engine. Some storage engines, such as InnoDB, also support direct management of raw devices without using a file system, but modern file system implementations make this unnecessary. Under the file system, local disks and storage systems such as DAS, NAS, and SAN can be used. 【 var/lib/mysql 】

1.3, summary

The MySQL architecture diagram is shown at the beginning of this section. In order to be familiar with the SQL execution process, we can simplify as follows:

Simplified to three-layer structure:

  1. Connection layer: the client establishes a connection with the server, and the client sends SQL to the server.
  2. SQL layer (service layer) : query processing of SQL statements; Has nothing to do with how database files are stored;
  3. Storage engine layer: works with database files and is responsible for storing and reading data.

2. Causes of SQL performance degradation

Performance degradation leads to slow SQL execution, long execution time, and long waiting time:

  • The query statement is written poorly.
  • Index failure: An index is created but not used.
  • Too many associated queriesjoin(design defects or forced requirements).
  • Server tuning and setting of various parameters (buffer, thread count, etc.).

3. SQL execution process

3.1 SQL execution process in MySQL

  1. The client sends a query request to the MySQL server
  2. The server first queries the query cache, and returns the result stored in the cache immediately if it hits the cache, otherwise it goes to the next stage
  3. The server parses the SQL, preprocesses it, and generates the corresponding execution plan by the optimizer
  4. MySQL invokes the storage engine API to execute the query according to the execution plan
  5. The result is returned to the client and the query result is cached

MySQL query process:

1. Query cache:

If the Server finds this SQL statement in the query cache, it will directly return the result to the client. If not, the parser phase is entered. It should be noted that since query caching tends to be inefficient, this feature was abandoned after MySQL8.0.

In most cases the query cache is a chicken rib, why?

SELECT employee_id,last_name FROM employees WHERE employee_id = 101;
Copy the code

Query caching is the process of caching query results in advance so that they can be retrieved the next time without execution. Note that the query cache in MySQL does not cache the query plan, but the corresponding results of the query. This means that query matching is much less robust, and only the same query operation will hit the query cache. A difference in any character (for example, space, comment, case) between two query requests will result in a cache miss. Therefore, the MySQL query cache hit ratio is not high.

So what are the conditions for hitting a cache, and what won’t be cached?

If the query request contains some system functions, user-defined variables and functions, and some system tables such as those in mysql, INFORMATION_SCHEMA, and Performance_SCHEMA databases, the request will not be cached. To certain system functions, for example, the same function of the two called may produce different results, such as function NOW, each call to generate the latest current time, if call the function in a query request, that even if the query request text information are all the same, the different time of two query should also get a different result, If the result of the first query is cached, it is an error to use the result of the first query on the second query.

In addition, since it is a cache, there are times when the cache will expire. MySQL’s caching system monitors every table involved, and whenever the structure or data of the table is changed, If an INSERT, UPDATE, DELETE, TRUNCATE TABLE, ALTER TABLE, DROP TABLE, or DROP DATABASE statement is used for this TABLE, All cache queries that use this table will be invalidated and removed from the cache! For databases under pressure to update, the hit ratio of the query cache can be very low.

It is recommended to use query caching for static tables (i.e. tables that rarely update, such as configuration tables and dictionary tables). You can set query_cache_type to DEMAND in the configuration file my.cnf, which means that the SQL statement will be cached only when the sql_cache keyword is present, for example

query_cache_type=2Use as neededCopy the code

If set to 2, no query cache is used for default SQL statements, and sql_cache can be used for statements that you do want to use query cache, as in the following SQL statement

select sql_cache * from test where ID = 5;
Copy the code

Check whether query cache is enabled:

mysql> show variables like "%query_cache%";
+------------------------------+----------+
| Variable_name                | Value    |
+------------------------------+----------+
| have_query_cache             | YES      |
| query_cache_limit            | 1048576  |
| query_cache_min_res_unit     | 4096     |
| query_cache_size             | 16777216 |
| query_cache_type             | OFF       |
| query_cache_wlock_invalidate | OFF      |
+------------------------------+----------+
6 rows in set (0.00 sec)
Copy the code

Query_cache_type: 0 indicates that query cache is disabled, 1 indicates that query cache is always enabled (sql_no_cache is not used for caching), and 2 indicates that query cache is enabled on demand (SQL_cache is used for caching). Query_cache_size: specifies the maximum memory space allocated to the cache

RESET QUERY CACHE: Removes all queries from the QUERY CACHE FLUSH TABLES: closes all open TABLES and clears the QUERY CACHE

Example: SQL statement execution effect after query caching is enabled

[root@localhost WWW]# find / -name my.cnf /etc/my.cnf [root@localhost WWW]# vim /etc/my.cnf add query_cache_type=1 [root@localhost www]# systemctl restart mysqldCopy the code

Query result:

#Before query caching is enabled
SELECT * from purchase_order WHERE order_sn like "ry2bhxtAu4%"
> OK
>Time: 2.674 s

#After query caching is enabled
SELECT * from purchase_order WHERE order_sn like "ry2bhxtAu4%"
> OK
>Time: 0.002 s
Copy the code

2. Parsers:

In the parser for SQL statement syntax analysis, semantic analysis.

The parser first does a “lexical analysis”. You are entering an SQL statement consisting of multiple strings and Spaces. MySQL needs to figure out what the strings are and what they represent. MySQL knows from the keyword “select” that you typed in. This is a query. It also recognizes the string “T” as “table name T” and the string “ID” as “column ID.”

Next, do a “grammar analysis”. Based on the results of the lexical analysis, the parser (such as Bison) will determine whether the SQL statement you entered meets the MySQL syntax based on the syntax rules.

select department_id,job_id,avg(salary) from employees group by department_id;
Copy the code

If the SQL statement is correct, an error is reported, and a syntax tree like this is generated:

Procedure steps for SQL lexical analysis:

3. Optimizer:

In the optimizer, the execution path of the SQL statement is determined, such as whether it is retrieved by the full table or by the index.

After the parser, mysql knows what to do. Before the SQL statement is executed, it needs to be processed by the optimizer. A query statement can be executed in many ways that can return the same result.

For example, the following statement executes a join on two tables:

select * from test1 join test2 using(ID)
where test1.name='zhangwei' and test2.name='Mysql Advanced Course';
Copy the code

Select name=’zhangwei’ from test1 where ID =’zhangwei’; select name=’zhangwei’ from test2 where ID =’zhangwei’; select name from test2 where ID =’zhangwei’;

Select * from test2 where name=’ ID ‘and name= zhangwei; select * from test1 where name= zhangwei; select * from test1 where name= zhangwei; select * from test1 where name= zhangwei;

The logical result of the two execution methods is the same, but the efficiency of the execution will be different, and the role of the optimizer is to decide which one to use. After the optimizer phase is complete, the execution plan of the statement is determined, and then the executor phase is entered.

If you have any questions, such as how the optimizer chose the index and whether it might have chosen it wrong, etc. We’ll talk about that later when we talk about indexes.

In query optimizer, it can be divided into logical query optimization stage and physical query optimization stage.

4. Actuator:

So far, no actual tables have been read or written, only an execution plan has been produced. So you’re in the actuator phase.

Before performing this operation, check whether the user has the permission. If not, a permission error is returned. If you have the permission, you execute the SQL query and call the storage engine API of the corresponding table and return the results (the storage engine API is just an abstract interface, and there is a storage engine layer below, which depends on the storage engine chosen by the table). In versions below MySQL8.0, query results are cached if query caching is set.

select * from test where id=1;
Copy the code

Select * from test where ID does not have an index;

Call InnoDB engine interface to fetch the first row of the table, check whether the ID value is 1, if not skip, if it is stored in the result set; Call the engine interface to fetch “next row” and repeat the same logic until the last row of the table is fetched. The executor returns the recordset of all rows that meet the criteria in the above traversal as a result set to the client. At this point, the statement is complete. For indexed tables, this is more efficient, but the logic is similar.

3.2. Summary: the query process of Sql statements

The flow of SQL statements in MySQL is as follows: SQL statement → query cache → parser → optimizer → executor.

Mysql query flow:

Mysql client establishes connection with mysql server through protocol, sends query statement, first checks query cache, if match, directly returns the result, otherwise carries on statement parsing, that is, before parsing query, sends query statement. The server first accesses the Query cache, which stores SELECT statements and the corresponding query result set. If a query result is already in the cache, the server will no longer parse, optimize, and execute the query. It simply returns the cached results to the user, which greatly improves the performance of the system.

Syntax parsers and preprocessing:

First, mysql parses SQL statements by keyword and generates a corresponding “parse tree”. The mysql parser validates and parses queries using mysql syntax rules; The preprocessor further checks that the parsed number is valid based on some mysql rules.

Query optimizer:

When the parse tree is considered legal, it is turned into an execution plan by the optimizer. A query can be executed in many ways, all returning the same result. The optimizer’s job is to find the best execution plan.

Then, the executor: mysql uses the B TREE index by default, and a general direction is that no matter how much SQL is messed up, at least for now, mysql only uses one index in the table at most.

3.3 SQL execution principle in MySQL8

To understand mysql query execution, you can use select @@profiling; Or show variables like ‘profiling’; Check whether schedule is enabled to enable mysql to collect resources used during SQL execution

  1. Verify that profiling is enabled
mysql> select @@profiling;
mysql> show variables like 'profiling';
Copy the code

Profiling =0 means off, we need to turn profiling on, that is, set to 1:

mysql> set profiling=1;
Copy the code

2. Execute the same SQL query multiple times

Then we execute an SQL query (you can execute any SQL query) :

mysql> select * from employees;
Copy the code

3. View Profiles

View all profiles generated by the current session:

mysql> showprofiles; # display the most recent queriesCopy the code

4. View profiles

Display the execution plan and view the execution steps of the program:

mysql> showprofile; # display the most recent1A queryCopy the code

You can also run the show profile CPU,block IO for Query Query_ID command to Query the specified Query ID.

Such as:

mysql> show profile for query 7;
Copy the code

Query SQL execution time results are the same as above.

In addition, you can also query for richer content:

mysql> show profile cpu,block io for query 6;
Copy the code

Continue to:

mysql> show profile cpu,block io for query 7;
Copy the code

You can see that they are the same, so the query cache is not enabled,

The above operation was tested in MySQL5.7, and it was found that the same SQL statement executed twice before and after is still the same query process. Don’t you use caching? Here we need to explicitly turn on the query caching mode. In MySQL5.7, set the following:

1. Enable query cache in the configuration file

Add a new line to /etc/my.cnf:

query_cache_type=1
Copy the code

2. Restart the mysql service

systemctl restart mysqld
Copy the code

3. Enable the query execution plan

Because the service has been restarted, you need to re-execute the following command to enable profiling.

mysql> set profiling=1;
Copy the code

4, execute statement twice:

mysql> select * from locations;
mysql> select * from locations;
Copy the code

5. View Profiles

6. View profiles

Display the execution plan and view the execution steps of the program:

mysql> show profile for query 1;
Copy the code

mysql> show profile for query 2;
Copy the code

The conclusion speaks for itself. When executing number 2, there is much less information than when executing number 1. As you can see from the screenshot, the query gets the data directly from the cache.

Pay attention to

  • SQL must be consistent; otherwise, the cache cannot be hit
  • The same cache-enabled configuration information was added in Mysql8, and an error occurred during service restart

3.4. Order of SQL syntax

With the update of Mysql version, its optimizer is also constantly upgraded. The optimizer will analyze the performance consumption caused by different execution sequences and dynamically adjust the execution sequence.

Demand: Query the number of people over 20 years old in each department and the number of people over 20 years old should not be less than 2

Here is the order of the most common queries:

Conclusion:

4. Seven JOIN theories

5, Database buffer pool

InnoDB storage engine manages storage space in the unit of pages. We add, delete, change and search operations are essentially accessing pages (including reading pages, writing pages and creating new pages). While disk I/O needs to consume a lot of time, and in memory operations, efficiency will be much higher, in order to make the data table or index data available to us at any time, DBMS will request memory as a data buffer pool, before the actual access to the page, Pages on disk need to be cached into a Buffer Pool in memory before they can be accessed.

This has the advantage of minimizing disk activity and thus reducing direct I/O time with the disk. As you know, this strategy is critical to improving query performance for SQL statements. If the indexed data is in the buffer pool, the cost of access is much lower.

5.1 Buffer pool vs query cache

Are buffer pools and query caches the same thing? It isn’t.

1. Buffer Pool

First we need to understand what buffer pools are included in the InnoDB storage engine.

In The InnoDB storage engine, a portion of the data is stored in memory. The buffer pool accounts for most of this memory. It is used to store various data caches, as shown in the following figure:

From the diagram, you can see that the InnoDB buffer pool includes data pages, index pages, insert buffers, lock information, adaptive Hash and data dictionary information, etc.

Caching principles:

The “location * frequency” principle helps optimize I/O access efficiency.

First, location determines efficiency, and buffer pools are provided so that data can be accessed directly in memory.

Second, frequency determines priority order. The size of the buffer pool is limited. For example, if the disk has 200 GB, but the memory is only 16 GB, and the buffer pool size is only 1 GB, all data cannot be loaded into the buffer pool. In this case, the hot data that is frequently used is preferentially loaded.

2. Query cache

So what is query caching?

Query caching is the process of caching query results in advance so that they can be retrieved the next time without execution. Note that the query cache in MySQL does not cache the query plan, but the corresponding results of the query. The hit ratio is low because the hit criteria are harsh and the query cache is invalidated whenever the table changes.

5.2 How does the Buffer Pool read Data

The buffer pool manager tries to save frequently used data. When the database performs a page read operation, it first determines whether the page is in the buffer pool. If not, it stores the page to the buffer pool through memory or disk for reading.

The structure and function of the cache in the database are shown in the following figure:

If we update the data in the cache pool while executing the SQL statement, will the data be immediately synchronized to disk?

5.3 Viewing and Setting the Buffer Pool Size

If you are using the InnoDB storage engine, you can check the size of the buffer pool by looking at the innodb_buffer_pool_size variable. The command is as follows:

show variables like 'innodb_buffer_pool_size';
Copy the code

You can see that InnoDB buffer pool size at this time is only 134217728/1024/1024=128MB. We can change the buffer pool size to, say, 256MB as follows:

set global innodb_buffer_pool_size = 268435456;
Copy the code

Or:

[server]
innodb_buffer_pool_size = 268435456
Copy the code

Now look at the buffer pool size, which has been successfully changed to 256 MB:

5.4 Multiple Buffer Pool Instances

[server]
innodb_buffer_pool_instances = 2
Copy the code

This indicates that we want to create two Buffer Pool instances.

To see how to check the number of buffer pools, use the following command:

show variables like 'innodb_buffer_pool_instances';
Copy the code

How much memory does each Buffer Pool instance actually occupy? It was actually calculated using this formula:

innodb_buffer_pool_size/innodb_buffer_pool_instances
Copy the code

This is the total size divided by the number of instances, resulting in the size of each Buffer Pool instance.

5.5 Extended questions

The Buffer Pool is a core component of MySQL’s memory structure. You can think of it first as a black box.

Update data flow under the black box

When we query data, the storage engine will first load the data from the disk into the buffer pool and then return the data to the client. Similarly, when we update some data, if the data does not exist in the buffer pool, we will also load the data in the buffer pool first. Then modify the data in memory, the modified data will be uniformly flushed to disk later

Also, I get an error halfway through the update and want to roll back to the previous version. What should I do? What crash recovery can we talk about if we can’t guarantee data persistence and transaction rollback?

Answer: Redo Log and Undo Log