Understand the MySQL architecture from a macro perspective

preface

Learn MySQL from a global perspective to understand the macro, and then into the details, which can help us understand the details more deeply. This article explains the MySQL architecture.

MySQL architecture

Generally speaking, MySQL can be divided into Server layer and storage engine layer two parts.

The Server layer includes connectors, query caches, analyzers, optimizers, actuators, etc. It covers most of the core service functions of MySQL, as well as all the built-in functions (such as date, time, math and encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, etc.

The storage engine layer is responsible for data storage and extraction. Its architecture mode is plug-in, supporting InnoDB, MyISAM, Memory and other storage engines. The most commonly used storage engine is InnoDB, which has been the default storage engine since MySQL 5.5.5.

In other words, if you do not specify the engine type when executing create Table, InnoDB is used by default. However, you can also select another engine by specifying the type of storage engine, such as using engine=memory in the CREATE Table statement to specify the use of memory engines to create tables. Different storage engines use different ways to access table data.

The different storage engines share a Server layer, which runs from the connector to the actuator. You can get an idea of the name of each component and then look at what each component does in turn.

The connector

The connector is responsible for establishing a connection with the Client, obtaining permissions, and maintaining and managing the connection. The connection command is as follows:

mysql -h -p -u -p
Copy the code

After the above command is sent, the classic TCP handshake is completed, and the connector begins to authenticate the identity, which is essentially to verify the user name and password. After you are authenticated, the connector will check the permissions in the permission table, and the permissions that you can access after connecting will depend on the permissions found at this time.

After a connection is established, if no action is taken, the connection is idle. If there is no action for too long, the connector automatically disconnects. This time is controlled by wait_timeout, which defaults to 8h

In the database, the long connection means that the client keeps using the same connection if there are continuous requests after a successful connection. The short connection means that the client disconnects after a few operations and creates a new connection. The process of establishing connections is complicated. It is recommended to reduce the operation of establishing connections, that is, to use long connections as much as possible.

The query cache

Once the connection is established, you can do something with the database. The execution logic leads to the second step: query the cache. When a query request is received, MySQL will check the query cache to see if it has been executed before. The previously executed statement and its result may be directly cached in memory in the form of key-value. Key is the query statement and value is the query result. The result is directly returned to the client.

If the statement is not in the query cache, subsequent execution phases continue. After the execution is complete, the results are stored in the query cache. As you can see, if the query hits the cache, MySQL can return the result directly without having to perform further complicated operations, which can be very efficient.

But in most cases, query caching is not recommended. Why?

The query cache does more harm than good because the query cache invalidates so frequently that whenever an update is performed on a table, all the query cache on that table is cleared. For databases under pressure to update, the hit ratio of the query cache can be very low. The query cache is suitable only when the data in the table is infrequently updated and is mostly used for queries.

Query_cache_type can be set to DEMAND with the query_cache_type parameter, so that query caching is not used for default SQL statements. For statements that are certain to use query caching, The available SQL_CACHE is explicitly specified in the SQL statement.

select SQL_CACHE * from T where ID=10;
Copy the code

MySQL 8.0 has removed the query cache directly, which means it has been completely removed from MySQL 8.0.

analyzer

If you did not hit the query cache in the previous step, you would go to the parser and actually execute the statement.

MySQL needs to parse the SQL, and the parser will do “lexical analysis” first. MySQL needs to identify the SQL we wrote, and what each string represents. After the identification, the analyzer performs syntax analysis. Based on the result of lexical analysis, the analyzer determines whether the entered SQL meets the MySQL syntax based on the syntax rules.

If there is a syntax ERROR, you will receive an ERROR message.

The optimizer

After a profiler, MySQL optimizes SQL before starting to execute it.

The optimizer determines which index to use when there are multiple indexes in a table, or the order in which tables are executed when there are multiple table associations in a statement.

select * from t1 join t2 using(ID)  where t1.c=10 and t2.d=20;
Copy the code

Select * from t1 where (c=10); select * from T2 where (d = 20); select * from T2 where (d = 20); However, the execution efficiency of the two is different, and the optimizer must choose the optimal execution scheme at this time.

actuator

The optimizer has chosen the optimal execution of the SQL, and the executor begins to execute the statement.

At the beginning of the execution, the system checks whether the current user has the permission to the table to be operated. (If the user matches the query cache, the system does permission verification when the query cache returns the result.) If yes, the system opens the table and executes SQL.

For a SELECT statement, if the SQL does not use an index, the executor executes:

Call the storage engine interface to get the first row of the current table and judge whether it meets the criteria in SQL. If not, skip it. If yes, add it to the result set
Take the next row and repeat the same logic until the entire table is traversed
The executor returns to the client the result set found in the above traversal that matches the criteria

SQL uses roughly the same execution logic for indexes, first fetching the “first row that meets the condition” interface, then iterating through the “next interface that meets the condition” interface, which is defined by the storage engine.

When we analyze an SQL using EXPLAIN, the Rows field can view the number of rows currently being scanned by the SQL, which is accumulated each time the executor calls the storage engine to retrieve rows of data.

The storage engine

The executor calls the interface defined by the storage engine, which is divided into:

InnoDB
MyISAM
MEMORY
CSV
ARCHIVE
BLACKHOLE
MERGE

There are two common storage engines: InnoDB and MyISAM

summary

This article introduces the MySQL architecture from a macro perspective. If you are interested in MySQL, please continue to follow the MySQL column.