Understand the implementation principles of MySQL

We visit all kinds of websites and apps every day. There is a lot of data on these things. So where is the data stored? The database.

When we make an operation to the website or App to access data, in fact, in the final analysis is the execution of a SQL statement, so how is a SQL statement executed?

First, architecture

It involves a lot of knowledge, so, let me take you to understand the principle behind SQL statement execution ~

Let’s look at a couple of pictures before we start

The above image is the MySQL architecture diagram provided by MySQL. We can clearly see that a connection has to go through many steps to obtain the result.

For those who are not very good at English (like me), the picture above should be confusing. So I found a clearer picture in the blog garden. Let’s take a look at the following:

If you look at this picture, is it a little bit clearer? Many people still don’t know how to start when they see this diagram, so they don’t know where to start and how to analyze it, so I drew a more simple and easy to understand architecture diagram.

To put it simply, MySQL is divided into Server layer and storage engine layer:

  • Server layer: Mainly includes connectors, query cache (MySQL8.0 removed), profilers, optimizers, actuators, etc. All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, functions, etc. There is also a general purpose log module binglog
  • The storage engine: mainly responsible for data storage and reading, the plug-in architecture can be replaced, support InnoDB, MyISAM, Memory and other storage engines, InnnoDB has its own log module (will be introduced below).The most commonly used storage engine is InnoDB, which has been used as the default storage engine since MySQL5.5.5.

Second, the detailed introduction of components

1. Connector (Verify identity + Permissions)

Connectors are primarily concerned with authentication and permit-related functions, much like a high-level gatekeeper.

Is mainly responsible for database user login, user authentication, including checking account password, permissions and other operations, if the user account password has passed, the connector will query the user to access list all permissions, in this connection the permissions after logic are will depend on the read permissions on the data at this time, that is to say, later as long as this connection is open, Even if the administrator changes the permission of the user, the user is not affected.

Mysql > select * from MySQL8.0;

The query cache is mainly used to cache the SELECT statement we execute and the result set of that statement.

After the connection is established, the MySQL database checks whether the SQL statement has been executed and stores it in memory in the form of key-value, where Key is the query statement and Value is the result set. If the cache key is matched, it is directly returned to the client. If the cache key is not matched, subsequent operations are performed and the result is cached for the convenience of the next call. Of course, the actual execution of the cache query will verify the user’s permission, whether there is a query condition of the table.

Why was MySQL8.0 removed?

Because query cache invalidation can be very frequent in real business scenarios, if you update a table, all the query cache on that table will be cleared. For data that is infrequently updated, caching is fine. Therefore, query caching is generally not recommended in most cases.

MySQL 8.0 has removed the cache feature, which is not used in practical scenarios.

3. Analyzer (lexical analysis + grammar analysis)

If MySQL does not hit the cache, then it will enter the parser. The parser is mainly used to analyze the SQL statement for what purpose.

The first step, lexical analysis, a SQL statement has multiple strings, the first to extract keywords, such as SELECT, put forward the query table, put forward the field name, put forward the query conditions and so on. Once you’ve done this, you’ll move on to step 2.

The second step, syntax analysis, mainly is to determine whether you input SQL is correct, whether the syntax of MySQL.

With these two steps done, MySQL is ready to execute, but how and what is the best result? This is where the optimizer comes in.

4. Optimizer (select the solution that MySQL considers best to execute)

The optimizer’s job is to execute what it thinks is the best (and sometimes not the best, as described in a future article “Slow SQL Analysis”), such as how to select an index for multiple indexes, how to select an association order for multiple table queries, and so on.

You can say that after going through the optimizer you can say that exactly how this statement should be executed has been decided.

5. Actuator (start execution)

If the user does not have the permission, an error message is displayed. If the user does have the permission, the interface of the engine is invoked to return the result of the interface execution.

So much for the introduction, here's a picture to understand the process

3. Statement analysis

The above mentioned so much, is to introduce a general step, so exactly how a SQL statement is executed? In fact, SQL statements can be roughly divided into two kinds, one is query statement, one is update statement (increase, update, delete).

1. Query statements

select * from tb_student  A where A.age='18' and A.name='Joe';
Copy the code

With the above description, let’s analyze the specific execution flow of this statement:

  • Check whether the statement has permission, if not, directly return an error message, if yes, before MySQL8.0, the first query cache, use this SQL statement as the key to query whether there is a result in memory, if there is a direct cache, if not, go to the next step.
  • Select table tb_student from tb_student; select table tb_student from tb_student; Then determine whether the SQL statement has syntax errors, such as whether the keyword is correct, etc., if the check is ok, proceed to the next step.
  • Next, the optimizer determines the execution plan. The above SQL statement can have two execution plans:

A. Select student whose name is “Zhang SAN” from student table, then check whether age is 18. Select * from student where age is 18 and then select * from student whose name is “John”.

The optimizer then chooses the solution that performs the best according to its optimization algorithm (sometimes not necessarily the best, according to the optimizer). Then confirm the execution plan and be ready to execute.

  • For permission verification, if there is no permission will return an error message, if there is permission will call the database engine interface, return engine execution results.

Since the update statement is designed to log operations, I’ll start with logging.

2. Log

Binlogs are available in MySQL Server. The logical logs

This section describes the SQL statements that record user operations on the database

The binlog is a logical log that records the original logic of the statement in binary format. The binlog does not have the crash-safe capability.

The statement format records the contents of the SQL statement, and the row format records the contents of the row, both before and after the update.

When sync_binlog is set to 1, the binlog of each transaction is persisted to disk. You are advised to set this parameter to 1 to ensure that the binlog will not be lost after MySQL restarts abnormally.

Because of its archival function, binlog is primarily used for master-slave synchronization and point-in-time database restoration.

Undo log

Undo Log, as its name implies, provides a rollback function, but it also provides multiple line versioning (MVCC) to ensure atomicity of transactions. In the process of data modification, a logical log is recorded in the Undo log that is the opposite of the current operation (it can be argued that when a record is deleted, the Undo log records an insert record, and vice versa, when a record is updated, it records a reverse update record). If a transaction fails due to some reasons, the undo log can be used to roll back the transaction to ensure the integrity of the transaction. Therefore, the Undo log is also necessary.

The redo log is an InnoDB engine specific log. Physical log

Changes to each page in the database are recorded and can be used to restore the physical data page after the commit (restore the data page, and only to the location of the last commit, because the change overwrites the previous one).

In MySQL, if every update operation needs to be written to the disk, and the disk also needs to find the corresponding record, and then update, the whole process of IO costs and search costs are very high. To solve this problem, the designers of MySQL have adopted the redo log to improve update efficiency.

The whole process of Logging and disk coordination is actually WAL technology in MySQL. WAL stands for write-Ahead Logging. The key point of WAL is to Write logs first and then Write disks.

In particular, when a record needs to be updated, InnoDB writes the record to the redolog and updates the buffer pool. The update is complete. At the same time, the InnoDB engine will update the operation record to disk (brush dirty pages) when appropriate (e.g. when the system is idle).

Redo log is InnoDB storage engine layer log, also weight log file, redo log is sequential, compared to update data file random write, log writing overhead is lower, significantly improve statement execution performance, increase concurrency.

No new update requests can be received between the time the redo log is full and the time the old records are erased to create new space, which may cause MySQL to stall. (It is important to set the redo log file size for a large number of concurrent systems.)

The redo log is fixed in size. For example, you can configure a set of 4 1GB files to log 4GB operations. Start at the beginning, and then cycle back to the beginning at the end.

Redo log and binlog:

  1. Redo logs are unique to InnoDB; Binlog is implemented in the Server layer of MySQL and is available to all engines.
  2. A redo log is a physical log that records changes made to a data page. The binlog is a logical log that records the original logic of the statement.
  3. Redo logs are written in a redo log cycle. Binlog can be appended. Appending means that the binlog file will be switched to the next one after being written to a certain size without overwriting the previous log.

Tips: Crash-safe indicates that after the MySQL server restarts, you can ensure that:

  • Data for all committed transactions still exists.
  • Data for all uncommitted transactions is automatically rolled back.

3. Update statements

After logging, let’s look at the update statement. Let’s look at the following statement:

UPDATE` ``test` ``SET` ``c` = `c` + 1` `WHERE` ``id` = 1;
Copy the code

Select * from c where id = 1 and add 1 to c.

What is the order of operations in this statement? Let me tell you all about it.

Operating sequence

  1. Select id=1 from engine. ID is the primary key, and the engine uses the tree search directly to find this row. If the row id=1 is already in memory, it is returned directly to the executor. Otherwise, you need to read the data into the memory from the disk and then return to the memory.

  2. The executor takes the row data returned by the engine, adds C +1 to get a new row of data, and then calls the engine interface to write the new row of data.

  3. The redo log is prepared, and the redo log is prepared.

  4. The engine says to the executor, I’m done, you can call my interface anytime to commit a transaction;

  5. The executor generates a binlog of this operation and writes the binlog to disk.

  6. The executor calls the commit transaction interface of the engine, and the engine changes the redo log to commit.

Take a look at the figure below to clarify the idea:

So the question is, why do redo logs have two steps, with a binlog in between?

In the figure above we can see that Redolog is divided into two phases, so why? What’s the problem if you don’t? Here we use contradiction to discuss:

  • Write to redolog and submit it, and then write to binlog. Suppose that after redolog is written, the machine is down, and the binlog is not written, and the machine restarts, because the redolog has been written, and the system restarts and restores the data through the redolog. If you want to restore the temporary database using the binlog, the temporary database will miss the update because the binlog of the temporary database is lost, and the restored data will be different from the original data. There is a master-slave inconsistency
  • Redolog does not log to redolog because the redolog does not log to redolog because redolog does not log to redolog because redolog does not log to redolog. But it’s already logged in binlog. Therefore, when the binlog is used to restore the temporary database, the data in the restored temporary database will be different from the original database.

END

MySQL > MySQL > MySQL > MySQL > MySQL > MySQL

The resources

MySQL crash-safe

Geek Time -MySQL Combat 45 talk