This article will analyze the process of executing a SQL statement in MySQL, including how the SQL query will flow in MySQL, and how the UPDATE of SQL statement is completed.

I’ll walk you through the infrastructure of MySQL before analyzing it. Knowing what components MySQL is made of and what they do will help us understand and solve these problems.

MySQL infrastructure analysis

1.1 basic Architecture Overview of MySQL

The following is a brief architecture diagram of MySQL, from which you can clearly see how user SQL statements are executed within MySQL.

To help you understand this picture, I will briefly describe the basic functions of some of the components covered in the following figure, which will be covered in more detail in Section 1.2.

  • Connector: Authentication is related to permissions (when logging into MySQL).
  • Query cache: When executing a query statement, the cache is queried first (removed in MySQL 8.0 because this feature is not practical).
  • Parsers: If the cache is not hit, the SQL statement will be parsed. Parsers simply look at what your SQL statement is doing and then check whether your SQL statement is syntactically correct.
  • Optimizer: Execute whatever MySQL thinks is best.
  • Executor: Executes statements and returns data from the storage engine.

In brief, MySQL is mainly divided into Server layer and storage engine layer:

  • Server layer: mainly includes connectors, query cache, analyzers, optimizers, actuators, etc. All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, functions, etc. There is also a general log module binglog log module.
  • Storage engine: mainly responsible for data storage and reading, using a plug-in architecture that can be replaced, support InnoDB, MyISAM, Memory and other storage engines, among which InnoDB engine has its own log module Redolog module. The most commonly used storage engine is InnoDB, which has been used as the default storage engine since MySQL 5.5.5.

1.2 Basic Components of the Server Layer

1) the connector

Connectors are primarily concerned with authentication and permit-related functions, much like a high-level gatekeeper.

Is mainly responsible for database user login, user authentication, including checking account password, permissions and other operations, if the user account password has passed, the connector will query the user to access list all permissions, in this connection the permissions after logic are will depend on the read permissions on the data at this time, that is to say, later as long as this connection is open, Even if the administrator changes the rights of the user, the user is not affected.

2) Query cache (MySQL 8.0 removed)

The query cache is mainly used to cache the SELECT statement we execute and the result set of that statement.

After the connection is established, the MySQL database will check whether the SQL statement has been executed in the cache. The SQL statement is cached in the memory in the form of key-value, where Key is the query expectation and Value is the result set. If the cache key is matched, it is directly returned to the client. If the cache key is not matched, subsequent operations are performed and the result is cached for the convenience of the next call. Of course, the actual execution of the cache query will verify the user’s permission, whether there is a query condition of the table.

MySQL queries are not recommended because query cache invalidation can be very frequent in real business scenarios. If you update a table, all the query cache on that table will be cleared. For data that is infrequently updated, caching is fine.

Therefore, query caching is generally not recommended in most cases.

MySQL 8.0 has removed the cache feature, which is not used in practical scenarios.

3) analyzer

If MySQL does not hit the cache, then it will enter the parser. The parser is mainly used to analyze the SQL statement for what purpose.

The first step, lexical analysis, a SQL statement has multiple strings, the first to extract keywords, such as SELECT, put forward the query table, put forward the field name, put forward the query conditions and so on. Once you’ve done this, you’ll move on to step 2.

The second step, syntax analysis, mainly is to determine whether you input SQL is correct, whether the syntax of MySQL.

With these two steps done, MySQL is ready to execute, but how and what is the best result? This is where the optimizer comes in.

4) optimizer

The optimizer’s job is to execute what it thinks is the best (and sometimes not the best, this article will cover this in depth), such as index selection for multiple indexes, association order for multiple table queries, and so on.

You can say that after going through the optimizer you can say that exactly how this statement should be executed has been decided.

5) actuators

If the user does not have the permission, an error message is displayed. If the user does have the permission, the interface of the engine is invoked to return the result of the interface execution.

2. Statement Analysis

2.1 Query Statement

With all that said, how exactly does an SQL statement execute? In fact, our SQL can be divided into two kinds, one is query, one is update (add, update, delete). Let’s first analyze the query statement, which is as follows:

select * from tb_student  A where A.age='18' and A.name='Joe';
Copy the code

With the above description, let’s analyze the execution flow of this statement:

Check whether the statement has permission, if not, directly return an error message, if yes, before MySQL8.0, the first query cache, use this SQL statement as the key to query whether there is a result in memory, if there is a direct cache, if not, go to the next step.

Select * from tb_student where id= ‘1’; select * from tb_student where id= ‘1’; Then determine whether the SQL statement has syntax errors, such as whether the keyword is correct, etc., if the check is ok, proceed to the next step.

Next, the optimizer determines the execution plan. The above SQL statement can have two execution plans:

A. Select a student whose name is "Zhang SAN" from the student table, and then judge whether the age is18. B. Find out the middle age of the students first18Then query the student whose name is "Zhang SAN".Copy the code

The optimizer then chooses the solution that performs the best according to its optimization algorithm (sometimes not necessarily the best, according to the optimizer). Then confirm the execution plan and be ready to execute.

For permission verification, if there is no permission will return an error message, if there is permission will call the database engine interface, return engine execution results.

2.2 Update Statement

The above is a query SQL execution flow, so let’s see how an update statement is executed? The SQL statement is as follows:

update tb_student A set A.age='the' where A.name='Joe';
Copy the code

Let’s change the age of Zhang SAN. The age field will not be set in the actual database, otherwise it will be hit by the technical person in charge. MySQL has a binary log module that can be used by all storage engines. MySQL has a binary log module that can be used by all storage engines. Our common InnoDB engine also has a log module called redo log. Let’s explore the execution process of this statement in InnoDB mode. The process is as follows:

  • The first query to zhang SAN this piece of data, if there is a cache, also will use cache.
  • InnoDB takes the query, changes the age to 19, calls the engine API, and writes this line of data. InnoDB stores the data in memory and logs the redo log. The redo log is prepared and tells the executor that it’s done, You can submit it at any time.
  • After receiving the notification, the executor logs the binlog and calls the engine interface to submit the redo log as the commit state.
  • Update complete.

I’m sure some of you will ask, why two log modules? Why not one log module?

This is because MySQL didn’t have an InnoDB engine at first. MySQL has a MyISAM engine, but we know that redo log is unique to the InnoDB engine. Other storage engines do not have the capability of crash-safe (the capability of crash-safe does not lose previously committed records even if the database is abnormally restarted). Binlog logs can only be used for archiving.

This is not to say that a single log module is not possible, but the InnoDB engine supports transactions through redo logs. So, again, I use two log modules, but don’t make it so complicated. Why did redo log introduce prepare commit state? So let’s use contradiction here to explain why we’re doing this, right?

  • If the machine hangs up and the redo log is not written to the binlog, the machine recovers data from the redo log after the restart. However, bingog does not record the data. This data will be lost during the subsequent machine backup and the master/slave synchronization.
  • Write binlog first, and then write the redo log, assuming that finished binlog, exception to restart the machine, since there is no redo log, this machine is unable to restore the record, but binlog and recorded, so for the same reason as above, can produce inconsistent data.

If the redo log is committed in two stages, the binglog is written and the redo log is committed. This prevents the above problems and ensures data consistency. So the question is, is there an extreme case? Suppose the redo log is pre-committed and the binglog has been written. What happens if an abnormal restart occurs? This is dependent on the MySQL processing mechanism, MySQL processing as follows:

  • Determine if the redo log is complete, and if so, commit it immediately.
  • If the redo log is pre-committed but not in the commit state, the binlog is determined to be complete. If the redo log is complete, the transaction is committed. If the redo log is incomplete, the transaction is rolled back.

This solves the problem of data consistency.

Three summary

  • MySQL is mainly divided into Server layer and engine layer. Server layer mainly includes connector, query cache, analyzer, optimizer, executor, and a log module (binlog), which can be shared by all execution engines. Redolog has only InnoDB.
  • The engine layer is plug-in type, including MyISAM,InnoDB,Memory and so on.
  • SQL execution processes are classified into two types. One is for query processes: Permission verification – Query cache – Analyzer – Optimizer – Permission Verification – Executor – Engine
  • Perform the following operations for statements such as updates: Analyzer —- Permission verification —- Executor – Engine – Redo log prepare – Binlog – Redo log commit