preface

You may write dozens of SQL statements a day, but do you know how our systems actually interact with the database? How does MySQL store our data and manage our transactions? Does your mind go blank beyond writing a few "select * from dual" queries? This article will take you into the world of MySQL and give you a thorough understanding of how a system interacts with MySQL and what MySQL does when it receives a SQL statement from us.

MySQL driver

When our system communicates with the MySQL database, requests are not sent and received out of thin air; even when we ourselves are doing nothing, something must be working on our behalf. Almost every programmer who has worked with MySQL is familiar with the MySQL driver. It is the MySQL driver that establishes the connection to the database at the bottom layer for us, and only once the connection is established can any further interaction take place. Look at the diagram below.

So before the system interacts with MySQL, the MySQL driver sets up the connection for us, and then we only need to send SQL statements to perform CRUD operations. If one SQL request establishes one connection, then multiple requests will establish multiple connections, and here comes the problem: our system is certainly not used by just one person, which means there must be multiple requests competing for connections at the same time. Our web systems are typically deployed in a Tomcat container, and Tomcat can process many requests concurrently, which causes many connections to be created and then closed. What is wrong with that? See the figure below.

The Java system connects to the MySQL database over TCP/IP through the MySQL driver, so if every request creates and then destroys a connection, it causes unnecessary waste and degrades performance. In other words, it is unreasonable for multithreaded requests to frequently create and destroy connections; this hurts the performance of our system. But what if we kept a fixed set of connections ready for threads to use, so that we never have to repeatedly create and destroy them? Knowledgeable readers will smile: yes, that is the database connection pool.

Database connection pool: maintains a certain number of connections so the system can obtain one easily. Take a connection from the pool, use it, and put it back; we no longer have to care about creating and destroying connections, because the pool maintains them for us.

Common database connection pools include Druid, C3P0, and DBCP. The implementation principles of connection pools will not be discussed in depth here; using a pool greatly reduces the cost of constantly creating and destroying connections. This is the famous "pooling" idea, and you can see it everywhere, whether in thread pools or HTTP connection pools.
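
The pooling idea above can be sketched in a few lines of Python. This is a hypothetical toy (the `Connection` class is just a stand-in), not how Druid, C3P0, or DBCP are actually implemented, but it shows the core trick: connections are created once and then reused instead of being destroyed.

```python
import queue

class Connection:
    """Stand-in for a real MySQL connection (hypothetical placeholder)."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ConnectionPool:
    """Keep a fixed number of connections and hand them out on demand."""
    def __init__(self, size):
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(Connection(i))   # created once, reused afterwards

    def acquire(self):
        return self._pool.get()             # blocks if the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)                # returned to the pool, not destroyed

pool = ConnectionPool(size=1)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2)   # True -- the same connection object is reused
```

Because `acquire` blocks when the pool is empty, concurrent threads wait their turn instead of each opening a fresh TCP connection.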

Database connection pool

So far, we have learned that when our system accesses the MySQL database, connections are not created per request but obtained from the database connection pool, which removes the performance cost of repeatedly creating and destroying connections. But there is still a small question: the business system is concurrent, so does MySQL receive all of these requests on just a single thread?

In fact, the architecture of MySQL also provides such a pool on its own side, likewise a database connection pool. Both parties manage connections through connection pools, so threads do not have to scramble for connections and, more importantly, do not have to repeatedly create and destroy them.

So far the connection problem between the system and the MySQL database has been explained clearly. How and by whom are these connections handled in the MySQL database?

Network connections must be handled by threads

Anyone with a little computer science background knows that network connections are handled by threads: a so-called network connection carries requests, and each request is handled by a corresponding thread. This means that requests carrying SQL statements are processed by individual threads inside MySQL.

How do these threads handle these requests? What exactly do they do?

SQL interface

The thread that processes the request in MySQL reads the SQL statement from the connection and hands it to the SQL interface for processing.

Query parser

Suppose you now have an SQL statement like this:

SELECT stuName,age,sex FROM students WHERE id=1

But this SQL is written for humans; how does the machine know what you mean? This is where the parser comes in. It parses the SQL statement passed in by the SQL interface and translates it into a form that MySQL itself can understand. As for how the parsing is done, it is essentially a set of grammar rules of its own.
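
To make "translate text into structure" concrete, here is a very reduced sketch of what a parser does, assuming only the single `SELECT ... FROM ... WHERE ...` shape used above. Real MySQL builds a full parse tree from a grammar; this regex-based `parse_select` is purely illustrative.

```python
import re

def parse_select(sql):
    """Turn a flat SQL string into a small structured tree (toy sketch)."""
    m = re.match(r"SELECT\s+(.+?)\s+FROM\s+(\w+)\s+WHERE\s+(.+)", sql, re.I)
    if not m:
        raise ValueError("unsupported statement")
    cols, table, cond = m.groups()
    return {
        "type": "select",
        "columns": [c.strip() for c in cols.split(",")],  # projection list
        "table": table,                                   # target table
        "where": cond.strip(),                            # filter condition
    }

tree = parse_select("SELECT stuName,age,sex FROM students WHERE id=1")
print(tree["table"])     # students
print(tree["columns"])   # ['stuName', 'age', 'sex']
```

Once the statement is structure rather than text, later components (the optimizer, the executor) can work with fields like `tree["table"]` instead of re-reading the string.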

Now that the SQL has been parsed into something MySQL understands, is the next step execution? In theory yes, but MySQL is more powerful than that: it also helps us choose the optimal query path.

What is the optimal query path? It is the way of executing the query that MySQL believes is the most efficient.

How does it do that? This brings us to MySQL's query optimizer.

MySQL query optimizer

We do not need to care how the query optimizer is implemented internally. What we need to know is that MySQL will optimize our SQL statement in the way it considers best and generate an execution plan. For example, if you have created multiple indexes, MySQL selects which index to use on the principle of minimum cost. The cost here mainly consists of two parts: IO cost and CPU cost.

IO cost: the cost of loading data from disk into memory. By default, the IO cost of reading one data page is 1. MySQL reads data in units of pages: when some piece of data is needed, it does not read only that piece, but reads the adjacent data into memory along with it. This is the famous principle of locality, so MySQL reads a whole page at a time, and the cost of each page is 1. The IO cost is therefore mainly related to the number of pages read.

CPU cost: the cost of CPU operations such as checking WHERE conditions and sorting after the data has been read into memory, which is obviously related to the number of rows. By default, the cost of examining one row is 0.2.

The MySQL optimizer calculates these costs for each candidate plan and executes with the index whose "IO cost + CPU cost" is lowest.
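
The arithmetic behind "minimum cost" can be sketched directly from the two constants above. The page and row counts below are made-up numbers for illustration; the real optimizer's formula has many more terms, but the comparison works the same way.

```python
def plan_cost(pages_read, rows_examined):
    """Cost model from the text: 1.0 per page of IO, 0.2 per row of CPU work."""
    io_cost = pages_read * 1.0
    cpu_cost = rows_examined * 0.2
    return io_cost + cpu_cost

# Hypothetical numbers: a full table scan vs. a secondary index lookup.
full_scan = plan_cost(pages_read=100, rows_examined=10000)   # 100 + 2000 = 2100.0
index_path = plan_cost(pages_read=4, rows_examined=20)       # 4 + 4 = 8.0

# Pick whichever access path is cheaper, as the optimizer would.
best = min(("full scan", full_scan), ("index", index_path), key=lambda p: p[1])
print(best[0])   # index
```

Note that with these constants a plan touching few pages but many rows can still lose: CPU cost dominates once `rows_examined` gets large.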

After the optimizer picks the best index, it calls the storage engine interface to start executing the SQL statement that has been parsed and optimized.

The storage engine

The query optimizer calls the storage engine's interface to execute the SQL, meaning that the actual execution of the SQL happens inside the storage engine, where data is stored either in memory or on disk (the storage engine is a very important component; more on this later).

The executor

The executor is a very important component, because the operations of the preceding components must ultimately be carried out by the executor invoking the storage engine interface. The executor finally calls the storage engine's interfaces according to the execution plan to complete the SQL execution.

First introduction to the storage engine

Let's illustrate this with an UPDATE statement, as follows:

UPDATE students SET stuName = 1 WHERE id = 1

When our system sends such a statement to MySQL, MySQL invokes the storage engine through the executor following the sequence of steps described above (the flowchart is the one shown earlier). When the SQL statement is executed, the data is either in memory or on disk. Operating directly on disk would mean random IO reads and writes whose speed is unacceptable, so each time a statement is executed the relevant data is first loaded into memory. That memory is a very important component of InnoDB: the Buffer Pool.

Buffer Pool

The Buffer Pool is a very important memory structure in the InnoDB storage engine. As the name implies, it acts as a buffer, somewhat like Redis does in front of a database: we all know that MySQL data ultimately lives on disk, and without a Buffer Pool every request would have to search the disk, incurring IO operations, which is unacceptable. With the Buffer Pool, the results of the first query are stored in the pool, so subsequent requests query the Buffer Pool first; only if the data is not there does MySQL search the disk and then place the result into the Buffer Pool, as shown in the figure below.

Following the above diagram, the execution steps for this SQL statement look something like this

  1. The InnoDB storage engine checks the Buffer Pool to see whether the data with id=1 is already there
  2. If it is not, the data is loaded from disk and placed into the Buffer Pool
  3. An exclusive lock is then placed on the record (this mechanism will not be discussed in this article, but in a future one)

Undo log file: records what the data looked like before modification

"Undo" means to cancel or roll back. An undo log records the state before an operation happened, so it can be reversed (what was it originally?).

We just said that the record is loaded into the Buffer Pool before being updated. In fact, at the same time as loading it into the Buffer Pool, InnoDB also inserts a log entry into the undo log file: it writes down the original value of the record whose id equals 1.

What is the purpose of this?

The InnoDB storage engine supports transactions. If the update fails, that is, the transaction fails to commit, then all operations in the transaction must be rolled back to the state before execution. In other words, when a transaction fails, the original data is not affected.
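
The rollback mechanism can be sketched like this: record the old value before every change, and on failure replay those records backwards. This `MiniTransaction` class is a made-up illustration of the idea, not InnoDB's actual undo format.

```python
class MiniTransaction:
    """Sketch of the undo idea: save old values so a failed tx can restore them."""
    def __init__(self, table):
        self.table = table
        self.undo_log = []                            # (row_id, column, old_value)

    def update(self, row_id, column, new_value):
        old = self.table[row_id][column]
        self.undo_log.append((row_id, column, old))   # record "what it was"
        self.table[row_id][column] = new_value        # then apply the change

    def rollback(self):
        for row_id, column, old in reversed(self.undo_log):
            self.table[row_id][column] = old          # replay the log backwards

table = {1: {"stuName": "original"}}
tx = MiniTransaction(table)
tx.update(1, "stuName", "changed")
tx.rollback()                                         # pretend the commit failed
print(table[1]["stuName"])   # original
```

Replaying in reverse order matters when a transaction touches the same row more than once: the earliest saved value must win.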

As an aside, MySQL is also just a system, like the business systems we develop in Java every day. Its designers used their language of choice to build a system with the functions it needs; every mechanism was either designed that way deliberately or evolved from real-world scenarios. So let's calm down and get familiar with MySQL as a system.

At this point our record has been loaded into the Buffer Pool and the update begins. The update is actually performed in the Buffer Pool, and that creates a problem: the data in the Buffer Pool is now inconsistent with the data on disk. Data in the Buffer Pool that differs from the database is called dirty data, and such a page is a dirty page. So how does MySQL make sure this modification is not lost before it reaches the disk?

Redo log files: records data that has been modified

Apart from loading data from disk and saving the pre-modification values into the undo log file, all other operations so far have happened in memory, and the characteristic of data in memory is that it is lost on power failure. If the MySQL server went down, all the data in the Buffer Pool would be lost. This is where the redo log file comes in.

Note: the redo log file is specific to InnoDB; it exists at the storage engine level, not the MySQL server level.

When the in-memory data is modified (for example, by the update statement above), the operation is recorded in the redo log buffer. What is that? MySQL keeps the redo log buffer in memory too, and persists it to disk at a certain point in time.

By now, we are familiar with how MySQL's executor calls the storage engine to load a record into the Buffer Pool and which logs it writes along the way. The process so far:

  1. Prepare to update a record with an SQL statement
  2. MySQL (InnoDB) first looks for the data in the Buffer Pool; if it is not there, it looks on disk and, once found, loads the data into the Buffer Pool
  3. While loading the data into the Buffer Pool, the original record is saved into the undo log file
  4. InnoDB performs the update in the Buffer Pool
  5. The updated data is recorded in the redo log buffer

All the steps above describe the normal case, but the design and optimization of a program are never only about the normal case; they also have to handle the boundaries and extreme situations.

If the server goes down at this point, the data in memory is still lost. Annoying, isn't it? If data in memory always risks being lost, can we skip memory and save directly to disk? Obviously not, because as mentioned above, the whole point of operating in memory is efficiency.

Actually, if MySQL does go down at this point, it does not matter: MySQL will treat the transaction as failed, so the data will still look as it did before the update, and nothing is affected.

If the transaction commits successfully, the change must ultimately be saved to the database, and before the commit completes there are a few more related steps.

One of them is persisting the redo log buffer: writing its contents into the redo log file on disk. By default the redo log buffer is flushed to disk when the transaction commits, and the exact strategy is configurable (detailed below), as shown in the figure above.

If the database server goes down after the redo log buffer has been flushed to disk, what happens to our updated data? The update itself happened in memory, so is it lost? No, this time no data is lost, because the redo log buffer has been written to disk and persisted. Even though the database went down, MySQL will restore the data from the redo log file into the Buffer Pool on the next restart. (Similarly, Redis restores data into memory from its persisted RDB and/or AOF files on restart.)
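
The crash-recovery argument can be sketched in a few lines: in-memory state is wiped, but replaying a persisted log rebuilds it. This is a toy simulation of the idea, not InnoDB's actual recovery code.

```python
# Changes live in memory plus a persisted redo log; after a simulated crash,
# replaying the log rebuilds the lost in-memory state.
redo_log_on_disk = []          # survives the "crash"
buffer_pool = {}               # lost in the "crash"

def apply_update(row_id, value):
    buffer_pool[row_id] = value
    redo_log_on_disk.append((row_id, value))   # persisted at commit

apply_update(1, "new name")

buffer_pool = {}               # simulate the server going down

for row_id, value in redo_log_on_disk:        # what recovery does on restart
    buffer_pool[row_id] = value

print(buffer_pool[1])   # new name
```

The key property is that the log append happens before the crash; anything recorded only in `buffer_pool` would have been unrecoverable.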

So far, what has the executor done by calling the storage engine interface?

  1. Prepare to update a record with an SQL statement
  2. MySQL (InnoDB) first looks for the data in the Buffer Pool; if it is not there, it looks on disk and, once found, loads the data into the Buffer Pool
  3. While loading the data into the Buffer Pool, the original record is saved into the undo log file
  4. InnoDB performs the update in the Buffer Pool
  5. The updated data is recorded in the redo log buffer

(Up to here is what has already been summarized.)

  6. innodb_flush_log_at_trx_commit controls how data in the redo log buffer is written to the redo log file: a value of 0 means do not flush to disk on commit; a value of 1 means flush to disk immediately on commit; a value of 2 means write to the OS cache first
  7. On restart, MySQL restores the redo log into the Buffer Pool

So far, when MySQL's executor calls the storage engine's interface to execute the SQL given by the execution plan, InnoDB has already done a lot, but that is not all. Next we need to introduce a MySQL-level log file: the bin log.

Bin log file: records the entire operation process

The redo log described above is a log file specific to the InnoDB storage engine, while the bin log belongs to the MySQL server level. What the redo log records tends to be physical, such as "what change was made to which data"; what the bin log records tends to be logical, such as the operation "update the row with id=1 in the students table".

How do bin log files flush to disk?

The flush policy can be modified through sync_binlog. Its default value of 0 means data is written to the OS cache first, i.e. it is not written directly to disk when a transaction commits, so it could be lost if the machine crashes. It is therefore usually advised to set sync_binlog to 1, so that data is written directly to the disk file on commit.

The bin log can be flushed in the following modes

1. STATEMENT

Statement-based replication (SBR): every SQL statement that modifies data is recorded in the bin log

[Advantages]: there is no need to record every row change, which reduces the bin log volume, saves IO, and improves performance

[Disadvantages]: in some cases it leads to inconsistency between master and slave data, for example with sysdate(), sleep(), etc.

2. ROW

Row-based replication (RBR): does not record the context of each SQL statement, but simply records which rows were modified and how

[Advantages]: the problems where stored procedure, function, or trigger calls cannot be replicated correctly in certain cases do not arise

[Disadvantages]: a large volume of logs is generated, especially when an ALTER TABLE touches every row

3. MIXED

Mixed-based replication (MBR): a combination of STATEMENT and ROW modes. Ordinary operations use STATEMENT mode to save the bin log; operations that STATEMENT mode cannot replicate correctly fall back to ROW mode.

The bin log, then, is also a log file; at what point is the data written into it?

When a transaction commits, MySQL not only writes the redo log buffer into the redo log file but also records the change in the bin log file, and finally writes a commit marker at the end of the redo log, indicating that the transaction committed successfully.

If the data has been written to the bin log file and the database goes down right after the write, will the data be lost?

If there is no commit marker at the end of the redo log, the transaction is considered failed. The data is not lost, however, because it has been recorded in the redo log file on disk; when MySQL restarts, the data from the redo log is restored into the Buffer Pool.

OK, so far we have basically covered an update operation, but do you feel something is still missing? Notice that the whole update so far was performed only in memory, in the Buffer Pool: even after the transaction commits, the data files of the MySQL database on disk still contain the old value. So how does the new value get there?

MySQL has a background thread that, at certain points in time, flushes the dirty data in the Buffer Pool back to the MySQL data files on disk.
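
The dirty-page idea can be sketched as follows: updates only mark pages dirty in memory, and a separate flush step later writes them back. This is a toy illustration; the real background thread batches, sorts, and rate-limits its writes.

```python
class DirtyTrackingPool:
    """Sketch: updates stay in memory and mark the row dirty; flush() writes back."""
    def __init__(self, disk):
        self.disk = disk
        self.pages = dict(disk)    # everything cached up front, for simplicity
        self.dirty = set()         # row ids changed in memory but not on disk

    def update(self, row_id, value):
        self.pages[row_id] = value
        self.dirty.add(row_id)     # changed in memory only

    def flush(self):               # what the background thread does periodically
        for row_id in self.dirty:
            self.disk[row_id] = self.pages[row_id]
        self.dirty.clear()

disk = {1: "old"}
pool = DirtyTrackingPool(disk)
pool.update(1, "new")
print(disk[1])     # old -- the disk still holds the stale value
pool.flush()
print(disk[1])     # new
```

Between `update` and `flush`, durability rests entirely on the redo log, which is why the redo log must hit disk before the transaction is reported committed.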

Summary

At this point, the concepts of and relationships among the Buffer Pool, redo log buffer, undo log, redo log, and bin log should be basically clear.

Let's recap:

  1. The Buffer Pool is a very important component of MySQL, because all the inserts, deletes, and updates against the database are first performed in the Buffer Pool
  2. The undo log records what the data looked like before the operation
  3. The redo log records what the data looks like after the operation (the redo log is unique to the InnoDB storage engine)
  4. The bin log records the entire operation (which is very important for master-slave replication)

A description of the process from preparing to update a record to committing the transaction:

  1. First, the executor queries the data according to the MySQL execution plan: it looks in the Buffer Pool first and, if the data is not found there, queries it from disk
  2. When the data is loaded into the Buffer Pool, its original value is written into the undo log file
  3. The update is performed in the Buffer Pool, and the updated data is added to the redo log buffer
  4. Once that is done, the transaction can be committed, which does three things at the same time
  5. (First thing) Flush the redo log buffer into the redo log file
  6. (Second thing) Write the operation record to the bin log file
  7. (Third thing) Write a commit marker at the end of the redo log

This marker indicates that the entire update transaction has completed.
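
The three commit-time steps above can be sketched in order like this. The log entry strings are made-up placeholders; the point is purely the sequence: redo flush, then bin log write, then the commit marker last.

```python
# Two append-only "files" standing in for the redo log and bin log on disk.
redo_log_file, bin_log_file = [], []

def commit(redo_log_buffer, operation):
    redo_log_file.extend(redo_log_buffer)   # 1. flush redo log buffer to disk
    bin_log_file.append(operation)          # 2. write the operation to bin log
    redo_log_file.append("COMMIT")          # 3. commit marker ends the redo log

commit(["page change for id=1"], "UPDATE students SET stuName = 1 WHERE id = 1")
print(redo_log_file[-1])   # COMMIT
```

Writing the marker last is what lets recovery decide a transaction's fate: a redo log without a trailing COMMIT is treated as an uncommitted transaction.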

conclusion

This is the end of the article. From a macro perspective, we have explained how a system deals with the MySQL database: when an update SQL statement is submitted to MySQL, which steps MySQL executes and what it does along the way. More details about the Buffer Pool will be explained in a later article.