Preface

You may write dozens of SQL statements a day, but do you know how your system actually interacts with the database? How does MySQL store your data and manage transactions? Or does your mind go blank beyond writing the occasional "select * from dual"? This article will take you into the world of MySQL for a thorough look at how a system interacts with it and what MySQL does when it receives a SQL statement from us.

MySQL driver

When our system communicates with the MySQL database, requests cannot be sent and received out of thin air; even when we are not doing anything ourselves, something must be doing the work for us. Almost every programmer who has worked with MySQL is familiar with the concept of a MySQL driver. It is this driver that, underneath, establishes the connection to the database for us; only once the connection is established can the later interaction happen. Look at the diagram below.

With the driver in place, before the system interacts with MySQL, the driver sets up the connection for us, and then we only need to send SQL statements to perform CRUD. One SQL request establishes one connection, and multiple requests establish multiple connections. So here is the problem: our system is certainly not used by only one person; in other words, multiple requests will inevitably compete for connections at the same time. Our web systems are typically deployed in a Tomcat container, and Tomcat can process multiple requests concurrently, which causes many connections to be set up and then torn down. What's wrong with that? See the following figure.


The Java system connects to the MySQL database over the TCP/IP protocol via the MySQL driver, so if every request creates and then destroys a connection, we get unnecessary waste and degraded performance. In other words, it is unreasonable for multi-threaded requests to frequently create and destroy connections. This drags down system performance. But what if we kept a fixed set of connections around, so that we never have to create and destroy them repeatedly? Knowledgeable readers will smile: yes, that is the database connection pool.

A database connection pool maintains a certain number of connections so the system can obtain them easily: take a connection from the pool, and put it back when done. We no longer need to care about creating and destroying connections, nor about how the pool maintains them.
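The core idea can be sketched in a few lines of Python (a toy model, with a hypothetical `connect()` function standing in for the real MySQL driver; production pools such as Druid, C3P0, or DBCP add sizing, validation, and timeout policies on top):

```python
import queue

class ConnectionPool:
    """A minimal fixed-size connection pool sketch (hypothetical connect())."""

    def __init__(self, size, connect):
        self._pool = queue.Queue()
        # Create a fixed number of connections up front, once.
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        # Blocks until a connection is free: callers compete for
        # pooled connections instead of creating new ones.
        return self._pool.get()

    def release(self, conn):
        # "Used up, put it back": the connection is reused, not destroyed.
        self._pool.put(conn)


# Usage: a fake connect() standing in for a real MySQL driver call.
counter = {"opened": 0}

def connect():
    counter["opened"] += 1
    return object()

pool = ConnectionPool(2, connect)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()          # reuses a pooled connection
print(counter["opened"])     # only 2 connections were ever created
```

However many times `acquire`/`release` run, the number of created connections stays fixed, which is exactly the overhead the pool removes.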

Common database connection pools include Druid, C3P0, and DBCP. The implementation principles of connection pooling are not discussed in detail here. Using a connection pool greatly reduces the overhead of constantly creating and destroying connections. This is the well-known idea of "pooling", which also appears in thread pools and HTTP connection pools.

Database connection Pool

So far, we have learned that when our system accesses the MySQL database, connections are not created per request but obtained from the database connection pool, which removes the performance cost of repeatedly creating and destroying connections. But here is a question: our business system is concurrent, so does the MySQL side have to accept and handle these connection requests one at a time?

In fact, the architecture of MySQL also provides such a pool on its side: a database connection pool of its own. With both sides managing connections through pools, threads do not have to scramble for connections and, more importantly, connections do not have to be repeatedly created and destroyed.

So far, the connection between the system and the MySQL database has been explained clearly. Inside MySQL, how, and by whom, are these connections handled?

Network connections must be handled by threads

Anyone with a basic understanding of computing knows that network connections are handled by threads: a network connection carries requests, and each request is handled by a corresponding thread. This means that SQL statements sent over a connection are processed by individual threads inside MySQL.

How do these threads handle the requests? What exactly do they do?

SQL interface

The thread handling the request in MySQL reads the SQL statement from the connection and hands it to the SQL interface for processing.

Query parser

Suppose you have an SQL statement like this:

```sql
SELECT stuName,age,sex FROM students WHERE id=1
```

But this SQL is written for us humans. How does the machine know what you are talking about? After all, the machine only understands zeros and ones. This is where the parser comes in: it parses the SQL statement passed on by the SQL interface and translates it into a form MySQL itself can understand. As for how it parses, it is essentially a matter of applying a set of grammar rules.
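As a toy illustration of what "a set of rules" means, here is one hypothetical rule in Python that turns the statement above into a structure a program can act on. MySQL's real parser builds a full syntax tree from its SQL grammar and is nothing like this naive:

```python
import re

# A deliberately naive rule: SELECT <cols> FROM <table> WHERE <col>=<val>
RULE = re.compile(
    r"SELECT\s+(?P<cols>[\w,\s]+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<col>\w+)=(?P<val>\w+)",
    re.IGNORECASE,
)

def parse(sql):
    """Translate the SQL text into a small dict the rest of a toy engine
    could execute; raise on anything the rule does not cover."""
    m = RULE.match(sql.strip())
    if m is None:
        raise ValueError("syntax error")
    return {
        "columns": [c.strip() for c in m.group("cols").split(",")],
        "table": m.group("table"),
        "where": (m.group("col"), m.group("val")),
    }

tree = parse("SELECT stuName,age,sex FROM students WHERE id=1")
print(tree["table"])     # students
print(tree["columns"])   # ['stuName', 'age', 'sex']
```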

Now that the SQL has been parsed into a form MySQL recognizes, is the next step to execute it? In theory, yes, but MySQL is more powerful than that: it also helps us choose the optimal query path.

What is the optimal query path? It is simply the way MySQL considers most efficient to execute the query.

How does it do that? This brings us to MySQL's query optimizer.

MySQL query optimizer

We do not need to care how the query optimizer is implemented internally. What we need to know is that MySQL optimizes our SQL statement in the way it considers best and generates an execution plan. For example, if you have created multiple indexes, MySQL chooses which index to use according to a minimum-cost principle. The cost here consists mainly of two parts: I/O cost and CPU cost.

IO cost: the cost of loading data from disk into memory. By default, the I/O cost of reading one data page is 1. MySQL reads data in pages: when some data is needed, it does not read just that data but brings the whole page, including the adjacent data, into memory together; this is the well-known principle of locality. So MySQL reads one page at a time at a cost of 1 per page, and the total I/O cost mainly depends on the number of pages read.

CPU cost: the cost of CPU operations such as condition checks and sorting after the data has been read into memory, which is obviously related to the number of rows. By default, the cost of examining one record is 0.2.

The MySQL optimizer calculates the total "IO cost + CPU cost" for each candidate and executes with the index whose cost is lowest.
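A back-of-the-envelope sketch of this cost model, using the default constants mentioned above (1 per page of I/O, 0.2 per record examined). The candidate names and page/row counts are made up for illustration, and the real optimizer's arithmetic is considerably more involved:

```python
IO_COST_PER_PAGE = 1.0   # default cost of loading one data page from disk
CPU_COST_PER_ROW = 0.2   # default cost of examining one record in memory

def plan_cost(pages_read, rows_examined):
    """Total cost of a candidate plan: IO cost + CPU cost."""
    return pages_read * IO_COST_PER_PAGE + rows_examined * CPU_COST_PER_ROW

# Hypothetical candidates for one query: a full scan vs. two indexes.
candidates = {
    "full_table_scan": plan_cost(pages_read=100, rows_examined=10_000),
    "idx_age":         plan_cost(pages_read=12,  rows_examined=400),
    "idx_id":          plan_cost(pages_read=2,   rows_examined=1),
}

# Minimum-cost principle: pick the cheapest candidate.
best = min(candidates, key=candidates.get)
print(best, candidates[best])   # idx_id 2.2
```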

Voiceover: How to calculate the index cost, please refer to this article

After the optimizer picks the optimal index, it calls the storage engine interface to start executing the SQL statement that MySQL has parsed and optimized.

The storage engine

The query optimizer calls the storage engine's interface to execute the SQL, meaning the actual execution of the SQL happens in the storage engine, where the data is stored either in memory or on disk (the storage engine is a very important component; more on this later).

For this article, a general understanding of the storage engine is enough; a dedicated article will introduce it in detail later.

The executor

The executor is a very important component: the work of the preceding components must ultimately be carried out by the executor invoking the storage engine interface. The executor calls the storage engine's interfaces according to the execution plan to complete the execution of the SQL.

First introduction to the storage engine

Let's illustrate this with an update SQL statement, as follows:

```sql
UPDATE students SET stuName = 'jack' WHERE id = 1
```

When our system sends such a statement to MySQL, MySQL invokes the storage engine through the executor, following the process described above (the flowchart is the one shown earlier). When the SQL statement is executed, the data is either in memory or on disk. Operating directly on disk would mean random I/O reads and writes at an unacceptable speed, so every time a statement is executed, the relevant data is loaded into memory. This memory is a very important component of InnoDB: the Buffer Pool.

Buffer Pool

The Buffer Pool is a very important in-memory structure in the InnoDB storage engine. As the name implies, it acts as a buffer, much like Redis does: we all know that MySQL's data ultimately lives on disk, and without a Buffer Pool every request to the database would have to search the disk, incurring I/O on every access, which is unacceptable. With the Buffer Pool, the results of the first query are stored in the pool, so subsequent requests look in the Buffer Pool first and only go to disk (and then populate the pool) when the data is not there, as shown in the figure below.

Following the diagram above, the execution steps for this SQL statement look something like this:

  1. The InnoDB storage engine checks the buffer pool to see whether the record with id=1 is already there
  2. If it is not, the data is loaded from disk and stored in the buffer pool
  3. An exclusive lock is placed on the record (this mechanism is not covered in this article, but will be in a future one)
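Steps 1 and 2 amount to a cache lookup with load-on-miss. A minimal sketch of that behavior (the exclusive lock of step 3 and page-granular caching are left out):

```python
class BufferPool:
    """Toy buffer pool: cache records by id, load from 'disk' on a miss."""

    def __init__(self, disk):
        self.disk = disk        # stands in for the table file on disk
        self.pages = {}         # in-memory cache: id -> row
        self.loads_from_disk = 0

    def get(self, row_id):
        if row_id not in self.pages:                      # step 1: check pool
            self.loads_from_disk += 1
            self.pages[row_id] = dict(self.disk[row_id])  # step 2: load
        return self.pages[row_id]

disk = {1: {"id": 1, "stuName": "tom", "age": 18, "sex": "M"}}
pool = BufferPool(disk)
pool.get(1)                     # miss: loaded from disk
pool.get(1)                     # hit: served from memory
print(pool.loads_from_disk)     # 1
```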

Undo log file: records what the data looked like before modification

As the name suggests, "undo" is about reverting what has been done: an undo log records the state of things before the change (what was the data originally?).

We just said that the data involved in the statement is loaded into the Buffer Pool. In fact, as it is loaded, a log entry is also inserted into the undo log file, recording the original value of the record where id equals 1.

What is the purpose of this?

InnoDB's storage engine supports transactions. If the update fails, that is, the transaction fails to commit, then all operations in the transaction must be rolled back to the state before execution. This way, a failed transaction leaves the original data unaffected.
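Rollback via the undo log can be sketched as: save each record's old value before changing it, and on failure replay the saved values in reverse. This is only a toy model; InnoDB's real undo records are richer and also serve multi-version concurrency control:

```python
class MiniTxn:
    """Toy transaction: undo entries let a failed update be rolled back."""

    def __init__(self, table):
        self.table = table
        self.undo_log = []      # entries of (row_id, column, old_value)

    def update(self, row_id, column, new_value):
        row = self.table[row_id]
        self.undo_log.append((row_id, column, row[column]))  # record "before"
        row[column] = new_value

    def rollback(self):
        # Undo in reverse order, restoring each original value.
        for row_id, column, old_value in reversed(self.undo_log):
            self.table[row_id][column] = old_value
        self.undo_log.clear()

table = {1: {"id": 1, "stuName": "tom"}}
txn = MiniTxn(table)
txn.update(1, "stuName", "jack")
txn.rollback()                  # the commit "failed"
print(table[1]["stuName"])      # tom -- original data unaffected
```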

A side note here: MySQL is also just a system, like the Java business systems we usually develop. It was built in its own language, with features designed around its own needs; whatever it can do was either defined that way by its designers or evolved from real-world scenarios. So let's calm down and get familiar with MySQL as just another system.

At this point, the data for our SQL statement has been loaded into the Buffer Pool, and the update begins. The update is actually performed in the Buffer Pool, and that raises a problem: by the theory we usually work with, when the data in a cache is inconsistent with the data in the database, the cached data is considered dirty. So hasn't the data in the Buffer Pool now become dirty data? Indeed it has, and that is exactly what it is called; MySQL has its own mechanisms for making sure this dirty data is not lost.

Redo log file: records the data after modification

Apart from loading files from disk and saving pre-modification records to the undo log file, all other operations are performed in memory, and data in memory is lost on power failure. If the MySQL server goes down, all the data in the Buffer Pool is lost. This is where the redo log file is needed.

Voice-over: the redo log file is InnoDB-specific; it exists at the storage engine level, not the MySQL server level.

Besides the undo log, when the update is performed, for example `update students set stuName='jack' where id=1`, the operation is also recorded in the redo log buffer. What is that? MySQL keeps the redo log buffer in memory and persists it to disk at a certain point in time.

By now we have seen how MySQL's executor calls the storage engine to load the data for an SQL statement into the buffer pool, and which logs it writes along the way. The process so far is as follows:

  1. An update SQL statement is ready to be executed
  2. MySQL (InnoDB) looks for the data in the Buffer Pool first; if it is not found there, it is read from disk and loaded into the Buffer Pool
  3. While the data is loaded into the Buffer Pool, its original record is saved to the undo log file
  4. InnoDB performs the update in the Buffer Pool
  5. The updated data is recorded in the redo log buffer
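The five steps above can be strung together in one toy model, with the buffer pool as a dict and the undo/redo logs as lists (purely illustrative; this is not how InnoDB lays out pages or log records):

```python
disk = {1: {"id": 1, "stuName": "tom"}}   # stands in for the table on disk

buffer_pool = {}
undo_log = []        # "before" images, for rollback
redo_buffer = []     # "after" images, waiting to be flushed at commit

def update(row_id, column, new_value):
    # Step 2: find the row in the buffer pool, loading from disk on a miss
    if row_id not in buffer_pool:
        buffer_pool[row_id] = dict(disk[row_id])
    row = buffer_pool[row_id]
    # Step 3: save the original value to the undo log
    undo_log.append((row_id, column, row[column]))
    # Step 4: perform the update in the buffer pool only
    row[column] = new_value
    # Step 5: record the change in the redo log buffer
    redo_buffer.append((row_id, column, new_value))

update(1, "stuName", "jack")
print(buffer_pool[1]["stuName"])   # jack (memory is updated)
print(disk[1]["stuName"])          # tom  (disk still holds the old value)
```

Note the gap this exposes: memory and disk now disagree, which is exactly the dirty-data situation the redo log exists to protect.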

All the steps above describe the normal case, but program design and optimization are not only about normal conditions; they must also cover boundary and extreme cases.

If the server goes down, the data in the cache is still lost. Annoying, isn't it? If data in memory is always at risk, why not skip memory and save directly to disk? Obviously not, because, as mentioned above, the whole point of operating in memory is efficiency.

At this point, if MySQL really does go down, it doesn't matter: MySQL will treat the transaction as failed, so the data remains as it was before the update, and nothing is affected.

If the transaction commits successfully, the changes will be persisted to the database; but before the commit completes, there are still a few related steps.

Persisting the redo log buffer means writing its contents to the redo log file on disk. When exactly the buffer is flushed to disk is governed by a configurable strategy (detailed below); by default it is flushed when the transaction commits, as shown in the figure above.

If the database server goes down after the redo log buffer has been flushed to disk, what happens to our updated data? The data was in memory; isn't it lost? No, this time no data is lost, because the redo log buffer has been written to disk and persisted. Even if the database goes down, MySQL will restore the data from the redo log file into the Buffer Pool on the next restart (similar to how Redis, on startup, restores data to memory from its RDB and/or AOF persistence files).
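That restart-time restore is essentially a replay: start from what is on disk and re-apply every persisted redo record in order. A toy sketch of the idea (real recovery works on pages and log sequence numbers, not rows):

```python
def recover(disk, redo_log_file):
    """Rebuild in-memory state: start from disk, replay the redo log."""
    buffer_pool = {rid: dict(row) for rid, row in disk.items()}
    for row_id, column, new_value in redo_log_file:   # replay in order
        buffer_pool[row_id][column] = new_value
    return buffer_pool

disk = {1: {"id": 1, "stuName": "tom"}}
redo_log_file = [(1, "stuName", "jack")]   # persisted before the crash

# Power failure: the buffer pool is gone. On restart, replay the redo log.
buffer_pool = recover(disk, redo_log_file)
print(buffer_pool[1]["stuName"])           # jack -- the update survived
```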

So, up to this point, what has the executor accomplished by calling the storage engine interface?

  1. An update SQL statement is ready to be executed
  2. MySQL (InnoDB) looks for the data in the Buffer Pool first; if it is not found there, it is read from disk and loaded into the Buffer Pool
  3. While the data is loaded into the Buffer Pool, its original record is saved to the undo log file
  4. InnoDB performs the update in the Buffer Pool
  5. The updated data is recorded in the redo log buffer

---------------- everything up to this point happens before the transaction commits ----------------

  6. When MySQL commits the transaction, the data in the redo log buffer is written to the redo log file; the flush behavior can be configured with the innodb_flush_log_at_trx_commit parameter (a value of 0 means the buffer is not flushed at commit)
  7. On restart, MySQL restores the data from the redo log into the buffer pool

Up until now, that is almost everything InnoDB does when MySQL's executor calls the storage engine interface to execute the SQL in the execution plan, but that's not all of it. Next we need to introduce the MySQL-level bin log file.

Bin log file: records the entire operation

The redo log described above is a log file specific to the InnoDB storage engine, while the bin log is a MySQL-level log file. The redo log records things that tend to be physical, such as "what change was made to what data", while the bin log tends to record logical operations, such as "an update was made to the record with id = 1 in the students table".

|  | redo log | bin log |
| --- | --- | --- |
| File size | The size of the redo log is fixed (it can also be set in the configuration; the default is usually sufficient) | The size of each bin log file can be set with the `max_binlog_size` parameter (though changing it is generally not recommended) |
| Implementation | Implemented at the InnoDB engine layer (i.e. unique to the InnoDB storage engine) | Implemented at the MySQL server layer; all engines can use the bin log |
| Recording method | Written circularly: when the end of the file is reached, writing wraps back to the beginning | Appended: when a file grows past the specified size, subsequent logs go into a new file |
| Usage scenarios | Crash recovery (crash-safe, which is actually very similar to Redis persistence) | Master/slave replication and data recovery |

**How is the bin log flushed to disk?**

The policy can be modified via the sync_binlog parameter. The default value is 0, which means data is first written to the OS cache; that is, at transaction commit the data is not written directly to disk. Therefore, you are advised to set sync_binlog to 1 so that the data is written directly to the disk file at commit.
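In a my.cnf file, this pairing is often written along the following lines (the values shown are the common durability-first choice discussed above, not necessarily your server's defaults):

```ini
[mysqld]
# flush the redo log buffer to disk at every transaction commit
innodb_flush_log_at_trx_commit = 1
# fsync the bin log on every commit instead of leaving it in the OS cache
sync_binlog = 1
```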

The bin log can record operations in the following formats:

1. STATEMENT

Statement-based replication (SBR): every SQL statement that modifies data is recorded in the bin log.

[Advantages]: there is no need to record every row change, which reduces the bin log volume, saves I/O, and improves performance

[Disadvantages]: in some cases it leads to inconsistency between master and slave data, e.g. with statements using sysdate(), sleep(), etc.

2. ROW

Row-based replication (RBR): the context of each SQL statement is not recorded; only which data was modified is recorded.

[Advantages]: avoids the specific cases where calls to stored procedures, functions, or triggers cannot be replicated correctly

[Disadvantages]: generates a large volume of logs, especially with statements such as ALTER TABLE.

3. MIXED

Mixed-based replication (MBR): a combination of STATEMENT and ROW. In general, STATEMENT format is used to record the bin log, and ROW format is used for operations that STATEMENT cannot replicate correctly.

So the bin log is also a log file; at what point in the process does it record our data?

When the transaction commits, MySQL not only writes the redo log buffer to the redo log file, it also records the change in the bin log file, and finally writes a commit marker at the end of the redo log, indicating that the transaction was committed successfully.

What if data has been written to the bin log file and the database stops right after the write? Will the data be lost?

As long as there is no commit marker at the end of the redo log, the transaction is considered failed. But the data is not lost, because it has already been recorded in the redo log file on disk; when MySQL restarts, the data from the redo log is restored into the Buffer Pool.
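The role of the commit marker can be sketched as a recovery-time decision: at restart, only transactions whose commit marker reached the redo log are replayed; the rest are treated as failed. A toy model of that idea (InnoDB's actual crash recovery also consults the bin log):

```python
def recover(disk, redo_log_file):
    """Replay only transactions whose commit marker reached the redo log."""
    committed = {rec[1] for rec in redo_log_file if rec[0] == "commit"}
    state = {rid: dict(row) for rid, row in disk.items()}
    for rec in redo_log_file:
        if rec[0] == "change":
            _, txn_id, row_id, column, new_value = rec
            if txn_id in committed:          # no marker -> treated as failed
                state[row_id][column] = new_value
    return state

disk = {1: {"stuName": "tom"}, 2: {"stuName": "anna"}}
redo_log_file = [
    ("change", "txn-1", 1, "stuName", "jack"),
    ("commit", "txn-1"),                 # txn-1 committed before the crash
    ("change", "txn-2", 2, "stuName", "lucy"),
    # crash: txn-2 never wrote its commit marker
]

state = recover(disk, redo_log_file)
print(state[1]["stuName"])   # jack
print(state[2]["stuName"])   # anna  (txn-2 treated as failed)
```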

OK, so far we have basically covered the whole update operation, but don't you feel something is still missing? You may have noticed that the update so far has happened only in memory: even after the commit, the updated data sits in the Buffer Pool, while the table on disk still holds the old value. So what happens then?

MySQL has a background thread that, at some point, flushes the dirty data from the Buffer Pool to the database files on disk, bringing the database in line with memory.

Summary

At this point, the concepts of the Buffer Pool, the redo log buffer, and the undo log, redo log, and bin log files, as well as the relationships among them, should be basically clear.

  1. The Buffer Pool is a very important component of MySQL
  2. The undo log records the data as it was before the operation
  3. The redo log records the data as it will be after the operation
  4. The bin log records the entire operation (which is very important for master-slave replication)

A walkthrough of the process from preparing to update a piece of data to committing the transaction:

  1. The executor queries the data according to the MySQL execution plan, looking in the Buffer Pool first; if it is not found there, it is read from the database on disk
  2. When the data is cached into the Buffer Pool, its original value is written to the undo log file
  3. The update is performed in the Buffer Pool, and the updated data is added to the redo log buffer
  4. Once that is done, the transaction can be committed; on commit, the redo log buffer is flushed to the redo log file
  5. The change is also recorded in the bin log file
  6. The bin log file name and the position of the update within the bin log are recorded in the redo log
  7. A commit marker is added at the end of the redo log

At this point, the entire update transaction is complete.

Conclusion

So far, we have explained from a macro perspective how a system works with the MySQL database: what happens when an update SQL statement is submitted to MySQL, which steps MySQL goes through, and what it does along the way. More details on the Buffer Pool will be discussed in a later article.