Introduction

As back-end programmers, we deal with databases almost every day. There are many databases on the market, such as MySQL, Oracle, and SQL Server. So how do our programs connect to a database? Through database drivers: each database has its own driver. When we connect to a database, we first register the driver, then establish a connection using the database address, user name, password, and other information. If you use Maven to manage a project, you will typically see the following configuration:

<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>8.0.24</version>
</dependency>

Maven will then pull the MySQL driver JAR into your project. So what does the database do when it receives a request? Next, I'll elaborate using the MySQL database.
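For illustration, here is a minimal JDBC sketch of "register the driver, then connect". The host, database name, user, and password are placeholders for your own environment, not a real configuration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class JdbcDemo {
    // Build a JDBC URL for a hypothetical local MySQL instance.
    static String jdbcUrl(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws SQLException {
        // With mysql-connector-java 8.x the driver registers itself via the
        // service loader; an explicit Class.forName("com.mysql.cj.jdbc.Driver")
        // is no longer required.
        String url = jdbcUrl("localhost", 3306, "test");
        try (Connection conn = DriverManager.getConnection(url, "root", "password")) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}
```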

1. Overall architecture of the MySQL database

Generally, after a web project is developed, we can package it into a WAR file, publish it through a Tomcat container, and then users can access our system. As we all know, Tomcat supports concurrent access. When multiple requests need to operate on the database at the same time, do they all compete for a single database connection? Surely not. And creating a connection for every request, then destroying it when the request finishes, would be terribly inefficient: frequent connection creation and destruction is a real performance cost. So how does Tomcat solve this problem? Remember the idea of "pooling" from when we talked about thread pools? Exactly: Tomcat uses a database connection pool, so each database server has a corresponding connection pool, as shown in the following figure.
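The pooling idea can be sketched in a few lines of Java. Plain strings stand in for real connections here; a production pool such as Tomcat's JDBC pool or HikariCP does far more (validation, timeouts, resizing):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy connection pool: connections are created once up front and reused,
// instead of being opened and closed for every request.
class PoolSketch {
    private final BlockingQueue<String> idle; // stand-in for real Connections

    PoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i); // pre-create all connections
        }
    }

    // Hand out an idle connection (null if none is free).
    String borrow() { return idle.poll(); }

    // Return the connection to the pool instead of destroying it.
    void giveBack(String c) { idle.add(c); }

    int idleCount() { return idle.size(); }
}
```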

SQL interface

When a request arrives at the database, it is detected by a listening thread and forwarded to the SQL interface for processing. The SQL interface is responsible for executing SQL statements: inserts, deletes, updates, and queries.

The parser

The SQL interface then passes the SQL statement to the parser. The parser is responsible for parsing the statement: it analyzes it according to SQL syntax and works out what operation the statement is trying to perform.

The optimizer

Once the parser understands what the SQL statement needs to do, the optimizer chooses what it considers the best path. In general, there is more than one way to achieve a given result. For example, to query the values of two fields f1 and f2 in table t that meet condition c, there are at least two paths:

  1. First filter out all data rows in table t that meet condition c, then select the values of fields f1 and f2 as the result set.
  2. First select all f1 and f2 values, then filter the rows according to condition c to form the result set.
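The two paths can be mimicked with Java streams over an in-memory "table" (the Row class and sample data are made up for illustration; a real optimizer works on indexes and statistics, not streams):

```java
import java.util.List;
import java.util.stream.Collectors;

// Two logically equivalent plans for: SELECT f1, f2 FROM t WHERE c
class PlanSketch {
    static class Row {
        final int f1, f2;
        final boolean c;
        Row(int f1, int f2, boolean c) { this.f1 = f1; this.f2 = f2; this.c = c; }
    }

    // Path 1: filter rows by condition c first, then project f1/f2.
    static List<int[]> filterThenProject(List<Row> t) {
        return t.stream()
                .filter(r -> r.c)
                .map(r -> new int[]{r.f1, r.f2})
                .collect(Collectors.toList());
    }

    // Path 2: project all f1/f2 values (carrying c along), then filter.
    static List<int[]> projectThenFilter(List<Row> t) {
        return t.stream()
                .map(r -> new Object[]{r.f1, r.f2, r.c})
                .filter(a -> (Boolean) a[2])
                .map(a -> new int[]{(Integer) a[0], (Integer) a[1]})
                .collect(Collectors.toList());
    }
}
```

Both paths produce the same result set; they differ only in cost, which is exactly what the optimizer estimates.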

The optimizer gets what it thinks is the optimal query path based on different strategies.

The executor

Even after the optimizer has selected the optimal query path, that path alone does not produce the result we want, so the executor is needed. The executor generates an execution plan from the query path chosen by the optimizer, then repeatedly calls the interfaces provided by the database storage engine to carry out the plan for the SQL statement.

The storage engine

Databases typically store data in one of two places: memory or disk. So when the executor queries data, should it go to disk or memory? How is the query done in memory? How is it done on disk? Memory capacity is limited, so what happens when there is no spare memory? MySQL answers these questions through its storage engines: InnoDB, MyISAM, MEMORY, and others, with InnoDB and MyISAM being the most common. You can check the storage engines supported by the current MySQL server with the SHOW ENGINES command. This series will focus on the InnoDB storage engine.

To sum up, the complete execution process of an SQL statement is shown in the figure below.

2. InnoDB storage engine architecture

How does the InnoDB storage engine work once an SQL statement has passed through the process above and reached the point where the executor calls the InnoDB interfaces?

Memory buffer pool

The first important component of the InnoDB storage engine is the memory buffer pool, or Buffer Pool. This is an area of memory that caches a large amount of data for queries, updates, and other operations, with the goal of improving SQL execution efficiency. In other words, our queries and updates are performed in the Buffer Pool: if the data already exists there, it is operated on directly; if not, it is first loaded from disk into the Buffer Pool.
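The "operate in memory, load from disk on a miss" behavior can be sketched as a toy LRU cache. The page numbers and contents here are made up; the real Buffer Pool manages fixed-size pages with free, flush, and LRU lists, and much more:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy buffer pool: pages are served from memory; a miss "loads from disk",
// and the least recently used page is evicted when the pool is full.
class BufferPoolSketch {
    private final int capacity;
    private final LinkedHashMap<Integer, String> pages;
    int diskReads = 0; // counts simulated disk I/O

    BufferPoolSketch(int capacity) {
        this.capacity = capacity;
        // Access-order LinkedHashMap gives us LRU eviction for free.
        this.pages = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> e) {
                return size() > BufferPoolSketch.this.capacity;
            }
        };
    }

    String getPage(int pageNo) {
        return pages.computeIfAbsent(pageNo, p -> {
            diskReads++;              // miss: pretend to read from disk
            return "page-" + p;
        });
    }
}
```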

Undo log file

Those of you who are familiar with databases know that when we update data, we usually operate within a transaction. If the operation succeeds, the transaction is committed; if it fails, the transaction is rolled back. The rollback is implemented using the undo log. (I was once asked this in an interview and, being nervous, couldn't think of it for a moment.)
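A toy model of rollback via an undo log: before each change, record how to reverse it, and on rollback replay those records in reverse order. The names and structure are illustrative, not InnoDB's actual undo record format:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy transaction: each update first logs how to undo itself.
class UndoSketch {
    private final Map<String, Integer> data = new HashMap<>();
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    void put(String key, int value) { data.put(key, value); }

    void update(String key, int newValue) {
        Integer old = data.get(key);
        undoLog.push(() -> data.put(key, old)); // record the old value first
        data.put(key, newValue);                // then apply the change
    }

    // Replay undo records newest-first to restore the original state.
    void rollback() {
        while (!undoLog.isEmpty()) undoLog.pop().run();
    }

    // On commit the undo records are no longer needed for this transaction.
    void commit() { undoLog.clear(); }

    Integer get(String key) { return data.get(key); }
}
```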

By default MySQL commits automatically after each statement; you can turn this off with set autocommit = 0 and then issue commit and rollback yourself. If you are interested, try it out and get a feel for it.

Redo log file

As mentioned above, update operations are performed in the Buffer Pool, that is, in memory. If MySQL crashes after an update, the data in memory is lost. To solve this problem, InnoDB keeps a redo log that records which data you modified; if MySQL goes down, the redo log can be used to restore the data after a restart. However, redo records are first written to the redo log buffer in memory and are not persisted to disk immediately, so some risk of data loss remains. The flush timing is controlled by innodb_flush_log_at_trx_commit: for example, a value of 1 means the log is flushed to disk at every transaction commit, so no committed data is lost, but performance will definitely suffer. You can choose the policy based on your service requirements.
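The trade-off between the settings can be sketched as a toy model. Counters stand in for redo records, and the three-tier distinction (MySQL log buffer, OS page cache, durable disk) is simplified; the real mechanism involves fsync and a background thread that flushes roughly once per second:

```java
// Toy model of innodb_flush_log_at_trx_commit: how far a commit's redo
// records travel toward disk under each setting.
class RedoFlushSketch {
    int logBuffer = 0;  // records only in MySQL's in-memory log buffer
    int osCache = 0;    // records written to the OS page cache
    int onDisk = 0;     // records durably flushed (fsync'd)
    final int policy;   // 0, 1, or 2

    RedoFlushSketch(int policy) { this.policy = policy; }

    void commit() {
        logBuffer++;
        if (policy == 1) {
            // write and fsync at every commit: safest, slowest
            osCache += logBuffer; onDisk += logBuffer; logBuffer = 0;
        } else if (policy == 2) {
            // write to the OS cache at commit; fsync happens later
            osCache += logBuffer; logBuffer = 0;
        }
        // policy 0: records stay in the log buffer until the periodic flush
    }

    // Background flush that runs roughly once per second.
    void backgroundFlush() {
        osCache += logBuffer;
        logBuffer = 0;
        onDisk = osCache; // everything in the OS cache is now fsync'd
    }
}
```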

Binlog file

The binlog is also called the archive log. Unlike the redo log, which is specific to InnoDB, the binlog belongs to the MySQL server layer; it is used for restoring data to a point in time, primary/secondary replication, and so on. The binlog is normally written when a transaction commits, and it too has several flushing strategies: sync_binlog controls after how many transaction commits the log is flushed to disk, and the special value sync_binlog=0 means the operating system, not MySQL, decides when to flush.
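A toy model of sync_binlog=N, grouping N commits per flush (counters stand in for binlog events; real group commit batches concurrent transactions as well):

```java
// Toy model of sync_binlog=N: fsync the binlog once every N transaction
// commits. sync_binlog=0 would leave the timing to the operating system.
class BinlogSyncSketch {
    final int syncBinlog;
    int pending = 0; // commits written to the binlog but not yet fsync'd
    int synced = 0;  // commits durably on disk

    BinlogSyncSketch(int syncBinlog) { this.syncBinlog = syncBinlog; }

    void commit() {
        pending++;
        if (syncBinlog > 0 && pending >= syncBinlog) {
            synced += pending; // fsync the accumulated group at once
            pending = 0;
        }
    }
}
```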

InnoDB execution flow

After introducing these components of the InnoDB storage engine, suppose you now need to update a row of data. What does the execution flow look like inside InnoDB? Roughly as follows:

  1. If the data is not in the Buffer Pool, read it from disk (random I/O) and place it into the Buffer Pool;
  2. Write the undo log so the data can be rolled back;
  3. Update the data in the Buffer Pool;
  4. Write a redo record into the redo log buffer so the data can be recovered;
  5. Prepare phase: the redo log is flushed to disk according to its policy;
  6. The binlog is written and flushed to disk according to its policy;
  7. Write a commit mark to the redo log file;
  8. Commit the transaction;
  9. A background I/O thread flushes the dirty pages in the Buffer Pool to disk. (Only the copy in the Buffer Pool has been modified, not the data on disk, so relative to the disk the Buffer Pool pages are "dirty".)
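The steps above can be strung together as a simple trace. The step names are simplified labels for this article's description, not real InnoDB APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Toy trace of the update flow: returns the ordered steps that would run.
class UpdateFlowSketch {
    static List<String> updateRow(boolean inBufferPool) {
        List<String> steps = new ArrayList<>();
        if (!inBufferPool) {
            steps.add("load page from disk into buffer pool"); // random I/O
        }
        steps.add("write undo log");
        steps.add("update page in buffer pool");
        steps.add("write redo record to redo log buffer");
        steps.add("prepare: flush redo log per policy");
        steps.add("write binlog, flush per policy");
        steps.add("write commit mark to redo log");
        steps.add("commit");
        // later, a background I/O thread flushes the dirty page to disk
        return steps;
    }
}
```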

The process is as follows:

The above covers the overall architecture of the MySQL database and of the InnoDB storage engine, and walks through the execution flow of updating a row of data in InnoDB.