This article focuses on how ACID is implemented in MySQL's InnoDB engine. For background on what a transaction is and what the isolation levels mean, see my earlier MySQL series articles.

ACID

MySQL is a relational database; this article looks at how its most common storage engine, InnoDB, guarantees the ACID properties.

  • Atomicity: A transaction is the smallest unit of execution and cannot be split. Atomicity ensures that its actions are either performed completely or not at all.

  • Consistency: Data is consistent before and after a transaction is executed.

  • Isolation: A transaction is not disturbed by other transactions when accessing the database concurrently.

  • Durability: Once a transaction is committed, its changes to the database are permanent, even if the database subsequently fails.

Isolation

Isolation is implemented mainly with locks and MVCC. I covered both in my earlier MySQL series, so I won't repeat them here.

Atomicity

MySQL has many kinds of logs: the binary log, error log, general query log, slow query log, and so on. The InnoDB storage engine adds two transaction logs of its own: the redo log and the undo log. The redo log guarantees durability; the undo log is the basis for atomicity and isolation.

Undo log: the rollback log. MVCC (and therefore isolation) relies on it, and so does atomicity. The key to atomicity is the ability to undo every statement a transaction has already executed when that transaction is rolled back.

When a transaction modifies the database, InnoDB generates the corresponding undo log records. If the transaction fails or is explicitly rolled back, the information in the undo log is used to roll the data back to its state before the modification. The undo log is a logical log that records what each statement did; on rollback, InnoDB performs the reverse operation for each entry:

  • For each INSERT, a DELETE is performed during the rollback;

  • For each DELETE, an INSERT is performed;

  • For each UPDATE, a reverse UPDATE is performed to restore the previous values.

Take the update operation as an example. When a transaction performs an update, the generated undo log records the primary key of the modified row (so the row can be located), the columns that were modified, and the values of those columns before and after the change. This information is enough to restore the row to its pre-update state.
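As a minimal sketch of the behavior described above (the account table and its values are hypothetical, used only for illustration):

-- Hypothetical table used only to illustrate undo-log-based rollback
CREATE TABLE account (id INT PRIMARY KEY, balance INT);
INSERT INTO account VALUES (1, 100);

BEGIN;
UPDATE account SET balance = 80 WHERE id = 1;   -- undo log keeps id = 1, balance 100 -> 80
ROLLBACK;                                       -- InnoDB applies the reverse UPDATE from the undo log

SELECT balance FROM account WHERE id = 1;       -- returns 100, the pre-update value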

Durability

InnoDB has several logs; the one that guarantees durability is the redo log.

Reading data: the buffer pool is checked first; on a miss, the page is read from disk and placed into the buffer pool. Writing data: the write goes to the buffer pool first, producing dirty pages that are periodically synchronized to disk.

That’s where the redo log comes in. Since writing the redo log also involves disk I/O, why use it at all? 1. Redo log writes are sequential, whereas flushing the buffer pool means random I/O. 2. Flushing works at the granularity of whole data pages, so each transfer is much larger than the corresponding redo log record.

The redo log uses write-ahead logging (WAL): every change is recorded in the log before the modified data pages are written to disk. This ensures that committed changes are not lost if MySQL goes down.

Since the redo log must itself be written to disk when a transaction commits, why is that faster than writing the modified data in the Buffer Pool directly to disk (i.e., flushing)? The reasons are: 1. Flushing is random I/O, because the modified pages sit at scattered positions on disk, whereas the redo log is appended sequentially. 2. The default InnoDB page size is 16 KB, and even a tiny change forces the whole page to be written; the redo log records only the parts that actually changed, which greatly reduces wasted I/O.
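You can check the page size on your own server; 16 KB is only the usual default:

SHOW VARIABLES LIKE 'innodb_page_size';   -- 16384 (16 KB) by default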

How does the redo log differ from the binlog?

  1. Different purposes: the redo log is used for crash recovery, so a MySQL crash does not break durability; the binlog is used for point-in-time recovery and for primary/secondary replication.

  2. Different layers: the redo log is implemented by the InnoDB storage engine, while the binlog is implemented at the MySQL server layer (see the MySQL logical architecture discussed earlier) and therefore works with InnoDB as well as other storage engines.

  3. Different content: the redo log is a physical log describing changes to disk pages; the binlog is a binary log whose contents, depending on the binlog_format parameter, are based on SQL statements, the row data itself, or a combination of the two.

  4. Different write timing: the binlog is written when the transaction commits, while the redo log can be written at several points. By default, fsync is called when a transaction commits to flush the redo log; changing the innodb_flush_log_at_trx_commit parameter relaxes this policy, but then transaction durability is no longer strictly guaranteed. The redo log is also flushed at other times, for example by the master thread roughly once per second, so a commit does not always have to wait for an fsync (at the cost of possibly losing up to about a second of changes on a crash).
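As a non-authoritative sketch, these parameters can be inspected and tuned like this (the values shown are the usual defaults; verify them on your own version):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- 1: fsync the redo log on every commit (full durability)
SHOW VARIABLES LIKE 'sync_binlog';                     -- 1: fsync the binlog on every commit
SHOW VARIABLES LIKE 'binlog_format';                   -- STATEMENT, ROW, or MIXED

-- Relaxing the redo log policy trades durability for commit speed:
-- 2 = write to the OS cache at commit, fsync roughly once per second
SET GLOBAL innodb_flush_log_at_trx_commit = 2;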

Let’s look at how an SQL update statement works:

For example:

update t set age=20 where id=1;

Durability is ultimately about writing to disk. MySQL uses WAL, which stands for Write-Ahead Logging: the log is written before the data pages are.

[redo log]

When a record is updated, the InnoDB engine first writes the change to the redo log and updates the page in memory; the actual data pages are flushed to disk later. The redo log has two important characteristics:

  • Fixed size, written in a circular fashion: when the log fills up, the oldest entries whose changes have already reached the data files are overwritten (see the snippet after this list);

  • Crash-safe: entries whose changes have already been flushed to the data files can be erased, and after a restart the remaining redo log is replayed to bring committed changes back into memory. This is what gives the redo log its crash-safe capability.
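The total circular capacity is the size of each log file times the number of files in the group. A quick way to check it, assuming the classic variable names (newer MySQL releases replace them with innodb_redo_log_capacity):

SHOW VARIABLES LIKE 'innodb_log_file_size';       -- size of each redo log file
SHOW VARIABLES LIKE 'innodb_log_files_in_group';  -- number of files in the circular group
-- total redo capacity = innodb_log_file_size * innodb_log_files_in_group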

The redo log is written in two phases, prepare and commit (the two-phase commit with the binlog). Without the two-phase commit, the state of the database recovered from its logs could differ from its state before the crash.

[Buffer Pool]

The Buffer Pool caches a subset of the data pages on disk in memory and serves as a buffer for database access:

  • Data is read from the Buffer Pool first. If the page is not there, it is read from disk and then added to the Buffer Pool.

  • When data is written, it is first written to the Buffer Pool. The modified (dirty) data in the Buffer Pool is periodically flushed to disk.

The Buffer Pool greatly improves the efficiency of reading and writing data, but it introduces a new problem: if MySQL crashes before modified data in the Buffer Pool has been flushed to disk, that data is lost and transaction durability cannot be guaranteed.
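A rough way to see the Buffer Pool at work is to compare logical read requests with the reads that actually had to hit disk (standard InnoDB status variables):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';               -- memory reserved for the buffer pool
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical reads served from memory
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that had to go to disk
-- hit rate ~= 1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests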

This is why the redo log exists: whenever data in the Buffer Pool is modified, the change is also recorded in the redo log.

When a transaction commits, the fsync interface is called to flush the redo log.

If MySQL goes down, the redo log is read during restart and replayed to restore the database.

Consistency

Consistency is the ultimate goal of transactions, and the measures that achieve it include:

  • Guaranteeing atomicity, durability, and isolation; if any of these is broken, transaction consistency cannot be guaranteed either.

  • Guarantees provided by the database itself, such as rejecting a string value inserted into an integer column or a value that exceeds the column's limits.

  • Guarantees provided by the application: for example, if a transfer only deducts the sender's balance but never credits the receiver, no database implementation, however perfect, can keep the state consistent (see the sketch below).
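A minimal sketch of the transfer case, reusing the hypothetical account table from the atomicity example and assuming a second row with id = 2 exists:

-- Both sides of the transfer live in one transaction, so the state stays consistent
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE id = 1;  -- debit the sender
UPDATE account SET balance = balance + 50 WHERE id = 2;  -- credit the receiver
COMMIT;  -- either both updates become durable, or (on rollback/crash) neither does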

Consistency in CAP and ACID

In the CAP theorem, consistency means that every node in a distributed system holds the same value for the same copy of data. In ACID, consistency means the database's own rules hold: if the schema specifies that a value must be unique, a consistent system ensures it stays unique across all operations. The two definitions of consistency are fundamentally different.

Conclusion

The four ACID properties of transactions are the cornerstone of a working database. Guaranteeing them fully, isolation in particular, has a noticeable cost in performance, so in practice we adjust the isolation level to the needs of the business. Beyond isolation, atomicity and durability are easy to understand: the former ensures that a transaction's statements are either all executed or not executed at all, and the latter ensures that committed writes are persistent and survive failures. Consistency is not only a requirement on the integrity of the database's own data, but also a requirement that developers write logically correct and reasonable transactions.

Finally, and most importantly, be aware of the context in which someone is talking about consistency.