@[TOC] Why two-stage submission? Can’t you just commit in one stage?

As you know, transactions in MySQL are committed in two phases. Many distributed transactions we see are committed in two phases, such as Seata. So why commit in two phases? Can’t you just submit it once? Let’s talk about that today.

For distributed transaction seATA, please refer to Songo’s previous article, Portal:

  • Experience distributed transactions in five minutes! So easy!
  • Read so many blogs, still do not understand TCC, might as well take a look at this case!
  • XA business water is very deep, young man I am afraid you can not grasp!
  • Does your Saga business guarantee “isolation”?

1. What is two-phase commit

1.1 binlog redolog

binlog

MySQL/MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8 / MySQL8

The binlog contains all DDL and DML statements in the form of events. It also contains the elapsed time of the statement execution. Note:

  • A binlog is a logical log that contains the original logic of an SQL statement, such as +1 for a field. Note that this is different from a physical log called a redo log (what changes were made on a data page).
  • When the binlog file is full, it automatically switches to the next log file and does not overwrite the previous log. This is also different from the redo log, where redo logs are written in a loop.
  • Generally, when configuring the binlog file, you can specify the validity period of the binlog file. After the validity period expires, the log file will be automatically deleted to avoid occupying too much storage space.

According to the MySQL documentation, there is about a 1% performance loss when binlog is enabled, but this is acceptable. In general, binlog has two important uses:

  • For MySQL primary/secondary replication: Enable the binlog function on the host. The host synchronizes the binlog function to the secondary host. The secondary host uses the binlog function to synchronize data between the host and secondary host.
  • MySQL data recovery, by using the mysqlBinlog tool combined with the binlog file, you can restore data to a time in the past.

redo log

The binlog is provided by MySQL, in MySQL server, and the redo log is not provided by MySQL, it is provided by InnoDB. So there are two types of logs in MySQL: binlog and redo log. There are two types of logs, both for historical reasons (InnoDB wasn’t the official MySQL storage engine) and for technical reasons, which we’ll talk about later.

As we all know, one of the four major features of transactions is persistence, which means that as long as the transaction commits successfully, the changes made to the database are permanently saved and written to disk. How can this be done? In fact, it is easy to imagine that every time a transaction commits, all the data pages involved in the transaction will be flushed to disk. Once written to disk, the data will not be lost.

But if you do this every time, the database will not know where to go slow! Since Innodb interacts with disks on a per-page basis, and a transaction may only modify a few bytes of a data page, it is inefficient and wasteful to flush the entire data page to disk. The inefficiency is due to the fact that the data pages are not physically contiguous, and flushing the data pages to disk involves random IO.

For this reason, MySQL has designed a redo log that records only what changes transactions make to the data page. Write a redo log. Writing data to disk is also disk IO. Since it is disk IO, why not write data directly to disk? Go to all this trouble!

That’s not true.

Redo logs are sequential I/O. While redo logs involve random I/O, where data is addressed, located, and updated/added/deleted, redo logs are sequential I/O. So it’s faster than writing data.

The redo log itself is divided into:

  • Redo log buffer. This part of the log is volatile.
  • A redo log file is a log file on disk that is persistent.

Each DML statement executed by MySQL is written to the redo log buffer, and at a certain point in time, multiple operations are written to the redo log file. This technique is known as WAL(write-Ahead Logging) in MySQL.

1.2 Two-phase commit

In MySQL, the main players of the two-phase commit are binlog and redolog.

As you can see from the figure above, there are 3 steps when the transaction is finally committed:

  1. Write redo log to the prepared state.
  2. Write binlog.
  3. Change the redo log status to commit.

Because the redo log commit is divided into prepare and COMMIT phases, it is called a two-phase commit.

2. Why two-phase commit

If there is no two-phase commit, binlog and redolog commit in one of two ways:

  1. Binlog first and then redolog.
  2. Redolog and then binlog.

Let’s look at these two cases separately.

Let’s say we want to insert a record R into the table, and if we write binlog first and then redolog, let’s say the binlog crashes before redolog is written. There is already a record of R in the binlog. When data is synchronized from the slave machine to the host or when data is recovered using binlog, R will be synchronized to the record. However, there is no record of R in Redolog, so after crash recovery, the transaction inserted into R record is invalid, that is, there is no record of that row in the database, which causes data inconsistency.

Instead, let’s say we want to insert a record R into the table, and if we write redolog first and then binlog, let’s say redolog crashes before the binlog is written. Redolog has a record of R, so after the crash recovery, the transaction inserted R record is valid, through which data is restored to the database. However, there is no record about R in binlog, so when data is synchronized from the slave machine to the host or when data is recovered using binlog, R will not be synchronized to the record, resulting in data inconsistency.

Will two-phase commit solve the problem?

Let’s look at the following three situations:

** a crash occurred after a redo log was written in the prepare state.

Because the binlog has not been written and the redo log was not committed in the prepare state, the transaction is rolled back during crash recovery. The binlog has not been written and is not transferred to the standby database.

** If it crashes after writing binlog, then:

If the transaction exists and is complete, commit the transaction directly. If the transaction does not exist or is incomplete, roll back the transaction.

** If redolog crashes while in commit state, the redolog will be rebooted.

Thus, two-phase commit ensures consistency of data.

3. Summary

If you have any questions, please leave a comment.