“This is the first day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021”

Transaction definition

1. One of the smallest non-divisible units of work; A transaction usually corresponds to a complete business, such as a bank account transfer, which is the smallest unit of work

A complete business requires a set of DML(INSERT, UPDATE, delete) statements to work together

Transactions are only associated with, or have transactions in, DML statements. This is related to the business logic, and the number of DML statements varies according to the business logic

Transaction characteristics

A transaction must contain four basic properties, commonly referred to as ACID

  • Atomicity: A transaction is the smallest unit of execution and cannot be split. Atomicity ensures that actions either all succeed or all fail;
  • Consistency: Data is consistent before and after a transaction is executed.
  • Isolation: A transaction is not disturbed by other transactions when accessing the database concurrently.
  • (Durability) : After a transaction is submitted. Changes to data in the database are persistent, even if the database fails.

Next, MySQL is transactional under the innoDB engine

Isolation,

Why isolation first? This is because isolation is the most fundamental feature of a transaction, involving concepts such as isolation level, locking, MVCC, and so on.

Isolation level instructions
Read uncommitted Before a transaction commits, its changes can be seen by other transactions
Reading has been submitted After a transaction commits, its changes are seen by other transactions
Repeatable read A read of the same data in a transaction is always the same, regardless of whether other transactions operate on the data and whether the transaction is committed or not.InnoDB default level.
serialization Transactions are serialized, and each read requires a table-level shared lock. Read and write block each other with the highest isolation level, sacrificing system concurrency.

Different isolation levels are designed to address different problems. That is, dirty reading, magic reading, unrepeatable reading.

  1. Dirty read: A dirty read is when a transaction is accessing data and making changes to the data that have not been committed to the database, and the data is also accessed by another transaction and then used.
  2. Non-repeatable read: the same data is read multiple times in a transaction. The same data is accessed by another transaction while the transaction is still active. Then, between the two reads in the first transaction, the data read in the first transaction may not be the same because of the modification in the second transaction. This happens when the data read twice in a transaction is not the same and is therefore called a non-repeatable read.
  3. Phantom read: The first transaction modifies data in a table, involving all rows in the table. At the same time, the second transaction also modifies the data in the table by inserting a new row into the table. Then, the user of the first transaction will later discover that there are still unmodified rows in the table, as if there were an illusion that the rows were too many or too few.

The following table describes possible problems caused by different isolation levels of MySQL

Transaction isolation level Dirty read Unrepeatable read Phantom read
Read uncommitted may may may
Reading has been submitted Can’t be may may
Repeatable read Can’t be Can’t be may
serialization Can’t be Can’t be Can’t be

So how do different isolation levels ensure isolation? The answer is lock and MVCC.

In terms of granularity, MySQL locks are divided into table locks, page locks and row locks.

Table locks include intentional shared locks (IS), intentional exclusive locks (IX), and auto-increment locks.

Types of row locks Shared locks (S) and shared locks (X). Row locks are added when they are needed, but are not released immediately when they are no longer needed, but wait until the end of a transaction.

Three row locking algorithms

  1. Record Lock: A Lock on a single row Record.
  2. Gap Lock A Gap Lock locks the gaps between records to prevent data insertion. We are using real-time read (SELECT FOR… UPDATE) or UPDATE. In order to prevent new data from being inserted during the read process, the left and right ranges of the read data are locked to prevent other transactions from inserting data. Therefore, gap locks are not excluded between the read data, and gap locks only exclude data insertion operations.
  3. A next-key Lock is used to Lock records and gaps between records. A next-key Lock is used to Lock index records and gaps between records. A next-key Lock is used to Lock index records and gaps between records.

Locks and MVCC

A general introduction to the lock, as you can see. With a lock, when a transaction is writing data, other transactions cannot obtain the write lock and cannot write data, which ensures the isolation between transactions to a certain extent. If you add a write lock, why can other transactions also read data? This relies on multi-version Concurrency Control (MVCC).

Innodb stores each row of data with a few additional fields: DATA_TRX_ID and DATA_ROLL_PTR.

  • DATA_TRX_ID: indicates the version number of a data row. Identifies the transaction ID that made the most recent change to the row record.
  • DATA_ROLL_PTR: pointer to the rollback segment of the row. All the old versions on the row are organized in a linked list in undo log.

ReadView

Created at the beginning of each SQL statement, it has several important properties:

  • Trx_ids: set of active (uncommitted) transaction version numbers for the current system.
  • Low_limit_id: Maximum transaction version number +1 when the current read view is created.
  • Up_limit_id: “system was in active transaction minimum version” when the current Read view was created
  • Creator_trx_id: transaction version number for creating the current Read View;

The query

  • DATA_TRX_ID <up_limit_id: Indicates that data exists before the current transaction.

  • DATA_TRX_ID >= low_limit_id: Indicates that this data is generated after the current read View is created and is not displayed.

    • DATA_ROLL_PTR find historical version from undo log, null if not found.
    • Up_limit_id

RR level illusion

With locks and MVCC, transaction isolation is resolved. The default RR level resolves illusory reading. Phantom reading is usually for INSERTS, and non-repeatability is for updates.

Read the snapshot

A normal SELECT statement is a normal read, that is, the data read is the state at the start of the transaction data, ordinary read phantom problem is mainly solved by MVCC, see the above MVCC query operation.

Real-time read

SELECT *** FOR UPDATE

SELECT *** LOCK IN SHARE MODE

A next-key Lock will be added to Lock the current record and the gap between the left and right ranges, so that data cannot be inserted into our query range when reading.

atomic

The undo log rollback log was mentioned earlier. The isolation MVCC actually relies on it, as does atomicity. The key to achieving atomicity is the ability to undo all SQL statements that have been successfully executed when the transaction is rolled back.

When a transaction changes the database, InnoDB generates the corresponding undo log. If a transaction fails or a rollback is called, the transaction needs to be rolled back, you can use the information in the undo log to rollback the data before the modification. The undo log is a logical log that records information about SQL execution. When a rollback occurs, InnoDB does the reverse based on the contents of the undo log:

  • For each INSERT, delete is performed during the rollback;
  • For each DELETE, the rollback will perform an INSERT;
  • For each update, a reverse update is performed to change the data back.

Take the update operation as an example. When a transaction performs an update, the generated Undo log contains the primary key of the row that was modified (to know which rows were modified), the columns that were modified, and the values of the columns before and after the changes. This information can be used to restore the data to the state before the update.

persistence

Persistence depends on redo logs. WAL(Write-Ahead Logging) technology is often mentioned in MySQL. The key point of WAL is to Write logs first and then Write disks.

When a data update operation is performed, the changes to the data page are recorded in the redo log. The data page is then updated in memory. The data page is updated to disk the next time the data page is queried or during idle time. This changes random I/ OS to sequential I/ OS.

The advantage is that the number of disk I/ OS is reduced, and data can be restored to the latest state based on the redo log in the event of a failure.

The disadvantage is that it will cause dirty pages in memory. Background threads will automatically flush dirty pages or flush data pages, which will temporarily query operations and affect the query.

Redo log has two characteristics:

  • Fixed size, loop
  • crash-safe

There are two phases for redo logs: COMMIT and prepare. If you do not commit in two phases, the state of the database may be inconsistent with the state of the database recovered from its logs.

What is the two-paragraph submission system?

MySQL Server writes the updated SQL file to bin log. After the redo log is prepared, the MySQL Server writes the updated SQL file to bin log. The innoDB interface is then called to set the redo log to commit state, and the update is complete.

You might wonder if there’s a binlog that’s also used to write and recover data, so what’s the difference?

  • Level: Redo log is innoDB engine specific, server layer called binlog.
  • Content: REdolog is a physical log of “what was changed on a data page”; A binlog is a logical log. It is the original logic of statements such as “add 1 to the c field in the row where ID=2”.
  • Write: The redolog loop writes and writes frequently, the binlog appends and writes when the transaction commits

Update T set c=c+1 where ID=2;

  1. If ID=2 is the primary key, return it directly to the executor. If ID=2 is the primary key, return it directly to the executor. If ID=2 is the primary key, return to the executor.
  2. [The actuator takes the row given by the engine, adds 1, N+1, to get a new row, and then calls the engine interface to write the new row.]
  3. The engine updates the new row of data to memory and logs the update to the redo log, which is prepared. The redo log then tells the executor that the transaction is ready to commit.
  4. The executor generates a binlog of this operation and writes the binlog to disk.
  5. The executor calls the commit transaction interface of the engine. The engine changes the redo log to commit.

Rules for determining crash recovery (determined by whether redolog is committed or binlog is complete)

  1. Commit the redo log if the transaction is complete and has a commit flag.
  2. If the redo log contains only a complete prepare, check whether the binlog exists and is complete: a. If so, commit the transaction; B. Otherwise, roll back the transaction.

consistency

Consistency is the ultimate goal of transaction pursuit. Atomicity, persistence and isolation mentioned above are all intended to ensure consistency of database state. Of course, the above is the guarantee at the database level, and the implementation of consistency also needs to be guaranteed at the application level.

That is, your business, such as the purchase operation only deducts the balance of the user, not the inventory, certainly cannot guarantee the consistency of the state.

conclusion

MySQL transactions should be known, but the implementation principle may not be so clear, I hope this article can help understand transactions.

The more you learn today, the less you beg tomorrow