The harder you work, the luckier you get. This article has been collected in the GitHub repository JavaCommunity, which includes interview experience shares and a source-code analysis series. Stars and likes are welcome: github.com/Ccww-lx/Jav…

Preface

Database transactions come up in almost every interview, and answering with only surface-level knowledge is usually not enough to pass. To get through the interview you need a solid grasp of MySQL transactions, so this article consolidates transaction knowledge based on material available online.

Transactions

A transaction is a logical group of operations that either all execute or none execute.

Properties of transactions (ACID)

  • Atomicity: A transaction is an indivisible unit of work that either fully succeeds or fully fails. If any SQL statement in the transaction fails, the statements that have already executed must be rolled back, and the database reverts to its pre-transaction state.

  • Consistency: Transactions must not compromise data integrity or business consistency. In a bank transfer, for example, the total amount of money held by the two parties remains the same whether the transaction succeeds or fails.

  • Isolation: Before a transaction is committed, the data it operates on is (usually) invisible to other transactions.

  • Durability: After a transaction is committed, its modifications are persisted permanently and will not be lost due to system failures.

Strictly speaking, only a database that satisfies all four ACID properties at the same time provides complete transactions. In practice, few implementations fully satisfy all of them, but an implementation should try to meet these requirements as closely as possible.

So how exactly are the ACID properties implemented? Let's look at them one by one, starting with atomicity.


Atomicity (Atomicity)

First let’s look at the atomic nature of transactions and see how it is implemented.

Atomicity: A transaction is an indivisible unit of work that either fully succeeds or fully fails. If any SQL statement in the transaction fails, the statements that have already executed must be rolled back, and the database reverts to its pre-transaction state.

The implementation of atomicity relies on the undo log, one of MySQL's transaction logs. When a transaction needs to be rolled back, the database state must be restored to what it was before the transaction began; that is, all SQL statements that have already executed successfully must be undone. The undo log plays the key role here:

When a transaction changes the database, InnoDB generates the corresponding undo log. If the transaction fails, or rollback is called and the transaction must be rolled back, the information in the undo log can be used to roll the data back to its pre-modification state.
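To make the rollback behaviour concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is used purely for illustration (it implements atomicity with its own journal, not InnoDB's undo log), but the observable all-or-nothing behaviour is the same:

```python
import sqlite3

# In-memory database purely for illustration of atomic rollback.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back automatically on exception
        conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
        raise RuntimeError("simulated failure before the matching credit")
except RuntimeError:
    pass

# The partial debit was rolled back: the database is back in its
# pre-transaction state.
balances = [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]
print(balances)  # [100, 100]
```

The `with conn:` block commits on normal exit and rolls back on an exception, which is exactly the "either all execute or none execute" guarantee described above.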

So what is the undo log, and how is each data-change operation recorded?

Undo log

Undo log: undo records are stored in segments, and each undo operation occupies an undo log segment. Why is a corresponding undo log recorded during every data-change operation? Its purposes are:

  • To guarantee atomicity, the pre-transaction version of the data is recorded for rollback.
  • Through MVCC + undo log, InnoDB implements the repeatable read and read committed isolation levels.

The undo log is divided into:

  • insert undo log: the undo log generated by insert operations.
  • update undo log: the undo log generated by delete and update operations.

How does the undo log record data changes?

Insert: because an inserted record is visible only to the transaction itself and invisible to other transactions, its undo log can be deleted directly after the transaction commits; no purge operation is needed.

Delete: a delete operation does not actually remove the row within the transaction. Instead, it is a delete-mark operation: the record's Delete_Bit is set without deleting the record. This is a "fake delete"; the marked record is actually removed later by the background purge thread.

Update is divided into two cases, depending on whether the updated column is a primary key column.

  • If the updated column is not a primary key column, the undo log directly records the reverse update; that is, the update is applied in place.
  • If it is a primary key column, the update is done in two steps: delete the row, then insert the target row.
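The non-primary-key case can be sketched as a toy model: the undo record is simply the reverse update, and applying it restores the row in place. This illustrates the idea only; it is not InnoDB's actual undo record format:

```python
# Toy model (not InnoDB's on-disk format): for a non-primary-key update,
# the undo record is the reverse update, so applying it restores the row.
row = {"id": 1, "name": "old_name"}

def update_column(row, col, new_value):
    """Apply an in-place update and return the reverse update as the undo record."""
    undo_record = {"op": "update", "col": col, "old_value": row[col]}
    row[col] = new_value
    return undo_record

def apply_undo(row, undo_record):
    """Roll the column back to the value saved in the undo record."""
    row[undo_record["col"]] = undo_record["old_value"]

undo = update_column(row, "name", "new_name")
assert row["name"] == "new_name"   # the update took effect
apply_undo(row, undo)              # rollback via the undo record
assert row["name"] == "old_name"   # back to the pre-update value
```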

Unlike the insert undo log, the update undo log is not deleted immediately when the transaction commits. Instead, InnoDB puts the transaction's undo log on a delete list, from which it is removed later by the purge thread.

This is because the undo log may still be needed later. For example, at the REPEATABLE READ isolation level, a transaction reads the row version that was committed when the transaction started; as long as the transaction has not ended, that version of the row cannot be deleted (that is, its undo log cannot be deleted). In addition, the pages allocated to the undo log can be reused, which reduces storage space and improves performance.

Note: The purge thread has two main duties: purging undo pages and purging rows whose Delete_Bit is set.

Next, let’s look at the isolation of a transaction, what isolation levels a transaction has, and what problems can arise in transaction concurrency.


Isolation (Isolation)

Isolation means that the operations within a transaction are isolated from other transactions, and concurrently executing transactions cannot interfere with each other. Before a transaction is committed, the data it operates on is generally invisible to other transactions.

Transaction isolation level

Moreover, to effectively guarantee the correctness of data read under concurrency, the database provides four transaction isolation levels:

  • Read Uncommitted: data not yet committed can be read; dirty reads are allowed
  • Read Committed: only data committed by other transactions can be read; non-repeatable reads may occur
  • Repeatable Read: query results within the same transaction are consistent with the results at transaction start (InnoDB's default level); phantom reads may occur
  • Serializable: all transactions execute one by one; every read takes a table-level shared lock, and reads and writes block each other

Different isolation levels may suffer from different concurrency problems. The main concurrency problems include:

  • Lost update: when two or more transactions select the same row and then update it based on the originally selected value, a lost-update problem occurs, because each transaction is unaware of the others: the last update overwrites the updates made by the other transactions

  • Dirty read: transaction A reads data that transaction B has updated but not yet committed, and transaction B then rolls back

  • Non-repeatable read (emphasis on modification): within one transaction, the same data is read twice but the results differ. Transaction A reads the same data multiple times; between those reads, transaction B updates the data and commits, so A's repeated reads of the same data return inconsistent results

  • Phantom read: within one transaction, two identical reads (typically range queries) return inconsistent results because another transaction added or deleted rows in between

What's the difference between a non-repeatable read and a phantom read?

The biggest difference between non-repeatable reads and phantom reads lies in how each is solved through the locking mechanism.

Both can be addressed with locks. Under repeatable read, the rows read by a SQL statement are locked at the first read, so other transactions cannot update them; this achieves repeatable-read isolation.

However, this method does not lock newly inserted rows. So if transaction A reads a range and transaction B then commits an insert into that range, transaction A suddenly sees extra rows. This is a phantom read, and it cannot be avoided with row locks alone.
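A toy Python model (plain dicts, no real locking or MVCC; every read here sees the latest committed state, as under read committed) makes the two anomalies concrete:

```python
# Toy in-memory "table" to contrast the two anomalies: id -> amount.
table = {1: 50, 2: 50}

# Non-repeatable read: transaction A reads row 1 twice; B updates it in between.
first_read = table[1]
table[1] = 80            # transaction B updates the row and commits
second_read = table[1]
assert first_read != second_read      # same row, different values

# Phantom read: A runs the same range query twice; B inserts in between.
first_count = len([v for v in table.values() if v >= 50])
table[3] = 70            # transaction B inserts a new row and commits
second_count = len([v for v in table.values() if v >= 50])
assert second_count == first_count + 1  # an extra "phantom" row appears
```

Row locks on existing rows would prevent the first anomaly but not the second, since the phantom comes from a row that did not exist at lock time.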

Now that we know the concurrency problems, let's look at which problems can occur at each isolation level:

Transaction isolation level   Dirty read   Non-repeatable read   Phantom read
Read uncommitted              yes          yes                   yes
Read committed                no           yes                   yes
Repeatable read               no           no                    yes
Serializable                  no           no                    no
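The same matrix can be encoded as a small lookup table, handy for checking one's understanding (the names and structure here are my own, not a MySQL API). Note this is the SQL-standard view; InnoDB's repeatable read additionally curbs phantoms with next-key locks:

```python
# Which anomalies each isolation level still allows (True = can occur).
ANOMALIES = {
    "READ UNCOMMITTED": {"dirty_read": True,  "non_repeatable_read": True,  "phantom_read": True},
    "READ COMMITTED":   {"dirty_read": False, "non_repeatable_read": True,  "phantom_read": True},
    "REPEATABLE READ":  {"dirty_read": False, "non_repeatable_read": False, "phantom_read": True},
    "SERIALIZABLE":     {"dirty_read": False, "non_repeatable_read": False, "phantom_read": False},
}

def allows(level, anomaly):
    """Return True if the given anomaly can occur at the given level."""
    return ANOMALIES[level][anomaly]

assert allows("READ COMMITTED", "non_repeatable_read")
assert not allows("REPEATABLE READ", "non_repeatable_read")
```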

To implement transaction isolation, databases introduce locks. InnoDB's transaction isolation levels are implemented with locking and MVCC (Multi-Version Concurrency Control).

Let’s take a look at the principle of locking, how to use locks to achieve transaction isolation?

Locking mechanism

The basic working principle of locking is that a transaction must acquire the corresponding lock before modifying data. Once the lock is acquired, the transaction may modify the data; while the transaction is operating, that data stays locked, and any other transaction wanting to modify it must wait until the current transaction commits or rolls back and releases the lock.

MySQL is divided into three main types (levels) of locking mechanisms:

  • Table-level locking: the largest locking granularity; the probability of lock contention is the highest and concurrency is the lowest, but the overhead is small, locking is fast, and there are no deadlocks.

  • Row-level locking: the smallest locking granularity; the probability of lock contention is the lowest; it gives the application the greatest possible concurrency and improves overall performance for workloads requiring high concurrency, but the overhead is high, locking is slow, and deadlocks can occur.

  • Page-level locking: overhead and locking time sit between table locks and row locks; deadlocks can occur; the granularity and the concurrency are also in between.

Different storage engines support different locking mechanisms; here we mainly analyze InnoDB locks.

InnoDB lock

InnoDB implements two types of row locks:

  • Shared lock (S lock, row lock): multiple transactions can share an S lock on the same row; they may read the row but not modify it
  • Exclusive lock (X lock, row lock): if a transaction acquires an X lock on a row, no other transaction can acquire any lock on that row (shared or exclusive); a transaction must acquire the X lock before updating the data

For UPDATE, DELETE, and INSERT operations, InnoDB automatically places exclusive (X) locks on the rows involved; for ordinary SELECT statements, InnoDB places no locks at all.

Because InnoDB allows row locks and table locks to coexist, it implements multi-granularity locking, using intention locks at the table level:

  • Intention shared lock (IS lock, table lock): before a transaction places a shared lock on a row, it first places an intention shared lock on the table. Intention shared locks are compatible with each other
  • Intention exclusive lock (IX lock, table lock): before a transaction places an exclusive lock on a row, it first places an intention exclusive lock on the table. Intention exclusive locks are compatible with each other

Intention locks (IS, IX) are added automatically by InnoDB before data operations, with no user intervention. Their significance: when a transaction wants to lock an entire table, it can first check whether a conflicting intention lock exists; if one does, it knows immediately that the table lock cannot be granted and must wait, without scanning the table row by row for row locks.

The compatibility of the four locks is as follows

Requested mode \ Current mode   X          IX           S            IS
X                               conflict   conflict     conflict     conflict
IX                              conflict   compatible   conflict     compatible
S                               conflict   conflict     compatible   compatible
IS                              conflict   compatible   compatible   compatible

If the lock mode requested by a transaction is compatible with the current lock, InnoDB grants the request; otherwise, the transaction waits for the lock to be released.
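The compatibility check can be sketched as a simple lookup (the function and table names are illustrative, not InnoDB internals):

```python
# Can a requested lock mode be granted while another transaction
# holds `current` on the same object? Encodes the matrix above.
COMPATIBLE = {
    ("X",  "X"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "IS"): False,
    ("IX", "X"): False, ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "IS"): True,
    ("S",  "X"): False, ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "IS"): True,
    ("IS", "X"): False, ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "IS"): True,
}

def can_grant(requested, current):
    """True if the requested mode is compatible with the currently held mode."""
    return COMPATIBLE[(requested, current)]

assert can_grant("IX", "IX")     # two writers intending row X locks can coexist
assert not can_grant("S", "IX")  # a table S lock must wait for the IX holder
```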

InnoDB row locks

InnoDB row locks are implemented by locking index entries. Row locks can be used only when data is retrieved through an index; otherwise, table-level locking is used (all records are locked).

A next-key lock prevents phantoms. Based on the index, the key space is divided into left-open, right-closed intervals. When a range query retrieves data through the index, the interval containing the matched record and the next interval are locked.

Next-key = Record Locks + Gap Locks

  • When we retrieve data with a range condition rather than an equality condition and request a shared or exclusive lock, InnoDB locks the index entries of the existing records that match the range. Key values that fall inside the range but do not exist are called gaps, and InnoDB locks them too (gap locks).
  • Record locks are used for exact-match queries on a unique index, locking the single matching record

Where does this show up in practice? The cases are as follows:

  • For a range query where the records exist, next-key locks are used

  • When the record does not exist (for either an equality or a range query), the next-key lock degenerates into a gap lock

  • When the condition is an exact match (an equality query), the record exists, and the index is unique, the next-key lock degenerates into a record lock

  • When the condition is an exact match (an equality query) and the record exists but the index is not unique, a record lock is placed on the matching value and gap locks are placed on the intervals before and after it
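The four cases can be condensed into a small decision helper. This is a simplification of InnoDB's actual rules (which also depend on the access path), intended only to summarize the list above:

```python
# Summarizes the four next-key cases above; not InnoDB's real logic.
def lock_type(equality, record_exists, unique_index):
    if not record_exists:
        return "gap lock"                 # next-key degenerates to a gap lock
    if equality and unique_index:
        return "record lock"              # exact match on a unique index
    if equality and not unique_index:
        return "record lock + gap locks"  # lock the value and the gaps around it
    return "next-key lock"                # range query over existing records

assert lock_type(equality=True,  record_exists=True,  unique_index=True)  == "record lock"
assert lock_type(equality=False, record_exists=True,  unique_index=False) == "next-key lock"
assert lock_type(equality=True,  record_exists=False, unique_index=True)  == "gap lock"
```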

How to use locks to solve concurrency problems

Locks can be used to solve dirty reads, non-repeatable reads, and phantom reads:

  • X locks solve dirty reads

  • The S lock resolves non-repeatable reads

  • Next-key locks solve phantom reads

Multi-version concurrency control (MVCC, Multi-Version Concurrency Control)

InnoDB's MVCC is implemented by storing two hidden columns with each row: one holds the ID of the transaction that wrote the row version (transaction IDs increase monotonically), and the other holds a pointer into the row's rollback segment.

Each time a new transaction starts, it is assigned a new, auto-incremented transaction ID. DB_ROLL_PTR is the pointer into the row's rollback segment: all versions of the row are organized as a linked list in the undo log, and this value actually points to the row's history list in undo.

Under concurrent access, MVCC manages multiple versions of the data touched by transactions so that write operations do not block reads, and version comparison helps resolve phantom reads.
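A toy version chain makes the hidden columns concrete: each version carries its creator's transaction ID (DB_TRX_ID) and a pointer to the previous version (DB_ROLL_PTR). Real InnoDB keeps the old versions in undo pages; this sketch simply keeps them in memory:

```python
# Toy row-version chain; illustrative only, not InnoDB's storage format.
class RowVersion:
    def __init__(self, value, trx_id, roll_ptr=None):
        self.value = value
        self.trx_id = trx_id      # transaction that wrote this version (DB_TRX_ID)
        self.roll_ptr = roll_ptr  # previous version in the undo chain (DB_ROLL_PTR)

# trx 10 inserts, trx 20 updates, trx 30 updates: newest version at the head.
v1 = RowVersion("a", trx_id=10)
v2 = RowVersion("b", trx_id=20, roll_ptr=v1)
head = RowVersion("c", trx_id=30, roll_ptr=v2)

def visible_value(head, reader_trx_id):
    """Walk the chain until a version old enough for the reader is found."""
    version = head
    while version is not None and version.trx_id > reader_trx_id:
        version = version.roll_ptr
    return version.value if version else None

assert visible_value(head, reader_trx_id=25) == "b"  # sees trx 20's version
assert visible_value(head, reader_trx_id=35) == "c"  # sees the newest version
```

This is why a writer never blocks a reader: the reader just follows roll_ptr to an older, visible version instead of waiting.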

In addition, MVCC works only at the REPEATABLE READ and READ COMMITTED isolation levels. In essence, the MVCC implementation reads a snapshot of the data as of a certain point in time. So which operations are snapshot reads?

Snapshot read and current read

Snapshot read: an InnoDB snapshot read assembles the data from two parts: cache (the current data) + undo (the data as it was before modification).

  • An ordinary select, such as select * from table where ?;

Current read: the SQL statement reads the latest version of the data, and locks guarantee that the data read cannot be modified by other transactions:

  • UPDATE

  • DELETE

  • INSERT

  • SELECT ... LOCK IN SHARE MODE

  • SELECT ... FOR UPDATE

    SELECT ... LOCK IN SHARE MODE places an S lock on every record it reads; the other current reads place X locks.

How does MVCC operate at the RR isolation level?

MVCC operations in RR isolation

Under RR, InnoDB follows two rules for SELECT operations:

  1. InnoDB only looks at row versions no newer than the current transaction (that is, row versions whose transaction number is less than or equal to the current transaction's number). This ensures the transaction reads only rows that existed before it started, or that were inserted or modified by the transaction itself.
  2. The row's deletion version is either undefined or greater than the current transaction's version, which ensures that the rows the transaction reads were not deleted before the transaction began. Only records that satisfy both conditions are returned as query results.

INSERT: InnoDB stores the current transaction number with each newly inserted row as the row's version number.

DELETE: InnoDB stores the current transaction number with each deleted row as the row's deletion identifier.

UPDATE: InnoDB inserts a new record whose row version number is the current transaction number, and stores the current transaction number on the original row as its deletion mark.
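The two SELECT rules can be written as a visibility predicate over a toy row that records a creation version and a deletion version. This is a sketch of the rules above, not InnoDB code:

```python
# Visibility of one row version to a reader, per the two SELECT rules.
def row_visible(create_version, delete_version, reader_version):
    # Rule 1: the row was created at or before the reader's version.
    created_before = create_version <= reader_version
    # Rule 2: the row is either not deleted, or was deleted after the
    # reader's version.
    not_deleted_yet = delete_version is None or delete_version > reader_version
    return created_before and not_deleted_yet

assert row_visible(create_version=10, delete_version=None, reader_version=15)
assert not row_visible(create_version=20, delete_version=None, reader_version=15)  # created later
assert not row_visible(create_version=10, delete_version=12, reader_version=15)    # already deleted
assert row_visible(create_version=10, delete_version=20, reader_version=15)        # deleted later
```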

Storing these two extra system version numbers lets most read operations proceed without locks. The design makes reading data simple and fast while guaranteeing that only rows satisfying the visibility rules are read. The downsides are extra storage per row, more work when checking rows, and some additional maintenance.

Having analyzed atomicity and isolation, let's move on to transaction durability.

Durability (Durability)

Durability: after a transaction is committed, its modifications are kept permanently and will not be lost due to system failures.

The key to implementing durability is the redo log. During SQL execution, the changes made by the executed statements are saved to a dedicated log file; during crash recovery, the operations recorded in the redo log are re-applied.

So how is redo log implemented?

redo log

When data is written to the database, it is first written to the Buffer Pool, and the modified pages in the Buffer Pool are periodically flushed to disk. To keep those in-memory changes safe, every modification is also recorded in the redo log, which has two parts:

  • The log buffer in memory; this part of the log is volatile.
  • The redo log file on disk; this part of the log is persistent.

The Buffer Pool greatly improves the efficiency of reading and writing data, but it also brings a new problem: if MySQL crashes before the modified data in the Buffer Pool has been flushed to disk, that data is lost and the durability of transactions cannot be guaranteed.

To guarantee durability, the log buffer must be written to the redo log file on disk when a transaction commits. The exact behavior is controlled by the innodb_flush_log_at_trx_commit parameter:

  • 0: the log is written and flushed to disk once per second; if the system crashes, up to one second of transactions may be lost.
  • 1: the log is written and flushed to disk at every transaction commit.
  • 2: the log is written at every commit but flushed to disk only about once per second; unlike 0, a crash of MySQL alone loses no data, though an OS or power failure still can.
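The trade-off among the three settings can be sketched as a toy model. It only illustrates which committed transactions survive which kind of failure, not InnoDB's actual I/O path:

```python
# Toy model of innodb_flush_log_at_trx_commit failure behaviour.
def survives_mysql_crash(setting, committed_after_last_flush):
    """Does a transaction survive a crash of the MySQL process alone?"""
    if setting == 1:
        return True   # flushed to disk at every commit
    if setting == 2:
        return True   # written to the OS page cache at commit
    return not committed_after_last_flush  # setting 0: up to ~1s may be lost

def survives_power_loss(setting, committed_after_last_flush):
    """Does a transaction survive an OS crash or power failure?"""
    if setting == 1:
        return True
    return not committed_after_last_flush  # 0 and 2 both rely on a periodic flush

assert survives_mysql_crash(2, committed_after_last_flush=True)
assert not survives_power_loss(2, committed_after_last_flush=True)
assert survives_power_loss(1, committed_after_last_flush=True)
```

Only setting 1 is fully durable; 0 and 2 trade a bounded window of possible loss for fewer disk flushes.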

The redo log deserves a more detailed explanation, which will be covered in a later article. By now you understand three of the four transaction properties. What about consistency?


Consistency (Consistency)

Consistency: a transaction must not compromise data integrity or business consistency:

  • Data integrity: entity integrity, column integrity (e.g., field type, size, and length must meet requirements), foreign-key constraints, etc.

  • Business consistency: for example, in a bank transfer, the total amount of money held by the two parties remains the same whether the transaction succeeds or fails.

So how is data consistency guaranteed?

In fact, data consistency is guaranteed by the atomicity, durability, and isolation of transactions:

  • Atomicity: statements either all execute or none execute; this is the core property of transactions and part of their very definition. It is implemented mainly with the undo log.
  • Durability: guarantees that committed data is not lost to crashes or similar failures. It is implemented mainly with the redo log.
  • Isolation: guarantees that transactions execute as independently of one another as possible. InnoDB's default isolation level is RR, implemented with locks (including next-key locks) and MVCC (including the hidden row columns, the undo-log-based version chain, and the ReadView).

Conclusion

Transactions that satisfy all four ACID properties are rare in practice; many real systems satisfy only some of them, for example:

  • MySQL NDB Cluster transactions do not satisfy durability or isolation;
  • InnoDB's default isolation level is repeatable read, which does not fully satisfy isolation;
  • Oracle's default transaction isolation level is READ COMMITTED, which does not fully satisfy isolation

So the four ACID properties are best used as dimensions along which to measure how a transaction behaves.

Finally, search for "Ccww Technology Blog" on WeChat to read more articles, and feel free to follow along.