I have been busy with life, refactoring, and writing articles lately, so updates have been slow.

1. The nature of transactions

A transaction is a sequence of operations performed as a single logical unit of work: either all of them execute successfully or none of them take effect. Database transactions have four characteristics, known as the ACID properties: atomicity, consistency, isolation, and durability. Transaction processing ensures that data-oriented resources are not permanently updated unless every operation within the transactional unit completes successfully. A transaction is the logical unit of work in database operations, and the transaction management subsystem of the DBMS is responsible for processing them. Transactions are mainly used for tasks that are complex, important, and concurrent: bank debits, order generation, money transfers, and even in-game equipment upgrades all need to be wrapped in a transaction.

  • Atomicity

Atomicity means that the operations contained in a transaction either all succeed or all fail; there is no partial execution or partial commit.

  • Consistency

Consistency means that a transaction must take the database from one consistent state to another consistent state; that is, both before and after the transaction executes, the database contents are in a consistent state.

  • Isolation

Isolation means that when multiple users access the database concurrently, the database opens a separate transaction for each user, and these transactions cannot interfere with each other. The exact isolation effect depends on the transaction isolation level. (In MySQL, the InnoDB storage engine supports transactions; MyISAM does not.)

  • Durability

Durability means that once a transaction is committed, its modifications to the data are permanent and will not be lost even if the database subsequently fails.
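These guarantees can be observed directly from application code. Below is a minimal sketch using Python's built-in sqlite3 module rather than MySQL (an assumption for the sake of a self-contained example; any ACID-compliant engine behaves the same way): a simulated failure mid-transfer causes the whole transaction to roll back, so no partial debit survives.

```python
# Atomicity demo: a transfer that fails halfway leaves no partial change.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 60 "
                     "WHERE name = 'alice'")
        # Simulate a crash between the debit and the credit.
        raise RuntimeError("crash before credit")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # both rows unchanged: the partial debit was rolled back
```

Because the `with conn:` block rolls back on the exception, both balances are exactly as they were before the failed transfer.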

2. How transactions are implemented

Atomicity, consistency, and durability are implemented through the database's redo log and undo log.

The redo log is an InnoDB storage-engine-level log file that records the changes made by transactions. Redo records are written regardless of whether the transaction has committed yet. When an update statement executes, InnoDB first writes the change to the redo log, then updates the in-memory page until the statement completes, and later flushes the modified contents to disk at idle time or according to the configured flush policy.

The undo log is generally a logical log, recorded per row; it mainly stores the logical inverse of each data change. For example, when you insert a row, the undo log records a corresponding DELETE. Put simply, the undo log records the data as it was before modification, because it is needed for rollback.

When a rollback is required, the undo log alone is enough to restore the data to its pre-modification state.
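As a toy illustration of this rollback idea (a simplified model I am assuming for exposition, not InnoDB's actual undo record format): every write first appends the *old* value to an undo log, so rolling back is just replaying that log in reverse.

```python
# Toy undo-log model: rollback restores the last committed state.
class MiniTable:
    def __init__(self):
        self.rows = {}       # primary key -> value
        self.undo_log = []   # (key, old_value_or_None) records

    def put(self, key, value):
        self.undo_log.append((key, self.rows.get(key)))  # log old value first
        self.rows[key] = value

    def delete(self, key):
        self.undo_log.append((key, self.rows.get(key)))
        self.rows.pop(key, None)

    def rollback(self):
        while self.undo_log:
            key, old = self.undo_log.pop()   # newest change first
            if old is None:
                self.rows.pop(key, None)     # undo an INSERT with a DELETE
            else:
                self.rows[key] = old         # undo UPDATE/DELETE by restoring

    def commit(self):
        self.undo_log.clear()                # changes become permanent

t = MiniTable()
t.put(1, "a"); t.commit()
t.put(1, "b"); t.put(2, "c")   # uncommitted changes
t.rollback()
print(t.rows)  # {1: 'a'} -- back to the last committed state
```

Note how an INSERT is undone by a DELETE and vice versa, exactly the "logical inverse" described above.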

The undo log is also used for multi-version concurrency control (MVCC). Each undo record contains an image of the data before the change. If the transaction that changed the data has not yet committed, transactions at isolation level Read Committed or higher must be given the old version of the data rather than the uncommitted one.

The overall flow is as follows: the binlog stores all data changes, so in theory any data recorded in the binlog can be restored from it. The redo log, by contrast, does not keep the full history of changes: once the in-memory data has been flushed to disk, the corresponding redo records become invalid and their space can be reused.
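The write-ahead idea behind the redo log can be sketched in a few lines (purely illustrative; real redo records describe page-level physical changes, not key-value pairs): every change is appended to the log before being applied in memory, so after a simulated crash the log replay rebuilds the lost state.

```python
# Toy write-ahead logging sketch: log first, apply second, replay to recover.
redo_log = []   # "durable" log of (key, new_value) records
pages = {}      # volatile in-memory data

def update(key, value):
    redo_log.append((key, value))  # 1. write-ahead: log the change first
    pages[key] = value             # 2. then apply it in memory

update("x", 1)
update("y", 2)
update("x", 3)

pages = {}                         # simulate losing in-memory state in a crash
for key, value in redo_log:        # recovery: replay the redo log in order
    pages[key] = value

print(pages)  # {'x': 3, 'y': 2} -- state rebuilt from the log
```

Once a page has been flushed to disk, the real engine no longer needs its redo records, which is why the redo log can be a fixed-size circular buffer.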

The isolation of transactions is achieved through the locking mechanism

3. Database isolation levels

In practice, data in a database is usually accessed by many users at once. When multiple users operate on the same data concurrently, several problems can arise: lost updates, dirty reads, non-repeatable reads, and phantom reads.

  1. Read Uncommitted: during execution, a transaction can see rows newly inserted by other transactions as well as their uncommitted modifications. If one transaction has started writing a row, no other transaction may write that row at the same time, but other transactions may still read it. This isolation level prevents lost updates.

  2. Read Committed: a transaction can see rows inserted or modified by other transactions only once those transactions have successfully committed. A transaction that is reading a row allows other transactions to keep accessing it, but an uncommitted writing transaction blocks other transactions from accessing the row. This isolation level effectively prevents dirty reads.

A dirty read occurs when transaction T1 reads data that transaction T2 has not yet committed; if T2 then rolls back, the data T1 read is dirty. Dirty reads happen at row granularity. The fix is to raise the isolation level above Read Uncommitted.

  3. Repeatable Read: during execution, a transaction can see newly inserted rows that other transactions have committed, but not their modifications to existing rows. A reading transaction blocks writing transactions (but not other readers), while a writing transaction blocks all other transactions on that data. This isolation level effectively prevents non-repeatable reads and dirty reads. It is the default isolation level of MySQL.

A non-repeatable read occurs when transaction T1 queries the same record several times and gets different results, because transaction T2 modified the record between the queries. Non-repeatable reads also happen at row granularity.

  4. Serializable: while one transaction executes, other transactions may not run concurrently against the same data; access is fully serialized, so any read blocks writes (though reads can still proceed in parallel with each other). This eliminates phantom reads. It is the highest transaction isolation level and the safest, but it is usually too inefficient to use.

A phantom read occurs when transaction T1 is reading and operating on a set of rows matching some condition while transaction T2 inserts, modifies, or deletes rows in that set, so that T1 subsequently sees an inconsistent row set. The main difference between the two anomalies: a non-repeatable read concerns modification of existing rows, while a phantom read concerns insertion or deletion of rows.

The details are as follows (✓ = the anomaly can occur, ✗ = it is prevented):

Isolation level      Dirty read   Non-repeatable read   Phantom read
Read Uncommitted     ✓            ✓                     ✓
Read Committed       ✗            ✓                     ✓
Repeatable Read      ✗            ✗                     ✓
Serializable         ✗            ✗                     ✗
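The difference between reading live data and reading a snapshot can be simulated without a database (a toy model I am assuming for illustration, not actual MySQL behavior): T1 reads the same "row" twice while T2 commits an update in between.

```python
# Toy simulation of a non-repeatable read vs. a snapshot (repeatable) read.
row = {"balance": 100}

# Low isolation: T1's two reads both hit the live, shared data.
unstable_first = row["balance"]
row["balance"] = 50                    # T2 updates and commits in between
unstable_second = row["balance"]
print(unstable_first, unstable_second)  # 100 50 -- a non-repeatable read

# Snapshot (Repeatable Read) semantics: T1 reads a snapshot taken at its
# first read, so T2's committed update is not visible within T1.
row = {"balance": 100}
snapshot = dict(row)                   # snapshot taken by T1
first_read = snapshot["balance"]
row["balance"] = 50                    # T2's update lands in live data only
second_read = snapshot["balance"]
print(first_read, second_read)         # 100 100 -- repeatable
```

Real databases achieve the snapshot effect via MVCC (section 4) rather than by copying rows, but the visibility result is the same.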

3.1 Gap Lock

Gap locking is a mechanism InnoDB introduced to solve phantom reads at the Repeatable Read level. Phantom reads exist because of insert and update operations: with a range query (a locking read), row locks alone cannot keep the result set consistent, so the data within a range must be locked. At the Repeatable Read isolation level, the database implements this with a combination of row locks and next-key locks.

A gap lock locks the interval between two index records, or the range before the first index record, or the range after the last one.

Conditions under which gap locks are taken:

  • 1. Locking reads on an ordinary (non-unique) index;
  • 2. Locking reads on a multi-column unique index;
  • 3. Locking multiple rows (a range rather than a single row) through a unique index.

Gap lock characteristics:

  • 1. The basic unit of locking is the next-key lock, an interval that is open on the left and closed on the right.
  • 2. Locks are added to every object accessed while traversing the index during the search.
  • 3. For an equality query on an index: when a unique index is locked and the record exists, the next-key lock is narrowed to a row lock.
  • 4. For an equality query on an index: when the last value scanned to the right does not satisfy the condition, the next-key lock degrades to a gap lock.
  • 5. A range query on a unique index scans up to the first value that does not satisfy the condition.
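As a rough illustration of characteristic 1 (assuming a deliberately simplified model of a single-column index, not InnoDB internals), the next-key interval covering a value can be computed from the key values present in the index:

```python
# Conceptual sketch: the (low, high] next-key interval that covers a value.
import bisect

def next_key_interval(index_keys, value):
    """Return the (low, high] interval whose gap-plus-record covers `value`.

    `index_keys` must be sorted; None stands for -infinity on the left and
    for the supremum pseudo-record (+infinity) on the right.
    """
    i = bisect.bisect_left(index_keys, value)
    low = index_keys[i - 1] if i > 0 else None
    high = index_keys[i] if i < len(index_keys) else None
    return (low, high)

keys = [5, 10, 15]
print(next_key_interval(keys, 10))  # (5, 10]  -- gap below plus record 10
print(next_key_interval(keys, 12))  # (10, 15] -- value falls in a gap
print(next_key_interval(keys, 20))  # (15, None] -- gap above the last record
```

Because the interval is open on the left and closed on the right, locking the interval for value 12 prevents any insert of 11, 12, 13, or 14, which is exactly what blocks phantoms.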

4. MVCC: multi-version concurrency control

MVCC stands for Multi-Version Concurrency Control. It is a concurrency-control method that enables concurrent access to the database in database management systems, and is also used for transactional memory in programming languages.

MVCC features:

  • For high concurrency transactions, MVCC is more efficient than locking alone.
  • MVCC only works at Read Committed and Repeatable Read isolation levels.
  • MVCC can be implemented with either optimistic or pessimistic locking.

During concurrent database operation, inconsistent data may be read (a dirty read). To avoid this, concurrent access must be controlled, for example with locks (as with SELECT ... FOR UPDATE; such reads are called current reads, as opposed to snapshot reads). However, locking serializes reads and writes: read operations get blocked by write operations, which greatly reduces read performance.

Java's java.util.concurrent package contains the CopyOnWrite family of classes, designed to optimize workloads where reads far outnumber writes. A write operation copies the data without touching the original, modifies the copy, and then atomically replaces the original reference with the modified copy, while read operations continue to see the original data. Writes therefore never block reads, which optimizes read efficiency.
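A minimal Python sketch in the spirit of Java's CopyOnWriteArrayList (an assumed toy; the real Java classes also handle memory-visibility details that do not arise in this single-interpreter example):

```python
# Copy-on-write list: writers copy-modify-swap; readers see stable snapshots.
import threading

class CowList:
    def __init__(self):
        self._data = []                  # readers only ever see whole lists
        self._lock = threading.Lock()    # serializes writers only

    def append(self, item):
        with self._lock:
            copy = list(self._data)      # 1. copy the current data
            copy.append(item)            # 2. modify the copy
            self._data = copy            # 3. atomically swap the reference

    def snapshot(self):
        return self._data                # never blocked by writers

lst = CowList()
view = lst.snapshot()
lst.append(1)
lst.append(2)
print(view)            # [] -- the old snapshot is unaffected by later writes
print(lst.snapshot())  # [1, 2]
```

The key design choice is that readers hold a reference to an immutable-in-practice list, so a concurrent writer can never expose a half-modified state to them.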

MVCC works in a similar way to CopyOnWrite. Under the MVCC protocol, each read operation sees a consistent snapshot, so reads can be non-blocking. MVCC keeps multiple versions of the data, tagged with timestamps or globally increasing transaction IDs, so that different transactions can see different data at the same point in time.

4.1 Implementation principle of MVCC

The purpose of MVCC is multi-version concurrency control; in the database, it exists to resolve read-write conflicts. Its implementation rests mainly on three implicit fields in each record, the undo log, and the Read View.

Database implicit fields:

  • DB_TRX_ID: the ID of the transaction that created or last modified the record.
  • DB_ROLL_PTR: the rollback pointer, which points to the previous version of the record, i.e. to the undo log record written to the rollback segment.

Because the undo log in MySQL preserves historical versions of the data, it is a good habit to commit transactions promptly; otherwise the rollback segment keeps growing and can even fill up the tablespace.

  • DB_ROW_ID: a hidden row ID. If the table has no primary key, InnoDB automatically generates a clustered index on DB_ROW_ID.

A Read View is produced at the moment a transaction performs a snapshot read: a snapshot of the current state of the database system that records and maintains the IDs of the transactions currently active in the system. (Each transaction is assigned an incrementing ID when it starts, so the more recent the transaction, the larger its ID.)

So the Read View is mainly used for visibility judgment: when a transaction performs a snapshot read, it creates a Read View for the record and uses it as the condition for deciding which version of the data the current transaction may see. That version may be the latest record, or one of the versions stored in the row's undo log chain.

The Read View follows a visibility algorithm: take the DB_TRX_ID from the latest version of the record to be read and compare it with the IDs of the other currently active transactions in the system (which the Read View maintains). If that DB_TRX_ID is not visible according to the Read View, follow the DB_ROLL_PTR pointer into the undo log and take the DB_TRX_ID of the previous version, repeating until a DB_TRX_ID satisfies the visibility conditions; the version carrying that DB_TRX_ID is the newest old version visible to the current transaction.
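The visibility check just described can be sketched as follows (a hedged simplification: field names like min_active and next_id are my assumptions, and the undo chain is modeled as a plain list rather than real DB_ROLL_PTR pointers):

```python
# Sketch of Read View visibility plus walking the undo chain.
def visible(trx_id, read_view):
    """Can the transaction that owns `read_view` see a version by trx_id?"""
    if trx_id == read_view["creator_id"]:
        return True                     # own changes are always visible
    if trx_id < read_view["min_active"]:
        return True                     # committed before any active trx began
    if trx_id >= read_view["next_id"]:
        return False                    # started after this Read View was made
    return trx_id not in read_view["active_ids"]  # visible iff committed

def latest_visible(version_chain, read_view):
    """Follow the rollback pointers (here: a newest-first list) until a
    version passes the visibility check."""
    for trx_id, value in version_chain:
        if visible(trx_id, read_view):
            return value
    return None

# Versions of one row, newest first: (DB_TRX_ID, value).
chain = [(90, "v3"), (60, "v2"), (30, "v1")]
view = {"creator_id": 70, "min_active": 50,
        "next_id": 80, "active_ids": {50, 60}}
# v3: trx 90 started after the Read View -> invisible.
# v2: trx 60 is still active -> invisible.
# v1: trx 30 committed before all active transactions -> visible.
print(latest_visible(chain, view))  # v1
```

This is exactly the "compare, then follow DB_ROLL_PTR into the undo log, repeat" loop from the paragraph above.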

So how does MVCC actually query?

Summarizing what is commonly described online:

  • 1. Return rows whose DB_TRX_ID is less than or equal to the current transaction's ID. (If the transaction itself inserts a row, that row carries the current transaction's ID, so the transaction's own inserts are necessarily included.)
  • 2. Return rows whose rollback pointer is null or whose rollback pointer's transaction ID is greater than the current transaction's ID. (It cannot be equal, because if the transaction deletes a row, that row's rollback record carries the current transaction's ID, so rows the transaction itself deleted must be excluded.)

Hello everyone, I am Nanju, and I have been practicing Java for two and a half years. Here is my WeChat; if you need the earlier mind maps or want to exchange experience, feel free to get in touch.

My blog is also synced to the Tencent Cloud+ Community; you are welcome to join me there: cloud.tencent.com/developer/s…