
Concurrency control

Transactions are the basic unit of concurrency control; concurrency control coordinates the interleaved execution of multiple transactions.

In the following operations, R indicates read and W indicates write.

Multi-transaction execution

Let’s start with database concurrency. A database is a shared resource: a data management system that allows simultaneous access by multiple users. Sharing can be achieved through time-sharing, simultaneous access, and other techniques.

Multi-user database system: a database system that allows multiple users to use the same database simultaneously.

Most common scenarios, and most database systems, are multi-user: for example, the DBMSs used by banks, e-commerce sites, and airline or train ticket booking systems.

Such systems therefore involve the execution of multiple transactions, in one of the following ways:

Serial execution

  1. Serial execution: only one transaction runs at a time; all other transactions must wait until the current one finishes.

This is wasteful: it cannot make full use of system resources, execution is slow, and the database is not really shared.

Interleaved concurrency

  1. Interleaved Concurrency

Mainly used on a single-core CPU (it can also occur on multi-core machines): transactions take turns, and their operations are interleaved.

In this mode, transactions do not truly run in parallel, but interleaving reduces CPU idle time and improves system utilization and throughput.

Simultaneous concurrency

  1. Simultaneous Concurrency

In a multi-core CPU environment, each core can run one transaction, so multiple cores can run multiple transactions at the same time: true parallel execution. This is the ideal form of concurrency, but it also requires a more sophisticated concurrency control mechanism.

Except under pure serial execution, running multiple transactions requires concurrency control.

Non-serial execution of concurrent transactions causes problems:

  • Multiple transactions access the same data at the same time.
  • Incorrect data may be read or stored, breaking transaction isolation and database consistency.

Here is a typical example of concurrent transactions breaking data integrity (consistency): two booking transactions both read the number of remaining tickets as 16, each sells one ticket, and each writes back 15. The final value is 15 instead of the correct 14, so one sale is silently lost.

Therefore, the DBMS must provide the corresponding concurrency control mechanism.

The concurrency control mechanism is one of the important indicators of the performance of a database management system.

Concurrent operations create potential data inconsistency problems

Data inconsistencies occur because concurrent operations break the isolation of transactions, resulting in inter-transaction interference.

These data inconsistency problems are also known as dependency problems or consistency problems.

The classification of data inconsistencies in concurrent transactions

The detailed descriptions here reflect my personal understanding; the aim is to explain the situation behind each problem.

  1. Lost Update: a modify-update conflict. When a single record is modified, a delete or update from the concurrent transaction can cause the change to be lost. When multiple records are modified, a concurrent insert can leave the modification incomplete, and a concurrent delete or update can lose part of it.

  2. Non-repeatable Read: a read-update conflict, divided into three cases: read-modify-read, read-insert-read, and read-delete-read. The latter two are classified as Phantom Reads, and the individual affected rows are called phantom rows. In a loose sense even read-modify-read is phantom-like, since the two reads differ, like an illusion.

The "read" in these patterns need not be a pure read; it can also be an update based on what was read.

  3. Dirty Read: a modify-read-rollback conflict; invalid ("fake") data is read in the middle.

  4. Phantom Read: read-insert-read, read-delete-read, read-modify-read, insert/delete-read-rollback, and so on. The first three overlap with non-repeatable reads. The fourth case, insert/delete-read-rollback, corresponds to a dirty read: when an inserted row is rolled back, the final data is unaffected, but the intermediate read saw an extra row; since that row was invalid data, it is strictly speaking also "dirty". Likewise, when a deleted row is rolled back, the final data is unaffected, but the intermediate read saw one row missing. Both are also classified as phantom reads.

As the classification above shows, the logical distinctions between lost updates, non-repeatable reads, phantom reads, and dirty reads are sound, but how a given phenomenon is classified is not absolute. Focus on understanding the causes of the inconsistencies.

Now let's look at each of the common problems in concurrent transactions: lost updates, non-repeatable reads, phantom reads, and dirty reads.

The following draws mainly on the Wikipedia article Isolation (database systems).

1. Lost update

Changes made by one transaction are overwritten by changes made by another, so the first transaction's update is lost. A lost update occurs when two or more transactions select the same row (or data item) and update it based on the originally selected value; the last update overwrites the updates made by the other transactions, and their data is lost. Each transaction is unaware of the others.

If two transactions update the same row strictly one after the other, each seeing the previous result, there is no lost update; that is normal logic and normal transaction execution. Only when the transactions select the same data and each computes its change from the original value does one update overwrite, that is, lose, another.

Lost updates are also called buried updates, hidden updates, or annihilated updates.

2. Non-repeatable read

When the same transaction reads the same record twice, the data read differs, usually because another transaction modified the record between the two reads.

A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.

Non-repeatable reads are also called Inconsistent Analysis: the data read within the transaction is inconsistent.

Non-repeatable reads resemble dirty reads, except that the data read is committed; it simply differs from one read to the next.

3. Phantom read

When the same transaction executes the same query twice, it reads an inconsistent set of rows (more rows or fewer rows), like an illusion.

A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read.

Definitions of phantom reads differ considerably between sources; a brief explanation follows.

In one view, any case where the same transaction runs the same query twice and reads inconsistent data covers both phantom reads and non-repeatable reads: a phantom read is when the second query returns more rows than the first, while a non-repeatable read concerns changes to the data itself, such as inconsistencies caused by DELETE and UPDATE.

In that view, phantom reads are about inserts, and in some cases also deletes (fewer rows retrieved, so different results are seen). Note, however, that an UPDATE in another transaction can also cause the second query to return more or fewer rows, by moving rows into or out of the query's predicate.

Therefore, a phantom read is sometimes defined as follows:

A phantom read occurs when a transaction runs the same query twice, and the second query sees rows that the first query did not, or returns more rows than the first.

In many places, especially in discussions of MySQL, phantom reads are defined to refer exclusively to rows newly inserted by other transactions.

The queries in this definition of phantom reads are current reads; with snapshot reads, none of this occurs.

4. Dirty read

A dirty read, also known as an uncommitted dependency, occurs when a transaction reads data that has been modified by another transaction that has not yet committed. Often that other transaction later rolls back rather than committing, or modifies the data again.

A dirty read (aka uncommitted dependency) occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed.

Lost Update

Two transactions, T1 and T2, read the same data and modify it; T2's later write overwrites T1's, so T1's change is lost.

When T2's write overwrites T1's, that is the lost update problem: a modify-update conflict.
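
A minimal sketch of this schedule in Python with sqlite3 (the database file, table, and values are invented for the example). Two autocommit connections stand in for T1 and T2; both read the same original balance and write back a value computed from that stale read:

```python
import sqlite3

# Set up a shared database: one account holding 16 (e.g. remaining tickets).
setup = sqlite3.connect("demo.db", isolation_level=None)  # None = autocommit
setup.execute("DROP TABLE IF EXISTS account")
setup.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
setup.execute("INSERT INTO account VALUES (1, 16)")
setup.close()

t1 = sqlite3.connect("demo.db", isolation_level=None)
t2 = sqlite3.connect("demo.db", isolation_level=None)

# T1 and T2 both read the same original value.
a1 = t1.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]  # 16
a2 = t2.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]  # 16

# Each writes back a value computed from its own (now stale) read.
t1.execute("UPDATE account SET balance = ? WHERE id = 1", (a1 - 1,))  # balance = 15
t2.execute("UPDATE account SET balance = ? WHERE id = 1", (a2 - 1,))  # overwrites T1's write

print(t1.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0])  # 15, not 14
```

The final balance is 15 instead of the correct 14: T2's write silently overwrote T1's, which is exactly the modify-update conflict described above.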

Non-repeatable Read and Phantom Read

A non-repeatable read means that after transaction T1 reads data, transaction T2 performs an update so that T1 cannot reproduce its previous read result.

It is a read-update conflict.

There are three cases of non-repeatable reads. The last two cases are phantom reads:

  1. Read-modify conflict: transaction T1 reads a data item, transaction T2 modifies it, and when T1 reads it again, it gets a different value than before.

(For example: T1 reads B=100; T2 reads the same B, changes it, and writes B=200 back to the database; when T1 rereads B to verify its value, B is now 200, inconsistent with the first read.)

  2. Read-delete conflict: transaction T1 reads certain records from the database under some condition, transaction T2 deletes some of them, and when T1 reads again under the same condition, some records have mysteriously disappeared.

  3. Read-insert conflict: transaction T1 reads some records from the database under some condition, transaction T2 inserts new records, and when T1 reads again under the same condition, it finds extra records.

The latter two cases are sometimes called Phantom Rows: the extra or missing rows relative to the previous read seem like phantoms. They are a kind of phantom read.
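
A minimal sketch of the read-modify-read case, continuing with sqlite3 (names invented). Because SQLite connections are serializable with respect to one another, T1's two reads run here as separate autocommit statements to emulate the schedule; in a server DBMS at READ COMMITTED (MySQL, PostgreSQL, etc.) the same effect appears inside a single open transaction:

```python
import sqlite3

setup = sqlite3.connect("demo.db", isolation_level=None)  # autocommit
setup.execute("DROP TABLE IF EXISTS t")
setup.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, b INTEGER)")
setup.execute("INSERT INTO t VALUES (1, 100)")
setup.close()

t1 = sqlite3.connect("demo.db", isolation_level=None)
t2 = sqlite3.connect("demo.db", isolation_level=None)

first = t1.execute("SELECT b FROM t WHERE id = 1").fetchone()[0]   # T1 reads B = 100

t2.execute("UPDATE t SET b = 200 WHERE id = 1")                    # T2 modifies B and commits

second = t1.execute("SELECT b FROM t WHERE id = 1").fetchone()[0]  # T1 rereads B: now 200

print(first, second)  # 100 200 -> T1 cannot reproduce its first read
```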

‘Dirty’ Read

Reading “dirty” data refers to:

Transaction T1 modifies some data and writes it back (in practice, usually only to memory buffers at first); transaction T2 then reads the same data; T1 is cancelled for some reason, and the data it modified reverts to the original value. The data T2 read is now inconsistent with the database: T2 has read "dirty", that is, incorrect, data.

T2 has read data that never officially existed: an intermediate state of the data.

(For example: T1 changes C to 200; T2 reads C and gets 200; T1 is cancelled for some reason, its modification is undone, and C reverts to its original value of 100. The C=200 that T2 read is inconsistent with the database content: it is "dirty" data.)
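
A sketch of this schedule using SQLite's shared-cache mode, whose PRAGMA read_uncommitted gives READ UNCOMMITTED isolation between connections sharing a cache (the table and values are invented; on a server DBMS you would instead run SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED on T2's session):

```python
import sqlite3

uri = "file::memory:?cache=shared"  # shared-cache in-memory database
t1 = sqlite3.connect(uri, uri=True, isolation_level=None)
t2 = sqlite3.connect(uri, uri=True, isolation_level=None)
t2.execute("PRAGMA read_uncommitted = 1")  # let T2 read uncommitted data

t1.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c INTEGER)")
t1.execute("INSERT INTO t VALUES (1, 100)")

t1.execute("BEGIN")
t1.execute("UPDATE t SET c = 200 WHERE id = 1")  # T1 changes C to 200, uncommitted

dirty = t2.execute("SELECT c FROM t WHERE id = 1").fetchone()[0]  # T2 reads 200

t1.execute("ROLLBACK")  # T1 is cancelled; C reverts to 100

final = t2.execute("SELECT c FROM t WHERE id = 1").fetchone()[0]
print(dirty, final)  # 200 100 -> the 200 that T2 read was "dirty" data
```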

Summary of inconsistent data caused by concurrent transactions

As you can see, the concurrency problems above all arise from two transactions executing in an interleaved fashion, with different possible interleavings (different execution points) and different outcomes (commit or rollback).

Concurrency control is about scheduling concurrent operations in the right way so that the execution of one user transaction is not disturbed by other transactions, thus avoiding data inconsistencies.

Concurrency control ensures data consistency but costs efficiency. Depending on actual needs, if some inconsistency is tolerable (for example, in statistics over large data sets, a little dirty data may not affect the accuracy of the final result), the consistency requirement can be relaxed to improve processing efficiency and reduce system overhead.

Relationship between concurrency problems and isolation levels (which inconsistencies each level allows: lost updates, dirty reads, phantom reads, non-repeatable reads)

Different isolation levels may cause data inconsistency as follows:

  • Read Uncommitted: the lowest level. Dirty reads, phantom reads, non-repeatable reads, and lost updates may all occur.
  • Read Committed: avoids dirty reads.
  • Repeatable Read: avoids dirty reads, non-repeatable reads, and lost updates.
  • Serializable: avoids dirty reads, non-repeatable reads, phantom reads, and lost updates.

In the table below, '+' and '-' mean:

  • '+': the phenomenon is possible
  • '-': the phenomenon is not possible


| Isolation level | Dirty reads | Lost updates | Non-repeatable reads | Phantoms |
| --- | --- | --- | --- | --- |
| Read Uncommitted | + | + | + | + |
| Read Committed | - | + | + | + |
| Repeatable Read | - | - | - | + |
| Serializable | - | - | - | - |
| Snapshot | - | - | - | - |

To observe the problems each isolation level allows, create a small table with a few rows, open two connections to the RDBMS, set the isolation level, and step the two transactions through queries, inserts, and updates in different orders to watch the corresponding phenomena. I won't walk through every case here, but the sketch below shows the shape of the experiment.
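
This is a hedged sketch only: the connection parameters, the demo table t, and its contents are placeholders, and it assumes a running MySQL server plus the mysql-connector-python package.

```python
import mysql.connector  # assumes a reachable MySQL server and mysql-connector-python

def connect():
    return mysql.connector.connect(host="localhost", user="demo",
                                   password="demo", database="demo")

s1, s2 = connect(), connect()
c1, c2 = s1.cursor(), s2.cursor()

# Session 1: open a REPEATABLE READ transaction and take a first read.
c1.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
c1.execute("START TRANSACTION")
c1.execute("SELECT b FROM t WHERE id = 1")
print("first read:", c1.fetchone())

# Session 2: modify the same row and commit while session 1 is still open.
c2.execute("UPDATE t SET b = b + 100 WHERE id = 1")
s2.commit()

# Session 1: reread. Under REPEATABLE READ the value is unchanged; rerun
# with READ COMMITTED to watch the non-repeatable read appear.
c1.execute("SELECT b FROM t WHERE id = 1")
print("second read:", c1.fetchone())
s1.commit()
```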

The main techniques of concurrency control

  • Locking
  • Timestamp ordering
  • Optimistic concurrency control (OCC)
  • Multi-version concurrency control (MVCC)

The most common and widely used of these are locking (also translated as "blocking") and MVCC, which are the main implementations of pessimistic and optimistic concurrency control, respectively.
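
To make the optimistic style concrete, here is a minimal sketch (schema and names invented) of optimistic concurrency control using a version column: an update succeeds only if the row still carries the version the transaction originally read, so a would-be lost update becomes a detectable conflict the caller can retry.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 16, 0)")

def read(c):
    return c.execute("SELECT balance, version FROM account WHERE id = 1").fetchone()

def optimistic_update(c, new_balance, expected_version):
    # The WHERE clause checks the version we read; rowcount tells us
    # whether our write won or someone else updated the row first.
    cur = c.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = 1 AND version = ?",
        (new_balance, expected_version))
    return cur.rowcount == 1

# Two logical transactions read the same balance and version...
b1, v1 = read(conn)
b2, v2 = read(conn)

print(optimistic_update(conn, b1 - 1, v1))  # True: first writer wins
print(optimistic_update(conn, b2 - 1, v2))  # False: stale version, conflict detected
```

By contrast, the pessimistic (locking) style would lock the row up front, for example with SELECT ... FOR UPDATE, so the second transaction waits instead of failing.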

References

  • Schedule (computer science)
  • Introduction to Database Systems (Advanced)
  • And other resources.