A friend of mine asked me, “How do databases guarantee consistency?”

I: so small white problem all don’t know??

So we broke up.


It is more complicated to discuss transactions in a distributed environment.

So, let’s start with local issues. The so-called local transaction is a single data source transaction scenario.

The ACID of the transaction is known, respectively:

features instructions
Atomicity All operations contained in a transaction will either all succeed or all fail and roll back to the state before the transaction.
Consistency The status before and after the transaction is the same, for example, A transfers money to B, the total account of the two people before the transfer is 1,000 yuan, so it must be 1,000 yuan after the transfer.
Isolation Transactions are isolated from each other and data reads and writes do not interfere with each other.
“Durability” Once the transaction is successfully committed, the data will be modified and not lost.

C is the end, AID is the means!!

Therefore, the focus needs to be analyzed is AID, where AD is closely linked and there is no atomicity without persistence.

Persistence is primarily disk-dependent, while writing to disk is a non-atomic operation. For example, a machine power failure may occur in the middle of a write, which is an intermediate state for a transaction.

There are two general solutions:

Commit Logging

First of all, when a transaction commits, you need to record more information, like what was the data before it was modified, what was the data after it was modified, where is the data on the disk and so on and second, you need to mark the state, what state is the transaction currently in, “Committed” or “Committed incomplete?

Finally, the transaction can be recovered during failover based on the previously recorded information.

Shadow Paging

When modifying the data, Copy a Copy of the shadow data to modify, and then modify the location of the data pointing to after the modification. This step is atomic itself. However, this scheme has limited concurrency, so it is generally not used.

Commit Logging is also problematic in that it does not allow actual changes to the data until the transaction has committed and marked its “committed” transaction state.

This is a big problem. This is a blocking operation that can have a significant impact on database performance.

You need to work around this problem. If you think about it, it is understandable that Commit Logging has this limitation. If you allow early modification of data, the modified data will become the wrong data if the database crashes.

If there was a way to roll back the modified data to its original state, the problem would be solved.

That’s right, write-Ahead Logging.

In write-ahead Logging, Commit Logging logs are called Redo logs, which I prefer to translate as continued logs.

An Undo Log has been added to solve the problem of the Commit Logging.

With Undo Log, you can change your data in advance without having to worry about finding your way home.

Write-ahead Logging goes through three phases during crash recovery,

First, see which transactions did not complete properly. Then, perform a sequel based on the Redo Log for those committed transactions that did not complete properly. Finally, the remaining part of the transaction is the one that needs to be rolled back, according to Undo Log.

How does the database implement atomicity and persistence? How does the database implement isolation

This requires locks to ensure that the general database provides three types of locks:

The lock type instructions
Write lock Also known as exclusive locking, if a transaction has a write lock, other transactions cannot have either a write lock or a read lock. Is to tell the other business, pit I have taken, you wait.
Read lock Also called shared lock, if a transaction has a read lock, other transactions can also have a read lock, but not a write lock. Is to tell the other transaction, if the brother (read lock) to come, or (write lock) to wait.
The lock range The difference between a range lock and other locks is that a range lock locks a specific number of rows. For example, if the range (amount < 10) is locked, the range cannot be modified, added or deleted.

Isolation levels of databases are generally provided in a variety of ways for users to adjust according to actual conditions.

Isolation level The problem
serializable

Avoid phantoms by adding scope locks
Repeatable read

In addition, the read lock is not released (throughout the transaction life cycle), blocking the write lock of other transactions, thus avoiding the non-repeatability problem phantom read
In the current transaction, it may be affected by the addition or deletion of other transactions, and the number of records read between the two transactions is inconsistent
Reading has been submitted

Add read lock, read release, so can not solve the repeatable read problem
Unrepeatable read

In the current transaction, it may be affected by the modification of another transaction, and the data states are inconsistent between the two reads
Read uncommitted

Because there is no read lock, there is no constraint, another transaction with a write lock can also read the data that has not been committed
Dirty read

In the current transaction, it is possible to read uncommitted data from another transaction

These are all cases where one transaction read is affected by another transaction write, for which there is an optimization of “multi-version concurrency control”.

MVCC means that there are now multiple versions of a record using the transaction ID, which is globally ordered incremented.

If the isolation level is repeatable read, the read version number is less than or equal to the largest data of the current transaction ID.

If you are reading the committed isolation level, read the record with the largest version number.

Note that MVCC is read + write optimization and does not apply to other scenarios.

With AID’s guarantee, C is not really a problem.

From then on, the boat of friendship can set sail again.


Welcome to follow my wechat official account “Stubborn Wenge” (an Internet code writer with a cold surface but warm heart), and share various Java technology experience, hot interview questions and practical Python tips from time to time.