Transaction

Transaction is the basic concept of a database system. It refers to the execution of several steps of an atom, that is, all of them succeed or all of them fail. The status of the execution is not visible to other transactions.

A simple example is that Alice transfers $100 to Bob.

UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice';
UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
Copy the code

We need to ensure that:

  1. Either all or none of these updates have occurred.
  2. If something goes wrong, the previous update will not take effect.

Making these updates into a single Transaction accomplishes both.

BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice';
UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
COMMIT;
Copy the code

In addition:

  1. Once theTransactionDone, even if the database crashes later, the updated data will not be lost.
  2. When more than oneTransactionExecute concurrently when eachTransactionI’m not supposed to see anything elseTransactionIncomplete changes to.

Concurrency Control

PostgreSQL uses MVCC for concurrency control. Concurrency control isolates each Session to prevent inconsistent data from being observed in the current transaction due to updates from other concurrent transactions.

The common methods of concurrency control are PCC, OCC and MVCC, and each of them has a variety of implementation methods.

PCC and OCC can only read data without blocking, while MVCC can read and write data without blocking. Therefore, MVCC performs better. In addition to PostgreSQL, MySQL also uses MVCC for concurrency control, but the two implement MVCC in different ways.

PostgreSQL introduces the SSI isolation level to provide the strictest transaction isolation while ensuring that mVCC-based reads and writes do not block.

You can also use PostgreSQL table and row locks to control concurrent transaction conflicts without MVCC, but the performance is not as good as MVCC. PostgreSQL also provides application-defined advisory locks, which are locks that are not bound to transactions.

Transaction Isolation

The following phenomena may occur when multiple transactions are executed concurrently.

The phenomenon of explain
Dirty Read The current transaction reads uncommitted data from the concurrent transaction
Nonrepeatable Read For a row of data, the two reads are inconsistent
Phantom Read For several rows of data that meet a certain condition, the results of two reads are inconsistent
Serialization Anomaly The result of executing a transaction concurrently is inconsistent with the result of executing the set of transactions sequentially in some order

The SQL standard defines four isolation levels, and for each isolation level, some phenomena are prohibited.

Isolation Level Dirty Read Nonrepeatable Read Phantom Read Serialization Anomaly
Read Uncommitted Square root Square root Square root Square root
Read Committed x Square root Square root Square root
Repeatabled Read x x Square root Square root
Serializable x x x x

Of course, the above is specified by the SQL standard, but PostgreSQL has its own ideas.

Isolation Level Dirty Read Nonrepeatable Read Phantom Read Serialization Anomaly
Read Uncommitted x Square root Square root Square root
Read Committed x Square root Square root Square root
Repeatabled Read x x x Square root
Serializable x x x x

You should note that RU and RC behave the same, PostgreSQL actually has only 3 isolation levels.

You should also note that Phantom Read is not allowed in RR, but this is in accordance with the SQL standard, which only states that Dirty reads and Nonrepeatable reads are not allowed in RR.

The isolation level can be SET using the SET TRANSACTION command.

Note: JSON Types and Sequence Manipulation Functions do not follow standard transaction behavior in PostgreSQL.

Read Committed

In PostgreSQL, the default isolation level is RC. In RC Transaction, each command starts with a snapshot and acts based on that snapshot.

Snapshots include the results of committed transactions and the uncommitted results of the current transaction. A snapshot is a logical concept that represents a visible range and consists of some version of each row of data.

SELECT

A snapshot is taken, followed by a query and a result set is returned.

UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR SHARE

Get the snapshot first and calculate WHERE to find the corresponding row. The specified row may have been updated/deleted/locked by another concurrent transaction (TA), waiting for the TA to commit or roll back.

If the TA is rolled back, the specified row is updated/deleted/locked.

If the TA commits, if it is DELETE, nothing is done, and if it is not DELETE, the current command is executed again.

INSERT

The snapshot is taken and the INSERT is calculated, but the INSERT may fail because the results of concurrent transactions are not visible to the INSERT.

INSERT CONFLICT DO UPDATE

Obtain the snapshot and check whether the snapshot can be inserted. If the snapshot cannot be inserted or the snapshot can be inserted but fails to be inserted, run the UPDATE command.

Repeatable Read

In RR Transaction, only one snapshot is taken at the beginning of a Transaction, and each command is based on this snapshot, so consecutive Select results are the same.

UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR SHARE

The target row may have been updated/deleted/locked by another concurrent transaction (TA), waiting for the TA to commit or roll back.

If the TA is rolled back, the specified row is updated/deleted/locked.

If the TA commits and updates/deletes the specified row, the transaction is rolled back and a serialization conflict is reported.

ERROR: could not serialize access due to concurrent update
Copy the code

This is because changes to other concurrent transactions cannot be updated/deleted/locked in RR Transactions. After receiving this error, the application should retry the transaction.

Note that the RR does not avoid Serialization Anomaly. For example, a sequential Select will see the update, but a concurrent Select may not see the update.

Serializable

Now you have this table, and two concurrent transactions, assuming RR isolation level.

class value
1 10
1 20
2 100
2 200
SELECT SUM(value) FROM mytab WHERE class = 1;
INSERT INTO mytab VALUES (2.30);
Copy the code

The result of the query is 30.

SELECT SUM(value) FROM mytab WHERE class = 2;
INSERT INTO mytab VALUES (1.300);
Copy the code

The query results in 300.

This result is reasonable from the point of view of concurrent execution, but can it be executed sequentially in some order and get the same result?

The answer is no. This also confirms that RR cannot avoid Serialization Anomaly. What will happen if RR is Serializable?

The answer is that one of the transactions will commit and the other will roll back and report a serialization collision error.

ERROR: could not serialize access due to read/write dependencies among transactions
Copy the code

Serializable works the same as RR, but it monitors whether concurrent transactions can be serialized, as in the example above. When using Serializable Transaction, you should be aware that the results of a Select in a Transaction should not be considered valid data until the Transaction commits.

PostgreSQL uses predicate locks to determine serialization conflicts.

What is a predicate? That’s a really bad name, because in this case WHERE class = 1 is a predicate. Select class = 1, the lock that holds this predicate, triggers serialization collisions when other transactions INSERT class = 1, so the lock is used but not blocked.

Also note that even serializable concurrent transactions can cause conflicts due to unique constraints.

Ending

= = = = = = = = = = = = = = = = = = = = In fact, C is the result of AID. If AID is achieved, then C is achieved. Of course, the application layer should also be correct. In this article, MVCC guarantees AI, and in D, WAL guarantees AI.

Most of this article is based on the official PostgreSQL documentation. Please point out any errors.