Brief introduction:The query optimizer and the transaction architecture are two important load-bearing walls in the towering database architecture, so important that a large number of data structures, mechanisms, and features are built around them in the entire database architecture design. They are responsible for how to query the data more quickly and organize the underlying data system more effectively. A secure, stable and persistent storage of data for users to read, write and concurrency to provide logical implementation. The topic we’re going to explore today is the system of transactions, but the system of transactions is so large that we need to break it up into several pieces. In this paper, the atomicity of POLARDB transaction system is analyzed.

The author | | on city source ali technology to the public

A preface

The query optimizer and the transaction architecture are two important load-bearing walls in the towering database architecture, so important that a large number of data structures, mechanisms, and features are built around them in the entire database architecture design. They are responsible for how to query the data more quickly and organize the underlying data system more effectively. A secure, stable and persistent storage of data for users to read, write and concurrency to provide logical implementation. The topic we’re going to explore today is the system of transactions, but the system of transactions is so large that we need to break it up into several pieces. In this paper, the atomicity of POLARDB transaction system is analyzed.

Two problems

Before you read this article, ask a few important questions that you might have wondered about before you even got to the database. While the answers to these questions may have been simply answered by simple answers such as “pre-write logging” and “crash recovery mechanism”, this article hopes to discuss the implementation and underlying principles of these mechanisms in a more in-depth way.

  • How is database atomicity guaranteed? What special data structures are used? Why?
  • Why can I write data successfully without losing it?
  • Why can the database crash be completely recovered from the data that I have logically committed?
  • Further, what is logically committed data? Which step is a real commit?

Three background

1 The position of atomicity in ACID

The concept has been referenced repeatedly since the famous ACID feature was proposed (originally written into the SQL92 standard), and these four properties roughly encapsulate the core of what people want from databases. Atomicity is the first property in this article, and we’ll focus on its place in the transaction ACID.

This is my personal understanding of the database ACID property relationship. I think the database ACID property can be defined from two perspectives, the AID (atomic, persistent, isolated) property is defined from the perspective of the transaction itself, and the C (consistent) property is defined from the perspective of the user. Now I will talk about my own understanding.

  • Atomicity: Again, we start with the concept of these properties. Atomicity is the notion that a transaction either succeeds or fails, i.e. All or Nothing. This trait we can use to define a minimum transaction model, we assume that there is a transaction, we through a mechanism can realize its true committed or rolled back, the goal is reached, the user only through our system made a submission, and the atomic center of gravity is not in a transaction success or failure itself; Instead, it ensures that the transaction system accepts only success or failure states, and that there are policies in place to ensure that the physical and logical consequences of success or failure are consistent. Atomicity can be defined by the properties of the smallest transaction unit and is the cornerstone of the entire transaction system.
  • Persistence: Persistence means that a transaction can be stored permanently in the database once committed. The scope and perspective of persistence is almost identical to atomicity, which results in the fact that the two are conceptually and implementatively closely related. Both ensure data consistency and recoverability in a sense, and the boundary is the moment the transaction commits. For example, if A transaction A attempts to update its state to T+1, then the database state returns to T, which is atomicity guaranteed. If transaction A commits successfully, then the moment the transaction state becomes T+1, this is atomicity guaranteed; Once the transaction state becomes T+1 and the transaction commits successfully, the transaction has ended and there is no atomicity, the T+1 state is guaranteed by persistence. From this perspective, it can be inferred that atomicity guarantees crash recovery of data before a transaction commits, while persistence guarantees crash recovery after a transaction commits.
  • Isolation: Isolation is also a mechanism defined at the transaction level that provides some degree of isolation for transaction concurrency. The essence of isolation is to prevent transaction concurrency from leading to inconsistent states. Since it is not the focus of this article, I will not go into detail here.
  • Consistency: Among the other features, the concept of consistency is that the database must remain in a consistent state after one or more transactions. When viewed from the perspective of a transaction, guaranteeing AID guarantees that the transaction is serializable, recoverable, and atomic, but is the consistency of the transactional state really consistency? If the AID is destroyed, C must be destroyed, but if the AID is otherwise guaranteed, C must be guaranteed? If the answer is yes then the concept loses its meaning. We can guarantee AID to ensure that transactions are consistent, but can we prove that transaction consistency guarantees data consistency? In addition, the concept of data consistency is difficult to define precisely through transactions, whereas it is easy to define at the user level. Data consistency means that the user believes that the state of the data in the database at any time satisfies his business logic. For example, the bank deposit cannot be negative, so the user defines a non-negative constraint. I think this is a blank space for concept designers, who tend to see consistency as a high order goal.

This article will focus on atomicity, and the topic of crash recovery in the middle will probably involve persistence. Isolation and consistency are not discussed in this article, but in the visibility section we default the database to a complete isolation, serializable isolation level.

The intrinsic requirement of atomicity

So much for understanding the transactional nature of databases, let’s move on to the topic of atomicity. We need to continue with atomicity with the example we just had. The current state of the database is T, and now you want to upgrade the data state to T+1 through A transaction A. Let’s talk about the atomicity of this process.

If we want to ensure that the transaction is atomic, then we can define three requirements to say that the transaction is atomic only if:

  • There is a point in time in the database where the transaction actually commits successfully.
  • Transactions opened before this point in time (or snapshots taken) should only see T state, and transactions opened after this point in time (or snapshots taken) should only see T+1 state.
  • Any time a crash occurs before this point in time, the database should be able to return to the T state; Any time you crash after this point in time, the database should be able to return to T+1 state.

Notice that we haven’t defined this point in time, and we’re not even sure that this point in time in 2/3 is the same time. All we can say for sure is that this point in time must exist, otherwise we cannot say that the transaction is atomic, and atomicity dictates that there must be a certain point in time for commit/rollback. In addition, according to our description just now, we can infer the time point in 2, which can be defined as atomic site. Since the atomic site is not visible to us before the commit and is visible to us after the commit, this atomic site is the point in time when the transaction was committed to other transactions in the database. The locus in 3 can be positioned as a persistence locus, since this meets the definition of persistence for crash recovery. That is, for persistence, the transaction has already committed after the point 3.

Discussion of the tetraatomic scheme

1 Let’s start with two simple solutions

Let’s start with two simple scenarios for atomicity. The purpose of this step is to try to explain why the data structures we describe in each of the following steps are necessary to achieve atomicity.

Simple Direct IO

Imagine that we have a database where every user action writes data to disk. We call this simple Direct IO, and simple means that we don’t log any data, just the data itself. Assuming that the initial version of the data is T, if a data crash occurs after we insert some data, there will be a T+0.5 page written on the disk, and we have no way to roll back or continue with the subsequent operation. Such a failed CASE would certainly break atomicity, since the current state is neither commit nor rollback but something in between, so it is a failed attempt.

Simple Buffer IO

Next we have a new scheme called Simple Buffer IO. Again we don’t have logs, but we’ve added a new data structure called the “shared cache pool”. Each time we write a page, we don’t write it directly to the database, but to the shared buffer pool. This will have obvious advantages, first of all, the read and write efficiency will be greatly improved, each time we do not have to wait for the data page to actually write to the disk, but can be asynchronous; Second, if the database is rolled back or crashes before the transaction is committed, we only need to discard the data in the shared buffer pool. Only when the database is successfully committed can it actually flush the data to disk, so we have met the requirements in terms of visibility and crash recovery.

But there is a difficult problem with the above scheme, that is, the matter of data fall is not as simple as we imagine. For example, if there are 10 dirty pages in the shared buffer pool, the database may crash at any time in the middle of the 10 dirty pages. We can use storage technology to ensure that a single page’s brush is atomic. Then whenever we decide to unload the data, if the machine crashes during the unload, the data may produce a T+0.5 version on the disk, and there is no way to redo or roll back the data after a restart.

Above two examples this paper seems to be doomed database there is no way not to rely on other conditions to ensure the consistency of the data structure (there is also a popular solution is Shadow of SQLite database Paging technology, don’t discuss here), so if you want to solve these problems, we need to introduce the next important data structure, data logging.

2. Prewrite log + Buffer IO scheme

Program overview

We introduced data structures such as data logs on top of Buffer IO to solve the problem of data inconsistencies.

The data cache part is the same as the previous idea, except that we will record an additional XLog Buffer before writing the data. The Xlog buffer is a log with a sequence. Its sequence number is called LSN, and we will record the corresponding log LSN on the data page. Each data page records its latest log sequence number. The purpose of this feature is to ensure that the logs are consistent with the data.

Imagine if we could introduce logs that were exactly the same version of the data, and make sure that the data log was persisted prior to the log, so that whenever the data crashed we could recover from this consistent log page. This will solve the data crash problem mentioned earlier. Whether a transaction crashes before or after commit, we can play back the correct version of the data through the logging scheme, so that crash recovery atomicity can be achieved. The other part about visibility is that we can take snapshots of multiple versions. Keeping the data log consistent with the data is not easy. Let’s discuss how to do this in detail and how to recover the data in case of a crash.

Transaction commit and control brush dirty

Wal logs are designed to ensure data recoverability. In order to ensure the consistency of Wal logs with data, when the data cache is persisted to disk, the corresponding Wal logs of the persisted data pages must be persisted to disk first. This statement describes the essential meaning of controlling the cleaning.

  1. There is a process in the background of the database called checkpoint process, it periodically checkpoint operations. When a checkpoint occurs, it writes a checkpoint log containing the current REDO locus to the XLog log. Checkpoint ensures that all dirty data has been washed to disk.
  2. The first insert, when the shared memory cannot find the page, loads the page from disk into the shared memory, writes the input to the insert, and inserts a write Xlog into the Xlog buffer, upgrading the table’s log flag from LSN0 to LSN1.
  3. At the time of a transaction commit, a transaction commit log is written to the transaction, and then all WAL logs from the WAL buffer pool for the transaction commit are flushed to disk.
  4. Then insert the second data B, which inserts a write Xlog into the Xlog buffer, upgrading the log flag of the table from LSN1 to LSN2.
  5. Same thing as 3.

Then, if the database is running correctly, the next bgwriter/checkpoint process will asynchronously brush the pages to disk. If the database crashes, the data log and transaction commit log corresponding to log A and log B have been flushed to disk, so the data can be replayed in the shared buffer pool through log playback, and then written to disk asynchronously.

The fullpage mechanism guarantees recoverability

Wal log recovery seems to be perfect, but unfortunately there are some flaws in the previous scheme. Imagine a database CRASH that occurs when a BGWriter process writes asynchronously. Some dirty pages are written to disk, and there may be bad pages on the disk. (Polardb data pages are 8K, and in extreme cases 4K write to disk can write bad pages.) However, Wal logs are not able to play back data on bad pages. Another mechanism is needed to ensure that in extreme cases the database is able to find the original data, and this involves an important mechanism, the fullpage mechanism.

At the first change after each checkpoin action, POLARDB writes the modified data to the Wal Buffer and then brushes it to disk, along with the entire page. Such a Wal log containing the entire page is called a backup block. The backup block makes it possible for WAL logs to play back the entire data page in any case. Here’s the complete process.

  1. Action:
  2. The first insert, when the shared memory can’t find the page, loads the page from disk into the shared memory, and then writes the input for the insert. Unlike in the previous section, the Wal log with PolarDB number LSN1 writes the entire page read from disk to the Wal Buffer Pool marked as LSN 0.
  3. Transaction commits, at which point the entire WAL log is forced to be washed to a WAL segment on disk.
  4. Same as above section
  5. Same as above section

At this point, if the database crashes, when the database is pulled up to recover, once it encounters the broken page, it can play back the correct data step by step from the original version of the page recorded in the original WAL log.

Crash recovery mechanism based on WAL logs

With the previous two sections in mind, we can continue to demonstrate how data can be played back if the database crashes. We demonstrate a playback in which a data page is written out.

  • When the database plays back to the WAL log that wrote data A, it reads the page Table A from disk. The Wal log here is a backup log because after CHECKPOINT, the first Wal log of each playback page is a backup log.
  • When this log is played back, the backup log has special playback rules: it always overwrites its page on the original page and upgrades the LSN of the original page to the LSN of the same page. To ensure data consistency, the normal playback page will only play back WAL logs larger than their own LSN number. In this example, the badly written page was successfully restored due to the backup block.
  • Polardb will then play back subsequent logs according to the normal playback rules.

When the playback is successful, the data in the shared buffer pool can be asynchronously flushed to disk to replace the data that was previously damaged.

We spend a great deal of time explaining how the database recovers from crashes by prewriting logs, which seems to explain what persistence loci mean; Now we need to revisit the visibility issue.

3 Visibility mechanics

Since our description of atomicity involves the concept of visibility, this concept is implemented in POLARDB by a complex set of MVCC mechanisms, and is mostly in the category of isolation. A brief explanation of visibility will be given here, but more details will be covered in the isolation article.

Transaction tuples

The first thing to talk about is transaction tuples. It’s the smallest unit of data that actually stores the data, but we’ll just focus on a few fields here.

  • T \_xmin: The transaction ID from which this data was generated
  • T \_xmax: The transaction ID to modify the transaction data (the transaction ID to delete or lock the data)
  • T \ _CID: An ordinal number for the tuple operation in the same transaction
  • T \_ctid: A pointer consisting of a segment number/offset to the latest version of the data

The snapshot

The second thing to talk about is snapshots. A snapshot records the state of a transaction in a database at a point in time.

We are still not going to expand on snapshots, but we know that snapshots can capture the status of all possible transactions in the database at a given point in time from Procarray.

Current transaction state

The third point is the current transaction state. The transaction state is the mechanism in the database that determines the transaction state. In a concurrent environment, it is very important to determine the state of the transactions you see.

When viewing the state of a transaction in a tuple, three data structures may be involved: t\_infomask, procArray, clog:

  • Infomask: A cache flag bit located in the header of a tuple, which marks the running state of the tuple Xmin/Xmax transactions. This state can be regarded as a layer of asynchronous cache of CLOG, which is used to speed up the acquisition of transaction state. The status setting is asynchronous and does not immediately upgrade all transaction-related tuples when the transaction commits, but rather waits until the first snapshot setting that is new enough to see the update.
  • Procarray snapshot: the transaction state in the snapshot. The snapshot is actually the state of all the transactions in the database at the moment the snapshot is taken in Procarray. Once the snapshot is taken, the state is constant unless it is taken again (whether the fetched content in the same transaction changes depends on the transaction isolation level).
  • CLOG: The actual state of the transaction, divided into a CLOG buffer and a CLOG file. All transaction states are recorded in the CLOG buffer in real time.

In a visibility judgment, the order of access is [infomask-> snapshot, clog], and the decisive order is [snapshot-> clog-> infomask].

InfoMask is the most easily accessible information, recorded in the header of the tuple. In some cases, the visibility of the current transaction can be determined by using the InfoMask without involving the following data structures. The snapshot has the highest discretion, and ultimately determines whether the Xmin/Xmax transaction is running/not running; CLOG is used to assist in visibility determination and to assist in setting the value of the INFOMASK. For example, if the Xmin transaction visibility judgment finds that both are committed in the snapshot /clog, then t\_infomask is set as committed; If Xmin transaction visibility detects a snapshot commit and CLOG is not committed, the system determines that a crash or rollback occurred and sets the INFOMASK to transaction illegality.

Transaction snapshot visibility

Once we’ve covered tuples and snapshots, we can move on to the topic of snapshot visibility. Polardb’s visibility has a complex set of definitions that need to be defined through many combinations of information, but the most straightforward are snapshots and tuple headers. The visibility of tuple headers and snapshots is illustrated below with an example of data inserts and updates.

This article does not discuss isolation; we assume that the isolation level is serializable:

  • Snapshot1: transaction 1184/1187 has not started, there is no record in tuple, student is empty; The data available from the Snapshot1 snapshot is empty, and we’ll call this version T.
  • Snapshot1 – Snapshot2, we’re taking a snapshot at this point so we’re still getting Snapshot1, so the data he sees is still T.
  • SNAPSHOT 2: Transaction 1184 has ended and transaction 1187 has not started. So the changes to 1184 are visible to the user, and 1187 remains invisible. When we look at the tuple, we see the tuple header (1184/0), so we see the data version Tom, which we’ll call T+1.
  • Snapshot2 – Snapshot3, we’re taking a snapshot at this point so we’re still getting Snapshot2, so he’s still going to see T+1.
  • Snapshot3: At this point transaction 1184/1187 has ended and both are visible, so we can see that neither (1184,1187) nor (1187,1187) is visible in the tuple, and (1187,0) Susan is visible. Let’s call this version T plus 2.

A simple conclusion from the above analysis is that the visibility of the database depends on the timing of the snapshot. What we mean by different versions of visibility in atomicity is that we take different snapshots that determine whether a transaction in execution has committed or not. This commit has nothing to do with the transaction marking the commit status or even recording the CLOG commit, we can use this method to make the snapshot we get consistent with the transaction commit.

Visibility in transaction atomicity

We have already outlined the issue of PolarDB snapshot visibility in the previous article, but here we will add the specific implementation issues when the transaction is committed.

The core idea behind our visibility mechanism is that a transaction should only see the version of the data it is supposed to see. If an Xmin transaction of a tuple is not committed, other transactions are likely not to be seen. If an Xmin transaction of a tuple has been committed, other transactions may see it. How to know if this Xmin is committed or not? As mentioned above, we use snapshots to decide, so the key mechanism for our transaction commit is the update mechanism for the new snapshot.

Visibility involves two important data structures at transaction commit time, the CLOG Buffer and the Procarray. The relationship between the two, as explained above, plays a role in determining transaction visibility, and of course Procarray plays a decisive role. This is because taking a snapshot is essentially a process of traversing Procarray.

In the actual third step, the committed information is written to the CLOG Buffer. At this point, the transaction marks CLOG as committed, but it is still not committed. After that, the transaction marks Procarray committed. This step completes the actual commit, and the snapshot retrieved after this point updates the data version.

The realization of atomicity in PolarDB

After completing the explanation of PolarDB crash recovery and visibility theory, we can know that PolarDB can guarantee the crash recovery and visibility consistency of transactions through such a set of pre-written log +BufferIO scheme, so as to achieve atomicity. Let’s take a look at the most important part of the transaction commit to find out what we originally referred to as the atomic site.

Transaction crash recovery consistency – persistence site

In short, there are four operations in a transaction commit that are central and important to the atomicity of the transaction. In this section, we’ll consider the first two operations.

  • The COMMIT log of the Commit transaction (that is, the WAL log of the Commit).
  • All WAL logs submitted by this firm will be forced to brush disk and persist to storage.

We mark this Xlog (Wal log) drop point, and we assume two scenarios:

  • If the transaction crashes or is rolled back before this point, the COMMIT log must not be flushed, regardless of whether the data log is flushed or not. Due to the sequential nature of WAL logs, the COMMIT log must be the last one persisted to disk. At this point, if we play back the data, we find that the transaction without the COMMIT log cannot be marked as committed, and the data associated with this state must be invisible based on visibility. This data is then cleaned up as dirty data. So we can conclude that if the transaction crashes before this node, it is actually not committed. The database is essentially restored to state T.
  • If there is a crash or rollback after this point, at which point we can be sure that the COMMIT log must be flushed to disk regardless of which step it crashes or rolls back. Once the COMMIT log is flushed to disk, the data written by the transaction can be played back and marked as committed. Then the data is visible. The transaction has actually committed and the database has been restored to T+1.

This phenomenon indicates that site 2 appears to be the critical point for crash recovery, indicating that database crash recovery can return to T or T+1 state. So what do we call this site? Recall the concept of persistence: Once a transaction commits, changes made to the database by that transaction remain permanently in the database. The two actually match. So we call this site 2 the persistence site.

Another thing to note about the Xlog brush is that the Xlog brush and playback are atomistic as a single file; CRC validation in the WAL log header provides validation for individual WAL log files. If the WAL log write disk is damaged, the contents of the WAL log are invalid, ensuring that no partial playback of data occurs.

Visibility of transactions is consistent — the atomic site

Now let’s move on to operations 3 and 4:

  • Writes this transaction commit to the CLOG Buffer.
  • Writes the result of this transaction commit to ProcArray.

Operation 3 records the current state of the transaction in the CLOG Buffer, which can be thought of as a layer of log caching. Operation # 4 writes the commit operation to Procarray. This is a very important step. We know that the snapshot determines the transaction status using Procarray. That is, this step determines the state of the transaction as seen by other transactions.

If the transaction crashes or rolls back before operation number 4, then all other transactions in the database see the version of the data as T, which means that the transaction did not actually commit. This is determined in the order of Visibility -> Snapshot -> Procarray.

After operation 4, the transaction is committed to all observers, because all snapshot data taken after this point in time are T+1 versions.

In this regard, operation 4 fits perfectly with the meaning of atomic operation. This is because the success of the transaction can be influenced by whether or not the transaction is committed. A transaction before operation 4 is always allowed to roll back because no other transaction sees the T+1 state of the transaction. However, after operation 4, transactions are not allowed to be rolled back, otherwise if there are other transactions that read T+1 version of the data inconsistent. The concept of atomicity is that a transaction commits successfully or fails to roll back. Since no rollback is allowed after operation 4, operation 4 is a good indication that the transaction committed successfully.

In summary, we can define operation 4 as the atomic point of the transaction.

Persistent site and atomic site

Atomicity and persistence requirements

Again, the concepts of atomicity and persistence are given:

  • Atomicity: A transaction either succeeds or fails.
  • Persistence: Once a transaction is executed successfully, it can be stored permanently in the database.

We mark operation 4 as atomic because at the time of operation 4, objectively all observers believe that the transaction has been committed, the snapshot version has been upgraded from T to T+1, and the transaction is no longer rollback. So once the transaction commits, does atomicity stop working? I think it does, atomicity at most guarantees data consistency at the moment the transaction commits successfully, and we can’t talk about atomicity once the transaction is over. So atomicity ensures that transactions are visible and recoverable before the atomic site.

We label site 2 as a persistence site because persistence assumes that a transaction can be permanently retained after success. Based on the above speculation, this locus is undoubtedly the persistence locus number 2. So we should be persistent all the time from site 2 on.

How to understand two loci

Having explained points 2 and 4, we can finally define the two most important concepts involved in transaction commit, and we can now answer the first question, at what point does a transaction actually commit? The answer is that transactions can be fully recovered after the persistence point; Transaction after atomic site is really treated as committed by other transactions. But the two are not separable, so how do you understand that?

I think this is actually a compromise of atomicity, because we don’t have to harmonize the two, we just have to make sure that the order of the two sites is consistent with the data in different states, then we can say that it meets our definition of atomicity.

  • A crash or rollback before the persistence site, when the transaction fails and the data version before the crash or after the recovery is T.
  • Crashes or rollbacks between atomic sites after the persistence site, where the visible version of the transaction is T, which means that for all transactions in the database, we see T. After the rollback, the data is replayed to T+1; When the database is restarted, the snapshot of the transaction before the crash shows the data version T, and the snapshot of the transaction after the crash shows the data version T+1, as if the transaction was committed implicitly. But this does not violate the consistency of the data.
  • Collapse after atomic site. The transaction has already been committed. The transaction before and after the crash sees the T+1 version of the data.

Finally, we consider why the two loci did not merge. Persistent site operation is the WAL log brush, this relates to disk IO issues; On the other hand, what atomic sites do is write to Procarray, which means that you get a big lock on Procarray, which is a big scramble, which you can think of as a high frequency shared memory write; Both of them are related to the efficiency of database transactions. If the two are bound to become an atomic operation, there is no doubt that the waiting for both of them will be quite serious, which may have a great impact on the efficiency of the transaction. From this point of view, the separation of behavior between the two is an efficiency consideration.

Can the order be reversed?

Obviously not, as we can see from the diagram above, in the middle period there may be regions that meet neither the atomicity requirement nor the persistence requirement.

In particular, if you proceed to the atomic site first and then to the persistence site, then imagine a transaction that crashes in between. Other transactions that see T+1 data before the crash and T data after the crash are obviously not allowed to see future data.

How do you define a true commit

A true submission is an atomic site submission.

At its most basic level, a true commit is marked by updating the data version from T to T+1. This site is called the atomic site. Prior to this point, the other transactions see the version of the data as T, and it is not appropriate to say that the actual commit was committed; Transactions cannot be rolled back after this point. This is enough to show that this is the actual commit point of the transaction.

Other operating

Our final focus is on operation 1/3:

  • Operation # 1 is to write a WAL commit to the XLog Buffer, which is not critical for transaction commit. Because if it’s written and it’s not brushed to disk, then it’s still useless.
  • Operation 3 marks the transaction as committed in the CLOG Buffer. This operation is also not critical to the transaction commit. Because if the database is running properly, it does not affect the visibility of the transaction snapshot; If the database crashes, the transaction state of the CLOG can be played back by the COMMMIT/ABORT log in the Xlog, whether or not the CLOG state is persisted.

The atomicity process of PolarDB

1 Transaction commit

In this section, we return to the transaction commit function and see where these operations are in the function call stack.

  • The transaction commit process is a transaction with a transaction ID; a transaction without a transaction ID does not have this process. Since transactions without transaction IDs are most likely read-only operations, they do not have any impact on data consistency in the database.
  • Before committing the Xlog, strict mode is enabled. Any error in this mode will be fatal and the database will crash and restart.
  • The sequence of Xlog brush and CLOG write memory is done in synchronous mode. In asynchronous mode, Xlog brush is not guaranteed, so data may be lost after a crash.
  • There is a critical step in the middle of 3/4, Replication waits. In fact, the Xlog has already been brushed, but not yet committed. In synchronous mode, the master database waits for the Xlog to be brushed to disk before proceeding to the next step.
  • Write ProcArray to commit this transaction, the transaction is truly committed completed, the transaction can no longer be rolled back.
  • Clean up the resource state, when the work is no longer related to this transaction.

2 transaction rollback

  • A transaction rollback without a transaction ID is skipped.
  • The rollback first determines whether the transaction was committed, based on CLOG. How can a transaction commit and then roll back? This is the state between 3 and 4 we discussed earlier. If the CLOG logs the commit, the database crashes and restarts with a fatal failure on the rollback command.
  • The XLog rollback log is written to the disk asynchronously during the rollback. It can be assumed that the data is not visible even if the rollback log is not written.
  • When the transaction is written to the rollback log in Procarray, the transaction is actually rolled back in the process (in fact, this state has no effect on other transactions, before and after the data version is T).

Summary and prospect

Finally, a summary of the whole paper is made. This paper focuses on the topic of “how to achieve transaction atomicity”, and explains the underlying principle of PolardB database to achieve atomicity from the perspective of crash recovery characteristics and transaction visibility of the database. Shared Buffers, Wal logs, Clog, Procarray, and other data structures that are important for atomism are also discussed in the introduction of prewrite log +buffer IO. Under the whole transaction, each module of the database is skillfully overlapped, making full use of disk, cache, IO and other computer resources to form a complete database system.

In association with other models of computer science, such as the ISO network model, the transport-level TCP protocol provides reliable communication services over an unreliable channel. Database transactions implement a similar idea of storing data reliably on an unreliable operating system (which can crash at any time) and disk storage (which cannot atomic write large amounts of data). This simple but important idea is the cornerstone of a database system, so important that most of the core data structures in the entire database are related to it. Perhaps with the development of the database in the future technology changes more advanced database architecture system, but we can not forget that atomicity, persistence should still be the core of database design.

Eight thinking

This concludes the focus on transaction atomicity, and leaves you with a few questions to ponder on the points made in this article.

  • How do you understand the atomicity and persistence sites of a transaction commit?
  • Think about the relationship between single transaction atomicity and multiple transaction atomicity. Okay? Are crash recovery and visibility one?
  • The concept of asynchronous commit exists in POLARDB, meaning that XLOG log downtime is not required when the transaction is not committed. Think about what characteristics of transactions might be violated in this pattern? Does it violate atomicity and persistence?

Resources for

Copyright notice: The content of this article is contributed by Aliyun real-name registered users. The copyright belongs to the original author. Aliyun developer community does not own the copyright and does not bear the corresponding legal liability. For specific rules, please refer to User Service Agreement of Alibaba Cloud Developer Community and Guidance on Intellectual Property Protection of Alibaba Cloud Developer Community. If you find any suspected plagiarism in the community, fill in the infringement complaint form to report, once verified, the community will immediately delete the suspected infringing content.