digression

I think the average interviewer might look at candidates from two dimensions. When an interviewer keeps asking a question until you can’t answer it or the interviewer is satisfied, it shows the depth of the interviewer’s knowledge. When an interviewer says, “Tell me what you know about X” or “Can you tell me about X?” This kind of abstract question is to test the breadth of the interviewer’s knowledge. This article takes the transaction as an example, in view of the latter situation to discuss with you, generally meet this kind of problem how to organize the idea of answering the question.

Begin

If one day someone asks you, “Tell me about your understanding of mysql transactions,” “Tell me about mysql transactions,” and you’re thinking, “What, what am I supposed to understand about that? How can I answer that if you don’t ask me clearly?” Don’t panic, just like when we talk about Hopu, the first thing we think of is a handsome guy. So we can start by simply stating in popular language what the other person is asking about.

I apologize for using Begin as the subtitle, because I really don't know what to call it, but given that transactions can be started with Begin, I'll call it Begin.Copy the code

Business is what

To put it simply, a transaction is one or more operations on a database, and through its features provides a series of guarantees for the operation process to ensure the accurate execution of business logic. When you write a transaction, start the transaction with start TRANSACTION /begin, write a series of database operations, and commit the transaction with COMMIT. The entire transaction is committed only when all the operations in the transaction are successfully executed. The transaction is automatically rolled back. Here you only need to understand the business with their own language to express it, the important thing is that we treat this kind of problem, first of all to tell clearly what is!

The premise

Since transactions are implemented in the storage engine layer, for the more common storage engines, if we want to use transactions, we can not use the MyISAM storage engine, but should use the Innodb storage engine. In fact, transactions are one of the core features provided by the Innodb storage engine.

Characteristics of transactions

As mentioned earlier, transactions guarantee a series of database operations through their properties, so we need to clarify what the properties of transactions are. There are four characteristics of transactions, respectively

  1. Atomicity
  2. Consistency
  3. Isolation
  4. Durability (Durability)

These are known as the four PROPERTIES of ACID in transactions.

What safeguards can be provided

Having said what the characteristics of transactions are, the next step is to talk about what are the guarantees that these features provide? After all, it takes time so that the interviewer doesn’t have time to ask difficult questions.

  • The atomicity of a transaction can provide a guarantee for the atomicity of a series of database operations. A series of database operations belonging to a transaction will either all succeed or all fail. There is no such thing as some statements executing successfully and others failing.
  • Consistency means that the transaction should always conform to the logical norms in the real world before and after the execution, and always be in a consistent state before and after the execution of the transaction. This is my understanding of consistency. In many articles, consistency is explained by transfer examples. If A transfers A dime to B, then A’s account should be reduced by one dime and B’s account should be increased by one dime. However, the total balance between the two should always be in A consistent state, which conforms to the norms in the real world. There are norms like age can’t be negative, gender can’t be anything other than male and female, and consistency is the epitome of a set of logical norms in the real world.
  • Isolation means that the operation of each transaction cannot affect the normal operation of other transactions. In the real world, two state transitions cannot be affected by each other. This is easy to understand.
  • Persistence means that once a transaction is committed, changes made to records or data in the database are permanently stored on disk, even if a power failure does not affect them.

Realize the principle of

In retrospect, there is a lot more to be said about this issue. Having mentioned the nature of transactions and what guarantees they provide, it is time to talk about how transactions provide these guarantees.

atomic

Atomicity of a transaction is a mechanism for restoring the database state prior to transaction execution if the transaction fails. To implement this mechanism, we need to undo the changes made to the database, just as we send a message, feel bad, and then withdraw. The implementation of transactions relies on one of mysql’s three major logs, the Undo log, also known as the Undo log. Undo log if the details of the discussion, I am afraid to need tens of thousands of words, here for fear of causing physiological discomfort, I will not discuss in detail. In general, Undo log is a kind of logical log. The so-called logical log means that it records the operation to the database, not the memory after the operation. To Undo our changes to the database, simply record the reverse action in Undo log. For example, if we insert a record into the database, the Undo log records the corresponding delete statement. If we modify a row in the database, the Undo log records the corresponding modification to the previous data. If we delete a row in the database, the Undo log records the corresponding logic to add the row. In fact, in order to realize the above logic, Implement different types of Undo logs. In addition, the system determines whether the storage space can be reused according to whether the storage space occupied by the record row changes before and after the modification. So to summarize, Undo log records the logical opposite of what we did to the database, and it doesn’t record memory data, it records logical operations.

The Undo log is appending. When an Undo log reaches its maximum value, a new Undo log file is opened to continue writing. How to use Undo log to rollback a failed transaction? This requires an understanding of the row format of mysql records. The mysql record has two hidden columns transanction_ID and roll_pointer. In addition, Innodb generates a 6-byte row_ID as the primary key if there is no primary key or unique key in the table. Something like that

Innodb assigns a TRx_ID to a record only when it is modified. Roll_pointer is a pointer to the corresponding Undo log, so that when a transaction fails, we can find the corresponding Undo log through the transaction ID and roll_pointer to restore the state before the transaction is executed.

persistence

When a transaction commits, the changes we make to the transaction should be permanently reflected on disk, based on the guarantee of persistence, but this is not the case. Interaction is the Innodb a page with a disk, the size of a page default is 16 KB, sometimes we modify a record, may be only takes up a very small space pages, if each of us to record in the database to make changes to its persisted to disk, then it is modify a little bit to the disk write 16 KB of data, Disk IO is very slow, which is unbearable. If the specified condition is not found in the Buffer pool, the specified condition is read from memory. If the specified condition is not found in the Buffer pool, the specified condition is not found in the Buffer pool. Load the specified page from disk into the Buffer pool and proceed accordingly. When data is written to the Buffer pool, it is also written to the Buffer pool first. The data in these pages is inconsistent with the data in the disk, which is called Dirty pages. This is done by linking the dirty pages together using a linked list. The modified dirty pages are then flushed to disk at a later appropriate time (in fact, the operating system has a buffer that is flushed to the operating system’s buffer and then flushed to disk).

Here is the problem, if after the transaction commits successfully, the modified dirty page has not had time to brush disk, as a result of the system power failure. This is where the second of Mysql’s three main log players comes in: the redo log. Transaction persistence is implemented through redo logs. The redo log is a physical log. It does not record changes. It records what was changed at the offset of the page in memory, so it records actual data in memory. I’m also not going to do redo logs here, because the interviewer might have a heart attack if I did.

How does the redo log crash and recover after a transaction is committed? The answer is that when we make a change to a mysql record (insert, modify, or delete), we log the redo log, and then we flush the redo log to disk. The redo log is very small. And redo logs are sequential IO, which is much faster than record changes, which may be random IO in many cases. If the database hangs after the transaction commits, you only need to redo the database using the redo log.

The redo log can be overwritten, and when the redo log is full, it is reused from scratch. That’s a bit of a digression. In short, transaction persistence is done through redo logs.

Isolation,

Isolation should be the most complex principle in the process of transaction implementation. If you look at it in detail, you will find that Mysql has put a lot of effort into isolation, because the concurrency problem is always a tricky problem. To achieve isolation, there is no way around how to make Mysql efficient concurrency.

Mysql has four isolation levels: uncommitted read, committed read, repeatable read, and serialized read. I’m sure you already know what the problems are with the four levels of isolation. Below in the form of a table for you to review, know that you have already known.

Isolation level Dirty read Unrepeatable read Phantom read
READ UNCOMMITTED Possible Possible Possible
READ COMMITTED Not Possible Possible Possible
REPEATABLE READ Not Possible Not Possible Possible
SERIALIZABLE Not Possible Not Possible Not Possible

So how does Mysql implement the different isolation levels? This brings up the roll_pointer hidden column in the record row format we talked about earlier. We say that roll_pointer is a pointer to the corresponding undo log, so each undo log has an old_roll_pointer that points to a history of earlier changes, so that all historical changes are connected in a linked list. Just like this, we changed the name to Liu Bei, Guan Yu, Zhang Fei, Zhao Yun and Zhuge Liang. So our changes are concatenated by roll_pointer to form a version chain.

For unsubmitted read, it is obvious that as long as the latest modification of the record is read every time, it is inevitable that the phenomenon of dirty read, other things have not been changed, you are sure to want to eat hot tofu!

For serialized reads, Innodb uses locking, which means no concurrent sharing. This is a simple and crude solution.

Commit reads and repeatable reads are a little tricky, but these are not difficult to design mysql uncle, so the famous multi-version concurrency control (MVCC) was born.

Simply put, commit reads and repeatable reads are implemented by generating the contents of a ReadView. ReadView can simply be interpreted as taking a snapshot of the state of the mysql record at that moment. With this Readview, a transaction can determine which records are visible and which are not on the version chain of that record. ReadView is implemented by maintaining a list of active transaction ids at the time the ReadView was generated. If the transaction ID in the corresponding historical version is less than the minimum active ID, then the transaction was committed at the time the ReadView was generated. If transaction id of the history of the corresponding version is greater than the maximum transaction id, so that the version of the transaction is in the current transaction open after open, natural cannot read, if in the active list, you need to determine separately, version chain corresponding record transaction id is active in the list, if the transaction has not been submitted, Otherwise, the transaction has been committed.

The biggest difference between committed read and repeatable read is that readViews are generated at different times. Committed read generates a Readview before each statement execution, so that the current transaction can read the latest value after other transactions commit, thus causing the problem of unrepeatable read. Repeatable reads generate the ReadView at the beginning of a transaction and continue with the current ReadView. In this way, the historical version that the transaction can see is determined at the beginning of the transaction, thus avoiding the problem of unrepeatable reads.

consistency

The realization of consistency is special. Atomicity and isolation are the basis for the realization of consistency, but it does not mean that the realization of atomicity and isolation can guarantee consistency, nor does it mean that the realization of consistency is the problem of atomicity and isolation. In fact, mysql has some consistency checks in DDL statements, such as check, but the fact that you use check doesn’t do any good. The realization of consistency mainly depends on the following three aspects:

  1. Mysql does some consistency checks in DDL statements, such as data type, length, non-null, etc.
  2. Rely on mysql stored procedures implementation, but stored procedures are not commonly used.
  3. Leave conformance checking to the programmer who writes the business logic.

In fact, the most important is to rely on our programmers to write their own business inspection code.

conclusion

When we are asked to talk about Innodb’s transaction mechanism, we can think about it in the following way.

  1. First use your own understanding of what a transaction is, but do not need to memorize the definition, as long as according to their own understanding to convey the meaning to others can be.
  2. Describe the characteristics of a transaction.
  3. What safeguards can these features provide
  4. How does Innodb achieve these safeguards? What’s the rationale behind it?

I believe that after finishing these, also need to take up a lot of time, “that person” is anxious to get off work, will not ask you about business questions!