preface

During the interview process, business is also a common point that interviewers often like to ask. The current content is roughly as follows

  • How does Mysql implement this ACID feature
  • How did you resolve long transactions during development, and what problems did long transactions cause

concept

A logical processing unit is a set of SQL statements that perform a series of logical operations on the Mysql database. All of these operations must succeed or fail. There are four properties of A transaction: atomicity (A), consistency (C), isolated rows (I), and persistence (D). These four properties are called ACID properties. ACID theory abstracts Mysql properties

  • Atomicity: All operations in a transaction either complete or fail without stopping any link in the middle. Exceptions occur in the middle and the transaction is rolled back to its original state as if the transaction had not been executed
  • Consistency: At the beginning and end of a transaction, the data states in the database are kept in the same state. For example, when two people transfer money, the sum of the two people before the transfer is 1000, and the sum of the two people after the transfer is completed should be 1000, not more or less, and the data is kept in the same state
  • Isolation: The execution of a transaction cannot be interfered with by other transactions, and concurrently executed transactions cannot interfere with each other. For example, transactions A to B and B to C have no impact (if B to C and A to B, the balance of B depends on the isolation level of the database). The isolation of transactions is usually achieved by locking
  • Persistence: Once a transaction commits, changes to data in the database are permanent and cannot be rolled back

Concurrent transactions

Multiple transactions operating on the database at the same time can cause problems such as data loss, dirty reads, and unrepeatable reads if no mechanism is used

  • Data loss: two transactions, T1 and T2, read and modify data at the same time. T1 commits first and T2 later, and T2 overwrites the changes on T1, resulting in the loss of changes on T1
  • Dirty read: two transactions. T1 modifies one data and writes it back to disk. After T2 reads the same data,T1 rolls back for some reason. At this point, the data read by T2 is not consistent with the data in the database, and the data read by T2 is dirty
  • Non-repeatable read: the same data is read multiple times in a transaction and the data contents are inconsistent. Two transactions T1 reads a data, reads a data if it’s 100, it’s not finished yet T2 also reads the data and changes it to 80 and commits. The T1 transaction then reads the data and finds that the data has changed to 80. The data contents of the two reads in a transaction are inconsistent
  • Phantom read: multiple reads in a transaction, the amount of data read is inconsistent. Two transaction T1 queries the total number of current tables, for example, 5, when T2 adds a new data, T1 checks again, finds that the total number of current tables is 6, there is no reason for an extra data, this is called phantom read

Isolation level

Database must have the ability to isolation concurrently running all kinds of affairs, make it do not influence each other between, avoid all sorts of concurrency issues, such as dirty read, not repeatable read and illusions, Mysql defines the following four isolation levels (from low to high), the four isolation levels can be solved by dirty read, not repeatable read, phantom read this a few questions.

  1. Read Uncommitted simplified RN: A transaction can Read uncommitted changes in another transaction. This isolation level is the worst and causes dirty reads, phantom reads, and unrepeatable reads
  2. Read Commit: A transaction can only Read changes that have been committed by another transaction, avoiding dirty reads and still having problems with non-repeatable and phantom reads. This is the Oracle default isolation level
  3. Repeated Read (Repeated Read): a transaction that reads the same data multiple times will return the same result, avoiding the problem of dirty and non-repeatable reads, but not avoiding phantom reads. Mysql’s default isolation level is this. Mysql solves the phantom read problem with multi-version concurrency Control (MVCC) and gap locking
  4. Serializable: The highest isolation level of a database, where transactions are queued, can solve the above problems, but execution efficiency is poor.

Open the transaction

  • Begin/Start Transaction: Indicates that a transaction is started
  • Commit: Commits a transaction
  • Rollback: Rolls back a transaction

Realize the principle of

Mysql implements atomicity, consistency, and persistence through redo logs and undo logs.

redo log

·Redo log is used for data recovery. It contains two parts: the Redo log buffer, which is lost after a restart. The other part is the redo log file on disk, which is persistent. Let’s take a look at the redo log in detail with an update example

Data Update Process

We generally in the development process, general like this below SQL to update the data in the database, today we are going to study is how to execute the SQL statement Mysql, bloggers in learning related knowledge, found a blog on the CSDN correlogram straightforward (click into the link) and Mysql45 about related knowledge, Today we stand on the shoulders of predecessors to learn about the relevant knowledge points, if there is infringement, please contact the blogger to delete.

update user set name="Programmer Fly" where  id=8
Copy the code

Reference image: blog.csdn.net/javaanddone…

As we see in the figure, the client sends an update request together, requiring both the Server layer and the storage engine layer

Server layer

The storage engines, triggers, and views we often use are all implemented on this layer, while the Server layer only has the following components

  • Connector: used for connection management and permission authentication. After a client requests a TCP connection through a three-way handshake and a four-way handshake, the connector queries the user’s permission
  • Query cache: After the connector is set up, if it is a query statement, it will check to see if there is any data in the cache. Because of the update statement, all updates on the related table are invalid. (PS: Cache is not useful in most cases)
  • Analyzer: After the cache is finished, it will arrive at the analyzer. The analyzer will perform lexical analysis and syntax analysis to see if the SQL has syntax errors and lexical errors, and then hand over the work to the optimizer
  • Optimizer: The optimizer, after analyzing the SQL above, decides to use the index ID and calls the executor
  • Executor: The executor calls the storage engine for related operations. The next two operations are performed above the storage engine layer
Storage engine layer

Storage engines are designed to store and extract data when the executor sends a specific execution plan

  • The storage engine will first find the row id=8, using a tree search, if the page is in memory, it will be returned to the executor, if not, it will be read from disk into memory, and then back
  • The executor takes the data returned by the storage engine, changes name to ‘programmer fly’, and calls the storage engine to write the data
  • The storage engine updates the data in memory and logs the update to the redo log, telling the executor that it is done and ready to commit.
  • The executor generates a binlog of this operation to the binlog buffer pool. (See parameters and sync_binlog control when the buffer pool flusher to the physical disk.) The binlog record is not completed and the executor stores the engine at the same time
  • The storage engine updates the redo log status in the redo log buffer to COMMIT. The update is complete. Innodb_flush_log_at_trx_commit specifies when the redo log is flushed.
    • When set to 0, only the redo log is left in the redo log buffer each time a transaction commits
    • If the value of 1 is set to 1, the redo log is persisted to disk during the prepare phase.
    • The redo log is only written to the page cach each time a transaction commits.

The redo log can be used to restore data after a system crash

undo log

Undo log has two main functions: transaction rollback and multi-version concurrency control (MVCC). When data is modified, not only the redo log is recorded. If the transaction fails due to database exceptions during the process, the transaction will be rolled back through the undo log to achieve atomicity. The logical log is recorded by the undo log.

  • When a data is deleted. The undo log records a corresponding insert record, while the delete record is an insert
  • When an update is performed, a reverse update message is recorded

MVCC

Undo log has two main functions: transaction rollback and multi-version concurrency control (MVCC). In Mysql, data has versions. The same data has multiple versions in the system, and transaction isolation is realized by comparing data versions. Undo log rollback to the previous version and compare the values visible to the current view. Let’s look at an example (from Mysql45). If a value of 1 in the database is changed to 2, 3, and 4 in order, it will be recorded in the rollback log as shown in the figure

For example, the current value of this data is 4, but different transactions have different read-views when querying this data. The corresponding values of views A,B,C, and D are divided into 1,2,3, and 4. Transaction at the time of query of data, A will pass the undo log roll back to view A saw 1, so isolated we think about what good This will increase the concurrent, when we query need not wait for another transaction release the lock, make different transaction read write, write – read operation can execute concurrently, improve system performance

conclusion

  • Atomicity: Using undo log to achieve, if there is an error in the process of transaction execution, the system will use undo log rollback to return the status of the beginning of the transaction
  • Persistence: Use the redo log. Once the redo log is persisted, data can be recovered from a system crash using the redo log
  • Isolation: Using MVCC and locks
  • Consistency: Consistency is achieved through this rollback, recovery, and concurrency

A few questions

Q: Do redo log buffer logs need to be persisted to disk every time they are generated?

A: No, if the Mysql transaction is restarted abnormally during execution, since the transaction has not been committed, the log loss will not be affected

Q: A transaction commit requires two writes to disk, a redo log, and a binlog. What optimizations did Mysql make

A: Mysql uses group commit mechanism for optimization, such as three concurrent transactions t1, t2,t3, corresponding log logical serial number LSN is 10,20,30, we simply understand as a record, t1 commit LSN is 30, t1 return, commit <30. T2 and t3 are returned directly after completion without waiting for disk write.

Q: What are the problems with long transactions

A: Long transactions are not released. The MVCC database has long rollback segments.

The giant shoulder

www.cnblogs.com/f-ck-need-u…

Time.geekbang.org/column/intr…

gossip

Public account has its own summary of a series of articles, need small friends also please pay attention to personal public account programmer Fly dot like, this will be a great encouragement to me ~