Welcome to pay attention to github.com/hsfxuebao, I hope to help you, if you think it can trouble to click on the Star ha

Transactions have four characteristics: atomicity, consistency, isolation, and persistence. What is the mechanism by which the four features of transactions are implemented?

  • The isolation of transactions byLocking mechanismThe implementation.
  • The atomicity, consistency, and persistence of a transaction are determined by the transactionalredo Log andundoLog to ensure.
    • REDO LOGReferred to asRedo log, provide write operation, restore commit transaction modification page operation, to ensureTransaction persistence.
    • UNDO LOGReferred to asRoll back logRollback row records to a specific version to ensureAtomicity and consistency of transactions.

Some DBAs might think UNDO is the reverse of REDO, but it is not. REDO and UNDO are both considered recovery operations, but:

  • Redo log: Log generated by innoDB that records “physical level” page modifications, such as page number XXX, offset YYy, and ZZZ data. Mainly to ensure the reliability of data;

  • Undo log: It is a log generated by the storage engine layer (InnoDB). It records logical operation logs. For example, when an INSERT statement operation is performed on a row of data, the undo log records a DELETE operation contrary to the operation. These are mainly used for transaction rollback (undo log records the reverse of each modification) and consistent non-locking reads (Undo log rolls back rows to a particular version-MVCC, for multi-version-concurrency control).

1. Why do I need REDO logs

InnoDB storage engine manages storage space on a page basis. Before you can actually access the page, you need to cache the page on disk into a Buffer Pool in memory before you can access it. All changes must first update data in the buffer pool, and dirty pages in the buffer pool are flushed to disk at a certain frequency (checkPoint mechanism) to optimize the gap between CPU and disk, so that overall performance does not degrade too quickly.

Mysql is guaranteed to be persistent through REDOLOG

  • On the one hand, buffer pools help to bridge the gap between CPU and disk. The checkpoint mechanism ensures that data is eventually dropped. However, since checkpoint is not triggered every time a change is made, it is processed by the master thread at intervals. So the worst case scenario is that the database goes down right after the transaction is committed and just after the buffer pool is written, the data is lost and cannot be recovered.

  • Transactions, on the other hand, contain the feature of persistence, which means that for a committed transaction, changes made to the database cannot be lost even if the system crashes after the transaction is committed.

So how do you guarantee this persistence? A simple solution would be to flush all the pages modified by the transaction to disk before the transaction commits, but there are some problems with this simple and crude approach

  • The amount of modification is out of proportion to the amount of disk refreshing

    Sometimes we change just one byte in a page, but we know that in InnoDB we do disk IO on a page basis, which means we have to flush a whole page from memory to disk when the transaction commits, and we know that a page is 16KB by default, Changing a single byte and flushing 16KB of data to disk is obviously a lot of fuss

  • Random 1O refresh is slow

    A transaction may contain many statements, and even a single statement may modify many pages. If the pages modified by the transaction may not be adjacent to each other, this means that a lot of random I/O is required to flush pages from the Buffer Pool modified by a transaction to disk. Random I/O is slower than sequential I/O. Especially for traditional mechanical hard drives.

Another solution: we just want to make the changes made by committed transactions to the database permanent, so that even if the system crashes later, the changes can be recovered after a restart. So we don’t need to flush all the pages that have been changed in memory to disk every time a transaction commits, just keep a record of what has been changed. For example, a transaction changes the value 1 to 2 of the byte offset 100 on page 10 of the system table space. We just need to record that the offset of page 10 of tablespace 0 at 100 is updated to 2.

InnoDB engine uses write-Ahead Logging (WAL) technology for transactions, the idea of which isLog first, then disk, the transaction is committed successfully only when the log is written successfullyredo log. This parameter can be passed when the system is down and data is not flushed to the diskredo logTo restore the D in ACID, and that’s what it isThe redo log.

2. Benefits and features of REDO logging

  • benefits
    • Redo logReduces the frequency of brushing disks
    • Redo logIt takes up very little space

      Table space D, page numbers, offsets, and values that need to be updated.
  • The characteristics of
    • Redo logs are sequentially written to disk. During a transaction, several RDO logs may be generated for each statement executed. These logs are written to disk in the sequence in which they are written.
    • The redo log is continuously recorded during transaction execution

The difference between redo log and bin log is that redo log is generated by the storage engine layer and bin log is generated by the database layer. Assume a transaction that performs 100,000 row insertions to a table, and the bin log does not record them until the transaction commits.

3. redo log format

As we know from the above, redo logs are essentially a record of what changes a transaction makes to the database. InnoDB designers have defined various types of redo logs for different scenarios of transaction changes to the database, but most types of redo logs have a common structure:

The details of the sections are as follows:

  • Type: indicates the redo log type.

    In MySQL 5.7.21, InnoDB designed 53 different redo log types. We’ll cover the different redo log types later.

  • Space ID: indicates the ID of a tablespace.

  • Page number: indicates the page number.

  • Data: indicates the contents of the redo log.

3.1 Simple Redo Log Types

If InnoDB does not explicitly define a primary key for a table and does not define a Unique key, InnoDB automatically adds a hidden column called row_ID as the primary key. The row_ID hidden column is assigned as follows:

  • The server maintains a global variable in memory every time it is hidden to some containingrow_idWhen a record is inserted into a table of columns, the value of this variable is treated as the new recordrow_idColumn, and increments the variable by 1.
  • Whenever the value of this variable is a multiple of 256, the value of this variable is flushed to the page number of the system table space7A page calledMax Row IDProperty of.
  • When the system starts, the above mentioned will be addedMax Row IDProperty is loaded into memory, and then the value is added to 256 and assigned to the global variable we mentioned earlier (because the value of the global variable may be greater than during the last shutdown)Max Row IDProperty value).

The Max Row ID attribute occupies 8 bytes of storage space. When a transaction inserts a record into a table with a row_ID hidden column and the row_id value assigned to the record is a multiple of 256, 8 bytes of value is written to the corresponding offset of the page with page 7 of the system tablespace. However, it is important to remember that this writing is actually done in the Buffer Pool, so we need to log a redo log of the page changes so that we can recover the changes that were committed to the page after the system crashed. In this case, the modification of the page is very simple. The redo log only records the value of a few bytes changed at an offset of a page. The design of InnoDB calls this very simple redo log physical log. There are several different redo log types based on how much data is written to the page:

  • MLOG_1BYTE(typeThe decimal number corresponding to the field is1) : indicates that 1 byte is written at an offset of the pageredoLog type.
  • MLOG_2BYTE(typeThe decimal number corresponding to the field is2) : indicates that 2 bytes are written at an offset of the pageredoLog type.
  • MLOG_4BYTE(typeThe decimal number corresponding to the field is4) : indicates that 4 bytes are written at an offset of the pageredoLog type.
  • MLOG_8BYTE(typeThe decimal number corresponding to the field is8) : indicates that 8 bytes are written at an offset of the pageredoLog type.
  • MLOG_WRITE_STRING(typeThe decimal number corresponding to the field is30) : indicates that a string of data is written at an offset of the page.

The Max Row ID attribute we mentioned above takes up 8 bytes of storage space, so when we modify this attribute on the page, we log a MLOG_8BYTE redo log. The MLOG_8BYTE redo log structure is as follows:

The other MLOG_1BYTE, MLOG_2BYTE, and MLOG_4BYTE log structures are similar to those of MLOG_8BYTE, except that the data contains the corresponding bytes. A MLOG_WRITE_STRING log represents writing a string of data, but since it is not clear how many bytes are being written, you need to add a len field to the log structure:

Tip: Replace the redo logs with MLOG_1BYTE, MLOG_2BYTE, MLOG_4BYTE, and MLOG_8BYTE by padding the LEN field of the MLOG_WRITE_STRING log with 1, 2, 4, and 8 digits. Why bother designing so many genres? If you don’t have to write len, you don’t have to write len. Every byte saved counts as byte.

3.2 More complex Redo log types

Sometimes a single statement modifies a large number of pages, including system data pages and user data pages (user data refers to the B+ tree corresponding to clustered indexes and secondary indexes). For example, an INSERT statement may update the Max Row ID of the system data into the page of the B+ tree.

  • How many indexes does the table contain? OneINSERTThe statement may update as many trees as possibleB+The tree.
  • For a particular treeB+For tree, it is possible to update leaf node page, update inner node page, or create a new page (when the remaining space of the leaf node where the record is inserted is insufficient to store the record, the page will be split and added to the inner node pageDirectory entry record).

During statement execution, all page changes made by the INSERT statement must be saved to the redo log. For example, when inserting a log into a cluster index, update only the leaf page if there is enough space to store the log. Log only one MLOG_WRITE_STRING log. Would it be good to indicate what data was added at an offset on the page? Don’t forget that a data Page contains File Header, Page Header, Page Directory, etc. (explained in the section on data pages) in addition to the actual records, so every time you insert a record into the data Page represented by a leaf node, There are many other places that will be updated, such as:

  • May be updatedPage DirectorySlot information in.
  • Page HeaderPage statistics, such asPAGE_N_DIR_SLOTSThe number of slots represented is subject to change,PAGE_HEAP_TOPThe minimum address that represents the unused space may change,PAGE_N_HEAPThe number of records on this page is subject to change, etc. All kinds of information can be modified.
  • We know that the records in the data page form a one-way linked list according to the order of the index column from small to large. Every time a record is inserted, the record header information of the previous record needs to be updatednext_recordProperty to maintain the one-way linked list.
  • There are other update places, etc., I will not introduce one…

A simple diagram might look something like this:

All that said, there’s a lot to change when inserting a record into a page. If we use the simple physical redo log described above to log these changes, there are two solutions:

  • Plan 1: Log a redo log at each change.

    As shown above, there are as many physical redo logs as there are bold blocks. The disadvantages of logging redo logs in this way are obvious. Because there are so many redo logs, the redo log may take up more space than the entire page

  • Plan 2: Treat all data from the first modified byte to the last modified byte on the entire page as data in a physical redo log.

    As you can see from the graph, there is a lot of unmodified data between the first byte and the last byte. It would be a waste of time to add unmodified data to the redo log

    Because the above two methods of logging physical redo logs to record changes made to a page are wasteful, the creators of InnoDB, in the interest of thrift, proposed new redo log types, such as:

  • MLOG_REC_INSERT (decimal number 9) : Indicates the type of the redo log when a non-compact line format was inserted.

  • MLOG_COMP_REC_INSERT (decimal number 38) : Indicates the type of the redo log when a compact row was inserted.

Tip: Redundant is a fairly primitive row format that is non-compact. Compact, Dynamic, and Compressed are the newer row formats and are Compact (requiring less storage).

  • MLOG_COMP_PAGE_CREATE(typeThe decimal number corresponding to the field is58) : represents the creation of a page that stores records in compact row formatredoLog type.
  • MLOG_COMP_REC_DELETE(typeThe decimal number corresponding to the field is42) : deletes a record in compact line formatredoLog type.
  • MLOG_COMP_LIST_START_DELETE(typeThe decimal number corresponding to the field is44) : indicates that a series of records in compact line format are deleted from the page starting with a given recordredoLog type.
  • MLOG_COMP_LIST_END_DELETE(typeThe decimal number corresponding to the field is43) :MLOG_COMP_LIST_START_DELETEThe type ofredoLog echo: deletes a series of records untilMLOG_COMP_LIST_END_DELETEThe type ofredoLog corresponding to the record.

Note: When we introduced InnoDB’s data page format earlier, we emphasized that the records in the data page are one-way linked lists in order of index column size. It may be inefficient to write a redo log every time an index column is deleted. Therefore, MLOG_COMP_LIST_START_DELETE and MLOG_COMP_LIST_END_DELETE types of redo logs can greatly reduce the number of redo logs.

  • MLOG_ZIP_PAGE_COMPRESS(typeThe decimal number corresponding to the field is51) : indicates compression of a data pageredoLog type.
  • There are many more types, but I won’t list them until they are used

These types of redo logs have both physical and logical meanings. They refer to:

  • At the physical level, these logs indicate which page of which table space was modified.
  • Logically speaking, when the system crashes and restarts, a certain offset in the page cannot be directly restored to a certain data according to the records in the logs. Instead, some pre-prepared functions need to be called, and the page can be restored to the appearance before the system crashes after executing these functions.

As you can see this may seem confusing, let’s use the redo log of type MLOG_COMP_REC_INSERT as an example to understand what we mean by the physical and logical levels above. Without further discussion, let’s take a look at the structure of the MLOG_COMP_REC_INSERT redo log (because there are too many columns, we’ll use them vertically) :

There are a few things to note about the redo log structure of type MLOG_COMP_REC_INSERT:

  • As we said earlier in our introduction to indexing, records on a data page are sorted in order of the smallest index column, whether leaf or non-leaf. For secondary indexes, records need to be sorted by primary key value when the index column values are the same. In the figuren_uniquesThe value of “means that in a record, the values of several fields are needed to ensure the uniqueness of the record, so that when inserting a record, it can be preceded by the recordn_uniquesFields to sort. For clustered indexes,n_uniquesFor other secondary indexes, this value is the number of index columns + the number of primary key columns. It is important to note here that the value of the only secondary index may beNULL, so the value is still the number of index columns + the number of primary key columns.
  • field1_len ~ fieldn_lenRepresents the amount of storage space occupied by several fields of the record, regardless of whether the field type is of fixed length (e.gINT), or variable length size (e.gVARCHAR(M)), the size occupied by the field should always be writtenredoIn the log.
  • offsetRepresents the address of the previous record on the page. Why record the address of the previous record? This is because every time a record is inserted into the data page, the linked list of records maintained on that page needs to be modifiedRecord header informationContains a name callednext_recordProperty, so when inserting a new record, you need to modify the previous recordnext_recordProperties.
  • We know that a record is actually made byAdditional informationandReal dataThe total size of the two parts is the total storage space occupied by a record. throughend_seg_lenIs used to indirectly calculate the total storage space occupied by a record. Why not directly store the total storage space occupied by a record? This is because of writingredoLogging is a very frequent operation by designInnoDBIs trying to reduceredoHow much storage the log itself takes up, so I came up with some kind of convoluted algorithm to achieve this goal,end_seg_lenThis field is just for savingredoLog storage space and come out. As for the specific designInnoDBWhat magic is used to reduce the bossredoLog size, we won’t go into this, because there are a few small complications, it is a little cumbersome to explain, and it is not useful to explain.
  • mismatch_indexThe value of is also for savingredoThe size of the log set up, you can ignore.

Obviously, the MLOG_COMP_REC_INSERT redo log does not record how the PAGE_N_DIR_SLOTS values were changed, how the PAGE_HEAP_TOP values were changed, how the PAGE_N_HEAP values were changed, and so on. When the system crashes and restarts, the server calls the function that inserts a record to the page. The data in the redo log is used as a parameter to call this function. After calling this function, PAGE_N_DIR_SLOTS, PAGE_HEAP_TOP, PAGE_N_HEAP, and so on are all restored to their pre-crash values. This is what logical logging means.

3.3 Redo Log Format Summary

There’s a lot about redo log formats, but if you’re not writing a redo log parsing tool or developing your own redo log system, there’s no need to explore all the redo log formats in InnoDB. I have briefly described several types of redo log formats, but again I want you to understand that redo logs record all changes made to the database during a transaction and can be restored after a crash and restart.

Note: In order to save redo log storage space, InnoDB may compress some redo log data. For example, SPACd ID and page number are usually stored in 4 bytes, but after compression, they may use less space. The specific compression algorithm will not be introduced.

4. Composition of redo

The Redo log is divided into the following two parts: Redo log buffer and Redo log file

  • Redo log buffers, kept in memory, are volatile.
    • When the server starts up, the operating system requests a large amount of continuous memory called redo log buffertRedo log buffer. This memory space is divided into contiguousredo log block. aredo log blockIt occupies 512 bytes.

Parameter Settings: Innodb_log_buffer_size: Redo log buffer size. The default value is 16 MB. The maximum value is 4096 MB.

mysql> show variables like '%innodb_log_buffer_size%';
+------------------------+----------+
| Variable_name          | Value    |
+------------------------+----------+
| innodb_log_buffer_size | 16777216 |
+------------------------+----------+
Copy the code
  • Redo log files, which are stored on hard disk, are persistent.
    • REDO log files ib_logfile0 and ib_logfile1 are REDO logs.

5. Overall redo flow

Take an update transaction as an example. The redo log flow is shown below:

  • Step 1: First read raw data from disk into memory, modify the memory copy of the data
  • Step 2: Generate a redo log and write to the redo log buffer
  • Step 3: When the transaction is committed, refresh the redo log buffer to the redo log file and append to the redo log file
  • Step 4: Periodically flush the modified data in memory to disk

Tip: write-ahead Log: Persist the corresponding Log page in memory before persisting a data page.

6. Redo log disk flushing policy

InnoDB does not write redo logs directly to disk. InnoDB writes redo logs to the redo log buffer and then flusher them to the redo log file at a certain frequency. Redo log files are written to the redo log buffer, and redo log files are written to disk according to a policy. This is what we call the swipe strategy.

Note that the redo log buffer to the redo log file is not actually flushed to disk, but to the page cache (an optimization made by modern operating systems to make writing to files more efficient). The actual writing is left up to the system (e.g., the Page cache is large enough). Then there is a problem for InnoDB, if the system to synchronize, also if the system down, then the data is lost (although the probability of the whole system down is relatively small).

In this case, InnoDB provides the innodb_flush_log_at_trx_COMMIT parameter, which controls how logs from the redo log buffer are flushed to the redo log file during commit transactions. It supports three strategies:

  • Set to zero: indicates that disks are not flushed each time a transaction is committed. (By default, the master thread synchronizes redo logs every 1s.)
  • Set to 1: indicates that each transaction commit will be synchronized, flush operation (default)
  • Set to 2: Indicates that only the redo log buffer is written to the page cache (the operating system cache) during each transaction submission. It is up to the OS to decide when to synchronize to disk files.

In addition, the InnoDB storage engine has a background thread that writes the redo log buffer to the page cache every second and then calls a flush operation.

That is, a redo log with no committed transactions may flush. Because redo log records are written to the redo log buffer during transaction execution, they are flushed by the background thread.

In addition to the background thread polling once per second, there is a case where the redo log buffer takes up half of innodb_log_buffer_size(16MB by default), and the background thread actively flusher.

7. Demonstrate different flush strategies

  1. The flow chart

When the parameter is 1

If innodb_fush_log_at_trx_commit is set to 1, the redo log is stored in the hard disk and no data is lost.

If MySQL hangs or goes down during a transaction, this part of the log is lost, but the transaction is not committed, so there is no loss of the log. D, which guarantees ACID, never loses data, but is the least efficient.

It is recommended to use the default value, although the probability of operating system downtime is theoretically less than the probability of database downtime, but generally since transactions are used, data security is more important.

When the parameter is 2:

Innodb_flush_log_at_trx_commit =2 when the value is 2, the redo log buffer is only written to the file system cache (page cache) as long as the transaction is committed successfully. If only MySQL had hung, there would be no data loss, but if the operating system had crashed, there would be 1 second of data loss, which would not satisfy the D in ACID. But the number 2 is definitely the most efficient.

When the parameter is 0:

Summary: Innodb_fush_log_at_trx_commit =0 if the value is 0, fsync operations are performed on the master thread every one second. Therefore, the crash instance loses transactions within one second at most. (Master thread is responsible for asynchronously refreshing data from the buffer pool to disk to ensure data consistency.) If the value is 0, it is a compromise. The I/O efficiency theory is higher than 1 and lower than 2.

8. Mini-Transaction

8.1 Write redo logs as a group

The statement may modify several pages during execution. For example, an INSERT statement may modify the Max Row ID attribute of a page with system tablespace page number 7 (it may also update other system pages, but we didn’t list them all), as well as the clustered index and secondary index corresponding pages in the B+ tree. Since changes to these pages occur in the Buffer Pool, a redo log is required after the page is modified. The redo logs generated during statement execution are artificially divided into indivisible groups by InnoDB designers, such as:

  • updateMax Row IDPropertyredoLogs are indivisible.
  • Corresponds to the cluster indexB+Generated when a record is inserted into the page of the treeredoLogs are indivisible.
  • Corresponds to a secondary indexB+Generated when a record is inserted into the page of the treeredoLogs are indivisible.
  • There are other things that are generated when accessing the pageredoLogs are indivisible…

What do you mean by indivisible? Take inserting a record into the B+ tree corresponding to an index as an example. Before inserting the record into the B+ tree, it is necessary to determine which data page the record should be inserted into. After locating the specific data page, there are two possible situations:

  • The first case is that there is enough free space on the data page to hold the record to be inserted. It is simply a matter of inserting the record into the data page and logging a redo log of type MLOG_COMP_REC_INSERT. This case is called optimistic insert. Suppose an index corresponds to a B+ tree that looks like this:

    Now we want to insert a record with a key value of 10, which obviously needs to be inserted into page B. Since page B now has enough space for a record, we can simply insert the record into page B, like this:

  • Situation 2: The data page of the rest of the free space is not enough, so it is a tragedy, said before we encountered this kind of situation to be so-called pages divided operation, which is a new leaf node, and then put the part of the original data page record copy to the new data page, and then inserted the records in this leaf node is inserted into the leaf node list, Finally, add a catalog entry record to the inner node to point to the newly created page. Obviously, this process involves multiple page changes, which means multiple redo logs, a condition we call pessimistic inserts. Suppose an index corresponds to a B+ tree that looks like this:

    Now we want to insert a record with a key value of 10, which obviously needs to be inserted into page B, but as you can see from the diagram, page B is already full of records and there is no more free space for the new record, so we need to split the page, like this:

    If the remaining free space of page A, the inner node, is not sufficient to add an entry record, then the splitting of page A on the inner node continues, which means that more pages are modified, resulting in more redo logs. In addition, for pessimism insert, due to the need for new data page, you also need to change some system page, for example, to modify the statistics information of various segments, area, various chain table statistics (such as what is the FREE list, FSP_FREE_FRAG chain table, etc. We introduced the tablespace introduced all kinds of things in the chapter), and so on and so on, There are 20 or 30 redo logs that need to be logged anyway.

Note: It’s not just the pessimistic log that generates many redo logs. InnoDB may also create multiple redo logs for optimistic log insertions for other functions (we won’t go into details, or we won’t have enough space).

InnoDB’s designers believed that the process of inserting a record into the B+ tree corresponding to an index must be atomic, not stopped halfway through. For example, in a pessimistic insert, the new page is allocated, the data is copied over, and the new record is inserted into the page, but the insert is incomplete without inserting a catalog entry record into the inner node, resulting in an incorrect B+ tree. We know that redo logs are intended to restore the state before a crash during a restart. If only a portion of redo logs are recorded during a pessimistic insert, the index B+ tree will revert to an incorrect state during a crash restart, which is not tolerated by InnoDB’s design team. So they specified that redo logs must be stored as a group when performing atomically important operations, and that during a crash restart recovery, redo logs must be either completely restored or none of the redo logs must be restored. How did you do that? This score is discussed:

  • Some atomicity operations generate multiple redo logs, such as a pessimistic insert into a B+ tree corresponding to an index that generates many redo logs.

    How do I group these redo logs? InnoDB’s authors did a very simple trick. They added a special type of redo log named MLOG_MULTI_REC_END to the last redo log in the group. The type of redo log was 31. There is only one type field:

    So a series of redo logs generated by an atomicity operation must end with a type of MLOG_MULTI_REC_END, like this:

    In this case, during system crash and restart, a complete set of redo logs is resolved only when redo logs of type MLOG_MULTI_REC_END are parsed. Otherwise, discard the redo log.

  • For atomicity operations, only one redo log is generated. For example, updating the Max Row ID attribute generates only one redo log.

    It is possible to follow a log with a redo log of type MLOG_MULTI_REC_END, but InnoDB designers are frugal and don’t want to waste a single bit. Remember that although there are many redo log types, there are dozens of redo log types and they are less than 127, which means that 7 bits are sufficient to cover all redo log types. The type field is actually 1 byte. That is, one bit can be saved to represent a single redo log for the atomicity operation, as shown in the following diagram:

    If the first bit of the type field is 1, this indicates that the atomicity operation generated a single redo log. Otherwise, this indicates that the atomicity operation generated a series of redo logs.

8.2 Concept of Mini-Transaction

One atomic access to the underlying page is called a mini-transaction, or MTR for short. For example, changing the value of Max Row ID is a mini-transaction. Inserting a record into a B+ tree corresponding to an index is also a mini-transaction. As we know from the above description, a so-called MTR can contain a set of redo logs that act as an integral whole during crash recovery.

A transaction can contain several statements. Each statement is actually composed of several MTRS. Each MTR can contain several redo logs.

9 Redo log writing

9.1 Redo Log Block Structure Diagram

A redo log block consists of a log header, a log body, and a log tail. The log header takes up 12 bytes and the log tail takes up 8 bytes, so the amount of data a block can really store is 512-12-8=492 bytes.

Why is a block designed to be 512 bytes?

  • The associated and disk sector, mechanical disk sector by default is 512 bytes, if you want to write more than 512 bytes of data to write sectors must be more than one, then will be involved in the rotation of the plate, find the next sector, under the assumption that now need to write A and B two sectors, if the sector A written to success, and sector B write failure, Non-atomic writes occur, whereas if only 512 bytes are written each time as large as the sector, each write is atomic.

The actual redo log is stored in a 496-byte log block body. The Log Block header and Log Block Trailer store some management information. Let’s take a look at what all this so-called management information has.

  • The log block header attributes are as follows:

    • LOG_BLOCK_HDR_NO: The log buffer is made up of log blocks. The internal log buffer is an array, so LOG_BLOCK_HDR_NO is used to mark the location in the array. It is incremented and recycled, taking up 4 bytes, but since the first bit is used to determine whether it is a flush bit, the maximum value is 2G.

    • LOG_BLOCK_HDR_DATA_LEN: Indicates how many bytes have been used in the block. The initial value is 12 (because the log b1ock body starts at the 12th byte). The value of this property increases as more redo logs are written to the block. If the log block body is fully written, the value of this property is set to 512.

    • LOG_BLOCK_FIRST_REC_GROUP: A single REDO log is also called a single redo log record. A SINGLE MTR produces multiple redo logs, which are called a redo log record group. LOG_BLOCK_FIRST_REC_GROUP represents the offset of the redo log group generated by the first MTR in the block (that is, the offset of the first REDO log generated by the first MTR in the block). If the value has the same size as LOG_BLOCK_HDR_DATA_LEN, it indicates the current log block. Does not contain new logs,

    • LOG_BL0CK_CHECKPOINT_NO: occupies 4 bytes, indicating the checkpoint when the log block is last written.

  • 1og block trai1er

    • LOG_BLOCK_CHECKSUM: indicates the check value of the block, used for correctness check (the value is the same as LOG_BLOCK_HDR_NO), we do not care about it for now.

9.2 Redo logs are written to the log buffer

Redo logs are written to the log buffer in sequential order, starting with the previous block and moving to the next block when the free space in that block is exhausted. The first problem we encountered when writing redo logs to the log buffer was which block and which offset. So InnoDB’s designers specifically provided a global variable called buf_free, This variable specifies where subsequent redo logs should be written to the log buffer, as shown in the following figure:

Each MTR generates a set of redo logs. Use a diagram to describe the logs:

Different transactions may be executed concurrently, so MTR between T1 and T2 may be executed alternately. Whenever an MT execution is complete, the set of redo logs generated with the MTR need to be copied to the log buffer. This means that the MTR of different transactions may be written alternately to the log buffer. Let’s draw all the redo logs generated in an MTR as a whole:

Some OF the redo logs generated by MTR are very large. For example, the REDO logs generated by MTR_t1_2 take up 3 blocks of storage space.

10. Redo log file Log file

10.1 Related Parameter Settings

  • innodb_log_group_home_dir: Specifies the path where the redo log group is located./ by default, it is in the database data directory. MySQL default data directory (var/lib/mysqlBy default, there are two files named ib_logfile0 and ib_logFILe1. The logs in the log buffer are flushed to these two disk files by default. The redo log file location can be modified.
  • innodb_log_files_in_group(log_files Number of files) : indicates the number of redo log files, named ib_logFile0, ibLogFile1… Iblogfilen. The default value is 2 and the maximum value is 100.
mysql> show variables like 'innodb_log_files_in_group';
+---------------------------+-------+
| Variable_name       | Value |
+---------------------------+-------+
| innodb_log_files_in_group | 2   |
+---------------------------+-------+
#ib_logfile0
#ib_logfile1
Copy the code
  • innodb_flush_log_at_trx_commit: Controls the redo log flushing policy. The default value is 1.
  • innodb_log_file_sizeSize of a single log file. The default size is 48 MB. The maximum value is 512 GB. The maximum value refers to the total number of redo log files. That is, innodb_log_files_in_group * innodb_log_file_size cannot be greater than the maximum value of 512 GB.
mysql> show variables like 'innodb_log_file_size'; +----------------------+----------+ | Variable_name | Value | +----------------------+----------+ | innodb_log_file_size 50331648 | | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- +Copy the code

Modify its size according to the business to accommodate larger transactions. Edit the my.cnf file and restart the database to take effect, as shown below

[root@localhost ~]# vim /etc/my.cnf
innodb_log_file_size=200M
Copy the code

If database instances are frequently updated, increase the number and size of the redo log groups. It is not recommended to set the redo log to too large a size, because the redo log is reexecuted during MySQL crash recovery.

10.2 Log File Groups

As you can see from the above description, there is not one redo log file on disk, but a group of log files. These files start with ib_logfi1e[number] (digits can be 0, 1, 2…) Each redo log file is the same size.

When writing redo logs to log file groups, start with ib-1ogFile0, and if ib_1ogFILE0 is full, ib_1ogFILe1 is written. Similarly, ib_1ogfile1 is full ib_1ogfile2, and so on. What should I do if I write to the last file? Go back to ib_logfile0 and continue writing, so the whole process looks like this:

The total size of the redo log file is: innodb_log_file_size x innodb_log_files_in_group.

Do redo logs overwrite previous redo logs if redo logs are cycled to the redo log group? Of course! So InnoDB’s designers came up with the concept of Checkpoint.

10.3 Redo Log File Format

As we mentioned earlier, a log buffer is essentially a contiguous piece of memory divided into 512-byte blocks. The nature of flushing redo logs to disk is to write a mirror of the block to a log file, so redo log files are actually made up of 512-byte blocks.

Each file in the redo log file group is of the same size and format and consists of two parts:

  • The first 2048 bytes, or the first four blocks, are used to store some administrative information.
  • From 2048 bytes onwards is used for storagelog bufferOf the block mirror in.

So the redo log file is iterated from the 2048th byte of each log file.

The log block header, the log block body, and the log block trialer are all in the same format as the log buffer. The first 2048 bytes of each redo log file, or the first 4 special blocks, are in the following format:



As can be seen from the figure, the four blocks are:

  • Log file header: Describes some of the overall properties of the redo log file. Take a look at its structure:

    The specific definition of each attribute is as follows:

    The property name Length (in bytes) describe
    LOG_HEADER_FORMAT 4 redoLog version inMySQL 5.7.21The value is always 1
    LOG_HEADER_PAD1 4 Do byte padding use, no practical meaning, ignore ~
    LOG_HEADER_START_LSN 8 Mark thisredoThe LSN value at the beginning of the log file, that is, the LSN value at the beginning of the file offset of 2048 bytes.
    LOG_HEADER_CREATOR 32 A string, tag bookredoWho is the creator of the log file? During normal operation, the value isMySQLFor example:"MySQL 5.7.21", the use ofmysqlbackupCommand createdredoThe value of the log file is"ibbackup"And creation time.
    LOG_BLOCK_CHECKSUM 4 The check value of this block, all blocks have it, we don’t care

    Note: InnoDB has made several changes to the redo log block format. If you find that the redo log block format is different from the redo log block format in other books, don’t panic. Forget about previous versions. Also, the LSN is something we’ll talk about later, but don’t worry about what the LSN is right now.

  • Checkpoint1: Records some of the properties of checkpoint. Check its structure:

    The specific definition of each attribute is as follows:

    The property name Length (in bytes) describe
    LOG_CHECKPOINT_NO 8 The server to docheckpointDo it every timecheckpoint, the value increases by 1.
    LOG_CHECKPOINT_LSN 8 The server to docheckpointCorresponding to the end ofLSNValue from which system crash recovery will start.
    LOG_CHECKPOINT_OFFSET 8 In the last propertyLSNValues inredoThe offset in the log file group
    LOG_CHECKPOINT_LOG_BUF_SIZE 8 The server is doingcheckpointCorresponding to the operationlog bufferThe size of the
    LOG_BLOCK_CHECKSUM 4 The check value of this block, all blocks have it, we don’t care

    Note: It’s not uncommon to be confused about the properties of checkpoint and LSN. I just want to familiarize you with these properties, but we’ll cover them later. Copy to clipboardErrorCopied

  • The third block is unused and ~ is ignored

  • Checkpoint2: Has the same structure as checkpoint1.

10.4 the Log Sequeue Number

Since the system is running, the page is constantly being modified, which means that redo logs are constantly being generated. The amount of redo log increases. Like a person’s age, it increases from birth and cannot be reduced. InnoDB uses a global variable called Log Sequeue Number (LSN) to record the amount of redo logs that have been written. Unlike a person who is born at age 0, InnoDB’s original LSN was 8704 (i.e., 8704 if no redo logs were written).

We know that redo logs are not written to the log buffer line by line, but as a set of redo logs generated by an MTR. And you actually write the log content in the body of the log block. However, the increment of LSN is calculated according to the log quantity actually written plus the occupied log block header and log Block Trailer. Let’s look at an example:

  • When the system initializes the log buffer after the first startup, buf_free (the variable that marks where the next redo log should be written to the log buffer) points to the first block with an offset of 12 bytes (the size of the log block header). Then the LSN value will also increase by 12:

  • If a set of redo logs generated by an MTR takes up a small amount of storage space, that is, if the block to be inserted has enough free space to hold the log submitted by the MTR, the LSN increases by the number of bytes used by the REDO logs generated by the MTR, like this:

    If we assume that the redo log volume for MTR_1 is 200 bytes, then LSN is increased by 200 bytes from 8716 to 8916.

  • If a set of redo logs generated by an MTR occupies a large amount of storage space, that is, the remaining free space of the block to be inserted is not enough to hold the logs submitted by the MTR. The LSN increase is the number of bytes used for the redo log generated by the MTR plus the additional log block headers and log Block Trailer bytes, like this:

    We assume that the redo log from Mtr_2 was 1000 bytes. In order to write the redo log from Mtr_2 to the log buffer, we had to allocate two additional blocks. Therefore, the value of LSN should be increased by 1000 + 12×2 + 4 ×2 = 1032 based on 8916.

    Tip: Why is the initial LSN 8704? I don't know. That's the rule. You could say you were born a year old, just make sure you get older as time goes by. Copy to clipboardErrorCopiedCopy the code

    As you can see from the above description, each set of REDO logs generated by MTR has a unique LSN value. The smaller the LSN value is, the earlier the redo logs are generated.

10.4.1 flushed_to_disk_lsn

Redo logs are redo log files that are written to the log buffer before being flushed to disk. So the InnoDB designers came up with a global variable called buf_next_to_write, which marks which logs from the current log buffer have been flushed to disk. Let me draw a picture of this:

We said that LSN is the amount of redo logs written to the current system. This includes logs written to the log buffer but not flushed to disk. Accordingly, InnoDB’s design team proposed a global variable that represents the amount of redo logs flushed to disk. It’s called flushed_to_disk_lsn. When the system starts for the first time, the value of this variable is the same as the initial LSN value, which is 8704. As the system runs, redo logs are constantly written to the log buffer, but are not flushed to disk immediately. The LSN value differs from the flushed_to_disk_lsn value. Let’s demonstrate:

  • After the system starts for the first time, the redo logs generated by MTR_1, MTR_2, and MTR_3 are written into the log buffer. Assume that the corresponding LSN values of the three MTR are:

    • Mtr_1:8716 to 8916

    • Mtr_2:8916 to 9948

    • Mtr_3:9948 to 10000

      Flushed_to_disk_lsn = 8704; flusheD_to_disk_lsn = 8704; flushed_to_disk_lsn = 8704;

  • Flushes blocks from the log buffer to the redo log file. Assuming that the mtr_1 and mtr_2 logs are flushed to disk, flushes the number of mtr_1 and mtr_2 logs in flushed_to_disk_lsn. So the value of flushed_to_disk_lsn increases to 9948, as shown in the figure below:

In general, when new redo logs are written to the log buffer, the value of flushed_to_disk_lsn increases, but the flushed_to_disk_lsn does not change. The value of flushed_to_disk_lsn also grows. If the values are the same, all the redo logs in the log buffer have been flushed to disk.

Tip: When an application writes a file to disk, it writes to the operating system’s buffer first. If a write is not returned until the operating system confirms that it has been written to disk, you need to call the fsync function provided by the operating system. The value of flushed_to_disk_lsn increases only when fsync is executed. When only the logs in the log buffer are written to the operating system buffer but not explicitly flushed to disk, another value called write_lsn increases. To make things easier for you, we confuse the term flushed_to_disk_lsn with write_lsn.

10.4.2 Mapping between THE LSN value and the Redo log Offset

Since the value of LSN represents the total amount of redo logs written by the system, the value of LSN increases with the number of logs generated in MTR (of course, sometimes the size of log block header and log Block Trailer should be added). In this way, when the logs generated by MTR are written to disk, It is easy to calculate the offset of an LSN value in the redo log file group, as shown in the following figure:

The initial LSN value was 8704, corresponding to the file offset of 2048, and then grew as many bytes of log were written to disk with each MTR.

10.4.3 Flush LSN in the Linked List

We know that an MTR represents an atomic access to the underlying page. During the access, a set of indivisible redo logs may be generated. At the end of the MTR, the set of redo logs will be written to the log buffer. In addition, one very important thing to do at the end of the MTR is to add pages that may have been modified during MTR execution to the Flush linked list of the Buffer Pool. In case you’ve forgotten what a linked list of Flush is, let’s look at the following picture:

The first time a page is cached in the Buffer Pool, the corresponding control block is inserted to the head of the Flush list, and the next time the page is modified, it is not inserted because it is already in the Flush list. That is, the dirty pages in the Flush linked list are sorted from largest to smallest by the time they were first modified. In this process, two properties about when the page is modified are recorded in the corresponding control block of the cached page:

  • oldest_modification: If a page is loaded toBuffer PoolAfter the first modification, the page will be modifiedmtrThat corresponds to what we started withlsnValue is written to this property.
  • newest_modification: Each time a page is modified, the page is modifiedmtrCorresponding to the end oflsnValue is written to this property. That is, this property represents the system to which the page was last modifiedlsnValue.

Flushed_to_disk_lsn flushed_to_disk_lsn

  • If page A is modified during mTR_1 execution, at the end of mTR_1 execution, the corresponding control block for page A is added to the head of the Flush list. The LSN (8716) corresponding to the start of MTR_1 is written to oldest_modification of the control block corresponding to page A, and the LSN (8916) corresponding to the end of MTR_1 is written to newest_modification of the control block corresponding to page A. To illustrate this (for the sake of aesthetics, oldest_modification is abbreviated to O_M and NEwest_modification to N_M) :

  • Then, assuming that the mTR_2 execution changes pages B and C, at the end of the MTR_2 execution, the corresponding control blocks for pages B and C are added to the head of the Flush list. Write the LSN (8916) corresponding to the start of MTR_2 into the OLdest_modification property of the control block corresponding to page B and page C, and write the LSN corresponding to the end of MTR_2 into the oldest_modification property. 9948 is written to the newest_MODIFICATION property of the control block corresponding to pages B and C. Let me draw a picture:

    As you can see, each time a new node is inserted into the flush list, it is placed at the head, which means that the dirty pages at the front of the flush list are modified later and the dirty pages at the back are modified earlier.

  • Then assume that mTR_3 changes pages B and D, but that page B has already been modified so that its corresponding control block is inserted into the Flush list. At the end of mTR_3, only the corresponding control block of page D is added to the head of the Flush list. Therefore, write 9948 (LSN at the beginning of MTR_3) to oldest_modification of the control block corresponding to page D, and write 10000 (LSN at the end of MTR_3) to Newest_modification of the control block corresponding to page D. In addition, because page B was modified again during the mTR_3 execution, the value of NEwest_modification in the corresponding control block of page B needs to be updated to 10000. Let me draw a picture:

To sum up the above, it is: Dirty pages in the Flush list are sorted according to the chronological order in which the modification occurred, that is, by the VALUE of the LSN represented by OLdest_MODIFICATION. Pages that have been repeatedly updated are not inserted into the Flush list, but are updated with the value of the NEwest_modification property.

10.4:

The unfortunate fact is that our redo log group capacity is limited, so we have to recycle redo log files. This causes the last redo log to end with the first redo log. The redo log is used only to restore dirty pages after a crash. If the dirty pages were flushed to disk, the redo log would not be needed after a restart, so the redo log would not be necessary. The disk space it occupies can then be reused by subsequent redo logs. In other words, the amount of disk space that a redo log occupies is determined by whether the corresponding dirty pages have been flushed to disk. Let’s take a look at the example we’ve been talking about:

As shown in the figure, although the redo logs generated by mTR_1 and Mtr_2 are written to disk, the dirty pages they modify remain in the Buffer Pool, so the space of the redo logs generated by mTR_1 and Mtr_2 cannot be overwritten. Then, as the system runs, if page A is flushed to disk, its corresponding control block is removed from the flush list, like this:

The redo logs generated by MTR_1 are no longer useful, and their disk space is overwritten. The InnoDB designers proposed a global variable checkpoint_lsn that represents the total number of redo logs that can be overwritten in the current system, starting with 8704.

For example, now that page A has been flushed to disk, the redo log generated by Mtr_1 is overwritten, so we can perform a checkpoint_lsn increment, which we call a checkpoint. A checkpoint can be divided into two steps:

  • Step 1: Calculate the maximum value of the LSN for redo logs that can be overwritten on the current system.

    The redo log can be overwritten, which means that the dirty pages are flushed to disk, as long as we calculate the oldest_modification value of the earliest dirty pages on the current system. We assign the checkPOINt_lsn value to any redo log that is generated when the system LSN value is less than the oldest_modification value of the node.

    For example, if page A has been flushed to disk, then the last node in the Flush list is page C, which is the earliest dirty page modified on the current system and whose OLdest_modification value is 8916. We assign 8916 to checkpoint_lsn (that is, the LSN of the redo log is overwritten if the value is less than 8916).

  • Step 2: Write the checkpoint_lsn and the corresponding redo log group offset and the checkpint number to the log file’s management information (checkpoint1 or checkpoint2).

    InnoDB’s designers maintain a variable called checkpoint_no that tells us how many times the system has checked. Each time it has checked, the value of this variable increases by 1. As stated earlier, it is easy to calculate the redo log group offset for a given LSN value, so you can calculate the checkpoint_lsn offset for the redo log group and write all three values to the redo log group management information.

    As mentioned, each redo log file has 2048 bytes of management information, but the checkpoint information is only written to the first log file in the log file group. But do we store it in checkpoint1 or checkpoint2? InnoDB is written to checkpoint1 when the value of checkpoint_no is even and to checkpoint2 when it is odd.

After logging the checkpoint information, the relationship between the LSN values in the redo log group looks like this:

There are two other important properties in the log file group: write POS and checkpoint

  • write posIs the current position of the record, moving backwards as you write
  • checkpointIs the current position to erase, but also later

Each time the redo log is recorded in the log file group, the write POS position is updated. Each time the MySQL database loads the log file group to restore data, it clears the redo log and updates the checkpoint later. The remaining space between write pos and checkpoint can be used to write new redo log records.

If write pos catches up to checkpoint, the log group is full and no new redo logs can be written. MySQL must stop, clear some logs, and push checkpoint forward.

10.5 Batch Flush dirty pages from flush list

As mentioned in the Buffer Pool section, it is usually the background thread that brushes the LRU list and flush list. This is mainly because the flushing operation is slow and the user thread does not want to affect the processing of the request. However, if the page modification operation is frequent, the log writing operation is frequent, and the SYSTEM LSN value increases too fast. If a background scrub fails to flush dirty pages, the system fails to checkpoint, and the user thread may need to synchronize the earliest dirty pages (those with the least oldest_modification) from the Flush list to disk. In this way, the redo log of these dirty pages is no longer used. Then you can check in.

10.6 Viewing LSN Values in the System

We can use the SHOW ENGINE INNODB STATUS command to check the LSN values in the current INNODB storage ENGINE. For example:

mysql> SHOW ENGINE INNODB STATUS\G (... LOG -- LOG sequence number 124476971 LOG uploading up to 124099769 Pages uploading up to 124052503 Last checkpoint at 124052494 0 pending log flushes, 0 pending chkp writes 24 log i/o's done, 2.00 log I/O 's/second ---------------------- (... To Copy to clipboardErrorCopiedCopy the code

Among them:

  • Log sequence number: indicates the systemlsnValue, which is what the current system has already writtenredoLog quantity, including writelog bufferLog in.
  • Log flushed up toRepresentative:flushed_to_disk_lsnIs the value that the current system has written to diskredoLog volume.
  • Pages flushed up toRepresentative:Flush the listThe page that was first modified inoldest_modificationAttribute values.
  • Last checkpoint at: Indicates the current systemcheckpoint_lsnValue.

11. Crash recovery

In the absence of a hung server, redo logging can be a drag and can make performance worse. But in case, and I say in case, in case the database dies, the redo log is a treasure, and we can use the redo log to restore the page to its pre-crash state on reboot. Let’s take a look at what the recovery process looks like.

11.1 Determine the starting point for recovery

As mentioned earlier, all redo logs prior to checkpoint_lsn can be overwritten. That is, the dirty pages corresponding to these redo logs are flushed to disk. Since they have been flushed, there is no need to restore them. For redo logs after checkpoint_lsn, the dirty pages may not have been flushed or may have been flushed, so we need to read redo logs from checkpoint_lsn to restore the page.

Of course, the first file in the redo log group has two blocks containing checkpoint_lsn, so we want to check the most recent checkpoint. Checkpoint_no = checkpoint1; checkpoint2 = checkpoint1; checkpoint2 = checkpoint1; The larger checkpoint_no value indicates that the block stores the latest checkpoint information. This gives us the checkpoint_lsn of the last checkpoint and its offset in the redo log group checkpoint_offset.

11.2 Determine the end point of recovery

The starting point for redo log recovery is defined. What is the end point? This starts with the structure of the block. We say that redo logs are written sequentially, with one block filled and the next block written:

The log block header of a normal block has an attribute called LOG_BLOCK_HDR_DATA_LEN, whose value records how many bytes of space are used in the current block. For filled blocks, the value is always 512. If the value of this property is not 512, then it is the last block to be scanned in the crash recovery.

11.3 How can I Recover data?

Once you have determined which redo logs to scan for crash recovery, the next step is how to do it. Assume that there are five redo logs in the redo log file, as shown in the following figure:

Because redo 0 is after checkpoint_lsn, it can be ignored during recovery. We can now scan the redo logs after checkpoint_lsn for the redo logs and restore the pages to the redo logs. That’s fine, but the people who designed InnoDB have come up with a few ideas to speed up the recovery process:

  • Use hash tables

    Redo logs with the same space ID and page number are hashed in the same slot. If there are multiple redo logs with the same space ID and page number, Then they are linked by a linked list in order of generation, as shown in the figure:

    You can then iterate through the hash table, and because the redo logs that made changes to the same page are placed in a single slot, you can repair one page at a time (avoiding a lot of random IO reading the page), which makes recovery faster. Note also that the redo logs on the same page are sorted in the order they were built, so they are restored in that order. If you do not sort the redo logs in the order they were built, errors may occur. For example, the original modification operation is to insert a record first and then delete it. If the recovery is not in this order, it may be deleted first and then insert another record, which is obviously wrong.

  • Skip pages that have been flushed to disk

    We did not determine whether the redo log was flushed to disk. We did not determine whether the redo log was flushed to disk because of a recent checkpoint. Maybe the background thread keeps flushing dirty pages out of the Buffer Pool from the LRU list and flush list. If the redo logs after checkpoint_lsn correspond to dirty pages that were flushed to disk at the time of the crash, there is no need to modify the redo logs based on their contents during recovery.

    How do you know if a redo log’s dirty pages were flushed to disk at the time of the crash? It starts with the structure of the page, and we said that every page has a section called the File Header, and inside the File Header there’s a property called FIL_PAGE_LSN, This property records the LSN value (newest_modification in the page control block) that was the last time the page was modified. If dirty pages are flushed to disk after a checkpoint, the LSN value of FIL_PAGE_LSN must be greater than the value of checkpoint_lsn. Pages that do this do not need to redo redo logs with LSN values less than FIL_PAGE_LSN, which further improves crash recovery.

12. Missing questions: How is LOG_BLOCK_HDR_NO calculated

As we said earlier, for a normal log block that actually stores redo logs, there is an attribute in the log block header called LOG_BLOCK_HDR_NO (check back if you forgot), and we said that this attribute represents a unique label. This attribute is assigned when the block is first used and depends on the system LSN value at that time. Calculate the LOG_BLOCK_HDR_NO value of the block using the following formula:

((lsn / 512) & 0x3FFFFFFFUL) + 1Copy to clipboardErrorCopied
Copy the code

The 0x3FFFFFFFUL in this formula can be a little confusing, but the binary representation is probably a little friendlier:

As can be seen from the figure, the first two bits of the binary corresponding to 0x3FFFFFFFUL are 0 and the last 30 bits are 1. As we learned when we first started computing, an ampersand with a zero is always going to be zero, and an ampersand with a one is always going to be the original value. To add and to a number and 0x3ffffful means to set the first two bits of the value to 0, so that the value must be less than or equal to 0x3ffffful. The value of ((LSN / 512) &0x3ffffffful) must be between 0 and 0x3FFFFFFFUL, or between 1 and 0x40000000UL, regardless of the LSN size. And the value 0x40000000UL, which you’re probably familiar with, is 1GB. In other words, the system can generate only 1GB of unique LOG_BLOCK_HDR_NO values. The creators of InnoDB specified that the total size of all redo log files in a redo log group should not exceed 512GB, and that each block size should be 512 bytes. This means that a redo log group should have a maximum of 1GB of blocks, so 1GB of unrepeatable numbers is sufficient.

In addition, the first bit of the LOG_BLOCK_HDR_NO value is a special flush bit. If this value is 1, it indicates that this block is the first to be flushed in any attempt to flush a block from the log buffer to disk.

Nodule 13.

We all know the use of redo logs and when and how they are stored:

InnoDB updates using Write Ahead Log policy, that is, Write the Log first, then Write to disk.

Refer to the article

MySQL: InnoDB Storage Engine (version 2) MySQL: InnoDB Storage Engine (Version 2)