
1. Terminology

Dirty page: A data page in memory whose contents differ from the copy of that page on disk.

Clean page: A data page in memory whose contents match the copy on disk, because its changes have already been written to disk.

LSN: Log sequence number. In the InnoDB storage engine, an LSN occupies 8 bytes, and its value increases monotonically as logs are written. Each update operation in a transaction generates a new LSN. LSNs are stored not only in the redo log but also in data pages.

2. Causes

MySQL works with two kinds of storage: memory and disk.

Disk is persistent storage, while data in memory can be lost on a crash. However, every change made in memory is also recorded in the redo log, so lost changes can be recovered from the redo log.

Disk I/O is expensive, so MySQL performs many operations in memory and does not flush them to disk immediately.

For example, when updating a row of data:

  • If the row is not in memory, the data page containing it is located on disk

  • The entire page is read into memory (if the change buffer holds pending changes for that page, they are merged into the page in memory)

  • The row is updated in memory, and the update operation is complete

In this scenario, the data in memory and the data on disk are no longer consistent, which produces a dirty page.

Because memory holds the latest version of the data, later reads are served directly from memory, which saves disk reads and improves performance.
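The update flow above can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyEngine`, `Page`, and their fields are hypothetical names, the change buffer is ignored, and the redo log is a plain Python list rather than anything resembling InnoDB's real format.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    rows: dict           # row_id -> value
    dirty: bool = False  # True once memory differs from disk


@dataclass
class ToyEngine:
    disk: dict                                       # page_id -> {row_id: value}
    buffer_pool: dict = field(default_factory=dict)  # page_id -> Page
    redo_log: list = field(default_factory=list)     # recorded changes
    disk_reads: int = 0

    def _load(self, page_id):
        # Read the whole page from disk only if it is not cached yet.
        if page_id not in self.buffer_pool:
            self.disk_reads += 1
            self.buffer_pool[page_id] = Page(rows=dict(self.disk[page_id]))
        return self.buffer_pool[page_id]

    def update(self, page_id, row_id, value):
        page = self._load(page_id)
        self.redo_log.append((page_id, row_id, value))  # the change is logged
        page.rows[row_id] = value                       # and applied in memory only
        page.dirty = True                               # disk is now stale

    def read(self, page_id, row_id):
        # Served from memory, which always holds the latest version.
        return self._load(page_id).rows[row_id]


engine = ToyEngine(disk={1: {"a": 10}})
engine.update(1, "a", 11)
print(engine.read(1, "a"), engine.disk[1]["a"], engine.buffer_pool[1].dirty)
# prints: 11 10 True
```

The read returns 11 while the disk still holds 10, and only one disk read was needed for both the update and the read.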

3. Relationship between memory data and the redo log

To understand the process of cleaning dirty pages, first understand the relationship between data in memory and data in the redo log.

  1. The data in memory is consistent with the data in the redo log

  2. How to determine whether a page is dirty:

  • Each redo log record carries an LSN

  • The redo log checkpoint corresponds to an LSN, which means that all changes before that LSN have already been written to disk

  • When a clean page is modified, its new LSN is greater than the checkpoint LSN, so the page is marked as dirty and added to the dirty page list while the redo log is written

Dirty pages are essentially inconsistencies between the contents of memory data pages and disk data pages. Therefore, a dirty page becomes clean when the memory data page is flushed to disk, or when the redo log checkpoint advances past its changes (which flushes the pages involved).
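The relationship between LSNs, the checkpoint, and the dirty page list can be sketched as follows. This is a simplified model of the description above, not InnoDB's actual data structures: `ToyRedoLog` and its attribute names are hypothetical, and tracking the LSN of each page's oldest unflushed change is an assumption made so that advancing the checkpoint stays safe.

```python
class ToyRedoLog:
    def __init__(self):
        self.lsn = 0             # LSN of the latest log record
        self.checkpoint_lsn = 0  # changes up to this LSN are already on disk
        # Dirty page list: page_id -> LSN of the page's oldest unflushed change.
        self.first_dirty_lsn = {}

    def modify(self, page_id):
        # Every update gets a new, larger LSN. It is always greater than the
        # checkpoint LSN, so the modified page is recorded as dirty.
        self.lsn += 1
        self.first_dirty_lsn.setdefault(page_id, self.lsn)
        return self.lsn

    def is_dirty(self, page_id):
        return page_id in self.first_dirty_lsn

    def flush(self, page_id):
        # Writing the page to disk makes it clean again.
        self.first_dirty_lsn.pop(page_id, None)

    def advance_checkpoint(self, new_lsn):
        # Pages with unflushed changes at or before new_lsn must reach disk
        # before the checkpoint may move there.
        for page_id, first_lsn in list(self.first_dirty_lsn.items()):
            if first_lsn <= new_lsn:
                self.flush(page_id)
        self.checkpoint_lsn = new_lsn


log = ToyRedoLog()
log.modify("p1")           # LSN 1
log.modify("p2")           # LSN 2
log.flush("p2")            # p2 becomes clean through an explicit flush
log.advance_checkpoint(2)  # p1 becomes clean because the checkpoint passed it
print(log.is_dirty("p1"), log.is_dirty("p2"), log.checkpoint_lsn)
# prints: False False 2
```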

4. When dirty pages are flushed

Dirty pages are flushed in the following situations:

  1. The InnoDB redo log is full
  • The system stops all updates and advances the checkpoint so that the redo log has room for new writes. All dirty pages covered by the log records that the checkpoint advances over are flushed to disk
  2. A new memory page is needed and there is not enough memory
  • Some data pages are evicted to free memory for other data pages. If an evicted page is dirty, it is written to disk first
  3. MySQL considers the system “idle” and uses the spare time to flush dirty pages
  4. MySQL shuts down normally, and the dirty pages in memory are flushed to disk

The LSN of a dirty page flushed in this way may be higher than the checkpoint LSN of the redo log, but this does not matter. During redo log replay, a data page that has already been flushed is recognized and skipped.
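A rough sketch of how replay can skip pages that were already flushed. The text only says such pages are recognized and skipped; comparing each record's LSN with the LSN stored in the page is my assumption about how that recognition works. The `replay` function and the record layout are hypothetical.

```python
def replay(redo_records, disk_pages, checkpoint_lsn):
    """Apply redo records to on-disk pages and return the LSNs that were applied.

    redo_records: list of (lsn, page_id, row_id, value), ordered by LSN
    disk_pages:   page_id -> {"lsn": int, "rows": dict}
    """
    applied = []
    for lsn, page_id, row_id, value in redo_records:
        if lsn <= checkpoint_lsn:
            continue  # everything up to the checkpoint is already on disk
        page = disk_pages[page_id]
        if page["lsn"] >= lsn:
            continue  # page was flushed after this change: skip the record
        page["rows"][row_id] = value
        page["lsn"] = lsn
        applied.append(lsn)
    return applied


records = [(1, "p1", "a", 1), (2, "p1", "a", 2), (3, "p2", "b", 3)]
pages = {
    "p1": {"lsn": 2, "rows": {"a": 2}},  # flushed ahead of the checkpoint
    "p2": {"lsn": 0, "rows": {"b": 0}},  # never flushed
}
print(replay(records, pages, checkpoint_lsn=1))
# prints: [3]
```

Page p1 carries LSN 2, which is higher than the checkpoint LSN of 1, yet replay stays correct because the record with LSN 2 is simply skipped.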

5. Impact of dirty pages

The performance impact depends on when the flush happens.

In the idle and normal-shutdown cases, performance is not a major concern, so let’s focus on cases 1 and 2.

  1. Avoid filling the redo log. If it fills up, all updates stop
  • Set the redo log size appropriately
  2. Memory pages have three states: unused, clean, and dirty. When the data page to be read is not in memory, a page must be requested from the buffer pool. If no unused page is available, the least recently used data page is evicted from memory: a clean page can be released and reused directly, but a dirty page must first be flushed to disk so that it becomes clean before it can be reused. If a query has to evict too many dirty pages, its response time becomes noticeably longer.
  • Keep the proportion of dirty pages under control
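Case 2 can be sketched with a small least-recently-used pool. This is a toy model under stated assumptions: `ToyBufferPool` is a hypothetical name, the pool tracks only a dirty flag per page, and a flush is counted instead of performed.

```python
from collections import OrderedDict


class ToyBufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> dirty flag, least recently used first
        self.flushes = 0

    def access(self, page_id, write=False):
        """Touch a page and return how many dirty flushes this access waited for."""
        waited = 0
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                _victim, dirty = self.pages.popitem(last=False)  # evict the LRU page
                if dirty:
                    # A dirty victim must be written to disk before reuse.
                    waited += 1
                    self.flushes += 1
            self.pages[page_id] = False  # a freshly loaded page is clean
        if write:
            self.pages[page_id] = True
        return waited


pool = ToyBufferPool(capacity=2)
pool.access("p1", write=True)
pool.access("p2")
print(pool.access("p3"))  # evicts dirty p1, prints: 1
print(pool.access("p4"))  # evicts clean p2, prints: 0
```

Reading p3 pays for a flush that an earlier write caused, which is why a query that evicts many dirty pages slows down.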

6. Controlling dirty pages

  1. innodb_io_capacity: Set this value appropriately. It tells InnoDB how fast the disk can be written. When InnoDB flushes dirty pages, it uses this value to calculate the flush speed. It does not flush at the full value, because the disk must also serve normal requests and cannot devote all of its capacity to flushing dirty pages.

  2. InnoDB calculates a percentage R based on the current percentage of dirty pages (bounded by innodb_max_dirty_pages_pct) and the remaining redo log space, then flushes dirty pages at a rate of innodb_io_capacity × R%.

Set the redo log size, innodb_io_capacity, and innodb_max_dirty_pages_pct appropriately to keep dirty page flushing from hurting performance.
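The calculation of the flush rate can be sketched as below. The text only states that R depends on the dirty page percentage and the remaining redo log space. The two helper functions, the linear shape of each, and taking the larger of the two results are all assumptions for illustration, not InnoDB's actual formula.

```python
def dirty_ratio_factor(dirty_pct, max_dirty_pct):
    """Hypothetical factor (0-100) that grows with the dirty page percentage."""
    if dirty_pct >= max_dirty_pct:
        return 100.0
    return 100.0 * dirty_pct / max_dirty_pct


def redo_factor(redo_used_pct):
    """Hypothetical factor (0-100): assumed linear in how full the redo log is."""
    return min(100.0, max(0.0, redo_used_pct))


def flush_rate(io_capacity, dirty_pct, max_dirty_pct, redo_used_pct):
    """Pages flushed per second: io_capacity * R%, with R assumed to be the larger factor."""
    r = max(dirty_ratio_factor(dirty_pct, max_dirty_pct), redo_factor(redo_used_pct))
    return io_capacity * r / 100.0


# Half of the allowed dirty ratio is used and the redo log is 20% full.
print(flush_rate(io_capacity=2000, dirty_pct=37.5, max_dirty_pct=75, redo_used_pct=20))
# prints: 1000.0
```

Whatever the exact formula, the sketch shows why the three settings interact: a small redo log or a low dirty page limit pushes R up, and a low innodb_io_capacity caps how fast InnoDB can respond.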

References

  1. MySQL 45 Lectures

Finally

If you like my articles, you can follow my public account (Programmer Malatang)

My personal blog is shidawuhen.github.io/
