SQL execution process and log

02 | Log system: how is an SQL update statement executed?

1. Redo log

Question:

If every update operation needs to be written to disk, the random I/O cost and search cost are too high

Core technologies:

WAL (Write-Ahead Logging): when a record needs to be updated, InnoDB first writes the change to the redo log (sequential I/O) and updates the page in memory; at that point the update is considered complete. The record is written back to disk later, when the system is idle.

InnoDB's redo log has a fixed total size and is written in a circle: writing starts at the beginning and wraps back to the start when the end is reached.

With the redo log, InnoDB guarantees that previously committed records are not lost even if the database restarts unexpectedly. This capability is called crash-safe. (WAL technology)
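For reference, a minimal way to inspect the circular redo log's size on a running server (values are deployment-specific; in MySQL 8.0.30+ these settings are consolidated into innodb_redo_log_capacity):

```sql
-- Total redo log capacity = file size x number of files.
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';
```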

2. Binlog

Question:

Why is there a binlog when there is already a redo log?

1) The redo log is specific to the InnoDB engine; the binlog is implemented in MySQL's server layer

2) The redo log is written in a circle, so old content is eventually overwritten; the binlog is append-only and never overwrites earlier logs

3) The binlog is used for backup and master/slave replication; the redo log is used to improve performance, reduce random I/O, and provide crash safety

3. Log writing process (two-phase commit)

The engine updates the data in memory and records the update in the redo log; the redo log is now in the prepare state.

The executor is then told that the execution is complete and the transaction can be committed at any time.

The executor generates a binlog of this operation and writes the binlog to disk.

The executor calls the commit transaction interface of the engine, and the engine changes the redo log to the commit state.
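A conceptual sketch of this sequence for a single-row update, assuming a hypothetical table t(id INT PRIMARY KEY, c INT) and autocommit = 1; the comments restate the internal steps above, they are not SQL you run:

```sql
UPDATE t SET c = c + 1 WHERE id = 2;
-- Internally, for this single autocommitted statement:
--  1. the engine finds the row, updates the page in memory,
--     writes the change to the redo log and marks it "prepare";
--  2. the executor writes this statement's binlog and flushes it to disk;
--  3. the executor calls the engine's commit interface, and the engine
--     marks the redo log entry as "commit".
```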

4. Recovery process

During recovery, transactions whose records exist (and are complete) in the binlog are committed; transactions that do not exist in the binlog are discarded

12 | Why does MySQL sometimes "jitter"?

1. Flush (writing dirty pages back to disk)

When the contents of a data page in memory differ from the corresponding page on disk, the memory page is called a **dirty page**.

Updates are fast because they only write memory and the redo log; the occasional "jitter" is MySQL flushing dirty pages to disk.

2. Situations where flush is triggered

Keep an eye on the dirty-page percentage and do not let it regularly approach 75%.

1) The redo log is full. The system stops all updates, pushes checkpoint forward, and leaves room for redo log writing.

2) System memory is insufficient. When memory runs low, some data pages must be evicted, starting with the least recently used ones: a clean page can simply be freed for reuse, but a dirty page must first be flushed to disk and become clean before it can be reused.

3) MySQL is idle, or is shutting down normally; in these cases performance is not affected

3. innodb_flush_neighbors

When flushing a dirty page, if a neighboring data page is also dirty, InnoDB flushes the "neighbor" along with it; and this drag-the-neighbor logic can cascade, so if the neighbors' neighbors are also dirty they get flushed too.

In InnoDB, the innodb_flush_neighbors parameter controls this behavior. The optimization made sense in the era of mechanical hard drives, where it reduces random I/O: a mechanical disk typically delivers only a few hundred random IOPS, so reducing random I/O for the same logical operation is a significant performance gain. With high-IOPS devices such as SSDs, it is advisable to set innodb_flush_neighbors to 0: IOPS is no longer the bottleneck, and flushing only the page itself gets the necessary dirty-page writes done faster, reducing SQL response time. innodb_flush_neighbors defaults to 0 in MySQL 8.0.
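A quick sketch of checking and adjusting this setting on an SSD-backed instance:

```sql
SHOW VARIABLES LIKE 'innodb_flush_neighbors';

-- On SSDs, flushing only the page itself is usually preferable
-- (this is already the default in MySQL 8.0).
SET GLOBAL innodb_flush_neighbors = 0;
```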

15 | FAQ (1): questions about logs and indexes

1. How is data integrity guaranteed if MySQL restarts unexpectedly at different moments of the two-phase commit?

Rules for determining crash recovery:

If the transaction's redo log record is complete, i.e. already carries the commit flag, commit it directly.

If the redo log contains only a complete prepare record, check whether the corresponding binlog exists and is complete: a. if so, commit the transaction; b. otherwise, roll back the transaction.

This relies on the two-phase commit between the redo log and the binlog

How are the redo log and the binlog associated? They share a common field, the XID

How does MySQL know that a binlog is complete?

A transaction's binlog has a well-defined end: a statement-format binlog ends with a COMMIT, while a row-format binlog ends with an XID event.
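One way to see these markers is to list binlog events; the file name below is a placeholder for whatever `SHOW BINARY LOGS` reports on your server:

```sql
-- A committed transaction in row format ends with an Xid event;
-- in statement format it ends with a COMMIT query event.
SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 20;
```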

Binlog is also primarily used for master/slave synchronization

3. Why does binlog not support crash recovery

The InnoDB engine uses WAL: executing a transaction only writes memory and the log, and the transaction is then complete. If a crash happens later, the data pages are recovered from that log. The binlog does not record page-level details of updates, so it cannot be used to patch data pages back.

Redolog supports crash recovery

4. Why not use only the redo log?

One reason is archiving. The redo log is written in a circle, so historical logs are not retained; the redo log cannot serve as an archive.

One is that the MySQL system relies on binlog. Binlog has been a feature of MySQL since its inception and is used in many ways. The high availability of MySQL system is based on binlog primary/secondary replication.

5. Is data updated in the redo log or in the buffer pool? Why does the redo log support crash recovery?

The redo log does not record the complete data of the data page, so it does not have the ability to update the disk data page

Flushing dirty pages has nothing to do with the redo log: flushing simply writes the page's contents from the buffer pool to disk.

During crash recovery, if InnoDB decides a data page may have lost updates, it reads the page into memory and applies the redo log to bring it up to date; after that the memory page is a dirty page. The redo log is only responsible for restoring the in-memory state.

6. What is the redo log buffer? Is memory modified first, or is the redo log file written first?

During an INSERT or update, the memory of the data is modified and the redo log buffer is written. The log is actually written to the redo log file when the COMMIT statement is executed.

18 | Why do SQL statements with the same logic have hugely different performance?

1. Conditional field function operation

The optimizer decides not to use the tree-searching function because functional manipulation of index fields can break the order of index values.

Implicit type conversion

What are the rules for converting data types? In MySQL, when you compare a string to a number, you convert a string to a number.

Why does the cast lead to a full index scan? Because it triggers the rule above: the implicit conversion applies a function to the indexed field, so the optimizer cannot use tree search on that index.

23 | How does MySQL ensure that data is not lost?

1. Write mechanism of binlog

During transaction execution, the log is first written to the binlog cache; at commit time the binlog cache is written to the binlog file. Each thread has its own binlog cache.

**write** refers to writing the log into the file system's page cache without persisting it to disk, so it is fast.

**fsync** is the operation that actually persists the data to disk; it is fsync that consumes disk IOPS.

The timing of write and fsync is controlled by the sync_binlog parameter:

sync_binlog=0: each transaction commit only does write, not fsync;

sync_binlog=1: each transaction commit does fsync;

sync_binlog=N (N>1): each transaction commit does write, and fsync is done once every N transactions.

In IO bottleneck scenarios, set sync_binlog to a large value to improve performance. However, if sync_binlog is set to N, the binlog of the last N transactions will be lost if the host restarts abnormally.
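A sketch of checking and relaxing this setting under I/O pressure; the value 100 is an arbitrary example:

```sql
SHOW VARIABLES LIKE 'sync_binlog';

-- Trade durability for throughput: fsync the binlog once every 100 commits.
-- Risk: up to the last 100 transactions' binlogs can be lost on a host crash.
SET GLOBAL sync_binlog = 100;
```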

2. redo log writing mechanism

There are three redo log states:

In the redo log buffer, physically in the MySQL process's memory;

Written out but not yet persisted (no fsync), physically in the file system's page cache;

Persisted to disk, physically on the hard disk.

Writing to the redo log buffer is fast, and writing to the page cache is also fast, but persisting to disk is much slower.

The **innodb_flush_log_at_trx_commit** parameter has three possible values:

0: on each transaction commit, the redo log is only left in the redo log buffer;

1: on each transaction commit, the redo log is persisted to disk;

2: on each transaction commit, the redo log is only written to the page cache.

In addition, InnoDB has a background thread that, every second, writes the logs in the redo log buffer to the file system's page cache and then calls fsync to persist them to disk.

If innodb_flush_log_at_trx_commit is set to 1, the redo log must be persisted once in the prepare phase, because crash recovery relies on the prepare-state redo log; at commit time it only needs to be written to the file system's page cache, with no extra fsync.

3. Group commit

MySQL's "double 1" configuration means that both sync_binlog and innodb_flush_log_at_trx_commit are set to 1. In other words, a transaction must wait for two disk flushes before it is fully committed: one for the redo log (prepare phase) and one for the binlog.

The group commit mechanism reduces the number of disk flushes.

4. What can be done when I/O is the performance bottleneck?

Set sync_binlog to a value greater than 1. The risk is losing the most recent binlogs if the host loses power.

Set innodb_flush_log_at_trx_commit to 2. The risk is losing recent transactions if the host loses power.
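A side-by-side sketch of the safe "double 1" configuration and the relaxed settings just described (the N = 1000 value is only an example):

```sql
-- "Double 1": safest, one redo log fsync plus one binlog fsync per transaction.
SET GLOBAL sync_binlog = 1;
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- Relaxed settings for an I/O-bound host, accepting the risks listed above.
SET GLOBAL sync_binlog = 1000;                  -- may lose the last N binlogs on a crash
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- may lose recent transactions on power loss
```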

Transactions, isolation mechanisms, locks, MVCC

03 | Transaction isolation: why can't I see the change you already made?

1. Transaction concept

Transactions have the ACID properties. Not all engines support transactions (InnoDB does).

2. The SQL-standard transaction isolation levels

Read uncommitted: changes made by a transaction can be seen by other transactions before it commits. (Dirty read: reading records that other transactions have not yet committed.)

Read committed: a transaction's changes become visible to other transactions only after it commits. (Non-repeatable read: during one transaction, changes committed by other transactions are read, so the same query returns different data before and after.)

Repeatable read: the data seen during a transaction is always the same as the data seen when the transaction started.

Serializable: as the name implies, for the same row a write takes a write lock and a read takes a read lock. When a read/write lock conflict occurs, the later transaction must wait for the earlier one to finish before it can continue.
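A minimal sketch of checking and changing the isolation level (the variable is named transaction_isolation in MySQL 5.7.20+/8.0, tx_isolation in older versions):

```sql
SELECT @@transaction_isolation;

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- current session only
SET GLOBAL  TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- new sessions
```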

3. Consistent view (read-view)

The database builds a logical view (snapshot), and reads are answered from the logical result of that view.

Repeatable read: the view is created when the transaction starts and is used for the transaction's whole lifetime.

Read committed: the view is created at the start of each SQL statement's execution.

Read uncommitted: the latest value on the record is returned directly; there is no view concept.

Serializable: locking is used directly to prevent parallel access.

4. Rollback log (undo log)

Every update to a record also writes a **rollback log** (undo log). From the latest value on the record, applying the rollback operations yields the value of any earlier state. Rollback logs are deleted only when no read-view older than them exists; this is why long transactions should be avoided.

06 | Global locks and table locks: why are there so many obstacles to adding a column to a table?

1. Global locking

A global lock locks the entire database instance. The typical use case is making a full logical backup of the database (the FTWRL method, FLUSH TABLES WITH READ LOCK).

Making the entire instance read-only is dangerous:

If you back up on the primary, no updates can run during the backup, so the business is basically shut down.

If you back up on a standby, the standby cannot apply the binlog synchronized from the primary during the backup, which causes replication delay.

2. MVCC avoids global locking

The official logical backup tool is mysqldump. With the --single-transaction option it starts a transaction before dumping data, so it works from a consistent view; thanks to MVCC, data can still be updated normally during the dump.

If a table uses an engine that does not support transactions, the backup can only be done with FTWRL. This is one of the reasons DBAs ask developers to use InnoDB instead of MyISAM.

3. Table locks

For an engine with row locks such as InnoDB, the lock tables command is generally not used to control concurrency, because locking an entire table has too large an impact.

4. MDL (metadata lock)

MDL does not need to be used explicitly and is added automatically when accessing a table.

An MDL read lock is taken when doing DML on a table (insert, delete, update, select);

an MDL write lock is taken when changing the table's structure (DDL).

Read locks are not mutually exclusive, so multiple threads can run DML on the same table at the same time. Read/write and write/write locks are mutually exclusive, which keeps structure changes safe.

Be careful with long transactions around DDL: an MDL lock taken inside a transaction is acquired when the statement starts executing, but it is not released when the statement finishes; it is held until the whole transaction commits. A long transaction holding an MDL read lock can therefore block an ALTER TABLE, which in turn blocks every later request on that table.

An ideal mechanism would let the ALTER TABLE statement specify a wait time: if the MDL write lock cannot be obtained within that time, give up instead of blocking subsequent business statements.
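MySQL itself has no per-ALTER wait clause, but the session-level lock_wait_timeout variable bounds how long a DDL waits for the MDL lock. A minimal sketch; the table name t and the added column are hypothetical:

```sql
-- Bound the time this session will wait for the MDL write lock, in seconds.
SET SESSION lock_wait_timeout = 5;

-- If the lock cannot be acquired within 5 seconds, the DDL fails with a
-- lock wait timeout error instead of queueing behind long transactions.
ALTER TABLE t ADD COLUMN extra_info VARCHAR(64) NULL;
```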

07 | Row locks: how to reduce the performance impact of row locks?

1. Two-phase locking protocol

In an InnoDB transaction, a row lock is acquired when it is first needed, but it is not released as soon as it is no longer needed; it is held until the transaction ends. This is the two-phase locking protocol.

How does this help when writing transactions? If a transaction needs to lock several rows, put the statements whose locks are most likely to cause conflicts (and hence hurt concurrency) as late in the transaction as possible.

2. Deadlock and deadlock detection

There are two strategies: one is simply to wait until the lock wait times out; the other is active deadlock detection, where, once a deadlock is found, one transaction in the cycle is rolled back so the others can continue.

Normally the second strategy, active deadlock detection, is used: whenever a transaction is blocked by a lock, it checks whether the thread it depends on is itself blocked by someone else, and so on, to determine whether a waiting cycle, i.e. a deadlock, has formed.

3. Performance problems caused by hot row updates (deadlock detection)

What if every transaction updates the same row? Each newly blocked thread must check whether its arrival creates a deadlock, which is an O(n) operation. With 1000 concurrent threads updating the same row, deadlock detection is on the order of a million operations. Even though no deadlock is found in the end, a lot of CPU is consumed along the way.

1) Control concurrency. Consider implementing it in middleware

2) Consider reducing lock conflicts by changing one row to logically multiple rows
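The two strategies map onto two server knobs, sketched below; turning detection off is only safe when the workload is known to be deadlock-free:

```sql
-- Strategy 1: wait, then time out.
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Strategy 2: active detection (ON by default since it was introduced in 5.7.15).
SHOW VARIABLES LIKE 'innodb_deadlock_detect';

-- For a hot-row workload known to be deadlock-free, save the O(n) detection CPU
-- and rely on timeouts instead.
SET GLOBAL innodb_deadlock_detect = OFF;
```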

08 | Are transactions really isolated? (key)

1. Transaction start time (view creation time)

The begin/start transaction command is not the real starting point of a transaction; the transaction actually starts with the first statement that operates on an InnoDB table after it. To start a transaction (and its snapshot) immediately, use start transaction with consistent snapshot.

With the first way of starting, the consistent view is created when the first snapshot read executes; with the second, it is created when start transaction with consistent snapshot executes.

2. MVCC: read-view + row trx_id + undo log

Each transaction in InnoDB has a unique transaction ID, and IDs are assigned in strictly increasing order.

Each time a transaction updates a row, an undo log record is generated and the transaction's ID is recorded as the row trx_id of the new data version. The old version's value is overwritten by the new value, but it can be reconstructed from the latest value plus the undo log.

When a transaction starts, InnoDB builds an array holding the IDs of all transactions that are started but not yet committed at that moment. The minimum ID in the array is the low watermark; the maximum transaction ID created so far in the system plus 1 is the high watermark. This array together with the watermarks forms the transaction's read-view.
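A minimal two-session sketch of the snapshot in action at repeatable read; the table t(id INT PRIMARY KEY, k INT) is hypothetical:

```sql
-- Session A
START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- read-view created right here
SELECT k FROM t WHERE id = 1;                -- suppose this returns 1

-- Session B (runs after A's snapshot is taken)
UPDATE t SET k = k + 1 WHERE id = 1;
COMMIT;

-- Session A again
SELECT k FROM t WHERE id = 1;  -- still returns 1: B's trx_id is not visible in A's read-view
COMMIT;
```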

3. Consistent read (the SELECT behavior under repeatable read) (MVCC)

InnoDB must make sure that versions written by transactions that had not committed when this transaction started stay invisible. Storing only the largest committed transaction ID is not enough, because a transaction with a smaller ID may commit later. That is why the transaction records, at start time, the list of all transaction IDs currently executing: if a row's trx_id is in that list, the version is not visible.

Repeatable read uses consistent reads, implemented with MVCC. Read committed is also MVCC-based; the difference is that under repeatable read the view is generated once at transaction start, while under read committed a new read-view is generated for each SQL statement. Consistent reads still leave one problem, phantom reads, which require gap locks.

Visibility is decided by the increasing transaction IDs, the transaction's consistent view (read-view), and the row trx_id of each data version:

1) If the row trx_id is below the read-view's low watermark, the version was generated by an already-committed transaction (or by the current transaction itself) and is visible.

2) If the row trx_id is at or above the high watermark, the version was generated by a future transaction and is not visible.

3) If the row trx_id falls in between, there are two cases:

a. if row trx_id is in the array, the version was generated by a transaction that has not yet committed and is not visible;

b. if row trx_id is not in the array, the version was generated by a committed transaction and is visible.

If the current version of the row is not visible, apply the undo log to get the previous version, and repeat until a visible version is found

4. Current read (update operations) (two-phase locking protocol)

An update reads the row before writing it, and this read must see the current (latest committed) value; this is called a **current read**. Current reads take locks: read locks (S locks, shared) or write locks (X locks, exclusive).

UPDATE and SELECT ... FOR UPDATE take write locks; SELECT ... LOCK IN SHARE MODE takes a read lock.

Current reads plus two-phase locking prevent lost updates:

1) A starts, B starts, A writes a row, B writes the same row, A commits, B commits: the lost update (B blindly overwriting A) does not happen, because A's write lock is released only at commit, so B cannot write before A commits.

2) A starts, B starts, A writes a row, A commits, B writes the same row, B commits: again B does not blindly overwrite A, because B's write begins with a current read, which bypasses the MVCC view and sees A's committed value.
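A two-session sketch of a current read meeting a row lock; the table t(id INT PRIMARY KEY, k INT) is hypothetical:

```sql
-- Session A
BEGIN;
UPDATE t SET k = k + 1 WHERE id = 1;        -- current read + X lock on the row

-- Session B
BEGIN;
SELECT k FROM t WHERE id = 1;               -- snapshot read: not blocked, sees the old value
SELECT k FROM t WHERE id = 1 FOR UPDATE;    -- current read: blocks until A commits

-- Session A
COMMIT;   -- releases the row lock; B's FOR UPDATE now returns the value A just wrote
```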

20 | What is a phantom read, and what problems does it cause? (key)

The discussion assumes the repeatable read isolation level.

1. Phantom reads (under repeatable read, snapshot reads via MVCC do not see phantoms; current reads can)

A phantom read is when a transaction queries the same range twice and the second query sees rows that the first query did not.

At the repeatable read level, ordinary queries are snapshot reads and cannot see rows inserted by other transactions, so phantoms appear only under current reads. Also, another transaction's updates becoming visible to a current-read SELECT is not called a phantom read; the term refers only to newly inserted rows.

Even locking every existing record cannot stop new records from being inserted, so row locks alone cannot prevent phantom reads

2. Gap locks and next-key locks

What conflicts with a gap lock is the operation "insert a record into the gap". Gap locks do not conflict with each other: multiple transactions can hold gap locks on the same interval at the same time. This is different from read/write locks.

A gap lock plus a row lock is collectively called a next-key lock; each next-key lock covers an interval that is open on the left and closed on the right.

3. Gap lock problems and solutions

1) An UPDATE or SELECT ... FOR UPDATE whose condition is not indexed is expensive: every record and every gap gets locked. The same operation on an indexed field locks only the matching entries and the adjacent gaps.

This is why UPDATE and DELETE statements whose conditions are not indexed are dangerous.

2) Gap locks make deadlocks easier. A common business pattern is: lock a row, insert it if it does not exist, update it if it does. When two transactions run this concurrently on the same gap, their gap locks do not conflict, but each transaction's insert then blocks on the other's gap lock, producing a deadlock.

One alternative for this pattern is insert ... on duplicate key update.

Solution: when the business does not strictly require consistent (repeatable) reads, use the read committed isolation level.

Gap locks exist only at the repeatable read isolation level, because phantom reads would break repeatable read's consistency guarantee. If the isolation level is set to read committed, there is no gap locking; in that case, however, the binlog format must be set to row to avoid possible inconsistencies between data and logs.

21 | Why does a statement that changes only one row lock so much? (key)

1. Locking rules under repeatable read (to prevent phantom reads)

Principle 1: the basic unit of locking is the next-key lock, an interval open on the left and closed on the right.

Principle 2: only objects accessed during the lookup are locked; a condition on a non-indexed field forces a full scan, so every record and gap accessed gets a next-key lock.

Optimization 1: for an equality query on a unique index, if the row exists (the equality condition is matched), the next-key lock degrades to a row lock.

Optimization 2: for an equality query on a unique index, if the row does not exist (the equality condition is not matched), the next-key lock degrades to a gap lock.

Optimization 3: for an equality query on a non-unique index, the scan traverses right until the first value that does not match; that last next-key lock degrades to a gap lock, while the locks taken during the traversal remain full next-key locks. So it does not degrade to a single row lock: the matching row plus the gaps on both sides end up locked.

For example, suppose the table already contains rows (id, a) = (5, 5) and (10, 10), where id is the unique (primary key) index and a is an ordinary index:

An equality operation on id = 5 locks only that one row.

An equality operation on id = 7 (which does not exist) locks the gap (5, 10).

An equality operation on a = 5 locks the range (0, 10) on index a including the row a = 5 (next-key lock (0, 5] plus gap lock (5, 10)).

An equality operation on a = 7 (which does not exist) locks the gap (5, 10).

A range query on an index field takes next-key locks over the scanned range; the degradations above apply only to equality queries.

If a LOCK IN SHARE MODE query is satisfied by a covering index, it never touches the primary key index, so no lock is placed there and the row can still be modified through the primary key; LOCK IN SHARE MODE locks only the index it covers. FOR UPDATE, by contrast, assumes you are about to modify the data, so it also places a row lock on the matching primary key entries.
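A sketch of the example above as runnable SQL; the exact DDL is an assumption that mirrors the layout described (id unique, a an ordinary index), and the comments restate the lock ranges from the notes:

```sql
CREATE TABLE t (
  id INT NOT NULL PRIMARY KEY,   -- unique (clustered) index
  a  INT DEFAULT NULL,
  KEY idx_a (a)                  -- ordinary (non-unique) index
) ENGINE = InnoDB;

INSERT INTO t VALUES (5, 5), (10, 10);

BEGIN;
SELECT * FROM t WHERE id = 5 FOR UPDATE;  -- row lock on id = 5 only (optimization 1)
SELECT * FROM t WHERE id = 7 FOR UPDATE;  -- gap lock (5, 10) on the primary key
SELECT * FROM t WHERE a  = 5 FOR UPDATE;  -- next-key lock (0, 5] + gap (5, 10) on idx_a,
                                          -- plus a row lock on the primary key entry id = 5
ROLLBACK;
```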

2. Next-key locks

Next-key lock = row lock + gap lock. Gap locks are not mutually exclusive, while row locks are; analyze a next-key lock as these two parts.

30 | FAQ (2): looking at locks from a dynamic point of view

1. Equality lookups inside inequality conditions

select * from t where id>9 and id<12 order by id desc for update; This statement locks (0,5], (5,10], and the gap (10,15) on the primary key index. In particular, the row id=15 is not locked.

Because of order by id desc, the optimizer must first find "the largest value with id < 12". It looks for id = 12 with an equality-style lookup, does not find it, and locks the gap (10, 15).

It then traverses to the left. The traversal is a range scan, not an equality lookup, and it reads the row id = 5 before stopping, so a next-key lock (0, 5] is also added.

2. The locking order of an equality IN query, and the deadlock it can cause

select id from t where c in (5,20,10) lock in share mode;

While looking up c = 5 this locks (0, 5] and the gap (5, 10); similarly, for c = 10 the locked ranges are (5, 10] and (10, 15); for c = 20 they are (15, 20] and (20, 25). These locks are added one by one as execution proceeds, not all at once.

Now consider select id from t where c in (5,20,10) order by c desc for update;. It locks the same resources as the statement above, but in the reverse order. When the two statements run concurrently, a deadlock can occur.

3. A gap lock's range can silently grow

For example, when a delete removes the row id = 10, the two gaps (5, 10) and (10, 15) merge into a single gap (5, 15). A transaction that had locked one of the original gaps has done nothing, yet the range it locks has suddenly become "larger".

select * from t where c >= 15 and c <= 20 order by c desc lock in share mode; locks the range (5, 25) on index c, plus row locks on the primary key entries id = 15 and id = 20.

Because of order by c desc, the scan first locates c = 20, adding next-key lock (15, 20] and the gap (20, 25); it then traverses left (a range scan) and does not stop until it reads c = 10, so the locks extend down to the next-key lock (5, 10].

In other words, the gap is essentially defined by the record to the right of the gap.

39 | Why is the auto-increment primary key not continuous?

1. Where is the auto-increment value stored?

InnoDB's auto-increment counter used to live only in memory; persistence was introduced in MySQL 8.0, where changes to the counter are recorded in the redo log and restored from it on restart.

2. How the auto-increment value is used

When inserting a row, if the auto-increment id field is left unspecified (or given as 0 or NULL), the table's current AUTO_INCREMENT value is filled in.

If the insert specifies a value for the id field, that value is used directly; if it is greater than the maintained auto-increment value, the auto-increment value is advanced, otherwise it stays unchanged.

auto_increment_offset and auto_increment_increment are two system parameters giving the initial value and the step size of the counter, respectively. Both default to 1.

Not every scenario uses the defaults. For example, in a double-M (dual-master) setup we might set auto_increment_increment = 2 so that one library generates odd IDs and the other even IDs.

3. Why auto-increment primary key IDs end up discontinuous

1) Unique key conflict. Suppose we insert (null, 1, 1) into a table that already contains a row whose unique key c = 1: InnoDB obtains the next auto-increment value (say id = 2), fills the row in as (2, 1, 1), and advances the table's AUTO_INCREMENT to 3; the insert then fails on the unique key, but the auto-increment value is not rolled back, so id = 2 is never used.

2) Transaction rollback, which leaves a gap for the same reason as a unique key conflict.

3) The ID allocation strategy for bulk inserts: each allocation requests twice as many auto-increment IDs as the previous one, and any unused IDs are discarded.
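A minimal runnable sketch of cause 1); the table definition here is an assumption:

```sql
CREATE TABLE t_autoinc (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c  INT NOT NULL UNIQUE
) ENGINE = InnoDB;

INSERT INTO t_autoinc (c) VALUES (1);  -- succeeds: id = 1, AUTO_INCREMENT becomes 2
INSERT INTO t_autoinc (c) VALUES (1);  -- fails: duplicate entry for unique key c,
                                       -- but AUTO_INCREMENT has already advanced to 3
INSERT INTO t_autoinc (c) VALUES (2);  -- succeeds with id = 3; id = 2 is skipped forever
```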

4. Why can't the auto-increment value be rolled back?

Suppose transaction A obtains id = 2 and transaction B obtains id = 3; the table's auto-increment value is now 4.

Transaction B commits successfully, but transaction A hits a unique key conflict. If A were allowed to roll the table's auto-increment value back to 2, the table would already contain a row with id = 3 while the current auto-increment value is 2. Subsequent transactions would then obtain id = 2 and later id = 3, and the second of those inserts would fail with a primary key conflict. Preventing that would require extra checks that hurt performance, so the auto-increment value is simply never rolled back.

5. The auto-increment lock

The auto-increment lock is not a transaction-scoped lock; it is released as soon as the value has been obtained, so other transactions can request values right away.

MySQL 5.1.22 introduced a new policy controlled by the innodb_autoinc_lock_mode parameter, whose default is 1.

Set to 0: the auto-increment lock is released only when the statement finishes.

Set to 1: for ordinary INSERT statements the lock is released immediately after the value is obtained; for bulk inserts such as insert ... select, the lock is still held until the statement ends.

Set to 2: every request for an auto-increment value releases the lock immediately after obtaining it.

Why is the lock-release rule different for bulk inserts?

With binlog_format = statement, if sessions A and B interleave their inserts, the IDs each session generates are not consecutive, yet the binlog has to record one session's statement entirely before the other's. When a standby replays that binlog, each statement generates consecutive IDs, so the standby's data differs from the primary's.

Two solutions:

1) Make bulk-insert statements on the primary generate contiguous ID values; holding the auto-increment lock until the statement ends achieves exactly this (innodb_autoinc_lock_mode = 1).

2) Stop relying on replaying the statement to regenerate auto-increment values: set innodb_autoinc_lock_mode = 2 and binlog_format = row.

40 | Why do insert statements take so many locks?

1. insert ... select

insert ... select is a very common way to copy data between two tables. Note that at the repeatable read isolation level, this statement locks the records and gaps it scans in the source table.

With binlog_format = statement, executing insert into t2(c,d) select c,d from t; locks all rows and all gaps of table t.

The reason: suppose transaction A is inserting into t while transaction B copies from it. With binlog_format = statement, the order in which the two statements are written to the binlog and replayed on the standby could produce a result different from the primary's, so the scanned rows and gaps must be locked.

2. insert and unique key conflicts

An insert that hits a unique key conflict places a shared next-key lock (S lock) on the conflicting unique value. So when an insert fails on a unique key constraint, commit or roll back the transaction as soon as possible to keep the lock time short.

3. insert into ... on duplicate key update

insert into ... on duplicate key update tries to insert the row and, on a unique key conflict, executes the update clause instead. If several unique constraints are violated, the row that conflicts on the first index (in index order) is the one modified.
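A minimal sketch of the statement; the counters table is an assumption used only for illustration:

```sql
CREATE TABLE counters (
  name VARCHAR(32) NOT NULL PRIMARY KEY,
  hits INT NOT NULL DEFAULT 0
) ENGINE = InnoDB;

-- First execution inserts the row; later executions hit the unique key
-- and fall through to the update clause instead.
INSERT INTO counters (name, hits)
VALUES ('homepage', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;
```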

Indexes and storage principles

04 | A deep dive into indexes (part 1)

1. Pros and cons of different index data structures (why InnoDB uses the B+ tree)

1) Hash table: fast reads and writes, good for equality queries, but unordered and therefore poor for range queries

2) Ordered array: suitable only for static data that rarely changes, because insertions are slow

3) Search tree (the B+ tree): a good compromise between the two

2. InnoDB indexes: the clustered index

Primary key index (clustered index): its leaf nodes store the entire row.

Secondary index (non-clustered): its leaf nodes store the primary key value, so a lookup by secondary index may need to go back to the primary key index ("back to the table").

3. Use and maintenance

1) Inserts can cause **page splits**; deletes can lead to page merges.

To reduce page splits, insert with an auto-increment primary key: every new record is an append, which never moves existing records and does not trigger leaf node splits.

2) The smaller the primary key length, the better

First, the smaller the primary key, the smaller the leaf nodes of secondary indexes and the less space those indexes take.

Second, the smaller the primary key, the more keys fit in one data page; the non-leaf nodes of the clustered B+ tree are data pages, so more keys per page means fewer B+ tree levels and more efficient reads and writes.

05 | A deep dive into indexes (part 2)

1. Covering indexes

If the fields a query needs are all contained in a secondary index, the query can be answered from that index alone, avoiding the lookup back to the primary key index (a covering index).

A related application is the deferred join: look up primary keys through a covering index first, then join back for the full rows.

2. Leftmost prefix rule

In a multi-column (composite) index, entries are sorted by the columns in the order they appear in the index definition. Retrieval can be accelerated not only by the full index definition but also by any leftmost prefix: the leftmost N columns of a composite index, or the leftmost M characters of a string index.

Because leftmost prefixes are supported, there is generally no need for a separate index on a when a composite index (a, b) already exists.
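A small sketch of the rule; table and index names are assumptions:

```sql
CREATE TABLE employees (
  id INT NOT NULL PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  KEY idx_a_b (a, b)
) ENGINE = InnoDB;

SELECT id FROM employees WHERE a = 3;            -- uses idx_a_b (leftmost prefix a)
SELECT id FROM employees WHERE a = 3 AND b = 7;  -- uses the full composite index
SELECT id FROM employees WHERE b = 7;            -- cannot use idx_a_b to locate rows
```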

3. Index condition pushdown

A leftmost prefix lets the engine locate candidate records in the index. What about the remaining conditions on columns that are in the index but not part of the prefix used?

Before MySQL 5.6, the engine could only go back to the table, fetching each row from the primary key index and checking the remaining columns there. The index condition pushdown feature introduced in MySQL 5.6 evaluates, during the index scan, the conditions on columns contained in the index, filtering out non-matching entries first and reducing the number of lookups back to the table.

4. Rebuild the index

Is it reasonable to rebuild an ordinary index k with a drop-and-add pair of ALTER statements, and the primary key index with another drop-and-add pair?

Why rebuild an index at all? Deletions and page splits leave holes in the data pages. Rebuilding creates a fresh index and inserts the data in order, maximizing page utilization.

Rebuilding index k this way is reasonable and does save space. Rebuilding the primary key this way does not make sense: either dropping or creating the primary key rebuilds the whole table, so the first of the two statements is wasted work. To rebuild the table (and all of its indexes), use alter table T engine=InnoDB instead.

5. How secondary indexes are stored

Secondary index leaf nodes store the primary key. If the secondary index definition already includes some primary key column, InnoDB appends only the remaining primary key columns after the defined columns, without duplicating the ones already present.

For example, with primary key (a, b): an index defined as (c) is stored as (c, a, b); an index defined as (c, a) is stored as (c, a, b); an index defined as (c, b) is stored as (c, b, a).

09 | Ordinary index or unique index: how should you choose?

1. Query performance: ordinary index vs unique index

With an ordinary index, after finding the first matching record the engine keeps scanning until it reaches the first non-matching record. With a unique index, uniqueness guarantees a single match, so the scan stops at the first matching record.

The performance difference is minimal, because the engine reads and writes whole pages: continuing the scan usually costs only one more pointer move and one comparison.

2. The change buffer

When a data page needs to be updated, InnoDB updates it directly if the page is already in memory; if it is not, InnoDB records the update in the change buffer instead of reading the page from disk. The next time a query needs that page, the page is read into memory and the buffered changes for it are applied.

The change buffer is persistent data: it has a copy in memory and is also written to disk. Applying the buffered operations to the original data page to produce the up-to-date result is called merge. Besides being triggered when the page is accessed, merges are also performed periodically by background threads.

The change buffer can significantly improve update performance because it reduces random disk reads.

For write-heavy, read-light workloads, the chance that a page is read right after being written is small, so the change buffer works best; billing and logging systems are typical examples. Conversely, if the workload queries a row immediately after writing it, the update is recorded in the change buffer only to have the merge triggered right away because the page must be read; this does not reduce random I/O at all and adds the cost of maintaining the change buffer, so for such workloads the change buffer is counterproductive.

3. Update performance: ordinary index vs unique index

For a unique index, every update must first check whether it violates the uniqueness constraint. For example, to insert the record (4, 400), the engine must check whether a record with k = 4 already exists, which requires reading the page into memory; and once the page is in memory, updating it directly is faster than buffering the change. Therefore the change buffer cannot be used for unique index updates; in practice only ordinary indexes can use it.
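For reference, the change buffer is tunable through two InnoDB variables; the adjustment below is only a sketch for a write-heavy table with ordinary secondary indexes:

```sql
SHOW VARIABLES LIKE 'innodb_change_buffering';       -- which operations may be buffered ('all' by default)
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size'; -- max share of the buffer pool, in percent (default 25)

-- Give the change buffer a larger share of the buffer pool (50 is the documented maximum).
SET GLOBAL innodb_change_buffer_max_size = 50;
```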

4. Redo log and change buffer

The redo log mainly saves the I/O of random disk writes (turning them into sequential writes), while the change buffer mainly saves the I/O of random disk reads.

The change buffer's content is recorded in the redo log when the transaction commits, so it can be recovered during crash recovery.

The merge process: read the (old) data page from disk into memory; look up all change buffer records for that page (there may be several) and apply them in order, producing the new version of the page in memory; write a redo log entry covering both the data change and the change to the change buffer. At that point the merge is finished. The in-memory data page and the in-memory change buffer page are now dirty, and their disk copies have not been modified yet; flushing them back is a separate process.

5. Choosing between ordinary and unique indexes

Business correctness comes first: if the application cannot guarantee uniqueness, or simply requires the database to enforce it, there is no choice but to create a unique index.

In "archive library" scenarios, however, ordinary indexes are worth considering. For example, if online data is kept for only half a year and historical data is moved to an archive library, the archiving process itself guarantees there are no unique key conflicts, so the archive table's unique indexes can be changed to ordinary indexes to speed up archiving.

10 | Why does MySQL sometimes choose the wrong index?

1. The basis on which the optimizer selects indexes

The optimizer weighs several factors: the estimated number of rows to scan, whether a temporary table is needed, whether sorting is needed, whether a primary key or secondary index is used, and so on. The key factor is the estimated number of rows scanned.

MySQL cannot know exactly how many records match a condition before actually executing the statement; it can only estimate from statistics. The statistic used is the index's cardinality ("distinctness"): the more distinct values an index has, the better its selectivity.

2. How index cardinality is calculated

Cardinality comes from sampling rather than exact row counts. By default InnoDB picks N data pages, counts the distinct values on them, averages the result, and multiplies by the number of pages in the index to get the cardinality. When the number of changed rows exceeds 1/M of the total, the statistics are recalculated automatically. The innodb_stats_persistent setting chooses between the two statistics modes (persistent vs in-memory), which use different values of N and M.

3. Remedial measures

Run analyze table t to recompute the index statistics.

Use force index in the query to force a particular index.
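A sketch of both remedies; the table t and index idx_a are placeholders:

```sql
ANALYZE TABLE t;                     -- recompute the sampled index statistics

SELECT * FROM t FORCE INDEX (idx_a)  -- override the optimizer's index choice
WHERE a BETWEEN 10000 AND 20000;
```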

11 | How should you add an index to a string column?

1. Prefix index

A prefix index with a well-chosen length can save space without adding much extra query cost. How do you decide how long the prefix should be? Count the number of distinct values for each candidate prefix length and pick the shortest prefix whose distinct count is close enough to that of the full column.
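A sketch of that selection process for an email column; the users table and column names are assumptions:

```sql
SELECT
  COUNT(DISTINCT email)          AS full_col,
  COUNT(DISTINCT LEFT(email, 4)) AS prefix4,
  COUNT(DISTINCT LEFT(email, 5)) AS prefix5,
  COUNT(DISTINCT LEFT(email, 6)) AS prefix6
FROM users;

-- Pick the shortest prefix whose distinct count is acceptably close to full_col, e.g.:
ALTER TABLE users ADD INDEX idx_email_6 (email(6));
```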

Using a prefix index forgoes the covering index optimization: because the prefix does not contain the whole value, the engine must always go back to the table to check the full value.

2. Other common approaches

Store the string reversed and build a prefix index on that, to work around strings whose prefixes are not selective enough.

Add a hash column computed from the string and index it: query performance is stable, at the cost of extra storage and CPU.

Neither of these two approaches supports range scans.

13 | Why does the table file stay the same size after deleting half the data?

1. Deletion mechanism

The delete command merely marks the record's slot (or a whole data page) as "reusable"; the disk file size does not change. In other words, the tablespace cannot be reclaimed with delete. The marked space can be reused later, but until then it is just "holes" in the file.

2. Inserts (through page splits) and updates that change indexed values also create holes.

3. alter table (rebuilding the table)

Create a table B with the same structure as table A and insert table A's rows into B in ascending order of primary key id.

This echoes the earlier point that inserting in ascending primary key order is the most efficient way to load data.

The command alter table A engine=InnoDB performs this rebuild.

The most time-consuming step is copying the data into the temporary table; if new rows were written to table A during the copy, they would be lost. Therefore, with this non-online approach, no updates may happen on table A for the whole duration of the DDL.

4. Online DDL

Process:

· Create a temporary file and scan all data pages of table A's primary key index;

· Build a B+ tree from table A's records and store it in the temporary file;

· While the temporary file is being generated, record every operation on table A in a **row log** file;

· After the temporary file is complete, apply the operations in the row log to it, producing a data file logically identical to table A;

· Replace table A's data file with the temporary file.

This Online DDL still needs to acquire an MDL write lock; why can it still be called "online"?

The ALTER statement does acquire the MDL write lock when it starts, but the write lock is downgraded to a read lock before the data copy begins. To be "online", the MDL read lock must not block DML (insert, delete, update); it still prevents other threads from running DDL on the same table at the same time, which protects the operation itself. For a large table, the data copy is by far the most time-consuming part of Online DDL, and DML is allowed during it, so the lock is exclusive only for a very small fraction of the whole process; from the business's point of view the operation is effectively online.

It should be added that the rebuild is I/O and CPU intensive, so on a production system it must be scheduled and throttled carefully.

5. optimize table, analyze table and alter table

Since MySQL 5.6, alter table t engine=InnoDB (i.e. recreate) is Online DDL by default;

analyze table t does not rebuild the table: it only re-collects the table's index statistics, without modifying data (only an MDL read lock is taken);

optimize table t is equivalent to recreate + analyze.
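The three commands side by side; the table name t is a placeholder:

```sql
ALTER TABLE t ENGINE = InnoDB;  -- recreate: rebuilds the table, reclaiming the holes
ANALYZE TABLE t;                -- re-samples index statistics only, no rebuild
OPTIMIZE TABLE t;               -- recreate + analyze
```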

16 | How does "order by" work?

1. sort_buffer

MySQL allocates each thread a block of memory for sorting, called sort_buffer.

Sorting may happen entirely in memory or may require an external sort, depending on the amount of data to sort versus the sort_buffer_size parameter. If the data does not fit, temporary files on disk are used to assist the sort.

2. Full-field sort

1) Initialize sort_buffer, deciding to put in the fields to be returned (the select list) plus the sort field;

2) Filter the required records according to the where condition, fetch those fields, and put them into sort_buffer;

3) Quick-sort the rows in sort_buffer by the sort field and return the result.

The base table is read only once; everything else happens in sort_buffer and temporary files. However, if the query returns many fields, each row in sort_buffer is large, fewer rows fit in memory at once, and the sort spills into many temporary files, hurting performance.

3. Rowid sort

max_length_for_sort_data is the MySQL parameter that controls the row length allowed for sorting: if a single row exceeds this value, MySQL considers the rows too large and switches algorithms.

Only the primary key id and the sort field are put into sort_buffer; after sorting, the full rows are fetched again by id.

Rowid sort therefore reads table t one extra time, in exchange for using less sort_buffer space per row.

4. Full-field sort vs rowid sort

One of MySQL's design ideas: if memory is sufficient, use more memory and minimize disk access.

For InnoDB tables, rowid sort requires extra reads back to the table, so it is not preferred when memory is enough.

5. Sorting with an index

Design a covering index containing the sort field(s) plus the fields to return, so neither sorting nor table lookups are needed.

That said, you should not create a composite index covering every field of every statement just to get covering-index behaviour everywhere; indexes have maintenance costs, so it is a trade-off.
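A sketch of sorting via a covering index; the table and index are assumptions modeled on a city/name lookup:

```sql
CREATE TABLE t_city (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  city VARCHAR(16) NOT NULL,
  name VARCHAR(16) NOT NULL,
  age  INT NOT NULL,
  KEY city_user_age (city, name, age)   -- covers the query below
) ENGINE = InnoDB;

-- Rows come out of the index already ordered by name within city = 'HangZhou',
-- so EXPLAIN shows "Using index" and no "Using filesort".
SELECT city, name, age
FROM t_city
WHERE city = 'HangZhou'
ORDER BY name
LIMIT 100;
```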

Replication and distributed topics

Topics: primary, standby, and replica architecture; primary/standby replication, replication delay, primary/standby switchover, and read/write separation

24 | How does MySQL keep the primary and standby consistent?

1. The primary/standby synchronization process (the role of the binlog)

Standby B maintains a long connection to primary A, and A has a dedicated internal thread servicing that connection.

Run the change master command on standby B to set primary A's IP address, port, user name and password, and the binlog position from which to start requesting logs.

Run the start slave command on standby B. The standby then starts two threads, io_thread and sql_thread; io_thread is responsible for the connection to the primary.

After checking the user name and password, primary A reads its binlog locally from the position sent by B and streams the log to B. B writes the received binlog to a local file called the relay log.

sql_thread reads the relay log, parses the commands in it, and executes them.

2. Binlog format

statement: the binlog records the SQL text. The primary may execute the statement using index a while the standby uses index b, so the two can end up affecting different rows.

row: the binlog records the primary key IDs and the values of the rows actually changed (e.g. the rows actually deleted).

3. The mixed binlog format

The statement binlog may cause an inconsistency between the master and the slave.

The downside of the row format is space: deleting 100,000 rows with one DELETE statement takes only a few dozen bytes of binlog in statement format, but in row format all 100,000 deleted records are written to the binlog, which also costs I/O and slows execution.

Mixed binlog: MySQL determines for itself whether this SQL statement may cause a master/slave inconsistency. If so, use the row format, otherwise use the statement format.

Row also has the added benefit of making it easy to recover data.

When a delete is executed, the row-format binlog also stores the entire content of each deleted row. So if a delete removes the wrong data, you can turn those binlog events back into insert statements and restore the deleted rows.

4. The double-M structure and the circular replication problem

Double-M structure (unlike M-S): nodes A and B are each other's primary and standby, so a switchover does not require changing the replication relationship. However, it introduces the problem of circular replication.

Solution: require the two instances to have different server IDs. When a standby replays a binlog it received, the new binlog it generates keeps the original server ID. Each instance checks the server ID of logs received from its primary and discards any log whose server ID equals its own.

25 | How does MySQL ensure high availability?

1. Primary/standby (replication) delay

Replication delay: for the same transaction, the difference between the time it completes on the standby and the time it completed on the primary. Running show slave status on the standby reports seconds_behind_master, the current delay in seconds.

The most direct symptom of replication delay is that the standby consumes the relay log more slowly than the primary generates the binlog.

2. Causes and remedies

1) The standby's machine is less powerful than the primary's.

2) The standby is under heavy pressure, because the load on the standby is often overlooked.

It can be handled like this:

One primary with several replicas: besides the standby, attach several read replicas to share the read load.

Or stream the binlog to an external system such as Hadoop and let it serve the analytical queries.

3) Large transactions.

4) The standby replays with a single thread (in older versions) while the primary writes with many threads.

3. Primary/standby switchover (in the presence of replication delay)

1) Reliability-first strategy:

Check standby B's current seconds_behind_master; if it is below some threshold (say 5 seconds), proceed, otherwise keep retrying this step.

Change primary A to read-only, i.e. set readonly to true;

Wait until standby B's seconds_behind_master drops to 0;

Make standby B readable and writable, i.e. set readonly to false;

Switch business traffic to standby B.

As you can see, this switching process has a window of unavailability.

2) Availability-first strategy:

If traffic is switched to standby B immediately and B is made writable without waiting for synchronization, the system has almost no unavailable time. We tentatively call this the availability-first process; its cost is the possibility of data inconsistency.

3) The availability-first strategy and the binlog format

Data inconsistencies are easier to spot when using a row format binlog. When mixed or statement binlogs are used, data inconsistencies are difficult to detect.

26 | Why can the standby lag by hours?

1. How multi-threaded replication on the standby works

Could transactions be handed to worker threads round-robin, the first transaction to thread 1, the second to thread 2?

No. Because of CPU scheduling, the second transaction may well finish before the first. If the two transactions update the same row, they would then be applied in the opposite order on the standby, causing primary/standby inconsistency.

Could the multiple update statements of one transaction be distributed to different threads?

No. Suppose a transaction updates a row in table t1 and a row in t2, and the two updates go to different threads. Even if the final result matches the primary, a query on the standby issued at the moment the t1 update has finished but the t2 update has not would see the transaction "half applied", breaking transaction isolation.

Two basic principles:

Two transactions updating the same row must be distributed to the same thread.

A single transaction cannot be split; all of it must go to the same thread.

2. Distribution strategies built on these principles (self-designed)

1) Per-table distribution: if two transactions update different tables, they can run in parallel; a cross-table transaction binds its tables to the same worker.

Drawback: with a hot table that every update transaction touches, all transactions end up on the same worker and replication degenerates to single-threaded.

2) Per-row distribution: if two transactions do not update the same rows, they can run in parallel on the standby. Obviously this requires the binlog format to be row.

3. MySQL 5.6's parallel replication strategy

The official MySQL 5.6 release supports parallel replication, but only at per-database granularity.

Hashing is cheap because only the database name is needed, and an instance rarely has so many databases that the hash table becomes huge. No particular binlog format is required, since even statement-format binlogs carry the database name. The strategy is useless, however, if all the tables on the primary live in the same database.

27 | The primary has failed: what about the replicas?

1. The basic one-primary, many-replica topology

A and A' form a primary/standby pair, and replicas B, C, and D all point to primary A. The primary handles all writes and some reads; the remaining read requests are shared by the replicas.

2. Switchover in a one-primary, many-replica topology

After a switchover, A' becomes the new primary, and replicas B, C, and D must be repointed to A'. This repointing is what makes the switchover more complex.

To make node B a replica of A', run the change master command; its MASTER_LOG_FILE and MASTER_LOG_POS parameters name the binlog file and offset on the new primary. The catch is that the same log sits at different positions on A and on A'. To avoid losing data during the switchover, we always pick a position slightly earlier than strictly necessary and then skip the transactions replica B has already applied.

The problem: replica B already contains row R, synchronized from A's binlog, and the new primary A' also contains R, whose insert is recorded after the chosen position. When B runs change master pointing at that position on A', the insert of R is replayed on B, and B's replication thread reports Duplicate entry 'id_of_R' for key 'PRIMARY', a primary key conflict, and stops replication.

So there are two common ways to actively skip such errors during the switchover. One is to skip one transaction at a time (sql_slave_skip_counter).

The other is to set the slave_skip_errors parameter to skip specific errors. Two error codes are commonly hit during a switchover: 1062, a unique key conflict when inserting data, and 1032, row not found when deleting data.

3. GTID

MySQL 5.6 introduced the GTID (Global Transaction Identifier), a globally unique identifier generated when a transaction is committed.

It has two parts, in the format GTID = server_uuid:gno

server_uuid is generated automatically the first time an instance starts and is globally unique; gno is an integer starting at 1 that is assigned to a transaction when it commits and then incremented by 1.

4. Master/slave replication based on GTID

With GTID-based replication, the CHANGE MASTER TO command no longer needs the MASTER_LOG_FILE and MASTER_LOG_POS parameters; setting MASTER_AUTO_POSITION = 1 declares that the replication relationship uses the GTID protocol.
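A sketch of repointing replica B at the new primary A' under the GTID protocol; host, port, and credentials are placeholders:

```sql
CHANGE MASTER TO
  MASTER_HOST = 'a-prime.example.com',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_AUTO_POSITION = 1;   -- GTID protocol: no MASTER_LOG_FILE / MASTER_LOG_POS

START SLAVE;
```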

When the start slave command is executed on instance B, the logic for obtaining binlogs is:

Instance B connects to the new primary A' using the replication protocol;

B sends its executed GTID set, set_b, to A';

A' computes the difference between set_a and set_b, i.e. all GTIDs present in set_a but not in set_b, and checks whether its local binlogs still contain every transaction in that difference:

a. if not, A' has already purged binlogs that instance B needs, and it returns an error;

b. if everything is there, A' finds the first transaction in its binlog that is not in set_b and, starting from that transaction, sends the binlog to B in order.

Design idea: under GTID-based replication, once the relationship is established the primary is responsible for making sure the logs it sends to the standby are complete; if the logs instance B needs no longer exist, A' refuses to send anything. This differs from position-based replication, where the standby decides the position and the primary simply sends from that position without checking log completeness.

28 | What are the pitfalls of read/write separation?

1. Read/write separation architectures

Direct client connection: no proxy hop, so query performance is slightly better, the overall architecture is simpler, and troubleshooting is easier. However, the client must understand the back-end deployment details, so operations such as a primary/standby switchover or a library migration are visible to it and require updating the connection information.

Proxy architecture: friendlier to clients, which need not care about back-end details; connection maintenance and back-end metadata maintenance are done by the proxy. The overall architecture, however, is more complex.

2. Stale reads caused by replication delay, and the solutions

Reading an out-of-date state of the system from a replica is called a "stale read". Whatever the optimizations, replication delay cannot be 100% avoided.

1) Route requests that must see the latest data to the primary. This is the most commonly used scheme, but if no query can tolerate stale reads it amounts to abandoning read/write separation, with all read and write load back on the primary.

2) After a write on the primary, sleep for a while before reading from the replica.

3) Confirm the replica has caught up before querying it: before each read on the replica, check that seconds_behind_master is 0 and wait otherwise. More precisely, compare binlog positions or GTID sets: Retrieved_Gtid_Set is the set of GTIDs in all logs the replica has received, and Executed_Gtid_Set is the set of GTIDs it has already executed; if the two are equal, everything received has been applied. Comparing positions or GTID sets is more accurate than checking whether seconds_behind_master is 0.

The semi-sync scheme: when a transaction commits, the primary sends the binlog to the replica; the replica sends back an ACK once it has received the binlog; only after receiving the ACK does the primary confirm the completed transaction to the client. Under sustained delay, however, this can lead to excessive waiting.

If a power failure occurs in the master database before some binlogs are sent to the slave database, data may be lost if you use the normal asynchronous replication mode, but semi-sync can solve this problem.

Other topics

14 | count(*) is so slow: what should I do?

Why doesn't InnoDB store the count(*) directly? Because of MVCC, which rows are visible depends on the transaction, so InnoDB must walk the rows and check visibility one by one.

The show table status command is fast but inaccurate: its row count comes from sampled statistics, the same statistics that sometimes cause the optimizer to pick the wrong index.

Counting in Redis? A distributed counter cannot be kept consistent with the database in real time.

Solution: keep the count in a single row of a table inside the database, updated in the same transaction as the data change, relying on InnoDB's transactions and isolation; to improve write concurrency, split the counter across several rows and sum them logically when reading.
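A minimal sketch of the in-database counter kept consistent with the data; both tables here (rows_count and orders) are assumptions for illustration:

```sql
CREATE TABLE rows_count (
  table_name VARCHAR(64) NOT NULL PRIMARY KEY,
  cnt        BIGINT NOT NULL
) ENGINE = InnoDB;

-- Insert a business row and bump the counter in one transaction, so readers
-- under InnoDB isolation never see the two out of sync.
BEGIN;
INSERT INTO orders (user_id, amount) VALUES (42, 9.99);
UPDATE rows_count SET cnt = cnt + 1 WHERE table_name = 'orders';
COMMIT;
```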

17 | How do you display random rows correctly?

19 | Why is this statement that reads only one row so slow?

Possible causes: the query is waiting for a table flush (Waiting for table flush) or for an MDL lock (Waiting for table metadata lock: find the thread holding the MDL write lock and kill it). It can also be blocked by row locks: select * from t where id=1 lock in share mode needs a read lock on the row, so if another transaction already holds a write lock on that row, the select is blocked.

22 | What short-term, risky methods can boost MySQL performance in an emergency?

29 | How do you tell whether a database instance has a problem?