MySQL Online DDL principle and step pit

The Data Definition Language (DDL) of MySQL includes adding or subtracting fields and indexes. Before MySQL 5.6, MySQL DDL operations would copy the original table and modify it accordingly. For example, MySQL DDL operations on table A are as follows:

  1. Create A new table B as defined in table A
  2. Add A write lock to table A
  3. Perform the DDL specified operations on table B
  4. Copy the data in A to B
  5. Release A’s write lock
  6. Delete A table
  7. Rename table B to A

In the 2-4 process, if the amount of data in table A is large, the copying process to table B consumes A lot of time and occupies additional storage space. In addition, both DDL and DML on table A will block and fail to serve because DDL operations occupy A write lock on table A.

As a result, MySQL 5.6 adds Online DDL, which allows DDL operations to be performed without interrupting database services.

usage

ALTER TABLE tbl_name ADD PRIMARY KEY (column), ALGORITHM=INPLACE, LOCK=NONE;
Copy the code

The ALGORITHM and LOCK arguments can be specified in the ALTER statement to specify the DDL execution mode and DML strategy control during DDL, respectively

  1. ALGORITHM=INPLACE Indicates that no table COPY occurs during DDL execution and DML execution is allowed concurrently. (INPLACE does not consume large disk I/O and CPU as COPY does, thus reducing database load.) It also reduces the use of the buffer pool and avoids the performance problems caused by the massive deletion of the original query cache from the buffer pool.

    If ALGORITHM=COPY is set, DDL will be copied as it was before MySQL 5.6, blocking all DML in the process. Alternatively, you can set ALGORITHEM=DAFAULT and let MySQL choose the execution method as much as possible to ensure DML concurrency.

  2. LOCK=NONE indicates that DML operations are not locked and all DML operations are allowed during DDL. In addition, there are EXCLUSIVE (which holds an EXCLUSIVE lock and blocks all requests, suitable for scenarios where DDL needs to be completed as quickly as possible or the service library is idle), SHARED (which allows SELECT but blocks INSERT UPDATE DELETE, Apply to scenarios such as data warehouses that allow delay in writing data) and DEFAULT (select the value of LOCK based on the DDL type to ensure maximum concurrency)

Not all DDL Operations can be performed in INPLACE. For details, see MySQL Reference Manual – Online DDL Operations.

For example, the data type of the modified column in Table 14.10 does not support INPLACE

Operation In Place Rebuilds Table Permits Concurrent DML Only Modifies Metadata
Changing the column data type No Yes No No

Change the column_name of type FLOAT to INT

ALTER TABLE tbl_name MODIFY COLUMN column_name INT, ALGORITHM=INPLACE, LOCK=NONE;
Copy the code

complains

ERROR: 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.
Copy the code

Implementation process

  1. Initialization: The amount of concurrency allowed in the DDL is calculated based on the storage engine, user-specified operations, user-specified ALGORITHM, and LOCK. During this process, a shared metadata LOCK is obtained to protect the structure definition of the table
  2. Execute DDL: Based on the first step, decide whether to upgrade the shared metadata lock to exclusive metadata lock (only during statement preparation), then generate the statement and execute it. Shared metadata lock during execution ensures that no other DDLS are executed at the same time, but DML can execute normally
  3. Commit: Upgrade the shared metadata lock to exclusive metadata lock, delete the old table definition, and commit the new table definition

Steps that occupy exclusive MDL in an Online DDL procedure execute quickly, so DML statements are rarely blocked.

However, before or while the DDL is executing, other transactions can acquire the MDL. Since the USE of exclusive MDL is required, the above two MDL references must wait until the other transaction holding the Metadata lock is committed or rolled back.

Hit the pit

MDL (MDL) is a table lock introduced by MySQL 5.5. MDL (MDL) is automatically added when accessing a table to ensure the correctness of reading and writing. When a DML operation is performed on a table, the MDL read lock is added. Add MDL write locks when performing DDL operations.

ALGORITHM=INPLACE Online DDL is used in order to ensure that DML can execute concurrently in large table DDL. However, there is still a risk of deadlock. The problem is where the Online DDL process requires exclusive MDL.

For example, if Session 1 performs a SELECT operation in a transaction, the shared MDL is obtained. Since it is executed within a transaction, the shared MDL is released only after the transaction ends.

# Session 1 > START TRANSACTION; > SELECT * FROM tbl_name; # Execute normallyCopy the code

If Session 2 wants to perform a DML operation, all it needs to do is get the shared MDL, and it can still perform.

# Session 2 > SELECT * FROM tbl_name; # Execute normallyCopy the code

If Session 3 tries to perform a DDL operation, it will block because Session 1 is already occupied by the shared MDL, and DDL will not be able to execute properly because Session 1 needs to obtain the exclusive MDL first.

# Session 3 > ALTER TABLE tbl_name ADD COLUMN n INT; # blockCopy the code

You can see by showing ProcessList that the ALTER operation is waiting for MDL.

+----+-----------------+------------------+------+---------+------+---------------------------------+-----------------+
| Id | User            | Host             | db   | Command | Time | State                           | Info            |
│----+-----------------+------------------+------+---------+------+---------------------------------+-----------------+53048 | | | 11 root | 172.17.0.1: demo | Query | 3 | Waiting for table metadatalock | alter table. | +----+-----------------+------------------+------+---------+------+---------------------------------+-----------------+
Copy the code

Since an EXCLUSIVE MDL takes precedence over a shared MDL, all subsequent attempts to obtain a shared MDL will also block

# Session 4 > SELECT * FROM tbl_name; # blockCopy the code

At this point, both DML and DDL will block until Session 1 commits or is rolled back and the shared MDL occupied by Session 1 is released.

There are two main reasons for this problem:

  1. The transaction in Session 1 was not committed in time, thus blocking the DDL for Session 3
  2. Session 3 Online DDL blocks subsequent DML and DDL

In the case of Problem 1, many ORMs (such as Pymysql) wrap user statements as transactions by default, which is what happens in Session 1 if the client program interrupts and exits without committing or rolling back the transaction. In this case, you can find the thread of the unfinished transaction in infomation_Schema. innodb_trx and force it to exit

> SELECT * FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421564480355704
                 trx_state: RUNNING
               trx_started: 2020-07-21 01:49:41
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 0
       trx_mysql_thread_id: 9
                 trx_query: NULL
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 0
     trx_lock_memory_bytes: 1136
           trx_rows_locked: 0
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
       trx_schedule_weight: NULL
1 row in set (0.0025 sec)
Copy the code

You can see that Session 1 is executing a transaction with trx_mysql_thread_id of 9, and then execute KILL 9 to interrupt the transaction in Session 1.

As for Problem 2, when there are too many queries, the number of blocked sessions increases rapidly. In this case, you can first interrupt DDL operations to prevent excessive impact on services. You can also try a master/slave switch after changing the table structure on the slave or use a third-party tool such as PT-OSC.