1. Review and think

1.1 SQLException: Lock wait timeout exceeded; try restarting transaction,please rollback!

When this error occurs again, don’t be too proud to say that the database has a problem, call the DBA… (Multiple occurrences of UAT)

The first problem is currently occurring for the following reasons:

  1. The transaction cannot commit because disk space is full. (Disk full is a very dangerous operation, will cause the binlog write bad, the standby database can not synchronize and need to restore the standby database)
  2. The update transaction did not commit normally and generated an exclusive lock, resulting in other update transactions could not obtain the lock and the transaction timed out.

1.2 condition query stuck, how to re-run all through, how to do, anxious people (after migration comparison actually appear).

CTRL +C terminates in the Truncate table process. Truncate transactions always exist on a fragment, so all operations on the table will time out.

1.3 Query And Update Stuck…… As everyone knows, your previous Alter table did not succeed……

The problem of DBProxy is not discussed in this article. When a query transaction does not commit normally and occupies a shared lock, alter table cannot obtain the MDL lock, causing a waiting time. Waiting fortable metadata lock (show processList)

2 Detailed analysis of principle

2.1 What is MDL lock?

To maintain data consistency of table metadata in a concurrent environment, metadata cannot be written to when there are active transactions (explicit or implicit) on the table. Therefore, MDL lock was introduced from MySQL5.5 to protect the metadata information of the table. It is used to resolve or ensure consistency between DDL operations and DML operations.

For the introduction of MDL, it mainly solves two problems. One is transaction isolation. For example, in the repeatable read isolation level, session A modiates the table structure during the second query, and the results of the two queries are inconsistent, which cannot meet the requirements of repeatable reads. Another problem is data replication. For example, when session A executes multiple update statements, another session B changes the table structure and submits it first. As A result, the slave performs ALTER first and then update again, resulting in replication errors.

Therefore, if there are active transactions (uncommitted or rolled back) on the table during the above operation, the session requesting the write will wait in the Metadata lock wait. For example, the following situation:

Without the protection of an MDL lock, transaction 2 can directly perform DDL operations and cause transaction 1 to fail, as in version 5.1. The MDL lock was added in version 5.5 to protect this from happening. Transaction 1 holds the MDL lock in SHARED_READ mode, and transaction 2 holds the EXCLUSIVE lock in DDL mode, so transaction 2 must wait.

Note: Metadata Lock Wait occurs in InnoDB tables that support transactions and MyISAM tables that do not support transactions. If Metadata Lock Wait occurs, all subsequent access to the table will be blocked. As a result, connections are piled up and services are affected.

MySQL > autocommit=0; When read_commited, regardless of whether the session’s first statement is SELECT or DML, a transaction is started and the MDL lock is held until commit.

The design of Oracle is to create transactions only when the first update statement of a session is initiated, reducing the likelihood of blocking on a system with many reads. In particular, when a developer issues a SELECT statement, it assumes that if there is no update, there is no commit. On MySQL, however, it is dangerous to issue a SELECT statement and forget to commit.

2.2 Common MDL Lock scenarios and detailed explanations

1) Perform the ALTRE operation when DML operation is performed

Mysql > alter table mysqldump/mysqlpump; alter table mysqlpump

3) If a transaction is not committed or rolled back after it is displayed or implicitly enabled, such as if it is not committed or rolled back after the query is completed, using ALTER will be blocked

4) on the table query transaction failure, such as the columns of the query does not exist, statement fails to return, but the transaction did not submit to the alter will still be blocked at this time

Detailed test explanation:

1) Perform the ALTRE operation when DML operation is performed

# SESSION A mysql> insert into yetest2 select * from yetest1; # SESSION B mysql> alter table yetest2 add yeColumn int; // Wait for SESSION A to complete; # SESSION C mysql> show processlist; +-----+------+-------+------+--------------+--------+-----------------------------------------+------------ | Id | User | Host| db | Command | Time | State | Info |+-----+------+------+----- +--------------+--------+------------------------------------------+------------ | 267 | root | localhost | sbtest | Query | 7 | Sending data | insert into yetest2 select * from yetest1 | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table yetest2 add yeColumn int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------- ---------+ 3 rows in set (0.00 SEC) # SESSION D mysql> select * from yetest2 limit 10; // Wait for metadata lock; # SESSION E mysql> show processlist; +-----+------+-------+------+--------------+--------+-----------------------------------------+------------ | Id | User | Host| db | Command | Time | State | Info |+-----+------+------+----- +--------------+--------+------------------------------------------+------------ | 267 | root | localhost | sbtest | Query | 20 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 13 |  Waiting for table metadata lock | alter table yetest2 add yeColumn int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 308 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | select *  from yetest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------- ---------+ 4 rows in set (0.00 SEC)Copy the code

Transaction 1 (SHARED_READ) holds the MDL lock. Transaction 2 (SHARED_READ) holds the EXCLUSIVE lock. Transaction 2 (SHARED_READ) holds the EXCLUSIVE lock. Queries can be stuck, is not very depressed? This is the scenario of our last migration. Truncate table belongs to DDL and locks table metadata, and can even be upgraded from lock table to lock database.

3) If a transaction is not committed or rolled back after it is displayed or implicitly enabled, such as if it is not committed or rolled back after the query is completed, using ALTER will be blocked

# SESSION A mysql> begin; mysql> select * from test2; # SESSION B mysql> alter table test2 add test3 int; // Wait for SESSION A to complete; # SESSION C mysql> show processlist; +-----+------+-------+------+--------------+--------+-----------------------------------------+------------ | Id | User | Host| db | Command | Time | State | Info |+-----+------+------+----- +--------------+--------+------------------------------------------+------------ | 267 | root | localhost | sbtest | Sleep | 36 | | NULL | | 271 | root | localhost | sbtest | Query | 30 | Waiting for table metadata lock | alter table test2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------- -+ 3 rows in set (0.00 SEC)Copy the code

4) on the table query transaction failure, such as the columns of the query does not exist, statement fails to return, but the transaction did not submit to the alter will still be blocked at this time

# SESSION A mysql> begin; mysql> select error from test2; ERROR 1054 (42S22): Unknown column 'error' in 'field list' # SESSION B mysql> alter table test2 add test3 int; // Wait for SESSION A to commit or roll back; # SESSION C mysql> show processlist; +-----+------+-------+------+--------------+--------+-----------------------------------------+------------ | Id | User | Host| db | Command | Time | State | Info |+-----+------+------+----- +--------------+--------+------------------------------------------+------------ | 267 | root | local | test | Sleep | 7  | |NULL | 271 | root | local | test | Query | 3 | Waiting for table metadata lock | alter table test2 add test3 int | |  272 | root | local | NULL| Query | 0 | starting | show processlist | 311 | root | local | NULL | Sleep | 413 | | NULL +-----+------+-----------+--------+---------+------+-------------------------------------------+-------------- 4 rows in Set (0.00 SEC) # SESSION D mysql> select * from information_schema.innodb_trx; The Empty set (0.00 SEC)Copy the code

In fact, the transaction in SESSION A is not started, but the alter action in SESSION B is blocked because the select statement to obtain the table metadata is grammatically valid. Although the execution fails, the metadata lock is still not released.

When you view the current open transaction through SESSION D, you will find none, and thus cannot find the cause. Performance_schema. events_statementS_current: performance_schema. events_statements_current: performance_schema. events_statements_current: performance_schema. events_statements_current: performance_schema. events_statements_current: performance_schema. events_statements_current: performance_schema. events_statements_current

mysql> select * from performance_schema. events_statements_current\G
*************************** 1. row ***************************
              THREAD_ID: 293
               EVENT_ID: 32
           END_EVENT_ID: 32
             EVENT_NAME: statement/sql/select
                 SOURCE: socket_connection.cc:211
            TIMER_START: 212721717099954000
              TIMER_END: 212721717213807000
             TIMER_WAIT: 113853000
              LOCK_TIME: 0
               SQL_TEXT: select error from test2
                 DIGEST: 0bbb2d5d1be45e77debea68111264885
            DIGEST_TEXT: SELECT ERROR FROM `test2` 
         CURRENT_SCHEMA: test
            OBJECT_TYPE: NULL
          OBJECT_SCHEMA: NULL
            OBJECT_NAME: NULL
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 1054
      RETURNED_SQLSTATE: 42S22
           MESSAGE_TEXT: Unknown column 'error' in 'field list'
                 ERRORS: 1
Copy the code

To resolve this problem, you can kill the session where the DDL resides.

In addition, SESSION A must explicitly start A transaction during testing, otherwise the query will be implicitly rolled back to end and the above scenario cannot be reproduced. SESSION B does not block immediately after performing alter, but immediately starts to copy to TMP table. After this process is complete, MDL lock wait is performed. The alter operation is divided into three steps: create a temporary new table, insert data from the old table, and rename to the old table. In this case, the MDL lock is not required until the last step, so the copy process will not be blocked. Show ProcessList and Information_schema.innodb_trx have no information to refer to because no query is in progress and the query has not entered the InnoDB layer (failure returns).

MDL can occur if the following operations are performed:

1. Create and delete indexes.

2. Modify the table structure.

Table maintenance (optimize table, repair table, etc.)

4. Delete the table.

5. Obtain the table-level write lock on the table (lock table tab_name write).

3. To summarize

Many times when the database error occurs, it is not necessarily the problem of the database, do not have to urgently call the database personnel to solve. We need to develop a sense that we’re not just writing apps, we’re writing about financial systems, and we’re supposed to have some troubleshooting skills. If only do problem porters, this is also a question of our own level? Only when people ask good questions can the card center system be stable.

Of course, there are many defects and imperfections between MySQL and Oracle. The introduction of distribution further introduces the complexity of the system. This also put forward a greater test to the level of our operation and maintenance personnel. Operation and maintenance personnel should also learn and think steadily from practice. Only when they meet problems can they be calm and put forward effective solutions.