Author: Liu Kaiyang

I am a DBA in Beijing of ECS Delivery Service Team. I have a keen interest in learning database and surrounding technologies. I like reading books and pursue technologies.

Source of this article: original contribution

* Produced by the open source community of ecoson, the original content is not allowed to be used without authorization, please contact the small edition and indicate the source.


Some time ago in the customer site found a strange lock problem, specially used to share with you.

The phenomenon of

In MySQL version 8.0.18, a thread from a slave library waiting for a connection is observed as follows:

mysql> select * from threads;
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+---------------------------------------------+----------------------------------------------------------------------------------------------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
| THREAD_ID | NAME                                        | TYPE       | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB     | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE                           | PROCESSLIST_INFO            | PARENT_THREAD_ID | ROLE | INSTRUMENTED | HISTORY | CONNECTION_TYPE | THREAD_OS_ID | RESOURCE_GROUP |
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+---------------------------------------------+----------------------------------------------------------------------------------------------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
······
|        58 | thread/sql/event_scheduler   | FOREGROUND |          8    | NULL                | NULL                 | NULL       | Sleep         |                 NULL | Waiting on empty queue                               | NULL                                                           |                       1 | NULL | YES         | YES     | NULL            |        34147 | SYS_default    |
|        59 | thread/sql/signal_handler    | BACKGROUND |          NULL | NULL                | NULL                 | NULL      | NULL           |                NULL | NULL                                                              | NULL                                                           |                       1 | NULL | YES          | YES     | NULL            |        34148 | SYS_default    |
|        60 | thread/sql/compress_gtid_table| FOREGROUND |         10   | NULL                | NULL                 | NULL      | Daemon     |         33670997 | Suspending                                                   | NULL                                                           |                       1 | NULL | YES          | YES     | NULL            |        34149 | SYS_default    |
|        61 | thread/mysqlx/acceptor_network| BACKGROUND |         NULL | NULL               | NULL                 | NULL       | NULL          |                NULL | NULL                                                               | NULL                                                          |                     40 | NULL | YES          | YES     | NULL            |        34150 | SYS_default    |
|  46627729 | thread/mysqlx/worker          | BACKGROUND |         NULL | NULL                | NULL                 | NULL      | NULL            |               NULL | NULL                                                                | NULL                                                         |                 NULL | NULL | YES          | YES     | NULL            |        34132 | SYS_default    |
|  57810934 | thread/sql/one_connection     | FOREGROUND |     57808740 | proxy_monitor | 10.108.76.139 | NULL       | Query         |                 718 | Waiting for commit lock                                 | set global read_only=on                           |                 NULL | NULL | YES          | YES     | SSL/TLS       |        76945 | USR_default    |
|  47295992 | thread/sql/slave_io          | FOREGROUND |      47294234 | root                 | localhost            | NULL       | Connect     |         6285905 | Waiting for master to send event                 | NULL                                                           |         47294592 | NULL | YES          | YES     | NULL            |        75880 | SYS_default    |
|  47295993 | thread/sql/slave_sql         | FOREGROUND |      47294235 | root                 | localhost            | NULL       | Query         |            64768 | Waiting for dependent transaction to commit | NULL                                                         |         47294592 | NULL | YES          | YES     | NULL            |        75881 | SYS_default    |
|  47295994 | thread/sql/slave_worker      | FOREGROUND |       47294236 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                           | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75882 | SYS_default    |
|  47295995 | thread/sql/slave_worker    | FOREGROUND |       47294237 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for commit lock                                  | NULL                                                          |         47295993 | NULL | YES          | YES     | NULL            |        75883 | SYS_default    |
|  47295996 | thread/sql/slave_worker    | FOREGROUND |       47294238 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for preceding transaction to commit | NULL                                                          |         47295993 | NULL | YES          | YES     | NULL            |        75884 | SYS_default    |
|  47295997 | thread/sql/slave_worker    | FOREGROUND |       47294239 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                            | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75885 | SYS_default    |
|  47295998 | thread/sql/slave_worker    | FOREGROUND |       47294240 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                            | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75886 | SYS_default    |
|  47295999 | thread/sql/slave_worker    | FOREGROUND |       47294241 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                            | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75887 | SYS_default    |
|  47296000 | thread/sql/slave_worker    | FOREGROUND |       47294242 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75888 | SYS_default    |
|  47296001 | thread/sql/slave_worker    | FOREGROUND |       47294243 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75889 | SYS_default    |
|  47296002 | thread/sql/slave_worker    | FOREGROUND |       47294244 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75890 | SYS_default    |
|  47296003 | thread/sql/slave_worker    | FOREGROUND |       47294245 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75891 | SYS_default    |
|  47296004 | thread/sql/slave_worker    | FOREGROUND |       47294246 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75892 | SYS_default    |
|  47296005 | thread/sql/slave_worker    | FOREGROUND |       47294247 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75893| SYS_default    |
|  47296006 | thread/sql/slave_worker    | FOREGROUND |       47294248 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75894 | SYS_default    |
|  47296007 | thread/sql/slave_worker    | FOREGROUND |       47294249 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75895 | SYS_default    |
|  47296008 | thread/sql/slave_worker    | FOREGROUND |       47294250 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75896 | SYS_default    |
|  47296009 | thread/sql/slave_worker    | FOREGROUND |       47294251 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75897 | SYS_default    |
|  57811108 | thread/sql/one_connection  | FOREGROUND |       57808914 | proxy_monitor  | 10.108.76.140   | NULL     | Query          |              664 | Waiting for commit lock                                    | set global read_only=on                            |                 NULL | NULL | YES          | YES     | SSL/TLS        |        97442 | USR_default    |
|  57811279 | thread/sql/one_connection  | FOREGROUND |       57809085 | proxy_monitor  | 10.108.76.140   | NULL     | Query          |              610 | Waiting for commit lock                                    | set global read_only=on                            |                 NULL | NULL | YES          | YES     | SSL/TLS        |        98136 | USR_default    |
|  57811110 | thread/sql/one_connection  | FOREGROUND |       57808916 | proxy_monitor  | 10.108.76.139   | NULL     | Query          |              663 | Waiting for commit lock                                    | set global read_only=on                            |                 NULL | NULL | YES          | YES     | SSL/TLS        |        97419 | USR_default    |
······                                                                  
mysql> ^C
// 说明:由于连接接近2300条,为方便大家看起来没有那么枯燥,这里进行部分省略,大多数被省略的连接线程为set global read_only的连接

Analysis of the

As can be seen from the above connection: there are several lock waits, some waiting for global read lock, some waiting for commit lock. First of all, we should clarify the corresponding lock wait order to find out “who is waiting for me, and who am I waiting for”.

Threads: Threads: Threads: Threads: Threads: Threads: Threads

  • The thread that waits for the global read lock
  • The thread waiting for the lock to commit
  • Threads that wait for MTS to commit sequentially

Let’s start with the most easily analyzed global read lock and trace back to the source with thread 47295994. What is the lock held by thread 47295994? This is done through metadata_locks in the perfoemance_schema library, which is the metadata lock table that we normally analyze for DDL blocking.

mysql> select * from perfoemance_schema.metadata_locks where THREAD_ID = 47295994; +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- ----+--------------+-----------------+----------------+ | OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID | +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- ----+--------------+-----------------+----------------+ | GLOBAL | NULL | NULL | NULL | 140000636724432 | INTENTION_EXCLUSIVE | EXPLICIT | PENDING | sql_base.cc:3002 | 47295994 | 149780492 | +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- - + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + 1 rows in the set (0.00 SEC)

The SQL in thread 47295994 is executing an INSERT statement. It does not observe that the thread is holding the lock. It is waiting for the global read lock.

To further validate the thread with the metadata lock table, let’s look at who holds the global read lock:

mysql> select * from perfoemance_schema.metadata_locks where OBJECT_TYPE = 'GLOBAL' and LOCK_STATUS = 'GRANTED'; +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- ----+--------------+-----------------+----------------+ | OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID | +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- ----+--------------+-----------------+----------------+ | GLOBAL | NULL | NULL | NULL | 139955200762880 | SHARED | EXPLICIT | GRANTED | lock.cc:1014 | 57822380 | 3 | | GLOBAL | NULL | NULL | NULL | 140006539051472 | SHARED | EXPLICIT |  GRANTED | lock.cc:1014 | 57822266 | 3 | | GLOBAL | NULL | NULL | NULL | 139957819325888 | SHARED | EXPLICIT | GRANTED | The lock. Cc: 1014 | 57821979 | 3 | · · · · · · +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- - + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + 1104 rows in the set (0.00 SEC)

Let’s pull a few records back into the threads table and see if we can find the corresponding execution SQL:

Mysql > select * from perfoemance_schema. Threads where THREAD_ID in (57822380578226, 6578197, 9); +-----------+---------------------------------------------+------------+----------------+------------------+------------ ------+--------------------+---------------------+------------------+--------------------------------+------------------ ---+------------------+------+--------------+---------+-----------------+--------------+----------------+ | THREAD_ID | NAME | TYPE | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE | PROCESSLIST_INFO | PARENT_THREAD_ID | ROLE | INSTRUMENTED | HISTORY | CONNECTION_TYPE | THREAD_OS_ID | RESOURCE_GROUP | +-----------+---------------------------------------------+------------+----------------+------------------+------------ ------+--------------------+---------------------+------------------+--------------------------------+------------------ + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 57821979 Thread/SQL/one_connection | FOREGROUND | 57808986 | proxy_monitor | 10.108.76.140 | NULL | Query | 664 | Waiting for commit lock | set global read_only=on | NULL | NULL | YES | YES | SSL/TLS | 97442 | USR_default | | 57822266 | Thread/SQL/one_connection | FOREGROUND | 57819228 | proxy_monitor | 10.108.76.139 | NULL | Query | 679 | Waiting for commit lock | set global read_only=on | NULL | NULL | YES | YES | SSL/TLS | 97434 | USR_default | | 57822380 | Thread/SQL/one_connection | FOREGROUND | 57829246 | proxy_monitor | 10.108.76.140 | NULL | Query | 694 | Waiting for commit lock | set global read_only=on | NULL | NULL | YES | YES | SSL/TLS | 97476 | USR_default | +-----------+---------------------------------------------+------------+----------------+------------------+------------ ------+--------------------+---------------------+------------------+-----------------------------+--------------------- ---+------------------+------+--------------+---------+-----------------+--------------+----------------+ 3 rows in set (0.00 SEC)

Set global read_only=on; set global read_only=on; We can also find the user who issued this statement, proxy_monitor, and the connection IP 10.108.76.139/140. We can further confirm the application or client that issued the global read lock.

As long as the slave library is not read-only, the script keeps issuing commands to configure it. Thousands of SET GLOBAL READ_ONLY connections are waiting in MySQL due to the lock blocking caused by the previous block.

In the threads table, all threads that issue a global read lock are waiting for the COMMIT lock. We then go to the metadata lock table to retrieve:

mysql> select * from perfoemance_schema.metadata_lockswhere OBJECT_TYPE = 'COMMIT' and LOCK_STATUS = 'GRANTED'; +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- ----+--------------+-----------------+----------------+ | OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID | +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- ----+--------------+-----------------+----------------+ | COMMIT | NULL | NULL | NULL | 140011172675968 | INTENTION_EXCLUSIVE | EXPLICIT | GRANTED | handler.cc:1697 | 47295996 | 89520430 | | COMMIT | NULL | NULL | NULL | 140011035402992 | INTENTION_EXCLUSIVE | EXPLICIT | GRANTED | handler.cc:1697 | 47295995 | 113911056 | +-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+--------- - + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + 2 rows in the set (0.00 SEC)

Threads 47295994 wait on threads 47295995 and 47295996 for the commit lock to be released. Threads 47295994 wait on threads 47295995 and 47295996 for the commit lock to be released. Threads 47295994 wait on threads 47295995 and 47295996 for the commit lock to be released.

After Waiting for preceding transaction to commit, it means that a transaction in the preceding transaction group has to wait for the preceding transaction in the same transaction commit group to complete first. In MySQL there is a parameter to configure the sequence of commits from the library (slave_commit_commit_order).

We did not verify the issue of “transaction priority commits on connections with small thread numbers” in the customer field, and we will test it later in the local environment to repeat it.

A “picture” to cover

Set global read_only = ON; MDL:: Global Read Lock is held when and further wants to obtain MDL:: Global Commit Lock, and the MDL_SHARED lock is shown in LOCK_TYPE. When slave_commit_commit_order =1, the order of the transaction committed in the relay log group is committed exactly in the order it occurs in the relay log, so neither the transaction nor the SET GLOBAL READ_ONLY statement preserves both locks atomic. Deadlocks occur in parallel replication mode in the following order:

  1. Taking threads as an example, transactions in threads 47295994, 47295995 and 47295996 are committed sequentially in the relay log of parallel replication. This point is not observed in the field, and evidence is given when duplicated.
  2. After waiting for preceding transaction to commit, thread 47295995 obtained the commit lock. After waiting for preceding transaction to commit, thread 47295995 obtained the commit lock.
  3. The client executes SET GLOBAL READ_ONLY and enters Waiting for Commit.
  4. If the INSERT statement of transaction A is executed after GRL is obtained, then transaction A is in the state of Waiting for Global Read Lock.
  5. If the INSERT statement of transaction A is executed before the GRL and XID_EVENT is applied after the global commit lock is obtained, then the state of waiting for commit lock is entered, i.e. the second kind of deadlock. This is the second type of deadlock that repeats itself below.

To solve

  • Solution 1: kill the thread with global read lock, but because the number of threads with global read lock issued by proxy_monitor in the customer environment is up to thousands, it is not easy to solve, and the cost is small but the time is long.
  • PRESERVE_COMMIT_ORDER =0 To avoid this deadlock, set slave_commit_commit_order =0 to disable the sequence of the library binlog. Turning off this parameter does not need to stop replication. It only affects the commit order of parallel replicated transactions in the slave database, which may affect the calculation and risk control business, but has no effect on the final data consistency. Therefore, if there is no special requirement, the binlog order of the slave database must be consistent with that of the master database.
  • When using shell script to detect read-only from slave library, you can modify the script logic: if it is detected that read-only from slave library is not closed, you can stop copying first, and then turn on read-only from slave library, so that you do not need to change the consistency of transaction commit from slave library to avoid related problems.

Any of the following statements from the library can result in the above deadlock situation:

set global read_only ; grant all(FLUSH PRIVILEGES) ; flush table with read lock ;

  1. The global read lock is added to all the statements above. It has been confirmed that there is a certain probability that this deadlock occurs, only in the case that the replay of a transaction is completed in the same group’s relay log and the transaction is committed in the following order. This will cause the slave_commit_commit_order MTS copy to deadlock during execution.
  2. If not, turn off the slave_commit_order parameter. This action depends on the business situation and may affect the business consistency because of the time sequence, but does not affect the final consistency of the data.

Reenact two types of deadlocks

Using Sysbench to simulate business stress, open a window from the library for statement input, and execute SET GLOBAL READ_ONLY =ON several times; Before it gets stuck, observe the output:

Kill set global read_only=ON; After this statement, replication returns to normal, so let’s see what happens when the connection hangs.

Deadlock 1- customer site

Performance_Schema. Threads: The execution order of the client’s field transactions with small thread number is prior to the execution order of the transactions with large thread number. According to the transaction commit order, the corresponding thread commit order is determined, indicating that the transaction commit order of the client field with small thread number is before the transaction with large thread number.

View the lock holding relationships in the performance_schema.metadata_locks table

The locking model is the same as the dead-locking ring in the analysis above.

According to the transaction commit order, the corresponding thread commit order is determined, indicating that the transaction commit order of the client field with small thread number is before the transaction with large thread number.

Deadlock 2

View the information in the corresponding performance_schema.threads table

View the performance_schema.metadata_locks table for lock holding

Draw the corresponding dead locking ring according to the locking relationship

In addition to the necessary locks, the MDL_Intentional _exclusive lock of the MDL_KEY ::GLOBAL lock is requested by the transaction; Before a transaction is committed, the MDL_KEY ::COMMIT lock’s MDL_INTON_EXCLUSIVE lock is requested. This lock is an S-lock, and in the case of scope locks, IX locks and S-locks are incompatible.

According to the transaction commit order, determine the corresponding thread commit order

Reference:

https://sq.163yun.com/blog/ar… http://dbaplus.cn/news-11-187… http://mysql.taobao.org/month… Bug order: https://bugs.mysql.com/bug.ph…