MySQL 8.0 is a milestone release in the history of MySQL development, including a number of major updates. The Generally Available version of MySQL 8.0 is now Available. This solution is touted as a complete solution to the replication delays that have plagued MySQL operators for years.

Speaking of parallel replication, here is a quick review of the evolution of MySQL replication from version to version to help understand the optimization of parallel replication MTS in version 8.0.

MySQL master-slave replication model

The master-slave replication model of MySQL is used to construct the master-slave replication model of MySQL.

MySQL replication model

The Master/slave architecture of MySQL relies on the MySQL Binlog function. The Master node generates binlogs and writes them to Binlog files.

Two threads are started on the Slave node: one IO thread retrives Binlog logs from MySQL and writes them to the local RelayLog log. Another SQL thread reads the RelayLog log and parses the execution, so that all SQL statements executed on the host are executed exactly the same on the slave by sequential reads and writes of several files on the host and slave.

Replication latency refers to how long it takes for a transaction to complete on the Slave after it has been completed by the Master.

Since reading and writing Binlog files and RelayLog files are performed sequentially, I/O threads on the Slave seldom delay in dumping Binlog files in the production environment. In fact, starting from MySQL 5.5, MySQL officially provides the semi-synchronous replication plug-in. The Binlog of each transaction must be transmitted to the Slave and written to the RelayLog before it can be submitted. This architecture provides data integrity between the master and Slave, ensuring that the Slave can have a complete copy of data in case of a failure of the host. Therefore, the replication delay usually occurs during the execution of the SQL thread.

As you can see from the architecture diagram, in the earliest master-slave replication model, only one thread was responsible for performing Relaylog, which means that all operations performed on the master machine were played back sequentially on the slave machine. The problem is that if there is a lot of write pressure on the master, the playback speed from the master may not keep up with the master. (In addition, the MySQL architecture determines that the Binlog file is only written to and dumped to the slave machine during the Commit phase. This also causes execution delays in master/slave transactions. This problem is particularly evident in large transactions, but it is beyond the scope of this article.)

Since the problem of master-slave delay is that the RelayLog playback is too slow in a single thread, the solution to reduce master-slave delay is to improve the parallelism of RelayLog playback on the slave machine.

Parallel replication in 5.7

MySQL has introduced a relatively simple parallel replication scheme in 5.6, which has the following architecture:

(Picture from Teacher Jiang Chengyao’s blog)

The red box is the key to parallel playback. If the parallel playback function is enabled in 5.6, multiple Workthreads will be started, and the SQLThread responsible for playback will become a Coordinator to determine whether transactions can be executed in parallel and distributed to workthreads.

If the transaction belongs to a different Schema, is not a DDL statement, and there is no cross-schema operation, then it can be played back in parallel, otherwise it needs to wait for all Worker threads to complete before executing the current log.

This parallel playback is Schema level parallelism, which benefits if there are multiple schemas on the instance, whereas if there is only one Schema on the instance, the transaction cannot be played back in parallel and is slightly less efficient due to additional distributed operations. In practice, single library and multiple tables are more common.

Although parallel replication in 5.6 did not improve playback speed in most application scenarios, it became the basis for parallel replication in MySQL — RelayLog is played back in parallel on Slave, and the SQL thread is responsible for determining whether it can be played back in parallel and assigns it to the Work thread.

The Group Commit technology was introduced in 5.6 to solve the problem of insufficient concurrency caused by fsync during transaction Commit. In simple terms, fsync is called because the Binlog file must be written to disk during the transaction commit. This is an expensive operation. In the case of concurrent commit, each transaction obtains the log lock and fsync separately, resulting in the transaction actually writing to the Binlog file in a serial manner. This greatly reduces the concurrency of transaction commits.

The Group Commit technique adopted in 5.6 divides the Commit phase into Flush, Sync, and Commit phases. Each phase maintains a queue, and the first thread in the queue is responsible for executing the step. This effectively achieves the goal of fsynching the Binlog of a batch of transactions to disk at a time. Such a batch of simultaneous committed transactions are called transactions of the same Group.

While Group Commit is a parallel Commit technique, it unexpectedly solves one of the challenges of parallel playback from transactions on the machine — how to determine which transactions can be played back in parallel. If a batch of transactions is committed at the same time, they must not have mutually exclusive holding locks or execution dependencies, so they must be played back in parallel.

As a result, MySQL 5.7 introduced a new type of parallel playback, determined by the slave_parallel_type parameter. The default value DATABASE will be used for parallel playback at the SCHEMA level in version 5.6. Set LOGICAL_LOCK to groupCOMMIT-based parallel playback. Transactions within the same Group are played back on the Slave in parallel.

The value of last_COMMITTED and sequence_number can be used to specify the group to which a transaction belongs. Last_committed indicates the number of the previous committed transaction. Sequence_number indicates the sequence number of the committed transaction, which increases monotonically in a Binlog file. If two transactions have the same last_COMMITTED value, they are committed within the same group.

root@localhost:~# mysqlbinlog mysql-bin.0000006 | grep last_committed

#150520 14:23:11 server id 88 end_log_pos 259 CRC32 0x4ead9ad6 GTID last_committed=0 sequence_number=1

#150520 14:23:11 server id 88 end_log_pos 1483 CRC32 0xdf94bc85 GTID last_committed=0 sequence_number=2

#150520 14:23:11 server id 88 end_log_pos 2708 CRC32 0x0914697b GTID last_committed=0 sequence_number=3

#150520 14:23:11 server id 88 end_log_pos 3934 CRC32 0xd9cb4a43 GTID last_committed=0 sequence_number=4

#150520 14:23:11 server id 88 end_log_pos 5159 CRC32 0x06a6f531 GTID last_committed=0 sequence_number=5

#150520 14:23:11 server id 88 end_log_pos 6386 CRC32 0xd6cae930 GTID last_committed=0 sequence_number=6

#150520 14:23:11 server id 88 end_log_pos 7610 CRC32 0xa1ea531c GTID last_committed=6 sequence_number=7

#150520 14:23:11 server id 88 end_log_pos 8834 CRC32 0x96864e6b GTID last_committed=6 sequence_number=8

#150520 14:23:11 server id 88 end_log_pos 10057 CRC32 0x2de1ae55 GTID last_committed=6 sequence_number=9

#150520 14:23:11 server id 88 end_log_pos 11280 CRC32 0x5eb13091 GTID last_committed=6 sequence_number=10

#150520 14:23:11 server id 88 end_log_pos 12504 CRC32 0x16721011 GTID last_committed=6 sequence_number=11

#150520 14:23:11 server id 88 end_log_pos 13727 CRC32 0xe2210ab6 GTID last_committed=6 sequence_number=12

#150520 14:23:11 server id 88 end_log_pos 14952 CRC32 0xf41181d3 GTID last_committed=12 sequence_number=13

In the binlog file, sequence_number 1-6’s last_COMMITTED value is 0, which indicates that sequence_number 1-6’s last_committed value is 0, which indicates that sequence_number 1-6’s last_committed value is 6, which indicates that sequence_number 1-6’s last_committed value is 6, which indicates that sequence_number 1-6’s last_committed value is 6.

The logical_lock-based feature introduced in 5.7 greatly improves the playback speed on the slave machine when the host is under high concurrency pressure, basically controlling how the commit is performed on the host and how the playback is performed on the slave machine.

WriteSet in MySQL MGR

However, in 5.7, parallel replication based on the logical clock Logical_Clock still has some drawbacks. For example, transactions must be committed in parallel on the master before they can be played back in parallel on the slave. If there is not much concurrency pressure on the master, then the benefits of parallel replication cannot be enjoyed. Binlog_group_commit_sync_delay and binlog_group_COMMIT_SYNc_NO_delay_count are introduced in 5.7. Increase the group commit rate on the Master by making Binlog wait a little while before performing fsync. However, the speed of concurrent playback from the top depends on the concurrent commit of the master.

MySQL 8.0 introduces a new mechanism to determine whether a transaction can be played back in parallel. It determines the playback order of the slave machine by detecting whether there are write conflicts during the running of the transaction. This makes the degree of concurrency on the slave machine no longer dependent on the host.

In fact, this mechanism has been quietly used in MySQL 5.7.20. An important feature introduced in 5.7.20 is Group Replication, which distributes binlogs between multiple MySQL nodes using the Paxso protocol, so that a transaction must be committed successfully on most nodes (N/2+1) in the cluster.

To support multi-master writes, MySQL MRG goes through a Certify phase to determine whether transactions in the Binlog are written to RelayLog after the Binlog distribution node completes. Certify phase uses WriteSet to verify conflicts between transactions, and sets last_COMMITTED of non-conflicting transactions to the same value when writing RelayLog.

For example, in 5.7.20, do the following:

> -- create a group replication cluster.

> STOP GROUP_REPLICATION; START GROUP_REPLICATION;

Query OK, 0 rows affected (9.10 sec)

> -- All the next commands on the primary member of the group:

> CREATE DATABASE test_ws_mgr ;

Query OK, 1 row affected (0.01sec)

> CREATE TABLE  test_ws_mgr.test ( id int primary key auto_increment, str varchar(64) not null );

Query OK, 1 row affected (0.01sec)

> INSERT INTO test_ws_mgr.test(`str`) VALUES ("a");

Query OK, 1 row affected (0.01sec)

> INSERT INTO test_ws_mgr.test(`str`) VALUES ("b");

Query OK, 1 row affected (0.01sec)

> INSERT INTO test_ws_mgr.test(`str`) VALUES ("c");

Query OK, 1 row affected (0.01sec)

The above code creates a database and an InnoDB table in an MGR cluster and inserts three records. At this point, querying the Binlog on the Primary node might yield the following result:

# mysqlbinlog mysql-bin.N | grep last_ | sed -e 's/server id.*last/[...] last/' -e 's/.rbr_only.*/ [...] / '

# 180106 19:31:59 [...]. last_committed=0 sequence_number=1 [...] -- CREATE DB

# 180106 19:32:02 [...]. last_committed=1 sequence_number=2 [...] -- CREATE TB

# 180106 19:32:05 [...]. last_committed=2 sequence_number=3 [...] -- INSERT a

# 180106 19:32:08 [...]. last_committed=3 sequence_number=4 [...] -- INSERT b

# 180106 19:32:11 [...]. last_committed=4 sequence_number=5 [...] -- INSERT c

As you can see, these operations have different last_committed values in serial order in a Session. Normally, these binlogEvents should be played back in serial on the slave machine as well. Let’s take a look at RelayLog in the MGR cluster:

# mysqlbinlog mysql-relay.N | grep -e last_ | sed -e 's/server id.*last/[...] last/' -e 's/.rbr_only.*/ [...] / '

# 180106 19:31:36 [...]. last_committed=0 sequence_number=0 [...]

# 180106 19:31:36 [...]. last_committed=1 sequence_number=2 [...] -- CREATE DB

# 180106 19:31:36 [...]. last_committed=2 sequence_number=3 [...] -- CREATE TB

# 180106 19:31:36 [...]. last_committed=3 sequence_number=4 [...] -- INSERT a

# 180106 19:31:36 [...]. last_committed=3 sequence_number=5 [...] -- INSERT b

# 180106 19:31:36 [...]. last_committed=3 sequence_number=6 [...] -- INSERT c

Interestingly, in the RelayLog of the Secondary node, these transactions have the same last_committed value, which means that these transactions can be played back in parallel in the MGR cluster.

In MGR, it is the WriteSet technology that is used to detect whether there are write conflicts between different transactions and replan the parallel playback of transactions. This technology has been moved to the Binlog generation stage in 8.0 and adopted into the master-slave replication architecture.

MySQL 8.0 parallel replication

With that said, MySQL 8.0 is finally on the table. From the above description, the reader should have an outline of how parallel replication optimization works in MySQL 8.0. With writeset-based conflict detection, Binlog generation on the host is no longer based on group commit, but based on the transaction itself update conflict to determine the parallelism.

In MySQL 8.0, this release introduced the parameter binlog_transaction_DEPandency_tracking to control how transaction dependencies are determined.

This value has three options:

  • The default COMMIT_ORDERE means that transaction dependencies continue to be determined using the group-commit method described in 5.7;

  • WRITESET represents the use of write sets to determine transaction dependencies;

  • There is also a WRITESET_SESSION option that uses WriteSet to determine transaction dependencies, but transactions within the same Session do not have the same last_COMMITTED value.

MySQL uses a vector<uint64> variable to store the HASH value of the committed transaction. The values of the modified primary key and non-empty UniqueKey of all committed transactions are hashed against the values in the vector. This determines whether the currently committed transaction updates the same row as the already committed transaction, and determines dependencies accordingly. The size of this vector is controlled by the binlog_transaction_dependency_history_size parameter. The value ranges from 1 to 1000000. The default value is 25000.

Parameter transaction_write_set_extraction control test transaction dependencies at the same time using the HASH algorithm has three values OFF | XXHASH64 | MURMUR32, For example, if binlog_transaction_depandency_tracking is WRITESET or WRITESET_SESSION, the value cannot be OFF and cannot be changed.

WriteSet determines whether a transaction can be played back in parallel by detecting whether two transactions have updated the same record, so committed transactions need to be saved at run time to record which rows were updated by historical transactions. The parameter for recording historical transactions is binlog_transaction_dependency_history_size. The larger this value is, the more committed transactions are recorded, although it is important to note that this value does not refer to the transaction size, but to the number of transaction updates tracked. After WRITESET or WRITESET_SESSION is enabled, MySQL identifies and logs transaction updates in the following manner.

HASH (DB name, TABLE name, Key name, KEY_VALUE1, KEY_VALUE2…) if the row currently updated by the transaction has a Primary Key. Add to the vector Write_set of the current transaction.

HASH (DB name, TABLE name, Key name, KEY_VALUE1) if the row currently updated by the transaction has a Unique Key that is Not NULL… Add to the write_set of the current transaction.

If the updated row has a FOREIGN KEY constraint and is not empty, the HASH of the FOREIGN KEY information and VALUE is added to the write_set of the current transaction. If the primary key of the table being updated by the transaction is a foreign key of some other table, set the current transaction has_relATED_foreign_key = true; If the transaction updates a row and no data is added to the write_set, mark the current transaction has_missing_key = true.

When conflict detection is performed, has_relATED_foreign_key and has_missing_key are checked first. If true, the COMMIT_ORDER mode is switched to COMMIT_ORDER. Otherwise, the HASH value in the transaction’s write_set is compared to the write_set of the committed transaction.

If there is no conflict, the current transaction shares the same last_commited value with the last committed transaction, otherwise all write_sets committed prior to that conflicting transaction will be deleted from the global committed write_set. COMMIT_ORDER evaluates last_committed.

If the value of write_set (last_COMMITTED) exceeds binlog_transaction_dependency_HISTORy_size, the value of write_set (last_committed) exceeds binlog_transaction_dependency_HISTORy_size. Clears the global WRITe_set of committed transactions.

In terms of detection conditions, this feature relies on primary keys and unique indexes. If a transaction involves a table with no primary keys and no unique non-empty indexes, there will be no performance improvement from this feature. In addition, you need to set the Binlog format to Row format.

MySQL High Availability tests replication performance with WriteSet enabled. Here are the results. If you are interested, visit the original blog.

During the test, 100W transactions were performed on the host through Sysbench, and then the Slave replication thread was started. The test environment was executed on Xeon E5-2699-V3 16-core host. The test results are as follows:

As you can see, WRITESET performs best when there are few client threads, and there is little difference between WRITESET_SESSION and COMMIT_ORDER when there is only one connection.

Five, the conclusion

It can be seen from MySQL Hight Availability test that the playback speed of RelayLog on Slave is significantly improved after transaction dependence based on WriteSet is enabled. The playback speed of RelayLog on the Slave is no longer dependent on the parallelism of the Master’s submission, allowing the Slave to use its maximum throughput capacity. This feature is especially effective when the Slave replication is resumed after a period of time.

This feature makes it possible to have higher throughput on the Slave than on the Master. It is also possible to have commit scenarios on the Slave that are not previously committed on the Master while maintaining transaction dependencies. The order of transactions committed may change on the Slave.

While this could happen with parallel replication in 5.7, it will be more common in 8.0 due to the higher concurrency capabilities on the Slave.

Normally this is not a big deal, but if you are doing an incremental binlog-based backup on the Slave, you may need to ensure that the commit order is the same on the Slave as on the Master. In this case, you can enable slave_preserve_COMMIT_ORDER. This is a parameter introduced in 5.7 that ensures that parallel playback threads on the Slave are committed in the order written to the RelayLog.

  • http://jfg-mysql.blogspot.jp/2018/01/an-update-on-write-set-parallel-replication-bug-fix-in-mysql-8-0.html

  • http://jfg-mysql.blogspot.jp/2018/01/write-set-in-mysql-5-7-group-replication.html

  • https://mysqlhighavailability.com/improving-the-parallel-applier-with-writeset-based-dependency-tracking/

Author: Wild goose to the south

Source: https://zhuanlan.zhihu.com/p/37129637