Dmall - New Business and Shared Platform R&D Department

Background

In the unitized (multi-unit) deployment project, databases are deployed both at the central site and in the remote units. To close the loop of business processes within a unit, the relevant data write operations need to be synchronized across sites. At the same time, this scenario has requirements that differ from ordinary MySQL master-slave/master-master replication.

The main differences are:

1. Data exchanged between units must be filtered by table and by field value; only part of the data needs to be synchronized (see the filter-rule sketch after this list).

2. All DDL operations must be filtered out to keep the table structure safe.

3. Data tables from multiple units are aggregated to the center, and bidirectional replication exists between the center and the units.
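As a rough illustration of requirements 1 and 2, the snippet below sketches what a per-link filter rule might look like. The rule format, table names, and column names are hypothetical and are not the actual DRC configuration.

```python
# Hypothetical sketch of a per-link filter rule: only row events on whitelisted
# tables whose routing column matches this unit are replicated; DDL is dropped.
FILTER_RULES = {
    # table name -> (column to check, set of values that belong to this unit)
    "orders": ("unit_id", {"unit_beijing"}),
    "order_items": ("unit_id", {"unit_beijing"}),
}

def should_replicate(event_type: str, table: str, row: dict) -> bool:
    """Return True if a binlog event should be forwarded to the peer side."""
    if event_type == "QUERY":          # DDL and other statement events are dropped
        return False
    rule = FILTER_RULES.get(table)
    if rule is None:                   # tables outside the whitelist are not synced
        return False
    column, allowed_values = rule
    return row.get(column) in allowed_values
```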

To meet these requirements, the database team developed core modules for Binlog parsing, write-event filtering, SQL reconstruction, and merged execution to support data synchronization in the unitized scenario. On top of these, a synchronization link management backend was built to provide rapid deployment, real-time management, and availability guarantees, forming a more complete Data Replication Center (DRC).

With the rapid growth of the company's business, database/table migration and splitting scenarios have become increasingly common. Based on DRC, we derived a set of automated tools to ensure smooth data migration.

Implementation principle

Data synchronization is divided into full synchronization and incremental synchronization.

For full synchronization, SHOW MASTER STATUS is executed first to record the current binlog position/GTID. The full data of the source MySQL table is then pulled and written to the target MySQL using REPLACE INTO statements. After full synchronization completes, incremental synchronization starts from the position/GTID recorded at the beginning of full synchronization, which guarantees data integrity; the Binlog generated during full synchronization does not break consistency because the reconstructed SQL is idempotent. The incremental synchronization core is divided into two modules, the Replicator and the Applier. The structure is briefly as follows:
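A minimal sketch of the full-synchronization step described above, using PyMySQL. The table and column names are placeholders, and the actual DRC modules are not shown in the article, so this only illustrates the record-position / pull / REPLACE INTO sequence.

```python
import pymysql

def full_sync(source_cfg: dict, target_cfg: dict, table: str, columns: list):
    """Record the source binlog position, then copy all rows with REPLACE INTO."""
    src = pymysql.connect(**source_cfg)
    dst = pymysql.connect(**target_cfg)
    try:
        with src.cursor() as cur:
            # 1. Record the current binlog position/GTID before copying.
            cur.execute("SHOW MASTER STATUS")
            log_file, log_pos = cur.fetchone()[0:2]

            # 2. Pull the full data of the source table (a real tool would stream
            #    in chunks instead of fetching everything at once).
            cur.execute(f"SELECT {', '.join(columns)} FROM {table}")
            rows = cur.fetchall()

        # 3. Write to the target with REPLACE INTO, which stays idempotent if
        #    any of these rows are replayed again later by incremental sync.
        placeholders = ", ".join(["%s"] * len(columns))
        sql = f"REPLACE INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        with dst.cursor() as cur:
            cur.executemany(sql, rows)
        dst.commit()

        # The recorded position is handed to the incremental phase (Replicator).
        return log_file, log_pos
    finally:
        src.close()
        dst.close()
```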

The Replicator establishes a connection to the source MySQL, sends a master-slave replication request using the starting position/GTID obtained in the full synchronization phase, and starts listening to the Binlog event stream. Based on write events such as WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, and DELETE_ROWS_EVENT, the Replicator detects whether the source database writes rows and table/column values that match the filtering rules. The Replicator parses the matching Binlog events into Protobuf-encoded Event data and places them in an internal buffer queue for downstream consumers to pull.
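A rough sketch of the Replicator's event listening, assuming the python-mysql-replication library; the real Replicator is a separate DRC component, and the Protobuf serialization step is omitted here.

```python
import queue
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    WriteRowsEvent,
    UpdateRowsEvent,
    DeleteRowsEvent,
)

def run_replicator(mysql_settings: dict, log_file: str, log_pos: int,
                   buffer: queue.Queue, filter_fn):
    """Listen to the source binlog from the recorded position and buffer the
    row events that pass the filter rules (filter_fn could be the
    should_replicate predicate sketched earlier)."""
    stream = BinLogStreamReader(
        connection_settings=mysql_settings,
        server_id=101,                       # must be unique among replicas
        blocking=True,
        resume_stream=True,
        log_file=log_file,
        log_pos=log_pos,                     # position saved during full sync
        only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    )
    for event in stream:
        for row in event.rows:
            # UpdateRowsEvent carries before/after images; the others one image.
            values = row.get("after_values") or row.get("values")
            if filter_fn(type(event).__name__, event.table, values):
                # In DRC this would be serialized as a Protobuf Event message;
                # a plain dict is used here for illustration.
                buffer.put({
                    "table": event.table,
                    "type": type(event).__name__,
                    "row": row,
                    "log_pos": event.packet.log_pos,
                })
    stream.close()
```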

The Applier pulls the Event data in the internal buffer queue from the Replicator. Each data body contains multiple operations on a table. Internally, the Applier generates and merges the changed rows in the data body so that the execution order stays consistent with the source while SQL execution efficiency is also taken into account. Briefly:
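A simplified sketch of how the Applier might batch changes while preserving source order: consecutive events of the same type on the same table are merged into one multi-row statement. The actual merge logic in DRC is more elaborate, and the `id` primary-key column is an assumption for illustration.

```python
import pymysql

def apply_batch(target_conn: pymysql.connections.Connection, events: list):
    """Group consecutive row changes by (table, event type) and execute each
    group as one statement, so source order is preserved but round trips drop."""
    groups = []                          # list of (table, type, [rows]) in order
    for ev in events:
        if groups and groups[-1][0] == ev["table"] and groups[-1][1] == ev["type"]:
            groups[-1][2].append(ev["row"])
        else:
            groups.append((ev["table"], ev["type"], [ev["row"]]))

    with target_conn.cursor() as cur:
        for table, ev_type, rows in groups:
            if ev_type == "WriteRowsEvent":
                cols = list(rows[0]["values"].keys())
                sql = (f"REPLACE INTO {table} ({', '.join(cols)}) "
                       f"VALUES ({', '.join(['%s'] * len(cols))})")
                cur.executemany(sql, [[r["values"][c] for c in cols] for r in rows])
            elif ev_type == "UpdateRowsEvent":
                # Rebuilt as idempotent REPLACE INTO using the after image.
                cols = list(rows[0]["after_values"].keys())
                sql = (f"REPLACE INTO {table} ({', '.join(cols)}) "
                       f"VALUES ({', '.join(['%s'] * len(cols))})")
                cur.executemany(sql,
                                [[r["after_values"][c] for c in cols] for r in rows])
            elif ev_type == "DeleteRowsEvent":
                # Assumes a primary key column named `id`, purely for illustration.
                cur.executemany(f"DELETE FROM {table} WHERE id = %s",
                                [[r["values"]["id"]] for r in rows])
    target_conn.commit()
```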

Each time the Applier finishes executing a batch, it sends an ACK to the Replicator. The Replicator then saves the position/GTID corresponding to the executed data to ZooKeeper, so that incremental synchronization can resume listening to the Binlog from the last successfully executed position.
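A minimal sketch of persisting the acknowledged position to ZooKeeper with the kazoo client; the znode path and value format are assumptions.

```python
import json
from kazoo.client import KazooClient

POSITION_PATH = "/drc/links/link_001/position"   # hypothetical znode path

def save_position(zk: KazooClient, log_file: str, log_pos: int, gtid: str = ""):
    """Called by the Replicator after receiving the Applier's ACK."""
    payload = json.dumps({"file": log_file, "pos": log_pos, "gtid": gtid}).encode()
    zk.ensure_path(POSITION_PATH)
    zk.set(POSITION_PATH, payload)

def load_position(zk: KazooClient):
    """Used on restart to resume binlog listening from the last ACKed position."""
    data, _stat = zk.get(POSITION_PATH)
    return json.loads(data.decode())

# Usage sketch:
# zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
# zk.start()
# save_position(zk, "mysql-bin.000012", 4096)
```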

Cluster management and availability assurance

With only the data synchronization modules, DRC could be deployed manually alone. For visual management and continuous integration, we developed the DRC-Manager backend. It can configure and modify all center and unit links, check their running status, deploy rapidly, upgrade core modules, and more. DRC-Manager turns DRC into an efficient operational platform tool.

DRC-Manager implementation and HA

The DRC-Manager cluster is deployed on the servers that host the DRC core modules. The primary node is elected through ZooKeeper, and a heartbeat mechanism between the primary and secondary nodes tracks the running status of the secondary nodes in the cluster. The Manager on each node collects the health reports periodically sent by all links on that machine, and handles runtime monitoring, guarding, and alerting.

When the Manager's primary node detects a failure of a secondary node, the links running on the failed node can be automatically migrated to other nodes in the unit to ensure availability.
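A sketch of the ZooKeeper-based primary election described above, using kazoo's Election recipe; the election path and the leader routine are placeholders, not the DRC-Manager implementation.

```python
import socket
import time
from kazoo.client import KazooClient

def run_manager_node(zk_hosts: str):
    """Each DRC-Manager node competes for leadership; the winner runs the
    cluster-wide duties (heartbeat tracking, link failover), the rest wait."""
    zk = KazooClient(hosts=zk_hosts)
    zk.start()

    me = socket.gethostname()
    election = zk.Election("/drc/manager/election", identifier=me)

    def lead():
        # Placeholder for the primary's loop: check health reports from
        # secondary nodes and migrate links away from failed nodes.
        print(f"{me} is now the primary DRC-Manager")
        while True:
            time.sleep(5)

    # Blocks until this node wins the election, then runs lead(); if lead()
    # returns or the session is lost, the node can rejoin the election.
    election.run(lead)
```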

DRC interworking with database MHA

When an online database performs a master-slave switchover, DRC components must be aware of the environment change in real time and switch the source and destination databases accordingly. To achieve this, we designed a linkage mechanism between MHA and DRC. The trigger program (master_ip_failover) that MHA calls when HA occurs notifies DRC-Manager of the old and new master information (IP, port, position, GTID, etc.). DRC-Manager then finds the affected links among all links, updates their synchronization configuration, switches the source/destination MySQL to the available instance, and restarts incremental synchronization from a reliable position/GTID.
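The notification from the master_ip_failover script to DRC-Manager could be as simple as an HTTP call; the endpoint URL and payload fields below are assumptions for illustration only.

```python
import requests

DRC_MANAGER_API = "http://drc-manager.internal:8080/api/mha/failover"  # hypothetical

def notify_drc_manager(old_master: dict, new_master: dict,
                       log_file: str, log_pos: int, gtid: str = ""):
    """Invoked from the MHA master_ip_failover trigger after a switchover."""
    payload = {
        "old_master": old_master,         # e.g. {"ip": "10.0.0.1", "port": 3306}
        "new_master": new_master,
        "binlog_file": log_file,          # position/GTID on the new master
        "binlog_pos": log_pos,
        "gtid": gtid,
    }
    resp = requests.post(DRC_MANAGER_API, json=payload, timeout=5)
    resp.raise_for_status()
    return resp.json()
```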

Smooth data table migration tool

The number of databases and tables running online is growing rapidly, and R&D departments often ask DBAs to migrate tables or split databases and tables to support the vertical splitting of business applications or the horizontal scaling of tables. In the past this required DBAs to go through tedious configuration and data cleaning work. Since DRC naturally supports table migration and splitting, we built an automation tool for vertical business splits and database/table sharding to make table migration and splitting smooth.

Since data migration does not involve Binlog filtering, we combined the full and incremental Replicator/Applier and added target MySQL instance routing and table routing. As an example, a table_dmall data table is split by the order_no column into two databases with 100 sub-tables. In a verified online case, two single tables with 100-400 million rows each were split simultaneously into 200 sub-tables; synchronization took about 2 hours, and online services were not affected during the migration and switchover.
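A sketch of the kind of routing the migration tool applies when writing to the target. The hash function and the 2-database/100-sub-table naming scheme below follow the example above but are assumptions; the exact routing rule used online is not shown in the article.

```python
from zlib import crc32

DB_COUNT = 2        # two target databases
TABLE_COUNT = 100   # 100 sub-tables, spread across the databases

def route(order_no: str):
    """Map an order_no to (target database, target sub-table)."""
    suffix = crc32(order_no.encode("utf-8")) % TABLE_COUNT
    db_name = f"dmall_db_{suffix % DB_COUNT}"       # hypothetical naming scheme
    table_name = f"table_dmall_{suffix:02d}"
    return db_name, table_name

# Example: route("DO202400001") might return ("dmall_db_1", "table_dmall_37");
# the Applier would then rewrite the target table name in the reconstructed SQL.
```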