(image via https://pixabay.com/en/military-stealth-bomber-refueling-602729/)

In the previous article, From SQL Server to MySQL (1): Heterogeneous Database Migration – Log4D, we introduced the basic problems of heterogeneous database migration from SQL Server to MySQL and a full-dump solution. The full-dump solution meets the needs of some scenarios, but it has a flaw: the migration requires downtime, and the length of the downtime depends on the data volume. For core businesses, downtime means loss. For example, migrating a user-center service with its data volume via a full dump would cause several hours of downtime, and once the user center is out of service, almost every system that depends on this central service goes down with it.

Can we do a seamless online migration, with no downtime or only a very brief pause? As technical people with higher standards, we had to find a way to solve this problem.

Principles and processes of online migration

For Oracle to MySQL, there is a mature solution on the market: Alibaba's Yugong project. Before tackling the online migration from SQL Server to MySQL, let's look at how Yugong does online migration from Oracle.

The following figure shows Yugong's incremental migration process from Oracle to MySQL:

There are four steps:

  1. Incremental data collection (create incremental materialized views of the Oracle tables)
  2. Full copy
  3. Incremental replication (data verification can run in parallel)
  4. Stop writes to the old database and cut over to the new one

Oracle Materialized Views are a mechanism provided by Oracle. A materialized view is a copy of the master data at a point in time; you can think of it as a snapshot taken at that moment. As the master data keeps being updated, the materialized view can be updated in separate batches, called refreshes. The changes between two refreshes correspond to the changes in the underlying data. Materialized views are often used to copy data from a master to replicas, and to cache complex queries in data warehouses.

Materialized views can be configured in many ways, but refresh mode and refresh timing are what matter here. There are three refresh modes:

  • Complete Refresh: delete all records and regenerate the materialized view
  • Fast Refresh: refresh incrementally
  • Force Refresh: choose Complete Refresh or Fast Refresh depending on conditions

Refresh timing has two modes: Refresh-on-Commit and Refresh-on-Demand.
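To make this concrete, here is a minimal Oracle sketch of the mechanism (the table name orders is made up for illustration; a real replication setup would typically reference the master over a database link):

-- create a materialized view log on the source table;
-- this is what makes fast (incremental) refresh possible
CREATE MATERIALIZED VIEW LOG ON orders WITH PRIMARY KEY;

-- a materialized view that refreshes incrementally, on demand
CREATE MATERIALIZED VIEW orders_mv
  REFRESH FAST ON DEMAND
  AS SELECT * FROM orders;

-- each refresh applies one batch of accumulated changes ('F' = fast)
EXEC DBMS_MVIEW.REFRESH('orders_mv', 'F');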

Based on materialized views, Oracle can capture incremental data, which is what makes Alibaba's online data migration possible. Generalizing the technical question: what capabilities does online incremental migration require? We reached the following conclusion (for the source database):

  • Incremental change capture: the source must support retrieving incremental changes to its data
  • Latency: the delay in obtaining changed data should be as low as possible
  • Idempotent consistency: the change data should be consumable idempotently, i.e. applying it yields the same result regardless of the current state of the target database

Back to the question we face: does SQL Server have a mechanism that satisfies these three requirements? The answer is yes: SQL Server officially provides the CDC feature.

How CDC works

What is CDC? CDC stands for Change Data Capture, and it is designed to capture incremental data. It was introduced in SQL Server 2008; before that, in SQL Server 2005, data changes could only be obtained through AFTER INSERT / AFTER DELETE / AFTER UPDATE triggers.

Here’s how CDC works:

When a database table changes, the capture process reads the changes from the transaction log and writes them into the change table. With this data in place, users can retrieve the changes through dedicated CDC query functions.
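Incidentally, SQL Server exposes the capture process's log scan sessions through a DMV, which is handy for watching this mechanism at work (which columns to inspect is a matter of taste):

-- recent CDC log scan sessions: how many transactions and commands
-- each scan picked up from the transaction log, and the scan latency
SELECT session_id, start_time, end_time, tran_count, command_count, latency
FROM sys.dm_cdc_log_scan_sessions;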

CDC data structure and basic usage

CDC’s core data lives in the change tables. Let’s look at what a change table contains to get an intuitive picture.

To enable CDC for a table (here, a table named fruits), use the following stored procedures:

-- enable cdc for the database
EXEC sys.sp_cdc_enable_db;
-- enable cdc for a table
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'fruits', @role_name = NULL;

CDC is now enabled. To check which databases and tables have CDC enabled, use SQL like this:

-- list databases with cdc enabled
SELECT name, is_cdc_enabled FROM sys.databases WHERE is_cdc_enabled = 1;
-- list tables with cdc enabled in the current database
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1;

CDC writes the change data for dbo.fruits into a change table named cdc.dbo_fruits_CT. Its structure looks like this:

.schema cdc.dbo_fruits_CT
name            default  nullable  type          length  indexed
-------------- ------- -------- ------------ ------ -------
__$end_lsn      null     YES       binary        10      NO
__$operation    null     NO        int           4       NO
__$seqval       null     NO        binary        10      NO
__$start_lsn    null     NO        binary        10      YES
__$update_mask  null     YES       varbinary     128     NO
id              null     YES       int           4       NO
name            null     YES       varchar(255)  255     NO

Fields starting with __ are metadata recorded by CDC; id and name are the original columns of fruits. In other words, the change table's structure corresponds one-to-one to the original table's.

Next, let's perform some business operations that change the data, then look at CDC's change table:

-- step 1
DECLARE @begin_time datetime, @end_time datetime, @begin_lsn binary(10), @end_lsn binary(10);
-- step 2
SET @begin_time = '2017-09-11 14:03:00.000';
SET @end_time   = '2017-09-11 14:10:00.000';
-- step 3
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
-- step 4
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_fruits(@begin_lsn, @end_lsn, 'all');

The steps mean:

  1. Declare the four variables used in the statements below
  2. @begin_time / @end_time are human-readable times in string format
  3. @begin_lsn / @end_lsn are Log Sequence Numbers (LSNs) mapped from those times by a CDC function; an LSN uniquely identifies a database change operation
  4. Query the CDC change data between @begin_lsn and @end_lsn

The query data is as follows:

__$start_lsn          __$end_lsn  __$seqval             __$operation  __$update_mask  id  name
--------------------  ----------  --------------------  ------------  --------------  --  ------
0000dede0000019f001a  null        0000dede0000019f0018  2             03              1   apple
0000dede000001ad0004  null        0000dede000001ad0003  2             03              2   apple2
0000dede000001ba0003  null        0000dede000001ba0002  3             02              2   apple2
0000dede000001ba0003  null        0000dede000001ba0002  4             02              2   apple3
0000dede000001c10003  null        0000dede000001c10002  2             03              3   apple4
0000dede000001cc0005  null        0000dede000001cc0002  1             03              3   apple4

The __$operation field encodes the type of each change:

  • 1 => delete
  • 2 => insert
  • 3 => data before update
  • 4 => data after update

Based on the retrieved data, we can reconstruct what happened to the database during this period:

  • Inserted two rows, with id 1 and 2
  • Updated the row with id 2 (apple2 became apple3)
  • Inserted a row with id 3
  • Deleted the row with id 3
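If you also want to know when a change was committed, CDC can map an LSN back to a timestamp. A small example, using the __$start_lsn of the delete row from the output above:

-- map an LSN from the change table back to its commit time
SELECT sys.fn_cdc_map_lsn_to_time(0x0000dede000001cc0005);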

CDC tuning

Having CDC meant we were on the right track, and we breathed a little easier. But beyond understanding the principle and usage, we also needed a deep understanding of CDC's working mechanism, load testing and tuning, and a feel for its limits and boundaries. Otherwise, an uncontrolled incident in production would cause huge losses to the business.

Let’s start by looking at the CDC workflow to see which core parameters can be adjusted:

Here’s how a CDC job works:

  • The blue area marks the maximum number of scan cycles per log scan: maxscans
  • The blue area is also bounded by the maximum number of transactions per scan cycle: maxtrans
  • The light blue area is the scan interval, in seconds: pollinginterval

These three parameters balance CDC's server resource consumption, throughput, and latency. They can be tuned for specific scenarios such as large fields, wide tables, and BLOB tables to meet business needs. Their default values are:

  • maxscans: default 10
  • maxtrans: default 500
  • pollinginterval: default 5 seconds
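These parameters belong to the CDC capture job and can be changed with sys.sp_cdc_change_job. A minimal sketch (the values are illustrative only, not recommendations):

-- adjust the capture job parameters
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxtrans        = 1000,  -- transactions per scan cycle
    @maxscans        = 20,    -- scan cycles per scan
    @pollinginterval = 2;     -- seconds between log scans

-- the new values take effect once the capture job is restarted
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';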

CDC load testing

With the core tunable parameters identified, we set out to test CDC in multiple scenarios. Before load testing, we also needed to pick the key health indicators to watch:

  • Memory: buffer-cache-hit / page-life-expectancy / page-splits
  • Throughput: batch-requests / sql-compilations / sql-re-compilations / transactions count
  • Resource usage: user-connections / processes-blocked / lock-waits / checkpoint-pages
  • Operating system level: CPU usage and disk I/O
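For reference, some of these indicators can be sampled from inside SQL Server via the sys.dm_os_performance_counters DMV (the counter names below are the standard ones; the exact set you monitor may differ):

-- sample a few of the health counters listed above
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Buffer cache hit ratio',
    'Page life expectancy',
    'Batch Requests/sec',
    'SQL Compilations/sec',
    'SQL Re-Compilations/sec',
    'Processes blocked'
);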

For space reasons, we can't include all the test results here, so here is one example: inserting one million rows of random data at a concurrency of 30:

Under the default CDC parameters, the tests concluded:

  • Enabling or disabling CDC briefly blocks a few processes; subsequent heavy write traffic (around 15k TPS) blocks roughly 20 processes
  • During this process, server I/O and CPU show no significant fluctuation
  • mssql.sql-statistics.sql-compilations fluctuates sharply when CDC is enabled or disabled
  • With CDC enabled, QPS and page I/O show no significant fluctuation under heavy traffic, and neither do server I/O and CPU
  • With CDC enabled, the system works normally at 16k TPS

If performance is not up to par, there are some simple tuning guidelines:

  • Adjust maxscans / maxtrans / pollinginterval
  • Reduce updates that immediately follow inserts
  • Avoid mass write operations
  • Limit the columns that need to be captured
  • Disable net-changes support (@supports_net_changes) when possible
  • Run cleanup when the system is under no pressure
  • Monitor log file size and I/O pressure to make sure the disk does not fill up
  • Set @filegroup_name to put the change tables on a separate filegroup
  • Configure the filegroup before calling sys.sp_cdc_enable_table (see the sketch after this list)
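A minimal sketch of the last few points, assuming a dedicated filegroup named CDC_FG has already been created for the change tables (the filegroup name is made up for illustration):

-- place the change table on its own filegroup and skip net-changes support
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'fruits',
    @role_name            = NULL,
    @supports_net_changes = 0,          -- only the all-changes query function is generated
    @filegroup_name       = N'CDC_FG';  -- hypothetical filegroup, create it beforehand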

Online migration mechanism of Yugong

OK, back to where we were. We now have CDC, but that only provides the possibility; we need another tool to consume the CDC data and feed it into MySQL.

Fortunately, there is Yugong. Yugong provides the Oracle-to-MySQL pipeline out of the box and abstracts Source / Target / SQL Template interfaces, so by implementing the relevant interfaces we can consume data from SQL Server into MySQL.

We won’t expand on that here; I will cover how to develop on top of Yugong in a dedicated article. As a preview: we have open-sourced our SQL Server version of Yugong.

How to roll back

For a project like a database migration, we must not only ensure one-way writes from SQL Server to MySQL, but also be able to write back from MySQL to SQL Server.

This reverse flow must satisfy the same three incremental requirements: incremental consumption, low latency, and idempotent consistency.

MySQL's binlog can satisfy all three. Note that the binlog has three formats: statement-based, row-based, and mixed. Only row-based logging meets the idempotent-consistency requirement.
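A quick sanity check before relying on this (standard MySQL statements; in MySQL 5.x the format can also be switched at runtime, though my.cnf is the safer place to set it):

-- verify the binlog format; it must be ROW for idempotent consumption
SHOW VARIABLES LIKE 'binlog_format';
-- switch to row-based logging (or set binlog-format = ROW in my.cnf)
SET GLOBAL binlog_format = 'ROW';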

After confirming that this was theoretically feasible, we still needed a tool to read the binlog, convert it into a format SQL Server can consume, and write it into SQL Server.

We turned to Canal, another Alibaba project. Canal is a binlog incremental subscription & consumption component from Alibaba's middleware team. I call it a component because Canal provides both the canal-server application and a canal client library. Canal impersonates a MySQL slave: it connects to the master as a replica and reads the binlog in real time. What to do with the binlog once it is read is entirely up to the user.

We designed a simple data flow based on Canal and added two capabilities to Yugong:

  • writing to SQL Server
  • consuming a Canal data source

The binlog in canal-server can only be consumed once; internally it is implemented as a queue. To support repeated consumption of the data, we added an extra stage that writes Canal's output into a queue of our own, so the data can be re-consumed at any time later. We chose Redis as this queue; the data flow is shown below.

Best practices

A database migration cutover is the step that tolerates errors the least. Applications are stateless and can be rolled back quickly when problems occur; a database migration, by contrast, demands careful planning of resource preparation, the release process, and contingency handling.

Since many business units needed to go through this process, our project team codified every step into a best practice. Our migration checklist is listed below for reference:

| Stage | Phase | Task | Done? | Owner | Duration | Start time | End time | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Daytime | Full data phase | Create the MySQL database and prepare account resources | | DBA | | | | |
| | | Enable CDC | | DBA | | | | |
| | | Dump a snapshot from the slave SQL Server to the backup SQL Server | | DBA | | | | |
| | | Consume data from the backup SQL Server and ETL it into MySQL | | DBA | | | | |
| | Incremental data phase | Verify that the ETL data has been fully consumed; check total row counts | | DBA | | | | |
| | | Consume CDC data from the slave SQL Server and keep writing it into MySQL | | DBA | | | | |
| | | Use Yugong to check data consistency across the whole day | | DBA | | | | |
| | | Re-check any inconsistent rows manually 10 minutes later to rule out CDC delay | | DBA | | | | |
| | | Check total row counts | | DBA | | | | |
| | | Fully check sampled tables using Yugong | | DBA | | | | |
| Early morning | Application release phase | Stop the applications on SQL Server | | Technical manager | | | | |
| | | Check that no connections to SQL Server remain | | DBA | | | | |
| | | Use Yugong to check data consistency across the whole day | | DBA | | | | |
| | | Check total row counts | | DBA | | | | |
| | | Start the MySQL-based applications | | Operations | | | | |
| | Testing phase | Test whether the applications work properly; regression-test all functions | | QA | | | | |
| | | (Added on the fly) Test ReadOnly DB application access | | QA | | | | |
| | Completion phase | Let traffic in | | Operations | | | | |
| | (Optional) Rollback phase | If a problem is found, switch the applications straight back to SQL Server | | Operations | | | | |
| | | Afterwards, audit the data and compensate for the new data | | DBA | | | | |
| | | During rollback, use Canal to read the binlog and the Canal client to restore the data to SQL Server | | DBA | | | | |

Reference

  • Materialized View Concepts and Architecture
  • Tuning the Performance of Change Data Capture in SQL Server 2008 | Microsoft Docs
  • alibaba/yugong: Alibaba's Oracle data migration and synchronization tool (full + incremental, supports MySQL/DRDS as targets)
  • alibaba/canal: Alibaba's MySQL binlog incremental subscription & consumption component, used by Aliyun DRDS (https://www.aliyun.com/product/drds), Alibaba TDDL secondary indexes, and small-table replication

Original link: https://blog.alswl.com/2018/05/sql-server-migration-2/

Welcome to follow my WeChat public account: Peep Leopard
