Data migration often comes up when an enterprise runs several types of databases, or when it needs to switch from one database to another. Some migrations are simple, others are much more involved. Have you thought about which issues you need to identify and solve to complete a complex database migration successfully?


In previous jobs I migrated data from Oracle to Informix, from Oracle to SQL Server, and from Oracle to MySQL. At my current company, because of the "de-Oracle" (going off Oracle) initiative, we have done a lot of migration work and hit many pitfalls, so I would like to share some of our practices from that process.


Sharing Outline:

  1. Preparation and considerations before going off Oracle

  2. Determine the target database

  3. Table and data object migration and tool comparison

  4. Migration of other objects

  5. Data verification

  6. Some performance parameters


1. Preparation and considerations before going off Oracle



For reasons such as cost and budget, the company decided to move off Oracle. Before doing so, you first have to decide what will replace Oracle, which tools will move the data from the source database to the target, and how the export and import will be done; how to handle the incremental data that keeps arriving during the migration; how to handle the differences between source and target in data types and in database objects such as views, stored procedures, and triggers, without affecting the performance of the source database; and how to compare the data after the migration and confirm that the application still performs correctly once the data is in place.


2. Determine the target database



Before doing the data migration, we first had to confirm the target database, that is, which database to move to. After some research we chose MySQL, based on aspects such as speed and popularity, and because we believed that after its acquisition by Oracle, MySQL would only keep getting better.


Of course, we also considered PostgreSQL, but in our tests the QPS of MySQL 5.7 was higher than that of PostgreSQL on the same configuration, and because of the performance requirements of online transactions we chose MySQL. After choosing MySQL, picking a branch and version was another headache: Percona adds a number of performance-related patches, MariaDB supports more storage engines, and the official build already meets our current requirements. Being conservative, our core databases eventually use the official build; some less critical databases also use the other branches.


Because of MyCat support, we finally settled on version 5.6 (at the time MyCat 1.6 did not support MySQL 5.7 very well). To get an architecture as stable as Oracle's DG/OGG, we built the MySQL side as a dual data center MHA setup and used MyCat for read/write splitting. Likewise, because applications were still running against Oracle, all synchronization work was carried out against the Oracle standby database and the OGG node in the same data center, to keep the load off the Oracle primary.


3. Table and data object migration and tool comparison



After selecting the database that would replace Oracle, the next step was to pick an appropriate migration tool. We spent a lot of time and effort on this choice. Migration is a long and difficult job, and we went through several stages and methods along the way: starting with the most basic CSV load, then our own programs, then the tools officially recommended by Oracle and MySQL, and finally some ETL tools. After being burned by so many tools, we fortunately ended up with a suitable approach for each situation.


Next we will give a brief introduction to each of them and some of the problems encountered in their use.


1. SQL LOAD



At the very beginning we only had to migrate a single project. Short on time, we did not evaluate or select any migration tool and simply used the simplest method: SQL LOAD.


The whole operation is even easier than putting an elephant in a refrigerator: it takes only two steps. First, export the Oracle data to CSV or SQL; then LOAD or source it into MySQL.


There are a number of tools that can turn Oracle data into CSV or SQL, such as SQL Developer or Toad, but I prefer Spool. You have probably used Spool: combined with a few SET options it writes query output to a specified file, and with a reasonable row and column separator you can generate a CSV file.
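As a rough sketch (assuming a hypothetical three-column table named test, not the exact table used in the example below), the export and load might look like this:

    -- SQL*Plus side: spool the table out as CSV
    set echo off feedback off heading off pagesize 0 linesize 1000 trimspool on
    spool /tmp/test.csv
    select id || ',' || name || ',' || to_char(created, 'yyyy-mm-dd hh24:mi:ss') from test;
    spool off

    -- MySQL side: load the CSV into the corresponding table
    load data infile '/tmp/test.csv' into table test
      fields terminated by ',' lines terminated by '\n';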


The advantage of SQL LOAD is that it is fast and extremely simple, but it has many disadvantages too: it is hard to automate and hard to scale across many tables, and free-text fields such as a Comments column are difficult to export cleanly.


Let’s look at a simple example:



The first step is to create a table in Oracle, a very simple one with only four columns, and then insert three rows and check the contents.



Then we ran some simple queries that might be needed later.



A real Spool export script is more complicated than this, with extra function calls to handle line breaks, dates, LOBs, and so on. Let's take a look at it.



Then I create the same table in MySQL to load the data into. LOAD DATA syntax is not today's topic; in short, it loads a file into a table, with configurable row and column separators. You can see that three rows are loaded with three warnings: one for the second row and two for the third. An empty string was supplied for the INT column, and the time value was truncated. Looking at the data in the table, it is not what we expected.



Querying the same data in Oracle again shows that the results differ.


This is because in MySQL, inserting an empty value into an INT column silently converts it to 0.


This is clearly stated in the official documentation:

An empty field value is interpreted differently from a missing field:

For string types, the column is set to the empty string.

For numeric types, the column is set to 0.

For date and time types, the column is set to the appropriate “zero” value for the type.



So if NULL is what you actually want, it has to be stated explicitly: in the data file a NULL field is represented as \N rather than as an empty field. We can also enable a strict mode that refuses such inserts into INT columns instead of silently converting them; sql_mode is discussed in more detail below.
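A hedged sketch of both fixes, using the same hypothetical test table as above: represent NULL as \N in the data file, and enable strict mode so bad values fail loudly instead of being silently coerced:

    -- In the CSV, a NULL numeric or date field should appear as \N, e.g.:  3,\N,\N
    set session sql_mode = 'STRICT_TRANS_TABLES';
    load data infile '/tmp/test.csv' into table test
      fields terminated by ',' lines terminated by '\n';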


2. Python



After migrating part of the data this way, we felt that LOAD DATA is simple and fast, but its defects cannot be hidden: there is always some problem or other, and the migration is often followed by a lot of data repair work.


Soon we abandoned this approach. To be fair, SQL LOAD is fast and has no dependency on other components, so it is suitable for single-table jobs without complex data types. We then wrote Python code to take over the data migration. Using Python is actually quite simple and can also be broken into three steps. The first step is to create a configuration table that maps each source table to its MySQL target and records whether the load is full or incremental and, if incremental, which column drives the increment. The second step is to write the code that moves the data according to that mapping. The last step is to write crontab entries that schedule the jobs with different input parameters.
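For illustration only, the configuration table described in the first step might look roughly like this (all names here are hypothetical, not our actual schema):

    create table mig_table_config (
      src_owner   varchar(30) not null,            -- Oracle schema
      src_table   varchar(30) not null,            -- Oracle table name
      dst_table   varchar(64) not null,            -- target MySQL table
      load_mode   enum('full','incr') not null,    -- full reload or incremental
      incr_column varchar(64) default null,        -- column that drives the increment
      last_value  varchar(64) default null,        -- high-water mark of the last run
      primary key (src_owner, src_table)
    ) engine = innodb;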


Python lets you transform data on the way in, configure options more flexibly, and apply more logic of your own. It also has drawbacks: it is noticeably slower (slower only than LOAD DATA, and still much faster than Java), handling exceptions takes a fair amount of code, and of course the code has to be written and maintained.


We can take a quick look at its implementation:



This code snippet shows the logic for incremental synchronization of data for each day.



This is the log generated after each batch run. As you can see, warnings and errors are listed and the row counts are recorded, so it is quite a handy little tool. You can also see that 60,000 rows took about 4 seconds, which is slower than LOAD DATA but faster than Java and similar approaches.


3. OGG



We turned to OGG not because Python was too slow, but because too much of the logic had to be implemented in our own code, and because some data had to be kept synchronized to MySQL in real time before Oracle was completely taken offline. To be clear, OGG is for scenarios that need real-time synchronization and possibly writing data back.


Using OGG is simple in principle: configure the Oracle side, configure the MySQL side, and bring up the corresponding processes. But anyone who has used OGG knows that setting up OGG is troublesome in itself, and synchronization between heterogeneous databases needs a long stretch of configuration and tuning before it is usable. So I will only outline the approach here; unless you really have such a hard requirement, I do not recommend it.



A brief description of the process and considerations of using OGG:


1. MySQL 5.6 requires OGG 12.1.2 or later.


2. DDL replication between heterogeneous databases is not supported

  • Synchronization from Oracle to MySQL is a heterogeneous architecture that does not support DDL synchronization, including adding and deleting fields, adding and deleting indexes, renaming tables, and analyzing statistics of tables.

  • If DDL changes are required, they must be performed manually on both the source and the target.


3. defgen must be configured, and the generated definition files must be stored in the same directory.


4. For bidirectional replication, the MySQL server binlog must be set to ROW format.

binlog_format: This parameter sets the format of the logs. It must be set to the value of ROW, which directs the database to log DML statements in binary format. Any other log format (MIXED or STATEMENT) causes Extract to abend.
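For reference, a quick way to check and switch the format (the SET GLOBAL form only affects new sessions; for a permanent change put binlog_format = ROW under [mysqld] in my.cnf):

    show variables like 'binlog_format';
    set global binlog_format = 'ROW';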


5. GoldenGate only supports the InnoDB engine for MySQL, so specify the InnoDB engine explicitly when creating the MySQL tables:


create table MySQL (name char(10)) engine=innodb;


Full details can be found in the online help:

http://docs.Oracle.com/goldengate/c1221/gg-winux/GIMYS/system_requirements.htm#GIMYS122


To recap, OGG is a log-based structured data replication tool: it parses the source database's online or archived logs to capture incremental changes and then applies those changes to the target database, so near real-time synchronization is achievable.


4. MySQL Migration Toolkit


MySQL also provides a tool for data migration between heterogeneous databases, either from MySQL to other databases or from other databases into MySQL: the MySQL Migration Toolkit. It can be downloaded separately and is also integrated into MySQL Workbench; the standalone download, however, is only available for Windows.


https://downloads.MySQL.com/archives/migration/


It is a Java-based program, so it depends on JAR packages; the first step in using it is to load the Oracle JDBC driver JAR. Then configure the source and target connections, select the objects to import (views can be included, but views containing subqueries usually fail), and run the import.


Its advantage is that it can create the tables on the MySQL side automatically, but the automatic type conversion can be problematic and needs manual intervention. For example, Oracle date/timestamp columns often have sysdate as their default value, which MySQL does not recognize.
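A small sketch of that kind of manual adjustment, on a hypothetical table t_demo:

    -- Oracle definition: created_at date default sysdate not null
    -- MySQL equivalent after manual adjustment:
    create table t_demo (
      id         int not null primary key,
      created_at timestamp not null default current_timestamp
    ) engine = innodb;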


The disadvantages are that it is Windows-only and that it is very likely to hang on large data volumes, so it is best suited to importing a small batch of data at a time.


5. Kettle



We used the tools above one after another, and none of them is perfect; each has its shortcomings. Finally, we come to the ETL tool we recommend: Kettle.


It is written in pure Java and, as its name suggests, is designed to pour all kinds of data into a kettle and let it flow out in a specified format. You could also use DataStage or Informatica, but both are commercial products, whereas Kettle is free and open source.


Here we only use its simplest functionality, which is enough to migrate data from Oracle to MySQL.


The first step is to load the MySQL JDBC driver JAR; note that different MySQL versions may require different JARs. The second step is again to configure the connection information and make sure both the source and the target connect successfully. The final step is simply drag and drop, and then run.


Its advantages: it is much faster to configure than OGG, near real-time synchronization can be achieved through jobs, the processing speed is similar to the Python approach, you do not have to write the mapping yourself, and it has a graphical interface. Like the Migration Toolkit it can create tables (via a JavaScript step) and perform type conversions, but with more detailed logging. The learning cost may be a bit higher, though, and you need to understand some Java errors to debug effectively.


Let’s look at a quick demo:



Run spoon.sh to open this screen. The view pane shows the transformation's name, data sources, steps, and so on, and the middle area holds the steps you drag out: one input and one output. That is all a simple data migration needs.



Open the input step: it is just a simple SQL query that extracts data from the source database. The SQL can also be generated by dragging, much like building a report in Cognos. Do not end it with a semicolon!



The output step has many more options: select the target data source, let it map the columns automatically, decide whether to truncate the target between runs, and whether to stop the migration when a problem occurs.



As shown here, its logging goes beyond the Migration Toolkit's granularity down to the row level, so problems can be analyzed much more quickly.



Here is the detailed log output. For scheduled batch runs we generally redirect it to a dedicated log file and then turn it into an alert email.


4. Migration of other objects



The sections above spent a lot of space introducing several migration tools; each works differently, and you should choose the one that fits your scenario. However, everything migrated so far has been tables and data. As we all know, a database contains other objects too: views, materialized views, stored procedures, functions, packages, indexes, and whether the existing SQL needs to be rewritten is also something to consider.


Now let’s look at how other objects migrate.


1. Views


In MySQL, a view definition cannot contain a subquery in the FROM clause:

       create view v_test as select * from (select * from test) t;

       ERROR 1349 (HY000): View’s SELECT contains a subquery in the FROM clause


The workaround is to split it into two views:

       create view v_sub_test as select * from test;

Query OK, 0 rows affected (0.02 sec)

       create view v_test as select * from v_sub_test;

       Query OK, 0 rows affected (0.00 sec)


2. Materialized views


Materialized views pre-compute and store the results of expensive operations such as table joins or aggregations, so that queries can return quickly instead of repeating those operations. MySQL does not have this feature, but it can be simulated with the event scheduler plus a stored procedure. The hard part is refreshing the simulated materialized view: if near-real-time refresh is required and the table is large, there can be performance problems.
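A minimal sketch of that simulation, assuming a hypothetical base table orders and a summary table mv_order_daily standing in for the materialized view:

    -- Summary table acting as the "materialized view"
    create table mv_order_daily (
      order_day date primary key,
      order_cnt int not null,
      amount    decimal(16,2) not null
    ) engine = innodb;

    -- Refresh procedure: rebuild the summary from the base table
    delimiter //
    create procedure refresh_mv_order_daily()
    begin
      truncate table mv_order_daily;
      insert into mv_order_daily
        select date(create_time), count(*), sum(amount)
        from orders
        group by date(create_time);
    end//
    delimiter ;

    -- The event scheduler refreshes it periodically (event_scheduler must be ON)
    create event ev_refresh_mv_order_daily
      on schedule every 1 hour
      do call refresh_mv_order_daily();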


3. Triggers, stored procedures, packages



1) Oracle allows a single trigger to fire on several events (for example, INSERT OR UPDATE), but MySQL does not, so such triggers have to be split into two (or more) during migration.


2) MySQL does not support %TYPE. It is used very often in Oracle, and is a good habit there, but each such declaration has to be rewritten with an explicit type.


3) The ELSEIF branching syntax is different, and MySQL has no FOR loop.


4) A MySQL stored procedure cannot return a cursor, and a cursor must be bound to its statement when it is declared.


5) Oracle uses packages to group stored procedures, and common variables and types can be defined in a package, which is convenient for programming and reduces compilation overhead on the server. MySQL has no such concept, and MySQL functions cannot be overloaded either.


6) The built-in functions differ. MySQL has no to_char() or to_date(); on the other hand it is not all in Oracle's favor, since MySQL has functions such as substring() and load_file() that Oracle lacks.


7) In MySQL, assignment uses SET ... = ..., and || is not string concatenation (by default it is treated as logical OR); use concat() instead.


8) MySQL comments introduced with -- must have a space between the -- and the comment text.


9) Only LEAVE can be used to exit a MySQL stored procedure early; RETURN is not allowed (it is only valid in functions).


10) MySQL can also declare and handle exceptions, but the syntax and object names differ from Oracle's.
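To illustrate points 1) and 9), here is a rough sketch of what the rewrites can look like in MySQL (t_demo, t_stat, and the procedure are hypothetical):

    -- Oracle's "after insert or update on t_demo" becomes two triggers in MySQL
    delimiter //
    create trigger trg_t_demo_ins after insert on t_demo
    for each row
    begin
      update t_stat set ins_cnt = ins_cnt + 1 where id = 1;
    end//

    create trigger trg_t_demo_upd after update on t_demo
    for each row
    begin
      update t_stat set upd_cnt = upd_cnt + 1 where id = 1;
    end//

    -- Early exit from a procedure uses LEAVE with a label, not RETURN
    create procedure p_demo(in p_id int)
    p_demo: begin
      if p_id is null then
        leave p_demo;
      end if;
      update t_demo set created_at = now() where id = p_id;
    end//
    delimiter ;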



4. Paging statements



MySQL pages with the LIMIT keyword, whereas Oracle uses ROWNUM, so every paging-related statement needs to be rewritten.
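For example, a typical ROWNUM paging query has to be rewritten with LIMIT (t_demo is hypothetical; this fetches rows 11-20 ordered by id):

    -- Oracle:
    --   select * from (
    --     select t.*, rownum rn
    --     from (select * from t_demo order by id) t
    --     where rownum <= 20
    --   ) where rn > 10;
    -- MySQL:
    select * from t_demo order by id limit 10, 10;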


5. Joins



If your SQL contains a lot of Oracle's (+) outer-join notation, this can be a headache: each such query has to be rewritten with ANSI JOIN syntax.
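A typical rewrite looks like this (ta and tb are hypothetical tables):

    -- Oracle (+) outer join:
    --   select a.id, b.name from ta a, tb b where a.id = b.id(+);
    -- MySQL rewrite with ANSI join syntax:
    select a.id, b.name
    from ta a
    left join tb b on a.id = b.id;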


6. GROUP BY statements


By default, MySQL lets you select columns that are not in the GROUP BY clause, but the result may not be what you expect. The reason for this odd behavior is the sql_mode setting, which we discuss in more detail below. Depending on sql_mode, GROUP BY statements that worked in Oracle may either refuse to run or silently return unexpected results after migration, and adjusting them can take a long time.


7. Bitmap indexes


In Oracle, bitmap indexes can be used for Bloom-filter-style optimizations and speed up certain queries, and they are well suited to data warehouse workloads. MySQL has no such index, so SQL that was tuned around bitmap indexes in Oracle may show significant performance problems in MySQL.


There is no good solution at present; you can try building a B-tree index and see whether it solves the problem. A request for bitmap indexes was filed in the MySQL bug tracker as a medium-priority feature request, but it has not been implemented yet.


8. Partitioned tables



Partitioned tables need special handling. Unlike Oracle, MySQL requires the partition key to be part of every primary key and unique key on the table. To avoid affecting application logic and queries, the target schema has to be redefined with an appropriate partition key.
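A hedged sketch of such a redesign, using a hypothetical t_order table range-partitioned by date, with the partition column included in the primary key:

    create table t_order (
      id         bigint not null,
      created_at date   not null,
      amount     decimal(16,2),
      primary key (id, created_at)            -- partition column must be in the PK
    ) engine = innodb
    partition by range (to_days(created_at)) (
      partition p2016 values less than (to_days('2017-01-01')),
      partition p2017 values less than (to_days('2018-01-01')),
      partition pmax  values less than maxvalue
    );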


9. Roles



MySQL had no role object before 8.0, so if you encounter roles during migration you need to expand them into GRANT statements that assign the privileges directly to the users. One convenient point on the MySQL side is that its user accounts are host-specific.


10. Emojis and special characters


In Oracle we usually choose the AL32UTF8 character set, which handles special characters and emoji. In MySQL, some of our tables contained a large number of emoji, and with the utf8 character set they all turned into question marks after the migration; once we switched to utf8mb4 they came through correctly. We therefore recommend installing every database with utf8mb4 as the default.
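As a sketch (appdb and t_comment are hypothetical names), creating the target objects with utf8mb4, and converting an existing utf8 table, looks like this:

    create database appdb default character set utf8mb4;

    create table appdb.t_comment (
      id      bigint not null primary key,
      content varchar(1000)
    ) engine = innodb default charset = utf8mb4;

    -- For a table that was already created as utf8:
    alter table appdb.t_comment convert to character set utf8mb4;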


There are many differences between Oracle and MySQL, such as flashback, AWR, and so on. Here are just a few related to migration work.


5. Data verification



After the data migration is done, how do you make sure it is correct, with nothing missing and nothing wrong? That is the hard part. The difficulty is not doing the checks but automating them so that they actually save manual effort: the two systems have different data types and large data volumes, and ad-hoc checking scripts do not work very well.


Our data checking mainly relies on the logs and warnings produced during the import: SHOW WARNINGS and errors during LOAD DATA, and careful inspection of every error reported when using Python, OGG, Kettle, and the other tools.


1. count(*)



After a full or incremental run completes, start with the simplest check: compare count(*) between MySQL and Oracle. If the row counts match, verify the content. Because the data volume is large, we only do sampled checks. If manual validation finds no problems, run a copy of the application against the migrated data on the standby environment and check again.


2. ETL tools


Another recommended method is to use an ETL tool: configure MySQL and Oracle as two data sources, extract from each, and build a cube for multi-dimensional report presentation. Any data skew is then visible at a glance.


Data integrity verification is very important. Do not skip it just because the extraction and synchronization steps already took a long time: if unverified, incorrect data goes into use, the consequences are far more serious.


3. sql_mode



https://dev.MySQL.com/doc/refman/5.5/en/sql-mode.html


The MySQL server can operate in different SQL modes and can apply them differently for different clients, so applications can tailor the server's behavior to their own needs. sql_mode defines which SQL syntax MySQL supports and which validation checks it performs on the data.


  • TRADITIONAL


Setting this enables a "strict mode" that restricts the values the database will accept (similar to other database servers); put simply, inserting an incorrect value into a column produces an error rather than a warning.


  • ONLY_FULL_GROUP_BY


The default sql_mode does not include ONLY_FULL_GROUP_BY; in that case, for a SELECT with GROUP BY, MySQL allows the target list to contain expressions that are neither aggregate functions nor GROUP BY columns. The value of such an expression is undefined after grouping (in practice MySQL returns the value from the first row of each group). With ONLY_FULL_GROUP_BY enabled, every column in the select list has well-defined semantics.


Simply put, in ONLY_FULL_GROUP_BY mode every item in the target list must be either an aggregate function or an expression that appears in the GROUP BY list.
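A small example on a hypothetical t_emp table: the first query is valid under ONLY_FULL_GROUP_BY, the second is rejected (and, without the mode, returns an arbitrary emp_name per group):

    select dept_id, max(salary)
    from t_emp
    group by dept_id;            -- OK: aggregate plus grouped column

    select dept_id, emp_name, max(salary)
    from t_emp
    group by dept_id;            -- error under ONLY_FULL_GROUP_BY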


  • Trailing spaces

All MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant.


In other words, MySQL's collations are PAD SPACE: CHAR and VARCHAR comparisons ignore trailing spaces, regardless of server configuration or MySQL version.


  • explicit_defaults_for_timestamp


The TIMESTAMP type in MySQL behaves a bit differently from other types when explicit_defaults_for_timestamp is not enabled. By default, if a TIMESTAMP column is not explicitly declared nullable, it is automatically given the NOT NULL attribute (columns of other types allow NULL if nothing is specified), and inserting NULL into it sets the column to the current timestamp. The first TIMESTAMP column in a table, if it has no NULL attribute, no explicit DEFAULT, and no ON UPDATE clause, is automatically given the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes. Any later TIMESTAMP column that is not declared nullable and has no explicit DEFAULT gets DEFAULT '0000-00-00 00:00:00', and if an INSERT provides no value for it, '0000-00-00 00:00:00' is stored without warning.


If we set explicit_defaults_for_timestamp=1 in the configuration file at startup, MySQL handles TIMESTAMP columns as follows.


If a TIMESTAMP column is not explicitly declared NOT NULL, it is nullable by default; inserting NULL then stores NULL rather than the current timestamp. And the first TIMESTAMP column in a table is no longer automatically given the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes unless you specify them explicitly when creating the table.
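A quick sketch of the default behavior (explicit_defaults_for_timestamp = 0) on a hypothetical table:

    create table t_ts_demo (
      id  int not null primary key,
      ts1 timestamp,     -- becomes NOT NULL DEFAULT CURRENT_TIMESTAMP
                         --   ON UPDATE CURRENT_TIMESTAMP
      ts2 timestamp      -- becomes NOT NULL DEFAULT '0000-00-00 00:00:00'
    ) engine = innodb;

    insert into t_ts_demo (id, ts1) values (1, null);   -- ts1 stores the current time, not NULL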


6. Some performance parameters




When importing data we can adjust some parameters in advance to get the best throughput: for example, turn off the adaptive hash index and the doublewrite buffer, and enlarge the buffer pool and the redo log files. In short, what can be enlarged is enlarged and what can be switched off is switched off for maximum load performance. The most commonly used parameters are described below.
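As a hedged sketch, the dynamic ones among these can be relaxed for the duration of the import and restored afterwards (the doublewrite buffer and redo log size are not dynamic in 5.6 and have to be changed in my.cnf with a restart):

    set global innodb_flush_log_at_trx_commit = 2;   -- relax redo flushing
    set global sync_binlog = 0;                      -- let the OS flush the binlog
    set global innodb_adaptive_hash_index = OFF;
    set global max_allowed_packet = 64 * 1024 * 1024;
    -- per-session switches for the loading connection:
    set session unique_checks = 0;
    set session foreign_key_checks = 0;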


  • innodb_flush_log_at_trx_commit


If innodb_flush_log_at_trx_commit is set to 0, the log buffer is written to the log file once per second and the file is flushed to disk at the same time; nothing is written or flushed at transaction commit.


If innodb_flush_log_at_trx_commit is set to 1, MySQL writes the log buffer to the log file and flushes it to disk at every transaction commit.


If innodb_flush_log_at_trx_commit is set to 2, MySQL writes the log buffer to the log file at every transaction commit, but the flush to disk happens only about once per second.


Note: due to process scheduling, this once-per-second flush is not guaranteed to happen exactly every second.


  • sync_binlog


The default value of sync_binlog is 0: MySQL never fsyncs the binary log itself and relies on the operating system to flush it to disk from time to time, as it does with any other file.


When sync_binlog = N (N > 0), MySQL calls fdatasync() to flush the binary log to disk after every N writes to it.


Note: with autocommit enabled, every statement counts as one write; otherwise each transaction counts as one write.


  • max_allowed_packet


When importing large volumes of data, especially CLOB data, you may hit the error "Packets larger than max_allowed_packet are not allowed". The system variable max_allowed_packet has a default value of 1048576 (1 MB); you can check the current value with: show variables like '%max_allowed_packet%';


To change it, edit my.cnf and add a line under the [mysqld] section: max_allowed_packet=16777216


  • innodb_log_file_size


If the InnoDB log files are too large, MySQL crash recovery takes longer; if they are too small, the I/O burden increases, so the size has to be balanced. Increase this value while importing data to avoid unnecessary flushing from the buffer pool. But do not make innodb_log_file_size excessively large either, as that significantly increases InnoDB log writes and the operating system's disk cache overhead.


  • innodb_log_buffer_size


The size, in bytes, of the buffer InnoDB uses for writing the log to the log files on disk. To achieve higher write throughput you can increase the default. A large log buffer allows a large transaction to run without writing the log to disk before it commits, so if you have many UPDATE, INSERT, or DELETE transactions, keeping the log buffer large enough saves disk I/O.


  • innodb_buffer_pool_size


This parameter sizes the buffer pool, which caches InnoDB table data, indexes, and the insert buffer. It is the main lever for speeding up InnoDB; within the memory you can spare, the larger the buffer pool, the better.


  • innodb_buffer_pool_instances


The number of instances the InnoDB buffer pool is split into. For systems with buffer pools of several gigabytes, splitting the pool into separate instances improves concurrency by reducing contention between threads that read and write cached pages.


Conclusion



  1. Be sure to choose the migration tool that fits your situation; no single tool is best.

  2. Data validation is very important; sometimes we are pleased that everything has been moved over, then errors turn up during validation and we have to start all over again.

  3. Repeated migrations are normal. Each migration may take a long time to get right, and there will always be mistakes.