Background

During rapid system iteration, business systems are often deployed in the same physical database, with no physical isolation between core and non-core data. As data volume grows, this creates stability risks: slow SQL, disk pressure, and I/O contention in the shared database affect everything in it, and thus the stability of the core business. The core business tables therefore need to be extracted from the original database and placed in a new, separate one. Migrating core data involves one key difficulty: how to migrate the data smoothly, without users noticing. This article shows how we solved this problem, based on the migration of the Xianyu (Idle Fish) commodity database.

Current State of Xianyu Commodity Data

The Xianyu commodity data volume exceeds XX billion rows. A MySQL cluster with database/table sharding and read-write separation supports the online query service, as shown in the figure below, with the TDDL[1] database middleware providing efficient, unified access. For readers unfamiliar with sharding, here is a brief introduction.

01 Principles of Database and Table Sharding

In essence, sharding is horizontal partitioning: a database is split into multiple parts placed on different databases (servers), alleviating the performance limits of a single database. The following figure shows the core principle of table and database splitting:
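As a concrete illustration of the routing idea (a simplified sketch, not TDDL's actual implementation; the shard counts and naming scheme are assumptions), a modulo-hash rule mapping an item's sharding key to one of 4 databases, each holding 8 tables, might look like this:

```python
DB_COUNT = 4       # number of physical databases (assumed for illustration)
TABLES_PER_DB = 8  # number of tables per database (assumed)

def route(sharding_key: int) -> tuple:
    """Map a sharding key to a (database, table) pair via modulo hashing."""
    slot = sharding_key % (DB_COUNT * TABLES_PER_DB)  # 32 logical slots
    db_index = slot // TABLES_PER_DB                  # which physical database
    table_index = slot % TABLES_PER_DB                # which table inside it
    return ("item_db_%d" % db_index, "item_%d" % table_index)

# The same key always routes to the same shard, so reads find
# exactly what writes stored.
print(route(1234567))  # → ('item_db_0', 'item_7')
```

Because the routing is deterministic, adding capacity later means changing the slot-to-database mapping, which is one reason the number of logical slots is usually chosen larger than the initial number of physical databases.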

Of course, sharding also has downsides: table structure changes and related management are more troublesome than with a single table, and carry some risk. Whether to shard should be decided based on the actual situation.

02 Global Sequence Generation under Sharding

Sharding solves the capacity and performance problems of the online service, but it also increases complexity of use. Flexible configuration of routing rules and data routing, along with easy-to-use encapsulation, must be provided so that the business is unaware of the sharding. Alibaba's open-source middleware TDDL provides such a solution; the corresponding product on Alibaba Cloud is DRDS[2].

The details of TDDL are beyond the scope of this article. However, primary-key conflicts are a major fault risk during database migration, so this section briefly introduces how TDDL generates globally unique primary keys.

As shown in the figure above, a TDDL Sequence is based on a database update plus in-memory allocation. Each update reserves a batch of IDs; the batch size is the Sequence's inner step. The persisted value is advanced by the outer step, and subsequent IDs are handed out directly from memory.

In addition, during the data migration, to ensure the new database's primary keys do not conflict with the old database's, the new database's sequence must be set to a starting value well ahead of the old one. This is one of the key points to watch in the subsequent migration.

Data Migration Scheme

With the brief introduction above, we have a preliminary picture of the Xianyu commodity database. The following sections describe how Xianyu achieved a stable migration of it.

01 Core Ideas

The core idea of data migration is simple in the abstract: migrate the data stably and smoothly, as shown in the figure below:

But around this process many questions arise:
1. How should the data be migrated: in one shot, or in stages?
2. How do we verify correctness during the migration?
3. What if the business transformation has problems? How do we detect them early, and how do we roll back?
4. How will the new database perform?

With these questions in mind, let us refine the migration scheme step by step.

02 Implementation Scheme

As shown in the figure above, the whole scheme is divided into several parts:

1. System transformation, including SQL adaptation, the double-write switch, and sequence creation in the new database.

SQL adaptation: load two sets of TDDL data sources, one for the old database and one for the new, and generate two sets of MyBatis SQL templates.
Double-write switch: a switch controlling writes to the new and old databases, used during the double-write phase and for prompt rollback if problems occur during the online migration.
Sequence creation: when migrating a sequence table, the values in the new database's sequence table must be raised to prevent primary-key collisions, and a safe value must be estimated from the primary-key consumption rate. This is an important detail worth emphasizing again.
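The double-write switch logic can be sketched as follows (class and switch names are illustrative, not Xianyu's actual code). The key property: while the old database remains authoritative, a failed write to the new database is logged and swallowed so it cannot break the online path:

```python
import logging

WRITE_OLD, WRITE_BOTH, WRITE_NEW = "old", "both", "new"

class ItemWriter:
    def __init__(self, old_db, new_db, switch=WRITE_OLD):
        self.old_db, self.new_db = old_db, new_db
        self.switch = switch  # flipped at runtime, e.g. via a config push

    def save(self, item):
        if self.switch in (WRITE_OLD, WRITE_BOTH):
            self.old_db.append(item)            # old DB stays authoritative
        if self.switch in (WRITE_BOTH, WRITE_NEW):
            try:
                self.new_db.append(item)
            except Exception:
                if self.switch == WRITE_BOTH:
                    # During double-write, a new-DB failure must not affect
                    # the online service; log and rely on reconciliation.
                    logging.warning("new DB write failed for %s", item)
                else:
                    raise  # new DB is authoritative now; surface the error

old, new = [], []
writer = ItemWriter(old, new, switch=WRITE_BOTH)
writer.save({"id": 1})
print(old, new)
```

Rolling back is then just flipping `switch` back to `WRITE_OLD`, which is what makes the double-write phase a safe place to discover problems.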

2. Stability assurance. A database migration is a major operation, and stability matters most during the transformation. This mainly covers system load testing, online traffic replay, and fault drills.

System load testing: performance testing of the new database, to avoid surprises after the switch.
Online traffic replay: Edsger W. Dijkstra said that if debugging is the process of removing bugs, then programming must be the process of putting them in. By replaying online traffic in the test environment, as many problems as possible can be found ahead of time, ensuring stability after the transformation.
Fault drills: by injecting faults such as failed writes to the new database or logic errors in the new path, the rollback strategy can be rehearsed in advance.

3. Data migration, mainly using the migration capabilities of Alibaba Cloud's Data Transmission Service (DTS)[3]; it involves full migration, incremental migration, consistency verification, and a reverse task.

Full migration: the first goal is moving the historical data into the new database. Our approach is to pick a point in time, find the minimum and maximum IDs of each source table as of that point, and then import the rows into the target database in ranged batches, as shown in the figure:

* The whole process queries the standby of the online database, so it does not affect the online business's database service.
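The batching idea can be sketched as follows (plain Python structures stand in for the source and target tables; a real job would run ranged SELECTs against the standby and bulk INSERTs into the target):

```python
def migrate_full(source_rows, target, batch_size=1000):
    """Copy all rows present at the chosen point in time, in ranged
    ID batches from the table's minimum ID to its maximum ID."""
    ids = sorted(source_rows)            # source_rows: {id: row}
    lo, hi = ids[0], ids[-1]             # min/max IDs at the chosen point
    cur = lo
    while cur <= hi:
        # One ranged "SELECT ... WHERE id >= cur AND id < cur + batch_size"
        batch = [source_rows[i] for i in ids if cur <= i < cur + batch_size]
        target.extend(batch)             # one bulk insert into the new DB
        cur += batch_size

source = {1: "a", 2: "b", 5: "c", 6: "d"}  # IDs may have gaps
target = []
migrate_full(source, target, batch_size=2)
print(target)  # → ['a', 'b', 'c', 'd']
```

Ranged batches keep each query cheap and resumable: if the job dies, it restarts from the last completed range rather than from scratch.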

Incremental migration: since the business keeps running during the migration, the data written after the full-migration point must be caught up once the full migration completes. The core principle is to replay the binlog from the full-migration start point onward to bring the target into consistency. Note that the incremental start position must be moved earlier by a small margin (about 5 minutes): there is a time gap when the full migration starts, and beginning the incremental replay ahead of it guarantees eventual consistency of the data, as shown in the figure below:
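The key detail can be sketched as follows (function names are hypothetical): the binlog replay position is set back by a safety margin relative to the recorded full-migration start time, and events are applied idempotently, so replaying the overlapping window merely overwrites rows instead of corrupting them:

```python
from datetime import datetime, timedelta

SAFETY_MARGIN = timedelta(minutes=5)  # covers the full-migration start gap

def incremental_start(full_migration_start: datetime) -> datetime:
    """Binlog replay must begin before the full-migration snapshot point."""
    return full_migration_start - SAFETY_MARGIN

def apply_event(table: dict, event: dict):
    """Idempotent apply: inserts/updates collapse to an upsert, so events
    from the overlap window can be replayed safely any number of times."""
    if event["op"] == "delete":
        table.pop(event["id"], None)
    else:
        table[event["id"]] = event["row"]

start = incremental_start(datetime(2021, 6, 1, 12, 0, 0))
print(start)  # → 2021-06-01 11:55:00
```

Idempotent application is what makes the deliberate overlap harmless: a row copied by the full migration and then re-applied from the binlog simply ends up with the same, newer value.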

Consistency verification: after the full and incremental migrations, the source and target data should in theory be consistent. In practice, however, the functional testing and online traffic replay phases may leave the two inconsistent, so before the official launch a data consistency check is needed. Its principle is to query the source tables in sampled ranges (using the same batching method as the full migration) and compare the results against the target database, as shown in the figure:
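A sketch of the sampled comparison (in-memory dicts stand in for the two databases; a real check would compare row checksums fetched over the same ID ranges as the full migration):

```python
def check_consistency(source: dict, target: dict, sample_ids):
    """Compare sampled rows between source and target databases;
    return the IDs whose rows differ (missing rows count as differing)."""
    mismatches = []
    for i in sample_ids:
        if source.get(i) != target.get(i):
            mismatches.append(i)
    return mismatches

src = {1: "a", 2: "b", 3: "c"}
dst = {1: "a", 2: "B", 3: "c"}  # row 2 was corrupted during testing
print(check_consistency(src, dst, [1, 2, 3]))  # → [2]
```

Any mismatched IDs are then repaired (typically by re-copying those rows from the source) before the cutover.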

Reverse task: after the migration, some offline services will still depend on the old database, so a backflow task from the new database to the old one is needed. The principle is the same as incremental migration, with source and target swapped.

03 Migration Process

By now the key points of the migration should be clear, but one crucial thing remains: working out the order of the migration steps. There are usually two schemes.

Scheme 1:

Scheme 2:

1. DTS data catch-up: full synchronization is complete, incremental synchronization is enabled, and the replication delay is within seconds.
2. Pre-launch checks, including traffic replay, load testing, consistency verification, and fault drills on the core links.
3. Online switch: writes go to the new database; at the same time, the incremental synchronization task must be stopped to prevent invalid overwrites.
4. Create the reverse task so that data flows back to the old database.

A comparison of the two schemes:

In summary, Scheme 1's migration process is more complex, but it controls the migration more finely; it suits complex businesses with many low-level changes that want fine-grained control over the migration steps. Scheme 2 is simpler and faster; it suits relatively simple, controllable business processes that want a quick switchover. Which scheme to choose depends on your own business situation.

Given the complexity of Xianyu's commodity business and the many low-level changes involved, we ultimately chose Scheme 1 for the sake of stability.

In Scheme 1, the most critical steps are 3, 4, and 5, so a rollback plan must be prepared in advance.

04 Rollback Scheme

The general principle of rollback is to lose no data. The most likely failure point is during the double-write period: if the new database has a problem that causes online service exceptions, simply turn off writes to the new database immediately. After the switch, if the new database has a problem (such as a performance issue), traffic can be switched back to the old database at once, with data consistency maintained through the reverse task. Finally, since distributed transactions are not enabled, data inconsistency can occur during double-writing, so the double-write window should be kept as short as possible.

Summary

Through careful design of the migration scheme, together with the powerful data migration capabilities of DTS, the Xianyu commodity database successfully completed the migration of an online database service with more than XX billion rows; independent physical deployment significantly improved the stability of the commodity database's online service. The database situations of different business systems may differ, involving for example single-database-to-multi-database or single-table-to-multi-table migrations, but the overall scheme is roughly similar. We hope this article provides a feasible reference for database migration practice.

This article is from "Xianyu Technology" (闲鱼技术), a partner of the Yunqi community. For related information, follow "Xianyu Technology".