How to complete seamless data migration quickly and stably

Author: Changming (Idle Fish Technology)

Background:

In the content community, content tags help describe content. They play an important role in content distribution and understanding, helping the community deliver the right content to the right people. Community tags currently come in two kinds, classification tags and attribute tags, and the two are used differently. However, both live in the same tag system, which causes the following problems:

• Error-prone: people must decide whether a tag is a classification or an attribute, so mistakes such as missed or wrongly applied tags are inevitable.

• Hard to extend: attributes and classifications belong to different business domains, each with its own characteristics, and the current setup does not let them evolve independently.

To solve these problems, the original tag system is being split into a classification system and an attribute system. Specifically, attribute tags are moved out of the original tag system into a new attribute system, and the original tag system then serves as the new classification system. The work involves migrating not only the underlying data but also the attribute service, from the original tag system to the new attribute system.

Current situation:

As shown in the figure below, the community depends heavily on attribute tags. Besides our own business systems, algorithm, search, and data teams also rely on them, and these upper-layer dependents must keep working throughout the migration. In addition, the business side wants the attribute system enabled as soon as possible to make tagging more efficient. In short, the migration faces the following challenges:

• Fast: finish the migration quickly so the new attribute system can be put to use early.

• Stable: there are many dependents, so the attribute service must stay available throughout the migration, without affecting upper-layer services and without users noticing.

• Accurate: the most basic requirement of any migration is that the data stays correct.

Migration scheme:

Ordering the work by scope of impact, from smallest to largest, we defined the following steps: (1) synchronize the underlying storage, (2) build an isolation layer, (3) migrate service reads and writes, and (4) migrate dependents. Dependents here means search, algorithms, and data. Their impact is wide, so their migration needs extra care and can only start once our own services have been migrated correctly. Dependent migration is therefore a separate step, carried out after our own business migration is complete. Each step requires the previous one to be correct, and if a problem occurs, every step can be rolled back to the previous correct state. To keep the data consistent, a data reconciliation service runs throughout the whole process.
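
The rule "each step must be correct before the next, and each can be rolled back" can be pictured as a small runner. This is a minimal sketch, not the team's tooling: `Step`, `run_steps`, and the apply/verify/rollback callbacks are hypothetical names, and in practice each step was separated by a period of observation rather than run in one loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]     # perform the step (e.g. turn a switch on)
    verify: Callable[[], bool]    # e.g. reconciliation reports no differences
    rollback: Callable[[], None]  # undo the step, back to the last correct state


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order; a step that fails verification is rolled back and the run stops."""
    done: List[str] = []
    for step in steps:
        step.apply()
        if not step.verify():
            step.rollback()  # the previous steps stay in their verified state
            break
        done.append(step.name)
    return done
```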

The migration process

1. Migrate storage resources

We started the migration from the underlying storage, so that the attribute tags in the original tag system and in the attribute system stay identical. This takes two parts. The first is a one-time full synchronization of all existing data. The second is real-time incremental synchronization: whenever a new attribute tag is written to the original tag system, a copy must also be synchronized to the attribute system. This pattern is fairly standard. The following shows the data processing flow of our full synchronization.

For the synchronization task we used SchedulerX, the distributed task scheduling service provided by Alibaba Cloud. During full synchronization, you need to control the synchronization rate, configure alarms, and record both exceptions and progress. The rate is limited so that the service systems we depend on are not overwhelmed, which would affect their other normal traffic. Alarms make sure failures are detected and investigated promptly. Exception records are essential for troubleshooting and must be kept. Progress records are just as important: if a problem occurs, synchronization can resume directly from the last recorded checkpoint.
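
A minimal sketch of the full-synchronization loop, assuming the source rows can be read in ascending ID order. `full_sync`, the `kind` field, and the `write` callback are hypothetical. In the setup described here the job ran on SchedulerX, the checkpoint would be persisted rather than kept in a dict, and alarms would watch the error records from outside this function.

```python
import time
from typing import Callable, Dict, List, Tuple


def full_sync(
    rows: List[Tuple[int, dict]],        # (row_id, tag) pairs from the original system, sorted by row_id
    write: Callable[[int, dict], None],  # hypothetical writer into the attribute system
    checkpoint: Dict[str, int],          # progress record, e.g. {"last_id": 0}
    errors: List[Tuple[int, str]],       # exception records for later troubleshooting
    max_per_second: float = 100.0,       # rate limit that protects downstream services
) -> int:
    """Copy attribute tags, resuming after checkpoint["last_id"]; return rows written."""
    interval = 1.0 / max_per_second
    written = 0
    for row_id, tag in rows:
        if row_id <= checkpoint.get("last_id", 0):
            continue  # already handled by an earlier run
        if tag.get("kind") == "attribute":  # classification tags stay where they are
            try:
                write(row_id, tag)
                written += 1
            except Exception as exc:  # record and keep going; an alarm can watch `errors`
                errors.append((row_id, repr(exc)))
            time.sleep(interval)  # crude throttling: at most max_per_second writes
        checkpoint["last_id"] = row_id  # progress is recorded row by row
    return written
```

Running it a second time with the same checkpoint writes nothing, which is the property that makes a restart after a failure safe.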

With full synchronization covered, let's look at real-time incremental synchronization. The data flow is shown in the figure below: whenever a change is made in the original tag system, the change to the attribute object is synchronized to the attribute system in real time.
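
The incremental link reduces to a handler that filters change events and mirrors only attribute tags. This sketch assumes each change event carries hypothetical `kind` and `op` fields (for example, delivered through a message queue or a binlog subscription); the real event format is not described here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TagChange:
    tag_id: int
    kind: str                     # "attribute" or "classification" (hypothetical field)
    op: str                       # "upsert" or "delete" (hypothetical field)
    value: Optional[dict] = None


def on_original_change(change: TagChange, attribute_store: Dict[int, dict]) -> bool:
    """Mirror one change from the original tag system; return True if it was applied."""
    if change.kind != "attribute":
        return False  # classification tags are not copied to the attribute system
    if change.op == "upsert":
        attribute_store[change.tag_id] = dict(change.value or {})
    elif change.op == "delete":
        attribute_store.pop(change.tag_id, None)
    else:
        raise ValueError(f"unknown op: {change.op}")
    return True
```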

2. Isolation layer construction

Masking migration details

After the storage migration, the attribute tags in the attribute system are identical to those in the original tag system, so business reads can be migrated. Business migration is the most complex part of the process and requires sorting out many dependencies. Doing so showed that multiple applications and business scenarios use the original tag system. Migrating them one by one would mean a lot of repetitive work, with the same functionality verified again and again, and the migration details could not be managed in one place: if something went wrong, every application would have to be modified individually. So before migrating, we gathered all read and write operations against the original tag system, which were scattered across multiple applications, into a single JAR package, and controlled read/write switching centrally inside that JAR. The JAR acts as an isolation layer between the upper-layer business and the storage logic of attribute tags. The upper-layer business no longer cares whether attribute tags come from the original tag system or the attribute system; it simply calls the methods the isolation layer provides. With the isolation layer managing everything, the migration can be done once, in a unified way, avoiding unnecessary duplicated work. Concretely, the structure changes from A to B.
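
The isolation layer is essentially a facade with the switches kept inside it. The real one is a Java JAR; the Python sketch below only mirrors its shape, and `AttributeTagFacade`, `read_from_new`, `write_to_new`, and the store interface are hypothetical names.

```python
from typing import Dict, List, Protocol


class TagStore(Protocol):
    """What the facade needs from either backing system (hypothetical interface)."""
    def get_tags(self, content_id: int) -> List[str]: ...
    def add_tag(self, content_id: int, tag: str) -> None: ...


class InMemoryStore:
    """Stand-in for the original tag system or the attribute system."""
    def __init__(self) -> None:
        self.data: Dict[int, List[str]] = {}

    def get_tags(self, content_id: int) -> List[str]:
        return list(self.data.get(content_id, []))

    def add_tag(self, content_id: int, tag: str) -> None:
        self.data.setdefault(content_id, []).append(tag)


class AttributeTagFacade:
    """The only entry point business code calls; the read/write switches live here."""
    def __init__(self, original: TagStore, attribute: TagStore) -> None:
        self.original = original
        self.attribute = attribute
        self.read_from_new = False   # central read switch
        self.write_to_new = False    # central write switch

    def get_attribute_tags(self, content_id: int) -> List[str]:
        store = self.attribute if self.read_from_new else self.original
        return store.get_tags(content_id)

    def add_attribute_tag(self, content_id: int, tag: str) -> None:
        store = self.attribute if self.write_to_new else self.original
        store.add_tag(content_id, tag)
```

Because callers only see `get_attribute_tags` and `add_attribute_tag`, flipping a switch (or flipping it back) changes every application at once.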

The interface adapter

Because classification tags and attribute tags differ in business semantics and function, the content middle platform models the two systems differently, and the corresponding data service interfaces offer different capabilities. To keep the upper-layer business unaware of the change, we kept the original service interfaces unchanged and adapted them to the classification and attribute systems inside the isolation layer.
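
An adapter can keep the old response shape while the data now comes from two differently modeled systems. The shapes here are assumptions for illustration only: the classification system is taken to return a list of category names, the attribute system a key/value map, and the legacy interface a flat list of `{"name", "type"}` tags.

```python
from typing import Callable, Dict, List

# Hypothetical response shapes of the two new systems.
FetchClassifications = Callable[[int], List[str]]   # e.g. ["Phones"]
FetchAttributes = Callable[[int], Dict[str, str]]   # e.g. {"color": "black"}


def legacy_get_tags(
    content_id: int,
    fetch_classifications: FetchClassifications,
    fetch_attributes: FetchAttributes,
) -> List[Dict[str, str]]:
    """Serve the unchanged legacy interface: a flat list of {"name", "type"} tags."""
    tags = [{"name": c, "type": "classification"} for c in fetch_classifications(content_id)]
    # Attributes are flattened to "key:value" names, sorted for a stable order.
    tags += [
        {"name": f"{k}:{v}", "type": "attribute"}
        for k, v in sorted(fetch_attributes(content_id).items())
    ]
    return tags
```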

3. Service read/write migration

Read migration

With the groundwork done, we began the actual read migration. Reads are the easiest part to migrate: traffic can be shifted by percentage, gradually reading more attribute tags from the attribute system. If a problem appears, traffic is switched straight back to reading the original tag system; if not, the percentage keeps growing until all traffic reads from the attribute system.
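
Percentage-based read switching is often done by hashing a stable key into 100 buckets, so a given piece of content stays on the same side as the percentage grows. This is a sketch under that assumption; the article does not say how traffic was split, and `ReadRouter` is a hypothetical name.

```python
import zlib


class ReadRouter:
    """Percentage-based read switch kept inside the isolation layer (hypothetical)."""

    def __init__(self, percent: int = 0) -> None:
        self.set_percent(percent)

    def set_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.percent = percent  # 0 = everything back on the original system, 100 = full cutover

    def use_attribute_system(self, content_id: int) -> bool:
        # A stable hash keeps each content ID on the same side as the percentage grows.
        bucket = zlib.crc32(str(content_id).encode()) % 100
        return bucket < self.percent
```

Rolling back is just `set_percent(0)`; no application needs to be redeployed.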

Write migration

Writes are hard to switch over in one cut. First, even after a thorough review, some write paths may have been missed. Second, some writes come from dependents outside our team (such as tag writes from the content center) and cannot be migrated yet. Once our own writes are migrated, our business writes only to the attribute system, while the unmigrated dependents still write only to the original tag system. The two systems then become inconsistent, which causes two problems. First, it breaks reads for dependents: they still read the original tag system and cannot see attribute tags written only to the attribute system. Second, with the two sides inconsistent, writes cannot be switched back to the original tag system right away if the attribute system has a problem.

Therefore, during the migration we must also keep the attribute tags in the original tag system and the attribute system identical. The storage split already created the original tag system -> attribute system synchronization link; now attribute tags written only to the attribute system must also be synchronized back to the original tag system.

There are now two synchronization links: original tag system -> attribute system and attribute system -> original tag system. Without any control, they form an obvious loop: original tag system -> attribute system -> original tag system. The common fix for such a bidirectional synchronization loop is to add a flag identifying the source of each data change. When a system sees its own update message flowing back to it, it stops propagating the update, which breaks the loop.
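
The source-flag technique can be shown with two in-memory systems. In this sketch, every write to a system appears in its change log (as a binlog would), and each change carries a hypothetical `origin` field naming the system where the business write happened. The sync link keeps that flag when it copies a change and drops any change whose origin is the destination, which is exactly where the loop would otherwise close.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    tag_id: int
    value: str
    origin: str  # system where the business write first happened (hypothetical flag)


@dataclass
class TagSystem:
    name: str
    data: Dict[int, str] = field(default_factory=dict)
    changelog: List[Change] = field(default_factory=list)  # like a binlog: every write shows up

    def store(self, tag_id: int, value: str, origin: str) -> None:
        self.data[tag_id] = value
        self.changelog.append(Change(tag_id, value, origin))

    def business_write(self, tag_id: int, value: str) -> None:
        self.store(tag_id, value, origin=self.name)


def sync_once(src: TagSystem, dst: TagSystem) -> int:
    """Drain src's change log into dst; return how many changes were applied."""
    applied = 0
    pending, src.changelog = src.changelog, []
    for change in pending:
        if change.origin == dst.name:
            continue  # the change started in dst and has come back: stop here
        dst.store(change.tag_id, change.value, origin=change.origin)  # keep the flag
        applied += 1
    return applied


def sync_until_quiet(a: TagSystem, b: TagSystem, max_rounds: int = 10) -> int:
    """Run both links until nothing moves; return the number of rounds used."""
    for rounds in range(1, max_rounds + 1):
        if sync_once(a, b) + sync_once(b, a) == 0:
            return rounds
    raise RuntimeError("synchronization did not converge (loop?)")
```

Without the `origin` check, the same change would bounce between the two systems and `sync_until_quiet` would raise. The sketch ignores concurrent writes to the same tag, which would additionally need a conflict rule, for example comparing version numbers.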

4. Dependent party migration

Once our own business is confirmed stable, dependents can be migrated. The dependents are algorithms, search, data, and so on, and their links fall into real-time links and offline links. For real-time links, the work already done for service read migration can be reused. For offline links, we provide offline data tables to the relevant dependents so they can complete the migration on schedule.

5. Enable the new attribute system

After a period of observation, tagging can be switched to the new attribute system. Once tagging happens in the new system, the original tag system -> attribute system synchronization link is no longer needed and can be stopped. After a further period of verification, when there is no longer any need to switch reads and writes back to the original tag system, the attribute system -> original tag system link can also be disconnected, and the migration is complete.
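
The shutdown order in this step can be encoded as guarded switches so that a link cannot be cut out of order. The flag names and guards below are assumptions that restate the sequence described above, not the team's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class MigrationSwitches:
    write_to_attribute: bool = False    # tagging happens in the new attribute system
    sync_original_to_attr: bool = True  # forward link created during the storage split
    sync_attr_to_original: bool = True  # reverse link that keeps rollback possible

    def stop_forward_sync(self) -> None:
        if not self.write_to_attribute:
            raise RuntimeError("tagging still writes to the original system; keep the forward link")
        self.sync_original_to_attr = False

    def stop_reverse_sync(self) -> None:
        if self.sync_original_to_attr:
            raise RuntimeError("stop the forward link first")
        # After this, reads and writes can no longer be switched back to the original system.
        self.sync_attr_to_original = False
```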

6. Data reconciliation task

Throughout synchronization, a reconciliation task also runs to check in real time whether the synchronized data is correct. If incorrect data is found, the cause must be identified and fixed promptly, and then the data itself corrected.
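
At its core, a reconciliation job compares both sides key by key and reports three kinds of differences. The sketch below assumes tags can be loaded as `{tag_id: value}` maps; `reconcile` is a hypothetical name. A real job would likely re-check any difference after a short delay, because changes still in flight on a sync link can show up as transient mismatches.

```python
from typing import Dict, List, Tuple


def reconcile(original: Dict[int, str], attribute: Dict[int, str]) -> Dict[str, list]:
    """Compare attribute tags keyed by ID in both systems and list every difference."""
    missing = sorted(k for k in original if k not in attribute)  # not synced to the new system
    extra = sorted(k for k in attribute if k not in original)    # present only in the new system
    mismatched: List[Tuple[int, str, str]] = sorted(
        (k, original[k], attribute[k])
        for k in original.keys() & attribute.keys()
        if original[k] != attribute[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```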

Conclusion

The migration took two weeks in total, not counting the time dependents spent migrating.

The attribute service remained normal throughout the migration, with zero customer complaints or negative public feedback.
After the migration, there were no data discrepancies.

Finally, some lessons learned from this migration:

• If many services depend on the data being migrated, an isolation layer can hide the migration from them.

• When data is synchronized in both directions, flagging each update message with its source prevents synchronization loops.