The migration background

Our Cassandra cluster intermittently suffers from high RT (response time): a large number of Cassandra requests time out within a short window. On average the problem now occurs once every two weeks, which affects normal business. There are three main reasons:

  1. The partition key was poorly designed when the tables were first created. As the data volume grew (the largest single table reached ten billion rows), some oversized partitions appeared, the largest holding millions of rows. Since Cassandra has no equivalent of MySQL's LIMIT m, n offset query, reading such a partition puts great pressure on the cluster.
  2. Cassandra's tombstone mechanism. One of Cassandra's selling points is fast writes: when a record is deleted, Cassandra does not physically remove it in real time, but marks it with a logical deletion flag (a tombstone). Subsequent queries still load the deleted records and filter them out. This causes serious performance problems when a partition contains little live data but a large number of tombstones.
  3. Cassandra is not recommended by our company's DBAs either, and when problems occur they are hard to locate and resolve. We therefore decided to migrate from Cassandra to MySQL, a relational database that is more mature in the community.

The migration plan

The migration scheme is divided into the following five steps:

  1. Full migration: move all existing historical data (this step covers the bulk of the data).
  2. Incremental migration: record the start time of the full migration and replay the data that changed while it was running.
  3. Data comparison: compare the Cassandra and MySQL data through the service interfaces; final consistency reached more than 99.99%.
  4. Dual write: once the comparison confirms that the full and incremental migrations are correct, enable dual writes. If the dual writes introduce problems, the data comparison will surface those as well.
  5. Switch reads to MySQL: once dual writing is confirmed stable, switch reads to MySQL service by service, ordered by business importance. After all services read from MySQL and no problems remain, stop writing to Cassandra and finally take Cassandra offline.

MySQL sharding scheme (sub-databases and sub-tables)

  1. How many tables? On the DBA's recommendation, a single table should hold no more than 2 million rows, and the largest table is estimated at about 10 billion rows. Allowing for future growth, that table was split into 8192 tables of roughly 1.2 million rows each, spread over 4 physical databases with 2048 tables per database (a sketch of the routing follows this list).
  2. Field mapping: Cassandra has List, Set, Map and other collection types. How should they be stored in MySQL? Choose based on your actual situation:
    • If a collection serialized to JSON stays under 1000 characters, store the JSON directly in a VARCHAR column. Advantage: simple to handle. Disadvantage: future growth of the collection must be considered.
    • If the JSON is long (some values reached tens of thousands of characters), store it in a separate table. Advantage: no need to worry about collection growth. Disadvantage: more cumbersome to handle, and an extra table to maintain.
  3. Use the Cassandra partition key as the sharding key.
  4. The primary key of each MySQL table is the same as that of the corresponding Cassandra table.
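A minimal sketch of the routing implied by this layout. The original does not specify the hash function, so CRC32 and the table naming below are illustrative only:

    import zlib

    TOTAL_TABLES = 8192    # the largest table is split into 8192 shards
    TABLES_PER_DB = 2048   # 4 physical databases x 2048 tables each

    def route(partition_key: str):
        """Map a Cassandra partition key to (database index, table name)."""
        shard = zlib.crc32(partition_key.encode("utf-8")) % TOTAL_TABLES  # assumed hash
        db_index = shard // TABLES_PER_DB           # 0..3
        return db_index, "test_table_%04d" % shard  # hypothetical naming scheme

    print(route("account:42"))  # e.g. (0, 'test_table_1234')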

Full migration scheme investigation

  1. COPY export: export the keyspace to a file with the COPY command provided by cqlsh. Defects:

    • In testing, the export speed was only about 4500 rows per second, with occasional timeouts during the export; if the export is interrupted, it cannot resume from the break point.
    • If the keyspace is large, the generated file is also large. So this approach was ruled out.
  2. sstableloader mode: this only supports migration from one Cassandra cluster to another, so it was not considered either.

  3. Token ring traversal: Cassandra stores records using a consistent hashing policy.

    The ring covers the range [Long.MIN_VALUE, Long.MAX_VALUE], and each record's position on the ring is determined by hashing its partition key.

    • For example, there is a table like this:
    CREATE TABLE test_table ( a text, b INT, c INT, d text, PRIMARY KEY ( ( a, b ), c ) );
    • The table can be traversed with the following two CQL statements:
    cqlsh:> select token(a,b), c, d from test_table where token(a,b) >= -9223372036854775808 limit 50;
    
    
     token(a, b)          | c | d
    ----------------------+---+----
     -9087493578437596993 | 2 | d1
     ...
     -8987733732583272758 | 9 | x1
    
    (50 rows)
    cqlsh:> select token(a,b), c, d from test_table where token(a,b) >= -8987733732583272758 limit 50;
    • When token(a, b) reaches Long.MAX_VALUE, the whole table has been traversed. This is the approach we finally adopted (a sketch follows below). Note that all of the above schemes share one problem: data keeps changing while the migration runs, and that has to be handled separately.
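A minimal sketch of this traversal using the DataStax Python driver; the contact point and keyspace are placeholders, and the MySQL write is stubbed out:

    from cassandra.cluster import Cluster  # pip install cassandra-driver

    PAGE_SIZE = 50
    MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Long.MIN_VALUE / Long.MAX_VALUE

    session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholders

    def write_to_mysql(row):
        pass  # stub: an idempotent upsert, so re-reading the boundary token is harmless

    token = MIN_TOKEN
    while True:
        rows = list(session.execute(
            "SELECT token(a, b) AS t, c, d FROM test_table "
            "WHERE token(a, b) >= %s LIMIT %s", (token, PAGE_SIZE)))
        for row in rows:
            write_to_mysql(row)
        if len(rows) < PAGE_SIZE or rows[-1].t == MAX_TOKEN:
            break  # the ring is exhausted: traversal complete
        # Resume from the last token seen; >= plus upserts avoids losing rows.
        # Caution: a single token holding more than PAGE_SIZE rows would loop
        # here; real code would also page within that partition.
        token = rows[-1].t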

Full migration details

We finally adopted scheme 3 above: traverse all of a Cassandra table's data by walking its token ring, and move the rows into MySQL. Details:

  1. The entire token ring is divided into 2048 segments, turning one huge migration task per table into 2048 smaller ones, so that when a single task fails it is not necessary to start over from the very beginning. The segments are processed by multiple threads (see the sketch after this list).
  2. Migration modes: there are two, single and batch:
    • Single mode: rows are inserted into MySQL one at a time. With 16 threads reading Cassandra and 64 writer threads, it reached about 15,000 rows per second.
    • Batch mode: rows are batch-inserted into MySQL. Choose this when the data volume is large, say a single table above 100 million rows. For the largest table (10 billion rows), the migration actually peaked at only about 16,000 rows per second, because reading from Cassandra was the bottleneck: the live application was consuming part of its capacity. Without that bottleneck, doubling the speed would be no problem.
  3. Migration performance: Cassandra, MySQL and the application machine itself can each become the bottleneck at this data volume, so use the best machines available. We used a 40-core machine with more than 100 GB of RAM.
  4. Some of the problems encountered in this process:
    • Exception handling: Cassandra and MySQL have different column constraints, so some records fail to be written because a column violates a MySQL limit. Failed records are logged to a file. Try to cover these cases as thoroughly as possible during testing. Some concrete cases:
      • A Cassandra text value exceeds the maximum length of the MySQL column
      • A Cassandra timestamp exceeds the MySQL datetime range (e.g. 1693106-07-01 00:00:00)
      • A Cassandra decimal value exceeds the MySQL decimal range (e.g. 6232182630000136384.0)
    • Missing data: some tables have many columns, so be careful when mapping fields in code; we ran into both mixed-up and omitted fields. Worse, this step had no test involvement, so development went straight to production, and only after the migration finished did we find fields missing, forcing a restart from scratch. The largest table took almost two weeks to move from scratch, and in retrospect it was reworked more than once. This wasted a great deal of time.
    • Slow queries: while migrating the largest table, timeouts were much worse than for the smaller tables, and the run kept getting slower. We found threads that always timed out querying a particular token and then retried that token in an endless loop. Raising the Cassandra timeout to 30s improved things, but a few tokens still failed. Oddly, the same data could be retrieved fine by logging into the Cassandra machine and querying directly through cqlsh. The final fix was to retry a failing token up to five times and, if still unsuccessful, log that token separately.
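A sketch of the segment split from item 1: divide [Long.MIN_VALUE, Long.MAX_VALUE] into 2048 sub-ranges that worker threads can traverse and retry independently. The per-segment migration function is a stub, and the thread count is illustrative:

    from concurrent.futures import ThreadPoolExecutor

    MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1
    SEGMENTS = 2048

    def token_segments():
        """Yield (start, end) token ranges that together cover the whole ring."""
        width = (MAX_TOKEN - MIN_TOKEN) // SEGMENTS
        for i in range(SEGMENTS):
            start = MIN_TOKEN + i * width
            end = MAX_TOKEN if i == SEGMENTS - 1 else start + width - 1
            yield start, end

    def migrate_segment(segment):
        # stub: run the token traversal shown earlier, restricted to
        # token(a, b) >= start AND token(a, b) <= end, retrying a failing
        # token five times before logging it for manual follow-up
        start, end = segment

    with ThreadPoolExecutor(max_workers=16) as pool:
        pool.map(migrate_segment, token_segments())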

Incremental migration details

Record the time at which the full migration starts, capture the accounts whose data changes from then on (one user contains multiple accounts), and send them to Kafka. A separate incremental migration program consumes Kafka and moves those accounts' data to MySQL, repeating the process until the data in MySQL catches up with the data in Cassandra.

  1. Two Kafka queues are consumed: one carries the accounts that changed during the full migration (replayed offline), the other carries accounts changing in real time in the live service.
  2. Conflicts between the two queues on the same account must be considered; we lock by accountId (see the sketch after this list).
  3. The incremental migration initially worked at the user dimension, but it turned out that moving whole users could not keep up with the rate of normal business changes. We then switched to the account dimension, one level below user (one user contains multiple accounts).
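A minimal sketch of the consumer under these constraints, assuming kafka-python; the topic names, broker address and the per-account copy routine are placeholders:

    from collections import defaultdict
    from threading import Lock
    from kafka import KafkaConsumer  # pip install kafka-python

    account_locks = defaultdict(Lock)  # serialize work on the same accountId

    def migrate_account(account_id: str):
        pass  # stub: re-copy this account's rows from Cassandra into MySQL

    def handle(account_id: str):
        with account_locks[account_id]:  # both queues funnel through the same lock
            migrate_account(account_id)

    # One subscription covering both queues: offline changes captured during
    # the full migration, and real-time changes from live traffic.
    consumer = KafkaConsumer("account-changes-backfill", "account-changes-live",
                             bootstrap_servers="localhost:9092",
                             group_id="incremental-migration")
    for message in consumer:
        handle(message.value.decode("utf-8"))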

Data comparison

Why this step? To ensure that the Cassandra and MySQL data sources end up as consistent as possible.

  1. After the full migration completes, the comparison goes live alongside the incremental migration. How do we compare? Whenever online traffic changes data, we take the accountId, query all of that account's data in Cassandra and all of it in MySQL through the service interfaces, and compare them down to the value of each field.
  2. We assumed the full and incremental migrations were basically sound, but the comparison surfaced a lot of inconsistent data, caused by code bugs in both the full and the incremental migration. Note that if a table is found faulty during full migration, it must be fully migrated again, and the incremental migration must then restart from the beginning as well.
  3. All comparison results are stored in the database; a scheduled task picks up the records that fail comparison and re-runs the incremental migration for those accounts.
  4. The main problems encountered are as follows:
    • Time precision: Cassandra timestamps are accurate to the millisecond (DevCenter, a Cassandra client tool, displays times truncated to seconds, so a visual comparison in that tool will not reveal the problem), while MySQL datetime is accurate only to the second by default.
    • Decimal precision: the Cassandra decimal column maps to MySQL decimal(18,2), so a Cassandra value of 0 or 0.000 becomes 0.00 after migration.
    • Two tables holding the same data cause dirty data: because Cassandra queries are quite limited, two tables with identical columns but different primary keys were created to support multiple query patterns, and every insert, delete and update is applied to each table separately. Although this makes querying easier, it very easily produces dirty data. During comparison we found a large difference in row counts between the two tables; the cause was an early code bug that wrote the first table successfully while the write to the second failed (fortunately the data involved was very old, so the problem was simply ignored). Only one of the two tables is migrated to MySQL, so wherever the two tables disagree, some interface must return inconsistent results. I personally do not recommend keeping two copies of one dataset: without constraints at the storage layer, code logic alone can hardly guarantee consistency.
    • Empty string vs NULL: an empty string "" in Cassandra became NULL after conversion to MySQL, which makes the interfaces return different data. When you are not sure how downstream consumers use a field, it is best to keep it fully consistent.
    • Missing field: the comparison revealed that one field of a table had not been migrated at all, so that table needed another full migration followed by another incremental migration (try hard to avoid this; the process is very time-consuming).
    • Cassandra data inconsistency: the same SELECT statement run twice in a row returned different numbers of rows, at a rate between 1 in 10,000 and 1 in 1,000, so some records kept failing comparison.
    • Inconsistent application clocks: after some releases, a table's lastModifyTime in Cassandra was earlier than in MySQL, even though from the business point of view the MySQL value was correct. About 5% of records were affected, and the proportion did not shrink over time, yet the problem might vanish with the next release; it occurred in 3 of the last 10 releases. The investigation found that one application machine's local clock was 3 seconds off, and since Cassandra trusts the client's timestamp, a write submitted later could be overwritten by a write submitted earlier. Why did it come and go with releases? The application runs in containers, and each release is assigned new containers.
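A sketch of the per-account comparison, normalizing the representation differences listed above (second-level datetime, decimal(18,2) scaling, empty string vs NULL) before diffing; the two fetch functions are stubs standing in for the interface calls:

    from datetime import datetime
    from decimal import Decimal

    def fetch_from_cassandra(account_id):
        return {}  # stub: {row_key: {field: value}} via the service interface

    def fetch_from_mysql(account_id):
        return {}  # stub: same shape as the Cassandra result

    def normalize(value):
        """Canonicalize values so representation-only differences don't count."""
        if value == "" or value is None:
            return None                              # "" arrived in MySQL as NULL
        if isinstance(value, Decimal):
            return value.quantize(Decimal("0.00"))   # 0, 0.000 and 0.00 compare equal
        if isinstance(value, datetime):
            return value.replace(microsecond=0)      # MySQL datetime keeps seconds only
        return value

    def compare_account(account_id):
        """Return (row_key, field, cassandra_value, mysql_value) per mismatch."""
        cass, mysql = fetch_from_cassandra(account_id), fetch_from_mysql(account_id)
        mismatches = []
        for key in set(cass) | set(mysql):
            left, right = cass.get(key, {}), mysql.get(key, {})
            for field in set(left) | set(right):
                if normalize(left.get(field)) != normalize(right.get(field)):
                    mismatches.append((key, field, left.get(field), right.get(field)))
        return mismatches  # persisted; a scheduled task re-migrates failing accounts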

Dual write

After the above steps, the data in Cassandra and MySQL can be considered basically consistent, so we enable dual writes and turn off the incremental migration. If the dual writes have problems, the comparison program will catch those too.
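A minimal sketch of a dual-write wrapper at this stage: Cassandra still serves all reads, a MySQL failure must not break the business flow, and the comparison program picks up anything that diverges. All function names here are illustrative:

    import logging

    log = logging.getLogger("dual-write")

    def write_cassandra(record):
        pass  # stub: the existing write path, still the source of truth for reads

    def write_mysql(record):
        pass  # stub: upsert into the sharded MySQL tables

    def save(record):
        write_cassandra(record)
        try:
            write_mysql(record)  # shadow write
        except Exception:
            # Don't fail the request; the comparison program will flag the
            # divergence and the account can be re-migrated incrementally.
            log.exception("mysql shadow write failed for account %s",
                          record.get("accountId"))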

Switching reads to MySQL

After about a week of dual writing, if no problems appear, you can gradually switch reads to MySQL service by service, and afterwards take the Cassandra database offline.

Conclusion

  1. Using Cassandra:
    • Table design: pay special attention to the partition key design and make sure no single partition grows too large.
    • Tombstone mechanism: be aware of Cassandra's tombstones, which are generated mainly by delete operations and by inserting NULL fields. In our case, a user repeatedly triggered one action in the app, which made the database delete and then re-insert rows under the same partition key; after the user did this hundreds of times, the partition had accumulated a large number of tombstones. A single query against that partition key then drove the Cassandra CPU up, affected other partition keys, and caused a large number of query timeouts.
    • An inconsistent Cassandra client clock can cause writes to be lost, because Cassandra resolves conflicts using client-supplied timestamps.
  2. Migration related:
    • Test the full and incremental migrations thoroughly before going live, pay particular attention to missing or misaligned fields, and involve testers as much as possible. Before the real migration, it is best to set up a rehearsal database online and surface as many problems as possible in advance; fixing them during the official migration is much more troublesome.
    • Always have a rollback plan in place when switching reads or shutting down reads and writes.

Copyright notice by WyCM: juejin.cn/post/684490… Your support is the biggest encouragement to the blogger; thank you for reading carefully. The copyright of this article belongs to the author. You are welcome to reprint it, but without the author's consent you must retain this statement and provide a link to the original in a prominent position on the page; otherwise the author reserves the right to pursue legal responsibility.