A "write code" question from a Knowledge Planet reader:

Mr. Shen, our user center is currently a single database with a single table. The uid is the database's auto-increment primary key, and it is already referenced by many businesses, so it cannot be changed.

Now the data volume in the user center is growing, and we need to split the database. How do we upgrade from a single database to multiple databases while keeping historical uids unchanged and making sure newly generated IDs do not conflict? Is there a good way to do this?

== Problem description ==

Quite a few companies use the database's "auto-increment ID on insert" as the business ID. This approach tightly couples the business to ID generation, which makes the ID generation algorithm hard to upgrade.

Today, let's briefly discuss what factors should be considered in ID generation.

Voice-over: Don't get me wrong, auto-increment IDs are not inherently bad; the problem is that they are coupled to the business and therefore hard to upgrade.

I. Technical points to be considered in ID generation

Almost every business has a unique business identifier:

  • User ID: uid (user-id)

  • Message ID: mid (msg-id)

  • Order ID: oid (order-id)

In the storage system, this ID is usually the primary key. The primary key uses a clustered index, which means rows are physically sorted by this ID on disk. Hence two requirements for the ID: uniqueness and trend-increasing.

Voice-over: for more on indexes, see "1 minute to understand the differences between different indexes".

This ID is also frequently used for traffic load balancing and data load balancing, which only works if the ID is statistically random. Hence another requirement: randomness.

At the same time, an upgrade of the ID generation algorithm should be transparent to the business systems. Hence a final requirement: the generation of this ID must be independent of the business.

To guarantee the above properties of ID generation, there should be a single interface:

uint64_t GenID();

This method does not care what the ID will be used for: it can generate a uid, an oid, or a log-id.

Of course, the business does not need to care about the details of ID generation either. The internal implementation of GenID() can use a database auto-increment ID, a time-based increment, or, most popular in the industry, a distributed ID generator modeled after Snowflake.
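As an illustration, here is a minimal, single-process sketch of a Snowflake-style GenID() in C++. The bit layout (41-bit timestamp, 10-bit worker id, 12-bit sequence), the epoch, and the class name are assumptions made for this example, not a reference implementation:

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Minimal Snowflake-style generator: 41-bit millisecond timestamp,
// 10-bit worker id, 12-bit per-millisecond sequence. Widths and epoch
// are illustrative assumptions.
class SnowflakeGen {
public:
    explicit SnowflakeGen(uint64_t worker_id) : worker_id_(worker_id & 0x3FF) {}

    uint64_t GenID() {
        std::lock_guard<std::mutex> lock(mu_);
        uint64_t now = NowMs();
        if (now == last_ms_) {
            seq_ = (seq_ + 1) & 0xFFF;              // same millisecond: bump sequence
            if (seq_ == 0) {                        // sequence exhausted: wait for next ms
                while ((now = NowMs()) <= last_ms_) {}
            }
        } else {
            seq_ = 0;
        }
        last_ms_ = now;
        return ((now - kEpochMs) << 22) | (worker_id_ << 12) | seq_;
    }

private:
    static uint64_t NowMs() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(
            system_clock::now().time_since_epoch()).count();
    }
    static constexpr uint64_t kEpochMs = 1577836800000ULL;  // 2020-01-01, arbitrary epoch
    uint64_t worker_id_;
    uint64_t last_ms_ = 0;
    uint64_t seq_ = 0;
    std::mutex mu_;
};
```

Whatever the internals look like, the business only ever calls GenID(); the implementation can be swapped later without touching any callers.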

This encapsulation, which hides the details of ID generation and preserves the possibility of upgrading the generation scheme, is a manifestation of decoupling in system design.

If business IDs are generated this way, it is easy to scale from a single database to multiple databases:

(1) Decide on a routing algorithm, such as hash modulus (see the sketch after this list);

(2) Migrate the data from the single database into the multiple databases through this routing algorithm, so the data volume in each database shrinks;

(3) Look up data (reads) through this routing algorithm;

(4) Insert data (writes) through this routing algorithm;
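A minimal sketch of such a routing function, assuming 3 databases and plain modulus routing (the names kDbCount and DbIndexOf are made up for this example):

```cpp
#include <cstdint>

// Number of databases after the split; 3 matches the example in the next section.
constexpr uint64_t kDbCount = 3;

// Every read and write resolves its target database with the same rule.
uint64_t DbIndexOf(uint64_t uid) {
    return uid % kDbCount;
}
```

Because GenID() already guarantees uniqueness and statistical randomness, the same modulus spreads both data and traffic roughly evenly across the databases.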

What if the architecture was not designed with independent ID generation in mind, and you now have to split a single database into multiple databases anyway? What then?

II. Back to the reader's question

The hole has already been dug: there is no decoupled ID generation method, and there is no way to change the existing IDs in bulk. What to do?

If you split the single database into 3 databases, you can do it like this:

(1) Build a 1-master, 2-slave database cluster, so that every row effectively has 3 copies;

(2) Set the routing algorithm to hash modulus, uid%3;

(3) In the database serving uid%3=0, delete the rows whose uid has remainder 1 or remainder 2;

(4) In the database serving uid%3=1, delete the rows whose uid has remainder 0 or remainder 2;

(5) In the database serving uid%3=2, delete the rows whose uid has remainder 0 or remainder 1;

(6) Set the auto-increment step of each database to 3, with each database's starting value aligned to its remainder, so that newly generated IDs in the three databases never collide (see the sketch after these steps);

(7) Upgrade the user center so that uid data is queried according to the routing algorithm;
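As a sanity check, here is a tiny simulation of the "step = 3" auto-increment. The starting values are assumptions chosen so that each database keeps generating IDs in its own remainder class; with MySQL, this is what the auto_increment_increment and auto_increment_offset settings are for:

```cpp
#include <cstdint>
#include <cstdio>

// Simulated per-database auto-increment with step 3: database i only
// ever generates IDs whose remainder (mod 3) equals i, so the three
// databases can never produce the same ID.
struct DbIdGen {
    uint64_t next;                       // next id this database will hand out
    uint64_t Gen() { uint64_t id = next; next += 3; return id; }
};

int main() {
    // Assume the historical maximum uid was 1000 before the split.
    DbIdGen db[3] = {{1002}, {1003}, {1001}};   // remainders 0, 1, 2
    for (int i = 0; i < 3; ++i) {
        printf("db uid%%3=%d new ids:", i);
        for (int k = 0; k < 3; ++k)
            printf(" %llu", (unsigned long long)db[i].Gen());
        printf("\n");
    }
    return 0;
}
```

Running it prints three disjoint ID sequences (1002, 1005, 1008 / 1003, 1006, 1009 / 1001, 1004, 1007), so new IDs neither collide with each other nor break the routing rule.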

Done. This split and scale-out achieves:

(1) The data volume per database drops to 1/3 of the original;

(2) The number of read/write instances grows to 3 times the original;

(3) Newly generated IDs do not conflict, and queries route correctly;

I hope this trick works for you.

But ideally, think ahead about the uniqueness, randomness, trend-increasing, and independence of ID generation.

Think systematically: know not only what works, but also why.

After-class exercise:

How would you split a single database into multiple databases and scale out smoothly, without stopping the service?