How to design a globally unique ID generator in a distributed environment

1. UUID

A common approach. UUIDs can be generated either by the database or in application code, and are globally unique for practical purposes.

Advantages:

  1. Simple and convenient to code.
  2. IDs are generated locally and very quickly, so generation is almost never a performance bottleneck.
  3. Globally unique, so data migration, merging data from several systems, or changing databases causes no ID conflicts.

Disadvantages:

  1. UUIDs are unordered, so there is no guarantee of an increasing trend.
  2. UUIDs are usually stored as strings, which makes queries less efficient.
  3. They take up relatively large storage space; for a massive database, storage capacity needs to be considered.
  4. More data is transmitted over the network.
  5. Not human-readable.
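
As a minimal sketch of generating a UUID in application code, the snippet below uses Python's standard `uuid` module. It also shows the size difference behind the storage disadvantage: 36 characters as text versus 16 bytes in binary form. The column types mentioned in the comments are illustrative assumptions, not something the approach requires.

```python
import uuid

# Generate a random (version 4) UUID; no coordination between machines is needed.
new_id = uuid.uuid4()

text_form = str(new_id)      # 36 characters, e.g. for a CHAR(36) column
binary_form = new_id.bytes   # 16 bytes, e.g. for a BINARY(16) column

print(text_form)
print(len(text_form), len(binary_form))
```

Storing the 16-byte form instead of the string reduces the space and comparison cost, but the values remain unordered.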

2. Database auto-increment sequence or field

The most common approach. The ID is generated by the database and is unique within that database.

Advantages:

  1. Simple and easy to code, with acceptable performance.
  2. Numeric IDs sort naturally, which helps with pagination or results that need sorting.

Disadvantages:

  1. Syntax and implementation differ between databases, so database migration or supporting several database versions needs extra handling.
  2. With a single database, read/write separation, or one master with many slaves, only the one master can generate IDs, which is a single point of failure.
  3. Hard to scale when performance falls short.
  4. Merging multiple systems or migrating data can be painful.
  5. Sharding databases and tables becomes troublesome.

Optimization scheme:

To remove the single point on the master, run several masters and give each one a different start number, with a step size equal to the number of masters. For example: Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; Master3 generates 3, 6, 9, 12. This generates unique IDs across the cluster and spreads the ID generation load over several databases.
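
The start-and-step scheme can be simulated in a few lines. This is only an illustration of the arithmetic, with hypothetical names (`master_ids`, `NUM_MASTERS`); in MySQL, for instance, the `auto_increment_offset` and `auto_increment_increment` variables play the roles of start and step.

```python
from itertools import count, islice

NUM_MASTERS = 3  # the step size equals the number of masters

def master_ids(start, step):
    """Yield start, start + step, start + 2 * step, ..."""
    return count(start, step)

# Each master starts at its own index (1, 2, 3) and advances by NUM_MASTERS.
masters = {f"Master{i}": master_ids(i, NUM_MASTERS) for i in range(1, NUM_MASTERS + 1)}

# Take the first four IDs from every master.
generated = {name: list(islice(ids, 4)) for name, ids in masters.items()}
print(generated)
# {'Master1': [1, 4, 7, 10], 'Master2': [2, 5, 8, 11], 'Master3': [3, 6, 9, 12]}

# No ID is produced by more than one master.
all_ids = [i for ids in generated.values() for i in ids]
assert len(all_ids) == len(set(all_ids))
```

The drawback shows up as soon as a fourth master is needed: the step size is baked into every ID already issued, so adding a master means re-planning the start values and step.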

3. Database sequence tables and optimistic locks

You can set up a separate table that stores the next primary key value for every table. For example, with three tables A, B, and C, the sequence table looks like this:

Table name (name)    Next primary key (id)
A                    10
B                    100
C                    500

To get a primary key, read the current value from the sequence table, then update the row using the value just read as a condition. If the update succeeds, the primary key has been obtained. If the update fails, another machine has already taken that key, so read the value again and retry. For example, to get the next primary key for table B, send the following SQL:

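-- Read the current value; suppose it returns id = 100
select id from sequence where name = 'B';

-- Update the sequence table; this succeeds only if no one else has advanced the value
update sequence set id = id + 1 where name = 'B' and id = 100;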
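
The read, conditional update, and retry steps can be sketched end to end as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real one; the function name `next_id` and the `max_retries` parameter are assumptions made for the example.

```python
import sqlite3

def next_id(conn, table_name, max_retries=10):
    """Claim the next primary key for table_name with a compare-and-set update."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT id FROM sequence WHERE name = ?", (table_name,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no sequence row for {table_name!r}")
        current = row[0]
        cur = conn.execute(
            "UPDATE sequence SET id = id + 1 WHERE name = ? AND id = ?",
            (table_name, current),
        )
        conn.commit()
        if cur.rowcount == 1:
            return current  # our update won, so `current` belongs to us
        # rowcount == 0: another client took `current`; read again and retry
    raise RuntimeError("too much contention, giving up")

# Set up the sequence table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (name TEXT PRIMARY KEY, id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?)", [("A", 10), ("B", 100), ("C", 500)]
)
conn.commit()

print(next_id(conn, "B"))  # 100
print(next_id(conn, "B"))  # 101
```

The `id = ?` condition in the UPDATE is what makes the lock optimistic: no row is locked while reading, and a conflict is only detected when the update affects zero rows.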

Advantages:

  1. Simple to operate; optimistic locking avoids holding database locks, which helps performance.
  2. The generated IDs are sequential and increasing.
  3. Works in a distributed environment and with sharded databases and tables.

Disadvantages:

  1. Requires a separate table, which takes extra storage space.
  2. The database is updated frequently, so write pressure is high.

Improvement plan:

Instead of one primary key, obtain 500 or more at a time and cache them on the current machine. Only when those 500 keys are used up does the machine go back to the database for a new range. This reduces the read and write pressure on the database, but the primary keys are no longer continuous: keys handed out by different machines interleave, and cached keys are lost if a machine restarts.
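
A rough sketch of this batching idea is shown below, again with `sqlite3` standing in for the real database. The class name `SegmentAllocator` and its fields are hypothetical, and the sketch is not thread-safe; a real implementation would guard `next_id` with a lock.

```python
import sqlite3

class SegmentAllocator:
    """Hands out IDs from a locally cached range and refills it from the database."""

    def __init__(self, conn, name, batch_size=500):
        self.conn = conn
        self.name = name
        self.batch_size = batch_size
        self.next_value = 0
        self.limit = 0  # exclusive upper bound of the cached range

    def _refill(self):
        # Claim a whole range [start, start + batch_size) with one conditional update.
        while True:
            (start,) = self.conn.execute(
                "SELECT id FROM sequence WHERE name = ?", (self.name,)
            ).fetchone()
            cur = self.conn.execute(
                "UPDATE sequence SET id = id + ? WHERE name = ? AND id = ?",
                (self.batch_size, self.name, start),
            )
            self.conn.commit()
            if cur.rowcount == 1:
                self.next_value, self.limit = start, start + self.batch_size
                return

    def next_id(self):
        if self.next_value >= self.limit:
            self._refill()  # the only step that touches the database
        value = self.next_value
        self.next_value += 1
        return value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (name TEXT PRIMARY KEY, id INTEGER NOT NULL)")
conn.execute("INSERT INTO sequence VALUES ('B', 100)")
conn.commit()

# Two allocators simulate two machines sharing the same sequence row.
machine1 = SegmentAllocator(conn, "B")
machine2 = SegmentAllocator(conn, "B")
print(machine1.next_id(), machine2.next_id(), machine1.next_id())  # 100 600 101
```

The output illustrates the discontinuity: the second machine starts at 600 while the first is still working through 100 to 599.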

4. Generating IDs with Redis

When generating IDs with a database is not fast enough, we can try generating them with Redis. This relies on Redis executing commands on a single thread, so each command is atomic and the generated IDs are globally unique. The atomic INCR and INCRBY commands can be used for this.

Several Redis instances can be used for higher throughput. Suppose there are five Redis instances in a cluster. You can initialize them with the values 1, 2, 3, 4, 5 respectively and use a step size of 5. The IDs generated by each instance are then:

A: 1,6,11,16,21
B: 2,7,12,17,22
C: 3,8,13,18,23
D: 4,9,14,19,24
E: 5,10,15,20,25

A request can be sent to any of the instances and will still receive a distinct ID, but the step size and initial values must be decided in advance and are difficult to change later. In practice, 3 to 5 instances are usually enough. Running several Redis instances also addresses the single point of failure.

In addition, Redis is well suited to generating daily serial numbers that restart every day. For example, order number = date + that day's running number. You can create one key per day in Redis and increment it with INCR.
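
The daily serial number idea might look like the sketch below. To keep it runnable without a server, it uses a small in-memory stand-in (`InMemoryCounter`, hypothetical) that mimics the behavior of INCR; with the redis-py client, a `redis.Redis` instance could be passed instead, since it exposes an `incr` method. The key name `order:seq:<date>` and the six-digit padding are assumptions made for the example. Because INCR on a missing key returns 1, the first serial of the day in this sketch is 1.

```python
import datetime

class InMemoryCounter:
    """Stand-in with the one method used below, so the sketch runs without Redis."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        # Like Redis INCR: a missing key counts as 0 before incrementing.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

def next_order_number(client, today=None):
    """Build an order number as date + that day's running number."""
    today = today or datetime.date.today()
    day = today.strftime("%Y%m%d")
    serial = client.incr(f"order:seq:{day}")  # one key per day (hypothetical name)
    return f"{day}{serial:06d}"

client = InMemoryCounter()
jan15 = datetime.date(2024, 1, 15)
print(next_order_number(client, jan15))                       # 20240115000001
print(next_order_number(client, jan15))                       # 20240115000002
print(next_order_number(client, datetime.date(2024, 1, 16)))  # 20240116000001
```

Since each day uses a fresh key, the counter restarts automatically without any reset job.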

Advantages:

  1. Independent of the database, flexible and convenient, with better performance than a database.
  2. Numeric IDs sort naturally, which helps with pagination or results that need sorting.

Disadvantages:

  1. If the system does not already use Redis, a new component has to be introduced, which increases system complexity.
  2. A considerable amount of coding and configuration is required.

5. Twitter’s Snowflake algorithm

Snowflake is Twitter’s open-source distributed ID generation algorithm. Its core idea is to pack the following fields into a single long (64-bit) ID:

  • 41 bits for the timestamp in milliseconds – 41 bits are enough for about 69 years
  • 10 bits for the machine ID (5 bits for the data center and 5 bits for the machine within it) – 10 bits allow up to 1024 nodes
  • 12 bits for a per-millisecond sequence number – 12 bits allow each node to generate 4096 IDs per millisecond

[Figure: Snowflake ID bit layout]

In theory, each node can generate up to 1000 * 2^12 = 4,096,000 IDs per second, that is, about 4 million, which is more than enough for most services.

The Snowflake algorithm can be adapted to the needs of your own project. For example, estimate the future number of data centers, the number of machines per data center, and the peak number of IDs needed in a single millisecond, then adjust the number of bits given to each field accordingly.
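
A compact sketch of a Snowflake-style generator with the 41/10/12 bit layout described above follows. It is an illustration under stated assumptions rather than Twitter's implementation: the custom epoch (2020-01-01 UTC) is an arbitrary choice, the 10-bit machine ID is treated as one field instead of being split into data center and machine, and the class and function names are hypothetical.

```python
import threading
import time

TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095
EPOCH_MS = 1577836800000                   # 2020-01-01 UTC, an arbitrary assumption

class SnowflakeGenerator:
    def __init__(self, machine_id):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self.machine_id << SEQUENCE_BITS)
                    | self._sequence)

def decode(snowflake_id):
    """Split an ID back into (timestamp in ms, machine ID, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence

gen = SnowflakeGenerator(machine_id=5)
first, second = gen.next_id(), gen.next_id()
print(first, second, decode(first))

# Capacity figures from the text:
years = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365)  # roughly 69.7
ids_per_second = 1000 * (1 << SEQUENCE_BITS)                 # 4096000
```

Changing the three `*_BITS` constants is how the layout would be adapted, for example trading machine bits for sequence bits when there are few nodes but high per-node throughput.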

Advantages:

  1. Independent of the database, flexible and convenient, with better performance than a database.
  2. IDs increase with time on a single machine.

Disadvantages:

IDs increase on a single machine, but in a distributed environment the clocks on different machines may not be fully synchronized, so the IDs are not always globally increasing.

Note: Snowflake is a popular ID generation algorithm, but it is hard to use directly. The recommended ID generator is Vesta.

6. Vesta

Vesta is a general-purpose unique ID generator. Its IDs are globally unique, roughly ordered, reversible (an ID can be decoded back into its fields), and manufacturable (an ID can be constructed from given field values). It supports three publishing modes: embedded, central server, and REST. Depending on the performance requirements of the business, it can produce two types of IDs: maximum peak and minimum granularity. Its architecture has the high performance, high availability, and scalability expected of Internet products, which makes it a general-purpose, high-performance ID generator.

Advantages:

  1. A general-purpose unique ID generator whose IDs are globally unique, roughly ordered, reversible, and manufacturable.
  2. Three publishing modes are supported: embedded, central server, and REST.
  3. Two types of IDs can be generated: maximum peak and minimum granularity.
  4. High performance, high availability, and scalability, the quality attributes required by Internet products.
  5. Machine ID: up to 1024 servers can be deployed.

Vesta is developed in Java; its trial address is here

Note: I haven’t used Vesta yet. Once I have, I will share how to use it along with its advantages and disadvantages.