Globally unique ids are a requirement encountered by almost all systems. This ID is important for searching, storing data, speeding up retrieval, and more. There are several strategies for getting this globally unique ID, and I’ll summarize and compare them here for a few common scenarios.

Briefly analyze the requirements

Globally unique ids often correspond to business requirements for generating unique record identifiers.

This ID is often the primary key of the database, and a cluster index is created, that is, sorted by this field on the physical storage. Queries on this record id often have business requirements for paging or sorting. Therefore, there is usually a time field and a non-cluster index is built on the time field.

A common index stores Pointers to actual records, and its access efficiency is slower than that of a clustered index. If the record identifiers can be basically ordered according to time when they are generated, the index query of the time field can be omitted.

This leads to two core requirements for record id generation:

  • Globally unique
  • Trends and orderly

The pros and cons of common generation strategies

Mysql > alter database auto_increment

Advantages:

  • This method is relatively simple because it uses the original functionality of the database
  • It guarantees uniqueness
  • Increments are guaranteed
  • The step sizes between ids are fixed and customizable

Disadvantages:

  • Availability is difficult to ensure: the common database architecture is one master, multiple slave + read/write separation, generate autoincrement ID is write request master library hang up can not play
  • Poor scalability and performance upper bound: Because writes are single points, the write performance of the database master determines the upper bound of ID generation performance and is difficult to scale

Improvement plan:

  • Redundant primary libraries to avoid single points of write
  • Data is horizontally segmented to ensure that the ids generated by each master database are not repeated

As mentioned in above, written by a library into three libraries written, each write library set different auto_increment initial value, and the growth of the same step length to ensure that each database generated ID is different (above the DB in generation 0 01,3,6,9… , DB 02 generates 1,4,7,10, DB 03 generates 2,5,8,11…)

The improved architecture ensures usability, but has a downside

  • The loss of the “absolute incrementality” of ID generation: accessing DB 01 to generate 0,3, and then DB 02 to generate 1 May result in ID generation not being absolutely incrementable for a very short period of time. (this is not a problem; the goal is trend incrementality, not absolute incrementality
  • Database write pressure is still very large, every time to generate ID to access the database to solve these problems, introduced the following methods:

Method 2: Single point batch ID generation service

One of the important reasons why distributed systems are difficult is that “without a global clock, it is difficult to ensure absolute timing”. In order to ensure absolute timing, we can only use a single point of service and use a local clock to ensure “absolute timing”.

The database write pressure is high because the database is accessed each time the ID is generated. You can use batch method to reduce the database write pressure.

As shown in the figure above, the database uses a double master to ensure availability, and only the maximum value of the current ID, such as 4, is stored in the database.

The ID generation service assumes that five ids are pulled in batches each time. The service accesses the database and changes the maximum value of the current ID to 4. In this way, the application access ID generation service requests the ID.

When all the ids are sent out, change the maximum number of ids to 11, and you can send out ids 6,7,8,9,10,11 again, reducing the database pressure to 1/6 of its original value.

Advantages:

  • The absolute increment order of ID generation is guaranteed
  • Greatly reduce the pressure of the database, ID generation can be done to generate tens of thousands of thousands of shortcomings per second:
  • The service is still a single point
  • If the service is suspended and the service is restarted, it may continue to generate ids that are discontinuous and have a void in the middle. (the service memory holds 0,1,2,3,4, and the database max-id is 4. When the service is allocated to 3, the service restarts. Although tens of thousands of ids can be generated per second, there is still a performance ceiling that cannot scale horizontally

The improved scheme

  • A common high availability optimization for a single point of service is the “standby service”, also known as the “shadow service”, so we can optimize these disadvantages in the following ways:

As shown in the figure above, the externally provided service is the master service, and a shadow service is always in the standby state. When the master service is suspended, the shadow service takes over. This switching process is transparent to the caller and can be done automatically. The common technique is VIP + Keepalived. In addition, ID Generate Service can be extended horizontally to address the above shortcomings, but with consistency issues.

Method 3: UUID/GUID

Whether the ID is generated from a database or a service, the business Application needs to make a remote call, which is time-consuming. Uuid is a common way to generate ids locally.

UUID uuid = UUID.randomUUID();
Copy the code

Advantages:

  • The ID is generated locally without remote invocation and with low latency
  • Good scalability, basically you can think of no performance ceiling

Disadvantages:

  • There is no guarantee of increasing trend
  • The UUID is too long and is often expressed as a string. As a primary key, the query efficiency is low. The common optimization solution is converted to two uint64 integer storage or half-storage (uniqueness cannot be guaranteed after half-storage).

Method 4: Set the current number of milliseconds

Uuid is a local algorithm with high generation performance, but it does not guarantee trend incrementation, and it is inefficient to retrieve as a string ID. Is there a local algorithm that can guarantee incrementation? – Taking the current number of milliseconds is a common solution. Advantages:

  • The ID is generated locally without remote invocation and with low latency
  • The generated ids are in an increasing trend
  • The generated ID is an integer. After the index is created, the query efficiency is high

Disadvantages:

  • If the concurrency exceeds 1000, duplicate ids are generated
  • So this is a fatal flaw, because there’s no guarantee that ID is unique. Of course, using microseconds can reduce the probability of collisions, but you can only generate 1,000,000 ids per second at most, and any more will definitely collide, so using microseconds doesn’t solve the problem at all.

Method 5: Use Redis to generate the ID

When using a database to generate ids is not performance enough, we can try using Redis to generate ids. This depends on Redis being single-threaded, so it is possible to generate globally unique ids. This can be done using Redis’s atomic operations INCR and INCRBY. Advantages:

  • Rely on database, flexible and convenient, and better performance than database.
  • Numeric ID natural sorting, helpful for pagination or results that need sorting. Disadvantages:
  • If the system does not have Redis, new components need to be introduced to increase the complexity of the system.
  • The amount of coding and configuration required is considerable.

Tip 6: Twitter’s open source Snowflake algorithm

Snowflake is an open-source distributed ID generation algorithm for Twitter. Its core idea is a long ID:

  • 41 bits as milliseconds – 41 bits of length can be used for 69 years
  • 10 Bit Indicates the machine ID (five bits indicate the machine ID of a data center.) – A maximum of 1024 nodes can be deployed in a string of 10 bits
  • 12-bit serial number in milliseconds – 12-bit counting sequence number 4096 ID numbers can be generated for each node in milliseconds



    The algorithm can theoretically generate up to 1000*(2^12) ID per second, that is, 400W ID, which can fully meet the needs of services.