In a distributed system, some scenarios require a globally unique ID. The ID may be tied to a business scenario, such as a payment serial number, or it may have nothing to do with the business, such as the IDs needed after database and table sharding, transaction version numbers, or distributed tracing. A good globally unique ID should have the following features:

  • Globally unique: the most basic requirement; an ID must never be repeated;
  • Increasing: some special scenarios, such as transaction version numbers, require every new ID to be greater than the previous one. In other scenarios, increasing IDs are still preferable because they are good for database index performance;
  • High availability: a service that generates unique IDs receives a very large number of calls, so keeping it highly available is critical;
  • Information security: consecutive IDs are easy to exploit and can leak information; for example, with consecutive order numbers, anyone can roughly estimate how many orders are placed in a day;
  • In addition, given the storage pressure, the shorter the ID, the better.

So what solutions are there for generating unique IDs in distributed scenarios?

Using database generation

The easiest solution to understand is a database auto-increment sequence: the database generates a unique primary key, and a service exposes it to other systems. If the system is small and neither the data volume nor the concurrency is large, this scheme is sufficient.

If generating one ID at a time puts too much pressure on the database, you can fetch N IDs at once and cache them. When the cached IDs run out, the next batch is fetched from the database.
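A minimal sketch of this batching ("segment") idea follows. It assumes a hypothetical `SegmentSource` that reserves a block of IDs in the database in one round trip; the SQL in its comment is only an illustration, and `SegmentIdGenerator` and `InMemorySegmentSource` are hypothetical names, not a real library API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical source of ID segments. A database-backed version might run something like
 * "UPDATE id_segment SET max_id = max_id + step WHERE biz_tag = ?" and read back max_id
 * in one transaction (an assumption for illustration).
 */
interface SegmentSource {
    /** Reserves the next block of {@code step} IDs and returns the first ID of that block. */
    long reserve(int step);
}

/** Hands out IDs from an in-memory segment and only goes back to the source when it runs dry. */
final class SegmentIdGenerator {
    private final SegmentSource source;
    private final int step;
    private long next; // next ID to hand out
    private long end;  // exclusive upper bound of the current segment

    SegmentIdGenerator(SegmentSource source, int step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.source = source;
        this.step = step;
    }

    synchronized long nextId() {
        if (next >= end) { // cache empty: fetch the next batch in one round trip
            long start = source.reserve(step);
            next = start;
            end = start + step;
        }
        return next++;
    }
}

/** In-memory stand-in for the database, for local testing only. */
final class InMemorySegmentSource implements SegmentSource {
    private final AtomicLong maxId = new AtomicLong(0);
    final AtomicInteger calls = new AtomicInteger(); // counts round trips to show batching

    @Override
    public long reserve(int step) {
        calls.incrementAndGet();
        return maxId.getAndAdd(step) + 1; // IDs start at 1
    }
}
```

With a step of 100, handing out 250 IDs costs only 3 trips to the source instead of 250. The trade-off is that IDs left unused in a segment are lost when the process restarts, so the sequence can have gaps.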

  • Pros: Easiest to understand and implement.
  • Cons: equally obvious. Each database implements auto-increment differently, which is troublesome if the database ever has to be migrated. The biggest problem is performance: once concurrency reaches a certain level, this approach will likely struggle to meet the requirements. In addition, an auto-increment ID carries very little information and serves only as an identifier, and it is consecutive.

Using other components/software/middleware

Redis (via INCR/INCRBY), MongoDB (via ObjectId) and ZooKeeper (via a znode's data version) can all generate globally unique identifiers.

Take MongoDB’s ObjectId for example:

{"_id": ObjectId("5d47ca7528021724ac19f745")}


MongoDB’s ObjectId is 12 bytes long and consists of:

  • Up to and including 3.2: 4-byte timestamp + 3-byte machine identifier + 2-byte process ID + 3-byte counter starting at a random value;
  • After 3.2: 4-byte timestamp + 5-byte random value + 3-byte incrementing counter.
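To make the post-3.2 layout concrete, here is an illustrative sketch that assembles a 12-byte ID from those three parts and reads the timestamp back. `ObjectIdSketch` is a hypothetical class for understanding the format only; in practice you would use the driver's own implementation (in the Java driver, `org.bson.types.ObjectId`).

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative generator following the post-3.2 layout:
 * 4-byte timestamp (seconds) + 5-byte per-process random value + 3-byte counter.
 */
final class ObjectIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    // 5 random bytes chosen once per process
    private static final byte[] PROCESS_RANDOM = new byte[5];
    // 3-byte counter, initialised to a random value
    private static final AtomicInteger COUNTER = new AtomicInteger(RANDOM.nextInt(0x1000000));

    static {
        RANDOM.nextBytes(PROCESS_RANDOM);
    }

    static String next() {
        long seconds = System.currentTimeMillis() / 1000;
        int counter = COUNTER.getAndIncrement() & 0xFFFFFF; // keep only 24 bits
        byte[] id = new byte[12];
        // big-endian 4-byte timestamp
        id[0] = (byte) (seconds >>> 24);
        id[1] = (byte) (seconds >>> 16);
        id[2] = (byte) (seconds >>> 8);
        id[3] = (byte) seconds;
        // 5-byte per-process random value
        System.arraycopy(PROCESS_RANDOM, 0, id, 4, 5);
        // big-endian 3-byte counter
        id[9] = (byte) (counter >>> 16);
        id[10] = (byte) (counter >>> 8);
        id[11] = (byte) counter;
        StringBuilder hex = new StringBuilder(24);
        for (byte b : id) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    /** Reads the creation time (seconds since the Unix epoch) from the first 4 bytes of a hex ObjectId. */
    static long timestampSeconds(String hexId) {
        return Long.parseLong(hexId.substring(0, 8), 16);
    }
}
```

Applied to the example above, `timestampSeconds("5d47ca7528021724ac19f745")` reads `0x5d47ca75`, i.e. 1564985973 seconds, which falls on 5 August 2019 (UTC). Because the timestamp is the leading field, ObjectIds generated later sort roughly after earlier ones.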

With either layout, ObjectIds are unique at least within a cluster, so you can build a globally unique ID service that uses MongoDB to generate ObjectIds. (Every MongoDB language driver implements the ObjectId generation algorithm.)

  • Pros: higher performance than the database approach; can be deployed as a cluster; the ID carries some meaning, such as a timestamp;
  • Cons: as with the database approach, an extra component has to be introduced, which increases system complexity. Most critically, both approaches turn the ID-generating system (service) into a single point, which in software architecture by itself means risk: if that service fails, every system that depends on it fails too.

UUID

This is the most common algorithm for generating unique identifiers in distributed architectures. To guarantee uniqueness, the inputs to UUID generation include the MAC address, a timestamp, a namespace, random or pseudo-random numbers, a clock sequence and other elements. There are several UUID versions, each with its own algorithm and scope of application:

  • Version 1: time-based UUID, built from a timestamp, a randomly initialised clock sequence and the MAC address. On a LAN, an IP address can be used instead of the MAC address. Highly unique, although exposing the MAC address is itself a security issue.
  • Version 2: DCE Security UUID. Same as version 1, except that the first 4 bytes of the timestamp are replaced by a POSIX UID or GID. Highly unique.
  • Version 3: name-based UUID (MD5), obtained by computing the MD5 hash of a namespace and a name. The same namespace and name always yield the same UUID, so it is unique only within a certain scope.
  • Version 4: random UUID, generated from random or pseudo-random numbers. There is a (very small) probability of duplicates.
  • Version 5: name-based UUID (SHA-1), similar to version 3 except that the hash is computed with SHA-1; unique only within a certain scope.
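Java's `java.util.UUID` provides `randomUUID()` (version 4) and `nameUUIDFromBytes(byte[])` (MD5, version 3), but no version-5 factory. The sketch below shows both name-based versions, assuming the RFC 4122 rule of hashing the namespace UUID's 16 bytes followed by the name's UTF-8 bytes; `NameBasedUuid` is a hypothetical helper class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class NameBasedUuid {
    /** Predefined DNS namespace from RFC 4122. */
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    /** Namespace UUID bytes (big-endian) followed by the UTF-8 name bytes. */
    private static byte[] concat(UUID namespace, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(16 + nameBytes.length); // big-endian by default
        buf.putLong(namespace.getMostSignificantBits());
        buf.putLong(namespace.getLeastSignificantBits());
        buf.put(nameBytes);
        return buf.array();
    }

    /** Version 3: nameUUIDFromBytes hashes with MD5 and sets the version/variant bits itself. */
    static UUID v3(UUID namespace, String name) {
        return UUID.nameUUIDFromBytes(concat(namespace, name));
    }

    /** Version 5: hash with SHA-1, keep the first 16 bytes, then set version and variant bits. */
    static UUID v5(UUID namespace, String name) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-1").digest(concat(namespace, name));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // version 5
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // RFC 4122 variant
        ByteBuffer buf = ByteBuffer.wrap(hash, 0, 16);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

Calling either method twice with the same namespace and name returns the same UUID, which is exactly why versions 3 and 5 are only unique within a scope. For the DNS namespace and the name `python.org`, these should match the examples in the Python `uuid` module documentation: `6fa459ea-ee8a-3ca4-894e-db77e160355e` (v3) and `886313e1-3b8a-5372-9b90-0c9aee199e5d` (v5).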

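import java.util.UUID;

public class CreateUUID {

    public static void main(String[] args) {
        // Random (version 4) UUID in the standard 36-character form with hyphens
        String uuid = UUID.randomUUID().toString();
        System.out.println("uuid : " + uuid);

        // Another random UUID with the hyphens stripped (32 hex characters)
        uuid = UUID.randomUUID().toString().replace("-", "");
        System.out.println("uuid : " + uuid);
    }
}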


  • Pros: generated locally with no network overhead and no third-party component (hence no single-point risk); generation is simple and performs well.
  • Cons: long (128 bits, 36 characters in string form), which is bad for storage; and unordered, which hurts performance. For example, with MySQL’s InnoDB engine, using a UUID as the primary key makes each new row land at a random position in the clustered index, so data has to be moved around frequently.

Snowflake

If you want IDs generated locally but not as disordered as UUIDs, consider the Snowflake algorithm (open-sourced by Twitter).

The Snowflake algorithm generates 64-bit integer IDs made up of:

  • 1 bit: unused, always 0 (so the ID stays positive);
  • 41 bits: timestamp in milliseconds (usually counted from a custom epoch), ranging from 0 to 2^41 - 1, which converts to about 69 years;
  • 10 bits: machine ID, made of a 5-bit data-center ID + 5-bit machine ID. (If the service cluster is small, these can be configured manually; if it is large, they can be assigned automatically by a third-party component. For example, Meituan’s Leaf-snowflake uses ZooKeeper persistent sequential nodes as machine IDs.)
  • 12 bits: sequence number distinguishing IDs generated within the same millisecond (up to 4096 per millisecond per machine).

In Java, a Snowflake ID fits in a long.
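Below is a minimal sketch of the 1 + 41 + 5 + 5 + 12 layout described above. The custom epoch (2020-01-01 UTC) is an arbitrary assumption, and `SnowflakeIdGenerator` and `parse` are hypothetical names. When the clock moves backwards, this sketch simply refuses to generate an ID, which is one common way to avoid the duplicate-ID risk listed in the cons.

```java
/**
 * Snowflake-style generator: 0 sign bit | 41-bit ms timestamp | 5-bit data-center ID
 * | 5-bit worker ID | 12-bit sequence. The epoch is an assumption; pick one and never change it.
 */
final class SnowflakeIdGenerator {
    private static final long EPOCH = 1577836800000L; // 2020-01-01T00:00:00Z, an arbitrary choice
    private static final int WORKER_BITS = 5, DATACENTER_BITS = 5, SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;         // 31
    private static final long MAX_DATACENTER = (1L << DATACENTER_BITS) - 1; // 31
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;    // 4095
    private static final int WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    private static final int DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;       // 17
    private static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER || workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("datacenterId and workerId must be in [0, 31]");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse rather than risk duplicate IDs
            throw new IllegalStateException("Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 IDs already issued in this millisecond: spin until the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L; // new millisecond: restart the sequence
        }
        lastTimestamp = now;
        return ((now - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    /** Splits an ID back into {timestamp in ms, datacenterId, workerId, sequence}. */
    static long[] parse(long id) {
        return new long[] {
            (id >>> TIMESTAMP_SHIFT) + EPOCH,
            (id >>> DATACENTER_SHIFT) & MAX_DATACENTER,
            (id >>> WORKER_SHIFT) & MAX_WORKER,
            id & SEQUENCE_MASK
        };
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator are strictly increasing, and IDs from different machines are roughly ordered by time (trend-increasing). Throwing on a clock rollback is the simplest policy; waiting out a small rollback is a common alternative.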

  • Pros: generated locally with no network overhead and no third-party component (so no single-point risk); unique within a given scope (enough for most scenarios); good performance; IDs increase with the timestamp (trend-increasing);
  • Cons: depends on the machine clock; if a machine’s clock is turned back, the generated IDs risk being duplicated.

In addition, many leading Internet companies provide unique ID generation schemes or frameworks, such as Meituan’s open-source Leaf and Baidu’s open-source UidGenerator. For example, generating a UID with UidGenerator:


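// Injected by Spring from a configured UidGenerator bean (com.baidu.fsg.uid.UidGenerator)
@Resource
private UidGenerator uidGenerator;

@Test
public void testSerialGenerate() {
    // Generate a UID
    long uid = uidGenerator.getUID();
    // Print a readable breakdown of the UID
    System.out.println(uidGenerator.parseUID(uid));
}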


The Uncle Who Can Code | Original article

