A “unique ID” is a common requirement in applications to uniquely identify a business object, a resource, a message, and so on. In a database, a unique ID is typically used as a primary key for a piece of data. For those of you who have read the previous article on MySQL indexing, the importance of primary keys to a database is self-evident.

Getting a globally unique ID is very easy in a single machine scenario, and you can use the database increment function.

But if you want to build a globally unique ID in a distributed scenario, it’s a little different. Since distributed systems tend to be highly concurrent, it is not appropriate to use the auto-increment feature of a stand-alone database. If your technology selection happens to be a “non-distributed database” such as MySQL, you should refer to the distributed globally unique ID generation strategy common in the industry.

UUID

UUID is Universally Unique Identifier. The standard form contains 32 hexadecimal digits, 36 characters of the form 8-4-4-4-12 in five hyphens, as shown in the following example: 9628F6e9-70CA-45AA-9F7C-77AFe0D26e05. Up to now, there are 5 ways to generate UUID in the industry. For details, see the UUID specification A Universally Unique IDentifier (UUID) URN Namespace published by the IETF, known as the five versions of UUID.

The UUID class in the JDK can generate version 3 and version 4 UUID. So here’s a quick look at how UUID generation for version 3 and version 4 is done.

  • UUID version 3: value by calculating the MD5 hash of name and namespace.
  • UUID version 4: Generates UUID based on random or pseudo-random numbers. The probability of such UUID duplicates can be calculated, but the probability of duplicates is negligible, so this version is also the one that is often used.

There are also online UUID generation sites that can be used to generate temporary test data if you use UUID on your project. www.uuidgenerator.net/

The advantage of UUID is that it is simple to implement, using the JDK’s native APIS. The disadvantage is that it does not conform to the primary key index strategy of the database based on b-Tree engine, and is not suitable for the database primary key in scenarios requiring high performance.

Based on Redis implementation

Distributed, probably a cache. With caching, you might use Redis. Redis INCR functions operate atomically on a single machine and are guaranteed to be unique and incremental.

Stand-alone Redis may not support high concurrency. If you use Redis clustering, how do you ensure that ids are unique? Step size can be used. For example, a cluster consisting of five Redis nodes generates ids as follows:

A: 1,6,11,16,21

B: 2,7,12,17,22

C: 3,8,13,18,23

D: 4,9,14,19,24

E: 5,10,15,20,25

Class snowflake scheme

Twitter uses Zookeeper to implement a global ID-generated service called Snowflake. The data structure of ID generation is shown in the figure below:

A total of 64 bits, exactly corresponding to the Java long, the first symbol bit is not used, and then 41 bits are used to represent the timestamp. The next 10 digits indicate the ID of the node. If the node has multiple equipment rooms, divide the first five digits to indicate the ID of the equipment room, and the last five digits to indicate the ID of the machine in each equipment room. The last 12 bits are used to represent the serial number, so that it can be done in the same millisecond, the same machine to generate multiple ids, 12 bits count up to 4096.

The Time Stamp here is not the Time Stamp of the current Time, but the difference between the current Time and the start Time. If measured on a millisecond basis, 41 bits last about 69 years.

There are many variations of the Snowflake algorithm. You can adjust the bit allocation according to your actual situation, for example, the timestamp is 42 bits, the machine ID is 9 bits. A 42-bit timestamp gives you 138 years to wait.

Baidu’s UidGenerator and Meituan’s Leaf are both based on snowflake variants.

Snowflake is a good way to generate ids that are globally unique and support high concurrency. It is of type long, trending upward, and can be used for database primary keys. You can also sort by time.

However, it also has the disadvantage of being heavily dependent on the server clock. If the server clock is set back (such as leap seconds or NTP synchronization), the ID will be duplicated.

Meituan Leaf solves the problem of clock callback, the specific process is shown below, you can understand:

The other way

Of course, there are other ID generation schemes, such as:

  • Didi: Time + starting point number + license plate number

  • Taobao order: time stamp + user ID

  • Other e-commerce: time stamp + ordering channel + user ID, some will add the ID of the first item in the order.

  • MongoDB ID: also a kind of snowflake. Time + machine code + PID +inc consists of 12 bytes, 4+3+2+3, and is finally identified as a 24-length hexadecimal character.

conclusion

If you do not use the database primary key, you are advised to use the UUID.

If you want to use it as the primary key of your database and you are not using a distributed database (TiDB, MongoDB, etc.), you can use the Snowflake algorithm instead of meituan Leaf.

The distributed ID of database middleware Sharding-JDBC adopts Twitter’s open source Snowflake algorithm and does not need to rely on any third-party components, thus simplifying its scalability and maintenance to the greatest extent. However, the snowflake algorithm’s flaws (strong time dependence, if the clock is rolled back, it will generate duplicate ids), Sharding-JDBC does not provide a solution, and users need to expand if they want to enhance.

Write articles carefully and share with your heart.

Personal website: Yasinshaw.com

Public number: XY’s technical circle