In distributed systems, there are some scenarios where globally unique ids are required. In this case, 36-bit UUID can be used to prevent ID collisions. However, UUID has some disadvantages

Sometimes we want to use a simpler ID, and we want the ids to be generated in chronological order

Public number: Longtai technical notes

GitHub:acmenlt

What is the Snowflake algorithm

Snowflake, which means Snowflake in Chinese and is often referred to as the Snowflake algorithm, is Twitter’s open-source distributed ID generation algorithm

The Twitter snowflake algorithm is generated as a 64-bit long value, with the time stamp introduced in the component, which basically keeps the increment

Advantages of the SnowFlake algorithm:

  1. High performance high availability: Independent of the database, it is completely generated in memory
  2. High throughput: Generate millions of autoincrement ids per second
  3. ID increment: Saves the ID to the database, which improves index efficiency

Disadvantages of the SnowFlake algorithm:

Depending on the consistency with the system time, if the system time is called back or changed, ID conflicts or duplicate may occur

Snowflake algorithm composition

The snowflake structure is shown below:

There are four components

Not used: 1bit, the highest bit is the sign bit, 0 indicates positive, 1 indicates negative, fixed 0

Timestamp: 41bit, millisecond timestamp (69 years of use for 41 bits)

Identifier bit: 5-bit DATA center ID and 5-bit working machine ID. Combined, a maximum of 1024 nodes can be deployed

Serial number: the 12-bit increasing serial number indicates that the node generates repeated ids within milliseconds. The serial number indicates that the node is unique. The 12-bit number can generate 4096 ids every millisecond

If 4096 non-repeating ids can be generated from serial numbers in one millisecond, 4096 x 1000 = 409W IDS can be generated in one second

The default snowflake algorithm is 64 bits, and the length can be customized. If you want to run longer, increase the number of timestamp bits; If more nodes need to be deployed, increase the identifier bit length. If the concurrency is high, increase the serial number digit

Summary: The snowflake algorithm is not invariable and can be customized according to the specific scene in the system

The Snowflake algorithm applies to scenarios

Because snowflake algorithm order increment, guarantee MySQL B+ Tree index structure insert high performance

Therefore, in daily business use, snowflake algorithm is mostly applied to the primary key ID of database and the primary key of business association

Snowflake algorithm generates ID duplication problem

Hypothesis: an order microservice generates ID through the Snowflake algorithm and deploys three nodes with the same identification bit

At this time, there are 200 concurrent, evenly distributed three nodes, three nodes in the same millisecond with the same serial number to generate ID, then will generate duplicate ID

Based on the above hypothetical scenarios, it can be known that there are certain preconditions for the generation of ID conflicts by snowflake algorithm

  1. The service is deployed in a cluster with the same identifier bits on some machines
  2. Services have a certain amount of concurrency, without which duplicate problems cannot be triggered
  3. When the ID is generated: The sequence numbers are the same for the same millisecond

How is an identity bit defined

If you can ensure that the identity bit is not duplicated, then the snowflake ID will not be duplicated

From the above example, you know the necessary conditions for ID duplication. If you want to avoid duplicate ids within the service, you need to move the article from the identity bit

Let’s take a look at how to define identifier bits in an open source framework using the Snowflake algorithm

Mybatis-Plus v3.4.2 Snowflake algorithm implementation class Sequence, provides two construction methods: no parameter construction, automatic generation of dataCenterId and workerId; Parameter constructs that explicitly specify the identifier bit when creating a Sequence

Hutool V5.7.9 provides a default implementation based on Mybatis-Plus dataCenterId and workerId generation schemes

How to generate dataCenterId and workerId

public static long getDataCenterId(long maxDatacenterId) { long id = 1L; final byte[] mac = NetUtil.getLocalHardwareAddress(); if (null ! = mac) { id = ((0x000000FF & (long) mac[mac.length - 2]) | (0x0000FF00 & (((long) mac[mac.length - 1]) << 8))) >> 6; id = id % (maxDatacenterId + 1); } return id; }Copy the code

The input parameter maxDatacenterId is a fixed value representing the maximum number of data center ids. The default value is 31

Why is the maximum 31? Because the maximum binary value of 5 bits is 11111, which corresponds to the decimal value 31

When obtaining the dataCenterId, the network interface is empty and the default value is 1L. The other is not empty and obtains the dataCenterId from the Mac address

The value of dataCenterId depends on the Mac address

Now look at the workerId

public static long getWorkerId(long datacenterId, long maxWorkerId) {
    final StringBuilder mpid = new StringBuilder();
    mpid.append(datacenterId);
    try {
        mpid.append(RuntimeUtil.getPid());
    } catch (UtilException igonre) {
        //ignore
    }
    return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1);
}
Copy the code

The input parameter maxWorkderId is also a fixed value representing the maximum number of working machine ids. The default value is 31. DatacenterId is taken from the getDatacenterId method above

The value of the name variable is PID@IP, so name needs to be divided according to @ and get the subscript 0 to get PID

The MAC + PID hashcode is used to obtain 16 low values, and then the workerId is finally obtained

Assign identifier bit

The acquisition of Mybatis-Plus identification bit depends on Mac address and process PID. Although it can be done as little as possible, there is still a small probability

How can identity bits be defined so that they are not duplicated? There are two schemes: pre-allocation and dynamic allocation

pre-allocated

Before the application goes online, count the number of nodes currently serving the application and manually apply for an IDENTIFIER

This scheme, without the amount of code development, can be used when service nodes are fixed or projects are few, but it cannot solve the problem of dynamic expansion of service nodes

Dynamic allocation

The identification bit is stored in middleware such as Redis, Zookeeper and MySQL to request the identification bit when the service is started. After the request, the identification bit is updated to the next available one

By storing the identifier bits, we extend the question of whether the ID of the Snowflake algorithm is unique within the service or globally

Take Redis as an example. If you want to be unique in the service, the Redis node storing the identifier bits can be used in its own project. If globally unique, all applications that use the snowflake algorithm use the same Redis node

The only difference is whether Redis is shared between different services. If there is no requirement for global uniqueness, it is best to make it unique within the ID service, because this avoids single point problems

If the number of service nodes exceeds 1024, you need to perform additional expansion. You can extend the 10-bit identifier or choose the open source distributed ID framework

Dynamic allocation implementation scheme

Redis stores a Hash Key that contains two key-value pairs: dataCenterId and workerId

When the application starts, the Lua script goes to Redis to get the identity bit. The dataCenterId and workerId are obtained and incremented in the Lua script, and the call returns the available flag bits

The specific Lua script logic is as follows:

  1. Redis may not have the “snowflake_work_id_key” Hash when obtaining the first service node. If the Hash does not exist, the dataCenterId and workerId will be initialized to 0
  2. If the Hash exists, check whether the dataCenterId and workerId are equal to the maximum value of 31. If the conditions are met, the system returns after the dataCenterId and workerId are initialized to 0
  3. The total number of permutations of dataCenterId and workerId is 1024. The workerId is allocated first
  4. Check whether workerId! If (workerId = 31); if (workerId = 31); If workerId = 31, increment the dataCenterId and set the workerId to 0

DataCenterId and workerId are all pushed down, forming a loop. Through the atomicity of Lua script, ensure that the generation of snowflake algorithm under 1024 nodes is not repeated. If the identity bit is equal to 1024, the loop continues from the beginning

Open source distributed ID framework

Both Leaf and Uid have implemented the snowflake algorithm. Leaf provides the number segment mode to generate ID in addition

Meituan Leaf:https://github.com/Meituan-Dianping/Leaf

Baidu Uid:https://github.com/baidu/uid-generator

The Snowflake algorithm can meet most scenarios. If it is not necessary, it is not recommended to introduce open source solutions to increase system complexity

review

This article helps readers sort out what is the snowflake algorithm and how to solve the problem of ID conflict generated by snowflake algorithm

As for the problem of ID conflict generated by snow ring algorithm, a scheme is proposed: assigning identifier bits; By assigning the component identifier bits of the snowflake algorithm, the default ID generation is unique under 1024 nodes

You can go to see Hutool or Mybatis-Plus snowflake algorithm concrete implementation, to help you better understand

The Snowflake algorithm is not a panacea and cannot be applied to all scenarios. If the ID needs to be globally unique and the number of service nodes exceeds 1024, you can modify the composition of the algorithm itself, that is, expand the identifier bit, or choose LEAF and UID

Creation is not easy, the article read help, point attention to support, wish good