Do you know the unique ID generation algorithm of distributed system?

So one of the questions that you’re going to have to face after you split the tables, is how do I get my ID?

If a table is divided into multiple tables, the id of each table increases from 1.

For example, if your order table is split into 1024 order tables, each table’s ID is accumulated from 1, this must be a problem!

Your system will not be able to query the order by the primary key of the table, such as the order id = 50, in every table!

Therefore, it is necessary to generate globally unique ID under the distributed architecture. After the sub-database sub-table, for the core ID inserted into the database, it cannot simply use the table to increment the ID directly. It is necessary to generate globally unique ID and then insert it into each table to ensure that an ID in each table is globally unique.

For example, the order table is split into 1024 tables, but the order id = 50 will only exist in one table.

So how do you implement globally unique ids? There are several options.

(1) Scheme 1: self-increment ID of independent database

This scheme means that every time your system generates an ID, it inserts a non-business entry into a separate table in a separate library and then retrieves an id from the database. Get this ID and then write to the corresponding sub-database sub-table.

For example, if you have an auto_id library, there is only one table in it, called the auto_id table, and one id is self-increasing.

So every time you want to get a globally unique ID, you just insert a record into this table, get a globally unique ID, and then that globally unique ID can be inserted into the order branch table.

The advantage of this scheme is that it is so convenient and simple that anyone can use it. The disadvantage is that the single library generates the increment ID, if the high concurrency, there will be a bottleneck, because the auto_ID library if bearing tens of thousands of concurrent per second, is certainly not realistic.

(2) Scheme 2: UUID

This, as everyone should know, is the use of UUID to generate a globally unique ID.

The advantage is that each system is locally generated, not database-based

The downside is that the UUID is too long to be used as a primary key.

If you want to randomly generate a file name, number, etc., you can use uUID, but you can’t use UUID for primary key.

(3) Scheme 3: Obtain the current system time

The idea is to get the current time as a globally unique ID.

But the problem is, when the concurrency is very high, for example, thousands of concurrent requests per second, there will be repeated situations, which is definitely not appropriate.

Generally, if you use this scheme, the current time is concatenated with many other business fields, as an ID, if the business is acceptable to you, then it is also ok.

You can concatenate other business field values with the current time to form a globally unique number, such as an order number: timestamp + user ID + business meaning code.

(4) Scheme four: Ideological analysis of Snowflake algorithm

Snowflake is an open-source distributed ID generation algorithm for Twitter.

The core idea is to use a 64 bit long as the global unique ID. One of the 64 bits is unused, and 41 bits are used as the number of milliseconds, 10 bits are used as the id of the working machine, and 12 bits are used as the serial number.

To give you an example, for example, the following 64 bit long number, let’s see

The first part above, which is a bit: 0, is meaningless

The second part above has 41 bits: this is the timestamp

The third part above has five bits: represents the machine room ID, 10001

The fourth part above has five bits: the machine ID, 1 1001

The fifth part above is 12 bits: the serial number, which is the serial number of the ID generated simultaneously on a machine in a machine room in this millisecond, 0000 00000000

1 Bit: No, why not?

Since the first bit in binary is a 1, it is a negative number, but the generated ID is a positive number, so the first bit is always 0

41 bit: Indicates the time stamp, in milliseconds.

41 bits can represent as many as 2^ 41-1 milliseconds, or 69 years in adult years

10 bit: Records the working machine ID, which indicates that the service can be deployed on up to 2^10 machines, or 1024 machines.

However, five bits represent the machine room ID and five bits represent the machine ID. This means a maximum of 2 ^ 5 machine rooms (32 machine rooms), each machine room can represent 2 ^ 5 machines (32 machines).

12 bit: This is used to record different ids generated in the same millisecond.

The largest positive integer represented by 12 bits is 2 ^ 12-1 = 4096, which means you can use the 12 bits to distinguish 4096 different ids in the same millisecond

In simple terms, one of your services assumes that you want to generate a globally unique ID, so you can send a request to the system that has the Snowflake algorithm deployed, and the Snowflake algorithm system generates the unique ID.

The snowflake algorithm must first know the machine room id = 17 and machine ID = 12.

When the Snowflake algorithm receives the request, it first generates a 64-bit long ID using binary arithmetic. The first of the 64 bits is meaningless.

This is followed by 41 bits to use the current timestamp (in milliseconds), 5 bits to set the machine room ID, and 5 bits to set the machine ID.

Finally, determine the number of requests in this millisecond on the machine in the current machine room, and add a sequence number to the request to generate the ID as the last 12 bits.

The result is a 64-bit ID that looks something like this:

This algorithm ensures that a unique ID is generated in the same millisecond on a single machine in a machine room. It is possible to generate multiple ids in a millisecond, but with the ordinal number of the last 12 bits to distinguish them.

Let’s take a quick look at a code implementation of the Snowflake algorithm. This is an example. Once you understand what this means, you can try to modify the algorithm for yourself.

In short, the use of a 64-bit number in each bit to set a different flag bit, to distinguish each ID.

(5) Code implementation of Snowflake algorithm

public class IdWorker {

private long workerId; // This represents the machine id

private long datacenterId; // This represents the machine room id

private long sequence; // This is the latest sequence number representing multiple ids generated in a millisecond

public IdWorker(long workerId, long datacenterId, long sequence) {

// sanity check for workerId

// The machine id cannot exceed 32 and cannot be less than 0

if (workerId maxWorkerId || workerId 0) {

throw new IllegalArgumentException(

String.format(worker Id can’t be greater than %d or less than 0,maxWorkerId));

}

if (datacenterId maxDatacenterId || datacenterId 0) {

throw new IllegalArgumentException(

String.format(datacenter Id can’t be greater than %d or less than 0,maxDatacenterId));

}

this.workerId = workerId;

this.datacenterId = datacenterId;

this.sequence = sequence;

}

private long twepoch = 1288834974657L;

private long workerIdBits = 5L;

private long datacenterIdBits = 5L;

// This is a binary operation, i.e., a maximum of 31 digits in 5 bits, i.e., a maximum of 32 machine ids

private long maxWorkerId = -1L ^ (-1L workerIdBits);

A maximum of 31 digits can be contained in 5 bits. A maximum of 32 digits can be contained in the machine room ID

private long maxDatacenterId = -1L ^ (-1L datacenterIdBits);

private long sequenceBits = 12L;

private long workerIdShift = sequenceBits;

private long datacenterIdShift = sequenceBits + workerIdBits;

private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

private long sequenceMask = -1L ^ (-1L sequenceBits);

private long lastTimestamp = -1L;

public long getWorkerId(){

return workerId;

}

public long getDatacenterId() {

return datacenterId;

}

public long getTimestamp() {

return System.currentTimeMillis();

}

// This is the core method that causes the snowflake algorithm on the current machine to generate a globally unique ID by calling nextId()

public synchronized long nextId() {

// Get the current timestamp in milliseconds

long timestamp = timeGen();

if (timestamp lastTimestamp) {

System.err.printf(

clock is moving backwards. Rejecting requests until %d., lastTimestamp);

throw new RuntimeException(

String.format(Clock moved backwards. Refusing to generate id for %d milliseconds,

lastTimestamp – timestamp));

}

A request is sent to generate an ID within the same millisecond

// The seqence sequence number must be incremented by 1, up to 4096

if (lastTimestamp == timestamp) {

// This means that there can only be 4096 digits in a millisecond. No matter how many you pass in,

// This bit operation is guaranteed to always be in the range of 4096, so that you do not pass a sequence beyond the range of 4096

sequence = (sequence + 1) sequenceMask;

if (sequence == 0) {

timestamp = tilNextMillis(lastTimestamp);

}

} else {

sequence = 0;

}

// The last time the id was generated is in milliseconds

lastTimestamp = timestamp;

// Here is the core bit operation, generating a 64bit ID

// Move the current timestamp left to 41 bit; [Fixed] Move machine room ID left to 5 bit [Fixed] Move the machine ID left to 5 bit Put the serial number in the last 12 bits

// Finally concatenated into a 64 bit binary number, converted to base 10 is a long

return ((timestamp – twepoch) timestampLeftShift) |

(datacenterId datacenterIdShift) |

(workerId workerIdShift) | sequence;

}

private long tilNextMillis(long lastTimestamp) {

long timestamp = timeGen();

while (timestamp = lastTimestamp) {

timestamp = timeGen();

}

return timestamp;

}

private long timeGen(){

return System.currentTimeMillis();

}

/ / — — — — — — — — — — — — — — — test — — — — — — — — — — — — — — —

public static void main(String[] args) {

IdWorker worker = new IdWorker(1,1,1);

for (int i = 0; i 30; i++) {

System.out.println(worker.nextId());

}

(6) Snowflake algorithm a small improvement idea

In practice, the Snowflake algorithm could be improved a bit.

Because you can think about it, when we generate a unique ID, we usually need to specify a table name, such as the order table unique ID.

Therefore, in the above 64 bits, the 5 bits representing the machine room can be replaced by the business table name, for example, 00001 represents the order table.

Because in most cases, there are not so many machine rooms, so the 5 bits may not be very meaningful as machine room ID.

That’s how it works. Each machine in the Snowflake algorithm generates a unique ID for a business table in a millisecond, many ids in a millisecond, and treats them with the last 12 bits.

Do you know the unique ID generation algorithm of distributed system?

Related Posts

Domain penetration – Hook PasswordChangeNotify

Github has made another big change this month!

TCP/IP — TCP three handshakes and four waves