Distributed unique ID solution - Snowflake algorithm

The reading takes about 3 minutes

Attached to the source

[toc]

preface

Gone are the days of monolithic architecture services.

At present, the complexity of system business and data storage is increasing, and distributed system is a very common solution.

Global unique IDS are almost all encountered in system design and play an important role in storage and retrieval.

ID generator

In applications, a globally unique ID is often required as the database primary key. How do I generate globally unique ids?

First, we need to determine whether the globally unique ID is an integer or a string. If it is a string, the existing UUID is sufficient and no additional work is required. The disadvantage is that strings take up a large space as IDS and have lower index efficiency than integers.

If you use an integer as an ID, you first exclude 32-bit ints because the range is too small and you must use 64-bit longs.

When an integer is used as an ID, how can I generate an ID that is incremented, globally unique, and unique?

Database increment

Database increment ID is often used in the system with small amount of data. By using the database increment ID, we can basically achieve continuous increment from 1. Oracle can use SEQUENCE, MySQL can use primary key AUTO_INCREMENT, although not guaranteed global unique, but each table unique, also basically meet the requirements.

The disadvantage of adding an ID to a database is that the ID cannot be obtained before the data is inserted. After the data is inserted, the ID obtained is unique, but the ID is not valid until the transaction is committed. Some bidirectional references have to be inserted and updated again, which is more troublesome.

In our development process, we encountered a master master database synchronization (simple can be understood as, the same SQL is executed again in another database) scenario, if the database increment ID, there will be a primary key inconsistency, or primary key conflict problem.

Distributed ID generator

Solution 1: UUID

It is not recommended in distributed environment

Uuid is the method we thought of first, in java.util; There are corresponding methods in the package. This is a UUID with RFC standard: www.ietf.org/rfc/rfc4122…

Uuid has good performance (local invocation) with no network overhead.

However, Uuid is not easy to store (generated strings, stored too long, and not suitable for many scenarios); Information is insecure (MAC address-based and potentially compromised, a vulnerability that was used to find the creator of Melissa’s virus). ; Increasing (or trending) is not guaranteed; Other bloggers reported that the first 20 unique ids would be duplicated in the case of large numbers (only about 220W).

UUID.randomUUID().toString()

Plan 2: Snowflake

This is the most widely used distributed ID solution and is recommended

Background Twitter cloud cloud is not introduced, is a period of time ago sealed the king of Twitter account.

Algorithm is introduced

The result of the SnowFlake id is an integer of 64bit size. Its structure is shown below:

One bit, no. The highest digit 1 in binary is negative, but the iD we generate is usually an integer, so the highest digit is always 0
41 bits, used to record time stamps (milliseconds).
- 41 bits can represent 2^{41}-1 numbers,
- If used only to represent positive integers (positive numbers include 0 in computers), the range of values can be from 0 to 2^{41}-1, minus 1 because the range of values that can be represented starts at 0, not 1.
- This means that 41 bits represent 2^{41}-1 milliseconds, which translates to (2^{41}-1)/(1000 * 60 * 60 * 24 * 365) = 69 years
10 bits, used to record the working machine ID.
- Can be deployed on 2^{10} = 1024 nodes, including 5-bit datacenterId and 5-bit workerId
- The largest positive integer that can be represented by 5 bits is 2^{5}-1 = 31, that is, 0, 1, 2, 3,…. 31 These 32 numbers represent different datecenterids or workerids
The 12-bit serial number is used to record different ids generated within the same millisecond.
- The largest positive integer that can be represented by 12 bits is 2^{12}-1 = 4095, which can be 0, 1, 2, 3,…. 4094 The 4095 numbers represent 4095 ID numbers generated in the same machine at the same time (millisecond).

Since 64-bit integers in Java are of type long, the ids generated by the SnowFlake algorithm in Java are stored in longs.

SnowFlake guarantees:

All ids generated on the same server increase in time
No duplicate ids are generated across the distributed system (because datacenterId and workerId distinguish between them)

Existing problems:

The machine ID (5-bit) and data center ID (5-bit) configurations are not resolved. In distributed deployment, the same configuration is used, but there is a risk of ID duplication.
Instantiation of objects is required, and there is no out-of-the-box utility class.
If the clock on the machine is dialed back, the number may be repeated or the service may be unavailable. (This doesn’t normally happen.)

The workId is generated using the hostName of the server, and the dataCenterId is generated using the IP of the dataCenterId. In this way, the machine code duplication of 10 bits can be prevented as much as possible. However, since neither ID can exceed 32, only the remainder can be taken, which is still avoidable. However, in practice, the configuration of hostName and IP is usually continuous or similar, as long as they are not exactly 32 bits apart, there will be no problem. Moreover, it is almost impossible to separate hostName and IP by 32 bits at the same time. Usually, there will be no more than 10 containers in distributed deployment.

Docker configuration in production is usually compiled once and then distributed to different containers, with no different configuration. This situation creates uncertainty about the above mentioned situation, which will be referenced in a review article.

The source code

Java version of snowflake ID generation algorithm

package com.my.blog.website.utils;
 
import org.apache.commons.lang3.RandomUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.SystemUtils;
 
import java.net.Inet4Address;
import java.net.UnknownHostException;
 
/** * Twitter_Snowflake

 * The structure of SnowFlake is as follows (each part is separated by -):

 * 0-0000000000 0000000000 0000000000 0000000000 0-00000 - 0000000-000000000000 

 * 1 bit identifier, because the basic type of long is signed in Java, the highest bit is the sign bit, the positive number is 0, the negative number is 1, so the id is generally positive, the highest bit is 0

 * 41 bit truncation (milliseconds), notice, The 41-bit cut-off is not the current cut-off, but the difference between the current cut-off and the start cut-off *. The start cut-off is usually the time our ID generator started using. Specified by our program (as shown in the startTime property of the IdWorker class). The 41-bit time segment can be used for 69 years. The year T = (1L << 41)/(1000L * 60 * 60 * 24 * 365) = 69< BR > * the 10-bit data machine bit can be deployed on 1024 nodes. Includes 5-bit datacenterId and 5-bit workerId< BR > * 12-bit sequence, count in milliseconds, 12-bit count sequence number support each node every millisecond (same machine, same time intercept) 4096 ID sequence number < BR > * add up to just 64 bits, a Long. < BR > * The advantage of SnowFlake is that the whole system is sorted by time increment, there is no ID collision across the distributed system (distinguished by data center ids and machine ids), and it is very efficient. SnowFlake can generate around 260,000 ids per second in tests. * /
public class SnowflakeIdWorker {
 
    // ==============================Fields===========================================
    /** Start time (2015-01-01) */
    private final long twepoch = 1489111610226L;
 
    /** The number of digits in the machine ID */
    private final long workerIdBits = 5L;
 
    /** The number of digits in the data id */
    private final long dataCenterIdBits = 5L;
 
    /** The maximum machine ID supported, resulting in 31 (this shift algorithm can quickly calculate the maximum decimal number represented by several binary digits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
 
    /** The maximum data id supported. The result is 31 */
    private final long maxDataCenterId = -1L ^ (-1L << dataCenterIdBits);
 
    /** The number of digits in the sequence */
    private final long sequenceBits = 12L;
 
    /** The machine ID is shifted 12 bits to the left */
    private final long workerIdShift = sequenceBits;
 
    /** Data id is shifted 17 bits to the left (12+5) */
    private final long dataCenterIdShift = sequenceBits + workerIdBits;
 
    /** The time warp moves 22 bits to the left (5+5+12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + dataCenterIdBits;
 
    /** Generate the mask of the sequence, here 4095 (0b111111111111= 0xFFf =4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);
 
    /** ID of the working machine (0~31) */
    private long workerId;
 
    /** DATA center ID(0-31) */
    private long dataCenterId;
 
    /** Milliseconds sequence (0~4095) */
    private long sequence = 0L;
 
    /** The last time the ID was generated */
    private long lastTimestamp = -1L;
 
    private static SnowflakeIdWorker idWorker;
 
    static {
        idWorker = new SnowflakeIdWorker(getWorkId(),getDataCenterId());
    }
 
    //==============================Constructors=====================================
    /** * constructor *@paramWorkerId Indicates the work ID (0 to 31) *@paramDataCenterId dataCenterId (0 to 31) */
    public SnowflakeIdWorker(long workerId, long dataCenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("workerId can't be greater than %d or less than 0", maxWorkerId));
        }
        if (dataCenterId > maxDataCenterId || dataCenterId < 0) {
            throw new IllegalArgumentException(String.format("dataCenterId can't be greater than %d or less than 0", maxDataCenterId));
        }
        this.workerId = workerId;
        this.dataCenterId = dataCenterId;
    }
 
    // ==============================Methods==========================================
    /** * get the next ID (this method is thread-safe) *@return SnowflakeId
     */
    public synchronized long nextId(a) {
        long timestamp = timeGen();
 
        // If the current time is less than the last timestamp generated by the ID, an exception should be thrown when the system clock is rolled back
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }
 
        // If they are generated at the same time, the sequence is performed in milliseconds
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // Sequence overflow in milliseconds
            if (sequence == 0) {
                // block until the next millisecond to get a new timestamptimestamp = tilNextMillis(lastTimestamp); }}// The timestamp changes and the sequence is reset in milliseconds
        else {
            sequence = 0L;
        }
 
        // The last time the ID was generated
        lastTimestamp = timestamp;
 
        // Shift and put together by or to form a 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift)
                | (dataCenterId << dataCenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }
 
    /** * blocks to the next millisecond until a new timestamp * is obtained@paramLastTimestamp Specifies the time when the ID was last generated *@returnCurrent timestamp */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }
 
    /** * returns the current time in milliseconds *@returnCurrent time (ms) */
    protected long timeGen(a) {
        return System.currentTimeMillis();
    }
 
    private static Long getWorkId(a){
        try {
            String hostAddress = Inet4Address.getLocalHost().getHostAddress();
            int[] ints = StringUtils.toCodePoints(hostAddress);
            int sums = 0;
            for(int b : ints){
                sums += b;
            }
            return (long)(sums % 32);
        } catch (UnknownHostException e) {
            // If the fetch fails, use a random number for backup
            return RandomUtils.nextLong(0.31); }}private static Long getDataCenterId(a){
        int[] ints = StringUtils.toCodePoints(SystemUtils.getHostName());
        int sums = 0;
        for (int i: ints) {
            sums += i;
        }
        return (long)(sums % 32);
    }
 
 
    /** * static tool class **@return* /
    public static synchronized Long generateId(a){
        long id = idWorker.nextId();
        return id;
    }
 
    //==============================Test=============================================
    / * * * / test
    public static void main(String[] args) {
        System.out.println(System.currentTimeMillis());
        long startTime = System.nanoTime();
        for (int i = 0; i < 50000; i++) {
            long id = SnowflakeIdWorker.generateId();
            System.out.println(id);
        }
        System.out.println((System.nanoTime()-startTime)/1000000+"ms"); }}Copy the code

Reference: the original blog.csdn.net/xiaopeng927…

Sharing and watching is the biggest encouragement for me. I’m Pub and we’ll see you next time!

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Distributed unique ID solution – Snowflake algorithm

preface

ID generator

Database increment

Distributed ID generator

Solution 1: UUID

Plan 2: Snowflake

Algorithm is introduced

The source code

Distributed unique ID solution – Snowflake algorithm

preface

ID generator

Database increment

Distributed ID generator

Solution 1: UUID

Plan 2: Snowflake

Algorithm is introduced

The source code

Related Posts

Node.js learning Diary (5) : TCP

Using the Node.js plugin to send emails

Interviewer: Tell me about Java reflection