What is a Snowflake algorithm? What does it have to do with distributed unique ids? Why do I need a distributed ID? Don’t worry, wait for Lord Hong to talk to you.

First of all, what is distributed unique ID? With the development of the Internet and the evolution of sertization, our single service can no longer meet the needs of The Times, and the amount of data in a table in business is increasing, which greatly reduces the efficiency of query and insertion. A single database and a single table can no longer support the existing services. At this time, master/slave synchronization occurs, and write libraries and readers are separated from each other through master/slave synchronization.

But read and write on the table structure, library or the same data, just separated query and write operation pressure, can not fundamentally solve the problem, the table is a large amount of data, then need to depots database table, but depots table there is a problem, every database table theory belong to the same business, They need a unique identifier to represent and locate each piece of data.

How does that work? Data distributed in different databases, different tables, oh my God? ! Can it be done? You might say that the SQL statement will pass through the server before it is executed, and the code logic in the system can ensure that it is unique, for example, in the system to write a method to generate unique code, this seems to be possible, but considering the concurrent situation, two threads at the same time to insert that how to do? ** You would say lock to ensure that only one thread enters at a time. * * yi? This seems to make sense, but it also has limitations, because it can involve the business scenario of sub-database and sub-table, the single application cannot be satisfied, the system must already be a distributed application, and each distributed node has a cluster to ensure its high availability. In such a case, how do you ensure that each system generates a different and unique ID?

At this time, A student raised his hand and Hung Jue asked him to answer. Student A said: Hung Jue, I think it is possible to maintain A center. Every time we need to generate A unique ID, we call the service through RPC to get A unique ID, or the center can be A Redis service and so on. Yi? That seems to work. However, there are also many shortcomings. The first one is that communication needs to be carried out every time, and the consumption of network transmission time is relatively large. The second is that the QPS is too high to be maintained by only one center, and it also needs to ensure that no duplicate ID is generated, so the cost of maintaining such a center alone is high.

** Is there a way for each server to internally generate an ID that is globally unique and has some self-incrementing power? What does ** mean? For example, server A generates A unique Id: 1001 at 10:10 and server B generates A unique Id: 998 at 10:11. This is something we should try to avoid. The unique Id and its insertion time should keep increasing.

What? There is such an algorithm in the world? Is it true? I’m still young. Don’t lie to me!

Can, please believe hong Jue, Hong Jue to tell you one of the implementation is the snowflake algorithm! At the thought of snow, can not help but tears, Lord Hong long so big, has not seen the snow floating in the sky, if it snows in Hangzhou, I hope to go to the West Lake to see the broken bridge remnants of snow… . Ahem, getting off topic, let’s get back to the main point.

Since the snowflake algorithm is one of the distributed unique ID generation strategies, it must have its advantages. Let’s talk about it carefully. After reading this article, we can ensure that we can understand the implementation of the snowflake algorithm, and how to ensure global uniqueness.

Snowflake algorithm is an open source distributed Id generation algorithm of Twitter. The Id generated by the algorithm is a 64-bit integer, and its composition is as follows (the number of bits in each part is not completely fixed and can be modified according to the corresponding business scenarios) :

1. The highest bit is a sign bit. 1 represents a negative number and 0 represents a positive number.

41bit- Timestamp: used to record milliseconds of time (such as the current time point – a fixed time point). The larger the fixed time point minus, the more space there is to store the 41bit. With timestamps to ensure some degree of increments of unique ids, the 41bit size can theoretically be used (2^ 41-1) / 1000/60/60/24/365 is approximately 49 years.

3. 10bit- Machine Id: indicates different machine ids. 10bit can represent 1024 machine nodes.

12bit- Serial number, which indicates the maximum number of ids that can be generated by each machine in the same millisecond. This can happen concurrently. 2^12-1=4095 ids.

When you need to support concurrency value larger scene, can increase the digit serial number, if you need more machine node, occupied by the digits, can improve the machine Id, of course, also want to sacrifice, increased the one part of the digits, another part of the figures would be reduced, in accordance with the corresponding scenarios to consider.

Take a look at the code implementation:

public class SnowFlake {

    /** * component */

    // A fixed timestamp 2017-09-04 15:00:00 can be 0, the specific value can be customized
    private final long fixedTimeStamp = 1504508400L;

    / Id/room
    private long computerRoomId;

    / / machine Id
    private long machineId;

    // Serial numbers 0 to 4095
    private long sequence = 0L;

    /** ** The size of the bit */

    // The timestamp is high so it is the rest of the bit occupied by the rest of the component

    // The number of bits occupied by the equipment room Id
    private final long computerRoomIdBitCnt = 5L;

    // The number of bits occupied by the machine Id
    private final long machineIdBitCnt = 5L;

    // Number of bits occupied by serial number
    private final long sequenceBitCnt = 12L;

    /** * The number of bits that each component needs to move */

    // The number of bits the machine Id needs to move to the left
    private final long machineIdShift = sequenceBitCnt;

    // Number of bits to move to the left
    private final long computerRoomIdShift = machineIdShift + machineIdBitCnt;

    // The number of bits the timestamp needs to shift to the left
    private final long timeStampShift = computerRoomIdShift + computerRoomIdBitCnt;

    /** ** -1 ^ (-1 << x) = 2 ^ x -1 */

    // Supports a maximum of 31 EQUIPMENT room ids
    private final long maxComputerRoomId = -1 ^ (-1 << computerRoomIdBitCnt);

    // The maximum machine Id size is 31
    private final long maxMachineId = -1 ^ (-1 << machineIdBitCnt);

    // Serial number mask
    private final long sequenceMask = -1 ^ (-1 << sequenceBitCnt);

    // Last generated timestamp
    private long lastTimeStamp = -1L;

    The /** * constructor initializes the machine room Id and machine Id@param computerRoomId
     * @param machineId
     */
    public SnowFlake(long computerRoomId, long machineId) {
        if(computerRoomId < 0 || maxComputerRoomId < computerRoomId) {
            throw new IllegalArgumentException("computerRoomId out of range");
        }
        if(machineId < 0 || maxMachineId < machineId) {
            throw new IllegalArgumentException("machineId out of range");
        }

        this.computerRoomId = computerRoomId;
        this.machineId = machineId;
    }

    /** * returns the current time in milliseconds *@return* /
    protected long getCurrentTime(a) {
        return System.currentTimeMillis();
    }

    public synchronized long getNextId(a) {
        long currentTimeStamp = getCurrentTime();

        // Within the same millisecond
        if(currentTimeStamp == lastTimeStamp) {
            sequence = (sequence + 1) & sequenceMask;
            // indicates a cycle waiting for the next millisecond to regenerate into an Id
            if(sequence == 0) { currentTimeStamp = getNextMillis(); }}else {
            // The sequence is not initialized in the same millisecond
            sequence = 0;
        }

        // Maintain the current timestamp
        lastTimeStamp = currentTimeStamp;

        return ((currentTimeStamp - fixedTimeStamp) << timeStampShift)
                | (computerRoomId << computerRoomIdShift)
                | (machineId << machineIdShift)
                | sequence;
    }
    
    protected long getNextMillis(a) {
        long currentTimeStamp = getCurrentTime();
        // If it is the same millisecond, continue to wait
        while(currentTimeStamp <= lastTimeStamp) {
            currentTimeStamp = getCurrentTime();
        }

        return currentTimeStamp;
    }

    public static void main(String[] args) {
        SnowFlake snowFlake = new SnowFlake(1.2); System.out.println(snowFlake.getNextId()); }}Copy the code

The code involved in * * operator &, |, ^, < <, etc., not understanding of children’s shoes, hurry to get to know!!! ** Hongjue here will not do the extension, if you want to hongjue out of a bit operator can be a public number private letter me!

Code in the machine room Id and machine Id how to ensure that certain different problems, there are many ways, such as maintenance of a center, by the center for distribution and distribution and so on.

Ok, I hope this period is helpful to you, don’t forget to knock again!

May everyone read this article with skepticism and explore how it works.

Road obstruction and long, the past for preface, the future for chapter.

Looking forward to our next meeting!