Golang implements Twitter snowFlake algorithm to efficiently generate globally unique ids

Recently, I started to prepare an H5 game, because this is my first time to contact the game category, even though the volume is not large, I want to do it well. When DESIGNING the table structure, I thought of the problem of table global unique ID. Since it is a game, it must be multiplayer online dot dot (ideal operation state hahaha). Initially, I wanted to use mongoDB’s objectId as a globally unique ID, but string indexing is not as efficient as integer indexing

The main difference between the two is that character types have the concept of a character set, and there is a character set encoding process from the storage side to the presentation side. The main cost of this process is CPU resources, which is a significant cost for in-memory operations. Using integer substitution can reduce CPU computation and memory and IO overhead.

So in the end, for optimal efficiency and visuals (integers), consider a purely integer id alternative and stumble upon Twitter’s snowFlake algorithm

This post is mostly borrowed from the web and put together to help you and your audience better understand how snowFlake works

SnowFlake Algorithm

Snowflake ID algorithm is a unique ID generation algorithm used by Twitter. In order to meet twitter’s request of tens of thousands of messages per second, each message has a unique and sequential ID, and supports distributed generation.

The principle of

In fact, it is very simple, just need to understand: a machine with a separate identity (assign a separate ID to the machine) generates ids with different numbers in 1 millisecond so the generated ID is sequential and unique

Constitute a

Here is a direct reference to predecessors, just to give you a more clear explanation

The structure of the Snowflake ID is a 64 bit int.

1 bit: The highest 1 bit in binary is negative, but all ids we need should be integers, so the highest bit here should be 0
41 bit behind: used to record id is generated at the time of the millisecond time stamp, milliseconds here only to represent a positive integer is an integer (computer contains 0), so it can be said of numerical value range is 0 to 2 ^ 41-1 (many will fan confused here why – 1, remember, starting from 0 in the computer rather than 1)
The next 10 bits: used to record the ID of the working machine 2^10 = 1024, so the current rule allows the maximum number of distributed nodes to be 1024 nodes. We can specify the number of workers and the id sequence number that can be generated in 1 millisecond for each machine according to business requirements
The last 12 bits are used to represent the id generated by a single machine every millisecond. The largest positive integer that can be represented by the 12-bit bit is 2^ 12-1 = 4096. 4095 The 4096 numbers are used to indicate the number of ids generated by the machine in one millisecond.

Finally, the above four bits are spliced together to form a 64-bit bit

implementation

Here we use Golang to implement snowFlake. First of all, we define the basic constants of snowFlake. The users of each constant are explained in detail through comments

// The number of nodes in each cluster is as const (numberBits uint8 = 12) The number of nodes can be up to 2^10=1024, that is, 2^12-1=4096 unique ids can be generated every millisecond Corresponding to the penultimate paragraph in the figure above // where the maximum value is achieved using bit operation, the binary of -1 is represented as the complement of 1, WorkerMax int64 = -1 ^ (-1 << workerBits) NumberMax int64 = -1 ^ (-1 << numberBits) TimeShift uint8 = workerBits + numberBits // If you start the system on January 1, 2010 and don't subtract the January 1, 2010 timestamp, you will waste 40 years of the timestamp. Epoch int64 = 1525705533000; epoch int64 = 1525705533000; epoch epoch int64 = 1525705533000Copy the code

The time stamps start at workerBits + numberBits (22) from right to left. The time stamps start at workerBits + numberBits (22). You can just count it and it’s easy to understand the same thing with workerShift

Worker Work node

Since it is a distributed ID generation algorithm, we need to generate multiple workers, so the basic parameters required by a Woker work node are abstracted here

Type Worker struct {mu sync.Mutex // Add a Mutex to ensure concurrent security timestamp int64 // Record the last time when the id was generated Int64 // Node ID number int64 // Serial number of ids generated in the current millisecond (starting from 0) a maximum of 4096 ids can be generated in 1 millisecond}Copy the code

Instantiate the work node

Since we are in a distributed situation, we should assign a separate ID to each machine through an external configuration file or other means

Func NewWorker(workerId int64) (*Worker, Error) {/ / need to detect whether workerId within defined above the if workerId < 0 | | workerId > workerMax {return nil, New("Worker ID excess of quantity")} return Worker{timestamp: 0, workerId: workerId, number: 0, }, nil }Copy the code

You can use Redis to generate unique ids for each machine in a distributed environment

This part is not included in the algorithm

To generate the id

// The build method must be mounted under some woker, Func (w *Worker) GetId() int64 {// The most important point to get the ID add lock add lock w.mlock () defer w.mllock () // Remember when the build is complete Now := time.now ().unixnano () / 1e6 // if w.timstamp == now {w.mumber ++ // Specifies whether the current working node generates numberMax ids in 1 ms. If w.number > numberMax {// If the number of ids generated by the current working node exceeds the upper limit in 1 ms, wait 1 ms before generating ids. For now <= w.timstamp {now = time.now ().unixnano () / 1e6}}} else {// If the current time is different from the time when the ID was last generated on the working node, reset the ID of the working node Here's a code that you can see a lot of predecessors outside of if, The value is reassigned regardless of whether the timestamp of the last id generated by the node is the same as the current time. This will add a little extra overhead, so I choose to place w.timstamp = now // Update the last id generated by the machine to the current time} id := int64((now - epoch) << timeShift | (w.workerId << workerShift) | (w.number)) return ID }Copy the code

Introduction to a lot of new friends may see the final ID: = XXXXX | XXXXXX < < < < XXX xx | XXXXX mentally Here is the place for each part of the bit to and through a bitwise or operation (this is the ‘|’) will use a diagram to explain its integration

I think you will be very clear after watching it

What about a paragraph where you might not have enough digits to start with? Don’t worry binary space will automatically fill 0!

In view of the “|” to explain

The two numbers involved in the operation, after conversion to binary (0, 1), or operation. As long as there is a 1 in the corresponding bit, the bit is taken as 1, which is not 1, that is, 0

Also read the graph is very clear (Baidu will not say I steal the graph ah T.T)

Test

Let’s use Golang’s test package to test the code we just generated

Package snowFlakeByGo import ("testing" "FMT") func TestSnowFlakeByGo(t * test.t) { err := NewWorker(1) if err ! Println(err) return} ch := make(chan int64) count := 10000 // Make goroutine count for snowflake ID generation for I : = 0; i < count; i++ { go func() { id := worker.GetId() ch <- id }() } defer close(ch) m := make(map[int64]int) for i := 0; i < count; I++ {id := < -ch // if there is a key with id in the map, the generated snowflake id is duplicate _, ok := m[id] if ok {t. ror(" id is not unique! Println("All", count, "snowflake ") {// add id to map m[id] = I} "snowflake ID Get successed!" )}Copy the code

The results of

I tested it with a 17-version 13-inch macbook Pro (not Touch Bar)

wbyMacBook-Pro:snowFlakeByGo xxx$ go test All 10000 snowflake ID Get successed! PASS ok github.com/holdno/snowFlakeByGo 0.031 sCopy the code

It takes 0.031 seconds to generate 10,000 ids concurrently. If you can run on a distributed server, it is estimated to be faster

This article is combined with the network content and some of their own small optimization organized into the last attached github address: github.com/holdno/sno…. You can give me a star if you think it’s useful