Those familiar with the Paxos algorithm know that it requires proposal IDs to be globally unique and increasing. Generating such IDs is itself a non-trivial problem: there are many ways to produce globally unique IDs, but meeting the combined requirements of high QPS, high availability, and low latency is not easy. As a rookie who has never had the chance to work on a high-concurrency system, I can only learn from what the big companies have shared. This article summarizes the unique ID generation schemes used in production at three large companies, skipping methods that do not meet the criteria above (high QPS, high availability, and low latency). There are surely other advanced methods that I simply do not know about, given my limited knowledge.

1. Twitter’s Snowflake algorithm

Snowflake is a relatively simple and common unique ID generation algorithm; the ID generation scheme inside MongoDB (ObjectId) is very similar in spirit. Snowflake generates 64-bit IDs that fit in a signed long on the JVM. The structure of a generated ID is as follows:

To explain the structure: the first bit is the sign bit. Since the ID is usually represented as a signed long, this bit stays 0 so that the ID is a positive number. Next comes a 41-bit timestamp in milliseconds; this timestamp is usually measured relative to a custom epoch rather than the Unix epoch. The following 10 bits identify the machine: 5 bits for the data center and 5 bits for the machine within the data center. The final 12 bits are a sequence number that distinguishes different IDs generated by the same machine within the same millisecond.
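To make the bit layout concrete, here is a small sketch of my own (not Twitter's code) that packs and unpacks the four fields; the widths follow the description above:

object SnowflakeLayout {
  // Bit widths from the description above: 41 + 5 + 5 + 12 = 63 bits, plus the sign bit.
  val TimestampBits    = 41L
  val DatacenterIdBits = 5L
  val WorkerIdBits     = 5L
  val SequenceBits     = 12L

  // Pack the four fields into one long (timestamp is relative to a custom epoch).
  def pack(timestamp: Long, datacenterId: Long, workerId: Long, sequence: Long): Long =
    (timestamp << (DatacenterIdBits + WorkerIdBits + SequenceBits)) |
      (datacenterId << (WorkerIdBits + SequenceBits)) |
      (workerId << SequenceBits) |
      sequence

  // Unpack, e.g. to recover the embedded timestamp for debugging.
  def unpack(id: Long): (Long, Long, Long, Long) = (
    id >>> (DatacenterIdBits + WorkerIdBits + SequenceBits),
    (id >>> (WorkerIdBits + SequenceBits)) & ((1L << DatacenterIdBits) - 1),
    (id >>> SequenceBits) & ((1L << WorkerIdBits) - 1),
    id & ((1L << SequenceBits) - 1)
  )
}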

Twitter open-sourced the Snowflake algorithm, although the repository no longer appears to be maintained. Below is the IdWorker code, written in Scala; even if you do not know Scala, it should not hinder reading.

/** Copyright 2010-2012 Twitter, Inc. */
package com.twitter.service.snowflake

import com.twitter.ostrich.stats.Stats
import com.twitter.service.snowflake.gen._
import java.util.Random
import com.twitter.logging.Logger

class IdWorker(val workerId: Long, val datacenterId: Long, private val reporter: Reporter, var sequence: Long = 0L)
  extends Snowflake.Iface {

  private[this] def genCounter(agent: String) = {
    Stats.incr("ids_generated")
    Stats.incr("ids_generated_%s".format(agent))
  }
  private[this] val exceptionCounter = Stats.getCounter("exceptions")
  private[this] val log = Logger.get
  private[this] val rand = new Random

  // Custom epoch; timestamps embedded in IDs are relative to this.
  val twepoch = 1288834974657L

  // The number of bits used by the worker ID and the data center ID.
  private[this] val workerIdBits = 5L
  private[this] val datacenterIdBits = 5L
  private[this] val maxWorkerId = -1L ^ (-1L << workerIdBits)
  private[this] val maxDatacenterId = -1L ^ (-1L << datacenterIdBits)
  private[this] val sequenceBits = 12L

  private[this] val workerIdShift = sequenceBits
  private[this] val datacenterIdShift = sequenceBits + workerIdBits
  private[this] val timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits
  private[this] val sequenceMask = -1L ^ (-1L << sequenceBits)

  // The timestamp of the last generated ID.
  private[this] var lastTimestamp = -1L

  // Sanity checks for workerId and datacenterId.
  if (workerId > maxWorkerId || workerId < 0) {
    exceptionCounter.incr(1)
    throw new IllegalArgumentException("worker Id can't be greater than %d or less than 0".format(maxWorkerId))
  }
  if (datacenterId > maxDatacenterId || datacenterId < 0) {
    exceptionCounter.incr(1)
    throw new IllegalArgumentException("datacenter Id can't be greater than %d or less than 0".format(maxDatacenterId))
  }

  log.info("worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d",
    timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId)

  def get_id(useragent: String): Long = {
    if (!validUseragent(useragent)) {
      exceptionCounter.incr(1)
      throw new InvalidUserAgentError
    }
    val id = nextId()
    genCounter(useragent)
    reporter.report(new AuditLogEntry(id, useragent, rand.nextLong))
    id
  }

  def get_worker_id(): Long = workerId
  def get_datacenter_id(): Long = datacenterId
  def get_timestamp() = System.currentTimeMillis

  protected[snowflake] def nextId(): Long = synchronized {
    var timestamp = timeGen()

    // Clock rollback: refuse to generate IDs until the clock catches up.
    if (timestamp < lastTimestamp) {
      exceptionCounter.incr(1)
      log.error("clock is moving backwards. Rejecting requests until %d.", lastTimestamp)
      throw new InvalidSystemClock("Clock moved backwards. Refusing to generate id for %d milliseconds".format(
        lastTimestamp - timestamp))
    }

    if (lastTimestamp == timestamp) {
      // Same millisecond: bump the sequence; on overflow, spin until the next ms.
      sequence = (sequence + 1) & sequenceMask
      if (sequence == 0) {
        timestamp = tilNextMillis(lastTimestamp)
      }
    } else {
      sequence = 0
    }

    lastTimestamp = timestamp
    ((timestamp - twepoch) << timestampLeftShift) |
      (datacenterId << datacenterIdShift) |
      (workerId << workerIdShift) |
      sequence
  }

  protected def tilNextMillis(lastTimestamp: Long): Long = {
    var timestamp = timeGen()
    while (timestamp <= lastTimestamp) {
      timestamp = timeGen()
    }
    timestamp
  }

  protected def timeGen(): Long = System.currentTimeMillis()

  val AgentParser = """([a-zA-Z][a-zA-Z\-0-9]*)""".r

  def validUseragent(useragent: String): Boolean = useragent match {
    case AgentParser(_) => true
    case _ => false
  }
}

As the code shows, the Snowflake algorithm is simple. There are two key points to note in the snippet above. The first is clock rollback: because of NTP adjustments and leap seconds, the system clock can move backwards, and Snowflake responds by simply throwing an exception. The second is overflow of the 12-bit sequence number, which happens when more than 4096 IDs are requested within the same millisecond; Snowflake handles this by spinning until the next millisecond before generating the next ID.

Overflow, however, should be rare in practice. At 4096 IDs per millisecond, the theoretical capacity of a single ID-generating server is about 350 billion IDs per day, and Snowflake supports up to 1024 deployed instances. For comparison, the WeChat team reports that its sequence number generation service handles trillions of calls per day. So as far as QPS is concerned, Snowflake is fine.
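The arithmetic behind the capacity claim is easy to check in the Scala REPL:

// Back-of-the-envelope capacity check for a single Snowflake instance.
val idsPerMs = 1L << 12                      // 4096 IDs per millisecond (12-bit sequence)
val msPerDay = 24L * 60 * 60 * 1000          // 86,400,000 ms per day
val perInstancePerDay = idsPerMs * msPerDay  // 353,894,400,000, i.e. ~350 billion
val maxInstances = 1L << 10                  // 5 + 5 machine bits => 1024 instances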

2. Leaf, Meituan-Dianping’s ID generation service

Meituan-Dianping’s technical team has shared Leaf, its distributed ID generation system. Leaf implements two schemes. The first, called Leaf-snowflake, differs little from plain Twitter Snowflake, except that Leaf-snowflake uses ZooKeeper to assign machine IDs, which improves the system’s scalability and fault tolerance to a certain extent. The architecture of Leaf-snowflake is shown below:

Note that a Leaf-snowflake node is only weakly dependent on ZooKeeper: each node also keeps a local backup of its machine ID. This is understandable, since Snowflake never required a registry in the first place. In my opinion, the biggest benefit of ZooKeeper here is that it makes management and monitoring at the upper layer easier.
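To illustrate the idea (this is my own sketch, not Leaf’s actual code; the znode path and session timeout are made up), a node can obtain its machine ID by creating a persistent sequential znode and reusing the sequence suffix ZooKeeper assigns:

import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

object WorkerIdRegistry {
  // Hypothetical parent path; it is assumed to already exist in ZooKeeper.
  // The assigned ID must also stay within 0..1023 (the 10 machine bits).
  def register(zkConnect: String, nodeKey: String): Long = {
    val zk = new ZooKeeper(zkConnect, 30000, new Watcher {
      override def process(event: WatchedEvent): Unit = () // ignore session events here
    })
    // A persistent sequential znode gets a monotonically increasing 10-digit
    // suffix, which we reuse as this node's workerId.
    val created = zk.create("/leaf/forever/" + nodeKey + "-",
      nodeKey.getBytes("UTF-8"),
      ZooDefs.Ids.OPEN_ACL_UNSAFE,
      CreateMode.PERSISTENT_SEQUENTIAL)
    val workerId = created.substring(created.length - 10).toLong
    // Leaf also caches this ID in a local file, so a node can restart with the
    // same ID even when ZooKeeper is temporarily unavailable (the weak dependency).
    workerId
  }
}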

Now let’s look at Leaf-segment. Its core idea is very simple: fetch IDs in batches. Leaf-segment uses a centralized database to manage allocation, with multiple proxy servers sitting above the ID database. Each proxy server requests a segment of IDs from the database at a time, then serves IDs to external clients from that segment, using it as a local ID pool. The overall architecture of Leaf-segment is as follows:
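As a concrete sketch of the “apply for a segment” step, here is my own illustration assuming a hypothetical leaf_alloc(biz_tag, max_id, step) table, which is not necessarily Leaf’s real schema. Each fetch atomically advances max_id by one step inside a transaction, so concurrent proxies always claim disjoint ranges:

import java.sql.DriverManager

// Minimal sketch of a segment fetch (hypothetical schema, not Leaf's actual code).
// Table: leaf_alloc(biz_tag VARCHAR PRIMARY KEY, max_id BIGINT, step INT)
def fetchSegment(jdbcUrl: String, bizTag: String): (Long, Long) = {
  val conn = DriverManager.getConnection(jdbcUrl)
  try {
    conn.setAutoCommit(false)
    // Atomically claim the next `step` IDs for this business tag.
    val update = conn.prepareStatement(
      "UPDATE leaf_alloc SET max_id = max_id + step WHERE biz_tag = ?")
    update.setString(1, bizTag)
    update.executeUpdate()

    val query = conn.prepareStatement(
      "SELECT max_id, step FROM leaf_alloc WHERE biz_tag = ?")
    query.setString(1, bizTag)
    val rs = query.executeQuery()
    rs.next()
    val maxId = rs.getLong("max_id")
    val step = rs.getLong("step")
    conn.commit()
    (maxId - step, maxId) // the proxy now owns IDs in the range (maxId - step, maxId]
  } finally conn.close()
}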

Looking at the architecture diagram, everything becomes clear. The main bottleneck of this system is that the DB server is a single point of failure. To address this, the Meituan-Dianping team adopts a master-slave database architecture: if the master DB goes down, a slave DB continues to serve, improving availability. The drawback is that master-slave replication via binlog subscription can lag, so after a failover the two databases may be out of sync. On the other hand, using a CP system such as ZooKeeper here might not sustain the required QPS. There is an obvious trade-off.

It is also worth mentioning that the Leaf-segment proxy server uses double buffering to cap the worst-case latency of a request, much like the display double buffering used by windowing systems.
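Sketched below under my own simplified assumptions (reusing fetchSegment from the previous sketch; Leaf’s real logic handles more edge cases): the proxy holds a current segment plus a standby one, and once the current segment is about 90% consumed it preloads the next segment asynchronously, so a client request almost never has to wait on the database:

import scala.concurrent.{ExecutionContext, Future}

// Simplified double-buffer sketch, illustrative only.
class SegmentBuffer(bizTag: String, jdbcUrl: String)(implicit ec: ExecutionContext) {
  private var from = 0L
  private var to = 0L
  private var cursor = 0L
  private var standby: Option[(Long, Long)] = None
  private var loading = false

  locally { val (f, t) = fetchSegment(jdbcUrl, bizTag); from = f; to = t; cursor = f }

  def nextId(): Long = synchronized {
    // Kick off an async preload once ~90% of the current segment is consumed.
    if (standby.isEmpty && !loading && cursor - from >= (to - from) * 9 / 10) {
      loading = true
      Future {
        val seg = fetchSegment(jdbcUrl, bizTag)
        synchronized { standby = Some(seg); loading = false }
      }
    }
    cursor += 1
    if (cursor <= to) cursor
    else standby match {
      case Some((f, t)) =>
        // Current segment exhausted: switch to the preloaded standby buffer.
        from = f; to = t; cursor = f; standby = None
        nextId()
      case None =>
        // Standby not ready yet: fall back to a blocking fetch.
        val (f, t) = fetchSegment(jdbcUrl, bizTag)
        from = f; to = t; cursor = f
        nextId()
    }
  }
}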

3. WeChat seqsvr

WeChat’s seqsvr also generates sequence numbers, but as an IM application WeChat has some particular requirements for ID generation. As I understand it, one defining feature of WeChat is that it is a social network of acquaintances: there is a cap on the number of friends, and friendship is mutual. User data can therefore be horizontally partitioned in a relatively simple way, for example by user ID. The seqsvr architecture has two layers, StoreSvr (the storage layer) and AllocSvr (the caching middle layer). Its overall structure is as follows:

Viewed vertically, AllocSvr hands out sequence numbers directly to clients, while StoreSvr is responsible for reliably storing the current maximum sequence number (via NRW, i.e., quorum replication). From this perspective it is very similar to Meituan’s Leaf-segment. Note that the sequence generated by seqsvr is scoped under a UID (user ID); in other words, UID + sequence = globally unique ID. Note also that seqsvr uses NRW to achieve strongly consistent storage.
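Put differently, uniqueness is per user: sequences of different users may collide, but the (uid, seq) pair never does. A trivial sketch of the contract:

// The globally unique identifier is the pair, not the sequence alone.
final case class GlobalId(uid: Long, seq: Long)

// Per-user contract: for a given uid, successive calls return strictly
// increasing seq values (this monotonicity is what message sync relies on).
trait SeqServer {
  def nextSeq(uid: Long): Long
}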

From a horizontal perspective, seqsvr shards by UID for load balancing. Looking at a single set, AllocSvr can run entirely in memory without persistence. In the worst case, an AllocSvr goes down before some of its allocated IDs are consumed, and after restarting it applies for a new range, so some IDs are skipped. But WeChat’s sequence number is 64 bits wide and scoped under a UID, so skipping some IDs is perfectly acceptable.

The main pressure of seqsvr therefore falls on StoreSvr. Even after set sharding, the NRW storage scheme still pays a price in throughput. Presumably after further experimentation, seqsvr was optimized one step further with shared segment storage. As shown in the figure below, max_seq is the current maximum sequence number, and a group of users shares one max_seq, again sharded by UID. This reduces the pressure on StoreSvr: if every user operates at the same frequency, StoreSvr’s write pressure drops to 1/N of what it was before sharding (where N is the number of users in a group), and the pressure of loading data drops to 1/N as well.
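Here is my own simplified sketch of the shared max_seq idea (illustrative only, not WeChat’s code; persistMaxSeq stands in for the NRW write to StoreSvr). A StoreSvr write happens only when some user’s next sequence would cross the group’s persisted max_seq:

import scala.collection.mutable

// One AllocSection serves a group of users that share a single max_seq.
class AllocSection(step: Long, persistMaxSeq: Long => Unit, private var maxSeq: Long) {
  // cur_seq lives only in memory; after a crash every user's cur_seq restarts
  // from the persisted max_seq, which is exactly where skipped IDs come from.
  private val recoveredFloor = maxSeq
  private val curSeq = mutable.Map.empty[Long, Long] // uid -> cur_seq

  def nextSeq(uid: Long): Long = synchronized {
    val next = curSeq.getOrElse(uid, recoveredFloor) + 1
    // Only when some user crosses the group's max_seq do we touch storage:
    // one persisted bump covers the next `step` allocations of the whole group.
    if (next > maxSeq) {
      maxSeq += step
      persistMaxSeq(maxSeq)
    }
    curSeq(uid) = next
    next
  }
}

Note that sequences only need to be increasing per user, not dense, which is why restarting every cur_seq from the persisted max_seq after a crash is safe, and why one shared max_seq can serve a whole group of users.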

After this analysis, it is clear that the design of WeChat’s seqsvr is excellent; after all, an application this successful must have many masters behind it. seqsvr also includes a number of disaster recovery and migration designs, which I will not cover here.

 

References:

[1] Architecture design and evolution of the WeChat sequence number generator

[2] Leaf: Meituan-Dianping’s distributed ID generation system

[3] Twitter Snowflake