Send you the following Java learning materials, at the end of the article there is a way to receive







1. Why use distributed ID?

Before we talk about the implementation of distributed ID, let’s briefly analyze why we use distributed ID. What characteristics should a distributed ID satisfy?

1. What is distributed ID?

Take the MySQL database as an example:

When the amount of our business data is not large, a single library and single table can fully support the existing business, and a MySQL master and slave synchronous read and write separation can also be dealt with when the data is larger.

However, with the increasing data, the synchronization between master and slave can not be carried out, so it is necessary to divide database into tables. However, after the database is divided into tables, a unique ID is needed to identify a piece of data, and the self-increasing ID of the database obviously cannot meet the requirements. Special points such as orders, coupons also need to have a unique ID to do identification. At this point, a system that can generate globally unique IDs is essential. Then this globally unique ID is called a distributed ID.

2. What conditions should distributed ID meet?
  • Globally unique: You must ensure that IDs are globally unique, a basic requirement
  • High performance: high availability, low latency, ID generation response to block, otherwise it will become a business bottleneck
  • High Availability: 100% availability is deceptive, but it’s infinitely close to 100% availability
  • Good access: The design principle should be used out of use and the design and implementation of the system should be as simple as possible
  • Trend increasing: the best trend increasing, this requirement must see specific business scene, generally not strict requirements

II. What are the generation methods of distributed ID?

Today, we will analyze the following 9 ways of distributed ID generator and their advantages and disadvantages:

  • UUID
  • The database auto-increment ID
  • Database multi-master schema
  • Them roughly mode
  • Redis
  • Snowflake algorithm
  • Didi Produced (Tinyid)
  • Baidu (UIDGenerator)
  • Meituan (Leaf)

So how do they all work? And what are their strengths and weaknesses? Let’s go down

Pictures from the Internet

The above pictures from the network, if there is infringement contact delete

1. Based on UUID

In the Java world, the first thing that comes to mind when you want to get a unique ID is the UUID, which is, after all, unique in the world. So can a UUID be a distributed ID? The answer is yes, but not recommended!



public static void main(String\[\] args) {   
       String uuid = UUID.randomUUID().toString().replaceAll("-","");  
       System.out.println(uuid);  
 }  


UUID generated simple to only one line of code, the output c2b8c2b9e46c47e3b30dca3b0d447718, but UUID does not apply to the actual needs of the business. Strings such as the UUID used for the order number are meaningless and contain no useful information about the order; However, as a business primary key ID for a database, it is not only too long or a string, but also time-consuming to store poor performance queries, so it is not recommended to use as a distributed ID.

Advantages:

  • The generation is simple enough, the local generation has no network consumption, and has the uniqueness

Disadvantages:

  • An unordered string that does not have the trend increment property
  • No specific business implications
  • As long as the length of a 16-byte, 128-bit, 36-bit string, storage and query will cost MySQL a lot of performance, MySQL officials clearly recommend that the primary key should be as short as possible, as the database primary keyUUIDThe disorder of, can cause frequent changes in data location, which can seriously affect performance.
2, based on the database to increase the ID

MySQL > create an auto_increment table that can be used as a distributed ID. MySQL > create an auto_increment table that can be used as a distributed ID.



CREATE DATABASE \`SEQ\_ID\`;  
CREATE TABLE SEQID.SEQUENCE\_ID (  
    id bigint(20) unsigned NOT NULL auto\_increment,   
    value char(10) NOT NULL default '',  
    PRIMARY KEY (id),  
) ENGINE=MyISAM;  



``````


insert into SEQUENCE\_ID(value)  VALUES ('values');  


When we need an ID, insert a record into the table to return the primary key ID, but this approach has a relatively fatal disadvantage, MySQL itself is the bottleneck of the system when the traffic surge, using it to implement distributed services is relatively risky, not recommended!

Advantages:

  • Simple implementation, ID monotonous self – increase, numerical type query speed is fast

Disadvantages:

  • DB single point has the risk of downtime and cannot withstand high concurrency scenarios
3. Based on the database cluster mode

As mentioned above, the single point database approach is not desirable, so do some high availability optimization on the above approach, switch to the master-slave mode cluster. For fear that one of the primary nodes will die and become unusable, do a dual-primary cluster, where both MySQL instances can independently produce auto-increment IDs.

If you have two instances of MySQL that start at 1, you will generate duplicate IDs. If you have two instances that start at 1, you will generate duplicate IDs.

Solution: Set the starting value and increase the step size

MySQL \ _1 configuration:

set @@auto\_increment\_offset = 1; SET @@AUTO \ _INCREMENT \ _INCREMENT = 2; - step

MySQL \ _2 configuration:

set @@auto\_increment\_offset = 2; SET @@AUTO \ _INCREMENT \ _INCREMENT = 2; - step

MySQL > select * from mysql. MySQL > select * from mysql. MySQL >

One, three, five, seven, nine


Two, four, six, eight, ten

What if the performance of the cluster is not high and concurrency? MySQL will be expanded to add nodes, which is a more troublesome matter.

Insert the picture description here

As can be seen from the figure above, the horizontal scaling of the database cluster is beneficial to solve the problem of single point of pressure on the database. At the same time, for the ID generation feature, the self-increasing step size is set according to the number of machines.

MySQL > add a third MySQL instance and set the starting position of the third MySQL instance at a distance from the current maximum increment of ID, but must not grow to the starting value of the third MySQL instance. Otherwise, the autoincrement ID will be repeated, and if necessary, it may also need to stop for modification.

Advantages:

  • Solve the DB single point problem

Disadvantages:

  • It is not conducive to subsequent capacity expansion, and in fact, the pressure of a single database itself is still large, which still cannot meet the high concurrency scenario.
4. Database based segment pattern

Segment pattern is one of the mainstream implementation methods of distributed ID generator at present. Segment pattern can be understood as the batch acquisition of autoincrement ID from the database. Each time, a segment range is fetched from the database, such as (1,1000) represents 1000 IDs. The table structure is as follows:

CREATE TABLE id\_generator (id int(10) NOT NULL, Max \_id bigint(20) NOT NULL Step int(20) NOT NULL COMMENT 'biz\_type ', biz\_type int(20) NOT NULL COMMENT ', Version int(20) NOT NULL COMMENT 'version number ', PRIMARY KEY (\' id\ '))

BIZ \ _TYPE: Represents different business types

Max \_id: The largest currently available ID

Step: Represents the length of the paragraph

Version: is an optimistic lock that updates version each time to ensure the correctness of data during concurrency

id

biz\_type

max\_id

step

version

1

101

1000

2000

0

SQL > update max_id= max_id + step; SQL > update max_id= max_id + step; SQL > update max_id= max_id + step The new segment range is (max_id,max_id +step).



update id\_generator set max\_id = #{max\_id+step}, version = version + 1 where version = # {version} and biz\_type = XXX  


Since multiple business terminals may operate at the same time, optimistic version locking is adopted to update. This distributed ID generation method is not strongly dependent on the database and will not frequently access the database, which puts much less pressure on the database.

5. Based on the Redis model

Redis can also achieve the same principle is to use Redis incr command to achieve atomic ID increment.

127.0.1:6379 \> set seq\_id 1 // add 1 OK 127.0.0.1:6379\> incr seq\_id // add 1 and return an integer 2

One caveat to implementing Redis is that Redis persistence is taken into account. Redis has two types of persistence: RDB and AOF

  • RDBWill periodically take a snapshot for persistence, if the continuous incrementredisRedis failed to be persisted in a timely way, and when Redis is restarted, the ID duplicate will occur.
  • AOFEach write command is persisted, even ifRedisIf you hang up, you won’t have a duplicate ID, but because of the special nature of the incr command, it willRedisThe data recovery time is too long.
6. Based on the Snowflake pattern

Snowflake algorithm is an ID generating algorithm adopted by Twitter’s internal distributed project. After being open source, it has been widely praised by domestic major companies. Under the influence of this algorithm, each major company has developed a distributed generator with its own characteristics.

Insert the picture description here

The above pictures from the network, if there is infringement contact delete

Snowflake generates an ID of type Long. One Long has 8 bytes and each byte has 8 bits. That means one Long has 64 bits.

Snowflake ID consists of a Long type consisting of a positive number (1 bit) + a timestamp (41 bits) + a machine ID (5 bits) + a data center (5 bits) + a self-increment (12 bits).

  • The first bit (1bit) : The highest bit of a long in Java is the sign bit for positive or negative numbers. Positive numbers are 0 and negative numbers are 1. The default is 0.
  • Time stamp part (41bit) : milliseconds time, it is not recommended to save the current timestamp, but (current timestamp – fixed start timestamp) difference value, can make the generated ID from a smaller value; The 41-bit timestamp can be used for 69 years, (1L << 41)/(1000L * 60 * 60 * 24 * 365) = 69 years
  • Work machine ID (10bit) : also known asworkId, this can be configured flexibly, the machine room or machine number combination can be.
  • Serial number part (12bit), self increment support in the same millisecond the same node can generate 4096 ID

According to the logic of the algorithm, this algorithm only needs to be implemented in the Java language, encapsulation method as a tool, so the business application can directly use the tool method to get distributed ids, only need to ensure that each business application has its own working machine ID can, without the need for a separate to set up an access to the application of distributed ids.

Java version of theSnowflakeAlgorithm implementation:

'/**' '* Twitter's SnowFlake algorithm, using the SnowFlake algorithm to generate an integer, And then converted into 62 base into a short URL address ` ` * ` ` * https://github.com/beyondfengyu/SnowFlake ` ` * / ` ` public class SnowFlakeShortUrl {` ` / * * ` ` * */ ' 'private final static long START_TIMESTAMP = 1480166465631L; */ ' 'private final static long Sequence_bit = 12; Private final static long bit = 5; private final static long bit = 5; Private final static long DATA_CENTER_BIT = 5; private final static long DATA_CENTER_BIT = 5; / ' 'private final static long MAX_SEQUENCE = -1l ^ (-1l << SEQUENCE_BIT); /' 'private final static long MAX_SEQUENCE = -1l ^ (-1l << SEQUENCE_BIT); ` `private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT); ` `private final static long MAX_DATA_CENTER_NUM = -1L ^ (-1L << DATA_CENTER_BIT); */ ' 'private final static long MACHINE_LEFT = Sequence_bit; ` `private final static long DATA_CENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT; ` `private final static long TIMESTAMP_LEFT = DATA_CENTER_LEFT + DATA_CENTER_BIT; ` `private long dataCenterId; // private Long MachineID; // private long sequence = 0L; // 'private long lastTimeStamp = -1L; Private Long GetNextMill () {' 'Long Mill = GetNewTimeStamp (); ` `while (mill <= lastTimeStamp) {` `mill = getNewTimeStamp(); ` `}` `return mill; ` `}` `private long getNewTimeStamp() {` `return System.currentTimeMillis(); ' '} ' ' '/**' '** *' * generates the specified serial number ' ' '*' '*' '* @param dataCenterId' ' '* @param machineId' */ 'public SnowFlakeShortUrl(long dataCenterId, long machineId) {` `if (dataCenterId > MAX_DATA_CENTER_NUM || dataCenterId < 0) {` `throw new IllegalArgumentException(" DtacEnterid can't be greater than MAX_DATA_CENTER_NUM or less than 0!" ); ` `}` `if (machineId > MAX_MACHINE_NUM || machineId < 0) {` `throw new IllegalArgumentException("MachineId can't be Greater than MAX_MACHINE_NUM or less than 0!" ); ` `}` `this.dataCenterId = dataCenterId; ` `this.machineId = machineId; ` `} ` ` / * * ` ` * produced under an ID ` ` * ` ` * @ return ` ` * / ` ` public synchronized long nextId () {` ` long currTimeStamp = getNewTimeStamp(); ` `if (currTimeStamp < lastTimeStamp) {` `throw new RuntimeException("Clock moved backwards. Refusing to generate id"); ' '} ' 'if (currTimeStamp == lastTimeStamp) {'' '= sequence = (sequence + 1) &max_sequence; ' 'if (sequence == 0L) {'' currTimeStamp = getNextMill(); ' ' '} ' ' '} else {' ' ' ' '= 0; ` `}` `lastTimeStamp = currTimeStamp; ` ` return (currTimeStamp - START_TIMESTAMP) < < TIMESTAMP_LEFT / / part timestamp ` ` | dataCenterId < < DATA_CENTER_LEFT / / data center part ` ` | MachineId < < MACHINE_LEFT / / machine identifier portion ` ` | sequence; } ' 'public static void main(String[] args) {'' SnowFlakeShortUrl snowFlake = new SnowFlakeShortUrl(2, 3); ` `for (int i = 0; i < (1 << 4); I++) {` ` / / decimal ` ` System. Out. The println (snowFlake. NextId ()); ` `} ` `} ` ` `}
7, Baidu (uid-generator)

Uid – the generator is developed by baidu technology, project making address https://github.com/baidu/uid-…

UID-Generator is implemented based on the Snowflake algorithm. Unlike the original Snowflake algorithm, the UID-Generator supports customizing the number of bits of each part, such as timestamp, worker ID, and sequence number. In addition, the UID-Generator adopts a user-defined WorkID generation strategy.

UID-generator needs to be used in conjunction with the database, and a new WORKER_NODE table needs to be created. When the application is started, it will insert a data item into the database table. After successful insertion, the returned autoincrement ID is the workId of the machine, which consists of host and port.

For uid-generator ID composition structure:

WorkID, which takes up 22 bits, time takes up 28 bits, and serialization takes up 13 bits. Note that, unlike the original snowflake, time is measured in seconds, not milliseconds, and the WorkID is different, and the same application consumes a WorkID every time it is restarted.

reference


https://github.com/baidu/uid-…\_cn.md

8. Meituan (Leaf)

The Leaf, which is developed by Meituan making address: https://github.com/Meituan-Di…

Leaf supports both segment mode and snowflake algorithm mode, which can be switched between.

Them roughly mode

The guide into the source https://github.com/Meituan-Di… And create a table leaf_alloc

DROP TABLE IF EXISTS \`leaf\_alloc\`; CREATE TABLE \ 'leaf\_alloc\' (\ 'biz\_tag\' varchar(128) NOT NULL DEFAULT '' COMMENT ', \ 'Max \_id\' bigint(20) NOT NULL DEFAULT '1' COMMENT ', \ 'step\' int(11) NOT NULL COMMENT ', Also dynamically adjust the minimum step size ', \ 'description\' varchar(256) DEFAULT NULL COMMENT ', \ 'update\_time\' timestamp NOT NULL DEFAULT CURRENT\_TIMESTAMP ON update CURRENT\_TIMESTAMP COMMENT ', PRIMARY KEY (\`biz\_tag\`) ) ENGINE=InnoDB;

Then turn on the segment mode in the project, configure the corresponding database information, and turn off the snowflake mode



leaf.name=com.sankuai.leaf.opensource.test  
leaf.segment.enable=true  
leaf.jdbc.url=jdbc:mysql://localhost:3306/leaf\_test?useUnicode=true&characterEncoding=utf8&characterSetResults=utf8  
leaf.jdbc.username=root  
leaf.jdbc.password=root  
  
leaf.snowflake.enable=false  
#leaf.snowflake.zk.address=  
#leaf.snowflake.port=  


Start the LeafServerApplication project of the Leaf-Server module and it will run

Them roughly model for distributed on the ID test url: http://localhost:8080/api/segment/get/leaf-segment-test

Monitor them roughly mode: http://localhost:8080/cache

Snowflake mode

Leaf’s snowflake pattern relies on ZooKeeper, which is different from the original snowflake algorithm and mainly generates workID. In Leaf, workID is generated based on the order ID of ZooKeeper. When each application uses Leaf-snowflake, ZooKeeper generates a sequential ID at startup, which is equivalent to a sequential node for each machine, namely a WorkID.

Leaf. Snowflake. Enable = true leaf. The snowflake. Zk. Address = 127.0.0.1 leaf. The snowflake. Port = 2181

Snowflake model for distributed on the ID test url: http://localhost:8080/api/snowflake/get/test

9. Didi (Tinyid)

Tinyid developed by drops, making the address: https://github.com/didi/tinyid.

TinyID is based on the same principle as Leaf, each service gets a segment (1000,2000), (2000,3000), (3000,4000)

Insert the picture description here

TinyID provides both HTTP and TinyID-Client access

HTTP Access

(1) import Tinyid source code:

git clone https://github.com/didi/tinyi…

(2) create data table;

CREATE TABLE \ 'tiny\_id\_info\' (\ 'id\' bigint(20) unsigned NOT NULL AUTO\_INCREMENT COMMENT ', \ 'biz\_type\' varchar(63) NOT NULL DEFAULT '' COMMENT ', Unique ', \ 'begin\_id\' bigint(20) NOT NULL DEFAULT '0' COMMENT 'start id \ 'Max \_id ', \' Max \_id\ 'bigint(20) NOT NULL DEFAULT '0' COMMENT ', \ 'step\' int(11) DEFAULT '0' COMMENT 'step ', \' delta\ 'int(11) NOT NULL DEFAULT '1' COMMENT ', \ 'remainder\' int(11) NOT NULL DEFAULT '0' COMMENT 'remainder ', \ 'create\_time\' timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'create time ', \ 'update\_time\' timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'update' _time\ 'timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT' \ 'version\' bigint(20) NOT NULL DEFAULT '0' COMMENT ', PRIMARY KEY (\ 'id\'), Unique KEY \ 'uniq\_biz\_type\' (\ 'biz\_type\')) ENGINE=InnoDB AUTO\_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT 'id '; CREATE TABLE \ 'tiny\_id\_token\' (\ 'id\' int(11) unsigned NOT NULL AUTO\_INCREMENT COMMENT ', \`token\` varchar(255) NOT NULL DEFAULT '' COMMENT 'token', \ 'biz\_type\' varchar(63) NOT NULL DEFAULT '' COMMENT 'identifier of business type accessed by this token ', \ 'remark\' varchar(255) NOT NULL DEFAULT 'COMMENT ', \ 'create\_time\' timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'create time ', \ 'update\_time\' timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'update' _time\ 'timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT' PRIMARY KEY (\ 'id\')) ENGINE=InnoDB AUTO\_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT 'token information table '; INSERT INTO \`tiny\_id\_info\` (\`id\`, \`biz\_type\`, \`begin\_id\`, \`max\_id\`, \`step\`, \`delta\`, \`remainder\`, \`create\_time\`, \`update\_time\`, \`version\`) VALUES (1, 'test', 1, 1, 100000, 1, 0, '2018-07-21 23:52:58', 'the 2018-07-22 23:19:27', 1); INSERT INTO \`tiny\_id\_info\` (\`id\`, \`biz\_type\`, \`begin\_id\`, \`max\_id\`, \`step\`, \`delta\`, \`remainder\`, \`create\_time\`, \`update\_time\`, \`version\`) VALUES (2, 'test\_odd', 1, 1, 100000, 2, 1, '2018-07-21 23:52:58', 'the 2018-07-23 00:39:24, 3); INSERT INTO \`tiny\_id\_token\` (\`id\`, \`token\`, \`biz\_type\`, \`remark\`, \`create\_time\`, \`update\_time\`) VALUES (1, '0f673adf80504e2eaa552f5d791b644c', 'test', '1', '2017-12-14 16:36:46', 'the 16:36:48 2017-12-14'); INSERT INTO \`tiny\_id\_token\` (\`id\`, \`token\`, \`biz\_type\`, \`remark\`, \`create\_time\`, \`update\_time\`) VALUES (2, '0f673adf80504e2eaa552f5d791b644c', 'test\_odd', '1', '2017-12-14 16:36:46', 'the 16:36:48 2017-12-14');

(3) Configure database:

datasource.tinyid.names=primary datasource.tinyid.primary.driver-class\-name\=com.mysql.jdbc.Driver datasource.tinyid.primary.url=jdbc:mysql://ip:port/databaseName?autoReconnect=true&useUnicode=true&characterEncoding=UTF -8 datasource.tinyid.primary.username=root datasource.tinyid.primary.password=123456

(4) Test after starting tinyid-server

Access to distributed on the ID: http://localhost:9999/tinyid/id/nextIdSimple? BizType = = 0 f673adf80504e2eaa552f5d791b644c 'test&token return results: 3 batch get distributed on the ID: http://localhost:9999/tinyid/id/nextIdSimple? BizType = test&token = 0 f673adf80504e2eaa552f5d791b644c & batchSize = 10 ',5,6,7,8,9,10,11,12,13 return results: 4
Java client access

Repeat the HTTP (2) (3) operations

Introduction of depend on



       <dependency>  
            <groupId\>com.xiaoju.uemc.tinyid</groupId\>  
            <artifactId>tinyid-client</artifactId>  
            <version>${tinyid.version}</version>  
        </dependency\>  


The configuration file



tinyid.server =localhost:9999  
tinyid.token =0f673adf80504e2eaa552f5d791b644c  


Test, tinyid.token are pre-inserted data in the database table, test is the specific business type, tinyid.token represents the accessible business type

Long ID = tinyid. nextId(" test "); List< Long > ids = tinyid. nextId(" test ", 10);

conclusion

This article is just a brief introduction to each of the distributed ID generators, aiming to give you a detailed study direction. Each generation method has its own advantages and disadvantages, and how to use it depends on the specific business requirements.