Preface

MySQL solutions

Distributed global unique ID problem

Can auto-increment integers still be used?

In a single database, auto-increment integers are used as the primary key because they take little space, are fast to look up, consume little IO, and avoid page splitting. So can we continue to use them in a distributed database?

In theory, yes: an unsigned BIGINT can still be used, since its upper limit is large enough. There are two ways to do it:

  • Set the step size: MySQL itself supports setting the starting value auto_increment_offset and the step size auto_increment_increment. However, this method does not suit further horizontal splitting, because every expansion requires resetting the step size.
  • Partition by range: for example, specify that a table can hold at most 30 million rows, and suppose there are two horizontal sub-tables, t1 and t2. The auto-increment ID range of t1 is then [0, 29999999], that of t2 is [30000000, 59999999], and so on.
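As a rough illustration of the step-size approach, the Java sketch below simulates the ID sequences each node would hand out when the step equals the number of nodes and the offset is the node's position. The class and method names are hypothetical; in practice MySQL generates these values itself from auto_increment_offset and auto_increment_increment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mimics the offset/step rule:
// offset, offset + step, offset + 2 * step, ...
final class StepSequence {
    // Returns the first `count` IDs a node with this offset and step would generate.
    static List<Long> firstIds(long offset, long step, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(offset + (long) i * step);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two nodes: step = 2, offsets 1 and 2, so the sequences never collide.
        System.out.println(firstIds(1, 2, 4)); // [1, 3, 5, 7]
        System.out.println(firstIds(2, 2, 4)); // [2, 4, 6, 8]
        // Adding a third node would require changing the step to 3 on every node.
    }
}
```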

With range partitioning, the system can be expanded without moving old data. Load balancing can spread inserts across the sub-tables (t1, t2, and so on), which removes the bottleneck of centralized writes. For queries, the right sub-table can be located from the ID range, which removes the bottleneck of centralized reads. Once a database has reached its limit, remove it from write load balancing.
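The range lookup described above can be sketched as follows. This is only an illustration under assumptions: the class name, the t1/t2 naming scheme, and the fixed capacity of 30 million rows per sub-table are taken from the example or invented for the sketch.

```java
// Hypothetical router for range-partitioned sub-tables.
final class RangeRouter {
    // Assumed capacity: each sub-table holds 30 million IDs.
    static final long ROWS_PER_TABLE = 30_000_000L;

    // Maps an auto-increment ID to a zero-based sub-table index.
    static int tableIndexFor(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return Math.toIntExact(id / ROWS_PER_TABLE);
    }

    // Builds the sub-table name: index 0 is "t1", index 1 is "t2", ...
    static String tableNameFor(long id) {
        return "t" + (tableIndexFor(id) + 1);
    }

    public static void main(String[] args) {
        System.out.println(tableNameFor(29_999_999L)); // t1
        System.out.println(tableNameFor(30_000_000L)); // t2
    }
}
```

Because the mapping is pure arithmetic on the ID, a query by primary key touches exactly one sub-table.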

Conclusion

Yes. With range partitioning, you need to know the ID range of each table, have the load balancer keep writes within that range, and stop routing writes to a table once it reaches its upper limit. This is not complicated to implement.

Snowflake algorithm

A Snowflake ID is 64 bits in total, so it can be stored as a BIGINT.

  • The highest bit is fixed at 0, so the ID is always a positive integer.
  • The next 41 bits hold a millisecond-precision timestamp.
  • The next 10 bits identify the machine: 5 bits for the dataCenterId and 5 bits for the workerId.
  • The last 12 bits hold the sequence number, which starts at 0.
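To make the layout concrete, here is a minimal Java sketch that packs and unpacks the four fields. It is not Hutool's or any other library's implementation. The epoch constant is an arbitrary assumption, and placing dataCenterId above workerId follows the commonly used layout rather than anything stated here.

```java
// Minimal sketch of the 64-bit layout described above; not a production generator.
final class SnowflakeLayout {
    static final long EPOCH_MS = 1_600_000_000_000L; // assumed custom epoch, for illustration only
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;         // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    // Rejects values that do not fit in the given number of bits.
    private static void check(long value, int bits, String name) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(name + " out of range");
        }
    }

    // Packs the four fields into one long; the range checks keep the sign bit at 0.
    static long compose(long timestampMs, long dataCenterId, long workerId, long sequence) {
        long elapsed = timestampMs - EPOCH_MS;
        check(elapsed, TIMESTAMP_BITS, "timestamp");
        check(dataCenterId, DATACENTER_BITS, "dataCenterId");
        check(workerId, WORKER_BITS, "workerId");
        check(sequence, SEQUENCE_BITS, "sequence");
        return (elapsed << TIMESTAMP_SHIFT)
                | (dataCenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampOf(long id) { return (id >>> TIMESTAMP_SHIFT) + EPOCH_MS; }
    static long dataCenterOf(long id) { return (id >>> DATACENTER_SHIFT) & ((1L << DATACENTER_BITS) - 1); }
    static long workerOf(long id) { return (id >>> WORKER_SHIFT) & ((1L << WORKER_BITS) - 1); }
    static long sequenceOf(long id) { return id & ((1L << SEQUENCE_BITS) - 1); }

    public static void main(String[] args) {
        long id = compose(System.currentTimeMillis(), 3, 7, 42);
        System.out.println(id + " -> dataCenter=" + dataCenterOf(id)
                + ", worker=" + workerOf(id) + ", sequence=" + sequenceOf(id));
    }
}
```

Since the timestamp occupies the highest bits, IDs generated later compare greater than earlier ones, which is why they behave much like auto-increment keys.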

Conclusion

As can be seen, the generated IDs keep increasing even when concurrency is low, so Snowflake achieves an effect similar to auto-increment, and the IDs are simple to use. However, to keep data evenly distributed across tables, writes still need to be load balanced. The data volume of each table must also be monitored, and another horizontal split is needed once a table reaches its limit.

In implementation complexity and end result, this looks similar to the auto-increment integer ID.

Generating IDs in Java with a third-party library

<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.7.20</version>
</dependency>

Note that dataCenterId and workerId need to be assigned.

System.out.println(new SnowflakeGenerator(0, 0).next());

Customize specific rules for generating globally unique IDs

Both schemes above share an obvious shortcoming: a query that filters on any other condition has to scan every table. Multiple threads can query the tables in parallel and merge the results before returning, but blindly querying every table inevitably creates a performance bottleneck (the cost of the multithreaded queries plus the query load on every database). To address this, the rule for generating globally unique IDs can be customized to fit the business, as in Meituan's sharding of orders across databases and tables.
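The parallel scan mentioned above can be sketched as a scatter-gather over every sub-table. The class name and the queryOneTable function are hypothetical stand-ins for a real per-table query. The point is only that every table is hit on every request, whatever the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical scatter-gather helper: query all sub-tables in parallel, then merge.
final class ScatterGather {
    static <T> List<T> queryAll(List<String> tables,
                                Function<String, List<T>> queryOneTable,
                                ExecutorService pool) {
        // Scatter: submit one task per sub-table.
        List<CompletableFuture<List<T>>> futures = tables.stream()
                .map(table -> CompletableFuture.supplyAsync(() -> queryOneTable.apply(table), pool))
                .collect(Collectors.toList());
        // Gather: wait for every task and merge the partial results in table order.
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The lambda fakes a query by returning one row per table.
            List<String> rows = queryAll(Arrays.asList("t1", "t2"),
                    table -> Arrays.asList(table + "-row"), pool);
            System.out.println(rows); // [t1-row, t2-row]
        } finally {
            pool.shutdown();
        }
    }
}
```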

Meituan's ID generation rule is: timestamp + user ID + random number. The target sub-table is chosen by computing on the user ID and then taking a modulus.
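A minimal sketch of such a rule might look like the following. It is not Meituan's actual implementation. The field widths, the use of the last four digits of the user ID, the shard count, and all names are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical "timestamp + user ID + random number" generator; field widths are assumptions.
final class BusinessIdGenerator {
    static final int SHARD_COUNT = 8;         // assumed number of sub-tables
    static final long USER_MODULUS = 10_000L; // keep the last 4 digits of the user ID

    // Layout: <timestamp in millis><4-digit user suffix><2-digit random number>.
    // Assumes timestampMs and userId are non-negative.
    static String nextId(long timestampMs, long userId) {
        long userSuffix = userId % USER_MODULUS;
        int random = ThreadLocalRandom.current().nextInt(100);
        return String.format(Locale.ROOT, "%d%04d%02d", timestampMs, userSuffix, random);
    }

    // Shard chosen from the user ID alone, used when querying by user.
    static int shardForUser(long userId) {
        return (int) ((userId % USER_MODULUS) % SHARD_COUNT);
    }

    // Shard recovered from the ID itself, used when querying by ID.
    static int shardForId(String id) {
        String userSuffix = id.substring(id.length() - 6, id.length() - 2);
        return (int) (Long.parseLong(userSuffix) % SHARD_COUNT);
    }

    public static void main(String[] args) {
        long userId = 123_456_789L;
        String id = nextId(System.currentTimeMillis(), userId);
        System.out.println(id + " -> shard " + shardForId(id)
                + " (by user: " + shardForUser(userId) + ")");
    }
}
```

Because the user suffix is embedded in the ID, a lookup by ID and a lookup by user ID both resolve to the same sub-table without scanning the others.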

Advantages

Most queries filter by the primary business identifier (such as the user ID above), so this optimization covers many business scenarios.

Disadvantages

However, it cannot cover every scenario. For example, all sub-tables still need to be scanned when orders are queried by shopID. In addition, the generated ID is somewhat longer, which adversely affects IO, retrieval, and so on. As for page splitting, because the ID starts with a timestamp, it does not occur when concurrency is low. Another problem is that, in extreme cases, the data can be distributed very unevenly.

Conclusion

From the analysis above, auto-increment integers and Snowflake give similar results: both take little space, are fast to look up, consume little IO, and avoid page splitting. However, neither is optimized for the business, so the author believes that most scenarios should choose the third option.