Leaf is a distributed ID generation service launched by Meituan Basic RESEARCH and development platform. Its name is derived from the words of German philosopher and mathematician Leibniz: “There are no two identical leaves in the world.” Leaf has high reliability, low latency, global uniqueness and so on. At present, it has been widely used in meituan finance, Meituan takeout, Meituan hotel and other departments. Specific technical details. Recently, the Leaf project has been opened on Github: github.com/Meituan-Dia… Hope to exchange and build with more technical counterparts.

The Leaf features

Leaf was designed with the following requirements:

  1. Globally unique, the ID will never be duplicated, and the overall ID trend increases.
  2. Highly available, the service is completely based on distributed architecture, even if MySQL is down, it can tolerate a period of database unavailability.
  3. High concurrency and low latency. On CentOS 4C8G VMS, remote QPS can be called up to 5W+, and TP99 can be called within 1ms.
  4. Access is simple and can be directly accessed through the corporate RPC service or HTTP call.

The birth of the Leaf

The first version of Leaf uses the pre-distribution method to generate IDS, that is, N servers can be hung on the DB. When each Server is started, it will go to the DB to get the ID List of fixed length. This allows for a completely distributed architecture and is also efficient because ids are distributed in memory. Next comes the problem of data persistence. Leaf takes a List of ids of fixed length from DB every time, and persists the largest ID, that is, not every ID is persisted, but only the largest ID in a batch. This approach is similar to the periodic archiving feature in the game, except that the archiving is an ID that will be sent to the user at some point in the future, greatly reducing the stress of DB persistence.

The specific processing process of the entire service is as follows:

  • Leaf Server 1: load number segment [1, 1000] from DB.
  • Leaf Server 2: load number segment from DB [1001,2000].
  • Leaf Server 3: load number segment from DB [2001,3000].

The user invokes each Leaf Server service in round-robin mode, so the ID sequence obtained by a Client may be: 1,1001,2001,2,1002,2002… It could be: 1, 2, 100, 2001, 2002, 2003, 3, 4… When a Leaf Server segment is used up, the next request loads a new segment from the DB, ensuring that each segment is incrementally loaded.

The number segment table in Leaf database is as follows:

+-------------+--------------+------+-----+-------------------+-----------------------------+ | Field | Type | Null | Key | Default | Extra | +-------------+--------------+------+-----+-------------------+-----------------------------+ | biz_tag | varchar(128) | NO | PRI | | | | max_id | bigint(20) | NO | | 1 | | | step | int(11) | NO | | NULL | | | desc |  varchar(256) | YES | | NULL | | | update_time | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP | +-------------+--------------+------+-----+-------------------+-----------------------------+Copy the code

SQL statement for Leaf Server load number segment:

Begin
UPDATE table SET max_id=max_id+step WHERE biz_tag=xxx
SELECT tag, max_id, step FROM table WHERE biz_tag=xxx
Commit
Copy the code

On the whole, V1 version is relatively simple to implement, mainly in order to solve the problem of DB pressure on the business layer as soon as possible, and a version of rapid iteration. Therefore, in the production environment, also found some problems. Such as:

  1. Time spikes occur when updating the DB. The maximum system time depends on the time of updating the DB number segment.
  2. When the DB number segment is updated, if the DB is down or a primary/secondary switchover occurs, the service may be unavailable for a period of time.

Leaf dual Buffer optimization

In order to solve these two problems, Leaf adopts the strategy of asynchronous update and double Buffer to ensure that no matter when DB has problems, a Buffer number segment can normally provide external services. As long as DB recovers within a period of Buffer delivery, the availability of the whole Leaf will not be affected.

This version of the code ran smoothly online for about half a year before Leaf ran into a new problem:

  1. The number length is always fixed, if Leaf could have maintained normal operation for 10 minutes in the case of DB unavailable, then if the flow increased by 10 times, it could only maintain normal operation for 1 minute.
  2. If the number segment length is too long, the number segment in the cache is not used up. As a result, the span between the new number segment updated in DB and the number segment ID delivered last time is too large.

Leaf dynamically adjusts Step

Assuming that the service QPS is Q, the number segment length is L, and the number segment update period is T, then Q * T = L. The length of L is fixed at first, so as Q grows, T gets smaller and smaller. But Leaf essentially wants T to be fixed. So if L can be positively correlated with Q, then T can approach a fixed value. Therefore, when Leaf updates the number segment each time, the next number segment length nextStep is determined according to the period T of the last number segment update and the number segment length step:

  • T < 15min, nextStep = step * 2
  • 15min < T < 30min, nextStep = step
  • T > 30min, nextStep = step / 2

So far, the demand of number segment consumption tending to a certain time interval is satisfied. Of course, in the face of instantaneous flow of dozens or hundreds of times of sudden increase, this scheme still can not meet the tolerance of database in a period of time is not available, the system can still run stably. Because in essence, although Leaf does some fault-tolerant schemes in the DB layer, the ID delivery in the number segment mode still needs to rely strongly on DB.

High availability of the MySQL

At the MySQL layer, Leaf currently adopts a semi-synchronous way to synchronize data, and switches master/slave through the company’s DB middleware Zebra and MHA. Switching to MySQL Group Replication will be considered in the future for full strong consistency.

At present, as the strong consistency feature of the company’s database is still evolving, Leaf adopts a temporary solution to ensure data consistency in the case of machine room disconnection:

  • The database is deployed in multiple rooms. Each room has one instance to ensure data synchronization across rooms.
  • The semi-synchronization timeout period is set to infinite to prevent the semi-synchronization mode from degenerate into asynchronous replication.

The Leaf monitoring

For the monitoring of the service itself, Leaf provides a memory data mapping interface on the Web layer to see the delivery status of all number segments in real time. For example, the usage of the double buffer for each number segment and the location to which the current ID is sent can be viewed on the Web interface.

Leaf Snowflake

Snowflake, Twitter’s open-source distributed ID generation algorithm. Based on the 64-bit implementation, the following figure shows the ID composition of the Snowflake algorithm.

  • The first position is 0.
  • Bits 2-42 are relative timestamps, which are generated by subtracting the current timestamp from a fixed historical timestamp.
  • Bits 43 to 52 are the machine ID workerID. The machine ID is different for each Server.
  • Bits 53 through 64 are auto-increment ids.

In this way, completely distributed ID delivery is realized by the combination of time, machine number and self-increasing ID.

In this case, Leaf provides a Java version of the implementation, and makes a weak dependency on Zookeeper to generate machine numbers, so if Zookeeper has problems, the service will not be affected. Leaf caches a workerID file on the native file system after it first retrieves the workerID from Zookeeper. Even if ZooKeeper fails and the machine restarts at the same time, the service can run normally. This reduces the dependency on third-party components and increases SLA to some extent.

The future planning

  • Segment loading optimization: At present, the first request of Leaf after restart will still load MySQL synchronously. The reason why this is done instead of service initialization is that Leaf keys in MySQL are not necessarily loaded by the Leaf service node. If each Leaf node is initializing all Leaf keys, a lot of number segments will be wasted. Therefore, in the future, when the Leaf service is Shutdown, the Leaf Key List used by the service node in the past day will be backed up, so that the number segment in the Key List will be loaded from MySQL in advance after the restart.
  • Monotonic increment: In a simple way, the increment can be guaranteed as long as the SAME Leaf Key obtains ID from a Leaf service node at the same time. Note that the number segment used by the old Leaf service needs to be discarded when the Leaf service node is switched. The routing logic can be implemented by using the master/slave model or routing table configuration for each Leaf Key.

About open source

There are many schemes for distributed ID generation, and the open source version of Leaf provides two ways of ID generation:

  • Number segment mode: low trend growth, less ID number segment waste, can tolerate short periods of MySQL unavailable.
  • Snowflake mode: Fully distributed, ids have semantics.

You can select an ID delivery mode suitable for your own service scenarios. We hope that the plan of Meituan can give you some help, and also hope that you can exchange and build together.

Leaf project Github address: github.com/Meituan-Dia… .

If you have any questions, please leave a message