Consistent hashing is a modification of classical hashing.

Classical hashing uses a hash function to generate a pseudorandom number, which is then taken modulo the size of the memory space, converting a random identifier into a location within the available space:

location = hash(key) mod size

In classical hashing, we assume that the number of memory locations is known and will never change.

However, there is a problem with this model: when the size changes, the locations of many keys change with it, which is a disaster for mass data storage.
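To see how disruptive a resize is, here is a small self-contained sketch (the class and method names are my own, not from any library): it counts how many of 10,000 keys land in a different slot when the slot count grows from 5 to 6.

```java
import java.util.stream.IntStream;

public class ClassicRemapDemo {

    // Fraction of keys whose slot changes when the table is resized.
    static double movedFraction(int keys, int oldSize, int newSize) {
        long moved = IntStream.range(0, keys)
                .filter(k -> {
                    int h = String.valueOf(k).hashCode();
                    return Math.floorMod(h, oldSize) != Math.floorMod(h, newSize);
                })
                .count();
        return (double) moved / keys;
    }

    public static void main(String[] args) {
        // Resizing from 5 to 6 slots relocates the vast majority of keys.
        System.out.printf("moved: %.1f%%%n", movedFraction(10_000, 5, 6) * 100);
    }
}
```

Since 5 and 6 are coprime, roughly five keys out of six move: almost the entire data set would have to be relocated.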

How can we minimize the number of keys that change location when the size changes?

This is where consistent hashing comes in.

What is consistent hashing?

Consistent hashing uses a virtual ring structure (called a hash ring, or HashRing). Both nodes (hardware) and requests (data) are run through a hash function and then taken modulo 2^32, which places them at a position on the ring in the range 0 to 2^32 - 1.

Let's contrast this with classical hashing:

```
classicHash    = hash(key) mod size
consistentHash = hash(key) mod 2^32
```

Change 1: size is replaced by the constant 2^32 (a constant that is large enough).

Change 2: the single position determined by the classical result becomes a clockwise range on the ring. Resizing a classical hash table disturbs essentially every mapping, while consistent hashing does not; that is where the "consistency" in the name comes from.

For example, if we add node E between B and C, and E lands between KEY3 and KEY4, only the ownership of KEY3 needs to change.

Likewise, to delete node D, we only need to change the ownership of KEY5 to A.
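The clockwise-range behavior described above can be sketched with a sorted map (a toy illustration with names of my own, not any library's actual implementation): each key belongs to the first node at or after its position, so adding or removing one node only disturbs a single arc of the ring.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class HashRing {
    // node position -> node name, sorted clockwise around the ring
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Crude integer mixing so positions spread over the ring;
    // real implementations use a proper hash such as MurmurHash.
    private static int pos(String s) { return s.hashCode() * 0x9E3779B9; }

    void addNode(String node)    { ring.put(pos(node), node); }
    void removeNode(String node) { ring.remove(pos(node)); }

    // Walk clockwise: the first node at or after the key's position owns it,
    // wrapping around to the first node when we fall off the end of the map.
    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(pos(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Removing a node this way only reassigns the keys that node owned, mirroring the KEY5 example above: every other key keeps its owner.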

Characteristics of consistent hashing

1. Balance: the results of hashing should be distributed across all buffers as evenly as possible, so that all of the buffer space is utilized. Many hash algorithms satisfy this condition.

As I understand it, this means every request lands somewhere on the hash ring.

2. Monotonicity: if some data has already been hashed to a machine and a new machine is then added to the system, the hashing result should guarantee that previously allocated content maps either to its original machine or to the new one, never to a different machine from the old set. In other words, existing data either stays where it is or migrates to the new machine.

As I understand it, when node E is added between B and C, the data between B and C either stays with C or moves to the newly added node E; it cannot be reassigned to any of the other original nodes (such as A, B, or D).

Consistent hash skew solution: virtual nodes

With the consistent hash algorithm, the situation in the figure above can occur: the data is not evenly distributed across nodes, i.e. hash skew. How do we solve this?

The answer is virtual nodes: each real physical node (A, B, C, D in the figure above) is mapped to several virtual nodes (A1, A2, B1, B2, C1, C2, D1, D2), which are not real physical machines.

A key that lands on B1 or B2 belongs to B. This reduces the severity of hash skew, although it cannot eliminate it entirely. The figure below shows the much-improved distribution after adding virtual nodes to the ring above.
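A minimal sketch of the virtual-node idea (the names and the CRC32 stand-in hash are my own choices, not from any library): each physical node is inserted under several derived labels, and a lookup resolves the owning virtual node back to its physical node.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class VirtualNodeRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // position -> physical node
    private final int replicas; // virtual nodes per physical node

    VirtualNodeRing(int replicas) { this.replicas = replicas; }

    private static long hash(String s) {
        CRC32 crc = new CRC32(); // stand-in hash; real systems use MurmurHash or similar
        crc.update(s.getBytes());
        return crc.getValue();
    }

    void addNode(String node) {
        // "node#0", "node#1", ... spread each physical node around the ring.
        for (int i = 0; i < replicas; i++) ring.put(hash(node + "#" + i), node);
    }

    String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

The more virtual nodes per physical node, the more evenly the keys tend to spread, at the cost of a larger ring.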

The application of consistent hashing

The Sharding mechanism of the Redis client Jedis fragments data on the client side by means of consistent hashing

Load balancing algorithm for RPC requests

All distributed storage shards

Summary: requests are allocated to hardware, and stored data is allocated to hardware.

In general, consistent hashing fits any scenario where the set of allocation targets can change and that change should have as little global impact as possible (if changing one target invalidated all existing mappings, a great deal of rebalancing work would follow).

Applying consistent hashing with Guava

Guava encapsulates the consistent hashing operation in Hashing.consistentHash.

Pass in a hash value and the bucket count, and it computes which bucket the key belongs to; after adding or removing buckets, call it again with the new count to see where the key lands.
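Note that Hashing.consistentHash does not build an explicit ring: it maps a hash value plus a bucket count directly to a bucket. One published algorithm with exactly this interface is jump consistent hash (Lamport and Veach); the sketch below implements that algorithm. I am not claiming this is Guava's exact internal code, only that it demonstrates the key property: when the bucket count grows from n to n + 1, each key either stays put or jumps to the new bucket n.

```java
public class JumpHash {
    // Jump consistent hash: maps a 64-bit key to one of `buckets` buckets.
    static int bucket(long key, int buckets) {
        long b = -1, j = 0;
        while (j < buckets) {
            b = j;
            key = key * 2862933555777941757L + 1; // linear congruential step
            // Jump ahead to the next bucket count at which this key would move.
            j = (long) (((b + 1) * (double) (1L << 31)) / ((key >>> 33) + 1));
        }
        return (int) b;
    }
}
```

Because a key only ever moves into the newest bucket as the count grows, growing from n to n + 1 buckets relocates on average only 1/(n + 1) of the keys.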

```java
@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList(
            "server1", "server2", "server3", "server4", "server5",
            "server11", "server21", "server31", "server41", "server51",
            "server31", "server32", "server33", "server34", "server35",
            "server51", "server52", "server53", "server54", "server55");
    List<String> userIds = Arrays.asList("zxp123549440", "zxp", "2222", "asdhaksdhsakjdsa", "!@#$%^");

    System.out.println("============= there are " + servers.size() + " servers");
    userIds.forEach(userId -> {
        int bucket = Hashing.consistentHash(
                Hashing.murmur3_128().hashString(userId, Charset.forName("utf8")), servers.size());
        System.out.println("userId:" + userId + " bucket:" + bucket);
    });

    // Take three servers offline
    servers.remove(17);
    servers.remove(11);
    servers.remove(10);
    System.out.println("============= there are " + servers.size() + " servers after 3 offline");
    userIds.forEach(userId -> {
        int bucket = Hashing.consistentHash(
                Hashing.murmur3_128().hashString(userId, Charset.forName("utf8")), servers.size());
        System.out.println("userId:" + userId + " bucket:" + bucket);
    });
}

@Test
public void testClassicHash() {
    List<String> servers = Lists.newArrayList(
            "server1", "server2", "server3", "server4", "server5",
            "server11", "server21", "server31", "server41", "server51",
            "server31", "server32", "server33", "server34", "server35",
            "server51", "server52", "server53", "server54", "server55");
    List<String> userIds = Arrays.asList("zxp123549440", "zxp", "2222", "asdhaksdhsakjdsa", "!@#$%^");

    System.out.println("============= there are " + servers.size() + " servers");
    userIds.forEach(userId -> {
        long bucket = Math.abs(Hashing.murmur3_128()
                .hashString(userId, Charset.forName("utf8")).padToLong()) % servers.size();
        System.out.println("userId:" + userId + " bucket:" + bucket);
    });

    // Take three servers offline
    servers.remove(17);
    servers.remove(11);
    servers.remove(10);
    System.out.println("============= there are " + servers.size() + " servers after 3 offline");
    userIds.forEach(userId -> {
        long bucket = Math.abs(Hashing.murmur3_128()
                .hashString(userId, Charset.forName("utf8")).padToLong()) % servers.size();
        System.out.println("userId:" + userId + " bucket:" + bucket);
    });
}
```
ConsistentHash output for the five ids:

```
============= there are 20 servers
userId:zxp123549440 bucket:3
userId:zxp bucket:3
userId:2222 bucket:10
userId:asdhaksdhsakjdsa bucket:18
userId:!@#$%^ bucket:0
============= there are 17 servers after 3 offline
userId:zxp123549440 bucket:3
userId:zxp bucket:3
userId:2222 bucket:10
userId:asdhaksdhsakjdsa bucket:0
userId:!@#$%^ bucket:0
```

ClassicHash output, where every key's bucket changes:

```
============= there are 20 servers
userId:zxp123549440 bucket:17
userId:zxp bucket:9
userId:2222 bucket:2
userId:asdhaksdhsakjdsa bucket:14
userId:!@#$%^ bucket:18
============= there are 17 servers after 3 offline
userId:zxp123549440 bucket:2
userId:zxp bucket:4
userId:2222 bucket:10
userId:asdhaksdhsakjdsa bucket:9
userId:!@#$%^ bucket:1
```

If we switch the scene to Jedis sharding and contrast consistent-hash sharding with classic-hash sharding:

After receiving a set operation, Sharding takes the key, computes its hash value with the Murmur algorithm, obtains the number of available "cluster" nodes from Redis, and then uses Hashing.consistentHash to calculate which bucket the key should go into. So what is the advantage of Sharding using a consistent hash? If a Redis instance drops out, most keys can still be found in their original bucket; with the classic hash, most keys would have moved to a different Redis instance, which is a far worse outcome than the consistent-hash result.
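To put a number on "most keys can still be found", here is a self-contained sketch (my own toy ring with a CRC32 stand-in hash, not Jedis's actual implementation) that drops one of five instances and counts how many of 10,000 keys change owner under each scheme.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class ShardMoveDemo {
    static long h(String s) {
        CRC32 crc = new CRC32(); // stand-in hash; Jedis actually uses MurmurHash
        crc.update(s.getBytes());
        return crc.getValue();
    }

    // Build a ring of `servers` nodes with 100 virtual nodes each, optionally skipping one.
    static TreeMap<Long, String> buildRing(int servers, int skip) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int s = 0; s < servers; s++) {
            if (s == skip) continue;
            for (int v = 0; v < 100; v++) ring.put(h("redis" + s + "#" + v), "redis" + s);
        }
        return ring;
    }

    static String owner(TreeMap<Long, String> ring, String key) {
        SortedMap<Long, String> tail = ring.tailMap(h(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Returns {keys moved under ring hashing, keys moved under mod hashing}
    // when one of `servers` instances goes down.
    static int[] movedCounts(int keys, int servers) {
        TreeMap<Long, String> before = buildRing(servers, -1);
        TreeMap<Long, String> after = buildRing(servers, servers - 1); // last instance drops out
        int ringMoved = 0, modMoved = 0;
        for (int k = 0; k < keys; k++) {
            String key = "key" + k;
            if (!owner(before, key).equals(owner(after, key))) ringMoved++;
            int hk = (int) h(key);
            if (Math.floorMod(hk, servers) != Math.floorMod(hk, servers - 1)) modMoved++;
        }
        return new int[]{ringMoved, modMoved};
    }

    public static void main(String[] args) {
        int[] moved = movedCounts(10_000, 5);
        System.out.println("consistent hash moved: " + moved[0] + " / 10000");
        System.out.println("classic mod moved:     " + moved[1] + " / 10000");
    }
}
```

On a typical run, the ring moves roughly the 20% of keys that belonged to the lost instance, while the mod-based scheme moves around 80% of all keys.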