
This article is contributed by the author "Uncle Who Knows How to Code"

➤ Let’s look at an example

We often use Redis as a cache, putting some data in it to reduce the load on the database.

When the data volume is small and access pressure is light, a single Redis instance is usually enough; for high availability, a master-slave setup will do.

As the data volume and concurrency grow, it becomes difficult to hold all the cached data on a single machine, since a single machine's resources are limited.

Usually we set up a cluster so that data is distributed as evenly as possible across the Redis instances. Suppose, for example, our cluster has 4 Redis instances.

So how do we spread the data as evenly as possible across the four Redis instances? The simplest approach is the modulo algorithm:

hash(key) % N, where N is the number of Redis instances; here N = 4.

This looks pretty good: the data is spread evenly across the four instances, and when a new request comes in we can compute exactly which Redis instance holds the data, so the cached value can be looked up precisely.
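The modulo scheme can be sketched in a few lines of Python. This is only an illustration: MD5 is used here as a stable example hash (Python's built-in hash() is randomized between runs), and the key names are made up.

```python
import hashlib

def shard_for(key: str, n: int) -> int:
    """Map a key to one of n Redis instances using hash(key) % n."""
    # MD5 gives a stable, well-mixed integer for any interpreter run.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n

# Hypothetical keys; with n = 4 each key maps to exactly one instance.
for key in ("user:1001", "user:1002", "order:42"):
    print(key, "-> redis-%d" % shard_for(key, 4))
```

Because the mapping is deterministic, every client that runs the same function locates the same instance for a given key.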

But suppose 4 Redis instances are no longer enough and we need to add 4 more. The algorithm then becomes hash(key) % 8.

As you can imagine, most of the existing cached data would now map to the wrong instance, which in extreme cases can cause a cache avalanche.
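We can measure the damage directly. The sketch below (reusing the hypothetical shard_for from before) counts how many of 10,000 synthetic keys land on a different instance after going from % 4 to % 8; for this particular resize, about half the keys move.

```python
import hashlib

def shard_for(key: str, n: int) -> int:
    """Map a key to one of n instances using hash(key) % n."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n

# Synthetic keys purely for the experiment.
keys = ["key:%d" % i for i in range(10000)]

# A key "moves" if its shard under N=8 differs from its shard under N=4.
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 8))
print("moved: %.1f%% of keys" % (100.0 * moved / len(keys)))
```

Every moved key is a guaranteed cache miss right after the resize, which is exactly the avalanche risk described above.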

The consistent hashing algorithm solves this problem well. Its general process looks like this:

Take 0 as the starting point and 2^32 - 1 as the ending point, and draw a line; overlap the start and end so the line becomes a circle, with values increasing clockwise. The first point to the right of 0 is 1, then 2, and so on.

Hash the IP address (or some other identifier) of each of the three servers and take the result modulo 2^32; each server then falls somewhere on the circle, denoted Node1, Node2, Node3.

Then do the same for a data key, which will also fall somewhere on the circle. Walking clockwise from that point, the first Node you reach is the server where the key is stored.
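The clockwise lookup can be sketched with a sorted list and binary search. This is a minimal illustration, not a production implementation: MD5 truncated to 32 bits stands in for "hash modulo 2^32", and the node names follow the article's Node1..Node3.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Hash a string onto the 0 .. 2^32 - 1 ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # 32 bits = modulo 2^32

class ConsistentHashRing:
    def __init__(self, nodes):
        # Each node's position on the ring, kept sorted for binary search.
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        h = ring_hash(key)
        # Walk clockwise: the first node position >= the key's position,
        # wrapping around to the start of the ring if we pass the end.
        idx = bisect.bisect_left(self._points, (h, ""))
        if idx == len(self._points):
            idx = 0
        return self._points[idx][1]

ring = ConsistentHashRing(["Node1", "Node2", "Node3"])
print("user:1001 is stored on", ring.node_for("user:1001"))
```

Note that adding or removing a node only shifts the boundary of one arc, which is why most keys keep their assignment.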

If a server is added or removed, only part of the data is affected.

However, if there are too few nodes, or the nodes are unevenly distributed on the circle, data skew easily occurs: most of the data ends up concentrated on one server.

To solve the data-skew problem, consistent hashing introduces virtual nodes: each physical node is hashed multiple times and placed at several different locations on the circle.
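Virtual nodes are a small extension of the ring sketch above: hash each physical node many times under distinct labels and record which physical node each ring position belongs to. The labels ("Node1#0", "Node1#1", ...) and the replica count of 200 are arbitrary choices for this illustration.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    """Hash a string onto the 0 .. 2^32 - 1 ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class VirtualNodeRing:
    def __init__(self, nodes, replicas=200):
        # Each physical node is hashed `replicas` times under distinct
        # labels, scattering it over many points on the circle.
        self._points = sorted(
            (ring_hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(replicas)
        )

    def node_for(self, key: str) -> str:
        h = ring_hash(key)
        idx = bisect.bisect_left(self._points, (h, ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]

# With virtual nodes, keys spread far more evenly across the 3 servers.
ring = VirtualNodeRing(["Node1", "Node2", "Node3"])
counts = Counter(ring.node_for("key:%d" % i) for i in range(30000))
print(counts)
```

With 200 virtual nodes per server, each of the three servers receives close to a third of the 30,000 test keys, instead of the lopsided split a bare 3-point ring can produce.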

Of course, we can also see that consistent hashing only keeps most of the data in place; a portion of the keys must still be remapped whenever nodes change.

END
