This paper will explore the evolution of consistent Hash algorithms

Common Hash algorithms in clusters – Past lives

Here’s the simplest way to look at routing access in a cluster

First of all, we have three clients here, and we have three servers. The client first accesses a load balancing server. The load server does not directly process services, but forwards requests to one of N internal machines through certain algorithms. This assumes that a simple calculation is used: Machine number = Hash (IP) % Number of nodes The number of nodes is set to 3

Simple Hash algorithms are used in clusters

The access relationship between the requesting machine and the corresponding server is

The client The server
The client 1 Tomcat 2
Client 2 Tomcat 1
The client 3 Tomcat 3

The capacity of a cluster is expanded or scaled down, and the system breaks down

Everything was fine until a machine popped. This assumes that Tomcat3 has diedAnd when that happens, the load balancing server, when it detects that Tomcat3 is down, adjusts its algorithm,Machine ID = Hash (IP) % Number of nodesThe number of nodes is set to 2. This allows the load balancing function to continue, but the mapping between the client and server has changed considerably. The new mapping already looks like this.

The client The server
The client 1 Tomcat 1
Client 2 Tomcat 2
The client 3 Tomcat 1

Hash algorithm problems in the cluster

In the real situation, there will be many servers, so the impact will be great, in the case of capacity reduction and expansion, there will be the same problem. The original user’s session in the server is lost.

Consistent Hash algorithms – this world

A problem that needs to be solved at present is: how to minimize the number of users affected by machine expansion, machine capacity reduction and machine downtime

What is a consistent Hash algorithm?

  • Hash ring

First, there is a line, which starts and ends with 1 and 2 to the 32 minus 1 respectively, which is equivalent to an address. For such a line, it bends to form a loop to form a closed loop. Such a loop is called a hash ring.

  • Using hash ring

We hash the IP or host name of the server to the hash ring, and then for the client user, we hash the IP to some bit on the ring

  • Determine the client routing

How do you determine which server to route a client to? Locate the nearest server node in the clockwise directionFor example, when the user’s Ip address is hashed and the value is between 0 and node 1, the user searches for the nearest node clockwise, which is node 1

Consistent Hash algorithm for machine expansion

At this time, we added a server. After the IP Hash, it fell between 2 and 3 nodes and became node 5. In this case, only users whose Ip Hash value is between 3-5 and 5-2 will be affected. All other users will not be affected at all.

Machine sizing for consistent Hash algorithms

When our machine 3 jumped, the original browser Ip Hash value was between 2-3 and 3-4. Go directly to the first machine clockwise, node 4. In this case, the machine access addresses between other nodes are not affected

Virtual nodes for consistent Hash algorithms

Data skew problem

However, when there are too few service nodes, it is easy to cause the data skew problem because the nodes are not evenly distributed. For example, if there are only two servers in the system, the loop distribution is as follows, node 2 can only handle a very small segment, and a large number of client requests fall on node 1. This is the data (request) skew problem

Virtual node

In order to solve the problem of data skew, the consistent hashing algorithm introduces the virtual node mechanism, that is, multiple hashes are calculated for each service node, and one service node is placed in each computed result position, which is called the virtual node.

This can be done by adding a number after the server IP address or host name. For example, three virtual nodes can be computed for each server, so the Hash value for each machine can be computed separately

  • “Ip# 1 of node 1”
  • “Ip# 2 of node 1”
  • “Ip# 3 for node 1”
  • “Ip# 1 of node 2”
  • “Ip# 2 of node 2”
  • “Ip# 3 for node 2”

So six virtual nodes are formed, and the six virtual nodes are distributed more evenly across the ring. When the client is routed to the virtual node, the system finds the real node corresponding to the virtual node. In this way, each browser IP address can be evenly distributed to two real servers.

Mock write consistent Hash algorithm

Common Hash implementation scheme

public class GeneralHash {

    public static void main(String[] args) {
        // Define the client IP address
        String[] clients = new String[]{"10.78.12.3"."113.25.63.1"."126.12.3.8"};

        // Define number of servers
        int serverCount = 5;// (number corresponding to 0,1,2)

        // hash(ip)%node_counts=index
        // Lock the Tomcat server that should be routed to based on index
        for(String client: clients) {
            int hash = Math.abs(client.hashCode());
            int index = hash%serverCount;
            System.out.println("Client:" + client + "Is routed to the server number:"+ index); }}}Copy the code

Output result:

Client: 10.78.12.3 The number of the server to which the route is routed is: 4 Client: 113.25.63.1 The number of the server to which the route is routed is: 0 Client: 126.12.3.8 The number of the server to which the route is routed is: 1Copy the code

No virtual nodes for consistent Hash

Analysis of ideas:

  1. During initialization, an ordered Map is used to store node IPHash values and corresponding IP values

  2. Evaluate the IPHash value for the client

  3. The value is evaluated on the Hash ring based on the IP Hash value

    3.1 The Hash value of IP is not in the middle of the map -> Take the first value on the ring

    3.2 The Hash value of the IP address is in the middle of the map


public class ConsistentHashNoVirtual {

    public static void main(String[] args) {
        //step1 initialize: map the hash value of the server node IP to the hash ring
        // Define the server IP address
        String[] tomcatServers = new String[]{"123.111.0.0"."123.101.3.1"."111.20.35.2"."123.98.26.3"};

        // We are using a sort Map here
        SortedMap<Integer,String> hashServerMap = new TreeMap<>();

        for(String tomcatServer: tomcatServers) {
            // Compute the hash value of each IP address, and then apply the hash ring to store the mapping between the hash value and the IP address
            int serverHash = Math.abs(tomcatServer.hashCode());
            // Stores the mapping between hash values and IP addresses
            hashServerMap.put(serverHash,tomcatServer);
        }

        //step2 obtain the hash value for the client IP address
        // Define the client IP address
        String[] clients = new String[]{"10.78.12.3"."113.25.63.1"."126.12.3.8"};

        for(String client : clients) {
            int clientHash = Math.abs(client.hashCode());

            //step3 for the client, find the server that can handle the current client request (clockwise on the hash ring)
            // Use the client IP hash to find out which server node can handle ()

            // The best thing about this is the use of tailMap, which values up
            SortedMap<Integer, String> integerStringSortedMap = hashServerMap.tailMap(clientHash);
 
            if(integerStringSortedMap.isEmpty()) {
                // Take the first server clockwise on the hash ring
                Integer firstKey = hashServerMap.firstKey();
                System.out.println("==========>>>> client: + client + "Routed to server:" + hashServerMap.get(firstKey));
            }else{
                Integer firstKey = integerStringSortedMap.firstKey();
                System.out.println("==========>>>> client: + client + "Routed to server:"+ hashServerMap.get(firstKey)); }}}}Copy the code

Output result:

= = = = = = = = = = > > > > client: 10.78.12.3 be routed to the server: 111.20.35.2 = = = = = = = = = = > > > > client: 113.25.63.1 be routed to the server: 123.98.26.3 = = = = = = = = = = > > > > the client: 126.12.3.8 Routed to the server: 111.20.35.2Copy the code

Consistent Hash using virtual nodes

In the case of virtual nodes, we only need to generate virtual nodes by adding #+ number after IP for each node during initialization

for(String tomcatServer: Int serverHash = math.abs (tomcatServer.hashCode()); Hashservermap. put(serverHash,tomcatServer); For (int I = 0; i < virtaulCount; i++) { int virtualHash = Math.abs((tomcatServer + "#" + i).hashCode()); Hashservermap. put(virtualHash,"---- request mapped from virtual node "+ I + ": "+ tomcatServer); }}Copy the code

The complete code is as follows:

public class ConsistentHashWithVirtual {

    public static void main(String[] args) {
        //step1 initialize: map the hash value of the server node IP to the hash ring
        // Define the server IP address
        String[] tomcatServers = new String[]{"123.111.0.0"."123.101.3.1"."111.20.35.2"."123.98.26.3"};

        SortedMap<Integer,String> hashServerMap = new TreeMap<>();


        // Define several virtual nodes for each real server
        int virtaulCount = 3;


        for(String tomcatServer: tomcatServers) {
            // Compute the hash value of each IP address, and then apply the hash ring to store the mapping between the hash value and the IP address
            int serverHash = Math.abs(tomcatServer.hashCode());
            // Stores the mapping between hash values and IP addresses
            hashServerMap.put(serverHash,tomcatServer);

            // Process virtual nodes
            for(int i = 0; i < virtaulCount; i++) {
                int virtualHash = Math.abs((tomcatServer + "#" + i).hashCode());
                hashServerMap.put(virtualHash,"---- by virtual node"+ i  + "Mapped request:"+ tomcatServer); }}//step2 obtain the hash value for the client IP address
        // Define the client IP address
        String[] clients = new String[]{"10.78.12.3"."113.25.63.1"."126.12.3.8"};
        for(String client : clients) {
            int clientHash = Math.abs(client.hashCode());
            //step3 for the client, find the server that can handle the current client request (clockwise on the hash ring)
            // Use the client IP hash to find out which server node can handle ()
            SortedMap<Integer, String> integerStringSortedMap = hashServerMap.tailMap(clientHash);
            if(integerStringSortedMap.isEmpty()) {
                // Take the first server clockwise on the hash ring
                Integer firstKey = hashServerMap.firstKey();
                System.out.println("==========>>>> client: + client + "Routed to server:" + hashServerMap.get(firstKey));
            }else{
                Integer firstKey = integerStringSortedMap.firstKey();
                System.out.println("==========>>>> client: + client + "Routed to server:"+ hashServerMap.get(firstKey)); }}}}Copy the code

Output result:

= = = = = = = = = = > > > > client: 10.78.12.3 be routed to the server: 111.20.35.2 = = = = = = = = = = > > > > client: 113.25.63.1 be routed to the server: - 2 by the virtual node mapping to request: 111.20.35.2 = = = = = = = = = = > > > > client: 126.12.3.8 be routed to the server: - 0 by the virtual node mapping request, 123.101.3.1Copy the code

Nginx uses a consistent Hash algorithm

Ngx_http_upstream_consistent_hash module

The ngx_HTTP_upstream_consistent_hash module is a load balancer that uses an internal consistent hash algorithm for selection

Suitable rear node.

The module can map requests evenly to back-end machines in different ways based on configuration parameters,

  • Consistent_hash $remote_ADDR: mapping by client IP address

  • Consistent_hash $request_URI: specifies the URI mapping based on the client request

  • Consistent_hash $args: map according to the parameters carried by the client

Separate installation

The ngx_HTTP_upstream_consistent_hash module is a third-party module that needs to be downloaded and installed

  1. Github Download nginx consistent Hash load balancing module github.com/replay/ngx_…

  1. Upload the downloaded package to the nginx server and decompress it

  2. We have compiled and installed nginx, now go to the source directory of nginx at that time and execute the following command

/configure -add-module =/root/ngx_http_consistent_hash-master make make installCopy the code
  1. Configure load balancing policies in the nginx.conf file