Manually implement the consistent Hash algorithm

The well-known consistent Hash algorithm is a high quality solution to solve the poor system scalability in distributed data layout or distributed environment. This paper aims to implement a set of this algorithm manually using Java language.

The background,

The simplest application scenario is cache. When the amount of cache in a single machine is too large, the cache needs to be divided into different libraries, and then hash modulus calculation is performed on the specified machine based on the relevant information, for example, index = hash(IP) % N.

However, when adding or subtracting nodes, the N value of the above formula changes, so most or even all of the cache will be invalidated. The most straightforward solution for this scenario is to use a consistent hash algorithm.

2. Introduction to consistent Hash algorithms

1. Simple consistency Hash

The simple abstract description of a consistent Hash algorithm is a circle with 2^32 nodes evenly arranged on it, such as [0,1,2,4,8…]. , and then hash our machine nodes on the ring, using hash(IP) or hash(domain name) rules.

When looking for data, just hash the key of the target data on the ring and read the data clockwise from the nearest machine node.

In the simple version of the following figure, if there are three data nodes (A, B and C) in total, when the key of the data to be searched is calculated between A and B, node B can be found by clockwise search.

The biggest advantage is that, with the case above as the background, when node B breaks down, C is found clockwise. In this way, the scope of influence is only between A and B, and other data is not affected.

2. Virtual nodes

However, when the data nodes are hashed, the compactness will be affected by the hash algorithm. For example, the data servers A, B, and C will be hashed on nodes 1, 2, and 4 after the hash calculation, which will lose the balance due to the density. ** For example, after the hash operation of the key of the data we want to search, there is A high probability that the key will appear between 4 and 1, that is, after C. In this case, the clockwise search will find A, and the server A will bear almost all the load, which loses the significance of this algorithm.

At this point, virtual nodes appear. For example, the three servers above are split into one virtual node (A1, B1, and C1), which can solve the balance problem of consistent hash to some extent.

Array primitive version

1, the train of thought

A brief description of the idea: In fact, we use an array to store all the node information, and then we need to manually sort it, because it is ordered, so we compare the hash value of each node from index 0 until we find a node whose hash value is larger than the hash(key) of our target data. Otherwise return the data for the first node.

2, the code is actually very simple, direct masturbation:

package  com.jet.mini.utils;

import java.util.Arrays;

/ * * *@ClassName: SortArrayConsistentHash
* @Description: consistent hash algorithm implemented by the initial algebraic group *@Author: Jet.Chen
* @Date: 2019/3/19 and *@Version: 1.0 * * /

public  class  SortArrayConsistentHash  {

    /** * the most core data structure */
    private  Node[]  buckets;

    /** * The initial size of the bucket */
    private  static  final  int  INITIAL_SIZE  =  32;

    /** * The current bucket size */
    private  int  length  =  INITIAL_SIZE;

    /** * The current bucket usage */
    private  int  size  =  0;

    public  SortArrayConsistentHash(a){

        buckets  =  new  Node[INITIAL_SIZE];

    }

    /** * Specifies the array length constructor */

    public  SortArrayConsistentHash(int  length){

        if  (length  <  32)  {
            buckets  =  new  Node[INITIAL_SIZE];
        }  else  {
            this.length  =  length;
            buckets  =  newNode[length]; }}/ * * *@Description: Writes data *@Param: [hash, value]
     * @return: void
     * @Author: Jet.Chen
     * @Date: 2019/3/19 now * /
    public  void  add(long  hash,  String  value){
        // Size Determine whether to expand the capacity
        if  (size  ==  length)  reSize();
        Node node  =  new  Node(value,  hash);
        buckets[++size]  =  node;
    }

    / * * *@Description: Deletes the node *@Param: [hash]
     * @return: boolean
     * @Author: Jet.Chen
     * @Date: 2019/3/20 0:24 * /
    public  boolean  del(long  hash)  {
        if  (size  ==  0)  return  false;
        Integer  index  =  null;
        for  (int  i  =  0;  i  <  length;  i++)  {
            Node node  =  buckets[i];
            if  (node  ==  null)  continue;
            if  (node.hash  ==  hash)  index  =  i;
        }

        if(index ! =null)  {
            buckets[index]  =  null;
            return  true;
        }

        return  false;
    }

    / * * *@DescriptionSort: *@Param* : []@return: void
     * @Author: Jet.Chen
     * @Date: 2019/3/19 23:48 * /

    public  void  sort(a)  {
        // The sorting here is not concerned with eQals
        Arrays.sort(buckets,  0,  size,  (o1,  o2)  ->  o1.hash  >  o2.hash  ?  1  :  -1);
    }

    / * * *@DescriptionExpansion and: *@Param* : []@return: void
     * @Author: Jet.Chen
     * @Date: 2019/3/19 23:42
     */
    public  void  reSize(a)  {
        // Expand by 1.5 times
        int  newLength  =  length  >>  1  +  length;
        buckets  =  Arrays.copyOf(buckets,  newLength);
    }

    / * * *@Description: Obtains the node value * based on the consistent hash algorithm@Param: [hash]
     * @return: java.lang.String
     * @Author: Jet.Chen
     * @Date: 2019/3/20 0:16 * /

    public  String  getNodeValue(long  hash)  {
        if  (size  ==  0)  return  null;
        for  (Node bucket  :  buckets)  {
            // Prevent empty nodes
            if  (bucket  ==  null)  continue;
            if  (bucket.hash  >=  hash)  return  bucket.value;
        }

        // prevent the loop from docking with the head
        // Scenario: list only node hash values, [null, 2, 3...] , but the hash value is 4, the first loop above obviously failed to find the node 2, so we need to loop again
        for  (Node bucket  :  buckets)  {
            if(bucket ! =null)  return  bucket.value;
        }

        return  null;

    }

    /** * node records the hash and the original IP address */
    private  class  Node  {
        public  String  value;
        public  long  hash;
        public  Node(String  value,  long  hash)  {
            this.value  =  value;
            this.hash  =  hash;
        }

        @Override
        public  String  toString(a)  {
            return  "Node{hash="+hash+", value="+value+"}"; }}}Copy the code

3, disadvantages

Arrays.sort() : TimSort (arrays.sort)

② Hash algorithm: The hash algorithm is not mentioned above and needs to be improved;

(3) Data structure: the above is used array, but need to manually sort, the advantage is that the insertion speed is ok, but the capacity expansion inconvenience, and need to manually sort, sort of timing is uncertain, need to improve;

④ Virtual nodes: Virtual nodes are not considered and need to be improved.

4. Advanced TreeMap

The above implementation has some drawbacks, then improve it:

① Data structure: We can use TreeMap data structure, the advantage is that the data structure is ordered, no need to sort, and the data structure has a function called tailMap, the function is to get a larger set of data than the specified key;

② Hash algorithm: Here we use the FNV1_32_HASH algorithm, which verifies that the hash distribution is uniform and the hash collision is ok;

③ Virtual nodes: We temporarily set the number of virtual nodes for fission of each node lock to 10.

The code is also not difficult and is also directly stroked:

package  com.jet.mini.utils;

import java.util.SortedMap;
import java.util.TreeMap;

/ * * *@ClassName: TreeMapConsistentHash
* @Description: An evolved version of treeMap implementation consistent hash *@Author: Jet.Chen
* @Date: 2019/3/20 fell *@Version: 1.0 * * /
public  class  TreeMapConsistentHash  {

    /** * Main data structure */
    private  TreeMap<Long,  String>  treeMap  =  new  TreeMap<>();

    /** * Specifies the number of virtual nodes */
    private  static  final  int  VIRTUAL_NODE_NUM  =  10;

    /** * Normal add node */
    @Deprecated
    public  void  add  (String key, String value)  {
        long  hash  =  hash(key);
        treeMap.put(hash,  value);
    }

    /** * Virtual node */ exists
    public  void  add4VirtualNode(String key, String value)  {
        for  (int  i  =  0;  i  <  VIRTUAL_NODE_NUM;  i++)  {
            long  hash  =  hash(key  +  "&&VIR"  +  i);
            treeMap.put(hash,  value);
        }
        treeMap.put(hash(key),  value);
    }

    /** * Reads the node value *@param key
     * @return* /
    public  String  getNode(String key)  {
        long  hash  =  hash(key);
        SortedMap<Long,  String>  sortedMap  =  treeMap.tailMap(hash);
        String  value;
        if(! sortedMap.isEmpty()) { value = sortedMap.get(sortedMap.firstKey()); }else  {
            value  =  treeMap.firstEntry().getValue();
        }

        return  value;
    }

    /** * uses FNV1_32_HASH */
    public  long  hash(String key)  {
        final  int  p  =  16777619;
        int  hash  =  (int)2166136261L;
        for(int  i  =  0;  i  <  key.length();  i++)  {
            hash  =  (hash  ^  key.charAt(i))  *  p;
        }
        hash  +=  hash  <<  13;
        hash  ^=  hash  >>  7;
        hash  +=  hash  <<  3;
        hash  ^=  hash  >>  17;
        hash  +=  hash  <<  5;
        // If the value is negative, take the absolute value
        if  (hash  <  0)  hash  =  Math.abs(hash);
        returnhash; }}Copy the code

Five, the other

1. Number of virtual nodes

Relationship between number of virtual nodes and number of servers (to be considered)

On the X-axis is the number of virtual nodes, and on the Y-axis is the number of servers. Obviously, the more servers, the fewer virtual nodes are recommended.

The original address: www.jetchen.cn/consistent-…