preface

Recently, when we understand distribution, we often can’t get around an algorithm: the consistent hashing algorithm. So after understanding and practicing this algorithm, there is this article.

The comparison between algorithms

In distributed sharding, there are several algorithms: modular, segmented, and consistent hash.

modulus piecewise Consistency hashing
Whether the upper layer is aware is is no
Costs of migration high high Low. Only adjacent nodes are involved
Impact of single point of failure high high Low: only adjacent nodes are affected
Algorithm complexity low low high
Hot data There are There are There are

Consistency hashing mainly solves the problem

From the above comparison, it can be seen that consistent hashing mainly reduces the data migration cost brought by the up-down of nodes. Meanwhile, the changes in the number of nodes and the sharding principle are insensitive to the upper layer, which makes the upper layer more focused on the writing of the logic in the domain and makes the overall architecture more flexible.

Consistent hash principle

  1. Basic data structure

The basic data type is human. Most conventional hashing methods use the data hash into a 2^32-1 space. Consistent hashing usually turns the space into a ring on this basis. As shown in the figure below.

This implementation uses a hash table with keys arranged by size. The original plan is to use arrays to undertake data, but the sorting cost increases with the increase of keys, so it is abandoned.

  1. Data is stored

    Data storage is roughly the same as the hashing algorithm, which calculates the hash value of the stored data into the corresponding hash slot. But the consistent hash difference is that when the computed hash is not on the ring, the data is stored in the first hash slot.

    Scenario Scenario: Four nodes (server1 to 4) are online, and the hash value is hash1 to 4. Five data (hash1 ~ 5) are stored in the node, as shown in the figure below.

The implementation of the idea is

1. Calculate the hash value of the stored data. 2. If the server is not found in 2, write to the first (hash minimum) nodeCopy the code
  1. Node online

The new node is added to the consistent hash ring by calculating the hash value represented by the node, mapping the node to the ring according to the calculated value, and finally migrating the data of adjacent nodes to the new node.

Scenario Scenario: Four servers (server1 to 4) are online, and their hash values are hash1 to 4. A new node (hash5) node is now online on the ring. The result is shown below.

The implementation of the idea is

Calculate the hash value of the online node. 2. Calculate the hash value of the virtual node added to the online node if the specified number of virtual nodes is initialized. Search for the nearest node (the smallest node hash value greater than the hash value of the online node and the virtual node) and fetch the node data. 4. Add 1 and 2 points to the Hash ring. 5. Add the data taken out in 3 to the Hash ring againCopy the code
  1. Nodes offline

When an existing node is offline, the system computs the hash value of the node to retrieve the data contained in the node. When the node is offline, the retrieved data is put back into the Hash ring.

Scenario Scenario: Five servers (server1 to 5) are online, and the hash values are hash1 to 5. The current node Server4 is offline. The result is shown below.

The implementation of the idea is

1. Calculate the hash value for offline nodes. 2. Remove offline nodes and virtual nodes (if the specified number of virtual nodes is initialized) from the Hash ring. 4. Put the data in 2 back into the ringCopy the code

Code implementation

Consistency hashing is divided into two schemes: no virtual node and virtual node. The implementation of the two schemes is similar, so this implementation will be implemented together. The implementation is as follows.

package org.CommonAlgorithms.ConsistentHash;

import org.CommonAlgorithms.HashAlgorithm.HashService;
import java.util.List;

/**
 * consistentHashing interface
 * @author cartoon
 * @since  10/01/2021
 * @version1.1 * /
public interface ConsistentHashing {

    /**
     * put data to hash loop
     * @param data data list
     * @return put result
     */
    boolean putData(List<String> data);

    /**
     * put data to hash loop
     * @param data data
     * @return put result
     */
    boolean putData(String data);

    /**
     * remove node from hash loop
     * @param node removing node
     * @return remove result
     */
    boolean removeNode(String node);

    /**
     * add node to hash loop
     * @param node adding node
     * @return add result
     */
    boolean addNode(String node);

    /**
     * inject hash method to hash loop
     * @param hashService hash method
     * @throws UnsupportedOperationException if loop already has node
     */
    void setHashMethod(HashService hashService);

    /** * print all data in loop according ascending hash value with nodes */
    void printAllData(a);

}


Copy the code
package org.CommonAlgorithms.ConsistentHash;

import org.CommonAlgorithms.HashAlgorithm.HashService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.*;

/**
 * consistent hash achieve
 * @author cartoon
 * @since2021/01/17 * /
public class ConsistentHashingImpl implements ConsistentHashing {

    private static final Logger log = LoggerFactory.getLogger(ConsistentHashingImpl.class);

    /** * virtual node name template */
    private static final String virtualNodeFormat = "%s&&%d";

    /** * real node and its virtual node mapping */
    private SortedMap<String, List<String>> realNodeToVirtualNode;

    /** * hash and its node mapping */
    private SortedMap<Integer, String> hashToNodes;

    /** * node and its data mapping */
    private Map<String, List<String>> nodeToData;

    /** * determine virtual node's number of each node */
    private int virtualNodeNum;

    /** * inject hash method, if null, use loop default hash method */
    private HashService hashService;


    public ConsistentHashingImpl(a) {
        this(0.new String[0]);
    }

    public ConsistentHashingImpl(String... nodes) {
        this(0, nodes);
    }

    public ConsistentHashingImpl(int virtualNodeNum) {
        this(virtualNodeNum, new String[0]);
    }

    public ConsistentHashingImpl(int virtualNodeNum, String... nodes) {
        //1. intercept virtual num smaller than 0
        if(virtualNodeNum < 0){
            log.error("virtual num is not allow smaller than 0");
            throw new IllegalArgumentException();
        }
        //2. initialize loop member attributes
        this.virtualNodeNum = virtualNodeNum;
        realNodeToVirtualNode = new TreeMap<>();
        hashToNodes = new TreeMap<>();
        nodeToData = new HashMap<>();
        for(String server : nodes){
            hashToNodes.put(getHash(server), server);
            nodeToData.put(server, new LinkedList<>());
        }
        //3. if virtual node number bigger than 0, add virtual node
        if(virtualNodeNum > 0) {for(String server : nodes){ addVirtualNode(server); }}}@Override
    public boolean putData(List<String> data) {
        //1. circulate call put data method to add data to loop
        for(String incomingData : data){
            if(! putData(incomingData)){return false; }}return true;
    }

    @Override
    public boolean putData(String data) {
        if(hashToNodes.isEmpty()){
            log.error("put data, usable server is empty");
            return false;
        }
        //1. calculate data's hash value
        int currentHash = getHash(data);
        //2. get usual node(node's hash value is bigger than data's hash value), if usual node list is empty, get first node in loop
        SortedMap<Integer, String> usableNodes = hashToNodes.tailMap(currentHash);
        String node = usableNodes.isEmpty() ? hashToNodes.get(hashToNodes.firstKey()) : usableNodes.get(usableNodes.firstKey());
        //3. add data to node
        List<String> dataList = nodeToData.get(node);
        dataList.add(data);
        log.info("put data, data {} is placed to server {}, hash: {}", data, node, currentHash);
        return true;
    }

    @Override
    public boolean removeNode(String node) {
        //1. calculate hash value of removing node
        int removeServerHash = getHash(node);
        if(! hashToNodes.containsKey(removeServerHash)){ log.error("remove server, current server is not in server list, please check server ip");
            return false;
        }
        //2. get data from removing node
        List<String> removeServerData = nodeToData.get(node);
        //3. get removing node's virtual node data, remove all virtual node with removing node
        if(virtualNodeNum ! =0) {for(String virtualNode : realNodeToVirtualNode.get(node)){ removeServerData.addAll(nodeToData.get(virtualNode)); hashToNodes.remove(getHash(virtualNode)); nodeToData.remove(virtualNode); }}//4. remove node from hash loop
        hashToNodes.remove(removeServerHash);
        nodeToData.remove(node);
        if(hashToNodes.size() == 0){
            log.info("remove server, after remove, server list is empty");
            return true;
        }
        //5. put data to loop by call put data method
        putData(removeServerData);
        log.info("remove server, remove server {} success", node);
        return true;
    }

    @Override
    public boolean addNode(String node) {
        //1, calculate adding node's hash value
        int addServerHash = getHash(node);
        //2. add node and migrate data
        if(hashToNodes.isEmpty()){
            //2.1 add node and its virtual node to loop directly when current loop is empty
            hashToNodes.put(addServerHash, node);
            nodeToData.put(node, new LinkedList<>());
            if(virtualNodeNum > 0){ addVirtualNode(node); }}else{
            They were migrated according to their religions
            SortedMap<Integer, String> greatServers = hashToNodes.tailMap(addServerHash);
            String greatServer = greatServers.isEmpty() ? hashToNodes.get(hashToNodes.firstKey()) : greatServers.get(greatServers.firstKey());
            List<String> firstGreatServerData = new LinkedList<>(nodeToData.get(greatServer));
            //2.2.2 add node and its virtual node to loop
            hashToNodes.put(addServerHash, node);
            nodeToData.put(greatServer, new LinkedList<>());
            nodeToData.put(node, new LinkedList<>());
            if(virtualNodeNum ! =0){
                addVirtualNode(node);
            }
            // Migrate 2.2.1 Data to loop by call put data method
            putData(firstGreatServerData);
        }
        log.info("add server, server {} has been added", node);
        return true;
    }

    @Override
    public void printAllData(a) {
        nodeToData.forEach((server, data) -> log.info("server {} contains data {}", server, data));
    }

    @Override
    public void setHashMethod(HashService hashService) {
        if(! hashToNodes.isEmpty()){throw new UnsupportedOperationException();
        }
        this.hashService = hashService;
    }

    private void addVirtualNode(String realNode){
        if(virtualNodeNum > 0){
            List<String> virtualNodeList = new LinkedList<>();
            for(int cnt = 0; cnt < this.virtualNodeNum; cnt++){
                //1. generate virtual node name by default format
                String virtualNodeName = String.format(virtualNodeFormat, realNode, cnt);
                //2. calculate each virtual node's hash value
                int virtualNodeHash = getHash(virtualNodeName);
                //3. current node already exist in loop, continue
                if(hashToNodes.containsKey(virtualNodeHash)){
                    continue;
                }
                //4. add node to loop
                virtualNodeList.add(virtualNodeName);
                hashToNodes.put(virtualNodeHash, virtualNodeName);
                nodeToData.put(virtualNodeName, new LinkedList<>());
            }
            //5. map virtual node to real noderealNodeToVirtualNode.put(realNode, virtualNodeList); }}private int getHash(String data){
        return hashService == null ? defaultGetHash(data) : hashService.getHash(data);
    }

    private int defaultGetHash(String data){
        int res = 0;
        for(char tempChar : data.toCharArray()){
            if(tempChar >= '0' && tempChar <= '9'){ res += tempChar; }}returnres; }}Copy the code

The test results

Consistency hashing without virtual nodes
The test code
package ConsistentHash;

import org.CommonAlgorithms.ConsistentHash.ConsistentHashing;
import org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/ * * *@author cartoon
 * @date2020/12/27 * /
public class ConsistentHashingWithoutVirtualNodeTest {

    private static final Logger log = LoggerFactory.getLogger(ConsistentHashingWithoutVirtualNodeTest.class);

    private ConsistentHashing consistentHashing;

    private String[] servers;

    private String[] data;

    @Before
    public void before(a){
        servers = new String[]{"000"."111"."222"."333"."555"};
        consistentHashing = new ConsistentHashingImpl(servers);
        data = new String[]{"000"."111"."222"."333"."555"};
    }

    @Test
    public void testConsistentHashing(a){
        for(String str : data){
            Assert.assertTrue(consistentHashing.putData(str));
        }
        consistentHashing.removeNode("333");
        consistentHashing.addNode("444");
        consistentHashing.putData("444"); consistentHashing.printAllData(); }}Copy the code
The test results
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 000 is placed to server 000, hash: 144
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 111 is placed to server 111, hash: 147
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 222 is placed to server 222, hash: 150
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 333 is placed to server 333, hash: 153
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 555 is placed to server 555, hash: 159
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 333 is placed to server 555, hash: 153
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - remove server, remove server 333 success
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 555 is placed to server 555, hash: 159
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 333 is placed to server 444, hash: 153
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - add server, server 444 has been added
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 444 is placed to server 444, hash: 156
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 000 contains data [000]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 111 contains data [111]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 222 contains data [222]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 444 contains data [333, 444]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 555 contains data [555]
Copy the code
Consistent hash tests with virtual nodes
The test code
package ConsistentHash;

import org.CommonAlgorithms.ConsistentHash.ConsistentHashing;
import org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/ * * *@author cartoon
 * @date2021/01/17 * /
public class ConsistentHashingWithVirtualNodeTest {

    private static final Logger log = LoggerFactory.getLogger(ConsistentHashingWithVirtualNodeTest.class);

    private ConsistentHashing consistentHashing;

    private String[] servers;

    private String[] data;

    @Before
    public void before(a){
        servers = new String[]{"000"."111"."222"."333"."555"};
        consistentHashing = new ConsistentHashingImpl(3, servers);
        data = new String[]{"000"."111"."222"."333"."555"};
    }

    @Test
    public void testConsistentHashing(a){
        for(String str : data){
            Assert.assertTrue(consistentHashing.putData(str));
        }
        consistentHashing.removeNode("333");
        consistentHashing.addNode("444");
        consistentHashing.putData("444");
        consistentHashing.putData("555 && 0"); consistentHashing.printAllData(); }}Copy the code
The test results
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 000 is placed to server 000, hash: 144
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 111 is placed to server 111, hash: 147
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 222 is placed to server 222, hash: 150
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 333 is placed to server 333, hash: 153
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 555 is placed to server 555, hash: 159
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 333 is placed to server 555, hash: 153
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - remove server, remove server 333 success
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 555 is placed to server 555, hash: 159
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 333 is placed to server 444, hash: 153
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - add server, server 444 has been added
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 444 is placed to server 444, hash: 156
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - put data, data 555&&0 is placed to server 555&&0, hash: 207
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 000&&2 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 000&&1 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 000&&0 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 111&&1 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 111&&2 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 555&&1 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 555&&2 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 222&&2 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 444&&0 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 444&&1 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 444&&2 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 555&&0 contains data [555&&0]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 000 contains data [000]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 111 contains data [111]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 222 contains data [222]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 222&&0 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 444 contains data [333, 444]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 555 contains data [555]
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 222&&1 contains data []
[main] INFO org.CommonAlgorithms.ConsistentHash.ConsistentHashingImpl - server 111&&0 contains data []
Copy the code

Implementation defects

1. The hash algorithm is too simple and the probability of hash conflict is high. 2. The sequence conflict between the real node and the existing virtual node is not resolvedCopy the code

Afterword.

All codes of this implementation have been uploaded to Github. The project mainly includes some commonly used algorithms, such as sorting algorithm and simple implementation of flow limiting algorithm. Welcome to issue.