The well-known consistent Hash algorithm is a high quality solution to solve the poor system scalability in distributed data layout or distributed environment. This paper aims to implement a set of this algorithm manually using Java language.
The background,
The simplest application scenario is cache. When the amount of cache in a single machine is too large, the cache needs to be divided into different libraries, and then hash modulus calculation is performed on the specified machine based on the relevant information, for example, index = hash(IP) % N.
However, when adding or subtracting nodes, the N value of the above formula changes, so most or even all of the cache will be invalidated. The most straightforward solution for this scenario is to use a consistent hash algorithm.
2. Introduction to consistent Hash algorithms
1. Simple consistency Hash
The simple abstract description of a consistent Hash algorithm is a circle with 2^32 nodes evenly arranged on it, such as [0,1,2,4,8…]. , and then hash our machine nodes on the ring, using hash(IP) or hash(domain name) rules.
When looking for data, just hash the key of the target data on the ring and read the data clockwise from the nearest machine node.
In the simple version of the following figure, if there are three data nodes (A, B and C) in total, when the key of the data to be searched is calculated between A and B, node B can be found by clockwise search.
The biggest advantage is that, with the case above as the background, when node B breaks down, C is found clockwise. In this way, the scope of influence is only between A and B, and other data is not affected.
2. Virtual nodes
However, when the data nodes are hashed, the compactness will be affected by the hash algorithm. For example, the data servers A, B, and C will be hashed on nodes 1, 2, and 4 after the hash calculation, which will lose the balance due to the density. ** For example, after the hash operation of the key of the data we want to search, there is A high probability that the key will appear between 4 and 1, that is, after C. In this case, the clockwise search will find A, and the server A will bear almost all the load, which loses the significance of this algorithm.
At this point, virtual nodes appear. For example, the three servers above are split into one virtual node (A1, B1, and C1), which can solve the balance problem of consistent hash to some extent.
Array primitive version
1, the train of thought
A brief description of the idea: In fact, we use an array to store all the node information, and then we need to manually sort it, because it is ordered, so we compare the hash value of each node from index 0 until we find a node whose hash value is larger than the hash(key) of our target data. Otherwise return the data for the first node.
2, the code is actually very simple, direct masturbation:
package com.jet.mini.utils;
import java.util.Arrays;
/ * * *@ClassName: SortArrayConsistentHash
* @Description: consistent hash algorithm implemented by the initial algebraic group *@Author: Jet.Chen
* @Date: 2019/3/19 and *@Version: 1.0 * * /
public class SortArrayConsistentHash {
/** * the most core data structure */
private Node[] buckets;
/** * The initial size of the bucket */
private static final int INITIAL_SIZE = 32;
/** * The current bucket size */
private int length = INITIAL_SIZE;
/** * The current bucket usage */
private int size = 0;
public SortArrayConsistentHash(a){
buckets = new Node[INITIAL_SIZE];
}
/** * Specifies the array length constructor */
public SortArrayConsistentHash(int length){
if (length < 32) {
buckets = new Node[INITIAL_SIZE];
} else {
this.length = length;
buckets = newNode[length]; }}/ * * *@Description: Writes data *@Param: [hash, value]
* @return: void
* @Author: Jet.Chen
* @Date: 2019/3/19 now * /
public void add(long hash, String value){
// Size Determine whether to expand the capacity
if (size == length) reSize();
Node node = new Node(value, hash);
buckets[++size] = node;
}
/ * * *@Description: Deletes the node *@Param: [hash]
* @return: boolean
* @Author: Jet.Chen
* @Date: 2019/3/20 0:24 * /
public boolean del(long hash) {
if (size == 0) return false;
Integer index = null;
for (int i = 0; i < length; i++) {
Node node = buckets[i];
if (node == null) continue;
if (node.hash == hash) index = i;
}
if(index ! =null) {
buckets[index] = null;
return true;
}
return false;
}
/ * * *@DescriptionSort: *@Param* : []@return: void
* @Author: Jet.Chen
* @Date: 2019/3/19 23:48 * /
public void sort(a) {
// The sorting here is not concerned with eQals
Arrays.sort(buckets, 0, size, (o1, o2) -> o1.hash > o2.hash ? 1 : -1);
}
/ * * *@DescriptionExpansion and: *@Param* : []@return: void
* @Author: Jet.Chen
* @Date: 2019/3/19 23:42
*/
public void reSize(a) {
// Expand by 1.5 times
int newLength = length >> 1 + length;
buckets = Arrays.copyOf(buckets, newLength);
}
/ * * *@Description: Obtains the node value * based on the consistent hash algorithm@Param: [hash]
* @return: java.lang.String
* @Author: Jet.Chen
* @Date: 2019/3/20 0:16 * /
public String getNodeValue(long hash) {
if (size == 0) return null;
for (Node bucket : buckets) {
// Prevent empty nodes
if (bucket == null) continue;
if (bucket.hash >= hash) return bucket.value;
}
// prevent the loop from docking with the head
// Scenario: list only node hash values, [null, 2, 3...] , but the hash value is 4, the first loop above obviously failed to find the node 2, so we need to loop again
for (Node bucket : buckets) {
if(bucket ! =null) return bucket.value;
}
return null;
}
/** * node records the hash and the original IP address */
private class Node {
public String value;
public long hash;
public Node(String value, long hash) {
this.value = value;
this.hash = hash;
}
@Override
public String toString(a) {
return "Node{hash="+hash+", value="+value+"}"; }}}Copy the code
3, disadvantages
Arrays.sort() : TimSort (arrays.sort)
② Hash algorithm: The hash algorithm is not mentioned above and needs to be improved;
(3) Data structure: the above is used array, but need to manually sort, the advantage is that the insertion speed is ok, but the capacity expansion inconvenience, and need to manually sort, sort of timing is uncertain, need to improve;
④ Virtual nodes: Virtual nodes are not considered and need to be improved.
4. Advanced TreeMap
The above implementation has some drawbacks, then improve it:
① Data structure: We can use TreeMap data structure, the advantage is that the data structure is ordered, no need to sort, and the data structure has a function called tailMap, the function is to get a larger set of data than the specified key;
② Hash algorithm: Here we use the FNV1_32_HASH algorithm, which verifies that the hash distribution is uniform and the hash collision is ok;
③ Virtual nodes: We temporarily set the number of virtual nodes for fission of each node lock to 10.
The code is also not difficult and is also directly stroked:
package com.jet.mini.utils;
import java.util.SortedMap;
import java.util.TreeMap;
/ * * *@ClassName: TreeMapConsistentHash
* @Description: An evolved version of treeMap implementation consistent hash *@Author: Jet.Chen
* @Date: 2019/3/20 fell *@Version: 1.0 * * /
public class TreeMapConsistentHash {
/** * Main data structure */
private TreeMap<Long, String> treeMap = new TreeMap<>();
/** * Specifies the number of virtual nodes */
private static final int VIRTUAL_NODE_NUM = 10;
/** * Normal add node */
@Deprecated
public void add (String key, String value) {
long hash = hash(key);
treeMap.put(hash, value);
}
/** * Virtual node */ exists
public void add4VirtualNode(String key, String value) {
for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
long hash = hash(key + "&&VIR" + i);
treeMap.put(hash, value);
}
treeMap.put(hash(key), value);
}
/** * Reads the node value *@param key
* @return* /
public String getNode(String key) {
long hash = hash(key);
SortedMap<Long, String> sortedMap = treeMap.tailMap(hash);
String value;
if(! sortedMap.isEmpty()) { value = sortedMap.get(sortedMap.firstKey()); }else {
value = treeMap.firstEntry().getValue();
}
return value;
}
/** * uses FNV1_32_HASH */
public long hash(String key) {
final int p = 16777619;
int hash = (int)2166136261L;
for(int i = 0; i < key.length(); i++) {
hash = (hash ^ key.charAt(i)) * p;
}
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;
// If the value is negative, take the absolute value
if (hash < 0) hash = Math.abs(hash);
returnhash; }}Copy the code
Five, the other
1. Number of virtual nodes
On the X-axis is the number of virtual nodes, and on the Y-axis is the number of servers. Obviously, the more servers, the fewer virtual nodes are recommended.
The original address: www.jetchen.cn/consistent-…