memories

I still remember that when I joined the new company after graduation, my superior explained to my front-end classmates that the back-end technology stack was huge and complicated. One example is “How can multiple machines provide data cache storage service?” “, turned around and asked me. At that time, he directly said that he used the mode of hash to share the data.

Then I must have been asked how to deal with a machine failure and how to reduce the impact of node failure, only to be scorned and have memorized the word consistency hash ever since.

Although the working time is not short, now if YOU ask me what the consistent hash algorithm is, I can probably only answer a ring, there are many virtual nodes in the ring, after the key hash to locate the corresponding virtual node, it is true that I have never written a line of code by myself.

Li-an old hero, costraint. Let’s get started

Consistent hash algorithm

Consistent hashing algorithm was proposed by Karger et al., MIT in 1997 in the solution to distributed Cache. It was designed to solve the Hot spot problem in the Internet. Consistent hashing fixes the problem with the simple hashing algorithm used by CARP, making DHT truly useful in a P2P environment.

Consistent hash is widely used in data storage to reduce data skews. When nodes fail or increase, only a small amount of data is affected.

The figure above shows a ring in which we hash four nodes

We then put it into the next node corresponding to the nearest node according to the calculation result of key value.

When we add node5, the hash value is calculated and put into the ring, and only part of the data in Node4 (hash value less than node5) is repositioned to Node5.

When node4 is removed, only node4 data is moved to the next node node3.

Consideration: After node4 fails, all data pressure on Node4 is transferred to Node3, and the pressure on Node3 increases. Will a chain reaction occur, leading to the collapse of all nodes?

In this case, we need to add virtual nodes to share the pressure of Node3, and distribute more physical nodes to the ring through hash calculation. Relatively speaking, data hash keys can be more randomly distributed to different nodes. Ideally, when one node fails, multiple nodes share the data pressure

Logic implementation

MyServiceNode Node operation

Package main import (" FMT ""strconv") type ServiceNode struct {Ip String Port string} // Func NewServiceNode(IP string, port string) *ServiceNode {return &ServiceNode{IP: IP, port: func NewServiceNode(IP string, port string) *ServiceNode {return &ServiceNode{IP: IP, port: Port,}} func main() {// instantiate three entity nodes node1 := NewServiceNode("127.0.0.1", "3305") node2 := NewServiceNode("127.0.0.1", "3306") node3: = NewServiceNode (" 127.0.0.1 ", "3307") / / add the corresponding virtualization node number 1, 1, 3 virtualNodeService. AddVirtualNode (node1, 3) virtualNodeService.addVirtualNode(node2, 3) virtualNodeService.addVirtualNode(node3, PrintRouteList() // loop test for I := 0; i < 20; I++ {/ / remove 2 nodes virtualNodeService. RemoveVirtualNode (2, 3) / / to get corresponding node address serviceNode: = virtualNodeService. GetVirtualNodel (" get_cc "+ strconv. Itoa (I)) FMT. Println (serviceNode. Ip + Func PrintRouteList() {for key, val := range virtualNodeService.VirtualNodes { fmt.Println("key=" + strconv.Itoa(int(key)) + ", host=" + val.Ip + ":" + val.Port) } }Copy the code

VirtualNodeService Virtual node

package main import ( "crypto/md5" "encoding/hex" "sort" "strconv" "sync" ) var virtualNodeService = NewVirtualNode() Type NodeType []uint32 //Len() func (s NodeType) Len() int {return Len(s)} //Less(): the result is sorted from low to high func (s NodeType) Less(i, j int) bool { return s[i] < s[j] } //Swap() func (s NodeType) Swap(i, j int) { s[i], s[j] = s[j], S [I]} Type VirtualNode struct {VirtualNodes map[uint32]*ServiceNode NodeKeys NodeType sync.rwmutex} Func NewVirtualNode() *VirtualNode {return &VirtualNode{VirtualNodes: func NewVirtualNode() *VirtualNode {return &VirtualNode{VirtualNodes: Map [uint32]*ServiceNode{},}} // addVirtualNode func (v *VirtualNode) addVirtualNode(ServiceNode *ServiceNode, VirtualNum uint) {// defer v.lock () for I := uint(0); i < virtualNum; i++ { hashStr := serviceNode.Ip + ":" + serviceNode.Port + ":" + strconv.Itoa(int(i)) V.virtualnodes [v.gethashCode (hashStr)] = serviceNode} // Sort virtual nodes hash values v.orthash ()} // Remove virtual nodes func (v *VirtualNode) RemoveVirtualNode (serviceNode * serviceNode, virtualNum uint) {// Defer v.lock () for I := uint(0); i < virtualNum; i++ { hashStr := serviceNode.Ip + ":" + serviceNode.Port + ":" + strconv.Itoa(int(i)) delete(v.VirtualNodes, V *VirtualNode) sortHash() {v *VirtualNode) sortHash() {v = nil for k := range v.VirtualNodes { v.NodeKeys = append(v.NodeKeys, Func (v *VirtualNode) getVirtualNodel(routeKey String) *ServiceNode { V.lock () defer v.unlock () index := 0 hashCode := v.gethashCode (routeKey) I := Sort.search (len(v.nodekeys), func(I int) bool {return v.nodekeys [I] > hashCode}) When I is less than the number of nodes, If I < len(v.nodekeys) {index = I} else {index = 0} return v.voirtualNodes [v.nodekeys [index]]} Func (v *VirtualNode) getHashCode(nodeHash String) uint32 {// Crc32 hash code //return crc32.ChecksumIEEE([]byte(nodeHash)) md5 := md5.New() md5.Write([]byte(nodeHash)) md5Str := hex.EncodeToString(md5.Sum(nil)) h := 0 byteHash := []byte(md5Str) for i := 0; i < 32; i++ { h <<= 8 h |= int(byteHash[i]) & 0xFF } return uint32(h) }Copy the code