Consistent Hash load balancing for distributed microservices


It should be emphasized that Dubbo's Hash mapping model differs somewhat from the hash-ring mapping model described in most online literature. For me, the hash-ring model alone was not enough to thoroughly understand consistent Hash; it wasn't until I looked at Dubbo's implementation that things clicked.

The hash-ring mapping model

  • This scheme is based on modulo arithmetic. Taking values modulo 2^32 gives a Hash interval of [0, 2^32 - 1]. There are two parts to the work:

Mapping services

Construct a specific identifier (such as an MD5 digest) for each service address (IP + port) according to some rule, then take that identifier modulo 2^32 to determine the service's position on the Hash ring. Assume services Node1, Node2, and Node3 exist; the mapping is as follows:

Mapping requests and locating services

When making a request, we usually carry parameters that can determine which service to call. Suppose there are requests R1, R2, and R3; after computing identifiers from their parameters and taking the modulus, we get the following mapping:

From the figure, R1 maps between 0 and Node1, R2 between Node1 and Node2, and R3 between Node2 and Node3. We take the first service whose Hash value is greater than or equal to the request's Hash value as the actual target. That is, R1 invokes Node1, R2 invokes Node2, and R3 invokes Node3.
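This lookup can be sketched with a sorted map. Below is a minimal, self-contained illustration of the model (the HashRing class, the use of the first 4 bytes of an MD5 digest, and the unsigned 32-bit ring are illustrative assumptions for this article, not Dubbo's exact code):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class HashRing {
    // Sorted map: ring position -> service address
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Map a service address (IP + port) onto the [0, 2^32 - 1] ring
    public void addNode(String address) {
        ring.put(hash(address), address);
    }

    public void removeNode(String address) {
        ring.remove(hash(address));
    }

    // Take the first node whose position is >= the request's Hash value;
    // wrap around to the first node when we run off the end of the ring
    public String locate(String requestKey) {
        if (ring.isEmpty()) {
            return null; // no services registered
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(requestKey));
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    // First 4 bytes of the MD5 digest as an unsigned 32-bit value, i.e. mod 2^32
    private long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return (((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF)) & 0xFFFFFFFFL;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

Adding or removing a service is then just one put or remove on the map, which is why only the requests on the affected arc change their target, as the next two sections show.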

Adding a Service Node

Suppose a new service Node4 happens to be mapped just before Node3, splitting one of the original arcs:

In this way, R3 requests will now invoke Node4, while R1 and R2 requests are unaffected.

Deleting a Service Node

If service Node2 goes down, R2 requests are remapped to Node3, while the original R1 and R3 requests are unaffected:

As you can see, when a service is added or removed, only a limited number of requests are affected. This is unlike simple modulo mapping, where the entire mapping must be recomputed whenever the service list changes.
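Using the HashRing sketch above, the limited blast radius of a node change can be seen directly (the addresses here are hypothetical):

HashRing ring = new HashRing();
ring.addNode("10.0.0.1:20880"); // Node1
ring.addNode("10.0.0.2:20880"); // Node2
ring.addNode("10.0.0.3:20880"); // Node3

String before = ring.locate("R2");   // suppose this resolves to Node2's address
ring.removeNode("10.0.0.2:20880");   // Node2 goes down
String after = ring.locate("R2");    // R2 falls through to the next node clockwise;
                                     // requests on other arcs keep their original target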

Balancing with virtual nodes

In the discussion above, we assumed that Node1, Node2, and Node3 happen to be Hash-mapped to positions that cut the ring into three equal arcs, so the distribution of requests is roughly balanced. In practice, the Hash values of services rarely work out so neatly; they might accidentally map like this:

This results in most requests being mapped to Node1. Thus, virtual nodes are introduced.

In addition to Hash-mapping the service's own address, virtual nodes apply some extra processing to that address (in Dubbo, for example, the counters 1, 2, 3, ... are appended to the IP+port string, representing virtual nodes 1, 2, and 3) so that the same service is mapped to multiple positions. By introducing virtual nodes, the requests that all mapped to Node1 in the figure above can be split further:

As shown in the figure above, if a request falls between Node3 and Node1′, it should invoke the Node1′ service; but since Node1′ is a virtual node of Node1, it actually invokes the Node1 service. By introducing virtual nodes, the distribution of requests becomes more balanced.
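In the sketch from earlier, virtual nodes amount to inserting several ring positions per physical node, each derived from the address plus a counter (the plain-number suffix mirrors the Dubbo behavior described above):

// Overload for the HashRing sketch above: insert `replicas` virtual nodes
// per physical node by hashing "address" + 0, "address" + 1, ...
public void addNode(String address, int replicas) {
    for (int i = 0; i < replicas; i++) {
        ring.put(hash(address + i), address); // every virtual node maps back to the same service
    }
}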

Using consistent Hash: configuring the load balancing policy

  • XML configuration:
<dubbo:reference loadbalance="consistenthash" />
  • Properties configuration:
dubbo.reference.loadbalance=consistenthash
  • Annotation configuration:
@Reference(loadbalance = "consistenthash")

The stage at which Dubbo's load balancing policy comes into play

After the interface proxy class is generated and assembled, a service invocation basically flows as follows: Proxy -> MockClusterInvoker -> cluster strategy (e.g. FailoverClusterInvoker) -> initialize the load balancing policy -> pick an Invoker with the selected load balancing policy.

The load balancing policy is initialized in AbstractClusterInvoker’s initLoadBalance method:

protected LoadBalance initLoadBalance(List<Invoker<T>> invokers, Invocation invocation) {
    if (CollectionUtils.isNotEmpty(invokers)) {
        // Read the policy name configured for the invoked method from the first Invoker's URL
        return ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(invokers.get(0).getUrl()
                .getMethodParameter(RpcUtils.getMethodName(invocation), LOADBALANCE_KEY, DEFAULT_LOADBALANCE));
    } else {
        // No invokers available: fall back to the default policy
        return ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(DEFAULT_LOADBALANCE);
    }
}

This part of the code logic is divided into two parts:

1. Read the load balancing policy name configured for the invoked method from the URL via LOADBALANCE_KEY, falling back to DEFAULT_LOADBALANCE when none is configured;

2. Use the SPI mechanism to initialize and load the load balancing policy represented by this value.

All load balancing policies implement the LoadBalance interface. Every clustering strategy calls AbstractClusterInvoker's select method, and inside doSelect AbstractClusterInvoker calls LoadBalance's select method; that is where the load balancing policy starts to execute.
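For orientation, the LoadBalance extension point looks roughly like this (paraphrased from Apache Dubbo 2.7.x source; annotation details vary by version):

import java.util.List;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.extension.Adaptive;
import org.apache.dubbo.common.extension.SPI;
import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.RpcException;

@SPI("random") // the default extension is the random policy
public interface LoadBalance {

    // @Adaptive lets the SPI machinery dispatch to the implementation
    // named by the "loadbalance" parameter on the URL
    @Adaptive("loadbalance")
    <T> Invoker<T> select(List<Invoker<T>> invokers, URL url, Invocation invocation) throws RpcException;
}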

Dubbo consistent Hash load balancing implementation

It is important to note that the load balancing strategy discussed here selects one of the Providers as the remote call target for the current Consumer. In the code, each Provider is wrapped as an Invoker entity, so implementing a load-balancing strategy simply means selecting one Invoker from the list of Invokers.

Therefore, compared with the common consistent Hash implementation, Dubbo's consistent Hash algorithm can likewise be divided into two steps:

1. Map Providers onto the Hash interval (in practice, the Invokers are mapped);

2. Map the request and find the first Invoker whose Hash position is greater than or equal to the request's Hash value.

Mapping Invokers

All load balancing implementation classes in Dubbo inherit from AbstractLoadBalance, which provides the implementation of the select method declared by LoadBalance:

@Override
public <T> Invoker<T> select(List<Invoker<T>> invokers, URL url, Invocation invocation) {
    if (CollectionUtils.isEmpty(invokers)) {
        return null;
    }
    if (invokers.size() == 1) {
        return invokers.get(0);
    }
    // doSelect enters the execution logic of the specific load balancing algorithm
    return doSelect(invokers, url, invocation);
}

As you can see, select delegates to doSelect. Dubbo's concrete consistent Hash implementation class is ConsistentHashLoadBalance; let's look at what its doSelect method does:

@Override
protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
    String methodName = RpcUtils.getMethodName(invocation);
    // Key format: interfaceName.methodName
    String key = invokers.get(0).getUrl().getServiceKey() + "." + methodName;
    // identityHashCode is used to identify whether invokers have changed
    int identityHashCode = System.identityHashCode(invokers);
    ConsistentHashSelector<T> selector = (ConsistentHashSelector<T>) selectors.get(key);
    // Initialize a selector for this "interface.methodName" key if none exists,
    // or if the Invoker list has changed
    if (selector == null || selector.identityHashCode != identityHashCode) {
        selectors.put(key, new ConsistentHashSelector<T>(invokers, methodName, identityHashCode));
        selector = (ConsistentHashSelector<T>) selectors.get(key);
    }
    return selector.select(invocation);
}

There's an important concept here: the selector. It is the data structure that holds the entire mapping in Dubbo's consistent Hash implementation, with the following main fields:

/** TreeMap storing the mapping from Hash values to (virtual) nodes */
private final TreeMap<Long, Invoker<T>> virtualInvokers;

/** Number of virtual nodes per Invoker */
private final int replicaNumber;

/** Hash code identifying whether the Invoker list has changed */
private final int identityHashCode;

/** Indexes of the request parameters used for the Hash mapping */
private final int[] argumentIndex;

When a ConsistentHashSelector is created, it iterates over all Invoker objects, computes the MD5 digests of their addresses (IP + port), and initializes all service nodes and virtual nodes according to the value of replicaNumber:

ConsistentHashSelector(List<Invoker<T>> invokers, String methodName, int identityHashCode) {
    this.virtualInvokers = new TreeMap<Long, Invoker<T>>();
    this.identityHashCode = identityHashCode;
    URL url = invokers.get(0).getUrl();
    // Get the number of configured nodes
    this.replicaNumber = url.getMethodParameter(methodName, HASH_NODES, 160);
    // Gets the index of the configured parameter used for the Hash mapping
    String[] index = COMMA_SPLIT_PATTERN.split(url.getMethodParameter(methodName, HASH_ARGUMENTS, "0"));
    argumentIndex = new int[index.length];
    for (int i = 0; i < index.length; i++) {
        argumentIndex[i] = Integer.parseInt(index[i]);
    }
    // Iterate over all Invoker objects
    for (Invoker<T> invoker : invokers) {
        // Obtain the IP address and port of the Provider
        String address = invoker.getUrl().getAddress();
        // Each 16-byte MD5 digest yields 4 Hash values, so loop replicaNumber / 4 times
        for (int i = 0; i < replicaNumber / 4; i++) {
            byte[] digest = md5(address + i);
            for (int h = 0; h < 4; h++) {
                long m = hash(digest, h);
                virtualInvokers.put(m, invoker);
            }
        }
    }
}

Take the default replicaNumber of 160 as an example. If the Invoker currently being traversed has the address 127.0.0.1:20880, the loop computes the MD5 digests of "127.0.0.1:208800", "127.0.0.1:208801", and so on up to "127.0.0.1:2088039" (address + i for i from 0 to 39). Each digest is then hashed four times. The rationale: MD5 produces a 16-byte digest, and each of the four hash calls reads a different 4-byte slice of it, so one digest yields four distinct ring positions. 40 digests x 4 slices = 160 virtual nodes, at a quarter of the MD5 cost.
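The slicing is visible in the hash helper (essentially as in Dubbo's source; each call reads a different 4-byte window of the 16-byte digest):

// Assemble bytes [number*4 .. number*4 + 3] of the MD5 digest into an
// unsigned 32-bit value, so one digest yields 4 distinct ring positions
private long hash(byte[] digest, int number) {
    return (((long) (digest[3 + number * 4] & 0xFF) << 24)
            | ((long) (digest[2 + number * 4] & 0xFF) << 16)
            | ((long) (digest[1 + number * 4] & 0xFF) << 8)
            | (digest[number * 4] & 0xFF))
            & 0xFFFFFFFFL;
}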

virtualInvokers.put(m, invoker) stores the mapping between the computed Hash value and the Invoker.

In simple terms, this code creates replicaNumber virtual nodes for each Invoker; each node is a Hash-value-to-Invoker entry stored in the TreeMap.

Mapping requests

Let's return to ConsistentHashLoadBalance's doSelect method: if no selector is found, a new one is created; once the selector is available, its select method is called:

public Invoker<T> select(Invocation invocation) {
    // The key is built from the invocation's argument values; by default the first argument is used for the hash
    String key = toKey(invocation.getArguments());
    // Get the MD5 encoding of parameter value
    byte[] digest = md5(key);
    return selectForKey(hash(digest, 0));
}

// Get the parameters according to the parameter index and concatenate all the parameters into a string
private String toKey(Object[] args) {
    StringBuilder buf = new StringBuilder();
    for (int i : argumentIndex) {
        if (i >= 0 && i < args.length) {
            buf.append(args[i]);
        }
    }
    return buf.toString();
}

// Find Invoker according to the MD5 encoding of the argument string
private Invoker<T> selectForKey(long hash) {
    Map.Entry<Long, Invoker<T>> entry = virtualInvokers.ceilingEntry(hash);
    if (entry == null) {
        entry = virtualInvokers.firstEntry();
    }
    return entry.getValue();
}

argumentIndex is assigned when the Selector is initialized and indicates which request arguments participate in the Hash that picks the Invoker. Take methodA(Integer a, Integer b, Integer c) as an example: if argumentIndex is {0, 2}, the Hash value is computed from the concatenated string of a and c.
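Both knobs used by the constructor are per-method parameters. Assuming the standard keys hash.arguments and hash.nodes seen in the constructor above (the interface and method names here are hypothetical), the XML form would be:

<dubbo:reference interface="com.example.DemoService" loadbalance="consistenthash">
    <dubbo:method name="methodA">
        <!-- Hash on the 1st and 3rd arguments instead of only the 1st -->
        <dubbo:parameter key="hash.arguments" value="0,2" />
        <!-- 320 virtual nodes per provider instead of the default 160 -->
        <dubbo:parameter key="hash.nodes" value="320" />
    </dubbo:method>
</dubbo:reference>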

We already know that virtualInvokers is a TreeMap, whose underlying implementation is a red-black tree. TreeMap's ceilingEntry(hash) method returns the first entry whose key is greater than or equal to the value passed in. As you can see, this is exactly the lookup logic of the ordinary consistent Hash algorithm.
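A quick demonstration of ceilingEntry's semantics; note that it is inclusive ("greater than or equal to"), so a request landing exactly on a node's position maps to that node:

import java.util.TreeMap;

public class CeilingDemo {
    public static void main(String[] args) {
        TreeMap<Long, String> map = new TreeMap<>();
        map.put(100L, "Node1");
        map.put(200L, "Node2");

        System.out.println(map.ceilingEntry(150L)); // 200=Node2: first key >= 150
        System.out.println(map.ceilingEntry(200L)); // 200=Node2: ceiling is inclusive
        System.out.println(map.ceilingEntry(250L)); // null: no key >= 250, caller wraps via firstEntry()
    }
}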

The wrap-around logic differs slightly, though. With pure modulo arithmetic, a value past the maximum automatically wraps back to 0. Here, when no key is greater than or equal to the element passed to ceilingEntry(), virtualInvokers.ceilingEntry(hash) returns null, and virtualInvokers.firstEntry() is then used to fetch the first entry of the whole TreeMap, closing the ring.

Once selectForKey returns the Invoker, the load balancing policy has done its work. Subsequent steps of the call, such as obtaining the remote call client, are not described here.

