
Redis Cluster request routing

Request redirection

When issuing any key command, a Redis client can compute the key's slot, look up the node that owns that slot, and send the command directly to that node. A MOVED redirection can still occur along the way: if the key's slot has been migrated and the migration has completed, but the client's local slot<->node mapping cache has not been updated, the Redis server replies with a MOVED redirection. The reply carries the details of the redirection (the slot and the address of the node that now owns it), so the client can update its local cache and reissue the command to the new node.

Note: Jedis treats ASK and MOVED redirections differently. An ASK redirection means the slot is still being migrated: the client leaves its local cache unchanged and uses the redirection only temporarily, for the next request. A MOVED redirection means the migration has finished but the local cache is stale, so the client must refresh its cache with the latest information.
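To make the difference concrete, here is a minimal sketch of how a client might interpret the two error replies. The reply layout (`MOVED <slot> <host>:<port>` and `ASK <slot> <host>:<port>`) follows the Redis Cluster protocol; the names `RedirectionSketch`, `Redirection`, and `parse` are hypothetical and are not Jedis APIs.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Jedis. */
class RedirectionSketch {

    enum Kind { MOVED, ASK }

    static final class Redirection {
        final Kind kind;
        final int slot;
        final String host;
        final int port;

        Redirection(Kind kind, int slot, String host, int port) {
            this.kind = kind;
            this.slot = slot;
            this.host = host;
            this.port = port;
        }

        /** Only MOVED means the slot has permanently changed owner. */
        boolean shouldUpdateSlotCache() {
            return kind == Kind.MOVED;
        }

        /** ASK is a one-off redirect: send ASKING first, leave the cache alone. */
        boolean requiresAsking() {
            return kind == Kind.ASK;
        }
    }

    /** Parses "MOVED 3999 127.0.0.1:6381" or "ASK 3999 127.0.0.1:6381". */
    static Optional<Redirection> parse(String errorMessage) {
        String[] parts = errorMessage.trim().split("\\s+");
        if (parts.length != 3) {
            return Optional.empty();
        }
        Kind kind;
        if ("MOVED".equals(parts[0])) {
            kind = Kind.MOVED;
        } else if ("ASK".equals(parts[0])) {
            kind = Kind.ASK;
        } else {
            return Optional.empty();
        }
        int colon = parts[2].lastIndexOf(':');
        if (colon < 0) {
            return Optional.empty();
        }
        try {
            int slot = Integer.parseInt(parts[1]);
            String host = parts[2].substring(0, colon);
            int port = Integer.parseInt(parts[2].substring(colon + 1));
            return Optional.of(new Redirection(kind, slot, host, port));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Redirection r = parse("MOVED 3999 127.0.0.1:6381").get();
        System.out.println(r.kind + " slot=" + r.slot + " -> " + r.host + ":" + r.port
                + ", update cache: " + r.shouldUpdateSlotCache());
    }
}
```

The two boolean helpers capture the rule from the note: a MOVED reply should lead to a cache refresh, while an ASK reply should only affect the next request.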

Key slot calculation

By default, the slot of a key is computed by taking the CRC16 hash of the key modulo 16384. Normally the whole key is hashed, but batch operations such as pipelines, MGET, and MSET cannot cross slots. For these cases Redis supports a hash_tag in key names, for example test1:{abc}:test2. When the slot of such a key is computed, only the content inside {} is hashed; that content is called the hash_tag. In practice, hash_tags are a good way to keep the keys touched by a pipeline or a Lua script in the same slot.
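The calculation described above can be sketched as follows. This is an illustrative reimplementation, not the Jedis code (Jedis ships its own `JedisClusterCRC16`); the class name `HashSlotSketch` is hypothetical. It assumes the CRC16 variant used by Redis Cluster (polynomial 0x1021, initial value 0) and the hash tag rule that only the text between the first `{` and the following `}` is hashed when that text is non-empty.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative slot calculation; hypothetical class, not a Jedis API. */
class HashSlotSketch {

    static final int SLOT_COUNT = 16384;

    /** Bitwise CRC16 (XMODEM variant): polynomial 0x1021, initial value 0. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Returns the slot of a key, honoring a non-empty {hash_tag} if present. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                // Hash only the text between the braces
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // Keys sharing the tag {abc} land in the same slot
        System.out.println(slot("test1:{abc}:test2") == slot("{abc}:other"));
    }
}
```

Because both keys in `main` share the tag `abc`, they map to the same slot and can be used together in one multi-key command.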

Jedis client analysis

The Jedis client maintains a slot→node mapping so that it can resolve a key to its node locally, which keeps I/O efficiency as high as possible. MOVED redirections help the Jedis client keep this slot→node mapping up to date. Jedis works with Redis Cluster as follows:

1. During client initialization, JedisCluster sends the CLUSTER SLOTS command to a randomly selected node to initialize the slot→node cache.

2. JedisCluster parses the CLUSTER SLOTS response, saves the information in JedisClusterInfoCache, and creates a separate JedisPool connection pool for each node.

3. Execute corresponding key commands. This process is relatively complex.

**a.** Calculate the slot, obtain the connection to the target node from the slot cache, and send the command.

**b.** If a connection error occurs, release the connection and retry the command, reducing the remaining attempts by 1 each time. When the attempts run out, the slot cache is refreshed and the original exception is rethrown.

**c.** If a MOVED redirection error is caught, refresh the slot cache using the CLUSTER SLOTS command (the renewSlotCache method).

**d.** Repeat steps a to c until the command succeeds, or until maxAttempts <= 0, at which point JedisClusterMaxRedirectionsException is thrown.
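Steps 1, 2, and c all revolve around the slot→node cache. The following is a simplified sketch of such a cache, under the assumption that each CLUSTER SLOTS entry has already been reduced to a start slot, an end slot, and a master address. The class `SlotCacheSketch` and its methods are hypothetical; the real JedisClusterInfoCache stores JedisPool objects rather than address strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a slot cache; node addresses replace connection pools. */
class SlotCacheSketch {

    static final int SLOT_COUNT = 16384;

    private final Map<Integer, String> slotToNode = new HashMap<>();

    /** Records one slot range taken from a CLUSTER SLOTS style reply. */
    void assignRange(int startSlot, int endSlot, String hostAndPort) {
        if (startSlot < 0 || endSlot >= SLOT_COUNT || startSlot > endSlot) {
            throw new IllegalArgumentException("invalid slot range");
        }
        for (int slot = startSlot; slot <= endSlot; slot++) {
            slotToNode.put(slot, hostAndPort);
        }
    }

    /** Returns the cached owner of a slot, or null if the slot is unknown. */
    String nodeForSlot(int slot) {
        return slotToNode.get(slot);
    }

    /** Drops all entries so the cache can be rebuilt after a MOVED redirection. */
    void clear() {
        slotToNode.clear();
    }

    public static void main(String[] args) {
        SlotCacheSketch cache = new SlotCacheSketch();
        cache.assignRange(0, 5460, "127.0.0.1:6379");
        cache.assignRange(5461, 10922, "127.0.0.1:6380");
        cache.assignRange(10923, 16383, "127.0.0.1:6381");
        System.out.println(cache.nodeForSlot(3999));

        // After a MOVED reply, rebuild the cache from a fresh CLUSTER SLOTS result
        cache.clear();
        cache.assignRange(0, 16383, "127.0.0.1:6381");
        System.out.println(cache.nodeForSlot(3999));
    }
}
```

Rebuilding the whole mapping on MOVED, rather than patching a single slot, mirrors what the process above describes for renewSlotCache.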

The relevant implementation is as follows (Jedis 2.9.0):

// redis.clients.jedis.JedisClusterCommand
public abstract class JedisClusterCommand<T> {
    // Cluster nodes connect to processors
    private JedisClusterConnectionHandler connectionHandler;
    // The maximum number of retries. The default is 5
    private int maxAttempts;
    private ThreadLocal<Jedis> askConnection = new ThreadLocal();

    public JedisClusterCommand(JedisClusterConnectionHandler connectionHandler, int maxAttempts) {
        this.connectionHandler = connectionHandler;
        this.maxAttempts = maxAttempts;
    }
    // Template callback method
    public abstract T execute(Jedis var1);

    public T run(String key) {
        if (key == null) {
            throw new JedisClusterException("No way to dispatch this command to Redis Cluster.");
        } else {
            return this.runWithRetries(SafeEncoder.encode(key), this.maxAttempts, false, false);
        }
    }

    // Execute the command with retries
    private T runWithRetries(byte[] key, int attempts, boolean tryRandomNode, boolean asking) {
        // Throw JedisClusterMaxRedirectionsException once the attempts are exhausted
        if (attempts <= 0) {
            throw new JedisClusterMaxRedirectionsException("Too many Cluster redirections?");
        } else {
            Jedis connection = null;

            T var7;
            try {
                // A previous attempt received an ASK redirection: send ASKING on the
                // connection to the target node before executing the command
                if (asking) {
                    connection = (Jedis)this.askConnection.get();
                    connection.asking();
                    asking = false;
                } else if (tryRandomNode) {
                    // A random node was requested: take a connection from any live node
                    connection = this.connectionHandler.getConnection();
                } else {
                    // Use the slot cache to get the target connection
                    connection = this.connectionHandler.getConnectionFromSlot(JedisClusterCRC16.getSlot(key));
                }

                return this.execute(connection);
            } catch (JedisNoReachableClusterNodeException var13) {
                throw var13;
            } catch (JedisConnectionException var14) {
                // The connection is abnormal. Release the connection
                this.releaseConnection(connection);
                connection = null;
                if (attempts <= 1) {

                    this.connectionHandler.renewSlotCache();
                    throw var14;
                }

                var7 = this.runWithRetries(key, attempts - 1, tryRandomNode, asking);
                return var7;
            } catch (JedisRedirectionException var15) {
                if (var15 instanceof JedisMovedDataException) {
                    // If the MOVED redirection exception is displayed, run cluster slots to obtain cluster information and refresh the cache
                    this.connectionHandler.renewSlotCache(connection);
                }

                this.releaseConnection(connection);
                connection = null;
                if (var15 instanceof JedisAskDataException) {
                    asking = true;
                    this.askConnection.set(this.connectionHandler.getConnectionFromNode(var15.getTargetNode()));
                } else if (!(var15 instanceof JedisMovedDataException)) {
                    // Neither ASK nor MOVED: wrap and rethrow. For MOVED, the slot cache was
                    // already refreshed above via renewSlotCache, so the command is simply retried.
                    throw new JedisClusterException(var15);
                }

                // Retry with one fewer attempt remaining
                var7 = this.runWithRetries(key, attempts - 1, false, asking);
            } finally {
                this.releaseConnection(connection);
            }

            return var7;
        }
    }
    // Remaining members (such as releaseConnection) are omitted here
}

Problem analysis:

1. Internally, JedisCluster maintains the mapping between slots and cluster nodes, and keeps a separate JedisPool for each node. Each pool holds multiple connections.

2. A common exception is JedisClusterMaxRedirectionsException (too many redirections). The usual cause is that a node is down or a connection times out, which throws JedisConnectionException. That exception triggers a retry, and JedisClusterMaxRedirectionsException is thrown once maxAttempts <= 0.

3. When Jedis receives a JedisConnectionException, it treats the node connection as faulty, retries on a random node, and updates the local JedisClusterInfoCache. This exception is thrown in the following cases:

**a.** A socket error occurs on the connection to a node.

**b.** Any command or Lua script times out while reading or writing.

**c.** In older Jedis versions, a timeout while obtaining a Jedis object from JedisPool also threw this exception. Since 2.8.1, a connection pool timeout throws JedisException instead, to avoid triggering random retries.

4. Redis Cluster supports automatic failover, but failover takes some time. While a node is down, every command directed to that node triggers random retries. The code is as follows:


public void renewClusterSlots(Jedis jedis) {
    if (!this.rediscovering) {
        try {
            // Acquire the write lock
            this.w.lock();
            this.rediscovering = true;
            if (jedis != null) {
                try {
                    this.discoverClusterSlots(jedis);
                    return;
                } catch (JedisException var17) {
                    // Fall through and try the nodes from all pools
                }
            }

            // If no connection was passed in (or it failed), walk the node pools in shuffled order
            // and send CLUSTER SLOTS to each one until the slot allocation is obtained
            Iterator var2 = this.getShuffledNodesPool().iterator();

            while(var2.hasNext()) {
                JedisPool jp = (JedisPool)var2.next();

                try {
                    jedis = jp.getResource();
                    this.discoverClusterSlots(jedis);
                    return;
                } catch (JedisConnectionException var15) {
                    ;
                } finally {
                    if (jedis != null) {
                        jedis.close();
                    }
                }
            }
        } finally {
            this.rediscovering = false;
            this.w.unlock();
        }
    }
}

When a few nodes misbehave, the slot cache is refreshed frequently and the CLUSTER SLOTS command is invoked many times; under high concurrency this consumes resources on the Redis nodes. The larger the cluster's slot<->node mapping, the more data CLUSTER SLOTS returns and the more bandwidth it uses, which makes the problem worse. When a JedisConnectionException occurs, up to five commands are sent: four retries plus one CLUSTER SLOTS command. CLUSTER SLOTS is executed only once because the rediscovering flag ensures that only one thread updates the cache at a time.
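The "only one refresh at a time" guard can be illustrated with a small sketch. Note that this is a swapped-in technique: the Jedis code above uses a boolean flag together with a write lock, whereas this sketch uses `AtomicBoolean.compareAndSet` to get a similar skip-if-busy effect. The class `SingleRefreshSketch` and the `loadSlots` callback are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard that lets only one caller refresh the slot cache at a time. */
class SingleRefreshSketch {

    private final AtomicBoolean refreshing = new AtomicBoolean(false);
    private final AtomicInteger completedRefreshes = new AtomicInteger();

    /**
     * Runs loadSlots unless a refresh is already in progress.
     * Returns true if this call performed the refresh, false if it was skipped.
     */
    boolean refresh(Runnable loadSlots) {
        if (!refreshing.compareAndSet(false, true)) {
            return false; // another caller is already refreshing
        }
        try {
            loadSlots.run(); // stands in for sending CLUSTER SLOTS
            completedRefreshes.incrementAndGet();
            return true;
        } finally {
            refreshing.set(false);
        }
    }

    int completedRefreshes() {
        return completedRefreshes.get();
    }

    public static void main(String[] args) {
        SingleRefreshSketch guard = new SingleRefreshSketch();
        // A refresh requested while one is running is skipped rather than queued
        boolean outer = guard.refresh(() ->
                System.out.println("nested refresh ran: " + guard.refresh(() -> { })));
        System.out.println("outer refresh ran: " + outer);
    }
}
```

Skipping instead of queuing is the point: callers that arrive during a refresh will see the updated cache on their next attempt, so a second CLUSTER SLOTS call would only add load.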