preface

Scan is a redis scan database command, compared to keys command has better performance, scan should be included in the dict section, but I forgot to open a new article to analyze the source code implementation of scan command

The body of the

The scan command is used to iterate over keys in the database. It is a cursor based iteration command. During the first iteration, the cursor is specified as 0, and a new cursor is returned after each iteration.

SCAN cursor [MATCH pattern] [COUNT count] [Type typename]
Copy the code
  • Cursor cursor
  • Pattern Matching
  • Count Returns the number
  • Typename Specifies the type of the value to be scanned

Examples of the scan command are:

  • The SSCAN command is used to iterate over elements in a collection key.
  • The HSCAN command is used to iterate over the key-value pairs in a hash key.
  • The ZSCAN command is used to iterate over elements in an ordered collection (including element members and element score)

The difference between these commands is that the scan command applies to the whole DB, and these commands need to query the corresponding data structure through key and then iterate.

void scanCommand(client *c) {
    unsigned long cursor;
    if (parseScanCursorOrReply(c,c->argv[1],&cursor) == C_ERR) return;
    scanGenericCommand(c,NULL,cursor);
}

void zscanCommand(client *c) {
    robj *o;
    unsigned long cursor;

    if (parseScanCursorOrReply(c,c->argv[2],&cursor) == C_ERR) return;
    if ((o = lookupKeyReadOrReply(c,c->argv[1],shared.emptyscan)) == NULL ||
        checkType(c,o,OBJ_ZSET)) return;
    scanGenericCommand(c,o,cursor);
}
Copy the code

As can be seen from the source code, ZScan needs to obtain the object through key, and then determine whether it is of ZSET type, and finally call the scanGenericCommand method

void scanGenericCommand(client *c, robj *o, unsigned long cursor) {
    int i, j;
    list *keys = listCreate();
    listNode *node, *nextnode;
    long count = 10;
    sds pat = NULL;
    sds typename = NULL;
    int patlen = 0, use_pattern = 0;
    dict *ht;


    serverAssert(o == NULL || o->type == OBJ_SET || o->type == OBJ_HASH ||
                o->type == OBJ_ZSET);

    // if o is null, the index of the entire db is changed
    i = (o == NULL)?2 : 3;

    /* Step 1: Parse options. */
    while (i < c->argc) {
        j = c->argc - i;
        if(! strcasecmp(c->argv[i]->ptr,"count") && j >= 2) {
            if (getLongFromObjectOrReply(c, c->argv[i+1], &count, NULL)
                != C_OK)
            {
                goto cleanup;
            }

            if (count < 1) {
                addReply(c,shared.syntaxerr);
                goto cleanup;
            }

            i += 2;
        } else if(! strcasecmp(c->argv[i]->ptr,"match") && j >= 2) {
            pat = c->argv[i+1]->ptr; patlen = sdslen(pat); use_pattern = ! (pat[0] = =The '*' && patlen == 1);

            i += 2;
        } else if(! strcasecmp(c->argv[i]->ptr,"type") && o == NULL && j >= 2) {
            /* SCAN for a particular type only applies to the db dict */
            typename = c->argv[i+1]->ptr;
            i+= 2;
        } else {
            addReply(c,shared.syntaxerr);
            gotocleanup; }}Copy the code

A list is constructed to hold the keys of the iteration, followed by parsing the parameters

ht = NULL;
    // The default db is used for objects that are not specified
    if (o == NULL) {
        ht = c->db->dict;
        // Use the hashset of the hash result
    } else if (o->type == OBJ_SET && o->encoding == OBJ_ENCODING_HT) {
        ht = o->ptr;
        // Iterating hash and skiplist count needs to be multiplied by 2 to return key-val
    } else if (o->type == OBJ_HASH && o->encoding == OBJ_ENCODING_HT) {
        ht = o->ptr;
        count *= 2;
    } else if (o->type == OBJ_ZSET && o->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = o->ptr;
        ht = zs->dict;
        count *= 2;
    }
Copy the code

Ht is initialized according to the type of o passed in to adapt to commands such as Zscan and HSCAN. By default, null indicates that the entire DB is scanned

if (ht) {
        void *privdata[2];
        // The number of iterations is 10 times that of the hash. To prevent too many iterations of the Hashtable from being sparse
        long maxiterations = count*10;
        
        privdata[0] = keys;
        privdata[1] = o;
        do {
            cursor = dictScan(ht, cursor, scanCallback, NULL, privdata);
        } while (cursor &&
              maxiterations-- &&
              listLength(keys) < (unsigned long)count);
Copy the code

The Hashtable is scanned by dictScan method

unsigned long dictScan(dict *d, unsigned long v, dictScanFunction *fn, dictScanBucketFunction* bucketfn, void *privdata) { dictht *t0, *t1; const dictEntry *de, *next; unsigned long m0, m1; if (dictSize(d) == 0) return 0; d->iterators++; if (! dictIsRehashing(d)) { t0 = &(d->ht[0]); m0 = t0->sizemask; if (bucketfn) bucketfn(privdata, &t0->table[v & m0]); de = t0->table[v & m0]; while (de) { next = de->next; fn(privdata, de); de = next; } / m0 = * * * 2 ^ n - 1 ~ m0 = 0 v * here should be the same * / v | = ~ m0; /** */ v = rev(v) +1 */ v = rev(v); v++; v = rev(v);Copy the code

A hashtable scan is divided into two cases. The first case is when there is no Rehash. Get the slot you want to scan through the cursor, and add all the data under a slot to the list.

Note how the cursor is updated. The cursor is updated by +1 at each high level, in this order:

Binary: 000 100 010 110 001 101 111 Decimal: 0 4 2 6 1 5 7

If the original hashtable had 4 slots, it would be as follows:

00, 01 November

With eight slots, it looks like this:

000 100 010 110 001 101 011 111

The original data will be partitioned to the higher level. If we increase by 1 from the higher level, the repeated traversal will not occur during capacity expansion, but may occur during capacity reduction

else {
        t0 = &d->ht[0];
        t1 = &d->ht[1];

        // Make sure t0 is a small table
        if (t0->size > t1->size) {
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }

        m0 = t0->sizemask;
        m1 = t1->sizemask;

        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        do {
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);
            de = t1->table[v & m1];
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }

            /* Increment the reverse cursor not covered by the smaller mask.*/
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);

            / * * * as m1 is a big table So on v | = ~ v is highest when the m1 0 * in additional to rev (v) the not add will appear inconsistent * ^ m1 (m0) & v is to judge whether the high of the v is 1 * 00100 incoming cursor, First traversal * 10100 Second traversal * 01100 Third traversal * 11100 Fourth traversal * 00010 Returned cursor */
        } while (v & (m0 ^ m1));
    }
    
    d->iterators--;

    return v;
Copy the code

When a rehash occurs, the original table data is collected first and then the new table is collected, visible in comments

else if (o->type == OBJ_SET) {
        int pos = 0;
        int64_t ll;

        while(intsetGet(o->ptr,pos++,&ll))
            listAddNodeTail(keys,createStringObjectFromLongLong(ll));
        cursor = 0;
    } else if (o->type == OBJ_HASH || o->type == OBJ_ZSET) {
        unsigned char *p = ziplistIndex(o->ptr,0);
        unsigned char *vstr;
        unsigned int vlen;
        long long vll;

        // loop through the ziplist into the list
        while(p) { ziplistGet(p,&vstr,&vlen,&vll); listAddNodeTail(keys, (vstr ! =NULL)? createStringObject((char*)vstr,vlen) :
                                 createStringObjectFromLongLong(vll));
            p = ziplistNext(o->ptr,p);
        }
        cursor = 0;
    } else {
        serverPanic("Not handled encoding in SCAN.");
    }

Copy the code

Continuing with the scanGenericCommand code, the set and Ziplist are easy to traverse all nodes in one direction, and the cursor ends up being 0

node = listFirst(keys);
    while (node) {
        robj *kobj = listNodeValue(node);
        nextnode = listNextNode(node);
        int filter = 0;

        // Delete the element from keys when filter is 1
        if(! filter && use_pattern) {if (sdsEncodedObject(kobj)) {
                // The re match failed
                if(! stringmatchlen(pat, patlen, kobj->ptr, sdslen(kobj->ptr),0))
                    filter = 1;
            } else {
                char buf[LONG_STR_SIZE];
                int len;

                serverAssert(kobj->encoding == OBJ_ENCODING_INT);
                len = ll2string(buf,sizeof(buf),(long)kobj->ptr);
                if(! stringmatchlen(pat, patlen, buf, len,0)) filter = 1; }}// Filter unmatched types
        if(! filter && o ==NULL && typename){
            robj* typecheck = lookupKeyReadWithFlags(c->db, kobj, LOOKUP_NOTOUCH);
            char* type = getObjectTypeName(typecheck);
            if (strcasecmp((char*) typename, type)) filter = 1;
        }

        // Filter expired keys
        if(! filter && o ==NULL && expireIfNeeded(c->db, kobj)) filter = 1;
        
        // Delete a node
        if (filter) {
            decrRefCount(kobj);
            listDelNode(keys, node);
        }

        // Since hash or zset contains key-val, skip its value node and delete it if filtered
        if (o && (o->type == OBJ_ZSET || o->type == OBJ_HASH)) {
            node = nextnode;
            nextnode = listNextNode(node);
            if (filter) {
                kobj = listNodeValue(node);
                decrRefCount(kobj);
                listDelNode(keys, node);
            }
        }
        node = nextnode;
    }

    // Return to the client
    addReplyArrayLen(c, 2);
    addReplyBulkLongLong(c,cursor);

    addReplyArrayLen(c, listLength(keys));
    while((node = listFirst(keys)) ! =NULL) {
        robj *kobj = listNodeValue(node);
        addReplyBulk(c, kobj);
        decrRefCount(kobj);
        listDelNode(keys, node);
    }

Copy the code

The last step is to filter the data according to match and type, mark the unqualified data filter as true, and finally return the remaining data to the client after deleting the node.

conclusion

Cursor reflects the subscripts of the current Hashtable. Increasing the cursor position prevents rehash data location redistribution after expansion, which may cause duplicate keys in the next scan command. However, duplicate keys may occur in the scaling down of the scan command.