The traversal process of Redis dictionary is quite complicated, and there is very little analysis and explanation on this part on the Internet. I also spent a lot of time sorting out the details of the source code to present my personal understanding of dictionary traversal logic to the reader. Readers may have a better understanding of dictionary traversal than I do, and please feel free to comment.

Edit as you go through it

We know that the backbone of the Redis object tree is a dictionary, which can be large if there are many objects. When we use the keys command to search for keys for a given pattern, it traverses the entire trunk dictionary. It is important to note that during traversal, if a key that meets the pattern matching criteria is found, it is also necessary to determine whether the object that the key points to has expired. If it is out of date, the key needs to be removed from the trunk dictionary.

void keysCommand(client *c) {
    dictIterator *di; / / the iterator
    dictEntry *de; // Iterates the current entry
    sds pattern = c->argv[1]->ptr; // matching mode parameters for keys
    int plen = sdslen(pattern);
    int allkeys; // Whether to get all keys for instructions like keys *
    unsigned long numkeys = 0;
    void *replylen = addDeferredMultiBulkLength(c);

    // why safe? 
    di = dictGetSafeIterator(c->db->dict);
    allkeys = (pattern[0] = =The '*' && pattern[1] = ='\ 0');
    while((de = dictNext(di)) ! =NULL) {
        sds key = dictGetKey(de);
        robj *keyobj;

        if (allkeys || stringmatchlen(pattern,plen,key,sdslen(key),0)) {
            keyobj = createStringObject(key,sdslen(key));
            // Delete the element if it expires
            if (expireIfNeeded(c->db,keyobj) == 0) {
                addReplyBulk(c,keyobj);
                numkeys++;
            }
            decrRefCount(keyobj);
        }
    }
    dictReleaseIterator(di);
    setDeferredMultiBulkLength(c,replylen,numkeys);
}
Copy the code

So, have you thought about the difficulty of having to modify the dictionary while iterating through it, and the pointer safety issue?

Repeated traversal

The dictionary will be gradually migrated during expansion, and the old and new Hashtables will exist. The two hashtables need to be traversed sequentially, first traversing the old hashtable and then traversing the new hashtable. If a rehashStep is performed during a walk to move elements from an old hashtable that has already been walked over to the new hashtable, does the walk repeat elements? This is where traversal comes in. Let’s look at how Redis solves this problem.

Structure of an iterator

Redis provides two types of iterators for dictionary traversal, one is safe and the other is unsafe.

typedef struct dictIterator {
    dict *d; // Target dictionary object
    long index; // The slot position currently traversed is initialized to -1
    int table; // ht[0] or ht[1]
    int safe; // This property is critical because it indicates whether the iterator is safe
    dictEntry *entry; // The object to which the iterator is currently pointing
    dictEntry *nextEntry; // The object to which the iterator points next
    long long fingerprint; // Iterator fingerprint, where the dictionary is modified during iteration
} dictIterator;

// Get an unsafe iterator, a read-only iterator that allows rehashStep
dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = - 1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;
    return iter;
}

// Get a security iterator that allows expiration processing to be triggered and disallows rehashStep
dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);

    i->safe = 1;
    return i;
}
Copy the code

The “safety” of an iterator means that the dictionary can be looked up and modified during traversal without fear, because look-ups and modifications trigger expiration judgments and delete internal elements. Another meaning of “safe” is that there will be no element duplication during iteration, and rehashStep is disabled to ensure that there is no duplication.

An “unsafe” iterator means that the dictionary is read-only during the traverse and you cannot modify it. You can only call dictNext to continuously traverse the dictionary and cannot call any function that may trigger the expiration judgment. The benefit, however, is that it doesn’t affect rehash, and the cost is that the iterated elements can be duplicated.

The safe iterator marks the dictionary at the beginning of the iteration, with which the rehashStep is not executed and the iteration element is not repeated.

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx;
    // This is the tag that represents the number of security iterators currently added to the dictionary
    unsigned long iterators;
} dict;

// Disable rehash if there are safe iterators
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}
Copy the code

The iterative process

Safe iterators allow elements to be deleted during traversal, meaning that elements in the attached list below the first array of the dictionary might be removed, and the next pointer to the element might change. Does this affect the iteration process? Let’s take a closer look at the code logic for iterating functions.

dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        ifDictht *ht = &iter->d->ht[iter->table]; dictht *ht = &iter->d->ht[iter->table]if(iter->index == -1 && iter->table == 0) {// First traversal, just entered the traversal process // that is, the list below the first element of the ht[0] arrayif(iter->safe) {// Make the dictionary safe to disable the dictionaryrehash
                  iter->d->iterators++;
                } elseIter ->fingerprint = dictFingerprint(iter->d); // If there is any change in the dictionary during traversal, the fingerprint will change iter->fingerprint = dictFingerprint(iter->d); } } iter->index++; // index=0 to enter the first slotif(iter->index >= (long) ht->size) {// The last slot is iteratedif(dictIsRehashing(iter->d) && iter->table == 0) {// If atrehashGo through the second hashtable iter->table++; iter->index = 0; ht = &iter->d->ht[1]; }else{// End traversalbreak; Iter ->entry = ht->table[iter->index]; }elseIter ->entry = iter->nextEntry; }if(iter->entry) {// It is very important to record the next element in the iterator. // In case the current element is expired and deleted during safe iteration, the next element can not be foundrehash// The old list will be split in two, and then reattached to two slots in the new array. // The result is that the current list will be iterated over again. // IfrehashIter ->nextEntry = iter->entry->next;returniter->entry; }}returnNULL; } // To release the iterator after traversal, the safe iterator needs to undisable the dictionaryrehashVoid dictReleaseIterator(dictIterator *iter) {// Non-secure iterators also need to check for fingerprints. If they change, the server will fail.if(! (iter->index == -1 && iter->table == 0)) {if(iter->safe) iter->d->iterators--; // Remove the banrehashThe tagelseassert(iter->fingerprint == dictFingerprint(iter->d)); } zfree(iter); } // If only the value of one element is changed, the fingerprint will be changed. DictFingerprint (dict * D) {long long integers[6],hash = 0;
    int j;

    integers[0] = (long) d->ht[0].table;
    integers[1] = d->ht[0].size;
    integers[2] = d->ht[0].used;
    integers[3] = (long) d->ht[1].table;
    integers[4] = d->ht[1].size;
    integers[5] = d->ht[1].used;

    for (j = 0; j < 6; j++) {
        hash += integers[j];
        hash = (~hash) + (hash< < 21);hash = hash ^ (hash> > 24);hash = (hash + (hash< < + (3))hash< < 8);hash = hash ^ (hash> > 14);hash = (hash + (hash< < 2)) + (hash< < 4);hash = hash ^ (hash> > 28);hash = hash + (hash< < 31); }return hash;
}
Copy the code

It is worth noting that the dictionary is rehashed when it is expanded, migrating the linked list from the old array to the new one. A linked list in a specific slot can be migrated to only two slots in the new array.

hash mod 2^n = k
hash mod 2^(n+1) = k or k+2^n
Copy the code

Iterator selection

Except that the keys directive uses a safe iterator because the result is not allowed to be repeated. Are there other places where safe iterators are used, and when are unsafe iterators appropriate for traversal?

To put it more simply, use SafeIterator if repetition is not allowed during traversal, as in the following two cases

  1. Bgaofrewrite needs to persist over all the object conversion instructions, with no room for duplication
  2. Bgsave also needs to iterate over all objects to persist, again without allowing duplication

If you need to deal with expired elements and modify the dictionary during traversal, you must also use SafeIterator because unsafe iterators are read-only.

In other cases, where individual elements are allowed to repeat during traversal and no structural changes to the dictionary are required, unsafe iterators are used.

thinking

Keep thinking about what rehash does to the non-secure traversal process. What elements are repeated, and how much or how little is repeated?

Scan the wechat public account “code hole” and take you to “code out the future” step by step.