As we all know, Redis is an in-memory storage middleware. So what happens when it goes down: is the data lost? Without persistence, yes. That is why Redis provides two persistence mechanisms, RDB and AOF, to keep data from being lost when the server goes down.

RDB periodically dumps all the data in memory to disk, so that it can be loaded directly at the next startup. The problem is that RDB only captures a snapshot of the data at one point in time, so changes made after the snapshot was created can still be lost. Redis therefore also provides AOF (Append Only File), which appends every change to disk one by one. This post focuses on the implementation of RDB persistence; AOF is left for the next post.

RDB-related source code

In Redis, an RDB save can be triggered in the following ways.

The save command

Running the SAVE command in redis-cli generates an RDB file synchronously. If no child process is already generating an RDB in the background, saveCommand() calls rdbSave() to write the RDB file to disk.

void saveCommand(client *c) {
    // If a child process is already producing an RDB in the background, refuse to run SAVE.
    if (server.child_type == CHILD_TYPE_RDB) {
        addReplyError(c,"Background save already in progress");
        return;
    }
    rdbSaveInfo rsi, *rsiptr;
    rsiptr = rdbPopulateSaveInfo(&rsi);
    if (rdbSave(server.rdb_filename,rsiptr) == C_OK) {
        addReply(c,shared.ok);
    } else {
        addReplyErrorObject(c,shared.err);
    }
}

rdbSave() is the function that actually performs RDB persistence. Its general flow is as follows:

  1. Create a temporary file (temp-<pid>.rdb).
  2. Initialize a rio object. rio is Redis's I/O abstraction and provides read, write, flush, checksum and other methods.
  3. Call rdbSaveRio() to write the full contents of Redis's memory to the temporary file.
  4. Call fflush, fsync and fclose to make sure the data actually reaches the disk.
  5. Use rename() to rename the temporary file to the final RDB file name.
  6. Reset server.dirty, which records how many changes have been made since the last RDB was generated; serverCron uses it to decide when to save.

The specific code is as follows:

/* RDB disk write operation */
int rdbSave(char *filename, rdbSaveInfo *rsi) {
    char tmpfile[256];
    char cwd[MAXPATHLEN]; /* Current working dir path for error messages. */
    FILE *fp = NULL;
    rio rdb;
    int error = 0;

    snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Failed opening the RDB file %s (in server root dir %s) "
            "for saving: %s",
            filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        return C_ERR;
    }

    rioInitWithFile(&rdb,fp);  // Initialize rio on top of the file
    startSaving(RDBFLAGS_NONE);

    if (server.rdb_save_incremental_fsync)
        rioSetAutoSync(&rdb,REDIS_AUTOSYNC_BYTES);
    // Dump memory data to RDB
    if (rdbSaveRio(&rdb,&error,RDBFLAGS_NONE,rsi) == C_ERR) {
        errno = error;
        goto werr;
    }

    /* Flush the data to disk to make sure there is no data left in the operating system buffer */
    if (fflush(fp)) goto werr;
    if (fsync(fileno(fp))) goto werr;
    if (fclose(fp)) { fp = NULL; goto werr; }
    fp = NULL;
    
    /* Rename temporary files to official file names */
    if (rename(tmpfile,filename) == -1) {
        char *cwdp = getcwd(cwd,MAXPATHLEN);
        serverLog(LL_WARNING,
            "Error moving temp DB file %s on the final "
            "destination %s (in server root dir %s): %s",
            tmpfile,
            filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        unlink(tmpfile);
        stopSaving(0);
        return C_ERR;
    }

    serverLog(LL_NOTICE,"DB saved on disk");
    server.dirty = 0;
    server.lastsave = time(NULL);
    server.lastbgsave_status = C_OK;
    stopSaving(1);
    return C_OK;

werr:
    serverLog(LL_WARNING,"Write error saving DB on disk: %s", strerror(errno));
    if (fp) fclose(fp);
    unlink(tmpfile);
    stopSaving(0);
    return C_ERR;
}
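The durability part of rdbSave() (write a temporary file, flush, fsync, close, then rename()) is a general POSIX pattern. Below is a minimal standalone sketch of it, assuming a POSIX system; save_atomically() is a hypothetical helper, not a Redis function, and it writes a raw byte buffer instead of going through rio.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper mirroring rdbSave()'s pattern: write to temp-<pid>.rdb,
 * flush stdio buffers, fsync to disk, then rename() over the final name.
 * Returns 0 on success, -1 on failure. */
int save_atomically(const char *filename, const void *data, size_t len) {
    char tmpfile[256];
    snprintf(tmpfile, sizeof(tmpfile), "temp-%d.rdb", (int)getpid());

    FILE *fp = fopen(tmpfile, "w");
    if (!fp) return -1;

    if (fwrite(data, 1, len, fp) != len) goto werr;
    if (fflush(fp)) goto werr;          /* stdio buffer -> kernel page cache */
    if (fsync(fileno(fp))) goto werr;   /* page cache -> disk */
    if (fclose(fp)) { fp = NULL; goto werr; }
    fp = NULL;

    /* On POSIX file systems rename() replaces the target atomically, so a
     * reader sees either the old complete file or the new complete one. */
    if (rename(tmpfile, filename) == -1) goto werr;
    return 0;

werr:
    if (fp) fclose(fp);
    unlink(tmpfile);                    /* never leave a half-written file */
    return -1;
}
```

With this pattern, `save_atomically("dump.rdb", buf, len)` either leaves the previous dump.rdb untouched or replaces it with a complete new file; a crash in the middle of the write can leave at most a stray temp-<pid>.rdb behind.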

Because Redis executes commands on a single thread, it cannot process any other request while SAVE is running. This guarantees that no data changes during the save, but a save can take a long time, and during that time Redis cannot serve reads or writes, which is unacceptable for an online service. So Redis also provides the BGSAVE command, which saves without blocking normal reads and writes. Let's look at its implementation.

The bgsave command

BGSAVE generates the RDB file in the background ("bg" stands for background). How does it do that? It calls fork() to create a child process and performs the save in the child.

void bgsaveCommand(client *c) {
    int schedule = 0;

    /* The SCHEDULE option changes the behavior of BGSAVE when an AOF rewrite
     * is in progress. Instead of returning an error a BGSAVE gets scheduled. */
    if (c->argc > 1) {
        if (c->argc == 2 && !strcasecmp(c->argv[1]->ptr,"schedule")) {
            schedule = 1;
        } else {
            addReplyErrorObject(c,shared.syntaxerr);
            return;
        }
    }

    rdbSaveInfo rsi, *rsiptr;
    rsiptr = rdbPopulateSaveInfo(&rsi);

    if (server.child_type == CHILD_TYPE_RDB) {
        addReplyError(c,"Background save already in progress");
    } else if (hasActiveChildProcess()) {
        if (schedule) {
            server.rdb_bgsave_scheduled = 1;  // Another child (e.g. an AOF rewrite) is running: defer this BGSAVE to serverCron
            addReplyStatus(c,"Background saving scheduled");
        } else {
            addReplyError(c,
            "Another child process is active (AOF?): can't BGSAVE right now. "
            "Use BGSAVE SCHEDULE in order to schedule a BGSAVE whenever "
            "possible.");
        }
    } else if (rdbSaveBackground(server.rdb_filename,rsiptr) == C_OK) {
        addReplyStatus(c,"Background saving started");
    } else {
        addReplyErrorObject(c,shared.err);
    }
}

int rdbSaveBackground(char *filename, rdbSaveInfo *rsi) {
    pid_t childpid;

    if (hasActiveChildProcess()) return C_ERR;

    server.dirty_before_bgsave = server.dirty;
    server.lastbgsave_try = time(NULL);
    // Create the child process; redisFork() is a wrapper around fork()
    if ((childpid = redisFork(CHILD_TYPE_RDB)) == 0) {
        int retval;

        /* Child process */
        redisSetProcTitle("redis-rdb-bgsave");
        redisSetCpuAffinity(server.bgsave_cpulist);
        retval = rdbSave(filename,rsi);
        if (retval == C_OK) {
            sendChildCowInfo(CHILD_INFO_TYPE_RDB_COW_SIZE, "RDB");
        }
        exitFromChild((retval == C_OK) ? 0 : 1);
    } else {
        /* Parent process */
        if (childpid == -1) {
            server.lastbgsave_status = C_ERR;
            serverLog(LL_WARNING,"Can't save in background: fork: %s",
                strerror(errno));
            return C_ERR;
        }
        serverLog(LL_NOTICE,"Background saving started by pid %ld", (long) childpid);
        server.rdb_save_time_start = time(NULL);
        server.rdb_child_type = RDB_CHILD_TYPE_DISK;
        return C_OK;
    }
    return C_OK; /* unreached */
}
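The control flow of rdbSaveBackground(), plus the later reaping of the child in serverCron(), can be sketched with plain fork() and waitpid(). This is a simplified sketch assuming POSIX; do_save(), save_in_background() and poll_child() are hypothetical names, and the real code goes through redisFork(), exitFromChild() and checkChildrenDone() instead.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for rdbSave(): returns 0 on success. */
static int do_save(void) { return 0; }

/* Like rdbSaveBackground(): fork, save in the child, and hand the child pid
 * back to the parent (or -1 if fork() failed). The parent does not wait. */
pid_t save_in_background(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the exit code reports success (0) or failure (1). _exit()
         * skips atexit handlers and stdio buffers inherited from the parent. */
        _exit(do_save() == 0 ? 0 : 1);
    }
    return pid;  /* parent: child pid, or -1 on fork failure */
}

/* Non-blocking check, called periodically like the cron-side check.
 * Returns -1 while the child is still running, its exit code once it has
 * exited normally, or 1 on waitpid() error / termination by a signal. */
int poll_child(pid_t pid) {
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0) return -1;                          /* still running */
    if (r == pid && WIFEXITED(status)) return WEXITSTATUS(status);
    return 1;
}
```

The parent returns to its event loop immediately after fork() and only checks the child's status from time to time, which is what keeps BGSAVE from blocking normal requests.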

BGSAVE simply moves the save into a child process so that the parent is not blocked. One question I had when I first read this code: while the parent keeps reading and writing memory, how does the child keep a snapshot from a single point in time? Redis does nothing special here; it relies on the fork() provided by the operating system. When a process calls fork(), the operating system creates a copy of it, including its memory, so logically the child sees memory exactly as it was at the moment fork() was called. To make fork() cheap, the kernel uses copy-on-write: parent and child initially share the same physical pages, and a page is only copied when the parent or the child modifies it.
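The snapshot semantics can be observed directly. In this hypothetical demo (child_sees_snapshot() is not Redis code), the parent overwrites a variable right after fork() and only then lets the child read it; the child still sees the value from fork time. Copy-on-write itself is invisible from user code (it only makes the copy cheap), so treat this as an illustration of what the child sees, not of how the kernel shares pages.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo of fork() snapshot semantics. The parent changes
 * `value` after fork(); the child reports (via its exit code) what it sees.
 * Returns the child's view of `value`, or -1 on error. */
int child_sees_snapshot(void) {
    int pipefd[2];
    if (pipe(pipefd) == -1) return -1;

    int value = 1;                        /* the "dataset" at fork time */
    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {                       /* child: plays the RDB writer */
        char go;
        close(pipefd[1]);
        /* Block until the parent has modified its copy (or closed the pipe). */
        if (read(pipefd[0], &go, 1) < 0) _exit(255);
        _exit(value);                     /* still 1: the child's own copy */
    }

    close(pipefd[0]);                     /* parent: keeps serving writes */
    value = 2;                            /* modifies only the parent's copy */
    int signalled = write(pipefd[1], "x", 1) == 1;
    close(pipefd[1]);                     /* EOF would also unblock the child */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) || !signalled)
        return -1;
    return WEXITSTATUS(status);
}
```

The pipe only enforces ordering: it guarantees the parent's write to `value` happens before the child reads it, so the result shows that later parent writes never leak into the child's view.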

serverCron

The two methods above are triggered on demand by a command. Redis also generates RDB files periodically, controlled by the following configuration:

save <seconds> <changes>

# For example:
# generate an RDB after 3600 seconds if at least 1 write was made
save 3600 1
# generate an RDB after 300 seconds if at least 100 writes were made
save 300 100
# generate an RDB after 60 seconds if at least 10000 writes were made
save 60 10000
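Each save line boils down to a (seconds, changes) pair. Here is a minimal sketch of parsing one such line with sscanf(), assuming a hypothetical struct save_point; Redis's real parser in config.c is different and also handles cases such as `save ""` (which disables snapshotting) that this sketch ignores.

```c
#include <stdio.h>

/* Hypothetical: one parsed "save <seconds> <changes>" directive. */
struct save_point {
    long seconds;
    int changes;
};

/* Parses a single config line such as "save 300 100".
 * Returns 1 on success, 0 if the line is not a well-formed save directive. */
int parse_save_line(const char *line, struct save_point *out) {
    long seconds;
    int changes;
    char extra;
    /* The trailing " %c" detects garbage after the two numbers: a clean
     * line makes sscanf() stop at end of input after 2 conversions. */
    int n = sscanf(line, " save %ld %d %c", &seconds, &changes, &extra);
    if (n != 2) return 0;
    if (seconds < 1 || changes < 0) return 0;  /* this sketch's own validation */
    out->seconds = seconds;
    out->changes = changes;
    return 1;
}
```

Feeding the three example lines above through parse_save_line() yields the pairs (3600, 1), (300, 100) and (60, 10000), which is the shape of the save points serverCron iterates over.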

Periodic RDB generation is implemented in serverCron() in server.c. serverCron is a time event that the Redis event loop runs periodically, and it contains both the RDB and the AOF logic. The RDB part is as follows:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    /* ... other code omitted ... */
    /* Check whether a BGSAVE or an AOF rewrite is in progress */
    if (hasActiveChildProcess() || ldbPendingChildren())
    {
        run_with_period(1000) receiveChildInfo();
        checkChildrenDone();
    } else {
        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now. */
        for (j = 0; j < server.saveparamslen; j++) {
            struct saveparam *sp = server.saveparams+j;

            /* Check whether the conditions of this save point are met */
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 CONFIG_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == C_OK))
            {
                serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                rdbSaveInfo rsi, *rsiptr;
                rsiptr = rdbPopulateSaveInfo(&rsi);
                rdbSaveBackground(server.rdb_filename,rsiptr);
                break;
            }
        }
    }
    /* ... other code omitted ... */
    /* If a BGSAVE was requested while another child process was running,
     * bgsaveCommand() set rdb_bgsave_scheduled = 1; the scheduled BGSAVE
     * is started here once no child process is active. */
    if (!hasActiveChildProcess() &&
        server.rdb_bgsave_scheduled &&
        (server.unixtime-server.lastbgsave_try > CONFIG_BGSAVE_RETRY_DELAY ||
         server.lastbgsave_status == C_OK))
    {
        rdbSaveInfo rsi, *rsiptr;
        rsiptr = rdbPopulateSaveInfo(&rsi);
        if (rdbSaveBackground(server.rdb_filename,rsiptr) == C_OK)
            server.rdb_bgsave_scheduled = 0;
    }
    /* ... other code omitted ... */
}
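The save-point condition inside the loop can be pulled out into a pure function to make its three checks explicit. In this sketch, save_point_due() is hypothetical, struct saveparam is modeled on the fields used above, and BGSAVE_RETRY_DELAY is assumed to be 5 seconds (check CONFIG_BGSAVE_RETRY_DELAY in server.h of your Redis version).

```c
#include <stdbool.h>
#include <time.h>

/* Assumed value; Redis defines CONFIG_BGSAVE_RETRY_DELAY in server.h. */
#define BGSAVE_RETRY_DELAY 5

/* Modeled on one "save <seconds> <changes>" rule. */
struct saveparam {
    time_t seconds;
    int changes;
};

/* Returns the index of the first save point whose conditions hold, or -1.
 * Same three checks as the loop in serverCron():
 *   1. at least `changes` writes since the last successful save (dirty),
 *   2. strictly more than `seconds` elapsed since the last successful save,
 *   3. the last BGSAVE succeeded, or more than the retry delay has passed
 *      since the last attempt (this throttles retries after a failure). */
int save_point_due(const struct saveparam *params, int n, long long dirty,
                   time_t now, time_t lastsave, time_t lastbgsave_try,
                   bool lastbgsave_ok) {
    for (int j = 0; j < n; j++) {
        if (dirty >= params[j].changes &&
            now - lastsave > params[j].seconds &&
            (now - lastbgsave_try > BGSAVE_RETRY_DELAY || lastbgsave_ok))
            return j;
    }
    return -1;
}
```

With the three example save points, 150 changes after 400 seconds trigger the second rule (index 1): the first rule also has enough changes, but its 3600-second window has not elapsed yet.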

RDB file format

The RDB file format is relatively simple, as follows:

----------------------------
52 45 44 49 53              # magic "REDIS"
30 30 30 33                 # RDB version number in ASCII, "0003" = 3
----------------------------
FA                          # auxiliary field
$string-encoded-key         # may appear multiple times with metadata such as
$string-encoded-value       # the redis version, creation time, memory usage...
----------------------------
FE 00                       # database selector, db number = 00
FB                          # resizedb: marks the size of this db
$length-encoded-int         # hash table size (int)
$length-encoded-int         # expire hash table size (int)
----------------------------
                            # the key-value data starts here
FD $unsigned-int            # expire time, Unix timestamp in seconds (4-byte unsigned int)
$value-type                 # value data type (1 byte)
$string-encoded-key         # key, redis string type (SDS)
$encoded-value              # value, the type depends on $value-type
----------------------------
FC $unsigned-long           # expire time, Unix timestamp in milliseconds (8-byte unsigned long)
$value-type                 # value data type (1 byte)
$string-encoded-key         # key, redis string type (SDS)
$encoded-value              # value, the type depends on $value-type
----------------------------
$value-type                 # key-value pair without an expire time
$string-encoded-key
$encoded-value
----------------------------
FE $length-encoding         # the previous db ends; the next db number follows
----------------------------
...                         # key-value data of the other redis dbs
FF                          # end-of-file marker of the RDB file
8-byte-checksum             # CRC64 checksum of the file
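The header is the only fixed-length part of the file, so it is the easiest piece to check by hand. Below is a minimal sketch that validates the magic and parses the ASCII version, assuming the 9-byte layout shown above; rdb_header_version() is a hypothetical helper, although Redis's loader performs a similar check (compare the first 5 bytes with "REDIS", then read the version digits).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: validates the 9-byte RDB header ("REDIS" magic
 * followed by 4 ASCII digits) and returns the version number, or -1 if
 * the buffer is too short or the header is malformed. */
int rdb_header_version(const unsigned char *buf, size_t len) {
    if (len < 9 || memcmp(buf, "REDIS", 5) != 0) return -1;
    int version = 0;
    for (int i = 5; i < 9; i++) {
        if (buf[i] < '0' || buf[i] > '9') return -1;
        version = version * 10 + (buf[i] - '0');
    }
    return version;
}
```

For the header shown above, the bytes 52 45 44 49 53 30 30 30 33 ("REDIS0003") yield version 3; anything after the ninth byte (such as an FA auxiliary field) is ignored by this check.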

Conclusion

To a certain extent, RDB keeps a Redis instance's data from being lost when it crashes. But since RDB snapshots are generated periodically, changes made after the latest snapshot are not in the RDB file, so RDB cannot fully guarantee against data loss. That is why Redis also provides a second persistence mechanism, AOF, which we will look at in the next article. In addition, BGSAVE depends heavily on the operating system's fork(), which can itself carry a significant performance cost; see "Linux fork hidden overhead: Obsolete fork" in the references.

References

  1. Redis RDB persistence details
  2. rdb.fnordig.de/file_format…
  3. Redis Persistence
  4. Linux fork hidden overhead: Obsolete fork

This article is part of a Redis source code analysis series and comes with a corresponding Chinese-annotated version of Redis. If you want to study Redis further, stars and follows are welcome. Chinese-annotated Redis repository: github.com/xindoo/Redi… Redis source code analysis column: zxs.io/s/1h