The checkpoint is an important concept in PostgreSQL. This article explains what a checkpoint is, why it exists, and how it works.

What is a checkpoint

In video games, players often save their progress at a save point, and that is essentially what the word checkpoint means.

In the official PostgreSQL documentation, a checkpoint is defined as follows:

A checkpoint is a point in the write-ahead log sequence at which all data files have been updated to reflect the information in the log. All data files will be flushed to disk.

In other words, a checkpoint is a point in the write-ahead log (WAL). Every change logged before this point is already reflected in the data files.

That is a bit tricky and confusing: what information does the WAL reflect? And why would the data files ever differ from what the WAL records?

With that question in mind, let’s move on to the next part.

Why checkpoints exist

Before explaining this, let’s add a little context.

How PostgreSQL writes data

Take the SQL statement INSERT INTO TBL VALUES(1); as an example. The execution process, shown in the figure, has three steps:

  1. Write the INSERT 1 operation to the WAL.
  2. Modify the page in the shared buffer (if the page is not in the buffer, fetch it from disk first).
  3. At some later point, a background process flushes the dirty page from the shared buffer to disk (highlighted in red). This does not happen immediately; it is an asynchronous operation.

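This answers the first question above: the WAL can be thought of as a redo log, so the information it reflects is the sequence of changes made to the database. Conceptually it records every operation as it happens; in practice it is a physical log that records the changes made to a file or a block. Because the data files are only updated later by the asynchronous flush, they can lag behind what the WAL records.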
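The three steps above can be sketched with a toy model in C. This is only an illustration under simplifying assumptions: one page, integer rows, and a WAL that stores the inserted values. ToyDb, toy_insert, and toy_flush are hypothetical names, not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAL_CAPACITY 16
#define PAGE_SLOTS   8

/* Toy model: every name here is hypothetical, not PostgreSQL's real API. */
typedef struct {
    int    wal[WAL_CAPACITY];   /* append-only log of inserted values */
    size_t wal_len;
    int    buffer[PAGE_SLOTS];  /* in-memory copy of the page (shared buffer) */
    size_t buffer_len;
    bool   dirty;               /* true while the buffer differs from disk */
    int    disk[PAGE_SLOTS];    /* on-disk copy of the page */
    size_t disk_len;
} ToyDb;

/* Steps 1 and 2: log the change, then modify the page in memory. */
static void toy_insert(ToyDb *db, int value)
{
    assert(db->wal_len < WAL_CAPACITY && db->buffer_len < PAGE_SLOTS);
    db->wal[db->wal_len++] = value;        /* 1. write to the WAL */
    db->buffer[db->buffer_len++] = value;  /* 2. modify the buffered page */
    db->dirty = true;
}

/* Step 3: some time later, a background flush copies the page to disk. */
static void toy_flush(ToyDb *db)
{
    for (size_t i = 0; i < db->buffer_len; i++)
        db->disk[i] = db->buffer[i];
    db->disk_len = db->buffer_len;
    db->dirty = false;
}
```

After toy_insert the row exists in the WAL and in the buffer but not on disk; only toy_flush closes that gap, which is exactly the window in which a crash can happen.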

But this raises another question: why is the WAL needed, and what does it do?

PostgreSQL crash recovery

This section looks at what the WAL is for. For ease of description, the information in the WAL is simplified here to make it more readable. Suppose again that the background process is flushing data from the shared buffer to disk, and your computer crashes before it finishes. The result of your earlier INSERT INTO TBL VALUES(1) appears to be lost.

After a while, your computer restarts, and PG enters recovery mode to make sure that your last statement was not written in vain. Fortunately, the operation was recorded in the WAL, so PG replays (redoes) it and restores the data to the database.
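As a rough sketch of this redo step, the toy C code below replays logged inserts that never reached the data file. It assumes the data file always holds a prefix of the WAL, and ToyStorage and toy_recover are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROWS 16

/* Toy model with hypothetical names: the WAL and the data file survive a
 * crash, while anything held only in the shared buffer is lost. */
typedef struct {
    int    wal[MAX_ROWS];   /* logged inserts, in order */
    size_t wal_len;
    int    disk[MAX_ROWS];  /* rows that reached the data file */
    size_t disk_len;
} ToyStorage;

/* Redo: re-apply every logged insert that the data file is missing.
 * Without a checkpoint, the scan has to start at the beginning of the WAL.
 * Returns the number of records that were replayed. */
static size_t toy_recover(ToyStorage *s)
{
    size_t replayed = 0;

    assert(s->wal_len <= MAX_ROWS && s->disk_len <= s->wal_len);
    for (size_t i = 0; i < s->wal_len; i++) {
        if (i >= s->disk_len) {          /* this record never reached disk */
            s->disk[s->disk_len++] = s->wal[i];
            replayed++;
        }
    }
    return replayed;
}
```

Running toy_recover a second time replays nothing, because every logged row is already in the data file.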

This is where the WAL functions as a redo log. But two problems arise:

  1. After a crash, how does the database know where WAL replay should start?
  2. The WAL just keeps growing. Is it ever cleaned up?

How it works

Finally, we introduce the main character: the checkpoint. A checkpoint is a point in the WAL. What does this point mean?

All pages in the shared buffer that were dirtied before this point have been flushed to storage.

So how does this point come about?

It is created by the checkpoint operation, which writes a checkpoint record to the WAL. It is a bit confusing that checkpoint is both a noun (the point) and a verb (the operation).

Let’s look at the checkpoint operation:

  1. The checkpoint operation records the WAL position at which it starts as the redo point.
  2. The checkpoint starts flushing dirty data from the shared buffer to disk.
  3. Meanwhile, a new SQL statement inserts 3.
  4. The checkpoint flush completes. All data logged before the redo point (data 1 and 2) is now on disk.
  5. A checkpoint record (red) is written to the WAL, indicating that the checkpoint operation is complete. The record holds information such as the redo point (where to start the redo).
  6. The location of the latest checkpoint record is stored in the pg_control file.

When the database recovers, it looks up the latest checkpoint record through the pg_control file, reads the redo point from that record, and replays the WAL from the redo point onward.

Note that data 1 and 2 were already persisted to disk by the checkpoint, so recovery only needs to replay INSERT 3.
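The sequence above (inserts 1 and 2, a checkpoint that overlaps insert 3, then recovery) can be sketched as a toy model in C. All names are hypothetical, a WAL position is just an array index, and, since the toy has no separate buffer, the flush is modeled by copying the rows logged before the redo point to disk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROWS 16

/* Toy model; all names are hypothetical. */
typedef struct {
    int    wal[MAX_ROWS];   /* logged inserts, in order */
    size_t wal_len;
    int    disk[MAX_ROWS];  /* rows that reached the data file */
    size_t disk_len;
    size_t control_redo;    /* redo point of the latest checkpoint,
                             * a stand-in for what pg_control leads to */
} ToyDb;

static void toy_insert(ToyDb *db, int value)
{
    assert(db->wal_len < MAX_ROWS);
    db->wal[db->wal_len++] = value;   /* logged, but not yet on disk */
}

/* Step 1: remember the WAL position at which the checkpoint starts. */
static size_t toy_checkpoint_begin(const ToyDb *db)
{
    return db->wal_len;
}

/* Steps 2 and 4 to 6: flush everything logged before the redo point,
 * then publish the redo point. */
static void toy_checkpoint_finish(ToyDb *db, size_t redo)
{
    assert(redo <= db->wal_len);
    for (size_t i = db->disk_len; i < redo; i++)
        db->disk[i] = db->wal[i];
    if (db->disk_len < redo)
        db->disk_len = redo;
    db->control_redo = redo;
}

/* Recovery: replay only the WAL at or after the recorded redo point. */
static size_t toy_recover(ToyDb *db)
{
    size_t replayed = 0;

    assert(db->control_redo <= db->disk_len);
    for (size_t i = db->control_redo; i < db->wal_len; i++) {
        if (i >= db->disk_len) {
            db->disk[db->disk_len++] = db->wal[i];
            replayed++;
        }
    }
    return replayed;
}
```

Calling toy_insert for 1 and 2, then toy_checkpoint_begin, then toy_insert for 3, then toy_checkpoint_finish leaves 1 and 2 on disk with the redo point at position 2, so toy_recover replays exactly one record, the insert of 3.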

This answers the first question from the previous section: where should WAL replay start?

Only one question is left: does the WAL keep growing forever without being cleaned up?

WAL generated before the redo point is no longer needed for crash recovery.

Therefore, the second function of a checkpoint is that WAL prior to the redo point can be reclaimed.

Finally, let’s review the two important things a checkpoint does:

  1. It records the redo point in the checkpoint record. All data logged before the redo point has been flushed to persistent storage.
  2. It allows WAL before the redo point to be cleaned up and reclaimed.
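To make the second point concrete, here is a small arithmetic sketch in C. It assumes WAL is stored in fixed-size segments counted from position 0, and it deliberately ignores everything else that can hold WAL back in a real server, such as archiving or replication. toy_reclaimable_segments is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole WAL segments that lie entirely
 * before the redo point. The segment that contains the redo point itself
 * is still needed for recovery, so it is not counted. */
static uint64_t toy_reclaimable_segments(uint64_t redo_lsn,
                                         uint64_t segment_size)
{
    assert(segment_size > 0);
    return redo_lsn / segment_size;
}
```

With 16 MB segments and a redo point 40 MB into the WAL, the first two segments are reclaimable and the third, which contains the redo point, must be kept.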

Checkpoint implementation in code

Take PostgreSQL 13.2, the latest version at the time of writing, as an example to explain the checkpoint implementation. To make it easier to follow, the code below is simplified: locking and similar details are removed.

First, we navigate to the xlog.c file’s CreateCheckPoint function, which, as its name suggests, performs a checkpoint operation.

Following the logic above, the checkpoint should first create a redo point.

curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
...
freespace = INSERT_FREESPACE(curInsert);
if (freespace == 0)
{
        if (XLogSegmentOffset(curInsert, wal_segment_size) == 0)
                curInsert += SizeOfXLogLongPHD;
        else
                curInsert += SizeOfXLogShortPHD;
}
checkPoint.redo = curInsert;

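This code takes the current WAL insert position and computes where the next record will actually start. If the position sits exactly on a page boundary (no free space left on the page), it skips the page header: the long header on the first page of a segment, the short header otherwise. The result becomes the redo point.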
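The boundary arithmetic can be tried out in isolation with the sketch below. The constants are assumptions chosen for illustration (an 8 kB WAL page, a 16 MB segment, 24-byte and 40-byte page headers); the real values depend on how the server was built and initialized. toy_redo_point is a hypothetical stand-in for the snippet above, not PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, assumed for this sketch only. */
#define TOY_BLCKSZ    8192u                  /* WAL page size */
#define TOY_SEG_SIZE  (16u * 1024u * 1024u)  /* WAL segment size */
#define TOY_SHORT_PHD 24u  /* page header on ordinary pages */
#define TOY_LONG_PHD  40u  /* page header on the first page of a segment */

/* Given the current insert position, return the position at which the
 * next WAL record would start, which is used as the redo point. */
static uint64_t toy_redo_point(uint64_t cur_insert)
{
    uint64_t offset_in_page = cur_insert % TOY_BLCKSZ;
    uint64_t freespace =
        (offset_in_page == 0) ? 0 : TOY_BLCKSZ - offset_in_page;

    assert(TOY_SEG_SIZE % TOY_BLCKSZ == 0);
    if (freespace == 0) {
        /* Exactly on a page boundary: skip the header of the new page. */
        if (cur_insert % TOY_SEG_SIZE == 0)
            cur_insert += TOY_LONG_PHD;
        else
            cur_insert += TOY_SHORT_PHD;
    }
    return cur_insert;
}
```

Under these assumptions, a position in the middle of a page is returned unchanged, position 8192 becomes 8216, and a position at the start of a segment is moved forward by 40 bytes.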

The flush itself is then done in the CheckPointGuts function.

/*
 * Flush all data in shared memory to disk, and fsync
 *
 * This is the common code shared between regular checkpoints and
 * recovery restartpoints.
 */
static void
CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
{
	CheckPointCLOG();
	CheckPointCommitTs();
	CheckPointSUBTRANS();
	CheckPointMultiXact();
	CheckPointPredicate();
	CheckPointRelationMap();
	CheckPointReplicationSlots();
	CheckPointSnapBuild();
	CheckPointLogicalRewriteHeap();
	CheckPointBuffers(flags);	/* performs all required fsyncs */
	CheckPointReplicationOrigin();
	/* We deliberately delay 2PC checkpointing as long as possible */
	CheckPointTwoPhase(checkPointRedo);
}

After the flush is complete, a checkpoint record is written to the WAL.

/*
 * Now insert the checkpoint record into XLOG.
 */
XLogBeginInsert();
XLogRegisterData((char *) (&checkPoint), sizeof(checkPoint));
recptr = XLogInsert(RM_XLOG_ID, shutdown ? XLOG_CHECKPOINT_SHUTDOWN : XLOG_CHECKPOINT_ONLINE);
XLogFlush(recptr);

The checkpoint record is now in the WAL. The last step is to update the checkpoint location in the pg_control file.

/*
 * Update the control file.
 */
if (shutdown)
        ControlFile->state = DB_SHUTDOWNED;
ControlFile->checkPoint = ProcLastRecPtr;
ControlFile->checkPointCopy = checkPoint;
ControlFile->time = (pg_time_t) time(NULL);

/* crash recovery should always recover to the end of WAL */
ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
ControlFile->minRecoveryPointTLI = 0;

ControlFile->unloggedLSN = XLogCtl->unloggedLSN;

UpdateControlFile(); // Update the control file

When does a checkpoint run

This part is not the focus of this article, so I will briefly mention it.

  1. The system runs a checkpoint periodically. The interval is configurable through the checkpoint_timeout parameter.
  2. You can run the CHECKPOINT command manually to perform a checkpoint operation.