What is a…

Write Ahead Log (WAL) is a common method in the database system to ensure atomicity and persistence of data operations.

In computer science, write-ahead logging (WAL) is a set of techniques used in relational database systems to provide atomicity and persistence (two of the ACID properties). On systems that use WAL, all changes are written to a log file before committing.

Log files typically contain redo and undo information. The purpose of this can be illustrated by an example. Suppose a program is in the process of performing some operation and the machine loses power. At reboot, the program may need to know whether the action performed at that time was successful, partially successful, or failed. If WAL is used, the program can examine log files and compare what was planned to be done in the event of a sudden power failure to what was actually done. Based on this comparison, the program can decide whether to undo what it has done, continue with what it has done, or leave it as it was.

WAL allows in-place updates to the database. Another approach used to implement atomic updating is shadow paging, which is not in-place. The main advantage of in-place updates is to reduce index and block list changes. ARIES is a common algorithm for the WAL family of technologies. In file systems, WAL is often called journaling. PostgreSQL also uses WAL to provide point-in-time recovery and database replication features.

The backup

Let’s think about if we want to ensure that an operation on a piece of data can be recovered. What can be done? You don’t have to think about how the database is implemented, and you don’t have to think too much about it. It’s actually a very simple problem, and we deal with it all the time. The simplest way is to back up data: when I need to update a piece of data, I back it up in one place and then update it. If the update fails, I can write back from the backup data. This ensures that the transaction is rolled back and that the data operation is atomic. In fact, SQLite implemented atomic transactions in this way before WAL was introduced. It is called rollback Journal. The principle of ROLLBACK Journal is as follows: Before making changes to the data in the database file, back up the data in the page where the changes are made in another location, and then write the changes to the database file; If the transaction fails, copy the backup data back and undo the changes. If the transaction succeeds, the backup data is deleted and the changes are committed.

WAL

Continue with the question above? How is data recoverable (atomicity) and successfully committed data persisted to disk (persistence)? Another mechanism is WAL. WAL is also simple: changes are not written directly to the database file, but to another file called WAL. If the transaction fails, records in WAL are ignored and changes are undone. If the transaction is successful, it will be written back to the database file at a later time to commit the changes.

The advantages of…

  1. Read and write can be executed completely concurrently without blocking each other (but still not concurrently between writes).
  2. WAL, in most cases, has better performance (because you don’t have to write two files every time you write).
  3. Disk I/O behavior is more predictable.
  4. Fewer fsync() operations are used, reducing the vulnerability of the system.

Improve performance

As we all know, the biggest performance challenge for databases is disk reading and writing. Many predecessors have struggled to provide data storage performance, proposing and experimenting with one method after another. In fact, all the schemes are finally summed up in three kinds: random read and write to change the sequential read and write, buffer single read and write to change the batch read and write, single thread read and write to change the concurrent read and write. WAL is actually an implementation of these two ideas. On the one hand, WAL records transaction updates and writes random dirty pages into sequential log flushing through WAL. On the other hand, WAL changes single disk flushing into buffer batch flushing through buffer. In addition, concurrent synchronization can be used to synchronize WAL data to final data. This greatly improves database write performance, so WAL’s write capability sets an upper limit on overall database performance, especially at high concurrency.

checkpoint

As mentioned above, database systems using WAL do not flush each new WAL log into the database file. They generally accumulate a certain amount of WAL logs and then write them in batches, usually in pages, which are written to disk. Synchronizing A WAL file with a database file is called a checkpoint. It occurs when a WAL file has accumulated a certain number of pages and is modified. Of course, some systems can also checkpoint manually. WAL files can be cleared after checkpoint execution to ensure that WAL files are not too large to degrade performance.

Some database system read requests can also use WAL. By reading the latest WAL logs, you can obtain the latest state of data.

The specific implementation

The WAL mechanism is commonly used in common databases, although it may vary from system to system and implementation. Mysql, SQLite, PostgresQL, ETCD, hbase, ZooKeeper, ElasticSearch, and more all have their own implementations.

mysql

WAL for mysql, you’re probably familiar with it. Mysql implements WAL through redo and undo logs. The redo log is called a redo log. Each operation is written to the redo log before the data changes, so that operations can continue after a restart in the event of a power failure. The undo log is called the undo log. When some changes cannot be completed halfway through execution, the undo log can be used to restore the state between the changes. Mysql uses the redo log to repair data (transaction persistence) in the event of a Crash restart, and the undo log to ensure atomicity of transactions.

zookeeper

Like most distributed systems, ZooKeeper also has write-ahead-log (WAL). For each update operation, ZooKeeper writes WAL first, then updates data in the memory, and notifies the Client of the update result. In addition, ZooKeeper periodically takes a Snapshot of the directory tree in memory and drops it to disk. The main purpose of this is to persist data and to speed up the recovery after a restart. It would be slower if you used Replay WAL to restore data.

elasticsearch

Elasticsearch cannot guarantee that data will survive a power outage or even a normal exit without using fsync to flush data from the file system cache to disk. To ensure reliability, you need to ensure that data changes are persisted to disk.

When updating the index dynamically, ElasticSearch says that a full lift will flush segments to disk and write a commit point containing a list of all segments. Elasticsearch uses this commit point to determine which segments belong to the current shard while starting or re-opening an index.

Even if near-real-time search is enabled with refresh per second, ElasticSearch still needs to commit in full frequently to ensure it can recover from failures. But what about documents that change between commits?

Elasticsearch added a translog, or transaction log, that is logged every time Elasticsearch is operated on.

etcd

Those of you who have used ETCD may notice that the data directory of ETCD has two subdirectories wal and SNAP. They are used to implement WAL mechanisms.

Wal: Stores write-ahead logs that record the entire process of data changes. In ETCD, all data changes are written to WAL before committing.

Snap: stores snapshot data. Etcd Snapshots created to prevent excessive WAL files and stores ETCD data status.

The WAL mechanism enables ETCD to provide the following two functions:

  • Fast recovery from failure: When your data is corrupted, you can quickly recover from the original data to the state before the data was corrupted by performing all the modifications recorded in WAL.
  • Undo/redo: Because all changes are recorded in WAL, rollback or redo is required, and only operations in the log are performed in the direction or forward direction

hbase

Hbase implements WAL by storing HLog and Hbase RegionServer data in MemStore until certain conditions are met and the data is flushed to disk. This avoids creating lots of small files. Memory storage is unstable, and HBase uses WAL to solve this problem. Each update operation is logged, and the write log and update operation are in the same transaction.

Recommend series

Apache Druid is an Apache Druid storage system. The Apache Druid storage system is an Apache Druid storage system. The Apache Druid storage system is an Apache Druid storage system

For more knowledge about data storage, please follow my official account.