LevelDB provides three interfaces for write operations:

  • Put: inserts or modifies a record.
  • Delete: deletes a record.
  • Write: inserts, modifies, or deletes multiple records atomically.

Both Put and Delete are implemented by direct calls to Write:

  • leveldb::DBImpl::Put => leveldb::DB::Put => leveldb::DBImpl::Write
  • leveldb::DBImpl::Delete => leveldb::DB::Delete => leveldb::DBImpl::Write

The Write interface

Write is declared in the leveldb::DB interface as a pure virtual function, which leveldb::DBImpl::Write implements:

virtual Status Write(const WriteOptions& options, WriteBatch* updates) = 0;
  • leveldb::WriteOptions is the write control parameter. It has a single member variable, sync, which indicates whether the log is flushed to external storage after each write.
  • leveldb::WriteBatch represents a batch of operations on one or more key-value pairs.
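
To make the delegation concrete, here is a minimal sketch of how Put and Delete can each wrap a single operation in a batch and hand it to Write. ToyDB, its Get helper, and the write_calls counter are hypothetical, and the WriteOptions and WriteBatch structs below are simplified stand-ins for the LevelDB types, not the real classes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for leveldb::WriteOptions (assumption, not the real type).
struct WriteOptions {
  bool sync = false;  // Flush the log to storage after the write?
};

// Simplified stand-in for leveldb::WriteBatch: an ordered list of operations.
struct WriteBatch {
  struct Op {
    bool is_delete;
    std::string key;
    std::string value;
  };
  std::vector<Op> ops;

  void Put(const std::string& key, const std::string& value) {
    ops.push_back({false, key, value});
  }
  void Delete(const std::string& key) { ops.push_back({true, key, ""}); }
};

// Hypothetical in-memory store; only the delegation pattern mirrors the text.
class ToyDB {
 public:
  // The single write path: applies every operation in the batch in order.
  bool Write(const WriteOptions& options, WriteBatch* updates) {
    (void)options;  // A real store would sync its log when options.sync is set.
    for (const auto& op : updates->ops) {
      if (op.is_delete) {
        data_.erase(op.key);
      } else {
        data_[op.key] = op.value;
      }
    }
    ++write_calls_;
    return true;
  }

  // Put wraps one insertion in a batch and delegates to Write.
  bool Put(const WriteOptions& options, const std::string& key,
           const std::string& value) {
    WriteBatch batch;
    batch.Put(key, value);
    return Write(options, &batch);
  }

  // Delete wraps one deletion in a batch and delegates to Write.
  bool Delete(const WriteOptions& options, const std::string& key) {
    WriteBatch batch;
    batch.Delete(key);
    return Write(options, &batch);
  }

  bool Get(const std::string& key, std::string* value) const {
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    *value = it->second;
    return true;
  }

  int write_calls() const { return write_calls_; }

 private:
  std::map<std::string, std::string> data_;
  int write_calls_ = 0;
};
```

In this model every Put or Delete costs exactly one call to Write, so there is only one code path that has to deal with logging and concurrency.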

The implementation of Write

leveldb::DBImpl::Write works as follows:

  1. Construct a Writer object using the parameters passed in to represent the write operation. Writer is defined as follows:
     struct DBImpl::Writer {
       explicit Writer(port::Mutex* mu)
           : batch(nullptr), sync(false), done(false), cv(mu) {}

       Status status;      // Result of the write
       WriteBatch* batch;  // Data to write (one or more key-value operations)
       bool sync;          // Whether to sync the log (WriteOptions.sync)
       bool done;          // Whether the write has completed
       port::CondVar cv;   // Condition variable for concurrency control
     };
  2. Acquire the mutex, append the Writer to the write queue, and wait on the condition variable. The wait ends in one of two ways: 1) another thread has already performed this write on the writer's behalf; 2) the writer has reached the head of the write queue. This is where LevelDB handles concurrency control and write performance: since the MemTable and the WAL do not support concurrent writes, only the writer at the head of the queue performs the actual write. That writer merges several queued requests into one, writes them as a single batch, and then updates the status of every writer it covered.

  3. Check writer.done. If another thread has already performed the write, return its status. Otherwise this writer is at the head of the queue and carries on.

  4. Call MakeRoomForWrite, which checks, among other things, whether the number of level-0 files exceeds the limit and whether the MemTable has grown past its threshold and needs to be switched. MakeRoomForWrite takes a force parameter that forces a switch to a new MemTable and triggers a compaction. In the normal write path, force is false.

  5. Call BuildBatchGroup, which merges consecutive eligible writers, starting at the head of the queue, into tmp_batch_. The merge takes two things into account:

    1. The total size of the merged data. max_size is 1 MB by default. If the first write request is small (128 KB or less, 128 << 10), max_size is that request's size plus 128 KB. This keeps a small request from being slowed down by the requests merged with it.
    2. The sync flag. If the first write request has sync == false, a request with sync == true is not merged into the group, and merging stops there.
  6. Set the sequence number of the data to be written.

  7. Release the mutex. The queue logic above guarantees that only one thread writes at a time, so the log and the MemTable can be updated without holding the lock.

  8. Write the log (WAL).

  9. Sync the log if the write options require it.

  10. Update the MemTable.

  11. Reacquire the mutex.

  12. If the sync failed, set bg_error_. All subsequent writes will then fail.

  13. Clear the temporary merged batch (tmp_batch_).

  14. Update LastSequence.

  15. Notify every thread whose data has been written.

  16. Notify the writer that is now at the head of the queue, if any, so that it can proceed.
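
The queueing and merging steps above can be sketched with standard C++ primitives. This is a simplified model, not LevelDB's code: GroupCommitLog and entries() are hypothetical names, a "batch" is just a std::string, and appending to an in-memory vector stands in for the WAL write and the MemTable update. The size cap and the sync rule follow the description in the BuildBatchGroup step.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for DBImpl::Writer; the payload is a plain string.
struct Writer {
  Writer(const std::string* b, bool s) : batch(b), sync(s) {}
  const std::string* batch;
  bool sync;
  bool done = false;
  std::condition_variable cv;
};

class GroupCommitLog {
 public:
  void Write(const std::string& batch, bool sync) {
    Writer w(&batch, sync);
    std::unique_lock<std::mutex> lock(mu_);
    queue_.push_back(&w);
    // Wait until a leader has written our data or we head the queue.
    w.cv.wait(lock, [&] { return w.done || queue_.front() == &w; });
    if (w.done) return;

    // Leader: merge the followers that qualify, starting behind the head.
    const std::size_t kSmall = 128 << 10;
    std::size_t max_size = 1 << 20;
    if (batch.size() <= kSmall) max_size = batch.size() + kSmall;
    std::string group = batch;
    Writer* last = &w;
    for (std::size_t i = 1; i < queue_.size(); ++i) {
      Writer* follower = queue_[i];
      if (follower->sync && !w.sync) break;  // No sync write in a non-sync group.
      if (group.size() + follower->batch->size() > max_size) break;
      group += *follower->batch;
      last = follower;
    }

    // Only the leader gets here, so the "log" can be written without the lock.
    lock.unlock();
    log_.push_back(group);  // Stand-in for the WAL write and MemTable update.
    lock.lock();

    // Remove the whole group from the queue and wake its followers.
    while (true) {
      Writer* ready = queue_.front();
      queue_.pop_front();
      if (ready != &w) {
        ready->done = true;
        ready->cv.notify_one();
      }
      if (ready == last) break;
    }
    // Hand leadership to the next waiting writer, if any.
    if (!queue_.empty()) queue_.front()->cv.notify_one();
  }

  // Call only after all writers have returned; not synchronized.
  const std::vector<std::string>& entries() const { return log_; }

 private:
  std::mutex mu_;
  std::deque<Writer*> queue_;
  std::vector<std::string> log_;
};
```

With a single caller, every Write becomes its own log entry. Under concurrent callers, writers that queue up while the leader is busy may end up sharing one entry, which is the point of the merge.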

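The checks made by MakeRoomForWrite can be pictured as one decision per loop iteration. The sketch below is an assumption based on a reading of the LevelDB source rather than on the text above: the branch order, the separate slowdown and stop thresholds for level-0 files, and the pending immutable MemTable check all come from that reading, and every name in it (RoomAction, WriteState, Limits, NextRoomAction) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// What the writer should do next (hypothetical names).
enum class RoomAction {
  kFail,               // A background error is set: give up.
  kDelayOnce,          // Level-0 is getting crowded: sleep briefly, once.
  kProceed,            // The MemTable has room: go ahead and write.
  kWaitForCompaction,  // Must wait for background work to finish.
  kSwitchMemTable      // Start a new MemTable and schedule a compaction.
};

// Snapshot of the state the decision depends on (assumed fields).
struct WriteState {
  bool background_error;
  bool allow_delay;
  bool force;
  bool immutable_memtable_pending;
  int level0_files;
  std::size_t memtable_bytes;
};

// Thresholds are parameters here; the concrete values are configuration.
struct Limits {
  int l0_slowdown_trigger;
  int l0_stop_trigger;
  std::size_t write_buffer_size;
};

// One iteration of the decision, in the assumed branch order.
RoomAction NextRoomAction(const WriteState& s, const Limits& lim) {
  if (s.background_error) return RoomAction::kFail;
  if (s.allow_delay && s.level0_files >= lim.l0_slowdown_trigger) {
    return RoomAction::kDelayOnce;
  }
  if (!s.force && s.memtable_bytes <= lim.write_buffer_size) {
    return RoomAction::kProceed;
  }
  if (s.immutable_memtable_pending) return RoomAction::kWaitForCompaction;
  if (s.level0_files >= lim.l0_stop_trigger) {
    return RoomAction::kWaitForCompaction;
  }
  return RoomAction::kSwitchMemTable;
}
```

In the normal write path force is false, so a writer whose MemTable still has room reaches kProceed unless a background error or a crowded level 0 intervenes first.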
Summary

That is LevelDB's write path: a write queue plus merged write operations. Both the logic and the code are simple. The downside is that only one thread performs the actual write at any given time.