In this section we’ll take a look at LevelDB’s various features. LevelDB itself is written in C++, but here we will use Java-style pseudocode to describe its features. Readers on other language stacks need not worry, because the LevelDB API is essentially the same across languages.

Open and close

LevelDB’s data is stored in a specific directory containing many data files, log files, and so on. You open that directory through the LevelDB API to get a DB reference, and then use this reference to perform reads and writes. The code below is Java-style pseudocode.

class LevelDB {
  public static LevelDB open(String dbDir, Options options);
  void close(); // Close the database
}

Opening the database takes many configurable options, such as the block cache size, compression, and so on.
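For example, opening and closing a database might look like the sketch below; createIfMissing is an assumed option name here, mirroring LevelDB's create_if_missing setting.

Options options = new Options();
options.createIfMissing = true; // assumed option: create the directory if it does not exist

LevelDB db = LevelDB.open("/tmp/ldb", options); // acquires a lock on the directory
// ... reads and writes go through the db reference ...
db.close(); // release the directory lock and free resources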

Basic API

LevelDB works much like a HashMap, only slightly weaker: put cannot return the old value, and delete does not tell you whether the key actually existed.

class LevelDB {
  byte[] get(byte[] key);
  void put(byte[] key, byte[] value);
  void delete(byte[] key);
}
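A quick usage sketch of these three methods; it assumes get() returns null when the key is absent.

db.put("name".getBytes(), "leveldb".getBytes());

byte[] value = db.get("name".getBytes()); // assumed to return null when the key is absent
db.delete("name".getBytes());             // no error even if the key was never there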

Atomic batch

Multiple consecutive write operations may be only partially completed if the process crashes partway through. For this, LevelDB provides batch processing. A batch behaves like a transaction: LevelDB guarantees that the writes in a batch are applied atomically, either all of them succeed or none of them do.

class WriteBatch {
  void put(byte[] key, byte[] value);
  void delete(byte[] key);
}

class LevelDB {
  ...
  void write(WriteBatch wb);
}
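A sketch of how a batch might be used with the pseudocode above:

WriteBatch wb = new WriteBatch();
wb.put("k1".getBytes(), "v1".getBytes());
wb.put("k2".getBytes(), "v2".getBytes());
wb.delete("k3".getBytes());

db.write(wb); // all three operations take effect together, or none of them do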

The log file

When we call LevelDB’s put method, it first writes the data to memory and then persists it to disk according to a specific policy. The problem is that if a sudden outage occurs, data that has not yet reached the disk is lost. To handle this, LevelDB uses a strategy similar to Redis’s AOF log: it appends the change to a log file on disk before performing the actual write operation.

This way, even if an outage occurs, the database can be recovered from the log files at startup.

Safe writes (synchronous writes)

Those of you who know Redis know that its AOF write strategy has several configurations, depending on how often the log files are synchronized to the disk. The higher the frequency, the less data is lost in the event of an outage. Synchronizing dirty data from files in the kernel to disk requires disk I/O, which affects access performance, so it is usually not synchronized too often.

LevelDB is similar: with the ordinary (unsafe) writes shown earlier, the API call returns success, yet the operation log may still be lost in a crash. So LevelDB also provides safe write operations, at the cost of lower performance.

class LevelDB {
  ...
  void putSync(byte[] key, byte[] value);
  void deleteSync(byte[] key);
  void writeSync(WriteBatch wb);
}

There is often a trade-off between safety and performance, so a common compromise is to issue a synchronous write periodically, for example every few milliseconds or every N ordinary writes, and use normal writes the rest of the time. This keeps potential data loss small while maintaining write performance.
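One way to express that compromise in code; the counter-based policy below is only an illustration, not something LevelDB itself provides.

long writeCount = 0;

void compromisePut(byte[] key, byte[] value) {
  writeCount++;
  if (writeCount % 100 == 0) {
    db.putSync(key, value); // force the log to disk every 100 writes
  } else {
    db.put(key, value);     // fast path; this entry may be lost on a crash
  }
}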

Concurrency

LevelDB places its files in a single directory containing a number of data and log files. Multiple processes cannot open that directory for reads and writes through the LevelDB API at the same time, but within a single process the API is safe for concurrent use by multiple threads. LevelDB uses internal locks to coordinate concurrent operations.

Traversal

The keys in LevelDB are ordered, in lexicographical order from smallest to largest. LevelDB provides a traversal API to access all key/value pairs one by one in order, and you can specify that the traversal starts in the middle.

class LevelDB {
  ...
  Iterator<KV> scan(byte[] startKey, byte[] endKey, int limit);
}
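For example, scanning a key range might look like this; the KV class is assumed to expose key and value byte arrays.

// scan keys in [startKey, endKey), returning at most 100 entries
Iterator<KV> it = db.scan("a".getBytes(), "z".getBytes(), 100);
while (it.hasNext()) {
  KV kv = it.next();
  System.out.println(new String(kv.key) + " => " + new String(kv.value));
}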

Snapshot isolation

LevelDB supports concurrent multithreaded reads and writes, which means two consecutive reads of the same key may return different data, because another thread may have modified it in between. In database theory this is called a “non-repeatable read.” LevelDB provides snapshot isolation, so that reads and writes within the same snapshot are not affected by changes made by other threads.

class Snapshot {
  byte[] get(byte[] key);
  void put(byte[] key, byte[] value);
  void delete(byte[] key);
  void write(WriteBatch wb);
  void close();  // Close the snapshot
}

class LevelDB {
  ...
  Snapshot getSnapshot();
}
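A sketch of reading through a snapshot so that two consecutive reads see the same data:

Snapshot snapshot = db.getSnapshot();
try {
  byte[] first = snapshot.get("counter".getBytes());
  // other threads may modify "counter" here, but this snapshot will not see it
  byte[] second = snapshot.get("counter".getBytes());
  // first and second are guaranteed to be identical
} finally {
  snapshot.close(); // always release the snapshot when done
}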

While snapshots are amazing, they actually work quite simply, which we’ll cover in more detail later.

Custom Key comparator

LevelDB sorts keys in lexicographical order by default, but it also lets you register a custom comparator, for example one that sorts keys numerically. You must keep the collation constant for the entire life of the database, because it determines the on-disk ordering of key-value pairs and cannot be changed dynamically.

Options options = new Options();
options.comparator = new CustomComparator();
db = LevelDB.open("/tmp/ldb", options);

Custom comparators are dangerous. Use them with caution. An improper comparison algorithm severely affects storage efficiency. If you do have to change the collation, you need to plan ahead, and there’s a special trick here that requires understanding the details of disk storage, so we’ll talk more about it later.
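For illustration only, a numeric comparator might look roughly like the sketch below; the compare/name interface shape is an assumption borrowed from the C++ API, not LevelDB's actual Java surface.

class CustomComparator {
  // order keys numerically instead of lexicographically
  public int compare(byte[] a, byte[] b) {
    long x = Long.parseLong(new String(a));
    long y = Long.parseLong(new String(b));
    return Long.compare(x, y);
  }

  // a stable name, persisted so the database is not accidentally reopened with a different collation
  public String name() {
    return "numeric-comparator-1";
  }
}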

Data blocks

LevelDB’s on-disk data is stored in blocks, with a default block size of 4KB. Increasing the block size benefits large batch traversal operations, while smaller blocks perform better when random reads are frequent, so a compromise is needed.

Options options = new Options();
options.blockSize = 8192; // 8KB
db = LevelDB.open("/tmp/ldb", options);

The block size should not be set too small (below 1KB) or too large (several megabytes). Extreme settings bring little performance benefit but greatly increase performance variance across different read and write workloads. The middle ground, floating around the default, is the sensible choice. Once the database is initialized, the block size cannot be changed.

Compression

LevelDB compresses its on-disk storage by default using the Snappy algorithm, which is widely used in the industry. Compression and decompression are very fast, so there is little need to worry about performance loss. If you don’t want compression, you can also turn it off, but doing so usually brings no significant performance gain, so it is best left enabled.

Options options = new Options();
options.compression = CompressionType.kSnappyCompression;
// options.compression = CompressionType.kNoCompression; // Turn off compression
db = LevelDB.open("/tmp/ldb", options);

Block cache

LevelDB keeps recently written hot data in memory. If the requested key is not found there, it has to be looked up in the disk files. To reduce the number of disk-file lookups, LevelDB provides a block cache, which holds the uncompressed contents of recently used data blocks.

Options options = new Options();
options.blockCache = LevelDB.NewLRUCache(100 * 1024 * 1024); // 100M
db = LevelDB.open("/tmp/ldb", options);

The block cache is not enabled by default; it can be configured through the options when opening the database. It takes up some memory, but it usually doesn’t need to be large: around 100MB is enough, and larger caches make little difference.

Note the impact of traversal operations on the cache. To prevent a traversal from flushing a lot of cold data into the block cache, you can set the fill_cache option on the read to control whether blocks read from disk during the traversal are inserted into the cache, as sketched below.
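A minimal sketch, assuming a hypothetical ReadOptions class and a scan overload that accepts it (neither appears in the pseudocode above):

ReadOptions readOptions = new ReadOptions();
readOptions.fillCache = false; // hypothetical field: blocks read by this traversal stay out of the cache

Iterator<KV> it = db.scan("a".getBytes(), "z".getBytes(), 10000, readOptions); // hypothetical overload
while (it.hasNext()) {
  KV kv = it.next(); // cold blocks read from disk here do not evict hot blocks
}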

Bloom filter

LevelDB adds a Bloom filter to each disk file to reduce the number of disk reads. The filter consumes some extra disk space, but it cuts down on file reads. The Bloom filter data is stored after the data blocks in the disk file.

When LevelDB looks up a key, it may have to read several disk files one after another. If the key does not exist at all, every one of those reads is wasted, which can be very time consuming. Bloom filters can save up to 95 percent of the time spent searching disk files for keys that do not exist.

A Bloom filter is similar to an in-memory Set that stores the fingerprint information of all keys within a given range of a disk file. When the fingerprint of a key is not found in the Set, the filter can conclude that the key definitely does not exist.

If the fingerprint is found in the set, the key may or may not exist, because different keys can produce the same fingerprint; this is the Bloom filter’s false-positive rate. The lower the false-positive rate, the more fingerprint bits are needed per key, and the more memory is consumed.

If the Bloom filter could tell exactly whether a key exists, there would be no false positives and no wasted disk reads at all. Such an extreme Bloom filter is just a HashSet with all keys stored in memory, which of course is unacceptable.

Options options = new Options();
// The fingerprint size of each key is 10 bits
options.filterPolicy = LevelDB.NewBloomFilterPolicy(10);
db = LevelDB.open("/tmp/ldb", options);

When using Bloom filters, we need to trade memory consumption against performance. If you want to learn more about how Bloom filters work, check out Redis Deep Adventures, which devotes a separate section to their inner workings.

The Bloom filter is not enabled by default; the filter_policy option must be set when the database is opened for it to take effect. Bloom filters are the last line of defense for reducing disk reads. The filter’s bitmap data is stored in the disk files, but it is cached in memory while in use.

Data validation

LevelDB verifies data with checksums at the granularity of 4KB blocks. The checksums cost a little storage space and computation time, but when a block becomes corrupted they make it possible to pinpoint the damage and salvage the healthy data precisely.

class LevelDB {
  ...
  public static void repairDB(String dbDir, Options options);
}

By default, strict verification is not enabled when the database is opened; if it is enabled, an error is reported whenever a checksum mismatch is encountered. In the event that data really is corrupted, LevelDB also provides a repairDB() method that helps recover as much data as possible.
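A rough sketch of how these two pieces could fit together; paranoidChecks is an assumed field name here, mirroring the paranoid_checks option in the C++ API.

Options options = new Options();
options.paranoidChecks = true; // assumed field: report an error on any checksum mismatch

LevelDB db;
try {
  db = LevelDB.open("/tmp/ldb", options);
} catch (Exception e) {
  // a block failed verification; salvage as much healthy data as possible, then reopen
  LevelDB.repairDB("/tmp/ldb", options);
  db = LevelDB.open("/tmp/ldb", options);
}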

Summary

After this lesson, students should be able to form the following concept map in their minds. The “hot data” in the figure refers to recently modified key-value pairs, which are the fastest to read. If a key is not in the hot data, it is looked up in the block cache. If it is not there either, there are two possibilities: the key genuinely does not exist, or it lives on disk. If it is on disk, it is found by reading through a limited number of levels, with colder data usually sitting in the lower levels. If it does not exist, the Bloom filter greatly reduces the disk search I/O; its data is stored together with the key-value data in the hierarchical data files.

In the next section, let’s get our hands on LevelDB using real code.