We know that Redis is a typical key-value database, so today, I’m going to walk you through a simple key-value database. Why do you do that?

Remember what I said in the opening paragraph? Redis itself is rather complicated. If we directly study one specific technical point at the beginning, such as “single thread” and “cache”, although we can directly learn the specific content and even solve some small problems immediately, it is easy to get lost in the details.

From my own experience, the better way to learn is to develop a “systems view” first. That is to say, if we want to deeply understand and optimize Redis, we must have a global understanding of its overall architecture and key modules, and then dive into specific technical points. And that’s what we’re going to stick to in this course.

I believe this process will make it much easier to identify and solve problems in practice, and you can transfer this learning style to other learning activities. I hope you can thoroughly grasp this learning idea, so that their study, work efficiency is higher.

Anyway, let’s get back to the topic of today’s lecture. Today, we only need to focus on the overall architecture and core modules when constructing this simple key-value database. This is the medical equivalent of dissecting a mouse before dissecting a human body. We quickly grasp the key to learning and tuning Redis by dissecting this simplest key-value database.

I call this simple key-value database SimpleKV. It should be noted that there is also a project on GitHub called SimpleKV, which is not the same thing as SimpleKV, but a key-value database architecture with key components.

Ok, if you are ready, let’s build SimpleKV together.

When you start building SimpleKV, the first thing you need to think about is what kind of data you can store in it and what you can do with it, that is, the data model and the interface. They may seem simple, but they are an important foundation for understanding how Redis is often used in scenarios like caching, seckilling, distributed locking, and so on.

Understanding the data model will help you understand why, in some cases, key-value databases can be used to store data that was previously stored in a relational database. For example, user information (user ID, name, age, gender, and so on) is typically stored in a relational database. In this scenario, each user ID corresponds to a collection of user information. This is a data model for a key-value database, which can also fulfill this storage requirement.

However, if you only know the data model and not the operation interface, you may not understand why there are some scenarios where using a key-value database is inappropriate. For example, in the same scenario above, if you were averaging the ages of multiple users, the key-value database wouldn’t be able to do it. Because it only provides a simple operation interface, it cannot support complex aggregation computing.

So what can Redis do and what can’t it do? Only by understanding its data model and operation interface can we truly “use this good steel to the blade”.

Next, let’s look at what data can be stored first.

What data can be stored?

For key-value databases, the basic data model is the key-value model. For example, “hello” : “world” is a basic KV pair, where “hello” is key and “world” is value. SimpleKV is no exception. In SimpleKV, key is a String and value is a basic data type, such as String, integer, etc.

However, SimpleKV is, after all, a simple key-value database, and the value type can be complex for a key-value database in a real production environment.

The key types supported by different key-value databases generally differ little, while the value types differ greatly. When selecting a key value database, an important consideration is the value type it supports. For example, Memcached supports only String values, whereas Redis supports strings, hash tables, lists, collections, and so on. Redis can be widely used in actual business scenarios because it supports diversified types of values.

From the perspective of usage, the realization of different value types can not only support the data requirements of different businesses, but also imply differences in performance, space efficiency and other aspects of different data structures, resulting in differences between different value operations.

Only with a deep understanding of the principles behind this can we be comfortable selecting Redis value types and optimizing Redis performance.

What can you do with the data?

Now that we know the data model, let’s look at its basic operations on the data. SimpleKV is a simple key-value database, so the basic operation is nothing more than add, delete, change and check.

Let’s start by looking at the three basic operations SimpleKV needs to support, namely PUT, GET, and DELETE.

  • PUT: Writes or updates a key-value pair.
  • GET: reads the corresponding value based on a key;
  • DELETE: deletes the entire key-value pair based on a key.

Note that some new write/update operations for key databases are called sets. While new writes and updates are performed using a single operation interface, the actual execution of the new write or update process is based on whether the key exists or not.

In a real business scenario, we often encounter the situation of querying a user’s access records over a period of time. This operation is a SCAN operation in the key-value database, which returns the corresponding value based on the range of a key. Therefore, PUT/GET/DELETE/SCAN is a basic set of operations for a key-value database.

In addition, real business scenarios often have richer requirements, such as the need to determine whether a user exists in a whitelist application. If the user ID is used as the key, an EXISTS interface can be added to determine whether a key EXISTS. For a specific key-value database, you can see the detailed operation interface by looking at the operation documentation.

Of course, when a key-value database has multiple value types, it needs to include corresponding operation interfaces. For example, Redis value has a list type, so its interface should include operations on list values. I will also describe the impact of different operations on Redis access efficiency later.

Speaking of which, we have constructed the data model and the operation interface, which is our basic work. Next, let’s take it a step further and consider a very important design question: are key-value pairs stored in memory or out of memory?

The advantage of keeping it in memory is that it is fast to read and write, since memory access speeds are generally at the 100 ns level. However, the potential risk is that all data will be lost in the event of a power failure.

Stored in external storage, while avoiding data loss, the overall performance of a key-value database can be slowed down by slow reads and writes to disk (typically at several ms levels).

Therefore, when making design choices, we usually need to consider the main application scenarios of key-value databases. For example, if the data in a cache scenario needs to be accessible quickly but can be lost, the key-value database used for this scenario typically holds key-value data in memory. Memcached and Redis are both in-memory key-value databases. Caching is an important application scenario for Redis. I’ll focus later on the key mechanisms and advantages of Redis as a cache, as well as common optimization methods.

In keeping with Redis, our SimpleKV store key-value data in memory. Next, let’s look at the basic components of SimpleKV.

Generally speaking, a key-value database consists of four parts: access frame, index module, operation module and storage module (see figure below). Next, we will continue to build our SimpleKV from these four parts.

What access mode is used?

Libsimplekv.so, for example, is a dynamic link library that links to our own programs to store keys and values. The other is to provide key-value pair operations externally in the form of Socket communication through the network framework, which can provide a wide range of key-value storage services. In the figure above, we can see that the network framework includes Socket Server and protocol parsing.

Different key-value database server and client interaction protocol is not the same, we in the key-value database secondary development, new functions, we must understand and master the key-value database communication protocol, so as to develop compatible client.

The actual key-value database is basically the same as the above two methods. For example, RocksDB is used as a dynamic link library, while Memcached and Redis are accessed through a network framework. I will also introduce you to the existing Redis clients and communication protocols.

Providing the key-value storage service through the network framework, on the one hand, expands the application area of the key-value database, but on the other hand, it also provides different design choices for the performance and operation model of the key-value database, which brings some potential problems.

For example, when a client sends a command like this, the command is wrapped in a network package and sent to the key value database:

PUT "hello" and "world"Copy the code

After the key-value database network framework receives the network packet and parses it according to the appropriate protocol, it knows that the client wants to write a key-value pair and begins the actual writing process. At this point, we will encounter a system design problem, in simple terms, is the processing of network connection, network request parsing, and data access processing, with a thread, multiple threads, or multiple processes to deal with the interaction? How to design and make trade-offs? We commonly refer to this problem as I/O model design. Different I/O models have different impacts on the performance and scalability of a key-value database.

For example, if a thread has to process network connections, parse requests, and access data, if one operation blocks, the entire thread will block, slowing down the system response time. If we use different threads to handle different operations, then when one thread is blocked, other threads can still run. But what about competing threads that need to access shared resources, which can affect system efficiency? So, this is really a “dilemma” that requires careful design.

You often hear that Redis is single-threaded, so how does Redis achieve “single-threaded, high performance”? I’ll talk to you about it later.

How do I locate key-value pairs?

When SimpleKV parses the request from the client and knows what key-value pair operation to perform, it needs to find out if the key-value pair exists, depending on the key-value database index module. The function of an index is to enable the key-value database to find the storage location of the corresponding value according to the key, and then perform operations.

There are many types of indexes, common hash table, B+ tree, dictionary tree and so on. Different index structures have different characteristics in performance, space consumption, concurrency control and so on. If you look at other key-value databases, you will see that different key-value databases use different indexes. For example, Memcached and Redis use hash tables as key-value indexes, while RocksDB uses hop tables as key-value indexes in memory.

In general, in-memory key-value databases (such as Redis) use hash tables as indexes, in large part because their key-value data is mostly stored in memory, and the high performance random access features of memory can well match the operational complexity of hash table O(1).

SimpleKV index: value (key); However, unlike SimpleKV, Redis is very interesting in that its value supports multiple types. When we find the value corresponding to a key through the index, we still need to further find the data we actually need from the complex structure of value (such as set and list). The efficiency of these operations itself depends on their implementation structure.

Redis uses some common efficient index structure as the underlying data structure of some value types, which provides a good support for Redis to achieve high performance access.

What is the specific logic of the different operations?

SimpleKV’s index module is responsible for finding the corresponding value location based on the key. Once the storage location is found, the exact logic of what further operations need to be performed varies from operation to operation. SimpleKV’s operation module implements specific logic for different operations:

  • For the GET/SCAN operation, the value is returned based on the storage location of the value.
  • To PUT a new key-value pair, SimpleKV needs to allocate memory for that key-value pair;
  • For the DELETE operation, SimpleKV needs to DELETE the key-value pair and free the corresponding memory space, which is done by the allocator.

I don’t know if you’ve noticed, but for both PUT and DELETE operations, in addition to new writes and deletes, memory is allocated and freed. Which brings us to SimpleKV’s storage module.

How can I quickly provide services after a restart?

SimpleKV uses the common memory allocator Glibc’s malloc and free, so memory management is not a special concern for SimpleKV. However, key-value databases often have key-value pairs of varying sizes, and gliBC’s allocator does not do a good job of handling random allocation of large and small chunks of memory. Once the size of the stored key-value pair data is too large, it can cause serious memory fragmentation problems.

Therefore, allocator is a key factor in key-value databases. This is especially important for Redis, which focuses on memory storage. Redis’s memory allocator offers a variety of options and allocation efficiencies, which I’ll talk more about later.

SimpleKV relies on memory to store data and provide fast access, but I also want SimpleKV to be back in service quickly once it restarts, so I added persistence to SimpleKV’s storage module.

However, since disk management is more complex than memory management, SimpleKV goes straight to the file format, storing key-value data on disk by calling the operation interface of the local file system. SimpleKV only needs to consider when to save the key-value data in memory to a file.

One way is for SimpleKV to save every key-value pair to a drop disk. This makes SimpleKV’s data more reliable, but since it has to be written to a disk every time, SimpleKV’s performance will suffer greatly.

Alternatively, SimpleKV simply periodically saves the key-value data in memory to a file, thus avoiding the performance impact of frequent disk writes. However, a potential cost is that SimpleKV’s data is still at risk of being lost.

Like SimpleKV, Redis provides persistence. However, Redis provides a number of implementation mechanisms and optimizations for persistence to suit different business scenarios, and I’ll walk you through the key design considerations for Redis persistence later.

summary

So far, we have constructed a simple key-value database, SimpleKV. As you can see, the first two steps are designed from an application perspective, which is the application perspective; The next four steps are the complete internal structure of SimpleKV.

SimpleKV contains the basic components of a key-value database, and once you know these components, you’ll have a much easier time learning the rich version of SimpleKV in Redis.

To support richer business scenarios, Redis extends or fine-optimizes these components or functions to meet functional and performance requirements.

From this comparison, we can see that the evolution from SimpleKV to Redis has the following important changes:

  • Redis is accessed mainly through the network framework, rather than the dynamic library, which also makes Redis accessible as a basic network service and expands the application range of Redis.
  • The Redis data model is rich in value types, so it also brings more operation interfaces, such as list-oriented LPUSH/LPOP, set-oriented SADD/SREM, etc. In the next lesson, I’ll talk to you about the data structures and operational efficiencies behind these value models and their impact on Redis performance.
  • The Redis persistence module can support two methods: Log (AOF) and snapshot (RDB). These two methods have different advantages and disadvantages, which affect the access performance and reliability of Redis.
  • SimpleKV is a simple single-machine key-value database, but Redis supports highly reliable clustering and highly scalable clustering, therefore, Redis contains the corresponding cluster function support module.

By building SimpleKV in this lesson, I believe that you have gained an overall understanding of the basic structure and important modules of the key-value database, which is the core foundation of Redis standalone. I will focus on the important evolution of Redis in the following courses. At the same time, I will combine the actual situation, so that you can not only understand the principles, but also really learn to apply, improve the actual combat ability.

Each lesson asking

Here’s a quick question: Do you think SimpleKV is missing any features or modules compared to what you know about Redis?

Feel free to post your thoughts and answers in the comments section, so we can discuss and share today’s content with your friends.