What is Redis? In the words of the official site:

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.

What? It keeps its data in memory, and it wants to be a database? Why “data structure store” rather than “data store”? And it can act as a message broker too? Is there anything it won’t try?

Yes, Redis is so good (~ ▽ ~)~*

We can answer these questions by starting with Redis’s most common use: caching.

If you are new to Redis, or have never worked with it before, this article will help you quickly understand not only how Redis works, but also some of the art of architectural design. If you are a Redis veteran, hopefully it will still show you something new.

How would you implement a cache?

If you were asked to design a cache, what would you do?

I’m sure you’ll all think of using a Map, like this:

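// Try the cache first
String value = map.get("someKey");
if (value == null) {
    // Cache miss: load the value from the database
    value = queryValueFromDB("someKey");
    // Store it so that the next lookup is served from the cache
    map.put("someKey", value);
}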

Which Map should you use? HashMap and TreeMap are not thread-safe, so for a cache shared between threads use Hashtable or ConcurrentHashMap instead.
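As a rough illustration of the thread-safe variant, here is a minimal sketch built on ConcurrentHashMap. The class name SimpleCache and the loader function (a stand-in for queryValueFromDB) are made up for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// A minimal thread-safe read-through cache (illustrative sketch only).
class SimpleCache {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // hypothetical stand-in for queryValueFromDB

    SimpleCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        // computeIfAbsent is atomic: the loader is applied at most once per missing key,
        // even when several threads ask for the same key at the same time.
        return map.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        AtomicInteger dbCalls = new AtomicInteger();
        SimpleCache cache = new SimpleCache(key -> {
            dbCalls.incrementAndGet(); // pretend this is a slow database query
            return "value-of-" + key;
        });
        System.out.println(cache.get("someKey")); // loaded from the "database"
        System.out.println(cache.get("someKey")); // served from the cache
        System.out.println("database calls: " + dbCalls.get());
    }
}
```

The second lookup never reaches the loader, which is the whole point of a cache.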

Hashtable and ConcurrentHashMap both sit on top of a key-value hash table, which is designed to give O(1) lookups on average. Redis does the same, as does Memcached, another popular caching system.

What does a hash table look like? As many of you know, here’s a simple picture:


Simply put, a hash table is an array whose elements are linked lists.

Why linked lists? In theory, if the array could be made infinitely large, every key could be given a slot of its own. But that obviously doesn’t work: the larger the array, the more memory it takes up.

Therefore, we need to limit the size of the array. Let’s say it is 16. After computing the hash value of a key, we take it modulo 16 to get a number between 0 and 15 and place the entry at that index of the array.

Now suppose key1 lands on index 2, and key9 lands on index 2 as well. The two keys collide, so key1’s entry has to be stored as a linked-list node; when key9 arrives, all we need to do is point the next pointer of key1’s node at a new node for key9.
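The array-of-linked-lists layout described above can be sketched in a few lines. This is a toy with a fixed 16 buckets; the class name ChainedTable is invented for illustration.

```java
// A tiny hash table with separate chaining (illustrative sketch, fixed at 16 buckets).
class ChainedTable {
    // One node of a bucket's linked list.
    private static final class Node {
        final String key;
        String value;
        Node next;
        Node(String key, String value) { this.key = key; this.value = value; }
    }

    private final Node[] buckets = new Node[16];

    // Map a key to a bucket index between 0 and 15.
    int indexOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, String value) {
        int i = indexOf(key);
        if (buckets[i] == null) {
            buckets[i] = new Node(key, value); // empty bucket: start a new list
            return;
        }
        Node n = buckets[i];
        while (true) {
            if (n.key.equals(key)) { n.value = value; return; } // key exists: overwrite
            if (n.next == null) break;
            n = n.next;
        }
        n.next = new Node(key, value); // collision: link the new node after the tail
    }

    String get(String key) {
        // Walk the list in the key's bucket until the key is found.
        for (Node n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

In Java the strings "Aa" and "BB" have the same hashCode, so putting both into this table places them in the same bucket, one linked after the other.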

Is that all? Imagine what happens if a list keeps getting longer.

Obviously, the longer the list, the worse queries, inserts, and deletions perform. In the extreme case where every element ends up in a single list, the complexity degrades to O(n), no better than a sequential search. (This is why the HashMap in Java 8 converts a bucket from a linked list to a red-black tree once it holds enough elements: to limit how far lookup performance can degrade.)

How to solve it? Rehash.

In Java’s HashMap, rehashing is done by the resize method. For the Redis side, read “A little internal on Redis key value storage implementation” to understand the Redis rehash algorithm; you will be surprised to find that Redis keeps two hash tables.
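Why keep two tables? One use is to spread the cost of a rehash over many operations instead of paying for it all at once. The sketch below illustrates that idea only; it is not Redis’s actual code, and the class name TwoTableDict, the initial size of 4, and the doubling rule are assumptions chosen for the example.

```java
// Simplified dictionary with two tables and incremental rehashing (a sketch, not Redis's code).
class TwoTableDict {
    private static final class Node {
        final String key;
        String value;
        Node next;
        Node(String key, String value) { this.key = key; this.value = value; }
    }

    private Node[] oldTable = new Node[4]; // the table currently in use
    private Node[] newTable = null;        // non-null only while a rehash is in progress
    private int rehashIndex = 0;           // next bucket of oldTable to migrate
    private int size = 0;

    private static int index(String key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    boolean isRehashing() { return newTable != null; }

    // Migrate a single bucket from oldTable to newTable.
    private void rehashStep() {
        if (newTable == null) return;
        Node n = oldTable[rehashIndex];
        oldTable[rehashIndex] = null;
        while (n != null) {
            Node next = n.next;
            int i = index(n.key, newTable.length);
            n.next = newTable[i];
            newTable[i] = n;
            n = next;
        }
        rehashIndex++;
        if (rehashIndex == oldTable.length) { // every bucket has moved: swap the tables
            oldTable = newTable;
            newTable = null;
            rehashIndex = 0;
        }
    }

    // During a rehash a key may live in either table, so look in both.
    private Node find(String key) {
        for (Node n = oldTable[index(key, oldTable.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n;
        }
        if (newTable != null) {
            for (Node n = newTable[index(key, newTable.length)]; n != null; n = n.next) {
                if (n.key.equals(key)) return n;
            }
        }
        return null;
    }

    void put(String key, String value) {
        rehashStep(); // each operation pays for a small slice of the rehash
        Node existing = find(key);
        if (existing != null) { existing.value = value; return; }
        if (newTable == null && size >= oldTable.length) {
            newTable = new Node[oldTable.length * 2]; // table is full: start growing
            rehashIndex = 0;
        }
        Node[] target = (newTable != null) ? newTable : oldTable; // new keys go to the new table
        int i = index(key, target.length);
        Node n = new Node(key, value);
        n.next = target[i];
        target[i] = n;
        size++;
    }

    String get(String key) {
        rehashStep();
        Node n = find(key);
        return n == null ? null : n.value;
    }
}
```

No single put or get ever has to move the whole table, so no single request suffers a long pause, which matters for a server that handles requests one at a time.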

OK, that was a peek at Redis from a very micro perspective. In the following sections, we will look at Redis from a macro perspective.

C/S architecture

As Redis users, how do we put data into the Hash table mentioned above?

We can use the Redis command line, or we can use the Redis client libraries available in various languages to operate on the hash table from code. All of these are Redis clients, while the hash table lives on the Redis server. In other words, Redis has a C/S (client/server) architecture.
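To make the client side a little more concrete: whatever language a client is written in, it ends up sending commands to the server in the Redis wire protocol, RESP, as an array of bulk strings. The encoder below is a small sketch of that framing; the class name RespEncoder is invented here, and a real client library also handles sockets, replies, and errors.

```java
import java.nio.charset.StandardCharsets;

// Encodes a Redis command in RESP framing (sketch; no networking, no reply parsing).
class RespEncoder {
    // A command is an array ("*<count>") of bulk strings ("$<byte length>"),
    // with every line terminated by CRLF.
    static String encode(String... parts) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(parts.length).append("\r\n");
        for (String p : parts) {
            int byteLength = p.getBytes(StandardCharsets.UTF_8).length; // length is counted in bytes
            sb.append('$').append(byteLength).append("\r\n").append(p).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What a client would send for: SET someKey hello
        System.out.print(encode("SET", "someKey", "hello"));
    }
}
```

Because the protocol is plain bytes on a socket, the client and the server are free to run on different machines and be written in different languages.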

Obviously, the Client and Server can be on the same machine or not:


If you want to play with Redis but don’t want to set up your own environment, Try Redis is a fun website. You can follow its prompts to get familiar with the basic Redis commands and get a feel for Redis’s C/S mode.

It is worth mentioning that the Redis server is single-threaded and processes client requests with an event loop, much like Node.js. The benefits of using a single thread include:

  • No thread-safety worries. Many operations need no locking, which simplifies development and improves performance;
  • No thread-switching overhead. With multiple threads, the CPU spends time switching back and forth between them; a single-threaded server does not have this cost.
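The first benefit can be shown with a small analogy in Java. All commands are funneled through one worker thread, so the data can live in a plain HashMap with no locks. This is only an analogy (a real event loop multiplexes network connections), and the class SingleThreadedStore and its methods are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Every command runs on one worker thread, so the map needs no locking (analogy only).
class SingleThreadedStore {
    private final Map<String, String> data = new HashMap<>(); // touched only by the loop thread
    private final ExecutorService loop = Executors.newSingleThreadExecutor();

    // Queue a command for the loop thread and wait for its reply.
    private String run(Callable<String> command) {
        try {
            return loop.submit(command).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    String set(String key, String value) {
        return run(() -> { data.put(key, value); return "OK"; });
    }

    String get(String key) {
        return run(() -> data.get(key));
    }

    // Read-modify-write without a lock: safe because commands never interleave.
    String incr(String key) {
        return run(() -> {
            long n = Long.parseLong(data.getOrDefault(key, "0")) + 1;
            data.put(key, Long.toString(n));
            return Long.toString(n);
        });
    }

    void shutdown() {
        loop.shutdown();
    }
}
```

Many client threads can call incr at the same time and no increment is lost, because the commands are executed strictly one after another.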

Of course, the biggest problem with a single-threaded server is that it cannot take full advantage of multiple processors. But don’t forget that machines are cheap these days. Keep reading.

Clustering

OK, now that we know Redis has a C/S architecture, let’s start using it to cache data and relieve the pressure on the database.

We set up a simple architecture with one client and one Redis cache server:


At first the skies were clear and the system worked well.

As time passed, more and more clients in our system started using Redis, and it turned into something like this:


This raises two problems:

  • Redis runs out of memory: as more and more clients use Redis, the cached data keeps growing. The memory of a single machine is limited, so it cannot hold that much data.
  • Redis throughput is too low: there are more clients but still only one Redis, and we already know that Redis is single-threaded. It is like opening a restaurant that serves 100 customers a day with one waiter, then growing to 1,000 customers a day and still having only one waiter. A single machine’s bandwidth and processing power are limited, so Redis is overwhelmed and its throughput cannot support our ever-larger system.

After analyzing the problem, the solution is clear: a cluster. If one Redis is not enough, add more!


Requests from clients are distributed across the Redis servers by a load-balancing algorithm, usually consistent hashing. With clustering, we gain two features:

  • Expand cache capacity;
  • Improve throughput;

It solves the two problems mentioned above.
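For readers who have not met consistent hashing before, here is a minimal client-side sketch. Servers are placed at many points on a ring, and a key belongs to the first server point at or after the key’s own hash. The class name ConsistentHashRing, the server names, the use of MD5, and the number of virtual nodes are all choices made for this example, not something Redis prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Client-side consistent hashing over a set of cache servers (illustrative sketch).
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> server name
    private final int virtualNodes;                             // points per server on the ring

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    // Walk clockwise from the key's position to the next server point, wrapping at the end.
    String serverFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers on the ring");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Use the first 8 bytes of an MD5 digest as the ring position.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is a required algorithm on the Java platform
        }
    }
}
```

The useful property is that removing one server only remaps the keys that lived on it; keys on the other servers stay where they were, so most of the cache survives the change.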

Master-slave replication

OK, now we have upgraded Redis to a cluster, and it really helps. But after it had been running for a while, the operations team came back with two problems:

  • Poor data availability: if one of the Redis servers goes down, all the cached data on it is lost. Every request that could have been served from that cache now goes to the database, and the pressure on the database rises sharply.
  • Slow queries: monitoring showed that Redis 1 receives very heavy traffic for a period each day, and most of the requests look up the same cached data. Redis 1 becomes so busy that its throughput cannot support the query load.

After analyzing the problem, the first thing that comes to mind is the master-slave mode often used in databases. So we add a Slave to each Redis:


With master-slave mode, we gain two more features:

  • High data availability: the Master receives write requests from clients and synchronizes the data to the Slave as a backup. If the Master dies, the Slave can be promoted to Master.
  • Better query efficiency: when the Master is too busy, some of the query requests can be sent to the Slave instead. That is, the Master handles reads and writes (or only writes), and the Slave handles reads.
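Both features can be seen in a toy model. Writes go to the master and are copied to the slaves, reads are spread over the slaves, and a slave can be promoted if the master is lost. Everything here is in one process and the copy is synchronous to keep the example short, whereas real replication happens over the network and is asynchronous; the class ReplicatedStore and its method names are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of master-slave replication with read/write splitting (illustrative only).
class ReplicatedStore {
    static final class Node {
        final String name;
        final Map<String, String> data = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    private Node master;
    private final List<Node> slaves = new ArrayList<>();
    private int nextSlave = 0; // round-robin cursor for reads

    ReplicatedStore(String masterName, String... slaveNames) {
        master = new Node(masterName);
        for (String name : slaveNames) {
            slaves.add(new Node(name));
        }
    }

    // Writes always go to the master, which copies them to every slave.
    void write(String key, String value) {
        master.data.put(key, value);
        for (Node slave : slaves) {
            slave.data.put(key, value); // simplification: copied immediately, not asynchronously
        }
    }

    // Reads are spread over the slaves; the master answers only if it has none.
    String read(String key) {
        if (slaves.isEmpty()) return master.data.get(key);
        Node slave = slaves.get(nextSlave);
        nextSlave = (nextSlave + 1) % slaves.size();
        return slave.data.get(key);
    }

    // The master is gone: promote the first slave, which already holds a copy of the data.
    void failover() {
        if (slaves.isEmpty()) throw new IllegalStateException("no slave to promote");
        master = slaves.remove(0);
        nextSlave = 0;
    }

    String masterName() {
        return master.name;
    }
}
```

After a failover the promoted node still serves every key written before the old master disappeared, which is exactly the availability gain described above.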

To make master-slave mode more powerful, we can of course add more slaves, like this:


Master/slave chains can also be used, so that data is backed up along a chain:


That’s right, we are letting slaves have slaves of their own, a bit like an ancient system of servitude.

This way the topmost Master is under less backup pressure: it only needs to replicate to its two direct slaves, and those two slaves in turn replicate to the slaves below them.
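A tiny model makes the saving visible. Each node forwards a write only to its direct slaves and counts how many copies it pushed. The class ChainNode and its fields are hypothetical, and the propagation is a plain method call rather than real replication.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A node in a master/slave chain: it forwards writes only to its direct slaves (sketch).
class ChainNode {
    final String name;
    final Map<String, String> data = new HashMap<>();
    final List<ChainNode> slaves = new ArrayList<>();
    int sends = 0; // number of copies this node pushed to its direct slaves

    ChainNode(String name) {
        this.name = name;
    }

    ChainNode addSlave(ChainNode slave) {
        slaves.add(slave);
        return this;
    }

    // Store the value locally, then pass it down one level.
    void apply(String key, String value) {
        data.put(key, value);
        for (ChainNode slave : slaves) {
            sends++;
            slave.apply(key, value);
        }
    }
}
```

With one master, two slaves, and two more slaves under each of those, a single write reaches all seven nodes while the master itself sends only two copies.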

For more on master/slave chains, see this article:
RedisLab Master/slave chains

Redis is not that simple

This article has only given you a stroll around the Redis estate, a first look at Redis from the micro level to the macro level.

In fact, Redis has many more issues to deal with internally:

  • Data structures. As mentioned at the beginning of this article, Redis is not just a data store but a data structure store. That is because Redis lets clients put various data structures into it directly: strings, lists, sets, sorted sets, hashes, and so on. Is that a big deal, you might ask? After all, a Hashtable written in Java can hold all kinds of data structures too. The difference is that a Java Hashtable can only hold Java objects, whereas Redis works across languages: you can put data structures into Redis whether your client is written in Java or Python. This is one of the big differences between Redis and Memcached. Of course, Redis pays for its data structure support with higher memory usage. See: Redis Data Types
  • Eviction policy. The cache can’t grow indefinitely, so some data has to be removed to make room for new cache data. This calls for an LRU algorithm; see: Redis LRU Cache
  • Load balancing. Once you use a cluster, load balancing is unavoidable. Which load-balancing algorithm is used? Where is it applied? Here is one reference: Redis Partitioning
  • Presharding. What if you start with three Redis servers and later find you need one more to meet business demand? Redis offers a strategy called Presharding
  • Data persistence. If my machine suddenly loses power, can my cached data be recovered? Redis says: trust me, it can; how else could I be used as a database? See: Redis Persistence
  • Data synchronization. This article covered master-slave replication, but how does Redis actually do it? According to the CAP theorem, since we have chosen a cluster, that is, P (partition tolerance), we can keep only one of the remaining two, Consistency and Availability. So does Redis support eventual consistency or strong consistency? See: Redis Replication
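As a taste of the eviction topic in the list above, here is the textbook LRU (least recently used) cache built on Java’s LinkedHashMap. It illustrates the concept only: Redis’s own eviction is documented as an approximation of LRU rather than an exact one, and the class name LruCache and the capacity parameter are inventions for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// An exact LRU cache in a few lines (concept demo, not how Redis implements eviction).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting a and b, reading a, and then inserting c evicts b, because b is the entry that has gone unused the longest.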

References & Learning resources

Website:

  • The official Redis website (I recommend it because it is primary learning material; everything else is secondary at best, and primary material is the most authoritative and accurate)
  • Try Redis (if you can’t be bothered to install an environment, this is a good option…)

Books (I have not read these yet, but their tables of contents look good; they are listed here for future, deeper study of Redis):

  • Redis in Action (the “in Action” series has left a good impression on me; suitable for learning Redis systematically)
  • Redis Design and Implementation (covers the source code)
  • Redis Development and Operations (shows how Redis is used in practice)

Paper (just one for now):

  • Persisting Objects in Redis Key-Value Database