Data structures, not types

Many articles say that Redis supports five common data types, which is misleading. Everything stored in Redis is binary data, in effect a byte array (byte[]). These bytes have no data type; only when they are decoded in some sensible format do they become a string, an integer, or an object and acquire a type. Keep that in mind. It follows that anything convertible to a byte array can be stored in Redis: a string, a number, an object, an image, a sound, a video, a file, just turn it into bytes.

Therefore, a String in Redis is not really a string. It represents the simplest data structure: one key corresponds to exactly one value. Both key and value are byte arrays; the key is usually a string converted to bytes, while the value depends on the situation. Some operations do place requirements on the value. For example, to increment or decrement it, the value's byte array must decode to a number; otherwise an error is returned.

The other structures follow the same pattern:
- List: one key corresponds to multiple values; the values are ordered and may repeat.
- Set: one key corresponds to multiple values; the values are unordered and may not repeat.
- Hash: one key corresponds to multiple key-value pairs; the order of the pairs is not significant. It is accessed by name semantics, not by position semantics.
- Sorted Set: one key corresponds to multiple values; the values may not repeat, and each value is associated with a floating-point number called the score. Elements are sorted by score, then by value.
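The point that values are just bytes until decoded can be illustrated with a tiny sketch (a simulation of the idea in plain Python, not Redis code):

```python
store: dict[str, bytes] = {}  # everything Redis stores is ultimately bytes

# Any value becomes bytes on the way in:
store["name"] = "lixinjie".encode()
store["count"] = b"41"

def incr(store: dict, key: str) -> int:
    """Increment works only if the stored bytes decode to an integer."""
    value = int(store.get(key, b"0"))   # raises ValueError for non-numeric bytes
    store[key] = str(value + 1).encode()
    return value + 1

print(incr(store, "count"))  # → 42
# incr(store, "name") would raise ValueError: those bytes don't decode to a number
```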
Once you understand these five data structures, their corresponding commands will seem trivial.

Cluster problems and solutions

The benefits of clustering are obvious: increased capacity, greater processing power, and the ability to scale up and down on demand. But clustering also introduces new problems, at least the following two. First, data distribution: which node should a piece of data be written to, and which node should it be read from? Second, data movement: when a new node joins the cluster, where does its data come from? And when the cluster shrinks, where does the data on a removed node go?

These two problems share one core question: how to describe and store the mapping between data and nodes. Since the location of a piece of data is determined by its key, the problem becomes how to relate every key to all the nodes in the cluster. The number of nodes in a cluster is relatively small and fairly stable, even though nodes are occasionally added and removed, whereas the keys stored in a cluster are random, irregular, unpredictable, massive, and fine-grained.

It is like the relationship between a university and all of its students. Linking the university directly to each student would be chaotic. In reality, several layers sit between them: first departments, then majors, then grades, and finally classes. After these four layers of mapping, the relationship is much cleaner. This suggests an important conclusion: there is hardly a problem that cannot be solved by adding a layer of indirection, and if one layer is not enough, add another. The same holds in computing.

Redis does exactly this, adding a layer between data and nodes called the slot; because this layer is tied to hashing, it is also called the hash slot. Now nodes hold slots, and slots hold data. The slot solves the granularity problem: it makes the unit of ownership coarser, which makes data movement easier. Hashing solves the mapping problem: the hash of the key determines the slot, which makes data distribution straightforward.
One way to think about it: your desk is cluttered with books and finding any one of them is hard. So you buy large bins, sort the books into bins by the length of their titles, and put the bins on the desk. Now the desk holds bins, and the bins hold books. Moving books becomes easy: pick up a bin and go. Finding a book is easy too: measure the length of its title and look in the corresponding bin. We did not really do anything clever; we just bought a few bins and placed the books into them by a fixed rule, and that simple move completely changed the original scattered situation. Isn't that a little magical?

A Redis cluster contains exactly 16384 slots, numbered 0 to 16383. These slots are allocated across all the primary nodes in the cluster, with no constraint on the allocation policy: you can assign any slot numbers to any primary node, and the cluster records the mapping between nodes and slots. A key is then hashed and the result is taken modulo 16384; the remainder is the slot the key falls into:

slot = CRC16(key) % 16384

Because the number of slots is fixed, moving data by slot is easy, which solves the data-movement problem. And because the hash function maps any key to a slot, and the cluster stores the slot-to-node mapping, every key can be resolved to a node, which solves the data-distribution problem.

What I want to say is that ordinary people only want to learn one technology after another, while masters care more about stepping back from the technology to find the underlying idea or direction. If you follow that direction, you will probably find the answer you want.
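The slot calculation above can be sketched in Python. Redis Cluster uses the CRC16-CCITT (XMODEM) variant; the bit-by-bit implementation below is a minimal illustration (the hash-tag rule discussed later is omitted for simplicity):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0 —
    the variant used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster slots."""
    return crc16(key) % 16384

# The Redis Cluster specification gives CRC16("123456789") = 0x31C3,
# so that key maps to slot 0x31C3 % 16384 = 12739.
print(key_slot(b"123456789"))  # → 12739
```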


How the cluster handles commands

After establishing a connection with any node in the cluster, a client can obtain information about all the nodes, including the full mapping from hash slots to nodes, which it caches locally because it is useful. A client may send a request to any node, so given a key, which node should it contact?

In effect, the cluster pushes the key-to-node mapping logic down to the client. The client implements the same hash function as the cluster: it computes the hash of the key, takes it modulo 16384 to find the key's hash slot, and then uses its cached slot-to-node mapping to find the node responsible for that key. Then it sends the request there. The client can also cache the mapping from key to node directly, so that the next request for the same key goes straight to the right node without recomputation.

There is always a gap between theory and reality: the cluster may have changed while the client's cache has not yet been updated. So there will be cases where the client sends a request for a key to a node that no longer holds that key. What should the node do? It could fetch the data from the node that now holds the key and return it to the client, or it could simply tell the client that the key is no longer here, attaching the address of the node that now has it, so the client can retry there, much like an HTTP 302 redirect. It is a matter of choice, and a matter of philosophy, and Redis Cluster chose the latter. A node only processes keys it owns; for a key it does not own, it returns a redirection error such as -MOVED 3999 127.0.0.1:6381 (the number is the key's slot), and the client re-sends the request to the new node. So choice is philosophy as well as wisdom; more on that later. Now let's look at another situation with some similarities to this one.
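A client's handling of this redirection can be sketched as follows; parse_moved and the example address are illustrative, not part of any real client library:

```python
def parse_moved(error_line: str) -> tuple[int, str, int]:
    """Parse a '-MOVED <slot> <host>:<port>' redirection error into its parts."""
    parts = error_line.lstrip("-").split()   # ["MOVED", "3999", "127.0.0.1:6381"]
    if parts[0] != "MOVED":
        raise ValueError("not a MOVED redirection")
    slot = int(parts[1])
    host, port = parts[2].rsplit(":", 1)
    return slot, host, int(port)

slot, host, port = parse_moved("-MOVED 3999 127.0.0.1:6381")
print(slot, host, port)  # → 3999 127.0.0.1 6381
```

On receiving this, a client would typically update its cached slot table for that slot and retry the request against the returned node.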
Redis has commands that take more than one key at a time, such as MGET; I call these multi-key commands. A multi-key command is still sent to a single node, and there is a potential problem: are all the keys in the command guaranteed to live on the same node? If they are not, the node could only return redirection errors, and with keys spread over several nodes the redirections would be a mess, so Redis Cluster simply does not support this case. If all the keys are on the same node, it works in principle; whether a particular Redis version and cluster setup supports it is best verified by testing.

This leads to an interesting observation: mapping a group of related keys to the same node improves efficiency, because a single multi-key command can fetch several values at once. So the question becomes: how do you name these keys so that they all land on the same node? Do you have to compute hash values and take remainders yourself? Of course not; Redis has worked it out for us. For two keys to land on the same node they must have the same hash value, and for the hash values to be the same, the strings passed to the hash function must be the same. But if we passed in two identical strings, we would have the same key, and the later write would overwrite the earlier one. The root problem is that the entire key is used for hashing, which couples the key to the string that participates in the hash. They need to be decoupled: the string used for hashing should be related to the key but not identical to it. Redis provides exactly this in the form of key hash tags, for example {user1000}.following and {user1000}.followers. You probably already see the trick: only the substring of the key between { and } is used to compute the hash.
This guarantees the hashes are the same, so the keys fall on the same node, yet the keys themselves are different and do not overwrite each other. By using a hash tag to associate a group of related keys, the problem is solved easily. As you can see, it is often clever thinking that solves problems, not necessarily clever technology or clever algorithms: a small trick, but a powerful one.

Finally, the philosophy of choice. The core of Redis is the fastest possible key/value access to its common data structures and the operations around them. Anything irrelevant to that core, or that would drag it down, Redis chooses to weaken or not handle at all, to keep the core simple, fast, and stable. Faced with breadth versus depth, Redis chose depth. That is why nodes do not handle keys they do not own, and why the cluster does not support cross-node multi-key commands: on one hand the node can respond to clients quickly, and on the other hand a great deal of data transfer and merging inside the cluster is avoided.
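The hash-tag rule can be sketched like this. Per the Redis Cluster specification, only the first {...} with non-empty content is used for hashing; otherwise the whole key is hashed:

```python
def hash_tag(key: str) -> str:
    """Return the substring used for slot hashing: the content of the first
    non-empty {...} if present, otherwise the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:  # the tag must be non-empty
            return key[start + 1:end]
    return key

print(hash_tag("{user1000}.following"))  # → user1000
print(hash_tag("{user1000}.followers"))  # → user1000
print(hash_tag("plain-key"))             # → plain-key
```

Both keys hash the same substring, so they are guaranteed to land in the same slot and therefore on the same node.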

Single threaded model

Each node in a Redis cluster has a single thread that accepts and executes all requests sent by clients. Technically it uses multiplexed I/O, via Linux's epoll facility, which lets a single thread manage many socket connections. Beyond that, there are several reasons for choosing a single thread:
1. Redis operates on memory, so it is very fast (100,000+ QPS);
2. most of the time is spent in network transmission anyway;
3. multithreading would require thread synchronization, making the implementation more complex;
4. acquiring locks can take longer than the memory operations themselves.
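A minimal sketch of the idea, one thread multiplexing several connections, using Python's selectors module (which wraps epoll on Linux). The socketpair endpoints stand in for client connections and the PONG reply is just illustrative:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS, etc.

# Two simulated client connections; the "server" ends are registered with the selector.
server_a, client_a = socket.socketpair()
server_b, client_b = socket.socketpair()
for s in (server_a, server_b):
    s.setblocking(False)
    sel.register(s, selectors.EVENT_READ)

client_a.sendall(b"PING")
client_b.sendall(b"PING")

# A single thread serves both connections: it only touches sockets that are ready.
served = 0
while served < 2:
    for key, _ in sel.select():
        conn = key.fileobj
        conn.recv(1024)             # read the request
        conn.sendall(b"+PONG\r\n")  # reply, then move on to the next ready socket
        served += 1

reply_a = client_a.recv(1024)
reply_b = client_b.recv(1024)
print(reply_a, reply_b)  # → b'+PONG\r\n' b'+PONG\r\n'
```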

The transaction

A transaction is conventionally understood as bundling multiple operations together so that either all of them execute (success) or none of them do (rollback). Redis also supports transactions, but not quite in the way you might expect. Let's take a look.

A Redis transaction has two phases: defining the transaction and executing it. The MULTI command starts a transaction, after which all the commands to be executed are queued in order; that defines the transaction. The EXEC command then executes the transaction, or the DISCARD command abandons it. You can also use the WATCH command to monitor keys you care about, so that the transaction is aborted if another client modifies them before it executes, and the UNWATCH command to stop monitoring them.

Redis transactions have the following characteristics:
1. if an error occurs before execution (while commands are being queued), none of the commands are executed;
2. once execution begins, all queued commands are executed in sequence without interruption;
3. if a command fails during execution, execution continues with the remaining commands rather than stopping;
4. errors encountered during execution are not rolled back.

Having seen this, you really want to ask: do you call this a transaction? Clearly it is not what we normally mean by one, since not even atomicity is guaranteed. Atomicity fails because Redis does not support rollback, but Redis gives its reasons: omitting rollback keeps the internal implementation simple and fast, and Redis also argues that rollback would not solve the real problem anyway, since such errors usually come from programming mistakes that a rollback cannot fix.
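Characteristics 3 and 4 can be illustrated with a toy in-memory model (a simulation of the behavior, not Redis code; the command set and store are made up for the example):

```python
def exec_queue(store: dict, queue: list):
    """Execute queued commands in order. On a failing command, record the
    error and keep going; earlier writes are NOT rolled back."""
    results = []
    for op, key, *args in queue:
        try:
            if op == "SET":
                store[key] = args[0]
                results.append("OK")
            elif op == "INCR":
                store[key] = int(store.get(key, "0")) + 1  # fails if not numeric
                results.append(store[key])
            else:
                raise ValueError(f"unknown command {op}")
        except ValueError as err:
            results.append(f"ERR {err}")
    return results

store = {}
queue = [("SET", "name", "lixinjie"),   # succeeds
         ("INCR", "name"),              # fails: "lixinjie" is not a number
         ("SET", "count", "1")]         # still executed after the error
results = exec_queue(store, queue)
print(results[0], results[2])  # → OK OK
print(store)  # 'name' kept its value — nothing was rolled back
```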

The pipeline

The interaction between client and server is serial and blocking: the client sends one command and must wait for the response before sending the next, costing one round trip per command. With many commands executed one after another, this gets slow. Redis therefore provides pipelining, which lets the client send multiple commands at once without waiting for responses from the server; after all the commands are sent, the client receives all the responses in turn. This saves a lot of round-trip time and improves efficiency.

In a cluster there is a catch, though: multiple commands mean multiple keys, and multiple keys may live on multiple nodes, so Redis Cluster cannot support such a pipeline directly. It can, however, be simulated on the client side: use a separate connection per node, send each node its commands at the same time, wait for all nodes to return their responses, and then reorder the responses into the original command order before handing them back to user code. Oh, what a bother.
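The client-side simulation starts by grouping commands by the node that owns each key. A toy sketch, where toy_key_slot and slot_to_node are stand-ins for the real CRC16 hash and the cluster's slot map:

```python
def toy_key_slot(key: str) -> int:
    """Stand-in for the real CRC16-based slot hash."""
    return sum(key.encode()) % 16384

def group_by_node(commands, slot_to_node):
    """Split a pipeline into per-node pipelines, remembering original positions
    so the responses can be reassembled in order."""
    groups = {}
    for index, cmd in enumerate(commands):
        key = cmd[1]                       # assume cmd = (op, key, *args)
        node = slot_to_node(toy_key_slot(key))
        groups.setdefault(node, []).append((index, cmd))
    return groups

# Two fake nodes: even slots on node A, odd slots on node B.
slot_to_node = lambda slot: "node-A" if slot % 2 == 0 else "node-B"

pipeline = [("GET", "k1"), ("GET", "k2"), ("GET", "k3")]
groups = group_by_node(pipeline, slot_to_node)
# Each node now receives only its own sub-pipeline; the saved indexes let the
# client merge the responses back into the original order.
```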

The protocol

A brief look at the Redis protocol shows the format in which Redis transmits data. The format of a request is:

*<number of arguments> CRLF
$<number of bytes of argument 1> CRLF
<data of argument 1> CRLF
…
$<number of bytes of argument N> CRLF
<data of argument N> CRLF

For example, for SET name lixinjie, the actual data sent is:

*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$8\r\nlixinjie\r\n

The type of a response is indicated by its first byte:

A single-line (status) reply: the first byte is +
An error message: the first byte is -
An integer: the first byte is :
A bulk reply: the first byte is $
A multi-bulk reply: the first byte is *

For example:

+OK\r\n

-WRONGTYPE Operation against a key holding the wrong kind of value\r\n

:1000\r\n

$6\r\nfoobar\r\n

*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n

As you can see, the Redis protocol is designed to be very simple.
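The request format above can be sketched as a small encoder (a minimal illustration, not a full protocol implementation):

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)

print(encode_command("SET", "name", "lixinjie"))
# → b'*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$8\r\nlixinjie\r\n'
```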