This article is a translation of the official Redis document "Redis Latency Problems Troubleshooting".

This document will help you understand what happens when you encounter latency issues with Redis.

In this context, latency refers to the time elapsed between sending a command from the client and receiving the response. Typically, Redis processes commands in very short, sub-millisecond times, but some situations can result in high latency.

I’m busy. Give me the list

What follows is important for running Redis with low latency. However, I understand that we are all busy, so let’s start with a quick list. If you have tried the following steps and failed, please come back and read the full documentation.

  1. Make sure you are not running slow commands that block the server. Use the Redis slow log feature to check this.
  2. For EC2 users, make sure you use modern HVM-based EC2 instance types, such as m3.medium. Otherwise fork() will be very slow.
  3. Transparent huge pages must be disabled in your kernel. Use echo never > /sys/kernel/mm/transparent_hugepage/enabled to disable them, and then restart the Redis process.
  4. If you are using a virtual machine, there may be intrinsic latency even if you are not doing anything with Redis. Use ./redis-cli --intrinsic-latency 100 to check the minimum latency of your runtime environment. Note: you need to run this command on the server, not the client.
  5. Enable Latency Monitor in Redis to get readable data on Latency events and causes in your Redis instance.

In general, the following list can be used to make trade-offs between durability and latency/performance, ordered from stronger safety to better latency.

  1. AOF + fsync Always: This is very slow and you should only use it if you know what you are doing.
  2. AOF + fsync every second: This would be a good compromise.
  3. AOF + fsync every second + no-appendfsync-on-rewrite option set to yes: This is also a compromise, but it avoids fsync at rewrite time, reducing disk stress.
  4. AOF + fsync Never: This scenario hands off fsync to the kernel, resulting in less risk of disk stress and latency spikes.
  5. RDB: You will have a wider range of trade-offs for this scenario, depending on how you trigger RDB persistence.

Now, let’s take 15 minutes to get into the details…

Measuring latency

If you are experiencing latency problems, you probably know how to measure them in the context of your application, or maybe the latency problem is obvious even macroscopically. In any case, redis-cli can measure the latency of a Redis server in milliseconds using the following command:

redis-cli --latency -h `host` -p `port`
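
While it runs, the tool continuously prints updated statistics. The output looks roughly like this (the figures are the sample from the upstream documentation; yours will differ):

min: 0, max: 1, avg: 0.19 (427 samples)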

Using the latency monitoring subsystem built into Redis

Since Redis 2.8.13, Redis has provided latency monitoring, which samples different execution paths to detect where the server is blocking. This makes debugging the problems described in this document much simpler, so we recommend enabling latency monitoring as soon as possible. See the latency monitoring documentation.

The latency monitoring sampling and reporting capabilities will make it easier for you to find the source of the latency, but we recommend reading this document in full to better understand Redis and latency spikes.
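
As a quick start, a minimal session might look like the following; the 100 millisecond threshold is just an example value:

$ redis-cli CONFIG SET latency-monitor-threshold 100   # record events slower than 100 ms
$ redis-cli LATENCY LATEST                             # latest spike recorded per event type
$ redis-cli LATENCY DOCTOR                             # human-readable analysis and advice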

Latency baseline

One type of latency that is itself part of the environment in which you run Redis is the latency caused by the operating system kernel and, if you use virtualization, the hypervisor.

This latency cannot be eliminated, and it is important to learn about it because it is the baseline. In other words, because of the kernel and hypervisor, you cannot get lower latency than what every other program running in your environment experiences.

We call this latency intrinsic latency, and redis-cli has been able to measure it since Redis 2.8.7. Here is an example run on an entry-level server running Linux 3.11.0:
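
(The output below is the sample from the upstream document; actual numbers depend on your hardware and system load.)

$ ./redis-cli --intrinsic-latency 100
Max latency so far: 1 microseconds.
Max latency so far: 16 microseconds.
Max latency so far: 50 microseconds.
Max latency so far: 53 microseconds.
Max latency so far: 83 microseconds.
Max latency so far: 115 microseconds.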

Note: The argument 100 is the duration of the test in seconds. The longer we run the test, the more likely we are to spot latency spikes. 100 seconds is usually enough, but you may want to run multiple tests of varying lengths. Please note that this test is CPU intensive and will likely saturate a single core of your system.

Note: redis-cli in this example needs to run on the server where you run or plan to run Redis, not on the client. In this special mode redis-cli does not connect to a Redis server at all: it simply tries to measure the largest amount of time during which the kernel does not give CPU time to the redis-cli process itself.

In the example above, the intrinsic latency of the system is only 0.115 milliseconds (115 microseconds), which is good news, but keep in mind that intrinsic latency can vary over time depending on the load of the system.

Virtualized environments will show worse figures, especially under high load or when other virtual machines compete for resources ("noisy neighbors"). Here is the result of a run on a Linode 4096 instance running Redis and Apache:
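
(Again, this is the sample output from the upstream document.)

$ ./redis-cli --intrinsic-latency 100
Max latency so far: 573 microseconds.
Max latency so far: 695 microseconds.
Max latency so far: 919 microseconds.
Max latency so far: 1606 microseconds.
Max latency so far: 3191 microseconds.
Max latency so far: 9243 microseconds.
Max latency so far: 9671 microseconds.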

Here we have an intrinsic latency of 9.7 milliseconds: this means we cannot ask Redis to do better than that. However, runs at different times, in different virtualization environments, with higher load or noisy neighbors can easily show worse values. We were able to measure up to 40 milliseconds on systems that otherwise appeared to be running normally.

Latency affected by network and communication

Clients connect to Redis via TCP/IP or Unix domain sockets. Typical latency on a 1 Gbit/s network is about 200 microseconds, and can be as low as 30 microseconds using Unix domain sockets. It really depends on your network and system hardware. On top of the communication itself, the system adds more latency (due to thread scheduling, CPU caches, NUMA placement, etc.). System-induced latency is significantly higher in a virtualized environment than on a physical machine.

The conclusion is that even though Redis processes most commands in microseconds, a client performing many round trips to the server will have to pay for these network- and system-related latencies.

Therefore, an efficient client pipelines multiple commands together to reduce the number of round trips. This is supported by the server and by most clients. Aggregated commands such as MSET/MGET serve the same purpose. Since Redis 2.4, a number of commands also support variadic arguments for all data types.

Here are some guidelines:

  1. If you can afford it, prefer physical machines over VMs to host the server.
  2. Don’t always connect and disconnect (especially not for Web-based applications). Stay connected as much as possible.
  3. If your client and server are on the same machine, use Unix domain sockets.
  4. Use batch commands (MSET/MGET), or commands with variable arguments, instead of pipelining operations.
  5. Pipelined operations (if possible) should be preferred to a series of individual commands.
  6. Redis supports Lua server-side scripts to cover cases where raw pipelining operations are inappropriate (for example, when the result of a command is the input to the next command).

On Linux, users can achieve better latency through process placement (taskset), cgroups, real-time priorities (chrt), NUMA configuration, or by using a low-latency kernel. Note that Redis is not a good candidate for binding to a single CPU core. Redis can fork background tasks, such as those for BGSAVE or AOF rewrites, that can be extremely CPU consuming. These tasks must never run on the same core as the main event loop.

In most environments, these system-level optimizations are not required. Do this only when you need them or are familiar with them.

The single-threaded nature of Redis

Redis is designed to use a single thread most of the time. This means that a single thread handles all requests from clients, using a technique called multiplexing. As a result, Redis processes only one request at a time, so all requests are served sequentially. This is a lot like how Node.js works. However, neither product is generally considered slow. This is partly because they take a short time to process each request, but chiefly because they are designed not to block on system calls, especially when reading data from or writing data to a socket.

I say Redis is mostly single-threaded because since Redis 2.4, we have used multithreading to perform some slow I/O operations in the background, mostly related to disk I/O, but that doesn’t change the fact that Redis uses a single thread to handle all requests.

Delays caused by slow queries

The consequence of a single thread is that when a request is slow to serve, all other clients must wait for it to complete. This is perfectly fine when executing common commands like GET or SET or LPUSH, which run in constant (very short) time. However, several commands operate on a large number of elements, like SORT, LREM, SUNION, and others. For example, taking the intersection of two large sets can take a considerable amount of time.

The algorithmic complexity of all commands is documented. A good practice is to systematically test unfamiliar commands before using them.

If you have concerns about latency, you should either avoid slow queries against values composed of many elements, or run a replica of Redis dedicated to your slow queries.

You can use Redis’ slow logging feature to monitor slow queries.
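
For example, a minimal slow log session could look like the following; the 10000 microsecond threshold is only an illustrative value:

$ redis-cli CONFIG SET slowlog-log-slower-than 10000   # log commands taking longer than 10 ms
$ redis-cli SLOWLOG GET 10                             # inspect the 10 most recent slow commands
$ redis-cli SLOWLOG RESET                              # clear the slow log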

In addition, you can use your favorite process monitor (top, htop, prstat, etc.) to quickly check the CPU consumption of the main Redis process. If it is high while traffic is not, it usually indicates that slow queries are being executed.

Important: A very common source of latency caused by slow queries is the KEYS command executed in production environments. KEYS, as documented, should be used only for debugging purposes. Since Redis 2.8, new commands have been introduced to iterate over the key space and other large collections incrementally. See the SCAN, SSCAN, HSCAN, and ZSCAN commands for more information.
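
For example, instead of KEYS you can iterate the key space incrementally; the pattern and COUNT below are illustrative:

$ redis-cli SCAN 0 MATCH 'user:*' COUNT 100   # one iteration step: returns a cursor plus a batch of keys
$ redis-cli --scan --pattern 'user:*'         # redis-cli can drive the full cursor loop for you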

Delay caused by fork

To generate the RDB file in the background, or to rewrite the AOF file when AOF persistence is enabled, Redis has to fork background processes. The fork operation (running in the main thread) can itself induce latency. Fork is an expensive operation on Unix-like systems because it involves copying a large number of objects linked to the process. This is especially true for the page table associated with the virtual memory mechanism.

For example, on a Linux/AMD64 system, memory is divided into 4 kB pages. To translate virtual addresses into physical addresses, each process keeps a page table (actually represented as a tree) containing at least a pointer for every 4 kB page of the process's address space. So a 24 GB Redis instance requires a page table of 24 GB / 4 kB * 8 bytes = 48 MB.
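
You can verify the arithmetic in a shell (4096-byte pages, 8 bytes per page table pointer):

$ echo $(( (24 * 1024 * 1024 * 1024 / 4096) * 8 / 1024 / 1024 ))   # result in MB
48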

When a BGSAVE is performed, the instance has to fork, which requires allocating and copying 48 MB of memory for the child's page table. This takes time and CPU, especially on virtual machines, where allocating and initializing a large memory chunk can be expensive.

Fork time on different systems

Modern hardware can copy a page table very quickly, but Xen cannot. The problem is not virtualization in general, but Xen specifically. For example, using VMware or VirtualBox does not result in slow fork times. The list below compares fork times for different Redis instance sizes. The data is obtained by executing BGSAVE and reading the latest_fork_usec field of the INFO command output.
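
You can check this figure on your own instance as follows (the value shown here is just an example):

$ redis-cli INFO stats | grep latest_fork_usec
latest_fork_usec:59477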

The good news, however, is that EC2 HVM-based instances perform well on fork, almost as well as physical machines, so using m3.medium (or better) instance types will give good results.

  1. A Linux VM running on VMware forked 6.0 GB of RSS in 77 milliseconds (12.8 milliseconds per GB).
  2. Linux running on a physical machine (unknown hardware) forked 6.1 GB of RSS in 80 milliseconds (13.1 milliseconds per GB).
  3. Linux running on a physical machine (Xeon @ 2.27 GHz) forked 6.9 GB of RSS in 62 milliseconds (9 milliseconds per GB).
  4. A Linux VM (KVM) running on 6sync forked 360 MB of RSS in 8.2 milliseconds (23.3 milliseconds per GB).
  5. A Linux VM (Xen) running on an old EC2 instance type forked 6.1 GB of RSS in 1460 milliseconds (239.3 milliseconds per GB).
  6. A Linux VM (Xen) running on a new EC2 instance type forked 1 GB of RSS in 10 milliseconds (10 milliseconds per GB).
  7. A Linux VM (Xen) running on Linode forked 0.9 GB of RSS in 382 milliseconds (424 milliseconds per GB).

As you can see, the performance penalty for some virtual machines running on Xen ranges from one to two orders of magnitude. For EC2 users, the advice is simple: use modern HVM-based instances.

Latency induced by transparent huge pages

Unfortunately, if transparent huge pages are enabled in the Linux kernel, Redis incurs a big latency penalty after the fork call used to persist to disk. Huge pages cause the following problem:

  1. Fork is invoked, creating two processes that share huge memory pages.
  2. On a busy instance, a few event loop runs will touch thousands of memory pages, triggering copy-on-write of almost the whole process memory.
  3. This results in big latency spikes and big memory usage.

Make sure to disable transparent Huge Pages using the following command:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
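
Note that this setting does not survive a reboot. A common approach, which depends on your distribution and is not part of the original document, is to re-apply the setting at boot time, for example from /etc/rc.local, or by passing transparent_hugepage=never on the kernel command line:

# e.g. in /etc/rc.local, or an equivalent boot script (assumption: your system runs one at boot):
echo never > /sys/kernel/mm/transparent_hugepage/enabled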

Delays caused by page swapping (operating system paging)

Linux (and many other modern operating systems) can relocate memory pages from memory to disk, and vice versa, in order to use system memory more efficiently.

If the kernel moves a Redis memory page to the swap file, then whenever Redis uses the data stored in that page (for example, accessing a key stored there), the kernel stops the Redis process in order to move the page back into main memory. This is a slow operation involving random I/O (compared to accessing a page already in memory), and the Redis clients will experience abnormal latency.

The kernel saves Redis memory pages to disk for three main reasons:

  1. The system is under memory pressure because running processes require more physical memory than is available. The simplest example is when Redis uses more memory than is available.
  2. The data set, or part of the data set, of the Redis instance is almost completely idle (never accessed by clients), so the kernel may swap the idle memory pages to disk. This problem is rare, because even a moderately used instance usually touches all its memory pages often enough to force the kernel to keep them in memory.
  3. Some processes on the system generate a large number of read and write I/O operations. Because files are cached, this puts pressure on the kernel's file system cache, and therefore swap activity occurs. Note that this includes the Redis RDB and AOF background threads, which produce large files.

Fortunately, Linux provides good tools to verify this problem, so the easiest thing to do is to check for it when swapping memory is suspected of causing delays.

The first thing to do is check the number of Redis memory swaps to disk. To do this, you need to get the PID of the Redis instance:

$ redis-cli info | grep process_id
process_id:5454

Now go to the directory for the process in the /proc directory:

$ cd /proc/5454

Here you’ll find a file called smaps that describes the memory layout of the Redis process (assuming you’re running Linux kernel 2.6.16 or newer). This file contains very detailed information about the process memory maps, and one field, Swap, is exactly what we are looking for. However, there is not just a single Swap field, since the smaps file lists the different memory maps of the Redis process (the memory layout of a process is much more complex than a simple linear table).

Since we are interested in all the memory swapped out by the process, the first thing to do is grep all the Swap fields in the file:

$ cat smaps | grep 'Swap:'
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                 12 kB
Swap:                156 kB
Swap:                  8 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  4 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  4 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  4 kB
Swap:                  4 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB

If everything is 0 kB, or there are just a few scattered 4 kB entries, everything is perfectly fine. In fact, in our example (a real website running Redis, serving hundreds of users per second) some entries show more swapped pages. To see whether this is a serious problem, we change the command so that it also prints the size of each memory map:

$ cat smaps | egrep '^(Swap|Size)'
Size:                316 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  8 kB
Swap:                  0 kB
Size:                 40 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:             720896 kB
Swap:                 12 kB
Size:               4096 kB
Swap:                156 kB
Size:               4096 kB
Swap:                  8 kB
Size:               4096 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:               1272 kB
Swap:                  0 kB
Size:                  8 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                 16 kB
Swap:                  0 kB
Size:                 84 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  8 kB
Swap:                  4 kB
Size:                  8 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  4 kB
Size:                144 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  4 kB
Size:                 12 kB
Swap:                  4 kB
Size:                108 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                272 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB

As you can see from the output, there is a map of 720896 kB (with only 12 kB swapped) and 156 kB swapped in another map: basically a very small portion of our memory is swapped out, so this is not causing any big problem.

Conversely, if a large number of process memory pages are swapped to disk, your response latency problem may be related to page swapping. If this is the case, you can use the vmstat command to further check your Redis instance:
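
A typical invocation samples once per second (the figures below are illustrative):

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   3980 697932 147180 1406456    0    0     0     0   18  189  0  0 100  0
 0  0   3980 697428 147180 1406580    0    0     0     0   19  158  0  0 100  0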

The part of the output we are interested in is the two columns si and so, which count the amount of memory swapped in from and out to the swap file. If you see non-zero values in those two columns, then there is swap activity on your system.

Finally, the iostat command can be used to detect global I/O activity on a system.
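
For example (the flags assume the sysstat implementation of iostat; -x prints extended per-device statistics and -k reports throughput in kilobytes):

$ iostat -xk 1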

If the latency problem is due to swapping of Redis memory to disk, you need to lower the memory pressure on the system: add more memory if Redis is using more memory than is available, or avoid running other memory-hungry processes on the same system.

Delays due to AOF and disk I/O

Another source of latency is the AOF feature supported by Redis. AOF basically uses only two system calls to do its job. One is write(2), used to append data to the append-only file, and the other is fdatasync(2), used to flush the kernel file buffers to disk in order to guarantee the durability level specified by the user.

Both write(2) and fdatasync(2) can be sources of latency. For example, write(2) can block when a system-wide sync is in progress, or when the output buffers are full and the kernel needs to flush data to disk in order to accept new writes.

The fdatasync(2) call is an even worse source of latency, because with many combinations of kernel and file system it can take from milliseconds to seconds to complete, especially while some other process is performing I/O. For this reason, since Redis 2.4 the fdatasync(2) call is performed in a different thread when possible.

We’ll see how the configuration can affect the amount and source of latency when using the AOF file.

You can configure how AOF performs fsync on disk in three different ways using the appendfsync configuration option (this setting can be modified at runtime using the CONFIG SET command, as shown in the example after the list below).

  1. When appendfsync is set to no, Redis performs no fsync. In this configuration only write(2) can be a source of latency. There is usually no fix for this case, for the simple reason that the disk cannot cope with the speed at which Redis is receiving data; however, this is rare unless the disk is slowed down by other processes doing I/O.
  2. When appendfsync is set to everysec, Redis performs an fsync once per second. It uses a different thread, and while the fsync is in progress Redis buffers writes to delay the write(2) call for up to two seconds (because on Linux a write would block against an fsync running on the same file). However, if the fsync takes too long, Redis will eventually issue the write(2) call even while the fsync is in progress, and this can be a source of latency.
  3. When appendfsync is set to always, an fsync is performed at every write operation, before replying OK to the client (actually Redis tries to cluster many commands executed at the same time into a single fsync). Performance in this mode is usually very slow, and the use of fast disks and file system implementations that can perform an fsync quickly is strongly recommended.
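
For example, to inspect and change the policy at runtime (everysec and no below are just the values being demonstrated):

$ redis-cli CONFIG GET appendfsync
1) "appendfsync"
2) "everysec"
$ redis-cli CONFIG SET appendfsync no
OK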

Most Redis users set appendfsync to no or everysec. To get minimal latency, it is advisable to avoid other processes doing I/O on the same system. Using an SSD helps a lot, but even a non-SSD disk performs well if it is otherwise idle, since Redis writes to the append-only file without performing any seek.

If you want to check whether the delay is related to AOF files, on Linux you can use the strace command to check:

sudo strace -p $(pidof redis-server) -T -e trace=fdatasync

The command above will show all the fdatasync(2) system calls performed by Redis in the main thread. With this command you will not see the fdatasync calls performed by the background thread when appendfsync is set to everysec; to see those as well, add the -f switch to strace.

If you want to see both fdatasync and write system calls at the same time, you can use the following command:

sudo strace -p $(pidof redis-server) -T -e trace=fdatasync,write

However, because write is also used to send data to client sockets, a lot of output not related to disk I/O will likely be displayed. Apparently there is no way to tell strace to show only slow system calls, so the following command is used:

sudo strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 | grep -v '0.0' | grep -v unfinished

Delays caused by key expiration

Redis uses the following two methods to handle expired keys:

  1. A passive way: a key is deleted when it is requested by a command and found to be expired.
  2. An active way: a number of expired keys is deleted every 100 milliseconds.

The active expiration approach is designed to be adaptive. Once every 100 milliseconds (10 times per second), it does the following:

  1. Sample ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP random keys, deleting all those found to be expired.
  2. If more than 25% of the sampled keys were expired, repeat the process.

The default value of ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP is 20, and the process is performed ten times per second, so usually no more than 200 keys per second are actively expired. This cleans the database fast enough even when already-expired keys are not accessed for a long time and the passive algorithm cannot help. At the same time, expiring only 200 keys per second has no effect on Redis latency.

However, the algorithm is adaptive and will loop if it finds more than 25% of the sampled keys already expired. Given that the algorithm runs ten times per second, this means hitting the unlucky case where more than 25% of the keys in our random sample expire within the same second.

Basically, this means that if the database has many keys expiring in the same second, and these make up at least 25% of the current population of keys with an expire set, Redis may block until the proportion of expired keys drops below 25%.

This approach is needed to avoid using too much memory for keys that have already expired, and it is usually harmless, since it is unusual for a large number of keys to expire in the same second; however, it is not impossible, since users can make extensive use of the EXPIREAT command with the same Unix time.
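
A mitigation sketch, assuming you control how expire times are assigned: spread the expire times with random jitter instead of one fixed timestamp. The key name and Unix time below are hypothetical:

# Instead of giving every key the same absolute expire time...
$ redis-cli EXPIREAT user:1000 1700000000
# ...add up to 10 minutes of random jitter (bash syntax) so keys do not all expire together:
$ redis-cli EXPIREAT user:1000 $(( 1700000000 + RANDOM % 600 ))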

In short: Note the response delay caused by a large number of keys expiring at the same time.

Redis software watchdog

Redis 2.6 introduces Redis software Watchdog, a debugging tool designed to track latencies that cannot be analyzed using other common tools.

Software watchdog is an experimental feature. Although it is designed for use in a development environment, you should back up the database before continuing to use it, as it may have unpredictable effects on the normal operation of the Redis server.

It should be used only when the cause cannot be found by other means.

Here’s how this feature works:

  1. The user enables the software watchdog using the CONFIG SET command.
  2. Redis starts monitoring itself constantly.
  3. If Redis detects that the server is blocked in some operation that is not returning fast enough, and which may be the source of the latency, a low-level report about where the server is blocking is dumped in the log file.
  4. The user contacts the developers on the Redis Google Group, posting the content of the watchdog report.

Note that this feature cannot be enabled in the redis.conf file because it is designed to be enabled only in already running instances and for testing purposes only.

Use the following command to enable this feature:

CONFIG SET watchdog-period 500

The period is specified in milliseconds. The example above tells Redis to log to the log file whenever it detects the server blocking for 500 milliseconds or more. The minimum configurable period is 200 milliseconds.

When you are done with the watchdog, you can turn it off by setting the watchdog-period parameter to 0.

Important: Always remember to turn it off, as it is not a good idea to leave a watchdog on for long periods of time.

The following is what is recorded in the log file when the watchdog detects a delay greater than the set value:

[8547 | signal handler] (1333114359)
--- WATCHDOG TIMER EXPIRED ---
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libpthread.so.0(+0xf8f0) [0x7f16b5f158f0]
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libc.so.6(usleep+0x34) [0x7f16b5c62844]
./redis-server(debugCommand+0x3e1) [0x43ab41]
./redis-server(call+0x5d) [0x415a9d]
./redis-server(processCommand+0x375) [0x415fc5]
./redis-server(processInputBuffer+0x4f) [0x4203cf]
./redis-server(readQueryFromClient+0xa0) [0x4204e0]
./redis-server(aeProcessEvents+0x128) [0x411b48]
./redis-server(aeMain+0x2b) [0x411dbb]
./redis-server(main+0x2b6) [0x418556]
/lib/libc.so.6(__libc_start_main+0xfd) [0x7f16b5ba1c4d]
./redis-server() [0x411099]
------

Note: In this example the DEBUG SLEEP command was used to block the server. If the server blocks at a different location, the stack trace will differ.

If you happen to collect multiple watchdog stack traces, we encourage you to send them all to the Redis Google Group: the more traces we collect, the easier it will be to understand what is wrong with your instance.