About the author

Zhang Pengyi is a senior database engineer at Tencent Cloud. He took part in the R&D of Huawei's Taurus distributed database and Tencent's CynosDB for PG, and now works on the R&D of Tencent Cloud's Redis database.

When using Redis, we occasionally run into cases where redis-server shows high CPU or memory usage. The following practical examples discuss some situations that are easy to overlook.

1. High CPU caused by short connections

A user complained that CPU usage was high even though the QPS was low. With a low QPS, the likely causes were redis-server doing housekeeping of its own (such as expired-key deletion) or the user running high-complexity commands. Checking showed neither: no expired-key deletion was going on and no high-complexity commands were being executed.
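As an aside, such a check can be scripted; here is a minimal sketch using the redigo client (the address is a placeholder): SLOWLOG GET surfaces recent high-complexity commands, while the stats section of info reports expired-key activity.

package main

import (
	"fmt"

	"github.com/garyburd/redigo/redis"
)

func main() {
	c, err := redis.Dial("tcp", "127.0.0.1:6379") // placeholder address
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// Recent slow (high-complexity) commands, if any.
	slowlog, err := redis.Values(c.Do("SLOWLOG", "GET", 10))
	if err != nil {
		panic(err)
	}
	fmt.Println("recent slowlog entries:", len(slowlog))

	// expired_keys in the stats section tracks expired-key deletions.
	stats, err := redis.String(c.Do("INFO", "stats"))
	if err != nil {
		panic(err)
	}
	fmt.Println(stats)
}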

Running perf against redis-server on the machine showed that the listSearchKey function was taking a large share of CPU, and the call stacks showed it being invoked frequently during connection release. Since the user reported using short connections, we inferred that frequent connection teardown was driving CPU usage up.

1. A comparison experiment

The following uses the redis-benchmark tool to compare long connections against short connections. The redis-server is community edition 4.0.10.

1) Long connection test

Use 10,000 long connections to ping redis-server 500,000 times:

./redis-benchmark -h host -p port -t ping -c 10000 -n 500000 -k 1

(-k 1 keeps connections alive between requests, i.e. long connections; -k 0 reconnects for every request, i.e. short connections)


Resulting QPS:

PING_INLINE: 92902.27 requests per second

PING_BULK: 93580.38 requests per second


perf analysis of redis-server shows that readQueryFromClient, which handles incoming client requests, takes the most CPU.


2) Short connection test

Use 10,000 short connections to ping redis-server 500,000 times:

./redis-benchmark -h host -p port -t ping -c 10000 -n 500000 -k 0


Resulting QPS:

PING_INLINE: 15187.18 requests per second

PING_BULK: 16471.75 requests per second


perf analysis of redis-server showed that the highest CPU consumer was indeed listSearchKey, while readQueryFromClient accounted for a far smaller share: processing user requests had become a side job, and searching the client list the main occupation. For the same volume of business requests, short connections therefore put a noticeably heavier burden on the CPU.


Looking at QPS, the gap between short and long connections is large, for two reasons:


  • The network overhead of re-establishing the connection for every request (avoidable on the client side with a connection pool; see the sketch after this list).

  • The extra CPU cycles redis-server spends on cleanup when a connection is released. (This part can be optimized on the redis-server side.)
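A minimal connection-pool sketch using the redigo client that appears later in this article (the address and pool parameters are illustrative): connections are dialed once and reused, so redis-server never pays the per-request teardown cost.

package main

import (
	"fmt"
	"time"

	"github.com/garyburd/redigo/redis"
)

// Connections are created lazily, kept idle up to MaxIdle, and reused.
var pool = &redis.Pool{
	MaxIdle:     100,
	IdleTimeout: 240 * time.Second,
	Dial: func() (redis.Conn, error) {
		return redis.Dial("tcp", "127.0.0.1:6379") // placeholder address
	},
}

func main() {
	c := pool.Get()
	defer c.Close() // returns the connection to the pool instead of tearing it down

	pong, err := redis.String(c.Do("PING"))
	if err != nil {
		panic(err)
	}
	fmt.Println(pong)
}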

2. What happens when a Redis connection is released

Let's look at the code to see what redis-server does after the client initiates connection release. On receiving the client's disconnect, redis-server goes straight into freeClient.

void freeClient(client *c) {
    listNode *ln;

    /* ... */

    /* Free the query buffer */
    sdsfree(c->querybuf);
    sdsfree(c->pending_querybuf);
    c->querybuf = NULL;

    /* Deallocate structures used to block on blocking ops. */
    if (c->flags & CLIENT_BLOCKED) unblockClient(c);
    dictRelease(c->bpop.keys);

    /* UNWATCH all the keys */
    unwatchAllKeys(c);
    listRelease(c->watched_keys);

    /* Unsubscribe from all the pubsub channels */
    pubsubUnsubscribeAllChannels(c,0);
    pubsubUnsubscribeAllPatterns(c,0);
    dictRelease(c->pubsub_channels);
    listRelease(c->pubsub_patterns);

    /* Free data structures. */
    listRelease(c->reply);
    freeClientArgv(c);

    /* Unlink the client: this will close the socket, remove the I/O
     * handlers, and remove references of the client from different
     * places where active clients may be referenced. */
    /* redis-server maintains a server.clients linked list. When a client
     * establishes a connection, a new client object is created and appended
     * to server.clients; when the connection is released, the client object
     * has to be removed from server.clients again. */
    unlinkClient(c);

    /* ... */
}

void unlinkClient(client *c) {
    listNode *ln;

    /* If this is marked as current client unset it. */
    if (server.current_client == c) server.current_client = NULL;

    /* Certain operations must be done only if the client has an active socket.
     * If the client was already unlinked or if it's a "fake client" the
     * fd is already set to -1. */
    if (c->fd != -1) {
        /* Search the server.clients list and delete this client's node object */
        ln = listSearchKey(server.clients,c);
        serverAssert(ln != NULL);
        listDelNode(server.clients,ln);

        /* Unregister async I/O handlers and close the socket. */
        aeDeleteFileEvent(server.el,c->fd,AE_READABLE);
        aeDeleteFileEvent(server.el,c->fd,AE_WRITABLE);
        close(c->fd);
        c->fd = -1;
    }

    /* ... */
}


So every connection teardown involves an O(N) operation. For an in-memory database like Redis we should avoid O(N) operations as far as possible, and with a large number of connections the impact on performance is significant. Users can sidestep the problem by not using short connections, but in real scenarios a client may fall back to establishing short connections once its connection pool is full.

3. Optimization

Each connection release performs an O(N) operation. Can the complexity be reduced to O(1)?

server.clients is a doubly linked list. As long as each client object remembers the address of its own list node at creation time, deleting it later requires no traversal of server.clients. Let's try that optimization:

client *createClient(int fd) {
    client *c = zmalloc(sizeof(client));

    /* ... */

    listSetFreeMethod(c->pubsub_patterns,decrRefCountVoid);
    listSetMatchMethod(c->pubsub_patterns,listMatchObjects);
    if (fd != -1) {
        /* The client records the address of its own listNode in server.clients */
        c->client_list_node = listAddNodeTailEx(server.clients,c);
    }
    initClientMultiState(c);
    return c;
}

void unlinkClient(client *c) {
    listNode *ln;

    /* If this is marked as current client unset it. */
    if (server.current_client == c) server.current_client = NULL;

    /* Certain operations must be done only if the client has an active socket.
     * If the client was already unlinked or if it's a "fake client" the
     * fd is already set to -1. */
    if (c->fd != -1) {
        /* It is no longer necessary to search the server.clients list */
        //ln = listSearchKey(server.clients,c);
        //serverAssert(ln != NULL);
        //listDelNode(server.clients,ln);
        listDelNode(server.clients, c->client_list_node);

        /* Unregister async I/O handlers and close the socket. */
        aeDeleteFileEvent(server.el,c->fd,AE_READABLE);
        aeDeleteFileEvent(server.el,c->fd,AE_WRITABLE);
        close(c->fd);
        c->fd = -1;
    }

    /* ... */
}
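The same pattern, restated compactly in Go's standard container/list for readers who want the idea without the Redis internals (the type and field names here are illustrative): PushBack hands back the node, and removing a remembered node needs no search.

package main

import (
	"container/list"
	"fmt"
)

type client struct {
	id   int
	node *list.Element // the client remembers its own node, like client_list_node above
}

func main() {
	clients := list.New()

	// On connect: append to the list and remember the node.
	c := &client{id: 42}
	c.node = clients.PushBack(c)

	// On disconnect: O(1) removal, no listSearchKey-style scan.
	clients.Remove(c.node)

	fmt.Println("remaining clients:", clients.Len())
}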


Short connection test after optimization

Use 10,000 short connections to ping redis-server 500,000 times:

./redis-benchmark -h host -p port -t ping -c 10000 -n 500000 -k 0


Resulting QPS:

PING_INLINE: 21884.23 requests per second

PING_BULK: 21454.62 requests per second


Compared with before the optimization, short connection performance improves by more than 30%; the optimization keeps performance from degrading too badly when short connections cannot be avoided.


2. High CPU caused by the info command

Some users run the info command periodically to monitor Redis status, and this by itself can push CPU usage up noticeably. perf analysis taken while info was being executed frequently showed that the functions getClientsMaxBuffers, getClientOutputBufferMemoryUsage and getMemoryOverheadData were all consuming a lot of CPU.


The info command returns redis-server status information like the following (abridged):

# Clients
connected_clients:1
client_longest_output_list:0 // largest outputBuffer list length across all clients on the server
client_biggest_input_buf:0 // largest inputBuffer size in bytes across all clients on the server
blocked_clients:0

# Memory
used_memory:848392
used_memory_human:828.51K
used_memory_rss:3620864
used_memory_rss_human:3.45M
used_memory_peak:619108296
used_memory_peak_human:590.43M
used_memory_peak_perc:0.14%
used_memory_overhead:836182 // memory redis-server uses for its own structures, excluding the dataset
used_memory_startup:786552
used_memory_dataset:12210
used_memory_dataset_perc:19.74%

To produce client_longest_output_list and client_biggest_input_buf, every client on the redis-server side has to be traversed, as getClientsMaxBuffers shows. Here is the same O(N) operation again.

void getClientsMaxBuffers(unsigned long *longest_output_list,
                          unsigned long *biggest_input_buffer) {
    client *c;
    listNode *ln;
    listIter li;
    unsigned long lol = 0, bib = 0;

    /* Traverses all clients, complexity O(N) */
    listRewind(server.clients,&li);
    while ((ln = listNext(&li)) != NULL) {
        c = listNodeValue(ln);

        if (listLength(c->reply) > lol) lol = listLength(c->reply);
        if (sdslen(c->querybuf) > bib) bib = sdslen(c->querybuf);
    }
    *longest_output_list = lol;
    *biggest_input_buffer = bib;
}

To produce used_memory_overhead, the total outputBuffer usage of all clients must also be tallied, as getMemoryOverheadData shows:

struct redisMemOverhead *getMemoryOverheadData(void) {

    /* ... */

    mem = 0;
    if (server.repl_backlog)
        mem += zmalloc_size(server.repl_backlog);
    mh->repl_backlog = mem;
    mem_total += mem;

    /* ... */

    mem = 0;
    if (listLength(server.clients)) {
        listIter li;
        listNode *ln;

        /* Iterate over all clients and add up the memory occupied by every
         * client's outputBuffer. Complexity O(N) */
        listRewind(server.clients,&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            if (c->flags & CLIENT_SLAVE)
                continue;
            mem += getClientOutputBufferMemoryUsage(c);
            mem += sdsAllocSize(c->querybuf);
            mem += sizeof(client);
        }
    }
    mh->clients_normal = mem;
    mem_total+=mem;

    mem = 0;
    if (server.aof_state != AOF_OFF) {
        mem += sdslen(server.aof_buf);
        mem += aofRewriteBufferSize();
    }
    mh->aof_buffer = mem;
    mem_total+=mem;

    /* ... */

    return mh;
}


Experiments

From the analysis above, when the connection count is high (a large N in O(N)), frequent execution of the info command consumes considerable CPU.

1) Establish one connection and run the info command in a loop

package main

import (
	"fmt"

	"github.com/garyburd/redigo/redis"
)

func main() {
	addr := "127.0.0.1:6379" // replace with your redis-server address
	c, err := redis.Dial("tcp", addr)
	if err != nil {
		fmt.Println("Connect to redis error:", err)
		return
	}
	for {
		c.Do("info")
	}
}


Experimental results show that the CPU usage is only about 20%.


2) Establish 9,999 idle connections, then run info continuously on one more connection

package main

import (
	"fmt"

	"github.com/garyburd/redigo/redis"
)

func main() {
	addr := "127.0.0.1:6379" // replace with your redis-server address

	// Hold 9999 idle connections open.
	clients := []redis.Conn{}
	for i := 0; i < 9999; i++ {
		c, err := redis.Dial("tcp", addr)
		if err != nil {
			fmt.Println("Connect to redis error:", err)
			return
		}
		clients = append(clients, c)
	}

	// Run info continuously on one more connection.
	c, err := redis.Dial("tcp", addr)
	if err != nil {
		fmt.Println("Connect to redis error:", err)
		return
	}
	for {
		_, err = c.Do("info")
		if err != nil {
			panic(err)
		}
	}
}


Experimental results show that CPU usage can reach 80%, so avoid running the info command frequently when the connection count is high.
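If monitoring still needs info, two mitigations can help; a sketch under the assumption that only server-level fields are required (the address and interval are placeholders): request a single section, since each info section is generated only on demand and the server section skips the per-client traversals behind the clients and memory sections, and poll on a timer rather than in a tight loop.

package main

import (
	"fmt"
	"time"

	"github.com/garyburd/redigo/redis"
)

func main() {
	c, err := redis.Dial("tcp", "127.0.0.1:6379") // placeholder address
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// Poll every 10 seconds instead of in a tight loop.
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		// "info server" returns only the server section, avoiding the
		// O(N) client walks done for the clients and memory sections.
		s, err := redis.String(c.Do("INFO", "server"))
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}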


3. High memory usage caused by pipeline

Some users found that redis-server's memory usage occasionally grew sharply when they used pipelines for read-only operations. The cause was improper pipeline usage. Here is a simple example illustrating how Redis pipeline processing works.


The following Go code reads key1, key2 and key3 from redis-server through a pipeline.

package main

import (
	"fmt"

	"github.com/garyburd/redigo/redis"
)

func main() {
	c, err := redis.Dial("tcp", "127.0.0.1:6379")
	if err != nil {
		panic(err)
	}
	c.Send("get", "key1") // buffered in the client
	c.Send("get", "key2") // buffered in the client
	c.Send("get", "key3") // buffered in the client
	c.Flush()             // write the buffered commands to redis-server in protocol format
	fmt.Println(redis.String(c.Receive()))
	fmt.Println(redis.String(c.Receive()))
	fmt.Println(redis.String(c.Receive()))
}


The server receives the following message:

*2 $3 get $4 key1 *2 $3 get $4 key2 *2 $3 get $4 key3


The following informal sketch of the redis-server side logic shows what happens: commands are parsed out of the received buffer and executed, each result is appended to the replyBuffer, and the client is marked as having pending writes. The contents of the replyBuffer are only written back over the socket on a later event-loop iteration, so the server does not send each result immediately after processing its command.

readQueryFromClient(client* c) {
    read(c->querybuf)  // c->querybuf = "*2 $3 get $4 key1 *2 $3 get $4 key2 *2 $3 get $4 key3"
    cmdsNum = parseCmdNum(c->querybuf)  // cmdsNum = 3
    while (cmdsNum--) {
        cmd = parseCmd(c->querybuf)  // cmd: get key1, get key2, get key3
        reply = execCmd(cmd)
        appendReplyBuffer(reply)
        markClientPendingWrite(c)
    }
}


Consider this scenario:

If the client program is slow and does not drain the TCP receive buffer with c.Receive() in time, or never calls c.Receive() because of a bug, then once the receive buffer fills up, the TCP window the client advertises to the server drops to 0. The server can no longer send the contents of the replyBuffer, so the replyBuffer sits in memory unreleased. The problem is aggravated when a pipeline packs too many commands at once, especially commands that return many objects, such as MGET, HGETALL or LRANGE.
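A sketch of safer pipelining with the same redigo client (the batch size and key names are illustrative): cap the number of commands per batch and drain every pending reply right after each Flush, so neither the server's replyBuffer nor the client's TCP receive buffer can grow without bound.

package main

import (
	"fmt"

	"github.com/garyburd/redigo/redis"
)

const batchSize = 100 // illustrative cap on commands per pipeline

func main() {
	c, err := redis.Dial("tcp", "127.0.0.1:6379") // placeholder address
	if err != nil {
		panic(err)
	}
	defer c.Close()

	keys := make([]string, 1000)
	for i := range keys {
		keys[i] = fmt.Sprintf("key%d", i) // illustrative key names
	}

	for start := 0; start < len(keys); start += batchSize {
		end := start + batchSize
		if end > len(keys) {
			end = len(keys)
		}
		for _, k := range keys[start:end] {
			c.Send("get", k)
		}
		c.Flush()
		// Read every reply before sending the next batch.
		for range keys[start:end] {
			if _, err := c.Receive(); err != nil {
				panic(err)
			}
		}
	}
}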

Summary

These are all simple problems with no complex logic behind them, and in most situations they cause no trouble, but in some extreme situations developers need to mind these details to get the best out of Redis. In short:

  • Avoid short connections where possible;

  • Avoid running info frequently when the connection count is high;

  • When using a pipeline, read the replies promptly, and do not pack too many requests into a single pipeline.
