One: The phenomenon

  • Instance name: R-BP1CXXXXXXXXXD04 (master/slave)
  • Time: 12:26 to 12:27 on November 16, 2017
  • Problem: memory usage grew by 2 GB within one minute, as shown below:
  • Number of keys: about 60 million

Two: Redis memory analysis

1. Memory composition

The memory shown in the figure above is the used_memory field reported by Redis's info memory command, for example:

redis> info memory
# Memory
used_memory:9195978072
used_memory_human:8.56G
used_memory_rss:9358786560
used_memory_peak:10190212744
used_memory_peak_human:9.49G
used_memory_lua:38912
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0

A detailed description of each attribute:

| Attribute | Description |
| --- | --- |
| used_memory | Total memory allocated by the Redis allocator, i.e. the memory actually used to store data |
| used_memory_human | used_memory in a human-readable format |
| used_memory_rss | Total physical memory occupied by the Redis process, as seen by the operating system |
| used_memory_peak | Maximum memory allocated by the allocator, i.e. the historical peak of used_memory |
| used_memory_peak_human | used_memory_peak in a human-readable format |
| used_memory_lua | Memory consumed by the Lua engine |
| mem_fragmentation_ratio | used_memory_rss / used_memory, which indicates the memory fragmentation rate |
| mem_allocator | The memory allocator used by Redis. Default: jemalloc |
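As a quick check on the definition of mem_fragmentation_ratio, the ratio can be recomputed from the two raw fields. The following is a minimal sketch; the class and method names are illustrative, and the sample text reuses the used_memory and used_memory_rss values from the output above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class InfoMemory {
    // Parse "field:value" lines of `info memory` output into a map.
    // Section headers starting with "#" and blank lines are skipped.
    static Map<String, String> parse(String infoOutput) {
        Map<String, String> fields = new HashMap<>();
        for (String line : infoOutput.split("\\R")) {
            line = line.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("#") || colon < 0) {
                continue;
            }
            fields.put(line.substring(0, colon), line.substring(colon + 1));
        }
        return fields;
    }

    // mem_fragmentation_ratio = used_memory_rss / used_memory
    static double fragmentationRatio(Map<String, String> fields) {
        long rss = Long.parseLong(fields.get("used_memory_rss"));
        long used = Long.parseLong(fields.get("used_memory"));
        return (double) rss / used;
    }

    public static void main(String[] args) {
        String sample = "# Memory\n"
                + "used_memory:9195978072\n"
                + "used_memory_rss:9358786560\n";
        double ratio = fragmentationRatio(parse(sample));
        System.out.println(String.format(Locale.ROOT, "%.2f", ratio)); // prints 1.02
    }
}
```

A ratio this close to 1 means fragmentation is low, so the surge has to be looked for inside used_memory itself.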

The calculation formula is as follows:

used_memory = own memory + object memory + buffer memory + Lua memory
used_memory_rss = used_memory + memory fragmentation

As shown below:

2. Memory analysis:

(1) Own memory: an empty Redis instance has a small footprint, so this can be ignored.
(2) KV memory: key objects + value objects.
(3) Buffers: client buffers (normal clients + slaves, which connect as clients + pubsub clients) and the AOF buffer (relatively fixed, generally not a problem).
(4) Lua: memory consumed by the Lua engine.

3. Common causes of a memory surge

(1) KV memory: a bigkey, or a burst of writes.
(2) Client buffers: a normal client's buffer (for example, from the monitor command) or a pubsub client's buffer.

Three: Troubleshooting

(1) bigkey?

The scan found no bigkey:

Sampled 67234427 keys in the keyspace!
Total key length in bytes is 1574032382 (avg len 23.41)
Biggest string found 'CCARD_DEVICE_CARD_REF_MAP_KEY_016817000004209' has 20862 bytes
Biggest list found 'CCARD_VALID_DEVICE_TRAIN_QUEUE_KEY' has 51 items
Biggest hash found 'CCARD_VALID_DEVICE_TRAIN_MAP_KEY' has 51 fields
67234359 strings with 71767890 bytes
67 lists with 151 items (00.00% of keys, avg size 1.07)
67 lists with 151 items (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
1 hashs with 51 fields (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

(2) Did the number of keys increase?

No significant change in the number of keys was observed.

(3) Client buffer

Because the memory did not drop for a long time, a buffer problem was suspected, so we checked info clients.

The output was:

redis> info clients
# Clients
connected_clients:43
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
admin_clients:6
rejected_vpc_conn_count:0
close_idle_unknown_conn_count:0

Running client list shows no client with omem greater than 0:

id=80207 addr=10.xx.0.4:63920 fd=46 name= age=624 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80215 addr=10.xx.0.23:43489 fd=36 name= age=591 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80366 addr=10.xx.0.8:59785 fd=18 name= age=84 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=del read=0 write=0 type=user
id=80356 addr=10.xx.0.33:32117 fd=13 name= age=114 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80064 addr=10.xx.59.4:53446 fd=38 name= age=1070 idle=1070 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=80276 addr=10.xx.0.23:48511 fd=8 name= age=387 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80188 addr=10.xx.0.33:16265 fd=42 name= age=681 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80326 addr=10.xx.0.32:59779 fd=16 name= age=209 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80065 addr=10.xx.59.4:53447 fd=45 name= age=1070 idle=1070 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=79936 addr=10.xx.0.22:10607 fd=30 name= age=1480 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80174 addr=10.xx.0.5:60914 fd=6 name= age=722 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80300 addr=10.xx.0.22:22757 fd=48 name= age=298 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80037 addr=10.xx.0.5:55189 fd=15 name= age=1143 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80330 addr=10.xx.0.8:48533 fd=17 name= age=199 idle=10 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79896 addr=10.xx.0.30:26814 fd=11 name= age=1616 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80299 addr=10.xx.0.24:11227 fd=44 name= age=303 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80086 addr=10.xx.0.32:52526 fd=40 name= age=1002 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80202 addr=10.xx.0.33:16658 fd=26 name= age=636 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80256 addr=10.xx.0.24:60496 fd=19 name= age=448 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79908 addr=10.xx.0.29:18975 fd=12 name= age=1583 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80365 addr=10.xx.0.29:46429 fd=14 name= age=85 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79869 addr=10.xx.27.4:48455 fd=35 name= age=1700 idle=1700 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=80334 addr=10.xx.0.23:50012 fd=39 name= age=189 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80041 addr=10.xx.0.32:51107 fd=33 name= age=1132 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79992 addr=10.xx.0.22:12068 fd=28 name= age=1289 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80251 addr=10.xx.0.30:44213 fd=23 name= age=468 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80006 addr=10.xx.0.2:45895 fd=31 name= age=1242 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80321 addr=10.xx.0.30:48048 fd=5 name= age=224 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80381 addr=10.xx.0.8:13360 fd=22 name= age=24 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=del read=0 write=0 type=user
id=80200 addr=10.xx.0.24:59183 fd=24 name= age=640 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80113 addr=10.xx.0.2:52492 fd=21 name= age=915 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=174 addr=11.216.117.242:53027 fd=9 name= age=281390 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=replconf read=0 write=0 type=admin
id=79991 addr=10.xx.0.4:48412 fd=25 name= age=1296 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80301 addr=127.0.0.1:47869 fd=49 name= age=291 idle=261 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=strlen read=0 write=0 type=admin
id=80047 addr=10.xx.59.4:53184 fd=41 name= age=1114 idle=1114 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=80236 addr=10.xx.0.5:62546 fd=47 name= age=516 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80364 addr=10.xx.0.4:18794 fd=7 name= age=85 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80175 addr=10.xx.0.4:62245 fd=29 name= age=718 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80336 addr=10.xx.0.29:45701 fd=50 name= age=180 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80050 addr=10.xx.59.4:53188 fd=43 name= age=1114 idle=1114 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=79765 addr=10.xx.0.2:33832 fd=37 name= age=2027 idle=177 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info read=0 write=0 type=user
id=80170 addr=10.xx.0.2:57853 fd=20 name= age=728 idle=24 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80390 addr=127.0.0.1:49449 fd=27 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client read=0 write=0 type=admin
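Scanning that many lines by eye is error-prone, so the omem check can be automated. The sketch below assumes the client list output has already been captured as a string; the class and method names are illustrative, and the second sample line in main is made up to show a positive match.

```java
import java.util.ArrayList;
import java.util.List;

class ClientListScan {
    // Return the value of `name` from one `client list` line, or null if absent.
    static String field(String line, String name) {
        for (String token : line.trim().split("\\s+")) {
            if (token.startsWith(name + "=")) {
                return token.substring(name.length() + 1);
            }
        }
        return null;
    }

    // Collect the ids of all clients whose output buffer memory (omem) is above zero.
    static List<String> clientsWithOutputBuffer(String clientListOutput) {
        List<String> ids = new ArrayList<>();
        for (String line : clientListOutput.split("\\R")) {
            String omem = field(line, "omem");
            if (omem != null && Long.parseLong(omem) > 0) {
                ids.add(field(line, "id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String sample =
                "id=80207 addr=10.xx.0.4:63920 fd=46 name= age=624 idle=1 omem=0 cmd=ping\n"
                // Hypothetical line, not from the incident, to show a hit:
                + "id=99999 addr=10.xx.0.9:10000 fd=51 name= age=10 idle=0 omem=1048576 cmd=monitor\n";
        System.out.println(clientsWithOutputBuffer(sample)); // prints [99999]
    }
}
```

Applied to the real output above, the returned list is empty, which rules out client output buffers.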

Four: Finding the culprit

None of the usual checks explained the surge. My colleague @Jingyuan helped analyze it and suggested that the KV hash table of Redis might be rehashing.

1. KV storage structure of Redis

As shown in the figure below, all key-value pairs of Redis are stored in a dict, whose ht field holds two hash tables, ht[0] and ht[1]. Normally one stores the data and the other is idle; ht[1] is only used while a rehash is in progress.

2. Redis dictionary rehash

To keep the load factor of the hash table in check, a rehash is triggered when the number of elements in the hash table equals its number of slots.

After expansion, the capacity of ht[1] is the first power of two (2^n) that is greater than or equal to ht[0].size * 2. For example, if the initial capacity of the hash table is 4, the next expansion brings it to 8, and so on.
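The sizing rule can be written down in a few lines. This is a simplified sketch of the rule as stated above, not the Redis source; the class and method names are illustrative, and the starting capacity of 4 is taken from the example in the text.

```java
class RehashSizing {
    static final long INITIAL_SIZE = 4; // initial capacity used in the text's example

    // First power of two, counting up from INITIAL_SIZE, that is >= target.
    static long nextPower(long target) {
        long size = INITIAL_SIZE;
        while (size < target) {
            size *= 2;
        }
        return size;
    }

    // An expansion is due once the element count reaches the slot count.
    static boolean needsExpand(long used, long size) {
        return used >= size;
    }

    // Capacity of the new table ht[1]: first 2^n >= current size * 2.
    static long expandedSize(long currentSize) {
        return nextPower(currentSize * 2);
    }

    public static void main(String[] args) {
        System.out.println(expandedSize(4));        // prints 8
        System.out.println(expandedSize(1L << 26)); // prints 134217728 (2^27)
    }
}
```

With about 67 million keys the table sits at 2^26 slots, so the next key past that point allocates a new table of 2^27 slots in one step.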

3. Test

(1) Test method

Write data in batches up to just below the rehash threshold, then write keys one at a time and observe how the memory changes.

import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

// jedis is an already-connected Jedis client
int expireTime = 60 * 60 * 24; // TTL in seconds (one day)
// Rehash threshold - 50
int rehashThreshold = (int) Math.pow(2, 25) - 50;
Pipeline pipeline = jedis.pipelined();
for (int i = 0; i < rehashThreshold; i++) {
    pipeline.setex(String.valueOf(i), expireTime, String.valueOf(i));
    if (i % 10000 == 0) {
        pipeline.sync(); // flush the pipeline every 10,000 commands
    }
}
pipeline.sync();
TimeUnit.SECONDS.sleep(5);
// Cross the threshold one key per second so the memory jump is easy to spot
for (int i = rehashThreshold; i < rehashThreshold + 200; i++) {
    jedis.setex(String.valueOf(i), expireTime, String.valueOf(i));
    TimeUnit.SECONDS.sleep(1);
}

(2) Start the test

(a) Threshold = 2^15 = 32768. As shown below, memory increases slightly when the number of keys reaches 32769, but the change is not obvious.

keys   mem    clients blocked requests    connections
32766  4.69M  3       0       32797 (+2)  4
32767  4.69M  3       0       32799 (+2)  4
32768  4.69M  3       0       32801 (+2)  4
32769  5.44M  3       0       32803 (+2)  4

(b) Threshold = 2^20 = 1048576. As shown below, memory increases by 32 MB when the number of keys reaches 1048577. The rehash allocates a new hash table with 2^21 slots, and it does so twice because every key has an expiration time and therefore also lives in the expires table. Each slot is an 8-byte pointer, so 2^21 × 2 × 8 = 2^25 bytes = 32 MB.

keys     mem      clients blocked requests      connections
1048574  128.69M  3       0       3364129 (+2)  16
1048575  128.69M  3       0       3364131 (+2)  16
1048576  128.69M  3       0       3364133 (+2)  16
1048577  160.69M  3       0       3364135 (+2)  16
1048578  160.69M  3       0       3364137 (+2)  16
1048578  160.69M  3       0       3364137 (+2)  16

(c) Threshold = 2^26 = 67108864. As shown below, memory increases by 2 GB when the number of keys reaches 67108865. The rehash allocates a new hash table with 2^27 slots, again twice (main table plus expires table, because every key has an expiration time), and each slot is an 8-byte pointer, so 2^27 × 2 × 8 = 2^31 bytes = 2 GB.

keys      mem     clients blocked requests       connections
67108862  9.70G   3       0       70473683 (+2)  18
67108863  9.70G   3       0       70473685 (+2)  18
67108864  9.70G   3       0       70473687 (+2)  18
67108865  11.70G  3       0       70473689 (+2)  18
67108866  11.70G  3       0       70473691 (+2)  18
67108867  9.70G   3       0       70473687 (+2)  18
67108867  11.70G  3       0       70473693 (+2)  18
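The arithmetic in cases (b) and (c) follows one formula: new slots × 2 tables × 8 bytes per pointer. A small sketch, with illustrative names, that reproduces both figures under the same assumptions as the text (every key has a TTL, 64-bit pointers):

```java
class RehashMemoryEstimate {
    static final int POINTER_BYTES = 8; // one pointer per slot on a 64-bit build
    static final int TABLES = 2;        // main key table + expires table

    // Bytes allocated for the new tables when the key count crosses 2^thresholdExponent.
    static long newTableBytes(int thresholdExponent) {
        long newSlots = 1L << (thresholdExponent + 1); // the table doubles
        return newSlots * TABLES * POINTER_BYTES;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(newTableBytes(20) / mb + " MB"); // prints 32 MB
        System.out.println(newTableBytes(26) / mb + " MB"); // prints 2048 MB
    }
}
```

If only some of the keys carried a TTL, the expires table would be smaller and would cross its own thresholds at different times, so the jump would be split up.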

The key count and memory curves of R-BP1C15FD9B142D04 confirm the rule above:

4. Follow-up observation

At 17:00, rehash ends and memory is reduced by half.

Five: Summary

  • Because Redis stores keys in a hash table, a large number of keys does not hurt access performance, but it can cause the memory surge described in this article. Some suggestions for controlling the number of keys:

    • Set an expiration time on keys that are no longer needed, or delete them periodically.
    • Optimize the key-value design: for example, merge some string keys into ziplist-encoded hashes.
  • Future improvements: kernel-level audit log support for rehash, and a faster rehash.
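One common way to apply the ziplist hash suggestion is to bucket many small string keys into a smaller number of hashes. The sketch below only shows the key mapping; it assumes numeric ids, and the prefix, the bucket size of 100 and all names are hypothetical. The bucket size would need to stay within the instance's ziplist limits for hashes.

```java
class HashBucketing {
    static final long BUCKET_SIZE = 100; // hypothetical number of fields per hash

    // Hash key that holds the entry for this id, e.g. "user:123" for id 12345.
    static String bucketKey(String prefix, long id) {
        return prefix + ":" + (id / BUCKET_SIZE);
    }

    // Field name inside the bucket, e.g. "45" for id 12345.
    static String field(long id) {
        return String.valueOf(id % BUCKET_SIZE);
    }

    public static void main(String[] args) {
        long id = 12345;
        // Instead of SET user:12345 <value>, one would issue HSET user:123 45 <value>.
        System.out.println(bucketKey("user", id) + " " + field(id)); // prints user:123 45
    }
}
```

With 100 fields per bucket, 60 million string keys become about 600,000 top-level keys, which keeps the main hash table far from the large rehash thresholds. The trade-off is that fields inside a hash share the TTL of the hash key, so this fits data that expires together.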
