Redis series of directories

Redis series – distributed lock

Redis series – Cache penetration, cache breakdown, cache avalanche

Why Is Redis so fast?

Redis series — Data Persistence (RDB and AOF)

Redis series – consistent hash algorithm

Redis series – High Availability (Master slave, Sentinel, Cluster)

Redis series – Things and Optimism lock

Redis series — Geospatial: Do you have Lao Wang next door?

Bitmaps: Did you check in today?

What is a Bloom filter?!

Daily active platform 100,000,000!! 80,000,000 people checked in today!!

How do you figure that out? If you could design, how would you design? Mysql > select count(*) from ‘select count(*)’ Large amount of data, are you going to do offline data processing? How to ensure real-time performance? Is there an easier way?

You probably already know that Redis is not just a key-value store, but also Hash, List, Set, Zset.

Redis is actually a data structure server that supports different types of values.

Bitmaps is a useful data type in Redis that 90% of programmers don’t know about.

Review the redis series: Cache Penetration, Cache Breakdown, Cache Avalanche, and What is a Bloom filter? Bloom Filter.

What is the Bitmaps

Take a look at the official description of Bitmaps:

  • Bitmaps are not actual data types, but rather a collection of byte-oriented operations defined on strings. Because strings are binary-safe blocks, their maximum length is 512 megabytes, which is best set to 2^32 different bytes.

  • Bitmaps have two types of bit operations: 1. Fixed time single bit operations, such as setting a bit of a String to 1 or 0, or getting the value of a bit to 2. For a set of bit operations, count the number of bits set to 1 for a given range of bits (e.g. demographics).

  • The biggest advantage of Bitmaps is that it saves a lot of space when storing data. For example, if a project uses a self-growing ID to identify users, it can record information about 4 billion users with only 512 megs of memory (for example, if a user wants to receive new notifications, identified by 1s and 0s).

Bitmaps are simply an array of variable-length bits. Each bit can store only 0 or 1. Setbit key (0,2,4,6) 1

bit0 bit1 bit2 bit3 bit4 bit5 bit6 bit7
1 0 1 0 1 0 1 0

The logins (logged in and not logged in) of 100 million people a day can be stored using a 100 million bit array indexed by the user’s userId (assuming the userId is incremented).

How to use

Here are a few common bitmaps action commands.

SETBIT key offset value

The initial version of this command is 2.2.0. Time complexity: O(1); Return value: the original bit value at offset; This command sets or clears the bit value of the key’s value(string) at offset.

The bit at that location is either set or cleared, depending on the value (which can only be 0 or 1). When key does not exist, a new string value is created. Make sure the string is large enough to have a bit at offset. The offset parameter must be greater than or equal to 0 and less than 2^32(the bitmap size is limited to 512MB). When the string corresponding to the key increases, the added bit value is set to 0.

Warning: For a nonexistent key, use the setbit command for the first time to set value at offset=2^32-1 (the last bit), or for an existing key, This is not the first time that the setbit command has been used to set value at offset=2^32-1. However, when the offset position was set earlier, Redis needed to allocate all memory immediately, which may cause the service to block for a while. On a 2010MacBook Pro, the offset of 2^ 32-1 (allocated 512MB) requires ~ 300ms, the offset of 2^ 30-1(allocated 128MB) requires ~ 80ms, and the offset of 2^ 28-1 (allocated 32MB) requires ~ 30ms. Offset is 2^26-1 (8MB is allocated) 8ms is required. Note that once memory is allocated for the first time, subsequent calls to SETBIT on the same key will not be allocated in advance.

For example, a platform needs to count daily user activity (if a user logs in that day, it counts as an active user that day), and every day needs to generate a new Bitmaps to store the user’s logins that day.

Setbit active:{date} {userId} value; Key =active:2020-07-01; key=active:2020-07-01;

127.0.0.1:6379> setbit Active :2020-07-01 666 1#userId=666. This is the first user to log in today.
(integer) 0 127.0.0.1:6379> setbit Active :2020-07-01 100000000 1#userId=100000000 user logged in, this is the second user logged in today.
(integer) 0
127.0.0.1:6379> setbit active:2020-07-01 33 1
(integer) 0
127.0.0.1:6379> setbit active:2020-07-01 666 1 
(integer) 0
127.0.0.1:6379> setbit active:2020-07-01 100000 1
(integer) 0
Copy the code

When the first user logs in, a new String is created. The space allocated to this String is 667bit. The user whose offset is 0 to 665 is not logged in, and the value is 0.

When the second user logs in, the String already exists and is allocated between 666 and 100000000 bytes, which causes a short block.

GETBIT key offset

The initial version of this command is 2.2.0. Time complexity: O(1); Return value: the bit value at offset; The command returns the bit of the string at offset corresponding to the key.

When offset exceeds the length of the string, the string is assumed to be a contiguous space filled with zero bits. When the key does not exist, it is considered an empty string, so offset is always out of range, and value is also considered a contiguous space filled with 0 bits.

Mysql > select * from user where userId=666;

127.0.0.1:6379> setbit active:2020-07-01 666
(integer1)Copy the code

Return to 1, indicating login.

BITCOUNT key [start end]

The initial version of this command is 2.6.0. Time complexity: O(N); This command counts the number of bits that the string is set to 1. Return value: Number of bits to be set to 1.

In general, the entire string given will be counted, and by specifying additional start or end arguments, you can make counting happen only on specific bits. The start and end arguments are set similarly to the GETRANGE command, with negative values: -1 for the last bit, -2 for the penultimate bit, and so on. Nonexistent keys are treated as empty strings, so BITCOUNT on a nonexistent key results in 0.

For example, how many users are active (logged in) on 2020-07-01:

127.0.0.1:6379> bitcount active:2020-07-01 0 -1
(integer) 342342326
Copy the code

3,423,42326 users logged in.

There are many more commands for manipulating Bitmaps, because bitmaps are essentially strings.www.redis.cn/commands.ht…

In addition to the above command, you can also use the related command to compare the difference between two strings, and, or, and not; There are many practical scenarios that can be implemented with these simple commands.

Usage scenarios

Since each position in the bit array can only store 0 or 1 states; So for real-world business scenarios that deal with two states, consider using Bitmaps. Such as user login/not login, check-in/not check-in, follow/not follow, clocked/not clocked, etc. At the same time, Bitmap also uses relevant statistical methods for fast statistics.

Memory usage comparison

If a platform has 800 million users, and the average daily active users are 100 million, and the List and Bitmap storage platforms are respectively used to store the memory usage of active users (login) on a given day (1KB=1000bit) :

The data type Each userId takes up space The number of users to be stored Total memory usage
List 4 * 8bit=32bit(assuming userId uses int storage) 100000000 32 bit * 100000000 = 400 MB
Bitmaps 1bit 800000000 1 bit * 800000000 = 100 MB

If a platform has 800 million users and the average daily active users are 1 million, and the List and Bitmap storage platforms are used to store the memory usage of active users (login) on a given day (1KB=1000bit) :

The data type Each userId takes up space The number of users to be stored Total memory usage
List 4 * 8bit=32bit(assuming userId uses int storage) 1000000 32 bit * 1000000 = 4 MB
Bitmaps 1bit 800000000 1 bit * 800000000 = 100 MB

So using Bitmap is not the best choice in all situations. Although the platform has 800 million users, there are very few active users. This is using Bitmaps, and if only one user logs in (userId=800,000,000-1 user logs in), 100MB space needs to be allocated.

Pay attention to

The bitmaps type (string) contains a maximum of 512 MB. Setbit may take a long time if the offset is large. Bitmaps aren’t always good, and can sometimes be a waste of space.

Do you already know what Bloot Filter should do?

Done, done!!

[Spreading knowledge and sharing value], thank you for your attention and support, I am [Zhuge Little Ape], a struggling Internet migrant worker.