Redis is precise to recalculate - roar bitmap

If you want to count the reading amount of an article, you can directly use Redis INCR command to complete. If you require that the number of reads must be de-read by user, then you can use a set to record all user ids that read the article. The length of the set is de-read. However, if popular articles are read too much, set will waste too much storage space. At this time, we need to use the HyperLogLog data structure provided by Redis to replace SET, which will only occupy up to 12K of storage space to complete massive de-statistics. But it sacrifices accuracy. It is fuzzy counting, with an error rate of about 0.81%.

So is there an exact way to count that doesn’t waste a lot of space? The first thing that comes to mind is bitmaps, where you can use one bit of a bitmap to represent a user ID. If a user ID is 32 bytes, the exact count can be done using a bitmap using 1/256 of the space. But how do you map the user ID to the position on the map? This is fine if the user ids are consecutive integers, but in general the user system’s user ids are not integers, but strings or large integers with some randomness.

We can force a sequence of integers for each user ID, and then store the mapping between user ids and integers in Redis.

$next_user_id = incr user_id_seq
set user_id_xxx   $next_user_id
$next_user_id = incr user_id_seq
set user_id_yyy  $next_user_id
$next_user_id = incr user_id_seq
set user_id_zzz   $next_user_id
Copy the code

So, you might say, well, you’re saving space by storing the mapping between the user ID and the integer. This is a good question, but at the same time, we should also see that this mapping can be reused, it can count all the articles read, it can count the daily and monthly activity of the check-in users, and it can be used in many other statistical situations that require users to de-weight. This is the meaning of the saying, “achievements in the present and benefits in the future”.

With this mapping, we can easily construct a bitmap for each article read. For each user, the corresponding position in the bitmap will be one. If the bit changes from 0 to 1, you can add 1 to the reading number. This can be very convenient to obtain the number of articles read.

And we can dynamically calculate how many public users have read two articles? Perform an AND calculation on the two bitmaps, AND then count the number of bits 1 in the bitmap. Also, there can be OR calculation, XOR calculation and so on are feasible.

Here we go again! Redis bitmap is a dense bitmap. If you have a large bitmap with only the last bit being a 1 and all the other bits being zeros, the bitmap will still take up all of the memory, which is not normally wasteful. As you can imagine, most of the articles aren’t read very much, but the amount of space they take up is pretty similar to the amount of memory that popular articles take up.

We need to think of something else! Here comes the Roar Bitmap.

It divides the whole bitmap into chunks. If the whole chunk is zero, then the whole chunk is not saved. But what if the ones are scattered, and there are 1’s in each of the blocks, even though there are very few 1’s in each of the blocks, it’s not enough to just partition, what do you do? Let’s think again, for a single block, can we continue to optimize? If the number of bits inside a single block is small, we can store only the in-block offsets (integers) of all bits 1, i.e. a list of integers, and the in-block storage can also be reduced. This is the sparse form of storage for a single block bitmap — a list of stored offset integers. Sparse storage is converted to dense storage all at once only if the bit 1 in a single block exceeds a threshold.

In addition to saving space, roar bitmap can also reduce the computational efficiency of AND AND OR alleles. Where previously the entire bitmap had to be computed, now only partial blocks have to be computed. If the block is very sparse, then only the collection of AND AND OR operations need to be performed on the list of small integers, AND the computation can be further reduced.

There is neither space for time nor time for space, but logical complexity for both space and time.

The roar bitmap has a maximum bit length of 2^32 and a corresponding space of 512 megabytes (normal bitmap). The bit offset is divided into high 16 bits and low 16 bits, with high 16 bits representing the block offset and low 16 bits representing the position within the block. A single block can represent 64K bits, or 8K bytes. There will be 64K blocks at most. The L1 Cache of modern processors is generally larger than 8K, which can ensure that a single block can fit into the L1 Cache, which can significantly improve performance.

If all bits of a single block are zero, then it does not need to be stored. Whether a specific block exists can also be expressed by bitmap. When the block is small, it can be expressed by integer list. When the block is large, it can be converted into a common bitmap. Integer lists take up less space, and they also have a dynamic scaling mechanism similar to ArrayLists to avoid repeatedly expanding and copying the contents of arrays. When the number in the list exceeds 4096, it is immediately converted to a normal bitmap.

The data structure used to express the presence or absence of a block can be the same as the structure used to express the presence or absence of a single block, because the presence or absence of a block is essentially also 0 and 1, which are common bit flags.

But Redis doesn’t have native support for roar bitmaps. How do we use it?

Redis does not have native, but the Roar bitmap Redis Module does.

Github.com/aviggiano/r…

The number of stars for this project is not very large, so let’s take a look at its official performance comparison

OP	TIME/OP (us)	ST.DEV. (us)
R.SETBIT	31.89	28.49
SETBIT	29.98	29.23
R.GETBIT	29.90	14.60
GETBIT	28.63	14.58
R.BITCOUNT	32.13	0.10
BITCOUNT	192.38	0.96
R.BITPOS	70.27	0.14
BITPOS	87.70	0.62
R.BITOP NOT	156.66	3.15
BITOP NOT	364.46	5.62
R.BITOP AND	81.56	0.48
BITOP AND	492.97	8.32
R.BITOP OR	107.03	2.44
BITOP OR	461.68	8.42
R.BITOP XOR	69.07	2.82
BITOP XOR	440.75	7.90

This is obviously a sparse bitmap, and only a sparse bitmap can render such nice numbers. For dense bitmaps, roar bitmaps are certainly a bit weaker than normal bitmaps, but usually not much worse.

So let’s take a look at the source code and see what the internal structure looks like, okay

/ / a single block
typedef struct roaring_array_s {
    int32_t size;
    int32_t allocation_size;
    void **containers;  // Points to an array of integers or a normal bitmap
    uint16_t *keys;
    uint8_t *typecodes;
    uint8_t flags;
} roaring_array_t;

/ / all the pieces
typedef struct roaring_bitmap_s {
    roaring_array_t high_low_container;
} roaring_bitmap_t;
Copy the code

It is obvious that the presence or absence of a block and the data within the block are expressed using the same data structure, which is roaring_bitmap_t. There are multiple encodings in this structure, and types are represented using the Typecodes field.

#define BITSET_CONTAINER_TYPE_CODE 1
#define ARRAY_CONTAINER_TYPE_CODE 2
#define RUN_CONTAINER_TYPE_CODE 3
#define SHARED_CONTAINER_TYPE_CODE 4
Copy the code

When we look at the type definition here, we see that it is not only the normal bitmap and arraylist forms mentioned earlier, but also the RUN and SHARED types. RUN form is bitmap compression form, such as continuous several 101102103104105106107108109 said after the RUN is 101, 8 (1 behind is 8 on the integer), and so on the space can obviously compression. Roar bitmaps normally have no blocks of type RUN inside them. Only optimization apis that show calls to roar bitmaps are converted to RUN format, and this API is roaring_bitmap_run_optimize.

The SHARED type is used to share blocks between roar bitmaps, and it also provides write copy. A new copy will be made when the block is modified.

There are more details about the calculation logic of the roar bitmap, which we will cover later.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Redis is precise to recalculate – roar bitmap

Redis is precise to recalculate – roar bitmap

Related Posts

MySQL transaction isolation and MVCC

Microsoft’s distributed application framework Dapr Helloworld

What happens when the database runs out of increment ids?