A preface,

In actual development, the following requirements are often encountered: to determine whether the current element exists in the known set, to maintain a HashSet for the elements in the known set, the result can be determined only with the time complexity of O(1) when using, and the corresponding data structure can be provided in Java or Redis. There are few drawbacks to using this approach other than the memory footprint.

When the amount of data reaches hundreds of millions, the footprint of memory space becomes obvious, and BitMap is one way to solve this problem.

Second, BitMap structure

1. Analysis of memory consumption

Redis BitMap can store data in the range of 0,2^32-1, exceeding integer. MAX_VALUE.

To simplify the discussion, assume that the set elements in question have a range of [0, INTEger.max_value], which can be any of these numbers.

The memory footprint of using a HashSet data structure is only related to the number of elements (N) in the collection. When the number of elements in the set is N, the required memory space is approximately N*4/1024/1024MB, and 100 million pieces of data occupy about 381MB of memory space.

The space occupied by BitMap based on Redis is not related to the number of elements in the set, but is directly related to the maximum value of elements in the set. Therefore, the memory space occupied by BitMap is in the range of N / 8/1024/1024, integer.max_value / 8/1024/1024.

// Tests 100 million, 500 million, 1 billion, integer.max_value
List<Integer> items = Arrays.asList(100000000.500000000.1000000000, Integer.MAX_VALUE);
for (Integer item : items) {
    int size = item / 8 / 1024 / 1024;
    System.out.printf("If the maximum value in the set is %-10s, the memory space occupied is %3sMB%n",item, size);
}
Copy the code

A set of test reference data is presented here

If the maximum value in the set is 100000000, the memory used is 11MB. If the maximum value in the set is 500000000, the memory used is 59MB. If the maximum value in the set is 1000000000, the memory used is 119MB If the maximum value in the set is 2147483647, the memory used is 255MBCopy the code

As the data in the collection grows to 1 billion pieces, the maximum memory footprint is about 255MB using BItMap and 3.8GB using HashSet.

2, command line operation BitMap

The Redis command line can directly operate the BitMap. If the value of offset is marked as 1, the current data exists. By default, the unlabeled location value is 0.

When data exists in the set, the corresponding bit is set to 1
SETBIT key offset value
# check whether the corresponding bit data exists (1 indicates yes, 0 indicates no)
GETBIT key offset
Copy the code
3. The client operates BitMap

A SpringBoot ecosystem RedisUtils utility class is provided, which internally encapsulates the utility methods for manipulating Redis Bitmaps.

// Mark the current location as true
RedisUtils.setBit(BIT_MAP_KEY, orderId, true);
// Get the value of the specified position (if the corresponding value exists)
RedisUtils.getBit(BIT_MAP_KEY, orderId)
Copy the code

The dependencies of the above tool classes are as follows. If you cannot find the Jar package, please use Maven’s original repository source directly. Ali Cloud has not been synchronized yet.

<dependency>
    <groupId>xin.altitude.cms</groupId>
    <artifactId>ucode-cms-common</artifactId>
    <version>1.4.3</version>
</dependency>
Copy the code
4. Time and space complexity

The time complexity of BitMap storage and value is O(1), and subscripts can be directly mapped according to values.

The complexity of memory space occupied by BitMap is O(n), which is positively related to the maximum value of elements in the set, not the number of elements in the set.

Third, BitMap application

1. Avoid cache penetration

Cache penetration is when the currently requested data does not exist in the cache and the database needs to be accessed to retrieve the data (the requested data also does not exist in the database). Cache penetration brings pressure to the database, malicious cache penetration can even cause database downtime.

Use BitMap to dynamically maintain a set. Before accessing the database, check whether the primary key of the data exists in the set. This is the basis for whether to access the database.

Adding or removing BitMap data is a lightweight operation, and the accuracy of the check operation depends on the integrity of the closed loop maintained by the dynamic collection. For example, adding data to a database requires adding data to a BitMap, and deleting data from a database requires removing data from a BitMap. If strict reliability checks are required, a separate distributed scheduled task can be maintained to periodically update BitMap data.

2, and the difference between bloom filter

Bloom filter and BitMap have similar application scenarios, but there are some differences. Given a number, a BitMap can know exactly whether it exists in a given set. The Bloom filter can accurately determine whether it is not in the set, but it cannot be sure that it is in the set.

The time complexity of adding or removing data in BitMap is O(1), which is convenient and fast. Bloom filter is easy to build, and the operation of removing data is tedious.

Bitmaps are preferred in cases where precise judgments are required, such as whether the phone number is registered.

Four, summary

Redis BitMap is not a new data structure, but a layer of encapsulation using string types that looks like a new type of data structure. BitMap is less like a technology and more like an algorithm, balancing temporal complexity with spatial complexity.

Other applications of BitMap include check-in, clocking in, counting people online, and so on.


Like this article click ♥️ like ♥️ support, if necessary, can contact me through wechat Dream4S. Related source code in GitHub, video explanation in B station, this article in the blog world.