Middleware - Redis five Advanced data types | More challenges in August

This is the 31st day of my participation in the August Text Challenge.More challenges in August

1 HyperLogLog(Cardinal statistics)

The main application scenario of HyperLogLog is cardinality statistics. Instead of actually storing the value of each element, it uses a probabilistic algorithm that counts elements by storing the first 1 of the element’s hash value. HyperLogLog can perform independent statistics in a very small space. The command is as follows:

The command	role
pfadd key element …	Add all elements to the key
pfcount key	Estimated value of statistics key (imprecise)
pgmerge new_key key1 key2 …	Merge key to new key

The application case

How many different accounts visit the Google homepage every day?

For a webpage with huge page views like Google, there is not much difference between the number of page views of one billion or one billion and one million. Therefore, in such a business scenario, in order to save costs, only a approximate value can be calculated, but it is not necessary to calculate an accurate value.

For the above scenario, you can use HashMap, BitMap, and HyperLogLog. Here’s a comparison of these three solutions:

HashMap: Simple algorithm, high statistical accuracy, recommended for a small amount of data, but for a large amount of data will occupy a large memory space
BitMap: bitmap algorithm, the specific content can refer to my article, high statistical accuracy, although the memory footprint thanHashMapSmall, but still takes up a lot of memory for a lot of data
HyperLogLog: Has a certain error and occupies about 12 KB of memory. 2^64 elements can be counted. You are advised to use this parameter in the application scenario described in the preceding example

2 Geo(Geospatial Information)

Geo is mainly used to store geographic location information and perform operations on the stored information (adding, obtaining, calculating the distance between two locations, obtaining the location set within a specified range, obtaining the location set within a specified range). Redis supports storing Geo information in ordered sets (Zsets) and populating them with Geohash algorithms. The command is as follows:

The command	role
geoadd key latitude longitude member	Add the member location (latitude, longitude, name) to the key
geopos key member …	Gets the member’s GEO coordinates
geodist key member1 member2 [unit]	Calculates the distance between member positions. Return null if one of the two positions does not exist
georadius	Query based on latitude and longitude coordinate range
georadiusbymember	Query based on member location range
geohash	Compute the latitude and longitude hash

GEORADIUS

 GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count]
Copy the code

Returns all of the location elements contained by the key that are within a given maximum distance from the center, centered at a given latitude and longitude. A range can use one of the following units:

M is in meters
Km is expressed in kilometers
Mi means miles
Ft is in feet

The command returns additional information when given the following options:

WITHDIST: Returns the location element and the distance between the location element and the center. The units of distance and range should be the same
WITHCOORD: Returns the longitude and dimension of the positional element as well
WITHHASH: Returns the original Geohashed ordered set score of the positional element as a 52-bit signed integer. This option is mainly used for low-level applications or debugging, and is not useful in practice

The command returns unsorted positional elements by default. The user can specify the order of the elements of the returned position with the following two parameters:

ASC: Returns the position element from near to far based on the position of the center
DESC: Returns the position element from far to near based on the position of the center

By default, the GEORADIUS command returns all matching positional elements. Although the user can use the COUNT < COUNT > option to retrieve the first N matched elements, because the command may internally need to process all matched elements, even using the COUNT option to retrieve a small number of elements when searching a very large area, Commands can also be executed very slowly. On the other hand, using the COUNT option to reduce the number of elements that need to be returned is still very useful for reducing bandwidth.

3 Pub/Sub(Publish and Subscribe)

Publish subscriptions are similar to broadcast functions. Redis publishing subscriptions include publishers, subscribers, and channels. Common commands are as follows:

The command	role	Time complexity
subscribe channel	Subscribe to a channel	O(n)
unsubscribe channel …	Unsubscribe from one or more channels	O(n)
publish channel msg	Sends the message to the specified channel	O(n+m), where n is the number of subscribers for the channel and M is the number of clients using Subscribed Patterns
pubsub CHANNELS	Viewing subscription and publication system status (Multi-seed mode)	O(n)
psubscribe	Subscribe to multiple channels	O(n)
unsubscribe	Unsubscribe from multiple channels	O(n)

4 Bitmap

A Bitmap is a Bitmap, which is actually a byte array. It is represented by a series of consecutive binary digits (0 or 1), each of which is offset. A Bitmap uses each binary bit to store or mark the corresponding value of an element. It is usually used to determine whether certain data exists or not. Since bitmaps are stored in bits, they can greatly save storage space. Common commands are as follows:

The command	role	Time complexity
setbit key offset val	Assign val to offset of the specified key	O(1)
getbit key offset	Gets the offset bit of the specified key	O(1)
bitcount key start end	Returns the number of 1 in [start,end] of the specified key	O(n)
bitop operation destkey key	Bit operations on different binary storage data (AND, OR, NOT, XOR)	O(n)

The application case

There are 100 million users, 50 million logins, so count the number of daily logins. Each one identifies a user ID, and when a user visits our site, we set the Bitmap that identifies the user to 1. Using a set and Bitmap store comparison:

The data type	Each userID takes up space	The number of users to be stored	Take up all the memory
set	32-bit is 4 bytes (assuming userID is an integer, but many sites use long integers)	50000000	32-bit x 50,000,000 = 200 MB
Bitmap	1 bit (bit)	100000000	1 bit * 100,000,000 = 12.5MB

Application scenarios

User Online Status
User check-in status
Collecting unique Users

5 BloomFilter

When an element is added to the set, K hash functions are used to map the element to K points in a bit array (multiple hash functions are used to hash the element key (bloom does not contain value) to calculate an integer index value, and then modulo the length of the bit array to obtain a position. Each unbiased hash function gets a different position), set them to 1. When we search, we know (roughly) if it is in the set by seeing if all of the points are 1:

If any of these points is 0, the element being checked must be absent
If they’re all 1’s, it doesn’t necessarily mean that the element is there, but maybe they’re 1’s because other elements are there, and that’s why bloom’s filter miscalculates

Application scenarios

Solve cache penetration: put all existing keys into Redis BloomFilter in advance, which is used for existence detection. If BloomFilter does not exist, the data must not exist. If BloomFilter exists, the actual data may not exist
Blacklist check: assuming that the number of blacklist is hundreds of millions, storage is very expensive storage space, Bloom filter is a better solution. Put all blacklists in the Bloom filter, and then check whether the email address is in the Bloom filter
Web interceptor: for the first request, the user puts the request parameters into the Bloom filter. For the second request, the user determines whether the request parameters are hit by the Bloom filter, thus improving the cache hit ratio

5.1 Based on Bitmap data structure

 import com.google.common.base.Preconditions;
 import org.springframework.data.redis.core.RedisTemplate;
 import org.springframework.stereotype.Service;
 
 import javax.annotation.Resource;
 import java.util.Collection;
 import java.util.Map;
 import java.util.concurrent.TimeUnit;
 
 @Service
 public class RedisService {
 
     @Resource
     private RedisTemplate<String, Object> redisTemplate;
 
     /** * Adds the value */ based on the given Bloom filter
     public <T> void addByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) { Preconditions.checkArgument(bloomFilterHelper ! =null."BloomFilterHelper cannot be empty");
         int[] offset = bloomFilterHelper.murmurHashOffset(value);
         for (int i : offset) {
             redisTemplate.opsForValue().setBit(key, i, true); }}/** * Determines the presence of */ based on the given Bloom filter
     public <T> boolean includeByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) { Preconditions.checkArgument(bloomFilterHelper ! =null."BloomFilterHelper cannot be empty");
         int[] offset = bloomFilterHelper.murmurHashOffset(value);
         for (int i : offset) {
             if(! redisTemplate.opsForValue().getBit(key, i)) {return false; }}return true; }}Copy the code

5.2 Based on RedisBloom module

The RedisBloom module provides four data types:

Bloom Filter
Cuckoo Filter
Count-Mins-Sketch
Top-K

Bloom filters and Cuckoo are used to determine (with a given degree of certainty) whether an item exists in a set. Count-Min Sketch is used to estimate the number of items in the sublinear space, and top-K is used to maintain a list of K most frequent items.

 # 1. The git download
 [root@test ~]# git clone https://github.com/RedisBloom/RedisBloom.git
 [root@test ~]# cd redisbloom
 [root@test ~]# make
 
 # 2. The wget to download
 [root@test ~]Wget # https://github.com/RedisBloom/RedisBloom/archive/v2.0.3.tar.gz
 [root@test ~]# tar - ZXVF RedisBloom - 2.0.3. Tar. Gz
 [root@test ~]# CD RedisBloom - 2.0.3 /
 [root@test ~]# make
 
 # 3. Modify the Redis Conf
 [root@test ~]#vim /etc/redis.conf
 # Add downlink to fileLoadmodule/root/RedisBloom - 2.0.3 / RedisBloom. SoStart Redis Server
 [root@test ~]# /redis-server /etc/redis.conf
 Or load the OS file when starting the service
 [root@test ~]# /redis-server /etc/redis.conf --loadmodule /root/RedisBloom/redisbloom.so
 
 # 5. Test RedisBloom
 [root@test ~]# redis-cli127.0.0.1:6379> bf.add bloomFilter foo 127.0.0.1:6379> bf.exists bloomFilter foo 127.0.0.1:6379> cf. Add cuckooFilter foo 127.0.0.1: > 6379 cf. The exists cuckooFilter fooCopy the code

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Middleware – Redis five Advanced data types | More challenges in August

1 HyperLogLog(Cardinal statistics)

2 Geo(Geospatial Information)

3 Pub/Sub(Publish and Subscribe)

4 Bitmap

5 BloomFilter

5.1 Based on Bitmap data structure

5.2 Based on RedisBloom module

Middleware – Redis five Advanced data types | More challenges in August

1 HyperLogLog(Cardinal statistics)

2 Geo(Geospatial Information)

3 Pub/Sub(Publish and Subscribe)

4 Bitmap

5 BloomFilter

5.1 Based on Bitmap data structure

5.2 Based on RedisBloom module

Related Posts

Java Web Servlet technology

How does Java connect to SQLserver database?

Using OpenResty as backend server – CSDN blog