What is a Bloom filter

The name is like every law, you ask why it’s Called Newton’s law, because Newton invented it or discovered it. “His”

What can he do? It maps a binary vector to a function. Bloom filters can be used to detect the presence of elements in a set or for quick retrieval.

Disadvantages: There are certain deletion issues and error recognition rates

Advantages: Query time and space are far more than the ordinary algorithm

How to implement Bloom Filter

When adding Item or element, create a hash function and a KEY to form a mapping. Set the data =1, as long as the retrieval judgment =1 to know whether the data exists or not, with this method, when the query is found to have 0, the proof must not exist, so on the other hand, if the element is 1, the proof is likely to exist.

Notice why it is said that it is likely to exist, because it has a certain identification error, but this error can be ignored in the actual production process, after all, the advantages outweigh the disadvantages.

Look at the text dizzy, motionless on the drawing, to see should understand a lot of.

People speaking

What exactly does a Bloom filter do?

Special ID not mention ha, database ID is basically self-increasing right! We pass the ID back end to the DB query, which makes perfect sense.

But what if we use negative numbers? One or two doesn’t matter. What if there are thousands? Basically the database will be a lot of pressure to bear, server configuration, not to mention, slow down the system running speed and even downtime are possible, so is not a bit of bloom filter show incisively and vividly. “Dog”

So hanging, also has a price, because it is also uncertain, there is a certain degree of error of judgment!

Q: Why the miscarriage of justice?

A: The search key is not in the container, but the hash key is always 1. If the Bloom filter has a blacklist, then create a whitelist.

Q: Why isn’t it easy to delete?

Counting Bloom Filter Key =1; Counting Bloom Filter Key =1;

So much said about how to achieve

1: estimated quantity n and expected misjudgment rate FPP

2: size of hash function and bit set

Bit Set Size Size

Function hash select, estimate n and bit array length m to get hash function Key

How does it work? Maven project

 <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>23.0</version>
 </dependency>    



Copy the code

A piece of test code I wrote

/** * Bloom filter - used for redis cache penetration *@authorAuthor northwest Big zongzi */
public class TestBloomFilterByDZZ {

    private static int total = 19999;
    private static BloomFilter<Integer> bfilter = BloomFilter.create(Funnels.integerFunnel(), total);

  // Initialize the data
    public static void main(String[] args) {
        for (int i = 0; i < total; i++) {
            bfilter.put(i);
        }

        // If there are any mismatches
        for (int i = 0; i < total; i++) {
            if(! bfilter.mightContain(i)) { System.out.println("Did not pay attention to the northwest big zongzi slip..."); }}// How many more inside matches out
        int count = 0;
        for (int i = total; i < total + 10000; i++) {
            if (bfilter.mightContain(i)) {
                count++;
            }
        }
        System.out.println("Run with cannon fodder:"+ count); }}Copy the code

Applicable service scenarios

1: When a large amount of data is stored in the database, it can be used as a name or unique item for checking. If the data exists, it will be skipped and stored in the database if it does not exist

2: filtering spam, which is a calculation you can combine with your own business to understand.

Daily attention, quality of a key three.

Write at least 3 original articles a week to leave something behind when you’re being tortured by business.

Recently like a sentence “Dao no art, art can be sought. If there is art, there is no tao.