JDK BitMap implementation of BitSet source analysis

The premise

The main content of this article is to analyze the JDK BitMap implementation of java.util.BitSet source code implementation, based on JDK11 written, other versions of the JDK may not be appropriate.

In this article, the low level of the graph bit should be on the right, but in order to improve the reading experience, the author changed the low level to the left.

What is a BitMap

A BitMap, literally translated as a BitMap, is a data structure that represents a Dense Set in which every element appears at least once, with no other data associated with the element. Used in indexing, data compression, etc. (wikipedia entry). In a computer, 1 byte = 8 bit, and a bit (bit, called bit or bit) can represent 1 or 0. A bit is used to mark the value of an element, and KEY or INDEX is the element, forming a mapping diagram. Since Bit is used as the unit of underlying storage data, storage space can be greatly saved.

In Java, an integer of type int is 4 bytes, 16 bits, and the maximum value of int is more than 2 billion. Suppose we now have a requirement to determine the existence of an integer m among 2 billion integers, requiring memory usage to be less than or equal to 4GB. If every integer uses int storage, then the storage of 2 billion integers requires 2 billion * 4byte /1024/1024/1024 = 7.45GB, which is obviously not enough to meet the requirements. If you use BitMap, you only need 2 billion bits of memory, which is 2 billion /8/1024/1024/1024 is about 0.233GB. In the case of a large amount of data and finite state of the data set, BitMap storage and subsequent calculation can be considered. Now suppose we use byte array to make the underlying BitMap storage structure, initialize a BitMap instance with capacity 16, as shown in the following example:

You can see that the current byte array has two elements, bitmap[0] (virtual subscript [0,7]) and bitmap[1] (virtual subscript [8,15]). Suppose we use the BitMap instance constructed above to store customer ids and customer gender relationships (bit 1 for male and bit 0 for female) and add male customers with ID equal to 3 and female customers with ID equal to 10 to the BitMap:

Since 1 byte = 8 bit, the byte array index to be stored can be located by dividing the customer ID by 8. Then, the index of the specific bit in the byte array to be stored can be obtained by modulating the customer ID based on 8:

#Male customer with ID equal to 3Logical index = 3 Byte array index = 3/8 = 0 Bit index = 3% 8 = 3 => That is, the logical index must be stored in byte[0] bit with subscript 3, which is set to 1
#Female client with ID equal to 10Logical index = 10 Byte array index = 10/8 = 1 Bit index = 10% 8 = 2 => That is, the logical index must be stored in byte[1] bit with subscript 2, which is set to 0Copy the code

Then determine the gender of the customer whose ID is 3 or 10 respectively:

If another male user with customer ID 17 is added, the old BitMap can only store 16 bits, so it needs to be expanded. It is determined that only one byte element (byte[2]) needs to be added to the byte array:

In principle, the underlying byte array can be continuously expanded. When the byte array length reaches Integer.MAX_VALUE, the BitMap capacity reaches the maximum.

BitSet is simple to use

Java.util. BitSet is actually a built-in BitMap implementation of the JDK, despite its name. 1 is a very old class, annotated in JDK1.0, but most of its methods have been added or updated since JDK1.4. The example in the previous section is a BitSet Demo:

public class BitSetApp {

    public static void main(String[] args) {
        BitSet bitmap = new BitSet(16);
        bitmap.set(3, Boolean.TRUE);
        bitmap.set(11, Boolean.FALSE);
        System.out.println("Index 3 of bitmap => " + bitmap.get(3));
        System.out.println("Index 11 of bitmap => " + bitmap.get(11));
        bitmap.set(17, Boolean.TRUE);
        // The BitSet will not trigger expansion because the underlying storage array is long[]
        System.out.println("Index 17 of bitmap => " + bitmap.get(17)); }}// Output the result
Index 3 of bitmap => true
Index 11 of bitmap => false
Index 17 of bitmap => true
Copy the code

The API is relatively simple to use. BitSet also provides several useful static factory methods for constructing instances, setting ranges and clearing bit values, and some set operations to satisfy other scenarios. This is not an example, but will be expanded in detail when analyzing the source code.

BitSet source code analysis

As mentioned earlier, bitmaps that use byte arrays must be expanded when the logical subscripts of newly added elements exceed the maximum logical subscripts of the initialized byte array. In order to reduce the number of expansion as much as possible, in addition to the need to define the initialization of the underlying storage structure according to the actual situation, we should also choose the data type array that can “carry” more bits. Therefore, in the BitSet, the underlying storage structure chooses the LONG array. A long integer is 64 bits, and its bit length is 8 times that of a byte integer. This can effectively reduce the number of capacity expansion in scenarios where a large data range needs to be processed. In order to simplify the analysis process, we will use as few elements as possible to simulate the changes of the underlying LONG array. There are some comments on the design at the top of the BitSet, which can be summarized as follows:

BitSetIs an implementation of a vector of growable bits, in which each bit is by design a Boolean value and the logical index of the bit is a non-negative integer
BitSetThe initial value of all bits offalse(integer0)
BitSetthesizeProperty related to its implementation,lengthProperty (logical length of bit table) is implementation-independent
BitSetDesigned to be non-thread-safe, multi-threaded environments require additional synchronization

As usual with source code analysis, let’s look at all the core member attributes of a BitSet:

public class BitSet implements Cloneable.java.io.Serializable {
    
    // Words is a long array, a long integer is 64bit, 2^6 = 64, select 6 as the address parameter of words, based on the logical subscript can be quickly located in the specific element index of words
    private static final int ADDRESS_BITS_PER_WORD = 6;

    // The number of bits per element in words, 64 in decimal
    private static final int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;

    // bit Subscript mask. The decimal value is 63
    private static final int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    // Mask, decimal value -1, that is, 64 bits are all 1s, used for partial word mask shift left or right
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * 序列化相关，略过
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits".long[].class),
    };

    /** * The underlying bit storage structure, the long array, is also the corresponding value of the serialized field "bits" */
    private long[] words;

    /** * The number of words in the logical length of the current BitSet, instantaneous value */
    private transient int wordsInUse = 0;

    /** * marks whether the length of the words array is user */
    private transient boolean sizeIsSticky = false;

    // The serialization version used by JDK 1.0.2
    private static final long serialVersionUID = 7997698588986878753L;

    // Omit the other methods temporarily
}
Copy the code

Let’s look at some helper methods for bitsets:

// Bit-based logical subscript locates the index of word elements in words, directly moving 6 bits to the right
// For example, if bitIndex = 3, then bitIndex >> ADDRESS_BITS_PER_WORD => 0.
// For example, if bitIndex = 35, then bitIndex >> ADDRESS_BITS_PER_WORD => 1.
private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

// Every public method must preserve these invariants. The identity check of internal variables literally means that every public method must call this identity check
// First identity: the current BitSet is empty or the last words element cannot be 0.
// The second identity: wordsInUse boundary check, range is [0, words.length]
// The third identity: wordsInUse or equal to words.length, which means all elements of words are used; Or words[wordsInUse] == 0, meaning that none of the elements in words whose index is [wordsInUse, words.length-1] are used
private void checkInvariants(a) {
    assert(wordsInUse == 0 || words[wordsInUse - 1] != 0);
    assert(wordsInUse >= 0 && wordsInUse <= words.length);
    assert(wordsInUse == words.length || words[wordsInUse] == 0);
}

// Recalculate the wordsInUse value, that is, refresh the calculated value of the words element used
// Walk forward to I = 0 based on the current wordsinuse-1, find the last non-0 words[I], and reassign to I + 1, where I is the index of the words array
// wordsInUse is the subscript 1 of the last non-zero element in the words array, or the number of words used.
private void recalculateWordsInUse(a) {
    // Traverse the bitset until a used word is found
    int i;
    for (i = wordsInUse-1; i >= 0; i--)
        if(words[i] ! =0)
            break;

    wordsInUse = i+1; // The new logical size
}
Copy the code

Then look at BitSet constructors and static factory methods:

// The default public constructor, the logical length of the bit table is 64, the words array is 2, and the sizeIsSticky flag is false, that is, the length of the bit table is not user-defined
public BitSet(a) {
    initWords(BITS_PER_WORD);
    sizeIsSticky = false;
}

// Construct a custom bit table logical length, the length must be a non-negative integer, the flag sizeIsSticky is true, that is, the length of the bit table is user-defined
public BitSet(int nbits) {
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);
    initWords(nbits);
    sizeIsSticky = true;
}

// Initialize the words array with length = (nbits-1) / 64 + 1
// For example, nbits = 16, which is equivalent to long[] words = new long[(16-1) / 64 + 1] => new long[1];
// For example, nbits = 65, equivalent to long[] words = new long[(65-1) / 64 + 1] => new long[2];
// And so on
private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1];
}

// Directly customize the underlying words array constructor, marking all words elements are used
private BitSet(long[] words) {
    this.words = words;
    this.wordsInUse = words.length;
    checkInvariants();
}

// Directly customize the underlying words array constructor. This constructor, unlike the previous one, iterates from the back of the long array until it reaches the first element or a non-zero element, so as to truncate useless high-order zeros as much as possible
ValueOf (new long[]{1L, 0L}) => bitset.valueof (new long[]{1L}) => bitset.valueof (new long[]{1L})
public static BitSet valueOf(long[] longs) {
    int n;
    for (n = longs.length; n > 0 && longs[n - 1] = =0; n--)
        ;
    return new BitSet(Arrays.copyOf(longs, n));
}

Longs => words; longs => words; longs => words; longs => words; longs => words
public static BitSet valueOf(LongBuffer lb) {
    lb = lb.slice();
    int n;
    for (n = lb.remaining(); n > 0 && lb.get(n - 1) = =0; n--)
        ;
    long[] words = new long[n];
    lb.get(words);
    return new BitSet(words);
}

// Each of the following constructors iterates from the back of a byte array until it reaches the first or non-zero element, truncates a new array, and converts it to a long array to construct an instance of a BitSet
public static BitSet valueOf(byte[] bytes) {
    return BitSet.valueOf(ByteBuffer.wrap(bytes));
}

public static BitSet valueOf(ByteBuffer bb) {
    // Endian sort
    bb = bb.slice().order(ByteOrder.LITTLE_ENDIAN);
    int n;
    // Get the first element or the first non-zero element from back to front
    for (n = bb.remaining(); n > 0 && bb.get(n - 1) = =0; n--)
        ;
    // We need to consider the case where the number of bytes in the byte container is not a multiple of 8
    long[] words = new long[(n + 7) / 8];
    // Truncate the following 0 element
    bb.limit(n);
    int i = 0;
    // If the number of remaining elements is greater than or equal to 8, read in 64 bits
    while (bb.remaining() >= 8)
        words[i++] = bb.getLong();
    // When the number of remaining elements is less than 8, bytes are read and padded into long array elements by mask calculation and left shift
    for (int remaining = bb.remaining(), j = 0; j < remaining; j++)
        words[i] |= (bb.get() & 0xffL) < < (8 * j);
    return new BitSet(words);
}
Copy the code

BitSet valueOf(ByteBuffer BB); BitSet valueOf(ByteBuffer BB);

ByteBuffer byteBuffer = ByteBuffer.allocate(10);
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
byteBuffer.putLong(1L);
byteBuffer.put((byte)3);
byteBuffer.put((byte)1);
byteBuffer.flip();
BitSet bitSet = BitSet.valueOf(byteBuffer);
System.out.println(bitSet.size());
System.out.println(bitSet.length());

// Output the result
128
73
Copy the code

The process is as follows:

Moving on to the normal set, get, and clear methods:

// Sets the bits of the specified logical index to true
public void set(int bitIndex) {
    // Bit logical index must be greater than or equal to 0
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
    // Computes the index of the words array element
    int wordIndex = wordIndex(bitIndex);
    // Determine whether the capacity needs to be expanded. If so, expand the words array
    expandTo(wordIndex);
    / / equivalent words [wordIndex] = words [wordIndex] | < < bitIndex (1 l)
    // Note that long overflows if it moves more than 63 bits to the left, i.e. 1L << 64 => 1L,1L << 65 => 1L << 1,1L << 66 => 1L << 2... And so on
    / / here is equivalent to the left shift results directly and corresponding words elements or operations, because the former is based on 1 left, only a bit binary number must be 1, other bits are 0 64 bits of the secondary system sequence, or operation will make the corresponding words elements corresponding to the former bits of 1 bit value is set to 1, And reassign the corresponding words element
    / / similar to this: 0000 0000 0000 1000 | = > 0000 1000
    words[wordIndex] |= (1L << bitIndex); // Restores invariants
    // invariant identity assertion check
    checkInvariants();
}

// Expand based on the calculated index of words array elements
private void expandTo(int wordIndex) {
    // Calculates the length of the words array required for the current words element subscript
    int wordsRequired = wordIndex + 1;
    // If the current words element subscript requires more words than the number of elements in the current words array, then expand the array (not necessarily expand the array, there is a decision in the expansion method).
    if (wordsInUse < wordsRequired) {
        // Expand based on the required words array length
        ensureCapacity(wordsRequired);
        // Resets the number of elements in the currently used Words arraywordsInUse = wordsRequired; }}// Expand the capacity based on the calculated subscript of words array elements, and copy the array if necessary
private void ensureCapacity(int wordsRequired) {
    // If the current words array length is smaller than the required words array length, expand the words array
    if (words.length < wordsRequired) {
        // The length of the new array allocated is the maximum length between the old words array element and the required words array passed in
        int request = Math.max(2 * words.length, wordsRequired);
        // Array expansion
        words = Arrays.copyOf(words, request);
        // The bit table length is not user-defined because it has been expanded
        sizeIsSticky = false; }}// Gets the state of the bits of the specified logical index
public boolean get(int bitIndex) {
    // Bit logical index must be greater than or equal to 0
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
    // invariant identity assertion check
    checkInvariants();
    // Computes the index of the words array element
    int wordIndex = wordIndex(bitIndex);
    // The index of the words element must be less than the number of words elements in use, and if the bitIndex result is not 0, return true, otherwise return false
    // Similar to this (return true) : 0000 1010&0000 1000 => 0000 1000 => Indicates that the element in the located words was set to 1 by the set method corresponding to 1L << bitIndex
    // similar to this (if false is returned) : 0000 0110&0000 1000 => 0000 0000 => Indicates that the element in the located words has not set the corresponding 1L << bitIndex bit to 1 through the set method, and the corresponding bit uses the default value 0
    return (wordIndex < wordsInUse) && ((words[wordIndex] & (1L<< bitIndex)) ! =0);
}

// Sets the bits of the specified logical index to false
public void clear(int bitIndex) {
    // Bit logical index must be greater than or equal to 0
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
    // Computes the index of the words array element
    int wordIndex = wordIndex(bitIndex);
    // If the index of the words array element is greater than or equal to the number of words elements in use, it indicates that the bits of the logical subscript are initialized and not in use
    if (wordIndex >= wordsInUse)
        return;
    Words [wordIndex] = words[wordIndex] & (~(1L << bitIndex))
    // Invert each bit of the left shift and then perform the and operation with the corresponding words element, and then reassign to the corresponding words element
    // similar to: 0000 1100&(~(0000 1000)) => 0000 1100&1111 0111 => 0000 0100
    words[wordIndex] &= ~(1L << bitIndex);
    // Recalculate the wordsInUse value, that is, refresh the calculated value of the words element used
    recalculateWordsInUse();
    // invariant identity assertion check
    checkInvariants();
}
Copy the code

Here is a simulation of the set method:

Then look at the set operation related methods:

// Check whether two bitsets overlap. This is a check method that does not modify the structure of the current BitSet
public boolean intersects(BitSet set) {
    // Compare the current BitSet instance with the number of words elements used by the input BitSet instance, taking the smaller value as the traversal baseline
    for (int i = Math.min(wordsInUse, set.wordsInUse) - 1; i >= 0; i--)
        // Iterate over and compare each element of the words array, and return true as long as the result of the operation is not 0.
        if((words[i] & set.words[i]) ! =0)
            return true;
    return false;
}

// AND operation, the bottom layer is two BitSet instances corresponding to the index of the words array element AND operation, which is to calculate the intersection of two BitSet instances, stored in the BitSet instance
public void and(BitSet set) {
    // The input parameter is this BitSet instance and is not processed
    if (this == set)
        return;
    // If the current BitSet instance already uses more words elements than it passed in, the current BitSet instance sets the extra words elements to 0
    while (wordsInUse > set.wordsInUse)
        words[--wordsInUse] = 0;
    // Iterates over the used elements of the words array of the current BitSet instance, suming and reassigning the same index elements of the passed BitSet instance
    for (int i = 0; i < wordsInUse; i++)
        words[i] &= set.words[i];
    // Recalculate the wordsInUse value, that is, refresh the calculated value of the words element used
    recalculateWordsInUse();
    // invariant identity assertion check
    checkInvariants();
}

// OR operation, the bottom layer is two BitSet instances corresponding to the index of the words array element to perform OR operation, intuitively, is to calculate the union of two BitSet instances, stored in the BitSet instance
public void or(BitSet set) {
    // The input parameter is this BitSet instance and is not processed
    if (this == set)
        return;
    // Calculate the common portion of the words array used in the two bitsets, namely the smaller wordsInUse
    int wordsInCommon = Math.min(wordsInUse, set.wordsInUse);
    // The words array of the current BitSet instance has a smaller number of elements than the words array of the passed BitSet instance
    if (wordsInUse < set.wordsInUse) {
        ensureCapacity(set.wordsInUse);
        wordsInUse = set.wordsInUse;
    }
    // The words array of two BitSet instances has been indexed by or using the common part of the element, assigning the words element corresponding to the index in the current BitSet instance
    for (int i = 0; i < wordsInCommon; i++)
        words[i] |= set.words[i];
    // If the BitSet instance passed in has more than a common portion of the words array elements already in use, this portion of the words array elements will be copied to the current BitSet instance. This is where the wordsInUse of the current BitSet instance is greater than or equal to the wordsInUse of the incoming BitSet instance
    if (wordsInCommon < set.wordsInUse)
        System.arraycopy(set.words, wordsInCommon,
                            words, wordsInCommon,
                            wordsInUse - wordsInCommon);
    // invariant identity assertion check
    checkInvariants();
}

// XOR operation, the bottom layer is two BitSet instances corresponding to the index of the words array element XOR operation, implementation is basically similar to OR, after completion of the processing needs to recalculate the current BitSet instance wordsInUse value
public void xor(BitSet set) {
    int wordsInCommon = Math.min(wordsInUse, set.wordsInUse);
    if (wordsInUse < set.wordsInUse) {
        ensureCapacity(set.wordsInUse);
        wordsInUse = set.wordsInUse;
    }
    // Perform logical XOR on words in common
    for (int i = 0; i < wordsInCommon; i++)
        words[i] ^= set.words[i];

    // Copy any remaining words
    if (wordsInCommon < set.wordsInUse)
        System.arraycopy(set.words, wordsInCommon,
                            words, wordsInCommon,
                            set.wordsInUse - wordsInCommon);
    recalculateWordsInUse();
    checkInvariants();
}


// AND NOT operation is performed on the words array elements corresponding to the index of the two BitSet instances. The process is similar to AND operation
public void andNot(BitSet set) {
    // Perform logical (a & ! b) on words in common
    for (int i = Math.min(wordsInUse, set.wordsInUse) - 1; i >= 0; i--)
        words[i] &= ~set.words[i];
    recalculateWordsInUse();
    checkInvariants();
}
Copy the code

Here is a simulation of the and method:

NextXXXBit, previousYYYBit, nextSetBit(int fromIndex) as an example:

FromIndex = fromIndex = fromIndex = fromIndex = fromIndex = fromIndex
public int nextSetBit(int fromIndex) {
    // The initial bit logical index must be greater than or equal to 0
    if (fromIndex < 0)
        throw new IndexOutOfBoundsException("fromIndex < 0: " + fromIndex);
    // invariant identity assertion check
    checkInvariants();
    // Computes the index of the words array element based on the initial bit logical index
    int u = wordIndex(fromIndex);
    // The index of the words array element exceeds the number of words array elements already in use
    if (u >= wordsInUse)
        return -1;
    -1l << 2 => 1111 1111 << 2 => 1111 1100 (this is assuming the constraint length is 8 and the high level of the overflow is discarded)
    // for example: 0000 1010&(1111 1111 << 2) => 0000 1010&1111 1100 => 0000 1000
    long word = words[u] & (WORD_MASK << fromIndex);
    // Iterate based on the obtained word
    while (true) {
        Bit 1 exists in word, and the logical index of that bit is calculated and returned
        if(word ! =0)
            // Count the number of consecutive 0 bits in word based on the initial bit logical index of words array elements * 64 + word
            return (u * BITS_PER_WORD) + Long.numberOfTrailingZeros(word);
        WordsInUse returns -1 if all bits in word are 0, then u must be incremented backwards
        if (++u == wordsInUse)
            return -1; word = words[u]; }}Copy the code

NextSetBit (int fromIndex) searches for the element in the words array where fromIndex is located, and retrits it backwards if it is not sufficient. A classic example of using the nextSetBit(int fromIndex) method is given in the annotation. Here is an excerpt:

BitSet bitmap = new BitSet();
// add element to bitmap
// ... bitmap.set(1993);
for (int i = bitmap.nextSetBit(0); i >= 0; i = bitmap.nextSetBit(i + 1)) {
    // operate on index i here
    if (i == Integer.MAX_VALUE) {
        break; // or (i+1) would overflow}}Copy the code

Finally, let’s look at some getters for spec attributes:

// Get the total number of bits in the BitSet
public int size(a) {
    // Words array length * 64
    return words.length * BITS_PER_WORD;
}

// Get the total number of bits 1 in the BitSet
public int cardinality(a) {
    int sum = 0;
    // Get the number of bits 1 for each used words element and add them up
    for (int i = 0; i < wordsInUse; i++)
        sum += Long.bitCount(words[i]);
    return sum;
}

// Get the logical size of the BitSet. In words[wordsinuse-1], the first bit of the BitSet is 1. For example, 0001 1010, the first bit is 1
public int length(a) {
    if (wordsInUse == 0)
        return 0;
    return BITS_PER_WORD * (wordsInUse - 1) +
        (BITS_PER_WORD - Long.numberOfLeadingZeros(words[wordsInUse - 1]));
}
Copy the code

Other methods, such as set(int fromIndex, int toIndex) and clear(int fromIndex, int toIndex), are limited to space without detailed analysis, and the routines are roughly similar.

BitSet does not solve the problem

ADDRESS_BITS_PER_WORD: The choice of word size is determined purely by performance concerns. The addressing base value of 6 is chosen because the underlying array is long, minimizing the number of expansions in the underlying array. Paradoxically, there seems to be no way in the JDK to find a data type with a larger bit width than Long that takes up the same amount of memory as Long, such as Byte, String, and so on, with far more capacity expansion than Long. Because the underlying array storage structure does not define the lower and upper bounds of the elements in the array, excessive unused memory space can be wasted in certain scenarios. If we want to add integers between 1 billion and 2 billion to a BitSet instance (all of which are marked as 1 in the BitSet’s logical bit index), then in the BitSet’s underlying bit table, the logical index [0.1 billion] will have the initial value of 0. That is, about half of words[N] are 0, which is completely wasted. In the actual business scenarios, and many times the primary key of the business is not using the database on the primary key, but use, is through the SnowFlake of this kind of algorithm with the increasing trend of the numerical keys, if the algorithm defined benchmark timestamp is bigger, then the generated value is far beyond the int type of cap (using long bearing). What BitSet doesn’t solve is the following:

Problem 1: The upper limit of the logical index value of the bit table isInteger.MAX_VALUE, currently there is no way to expand intoLong.MAX_VALUE, the reason isJDKThe array is designed at the bottomlengthAttributes areintType, can be fromjava.lang.reflect.ArrayClass is aware of this limitation, the author believes that the problem can not be solved from the bottom layer
Problem two:BitSetScenario optimization that does not take into account the logical index range of a known bit table, that is, must be stored[0, lower boundary)This part of the0Value, which can waste too much unused memory space in certain scenarios

For problem one, consider doing a simple mapping. Suppose you now want to store [integer.max_value + 1, integer.max_value + 3] into a BitSet instance, you can actually store [1,3]. Long realIndex = (long) bitIndex + integer. MAX_VALUE;

public class LargeBitSetApp {

    public static void main(String[] args) {
        long indexOne = Integer.MAX_VALUE + 1L;
        long indexTwo = Integer.MAX_VALUE + 2L;
        long indexThree = Integer.MAX_VALUE + 3L;
        BitSet bitmap = new BitSet(8);
        // set(int fromIndex, int toIndex) => [fromIndex,toIndex)
        bitmap.set((int) (indexOne - Integer.MIN_VALUE), (int) (indexThree - Integer.MIN_VALUE) + 1);
        System.out.printf("Index = %d,value = %s\n", indexOne, bitmap.get((int) (indexOne - Integer.MIN_VALUE)));
        System.out.printf("Index = %d,value = %s\n", indexTwo, bitmap.get((int) (indexTwo - Integer.MIN_VALUE)));
        System.out.printf("Index = %d,value = %s\n", indexThree, bitmap.get((int) (indexThree - Integer.MIN_VALUE))); }}// Output the result
Index = 2147483648,value = true
Index = 2147483649,value = true
Index = 2147483650,value = true
Copy the code

For problem two, there is already an implementation of the RoaringBitmap class library at github.com/RoaringBitm… . This library is used by multiple Apache big data components and stands the test of production environments. Import dependencies as follows:

<dependency>
    <groupId>org.roaringbitmap</groupId>
    <artifactId>RoaringBitmap</artifactId>
    <version>0.9.23</version>
</dependency>
Copy the code

Simple use:

public class RoaringBitmapApp {

    public static void main(String[] args) {
        RoaringBitmap bitmap = RoaringBitmap.bitmapOf(1.2.3, Integer.MAX_VALUE, Integer.MAX_VALUE - 1);
        System.out.println(bitmap.contains(Integer.MAX_VALUE));
        System.out.println(bitmap.contains(1)); }}// Output the result
true
true
Copy the code

RoaringBitmap distinguishes different scenarios to create the underlying storage container:

Sparse scenario: UsedArrayContainerContainer, the number of elements is less than4096
Intensive scenario: UseBitmapContainerContainer, something like thatjava.util.BitSetThe number of elements is greater than4096
Aggregation scenario (understood as a mixture of the previous two scenarios) : useRunContainerThe container

summary

Learning and analyzing the source code of BitSet is to better deal with the operation between the finite state of a large number of data sets in the actual scene, just in the business of the author has a matching scene, then have the opportunity to expand in detail in another actual combat article.

References:

JDK11Related to the source code
RoaringBitmap Performance Tricks

(C-4-D E-A-20220103)

JDK BitMap implementation of BitSet source analysis

The premise

What is a BitMap

BitSet is simple to use

BitSet source code analysis

BitSet does not solve the problem

summary

Related Posts

Alipay asynchronous notification (notifY_URL) and return_URL.

Docker builds Redis cluster from scratch

Why is ElasticSearch faster than mysql