In the above “interview killer: Redis source SDS” we have in-depth analysis of the implementation of SDS, this introduction of BitMap is achieved by SDS.

This article explains at the endBitMapThe Tencent interview question solution, and based onBitMapGitHub to achieve the number of times to submit the calendar map, I hope you look happy 😄

1. Bitmap introduction

What if we needed to record whether a user logged in to our system every day of the year? If KV storage is used, each user needs to record 365, which is amazing when there are hundreds of millions of users.

Redis provides us with bitmap data structure, each user login record occupies only one bit per day, 365 days is 365 bits, only 46 bytes can be stored, greatly saving storage space.

Bitmap data structures are not entirely new. We can simply think of them as arrays with only zeros or ones in them.

2. Command actual combat

Redis provides SETBIT, GETBIT, BITCOUNT and BITOP commands for handling binary arrays.

  • SETBIT: Specifies the value of the binary bit set on the offset of the bit array. The offset starts at 0 and the value of the binary bit can only be 0 or 1. Returns the value of the original position.
  • GETBIT: gets the value of the binary bit at the specified offset.
  • BITCOUNT: The number of bits in the statistics bit array whose value is 1.
  • BITOP: Performs bitwise and, or, xOR operations on multiple single-digit groups.
127.0.0.1:6379> SETBIT first 01 # 0000 0001 (integer) 0 127.0.0.1:6379> SETBIT first 3 1 # 0000 1001 (integer) 0 127.0.0.1:6379> SETBIT first 00 # 0000 1000 (integer) 1 127.0.0.1:6379> GETBIT first 0 (integer) 0 127.0.0.1:6379> GETBIT first 3 (integer) 1 127.0.0.1:6379> BITCOUNT first # 0000 1000 (integer) 1 127.0.0.1:6379> SETBIT first 0 1 # 0000 1001 (integer) 0 127.0.0.1:6379> BITCOUNT first # 0000 1001 (integer) 2 127.0.0.1:6379> SETBIT first 1 1 # 0000 1011 (integer) 0 127.0.0.1:6379> BITCOUNT first # 0000 1011 (integer) 3 127.0.0.1:6379> SETBIT x 3 1 (integer) 0 127.0.0.1:6379> SETBIT x 11 (integer) 0 127.0.0.1:6379> SETBIT x 01 # 0000 1011 (integer) 0 127.0.0.1:6379> SETBIT y 2 1 (integer) 0 127.0.0.1:6379> SETBIT y 11 # 0000 0110 (integer) 0 127.0.0.1:6379> SETBIT z 2 1 (integer) 0 127.0.0.1:6379> SETBIT z 01 #0000 0101 (integer) 0 127.0.0.1:6379> BITOP AND andRes x y z #0000 0000 (integer) 1 127.0.0.1:6379> BITOP OR orRes x y z #0000 1111 (integer) 1 127.0.0.1:6379> BITOP XOR x y z #0000 1000 (integer) 1 # 127.0.0.1:6379> SETBIT value 01 (integer) 0 127.0.0.1:6379> SETBIT value 3 1 #0000 1001 (integer) 0 127.0.0.1:6379> BITOP NOT notValue value #1111 0110 (integer) 1Copy the code

3.BitMap source code analysis

3.1 Data Structure

A one-byte (8-bit) bitmap represented by SDS is shown below:

Extension: Every object in Redis is represented by a redisObject structure.

typedef struct redisObject {
 / / type
 unsigned type:4;
 / / code
 unsigned encoding:4;
 unsigned lru:REDIS_LRU_BITS; /* lru time (relative to server.lruclock) */
 // Reference count
 int refcount;
 // A pointer to the underlying implementation's data structure
 void *ptr;
} robj;
Copy the code
  • typeThe value ofREDIS_STRINGIndicates that this is a string object
  • sdshdr.lenThe value of 1 indicates that the SDS holds a one-byte bit array
  • Buf arraybuf[0]It actually saves the bit array
  • Buf arraybuf[1]For automatically appended\ 0character

For our purposes, each byte of the buF array is represented on a single line, buf[I] indicating that this is the i-th byte of the BUF array, and the eight cells following buf[I] represent the eight bits on this byte.

To round up the ideas again, another bit array is shown as follows:

The bit array is held by three bytes buf[0], buf[1] and buf[2], and the real data is1111 0000 1100 0011 1010 0101

3.2 GETBIT

GETBIT returns the value of the bits at the offset of the bit array. It is worth noting that GETBIT’s time complexity is O(1).

The procedure for executing the GETBIT command is as follows:

  1. To calculate
    b y t e = o f f s e t present 8 byte = \lfloor offset\div8 \rfloor
    (i.e.> > 3The byte value indicates the specified
    o f f s e t offset
    Which byte is in the bit array (which line is counted);
  2. Buf [I]buf[I]buf[I]buf[I]buf[I]buf[I] Bit =(offset % 8)+1bit=(offset \% 8)+1bit=(offset % 8)+1;
  3. Returns the target value based on byteBytebyte and bitbitbit.

The GETBIT command looks like this:

void getbitCommand(client *c) {
    robj *o;
    char llbuf[32];
    uint64_t bitoffset;
    size_t byte, bit;
    size_t bitval = 0;
    / / for offset
    if (getBitOffsetFromArgument(c,c->argv[2],&bitoffset,0.0) != C_OK)
        return;
    // Find the corresponding bitmap object
    if ((o = lookupKeyReadOrReply(c,c->argv[1],shared.czero)) == NULL ||
        checkType(c,o,OBJ_STRING)) return;
		// Calculates which row of the bit array offset is in
    byte = bitoffset >> 3;
    // Calculate the offset position in a row
    bit = 7 - (bitoffset & 0x7);
    // #define sdsEncodedObject(objptr) (objptr->encoding == OBJ_ENCODING_RAW || objptr->encoding == OBJ_ENCODING_EMBSTR)
    if (sdsEncodedObject(o)) {
        // SDS is of RAW or EMBSTR type
        if (byte < sdslen(o->ptr))
            // Get the value of the specified position
            Uint8_t *)o-> PTR)[byte][bit
            bitval = ((uint8_t*)o->ptr)[byte] & (1 << bit);
    } else {
        // SDS is an integer of type REDIS_ENCODING_INT, converted to String first
        if (byte < (size_t)ll2string(llbuf,sizeof(llbuf),(long)o->ptr))
            bitval = llbuf[byte] & (1 << bit);
    }

    addReply(c, bitval ? shared.cone : shared.czero);
}
Copy the code

For example, 🌰

Take GETBIT Array 3 as an example. Array represents the three-byte bit array shown in the figure above.

  1. Byte =⌊3÷8⌋byte = \lfloor 3 \div8 \ rfloorByte =⌊3÷8⌋ The obtained value is 0, indicating that buf[0]buf[0]buf[0]
  2. Bit =(3 mod 8)+1bit =(3 mod 8)+1bit =(3 mod 8)+1 gives the value 4
  3. Position to the fourth position from left to right of the buf[0]buf[0] byte

becauseGETBITAll operations performed by the command can be completed in constant time, so the algorithm complexity of the command is O(1).

3.3 SETBIT

SETBIT sets the value of the bit array at the offset bit to value and returns the old value to the client.

The SITBIT command is executed as follows:

  1. Calculate len=⌊offset÷8⌋+1len = \lfloor offset÷8\rfloor +1len =⌊offset÷8⌋+1, The lenlenlen value records the minimum number of bytes required to hold the binary bits specified by offseToffSeToffSet
  2. Check if the length of the bit array is less than Lenlenlen, and if so, expand the length of SDS to len bytes and set the binary bits of all new extended space to 0
  3. Calculate byte=⌊offset÷8⌋byte = \lfloor offset÷8\ rfloorByte =⌊offset÷8⌋, The bytebytebyte value represents the byte that the specified offseToffSeToffSet is in the bit array (i.e., iii computed in that buf[I]buf[I]buf[I]buf[I]).
  4. Use bit=(offset mod 8)+1bit=(offset mod 8)+1bit=(offset mod 8)+1 to calculate the specific bit of buf[I]buf[I]buf[I]
  5. OldValueoldValueoldValue is saved based on the values of byteByteByte and bitbitbit, and the new valuevaluevalue is set to the target bit
  6. Returns the old value

Because all operations performed by the SETBIT command can be completed in constant time, the algorithm complexity of the command is O(1).

The SETBIT command is as follows:

void setbitCommand(client *c) {
    robj *o;
    char *err = "bit is not an integer or out of range";
    uint64_t bitoffset;
    ssize_t byte, bit;
    int byteval, bitval;
    long on;
    / / for offset
    if (getBitOffsetFromArgument(c,c->argv[2],&bitoffset,0.0) != C_OK)
        return;
    // Get the value we need to set
    if (getLongFromObjectOrReply(c,c->argv[3],&on,err) ! = C_OK)return;

    /* Determines whether the specified value is 0 or 1 */
    if (on & ~1) {
        // Set values other than 0 and 1
        addReplyError(c,err);
        return;
    }
    // Query SDS objects by key (automatic expansion)
    if ((o = lookupStringForBitCommand(c,bitoffset)) == NULL) return;

    /* Get the current value */
    byte = bitoffset >> 3;
    byteval = ((uint8_t*)o->ptr)[byte];
    bit = 7 - (bitoffset & 0x7);
    bitval = byteval & (1 << bit);

    /* Update value and return old value */
    byteval &= ~(1 << bit);
    byteval |= ((on & 0x1) << bit);
    ((uint8_t*)o->ptr)[byte] = byteval;
    // Send notification of data modification
    signalModifiedKey(c,c->db,c->argv[1]);
    notifyKeyspaceEvent(NOTIFY_STRING,"setbit",c->argv[1],c->db->id);
    server.dirty++;
    addReply(c, bitval ? shared.cone : shared.czero);
}
Copy the code

For example, 1.0🌰

Array represents the three-byte array shown in the figure above. Take SETBIT Array 10 1 as an example.

  1. Len =⌊10÷8⌋+1 Len = \lfloor10÷8\rfloor + 1len=⌊10÷8⌋+1 yields a value of 2, indicating that a bitarray of at least 2 bytes is required
  2. Check whether expansion is required. No
  3. Calculate byte=1byte =1byte =1, calculate bit=3bit =3bit =3bit =3, save oldValueoldValueoldValue and set the new value
  4. Return oldValueoldValueoldValue

For example, 2.0🌰

Assume that the target bit array is 1 byte long. Run SETBIT array 12 1.

  1. Len =⌊12÷8⌋+1len =⌊12÷8⌋+1len = ⌋ 12÷8 silver +1len = ⌋ 12÷8 silver +1 The value is 2, indicating that 2-byte SDS is required
  2. Check whether expansion is required. Yes! According to the automatic expansion mechanism of SDS, SDS will expand by twice the new length.
  3. Calculate byte=1byte =1byte =1
  4. Calculate bit=5bit =5bit =5
  5. Save oldValueoldValueoldValue and set the new value
  6. Return oldValueoldValueoldValue

3.4 BITCOUNT

The BITCOUNT command is used to count the number of bits given a value of 1 in the locator array. The functionality doesn’t seem complicated, but actually implementing this command efficiently is not easy and requires some clever algorithms.

Counting the number of non-zero bits in an array of bits is mathematically called “calculating hamming weight”.

3.4.1 Violence Traversal

The simplest and most straightforward way to implement the BITCOUNT command is to iterate through each bit in the bit array and increment the counter by one when it encounters a bit with a value of 1.

Small amount of data is good, large amount of data directly PASS!

3.4.2 look-up table method

For a finite set, there is a finite number of permutations of the elements of the set, and there is a finite number of permutations of bits that can be represented by a finite bit array. According to this principle, we can create a table whose keys are an array of bits of some sort, and whose values are the number of bits with a value of 1 in the corresponding array of bits.

For an 8-bit array, we can create a table that reads 8 bits at a time from the array, and then looks up the table based on the value of the 8 bits to directly know how many 1’s the value contains.

Unfortunately, table lookup consumes memory.

3.4.3 Binary statistics algorithm :variable-precision SWAR

The most efficient general-purpose algorithm known is the varium-Precision SWAR algorithm, which, through a series of displacement and bit operations, can calculate the hamming weight of multiple bytes in constant time (which is awesome 🐂😍) without using any extra memory.

The SWAR algorithm code is as follows:

uint32_t swar(uint32_t i) {
    // 5 binary: 0101
    i = (i & 0x55555555) + ((i >> 1) & 0x55555555);
    // 3 binary: 0011
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i & 0x0F0F0F0F) + ((i >> 4) & 0x0F0F0F0F);
    i = (i*(0x01010101) > >24);
    return i;
  
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) > >24;
}
Copy the code

You don’t get it? What? What? What?!

Here’s what you did:

  1. The binary representation of the value I calculated in step 1 can be grouped into groups of every two binary bits. The decimal representation of each group is the number of 1s in the group.
  2. The binary representation of the value I calculated in step 2 can be grouped into groups of four bits. The decimal representation of each group is the number of 1s in the group.
  3. The binary representation of the value I calculated in Step 3 can be grouped into groups of eight bits. The decimal representation of each group is the number of 1s in the group.
  4. Step 4 of thei*0x01010101Statement counts the number of 1’s in the bitarray and records them in the highest eight bits of the binary, while> > 24Statement moves the hamming weight of the bitarray to the lowest eight bits by right shift, resulting in the hamming weight of the bitarray.
For example, 🌰

For the call to sWAR (0xFBB4080B), step 1 calculates the value 0xA6640406, a decimal representation of every two binary bits of the value table that records the Hamming weight of every two binary bits of 0xFBB4080B.

Step 2 calculates the value 0x43310103, a decimal representation of every four bits of the value table that records the hamming weight of every four bits of 0xFBB4080B.

! [] (aobing.oss-cn-hangzhou.aliyuncs.com/aobing/4Gro… (1).png)

Step three computes the value 0x7040103, where the decimal representation of every eight bits of the value table records the hamming weight of every eight bits of 0xFBB4080B.

Step 4 First calculate 0x7040103 * 0x01010101 = 0xF080403 and aggregate the Hamming weight to the highest eight bits of binary.

After that, 0xF080403 >> 24 is calculated and the hamming weight is moved to the lower eight bits to obtain the final value 0x1111, which is decimal 15.

If you’re a Java programmer, check out the integer.bitcount method, which is also based on the SWAR algorithm.

How does this algorithm count the number of set bits in a 32-bit integer work? (stackoverflow.com/questions/2…).

3.4.4 Source code analysis

Redis by calling redisPopcount method to count the hamming weight, the source code is as follows:

long long redisPopcount(void *s, long count) {
    long long bits = 0;
    unsigned char *p = s;
    uint32_t *p4;
    // A table for lookup
    static const unsigned char bitsinbyte[256] = {0.1.1.2.1.2.2.3.1.2.2.3.2.3.3.4.1.2.2.3.2.3.3.4.2.3.3.4.3.4.4.5.1.2.2.3.2.3.3.4.2.3.3.4.3.4.4.5.2.3.3.4.3.4.4.5.3.4.4.5.4.5.5.6.1.2.2.3.2.3.3.4.2.3.3.4.3.4.4.5.2.3.3.4.3.4.4.5.3.4.4.5.4.5.5.6.2.3.3.4.3.4.4.5.3.4.4.5.4.5.5.6.3.4.4.5.4.5.5.6.4.5.5.6.5.6.6.7.1.2.2.3.2.3.3.4.2.3.3.4.3.4.4.5.2.3.3.4.3.4.4.5.3.4.4.5.4.5.5.6.2.3.3.4.3.4.4.5.3.4.4.5.4.5.5.6.3.4.4.5.4.5.5.6.4.5.5.6.5.6.6.7.2.3.3.4.3.4.4.5.3.4.4.5.4.5.5.6.3.4.4.5.4.5.5.6.4.5.5.6.5.6.6.7.3.4.4.5.4.5.5.6.4.5.5.6.5.6.6.7.4.5.5.6.5.6.6.7.5.6.6.7.6.7.7.8};
    // The CPU reads 8 bytes at once. If 4 bytes span two 8 bytes, it needs to read twice
    // So consider 4-byte alignment, and only need to read once to finish reading
    while((unsigned long)p & 3 && count) {
        bits += bitsinbyte[*p++];
        count--;
    }

    // The SWAR algorithm can handle 28 bytes at a time
    // uint32_t: 4 bytes
    p4 = (uint32_t*)p;
    while(count>=28) {
        uint32_t aux1, aux2, aux3, aux4, aux5, aux6, aux7;

        aux1 = *p4++;// Read 4 bytes at once
        aux2 = *p4++;
        aux3 = *p4++;
        aux4 = *p4++;
        aux5 = *p4++;
        aux6 = *p4++;
        aux7 = *p4++;
        count -= 28;// 4*7=28 bytes, so count needs to be subtracted by 28

        aux1 = aux1 - ((aux1 >> 1) & 0x55555555);
        aux1 = (aux1 & 0x33333333) + ((aux1 >> 2) & 0x33333333);
        aux2 = aux2 - ((aux2 >> 1) & 0x55555555);
        aux2 = (aux2 & 0x33333333) + ((aux2 >> 2) & 0x33333333);
        aux3 = aux3 - ((aux3 >> 1) & 0x55555555);
        aux3 = (aux3 & 0x33333333) + ((aux3 >> 2) & 0x33333333);
        aux4 = aux4 - ((aux4 >> 1) & 0x55555555);
        aux4 = (aux4 & 0x33333333) + ((aux4 >> 2) & 0x33333333);
        aux5 = aux5 - ((aux5 >> 1) & 0x55555555);
        aux5 = (aux5 & 0x33333333) + ((aux5 >> 2) & 0x33333333);
        aux6 = aux6 - ((aux6 >> 1) & 0x55555555);
        aux6 = (aux6 & 0x33333333) + ((aux6 >> 2) & 0x33333333);
        aux7 = aux7 - ((aux7 >> 1) & 0x55555555);
        aux7 = (aux7 & 0x33333333) + ((aux7 >> 2) & 0x33333333);
        bits += ((((aux1 + (aux1 >> 4)) & 0x0F0F0F0F) +
                    ((aux2 + (aux2 >> 4)) & 0x0F0F0F0F) +
                    ((aux3 + (aux3 >> 4)) & 0x0F0F0F0F) +
                    ((aux4 + (aux4 >> 4)) & 0x0F0F0F0F) +
                    ((aux5 + (aux5 >> 4)) & 0x0F0F0F0F) +
                    ((aux6 + (aux6 >> 4)) & 0x0F0F0F0F) +
                    ((aux7 + (aux7 >> 4)) & 0x0F0F0F0F)) *0x01010101) > >24;
    }
    /* The number of bytes left is less than 28
    p = (unsigned char*)p4;
    while(count--) bits += bitsinbyte[*p++];
    return bits;
}
Copy the code

It is not difficult to find that Redis uses both table lookup method and SWAR algorithm to complete the BITCOUNT function.

4. Interview question: 4 billion QQ numbers to weight

It is said that this is Tencent three interview questions, answer out can take the OFFER 😄~

If there is no 1GB memory limit, we can do this algorithm using sort and Set:

  • Sorting: ① First, sort 4 billion QQ numbers; ② Traversal from small to large, skip repeated elements and only take the first element.
  • Set: Put 4 billion QQ numbers into Set Set, automatically complete the weight, Perfect🐂

This answer is to GG rhythm ah!

How long would it take to sort 4 billion QQ numbers? The array of 4 billion QQ numbers is already over 1GB, and the Set is also overloaded with memory.

BitMap to heavy

** We can use the BITMAP we just learned to get rid of it. ** One byte can record the existence of 8 numbers (similar to counting sort). If the value of offset corresponding to QQ number is set to 1, the number exists. After traversing 4 billion QQ numbers, the QQ number can be de-duplicated by directly counting the offset value of 1 on BITMAP.

If it is to 4 billion QQ number sorting can also be completed with a bitmap oh ~ the same idea 🐂

5. Bitmap combat

Now that we know so much about BITMAP, it doesn’t make sense not to do a hands-on project!

We use BITMAP to implement GITHUB’s small function of counting the number of daily submissions, based on SpringBoot+Echarts implementation 😄

If it is to record the login status, we can easily use 0 and 1 records, if it is to record the submission timesBITMAPIt doesn’t matter, we can use a byte to record the number of commits, but the business needs to handle the decimal and binary conversion directly.

Generate simulated data

public void genTestData(a) {
    if(redisUtils.isExist(CommonConstant.KEY)){
        return;
    }
    // Get the total number of days in the current year
    int days = getDays();
    for (int i = 0; i < days; i++) {
        int random = ThreadLocalRandom.current().nextInt(64);
        // Generate a random number to represent the number of PR times per day
        String binaryString = Integer.toBinaryString(random);
        if (binaryString.length() < 8) {
            / / fill 0
            if(binaryString.length() == 0){binaryString = "00000000"; }else if(binaryString.length() == 1){binaryString = "0000000"+binaryString; }else if(binaryString.length() == 2){binaryString = "000000"+binaryString; }else if(binaryString.length() == 3){binaryString = "00000"+binaryString; }else if(binaryString.length() == 4){binaryString = "0000"+binaryString; }else if(binaryString.length() == 5){binaryString = "000"+binaryString; }else if(binaryString.length() == 6){binaryString = "00"+binaryString; }else if(binaryString.length() == 7){binaryString = "0"+binaryString;}
        }
        char[] chars = binaryString.toCharArray();
        for (int j = 0; j < chars.length; j++) {
            / / set the BitMap
            redisUtils.setBit(CommonConstant.KEY,i*8+j,chars[j]); }}}/** * Gets the total number of days in the current year *@returnDays Total number of days */
private int getDays(a){
    Calendar calOne = Calendar.getInstance();
    int year = calOne.get(Calendar.YEAR);
    System.out.println(year);
    Calendar calTwo = new GregorianCalendar(year, 11.31);
    return calTwo.get(Calendar.DAY_OF_YEAR);
}
Copy the code

To get the data

public List<String> getPushData(a) {
    List<String> res = new ArrayList<>(366);
    // Create data without data
    genTestData();
    int days = getDays();
    for(long i=0; i<days; i++){ StringBuilder sb =new StringBuilder();
        for (int j = 0; j < 8; j++) {
            String bit = redisUtils.getBit(CommonConstant.KEY, i * 8 + j);
            sb.append(bit);
        }
        // Returns a binary string directly, with the front end converted to decimal
        res.add(sb.toString());
    }
    return res;
}
Copy the code

In this case, Obin thought he could simply return all the bits and split them up in the front end

The front-end rendering

<script type="text/javascript">
    var chartDom = document.getElementById('main');
    var myChart = echarts.init(chartDom);
    var option;

    function getVirtulData(year) {
        var date = +echarts.number.parseDate(year + '- 01-01');
        var end = +echarts.number.parseDate(+year + 1 + '- 01-01');
        var dayTime = 3600 * 24 * 1000;
        var data = [];
        $.ajax({
            "url":'http://localhost:8080/test/getPushData'."async":false.// Ajax synchronized fetch
            success:function (res){
                for (let time = date,k=0; time < end && k < res.data.length; time += dayTime,k++) {
                    data.push([
                        echarts.format.formatTime('yyyy-MM-dd', time),
                        parseInt(res.data[k],2)// Base conversion is done on the client, not on the server]); }}})return data;
    }
    option = {
        title: {
            top: 30.left: 'left'.text: 'BitMap Demo'
        },
        tooltip: {},
        visualMap: {
            min: 0.max: 32.type: 'piecewise'.orient: 'horizontal'.left: 'right'.top: 220.pieces: [{min: 0.max: 0.label:"less"},
                {min: 1.max: 10.label:""},
                {min: 1.max: 20.label:""},
                {min: 21.max: 40.label:""},
                {min: 41.max: 64.label:"more"},].inRange: {
                color: [ '#EAEDF0'.'#9AE9A8'.'#41C363'.'#31A14E'.'#206D38'].// Color Settings
                colorAlpha: 0.9./ / transparency}},calendar: {
            top: 120.left: 30.right: 30.cellSize: 13.range: '2022'.splitLine: { show: false },// Do not display edges
            itemStyle: {
                borderWidth: 0.5
            },
            yearLabel: { show: false}},series: {
            type: 'heatmap'.coordinateSystem: 'calendar'.data: getVirtulData('2022')}}; option && myChart.setOption(option); </script>Copy the code

Done, we have implemented a simple submission times display that is not as beautiful as GitHub, please forgive my aesthetic 😭

Source code repository: gitee.com/catwings/bi…

6. The discussion above

In the previous “interview killer card: SDS of Redis source code” in this article, the big guys discussed widely, some put forward new ideas, there are questions, Ao Bing are seriously thinking about 🤔, did find some bugs, my pot 🐱😓.

Here I will answer these questions by dao and qian, which is also a kind of perfection!

sdsHdrSize

The sdsHdrSize method should be used. The data structure is defined according to the type of the data structure

static inline char sdsReqType(size_t string_size) {
    if (string_size < 1<<5)
       // #define SDS_TYPE_5 0
        return SDS_TYPE_5;
    if (string_size < 1<<8)
        // #define SDS_TYPE_8 1
        return SDS_TYPE_8;
    if (string_size < 1<<16)
        // #define SDS_TYPE_16 2
        return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32)
        // #define SDS_TYPE_32 3
        return SDS_TYPE_32;
    // #define SDS_TYPE_64 4
    return SDS_TYPE_64;
#else
    return SDS_TYPE_32;
#endif
}
Copy the code

The source code for sdsHdrSize is as follows:

static inline int sdsHdrSize(char type) {
    // #define SDS_TYPE_MASK 7
    switch(type&SDS_TYPE_MASK) {
        // #define SDS_TYPE_5 0
        case SDS_TYPE_5:
            return sizeof(struct sdshdr5);
        // #define SDS_TYPE_8 1
        case SDS_TYPE_8:
            return sizeof(struct sdshdr8);
        // #define SDS_TYPE_16 2
        case SDS_TYPE_16:
            return sizeof(struct sdshdr16);
        // #define SDS_TYPE_32 3
        case SDS_TYPE_32:
            return sizeof(struct sdshdr32);
        // #define SDS_TYPE_64 4
        case SDS_TYPE_64:
            return sizeof(struct sdshdr64);
    }
    return 0;
}
Copy the code

Type&SDS _TYPE_MASK select the corresponding data structure by bit operation

The longest SDS is 512MB


This is a more deadly problem. Because in the article analysis wrong, think simple, I am also very fan brain how to smoke ~

According to the official website we can confirm that the maximum capacity of SDS is 512MB, which is indisputable.

T_string.c (checkStringLength); checkStringLength (checkStringLength)

/*----------------------------------------------------------------------------- * String Commands * -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - * /

static int checkStringLength(redisClient *c, long long size) {
    if (size > 512*1024*1024) {
        addReplyError(c,"string exceeds maximum allowed size (512MB)");
        return REDIS_ERR;
    }
    return REDIS_OK;
}
Copy the code

Using the append command as an example, it is not difficult to see if the length exceeds the limit if the type is SDS:

void appendCommand(redisClient *c) {
    size_t totlen;
    robj *o, *append;

    o = lookupKeyWrite(c->db,c->argv[1]);
    if (o == NULL) {
        /* Create the key */
        c->argv[2] = tryObjectEncoding(c->argv[2]);
        dbAdd(c->db,c->argv[1],c->argv[2]);
        incrRefCount(c->argv[2]);
        totlen = stringObjectLen(c->argv[2]);
    } else {
        /* Key exists, check type */
        if (checkType(c,o,REDIS_STRING))
            return;

        /* "append" is an argument, so always an sds */
        append = c->argv[2];
        totlen = stringObjectLen(o)+sdslen(append->ptr);
        // Check the length of the string
        if (checkStringLength(c,totlen) ! = REDIS_OK)return;

        /* Append the value */
        o = dbUnshareStringValue(c->db,c->argv[1],o);
        o->ptr = sdscatlen(o->ptr,append->ptr,sdslen(append->ptr));
        totlen = sdslen(o->ptr);
    }
    signalModifiedKey(c->db,c->argv[1]);
    notifyKeyspaceEvent(REDIS_NOTIFY_STRING,"append",c->argv[1],c->db->id);
    server.dirty++;
    addReplyLongLong(c,totlen);
}
Copy the code

Binary security

SDS only uses part of the C language string function, if the target function is really affected by \0, then SDS can not avoid, then we can not use. Compatibility does not mean use everything.