1. The demand

There is a requirement for a uniform number of users logged in over a period of time. This period of time may have logged in user A, B, C, B, C, A, D so many people, may be repeated login, we need to go to the processing and then return the number of users.

The above case is cardinality statistics: get the size of a set S after it is deduplicated.

One way to do this is to use a HashMap, where the key is the user name and the value is the number of times it was logged in. On 64-bit machines, Golang uses 24 bytes on a single record. When you have tens of thousands or even hundreds of millions of users, you need a lot of memory space.

In order to save memory space, there are other methods: $B^+$tree, Bitmap, etc. In Redis, Hyperloglog is used for rough statistics. 12K memory can count $2^{64}$data.

2. Bernoulli’s experiment

The Hyperloglog principle is related to Bernoulli’s experiment.

You flip a coin, you get heads and tails 50% of the time. So you keep fliping the coin until you get heads on the t-th flip, and that’s a Bernoulli experiment.

If we do n Bernoulli experiments, the first experiment, n=1, the number of flips is $t_1$; For the NTH experiment, the corresponding number of flips is $t_n$.

After doing so many experiments, there must be a maximum number of flips $t_{Max}$

1. No n Bernoulli throws are greater than $T_ {Max}$
2. In n Bernoulli experiments, there will be at least one toss equal to $T_ {Max}$

N and $T_ {Max}$have the following estimation relationship:

$$n=2^{t_{max}}$$

I won’t go into the derivation, the probability theory.

From the $t_{Max}$of multiple Bernoulli experiments, the number n of Bernoulli experiments conducted can be deduced. So if HyperLogLog’s ADD is regarded as a Bernoulli experiment, then by calculating the maximum number of Bernoulli throws $t_{Max}$for each Bernoulli experiment, we should be able to calculate the number of elements n counted by HyperLogLog. This is the basic principle of Hyperloglog.

How do I calculate the number of throws of $t_i$for the ith ADD of the HyperLogLog? The hash value of each ADD element is a series of 0 and 1 combinations of bytecode, so you can calculate $t_i$by counting the position of the first 1 starting at a certain position and direction. For example, 0B1010 1000, computed from the lowest to the highest bit, yields $t=4$.

3. HyperLogLog

HyperLoglog is based on algorithms such as LogLogCounting. It uses a nearly uniform hash function to obtain the hash value of the elements to be counted, and then eliminates errors by buckets averaging. HLL(which later stands for Hyperloglog) divides the hash value into buckets one by one, and uses the first k bits of the hash value to find its bucket position. The number of buckets is expressed as:

$$m=2^k$$

As shown in the figure below, LSB represents the lowest bit, MSB represents the highest bit, and this hash value represents the big-ended byte order. So k is equal to 6, so we have 64 buckets. The hash value below shows the bucket position of 0B00 1101=13.

Next, calculate the position of the first 1 in the sequence after L-K in the hash value above: 6. So follow through on the bucket with index number 13 and set it to 6 if the number in the bucket is less than 6, otherwise it stays the same.

The estimated cardinality value can be calculated by averaging the values stored in each bucket.

Harmonic mean is used for calculation in HLL:

$$H=\frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+… +\frac{1}{x_n}}=\frac{n}{\sum_{i=1}^n\frac{1}{x_i}}$$

Its cardinality estimation formula is as follows:

$$\hat{n}=\frac{\alpha_mm^2}{\sum_{i=0}^m2^{-M[i]}}$$

Where, M[I] represents the value in the ith bucket and represents the maximum position corresponding to the first 1 of the hash value.

There are:

$$\alpha_m=(m\int_0^\infty(log_2(\frac{2+u}{1+u}))^mdu)^{-1}$$

The implementation steps of HLL can be understood through this HLL simulation website, to establish a general impression for the subsequent content.

http://content.research.neust…

As you can see, HLLs use very little memory to implement very large cardinality statistics, and the corresponding implementation is necessarily complex. The next section will examine its implementation in Redis.

4. Redis HLL implementation

4.1. HLL is introduced

Redis HLL code:

https://github.com/crazyStrom…

Redis uses register to represent the hash location bucket, where the contents are the largest position of the first 1 in the L-k part of the hash value of the statistic. The following is an introduction to HLLs in Redis code.

Redis uses the 64-bit hash function proposed in [1] to raise the upper limit of cardinality statistics above $10^9$by adding 1bit per register or bucket.

[1] Heule, Nunkesser, Hall: HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.*

Redis uses its SDS to store HLLs and does not create new data types.

Redis does not apply additional compression to data structures in [1]. Redis uses the algorithm of Hyperloglog in [2], the only difference being that Redis uses the 64-bit hash function.

[2] P. Flajolet, Eric Fusy, O. Ganouet, and F. Meunier. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm.*

Redis uses two different forms of data storage in HLLs:

1. Dense form. Each entry (the contents of each register) of the HLL is represented by a 6-bit number.
2. Point form. When many registers in the HLL are 0, compress these registers to improve memory utilization.

Redis uses HLLHDR to hold an HLL:

struct hllhdr { char magic[4]; /* "HYLL" */ / Encoding: sparse or dense uint8_t encoding; /* HLL_DENSE or HLL_SPARSE. */ uint8_t notused[3]; /* Reserved for future use, must be zero. */ / Cache cardinality statistics uint8_t card[8]; /* Cached cardinality, little endian. */ // The Uint8_T registers[] should be assigned to the Uint8_T registers. /* Data bytes. */ };

Both dense and sparse forms use 16-byte headers.

• The first four bytes are HYLL, fixed.
• Encoding takes one byte and represents HLL encoding: HLL_DENSE and HLL_SPARSE.
• NOTUSED takes up three bytes and is used as placeholder.
• Card is an eight-byte, 64-bit integer stored in small-endian order that holds the most recent cardinality calculation. If the data structure has not been modified since the last cardinality calculation, the contents of the card can be reused. Redis uses the card’s highest bit to indicate whether the data is available. If the highest bit is 1, it indicates that the data has been modified, and the cardinality needs to be recalculated and cached in the card.

Registers indicate that the HLL is holding data in dense or sparse form.

4.1.2. Dense form

The Dense form of the HLL uses registers that are all 6bit and contiguous. A byte is 8 bits, so a byte holds some or all of the bits in both registers.

The DENSE form of registers encodes from LSB to MSB, that is, from the lowest bit to the highest bit. If the current byte is not enough to store the remaining bits of the register, the next byte is used as needed.

In the figure above, there are three bytes from left to right, containing four registers 0-3. The last six bits of the 0th byte store the contents of register 0, the first two bits of the 0th byte and the last four bits of the 1st byte store the contents of register 1, and so on. This is the storage form of dense.

4.1.3. Point form

HLL implements sparse encoding of its data structure using three opcodes. The three opcodes are ZERO, XZERO, and VAL. Two opcodes use one byte each, and the other two use two bytes.

These three opcodes are described below.

• The ZERO opcode occupies one byte and is denoted as 00XXXXXX. The last six bits XXXXXX +1 indicate that N contiguous registers are set to 0. This opcode can denote that 1-64 contiguous registers are set to 0.
• The XZero opcode takes two bytes and is represented as 01XXXXXX YYYYYYYYY. XXXXXX is high, yyyyyyy is low. The fourteen +1 bits indicate that N contiguous registers are set to 0. This opcode can indicate that 0-16384 registers are set to 0.
• The VAL opcode takes one byte and is represented as 1vvvvvxx. It contains a 5bit VVVVVVV for register values, and a 2bit xx+1 for so many contiguous registers set to VVVVVV. This opcode representation can indicate that 1-4 registers are set to 1-32 values.

SPARSE cannot represent registers with a register value greater than 32, but register values greater than 32 are rare in HLLs. When this is not the case, sparse is more memory efficient than dense, and HLL converts from sparse to dense if a register value exceeds 32.

Sparse is used to represent register content for a location, and it’s positional. For example, an empty HLL is represented as 01111111 11111111, which means that there are 16,384 registers set to 0, denoted as XZERO:16384.

For example, if an HLL only has three register values that are not zero (1000,1020,1021), and the values are 2, 3, and 3, the HLL’s sparse representation is:

Register 0-999 is set to 0

VAL:2,1 1 registers set to 2, that is register 1000

Zero: Register 191001-1019 is 0

VAL:3,2 registers are set to 3, which are registers 1020 and 1021 respectively

XZero :15362 is set to 0 from registers 1022-16383

When the cardinality is relatively small, HLL has a high utilization rate. The example above used seven bytes to represent all the HLL registers, and the dense used 12K of memory.

However, sparse is efficient at base calculations of 2000-3000, but the conversion to dense is more efficient when the base is larger. Define server.hll_sparse_max_bytes to set the maximum length of sparse in dense.

4.2. Redis HLL implementation code

HLLHDR definition:

struct hllhdr { char magic[4]; /* "HYLL" */ / Encoding: sparse or dense uint8_t encoding; /* HLL_DENSE or HLL_SPARSE. */ uint8_t notused[3]; /* Reserved for future use, must be zero. */ / Cache cardinality statistics uint8_t card[8]; /* Cached cardinality, little endian. */ / The Uint8_T registers[]; /* Data bytes. */ };

4.2.1. Related definitions and macros

The reason Redis works so fast is that it uses a lot of macro definitions.

As mentioned earlier, Redis uses Card as a cache to improve HLL efficiency. The availability of card can be determined by its highest bit. When the highest bit is set to 1, it indicates that the HLL has been modified, and the cache of card is not available, so recalculation is needed.

/* The cached cardinality MSB is used to signal validity of The cached value. */ / The eighth bit of The card set to 1 indicates that The cache is not available for #define HLL_INVALIDATE_CACHE(hdr) (hdr)->card[7] |= (1<<7) #define HLL_VALID_CACHE(hdr) (((hdr)->card[7] & (1<<7)) == 0)

The rest are common definitions and macros.

#define HLL_P 14 /* The greater is P, The smaller The error. */ / HLL uses 16384 registers: The HLL_Registers (1<<HLL_P) /* With P=14, 16384. */ / The HLL_Registers (1<<HLL_P) can be used to determine the location of the HLL_Registers (1<<HLL_P). 0x0011 1111 1111 1111 #define HLL_P_MASK (HLL_REGISTERS-1) /* Mask to index register. */ #define HLL_BITS 6 /* Enough to  count up to 63 leading zeroes. */ #define HLL_REGISTER_MAX ((1<<HLL_BITS)-1) #define HLL_HDR_SIZE sizeof(struct hllhdr)  #define HLL_DENSE_SIZE (HLL_HDR_SIZE+((HLL_REGISTERS*HLL_BITS+7)/8)) #define HLL_DENSE 0 /* Dense encoding. */ #define HLL_SPARSE 1 /* Sparse encoding. */ #define HLL_RAW 255 /* Only used internally, never exposed. */ #define HLL_MAX_ENCODING 1

4.2.2. Macros for bitwise operations

Redis uses a series of macros to simplify the relevant bit operations in the dense and sparse forms of the HLL.

4.2.2.1. Dense operation

The HLL uses 8bit byte array registers to store the dense 6bit register group. So you need to split the 8bit array into a 6bit array and evaluate and set the values. Redis uses macros to keep things running fast.

The image above shows the big-endian order, with the highest bit (MSB) on the left. Redis is traversed from lowest to highest.

(1) Take the register value

For example, get the register value at position pos=1 with the register number starting at 0.

The first byte to hold part of register 1 is B0:1100 0000.

b0 = 6 * pos / 8

The first part of register 1 is computed as follows:

fb = 6 * pos % 8 -> 6

Move b0 to Fb: 1100 0000 >> Fb = 0000 0011

Move B1 to the left by 8-fb: 2222 1111 << (8-fb) = 2211 1100

The two bytes for the OR operation: 1100 0000 0011 | 2211 = 2211, 1111

The result is then AND 0011 1111 to eliminate the high two bits: 2211 1111&0011 1111 = 0011 1111, which is the value of register 1.

In another example, get the contents of register 0. In this case, all six bits of the register are in a single byte B0:1100000.

b0 = 6 * pos / 8 = 0

The first position of register 0:

fb = 6 * pos % 8 = 0

So move the current byte B0 to the right by 0 bits: 1100 0000 >> 0 = 1100 0000

Next byte B1 moves 8 bits left: 2222 1111 << 8 = 0000 0000

After moving the two bytes of the OR operation: 0000 1100 0000 | 0000 = 1100, 0000

The result AND 0011 1111 are then AND removed by 2 bits: 1100 0000 & 0011 1111 = 0000 0000

So the contents of register 0 are 00, 0000

(2) Set the register value

Setting the value of the register is more complicated, assuming val=0b 00cd efgh is the new value that needs to be set. Two steps are required. The first step clears the relevant bits of the register, and the second step sets the new bits through or operations.

For example, set the value of register 1, whose first byte is B0.

In this case, fB is equal to 6.

To generate an AND mask to clear the associated bits of B0, create an initial mask of 0B0011 1111 with a value of 63. Move fb bit to the left, then take inverse code:! (0011 1111 << fb) = 0011 1111.

Let the new mask AND B0 perform the AND operation to clear the relevant bits of register 1:0011 1111&1100 0000 = 0000 0000

B0 = (val << fb) OR B0 = (00cd efgh << 6) OR 0000 0000 = GH00 0000

The next step is to set the relevant bits for B1. The initial value of b1 is 2222 1111.

Use 63 to build the AND mask, move 8-fb bits to the right, AND then flip:! (0011 1111 >> (8-FB)) = 1111 0000.

And the new mask AND B1 to clear the relevant bits: B1 = 1111 0000 & 2222 1111 = 2222 0000

Then 00 CD efgh val = 0 b 8 – fb moves to the right place, and then the OR operation to set the relevant bits b1: b1 = (00 CD efgh > > (8 – fb)) | b1 = 2222 cdef

(3) Redis implementation code

Redis implementation code is very simple, a little C basic can understand.

/* Store the value of the register at position 'regnum' into variable 'target'. * 'p' is an array of unsigned bytes. */ #define HLL_DENSE_GET_REGISTER(target,p,regnum) do { \ uint8_t *_p = (uint8_t*) p; \ // unsigned long = regnum*HLL_BITS/8; // unsigned long = regnum*HLL_BITS/8; \ // unsigned long _fb = regnum*HLL_BITS&7; \ unsigned long _fb8 = 8 - _fb; \ unsigned long b0 = _p[_byte]; \ unsigned long b1 = _p[_byte+1]; \ high fb / / b0, b1 of low fb8 target = ((b0 > > _fb) | (b1 < < _fb8) & HLL_REGISTER_MAX; \ } while(0) /* Set the value of the register at position 'regnum' to 'val'. * 'p' is an array of unsigned bytes. */ #define HLL_DENSE_SET_REGISTER(p,regnum,val) do { \ uint8_t *_p = (uint8_t*) p; \ unsigned long _byte = regnum*HLL_BITS/8; \ unsigned long _fb = regnum*HLL_BITS&7; \ unsigned long _fb8 = 8 - _fb; \ unsigned long _v = val; \ _p[_byte] &= ~(HLL_REGISTER_MAX << _fb); \ _p[_byte] |= _v << _fb; \ _p[_byte+1] &= ~(HLL_REGISTER_MAX >> _fb8); \ _p[_byte+1] |= _v >> _fb8; \ } while(0)

4.2.2.2. Point operation

Sparse format does not need to look for corresponding bits, but it does need to distinguish the opcode type, the register range covered by the opcode, and the corresponding opcode Settings.

Zero is denoted as 00XX XXX, XZero is denoted as 01XX XXXX XXXX, VAL is denoted as 1VVV VVXX. Therefore, the first two digits of each byte can determine the corresponding opcode.

#define HLL_SPARSE_XZERO_BIT 0x40 /* 01xxxxxx */
#define HLL_SPARSE_VAL_BIT 0x80 /* 1vvvvvxx */
#define HLL_SPARSE_IS_ZERO(p) (((*(p)) & 0xc0) == 0) /* 00xxxxxx */
#define HLL_SPARSE_IS_XZERO(p) (((*(p)) & 0xc0) == HLL_SPARSE_XZERO_BIT)
#define HLL_SPARSE_IS_VAL(p) ((*(p)) & HLL_SPARSE_VAL_BIT)

The last six bits of ZERO denote the length of the register, the last 14 bits of XZERO denote the length of the continuous register, the last two bits of VAL denote the length of the register, and VVVVVV denote the value of the register. Relevant information of the three opcodes can be obtained through these features.

#define HLL_SPARSE_ZERO_LEN(p) (((*(p)) & 0x3f)+1)
#define HLL_SPARSE_XZERO_LEN(p) (((((*(p)) & 0x3f) << 8) | (*((p)+1)))+1)
#define HLL_SPARSE_VAL_VALUE(p) ((((*(p)) >> 2) & 0x1f)+1)
#define HLL_SPARSE_VAL_LEN(p) (((*(p)) & 0x3)+1)

The last two bits of VAL opcode are the register length, and the middle five bits VVVVVV are the register value. Therefore, the code for setting VAL and length len of VAL opcode is as follows:

#define HLL_SPARSE_VAL_SET(p,val,len) do { \
*(p) = (((val)-1)<<2|((len)-1))|HLL_SPARSE_VAL_BIT; \
} while(0)

The rest of setting the length of ZERO and XZERO is also based on this principle:

#define HLL_SPARSE_ZERO_SET(p,len) do { \
*(p) = (len)-1; \
} while(0)
#define HLL_SPARSE_XZERO_SET(p,len) do { \
int _l = (len)-1; \
*(p) = (_l>>8) | HLL_SPARSE_XZERO_BIT; \
*((p)+1) = (_l&0xff); \
} while(0)


4.2.3. Correlation function

HLL uses a general hash algorithm, which can be used on different computer architectures. It will not be described here, but the implementation algorithm of HLL will be focused on in this paper.

HLL can only be counted by adding data. Redis implements two types of ADD, which are respectively applied to dense and sparse.

(1) hllPatLen

Both forms require a function to count the position of the first 1 in the hash value of the data.

int hllPatLen(unsigned char *ele, size_t elesize, long *regp) 

When a client appends a string to an HLL, this function returns the longest 0 fix of part of its hash substring, where the first 1 occurs, and then incrementing it.

The steps are as follows:

1. Calculates the 64-bit hash value of the string element with the added HLL
2. The hash value AND 0b0011 1111 are used to obtain the corresponding bucket index. The figure above corresponds to bucket number 13
3. Then count the longest 0 infix from the 6th position: 5, and add it by one to find the position of the first 1

/* Call hllDenseAdd() or hllSparseAdd() according to the HLL encoding. */
int hllAdd(robj *o, unsigned char *ele, size_t elesize)

All data to be added to the HLL is done through the HLLADD function.

The encoding of the HLLHDR referred to by o is different, calling HLLDenSeAdd or HLLSparseAdd, respectively.

    switch(hdr->encoding) {
default: return -1; /* Invalid representation. */

int hllDenseAdd(uint8_t *registers, unsigned char *ele, size_t elesize)

This function adds the element ele to the HLL’s dense form register.

The parameter registers are stored as an SDS and therefore contain 16,384 6bit registers plus the NULL terminator of the SDS.

The logic of this function is simple.

1. Hllpatlen is used to get the bucket index and the longest 0 fix count for ele’s hash value
2. Gets the value oldcount of the index position register
3. If oldcount < count, set the position register to count, and return 1; Otherwise it returns 0

int hllSparseAdd(robj *o, unsigned char *ele, size_t elesize)

This function adds the element ele to the sparse data structure of the HLL.

We don’t actually add ele to it either, but we add the longest 0 infix computed to the HLL.

The o argument is actually a string that holds an HLL. This function has the advantage of using such an object, which allows it to extend the string as needed and add the required bytes after it.

In a function, an HLL might convert from sparse to dense format for storing data. There are two reasons: register values that need to be set are out of the SPARSE support range; The size of the sparse data is larger than the size of the server.hll_sparse_max_bytes.

First, calculate the index of the new element and count for the longest 0 fix.

If count > 32 means sparse is unavailable, convert to dense, and call hllDenseAdd and return.

If count <= 32, perform subsequent calculations.

While adding data, it is possible to have xzeros split into xzero-val-xzeros, which is also the most complicated case, so use sdsMakeRoomFor to add enough space.

1. Step 1: Locate the opcodes that need to be modified

XZERO takes two bytes, ZERO and VAL take one byte; The length stored in these opcodes is compared to the index to get the opcode position that needs to be updated: *p.

1. Step 2: Update the opcode in another block of memory

After the calculation in step one, the variable first refers to the first register index position in the current opcode that p points to.

The variables next and prev hold the following and preceding opcodes, respectively. If next is null, p points to the last opcode; If prev is null, p points to the first opcode.

The span variable is the number of registers currently covered by the opcode.

Update opcodes have different cases:

A) If the current opcode is VAL and its value is >= count, there is no need to update the current opcode. The function returns 0 to indicate that there is no data update

B) If the current opcode is VAL and the length is 1, which means the current opcode covers only one register, then update the opcode to count

C) If the current opcode is ZERO and the length is 1, change the opcode to VAL and update the value to count with the length 1

D) The rest is the more general situation. For example, the current opcode is VAL of length greater than 1, ZERO of length greater than 1, or an XZERO. In this case, the original opcode needs to be split into multiple opcodes. The most complicated is that XZERO splits into xzero-val-xzero, which will require 5 bytes of storage.

Redis writes the new opcode sequence to an extra buffer, new, of length newLen. And then the new sequence replaces the old sequence. The new sequence may be longer than the old one, so the bytes to the right of the original opcode may have to be moved back when inserting the new sequence.

1. Step 3: Replace the old opcode sequence with the new sequence. See the code for details.
2. Step 4: Merge successive opcodes if they have the same value.

The code for HllSparseAdd is too long to post on this blog, see the code I submitted to GitHub at the beginning of this article.

4.2.3.2. count

(1) hllCount

uint64_t hllCount(struct hllhdr *hdr, int *invalid)

This function returns the approximate cardinality statistics for the HLL, based on the harmonic mean of all registers. If the HLL object is void, invalid is set to a nonzero value.

HLLCOUNT supports a special internal encoding: HLL_RAW. HLL_RAW uses the 8bit register instead of the 6bit register in the HLL_DENSE, which speeds up the PFCOUNT calculation.

In the function, in order to speed up the calculation, Redis pre-builds a table PE with the length of 64, $$PE[I] = \ FRAC {1}{2^ I}$$. Then you can look up the table and do the calculation.

static int initialized = 0; static double PE[64]; // Use initialized to initialize PE if (! initialized) { PE[0] = 1; /* 2^(-reg[j]) is 1 when m is 0. */ for (j = 1; j < 64; J++) {/ * 2 ^ (- reg [j]) is the same as 1/2 ^ reg [j]. J * / PE ull [j] = 1.0 / (1 < < j); } /** * PE[1] = 1/2 * PE[2] = 1/4 * PE[3] = 1/8 * ... */ initialized = 1; }

Next, Hlldensesum, Hllsparsesum or Hllrawsum is selected according to the encoding form of HLL to calculate $SUM(2^{-register[0.. I]})$, denote as E.

/* Compute SUM(2^-register[0..i]). */
if (hdr->encoding == HLL_DENSE) {
E = hllDenseSum(hdr->registers,PE,&ez);
} else if (hdr->encoding == HLL_SPARSE) {
E = hllSparseSum(hdr->registers,
sdslen((sds)hdr)-HLL_HDR_SIZE,PE,&ez,invalid);
} else if (hdr->encoding == HLL_RAW) {
E = hllRawSum(hdr->registers,PE,&ez);
} else {
redisPanic("Unknown HyperLogLog encoding in hllCount()");
}

Different algorithms are adopted according to the range of E: when the cardinality statistics are less than a quarter of the HLL bucket, the LINEARCOUNTING algorithm is used; When the cardinality statistics are larger, a bias variable is used to adjust the calculation error.

If (E < M * 2.5&&EZ! = 0) { E = m*log(m/ez); /* LINEARCOUNTING() */ } else if (m == 16384 && E < 72000) { /* We did polynomial regression of the bias for this range,  this * way we can compute the bias for a given cardinality and correct * according to it. Only apply the correction for P=14 that's what * we use and the value the correction was verified with. */ double bias = 5.9119* 1.0E-18 *(E*E*E*E) 1.4253 * 1.0 e-12 * * * E E (E) + 1.2940 * 1.0 e-7 * * E (E) - 5.2921 * 1.0 e-3 * E + 83.3216; E -= E*(bias/100); }

(2) hllDenseSum

double hllDenseSum(uint8_t *registers, double *PE, int *ezp)

This function calculates the SUM(2^{-register[0.. I]})\$of each register in the dense form.

The simplest is to get the value of each register in 16,384 loops through the previous macro definition and calculate the sum from the PE table passed in. In Redis, 16 registers are computed at a time, and 1024 cycles are performed to speed up the computation. During calculation, the elements in the PE table are double values. Redis maps each register value in the PE table in two phases to eliminate errors.

/* Additional parens will allow the compiler to optimize the * code more with a loss of precision that is not very relevant * here (floating point math is not commutative!) . * / / / two additive reduced errors E + = (PE (r0) + PE (r1)) + (PE/r2 + PE/r3) + (PE (r4) + PE (r5)) + (PE/r6 + PE/r7) + (PE [r8] + [r9] PE) + (PE[r10] + PE[r11]) + (PE[r12] + PE[r13]) + (PE[r14] + PE[r15]);

(3) hllSparseSum

double hllSparseSum(uint8_t *sparse, int sparselen, double *PE, int *ezp, int *invalid) 

The function recognizes each byte of sparse, determines which opcode it is, and returns the sum by looking at the PE table.

(4) hllRawSum

double hllRawSum(uint8_t *registers, double *PE, int *ezp) 

This function has the same idea as the previous one. It uses 8bit registers to store the contents and evaluates the contents of 8 registers in a loop.

4.2.3.3. sparse to dense

int hllSparseToDense(robj *o) 

This function is used to convert sparse to dense if the maximum length of a new element is 0 bytes > 32 or if the current sparse form is longer than server.hll_sparse_max_bytes.

In HLL, Redis uses SDS to store sparse and dense data.

The implementation logic for this function is also relatively simple, assuming you’ve understood the dense, sparse, and opcodes mentioned earlier in this article.

1. Use sdsnewlen to allocate HLL_DENSE_SIZE memory to dense, initialize index to 0
2. }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}

• If it is ZERO and XZERO, set the registers covered by this opcode to ZERO. In fact, it is not set. Just skip these registers and increase the corresponding value of index.
• If it is VAL, take out the value and length of the opcode, set the Dense register covered by it to this value, and increase the index.
3. Free memory for sparse data.

5. To summarize

At first glance, you may feel that Hyperloglog is very difficult, but also to learn probability, and what Bernoulli process, but it is a simple formula to solve, patience to see it is still very fruitful, the next step can use Golang to achieve a own Hyperloglog according to these contents to deepen understanding.