

1. Redis skip list

1.1. Introduction to skip lists

A skip list (skiplist) is a randomized data structure. It was proposed by William Pugh in his paper Skip Lists: A Probabilistic Alternative to Balanced Trees. It is a hierarchical linked-list structure whose performance is comparable to that of balanced trees: search, insertion, deletion and other operations can all be completed in logarithmic expected time.

Among the five basic structures of Redis there is an ordered set called Zset, which is similar to a combination of SortedSet and HashMap in Java. On the one hand it is a set, guaranteeing the uniqueness of its members; on the other hand, each member carries a sorting weight, the score, which keeps the members ordered. Internally, Zset relies on a data structure called a “skip list.”

Why skip lists?

First, because zset supports random insertions and deletions, it should not be implemented with an array. For the sorting requirement, a red-black tree or another balanced tree easily comes to mind, so why not use such a structure?

  1. Performance considerations: under high concurrency, a tree may need rebalancing operations that can involve large parts of the tree, whereas changes to a skip list are relatively local (more on that below).
  2. Implementation considerations: skip lists are simpler and more intuitive to implement while offering the same complexity as red-black trees.

Based on these considerations, Redis adopted the skip list structure, with some improvements over William Pugh’s original paper.

1.2. Skip list principle

Let’s start with a common linked list structure:

We want this list to stay sorted by score, which means that when a new element is added we must locate the insertion point so the list remains ordered. For an ordered array we would naturally use binary search, but binary search does not work on a linked list, because a linked list cannot be indexed by position. There seems to be no better way than to walk through the whole list until we find the first node larger than the given value (time complexity O(n)). But suppose that for every two adjacent nodes we add an extra pointer that skips to the node after next, as shown below:

All the new pointers are then joined into a new linked list, which contains only half as many nodes (3 and 11 in the figure). Now when we look for a value, we first search along the new list; once we encounter a node larger than the value we are looking for, we drop back to the original list and continue searching there. For example, to find 7, the search path follows the direction of the red pointers in the figure below:

This is a slightly contrived example, but we can still see that with the new pointers we no longer need to compare against every node in the list one by one; the number of nodes to compare is roughly halved. In the same way, we can add yet another pointer for every two adjacent nodes of the newly generated list, producing a third level:

In this new three-level structure, to find 13 we first compare against 11 along the topmost list; since 11 is smaller than 13, we know we only need to continue after 11 and can skip all the nodes before it. As you can imagine, when the list is long enough, such a multi-layer structure lets us skip many of the lower-level nodes, greatly speeding up the search.

Going a step further: skip lists

The skip list was inspired by this multi-level linked-list structure. In the list built as above, each level contains half as many nodes as the level below it, so a search proceeds very much like a binary search and the search time complexity drops to O(log n).

However, this scheme has a serious problem with insertion. When a new node is inserted, the strict 2:1 ratio between the node counts of adjacent levels is broken. To restore it, all nodes after the newly inserted node (including the new node itself) would have to be re-adjusted, which degrades the time complexity back to O(n). Deletion has the same problem.

A skip list avoids this problem by not requiring any strict ratio between the node counts of adjacent levels. Instead, it assigns each node a random level; for example, a node whose random level is 3 is linked into the lists of levels 1 through 3. For clarity, the following figure shows how a skip list is built through step-by-step insertions:

From this creation and insertion process we can see that the level of each node is chosen randomly, and inserting a new node does not affect the level of any other node. Therefore an insertion only needs to adjust the pointers immediately before and after the new node, without re-adjusting many other nodes, which keeps the cost of insertion low.
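To make the random level concrete, here is a minimal sketch of how a level can be chosen (illustrative Java, not Redis's actual C implementation; the promotion probability of 0.25 and the 32-level cap below are the values commonly cited for Redis, used here as assumptions):

import java.util.concurrent.ThreadLocalRandom;

public class SkipListLevel {
    static final double P = 0.25;        // probability of promoting a node one more level
    static final int MAX_LEVEL = 32;     // upper bound on the number of levels

    // Each new node starts at level 1 and is promoted while a coin flip succeeds.
    static int randomLevel() {
        int level = 1;
        while (level < MAX_LEVEL && ThreadLocalRandom.current().nextDouble() < P) {
            level++;
        }
        return level;
    }
}

With P = 0.25 roughly a quarter of the nodes reach level 2, a sixteenth reach level 3, and so on, which is what keeps the higher levels sparse.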

Now suppose we look for the nonexistent number 23 from the structure we just created. The search path would look like this:
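To complement the figure, the search can be sketched as the top-down, left-to-right walk described above: start at the highest level, move right while the next node is still smaller than the target, then drop one level and repeat. The Node class below is a hypothetical simplification, not Redis's zskiplistNode:

public class SkipListSearch {
    static class Node {
        double score;
        Node[] forward;   // forward[i] = the next node at level i (0-based)
        Node(double score, int level) {
            this.score = score;
            this.forward = new Node[level];
        }
    }

    // Walk from the top level down: move right while the next node is smaller,
    // then drop a level; at level 0 the next node is the candidate match.
    static boolean contains(Node head, int maxLevel, double target) {
        Node x = head;
        for (int i = maxLevel - 1; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].score < target) {
                x = x.forward[i];
            }
        }
        Node candidate = x.forward[0];
        return candidate != null && candidate.score == target;
    }
}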

2. Redis distributed lock

2.1. Introduction to Distributed Locks

A lock is a tool designed to solve the problems of incorrect access and data inconsistency that arise when multiple threads of execution operate on a shared resource; in essence, it allows only one user to operate at a time.

In general, we use distributed locks in two main scenarios:

  1. Avoid repeating the same work on different nodes. For example, after a user performs an operation, different nodes might each send a notification email, resulting in duplicates.
  2. Avoid damaging data correctness: If two nodes operate on the same data, data errors or inconsistency may occur.

Any tool that satisfies this requirement (something other applications can use to lock) will do for us:

  1. MySQL: its built-in pessimistic locking via the FOR UPDATE keyword, or pessimistic/optimistic locking implemented at the application level;
  2. Zookeeper ordered nodes: Zookeeper lets you create ephemeral sequential child nodes, so a client can fetch the list of child nodes and decide from the sequence numbers whether it currently holds the lock;
  3. Redis single-threaded execution: because Redis executes commands serially on a single thread and provides commands such as SETNX (SET if Not eXists), which are naturally mutually exclusive;

2.2. Redis distributed lock

A distributed lock is essentially about “occupying a spot,” and Redis’s SETNX (SET if Not eXists) command is exactly such an operation: only one client can take possession at a time.

1. The lock timeout

Suppose we have two parallel services A and B, and A suddenly crashes after acquiring the lock; then B can never acquire the lock:

Therefore, we need to set an additional timeout period to ensure the availability of the service.
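A minimal sketch of acquiring such a lock with a timeout, assuming a Jedis client (the key name, value and 30-second timeout are illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class LockWithTimeout {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // SET lock:order <value> NX PX 30000: take the lock only if it is free,
            // and let Redis expire it after 30 seconds so a crashed holder cannot
            // block everyone else forever.
            String result = jedis.set("lock:order", "some-unique-value",
                    SetParams.setParams().nx().px(30_000));
            if ("OK".equals(result)) {
                try {
                    // ... critical section ...
                } finally {
                    jedis.del("lock:order");   // naive release; a safer version follows below
                }
            }
        }
    }
}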

But then another problem arises: what if the logic between acquiring and releasing the lock takes so long that it exceeds the lock's timeout? Because the first thread's lock expires before its critical section has finished, a second thread takes possession of the lock ahead of time, and the critical-section code is no longer executed strictly serially.

To avoid this problem, Redis distributed locks should not be used for long-running tasks. If something does occasionally go wrong, the resulting data glitches may require human intervention.

A slightly safer approach is to set the lock's value to a random number, and to check that the value still matches before deleting the key. This ensures that a lock held by the current thread is not released by another thread, unless the server has already released it automatically because it expired.

However, matching the value and deleting the key are not a single atomic operation in Redis, and there is no built-in command that combines them, so a Lua script is needed, because Lua scripts can guarantee the atomicity of multiple commands.
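A minimal sketch of this check-then-delete release done atomically through a Lua script, again assuming Jedis (the same script appears in full in the Redlock section below); token is the random value written when the lock was acquired, for example a UUID:

import java.util.Collections;
import redis.clients.jedis.Jedis;

public class SafeRelease {
    // Delete the key only if it still holds the random value we wrote when locking.
    // GET + DEL run inside one Lua script, so they execute atomically on the server.
    private static final String RELEASE_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "  return redis.call('del', KEYS[1]) "
          + "else return 0 end";

    static void release(Jedis jedis, String lockKey, String token) {
        jedis.eval(RELEASE_SCRIPT,
                Collections.singletonList(lockKey),
                Collections.singletonList(token));
    }
}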

2. Single point/multipoint problems

If Redis is deployed in standalone mode, this means that when Redis fails, the entire service becomes unavailable.

For a master-slave deployment, imagine this scenario: service A acquires the lock, then the Redis master goes down before the lock is replicated, so when service B requests the lock it obtains it from the slave. To solve this problem, the Redis author proposed the RedLock algorithm (client libraries such as Redisson provide implementations):

// Three separate Redis instances, each with its own Redisson client
RLock lock1 = redissonInstance1.getLock("lock1");
RLock lock2 = redissonInstance2.getLock("lock2");
RLock lock3 = redissonInstance3.getLock("lock3");

RedissonRedLock lock = new RedissonRedLock(lock1, lock2, lock3);
lock.lock();
// do something...
lock.unlock();

2.3. Redlock Distributed lock

The Redis author proposed an authoritative Redis-based distributed locking algorithm named Redlock, which is more secure than the single-node approach above. It guarantees the following properties:

  1. Security (mutual exclusion): at any moment only one client can hold the lock
  2. Deadlock avoidance: The client can eventually acquire the lock and no deadlock occurs, even if the client that originally locked the resource crashes or a network partition occurs
  3. Fault tolerance: Service can be provided as long as most Redis nodes are alive

Single-node implementation:

SET resource_name my_random_value NX PX 30000

The value my_random_value must be unique across all clients and all lock requests for as long as they exist. The logic for releasing the lock is as follows:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

The above implementation avoids releasing a lock created by another client. If only the DEL command were used, the following could happen: client1 acquires lock1 and then blocks for a long time on some operation; meanwhile lock1 expires on the Redis side and is granted to client2; when client1 finally tries to release the lock, it would delete client2's lock. Giving each client a unique string value avoids this problem. As for how to generate this unique string, there are many ways to do it.

Redlock algorithm:

The algorithm itself is easy to understand: use at least 5 master nodes, deployed in different machine rooms to ensure availability. To acquire the lock, the client does the following (a minimal sketch follows the list):

  1. Retrieves the current time in microseconds.
  2. Try to acquire the lock on the 5 instances sequentially, using the same key and random value on each. The client should use a reasonably short timeout when communicating with each master, to avoid wasting time on a node that has been down for a long time.
  3. When the client successfully obtains the lock on at least 3 masters, it computes how long acquiring the lock took: the current time minus the timestamp recorded in step 1. If the lock's validity time is longer than the elapsed time, the lock has actually been acquired.
  4. If the lock was acquired, its real validity time is the original lock validity time minus the time elapsed while acquiring it.
  5. If the client failed to acquire the lock, it releases the lock on the masters where it did succeed, resetting the state.
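A compact sketch of steps 1–5, assuming Jedis clients for the 5 masters (helper names and error handling are illustrative; production implementations such as Redisson handle many more details):

import java.util.Collections;
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedlockSketch {
    private static final String RELEASE_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "  return redis.call('del', KEYS[1]) else return 0 end";

    static boolean tryAcquire(Jedis node, String key, String token, long ttlMillis) {
        return "OK".equals(node.set(key, token, SetParams.setParams().nx().px(ttlMillis)));
    }

    static void release(Jedis node, String key, String token) {
        node.eval(RELEASE_SCRIPT, Collections.singletonList(key), Collections.singletonList(token));
    }

    // Steps 1-5 from the list above.
    static boolean lock(List<Jedis> masters, String key, String token, long ttlMillis) {
        long start = System.currentTimeMillis();                  // 1. record the start time
        int acquired = 0;
        for (Jedis node : masters) {                               // 2. try each master in turn
            try {
                if (tryAcquire(node, key, token, ttlMillis)) acquired++;
            } catch (Exception ignored) { /* node unreachable: skip it */ }
        }
        long validity = ttlMillis - (System.currentTimeMillis() - start);  // 3. and 4.
        if (acquired >= masters.size() / 2 + 1 && validity > 0) {
            return true;                                           // majority obtained in time
        }
        for (Jedis node : masters) {                               // 5. roll back partial locks
            try { release(node, key, token); } catch (Exception ignored) { }
        }
        return false;
    }
}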

Retry on failure:

If a client fails to acquire the lock, it should wait a random delay before retrying, to avoid the situation where many clients retry at the same moment (ideally a client should send its lock requests to the 5 masters almost simultaneously). In addition, a client that fails to acquire the lock should release, as soon as possible, the locks it did obtain on some masters, so that other clients do not have to wait for those locks to expire. Of course, if a network partition prevents the client from contacting the masters, this waste is a price that has to be paid.

Release the lock:

Releasing the lock is a simple operation: just release the lock on every node in turn.

Performance, crash Recovery and fsync:

If our nodes have no persistence mechanism, consider this: a client obtains the lock from 3 of the 5 masters, and then one of those 3 restarts with empty data. At that point the environment again has 3 masters from which another client could acquire the same lock, and mutual exclusion is violated. If AOF persistence is enabled the situation is slightly better, because Redis expiration follows wall-clock semantics: time keeps passing while the server is down, so the lock state is not polluted after a restart. However, in a power outage some AOF commands may be lost before they can be flushed to disk, unless the flush policy is set to fsync = always, which hurts performance. The general solution to this problem is that when a node restarts, it is treated as unavailable for the maximum lock TTL, so it cannot interfere with locks granted before the crash; only after all pre-crash locks have expired, and no historical lock remains in the environment, does the node rejoin and serve normally.

3. Redis Bloom filter

3.1. Introduction to Bloom filter

The Bloom filter was proposed by Bloom in 1970. It is essentially a long binary vector plus a series of random mapping functions (more on that below), but you can also think of it simply as an imprecise set structure whose contains method may misjudge whether an element exists. A Bloom filter is not wildly inaccurate, though: as long as its parameters are chosen reasonably, its accuracy can be kept high enough that misjudgments occur only with a small probability.

When a Bloom filter says a value exists, it may not actually exist; when it says a value does not exist, it certainly does not exist. For example, when it says it does not know you, it really does not know you; but when it says it knows you, it may be misjudging, because you happen to look like another friend it knows (similar features).

Based on the above functions, we can use the Bloom filter in the following scenarios:

  1. Checking whether a value exists in a massive data set: if your server has enough memory, a HashMap may be a good solution, with a theoretical lookup cost of O(1); but when the data volume is huge, only a Bloom filter remains affordable.
  2. Solving cache penetration: we often put hot data such as product details into Redis as a cache. Usually when a request arrives we query the cache first rather than reading the database directly, which is the simplest and most common way to improve performance. But if requests keep asking for a key that does not exist, there will never be a cache hit, and a flood of such requests will go straight to the database, causing cache penetration; a Bloom filter can also be used to solve this kind of problem.
  3. Crawler / mail system filtering: you may have noticed that some normal mail ends up in the spam folder; this is caused by Bloom filter misjudgments.

3.2. Principle of Bloom filter

A Bloom filter is essentially a bit vector or bit list of length m (a list containing only 0 and 1 bit values), with all bits initially set to 0.

So let’s first create a slightly longer bit vector for demonstration:

When we add a value to the Bloom filter, we feed the key to multiple hash functions; each one produces an integer, which is then taken modulo the length of the bit array to get a position (each hash function generally yields a different position). Setting each of these positions in the bit array to 1 completes the add operation. For example, we add a key wmyskxz:

When we query whether a key exists in the Bloom filter, we run the key through the same hash functions as in the add operation and check whether each corresponding position is 1. If any of the bits is 0, the key definitely does not exist in the Bloom filter. If all of these positions are 1, it does not mean the key necessarily exists, only that it very probably exists, because the 1s at these positions may have been set by other keys.
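The add/contains logic above can be sketched in a few lines of Java (an illustrative toy, not RedisBloom's actual implementation; the k positions are derived from two base hashes, a common double-hashing trick):

import java.util.BitSet;

public class TinyBloomFilter {
    private final BitSet bits;
    private final int m;   // number of bits
    private final int k;   // number of hash functions

    TinyBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive the i-th position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.reverse(h1) * 0x9E3779B1;   // a second, roughly independent hash
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(String key) {
        for (int i = 0; i < k; i++) bits.set(position(key, i));      // set every position to 1
    }

    boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) return false;           // any 0 means definitely absent
        }
        return true;                                                 // all 1 means probably present
    }
}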

Query for a nonexistent key after adding some data:

Obviously, the 1s at positions 1/3/5 come from the wmyskxz we added above, so this is a misjudgment. Fortunately, there is a formula for predicting a Bloom filter's false positive rate; the derivation is somewhat involved, and interested readers can study it on their own (a small helper that evaluates the standard estimate is sketched after the list below). Just keep the following in mind:

  1. Do not let the actual number of elements grow far beyond the initialized size;
  2. When the actual number of elements exceeds the initialized size, rebuild the Bloom filter: reallocate a larger filter and re-add all the historical elements in batches.
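For reference, the standard estimate (not specific to Redis) for n inserted elements, m bits and k hash functions is p ≈ (1 - e^(-kn/m))^k, with the best k around (m/n) · ln 2; a tiny helper that evaluates it:

public class BloomMath {
    // Estimated false-positive rate for n elements, m bits and k hash functions.
    static double estimatedFalsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    // Hash-function count that minimizes the false-positive rate for given n and m.
    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}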

3.3. Use of Bloom filter

The Bloom filter arrived for Redis 4.0 in the form of a module. It is loaded into Redis Server as a plug-in, giving Redis powerful Bloom-filter deduplication capabilities.

The Bloom filter has two basic commands: bf.add adds an element and bf.exists checks whether an element exists; they are used just like sadd and sismember of a set. Note that bf.add can only add one element at a time; to add several elements at once use bf.madd, and to query several elements at once use bf.mexists.

127.0.0.1:6379> bf.add codehole user1
(integer) 1
127.0.0.1:6379> bf.add codehole user2
(integer) 1
127.0.0.1:6379> bf.add codehole user3
(integer) 1
127.0.0.1:6379> bf.exists codehole user1
(integer) 1
127.0.0.1:6379> bf.exists codehole user2
(integer) 1
127.0.0.1:6379> bf.exists codehole user3
(integer) 1
127.0.0.1:6379> bf.exists codehole user4
(integer) 0
127.0.0.1:6379> bf.madd codehole user4 user5 user6
1) (integer) 1
2) (integer) 1
3) (integer) 1
127.0.0.1:6379> bf.mexists codehole user4 user5 user6 user7
1) (integer) 1
2) (integer) 1
3) (integer) 1
4) (integer) 0

4. Redis HyperLogLog

4.1. HyperLogLog profile

HyperLogLog is a near-optimal algorithm for cardinality estimation, first proposed by Flajolet and colleagues in 2007. Unlike the original paper, many books, and the Redis author as well, describe it as a new data structure (the algorithm does, after all, need a specific data structure for its implementation).

1. Base statistics:

Cardinality counting is usually used to count the number of unique elements in a set.

Consider this scenario: you are responsible for developing and maintaining a large website. One day your boss asks the product manager for the UV of every page on the site (unique visitors; each user is counted only once per day) and then asks you to build the statistics module. How would you do it?

If it were PV (page views), it would be very easy: configure a separate Redis counter for each page and append the current date to the counter's key as a suffix. Execute INCRBY once for each request, and in the end all the PV data can be tallied.
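For the PV case the counter really is a one-liner per request, assuming a Jedis client (the key layout is illustrative):

import java.time.LocalDate;
import redis.clients.jedis.Jedis;

public class PageViewCounter {
    // One counter per page per day, e.g. "pv:home:2024-01-01"; every request adds 1.
    static void recordView(Jedis jedis, String pageId) {
        jedis.incrBy("pv:" + pageId + ":" + LocalDate.now(), 1);
    }
}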

But UV is different: it must be deduplicated; multiple requests from the same user within one day can only be counted once. This requires every web request to carry a user ID; whether the user is logged in or not, a unique ID is needed to identify them.

A simple solution immediately comes to mind: maintain a separate set for each page that stores all the user IDs that visited the page that day. But there are problems:

  1. Huge storage space: if your site has heavy traffic, the sets to be stored will be enormous; your boss might beat you to death over the cost of this one feature;
  2. More complicated statistics: aggregating totals across so many sets is itself a complicated job;

Common methods of cardinality statistics:

For problems like this that require cardinality statistics, there are generally two solutions better than a set:

First: B trees

The biggest advantage of a B-tree is its efficient insertion and lookup. If we used a B-tree to store the data to be counted, we could quickly determine whether a new item already exists and insert it quickly. To compute the cardinality, we only need to count the number of nodes in the B-tree.

However, while maintaining a B-tree structure in memory solves the counting and computation problem, it does not save any memory.

Second: bitmap

A bitmap can be understood as a data structure that stores data in an array of bits, where each bit carries information independently. The bit is the smallest unit of storage, so a bitmap can save a great deal of space. If we allocate a large bit array, each element to be counted corresponds to one bit of the array, for example:

Bitmaps also have the obvious advantage that multiple statistical results can be merged easily: the bit arrays just need to be combined with a bitwise OR, which also keeps the storage small. To count the cardinality over 100 million items, the memory needed is roughly 100_000_000 / 8 / 1024 / 1024 ≈ 12 MB; by contrast, using a 32-bit int for each item would need roughly 32 * 100_000_000 / 8 / 1024 / 1024 ≈ 381 MB.
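A toy sketch of the bitmap idea in Java, assuming user IDs can be mapped to small integer offsets (merging really is just a bitwise OR):

import java.util.BitSet;

public class BitmapUv {
    // One BitSet per page per day; bit i is set when the user with numeric id i visits.
    static void markVisit(BitSet pageBits, int userId) {
        pageBits.set(userId);
    }

    // Merging two results (two days, or two pages) just ORs the bit arrays together.
    static BitSet merge(BitSet a, BitSet b) {
        BitSet merged = (BitSet) a.clone();
        merged.or(b);
        return merged;
    }

    static int uv(BitSet pageBits) {
        return pageBits.cardinality();   // number of distinct users
    }
}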

As you can see, the memory savings from a bitmap are obvious, but still not enough: about 12 MB is needed to count the cardinality of a single object (one page). If 10,000 objects are to be counted, close to 120 GB is needed, which is still impractical in big-data scenarios.

2. Probability algorithm:

In fact, no efficient algorithm has been found that computes cardinality exactly in big-data scenarios, so when absolute accuracy is not required, a probabilistic algorithm is a good solution.

A probabilistic algorithm does not store the data set itself; instead it estimates the cardinality using probabilistic and statistical methods, which saves a great deal of memory while keeping the error within a bounded range. Probabilistic algorithms currently used for cardinality counting include:

  1. Linear Counting (LC): an early cardinality estimation algorithm. LC is not outstanding in terms of space complexity; in fact its asymptotic space complexity is the same as that of the simple bitmap method above, O(Nmax), though with a smaller constant factor.
  2. LogLog Counting (LLC): LLC saves far more memory than LC; its space complexity is only O(log2(log2(Nmax))).
  3. HyperLogLog Counting (HLL): HLL is an optimization and improvement of LLC; with the same space complexity, its cardinality estimation error is smaller than LLC's.

Among them, HyperLogLog's performance is astonishing. We calculated above that a bitmap needs about 12 MB to count 100 million items, whereas HyperLogLog needs less than 1 KB to do it! The HyperLogLog implemented in Redis needs only 12 KB of memory and can count up to 2^64 items with a standard error of 0.81%!

4.2. HyperLogLog principle

Consider a coin-flipping game: you flip a coin n times in a row, then tell me only the maximum number of consecutive heads, and I guess how many times you flipped.

Given a series of random integers, we record K, the maximum length of the run of consecutive zero bits at the low end (the maxbit in the figure). From this K we can estimate the number N of random numbers: there is a clear linear relationship between K and the logarithm of N, roughly N ≈ 2^K.
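A small experiment illustrating the N ≈ 2^K relationship: generate N random values, record the longest run of low zero bits, and compare (an illustration of the intuition only, not Redis code):

import java.util.concurrent.ThreadLocalRandom;

public class MaxbitExperiment {
    // Number of consecutive 0 bits at the low end of a value.
    static int lowZeroBits(long value) {
        return value == 0 ? 63 : Long.numberOfTrailingZeros(value);
    }

    public static void main(String[] args) {
        for (int n = 1024; n <= (1 << 20); n <<= 4) {
            int maxbit = 0;
            for (int i = 0; i < n; i++) {
                maxbit = Math.max(maxbit, lowZeroBits(ThreadLocalRandom.current().nextLong()));
            }
            // A single observer's estimate 2^maxbit is very rough; bucketing plus a
            // harmonic mean (next section) is what makes it usable.
            System.out.printf("N = %d, maxbit = %d, 2^maxbit = %d%n", n, maxbit, 1L << maxbit);
        }
    }
}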

Bucketing and averaging

If N is between 2^K and 2^(K+1), estimating it as 2^K is obviously too crude, so we can use many BitKeepers (buckets) and combine their estimates, which gives a much more accurate value.

The process is a bit like a talent show: a panel of professional judges scores a contestant, but some judges give extreme scores out of personal preference, so the highest and lowest scores are usually discarded before averaging, which makes the result fairer. Here there are 1024 judges (buckets), and the harmonic mean, an average based on reciprocals, is used, which effectively smooths out the influence of outliers.
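The averaging can be sketched as follows: each bucket contributes an estimate of 2^maxbit, and instead of an arithmetic mean a harmonic mean is taken, which suppresses abnormally large buckets (a simplified illustration; the real HyperLogLog formula also multiplies in a bias-correction constant):

public class HllEstimate {
    // Harmonic-mean combination of per-bucket maxbit values (no bias correction).
    static double combineBuckets(int[] maxbits) {
        double invSum = 0;
        for (int maxbit : maxbits) {
            invSum += 1.0 / (1L << maxbit);          // 1 / 2^maxbit for each bucket
        }
        double harmonicMean = maxbits.length / invSum;
        return maxbits.length * harmonicMean;        // m buckets, each seeing ~N/m elements
    }
}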

The real HyperLogLog

There is a wonderful website that lets you dynamically observe how the HyperLogLog algorithm works: content.research.neustar.biz/blog/hll.ht…

Some of these concepts are explained here so that you can click step to see them for yourself:

  • m is the number of buckets: the figure shows 64 buckets;
  • The blue bits select the bucket: for example 101110 is 46 in binary, so this element is counted in bucket 46, marked in red in the large Register Values table in the middle;
  • The green bits give the position of the first 1: here the first 1 appears at position 1 counting from right to left, so the Register Value of bucket 46 is set to 1;
  • The red numbers accumulate the green-bit values: the next element that lands in bucket 46 will update it;

4.3. HyperLogLog in Redis

In Redis’ HyperLogLog implementation, 16,384 buckets are used, i.e. : 214, which is just like the large Register Values table in the middle of the site above.

The maximum count Redis supports is 2^64, so the maxbit of each bucket needs 6 bits to store it (the maximum maxbit is 63). The total memory occupied is therefore 2^14 * 6 / 8 bytes (6 bits for each of the 16,384 buckets, divided by 8 to convert bits to bytes), which works out to 12 KB.

Redis is extremely aggressive about memory optimization. When the count is still small, most buckets are zero, so Redis saves space by using a sparse storage mode instead; the normal mode, called dense storage, always occupies a constant 12 KB.

HyperLogLog provides two commands, PFADD and PFCOUNT; literally, one adds elements and the other retrieves the count. PFADD is used in the same way as SADD of a set: to count a user ID, just insert the user ID; PFCOUNT then returns the estimated count, much like SCARD.

> PFADD codehole user1
(integer) 1
> PFCOUNT codehole
(integer) 1
> PFADD codehole user2
(integer) 1
> PFCOUNT codehole
(integer) 2
> PFADD codehole user3
(integer) 1
> PFCOUNT codehole
(integer) 3
> PFADD codehole user4 user5
(integer) 1
> PFCOUNT codehole
(integer) 5
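The same commands from Java, assuming a Jedis client:

import redis.clients.jedis.Jedis;

public class UvCounter {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            jedis.pfadd("codehole", "user1", "user2", "user3");   // PFADD
            System.out.println(jedis.pfcount("codehole"));        // PFCOUNT, prints 3
        }
    }
}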
