For me, who bought algorithm books one after another but never read more than 10% of them, Leetcode has never lasted more than 3 days, algorithm ability is really rubbish. However, today I decided to write an article about algorithms. The reason is that I read brother Wu’s article “Minesweeper and algorithm: How to randomize minesweeper (II) : Shuffle algorithm”. I’ve written minesweeper before, see Python: Game: Minesweeper.

The game starts with random mines. The advanced level of mine clearance is a 16 x 30 grid with a total of 99 mines. If all grids are marked from 0, then the problem of laying mines is to randomly select 99 out of 480 numbers. The first instinct is to record the options:

import random

mines = set()
for i in range(99):
    j = random.randint(0.480)
    while j in mines:
        j = random.randint(0.480)
    mines.add(j)
print(mines)
Copy the code

But this algorithm seems a little low.

So if you randomly pick 99 of the 480 numbers, you just shuffle the 480 numbers and pick the first 99. This brings us to the Gaunard shuffle algorithm.

This algorithm is cool but easy to understand. The popular explanation is: swap the last number with any previous n-1 number, and then swap the penultimate number with any previous n-2 number…… And so on.

This principle is very well understood, as common as it can be, and if you think about it a little bit, it’s true.

The Python implementation of the shuffling algorithm is as follows:

import random

lst = list(range(10))
for i in reversed(range(len(lst))):
    j = random.randint(0, i)
    lst[i], lst[j] = lst[j], lst[i]
print(lst)
Copy the code

Read brother Wu’s article, I immediately went to my mine clearance code, I think, I must be using the very “low” algorithm. If you look at the code, I’m using Python to provide a random sampling algorithm: random. Sample. And THEN I came up with the idea of randomly shuffling a sequence, random. Shuffle, that’s what you do. Could random. Shuffle be a shuffle algorithm?

Look at the source code of Random. Shuffle and find that it is the shuffle algorithm.

def shuffle(self, x, random=None):
    if random is None:
        randbelow = self._randbelow
        for i in reversed(range(1, len(x))):
            j = randbelow(i + 1)
            x[i], x[j] = x[j], x[i]
    else:
        _int = int
        for i in reversed(range(1, len(x))):
            j = _int(random() * (i + 1))
            x[i], x[j] = x[j], x[i]
Copy the code

Everything was so natural and beautiful, and then I took a look at the source code of Random. Sample, and I was at a loss. I cut some of the source code:

n = len(population)
result = [None] * k
setsize = 21        # size of a small set minus size of an empty list
if k > 5:
    setsize += 4 ** _ceil(_log(k * 3.4)) # table size for big sets
if n <= setsize:
    # An n-length list is smaller than a k-length set
    pool = list(population)
    for i in range(k):         # invariant: non-selected at [0,n-i)
        j = randbelow(n-i)
        result[i] = pool[j]
        pool[j] = pool[n-i- 1]   # move non-selected item into vacancy
else:
    selected = set()
    selected_add = selected.add
    for i in range(k):
        j = randbelow(n)
        while j in selected:
            j = randbelow(n)
        selected_add(j)
        result[i] = population[j]
return result
Copy the code

The setsize variable is confusing, but the following if and else parts are understandable. If is the shuffling algorithm, and else is the shuffling algorithm which I think is very low.

What’s going on here? In order to understand the truth, I went to search a lot of articles to view, the most valuable is the following: blog.csdn.net/harry_128/a…

Random sampling has two ways to achieve, one is random extraction and do not put back, is the shuffling algorithm; The second is to randomly pick and put it back, which is what I think of as the record selected algorithm. Random. Sample Select one of the following items based on the conditions. That is to say, there are advantages and disadvantages between shuffling algorithm and recording selected algorithm. And that kind of surprises me, isn’t it obvious that the shuffling algorithm is better?

Pool = list(population) pool = list(population) pool = list(population) Pool = list(population) Pool = list Creating a copy also takes time and space, which the algorithm naturally takes into account. When the sample number K to be taken is small compared with the total sample number N, the probability of random repetition is relatively small.

What is the basis of sample to judge which algorithm should be used? The source code is based on the setsize variable, and there is a confusing formula. In fact, this is to calculate the memory overhead required by the set. The implementation of the algorithm mainly considers the extra memory. If the memory footprint of the list copy is small, then the shuffle algorithm is used. If the set takes up little memory, use the record selected algorithm.

What? Is it based on how much extra memory it takes up? It’s kind of crazy. According to?

Let’s look at the time complexity of the algorithm. Calculating the time complexity of an algorithm is also difficult for those of you who are bad at algorithms (like me), so for the sake of simplicity, I’ll do it in a simple way.

Let’s start with the shuffling algorithm, which is order K, which is a little bit easier to understand. So, for the record-selected algorithm, the time complexity is O(NlogN). Don’t ask me how TO calculate it. I didn’t calculate it. I copied it. Interested partners can go to the calculation.

Let’s think about a simple algorithm, for the record selected algorithm, what is the time complexity if the selected value is exactly the same every time? It’s obviously order K. So when K is much less than N, we can think of the time complexity as O(K).

The idea of sample algorithm is that, when K is relatively small compared with N, the time complexity of the two algorithms is O(K), so the one that occupies less memory is selected. When K is relatively close to N, the time complexity of recording the selected algorithm will be higher than O(K). In this case, the shuffling algorithm is selected.

Have to sigh the algorithm is really extensive and profound.


Scan to follow my public account: The Path of Python for elder code farmers