This article originated from personal public account: TechFlow, original is not easy, for attention


In today’s 23rd article on algorithms and data structures, we continue with algorithms related to number theory and take a look at the well-known Escrow method.

We all know that prime numbers are very important in mathematics, and there are tons of formulas and studies on prime numbers, like the famous Goldbach conjecture that no one has solved yet. Like mathematics, prime numbers are very important in the field of information and have many applications. As a simple example, many secure encryption algorithms also use prime numbers. We always have to find prime numbers before we can do any kind of calculation with them. So this leads to the simplest and most difficult question, how do we find prime numbers?

Judge a prime number

The simplest way to find prime numbers is, of course, to go through them one by one. We go through each number in turn and determine if it is prime individually. So the question goes back to deciding whether a number is prime, and how do you decide whether a number is prime?

The only property of a prime number is that it only has two factors, 1 and itself, and that’s the only property we can use to judge a prime number. So you can imagine, if we want to determine whether n is prime, we can go from 2 to n-1, and if none of these n-1 numbers are divisible by n, then n is prime. I remember this correctly in the C language exercise appeared, in short very simple, can be said to be the simplest algorithm.

def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    returnn ! =1
Copy the code

Obviously, this algorithm can be optimized, for example, if n is even, we don’t need to loop at all, except that the even number must be composite. Another example is that we cycleThe upper bound doesn’t have to be n minus 1 to PIIt is ok. Because if there are factors that must come in pairs, if there are factors that are less than the square root of n, then n divided by it must be greater than the square root of n.

This improvement is also very simple, just make a few changes:

def is_prime(n):
    if n % 2= =0 andn ! =2:
        return False
    for i in range(3, int(math.sqrt(n) + 1)) :        if n % i == 0:
 return False  returnn ! =1 Copy the code

So we optimize the algorithm of O(n) to O() it’s a big improvement, but it’s not over yet, we can still refine it. There is a theorem in mathematics,Only natural numbers with forms such as 6n-1 and 6n+1 May be primeWhere n is an integer greater than or equal to 1.

At first glance, this theorem seems very advanced, but in fact it is very simple, because all natural numbers can be written as 6n,6n+1,6n+2,6n+3,6n+4,6n+5, and 6n,6n+2,6n+4 are even numbers and are definitely not prime. 6n plus 3 can be written as 3(2n plus 1), which is obviously not prime either, so only 6n plus 1 and 6n plus 5 could be prime. 6n plus 5 is the same thing as 6n minus 1, so we usually write 6n minus 1 and 6n plus 1.

Using this theorem, our code can be further optimized:

def is_prime(n):
    if n % 6 not in (1.5) and n not in (2.3) :        return False
    for i in range(3, int(math.sqrt(n) + 1)) :        if n % i == 0:
 return False  returnn ! =1 Copy the code

That’s fast, but it’s still not optimal, especially when we need to find lots of primes, and it still takes a lot of time. So is there any way to find prime numbers in bulk?

Yes, it’s called the Eratosthenes algorithm. It’s a very difficult name to pronounce, it’s an ancient Greek name. He was a great bull of ancient Greece and a friend of the famous Archimedes. Although he was not as famous as Archimedes, he was very, very powerful. He made achievements in mathematics, astronomy, geography, literature, history and many other fields. The diameter of the earth, the distance between the earth and the moon, the distance between the earth and the sun and the ecliptic Angle were measured by our own method. Considering that he lived more than 2,500 years ago, during the Spring and Autumn Period and the Warring States Period, you can imagine how powerful this man was.

Type sieve method

The Eratosthenes algorithm that we’re going to talk about today is the one he invented for screening prime numbers, which we refer to as the Essayian sieve or sieve method for convenience. The idea of the Eschian sieve is very simple. It takes the selected prime numbers and filters out all the numbers that are divisible by it. These primes act like a sieve through which the natural numbers are filtered, and the numbers that are left behind are naturally the ones that are not divisible by the prime numbers in front of them, which by definition are also prime numbers.

So for example, if we want to filter out all the primes up to 100, and we know that 2 is the smallest prime, we can use 2 first to filter out all the even numbers. And then we go back to 3, which is the first number left by 2, which is prime, and we use 3 to filter out everything that’s divisible by 3. After sifting, we continue to iterate, and the first number we encounter is 7, so 7 is also prime, and we repeat the process until the iteration is complete. At the end of the day, we have all the primes up to 100.

If that doesn’t make sense, take a look at the GIF below, which shows the whole process in great detail.

The idea is so simple that it’s really easy to write code after you understand it:

def eratosthenes(n):
    primes = []
    is_prime = [True] * (n + 1)
    for i in range(2, n+1) :        if is_prime[i]:
 primes.append(i)  Use the current prime I to filter out all numbers divisible by it  for j in range(i * 2, n+1, i):  is_prime[j] = False  return primes Copy the code

Let’s run the code once and see:


As expected, we get all the primes less than 100. Let’s analyze the complexity of sifting, and we can see from the code that we have two loops, and the outermost loop is always iterated n times. The inner layer of the loop is constantly changing, and its number of operations is related to the size of the prime number, which seems to be difficult to calculate. Actually, yes, according toPrime number distribution theoremAnd a bunch of complicated operations (trust me, you won’t be interested), and we can figure out that the complexity of sieving is zero.

Acme optimization

The complexity of sieving is already very similarBecause even when n is very large, after two calculations of ln, it’s pretty close to a constant, and in fact, in most usage scenarios, the algorithm above is sufficient.

But there are still daniu do not satisfy, continue to make optimization of the algorithm, the optimization toThe complexity of. Although there is no order of magnitude improvement in efficiency, the ideas applied are very clever and worth learning. Before we understand this optimization, let’s take a look at what else can be optimized about the previous sieving method. It’s pretty obvious that for a composite number,It can be sifted out by multiple primes. For example, 38, which has two prime factors, 2 and 19, will be set to False twice, which incurs additional overhead. If we update each composite number only once, will we be optimized to?

How to ensure that each composite is updated only once? The theorem that we’re going to use here is that only the prime factors of every composite number factorization are unique. If we can guarantee that a composite number will only be updated to False by its smallest prime factor, then the whole optimization is complete.

So how do we do that? If we assume that the least prime factor of the integer n is m, then we multiply the prime I less than m by n to get a composite number. We eliminate this composite number, and for this composite number, I must be its smallest prime factor. Because it is equal to I times n, whose smallest prime factor is m, and I is less than m, so I is its smallest prime factor, we generate eliminated composite numbers in such a way that each composite number is guaranteed to be eliminated only by its smallest prime factor.

Based on this, we can write new code:

def ertosthenes(n):
    primes = []
    is_prime = [True] * (n+1)
    for i in range(2, n+1) :        if is_prime[i]:
 primes.append(i)  for j, p in enumerate(primes):  # Prevent crossing boundaries  if p > n // i:  break  # filter  is_prime[i * p] = False  # when I % p equals 0, p is the smallest prime factor of I  if i % p == 0:  break   return primes Copy the code

conclusion

This is the end of our introduction to escherictic screening. The optimized version of essel screening method is relatively difficult to remember some, if not remember, you can only use the version before the optimization, the efficiency difference between the two is not large, completely within the acceptable range.

Sifting looks very simple code, but it’s very important, because with it, we can get a lot of primes in a very short amount of time, get a list of primes very quickly. With prime table, many problems are much easier, such as factorization, such as information encryption and so on. Every time I look back at the sifting algorithm, I can’t help but feel that this algorithm, invented more than 2,000 years ago, is not only outdated, but still clever. I hope you understand the essence of the algorithm with reverence.

If you like this article, if you can, please click the following, give me a little encouragement, but also convenient access to more articles.

This article is formatted using MDNICE