The original | Storing a list in an int (iantayler.com/2020/12/07/…).

The author | Computer Wit

Translator | under the pea flower cat (” Python cat “public, author)

Statement | the authorship of the translation has been authorized. The content has been changed slightly for ease of reading.

The profile

Unlike C, Rust, and Go, Python’s default int has an arbitrary size. [Note 1], [Note 2]

This means that an integer can store infinitely large values as long as there is enough memory.

For example, you can open Python3 and run the following command:

>>> import math
>>> math.factorial(2020)
[number omitted]  Note: the factorial of 2020 is a long list of numbers, so it is omitted
>>> math.log2(math.factorial(2020))
19272.453841606068
>>> type(math.factorial(2020))"class 'int'>
Copy the code

That is, in Python, the commonly used int can easily hold a 19273 bit value of a fixed-size unsigned int of type C (c-style fixed-size unsigned int). In a language like Python, where convenience trumps speed and memory efficiency, this is really useful.

This infinite precision also means that we can store any amount of information in a single int. An entire book, an entire database, or even anything, can be stored in a single Python int as long as it is coded correctly.

(Python cat note: There isAn article, an in-depth analysis of the implementation principle of Python integer overflow.

Thus, we can imagine a dialect of Python that only has integers and requires ints for all other types (dictionaries, lists, and so on). We also have special functions and methods that treat ints like lists, dict, and so on.

This will be a fun and entertaining exercise, and that’s what this article is trying to do.

There’s an obvious way to do this: All data structures are just bit-arrays in memory. At worst, it is an array of related bits (for example, like each node in a linked list or tree), and their collection is just a bit array. Bit arrays can be interpreted as binary numbers. So of course we can do this. But it’s kind of boring.

In this and subsequent posts in this series, I’ll introduce some ways to represent complex data structures with ints. They are not necessarily the most compact, rational, or efficient, but the common goal is to find interesting representations of these data structures. [3]

Introduction to Godel numbers (numbering numbering)

The first data structure we represent is a list. We will use Godel numbers named after the logician KurtGodel. For convenience, we deal only with lists made up of unsigned integers (that is, natural numbers).

The principle of Godel numbers is that every natural number greater than 1 is represented by a unique prime factorization. It’s based on the fundamental theorems of arithmetic.

(Prime factorization is a process of multiplying every number by prime numbers.)

Look at some examples:

A number can be uniquely identified by an exponential list of its prime factors (up to its highest non-zero exponent). So, we can use 126 to represent the list [1, 2, 0, 1]. The first number in the list is the prime factorization of 126 as an exponent of 2, the second as an exponent of 3, and so on.

A few more examples:

What if there’s a 0 at the end of the list? Well, with this kind of coding, that doesn’t happen.

In our prime factorization, there could be an infinite number of prime numbers with exponent 0, so we need to stop somewhere. We choose to stop at the last non-zero exponent.

This representation also uses very large numbers when the list contains large numbers. That’s because the numbers in the list represent exponents, so the size of ints grows exponentially with them. For example, [50, 1000, 250] needs to be represented with a number of 2266 bits.

On the other hand, long lists with very many small integers, especially large sparse lists (that is, most of the values are zero), have a very compact representation compared to other int-encoded lists.

Mind you, coding a list as an int isn’t good programming practice, it’s just a fun experiment.

Python implementation

Let’s take a look at the Python implementation. Here are a few caveats:

  1. We will use the function with yield because it greatly simplifies operations. [5]
  2. You’ll see a lot of while loops. This is because list generators, ranges, and most of the things you might want to use in a for loop are prohibited for int-only dialects. All of this is replaced by the while loop.

Prime number generator

The first function we’ll write is an iterator that will generate primes in order. It’s critical from beginning to end. The implementation here is the simplest possible version.

I’ll probably write a whole article soon on algorithms that generate prime numbers, because it’s a cool topic and an ancient field of research in its own right. The best known algorithm is the Sieve of Erathosthenes, but that’s just the tip of the iceberg. [6]

Here, a very naive implementation will suffice:

def primes(starting: int = 2) :
    """Yield the primes in order. Args: starting: sets the minimum number to consider. Note: `starting` can be used to get all prime numbers _larger_ than some number. By default it doesn't skip any candidate primes. """
    candidate_prime = starting
    while True:
        candidate_factor = 2
        is_prime = True
        # We'll try all the numbers between 2 and
        # candidate_prime / 2. If any of them divide
        # our candidate_prime, then it's not a prime!
        while candidate_factor <= candidate_prime // 2:
            if candidate_prime % candidate_factor == 0:
                is_prime = False
                break
            candidate_factor += 1
        if is_prime:
            yield candidate_prime
        candidate_prime += 1
Copy the code

Creating an empty list

def empty_list() - >int:
    """Create a new empty list."""
    # 1 is the empty list. It isn't divisible by any prime.
    return 1
Copy the code

Traverse elements

def iter_list(l: int) :
    """Yields elements in the list, from first to last."""
    # We go through each prime in order. The next value of
    # the list is equal to the number of times the list is
    # divisible by the prime.
    for p in primes():
        # We decided we will have no trailing 0s, so when
        # the list is 1, it's over.
        if l <= 1:
            break
        # Count the number of divisions until the list is
        # not divisible by the prime number.
        num_divisions = 0
        while l % p == 0:
            num_divisions += 1
            l = l // p  # could be / as well
        yield num_divisions
Copy the code

Access to the elements

def access(l: int, i: int) - >int:
    """Return i-th element of l."""
    # First we iterate over all primes until we get to the
    # ith prime.
    j = 0
    for p in primes():
        if j == i:
            ith_prime = p
            break
        j += 1
    # Now we divide the list by the ith-prime until we
    # cant divide it no more.
    num_divisions = 0
    while l % ith_prime == 0:
        num_divisions += 1
        l = l // ith_prime
    return num_divisions
Copy the code

Add elements

def append(l: int, elem: int) - >int:
    # The first step is finding the largest prime factor.
    # We look at all primes until l.
    # The next prime after the last prime factor is going
    # to be the base we need to use to append.
    # E.g. if the list if 18 -> 2**1 * 3**2 -> [1, 2]
    # then the largest prime factor is 3, and we will
    # multiply by the _next_ prime factor to some power to
    # append to the list.
    last_prime_factor = 1  # Just a placeholder
    for p in primes():
        if p > l:
            break
        if l % p == 0:
            last_prime_factor = p
    # Now get the _next_ prime after the last in the list.
    for p in primes(starting=last_prime_factor + 1):
        next_prime = p
        break
    # Now finally we append an item by multiplying the list
    # by the next prime to the `elem` power.
    return l * next_prime ** elem
Copy the code

Try these functions

You can open a Python, iPython, or bPython session and try these functions out!

It is recommended that list elements use numbers ranging from 1 to 10. If you use large numbers, append and Access can take a long time.

To some extent, it is impractical to use Godel numbers to represent lists, although the range of available values can be greatly expanded by optimizing prime number generation and decomposition algorithms.

In [16]: l = empty_list()
 
In [17]: l = append(l, 2)
 
In [18]: l = append(l, 5)
 
In [19] :list(iter_list(l))
Out[19] : [2.5]
 
In [20]: access(l, 0)
Out[20] :2
 
In [21]: access(l, 1)
Out[21] :5
 
In [22]: l
Out[22] :972  2^2*3^5=972
Copy the code

Other INT encoding

We saw a way to represent a list of natural numbers as ints. There are other, more practical approaches that rely on subdividing the binary form of a number into chunks of varying sizes. I’m sure you can make such a suggestion.

I may write other articles later on better algorithms for generating and decomposing prime numbers, as well as ints for other complex data structures.

footnotes

  1. I think programs also break before they run out of memory, but the documentation does explicitly mention that they have infinite precision.
  2. Note that this is true for Python3, but not for Python2. For Python2, int is fixed size. I think it’s fine to refer to Python3 as Python in 2020, but I also think this detail is worth adding a footnote.
  3. It is easy to argue that this is a poor representation of a list in terms of Godel numbers. In a future blog post, we’ll talk about representational trade-offs.
  4. We can store the length of the list in a separate int to know how many zeros to consider at the end of the list. If we don’t want to have awhole separate int, we can always write the length of the list as the exponent of 2 and start the actual list with the exponent of 3. This has some redundant information, though. The way to avoid redundant information is to store the number of final 0sIn the list, instead of the entire length. We won’t be worrying aboutany of this, though.
  5. Note that using yield is no different than using return with a state variable as an argument (usually enough to get the last element returned). It’s a bit like Continuation Passing Style. Also similar to the usual tail recursion accumulator for non-tail recursive functions. If you’ve never heard of accumulator tips, here are some links [1], [2]. I might write something that mimics iterators in a language that doesn’t have them.
  6. See also The Genuine Sieve of Erathosthenes paper, which clarifies how this algorithm is defined.

Python cat note: This is all the translation, but I would like to add one last interesting point. In Hackers and Painters, Master Paul Gray makes a startling prediction that there is no logical need for integer types, because the integer n can be represented by a list of N elements. Ha ha, this is exactly the opposite of the imagination of the text! Imagine a programming language with only integer types and no lists, or a programming language with only list types and no integers. Which is more likely to emerge in the future?