In an interesting post from Dan Bader, learn how to investigate an unexpected Python phenomenon.

A Python dictionary expression puzzle

Let’s take a look at the following obscure Python dictionary expression to find out what’s going on inside the unknown Python interpreter.

# A Python puzzle: It's a secret
What is the result of evaluating this expression?

>>> {True: 'yes'.1: 'no'.1.0: 'maybe'}
Copy the code

Sometimes you’ll come across an in-depth code example — even one line of code — but if you think about it enough, it can teach you a lot about programming languages. Such a code snippet is like a *Zen K ōan* : a question or statement used to question and test the student’s progress during the spiritual practice.

The translator’s note:Zen k not the an“Is probably a way of practice, see for more detailswikipedia

The small code snippet we’ll discuss in this tutorial is one such example. At first glance, it might look like a simple dictionary expression, but when you think about it, it takes you through a mind-stretching exercise with the CPython interpreter.

I got an idea from this short line of code, and I started my presentation with it at a Python conference I attended. This has also inspired some positive exchanges among members of my Python mailing list.

So needless to say, it’s this piece of code. Take a moment to think about the following dictionary expression and what it evaluates to:

>>> {True: 'yes'.1: 'no'.1.0: 'maybe'}
Copy the code

Here, I’ll give you a moment to think about it…

  • 5…
  • 4…
  • 3…
  • 2…
  • 1…

OK, ready?

This is the result of evaluating the above dictionary expression in the CPython interpreter interface:

>>> {True: 'yes'.1: 'no'.1.0: 'maybe'}
{True: 'maybe'}
Copy the code

I admit, I was surprised when I first saw the results. But it all makes sense when you look at what’s going on. So, let’s think about why we got this — I mean, unexpected — result.

Where does this subdictionary come from

When Python processes our dictionary expression, it first constructs a new empty dictionary object; The keys and values are then assigned in the order given by the dictionary expression.

So, when we break it down, our dictionary expression is equivalent to statements in this order:

>>> xs = dict()
>>> xs[True] = 'yes'
>>> xs[1] = 'no'
>>> xs[1.0] = 'maybe'
Copy the code

Oddly, Python assumes that all dictionary keys used in this example are equal:

>>> True= =1= =1.0
True
Copy the code

OK, but wait here. I’m sure you can accept that 1.0 == 1, but the reality is why is True also considered equal to 1? The first time I saw this dictionary expression really stumped me.

After some exploration in the Python documentation, I discovered that Python uses bool as a subclass of int. Here is a snippet in Python 2 and Python 3:

“The Boolean type is a subtype of The integer type, and Boolean values behave like The values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, The strings’ False ‘or’ True ‘are returned, respectively.”

“A Boolean is a subtype of the integer type. In almost all contexts a Boolean behaves like the values 0 and 1, except that when converted to a string, the string” False “or” True “are returned, respectively. “(sic)

Yes, this means that you can programmatically use bool values as indexes for lists or tuples in Python:

>>> ['no'.'yes'] [True]
'yes'
Copy the code

But for the sake of readability, you should not use Boolean variables like this. (Please advise your colleagues not to do the same.)

Anyway, let’s go back to our dictionary expression.

As far as Python is concerned, True, 1 and 1.0 all represent the same dictionary key. When the interpreter evaluates a dictionary expression, it overwrites the value of the key True. This explains why the resulting dictionary contains only one key.

Before we continue, let’s revisit the original dictionary expression:

>>> {True: 'yes'.1: 'no'.1.0: 'maybe'}
{True: 'maybe'}
Copy the code

Why do you end up with True as the key? Shouldn’t we end up changing the key to 1.0 because of repeated assignments? After studying some patterns in the cpython interpreter source code, I learned that the Python dictionary does not update the key object itself when a new value is associated with the dictionary’s key:

>>> ys = {1.0: 'no'}
>>> ys[True] = 'yes'
>>> ys
{1.0: 'yes'}
Copy the code

Of course this makes sense as a performance optimization – if the keys are supposed to be the same, why spend time updating the original? In the original example, you can also see that the original True object has never been replaced. Thus, the dictionary’s string representation is still printed with True as the key (rather than 1 or 1.0).

As far as we can tell, it looks like the result is that the dictionary values are overwritten all the time, just because their keys are equal. In fact, however, this result is not simply due to __eq__ being equal.

Wait, what about hashes?

Python dictionary types are stored by a hash table data structure. When I first saw this surprising dictionary expression, my intuition was that the result had something to do with hash collisions.

The storage of keys in a hash table is represented by buckets according to the hash value of each key. A hash is a fixed-length string of numbers generated based on the keys of each dictionary, used to identify each different key. (Hash function details)

This allows for quick lookups. It is much faster to search through a hash table for a string of hash numbers corresponding to a key, rather than comparing the entire key object to all the other keys to check for heterogeneity.

However, the way hashes are usually computed is not perfect. And, in fact, two or more different keys will generate the same hash value, and they will end up in the same hash table.

If two keys have the same hash value, this is called hash collision, which is a special case to deal with when inserting and looking up elements in a hash table.

Based on this conclusion, hash values have a lot to do with the surprising results we get from dictionary expressions. So let’s see if the key hash is also at work here.

I defined a class to use as our testing tool:

class AlwaysEquals:
     def __eq__(self, other):
         return True

     def __hash__(self):
         return id(self)
Copy the code

There are two special things about this class.

First, because its __eq__ magic method always returns true, all instances of this class are identical to any other object:

>>> AlwaysEquals() == AlwaysEquals()
True
>>> AlwaysEquals() == 42
True
>>> AlwaysEquals() == 'waaat? '
True
Copy the code

Second, each Alwaysequals instance will also return a unique hash value generated by the built-in function ID () :

>>> objects = [AlwaysEquals(),
               AlwaysEquals(),
               AlwaysEquals()]
>>> [hash(obj) for obj in objects]
[4574298968.4574287912.4574287072]
Copy the code

In CPython, the id() function returns the memory address of an object and is definitely unique.

With this class, we can now create objects that look the same as any other object, but have different hashes. We can use this to test whether the keys of the dictionary are covered based on their equality comparison results.

As you can see, the keys in the following example are not overwritten, even though they are always equal:

>>> {AlwaysEquals(): 'yes', AlwaysEquals(): 'no'}
{ <AlwaysEquals object at 0x110a3c588> :'yes',
  <AlwaysEquals object at 0x110a3cf98> :'no' }
Copy the code

Now, let’s think about it a little bit differently, if I return the same hash value does that overwrite the key?

class SameHash:
    def __hash__(self):
        return 1
Copy the code

Instances of the SameHash class will definitely not be equal to each other, but they will have the SameHash value of 1:

>>> a = SameHash()
>>> b = SameHash()
>>> a == b
False
>>> hash(a), hash(b)
(1.1)
Copy the code

Let’s take a look at python’s dictionary when we tried to use an instance of the SameHash class as the dictionary key:

>>> {a: 'a', b: 'b'}
{ <SameHash instance at 0x7f7159020cb0> :'a',
  <SameHash instance at 0x7f7159020cf8> :'b' }
Copy the code

As this example shows, the result of “key overwritten” is also not caused by hash conflicts alone.

Umm.. Okay, so what’s the conclusion?

The Python dictionary type checks whether two objects are equal and compares hash values to determine whether the two keys are the same. Let’s try to summarize the results of our study:

{true: ‘yes’, 1: ‘no’, 1.0: ‘maybe’} dictionary expressions evaluate to {true: ‘maybe’} because the keys true, 1, and 1.0 are all equal and they all have the same hash value:

>>> True= =1= =1.0
True
>>> (hash(True), hash(1), hash(1.0(a))1.1.1)
Copy the code

Perhaps not so surprising, which is why we got this result as the final result of the dictionary:

>>> {True: 'yes'.1: 'no'.1.0: 'maybe'}
{True: 'maybe'}
Copy the code

We cover a lot of ground here, and this particular Python trick can be a bit mind-boggling at first — so I started by likening it to Zen K ōan.

If this article is hard to follow, try examining the code examples one by one in a Python interactive environment. You’ll gain some insight into Python.

Note: Reprint please retain the following content

Original link: dbader.org/blog/python…

Text link: vimiix.com/post/2017/1…