Follow the public account “Cloud crawler Technology research Notes” for more dry goods ~

The owner of the account introduces years of anti-crawler cracking experience, AKA “reverse pupil”, obsessed with data analysis and hacker growth can not extricate. he has the name of CSDN blog expert and Huawei cloud sharing expert.

Today’s title is “Thinking about how to optimize our code, inspired by a simple Python merge dictionary problem”. Why do we have this topic? Today, I had a conversation with a friend who had just finished an interview for a Python development position. This was the same question he had encountered in the interview:

How do I combine two Dict objects in Python with a simple expression?

Although this problem is a very simple problem, and there are many kinds of ideas to solve the problem. It’s a small question, but let’s take a deeper look at how the code is optimized, how it’s optimized, and so on.

Let’s start with a quick thought: What are the ways to merge two dicts in Python? Let’s take Python3 and Python2, respectively.

case

Suppose we now have DictXX and DictYY, and we want to combine them to get a new DictZZ, we would do this:

  • In Python 3.5 or later:
z = {**x, **y}
Copy the code
  • In Python 2 (or 3.4 or lower), write a function:
def merge_two_dicts(x, y):
    z = x.copy()   # start with x's keys and values
    z.update(y)    # modifies z with y's keys and values & returns None
    return z
z = merge_two_dicts(x, y)
Copy the code

Python3.5

Suppose we have two dictionaries and want to combine them into the new dictionary without changing the original dictionary:

x = {'a': 1, 'b': 2}
y = {'b': 3.'c'4} :Copy the code

The ideal result is a new dictionary where z is merged, with the value of the second Dict overwriting the value of the first Dict.

>>> z
{'a': 1, 'b': 3.'c'4} :Copy the code

The new syntax proposed in PEP 448 and available as of Python 3.5 is:

z = {**x, **y}
Copy the code

It only takes a very compact expression to do this, and we can also use unpacking to do this:

z = {**x, 'foo': 1, 'bar': 2, **y}
Copy the code

The results are as follows:

>>> z
{'a': 1, 'b': 3.'foo': 1, 'bar': 2.'c'4} :Copy the code

It is now officially implemented in the 3.5 release schedule, PEP 478, and has entered the Documentation for new features in Python 3.5.

Let’s take a quick look at how this new feature can be used

This feature allows us to use multiple unpack expressions in the same expression and easily merge iterators and regular lists without having to convert iterators to lists and then merge them.

However, since many organizations are still using Python 2, we may want to operate in a backward compatible manner. The classical Pythonic methods available in Python 2 and Python 3.0-3.4 are done in two steps:

z = x.copy()
z.update(y) # which returns None since it mutates z
Copy the code

In this method, we copy x to create a new object Z, and merge the two dict objects using dict update.

Python3.5 is analyzed in the following way

If we’re not using Python 3.5 yet, or need to write backward-compatible code, and want to use it in a single expression, the most efficient way is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dicts, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z
Copy the code

Then we need to use the function like this:

z = merge_two_dicts(x, y)
Copy the code

You can also create a function that merges multiple dicts, and we can specify any number of dicts:

def merge_dicts(*dict_args):
    """ Given any number of dicts, shallow copy and merge into a new dict, precedence goes to key value pairs in latter dicts. """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result
Copy the code

This function will work with all dictionaries in Python 2 and 3. We can use it like this:

z = merge_dicts(a, b, c, d, e, f, g) 
Copy the code

Note, however: dict keys from later have higher precedence, overwriting previous keys.

Let’s imagine other answers

In Python 2, we can also do this:

z = dict(x.items() + y.items())
Copy the code

In Python 2, we use.items() to get a list, which means we’ll create two lists in memory, then a third list in memory equal to the length of the first two dictionaries, and finally discard all three lists to create dictionaries, which is the Dict we need. Note, however, that we can never use this in Python 3, where this will fail because we are adding two dict_items objects instead of two lists together.

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for+ :'dict_items' and 'dict_items'
Copy the code

Of course, if we really wanted to, we could cast them explicitly as lists, such as z = dict(list(x.items()) + list(y.Items ())), but that would waste resources and computing power.

Similarly, items() will fail to be federated in Python 3 (viewitems() in Python 2.7) when the value is a non-hashed object (such as a list). Even if your value is hashable, because the collection is semantically unordered, the behavior regarding priority is uncertain. So don’t do this:

>>> c = dict(a.items() | b.items())
Copy the code

Let’s demonstrate what happens when values are not hashed:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Copy the code

Here is an example where y should take precedence, but because of the arbitrary order of the set, the value of x is preserved:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> x.items() | y.items()
{('a', 1), ('a', 2)}
>>> dict(x.items() | y.items())
{'a'2} :Copy the code

Another technique that we shouldn’t use:

z = dict(x, **y)
Copy the code

This use dict constructor, and very quickly and has a memory efficiency (or even slightly more than our two-step process), but unless we know exactly what is happening inside (that is to say, the second dict as keyword arguments passed to the dict, constructor) can we use, otherwise the expression is very difficult to read, Sometimes we don’t quickly understand what this is, so it’s not Pythonic.

With that in mind, let’s take a look at a sample usage fix in Django.

Dictionaries are designed to get hashed keys (for example, frozenset or tuple), but this method fails in Python 3 when the key is not a string.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
Copy the code

On the mailing list, Bigwig Guido Van Rossum wrote:

I declare dict ({}, ** {1:3}) illegal because it's an abuse of the ** mechanism. Dict (x, ** y) is obviously similar to the "cool" operation of calling x.update (y) and returning x. But I personally find it more vulgar than "cool".Copy the code

But from what I understand (and what I understand from the big guy), the intended use of dict(**y) commands is to create readable dictionaries, such as:

dict(a=1, b=10, c=11)
Copy the code

Used instead of

{'a': 1, 'b': 10, 'c'11} :Copy the code

Using the ** operator here also doesn’t abuse the mechanism, which is exactly why we use ** to pass dict as a key.

Finally, take a look at the poor implementations

These methods have poor performance, but they will provide the correct behavior. They will not perform as well as copy and Update or the new unpack methods because they traverse each key-value pair at a higher level of abstraction, but they do follow the order of precedence (the latter determines precedence)

  • We can do this again using the generator:
{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7
Copy the code

Or in Python 2.6 (perhaps as early as 2.4 when generator expressions were introduced) :

dict((k, v) for d in dicts for k, v in d.items())
Copy the code

Itertools. chain iterator:

import itertools
z = dict(itertools.chain(x.iteritems(), y.iteritems()))
Copy the code

ChainMap:

>>> from collections import ChainMap
>>> x = {'a': 1,'b': 2}
>>> y = {'b': 10,'c': 11}
>>> z = ChainMap({}, y, x)
>>> for k, v in z.items():
        print(k, '-->', v)

a --> 1
b --> 10
c --> 11
Copy the code

#### Let’s do a time analysis. I will perform a performance analysis only for the usage with known correct behavior.

import timeit
Copy the code

Complete the following operations on Ubuntu 18 in Python 2.7 (System Python) :

>>> min(timeit.repeat(lambda: merge_two_dicts(x, y))) 0.5726828575134277 >>> min(timeit.repeat(lambda: {k: v)for d in (x, y) for k, v inD.tems ()})) 1.163769006729126 >>> min(timeit. Repeat (lambda: Dict (itertools.chain(x.iterItems (), Y.iterItems ())))) 1.1614501476287842 >>> min(timeit.repeat(lambda: dict((k, v)))for d in (x, y) for k, v inD.i tems ()))) 2.2345519065856934Copy the code

In Python 3.5:

>>> min(timeit. Repeat (lambda: {**x, **y})) 0.4094954460160807 >>> min(timeit. Repeat (lambda: Merge_two_dicts (x, y))) 0.7881555100320838 >>> min(timeit.repeat(lambda: {k: v)for d in (x, y) for k, v inD.tems ()})) 1.4525277839857154 >>> min(timeit. Repeat (lambda: Dict (itertools.chain(x.items(), y.items())))) 2.3143140770262107 >>> min(timeit.repeat(lambda: dict((k, v)))for d in (x, y) for k, v inD.i tems ()))) 3.2069112799945287Copy the code

conclusion

After our previous series of analyses and experiments, we can come to the conclusion of this question

  • Python 2We will adoptcopyaddupdateThe scheme
  • Python 3We will adoptMultiple unpackThe scheme

If you are not familiar with Python 3, you can refer to my previous article. Python2 has only one month left to live! Learn Python3’s awesome new features! To help you quickly switch to Python 3 development, but note that Python 3 higher version and Python 2.7 are also quite different, so if you are involved in online business switch, please be careful!

Finally, let’s talk about the problem of optimizing the code. Starting from this problem, we can summarize the idea of optimizing the code:

  1. What solutions have we analyzed?
  2. Which solutions work?
  3. How do these effective programs compare?
  4. What sacrifices do we make for the best solution?