Translator’s preface


If elegance has a downside, it is that it takes hard work to achieve and a good education to appreciate.


– Edsger Wybe Dijkstra


The Python community has evolved a distinctive coding style to guide the proper use of Python, usually described as Pythonic. Idiomatic Python code is what people generally call Pythonic, and Python’s syntax and standard library are themselves Pythonic throughout. The community also cares a great deal about consistency of coding style, and promotes and practices Pythonic code everywhere. So it is not uncommon to see discussions of a piece of code framed as P vs NP (Pythonic vs non-Pythonic). Pythonic code is concise, unambiguous, elegant and, for the most part, efficient. Reading it is a pleasant experience: “Code is written mainly for people to read, and only incidentally for machines to run.”


But what counts as Pythonic, like what counts as idiomatic Chinese, is real yet hard to define. Running import this prints The Zen of Python by Tim Peters, which offers guiding principles. Many beginners have read it and agree with its ideas, yet are at a loss when it comes to putting them into practice. PEP 8 is only a coding convention, and following it is not enough to write Pythonic code. If you are having trouble writing Pythonic code, perhaps these notes will help.


Raymond Hettinger is a Python core developer who wrote many of the features mentioned in this article. He is also an enthusiastic evangelist in the Python community, devoted to teaching Pythonic style. These notes were compiled by Jeff Paine from Raymond’s talk at PyCon 2013.


Terminology note: “collection” in this article always means a collection in the general sense, not the set type.


The translated notes follow.




Here are notes (video, slides) from Raymond Hettinger’s 2013 PyCon talk.


The sample code and quotes are from Raymond’s talk. I have organized them according to my own understanding, and I hope you find them as smooth as I do!


Iterate over a range of numbers


for i in [0, 1, 2, 3, 4, 5]:

    print i ** 2

 

for i in range(6):

    print i ** 2


A better way


for i in xrange(6):

    print i ** 2


xrange returns a lazy object that produces the values of the range one at a time, so it uses far less memory than range, which builds the whole list up front. In Python 3, xrange was renamed range.
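
As a rough illustration of the memory point in Python 3 (where range already behaves like the old xrange), the sketch below compares the size of a range object with the size of the equivalent list. The variable names are made up for this example, and the exact byte counts depend on the interpreter.

```python
import sys

# In Python 3, range is lazy: it stores only start, stop and step.
lazy = range(1_000_000)

# Building a list materializes every value, like Python 2's range did.
eager = list(lazy)

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)
print(lazy_size < eager_size)
```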


Iterate over a collection


colors = ['red', 'green', 'blue', 'yellow']

 

for i in range(len(colors)):

    print colors[i]


A better way


for color in colors:

    print color


Reverse traversal


colors = ['red', 'green', 'blue', 'yellow']

 

for i in range(len(colors)-1, -1, -1):

    print colors[i]


A better way


for color in reversed(colors):

    print color


Iterate over a collection and its indices


colors = ['red', 'green', 'blue', 'yellow']

 

for i in range(len(colors)):

    print i, '-->', colors[i]


A better way


for i, color in enumerate(colors):

    print i, '-->', color


This is efficient, elegant, and saves you from creating and incrementing the index yourself.


Whenever you find yourself manipulating indices in a collection, you are probably doing something wrong.



Iterate over two collections


names = ['raymond', 'rachel', 'matthew']

colors = ['red', 'green', 'blue', 'yellow']

 

n = min(len(names), len(colors))

for i in range(n):

    print names[i], '-->', colors[i]

 

for name, color in zip(names, colors):

    print name, '-->', color


A better way


for name, color in izip(names, colors):

    print name, '-->', color


zip builds a new list in memory, which costs extra memory. izip (from itertools) is more efficient than zip.


Note: In Python 3, izip was renamed zip and replaced the original zip as the built-in function.
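
A small Python 3 sketch of that behaviour, reusing the lists from the example above: zip is lazy, and it stops as soon as the shorter input runs out. The names pairs, first and rest are only for this illustration.

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

# In Python 3, zip returns a lazy iterator; no list is built here.
pairs = zip(names, colors)

# Values are produced on demand.
first = next(pairs)

# 'yellow' never appears: zip stops at the end of the shorter input.
rest = list(pairs)
print(first, rest)
```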


Iterate in sorted order


colors = ['red', 'green', 'blue', 'yellow']



# Forward sorted order

for color in sorted(colors):

    print color



# Reverse sorted order

for color in sorted(colors, reverse=True):

    print color


Custom sort order


colors = ['red', 'green', 'blue', 'yellow']

 

def compare_length(c1, c2):

    if len(c1) < len(c2): return -1

    if len(c1) > len(c2): return 1

    return 0

 

print sorted(colors, cmp=compare_length)


A better way


print sorted(colors, key=len)


The first approach is slow and awkward to write. In addition, Python 3 removed the cmp argument, so comparison functions are no longer supported directly.
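
If an old comparison function has to be kept, one option in Python 3 is functools.cmp_to_key, which wraps it as a key function. The sketch below reuses compare_length from the example above and checks that both routes give the same order; by_key and by_cmp are names made up for this illustration.

```python
from functools import cmp_to_key

colors = ['red', 'green', 'blue', 'yellow']


def compare_length(c1, c2):
    # Old-style comparison function: negative, zero or positive result.
    if len(c1) < len(c2):
        return -1
    if len(c1) > len(c2):
        return 1
    return 0


# Preferred: a key function.
by_key = sorted(colors, key=len)

# Fallback: adapt the legacy comparison function into a key.
by_cmp = sorted(colors, key=cmp_to_key(compare_length))

print(by_key == by_cmp)
```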


Call a function until a sentinel value is encountered


blocks = []

while True:

    block = f.read(32)

    if block == '':

        break

    blocks.append(block)


A better way


blocks = []

for block in iter(partial(f.read, 32), ''):

    blocks.append(block)


iter takes two arguments. The first is a function that is called over and over, and the second is a sentinel value that ends the iteration when the function returns it. partial comes from functools.


In this example the benefit is not obvious, and partial arguably makes the code less readable. The advantage of the second approach is that iter returns an iterator, and iterators can be used in many places: set, sorted, min, max, heapq, sum…
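
A self-contained Python 3 sketch of the two-argument form of iter, assuming an in-memory io.BytesIO stream as a stand-in for the file f (in binary mode, read returns b'' at the end, so b'' is the sentinel). Because the result is an iterator, it can be passed straight to list.

```python
import io
from functools import partial

# Stand-in for a real file or socket: 70 bytes of data.
f = io.BytesIO(b'x' * 70)

# Call f.read(32) repeatedly until it returns the sentinel b''.
blocks = list(iter(partial(f.read, 32), b''))

sizes = [len(block) for block in blocks]
print(sizes)
```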


Distinguish multiple exit points in a loop


def find(seq, target):

    found = False

    for i, value in enumerate(seq):

        if value == target:

            found = True

            break

    if not found:

        return -1

    return i


A better way


def find(seq, target):

    for i, value in enumerate(seq):

        if value == target:

            break

    else:

        return -1

    return i


The else block runs after the for loop has completed all of its iterations without hitting break.


If you are new to the for-else syntax, it can be confusing when the else runs. There are two ways to think about it. The traditional way is to treat for like if: the else runs when the condition behind the for becomes False. When that condition is False, the loop has run through all of its iterations without being broken out of. So the other way is to think of the else as nobreak: when the for loop is not interrupted by break, it finishes by running the else.
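
The “nobreak” reading can be checked with a tiny sketch. loop_exit is a hypothetical helper written only for this illustration; it reports which way the loop ended.

```python
def loop_exit(values, stop_at):
    """Report whether the loop ended via break or ran to completion."""
    for value in values:
        if value == stop_at:
            break
    else:
        # Reached only when the loop was never interrupted by break,
        # which includes the case of an empty sequence.
        return 'no break'
    return 'break'


print(loop_exit([1, 2, 3], 2))
print(loop_exit([1, 2, 3], 9))
```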


Iterate over dictionary keys


d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

 

for k in d:

    print k

 

for k in d.keys():

    if k.startswith('r'):

        del d[k]


When should you use the second method rather than the first? When you need to change your dictionary.


If you mutate something while you are iterating over it, you are asking for trouble and deserve whatever happens next.


d.keys() copies all of the dictionary’s keys into a list, so you can then modify the dictionary.


Note: in Python 3 you have to write list(d.keys()) explicitly if you modify the dictionary while looping, because d.keys() returns a “dictionary view” (an iterable that provides a dynamic view of the dictionary’s keys). See the documentation for details.
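
A Python 3 sketch of both cases, using the dictionary from the example above. The first loop deletes while iterating over the live dictionary and fails with a RuntimeError; the second takes a snapshot of the keys first. The names unsafe and raised are made up for this illustration.

```python
unsafe = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
raised = False
try:
    # Iterating over the live dictionary while deleting from it.
    for k in unsafe:
        if k.startswith('r'):
            del unsafe[k]
except RuntimeError:
    # CPython detects the size change and aborts the loop.
    raised = True

d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}
# list(d.keys()) snapshots the keys, so deleting inside the loop is safe.
for k in list(d.keys()):
    if k.startswith('r'):
        del d[k]

print(raised, d)
```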


Iterate over the keys and values of a dictionary


# Not very fast: each iteration has to rehash the key and do a lookup

for k in d:

    print k, '-->', d[k]

 

# Builds a large list of key-value pairs

for k, v in d.items():

    print k, '-->', v


A better way


for k, v in d.iteritems():

    print k, '-->', v


iteritems() is better because it returns an iterator.


Note: Python 3 no longer has iteritems(); its items() behaves much like iteritems(). See the documentation for details.


Build a dictionary with key-value pairs


names = ['raymond', 'rachel', 'matthew']

colors = ['red', 'green', 'blue']

 

d = dict(izip(names, colors))

# {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}


Python 3: d = dict(zip(names, colors))


Count with a dictionary


colors = ['red', 'green', 'red', 'blue', 'green', 'red']

 

# Simple, basic counting method. Suitable for beginners to start learning.

d = {}

for color in colors:

    if color not in d:

        d[color] = 0

    d[color] += 1

 

# {'blue': 1, 'green': 2, 'red': 3}


A better way


d = {}

for color in colors:

    d[color] = d.get(color, 0) + 1

 

# A slightly more modern way, but it has some pitfalls; better suited to experienced users.

d = defaultdict(int)

for color in colors:

    d[color] += 1
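
The text does not spell out the pitfalls. One that is easy to demonstrate (an assumption about what is meant) is that merely looking up a missing key in a defaultdict inserts it. A Python 3 sketch, with before, after and plain as names made up for this illustration:

```python
from collections import defaultdict

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

d = defaultdict(int)
for color in colors:
    d[color] += 1

# Three distinct colors so far.
before = len(d)

# Reading a missing key silently creates it with the default value 0.
_ = d['purple']
after = len(d)

# Converting to a regular dict stops further automatic insertion.
plain = dict(d)
print(before, after, plain)
```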


Grouping with dictionaries – Parts I and II


names = ['raymond', 'rachel', 'matthew', 'roger',

         'betty', 'melissa', 'judith', 'charlie']

 

# In this example, we group by the length of name

d = {}

for name in names:

    key = len(name)

    if key not in d:

        d[key] = []

    d[key].append(name)

 

# {5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}

 

d = {}

for name in names:

    key = len(name)

    d.setdefault(key, []).append(name)


A better way


d = defaultdict(list)

for name in names:

    key = len(name)

    d[key].append(name)


Is the dictionary popitem() atomic?


d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

 

while d:

    key, value = d.popitem()

    print key, '-->', value


popitem is atomic, so there is no need to wrap it in a lock when using it from multiple threads.


Link dictionaries


defaults = {'color': 'red', 'user': 'guest'}

parser = argparse.ArgumentParser()

parser.add_argument('-u', '--user')

parser.add_argument('-c', '--color')

namespace = parser.parse_args([])

command_line_args = {k: v for k, v in vars(namespace).items() if v}

 

# The usual approach: start with the defaults dictionary, override it with environment variables, and finally override that with command-line arguments.

# Unfortunately, this approach copies data like crazy.

d = defaults.copy()

d.update(os.environ)

d.update(command_line_args)


A better way


d = ChainMap(command_line_args, os.environ, defaults)


ChainMap was added in Python 3.3. It is fast and elegant.
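
A Python 3 sketch of the lookup order. The environ and command_line_args dictionaries here are hypothetical stand-ins for os.environ and the parsed arguments. Lookups search the mappings from left to right, and nothing is copied, so later changes to an underlying dictionary show through.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
# Hypothetical stand-in for os.environ.
environ = {'user': 'alice'}
# Hypothetical stand-in for the parsed command-line arguments.
command_line_args = {'color': 'blue'}

d = ChainMap(command_line_args, environ, defaults)

# Found in command_line_args first.
color = d['color']
# Not on the command line, so the environment wins over defaults.
user = d['user']

# ChainMap holds references, not copies: new defaults are visible at once.
defaults['editor'] = 'vi'
editor = d['editor']
print(color, user, editor)
```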


Improve readability


  • Positional arguments and indices are nice

  • But keywords and names are better

  • The first method is convenient for computers

  • The second approach is consistent with the way humans think


Use keyword arguments to improve readability of function calls


twitter_search('@obama', False, 20, True)


A better way


twitter_search('@obama', retweets=False, numtweets=20, popular=True)


The second approach is slightly slower (on the order of microseconds), but it is worth it for code readability and developer time.


Use namedtuple to improve readability of multiple return values


# Old testmod return value

doctest.testmod()

# (0, 4)

Is the test result good or bad? You can’t tell because the return value is not clear.


A better way


# New testmod return value, a namedtuple

doctest.testmod()

# TestResults(failed=0, attempted=4)


namedtuple is a subclass of tuple, so all the normal tuple operations still work, but it is friendlier to use.


Create a namedtuple


TestResults = namedtuple('TestResults', ['failed', 'attempted'])
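
A Python 3 sketch showing that such a namedtuple still behaves like a plain tuple while also offering named fields. The sample values mirror the testmod output shown earlier.

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])

result = TestResults(failed=0, attempted=4)

# Named access reads better than result[0] and result[1].
summary = '%d of %d failed' % (result.failed, result.attempted)

# It is still a tuple: unpacking and comparison work as usual.
failed, attempted = result
print(result, summary)
```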


Unpack the sequence


p = 'Raymond', 'Hettinger', 0x30, '[email protected]'

 

# The usual approach, a habit carried over from other languages

fname = p[0]

lname = p[1]

age = p[2]

email = p[3]


A better way


fname, lname, age, email = p


The second approach uses tuple unpacking, which is faster and more readable.


Update the state of multiple variables


def fibonacci(n):

    x = 0

    y = 1

    for i in range(n):

        print x

        t = y

        y = x + y

        x = t


A better way


def fibonacci(n):

    x, y = 0, 1

    for i in range(n):

        print x

        x, y = y, x + y


Problems with the first approach


  • x and y are state, and state should be updated in a single operation; spread over several lines it can get out of sync, which is a common source of bugs.

  • The order of the operations matters

  • Too low-level, too detailed


The second approach works at a higher level of abstraction, carries no risk of getting the order wrong, and is more efficient.
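
The ordering risk can be made concrete with a Python 3 sketch. Both functions return lists instead of printing so the results can be compared, and fibonacci_wrong_order is a deliberately buggy variant written only for this illustration.

```python
def fibonacci(n):
    """Tuple assignment: the right side is evaluated before any rebinding."""
    x, y = 0, 1
    out = []
    for _ in range(n):
        out.append(x)
        x, y = y, x + y
    return out


def fibonacci_wrong_order(n):
    """Deliberately buggy: x is overwritten before y is computed from it."""
    x, y = 0, 1
    out = []
    for _ in range(n):
        out.append(x)
        x = y
        y = x + y
    return out


good = fibonacci(6)
bad = fibonacci_wrong_order(6)
print(good, bad)
```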


Simultaneous state updates


tmp_x = x + dx * t

tmp_y = y + dy * t

tmp_dx = influence(m, x, y, dx, dy, partial='x')

tmp_dy = influence(m, x, y, dx, dy, partial='y')

x = tmp_x

y = tmp_y

dx = tmp_dx

dy = tmp_dy


A better way


x, y, dx, dy = (x + dx * t,

                y + dy * t,

                influence(m, x, y, dx, dy, partial='x'),

                influence(m, x, y, dx, dy, partial='y'))


Efficiency


  • A fundamental rule of optimization

  • Don’t move data around unless you have to

  • It takes only a little care to replace O(n**2) operations with linear ones


In general, don’t move data around without a reason.


Concatenate strings


names = ['raymond', 'rachel', 'matthew', 'roger',

         'betty', 'melissa', 'judith', 'charlie']

 

s = names[0]

for name in names[1:]:

    s += ', ' + name

print s


A better way


print ', '.join(names)


Update sequences


names = ['raymond', 'rachel', 'matthew', 'roger',

         'betty', 'melissa', 'judith', 'charlie']

 

del names[0]

# The following calls are a sign that you are using the wrong data structure

names.pop(0)

names.insert(0, 'mark')


A better way


names = deque(['raymond', 'rachel', 'matthew', 'roger',

               'betty', 'melissa', 'judith', 'charlie'])

 

# deque (from collections) is more efficient for these operations

del names[0]

names.popleft()

names.appendleft('mark')


Decorators and context managers


  • Help separate business logic from administrative logic

  • Clean, elegant tools for factoring code and improving code reuse

  • Good naming is essential

  • Remember the motto of Spider-Man: With great power comes great responsibility


Use decorators to separate out administrative logic


# Mixes business logic with administrative logic; not reusable

def web_lookup(url, saved={}):

    if url in saved:

        return saved[url]

    page = urllib.urlopen(url).read()

    saved[url] = page

    return page


A better way


@cache

def web_lookup(url):

    return urllib.urlopen(url).read()


Note: functools.lru_cache was added in Python 3.2 to solve this problem.
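
The @cache decorator used above is not defined in these notes. A minimal sketch of one possible implementation follows, assuming positional, hashable arguments only; slow_square and the calls list are hypothetical stand-ins for web_lookup so that the example runs without network access.

```python
from functools import wraps


def cache(func):
    """Remember results by positional arguments (must be hashable)."""
    saved = {}

    @wraps(func)
    def newfunc(*args):
        if args in saved:
            return saved[args]
        result = func(*args)
        saved[args] = result
        return result

    return newfunc


# Records every real invocation of the wrapped function.
calls = []


@cache
def slow_square(x):
    # Hypothetical stand-in for an expensive call such as web_lookup.
    calls.append(x)
    return x * x


first = slow_square(4)
# The second call is answered from the cache.
second = slow_square(4)
print(first, second, calls)
```

The business logic (slow_square) now contains no caching code at all, which is the separation the section describes.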


Factor out temporary contexts


# Save the old, create the new

old_context = getcontext().copy()

getcontext().prec = 50

print Decimal(355) / Decimal(113)

setcontext(old_context)


A better way


with localcontext(Context(prec=50)):

    print Decimal(355) / Decimal(113)


The sample code uses the standard library’s decimal module, which already implements localcontext.


How to open and close files


f = open('data.txt')

try:

    data = f.read()

finally:

    f.close()


A better way


with open('data.txt') as f:

    data = f.read()


How to use locks


# Create a lock

lock = threading.Lock()

 

# The old way to use a lock

lock.acquire()

try:

    print 'Critical section 1'

    print 'Critical section 2'

finally:

    lock.release()


A better way


# The new way to use a lock

with lock:

    print 'Critical section 1'

    print 'Critical section 2'


Factor out temporary contexts


try:

    os.remove('somefile.tmp')

except OSError:

    pass


A better way


with ignored(OSError):

    os.remove('somefile.tmp')


ignored was added in Python 3.4; see the documentation.


Note: in the standard library, ignored is actually named suppress.


Try creating your own ignored context manager.


@contextmanager

def ignored(*exceptions):

    try:

        yield

    except exceptions:

        pass


Put it in your utilities directory and you too can ignore exceptions.


By decorating a generator function with contextmanager from contextlib, you avoid writing __enter__ and __exit__ by hand. See the documentation for details.
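
A runnable Python 3 sketch that puts the hand-written ignored next to contextlib.suppress. It works inside a temporary directory so that no real file is touched; the path missing is guaranteed not to exist there.

```python
import os
import tempfile
from contextlib import contextmanager, suppress


@contextmanager
def ignored(*exceptions):
    # Code before yield plays the role of __enter__; the except
    # clause plays the role of __exit__.
    try:
        yield
    except exceptions:
        pass


survived = False
with tempfile.TemporaryDirectory() as tmp:
    # A file that does not exist, so os.remove raises an OSError subclass.
    missing = os.path.join(tmp, 'somefile.tmp')

    # Hand-written version.
    with ignored(OSError):
        os.remove(missing)

    # Standard library equivalent since Python 3.4.
    with suppress(OSError):
        os.remove(missing)

    survived = True

print(survived)
```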


Factor out temporary contexts


# Temporarily redirect standard output to a file, then restore it

with open('help.txt', 'w') as f:

    oldstdout = sys.stdout

    sys.stdout = f

    try:

        help(pow)

    finally:

        sys.stdout = oldstdout


A better way to write it


with open('help.txt', 'w') as f:

    with redirect_stdout(f):

        help(pow)


redirect_stdout was added in Python 3.4; see the bug report.


Implement your own redirect_stdout context manager.


@contextmanager

def redirect_stdout(fileobj):

    oldstdout = sys.stdout

    sys.stdout = fileobj

    try:

        yield fileobj

    finally:

        sys.stdout = oldstdout
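
With Python 3.4 or later, the standard library version can be tried without writing a file. The sketch below assumes an in-memory io.StringIO buffer in place of help.txt.

```python
import io
from contextlib import redirect_stdout

# In-memory stand-in for the file used in the example above.
buffer = io.StringIO()

with redirect_stdout(buffer):
    # Goes into the buffer, not to the terminal.
    print('captured')

# Standard output is back to normal here.
print('visible')

captured = buffer.getvalue()
```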


Concise, expressive one-liners


Two conflicting principles:


  • Don’t have too much logic on one line

  • Don’t break a single idea into multiple parts


Raymond’s principles:


  • One logical line of code should be equivalent to one sentence of natural language


List comprehensions and generator expressions


result = []

for i in range(10):

    s = i ** 2

    result.append(s)

print sum(result)


A better way


print sum(i**2 for i in xrange(10))


The first way spells out how to do it, step by step; the second way states what you want.
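
The same comparison in Python 3, where range replaces xrange. Both forms give the same total; loop_total and genexp_total are names made up for this illustration.

```python
# Step-by-step version: build a list, then sum it.
result = []
for i in range(10):
    s = i ** 2
    result.append(s)
loop_total = sum(result)

# Generator expression: no intermediate list is built.
genexp_total = sum(i ** 2 for i in range(10))

print(loop_total, genexp_total)
```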