Python list comprehensions and implementing a priority queue

This is a reading note on Fluent Python. It covers lists, list comprehensions, and finally how to implement a priority queue using a list.

Python’s built-in sequence types

The Python standard library provides a rich set of sequence types implemented in C:

Container sequences:

list, tuple, and collections.deque can hold items of different types.

Flat sequences:

str, bytes, bytearray, memoryview, and array.array hold items of a single type.

Container sequences store references to the objects they contain, which may be of any type, whereas flat sequences store the values themselves rather than references (a flat sequence is really one contiguous block of memory).
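
A minimal sketch of this difference, using only the standard library (the variable names are illustrative, not from the book): a list accepts objects of any type, while an array.array created with type code 'd' accepts only numbers that can be stored as C doubles.

```python
from array import array

# A container sequence: each slot holds a reference to an arbitrary object.
mixed = [1, 'two', 3.0, (4, 5)]

# A flat sequence: type code 'd' means every item is stored as a C double.
flat = array('d', [1.0, 2.0, 3.0])

try:
    flat.append('four')   # a str cannot be stored as a double
except TypeError as exc:
    error = exc

print(type(error).__name__)
```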

Sequences can also be classified by whether they can be modified, which divides them into mutable and immutable sequences:

Mutable sequences

list, bytearray, array.array, collections.deque, and memoryview.

Immutable sequences

tuple, str, and bytes.

The following figure shows the difference between mutable and immutable sequences:

As the diagram shows, mutable sequences inherit some methods from immutable sequences.
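
The relationship can also be checked in code. The following is a small sketch that assumes the abstract base classes in collections.abc are the ones the diagram depicts; the variable names are illustrative.

```python
from collections import abc

# MutableSequence is a subclass of Sequence, so it inherits Sequence's methods.
print(issubclass(abc.MutableSequence, abc.Sequence))

# list counts as a mutable sequence; tuple only as a (read-only) sequence.
print(isinstance([], abc.MutableSequence))
print(isinstance((), abc.Sequence))
print(isinstance((), abc.MutableSequence))

numbers = [1, 2, 3]
numbers[0] = 10            # fine: lists support item assignment

point = (1, 2, 3)
try:
    point[0] = 10          # tuples do not support item assignment
except TypeError as exc:
    tuple_error = exc
```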

List comprehensions and generator expressions

The list is the most basic sequence type in Python. A list is a mutable sequence that can hold elements of different types at the same time. Basic list usage is not covered here; the focus is on list comprehensions.

List comprehensions and readability

A list comprehension is a shortcut for building a list, and it is often more readable than the equivalent loop. Let's start with the following two pieces of code:

#1. Turn a string into a list of Unicode code points

>>> symbols = '$&@#%^&*'
>>> codes = []
>>> for symbol in symbols:
...     codes.append(ord(symbol))
...
>>> codes
[36, 38, 64, 35, 37, 94, 38, 42]

#2. Turn a string into a list of Unicode code points using a list comprehension

>>> symbols = '$&@#%^&*'
>>> codes = [ord(s) for s in symbols]
>>> codes
[36, 38, 64, 35, 37, 94, 38, 42]

By comparison, the second snippet is more concise and more readable once you understand list comprehensions. Of course, list comprehensions should not be abused. The general rule is to use them only to create new lists, and to keep them short. If a list comprehension runs beyond two lines, consider rewriting it as a for loop.

NOTE

List comprehensions leak their loop variable in Python 2

# Python 2 example

>>> x = 'my precious'
>>> dummy = [x for x in 'ABC']
>>> x
'C'

Here the original value of x is overwritten by the last value produced in the list comprehension, which is something to watch out for. The good news is that Python 3 fixes this.

# Python 3 example

>>> x = 'ABC'
>>> dummy = [ord(x) for x in x]
>>> x
'ABC'
>>> dummy
[65, 66, 67]

As you can see, the original value of x is preserved here, and the list comprehension creates the correct list.

The Cartesian product

List comprehensions can also generate the Cartesian product of two or more iterables.

The Cartesian product is a list of tuples, each made by pairing one element from each input iterable, so the length of the resulting list equals the lengths of the input iterables multiplied together, as shown in the figure:

# Use a list comprehension to compute a Cartesian product

>>> suits = ['spades', 'diamonds', 'clubs', 'hearts']
>>> nums = ['A', 'K', 'Q']
>>> cards = [(num, suit) for num in nums for suit in suits]
>>> cards
[('A', 'spades'),
 ('A', 'diamonds'),
 ('A', 'clubs'),
 ('A', 'hearts'),
 ('K', 'spades'),
 ('K', 'diamonds'),
 ('K', 'clubs'),
 ('K', 'hearts'),
 ('Q', 'spades'),
 ('Q', 'diamonds'),
 ('Q', 'clubs'),
 ('Q', 'hearts')]

The result here is ordered by rank first, then by suit. To order by suit first and then by rank, just swap the order of the two for clauses.
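
As a sketch of that reordering, the snippet below swaps the for clauses and, as an aside that is not part of the original note, checks the result against itertools.product from the standard library. The names by_suit and same_order are illustrative.

```python
from itertools import product

suits = ['spades', 'diamonds', 'clubs', 'hearts']
nums = ['A', 'K', 'Q']

# Outer loop over suits, inner loop over ranks: cards are grouped by suit first.
by_suit = [(num, suit) for suit in suits for num in nums]

# product() varies its last argument fastest, so product(suits, nums)
# walks the pairs in the same order as the comprehension above.
same_order = [(num, suit) for suit, num in product(suits, nums)]

print(by_suit[:4])
print(by_suit == same_order)
```

The length of either list is len(suits) * len(nums), which is 12 here.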

Filtering sequence elements

Problem: You have a sequence of data and want to extract the desired values, or shorten the sequence, according to some rule.

The simplest way to filter sequence elements is often a list comprehension. For example:

>>> mylist = [1, 4, -5, 10, -7, 2, 3, -1]
>>> [n for n in mylist if n > 0]
[1, 4, 10, 2, 3]

One potential drawback of a list comprehension is that a very large input can produce a very large result list, consuming a lot of memory. In that case, filtering the elements lazily with a generator expression is a good choice.

Generator expressions

Generator expressions follow the iterator protocol and produce elements one at a time, rather than building a complete list first and then passing it to a constructor.

The syntax of a generator expression is the same as that of a list comprehension, except that the square brackets are replaced with parentheses.

# Use a generator expression to filter elements

>>> pos = (n for n in mylist if n > 0)
>>> pos
<generator object <genexpr> at 0x1006a0eb0>
>>> for x in pos:
...     print(x)
...
1
4
10
2
3
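
The memory point made earlier can be illustrated with a rough sketch. It assumes sys.getsizeof is an acceptable proxy here: it reports only the size of the list or generator object itself, not of the integers it refers to, and the exact numbers vary by Python version and platform.

```python
import sys

numbers = range(100_000)

squares_list = [n * n for n in numbers]   # builds every element up front
squares_gen = (n * n for n in numbers)    # computes nothing until iterated

list_size = sys.getsizeof(squares_list)   # grows with the number of elements
gen_size = sys.getsizeof(squares_gen)     # small, independent of input size
print(list_size, gen_size)

# The generator can still feed any consumer that accepts an iterable.
total = sum(squares_gen)
```

Note that a generator can be consumed only once: after sum() has run, squares_gen is exhausted.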

If a generator expression is the only argument in a function call, it does not need its own extra pair of parentheses. For example:

tuple(n for n in mylist)

If the generator expression is one of several arguments in a function call, the parentheses are required. For example:

>>> import array
>>> array.array('l', (n for n in mylist if n > 0))
array('l', [1, 4, 10, 2, 3])

Implementing a priority queue

The problem

How do you implement a queue that orders its elements by priority, so that every pop operation returns the element with the highest priority?

The solution

Use the heapq module

heapq is a module in the Python standard library (source code: Lib/heapq.py) that provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.

The logical structure of a heap is a complete binary tree in which the value of each parent node is less than or equal to the values of its children. In the list implementation this is expressed as heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k (where k is the index, counting from 0). The smallest element of the heap is therefore always the root, heap[0].

A heap can be built up starting from an empty list, or an existing list can be turned into a heap in place with heapq.heapify().
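
The heap property can be checked directly. In the sketch below, is_min_heap is a hypothetical helper written for this note; it is not part of heapq.

```python
import heapq

data = [9, 4, 7, 1, 8, 2]
heapq.heapify(data)        # rearranges the list in place and returns None

def is_min_heap(heap):
    """Hypothetical helper: check heap[k] <= heap[2k+1] and heap[k] <= heap[2k+2]."""
    n = len(heap)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and heap[k] > heap[child]:
                return False
    return True

print(is_min_heap(data))
print(data[0] == min(data))
```

A heap is not a fully sorted list: only the parent/child relationship is guaranteed, which is enough to keep the smallest element at index 0.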

Some of the functions heapq offers are:

heap = []  # creates an empty heap
heapq.heappush(heap, item): pushes item onto the heap
heapq.heappop(heap): removes and returns the root, which is the smallest element in the heap
heapq.heappushpop(heap, item): pushes item onto the heap, then removes and returns the smallest element
heapq.heapify(x): turns the list x into a heap, in place
heapq.nlargest(n, iterable, key=None): returns a list of the n largest elements of iterable; key, if given, is a one-argument function that extracts the comparison key from each element
heapq.nsmallest(n, iterable, key=None): the same, but returns the n smallest elements
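
A short sketch exercising these functions; the sample values and the scores list are made up for illustration.

```python
import heapq

heap = []
for value in [5, 3, 8, 1]:
    heapq.heappush(heap, value)

smallest = heapq.heappop(heap)               # removes the root of the heap
pushed_popped = heapq.heappushpop(heap, 4)   # pushes 4, then pops the smallest

# nlargest/nsmallest accept any iterable; key selects what to compare.
scores = [('ann', 72), ('bob', 95), ('cy', 88), ('dee', 60)]
top_two = heapq.nlargest(2, scores, key=lambda pair: pair[1])
bottom_one = heapq.nsmallest(1, scores, key=lambda pair: pair[1])

print(smallest, pushed_popped, sorted(heap))
print(top_two, bottom_one)
```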

The implementation is as follows:

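import heapq

class PriorityQueue:
    def __init__(self):
        self._queue = []   # the heap, stored as a plain list
        self._index = 0    # insertion counter, used to break priority ties

    def push(self, item, priority):
        # Negate the priority so the highest priority sits at the heap root
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        # The item is the last field of the (-priority, index, item) tuple
        return heapq.heappop(self._queue)[-1]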

Here is how it is used:

>>> class Item:
...     def __init__(self, name):
...         self.name = name
...     def __repr__(self):
...         return 'Item({!r})'.format(self.name)
...
>>> q = PriorityQueue()
>>> q.push(Item('foo'), 1)
>>> q.push(Item('bar'), 5)
>>> q.push(Item('spam'), 4)
>>> q.push(Item('grok'), 1)
>>> q.pop()
Item('bar')
>>> q.pop()
Item('spam')
>>> q.pop()
Item('foo')
>>> q.pop()
Item('grok')

The results show that the first pop() returns the element with the highest priority. For two elements with the same priority (foo and grok), pop() returns them in the order in which they were inserted into the queue.

The functions heapq.heappush() and heapq.heappop() insert items into and remove items from the list _queue in such a way that the first item in the list always has the smallest value. heappop() always returns that smallest item, which is the key to making the queue's pop operation return the correct element. In addition, because push and pop are O(log N) in time, where N is the size of the heap, they stay fast even when N is very large.

In the code above, the queue holds tuples of the form (-priority, index, item). The priority is negated so that elements come out from highest to lowest priority, the opposite of the normal heap order, which runs from lowest to highest value.

The index variable ensures that elements with the same priority are ordered correctly. Because the index keeps increasing, elements with equal priority are returned in the order in which they were inserted. The index also plays an important role when two tuples with the same priority are compared, as explained next.
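
To see the tuple layout at work without the class, here is a minimal sketch that pushes (-priority, index, name) tuples straight onto a list. The task names and the use of enumerate() as the insertion counter are illustrative choices for this note.

```python
import heapq

tasks = [('foo', 1), ('bar', 5), ('spam', 4), ('grok', 1)]

heap = []
for index, (name, priority) in enumerate(tasks):
    # -priority puts the largest priority at the root; index breaks ties
    # in favour of whichever entry was pushed first.
    heapq.heappush(heap, (-priority, index, name))

# Pop everything; the name is the last field of each tuple.
order = [heapq.heappop(heap)[-1] for _ in range(len(heap))]
print(order)
```

The names come out as bar, spam, foo, grok: highest priority first, with the two priority-1 entries in insertion order.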

The ordering above relies on the fact that tuples support comparison:

>>> a = (1, Item('foo'))
>>> b = (5, Item('bar'))
>>> a < b
True
>>> c = (1, Item('grok'))
>>> a < c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: Item() < Item()

When the first values are equal, Python goes on to compare the Item instances, and a TypeError is raised because Item does not support ordering comparisons (Python 3.6 and later word the message as '<' not supported between instances of 'Item' and 'Item'). To avoid this error we introduce the index, since no two elements can have the same index value, making each entry a (priority, index, item) triple. The comparison now never reaches the items:

>>> a = (1, 0, Item('foo'))
>>> b = (5, 1, Item('bar'))
>>> c = (1, 2, Item('grok'))
>>> a < b
True
>>> a < c
True

To sum up, this note covered lists, list comprehensions, and finally how to implement a priority queue using heapq and a list. The next article introduces tuples.

Reference links

Heap queue algorithm

Finally, thanks to my girlfriend for her support.

You are welcome to follow me (April_Louisa).