Column address: a Python module per week

Meanwhile, welcome to follow my wechat official account AlwaysBeta, more exciting content waiting for you.

Heapq implements a minimal heap sort algorithm for Python lists.

A heap is a tree-like data structure in which the children are in a sort relationship with their parents. The binary heap can be represented using lists or arrays such that the children of element N are at positions 2 * N + 1 and 2 * N + 2 (for zero-based indexes). This layout allows the heap to be rearranged in the right place, so there is no need to reallocate memory when data is added or removed.

Max-heap ensures that the parent level is greater than or equal to its children. Min-heap requires that the parent item be less than or equal to its children. Python’s HEAPq module implements a min-heap.

The sample data

The examples in this section use the data heapq_heapdata.py.

# heapq_heapdata.py 
# This data was generated with the random module.

data = [19.9.4.10.11]
Copy the code

The heap output uses the print heapq_showtree.py.

# heapq_showtree.py 
import math
from io import StringIO


def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = - 1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1.2)))
        else:
            row = 0
        ifrow ! = last_row: output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor(total_width / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print(output.getvalue())
    print(The '-' * total_width)
    print()
Copy the code

Create a heap

There are two basic ways to create a heap: heappush() and heapify().

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print('random :', data)
print()

for n in data:
    print('add {:>3}:'.format(n))
    heapq.heappush(heap, n)
    show_tree(heap)
    
# output
# random : [19, 9, 4, 10, 11]
# 
# add 19:
# 
# 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# add 9:
# 
# 9
# 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# add 4:
# 
# 4
9 # 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# add 10:
# 
# 4
# 10 9
# 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# add 11:
# 
# 4
# 10 9
11 # 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code

When heappush() is used, the heap order is preserved when new elements are added.

If the data is already in memory, use Heapify () to rearrange the elements in the list more efficiently.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)

# output
# random : [19, 9, 4, 10, 11]
# heapified :
# 
# 4
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code

Access the contents of the heap

Once the heap is created correctly, use heappop() to remove the element with the minimum value.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
print()

for i in range(2):
    smallest = heapq.heappop(data)
    print('pop {:>3}:'.format(smallest))
    show_tree(data)
    
# output
# random : [19, 9, 4, 10, 11]
# heapified :
# 
# 4
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# 
# pop 4:
# 
# 9
10 # 19
# 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# pop 9:
# 
# 10
19 # 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code

In this example, heapify() and heappop() are used for sorting.

To remove existing elements and replace them with new values in a single operation, use heapreplace().

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print('start:')
show_tree(data)

for n in [0.13]:
    smallest = heapq.heapreplace(data, n)
    print('replace {:>2} with {:>2}:'.format(smallest, n))
    show_tree(data)
    
# output
# start:
# 
# 4
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# replace 4 with 0:
# 
# 0
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
# 
# replace 0 with 13:
# 
# 9
10 # 19
11 # 13
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code

Replacement elements can maintain a fixed-size heap, such as a jobs queue sorted by priority.

The data extreme value of the heap

Heapq also includes two functions to examine iterable and find the range of maximum or minimum values it contains.

import heapq
from heapq_heapdata import data

print('all :', data)
print('3 largest :', heapq.nlargest(3, data))
print('from sort :', list(reversed(sorted(data)[- 3:])))
print('3 smallest:', heapq.nsmallest(3, data))
print('from sort :', sorted(data)[:3])

# output
# all : [19, 9, 4, 10, 11]
# 3 largest : [19, 11, 10]
# from sort : [19, 11, 10]
# 3 smallest: [4, 9, 10]
# from sort : [4, 9, 10]
Copy the code

Using nlargest() and nsmallest() only works for relatively small values of n > 1, but can still be useful in a few cases.

Effectively merge sort sequences

Combining several sorted sequences into a new sequence is easy for small data sets.

list(sorted(itertools.chain(*data)))
Copy the code

For large data sets, it takes up a lot of memory. Instead of sorting the entire combined sequence, merge() is used to generate a new sequence one at a time.

import heapq
import random


random.seed(2016)

data = []
for i in range(4):
    new_data = list(random.sample(range(1.101), 5))
    new_data.sort()
    data.append(new_data)

for i, d in enumerate(data):
    print('{}: {}'.format(i, d))

print('\nMerged:')
for i in heapq.merge(*data):
    print(i, end=' ')
print()

# output
# 0: [33, 58, 71, 88, 95]
# 1: [10, 11, 17, 38, 91]
# 2: [13, 18, 39, 61, 63]
# 3: [20, 27, 31, 42, 45]
# 
# Merged:
# 10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
Copy the code

Because merge() uses a heap implementation, it consumes memory based on the number of sequence elements being merged, not the number of elements in all sequences. Related documents:

Pymotw.com/3/heapq/ind…