Introduction to Python libraries - collections: High-performance container data types

Introduction

New in Python 2.4.

Source code: Lib/collections.py and Lib/_collections.py

The collections module provides specialized container data types as alternatives to the built-in dict, list, set, and tuple.

The main types are as follows:

namedtuple(): A factory function that creates tuple subclasses with named fields. New in Python 2.6.
deque: A double-ended queue, similar to a list, with fast appends and pops at both ends. New in Python 2.4.
Counter: A dict subclass used to count hashable objects. New in Python 2.7.
OrderedDict: A dict subclass that remembers the order in which entries were added. New in Python 2.7.
defaultdict: A dict subclass that calls a factory function to supply values for missing keys. New in Python 2.5.
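
The rest of this article covers Counter, namedtuple, and ChainMap in detail. For the other three types in the list, here is a minimal sketch of their basic behavior; the variable names and sample data are made up for illustration.

```python
import collections

# deque: fast appends and pops at both ends
d = collections.deque([2, 3])
d.append(4)          # add on the right
d.appendleft(1)      # add on the left
left = d.popleft()   # remove from the left
right = d.pop()      # remove from the right
print(left, right, list(d))

# defaultdict: the factory (here, list) supplies a value for missing keys
groups = collections.defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)
print(dict(groups))

# OrderedDict: remembers insertion order and can move entries around
od = collections.OrderedDict()
od['first'] = 1
od['second'] = 2
od.move_to_end('first')
print(list(od.keys()))
```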

It also provides abstract base classes that can be used to test whether a class provides a particular interface, for example whether it is hashable or whether it is a mapping.
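
In Python 3 these abstract base classes live in the collections.abc submodule. The following short sketch shows how such interface checks might look, using a few built-in objects as examples.

```python
import collections.abc

# Check which interfaces a few built-in objects provide.
print(isinstance({}, collections.abc.Mapping))      # a dict is a mapping
print(isinstance([], collections.abc.Mapping))      # a list is not
print(isinstance('abc', collections.abc.Hashable))  # a str is hashable
print(isinstance([], collections.abc.Hashable))     # a list is not hashable
```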

Counter

A Counter is a container that keeps track of how many times each value occurs. It is similar to a bag or multiset in other languages.

Counter supports three forms of initialization. The constructor accepts a sequence of items, a dictionary containing keys and counts, or keyword arguments that map names to counts.


import collections

print(collections.Counter(['a', 'b', 'c', 'a', 'b', 'b']))
print(collections.Counter({'a': 2, 'b': 3, 'c': 1}))
print(collections.Counter(a=2, b=3, c=1))

Execution Result:

$ python3 collections_counter_init.py 
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})

Note that the printed representation lists the keys in descending order of count.

You can create an empty counter and update it:


import collections

c = collections.Counter()
print('Initial :{0}'.format(c))

c.update('abcdaab')
print('Sequence:{0}'.format(c))

c.update({'a': 1, 'd': 5})
print('Dict    :{0}'.format(c))

Execution Result:

$ python3.5 collections_counter_update.py
Initial :Counter()
Sequence:Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict    :Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})

Accessing counts


import collections

c = collections.Counter('abcdaab')

for letter in 'abcde':
    print('{0} : {1}'.format(letter, c[letter]))


Execution Result:

$ python3.5 collections_counter_get_values.py
a : 3
b : 2
c : 1
d : 1
e : 0

Note that looking up a key that does not exist does not raise KeyError; its count is reported as 0.

The elements() method lists all elements, each repeated as many times as its count:


import collections

c = collections.Counter('extremely')
c['z'] = 0
print(c)
print(list(c.elements()))


Execution Result:

$ python3.5 collections_counter_elements.py
Counter({'e': 3, 'y': 1, 'r': 1, 'x': 1, 'm': 1, 'l': 1, 't': 1, 'z': 0})
['y', 'r', 'x', 'm', 'l', 't', 'e', 'e', 'e']

Note that elements() omits any element whose count is 0 ('z' in this example).

most_common() returns the most frequent elements together with their counts.


import collections

c = collections.Counter()
with open('/etc/adduser.conf', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print('Most common:')
for letter, count in c.most_common(3):
    print('{0}: {1}'.format(letter, count))


Execution Result:


$ python3.5 collections_counter_most_common.py 
Most common:
 : 401
e: 310
s: 221

Counter also supports arithmetic and set operations. The results of both keep only the keys with positive counts.

import collections
import pprint

c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = collections.Counter('alphabet')

print('C1:')
pprint.pprint(c1)
print('C2:')
pprint.pprint(c2)

print('\nCombined counts:')
print(c1 + c2)

print('\nSubtraction:')
print(c1 - c2)

print('\nIntersection (taking positive minimums):')
print(c1 & c2)

print('\nUnion (taking maximums):')
print(c1 | c2)

Execution Result:


$ python3 collections_counter_arithmetic.py
C1:
Counter({'b': 3, 'a': 2, 'c': 1})
C2:
Counter({'a': 2, 't': 1, 'l': 1, 'e': 1, 'b': 1, 'p': 1, 'h': 1})

Combined counts:
Counter({'b': 4, 'a': 4, 'p': 1, 'e': 1, 'c': 1, 't': 1, 'l': 1, 'h': 1})

Subtraction:
Counter({'b': 2, 'c': 1})

Intersection (taking positive minimums):
Counter({'a': 2, 'b': 1})

Union (taking maximums):
Counter({'b': 3, 'a': 2, 'p': 1, 'e': 1, 'c': 1, 't': 1, 'l': 1, 'h': 1})


The examples above may give the impression that Counter can only count single characters. That is not the case: it can count any hashable object. Here is an example adapted from the standard library documentation.


from collections import Counter
import pprint
import re

cnt = Counter()

for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
pprint.pprint(cnt)
cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
pprint.pprint(cnt)

# Use a raw string for the pattern and close the file when done.
with open('/etc/adduser.conf') as f:
    words = re.findall(r'\w+', f.read().lower())
print(Counter(words).most_common(10))


Execution Result:


$ python3 collections_counter_normal.py
Counter({'blue': 3, 'red': 2, 'green': 1})
Counter({'blue': 3, 'red': 2, 'green': 1})
[('the', 27), ('is', 13), ('be', 12), ('if', 12), ('will', 12), ('user', 10), ('home', 9), ('default', 9), ('to', 9), ('users', 8)]

The first and second snippets produce the same result. The last snippet uses Counter to implement simple word-frequency statistics. This answers a typical interview question: use Python to print the 10 most frequent words in /etc/ssh/sshd_config and the number of times each occurs.
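
One possible way to package the word-count idea as a reusable function is sketched below. The function name top_words and the sample text are hypothetical; the sample stands in for the contents of a real configuration file.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text with their counts."""
    words = re.findall(r'\w+', text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample standing in for the contents of a config file.
sample = "Port 22\nPermitRootLogin no\n# port is the port to listen on\n"
print(top_words(sample, 3))
```

To apply it to a real file, read the file contents with open() inside a with block and pass the resulting string to top_words().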

Let’s look at the definition of Counter:

class collections.Counter([iterable-or-mapping]). Note that Counter is an unordered dictionary (since Python 3.7 it remembers insertion order, like dict). c['sausage'] returns 0 when the key does not exist. Setting a count to 0 does not remove the element from the counter; use del c['sausage'] to remove it.
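
The difference between a missing key, a zero count, and a deleted key can be checked with a small sketch like the following (the sample data is made up for illustration):

```python
import collections

c = collections.Counter(['sausage', 'egg', 'sausage'])

print(c['bacon'])      # missing key: reports 0 instead of raising KeyError
print('bacon' in c)    # the lookup above did not add the key

c['sausage'] = 0       # the key stays in the counter with a count of 0
print('sausage' in c)

del c['sausage']       # del removes the key entirely
print('sausage' in c)
```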

In addition to the standard dictionary methods, Counter provides:

elements(): Returns an iterator over the elements, repeating each as many times as its count. Elements with counts less than 1 are ignored.

most_common([n]): Returns a list of the n most common elements and their counts, from most to least common. If n is omitted, all elements are returned.

subtract([iterable-or-mapping]): Subtracts counts in place. Unlike the - operator, it keeps zero and negative counts.
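
A short sketch of these three methods working together, using made-up counts named stock and sold:

```python
import collections

stock = collections.Counter(a=4, b=2, c=0)
sold = collections.Counter(a=1, b=2, c=3)

stock.subtract(sold)             # in place; zero and negative counts are kept
print(stock)
print(sorted(stock.elements()))  # only elements with positive counts
print(stock.most_common(1))      # the single most common element
```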

namedtuple

Named tuples are about as memory-efficient as regular tuples, because they do not create a dictionary for each instance.
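
The absence of a per-instance dictionary can be observed directly on recent Python 3 versions. In this sketch, Point and PlainPoint are hypothetical names used only for the comparison.

```python
import collections

Point = collections.namedtuple('Point', 'x y')


class PlainPoint:
    """An ordinary class for comparison; each instance gets its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
q = PlainPoint(1, 2)

print(hasattr(p, '__dict__'))  # named tuple instance: no per-instance dict
print(hasattr(q, '__dict__'))  # ordinary instance: has one
print(isinstance(p, tuple))    # a named tuple is still a tuple
```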


import collections

Person = collections.namedtuple('Person', 'name age gender')

print('Type of Person:{0}'.format(type(Person)))

bob = Person(name='Bob', age=30, gender='male')
print('\nRepresentation: {0}'.format(bob))

jane = Person(name='Jane', age=29, gender='female')
print('\nField by name: {0}'.format(jane.name))

print('\nFields by index:')
for p in [bob, jane]:
    print('{0} is a {1} year old {2}'.format(*p))



Execution Result:


$ python3 collections_namedtuple_person.py
Type of Person:<class 'type'>

Representation: Person(name='Bob', age=30, gender='male')

Field by name: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female

As you can see from the example above, the named tuple class Person is similar to the header row of an Excel sheet: it gives each column a name, and each row of actual data is stored in an instance of the Person class. The advantage is that you can access a field in a form like jane.name, which is more intuitive than remembering the tuple index.

Note that each field name is used as an identifier inside the implementation, so it cannot be a keyword, cannot be duplicated, and must start with a letter. The following example triggers errors:


import collections

try:
    collections.namedtuple('Person', 'name class age gender')
except ValueError as err:
    print(err)

try:
    collections.namedtuple('Person', 'name age gender age')
except ValueError as err:
    print(err)

Execution Result:


$ python3 collections_namedtuple_bad_fields.py 
Type names and field names cannot be a keyword: 'class'
Encountered duplicate field name: 'age'


If you set rename=True, invalid or duplicate field names are automatically replaced with positional names (an underscore followed by the index of the field), although the resulting names are not pretty.


import collections

with_class = collections.namedtuple('Person', 'name class age gender',
                                    rename=True)
print(with_class._fields)

two_ages = collections.namedtuple('Person', 'name age gender age',
                                  rename=True)
print(two_ages._fields)


Execution Result:


$ python collections_namedtuple_rename.py
('name', '_1', 'age', 'gender')
('name', 'age', 'gender', '_3')

Definition

collections.namedtuple(typename, field_names, verbose=False) returns a named tuple class. If verbose is True, the class definition is printed. The verbose parameter was removed in Python 3.7.
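
The field_names argument can be given in several forms. A minimal sketch, assuming the same Person fields used earlier; the names from_spaces, from_commas, and from_list are only for illustration.

```python
import collections

# field_names may be a whitespace-separated string, a comma-separated string,
# or a sequence of strings; all three produce the same fields.
from_spaces = collections.namedtuple('Person', 'name age gender')
from_commas = collections.namedtuple('Person', 'name, age, gender')
from_list = collections.namedtuple('Person', ['name', 'age', 'gender'])

print(from_spaces._fields)
print(from_spaces._fields == from_commas._fields == from_list._fields)
```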

Named tuples are useful when dealing with databases, where each row of a query result can be mapped to a named tuple.
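
As a sketch of the database use case, the following example maps rows from an in-memory SQLite database to named tuples. The people table and its contents are hypothetical and exist only for this illustration.

```python
import collections
import sqlite3

Person = collections.namedtuple('Person', 'name age gender')

# In-memory database with a hypothetical "people" table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER, gender TEXT)')
conn.executemany(
    'INSERT INTO people VALUES (?, ?, ?)',
    [('Bob', 30, 'male'), ('Jane', 29, 'female')],
)

# _make() builds a Person from each row returned by the query.
rows = conn.execute('SELECT name, age, gender FROM people ORDER BY name')
people = [Person._make(row) for row in rows]
conn.close()

for person in people:
    print('{0} is {1} years old'.format(person.name, person.age))
```

The column order in the SELECT statement has to match the field order of the named tuple, since _make() assigns values by position.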

ChainMap mapping chain

Used to search multiple dictionaries.

ChainMap manages a sequence of dictionaries and searches them in order to find the value for a key.

Accessing values:

The API is similar to that of a dictionary.

collections_chainmap_read.py


import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)

print('Individual Values')
print('a = {}'.format(m['a']))
print('b = {}'.format(m['b']))
print('c = {}'.format(m['c']))
print()

print('m = {}'.format(m))
print('Keys = {}'.format(list(m.keys())))
print('Values = {}'.format(list(m.values())))
print()

print('Items:')
for k, v in m.items():
    print('{} = {}'.format(k, v))
print()

print('"d" in m: {}'.format(('d' in m)))

Execution Result:


$ python3 collections_chainmap_read.py 
Individual Values
a = A
b = B
c = C

m = ChainMap({'c': 'C', 'a': 'A'}, {'c': 'D', 'b': 'B'})
Keys = ['c', 'a', 'b']
Values = ['C', 'A', 'B']

Items:
c = C
a = A
b = B

"d" in m: False

Changing the order

collections_chainmap_reorder.py


import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)

print(m.maps)
print('c = {}\n'.format(m['c']))

# reverse the list
m.maps = list(reversed(m.maps))

print(m.maps)
print('c = {}'.format(m['c']))

Execution Result:


$ python3 collections_chainmap_reorder.py
[{'c': 'C', 'a': 'A'}, {'c': 'D', 'b': 'B'}]
c = C

[{'c': 'D', 'b': 'B'}, {'c': 'C', 'a': 'A'}]
c = D

Updating values

Changes made to an underlying dictionary are visible through the ChainMap:

collections_chainmap_update_behind.py


import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)
print('Before: {}'.format(m['c']))
a['c'] = 'E'
print('After : {}'.format(m['c']))

Execution Result:


$ python3 collections_chainmap_update_behind.py

Before: C
After : E

Values can also be set through the ChainMap directly, in which case only the first mapping in the chain is modified:

collections_chainmap_update_directly.py


import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)
print('Before:', m)
m['c'] = 'E'
print('After :', m)
print('a:', a)

Execution Result:


$ python3 collections_chainmap_update_directly.py

Before: ChainMap({'c': 'C', 'a': 'A'}, {'c': 'D', 'b': 'B'})
After : ChainMap({'c': 'E', 'a': 'A'}, {'c': 'D', 'b': 'B'})
a: {'c': 'E', 'a': 'A'}

ChainMap provides new_child(), which creates a new ChainMap with an extra dictionary at the front, so that updates do not modify the original dictionaries.

collections_chainmap_new_child.py


import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m1 = collections.ChainMap(a, b)
m2 = m1.new_child()

print('m1 before:', m1)
print('m2 before:', m2)

m2['c'] = 'E'

print('m1 after:', m1)
print('m2 after:', m2)

Execution Result:


$ python3 collections_chainmap_new_child.py
m1 before: ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})
m2 before: ChainMap({}, {'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})
m1 after: ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})
m2 after: ChainMap({'c': 'E'}, {'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

You can also pass an existing dictionary to new_child():

collections_chainmap_new_child_explicit.py


import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
c = {'c': 'E'}

m1 = collections.ChainMap(a, b)
m2 = m1.new_child(c)

print('m1["c"] = {}'.format(m1['c']))
print('m2["c"] = {}'.format(m2['c']))

Execution Result:


$ python3 collections_chainmap_new_child_explicit.py
m1["c"] = C
m2["c"] = E

Another equivalent way:


m2 = collections.ChainMap(c, *m1.maps)
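
The equivalence of the two approaches can be checked with a self-contained sketch like this one, which reuses the dictionaries from the previous example. The names via_new_child and via_constructor are only for illustration.

```python
import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
c = {'c': 'E'}

m1 = collections.ChainMap(a, b)
via_new_child = m1.new_child(c)
via_constructor = collections.ChainMap(c, *m1.maps)

# Both chains contain the same mappings in the same order.
print(via_new_child.maps == via_constructor.maps)
print(via_new_child['c'], via_constructor['c'])

# Both chains hold references to the same underlying dictionaries.
print(via_new_child.maps[0] is via_constructor.maps[0])
```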

