In the first two articles on Python slicing, we learned the basics of slicing, advanced usage, pitfalls, and how custom objects implement slicing (see below for links). This article, the third in the slicing series, focuses on iterator slicing.

Iterators are an advanced feature of Python, and so is slicing. What do you get when you combine the two?

1. Iteration and iterators

First, a few basic concepts need to be clarified: iteration, iterable, iterator.

Iteration is a way of traversing container-type objects (such as strings, lists, and dictionaries). For example, iterating over the string "abc" means taking its characters one by one, from left to right. (PS: the Chinese word for "iteration" suggests going round and round, but in Python it means a one-way, linear pass; if the term is unfamiliar, just think of it as traversal.)

So how do we write iteration in code? The most common syntax is the for loop.

# A for loop implements the iteration
for char in "abc":
    print(char, end=" ")
# Output: a b c

However, not every object can be used in a for loop. For example, if the string "abc" above is replaced by an integer, Python raises an error: TypeError: 'int' object is not iterable.

The word "iterable" in this error message means "able to be iterated over", so the message says that an int cannot be iterated. The string type is iterable, as are lists, tuples, dictionaries, and so on.

So how do you tell whether an object is iterable? What makes these objects iterable? And how do you make an object iterable?

To make an object iterable, it must implement the iterable protocol, that is, the __iter__() magic method. In other words, any object that implements this magic method is an iterable.
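As a minimal sketch of the iterable protocol, the class below implements only __iter__() and can therefore be used in a for loop. The class name Countdown and its start parameter are hypothetical, chosen purely for illustration.

```python
class Countdown:
    """Hypothetical example class: iterable because it defines __iter__()."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a fresh iterator on every call, so the object
        # can be looped over more than once.
        return iter(range(self.start, 0, -1))


countdown = Countdown(3)
first_pass = [n for n in countdown]
second_pass = [n for n in countdown]
print(first_pass)   # [3, 2, 1]
print(second_pass)  # [3, 2, 1]
```

Because __iter__() hands out a new iterator each time, both passes see the same values.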

So how do you tell whether an object implements this method? Besides trying a for loop as above, I know of four other ways:

# Method 1: call dir() and look for __iter__
dir(2)      # no __iter__ in the result (output omitted)
dir("abc")  # __iter__ is in the result (output omitted)

# Method 2: isinstance() check
import collections.abc
isinstance(2, collections.abc.Iterable)      # False
isinstance("abc", collections.abc.Iterable)  # True

# Method 3: hasattr() check
hasattr(2, "__iter__")      # False
hasattr("abc", "__iter__")  # True

# Method 4: call iter() and see whether it raises an error
iter(2)      # TypeError: 'int' object is not iterable
iter("abc")  # <str_iterator at 0x1e2396d8f28>

# PS: an object is also iterable if it implements __getitem__ (the sequence protocol),
# so a complete check would look at that too. This article omits it for convenience.
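The PS about __getitem__ can be illustrated with a small sketch. The class Letters below is hypothetical; it defines __getitem__ but no __iter__, so methods 2 and 3 report False even though iter() and for loops still work on it.

```python
from collections.abc import Iterable


class Letters:
    """Hypothetical class that defines __getitem__ only (no __iter__)."""

    def __init__(self, text):
        self.text = text

    def __getitem__(self, index):
        # The IndexError raised past the end is what stops the iteration.
        return self.text[index]


letters = Letters("abc")
has_iter = hasattr(letters, "__iter__")           # False
passes_abc_check = isinstance(letters, Iterable)  # False
collected = list(iter(letters))                   # ['a', 'b', 'c']
print(has_iter, passes_abc_check, collected)
```

Of the four checks, only calling iter() catches this case, which is why it is the most reliable test.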

The most notable of these is iter(), a Python built-in function that turns an iterable into an iterator. That sentence says two things: (1) iterables and iterators are two different things; (2) an iterable can be turned into an iterator.

In fact, every iterator is an iterable, but not every iterable is an iterator. How big is the difference?

As the blue circles in the figure above show, the key difference between an ordinary iterable and an iterator can be summarized as follows: when an iterable is converted to an iterator, it loses some attributes (such as __getitem__) and gains others (such as __next__).

First, look at the added attribute, __next__. It is the key to what makes an iterator an iterator: in fact, we define an iterator as an object that implements both the __iter__ and __next__ methods.
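To make that definition concrete, here is a minimal sketch of a hand-written iterator. The class name UpTo and its limit parameter are hypothetical; the only point is that it defines both __iter__ and __next__.

```python
class UpTo:
    """Hypothetical iterator that produces 1, 2, ..., limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current >= self.limit:
            # Signal that there is nothing left to produce.
            raise StopIteration
        self.current += 1
        return self.current


counter = UpTo(3)
values = [next(counter), next(counter), next(counter)]
leftover = list(counter)
print(values)    # [1, 2, 3]
print(leftover)  # []
```

Once the three values have been taken, the same object has nothing more to give, which is why the second collection is empty.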

With this extra attribute, an iterator can iterate/traverse by itself, without relying on external for-loop syntax. I have coined two terms for these two kinds of traversal (PS: I say "traversal" for ease of understanding; it is really iteration): "other-traversal" means traversal driven by external syntax, and "self-traversal" means traversal driven by the object's own method.

Using these two terms, an iterable is an object that can be traversed from outside ("other-traversal"), and an iterator is an object that can additionally traverse itself ("self-traversal").

ob1 = "abc"
ob2 = iter("abc")
ob3 = iter("abc")

# ob1: other-traversal (driven by an external for loop)
for i in ob1:
    print(i, end=" ")   # a b c
for i in ob1:
    print(i, end=" ")   # a b c
# ob1: self-traversal
ob1.__next__()  # AttributeError: 'str' object has no attribute '__next__'

# ob2: other-traversal
for i in ob2:
    print(i, end=" ")   # a b c
for i in ob2:
    print(i, end=" ")   # no output
# ob2: self-traversal
ob2.__next__()  # raises StopIteration

# ob3: self-traversal
ob3.__next__()  # 'a'
ob3.__next__()  # 'b'
ob3.__next__()  # 'c'
ob3.__next__()  # raises StopIteration

As the examples show, the advantage of an iterator is that it supports self-traversal. It is also one-way and non-repeating: once it has been exhausted, a for loop over it produces nothing, and calling __next__() again raises StopIteration.
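It may help to see how the two kinds of traversal relate: a for loop roughly calls iter() once and then calls next() until StopIteration is raised. The function manual_for below is a hypothetical helper that sketches this equivalence; it is not how the interpreter is literally implemented.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    iterator = iter(iterable)
    items = []
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # The for loop swallows this exception and simply ends.
            break
        items.append(item)
    return items


text = "abc"
text_first = manual_for(text)   # ['a', 'b', 'c']
text_second = manual_for(text)  # ['a', 'b', 'c'] (a fresh iterator each time)

it = iter("abc")
it_first = manual_for(it)       # ['a', 'b', 'c']
it_second = manual_for(it)      # [] (the same iterator is already exhausted)
print(text_first, text_second, it_first, it_second)
```

Calling iter() on a string returns a new iterator every time, while calling it on an iterator returns the iterator itself, which explains the empty second result.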

An analogy comes to mind: an ordinary iterable is like a magazine of bullets. Traversing it is like taking the bullets out and then putting them back, so it can be traversed again and again (that is, running the for loop several times gives the same result). An iterator, on the other hand, is like a gun with a loaded, non-detachable magazine. Traversing it, whether from outside or by itself, fires the bullets; this is a consuming traversal that cannot be repeated (that is, the traversal comes to an end).

To sum up: iteration is a way of traversing elements. By implementation there are two kinds, external iteration and internal iteration: objects that support external iteration (other-traversal) are iterables, and objects that also support internal iteration (self-traversal) are iterators. By how the elements are consumed there are also two kinds, reusable iteration and one-shot iteration: ordinary iterables are reusable, while iterators are one-shot.

2. Iterator slicing

The other difference mentioned above is that an ordinary iterable loses some attributes when it is converted to an iterator, the key one being __getitem__. In Advanced Python: Slicing custom objects, I introduced this magic method and used it to implement slicing for custom objects.

So the question is: why don't iterators keep this attribute?

First, an iterator is traversed by consuming it, which makes it full of uncertainty: its length and its index-to-value mapping shrink as it goes, so getting a particular item is difficult and __getitem__ no longer makes sense. Second, forcing this attribute onto an iterator would be unreasonable; as the saying goes, a melon forced off the vine is not sweet.
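A small sketch of that uncertainty, using only built-ins: an iterator has no len(), and what is left in it depends on how much has already been consumed. The variable names are illustrative.

```python
it = iter("abc")

try:
    len(it)
    has_len = True
except TypeError:
    # str_iterator objects do not define __len__.
    has_len = False

first = next(it)      # consumes 'a'
remaining = list(it)  # only what has not been consumed yet
print(has_len, first, remaining)  # False a ['b', 'c']
```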

This raises a new question: why use iterators at all if such an important attribute (along with others not covered here) is lost?

The answer is that iterators offer powerful and useful capabilities that nothing else can replace, which is why Python designs them this way. For lack of space I won't expand on the topic here, but I will come back to it in a later article.

That's not all. The nagging question is: can we give iterators this ability anyway, so that they still support slicing?

hi = "Welcome to the public account: Python Cat"
it = iter(hi)

# Ordinary slicing
hi[-10:]  # 'Python Cat'

# Counterexample: slicing an iterator
it[-10:]  # TypeError: 'str_iterator' object is not subscriptable

Because it lacks __getitem__, an iterator cannot use ordinary slicing syntax. There are only two ways to get slicing: build our own wheel and write the logic ourselves, or find a ready-made wheel.
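For the first option, a hand-built wheel might look like the sketch below. The function take_range is hypothetical and deliberately minimal: it handles only a forward start/stop range with no step.

```python
def take_range(iterable, start, stop):
    """Yield the items at positions start..stop-1, consuming the iterator."""
    iterator = iter(iterable)
    # Skip (and throw away) the first `start` items.
    for _ in range(start):
        try:
            next(iterator)
        except StopIteration:
            return
    # Then hand out at most stop - start items.
    for _ in range(stop - start):
        try:
            yield next(iterator)
        except StopIteration:
            return


s = iter("123456789")
first_slice = list(take_range(s, 2, 6))   # ['3', '4', '5', '6']
second_slice = list(take_range(s, 2, 6))  # ['9']
print(first_slice, second_slice)
```

The second call gives a different result because the first call has already consumed six items from the same iterator.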

Python's itertools module is the wheel we are looking for; the function it provides makes iterator slicing easy.

import itertools

# Example 1: a simple iterator
s = iter("123456789")
for x in itertools.islice(s, 2, 6):
    print(x, end=" ")   # output: 3 4 5 6
for x in itertools.islice(s, 2, 6):
    print(x, end=" ")   # output: 9

# Example 2: a Fibonacci sequence iterator
class Fib:
    def __init__(self):
        self.a, self.b = 1, 1

    def __iter__(self):
        while True:
            yield self.a
            self.a, self.b = self.b, self.a + self.b
f = iter(Fib())
for x in itertools.islice(f, 2, 6):
    print(x, end=" ")  # output: 2 3 5 8
for x in itertools.islice(f, 2, 6):
    print(x, end=" ")  # output: 34 55 89 144

The islice() function of the itertools module answers the earlier question by combining iterators and slicing neatly. However, iterator slicing has more limitations than ordinary slicing. First, islice() is not a "pure function" (a pure function should follow the "same input, same output" principle, as discussed in Advice from Kenneth Reitz: Avoid Unnecessary Object-oriented Programming). Second, it only supports forward slicing and does not accept negative indexes. Both limitations come from the consumable nature of iterators.
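The negative-index limitation can be checked directly. The sketch below also shows one possible workaround for "the last n items", using collections.deque with maxlen; that workaround is an addition for illustration, not something the original article proposes, and it has to consume the whole iterator.

```python
from collections import deque
from itertools import islice

try:
    islice(iter("123456789"), -3, None)
    negative_ok = True
except ValueError:
    # islice() rejects negative start/stop/step values.
    negative_ok = False

# Workaround sketch: keep only the last 3 items while consuming everything.
last_three = list(deque(iter("123456789"), maxlen=3))
print(negative_ok, last_three)  # False ['7', '8', '9']
```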

So I can't help asking: what logic does the slicing function in itertools use? Below is the roughly equivalent Python code given in the official documentation:

import sys

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    # index interval is [0,sys.maxsize], default step is 1
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass

The index direction of islice() is restricted, but in exchange it makes it possible to slice an infinite iterator (within what the system supports). This is the most imaginative use case for iterator slicing.
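As a small sketch of slicing an infinite iterator, the example below builds an endless stream of square numbers with itertools.count and takes finite pieces of it. The names squares, first_batch and second_batch are illustrative.

```python
from itertools import count, islice

# An endless generator of square numbers: 1, 4, 9, 16, ...
squares = (n * n for n in count(1))

first_batch = list(islice(squares, 5))          # the first five squares
second_batch = list(islice(squares, 0, 10, 3))  # every third of the next ten
print(first_batch)   # [1, 4, 9, 16, 25]
print(second_batch)  # [36, 81, 144, 225]
```

Neither call ever tries to materialize the whole sequence; each one pulls only as many items as its slice needs.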

In addition, iterator slicing has a very practical application scenario: reading data in a given range of lines in a file object.

In A Guide to reading and writing Files for Python Learners, I described several ways to read a file: readline() is of limited use; read() suits small files, or cases where everything must be processed at once; readlines() is more flexible and makes line-by-line processing convenient, though it loads every line into a list, whereas iterating over the file object itself reads one line per iteration and keeps memory pressure low.

Although reading line by line has its advantages, it still goes from the first line to the last, which is inefficient if the file has thousands of lines and we only want a few specific ones (lines 1000-1009, for example). Since file objects are iterators by nature, we can use iterator slicing to cut out the range first and process it afterwards, which avoids processing the lines we do not need.

# test.txt: the third and fourth lines of the file are
# "python is a cat." and "this is the end."

from itertools import islice
with open('test.txt', 'r', encoding='utf-8') as f:
    print(hasattr(f, "__next__"))  # check whether f is an iterator
    content = islice(f, 2, 4)
    for line in content:
        print(line.strip())
# Output:
# True
# python is a cat.
# this is the end.
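The "lines 1000-1009" scenario can be sketched in a self-contained way. Here io.StringIO stands in for a real file so the example runs anywhere, and read_line_range is a hypothetical helper, not part of any library.

```python
import io
from itertools import islice


def read_line_range(file_obj, start, stop):
    """Return lines start..stop-1 (0-based) from an open text file."""
    return [line.rstrip("\n") for line in islice(file_obj, start, stop)]


# Assumption: a 2000-line in-memory "file" whose lines read "line 1", "line 2", ...
fake_file = io.StringIO("".join(f"line {n}\n" for n in range(1, 2001)))

# Lines 1000-1009 in 1-based counting are positions 999..1008.
selected = read_line_range(fake_file, 999, 1009)
print(len(selected), selected[0], selected[-1])  # 10 line 1000 line 1009
```

Note that islice() still has to read and discard the first 999 lines; what it saves is storing and processing them.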

3. Summary

An iterator is a special kind of iterable that supports both external traversal (with a for loop) and self-traversal. However, traversing it consumes it, so it cannot be reused, and for that reason an iterator does not support slicing by itself. With the help of the itertools module we can slice iterators and combine the strengths of both. The main use is to cut a fragment out of a large iterator (such as an infinite sequence or a very large file) and process exactly that part, which can greatly improve performance and efficiency.

Slice series:

Advanced Python: The Pitfalls and Advanced Uses of Slicing

Advanced Python: Slicing custom objects

Related links:

Introduction to the itertools module

Advice from Kenneth Reitz: Avoid Unnecessary Object-oriented Programming

A Guide to reading and writing Files for Python Learners

—————–

This article was originally published on the WeChat public account [Python Cat].