Follow the “Water Drops and Silver Bullets” public account to get high-quality technical articles as soon as they are published. Written by a senior back-end developer with 7 years of experience, who explains technology in simple terms.

It takes about 12 minutes to read this article.

In Python development, we often hear about “containers,” “iterators,” “iterables,” and “generators.”

We often confuse these concepts. What are the connections and differences between them?

In this article, we will look at the relationship between them.

Containers

So first of all, how do we define a container?

Simply put, a container is a general name for an object that stores a group of elements. Its defining feature is that you can test whether an element is in it.

How to understand this sentence?

Quite simply, in Python we usually use in or not in to determine if an element exists/does not exist in a container.

Take this example:

print('x' in 'xyz')  # True
print('a' not in 'xyz')  # True
print(1 in [1, 2, 3])       # True
print(2 not in (1, 2, 3))  # False
print('x' not in {'a', 'b', 'c'}) # True
print('a' in {'a': 1, 'b': 2}) # True

In this example, we can see that str, list, tuple, set, and dict all support in and not in for testing whether an element is present in an instance, so we can call these types containers.

So why can these “containers” be identified by in or not in?

This is because they all implement the __contains__ method.

If we also want to customize a container, we can simply define the __contains__ method in the class as follows:

class A:

    def __init__(self):
        self.items = [1, 2]

    def __contains__(self, item):
        # Called by "item in a" and "item not in a"
        return item in self.items

a = A()
print(1 in a)   # True
print(2 in a)   # True
print(3 in a)   # False

In this example, class A defines the __contains__ method, so we can write 1 in a to test whether an element is in the container.

In other words, a class is a “container” if it implements the __contains__ method.
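As a rough cross-check of this rule, the standard library's collections.abc.Container abstract base class recognizes any class that defines __contains__. The sketch below uses a hypothetical class EvenNumbers (it is not part of the original article) that stores nothing and only answers membership tests:

```python
from collections.abc import Container


class EvenNumbers:
    """Hypothetical container: holds no data, only answers membership tests."""

    def __contains__(self, item):
        # Membership is computed on demand instead of being looked up in storage.
        return isinstance(item, int) and item % 2 == 0


evens = EvenNumbers()
print(4 in evens)                    # True
print(5 not in evens)                # True
print(isinstance(evens, Container))  # True
```

Note that EvenNumbers never inherits from Container; defining __contains__ is enough for the isinstance check to succeed.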

In addition to the use of in to determine whether elements are in the container, another common function we use during development is to output all elements in the container.

For example, for x in [1, 2, 3] visits every element in the container, one at a time.

So how does it work to output elements in this way? This is where iterators come in.

Iterators

When a for loop walks through all the data in an object, the iteration is ultimately driven by a class that implements the “iterator protocol.”

That is, an object can be called an iterator if its class implements the iterator protocol.

What is the “iterator protocol”?

In Python, implementing the iterator protocol means implementing the following two methods:

  • __iter__: returns the object itself, i.e. self
  • __next__: returns the next value on each call, and raises a StopIteration exception when there are no more elements to iterate over

Let’s look at an example of implementing the iterator protocol:

# coding: utf8

class A:
    """A implements the iterator protocol, so its instances are iterators."""
    def __init__(self, n):
        self.idx = 0
        self.n = n

    def __iter__(self):
        print('__iter__')
        return self

    def __next__(self):
        if self.idx < self.n:
            val = self.idx
            self.idx += 1
            return val
        else:
            raise StopIteration()

# Iterate over the elements
a = A(3)
for i in a:
    print(i)
# Iterating again produces no output, because an iterator can only be iterated once
for i in a:
    print(i)

# Output:
# __iter__
# 0
# 1
# 2
# __iter__

In this example, we define a class A that implements the __iter__ and __next__ methods.

The __iter__ method returns self, and the __next__ method implements the actual iteration logic.

We then execute a = A(3). When we run for i in a, we see that __iter__ is called first, and then the values returned by __next__ are printed one by one.

When executing the for loop, the actual flow looks like this:

  1. for i in a first calls iter(a), which invokes __iter__
  2. On each iteration, the __next__ method is called once and returns a value
  3. When there is no more data to iterate over, __next__ raises a StopIteration exception and for stops the iteration
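The three steps above can be sketched by hand with the built-in iter() and next() functions. This is only an approximation of what a for loop does. Countdown and manual_for are hypothetical names introduced for illustration:

```python
class Countdown:
    """Hypothetical iterator that counts down from n to 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        val = self.n
        self.n -= 1
        return val


def manual_for(iterable):
    """Roughly what `for item in iterable` does, collecting items in a list."""
    result = []
    iterator = iter(iterable)          # step 1: calls __iter__
    while True:
        try:
            item = next(iterator)      # step 2: calls __next__
        except StopIteration:
            break                      # step 3: stop once the data runs out
        result.append(item)
    return result


print(manual_for(Countdown(3)))  # [3, 2, 1]
```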

Note, however, that once for i in a has finished, iterating over a again produces no output, because the iterator is already exhausted.

If we want to get the elements every time we iterate, we can simply iterate over a new object each time:

# Iterate over a new object each time
for i in A(3):
    print(i)

Iterables

Now that we know how an “iterator” works, what is an “iterable”?

If a class is an iterator, isn’t its instance also an iterable? What is the difference between the two?

In fact, any object whose __iter__ method returns an iterator can be called an iterable.

In other words: if an object’s __iter__ method returns an iterator, then the object is “iterable”.

It doesn’t sound obvious, so let’s look at an example.

class A:
    # A is an iterator because it implements the __iter__ and __next__ methods
    def __init__(self, n):
        self.idx = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx < self.n:
            val = self.idx
            self.idx += 1
            return val
        else:
            raise StopIteration()

class B:
    # B is not an iterator, but an instance of B is an iterable,
    # because it only implements __iter__
    # __iter__ hands the iteration details over to an instance of A
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return A(self.n)

# a is both an iterator and an iterable
a = A(3)
for i in a:
    print(i)
# <__main__.A object at 0x10eb95550>
print(iter(a))

# b is not an iterator, but it is an iterable because it hands the iteration details to A
b = B(3)
for i in b:
    print(i)
# <__main__.A object at 0x10eb95450>
print(iter(b))

Looking closely at this example, we define two classes A and B, A implementing the __iter__ and __next__ methods.

B implements __iter__, not __next__, and its __iter__ return value is an instance of A.

For A:

  • A is an “iterator” because it implements the iterator protocol: __iter__ and __next__
  • At the same time, A’s __iter__ method returns the instance itself (self), that is, it returns an iterator, so an instance a of A is also an “iterable”

For B:

  • B is not an “iterator”, because it only implements __iter__ and does not implement __next__
  • Because B’s __iter__ returns an instance of A, and A is an iterator, an instance b of B is an “iterable”. In other words, B hands the iteration details over to A

In short, the iteration details of one class can be handed over to another class, as is the case with B in this example, so instances of B can only be “iterables”, not “iterators”.
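One practical consequence of this split can be sketched as follows. CountIterator and CountIterable are hypothetical stand-ins for A and B, redefined here so that the block runs on its own. An iterator is used up after one pass, while an iterable that creates a fresh iterator in __iter__ can be traversed repeatedly:

```python
class CountIterator:
    """Plays the role of A: an iterator that tracks its own position."""

    def __init__(self, n):
        self.idx = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx >= self.n:
            raise StopIteration
        val = self.idx
        self.idx += 1
        return val


class CountIterable:
    """Plays the role of B: hands iteration to a fresh CountIterator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return CountIterator(self.n)


it = CountIterator(3)
first_pass = list(it)
second_pass = list(it)   # the same iterator is already exhausted

box = CountIterable(3)
pass_one = list(box)
pass_two = list(box)     # a new iterator is created for each pass

print(first_pass, second_pass)  # [0, 1, 2] []
print(pass_one, pass_two)       # [0, 1, 2] [0, 1, 2]
```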

In fact, we see this a lot. The list, tuple, set, and dict types we use the most are just “iterables,” but not “iterators,” because they hand over the details of iteration to another class, which is the real iterator.

Look at this example to see the difference between the two.

# list is an iterable
>>> l = [1, 2]
# The iterator for list is list_iterator
>>> iter(l)
<list_iterator object at 0x1009c1c18>
# Execute __next__ of list_iterator
>>> iter(l).__next__()
1

# tuple is an iterable
>>> t = ('a', 'b')
# The iterator for tuple is tuple_iterator
>>> iter(t)
<tuple_iterator object at 0x1009c1b00>
# Execute __next__ of tuple_iterator
>>> iter(t).__next__()
'a'

# set is an iterable
>>> s = {1, 2}
# The iterator for set is set_iterator
>>> iter(s)
<set_iterator object at 0x1009c70d8>
# Execute __next__ of set_iterator
>>> iter(s).__next__()
1

# dict is an iterable
>>> d = {'a': 1, 'b': 2}
# The iterator for dict is dict_keyiterator
>>> iter(d)
<dict_keyiterator object at 0x1009c34f8>
# Execute __next__ of dict_keyiterator
>>> iter(d).__next__()
'a'

Take list as an example: for l = [1, 2], calling iter(l) returns a list_iterator object, which is the real iterator. In other words, list hands the iteration details over to list_iterator.

So list is an iterable, but it is not an iterator. The same is true for the other types tuple, set, and dict.

From this we can conclude that an iterator must be an iterable, but an iterable need not be an iterator.
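This conclusion can be checked with the abstract base classes Iterable and Iterator from the standard library's collections.abc module. The snippet is a small sketch, and the variable names are chosen only for illustration:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2]
numbers_iter = iter(numbers)

print(isinstance(numbers, Iterable))       # True
print(isinstance(numbers, Iterator))       # False
print(isinstance(numbers_iter, Iterable))  # True
print(isinstance(numbers_iter, Iterator))  # True
print(hasattr(numbers, "__next__"))        # False
```

The list passes the Iterable check but not the Iterator check, while the object returned by iter() passes both.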

Generators

What is a generator?

In fact, a generator is a special “iterator,” and it is an “iterable.”

There are two ways to create a generator:

  • Generator expression
  • Generator function

An example of creating a generator with a generator expression is as follows:

# Create a generator; its type is generator
>>> g = (i for i in range(5))
>>> g
<generator object <genexpr> at 0x101334f50>
# A generator is an iterator
>>> iter(g)
<generator object <genexpr> at 0x101334f50>
# A generator is also an iterable
>>> for i in g:
...     print(i)
...
# prints 0, 1, 2, 3, 4 (one per line)

Note that g = (i for i in range(5)) creates an object of type generator. Calling iter(g) shows that __iter__ returns the instance itself, which means the generator is an iterator. It is also an iterable.
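A short sketch of the same point in script form: a generator is its own iterator, responds to next(), and is exhausted after one pass. The name squares is an assumption made for this example:

```python
squares = (i * i for i in range(3))

same_object = iter(squares) is squares  # __iter__ returns the generator itself
first = next(squares)                   # generators support next() directly
rest = list(squares)                    # consumes whatever is left
leftover = list(squares)                # nothing remains after one full pass

print(same_object)  # True
print(first)        # 0
print(rest)         # [1, 4]
print(leftover)     # []
```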

Let’s look at creating a generator with functions:

def gen(n):
    for i in range(n):
        yield i

# Create a generator
g = gen(5)
# <generator object gen at 0x10bb46f50>
print(g)
# <class 'generator'>
print(type(g))

# Iterate over the generator
for i in g:
    print(i)
# prints 0, 1, 2, 3, 4 (one per line)

In this example, we use the yield keyword inside the function. A function that contains yield is no longer an ordinary function: calling it returns a generator instead of running the body. Functionally it matches the previous example, since iterating over the generator yields all of its data.

In general, we use yield to create a generator within a function.

But what advantage does iterating with a generator have over producing the data in the ordinary way?

To answer that, let’s compare a function that uses yield with an ordinary function that uses return:

  • In a function that uses return, the return statement is the final exit. Each call runs the body from the beginning, so it returns the same result every time
  • A function that uses yield is generally used for iteration. Each time execution reaches yield, it hands back the value after yield and pauses, but keeps its internal state. The next iteration resumes with the code after that yield and runs until it reaches yield again
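The pause-and-resume behavior described for yield can be observed with a small sketch. The countdown function and the log list are hypothetical and exist only to record when each part of the generator body actually runs:

```python
log = []  # records when each part of the generator body actually runs


def countdown(n):
    log.append("started")
    while n > 0:
        yield n                           # pause here and hand n to the caller
        log.append(f"resumed after {n}")  # runs only on the following next()
        n -= 1


gen = countdown(2)        # calling the function runs none of its body yet
before_first = list(log)
first = next(gen)         # runs up to the first yield
second = next(gen)        # resumes after that yield, runs to the next one
after_second = list(log)

print(before_first)   # []
print(first, second)  # 2 1
print(after_second)   # ['started', 'resumed after 2']
```

The local variable n keeps its value between the two next() calls, which is the retained state mentioned above.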

When we want a collection from an ordinary function, we have to build the whole collection at once and then return it:

def gen_data(n):
    # Create the whole collection at once
    return [i for i in range(n)]

However, if the collection holds a lot of data, we need to allocate a very large block of memory at once to store it.

If we produce the elements with a generator that uses yield instead, we can avoid the memory footprint problem:

def gen_data(n):
    for i in range(n):
        # Return one element at a time
        yield i

Here the data is produced by a generator, which returns one element only when the iteration reaches yield, so it never needs a very large amount of memory at once. Generators are a good fit for this kind of scenario.
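As a rough illustration of the memory difference, the sketch below compares the size of a list with the size of a generator object using sys.getsizeof. The names gen_list and gen_lazy are hypothetical. sys.getsizeof measures only the container object itself, not the integers it refers to, so the real gap is larger than the numbers suggest:

```python
import sys


def gen_list(n):
    # Builds the whole collection in memory before returning it.
    return [i for i in range(n)]


def gen_lazy(n):
    # Produces one element at a time, on demand.
    for i in range(n):
        yield i


n = 100_000
as_list = gen_list(n)
as_gen = gen_lazy(n)

list_size = sys.getsizeof(as_list)  # shallow size of the list object
gen_size = sys.getsizeof(as_gen)    # size of the generator object
same_total = sum(as_list) == sum(as_gen)

print(list_size > gen_size)  # True
print(same_total)            # True
```

Both versions produce the same elements, but the generator's size does not grow with n.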

In fact, generators have a lot of use in Python, which I’ll examine in more detail in a later article when I cover yield.

Conclusion

To summarize, this article analyzed the relationships and differences between “containers”, “iterators”, “iterables”, and “generators” in Python. Their relationships can be summarized as follows:

A class is an iterator if it implements the __iter__ and __next__ methods. If it implements only __iter__, and that method returns an instance of an iterator class, then an instance of the class is just an iterable, since the details of its iteration are left to another class.

We often use types like list, tuple, set, and dict, which are not iterators but only iterables, leaving the details of their iteration to another class. From this we also know that an iterator must be an iterable, but an iterable need not be an iterator.

A generator can be thought of as a special iterator and an iterable. Using generators in conjunction with yield, we can implement lazy computing, and we can iterate over a large set of data with very small amounts of memory.

My advanced Python series:

  • Python Advanced – How to implement a decorator?
  • Python Advanced – How to use magic methods correctly? (Part 1)
  • Python Advanced – How to use magic methods correctly? (Part 2)
  • Python Advanced – What is a metaclass?
  • Python Advanced – What is a context manager?
  • Python Advanced – What is an iterator?
  • Python Advanced – How to use yield correctly?
  • Python Advanced – What is a descriptor?
  • Python Advanced – Why does the GIL make multithreading so useless?

Crawler series:

  • How to build a crawler proxy service?
  • How to build a universal vertical crawler platform?
  • Scrapy source code analysis (1): architecture overview
  • Scrapy source code analysis (2): how does Scrapy run?
  • Scrapy source code analysis (3): what are the core components of Scrapy?
  • Scrapy source code analysis (4): how is a scraping task completed?
