"Fluent Python" excerpts

Contents

  • preface
  • Chapter 1. The Python Data Model
  • Chapter 2. An Array of Sequences
  • Chapter 3. Dictionaries and Sets
  • Chapter 4. Text versus Bytes
  • Chapter 5. First-Class Functions
  • Chapter 6. Design Patterns with First-Class Functions
  • Chapter 7. Function Decorators and Closures
  • Chapter 8. Object References, Mutability, and Recycling
  • Chapter 9. A Pythonic Object
  • Chapter 10. Sequence Hacking, Hashing, and Slicing
  • Chapter 11. Interfaces: From Protocols to ABCs
  • Chapter 12. Inheritance: For Good or For Worse
  • Chapter 13. Operator Overloading: Doing It Right
  • Chapter 14. Iterables, Iterators, and Generators
  • Chapter 15. Context Managers and else Blocks
  • Chapter 16. Coroutines
  • Chapter 17. Concurrency with Futures
  • Chapter 18. Concurrency with asyncio
  • Chapter 19. Dynamic Attributes and Properties
  • Chapter 20. Attribute Descriptors
  • Chapter 21. Class Metaprogramming
  • Other information

preface

Here's the plan: when someone uses a feature you don't understand, simply shoot them. This is easier than learning something new, and before too long the only living coders will be writing in an easily understood, tiny subset of Python 0.9.6 <wink>. ———— Tim Peters, legendary core developer and author of "The Zen of Python"

… People tend to seek out what they already know. Influenced by other languages, you might guess that Python supports regular expressions and go consult the documentation. But if you have never seen tuple unpacking or heard of descriptors, you probably will not go out of your way to search for them, and you may simply never use these Python features… This book is not intended to be a complete technical manual; rather, it highlights the features of the language that are unique to Python, or rarely found in other popular languages.

Chapter 1. The Python Data Model

Guido's sense of the aesthetics of language design is amazing. I have known many great programming language designers who could build theoretically beautiful languages that nobody would ever use, but Guido knows how to compromise on theory and design a language that makes its users happy — a rare thing. ———— Jim Hugunin, creator of Jython, co-author of AspectJ, architect of the .NET DLR

excerpts

One of Python’s best qualities is consistency.

The data model is a description of Python as a framework: it formalizes the interfaces of the building blocks of the language itself, including but not limited to sequences, iterators, functions, classes, and context managers.

Python 2 data model: https://docs.python.org/2/reference/datamodel.html; Python 3 data model: https://docs.python.org/3/reference/datamodel.html — the latter lists 83 special methods, 47 of which are used to implement arithmetic, bitwise, and comparison operators. The Python documentation always says "Python data model", whereas most authors would call the concept the "Python object model". Wikipedia's first definition of "object model" (http://en.wikipedia.org/wiki/Object_model) is: the properties of objects in a specific computer programming language — and that is exactly what the "Python data model" describes.

No matter what framework you write with, you spend a lot of time implementing methods that are called by the framework itself, and Python is no exception. When the Python interpreter encounters special syntax, it uses special methods to activate some basic object operations. The names of these special methods begin and end with two underscores.

Magic method is a nickname for special method. Special methods are also called dunder methods.

To be clear, special methods exist to be called by the Python interpreter; you don't need to call them yourself… However, for built-in types such as list, string, or byte sequence, CPython takes a shortcut: len() actually reads the ob_size field of PyVarObject, the C struct that represents any variable-sized built-in object in memory. Reading that field directly is much faster than calling a method.

Many times, special methods are called implicitly: the statement for i in x, for example, actually invokes iter(x), which in turn may call x.__iter__()… Normally, your code should not use special methods directly.

By implementing special methods, custom data types can behave like the built-in types, which lets us write more expressive — or rather, more Pythonic — code.
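
A minimal sketch in the spirit of the book's opening card-deck example — implementing just two special methods makes a class behave like a standard sequence:

import collections

Card = collections.namedtuple("Card", ["rank", "suit"])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list("JQKA")
    suits = "spades diamonds clubs hearts".split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __len__(self):                    # enables len(deck)
        return len(self._cards)

    def __getitem__(self, position):      # enables deck[0], slicing, iteration, `in`
        return self._cards[position]

With this, len(deck) returns 52, deck[0] and deck[12::13] work, for card in deck iterates, and random.choice(deck) picks a card — all without inheriting from any sequence class.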

The Python data model defines many more special methods; this book covers most of them and explores how to use and implement them.

miscellaneous

  • __getitem__ and __len__:
    • By implementing the __len__ method, class instances can be used with the len() function just like the standard Python collection types;
    • By implementing just the __getitem__ method, class instances become iterable and also support reversed iteration (provided __len__ is also defined);
  • Iteration is often implicit: if a collection type does not implement the __contains__ method, the in operator falls back to a sequential scan;
  • abs is a built-in function that returns the absolute value of an integer or floating-point input; given a complex number, it returns its magnitude. This consistency is guaranteed by __abs__.
  • A basic requirement for a Python object is a usable string representation. Python's built-in repr function returns a string representation of an object suitable for identification; repr obtains it through the special method __repr__.
    • The string returned by __repr__ should be unambiguous and, if possible, show how the code could re-create the represented object.
    • __repr__ differs from __str__ in that the latter is called by the str constructor or the print function, and should return a string friendly to end users.
    • If you implement only one of these two special methods, __repr__ is the better choice: when an object has no custom __str__, the interpreter falls back to __repr__ wherever __str__ is needed.
  • __add__ and __mul__ give class instances the + and * operators. The fundamental principle of infix operators is that they do not modify their operands but produce a new instance (see the sketch after this list).
  • By default, instances of our own classes are always truthy unless the class implements its own __bool__ or __len__: bool(x) first tries to call x.__bool__(); if __bool__ does not exist, it falls back to x.__len__().
    • __bool__ is a special method introduced in Python 3.0; Python 2 uses __nonzero__.
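
A minimal sketch pulling several of these methods together, in the spirit of the book's two-dimensional vector example (the class body here is an illustrative reconstruction):

import math

class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        # unambiguous, and matches the constructor call
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def __add__(self, other):
        # infix operators return a new instance; the operands are untouched
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

Now abs(Vector(3, 4)) returns 5.0, Vector(1, 2) + Vector(3, 4) returns Vector(4, 6), and bool(Vector(0, 0)) is False.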

Chapter 2. An Array of Sequences

You may have noticed that several of the operations mentioned work equally for texts, lists, and tables. Texts, lists, and tables together are called "trains"… The FOR command also works generically on trains. ———— Geurts, Meertens, and Pemberton, ABC Programmer's Handbook

excerpts

Before creating Python, Guido contributed code to the ABC language, a decade-long research project aimed at designing a programming environment for beginners. Many ideas that now seem Pythonic first appeared there: generic operations on sequences, built-in tuple and mapping types, structure by indentation, strong typing without variable declarations, and so on… Python also inherited from ABC a uniform style of handling sequence data. Whatever the data structure — string, list, byte sequence, array, XML element, or database query result — they all share a rich set of operations: iteration, slicing, sorting, and concatenation.


The Python standard library implements rich sequence types in C, as listed below:

  • Container sequences — list, tuple, collections.deque
  • Flat sequences — str, bytes, bytearray, memoryview, array.array

Container sequences hold references to the objects they contain, which may be of any type, whereas flat sequences physically store values rather than references. In other words, a flat sequence is a contiguous block of memory. Flat sequences are therefore more compact, but they can only hold primitive values such as characters, bytes, and numbers.

Sequence types can also be classified according to whether they can be modified:

  • Mutable sequences — list, bytearray, array.array, collections.deque, memoryview
  • Immutable sequences — str, tuple, bytes
list comprehensions

List comprehensions are shortcuts for building lists, while generator expressions can be used to create sequences of any other type. If you don't use them in your code, chances are you are missing many opportunities to write code that is more readable and efficient. On the other hand, list comprehensions can be abused; the general rule is to use them only to build a new list, and to keep them short. If a list comprehension spans more than two lines, consider rewriting it as a for loop… List comprehensions let us filter or transform the items of a sequence or any other iterable type and build a new list. The combination of the built-in filter and map functions can achieve the same effect, but readability suffers noticeably… Behind the scenes, generator expressions follow the iterator protocol, yielding items one by one, which saves memory.
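
An illustrative comparison of the two styles (in the spirit of the book's own example):

symbols = '$¢£¥€¤'
# list comprehension: filter and transform in one readable expression
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
# same result with map/filter, but harder to read
beyond_ascii2 = list(filter(lambda c: c > 127, map(ord, symbols)))
assert beyond_ascii == beyond_ascii2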

tuple(ord(char) for char in "ABCD")
tuples

A tuple is a record: each item in the tuple holds the data of one field of the record, and the position of the item gives it meaning… A tuple can also act as an immutable list, supporting all list methods except those that add or remove items.

String formatting with the % operator, as used with print, is one application of tuple unpacking… Tuple unpacking works with any iterable object; the only requirement is that the iterable yields exactly one item per variable in the receiving tuple — unless we use a * to capture the excess items (Python 3+).

a, b, *rest = range(5)           # Python 3+
a, *body, c, d = range(5)        # Python 3+
a, b, (c, d) = (1, 2, (3, 4))    # Python 2+

collections.namedtuple is a factory function that builds subclasses of tuple enhanced with field names and a class name… Instances of a class built by namedtuple consume exactly the same amount of memory as plain tuples, because the field names are stored in the class. They also use less memory than a regular object instance, because Python does not use a per-instance __dict__ to store the attributes… A field's value can be retrieved by name or by position… Besides the attributes inherited from tuple, named tuples have a few extra attributes of their own, such as the _fields class attribute, the class method _make(iterable), and the instance method _asdict().

Card = collections.namedtuple("Card", ["rank", "suit"])
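
A quick sketch of those namedtuple extras (values are illustrative):

card = Card._make(["7", "diamonds"])   # same as Card("7", "diamonds")
print(Card._fields)                    # ('rank', 'suit')
print(card.rank, card[1])              # field by name or by position
print(card._asdict())                  # a dict (an OrderedDict on older Pythons)
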
slice

In Python, sequence types like lists, tuples, and strings support slicing… Excluding the last item of the interval in slices and ranges is the Python style, consistent with the zero-based indexing convention of Python, C, and many other languages. The benefits are as follows:

  • When only the stop position is given, we can see at a glance how many items the slice or range contains: my_list[:3] and range(3) both produce three items;
  • When both the start and stop positions are visible, we can quickly compute the length of the slice or range: stop - start;
  • We can split a sequence into two non-overlapping parts at any index x simply by writing my_list[:x] and my_list[x:].

… We can also use the form s[a:b:c] to take items from s at stride c between a and b. The stride can be negative, which returns the items in reverse… When evaluating seq[start:stop:step], Python actually calls seq.__getitem__(slice(start, stop, step)).

>>> s = 'bicycle'
>>> s[::3]
'bye'
>>> s[::-1]
'elcycib'
>>> s[::-2]
'eccb'

The [] operator can also take multiple indexes or slices separated by commas, a feature used by the external NumPy package… To handle the [] operator this way, the object's special methods __getitem__ and __setitem__ must receive the indices in a[i, j] as a tuple. That is, to evaluate a[i, j], Python calls a.__getitem__((i, j))… Python's built-in sequence types are one-dimensional, so they support only a single index; pairs of indices do not apply to them.
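
A tiny sketch (hypothetical class) showing exactly what __getitem__ receives:

class Probe:
    def __getitem__(self, index):
        return index       # just echo whatever Python passed in

p = Probe()
print(p[1])          # 1
print(p[1, 2])       # (1, 2) -- a tuple
print(p[1:5:2])      # slice(1, 5, 2)
print(p[1:5, 7])     # (slice(1, 5, None), 7)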

If a slice is placed on the left side of an assignment statement, or used as the target of a del statement, we can graft, excise, or otherwise modify a mutable sequence in place:

>>> l = list(range(10))
>>> l[2:5] = [20, 30]
>>> l
[0, 1, 20, 30, 5, 6, 7, 8, 9]
>>> del l[5:7]
>>> l
[0, 1, 20, 30, 5, 8, 9]
>>> l[3::2] = [11, 22]
>>> l
[0, 1, 20, 11, 5, 22, 9]
>>> l[2:5] = 100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only assign an iterable
operators

Python programmers expect sequences to support the + and * operators. Usually, both operands of + must be sequences of the same type; neither is modified during concatenation, and Python creates a new sequence of the same type to hold the result. To concatenate several copies of the same sequence, multiply it by an integer — again, this produces a new sequence.

The augmented assignment operators +=, *=, and so on behave according to their first operand. Taking += as an example: the special method behind it is __iadd__ (for "in-place addition"), but if the first operand does not implement __iadd__, Python falls back to calling __add__… In general, mutable sequences implement __iadd__, while immutable sequences do not support in-place addition at all and therefore do not implement it.
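
A quick illustration of that fallback (an illustrative experiment, checked with id()):

l = [1, 2, 3]
before = id(l)
l *= 2                 # list implements in-place multiplication: same object
assert id(l) == before

t = (1, 2, 3)
before = id(t)
t *= 2                 # tuple does not: a new tuple is created and rebound
assert id(t) != before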

A puzzle about += :

>>> t = (1, 2, [30, 40])
>>> t[2] += [50, 60]
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> t
(1, 2, [30, 40, 50, 60])
sorting

The list.sort method sorts in place… Its return value is None… to remind us that it does not create a new list. Returning None in this case is actually a Python convention: if a function or method changes an object in place, it should return None to make clear to the caller that the argument itself was changed and no new object was produced… The built-in function sorted, by contrast, creates and returns a new list. It accepts any iterable as an argument, even an immutable sequence or a generator; but regardless of what it is given, sorted always returns a newly created list.

Once sorted, sequences can be searched very efficiently; the bisect module of the standard library provides a binary search algorithm.
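
A short sketch of bisect in use (a classic table-lookup idiom; the breakpoints are illustrative):

import bisect

grades = [60, 70, 80, 90]
letters = "FDCBA"

def grade(score):
    # index of the first breakpoint greater than score
    return letters[bisect.bisect(grades, score)]

print([grade(s) for s in (33, 60, 75, 89, 90, 100)])
# ['F', 'D', 'C', 'B', 'A', 'A']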

other

While lists are flexible and simple, they are not always the best choice. For example, to store ten million floating-point numbers, an array is much more efficient, because an array does not hold full-fledged float objects, but packed bytes representing the machine values of the numbers — just like an array in the C language. For another example, if a sequence needs frequent FIFO operations, a deque works faster.


When creating an array, you specify the type of the data it will store, and Python will not let you put anything else into it. If the list will contain only numbers, array.array is more efficient than list. Arrays support all operations of mutable sequences, including .pop, .insert, and .extend, and provide additional methods for fast loading from and saving to files, such as .frombytes and .tofile.
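
A small sketch of fast array I/O (the file name is illustrative):

from array import array
from random import random

floats = array('d', (random() for _ in range(10**6)))  # 'd' = double-precision floats
with open('floats.bin', 'wb') as fp:
    floats.tofile(fp)

floats2 = array('d')
with open('floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10**6)
assert floats2 == floats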


memoryview is a built-in class that lets users manipulate different slices of the same array without copying the contents.
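
A sketch of changing one byte of an array through a memoryview (adapted from a well-known example of this technique; the output assumes a little-endian machine):

from array import array

numbers = array('h', [-2, -1, 0, 1, 2])   # 'h' = signed 16-bit integers
memv = memoryview(numbers)
memv_oct = memv.cast('B')                 # view the same memory as unsigned bytes
memv_oct[5] = 4                           # poke one byte; nothing was copied
print(numbers)                            # array('h', [-2, -1, 1024, 1, 2])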


… We can use lists as stacks or queues, but operations like removing the first item of a list are costly, because the whole list must be shifted… collections.deque is a thread-safe double-ended queue type designed for fast inserting and removing from both ends. It is also a good choice for keeping "the most recently seen items", because a deque can be bounded: created with a fixed maximum length, it discards items from the opposite end when it is full and new items are appended… These optimizations come at a price: removing items from the middle of a deque is slower, because only the operations at the ends are optimized… deque.append and deque.popleft are atomic operations, so a deque can safely be used as a FIFO queue in a multithreaded program…
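
A short sketch of a bounded deque:

from collections import deque

dq = deque(range(10), maxlen=10)
dq.appendleft(-1)          # pushes 9 out the right end
print(dq)                  # deque([-1, 0, 1, ..., 8], maxlen=10)
dq.extend([11, 22, 33])    # pushes -1, 0, 1 out the left end
print(dq)                  # deque([2, ..., 8, 11, 22, 33], maxlen=10)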


In addition to deque, other packages in the standard library also implement queues:

  • The thread-safe classes queue.Queue, queue.LifoQueue, and queue.PriorityQueue
  • multiprocessing.Queue, which can be used for interprocess communication
  • Since Python 3.4, the asyncio package provides Queue, LifoQueue, PriorityQueue, and JoinableQueue, which ease task management in asynchronous programming
  • heapq, which implements the heap sort algorithm and lets a mutable sequence be used as a heap queue or priority queue

miscellaneous

  • Python ignores line breaks inside pairs of [], {}, and (), so multi-line lists, list comprehensions, generator expressions, dictionaries, and the like can omit the ugly line-continuation character \.
  • The syntax of generator expressions is similar to that of list comprehensions, with the square brackets replaced by parentheses; if the generator expression is the single argument in a function call, no extra enclosing parentheses are needed.
  • Raymond Hettinger, a prolific Python contributor, wrote a sorted collection module, sortedcollection, which integrates bisect functionality but is easier to use than bisect on its own.
  • Introductory Python textbooks tend to emphasize that lists can hold items of mixed types at the same time, but doing so brings little benefit. Tuples, by contrast, routinely hold items of different types — that is true to their nature: tuples are records, and each field may hold data of an unrelated type.
  • The sorting algorithm behind list.sort and sorted is Timsort, an adaptive algorithm that alternates between insertion sort and merge sort according to how ordered the data already is, for best efficiency. This works well because real-world data usually contains ordered runs.

Chapter 3. Dictionaries and Sets

The dictionary is a data structure that is actively used behind the scenes of every Python program, even when you don't use one directly in your source code. ———— A. M. Kuchling, Beautiful Code, Chapter 18, "Python's Dictionary Implementation: Being All Things to All People"

excerpts

The dict type is not only widely used in our programs; it is a cornerstone of the Python language. Dictionaries can be found in module namespaces, in instance attributes, and in the keyword arguments of functions; the built-in functions live in __builtins__.__dict__… Because dictionaries are so important, Python implements them with a high degree of optimization, and hash tables are the root cause of the dictionary type's superior performance… The implementation of sets also relies on hash tables.

dictionaries

The collections.abc module provides two abstract base classes, Mapping and MutableMapping, which formalize the interfaces of dict and similar types. (From Python 2.6 through Python 3.2 these classes were not yet part of collections.abc; they belonged to the collections module.)… Non-abstract mapping types generally do not inherit from these ABCs directly; they usually extend dict or collections.UserDict instead. The main value of the ABCs is to serve as formal documentation, defining the most basic interfaces needed to build a mapping type. They can also be used with isinstance to check whether some data is a mapping in the broad sense:

>>> from collections import abc
>>> my_dict = {}
>>> isinstance(my_dict, abc.Mapping)
True

All mapping types in the standard library are implemented on top of dict, so they share the limitation that only hashable data types can be used as keys…

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value. — https://docs.python.org/3/glossary.html#term-hashable

The atomic immutable types (str, bytes, and the numeric types) are all hashable, and so is frozenset — by definition, a frozenset can only contain hashable elements. A tuple is hashable only if all of its items are hashable… Generally speaking, instances of user-defined types are hashable by default: their hash value is derived from id(), and they all compare unequal. If an object implements a custom __eq__ that takes its internal state into account, the object is hashable only if all of that internal state is immutable.


… The dict.update method handles its argument m in typical "duck typing" fashion: it first checks whether m has a keys method and, if so, treats m as a mapping. Otherwise, the method falls back to iterating over m, assuming its items are (key, value) pairs. Most mapping constructors in Python use similar logic.

When d[k] cannot find the key, Python raises an exception, which is consistent with the "fail fast" philosophy. Probably every Python programmer knows that d[k] can be replaced with d.get(k, default), which returns a default value for missing keys (much more convenient than handling KeyError). However, when updating the value found for a key, using either __getitem__ or get is unnatural and inefficient… dict.setdefault solves this problem.
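
A sketch of the difference, building an index of word occurrences (a classic use; the data is illustrative):

index = {}
for line_no, word in [(1, "dog"), (2, "cat"), (3, "dog")]:
    # one lookup instead of the get-check-store dance
    index.setdefault(word, []).append(line_no)
print(index)   # {'dog': [1, 3], 'cat': [2]}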

Sometimes, for convenience, we want a mapping to return a made-up default value when a key is read, even if that key is missing. There are two main ways to do this: use a defaultdict, or subclass dict yourself and implement the __missing__ method… In fact, defaultdict is itself implemented via __missing__… The base dict class does not define this method, but dict is aware of it: if you subclass dict and that subclass provides a __missing__ method, standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.

The __missing__ method is only called by __getitem__ (i.e., for the d[k] expression). Providing a __missing__ method has no effect on the behavior of other methods such as get or __contains__. This method is called by the __getitem__() method of the dict class when the requested key is not found; whatever it returns or raises is then returned or raised by __getitem__().
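
A minimal sketch of a dict subclass with __missing__ (a simplified version of the book's string-key idea):

class StrKeyDict0(dict):
    """Looks up a missing key a second time, converted to str."""

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)    # avoid infinite recursion
        return self[str(key)]

d = StrKeyDict0({'2': 'two', '4': 'four'})
print(d[2])    # 'two' -- the int key falls back to the str key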

When creating a custom mapping type, it is easier to subclass UserDict than plain dict… The latter sometimes takes implementation shortcuts that force us to override methods in its subclasses; UserDict does not cause those problems. Another thing to note is that UserDict is not a subclass of dict; instead, it has an attribute called data, an instance of dict, which is where UserDict ultimately stores its data… UserDict extends MutableMapping.


All mapping types in the standard library are mutable, but sometimes you need to guarantee that a user cannot modify a mapping by mistake… Since Python 3.3, the types module provides a wrapper class called MappingProxyType, which builds a read-only view of the mapping passed to it. The view is read-only but dynamic: if the original mapping changes, the changes can be observed through the view.
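
A quick sketch:

from types import MappingProxyType

d = {1: 'A'}
d_proxy = MappingProxyType(d)
print(d_proxy[1])     # 'A' -- reads pass through
d[2] = 'B'
print(d_proxy[2])     # 'B' -- the view is dynamic
# d_proxy[3] = 'C'    # would raise TypeError: item assignment not supported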

sets

In essence, a set is a collection of unique objects.

The set and frozenset types first appeared as a module in Python 2.3, and were promoted to built-in types in Python 2.6.

Set elements must be hashable. The set type itself is not hashable, but frozenset is — so you can create a set that contains different frozensets.

Besides guaranteeing uniqueness, the set types implement many fundamental operations as infix operators: |, &, -… Non-empty sets can be created with set literals such as {1} and {1, 2}; empty sets and frozensets must be created with set() and frozenset().
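A sketch of the infix operators:

needles = {1, 5}
haystack = {1, 2, 3, 4, 5}
print(needles & haystack)       # {1, 5}     intersection
print(needles | haystack)       # union
print(haystack - needles)       # {2, 3, 4}  difference
print(len(needles & haystack))  # count of needles found in haystack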

Hash table

Dictionaries and sets are both implemented with hash tables under the hood. A hash table is really a sparse array; in standard data structure texts, the cells of a hash table are usually called "buckets". In a dict hash table there is a bucket per key-value pair, and each bucket has two fields: a reference to the key and a reference to the value. Because all buckets have the same size, an individual bucket can be reached by offset. Python tries to keep about a third of the buckets empty; when that threshold is crossed, the original hash table is copied into a larger space.

To put an item in a hash table, the first step is to compute the hash value of the item's key, which Python does with the hash() function… The built-in hash() works with all objects of built-in types; calling hash() on a custom object actually runs its custom __hash__ special method. If two objects compare equal, their hash values must also be equal, otherwise the hash table algorithm breaks down… When a hash collision occurs, the algorithm takes different bits of the hash value, massages them in a particular way (perturbation), and uses the resulting number as an index to look up another bucket. If that bucket is empty, KeyError is raised (for a lookup); if it is not empty and the keys match, the value is returned; and if there is yet another collision, the process repeats… When a new item is inserted, Python may decide, based on how crowded the table is, to reallocate it into more memory.


The way hash tables work gives dict some advantages and limitations:

  • The key must be hashable. A hashable object must satisfy the following requirements (all user-defined objects are hashable by default, with hash values derived from id()):

    1. It supports the hash() function, and its hash value never changes over the object's lifetime;
    2. It supports equality testing via the __eq__() method;
    3. If a == b is true, then hash(a) == hash(b) must also be true.
    If a class with a custom __eq__ is in a mutable state, do not implement the __hash__ method in that class; its instances should not be hashable.
  • Dictionaries are expensive in memory

  • Key query is fast

  • The order of keys depends on the order in which they are added, and adding new keys may change the order of existing keys

Whether and how any of these changes happen depends on the implementation behind the dictionary, so you cannot confidently predict the behavior. If you modify the dictionary while iterating over all of its keys, the loop may well skip some keys — even keys the dictionary already had.

The implementation of set and frozenset also relies on hash tables, but their tables store only references to the elements. Before set was added to Python, we used dictionaries with dummy values as sets. The hash-table limitations on dictionaries described above also apply to set and frozenset.

miscellaneous

  • Since Python 2.7, dict comprehensions (dictcomp) can build a dictionary from any iterable whose items are key-value pairs.

    nd = {k: v for k, v in another_iterable}
  • Since Python 2.7, set comprehensions (setcomp) can be used to create sets.

    ns = {k for k in another_iterable}
  • Besides dict, the Python standard library provides several other mapping types: defaultdict, OrderedDict, ChainMap, and Counter, as well as the UserDict class, designed to be extended by users.

Chapter 4. Text versus Bytes

Humans use text. Computers speak bytes. ———— Esther Nam and Travis Fischer, "Character Encoding and Unicode in Python"

Python 3 makes a clear distinction between human-readable text strings and raw byte sequences. Implicitly converting byte sequences to Unicode text is a thing of the past.

excerpts

A "string" is a fairly simple concept: a string is a sequence of characters. The problem lies in the definition of "character". In 2015, the best definition of "character" we have is a Unicode character. Accordingly, the items you get from a Python 3 str are Unicode characters — just like the items of a unicode object in Python 2, and not the raw bytes you get from a Python 2 str.

The Unicode standard explicitly separates the identity of characters from their specific byte representations, as follows.

  • A character's identity — its code point — is a number from 0 to 1,114,111, shown in the Unicode standard as 4 to 6 hexadecimal digits with a "U+" prefix. In Unicode 6.3 (the standard used by Python 3.4), about 10% of the valid code points have characters assigned to them.
  • The actual bytes that represent a character depend on the encoding in use. An encoding is an algorithm that converts between code points and byte sequences: the code point for the letter A (U+0041) is encoded as the single byte \x41 in UTF-8, and as the two bytes \x41\x00 in UTF-16LE.

Converting from code points to a byte sequence is encoding; converting from a byte sequence to code points is decoding… Think of byte sequences as cryptic machine core dumps, and Unicode strings as "human-readable" text. Then turning a byte sequence into a human-readable text string is decoding, and turning a string into a byte sequence for storage or transmission is encoding.
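
A quick sketch:

s = 'café'
print(len(s))            # 4 -- four Unicode characters
b = s.encode('utf8')     # encode str to bytes
print(b)                 # b'caf\xc3\xa9' -- 5 bytes; 'é' takes two in UTF-8
print(b.decode('utf8'))  # 'café' -- back to str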


Python ships with more than 100 codecs (encoder/decoders) for converting between text and bytes. Each codec has a name, such as 'utf_8', and often several aliases, such as 'utf8', 'utf-8', and 'U8'.

Python 3 uses UTF-8 as the default source encoding, while Python 2 (since 2.5) defaults to ASCII… A different encoding can be declared with a magic coding comment at the top of the file.

How do you find the encoding of a byte sequence? Short answer: you can't — someone has to tell you… However, just as human languages have rules and restrictions, once you assume that a byte stream is human-readable plain text, it may be possible to sniff out the encoding by trial and analysis… That is how the Unicode detection package Chardet works; it recognizes about 30 supported encodings.

The best practice for handling text is the "Unicode sandwich": decode the input byte sequence to str as early as possible; the business logic of the program should deal only with str objects, never encoding or decoding in the middle of other processing; and on output, encode the str to bytes as late as possible.
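
A sketch of the sandwich with file I/O (the file name is illustrative):

# bytes -> str at the edges: always pass an explicit encoding
with open('cafe.txt', 'w', encoding='utf_8') as fp:
    fp.write('café')            # inside: work with str only

with open('cafe.txt', encoding='utf_8') as fp:
    print(fp.read())            # 'café'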

miscellaneous

  • -*- coding: utf8 -*- tells Python how to decode the characters of the source file itself.
  • sys.setdefaultencoding (Python 2) sets Python's default character encoding, which is used as:
    • the default encoding of the decode and encode functions;
    • the encoding used to implicitly convert a base string to a Unicode string when the two are compared or concatenated;
    • the encoding Python uses to convert Unicode strings before storing, transferring, and printing them.

Chapter 5. First-Class Functions

Despite what others say or think, I have never felt that Python was much influenced by functional languages. I was very familiar with imperative languages such as C and Algol 68, and although I made functions first-class objects, I never thought of Python as a functional programming language. ———— Guido van Rossum, Python's benevolent dictator for life (BDFL)

In Python, functions are first-class objects. Programming language theorists define “first-class objects” as program entities that satisfy the following conditions:

  • Created at run time
  • Can assign values to variables or elements in data structures
  • Can be passed to functions as arguments
  • Can be returned as a result of a function

excerpts

A function that takes a function as an argument or returns a function as its result is a higher-order function. The map function is one example, and so is the built-in sorted… The best-known higher-order functions in the functional programming paradigm are map, filter, reduce, and apply. The apply function was removed in Python 3: if you need to call a function with a dynamic set of arguments, you can write fn(*args, **kwargs) instead of apply(fn, args, kwargs)… Since the introduction of list comprehensions and generator expressions, map and filter have become less important: a listcomp or genexp does the job of map and filter combined, and is easier to read… The most common use of reduce is summation, and since Python 2.3 (released in 2003) summing is best done with the built-in sum function — a big win in readability and performance.

The shared idea of sum and reduce is to apply an operation to successive items of a sequence, accumulating previous results, thus reducing a series of values to a single value. any and all are also built-in reducing functions.
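
An illustrative sketch of these points:

from functools import reduce   # reduce moved to functools in Python 3
from operator import add

# listcomp vs. map/filter
fact = lambda n: reduce(lambda a, b: a * b, range(1, n + 1))
print(list(map(fact, filter(lambda n: n % 2, range(6)))))   # [1, 6, 120]
print([fact(n) for n in range(6) if n % 2])                 # same, more readable

# reduce vs. sum
print(reduce(add, range(100)))   # 4950
print(sum(range(100)))           # 4950 -- clearer and faster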

The lambda keyword creates an anonymous function within a Python expression. However, Python's simple syntax limits the body of a lambda to a pure expression… Anonymous functions are best used in argument lists; outside of that context, they are rarely used in Python.


In addition to user-defined functions, the call operator (i.e., ()) can be applied to other objects. To determine whether an object is callable, use the built-in callable() function. The Python data model documentation lists seven callable types:

  • User-defined functions

  • Built-in functions, such as len()

  • Built-in methods, such as dict.get()

  • Methods: functions defined in the body of a class

  • Classes

    When invoked, a class runs its __new__ method to create an instance, then runs __init__ to initialize it, and finally the instance is returned to the caller.
  • Class instances: if a class defines a __call__ method, its instances can be invoked as functions (see the sketch after this list)

  • Generator functions
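
A minimal sketch of a callable instance (along the lines of the book's BingoCage idea, simplified here):

import random

class BingoCage:
    def __init__(self, items):
        self._items = list(items)      # defensive copy of the iterable argument
        random.shuffle(self._items)

    def pick(self):
        try:
            return self._items.pop()
        except IndexError:
            raise LookupError('pick from empty BingoCage')

    def __call__(self):                # makes bingo() a shortcut for bingo.pick()
        return self.pick()

bingo = BingoCage(range(3))
print(bingo(), callable(bingo))        # e.g. 1 True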


Function objects have many attributes beyond __doc__; the dir function lists them… Like instances of a plain user-defined class, a function uses the __dict__ attribute to store user attributes assigned to it. This is useful as a primitive form of annotation. Assigning arbitrary attributes to functions is not a very common practice, but the Django framework does it:

def upper_case_name(obj):
    return ("%s %s" % (obj.first_name, obj.last_name)).upper()
upper_case_name.short_description = "Customer name"

Here are the attributes specific to functions that plain user-defined objects do not have:

__annotations__   dict             parameter and return annotations
__call__          method-wrapper   implementation of the () operator, i.e., the callable object protocol
__closure__       tuple            the function closure, i.e., bindings for free variables
__code__          code             function metadata and function body compiled into bytecode
__defaults__      tuple            default values for the formal parameters
__get__           method-wrapper   implementation of the read-only descriptor protocol
__globals__       dict             global variables of the module where the function is defined
__kwdefaults__    dict             default values for keyword-only formal parameters
__name__          str              the function name
__qualname__      str              the qualified function name, e.g., random.choice

Function objects have a __defaults__ attribute holding a tuple with the default values of positional and keyword parameters; the defaults for keyword-only parameters appear in the __kwdefaults__ attribute. The names of the parameters, however, live in the __code__ attribute, a reference to a code object with many attributes of its own… Parameter names and the local variables created in the function body are stored in __code__.co_varnames, and the number of parameters in __code__.co_argcount — which, by the way, does not include variable-length parameters prefixed with * or **. Default values can only be matched to parameters by their position in the __defaults__ tuple, so you must scan from last to first to pair them up… The inspect module of the Python standard library makes this kind of introspection much easier.
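
A sketch of that introspection (the function here is a hypothetical example):

def clip(text, max_len=80):
    """Return text clipped at max_len characters."""
    return text[:max_len]

print(clip.__defaults__)            # (80,)
print(clip.__code__.co_varnames)    # ('text', 'max_len')
print(clip.__code__.co_argcount)    # 2

from inspect import signature
print(signature(clip))              # (text, max_len=80)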

Python 3 provides syntax for attaching metadata to the parameters and return value of a function declaration… Annotations are not processed at all; they are merely stored in the function's __annotations__ attribute. That's it: Python does not check, enforce, or validate anything. In other words, annotations mean nothing to the Python interpreter. They are just metadata that may be used by tools such as IDEs, frameworks, and decorators.
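
For instance (an annotated variant of the sketch above):

def clip(text: str, max_len: 'int > 0' = 80) -> str:
    return text[:max_len]

print(clip.__annotations__)
# {'text': <class 'str'>, 'max_len': 'int > 0', 'return': <class 'str'>}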


Guido makes it clear that Python does not aim to be a functional programming language, but a functional coding style can be used to good effect, thanks to packages like operator and functools… The operator module provides function equivalents for dozens of arithmetic operators, saving you the trouble of writing trivial lambdas… operator also has a family of functions that replace lambdas which pick items from sequences or read attributes from objects: itemgetter and attrgetter are factories that build such functions. itemgetter uses the [] operator, so it supports not only sequences but also mappings and any class that implements __getitem__. If you pass multiple index arguments to itemgetter, the function it builds returns tuples of the extracted values. attrgetter is similar, but the functions it creates extract object attributes by name; passing several attribute names to attrgetter likewise yields tuples of extracted values.
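
A sketch (sample data is illustrative):

from operator import itemgetter, attrgetter, mul
from functools import reduce
from collections import namedtuple

print(reduce(mul, range(1, 6)))      # 120 -- no lambda needed

City = namedtuple('City', 'name country population')
cities = [City('Tokyo', 'JP', 36.93), City('Delhi', 'IN', 21.94)]

for city in sorted(cities, key=itemgetter(2)):   # sort by population
    print(itemgetter(0, 1)(city))                # ('Delhi', 'IN'), then ('Tokyo', 'JP')

name = attrgetter('name')
print([name(c) for c in cities])     # ['Tokyo', 'Delhi']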

miscellaneous

  • For functions that take an iterable as an argument, it is prudent to create an internal copy, guarding against the unintended side effect of mutating the caller's data:

    class Bingo(object):
        def __init__(self, items):
            self._items = list(items)
  • Keyword-only arguments

    Keyword-only arguments are a Python 3 feature: they can never capture unnamed positional arguments. To specify keyword-only arguments when defining a function, name them after the parameter prefixed with *, or after a bare *:

    >>> def f(a, *args, b=None): pass
    >>> # or
    >>> def f(a, *, b): return a, b
    >>> f(1, b=2)
    (1, 2)

    Note that keyword-only arguments do not need to have a default value.
  • Python 2 has the following rules for function argument lists:

    • If a parameter has a default value, all following parameters must also have default values — a syntactic restriction that is not expressed by the grammar.
    • In the grammar, *args and **kwargs come after the normal parameters, with or without default values.
    • Default parameter values are evaluated when the def statement is executed. This means the expression is evaluated once, when the function is defined, and the same "pre-computed" value is used on every call. Understanding this is especially important when a default is a mutable object such as a list or a dictionary: if the function modifies the object, the default value is effectively modified, which is generally not what was intended. The way around this is to use None as the default and to test for it explicitly in the body of the function.

Chapter 6. Design Patterns with First-Class Functions

Conformance to patterns is not a measure of goodness. ———— Ralph Johnson, co-author of the classic Design Patterns: Elements of Reusable Object-Oriented Software

Although design patterns are language-independent, that does not mean every pattern applies to every language. The authors of Design Patterns: Elements of Reusable Object-Oriented Software acknowledge in their introduction that the implementation language determines which patterns are relevant:

The choice of programming language is important because it influences one's point of view. Our patterns assume Smalltalk/C++-level language features, and that choice determines what can and cannot be implemented easily. If we assumed procedural languages, we might have included design patterns called "inheritance," "encapsulation," and "polymorphism." Similarly, some of our patterns are supported directly by the less conventional object-oriented languages: CLOS supports multimethods, for example, which lessens the need for a pattern such as Visitor.

Specifically, Norvig suggests rethinking the Strategy, Command, Template Method, and Visitor patterns in languages with first-class functions. In general, we can replace instances of some participant classes in these patterns with simple functions, reducing boilerplate code.

When it comes to design patterns, Python programmers do not have as many book choices as programmers in other languages.

  • "Recipe 8.21. Implementing the Visitor Pattern" in the Python Cookbook (3rd edition) implements the Visitor pattern in an elegant way.
  • Learning Python Design Patterns is the only book devoted specifically to Python design patterns, but Zlobin's book is extremely thin, covering only eight of the 23 design patterns.
  • Expert Python Programming is one of the best intermediate-level Python books on the market; its Chapter 14, "Useful Design Patterns," presents seven classic patterns from a Python programmer's point of view.

miscellaneous

  • globals()The function returns a dictionary representing the current global symbol table. This symbol table is always specific to the current module.
  • Two design principles from Design Patterns: Elements of Reusable Object-Oriented Software are often quoted: "program to an interface, not an implementation" and "favor object composition over class inheritance."

Chapter 7. Function Decorators and Closures

There have been a number of complaints about the choice of the name "decorator" for this feature. The main one is that the name is not consistent with its use in the GoF book. The name decorator probably owes more to its use in the compiler area — a syntax tree is walked and annotated. ———— "PEP 318: Decorators for Functions and Methods"

excerpts

Function decorators let us "mark" functions in the source code, enhancing their behavior in some way. This is powerful, but to master it you must understand closures.

Besides their use in decorators, closures are also essential to callback-based asynchronous programming and to the functional programming style.


A decorator is a callable object whose argument is another function (the decorated function). A decorator may process the decorated function and return it, or replace it with another function or callable object… A key property of decorators is that they run right after the decorated function is defined — usually at import time… The decorated function itself runs only when it is explicitly invoked. This highlights what Python programmers call the difference between import time and runtime.

Most decorators do change the decorated function. Typically, they define an inner function and return it to replace the decorated function. Code that uses inner functions almost always depends on closures to operate correctly. To understand closures, we need to understand variable scope in Python… Python does not require variables to be declared, but it assumes that a variable assigned in the body of a function is local.

A closure is a function with an extended scope that encompasses nonglobal variables referenced in the body of the function but not defined there. It does not matter whether the function is anonymous; what matters is that it can access nonglobal variables defined outside of its body. The function's __closure__ attribute holds the bindings for these free variables, whose names are stored in __code__.co_freevars… To summarize: a closure is a function that retains the bindings of the free variables that existed when the function was defined, so that they can be used later when the function is invoked, even though the defining scope is no longer available. Note that only functions nested inside other functions may need to deal with external variables that are nonglobal.

Python 3 introduced the nonlocal declaration, which lets you flag a variable as a free variable even when a new value is assigned to it inside the function… Python 2 has no nonlocal; the workaround is to store the nonglobal free variables that the inner function needs to change as items or attributes of some mutable object, and bind that object to a free variable.
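
A sketch of a closure with nonlocal (the running-average idea the book uses):

def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        nonlocal count, total      # without this, the assignments below would
        count += 1                 # make count and total local variables
        total += new_value
        return total / count

    return averager

avg = make_averager()
print(avg(10), avg(11), avg(12))           # 10.0 10.5 11.0
print(avg.__code__.co_freevars)            # ('count', 'total')
print(avg.__closure__[0].cell_contents)    # 3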


Python has three built-in functions designed to decorate methods: property, classmethod, and staticmethod. Another frequently seen decorator is functools.wraps, a helper for building well-behaved decorators. Two of the most remarkable decorators in the standard library are functools.lru_cache and functools.singledispatch (new in Python 3.4).
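
A minimal sketch of a well-behaved decorator using functools.wraps (the timing logic is illustrative):

import functools
import time

def clock(func):
    @functools.wraps(func)          # copy __name__, __doc__, etc. to the wrapper
    def clocked(*args, **kwargs):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        print('%s took %.8fs' % (func.__name__, time.perf_counter() - t0))
        return result
    return clocked

@clock
def snooze(seconds):
    time.sleep(seconds)

snooze(0.1)
print(snooze.__name__)   # 'snooze', thanks to functools.wraps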

miscellaneous

  • Free variables are variables that are not bound in the local scope. A function's __code__ attribute stores the names of its local variables (__code__.co_varnames) and of its free variables (__code__.co_freevars).

Chapter 8. Object References, Mutability, and Recycling

"You are unhappy," said the White Knight in a worried tone. "Let me comfort you with a song... The song is called haddock's Eyes." "Oh, that's one of my songs, isn't it?" 'asked Alice, trying to be interested. "No, you don't understand," said the White Knight, looking rather annoyed. "That's the name of the song they call it. The real title of the song is old, Old, Old." (Adapted from Chapter 8, "It's My Own Invention.") ———— Lewis Carroll, Through the Looking GlassCopy the code

excerpts

Variables are not boxes. The common "variables as boxes" metaphor actually hinders the understanding of reference variables in object-oriented languages. Python variables are like reference variables in Java, so it is best to think of them as labels attached to objects… With reference variables, it makes much more sense to say that the variable is assigned to an object, rather than the other way around; after all, the object is created before the assignment.

To understand an assignment in Python, always read the right-hand side first: that is where the object is created or retrieved. Only after that is the variable on the left bound to the object — like sticking a label on it. Just forget about the boxes.

Because variables are mere labels, nothing prevents an object from having several labels attached to it. When that happens, the labels are aliases.
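
A quick sketch:

charles = {'name': 'Charles L. Dodgson', 'born': 1832}
lewis = charles                    # lewis is an alias for charles
print(lewis is charles)            # True -- same identity
lewis['balance'] = 950
print(charles)                     # the change shows through both labels

alex = {'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}
print(alex == charles, alex is charles)   # True False -- equal value, distinct objects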


The only mode of parameter passing supported by Python is call by sharing… Call by sharing means that each formal parameter of the function gets a copy of each reference in the arguments. In other words, the parameters inside the function become aliases of the actual arguments.

We should avoid using mutable objects as default values for parameters… Default values are evaluated when the function is defined (usually when the module is loaded), and then become attributes of the function object. So if a default value is a mutable object and its value is changed, every future call of the function is affected.
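
The classic sketch of this trap:

def append_to(item, target=[]):    # BAD: one shared list for every call
    target.append(item)
    return target

print(append_to(1))    # [1]
print(append_to(2))    # [1, 2] -- surprise: the default was mutated

def append_to_fixed(item, target=None):
    if target is None:             # the idiomatic workaround
        target = []
    target.append(item)
    return target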

If you define a function that accepts a mutable object as an argument, you should consider carefully whether the caller expects the argument to be modified; this really requires the function writer and the caller to share an understanding.


Every object has an identity, a type, and a value. An object's identity never changes once it has been created; you may think of the identity as the object's address in memory. The is operator compares the identities of two objects; the id() function returns an integer representing the identity.

The == operator compares the values of two objects (the data they hold), while is compares their identities. However, when comparing a variable to a singleton, it makes sense to use is; currently, the most common case is checking whether a variable is bound to None. The is operator is faster than ==, because it cannot be overloaded: instead of finding and invoking special methods, Python simply compares two integer IDs. a == b is the same as a.__eq__(b). The __eq__ method inherited from object compares object IDs and gives the same result as is, but most built-in types override __eq__ in a more meaningful way, taking into account the values of object attributes.


Tuples, like most Python collections (lists, dictionaries, sets, and so on), hold references to objects. Single-type sequences such as str, bytes, and array.array are flat: they physically hold their data — characters, bytes, and numbers — in contiguous memory, rather than references.

If an item referenced by a tuple is mutable, that item can change even though the tuple itself is immutable. In other words, the immutability of a tuple applies to the physical contents of the tuple data structure (i.e., the references it holds) and does not extend to the referenced objects.


The sequence constructors and the [:] operator make shallow copies (the outermost container is duplicated, but the copy is filled with references to the same items held by the source container). If all the items are immutable, this causes no problems and saves memory; but if there are mutable items, unpleasant surprises may follow… Sometimes what you need is a deep copy (a duplicate that does not share references to embedded objects). The copy module provides the deepcopy and copy functions, which make deep and shallow copies of arbitrary objects.
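
A sketch of the difference:

import copy

l1 = [3, [55, 44], (7, 8)]
l2 = list(l1)               # shallow copy
l3 = copy.deepcopy(l1)      # deep copy

l1[1].append(33)            # mutate the inner list
print(l2[1])                # [55, 44, 33] -- shared with l1
print(l3[1])                # [55, 44]     -- an independent copy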


The del statement deletes a name, not an object. A del statement may cause an object to be garbage collected, but only if the deleted variable held the last reference to it, or if the object becomes unreachable (as with reference cycles). Rebinding a variable may also cause an object's reference count to reach zero, destroying the object… del does not directly call the __del__ method. __del__ is invoked by the interpreter only when the instance is about to be destroyed, giving it a last chance to release external resources. Code you write yourself rarely needs to implement __del__.

In CPython, the primary garbage collection algorithm is reference counting. Essentially, each object keeps a count of how many references point to it; as soon as that count reaches zero, the object is destroyed immediately: CPython calls its __del__ method (if defined) and frees the memory allocated to it. CPython 2.0 added a generational garbage collection algorithm to detect groups of objects involved in reference cycles.


Objects are kept alive in memory by references. When an object's reference count reaches zero, the garbage collector destroys it. Sometimes, however, it is useful to refer to an object without keeping it alive longer than its natural lifetime — caches are a common use case.

A weak reference does not increase the reference count of an object, so it does not prevent the referent from being garbage collected. Python's weakref module provides tools for working with weak references.
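
A short sketch (the behavior shown assumes it runs as a script in CPython; an interactive console may keep extra references alive):

import weakref

a_set = {0, 1}
wref = weakref.ref(a_set)      # call the weak reference to reach the object
print(wref())                  # {0, 1}
a_set = {2, 3, 4}              # rebind: the original set loses its last reference
print(wref())                  # None -- the referent was garbage collected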

miscellaneous

  • Simple assignments do not create copies.
  • With augmented assignment (+=, *=, etc.), if the left-hand variable is bound to an immutable object, a new object is created; if it is bound to a mutable object, that object is modified in place.
  • Assigning a new object to an existing variable does not change the previously bound object. This is called rebinding: the variable is now bound to a different object. If the variable held the last reference to the previous object, that object is garbage collected.
  • Function arguments are passed as aliases.

Chapter 9. A Pythonic Object

Never, ever use two leading underscores. This is annoyingly private. ———— Ian Bicking, creator of pip, virtualenv, and the Paste project

Thanks to the Python data model, custom types can behave just as naturally as the built-in types. And such natural behavior is achieved not through inheritance, but through duck typing: we simply implement the objects in the way they need to behave.

To build objects that conform to the Python style, observe the behavior of real Python objects.


As you know, we implement the __repr__ and __str__ special methods to provide support for repr() and str(). Two other special methods provide alternative representations of objects: __bytes__ and __format__. The __bytes__ method is analogous to __str__: it is called by the bytes() function to get the object represented as a byte sequence. The __format__ method is called by the built-in format() function and the str.format() method; both display the string representation of an object using special formatting codes.
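
A minimal sketch (a hypothetical 2D-vector class, trimmed to the representation methods):

class Vector2d:
    def __init__(self, x, y):
        self.x, self.y = float(x), float(y)

    def __iter__(self):
        return iter((self.x, self.y))

    def __repr__(self):
        return 'Vector2d(%r, %r)' % (self.x, self.y)

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes(str(self), 'utf_8')   # simplified byte representation

    def __format__(self, fmt_spec=''):
        # apply the format spec to each component
        return '({}, {})'.format(*(format(c, fmt_spec) for c in self))

v = Vector2d(3, 4)
print(format(v, '.3f'))   # (3.000, 4.000)
print(bytes(v))           # b'(3.0, 4.0)'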

The notation used in format specifiers is called the Format Specification Mini-Language (docs.python.org/3/library/s…).


The classmethod decorator is clearly useful, but I have never seen a situation where staticmethod was a must. If you want to define a function that does not interact with the class, just define it in the module. Sometimes a function never touches the class, but its functionality is so closely related that you want to keep them near each other; even then, defining the function right before or after the class in the same module is close enough.


To make class instances hashable, the __hash__ and __eq__ methods must be implemented. In addition, the instances must be made immutable… We can use the @property decorator to mark attributes as read-only.

Creating hashable types does not require implementing properties or otherwise protecting instance attributes; implementing __hash__ and __eq__ correctly is all it takes. But the hash value of an instance should never change, so this is a good opportunity to mention read-only properties.

If you define a type whose instances have a scalar numeric value, you may also implement the __int__ and __float__ methods (invoked by the int() and float() constructors respectively), so the type can be coerced in certain contexts. There is also a __complex__ method to support the built-in complex() constructor.


Python cannot create private attributes with a private modifier the way Java does, but it has a simple mechanism to prevent subclasses from accidentally clobbering "private" attributes: name an attribute with two leading underscores, and Python stores the name in the instance's __dict__ prefixed with a leading underscore and the class name… This language feature is called name mangling.
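
A quick sketch:

class Dog:
    def __init__(self):
        self.__mood = 'happy'      # double leading underscore

d = Dog()
print(d.__dict__)                  # {'_Dog__mood': 'happy'} -- mangled name
print(d._Dog__mood)                # still reachable if you really want to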

Name mangling is about safety, not security: it is designed to prevent accidental access, not intentional wrongdoing… Not all Python programmers like the name-mangling feature, nor the asymmetry of names like self.__x. Some prefer to avoid this syntax and, by convention, write "protected" attributes with a single underscore prefix (e.g., self._x). The Python interpreter gives no special treatment to names with a single underscore, but not accessing such attributes outside the class is a convention that many Python programmers strictly observe.

In modules, however, a single leading underscore in a top-level name does have an effect: from mymod import * does not import names prefixed with an underscore. You can still import one explicitly with from mymod import _privateFunc.

By default, Python stores instance attributes in a per-instance dictionary named __dict__. Because of the underlying hash table used to speed up access, dictionaries have a significant memory overhead. If you are dealing with millions of instances with few attributes, the __slots__ class attribute can save a lot of memory, by letting the interpreter store the instance attributes in a tuple instead of a dict. To define __slots__, create a class attribute with that name and assign it an iterable of strings, where each item names an instance attribute… Defining __slots__ in a class tells the interpreter: "these are all the instance attributes in this class." Python then stores them in a tuple-like structure in each instance, avoiding the memory-consuming per-instance __dict__. This can make a huge difference when millions of instances are active at the same time.

After __slots__ is defined in a class, instances cannot have any attributes other than those named in __slots__. But this is merely a side effect, not the reason __slots__ exists. Do not use __slots__ to prohibit users from adding instance attributes: __slots__ is for optimization, not for constraining programmers.
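A minimal sketch (the Pixel class is invented for illustration):

class Pixel:
    __slots__ = ('x', 'y')   # instances get no __dict__; only these two attributes

p = Pixel()
p.x, p.y = 10, 20
# p.z = 1   # would raise AttributeError: 'Pixel' object has no attribute 'z'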

All in all, __slots__ can be a significant memory saver if used properly, but there are a few caveats.

  • __slots__ must be declared again in each subclass, because the interpreter ignores the inherited __slots__ attribute.
  • Instances can only have the attributes listed in __slots__, unless '__dict__' is added to __slots__ (which defeats the memory savings).
  • Unless '__weakref__' is added to __slots__, instances cannot be targets of weak references.
  • __slots__ cannot be used to prevent new class attributes from being added to the class.

Python has a distinctive feature: class attributes can provide default values for instance attributes. However, assigning to an instance attribute that does not exist creates a new instance attribute; from then on, reads through that instance find the instance attribute, masking the class attribute of the same name. This makes it possible to customize an attribute's value for individual instances…… If you want to change the default for the class, you must change it on the class itself, not through an instance.
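A small demonstration (the Bus class and its capacity attribute are invented):

class Bus:
    capacity = 40        # class attribute: a shared default

b = Bus()
print(b.capacity)        # 40, read through the class
b.capacity = 20          # creates an instance attribute that shadows the class one
print(b.capacity, Bus.capacity)  # 20 40
Bus.capacity = 50        # changing the default must be done on the class itself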

Chapter 10 sequence modification, hashing and slicing

Don't check whether it IS a duck: check whether it QUACKS like a duck, WALKS like a duck, and so on, depending on exactly which subset of duck-like behavior you need for your language games. (comp.lang.python, July 26, 2000) ———— Alex Martelli

excerpts

In object-oriented programming, protocols are informal interfaces defined only in documentation, not in code. For example, Python's sequence protocol requires only the __len__ and __getitem__ methods. Any class that implements these two methods with the standard signatures and semantics can be used wherever a sequence is expected. It doesn't matter what its superclass is, as long as the required methods are provided…… We say it's a sequence because it behaves like a sequence, and that's the point…… Since protocols are informal and unenforced, if you know how a class will be used, you usually only need to implement part of the protocol. For example, to support iteration alone, only the __getitem__ method is needed, not __len__.
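A minimal sketch of the sequence protocol (the three-card Deck is invented for illustration); note that iteration and the in operator work even though only __len__ and __getitem__ are implemented:

class Deck:
    def __init__(self):
        self._cards = ['A', 'K', 'Q']

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

deck = Deck()
print(len(deck))             # 3
print(deck[0])               # 'A'
print(list(reversed(deck)))  # ['Q', 'K', 'A']: iteration works via __getitem__
print('K' in deck)           # True: the in operator also falls back to __getitem__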

In Python documentation, if you see the expression “file-like objects,” you’re usually talking about protocols. This is a short way of saying, “Something that behaves basically like a file, implements part of the file interface, and satisfies context-specific needs.”

You may feel that implementing only part of a protocol is not rigorous enough, but it has the advantage of simplicity. The "Data Model" chapter of The Python Language Reference suggests:

When implementing a class that emulates a built-in type, the emulation only needs to go as far as makes sense for the object being modeled. For example, some sequences may only need to support retrieval of individual elements, not slice extraction.

Instead of implementing unneeded methods just to satisfy an over-designed interface contract and keep the compiler happy, we follow the KISS principle.


When an instance of a sequence is accessed with slice notation, its __getitem__ method receives an instance of the built-in type slice. slice objects provide an indices() method, which works as follows:

S.indices(len) -> (start, stop, stride)  Assuming a sequence of length len, calculate the start and stop indices, and the stride length of the extended slice described by S. Out-of-bounds indices are clipped, just as in a normal slice.

In other words, the indices method exposes the tricky logic that the built-in sequences implement to gracefully handle missing or negative indices, and slices that are longer than the target sequence. This method normalizes start, stop and stride into non-negative integers that fit within the bounds of a sequence of the given length.
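A quick check in the console:

>>> slice(None, 10, 2).indices(5)     # 'ABCDE'[:10:2] is the same as 'ABCDE'[0:5:2]
(0, 5, 2)
>>> slice(-3, None, None).indices(5)  # 'ABCDE'[-3:] is the same as 'ABCDE'[2:5:1]
(2, 5, 1)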

miscellaneous

  • Established protocols in dynamically typed languages evolve naturally. Dynamic typing means checking types at run time, because method signatures and variables carry no static type information.
  • If you implement the __getattr__ method, you should usually also define __setattr__, to avoid inconsistent object behavior.
  • In Python 2, the map function is less efficient because it builds a full list of the results. In Python 3, map is lazy: it returns a generator that yields results on demand, saving memory.
  • As soon as one comparison yields False, the all function returns False.
  • The zip function makes it easy to iterate in parallel over two or more iterables: it returns tuples that can be unpacked into variables, one for each item of the parallel inputs. Note that zip has a peculiar trait: it stops without warning as soon as any iterable is exhausted. itertools.zip_longest behaves differently: it fills in missing values with an optional fillvalue (None by default), so it keeps producing tuples until the longest iterable is exhausted. A quick comparison follows.
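    from itertools import zip_longest

    print(list(zip('ABC', range(5))))
    # [('A', 0), ('B', 1), ('C', 2)]: zip stops at the shortest input
    print(list(zip_longest('ABC', range(5), fillvalue='?')))
    # [('A', 0), ('B', 1), ('C', 2), ('?', 3), ('?', 4)]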

Chapter 11 interfaces: From protocols to abstract base classes

Abstract classes represent interfaces. ———— Bjarne Stroustrup, the father of C++

excerpts

The topic of this chapter is interfaces: from the dynamic protocols that characterize duck typing to abstract base classes (ABCs), which make interfaces explicit and verify that implementations conform to them.

If you have worked with Java, C#, or a similar language, the informal protocols of duck typing will seem quite novel. But for programmers long used to Python or Ruby, this is the "normal" way to think about interfaces; what's new is the stricter specification and type checking of abstract base classes. ABCs were introduced in Python 2.6, 15 years after the Python language was born.

Python was very successful before the introduction of abstract base classes, and even now very little code uses abstract base classes. We talked about duck types and protocols, and defining protocols as informal interfaces is a way to make Python, a dynamically typed language, polymorphic.

duck-typing: A pythonic programming style which determines an object's type by inspection of its method or attribute signature rather than by explicit relationship to some type object ("if it looks like a duck and quacks like a duck, it must be a duck.") By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using ``type()`` or ``isinstance()`` (Note, however, that duck-typing can be complemented with abstract base classes.) Instead, it typically employs ``hasattr()`` tests or EAFP programming.

There is a useful complementary definition of interface: the subset of an object's public methods that give it a specific role in the system. That is what "file-like object" or "iterable" mean in the Python documentation: they refer to no specific class. An interface is the set of methods that fulfill a role. Protocols have nothing to do with inheritance; a class may implement several interfaces, letting its instances play several roles. Protocols are interfaces, but informal ones (defined only by documentation and convention), so they cannot impose the restrictions that formal interfaces do. A class may implement only part of a protocol, and that is allowed. Sometimes an API only requires a "file-like object" to have a .read() method that returns a byte sequence; whether the other file methods are needed depends on the context…… To a Python programmer, "an X-like object", "the X protocol" and "the X interface" all mean the same thing.


Duck typing ignores an object's actual type and focuses instead on whether the object implements the methods, signatures and semantics required. For Python, this basically means avoiding isinstance checks on the object's type…… Duck typing is useful in many situations, but in others, as a design evolves, there is often a better way…… Cladistics ranks lineages by branching…… DNA…… but other traits, such as resistance to different pathogens, are far more important…… So, alongside duck typing, we have goose typing…… Goose typing means: isinstance(obj, cls) is acceptable as long as cls is an abstract base class, that is, the metaclass of cls is abc.ABCMeta…… Abstract base classes have many theoretical advantages over concrete classes, and Python's ABCs add an important practical one: the register class method lets end-user code "declare" a class as a "virtual" subclass of an ABC (for this to work, the registered class must satisfy the ABC's method-name and signature requirements, and most importantly the underlying semantic contract; but it can be developed without any awareness of the ABC, let alone inheriting from it)…… Sometimes you don't even need to register a class for an ABC to recognize it as a subclass…… Resist the temptation to create your own ABCs. Misusing them shows an obsession with surface form, which is bad for a language famous for being practical and pragmatic.

Python does not check whether abstract methods are implemented when a subclass of an ABC is defined, only when the subclass is instantiated at run time. If the abstract methods are not implemented, Python raises TypeError…… In collections.abc, the concrete methods of each abstract base class are implemented in terms of the class's public interface…… When implementing a subclass, we can override methods inherited from the ABC with more efficient implementations. For example, the inherited __contains__ method scans the entire sequence, but if your sequence keeps its elements sorted, you can speed up the search by redefining __contains__ to do a binary search with the bisect function, as sketched below.
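A sketch of that optimization, assuming an invented SortedSeq class that keeps its items sorted:

import bisect
from collections import abc

class SortedSeq(abc.Sequence):
    def __init__(self, items):
        self._items = sorted(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __contains__(self, value):
        # binary search instead of the inherited linear scan
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

s = SortedSeq([30, 10, 20])
print(20 in s)  # True, found via bisect
print(25 in s)  # False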

Since Python 2.6, the standard library has provided abstract base classes. Most of them are defined in the collections.abc module, but there are others elsewhere: for example, the numbers and io packages also define abstract base classes.


A fundamental feature of goose typing is the ability to register a class as a virtual subclass of an ABC, even if it does not inherit from it. In doing so we promise that the class faithfully implements the interface defined in the ABC, and Python believes us without checking. If we lie, ordinary runtime exceptions will catch us…… A virtual subclass is registered by calling the register method on the ABC. The registered class then becomes a virtual subclass of the ABC and is recognized by functions such as issubclass and isinstance, but it inherits no methods or attributes from the ABC…… A virtual subclass does not inherit from the ABC it is registered with, and its conformance to the ABC's interface is never checked, not even when it is instantiated. To avoid runtime errors, the virtual subclass must implement all the methods required.

Before Python 3.3, register could not be used as a decorator. It can be now, but it is more common to call it as a plain function to register classes defined elsewhere. For example, in the source code of the collections.abc module, the built-in types tuple, str, range and memoryview are registered as virtual subclasses of Sequence like this:

Sequence.register(tuple)
Sequence.register(str)
Sequence.register(range)
Sequence.register(memoryview)

An abstract base class can recognize a class as a virtual subclass even without registration. For example, consider the class Struggle defined below:

>>> class Struggle(object):
...     def __len__(self): return 23
...
>>> from collections import abc
>>> isinstance(Struggle(), abc.Sized)
True
>>> issubclass(Struggle, abc.Sized)
True

Struggle is considered a subclass of Sized because abc.Sized implements a special class method named __subclasshook__. Here is the implementation of Sized, including its __subclasshook__:

class Sized(metaclass=ABCMeta):

    __slots__ = ()

    @abstractmethod
    def __len__(self):
        return 0

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

In other words, C is treated as a virtual subclass of Sized if C, or any of its superclasses, defines a __len__ method. __subclasshook__ adds a dash of duck typing to goose typing…… We can define formal interfaces with ABCs, or we can use completely unrelated classes, as long as they implement the specific methods (or do whatever it takes to convince __subclasshook__). Of course, this only works for ABCs that provide a __subclasshook__ method.


One final warning: do not define your own abstract base classes unless you are building a framework meant to be extended by users, which most of the time you are not. In everyday use, our contact with ABCs should consist of creating subclasses of existing ABCs, or registering virtual subclasses with them. Beyond that, we may use ABCs in isinstance checks, though even that is less common than subclassing or registration. Needing to write a new ABC from scratch is rare.

Although abstract base classes make type checking easier, you should not overuse them in your programs. At its core, Python is a dynamic language that brings great flexibility. If you enforce type constraints on all your processing, you make your code more complex than it should be. We should embrace Python's flexibility. ———— David Beazley and Brian Jones, Python Cookbook (3rd Edition)

miscellaneous

  • Sequence protocol: __getitem__ and __len__. Classes that implement these two methods support iteration and the in operator, even without inheriting from abc.Sequence. To implement the mutable sequence protocol, you must also provide a __setitem__ method.

  • Following established protocols increases the odds of leveraging existing standard library and third-party code, thanks to duck typing.

  • To check whether an object is callable, use the built-in callable() function; however, Python provides no function to check whether an object is hashable, so isinstance(obj, Hashable) can be used instead.

  • Before abstract base classes existed, abstract methods were typically implemented with raise NotImplementedError.

  • Things to note when defining your own abstract base class (see the sketch after this list):

    • An abstract base class should inherit from abc.ABC (available since Python 3.4) or set its metaclass to abc.ABCMeta.
    • Abstract methods are marked with the @abstractmethod decorator, and often the body contains only a docstring.
    • An abstract base class can contain concrete methods, but those concrete methods may rely only on the interface defined by the ABC (that is, on its other concrete methods, abstract methods, or properties).
    • Abstract methods can have an implementation. Even so, subclasses must override them, but they can use super() to invoke the abstract method's implementation in the superclass.
  • An abstract base class does not appear in the __mro__ of a virtual subclass registered with register. Likewise, the ABC's list of direct subclasses (the return value of its __subclasses__() method) does not contain virtual subclasses. Instead, the ABC has a _abc_registry data attribute of type WeakSet, holding weak references to the virtual subclasses registered with it.

  • The __subclasshook__ method is described in the manual as follows:

    __subclasshook__(subclass)
    
    Check whether *subclass* is considered a subclass of this ABC. This means
    that you can customize the behavior of issubclass further without the
    need to call register() on every class you want to consider a subclass of
    the ABC. (This class method is called from the __subclasscheck__() method
    of the ABC.)
    This method should return True, False, or NotImplemented. If it returns
    True, the *subclass* is considered a subclass of this ABC. If it returns
    False, the *subclass* is not considered a subclass of this ABC, even if
    it would normally be one. If it returns NotImplemented, the subclass
    check is continued with the usual mechanism.
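A minimal sketch pulling together the points above about defining an ABC (Validator and PositiveValidator are invented examples):

import abc

class Validator(abc.ABC):

    @abc.abstractmethod
    def validate(self, value):
        """Return True if value is acceptable."""

    def validate_all(self, values):
        # concrete method: relies only on the ABC's own interface
        return all(self.validate(v) for v in values)

class PositiveValidator(Validator):
    def validate(self, value):
        return value > 0

print(PositiveValidator().validate_all([1, 2, 3]))  # True
# Validator() would raise TypeError: can't instantiate abstract class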

Chapter 12 advantages and Disadvantages of inheritance

The idea behind inheritance is to make it easy for beginners to use a framework that only experts can design. ———— Alan Kay, "The Early History of Smalltalk"

excerpts

Many people feel that multiple inheritance is not worth the cost. Java's lack of multiple inheritance clearly did it no harm, and the widespread abuse of multiple inheritance in C++ hurt many people, which probably strengthened their resolve to switch to Java.


Subclassing built-in types is tricky. Built-in types could be subclassed starting with Python 2.2, but with an important caveat: the code of the built-ins (written in C) does not call special methods overridden by user-defined subclasses. The PyPy documentation describes the problem concisely in the "Subclasses of built-in types" section of "Differences between PyPy and CPython":

Officially, CPython has no rule at all for when exactly overridden method
of subclasses of built-in types get implicitly called or not. As an
approximation, these methods are never called by other built-in methods of
the same object. For example, an overridden ``__getitem__()`` in a subclass
of ``dict`` will not be called by e.g. the built-in ``get()`` method.

(That is, dict.get() does not call an overridden __getitem__().)

You can solve the problem of subclassing dict above by subclassing UserDict.

The problems described above only occur with method delegation inside the C implementations of the built-in types, and they only affect user-defined classes that subclass those built-ins directly. Subclassing classes written in Python, such as UserDict or MutableMapping, is not affected, as the sketch below shows.
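A sketch of both the problem and the fix, in the spirit of the book's DoppelDict example (the class names here are illustrative):

import collections

class DoppelDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

dd = DoppelDict(one=1)   # dict.__init__ ignores our __setitem__
dd['two'] = 2            # the [] operator does call it
dd.update(three=3)       # dict.update also ignores it
print(dd)                # {'one': 1, 'two': [2, 2], 'three': 3}

class DoppelDict2(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

dd2 = DoppelDict2(one=1)
dd2.update(three=3)
print(dd2)               # {'one': [1, 1], 'three': [3, 3]}: the override is honored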


Any language that implements multiple inheritance must deal with potential naming conflicts caused by unrelated ancestor classes implementing methods of the same name. This is called the "diamond problem"…… Python traverses the inheritance graph in a specific order, called the Method Resolution Order (MRO). Each class has a __mro__ attribute whose value is a tuple listing the superclasses in method resolution order, from the current class up to the object class. The super() function also follows the method resolution order when looking up attributes…… The order of the superclasses listed in a subclass declaration affects the method resolution order.

The method resolution order is computed with the C3 algorithm. Michele Simionato's paper "The Python 2.3 Method Resolution Order" is the authoritative treatment of the C3 algorithm as used for Python's MRO.
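A small diamond example showing the MRO at work (classes A through D are invented):

class A:
    def ping(self):
        print('A.ping')

class B(A):
    pass

class C(A):
    def ping(self):
        print('C.ping')

class D(B, C):   # the "diamond": D inherits from B and C, which both inherit from A
    pass

print(D.__mro__)   # (D, B, C, A, object), following the C3 linearization
D().ping()         # C.ping: the lookup walks __mro__ and finds ping on C first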

Multiple inheritance can play a positive role. The Adapter pattern in Design Patterns: Elements of Reusable Object-Oriented Software uses multiple inheritance, so it can't be all wrong. (The other 22 design patterns in that book use single inheritance, so multiple inheritance is clearly not a panacea.)

Inheritance has many uses, and multiple inheritance adds both options and complexity. It is easy to create confusing and fragile designs with multiple inheritance. We don't have a complete theory, but here are a few tips to avoid tangled class diagrams.

  1. Separate interface inheritance from implementation inheritance

    When using multiple inheritance, first be clear about why you are creating the subclass at all. The main reasons are: inheriting an interface, to create a subtype and establish an "is-a" relationship; and inheriting an implementation, to avoid code duplication through reuse. In practice these two uses often go together, but make the intent clear whenever possible. Code reuse via inheritance is an implementation detail, and it can often be replaced by composition and delegation; interface inheritance is the backbone of a framework.
  2. Interfaces are explicitly represented using abstract base classes

    In modern Python, if the purpose of a class is to define an interface, it should be explicitly defined as an abstract base class.
  3. Reuse code through mixins

    If the purpose of a class is to provide method implementations for multiple unrelated subclasses, enabling reuse without an "is-a" relationship, that class should be explicitly defined as a mixin. Conceptually, a mixin does not define a new type; it merely bundles methods for reuse. A mixin should never be instantiated, and concrete classes should not inherit only from mixins. Each mixin should provide a single specific behavior, implementing a small number of very closely related methods.
  4. Explicitly specifies interfuse in the name

    Since there is no formal way in Python to declare a class as a mixin, it is highly recommended that mixin names end with the "Mixin" suffix (see the sketch after this list).
  5. Abstract base classes can be mixed in, but not the other way around

    Abstract base classes can implement concrete methods, so they can also work as mixins. However, an ABC defines a type, which a mixin does not. And an ABC can be the sole base class of another class, while a mixin should never be the sole superclass, unless it inherits from another, more concrete mixin, which real code rarely does. One restriction applies to ABCs but not to mixins: the concrete methods implemented in an ABC can only collaborate with methods of the same ABC and its superclasses. This shows that concrete methods in an ABC are merely a convenience: everything they do can be accomplished by a user of the class calling other methods of the ABC.
  6. Do not subclass multiple concrete classes

    A concrete class should have no concrete superclass, or at most one. In other words, all but at most one of a concrete class's superclasses should be abstract base classes or mixins.
  7. Provide aggregation classes for users

    If some combination of ABCs or mixins is particularly useful to client code, provide a class that brings them together in a sensible way. Grady Booch calls such a class an aggregate class.
  8. Favor object composition over class inheritance
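A sketch applying tips 3, 4 and 6 (JsonDumpMixin and Event are invented names):

import json

class JsonDumpMixin:
    # a mixin: one small, focused behavior, never instantiated by itself
    def dump_json(self):
        return json.dumps(self.__dict__, default=str)

class Event:
    def __init__(self, name):
        self.name = name

class JsonEvent(JsonDumpMixin, Event):   # concrete class: mixin + one concrete base
    pass

print(JsonEvent('deploy').dump_json())   # {"name": "deploy"}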

miscellaneous

Chapter 13. Correctly overload operators

Some things upset me, like operator overloading. I decided not to support operator overloading, which was purely a matter of choice, as I have seen too many C++ programmers abuse it. ———— James Gosling, father of Java

excerpts

Operator overloading allows user-defined objects to interoperate with infix and unary operators. More broadly, function invocation, attribute access, and item access/slicing also count as operators in Python, but this chapter covers only unary and infix operators.

Operator overloading has a bad reputation in some circles. It is a language feature that can be abused, confusing programmers and causing bugs and unexpected performance bottlenecks. But used well, it leads to pleasant APIs and readable code. Python strikes a balance between flexibility, usability and safety by imposing some restrictions:

  • Operators of built-in types cannot be overridden
  • No new operators can be created, only existing ones can be overloaded
  • A few operators cannot be overloaded: is, and, or, not

Python has three unary operators: +, - and ~, whose special methods are __pos__, __neg__ and __invert__ respectively… The "Data Model" chapter of The Python Language Reference also lists the built-in abs() function as a unary operator; its special method is __abs__.

Supporting the unary operators is as simple as implementing the corresponding special methods, which take a single argument, self. Implement whatever logic makes sense for your class, but follow one fundamental rule of operators: always return a new object. That is, do not modify self; create and return a new instance of a suitable type.


As with the unary operators, the infix operator special methods must never modify the operands. Expressions using these operators are expected to produce new objects as results. Only augmented assignment expressions may modify the first operand (self).

To support operations involving different types, Python implements a special dispatching mechanism for the infix operator special methods. To evaluate the expression a + b, the interpreter performs the following steps:

  1. If a has an __add__ method, call a.__add__(b); if the result is not NotImplemented, return it.
  2. If a has no __add__ method, or calling it returns NotImplemented, check whether b has a __radd__ method; if so, call b.__radd__(a) and return the result, unless it is NotImplemented.
  3. If b has no __radd__ method, or calling it returns NotImplemented, raise TypeError with a message saying the operand types are unsupported.

If an operator special method cannot return a valid result because of type incompatibility, it should return NotImplemented instead of raising TypeError. Returning NotImplemented gives the other operand's type a chance to perform the operation: Python then tries the reversed method, which may well be able to compute the result with the operands swapped.
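A sketch of the dispatch (the Meters class is invented):

class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented      # let the other operand try, or raise TypeError

    def __radd__(self, other):
        return self + other        # the reversed call just delegates to __add__

print((Meters(3) + 2).value)   # 5, via Meters.__add__
print((2 + Meters(3)).value)   # 5, via __radd__ after int.__add__ returns NotImplemented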


The Python interpreter handles the rich comparison operators (==, !=, >, <, >=, <=) similarly, with two important differences:

  • Forward and reverse calls use the same family of methods, only with the arguments swapped: for ==, both the forward and reverse calls invoke __eq__, swapping the arguments; a forward call to __gt__ is answered by a reverse call to __lt__, with the arguments swapped.
  • For == and !=, if the reverse call fails, Python compares the objects' ids instead of raising TypeError.
| Infix expression | Forward method call | Reverse method call | Fallback |
| --- | --- | --- | --- |
| a == b | a.__eq__(b) | b.__eq__(a) | return id(a) == id(b) |
| a != b | a.__ne__(b) | b.__ne__(a) | return not (a == b) |
| a > b | a.__gt__(b) | b.__lt__(a) | raise TypeError |
| a < b | a.__lt__(b) | b.__gt__(a) | raise TypeError |
| a >= b | a.__ge__(b) | b.__le__(a) | raise TypeError |
| a <= b | a.__le__(b) | b.__ge__(a) | raise TypeError |

If a class does not implement the in-place operators, the augmented assignment operators are just syntactic sugar: a += b does exactly the same as a = a + b. That is the expected behavior for immutable types, and if __add__ is defined, += works with no extra code. However, if an in-place operator method such as __iadd__ is implemented, it is called to compute the result of a += b. As their names say, in-place operators change the left operand in place, instead of creating a new object as the result.
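A small demonstration with an invented mutable Tally class:

class Tally:
    def __init__(self):
        self.items = []

    def __add__(self, other):
        t = Tally()
        t.items = self.items + list(other)  # new object; operands untouched
        return t

    def __iadd__(self, other):
        self.items.extend(other)            # modify in place...
        return self                         # ...and return self, as __iadd__ must

t = Tally()
before = id(t)
t += [1, 2]
print(id(t) == before)  # True: += reused the same object via __iadd__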

miscellaneous

Chapter 14. Iterable objects, iterators, and Generators

When I see patterns in my own programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign (to me, at least) that I'm not using abstractions powerful enough ———— often that I'm generating by hand the expansions of some macro I need to write. ———— Paul Graham, Lisp hacker and venture capitalist

excerpts

Iteration is the cornerstone of data processing: when scanning a dataset that doesn't fit in memory, we need a way to fetch items lazily, one at a time and on demand. That is the Iterator pattern. Unlike Lisp, Python has no macros, so abstracting the iterator pattern required changing the language itself: the yield keyword was added in Python 2.2. yield builds generators, which work as iterators.

All generators are iterators, because generators fully implement the iterator interface. In Python 3, generators are used pervasively: even the built-in range() function now returns a generator-like object, where it used to return a full list.


Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x). The built-in iter function does the following:

  1. Checks whether the object implements __iter__; if so, calls it to obtain an iterator.
  2. If __iter__ is not implemented but __getitem__ is, Python creates an iterator that tries to fetch items in order, starting from index 0.
  3. If that fails as well, Python raises TypeError, usually saying "C object is not iterable", where C is the class of the target object.

As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle TypeError if it isn't. This is more accurate than isinstance(x, abc.Iterable), because iter(x) also considers the legacy __getitem__ method, which the abc.Iterable class does not.
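A tiny helper in that spirit (is_iterable is a made-up name):

def is_iterable(x):
    # calling iter() is the most accurate test: it also honors legacy __getitem__
    try:
        iter(x)
        return True
    except TypeError:
        return False

print(is_iterable('abc'), is_iterable(42))  # True False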

Iterator objects are obtained with the iter function. An object is iterable if it implements an __iter__ method that returns an iterator…… Python obtains iterators from iterables…… Use iter() to get an iterator, and next() to use it.

The standard iterator interface has two methods: __next__ returns the next available element, raising StopIteration when there are no more; __iter__ returns self, so that an iterator can be used wherever an iterable is expected. Because the __subclasshook__ implemented by abc.Iterator checks for both methods, isinstance(x, abc.Iterator) can be used to determine whether x is an iterator, regardless of whether x's class is a real or virtual subclass of abc.Iterator.

Because the only methods an iterator needs are __next__ and __iter__, there is no way to check whether items remain other than calling next() and catching StopIteration. Also, there is no way to "reset" an iterator: if you need to iterate again, call iter() on the iterable to build a new iterator. Calling iter() on the iterator itself won't help, because Iterator.__iter__ simply returns self, so it cannot restore an exhausted iterator.


A Python function is a generator function if it has the yield keyword in its body. Calling a generator function returns a generator object; in other words, a generator function is a generator factory…… The returned generator object wraps the body of the function. When we pass the generator to the next() function, the body advances to the next yield, the generator yields that value, and execution is suspended at that point. Finally, when the function body returns, the enclosing generator object raises StopIteration, in line with the Iterator protocol.

The generator doesn't "return" a value in the usual way: a return statement in a generator function body makes the generator object raise StopIteration. Before Python 3.3, providing a value with return inside a generator was an error. Since Python 3.3, a return statement may carry a value; StopIteration is still raised, but the caller can retrieve the return value from the exception object.
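For example:

def gen():
    yield 1
    return 'done'        # the value travels inside StopIteration (Python 3.3+)

g = gen()
print(next(g))           # 1
try:
    next(g)
except StopIteration as exc:
    print(exc.value)     # 'done'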

A generator expression can be understood as a lazy version of a list comprehension: instead of eagerly building a list, it returns a generator that produces the elements on demand. In other words, if a list comprehension is a factory for lists, a generator expression is a factory for generators…… Generator expressions are syntactic sugar for generator functions…… The generator expression syntax is a concise way to create a generator without first defining and then calling a function. Generator functions are far more flexible, though: they can implement complex logic across multiple statements, and can even be used as coroutines…… Choosing between the two is easy: if the generator expression would span more than a couple of lines, a generator function is more readable. Besides, generator functions have names and can be reused.


The itertools module in Python 3.4 provides 19 generator functions, and other standard library modules offer many more for various purposes. Many generator functions can be combined…… If a generator function needs to produce the values generated by another generator, the traditional solution is a nested for loop. For example, here is our own implementation of the chain generator function:

>>> def chain(*iterables):
...     for it in iterables:
...         for i in it:
...             yield i
...
>>> s = "ABC"
>>> t = tuple(range(3))
>>> list(chain(s, t))
["A", "B", "C", 0, 1, 2]
Copy the code

The chain generator function passes the operation in turn to the iterables it receives. To this end, “PEP 380 – Syntax for Delegating to a Subgenerator” introduces a new Syntax:

>>> def chain(*iterables):
...     for i in iterables:
...         yield from i
...
>>> list(chain(s, t))
 ["A", "B", "C", 0, 1, 2]
Copy the code

yield from i replaces the inner for loop completely. Besides replacing a loop, yield from creates a channel connecting the inner generator to the client of the outer generator. This channel becomes really important when generators are used as coroutines: then they not only produce values for client code, but can also consume values provided by it.


Python 2.2 introduced generator functions with the yield keyword, and about five years later Python 2.5 implemented "PEP 342 - Coroutines via Enhanced Generators". That proposal added methods and functionality to generator objects, most notably the .send() method…… Like .__next__(), the .send() method makes the generator advance to the next yield; but .send() also lets the client using the generator send data into it: whatever argument is passed to .send() becomes the value of the corresponding yield expression inside the generator function body. In other words, .send() enables two-way data exchange between the client code and the generator, while .__next__() only lets the client fetch data from it. This is a major improvement that even changes the nature of generators: used this way, they become coroutines.
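A tiny demonstration (the echo coroutine is invented):

def echo():
    received = yield 'ready'        # each .send() value shows up here
    while True:
        received = yield 'got ' + received

coro = echo()
print(next(coro))        # 'ready': priming run to the first yield
print(coro.send('hi'))   # 'got hi'
print(coro.send('bye'))  # 'got bye'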

In a famous tutorial given at PyCon US 2009 (www.dabeaz.com/coroutines/), David Beazley (probably the most prolific author and speaker on coroutines in the Python community) warned:

  • Generators are used to generate data for iteration
  • Coroutines are consumers of data
  • To avoid a brain explosion, the two concepts should not be confused
  • Coroutines are independent of iteration
  • Note: although yield is used in coroutines to produce values, this has nothing to do with iteration

miscellaneous

  • If a function or constructor takes only one argument, a generator expression can be passed without an extra pair of parentheses around it: the parentheses of the function call are enough.

    sum(n for n in range(5))

Chapter 15 context Managers and ELSE blocks

Context managers may end up being almost as important as the subroutine itself. So far we have only scratched their surface…… Basic has a with statement, and so do many other languages. But the with statement does different things in each of them, and what it does is something very shallow: it saves you from repeated dotted attribute lookups, but it doesn't do setup and teardown. Don't assume the same name means the same functionality. The with statement is a very big deal. ———— Raymond Hettinger, eloquent Python evangelist

excerpts

This language feature is no secret, but it is often overlooked: the else clause can be used not only in if statements, but also in for, while, and try statements. for/else, while/else, and try/else are closely related semantically, but very different from if/else:

  • for/else – the else block runs only when the for loop runs to completion (the iterable is exhausted and the loop was not aborted early by break).
  • while/else – the else block runs only when the while loop exits because the condition became false, and the loop was not aborted early by break.
  • try/else – the else block runs only when no exception is raised in the try block.

In all cases, the else clause is skipped if an exception or a return, break, or continue statement causes control to jump outside the main block of the compound statement.


Context manager objects exist to be used in with statements, just as iterators exist to be used in for statements. The with statement simplifies the try/finally pattern, which guarantees that some operation is performed after a block of code runs, even if that block is aborted by an exception, a return statement, or a call to sys.exit(); the finally clause usually releases a critical resource or restores previously changed temporary state.

The context manager protocol consists of the __enter__ and __exit__ methods. When the with statement starts running, __enter__ is invoked on the context manager object; when the with block completes, __exit__ is invoked, playing the role of the finally clause…… The value returned by __enter__ can be bound to a variable with the as clause…… If an exception is raised inside the with block, the __exit__ method receives the exception details in its arguments. If __exit__ handles the exception and the with statement should not propagate it, __exit__ must return True, and the exception is then suppressed. If __exit__ does not explicitly return a value, the with statement receives None and propagates the exception.

with LookingGlass() as what:
    print(what)

The Python standard library includes the contextlib module, which helps with defining and using context managers. Below is an example using contextlib.contextmanager:

import contextlib

@contextlib.contextmanager
def looking_glass():
    import sys
    original_write = sys.stdout.write

    def reverse_write(text):
        original_write(text[::-1])

    sys.stdout.write = reverse_write
    yield "JABBERWOCKY"
    sys.stdout.write = original_write

with looking_glass() as what:
    print(what)

The @contextmanager decorator is elegant and practical, combining three distinct Python features: function decorators, generators, and the with statement.

miscellaneous

  • EAFP vs. LBYL

Chapter 16 coroutines

If Python books are any guide, the least documented, least known Python feature is, on its face, the least useful. ———— David Beazley, author of Python books

excerpts

Dictionaries give two meanings for the verb "to yield": to produce and to give way. Both apply to the yield keyword in Python generators. A line such as yield item produces a value that is received by the caller of next(); it also gives way, suspending the generator so the caller continues running until it needs another value and calls next() again. The caller pulls values out of the generator.

Syntactically, coroutines are similar to generators: both are functions whose body contains the yield keyword. However, in a coroutine, yield usually appears on the right side of an expression and may or may not produce a value: if no expression follows the yield keyword, the generator yields None. The coroutine may receive data from the caller, who provides it using the .send(datum) method instead of the next() function. Usually, the caller pushes values into the coroutine.

yield may even be used without receiving or sending data. Regardless of how data flows, yield is a flow-control device that enables cooperative multitasking: each coroutine can yield control to a central scheduler so that other coroutines can be activated.

It’s easier to understand coroutines by thinking of yield primarily as a way of controlling the flow.


The underlying architecture of coroutines is defined in "PEP 342 - Coroutines via Enhanced Generators" and was implemented in Python 2.5 (2006). Since then, the yield keyword can be used in an expression, and the .send(value) method was added to the generator API. The caller of a generator can use .send(…) to pass data that becomes the value of the yield expression inside the generator function. This allows generators to be used as coroutines: procedures that collaborate with the caller, yielding values provided by the caller.

In addition to .send(), PEP 342 also added the .throw() and .close() methods: the former lets the caller throw an exception to be handled inside the generator; the latter terminates the generator.

The most recent evolution of coroutines came with Python 3.3 (2012) and "PEP 380 - Syntax for Delegating to a Subgenerator". PEP 380 made two syntax changes to generator functions, to make them more useful as coroutines:

  • A generator can now return a value; previously, giving return a value inside a generator raised SyntaxError.
  • The new yield from syntax allows complex generators to be refactored into small, nested generators, eliminating the boilerplate previously needed to delegate work to a subgenerator.

A coroutine can be in one of four states. The current state can be determined with the inspect.getgeneratorstate() function, which returns one of the following strings:

  • GEN_CREATED – waiting to start execution
  • GEN_RUNNING – currently being executed (you will only see this state in a multithreaded application, or if the generator object calls getgeneratorstate on itself, which is not useful)
  • GEN_SUSPENDED – suspended at a yield expression
  • GEN_CLOSED – execution has finished

A newly created coroutine is in the GEN_CREATED state. The caller must activate it with next(my_coro) or my_coro.send(None). The coroutine then runs to the first yield expression and becomes ready to receive values (this initial step is known as "priming" the coroutine). If you use .send() to pass a value other than None to a just-created coroutine, you get this error:

>>> def simple_coroutine():  # a minimal coroutine, assumed here to make the example self-contained
...     x = yield
...
>>> my_coro = simple_coroutine()
>>> my_coro.send(1729)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started generator

Many frameworks provide special decorators for coroutines, though not all of them prime the coroutine; some provide other services, such as hooking into an event loop…… A coroutine invoked through yield from is primed automatically. The asyncio.coroutine decorator in the Python 3.4 standard library does not prime coroutines, precisely to stay compatible with yield from.

An unhandled exception inside a coroutine bubbles up to the caller of the next function or send method that triggered it…… Since Python 2.5, client code can explicitly send exceptions into a coroutine through two generator methods: throw and close. throw makes the generator raise the given exception at the suspended yield expression; if the generator handles it, execution advances to the next yield, and the value produced becomes the return value of the throw call; if it doesn't, the exception bubbles up to the caller's context…… close makes the generator raise GeneratorExit at the suspended yield expression. If the coroutine doesn't handle that exception, or if it raises StopIteration (usually by running to completion), no error is reported to the caller. On receiving GeneratorExit, the coroutine must not yield another value, otherwise the interpreter raises RuntimeError. Any other exception the coroutine raises does bubble up to the caller.

Alternatively, a coroutine can be terminated by sending it a sentinel value that tells it to exit. Built-in singletons such as None and Ellipsis are convenient sentinel values; an advantage of Ellipsis is that it rarely appears in data streams.


yield from is a completely new language construct that does much more than yield…… Its main power is opening a bidirectional channel between the outermost caller and the innermost subgenerator, so that values can be sent and yielded directly between them, and exceptions can be thrown in directly, without lots of exception-handling boilerplate in the intermediate coroutines.

Using the yield from construct requires significant changes to the way code is arranged. PEP 380 uses some specific terms to describe it:

  • delegating generator – the generator function that contains the yield from <iterable> expression.
  • subgenerator – the generator obtained from the <iterable> part of the yield from expression.
  • caller – the client code that calls the delegating generator.
When the delegating generator is suspended at a yield from expression, the caller can send data directly to the subgenerator, which yields values directly back to the caller. When the subgenerator returns, the interpreter raises StopIteration with the return value attached to the exception object, and the delegating generator resumes.

The <iterable> part of the yield from expression can be a simple iterator that implements only the __next__ method, or a generator that implements __next__, send, close and throw.

The behavior of yield from is described in PEP 380 as follows (a small delegating-generator sketch follows this list):

  • Any values that the subgenerator yields are passed directly to the caller of the delegating generator.
  • Any values sent to the delegating generator with send() are passed directly to the subgenerator. If the sent value is None, the subgenerator's __next__() method is called; otherwise its send() method is called. If that call raises StopIteration, the delegating generator resumes running; any other exception propagates up to the delegating generator.
  • return expr in a generator (or subgenerator) causes StopIteration(expr) to be raised upon exit from the generator.
  • The value of the yield from expression is the first argument of the StopIteration exception raised by the subgenerator when it terminates.
  • Exceptions other than GeneratorExit thrown into the delegating generator are passed to the subgenerator's throw() method. If that call raises StopIteration, the delegating generator resumes; any other exception propagates up to the delegating generator.
  • If a GeneratorExit exception is thrown into the delegating generator, or its close() method is called, the subgenerator's close() method is called if it has one. If that call raises an exception, it propagates to the delegating generator; otherwise the delegating generator raises GeneratorExit.

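A minimal sketch of the three roles (subgen, delegator, and the client code are invented names): the caller sends values straight through the delegating generator into the subgenerator, and the subgenerator's return value becomes the value of the yield from expression.

def subgen():
    total = 0
    while True:
        value = yield
        if value is None:
            return total    # travels back inside StopIteration
        total += value

def delegator(results):
    while True:
        results.append(yield from subgen())  # resumes here with subgen's return value

results = []
d = delegator(results)
next(d)            # prime the delegating generator
for v in (10, 20, None):
    d.send(v)      # values pass straight through to subgen
print(results)     # [30]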
miscellaneous

  • A coroutine terminates when its body runs to completion, when its reference count drops to zero, or when the caller invokes its .close() method.
  • As of Python 3.3, a generator function can use return to send a value to the caller. The value is smuggled out as an attribute of the StopIteration exception. This is somewhat counterintuitive, but it preserves the usual behavior of generator objects: raising StopIteration when exhausted.
  • With yield from, the value returned by the subgenerator's return statement becomes the value of the yield from expression in the delegating generator.

Chapter 17. Concurrent processing of objects

It's usually systems programmers who bash threads, with use cases in mind that the average application programmer will probably never encounter in a lifetime... In 99% of the use cases an application programmer is likely to run into, the simple pattern of spawning a bunch of independent threads and collecting the results in a queue is all one needs to know. ———— Michele Simionato, who thinks deeply about Python

excerpts

This chapter focuses on the concurrent.futures module introduced in Python 3.2; it can also be used in Python 2.5 and later by installing the futures package from PyPI. This library encapsulates the pattern Michele Simionato describes in the quote above, making it particularly easy to use.

We use the term "futures" for objects that represent operations performed asynchronously. This concept is very useful, and it underlies both the concurrent.futures module and the asyncio package.


The main feature of the concurrent.futures module is the ThreadPoolExecutor and ProcessPoolExecutor classes, which implement interfaces to execute callable objects in different threads or processes. Internally, these two classes maintain a pool of worker threads or processes and a queue of tasks to execute.

Since Python 3.4, there are two classes named Future in the standard library: concurrent.futures.Future and asyncio.Future. They serve the same purpose: an instance of either represents a deferred computation that may or may not have completed. This is similar to the Deferred class in Twisted, the Future class in the Tornado framework, and Promise objects in several JavaScript libraries.

Futures encapsulate pending operations: they can be put in queues, queried for completion status, and asked for their results (or the exceptions raised)…… As a rule, client code should not create futures or change their state; the concurrency framework creates them, and changes their state when the computation a future represents completes.

Every future has a .done() method, which is non-blocking and returns a Boolean telling whether the callable wrapped by the future has finished executing. But client code usually doesn't ask whether a future is done: it waits for notification. That's why both Future classes have an .add_done_callback() method: it takes a single argument, a callable, which is invoked when the future completes.

After the future finishes running, its .result() method returns the result of the callable (or re-raises any exception thrown while the task executed).

Several functions in both libraries return futures; others use futures internally in ways that are easy for users to understand. Executor.map is in the latter category: it returns an iterator whose __next__ method calls each future's result method, so what we get are the results of the futures, not the futures themselves.
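A sketch of Executor.map usage (the fetch helper and the URLs are illustrative assumptions):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # a hypothetical download helper, just for illustration
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

urls = ['https://example.com', 'https://example.org']
with ThreadPoolExecutor(max_workers=4) as executor:
    # map returns results in input order, calling .result() under the hood
    for url, size in zip(urls, executor.map(fetch, urls)):
        print(url, size)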


The CPython interpreter is not thread-safe internally, so it has a Global Interpreter Lock (GIL) that allows only one thread at a time to execute Python bytecode. That's why a single Python process usually cannot use more than one CPU core at a time. This is a limitation of the CPython interpreter, not of the Python language itself: Jython and IronPython are not limited this way, though PyPy, currently the fastest Python interpreter, also has a GIL.

We have no control over the GIL while writing Python code. However, built-in functions and extensions written in C can release the GIL while running time-consuming tasks. In fact, a Python library written in C can manage the GIL, launch its own operating system threads, and use as many CPU cores as it needs; but this considerably complicates the library's code, so most library authors don't do it.

However, every library function that performs blocking I/O releases the GIL while waiting for a result from the operating system. This means multithreading can be used at the Python language level, and I/O-intensive Python programs benefit: while one Python thread waits for a network response, the blocking I/O function releases the GIL so another thread can run.

That's why Python threads are particularly well suited to I/O-intensive applications. Meanwhile, the concurrent.futures.ProcessPoolExecutor class makes it easy to sidestep the GIL for CPU-bound work.

miscellaneous

  • Handling network I/O efficiently requires concurrency: networks have high latency, so rather than waste CPU cycles waiting for a response, it's better to do something else in the meantime.
  • The combination of executor.submit and futures.as_completed is more flexible than executor.map, because submit can schedule different callables and arguments, while executor.map can only run the same callable on different arguments. Moreover, the set of futures passed to futures.as_completed may come from more than one Executor instance: some created by a ThreadPoolExecutor, others by a ProcessPoolExecutor.
  • For CPU- and data-intensive parallel processing, a new tool is available: the distributed computing engine Apache Spark. Spark has strong momentum in the big data world, offers a Python-friendly API, and supports handling Python objects as data.

Chapter 18 uses the Asyncio package to handle concurrency

Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once. Not the same, but related. One is about structure, the other about execution. Concurrency provides a way to structure a solution to a problem that may (but need not) be parallelizable. ———— Rob Pike, one of the creators of Go

excerpts

This chapter introduces the asyncio package, which implements concurrency using coroutines driven by an event loop. It is one of the largest and most ambitious libraries ever added to Python. Guido van Rossum developed asyncio outside the Python repository under the codename "Tulip"; the project was renamed asyncio when it was added to the standard library in Python 3.4. The package is also compatible with Python 3.3, but because it makes heavy use of yield from expressions, it cannot run on older versions of Python.


The Python community tends to ignore one fact: access to the local file system blocks, and we casually assume such access does not suffer the high latency of network access. By contrast, Node.js programmers are constantly reminded that all file system functions block, because their signatures demand a callback. Blocking for disk I/O wastes millions of CPU cycles, which can have a significant impact on application performance. The asyncio event loop keeps a ThreadPoolExecutor object behind the scenes; we can call the run_in_executor method and hand it a callable to execute there.
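A sketch using the pre-3.5 API this chapter targets (slow_read and the file path are invented for illustration):

import asyncio

def slow_read(path):
    # an ordinary blocking call
    with open(path, 'rb') as f:
        return f.read()

loop = asyncio.get_event_loop()
# passing None selects the loop's default ThreadPoolExecutor
fut = loop.run_in_executor(None, slow_read, '/etc/hostname')
print(len(loop.run_until_complete(fut)))
loop.close()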

miscellaneous

  • The Python thread scheduler may interrupt a thread at any time. Locks must be used to protect the critical sections of a program, preventing a multi-step operation from being interrupted midway and leaving data in an invalid state.

Chapter 19 dynamic properties and features

Properties are crucial: their existence makes it perfectly safe, and indeed advisable, to expose public data attributes as part of a class's public interface. ———— Alex Martelli, Python contributor and book author

excerpts

In Python, data attributes and the methods that process them are collectively known as attributes; in fact, a method is just an attribute that happens to be callable. Besides these two, we can also create properties, which let us use accessor methods to modify data attributes without changing the class interface. This agrees with the Uniform Access Principle:

All services offered by a module should be available through a uniform notation, which does not betray whether they are implemented through storage or through computation.

Besides properties, Python provides a rich API for controlling attribute access and implementing dynamic attributes. When an attribute is accessed with dot notation (such as obj.attr), the Python interpreter calls special methods (such as __getattr__ and __setattr__) to evaluate it. A user-defined class can implement "virtual attributes" through the __getattr__ method: the value of an attribute that does not actually exist is computed on the fly whenever it is accessed.
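
A minimal sketch of the uniform access idea, with illustrative names: clients read obj.radius with the same notation whether it is stored or computed.

class Circle:
    def __init__(self, diameter):
        self.diameter = diameter

    @property
    def radius(self):  # computed, but accessed like plain data
        return self.diameter / 2

c = Circle(4)
print(c.radius)  # 2.0 -- same notation as a stored attribute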


Creating attributes on the fly is a form of metaprogramming that framework authors often practice…… The key technique is the __getattr__ method, which Python calls only when the attribute cannot be retrieved in the usual way (that is, when the named attribute cannot be found in the instance, its class, or any superclass).
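
A minimal sketch of a virtual attribute, in the spirit of the book's FrozenJSON example (simplified here): __getattr__ fires only when normal lookup fails, so ordinary attributes are served as usual.

class FrozenJSON:
    # A read-only facade for navigating a mapping with attribute notation.
    def __init__(self, mapping):
        self.__data = dict(mapping)

    def __getattr__(self, name):  # called only when normal lookup fails
        if name in self.__data:
            value = self.__data[name]
            return FrozenJSON(value) if isinstance(value, dict) else value
        raise AttributeError(name)

record = FrozenJSON({'name': 'Ada', 'address': {'city': 'London'}})
print(record.name)          # 'Ada'
print(record.address.city)  # 'London'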


We often call the __init__ method the constructor, a term borrowed from other languages. In fact, the special method that constructs an instance is __new__: it is a class method (handled in a special way, so the @classmethod decorator is not needed) and it must return an instance; that instance is in turn passed as the first argument (self) to the __init__ method. Because __init__ must not return anything, it is really an "initializer". We hardly ever need to write __new__ ourselves; the implementation inherited from object suffices. The __new__ method may also return an instance of a different class, in which case the interpreter does not call __init__.

__new__ is a class method whose first argument is the class itself; the remaining arguments are the same ones __init__ receives, except for self.
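
Roughly speaking, constructing an object behaves like the following sketch (this is not the interpreter's actual code; the names object_maker and the_class are illustrative):

def object_maker(the_class, *args, **kwargs):
    new_object = the_class.__new__(the_class, *args, **kwargs)
    if isinstance(new_object, the_class):
        the_class.__init__(new_object, *args, **kwargs)
    return new_object

class Foo:
    def __init__(self, bar):
        self.bar = bar

# These two statements are roughly equivalent:
x = Foo('bar')
y = object_maker(Foo, 'bar')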


Although the built-in property is often used as a decorator, it is actually a class. In Python, functions and classes are often interchangeable, because both are callable objects and there is no new operator for instantiating objects, so invoking a constructor is no different from invoking a factory function. Moreover, either can be used as a decorator, as long as it returns a new callable to replace the decorated function.

Properties in a class affect how instance attributes are found. A property is a class attribute, but it manages attribute access in instances of the class. As mentioned earlier, when an instance and its class both have a data attribute with the same name, the instance attribute masks the class attribute, at least when the attribute is read through that instance. But this rule does not apply to properties: a property is not masked by an instance attribute. The sketch below demonstrates this.
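
A minimal sketch of that behavior, with illustrative names; reading obj.data still finds the property on the class even after an entry of the same name is planted in the instance's __dict__:

class Demo:
    @property
    def data(self):
        return 'from the property'

d = Demo()
d.__dict__['data'] = 'from the instance'  # bypasses the read-only property
print(d.data)              # 'from the property' -- the property wins
print(d.__dict__['data'])  # 'from the instance'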

miscellaneous

  • Packages that let you access dictionary elements with attribute notation: AttrDict (pypi.python.org/pypi/attrdi…) and addict (pypi.python.org/pypi/addict).

  • Passing a function argument through dict() both ensures that the argument is a dict (or can be converted to one) and creates a copy, so that the caller's argument is not accidentally modified.

  • The keyword.iskeyword() function can be used to determine whether a string is a Python keyword. The s.isidentifier() method provided by Python 3's str class can be used to determine whether a string is a valid Python identifier.

  • The shelve module provides persistent storage for pickled objects.

    A "shelf" is a persistent, dictionary-like object. The difference with
    "dbm" databases is that the values in a shelf can be essentially
    arbitrary Python objects -- anything that the pickle module can handle.
  • We can quickly create instance attributes from constructor keyword arguments, as follows:

    class Record(object):
        def __init__(self, **kwargs):
            # Each keyword argument becomes an instance attribute.
            self.__dict__.update(kwargs)
  • The dir() function lists most attributes of an object. It is intended for interactive use, so it does not provide a complete list of attributes, only the "interesting" attribute names. dir can inspect objects whether or not they have a __dict__ attribute. It does not list the __dict__ attribute itself, but it does list the keys inside it. Nor does dir list several special attributes of a class, such as __mro__, __bases__, and __name__.

  • The vars() function returns the __dict__ attribute of the object.

Chapter 20 attribute Descriptors

Once you learn descriptors, you will not only have a larger toolset at your disposal, you will also gain a deeper understanding of how Python works and a genuine appreciation for the elegance of Python's design. ———— Raymond Hettinger, Python core developer and expert

excerpts

Descriptors are a way of applying the same access logic to multiple attributes. For example, the field types in ORMs such as Django ORM and SQLAlchemy are descriptors; they map the data in the fields of database records to attributes of Python objects. A descriptor is a class that implements a particular protocol, consisting of the __get__, __set__, and __delete__ methods. The property class implements the full descriptor protocol; in practice, implementing only part of the protocol is usually enough. Descriptors are a distinguishing feature of Python, deployed not only at the application level but also in the language's infrastructure: besides properties, methods and the classmethod and staticmethod decorators are all implemented with descriptors.


A descriptor is used by creating an instance of it as a class attribute of another class. The class implementing the descriptor protocol is called the descriptor class; the class that declares descriptor instances as class attributes is called the managed class; and the public attributes in the managed class that are handled by descriptor instances are called managed attributes.

The __get__ method takes three arguments: self, instance, and owner. The owner argument is a reference to the managed class; it comes in handy when the descriptor is used to get attributes from the managed class itself. If a managed attribute is accessed through the managed class rather than an instance, the descriptor's __get__ method receives None as the value of the instance argument. A sketch of this vocabulary follows.
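
A minimal sketch putting those terms together, patterned after the book's LineItem example (simplified): Quantity is the descriptor class, LineItem is the managed class, and weight is a managed attribute.

class Quantity:
    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError('value must be > 0')
        instance.__dict__[self.storage_name] = value

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed through the managed class
        return instance.__dict__[self.storage_name]

class LineItem:
    weight = Quantity('weight')  # descriptor instance as class attribute

    def __init__(self, weight):
        self.weight = weight  # goes through Quantity.__set__

item = LineItem(8)
print(item.weight)  # 8 -- read through Quantity.__get__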


As mentioned earlier, Python treats reading and writing attributes unequally. When an attribute is read through an instance, the attribute defined in the instance is normally returned; but if there is no such attribute in the instance, a class attribute by that name is retrieved. When a value is assigned to an attribute through an instance, the attribute is normally created in the instance, without affecting the class at all.

This unequal treatment also affects descriptors. In fact, descriptors fall into two broad categories, depending on whether the __set__ method is defined. A descriptor that implements __set__ is an overriding descriptor: although descriptors are class attributes, one that implements __set__ overrides assignments to instance attributes. A property is also an overriding descriptor: if no setter function is provided, its __set__ method raises AttributeError to signal that the attribute is read-only…… A descriptor that does not implement __set__ is a non-overriding descriptor. If an instance attribute with the same name is set, the descriptor is shadowed and can no longer handle that attribute on that instance. Methods are implemented as non-overriding descriptors.

If an overriding descriptor does not implement __get__, only write operations are handled by the descriptor. Reading the descriptor through an instance returns the descriptor object itself. If the instance's __dict__ contains an entry with the same name, a write operation is still handled by the descriptor, but a read of that name returns the instance attribute instead. That is, for reads, an instance attribute shadows an overriding descriptor that lacks __get__. The sketch below contrasts the two kinds of descriptor.
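
A minimal sketch contrasting overriding and non-overriding descriptors, with illustrative names:

class Overriding:       # has __set__, so it is an overriding descriptor
    def __get__(self, instance, owner):
        return 'Overriding.__get__'
    def __set__(self, instance, value):
        print('Overriding.__set__ called')

class NonOverriding:    # no __set__: a non-overriding descriptor
    def __get__(self, instance, owner):
        return 'NonOverriding.__get__'

class Managed:
    over = Overriding()
    non_over = NonOverriding()

obj = Managed()
obj.over = 7         # handled by Overriding.__set__
print(obj.over)      # 'Overriding.__get__' -- the instance dict is bypassed
obj.non_over = 7     # creates a plain instance attribute
print(obj.non_over)  # 7 -- the instance attribute shadows the descriptor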

In addition, descriptors are defined as class attributes, and a descriptor attached to a class has no control over assignments to the class attribute itself. This means that assigning to a class attribute can destroy the descriptor attribute. This is yet another asymmetry between reading and writing attributes: reading a class attribute can be handled by the __get__ method of a descriptor attached to the managed class, but writing a class attribute is not handled by the __set__ method of that descriptor.

To control the setting of class attributes, the descriptor must be attached to the class of the class, that is, to the metaclass. By default, the metaclass of a user-defined class is type, and we cannot add attributes to type. But in Chapter 21 we will create metaclasses ourselves.

Because user-defined functions have a __get__ method, when attached to a class they act as descriptors. But functions do not implement __set__, so they are non-overriding descriptors. Like other descriptors, when accessed through the managed class, a function's __get__ method returns a reference to the function itself. But when accessed through an instance, __get__ returns a bound method object: a callable that wraps the function and binds the instance to the function's first argument…… The bound method object also has a __call__ method, which handles the actual invocation: it calls the original function referenced by the __func__ attribute, passing the method's __self__ attribute as the first argument. That is how the self argument is implicitly bound.

The way functions turn into bound methods is a prime example of descriptors at work in the infrastructure of the Python language. A short demonstration follows.
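
A minimal sketch of a function acting as a descriptor, patterned after the book's Text example (simplified here to subclass str):

class Text(str):
    def reverse(self):
        return self[::-1]

word = Text('forward')
print(Text.reverse)   # a plain function
print(word.reverse)   # a bound method
print(word.reverse())                         # 'drawrof'
print(word.reverse.__func__ is Text.reverse)  # True
print(word.reverse.__self__ is word)          # True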


Descriptor usage tips:

  • Use property to keep it simple
  • Read-only descriptors require a __set__ method (one that raises AttributeError)
  • Descriptors used for validation can work with __set__ only
  • Caching can be done efficiently with __get__ only
  • Non-special methods can be shadowed by instance attributes: the interpreter looks up special methods only on the class itself

miscellaneous

  • To support introspection and other metaprogramming techniques, when a managed attribute is accessed through the class it is best to have __get__ return the descriptor instance itself. For example:

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed through the class: expose the descriptor
        else:
            return getattr(instance, self.storage_name)
  • Python function objects themselves implement __get__ methods, that is, functions themselves are non-overriding descriptors. In this way, functions that are assigned to class attributes can be used as methods.

Chapter 21 class metaprogramming

Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they do, and don't need an explanation about why). ———— Tim Peters, inventor of the Timsort algorithm and prolific Python contributor

excerpts

Class metaprogramming is the art of creating or customizing classes at run time. In Python, classes are first-class objects, so you can use functions to create new classes at any time without using the class keyword. Class decorators are also functions, but can review, modify, and even replace decorated classes with other classes. Finally, metaclasses are the ultimate tool in class metaprogramming: you can use metaclasses to create entirely new classes with certain characteristics, such as the abstract base classes we’ve seen.


In general, we think of type as a function, because we use it like one: calling type(my_object) gets the object's class, with the same effect as my_object.__class__. But type is in fact a class. When used as a class and called with three arguments, it creates a new class:

MyClass = type("MyClass", (MySuperClass, MyMixin),
               {'x': 42, 'x2': lambda self: self.x * 2})

The three arguments of type are name, bases, and dict; the last one is a mapping that specifies the attribute names and values of the new class…… Calling type with three arguments is a common way to create a class on the fly.
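
For reference, a sketch of the equivalent class statement (with stub base classes so it runs standalone):

class MySuperClass: pass   # stub, standing in for a real base class
class MyMixin: pass        # stub, standing in for a real mixin

class MyClass(MySuperClass, MyMixin):
    x = 42

    def x2(self):
        return self.x * 2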


To do metaprogramming properly, you must know when the Python interpreter evaluates each block of code. Python programmers distinguish "import time" from "runtime", but the terms are not strictly defined, and there is a gray area between them. At import time, the interpreter parses the source code of the module once, from top to bottom, and generates the bytecode to be executed. Syntax errors, if any, are reported at this point. If an up-to-date .pyc file is available in the local __pycache__ folder, the interpreter skips these steps, because the bytecode needed to run the module is already at hand.

Compilation is definitely an import-time activity, but other things happen then as well, because almost every statement in Python is executable, meaning that statements may run user code and change the state of the user program. In particular, the import statement is not merely a declaration: when a module is imported for the first time in a process, all of the top-level code in that module runs; later imports of the same module use a cache and merely bind names. That top-level code can do just about anything, including things normally done "at runtime", such as connecting to a database. This is why the line between "import time" and "runtime" is blurred: the import statement can trigger any amount of "runtime" behavior.

In the previous paragraph I wrote that importing "runs all the top-level code", but "top-level code" deserves some elaboration. When a module is imported, the interpreter executes every top-level def statement. But what does that do? The interpreter compiles the function body (when the module is imported for the first time) and binds the function object to its global name, but it obviously does not execute the body of the function. In other words, the interpreter defines top-level functions at import time, but executes their bodies only when the functions are invoked at runtime.

For classes, the situation is different: at import time, the interpreter executes the body of every class, even the bodies of classes nested in other classes. Executing a class body defines the class's attributes and methods and builds the class object itself. In this sense the body of a class is "top-level code", because it runs at import time. The sketch below makes the distinction concrete.
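
A tiny hypothetical module illustrating the distinction; importing it prints 'module top level' and then 'class body', but never 'function body':

print('module top level')    # runs at import time

def f():
    print('function body')   # runs only when f() is called

class C:
    print('class body')      # runs at import time
    attr = 42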


A metaclass is a factory that makes classes. According to the Python object model, classes are objects, so every class must be an instance of some other class. By default, Python classes are instances of type; in other words, type is the metaclass of most built-in and user-defined classes. To avoid an infinite regress, type is an instance of itself. Note also that type is a subclass of object.

The relationship between the object and type classes is unique: object is an instance of type, and type is a subclass of object. This relationship is "magical": it cannot be expressed in Python code, because either one would have to exist before the other could be defined. The fact that type is an instance of itself is equally magical.
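
These relationships are easy to verify in the interpreter:

print(type(type) is type)        # True: type is an instance of itself
print(type(object) is type)      # True: object is an instance of type
print(issubclass(type, object))  # True: type is a subclass of object
print(isinstance(object, type))  # True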

Besides type, a few other metaclasses exist in the standard library, such as ABCMeta and EnumMeta…… Tracing upward, the class of ABCMeta is ultimately type itself. Every class is an instance of type, directly or indirectly, but only metaclasses are also subclasses of type. Metaclasses inherit from type the power to construct classes. In particular, a metaclass can be customized by implementing the __init__ method; the __init__ method of a metaclass can do everything a class decorator can do, and more. A sketch follows.
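
A minimal sketch of customizing class creation through a metaclass __init__; the names EntityMeta and Entity are illustrative (the book develops a similar idea):

class EntityMeta(type):
    def __init__(cls, name, bases, attr_dict):
        super().__init__(name, bases, attr_dict)
        # Runs once for every class built with this metaclass.
        print('creating class', name, 'with attributes', sorted(attr_dict))

class Entity(metaclass=EntityMeta):  # triggers the print at import time
    x = 1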

miscellaneous

  • Python provides ample introspection tools, so the exec and eval functions are rarely needed.
  • The type constructor, as well as the __new__ and __init__ methods of metaclasses, receive the evaluated body of the class in the form of a mapping of names to attributes.

Other information

  • Python in a Nutshell's description of the attribute access mechanism is probably the most authoritative explanation of the subject, apart from the C source code of CPython itself.
  • Python Tutor is a website for visualizing, step by step, how Python executes programs.