It has been a month since my last post, “Interview: Python Import System?”. In that time I asked a number of experts for their opinions on the article. The consensus was: “This article is too dry to be used for an interview. You have to go into the CPython source code for an interview. Who can stand that?”

OK, I admit the title was a bit of clickbait. I also don’t have much experience as an interviewer, so I don’t really know how to assess a candidate’s ability in an interview, and the article drifted off topic. So this time I’m back to share interview questions that can be derived from the Import System. I have divided this material into two parts: the first covers the principles behind techniques we use in everyday work with the Import System, and the second covers solutions to real-world Import System problems.


Part 1: Principles

The principles section does not dig into the CPython source code the way the previous article did, so you can rest easy. “Principles” here means the background knowledge needed for a real-world project problem. By analogy, to write a web service framework from scratch we first need to understand the layers involved: network protocols, application servers, web servers, and so on. Previous article: Interview: Python Import System?

1 Python package search path

For Python developers, one of the first things to get right in the Import System is the Python package search path, and it comes up often in interviews. So the first question to answer is: in what order does Python search for packages, and how do we make sense of that order? The answer is in the import keyword flowchart from the previous article, so let’s review it.

The first thing to notice is that there are three list structures in the import keyword flowchart that should not be ignored: sys.path, sys.meta_path, and sys.path_hooks.
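As a quick way to look at these three lists, the sketch below prints what a running interpreter holds in each of them. The helper names `describe` and `finder_names` are only illustrative and are not part of the import system, and the exact contents vary with the Python version and installed tools.

```python
import sys


def describe(obj):
    """Return a readable name for a finder class, finder instance, or hook callable."""
    return getattr(obj, "__qualname__", type(obj).__qualname__)


def finder_names():
    """Names of the meta path finders, in the order the import system tries them."""
    return [describe(finder) for finder in sys.meta_path]


if __name__ == "__main__":
    print("sys.meta_path :", finder_names())
    print("sys.path_hooks:", [describe(hook) for hook in sys.path_hooks])
    print("sys.path      :")
    for entry in sys.path:
        print("   ", repr(entry))
```

On a stock CPython interpreter, `BuiltinImporter` and `FrozenImporter` should appear ahead of `PathFinder` in `sys.meta_path`; tools such as test runners may add their own finders around them.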

To understand the search path of Python packages, we ultimately need to understand the order of the importers in sys.meta_path. In the source, the import machinery loops over the importers in sys.meta_path and passes each one a path to search. That path comes from parent_module.__path__; for a top-level import it is empty, and sys.path is used instead. Within sys.meta_path, sys.path is used mainly by PathFinder, so we can draw a first conclusion: the two importers ahead of PathFinder, BuiltinImporter and FrozenImporter, are searched first. Their scope is built-in modules and frozen modules, so the order so far is: built-in modules -> frozen modules.

In the second part, the third importer, PathFinder, loops over the sys.path list. Since it walks sys.path in order, the order of sys.path is one segment of the whole search chain. Let’s look at the contents of sys.path first.

(base) [root@VM-0-8-centos ~]# python -m site
sys.path = [
    '/root',  # The root directory of the project
    '/root/miniconda3/lib/python38.zip',  # Standard library of the current environment
    '/root/miniconda3/lib/python3.8',
    '/root/miniconda3/lib/python3.8/lib-dynload',
    '/root/miniconda3/lib/python3.8/site-packages',  # Third-party packages of the current environment
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

Above is a standard sys.path listing, and we can see that the paths are searched in this order:

Project root directory -> standard library -> third-party packages

There are two things to note here, though. The first is PYTHONPATH. In the output above, PYTHONPATH is not set. Let’s first see what PYTHONPATH is (see PYTHONPATH in the Python docs):

Augment the default search path for module files. The format is the same as the shell’s PATH: one or more directory pathnames separated by os.pathsep (e.g. colons on Unix or semicolons on Windows). Non-existent directories are silently ignored. (In short: PYTHONPATH extends the package search path, uses the same format as the shell’s PATH, and can hold several paths.)

In addition to normal directories, individual PYTHONPATH entries may refer to zipfiles containing pure Python modules (in either source or compiled form). Extension modules cannot be imported from zipfiles.

The default search path is installation dependent, but generally begins with *prefix*/lib/python*version* (see PYTHONHOME above). It is always appended to PYTHONPATH.

An additional directory will be inserted in the search path in front of PYTHONPATH as described above under Interface options. The search path can be manipulated from within a Python program as the variable sys.path. (PYTHONPATH will be added to sys.path to be used)

From the explanation above, we can list several paths, separated by os.pathsep, and they will be inserted into sys.path. Let’s set PYTHONPATH and check the result.

(base) [root@VM-0-8-centos ~]# export PYTHONPATH=/root/test  # specify a path (the path need not exist)
(base) [root@VM-0-8-centos ~]# python -m site
sys.path = [
    '/root',
    '/root/test',
    '/root/miniconda3/lib/python38.zip',
    '/root/miniconda3/lib/python3.8',
    '/root/miniconda3/lib/python3.8/lib-dynload',
    '/root/miniconda3/lib/python3.8/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

As you can see, the PYTHONPATH entry is added to sys.path after the project root directory and before the standard library of the current environment.
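The same experiment can be scripted. The sketch below starts a fresh interpreter with PYTHONPATH pointing at a temporary directory and reads back the child’s sys.path. The function name `child_sys_path` is hypothetical, and the temporary directory simply stands in for /root/test.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as extra_dir:
        for entry in child_sys_path(extra_dir):
            marker = "  <-- from PYTHONPATH" if entry == extra_dir else ""
            print(repr(entry) + marker)
```

Because the child is started with `-c`, its first sys.path entry is normally an empty string (the current directory) instead of a script directory, but the PYTHONPATH entry should still land ahead of the standard library paths.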

One interesting point here is that manually setting PYTHONPATH is a bit like switching virtual environments. We could point PYTHONPATH at a different directory for each “environment” and thereby change which packages and versions are picked up, which amounts to a poor man’s environment management scheme. Of course, PYTHONPATH has many problems as an environment manager, and it is not how real virtual environments are implemented. A virtual environment mainly works by changing the package path and, on activation, the current $PATH, whereas PYTHONPATH entries are added to sys.path in every environment and therefore affect all virtual environments. This will be discussed in a later article.

Now that we’ve covered PYTHONPATH, let’s look at the second point, the .pth file. For an introduction to .pth files, refer to PEP 648, whose title describes its purpose:

Extensible Customizations of the interpreter at startup

We can rely on a .pth file to customize how packages are loaded: each path listed in a .pth file is added to sys.path.

Note that pth files were originally developed to just add additional directories to sys.path, but they may also contain lines which start with “import”, which will be passed to exec(). Users have exploited this feature to allow the customizations that they needed. See setuptools [4] or betterexceptions [5] as examples.

So how do we use it? We go to the site-packages directory and create a new .pth file. (Note that a .pth file can live in any site directory the interpreter scans, but since .pth files extend the package search path and their entries are appended after the third-party packages, the file is usually placed in the site-packages directory.)

(base) [root@VM-0-8-centos site-packages]# cd /root/miniconda3/lib/python3.8/site-packages
(base) [root@VM-0-8-centos site-packages]# echo "/root/test" > test.pth
(base) [root@VM-0-8-centos site-packages]# mkdir /root/test
mkdir: cannot create directory '/root/test': File exists
(base) [root@VM-0-8-centos site-packages]# python -m site
sys.path = [
    '/root/miniconda3/lib/python3.8/site-packages',
    '/root/miniconda3/lib/python38.zip',
    '/root/miniconda3/lib/python3.8',
    '/root/miniconda3/lib/python3.8/lib-dynload',
    '/root/test',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

As you can see, we created a .pth file in the site-packages directory and listed a path in it. When the interpreter starts, the site module scans its site directories, and when it finds a .pth file it parses the contents and appends each listed path to sys.path as an additional search path. (As the PEP mentions, a .pth file can contain more than paths: lines starting with “import” are executed.)
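To try this without touching a real site-packages directory, the sketch below asks the standard `site.addsitedir()` function to process a temporary directory the same way the interpreter processes site directories at startup. The function name `demo_pth` and the file name `demo.pth` are made up for this example.

```python
import os
import site
import sys
import tempfile


def demo_pth():
    """Create a throwaway site directory with one .pth file and process it."""
    site_dir = tempfile.mkdtemp(prefix="demo_site_")
    extra_dir = tempfile.mkdtemp(prefix="demo_extra_")  # must exist, or it is skipped
    with open(os.path.join(site_dir, "demo.pth"), "w", encoding="utf-8") as fh:
        fh.write(extra_dir + "\n")
    # addsitedir() appends site_dir to sys.path and then handles its .pth files
    site.addsitedir(site_dir)
    return site_dir, extra_dir


if __name__ == "__main__":
    site_dir, extra_dir = demo_pth()
    print("site dir on sys.path: ", site_dir in sys.path)
    print("extra dir on sys.path:", extra_dir in sys.path)
```

Unlike PYTHONPATH entries, a path listed in a .pth file is only added when the directory actually exists, which is why the sketch creates `extra_dir` first.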

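Let’s summarize the overall search chain here:

Built-in modules -> frozen modules -> project root directory -> PYTHONPATH -> standard library -> third-party packages (site-packages) -> paths added by .pth files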

You might wonder: aren’t the built-in modules and the standard library the same thing in Python? Not quite. The official Python documentation says:

Python’s standard library is very extensive, offering a wide range of facilities as indicated by the long table of contents listed below. The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. Some of these modules are explicitly designed to encourage and enhance the portability of Python programs by abstracting away platform-specifics into platform-neutral APIs.

It states that built-in modules are written in C and provide access to system functionality. The sys module, for example, cannot be found under the standard library path, because it is written in C and compiled into the interpreter. The asyncio package, by contrast, is written in Python, and you can find its source under the standard library path.

So built-in modules are part of the standard library in the broad sense, but they are not found on the standard library path, and the two categories are not identical. It may seem like nitpicking to separate them, and in most cases there is indeed no need to. But the difference is crucial for understanding the order of module lookup in Python: our project root directory sits between the built-in modules and the standard library path. Imagine that the project contains one module file with the same name as a built-in module and another with the same name as a standard library module. Which one gets shadowed in each case?
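One way to explore that question without breaking a real project is to ask the finders directly. The sketch below creates a fake project root containing `sys.py` and `colorsys.py`, puts it at the front of sys.path, and then walks sys.meta_path to see which finder answers first. The function names and the choice of `colorsys` (a pure-Python standard library module) are assumptions made for the example; nothing is actually imported.

```python
import importlib
import os
import sys
import tempfile


def first_answer(name):
    """Ask each finder in sys.meta_path in order; return the first spec found."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue
        spec = find_spec(name, None)
        if spec is not None:
            return spec
    return None


def shadow_experiment():
    """Place same-named files in a fake project root and report each spec origin."""
    names = ("sys", "colorsys")
    with tempfile.TemporaryDirectory(prefix="fake_project_") as project_root:
        for name in names:
            with open(os.path.join(project_root, name + ".py"), "w") as fh:
                fh.write("# local file that shares its name with a library module\n")
        sys.path.insert(0, project_root)
        importlib.invalidate_caches()
        try:
            return {name: first_answer(name).origin for name in names}
        finally:
            # Undo the changes so the experiment leaves no trace
            sys.path.remove(project_root)
            sys.path_importer_cache.pop(project_root, None)


if __name__ == "__main__":
    for name, origin in shadow_experiment().items():
        print(name, "->", origin)
```

On a standard CPython build I would expect `sys` to be reported as built-in, so the local `sys.py` is ignored, while `colorsys` resolves to the file in the fake project root, so the local file shadows the standard library module.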

2 The importer protocol and hook registration

In the previous article, we walked through the core flow of the Import System and learned that we can customize how modules are imported by adding import hooks at each stage of the import process. Next, we will look at how to use the Import System’s Importer Protocol to develop a custom importer and register it as an import hook.

First, we need to understand what the Importer Protocol is; see PEP 302, “Specification part 1: The Importer Protocol”. The protocol has two main parts: finder.find_spec and loader.exec_module. (These apply to Python 3.4 and later; earlier versions used the find_module and load_module methods respectively.)

From the above, implementing a custom importer involves two things:

  • Implementing the finder protocol

  • Implementing the loader protocol

Before looking at how the two protocols are implemented, we need to look at the piece that connects them, namely ModuleSpec.

2.1 ModuleSpec (Module specification)

What is a module specification? Let’s turn to the official documentation:

The import machinery uses a variety of information about each module during import, especially before loading. Most of the information is common to all modules. The purpose of a module’s spec is to encapsulate this import-related information on a per-module basis.

Using a spec during import allows state to be transferred between import system components, e.g. between the finder that creates the module spec and the loader that executes it. Most importantly, it allows the import machinery to perform the boilerplate operations of loading, whereas without a module spec the loader had that responsibility.

The module’s spec is exposed as the __spec__ attribute on a module object. See ModuleSpec for details on the contents of the module spec.

In short, ModuleSpec bundles a module’s import-related information in one object. It was introduced in Python 3.4. According to PEP 451, “A ModuleSpec Type for the Import System”, the finder now returns a ModuleSpec instead of returning the loader directly, which decouples the two and encapsulates module information in a uniform way.

What does __spec__ contain? Let’s take a look:

(base) [root@VM-0-8-centos ~]# python
Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.__spec__
ModuleSpec(name='sys', loader=<class '_frozen_importlib.BuiltinImporter'>)
>>>

It includes the module’s name and loader and, of course, other attributes that are not shown in the repr. Each ModuleSpec attribute and its corresponding module attribute are listed here:

On ModuleSpec                 On Modules
name                          __name__
loader                        __loader__
parent                        __package__
origin                        __file__
cached                        __cached__
submodule_search_locations    __path__
loader_state                  -
has_location                  -

We can read each value either from the ModuleSpec or from the matching module attribute, for example:

(base) [root@VM-0-8-centos ~]# python
Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.__name__
'sys'
>>> sys.__spec__.name
'sys'
>>>

As you can see, a ModuleSpec carries more information than a bare loader, which also makes it a better basis for building our own extensions.
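The correspondence between spec attributes and module attributes can be checked for a package as well. The sketch below compares a few of the pairs for the standard `json` package; the helper `compare_spec_and_module` and the `PAIRS` list are illustrative names, and only a subset of the attributes is compared.

```python
import json  # any imported package would do; json is used only as an example

# (attribute on the ModuleSpec, attribute on the module)
PAIRS = [
    ("name", "__name__"),
    ("loader", "__loader__"),
    ("origin", "__file__"),
    ("submodule_search_locations", "__path__"),
]


def compare_spec_and_module(module):
    """Return (spec attribute, module attribute, values equal?) for each pair."""
    spec = module.__spec__
    rows = []
    for spec_attr, module_attr in PAIRS:
        spec_value = getattr(spec, spec_attr)
        module_value = getattr(module, module_attr, None)
        rows.append((spec_attr, module_attr, spec_value == module_value))
    return rows


if __name__ == "__main__":
    for spec_attr, module_attr, same in compare_spec_and_module(json):
        print(f"spec.{spec_attr:<28} module.{module_attr:<12} equal: {same}")
```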

2.2 Finder

Core principle: find_spec returns a ModuleSpec or None

The built-in finder classes live in sys.meta_path, and they all share one method, find_spec. The ModuleSpec object returned by this method is what lets the loader load the module. So to write a custom finder, the core task is to implement find_spec and return a ModuleSpec for the modules we want to handle.

For customizations of this kind, Python provides abstract base classes that we can simply inherit from. The two main abstract classes for finders are:

class importlib.abc.MetaPathFinder
    def find_spec(fullname, path, target=None)
    '''
    An abstract method for finding a spec for the specified module. If this is a top-level import, path will be None. Otherwise, this is a search for a subpackage or module and path will be the value of __path__ from the parent package. If a spec cannot be found, None is returned. When passed in, target is a module object that the finder may use to make a more educated guess about what spec to return. importlib.util.spec_from_loader() may be useful for implementing concrete MetaPathFinders.
    '''

class importlib.abc.PathEntryFinder
    def find_spec(fullname, target=None)
    '''
    An abstract method for finding a spec for the specified module. The finder will search for the module only within the path entry to which it is assigned. If a spec cannot be found, None is returned. When passed in, target is a module object that the finder may use to make a more educated guess about what spec to return. importlib.util.spec_from_loader() may be useful for implementing concrete PathEntryFinders.
    '''

Looking at these two finders, you might wonder which abstract class to inherit from. The official glossary answers this: the two are similar, but their scope is different.

path entry finder

A finder returned by a callable on sys.path_hooks which knows how to locate modules given a path entry.

See importlib.abc.PathEntryFinder for the methods that path entry finders implement.

meta path finder

A finder returned by a search of sys.meta_path. Meta path finders are related to, but different from path entry finders.

See importlib.abc.MetaPathFinder for the methods that meta path finders implement.

The two kinds are registered in different places: path entry finders are produced by the hooks in sys.path_hooks, and meta path finders are placed in sys.meta_path. Some traces of this can be found in the source code:

# Excerpt from importlib/abc.py
def _register(abstract_cls, *classes):
    for cls in classes:
        # Register each class as a virtual subclass of the abstract class
        abstract_cls.register(cls)
        if _frozen_importlib is not None:
            try:
                frozen_cls = getattr(_frozen_importlib, cls.__name__)
            except AttributeError:
                frozen_cls = getattr(_frozen_importlib_external, cls.__name__)
            abstract_cls.register(frozen_cls)

_register(MetaPathFinder, machinery.BuiltinImporter, machinery.FrozenImporter,
          machinery.PathFinder, machinery.WindowsRegistryFinder)

_register(PathEntryFinder, machinery.FileFinder)

PathEntryFinder registers FileFinder as a virtual subclass, and FileFinder supplies one of the hooks in sys.path_hooks.

MetaPathFinder registers BuiltinImporter, FrozenImporter, and PathFinder as virtual subclasses; these are the finders found in sys.meta_path.

What is the difference between directly inheriting ABC abstract classes in Python and using register?
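A small experiment makes the difference concrete. In the sketch below, `DemoFinderABC`, `Inherited`, and `Registered` are hypothetical classes invented for the comparison; only `abc.ABC`, `abc.abstractmethod`, and the `register()` method are real standard library features.

```python
import abc


class DemoFinderABC(abc.ABC):
    """Toy abstract class with one abstract method and one concrete helper."""

    @abc.abstractmethod
    def find_spec(self, fullname):
        ...

    def describe(self):
        return "finder for " + type(self).__name__


class Inherited(DemoFinderABC):
    """Real subclass: must implement find_spec, and inherits describe()."""

    def find_spec(self, fullname):
        return None


class Registered:
    """Unrelated class that is only registered as a virtual subclass."""

    def find_spec(self, fullname):
        return None


DemoFinderABC.register(Registered)

if __name__ == "__main__":
    print(issubclass(Registered, DemoFinderABC))   # passes the subclass check
    print(DemoFinderABC in Registered.__mro__)     # but is not in the MRO
    print(hasattr(Registered(), "describe"))       # and inherits no methods
    print(Inherited().describe())
```

In other words, `register()` only affects `isinstance()` and `issubclass()` checks. The registered class gets no inherited methods and is not checked for missing abstract methods, which suits classes such as BuiltinImporter that already exist and only need to be recognized as finders.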

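MetaPathFinder clearly has the wider scope, since it is consulted for every import, while a path entry finder only handles the path entries its hook accepts. Developers can choose between them as needed.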

To implement the finder protocol, all we need to do is create a new class that inherits from MetaPathFinder, implement the find_spec method, and return a ModuleSpec for the modules we handle.
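As a minimal sketch of those three steps, the finder below serves modules whose source is held in a dictionary. `InMemoryFinder`, `SourceStringLoader`, and the module name `demo_mod` are hypothetical; `importlib.abc.MetaPathFinder`, `importlib.abc.Loader`, and `importlib.util.spec_from_loader()` are the real building blocks. The finder is called directly here and is not registered anywhere.

```python
import importlib.abc
import importlib.util


class SourceStringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source code held in a string."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class InMemoryFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only knows the modules in its `sources` dict."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None  # not ours: let the next finder in sys.meta_path try
        loader = SourceStringLoader(self.sources[fullname])
        return importlib.util.spec_from_loader(fullname, loader, origin="in-memory")


finder = InMemoryFinder({"demo_mod": "ANSWER = 42\n"})
spec = finder.find_spec("demo_mod", None)

if __name__ == "__main__":
    print(spec)
    print(finder.find_spec("something_else", None))
```

Returning None for unknown names matters: it is how a finder tells the import machinery to move on to the next entry in sys.meta_path.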

2.3 Loader

Core principle: exec_module is the key method. The core process is the same for every loader; the different kinds of loaders differ only in how they extend it.

Compared with the finder’s find_spec method, the loader’s exec_module is undoubtedly more involved, because it does the actual loading of the module. However, since ModuleSpec was introduced, many implementation steps can be left out. Let’s look at the official documentation to see what the older load_module (the loading method before Python 3.4) needed to do:

If there is an existing module object named ‘fullname’ in sys.modules, the loader must use that existing module. (Otherwise, the reload() builtin will not work correctly.) If a module named ‘fullname’ does not exist in sys.modules, the loader must create a new module object and add it to sys.modules.

Note that the module object must be in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best.

If the load fails, the loader needs to remove any module it may have inserted into sys.modules. If the module was already in sys.modules then the loader should leave it alone.

The __file__ attribute must be set. This must be a string, but it may be a dummy value, for example “<frozen>”. The privilege of not having a __file__ attribute at all is reserved for built-in modules.

The __name__ attribute must be set. If one uses imp.new_module() then the attribute is set automatically.

If it’s a package, the __path__ variable must be set. This must be a list, but may be empty if __path__ has no further significance to the importer (more on this later).

The __loader__ attribute must be set to the loader object. This is mostly for introspection and reloading, but can be used for importer-specific extras, for example getting data associated with an importer.

The __package__ attribute must be set.

In other words, with load_module the loader itself had to assign many attributes to the module before executing the module’s code:

# Consider using importlib.util.module_for_loader() to handle
# most of these details for you.
import imp  # legacy module used by the PEP 302 example; removed in Python 3.12
import sys

def load_module(self, fullname):
    # Get the code object for the module
    code = self.get_code(fullname)
    ispkg = self.is_package(fullname)
    # Get (or create) the module, then assign its attributes by hand
    mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
    mod.__file__ = "<%s>" % self.__class__.__name__
    mod.__loader__ = self
    if ispkg:
        mod.__path__ = []
        mod.__package__ = fullname
    else:
        mod.__package__ = fullname.rpartition('.')[0]
    # Execute the code in the module's namespace (its __dict__)
    exec(code, mod.__dict__)
    return mod

Since ModuleSpec was introduced, this assignment step is implemented by the import machinery itself, so our loader can skip it. The relevant code now looks like this:

# Build the module object directly from its ModuleSpec
def _new_module(name):
    # type(sys) is the module type, so this creates a new, empty module object
    return type(sys)(name)

def _init_module_attrs(spec, module, *, override=False):
    # The passed-in module may be not support attribute assignment,
    # in which case we simply don't set the attributes.
    # __name__
    if (override or getattr(module, '__name__', None) is None):
        try:
            module.__name__ = spec.name
        except AttributeError:
            pass
    # __loader__
    if override or getattr(module, '__loader__', None) is None:
        loader = spec.loader
        if loader is None:
            # A backward compatibility hack.
            if spec.submodule_search_locations is not None:
                if _bootstrap_external is None:
                    raise NotImplementedError
                _NamespaceLoader = _bootstrap_external._NamespaceLoader

                loader = _NamespaceLoader.__new__(_NamespaceLoader)
                loader._path = spec.submodule_search_locations
                spec.loader = loader
                # While the docs say that module.__file__ is not set for
                # built-in modules, and the code below will avoid setting it if
                # spec.has_location is false, this is incorrect for namespace
                # packages. Namespace packages have no location, but their
                # __spec__.origin is None, and thus their module.__file__
                # should also be None for consistency. While a bit of a hack,
                # this is the best place to ensure this consistency.
                #
                # See # https://docs.python.org/3/library/importlib.html#importlib.abc.Loader.load_module
                # and bpo-32305
                module.__file__ = None
        try:
            module.__loader__ = loader
        except AttributeError:
            pass
    # __package__
    if override or getattr(module, '__package__', None) is None:
        try:
            module.__package__ = spec.parent
        except AttributeError:
            pass
    # __spec__
    try:
        module.__spec__ = spec
    except AttributeError:
        pass
    # __path__
    if override or getattr(module, '__path__', None) is None:
        if spec.submodule_search_locations is not None:
            try:
                module.__path__ = spec.submodule_search_locations
            except AttributeError:
                pass
    # __file__/__cached__
    if spec.has_location:
        if override or getattr(module, '__file__', None) is None:
            try:
                module.__file__ = spec.origin
            except AttributeError:
                pass

        if override or getattr(module, '__cached__', None) is None:
            if spec.cached is not None:
                try:
                    module.__cached__ = spec.cached
                except AttributeError:
                    pass
    return module

def module_from_spec(spec):
    """Create a module based on the provided spec."""
    # Typically loaders will not implement create_module().
    module = None
    if hasattr(spec.loader, 'create_module'):
        # If create_module() returns `None` then it means default
        # module creation should be used.
        module = spec.loader.create_module(spec)
    elif hasattr(spec.loader, 'exec_module'):
        raise ImportError('loaders that define exec_module() '
                          'must also define create_module()')
    if module is None:
        module = _new_module(spec.name)
    _init_module_attrs(spec, module)
    return module

# Usage: create the module from its spec, then let the loader execute it
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# A loader's exec_module only has to run the module's code in its namespace
def exec_module(self, module):
    filename = self.get_filename(self.fullname)
    poc_code = self.get_data(filename)
    obj = compile(poc_code, filename, 'exec', dont_inherit=True, optimize=-1)
    exec(obj, module.__dict__)
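To see this division of labor in a runnable form, the sketch below uses a loader that does nothing except execute code, and lets `importlib.util.module_from_spec()` fill in the module attributes. `ConstantLoader` and the module name `pkg_demo.child` are hypothetical, and the module is deliberately not added to sys.modules.

```python
import importlib.abc
import importlib.util


class ConstantLoader(importlib.abc.Loader):
    """Hypothetical loader: it only executes code and sets no attributes itself."""

    def exec_module(self, module):
        exec("GREETING = 'hello from ' + __name__", module.__dict__)


spec = importlib.util.spec_from_loader("pkg_demo.child", ConstantLoader())
module = importlib.util.module_from_spec(spec)  # attributes are set from the spec
spec.loader.exec_module(module)                 # the loader just runs the code

if __name__ == "__main__":
    print(module.__name__)
    print(module.__spec__ is spec)
    print(spec.parent)
    print(module.GREETING)
```

Compared with the load_module example earlier, the loader no longer touches `__name__`, `__loader__`, or `__spec__`; those come from the spec.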

Having covered the basic principle of the exec_module protocol, we can follow the same approach as for the finder and see which official abstract classes we can build on. The class hierarchy below is from the official documentation (importlib.abc) [docs.python.org/3/library/i…

object
 +-- Finder (deprecated)
 |    +-- MetaPathFinder
 |    +-- PathEntryFinder
 +-- Loader
      +-- ResourceLoader --------+
      +-- InspectLoader          |
           +-- ExecutionLoader --+
                                 +-- FileLoader
                                 +-- SourceLoader

The loader abstract classes fall into three main groups. The first is ResourceLoader, which has been deprecated since Python 3.7 in favor of ResourceReader. The official recommendation is to implement it for loading specific resources, that is, data that belongs to specific packages.

The other two, FileLoader and SourceLoader, both derive from InspectLoader (through ExecutionLoader), so their underlying implementation logic is the same:

# importlib.abc
class Loader(metaclass=abc.ABCMeta):

    """Abstract base class for import loaders."""

    def create_module(self, spec):
        """Return a module to initialize and into which to load. This method should raise ImportError if anything prevents it from creating a new module. It may return None to indicate that the spec should create the new module. """
        # By default, defer to default semantics for the new module.
        return None

    # We don't define exec_module() here since that would break
    # hasattr checks we do to support backward compatibility.

    def load_module(self, fullname):
        """Return the loaded module. The module must be added to sys.modules and have import-related attributes set properly. The fullname is a str. ImportError is raised on failure. This method is deprecated in favor of loader.exec_module(). If exec_module() exists then it is used to provide a backwards-compatible functionality for this method. """
        if not hasattr(self, 'exec_module'):
            raise ImportError
        return _bootstrap._load_module_shim(self, fullname)

    def module_repr(self, module):
        """Return a module's repr. Used by the module type when the method does not raise NotImplementedError. This method is deprecated. """
        # The exception will cause ModuleType.__repr__ to ignore this method.
        raise NotImplementedError

class InspectLoader(Loader):

    """Abstract base class for loaders which support inspection about the modules they can load. This ABC represents one of the optional protocols specified by PEP 302. """

    def is_package(self, fullname):
        """Optional method which when implemented should return whether the module is a package. The fullname is a str. Returns a bool. Raises ImportError if the module cannot be found. """
        raise ImportError

    def get_code(self, fullname):
        """Method which returns the code object for the module. The fullname is a str. Returns a types.CodeType if possible, else returns None if a code object does not make sense (e.g. built-in module). Raises ImportError if the module cannot be found. """
        source = self.get_source(fullname)
        if source is None:
            return None
        return self.source_to_code(source)

    @abc.abstractmethod
    def get_source(self, fullname):
        """Abstract method which should return the source code for the module. The fullname is a str. Returns a str. Raises ImportError if the module cannot be found. """
        raise ImportError

    @staticmethod
    def source_to_code(data, path='<string>'):
        """Compile 'data' into a code object. The 'data' argument can be anything that compile() can handle. The 'path' argument should be where the data was retrieved (when applicable)."""
        return compile(data, path, 'exec', dont_inherit=True)

    exec_module = _bootstrap_external._LoaderBasics.exec_module
    load_module = _bootstrap_external._LoaderBasics.load_module

The core function is undoubtedly the exec_module method, but InspectLoader also implements several extension protocols on top of it; see PEP 302, “Optional Extensions to the Importer Protocol”.
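Because InspectLoader already provides exec_module and get_code, a subclass mainly has to say where the source comes from. The sketch below is one possible way to use it, with a hypothetical `StringSourceLoader` that keeps the source in memory and a made-up module name `inspect_demo`.

```python
import importlib.abc
import importlib.util


class StringSourceLoader(importlib.abc.InspectLoader):
    """Hypothetical loader that only supplies source; exec_module is inherited."""

    def __init__(self, source):
        self._source = source

    def get_source(self, fullname):
        # The one abstract method of InspectLoader
        return self._source

    def is_package(self, fullname):
        return False


loader = StringSourceLoader("def double(x):\n    return 2 * x\n")
spec = importlib.util.spec_from_loader("inspect_demo", loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)  # inherited: get_source -> source_to_code -> exec

if __name__ == "__main__":
    print(module.double(21))
    print(type(loader.get_code("inspect_demo")).__name__)
```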

2.4 Registering hooks

After implementing the import protocol as described above, we need to register our custom importer before it can be used. Depending on how it is registered, a hook is one of two kinds: a meta hook or a path hook.

2.4.1 Meta hooks

Meta hooks are called at the start of the import process, before any other import processing apart from the sys.modules cache lookup. We can insert a meta hook anywhere in sys.meta_path; placing it at the front even lets it override built-in modules, frozen modules, and so on.
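A minimal sketch of registering a meta hook, assuming a hypothetical `VirtualImporter` that acts as finder and loader in one class (the same shape as BuiltinImporter) and a made-up module name `virtual_demo`:

```python
import importlib.abc
import importlib.util
import sys


class VirtualImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer: finder and loader combined in one class."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # leave every other module to the remaining finders

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


importer = VirtualImporter({"virtual_demo": "VALUE = 'served from sys.meta_path'\n"})
sys.meta_path.insert(0, importer)  # the front of the list is consulted first
try:
    import virtual_demo
finally:
    sys.meta_path.remove(importer)  # unregister so the demo leaves no hook behind

if __name__ == "__main__":
    print(virtual_demo.VALUE)
```

After the hook is removed, `virtual_demo` stays importable only because it is cached in sys.modules; a fresh interpreter would not find it.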

2.4.2 Path hooks

Path hooks, in contrast, are called as part of sys.path (or package __path__) processing, so their scope is limited to the entries in those path lists. They are registered by adding callables to sys.path_hooks. The finder that a path hook returns for a path entry is saved in sys.path_importer_cache, and this cache is checked before the path hooks are called again for the same entry.
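The sketch below registers a path hook for one made-up path entry. `PATH_ENTRY`, `SOURCES`, `MemoryLoader`, `MemoryPathEntryFinder`, and `memory_path_hook` are all hypothetical names; the hook is appended to sys.path_hooks and relies on the earlier hooks rejecting an entry that is neither a directory nor a zip file.

```python
import importlib.abc
import importlib.util
import sys

PATH_ENTRY = "demo-memory://modules"  # hypothetical entry, not a real directory
SOURCES = {"path_hook_demo": "VALUE = 'served by a path hook'\n"}


class MemoryLoader(importlib.abc.Loader):
    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


class MemoryPathEntryFinder(importlib.abc.PathEntryFinder):
    """Finder responsible for exactly one path entry."""

    def find_spec(self, fullname, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, MemoryLoader())
        return None


def memory_path_hook(path_entry):
    """Path hook: return a finder for our entry, raise ImportError for all others."""
    if path_entry != PATH_ENTRY:
        raise ImportError("not handled by this hook")
    return MemoryPathEntryFinder()


sys.path_hooks.append(memory_path_hook)
sys.path.append(PATH_ENTRY)

import path_hook_demo  # found by PathFinder through our path entry finder

if __name__ == "__main__":
    print(path_hook_demo.VALUE)
    print(type(sys.path_importer_cache[PATH_ENTRY]).__name__)
```

The last line shows the cache at work: once the hook has produced a finder for the entry, PathFinder reuses it from sys.path_importer_cache and does not call the hook again.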

Next time, we’ll introduce a real world scenario to see what problems Import System can be used to solve in the real world.