
Foreword

When writing a project, you care about code quality: code style, design patterns, refactoring, and so on. But a good Python project depends on more than the programmer's own coding ability; it also needs sound system design and code quality tooling. System design differs from project to project and requires experience that grows with the project, while code quality tools can be applied to every project in isolation. This article is a brief summary of the code quality tools I use.


0. Commit conventions

Every team (or individual) should have its own branch management and commit message convention. For branch management, git flow is commonly used; if you are not familiar with it, consult a git flow cheat sheet and set permissions on the master, develop, and other branches. Commit messages should also follow a convention: what type of change the commit is, what it does, and so on. There is no universal standard here; as long as the team is happy with the convention, it helps reduce development conflicts, duplicate code, and so on. I often use:

```shell
git commit -m "<issue_id>:<file change>:<operating>:<info>"
```

The meanings of each field are as follows:

  • Issue_id: the ID of an issue. Before writing a feature or fixing a bug, first open an issue that spells out what will be changed and what it should achieve, then commit code against that issue
  • File change: the kind of change to a file, such as addition, deletion, or modification. Some people use `+`, `-`, and `*` to represent add, delete, and modify respectively
  • Operating: the type of this change, one of the following
    • Feat: New function
    • Fix: Fixes bugs
    • Doc: Document changes
    • Style: Code format changes
    • Refactor: Refactoring of an existing feature
    • Perf: Performance optimization
    • Test: Adds a test
    • Build: a build-tool change, such as switching from Grunt to npm
    • Revert: reverts a previous commit
  • Info: briefly describes the submission information
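As a sketch, a convention like the one above can also be enforced mechanically. The regex and function below are my own illustration of the `<issue_id>:<file change>:<operating>:<info>` format, not part of any standard tool; a check like this fits naturally into a `commit-msg` git hook.

```python
import re

# Hypothetical validator for the "<issue_id>:<file change>:<operating>:<info>"
# commit message convention described above.
COMMIT_RE = re.compile(
    r"^(?P<issue_id>\d+):"          # issue id, e.g. 42
    r"(?P<file_change>[+\-*]):"     # +, - or * for add/delete/modify
    r"(?P<operating>feat|fix|doc|style|refactor|perf|test|build|revert):"
    r"(?P<info>.+)$"                # short description of the commit
)


def is_valid_commit_message(message: str) -> bool:
    """Return True when the message follows the convention."""
    return COMMIT_RE.match(message) is not None
```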

1. Project environment management: Poetry

The most important thing for a project is that it runs, and many projects are developed locally at the same time, each with a different environment, so virtual-environment isolation is needed. Python ships a virtual environment manager, venv; it is very stable but spare on features, so it is generally used on servers. For local development you want something more capable and more convenient for managing virtual environments and dependency packages, and the Python package management field has many such tools. After various attempts, including the controversial Pipenv, I find Poetry the easiest to use, with the fewest pitfalls.

The Poetry website introduces it as making Python package installation and dependency management easy, and I find it the most useful tool in that space. Beyond strong package management support, it offers other extras such as easy packaging and publishing, script shortcuts, and so on.

Most Python projects used to follow roughly this process when first set up:

  • 1. Install the required Python version
  • 2. Create a venv virtual environment in the project with `python -m venv <name>`
  • 3. Install dependencies during development with `python -m pip install <name>`
  • 4. After the code is written, generate the dependency file with `python -m pip freeze > requirements.txt`

With Poetry this is much simpler. Here is how to use it:

1.1. Create a project

Create project scaffolding with the `poetry new` command:

```shell
➜ poetry new example
➜ tree .
.
└── example
    ├── example
    │   └── __init__.py
    ├── pyproject.toml
    ├── README.rst
    └── tests
        ├── __init__.py
        └── test_example.py

3 directories, 5 files
```

As you can see, Poetry creates an example project, generating the corresponding folders and a pyproject.toml that holds the project information. For an existing project, initialize it instead with the `poetry init` command:

```shell
➜  example poetry init

This command will guide you through creating your pyproject.toml config.
# interactive prompts; fill in the project information
Package name [example]:  example
Version [0.1.0]:  0.0.8
Description []:  example project
Author [so1n <[email protected]>, n to skip]:  n
License []:
Compatible Python versions [^3.7]:

Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file

# After the project information is filled in, the following content is
# generated and written to pyproject.toml in the current path:
[tool.poetry]
name = "example"
version = "0.0.8"
description = "example project"
authors = ["Your Name <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.7"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Do you confirm generation? (yes/no) [yes] yes
```

1.2. Create a virtual environment

Poetry uses the system's default Python environment, but you can pick a Python version with `poetry env use <python version>` before creating the virtual environment. By default, virtual environments are stored under /home/{user}/.cache/pypoetry:

```shell
➜ poetry config --list
# path to the cache directory used by poetry
cache-dir = "/home/so1n/.cache/pypoetry"
experimental.new-installer = true
installer.parallel = true
# Default true: if poetry install/poetry add is executed without a virtual
# environment, one is created automatically; if false, packages are installed
# into the system environment when no virtual environment exists
virtualenvs.create = true
# Default false: if true, the virtual environment is created in the project directory
virtualenvs.in-project = false
# Default path: {cache-dir}/virtualenvs
virtualenvs.path = "{cache-dir}/virtualenvs"  # /home/so1n/.cache/pypoetry/virtualenvs
```

Most usage (including some third-party tools) assumes the venv lives under the project path, which is also easier to manage. You can change the Poetry configuration as follows so that virtual environments are created inside the project:

```shell
# change the configuration
➜ poetry config virtualenvs.in-project true
```

After the virtual environment is created, you can use

```shell
➜ poetry run <command>
```

to execute the command you want to run or call a Python package, or you can launch an interactive shell wrapped in the virtual environment with `poetry shell`.

1.3. Install dependencies

Once the virtual environment has been created, you can install dependencies with Poetry's `add` command; `--dev` marks a development dependency (useful for separating development dependencies from production dependencies):

```shell
➜ poetry add flask
➜ poetry add pytest --dev
```

After installing dependencies, you can see that the pyproject.toml file has changed:

```toml
[tool.poetry.dependencies]
python = "^3.7"
Flask = "^1.1.2"

[tool.poetry.dev-dependencies]
pytest = "^6.2.4"
```

It now records the flask and pytest dependencies that were just installed, with pytest marked as a dev dependency. Servers usually install from requirements files, so export the dependencies to requirements.txt and requirements-dev.txt:

```shell
# Production environment
poetry export -o requirements.txt --without-hashes --with-credentials
# Test environment
poetry export -o requirements-dev.txt --without-hashes --with-credentials --dev
```

Separating test-environment dependencies from production dependencies like this minimizes the impact of test-only packages on the production environment.

In addition to adding dependencies, Poetry supports a variety of dependency operations, including the following:

```shell
# list dependencies
poetry show
# view dependencies as a tree
poetry show -t
# update all locked dependencies
poetry update
# update a specific dependency
poetry update flask
# remove a dependency
poetry remove flask
```

1.4. Other

The Poetry operations above are sufficient for a typical Python project. If you need to publish your own package to PyPI, or install the latest unreleased package from GitHub, Poetry has other extension commands, described in its documentation. In my opinion Poetry is already very good; however, it lacks a stable maintenance team, so there are some bugs. For example, if a dependency fails to install through Poetry, you can fall back to `poetry run pip install <package>` and then fill in the pyproject.toml entry manually.

2. Code quality tools

In large projects we generally do not pursue fancy code but stable, easy-to-understand, low-complexity code; ideally anyone in the field can read it, and it solves the requirement cleanly. But nobody is perfect: small problems creep into code, and finding them by hand is time-consuming, laborious, and error-prone, so code inspection tools are needed. They fall into three broad categories. One checks code style and normalizes non-standard style into standard style. Another checks code logic: logic errors, complexity, package references, and so on. The last checks security, for example whether a key has been committed into the code, or whether Python code calls `eval`.
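As a tiny illustration of the security category: checkers flag `eval` because it executes arbitrary code. When you only need to parse a literal value, the standard library's `ast.literal_eval` is the safe substitute; it accepts only Python literals and raises `ValueError` for anything else. (The wrapper function name below is my own.)

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal (numbers, strings, lists, dicts, ...)
    without executing code; non-literal input raises ValueError."""
    return ast.literal_eval(text)
```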

2.1. Flake8

Flake8 is a tool that helps check whether Python code is standard. Compared with Pylint, Flake8 has flexible checking rules, supports additional plug-ins, and is highly extensible. Flake8 is a wrapper around the following three tools:

  • PyFlakes: a tool that statically checks Python code for logical errors
  • pep8 (now pycodestyle): a tool that statically checks PEP 8 coding style
  • Ned Batchelder's McCabe: a tool that statically analyzes Python code complexity

In addition to the three tools above, Flake8 supports extra features through plug-ins, for example flake8-docstrings to enforce function docstrings.

In a project you can add flake8 to the dev dependencies with `poetry add flake8 --dev`, then add a `.flake8` file in the root directory:

```ini
[flake8]
# set the maximum McCabe complexity to 24
max-complexity = 24
# ignore these error codes
ignore = E203
# paths excluded from checking
exclude = .git,.venv,__pycache__,scripts,logs,upload,build,dist,docs,migrations
```

This tells flake8 how to run; finally, invoke it with `poetry run flake8`.

2.2. Mypy

There is no doubt that Python's syntax makes it easy to write code, but its dynamic nature can make large projects unstable, and mypy is a solution to this problem. Mypy is a static type checker: it helps catch errors before the code runs, the way static languages do, provided we annotate our Python with types as we write it. That is what Type Hints are. Combining mypy with Type Hints increases code volume, but brings the following benefits:

  • 1. IDEs can provide better code completion and hints through type inference, which makes project refactoring easier and surfaces errors earlier.
  • 2. Being forced to think about the types in your dynamic-language program may help you build a clearer code architecture.

For example, take the following function:

```python
def foo(a, b):
    return a + b
```

In general there is no way to know what type of argument this function receives; maybe it starts out taking int variables and is later passed str variables. Type Hints let us declare what type each parameter is and what type is returned:

```python
def foo(a: int, b: int) -> int:
    return a + b
```

The a and b parameters and the return value are all marked as int. Now suppose the program makes two calls:

```python
foo(1, 2)
foo("a", "b")
```

Both run, but mypy flags the second call as incorrect. This example is too simple to hurt, but in complex logic the advantage is clear.
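The same idea pays off even more with container and optional types. A small sketch (the names here are illustrative, not from the article): the annotations tell mypy that the function may return `None`, so it can verify that every caller handles that case.

```python
from typing import Dict, Optional


def find_user_id(cache: Dict[str, int], name: str) -> Optional[int]:
    """Return the cached id for ``name``, or None when it is absent.

    With these annotations, mypy flags any caller that uses the result
    as a plain int without first checking for None.
    """
    return cache.get(name)
```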

In your project, install the dependency with `poetry add mypy --dev`, then add a mypy.ini file to the root directory:

```ini
# mypy core configuration
[mypy]
# report functions whose signatures are missing type annotations
disallow_untyped_defs = True
# ignore some import errors; older packages may not meet mypy's requirements
ignore_missing_imports = True

# configuration for the tests package
[mypy-tests.*]
# ignore checks for this scope
ignore_errors = True
```

This tells mypy how to run; finally, call `poetry run mypy .`.

2.3. Automatic code formatting

Python is a dynamic language and does not impose a strong code style, which leads to a thousand styles for a thousand programmers; that is bad for large projects. Fortunately the Python ecosystem has many automatic formatting tools. I won't compare them in detail here; below is a brief overview of the three I kept after trying many (you should still try them for yourself):

  • 1. autopep8: I mainly use this tool to remove unused import statements, which it does best among the open-source toolkits, although in some scenarios only PyCharm's own cleanup handles it, and PyCharm can unfortunately only format file by file via a manual shortcut. Install it with `poetry add autopep8 --dev`. Its configuration is very simple, so it offers only command-line options, no config file. Its main options are:
    • --in-place: modify files directly instead of printing the diff (trust it and use this)
    • --exclude: exclude files/folders from formatting
    • --recursive: traverse files recursively
    • --remove-all-unused-imports: remove all unused imports
    • --ignore-init-module-imports: skip removing unused imports in `__init__.py` files
    • --remove-unused-variables: remove unused variables
  • 2. isort: this tool mainly formats import statements, for example wrapping imports that exceed the maximum line length and sorting them automatically (a boon for the obsessive). Install it with `poetry add isort --dev`. isort supports configuration in pyproject.toml; here is one of my common configurations:

    ```toml
    [tool.isort]
    # black-compatible mode, since black is used for auto-formatting
    profile = "black"
    # wrapping mode when an import line exceeds the line length
    multi_line_output = 3
    include_trailing_comma = true
    force_grid_wrap = 0
    use_parentheses = true
    ensure_newline_before_comments = true
    # maximum length of each line
    line_length = 120
    # folders to skip
    skip_glob = "tests"
    ```
  • 3. black: known as the uncompromising formatter, it reformats whatever it deems inappropriate, with no choices offered. If that clashes with your team's standard, use it carefully; I happily accept its style. Install it with `poetry add black --dev`. black also supports pyproject.toml configuration; here is a common configuration of mine (black has few options):

    ```toml
    [tool.black]
    # maximum length of each line
    line-length = 120
    # target Python version(s)
    target-version = ['py37']
    ```
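To make the isort behaviour concrete, the toy function below mimics the kind of ordering it enforces: standard-library imports first, then third-party, each group alphabetized. This is purely my own illustration; isort's real module classification is far more thorough.

```python
def toy_sort_imports(lines):
    """Illustrative only: group import lines the way isort does,
    stdlib first, then third-party, each group sorted. The stdlib
    set here is a hard-coded toy, not what isort actually uses."""
    stdlib = {"os", "sys", "json", "typing", "re"}

    def module(line: str) -> str:
        # "import foo.bar" / "from foo import x" -> "foo"
        return line.split()[1].split(".")[0]

    std = sorted(line for line in lines if module(line) in stdlib)
    third = sorted(line for line in lines if module(line) not in stdlib)
    return std + third
```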

3. pre-commit

Soon after adopting these formatting tools, I started looking for automation, because manually running the formatting scripts before every commit is too cumbersome. Fortunately there is pre-commit, a tool designed precisely for git hooks.

pre-commit is a framework for managing and maintaining multi-language git pre-commit hooks. Like Python's package manager pip, pre-commit lets you install hooks created and shared by others into your own repository. It makes git hooks much easier to use: just specify the hooks you want in a configuration file, and it installs hooks of any language for you, resolves their environment dependencies, and runs them before every commit.

3.1. Installation

Typically you would run `pip install pre-commit`, but for environment isolation install it with `poetry add pre-commit --dev`, then create a `.pre-commit-config.yaml` file in the project root. Here is my configuration:

```yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.910
    hooks:
      - id: mypy
  - repo: https://github.com/PyCQA/isort
    rev: 5.9.3
    hooks:
      - id: isort
  - repo: https://github.com/psf/black
    rev: 21.7b0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 3.9.2
    hooks:
      - id: flake8
  - repo: https://github.com/myint/autoflake
    rev: v1.4
    hooks:
      - id: autoflake
        args: ['--recursive', '--in-place', '--remove-all-unused-imports', '--remove-unused-variables']
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: check-ast
      - id: check-byte-order-marker
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-executables-have-shebangs
      - id: check-json
      - id: check-yaml
      - id: debug-statements
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: mixed-line-ending
```

The file is simple: it lists which tools are used, at which version, and which of their hooks to run (a repository may provide multiple hooks). The parameters mean:

  • repo: the repository URL; pre-commit installs tools hosted on GitHub via git
  • rev: the tool version, taken from the repository's git tags
  • hooks/id: each repository can provide many hooks; select which to use by hook id
  • hooks/id/args: parameters supported by the hook; args is how you configure them

These tools read the project's root configuration files; for autoflake I could not find a pyproject.toml configuration option, so its parameters are set directly through args. You can then invoke the hooks directly. If you are introducing pre-commit into an existing project for the first time, run `poetry run pre-commit run --all-files` manually, which runs all hooks against the whole project so you can adjust code and configuration based on the results. To install the hook script into .git/hooks/pre-commit, run `poetry run pre-commit install`; after installation, every `git commit` triggers the hooks automatically via git's hook mechanism, checking and formatting the code.

The configuration above is what I commonly use. There are many more pre-commit hooks; if you are interested, browse the full list in the pre-commit hooks collection.

4. Automatic checks in the remote repository

Local hooks only constrain the person committing; in team collaboration, others can temporarily disable or delete the hook files, so local hooks cannot be enforced. Teams therefore usually configure their own script on GitHub/GitLab (for example in CI or a pre-receive stage) to run the code-checking tools above. The two platforms differ slightly, but the core steps are the same:

  • 1. Pull the latest code into a container
  • 2. Installation phase: install the Python version, service containers such as Redis, and so on
  • 3. Code check: the code quality tools run; if an error is detected the submission is rejected and the errors are shown, otherwise continue to the next step
  • 4. Test phase: run the test cases and check whether coverage meets the bar; likewise, failure rejects the submission and success proceeds
  • 5. Style unification: run the style plug-ins, such as Python's isort and black, to format the project code

Every company has a standard CI/CD setup, and usage differs a little from place to place, but the core principles are the same. Below I show how to use GitHub Actions (which is free!) with an open-source project.

There are many articles on GitLab CI/CD; you can consult the Continuous Delivery website or books, or see this article: www.mindtheproduct.com/what-the-he… If you are interested in GitLab hooks, see the article on adding and using GitLab pre-receive webhooks.

This example comes from my project RAP. Create a scripts directory in the project; the scripts can be called locally but are mainly for GitHub Actions. First, an install script that installs dependencies:

```shell
#!/bin/sh -e

# Use the Python executable provided from the `-p` option, or a default.
[ "$1" = "-p" ] && PYTHON=$2 || PYTHON="python3"

REQUIREMENTS="requirements-dev.txt"
VENV="venv"

set -x

if [ -z "$GITHUB_ACTIONS" ]; then
    "$PYTHON" -m venv "$VENV"
    PIP="$VENV/bin/pip"
else
    PIP="pip"
fi

"$PIP" install -r "$REQUIREMENTS"
```

Note that this is based on venv, not Poetry as above. The reason: a machine in production usually runs a single project, and production machines prize stability above all, which is exactly where venv's simplicity shines, so venv is recommended online. The script above creates a virtual environment and installs the test-environment dependencies from requirements-dev.txt.

With the dependencies in place, it's time to tell GitHub Actions how to run the code quality checks, so write a check script:

```shell
#!/bin/sh -e

export PREFIX=""
if [ -d 'venv' ]; then
    export PREFIX="venv/bin/"
fi

set -x
echo 'use venv path:' ${PREFIX}
${PREFIX}mypy .
${PREFIX}flake8
${PREFIX}isort .
${PREFIX}black .
${PREFIX}autoflake --in-place --remove-unused-variables --recursive .
```

This script simply invokes the commands in the order described above: first the checkers, then the formatters. The commands carry no individual options here because they automatically read the configuration files in the project, staying consistent with our local hooks.

Once the scripts for GitHub Actions are in place, create the actual workflow file. Create the .github/workflows directory in the project, and inside it a test-suite.yml file (see the official documentation for more on this file):

```yaml
---
# specify the workflow name
name: Test Suite

# run on pushes to master and on PRs targeting master
on:
  push:
    branches: ["master"]
  pull_request:
    branches: ["master"]

jobs:
  tests:
    # set the job name
    name: "Python ${{ matrix.python-version }}"
    # select which runner to use
    runs-on: "ubuntu-latest"

    # build matrix: the job runs once per Python version
    strategy:
      matrix:
        python-version: ["3.6", "3.7", "3.8", "3.9", "3.10.0-beta.3"]

    steps:
      # check out the code and install the Python version via official actions
      - uses: "actions/checkout@v2"
      - uses: "actions/setup-python@v2"
        with:
          python-version: "${{ matrix.python-version }}"
      # make the scripts executable
      - name: "Change permissions"
        run: |
          chmod +x scripts/install
          chmod +x scripts/check
      # install dependencies
      - name: "Install dependencies"
        run: "scripts/install"
      # run checks
      - name: "Run linting checks"
        run: "scripts/check"
```

After the file is written, push the code to the remote repository, then check the action status on GitHub (only Python 3.7 was tested in my run; you will receive an email notification if an action fails). You can also click into a step to view details, such as the code-check output.

5. Summary

All of these tools are ones I gradually adopted in practice to assemble the toolchain that best fits my Python projects. They can only check the surface; other properties, such as code logic, still need well-written tests that actually run. Some teams go further with stress testing, online simulation testing, and so on. Introducing these tools and systems carries up-front learning and time costs, but they keep a project healthy and reduce the number of bugs that reach production (of course, the test cases themselves must also be maintained).