Jupyter notebooks are a great way to write Python code interactively and to keep documentation, program output, and data visualizations alongside the code that produced them. Many IDEs support Jupyter notebooks natively, and the Jupyter Notebook server and the JupyterLab environment are also effective ways to write notebooks. But under the hood, a Jupyter notebook is just a JSON document, and much of its content is hard for humans to read. As a result, it can produce confusing diffs in your version control system. Jupytext is a Jupyter plug-in that automatically saves Jupyter notebooks in a variety of human-readable (and editable) text formats. It also allows changes made in these other files to be synchronized back to the notebook file (.ipynb) itself.

Why use Jupytext?

There are several good reasons to consider using Jupytext. First, you’re probably struggling to get version control right for your notebooks. My article on version control describes this situation and gives some background and good ways to work around the problem, but they are not necessarily perfect for every situation. Using a specialized diff tool like nbdime makes the differences easier to navigate, but ultimately a single notebook file (that is, an .ipynb file) contains code, output, and metadata. Any of these can change and clutter your diffs, making versioning a challenge.
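To make the diff problem concrete, here is a minimal sketch that uses only the standard library. The notebook dictionary is a hand-built, simplified stand-in for a real .ipynb file (real notebooks carry more metadata), and the helper name notebook_json is hypothetical. It shows that simply re-running a cell changes the serialized file even though the code itself is identical.

```python
import difflib
import json


def notebook_json(execution_count, output_text):
    """Serialize a simplified one-cell notebook (a stand-in for a real .ipynb)."""
    nb = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": ["print('hello')"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    return json.dumps(nb, indent=1, sort_keys=True).splitlines()


# Same code, same output; only the execution counter differs.
before = notebook_json(1, "hello\n")
after = notebook_json(7, "hello\n")

# Keep only the added/removed lines, skipping the diff's file headers.
changed = [
    line
    for line in difflib.unified_diff(before, after, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changed))
```

The diff contains a removed and an added execution_count line, even though nobody edited the code. Changed outputs (for example, an embedded image) add far more noise than this.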

The second reason to consider Jupytext is if you prefer to work outside the standard Jupyter notebook authoring environment. Perhaps your favorite IDE for writing code is PyCharm or Visual Studio Code. Or you use a text editor like Vim or Emacs and love having the full functionality of your favorite editor. Maybe you write and test code in an IPython session and dislike how easily notebook cells can get out of order. You may also want to work on your notebook from a terminal (perhaps over an SSH connection) without a web browser.

The third reason is the ability to work more efficiently with notebooks and their contents, especially Python source code. For example, if the source code is stored in ordinary Python files, there are a number of tools available to examine it, including linters and formatters.

We’ll look at a few examples of how Jupytext supports all three.

Installation and Setup

Jupytext is easy to install with pip.

pip install jupytext --upgrade

If you are using Anaconda, you can install it with conda.

conda install jupytext -c conda-forge

You’ll probably also use a Jupyter Notebook or JupyterLab environment. If so, restart the server process so that the front end picks up the Jupytext extension.

Basic use in Notebook or Lab

The easiest way to see how Jupytext works is to start with a simple example. In our previous article on notebook versioning, we used this notebook as an example. It is just a simple notebook that includes a plot made with Matplotlib. After you have set up a Jupyter Notebook (or JupyterLab) environment with Matplotlib installed, you can open the notebook in Jupyter Notebook (run jupyter notebook). When you do this, you should see a Jupytext entry in the File menu. Pair your notebook with a Python file by checking the options shown below.

Jupytext adds these menu options. To follow the example, check the options shown.

First, if you primarily want to work in scripts or Markdown files (I’ll cover all formats later), you should turn off Jupyter autosave. If you want to work primarily in Jupyter notebooks and only check in script files when you’re done, you can leave autosave enabled.

Once the notebook is paired with the script output, the file is created in the same directory as the notebook. In my case, this means the file jupyter_git_example.py is created. It looks something like this.

# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.13.0
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %%
import matplotlib.pyplot as plt
plt.plot([x**2 for x in range(100)])

# %%

This format is called the percent format, and those special comments (# %%) mark the cells of the notebook.
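As a rough illustration of how the # %% markers delimit cells, here is a minimal sketch that splits a percent-format script into cell sources using plain string handling. The function split_percent_cells is a hypothetical helper written for this example, not part of Jupytext, and it ignores details that Jupytext handles (cell types, titles, and metadata).

```python
def split_percent_cells(text):
    """Split a percent-format script into cell sources (simplified sketch)."""
    cells = []
    current = None  # lines before the first marker (the YAML header) are skipped
    for line in text.splitlines():
        if line.startswith("# %%"):
            # A marker starts a new cell.
            current = []
            cells.append(current)
        elif current is not None:
            current.append(line)
    return ["\n".join(lines).strip() for lines in cells]


script = """\
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
# ---

# %%
import matplotlib.pyplot as plt
plt.plot([x**2 for x in range(100)])

# %%
"""

for number, source in enumerate(split_percent_cells(script), start=1):
    print(f"cell {number}: {source!r}")
```

The script above yields two cells: the plotting code and an empty trailing cell, matching the two markers in the file.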

Round-tripping

There are a few things you should notice about this file. Jupytext will try to take the latest version of one of the files and use it to generate the other. So, for example, if you update your notebook and then save it manually (because you turned off autosave), Jupytext will refresh the .py file. And vice versa: if you edit the .py file, it will update the matching cells in the notebook. Try it: make a small edit to the .py file in a text editor, then save it (for example, change the exponent in the plot from 2 to 0.5). Then, click the “Save” icon in your notebook. Jupyter will warn you that the file on disk has changed and give you three options.

  • Cancel – Go back to what you already saw, but it doesn’t match what was saved on disk.
  • Reload – Reload the notebook with the contents saved on disk (which now match the contents of the .py file).
  • Overwrite – Save your notebook over the .ipynb file that Jupytext has just updated.

In this case, you want to reload from disk. The code in the cells will be updated to match your edits. However, be aware that reloading does not execute the cell. The output will still reflect x**2 instead of x**0.5. Also, the Python session you are running will not update any variables, because the code has not been executed yet. You can re-execute the cell to apply the changes in the running session. The above example may seem confusing, but I think it’s a very effective way to think about Jupytext usage scenarios.
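The "latest version wins" idea can be pictured with a small sketch based on file modification times. This is only an illustration of the behavior described above, under the assumption that recency is judged by timestamp. Jupytext's actual synchronization logic is more involved, and newest_of_pair is a hypothetical helper, not a Jupytext function.

```python
import os
import tempfile
from pathlib import Path


def newest_of_pair(notebook_path, script_path):
    """Return whichever paired file was modified most recently (simplified sketch)."""
    candidates = (Path(notebook_path), Path(script_path))
    return max(candidates, key=lambda path: path.stat().st_mtime)


with tempfile.TemporaryDirectory() as folder:
    notebook = Path(folder) / "example.ipynb"
    script = Path(folder) / "example.py"
    notebook.write_text("{}")
    script.write_text("# %%\n")

    # Set explicit modification times so the example does not depend on timing.
    os.utime(notebook, (1_000, 1_000))
    os.utime(script, (2_000, 2_000))  # the script was edited later

    source_of_truth = newest_of_pair(notebook, script)
    print(source_of_truth.name)
```

Here the script is the more recent file, so it would be the one used to refresh the code in the notebook.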

Let’s consider these three usage scenarios in more detail.

Version control

First, if you are looking for an effective notebook version control option, you can simply install Jupytext, pair your notebook with the output format you want to use, and check in the resulting files with every commit. You’ll get clean diffs for tracking history.

In more complex cases, such as branching and merging, you can merge the generated script or Markdown file first and then regenerate the notebook using Jupytext. Jupytext includes a command-line tool, so updating files outside of the notebook environment is easy.

jupytext --to notebook notebook.py  # generates notebook.ipynb from notebook.py, using comment markers

Note that when you regenerate the .ipynb file this way, it will contain no output. You still have to decide whether you want to check in the output in the notebook file. If you do, you need to re-execute the notebook before committing to version control (for example, by using Jupyter Notebook, jupytext --execute, or Papermill).
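If you decide that outputs should (or should not) be committed, a quick check before committing can help. The following is a minimal sketch that uses only the standard library and relies on the fact that an .ipynb file is JSON with a list of cells. The helper cells_with_outputs and the sample notebook are hypothetical and exist only for illustration.

```python
import json


def cells_with_outputs(notebook_text):
    """Return the indexes of code cells that carry stored outputs."""
    notebook = json.loads(notebook_text)
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code" and cell.get("outputs")
    ]


# A hand-written, simplified notebook: one executed cell, one Markdown cell,
# and one code cell that has not been run.
sample = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "source": ["print('hi')"],
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
                ],
            },
            {"cell_type": "markdown", "source": ["Some notes"]},
            {"cell_type": "code", "source": ["x = 1"], "outputs": []},
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

print(cells_with_outputs(sample))
```

An empty result for a freshly regenerated notebook would confirm that it contains no output and still needs to be executed if you want results in the committed file.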

Use other tools for coding

The second reason I like using Jupytext is for coding and editing in an IDE or text editor. In this case, your script or Markdown file will be the main file for your work, and the notebook can be generated and executed, automatically or manually, as needed. Using this approach, you get all the benefits of clean diffs, and if you prefer using your IDE or are more comfortable in a Markdown environment, you can still use the notebook format to publish results to others. It’s the best of both worlds.

Code quality tools

A third area where Jupytext helps is automated code review and other QA tooling. Since you can convert notebooks to regular Python code, you can automatically run linters and formatters like Pylint, Flake8, or Black. If Python code is hidden in notebook files, it is harder to verify that it meets your organization’s coding standards.
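To show why plain Python files are easier to check, here is a minimal sketch that uses the standard library's ast module as a stand-in for a real linter. The unused_imports function is a hypothetical, deliberately naive check written for this example. In practice you would run a tool such as Flake8 or Pylint on the paired .py file instead.

```python
import ast


def unused_imports(source):
    """Return imported names that are never referenced (naive sketch)."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a.b as c" binds "c".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


# A percent-format script is ordinary Python, so it parses without conversion.
script = """\
# %%
import os
import matplotlib.pyplot as plt

# %%
plt.plot([x**2 for x in range(100)])
"""

print(unused_imports(script))
```

Because the cell markers are just comments, the parser ignores them and the check sees the notebook's code as one regular module. The same check could not be pointed directly at the JSON in an .ipynb file.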

The documentation for Jupytext also describes how to integrate it with common hooks using the pre-commit framework. That way, you can be sure that your notebook code is validated every time it is committed to Git.

Jupytext supports many formats, not just Markdown

The above example syncs a notebook file to a Python source file, but there are many other format options.

Multiple Markdown formats are supported.

  • Jupytext Markdown – a simple Markdown format
  • R Markdown – the format used in RStudio
  • MyST – Markedly Structured Text
  • Pandoc Markdown – for Pandoc, a general-purpose file converter. It can also convert notebooks (like the one I used to write this article!).
  • Quarto – a scientific publishing system based on Pandoc

Jupytext also supports many types of script output, and many languages, not just Python. This allows regular code files to generate notebooks. Jupytext interprets special comments as instructions and generates individual notebook cells based on the metadata specified in the script. There are pros and cons to each format, and most of them support full round-trip conversion, as discussed above. Jupytext understands the following script formats.

  • Light – A format created for the Jupytext project, where the begin and end markers of cells are # + and # -
  • Nomarker – A version of light, but with no cell markers at all. This format cannot be round-tripped.
  • Percent – Cell markers are placed in the code in this format: # %% Optional title [cell type] key="value"
  • Hydrogen – Very similar to percent, but it doesn’t comment out Jupyter magics.
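The percent marker layout listed above (# %% Optional title [cell type] key="value") can be picked apart with a few regular expressions. This is a minimal sketch under simplifying assumptions: parse_percent_header is a hypothetical helper, it only handles double-quoted string values, and it treats a marker without a bracketed type as a code cell. Jupytext's own parser is more thorough.

```python
import re


def parse_percent_header(line):
    """Parse '# %% Optional title [cell type] key="value"' (simplified sketch)."""
    if not line.startswith("# %%"):
        raise ValueError("not a percent-format cell marker")
    rest = line[len("# %%"):].strip()

    # Collect key="value" pairs, then remove them from the remaining text.
    options = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    rest = re.sub(r'\w+="[^"]*"', "", rest)

    # An optional [cell type]; assume a code cell when it is absent.
    type_match = re.search(r"\[(\w+)\]", rest)
    cell_type = type_match.group(1) if type_match else "code"

    # Whatever is left is the title.
    title = re.sub(r"\[\w+\]", "", rest).strip()
    return {"title": title, "cell_type": cell_type, "options": options}


print(parse_percent_header('# %% Optional title [markdown] key="value"'))
print(parse_percent_header("# %%"))
```

The bare # %% marker from the earlier example parses to an untitled code cell with no options.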

Possible problems

One of the main problems with adding Jupytext to your setup is that it adds an extra piece of complexity. If you want to check in and version control the finished notebook with its output, you now need to commit two files instead of one. It may not be worth it to you, depending on your circumstances.

Another problem is that Jupytext is supported by the command line and the official Jupyter authoring tools, but not fully supported by all other IDEs, so if you use a different tool, you’ll have to get used to converting on the command line. In almost all cases, I would say that if you’re going to do more work in Jupyter, it’s worth learning how to do it.

Finally, as always, you need to be strict about keeping your notebook’s output cells consistent with the code that generated them. The best way to ensure this is to restart the kernel and execute the entire notebook every time you update it, before committing. You can automate this regeneration step, but a really long-running notebook can make it tedious. Just be aware that Jupytext may update notebook files without you realizing it.

Jupytext is a great plug-in and will be useful for those who prefer to work in Markdown or plain source files, as well as for those who use code validation tools.

The post Jupytext – Jupyter notebooks as Markdown documents or Python scripts appeared first on wrighters.io.