Introduction

Today (October 21) I attended PyCon China 2018, a Python developer conference held at the Pier 1 Art Hotel in Shanghai. Pythoneers gave talks on topics such as Jupyter, PyTorch, PyEnv, Pyeth, VNpy, and Numba.

The question

In the morning Q&A session, some friends asked about version control for Jupyter notebook files (extension: .ipynb). Jupyter notebooks are stored in JSON format. Because of this format, even a slight change to a notebook can produce a very different .ipynb file, so Git cannot track its versions well.
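To make the noise concrete, here is a minimal sketch using only the Python standard library. The cell built by cell_json is a hand-written, simplified stand-in for one code cell of an .ipynb file (real files carry more fields), and the timestamps are made-up values. The source code of the cell is identical in both versions; only the execution counter and the printed output differ, as they would after a re-run.

```python
import difflib
import json


def cell_json(execution_count, output_text):
    """Build a simplified, hypothetical code cell and return its JSON as lines."""
    cell = {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "outputs": [
            {"name": "stdout", "output_type": "stream", "text": [output_text]}
        ],
        "source": ["import time\n", "print(time.time())"],
    }
    return json.dumps(cell, indent=1, sort_keys=True).splitlines()


# Same source code, but the cell was run twice (made-up output values).
before = cell_json(1, "1540089600.0\n")
after = cell_json(2, "1540089660.5\n")

# Keep only the added/removed lines of the unified diff, dropping headers.
diff = [
    line
    for line in difflib.unified_diff(before, after, lineterm="", n=0)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Even though no code changed, the diff reports changed lines for the counter and the output text. In a real notebook with images or long logs, this kind of noise dominates the diff.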

The solution

As a Jupyter user, I ran into this problem myself some time ago. After stumbling through a few pitfalls, I arrived at a “satisfactory” solution.

First, define the problem the solution addresses: gracefully managing the change history of Jupyter notebook .ipynb files with Git.

There are two solutions, and I recommend the second one.

Option 1: Install the NoteDown plug-in so that Jupyter notebook can edit and run code stored in Markdown format; in other words, replace .ipynb files with .md files. (I learned this from Li Mu's tutorial; in practice it has quite a few pitfalls.)

Install the NoteDown plug-in, then start Jupyter notebook with the plug-in loaded:

pip install https://github.com/aaren/notedown/tarball/master
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'

# Installing from PyPI also works, but that version is older (1.4.2; the latest is 1.5.0) and less compatible
# pip install notedown

If you want the NoteDown plugin to be enabled by default every time you run Jupyter, follow these steps.

First, run the following command to generate the Jupyter notebook configuration file (skip it if it has already been generated).

jupyter notebook --generate-config

Then, add the following line to the end of the Jupyter notebook configuration file (on Linux/macOS, it is usually at ~/.jupyter/jupyter_notebook_config.py):

c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'

After that, simply running the jupyter notebook command starts Jupyter with the NoteDown plug-in enabled by default.
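If you prefer to script the configuration step, the following is a minimal sketch. The helper name enable_notedown is made up for this example; it only appends the configuration line quoted above to whatever file path you pass in, and does nothing if that exact line is already present.

```python
from pathlib import Path

# The setting from this article, written exactly as it should appear in the config file.
CONFIG_LINE = "c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'"


def enable_notedown(config_path):
    """Append the NoteDown setting to a Jupyter config file unless it is already there.

    Returns True if the file was modified, False otherwise.
    (Hypothetical helper for illustration; not part of Jupyter or notedown.)
    """
    path = Path(config_path).expanduser()
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Compare whole lines so that a commented-out copy does not count as present.
    if any(line.strip() == CONFIG_LINE for line in text.splitlines()):
        return False
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text + separator + CONFIG_LINE + "\n", encoding="utf-8")
    return True


# Example call (assumes the usual Linux/macOS location; adjust if yours differs):
# enable_notedown("~/.jupyter/jupyter_notebook_config.py")
```

Calling the helper twice is safe: the second call detects the existing line and leaves the file untouched.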

Option 2: Install the nbstripout tool, which makes Git “ignore” the outputs when a Jupyter notebook is saved to the repository: the committed version has empty outputs, while the file in the working directory stays unchanged. This is generally not a problem, since others only need to re-run the notebook once after checkout to see the outputs. For example, an output entry like the following is left out of the committed version:

   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }

To set it up, run nbstripout --install once in the Git project directory, then keep using Git commands such as add, checkout, commit, and diff as usual. Git registers a change only when the notebook's code actually changes, and a version record is created after the commit.
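Conceptually, the cleaning step amounts to something like the sketch below. This is not nbstripout's actual implementation, only a simplified illustration using the standard library; the function name strip_outputs and the sample notebook are made up for this example, and the real tool handles more cases.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs and execution counts cleared."""
    # Round-trip through JSON as a cheap deep copy of JSON-compatible data.
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny hand-written notebook (nbformat 4 layout), used only for illustration.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": ["Using TensorFlow backend.\n"],
                }
            ],
            "source": ["import keras"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

clean = strip_outputs(notebook)
print(json.dumps(clean["cells"][0], indent=1))
```

The source of each cell is untouched, so two runs of the same code produce the same stripped file, which is what lets Git see "no change".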

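Install the nbstripout filter in the current Git repository (the tool itself can be installed beforehand with pip install nbstripout):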

nbstripout --install

You can also remove the nbstripout filter from the repository at any time:

nbstripout --uninstall

What these two commands actually do is modify the .git/config file:

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[filter "nbstripout"]
	clean = \"/Users/CCM/anaconda3/bin/python\" \"/Users/CCM/anaconda3/lib/python3.7/site-packages/nbstripout.py\"
	smudge = cat
	required = true
[diff "ipynb"]
	textconv = \"/Users/CCM/anaconda3/bin/python\" \"/Users/CCM/anaconda3/lib/python3.7/site-packages/nbstripout.py\" -t

To sum up, the crux of the Jupyter notebook version control problem is that .ipynb files store outputs. This part changes often and does not need to be version controlled; deal with it, and the version control problem is solved as well.

Please credit the source when reposting. You are also welcome to get in touch with me.