fast.ai founder Jeremy Howard recently wrote about nbdev, the company’s new programming environment built on Jupyter Notebook. It brings the benefits of an IDE or editor to Jupyter Notebook, so that you can develop in notebooks without compromising any part of the project life cycle.

From fast.ai, by Jeremy Howard. Compiled by The Heart of the Machine.

  • nbdev project address: https://github.com/fastai/nbdev/

  • nbdev documentation: https://nbdev.fast.ai/

“I think nbdev is a huge step forward for programming environments.” Chris Lattner, creator of Swift, LLVM, and Swift Playgrounds

Over the past few years, my colleague Sylvain Gugger and I have been working on something we love: nbdev, a Python programming environment. Nbdev lets you create complete Python packages, including tests and a rich documentation system, entirely in Jupyter Notebook. We have already written a large programming library (fastai v2) and several smaller projects with nbdev.

Jeremy Howard, founding researcher at fast.ai.



Nbdev is a system for what we call exploratory programming. We find that most programmers spend most of their working time exploring and experimenting. We experiment with APIs we have never used before to understand how they work. We explore the behavior of an algorithm under development to see how it handles different kinds of data. We try different combinations of inputs to debug our code.


nbdev: exploratory programming


We think the exploration process is valuable in itself and should be saved, so that other programmers (or you yourself, six months from now) can see what happened and learn from the examples. Think of it as a scientific journal, where you record what you tried (both what worked and what didn’t) and what you did to improve your understanding of the system you are working in. As you explore, you will discover that some parts of that understanding are critical for the system to work, so the exploration should include tests and assertions that pin this behavior down.


Exploration is easiest when you develop at a prompt (or REPL), or in a notebook-oriented development system such as Jupyter Notebook. But these systems are weaker at the “programming” part. This is why people mostly use them for early exploration and then move to an IDE or text editor.


The reason for moving to another system is to get features that a notebook or REPL lacks, such as good documentation lookup, good syntax highlighting, integration with unit tests, and (critically) the ability to produce the final, distributable source code files.


Nbdev brings the advantages of IDE/editor development to the notebook system, so that you can work in notebooks throughout the entire project life cycle without compromise. To support this kind of exploration, nbdev is built on Jupyter Notebook (which means it supports Python’s dynamic features better than a normal editor or IDE) and adds the following important tools for software development:


  • Python modules created automatically, following best practices such as defining __all__ automatically from your exported functions, classes, and variables;

  • Code navigation and editing in a standard text editor or IDE, with all changes automatically synchronized back to the notebooks;

  • Searchable, hyperlinked documentation created automatically from your code: any word you surround with backticks is hyperlinked to the appropriate documentation, the sidebar of the documentation site links to each module, and so on;

  • Pip installers (uploaded to PyPI);

  • Tests (defined directly in the notebooks and run in parallel);

  • Continuous integration;

  • Versioning and conflict handling.
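To make the first item in the list above concrete, here is a minimal sketch of how an `__all__` list could be derived from a module's source. It is an illustration only, not nbdev's actual implementation: the helper `public_names` and the sample module are hypothetical, and the rule used here (every top-level name that does not start with an underscore) is an assumption.

```python
import ast

def public_names(source: str) -> list:
    """Collect top-level function, class, and variable names that do not start with '_'."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.Assign):
            # Plain assignments such as `DEFAULT_NAME = ...`
            names += [t.id for t in node.targets if isinstance(t, ast.Name)]
    return [n for n in names if not n.startswith("_")]

# Hypothetical module source, standing in for code exported from a notebook.
exported_source = '''
DEFAULT_NAME = "world"

def greet(name=DEFAULT_NAME):
    return f"Hello, {name}!"

class Greeter:
    pass

def _helper():
    pass
'''

all_line = f"__all__ = {public_names(exported_source)!r}"
print(all_line)  # __all__ = ['DEFAULT_NAME', 'greet', 'Greeter']
```

With such a line at the top of the generated module, `from module import *` exposes only the listed names and leaves `_helper` private.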



The following is a snippet from the actual nbdev source code, which is itself written in nbdev.


Exploring the notebook file format in the nbdev source code.



As shown above, when you build software this way, everyone on the project team benefits from the work you did to understand the problem domain: file formats, performance characteristics, API edge cases, and so on. Because you develop in notebooks, you can also add charts, text, links, images, videos, and so on, all of which are automatically incorporated into the library documentation. The cells that define code are hidden and replaced by standardized function documentation showing the name, parameters, docstring, and a GitHub link to the source code.
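As a rough picture of what "standardized function documentation" might contain, the sketch below builds a plain-text entry from a function's name, parameters, and docstring using the standard `inspect` module. The helper `render_doc` and the sample function `resize` are hypothetical and are not part of nbdev; the GitHub source link is left out.

```python
import inspect

def render_doc(func) -> str:
    """Build a short plain-text doc entry: name and signature, then the docstring."""
    header = f"{func.__name__}{inspect.signature(func)}"
    doc = inspect.getdoc(func) or "(no docstring)"
    return f"{header}\n    {doc}"

def resize(image, width: int, height: int = 100):
    "Return `image` scaled to `width` x `height`."
    return image

print(render_doc(resize))
```

Because the entry is generated from the live function object, it cannot drift out of date with the code the way hand-written reference pages can.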


For more information about nbdev’s features, installation, and use, see the nbdev documentation: https://nbdev.fast.ai/.


The rest of this article explains why nbdev was built and the history and background behind its design philosophy. First, let’s look at the history. (If that doesn’t appeal to you, skip to “What’s missing from Jupyter Notebook?”)


Software development tools


Most software development tools are not built with exploratory programming in mind. When I started coding about 30 years ago, waterfall software development was used almost exclusively. This approach defines the entire software system in detail up front and then codes as closely to the specification as possible. Even at the time, it seemed to me that this approach did not fit the way I actually worked.


In the 1990s, things changed and agile development became popular. People began to accept the reality that most software development is an iterative process, and developed ways of working that respected that fact. But the software development tools we used did not change enough to match the new way of working. Some tools were added to our toolkit that made test-driven development easier, but they were only mild extensions of existing editors and development environments, without any real rethinking of what a development environment should look like.


In recent years there has been growing interest in exploratory testing as an important part of agile testing. We absolutely agree with that, but we do not think it goes far enough. We think exploration should be central to every part of the software development process.


The legendary Donald Knuth was ahead of his time and wanted to see a different approach to development. In 1983 he proposed a method called “literate programming,” which he described as “a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.”


I was fascinated by this idea for a long time, but unfortunately it never caught on. Developing software this way took longer, and few people thought that price worth paying.


Nearly 30 years later, another transformative thinker, Bret Victor, expressed deep dissatisfaction with the development tools of the day and described how to design “a programming system for understanding programs.” In his groundbreaking talk “Inventing on Principle,” he said: “Our current conception of what a computer program is, a list of textual definitions that you hand to a compiler, is derived straight from Fortran and ALGOL in the late ’50s. Those languages were designed for punch cards.”


He presented worked examples of many new principles for designing programming systems. Although no one has fully implemented all of his ideas, some have tried to implement parts of them. Perhaps the best known and most complete implementation (including the display of intermediate results) is Chris Lattner’s Swift and Xcode Playgrounds.


Demo of Xcode Playgrounds.



While this was an important leap forward, it was still held back by one fundamental constraint: it sat inside a development environment that was not built with this kind of exploration in mind. For example, the exploration process itself was not captured, tests could not be integrated directly into it, and the full vision of literate programming could not be implemented.


Interactive programming environment


Software development has also moved in a different direction, namely interactive programming (and the related live programming). Interactive programming has been around for decades: the LISP and Forth REPLs, for example, let developers interactively add code to and remove code from running applications. Smalltalk took this one step further by providing a fully interactive visual workspace. In all of these cases, the language itself is well suited to interactive ways of working, as with LISP’s macro system and its “code as data” foundation.


Live programming in Smalltalk (1980).



Today, this is not the mainstream approach to software development, but it is the most popular approach in many fields, including science, statistics, and other data-driven programming. (JavaScript front-end programming continues to borrow ideas from these approaches, such as hot reloading and in-browser live editing.) For example, Matlab emerged as a fully interactive tool in the 1970s and is still widely used in engineering, biology, and other fields (it now also provides general software development features). A similar approach was taken by S-PLUS and by R, its open source relative, which is currently very popular in the statistics and data visualization communities.


I was very excited when I first used Mathematica 25 years ago. To me, it seemed the closest thing to a tool that could support literate programming without compromising productivity. Mathematica uses a “notebook” interface, which behaves like a traditional REPL but also allows other types of information, such as charts, images, formatted text, and outline sections. In fact, not only did it not hurt productivity, it let me build things I could not build before, because I got immediate visual feedback as I experimented with an algorithm.


In the end, though, Mathematica did not help me build anything useful, because I could not distribute my code or applications to colleagues (unless they paid thousands of dollars for a Mathematica license), and I could not easily create web applications that ran in a browser. In addition, I found that Mathematica code was generally slower and more memory intensive than code written in other languages.


So you can imagine how excited I was when Jupyter Notebook appeared. Jupyter Notebook has the same basic notebook interface as Mathematica (although initially it had only a fraction of Mathematica’s functionality) and is open source, which let me write code in a widely supported and freely available language. I have used Jupyter to explore algorithms, APIs, and new research ideas, and also as a teaching tool at fast.ai. Many students find that the ability to experiment with inputs, view intermediate results and outputs, and make their own modifications helps them develop a more complete and deeper understanding of the topic under discussion.


We also wrote a book using Jupyter Notebook, which was fun. Working in notebooks, we combined prose, code examples, hierarchical headings, and so on, while ensuring that the sample output (charts, tables, and images) always matched the code examples exactly.


In short: We really like using Jupyter Notebook and have done great things with it, and the students love it. But we can’t use it to build our own software!


What’s missing from Jupyter Notebook?


Jupyter Notebook is good at the “exploratory” part of exploratory programming, but not so good at the “programming” part. For example, it does not provide a way to do the following:


  • Create modular reusable code that can be run outside of Jupyter;

  • Create searchable hyperlinked documentation;

  • Test code (including automated testing through continuous integration);

  • Code navigation;

  • Version control.



As a result, developers often have to switch between poorly integrated tools to get the advantages of each, and moving back and forth between them causes friction. The advantages of different tools are as follows:




We think the best way to handle these problems is to use good tools that already exist wherever possible, and to build our own only where needed. For example, a great tool already exists for handling pull requests and viewing diffs: ReviewNB. When you look at a visual diff in ReviewNB, you suddenly notice how much information a plain text diff leaves out. For example, what if a commit made your generated images blurry, or left your charts unlabeled? With a visual diff, you know exactly what is going on.


The visual diff in ReviewNB shows the changes to the table output.



Nbdev avoids many merge conflicts in the first place, because it installs git hooks that strip out some of the metadata that causes them. If a merge conflict does occur when you pull from git, simply run nbdev_fix_merge. With this command, nbdev keeps your own cell outputs wherever outputs conflict; if cell inputs conflict, the final notebook contains both cells along with conflict markers, so you can easily find them and fix them directly in Jupyter.


Example of a cell-based merge conflict in nbdev.
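The idea of stripping conflict-prone metadata can be sketched as follows. This is not nbdev's hook code: the function `strip_volatile_metadata` is hypothetical, and the choice of fields to clear (cell metadata and execution counts) is an assumption made for illustration. The notebook is represented as the plain dictionary you get from loading an .ipynb file as JSON.

```python
import copy

def strip_volatile_metadata(notebook: dict) -> dict:
    """Return a copy of a notebook dict with run-specific fields cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
    return cleaned

# The same notebook as saved by two people who ran the cell at different times.
run_a = {
    "cells": [{
        "cell_type": "code",
        "execution_count": 3,
        "metadata": {"collapsed": False},
        "source": ["1 + 1"],
        "outputs": [{"output_type": "execute_result", "execution_count": 3,
                     "data": {"text/plain": ["2"]}, "metadata": {}}],
    }],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
run_b = copy.deepcopy(run_a)
run_b["cells"][0]["execution_count"] = 7
run_b["cells"][0]["outputs"][0]["execution_count"] = 7

print(run_a == run_b)                                                    # False
print(strip_volatile_metadata(run_a) == strip_volatile_metadata(run_b))  # True
```

Once the run-specific fields are gone, two copies of a notebook with identical code and results no longer differ, so git has nothing to conflict on.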



Nbdev creates modular, reusable code by simply creating standard Python modules. Nbdev looks for special comments in code cells, such as #export (indicating that the cell should be exported to a Python module). A special comment at the beginning of each notebook associates that notebook with a specific Python module. The documentation site (which uses Jekyll, so it is supported directly by GitHub Pages) is created automatically from the notebooks and their special comments. We wrote our own documentation system because existing approaches, such as Sphinx, did not provide all the functionality we needed.
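The export step described above can be pictured with a minimal sketch. It assumes a notebook loaded as a plain dictionary and treats a code cell as exported when its first line is exactly `#export`; the function `export_cells` is hypothetical, and nbdev's real exporter handles more cases than this.

```python
def export_cells(notebook: dict) -> str:
    """Join the source of every code cell whose first line is the #export marker."""
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        lines = "".join(cell["source"]).splitlines()
        if lines and lines[0].strip() == "#export":
            # Drop the marker line and keep the rest of the cell.
            chunks.append("\n".join(lines[1:]).strip())
    return "\n\n".join(chunks) + "\n"

notebook = {"cells": [
    {"cell_type": "markdown", "source": ["# Notes on the parser"]},
    {"cell_type": "code", "source": ["#export\n", "def double(x):\n", "    return 2 * x"]},
    {"cell_type": "code", "source": ["double(21)  # exploration only, not exported"]},
    {"cell_type": "code", "source": ["#export\n", "LIMIT = 10"]},
]}

module_source = export_cells(notebook)
print(module_source)
```

The exploratory call `double(21)` stays in the notebook as documentation and as a check, while only the marked cells end up in the module.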


As for code navigation, most editors and IDEs (such as vim, Emacs, and VS Code) already have good features built in. GitHub’s web interface even supports code navigation directly (currently in beta, only for selected projects such as fast.ai). So we made sure that the code exported by nbdev can be navigated and edited directly in any of these systems, and that any edits are automatically synchronized back to the notebooks.


As for testing, we wrote our own simple library and command-line tool. Tests are written directly in the notebooks as part of the exploration and development (and documentation) process, and the command-line tool runs the tests in all notebooks in parallel. The natural statefulness of notebooks turns out to be a very good way to develop both unit tests and integration tests. There is no special syntax to learn for creating test suites: you just use Python’s regular collection and looping constructs, so there are far fewer new concepts to learn.


These tests can also be run in ordinary continuous integration tools, and they give clear information about the source of any test failure. The default nbdev template integrates GitHub Actions for continuous integration and other features.
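A test written in this style might look like the sketch below. The helper `check_eq` is a hypothetical stand-in for whatever equality check a testing library provides, and `slugify` is an invented function under test; the point is only that the "test suite" is an ordinary list and an ordinary loop.

```python
def check_eq(actual, expected):
    """Hypothetical helper: fail with a readable message when two values differ."""
    assert actual == expected, f"{actual!r} != {expected!r}"

def slugify(title: str) -> str:
    "Lower-case a title and join its words with hyphens."
    return "-".join(title.lower().split())

# A list of (input, expected) pairs and a plain for loop, nothing more.
cases = [
    ("Hello World", "hello-world"),
    ("  nbdev  ", "nbdev"),
    ("", ""),
]
for title, expected in cases:
    check_eq(slugify(title), expected)
```

In a notebook, this cell sits right next to the definition of `slugify`, so it serves as a usage example for readers and fails loudly when the notebook is run as a test.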


Dynamic Python


One of the challenges of fully supporting Python in a regular editor or IDE is Python’s powerful dynamic nature. For example, you can add methods to classes at any time, use the metaclass system to change how classes are created and how they work, and use decorators to change how functions and methods behave. Microsoft has developed the Language Server Protocol, which development environments can use to get the information about the current file and project that is needed for auto-completion, code navigation, and so on. However, for a truly dynamic language such as Python, this information is usually just a guess, since providing the correct information would require running the Python code itself (which an editor cannot really do, for a variety of reasons: for example, the code may be in a half-written state that would delete all your files if run).
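The short example below shows two of the dynamic features just mentioned: a method attached to a class after the class has been defined, and a decorator that then changes how that method behaves. The class and function names are invented for illustration. A tool that only reads the `class Report` block has no way to know that `total` exists, or what it returns.

```python
import functools

class Report:
    def __init__(self, rows):
        self.rows = rows

def total(self):
    "Sum the rows; attached to Report after the class body has finished."
    return sum(self.rows)

Report.total = total  # method added at runtime

def doubled(func):
    "Decorator that doubles whatever the wrapped function returns."
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return 2 * func(*args, **kwargs)
    return wrapper

Report.total = doubled(Report.total)  # behaviour changed again at runtime

report = Report([1, 2, 3])
print(report.total())  # 12
```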


A notebook, on the other hand, contains an actual running Python interpreter instance that is completely under your control. Jupyter can therefore provide auto-completion, parameter lists, and contextual documentation based on the actual state of your code. For example, when using Pandas, we get tab completion of all the column names in our DataFrames. We have found that this feature of Jupyter Notebook makes exploratory programming considerably more productive, and it works in nbdev without any changes on our part. It is just one of the features we get for free by building the development environment on Jupyter Notebook.
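Completion from live state can be illustrated with a toy example. The `Table` class below is a hypothetical stand-in for a DataFrame whose column names become attributes, and `complete` is a deliberately simple completer built on `dir()`; neither reflects how Jupyter or Pandas implement completion internally.

```python
class Table:
    """Tiny stand-in for a DataFrame: column names become attributes."""
    def __init__(self, **columns):
        self._columns = columns

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __dir__(self):
        # Advertise the column names alongside the regular attributes.
        return sorted(set(super().__dir__()) | set(self._columns))

def complete(obj, prefix: str) -> list:
    """Return attribute names of the live object that start with `prefix`."""
    return [name for name in dir(obj) if name.startswith(prefix)]

table = Table(price=[9.5, 12.0], product=["pen", "ink"], qty=[3, 1])
print(complete(table, "pr"))  # ['price', 'product']
```

The column names exist only in the data passed in at runtime, so they can be offered as completions only by something that has access to the running object.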


The status quo


While developing nbdev, we also used it to write fastai v2 from scratch. Fastai v2, which provides rich, well-structured APIs for building deep learning models, will be released in the first half of 2020. It is already fully functional, and early adopters are building cool projects with pre-release versions. We also have other projects written with fastai v2, some of which will be released in the coming weeks.


We found that using nbdev was 1-2 times more productive than using traditional programming tools. That was a big surprise to me. Having written code for more than 30 years and tried dozens of tools, libraries, and systems for building programs, I had no idea there was so much room for productivity improvement. I am excited about the future: I think there is still a lot of room to improve developer productivity, and I look forward to seeing what people create with nbdev.