This article summarizes the author's experience customizing Jupyter over the past year.

Jupyter provides a one-stop, multi-kernel notebook development environment. Over several versions of Jupyter Notebook and JupyterLab, its user base has kept growing. We have also built a Jupyter platform inside our company for colleagues working in data science and data analysis.

End-to-end custom Jupyter

A customized Jupyter platform covers the following:

  1. Deployment mode: JupyterLab is easy to deploy locally. Inside the company we deploy it in Docker containers, which simplifies version upgrades and isolates physical resources so that each instance stays stable
  2. Data storage: The container's local disk holds only the image. All personal configuration, custom extension packages, scripts, and data files are kept on shared storage (a Ceph shared disk), which decouples them from any particular instance
  3. Data integration: Internal data analysis requires frequent data processing. We use the PyHive package to access Hive and wrap it in two helpers, query_hive and create_table, which take parameters such as caching, engine routing, and a maximum number of rows
  4. Function customization: code sharing, code collaboration, code version management, content synchronization, and so on
  5. Content construction: Users of the platform differ in proficiency, and coding scenarios overlap considerably within a department or job function. We therefore developed demos of typical scenarios and code blocks for typical functions, to spread good coding practices across the company and align how the platform is used
  6. Monitoring system: instance resource monitoring (so users can see the status of their instance and avoid OOM crashes) and feature usage monitoring (so the platform team can see promptly how launched features are being used)
  7. Operation system: we keep the platform active through product training, round-table discussions, help documents, sharing of usage tips, and similar activities
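
The data integration item above mentions a query_hive helper with a cache, engine routing, and a maximum number of rows. The following is only a minimal sketch of what such a wrapper might look like, not the platform's actual code. The class name QueryRunner, its parameters, and the engine name "demo" are all hypothetical. It assumes only that each engine is reachable through a DB-API 2.0 connection; with PyHive, the factory would typically return pyhive.hive.connect(...) with the cluster's host and port. An in-memory sqlite3 database stands in for Hive here so that the sketch runs without a cluster.

```python
import hashlib
import sqlite3
from typing import Any, Callable, Dict, List, Optional, Tuple

Rows = List[Tuple[Any, ...]]


class QueryRunner:
    """Hypothetical query wrapper: engine routing, row cap, in-memory cache."""

    def __init__(
        self,
        engines: Dict[str, Callable[[], Any]],  # engine name -> connection factory
        default_engine: str,
        max_rows: int = 10000,
    ) -> None:
        self._engines = engines
        self._default_engine = default_engine
        self._max_rows = max_rows
        self._cache: Dict[str, Rows] = {}

    def query(
        self,
        sql: str,
        engine: Optional[str] = None,
        max_rows: Optional[int] = None,
        use_cache: bool = True,
    ) -> Rows:
        engine = engine or self._default_engine
        limit = self._max_rows if max_rows is None else max_rows

        # The cache key covers everything that can change the result.
        key = hashlib.sha256(f"{engine}|{limit}|{sql}".encode()).hexdigest()
        if use_cache and key in self._cache:
            return list(self._cache[key])  # copy so callers cannot alter the cache

        # Route the query to the requested engine and cap the rows fetched.
        conn = self._engines[engine]()
        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            rows = list(cursor.fetchmany(limit))
        finally:
            conn.close()

        if use_cache:
            self._cache[key] = rows
        return list(rows)


def demo_connection() -> sqlite3.Connection:
    """Stand-in for a Hive connection: a small in-memory sqlite3 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
    return conn


runner = QueryRunner({"demo": demo_connection}, default_engine="demo", max_rows=3)
first_rows = runner.query("SELECT x FROM t ORDER BY x")
```

Capping rows with fetchmany protects notebook instances from pulling an oversized result into memory, and keying the cache on engine, limit, and SQL text means a repeated query is served without a second round trip. A real deployment would likely also need cache expiry, which this sketch leaves out.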

How to build an enabling tool within a company

It has been a year and a half since I moved into product work, but I still have not formed a systematic understanding of my own. Here are some fragmentary thoughts on building tools, which still need refinement

  1. How internal and B-side products survive and grow: the reality is that products compete for the company's internal resources, so a product has to attract attention. Do whatever it takes to get business users and managers to notice it. Every product is built by accumulating resources: resources make features possible, and features in turn bring end users and returns for the product
  2. Layered experience: product experience comes in layers: an attractive interface, user-friendly features, fast computation, and a stable system. We recommend optimizing step by step from the bottom layer up to the front end: stability > computing performance > feature richness > attractive interface
  3. Understanding users: Getting involved in B-side products is expected to become popular in 2020. B-side products have a professional threshold, so product managers should keep learning from users to understand their scenarios, businesses, and usage patterns
  4. Leading users: if understanding users is the foundation of the product, leading them means studying competing products and cutting-edge technology, turning what we learn into internal features, and enthusiastically recommending those features to users, which is how the product shows its progress. When designing new features and experiences, protect users' existing usage paths and work new features and methods into their use of the product unobtrusively

The above are some scattered thoughts. I hope to keep deepening this understanding, build better data analysis tools, and empower users.