From: The Heart of the Machine

How hard is it for a machine learning developer to build an app?

In fact, all you need to know is Python; a tool can do the rest of the work. Streamlit co-founder Adrien Treuille has written about Streamlit, a free, open source app framework for machine learning engineers. The tool updates your application in real time while you write Python code.

Streamlit has more than 15,000 GitHub stars and 9,000+ hits on Medium.

Streamlit’s website: https://streamlit.io/

Project address: https://github.com/streamlit/…

Programming a semantic search engine with real-time neural network inference in 300 lines of Python code.

In my experience, every nontrivial machine learning project ends up stitched together from bug-ridden, hard-to-maintain internal tools. These tools, often a patchwork of Jupyter Notebooks and Flask apps, are difficult to deploy, require reasoning about client-server (C/S) architecture, and do not integrate well with machine learning components such as TensorFlow GPU sessions.

I first saw these tools at Carnegie Mellon, then at Berkeley, Google X, and Zoox. These tools started out as little Jupyter Notebooks: sensor calibration tools, simulation comparison apps, lidar alignment apps, scene reconstruction tools, etc.

As a tool becomes more important, project managers get involved, and processes and requirements multiply. These solo projects turn into scripts and grow into sprawling “maintenance nightmares”…

And when a tool becomes critical, we build a tools team. They write Vue and React expertly, and cover their laptops with stickers about declarative frameworks. Their design process looks like this:

This is awesome! But all of these tools need new features, often every week. And the tools team is supporting more than 10 other projects at the same time, so they say, “We’ll update your tool in two months.”

So we went back to building our own tools: deploying Flask apps, writing HTML, CSS, and JavaScript, and trying to version control everything from notebooks to style sheets. My friend Thiago Teixeira, who worked with me at Google X, and I started thinking: What if building tools were as easy as writing Python scripts?

We wanted machine learning engineers to be able to build great apps without a tools team. These internal tools should emerge as a natural byproduct of the machine learning workflow. Writing such a tool should feel like training a neural network or performing an ad-hoc analysis in Jupyter! At the same time, we wanted to preserve the flexibility of a powerful app framework. We wanted to create great tools that engineers could be proud of.

The app-building process we had in mind is as follows:

Working with engineers from Uber, Twitter, Stitch Fix, Dropbox, and more, we spent a year creating Streamlit, a free, open source app framework for machine learning engineers. Through every prototype, Streamlit's core tenets have been simplicity and purity.

The core principles of Streamlit are as follows:

  1. Embrace Python

Streamlit apps are simply scripts that run from top to bottom, with no hidden state. You can factor your code with function calls. If you can write Python scripts, you can write Streamlit apps. For example, you could write to the screen as follows:

import streamlit as st
st.write('Hello, world!')

  2. Think of widgets as variables

There are no callbacks in Streamlit! Each interaction simply reruns the script from top to bottom. This approach keeps the code very clean:

import streamlit as st
x = st.slider('x')
st.write(x, 'squared is', x * x)

  3. Reuse data and calculations

What if you want to download a lot of data or perform complex calculations? The key is to safely reuse information across multiple runs. Streamlit introduces a cache primitive that acts as a persistent, immutable-by-default data store, allowing Streamlit apps to reuse information easily and safely. For example, the following code downloads the data from the Udacity self-driving car project (https://github.com/udacity/se…) only once, which gives you a simple and fast app:

To run the code, see: https://gist.github.com/treui…
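To illustrate the idea behind such a cache primitive, here is a rough sketch using only the Python standard library. It memoizes a function on its arguments so that an expensive download happens once across repeated script runs, and it hands back a copy so that the stored value stays unchanged. The names `cache` and `download_labels` and the placeholder URL are hypothetical, and this is not Streamlit's actual implementation.

```python
import functools
import pickle

_STORE = {}  # lives outside the "script", so it survives across reruns


def cache(func):
    """Hypothetical stand-in for a cache primitive: memoize on the arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (func.__qualname__, pickle.dumps((args, sorted(kwargs.items()))))
        if key not in _STORE:
            # First call with these arguments: do the work and store the result.
            _STORE[key] = pickle.dumps(func(*args, **kwargs))
        # Return a fresh copy so callers cannot mutate the stored value.
        return pickle.loads(_STORE[key])
    return wrapper


download_count = 0


@cache
def download_labels(url):
    """Pretend download; a real app would fetch and parse the file here."""
    global download_count
    download_count += 1
    return [{"frame": "0001.jpg", "label": "car"},
            {"frame": "0002.jpg", "label": "pedestrian"}]


# Simulate three reruns of the same script.
for _ in range(3):
    labels = download_labels("https://example.com/labels.csv")  # placeholder URL
    labels.append("scratch data")  # changes the copy only, not the cached value

print(download_count)
```

Keying on the function name plus its arguments means that a call with a different URL would trigger a new download, while repeated calls with the same URL are served from the store.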

In a nutshell, Streamlit works like this:

  1. Each user interaction reruns the entire script from scratch.
  2. Streamlit assigns each variable the latest value based on the state of its widget.
  3. Caching lets Streamlit reuse data and calculations instead of recomputing them.
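The first two steps can be mimicked in a few lines of plain Python. This is a toy model, not Streamlit code: `app`, `rerun`, and `widget_state` are hypothetical names, and caching is left out to keep the sketch short.

```python
def app(widgets, output):
    """The 'script': it runs from top to bottom on every interaction."""
    x = widgets.get("x", 0)  # the variable takes the widget's current value
    output.append(f"{x} squared is {x * x}")


def rerun(script, widget_state):
    """Every interaction reruns the whole script against a fresh screen."""
    output = []
    script(widget_state, output)
    return output


widget_state = {}                  # hypothetical store of slider positions
first = rerun(app, widget_state)   # initial page load
widget_state["x"] = 7              # the user drags the slider to 7
second = rerun(app, widget_state)  # the interaction triggers a full rerun

print(first)
print(second)
```

Because the script never registers a callback, the only thing that changes between runs is the value the widget hands back.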

As shown in the figure below:

If you’re interested, you can try it right away! Simply run the following commands:

pip install --upgrade streamlit
streamlit hello

The web browser will automatically open and go to the local Streamlit app. If no browser window appears, just click the link printed in the terminal.

These ideas are simple, but they work, and using Streamlit does not prevent you from creating rich and useful apps. While working at Zoox and Google X, I watched self-driving car projects grow into gigabytes of visual data that needed to be searched and understood, including running models on image data to compare performance. Every self-driving car project I’ve seen has a whole team working on the tools to do this.

Building such a tool in Streamlit is simple. The following Streamlit demo can perform semantic search over the entire Udacity self-driving car photo dataset, visualize human-annotated ground-truth labels, and run a full neural network (YOLO) in real time within the app.

The entire app is only 300 lines of Python code, most of which is machine learning code. In fact, there are only 23 Streamlit calls in the entire app. You can try:

As we worked with machine learning teams on their projects, we came to realize that these simple ideas would yield a number of important benefits:

Streamlit apps are pure Python files. You can use your favorite editor and debugger.

Pure Python code works seamlessly with source control software such as Git, including commits, pull requests, issues, and comments. Because the underlying language of Streamlit is Python, you get the benefits of these collaboration tools for free.

Streamlit provides a live, immediate-mode programming environment. When Streamlit detects a change to the source file, just click Always Rerun.

Caching simplifies setting up computation pipelines. Chaining cached functions together automatically creates an efficient computation pipeline! You can try the following code:

To run the code above, see the instructions: https://gist.github.com/treui…

Basically, the pipeline runs from loading metadata to creating a summary (load_metadata → create_summary). Each time the script runs, Streamlit recalculates only the subset of the pipeline that is needed.
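The load_metadata → create_summary flow described above can be approximated with the standard library's `functools.lru_cache` as a stand-in for Streamlit's cache. This is only a sketch: the rows, the summary logic, and `run_script` are made up for illustration, and the call counters exist only to show which steps actually run.

```python
import functools

calls = {"load_metadata": 0, "create_summary": 0}


@functools.lru_cache(maxsize=None)
def load_metadata():
    calls["load_metadata"] += 1
    # Hypothetical (frame, label) rows; a tuple keeps the value hashable.
    return (("frame_1", "car"), ("frame_1", "truck"), ("frame_2", "car"))


@functools.lru_cache(maxsize=None)
def create_summary(metadata, summary_type):
    calls["create_summary"] += 1
    counts = {}
    for frame, _label in metadata:
        counts[frame] = counts.get(frame, 0) + 1
    if summary_type == "any":
        # "any": does the frame have at least one label?
        return tuple((frame, n > 0) for frame, n in sorted(counts.items()))
    # "sum": number of labels per frame
    return tuple(sorted(counts.items()))


def run_script(summary_type):
    """One top-to-bottom run of the 'script' for a given widget value."""
    metadata = load_metadata()
    return create_summary(metadata, summary_type)


run_script("sum")  # both steps execute
run_script("any")  # only create_summary executes again
run_script("sum")  # nothing executes; both results come from the cache

print(calls)
```

Changing the summary type reruns only the downstream step, which is the "subset of the pipeline" behavior the text describes.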

Streamlit works on GPUs. Streamlit allows direct access to machine-level primitives (such as TensorFlow and PyTorch) and complements these libraries. For example, in the following demo, Streamlit’s cache stores the entire Nvidia PGGAN. This lets the app perform near-instantaneous inference when the user updates the sliders on the left.

This Streamlit app uses TL-GAN to demonstrate the effect of the Nvidia PGGAN.

Streamlit is a free, open source library rather than a proprietary web app. You can deploy Streamlit apps on-premises without contacting us in advance. You can even run Streamlit locally on your laptop without an Internet connection. In addition, existing projects can adopt Streamlit incrementally.

That’s just the tip of the iceberg. One of the most exciting things about Streamlit is that these primitives can easily be assembled into complex apps that still look like simple scripts. How the architecture makes this possible, and what it enables, is not covered in this article.

We’re excited to share Streamlit with the community in the hope that it will make it easy for people to turn Python scripts into beautiful and useful machine learning apps.

Original English article: https://towardsdatascience.co…
