This article was originally published by AI Frontier.
Lore: Deep learning model from configuration to deployment in 15 minutes


By Montana Low


Translator | Debra


Editor | Emily

Today, Instacart’s Montana Low has published a post on how to configure, build, deploy, and maintain a deep learning model in 15 minutes with the open source framework Lore. Can you really build a deep learning model in such a short time? Let’s find out.

As Instacart has grown, we’ve learned a few lessons the hard way. Lore, our open source framework, makes machine learning approachable for engineers, and makes existing machine learning research operable.

Open source framework Lore link:

Github.com/instacart/l…

The general impression of machine learning is this:

Well, this paper doesn’t tell me how it works…

Common problems

  1. When you write bespoke code in a high-level language like Python or SQL, it’s easy to hit performance bottlenecks.
  2. Code complexity grows because valuable models are the result of many iterations, so as the code evolves in an unstructured way, individual insights become hard to maintain and communicate.
  3. Reproducibility suffers as data and library dependencies keep changing.
  4. Information overload makes it easy to miss newly available low-hanging fruit when trying to keep up with the latest papers, packages, features, and bugs. The situation is even worse for those just entering the field.

To address these issues, we are standardizing machine learning in Lore. At Instacart, three of our teams are using Lore for machine learning development and are already running more than a dozen Lore models.

TLDR

Here’s a quick demo that makes a prediction with no further context; you can clone my_app from GitHub. To see the full story, read on.

$ pip3 install lore
$ git clone https://github.com/montanalow/my_app.git
$ cd my_app
$ lore install # caching all dependencies locally takes a few minutes the first time
$ lore server &
$ curl "http://localhost:5000/product_popularity.Keras/predict.json?product_name=Banana&department=produce"

Functional specification

Getting your own deep learning project into production in 15 minutes gives you a good sense of what Lore is all about. If you’d like to see the feature specs before writing the code, here is a brief overview:

  1. Models support hyperparameter search over estimators with a data pipeline. They efficiently exploit multiple GPUs (if available) with a couple of different strategies, and can be saved and distributed for horizontal scalability.
  2. Support for estimators from multiple packages: Keras, XGBoost, and scikit-learn. They can all be subclassed, with build, fit, or predict overridden to fully customize your algorithm and architecture, while still benefiting from everything else.
  3. Pipelines avoid information leakage between training and test sets, and one pipeline allows experimentation with many different estimators. A disk-based pipeline is available if the machine does not have enough RAM.
  4. Transformers standardize advanced feature engineering. For example, converting an American first name to its statistical age or gender using US census data; extracting the geographic area code from a free-form phone number string; common date, time, and string operations are supported efficiently through pandas.
  5. Encoders offer robust input to your estimators, and avoid common problems with missing and long-tailed values. They are well tested to save you from garbage in, garbage out.
  6. IO connections to popular (No)SQL databases are configured in a standardized way across the application, with transaction management and read/write optimizations for bulk data, rather than typical ORM single-row operations. Connections share a configurable query cache, in addition to encrypted S3 buckets for distributing models and datasets.
  7. Dependencies are managed per application in development and can be replicated 100% to production. No manual activation, no magic env vars, no hidden files that break Python for everything else. Don’t bother with venv, pyenv, pyvenv, virtualenv, virtualenvwrapper, pipenv, or conda.
  8. Model tests can run in your own continuous integration environment, enabling continuous deployment of code and training updates without adding to the infrastructure team’s workload.
  9. Workflow support for the command line, Python console, Jupyter notebook, or IDE. Each environment gets readable logging and timing statements configured for both production and development.
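As an illustration of the kind of transformation described above, extracting an area code from a free-form phone number can be sketched in plain Python. This is a standalone, hypothetical function for illustration only, not Lore’s actual transformer API:

```python
import re

def area_code(phone):
    """Extract the 3-digit US area code from a free-form phone number string.

    Standalone illustrative sketch; Lore ships its own transformer for this.
    """
    digits = re.sub(r'\D', '', phone)  # keep only the digits
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]  # drop the US country code
    if len(digits) == 10:
        return digits[:3]
    return None  # not a recognizable 10-digit US number

print(area_code('(415) 555-1234'))   # prints 415
print(area_code('+1 212.555.0000'))  # prints 212
```

The point of standardizing such helpers is that every model in the organization normalizes phone numbers the same way, rather than each project reinventing a slightly different regex.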

15 minutes to build a framework for deep learning

A basic knowledge of Python is all you need to get started. If your machine refuses to learn, you can spend the remaining time pondering the intricacies of machine learning.

  1. Create a new app (3 minutes)
  2. Design a model (1 minute)
  3. Generate a scaffold (2 minutes)
  4. Implement a pipeline (5 minutes)
  5. Test the code (1 minute)
  6. Train the model (1 minute)
  7. Deploy to production (2 minutes)

Create a new application

Lore manages each project’s dependencies independently, to avoid conflicts with your system Python or other projects. Install Lore as a standard pip package:

# On Linux
$ pip install lore
# On OS X use homebrew python 2 or 3
$ brew install python3 && pip3 install lore

It’s hard to reproduce someone else’s work when you can’t recreate their environment. Lore preserves your system’s Python the way it found it, to prevent dependency errors and conflicts between projects. Each Lore application has its own directory, with its own Python installation, and locks its required dependencies to specific versions in runtime.txt and requirements.txt. This makes sharing Lore applications efficient, and takes our machine learning projects one step closer to being reproducible.
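As a concrete illustration, the pinned files might look like the following. The Python version matches the init command later in this article, but the package list and versions here are hypothetical; the output of lore install for your own app is authoritative:

```
# runtime.txt
python-3.6.4

# requirements.txt (versions are illustrative, not a real lock list)
Keras==2.0.9
pandas==0.21.0
lore==0.5.0
```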

Once installed, you can create a new application for deep learning projects. Lore is modular by default, so we need to specify Keras to install the deep learning dependencies for this project.

$ lore init my_app --python-version=3.6.4 --keras

Design the model

We’ll show you how to build a model that predicts how popular an item will be on Instacart based solely on the name and category it belongs to. Manufacturers around the world test names for various categories of products, while retailers optimize products to maximize their appeal. Our simple AI will provide the same service.

One of the hardest challenges in machine learning is getting good data. Fortunately, Instacart has released 3 million anonymized grocery orders for exactly this task. We can then frame the problem as a supervised regression model that predicts annual sales from two features: product name and department.

Please note that the model we will build is for demonstration purposes only and has no practical use. Building a good model is left as an exercise for the curious reader.

Generate a scaffold

$ cd my_app
$ lore generate scaffold product_popularity --keras --regression --holdout

Each Lore model includes a pipeline for loading and encoding data, as well as an estimator that implements a specific machine learning algorithm. The interesting part of this model is in the implementation details of the generated classes.

The whole process starts with raw data on the left, which is encoded into the form required on the right. The estimator is then trained on the encoded data, with early stopping on the validation set, and evaluated on the test set. Everything can be serialized to the model store, and then loaded again to repeat the cycle with new data.

Schematic diagram of model working principle

Implement the pipeline

Raw data is rarely in a form suitable for a machine learning algorithm. Usually we load it from a database or download a CSV file, encode it appropriately for the algorithm, and split it into training and test sets. Lore’s base pipeline classes implement this logic in a standardized way.

lore.pipelines.holdout.Base splits our data into training, validation, and test sets, and encodes it for our machine learning algorithm. Our subclass is responsible for defining three methods: get_data, get_encoders, and get_output_encoder.

Instacart publishes data in multiple CSV files.

The pipeline’s get_data will download the raw Instacart data and use pandas to aggregate it into a DataFrame with (product_name, department) pairs and their corresponding total sales:

Here is the implementation of get_data:
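The original post shows this implementation as an image. The aggregation it performs can be sketched in plain pandas as follows. Column names follow the public Instacart CSVs; the download step and Lore’s pipeline plumbing are omitted, and the three DataFrames are passed in as arguments for clarity, whereas the real pipeline method loads them itself:

```python
import pandas as pd

def get_data(order_products, products, departments):
    """Aggregate Instacart order lines into total sales per
    (product_name, department) pair. Standalone sketch only.
    """
    data = (order_products
            .merge(products, on='product_id')
            .merge(departments, on='department_id'))
    # one row per ordered item, so group size == units sold
    sales = (data.groupby(['product_name', 'department'])
                 .size()
                 .reset_index(name='sales'))
    return sales
```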

Next, we need to specify an encoder for each column. A computer scientist might think of encoders as a form of type annotation for effective machine learning. Some products have absurdly long names, so we keep only the first 15 words.
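The truncation idea can be sketched in plain Python. This is a standalone illustration of taking the first 15 word tokens of a name, not Lore’s actual encoder API:

```python
def tokenize(name, sequence_length=15):
    """Split a product name into lowercase word tokens, keeping at most
    sequence_length of them -- the same idea as encoding a name with a
    fixed sequence length. Illustrative sketch only.
    """
    return name.lower().split()[:sequence_length]

tokens = tokenize('Organic Unsweetened Vanilla Almond Milk')
# ['organic', 'unsweetened', 'vanilla', 'almond', 'milk']
```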

That’s all it takes. Our estimator starts life as a simple subclass of lore.estimators.keras.Regression, which implements a classic deep learning architecture with reasonable defaults:

# my_app/estimators/product_popularity.py

import lore.estimators.keras


class Keras(lore.estimators.keras.Regression):
    pass

Finally, our model specifies the high-level properties of its deep learning architecture by delegating them to the estimator, and pulls its data from the pipeline we built.


Test the code

When the scaffold is generated, a smoke test for the model is created automatically. The first run takes a little longer because it downloads the 200MB test data set. It’s good practice to trim the files cached in ./tests/data and check them into your repo to remove the network dependency and speed up test runs.

$ lore test tests.unit.test_product_popularity

Train the model

Training the model caches its data in ./data and saves its artifacts in ./models.

$ lore fit my_app.models.product_popularity.Keras --test --score

Watch Lore’s timing statements in the logs from a second terminal:

$ tail -f logs/development.log

Try adding more hidden layers to see if it helps your model’s score. You can edit the model file, or pass the attribute directly to fit on the command line with --hidden_layers=5. With the data set already cached, this should take about 30 seconds.


Examine the model’s features

You can run Jupyter notebooks in your Lore environment. Lore installs a custom Jupyter kernel that references your app’s virtualenv for both lore notebook and lore console.

Open notebooks/product_popularity/features.ipynb and “Run All” to see visualizations of the model’s last fit.

The “produce” department is encoded as “20”

As you can see, the model’s predictions match the test set (gold) well for a particular feature. In this case, the overlap is quite good across 21 departments, except for “produce”, where the model doesn’t fully account for how many outliers there are.

You can also see the deep learning architecture that was generated in notebooks/product_popularity/architecture.ipynb.


The 15 tokens of the name run through an LSTM on the left, the department name goes into an embedding on the right, and then both pass through the hidden layers.

Serve the model

A Lore app can serve its models over a local HTTP API. By default, models expose their “predict” method via an HTTP GET endpoint.

My results suggest that adding “Organic” to “Banana” triples predicted sales in the “produce” department, and that “Green Banana” is predicted to sell worse than “Brown Banana”.
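Comparing such predictions is just a matter of changing the query string. A sketch of how these prediction URLs can be built with the Python standard library follows; the endpoint path matches the curl example earlier in this article, and the helper name is hypothetical. Note that spaces in product names must be percent-encoded:

```python
from urllib.parse import urlencode

def predict_url(product_name, department, host='http://localhost:5000'):
    """Build a GET URL for the model's predict.json endpoint.

    Client-side URL construction only; the server side is provided
    by `lore server`.
    """
    query = urlencode({'product_name': product_name, 'department': department})
    return '%s/product_popularity.Keras/predict.json?%s' % (host, query)

print(predict_url('Banana', 'produce'))
print(predict_url('Organic Banana', 'produce'))  # note the encoded space
```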

Deploy to production

Lore can be deployed to any infrastructure that supports Heroku buildpacks. Buildpacks read the specifications in runtime.txt and requirements.txt to install the app. If you want to scale horizontally in the cloud, follow Heroku’s getting started guide.

You can see the results of each lore fit run in

./models/my_app.models.product_popularity/Keras/

This directory and ./data/ are in .gitignore by default, because your code can rebuild them at any time. A simple deployment strategy is to check in the model version you want to publish:

$ git init .
$ git add .
$ git add -f models/my_app.models.product_popularity/Keras/1 # or your preferred fitting number to deploy
$ git commit -m "My first lore app!"

Heroku makes it easy to publish applications. Here is their getting started guide:

Devcenter.heroku.com/articles/ge…

This is the TLDR:

$ heroku login
$ heroku create
$ heroku config:set LORE_PROJECT=my_app
$ heroku config:set LORE_ENV=production
$ git push heroku master
$ heroku open
$ curl "`heroku info -s | grep web_url | cut -d= -f2`product_popularity.Keras/predict.json?product_name=Banana&department=produce"


The next step

We feel that version 0.5 of the framework provides a solid foundation for the community to build toward version 1.0 together. Patch releases will avoid breaking changes, but minor versions may change functionality to meet community needs, deprecating old behavior before removing it.

Here are some features we plan to add before 1.0:

  • Web UI with visual model/estimator/feature analysis
  • Integrated support for distributed computing during model training and data processing, i.e., job queuing
  • Tests for bad data or architectures, not just broken code
  • More documentation, estimators, encoders and converters
  • Full Windows support

Thanks to Jeremy Stanley, Emmanuel Turlay and Shrikar Archak for their contributions to the code.

Original link:

Tech.instacart.com/how-to-buil…

For more content, you can follow AI Front, ID: AI-front, reply “AI”, “TF”, “big Data” to get AI Front series PDF mini-book and skill Map.