DeepDetect is an open source API and server designed for deep learning. Its API is simple, intuitive, generic, and easy to extend.

With the help of other contributors, the author has integrated Caffe, XGBoost, and TensorFlow (integration still in progress) without making any changes to the original server or API.

XGBoost provides gradient boosted trees, which are commonly used alongside deep models. TensorFlow supports distributed training over both models and data, and has good support for LSTMs and other RNNs. Caffe specializes in image and text data. DeepDetect lets you move freely between these libraries.

Here are the main principles for implementing a common deep learning API. We also look forward to your contributions and comments to improve DeepDetect.

  • Startups want to build a deep learning SaaS API that can be validated, extended, and then quickly brought to market and productized.
  • Enterprises expect seamless integration with existing systems. Data flows in slowly at first, models must be refined as the data grows, and the same technology should be reusable in other projects or departments.

One open source project that meets both sets of requirements is the search engine Elasticsearch: it is extensible, has a clear REST-style API, and uses JSON for all input and output data structures.

So how do you build a service that integrates deep learning libraries behind one API? Here are a few principles:

  • No rewriting: Deep learning (and machine learning in general) is like cryptography: an algorithm should only be implemented once. Avoiding rewrites is essential given that several deep learning libraries already exist;
  • Seamless transition: Having the same environment for development and production speeds up testing and release cycles and avoids bugs;
  • Simplified commands: A simple, user-friendly input/output format, such as JSON. Simplicity is king;
  • Productization: Over its lifecycle, a production machine learning service spends far more time predicting from data than training models.

A universal machine learning service that combined these points behind a simple and powerful API would satisfy both developer and enterprise needs and move seamlessly between development and production. It would use a JSON data format, unify deep learning and machine learning libraries under a single framework and API, and hide the internal complexity of the individual libraries.

The core of the DeepDetect machine learning API consists of resources and data input/output formats. Here, resources refer to server resources, not machine learning services. The reason for this design is that the hardware behind machine learning services, such as GPUs and embedded devices, is scarce. The core resources are:

  • Server information: retrieve server information with GET on /info;
  • Machine learning service management: manage machine learning services with PUT (create a service), GET (get the status of a service), and POST (update a service) on /services;
  • Model training: train models with POST (create a new training job), GET (get the status of a training job), and DELETE (cancel a training job) on /train;
  • Data prediction: predict from data with POST (send data to a service) on /predict.
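
The resource list above can be sketched as a small lookup table plus a request builder. This is only an illustration: the helper name `build_request` and the `RESOURCES` table are hypothetical, and the base URL assumes a server on localhost:8080 as in the curl example in this article.

```python
import json
import urllib.request

# Assumption: a DeepDetect server listening on localhost:8080.
BASE_URL = "http://localhost:8080"

# Resource table taken from the list above: (HTTP verb, resource) -> purpose.
RESOURCES = {
    ("GET", "/info"): "server information",
    ("PUT", "/services"): "create a machine learning service",
    ("GET", "/services"): "get the status of a service",
    ("POST", "/services"): "update a service",
    ("POST", "/train"): "create a new training job",
    ("GET", "/train"): "get the status of a training job",
    ("DELETE", "/train"): "cancel a training job",
    ("POST", "/predict"): "send data to a service",
}


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for one of the resources."""
    # "/services/imageserv" belongs to the "/services" resource.
    resource = "/" + path.strip("/").split("/")[0]
    if (method, resource) not in RESOURCES:
        raise ValueError(f"unsupported call: {method} {path}")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request
```

A request built this way can be sent with `urllib.request.urlopen(build_request("GET", "/info"))` once a server is running.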

So the resources cover machine learning services, model training, and data prediction, where training and prediction are the two main operations on statistical models. At this level there is no difference between supervised and unsupervised learning services.

The main parameter groups of a machine learning service are input (or preprocessing), statistical learning, and final output. They map to three names: input, mllib, and output. mllib specifies a supported machine learning library; input and output are self-explanatory. Here is an example of creating an image classification service:

PUT /services/imageserv
{
  "description": "image classification service",
  "mllib": "caffe",
  "model": {
    "repository": "/path/to/models/imgnet",
    "templates": "../templates/caffe/"
  },
  "parameters": {
    "input": {"connector": "image"},
    "mllib": {"nclasses": 1000, "template": "googlenet"},
    "output": {}
  },
  "type": "supervised"
}
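
As a rough sketch, the same service definition could be assembled and wrapped in a PUT request from Python. The helper names `make_service_definition` and `build_create_request` are hypothetical, and the server address is an assumption taken from the curl example in this article.

```python
import json
import urllib.request


def make_service_definition(description, mllib, repository, connector,
                            mllib_params, service_type="supervised",
                            templates=None):
    """Assemble the JSON body of a service creation call."""
    model = {"repository": repository}
    if templates is not None:
        model["templates"] = templates
    return {
        "description": description,
        "mllib": mllib,
        "model": model,
        "parameters": {
            "input": {"connector": connector},
            "mllib": mllib_params,
            "output": {},
        },
        "type": service_type,
    }


def build_create_request(base_url, name, definition):
    """Wrap a definition in a PUT /services/<name> request (not sent here)."""
    return urllib.request.Request(
        f"{base_url}/services/{name}",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


image_service = make_service_definition(
    description="image classification service",
    mllib="caffe",
    repository="/path/to/models/imgnet",
    connector="image",
    mllib_params={"nclasses": 1000, "template": "googlenet"},
    templates="../templates/caffe/",
)
# Assumption: server on localhost:8080.
create_request = build_create_request(
    "http://localhost:8080", "imageserv", image_service)
```

Passing `create_request` to `urllib.request.urlopen` would submit it to a running server.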

Parameters are generally grouped into input, mllib, and output. A service is made supervised or unsupervised by adjusting the output connector. The input connector handles the input format, including CSV, libSVM, text, images, and features. The mllib component specifies the machine learning library used for service creation, training, and prediction, and conveniently exposes each library’s parameters while preserving their original names.

Here is an example of an input connector in CSV format:

"Input" : {" id ":" id ", "label" : "Cover", "the separator" : ", ", "shuffle" : true, "test_split" : 0.1}Copy the code

Here is an output connector for a typical training model:

"Output" : {" measure ": [" ACC", "MCLL", "F1"]}Copy the code

Here is a more complex output connector that uses an output template in Mustache format (with it, the standard JSON output can be converted to any other format):

{
  "network": {"http_method": "POST", "url": "http://localhost:9200/images/img"},
  "template": "{ {{#body}}{{#predictions}} \"uri\": \"{{uri}}\", \"category\": [ {{#classes}} { \"category\": \"{{cat}}\", \"score\": {{prob}} } {{^last}},{{/last}}{{/classes}} ] {{/predictions}}{{/body}} }"
}

With this template, supervised classification results can be fed directly into Elasticsearch and indexed; see www.deepdetect.com/tutorials/e… . Note that the network object, which here POSTs the output to a remote server, can also be used in the input connector to connect to a remote input source.

The template above is applied to the typical JSON output of the DeepDetect server for supervised classification:

"predictions": {"classes": [{"cat": "n03868863 oxygen mask", "prob": 0.24278657138347626}], "loss": 0.0, "uri": "http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg"}
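
To make the effect of the output template concrete, the sketch below performs the same reshaping by hand in Python. It does not use a Mustache engine and is not how the server applies templates; `to_elasticsearch_doc` is a hypothetical helper that only mirrors the field mapping in the template (uri, and cat/prob renamed to category/score).

```python
import json


def to_elasticsearch_doc(prediction):
    """Reshape one prediction into the document layout the template emits."""
    return {
        "uri": prediction["uri"],
        "category": [
            {"category": c["cat"], "score": c["prob"]}
            for c in prediction["classes"]
        ],
    }


# A prediction like the one shown above.
prediction = {
    "classes": [
        {"cat": "n03868863 oxygen mask", "prob": 0.24278657138347626},
    ],
    "loss": 0.0,
    "uri": "http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg",
}

doc = to_elasticsearch_doc(prediction)
print(json.dumps(doc, indent=2))
```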

The above example does not require “glue” code to integrate into an existing project pipeline, which is a good fit for many businesses.

Here’s a quick look at mllib components, including Caffe and XGBoost:

// Caffe
"mllib": {"gpu": true, "net": {"batch_size": 128}, "solver": {"test_interval": 1000, "iterations": 16000, "base_lr": 0.01, "solver_type": "SGD"}}

// XGBoost
"mllib": {"iterations": 100, "objective": "multi:softprob"}

In the Caffe example, the server uses a GPU, and the other parameters configure the network batch size and the solver, including the learning rate. In the XGBoost example, the number of iterations and the objective are set.

The next important part is data prediction. Given the lifecycle of a machine learning service, making predictions from data is the central operation:

curl -X POST 'http://localhost:8080/predict' -d '{"service":"covert","parameters":{"input":{"id":"id","separator":","}},"data":["test.csv"]}'
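
For readers who prefer a script to curl, the same call could look roughly like this in Python, using only the standard library. The function names are hypothetical; the URL, service name, and body are copied from the curl command above.

```python
import json
import urllib.request


def build_predict_request(base_url, service, data, input_params):
    """Build the POST /predict request without sending it."""
    body = {
        "service": service,
        "parameters": {"input": input_params},
        "data": list(data),
    }
    return urllib.request.Request(
        base_url + "/predict",
        data=json.dumps(body).encode("utf-8"),
        # Setting a JSON content type is a choice of this sketch;
        # curl -d does not set one.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, service, data, input_params):
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_predict_request(base_url, service, data, input_params)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request(
    "http://localhost:8080", "covert", ["test.csv"],
    {"id": "id", "separator": ","})
```

Only `build_predict_request` runs here; calling `predict` with the same arguments would contact the server.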

The mllib part is omitted here, but it can be useful in a prediction call, for example when extracting features from a deep network. In that case, as in unsupervised learning, the output is a tensor rather than a class or regression value:

"mllib":{"extract_layer":"pool5/7x7_s1"}

To conclude, the core points of this machine learning API are:

  • Readability: All data structures are simple and user-friendly;
  • Universality: A common API for supervised and unsupervised learning services;
  • REST style and programmable API: The API is available over the network, but the same interface is also available natively in C++;
  • Extensibility: Additional features and resources are easy to add, for example service chains for multiple predictions.

About the author: The author has worked on machine learning, deep learning, reinforcement learning, and Markov decision processes for more than 10 years and knows their joys and sorrows. He has developed his own tools and systems, mostly open source, for industrial-grade applications ranging from NASA’s Mars rover activity models to Airbus’s cyber-security systems to industrial automation systems. A year ago, he turned his focus to commercializing AI toolsets, deep learning, and neural networks. He finds it amazing to see so many great machine learning libraries being open sourced, transparent and friendly to developers, and updated in a timely manner.

Xia Tian focuses on big data, machine learning, and mathematics, and runs the public account Bigdata_ny to share related technical articles.

The original article:
A Machine Learning API to rule them all: Caffe, XGBoost and Tensorflow are in a boat…


Translation:
InfoQ


By Emmanuel Benazera