PredictionIO: An open source recommendation system

PredictionIO

PredictionIO is an open source machine learning server written in Scala that lets you build a recommendation engine and query it through RESTful APIs. Its core uses a scalable machine learning library built on Spark's end-to-end pipeline, which makes it practical to build a recommendation system from scratch.

PredictionIO consists of three components:

PredictionIO platform
Event Server: Collects data from applications, either in real time or in batches.
Engine: Trains models and serves query results through RESTful APIs.

Install

A quick one-click installation method is available, but manual installation is also possible.

$ bash -c "$(curl -s https://install.prediction.io/install.sh)"
$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH

To verify the installation, run the following command. It reports the status of each component and storage backend connection.

$ pio status

### Return:
[INFO] [Console$] Inspecting PredictionIO...
[INFO] [Console$] PredictionIO 0.9.6 is installed at...
[INFO] [Console$] Inspecting Apache Spark...
[INFO] [Console$] Apache Spark is installed at ...
[INFO] [Console$] Apache Spark 1.6.0 detected...
[INFO] [Console$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: MYSQL)...
[INFO] [Storage$] Verifying Model Data Backend (Source: MYSQL)...
[INFO] [Storage$] Verifying Event Data Backend (Source: MYSQL)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [Console$] (sleeping 5 seconds for all messages to show up...)
[INFO] [Console$] Your system is all ready to go.

Quick Start

Step 1. Run PredictionIO

Start PredictionIO first. The command depends on which storage backend you use.

# If you are using PostgreSQL or MySQL, start the PredictionIO Event Server:
$ pio eventserver &

or

# If you are running HBase and Elasticsearch instead, start the Event Server, HBase, and Elasticsearch together:
$ pio-start-all

Step 2. Create a new Engine from an Engine Template

Pick an Engine Template to use as the basis for your engine.

$ pio template get <template-repo-path> <your-app-directory>
$ cd MyRecommendation

You can use one of the published Engine Templates or customize your own. This example uses the Universal Recommender.

Step 3. Generate an App ID and Access Key

Create an app for the engine and retrieve its access key.

$ pio app new MyRecommendation

### Return:
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyRecommendation
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: ...

$ pio app list

### Return:
[INFO] [App$] Name | ID | Access Key | Allowed Event(s)
[INFO] [App$] MyRecommendation | 1 | ... | (all)
[INFO] [App$] Finished listing 1 app(s).

Step 4. Collecting Data

Next, import data. The basic Collaborative Filtering (CF) format consists of user-action-item records. Use data/import_eventserver.py to import the formatted data into the database.

$ curl <sample_data> --create-dirs -o data/<sample_data>
$ python data/import_eventserver.py --access_key <access-key>

...
0::2::3
0::3::1
3::9::4
6::9::1
...
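As a rough illustration of what the import step has to do, the sketch below converts lines in the layout shown above into event dictionaries shaped like the JSON events the Event Server accepts. It assumes the three columns are user, item, and rating; the "rate" event name and the helper names are hypothetical, and the import_eventserver.py shipped with a given template may work differently.

```python
def parse_line(line, sep="::"):
    """Turn one 'user::item::rating' line into an event dictionary (hypothetical helper)."""
    user, item, rating = line.strip().split(sep)
    return {
        "event": "rate",                 # assumed event name
        "entityType": "user",
        "entityId": user,
        "targetEntityType": "item",
        "targetEntityId": item,
        "properties": {"rating": float(rating)},
    }


def parse_lines(lines):
    """Parse every non-empty line, skipping blanks."""
    return [parse_line(line) for line in lines if line.strip()]


# The four sample records from the data excerpt above.
sample = ["0::2::3", "0::3::1", "3::9::4", "6::9::1"]
events = parse_lines(sample)
print(events[0])
```

Keeping parsing separate from sending makes it easy to inspect the events before anything is written to the Event Server.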

Step 5. Deploy the Engine as a Service

Before deploying, set the basic parameters in engine.json, such as appName and the number of iterations the algorithm runs.

...
"datasource": {
  "params" : {
    "appName": "MyRecommendation"   # make sure the appName parameter matches your App Name
  }
},
...
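A mismatch between appName and the app created in Step 3 is easy to miss, so a small pre-flight check can help. The sketch below reads the datasource.params.appName field described above; the function names are hypothetical, and the inline JSON string is a trimmed stand-in for a real engine.json, which has more fields.

```python
import json


def app_name_from_engine_json(text):
    """Return datasource.params.appName from engine.json text, or None if absent."""
    config = json.loads(text)
    return config.get("datasource", {}).get("params", {}).get("appName")


def check_app_name(text, expected):
    """Return True when engine.json points at the expected app."""
    return app_name_from_engine_json(text) == expected


# Trimmed-down stand-in for engine.json (assumption: real files carry more settings).
example = '{"datasource": {"params": {"appName": "MyRecommendation"}}}'
print(check_app_name(example, "MyRecommendation"))
```

In practice you would pass the contents of your own engine.json and the name you gave to pio app new.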

Deploying the engine as a web service takes three steps: pio build -> pio train -> pio deploy. Build prepares the basic Spark environment and files. Train runs the algorithm to fit the model. Deploy serves the results from a web service and exposes them as RESTful APIs.

Build and Train the Predictive Model

$ pio build

### Return:
[INFO] [Console$] Your engine is ready for training.


$ pio train

### Return:
[INFO] [CoreWorkflow$] Training completed successfully.

$ pio deploy

### Return:
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.

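The three commands above depend on each other, so a later step should only run when the previous one succeeded. A minimal sketch of that chaining is shown below. The run_pipeline helper and its runner parameter are hypothetical; the demonstration uses a stand-in runner that pretends pio train failed, so nothing is actually executed.

```python
import subprocess

# The three commands from this step, in the order they must run.
STEPS = [["pio", "build"], ["pio", "train"], ["pio", "deploy"]]


def run_pipeline(steps=STEPS, runner=subprocess.call):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    done = []
    for cmd in steps:
        if runner(cmd) != 0:
            break
        done.append(cmd)
    return done


# Dry run: a stand-in runner records the calls and simulates a failing train step.
calls = []


def fake_runner(cmd):
    calls.append(cmd)
    return 1 if cmd == ["pio", "train"] else 0


completed = run_pipeline(runner=fake_runner)
print("completed:", [" ".join(c) for c in completed])
```

Calling run_pipeline() with the default runner would invoke the real commands from the engine directory. Note that pio deploy is expected to stay in the foreground while it serves queries, so that call would not return until the server stops.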

Step 6. Use the Engine

The deployed engine listens on port 8000 by default. A query specifies the user to recommend for and the number of items to return.

$ curl -H "Content-Type: application/json" \
-d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

### Return:
{
  "itemScores":[
    {"item":"22","score":4.072304374729956},
    {"item":"62","score":4.058482414005789},
    {"item":"75","score":4.046063009943821},
    {"item":"68","score":3.8153661512945325}
  ]
}
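The same query can be issued from application code. The sketch below builds the request with Python's standard library and extracts (item, score) pairs from a response of the shape shown above. It assumes the engine is reachable over plain HTTP on localhost port 8000; the function names are hypothetical, and only the request construction runs here, so no server is needed to try it.

```python
import json
import urllib.request

# Assumed endpoint: the default port from the deploy step, over plain HTTP.
QUERY_URL = "http://localhost:8000/queries.json"


def build_query(user, num, url=QUERY_URL):
    """Build a POST request carrying the same JSON body as the curl command."""
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_items(response_text):
    """Return (item, score) pairs from a response, highest score first."""
    scores = json.loads(response_text)["itemScores"]
    pairs = [(entry["item"], entry["score"]) for entry in scores]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def recommend(user, num):
    """Send the query to a running engine and return the ranked items."""
    with urllib.request.urlopen(build_query(user, num)) as resp:
        return top_items(resp.read().decode("utf-8"))


# Only build the request here; recommend("1", 4) needs a deployed engine.
request = build_query("1", 4)
print(request.full_url, request.data)
```

Separating request building from response parsing keeps both pieces testable without a live engine.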

License

This work is by Chang Wei-Yaun (V123582) and is distributed under a Creative Commons Attribution-ShareAlike 3.0 Unported license.