Big data | Building a personalized recommendation engine system

What is RecEng:

Recommendation Engine (RecEng) is a recommendation service framework built on the Alibaba Cloud computing environment. It predicts users' preferences for items in real time, lets you customize recommendation algorithms, and supports A/B tests for comparing their effect. The Alibaba Cloud recommendation engine is data-driven and uses artificial intelligence to achieve one-to-one marketing, so that you can provide customized services for your customers and innovate quickly. At the same time, it can reduce your operating costs, improve customer satisfaction and loyalty, and help the enterprise meet its business objectives.

Course details:

This certification course explains the concepts, applications, and algorithm principles of recommendation systems, and introduces Alibaba Cloud's recommendation engine product, RecEng, in detail. Finally, students build a recommendation system by hand in a small project.

The project is divided into four parts: data upload, data preprocessing, recommendation system configuration, and testing and going online. Students can refer to this experiment and apply what they have learned to their own business and needs.

Through this case, students come to understand the concepts, applications, and algorithm principles of recommendation systems, as well as how to use RecEng. After the hands-on practice, students can use RecEng independently to quickly build an enterprise recommendation system.

Basic concepts of the recommendation engine:

Customer/Tenant (org/tenant)

Refers to RecEng users, represented in the system by their Alibaba Cloud accounts. A customer is usually an organization, so org is often used in RecEng to denote a customer.

User

Refers to the customer's own users, that is, the end users of a RecEng customer. Recommendation is a 2C (to-consumer) service, so customers who use the recommendation service must have users of their own. These end users are referred to simply as “users”, and user is the name used for them in the system.

Item

Refers to the content recommended to users, which can be goods, songs, videos, and so on. Item is the name commonly used for it in the system.

Business (BIZ)

A business defines a data set, that is, the range of data that the algorithms can use. A customer can have multiple businesses on RecEng, and different businesses must have different data sets. RecEng asks each business to provide four types of data (not necessarily all of them): user data, item data, user behavior data, and recommendation effect data. Each set of such data constitutes a business. Biz is the name commonly used in the system.

For example, customer A recommends two kinds of items, videos and songs, so customer A can create two businesses, M and N, on RecEng. M's item data is the videos, N's item data is the songs, and the other data (user data, user behavior data, and so on) can be the same. In this setup, the data of businesses M and N are independent: although business M can see a user's behavior on songs, business M does not contain the item data for songs, so that behavior is discarded. If a user acts only on songs and never on videos, business M discards that user as well. The same holds, in reverse, for business N.
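The discard rule in this example can be sketched in a few lines of Python. This only illustrates the rule as described here and is not RecEng's implementation; the function name build_business_dataset and the record layout (pairs of user ID and item ID) are assumptions made for the sketch.

```python
def build_business_dataset(item_ids, behaviors):
    """Keep only the behaviors and users that one business can use.

    item_ids:  set of item IDs in this business's item data (hypothetical layout)
    behaviors: list of (user_id, item_id) records shared by all businesses
    """
    # Behaviors on items outside this business's item data are discarded.
    kept = [(u, i) for (u, i) in behaviors if i in item_ids]
    # Users with no remaining behavior are discarded as well.
    users = {u for (u, _) in kept}
    return kept, users


# Shared behavior log of customer A: v* are videos, s* are songs.
behaviors = [("u1", "v1"), ("u1", "s1"), ("u2", "s2"), ("u3", "v2")]

# Business M holds the videos, business N holds the songs.
m_behaviors, m_users = build_business_dataset({"v1", "v2"}, behaviors)
n_behaviors, n_users = build_business_dataset({"s1", "s2"}, behaviors)
```

In this sample, u2 acted only on songs, so business M drops u2; u3 acted only on videos, so business N drops u3; u1 appears in both businesses.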

It is best for a business to recommend only one category of items. Support for recommending multiple types of items will come in later industry templates, which introduce the concept of a plate: one set of business data can generate data sets for multiple plates, and a scenario can be bound to one plate for recommendation algorithm computation.

Scenario (SCN)

A scenario is the context in which a recommendation is made. Each scenario exposes one API, and a scenario is determined by the parameters available at recommendation time. The two most common scenarios are home page recommendation and detail page recommendation. As the names imply, only user information is available for a home page recommendation, whereas a detail page recommendation can use both the user information and the information of the item shown on the current detail page. SCN is commonly used in the system to represent scenarios.

A business can contain multiple scenarios. It is even perfectly acceptable for business A to contain multiple home page scenarios.

In fact, going back to the original definition, a scenario is determined only by the context of the recommendation. Customers are free to create new scenarios according to their own needs, such as a recommendation scenario for search keywords, where the available parameters include not only the user information but also the keywords entered by the user.
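One way to picture "a scenario is determined by its available parameters" is the small Python model below. It is a sketch for illustration only: the Scenario class, its fields, and the parameter names user_id, item_id, and keyword are assumptions, not part of the RecEng API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A recommendation context, identified by the parameters it needs."""
    name: str
    required_params: frozenset

    def accepts(self, request: dict) -> bool:
        # A request fits the scenario when every required parameter is present.
        return self.required_params.issubset(request)


home_page = Scenario("home_page", frozenset({"user_id"}))
detail_page = Scenario("detail_page", frozenset({"user_id", "item_id"}))
search = Scenario("search", frozenset({"user_id", "keyword"}))

# A request made from a detail page carries the user and the displayed item.
request = {"user_id": "u1", "item_id": "v1"}
matching = [s.name for s in (home_page, detail_page, search) if s.accepts(request)]
```

The search scenario does not match this request because no keyword is available, which is exactly what separates it from the other two contexts.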

Flow

A flow (algorithm process) is an end-to-end data processing pipeline. Some processes belong to a business, such as the data import process, the effect calculation process, and the data quality calculation process; others belong to a scenario, such as the scenario algorithm process. According to data source type and output, processes are divided into offline processes, near-line processes, and online processes.

1. Offline processes

Generally, the input and output of offline processes are MaxCompute (formerly ODPS) tables, so the offline data specification is really a format specification for a group of MaxCompute tables, covering access data, intermediate data, and output data. Access data is the user, item, and log data that customers provide offline. Intermediate data is the set of intermediate result tables generated during the offline algorithm process. Output data is the recommendation result table, which is imported into online storage for use by the online computing module.

2. Near-line processes

The near-line process of the recommendation engine mainly handles updating the offline recommendation results when user behavior changes or recommended items are updated. Unlike offline algorithms, which naturally take MaxCompute (formerly ODPS) tables as input and output, a near-line program can take its input from multiple data sources, such as online Table Store (formerly OTS), user API requests, or variables in the program. Its output can be a program variable, a write back to online storage, or a response returned to the user. For security reasons, direct access to Table Store is not allowed. Instead, the recommendation engine provides a set of SDKs that custom online code uses to read and write Table Store, so you need to define the alias and format of each type of online storage. For frequently used online data, whether it comes from online storage or from user API requests, RecEng reads it in advance and stores it in variables of the online application, which custom code can read and write directly.

3. Online processes

The online process of the recommendation engine runs when the recommendation API receives a request. In real time, it filters, reweights, and complements the recommendation results produced by the offline process and corrected by the near-line process. Near-line correction mainly handles updating the offline recommendation results when user behavior changes or recommended items are updated.
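The three online steps (filtering, reweighting, complementing) can be sketched as follows. This is a minimal illustration under stated assumptions, not RecEng code: the function online_process, its parameters, multiplicative boosts as the reweighting method, and padding from a fallback list as the complementing method are all choices made for the example.

```python
def online_process(candidates, filter_set, boosts, fallback, n):
    """Filter, reweight, and complement a candidate list at request time.

    candidates: list of (item_id, score) from offline/near-line results
    filter_set: item IDs that must not be shown (e.g. already purchased)
    boosts:     item_id -> multiplicative weight applied to the score
    fallback:   item IDs (e.g. popular items) used to pad a short result
    n:          number of items to return
    """
    # 1. Filtering: drop items that are in the filter set.
    kept = [(i, s) for (i, s) in candidates if i not in filter_set]
    # 2. Reweighting: adjust the scores, then rank by the new score.
    ranked = sorted(((i, s * boosts.get(i, 1.0)) for (i, s) in kept),
                    key=lambda pair: pair[1], reverse=True)
    result = [i for (i, _) in ranked[:n]]
    # 3. Complementing: pad with fallback items when too few remain.
    for i in fallback:
        if len(result) >= n:
            break
        if i not in result and i not in filter_set:
            result.append(i)
    return result


recs = online_process(
    candidates=[("a", 0.9), ("b", 0.8), ("c", 0.5)],
    filter_set={"a"},
    boosts={"c": 2.0},
    fallback=["a", "b", "hot1", "hot2"],
    n=3,
)
```

In this call, item a is filtered out, the boost moves c ahead of b, and one fallback item fills the third slot.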

A scenario contains only one offline process and one near-line process, but it can contain multiple online processes in order to support A/B testing.
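How traffic is split between online processes is not described here, so the following is only one common approach, shown as a hedged sketch: hash the user ID to a stable bucket and map buckets to online processes by traffic share. The function pick_online_process and the process names online_A and online_B are hypothetical.

```python
import hashlib


def pick_online_process(user_id, processes):
    """Deterministically map a user to one online process.

    processes: list of (name, traffic_share) pairs whose shares sum to 1.0
    """
    # Hash the user ID to a stable point in [0, 1).
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    point = int(digest, 16) / 16 ** len(digest)
    # Walk the cumulative traffic shares until the point falls in a bucket.
    cumulative = 0.0
    for name, share in processes:
        cumulative += share
        if point < cumulative:
            return name
    return processes[-1][0]  # guard against floating-point rounding


processes = [("online_A", 0.5), ("online_B", 0.5)]
assignment = {u: pick_online_process(u, processes) for u in ("u1", "u2", "u3")}
```

Because the assignment depends only on the user ID, the same user always reaches the same online process, which keeps the two groups comparable over the course of a test.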

Algorithm Strategy

An algorithm strategy defines a set of offline/near-line processes and exposes the relevant algorithm parameters, so that customers can build their own algorithm flows. A scenario can be configured with multiple algorithm strategies, which are eventually combined and executed to produce a series of recommendation candidate sets and filter sets. Online processes can reference these candidate sets to complete personalized recommendations.
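The idea of several algorithm strategies (policies) being combined into named candidate sets and filter sets can be sketched like this. Everything in the block is hypothetical, including the strategy functions, the set names hot and watched, and the merge-by-name rule; it only illustrates the relationship described above.

```python
def popular_items(ctx):
    # Hypothetical strategy: contributes a candidate set named "hot".
    return {"candidates": {"hot": ["v1", "v2", "v3"]}, "filters": {}}


def already_watched(ctx):
    # Hypothetical strategy: contributes a filter set of items the user has seen.
    return {"candidates": {}, "filters": {"watched": set(ctx["history"])}}


def run_strategies(strategies, ctx):
    """Run every configured strategy and merge the outputs by set name."""
    candidates, filters = {}, {}
    for strategy in strategies:
        out = strategy(ctx)
        candidates.update(out["candidates"])
        filters.update(out["filters"])
    return candidates, filters


candidates, filters = run_strategies([popular_items, already_watched],
                                     {"history": ["v2"]})
# An online process can then reference the sets by name.
recommended = [i for i in candidates["hot"] if i not in filters["watched"]]
```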

Job/Task (Task)

A job is a running instance of an offline process. The relationship between a job and an offline process is exactly the same as that between an operating-system process and a program. Each job is non-reentrant, meaning that only one instance of each offline process may run at a time. Jobs have upstream and downstream dependencies: if an upstream job fails, its downstream jobs are canceled.
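The two job rules (one instance at a time, and cancellation after an upstream failure) can be sketched with a toy scheduler. This is an illustration under assumptions, not how RecEng schedules jobs: the run_jobs function, the status strings, and the job names are invented for the example, and jobs are simply run one after another in dependency order.

```python
def run_jobs(jobs, upstream, running):
    """Run jobs in order and return a status per job.

    jobs:     dict of job name -> callable, listed in dependency order
    upstream: dict of job name -> list of upstream job names
    running:  set of offline process names that already have a live instance
    """
    status = {}
    for name, action in jobs.items():
        if name in running:
            # Non-reentrant: a second instance of the same process is refused.
            status[name] = "rejected"
            continue
        if any(status.get(up) != "success" for up in upstream.get(name, [])):
            # Simplification: any upstream job that did not succeed
            # (failed, canceled, or rejected) cancels this job.
            status[name] = "canceled"
            continue
        running.add(name)
        try:
            action()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
        finally:
            running.discard(name)
    return status


def fail():
    raise RuntimeError("import failed")


status = run_jobs(
    jobs={"import": fail, "train": lambda: None, "export": lambda: None},
    upstream={"train": ["import"], "export": ["train"]},
    running=set(),
)
```

Here the import job fails, so train is canceled, and export is canceled in turn because its upstream job never succeeded.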
