By Han Hongying and Ein

On the origin of recommender systems: the game between people and information

Let me start with the recommendation system as I understand it.

When it comes to defining a recommendation system, every book and article offers a different definition. Collaborative filtering has been around since 1992, and over the past 30 years countless experts have analyzed the origins and significance of personalized recommendation. The world doesn’t need one more person’s opinion. But when everyone agrees something is true, it is worth asking why it is true.

If you ask me what a recommendation system is, I’ll tell you: it is the precise distribution of information to people. So why was the recommendation system born in this era? The ancients did not need precise information distribution: information traveled slowly by cart, and the learning of a lifetime, once said to fill “five cartloads of books,” now fits in a single bag. Only now do people need precise information distribution. There is too much information and too little time, and the clutter itself competes for attention. Hence we need an intelligent system to filter information for you.

Of course, just as Rome wasn’t built in a day, the bridge across the Internet had to evolve. It began as a small wooden bridge, the portal, which distributed information through categorized navigation. It then evolved into a stone bridge, the search engine, which let people find information more precisely. As information kept multiplying, both sides came to suffer: information consumers could not find the information they needed, and information producers could not get their information in front of consumers. Where there is pain there is demand, and where there is demand there is a product, so the arrival of the recommendation system as a product was both timely and inevitable. Kevin Kelly, in The Inevitable, calls this trend “filtering”:

Filtering is necessary because we are constantly making new things. And one of the first things that we’re going to make is new ways to filter information and personalize to highlight our differences.

The recommendation system is neither the beginning nor the end of how people deal with information, but it is the best practice we currently have.

How the recommendation system meets these requirements

The recommendation system should be considered a product in its own right. What is it, then? As an information-processing product, it must meet the needs of both ends of the information supply-demand chain to have value.

A recommendation system, therefore, should define itself as an intermediary: both C-end users and the product side are its users, and the needs of both ends must be satisfied. This calls not only for a technical solution, but also for thinking about how to serve both ends better. Users simply need help finding information precisely; the product side needs to work out what it wants to gain through the recommendation system.

For the client side (the information demand side), the most urgent need is accurate help in finding the information they need.

For the company (the information supply side), the system must serve certain commercial needs, such as attracting users, increasing user stickiness, and improving conversion rates. For example, news-feed and short-video platforms want to raise user activity and extend retention time, while e-commerce platforms want to improve purchase conversion rates.

General architecture of a recommendation system

It can be seen from the figure above that a complete recommendation system includes a data part and a model part. The data part relies mainly on an offline or online big-data processing platform; its main work includes data collection and ETL processing to generate the feature data required by the recommendation model.

The model part is the main body of the recommendation system: it must train the model before serving it, then process the input data and produce the final output through different recall and ranking strategies. The model part of a typical industrial-grade recommendation system includes a recall layer, a filter layer, and a ranking layer; a supplementary strategy-and-algorithm layer can be added depending on business needs.

1. The “recall layer” generally uses efficient recall rules, algorithms, or simple models to quickly recall items a user may be interested in from the very large candidate set.

2. The “filter layer” filters the recalled data according to the business requirements of the specific scenario.

3. The “ranking layer” uses a ranking model to finely rank the coarsely screened candidate set.

4. The “supplementary strategy and algorithm layer”, also known as the “re-ranking layer”, makes final adjustments to the recommendation list before it is returned to the user, combining supplementary strategies and algorithms to balance metrics such as diversity, popularity, and freshness; the result is the recommendation list the user actually sees.

Overview and comparison of common models of recommender systems

Let’s start with a timeline of the development of recommendation algorithms

As the figure shows, 2016 was the turning point at which recommendation systems moved from traditional machine learning models to deep learning models. In that year, Microsoft’s Deep Crossing, Google’s Wide&Deep, and a wave of other excellent deep learning recommendation models such as FNN and PNN were released in quick succession, and deep learning gradually became the mainstream of recommendation systems. But traditional recommendation models still deserve serious attention. First, they are the foundation of the deep learning models, and many of their ideas carry over: the latent-vector idea of matrix factorization lives on in Embedding, FM’s core idea of feature crossing is still used in deep learning, and logistic regression can be seen as a neuron in another form. Second, these algorithms have low hardware requirements, strong interpretability, and easy training, and they still suit a great many scenarios.

Evolution of machine learning recommendation models


Collaborative filtering

Collaborative filtering is the most widely applied model in the recommendation field; whenever recommendation systems come up, people think directly of collaborative filtering, whether user-based or item-based. To some extent, matrix factorization can also be understood as a kind of collaborative filtering, while user-based and item-based collaborative filtering belong to neighborhood-based collaborative filtering. On a larger scale still, much of machine learning and most classification algorithms can be understood as branches of collaborative filtering, which can itself be seen as a generalization of the classification problem. It is for this reason that many models suited to classification can, by generalization, also be applied to collaborative filtering.

This section focuses on neighborhood-based collaborative filtering, understood in the broad sense. This kind of collaborative filtering rests on similarity computations between user and user, and between item and item.

User-based collaborative filtering

When a user needs personalized recommendations, we can first find other users similar to them (through interests, hobbies, behavior habits, etc.), then recommend items that those similar users like and the target user has not yet seen.

Steps:

  • Prepare the user vectors; in theory each user gets one vector
  • The dimension of the vector is the number of items; the vector is sparse, and its values can be as simple as 0 or 1
  • Using the user vectors, compute pairwise similarity between users, and set a similarity threshold to retain the most similar users for each user
  • Generate recommendation results for each user
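The steps above can be sketched as follows. This is a toy illustration, not the article’s implementation; all data, names, and thresholds are our own:

```python
import numpy as np

# Rows = users, columns = items; 1 = the user interacted with the item.
interactions = np.array([
    [1, 1, 0, 0, 1],   # user 0
    [1, 1, 0, 0, 0],   # user 1 (similar to user 0)
    [0, 0, 1, 1, 0],   # user 2
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two sparse user vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def recommend(user, n_neighbours=1, threshold=0.1, top_n=1):
    """Recommend items liked by similar users that `user` has not seen."""
    sims = [(v, cosine_sim(interactions[user], interactions[v]))
            for v in range(len(interactions)) if v != user]
    # Keep neighbours above the similarity threshold, most similar first.
    neighbours = sorted((s for s in sims if s[1] >= threshold),
                        key=lambda s: -s[1])[:n_neighbours]
    scores = np.zeros(interactions.shape[1])
    for v, sim in neighbours:
        scores += sim * interactions[v]
    scores[interactions[user] > 0] = 0   # do not re-recommend seen items
    return np.argsort(-scores)[:top_n].tolist()

print(recommend(1))   # user 1 gets item 4, liked by the similar user 0
```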

Item-based collaborative filtering

A simple application scenario of the item-based collaborative filtering algorithm: when a user needs a personalized recommendation, having previously bought Jin Yong’s The Legend of the Condor Heroes, they will be recommended The Return of the Condor Heroes, because many other users bought both books.

Steps:

  • Build the user-item relation matrix, which can record purchase behavior, post-purchase ratings, purchase counts, etc.
  • Compute pairwise item similarity to obtain the similarity matrix
  • Recommendation results take two typical forms: (1) related items recommended for a given item; (2) a personal home page “guess you like”

There are several ways to calculate the similarity:

(1) Cosine similarity measures the angle between user vector i and user vector j. Clearly, the smaller the angle, the greater the cosine similarity and the more similar the two users.

(2) The Pearson correlation coefficient. Compared with cosine similarity, the Pearson correlation coefficient corrects each rating by the user’s average rating, reducing the impact of user rating bias.
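A small sketch contrasting the two measures (toy ratings of our own). User b rates everything higher than user a, but their relative preferences are identical; mean-centering removes that per-user bias:

```python
import numpy as np

# Two users with identical preferences but a constant rating offset.
a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 4.0, 5.0])

def cosine(x, y):
    """Cosine similarity: dot product over the product of norms."""
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    return cosine(x - x.mean(), y - y.mean())

print(round(cosine(a, b), 3))   # < 1: the rating offset lowers the score
print(round(pearson(a, b), 3))  # 1.0: identical preferences after centering
```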

Matrix factorization

As for matrix factorization, some see it as model-based collaborative filtering, while others see it as an evolution of collaborative filtering. In fact the distinction matters little: matrix factorization adds the concept of latent vectors on top of collaborative filtering.

Matrix factorization can solve some problems the neighborhood model cannot: (1) items are correlated, and the amount of information does not grow linearly with vector dimension; (2) the matrix elements are sparse and the computed results unstable: adding or removing a vector can change the results greatly.

Matrix factorization treats the User matrix and the Item matrix as unknowns and uses them to represent each user’s predicted score for each item. It then learns the User and Item matrices by minimizing the difference between predicted and actual scores. In other words, in Figure 2 only the matrix to the left of the equals sign is known; the User and Item matrices to the right are unknowns learned by matrix factorization through minimizing the difference between predicted and actual scores.

The user behavior data used in matrix factorization divides into explicit data and implicit data. Explicit data means explicit user ratings of items, such as ratings of a movie or a product, usually on a 5-point or 10-point scale. Implicit data means the user’s browsing, clicking, buying, favoriting, liking, commenting, and sharing of items; its characteristic is that the user gives no explicit rating, and interest in an item is reflected in the intensity of these behaviors. Here we focus on implicit data.

The objective function learns all user and item vectors by minimizing the sum of squared residuals between the predicted rating r̂ui and the actual rating rui.

Explicit matrix objective function

Implicit matrix objective function
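The two captions above refer to formulas shown as images. As a textual stand-in, their standard forms (assuming the usual ALS formulation; the notation below, with user vectors $x_u$, item vectors $y_i$, and regularization weight $\lambda$, is ours) are:

```latex
% Explicit-feedback objective: sum over observed ratings r_{ui}
\min_{x_*,\, y_*} \sum_{(u,i)\ \text{observed}} \bigl( r_{ui} - x_u^{\top} y_i \bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr)

% Implicit-feedback objective: p_{ui} is a 0/1 preference derived from behavior,
% and c_{ui} = 1 + \alpha\, r_{ui} is a confidence weight built from behavior counts
\min_{x_*,\, y_*} \sum_{u,i} c_{ui} \bigl( p_{ui} - x_u^{\top} y_i \bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr)
```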

Solution methods: there is more than one way to perform matrix factorization, including singular value decomposition, gradient descent, and alternating least squares. Here is a simple example using alternating least squares.

ALS (alternating least squares): fix X and optimize Y, then fix Y and optimize X, repeating the process until X and Y converge.

Here is an example with an explicit matrix, alternately fixing Y to optimize X and fixing X to optimize Y:

  • The objective function is decomposed into multiple independent objective functions, one per user. The objective function for user u is:

  • Convert the objective function to matrix form.
  • Take the gradient of the objective function J with respect to xu, and set the gradient to zero.
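Setting the gradient to zero gives the standard closed-form ridge update xu = (YᵀY + λI)⁻¹ Yᵀ ru, and symmetrically for the item vectors. A minimal runnable sketch of this ALS loop (all data and parameter values are toy choices of our own):

```python
import numpy as np

# Toy explicit rating matrix; 0 entries are treated as "unobserved".
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0]])
observed = R > 0
k, lam = 2, 0.1                          # latent dimension, regularization
rng = np.random.default_rng(0)
X = rng.normal(size=(R.shape[0], k))     # user factors
Y = rng.normal(size=(R.shape[1], k))     # item factors

def solve(factors, ratings, mask, lam):
    """One half-step: with `factors` fixed, solve each row in closed form."""
    out = np.zeros((mask.shape[0], factors.shape[1]))
    for u in range(mask.shape[0]):
        idx = mask[u]                    # entries observed for this row
        F = factors[idx]
        A = F.T @ F + lam * np.eye(factors.shape[1])
        out[u] = np.linalg.solve(A, F.T @ ratings[u, idx])
    return out

for _ in range(20):                      # alternate until (roughly) converged
    X = solve(Y, R, observed, lam)
    Y = solve(X, R.T, observed.T, lam)

pred = X @ Y.T
mse = np.mean((pred[observed] - R[observed]) ** 2)
print(round(float(mse), 4))              # small: observed entries fit closely
```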

Logistic regression →POLY2→FM→FFM

First of all, the idea behind logistic regression is clever: it treats the recommendation problem as a classification problem. Why can we say this? Because recommending can be reduced to predicting whether a user will click on an item, i.e., estimating a click-through rate (CTR), which is a binary classification problem.

Through the sigmoid function, logistic regression maps the input feature vector x = (x1, x2, …, xn) to the interval (0, 1), as shown in the figure below:
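In code, the same mapping looks like this (the weights and feature values are made up for illustration):

```python
import math

def sigmoid(z):
    """Map any real number into (0, 1), interpreted as a click probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy example: a weighted sum of features squashed into a CTR estimate.
w = [0.8, -0.5, 0.3]            # "learned" weights (made up here)
x = [1.0, 2.0, 1.0]             # feature vector x = (x1, x2, x3)
z = sum(wi * xi for wi, xi in zip(w, x))
print(round(sigmoid(z), 3))     # a probability strictly between 0 and 1
```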

Logistic regression has many advantages:

  • In mathematical terms, logistic regression assumes the dependent variable y follows a Bernoulli distribution, which fits our understanding of the CTR model well. By contrast, linear regression assumes y follows a Gaussian distribution, which does not fit our understanding of the CTR prediction problem (a click/no-click problem).
  • The simple mathematical form of logistic regression matches human intuition about the prediction process well
  • Engineering is simple: easy parallelization, simple training, low training cost

But there are some drawbacks:

  • Its expressive power is weak: it cannot perform operations such as feature crossing or feature selection, so some information is lost

It is for this reason that the POLY2, FM, and FFM models appeared; I will explain them together:

The POLY2 model: “brute-force” feature combination

This model crosses all features pairwise (features xj1 and xj2) and assigns a weight wh(j1,j2) to every feature combination; it is still essentially a linear model:

FM: feature crossing with latent vectors

The main difference between FM and POLY2 is that the inner product of two latent vectors (wj1 · wj2) replaces the single weight coefficient wh(j1,j2).

  • FM learns a latent weight vector for each feature; when two features cross, the inner product of their latent vectors serves as the weight of the crossed feature.
  • The introduction of latent vectors, echoing matrix factorization, extends the idea from user-item alone to all features.
  • The n² weight parameters of the POLY2 model are reduced to nk (k being the latent-vector dimension).

The advantage of this design is that latent vectors let FM handle data sparsity much better and greatly improve generalization. In engineering terms, FM can also be trained by gradient descent, so it loses nothing in real-time capability and flexibility.
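The pairwise interaction term need not cost O(kn²): by the well-known identity Σᵢ<ⱼ ⟨vᵢ,vⱼ⟩xᵢxⱼ = ½ Σ_f [(Σᵢ v_{i,f} xᵢ)² − Σᵢ v_{i,f}² xᵢ²] it reduces to O(kn). A sketch verifying the two forms agree (toy data of our own):

```python
import numpy as np

# FM second-order term: naive double loop vs. the O(kn) reformulation.
rng = np.random.default_rng(1)
n, k = 5, 3
V = rng.normal(size=(n, k))    # one k-dimensional latent vector per feature
x = rng.normal(size=n)         # feature values

# Naive O(k n^2): sum inner products over all feature pairs i < j.
naive = sum((V[i] @ V[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# O(kn): 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
fast = 0.5 * float(np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2)))

print(np.isclose(naive, fast))   # True: identical up to floating point
```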

FFM: introducing the feature field

The difference between FFM and FM is that the latent vector changes from wj1 to wj1,f2: each feature now corresponds to a set of latent vectors rather than a single one. When feature xj1 crosses feature xj2, xj1 uses, from its set of latent vectors, the vector wj1,f2 corresponding to the field f2 of feature xj2; likewise, xj2 uses the latent vector wj2,f1 corresponding to the field f1 of xj1.
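A sketch of how the per-field lookup changes the pairwise term (all names, field assignments, and numbers are our own toy choices, not from the article):

```python
import numpy as np

# FFM pairwise term: each feature keeps one latent vector *per field*, and
# when crossing, picks the vector matching the other feature's field.
rng = np.random.default_rng(2)
n_features, n_fields, k = 4, 2, 3
field = [0, 0, 1, 1]                             # field of each feature
W = rng.normal(size=(n_features, n_fields, k))   # W[i, f] = vector of i for field f
x = rng.normal(size=n_features)                  # feature values

score = sum((W[i, field[j]] @ W[j, field[i]]) * x[i] * x[j]
            for i in range(n_features) for j in range(i + 1, n_features))
print(float(score))
```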

A visual representation of model evolution

POLY2 model

FM model

FFM model

Comparison of traditional machine learning algorithms

How the big players build recommendation systems

Comparison of big-tech practices

Several typical recommendation system implementations are selected here, covering several typical recommendation scenarios.

Deep learning algorithm comparison

Several large companies have adopted deep learning models; the characteristics, advantages, and disadvantages of these models are surveyed and compared here.

Personalized recommendation for cloud classroom

Feature engineering

We mainly use user behavior data, which in a recommendation system comes in two kinds: explicit feedback and implicit feedback. In the Cloud Classroom scenario, a user’s rating is explicit behavior, while purchasing, studying, and note-taking are all implicit behaviors. Each behavior is given an initial score according to its business importance, generating an initial user-course rating matrix.

The scoring matrix is simply shown as follows:
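As a sketch of how such a matrix could be assembled from behavior logs. The behavior weights here are illustrative placeholders, not the production values:

```python
# Turn implicit behavior logs into an initial user-course score table.
# Weights per behavior are made up for illustration.
weights = {"purchase": 5.0, "learn": 3.0, "note": 2.0}

logs = [  # (user, course, behavior) events, toy data
    ("u1", "c1", "purchase"), ("u1", "c1", "learn"),
    ("u1", "c2", "learn"),
    ("u2", "c1", "note"),
]

scores = {}
for user, course, behavior in logs:
    # Accumulate one weighted score per (user, course) pair.
    scores[(user, course)] = scores.get((user, course), 0.0) + weights[behavior]

print(scores[("u1", "c1")])   # 5.0 (purchase) + 3.0 (learn) = 8.0
```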

Algorithm selection

In the early stage of building the personalized recommendation system, since we were going from 0 to 1, we chose neither a complex deep learning algorithm nor rich user profiles. We wanted to quickly build an MVP version, put it online, and then gradually reflect, optimize, and iterate.

Therefore, in selecting an algorithm, we evaluated the following three schemes:

  • Tag-based matching
  • Collaborative filtering based on users/behavior
  • Collaborative filtering based on matrix factorization

So how do we make these trade-offs?

As for scheme 1, achieving good results hinges on the construction of the tag system: only if the tag system is complete enough will the quality of the recommendations be predictable. In other words, it depends strongly on building out the tag system.

As for scheme 2, its shortcoming is weak handling of sparse matrices. Learning behavior in Cloud Classroom is not high-frequency, and the head effect is obvious, whereas we hoped the personalized recommendation system would surface more possibilities and give exposure to the many courses on the platform that ordinarily get none. Clearly, neighborhood-based collaborative filtering is not a good choice here. The matrix factorization approach strengthens the handling of sparse matrices to some extent, and the latent vectors it introduces can mine more possibilities from user behavior.

We chose an ALS (alternating least squares) matrix factorization model as the first practical algorithm, using the Spark MLlib API.

During construction of the ALS model, the following parameters need tuning for best results:

We tuned these parameters over several rounds, using MSE and RMSE as evaluation metrics.

Mean squared error (MSE) and root mean squared error (RMSE) are often used to measure the quality of regression models. In general, RMSE reflects well the deviation between a regression model’s predictions and the true values. In practice, however, if there are outliers with very large deviations, even a few of them can make both metrics look very bad.
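A quick sketch of both metrics, including the outlier effect described above (toy numbers are ours):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted ratings."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE, in rating units."""
    return float(np.sqrt(mse(y_true, y_pred)))

y_true = [3.0, 4.0, 5.0, 2.0]
y_pred = [2.5, 4.0, 4.5, 2.0]
print(rmse(y_true, y_pred))              # small error on well-behaved data

# A single extreme outlier dominates both metrics:
print(rmse(y_true, [2.5, 4.0, 4.5, 20.0]))
```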

Engineering implementation

For a recommendation system to land in production, the data collection module, ETL module, feature engineering module, recommendation algorithm module, and web service module are all essential. First, the overall architecture diagram:

The following is a brief description of the implementation of several modules:

References

1. Zhe Wang, Deep Learning Recommendation Systems

2. Charu C. Aggarwal, Recommender Systems: The Textbook

-END-