Author: Xu, Novak | Senior Researcher, Tencent Network Operations

Tencent's Shield general-purpose recommendation system typically transforms the recommendation problem into a classification problem, whereas list recommendation scenarios are closer to a ranking problem. This article introduces the Pairwise method, which combines learning-to-rank techniques with recommendation algorithms, along with its concrete implementation, the Pairwise Ranking Factorization Machines (PRFM) algorithm, and shares the application of PRFM to the Mobile QQ anime Feeds stream recommendation scenario, for readers who want to apply this algorithm in other scenarios.

1. Overview

At present, the Shield recommendation algorithms generally formalize the recommendation problem as a binary classification problem. Taking anime recommendation as an example, as shown in the figure below, for a user and the set of items exposed to him, a clicked item is regarded as a positive sample (label = 1) and an unclicked item as a negative sample (label = 0), so that training samples consisting of <user, item> and label can be constructed to train a binary classification model.
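As an illustration, here is a minimal sketch of this Pointwise sample construction; the record format is an assumption made for the example, not the actual Shield data schema:

```python
# Hypothetical sketch of Pointwise sample construction: one <user, item>
# record per exposure, labeled 1 if clicked and 0 otherwise.
def build_pointwise_samples(exposures):
    """exposures: iterable of (user, item, clicked) impression records."""
    return [(user, item, 1 if clicked else 0)
            for user, item, clicked in exposures]
```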


This method is called the Pointwise method: in the parameter training stage, the model only considers the independent score of each <user, item>, and the goal is to make the scores of all positive samples higher than the scores of all negative samples. In the example above, the training objective of the Pointwise method is that the model scores <Little Blue, Fox Spirit Matchmaker>, <Little Blue, Rurouni Kenshin>, and <Little Yellow, Dragon Ball> higher than <Little Blue, Demon God> and <Little Yellow, Detective Conan>.

A little thought reveals the shortcoming of the Pointwise method. For Little Yellow, the model only needs to score <Little Yellow, Dragon Ball> higher than <Little Yellow, Detective Conan>; whether <Little Yellow, Dragon Ball> scores higher than <Little Blue, Demon God> does not matter. In abstract terms, the Pointwise method does not consider the ordering relationship between items for the same user.

The Pairwise method introduced in this article remedies this shortcoming of the Pointwise method. A training sample is a triplet <user, item1, item2>, in which item1 is an item the user clicked and item2 is an item the user did not click. The Pairwise method focuses on whether an item pair <item1, item2> satisfies the ordering relationship, i.e. whether <user, item1> is scored higher than <user, item2>. As shown in the figure below, the objectives of the Pairwise method in the model training stage are: <Little Blue, Fox Spirit Matchmaker> scores higher than <Little Blue, Demon God>; <Little Blue, Rurouni Kenshin> scores higher than <Little Blue, Demon God>; <Little Yellow, Dragon Ball> scores higher than <Little Yellow, Detective Conan>.


On the other hand, in the sample construction of the Pointwise method, we simply regard clicked items as positive samples and unclicked items as negative samples, which implicitly adds an assumption: the user clearly likes the items he clicked and clearly dislikes the items he did not click. For implicit feedback such as clicks, this assumption is too strong. The assumption of the Pairwise method is closer to reality: the user likes the items he clicked more than the ones he did not click.
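To make the contrast with the Pointwise construction concrete, here is a minimal sketch of Pairwise triplet construction, under the same assumed record format as the earlier sketch:

```python
# Hypothetical sketch of Pairwise triplet construction: pair every clicked
# item with every unclicked item from the same user's exposure list.
def build_pairwise_triplets(user, clicked_items, unclicked_items):
    """Returns <user, item1, item2> triplets: item1 clicked, item2 not."""
    return [(user, i1, i2)
            for i1 in clicked_items
            for i2 in unclicked_items]
```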

As a combination of learning-to-rank techniques and recommendation algorithms, the Pairwise method has attracted close attention and research in academia and industry in recent years. The Pairwise Ranking Factorization Machines (PRFM) algorithm introduced in this article is a concrete implementation of the Pairwise method. We have implemented the PRFM algorithm in Shield recommendation and applied it to the Mobile QQ anime home page Feeds stream recommendation scenario. Compared with Pointwise FM, PRFM achieved a relative increase of about 5% in UV click-through rate using the same features.

Note: in the figure above, BlackWidow is the PRFM algorithm and Feeds material cascade _900 is the Pointwise FM algorithm.

2. Pairwise Ranking Factorization Machines (PRFM) algorithm details

Like the Pointwise method, the Pairwise method can use different models to score <user, item>, such as LR, FM, etc. Since the FM model avoids the feature-engineering labor that LR requires, and practice has shown that FM on raw features can achieve better online results than a manually tuned LR (see the Pointwise FM algorithm), we choose FM as the scoring model. For sample construction, many item pairs <item1, item2> can be built for each user, yielding sample instances <user, item1, item2>. If all sample instances were used, the sample size would be too large to train on, so the strategy we adopt is to randomly select 100 item pairs for each user, as sketched below.
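A hedged sketch of this random sampling strategy, building on the triplet construction above (the cap of 100 pairs per user comes from the text):

```python
import random

def sample_random_pairs(clicked_items, unclicked_items, k=100):
    """Randomly keep at most k <item1, item2> pairs for one user."""
    pairs = [(i1, i2) for i1 in clicked_items for i2 in unclicked_items]
    return pairs if len(pairs) <= k else random.sample(pairs, k)
```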

The sample construction and scoring model above constitute the Pairwise Ranking Factorization Machines (PRFM) algorithm. The following are some mathematical details of the PRFM algorithm, which readers who are not interested can skip.

Let $x = (x_1, x_2, \ldots, x_n)$ be a feature vector composed of user features, item features, context features, etc. The FM model scores it as:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j$$

where

$$w_0 \in \mathbb{R}, \qquad w = (w_1, \ldots, w_n) \in \mathbb{R}^n, \qquad V = (v_1, \ldots, v_n)^{\top} \in \mathbb{R}^{n \times k}$$

are the parameters that need training ($k$ is the latent vector dimension, i.e. the factor parameter in Section 3). For a training triplet <user, item1, item2>, let $x^{(1)}$ and $x^{(2)}$ denote the feature vectors of <user, item1> and <user, item2>. The loss function used in parameter training is defined as:

$$L = -\sum_{\text{triplets}} \ln \sigma\!\left(\hat{y}(x^{(1)}) - \hat{y}(x^{(2)})\right) + \lambda \lVert \Theta \rVert^2$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid function, $\Theta$ denotes all model parameters, and $\lambda$ is the regularization coefficient (the reg parameter in Section 3).
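As a readability aid, here is a minimal NumPy sketch of the scoring formula and loss above, using the standard $O(nk)$ rewriting of the FM interaction term; it is an illustration of the math, not the production Shield implementation:

```python
import numpy as np

def fm_score(x, w0, w, V):
    """FM score: w0 + <w, x> + sum over i<j of <v_i, v_j> * x_i * x_j,
    computed via 0.5 * sum_f [ (V.T x)_f^2 - ((V^2).T x^2)_f ]."""
    interaction = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + interaction

def prfm_loss(x1, x2, w0, w, V, reg=1e-4):
    """Pairwise loss for one triplet: -ln sigmoid(score1 - score2) + L2 reg."""
    diff = fm_score(x1, w0, w, V) - fm_score(x2, w0, w, V)
    return np.log1p(np.exp(-diff)) + reg * (np.sum(w ** 2) + np.sum(V ** 2))

# Example initialization: latent factors drawn from N(0, init_std);
# n features, k latent dimensions (the factor parameter in Section 3).
rng = np.random.default_rng(0)
n, k = 8, 4
w0, w = 0.0, np.zeros(n)
V = rng.normal(0.0, 0.005, size=(n, k))
```

Gradients of this loss with respect to $w_0$, $w$, and $V$ can then be plugged into any stochastic gradient method.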

3. Offline evaluation index and parameter tuning of sorting algorithm

Unlike classification problems, which use AUC as the evaluation metric, ranking performance is commonly measured with Precision at K, Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain at K (NDCG@K). Below, we briefly introduce the definition and calculation of these metrics through an example (in the lists of exposed items shown in the figure below, green represents items the user clicked and gray represents items the user did not click).


Precision at K (Prec@K):

The proportion of clicked items among the first K items in the list; the higher the value, the better the recommendation quality. In the example above, Prec@5 = 2/5 = 0.4 for list (a) and Prec@5 = 1/5 = 0.2 for list (b). A Precision at K can be computed for each user, and averaging over all users gives the Precision at K of the user set.
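A short sketch that reproduces the numbers in the example (item identifiers are made up; only the click positions matter):

```python
def precision_at_k(ranked_items, clicked, k):
    """Fraction of the top-k items in the list that the user clicked."""
    return sum(item in clicked for item in ranked_items[:k]) / k

list_a = [f"a{i}" for i in range(1, 11)]  # clicks at positions 2 and 4
list_b = [f"b{i}" for i in range(1, 11)]  # clicks at positions 4 and 7
print(precision_at_k(list_a, {"a2", "a4"}, 5))  # 2/5 = 0.4
print(precision_at_k(list_b, {"b4", "b7"}, 5))  # 1/5 = 0.2
```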

Mean Average Precision (MAP):

MAP can be regarded as an enhanced version of Precision at K: the positions of the clicked items in the list enter the calculation, and the larger the value, the better the recommendation quality. Average Precision (AP) means computing Precision at K once at each position k where the user clicked an item, then averaging. For example, the AP of list (a) is (1/2 + 2/4)/2 = 0.5, and the AP of list (b) is (1/4 + 2/7)/2 ≈ 0.268. Note that from the perspective of Precision at 10 there is no difference between list (a) and list (b), but from the perspective of AP, list (a) is better than list (b). Similarly, an AP can be computed for each user, and averaging over all users gives the MAP of the user set.
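A sketch of the AP computation, again matching the click positions of the example:

```python
def average_precision(ranked_items, clicked):
    """Mean of Prec@k evaluated at each position k where a click occurs."""
    hits, precisions = 0, []
    for rank, item in enumerate(ranked_items, start=1):
        if item in clicked:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

list_a = [f"a{i}" for i in range(1, 11)]
list_b = [f"b{i}" for i in range(1, 11)]
print(average_precision(list_a, {"a2", "a4"}))  # (1/2 + 2/4) / 2 = 0.5
print(average_precision(list_b, {"b4", "b7"}))  # (1/4 + 2/7) / 2 ≈ 0.268
```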

Normalized Discounted Cumulative Gain at K (NDCG@K):

The calculation of NDCG@K is more involved and is not covered in detail here; interested readers can refer to the calculation of NDCG. In short, the higher a clicked item appears in the list, the more valuable it is considered to be, which is consistent in spirit with MAP.
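For completeness, here is a sketch of one common binary-relevance formulation of NDCG@K; the exact gain and discount variant used in practice may differ:

```python
import math

def ndcg_at_k(ranked_items, clicked, k):
    """Binary-relevance NDCG@k: DCG of the list over DCG of the ideal list,
    where a click at rank r contributes 1 / log2(r + 1)."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, item in enumerate(ranked_items[:k], start=1)
              if item in clicked)
    idcg = sum(1.0 / math.log2(r + 1)
               for r in range(1, min(len(clicked), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```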

Having covered the offline evaluation metrics, we can tune the algorithm's parameters. As with the Pointwise FM algorithm, the PRFM algorithm has the following tunable parameters: the standard deviation of the normal distribution used to initialize model parameters (init_std), the regularization coefficient (reg), and the latent vector dimension (factor). The figure below shows a tuning example on the Mobile QQ anime Feeds stream. Based on the tuning results, we chose init_std=0.005, reg=0.0001, factor=100.


Note: the orange boxes indicate the tuned parameter values, and the green boxes indicate the maximum of each metric.
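The tuning shown above can be organized as a simple grid search over the three parameters. The skeleton below is schematic; `train_and_eval` is a hypothetical stand-in for the real train-then-evaluate pipeline, not a Shield API:

```python
from itertools import product

def train_and_eval(init_std, reg, factor):
    """Stand-in: train PRFM with these hyperparameters and return an
    offline ranking metric (e.g. MAP) on a validation set."""
    return 0.0  # replace with the actual training + evaluation pipeline

grid = {"init_std": [0.001, 0.005, 0.01],
        "reg": [1e-5, 1e-4, 1e-3],
        "factor": [50, 100, 200]}
scores = {combo: train_and_eval(*combo)
          for combo in product(grid["init_std"], grid["reg"], grid["factor"])}
best = max(scores, key=scores.get)  # pick the combination with the best metric
```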

4. Algorithm improvement plan

As mentioned above, the sample construction of the PRFM algorithm randomly selects 100 item pairs <item1, item2> for each user. However, studies have shown that different sampling strategies can improve model performance (see Reference 1). For example, for each user, sort the item pairs <item1, item2> by the gap between the positions of item1 and item2 in the exposure list, and take the first 100. The idea behind this sampling strategy is: the larger the position gap, the further below the unclicked item the clicked item was exposed, i.e. the more severely the ordering relationship of <item1, item2> is violated in the current list, so the model needs more training on such pairs. A hedged sketch of this strategy is given below. In the future, we will try different sampling strategies in Shield recommendation and share our experience. Interested students are welcome to explore with us!
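The sketch assumes the exposure list is ordered from top to bottom of the page:

```python
def sample_pairs_by_position_gap(exposure_list, clicked, k=100):
    """Keep the k <item1, item2> pairs (item1 clicked, item2 not) with the
    largest position gap, i.e. where the clicked item was exposed furthest
    below the unclicked item."""
    pos = {item: rank for rank, item in enumerate(exposure_list)}
    pairs = [(i1, i2, pos[i1] - pos[i2])
             for i1 in exposure_list if i1 in clicked
             for i2 in exposure_list if i2 not in clicked]
    pairs.sort(key=lambda p: p[2], reverse=True)  # most mis-ordered first
    return [(i1, i2) for i1, i2, _ in pairs[:k]]
```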

References

  1. PRFM paper (wnzhang.net/papers/lamb…).
  2. Ranking evaluation metrics (spark.apache.org/docs/2.2.0/…).