1. An overview of collaborative filtering

With the explosion of information, one of the challenges people face is how to quickly and easily find what they need most from the vast amount of goods and information on offer. The recommendation system came into being to solve this problem. At the beginning, a recommendation system was just a simple collaborative filtering strategy, but it has since evolved into a complex pipeline of recall, ranking, and re-ranking. Recall sits at the front of the pipeline and determines the ceiling of what ranking can achieve. This article describes the most common collaborative filtering methods. Collaborative filtering is based on the idea of “collective intelligence”: people tend to follow the preferences of the part of the group they have something in common with. For example, if you want to watch a movie but don’t know which one, most people will ask around and tend to take recommendations from people whose tastes are similar to their own. The advantage of collaborative filtering is that it is not domain sensitive, that is, you do not need deep knowledge of the domain being recommended, and the results are often better than content-based approaches. However, collaborative filtering suffers from the “cold start” problem. Several commonly used collaborative filtering algorithms are described below.

2. Neighborhood-based CF

2.1 Principle

Neighborhood-based collaborative filtering is one of the most commonly used recommendation algorithms. Its advantages: low implementation cost, hard to get wrong, fast to compute, good results, and the ability to recommend long-tail items. Its disadvantage: users and items in the tail, with few clicks, are prone to bad cases. Neighborhood methods include item-based and user-based collaborative filtering. “Item-based collaborative filtering” means finding other items similar to item A and recommending them to users who are viewing item A. “User-based collaborative filtering” means finding other users B, C, and D similar to user A; since user A and user B have similar viewing preferences, other items that user B has seen are also likely to be liked by user A. Taking the example from Koren’s paper to illustrate the user-based method: Joe likes three movies, and when recommending movies for Joe we find users A, B, and C who are similar to him. All three of them have seen movie #1, Saving Private Ryan, so this movie is recommended first.

2.2 Calculation Method

The item-based method is used to illustrate how to compute item-item similarity. Suppose user A watches or clicks items a and b, user B clicks items b and d, and user C clicks items a, b, and c.

        item a  item b  item c  item d
user A    1       1
user B            1               1
user C    1       1       1

Expanding each user’s click sequence into item pairs and counting co-occurrences gives the item-item matrix: items a and b co-occur for 2 users, a and c for 1, b and c for 1, b and d for 1, and the remaining pairs for 0. A common way to turn these counts into a similarity is

$$
sim(i, j) = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)| \cdot |N(j)|}}
$$

where $N(i)$ is the set of users who clicked item $i$; for example, $sim(a, b) = 2 / \sqrt{2 \times 3} \approx 0.82$.
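As a concrete illustration, here is a minimal sketch of this computation for the toy click data above. The cosine-style normalization is one common choice; other normalizations are possible.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy click data from the table above: user -> items clicked.
clicks = {
    "user A": ["item a", "item b"],
    "user B": ["item b", "item d"],
    "user C": ["item a", "item b", "item c"],
}

item_count = defaultdict(int)   # |N(i)|: how many users clicked item i
co_count = defaultdict(int)     # |N(i) ∩ N(j)|: users who clicked both i and j

for items in clicks.values():
    for i in items:
        item_count[i] += 1
    # "Expand" each user's click list into unordered item pairs.
    for i, j in combinations(sorted(items), 2):
        co_count[(i, j)] += 1

# Cosine-style similarity: co-occurrence / sqrt(count(i) * count(j)).
similarity = {
    (i, j): c / sqrt(item_count[i] * item_count[j])
    for (i, j), c in co_count.items()
}

for (i, j), s in sorted(similarity.items()):
    print(f"sim({i}, {j}) = {s:.3f}")   # e.g. sim(item a, item b) ≈ 0.816
```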
2.3 Conclusions

  • There will be some “non-user” clicks, for example from crawlers, characterized by an excessive number of clicks; these need to be removed.
  • Although we want every item to be pushed to users at least once, which works well for the long tail, items with very few clicks can also be removed in some cases, since by default they are of low quality.
  • We do not really expand the clicks into vectors; we only count the number of occurrences of item A, the number of occurrences of item B, and the number of co-occurrences of item A and item B. This method is therefore well suited to parallelization.
  • Item-based and user-based collaborative filtering are chosen for different scenarios. For example, on a shopping site such as Amazon, the number of users is much larger than the number of items, user preferences vary greatly, and users buy different things in different periods, so user-user similarity is unstable. The number of items is relatively small compared to the number of users and item-item similarity is stable, so item-based CF is more commonly used.

3. OCF (order-CF)

3.1 Principle

Item-CF computes the similarity between items after collecting the full click information and treats all clicked items equally, without considering the order of the clicks. OCF takes the order of clicks into account. For example, in a news scenario, an article B clicked after article A is considered more relevant to A than an article C clicked before A.

3.2 Calculation Method


symbol         meaning
i, j           items
order(i, j)    within one transaction, counted as 1 if j is clicked after i, otherwise 0
click(i)       total number of clicks on i
N              total number of transactions
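A plain-Python sketch of how these symbols can be combined into an order-aware similarity. Dividing by click(i) is an assumed normalization; other choices (for example dividing by sqrt(click(i) · click(j))) fit the same definitions.

```python
from collections import defaultdict

# Each transaction (session) is an ordered list of clicked items.
transactions = [
    ["A", "B", "C"],   # B and C are clicked after A
    ["A", "C"],
    ["B", "A"],        # A is clicked after B, not the other way around
]

click = defaultdict(int)   # click(i): total clicks on i
order = defaultdict(int)   # sum over transactions of order(i, j)

for session in transactions:
    for item in session:
        click[item] += 1
    # order(i, j) counts 1 within a session whenever j appears after i.
    for pos, i in enumerate(session):
        for j in session[pos + 1:]:
            order[(i, j)] += 1

def ocf_sim(i, j):
    # Assumed normalization: the fraction of i's clicks that are followed by j.
    return order[(i, j)] / click[i] if click[i] else 0.0

print(ocf_sim("A", "B"))   # 1/3: one "B after A" among three clicks on A
print(ocf_sim("B", "A"))   # 1/2
```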

3.3 Improvement of OCF

The method above treats everything clicked after article A as equally important. However, users’ interests drift over time: clicking article B immediately after browsing article A probably indicates more similarity than clicking article C ten articles later, since the user’s interest may keep shifting as more articles are read. Based on this assumption, OICF, which builds in a position hypothesis, is proposed: each position is given a different weight, and the closer a click is to the original article, the higher its weight.
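One possible instantiation of the position weighting, with an assumed weight of 1 / (1 + gap), where gap is how many clicks separate j from i within a session; any decaying function of the gap fits the same intuition.

```python
from collections import defaultdict

transactions = [
    ["A", "B", "X", "Y", "C"],   # B is right after A, C comes four clicks later
    ["A", "C"],
]

weighted = defaultdict(float)    # position-weighted "j after i" evidence

for session in transactions:
    for pos, i in enumerate(session):
        for gap, j in enumerate(session[pos + 1:], start=1):
            # Assumed decay: the closer j is to i, the larger the weight.
            weighted[(i, j)] += 1.0 / (1.0 + gap)

print(weighted[("A", "B")])   # 0.5  (gap 1 in the first session)
print(weighted[("A", "C")])   # 0.7  (1/5 in the first session + 1/2 in the second)
```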

4. SCF

4.1 Principle

This method comes from Alibaba’s article “Engineering Implementation of the Real-time Recommendation Algorithm SWING Based on Graph Structure”, which models user clicks as a graph between users and items. (Figure omitted.)

4.2 Calculation


$$
s(i, j) = \sum_{u \in U_i \cap U_j} \; \sum_{\substack{v \in U_i \cap U_j \\ v \neq u}} \frac{1}{\alpha + |I_u \cap I_v|}
$$

symbol    meaning
i, j      items
u, v      users
α         smoothing parameter
U_i       the set of users who clicked item i
U_j       the set of users who clicked item j
I_u       the set of items clicked by user u

SCF makes more use of the users’ collective intelligence.
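A minimal, non-distributed sketch of the score above. Only distinct user pairs are counted (summing both orderings would simply double every score), and the value of the smoothing parameter alpha is an arbitrary assumption.

```python
from collections import defaultdict
from itertools import combinations

# Toy click log: (user, item) pairs.
clicks = [
    ("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
    ("u2", "i1"), ("u2", "i2"),
    ("u3", "i2"), ("u3", "i3"),
]

user_items = defaultdict(set)   # I_u: items clicked by user u
item_users = defaultdict(set)   # U_i: users who clicked item i
for u, i in clicks:
    user_items[u].add(i)
    item_users[i].add(u)

def swing(i, j, alpha=1.0):
    """Sum over user pairs that clicked both i and j of 1 / (alpha + |I_u ∩ I_v|)."""
    common = item_users[i] & item_users[j]
    score = 0.0
    for u, v in combinations(sorted(common), 2):
        overlap = len(user_items[u] & user_items[v])
        score += 1.0 / (alpha + overlap)
    return score

print(swing("i1", "i2"))   # u1 and u2 co-clicked both; their item overlap is 2 -> 1/3
```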

4.3 Conclusions

As can be seen from the formula, the more users click on both i and j, the larger the score. Vulgar or clickbait articles are clicked by many users, so they easily rise to the top; SCF therefore tends to favor popular and vulgar items, which has to be controlled and resolved by other means. The number of items clicked by one user in a day is at most a few hundred, which is not a problem for the computation. However, an item may be clicked by millions or even tens of millions of users, and these users have to be paired up pairwise for the calculation, which is very time-consuming. Some optimization is needed here, and the key is the user-pair combination (u1, u2).

  1. Broadcast Map(item, userList) and Map(user, itemList);
  2. RDD(u1, i1) → RDD(u1, itemList) → RDD((i1, i2), u) → RDD((i1, i2), (u1, u2)) → compute the score of each user pair for (i1, i2) → sum by key (i1, i2) (see the sketch after this list);
  3. Another improvement: only the pair (userA, userB) needs to be computed, because score(A, B) = score(B, A); when writing the results, save both (userA, userB, score) and (userB, userA, score).
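A non-distributed sketch that mirrors the stages in step 2 with plain Python (dicts and lists standing in for RDDs and the broadcast maps); in a real Spark job each arrow would be a map/groupByKey/reduceByKey stage. The symmetry trick from step 3 is applied by generating only user pairs with u1 < u2, and the smoothing parameter is an assumption.

```python
from collections import defaultdict
from itertools import combinations

ALPHA = 1.0   # assumed smoothing parameter

# Stage 0: the "broadcast" map of user -> clicked items.
clicks = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i2")]
user_item_list = defaultdict(set)
for u, i in clicks:
    user_item_list[u].add(i)

# RDD(u, i) -> RDD((i1, i2), u): each user emits every item pair they clicked.
pair_to_users = defaultdict(list)
for u, items in user_item_list.items():
    for i1, i2 in combinations(sorted(items), 2):
        pair_to_users[(i1, i2)].append(u)

# RDD((i1, i2), u) -> RDD((i1, i2), (u1, u2)) -> per-pair score -> sum by key.
scores = defaultdict(float)
for item_pair, users in pair_to_users.items():
    for u1, u2 in combinations(sorted(users), 2):   # only u1 < u2, by symmetry
        overlap = len(user_item_list[u1] & user_item_list[u2])
        scores[item_pair] += 1.0 / (ALPHA + overlap)

print(dict(scores))   # {('i1', 'i2'): 0.333...}
```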

5. MF (matrix factorization)

Unlike the simple counting methods above, matrix factorization is a model-based method and requires solving an optimization problem.

5.1 Fundamentals of matrix transformation

A matrix transformation is essentially a function; the difference from an ordinary function is that it takes a vector as input and outputs a vector. It moves one vector to another, i.e. it moves the original unit (basis) vectors to new positions. Suppose the transformation moves the basis vector $\hat{i} = (1, 0)$ to $(a, c)$ and $\hat{j} = (0, 1)$ to $(b, d)$. Once we know where $\hat{i}$ and $\hat{j}$ land, the new coordinate reference system is determined, and the image of any vector $(x, y)$ in the space can be obtained. Writing the two transformed basis vectors together as the columns of a matrix is exactly matrix multiplication:

$$
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= x \begin{bmatrix} a \\ c \end{bmatrix} + y \begin{bmatrix} b \\ d \end{bmatrix}
= \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}
$$
(Figure: the basis vectors before and after the transformation; image omitted.)
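A quick numerical check of the column picture above, using numpy (assumed available): applying the matrix to a vector is the same as mixing the transformed basis vectors.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [2.0,  0.5]])   # columns = where the basis vectors land
v = np.array([3.0, 4.0])      # the vector (x, y) to transform

direct = A @ v                            # matrix-vector product
mixed = v[0] * A[:, 0] + v[1] * A[:, 1]   # x * first column + y * second column

print(direct, mixed)   # both are [-1.  8.]
```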
5.2 Basic Model

The basic matrix factorization model is

$$
\hat{r}_{ui} = q_i^{T} p_u
$$

where $r_{ui}$ is user $u$’s rating of item $i$, $q_i$ is the latent vector of item $i$, and $p_u$ is the latent vector of user $u$. The optimization objective is:

$$
\min_{q, p} \sum_{(u, i) \in K} \left( r_{ui} - q_i^{T} p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
$$

where $K$ is the set of $(u, i)$ pairs for which $r_{ui}$ is known. Since neither $q$ nor $p$ is known, the objective is not convex; one way to solve it is ALS (alternating least squares). Fix one of them at a time, say $p$: the problem then becomes a quadratic optimization in $q$, whose optimal solution can be found in closed form (or approximately with SGD); then fix $q$ and solve for $p$, and alternate until convergence.
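A minimal ALS sketch for the objective above, using numpy; the rank, regularization strength, and iteration count are arbitrary assumptions, and only the observed entries (the set K) enter each least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ratings matrix with 0 marking "unknown"; the set K is where R > 0.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
known = R > 0

k, lam, n_iter = 2, 0.1, 20                       # assumed rank, lambda, iterations
P = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors p_u
Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors q_i

for _ in range(n_iter):
    # Fix Q and solve a regularized least-squares problem for each p_u.
    for u in range(R.shape[0]):
        idx = known[u]
        Qu = Q[idx]
        P[u] = np.linalg.solve(Qu.T @ Qu + lam * np.eye(k), Qu.T @ R[u, idx])
    # Fix P and solve for each q_i.
    for i in range(R.shape[1]):
        idx = known[:, i]
        Pi = P[idx]
        Q[i] = np.linalg.solve(Pi.T @ Pi + lam * np.eye(k), Pi.T @ R[idx, i])

print(np.round(P @ Q.T, 2))   # reconstructed ratings, compare against R where known
```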

5.3 Considering bias

Due to the different rating habits of each user, some users tend to give every movie a higher score, while others give lower scores; similarly, some films tend to be rated above or below average. Therefore, bias terms are superimposed on the basic model:

$$
\hat{r}_{ui} = \mu + b_i + b_u + q_i^{T} p_u
$$

The optimization objective becomes:

$$
\min_{q, p, b} \sum_{(u, i) \in K} \left( r_{ui} - \mu - b_i - b_u - q_i^{T} p_u \right)^2
+ \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 + b_i^2 + b_u^2 \right)
$$

Here $\mu$ is the average rating over all items, $b_i$ is the item bias, and $b_u$ is the user bias. For example, suppose we predict Joe’s rating of Titanic: the average rating of all films is 3.7, Titanic is a good film rated about 0.5 above average, and Joe is a cautious user who rates about 0.3 lower than other users, so the predicted rating is 3.9 (3.7 + 0.5 − 0.3).
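A tiny check of the prediction formula with the numbers from the example; the latent term $q_i^T p_u$ is taken as 0 here just to reproduce the 3.9.

```python
def predict(mu, b_item, b_user, latent=0.0):
    # r_hat = mu + b_i + b_u + q_i^T p_u
    return mu + b_item + b_user + latent

# Average rating 3.7, Titanic rated +0.5 above average, cautious Joe -0.3.
print(predict(3.7, 0.5, -0.3))   # 3.9
```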

5.4 Implicit feedback scenarios

The most common scenario for the matrix factorization above is explicit feedback, but what we often need to handle is implicit feedback. Explicit feedback means users state their attitude directly, such as rating movies; implicit feedback does not directly reflect the user’s attitude, such as click or browsing data, since a click does not necessarily mean the user likes the article or news. The two differ considerably: implicit feedback is denser and more stable, the data is easier to obtain, and it is closer to real behavior; explicit feedback is sparser but reflects the user’s attitude better. Implicit data is therefore handled differently from explicit data. Two general approaches exist for both kinds of data: point-wise (predicting a preference score for a single item) and pair-wise (ranking one item against another). For implicit data, the ratings are turned into preferences and confidences, and the optimization objective over the user vectors $x$ and item vectors $y$ becomes:

$$
\min_{x, y} \sum_{u, i} c_{ui} \left( p_{ui} - x_u^{T} y_i \right)^2
+ \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
$$

where $p_{ui} = 1$ if $r_{ui} > 0$ and $0$ otherwise, and the confidence is $c_{ui} = 1 + \alpha r_{ui}$.
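A short sketch of how the implicit objective changes the inputs: raw counts $r_{ui}$ become a binary preference $p_{ui}$ and a confidence $c_{ui} = 1 + \alpha r_{ui}$, which then weight an ALS-style solve over all cells. The value alpha = 40 is the one reported to work well in the implicit-feedback paper, but it is a tunable assumption.

```python
import numpy as np

alpha = 40.0   # confidence scaling; treat as a tunable hyperparameter

# Raw implicit counts r_ui (e.g. number of clicks); zero means "not observed".
R = np.array([[3, 0, 1],
              [0, 5, 0],
              [1, 0, 0]], dtype=float)

P_pref = (R > 0).astype(float)   # p_ui: 1 if the user interacted with the item
C = 1.0 + alpha * R              # c_ui: confidence, grows with interaction count

# Unlike the explicit case, every (u, i) cell now enters the objective,
# but each squared error is weighted by c_ui.
print(P_pref)
print(C)
```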
References:

Matrix Factorization Techniques for Recommender Systems

Collaborative Filtering for Implicit Feedback Datasets