Poetry and distant fields

What’s the first thing that comes to mind when you think about recommendation systems?

Product and operations people usually think of tags first; people who have actually built one think of collaborative filtering (CF).

Yes, CF is one of the true highlights in the history of recommendation systems. The idea behind it is simple yet profound, and in today’s interconnected world it is more powerful than ever. CF looks like a technology, but it is really more of a methodology: it is “collective intelligence”, not the machine, that is making recommendations to you.

The basic assumption of CF is that “birds of a feather flock together”: your circle determines what you see. That makes sense, but it hides some important questions: can we, as users, still see anything new? Can there still be surprises? Is it still possible to break out of our own circles?

Yes, I knew you would ask such a pointed question, otherwise this article would not exist. After all, you and I have stood on the mountaintop and sung: “The recommendation system is not only the here and now, but also poetry and distant fields.” This is the EE problem that has been studied extensively in the recommendation and advertising worlds: Exploit & Explore. Exploit is the here and now; Explore is the poetry and the fields beyond.

There are many ways to do Explore, and bandit algorithms are one of them. Several bandit algorithms have been introduced earlier in this series; they basically estimate a confidence interval for each option and then recommend according to the upper bound of that interval, with UCB and LinUCB as the representative examples.
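As a minimal refresher on the upper-confidence-bound idea (this is my own sketch with made-up variable names, not code from any of the papers), a context-free UCB1 loop picks the arm whose empirical mean plus exploration bonus is largest:

```python
import math

class UCB1:
    """Minimal UCB1 sketch: play the arm with the highest upper confidence bound."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # how many times each arm has been played
        self.values = [0.0] * n_arms  # empirical mean payoff of each arm

    def select_arm(self):
        # Play every arm once before trusting the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Mean payoff plus a bonus that shrinks as an arm gets played more often.
        ucb = [v + math.sqrt(2 * math.log(total) / c)
               for v, c in zip(self.values, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

LinUCB extends this by giving every item a feature vector and replacing the empirical mean with a ridge-regression estimate, which is exactly the machinery COFIBA reuses below.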

Could bandit, this romantic family of algorithms that chases poetry and distant fields, be combined with CF? In fact, people have tried. I recently saw a paper on arXiv titled Collaborative Filtering Bandits [2]; another arXiv paper by the same authors, Online Clustering of Bandits [1], is an earlier attempt in the same direction. The difference between the two is that the latter only clusters users (that is, it only considers user-based collaborative filtering), while the former does collaborative clustering of both users and items; you can think of it as running item-based and user-based collaboration at the same time. The earlier paper is thus a special case of the newer one. Today, let’s work through the ideas together and see how Collaborative Filtering Bandits combines CF with bandits.

How does Bandit incorporate CF

Many recommendation scenarios share two regularities:

  1. Similar users tend to give similar feedback to the same item. In other words, if the same item is recommended to a cluster of similar users, their responses should be roughly the same. Likewise, the same user gives similar feedback to similar items. This is exactly the kind of problem CF can solve;

  2. While using a recommendation system, a user’s decisions are dynamic, especially for new users. This makes it impossible to prepare recommendation results for users in advance; recommendation is a dynamic process that can only unfold step by step.

The paper proposes that, for each candidate item, users can be clustered into different groups according to the payoff their preferences produce for that item. A group can then collectively predict the likely payoff of the item, which is the collaborative effect. The user’s individual parameters are updated in real time as real feedback is observed, and that is where the bandit idea comes in.

Here is an analogy. Suppose your parents have arranged a large number of blind dates for you. Would you meet every candidate in person? Too costly. Instead, a group of friends looks over each candidate in advance; for each candidate the group splits into several factions: some say yes, some say “worth another look”, some say no. You yourself belong to one of these factions. Each time, the faction you belong to gives the candidate a collective score, because these are people who share your values and “would not mislead you”. You pick out the candidate with the highest score and go meet him or her, come back and tell everyone how it actually went, and at the same time adjust your own inner standard a little. Then the remaining candidates are re-scored, you go meet the next one, and so on, round after round…

So that’s the idea of combining CF and Bandit.

In addition, when there are many candidate items, the items themselves also need to be clustered, so that users do not have to be clustered separately for every single item but only for every item cluster. The number of item clusters is far smaller than the number of items.

COFIBA algorithm

Based on these ideas, the paper designs an algorithm called COFIBA (pronounced “coffee bar”). Briefly, it works as follows:

At time t, a user visits the recommendation system. The system needs to recommend the best item from the current candidate pool to this user, then observe the feedback and use it to update its selection strategy. Every item here has a feature vector, so the bandit algorithm is contextual. As in the LinUCB algorithm introduced last time, ridge regression is used to fit the user’s weight vector and predict the user’s payoff for each item.
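To make that ridge-regression step concrete, here is a minimal per-user LinUCB-style sketch (my own simplification and naming, not the paper’s code; `alpha` controls the width of the confidence bound):

```python
import numpy as np

class LinUCBUser:
    """Per-user LinUCB sketch: ridge regression on context vectors plus a UCB bonus."""

    def __init__(self, d, alpha=0.5):
        self.A = np.eye(d)    # corresponds to COFIBA's M matrix: identity + sum of x x^T
        self.b = np.zeros(d)  # payoff-weighted sum of context vectors
        self.alpha = alpha

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        w = A_inv @ self.b                                   # ridge-regression weights
        return w @ x + self.alpha * np.sqrt(x @ A_inv @ x)   # predicted payoff + bonus

    def update(self, x, payoff):
        self.A += np.outer(x, x)
        self.b += payoff * x

# Usage: recommend the candidate whose context vector has the largest upper bound.
# best_x = max(candidate_vectors, key=user_model.ucb)
```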

Compared with the LinUCB algorithm introduced last time, COFIBA differs in two ways:

  1. The best item is selected based on user clusters (the bandit decision is made collectively by a cluster of users);

  2. The user and item clusters are adjusted based on user feedback (this is the CF part).

The overall algorithm process is as follows:

[Figure: the overall COFIBA algorithm, as given in the paper]

The key part, in human terms:

For user i, each round of the trial proceeds as follows (a rough code sketch follows this list):

  1. The user’s own bandit parameter w is computed first (exactly as in LinUCB), but unlike in LinUCB this parameter is not used directly in the selection decision; it is used to update the user clusters.

  2. The candidate items are then traversed; each item is represented by a context (feature) vector.

  3. Each item corresponds to its own set of user clusters, so while traversing an item we first determine which cluster the current user belongs to under that item. We then aggregate the M matrices (corresponding to the A matrix in LinUCB) and the b vectors (corresponding to the b vector in LinUCB) of all the users in that cluster, solve one ridge-regression parameter for the whole cluster (similar to what LinUCB does for each user individually), and compute the predicted payoff and its upper confidence bound.

  4. Each item thus gets a predicted payoff and an upper confidence bound; the item with the largest upper bound is picked out (same as LinUCB).

  5. The user’s real feedback is observed, and the user’s own M matrix and b vector are updated (only the individual user’s parameters are updated; the other users in the cluster are not).
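Putting the five steps together, here is a heavily simplified sketch of one COFIBA decision round. The cluster bookkeeping and the feedback collection are stubbed out as the hypothetical helpers `user_cluster_for` and `observe_feedback`; this illustrates the idea only and is not the paper’s actual implementation (see [3] for that):

```python
import numpy as np

def cofiba_round(user, items, M, b, user_cluster_for, observe_feedback, alpha=0.5, d=6):
    """One simplified COFIBA round: cluster-level ridge regression + UCB selection.

    Assumptions for this sketch (not the paper's exact API):
      - M[u], b[u]: each user's individual matrix / vector, as in LinUCB
      - items: list of (item_id, x) pairs, x being the context vector
      - user_cluster_for(item_id, user): the users clustered with `user` for that item
      - observe_feedback(user, item_id): the real payoff observed from the user
    """
    best_id, best_x, best_ucb = None, None, -np.inf
    for item_id, x in items:
        cluster = user_cluster_for(item_id, user)
        # Step 3: aggregate the cluster's statistics and solve one ridge regression.
        M_c = np.eye(d) + sum(M[u] - np.eye(d) for u in cluster)
        b_c = sum(b[u] for u in cluster)
        M_inv = np.linalg.inv(M_c)
        w_c = M_inv @ b_c                      # cluster-level weight vector
        ucb = w_c @ x + alpha * np.sqrt(x @ M_inv @ x)
        if ucb > best_ucb:                     # step 4: keep the largest upper bound
            best_id, best_x, best_ucb = item_id, x, ucb

    # Step 5: observe real feedback and update only this user's own parameters.
    payoff = observe_feedback(user, best_id)
    M[user] += np.outer(best_x, best_x)
    b[user] += payoff * best_x
    return best_id, payoff
```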

That is one decision round of the COFIBA algorithm. After the real feedback is received from the user, two more computations follow:

  1. Update the user cluster

  2. Update the Item cluster

How are the user and item clusterings updated? The paper gives a schematic diagram:

[Figure: schematic of the user and item cluster updates]

Let me walk through this figure.

(a) There are 6 users and 8 items. At initialization, the users form a single cluster, and so do the items.

(b1) In some round, the user facing the recommendation system is user 4. The recommendation process iterates over items 1 through 8 and, for each item, checks which user cluster user 4 falls into under that item’s clustering; the users in that cluster are then aggregated to predict the payoff and confidence bound for the item. Say item 5 wins out and gets recommended.

(b2) At time t there are three item clusters, and the user clustering that needs updating is the one associated with item 5, specifically the cluster containing user 4. For each user connected to user 4 in that cluster, that user’s estimated payoff on item 5 is compared with user 4’s; if the two differ too much, the edge between them is deleted. After the edges are deleted, the clustering is rebuilt from the remaining connected components. Suppose user 4’s original cluster splits into two clusters, {4, 5} and {6}.

(c) After the user clusters are updated, the item cluster containing item 5 must be updated as well. The method: for each item j that shares an edge with item 5, build the set N of users whose estimated payoffs on item j are close to user 4’s; if N matches the user cluster that user 4 now belongs to (the one just updated), keep the edge between item 5 and item j, otherwise delete it. Suppose this deletes the edge between item 3 and item 5; item 3 then stands alone as its own item cluster, and its user clustering is re-initialized so that all users again form a single cluster.
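The re-clustering in (b2) and (c) boils down to “delete some edges, then take connected components”. Here is a rough illustration of the user-side update, with a placeholder `est_payoff` function and an arbitrary `threshold` standing in for the paper’s exact confidence-based condition:

```python
import networkx as nx

def update_user_clusters(user_graph, user, item, est_payoff, threshold=0.2):
    """Prune edges whose endpoints disagree too much on `item`, then re-cluster.

    `user_graph` is the user graph attached to this item's cluster;
    `est_payoff(u, item)` is a placeholder for user u's estimated payoff.
    """
    for neighbor in list(user_graph.neighbors(user)):
        # Cut the edge if the neighbor's estimate differs too much from the user's.
        if abs(est_payoff(neighbor, item) - est_payoff(user, item)) > threshold:
            user_graph.remove_edge(user, neighbor)
    # The new user clusters are the connected components of the pruned graph.
    return list(nx.connected_components(user_graph))
```

The item-side update in (c) follows the same pattern on the item graph: prune the edges around item 5 that the freshly updated user cluster no longer supports, then take connected components again.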

It’s like this in a nutshell:

  • User-based collaborative filtering is used to choose which item to recommend, and the LinUCB idea is used for that choice;

  • The user and item clusterings are adjusted according to user feedback;

  • Changes in the item clustering in turn change the user clustering;

  • The user-item matrix is continuously partitioned based on users’ real-time, dynamic feedback.

The COFIBA algorithm is also easy to implement, and an implementation is available on GitHub [3]. The paper shows its effectiveness both theoretically and experimentally, but whether it can be used in real projects remains doubtful; after all, its complexity is not low.

Thoughts on EE issues

The reason I wanted to write this series on bandit algorithms is that the exploit-explore tension objectively exists, and bandit algorithms are widely regarded as one of the better solutions to the EE problem. Beyond bandits, there are other ways to Explore; one approach, which came up in a discussion with xlvector, is to randomly drop part of a user’s historical behavior when generating recommendations.

Doing Explore means taking a risk and walking into the unknown, and that clearly hurts the user experience: some fraction of the time you recommend something other than A, even though you know the user will definitely like A.

In practice, few companies take such a rational approach to Explore; most prefer to go by feel. The reasons may be these:

  1. Internet products have short life cycles, while Explore pays off only over the long run, so there is little incentive to do it.

  2. Users’ time on Internet products is increasingly fragmented, while Explore needs a long enough window to show its value.

  3. Internet products are homogeneous and users have plenty of alternatives; one careless move and they vote with their feet and abandon you within minutes.

  4. In fact, established platforms have little incentive to do Explore at all.

With this in mind, if we want to introduce Explore into our recommendation system, we need to take note of the following:

  1. The items used for Explore must have solid quality in their own right, so that a user who isn’t interested is at least not offended.

  2. The product itself must be designed so that users have the patience to keep playing along with you.

  3. Think the problem through so you don’t build a dumb product; only if the product doesn’t die young does Explore have a chance to pay off.

Ok, let’s finish this series by singing it one more time.

Recommendation is not only the drudgery in front of us; there are also poetry and distant fields.

References:

[1] Online Clustering of Bandits: http://arxiv.org/abs/1401.8257

[2] Collaborative Filtering Bandits: http://arxiv.org/abs/1502.03473

[3] COFIBA implementation: https://github.com/qw2ky/CoLinUCB_Revised/blob/master/COFIBA.py

This is the end of the three-part series. You can see all of it by replying “bandit”.

The author:

Chen Kaijiang @ punishment without knife

Before 2013, I worked as a senior algorithm engineer in Sina Weibo’s Search Department and Commercial Product Department, responsible for the back-end algorithms of products such as Weibo anti-spam, basic data mining, the intelligent customer service platform, and personalized recommendation. From 2012 to 2013, I led the translation of Machine Learning: A Practical Case Study.

At the end of 2013, I joined the traditional media company Cheyu Media as algorithm director, responsible for building the personalized recommendation system of the company’s transformed product, Kaola FM, from scratch. Personalized recommendation has since become Kaola FM’s biggest differentiator from other FM apps.

From 2015 to 2016, I left to start my own business and raised angel investment from IDG and Morningside Capital.
