A Discussion of Personalized Recommendation Systems (1): User-Based Collaborative Filtering, Starting from NetEase Cloud Music's Daily Recommendations

This is my first article of 2019. Recently, my advisor gave me a new task involving personalized recommendation for an app. Since this is my first time learning the topic, I wanted to summarize what I have learned. Because of length, the material is split into several parts. This article mainly introduces the user-based collaborative filtering algorithm, and I hope it helps you. If there are mistakes, please point them out!

What is a personalized recommendation system?

In fact, personalized recommendation systems have long been part of our lives. NetEase Cloud Music’s “Daily Recommendation” and Taobao’s “Guess You Like” are very common examples. With the development of big data, personalized recommendation is now used in many fields, such as e-commerce (Jingdong, Taobao), film and video (YouTube), personalized music radio (NetEase Cloud Music), social networks (QQ), personalized reading (WeChat Reading), and location-based services (Meituan). In essence, a recommendation algorithm connects users and items in some way, and different recommendation systems adopt different methods depending on the actual situation.

Generally speaking, a complete recommendation system involves the following three participants:

Users, who are the targets of recommendation
Providers of recommended items
A website that provides a recommendation system

Take the daily recommendations of NetEase Cloud Music as an example:

First, the recommendation system needs to meet users' needs and recommend music that interests them. Second, it should try to get the songs of all kinds of singers recommended to the users who are interested in them, rather than recommending only the songs of a few popular singers. Finally, a well-designed recommendation system can collect high-quality user feedback, continually improve recommendation quality, increase interaction between users and the website, and raise the website's revenue. As shown below:

What is a good recommendation system?

To judge whether something is good, there must be a standard. So what are the criteria for a good recommendation system? Think about why so many people like NetEase Cloud Music's daily recommendations but not Toutiao's daily push notifications. The most intuitive reason is that you love the songs in NetEase Cloud Music's daily recommendations, while you dislike what Toutiao pushes. So prediction accuracy is an important metric in the field of recommendation systems.

A good recommendation system not only predicts users' behavior accurately, but also broadens their horizons and helps them find things they might be interested in but would not easily discover on their own (for example, NetEase Cloud Music often pushes very good but relatively unpopular songs). At the same time, a recommendation system should help businesses introduce good products buried in the long tail to the people who might be interested in them.

Collaborative Filtering

To make recommendations that suit a user's taste, we need to know the user well. How do you get to know someone? The Gongye Chang chapter of the Analects of Confucius says to “listen to what he says and observe what he does”, which here means that we can understand users’ interests and needs through their words and behavior.

The ideal situation for personalized recommendation is that users tell the system what they like on their own initiative. For example, NetEase Cloud Music has long asked users at registration to choose the types of songs they like. But this approach has three drawbacks. First, current natural language understanding technology struggles to understand the natural language that users use to describe their interests. Second, a user’s interests change constantly, but users do not keep updating their interest descriptions. Finally, users often do not know what they like, or find it hard to put into words.

Therefore, we need to use algorithms to automatically explore user behavior data, infer users’ interests from their behaviors, and then recommend items that meet their interests to users.

Recommendation algorithms based on the analysis of user behavior are an important class of algorithms in personalized recommendation systems; in academia they are called collaborative filtering algorithms. As the name suggests, collaborative filtering means that users work together: through continual interaction with the site, they filter out the items they do not like, so that their recommendation lists become more and more satisfying.

Since the analysis is based on user behavior, that behavior must be represented somehow. The following table gives one representation (of course, the behaviors recorded differ from system to system), which splits a behavior into six parts: the user who produced the behavior, the item the behavior targets, the type of the behavior, the context of the behavior, and the weight and content of the behavior.

Field	Description
user id	Unique identifier of the user who produced the behavior
item id	Unique identifier of the item the behavior targets
behavior type	Type of behavior (for example, liking a song)
context	The context in which the behavior occurs, including information such as time and place
behavior weight	Weight of the behavior (for a listening behavior, the weight can be how often the user listens)
behavior content	Content of the behavior (for a comment, the text of the comment)
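
To make the six fields concrete, here is a minimal Python sketch of one behavior record. The class name `BehaviorRecord`, the field types, the default values, and the sample values are all assumptions made for illustration; a real system would choose its own schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BehaviorRecord:
    """One user behavior, following the six fields in the table (hypothetical schema)."""
    user_id: str                                   # who produced the behavior
    item_id: str                                   # which item the behavior targets
    behavior_type: str                             # e.g. "listen", "like", "comment"
    context: dict = field(default_factory=dict)    # e.g. time and place
    weight: float = 1.0                            # assumed default: a single occurrence
    content: Optional[str] = None                  # e.g. comment text, if any


# Illustrative record: user A listened to a song five times (made-up values).
record = BehaviorRecord(
    user_id="A",
    item_id="song_d",
    behavior_type="listen",
    context={"time": "2019-01-05T21:30:00", "place": "Hangzhou"},
    weight=5.0,
)
```

For implicit feedback such as "has listened or not", only `user_id` and `item_id` are strictly needed; the other fields matter once behaviors are weighted or filtered by context.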

Researchers have studied collaborative filtering in depth and proposed many methods, for example the neighborhood-based method, the latent factor model, and random walk on graphs. Among these, the best known and most widely used in industry is the neighborhood-based method, which mainly includes the following two algorithms:

User-based collaborative filtering (UserCF or UCF) recommends to a user the items liked by other users with similar interests.
Item-based collaborative filtering (ItemCF or ICF) recommends to a user items that are similar to the items they previously liked.

User-based collaborative filtering algorithm

In an online personalized recommendation system, when user A needs a personalized recommendation, the system can first find other users whose interests are similar to A's, and then recommend to user A the items that those users like but that user A has never heard of. This method is called the user-based collaborative filtering algorithm.

User-based collaborative filtering algorithm mainly includes two steps:

Find a set of users whose interests are similar to those of target user A.
Find the items that users in this set like and that target user A has never heard of, and recommend them to target user A.

The key to step 1 is to calculate the similarity of interests between the two users. Here, the collaborative filtering algorithm mainly uses the similarity of behavior to calculate the similarity of interest.

For example, suppose there are three users A, B and C, and we already know that A has listened to Jay Chou’s and JJ Lin’s songs for 5 consecutive days, B has listened to Andy Lau’s and Jacky Cheung’s songs for 5 consecutive days, and C has listened to JJ Lin’s and Jj Zhang’s songs for 5 consecutive days.

You can probably judge this in your head, but how would a machine decide whose interests are more similar to A's?

It’s pretty simple, but before we move on, let’s review some basic math:

In mathematics, we can measure the similarity between two vectors by the cosine of the angle between them. When two vectors point in the same direction, the cosine similarity is 1; when the angle between them is 90°, it is 0; and when they point in opposite directions, it is -1. Most importantly, this holds not only in two-dimensional space but in vector spaces of any dimension, so cosine similarity is often applied in high-dimensional positive spaces. For example, in information retrieval, each term is assigned its own dimension, and a document is represented by a vector whose value in each dimension corresponds to the frequency of that term in the document. Cosine similarity then gives a measure of how similar two documents are in subject matter.

Since most people have probably forgotten how to calculate cosine similarity, here’s a quick review of how to calculate cosine similarity. If you want to know more, please Google.

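Suppose there are two two-dimensional vectors $a = (x_1, y_1)$ and $b = (x_2, y_2)$.

Then their cosine similarity is:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

Generalized to n-dimensional vectors $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$:

$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$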
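
The cosine similarity of two vectors of any dimension can be sketched in a few lines of Python. The function name `cosine_similarity` is only illustrative, and returning 0.0 for a zero vector is an assumed convention, since the angle is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # close to 1
perpendicular = cosine_similarity([1, 0], [0, 1])          # 0
opposite = cosine_similarity([1, 0], [-1, 0])              # -1
```

The three calls at the end correspond to the three cases described earlier: same direction, a 90° angle, and opposite directions.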

Now that we understand the math, let's start with the basic operations. Take two users $u$ and $v$. Let $N(u)$ denote the set of songs that $u$ has listened to and $N(v)$ the set of songs that $v$ has listened to. Then $N(u) \cap N(v)$ is the set of songs that both have heard, $N(u) \cup N(v)$ is the set of songs that $u$ or $v$ has heard, and $w_{uv}$ is the interest similarity between user $u$ and user $v$. Treating each user's listening history as a 0/1 vector over all songs, the cosine similarity becomes:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$

Let’s try to calculate the interest similarity between A and D:

From the table “Songs that users have heard”, you can see that A and D have only one song in common, song D. User A has listened to 3 songs and user D has listened to 3 songs. So the similarity between A and D is:

$$w_{AD} = \frac{1}{\sqrt{3 \times 3}} = \frac{1}{3}$$

The other similarities are calculated in the same way.
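
The similarity between two users can be sketched on sets of songs: the number of shared songs divided by the square root of the product of the two set sizes, which is the cosine similarity of the corresponding 0/1 vectors. The `listened` table below is an assumption made for illustration, chosen so that A and D have each heard three songs and share exactly one; songs are written in lowercase to keep them apart from the users.

```python
import math

# Hypothetical listening history (an assumption, not data from the article).
listened = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}


def user_similarity(songs_u, songs_v):
    """Cosine similarity of two users, each represented as a set of songs."""
    if not songs_u or not songs_v:
        return 0.0  # assumed convention for users with no history
    common = len(songs_u & songs_v)
    return common / math.sqrt(len(songs_u) * len(songs_v))


w_AD = user_similarity(listened["A"], listened["D"])  # 1 / sqrt(3 * 3)
w_AB = user_similarity(listened["A"], listened["B"])  # 1 / sqrt(3 * 2)
```

Under these assumed sets, A and D share only song d, so `w_AD` comes out as one third.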

After obtaining the similarity between users, we need to solve step 2. Suppose E is a new song that has just been released and that both users C and D have listened to. How do we calculate A’s interest in the new song E?

After obtaining the interest similarity between users, the UserCF algorithm recommends to a user the items liked by the K users whose interests are most similar to his. The following formula measures user $u$’s interest in item $i$ in the UserCF algorithm:

$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}\, r_{vi}$$

Here, $S(u, K)$ contains the K users whose interests are closest to those of user $u$, $N(i)$ is the set of users who have acted on item $i$, $w_{uv}$ is the interest similarity between user $u$ and user $v$, and $r_{vi}$ represents user $v$’s interest in item $i$. Because implicit feedback data from a single type of behavior is used, all $r_{vi} = 1$.

The above passage is excerpted from the book Recommendation System Practice. Many people may find the formula and the analysis somewhat confusing. In short, it is:

$$p(A, E) = w_{AB}\, r_{BE} + w_{AC}\, r_{CE} + w_{AD}\, r_{DE}$$

Here $p(A, E)$ represents A’s interest in E, $w_{AB}$ is the similarity between A and B, $r_{BE}$ is B’s interest in E, and so on. Since we are not using a rating system here and only consider whether a user has heard the song, C has heard E, so C’s interest in E is 1, and B has not heard the song, so B’s interest in E is 0.

Therefore, we can predict A’s interest in E as follows:

$$p(A, E) = w_{AB} \times 0 + w_{AC} \times 1 + w_{AD} \times 1 = w_{AC} + w_{AD}$$
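
Putting the two steps together, a minimal UserCF sketch might look like the following. The `listened` table is again a hypothetical stand-in (songs in lowercase, with song e heard by C and D but not by B), and the names `user_similarity` and `recommend` are illustrative rather than taken from any library. Every heard song counts as interest 1, as with the implicit feedback described above.

```python
import math

# Hypothetical listening history (an assumption made for illustration).
listened = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}


def user_similarity(songs_u, songs_v):
    """Cosine similarity of two users represented as sets of songs."""
    if not songs_u or not songs_v:
        return 0.0
    return len(songs_u & songs_v) / math.sqrt(len(songs_u) * len(songs_v))


def recommend(listened, user, k):
    """Score songs the user has not heard, using the k most similar users."""
    # Step 1: similarity between the target user and every other user.
    sims = {
        v: user_similarity(listened[user], songs)
        for v, songs in listened.items()
        if v != user
    }
    top_k = sorted(sims, key=sims.get, reverse=True)[:k]
    # Step 2: each neighbor adds its similarity to every unheard song it knows.
    scores = {}
    for v in top_k:
        for song in listened[v] - listened[user]:
            scores[song] = scores.get(song, 0.0) + sims[v]  # interest r = 1
    return scores


scores = recommend(listened, "A", k=3)
p_A_e = scores["e"]  # similarity(A, C) + similarity(A, D)
```

With K = 3, all of B, C and D are neighbors of A. B contributes nothing to song e because B has not heard it, so the score is the sum of A's similarities to C and D, matching the formula above.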

Conclusion

Starting from the daily recommendations of NetEase Cloud Music, a very common example in daily life, this article introduced what a recommendation system is and what a good recommendation system looks like, then led to the concept of collaborative filtering and introduced user-based collaborative filtering. Along the way, it also reviewed mathematical background such as cosine similarity. In real enterprise products, however, user-based collaborative filtering is not this simple, and the similarity of two users is not judged by cosine similarity alone. Since the purpose of this article is to give more people a simple picture of personalized recommendation, it does not go into detail; I recommend reading the related papers. The next article will introduce content-based collaborative filtering algorithms.
