1/ Commonly used recommendation algorithms

Recommendation algorithms have many application scenarios and considerable commercial value, so they are worth studying. There are many kinds of recommendation algorithms, but the most widely used today belong to the collaborative filtering family. This article summarizes that family and then explains the principles of some typical collaborative filtering algorithms. Recommendation algorithms are old: they were needed and applied long before machine learning emerged. Broadly, they fall into the following five types:

<1> Content-based recommendations

This category generally relies on natural language processing (NLP). It infers user preferences by mining TF-IDF feature vectors from item texts and then recommends on that basis. Because everyone's preferences differ, it is valuable that this approach can capture a user's distinctive niche preferences, and its results are easy to interpret. The trade-off is that it requires a foundation in NLP.
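To make the TF-IDF idea concrete, here is a minimal Python sketch. It assumes each item is described by a short text and that a user's profile is simply the average TF-IDF vector of the items they liked. That is one common choice, not the only one. The function names and the toy catalog are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_by_content(item_texts, liked, k=2):
    """Rank items the user has not liked yet by similarity to the user's profile."""
    names = list(item_texts)
    vecs = dict(zip(names, tfidf_vectors([item_texts[n].lower().split() for n in names])))
    # hypothetical profile: average TF-IDF vector of the liked items
    profile = Counter()
    for name in liked:
        for t, w in vecs[name].items():
            profile[t] += w / len(liked)
    candidates = [n for n in names if n not in liked]
    return sorted(candidates, key=lambda n: cosine(profile, vecs[n]), reverse=True)[:k]


items = {
    "ml_book": "machine learning algorithms and models",
    "dl_book": "deep learning neural network models",
    "cook_book": "cooking recipes for pasta",
    "stats_book": "statistics for machine learning",
}
print(recommend_by_content(items, liked=["ml_book"]))
```

Terms that appear in every item get an IDF of zero, so only distinguishing words drive the similarity.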

<2> Collaborative filtering recommendation

Collaborative filtering is one of the most mainstream recommendation algorithms and is widely used in industry. It does not require much domain-specific knowledge, because recommendations are produced by statistics-based machine learning algorithms. Its biggest advantage is that it is easy to implement and easy to integrate into products. Most recommendation algorithms in practical use today are collaborative filtering algorithms. They come in three variants: user-based, item-based, and model-based.

<3> Hybrid recommendation:

This is similar to ensemble learning in machine learning: combining several models lets each contribute its strengths. The result can be a better recommender, or as the Chinese saying goes, "three cobblers together beat one Zhuge Liang." For example, we can build several recommendation models and decide the final result by voting. In theory, a hybrid recommender should be no worse than any single algorithm it combines, but it also increases algorithmic complexity. Hybrids are used in practice, though less widely than single algorithms such as collaborative filtering or binary classification recommenders like logistic regression.
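The voting idea can be sketched in a few lines of Python. This assumes each underlying recommender returns a ranked top-N list. Items are scored by how many lists they appear in, and ties are broken by the best rank they reached. The recommender outputs below are made-up placeholders. Plain vote counting is only one way to combine models; weighted score averaging and stacking are others.

```python
from collections import Counter


def hybrid_vote(recommendation_lists, k=3):
    """Combine ranked lists from several recommenders by majority vote.

    Items are ordered by vote count; ties are broken by the best (lowest)
    position the item reached in any list.
    """
    votes = Counter()
    best_rank = {}
    for ranked in recommendation_lists:
        for rank, item in enumerate(ranked):
            votes[item] += 1
            best_rank[item] = min(rank, best_rank.get(item, rank))
    return sorted(votes, key=lambda item: (-votes[item], best_rank[item]))[:k]


# placeholder outputs of three hypothetical recommenders
content_based = ["A", "B", "C"]
user_cf = ["B", "D", "A"]
popularity = ["B", "E", "F"]
print(hybrid_vote([content_based, user_cf, popularity]))  # ['B', 'A', 'D']
```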

<4> Rule-based recommendation:

Algorithms of this kind are common. Examples include recommending the items clicked by the most users or the items browsed by the most users. They target the mass audience and are no longer mainstream in the era of big data. They work more like manual intervention than learned recommendation.
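A popularity rule such as "recommend what the most users clicked" needs no model at all. Below is a minimal sketch, assuming a click log of (user, item) pairs. The log format and the function name are hypothetical.

```python
from collections import Counter


def most_clicked(click_log, already_seen=(), k=3):
    """Recommend the k items clicked by the most distinct users, skipping seen items."""
    # set() keeps one (user, item) pair per user, so repeat clicks count once
    counts = Counter(item for _user, item in set(click_log))
    ranked = [item for item, _n in counts.most_common() if item not in already_seen]
    return ranked[:k]


click_log = [
    ("u1", "A"), ("u2", "A"), ("u3", "A"),
    ("u2", "B"), ("u3", "B"),
    ("u1", "C"), ("u1", "C"), ("u1", "C"),  # one user clicking C three times
]
print(most_clicked(click_log, already_seen={"A"}, k=2))  # ['B', 'C']
```

The same recommendations go to every user except for the "already seen" filter, which is why this approach serves the mass audience rather than individual tastes.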

<5> Recommendations based on demographic information:

This is the simplest kind of recommendation algorithm. It measures how closely users are related using only the basic profile information the system holds about them, and then recommends on that basis. It is now rarely used in large systems.
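Below is a minimal sketch of the idea, assuming each user profile holds age, gender, and city. The similarity function and its weights are hypothetical illustrations, not a standard formula. Users are compared on these attributes, and the most similar users become neighbors. The neighbors' purchases are then scored by their similarity to the target user.

```python
from collections import defaultdict


def demographic_similarity(u, v, age_scale=10.0):
    """Hypothetical score: +1 for same gender, +1 for same city, up to +1 for close age."""
    score = 1.0 if u["gender"] == v["gender"] else 0.0
    score += 1.0 if u["city"] == v["city"] else 0.0
    score += max(0.0, 1.0 - abs(u["age"] - v["age"]) / age_scale)
    return score


def recommend_demographic(target, profiles, purchases, n_neighbors=2, k=2):
    """Score items bought by the most similar users, weighted by similarity."""
    sim = {u: demographic_similarity(profiles[target], profiles[u])
           for u in profiles if u != target}
    neighbors = sorted(sim, key=sim.get, reverse=True)[:n_neighbors]
    owned = purchases.get(target, set())
    scores = defaultdict(float)
    for u in neighbors:
        for item in purchases.get(u, set()) - owned:
            scores[item] += sim[u]
    return sorted(scores, key=scores.get, reverse=True)[:k]


profiles = {
    "u1": {"age": 25, "gender": "F", "city": "Beijing"},
    "u2": {"age": 27, "gender": "F", "city": "Beijing"},
    "u3": {"age": 50, "gender": "M", "city": "Shanghai"},
    "u4": {"age": 24, "gender": "M", "city": "Beijing"},
}
purchases = {"u1": {"book"}, "u2": {"book", "lamp"}, "u3": {"drill"}, "u4": {"lamp", "mug"}}
print(recommend_demographic("u1", profiles, purchases))  # ['lamp', 'mug']
```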

2/ Collaborative filtering recommendation algorithm

Collaborative filtering (CF) is the most classical recommendation algorithm. It has an online collaboration stage and an offline filtering stage. Online collaboration uses online data to find items the user may like. Offline filtering removes data that is not worth recommending. Examples include items with low predicted scores, and items the user has already purchased even though their predicted score is high.

The typical collaborative filtering setting involves data on m items and n users. Only some user-item pairs have ratings, and the rest are blank. The goal is to use these sparse known ratings to predict the missing user-item ratings, then recommend the highest-scoring items to the user.

In general, collaborative filtering falls into three types: user-based, item-based, and model-based.

User-based collaborative filtering focuses on the similarity between users: birds of a feather flock together. We find users similar to the target user, take the items they like, and predict the target user's ratings for those items. The items with the highest predicted ratings are then recommended, typically in two stages: candidates are recalled first and then ranked.

Item-based collaborative filtering works the same way but computes the similarity between items instead of between users. Given the target user's ratings on some items, we predict ratings for highly similar items and recommend those with the highest predicted scores.
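The user-based variant can be sketched with NumPy on a small rating matrix, where 0 marks a missing rating. The sketch assumes cosine similarity over co-rated items and a similarity-weighted average of the k nearest neighbors' ratings. Pearson correlation and mean-centered ratings are common alternatives. The matrix is a toy example, and the function names are hypothetical.

```python
import numpy as np


def user_based_predict(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted average over the k most
    similar users who rated the item (cosine similarity on co-rated items)."""
    rated = R > 0  # 0 means "no rating"
    sims = []
    for v in range(R.shape[0]):
        if v == user or not rated[v, item]:
            continue
        common = rated[user] & rated[v]
        if not common.any():
            continue
        a, b = R[user, common], R[v, common]
        sims.append((float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))), v))
    top = sorted(sims, reverse=True)[:k]  # nearest neighbours first
    if not top:
        return 0.0
    return sum(s * R[v, item] for s, v in top) / sum(abs(s) for s, _ in top)


def recommend_user_based(R, user, n=1, k=2):
    """Recall the user's unrated items, predict a rating for each, then rank them."""
    unrated = np.where(R[user] == 0)[0]
    preds = {int(j): user_based_predict(R, user, j, k) for j in unrated}
    return sorted(preds, key=preds.get, reverse=True)[:n]


# toy users x items rating matrix (1-5 stars, 0 = unrated)
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 4, 1, 0],
    [1, 1, 2, 5, 4],
    [5, 4, 5, 0, 1],
], dtype=float)
print(recommend_user_based(R, user=0))
```

The loop over all other users runs at query time. This is the online cost that the comparison between the two approaches refers to.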
For example, if you buy a machine learning book online, the site immediately recommends a batch of books on machine learning and big data. This is clearly item-based collaborative filtering at work. The two approaches compare as follows:

(1) User-based collaborative filtering has to compute user-to-user similarity online, so its computational cost is generally higher than that of item-based filtering. In return, it can help users discover surprising items from categories they have not explored.

(2) Item-based collaborative filtering is easy to compute offline, because the similarity between items stays stable over a period of time, and its accuracy is generally acceptable. However, its recommendations are less diverse, so it rarely surprises users.

For small recommendation systems, item-based collaborative filtering is the mainstream choice. For large systems, user-based collaborative filtering is worth considering. There is also the third type, model-based collaborative filtering. It is currently the most mainstream type of collaborative filtering, and many machine learning algorithms can be applied to it. The rest of this article focuses on model-based collaborative filtering.
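The item-based variant can be sketched the same way. It computes item-item similarities once, which is what makes offline precomputation possible. It then scores each unrated item by a similarity-weighted average of the user's own ratings. This minimal NumPy sketch assumes plain cosine similarity between item columns and, for simplicity, treats missing ratings as 0. The function names are hypothetical.

```python
import numpy as np


def item_similarity(R):
    """Cosine similarity between the item columns of R (0 = unrated)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # guard against items nobody rated
    return (R.T @ R) / np.outer(norms, norms)


def item_based_scores(R, user, S=None):
    """Score each unrated item by a similarity-weighted average of the user's ratings."""
    if S is None:
        S = item_similarity(R)
    rated = R[user] > 0
    scores = {}
    for j in np.where(~rated)[0]:
        w = S[j, rated]
        scores[int(j)] = float(w @ R[user, rated] / w.sum()) if w.sum() > 0 else 0.0
    return scores


# toy users x items rating matrix (1-5 stars, 0 = unrated)
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 4, 1, 0],
    [1, 1, 2, 5, 4],
    [5, 4, 5, 0, 1],
], dtype=float)
S = item_similarity(R)  # depends only on R, so it can be computed once offline
scores = item_based_scores(R, user=0, S=S)
print(max(scores, key=scores.get))
```

Because `S` is reused for every user, the per-request work is only a weighted average over the items the user has already rated.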

3/ Model-based collaborative filtering

Model-based collaborative filtering is the most mainstream type of collaborative filtering, so its methods are classified and summarized here. The problem is the same as before. We have data on m items and n users, but only some user-item pairs have ratings and the rest are blank. We want to use the existing sparse ratings to predict the missing ones and recommend the highest-scoring items to the user. Machine learning solves this by building a model. The mainstream methods are association rules, clustering, classification, regression, matrix factorization, neural networks, graph models, and latent factor (latent semantic) models. Each is introduced below.

<1> Collaborative filtering with association rule algorithms

In general, we mine frequent itemsets (or frequent sequences) from the purchase records of all users, finding the frequent N-itemsets or item sequences that meet a support threshold. If a user has bought some of the items in a frequent N-itemset or sequence, we can recommend the remaining items in it to that user. The ranking uses scoring criteria such as support, confidence, and lift. Common association rule recommendation algorithms include Apriori, FP-Growth, and PrefixSpan.
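Below is a minimal sketch of the scoring step, assuming baskets are sets of item names. For brevity it mines only frequent 2-itemsets by brute force rather than running full Apriori, FP-Growth, or PrefixSpan. It uses the standard definitions: support(A, B) = count(A and B) / N, confidence(A -> B) = count(A and B) / count(A), and lift(A -> B) = confidence(A -> B) / support(B). The function names and baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(baskets, min_support=0.4):
    """Rules A -> B from frequent 2-itemsets, with (support, confidence, lift)."""
    n = len(baskets)
    item_count = Counter(i for b in baskets for i in set(b))
    pair_count = Counter(p for b in baskets for p in combinations(sorted(set(b)), 2))
    rules = {}
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:  # keep only frequent pairs
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            lift = confidence / (item_count[y] / n)
            rules[(x, y)] = (support, confidence, lift)
    return rules


def recommend_from_rules(rules, basket, k=2):
    """Recommend consequents of rules whose antecedent is in the basket,
    ranked by confidence, then lift."""
    best = {}
    for (x, y), (_support, conf, lift) in rules.items():
        if x in basket and y not in basket:
            best[y] = max(best.get(y, (0.0, 0.0)), (conf, lift))
    return sorted(best, key=best.get, reverse=True)[:k]


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
rules = mine_pair_rules(baskets)
print(recommend_from_rules(rules, {"beer"}))
```

Here every basket containing beer also contains diapers (confidence 1.0), so diapers is the top recommendation for a user who bought beer.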

<2> Collaborative filtering with clustering algorithms

Collaborative filtering with clustering is similar to user-based or item-based collaborative filtering: we cluster either users or items under some distance metric. With user clustering, users are divided into groups by a distance measure, and items rated highly within the target user's group are recommended to that user. With item clustering, items in the same cluster as the items the user rated highly are recommended. Common clustering recommendation algorithms include K-means, BIRCH, DBSCAN, and spectral clustering.
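Below is a minimal sketch of user clustering. It assumes users are represented by their rows in a rating matrix (0 = unrated) and grouped with a hand-written K-means (Lloyd's algorithm). Within the target user's cluster, the user's unrated items are ranked by their mean rating among cluster members who rated them. The data and helper names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain K-means on the rows of X; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each row to its nearest center (squared Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # move each center to the mean of its members (keep it if the cluster is empty)
        centers = np.array([X[labels == c].mean(axis=0) if (labels == c).any() else centers[c]
                            for c in range(k)])
    return labels


def recommend_by_cluster(R, user, labels, n=1):
    """Recommend the user's unrated items with the highest mean rating in their cluster."""
    group = R[labels == labels[user]]
    counts = (group > 0).sum(axis=0)
    means = np.where(counts > 0, group.sum(axis=0) / np.maximum(counts, 1), 0.0)
    candidates = [j for j in np.argsort(-means) if R[user, j] == 0]
    return [int(j) for j in candidates[:n]]


# toy users x items rating matrix with two obvious taste groups
R = np.array([
    [5, 5, 0, 1, 0],
    [5, 4, 5, 1, 0],
    [4, 5, 4, 0, 1],
    [1, 0, 1, 5, 5],
    [0, 1, 2, 5, 4],
    [1, 1, 0, 4, 5],
], dtype=float)
labels = kmeans(R, k=2)
print(recommend_by_cluster(R, user=0, labels=labels))
```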

<3> Collaborative filtering with classification algorithms

If we bin ratings by how high they are, the problem becomes a classification problem. The most direct way is to set a rating threshold: ratings above it mean "recommend" and ratings below it mean "do not recommend". This turns the problem into binary classification. Although there are many classification algorithms, logistic regression (LR) is currently the most widely used. Logistic regression is highly interpretable, so each item gets an explicit probability of being recommended. Feature engineering on the data can also be used to tune the model. Collaborative filtering with logistic regression is already very mature at large companies such as BAT (Baidu, Alibaba, Tencent). Common classification recommendation algorithms include logistic regression and naive Bayes, both known for strong interpretability.
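Below is a minimal sketch of the threshold-then-classify idea. It assumes each (user, item) pair has already been turned into a numeric feature vector. The two features here are hypothetical placeholders standing in for the output of feature engineering. Logistic regression is fit by plain batch gradient descent rather than through any particular library.

```python
import numpy as np


def to_binary_labels(ratings, threshold=4):
    """Ratings at or above the threshold -> 1 ("recommend"), others -> 0."""
    return (np.asarray(ratings) >= threshold).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.5, n_iter=1000):
    """Fit weights (the last one is the bias) by batch gradient descent on mean log-loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def recommend_probability(w, X):
    """Probability that each (user, item) pair should be recommended."""
    return sigmoid(np.hstack([X, np.ones((len(X), 1))]) @ w)


# hypothetical engineered features per (user, item) pair, roughly zero-centered
X = np.array([[0.9, 1.2], [1.1, 0.8], [-1.0, -0.6], [0.2, -1.3], [0.7, 0.5], [-1.2, 0.3]])
ratings = [5, 4, 2, 1, 4, 3]
y = to_binary_labels(ratings)  # labels 1, 1, 0, 0, 1, 0
w = train_logistic_regression(X, y)
print(recommend_probability(w, np.array([[1.0, 1.0], [-1.0, -1.0]])))
```

The output is a probability per pair, which is the interpretability the text refers to. Pairs can be ranked by it or cut at 0.5.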

<4> Collaborative filtering with regression algorithms

Collaborative filtering with regression algorithms seems more natural than with classification algorithms. Ratings can be treated as continuous rather than discrete values, and a regression model directly predicts the target user's rating for a given item. Common regression recommendation algorithms include ridge regression, regression trees, and support vector regression.
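Below is a minimal sketch of ridge regression for rating prediction. It assumes each (user, item) pair is described by a hypothetical feature vector: the user's average rating and the item's average rating. It uses the closed-form solution w = (X^T X + alpha*I)^-1 X^T y on centered data, so that the intercept is not penalized.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data (intercept not penalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # w = (Xc^T Xc + alpha * I)^-1 Xc^T yc, solved without forming the inverse
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict_rating(w, b, X):
    """Predicted (continuous) rating for each feature row."""
    return X @ w + b


# hypothetical features per (user, item) pair: [user's mean rating, item's mean rating]
X = np.array([[4.0, 4.5], [3.5, 4.0], [2.0, 2.5], [3.0, 3.0], [4.5, 3.5], [2.5, 2.0]])
y = np.array([4.5, 4.0, 2.0, 3.0, 4.0, 2.0])
w, b = fit_ridge(X, y, alpha=1.0)
print(predict_rating(w, b, np.array([[4.2, 4.4], [2.2, 2.3]])))
```

A larger `alpha` shrinks the weights toward zero, which helps when features are correlated, as user and item averages often are.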

Summary of collaborative filtering

As a classical family of recommendation algorithms, collaborative filtering is widely used in industry, and its advantages explain its popularity. The models are general-purpose and need little specialist knowledge of the data's domain. They are also simple to implement and perform well. Collaborative filtering also has some unavoidable problems. The best known is the "cold start" problem: when we have no data at all on a new user, we cannot recommend items to them well. It also ignores situational differences, such as the user's context and current mood. Nor can it capture the unique preferences of niche users, which is what content-based recommendation is good at. That concludes this summary of collaborative filtering recommendation algorithms.