I. Business background

A recommendation system is an information filtering system designed to predict a user’s “rating” or “preference” for an item.

The platform uses the recommendation system to distribute traffic precisely, so that users find the products they like more quickly and the platform becomes stickier. Traffic, click-through rate, and contact/purchase rate form a positive cycle.

In the early stage of the platform's operation, our recommendation scenarios were “Guess You Like”, recommended listing push, and popular recommendations, and we chose the cosine similarity algorithm for them.

The cosine similarity algorithm labels each user with tags, compares those tags with the product tags to obtain a cosine similarity value, and recommends products to the matching users in descending order of that value. In practice, we found that it has inherent disadvantages: when the interactions between users and products are few and sparse, effective matching is not possible; the model cannot be tuned; and the ceiling is obvious.
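The tag matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's code: the tag vocabulary, the tag weights, and the function names `cosine_similarity` and `recommend_by_cosine` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_by_cosine(user_tags, product_tags, top_n):
    """Rank products by similarity to the user's tag vector, highest first."""
    scored = [(pid, cosine_similarity(user_tags, tags))
              for pid, tags in product_tags.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical tag vocabulary: [near_subway, two_bedroom, low_price]
user = [1.0, 1.0, 0.0]
products = {
    "p1": [1.0, 1.0, 0.0],  # same tags as the user
    "p2": [0.0, 0.0, 1.0],  # no tag in common with the user
    "p3": [1.0, 0.0, 0.0],  # one tag in common with the user
}
ranking = recommend_by_cosine(user, products, top_n=2)
print(ranking)
```

A product that shares no tag with the user scores 0, which illustrates the sparsity weakness: without overlapping tags there is nothing to match on.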

After studying the mainstream platforms in the industry (58.com, Netflix, and Airbnb), we settled on the direction for a new recommendation algorithm: one based on collaborative filtering.

II. Technical research

Collaborative filtering is a widely used technique in recommendation systems. It analyzes similarities (“collaboration”) between users or items to predict what users might be interested in, and recommends that content to them.

Collaborative filtering is broadly divided into three types, which we can quickly understand through a simple example:

The first kind: user-based collaborative filtering. Suppose user A gives high ratings to the film and TV series Tian Long Ba Bu, A Chinese Odyssey, and The Night, and user D gives a high rating to The Night. We then consider user A and user D to belong to the same category, and can recommend user A's other highly rated works, Tian Long Ba Bu and A Chinese Odyssey, to user D.

The second kind: item-based collaborative filtering. Suppose “The Night”, “A Chinese Odyssey”, and “Tian Long Ba Bu” belong to the same category of xixia films and TV dramas. If user D has watched “The Night”, we can recommend “A Chinese Odyssey” and “Tian Long Ba Bu” to user D.
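The two examples can be reproduced with a toy sketch. The data mirrors the example in the text; the function names are hypothetical, Jaccard overlap is one simple choice of user similarity picked here for illustration, and the category label stands in for item similarity (in practice item similarity is usually computed from rating behavior).

```python
# High-rating sets mirroring the example in the text.
liked = {
    "A": {"Tian Long Ba Bu", "A Chinese Odyssey", "The Night"},
    "D": {"The Night"},
}

def jaccard(s1, s2):
    """Overlap between two sets of liked titles (0.0 when both are empty)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def user_based_recommend(target, liked):
    """Recommend titles liked by the most similar other user that the target has not rated."""
    others = [u for u in liked if u != target]
    nearest = max(others, key=lambda u: jaccard(liked[target], liked[u]))
    return sorted(liked[nearest] - liked[target])

# Item side: every title carries a category label (assumed, as in the text's example).
category = {
    "Tian Long Ba Bu": "xixia",
    "A Chinese Odyssey": "xixia",
    "The Night": "xixia",
}

def item_based_recommend(target, liked, category):
    """Recommend unseen titles that share a category with something the target liked."""
    liked_categories = {category[t] for t in liked[target]}
    return sorted(t for t, c in category.items()
                  if c in liked_categories and t not in liked[target])

print(user_based_recommend("D", liked))
print(item_based_recommend("D", liked, category))
```

Both calls print `['A Chinese Odyssey', 'Tian Long Ba Bu']`: the first reaches the result through a similar user, the second through similar items.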

At this point you might ask: what is the third kind?

The third kind: user-item collaborative filtering. Users are classified, and this is considered together with the classification of items to obtain the prediction. This model is referred to here as the collaborative filtering ALS recommendation algorithm.

III. Introduction to collaborative filtering ALS

Collaborative filtering ALS (alternating least squares) performs matrix factorization on the sparse rating matrix to estimate the values of the missing entries, which yields the basic trained model. In terms of collaborative filtering (CF) classification, the ALS algorithm belongs to user-item CF, which takes both users and items into account and is also known as hybrid CF.

The following example gives a quick picture:

Modeling and derivation process of ALS: following the 80/20 principle, the model is trained on the training data set P1 and used to derive predictions P2′ for the test set P2. If P2′ ≈ P2, the model is taken to be the best one.

The predicted scores of the best model are then sorted (order by score desc limit N) to obtain the top N items for recommendation.
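The split, the comparison of P2′ with P2, and the top-N selection can be sketched as below. This is an illustration only: the data is generated, RMSE is one common way to quantify “P2′ ≈ P2” (the text does not name a metric), and the `baseline` predictor is a stand-in used to keep the example short, not ALS.

```python
import math
import random

def split_80_20(ratings, seed=42):
    """Shuffle (user, item, score) triples and split them 80% / 20%."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def rmse(predict, test_set):
    """Root-mean-square error between predictions (P2') and held-out scores (P2)."""
    errors = [(predict(u, i) - s) ** 2 for u, i, s in test_set]
    return math.sqrt(sum(errors) / len(errors))

def top_n(predict, user, items, n):
    """Equivalent of: order by score desc limit N."""
    return sorted(items, key=lambda i: predict(user, i), reverse=True)[:n]

# Hypothetical data: 5 users x 4 items, scores generated by a simple rule.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(5) for i in range(4)]
p1, p2 = split_80_20(ratings)

# Stand-in model (an assumption, not ALS): always predict the mean training score.
global_mean = sum(s for _, _, s in p1) / len(p1)

def baseline(user, item):
    return global_mean

print(len(p1), len(p2))   # 16 4
print(rmse(baseline, p2))
```

Among several candidate models, the one with the lowest error on P2 would be kept, and `top_n` would then be called with that model's prediction function.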

Next, let’s briefly look at the core of the ALS algorithm:

Note: the score of user1 for movie1 is 0.88, i.e. 1 * 0.9 + 0.1 * (-0.2).
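The arithmetic in the note is a dot product of two latent vectors. In the sketch below, the assignment of (1, 0.1) to user1 and (0.9, -0.2) to movie1 is an assumption read off the formula, and `predict_score` is a hypothetical name.

```python
def predict_score(user_factors, item_factors):
    """Predicted score = dot product of the user's and the item's latent vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

# Latent vectors with k = 2 factors, taken from the note above.
user1 = [1.0, 0.1]
movie1 = [0.9, -0.2]

score = predict_score(user1, movie1)
print(round(score, 2))  # 0.88
```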

In real life, the user set is often in the tens of millions and the item set in the tens of thousands. The rating matrix is then tens of millions by tens of thousands, and deriving a matrix with hundreds of billions of entries directly is a huge amount of computation.

So we use alternating least squares to approximate the users-items matrix as the product of two small matrices, users-k and k-items, where k is the number of latent factors.

The core of the ALS recommendation algorithm: a least squares algorithm based on a latent factor model, combined with matrix factorization to reduce computational complexity.
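To make the alternating step concrete, here is a small pure-Python sketch of regularized ALS on explicit ratings. It is a teaching illustration, not the Spark implementation used in production: the data, the default values of `k`, `reg`, and `iterations`, and all function names are assumptions.

```python
import random

def solve(a, b):
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def update_side(observed, fixed, k, reg):
    """Least-squares update of one side's factors while the other side is held fixed.

    observed[j] lists (index_into_fixed, score) pairs for row j.
    """
    updated = []
    for pairs in observed:
        # Normal equations: (sum f f^T + reg * I) x = sum score * f
        a = [[reg if r == c else 0.0 for c in range(k)] for r in range(k)]
        b = [0.0] * k
        for idx, score in pairs:
            f = fixed[idx]
            for r in range(k):
                b[r] += score * f[r]
                for c in range(k):
                    a[r][c] += f[r] * f[c]
        updated.append(solve(a, b))
    return updated

def als(ratings, n_users, n_items, k=2, reg=0.1, iterations=20, seed=0):
    """Factor sparse ratings into users-k and k-items matrices by alternating updates."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, s in ratings:
        by_user[u].append((i, s))
        by_item[i].append((u, s))
    items = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    users = [[0.0] * k for _ in range(n_users)]
    for _ in range(iterations):
        users = update_side(by_user, items, k, reg)  # fix items, solve for users
        items = update_side(by_item, users, k, reg)  # fix users, solve for items
    return users, items

# Hypothetical 4 users x 4 items; the pairs (0, 3) and (3, 0) are unobserved.
ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 1, 1.0), (3, 2, 4.0), (3, 3, 5.0),
]
users, items = als(ratings, n_users=4, n_items=4)
missing = sum(u * v for u, v in zip(users[0], items[3]))
print(missing)  # predicted score for the unobserved pair (user 0, item 3)
```

Each update only solves k-by-k systems, one per user or per item, which is why the factorized form is so much cheaper than working with the full users-items matrix.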

IV. Collaborative filtering ALS in practice

Business scenarios:

The daily recommended list of bargain property listings. The data was previously produced by the cosine algorithm; the collaborative filtering ALS algorithm was applied while the rest of the process remained unchanged.

Note: the picture is from Anjuke. Our business scenario is similar to Anjuke's, and the picture is for reference only.

Business process:

  1. Record users' detailed browsing history

  2. Apply business weights to obtain (userId, postId, score) triples

  3. Predict the score of each userId for every listing through machine learning ALS modeling

  4. Recommend bargain listings based on the predicted scores and business rules
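Step 2 of the process above can be sketched as follows. The action types and their weights are hypothetical, since the actual business weights are not given in the text, and `to_triples` is a name chosen for illustration.

```python
from collections import defaultdict

# Hypothetical behavior weights; the real business weights are not stated in the text.
WEIGHTS = {"view": 1.0, "favorite": 3.0, "contact": 5.0}

def to_triples(events, weights=WEIGHTS):
    """Aggregate (user_id, post_id, action) events into (user_id, post_id, score) triples."""
    scores = defaultdict(float)
    for user_id, post_id, action in events:
        scores[(user_id, post_id)] += weights.get(action, 0.0)
    return sorted((u, p, s) for (u, p), s in scores.items())

events = [
    (1, 101, "view"),
    (1, 101, "view"),
    (1, 101, "contact"),
    (1, 102, "view"),
    (2, 101, "favorite"),
]
print(to_triples(events))
# [(1, 101, 7.0), (1, 102, 1.0), (2, 101, 3.0)]
```

The resulting triples are the input format that the ALS modeling in step 3 expects.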

For the Spark-based ALS recommendation algorithm code, see Portal.

Practical conclusions:

  1. Comparing users' preferences with the listings recommended by ALS shows a strong similarity between them, so users are very likely to click on the recommendations

  2. Compared with the cosine-based result set, the ALS recommendations increased both the 7-day average daily click rate and the contact rate by about 20%

V. Experience sharing

Practical experience:

  • Business data has a long-tail effect, so clean the data sources appropriately

  • Model tuning: user and item IDs support only int; ratings can be explicit or implicit

  • Recommendation result set: combining it with cosine similarity and embedding is the planned further optimization
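The first two points can be illustrated with a small preprocessing sketch. The `min_interactions` threshold and the string IDs are hypothetical, and `clean_and_index` is an illustrative helper rather than part of any library.

```python
from collections import Counter

def clean_and_index(triples, min_interactions=2):
    """Drop users with too few interactions, then map IDs to consecutive ints."""
    counts = Counter(u for u, _, _ in triples)
    kept = [t for t in triples if counts[t[0]] >= min_interactions]
    user_index = {u: n for n, u in enumerate(sorted({u for u, _, _ in kept}))}
    item_index = {p: n for n, p in enumerate(sorted({p for _, p, _ in kept}))}
    indexed = [(user_index[u], item_index[p], s) for u, p, s in kept]
    return indexed, user_index, item_index

# Hypothetical raw triples with string IDs.
triples = [
    ("u-a", "post-9", 7.0),
    ("u-a", "post-3", 1.0),
    ("u-b", "post-9", 3.0),  # user with a single interaction, dropped as long tail
]
indexed, user_index, item_index = clean_and_index(triples)
print(indexed)
# [(0, 1, 7.0), (0, 0, 1.0)]
```

Keeping `user_index` and `item_index` makes it possible to translate the integer IDs in the model's output back to the original IDs.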

Starting from the background of business recommendation, this article describes rebuilding the recommendation algorithm from cosine similarity to ALS modeling, which produced good results. The ALS algorithm, the practical results, and the experience gained are therefore shared here.

At the same time, depending on the size of the platform, the recommendation system needs to go deeper in several dimensions, such as feature extraction, model correction, and architecture optimization (offline, online, real-time), to achieve better recommendation results.

The ALS machine learning algorithm was like a key that opened the door to a new world of technology for me, and showed me that mathematics-based matrix factorization fits business recommendation scenarios well. I hope you keep exploring, keep your enthusiasm for technology, and discover more wonderful things!
