Preface

As mentioned in the previous DM article, my roommates and I had run out of shows to watch and wanted to build a personal recommendation system to fix that. After a persistent struggle, the system has finally taken shape.

Today I would like to share our experience. If you are interested, you can write one yourself.

Traditional recommendation system algorithm

Let me first introduce the traditional approach to recommendation systems. I call it traditional because most learning materials are based on it.

Let’s assume that we have a matrix (represented as a Python list):

    [  # A  B  C  D  E
        [2, 0, 0, 4, 4],  # 1
        [5, 5, 5, 3, 3],  # 2
        [2, 4, 2, 1, 2],  # 3
        ...
    ]

The rows of the matrix represent users, the columns represent items, and each cell holds the user's rating of that item.

Suppose user 1 now needs to pick an item. The recommendation system assumes that user 1 will pick something not yet selected, so it looks for items with a score of 0 in the first row and finds B and C. How does it decide whether to recommend B or C (assuming only one recommendation is needed)? It needs to calculate the similarity between each of B and C and the items the user has already rated.

Similarity alone is not enough, because it cannot tell whether a candidate resembles items the user liked or items the user disliked. So we use the user's rating as the weight: rating × similarity gives a partial score, and the partial scores accumulate, so the recommendation score of B is the sum of rating × similarity over A, D and E. This way, candidates similar to low-rated items naturally end up with lower scores and candidates similar to high-rated items end up with higher scores, which reduces the problem to ranking.
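The accumulation can be sketched as follows, assuming the similarities are already known. The similarity values here are invented placeholders used only to make the example runnable, and `recommendation_scores` is a hypothetical helper name, not part of any library.

```python
# Ratings of user 1 from the matrix above; 0 means "not rated yet".
user_ratings = {"A": 2, "B": 0, "C": 0, "D": 4, "E": 4}

# Placeholder similarities between each candidate and each rated item.
# These numbers are made up for illustration only.
similarity = {
    ("B", "A"): 0.5, ("B", "D"): 0.2, ("B", "E"): 0.3,
    ("C", "A"): 0.4, ("C", "D"): 0.6, ("C", "E"): 0.5,
}


def recommendation_scores(user_ratings, similarity):
    """Sum rating * similarity over the rated items, for every unrated item."""
    rated = {item: r for item, r in user_ratings.items() if r > 0}
    scores = {}
    for candidate, rating in user_ratings.items():
        if rating != 0:
            continue  # already rated, so not a candidate
        scores[candidate] = sum(
            r * similarity[(candidate, item)] for item, r in rated.items()
        )
    return scores


scores = recommendation_scores(user_ratings, similarity)
# Rank candidates from highest to lowest accumulated score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

With these placeholder values, C is ranked ahead of B because it is more similar to the items user 1 rated highly (D and E).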

Obviously, the core of the above problem is how to calculate the similarity.

Here are two ways to calculate the similarity:

  • Euclidean distance: $\text{similar} = 1 / \sqrt{(0-2)^2 + (5-5)^2 + (4-2)^2 + \dots}$, computed over the columns of B and A. The similarity of B and A is then multiplied by user 1's rating of A: score = similar * 2.

  • Cosine similarity: $\cos\theta = \frac{A \cdot B}{\|A\| \|B\|}$, where A and B are two columns and $\|A\|$ is the norm of A. Note that the cosine ranges from -1 to 1, so we need to normalize it into the range 0 to 1. The similarity formula therefore becomes $0.5 + 0.5\cos\theta$.
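Both measures can be written in a few lines of Python. This is a minimal sketch that uses only the three rows shown in the matrix (the real matrix has more rows). The handling of a zero distance and of an all-zero column is my own assumption, since the article does not cover those cases.

```python
import math

# The three rows shown in the article; columns are items A..E.
ratings = [
    [2, 0, 0, 4, 4],  # user 1
    [5, 5, 5, 3, 3],  # user 2
    [2, 4, 2, 1, 2],  # user 3
]
ITEMS = "ABCDE"


def column(matrix, item):
    """Return the ratings every user gave to one item."""
    j = ITEMS.index(item)
    return [row[j] for row in matrix]


def euclidean_similarity(a, b):
    """1 / Euclidean distance between two columns."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Assumption: identical columns count as infinitely similar
    # instead of raising a division-by-zero error.
    return 1.0 / dist if dist > 0 else float("inf")


def cosine_similarity(a, b):
    """Cosine of the angle between two columns, rescaled from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        return 0.5  # Assumption: an all-zero column is treated as neutral.
    return 0.5 + 0.5 * dot / norms


col_a, col_b = column(ratings, "A"), column(ratings, "B")
sim_euclid = euclidean_similarity(col_b, col_a)
sim_cosine = cosine_similarity(col_b, col_a)
# Partial score of B contributed by A: similarity times user 1's rating of A.
partial_score = sim_euclid * ratings[0][ITEMS.index("A")]
print(sim_euclid, sim_cosine, partial_score)
```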

An innovation: a recommendation system for few users

From the above we can see that the traditional method has one particular problem: it needs a large number of user ratings. That is, the matrix needs many rows before its results are worth relying on. At first glance this requirement seems reasonable and matches our intuition: only with enough data can we find accurate patterns.

For my needs, however, this is an obvious shortcoming. As I mentioned in the preface, this is a recommendation system for dormitory or even personal use.

In other words:

  • We can’t provide a lot of data.

  • We are lazy. Most likely we will just tell the system which movie we picked from its recommendations rather than rate it. We may tell it whether the quality was acceptable, but we will not give an exact score from 0 to 10 as Douban users do.

So the traditional recommendation algorithm fails my needs in several places. But look at the essence of the problem: it is nothing more than comparing similarity against a trained model, based on the characteristics of the user or of the item. I took advantage of these characteristics and made a small innovation.

  • Many items today, especially music and movies, come with their own labels, such as comedy or suspense, plus leading actors, directors and so on, all of which can serve as features. E-commerce platforms also have product categories such as clothes, women’s shoes and bags. For some items, such as clothes, the brand and size can also serve as features of the user’s choice.

  • The user model is updated dynamically. If a user has used the system for a long time, their choices are likely to cover a large number of labels, and a label-based recommendation system then struggles to distinguish their preferences. There are two solutions. The first is to let users customize labels, which increases label diversity; SF, for example, lets users define their own labels for questions and articles. This only mitigates the problem, though. For a complete solution, I think each feature needs an expiration date, so that the user’s choices reflect their needs within a given period. Consider a user who is about to travel. They may browse many pages selling travel products, such as disposable toothpaste, and the system then recommends websites that sell travel products. Once the feature expires, say after one week, the user is back from the trip and the system stops recommending travel products, so the user is not left confused. (In my experience, some websites keep showing recommendations long after my interest has passed.) Instead, the system starts collecting features from the following week’s browsing, rebuilds the user model dynamically, and recommends what the user may need in the next stage.

To implement this idea in Python, we can use the following dictionary:

    record = {"labelName": (weight, time), "labelName2": (weight, time), ...}
    # labelName is the tag name; its value is a tuple.
    # The first field of the tuple is the weight of the tag:
    # the higher the weight, the more the user likes the tag.
    # The second field is the time at which the tag was created.
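The article does not show how this dictionary gets filled in. One possible sketch follows, assuming the weight grows by 1 each time the user picks an item carrying that label, and the creation time is kept from the first pick. The helper name `update_record` and its `now` parameter are hypothetical, added for illustration and testing.

```python
import time


def update_record(record, labels, now=None):
    """Register that the user picked an item carrying the given labels."""
    now = time.time() if now is None else now
    for label in labels:
        if label in record:
            weight, created = record[label]
            # Assumption: a repeated pick raises the weight
            # but keeps the original creation time.
            record[label] = (weight + 1, created)
        else:
            record[label] = (1, now)
    return record


record = {}
update_record(record, ["comedy", "suspense"], now=1000.0)
update_record(record, ["comedy"], now=2000.0)
print(record)
```

Whether a repeated pick should also refresh the timestamp is a design choice. Keeping the creation time matches the description of the second tuple field as the time the tag was created.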

With this in place, recommendation is easy to implement. Given a testList, we need to:

  • Create an empty dictionary named res

  • Iterate through testList, calling each object t

  • Iterate over the labels that t has, and look up each label in record

  • Get the current time time2. If time2 - time exceeds the specified validity period, the label has expired: ignore it and delete its entry from record

  • If the label is valid, increment the score of t by 1, stored in res with the index of t as the key

  • Once traversal is complete, sort the res dictionary by value

  • Finally, read the sorted results as needed, for example only the top 5

And that gives us a recommendation system suited to a small number of users.

It is now in use in our dormitory. How well it works will take some time to tell.
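The steps above can be sketched as a single function. This is an illustration under stated assumptions, not the class from the repository: each object in the list is assumed to be a dict with a "labels" key, the validity period is assumed to be one week, and the `now` parameter exists only to make the example deterministic.

```python
import time

ONE_WEEK = 7 * 24 * 3600  # assumed validity period, in seconds


def recommend(record, test_list, max_age=ONE_WEEK, top_n=5, now=None):
    """Return the indexes of the best-matching items in test_list."""
    time2 = time.time() if now is None else now
    res = {}
    for index, t in enumerate(test_list):
        for label in t["labels"]:
            if label not in record:
                continue  # the user has never chosen this label
            _weight, created = record[label]
            if time2 - created > max_age:
                del record[label]  # expired: drop it from the user model
                continue
            # Valid label: add 1 to the score of t, keyed by its index.
            res[index] = res.get(index, 0) + 1
    # Sort by score, highest first, and keep only the top results.
    ranked = sorted(res.items(), key=lambda kv: kv[1], reverse=True)
    return [index for index, _score in ranked[:top_n]]


record = {"comedy": (3, 900.0), "suspense": (1, 950.0), "travel": (2, 0.0)}
test_list = [
    {"title": "Movie 0", "labels": ["comedy", "suspense"]},
    {"title": "Movie 1", "labels": ["travel"]},
    {"title": "Movie 2", "labels": ["comedy"]},
]
top = recommend(record, test_list, max_age=100.0, now=1000.0)
print(top, record)
```

As in the steps, every valid label adds 1 to the score. A natural variation would be to add the label's weight instead, so that frequently chosen labels count for more.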

Afterword

GitHub address

To be clear, the GitHub repository only provides a class that implements the above improvements. It is a single .py file, not a full recommendation system. You can download the class and build on it, for example:

  • Wrap it in a web application using the Flask framework

  • Combine the class with SMTP to write a script that automatically emails recommended movie information to your mailbox

  • Instantiate the class to create a simple command-line application

Later, I will upload to GitHub a simple web server built with Flask, which can be queried through a web API and returns movie information in JSON format.

Please correct any mistakes.