
Author: Yishui Hancheng, CSDN blog expert, personal research interests: machine learning, deep learning, NLP, CV

Blog: yishuihancheng.blog.csdn.net

Recommendation systems play a very important role in our daily lives. I believe most people who have worked on recommendation-related engineering projects have read the book “Recommendation System Combat” at some point. I am one of its readers, and I find it good introductory material on recommendation systems. The recommendation systems of large online malls and tech giants are complicated and powerful, and most of them rely on heavyweight computing infrastructure and deep learning. I will cover deep-learning-based recommendation projects in a later article. Today's article uses the surprise module to design and implement simple book and movie recommendation systems.

The data set used in this article can be downloaded from the following link:

https://download.csdn.net/download/together_cz/10916350

See the links below for introductions and examples of the Surprise module:

https://surprise.readthedocs.io/en/stable/getting_started.html

To load your own data set with Surprise, first define a reader that describes the format of each line, then load the file through it:

from surprise import Dataset, Reader

reader=Reader(line_format=data_format,sep=sep)
mydata=Dataset.load_from_file(data_path,reader=reader)
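To make the role of the line format and the separator more concrete, here is a small standalone sketch of how a line-oriented ratings file can be turned into (user, item, rating) triples. It is not Surprise's implementation: `parse_ratings` and the sample rows are hypothetical and only illustrate the idea.

```python
def parse_ratings(lines, line_format='user item rating', sep=','):
    """Parse text lines into (user, item, rating) triples.

    Illustrative only: parse_ratings is a hypothetical helper, not a Surprise API.
    """
    fields = line_format.split()
    allowed = {'user', 'item', 'rating', 'timestamp'}
    if any(field not in allowed for field in fields):
        raise ValueError('unknown field in line_format: %r' % line_format)
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Pair each field name with the value found at the same position
        values = dict(zip(fields, line.split(sep)))
        triples.append((values['user'], values['item'], float(values['rating'])))
    return triples


sample = ['u1,b10,4', 'u1,b11,5', 'u2,b10,3']  # made-up rows
ratings = parse_ratings(sample)
print(ratings)
```

The field order in the format string decides which column is read as the user, the item and the rating, so the same helper can read a file whose columns are in a different order by changing only `line_format`.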

The schematic diagram of book recommendation system design is as follows:

The overall design is simple and has no hard-to-understand steps. Here is the code:

def bookRecommendSystem(map_data='book.csv',train_data='rating.csv',data_format='user item rating',sep=',',flag='KNNBasic',k=10):
    '''
    Book recommendation system
    '''
    # data_format may only use the field names user, item, rating and timestamp
    id_name_dic,name_id_dic=bookDataMapping(map_data)
    myModel,dataset=buildModel(data_path=train_data,data_format=data_format,sep=sep,flag=flag)
    print('==================model Training Finished========================')
    performace=evaluationModel(myModel,dataset)
    print('==================model performace===================')
    print(performace)
    current_playlist_id='1239'
    print(u'Current user ID:'+current_playlist_id)
    current_playlist_name=id_name_dic[current_playlist_id]
    print(u'Current book name:'+current_playlist_name)
    playlist_inner_id=myModel.trainset.to_inner_uid(current_playlist_id)
    print(u'Current user internal ID:'+str(playlist_inner_id))
    # Take the k nearest neighbors (10 by default); get_neighbors needs a similarity-based (KNN*) model
    playlist_neighbors=myModel.get_neighbors(playlist_inner_id,k=k)
    playlist_neighbors_id=(myModel.trainset.to_raw_uid(inner_id) for inner_id in playlist_neighbors)
    playlist_neighbors_name=(id_name_dic[playlist_id] for playlist_id in playlist_neighbors_id)
    print(u'For user <', current_playlist_name, u'> the 10 closest books are:\n')
    for playlist_name in playlist_neighbors_name:
        print(playlist_name, name_id_dic[playlist_name])

The function above implements the book recommendation system. The comments in the code explain each step, so I won't elaborate further. Several of the key functions it calls are described below. The evaluationModel helper is not listed in this article.

Model initialization module:

from surprise import NormalPredictor, BaselineOnly, KNNBasic, KNNWithMeans, KNNBaseline, SVD, SVDpp, NMF

def initModel(flag='NormalPredictor'):
    '''Choose among several recommendation algorithms for comparison'''
    if flag=='NormalPredictor':    # use NormalPredictor
        return NormalPredictor()
    elif flag=='BaselineOnly':     # use BaselineOnly
        return BaselineOnly()
    elif flag=='KNNBasic':         # use basic collaborative filtering
        return KNNBasic()
    elif flag=='KNNWithMeans':     # use mean-centered collaborative filtering
        return KNNWithMeans()
    elif flag=='KNNBaseline':      # use collaborative filtering with a baseline
        return KNNBaseline()
    elif flag=='SVD':              # use SVD
        return SVD()
    elif flag=='SVDpp':            # use SVD++
        return SVDpp()
    elif flag=='NMF':              # use NMF
        return NMF()
    else:
        return SVD()
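The name-to-algorithm selection can also be written as a lookup table instead of an if/elif chain. The following is only a sketch of that alternative: the three classes are empty stand-ins so that the snippet runs without Surprise installed, and `ALGORITHMS` and `init_model` are hypothetical names. In real use the dictionary values would be the Surprise classes of the same names.

```python
# Stand-in classes (assumption): placeholders for the Surprise algorithm classes
class NormalPredictor(object):
    pass


class KNNBasic(object):
    pass


class SVD(object):
    pass


# Map each flag string to the class it should instantiate
ALGORITHMS = {
    'NormalPredictor': NormalPredictor,
    'KNNBasic': KNNBasic,
    'SVD': SVD,
}


def init_model(flag='NormalPredictor'):
    """Return a new model for `flag`, falling back to SVD for unknown names."""
    return ALGORITHMS.get(flag, SVD)()


print(type(init_model('KNNBasic')).__name__)
print(type(init_model('unknown')).__name__)
```

With a table, supporting another algorithm means adding one dictionary entry, and the fallback to SVD stays in a single place.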

Recommendation system model building module:

from surprise import Dataset, Reader

def buildModel(data_path='rating.csv',data_format='user item rating',sep=',',flag='KNNBasic'):
    '''Build the recommendation system model'''
    reader=Reader(line_format=data_format,sep=sep)
    mydata=Dataset.load_from_file(data_path,reader=reader)
    train_set=mydata.build_full_trainset()   # train on the whole data set
    print('================model training================')
    model=initModel(flag=flag)
    model.fit(train_set)
    return model,mydata

Data set mapping building module:

import csv

def bookDataMapping(data_path='book.csv'):
    '''Load the raw 'ID,name' data and build the mapping dictionaries'''
    id_name_dic,name_id_dic={},{}
    with open(data_path) as f:
        for row in csv.reader(f):
            # column 0 holds the ID, column 10 holds the name
            id_name_dic[row[0]]=row[10]
            name_id_dic[row[10]]=row[0]
    return id_name_dic, name_id_dic
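As a small illustration of the two-way mapping, the sketch below builds both dictionaries from an in-memory CSV under Python 3. The helper `build_mappings` and its column parameters are hypothetical. The two sample rows reuse ID and title pairs that appear in this article's output, reduced to two columns for brevity.

```python
import csv
import io


def build_mappings(rows, id_col=0, name_col=1):
    """Build ID -> name and name -> ID dictionaries from parsed CSV rows."""
    id_to_name, name_to_id = {}, {}
    for row in rows:
        id_to_name[row[id_col]] = row[name_col]
        name_to_id[row[name_col]] = row[id_col]
    return id_to_name, name_to_id


# Two-column sample data standing in for the real file
sample_csv = io.StringIO('1239,Chronicle of a Death Foretold\n28,Lord of the Flies\n')
id_to_name, name_to_id = build_mappings(csv.reader(sample_csv))
print(id_to_name['1239'])
print(name_to_id['Lord of the Flies'])
```

Note that if two IDs share the same name, the name-to-ID dictionary keeps only the last one read.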

A simple call looks like this:

bookRecommendSystem(map_data='book.csv',train_data='RRR.csv',data_format='user item rating',sep=',',flag='KNNBasic',k=10)

By default, the KNN algorithm is used and 5-fold cross-validation is performed. The output is as follows:

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9211
MAE:  0.7108
FCP:  0.7038
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9211
MAE:  0.7093
FCP:  0.6996
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9234
MAE:  0.7133
FCP:  0.7010
------------
Fold 4
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9210
MAE:  0.7119
FCP:  0.7017
------------
Fold 5
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9268
MAE:  0.7167
FCP:  0.6983
------------
------------
Mean RMSE: 0.9227
Mean MAE : 0.7124
Mean FCP : 0.7009
------------
------------
==================model performace===================
defaultdict(<type 'list'>, {u'fcp': [0.703847835793307, 0.6995619798679573, 0.7009530691688108, 0.7017142119961722, 0.6982634284783771], u'mae': [0.7107821167817494, 0.7093057204220446, 0.7132818148803571, 0.7119004793330316, 0.7167381500990199], u'rmse': [0.921100168545926, 0.9210542860057216, 0.9234120678271927, 0.9209873056186509, 0.9267740800608146]})
Current user ID:1239
Current book name:Chronicle of a Death Foretold
Current user internal ID:537
(u'For user <', 'Chronicle of a Death Foretold', u'> the 10 closest books are:\n')
('Frostbite (Vampire Academy, #2)', '384')
('The Call of the Wild', '375')
('The Knife of Never Letting Go (Chaos Walking, #1)', '1050')
('The Neverending Story', '877')
('Lord of the Flies', '28')
('Olive Kitteridge', '930')
('Twenty Thousand Leagues Under the Sea', '699')
("1st to Die (Women's Murder Club, #1)", '336')
('The Big Short: Inside the Doomsday Machine', '985')
('The Black Echo (Harry Bosch, #1; Harry Bosch Universe, #1)', '902')
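To show what the numbers in this output mean, here is a plain-Python sketch of a 5-fold split, of RMSE and MAE, and of the averaging over folds. It does not use Surprise: the helper names are hypothetical, and the deterministic interleaved split is a simplification of a real cross-validation split, which would normally shuffle the data. FCP is left out to keep the sketch short. The per-fold lists are copied from the output above.

```python
import math


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def rmse(pairs):
    """Root mean squared error over (true, predicted) rating pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, predicted) rating pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def mean(values):
    return sum(values) / len(values)


splits = list(k_fold_indices(10, k=5))
pairs = [(4.0, 3.0), (2.0, 2.0), (5.0, 3.0), (1.0, 2.0)]  # made-up ratings
print(len(splits), rmse(pairs), mae(pairs))

# Per-fold values copied from the output above
fold_rmse = [0.921100168545926, 0.9210542860057216, 0.9234120678271927,
             0.9209873056186509, 0.9267740800608146]
fold_mae = [0.7107821167817494, 0.7093057204220446, 0.7132818148803571,
            0.7119004793330316, 0.7167381500990199]
print('Mean RMSE: %.4f' % mean(fold_rmse))
print('Mean MAE : %.4f' % mean(fold_mae))
```

The two means reproduce the "Mean RMSE" and "Mean MAE" lines of the output, which confirms that the summary is simply the average of the five folds.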

Next, I build a movie recommendation model using Surprise's built-in ml-100k data set.

First, construct the mapping dictionary of ID and name as follows:

def dataMapping(data='item.txt'):
    '''Build dictionaries mapping between IDs and names'''
    id_name_dict,name_id_dict={},{}
    with open(data) as f:
        # each line is '|'-separated: the ID comes first, the name second
        data_list=[one_line.strip().split('|') for one_line in f.readlines() if one_line.strip()]
    for one_list in data_list:
        id_name_dict[one_list[0]]=one_list[1]
        name_id_dict[one_list[1]]=one_list[0]
    return id_name_dict,name_id_dict
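One practical detail when reading this file under Python 3 is the text encoding. I believe the ml-100k item file is encoded as ISO-8859-1 rather than UTF-8, but treat that as an assumption and check your own copy. The sketch below decodes byte lines explicitly before splitting them. `parse_item_lines` is a hypothetical helper, the first sample row reuses an ID and title from this article's results with the remaining fields elided, and the second row is made up.

```python
def parse_item_lines(raw_lines, encoding='latin-1'):
    """Build ID -> name and name -> ID maps from '|'-separated byte lines."""
    id_to_name, name_to_id = {}, {}
    for raw in raw_lines:
        fields = raw.decode(encoding).strip().split('|')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        id_to_name[fields[0]] = fields[1]
        name_to_id[fields[1]] = fields[0]
    return id_to_name, name_to_id


sample = [
    b'242|Kolya (1996)|...\n',             # remaining fields elided
    b'9999|Caf\xe9 Example (1990)|...\n',  # made-up row with a non-ASCII byte
    b'\n',                                 # blank line, ignored
]
id_to_name, name_to_id = parse_item_lines(sample)
print(id_to_name['242'])
```

Decoding the same bytes as UTF-8 would fail on the second row, which is the kind of error an explicit encoding avoids.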

The rest of the recommendation work is similar to the book recommendation, so I won’t explain it in detail and will go straight to the code:

from surprise import Dataset, KNNBasic

def movieRecommendSystem():
    '''Movie recommendation system'''
    movie_data=Dataset.load_builtin('ml-100k')
    trainset=movie_data.build_full_trainset()
    # KNNBasic() computes user-user similarities by default;
    # pass sim_options={'user_based': False} for item-item similarities
    algo=KNNBasic()
    algo.fit(trainset)
    id_name_dict,name_id_dict=dataMapping(data='item.txt')
    # Movie recommendation, taking the movie Army of Darkness (1993) as the example
    raw_id=name_id_dict['Army of Darkness (1993)']
    inner_id=algo.trainset.to_inner_iid(raw_id)
    # Movies recommended by the model (10 by default)
    neighbors=algo.get_neighbors(inner_id,10)
    res_ids=[algo.trainset.to_raw_iid(_id) for _id in neighbors]
    movies=[id_name_dict[raw_id] for raw_id in res_ids]
    print(u'========================10 most similar movies:========================')
    for movie in movies:
        print(name_id_dict[movie]+' ==========> '+movie)

Recommended results are as follows:

Done computing similarity matrix.
========================10 most similar movies:========================
242 ==========> Kolya (1996)
486 ==========> Sabrina (1954)
88 ==========> Sleepless in Seattle (1993)
603 ==========> Rear Window (1954)
20 ==========> Angels and Insects (1995)
479 ==========> Vertigo (1958)
1336 ==========> Kazaam (1996)
673 ==========> Cape Fear (1962)
568 ==========> Speed (1994)
623 ==========> Angels in the Outfield (1994)
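The "msd similarity matrix" mentioned in the output refers to mean squared difference similarity. A minimal sketch of that idea and of picking the nearest neighbors follows. It assumes the common definition sim = 1 / (1 + MSD), computed over the ratings two entries have in common. The helper names and the toy ratings are made up, and refinements such as a minimum number of common ratings are ignored.

```python
def msd_similarity(ratings_a, ratings_b):
    """Return 1 / (1 + mean squared difference) over shared keys, or 0.0 if none."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    msd = sum((ratings_a[key] - ratings_b[key]) ** 2 for key in common) / len(common)
    return 1.0 / (1.0 + msd)


def nearest_neighbors(target, all_ratings, k=10):
    """Return the k entries most similar to `target`, most similar first."""
    others = [(name, msd_similarity(all_ratings[target], ratings))
              for name, ratings in all_ratings.items() if name != target]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in others[:k]]


# Made-up ratings: movie -> {user: rating}
toy = {
    'A': {'u1': 5, 'u2': 3, 'u3': 4},
    'B': {'u1': 5, 'u2': 3},            # agrees with A on every shared user
    'C': {'u1': 1, 'u2': 5, 'u3': 2},   # disagrees strongly with A
    'D': {'u4': 4},                     # shares no user with A
}
print(nearest_neighbors('A', toy, k=2))
```

Here B is the closest neighbor of A because their shared ratings are identical, C follows with a much lower similarity, and D scores zero because it has no rating in common with A.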

That concludes this article. I am glad to review what I have learned and to write something to share. If you found the content enlightening or helpful, I would appreciate your encouragement and support. Thank you!
