Overview

Previously we computed similarity from user vectors and item vectors, but with a sparse matrix many entries are missing (None), so some similarities simply cannot be calculated. For this reason that approach is rarely used in a production environment.

Baseline

This is collaborative filtering based on a regression model. If we treat ratings as continuous rather than discrete values, we can use linear regression to predict how a target user will rate an item. One such implementation strategy is called Baseline.

Baseline design idea

  • Some users generally give higher ratings than others, and some generally give lower ratings. For example, some users are naturally generous and easy-going with praise, while others are more demanding and rarely rate anything above 3 out of 5.

  • Some items are generally rated higher than others, and some are generally rated lower. For example, an item's quality is largely determined by how it was made, and some items are simply more popular than others.

    The amount by which a user's or an item's ratings sit generally above or below the overall average is called the bias (a rough numeric sketch follows below).
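As a quick illustration of these biases, here is a toy example with made-up ratings; note that the Baseline model below learns its biases by optimization rather than computing them this way.

import pandas as pd

# Toy ratings table (made-up numbers) to illustrate user and item bias.
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [5.0, 4.0, 2.0, 3.0, 4.5],
})

global_mean = ratings["rating"].mean()
# how far each user's average rating sits above or below the global mean
user_bias = ratings.groupby("userId")["rating"].mean() - global_mean
# how far each item's average rating sits above or below the global mean
item_bias = ratings.groupby("movieId")["rating"].mean() - global_mean
print(user_bias)
print(item_bias)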

Baseline objectives

  • Find the bias value b_u by which each user's ratings are generally higher or lower than other users'

  • Find the bias value b_i by which each item's ratings are generally higher or lower than other items'

  • Our goal is to find the optimal b_u and b_i


Here r_{ui} is user u's rating of item i, \hat{r}_{ui} is the predicted rating, and \mu is the mean of all ratings. The prediction and the squared-error cost are:

\hat{r}_{ui} = \mu + b_u + b_i

Cost = \sum_{u,i \in R} (r_{ui} - \hat{r}_{ui})^2 = \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i)^2
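As a quick illustration with made-up numbers: if the overall average is \mu = 3.5, user u tends to rate 0.5 above average (b_u = 0.5), and item i tends to be rated 0.3 below average (b_i = -0.3), then the predicted rating is \hat{r}_{ui} = 3.5 + 0.5 - 0.3 = 3.7.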

To prevent overfitting, add a regularization term:


Cost = \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i)^2 + \lambda \left( \sum_u b_u^2 + \sum_i b_i^2 \right)
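For intuition, the regularized cost can be evaluated directly for a given set of bias estimates. A minimal sketch (baseline_cost and its arguments are illustrative names, not part of the implementation later in this post):

# Illustrative sketch: evaluate the regularized cost for given bias estimates.
def baseline_cost(ratings, global_mean, bu, bi, reg):
    # ratings: iterable of (uid, iid, rating); bu, bi: dicts mapping ids to biases
    cost = 0.0
    for uid, iid, r in ratings:
        cost += (r - global_mean - bu[uid] - bi[iid]) ** 2
    # the regularization term penalizes large biases
    cost += reg * (sum(b ** 2 for b in bu.values()) + sum(b ** 2 for b in bi.values()))
    return cost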

Gradient descent

The cost above is a function of the two parameter sets b_u and b_i, which we can minimize with gradient descent:


J(\theta) = f(b_u, b_i)

J(\theta) := J(\theta) - \alpha \nabla J(\theta)

\frac{\partial}{\partial b_u} J(\theta) = \frac{\partial}{\partial b_u} f(b_u, b_i) = 2 \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i) \cdot (-1) + 2\lambda \sum_u b_u

\frac{\partial}{\partial b_i} J(\theta) = \frac{\partial}{\partial b_i} f(b_u, b_i) = 2 \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i) \cdot (-1) + 2\lambda \sum_i b_i

Plugging these into the gradient descent update rule:


b_u := b_u - \alpha \frac{\partial}{\partial b_u} J(\theta) = b_u - \alpha \left( 2 \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i) \cdot (-1) + 2\lambda \sum_u b_u \right)

Since \alpha is chosen manually, the constant factor 2 can be absorbed into it, and the updates simplify to:

b_u := b_u + \alpha \left( \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i) - \lambda \sum_u b_u \right)

b_i := b_i + \alpha \left( \sum_{u,i \in R} (r_{ui} - \mu - b_u - b_i) - \lambda \sum_i b_i \right)
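For reference, one full-batch pass of these simplified updates could look roughly like the sketch below. This is illustrative only; gradient_descent_epoch and its arguments are made-up names, and the implementation later in this post uses stochastic gradient descent instead.

# Illustrative sketch of one full-batch gradient descent step over all ratings.
def gradient_descent_epoch(ratings, global_mean, bu, bi, alpha, reg):
    # start each gradient with the regularization term, then subtract the error sums
    grad_bu = {uid: reg * b for uid, b in bu.items()}
    grad_bi = {iid: reg * b for iid, b in bi.items()}
    for uid, iid, r in ratings:
        err = r - global_mean - bu[uid] - bi[iid]
        grad_bu[uid] -= err
        grad_bi[iid] -= err
    # apply the simplified update: b := b + alpha * (sum of errors - reg * b)
    for uid in bu:
        bu[uid] -= alpha * grad_bu[uid]
    for iid in bi:
        bi[iid] -= alpha * grad_bi[iid]
    return bu, bi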

The formulas above require summing the error between the actual and predicted ratings over all ratings on every update step. Instead, we can use stochastic gradient descent.

Stochastic gradient descent updates the parameters using the loss of a single sample at a time, rather than computing the total loss over all samples, so the updates become:


error = r_{ui} - \hat{r}_{ui} = r_{ui} - \mu - b_u - b_i

b_u := b_u + \alpha \left( (r_{ui} - \mu - b_u - b_i) - \lambda b_u \right)

b_i := b_i + \alpha \left( (r_{ui} - \mu - b_u - b_i) - \lambda b_i \right)

Algorithm implementation

# Data files can be downloaded from https://grouplens.org/datasets/movielens/
import pandas as pd
import numpy as np
from scipy.interpolate import make_interp_spline


class BaselineCFBySGD(object):

    def __init__(self, number_epochs, alpha, reg, columns=["uid", "iid", "rating"]):
        # number of training epochs
        self.number_epochs = number_epochs
        # learning rate (gradient descent step size)
        self.alpha = alpha
        # regularization coefficient
        self.reg = reg
        # column names of the dataset: user id, item id, rating
        self.columns = columns

    def fit(self, dataset):
        """
        :param dataset: DataFrame with columns uid, iid, rating
        """
        self.dataset = dataset
        # ratings grouped by user: list of item ids and ratings per user
        self.users_ratings = dataset.groupby(self.columns[0]).agg([list])[[self.columns[1], self.columns[2]]]
        # ratings grouped by item: list of user ids and ratings per item
        self.items_ratings = dataset.groupby(self.columns[1]).agg([list])[[self.columns[0], self.columns[2]]]
        # global mean of all ratings
        self.global_mean = self.dataset[self.columns[2]].mean()
        # learn the user and item biases with stochastic gradient descent
        self.bu, self.bi = self.sgd()
        self.show_chart()

    def sgd(self):
        """
        Optimize bu and bi with stochastic gradient descent.
        :return: bu, bi
        """
        # initialize bu and bi to zero
        bu = dict(zip(self.users_ratings.index, np.zeros(len(self.users_ratings))))
        bi = dict(zip(self.items_ratings.index, np.zeros(len(self.items_ratings))))

        for i in range(self.number_epochs):
            print("iter %d" % i)
            for uid, iid, real_rating in self.dataset.itertuples(index=False):
                error = real_rating - (self.global_mean + bu[uid] + bi[iid])
                bu[uid] += self.alpha * (error - self.reg * bu[uid])
                bi[iid] += self.alpha * (error - self.reg * bi[iid])
        return bu, bi

    def predict(self, uid, iid):
        predict_rating = self.global_mean + self.bu[uid] + self.bi[iid]
        return predict_rating

    def show_chart(self):
        import matplotlib.pyplot as plt
        plt.rcParams['font.sans-serif'] = ['SimHei']   # font settings kept from the original
        plt.rcParams['axes.unicode_minus'] = False

        # pick one user and compare the real ratings with the predicted ratings
        userId = self.users_ratings.index.values[10]
        movie_ids = []
        real_ratings = []
        for uid, iid, real_rating in self.dataset.itertuples(index=False):
            if uid == userId:
                movie_ids.append(iid)
                real_ratings.append(real_rating)

        # smooth curve through the real ratings (movie ids must be increasing for the spline)
        model = make_interp_spline(movie_ids, real_ratings)
        xs = np.linspace(min(movie_ids), max(movie_ids), 1000)
        ys = model(xs)
        plt.plot(xs, ys, color='green', label='real ratings')

        # smooth curve through the predicted ratings for the same movies
        predict_ratings = []
        for iid in movie_ids:
            predict_ratings.append(self.predict(userId, iid))
        model = make_interp_spline(movie_ids, predict_ratings)
        xs = np.linspace(min(movie_ids), max(movie_ids), 1000)
        ys = model(xs)
        plt.plot(xs, ys, color='red', label='predicted ratings')

        plt.xticks([])
        plt.yticks([])
        plt.show()


if __name__ == '__main__':
    dtype = [("userId", np.int32), ("movieId", np.int32), ("rating", np.float32)]
    dataset = pd.read_csv("datasets/ml-latest-small/ratings.csv", usecols=range(3), dtype=dict(dtype))

    bcf = BaselineCFBySGD(10, 0.1, 0.1, ["userId", "movieId", "rating"])
    bcf.fit(dataset)

    # while True:
    #     uid = int(input("uid: "))
    #     iid = int(input("iid: "))
    #     print(bcf.predict(uid, iid))
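One practical caveat: predict raises a KeyError for any user or item id that did not appear in the training data. A minimal fallback (predict_safe is a hypothetical helper, not part of the original script) is to treat unknown users or items as having zero bias:

# Hypothetical helper, not part of BaselineCFBySGD: fall back to zero bias for unseen ids.
def predict_safe(model, uid, iid):
    bu = model.bu.get(uid, 0.0)  # unknown user -> assume zero bias
    bi = model.bi.get(iid, 0.0)  # unknown item -> assume zero bias
    return model.global_mean + bu + bi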