Recommendation system based on matrix decomposition algorithm

Recommendation system

The recommendation system can recommend different things to users according to their preferences.

Recommended system type:

  1. Set the recommended content manually
  2. Rank items by sales, exposure, etc., and recommend them to users
  3. According to different algorithms, integrate data of different dimensions to recommend items intelligently

Simple recommendation system model

Set:

U stands for all users

P is the collection of all items

R is the user’s preference for the item

Model(R) = U * P

Algorithm core:

Through the user’s rating of different items, to predict the user’s preference for other items. The attributes of users and items, such as age, gender, education background, job, price, category and appearance, are not considered here.

Through the user to the item score, can establish a recommendation value matrix, then can calculate the matrix to predict the user’s preference, that is, matrix decomposition algorithm!

Matrix decomposition:

The recommended value matrix R is decomposed into matrix U and matrix P, so that the product of U and P results in elements in the new matrix R* that are very close to the values of known elements in R, so the values in R* corresponding to unknown elements in R are the predicted values.

Recommended value matrix:

A brief history of time Wanli thirty years Daqin empire A dream of red mansions Mathematics a brief history of
Xiao Ming 1 4 1
wang 2 2 4
Xiao li 4 1 4
Xiao zhang, 5 1 4

Key issues of recommendation value matrix:

  1. Initial value acquisition, data collection
  2. Predict unknown data from known data in the recommendation value matrix
  3. Establish an evaluation system to verify the effectiveness of the recommendation system

To collect data

Generally, we can adopt the way of web crawler, for example, to score data, we can crawl the data on Douban reading, and we can also collect user information by burying points on websites that we can control.

Predicting unknown data

Key challenges:

  • When the number of users and items are relatively large, usually recommend the matrix is a sparse matrix (in the matrix, the number of elements if a value of 0 is far more than the number of non-zero elements and non-zero element distribution without law, says that the matrix of the sparse matrix), shows that most users may not express preferences for most of the items.

  • Cold start is a problem that every recommendation system needs to face.

Example of matrix decomposition:


That is:

By comparing the leftmost element matrix with the rightmost prediction matrix, the element value in the missing value position of the original matrix is the predicted value.

And you can also get

That is, the preference data of the item at position ij can be represented by the portrait vector of the ith user and the portrait vector of the JTH item.

The graphical representation is as follows:

The mathematical meaning of K is the rank of matrix decomposition, and the business meaning is the k factors that affect the user’s rating of items. At present, we cannot directly know the value of K, and generally adopt the way of cross verification to find the optimal value of K during model training.

We can use the sum variance as the loss function

This is done by using the {}, calculate the “and variance” to minimize it, that is, the closer the predicted value is to the true value. That gives us the values of U and P that we need.

The gradient of the loss function

Separate extraction error

The derivative of the error L on U and P can be obtained

Now that we know the gradient (derivative) of the loss function, we can use gradient descent to solve for U and P.

Gradient descent method

The optimal solution can be obtained by randomly selecting a starting point and continuing training in the direction of negative gradient until the gradient of the loss function is closer and closer to zero.

Introduce regularization

In order to prevent over-fitting, regularization parameters are added to the loss function

Lambda > 0

In this way, when U and P are both guaranteed to be relatively small, and the value of U or P changes dramatically, the dot product of U and P will not change much.

The final loss function is:

+

The gradient of the final loss function is:

The gradient descent method is used to find the optimal solution

The gradient descent rate γ (learning rate) and k value were set, and U and P were randomly initialized. The training was repeated until the error was satisfied.

Evaluation and recommendation system

  • Basically, we train the model through the training set, test the model through the test set, and if the model’s performance on the test set meets our expectations, the model can be deployed online. The average absolute deviation is generally used to verify the predicted value of the model

N: The total number of recommended values in the test set

: Recommended value of item P by the real user U

: The predicted user u’s recommended value for item P

  • Online A/B testing

The project of actual combat

The data set format is as follows:

1	1119	9.000000
1	167	8.000000
1	6265	8.000000
1	1440	9.000000
1	1427	9.000000
1	5404	8.000000
1	259	7.000000
1	4156	8.000000
2	419	9.000000
2	415	10.000000
2	2834	9.000000
2	228	10.000000
2	107	10.000000
2	440	9.000000
2	44	10.000000
2	455	10.000000
Copy the code

The first column is user ID, the second column is item ID, and the third column is the corresponding score (1-10)

The overall code is based on the Surprise library and can be installed first

pip install scikit-surprise
Copy the code

Let’s import the related libraries and datasets

import numpy as np
import surprise
from surprise import BaselineOnly
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split


reader = Reader(line_format='user item rating', sep='\t', rating_scale=(1.10))
data = Dataset.load_from_file('book_ratings.dat.txt', reader=reader)
# Randomly divide the data into training and test data sets
trainset, testset = train_test_split(data, test_size=25.)
Copy the code

According to the formula, define the algorithm function

class MatrixFactorization(surprise.AlgoBase):
    def __init__(self, lr, n_epochs, n_factors, lmd):
        self.lr = lr  # Learning rate of gradient descent
        self.n_epochs = n_epochs  # Number of iterations of gradient descent method
        self.n_factors = n_factors  # The rank of the decomposed matrix, which is the hidden factor that affects the user's rating
        self.lmd = lmd  # regularize parameters
    
    def fit(self, trainset):
        print("Fitting data...")
        Initialize u and p matrices randomly
        u = np.random.normal(0.1., (trainset.n_users, self.n_factors))  # mean 0, variance 0.1, (number of rows, number of columns)
        p = np.random.normal(0.1., (trainset.n_items, self.n_factors))
        
        Gradient descent
        for _ in range(self.n_epochs):
            print("Round:", _)
            for i, j, r_ij in trainset.all_ratings():
                # Here is just applying the formula above
                # u_old[i] = u[i]
                err = r_ij - np.dot(u[i], p[j])
                u[i] -= -self.lr * err * p[j] + self.lr * self.lmd * u[i]
                p[j] -= -self.lr * err * u[i] + self.lr * self.lmd * p[j]
        
        self.u, self.p = u, p
        self.trainset = trainset
        print("End fitting!")
        
    def estimate(self, i, j):
        if self.trainset.knows_user(i) and self.trainset.knows_item(j):
            return np.dot(self.u[i], self.p[j])
        else:
            return self.trainset.global_mean  # return average value
Copy the code

Finally, train, predict and evaluate

Algo = MatrixFactorization(0.005, 60, 3, 0.2) Algo.fit (trainset) Predictions = Algo.test (testset) accuracy. Mae (Predictions)Copy the code

We can adjust the learning rate, the number of iterations, the number of hidden factors and regularization parameters to train different models and evaluate the results to obtain satisfactory models.