Matrix Factorization is one of the most commonly used collaborative filtering algorithms for recommendation systems. This family of algorithms used to be the mainstay of recommendation systems and remains useful in many settings today. Here, we trace the line of thinking from the most basic Matrix Factorization (MF) algorithm to its derived Probabilistic Matrix Factorization (PMF). For the detailed solution procedures, refer to the two earlier blog posts on matrix-factorization-based recommendation and on probabilistic matrix factorization.

Matrix Factorization

In short, MF can be viewed as a solution to the recommendation problem under the guidance of the Latent Factor Model, which is itself a branch of the collaborative filtering family. We introduced collaborative filtering in a previous article.

In the discussion of collaborative filtering, we mentioned the user-item preference matrix. The main idea of the Latent Factor Model is that there exist some unobservable hidden variables representing user preferences: a user's tastes can be fully described by these hidden variables, and the same hidden variables in turn determine the user's preference for each item. Expressed in terms of the user-item preference matrix, this means we can decompose the matrix into the product of two smaller matrices.

Let's say we have $N$ users, $M$ items, and a user-item preference matrix $R \in \mathbb{R}^{N \times M}$. Let $U \in \mathbb{R}^{D \times N}$ denote the matrix of user latent factors and $V \in \mathbb{R}^{D \times M}$ the matrix of item latent factors, where $D$ is the number of latent factors. Under the latent factor model assumption, the matrix factorization algorithm can be expressed as $R \approx U^\top V$. Specifically, for each entry of $R$, there are vectors $U_i$ and $V_j$ such that $R_{ij} \approx U_i^\top V_j$, i.e., the preference of user $i$ for item $j$ is the inner product of their latent factor vectors.
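
As a minimal sketch of these shapes (the toy dimensions and random initialization below are illustrative choices, not part of the algorithm):

```python
import numpy as np

N, M, D = 4, 5, 2          # users, items, latent factors (toy values)
U = np.random.randn(D, N)  # user latent-factor matrix
V = np.random.randn(D, M)  # item latent-factor matrix

R_hat = U.T @ V            # reconstructed preference matrix, shape (N, M)
# A single predicted preference is the inner product of the factor vectors:
i, j = 1, 3
assert np.isclose(R_hat[i, j], U[:, i] @ V[:, j])
```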

When we solve this problem, the loss function is

$$E = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}\left(R_{ij} - U_i^\top V_j\right)^2 + \frac{\lambda_U}{2}\sum_{i=1}^{N}\|U_i\|^2 + \frac{\lambda_V}{2}\sum_{j=1}^{M}\|V_j\|^2,$$

where $I_{ij}$ is an indicator that equals 1 if user $i$ rated item $j$ and 0 otherwise, and $\lambda_U$, $\lambda_V$ are regularization coefficients.

Take the negative gradients of the loss function; then

$$-\frac{\partial E}{\partial U_i} = \sum_{j=1}^{M} I_{ij}\left(R_{ij} - U_i^\top V_j\right)V_j - \lambda_U U_i,$$

$$-\frac{\partial E}{\partial V_j} = \sum_{i=1}^{N} I_{ij}\left(R_{ij} - U_i^\top V_j\right)U_i - \lambda_V V_j.$$

Using these two negative gradient formulas, we can carry out the matrix factorization by gradient descent, as sketched below.
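
Here is a minimal NumPy sketch of this procedure, assuming a rating matrix in which 0 marks an unobserved entry (that encoding, the function name, and the default hyperparameters are choices of this sketch, not taken from the referenced posts):

```python
import numpy as np

def mf_gradient_descent(R, D=10, lr=0.01, lam=0.02, epochs=2000):
    """Factorize R (N x M) into U (D x N) and V (D x M) by gradient descent.

    R:   rating matrix where 0 marks an unobserved entry (sketch assumption).
    lam: regularization coefficient (here lambda_U = lambda_V = lam).
    """
    N, M = R.shape
    I = (R > 0).astype(float)          # indicator matrix I_ij
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((D, N))
    V = 0.1 * rng.standard_normal((D, M))
    for _ in range(epochs):
        E = I * (R - U.T @ V)          # residuals on observed entries only
        U += lr * (V @ E.T - lam * U)  # step along the negative gradients
        V += lr * (U @ E - lam * V)
    return U, V

# Toy usage: the missing (zero) entries get filled in by the reconstruction.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
U, V = mf_gradient_descent(R, D=2)
print(np.round(U.T @ V, 2))
```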

Probabilistic Matrix Factorization

Above is the basic MF algorithm. PMF adds Gaussian probability assumptions on top of MF. The purpose is to handle users with few ratings, for whom accurate results are hard to obtain; that is, it aims to alleviate the user cold-start problem. It turns the formula above into a conditional distribution over the observed rating matrix:

$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N}\prod_{j=1}^{M}\left[\mathcal{N}\!\left(R_{ij} \mid U_i^\top V_j, \sigma^2\right)\right]^{I_{ij}},$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ denotes the Gaussian density with mean $\mu$ and variance $\sigma^2$.

Meanwhile, zero-mean spherical Gaussian priors are assumed on the user and item feature vectors:

$$p(U \mid \sigma_U^2) = \prod_{i=1}^{N}\mathcal{N}\!\left(U_i \mid 0, \sigma_U^2\mathbf{I}\right), \qquad p(V \mid \sigma_V^2) = \prod_{j=1}^{M}\mathcal{N}\!\left(V_j \mid 0, \sigma_V^2\mathbf{I}\right).$$

Then the log of their joint posterior distribution can be expressed as

$$\ln p(U, V \mid R, \sigma^2, \sigma_U^2, \sigma_V^2) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}\left(R_{ij} - U_i^\top V_j\right)^2 - \frac{1}{2\sigma_U^2}\sum_{i=1}^{N}\|U_i\|^2 - \frac{1}{2\sigma_V^2}\sum_{j=1}^{M}\|V_j\|^2 + C,$$

where $C$ collects terms that do not depend on $U$ or $V$. Maximizing this posterior with fixed hyperparameters is therefore equivalent to minimizing the regularized MF loss above with $\lambda_U = \sigma^2/\sigma_U^2$ and $\lambda_V = \sigma^2/\sigma_V^2$.

The actual model used in the original paper is slightly more involved, because the mean is passed through a logistic (sigmoid) function $g(x) = 1/(1 + e^{-x})$ to bound the predictions:

$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N}\prod_{j=1}^{M}\left[\mathcal{N}\!\left(R_{ij} \mid g\!\left(U_i^\top V_j\right), \sigma^2\right)\right]^{I_{ij}},$$

with the observed ratings $1, \dots, K$ first mapped to $[0, 1]$ via $t(r) = (r - 1)/(K - 1)$.
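
Below is a sketch of MAP training for this sigmoid variant, with the regularization weights standing in for $\sigma^2/\sigma_U^2$ and $\sigma^2/\sigma_V^2$ (the function names, the 0-means-unobserved encoding, and the default values are assumptions of this sketch, not the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pmf_map(R, K=5, D=10, lr=0.1, lam_u=0.01, lam_v=0.01, epochs=2000):
    """MAP estimation for PMF with a logistic mean function.

    R:            rating matrix, 0 marks an unobserved entry (sketch assumption).
    K:            number of rating levels; ratings 1..K are mapped to [0, 1].
    lam_u, lam_v: stand-ins for sigma^2/sigma_U^2 and sigma^2/sigma_V^2.
    """
    N, M = R.shape
    I = (R > 0).astype(float)
    T = np.where(R > 0, (R - 1.0) / (K - 1.0), 0.0)  # t(r) = (r-1)/(K-1)
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((D, N))
    V = 0.1 * rng.standard_normal((D, M))
    for _ in range(epochs):
        G = sigmoid(U.T @ V)             # g(U_i^T V_j) for all pairs
        E = I * (T - G) * G * (1.0 - G)  # residual times sigmoid slope g'(x)
        U += lr * (V @ E.T - lam_u * U)  # ascend the log posterior
        V += lr * (U @ E - lam_v * V)
    return U, V
```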

Hyperparameter Estimation

If we want to estimate $U$, $V$, and the hyperparameters at the same time, the posterior above becomes

$$p(U, V, \sigma^2, \Theta_U, \Theta_V \mid R) \propto p(R \mid U, V, \sigma^2)\, p(U \mid \Theta_U)\, p(V \mid \Theta_V)\, p(\Theta_U)\, p(\Theta_V),$$

where $\Theta_U$ and $\Theta_V$ are the parameters of the priors on $U$ and $V$.

The maximum a posteriori estimate at this point can be found with the EM algorithm, alternating between updating $U$ and $V$ with the hyperparameters fixed and updating the hyperparameters with $U$ and $V$ fixed.
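
Assuming spherical Gaussian priors and flat hyperpriors, the hyperparameter half of that alternation has closed-form updates; a sketch (not the paper's exact scheme, which also covers full covariances and mixture priors):

```python
import numpy as np

def update_hyperparameters(U, V, T, I, G):
    """Closed-form hyperparameter updates given point estimates of U and V.

    Assumes spherical Gaussian priors and flat hyperpriors (sketch assumption).
    U: D x N user factors, V: D x M item factors,
    T: (rescaled) observed ratings, I: indicator matrix, G: predictions g(U^T V).
    """
    D, N = U.shape
    _, M = V.shape
    sigma_U2 = np.sum(U ** 2) / (N * D)            # MLE of user prior variance
    sigma_V2 = np.sum(V ** 2) / (M * D)            # MLE of item prior variance
    sigma2 = np.sum(I * (T - G) ** 2) / np.sum(I)  # noise variance, observed only
    return sigma2, sigma_U2, sigma_V2
```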