0. Preface

Recommendation-system practitioners will be familiar with the FM (Factorization Machines) model. A long line of optimized models based on FM has been proposed in both industry and academia, and these models are still widely used in many scenarios. This article walks you through the FM model and explores its principles.

Key takeaways:

  1. Compared with LR, FM introduces second-order feature interactions
  2. The interaction parameters are estimated via latent vectors, as in matrix factorization, which alleviates the feature-sparsity problem and greatly reduces the number of parameters
  3. Compared with Matrix Factorization (MF), FM can incorporate features beyond user and item IDs, giving it a wider range of applications

Paper:

www.csie.ntu.edu.tw/~b97053/pap…

1. Background

In the early days of personalized recommendation systems, collaborative filtering was widely used. It makes personalized recommendations based on the idea that similar items attract the same audience, or that similar users share the same preferences. To address the sparsity of the user-item (UI) matrix in collaborative filtering, MF was proposed to estimate the scores between users and items that have never interacted. However, collaborative filtering only considers user-item interaction signals; the features of the users and items themselves cannot be used effectively by the model. To make full use of such features, LR (Linear Regression) models were applied to recommendation scenarios. But LR is, as is well known, a linear model that cannot fit second-order or higher-order feature interactions. To solve this problem while keeping the model practical to deploy, the FM model was born.

2. Model architecture

Let’s start by looking at a typical set of features in a recommendation scenario, as shown in the figure below.

After encoding the ID-type features (e.g. one-hot encoding), we obtain a new set of features, shown below.

As mentioned above, if the first-order linear model LR is used to fit these features, it can be expressed as


$$y = w_{0}+\sum_{i=1}^{n}w_{i}x_{i}$$

This LR only fits the first-order features of the samples and ignores second-order features. However, second-order combination features, such as "gender = male, color = blue", can be very meaningful. An LR that also considers second-order combinations can be expressed as


$$y = w_{0}+\sum_{i=1}^{n}w_{i}x_{i}+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}w_{ij}x_{i}x_{j}$$

Compared with first-order LR, second-order LR adds second-order features: all features are combined in pairs and enumerated. This raises two problems:

  1. The number of second-order parameters is on the order of $n^{2}$, so the parameter count grows quadratically with the number of features.
  2. Learning difficulties caused by sample sparsity. After encoding, most entries of a categorical feature are 0, and the product with any other feature is then also 0, so the combined features are extremely sparse. During LR training, the weight $w_{ij}$ of a combined feature can only be updated on samples where both $x_{i}$ and $x_{j}$ are nonzero, so the weights of rarely co-occurring pairs are almost never learned.
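To make the two problems concrete, here is a minimal sketch of second-order LR with an explicit pairwise weight matrix (the function names and toy data are my own, for illustration). Note that the upper triangle of `W2` alone holds $n(n-1)/2$ free parameters, and `W2[i, j]` contributes to the output, and hence receives a gradient, only when both `x[i]` and `x[j]` are nonzero in a sample.

```python
import numpy as np

def lr2_predict(x, w0, w, W2):
    """Second-order LR: y = w0 + sum_i w_i x_i + sum_{i<j} w_ij x_i x_j."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n - 1):
        for j in range(i + 1, n):
            y += W2[i, j] * x[i] * x[j]
    return y

n = 6                              # number of (encoded) features
rng = np.random.default_rng(0)
w0, w = 0.1, rng.normal(size=n)
W2 = rng.normal(size=(n, n))       # only the upper triangle (i < j) is used

x = np.array([1.0, 0, 0, 1.0, 0, 0])   # a sparse one-hot sample
print(lr2_predict(x, w0, w, W2))

# The pairwise parameter count grows quadratically with n:
print(n * (n - 1) // 2)            # -> 15 free second-order weights for n = 6
```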

The FM model solves both of these problems while still performing second-order feature combination, which is why it is so widely used in all kinds of second-order feature-interaction scenarios. The FM model takes the following form


$$y = w_{0}+\sum_{i=1}^{n}w_{i}x_{i}+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\langle v_{i}, v_{j}\rangle x_{i}x_{j}$$

As you can see, apart from the second-order term, the formula is identical to second-order LR. Let’s look at this term in detail.

The second-order term of FM still enumerates pairwise feature crosses, with one key difference: the parameter of each cross feature changes from a scalar $w_{ij}$ to the dot product of two vectors, $\langle v_{i}, v_{j}\rangle$. As shown below, each first-order feature $x_{i}$ has its own corresponding latent vector $v_{i}$.


$$V_{n\times k}= \begin{bmatrix} a_{11}& a_{12}& \cdots & a_{1k}\\ a_{21}& a_{22}& \cdots & a_{2k}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}& a_{n2}& \cdots & a_{nk} \end{bmatrix} =\begin{bmatrix} v_{1}\\ v_{2}\\ \vdots\\ v_{n} \end{bmatrix}$$

As can be seen, $v_{i}$ and $v_{j}$ are vectors of dimension $k$, where $k$ is a hyperparameter chosen by the practitioner. The number of second-order parameters in FM is therefore $n \times k$, the size of the matrix above, where $n$ is the number of first-order features and $k$ is the vector length ($k \ll n$). Compared with the roughly $n^{2}$ parameters of second-order LR, FM greatly reduces the number of model parameters, solving the first problem above. Moreover, since each first-order feature has its own latent vector, each vector is updated through the samples it shares with every other feature, not through a single dedicated weight.

For example, even if $x_{i}$ and $x_{j}$ never appear together in any sample, their interaction weight $\langle v_{i}, v_{j}\rangle$ can still be learned: $v_{i}$ is trained on samples where $x_{i}$ co-occurs with some $x_{k}$ ($k \neq i, j$), so it no longer depends on $v_{j}$; likewise, $v_{j}$ is trained through its co-occurrences with other features and no longer depends on $v_{i}$.

Thus, the FM model solves the problem of learning and updating parameters on sparse samples.
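Putting the pieces together, here is a minimal NumPy sketch of FM inference (variable names and toy data are my own). It computes the second-order term both with the naive double loop from the formula above and with the equivalent $O(nk)$ identity from the FM paper, $\frac{1}{2}\sum_{f}\big[(\sum_{i} v_{if}x_{i})^{2} - \sum_{i} v_{if}^{2}x_{i}^{2}\big]$, and checks that the two agree.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j (O(n^2 k))."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n - 1):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """Same model; second-order term in O(n*k) via the sum-of-squares trick."""
    s = V.T @ x                      # shape (k,): sum_i v_i * x_i
    s2 = (V ** 2).T @ (x ** 2)       # shape (k,): sum_i v_i^2 * x_i^2
    return w0 + w @ x + 0.5 * np.sum(s * s - s2)

n, k = 6, 3                          # n features, latent dimension k << n
rng = np.random.default_rng(42)
w0, w = 0.1, rng.normal(size=n)
V = rng.normal(size=(n, k))          # one k-dimensional latent vector per feature

x = np.array([1.0, 0, 0, 1.0, 0, 1.0])   # a sparse sample
print(fm_predict_naive(x, w0, w, V))
print(np.isclose(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V)))  # -> True
```

The linear-time form is what makes FM practical at scale: both prediction and SGD gradients can be computed in time proportional to the number of nonzero features times $k$, rather than enumerating all pairs.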

3. Summary

As a classic model in recommendation systems, FM solves the parameter-explosion and sparse-sample update problems that second-order feature crossing causes in the traditional LR model, so it is widely used in a variety of recommendation scenarios. Following FM, academia and industry have continued to optimize it from different angles, producing a number of excellent models for recommendation scenarios.