0. Preface

After covering the DeepFM model last time, this article introduces another line of work that combines the FM model with neural networks: NFM (Neural Factorization Machine). Let's look at how NFM differs from, and improves on, the FM and DeepFM models.

Personal takeaway:

  1. The Bi-Interaction pooling layer, a feature-crossing layer based on the element-wise product

Paper link:

Arxiv.org/pdf/1708.05…

1. Background

DeepFM successfully introduced neural networks into the FM model and achieved strong results. Around the same time, researchers in the recommendation field proposed the NFM model, which likewise brings the idea of explicit feature crossing into a neural network. Like DeepFM, NFM follows a similar two-part structure: a shallow part and a DNN part. The DNN part of NFM is described in detail below.

2. Model architecture

The NFM model architecture is shown in Figure 1. As mentioned above, NFM consists of two parts: a linear part and a neural network part. The linear part provides memorization of samples, while the neural network part handles the heavier work of feature extraction, feature crossing, and generalization. Let's focus on the DNN part.

NFM's neural network treats all input features uniformly: both one-hot and continuous features are embedded into vectors of the same dimension, which corresponds to the latent-vector dimension in the second-order cross term of the FM model. The embedding vectors of all feature fields then pass through a Bi-Interaction pooling layer, which outputs a new vector of the same dimension as the embeddings; this vector is finally fed into fully connected layers. Apart from the Bi-Interaction pooling layer, the rest of NFM is a standard network. Let's see what the Bi-Interaction pooling layer, as a standalone layer, actually does.
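The two-part structure described above can be sketched in NumPy for a single sample. The layer sizes, the single hidden layer, and the random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_fields, embed_dim, hidden_dim = 4, 8, 16      # illustrative sizes (assumptions)

# Per-field embedding vectors v_i and one sample's feature values x_i
V = rng.normal(size=(n_fields, embed_dim))      # embedding table, one row per field
x = rng.normal(size=(n_fields,))                # feature values (one-hot or continuous)

# 1) Embedding layer: scale each field's embedding by its feature value
xv = x[:, None] * V                             # shape (n_fields, embed_dim)

# 2) Bi-Interaction pooling: sum the element-wise products over all field pairs
bi = np.zeros(embed_dim)
for i in range(n_fields):
    for j in range(i + 1, n_fields):
        bi += xv[i] * xv[j]                     # element-wise product, then sum-pool

# 3) DNN part: one hidden ReLU layer as a stand-in for the full MLP stack
W1 = rng.normal(size=(embed_dim, hidden_dim))
h = np.maximum(bi @ W1, 0.0)
w_out = rng.normal(size=(hidden_dim,))
dnn_score = h @ w_out

# 4) Linear (memorization) part, as in FM's first-order term
w_lin = rng.normal(size=(n_fields,))
y_hat = 0.1 + x @ w_lin + dnn_score             # final NFM score, a scalar

print("bi-interaction output dim:", bi.shape)
```

Note that the Bi-Interaction output keeps the embedding dimension regardless of how many fields are crossed, which is what lets the downstream MLP stay small.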

3. Bi-Interaction pooling layer

Previously we said that DeepFM achieves feature crossing through the inner product of feature embeddings; the emphasis is on "feature crossing". In NFM, the Bi-Interaction layer is the key layer for feature crossing. The Bi-Interaction pooling layer crosses features as follows:

$$f_{BI}(\mathcal{V}_x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_i v_i \odot x_j v_j$$

where $x_i$ is a feature value, $v_i$ is the embedding vector of feature $x_i$, and $\odot$ denotes the element-wise product, i.e., multiplying the corresponding elements of two vectors. Each pairwise product has the same dimension as a feature field's embedding vector. Crossing all feature fields pairwise yields a set of crossed vectors, and sum pooling over all of them gives the final crossed feature representation. Comparing this with DeepFM's feature-crossing scheme, the Bi-Interaction pooling layer:
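The pairwise sum above can also be computed without an explicit double loop, using the same algebraic identity FM uses for its second-order term: the sum over all pairs equals half of (the squared sum minus the sum of squares), element-wise. A small check with made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_fields, k = 5, 4
xv = rng.normal(size=(n_fields, k))   # row i holds x_i * v_i for each feature field

# Naive pairwise form: sum over all i < j of element-wise products
naive = np.zeros(k)
for i in range(n_fields):
    for j in range(i + 1, n_fields):
        naive += xv[i] * xv[j]

# Equivalent O(n*k) form: 0.5 * ((sum of vectors)^2 - sum of squared vectors)
fast = 0.5 * (xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))

print(np.allclose(naive, fast))  # prints True: the two computations agree
```

This brings the cost of the layer down from quadratic to linear in the number of feature fields.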

  1. Replaces the horizontal concatenation of second-order feature vectors used in DeepFM with sum pooling, shrinking the resulting vector from $n \times k$ dimensions to $k$ dimensions and greatly reducing the number of downstream training parameters ($n$ is the number of feature fields, $k$ is the embedding dimension)
  2. May lose information, since sum pooling merges all second-order feature information into a single vector
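Point 1 can be made concrete with a back-of-the-envelope count of the weights in the first fully connected layer fed by each design. The field count, embedding size, and hidden width below are made-up illustrative numbers, not values from either paper:

```python
# n feature fields, k-dim embeddings, and a hypothetical first FC layer of width h
n, k, h = 39, 10, 200

concat_params = (n * k) * h   # DNN fed by concatenated embeddings: n*k inputs
pooled_params = k * h         # DNN fed by the k-dim Bi-Interaction output

print(concat_params, pooled_params)  # prints 78000 2000
```

The pooled design needs a factor of $n$ fewer first-layer weights, at the cost of the information loss noted in point 2.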

4. Summary

NFM was the first work to introduce the Bi-Interaction layer into a neural network and to propose corresponding optimizations for feature crossing. Much subsequent work in the recommendation field has drawn on the Bi-Interaction pooling idea from NFM, which in turn reflects NFM's representative role in the development of recommendation algorithms.