The principle of FM algorithm

FM is short for Factorization Machine.

The expression of FM is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

The first is feature combination. By combining pairs of features, the cross term (the last term in the expression) is introduced to solve the problem of insufficient learning under sparse data. For example, in plain polynomial regression the parameter $w_{12}$ can only be learned from samples in which both $x_1$ and $x_2$ are non-zero. The factorized parameter $\langle v_1, v_2 \rangle$ is quite different: it is composed of the latent vectors $v_1$ and $v_2$, and each of these vectors can be learned from many cross-combination features such as $x_1 x_2, x_1 x_3, \dots$, so the number of non-zero samples available for learning is greatly increased.

The second is the curse of dimensionality. By introducing latent vectors (a matrix factorization of the parameter matrix), the dot product of two latent vectors can represent the relationship between two originally independent parameters, so the number of interaction parameters is greatly reduced from $n(n-1)/2$ to $nk$, making parameter estimation for the features feasible even on sparse data.
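For a rough sense of scale (the numbers here are illustrative, not from the original text): with $n = 10^5$ features and $k = 10$ latent factors,

$$\frac{n(n-1)}{2} \approx 5 \times 10^{9} \ \text{pairwise weights} \qquad \text{vs.} \qquad nk = 10^{6} \ \text{latent-vector parameters.}$$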

Computational complexity of FM algorithm

Naively, evaluating the pairwise interaction term costs $O(kn^2)$, but a clever algebraic transformation brings it down to $O(kn)$. The transformation, which uses the idea $2xy = (x+y)^2 - x^2 - y^2$, is:

$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^2 - \sum_{i=1}^{n} v_{i,f}^2\, x_i^2\right]$$
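A minimal NumPy sketch of this $O(kn)$ evaluation (the function and variable names are illustrative), checked against the naive double loop:

```python
import numpy as np

def fm_pairwise_term(x, V):
    """Second-order FM interaction term, computed in O(kn).

    x : (n,) feature vector
    V : (n, k) matrix of latent vectors, one row per feature
    """
    linear = V.T @ x                  # (k,): sum_i v_{i,f} x_i for each factor f
    squared = (V ** 2).T @ (x ** 2)   # (k,): sum_i v_{i,f}^2 x_i^2
    return 0.5 * np.sum(linear ** 2 - squared)

# Sanity check against the naive O(k * n^2) double loop
rng = np.random.default_rng(0)
n, k = 6, 3
x, V = rng.normal(size=n), rng.normal(size=(n, k))
naive = sum(V[i] @ V[j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
assert np.isclose(fm_pairwise_term(x, V), naive)
```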

The principle of FFM algorithm

Field-aware Factorization Machines for CTR Prediction

www.csie.ntu.edu.tw/~cjlin/pape…

By introducing the concept of a field, FFM assigns features of the same nature to the same field. For example, “Day=26/11/15”, “Day=1/7/14” and “Day=19/2/15” all represent dates and are placed in the same field.

When “Day=26/11/15” is combined with the Country feature and with the Ad_type feature, different field-aware latent vectors are used, because the Country and Ad_type features belong to different fields.
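A minimal NumPy sketch of the field-aware interaction term (names and shapes are illustrative): each feature keeps one latent vector per field, and when pairing with another feature it uses the vector associated with that feature's field.

```python
import numpy as np

def ffm_pairwise_term(x, fields, V):
    """Second-order FFM interaction term (naive O(n^2 k) form).

    x      : (n,) feature vector
    fields : (n,) field index of each feature
    V      : (n, num_fields, k) field-aware latent vectors, where
             V[i, f] is the vector feature i uses when paired with
             a feature from field f.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # feature i uses the vector for j's field, and vice versa
            total += V[i, fields[j]] @ V[j, fields[i]] * x[i] * x[j]
    return total
```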

DeepFM algorithm principle

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017

Arxiv.org/abs/1703.04…

FM can do feature combination, but the computation is expensive, so in practice usually only second-order feature combinations are considered.

How to consider both low-order (first- and second-order) and high-order features => DeepFM = FM + DNN (see the sketch after this list)

An end-to-end model structure is designed => no manual feature engineering is required

It performs well on various benchmarks and in real projects
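A minimal NumPy sketch of the idea (function and parameter names are illustrative, not the paper's notation): the FM part and the DNN part share the same embedding matrix, and their outputs are summed before a sigmoid.

```python
import numpy as np

def deepfm_forward(x, w0, w, V, hidden_layers, w_out, b_out):
    """Conceptual DeepFM forward pass: y = sigmoid(y_FM + y_DNN).

    x             : (n,) feature vector
    w0, w         : FM bias and (n,) linear weights
    V             : (n, k) embedding matrix shared by the FM and DNN parts
    hidden_layers : list of (W, b) pairs for the DNN hidden layers
    w_out, b_out  : final scalar output unit of the DNN
    """
    # FM part: bias + linear term + O(kn) pairwise term
    s = V.T @ x
    y_fm = w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))

    # DNN part: feed the (feature-weighted) shared embeddings through an MLP
    h = (V * x[:, None]).reshape(-1)
    for W, b in hidden_layers:
        h = np.maximum(W @ h + b, 0.0)   # ReLU hidden layers
    y_dnn = w_out @ h + b_out

    return 1.0 / (1.0 + np.exp(-(y_fm + y_dnn)))
```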

libFM

Download at github.com/srendle/lib…

Documentation: www.libfm.org/libfm-1.42….

A tool provided by Steffen Rendle, the author of the original FM paper (2010)

Achieved good results in KDD Cup 2012 as well as in the Music Hackathon

Can be used not only for recommendation systems but also for other machine learning tasks (e.g., classification problems)

Implements three optimization algorithms: SGD, ALS, and MCMC

Supports two input data formats: text (recommended) and binary
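For reference, the recommended text format is the sparse libSVM-style format: each line is a target value followed by index:value pairs for the non-zero features (the values below are made-up examples).

```
1 0:1 3:1 16:0.5
0 1:1 4:1 16:0.2
```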

libFFM

Github.com/ycjuan/libf…


xLearn

xlearn-doc-cn.readthedocs.io/en/latest/i…

Provides a Python interface

Supports the LR, FM, and FFM algorithms

Runs efficiently, faster than libfm and libffm
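A short usage sketch of the xLearn Python API (the file paths are placeholders; FM and LR accept libSVM/CSV input, while FFM needs the libffm format):

```python
import xlearn as xl

# Create an FM model; xl.create_ffm() and xl.create_linear() work the same way
fm_model = xl.create_fm()
fm_model.setTrain("train.txt")       # training data (path is illustrative)
fm_model.setValidate("valid.txt")    # validation data

param = {"task": "binary", "lr": 0.2, "lambda": 0.002, "metric": "auc"}
fm_model.fit(param, "model.out")     # train and save the model

fm_model.setTest("test.txt")
fm_model.setSigmoid()                # map raw scores to (0, 1)
fm_model.predict("model.out", "output.txt")
```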

DeepCTR

Github.com/shenweichen…

Implements various deep CTR models

Compatible with TensorFlow 1.4 and 2.0
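A minimal DeepFM sketch with DeepCTR (assuming a recent version where the feature-column helpers live in deepctr.feature_column; the column names and sizes are made up):

```python
import numpy as np
import pandas as pd
from deepctr.models import DeepFM
from deepctr.feature_column import SparseFeat, get_feature_names

# Toy data with two categorical fields
data = pd.DataFrame({
    "user_id": np.random.randint(0, 100, 1000),
    "item_id": np.random.randint(0, 200, 1000),
    "label": np.random.randint(0, 2, 1000),
})

feature_columns = [
    SparseFeat("user_id", vocabulary_size=100, embedding_dim=8),
    SparseFeat("item_id", vocabulary_size=200, embedding_dim=8),
]
feature_names = get_feature_names(feature_columns)
model_input = {name: data[name].values for name in feature_names}

# The same columns feed both the wide (FM) part and the deep (DNN) part
model = DeepFM(feature_columns, feature_columns, task="binary")
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(model_input, data["label"].values, batch_size=256, epochs=2)
```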

Neighborhood-based collaborative filtering: UserCF and ItemCF

Collaborative filtering is one of the classic algorithms in recommendation systems

Neighborhood-based collaborative filtering

UserCF: Recommend items to the user that other users with similar interests like

ItemCF: Recommend items to the user that are similar to their previous favorites

Definition of similar neighbors

Fixed number of neighbors: K nearest neighbors (KNN). Regardless of the distance, only a fixed number K of nearest neighbors are taken.

Similarity-threshold-based neighbors: all points that fall within the region centered on the current point with radius K are taken as neighbors.

The way similarity is calculated

• name: the similarity measure; the default is MSD, and it can also be set to cosine, pearson, or pearson_baseline

• min_support: minimum support, i.e. the minimum number of common items (or users) required for the similarity not to be zero; used to filter users or items

• shrinkage: shrinkage parameter (only relevant for the pearson_baseline similarity); the default value is 100

Using the neighborhood-based collaborative filtering algorithms in the Surprise toolkit:
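A short example with Surprise's KNNBasic (the dataset is the built-in MovieLens-100k, which Surprise downloads on first use):

```python
from surprise import Dataset, KNNBasic
from surprise.model_selection import cross_validate

# Built-in MovieLens-100k dataset
data = Dataset.load_builtin("ml-100k")

sim_options = {
    "name": "cosine",     # similarity measure; the default is "msd"
    "user_based": True,   # True => UserCF, False => ItemCF
    "min_support": 3,     # minimum number of common items/users
}
algo = KNNBasic(k=40, sim_options=sim_options)

cross_validate(algo, data, measures=["RMSE", "MAE"], cv=3, verbose=True)
```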