In practical machine learning problems, ID-type features come up constantly: user ID, user gender, product ID, and so on. This post summarizes the common approaches for handling them.

1. OneHot

This is the most common way to handle ID features. When the feature's values can be enumerated, each value is mapped to a vector in which the bit for that value is 1 and all other bits are 0. For example, if gender takes the values male, female, and unknown, then male is encoded as [1,0,0].
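As a minimal sketch, here is the gender example encoded with scikit-learn's OneHotEncoder. Note that it orders categories alphabetically, so the bit positions differ from the hand-written vector above:

```python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
genders = [["male"], ["female"], ["unknown"], ["male"]]
encoded = encoder.fit_transform(genders).toarray()  # default output is sparse

print(encoder.categories_)  # categories sorted alphabetically: female, male, unknown
print(encoded[0])           # [0. 1. 0.] -- one bit set for "male", the rest zero
```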

2. Multi-OneHot

In addition to the case above, a variable may take several values at the same time. For example, in an e-commerce scenario, a user may have interacted with multiple items in their historical behavior, so the one-hot vector describing the items the user interacted with has more than one bit set to 1, such as [1,0,0,1,0].
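A minimal sketch of this multi-hot encoding with scikit-learn's MultiLabelBinarizer; the item IDs are hypothetical placeholders for a user's interaction history:

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=["item_a", "item_b", "item_c", "item_d", "item_e"])
user_histories = [
    {"item_a", "item_d"},  # this user interacted with two items -> two 1 bits
    {"item_b"},
]
encoded = mlb.fit_transform(user_histories)
print(encoded[0])  # [1 0 0 1 0] -- matches the example vector in the text
```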

3. Statistical methods

Using statistical methods to represent ID features generally means describing each ID along statistically meaningful dimensions (occurrence frequency, co-occurrence counts, etc.). Common statistical models include the bag-of-words model and TF-IDF. These models build a new vector to represent the original ID sequence by computing the relevant statistics for each ID in the sequence.
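As an illustration, this sketch treats each user's ID sequence as a "document" and applies scikit-learn's TfidfVectorizer; the ID strings and the whitespace token pattern are assumptions made for the example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# One space-joined string of item IDs per user.
id_sequences = [
    "item_1 item_3 item_1 item_7",
    "item_3 item_3 item_2",
]
vectorizer = TfidfVectorizer(token_pattern=r"\S+")  # split on whitespace, keep full IDs
tfidf = vectorizer.fit_transform(id_sequences)

print(vectorizer.get_feature_names_out())  # the vocabulary of IDs
print(tfidf.toarray()[0])  # weights reflect frequency and rarity of each ID
```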

4. Embedding

If the value range of an ID feature is too large, the methods above easily lead to excessive feature dimensionality and sparse samples. In fact, if you fit the sample data with logistic regression, embedding is largely unnecessary, because LR handles large-scale sparse samples well. However, if you want to try neural network models, the sparse one-hot results need to be embedded into a dense feature representation. Embedding is a broad technical category; some common methods are described below.

4.1 Unsupervised Sequence Embedding

Currently, unsupervised embedding is mainly applied to serialized features or data. The classic methods are Word2vec and GloVe. Word2vec originated in NLP but has gradually been applied to all kinds of ordered-sequence scenarios; see zhuanlan.zhihu.com/p/26306795 for the principle. The idea of Word2vec transfers to many settings: in product recommendation, the vector representation of an item can be learned by running word2vec on users' historical purchase sequences, i.e., Item2vec. By extension there are also song2vec (song embeddings) and movie2vec (movie embeddings).
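A minimal Item2vec sketch with gensim (4.x API assumed): user purchase sequences play the role of sentences, and item IDs play the role of words. The sequences and hyperparameters below are purely illustrative:

```python
from gensim.models import Word2Vec

purchase_sequences = [
    ["item_12", "item_7", "item_33"],
    ["item_7", "item_33", "item_5"],
    ["item_12", "item_5"],
]
model = Word2Vec(
    sentences=purchase_sequences,
    vector_size=16,  # embedding dimension
    window=2,        # context window within a purchase sequence
    min_count=1,
    sg=1,            # skip-gram, the variant usually used for Item2vec
)
print(model.wv["item_7"])                       # dense vector for an item
print(model.wv.most_similar("item_7", topn=2))  # items appearing in similar contexts
```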

4.2 Graph Embedding

If the ID features carry graph-like connections, graph-based embedding models such as DeepWalk can be used to learn their representations. DeepWalk works much like Word2vec: random walks over the graph produce ID sequences, and word2vec is then run on those sequences to obtain a representation for each ID.
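A minimal DeepWalk-style sketch of that idea: uniform random walks over a toy item graph produce ID sequences, which are then fed to gensim's word2vec. The graph and walk parameters are illustrative, not the full DeepWalk algorithm:

```python
import random
from gensim.models import Word2Vec

graph = {  # adjacency list: node -> neighboring nodes
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(start, length):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))  # hop to a random neighbor
    return walk

# Several walks per node play the role of a text corpus.
walks = [random_walk(node, length=5) for node in graph for _ in range(10)]
model = Word2Vec(sentences=walks, vector_size=8, window=2, min_count=1, sg=1)
print(model.wv["a"])  # learned representation of node "a"
```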

4.3 DNN embedding layer

Compared with unsupervised embedding, a DNN adds an embedding layer right after the input layer, so the representation of ID features is learned during training. Most mainstream recommendation models, such as DeepFM and DIN, use an embedding layer of this kind to handle ID features.
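A minimal sketch of this pattern in PyTorch: an nn.Embedding layer sits right after the input and is trained end to end with the rest of the network. The vocabulary size, dimensions, and head below are illustrative, not the DeepFM/DIN architectures themselves:

```python
import torch
import torch.nn as nn

class IdEmbeddingModel(nn.Module):
    def __init__(self, num_ids=10000, embed_dim=16):
        super().__init__()
        # Maps each integer ID to a dense, trainable vector.
        self.embedding = nn.Embedding(num_ids, embed_dim)
        self.fc = nn.Sequential(
            nn.Linear(embed_dim, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, id_batch):
        dense = self.embedding(id_batch)  # (batch,) -> (batch, embed_dim)
        return self.fc(dense)

model = IdEmbeddingModel()
ids = torch.tensor([3, 42, 9981])  # a batch of raw integer IDs
scores = model(ids)                # embeddings are learned jointly with the task
print(scores.shape)                # torch.Size([3, 1])
```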