Introduction: This post presents the SIGIR 2018 paper "Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks", in the hope that it will prove inspiring.


Author: Yang Kunlin, an undergraduate of the class of 2015 at the Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China. His current research interests include knowledge graphs and recommender systems.

Note: this article was first published on the Zhihu column [RUC Intelligent Intelligence Station]; if you wish to reprint it, please inform the author and indicate the column's address.

Introduction

When I browse shopping websites, my attention is often caught by the recommendations on the home page. I may only intend to buy one small item, yet end up spending a great deal of time and money on what is recommended. Sometimes you wonder why the recommendations are so appealing, but the reason behind them is not always clear. This SIGIR 2018 paper proposes a new sequential recommender model, KSR, which uses a knowledge graph and a memory network to improve the accuracy of recommendations, capture more fine-grained user preferences, and improve the interpretability of the recommender system.

Background

Recommender systems should be familiar to everyone, and many Internet companies have invested heavily in building them. To recommend accurately, one must first understand what users are thinking about and interested in, and these interests are rarely static: they shift as the user interacts with the platform. In view of this, and in contrast to earlier methods such as collaborative filtering, the research community has proposed sequential recommender systems built on sequential neural models, which use recurrent neural networks (RNNs) to capture how user interests change over time.

In sequential recommendation modeling, the user's past interaction records are usually taken as input, and each interaction record is encoded into a hidden-state vector representing the user's preference at that point in the sequence. However, this approach has two drawbacks:

  • It considers only the user's sequential preference and ignores fine-grained preferences, such as which attribute of an item the user likes.
  • The latent feature vectors used in the recommendation process are too abstract to explain the recommendation results.

Solution

The main contribution of this paper is a memory network fused with knowledge-base information, which not only improves accuracy but also enhances the interpretability of the recommender system by describing the user's preference over item attributes in detail.

To capture the user's preference for item attributes and enhance explainability, this paper combines knowledge-base information with item information.

To integrate the knowledge-base information into the recommender system, this paper uses Key-Value Memory Networks.

The approach in detail

Because the latent semantic model of items learned only from user-item interaction records is not fine-grained enough to describe a user's preferences, this paper uses entity information from a knowledge base to represent the different attributes of recommended items; this both describes items in more detail and enhances interpretability. For example, a song has attributes such as singer and album. The specific entities corresponding to these attributes can be looked up in the knowledge base, and the context of each entity in the knowledge base can be converted into an embedding (e.g., with TransE), which serves as its feature representation vector. The knowledge base chosen in this paper is Freebase, and the items in the dataset are linked to existing entities in it. Thus, in addition to the feature representation vector learned from the transaction history (the paper uses Bayesian Personalized Ranking, BPR), each item obtains a feature representation vector for each of its attributes.

As for how the feature representation vector of an entity is obtained, take TransE as an example. Each record in the knowledge base is a triple, and each triple contains two entities and the relation between them, such as (head entity A, relation R, tail entity B). We want the spatial representation of head entity A plus relation R to be as close as possible to that of tail entity B. For instance, the British TV series "Sherlock Holmes" (head entity A) plus "leading actor" (relation R) should be approximately equal to "Benedict Cumberbatch" (tail entity B).

Unlike entities in the real world, in the knowledge base every entity and every relation is a vector in space, and we obtain the corresponding feature representation vectors by training to minimize the error over triples. But now there is a question: the leading actors include not only Benedict Cumberbatch but also Martin Freeman, so how do we distinguish between them? For each target entity, this paper does not use the target entity's feature vector directly; instead it uses the head entity's feature vector plus the relation's feature vector. For example, rather than using Benedict Cumberbatch's vector directly, the sum of the vectors for "Sherlock Holmes" and "leading actor" serves as the feature vector of the leading-actor attribute. In formula form, attribute a of item i is represented by the sum of the item entity's vector and the relation vector between the item and attribute a:

e_{i,a} = e_i + r_a
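As a minimal sketch of the head-plus-relation trick described above, with random vectors standing in for trained TransE embeddings and hypothetical relation names:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical learned TransE embeddings (random here just for illustration):
# one vector for the item's entity, one per relation (attribute type).
item_entity = rng.normal(size=dim)                # e.g. the series "Sherlock Holmes"
relations = {"leading_actor": rng.normal(size=dim),
             "broadcaster": rng.normal(size=dim)}

# Under TransE, head + relation ≈ tail, so attribute a of item i is
# represented by the head-entity vector plus the relation vector
# (e_i + r_a), rather than by the tail entity (a specific actor) directly.
attribute_vectors = {a: item_entity + r for a, r in relations.items()}
```

This also answers the Benedict-versus-Martin question above: the attribute vector is tied to the (item, relation) pair, not to any one tail entity.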

Of course, each item has many attributes, and concatenating the feature vectors of all attributes would be far too expensive computationally. Instead, the paper takes a weighted sum of the attribute vectors to obtain a single vector m_u representing each user's fine-grained interests. Each weight w_a is the user's attention to attribute a, computed from the attribute key k_a and the current sequence preference q_t, as in the following formulas:

w_a = softmax(q_t · k_a)

m_u = Σ_a w_a · v_a

where v_a is the preference (value) vector stored for attribute a.
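The attention-weighted read can be sketched as follows; the softmax-over-dot-products form follows the formulas above, while all dimensions and attribute names are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read_memory(q, keys, values):
    """Attention read: weight each stored attribute-preference vector by
    how strongly the current sequence preference q attends to its key."""
    w = softmax(keys @ q)      # w_a ∝ exp(q_t · k_a), one weight per attribute
    return w @ values, w       # m_u = Σ_a w_a · v_a

rng = np.random.default_rng(1)
dim, n_attrs = 8, 3                    # e.g. singer, album, genre
q = rng.normal(size=dim)               # GRU sequence preference at step t
K = rng.normal(size=(n_attrs, dim))    # attribute keys (shared by all users)
V = rng.normal(size=(n_attrs, dim))    # this user's attribute values
m_u, w = read_memory(q, K, V)
```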

So, having solved the problems of fine-grained representation and interpretability, it is time to think about how to integrate all this with the sequential recommendation model. This paper uses a Gated Recurrent Unit (GRU) to build the sequential recommender. However, although a GRU can remember information from several neighboring steps, its memory is still too short-ranged to store the knowledge base's entity information over the long term. The paper therefore introduces memory networks to store the information of the different attributes and to interact with the knowledge base.

A memory network solves the neural network's memory problem with an explicit memory mechanism: it uses an external array to store the data to be "remembered", which the neural network can read repeatedly, and the stored information can also be updated or extended. When using a memory network to manage item attributes, and considering that the attributes of an item are relatively independent of one another, this paper extends the basic memory network to Key-Value Memory Networks (KV-MN) to exploit the information better. Here, a key is a relation in the knowledge base, and a value is the corresponding entity. Note that the keys represent the relations between items and attributes, not users, so all users share the same key matrix; different users, however, prefer different attributes, so the value matrix is private, and each user has his or her own value matrix. It is therefore easy to see that items in the recommendation model together with the key-value pairs in the memory network correspond exactly to knowledge-base triples: (item, key (attribute), value).
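The shared-key / private-value layout might be organized as in this sketch; the attribute names and the lazy zero initialization are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8
relations = ["singer", "album", "genre"]   # hypothetical attribute set

# Keys encode knowledge-base relations, so ONE key matrix is shared
# by every user in the system.
shared_keys = rng.normal(size=(len(relations), dim))

# Values encode a user's current preference per attribute, so each
# user owns a private value matrix (created to zeros on first sight).
user_values = {}

def values_for(user_id):
    """Lazily create and return this user's private value matrix."""
    if user_id not in user_values:
        user_values[user_id] = np.zeros((len(relations), dim))
    return user_values[user_id]
```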

Now the problem is how to combine the memory network with the GRU. To take into account both the user's sequential preference and the preference over item attributes, at each step the GRU produces a sequence-preference vector; querying the key-value memory network with this vector yields the user's preference for each attribute. Concatenating the two vectors gives a new vector that describes the user more comprehensively, covering both the sequential preference and the detailed preference over item attributes. The overall process is shown in the figure below:
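The concatenation step could look like this minimal sketch; the doubled-dimension item embedding used for scoring is a hypothetical stand-in, not necessarily the paper's exact scoring function:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 8
h_t = rng.normal(size=dim)   # sequence preference from the GRU
m_t = rng.normal(size=dim)   # attribute preference read from the KV memory

# Concatenating the two views yields one user representation that
# captures both sequential and fine-grained attribute preferences.
u_t = np.concatenate([h_t, m_t])          # shape (2 * dim,)

# A candidate item can then be scored against this combined vector
# (illustrative: the item embedding lives in the same doubled space).
item_vec = rng.normal(size=2 * dim)
score = float(u_t @ item_vec)
```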

Overall framework of the model

As can be seen from the figure above, the key-value memory network has a write and a read process, which are also the two most basic operations of a memory network. For the read operation, as mentioned earlier, a query must be supplied, namely the sequence-preference vector at the current moment. With this vector, the attention-weighted sum of the attribute preference vectors is computed as described above, and combining it with the sequence preference vector yields the final preference vector.

For the write operation, the goal is to update the degree of interest in each attribute according to the item currently being interacted with, so two factors must be considered at once: the feature vector of the current item and the attribute preference vector of the current user. To determine how much information should be updated for each attribute, the paper first computes a gate vector z_a, through which the attribute preference vector is updated. The gate is determined by the item's feature representation vector q_i and the feature representation vector e_{i,a} of the corresponding attribute, for example as

z_a = σ(q_i · e_{i,a})

After such updates, the model can track users' interest preferences at the attribute level over the long term and feed them back into the recommender in time. The write update takes the form:

v_a ← (1 − z_a) · v_a + z_a · e_{i,a}
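A simplified sketch of the gated write; the sigmoid-of-dot-product gate is an assumption of this sketch, since the paper's exact parameterization may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def write_memory(values, item_vec, attr_vecs):
    """Gated write: for each attribute, blend the new item's attribute
    vector into the stored preference according to a gate z_a."""
    for a, e_a in attr_vecs.items():
        z = sigmoid(item_vec @ e_a)    # gate from item and attribute vectors
        values[a] = (1.0 - z) * values[a] + z * e_a
    return values

rng = np.random.default_rng(4)
dim = 8
values = {"singer": np.zeros(dim), "album": np.zeros(dim)}  # user's memory
item_vec = rng.normal(size=dim)                             # current item
attr_vecs = {a: rng.normal(size=dim) for a in values}       # e_{i,a} per attribute
values = write_memory(values, item_vec, attr_vecs)
```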

The model of this paper is shown in the figure below:

Detailed model diagram

First, the GRU network produces the user's sequential-interest representation vector from short-range memory. Then, from the GRU's representation vector and the item's own feature vector, the key-value memory network produces the fine-grained user-preference vector, which is concatenated with the sequential-interest vector. Finally, the recommendation can be explained from the weights the user assigns to each interest.

The figure below shows how the model's interpretability manifests itself:

Recommendation process diagram

The top of the diagram represents the timeline; the second row shows the attributes of each item, i.e., the keys in the memory network, in this case singer and album; the third row shows the recommendation list generated at each step. From the second row we can see that right after initialization the recommender believes the user prefers the song's album (at first the singer carries less weight and its box is lighter, while the album carries more weight and its box is darker). As time goes on, the recommender gradually discovers that the user cares more about the singer than the album (the singer's box darkens, the album's lightens). From the third row we can see that the recommender's judgment is wrong at first and the generated recommendation list is not very accurate; over time the judgment becomes accurate, and the system can also give the reason for its recommendation: the user wants to listen to this singer's songs rather than this album.

Conclusion

Given that sequential recommender systems lack interpretability and cannot capture fine-grained user features, this paper proposes combining a memory network with a knowledge base to strengthen the recommender's feature-capturing ability and interpretability, achieving both higher accuracy and stronger explainability. The experiments show that this model is a significant improvement over previous models in both accuracy and interpretability.

Original paper link: pan.baidu.com/s/1WWhpHBjY… Password: v19f

(The SIGIR 2018 proceedings had not yet been released, so a web-disk link is provided for now.)