The DIEN paper was published in September 2018 and improves on the earlier DIN. As mentioned in Ali’s CTR prediction (I): Deep Interest Network, the DIN authors tried using an LSTM to model the sequential behavior data, but it brought no improvement to the DIN results. DIEN improves on this.

Deep Interest Evolution Network (DIEN)

We’ll skip the background, especially the Base Model; if you want to read about it, see the previous article. Let’s go straight to the structure of DIEN. Its defining characteristic is that it not only finds users’ interests but also captures how those interests evolve. The authors build GRUs into the network to model the changing sequence. If you would like an introduction to GRUs, see the article Understanding GRU Networks.

It can be seen that the Embedding Layer still exists in DIEN and that the embedding method is the same as before. The user profile, target ad, and context features are also processed as before, but the user behavior is now organized as sequence data. The simple activation unit of DIN, which was built on an outer product, is replaced by an attention-based GRU network.

Interest Extractor Layer

Now let’s look at the Interest Extractor Layer, the layer that contains the GRU units. As the name implies, its main goal is to extract interest from the embedded data. A user’s interest at a given time depends not only on the current behavior but also on previous behaviors, so the authors use GRU units to extract interest.

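A GRU unit is expressed as:

$$
\begin{aligned}
u_t &= \sigma(W^u i_t + U^u h_{t-1} + b^u) \\
r_t &= \sigma(W^r i_t + U^r h_{t-1} + b^r) \\
\tilde{h}_t &= \tanh(W^h i_t + r_t \circ U^h h_{t-1} + b^h) \\
h_t &= (1 - u_t) \circ h_{t-1} + u_t \circ \tilde{h}_t
\end{aligned}
$$

where $i_t$ is the embedding of the behavior at time $t$, $h_t$ is the hidden state at time $t$, $\sigma$ is the sigmoid function, and $\circ$ stands for element-wise multiplication.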
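The GRU update described here can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names (`init_gru_params`, `gru_step`, `extract_interests`), the dimensions, and the random initialization are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru_params(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("u", "r", "h"):
        p["W" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    return p

def gru_step(i_t, h_prev, p):
    """One GRU step: i_t is the behavior embedding, h_prev the previous state."""
    u_t = sigmoid(p["Wu"] @ i_t + p["Uu"] @ h_prev + p["bu"])   # update gate
    r_t = sigmoid(p["Wr"] @ i_t + p["Ur"] @ h_prev + p["br"])   # reset gate
    # Candidate state: the reset gate scales the contribution of h_prev.
    h_tilde = np.tanh(p["Wh"] @ i_t + r_t * (p["Uh"] @ h_prev) + p["bh"])
    # Element-wise interpolation between the old state and the candidate.
    return (1.0 - u_t) * h_prev + u_t * h_tilde

def extract_interests(behaviors, p):
    """Run the GRU over a (T, n_in) behavior sequence; return all T states."""
    h = np.zeros(p["bu"].shape[0])
    states = []
    for i_t in behaviors:
        h = gru_step(i_t, h, p)
        states.append(h)
    return np.stack(states)

# Toy usage: 5 behaviors with 8-dimensional embeddings, 4 hidden units.
behaviors = np.random.default_rng(1).normal(size=(5, 8))
interests = extract_interests(behaviors, init_gru_params(8, 4))
print(interests.shape)  # (5, 4)
```

Keeping every intermediate state, rather than only the last one, matters here because the later layers of DIEN consume the whole sequence of hidden states.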

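The authors also introduce an auxiliary loss $L_{aux}$ in this step to assist the Interest Extractor. An ordinary hidden state $h_t$ only captures information from the embeddings, which is not necessarily interest. The final action is determined only by the final interest, so the intermediate states need additional supervision to represent interest. $L_{aux}$ provides it by using the behavior at time $t+1$ to supervise $h_t$, with a sampled item the user did not click as the negative example:

$$L_{aux} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t}\left[\log\sigma\left(h_t^i, e_b^i[t+1]\right) + \log\left(1 - \sigma\left(h_t^i, \hat{e}_b^i[t+1]\right)\right)\right]$$

Here $N$ is the number of training samples, $e_b^i[t+1]$ is the embedding of the behavior that user $i$ actually performed at step $t+1$, $\hat{e}_b^i[t+1]$ is the embedding of the negative sample, and $\sigma(\cdot,\cdot)$ scores a hidden state against an embedding with a sigmoid.

With the final model objective set to $L = L_{target} + \alpha L_{aux}$, where $\alpha$ balances the two terms, the GRU can extract interest information in the intermediate states.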
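The extra supervision on intermediate states can be sketched as follows: each hidden state is scored against the embedding of the next behavior that actually occurred (positive) and against a sampled item that did not occur (negative). This is a sketch under stated assumptions: the score is taken to be a sigmoid of an inner product, so hidden states and embeddings must share a dimension; the loss is averaged over the steps of a single user; and the function names and the weight `alpha` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def auxiliary_loss(hidden, pos_next, neg_next, eps=1e-12):
    """Log loss that asks h_t to predict the behavior at t+1.

    hidden:   (T-1, d) hidden states h_1 .. h_{T-1}
    pos_next: (T-1, d) embeddings of the behaviors that actually followed
    neg_next: (T-1, d) embeddings of sampled items that did not follow
    Assumption: the score is a sigmoid of the inner product.
    """
    pos_score = sigmoid(np.sum(hidden * pos_next, axis=1))
    neg_score = sigmoid(np.sum(hidden * neg_next, axis=1))
    # eps guards against log(0) when a score saturates.
    return -np.mean(np.log(pos_score + eps) + np.log(1.0 - neg_score + eps))

def total_loss(target_loss, aux_loss, alpha=1.0):
    """Combined objective; alpha is an assumed balancing hyperparameter."""
    return target_loss + alpha * aux_loss

# Toy usage: states aligned with the true next behavior give a low loss.
h = 2.0 * np.ones((3, 4))
good = auxiliary_loss(h, np.ones((3, 4)), -np.ones((3, 4)))
bad = auxiliary_loss(h, -np.ones((3, 4)), np.ones((3, 4)))
print(good < bad)  # True
```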

Interest Evolution Layer

Take users’ interest in clothes as an example: it changes constantly with the seasons and fashion trends. Modeling this evolution has two advantages:

  • Tracking users’ interest lets the representation of the final interest include more historical information.
  • CTR prediction can be performed better by following the changing trend of interest.

During this evolution, interest follows two rules:

  • Interest drift: users’ interest is concentrated within a given period but shifts between periods. For example, a user might keep buying books during one period and clothes during another.
  • Interest individual: each kind of interest has its own development trend, and different kinds of interest rarely affect each other. For example, interest in buying books and interest in buying clothes are largely unrelated.

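To take advantage of these two characteristics, DIEN adds a second layer built from a GRU variant, together with an attention mechanism that finds the interest related to the target ad. The attention function can be expressed as:

$$a_t = \frac{\exp(h_t W e_a)}{\sum_{j=1}^{T}\exp(h_j W e_a)}$$

where $h_t$ is the hidden state produced by the Interest Extractor Layer at time $t$, $e_a$ is the embedding of the target ad, and $W$ is a learned parameter matrix.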
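The attention step can be illustrated with a short NumPy sketch: each hidden state is scored against the target ad embedding through a bilinear form, and the scores are normalized with a softmax over time. The function name, the shapes, and the identity matrix used for `W` in the toy example are assumptions for illustration only.

```python
import numpy as np

def attention_scores(hidden, W, e_a):
    """Softmax attention of each hidden state with respect to the target ad.

    hidden: (T, n_H) hidden states from the interest extractor
    W:      (n_H, n_A) parameter matrix (learned in practice)
    e_a:    (n_A,) embedding of the target ad
    """
    logits = hidden @ W @ e_a          # one score per time step
    logits = logits - logits.max()     # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy usage: the second state is most aligned with the ad embedding.
hidden = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 0.0]])
a = attention_scores(hidden, np.eye(2), np.array([1.0, 0.0]))
print(int(a.argmax()))  # 1
```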

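There are several mechanisms that combine attention with a GRU. Below, $a_t$ is the attention score, and $i'_t$, $u'_t$, $\tilde{h}'_t$ and $h'_t$ are the input, update gate, candidate state and hidden state of the second GRU:

  • AIGRU (GRU with attentional input): $i'_t = h_t * a_t$
  • AGRU (attention-based GRU): $h'_t = (1 - a_t) * h'_{t-1} + a_t * \tilde{h}'_t$
  • AUGRU (GRU with attentional update gate): $\tilde{u}'_t = a_t * u'_t$, then $h'_t = (1 - \tilde{u}'_t) \circ h'_{t-1} + \tilde{u}'_t \circ \tilde{h}'_t$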

All three mechanisms are tried in the paper.
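The difference between the three variants is easiest to see side by side. In the sketch below, AIGRU scales the input by the attention score, AGRU replaces the update gate with the scalar score, and AUGRU scales the update gate by the score while keeping it a vector. This is a minimal sketch, not the authors' code: the parameter layout, helper names, and initialization are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("u", "r", "h"):
        p["W" + g] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U" + g] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p

def gates(i_t, h_prev, p):
    """Update gate and candidate state of a standard GRU."""
    u = sigmoid(p["Wu"] @ i_t + p["Uu"] @ h_prev + p["bu"])
    r = sigmoid(p["Wr"] @ i_t + p["Ur"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Wh"] @ i_t + r * (p["Uh"] @ h_prev) + p["bh"])
    return u, h_tilde

def aigru_input(h_t, a_t):
    """AIGRU: scale the input of the second GRU by the attention score."""
    return a_t * h_t

def agru_step(i_t, h_prev, a_t, p):
    """AGRU: the scalar attention score replaces the update gate."""
    _, h_tilde = gates(i_t, h_prev, p)
    return (1.0 - a_t) * h_prev + a_t * h_tilde

def augru_step(i_t, h_prev, a_t, p):
    """AUGRU: the attention score scales the vector-valued update gate."""
    u, h_tilde = gates(i_t, h_prev, p)
    u_scaled = a_t * u
    return (1.0 - u_scaled) * h_prev + u_scaled * h_tilde

# Toy usage: with a zero attention score the state is carried over unchanged.
p = init_params(4, 4)
x = np.ones(4)
h = np.full(4, 0.5)
print(np.allclose(augru_step(x, h, 0.0, p), h))  # True
```

A behavior with a low attention score therefore barely changes the evolving state in AGRU and AUGRU, whereas in AIGRU even a zero input can still shift the hidden state through the gates.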

Results

The paper reports experiments on public datasets and on the authors’ own dataset; the results of the offline experiments are as follows: