A quick talk about PullNet

PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text

Address: arxiv.org/abs/1904.09…

Source code: none released

Background

Even a large KB cannot hold all knowledge, so KBs are inherently incomplete. A text corpus, on the other hand, usually covers knowledge more completely, but it is not as structured as a KB. Moreover, the questions of interest are not simple one-hop lookups but multi-hop questions that require reasoning.

PullNet combines the KB and the corpus, using an iterative approach to build question-specific subgraphs that contain the information relevant to the question. Starting from the question and the entities it mentions, a heterogeneous subgraph (containing both text nodes and triple nodes) is constructed. In each iteration, a graph CNN computes an expansion probability for each node, and the k highest-scoring nodes above a threshold are selected for expansion. For each selected node, the relevant documents, entities, and triples are pulled and merged into the subgraph. After T iterations, a similar graph CNN extracts the answer.

The method

Notation

For a particular question $q$, the question subgraph is $G_q = \{V, E\}$: a heterogeneous graph containing the corpus and KB information relevant to $q$. $V$ is the set of nodes in the graph.

There are three types of nodes, $V = V_e \cup V_d \cup V_f$:

  • $V_e$: entity nodes. A node $v_e \in V_e$ is an entity in the KB.

  • $V_d$: text nodes. A node $v_d \in V_d$ is a document in the corpus, a sequence of tokens $(w_1, w_2, \dots, w_{|d|})$. Each document is a single sentence and can be retrieved directly via the entities it mentions.

  • $V_f$: fact nodes. A node $v_f \in V_f$ is a KB triple $(v_s, r, v_o)$, consisting of a head entity, a tail entity, and the relation $r$.

$E$ is the set of edges between nodes:

  • An entity node $v_e$ is connected to a fact node $v_f$ if $v_e$ is the head or tail entity of that fact.

  • An entity node $v_e$ is connected to a text node $v_d$ if the text $v_d$ mentions the entity $v_e$ (these node and edge types are sketched in code below).
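
To make the node and edge types concrete, here is a minimal Python sketch of the heterogeneous graph. Nothing here is an official API (no source code was released); the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionGraph:
    """Heterogeneous question subgraph G_q = {V, E} (illustrative)."""
    entity_nodes: set = field(default_factory=set)  # V_e: KB entities
    text_nodes: set = field(default_factory=set)    # V_d: corpus sentences
    fact_nodes: set = field(default_factory=set)    # V_f: (head, rel, tail) triples
    edges: set = field(default_factory=set)         # entity-fact and entity-text links

    def link(self, entity, node):
        """Connect an entity node to the fact or text node it appears in."""
        self.edges.add((entity, node))
```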

Algorithm

  1. First, initialize the question graph $G_q^0 = \{V^0, E^0\}$, where $V^0 = \{e_{q_i}\}$ is the set of entities mentioned in the question and $E^0$ is the empty set.

  2. Iterate $T$ times, selecting $k$ entities for expansion in each iteration. In each iteration, the relevant text set and fact set are retrieved for the selected entities.

  3. At the end of each iteration, add the new edges to the question graph. After the $T$ expansion iterations, a final classification step identifies the answer in the question graph (a code sketch of the whole loop follows).
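
As a minimal sketch of this loop, assuming the QuestionGraph class above and treating the pull and classify operations (described in the next sections) as callables supplied by the caller:

```python
def build_subgraph(question, question_entities, T, k,
                   classify_pullnodes, pull_docs, pull_facts,
                   pull_entities, pull_headtail):
    # Step 1: initialize G_q^0 with the entities mentioned in the question.
    g = QuestionGraph(entity_nodes=set(question_entities))

    for t in range(T):
        # Step 2: score every entity node and expand the k most promising.
        scores = classify_pullnodes(g, question)  # GraftNet-style GCN scores
        to_expand = sorted(scores, key=scores.get, reverse=True)[:k]

        for v_e in to_expand:
            for v_d in pull_docs(v_e, question):   # top-N_d sentences
                g.text_nodes.add(v_d)
                for e in pull_entities(v_d):
                    g.entity_nodes.add(e)
                    g.link(e, v_d)
            for v_f in pull_facts(v_e, question):  # top-N_f facts
                g.fact_nodes.add(v_f)
                for e in pull_headtail(v_f):
                    g.entity_nodes.add(e)
                    g.link(e, v_f)
        # Step 3: the new nodes and edges now form G_q^{t+1}.

    return g  # after T iterations, classify_answer scores its entity nodes
```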

The pull operations

Simple extraction pull:

  • pull_entities(v_d): extracts all entities mentioned in the text node.

  • pull_headtail(v_f): extracts the head and tail entities from the triple (both operations are sketched below).
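
Both operations are mechanical. A minimal sketch, assuming facts are (head, relation, tail) tuples and that per-sentence entity mentions come from a precomputed entity-linking index (the index is an assumption for illustration):

```python
def pull_headtail(fact):
    """Extract the head and tail entities of a KB triple."""
    head, _relation, tail = fact
    return {head, tail}

def pull_entities(sentence, mention_index):
    """Extract all entities mentioned in a sentence, looked up in a
    precomputed entity-linking index: sentence -> set of entities."""
    return mention_index.get(sentence, set())

# To plug into the loop sketched earlier, bind the index first,
# e.g. functools.partial(pull_entities, mention_index=index).
```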

Complex retrieval pull:

  • pull_docs(v_e, q): in the paper's method, documents are retrieved by entity linking: if a sentence mentions two entities A and B, the sentence can be retrieved directly using A and B as the index. Beyond this direct entity lookup, the retrieved sentences are also ranked by their IDF-weighted similarity to the question $q$, and the top $N_d$ sentences are kept.

  • pull_facts(v_e, q): retrieves the top $N_f$ facts related to the entity $v_e$ in the KB. It requires that $v_e$ be the head or tail entity of the fact, i.e., it fetches the one-hop subgraph of $v_e$ in the KB and ranks the facts by the similarity $S(r, q)$ between the fact's relation $r$ and the question $q$:

    • The embedding of the relation $r$ is $h_r$, and the question $q$ is the sequence of words $q = (w_1, w_2, \dots, w_{|q|})$. The similarity is the dot product between $h_r$ and the final hidden state of an LSTM run over $q$, passed through a sigmoid to obtain a probability (a code sketch follows the formulas):

    • $h_q = \mathrm{LSTM}(w_1, w_2, \dots, w_{|q|}) \in \mathbb{R}^n$

    • $S(r, q) = \mathrm{sigmoid}(h_r^\top h_q)$
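
Here is that scorer as a PyTorch sketch. The paper specifies only the LSTM encoding of $q$ and the sigmoid of the dot product; the embedding sizes, batching, and the use of a learned relation-embedding table are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    """S(r, q) = sigmoid(h_r^T h_q), with h_q the last LSTM hidden state."""

    def __init__(self, vocab_size, num_relations, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.rel_emb = nn.Embedding(num_relations, dim)  # h_r (assumed learned)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, question_token_ids, relation_ids):
        # Encode q = (w_1, ..., w_{|q|}); h_q is the final hidden state.
        _, (h_n, _) = self.lstm(self.word_emb(question_token_ids))
        h_q = h_n[-1]                          # (batch, dim)
        h_r = self.rel_emb(relation_ids)       # (batch, dim)
        return torch.sigmoid((h_r * h_q).sum(-1))  # S(r, q) in (0, 1)

# Usage: score one candidate relation per question in a batch of two.
scorer = RelationScorer(vocab_size=10_000, num_relations=500)
q_ids = torch.randint(0, 10_000, (2, 12))  # two 12-token questions
r_ids = torch.randint(0, 500, (2,))
probs = scorer(q_ids, r_ids)               # tensor of shape (2,)
```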

The classify operations

Both classify operations score nodes of the subgraph $G_q^t$ (only the entity nodes are classified). Because a graph CNN is used, even though only entity nodes receive scores, the non-entity nodes and the edges still influence the final classification result.

  • classify_pullnodes($G_q^t$): computes, for each entity node $v_e$, the probability that it should be expanded in the next iteration. During subgraph construction, only the $k$ highest-probability nodes are selected in each iteration.

  • classify_answer($G_q^t$): once the subgraph is built, computes for each entity node the probability that it answers the question; the highest-probability entity is the final answer.

  • GraftNet: all of the above operations are computed with GraftNet, a variant of graph CNN (how their outputs are consumed is sketched below).
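
The GraftNet scorer itself is not reproduced here; assuming it returns a dict mapping entity nodes to probabilities, consuming the two classifiers' outputs is straightforward:

```python
def select_expansion_nodes(probs, k):
    """classify_pullnodes output -> the k most promising entity nodes."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

def select_answer(probs):
    """classify_answer output -> the highest-probability entity."""
    return max(probs, key=probs.get)
```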

The update operation

The graph $G_q^{t-1}$ from the previous iteration is extended with the entity nodes $\{v_{e(f)_i}\} \cup \{v_{e(d)_i}\}$ (entities extracted from the pulled facts and documents), the text nodes $\{v_{d_i}\}$, and the fact nodes $\{v_{f_i}\}$; the edges connecting these nodes are updated accordingly.

Training

Training is done from question-answer pairs; the derivation path is only implicit. Training covers the classify operations and the similarity computation described above.

During training, given a correct question-answer pair, the shortest path from the question entities to the answer entity is first found in the KB, and every entity on this path is marked as a candidate intermediate entity. For each candidate intermediate entity, its shortest-path distance $t_e$ to the question node is recorded.

  • When training classify_pullnodes at iteration $t$, the positive examples $e'$ are the entities connected to a candidate intermediate entity $e$ whose distance is $t_e = t + 1$.

  • Similarly, the positive relations for $S(r, q)$ at iteration $t$ are those leading to a candidate intermediate entity $e$ with $t_e = t + 1$.

  • In this way, retrieval learns to pay more attention to the nodes on the shortest path to the answer (the label construction is sketched below).
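
A sketch of this label construction using networkx, assuming the KB is given as an iterable of (head, relation, tail) triples; this simplifies the paper's procedure to unweighted shortest paths:

```python
import networkx as nx

def label_intermediate_entities(kb_triples, question_entities, answer_entities):
    """Mark entities on shortest question->answer KB paths with their
    distance t_e from the question node (weak supervision labels)."""
    g = nx.Graph()
    for head, _rel, tail in kb_triples:
        g.add_edge(head, tail)

    labels = {}  # entity -> t_e
    for qe in question_entities:
        for ae in answer_entities:
            try:
                path = nx.shortest_path(g, qe, ae)
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                continue
            for dist, entity in enumerate(path):
                labels[entity] = min(labels.get(entity, dist), dist)
    return labels

# At iteration t, entities with t_e == t + 1 (and the relations leading to
# them) serve as positives for classify_pullnodes and S(r, q), respectively.
```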

During training, all entity nodes whose predicted score exceeds a given threshold $\epsilon$ are pulled, rather than only the $k$ highest-scoring nodes. Moreover, if the pull operations fail to retrieve a candidate intermediate entity, that entity is added to the graph directly.
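
As a small sketch of this training-time selection rule, in contrast to the top-k rule used at inference (names and signatures are illustrative):

```python
def select_expansion_nodes_training(probs, eps, candidate_intermediates, graph):
    """Training variant: pull every node scoring above eps, then force-add
    any candidate intermediate entity that the pulls failed to retrieve."""
    selected = [v for v, p in probs.items() if p > eps]
    for e in candidate_intermediates:
        if e not in graph.entity_nodes:
            graph.entity_nodes.add(e)  # added to the graph directly
    return selected
```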

Both $T$ and the threshold $\epsilon$ are hyperparameters, and $T$ must be greater than or equal to the maximum length of the derivation chain.