This is the sixth day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

I have been reading papers recently, mainly in the area of question answering, and I plan to focus on this direction going forward. It's a good opportunity to blog about them.

Relation-aware Bidirectional Path Reasoning for Commonsense Question Answering

Commonsense question answering is an important task in the field of NLP. Its main goal is to predict the correct answer to a question through commonsense reasoning. Previous studies have either fine-tuned pre-trained models such as BERT on large-scale corpora or attempted reasoning over knowledge graphs.

However, these methods do not explicitly model the relations between connected entities, even though these relations carry information that can be used to enhance reasoning.

To solve this problem, the authors propose a relation-aware reasoning method.

Their approach is a relation-aware graph neural network that captures rich contextual information between entities and relations. In contrast to the fixed relation embeddings used in pre-trained models, the model dynamically updates relation representations using contextual information from a multi-source subgraph constructed from multiple external knowledge sources. The enhanced relation representations are then fed into a bidirectional reasoning model. A bidirectional attention mechanism between the question sequence and the paths of related entities gives the model transparent interpretability.
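To make the "bidirectional attention between the question sequence and entity paths" idea concrete, here is a minimal NumPy sketch. This is not the paper's exact formulation: the shapes, the dot-product scoring, and all variable names (`question`, `paths`) are illustrative assumptions; the paper's model additionally conditions on dynamically updated relation embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(question, paths):
    """Toy bidirectional attention.

    question: (m, d) array of question-token embeddings.
    paths:    (n, d) array of entity-path embeddings.
    Returns path-aware question vectors (m, d) and
    question-aware path vectors (n, d).
    """
    sim = question @ paths.T                  # (m, n) similarity matrix
    q2p = softmax(sim, axis=1) @ paths        # each token attends over paths
    p2q = softmax(sim.T, axis=1) @ question   # each path attends over tokens
    return q2p, p2q

# Hypothetical sizes: 5 question tokens, 3 candidate paths, dim 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
p = rng.standard_normal((3, 8))
q2p, p2q = bidirectional_attention(q, p)
```

Because the attention weights form an explicit (tokens × paths) matrix, inspecting them shows which path supported which part of the question, which is the source of the interpretability the abstract mentions.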

Experimental results on the CommonsenseQA dataset show that the method provides clear inference paths while significantly improving over the baselines.

VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering

The paper proposes VQA-MHUG: a multimodal dataset of human gaze on both images and questions during visual question answering (VQA), collected from 49 participants using a high-speed eye tracker.

The authors used the dataset to analyze five state-of-the-art VQA models:

  1. the Modular Co-Attention Network (MCAN), with either grid or region features (two model variants)
  2. Pythia
  3. the Bilinear Attention Network (BAN)
  4. the Multimodal Factorized Bilinear Pooling Network (MFB)

Although previous work has focused primarily on the image modality, this analysis shows for the first time that, for all models, higher correlation between model attention and human attention on the text is an important predictor of VQA performance. This finding points to the potential to improve VQA performance and calls for further research into neural text attention mechanisms and their integration into architectures for vision-and-language tasks, including but possibly beyond VQA.
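A comparison like the one above boils down to correlating a model's attention distribution with the human gaze distribution over the same tokens. The exact metric used in the paper is not described here; a common choice is rank correlation, sketched below with hypothetical distributions (all numbers are made up for illustration).

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation between two attention distributions
    (assumes no tied values, so ranks come straight from argsort)."""
    ra = np.argsort(np.argsort(a)).astype(float)  # 0-based ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)  # 0-based ranks of b
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Hypothetical human gaze vs. model attention over 4 question tokens.
human = np.array([0.1, 0.4, 0.3, 0.2])
model = np.array([0.2, 0.35, 0.3, 0.15])
rho = spearman(human, model)  # rho == 0.8 for these made-up numbers
```

A per-question score like `rho`, averaged over a dataset, is one way such a "correlation with human attention" predictor could be computed and then related to each model's accuracy.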