Visual commonsense reasoning with directional visual connections

Chinese title (translated): Directional visual connections for visual commonsense reasoning

Frontiers of Information Technology & Electronic Engineering

Abstract

To boost research into cognition-level visual understanding, i.e., making an accurate inference based on a thorough understanding of visual details, visual commonsense reasoning (VCR) has been proposed. Compared with traditional visual question answering, which requires models only to select correct answers, VCR requires models to select both the correct answers and the correct rationales. Recent research into human cognition has indicated that brain function or cognition can be considered as a global and dynamic integration of local neuron connectivity, which is helpful in solving specific cognition tasks. Inspired by this idea, we propose a directional connective network for VCR. The network dynamically reorganizes visual neuron connectivity, contextualized by the meaning of the questions and answers, and leverages directional information to enhance reasoning ability. Specifically, we first develop a GraphVLAD module to capture visual neuron connectivity and fully model visual content correlations. Then, a contextualization process is proposed to fuse sentence representations with visual neuron representations. Finally, based on the output of the contextualized connectivity, we propose directional connectivity, which includes a ReasonVLAD module, to infer answers and rationales. Experimental results on the VCR dataset and visualization analysis demonstrate the effectiveness of our method.

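Abstract (translated from the Chinese version)

To advance research on cognition-level understanding of visual content, that is, making accurate inferences based on a thorough understanding of visual details, the concept of visual commonsense reasoning has been proposed. Compared with traditional visual question answering, which only requires a model to answer questions correctly, visual commonsense reasoning requires the model not only to answer correctly but also to give the corresponding explanation. Recent research on human cognition indicates that brain cognition can be viewed as a global, dynamic integration of local neuron connections, which helps solve specific cognitive tasks. Inspired by this, this paper proposes a directional connective network. By contextualizing visual neurons with the semantics of questions and answers so as to dynamically reorganize neuron connections, and by using directional information to strengthen reasoning ability, the proposed method effectively achieves visual commonsense reasoning. Specifically, a GraphVLAD module is first developed to capture visual neuron connections that can fully express the correlations in visual content. A contextualization model is then proposed to fuse visual and textual representations. Finally, based on the output of the contextualized connections, directional connections are designed to infer answers and their corresponding explanations; this stage includes a ReasonVLAD module. Experimental results and visualization analysis demonstrate the effectiveness of the proposed method.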
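
The abstract states that VCR requires a model to select both the correct answer and the correct rationale. A minimal sketch of what that implies for scoring is given below: an item counts as jointly correct only when both selections match the gold labels. The class `VCRExample`, the helper `argmax`, the function `vcr_accuracies`, and the metric names "answer", "rationale", and "joint" are hypothetical names chosen for illustration; the abstract does not describe the evaluation protocol, so this is an assumption rather than the authors' procedure.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class VCRExample:
    """One hypothetical VCR item: predicted and gold choice indices."""
    pred_answer: int
    gold_answer: int
    pred_rationale: int
    gold_rationale: int


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score (the first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def vcr_accuracies(examples: Sequence[VCRExample]) -> Dict[str, float]:
    """Compute answer, rationale, and joint accuracy over a set of items.

    Assumption: "joint" credits an item only when both the answer and the
    rationale are selected correctly, mirroring the abstract's requirement.
    """
    n = len(examples)
    if n == 0:
        raise ValueError("need at least one example")
    answer_hits = sum(e.pred_answer == e.gold_answer for e in examples)
    rationale_hits = sum(e.pred_rationale == e.gold_rationale for e in examples)
    joint_hits = sum(
        e.pred_answer == e.gold_answer and e.pred_rationale == e.gold_rationale
        for e in examples
    )
    return {
        "answer": answer_hits / n,
        "rationale": rationale_hits / n,
        "joint": joint_hits / n,
    }


if __name__ == "__main__":
    # Hypothetical model scores over candidate answers for a single item.
    predicted = argmax([0.1, 0.7, 0.2])
    items = [
        VCRExample(predicted, 1, 0, 0),  # answer and rationale both right
        VCRExample(2, 2, 0, 3),          # answer right, rationale wrong
    ]
    print(vcr_accuracies(items))
```

Because joint accuracy requires both selections to be right, it can never exceed either of the two individual accuracies, which is one way to see why the task is stricter than answer selection alone.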

Author information

Contributions

Yahong HAN designed the research. Aming WU conducted the experiments and drafted the manuscript. Linchao ZHU helped organize the manuscript. Yi YANG revised the paper.

Corresponding author

Correspondence to Yahong Han (韩亚洪).

Ethics declarations

Yahong HAN, Aming WU, Linchao ZHU, and Yi YANG declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61876130 and 61932009).

About this article

Cite this article

Han, Y., Wu, A., Zhu, L. et al. Visual commonsense reasoning with directional visual connections. Front Inform Technol Electron Eng 22, 625–637 (2021). https://doi.org/10.1631/FITEE.2000722

  • DOI: https://doi.org/10.1631/FITEE.2000722
