Research Article · DOI: 10.1145/3652583.3658103

CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction

Published: 07 June 2024

Abstract

Multimodal Relation Extraction (MRE) extracts entity relations with the help of multimodal information. Most existing MRE methods suffer from two issues: 1) weak cross-modal correlation and poor semantic consistency; 2) no text-guided fusion of the different modalities, which introduces excessive image noise. To address these issues, we propose an innovative MRE method inspired by genetics, the Comprehensive Genetic-Inspired Model for Multimodal Relation Extraction (CGI-MRE). It consists of two main modules: the Gene Extraction And Recombination Module (GERM) and the Text-Guided Fusion Module (TGFM). In GERM, we treat the text features and the visual features each as a feature body and decompose each feature body into common sub-features and unique sub-features. For these sub-features, we design a Common Gene Extraction Mechanism (CGEM) to extract advantageous genes shared across modalities and a Unique Gene Extraction Mechanism (UGEM) to extract advantageous genes unique to each modality, and we finally use a Gene Recombination Mechanism (GRM) to obtain recombinant features that are highly correlated across modalities and semantically consistent. In TGFM, we fuse the recombinant features and extract the parts of them that benefit MRE: a gate adjusts the text-guided original attention score and the pooling attention score to produce a text-guided saliency attention score, which we use to strictly extract information from the image recombinant features that is text-guided and beneficial to MRE. Experimental results on the MNRE dataset show that our model surpasses the state of the art, achieving an F1-score of 84.62%.
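
As an illustration only, the following is a minimal PyTorch sketch of the gated, text-guided saliency attention described above, assuming generic text and image feature tensors; the module name `TextGuidedGatedFusion` and its projection and gate layers are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): a gate blends token-level
# cross-attention scores with scores against a pooled text query to form
# a "saliency" score that guides extraction from image features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedGatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)   # queries from text features
        self.k_proj = nn.Linear(dim, dim)   # keys from image features
        self.v_proj = nn.Linear(dim, dim)   # values from image features
        self.gate = nn.Linear(2 * dim, 1)   # learned gate over the two score types
        self.scale = dim ** -0.5

    def forward(self, text_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        q = self.q_proj(text_feats)          # (B, L_t, d)
        k = self.k_proj(img_feats)           # (B, L_v, d)
        v = self.v_proj(img_feats)           # (B, L_v, d)

        # "Original" text-guided attention scores, one per (token, region) pair.
        orig_scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale        # (B, L_t, L_v)

        # "Pooling" attention scores: each image region vs. the pooled text query.
        pooled_q = q.mean(dim=1, keepdim=True)                                  # (B, 1, d)
        pool_scores = torch.matmul(pooled_q, k.transpose(-1, -2)) * self.scale  # (B, 1, L_v)
        pool_scores = pool_scores.expand_as(orig_scores)

        # Gate decides, per text token, how to mix the two score types.
        gate_in = torch.cat([q, pooled_q.expand_as(q)], dim=-1)                 # (B, L_t, 2d)
        g = torch.sigmoid(self.gate(gate_in))                                   # (B, L_t, 1)
        saliency_scores = g * orig_scores + (1.0 - g) * pool_scores

        attn = F.softmax(saliency_scores, dim=-1)
        return torch.matmul(attn, v)         # text-guided summary of image features

# Example usage with random features (batch=2, 32 text tokens, 49 image patches, dim=256):
# fusion = TextGuidedGatedFusion(256)
# out = fusion(torch.randn(2, 32, 256), torch.randn(2, 49, 256))  # -> (2, 32, 256)
```

In this sketch the gate is computed per text token, so tokens that need fine-grained visual grounding can weight the token-level scores more heavily while the pooled scores act as a sentence-level fallback; the paper's exact formulation may differ.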

Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
May 2024, 1379 pages
ISBN: 9798400706196
DOI: 10.1145/3652583

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. cross attention
2. genetics
3. multimodal fusion
4. multimodal relation extraction

Funding Sources

• The Key Program of the National Natural Science Foundation of China
