DOI: 10.1145/3595916.3626397
Research Article

Cross-modal Consistency Learning with Fine-grained Fusion Network for Multimodal Fake News Detection

Published: 01 January 2024

Abstract

Previous studies on multimodal fake news detection have observed the mismatch between text and images in fake news and attempted to explore the consistency of multimodal news based on global features of different modalities. However, they fail to investigate the relationship between fine-grained fragments in multimodal content. To gain public trust, fake news often includes relevant parts in both the text and the image, making such multimodal content appear consistent, and relying on global features may suppress potential inconsistencies in the irrelevant parts. Therefore, in this paper, we propose a novel Consistency-learning Fine-grained Fusion Network (CFFN) that separately explores consistency and inconsistency from high-relevant and low-relevant word-region pairs. Specifically, for a multimodal post, we divide word-region pairs into high-relevant and low-relevant parts based on their relevance scores. For the high-relevant part, we follow the cross-modal attention mechanism to explore consistency; for the low-relevant part, we calculate inconsistency scores to capture inconsistent points. Finally, a selection module chooses the primary clue (consistency or inconsistency) for identifying the credibility of multimodal news. Extensive experiments on two public datasets demonstrate that our CFFN substantially outperforms all baselines. Our code can be found at: https://github.com/uestc-lj/CFFN/.
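
The pipeline described in the abstract (relevance-based splitting of word-region pairs, cross-modal attention over the high-relevant part, an inconsistency score over the low-relevant part, and a selection module) can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository); the cosine-similarity relevance score, the threshold `tau`, and all function names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def split_pairs_by_relevance(W, R, tau=0.5):
    """Score every word-region pair by cosine similarity and split the pairs
    into high-relevant and low-relevant masks (tau is an illustrative threshold)."""
    sim = F.normalize(W, dim=-1) @ F.normalize(R, dim=-1).T   # (n_words, n_regions)
    high_mask = sim >= tau
    return sim, high_mask, ~high_mask

def consistency_clue(W, R, sim, high_mask):
    """Cross-modal attention restricted to high-relevant pairs: each word
    attends only to its high-relevant regions; the result is pooled over words."""
    attn = sim.masked_fill(~high_mask, float("-inf"))
    attn = torch.softmax(attn, dim=-1)
    attn = torch.nan_to_num(attn)             # words with no high-relevant region
    return (attn @ R).mean(dim=0)             # pooled consistency representation

def inconsistency_clue(sim, low_mask):
    """Inconsistency score from low-relevant pairs: lower similarity in these
    pairs signals a stronger text-image mismatch."""
    return (1.0 - sim[low_mask]).mean() if low_mask.any() else sim.new_tensor(0.0)

# Toy usage: 12 word features and 36 region features in a shared 256-d space
# (e.g., projected outputs of a text encoder and a vision backbone).
W, R = torch.randn(12, 256), torch.randn(36, 256)
sim, hi, lo = split_pairs_by_relevance(W, R)
cons = consistency_clue(W, R, sim, hi)
incons = inconsistency_clue(sim, lo)
# A selection module (e.g., a learned gate) would then weigh `cons` against
# `incons` before the final real/fake classification.
```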




    Published In

    MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
    December 2023
    745 pages
    ISBN:9798400702051
    DOI:10.1145/3595916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 January 2024


    Author Tags

    1. Cross-modal Consistency Learning
    2. Fine-grained Fusion
    3. Multimodal Fake News Detection
    4. Social Media

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Conference

    MMAsia '23
    Sponsor: MMAsia '23: ACM Multimedia Asia
    December 6 - 8, 2023
    Tainan, Taiwan

    Acceptance Rates

    Overall Acceptance Rate: 59 of 204 submissions, 29%
