
DOI: 10.1145/3592572.3592842

Synthetic Misinformers: Generating and Combating Multimodal Misinformation

Published: 12 June 2023

Abstract

With the expansion of social media and the increasing dissemination of multimedia content, the spread of misinformation has become a major concern. This necessitates effective strategies for multimodal misinformation detection (MMD) that detect whether the combination of an image and its accompanying text could mislead or misinform. Due to the data-intensive nature of deep neural networks and the labor-intensive process of manual annotation, researchers have been exploring various methods for automatically generating synthetic multimodal misinformation, which we refer to as Synthetic Misinformers, in order to train MMD models. However, limited evaluation on real-world misinformation and a lack of comparisons with other Synthetic Misinformers make it difficult to assess progress in the field. To address this, we perform a comparative study on existing and new Synthetic Misinformers that involves (1) out-of-context (OOC) image-caption pairs, (2) cross-modal named entity inconsistency (NEI), and (3) hybrid approaches, and we evaluate them against real-world misinformation using the COSMOS benchmark. The comparative study showed that our proposed CLIP-based Named Entity Swapping can lead to MMD models that surpass other OOC and NEI Misinformers in terms of multimodal accuracy, and that hybrid approaches can lead to even higher detection accuracy. Nevertheless, after alleviating information leakage from the COSMOS evaluation protocol, low Sensitivity scores indicate that the task is significantly more challenging than previous studies suggested. Finally, our findings showed that NEI-based Synthetic Misinformers tend to suffer from a unimodal bias, where text-only models can outperform multimodal ones.
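The abstract describes CLIP-based Named Entity Swapping only at a high level. As an illustration of the general idea behind NEI-style Synthetic Misinformers (replacing a named entity in a caption to create a cross-modally inconsistent pair), the sketch below is a minimal, hypothetical implementation: `swap_named_entity`, `toy_score`, and the candidate pool are all illustrative names, and `toy_score` merely stands in for a real CLIP-based similarity scorer, which the paper would use to rank candidate substitutes.

```python
def swap_named_entity(caption, entities, candidates, score):
    """Replace one named entity in `caption` with the candidate that the
    scorer ranks highest, producing a synthetic named-entity-inconsistent
    (NEI) caption for training an MMD model."""
    best = None  # (score, entity, candidate)
    for ent in entities:
        for cand in candidates:
            if cand == ent:
                continue  # swapping an entity for itself is not misinformation
            s = score(caption, ent, cand)
            if best is None or s > best[0]:
                best = (s, ent, cand)
    if best is None:
        return caption  # nothing to swap
    _, ent, cand = best
    return caption.replace(ent, cand, 1)

# Toy scorer standing in for CLIP similarity: prefer substitutes whose
# surface length is close to the original entity (purely illustrative).
def toy_score(caption, ent, cand):
    return -abs(len(ent) - len(cand))

print(swap_named_entity(
    "Barack Obama visits Paris",
    entities=["Barack Obama", "Paris"],
    candidates=["Angela Merkel", "Berlin"],
    score=toy_score,
))  # prints "Angela Merkel visits Paris"
```

In an actual pipeline, the scorer would embed the image and candidate captions with CLIP and select a substitute that keeps the caption plausible for the image while changing its meaning; the toy length heuristic here only demonstrates the swapping mechanics.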




Published In

MAD '23: Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation
June 2023
65 pages
ISBN:9798400701870
DOI:10.1145/3592572
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Comparative study
  2. Misinformation detection
  3. Multimodal learning
  4. Synthetic datasets

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference: ICMR '23


Article Metrics

  • Downloads (last 12 months): 170
  • Downloads (last 6 weeks): 12
Reflects downloads up to 01 Jan 2025

Cited By
  • (2024) Navigating the Multimodal Landscape: A Review on Integration of Text and Image Data in Machine Learning Architectures. Machine Learning and Knowledge Extraction 6(3), 1545–1563. DOI: 10.3390/make6030074. Online publication date: 9 Jul 2024.
  • (2024) Sniffer: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13052–13062. DOI: 10.1109/CVPR52733.2024.01240. Online publication date: 16 Jun 2024.
  • (2024) VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias. International Journal of Multimedia Information Retrieval 13(1). DOI: 10.1007/s13735-023-00312-6. Online publication date: 8 Jan 2024.
  • (2024) Counterfactual Multimodal Fact-Checking Method Based on Causal Intervention. Pattern Recognition and Computer Vision, 582–595. DOI: 10.1007/978-981-97-8620-6_40. Online publication date: 20 Oct 2024.
  • (2024) MMOOC: A Multimodal Misinformation Dataset for Out-of-Context News Analysis. Information Security and Privacy, 444–459. DOI: 10.1007/978-981-97-5101-3_24. Online publication date: 15 Jul 2024.
  • (2023) Scoping Review on Image-Text Multimodal Machine Learning Models. 2023 International Conference on Computational Science and Computational Intelligence (CSCI), 186–192. DOI: 10.1109/CSCI62032.2023.00035. Online publication date: 13 Dec 2023.
