DOI: 10.1145/3587259.3627566 · K-CAP Conference Proceedings · Research article

Capturing Pertinent Symbolic Features for Enhanced Content-Based Misinformation Detection

Published: 05 December 2023

Abstract

Preventing the spread of misinformation is challenging. Detecting misleading content is a significant hurdle due to its extreme linguistic and domain variability. Content-based models have managed to identify deceptive language by learning representations from textual data such as social media posts and web articles. However, aggregating representative samples of this heterogeneous phenomenon and implementing effective real-world applications remains elusive. Building on analytical work on the language of misinformation, this paper examines the linguistic attributes that characterize the phenomenon and how well some of the most popular misinformation datasets represent those attributes. We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models helps detect misleading content. Our approach achieves state-of-the-art performance across misinformation datasets, offering a valid and robust alternative to multi-task transfer learning without requiring any additional training data. Furthermore, our results provide evidence that structured knowledge can deliver the extra boost needed to address a complex, unpredictable real-world problem like misinformation detection, not only in accuracy but also in time efficiency and resource utilization.
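The core idea in the abstract, combining symbolic linguistic features with a neural language model's representation, can be sketched as a simple feature-fusion step. The sketch below is illustrative only: the feature set (average word length, exclamation density, uppercase-word ratio) and the concatenation-based fusion are assumptions for the example, not the paper's exact pipeline.

```python
import re

def symbolic_features(text: str) -> list[float]:
    """Toy surface-level symbolic features of the kind studied in
    misinformation analysis (illustrative, not the paper's feature set)."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    avg_word_len = sum(len(w) for w in words) / n
    exclam_density = text.count("!") / max(len(text), 1)
    upper_ratio = sum(1 for w in words if w.isupper() and len(w) > 1) / n
    return [avg_word_len, exclam_density, upper_ratio]

def fuse(embedding: list[float], text: str) -> list[float]:
    """Concatenate a (precomputed) neural sentence embedding with the
    symbolic feature vector -- one simple fusion strategy; the combined
    vector would then feed a downstream classifier."""
    return embedding + symbolic_features(text)

# Usage: a dummy 4-dim embedding fused with 3 symbolic features -> 7 dims
vec = fuse([0.1, -0.2, 0.3, 0.05], "SHOCKING!!! You won't BELIEVE this!")
print(len(vec))  # 7
```

In practice the embedding would come from a transformer encoder (e.g. a RoBERTa-style model), but the fusion step itself stays this simple, which is consistent with the abstract's point about time efficiency and resource utilization.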



    Published In

    K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023
    December 2023
    270 pages
    ISBN:9798400701412
    DOI:10.1145/3587259
    Editors: Brent Venable, Daniel Garijo, Brian Jalaian

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. adapters
    2. deception
    3. large language models
    4. misinformation
    5. neural networks
    6. symbolic models
    7. transfer learning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    K-CAP '23: Knowledge Capture Conference 2023
    December 5–7, 2023
    Pensacola, FL, USA

    Acceptance Rates

    Overall acceptance rate: 55 of 198 submissions (28%)
