research-article

Automatic and Manual Detection of Generated News: Case Study, Limitations and Challenges

Authors: Jérémie Bogaert, Marie-Catherine de Marneffe, Antonin Descampe, Francois-Xavier Standaert

MAD '22: Proceedings of the 1st International Workshop on Multimedia AI against Disinformation

Pages 18 - 26

https://doi.org/10.1145/3512732.3533589

Published: 27 June 2022

Abstract

In this paper, we study the exploitation of language generation models for disinformation purposes from two viewpoints. Quantitatively, we argue that language models struggle with domain adaptation (i.e., the ability to generate text on topics that are not part of the training database, as is typically required for news). To support this claim, we show that both simple machine learning models and manual detection can spot machine-generated news in this practically relevant context. Qualitatively, we highlight the differences between these automatic and manual detection processes, and their potential for constructive interaction to limit the impact of automatic disinformation campaigns. We also discuss the consequences of these findings for the constructive use of natural language generation to produce news items.
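To make "simple machine learning models" concrete, the following is a minimal sketch of one possible detector: a multinomial Naive Bayes classifier over bag-of-words counts. It is an illustrative assumption, not the pipeline used in the paper; the class name `NaiveBayesDetector`, the whitespace tokenizer, and the toy training strings are all hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesDetector:
    """Multinomial Naive Bayes over word counts with add-one smoothing.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all words seen during training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each class for `text`."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                smoothed = self.word_counts[c][tok] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, hand-written strings for illustration only (not data from the paper).
# The repetitive "generated" examples are a caricature, not a claim about
# what machine-generated news actually looks like.
train_texts = [
    "the council voted on the budget on tuesday",
    "the minister announced a new rail line on monday",
    "the the budget budget was was voted voted",
    "a new new rail rail line line announced announced",
]
train_labels = ["human", "human", "generated", "generated"]

detector = NaiveBayesDetector().fit(train_texts, train_labels)
prediction = detector.predict("the rail rail budget budget")
```

In a realistic setting, the training texts would be news articles labeled by origin, and the held-out articles would cover topics absent from the generator's training data, which is the domain-adaptation setting the abstract describes.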

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document preparation
      1. Annotation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
    2. Machine learning approaches

Information & Contributors

Information

Published In

MAD '22: Proceedings of the 1st International Workshop on Multimedia AI against Disinformation

June 2022

93 pages

ISBN:9781450392426

DOI:10.1145/3512732

Program Chairs: Bogdan Ionescu (University Politehnica of Bucharest, Romania), Giorgos Kordopatis-Zilos (ITI-CERTH, Greece), Symeon Papadopoulos (ITI-CERTH, Greece), Adrian Popescu (CEA LIST, France), Luca Cuccovillo (Fraunhofer IDMT, Germany)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICMR '22: International Conference on Multimedia Retrieval

June 27 - 30, 2022

Newark, NJ, USA

Sponsor: SIGMM
