
DOI: 10.1145/3637528.3671463
Tutorial
Open access

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

Published: 24 August 2024

Abstract

Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread use introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges and explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.
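
To make the detection problem concrete, the sketch below illustrates the statistical, zero-shot flavor of detection that tools such as GLTR, GPTZero, and DetectGPT build on: score a passage by its average token log-likelihood under a proxy language model and flag text that is unusually predictable. This is a minimal illustrative sketch, not the tutorial's method; the choice of GPT-2 as the scoring model, the 512-token truncation, and the decision threshold are all assumptions.

# Minimal sketch of zero-shot statistical detection (illustrative only).
# Premise: machine-generated text tends to be assigned higher likelihood
# by a language model than typical human writing.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # proxy scoring model
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    """Mean per-token log-likelihood of `text` under the proxy model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # out.loss is the mean negative log-likelihood

THRESHOLD = -3.0  # hypothetical cutoff; in practice tuned on labeled data

def looks_machine_generated(text: str) -> bool:
    return avg_log_likelihood(text) > THRESHOLD

Simple likelihood thresholds of this kind are precisely what paraphrasing and prompt-guided evasion attacks undermine, which is why the tutorial also covers watermarking and the theoretical limits of detection.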




Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN: 979-8-4007-0490-1
DOI: 10.1145/3637528
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ai-generated text detection
  2. data poisoning
  3. llm
  4. paraphrasing attacks
  5. responsible ai
  6. watermarking

Qualifiers

  • Tutorial

Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


