DOI: 10.5555/3698900.3699002

REMARK-LLM: a robust and efficient watermarking framework for generative large language models

Published: 12 August 2024

Abstract

We present REMARK-LLM, a novel, efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content with LLMs requires vast computational resources and extensive datasets, encapsulating critical intellectual property (IP). However, the generated content is prone to malicious exploitation, including spamming and plagiarism. To address these challenges, REMARK-LLM introduces three new components: (i) a learning-based message encoding module that infuses binary signatures into LLM-generated texts; (ii) a reparameterization module that transforms the dense distributions from the message encoding into the sparse distribution of the watermarked textual tokens; and (iii) a decoding module dedicated to signature extraction. In addition, we introduce an optimized beam search algorithm to generate content with coherence and consistency. REMARK-LLM is rigorously trained to preserve the semantic integrity of watermarked content while ensuring effective watermark retrieval. Extensive evaluations on multiple unseen datasets highlight REMARK-LLM's proficiency and transferability, inserting 2× more signature bits into the same texts than prior art while maintaining semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against a spectrum of watermark detection and removal attacks.
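The reparameterization step above maps a dense distribution from the message encoder to a near-one-hot (sparse) distribution over vocabulary tokens. A standard way to realize such a dense-to-sparse relaxation is the Gumbel-softmax trick; the sketch below illustrates that general mechanism only and is not the paper's implementation. The function name, toy vocabulary size, logit values, and temperature are all illustrative assumptions.

```python
import math
import random

def gumbel_softmax(logits, temperature=0.5, seed=0):
    """Perturb logits with Gumbel noise, then apply a temperature-scaled
    softmax; low temperature pushes the result toward a one-hot vector."""
    rng = random.Random(seed)
    # Gumbel(0, 1) samples: -log(-log(U)) with U ~ Uniform(0, 1).
    noisy = [l - math.log(-math.log(rng.random() + 1e-20) + 1e-20)
             for l in logits]
    scaled = [n / temperature for n in noisy]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical dense encoder output over a toy 5-token vocabulary:
dense = [1.2, 0.3, 2.5, -0.7, 0.1]
sparse = gumbel_softmax(dense, temperature=0.2)
# The relaxed distribution sums to 1 and concentrates its mass on
# very few tokens, approximating a discrete token choice.
```

Lowering `temperature` sharpens the output toward one-hot while keeping the mapping differentiable, which is what lets a watermark encoder be trained end-to-end through a discrete token-selection step.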



Published In

SEC '24: Proceedings of the 33rd USENIX Conference on Security Symposium
August 2024
7480 pages
ISBN: 978-1-939133-44-1

Sponsors

  • Bloomberg Engineering
  • Google Inc.
  • NSF
  • Futurewei Technologies
  • IBM

Publisher

USENIX Association

United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 40 of 100 submissions, 40%
