Paper 2023/1661
Publicly-Detectable Watermarking for Language Models
Abstract
We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
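The core mechanism described in the abstract, embedding signature bits into generated text via rejection sampling over the model's token choices, can be illustrated with a minimal sketch. The code below is not the paper's construction; it only demonstrates the underlying idea of rejecting sampled tokens until a public hash of the token encodes the next bit to be embedded, so that anyone can recover the bits without secret information. All names (`sample_token`, `token_bit`, `embed_bits`) and the toy uniform sampler are hypothetical stand-ins.

```python
# Illustrative sketch of bit embedding via rejection sampling.
# Hypothetical helper names; the uniform sampler stands in for an LM.
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary

def sample_token(context: str) -> str:
    """Stand-in for an LM's next-token sampler (here: uniform over VOCAB)."""
    return random.choice(VOCAB)

def token_bit(token: str) -> int:
    """Publicly computable pseudorandom bit derived from the token."""
    return hashlib.sha256(token.encode()).digest()[0] & 1

def embed_bits(bits, context: str = "", max_tries: int = 64):
    """Embed each bit by rejection-sampling tokens whose hash bit matches."""
    out = []
    for b in bits:
        for _ in range(max_tries):
            t = sample_token(context + " ".join(out))
            if token_bit(t) == b:  # accept only when the token encodes bit b
                out.append(t)
                break
        else:
            # Low-entropy step: no matching token found. The paper handles
            # such failures with error-correcting codes during detection.
            out.append(sample_token(context))
    return out

def extract_bits(tokens):
    """Detection uses no secret information: recompute each token's bit."""
    return [token_bit(t) for t in tokens]

if __name__ == "__main__":
    signature_bits = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for signature bits
    text = embed_bits(signature_bits)
    assert extract_bits(text) == signature_bits
```

In the actual scheme, the embedded bits come from a publicly-verifiable cryptographic signature and the construction is proven unforgeable and distortion-free; the sketch only conveys how rejection sampling lets detection proceed with public information alone.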
Metadata
- Available format(s)
- PDF
- Category
- Applications
- Publication info
- Published by the IACR in CIC 2024
- Keywords
- public-detectability, watermarking, large language models, cryptographic protocols, provable security, machine learning
- Contact author(s)
-
fairoze @ berkeley edu
sanjamg @ berkeley edu
jha @ cs wisc edu
saeedm @ meta com
mohammad @ virginia edu
mingyuan wang @ nyu edu
- History
- 2025-01-04: last of 4 revisions
- 2023-10-26: received
- Short URL
- https://ia.cr/2023/1661
- License
- CC BY
BibTeX
@misc{cryptoeprint:2023/1661,
      author = {Jaiden Fairoze and Sanjam Garg and Somesh Jha and Saeed Mahloujifar and Mohammad Mahmoody and Mingyuan Wang},
      title = {Publicly-Detectable Watermarking for Language Models},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1661},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1661}
}