Paper 2023/1661

Publicly-Detectable Watermarking for Language Models

Jaiden Fairoze, University of California, Berkeley
Sanjam Garg, University of California, Berkeley
Somesh Jha, University of Wisconsin–Madison
Saeed Mahloujifar, Fundamental Artificial Intelligence Research at Meta
Mohammad Mahmoody, University of Virginia
Mingyuan Wang, New York University Shanghai
Abstract

We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
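The rejection-sampling idea in the abstract can be illustrated with a toy sketch. This is not the paper's construction. It assumes a hypothetical uniform sampler (`sample_block`) in place of a real language model, a made-up vocabulary and block length, and a fixed bit string standing in for the signature bits. Each block of tokens is resampled until a public hash of the block yields the desired bit, so anyone can recover the bits without a secret.

```python
import hashlib
import random

# Toy vocabulary and block length; both are assumptions for illustration.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "tree"]
BLOCK_LEN = 4  # tokens used to carry one embedded bit


def block_bit(tokens):
    """Map a block of tokens to one bit using a public hash (SHA-256)."""
    digest = hashlib.sha256(" ".join(tokens).encode("utf-8")).digest()
    return digest[0] & 1


def sample_block(rng):
    """Hypothetical stand-in for sampling BLOCK_LEN tokens from a language model."""
    return [rng.choice(VOCAB) for _ in range(BLOCK_LEN)]


def embed_bits(bits, rng, max_tries=1000):
    """Rejection-sample each block until its hash bit matches the target bit."""
    out = []
    for bit in bits:
        for _ in range(max_tries):
            block = sample_block(rng)
            if block_bit(block) == bit:
                out.extend(block)
                break
        else:
            # With too little entropy no candidate block may match.
            raise RuntimeError("could not embed bit within max_tries")
    return out


def extract_bits(tokens):
    """Recover the embedded bits; uses only the public hash, no secret."""
    return [
        block_bit(tokens[i:i + BLOCK_LEN])
        for i in range(0, len(tokens), BLOCK_LEN)
    ]


payload = [1, 0, 1, 1, 0, 0, 1, 0]  # stands in for signature bits
text = embed_bits(payload, random.Random(0))
recovered = extract_bits(text)
```

In this sketch `recovered` equals `payload` by construction, because every accepted block hashes to its target bit. A real scheme would embed an actual public-key signature and would have to handle low-entropy stretches where few candidate blocks exist, which is the role the abstract assigns to error correction.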

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Published by the IACR in CIC 2024
Keywords
public-detectability, watermarking, large language models, cryptographic protocols, provable security, machine learning
Contact author(s)
fairoze @ berkeley edu
sanjamg @ berkeley edu
jha @ cs wisc edu
saeedm @ meta com
mohammad @ virginia edu
mingyuan wang @ nyu edu
History
2025-01-04: last of 4 revisions
2023-10-26: received
Short URL
https://ia.cr/2023/1661
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2023/1661,
      author = {Jaiden Fairoze and Sanjam Garg and Somesh Jha and Saeed Mahloujifar and Mohammad Mahmoody and Mingyuan Wang},
      title = {Publicly-Detectable Watermarking for Language Models},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1661},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1661}
}