Computer Science > Computation and Language

arXiv:2211.08714 (cs)

[Submitted on 16 Nov 2022 (v1), last revised 1 Jun 2023 (this version, v3)]

Title: Reward Gaming in Conditional Text Generation

Authors: Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

Abstract: To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring spurious correlation, and covariate shift. We show that even though learned metrics achieve high performance on the distribution of the data used to train the reward function, the undesirable patterns may be amplified during RL training of the text generation model. While there has been discussion about reward gaming in the RL or safety community, in this discussion piece, we would like to highlight reward gaming in the natural language generation (NLG) community using concrete conditional text generation examples and discuss potential fixes and areas for future work.

Comments:	ACL 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2211.08714 [cs.CL]
	(or arXiv:2211.08714v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2211.08714

Submission history

From: Richard Yuanzhe Pang
[v1] Wed, 16 Nov 2022 07:10:02 UTC (6,508 KB)
[v2] Thu, 16 Feb 2023 06:42:17 UTC (6,512 KB)
[v3] Thu, 1 Jun 2023 06:30:59 UTC (6,519 KB)
