A Grounded Preference Model for LLM Alignment

Tahira Naseem, Guangxuan Xu, Sarathkrishna Swaminathan, Asaf Yehudai, Subhajit Chaudhury, Radu Florian, Ramón Astudillo, Asim Munawar

Abstract

Despite recent advances, LLMs still suffer from factual inconsistency and hallucination. A common remedy is retrieval-augmented generation; however, there is no guarantee that the model will strictly adhere to the retrieved grounding. Fundamentally, LLMs need to be aligned to be more faithful to grounding, which requires high-quality preference annotations. This paper investigates whether we can create high-quality grounded preference data for model alignment without using annotations from humans or large proprietary models. We experiment with existing entailment data and propose approaches to generate synthetic grounded preference data, with which we train a Grounded Preference Model (GPM). We demonstrate through Proximal Policy Optimization (PPO) training of Mistral-7B-Instruct that our GPM can successfully align powerful LLMs to generate substantially better-grounded responses, as judged by GPT-4. Moreover, we show that our GPM is also a strong faithfulness classifier, achieving state-of-the-art results on the dialogue sub-tasks of the TRUE faithfulness benchmark. We will release our GPM under the Apache 2.0 license.

Anthology ID: 2024.findings-acl.10
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 151–162
URL: https://aclanthology.org/2024.findings-acl.10
DOI: 10.18653/v1/2024.findings-acl.10
Cite (ACL): Tahira Naseem, Guangxuan Xu, Sarathkrishna Swaminathan, Asaf Yehudai, Subhajit Chaudhury, Radu Florian, Ramón Astudillo, and Asim Munawar. 2024. A Grounded Preference Model for LLM Alignment. In Findings of the Association for Computational Linguistics: ACL 2024, pages 151–162, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): A Grounded Preference Model for LLM Alignment (Naseem et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.10.pdf
