Computer Science > Computation and Language

arXiv:2403.01197 (cs)

[Submitted on 2 Mar 2024 (v1), last revised 28 Apr 2024 (this version, v2)]

Title: DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling

Abstract: The performance of the reward model (RM) is a critical factor in the effectiveness of large language model (LLM) alignment fine-tuning. Two challenges remain in RM training: 1) training a single RM on data from various categories may degrade its generalization performance through multi-task disturbance, and 2) the human annotation consistency rate is generally only $60\%$ to $75\%$, so the training data contains substantial noise. To tackle these two challenges, we introduce the idea of Mixture-of-Experts (MoE) into the field of RM for the first time. We propose the Double-Layer MoE RM (DMoERM). The outer layer MoE is a sparse model: after classifying an input into a task category, we route it to the corresponding inner layer task-specific model. The inner layer MoE is a dense model: we decompose the specific task into multiple capability dimensions and individually fine-tune a LoRA expert on each one. The experts' outputs are then synthesized by an MLP to compute the final reward. To minimize costs, we call a public LLM API to obtain the capability preference labels. Validation on manually labeled datasets confirms that our model attains superior consistency with human preference and outperforms advanced generative approaches. Meanwhile, through best-of-n (BoN) sampling and RL experiments, we demonstrate that our model outperforms state-of-the-art RM ensemble methods and mitigates the overoptimization problem. Our code and dataset are available at: this https URL.
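
The double-layer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LoRA experts are replaced by plain scoring functions, the MLP uses hand-picked weights, and every name here (`InnerDenseRM`, `DoubleLayerRM`, `toy_classify`, the toy experts) is a hypothetical stand-in chosen for illustration. It only shows the control flow: a sparse outer layer routes each input to one task-specific model, a dense inner layer runs all capability experts and merges their scores, and best-of-n sampling keeps the highest-reward candidate.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a LoRA expert: maps (prompt, response) to one
# capability score. A real expert would be a fine-tuned LLM adapter.
Expert = Callable[[str, str], float]


class InnerDenseRM:
    """Dense inner layer: every capability expert runs, a small MLP merges the scores."""

    def __init__(self, experts: Sequence[Expert],
                 w1: List[List[float]], b1: List[float],
                 w2: List[float], b2: float) -> None:
        self.experts = list(experts)
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def reward(self, prompt: str, response: str) -> float:
        scores = [expert(prompt, response) for expert in self.experts]
        # One hidden layer with tanh activation, then a linear output.
        hidden = [math.tanh(sum(w * s for w, s in zip(row, scores)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


class DoubleLayerRM:
    """Sparse outer layer: exactly one task-specific inner model is used per input."""

    def __init__(self, classify: Callable[[str], str],
                 inner: Dict[str, InnerDenseRM]) -> None:
        self.classify = classify
        self.inner = inner

    def reward(self, prompt: str, response: str) -> float:
        return self.inner[self.classify(prompt)].reward(prompt, response)


def best_of_n(rm: DoubleLayerRM, prompt: str, candidates: Sequence[str]) -> str:
    """Best-of-n (BoN) sampling: keep the candidate with the highest reward."""
    return max(candidates, key=lambda c: rm.reward(prompt, c))


# --- Toy components, assumptions for illustration only ---
def toy_classify(prompt: str) -> str:
    return "code" if "code" in prompt.lower() else "chat"


def length_expert(prompt: str, response: str) -> float:
    return min(len(response.split()) / 10.0, 1.0)


def politeness_expert(prompt: str, response: str) -> float:
    return 1.0 if "please" in response.lower() else 0.0


def has_def_expert(prompt: str, response: str) -> float:
    return 1.0 if "def " in response else 0.0


chat_rm = InnerDenseRM([length_expert, politeness_expert],
                       w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                       w2=[0.5, 0.5], b2=0.0)
code_rm = InnerDenseRM([has_def_expert],
                       w1=[[1.0]], b1=[0.0], w2=[1.0], b2=0.0)
rm = DoubleLayerRM(toy_classify, {"chat": chat_rm, "code": code_rm})

best = best_of_n(rm, "How do I reset my password?",
                 ["ok", "Sure, please see the steps below."])
```

With these toy experts, `best` is the longer, more polite candidate. In a trained system the MLP weights would be learned rather than fixed by hand.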

Comments:	23 pages, 8 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.01197 [cs.CL]
	(or arXiv:2403.01197v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.01197

Submission history

From: Shanghaoran Quan
[v1] Sat, 2 Mar 2024 12:31:22 UTC (7,463 KB)
[v2] Sun, 28 Apr 2024 03:24:41 UTC (7,510 KB)
