Fine-Tuning LLMs for Multi-Turn Dialogues: Optimizing Cross-Entropy Loss with KL Divergence for All Rounds of Responses

Published: 07 June 2024

Abstract

Large language models (LLMs) have shown strong capabilities in natural language generation. To further enhance their ability in multi-turn dialogue generation, supervised fine-tuning (SFT) methods based on multi-turn dialogue data have been widely adopted. These methods aim to improve the understanding of dialogue context and enable LLMs to generate appropriate responses by incorporating historical dialogue information. However, current multi-turn dialogue SFT methods have drawbacks: they either cannot fully utilize the multi-turn dialogue data or require a long training time. This paper proposes a sufficient, efficient, and effective fine-tuning method for multi-turn dialogue data. The proposed method optimizes the cross-entropy loss together with a KL divergence term, takes the entire multi-turn dialogue as input, and computes the loss over all rounds of responses. Compared with the commonly used SFT approach that computes the loss only on the last-turn response during training, our method uses the training data fully by digesting the entire conversation and updating weights with the loss from every round's response. Compared with the SFT method that splits a multi-turn dialogue into numerous progressively longer conversations for training, the proposed multi-turn dialogue SFT method trains LLMs more efficiently, significantly reducing the training time required. Meanwhile, the trained model can mimic the chat style of the training set, as the KL divergence keeps the generated output consistent with the expected response. Experimental results on the doc2dial dataset and a medical dataset demonstrate that the proposed method outperforms traditional fine-tuning methods in terms of ROUGE and BLEU scores as well as overall training time. These results show that the proposed method achieves sufficient and efficient training of large language models on multi-turn dialogue data.
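
To make the described objective concrete, the following is a minimal sketch (not the authors' released code) of a loss that computes cross-entropy over every response round of a multi-turn conversation plus a KL term. The pairing of the KL divergence with a frozen reference model, and names such as `response_mask`, `ref_model`, and `kl_weight`, are assumptions for illustration; the paper's exact formulation may differ.

```python
# Sketch of the loss described above: cross-entropy over ALL assistant
# responses in the conversation, plus a KL term that keeps the fine-tuned
# model's output distribution close to a frozen reference distribution.
# `response_mask`, `ref_model`, and `kl_weight` are illustrative assumptions.
import torch
import torch.nn.functional as F

def multi_turn_sft_loss(model, ref_model, input_ids, response_mask, kl_weight=0.1):
    """input_ids: (B, T) tokens of the whole multi-turn dialogue.
    response_mask: (B, T), 1 where a token belongs to any assistant response."""
    logits = model(input_ids).logits                 # (B, T, V), HF-style causal LM
    with torch.no_grad():
        ref_logits = ref_model(input_ids).logits     # frozen reference model

    # Shift so position t predicts token t+1 (standard causal LM training).
    pred, ref = logits[:, :-1], ref_logits[:, :-1]
    labels, mask = input_ids[:, 1:], response_mask[:, 1:].float()

    # Cross-entropy on every response token of every round, not just the last turn.
    ce = F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                         labels.reshape(-1), reduction="none").view(labels.shape)
    ce = (ce * mask).sum() / mask.sum().clamp(min=1)

    # KL divergence on the same positions, pulling the generated distribution
    # toward the expected-response (reference) distribution.
    kl = F.kl_div(F.log_softmax(pred, dim=-1), F.log_softmax(ref, dim=-1),
                  reduction="none", log_target=True).sum(-1)
    kl = (kl * mask).sum() / mask.sum().clamp(min=1)

    return ce + kl_weight * kl
```

In this sketch, the commonly used last-turn-only baseline corresponds to a `response_mask` that is nonzero only for the final response, so switching to training on all rounds is entirely a matter of how the mask is constructed.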



    Published In

    ICMLC '24: Proceedings of the 2024 16th International Conference on Machine Learning and Computing
    February 2024
    757 pages
    ISBN: 9798400709234
    DOI: 10.1145/3651671

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Large Language Models
    2. Multi-turn Dialogue
    3. Supervised Fine-Tuning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMLC 2024
