Fine-Tuning LLMs for Multi-Turn Dialogues: Optimizing Cross-Entropy Loss with KL Divergence for All Rounds of Responses

Published: 07 June 2024

Abstract

Large language models (LLMs) have shown strong capabilities in natural language generation. To further enhance their ability in multi-turn dialogue generation, supervised fine-tuning (SFT) methods based on multi-turn dialogue data have been widely adopted. These methods aim to improve the understanding of dialogue context and enable LLMs to generate appropriate responses by incorporating historical dialogue information. However, current multi-turn dialogue SFT methods have drawbacks: they either cannot fully utilize the multi-turn dialogue data or require a long training time. This paper proposes a sufficient, efficient, and effective fine-tuning method for multi-turn dialogue data. The proposed method optimizes the cross-entropy loss together with a KL divergence term, takes the entire multi-turn dialogue as input, and computes the loss over all rounds of responses. Compared with the commonly used SFT approach that computes the loss only on the last-turn response during training, our method uses the training data fully by digesting the entire conversation and updating weights with the loss from every round's response. Compared with the SFT method that splits a multi-turn dialogue into numerous progressively longer conversations for training, the proposed multi-turn dialogue SFT method trains LLMs more efficiently, significantly reducing the training time required. Meanwhile, the trained model can mimic the chat style of the training set, as the KL divergence keeps the generated output consistent with the expected response. Experimental results on the doc2dial dataset and a medical dataset demonstrate that the proposed method outperforms traditional fine-tuning methods in terms of ROUGE and BLEU scores as well as overall training time. These results show that the proposed method achieves sufficient and efficient training of large language models on multi-turn dialogue data.
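
To make the described objective concrete, the following is a minimal sketch (not the authors' released code) of a loss that computes cross-entropy over every response round of a multi-turn conversation plus a KL term. The pairing of the KL divergence with a frozen reference model, and names such as `response_mask`, `ref_model`, and `kl_weight`, are assumptions for illustration; the paper's exact formulation may differ.

```python
# Sketch of the loss described above: cross-entropy over ALL assistant
# responses in the conversation, plus a KL term that keeps the fine-tuned
# model's output distribution close to a frozen reference distribution.
# `response_mask`, `ref_model`, and `kl_weight` are illustrative assumptions.
import torch
import torch.nn.functional as F

def multi_turn_sft_loss(model, ref_model, input_ids, response_mask, kl_weight=0.1):
    """input_ids: (B, T) tokens of the whole multi-turn dialogue.
    response_mask: (B, T), 1 where a token belongs to any assistant response."""
    logits = model(input_ids).logits                 # (B, T, V), HF-style causal LM
    with torch.no_grad():
        ref_logits = ref_model(input_ids).logits     # frozen reference model

    # Shift so position t predicts token t+1 (standard causal LM training).
    pred, ref = logits[:, :-1], ref_logits[:, :-1]
    labels, mask = input_ids[:, 1:], response_mask[:, 1:].float()

    # Cross-entropy on every response token of every round, not just the last turn.
    ce = F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                         labels.reshape(-1), reduction="none").view(labels.shape)
    ce = (ce * mask).sum() / mask.sum().clamp(min=1)

    # KL divergence on the same positions, pulling the generated distribution
    # toward the expected-response (reference) distribution.
    kl = F.kl_div(F.log_softmax(pred, dim=-1), F.log_softmax(ref, dim=-1),
                  reduction="none", log_target=True).sum(-1)
    kl = (kl * mask).sum() / mask.sum().clamp(min=1)

    return ce + kl_weight * kl
```

In this sketch, the commonly used last-turn-only baseline corresponds to a `response_mask` that is nonzero only for the final response, so switching to training on all rounds is entirely a matter of how the mask is constructed.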



    Published In

    ICMLC '24: Proceedings of the 2024 16th International Conference on Machine Learning and Computing
    February 2024
    757 pages
    ISBN: 9798400709234
    DOI: 10.1145/3651671

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Large Language Models
    2. Multi-turn Dialogue
    3. Supervised Fine-Tuning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMLC 2024
