
DOI: 10.1145/3604237.3626838
research-article

Fine-Tuning Pretrained Language Models to Enhance Dialogue Summarization in Customer Service Centers

Published: 25 November 2023

Abstract

The application of pretrained language models (PLMs) in real-world business domains has gained significant attention. However, research on the practical use of generative artificial intelligence (AI) to address real-world downstream tasks is limited. This study aims to enhance the routine tasks of customer service (CS) representatives, particularly in the finance domain, by applying a fine-tuning method to dialogue summarization in CS centers. KakaoBank handles an average of 15,000 CS calls daily. By employing a fine-tuning method using real-world CS dialogue data, we can reduce the time required to summarize CS dialogues and standardize summarization skills. To ensure effective dialogue summarization in the finance domain, pretrained language models should acquire additional knowledge and skills, such as specific knowledge of financial products, problem-solving abilities, and the capacity to handle emotionally charged customers. In this study, we developed a reference fine-tuned model using Polyglot-Ko (5.8B) as the baseline PLM and a dataset containing a wide range of zero-shot instructions, only part of which were summarization instructions. We compared this reference model with another model fine-tuned using KakaoBank’s CS dialogues and summarization data as the instruct dataset. The results demonstrated that the fine-tuned model based on KakaoBank’s internal datasets outperformed the reference model, showing 199% and 12% improvements in ROUGE-L and RDASS, respectively. This study emphasizes the significance of task-specific fine-tuning using appropriate instruct datasets for effective performance in specific downstream tasks. Considering its practical use, we suggest that fine-tuning with real-world instruct datasets is a powerful and cost-effective technique for developing generative AI in the business domain.
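
The abstract does not include implementation details. As an illustration only, the sketch below shows what instruction-style fine-tuning of a causal PLM such as Polyglot-Ko (5.8B) for dialogue summarization could look like with the Hugging Face `transformers` Trainer; the model ID, prompt template, data file `cs_dialogue_summaries.jsonl`, and hyperparameters are assumptions for the sketch, not the authors' actual setup.

```python
# Minimal sketch: instruction-style fine-tuning of a causal LM for dialogue
# summarization. All names below (model ID, prompt format, file name,
# hyperparameters) are illustrative assumptions, not the paper's configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_ID = "EleutherAI/polyglot-ko-5.8b"  # assumed public Polyglot-Ko checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

PROMPT = ("### Instruction:\nSummarize the following customer service dialogue.\n\n"
          "### Dialogue:\n{dialogue}\n\n### Summary:\n{summary}")

def to_features(example):
    # Concatenate instruction, dialogue, and reference summary into one training string.
    text = PROMPT.format(**example) + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

# Hypothetical JSONL file of {"dialogue": ..., "summary": ...} records.
dataset = load_dataset("json", data_files="cs_dialogue_summaries.jsonl")["train"]
dataset = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="polyglot-ko-cs-summarizer",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=3,
                           learning_rate=1e-5,
                           bf16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Generated summaries from such a model could then be scored against reference summaries with any standard ROUGE implementation (ROUGE-L) and, as in the paper, with RDASS; the exact evaluation pipeline used by the authors is not described on this page.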


Cited By

  • (2024) Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy. In Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1–7. https://doi.org/10.1145/3640794.3665563 (online publication date: 8 July 2024)



        Published In

        ICAIF '23: Proceedings of the Fourth ACM International Conference on AI in Finance
        November 2023
        697 pages
        ISBN:9798400702402
        DOI:10.1145/3604237
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 25 November 2023


        Author Tags

        1. Korean language model
        2. dialogue summarization
        3. fine-tuning
        4. instruct tuning

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        ICAIF '23


        Article Metrics

        • Downloads (last 12 months): 182
        • Downloads (last 6 weeks): 5

        Reflects downloads up to 14 Feb 2025

