CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation

Zhijie Qu, Juan Li, Zerui Ma & Jianqiang Li

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14646)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Medical dialogue generation relies on natural language generation techniques to enable online medical consultations. The recent widespread adoption of large-scale models in natural language processing has driven rapid progress in this technology. Existing medical dialogue models are mostly based on BERT and pre-trained on English corpora, and high-performing models for Chinese medical dialogue generation are lacking. To address this gap, this paper proposes CMed-GPT, a GPT language model pre-trained on Chinese medical-domain text. The model is available in two versions, base and large, with perplexity (PPL) values of 8.64 and 8.01, respectively. Additionally, we incorporate lexical and entity embeddings into the dialogue text in a uniform manner to meet the requirements of downstream dialogue generation tasks. By applying both fine-tuning and p-tuning to CMed-GPT, we lowered the PPL from 8.44 to 7.35. This study not only confirms the strong performance of CMed-GPT in generating Chinese biomedical text but also highlights the advantages of p-tuning over traditional fine-tuning with prefix prompts. Furthermore, we validate that incorporating external information enhances the quality of medical dialogue generation.
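
The perplexity figures quoted in the abstract can be read against the standard definition of the metric: the exponential of the mean negative log-likelihood per token. The sketch below illustrates that definition only. It is not the authors' evaluation code, and the function name `perplexity` and the sample log-probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over a token sequence.

    token_log_probs: natural-log probabilities the model assigned to each
    reference token (hypothetical input format, assumed for illustration).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/8 to every reference token has a
# perplexity of about 8, i.e. it is as uncertain as a uniform choice
# among 8 candidates at each step.
uniform_like = [math.log(1 / 8)] * 4
print(perplexity(uniform_like))
```

Under this definition a lower value means the model assigns higher probability to the reference text, which is why the reported drop from 8.44 to 7.35 is read as an improvement.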

Author information

Authors and Affiliations

Beijing University of Technology, Beijing, 100124, China
Zhijie Qu, Juan Li, Zerui Ma & Jianqiang Li

Corresponding author

Correspondence to Zhijie Qu.

Editor information

Editors and Affiliations

Taipei, Taiwan
De-Nian Yang
Microsoft Research Asia, Beijing, China
Xing Xie
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Duke University, Durham, NC, USA
Jian Pei
National Cheng Kung University, Tainan, Taiwan
Jen-Wei Huang
Silesian University of Technology, Gliwice, Poland
Jerry Chun-Wei Lin

About this paper

Cite this paper

Qu, Z., Li, J., Ma, Z., Li, J. (2024). CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_7

DOI: https://doi.org/10.1007/978-981-97-2253-2_7
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2
eBook Packages: Computer Science, Computer Science (R0)