Evaluation of Text Cluster Naming with Generative Large Language Models
Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 376–392
Pub. online: 26 August 2024
Type: Data Science In Action
Open Access
Received: 30 November 2023
Accepted: 20 July 2024
Published: 26 August 2024
Abstract
Text clustering can streamline many labor-intensive tasks, but it creates a new challenge: efficiently labeling and interpreting the clusters. Generative large language models (LLMs) are a promising option for automating the naming of text clusters, which could significantly accelerate workflows, especially in domains with large datasets and esoteric language. In this study, we assessed the ability of GPT-3.5-turbo to generate names for clusters of texts and compared these to human-generated cluster names. We clustered two benchmark datasets, each from a specialized domain: research abstracts and clinical patient notes. We generated names for each cluster using four prompting strategies (different ways of including information about the cluster in the prompt sent to the LLM). For both datasets, the best prompting strategy beat the manual approach across all quality domains, although name quality varied by prompting strategy and dataset. We conclude that practitioners should consider automated cluster naming to avoid bottlenecks, or when the scale of the effort is large enough to take advantage of the cost savings offered by automation, as detailed in our supplemental blueprint for using LLM cluster naming. To get the best performance, however, it is vital to run a small pilot comparing several prompting strategies to identify which one performs best on each project's unique data.
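To make the workflow described above concrete, the sketch below shows one way such a pipeline could be assembled in Python. Only the use of GPT-3.5-turbo comes from the study; the embedding model, clustering algorithm, prompt wording, and the name_clusters helper are illustrative assumptions, not the paper's implementation or its four prompting strategies.

    # Minimal sketch of an LLM cluster-naming pipeline (illustrative assumptions:
    # sentence-transformers embeddings, k-means clustering, the OpenAI Python
    # client, and a simple "show a few member texts" prompt).
    from openai import OpenAI
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    def name_clusters(texts, n_clusters=10, samples_per_cluster=5):
        # Embed the documents and group them into clusters.
        embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
        labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(embeddings)

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        names = {}
        for k in range(n_clusters):
            members = [t for t, lab in zip(texts, labels) if lab == k]
            sample = "\n---\n".join(members[:samples_per_cluster])
            # One possible prompting strategy: show a few member texts and ask
            # for a short, descriptive cluster name.
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{
                    "role": "user",
                    "content": (
                        "The following texts belong to one cluster. "
                        "Propose a short, descriptive name for the cluster.\n\n" + sample
                    ),
                }],
            )
            names[k] = response.choices[0].message.content.strip()
        return names

In practice, as the abstract notes, several prompt variants (for example, keyword lists versus sampled texts) would be compared on a small pilot before committing to one for the full dataset.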