
DOI: 10.1145/3587259.3627574

Automatic Topic Label Generation using Conversational Models

Published: 05 December 2023

Abstract

In probabilistic topic models, a topic is characterised by a set of words, each with an associated probability. Although understanding the meaning of topics is not necessary for common downstream tasks in which topic models are used, such as topic inference or document similarity, there have been attempts to uncover the semantics of topics by assigning them labels consisting of a couple of concepts. In this paper we propose a methodology, Conversational Probabilistic Topic Labelling (CPTL), to study whether conversational models can be used to generate labels that describe probabilistic topics given their most representative keywords. We evaluate a selection of conversational models on the topic label generation task and compare their performance with that of a task-specific language model trained to generate topic labels.
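
As a concrete illustration of the idea (not the authors' CPTL pipeline, whose prompts and model selection are described in the paper), a conversational model can be prompted with a topic's most representative keywords and its reply taken as a candidate label. The sketch below assumes a Hugging Face text-generation pipeline; the model choice and prompt wording are illustrative assumptions.

```python
# Minimal sketch of keyword-to-label prompting with a conversational model.
# The model name and prompt text are illustrative assumptions, not the
# CPTL configuration reported in the paper.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/DialoGPT-medium")

# Top keywords of one probabilistic topic (hypothetical example).
topic_keywords = ["neuron", "synapse", "cortex", "plasticity", "spike"]
prompt = (
    "These words describe a single topic: "
    + ", ".join(topic_keywords)
    + ". A short label for this topic is:"
)

# Generate a short continuation and strip the prompt to keep only the label.
output = generator(
    prompt,
    max_new_tokens=10,
    pad_token_id=generator.tokenizer.eos_token_id,
)
label = output[0]["generated_text"][len(prompt):].strip()
print(label)
```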




Published In

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023
December 2023
270 pages
ISBN: 9798400701412
DOI: 10.1145/3587259
Editors: Brent Venable, Daniel Garijo, Brian Jalaian
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. conversational model
  2. probabilistic topic labelling
  3. topic label
  4. topic label generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Knowledge Spaces

Conference

K-CAP '23: Knowledge Capture Conference 2023
December 5 - 7, 2023
Pensacola, FL, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%


