
DOI: 10.1145/3587259.3627574

Automatic Topic Label Generation using Conversational Models

Published: 05 December 2023

Abstract

In probabilistic topic models, a topic is characterised by a set of words, each with an associated probability. Although understanding the meaning of topics is not necessary for common downstream tasks in which topic models are used, such as topic inference or document similarity, there have been attempts to uncover the semantics of topics by assigning them labels consisting of a couple of concepts. In this paper we propose a methodology, Conversational Probabilistic Topic Labelling (CPTL), to study whether conversational models can be used to generate labels that describe probabilistic topics given their most representative keywords. We evaluate a selection of conversational models on the topic label generation task and compare their performance with that of a task-specific language model trained to generate topic labels.
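
As a concrete illustration of the idea (not the authors' CPTL pipeline, whose prompts and model selection are described in the paper), a conversational model can be prompted with a topic's most representative keywords and its reply taken as a candidate label. The sketch below assumes a Hugging Face text-generation pipeline; the model choice and prompt wording are illustrative assumptions.

```python
# Minimal sketch of keyword-to-label prompting with a conversational model.
# The model name and prompt text are illustrative assumptions, not the
# CPTL configuration reported in the paper.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/DialoGPT-medium")

# Top keywords of one probabilistic topic (hypothetical example).
topic_keywords = ["neuron", "synapse", "cortex", "plasticity", "spike"]
prompt = (
    "These words describe a single topic: "
    + ", ".join(topic_keywords)
    + ". A short label for this topic is:"
)

# Generate a short continuation and strip the prompt to keep only the label.
output = generator(
    prompt,
    max_new_tokens=10,
    pad_token_id=generator.tokenizer.eos_token_id,
)
label = output[0]["generated_text"][len(prompt):].strip()
print(label)
```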




Published In

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023
December 2023
270 pages
ISBN: 9798400701412
DOI: 10.1145/3587259
Editors: Brent Venable, Daniel Garijo, Brian Jalaian
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. conversational model
  2. probabilistic topic labelling
  3. topic label
  4. topic label generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Knowledge Spaces

Conference

K-CAP '23: Knowledge Capture Conference 2023
December 5 - 7, 2023
Pensacola, FL, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%


