Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora

Authors: Daniel Ramage, David Hall, Ramesh Nallapati, Christopher D. Manning

EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1

Pages 248-256

Published: 06 August 2009

Abstract

A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
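The constraint described in the abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling in which each token may only be assigned to one of its own document's tags. This is not the authors' implementation: the function name `labeled_lda_gibbs`, the hyperparameter values `alpha` and `beta`, and the returned data layout are all assumptions chosen for illustration, and the sketch assumes every document carries at least one tag.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, num_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative sampler: topics are tags, and a document may only use its own tags.

    docs: list of token lists. doc_labels: list of non-empty tag lists, one per document.
    Returns (tag -> word -> token count, per-token tag assignments).
    alpha and beta are assumed smoothing values, not taken from the paper.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    label_word = defaultdict(lambda: defaultdict(int))  # tokens of word w assigned to tag k
    label_total = defaultdict(int)                      # all tokens assigned to tag k
    doc_label = [defaultdict(int) for _ in docs]        # tokens in document d assigned to tag k
    assignments = []

    # Initialise each token with a random tag drawn from its document's tag set.
    for d, (doc, labels) in enumerate(zip(docs, doc_labels)):
        z_d = []
        for w in doc:
            z = rng.choice(labels)
            z_d.append(z)
            label_word[z][w] += 1
            label_total[z] += 1
            doc_label[d][z] += 1
        assignments.append(z_d)

    for _ in range(num_iters):
        for d, (doc, labels) in enumerate(zip(docs, doc_labels)):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                label_word[z][w] -= 1
                label_total[z] -= 1
                doc_label[d][z] -= 1
                # Standard collapsed LDA weights, evaluated only over this document's tags.
                weights = [
                    (doc_label[d][k] + alpha)
                    * (label_word[k][w] + beta)
                    / (label_total[k] + beta * vocab_size)
                    for k in labels
                ]
                z = rng.choices(labels, weights=weights)[0]
                assignments[d][i] = z
                label_word[z][w] += 1
                label_total[z] += 1
                doc_label[d][z] += 1

    counts = {k: {w: c for w, c in wc.items() if c > 0} for k, wc in label_word.items()}
    return counts, assignments
```

Calling `labeled_lda_gibbs(docs, doc_labels)` on a small tagged corpus returns, for each tag, the words currently credited to it. Because sampling is restricted to each document's tag list, a document with a single tag has all of its words attributed to that tag, while words in multi-tagged documents are divided among the tags according to how those words are used elsewhere in the corpus.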

Published In

EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1

August 2009, 505 pages

ISBN: 9781932432596

Program Chairs: Philipp Koehn (University of Edinburgh), Rada Mihalcea (University of North Texas)

Publisher: Association for Computational Linguistics, United States

Qualifiers: Research-article

Overall Acceptance Rate: 73 of 234 submissions, 31%
