research-article

Topic Modeling for Multi-Aspect Listwise Comparisons

Authors:

Delvin Ce Zhang,

Hady W. Lauw

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 2507 - 2516

https://doi.org/10.1145/3459637.3482398

Published: 30 October 2021

Abstract

As a well-established probabilistic method, topic models seek to uncover latent semantics from plain text. Beyond their textual content, documents are often compared in listwise rankings based on that content. For instance, countries worldwide are compared in an international ranking of electricity production based on their national reports. Such document comparisons constitute additional information that reveals documents' relative similarities. Incorporating them into topic modeling could yield comparative topics that help to differentiate and rank documents. Furthermore, the observed document comparisons usually cover multiple aspects, each defined by its own comparison criterion and each yielding a distinct ranked list. For example, a country may be ranked higher in terms of electricity production, but fall behind others in terms of life expectancy or government budget. Considering such multiple aspects of comparisons allows us to derive one set of topics that inform heterogeneous document similarities. We propose a generative topic model aimed at learning topics that are well aligned to multi-aspect listwise comparisons. Experiments on public datasets demonstrate the advantage of the proposed method over baselines in jointly modeling topics and ranked lists.
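
To make the idea of multi-aspect listwise comparisons concrete, the following is a minimal sketch and not the authors' model. It assumes that each document has a vector of topic proportions, that each aspect scores a document by a weighted sum of those proportions, and that a ranked list is evaluated with a Plackett-Luce likelihood. The abstract does not state which ranking likelihood the paper uses, so that choice, the function names, and all numbers below are hypothetical.

```python
import math


def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (best first) under a Plackett-Luce model.

    scores:  dict mapping document id -> real-valued score.
    ranking: list of document ids, ordered from highest to lowest rank.
    """
    log_lik = 0.0
    for position in range(len(ranking)):
        # Scores of the documents that have not been placed yet.
        remaining = [scores[doc] for doc in ranking[position:]]
        # Log-sum-exp, shifted by the maximum for numerical stability.
        top = max(remaining)
        log_norm = top + math.log(sum(math.exp(s - top) for s in remaining))
        log_lik += scores[ranking[position]] - log_norm
    return log_lik


def aspect_scores(doc_topics, weights):
    """Score each document for one aspect as a weighted sum of its topic proportions."""
    return {
        doc: sum(t * w for t, w in zip(theta, weights))
        for doc, theta in doc_topics.items()
    }


# Hypothetical topic proportions for three documents over three topics.
doc_topics = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.2, 0.6, 0.2],
    "C": [0.1, 0.2, 0.7],
}

# Hypothetical per-aspect weights over the same shared set of topics.
aspect_weights = {
    "electricity": [2.0, 0.0, -1.0],
    "life_expectancy": [-1.0, 0.5, 2.0],
}

# One observed ranked list per aspect (best first); the two aspects disagree.
observed = {
    "electricity": ["A", "B", "C"],
    "life_expectancy": ["C", "B", "A"],
}

# Joint log-likelihood of all observed lists given one shared set of topics.
total = sum(
    plackett_luce_log_likelihood(
        aspect_scores(doc_topics, aspect_weights[aspect]), observed[aspect]
    )
    for aspect in observed
)
print(f"joint log-likelihood of the ranked lists: {total:.4f}")
```

In this sketch the same topic proportions explain two opposite rankings because each aspect weights the topics differently, which is one simple way a single set of topics could inform heterogeneous document similarities.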


Cited By

Zhang D, Yang M, Ying R, Lauw H, Chua T, Ngo C, Kumar R, Lauw H, Ka-Wei Lee R (2024). Text-Attributed Graph Representation Learning: Methods, Applications, and Challenges. Companion Proceedings of the ACM Web Conference 2024, pages 1298-1301. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589335.3641255
Zhang D, Lauw H (2023). Topic Modeling on Document Networks With Dirichlet Optimal Transport Barycenter. IEEE Transactions on Knowledge and Data Engineering, 36(3), pages 1328-1340. Online publication date: 9-Aug-2023.
https://dl.acm.org/doi/10.1109/TKDE.2023.3303465

Index Terms

Topic Modeling for Multi-Aspect Listwise Comparisons
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Document topic models
    2. Retrieval models and ranking
      1. Learning to rank
  2. World Wide Web
    1. Web searching and information discovery
      1. Content ranking

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

October 2021

4966 pages

ISBN: 9781450384469

DOI: 10.1145/3459637

General Chairs: Gianluca Demartini (The University of Queensland, Australia), Guido Zuccon (The University of Queensland, Australia)

Program Chairs: J. Shane Culpepper (RMIT University, Australia), Zi Huang (The University of Queensland, Australia), Hanghang Tong (University of Illinois at Urbana-Champaign, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '21

CIKM '21: The 30th ACM International Conference on Information and Knowledge Management

November 1 - 5, 2021

Queensland, Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 2
Total Downloads: 123
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 2

Reflects downloads up to 16 Nov 2024