research-article

Topic Modeling for Multi-Aspect Listwise Comparisons

Published: 30 October 2021

Abstract

As a well-established probabilistic method, topic models seek to uncover latent semantics from plain text. Beyond their textual content, we observe that documents are often compared in listwise rankings based on that content. For instance, countries worldwide are compared in an international ranking of electricity production based on their national reports. Such document comparisons constitute additional information that reveals documents' relative similarities. Incorporating them into topic modeling could yield comparative topics that help to differentiate and rank documents. Furthermore, because comparison criteria differ, the observed document comparisons usually cover multiple aspects, each expressed as a distinct ranked list. For example, a country may rank higher in electricity production but fall behind others in life expectancy or government budget. Each comparison criterion, or aspect, yields a distinct ranking. Considering multiple aspects of comparison, each based on a different ranking criterion, allows us to derive one set of topics that informs heterogeneous document similarities. We propose a generative topic model aimed at learning topics that are well aligned with multi-aspect listwise comparisons. Comprehensive experiments on public datasets demonstrate the advantage of the proposed method over baselines in jointly modeling topics and ranked lists.
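To make the idea of aligning one set of topics with several ranked lists concrete, here is a minimal sketch. It is not the paper's model: the abstract does not state the likelihood, so the sketch assumes a Plackett-Luce listwise likelihood and a linear score (a document's topic proportions dotted with a per-aspect weight vector). All names and numbers (`thetas`, `aspect_weights`, the aspect labels) are hypothetical.

```python
import math


def doc_score(theta, weights):
    """Linear score of one document for one aspect (assumed form)."""
    return sum(t * w for t, w in zip(theta, weights))


def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (best first) under Plackett-Luce.

    scores  : dict mapping document id -> real-valued score
    ranking : list of document ids, most preferred first
    """
    log_lik = 0.0
    remaining = list(ranking)
    while len(remaining) > 1:
        # Log-sum-exp over the documents not yet placed, for stability.
        m = max(scores[d] for d in remaining)
        log_norm = m + math.log(sum(math.exp(scores[d] - m) for d in remaining))
        # The top remaining document is chosen with softmax probability.
        log_lik += scores[remaining[0]] - log_norm
        remaining.pop(0)
    return log_lik


# Hypothetical topic proportions for three documents over two topics.
thetas = {"A": [0.9, 0.1], "B": [0.5, 0.5], "C": [0.1, 0.9]}

# Hypothetical per-aspect weights: each aspect favors a different topic,
# so the same topic proportions produce a different ranking per aspect.
aspect_weights = {
    "electricity": [2.0, 0.0],
    "life_expectancy": [0.0, 2.0],
}


def aspect_log_likelihood(aspect, ranking):
    """Log-likelihood of one aspect's ranked list given shared topics."""
    weights = aspect_weights[aspect]
    scores = {d: doc_score(thetas[d], weights) for d in thetas}
    return plackett_luce_log_likelihood(scores, ranking)
```

Under these assumed values, the list A, B, C is more likely than C, B, A for the `electricity` aspect, and the reverse holds for `life_expectancy`, even though both aspects share the same topic proportions. A joint model would combine such per-aspect terms with the usual word likelihood of a topic model.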


Cited By

  • (2024) Text-Attributed Graph Representation Learning: Methods, Applications, and Challenges. Companion Proceedings of the ACM Web Conference 2024, pages 1298-1301. DOI: 10.1145/3589335.3641255. Online publication date: 13-May-2024
  • (2023) Topic Modeling on Document Networks With Dirichlet Optimal Transport Barycenter. IEEE Transactions on Knowledge and Data Engineering, 36:3, pages 1328-1340. DOI: 10.1109/TKDE.2023.3303465. Online publication date: 9-Aug-2023


Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. comparative documents
  2. generative topic model
  3. text mining

Qualifiers

  • Research-article

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 16 Nov 2024
