research-article

Discovering voter preferences in blogs using mixtures of topic models

Authors:

Rohini Srihari,

Smruthi Mukund

AND '09: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data

Pages 85 - 92

https://doi.org/10.1145/1568296.1568311

Published: 23 July 2009

Abstract

In this paper we propose a new approach to capturing a blog's inclination towards a particular election candidate from its content, and to explaining the reasons for that inclination. The method relies on two data sources: labeled "ground truth" speeches by the election candidates, and a collection of noisy blogs that carry no labels at all. In this unsupervised learning scenario, we used probabilistic topic models to cluster each candidate's ground truth documents into latent themes. The same topic models were then applied to the blog collection, and the "orientation" of each blog towards the themes of the candidates' speeches was determined from the KL divergence between topic distributions over the overlapping vocabularies. We used four models for this theme matching: a baseline topic model, and three models obtained by weighting the baseline with the positive, negative, and neutral sentiment of the topics. We then combined the candidate preferences that the four models assign to each blog using a collaborative objective function and an Expectation Maximization (EM) algorithm. The novelty of our method lies in its use of unannotated data and in its combination of the views of different "experts" that explain the same phenomenon.
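The abstract describes two computational steps that a short sketch can make concrete: matching a blog against each candidate's themes by KL divergence restricted to the shared vocabulary, and combining the verdicts of several "expert" models with EM. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes topics are plain word-to-probability dictionaries and that a blog goes to the candidate whose closest theme has the lowest KL divergence. For the combination step it swaps in the one-coin Dawid-Skene EM model (a latent "true" candidate per blog plus one accuracy parameter per expert), because the exact form of the paper's collaborative objective is not given here. All function names (`kl_overlap`, `closest_candidate`, `combine_experts_em`) are hypothetical.

```python
# Hypothetical sketch, not the authors' code: topics are word->probability dicts,
# and the EM step is the one-coin Dawid-Skene model, swapped in for the paper's
# collaborative objective.
import math
from typing import Dict, List, Sequence, Tuple

Dist = Dict[str, float]  # one topic: word -> probability


def kl_overlap(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) restricted to the words p and q share, both renormalized."""
    shared = set(p) & set(q)
    zp = sum(p[w] for w in shared)
    zq = sum(q[w] for w in shared)
    if zp <= 0 or zq <= 0:
        return math.inf  # no usable overlap between the vocabularies
    kl = 0.0
    for w in shared:
        pw, qw = p[w] / zp, q[w] / zq
        if pw > 0:
            kl += pw * math.log(pw / max(qw, eps))
    return kl


def closest_candidate(blog: Dist, themes: Dict[str, List[Dist]]) -> str:
    """Assumed matching rule: pick the candidate whose nearest theme has the lowest KL."""
    return min(themes, key=lambda c: min(kl_overlap(blog, t) for t in themes[c]))


def combine_experts_em(labels: Sequence[Sequence[int]], n_classes: int,
                       n_iter: int = 50) -> Tuple[List[List[float]], List[float]]:
    """One-coin Dawid-Skene EM over hard expert votes (requires n_classes >= 2).

    labels[b][k] is the candidate index that expert k assigns to blog b.
    Returns the posterior over candidates for each blog and each expert's accuracy.
    """
    n_blogs, n_experts = len(labels), len(labels[0])
    # Initialize posteriors from a soft majority vote.
    post = []
    for row in labels:
        counts = [0.0] * n_classes
        for y in row:
            counts[y] += 1.0
        post.append([c / len(row) for c in counts])
    acc = [0.5] * n_experts
    for _ in range(n_iter):
        # M-step: candidate prior and per-expert accuracy from current posteriors.
        prior = [sum(p[c] for p in post) / n_blogs for c in range(n_classes)]
        acc = [sum(post[b][labels[b][k]] for b in range(n_blogs)) / n_blogs
               for k in range(n_experts)]
        acc = [min(max(a, 1e-6), 1 - 1e-6) for a in acc]  # avoid zero likelihoods
        # E-step: posterior over each blog's latent "true" candidate.
        post = []
        for row in labels:
            scores = []
            for c in range(n_classes):
                s = prior[c]
                for k, y in enumerate(row):
                    # Correct vote with prob acc[k]; errors spread evenly over other candidates.
                    s *= acc[k] if y == c else (1 - acc[k]) / (n_classes - 1)
                scores.append(s)
            z = sum(scores)
            post.append([s / z for s in scores])
    return post, acc
```

In use, each row of `labels` would hold the candidate chosen for one blog by the baseline, positive-weighted, negative-weighted, and neutral-weighted models (for example via `closest_candidate`), and the argmax of each returned posterior gives the combined preference. The one-coin model is attractive here because it needs no annotated blogs and learns how much to trust each expert; treating the four models as conditionally independent is a simplification, since three of them are derived from the same baseline.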


Cited By

Yano T, Smith N, Wilkerson J, Chu-Carroll J (2012). Textual predictors of bill survival in congressional committees. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 793-802. DOI: 10.5555/2382029.2382157. Online publication date: 3-Jun-2012.
https://dl.acm.org/doi/10.5555/2382029.2382157
Soo‐Guan Khoo C, Nourbakhsh A, Na J (2012). Sentiment analysis of online news text: a case study of appraisal theory. Online Information Review 36:6, pp. 858-878. DOI: 10.1108/14684521211287936. Online publication date: 23-Nov-2012.
https://doi.org/10.1108/14684521211287936


Published In


AND '09: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data

July 2009

127 pages

ISBN:9781605584966

DOI:10.1145/1568296

Program Chairs: Daniel Lopresti (Lehigh University), Shourya Roy (Xerox India Innovation Hub), Klaus Schulz (University of Munich), L. Venkata Subramaniam (IBM India Research Lab)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

AND '09

AND '09: Third Workshop on Analytics for Noisy Unstructured Text Data

July 23 - 24, 2009

Barcelona, Spain

Acceptance Rates

AND '09 Paper Acceptance Rate 15 of 22 submissions, 68%;

Overall Acceptance Rate 15 of 22 submissions, 68%


Bibliometrics

Total Citations: 2

Total Downloads: 267

Downloads (Last 12 months): 4

Downloads (Last 6 weeks): 0

Reflects downloads up to 09 Nov 2024
