DOI: 10.1145/2888451.2888453

SocialStories: Segmenting Stories within Trending Twitter Topics

Published: 13 March 2016

Abstract

This study presents SocialStories, a system based on incremental clustering of streaming tweets that identifies fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster frequency, and a decay weighting for entities. We present experiments on 0.19 million tweets posted in June 2014 that mention a software brand before, during, and after a marketing conference and a software release. The novelty of our work lies in its text-based similarity metrics, including the inverse cluster frequency, and in time-specific metrics that decay old entities with the passage of time, preserving the homogeneity and freshness of stories. Against the gold standard (post hoc journalistic reports), we report improved performance and a higher recall of 80% compared with LDA- and wavelet-based systems. Our algorithm clusters 80% of all tweets into story-based clusters that are 86% pure. It also detects trending stories earlier than manual reports and is far more accurate than the baseline systems at identifying fine-grained stories within sub-topics.
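
The abstract names three ingredients (incremental clustering, an inverse cluster frequency weight, and time decay of entities) but does not give their formulas. The sketch below is one possible reading, not the authors' implementation. It assumes that the inverse cluster frequency behaves like a smoothed IDF computed over clusters instead of documents, that decay is exponential with a fixed half-life, and that tweets arrive in time order. The names `HALF_LIFE`, `THRESHOLD`, `Cluster`, `icf`, `similarity`, and `assign` are all hypothetical.

```python
import math
from collections import Counter

HALF_LIFE = 3600.0  # assumption: seconds for a term weight to halve
THRESHOLD = 0.3     # assumption: minimum similarity to join a cluster


class Cluster:
    """A story: decayed term weights plus the time of the last update."""

    def __init__(self, now=0.0):
        self.weights = Counter()
        self.last_update = now

    def decay_to(self, now):
        # Exponentially down-weight existing terms so stale entities fade.
        factor = 0.5 ** ((now - self.last_update) / HALF_LIFE)
        for term in self.weights:
            self.weights[term] *= factor
        self.last_update = now

    def add(self, terms, now):
        self.decay_to(now)
        self.weights.update(terms)


def icf(term, clusters):
    # Assumed IDF-style form: terms seen in few clusters score higher.
    containing = sum(1 for c in clusters if c.weights.get(term, 0.0) > 0.0)
    return math.log((1 + len(clusters)) / (1 + containing)) + 1.0


def similarity(terms, cluster, clusters):
    # Cosine similarity between ICF-weighted tweet and cluster vectors.
    tweet = {t: n * icf(t, clusters) for t, n in Counter(terms).items()}
    centroid = {t: w * icf(t, clusters) for t, w in cluster.weights.items()}
    dot = sum(w * centroid.get(t, 0.0) for t, w in tweet.items())
    norm = math.sqrt(sum(w * w for w in tweet.values())) * math.sqrt(
        sum(w * w for w in centroid.values())
    )
    return dot / norm if norm else 0.0


def assign(terms, now, clusters):
    """Add the tweet to its most similar cluster, or open a new one."""
    for c in clusters:
        c.decay_to(now)
    best = max(clusters, key=lambda c: similarity(terms, c, clusters), default=None)
    if best is None or similarity(terms, best, clusters) < THRESHOLD:
        best = Cluster(now)
        clusters.append(best)
    best.add(terms, now)
    return clusters.index(best)
```

Calling `assign` once per incoming tweet, with its tokenized terms and timestamp, grows the cluster list in a single pass over the stream. A shorter half-life makes stories forget old entities faster, and a higher threshold splits a topic into more, smaller stories.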

Cited By

  • (2024) A Literature Review on Detecting, Verifying, and Mitigating Online Misinformation. IEEE Transactions on Computational Social Systems 11:4 (5119-5145). DOI: 10.1109/TCSS.2023.3289031. Online publication date: Aug-2024
  • (2021) Detection of Rumors in Tweets Using Machine Learning Techniques. Advances in Automation, Signal Processing, Instrumentation, and Control (3095-3111). DOI: 10.1007/978-981-15-8221-9_289. Online publication date: 5-Mar-2021
  • (2018) Detection and Resolution of Rumours in Social Media. ACM Computing Surveys 51:2 (1-36). DOI: 10.1145/3161603. Online publication date: 20-Feb-2018

Published In

CODS '16: Proceedings of the 3rd IKDD Conference on Data Science, 2016
March 2016
122 pages
ISBN:9781450342179
DOI:10.1145/2888451
  • General Chairs: Madhav Marathe, Mukesh Mohania
  • Program Chairs: Mausam, Prateek Jain
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • ACM India: ACM India

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clustering
  2. Information Retrieval
  3. NLP
  4. Text Classification
  5. Topic Detection
  6. Twitter
  7. document categorization
  8. social media
  9. story
  10. text analysis
  11. topic labelling
  12. trending topics
  13. tweet stories

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CODS '16
