DOI: 10.1145/2818869.2818877
research-article

Novel Topic Diffusion Prediction using Latent Semantic and User Behavior

Published: 07 October 2015

Abstract

Predicting diffusions on big social media data using natural language processing (NLP) and social network analysis (SNA) techniques is an emerging research domain. To predict diffusions of novel topics, previous studies focus on cross-topic-observed diffusions (diffusions between the source and target users are not observed for the topic to be predicted, but are observed for other topics). However, in real-world social networks, many of the diffusions to be predicted are actually unobserved. For example, a diffusion may be unseen (no diffusion between its source and target users is observed in the training data), or it may even involve silence users (one or both of its users have never participated in a diffusion before). In this paper, we generalize the problem of predicting diffusions of novel topics to cover both cross-topic-observed and unobserved diffusions, which is very challenging because previous diffusion records are lacking. We design a learning-based framework to solve this problem. Leveraging NLP and SNA techniques to handle such Big Data, we exploit latent semantics derived from diverse information sources (e.g., user, topic, user-topic, and topological information) and use the idea that "users with the same attribute value tend to have similar behavior for similar topics" to extract features for prediction. We evaluate our framework on real-world microblog data, and the experiments show that it achieves 73% AUC on this difficult prediction task. Our dataset is also publicly available at http://mslab.csie.ntu.edu.tw/~tim/ase_big_data_2015.zip.
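The three kinds of diffusion distinguished above (cross-topic-observed, unseen, and diffusions involving silence users) depend only on which users and which user pairs appear in the training records. The Python sketch below makes those definitions concrete. It is an illustration under stated assumptions, not the authors' implementation: the record schema of `(source, target, topic)` tuples, the function name `categorize`, and the category labels are hypothetical, diffusions are treated as directed, and the topic being predicted is assumed to be novel (absent from the training records).

```python
from typing import Iterable, Literal, Set, Tuple

# Hypothetical record schema: (source user, target user, topic).
Diffusion = Tuple[str, str, str]
Category = Literal["cross-topic-observed", "unseen", "silence-user"]


def categorize(diffusion: Diffusion, training: Iterable[Diffusion]) -> Category:
    """Label a to-be-predicted diffusion by how much history its users have.

    Assumes the diffusion's topic is novel, i.e. it does not occur in
    `training`; pairs are directed (source -> target).
    """
    source, target, topic = diffusion
    active_users: Set[str] = set()
    observed_pairs: Set[Tuple[str, str]] = set()
    for src, tgt, tpc in training:
        # Any participation, on any topic, makes a user non-silent.
        active_users.update((src, tgt))
        # Pair evidence is taken only from topics other than the queried one.
        if tpc != topic:
            observed_pairs.add((src, tgt))
    # Silence user: at least one endpoint never took part in any diffusion.
    if source not in active_users or target not in active_users:
        return "silence-user"
    # Cross-topic-observed: this pair diffused before, but on other topics.
    if (source, target) in observed_pairs:
        return "cross-topic-observed"
    # Unseen: both users are active, yet no diffusion between them was observed.
    return "unseen"
```

For example, with training records alice→bob on topic `t1` and bob→carol on topic `t2`, a diffusion of a novel topic `t9` from alice to bob is cross-topic-observed, alice to carol is unseen, and alice to dave involves a silence user. Only the last two cases are unobserved, which is why features for them cannot rely on the pair's own diffusion history.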

Published In

ACM Other conferences
ASE BD&SI '15: Proceedings of the ASE BigData & SocialInformatics 2015
October 2015
381 pages
ISBN:9781450337359
DOI:10.1145/2818869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Big Data
  2. Big Data Analytics
  3. Big Data Mining
  4. Data Mining
  5. Machine Learning
  6. Natural Language Processing
  7. Social Network Analysis

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASE BD&SI '15
ASE BD&SI '15: ASE BigData & SocialInformatics 2015
October 7 - 9, 2015
Kaohsiung, Taiwan

Bibliometrics

  • Total Citations: 0
  • Total Downloads: 165
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 19 Nov 2024