Abstract
Over the last few years, Twitter has become a major source of information. Twitter enables its users to send and read short text-based messages called tweets. Users actively report news about what is going on around them and within their personal lives. Researchers from various disciplines have examined Twitter because of the heterogeneity and immense scale of its data. One challenging problem is to identify trending topics on Twitter automatically and in real time. Real-time trending topic detection is of high value to journalists, news reporters, analysts, e-marketing specialists, real-time application developers, and social media researchers who want to understand what is happening and which emergent trending topics people are exchanging. In this paper, we propose a new approach that discovers many different trending topics from tweets in real time. Our trending topics are detected for a specific town and compared with the top trending topics shown on Twitter. Unlike Twitter, our approach recognizes when different terms correspond to the same trending topic: we exploit the semantic similarity between the keywords that compose tweets and unify them using a previously built tweet thesaurus. Each trending topic is described by the keywords of its ten most representative tweets.
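As a rough illustration of the keyword unification idea mentioned above, the following minimal sketch maps variant terms to a single canonical keyword before counting. It is not the authors' implementation: the toy `THESAURUS`, the helper names `unify` and `trending_topics`, and the simple frequency ranking are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy thesaurus: each variant term maps to one canonical keyword.
THESAURUS = {
    "quake": "earthquake",
    "earthquake": "earthquake",
    "tremor": "earthquake",
    "soccer": "football",
    "football": "football",
}
CANONICAL = set(THESAURUS.values())


def unify(term):
    """Return the canonical keyword for a term, or the term itself if unknown."""
    return THESAURUS.get(term, term)


def trending_topics(tweets, top_n=2):
    """Count each canonical keyword at most once per tweet; return the top_n."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase, split on whitespace, and unify every word of the tweet.
        keywords = {unify(word) for word in tweet.lower().split()}
        # Only keywords known to the thesaurus are counted (assumption).
        counts.update(k for k in keywords if k in CANONICAL)
    return counts.most_common(top_n)


tweets = [
    "Quake felt downtown",
    "Big earthquake tonight",
    "tremor again",
    "Football final tonight",
]
print(trending_topics(tweets))  # [('earthquake', 3), ('football', 1)]
```

Without the unification step, "quake", "earthquake" and "tremor" would each be counted once and none would stand out as a single topic.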
Notes
WordNet is a free lexical resource for the English language, available on the web. It groups terms denoting a given concept (nouns, verbs, adjectives and adverbs) into sets of synonyms called synsets (Brigitte et al. 2007).
MongoDB is a schema-free document database written in C++ and developed as an open-source project driven mainly by the company 10gen Inc, which also offers professional services around MongoDB. According to its developers, the main goal of MongoDB is to close the gap between fast, highly scalable key-value stores and feature-rich traditional relational database management systems (RDBMSs) (Strauch 2011).
References
Aggarwal CC (2006) Data streams: models and algorithms (advances in database systems). Springer-Verlag Inc, New York
Allan J, Papka R, Lavrenko V (1998) On-line new event detection and tracking. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’98, pp 37–45
Allan J, Lavrenko V, Jin H (2000) First story detection in tdt is hard. In: Proceedings of the 9th international conference on information and knowledge management, ACM, CIKM ’00, pp 374–381
Benhardus J, Kalita J (2013) Streaming trend detection in twitter. Int J Web Based Communities 9(1):122–139
Bifet A, Frank E (2010) Sentiment knowledge discovery in twitter streaming data. In: Proceedings of the 13th international conference on discovery science, Springer-Verlag, DS’10, pp 1–15
Blei D, Lafferty J (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35
Blei D, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of the 23rd international conference on machine learning, ACM, ICML ’06, pp 113–120
Brants T, Chen F, Farahat A (2003) A system for new event detection. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’03, pp 330–337
Brewer EA (2000) Towards robust distributed systems (abstract). In: Proceedings of the 19th annual ACM symposium on principles of distributed computing, ACM, PODC ’00, pp 7–19
Brigitte S, Chantal R, Francois-Elie C (2007) Techniques d’alignement d’ontologies bases sur la structure d’une ressource complementaire. In: 1eres Journees Francophones sur les Ontologies, JFO 2007
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117
Budak C, Agrawal D, El Abbadi A (2011) Structural trend analysis for online social networks. Proc VLDB Endow 4(10):646–656
Cataldi M, Di Caro L, Schifanella C (2010) Emerging topic detection on twitter based on temporal and social terms evaluation. In: Proceedings of the 10th international workshop on multimedia data mining, ACM, MDMKDD ’10, pp 4:1–4:10
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge
He Q, Chang K, Lim EP, Zhang J (2007) Bursty feature representation for clustering text streams. In: SDM conference, pp 491–496
Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’99, pp 50–57
Hong L, Davison BD (2010) Empirical study of topic modeling in twitter. In: Proceedings of the 1st workshop on social media analytics, ACM, SOMA ’10, pp 80–88
Hurford JR (1983) Semantics: a coursebook. Cambridge University Press, Cambridge
Kontostathis A, Galitsky L, Pottenger W, Roy S, Phelps D (2003) A survey of emerging trend detection in textual data mining. In: Berry MW (ed) Survey of text mining. Springer, New York, pp 185–224
Kubota Ando R, Lee L (2001) Iterative residual rescaling: an analysis and generalization of lsi. In: Proceedings of SIGIR, New Orleans, pp 154–162
Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a news media? In: Proceedings of the 19th international conference on World Wide Web, ACM, WWW ’10, pp 591–600
Liu Y, Cai JR, Yin J, Fu AC (2008) Clustering text data streams. JCST 32:112–128
Lu R, Yang Q (2012) Trend analysis of news topics on twitter. Int J Mach Learn Comput 2
Madani A, Boussaid O, Zegour DE (2011) Clust-xpaths: clustering of xml paths. In: Proceedings of the 7th international conference on machine learning and data mining in pattern recognition. Springer-Verlag, MLDM’11, pp 294–305
Madani A, Boussaid O, Zegour DE (2014) Whats happening: a survey of tweets event detection. In: Proceedings of the 3rd international conference on communications, computation, networks and technologies, INNOV 2014, pp 16–22
Mathioudakis M, Koudas N (2010) Twittermonitor: trend detection over the twitter stream. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, ACM, SIGMOD ’10, pp 1155–1158
Mei Q, Zhai CX (2005) Discovering evolutionary theme patterns from text—an exploration of temporal text mining. In: KDD conference, Chicago, pp 198–207
Mihalcea R, Tarau P (2004) Textrank: bringing order into texts. In: Proceedings of empirical methods for natural language processing, pp 404–411
Naaman M, Boase J, Lai CH (2010) Is it really about me?: message content in social awareness streams. In: Proceedings of the 2010 ACM conference on computer supported cooperative work, ACM, CSCW ’10, pp 189–192
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web. Technical report 1999-66, Stanford InfoLab
Petrovic S, Osborne M, Lavrenko V (2010) The Edinburgh twitter corpus. In: Proceedings of NAACL workshop on social media
Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137
Porter MF (2001) Snowball: a language for stemming algorithms. Published online. http://snowball.tartarus.org/texts/introduction.html
Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th international conference on World Wide Web, ACM, WWW ’10, pp 851–860
Sankaranarayanan J, Samet H, Teitler BE, Lieberman MD, Sperling J (2009) Twitterstand: news in tweets. In: Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, ACM, GIS ’09, pp 42–51
Steyvers M, Griffiths T (2005) Probabilistic topic models. In: Landauer T, Mcnamara D, Dennis S, Kintsch W (eds) Latent semantic analysis: a road to meaning. Lawrence Erlbaum
Strauch C (2011) Nosql databases. Lecture selected topics on software-technology ultra-large scale sites. Manuscript, Stuttgart Media University. http://www.christof-strauch.de/nosqldbs.pdf
Suchanek FM, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web, ACM, WWW ’07, pp 697–706
Surendran AC, Sra S (2006) Incremental aspect models for mining document streams. In: Proceedings of the 10th European conference on principle and practice of knowledge discovery in databases, Springer-Verlag, PKDD’06, pp 633–640
Teh Y, Jordan M, Beal M, Blei D (2006) Hierarchical dirichlet processes. J Am Stat Assoc 101:1566–1581
Wang X, Zhai C, Hu X, Sproat R (2007) Mining correlated bursty topic patterns from coordinated text streams. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, KDD ’07, pp 784–793
Wartena C, Brussee R (2008) Topic detection by clustering keywords. In: Proceedings of the 2008 19th international conference on database and expert systems application, IEEE Computer Society, DEXA ’08, pp 54–58
Yang Y, Pierce T, Carbonell J (1998) A study of retrospective and on-line event detection. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’98, pp 28–36
Yang Y, Zhang J, Carbonell J, Jin C (2002) Topic-conditioned novelty detection. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, KDD ’02, pp 688–693
Zhong S (2005) 2005 special issue: efficient streaming text clustering. Neural Netw 18(5–6):790–798
Zubiaga A, Spina D, Fresno V, Martínez R (2013) Real-time classification of twitter trends. J Am Soc Inf Sci Technol (JASIST) 66(3):462–473
Cite this article
Madani, A., Boussaid, O. & Zegour, D.E. Real-time trending topics detection and description from Twitter content. Soc. Netw. Anal. Min. 5, 59 (2015). https://doi.org/10.1007/s13278-015-0298-5