Real-time trending topics detection and description from Twitter content

  • Original Article
Social Network Analysis and Mining

Abstract

Over the last few years, Twitter has become a major source of information. It enables its users to send and read short text-based messages called tweets, and users actively report news about what is going on around them and in their personal lives. Because of the heterogeneity and immense scale of the data, numerous researchers from various disciplines have examined Twitter. One challenging problem is to automatically identify trending topics on Twitter in real time. Real-time trending topic detection is of high value to journalists, news reporters, analysts, e-marketing specialists, real-time application developers, and social media researchers who want to understand what is happening and which emergent trending topics people are exchanging. In this paper, we propose a new approach that discovers many different trending topics from tweets in real time. Our trending topics are detected for a specific town and compared with the top trending topics shown on Twitter. Unlike Twitter, our approach recognizes different terms that correspond to the same trending topic: we exploit the semantic similarity between the keywords composing tweets and unify them using a previously created tweet thesaurus. Each trending topic is described by the keywords of its ten most representative tweets.
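The idea of unifying keywords through a thesaurus before counting them can be illustrated with a minimal Python sketch. This is not the authors' implementation: the thesaurus entries, the stop-word list, the function names (`keywords`, `unify`, `trending`, `describe`), and the simple frequency-based ranking are all hypothetical and stand in for the semantic similarity and representativeness measures developed in the full paper.

```python
from collections import Counter

# Hypothetical thesaurus (an assumption for illustration): each variant
# keyword maps to one canonical term.
THESAURUS = {
    "quake": "earthquake",
    "tremor": "earthquake",
    "vote": "election",
    "poll": "election",
}

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "in", "of", "is", "was", "to", "and", "this", "for", "on"}


def keywords(tweet):
    """Lowercase the tweet, strip surrounding punctuation, drop stop words."""
    tokens = (tok.strip(".,!?#@:;\"'").lower() for tok in tweet.split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def unify(terms, thesaurus=THESAURUS):
    """Replace each term by its canonical form when the thesaurus knows it."""
    return [thesaurus.get(term, term) for term in terms]


def trending(tweets, top_n=3):
    """Rank unified terms by the number of tweets that mention them."""
    counts = Counter()
    for tweet in tweets:
        # A set ensures each term is counted at most once per tweet.
        counts.update(set(unify(keywords(tweet))))
    return counts.most_common(top_n)


def describe(topic, tweets, max_tweets=10):
    """Return up to max_tweets tweets, ordered by how often they hit the topic.

    Counting hits is a stand-in (an assumption) for a real
    representativeness score.
    """
    scored = []
    for tweet in tweets:
        hits = unify(keywords(tweet)).count(topic)
        if hits:
            scored.append((hits, tweet))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [tweet for _, tweet in scored[:max_tweets]]
```

With this sketch, tweets mentioning "quake", "tremor", and "earthquake" all contribute to a single "earthquake" topic instead of three separate terms, which is the effect the abstract attributes to the thesaurus.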

Notes

  1. http://support.twitter.com/articles/101125-abouttrending-topics.

  2. http://www.nltk.org.

  3. WordNet is a free lexical resource for the English language, available on the web. It groups terms denoting a given concept (nouns, verbs, adjectives and adverbs) into sets of synonyms called synsets (Brigitte et al. 2007).

  4. http://www.mongodb.org/.

  5. MongoDB is a schema-free document database written in C++ and developed as an open-source project, mainly driven by the company 10gen Inc, which also offers professional services around MongoDB. According to its developers, the main goal of MongoDB is to close the gap between fast, highly scalable key-value stores and feature-rich traditional relational database management systems (RDBMSs) (Strauch 2011).

  6. http://www.mpii.mpg.de/~suchanek/yago.

References

  • Aggarwal CC (2006) Data streams: models and algorithms (advances in database systems). Springer-Verlag Inc, New York

  • Allan J, Papka R, Lavrenko V (1998) On-line new event detection and tracking. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’98, pp 37–45

  • Allan J, Lavrenko V, Jin H (2000) First story detection in TDT is hard. In: Proceedings of the 9th international conference on information and knowledge management, ACM, CIKM ’00, pp 374–381

  • Benhardus J, Kalita J (2013) Streaming trend detection in twitter. Int J Web Based Communities 9(1):122–139

  • Bifet A, Frank E (2010) Sentiment knowledge discovery in twitter streaming data. In: Proceedings of the 13th international conference on discovery science, Springer-Verlag, DS’10, pp 1–15

  • Blei D, Lafferty J (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35

  • Blei D, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022

  • Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of the 23rd international conference on machine learning, ACM, ICML ’06, pp 113–120

  • Brants T, Chen F, Farahat A (2003) A system for new event detection. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’03, pp 330–337

  • Brewer EA (2000) Towards robust distributed systems (abstract). In: Proceedings of the 19th annual ACM symposium on principles of distributed computing, ACM, PODC ’00, pp 7–19

  • Brigitte S, Chantal R, Francois-Elie C (2007) Techniques d’alignement d’ontologies bases sur la structure d’une ressource complementaire. In: 1eres Journees Francophones sur les Ontologies, JFO 2007

  • Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117

  • Budak C, Agrawal D, El Abbadi A (2011) Structural trend analysis for online social networks. Proc VLDB Endow 4(10):646–656

  • Cataldi M, Di Caro L, Schifanella C (2010) Emerging topic detection on twitter based on temporal and social terms evaluation. In: Proceedings of the 10th international workshop on multimedia data mining, ACM, MDMKDD ’10, pp 4:1–4:10

  • Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge

  • He Q, Chang K, Lim EP, Zhang J (2007) Bursty feature representation for clustering text streams. In: SDM conference, pp 491–496

  • Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’99, pp 50–57

  • Hong L, Davison BD (2010) Empirical study of topic modeling in twitter. In: Proceedings of the 1st workshop on social media analytics, ACM, SOMA ’10, pp 80–88

  • Hurford JR (1983) Semantics: a coursebook. Cambridge University Press, Cambridge

  • Kontostathis A, Galitsky L, Pottenger W, Roy S, Phelps D (2003) A survey of emerging trend detection in textual data mining. In: Berry MW (ed) Survey of text mining. Springer, New York, pp 185–224

  • Kubota Ando R, Lee L (2001) Iterative residual rescaling: an analysis and generalization of LSI. In: Proceedings of SIGIR, New Orleans, pp 154–162

  • Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a news media? In: Proceedings of the 19th international conference on World Wide Web, ACM, WWW ’10, pp 591–600

  • Liu Y, Cai JR, Yin J, Fu AC (2008) Clustering text data streams. JCST 32:112–128

  • Lu R, Yang Q (2012) Trend analysis of news topics on twitter. Int J Mach Learn Comput 2

  • Madani A, Boussaid O, Zegour DE (2011) Clust-xpaths: clustering of xml paths. In: Proceedings of the 7th international conference on machine learning and data mining in pattern recognition. Springer-Verlag, MLDM’11, pp 294–305

  • Madani A, Boussaid O, Zegour DE (2014) What’s happening: a survey of tweets event detection. In: Proceedings of the 3rd international conference on communications, computation, networks and technologies, INNOV 2014, pp 16–22

  • Mathioudakis M, Koudas N (2010) Twittermonitor: trend detection over the twitter stream. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, ACM, SIGMOD ’10, pp 1155–1158

  • Mei Q, Zhai CX (2005) Discovering evolutionary theme patterns from text—an exploration of temporal text mining. In: KDD conference, Chicago, pp 198–207

  • Mihalcea R, Tarau P (2004) Textrank: bringing order into texts. In: Proceedings of empirical methods for natural language processing, pp 404–411

  • Naaman M, Boase J, Lai CH (2010) Is it really about me?: message content in social awareness streams. In: Proceedings of the 2010 ACM conference on computer supported cooperative work, ACM, CSCW ’10, pp 189–192

  • Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web. Technical report 1999-66, Stanford InfoLab

  • Petrovic S, Osborne M, Lavrenko V (2010) The Edinburgh twitter corpus. In: Proceedings of NAACL workshop on social media

  • Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137

  • Porter MF (2001) Snowball: a language for stemming algorithms. Published online. http://snowball.tartarus.org/texts/introduction.html

  • Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th international conference on World Wide Web, ACM, WWW ’10, pp 851–860

  • Sankaranarayanan J, Samet H, Teitler BE, Lieberman MD, Sperling J (2009) Twitterstand: news in tweets. In: Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, ACM, GIS ’09, pp 42–51

  • Steyvers M, Griffiths T (2005) Probabilistic topic models. In: Landauer T, Mcnamara D, Dennis S, Kintsch W (eds) Latent semantic analysis: a road to meaning. Lawrence Erlbaum

  • Strauch C (2011) Nosql databases. Lecture selected topics on software-technology ultra-large scale sites. Manuscript, Stuttgart Media University. http://www.christof-strauch.de/nosqldbs.pdf

  • Suchanek FM, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web, ACM, WWW ’07, pp 697–706

  • Surendran AC, Sra S (2006) Incremental aspect models for mining document streams. In: Proceedings of the 10th European conference on principle and practice of knowledge discovery in databases, Springer-Verlag, PKDD’06, pp 633–640

  • Teh Y, Jordan M, Beal M, Blei D (2006) Hierarchical dirichlet processes. J Am Stat Assoc 101:1566–1581

  • Wang X, Zhai C, Hu X, Sproat R (2007) Mining correlated bursty topic patterns from coordinated text streams. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, KDD ’07, pp 784–793

  • Wartena C, Brussee R (2008) Topic detection by clustering keywords. In: Proceedings of the 2008 19th international conference on database and expert systems application, IEEE Computer Society, DEXA ’08, pp 54–58

  • Yang Y, Pierce T, Carbonell J (1998) A study of retrospective and on-line event detection. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, ACM, SIGIR ’98, pp 28–36

  • Yang Y, Zhang J, Carbonell J, Jin C (2002) Topic-conditioned novelty detection. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, KDD ’02, pp 688–693

  • Zhong S (2005) 2005 special issue: efficient streaming text clustering. Neural Netw 18(5–6):790–798

  • Zubiaga A, Spina D, Fresno V, Martínez R (2013) Real-time classification of twitter trends. J Am Soc Inf Sci Technol (JASIST) 66(3):462–473

Author information

Correspondence to Amina Madani.

Cite this article

Madani, A., Boussaid, O. & Zegour, D.E. Real-time trending topics detection and description from Twitter content. Soc. Netw. Anal. Min. 5, 59 (2015). https://doi.org/10.1007/s13278-015-0298-5

  • DOI: https://doi.org/10.1007/s13278-015-0298-5
