Abstract
The growing popularity of social media has produced a huge volume of social data, including tweets. These collections of social data can be useful, but how much meaningful content they contain has not been sufficiently researched, especially for South Korean Twitter data. South Korean Twitter data has mostly been studied as a political medium, and previous research has not adequately covered which trends Twitter reflects across major topic categories such as politics, economics, and sports. In this paper, we present a cross-media approach that characterizes South Korean tweets by inferring their topic category distribution through short-text categorization. We select newspapers as the cross-media source, examine how articles from major newspapers are categorized, and then train our classifier on the features of each topic category. In addition, to graft news topics onto South Korean tweets, we propose a word clustering and filtering approach that excludes words carrying no semantic content for the topic categories. Using these procedures, we analyze South Korean tweets to determine which topic categories Twitter users focus on most. We also observe distinctive behaviors of South Korean Twitter users across parameters such as date, time slot, and day of the week. Because our research provides a macroscopic analysis of Twitter data using a cross-media strategy, it can also serve as a useful resource for analyses of other social media.
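The abstract summarizes the pipeline only at a high level: train a topic classifier on categorized news articles, filter out tweet words that carry no topical content, classify the tweets, and aggregate the category distribution over time. The following Python sketch illustrates that flow under stated assumptions; it is not the authors' implementation. It substitutes a Rocchio-style TF-IDF centroid classifier for whatever classifier the paper actually trains, replaces the proposed word clustering and filtering with a simple rule that drops tweet words absent from the news vocabulary, and uses whitespace tokenization instead of Korean morphological analysis. All function names (`tokenize`, `tfidf`, `train`, `classify`, `distribution_by_weekday`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime

# Locale-independent weekday names; datetime.weekday() returns 0 for Monday.
WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")


def tokenize(text):
    # Assumption: whitespace tokenization. Korean text would normally need
    # a morphological analyzer to extract content words.
    return text.lower().split()


def tfidf(tokens, idf):
    # Hypothetical filtering step (not the paper's word clustering): drop
    # tokens never seen in the news corpus, then L2-normalize TF-IDF weights.
    tf = Counter(t for t in tokens if t in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}


def train(articles):
    """articles: list of (category, text) pairs taken from news data."""
    docs = [(cat, tokenize(text)) for cat, text in articles]
    df = Counter()
    for _, toks in docs:
        df.update(set(toks))  # document frequency of each word
    n = len(docs)
    idf = {w: math.log(n / d) for w, d in df.items()}
    # Rocchio-style centroid: sum of normalized article vectors per category.
    centroids = defaultdict(Counter)
    for cat, toks in docs:
        for w, v in tfidf(toks, idf).items():
            centroids[cat][w] += v
    return idf, dict(centroids)


def classify(text, idf, centroids):
    """Return the category whose centroid has the highest cosine similarity."""
    vec = tfidf(tokenize(text), idf)
    if not vec:
        return None  # no topical words survived filtering

    def cosine(centroid):
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        dot = sum(v * centroid.get(w, 0.0) for w, v in vec.items())
        return dot / norm if norm > 0 else 0.0

    return max(centroids, key=lambda cat: cosine(centroids[cat]))


def distribution_by_weekday(tweets, idf, centroids):
    """tweets: list of (datetime, text). Returns {weekday: Counter(category)}."""
    result = defaultdict(Counter)
    for ts, text in tweets:
        cat = classify(text, idf, centroids)
        if cat is not None:
            result[WEEKDAY_NAMES[ts.weekday()]][cat] += 1
    return dict(result)


# Toy usage with made-up news articles and tweets.
news = [
    ("politics", "election president party vote assembly"),
    ("economics", "stock market price bank interest rate"),
    ("sports", "baseball game score team league player"),
]
idf, centroids = train(news)
tweets = [
    (datetime(2014, 6, 2, 21, 0), "who will win the election vote"),
    (datetime(2014, 6, 2, 22, 0), "great baseball game tonight"),
    (datetime(2014, 6, 3, 9, 0), "good morning everyone"),
]
print(distribution_by_weekday(tweets, idf, centroids))
```

In the toy run, the two topical tweets (both posted on Monday, 2 June 2014) are counted under `politics` and `sports`, while "good morning everyone" shares no words with the news vocabulary and is left unclassified. Grouping by `ts.hour` instead of the weekday would yield a per-time-slot distribution in the same way.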
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT, and Future Planning (MSIP) (2014R1A1A3051169), and by the Ministry of Education (2012R1A1A2042792).
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Cho, S.W., Cha, M. & Sohn, KA. Topic category analysis on twitter via cross-media strategy. Multimed Tools Appl 75, 12879–12899 (2016). https://doi.org/10.1007/s11042-015-2866-0
DOI: https://doi.org/10.1007/s11042-015-2866-0