Abstract
Given the recent availability of large volumes of social media discussions, finding temporally unusual phenomena, which we call events, in such data is of great interest. Previous work on social media event detection either assumes a specific type of event or assumes certain behavior of the observed variables. In this paper, we propose a general method for event detection on social media that makes few assumptions. The main assumption is that when an event occurs, the affected semantic aspects deviate from their usual behavior for a sustained period. We generalize the representation of time units based on word embeddings of social media text, and propose an algorithm that detects durative events in time series in a general sense. We also provide an incremental version of the algorithm for real-time detection. We test our approaches on synthetic data and two real-world tasks. With the synthetic dataset, we compare the performance of the retrospective and incremental versions of the algorithm. In the first real-world task, we use a novel setting to test whether our method and baseline methods can exhaustively catch all real-world news in the test period. The evaluation results show that when an event is highly unusual relative to the base social media discussion, our method captures it more effectively. In the second real-world task, we use the captured events to help improve the accuracy of stock market movement prediction. We show that our event-based approach has a clear advantage over other ways of adding social media information.
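As a rough illustration of the main assumption stated in the abstract, the sketch below represents each time unit as the mean word embedding of its text and then flags embedding dimensions that stay far from their baseline behavior for several consecutive time units. It is not the algorithm proposed in the paper: the z-score threshold is a simple stand-in, and the function names, the baseline window, the parameters `k` and `min_len`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def time_unit_vectors(words_per_unit, embeddings, dim):
    """Represent each time unit as the mean embedding of its known words.

    `embeddings` is a hypothetical dict mapping word -> vector of length `dim`.
    Units with no known words become zero vectors.
    """
    units = []
    for words in words_per_unit:
        vecs = [embeddings[w] for w in words if w in embeddings]
        units.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(units)


def sustained_deviations(series, baseline_len, k=3.0, min_len=3):
    """Find runs where a dimension deviates from its baseline for a while.

    Returns (dimension, start, end) tuples, with `end` exclusive, for every
    run of at least `min_len` consecutive time units in which the value lies
    more than `k` baseline standard deviations from the baseline mean.
    """
    base = series[:baseline_len]
    mu = base.mean(axis=0)
    sigma = base.std(axis=0) + 1e-9  # small constant avoids division by zero
    unusual = np.abs(series - mu) / sigma > k

    n_units, n_dims = series.shape
    events = []
    for d in range(n_dims):
        start = None
        # Iterate one step past the end so a run reaching the last unit closes.
        for t in range(n_units + 1):
            flag = t < n_units and unusual[t, d]
            if flag and start is None:
                start = t
            elif not flag and start is not None:
                if t - start >= min_len:
                    events.append((d, start, t))
                start = None
    return events


# Toy data (assumed): two dimensions alternating between 1 and 0,
# with a sustained shift injected into dimension 1 at time units 12..15.
series = np.zeros((20, 2))
series[::2] = 1.0
series[12:16, 1] = 5.0
events = sustained_deviations(series, baseline_len=10, k=3.0, min_len=3)
print(events)
```

A single isolated spike would be ignored by this sketch because it does not reach `min_len`, which mirrors the emphasis on durative rather than instantaneous deviations.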
Availability of data and material
The data used in this paper are available from the corresponding author upon request.
Notes
The preprint can be accessed online at http://arxiv.org/abs/2106.02250
An example online resource that provides an implementation under this setting: https://github.com/philipperemy/japanese-words-to-vectors
An implementation of this test is available as an R package: https://cran.r-project.org/web/packages/randtests/randtests.pdf
Since politicians are public figures, such a list can be found in many online sources, for example: https://meyou.jp/group/category/politician/
A list of popular Japanese news Twitter accounts can be found on the same source: https://meyou.jp/ranking/follower_media
Acknowledgements
This research is partially supported by JST CREST Grant Number JPMJCR21F2.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Shirakawa, M. & Hara, T. Generalized durative event detection on social media. J Intell Inf Syst 60, 73–95 (2023). https://doi.org/10.1007/s10844-022-00730-8
DOI: https://doi.org/10.1007/s10844-022-00730-8