Abstract
Given a network of Twitter users, can we capture their posting behavior over time and identify patterns that could describe, model, or predict their activity? Can we identify temporal connectivity patterns that emerge from the use of specific attributes? More challengingly, are there particular attribute usage patterns that indicate an inherent anomaly? This work answers these questions, extending previous work on other social networks and attribute types. We propose TG-OUT, a pipeline of methods that: (a) model the temporal evolution of attribute induced graphs to detect peculiar attributes, (b) identify temporal patterns in attribute distributions, (c) investigate differences in the patterns emerging from bot and/or non-bot accounts, and (d) extract tailored sets of exploitable features. Experimental results show that: most individual attribute distributions remain stable over time and mostly follow power laws; the temporal evolution of attribute induced graphs obeys certain laws, and deviations from these laws are outliers; the patterns present deviations that depend on the type of accounts using each attribute; and a careful selection of only two features, used to train a simple machine learning algorithm, produces a model that efficiently identifies attributes mainly used by bots.
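To make step (a) concrete, the following is a minimal sketch and not the TG-OUT implementation. It assumes that the graph of an attribute at a given time window is the subgraph of a user graph induced by all users who have used that attribute so far, that a pooled power law relates node and edge counts across snapshots, and that an attribute's outlier score is its largest log-residual from that fit. The input layouts and the function names snapshot_sizes, fit_log_log, and deviation_scores are hypothetical.

```python
import math


def snapshot_sizes(user_edges, usage_events):
    """Return {attribute: [(n_nodes, n_edges), ...]}, one pair per time window.

    user_edges: set of (user_a, user_b) pairs (hypothetical input, one pair per link).
    usage_events: iterable of (window, user, attribute) tuples (hypothetical input).
    Assumption: the graph of an attribute at window t is the subgraph induced by
    every user who used that attribute in any window up to and including t.
    """
    by_attr = {}
    for window, user, attr in usage_events:
        by_attr.setdefault(attr, {}).setdefault(window, set()).add(user)

    sizes = {}
    for attr, windows in by_attr.items():
        seen = set()
        series = []
        for window in sorted(windows):
            seen |= windows[window]
            n_edges = sum(1 for a, b in user_edges if a in seen and b in seen)
            series.append((len(seen), n_edges))
        sizes[attr] = series
    return sizes


def fit_log_log(points):
    """Least-squares fit of log(y) = slope * log(x) + intercept.

    Needs at least two points with distinct positive x values.
    """
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    n = len(logs)
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviation_scores(sizes):
    """Score each attribute by its largest absolute log-residual from the pooled fit."""
    pooled = [point for series in sizes.values() for point in series]
    slope, intercept = fit_log_log(pooled)
    scores = {}
    for attr, series in sizes.items():
        residuals = [
            abs(math.log(e) - (slope * math.log(n) + intercept))
            for n, e in series
            if n > 0 and e > 0
        ]
        scores[attr] = max(residuals, default=0.0)
    return scores
```

Attributes with the highest scores would be the candidates for closer inspection. Any threshold for calling a score an outlier is left open here, since the abstract does not specify one.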
Acknowledgements
This research has been co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (Project Code: T1EDK-03052), as well as by the H2020 Research and Innovation Programme under Grant Agreement No. 875329.
This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science. Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir.
Cite this article
Dimitriadis, I., Poiitis, M., Faloutsos, C. et al. TG-OUT: temporal outlier patterns detection in Twitter attribute induced graphs. World Wide Web 25, 2429–2453 (2022). https://doi.org/10.1007/s11280-021-00986-0
DOI: https://doi.org/10.1007/s11280-021-00986-0