Abstract
Compromising legitimate accounts has become the most widely used strategy to spread malicious content on Online Social Networks (OSNs). To address this problem, we propose a pure text mining approach that checks whether an account has been compromised based on the content of its posts. In the first step, the proposed approach extracts the writing style from the user account. The second step applies the k-Nearest Neighbors (k-NN) algorithm to evaluate the post content and identify the user. Finally, the third step, Baseline Updating, continuously updates the user baseline to accommodate current trends and seasonality in the user's posts. Experiments were carried out on a Twitter dataset composed of tweets from 1000 users. All three steps were evaluated individually, and the results show that the developed method is stable and can detect compromised accounts. An important observation is the contribution of Baseline Updating, which yields an accuracy improvement of more than 60 %. In terms of average accuracy, the developed method achieved results above 93 %.
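The three steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the style features, the distance measure, or the decision rule, so character trigrams, cosine similarity, the acceptance `threshold`, and the sliding `window` are all assumptions chosen for illustration, as are the names `style_vector` and `AccountBaseline`.

```python
from collections import Counter, deque
from math import sqrt


def style_vector(text, n=3):
    """Step 1 (assumed features): character n-gram counts as a crude style profile."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AccountBaseline:
    """Sliding window of recent posts accepted as written by the account owner."""

    def __init__(self, posts, k=3, threshold=0.2, window=50):
        self.k = k                  # number of neighbours (hypothetical default)
        self.threshold = threshold  # acceptance cut-off (hypothetical default)
        self.vectors = deque((style_vector(p) for p in posts), maxlen=window)

    def score(self, post):
        """Step 2: mean similarity between the post and its k nearest baseline posts."""
        v = style_vector(post)
        nearest = sorted((cosine(v, b) for b in self.vectors), reverse=True)[:self.k]
        return sum(nearest) / len(nearest) if nearest else 0.0

    def check(self, post):
        """Step 3: accept the post if it matches the baseline, then fold it in.

        Rejected posts are not added, so a suspected intruder's style
        does not drift into the owner's baseline.
        """
        ok = self.score(post) >= self.threshold
        if ok:
            self.vectors.append(style_vector(post))
        return ok


baseline = AccountBaseline(
    ["good morning coffee time", "coffee time again, good morning"], k=1
)
print(baseline.check("good morning coffee time"))  # stylistically close post
print(baseline.check("zzzzqqqq"))                  # shares no trigrams with the baseline
```

Because the window has a fixed length, older posts fall out as new ones are accepted, which is one simple way to follow gradual changes in a user's topics and vocabulary.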
About this article
Cite this article
Barbon, S., Igawa, R.A. & Bogaz Zarpelão, B. Authorship verification applied to detection of compromised accounts on online social networks. Multimed Tools Appl 76, 3213–3233 (2017). https://doi.org/10.1007/s11042-016-3899-8
DOI: https://doi.org/10.1007/s11042-016-3899-8