research-article

Detecting spammers on social networks

Published: 02 July 2015

Abstract

Social networks have become a very popular way for internet users to communicate and interact online. Users spend plenty of time on well-known social networks (e.g., Facebook, Twitter, Sina Weibo), reading news, discussing events, and posting messages. Unfortunately, this popularity also attracts a significant number of spammers who continuously exhibit malicious behavior (e.g., posting messages containing commercial URLs, following a large number of users), causing great misunderstanding and inconvenience in users' social activities. In this paper, a supervised machine learning based solution is proposed for effective spammer detection. The main procedure of the work is as follows: first, collect a dataset from Sina Weibo comprising 30,116 users and more than 16 million messages. Then, construct a labeled dataset of users and manually classify users into spammers and non-spammers. Afterwards, extract a set of features from message content and users' social behavior, and apply them to an SVM (Support Vector Machine) based spammer detection algorithm. The experiment shows that the proposed solution is capable of providing excellent performance, with the true positive rates of spammers and non-spammers reaching 99.1% and 99.9%, respectively.
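
The pipeline summarized in the abstract (feature extraction, a supervised SVM-style classifier, and per-class true positive rates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two features (`url_ratio`, `followee_share`), the toy user records, and the small sub-gradient trainer for a linear hinge-loss classifier are all hypothetical stand-ins, since the abstract does not list the actual features or SVM configuration.

```python
# Illustrative sketch only: the feature names, toy data, and tiny trainer are
# assumptions, not the paper's actual features or its SVM implementation.

def extract_features(user):
    """Map a user record to two hypothetical features in [0, 1]."""
    messages = user["messages"]
    # Share of messages that contain a URL (content feature).
    url_ratio = sum("http" in m for m in messages) / max(len(messages), 1)
    # Share of followees among all connections (social-behavior feature).
    total = user["followees"] + user["followers"]
    followee_share = user["followees"] / total if total else 0.0
    return [url_ratio, followee_share]


def train_linear_svm(X, y, epochs=1000, lr=0.1, lam=0.001):
    """Sub-gradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The L2 penalty shrinks w on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            # The hinge term contributes only when the example is inside the margin.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, x):
    """Return +1 (spammer) or -1 (non-spammer) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def true_positive_rate(y_true, y_pred, label):
    """Fraction of users whose true class is `label` that were predicted as `label`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds:
        raise ValueError("no examples with the requested label")
    return sum(p == label for p in preds) / len(preds)


# Hypothetical, manually labeled toy users: +1 = spammer, -1 = non-spammer.
users = [
    ({"messages": ["buy http://a.example", "sale http://b.example"],
      "followees": 900, "followers": 100}, 1),
    ({"messages": ["deal http://c.example", "win http://d.example", "hi"],
      "followees": 950, "followers": 50}, 1),
    ({"messages": ["nice weather", "lunch?"],
      "followees": 40, "followers": 60}, -1),
    ({"messages": ["read this http://news.example", "ok", "see you", "thanks"],
      "followees": 30, "followers": 70}, -1),
]

X = [extract_features(u) for u, _ in users]
y = [label for _, label in users]
model = train_linear_svm(X, y)
y_pred = [predict(model, x) for x in X]

print("TPR spammers:", true_positive_rate(y, y_pred, 1))
print("TPR non-spammers:", true_positive_rate(y, y_pred, -1))
```

The sketch scores the classifier on its own training users purely to keep the example short; a realistic evaluation would report the per-class true positive rates on held-out labeled users, and would likely use an established SVM library rather than this hand-written trainer.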

Information

Published In

Neurocomputing, Volume 159, Issue C
July 2015
306 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Machine learning
  2. Social network
  3. Spammer
  4. Support vector machine

Qualifiers

  • Research-article
