DOI: 10.1145/2600428.2609632
research-article

Leveraging knowledge across media for spammer detection in microblogging

Published: 03 July 2014

Abstract

While microblogging has emerged as an important platform for information sharing and communication, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Current spammer detection in microblogging relies mainly on social network information and pays little attention to content analysis, for two reasons tied to the distinct nature of microblogging messages: label information is hard to obtain, and the texts are short and noisy. Spammer detection has, however, been studied extensively for years in other media, e.g., email, SMS and the web. Motivated by the abundant resources available in these media, we investigate whether they can be used for spammer detection in microblogging. Although it is widely accepted that microblogging texts differ from those in other media, no quantitative analysis shows how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help spammer detection in microblogging.

      Published In

      SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
      July 2014
      1330 pages
      ISBN:9781450322577
      DOI:10.1145/2600428
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cross-media mining
      2. emails
      3. security
      4. sms
      5. social media
      6. spammer detection
      7. twitter
      8. web

      Conference

      SIGIR '14
      Acceptance Rates

      SIGIR '14 Paper Acceptance Rate 82 of 387 submissions, 21%;
      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2021) Survey on Astroturfing Detection and Analysis from an Information Technology Perspective. Security and Communication Networks 2021. DOI: 10.1155/2021/3294610. Online publication date: 1-Jan-2021
      • (2021) Social Spammer Detection Based on PSO-CatBoost. Security, Privacy, and Anonymity in Computation, Communication, and Storage (382-395). DOI: 10.1007/978-3-030-68851-6_28. Online publication date: 5-Feb-2021
      • (2019) Cross-Domain Spam Detection in Social Media: A Survey. Emerging Technologies in Computer Engineering: Microservices in Big Data Analytics (98-112). DOI: 10.1007/978-981-13-8300-7_9. Online publication date: 18-May-2019
      • (2018) Unsupervised keyword extraction from microblog posts via hashtags. Journal of Web Engineering 17:1-2 (93-120). DOI: 10.5555/3370048.3370053. Online publication date: 1-Mar-2018
      • (2018) Semi-Supervised Collaborative Learning for Social Spammer and Spam Message Detection in Microblogging. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (1791-1794). DOI: 10.1145/3269206.3269324. Online publication date: 17-Oct-2018
      • (2018) Detecting Suspicious Members in an Online Emotional Support Service. Security and Privacy in Communication Networks (22-42). DOI: 10.1007/978-3-030-01704-0_2. Online publication date: 29-Dec-2018
      • (2018) Social Communication Network: Case Study. Encyclopedia of Social Network Analysis and Mining (2576-2583). DOI: 10.1007/978-1-4939-7131-2_289. Online publication date: 12-Jun-2018
      • (2017) Robust Spammer Detection in Microblogs. ACM Transactions on Intelligent Systems and Technology 8:6 (1-31). DOI: 10.1145/3086637. Online publication date: 18-Aug-2017
      • (2017) ProGuard: Detecting Malicious Accounts in Social-Network-Based Online Promotions. IEEE Access 5 (1990-1999). DOI: 10.1109/ACCESS.2017.2654272. Online publication date: 2017
      • (2017) Discovering social spammers from multiple views. Neurocomputing 225:C (49-57). DOI: 10.1016/j.neucom.2016.11.013. Online publication date: 15-Feb-2017
