Detecting spammers and content promoters in online video social networks

Published: 19 July 2009

Abstract

A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood that the response is viewed by a larger number of users. Moreover, opportunistic users, called promoters, may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize the trust of users in the system, thus compromising its success in promoting social interactions. In spite of that, the available literature is very limited in providing a deep understanding of this problem.
In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. To that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, or legitimate users. Using our test collection, we provide a characterization of social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness on our test collection. We found that our approach is able to correctly identify the majority of the promoters, misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they proved to be much harder to distinguish from legitimate users.

      Published In

      SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
      July 2009
      896 pages
      ISBN:9781605584836
      DOI:10.1145/1571941

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. promoter
      2. social media
      3. social networks
      4. spammer
      5. video promotion
      6. video response
      7. video spam

      Qualifiers

      • Research-article

      Conference

      SIGIR '09

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
