research-article

Detecting spammers and content promoters in online video social networks

Authors:

Fabrício Benevenuto,

Tiago Rodrigues,

Virgílio Almeida,

Jussara Almeida,

Marcos Gonçalves

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Pages 620 - 627

https://doi.org/10.1145/1571941.1572047

Published: 19 July 2009

Abstract

A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood of the response being viewed by a larger number of users. Moreover, opportunistic users (promoters) may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize users' trust in the system, thus compromising its success in promoting social interactions. Nevertheless, the available literature offers only a limited understanding of this problem.
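The promoter behavior described above (many responses aimed at one video) suggests simple per-user attributes. The Python sketch below is illustrative only: it assumes a hypothetical log of `(user_id, responded_video_id)` pairs, and the function `response_attributes` and the attribute names `num_responses`, `distinct_targets` and `top_target_share` are invented here, not taken from the paper's attribute set.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def response_attributes(
    responses: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Summarize each user's video-response behavior.

    `responses` is a hypothetical stream of (user_id, responded_video_id)
    pairs, one per video response the user posted. The attribute names
    below are illustrative assumptions, not the paper's features.
    """
    targets: Dict[str, Counter] = defaultdict(Counter)  # user -> responded videos
    for user, video in responses:
        targets[user][video] += 1

    attrs: Dict[str, Dict[str, float]] = {}
    for user, counts in targets.items():
        total = sum(counts.values())
        top = counts.most_common(1)[0][1]  # responses to the most-targeted video
        attrs[user] = {
            "num_responses": float(total),
            "distinct_targets": float(len(counts)),
            # Share of responses aimed at a single video; many responses with
            # a share near 1.0 resemble the promoter pattern.
            "top_target_share": top / total,
        }
    return attrs


# Toy usage: one user floods a single video, another responds to two videos.
log = [("promoter", "v1")] * 50 + [("alice", "v1"), ("alice", "v2")]
features = response_attributes(log)
print(features["promoter"])
print(features["alice"])
```

Under these assumptions, a user with many responses and a `top_target_share` near 1.0 matches the promoter pattern described in the abstract.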

In this paper, we go a step further by addressing the issue of detecting video spammers and promoters. To that end, we manually build a test collection of real YouTube users, classifying them as spammers, promoters, or legitimate users. Using our test collection, we provide a characterization of social and content attributes that may help distinguish each user class. We also investigate the feasibility of using a state-of-the-art supervised classification algorithm to detect spammers and promoters, and assess its effectiveness on our test collection. We found that our approach correctly identifies the majority of the promoters, misclassifying only a small percentage of legitimate users. In contrast, although we are able to detect a significant fraction of spammers, they proved much harder to distinguish from legitimate users.
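The abstract does not name the classifier at this point, and the sketch below does not reproduce it. To show the train, predict and evaluate loop over the three user classes using only the Python standard library, it swaps in a simple nearest-centroid classifier; per-class recall (the fraction of each true class labeled correctly) is one natural way to express results such as "the majority of the promoters" being identified. The attribute vectors are invented toy data with two hypothetical attributes (a response count and the share of responses aimed at a single video), and no feature scaling is applied, which a real pipeline would need.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(data: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the attribute vectors of each labeled class (stand-in classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, label in data:
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def confusion(centroids: Dict[str, List[float]],
              labeled: List[Tuple[Vector, str]]) -> Dict[str, Counter]:
    """Build matrix[true_class][predicted_class] = number of users."""
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for x, label in labeled:
        matrix[label][predict(centroids, x)] += 1
    return matrix


def recall(matrix: Dict[str, Counter], cls: str) -> float:
    """Fraction of users of class `cls` that were labeled `cls`."""
    row = matrix[cls]
    total = sum(row.values())
    return row[cls] / total if total else 0.0


# Hypothetical toy data: [num_responses, top_target_share] per user.
train_set = [
    ([50, 1.0], "promoter"), ([40, 0.9], "promoter"),
    ([20, 0.1], "spammer"), ([25, 0.2], "spammer"),
    ([3, 0.5], "legitimate"), ([2, 0.6], "legitimate"),
]
test_set = [
    ([45, 0.95], "promoter"), ([22, 0.2], "spammer"),
    ([3, 0.5], "legitimate"), ([14, 0.4], "legitimate"),
]
m = confusion(train_centroids(train_set), test_set)
for cls in ("promoter", "spammer", "legitimate"):
    print(cls, recall(m, cls))
```

On this toy data the promoter and spammer test users are recovered, while one legitimate user falls closer to the spammer centroid, so legitimate recall is 0.5; these numbers illustrate the metric only and say nothing about the paper's results.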

Index Terms

Detecting spammers and content promoters in online video social networks
1. Information systems
  1. World Wide Web
    1. Web applications
    2. Web services

Information & Contributors

Information

Published In

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

July 2009

896 pages

ISBN:9781605584836

DOI:10.1145/1571941

General Chairs: James Allan (University of Massachusetts Amherst, USA), Javed Aslam (Northeastern University, USA)

Program Chairs: Mark Sanderson (University of Sheffield, UK), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA), Justin Zobel (University of Melbourne, Australia)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 July 2009

Qualifiers

Research-article

Conference

SIGIR '09

Sponsor:

SIGIR '09: The 32nd International ACM SIGIR conference on research and development in Information Retrieval

July 19 - 23, 2009

Boston, MA, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 99

Total Downloads: 1,337

Downloads (Last 12 months): 15

Downloads (Last 6 weeks): 0

Reflects downloads up to 18 Nov 2024

Cited By

陈虎 (2024). Identifying Online User Reputation in Terms of Collective Rating Behaviors. Operations Research and Fuzziology, 14(4), 51-60. Online publication date: 2024.
https://doi.org/10.12677/orf.2024.144375
Sun H, Chen D (2024). A robust ranking method for online rating systems with spammers by interval division. Expert Systems with Applications, 235, 121236. Online publication date: Jan 2024.
https://doi.org/10.1016/j.eswa.2023.121236
Tripathi A, Bharti K, Ghosh M (2024). An Efficient Algorithm for Exploitative Monetization Scam Video Detection Over Social Media Platforms. Security and Privacy. Online publication date: 24 Oct 2024.
https://doi.org/10.1002/spy2.474
Karg S, Lim M, Schnall S (2022). Followers forever: Prior commitment predicts post-scandal support of a social media celebrity. Social Psychological Bulletin, 17. Online publication date: 6 Sep 2022.
https://doi.org/10.32872/spb.8283
Zhu H, Xiao Y, Wang Z, Wu J (2022). A robust reputation iterative algorithm based on Z-statistics in a rating system with thorny objects. Journal of the Operational Research Society, 74(6), 1600-1612. Online publication date: 25 Jul 2022.
https://doi.org/10.1080/01605682.2022.2101952
He Y, Yang P, Cheng P (2022). Semi-supervised internet water army detection based on graph embedding. Multimedia Tools and Applications, 82(7), 9891-9912. Online publication date: 16 Sep 2022.
https://doi.org/10.1007/s11042-022-13633-1
Akinyelu A (2021). Advances in spam detection for email spam, web spam, social network spam, and review spam. Journal of Computer Security, 29(5), 473-529. Online publication date: 26 Aug 2021.
https://dl.acm.org/doi/10.3233/JCS-210022
Huang J, Sun H, Chen X, Liu X, Cao J (2021). An Iterative Deviation-based Ranking Method to Evaluate User Reputation in Online Rating Systems. 2021 4th International Conference on Data Science and Information Technology, 15-21. Online publication date: 23 Jul 2021.
https://dl.acm.org/doi/10.1145/3478905.3478909
Florentino É, Goldschmidt R, Cavalcanti M (2021). Exploring Interactions in YouTube to Support the Identification of Crime Suspects. Proceedings of the XVII Brazilian Symposium on Information Systems, 1-8. Online publication date: 7 Jun 2021.
https://dl.acm.org/doi/10.1145/3466933.3466967
Parihar P, Devanand Kumar N (2021). Fake Profile Detection from the Social Dataset for Movie Promotion. 2021 Sixth International Conference on Image Information Processing (ICIIP), 495-498. Online publication date: 26 Nov 2021.
https://doi.org/10.1109/ICIIP53038.2021.9702684
