Abstract
As human computation on crowdsourcing systems has become a popular and powerful means of performing tasks, malicious users have begun misusing these systems by posting malicious tasks, propagating manipulated content, and targeting popular web services such as online social networks and search engines. Recently, these malicious users have moved to Fiverr, a fast-growing micro-task marketplace where workers can post crowdturfing tasks (i.e., astroturfing campaigns run by crowd workers) and malicious customers can purchase those tasks for only $5. In this manuscript, we present a comprehensive analysis of crowdturfing in Fiverr and Twitter and develop predictive models to detect and prevent crowdturfing tasks in Fiverr and malicious crowd workers in Twitter. First, we identify the most popular types of crowdturfing tasks found in Fiverr and conduct case studies of these tasks. Second, we build crowdturfing task detection classifiers to filter these tasks and prevent them from becoming active in the marketplace. Our experimental results show that the proposed classification approach effectively detects crowdturfing tasks, achieving 97.35% accuracy. Third, we analyze the real-world impact of crowdturfing tasks by purchasing active Fiverr tasks and quantifying their impact on a target site (Twitter). As part of this analysis, we show that current security systems inadequately detect crowdsourced manipulation, which confirms the necessity of our proposed crowdturfing task detection approach. Finally, we analyze the characteristics of paid Twitter workers, identify features that distinguish these workers from legitimate Twitter accounts, and use these features to build classifiers that detect Twitter workers. Our experimental results show that these classifiers detect Twitter workers effectively, achieving 99.29% accuracy.
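To illustrate the general idea of a crowdturfing task detection classifier as described in the abstract, the following is a minimal, hypothetical sketch using a multinomial Naive Bayes model over bag-of-words features of task descriptions. The toy training examples, labels, and the choice of Naive Bayes are assumptions for illustration only; the paper's actual feature set, dataset, and learning algorithm differ.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_nb(examples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        # log prior + Laplace-smoothed log likelihood of each token
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training data; in the paper, labeled task descriptions
# come from a crawled Fiverr dataset.
train = [
    ("add 5000 twitter followers fast", "crowdturfing"),
    ("provide 1000 facebook likes today", "crowdturfing"),
    ("design a minimalist logo for you", "legitimate"),
    ("proofread your essay carefully", "legitimate"),
]
model = train_nb(train)
print(predict(model, "get 2000 twitter followers"))  # → crowdturfing
```

In a deployed pipeline, such a classifier would score each newly posted task before it becomes active in the marketplace, flagging likely crowdturfing tasks for removal or review.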
Notes
We refer to paid followers as workers for the remainder of this manuscript.
References
Alexa (2013) Fiverr.com site info: alexa. http://www.alexa.com/siteinfo/fiverr.com
Allahbakhsh M, Benatallah B, Ignjatovic A, Nezhad HRM, Bertino E, Dustdar S (2013) Quality control in crowdsourcing systems: issues and directions. IEEE Internet Comput 17(2):76–81. http://dblp.uni-trier.de/db/journals/internet/internet17.html#AllahbakhshBIMBD13
Baba Y, Kashima H, Kinoshita K, Yamaguchi G, Akiyoshi Y (2014) Leveraging non-expert crowdsourcing workers for improper task detection in crowdsourcing marketplaces. Expert Syst Appl 41(6):2678–2687. http://dblp.uni-trier.de/db/journals/eswa/eswa41.html#BabaKKYA14
The World Bank (2013) Doing business in Moldova: World Bank Group. http://www.doingbusiness.org/data/exploreeconomies/moldova/
Bernstein MS, Little G, Miller RC, Hartmann B, Ackerman MS, Karger DR, Crowell D, Panovich K (2010) Soylent: a word processor with a crowd inside. In: Proceedings of UIST
Chen C, Wu K, Srinivasan V, Zhang X (2011) Battling the internet water army: Detection of hidden paid posters. CoRR arXiv:1111.4297
Fast LA, Funder DC (2008) Personality as manifest in word use: correlations with self-report, acquaintance report, and behavior. J Personal Soc Psychol 94(2):334
Franklin MJ, Kossmann D, Kraska T, Ramesh S, Xin R (2011) Crowddb: answering queries with crowdsourcing. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data (SIGMOD), pp 61–72
Gill AJ, Nowson S, Oberlander J (2009) What are they blogging about? Personality, topic and motivation in blogs. In: Proceedings of ICWSM
Google (2013) Google AdSense: maximize revenue from your online content. http://www.google.com/adsense
Halpin H, Blanco R (2012) Machine-learning for spammer detection in crowd-sourcing. In: Proceedings of workshop on human computation at AAAI, pp 85–86
Heymann P, Garcia-Molina H (2011) Turkalytics: analytics for human computation. In: Proceedings of the 20th international conference on World Wide Web (WWW), pp 477–486
Klout (2013a) Klout: the standard for influence. http://klout.com/
Klout (2013b) Klout: see how it works: klout. http://klout.com/corp/how-it-works
Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on twitter. In: Proceedings of ICWSM
Lee K, Tamilarasan P, Caverlee J (2013) Crowdturfers, campaigns, and social media: Tracking and revealing crowdsourced manipulation of social media. In: Proceedings of ICWSM
Lee K, Webb S, Ge H (2014) The dark side of micro-task marketplaces: characterizing fiverr and automatically detecting crowdturfing. In: Proceedings of ICWSM
Motoyama M, McCoy D, Levchenko K, Savage S, Voelker GM (2011) Dirty jobs: the role of freelance labor in web service abuse. In: Proceedings of USENIX security
Moz (2013) Moz: how people use search engines: the beginners guide to seo. http://moz.com/beginners-guide-to-seo/how-people-interact-with-search-engines
Pennebaker J, Francis M, Booth R (2001) Linguistic inquiry and word count. Erlbaum Publishers, Mahwah, NJ
Pham N (2013) Vietnam admits deploying bloggers to support government. http://www.bbc.co.uk/news/world-asia-20982985
Ross J, Irani L, Silberman MS, Zaldivar A, Tomlinson B (2010) Who are the crowdworkers?: shifting demographics in mechanical turk. In: Proceedings of CHI
Sterling B (2010) The Chinese online ’water army’. http://www.wired.com/beyond_the_beyond/2010/06/the-chinese-online-water-army/
Stringhini G, Wang G, Egele M, Kruegel C, Vigna G, Zheng H, Zhao BY (2013) Follow the green: growth and dynamics in twitter follower markets. In: Proceedings of IMC
Thomas K, McCoy D, Grier C, Kolcz A, Paxson V (2013) Trafficking fraudulent accounts: the role of the underground market in twitter spam and abuse. In: Proceedings of USENIX security
Twitter (2013) Twitter: the twitter rules. https://support.twitter.com/articles/18311-the-twitter-rules
Venetis P, Garcia-Molina H (2012) Quality control for comparison microtasks. In: Proceedings of CrowdKDD workshop in conjunction with KDD
Wang G, Mohanlal M, Wilson C, Wang X, Metzger MJ, Zheng H, Zhao BY (2013) Social turing tests: Crowdsourcing sybil detection. In: Proceedings of NDSS
Wang G, Wilson C, Zhao X, Zhu Y, Mohanlal M, Zheng H, Zhao BY (2012) Serf and turf: crowdturfing for fun and profit. In: Proceedings of the 21st international conference on World Wide Web (WWW), pp 679–688
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA. http://www.cs.waikato.ac.nz/ml/weka/book.html
Yang C, Harkreader R, Zhang J, Shin S, Gu G (2012) Analyzing spammers’ social networks for fun and profit: a case study of cyber criminal ecosystem on twitter. In: Proceedings of the 21st international conference on World Wide Web (WWW), pp 71–80
Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: Proceedings of ICML
Acknowledgments
This work was supported in part by a Google Faculty Research Award, a Research Catalyst grant and faculty startup funds from Utah State University, and AFOSR Grant FA9550-12-1-0363. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.
Author information
Authors and Affiliations
Corresponding author
Additional information
An early version of this manuscript appeared in the proceedings of the 2014 International AAAI Conference on Weblogs and Social Media (ICWSM) (Lee et al. 2014).
Rights and permissions
About this article
Cite this article
Lee, K., Webb, S. & Ge, H. Characterizing and automatically detecting crowdturfing in Fiverr and Twitter. Soc. Netw. Anal. Min. 5, 2 (2015). https://doi.org/10.1007/s13278-014-0241-1